7 datasets
  • Oil and Gas Extraction Industry--Establishments, Employees, and Payroll by State: 2004

    Free Download — The Statistical Abstract files are distributed by the US Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, table headers that are not easily matched to their rows and columns. A few files had extraneous characters in the title; these were corrected to be consistent. A few files ...
  • 99 Wikipedia Sources Aiding the Semantic Web » AI3:::Adaptive Information

  • Data for Data Mining

  • Information Extraction: The RISE Repository of Information Sources

    Offsite — RISE is a distributed repository of online information sources that are used for the empirical analysis of learning algorithms that generate extraction patterns. The sources included in this repository are provided by people from the information extraction (IE) and wrapper generation (WG) communities. Both communities use machine learning algorithms to generate ...
  • Text Analytics Solutions from ClearForest

  • Web Content Extraction

    Offsite — The dataset contains the HTML version as well as the true content of a web page. True content is used to mean the text excluding the ads, navigational links/text, comments, etc. For example, for a blog post only the content of the post and not the comments and other surrounding text will be extracted. The dataset contains the HTML source and text content (true content) ...
  • Data.gov Catalog

    Offsite — Additional facts from data.gov. Dataset Summary: Date Released: June, 2009; Date Updated: June, 2009; Time Period: Daily; Data.gov Data Category Type: Raw Data Catalog; Frequency: 24 hours; Specialized Data Category Designation: Administrative. Contributing Agency Information: Citation: Data.gov Catalog; Agency Program Page: http://www.data.gov; Agency Data Series Page: ...