Tag: web

Showing 21-40 of 43 datasets
  • Archive-It.org

    Offsite
  • CommonCrawl - About

    Offsite
  • Access to Web Research Collections VLC2/WT10g/WT2g

    Offsite
  • ur1 Generator

    Offsite — About: ur1 is an open service from Contrôlez-Vous, Inc., powered by lilURL. The full source is available under the terms of the GNU General Public License. Licensing: a full database dump is available, and all material is licensed under CC Zero (CC0).
  • Web Content Extraction

    Offsite — The dataset contains the HTML version of each web page along with its true content. "True content" means the text of the page excluding ads, navigational links and text, comments, etc. For example, for a blog post only the content of the post is extracted, not the comments or other surrounding text. The dataset contains the HTML source and text content (true content) ...
  • Compete - Compete Developer Resources

    Offsite — The Compete API is a web service that acts as a middleman between your application and Compete's large store of web metrics covering over 1 million websites. API calls are structured around the concept of a "Site": you call the API about a specific site and get back metrics or charts for that site. Compete Site Analytics provides information for every site ...
  • Web Reference Simple

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Web Reference Simple.
  • Metrolink Web

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Metrolink Web.
  • Amtrak Web

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Amtrak Web.
  • Web Reference

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Web Reference.
  • Cta Web

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Cta Web.
  • Metra Web

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Metra Web.
  • Pace Web

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Pace Web.
  • Web Cite

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Web Cite. A seemingly random collection of links.
  • TheCounter.com - The Affordable Web Site Analysis Tool

    Offsite — TheCounter.com provides accurate, up-to-the-minute reports on how visitors to your site are using your web pages, including: which web browsers your visitors are using; their screen resolution; which operating system they are using; which sites are referring others to your site; which search engines are being used to find your site; which ...
  • List of User-Agents (Categorized by Spiders, Robots, Crawler, Browser)

    Offsite — A searchable database of user-agent strings used by browsers, search-engine spiders and crawlers, web directories, download managers, link checkers, proxy servers, web-filtering tools, harvesters, spambots, and bad bots. Entries are sorted by user-agent name, with information about each agent's type, purpose, and origin. The info field of each user-agent entry offers further details. ...
  • Usage share of web browsers - aggregated from several surveys [Wikipedia]

    Offsite — The usage share of web browsers is the percentage of visitors to a group of websites who use a particular web browser. For example, saying that Internet Explorer has a 66% usage share means that some version of Internet Explorer is used by 66% of visitors to a given set of sites. ...
  • Applications for Food Stamps and Medi-Cal from DataSF.org

    No Data — Applications for Food Stamps and Medi-Cal submitted via the online website, www.BenefitsSF. Category: human-services. Format: Excel. Data dictionary: Data Dictionary File. Frequency: monthly. Time period: begins 6/1/2009. Agency name: Human Services Agency.
  • The ClueWeb09 Dataset

    Offsite — The ClueWeb09 dataset was created to support research on information retrieval and related human language technologies. It consists of about 1 billion web pages in ten languages, collected in January and February 2009. The dataset is used by several tracks of the TREC conference. Dataset specifications: 1,040,809,705 web pages in 10 languages; 5 TB, ...
  • CRM Management Solutions

    No Data — Customer Relationship Management (CRM) is about understanding customers' needs and behavior and building strong, long-lasting relationships with them. Good customer relationships are key to success. CRM is a strategic process that helps a business better understand how it can meet its customers' needs and satisfy ...