-
Offsite
—
DMOZ100k06 is a large research data set about document metadata based on a random sample of 100,000 web documents from the Open Directory combined with data retrieved from the social bookmarking service delicious.com, the content rating system ICRA, and the search engine Google. The data set is freely available for other research.
Michael G. Noll
-
Offsite
—
This Python module is a quasi-API that makes it easier to authenticate with Google Trends, for those who want to squeeze an extra level of functionality out of their data. The advantage of programmatic access is that the data can be automatically trended and merged. It can be snuck into a 9:00 AM daily email to the VP of Marketing so that she knows to ramp up Google AdWords ...
-
Offsite
—
Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time. This site has a visualization of Google Flu Trends in comparison to the CDC’s data. There is also a link to a dataset of Google Flu Trends weekly influenza activity estimates for the world, from December 2002 to the present. Each week, millions of ...
-
Offsite
—
The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC’s host institution. The LDC was founded in 1992 with a grant from the Advanced ...
-
Free Download
—
This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Google Video.
-
Free Download
—
Google Voice: Calling Rates
-
Offsite
—
List of publicly accessible transit data feeds
This is a list of transit schedule feeds published by transit agencies and operators in GTFS format for developers to use. The feeds contain scheduled times, stop locations, route information and, optionally, fare information and detailed route shapes.
-
Offsite
—
Here are the datasets backing the Google Books Ngram Viewer. These datasets were generated in July 2009; we will update these datasets as our book scanning continues, and the updated versions will have distinct and persistent version identifiers (20090715 for the current set). Each of the links will directly download a fragment of the given corpus. For ...
-
Free Download
—
A list of the most popular sports-related keywords in Arabic, mainly from Arab countries. This dataset shows the keywords and how many times they were requested on Google and appeared for a sports advertiser. Problems with numbers: 1. Not all campaigns have the same budgets due to targeting and pricing issues, so this will affect the accuracy. 2. The ...
-
Offsite
—
Here are the datasets backing the Google Books Ngram Viewer. These datasets were generated in July 2009; we will update these datasets as our book scanning continues, and the updated versions will have distinct and persistent version identifiers (20090715 for the current set). Each of the links below will directly download a fragment of the given corpus. For instance, ...
-
Offsite
—
Transit schedule data published by SFMTA in GTFS format for developers to use.
Category: transportation
Format: CSV
Data dictionary: http://code.google.com/transit/spec/transit_feed_specification.html#Google_Transit_Feed_Field_Definitions
Frequency: Signup cycle
Time period: 6-13-09 to prior day
Agency name: SFMTA
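Since the feed is distributed as CSV tables, a short sketch may help show how one table could be read. A GTFS feed is normally a zip archive of text files such as stops.txt; the helper names `read_gtfs_table` and `make_sample_feed` and the two stop rows below are made up for illustration and are not taken from the SFMTA feed. With a real feed, the path to the downloaded zip file would presumably replace the in-memory sample.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed, table_name):
    """Return the rows of one GTFS table (e.g. 'stops.txt') as a list of dicts."""
    with zipfile.ZipFile(feed) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig drops a byte-order mark if the file starts with one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def make_sample_feed():
    """Build a tiny in-memory feed; the stop rows are invented sample data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Example St & 1st Ave,37.7800,-122.4100\n"
            "S2,Example St & 2nd Ave,37.7810,-122.4120\n",
        )
    buffer.seek(0)
    return buffer


# Hypothetical usage: swap make_sample_feed() for the path to a real feed zip.
stops = read_gtfs_table(make_sample_feed(), "stops.txt")
for stop in stops:
    print(stop["stop_id"], stop["stop_name"],
          float(stop["stop_lat"]), float(stop["stop_lon"]))
```

The same helper should work for other tables in a feed, such as routes.txt or stop_times.txt, because each one is a CSV file with a header row.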