Tag: news

Showing 1-20 of 30 datasets
  • Digg.com Data Set

    Free Download — Digg is a social news website. The dataset covers August to November 2008, when Digg’s core function was still letting people vote stories up (“digging”) or down (“burying”). The social graph in the dataset contains about 56,000 user-user links among about 10,000 users. This dataset is useful for studying ...
  • Reuters Spotlight - Article and Media API

    Offsite — The Reuters Spotlight service provides Reuters.com content in the form of multimedia articles, pictures, videos and text news through a set of standards-based consumer XML APIs. The Spotlight service also offers an option to receive the content automatically annotated with rich semantic metadata.
  • Akvopedia

    Offsite — From the [about page](http://www.akvo.org/wiki/index.php/Akvopedia:About): “Akvopedia is a Web-based, free content project about water and sanitation subjects. The name Akvopedia is a portmanteau (combination of words and their meanings) of the words akvo (water in the language Esperanto) and encyclopedia. Akvopedia articles provide links to guide the user to ...”
  • Al Jazeera Creative Commons Repository

    Offsite — “Select Al Jazeera video footage – at this time footage of the War on Gaza – is available for free to be downloaded, shared, remixed, subtitled and eventually rebroadcasted by users and TV stations across the world with acknowledgement to Al Jazeera.” Access and re-use: available for download via the website. War on Gaza footage is licensed under Creative Commons ...
  • reddit.com: Ask Reddit: Where to download a DB dump of Reddit?

    Offsite
  • History Commons

    Offsite — From the [about page](http://www.historycommons.org/aboutsite.jsp): “What is the History Commons website? The History Commons website is run by the Center for Grassroots Oversight (‘CGO’), an organization that is fiscally sponsored by The Global Center, a 501(c)(3) non-profit organization. CGO was incorporated as a public benefit corporation in late 2006, and is ...”
  • The New York Times Annotated Corpus

    Offsite — From [website](http://ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19): The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff ...
  • Primary News Sources of Adults: 2005

    Free Download — The Statistical Abstract files are distributed by the US Census Bureau as Microsoft Excel files. These files mix data with notes and references, place multiple tables on one sheet, and, worst of all, have table headers that are not easily matched to their rows and columns. A few files had extraneous characters in the title; these were corrected for consistency. A few files ...
  • UK National Health Service Choices web services

    Offsite — The NHS web services support business-to-business syndication of content over the internet. NHS Choices has created a set of web services to allow approved partners to interact with the service, free of charge. The web services return NHS Choices content in a form that can be easily integrated into a website or application. NHS intends to make most of the data and ...
  • 20 Newsgroups Dataset (De-Duped Version)

    Free Download — The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It is believed to have been collected originally by Ken Lang, probably for his paper “Newsweeder: Learning to filter netnews”, though the paper does not explicitly mention this collection. The 20 Newsgroups collection has become a ...
  • Westbury Lab Usenet Corpus: 28M postings from 47000+ newsgroups 2005-2009

    Offsite — A USENET corpus (2005-2009): a collection of public USENET postings, collected between Oct 2005 and Jan 2010, covering 47,860 English-language, non-binary-file newsgroups. Despite our best efforts, this corpus includes a very small number of non-English words, non-words, and spelling errors. The corpus is untagged, raw text. It may be ...
  • Wikinews

    Offsite — “We are a group of volunteers whose mission is to present reliable, unbiased, relevant and entertaining News. All content is released under a free license. By making our content perpetually available for free redistribution and use, we hope to contribute to a global digital commons.”
  • What public data is already available?

    Offsite
  • Office of Advocacy's News Update File

    Offsite — An XML news update file that informs the public about recent regulatory alerts, Advocacy small business statistics reports, Advocacy small business research reports, and Advocacy regulatory comment letters. Additional facts from data.gov: Date Released: 2005; Date Updated: Weekly (5/11/09); Time Period: 3 months from posted date; Data.gov Data Category ...
  • The New York Times Congress API

    Offsite — Get biographical information on Congresspeople dating back to 1947 and voting records dating back to 1989 in JSON and XML. Based on information from THOMAS, senate.gov, and house.gov. Read the announcement on Open for more information. See also the New York Times Congress API Ruby Wrapper with Congresh Shell.
  • New York Times Congress API Ruby Wrapper with Congresh Shell

    Offsite — An easy Ruby wrapper for the New York Times Congress API. It also provides a command shell called Congresh for interacting with the API directly. Available for download under the MIT License. Submitted by: Patrick Ewing
  • The New York Times Article Search API

    Offsite — The NYT Article Search API provides searchable access to nearly three million New York Times articles from 1981 to the present day. Results returned in JSON.
  • The New York Times Campaign Finance API

    Offsite — Retrieve political campaign contribution and expenditure data based on United States Federal Election Commission filings. Data available in JSON, XML or serialized PHP. Registration required. See the announcement on Open for more information.
  • outside.in API

    Offsite — The outside.in API provides news articles, blog posts, tweets and more within 1,000 feet of any latitude and longitude in the United States in XML or JSON format. Licensed under a simple Terms of Service.
  • TimesPeople API

    Offsite — With the TimesPeople API, you can retrieve user data for nytimes.com, including the user profiles, activities, news feeds, and networks. Returns data in JSON or XML. Read the announcement on Open for more information.