-
Free Download
—
This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Language.
-
Free Download
—
6,213 acronyms (acronyms.txt): common acronyms and abbreviations.
-
Free Download
—
Over 354,000 single words, excluding proper names, acronyms, or compound words and phrases. This list does not exclude archaic words or significant variant spellings.
-
Free Download
—
4,160 official crosswords delta (crswd-d.txt). When combined with the 113,809 crosswords file, it produces the official crossword list compatible with the second edition of the Official Scrabble Players Dictionary. (Scrabble is a registered trademark of Milton-Bradley, licensed to Merriam-Webster.)
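Combining the delta file with the base crosswords file amounts to a set union. The following is a minimal sketch, assuming both files hold one word per line and that the delta contains additions only; neither assumption is stated in the listing. The file names `base.txt` and `delta.txt` and the sample words are hypothetical stand-ins, not the real lists.

```python
from pathlib import Path
import tempfile


def load_words(path):
    """Read one word per line, skipping blank lines and lowercasing each word."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def combine_word_lists(base_path, delta_path):
    """Return the sorted union of a base word list and a delta word list."""
    return sorted(load_words(base_path) | load_words(delta_path))


if __name__ == "__main__":
    # Tiny stand-in files (hypothetical names and contents).
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "base.txt"
        delta = Path(tmp) / "delta.txt"
        base.write_text("aa\nab\nad\n", encoding="utf-8")
        delta.write_text("ab\nqi\n", encoding="utf-8")
        print(combine_word_lists(base, delta))  # ['aa', 'ab', 'ad', 'qi']
```

Using a set means a word present in both files appears only once in the combined list.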
-
Free Download
—
21,986 names (names.txt). This database contains the most common names used in the United States and Great Britain. Spelling checkers may want to supplement their basic word list with this one.
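As an illustration of supplementing a spelling checker's word list with names, here is a minimal sketch. It assumes a checker that simply flags any token missing from its dictionary; real spelling checkers are more elaborate. The function names and the sample words are hypothetical.

```python
import re


def build_dictionary(base_words, supplemental_names=()):
    """Merge a base word list with an optional name list, ignoring case."""
    return {w.lower() for w in base_words} | {n.lower() for n in supplemental_names}


def unknown_words(text, dictionary):
    """Return the tokens in text that are not in the dictionary, in order."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if t.lower() not in dictionary]


if __name__ == "__main__":
    base = ["met", "in"]          # stand-in for a basic word list
    names = ["Alice", "Bob"]      # stand-in for a names file
    text = "Alice met Bob in Lndon"

    print(unknown_words(text, build_dictionary(base)))         # ['Alice', 'Bob', 'Lndon']
    print(unknown_words(text, build_dictionary(base, names)))  # ['Lndon']
```

With the name list merged in, only the genuine misspelling is still flagged.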
-
Free Download
—
4,946 female names (names-f.txt). Frequent given names of females in English-speaking countries. Spelling checkers may want to supplement their basic word list with this one.
-
Free Download
—
3,800 male names. Frequent given names of males in English-speaking countries. Spelling checkers may want to supplement their basic word list with this one.
-
Free Download
—
366 often misspelled words (oftenmis.txt): many of the most commonly misspelled words in English-speaking countries.
-
Offsite
—
National survey that collects data from a sample of the resident population in the United States. Housing units in every county in the United States and every municipio in Puerto Rico, including institutional and non-institutional group quarters, are included in the sample. Additional facts from data.gov: Dataset Summary. Date Released: 16-Jan-09. Date Updated: 1-Apr-09. Time ...
-
Offsite
—
Frequently occurring first names and surnames from the 1990 Census.
-
Offsite
—
Query a database of 145,000 English language words for synonyms. Returns data in JSON, XML, serialized PHP array or plain text formats. Based on data from the Princeton University WordNet database and the Carnegie Mellon Pronouncing Dictionary. By John Watson.
-
Offsite
—
The Capitol Words API provides several methods of accessing detailed information from the Capitol Words database of word frequency from the U.S. Congressional Record. Returns results in JSON and XML.
-
Offsite
—
Two CSV files available for download: all New York City baby names dating back to 1920, and New York City baby names broken down by ethnicity, dating back to 1990.
Data supplied by the New York City Department of Health and Mental Hygiene, compiled by Jennifer 8. Lee for the New York Times City Room Blog.
-
Offsite
—
10,000 color/label pairs, based on data collected through Amazon’s Mechanical Turk crowdsourcing marketplace. By Brendan O’Connor.
-
Offsite
—
Four datasets on given name popularity.
-
Offsite
—
Programmatic access to the open source version of the Cyc Knowledge Base, the world’s largest and most complete general knowledge base and commonsense reasoning engine. The Cyc Knowledge Base is a formalized representation of a vast quantity of fundamental human knowledge: facts, rules of thumb, and heuristics for reasoning about the objects and events of everyday ...
-
Offsite
—
N-gram counts extracted from over 700,000 online product reviews in Chinese, English, German and Japanese. Formatted to be read as R data frames. By Noah Constant, Christopher Davis, Christopher Potts and Florian Schwarz. Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.
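For readers unfamiliar with n-gram counts, the sketch below shows how such counts can be derived from review text. It is not the authors' extraction pipeline: it assumes whitespace tokenization, which would not suit Chinese or Japanese text, and the sample reviews are invented for illustration.

```python
from collections import Counter


def ngram_counts(texts, n):
    """Count n-grams (as tuples of lowercased tokens) across a list of texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Slide a window of length n over the tokens; short texts yield nothing.
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


if __name__ == "__main__":
    reviews = ["Good phone", "very good phone"]  # invented sample data
    bigrams = ngram_counts(reviews, 2)
    print(bigrams[("good", "phone")])  # 2
```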
-
Offsite
—
We Feel Fine is a data collection engine that scours the internet every ten minutes, harvesting and identifying expressions of human feelings from a large number of blogs. 15,000 to 20,000 feelings are identified and saved per day. You can use the We Feel Fine API to access this data. Optional parameters include feeling, gender, weather conditions and country. By Sep ...
-
Offsite
—
Provides information about students enrolled at Welsh Higher Education Institutions (HEIs) taught through the medium of Welsh and information on staff teaching through the medium of Welsh. Source agency: Welsh Assembly Government. Designation: National Statistics. Language: English. Alternative title: Welsh in Higher Education Institutions 2007/08 2006/07. Import source: ...
-
Offsite
—
Statistical bulletin examining the available data on examination entry and performance in modern foreign languages GCSE (not including short course GCSE) and A levels. Source agency: Welsh Assembly Government. Designation: National Statistics. Language: English. Alternative title: Modern Foreign Languages in Schools in Wales. Import source: ONS-ons_hub_2008_08.xml. External ...