Tag

language

Showing 41 - 60 out of 64 datasets
  • Language

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Language.
  • Word List - List of Acronyms

    Free Download — 6,213 common acronyms and abbreviations (acronyms.txt).
  • Word List - 350,000+ Words

    Free Download — Over 354,000 single words, excluding proper names, acronyms, or compound words and phrases. This list does not exclude archaic words or significant variant spellings.
  • Word List - Official Scrabble (TM) Player's Dictionary (OSPD) 2nd ed

    Free Download — 4,160 official crosswords delta (crswd-d.txt). When combined with the 113,809-word crosswords file, it produces the official crossword list compatible with the second edition of the Official Scrabble Players Dictionary. (Scrabble is a registered trademark of Milton-Bradley licensed to Merriam-Webster.)
  • Word List - 21,000+ Common Given Names (US & Great Britain)

    Free Download — 21,986 names (names.txt). This database contains the most common names used in the United States and Great Britain. Spelling checkers may want to supplement their basic word list with this one.
  • Word List - 4,900+ Common Female Given Names (English-speaking Countries)

    Free Download — 4,946 female names (names-f.txt). Frequent given names of females in English-speaking countries. Spelling checkers may want to supplement their basic word list with this one.
  • Word List - 3,800+ Common Male Given Names (English-speaking Countries)

    Free Download — 3,800 male names. Frequent given names of males in English-speaking countries. Spelling checkers may want to supplement their basic word list with this one.
  • Word List - Commonly Misspelled English Words

    Free Download — 366 often misspelled words (oftenmis.txt). Many of the most commonly misspelled words in English-speaking countries.
  • 2005-2007 American Community Survey Three-Year PUMS Population File

    Offsite — National survey that collects data from a sample of the resident population in the United States. Housing units in every county in the United States and municipio in Puerto Rico, including institutional and non-institutional group quarters, are included in the sample. Additional facts from data.gov (Dataset Summary): Date Released: 16-Jan-09; Date Updated: 1-Apr-09; Time ...
  • U.S. Census Bureau - 1990 Names

    Offsite — Frequently occurring first names and surnames from the 1990 Census.
  • Big Huge Thesaurus API

    Offsite — Query a database of 145,000 English language words for synonyms. Returns data in JSON, XML, serialized PHP array or plain text formats. Based on data from the Princeton University WordNet database and the Carnegie Mellon Pronouncing Dictionary. By John Watson.
  • Capitol Words API

    Offsite — The Capitol Words API provides several methods of accessing detailed information from the Capitol Words database of word frequency from the U.S. Congressional Record. Returns results in JSON and XML.
  • New York City Baby Name Data

    Offsite — Two CSV files available for download: all New York City baby names dating back to 1920, and New York City baby names broken down by ethnicity, dating back to 1990. Data supplied by the New York City Department of Health and Mental Hygiene, compiled by Jennifer 8. Lee for the New York Times City Room Blog.
  • Dolores Labs' Color Name Dataset

    Offsite — 10,000 color/label pairs, based on data collected through Amazon’s Mechanical Turk crowdsourcing marketplace. By Brendan O’Connor.
  • Given Name Frequency Project

    Offsite — Four datasets on given name popularity.
  • OpenCyc API

    Offsite — Programmatic access to the open source version of the Cyc Knowledge Base, the world’s largest and most complete general knowledge base and commonsense reasoning engine. The Cyc Knowledge Base is a formalized representation of a vast quantity of fundamental human knowledge: facts, rules of thumb, and heuristics for reasoning about the objects and events of everyday ...
  • UMass Amherst Linguistics Sentiment Corpora

    Offsite — N-gram counts extracted from over 700,000 online product reviews in Chinese, English, German and Japanese. Formatted to be read as R data frames. By Noah Constant, Christopher Davis, Christopher Potts and Florian Schwarz. Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.
  • We Feel Fine API

    Offsite — We Feel Fine is a data collection engine that scours the internet every ten minutes, harvesting and identifying expressions of human feelings from a large number of blogs. 15,000 to 20,000 feelings are identified and saved per day. You can use the We Feel Fine API to access this data. Optional parameters include feeling, gender, weather conditions and country. By Sep ...
  • Welsh in Higher Education Institutions

    Offsite — Provides information about students enrolled at Welsh Higher Education Institutions (HEIs) who are taught through the medium of Welsh, and about staff who teach through the medium of Welsh. Source agency: Welsh Assembly Government; Designation: National Statistics; Language: English; Alternative title: Welsh in Higher Education Institutions 2007/08 2006/07; Import source: ...
  • Modern Foreign Languages in Schools in Wales

    Offsite — Statistical bulletin examining the available data on examination entry and performance in modern foreign languages GCSE (not including short course GCSE) and A levels. Source agency: Welsh Assembly Government; Designation: National Statistics; Language: English; Alternative title: Modern Foreign Languages in Schools in Wales; Import source: ONS-ons_hub_2008_08.xml; External ...
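
Several of the word-list entries above are meant to be combined: the crosswords delta is merged with the larger crosswords file, and the name lists are suggested as supplements to a spelling checker's basic word list. A minimal Python sketch of such a merge follows. It assumes each file holds one word per line and that case can be folded, neither of which the listing confirms. The function name merge_word_lists and the sample words are hypothetical.

```python
from itertools import chain
from typing import Iterable, List


def merge_word_lists(base: Iterable[str], delta: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated union of two word lists.

    Assumes one word per line; blank lines are skipped and case is
    folded to lowercase (an assumption, not something the listing states).
    """
    words = set()
    for line in chain(base, delta):
        word = line.strip().lower()
        if word:
            words.add(word)
    return sorted(words)


if __name__ == "__main__":
    # Hypothetical in-memory stand-ins for a base list and a delta list.
    # With real files, pass open file objects instead, for example
    # open("crswd-d.txt") for the delta file named in the listing.
    base_lines = ["aa\n", "aah\n", "aal\n"]
    delta_lines = ["aah\n", "zax\n", "\n"]
    print(merge_word_lists(base_lines, delta_lines))
```

Using a set keeps the merge insensitive to duplicates between the two inputs, which matters when a supplement overlaps the basic list.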