-
Offsite
—
The data is a smorgasbord of word lists, including spell-check-oriented word lists, an inflection database, a parts-of-speech word list, jargon file word lists, the contents from Ispell, spell-check dictionaries, tables that convert between American, British, and Canadian spellings, and links to several other word lists.
-
Free Download
—
This file consists of the 1,000 most frequently used English words as used on the Internet computer network in 1992.
-
Free Download
—
This file consists of the 1,000 most frequently used English words from a wide variety of common texts, listed in decreasing order of frequency.
-
Offsite
—
The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC’s host institution. The LDC was founded in 1992 with a grant from the Advanced ...
-
Free Download
—
The song data download available on this page consists of the additional files (SQLite databases, text files, etc.) that will help you understand each of the other data sets within the Million Songs collection. The Million Songs collection includes audio metadata and features, and can be found, fully-cataloged, on the Infochimps site. The Million Song Data Set is a ...
-
Free Download
—
The song data download available on this page is the Letter H subset from the Million Songs collection, including audio metadata and features which can be found, fully-cataloged, on the Infochimps site. The Million Song Data Set is a freely-available collection of audio metadata and features for a million contemporary popular music tracks. What to do with a vast library ...
-
Free Download
—
The song data download available on this page is the Letter Y subset from the Million Songs collection, including audio metadata and features which can be found, fully-cataloged, on the Infochimps site. The Million Song Data Set is a freely-available collection of audio metadata and features for a million contemporary popular music tracks. What to do with a vast library ...
-
Free Download
—
The song data download available on this page is the Letter E subset from the Million Songs collection, including audio metadata and features which can be found, fully-cataloged, on the Infochimps site. The Million Song Data Set is a freely-available collection of audio metadata and features for a million contemporary popular music tracks. What to do with a vast library ...
-
Offsite
—
About: From the website: > XDXF is a project to unite all existing open dictionaries and provide both users and developers with universal XML-based format, convertible to and from other popular dictionary formats. There are currently 308 dictionary files in various languages. Format: It appears dictionary files are in XML. Access/Re-use: The [SourceForge XDXF ...
-
Free Download
—
1,185 King James Version frequent substrings (KJVfreq.txt): the 1,185 most frequently occurring substrings in the King James Version Bible, ranked by frequency and listed with their counts.
-
Free Download
—
This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Australian Dictionary Of Biography.
-
Free Download
—
This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Dictionary Of Australian Biography.
-
Free Download
—
366 often misspelled words (oftenmis.txt): many of the most commonly misspelled words in English-speaking countries.