

  • Biomedical Text corpora and related data collection resources.

  • Computer hacker wordlists from

  • PAIDA - Pure Python scientific analysis package

  • Wikipedia:Lists of common misspellings/For machines - Wikipedia, the free encyclopedia

  • Temperature data (HadCRUT3 and CRUTEM3)

  • Word List - 250,000+ Hyphenated, Capitalized and Compound English words

    Free Download — A common word list with over 250,000 entries of hyphenated, capitalized and compound English words. The download consists of entries containing more than one word, as well as capitalized words and acronyms. Phrases are considered “common” if they or variations of them occur in a standard dictionary or thesaurus. This word list is available in a simple, ...
  • EMDAT - The International Emergency Disasters Database

    Offsite — Description From front page: > Since 1988 the WHO Collaborating Centre for Research on the Epidemiology of Disasters (CRED) has been maintaining an Emergency Events Database EM-DAT. EM-DAT was created with the initial support of the WHO and the Belgian Government. > > The main objective of the database is to serve the purposes of humanitarian action at national and ...
  • Ensembl Genome Browser

    Offsite — About From website: > Ensembl is a joint project between EMBL – EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes. Ensembl is primarily funded by the Wellcome Trust. > This site provides free access to all the data and software from the Ensembl project. Click on a species name ...
  • The New York Times Annotated Corpus

    Offsite — From the website: The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007, with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff ...
  • True Marble Imagery

    Offsite — About From website: > Unearthed Outdoors is proud to present our line of full color global imagery. Our True Marble™ imagery is some of the most realistic medium resolution imagery available on the market. At 15 meter resolution, this true color imagery can be used for: GIS & web mapping applications High Definition Television (HDTV) effects (weather, movies, ...
  • Meta Package for Network Related Datasets

    Offsite — Description This is a meta-package: i.e. a listing of other packages and/or material to add to CKAN. tag: ckan toadd network Material to Process <> Canada Geospatial Data Infrastructure Roads inventory content – all in GML. Reference info here: <> Listings: ...
  • The collaborative, 3D encyclopedia of proteins and other molecules

    Offsite — Description From the email excerpted on Peter Murray-Rust’s blog: > Hi Dr. Murray-Rust, > > I’m a student in Joel Sussman’s lab at the Weizmann Institute of > Science. Joel, Jaime Prilusky and I have developed Proteopedia, a > new online tool/database with the overall goal of making structural > biology clearer for ...
  • Python Cheese Shop : Shakespeare 0.4

  • i2b2: Informatics for Integrating Biology & the Bedside

  • Word Lists Collection

    Offsite — The data is a smorgasbord of word lists, including spell check oriented word lists, an inflection database, parts of speech word list, jargon file word lists, the contents from Ispell, spell check dictionaries, tables that convert between American, British and Canadian spellings, and links to several other word lists.
  • Central Nervous System (CNS) Compound Library from OTAVA

    Free Download — There are many different drug targets expressed in brain tissue. The delivery of drugs to the central nervous system (CNS) is complicated by the blood-brain barrier (BBB), which controls the entry of drugs into the CNS. The BBB is one of the most important factors limiting the development of drugs that specifically target brain disorders. Nowadays the BBB remains a bottleneck in brain drug ...
  • Wikipedia³ - Conversion of Wikipedia into RDF

    Offsite — Wikipedia³ is a conversion of the English Wikipedia into RDF. It’s a monthly updated dataset containing around 47 million triples. The creation of the dataset is motivated by several factors, one being the desire to have more real-world RDF datasets of reasonable size. Wikipedia assembles a wealth of information created and maintained by people all over the globe – ...
  • Cloudiness, Wind Speed, Heating/Cooling Days, and Relative Humidity for Select Cities - 1971-2000

    Free Download — All information is airport data, except as noted. The data is from a period of record through 2005, except heating and cooling normals, which cover the period 1971-2000. Temperatures are in degrees Fahrenheit. The source is the U.S. National Oceanic and ...
  • Word List - 1000 Most Frequent Words from an Internet Corpus

    Free Download — This file consists of the 1,000 most frequently used English words as used on the Internet computer network in 1992.
  • Hydrofracking - Bradford PA Hydraulic Fracturing Fluid Product Disclosure

    Free Download — Hydrofracking – Bradford PA Hydraulic Fracturing Fluid Product Disclosure – ATGAS 2H CHESAPEAKE APPALACHIA LL of the Bradford PA Hydrofracking well blowout from: Daniel Spadoni | Community Relations Coordinator Department of Environmental Protection 208 West Third Street, Suite 101, Williamsport, PA 17701 Phone: (570) 327-3659 | Fax: (570) 327-3565 ...