
The Comprehensive Knowledge Archive Network (CKAN) Collection


From their website:

CKAN is the Comprehensive Knowledge Archive Network, a registry of open knowledge packages and projects (and a few closed ones)… Those familiar with freshmeat, CPAN or PyPI can think of CKAN as providing an analogous service for open knowledge… CKAN is developed and maintained by the Open Knowledge Foundation. Both the CKAN code and data are open: free for anyone to use and reuse. To find out more, check out the CKAN project at knowledgeforge.net.

CKAN is a peer in the global data commons and Infochimps is proud to be able to mirror their collection of over 300 datasets.

  • Apertium

    Offsite — Description “Apertium is a toolbox to build open-source shallow-transfer machine translation systems, especially suitable for related language pairs: it includes the engine, maintenance tools, and open linguistic data for several language pairs.” Language-pair data includes: Spanish–Catalan (apertium-es-ca) Spanish–Portuguese (apertium-es-pt) Spanish–Galician ...
  • Vinismo

    Offsite — Vinismo is a project to create a free, complete, up-to-date, and reliable wine guide. It is built in collaboration by wine lovers around the globe, and seeks to document every wine, every winery, every wine region, grape, and wine issue in the world. All content is under an Open Knowledge license (CC-by-sa-2.5-Canada), and available for bulk download.
  • UN Data

    Offsite — Description “The United Nations Statistics Division (UNSD) of the Department of Economic and Social Affairs (DESA) launched a new internet based data service for the global user community. It brings UN statistical databases within easy reach of users through a single entry point (http://data.un.org/). Users can now search and download a variety of statistical ...
  • Akvopedia

    Offsite — About From the [about page](http://www.akvo.org/wiki/index.php/Akvopedia:About): > Akvopedia is a Web-based, free content project about water and sanitation subjects. The name Akvopedia is a portmanteau (combination of words and their meanings) of the words akvo (water in the language Esperanto) and encyclopedia. Akvopedia articles provide links to guide the user to ...
  • OpenGuides (tm): The Guides Made by You

    Offsite — Description From front page: > OpenGuides™ is a network of free, community-maintained wiki guidebooks to places around the world. Anyone is free to contribute, whether it’s by writing new articles or editing the articles that we already have.
  • Audioscrobbler Data

    Offsite — Description “Much of the data available to view on Last.fm is available in several formats through the Audioscrobbler Web Services API.” Format Data variously available in Plain, XML, XSPF, iCal and RSS. License “All web services here are for non-commercial use only under the Creative Commons Attribution-NonCommercial-ShareAlike License. If you want to use these ...
  • Crocodyl: Collaborative Research on Corporations

    Offsite — Description From home page: > Crocodyl is a collaboration sponsored by CorpWatch, the Center for Corporate Policy and the Corporate Research Project. Our aim is to stimulate collaborative research among NGOs, journalists, activists, whistleblowers and academics from both the global South and North in order to develop publicly-available profiles of the world’s most ...
  • Irish Minstrels and Musicians

    Offsite — This classic text is full of lore about pipers and pipemaking, as well as many other aspects of Irish Traditional Music both in Ireland and in the American diaspora. The author, Captain Francis O’Neill, was also the author of the book known commonly as “O’Neill’s”, and probably did more than any other single person in the first half of the 20th century to preserve Irish ...
  • Collaborative publishing house

    Offsite — A place where you could publish your work, request a free book, or be assigned to a project, where a distribution of valuable information could be found. Help us develop the project
  • The DGT Multilingual Translation Memory of the Acquis Communautaire

    Offsite — As of November 2007, the European Commission’s Directorate-General for Translation (DGT) made publicly accessible its multilingual Translation Memory for the Acquis Communautaire (the body of EU law) – a collection of parallel texts (texts and their translation, also referred to as bi-texts) in 22 languages. This is a page for technical users, where you will find a ...
  • Libre Map Project - Free Maps and GIS Data

    Offsite — Description Mainly seems to be an aggregator of data from elsewhere. From front page: > The purpose of the Libre Map Project is to aggregate and make digital maps and related GIS data available for Free. > > Data – search and download Maps and other GIS data > > Documentation – information on Cartography and GIS related topics brought to you by Wikipedia. > > ...
  • Research Papers in Economics

    Offsite — Openness: NOT OPEN license: Restricts commercial use potentially. See <http://repec.org/docs/RePEcDataUse.html>. details: “You do not charge for it or include it in a service or product that is not free of charge.”
  • Numbrary

    Offsite — Description Not a producer of data but focused on extracting and aggregating data from other sources. Openness: OPEN License: no explicit license used but all underlying data from US government so PD. Access: ok. www: yes. bulk: no. api: no.
  • Barcodepedia

    Offsite — About “Barcodepedia.com is a community based online barcode database. The database is completely free to use, and everyone is welcome to contribute.” License “All data inserted into the database by users are released under a Creative Commons Attribution-ShareAlike 2.5 License. Sadly we do not currently expose this data via an API or as downloads but it is a top ...
  • CEPR Data

    Offsite — Description From the front page: > ceprDATA.org provides consistent, user-friendly versions of the Survey of Income and Program Participation (SIPP), Current Population Survey (CPS), and other datasets used at CEPR available to all interested policy researchers and academics. > > Each dataset listed above is available to download. In addition, you can download and ...
  • PubChem: Information on Biological Activities of Small Molecules

    Offsite — “PubChem provides information on the biological activities of small molecules.” For license information, see: <http://www.ncbi.nlm.nih.gov/About/disclaimer.html>
  • EMDAT - The International Emergency Disasters Database

    Offsite — Description From front page: > Since 1988 the WHO Collaborating Centre for Research on the Epidemiology of Disasters (CRED) has been maintaining an Emergency Events Database EM-DAT. EM-DAT was created with the initial support of the WHO and the Belgian Government. > > The main objective of the database is to serve the purposes of humanitarian action at national and ...
  • ILO Labor Statistics Database

    Offsite — Description Main topic areas: Total and Economically Active Population Yearly statistics Economically Active Population Estimates and Projections 1980-2020 (EAPEP) Employment Yearly statistics Periodical statistics Employment for detailed occupational groups by sex (Segregat) ILO-Comparable Estimates – Adjusted annual average employment and unemployment ...
  • Bibsonomy - A blue social bookmark and publication sharing system

    Offsite — Description From front page: > BibSonomy is a system for sharing bookmarks and lists of literature. When discovering a bookmark or a publication on the web, you can store it on our server. You can add tags to your entry to retrieve it more easily. This is very similar to the bookmarks/favorites that you store within your browser. The advantage of BibSonomy is that ...
  • Flossmetrics - Free Libre and Open Source Software Metrics

    Offsite — Description From front page: > The main objective of FLOSSMETRICS is to construct, publish and analyse a large scale database with information and metrics about libre software development coming from several thousands of software projects, using existing methodologies, and tools already developed. The project will also provide a public platform for validation and ...
  • Crystal Eye: Aggregated Crystallographic Data

    Offsite — Description The aim of the CrystalEye project is to aggregate crystallography from web resources, and to provide methods to easily browse, search, and to keep up to date with the latest published information. Openness: OPEN License: not specified (but the site has an open data logo, and the authors' clear intention is for the data to be open). Access: ok. bulk: no. www: yes. ...
  • The Arabidopsis Information Resource

    Offsite — Description > The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. Data available from TAIR includes the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical ...
  • Ensembl Genome Browser

    Offsite — About From website: > Ensembl is a joint project between EMBL – EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes. Ensembl is primarily funded by the Wellcome Trust. > This site provides free access to all the data and software from the Ensembl project. Click on a species name ...
  • Economagic Economic Time Series

    Offsite — Description A large collection of USA time series data taken from a variety of sources, primarily state and central government in the USA. From the about page <http://www.economagic.com/about/>: > This page is meant to be a comprehensive site of free, easily available economic time series data useful for economic research, in particular economic forecasting. This ...
  • GeoNames

    Offsite — The geonames.org geographical database is available for download free of charge under a creative commons attribution license. It contains over eight million geographical names and consists of 6.3 million unique features whereof 2.2 million populated places and 1.8 million alternate names. All features are categorized into one out of nine feature classes and further ...
  • ARONIS

    Offsite — Description The ARONIS database “includes over 200000 organic compounds”. Openness No information is provided about re-using the data provided on the site.
  • Australian Bureau of Statistics (ABS)

    Offsite — About From website: > The Australian Bureau of Statistics can help you by providing statistical solutions to make informed decisions. The ABS provides statistics on a wide range of economic and social matters, serving government, business and the general population. Statistics include: a wide range of free statistics on Australia’s economy, environment, industry and ...
  • Pew Research Center For The People & The Press

    Offsite — About From website: > Welcome to the Pew Research Center For The People & The Press data archive. This page contains links to the Center’s survey data which are currently available on the web. Survey data are released six months after the reports are issued and are posted on the web as quickly as possible. Openness/re-use Not open. Clickwrap agreement for each ...
  • Dict.cc - English German Dictionary

    Offsite — About From [about page](http://www.dict.cc/?s=about%3A): > dict.cc is not only an online dictionary. It’s an attempt to create a platform where users from all over the world can share their knowledge in the field of translations. Every visitor can suggest new translations and correct or confirm other users’ suggestions. The challenging and most important part of the ...
  • AnthroKids

    Offsite — About > This Web points to the results of two studies which collected anthropometric data of children. The report of the first study, performed in 1975, exists here as a scanned document with an HTML “front end” and as data in several formats. The second study, performed in 1977, exists here as only data, however users can access the data via a more graphically ...
  • National Institute of Standards and Technology (NIST) Data Gateway

    Offsite — About The NIST Data Gateway provides easy access to NIST scientific and technical data. These data cover a broad range of substances and properties from many different scientific disciplines. Openness Much of the material appears to be in the public domain as it is produced by the US Federal Government, but it varies from dataset to dataset.
  • Ordnance Survey Boundary Line

    Offsite — About > Boundary-Line is a specialist 1:10 000 scale boundaries dataset. It contains all levels of electoral and administrative boundaries, from district, wards and civil parishes (or communities) up to parliamentary, assembly and European constituencies. Access/re-use [Available to buy](http://www.ordnancesurvey.co.uk/oswebsite/products/boundaryline/pricing.html) ...
  • MetWare

    Offsite — About > The MetWare project is a collaboration between – but not limited to – several metabolomics groups in the Netherlands and Germany. Our goal is to develop distributed databases and analysis tools for metabolomics research. License Egon Willighagen, one of the project contributors, says on [his ...
  • Commonwealth Legal Information Institute (CommonLII) - English Reports

    Offsite — About From [website](http://www.commonlii.org/int/cases/EngR/): > This database contains the English Reports (1220-1873) and is based on data provided by Justis. Re-use/openness Allows limited re-distribution for noncommercial purposes – hence not compliant with the [Open Knowledge Definition](http://opendefinition.org/). From [copyright ...
  • Open Wetware

    Offsite — About > OpenWetWare is an effort to promote the sharing of information, know-how, and wisdom among researchers and groups who are working in biology & biological engineering. OWW provides a place for labs, individuals, and groups to organize their own information and collaborate with others easily and efficiently. In the process, we hope that OWW will not only lead ...
  • Amazon Web Services - Public Data Sets

    Offsite — About From [website](http://aws.amazon.com/publicdatasets/): > Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their ...
  • ISO language, territory, currency codes and their translations

    Offsite — Description This is a set of ISO codes including those for country and currency collected together into a useful package by the Debian project. From the package page: > This package provides the ISO-639 Language code list, the ISO-4217 currency list, the ISO-3166 Territory code list, and ISO-3166-2 sub-territory lists. > > It also (more importantly) provides their ...
  • NASA - Multimedia

    Offsite — About Images, video and audio from NASA. Re-use/openness All material is in the public domain. See [copyright statement](http://www.nasa.gov/centers/goddard/multimedia/gtv_copyright.html) and [photo guidelines](http://www.nasa.gov/audience/formedia/features/MP_Photo_Guidelines.html) for further details.
  • United States Department of Agriculture (USDA) Agricultural Research Service (ARS)

    Offsite — About From [About page](http://www.ars.usda.gov/AboutUs/AboutUs.htm): > The Agricultural Research Service (ARS) is the U.S. Department of Agriculture’s chief scientific research agency. Our job is finding solutions to agricultural problems that affect Americans every day, from field to table. From [Datasets page](http://www.ars.usda.gov/services/docs.htm?docid=1328): ...
  • US Copyright Renewal Database

    Offsite — Released in 2008 and funded by Hewlett Foundation. From front page: > … This database makes searchable the copyright renewal records received by the US Copyright Office between 1950 and 1992 for books published in the US between 1923 and 1963. Note that the database includes ONLY US Class A (book) renewals. > > The period from 1923-1963 is of special interest for US ...
  • VoxForge

    Offsite — About > VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac). > We will make available all submitted audio files under the GPL license, and then ‘compile’ them into acoustic models for use with Open Source speech recognition engines such as Sphinx, ISIP, Julius and HTK (note: HTK ...
  • Human Metabolome Database

    Offsite — About > The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small molecule metabolites found in the human body. It is intended to be used for applications in metabolomics, clinical chemistry, biomarker discovery and general education. The database is designed to contain or link three kinds of data: 1) ...
  • Enron Email Dataset

    Offsite — From the CALO Project at Carnegie-Mellon University, a massive dataset of emails recovered from discovery documents in the Enron trials. About This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a ...
  • Shuttle Radar Topography Mission Elevation Data

    Offsite — Description From <http://www2.jpl.nasa.gov/srtm/>: > … The Shuttle Radar Topography Mission (SRTM) obtained elevation data on a near-global scale to generate the most complete high-resolution digital topographic database of Earth. > > … > SRTM is an international project spearheaded by the National Geospatial-Intelligence Agency (NGA) and the National Aeronautics ...
  • MOCHA-TIMIT

    Offsite — About Authors: Alan Wrench, Queen Margaret University College. Funded by: Engineering and Physical Sciences Research Council. When created: November 1999. Purpose: Phonetically balanced dataset for training an automatic speech recognition system Openness Availability: English speakers available here free for non-commercial use and may be distributed on CDROM for a ...
  • TalkBank

    Offsite — About About TalkBank: > The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of the subfields studying communication. It will use these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary materials via ...
  • The IBL Corpus

    Offsite — About > The IBL Corpus was collected by the University of Plymouth and the University of Edinburgh as part of the EPSRC funded project IBL, Instruction-based Learning for Mobile Robots (GR/M90023, GR/M90160). IBL focused on the problem of how natural language instructions can be used by an intelligent embodied agent to build a hierarchy of complex functions based on a ...
  • eXtended WordNet

    Offsite — About From website: > WordNet is a lexical database for English that has been widely adopted in artificial intelligence and computational linguistics for a variety of practical applications. Since WordNet was designed as a lexical database, it exhibits some limitations when used for knowledge processing applications. Often one needs to retrieve words that are ...
  • FLOSS Manuals

    Offsite — About > FLOSS Manuals is a collection of manuals that explain how to install and use a range of free and open source software. The manuals are friendly and simple, and they are intended to encourage people to explore the wide range of free, open source alternatives to expensive and restrictively licensed software. At FLOSS Manuals you can find manuals for free and ...
  • The New York Times Annotated Corpus

    Offsite — From [website](http://ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19): The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff ...