User

Ganglion

Jacob Perkins

Uploaded datasets

Showing 1 - 20 out of 36 datasets
  • 11,000+ Youtube Videos

    Free Download — This dataset is useful for studying the dynamics of threaded comments in rich media sharing, as well as interesting participants in the conversations. This dataset involves a set of about 11,000 videos. Included is information about: tags, number of views, number of comments, ratings, textual content of the comments, and the authors and timestamps of the comments. Citation: ...
  • 20 Newsgroups Dataset (De-Duped Version)

    Free Download — The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It is speculated that it was originally collected by Ken Lang, probably for his paper "Newsweeder: Learning to filter netnews", though he does not explicitly mention this collection. The 20 Newsgroups collection has become a ...
  • 3.5 Million+ US Domestic Flights from 1990 to 2009

    Free Download — Description: Over 3.5 million monthly domestic flight records from 1990 to 2009. Data are arranged as an adjacency list with metadata. Ready for immediate database import and analysis. Fields (short name, type, description):

      Origin, String, three-letter airport code of the origin airport
      Destination, String, three-letter airport code of the destination airport
      ...
  • Airports and Their Locations

    Free Download — A list of over 9,000 global and domestic airport locations. Data includes airport code, geographic coordinates, other geo-related data, and data unique to airports like runway length and elevation. Airports and their surrounding areas are hubs of business activity; in many cases they are a tourist’s first glimpse of a city, and in other instances the epicenter of shipments. ...
  • Amsterdam Museum Data Set (RDF)

    Offsite — The Amsterdam Museum dataset describes more than 70,000 cultural heritage objects related to the city of Amsterdam described by the museum. The metadata was retrieved from an XML Web API of the museum’s Adlib collection database and converted to RDF compliant with the Europeana Data Model (EDM). This makes the Amsterdam Museum data the first of its kind to be officially ...
  • AOL Search Data

    Free Download — The AOL Search Data is a collection of real query log data that is based on real users. The data set consists of 20M web queries collected from 650k users over three months. These private searches are perfect for research and mining. The data is sorted by anonymous user ID and sequentially arranged. The collection can be used for personalization, query reformulation or ...
  • C. Elegans Neural Network, Flat Adjacency List

    Free Download — A directed, weighted network representing the neural network of C. Elegans. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998). Original experimental data taken from J. G. White, E. Southgate, J. N. Thompson, and S. Brenner, Phil. Trans. R. Soc. London 314, 1-340 (1986).
  • Color List

    Free Download — A comprehensive list of colors that are included in Wikipedia articles about color. This includes the color name, hex triplets, and RGB values. Source: http://en.wikipedia.org/wiki/List_of_colors
  • Condensed Matter Collaboration Network

    Free Download — Description: Data describes a collaboration network of scientists posting preprints on the condensed matter archive at www.arxiv.org. This version is based on preprints posted to the archive between January 1, 1995 and March 31, 2005. The network is weighted, with weights assigned as described in M. E. J. Newman, Phys. Rev. E 64, 016132 (2001). These data can be cited ...
  • Digg.com Data Set

    Free Download — Digg is a social news website. The dataset spans August to November 2008, when Digg’s cornerstone function still consisted of letting people vote stories up or down, called digging and burying, respectively. In the dataset, the social graph contains about 56,000 user-user links among about 10,000 users. This dataset is useful for studying ...
  • Disasters worldwide from 1900-2008

    Free Download — Disaster data from 1900 – 2008, organized by start and end date, country (and sub-location), disaster type (and sub-type), disaster name, cost, and persons killed and affected by the disaster. Create disaster data trend reporting, based on geography, frequency, date or nature of the event. Design a visualization or time lapse illustrating disaster events around the ...
  • Flickr Images

    Free Download — This Flickr data set contains over 2,000 downloaded images from 52 different groups. The information can be utilized for image content analysis in issues related to rich social media. Each image is indexed by its Flickr photo id and the corresponding group to which it belongs. Citation: Choudhury, M. D., Sundaram, H., Lin, Y-R., John, A., and Seligmann, D. D. (2009). ...
  • Flora of North America

    Offsite — FNA presents for the first time, in one published reference source, information on the names, taxonomic relationships, continent-wide distributions, and morphological characteristics of all plants native and naturalized found in North America north of Mexico. Source: http://www.fna.org/
  • Google Books Ngrams

    Offsite — Description: Here are the datasets backing the Google Books Ngram Viewer. These datasets were generated in July 2009; we will update these datasets as our book scanning continues, and the updated versions will have distinct and persistent version identifiers (20090715 for the current set). Each of the links will directly download a fragment of the given corpus. For ...
  • Hex color codes to RGB values and color names

    Free Download — A simple mapping from hex color codes to color names and RGB values. E.g.:

      color,hex,r,g,b
      Almond,#EFDECD,239,222,205
      Dodger blue,#1E90FF,30,144,255
      Meat brown,#E5B73B,229,183,59
      Scarlet,#FF2000,255,32,0
      Tiffany Blue,#0ABAB5,10,186,181
      Violet (color wheel),#7F00FF,127,0,255

    Source: http://en.wikipedia.org/wiki/List_of_colors
  • Jazz Musicians Network

    Free Download — Description: List of edges of a network of jazz musicians as a flat (.tsv) file. Data compiled by members of the Alex Arenas group (from Dept. of Computer Science and Mathematics, Universidad Rovira i Virgili). Please cite P. Gleiser and L. Danon, Adv. Complex Syst. 6, 565 (2003). Fields (short name, type, description): Source, Int, integer vertex label; Target, Int, ...
  • Marvel Universe Chronology Project

    Offsite — The MCP is an effort to catalog every actual appearance by every significant character in the Marvel Universe, and place them in their proper chronological order. If there’s a particular character that’s struck your fancy, that you just can’t live without owning their every appearance, this is the place to start. Simply click on the first character of their name (if it’s ...
  • Marvel Universe Social Graph

    Free Download — A fun Marvel Comics character collaboration graph constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. The Marvel Universe, that is, the artificial world that takes place in the universe of the Marvel comic books, is an example of a social collaboration network. They compare the characteristics of this universe to ...
  • Mushroom Data Set

    Offsite — This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no ...
  • National Geophysical Data - Nighttime Lights Time Series

    Offsite — Version 2 DMSP-OLS Nighttime Lights Time Series. The files are cloud-free composites made using all the available archived DMSP-OLS smooth resolution data for calendar years. In cases where two satellites were collecting data, two composites were produced. The products are 30 arc second grids, spanning -180 to 180 degrees longitude and -65 to 65 degrees latitude. A ...
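
The "Hex color codes to RGB values and color names" entry above pairs each hex triplet with its r, g, b columns. As a minimal sketch of how those columns relate, each pair of hex digits can be read as one base-16 channel value. The function name `hex_to_rgb` is hypothetical and not part of the dataset; the sample rows are the ones quoted in that entry.

```python
def hex_to_rgb(hex_code: str) -> tuple:
    """Convert a hex triplet such as '#EFDECD' to an (r, g, b) tuple of ints."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {hex_code!r}")
    # Each channel is two hex digits: red, then green, then blue.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


# Sample rows quoted in the dataset description: (name, hex, r, g, b).
SAMPLE_ROWS = [
    ("Almond", "#EFDECD", 239, 222, 205),
    ("Dodger blue", "#1E90FF", 30, 144, 255),
    ("Scarlet", "#FF2000", 255, 32, 0),
]

for name, hex_code, r, g, b in SAMPLE_ROWS:
    # Check that the listed r, g, b columns agree with the hex column.
    assert hex_to_rgb(hex_code) == (r, g, b), name
```

A check like this may be useful when importing the file, since it flags rows whose hex and RGB columns disagree.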