Category

Technology

  • List of Dirty, Obscene, Banned and otherwise unacceptable words

    Free Download — A banned word list compiled from many lists around the web of words considered socially unacceptable for one reason or another. What can you do with a banned word list? Use this dirty word list to screen for spammers and griefers; to censor dissidents; to better understand the semiotic role of taboo signifiers in an online modality; to monitor user ...
  • PyGTrends: Python API for Google Trends Data

    Offsite — This Python module is a quasi-API that makes it easier to authenticate into Google Trends, for those who want to squeeze an extra level of functionality out of their data. The advantage of programmatic access is that the data can be automatically trended and merged. It can be snuck into a 9:00 AM daily email to the VP of Marketing so that she knows to ramp up Google AdWords ...
  • Internet Access and Usage and Online Service Usage: 2006

    Free Download — The Statistical Abstract files are distributed by the US Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
  • Flickr Images

    Free Download — This Flickr data set contains over 2,000 downloaded images from 52 different groups. The information can be used for image content analysis in problems related to rich social media. Each image is indexed by its Flickr photo id and the corresponding group to which it belongs. Citation: Choudhury, M. D., Sundaram, H., Lin, Y-R., John, A., and Seligmann, D. D. (2009). ...
  • Free Public WiFi data set from wigle.net

    Free Download — When a computer running an older version of XP can’t find any of its “favorite” wireless networks, it will automatically create an ad hoc network with the same name as the last one it connected to — in this case, “Free Public WiFi.” Other computers within range of that new ad hoc network can see it, luring other users to connect. Computers with the XP bug that try to ...
  • Information and Communications Technology

    Free Download — The Statistical Abstract files are distributed by the US Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
  • Juiced Google Analytics Python API: Juice Analytics

    Offsite
  • Flickr - The Commons

    Offsite — About: The key goals of The Commons on Flickr are, first, to show you hidden treasures in the world’s public photography archives and, second, to show how your input and knowledge can help make these collections even richer. Re-use/Openness: Photos have “no known copyright restrictions”. From the rights page (http://www.flickr.com/commons/usage/): Participating ...
  • Sampling Techniques for Massive Data - Google Video

    Offsite
  • Semantic Search the US Library of Congress

    Offsite
  • Stanford Large Network Dataset Collection

    Offsite — Social networks: online social networks, edges represent interactions between people. Communication networks: email communication networks with edges representing communication. Citation networks: nodes represent papers, edges represent citations. Collaboration networks: nodes represent scientists, edges represent collaborations. ...
  • LETOR: Benchmark Datasets for Learning to Rank

    Offsite — We release two large-scale datasets for research on learning to rank: MSLR-WEB30K, with more than 30,000 queries, and MSLR-WEB10K, a random sample of it with 10,000 queries. The datasets are machine learning data in which queries and URLs are represented by IDs. The datasets consist of feature vectors extracted from query-URL pairs along with relevance judgment labels: ...
  • im2gps: estimating geographic information from a single image

    Offsite
  • Enron Email Dataset

    Offsite
  • Amazon Web Services Public Datasets » Data Wrangling Blog

    Offsite
  • Enron Dataset

    Offsite
  • Data for Data Mining

    Offsite
  • Office of Defects Investigation (ODI), Flat File Downloads

    Offsite
  • Pascal Learning Challenge Large Datasets

    Offsite
  • Internet Archive: Details: Amazon ASIN listing and similarity graph

    Offsite