The Million Song Dataset - 10,000 Songs Subset

Added By thierrybm

The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. What can you do with a vast library of audio metadata and millions of detailed data points about music? Develop smarter platforms for enjoying music, apps that deliver enhanced listening experiences, and tools for detailed musical analysis.

The download available on this page is a 10,000-song subset of audio features and metadata from the Million Song collection, which may be found, fully cataloged, on the Infochimps site.

The purposes of the Million Song Dataset are:

  • To encourage research on algorithms that scale to commercial sizes
  • To provide a reference dataset for evaluating research
  • To provide a shortcut alternative to creating a large dataset with The Echo Nest’s API
  • To help new researchers get started in the MIR field

The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Note, however, that sample audio can be fetched from services like 7digital, using code we provide.

The Million Song Dataset started as a collaborative project between The Echo Nest and LabROSA. It is supported in part by the NSF.

To get a sense of the dataset, you can look at this description of one of the million songs.


Most of the data is licensed under the same terms as The Echo Nest’s API. The code is released under the GNU General Public License.