Dataset

The Million Song Dataset - Additional Files

Added By thierrybm

The download available on this page contains the additional files (SQLite databases, text files, etc.) that will help you understand each of the other data sets within the Million Songs Collection. The Million Songs Collection includes audio metadata and features, and can be found, fully cataloged, on the Infochimps site.
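As a starting point for exploring one of the SQLite databases, the sketch below lists each table and its columns using Python's standard sqlite3 module. It makes no assumptions about the schema of any particular file. The function name list_tables and the path in the usage note are hypothetical, chosen only for illustration.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in an SQLite file.

    db_path is whatever .db file you downloaded; nothing here assumes a
    particular schema.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        schema = {}
        for name in names:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            cols = conn.execute('PRAGMA table_info("{}")'.format(name)).fetchall()
            schema[name] = [col[1] for col in cols]
        return schema
    finally:
        conn.close()
```

For example, calling list_tables("path/to/downloaded_file.db") (a placeholder path) returns a dictionary you can print to see which tables and columns a file contains before writing queries against it.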

The Million Song Data Set is a freely available collection of audio metadata and features for a million contemporary popular music tracks. What can you do with a vast library of song data, including audio metadata and millions of detailed data points about music? Develop smarter platforms for enjoying music, apps that deliver enhanced listening experiences, and tools for detailed musical analysis.


The Million Song Data Set started as a collaborative project between The Echo Nest and LabROSA. It is supported in part by the NSF.

Its purposes are:

  • To encourage research on algorithms that scale to commercial sizes
  • To provide a reference song data set for evaluating research
  • To provide a shortcut alternative to creating a large data set with The Echo Nest’s API
  • To help new researchers get started in the MIR field

The core of the song data set is the feature analysis and audio metadata for one million songs, provided by The Echo Nest. The song data set does not include any audio, only the derived features. Note, however, that sample audio can be fetched from services like 7digital, using code we provide.


To get a sense of the data set, you can look at this description of one of the million songs.

A free sample consisting of 10,000 songs is also available.