The Million Song Dataset - Letter A

Added By thierrybm

The download on this page is the Letter A subset of the Million Song Dataset. It includes audio metadata and features; the full collection is cataloged on the Infochimps site.

The Million Song Dataset is a freely available collection of audio metadata and features for a million contemporary popular music tracks. With a library this large, including millions of detailed data points about music, you can develop smarter platforms for enjoying music, apps that deliver enhanced listening experiences, and tools for detailed musical analysis.


The Million Song Dataset started as a collaborative project between The Echo Nest and LabROSA. It is supported in part by the National Science Foundation (NSF).

Its purposes are:

  • To encourage research on algorithms that scale to commercial sizes
  • To provide a reference song data set for evaluating research
  • To provide a shortcut alternative to creating a large data set with The Echo Nest’s API
  • To help new researchers get started in the music information retrieval (MIR) field

The core of the dataset is the feature analysis and audio metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Sample audio can, however, be fetched from services like 7digital, using code we provide.


To get a sense of the data set, you can look at this description of one of the million songs. You may also want to download some additional files which will help you interpret the data offered here.

A free sample consisting of 10,000 songs is also available.