Dataset

ProductDB - Products by UPC, ISBN, ASIN, GTIN, EAN

Added By mrflip

ProductDB aims to be the world's most comprehensive and most open source of product data. Not only do we want to create a page for every product in the world, we also want to connect the underlying structured data into one huge interlinked dataset. All the data is published as Linked Data. More about ProductDB interlinking.

WHO IS BEHIND PRODUCTDB?

ProductDB has been developed by Ian Davis with infrastructure support from Talis. Please direct feedback and ideas to nospam@iandavis.com.

WHERE DOES THE DATA COME FROM?

We analyse and process data from a variety of open sources, including ProductWiki, MusicBrainz, DBpedia, FreeBase and OpenLibrary. We also crawl sites that publish GoodRelations RDFa (such as Best Buy) or Open Graph Protocol data in their pages (for example IMDB and Spotify). This data is analysed to form linkages and correspondences between resources.

Please see the ProductDB licensing information for further details.

HOW CAN I GET INVOLVED?

A large amount of the data used to form ProductDB is gathered by our crawler. Many of the crawled sources have open contribution policies which allow you to edit the content directly. Head over to ProductWiki, OpenLibrary, MusicBrainz, FreeBase or Wikipedia and make your contribution. Your changes will be included in ProductDB automatically next time we crawl those sites.

We are considering adding editing and annotation facilities to ProductDB allowing direct updating of some of the data. Please let us know if you think this is something you would use.

GETTING THE DATA

We plan to offer full dumps of the ProductDB dataset as well as some specialised extracts. Please check our dumps page for updates.

HISTORY

ProductDB was originally based on a crawl of ProductWiki performed on 11 and 12 August 2009. This first conversion contained over 20,000 product names, list prices, images, alternate names, reviews and links to external sites.

The current version of ProductDB vastly increases the coverage of products but does not include price information or reviews. Read more on our news page.

URL SPACE

Each resource is allocated a URI in the productdb.org domain. The URI space is subdivided into areas using the path of the URI.

URI Pattern	Identified Resource
/gtin/{id}	A product, {id} is its 14-digit GTIN
/ean/{id}	A product, {id} is its 13-digit EAN
/upc/{id}	A product, {id} is its 12-digit UPC (barcode)
/asin/{id}	A product, {id} is its 10-character Amazon identifier (ASIN)
/isbn/{id}	A product, {id} is its 13- or 10-digit ISBN
/brands/{id}	A brand, {id} is a unique identifier
/groups/{id}	A grouping of related products, e.g. a film that has been issued on various media.
/classifications/{provider}/{scheme}/{term}	A concept in a taxonomy
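As an illustration of the patterns above, here is a minimal Python sketch that builds a product URI from an identifier. The base URI (http://productdb.org), the helper name product_uri and the stripping of hyphens and spaces are assumptions made for this example. Only the path patterns and identifier lengths come from the table.

```python
# Assumption: the text gives the domain but not the scheme, so http:// is a guess.
BASE_URI = "http://productdb.org"

# Identifier lengths per path segment, taken from the URI pattern table.
ID_LENGTHS = {
    "gtin": (14,),
    "ean": (13,),
    "upc": (12,),
    "asin": (10,),
    "isbn": (10, 13),
}


def product_uri(scheme: str, identifier: str) -> str:
    """Return a resource URI for a product identifier (hypothetical helper)."""
    scheme = scheme.lower()
    if scheme not in ID_LENGTHS:
        raise ValueError(f"unknown identifier scheme: {scheme!r}")
    # Assumption: hyphens and spaces (common in printed ISBNs) are not part of the URI.
    cleaned = identifier.replace("-", "").replace(" ", "")
    if not cleaned.isalnum() or len(cleaned) not in ID_LENGTHS[scheme]:
        raise ValueError(f"{identifier!r} is not a valid {scheme} identifier")
    return f"{BASE_URI}/{scheme}/{cleaned}"


print(product_uri("isbn", "978-0-596-15381-6"))
# → http://productdb.org/isbn/9780596153816
```

The sketch checks only the length and character set of the identifier. It does not verify check digits, and it does not guarantee that a resource exists at the resulting URI.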

Data about each resource is published using Linked Data principles. The URI for each resource issues an HTTP redirect with status code 303 to direct the request to a document containing data about the resource. The server performs content negotiation to determine the best document to redirect the request to. The following document formats are provided:

File extension	Media Type	Notes
.html	text/html	 
.rdf	application/rdf+xml	 
.json	application/json	See RDF/JSON specification
.ttl	text/turtle	See Turtle specification
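The content negotiation described above can be exercised from a client by sending an Accept header with one of the listed media types. The following Python sketch uses only the standard library. The helper names build_request and fetch are hypothetical, and the example URI assumes an http:// scheme and a made-up UPC. urllib follows the 303 redirect automatically, so the URL reported after the request should be that of the document rather than the resource.

```python
import urllib.request

# File extensions and media types from the table of document formats.
MEDIA_TYPES = {
    ".html": "text/html",
    ".rdf": "application/rdf+xml",
    ".json": "application/json",
    ".ttl": "text/turtle",
}


def build_request(resource_uri: str, extension: str = ".ttl") -> urllib.request.Request:
    """Build a GET request asking for the format that matches `extension`."""
    media_type = MEDIA_TYPES[extension]
    return urllib.request.Request(resource_uri, headers={"Accept": media_type})


def fetch(resource_uri: str, extension: str = ".ttl") -> tuple[str, bytes]:
    """Send the request; return the final (post-redirect) URL and the body."""
    request = build_request(resource_uri, extension)
    # urlopen follows the 303 See Other redirect to the document.
    with urllib.request.urlopen(request) as response:
        return response.geturl(), response.read()


# Build (but do not send) a request for the RDF/JSON representation.
# The URI below is an illustrative assumption, not a known resource.
request = build_request("http://productdb.org/upc/012345678905", ".json")
print(request.get_header("Accept"))
# → application/json
```

Calling fetch performs a real network request, so it is left out of the usage lines. Whether the server honours a given Accept header for a particular resource is something to confirm against the live service.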