Dataset
ProductDB - Products by UPC, ISBN, ASIN, GTIN, EAN
ProductDB aims to be the world's most comprehensive open source of product data. Not only do we want to create a page for every product in the world, we also want to connect the underlying structured data into one large interlinked dataset. All the data is published as Linked Data. More about ProductDB interlinking.
WHO IS BEHIND PRODUCTDB?
ProductDB has been developed by Ian Davis with infrastructure support from Talis. Please direct feedback and ideas to nospam@iandavis.com.
WHERE DOES THE DATA COME FROM?
We analyse and process data from a variety of open sources including ProductWiki, MusicBrainz, DBpedia, FreeBase and OpenLibrary. We also crawl sites that publish GoodRelations RDFa (such as Best Buy) or Open Graph Protocol data in their pages (for example IMDB and Spotify). This data is analysed to form linkages and correspondences between resources.
Please see the ProductDB licensing information for further information.
HOW CAN I GET INVOLVED?
A large amount of the data used to form ProductDB is gathered by our crawler. Many of the crawled sources have open contribution policies which allow you to edit the content directly. Head over to ProductWiki, OpenLibrary, MusicBrainz, FreeBase or Wikipedia and make your contribution. Your changes will be included in ProductDB automatically next time we crawl those sites.
We are considering adding editing and annotation facilities to ProductDB allowing direct updating of some of the data. Please let us know if you think this is something you would use.
GETTING THE DATA
We plan to offer full dumps of the ProductDB dataset as well as some specialised extracts. Please check our dumps page for updates.
HISTORY
ProductDB was originally based on a crawl of ProductWiki performed over the 11th and 12th August 2009. This first conversion contained over 20,000 product names, list prices, images, alternate names, reviews and links to external sites.
The current version of ProductDB vastly increases the coverage of products but does not include price information or reviews. Read more on our news page.
URL SPACE
Each resource is allocated a URI in the productdb.org domain. The URI space is subdivided into areas using the path of the URI.
URI Pattern                                  Identified Resource
/gtin/{id}                                   A product; {id} is its 14-digit GTIN
/ean/{id}                                    A product; {id} is its 13-digit EAN
/upc/{id}                                    A product; {id} is its 12-digit UPC (barcode)
/asin/{id}                                   A product; {id} is its 10-character Amazon identifier
/isbn/{id}                                   A product; {id} is its 13- or 10-digit ISBN
/brands/{id}                                 A brand; {id} is a unique identifier
/groups/{id}                                 A grouping of related products, e.g. a film that has been issued on various media
/classifications/{provider}/{scheme}/{term}  A concept in a taxonomy
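As an illustration of the product URI patterns above, the following Python sketch builds a URI from an identifier after checking its shape. It is not part of ProductDB. The function name product_uri, the http scheme in BASE, the treatment of ASINs as 10 alphanumeric characters, and the allowance for a trailing X in 10-digit ISBNs are all assumptions made for the example.

```python
import re

# Assumption: the source only names the productdb.org domain; the scheme is a guess.
BASE = "http://productdb.org"

# Identifier shapes based on the URI pattern table.
# Assumptions: ASINs are 10 upper-case alphanumeric characters, and a
# 10-digit ISBN may end in the check character X.
PATTERNS = {
    "gtin": re.compile(r"\d{14}"),
    "ean": re.compile(r"\d{13}"),
    "upc": re.compile(r"\d{12}"),
    "asin": re.compile(r"[A-Z0-9]{10}"),
    "isbn": re.compile(r"\d{13}|\d{9}[\dX]"),
}


def product_uri(kind: str, identifier: str) -> str:
    """Return the URI for a product, or raise ValueError if the input is malformed."""
    kind = kind.lower()
    pattern = PATTERNS.get(kind)
    if pattern is None:
        raise ValueError(f"unknown identifier type: {kind!r}")
    if not pattern.fullmatch(identifier):
        raise ValueError(f"{identifier!r} is not a well-formed {kind.upper()}")
    return f"{BASE}/{kind}/{identifier}"


if __name__ == "__main__":
    print(product_uri("ean", "9780140449136"))
```

The check only covers length and character set. It does not verify check digits, so a well-formed but non-existent identifier still yields a URI.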
Data about each resource is published using Linked Data principles. The URI for each resource issues an HTTP redirect with status code 303 to direct the request to a document containing data about the resource. The server performs content negotiation to determine the best document to redirect the request to. The following document formats are provided:
File extension  Media Type           Notes
.html           text/html
.rdf            application/rdf+xml
.json           application/json     See RDF/JSON specification
.ttl            text/turtle          See Turtle specification
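A client can ask for one of these formats by sending an Accept header with the resource URI and then following the 303 redirect. The Python sketch below shows one way to do that with the standard library. The helper names build_request and fetch are hypothetical, and the example URI in the usage note assumes the http scheme. Only the media types come from the table above.

```python
import urllib.request

# Media types from the document formats table, keyed by file extension.
MEDIA_TYPES = {
    "html": "text/html",
    "rdf": "application/rdf+xml",
    "json": "application/json",
    "ttl": "text/turtle",
}


def build_request(resource_uri: str, fmt: str = "ttl") -> urllib.request.Request:
    """Prepare a GET request that asks for the given format via the Accept header."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    return urllib.request.Request(resource_uri, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(resource_uri: str, fmt: str = "ttl", timeout: float = 10.0):
    """Fetch a description of the resource.

    urllib follows the 303 redirect automatically, so the returned URL is
    that of the document the server redirected to, not the resource URI.
    """
    request = build_request(resource_uri, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl(), response.read()
```

For example, fetch("http://productdb.org/ean/9780140449136", "ttl") would request a Turtle document for that EAN, provided the service is reachable. The final URL it returns shows which document the server selected.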