Why a UPC code lookup can get pretty complicated

Because the “U” in UPC doesn’t mean “Universal”.

--

The strange story of the humble barcode

Universal Product Codes, more commonly known as UPCs or barcodes, have been around for ages. The first UPC-marked item ever scanned at a retail checkout was a 10-pack of Wrigley’s Juicy Fruit chewing gum, at the Marsh supermarket in Troy, Ohio at 8:01 a.m. on June 26, 1974. In an almost quaint way, the cashier remarked:

“I was a little bit nervous at the time,” she said. “I mean what if this doesn’t work? Everybody was there taking pictures, the photographers, the local press, people from around town. But it worked just fine. It was quite my 15 minutes of fame, I suppose.”

Worked it did … and boy did it grow!

In a quiet way, its innocuous start gave way to one of modern retail’s biggest game changers. From being featured once as controls on the Star Trek Enterprise, to being challenged by the new usurpers that are RFID tags and QR codes, barcodes are both the mundane minutiae of modern life and cultural icons of cold efficiency, identification and control.

Yep.

What is a UPC?

UPCs are based on a pretty simple principle: a scanner shines light on the barcode, the black lines absorb it while the white gaps reflect it, and the resulting pattern is translated into bits of information. The thickness and order of these lines represent a number. That number is looked up in a database. When a match is found, whatever information is needed is pulled for the task at hand.

What does a UPC look like?

There are some key differences between a UPC and a barcode. A UPC is a barcode symbology that maps a product’s information to its visual manifestation in the form of a barcode. It is usually a 12-digit code that is uniquely assigned to every trade item, and references that item alone, along with the company that produces it.

A barcode, on the other hand, is a pattern of stripes of varying widths that translates into numbers which, put together, form the UPC.

A laser scanner reads the barcode and retrieves the UPC for that item, and the UPC is then used to look up that product’s information in a database (usually a retailer’s inventory system or the manufacturer’s).
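To make the lookup step concrete, here is a minimal sketch of what happens after the scanner has decoded the 12 digits. The twelfth digit of a UPC-A code is a check digit computed from the first eleven, so a code can be sanity-checked before any database is consulted. The `INVENTORY` dictionary and its contents are hypothetical stand-ins for a retailer's inventory system, not real product data.

```python
def upc_check_digit(first_eleven: str) -> int:
    """Compute the UPC-A check digit for the first 11 digits."""
    if not (len(first_eleven) == 11 and first_eleven.isascii()
            and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 digits")
    digits = [int(ch) for ch in first_eleven]
    odd_sum = sum(digits[0::2])   # 1st, 3rd, 5th, ... digits
    even_sum = sum(digits[1::2])  # 2nd, 4th, 6th, ... digits
    return (10 - (odd_sum * 3 + even_sum) % 10) % 10


def is_valid_upc(code: str) -> bool:
    """True if `code` is 12 digits and its last digit is the check digit."""
    return (len(code) == 12 and code.isascii() and code.isdigit()
            and upc_check_digit(code[:11]) == int(code[11]))


# Hypothetical stand-in for a retailer's inventory system.
INVENTORY = {"036000291452": {"name": "Example item", "price": 1.99}}


def lookup(code: str):
    """Return the inventory record for a valid UPC, or None."""
    if not is_valid_upc(code):
        return None
    return INVENTORY.get(code)


print(lookup("036000291452"))
print(lookup("036000291453"))  # fails the check digit test
```

A check like this only catches mistyped digits. It says nothing about whether the code is attached to the right product, which is where the real trouble starts.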

So are UPCs unique?

Actually, no. Today, a lot has changed. Getting hold of a UPC isn't nearly as easy as it used to be. For some reason or another, contrary to what most people think, the Universal Product Code is no more Universal than Federal Express is Federal.

Why not? Doesn't anyone coordinate UPC databases at a global level?

Firstly, there is no centralized database that stores all UPCs to check against. While our UPC database is one of the most extensive out there, there have been few other attempts to create one (the Internet UPC Database would be one, but its coverage is, at best, thin).

This brings us to GS1, the governing body that maintains UPC standards. Most companies initially come to GS1 to get a barcode number for their products. The current architecture of GS1 standards is as follows:

  • Identify: Standards for the identification of items, locations, shipments, assets, etc.
  • Capture: Standards for encoding and capturing data in physical data carriers such as barcodes and RFID tags
  • Share: Standards for sharing data between parties

GS1 itself doesn’t maintain a product inventory of every UPC issued, which makes cross-referencing difficult. Companies that obtain barcode numbers from GS1 do not provide it with a list of their products, so most references are checked against private inventory systems or retail websites.

In short, there is no coordination between GS1, the global standards body in charge of maintaining the UPC system, and the actual issuance of UPC codes. This leads to the following (very confusing) result:

3 completely unrelated items, all sharing the same UPC
The same UPC on a different website
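Collisions like these are easy to surface once listings from several sites sit side by side. The sketch below groups listings by UPC and flags any code that appears with more than one distinct title. The listings, titles and site names are made up for illustration, and comparing titles is a deliberately crude signal: the same product can also be listed under differently worded titles.

```python
from collections import defaultdict

# Made-up listings; UPCs, titles and sites are illustrative only.
LISTINGS = [
    {"upc": "036000291452", "title": "Facial tissue, 100 count", "site": "shop-a.example"},
    {"upc": "036000291452", "title": "USB charging cable", "site": "shop-b.example"},
    {"upc": "036000291452", "title": "Garden hose nozzle", "site": "shop-c.example"},
    {"upc": "012345678905", "title": "Ceramic mug", "site": "shop-a.example"},
]


def find_upc_collisions(listings):
    """Map each UPC seen with more than one distinct title to those titles."""
    titles_by_upc = defaultdict(set)
    for listing in listings:
        # Normalise case and whitespace so trivial differences don't count.
        titles_by_upc[listing["upc"]].add(listing["title"].strip().lower())
    return {
        upc: sorted(titles)
        for upc, titles in titles_by_upc.items()
        if len(titles) > 1
    }


for upc, titles in find_upc_collisions(LISTINGS).items():
    print(upc, "->", titles)
```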

The entire process of actually creating a UPC is error-prone. The system’s efficiency has been adversely affected by the emergence of e-commerce marketplaces like Amazon and Best Buy Marketplace, where online sellers enter inaccurate UPCs, sometimes even manually. In some cases, sellers even “Google” their product UPCs and enter the first one that crops up.

Additionally, some companies have their own internal UPC systems which inadvertently leak into global databases, making things even more confusing and hard to coordinate.

The state of affairs becomes even worse when companies resell barcodes regardless of country of origin, which throws the entire UPC structure out of the window.

Furthermore, UPCs don’t take product variations into account. In some cases, manufacturers choose to market different variations of the same product differently (e.g. an iPod Red is marketed differently from regular iPods), and subsequently use different UPCs, while other manufacturers consider all variations to be the same entity. UPCs can even vary depending on who sells the product (e.g. T-Mobile iPhones have different UPCs than AT&T iPhones).

Bono and the iPod red

The problem is that the decision to use similar or different UPCs is arbitrary and varies greatly by manufacturer, making consolidation even more difficult.

Isn’t it cheap to generate a barcode?

Getting a UPC can be an expensive process. Businesses need to register with GS1, paying up to $10,500 in setup fees and $2,100 annually to maintain an active registration. All of these payments only go towards obtaining a company prefix that allows products with that prefix to be attributed to that particular company.

Why are there companies reselling barcodes, and how does this affect the system?

The prohibitive costs have led to a thriving sub-industry of reselling barcodes to smaller players. This has undermined the actual function of UPCs, which is to tie products to their original manufacturers rather than to a license owner who resells UPCs to other companies. It is currently legal to resell barcodes.

It is apparently legal to resell barcodes

But all of that doesn't mean we should abandon the UPC. It’s a relatively cheap way of tying a physical product to an online registry. The problem doesn’t lie in the code itself.

This is how we lost a glorious opportunity to create a global, unified database of every product manufactured, with prices, manufacturers, retailers and product descriptions, and to create a more perfect market through product and price visibility.

We can do better than this.

How we created a global UPC Database

When we started out, our vision was to organize the world’s e-commerce data. It was much, much harder than we had thought. While we initially looked at UPCs as a way of uniquely identifying products, it quickly became apparent that the problems plaguing the system made this difficult. There were too many edge cases: similar products with different package sizes, similar products marketed differently, inaccurate and error-prone codes, and the list goes on.

Eventually, we decided to build an organization system from the ground up. We developed a proprietary product matching algorithm that finds similar products across our vast database of 50 million products, and tries to infer whether they are, in reality, the same product. To this end, the algorithm generates a confidence score by trying to match every product attribute available: name, brand, manufacturer, specifications (weight, measurements, etc.), and over 50 other attributes.

Two products are inferred to be the same product if, and only if, the algorithm generates an extremely high confidence score.
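The matching algorithm itself is proprietary, so the following is only a toy illustration of the general idea: compare the attributes two records have in common, combine the per-attribute similarities into one number, and accept the match only above a strict threshold. The attribute weights, the threshold value and the use of Python's `difflib` string similarity are all assumptions made for this sketch, not the production method.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"name": 0.5, "brand": 0.3, "manufacturer": 0.2}
MATCH_THRESHOLD = 0.95


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(p1: dict, p2: dict) -> float:
    """Weighted average similarity over attributes present in both records."""
    total, weight_sum = 0.0, 0.0
    for attr, weight in WEIGHTS.items():
        if attr in p1 and attr in p2:
            total += weight * similarity(p1[attr], p2[attr])
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def is_same_product(p1: dict, p2: dict) -> bool:
    """Treat two records as one product only above the strict threshold."""
    return match_score(p1, p2) >= MATCH_THRESHOLD


a = {"name": "iPod nano 8GB Red", "brand": "Apple"}
b = {"name": "IPOD NANO 8GB RED", "brand": "apple"}
c = {"name": "Garden hose nozzle", "brand": "Acme"}
print(is_same_product(a, b), is_same_product(a, c))
```

Setting the threshold high trades recall for precision: some true duplicates stay unmerged, but two different products are rarely collapsed into one.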

We also created an absolute, unique product identifier called the sem3_id that is independent of manufacturer or country of origin. A sem3_id can be used to pull all of a product’s data from our database, including manufacturer, prices, online retailers and price histories.

A sem3_id is also tagged onto a UPC (when available), allowing people to look up a product’s sem3_id (or, for that matter, any of the product’s information) in our database using UPCs.

When a new product is found to be an exact match for an existing item in our database, its information is merged into the body of information already tagged to that item’s sem3_id.

If no matching sem3_id exists, a new one is created and all matching product data is attributed to it.
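The merge-or-create flow described above can be sketched with a small in-memory registry. Everything here is hypothetical: the class and method names, the use of a random UUID as a stand-in for the real sem3_id format, and the choice to let the first product seen with a UPC keep that UPC's mapping are assumptions made for the sketch, not a description of our internals.

```python
import uuid


class ProductRegistry:
    """Toy in-memory registry keyed by a generated product id."""

    def __init__(self):
        self.products = {}   # product id -> merged attribute dict
        self.id_by_upc = {}  # UPC -> product id (only when a UPC is known)

    def upsert(self, record: dict, matched_id=None) -> str:
        """Merge `record` into `matched_id` if it exists, else create a new id."""
        product_id = matched_id
        if product_id is None or product_id not in self.products:
            product_id = uuid.uuid4().hex  # placeholder id scheme (assumption)
            self.products[product_id] = {}
        self.products[product_id].update(record)
        upc = record.get("upc")
        if upc:
            # Keep the first mapping; a reused UPC does not overwrite it.
            self.id_by_upc.setdefault(upc, product_id)
        return product_id

    def lookup_by_upc(self, upc: str):
        """Return the merged record tagged to this UPC, or None."""
        product_id = self.id_by_upc.get(upc)
        return self.products.get(product_id) if product_id else None


registry = ProductRegistry()
pid = registry.upsert({"name": "Ceramic mug", "upc": "012345678905"})
registry.upsert({"price": 7.5}, matched_id=pid)  # exact match: merge
registry.upsert({"name": "Item without a UPC"})   # no match: new id
print(registry.lookup_by_upc("012345678905"))
```

In this sketch the `matched_id` argument is where the result of the matching step would plug in. The registry itself never decides whether two records are the same product.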

Why create a new standard?

We’re not creating a new standard. Rather, we’ve created an extensive database that attributes product metadata to unique products, each of which is linked to by a uniquely generated sem3_id (like a web address).

Not all products have UPCs. But all products in our UPC products database have sem3_ids.

Companies don’t have to register with any governing body to obtain sem3_ids — they are automatically generated free of cost whenever a new product is listed online and subsequently indexed in our database.

As a result, Semantics3 has created the world’s largest database of distinctly classified products. Using our proprietary data extraction technology, we’ve matched products sold online to UPCs and sem3_ids, creating a centralized database of product information that people can look up using both sem3_ids and UPCs.

Test out our UPC Database API through a 30-day free trial.
