Information about UPC, ASIN, Walmart Product Codes


Types of Product Codes

UPC: The Universal Product Code (UPC) is a 12-digit barcode used extensively for retail packaging in the United States. A typical process for obtaining a 12-digit UPC is as follows:

- License a unique Company Prefix from your local GS1 office. GS1 is a not-for-profit organisation that develops and maintains global standards for business communication.
- Assign a product number to each unique product so that the prefix plus the product number comes to 11 digits.
- Run the 11-digit number through a check digit calculator to generate the twelfth digit.
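The check digit step can be sketched in a few lines of Python. This is a minimal illustration of the standard UPC-A rule (digits in odd positions are weighted by 3, digits in even positions by 1), not a replacement for GS1's own calculator. The function name upc_check_digit is made up for this example.

```python
def upc_check_digit(first_eleven):
    """Return the check digit (0-9) for an 11-digit UPC-A body."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected a string of exactly 11 digits")
    digits = [int(ch) for ch in first_eleven]
    odd_sum = sum(digits[0::2])   # positions 1, 3, 5, 7, 9, 11
    even_sum = sum(digits[1::2])  # positions 2, 4, 6, 8, 10
    total = 3 * odd_sum + even_sum
    # The check digit brings the weighted total up to a multiple of 10.
    return (10 - total % 10) % 10


# Example: an 11-digit body and the full 12-digit UPC built from it.
body = "03600029145"
print(body + str(upc_check_digit(body)))  # 036000291452
```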

How to Get UPC Product Codes?

Online Converter:

Several data providers take a product URL and convert the corresponding product code into a UPC. Some of these are:

UPC Databases

The sites above provide product analytics and information about a product across the platforms where it is sold. There are also large, searchable databases of UPCs. One example is barcodelookup.com: type in a UPC and it returns details on the product.
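Before typing a code into a lookup site, it can help to confirm that it is a well-formed UPC, since a mistyped digit usually just returns no result. The sketch below assumes a 12-digit UPC-A and uses the standard checksum rule. The helper name is_valid_upc is hypothetical and is not part of any lookup site's API.

```python
def is_valid_upc(code):
    """Return True if `code` is a 12-digit UPC-A with a correct check digit."""
    # Printed barcodes are often written with spaces or hyphens between groups.
    cleaned = code.replace(" ", "").replace("-", "")
    if len(cleaned) != 12 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    digits = [int(ch) for ch in cleaned]
    # Odd positions (1st, 3rd, ...) are weighted 3 and even positions 1;
    # the weighted sum of all 12 digits must be a multiple of 10.
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return total % 10 == 0


print(is_valid_upc("0 36000 29145 2"))  # True
print(is_valid_upc("036000291453"))     # False (last digit is wrong)
```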

How to scrape UPC codes?

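UPC codes can be found on most product pages if you look closely enough. Each product page typically embeds a dictionary, inside a script tag, that holds the product information: name, price, description, product code, UPC and so on. The code below collects product links from Walmart's search results for watches, then pulls the UPC, item ID and product name out of each product's reviews page.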

import requests
import pandas as pd
from bs4 import BeautifulSoup

my_header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             + "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"}
querystring = {"page": "10", "query": "watches"}

# Step 1: collect product links from each page of search results.
item_urls = []
total_pages = int(querystring["page"])
for i in range(1, total_pages + 1):
    url = ('https://www.walmart.com/search/?page=' + str(i)
           + '&ps=48&query=' + querystring["query"])
    r = requests.get(url, headers=my_header)
    if r.status_code != 200:
        print("Error-", r.status_code)
        continue
    soup_main = BeautifulSoup(r.content, 'html.parser')
    summary = soup_main.find('div', {'class': 'search-product-result', 'id': 'searchProductResult'})
    if summary is None:  # result container not found on this page
        continue
    for prod in summary.find_all('li'):
        link = prod.find('a', {"class": "product-title-link line-clamp line-clamp-2 truncate-title"})
        if link is not None and link.get('href'):
            item_urls.append(link.get('href'))

# The product code is the last path segment of each product link.
product_code = [u.split('/')[-1] for u in item_urls]


def field_after(text, key):
    # Take what follows `key` up to the next comma, then drop the
    # leading '":"' and the trailing quote.
    return text.split(key)[1].split(',')[0][3:-1]


# Step 2: read the UPC, item ID and product name from each product's page.
upc = []
item_ID = []
product_name = []
for prod_code in product_code:
    page_url = 'https://www.walmart.com/reviews/product/' + prod_code + '?page=2'
    r = requests.get(page_url, headers=my_header)
    if r.status_code != 200:
        continue
    soup = BeautifulSoup(r.content, 'html.parser')
    for val in soup.find_all("script"):
        val = str(val)
        # Use the first script that contains all three keys.
        if 'upc' in val and 'usItemId' in val and 'productName' in val:
            upc.append(field_after(val, 'upc'))
            item_ID.append(field_after(val, 'usItemId'))
            product_name.append(field_after(val, 'productName'))
            break  # one row per product keeps the three lists aligned

# Step 3: save the results.
df_dict = {"Product Name": product_name, "Walmart Product Code": item_ID, "UPC": upc}
df = pd.DataFrame(df_dict)
df.to_csv("UPCs of Walmart Watches.csv")
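Splitting on commas and slicing by fixed character offsets is fragile: a product name that contains a comma, for instance, gets cut short. As one possible alternative, the sketch below pulls the same fields out with a regular expression. The sample text and the helper name extract_field are hypothetical, and the pattern assumes the values are plain double-quoted strings without escaped quotes.

```python
import re

# Hypothetical script contents, shaped like the product data described above.
SAMPLE_SCRIPT = (
    '{"productName":"Classic Watch, Black",'
    '"usItemId":"123456789","upc":"036000291452"}'
)


def extract_field(script_text, key):
    """Return the quoted string stored under `key`, or None if it is absent."""
    pattern = r'"%s"\s*:\s*"([^"]*)"' % re.escape(key)
    match = re.search(pattern, script_text)
    return match.group(1) if match else None


print(extract_field(SAMPLE_SCRIPT, "upc"))          # 036000291452
print(extract_field(SAMPLE_SCRIPT, "productName"))  # Classic Watch, Black
```

Returning None for a missing key also makes it easy to skip a product instead of raising an IndexError partway through a long crawl.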

Jay M. Patel

Cofounder and principal data scientist at Specrom Analytics (specrom.com); natural language processing and web crawling/scraping expert. Personal site: JayMPatel.com