How to get all product data from Shopify stores

J Devansh
2 min read · Apr 13, 2020


A friend asked me to help him scrape some Shopify product information to populate a prototype he was working on. I thought it might be a nice thing for people to learn how to do with Python, since it’s really easy. I’ll describe how I did it, and link to the code at the bottom.

How I did it -

Some googling told me that Shopify stores will return their product listing as JSON if you add `/products.json` to the end of the store’s base URL.

If you do that, you get a JSON object, which you can paste into a JSON prettifier such as https://jsonformatter.org/json-pretty-print.

You can examine the object, and figure out which fields you want.
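If you would rather look at the object from Python than from a prettifier, here is a minimal sketch using only the standard library. The function names and the `User-Agent` header are my own assumptions for illustration, and `https://shopifystore.com` is just a placeholder store URL.

```python
import json
import urllib.request


def products_url(store_url: str) -> str:
    """Build the /products.json URL from a store's base URL."""
    return store_url.rstrip("/") + "/products.json"


def fetch_products(store_url: str) -> list:
    """Download and parse the first page of products (needs network access)."""
    request = urllib.request.Request(
        products_url(store_url),
        # Assumption: a browser-like User-Agent, in case the default one is rejected.
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # The products sit in a list under the "products" key.
    return payload.get("products", [])
```

Calling `fetch_products("https://shopifystore.com")` and printing `sorted(products[0].keys())` on the result is a quick way to see which fields are available.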

My friend wanted vendor, title, link to product page, description, price and links to any images on the page.

Vendor and title were given at the top level of each product object, so it was easy to pull them out of the JSON. The price was in a nested array of product variants, so I pulled out the price of the first variant, which was good enough for the purpose. The variants were normally different sizes or colors with similar prices.
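As a rough sketch of that step (not the exact code from the script), assuming each product is a dict with `vendor`, `title` and a `variants` list whose entries carry a `price`. The sample product below is made up.

```python
def basic_fields(product: dict) -> dict:
    """Pull vendor, title and the first variant's price from one product."""
    variants = product.get("variants") or []
    # Only the first variant's price is used; None if there are no variants.
    price = variants[0].get("price") if variants else None
    return {
        "vendor": product.get("vendor"),
        "title": product.get("title"),
        "price": price,
    }


# Made-up sample shaped like one entry of the "products" list.
sample = {
    "vendor": "Acme",
    "title": "Blue Shirt",
    "variants": [
        {"title": "Small", "price": "19.99"},
        {"title": "Medium", "price": "21.99"},
    ],
}
print(basic_fields(sample))
```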

For a link to the product, you have to append a field called the `handle` to the store’s URL, under the `/products/` path. Easy, since you have the base URL handy.
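A small sketch of building that link, assuming the usual Shopify storefront layout where product pages live at `/products/<handle>`. The function name and the example handle are hypothetical.

```python
def product_link(store_url: str, handle: str) -> str:
    """Join the store's base URL and a product handle into a product page URL."""
    # Assumption: product pages are served under /products/<handle>.
    return f"{store_url.rstrip('/')}/products/{handle}"


print(product_link("https://shopifystore.com", "blue-shirt"))
```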

Description was trickier. There was no plain-text field for it; the text on the page came from paragraph tags inside the product’s HTML description (the `body_html` field). You would have to parse the HTML to extract the text, but I didn’t want to get into that for now, so I just pulled all the HTML.
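If you did want plain text instead of raw HTML, one possible approach is the standard library’s `html.parser`. This is only a sketch of that idea and is not part of the script described here; the class and function names are made up, and the set of tags treated as text breaks is an assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, dropping the tags."""

    BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.chunks.append(" ")  # treat a line break as a space

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.chunks.append(" ")  # keep adjacent blocks from running together


def html_to_text(html) -> str:
    """Return the visible text of an HTML description with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html or "")
    parser.close()
    return " ".join("".join(parser.chunks).split())


print(html_to_text("<p>Soft cotton.</p><p>Machine <b>washable</b>.</p>"))
```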

For images, you just had to go through them in the list they come in, and pull out the src attribute.
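In code this is a one-line loop. The sketch below assumes each product has an `images` list whose entries are dicts with a `src` key; the sample data is made up.

```python
def image_links(product: dict) -> list:
    """Return the src URL of every image on a product, in the order given."""
    return [image["src"] for image in product.get("images", []) if image.get("src")]


# Made-up sample with two usable images and one entry lacking a src.
sample = {
    "images": [
        {"src": "https://example.com/front.jpg"},
        {"src": "https://example.com/back.jpg"},
        {"alt": "no source here"},
    ]
}
print(image_links(sample))
```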

He wanted a CSV, so I wrote the rows out to a file. There were tonnes of commas in the HTML descriptions, though, so I used a custom delimiter to separate the columns when importing the file into Excel or Google Sheets.
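One way to do this with Python’s `csv` module is sketched below. The pipe delimiter, the column names and the sample row are assumptions for illustration, not necessarily what the script uses. Note that the `csv` module also quotes any field that contains the delimiter, so commas in the HTML would survive even with the default settings.

```python
import csv
import io

COLUMNS = ["vendor", "title", "link", "description", "price", "images"]


def write_rows(out, rows, delimiter="|"):
    """Write product rows to an open text file using a non-comma delimiter."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)


# Made-up row; image links are joined into one space-separated cell.
row = {
    "vendor": "Acme",
    "title": "Blue Shirt",
    "link": "https://shopifystore.com/products/blue-shirt",
    "description": "<p>Soft, light, and machine washable.</p>",
    "price": "19.99",
    "images": "https://example.com/front.jpg https://example.com/back.jpg",
}

buffer = io.StringIO()  # stands in for open("products.csv", "w", newline="")
write_rows(buffer, [row])
print(buffer.getvalue())
```

When importing the file into Excel or Google Sheets, you would pick the same character as the custom separator.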

Oh, lastly: Shopify paginates responses for stores with more than a certain number of products. So to the URL `shopifystore.com/products.json` you have to add `?page=n`, where n begins at 1 and keeps going, with the request eventually returning an empty list of products. So for each store link, you keep incrementing the page number until you start getting empty responses.
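The paging loop can be sketched roughly as follows. The function names, the `max_pages` safety cap and the `User-Agent` header are my own assumptions, and the page fetcher is passed in as a parameter so the loop can be tried without hitting a real store.

```python
import json
import urllib.request


def fetch_page(store_url: str, page: int) -> list:
    """Fetch one page of products from a store (needs network access)."""
    url = f"{store_url.rstrip('/')}/products.json?page={page}"
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("products", [])


def all_products(store_url: str, fetch=fetch_page, max_pages: int = 1000):
    """Yield products page by page, stopping at the first empty page."""
    for page in range(1, max_pages + 1):  # pages start at 1
        products = fetch(store_url, page)
        if not products:
            break  # an empty page means there is nothing left
        yield from products


# Offline demonstration with a fake fetcher standing in for the network.
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}


def fake_fetch(store_url, page):
    return fake_pages.get(page, [])


print([p["id"] for p in all_products("https://shopifystore.com", fetch=fake_fetch)])
```

For a real run you would drop the `fetch` argument, and probably pause briefly between requests to be polite to the store.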

And that’s it.

My friend gave me a list of stores, and the script is running now and has about 16,000 products in the file already.

Link to the GitHub so you can play around with this script.

If anyone is interested in doing this as a learn-to-code project, I can help with that. :)

Thanks for reading!
