As I entered grad school, I wanted to learn how to scrape websites using Python, and I knew Grailed had to be the first one, for I had spent way too many hours manually going through its posts. For those who don’t know the company, it is essentially eBay but tailored to fashion and streetwear (“grails” refers to rare and often very expensive pieces). The company doesn’t offer an API and the website isn’t the easiest to scrape. I will address a few hurdles in this article and hopefully, this can be of some help for those running into similar issues.
- Endless scrolling. Not uncommon, lots of marketplace/social media websites are like this.
- Need to load the page in a browser to see the products. Can’t just download the HTML and parse it with BeautifulSoup.
- Inconsistent elements in item posts. Some items have a crossed-out “old” price and a red “new” price, some don’t.
Now let’s get into the solutions.
Must Load the Page
Solution: open a browser (Chrome) under Selenium’s control and let it load the page. I have never taken a single class in computer science so I don’t understand the underlying mechanics, but you need an actual browser in order for the page and items to load. That is easily done through Selenium. Headless Chrome is a way to run the browser without its visible UI, so it gives a real browser context without the overhead of the full window. For the first part, DON’T TURN ON HEADLESS MODE. You need the browser window open for the page to load and display all the items.
You can choose the URL to scrape from (I use John Elliott as an example here), and the browser starts on that page. It should open up a browser that takes you directly to the page, and if the page prompts you to log in you can just manually close the prompt (when I wrote the code it didn’t happen, so I will update it when I have time). Then you can see all of the postings.
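Here is a minimal sketch of that browser setup, not necessarily the exact code I ran. The helper names `chrome_flags` and `open_listing_page` are made up for illustration, and the URL in the usage note is my assumption of what the John Elliott designer page looks like. The `--headless=new` flag assumes a reasonably recent Chrome.

```python
def chrome_flags(headless=False):
    """Return the command-line flags to pass to Chrome (hypothetical helper)."""
    flags = []
    if headless:
        # Assumption: a recent Chrome; older versions use plain "--headless".
        flags.append("--headless=new")
    return flags


def open_listing_page(url, headless=False):
    """Start Chrome through Selenium and load the given page."""
    # Imported here so chrome_flags() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for flag in chrome_flags(headless):
        options.add_argument(flag)

    driver = webdriver.Chrome(options=options)
    driver.get(url)
    return driver
```

For the listing page you would call something like `open_listing_page("https://www.grailed.com/designers/john-elliott")` and leave `headless` at `False`, so the window opens and you can close any log-in prompt by hand.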
Endless Scrolling
Solution: find the height of the page and scroll down to it. Essentially, it’s as simple as telling the browser how many pixels to scroll down. When you reach the bottom of the page, the site loads the next batch of items.
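The scrolling step can be sketched roughly like this. `scroll_to_bottom` is a name I made up, `driver` is assumed to be a Selenium WebDriver that already has the page open, and the two-second pause is a guess that you may need to tune to your connection.

```python
import time


def scroll_to_bottom(driver, times, pause=2.0):
    """Jump to the bottom of the page `times` times, waiting after each jump."""
    for _ in range(times):
        # Scroll to the current full height of the page; reaching the bottom
        # is what triggers the site to load more items.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the new items time to load before scrolling again.
        time.sleep(pause)
    # Report the final page height, handy for checking that the page grew.
    return driver.execute_script("return document.body.scrollHeight;")
```

The page height is re-read on every pass because the page keeps growing as more items load.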
It helps to know how many times to scroll. Thankfully, Grailed shows how many listings were found. You can see how many items load per page, divide the number of listings by that number, and then add 1 to be sure you hit the absolute end.
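As a small worked example of that arithmetic, here is a sketch with a made-up helper name, `scrolls_needed`. The numbers in the usage note are invented for illustration, not real Grailed figures.

```python
def scrolls_needed(total_listings, per_page):
    """Number of scrolls: listings divided by items per page, plus 1 for safety."""
    if per_page <= 0:
        raise ValueError("per_page must be a positive number")
    # Whole pages needed, plus one extra scroll to be sure we reach the end.
    return total_listings // per_page + 1
```

With a hypothetical 1,000 listings and 40 items per page, `scrolls_needed(1000, 40)` gives 26.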
Inconsistent Elements in Posts
Solution: for each element you want to scrape, wrap the lookup in a “try/except” block so that a missing element doesn’t crash the scrape. I took the conservative route and did this for every single element.
The crossed-out grey part has the class “sub-title original-price strike-through”, while the red part has the class “sub-title new-price”. If there is no crossed-out price and just one black price, the class is “sub-title original-price”.
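To make the try/except idea concrete, here is a rough sketch for the price fields. It assumes each item is a BeautifulSoup tag parsed from the loaded page (for example from `driver.page_source`), so `select_one` returns `None` when nothing matches and asking for `.text` on it raises `AttributeError`. The helper names `text_or_none` and `extract_prices` are hypothetical.

```python
def text_or_none(item, selector):
    """Return the stripped text of the first match, or None if it is missing."""
    try:
        return item.select_one(selector).text.strip()
    except AttributeError:
        # select_one() found nothing and returned None.
        return None


def extract_prices(item):
    """Return the current price and, for marked-down items, the old price."""
    new_price = text_or_none(item, ".sub-title.new-price")
    old_price = text_or_none(item, ".sub-title.original-price.strike-through")
    if new_price is None:
        # No red "new" price, so the single black price is the current one.
        return {"price": text_or_none(item, ".sub-title.original-price"),
                "old_price": None}
    return {"price": new_price, "old_price": old_price}
```

Items without a markdown simply come back with `old_price` set to `None`, which keeps every row of the final dataframe the same shape.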
With all of that out of the way, let’s scrape!
Import all the Packages
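As a side note, you have to add your browser’s driver executable (for Chrome, that is chromedriver.exe) to PATH before you can use Selenium. Newer Selenium releases (4.6 and later) can fetch a matching driver for you, so this step may not be needed there.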
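If you are not sure whether the driver is actually on PATH, a quick check like the one below can save some head-scratching. This is just a sketch: `driver_on_path` is a made-up helper, and `chromedriver` is the name I assume for Chrome's driver.

```python
import shutil


def driver_on_path(name="chromedriver"):
    """Return True if an executable with this name can be found on PATH."""
    # shutil.which() searches PATH the same way the shell would.
    return shutil.which(name) is not None


if __name__ == "__main__":
    print("chromedriver found on PATH:", driver_on_path())
```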
At this point, you need to manually close the pop-up window that asks you to log in.
Start Scrolling Down
The end result is a dataframe that looks like this:
At this point, I have scraped all of the items, along with a link to each one. Next, I visit every single link to get more data, such as seller details, the item-specific description, and the user rating. For this step, it is best to use headless Chrome to speed up the process.
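The second pass can be sketched as a simple loop over the links. Everything named here is an assumption for illustration: `scrape_item_pages` is a made-up helper, `driver` is a Selenium WebDriver (ideally headless), and `parse` is whatever function you write to turn one item page's HTML into a dictionary of fields.

```python
import time


def scrape_item_pages(driver, links, parse, pause=1.0):
    """Visit each link and collect one row of data per item page."""
    rows = []
    for link in links:
        try:
            driver.get(link)
            # Short wait so the page can finish loading before we read it.
            time.sleep(pause)
            row = dict(parse(driver.page_source))
        except Exception as error:
            # One broken listing should not stop the whole run.
            row = {"error": str(error)}
        # Keep the link so these rows can be matched to the first table.
        row["link"] = link
        rows.append(row)
    return rows
```

Keeping the link in every row gives the two dataframes a shared column to join on later.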
The result is this:
Now that we have the two dataframes, we can clean and join them. But that process is much less fun than scraping, so I won’t show it here.
I hope this article has been helpful or just fun to read. This was my first web-scraping project and it took me a while to piece together how to do it, but I will be sure to keep exploring more efficient ways of scraping this site. Please comment below if you know any methods, or if you are a Grailed fan like me.
P.S. If anyone from Grailed is reading this, I recommend adding a feature where anytime a similar item someone has followed is listed for sale, a notification is sent to them or it is recommended on the front page.