Web Scraping Sold Clothing on Grailed with Selenium

Albert Frantz · Published in Analytics Vidhya · Jan 27, 2021 · 5 min read

Introduction:

Throughout high school and college, I have used Grailed's platform to sell used clothing that I imported from Japan. For those who may not know, Grailed is essentially an eBay for men's clothing where users can list their used items.

Ultimately, my goal is to create a tool that improves the Grailed experience by giving users an accurate price prediction for the item they are about to post. This tool would allow users to better understand the value of the items they are listing.

To create this tool, I first needed a dataset. This need for Grailed data is where my web scraping adventure began. Grailed is by no means an easy website to scrape. Some of my initial concerns were dealing with pop-ups, endless scrolling, and dropdown menus; however, as I delved deeper into web scraping Grailed, many more difficulties became apparent.

To create an accurate price prediction model, I needed sold listings rather than active, not-yet-sold listings. Unfortunately, Grailed makes it more difficult to scrape sold listings than unsold ones.

To scrape Grailed's sold listings, I leveraged Python and the Selenium package.

I would first like to recognize Mike Liu for his post on web scraping unsold Grailed listings, which can be found here; his work provided the foundation for much of my code.

Difficulties in Scraping Sold Grailed Listings:

  1. Navigating endless scrolling
  2. Pop-ups
  3. Drop-down menus

These three initial issues were just the tip of the iceberg. I originally hoped to scrape https://www.grailed.com/sold for all recently sold listings. I dropped this idea, however, after discovering that sold items on this page would only load for the first 20 page scrolls. That means only about 800 listings could be gathered from the standard sold page.

The standard Grailed sold page.
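This cap is easy to observe with a simple scroll loop. Below is a minimal sketch of one, assuming ChromeDriver and a fixed pause for listings to load; the pause length and scroll cap are illustrative values rather than my exact settings.

```python
import time
from selenium import webdriver

def scroll_to_bottom(driver, max_scrolls=25, pause=2):
    """Scroll until no new content loads or max_scrolls is reached."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for scroll in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly loaded listings time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Page height stopped growing: on grailed.com/sold this
            # happens after roughly 20 scrolls (~800 listings).
            return scroll
        last_height = new_height
    return max_scrolls

driver = webdriver.Chrome()
driver.get("https://www.grailed.com/sold")
print(scroll_to_bottom(driver), "scrolls before the page stopped loading")
```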

My workaround was to gather sold items from each of the roughly 6,000 designer-specific sold pages, which have no scroll limit. This meant I could collect far more data by cycling through each designer's sold page and gathering item information.

Example of a designer-specific sold page (Gucci).

Gathering Grailed Designer Page Links:

I first collected the links to all of the designer pages on Grailed. In total, Grailed hosts 6,028 unique designers. To collect these links, I scraped Grailed's designers page.

The Grailed designers page.
Code for collecting designer page links.
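A minimal sketch of this step, assuming ChromeDriver and that each designer link is an anchor whose href contains /designers/ (an assumption about Grailed's markup that may need adjusting):

```python
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.grailed.com/designers")

# Collect every anchor pointing at a designer page; deduplicate with a set.
anchors = driver.find_elements(By.CSS_SELECTOR, "a[href*='/designers/']")
designer_links = sorted({a.get_attribute("href") for a in anchors})

# ItemDF matches the dataframe name used later in this post.
ItemDF = pd.DataFrame({"designer_link": designer_links})
ItemDF.to_csv("designer_links.csv", index=False)
```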

Collecting the Sold Page for Each Designer:

Next, I needed to reach the sold listings for each specific designer page, rather than the current unsold listings. This initially seemed straightforward: I suspected the sold page would just require appending “/sold” to the end of each designer link. This was not the case.

For example, the Gucci designer page is https://www.grailed.com/designers/gucci, yet the Gucci sold page is https://www.grailed.com/sold/kZ4iDqlEsQ. This means there is no predictable pattern to the URLs of designer sold pages on Grailed. To work around this, I used Selenium to cycle through each designer page link, click the “show only” drop-down, close the login pop-up, and click the sold box. Once the sold page loaded, Selenium gathered the sold-page link for each of the designers in the ItemDF dataframe.

What ultimately needed to be selected to get to the sold page for each designer.
Code for collecting the sold page for each designer.
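A sketch of that click sequence, again assuming ChromeDriver; the XPath and CSS selectors for the drop-down, pop-up, and sold checkbox are assumptions about Grailed's markup and will likely need tuning:

```python
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
ItemDF = pd.read_csv("designer_links.csv")  # links from the previous step
sold_links = []

for link in ItemDF["designer_link"]:
    driver.get(link)
    time.sleep(2)  # let the listings grid render

    # Open the "show only" drop-down (selector is an assumption).
    driver.find_element(By.XPATH, "//div[contains(text(), 'Show Only')]").click()

    # The first click can trigger Grailed's login modal; dismiss it and
    # re-open the drop-down if that happens.
    try:
        driver.find_element(By.CSS_SELECTOR, ".modal .close").click()
        driver.find_element(By.XPATH, "//div[contains(text(), 'Show Only')]").click()
    except NoSuchElementException:
        pass

    # Tick the "Sold" box, then record the URL Grailed redirects to.
    driver.find_element(By.XPATH, "//label[contains(., 'Sold')]").click()
    time.sleep(2)
    sold_links.append(driver.current_url)

ItemDF["sold_link"] = sold_links
ItemDF.to_csv("designer_sold_links.csv", index=False)
```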

Gathering Specific Item Sold Links:

Next, I took all of the designer sold pages gathered in the previous section and collected the individual item links. Due to time constraints, I decided not to gather from designers with fewer than 25 sold items, and I limited collection to a maximum of four page scrolls per designer.

Example of the links that I need to gather.
Code for gathering item sold links.
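A sketch of this step under the same assumptions; the /listings/ href pattern used to identify item cards is a guess at Grailed's URL scheme:

```python
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
sold_pages = pd.read_csv("designer_sold_links.csv")["sold_link"]
item_links = []

for sold_link in sold_pages:
    driver.get(sold_link)
    time.sleep(2)

    # Cap each designer at four page scrolls to bound total run time.
    for _ in range(4):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)

    # Item cards are assumed to be anchors with '/listings/' in the href.
    anchors = driver.find_elements(By.CSS_SELECTOR, "a[href*='/listings/']")
    links = {a.get_attribute("href") for a in anchors}

    # Skip designers with fewer than 25 sold items.
    if len(links) >= 25:
        item_links.extend(links)

pd.DataFrame({"item_link": item_links}).to_csv("item_links.csv", index=False)
```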

Gathering Item Information:

Finally, I used Selenium to cycle through each item link and gather the essential item information that I sought in the first place. For each item, I gathered the following information:

  1. Username of poster
  2. Designer of posting
  3. Sub-title of the posting
  4. Size of posting
  5. Color of posting
  6. Condition of posting
  7. Category of posting
  8. Feedback count of the poster
  9. Price sold of posting
  10. Description of posting
  11. Number of images in the posting

Example of what is gathered from each item link.
Code for gathering individual item information.
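A sketch of the final loop; every CSS selector below is a placeholder for whatever Grailed's item-page markup actually uses, and the safe_text helper simply returns None when a field is missing so one bad listing does not stop the run:

```python
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

def safe_text(driver, css):
    """Return the element's text, or None when the field is absent."""
    try:
        return driver.find_element(By.CSS_SELECTOR, css).text
    except NoSuchElementException:
        return None

driver = webdriver.Chrome()
item_links = pd.read_csv("item_links.csv")["item_link"]
rows = []

for link in item_links:
    driver.get(link)
    time.sleep(2)
    rows.append({
        "username":    safe_text(driver, ".seller-name"),
        "designer":    safe_text(driver, ".designer"),
        "sub_title":   safe_text(driver, ".sub-title"),
        "size":        safe_text(driver, ".size"),
        "color":       safe_text(driver, ".color"),
        "condition":   safe_text(driver, ".condition"),
        "category":    safe_text(driver, ".category"),
        "feedback":    safe_text(driver, ".feedback-count"),
        "price_sold":  safe_text(driver, ".sold-price"),
        "description": safe_text(driver, ".description"),
        "n_images":    len(driver.find_elements(By.CSS_SELECTOR, ".image-gallery img")),
    })

sold_df = pd.DataFrame(rows)
sold_df.to_csv("grailed_sold_items.csv", index=False)
```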

Conclusion:

After the sold listings had been scraped, I cleaned the data. In the end, the Grailed sold dataset contained 103,537 total listings. The above code can easily be adapted to scrape every sold post on Grailed by increasing the scroll limit. The catch is that even gathering around 100,000 listings took around two weeks of run time, so larger data collection will require patience.

Below is the final dataframe:

Feel free to download the full dataframe from my GitHub, found here.
