Web Scraping Myntra’s Product Listings

Tamana R
4 min read · Nov 7, 2023


(Image from a Google search)

Hi. This is my first article on Medium. Let me begin by introducing myself. My name is Tamana, and I have recently transitioned from a Commerce background to an Analytics background. During this transition and learning phase, Medium was a platform I visited often. The articles here provided valuable insights and guidance. Now, I am excited to share what I have learnt.

While working on a project recently, I needed to scrape data from Myntra's web application and went looking for help online. After searching and scrolling through many different sources, I gathered the necessary information and compiled it into one block of code that is simple and convenient for scraping the data (at least for beginners like me).

I have used Selenium for scraping the data. Web scraping with Selenium is a powerful technique that enables you to extract data from websites by automating the interaction with web pages. Unlike traditional web scraping methods that rely on parsing HTML or using APIs, Selenium allows you to control a web browser programmatically, making it a versatile tool for extracting data from dynamic and interactive websites.

Let’s dive into the heart of web scraping with Selenium. We’ll start by importing the required libraries:

import pandas as pd              # to store the scraped data in a DataFrame
from bs4 import BeautifulSoup    # to parse the page HTML
import requests
from selenium import webdriver   # to drive the browser
import time

Next, we set up the Selenium web driver and create a function that accepts a search term as its input (allowing you to choose any product category you wish to scrape). Concurrently, we initialize lists to store the specific data we’re interested in. To enhance the readability of the web application’s content, I employed the Beautiful Soup HTML parser, making it easier to navigate and subsequently extract data based on HTML tags and classes. To illustrate,

try:
    brand = soup.find_all('h3', class_="product-brand")
    for a in brand:
        brands.append(a.text)
except AttributeError:
    continue
The snippet above shows how we locate the HTML tag and class for a specific section. To inspect a page, press Ctrl+Shift+C and, as you move your cursor, the browser will show you the tags for the sections you are interested in.

Below is the entire code block that simplifies the data extraction process from Myntra, a popular online fashion store.

def search_url(search_term, page_number):
    # Build the Myntra listing URL for a search term and page number
    template = 'https://www.myntra.com/{}?rawQuery={}&p={}'
    return template.format(search_term, search_term, page_number)

driver = webdriver.Chrome()

org_url = input('enter your search term: ')

# Lists to hold the scraped fields
brands = []
price = []
original_price = []
description = []
ratings = []
product_url = []

# Scrape the first 10 pages of results
for i in range(1, 11):
    driver.get(search_url(org_url, i))

    soup = BeautifulSoup(driver.page_source, 'html.parser')

    # brand
    try:
        brand = soup.find_all('h3', class_="product-brand")
        for a in brand:
            brands.append(a.text)
    except AttributeError:
        continue

    # price
    try:
        pr = soup.find_all('span', class_="product-discountedPrice")
        for b in pr:
            price.append(int(b.text.strip('Rs. ')))
    except AttributeError:
        price.append(None)

    # pad with None so price stays aligned with brands
    missing_count = len(brands) - len(price)
    if missing_count > 0:
        price.extend([None] * missing_count)

    # original price
    try:
        mrp = soup.find_all('span', class_='product-strike')
        for c in mrp:
            original_price.append(int(c.text.strip('Rs. ')))
    except AttributeError:
        original_price.append(None)

    missing_counts = len(brands) - len(original_price)
    if missing_counts > 0:
        original_price.extend([None] * missing_counts)

    # description
    try:
        des = soup.find_all('h4', class_='product-product')
        description.extend([d.text for d in des])
    except AttributeError:
        pass

    # product url
    try:
        li_elements = soup.find_all('li', class_="product-base")
        for d in li_elements:
            a_elements = d.find_all('a', {'data-refreshpage': 'true', 'target': '_blank'})
            for a in a_elements:
                href = 'http://myntra.com/' + a['href']
                product_url.append(href)
    except AttributeError:
        pass

driver.close()

This code block was refined through multiple trials. It enables you to scrape essential data from Myntra, including brand names, prices, original prices, product descriptions, and product URLs.
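One note: the ratings list is initialized in the code above but never filled. If you also want to capture ratings, a minimal sketch in the same style could be added inside the page loop; the tag and class used below are my assumption, so verify them with Ctrl+Shift+C as described earlier.

# Hypothetical ratings extraction, to be placed inside the page loop.
# The selector ('div' with class "product-ratingsContainer") is an assumption;
# inspect the live page to confirm it before relying on it.
try:
    rt = soup.find_all('div', class_="product-ratingsContainer")
    for r in rt:
        ratings.append(r.text.strip())
except AttributeError:
    ratings.append(None)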

Here is a snippet of how your scraped data would look after storing it as a DataFrame:

I have scraped information for the product listings of lipsticks.

For the full project code and detailed information, visit my GitHub repository: https://github.com/TamanaKhatri/Web-scraping-using-Selenium

After scraping the data, you can store it in a DataFrame and save it to your local system as a CSV file. The acquired data can then be used for further analysis.
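As a minimal sketch of this step (the column names and the output file name myntra_products.csv are illustrative choices, not part of the scraper above):

# Combine the scraped lists into a DataFrame and save it as a CSV file.
# This assumes all the lists ended up the same length; pad them with None
# (as done for price and original_price above) if they did not.
df = pd.DataFrame({
    'brand': brands,
    'description': description,
    'price': price,
    'original_price': original_price,
    'product_url': product_url,
})
df.to_csv('myntra_products.csv', index=False)   # save to your local system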

Conclusion

Web scraping with Selenium is a powerful technique for extracting valuable data from the web, and it has certainly been a game-changer in my transition to Analytics. I hope you find this introduction to web scraping useful and that it inspires you to explore the endless possibilities it offers. Feel free to leave comments, ask questions, or share your own experiences with web scraping. Let’s embark on this exciting journey together.

Thank you for joining me on this adventure, and I look forward to sharing more insights with you in the future. Happy web scraping!

You can reach out to me for any doubts related to this project on khatritamana12@gmail.com
