How to scrape data from Crown & Caliber using Python
Welcome again! Crown & Caliber is a premier buyer and seller of pre-owned luxury watches, including Rolex, Cartier, and OMEGA. We’ll be using Python, so if you have no experience with it, I would recommend brushing up on the language first.
As always, before web scraping we have to figure out which specific data points we need. Looking at a page, I immediately saw a ton of different data points we could get. In this project we’ll scrape the product name, product link, and price of the products on Crown & Caliber.
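To make the three data points concrete, here is a minimal sketch of a record that could hold one scraped product. The class name WatchListing and its field names are my own assumptions for illustration; they are not part of Selenium pro, and the values below are placeholders, not real listings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchListing:
    """One scraped product (hypothetical container, not part of selenium-pro)."""
    name: str
    link: str
    price: str  # kept as the raw text shown on the page


# Placeholder values only; real values come from the scraper later on.
listing = WatchListing(
    name="Example Watch",
    link="https://www.crownandcaliber.com/",
    price="$0.00",
)
print(listing.name, listing.price, listing.link)
```

Keeping the price as raw text at this stage means nothing is lost if the site formats prices in a way we did not expect.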
Let’s begin!
Basic introduction you could probably skip that I copied from my other article
First things first: make sure you have Python and an IDE installed. Selenium pro is a web scraping package that lets us drive a web browser from Python, so some background in web scraping will help. Selenium pro package: https://pypi.org/project/selenium-pro/
pip install selenium-pro
Installing extension
Download Selenium Auto Code Generator from the Chrome Web Store. Instead of copying and pasting XPaths by hand, this tool generates the code for you and removes that hassle. Download from here: https://chrome.google.com/webstore/detail/selenium-auto-code-genera/ocimgcpcnobcnmclomhhmjidgoiekeaf/related
Let’s get started!
Now that we have our Python environment set up, let’s open a blank Python script and import the Selenium pro package installed in the previous step (pip install selenium-pro). Import the following packages:
from selenium_pro import webdriver
import time
from selenium_pro.webdriver.common.keys import Keys
We are using Google Chrome as our browser, but you can use other browsers with Selenium pro; if you’d like to use a different one, go for it! Just make sure that browser is installed on your machine.
Now, within Selenium pro we need to define our web browser, so let’s do so by using the following line of code:
driver = webdriver.Start()
I would recommend running all of your code up to this point to check that it runs successfully. If it does, you’re pretty much ready to continue!
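If the script fails at the import stage, one quick check is whether Python can find the package at all. The sketch below uses only the standard library; the helper name is_installed is hypothetical, and it only checks top-level module names such as selenium_pro.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# "selenium_pro" is the import name used in this tutorial.
for name in ("selenium_pro", "time"):
    status = "found" if is_installed(name) else "missing (try pip install)"
    print(name, "->", status)
```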
Crown & Caliber Pipeline
Next up is the fun part. Click the DK extension we installed earlier, then click “start recording”. This definitely won’t be a complex problem, but thankfully you have me here anyway.
Open the Crown & Caliber website, add a 3-second wait for the page to load, then search for a keyword and press Enter. To add the wait event, right-click on the page and choose wait -> 3. Now, if you click on the extension, you will find the generated code there, as below.
# to open the url in browser
driver.get('https://www.crownandcaliber.com/')
time.sleep(3)
Great! This points our Python-controlled Chrome browser to the website above. The time.sleep(3) call just tells Python to wait 3 seconds before continuing; it is not strictly necessary, but I put it in anyway.
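A fixed sleep either wastes time or is too short on a slow connection. As an alternative, here is a minimal sketch of a polling wait written with the standard library only. The helper wait_until is hypothetical and not part of Selenium pro; the condition you pass in (for example, a function that checks whether an element can be found) is up to you.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Returns True if the condition was met, False if we gave up.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in condition for demonstration: becomes true on the third check.
checks = {"count": 0}


def page_ready():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(page_ready, timeout=2.0, interval=0.01))
```

The script continues as soon as the condition holds, and the timeout keeps it from hanging forever if the page never loads.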
After that, the extension records the click: driver.find_element_by_pro looks up the element by its recorded ID, and click_pro() clicks it.
# to click on the element found
driver.find_element_by_pro('n8N5dLseJA0ZxwJ').click_pro()
Then type('watch') enters the keyword “watch” and type('Enter') presses the Enter key. These play the role of send_keys('watch') and send_keys(Keys.ENTER) in standard Selenium.
# to type content in input field
driver.find_element_by_pro('V2OcJ4QxNSn2tKU').type('watch')
# press Enter key
driver.switch_to.active_element.type('Enter')
Copy the code from the extension and test it up to this point.
Getting The Data
Awesome! Now resume recording. After searching for the keyword on the Crown & Caliber website, hover over a product title, right-click, and choose scrape -> text to get the product text.
In the same way, you can scrape the link, description, and price. In the extension, your actions will be recorded as below:
# list_element is never defined in the recorded snippet, so this version
# looks each element up from driver instead
# to fetch the text of element
title_1 = driver.find_element_by_pro('Ix470DWysgh5hvf').text
# to fetch the text of element
title_2 = driver.find_element_by_pro('AdGgHQwac5DkWkB').text
# to fetch the text of element
title_3 = driver.find_element_by_pro('A5iPVEbRrjfHxAD').text
# to fetch the text of element
price = driver.find_element_by_pro('kZZwvGf5kSiWl2R').text
# to fetch the link of element
link = driver.find_element_by_pro('lYgU15XKz5O5lnk').get_attribute('href')
We are done here, believe it or not.
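The price comes back as text, which is awkward if you later want to sort or compare watches. Below is a minimal sketch of turning such text into a number. It assumes the price looks something like "$5,495.00" (a currency symbol, comma thousands separators, optional decimals); I have not verified the exact format the site uses, and parse_price is a hypothetical helper.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Extract the first number from price text, or return None if there is none.

    Assumes commas are thousands separators and a dot marks the decimals.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


print(parse_price("$5,495.00"))
print(parse_price("Price on request"))
```

Decimal is used instead of float so that amounts of money are not affected by binary rounding.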
Complete Code
In case you got stuck or confused, here is the entire code for this project:
from selenium_pro import webdriver
import time
from selenium_pro.webdriver.common.keys import Keys
driver = webdriver.Start()
# to open the url in browser
driver.get('https://www.crownandcaliber.com/')
time.sleep(3)
# to click on the element found
driver.find_element_by_pro('n8N5dLseJA0ZxwJ').click_pro()
# to type content in input field
driver.find_element_by_pro('V2OcJ4QxNSn2tKU').type('watch')
# press Enter key
driver.switch_to.active_element.type('Enter')
time.sleep(3)
# list_element is never defined in the recorded snippet, so this version
# looks each element up from driver instead
# to fetch the text of element
title_1 = driver.find_element_by_pro('Ix470DWysgh5hvf').text
# to fetch the text of element
title_2 = driver.find_element_by_pro('AdGgHQwac5DkWkB').text
# to fetch the text of element
title_3 = driver.find_element_by_pro('A5iPVEbRrjfHxAD').text
# to fetch the text of element
price = driver.find_element_by_pro('kZZwvGf5kSiWl2R').text
# to fetch the link of element
link = driver.find_element_by_pro('lYgU15XKz5O5lnk').get_attribute('href')
# print the scraped data points to the console
print(title_1, title_2, title_3, price, link)
Running This Program
Now, to run this program, copy the code from the extension and save it as a .py file. Then open your terminal or command prompt and type in the following line:
python3 PATH/TO/YOUR/.PY/FILE
Or, if you’re using an IDE such as PyCharm, just run the program within it. When you run this program, you will see the Chrome browser open up and wait a few seconds; the data points are then printed to the Python console, provided the script prints the scraped variables at the end.
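Printing is fine for a first run, but you will probably want to keep the results. Here is a minimal sketch that writes scraped rows as CSV using only the standard library. The function name write_listings and the column names are my own assumptions, and the row shown is a placeholder rather than real scraped data.

```python
import csv
import io


def write_listings(rows, out):
    """Write a list of dicts with title/price/link keys as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=["title", "price", "link"])
    writer.writeheader()
    writer.writerows(rows)


# Placeholder row; in the real script these values come from the scraper.
rows = [
    {"title": "Example Watch", "price": "$0.00", "link": "https://www.crownandcaliber.com/"},
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_listings(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open("watches.csv", "w", newline="") in place of the buffer; the file name here is just an example.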
Congrats! I would recommend looking at ways in which you can improve on this project. Can you add a front end where people can post their links? Can you add a loop to scrape the links and text from all the pages? Otherwise, you should be proud of yourself for making it through this tutorial!
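As a starting point for the "all pages" idea, here is a rough sketch of the loop on its own, separated from any browser code. It assumes you write a function that scrapes one results page given its page number and returns a list of rows, with an empty list meaning there are no more results. Both scrape_all_pages and the fake site used below are hypothetical stand-ins; how you actually move between pages on the site is left to you.

```python
def scrape_all_pages(scrape_page, max_pages=50):
    """Call scrape_page(1), scrape_page(2), ... and collect the rows.

    Stops at the first empty page or after max_pages, whichever comes first.
    """
    results = []
    for page_number in range(1, max_pages + 1):
        rows = scrape_page(page_number)
        if not rows:
            break
        results.extend(rows)
    return results


# Stand-in for the real site: two pages of made-up rows, then nothing.
fake_site = {1: ["watch A", "watch B"], 2: ["watch C"]}
all_rows = scrape_all_pages(lambda n: fake_site.get(n, []))
print(all_rows)
```

The max_pages cap is a safety net so that a page which never comes back empty cannot keep the loop running forever.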