How to scrape data from KillStar using Python
Welcome again! Another website: KillStar, a fashion and lifestyle brand with a twist of darkness, channeling emotional power and raw energy into every thread. We’ll be using Python, so if you have no experience with the language, I’d recommend brushing up on it first.
As always, before web scraping we have to decide which specific data points we need. After looking at a product listing page, I saw a ton of different data points we could grab; in this project we’ll be scraping the product name, product link, price, and review count.
Let’s begin!
A basic introduction you could probably skip (copied from my other article)
First things first: make sure you have Python and an IDE installed. Selenium Pro is a web scraping package that lets us drive a web browser from Python, so some basic familiarity with web scraping will help. Selenium Pro package — https://pypi.org/project/selenium-pro/
pip install selenium-pro
Installing extension
Download the Selenium Auto Code Generator extension from the Chrome Web Store. Instead of copy-pasting XPaths by hand, this tool records your actions and generates the code for you, without the copy-paste hassle. Download it from here — https://chrome.google.com/webstore/detail/selenium-auto-code-genera/ocimgcpcnobcnmclomhhmjidgoiekeaf/related
Let’s get started!
Now that our Python environment is set up, let’s open a blank Python script and import the Selenium Pro package that you hopefully installed in the previous section (just pip install selenium-pro). Once installed, import the following packages:
from selenium_pro import webdriver
import time
from selenium_pro.webdriver.common.keys import Keys
We are using the Google Chrome browser as our GUI, but you can use other browsers within Selenium Pro; if you’d like to use a different browser, go for it! Just make sure that browser is installed on your machine.
Now, within Selenium pro we need to define our web browser, so let’s do so by using the following line of code:
driver = webdriver.Start()
I would recommend running all of your code up to this point to see if it runs successfully; if so, you’re pretty much ready to continue!
KillStar Pipeline
Next up is the fun part. Click the Selenium Auto Code Generator extension we installed earlier and click “Start recording”. This definitely won’t be a complex problem, but thankfully you have me here anyway.
Open the KillStar website, add a wait of 3 seconds for the site to load, then search for the keyword on the site and press Enter. To add the wait event, right-click on the screen and click wait -> 3. Now, if you click on the extension, you will find the code already generated, as below.
# to open the url in browser
driver.get('https://www.killstar.com/')
time.sleep(3)
Great! This points our Python-controlled Chrome browser at the website above. The time.sleep(3) call just tells Python to wait 3 seconds before continuing; it isn’t strictly necessary, but it’s something I put in anyway.
After that, the extension records a driver.find_element_by_pro(...) call with an element ID, and .click() clicks the element it finds:
# to click on the element found
driver.find_element_by_pro('8hOdJ2ynXAqHDBO').click()
time.sleep(2)
# to click on input field
driver.find_element_by_pro('skbNql5q1rppQft').click()
Then send_keys('black') types the keyword “black” and send_keys(Keys.ENTER) presses Enter:
# to type content in input field
driver.find_element_by_pro('cHffT3erKJoE7nF').send_keys('black')
# press Enter key
driver.switch_to.active_element.send_keys(Keys.ENTER)
Copy the code from the extension and test it up to this point.
Getting The Data
Awesome! So let’s resume recording. After entering the keyword on the KillStar website, hover over a product title, then right-click and click scrape -> text to get the product text.
Now, in the same way, you can scrape the title, link, price, and reviews. In the extension, your actions will be mimicked like below (list_element refers to the product-card element the extension records when you scrape items inside a list):
# to fetch the text of element
product_name=list_element.find_element_by_pro('EFVHNXOc6W087AI').text
# to fetch the link of element
product_link=list_element.find_element_by_pro('rHMtu3iTZhyZwAA').get_attribute('href')
# to fetch the text of element
price=list_element.find_element_by_pro('jX43MMdNDT5bREb').text
# to fetch the text of element
reviews=list_element.find_element_by_pro('vmzW6NqMmlemyB9').text
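The price you scrape comes back as raw text (something like “£45.00”, possibly with sale labels mixed in). Here’s a minimal sketch of a helper that normalizes such a string into a float; the exact text format is an assumption, so adjust the pattern to whatever your run actually returns.

```python
import re

def parse_price(price_text):
    """Extract the first decimal number from a scraped price string.

    Assumes the text contains at most one price like '£45.00' or '$1,299.99';
    returns None if no number is found (e.g. 'Sold out').
    """
    match = re.search(r"\d+(?:\.\d+)?", price_text.replace(",", ""))
    return float(match.group()) if match else None

print(parse_price("£45.00"))    # 45.0
print(parse_price("Sold out"))  # None
```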
And we’re done here… believe it or not.
Complete Code
In case you got stuck or confused, here is the entire code for this project:
from selenium_pro import webdriver
import time
from selenium_pro.webdriver.common.keys import Keys
driver = webdriver.Start()
# to open the url in browser
driver.get('https://www.killstar.com/')
time.sleep(3)
# to click on the element found
driver.find_element_by_pro('8hOdJ2ynXAqHDBO').click()
time.sleep(2)
# to click on input field
driver.find_element_by_pro('skbNql5q1rppQft').click()
# to type content in input field
driver.find_element_by_pro('cHffT3erKJoE7nF').send_keys('black')
# press Enter key
driver.switch_to.active_element.send_keys(Keys.ENTER)
time.sleep(3)
# list_element is the product-card element recorded by the extension
# to fetch the text of element
product_name=list_element.find_element_by_pro('EFVHNXOc6W087AI').text
# to fetch the link of element
product_link=list_element.find_element_by_pro('rHMtu3iTZhyZwAA').get_attribute('href')
# to fetch the text of element
price=list_element.find_element_by_pro('jX43MMdNDT5bREb').text
# to fetch the text of element
reviews=list_element.find_element_by_pro('vmzW6NqMmlemyB9').text
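Once you have the data points, you’ll probably want them somewhere more durable than the console. Here’s a small sketch using Python’s built-in csv module; the example row values are placeholders standing in for the variables scraped above, not real KillStar data.

```python
import csv

def save_rows(path, rows):
    """Write scraped product rows to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["product_name", "product_link", "price", "reviews"])
        writer.writerows(rows)

# Example usage with placeholder values for the scraped variables:
save_rows("killstar_products.csv",
          [("Example Hoodie", "https://www.killstar.com/products/example",
            "£45.00", "12 reviews")])
```

In the real script you would append one tuple per product inside your scraping loop and call save_rows once at the end.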
Running This Program
Now, to run this program, copy the code from the extension and save it as a .py file, then open your terminal / command prompt and type in the following line:
python3 PATH/TO/YOUR/.PY/FILE
Or if you’re using an IDE such as PyCharm, just run the program within it. When you run it, you will see the Chrome browser open up, wait a few seconds, and then print the data points to the Python console!
Congrats! I would recommend looking at ways you can improve on this project: Can you add a front end where people can submit their own links? Can you add a loop to scrape the links and text from all the result pages? Otherwise, you should be proud of yourself for making it through this tutorial!
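If you want to try the “all pages” improvement, one common approach is to loop over paginated search URLs. The ?q=...&page=... query parameters below are an assumption about how the site paginates its search results; check the actual URL in your browser’s address bar before relying on this pattern.

```python
from urllib.parse import urlencode

def search_page_urls(keyword, pages):
    """Build a list of hypothetical paginated search URLs for a keyword."""
    base = "https://www.killstar.com/search"
    return [f"{base}?{urlencode({'q': keyword, 'page': page})}"
            for page in range(1, pages + 1)]

for url in search_page_urls("black", 3):
    print(url)
    # in the real script, you would call driver.get(url) here
    # and repeat the scraping steps from above for each page
```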