Web-Scraping Kijiji Ads with Python & BeautifulSoup

Kaine Black
5 min read · Feb 3, 2022


Python and Kijiji logos

There are countless websites on the internet today, and they contain a wealth of information and data … but how can we collect and use that data?

Well, one common answer to that question is to use a programming technique called web-scraping.

Web-scraping is the process of collecting web data in an automated fashion and storing it in a more structured, user-friendly format for future analysis.

Kijiji

Kijiji is a website where users can post different products/services that they are offering, and reach a wide audience of potential buyers.

Users can search for products and services on Kijiji based on location, category, keyword search, and many other filters. The image below is an example of a list view (left) of ads, and clicking on an ad takes you to a more detailed individual ad view (right):

Use Case Scenario — Watch Company

Let’s pretend that we are working for a client who is an avid watch enthusiast. Their company buys used watches, refurbishes them, and resells them for a profit.

The client uses a variety of websites, including Kijiji, to buy used watches that come up for sale. They are a Canadian company and do all of their buying and selling within Canada. The client doesn’t have time to sift through all of the ads, so they want a quick way to look at the watches being sold on Kijiji across Canada.

We determine that web-scraping is a good solution for this problem, and decide to build a simple web-scraping tool in Python.

Setting Up The Proper Search

First, we will set up a search on Kijiji:

  • Set the category to “Jewelry & Watches”
  • Set the search text to “watch” in an attempt to filter out other jewelry in this category
  • Set location to “Canada” since the client only wishes to see watches being sold in Canada

So our search should look like this:

You’ll notice that when we complete the search, our URL will change and look something like the URL below; it now contains our search parameters!

It is also important to note that if we switch pages, the URL changes again and a parameter for the page number appears:
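As a rough sketch in Python, you can copy that URL out of the browser’s address bar and build the per-page URLs from it. Note that the path and category code below (the k0c133l0 part) are placeholders taken from my own search, and the page-N pattern should be confirmed by clicking to page 2 in your own browser:

# Placeholder search URL; copy the real one from your browser after running the search above
BASE_SEARCH_URL = "https://www.kijiji.ca/b-jewellery-watches/canada/watch/k0c133l0"

def page_url(page_number):
    # Kijiji adds a "page-N" segment to the search URL for pages after the first
    # (confirm the exact pattern in your own browser before relying on it)
    if page_number <= 1:
        return BASE_SEARCH_URL
    return BASE_SEARCH_URL.replace("/k0c133l0", f"/page-{page_number}/k0c133l0")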

Web-Crawling — Collecting Ad URLs

Now that we have our search configured, we will land on a list view of some Kijiji ads that meet our search requirements. We want to grab the URLs for each ad so that we can process the information on the individual advertisement pages in the next step.

Time to build out some Python code! Let’s start by importing our libraries.
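A typical set of imports for a small scraper like this (requests to fetch the pages, BeautifulSoup to parse the HTML, and pandas to collect the results and write the CSV) would look something like:

import requests                   # fetch the HTML for the search page and each ad
from bs4 import BeautifulSoup     # parse the HTML we get back
import pandas as pd               # collect the scraped records and write them to CSV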

Before we get into the scraping logic itself, a great place to start is to right-click the page of interest and click “Inspect”:

inspect page

This allows us to inspect the HTML code that makes up the webpage we are viewing. Furthermore, if we switch on the “inspect on hover” option (shown in the image below), we can inspect elements as we hover over them with our mouse!

example of inspecting with hover option

Using the inspect tool, we can see that each ad is contained in a div element with a search-item class. Also, the title of each ad sits within an a element with a title class and an associated URL (href).

Now that we have the above information, we can do some HTML parsing with the BeautifulSoup library in Python!

gathering ad URLs
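A minimal sketch of that parsing step, using the search-item and title classes identified above (the User-Agent header is my own illustrative addition, and BASE_SEARCH_URL is the search URL from the earlier section):

headers = {"User-Agent": "Mozilla/5.0"}   # some sites reject requests with no browser user agent

# fetch the search results page and parse it
response = requests.get(BASE_SEARCH_URL, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

# each ad sits in a div with the search-item class, and its title link is an a with the title class
ad_links = []
for ad in soup.find_all("div", class_="search-item"):
    title_link = ad.find("a", class_="title")
    if title_link and title_link.get("href"):
        # the href is relative, so prepend the domain to build the full ad URL
        ad_links.append("https://www.kijiji.ca" + title_link["href"])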

The above code grabs the URL for each ad on the page and stores it in a list variable called ad_links. This step of web-scraping is often referred to as web-crawling because our tool searches for all of the relevant URLs to visit before scraping the desired data from each one.

Web Scraping — Collecting Data

Now that we have the URLs for the pages we wish to scrape, we can follow the same steps as above to find the HTML elements that contain the information we want to collect:

  1. Visit an example advertisement
  2. Use the inspect tool to find the HTML elements we want to collect
  3. Parse the page with BeautifulSoup to get the desired data

Here are the fields I decided to collect from each ad:

  • Ad Title, Ad Price, Date Posted, Address, Description, URL

We can use the following code to gather this information from each ad:

scraping data from each ad
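Here is a minimal sketch of that scraping loop. The element selectors (h1 for the title, itemprop attributes for the price, address, and description) are illustrative assumptions rather than guaranteed Kijiji markup, so confirm each one with the inspect tool; headers and ad_links come from the crawling step above:

def get_text(soup, tag, **attrs):
    # return the stripped text of the first matching element, or None if it isn't on the page
    element = soup.find(tag, **attrs)
    return element.get_text(strip=True) if element else None

records = []
for url in ad_links:
    page = requests.get(url, headers=headers)
    ad_soup = BeautifulSoup(page.text, "html.parser")
    records.append({
        "title": get_text(ad_soup, "h1"),
        "price": get_text(ad_soup, "span", itemprop="price"),
        "date_posted": get_text(ad_soup, "time"),
        "address": get_text(ad_soup, "span", itemprop="address"),
        "description": get_text(ad_soup, "div", itemprop="description"),
        "url": url,
    })

# collect everything into a DataFrame and write out the final CSV
df = pd.DataFrame(records)
df.to_csv("kijiji_watch_ads.csv", index=False)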

Voilà! We have now successfully web-scraped data from watch advertisements on the Kijiji website!

Here is an example of what the output CSV looks like:

final CSV of scraped data

Additional Improvements to Think About

If you want to take this example a few steps further, here are a few things you could think about and try out yourself!

  • Can we collect the images from the ads as well?
  • How could we automate the scraping for multiple pages of ads?
  • How could we make a better OOP (object oriented programming) solution?
  • How could we fix the data type issue with the price column?
    (it currently contains a $ symbol either before or after the number; one possible approach is sketched after this list)
  • If we want to perform this scraping on a regular basis, how could we make sure we aren’t duplicating data in our DB?
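For the price column, for instance, a minimal sketch could look like the following, assuming the scraped values are strings like “$250.00” or “Please Contact” and that df is the DataFrame built in the scraping step:

import re

def parse_price(raw):
    # pull a numeric value out of strings like "$250.00" or "250$";
    # return None for non-numeric values such as "Please Contact"
    if not raw:
        return None
    match = re.search(r"[\d,]+(?:\.\d+)?", raw)
    return float(match.group().replace(",", "")) if match else None

df["price_numeric"] = df["price"].apply(parse_price)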

Conclusion

Web-scraping is a powerful automation tool with a wide variety of use cases, and it is a valuable skill to have in your tool belt. Practicing HTML parsing and building your own simple web-scraping scripts, like the one presented in this article, is a great way to develop your web-scraping skills!

I hope this article was helpful for you, and thanks for reading!

Connect with me on LinkedIn! 💡👤
