An Introduction to Web Scraping With Python

A starter’s guide to extracting data from websites

Yohan Kulasinghe
LinkIT
2 min read · May 15, 2020

Image by kreatikar from Pixabay

For many purposes, you may need to extract data from websites. This is known as “web scraping”. In this article, you’ll learn how to perform a basic web scraping task using Python.

First, you need to install Python on your machine.

You can download and install Python for any platform from https://www.python.org/downloads/

There are many ways to perform web scraping. I will mainly discuss web scraping using the “requests” and “BeautifulSoup” libraries.

Python does not include these two libraries by default, so you have to install them yourself.

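Let’s see how to install the required libraries. Open your preferred terminal and install both with pip, for example by running “pip install requests” followed by “pip install beautifulsoup4” (BeautifulSoup is published on PyPI under the name beautifulsoup4).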
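
Once the install has finished, a quick check like the one below can confirm that Python can see both packages. This is only a sketch: the helper name missing_packages is made up for illustration, and the check relies on BeautifulSoup being installed as beautifulsoup4 but imported as bs4.

```python
# Install step, run in a terminal rather than inside Python:
#   pip install requests
#   pip install beautifulsoup4
import importlib.util


def missing_packages(import_names):
    """Return the import names that Python cannot find (hypothetical helper)."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# BeautifulSoup is published as "beautifulsoup4" but imported as "bs4".
missing = missing_packages(["requests", "bs4"])
print("Missing:", missing if missing else "none, you are ready to go")
```

If either name shows up as missing, rerun the matching pip command in the same Python environment you use to run your script.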

Now we are ready to use Python for web scraping. Let’s write a little code.

Step 1 — Import the requests and BeautifulSoup libraries
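
The imports for this step would typically look like the two lines below. Note that BeautifulSoup is imported from the bs4 package, which is the import name of the beautifulsoup4 distribution.

```python
import requests                # sends HTTP requests and returns the responses
from bs4 import BeautifulSoup  # parses HTML text into a searchable tree
```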

Step 2 — Connect to the web page you want to scrape. (Here I have used the well-known “lorem” site for this demonstration.)
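
A minimal sketch of this step is shown below. The article does not spell out the address of the “lorem” site, so the URL here is an assumption, and fetch_soup is a hypothetical helper name. The timeout and the raise_for_status() call are small additions so that a slow or failing request shows up as an error instead of being parsed silently.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: the "lorem" site is taken to be this address.
URL = "https://www.lipsum.com/"


def fetch_soup(url, timeout=10):
    """Download a page and return it as a BeautifulSoup tree (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    # "html.parser" is the parser bundled with Python, so nothing extra is needed.
    return BeautifulSoup(response.text, "html.parser")
```

With a working network connection, calling soup = fetch_soup(URL) and then print(soup.title) should show the page’s title element.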

Step 3 — Open the page and examine its DOM structure. (You can do this by inspecting the HTML in any web browser.)

On the “lorem” site, titles are marked with <h2> tags, so we can collect all of them with a single call to BeautifulSoup’s find_all method.
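
To illustrate the lookup, here is a small sketch that uses find_all on stand-in markup, so it runs without a network connection. The sample HTML is made up for this example and is not the real page.

```python
from bs4 import BeautifulSoup

# Stand-in markup (made up for this example) so the snippet runs offline.
SAMPLE_HTML = """
<html><body>
  <h2>First title</h2>
  <p>Some paragraph text.</p>
  <h2>Second title</h2>
  <p>More paragraph text.</p>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every matching tag, in document order.
headings = soup.find_all("h2")
print(headings)
```

Each item in the result is a tag object, not plain text, which is why a further extraction step is needed.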

Step 4 — Loop through the result and extract what you want
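
A loop along these lines would pull the text out of each heading. As before, the sample markup is invented so the snippet runs on its own. The list is named topics to match the name used in this article.

```python
from bs4 import BeautifulSoup

# Stand-in markup (made up for this example).
SAMPLE_HTML = "<h2> First title </h2><p>text</p><h2>Second title</h2>"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

topics = []
for heading in soup.find_all("h2"):
    # get_text(strip=True) returns the tag's text without surrounding whitespace.
    topics.append(heading.get_text(strip=True))
```

The same loop can collect other things, such as an attribute value, depending on what you want to extract.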

Step 5 — You can view the output by simply printing the “topics” list
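
For example, assuming topics already holds the extracted titles (the values below are placeholders, not real scraped data), printing could look like this:

```python
# Placeholder values standing in for the titles collected in Step 4.
topics = ["First title", "Second title"]

# Print the whole list at once...
print(topics)

# ...or one numbered title per line.
for number, topic in enumerate(topics, start=1):
    print(f"{number}. {topic}")
```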

Refer to the BeautifulSoup library documentation for more advanced use cases:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Here I have added the full code for your reference
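
Putting the five steps together, a complete script might look like the sketch below. It is a reconstruction under stated assumptions, not necessarily the exact code behind this article: the URL is an assumed address for the “lorem” site, extract_topics and scrape_topics are hypothetical helper names, and the timeout and error handling are additions.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: the "lorem" site is taken to be this address.
URL = "https://www.lipsum.com/"


def extract_topics(html):
    """Return the text of every <h2> tag found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    topics = []
    for heading in soup.find_all("h2"):
        topics.append(heading.get_text(strip=True))
    return topics


def scrape_topics(url, timeout=10):
    """Download the page at url and return its <h2> titles."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return extract_topics(response.text)


if __name__ == "__main__":
    try:
        print(scrape_topics(URL))
    except requests.RequestException as error:
        # Covers connection problems, timeouts and HTTP error statuses.
        print(f"Could not fetch {URL}: {error}")
```

Keeping the parsing in its own function means you can try it on any HTML string without touching the network.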

Stay tuned for more articles!


Yohan Kulasinghe
LinkIT

Undergraduate at Faculty of IT, University of Moratuwa