Web Scraper with Python

A simple Python web scraper in 5 lines of code

Oliver Lövström
Internet of Technology
2 min read · Feb 2, 2024


In this article, we’re building the simplest possible web scraper. While the idea of “scraping” a webpage for useful information might seem daunting, today’s lesson will demonstrate how simple it can actually be.

Photo by amol sonar on Unsplash

In this guide, we will show you how to create a functional web scraper in just five lines of code, covering the following steps:

  • Import libraries (2 lines)
  • Send a GET request to a URL (1 line)
  • Parse the HTML content (1 line)
  • Extract the title of the page (1 line)

Step 1: Import Libraries

First, we need to import the libraries. Ensure you have both requests and BeautifulSoup installed (for example, by running pip install requests beautifulsoup4). Here’s how we start:

import requests
from bs4 import BeautifulSoup

Libraries:

  • requests: An HTTP library for Python that simplifies making HTTP requests to fetch web content.
  • BeautifulSoup from bs4: A Python library for pulling data out of HTML and XML files.

Step 2: Send GET Request

Next, we use the requests library to send a GET request to the desired URL. This fetches the webpage content:

response = requests.get("https://medium.com/@oliver.lovstrom")

Description: This line sends a GET request to the specified URL and stores the response, including the webpage's HTML content.
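A request can fail or return an error status, so it can help to check the response before parsing it. The following is a minimal sketch of one way to do that; the helper name fetch_html and the 10-second default timeout are assumptions made for illustration and are not part of the original script.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page's HTML as a string, or None if the request fails.

    The name `fetch_html` and the 10-second default are illustrative
    assumptions, not part of the original script.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Raise requests.HTTPError for 4xx/5xx status codes.
        response.raise_for_status()
    except requests.RequestException as error:
        # Covers connection errors, timeouts, invalid URLs, and HTTP errors.
        print(f"Request failed: {error}")
        return None
    return response.text
```

Calling fetch_html("https://medium.com/@oliver.lovstrom") would then return either the HTML text or None, which the later steps can check before parsing.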

Step 3: Parse the HTML Content

To parse the fetched HTML content, we use BeautifulSoup:

soup = BeautifulSoup(response.text, "html.parser")

Description: This line creates a BeautifulSoup object (soup) that parses the HTML content retrieved from the response.
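To see what the parsed object offers without making a network request, here is a small sketch that parses a hard-coded HTML string. The sample markup and the variable names are made up for illustration; only BeautifulSoup, find, and find_all come from the library.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document used only for this illustration.
html = """
<html>
  <head><title>Example Page</title></head>
  <body>
    <h1>Hello</h1>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; .text gives its text content.
heading = soup.find("h1").text

# find_all() returns every matching tag; here we collect each link's href.
links = [a["href"] for a in soup.find_all("a")]

print(heading)
print(links)
```

The same calls work on the soup object built from response.text, since both are ordinary BeautifulSoup objects.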

Step 4: Extract the Title

Finally, we extract the title of the page using our BeautifulSoup object:

title = soup.find("title").text
print(title)

Description: This finds the first <title> tag in the HTML document, extracts its text content, and prints it.
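One caveat: soup.find returns None when the page has no matching tag, so calling .text on the result would raise an AttributeError. A hedged sketch of a more defensive version follows; the helper name extract_title is hypothetical and the sample HTML strings are made up for the demonstration.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the text of the first <title> tag, or None if there is none.

    `extract_title` is a hypothetical helper, not part of the original script.
    """
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("title")
    if title_tag is None:
        # No <title> tag in the document, so there is nothing to extract.
        return None
    # strip=True removes leading and trailing whitespace from the text.
    return title_tag.get_text(strip=True)


print(extract_title("<html><head><title> Demo </title></head></html>"))
print(extract_title("<html><body><p>No title here</p></body></html>"))
```

The first call prints Demo and the second prints None, so the caller can decide what to do when a page has no title.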

Testing

To test our web scraper, save the code as web_scraper.py and run the script. It prints the title of the page:

python web_scraper.py
Oliver Lövström – Medium

Full Code

For the complete code, visit this GitHub repository or see the code snippet below:

"""Web scraper."""
import requests
from bs4 import BeautifulSoup

# Define URL.
URL = "https://medium.com/@oliver.lovstrom"

# Send a GET request to the URL.
response = requests.get(URL, timeout=120)

# Parse the HTML content of the page using BeautifulSoup.
soup = BeautifulSoup(response.text, "html.parser")

# Extract and print the title of the page.
title = soup.find("title").text
print(title)

Further Reading

If you want to learn more about programming and, specifically, Python and Java, see the following course:

Note: If you use my links to order, I’ll get a small kickback. So, if you’re inclined to order anything, feel free to click above.
