Web Scraper with Python

A simple Python web scraper in 5 lines of code

Published in Internet of Technology · 2 min read · Feb 2, 2024

In this article, we’re building the simplest possible web scraper. While the idea of “scraping” a webpage for useful information might seem daunting, today’s lesson will demonstrate how simple it can actually be.

In this guide, we will show you how to create a functional web scraper in just five lines of code, covering the following steps:

Import libraries (2 lines)
Send a GET request to a URL (1 line)
Parse the HTML content (1 line)
Extract the title of the page (1 line)

Step 1: Import Libraries

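First, we need to import the libraries. Both requests and BeautifulSoup are third-party packages, so make sure they are installed (for example with pip install requests beautifulsoup4). Here’s how we start: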

import requests
from bs4 import BeautifulSoup

Libraries:

requests: An HTTP library for Python that simplifies making HTTP requests to fetch web content.
BeautifulSoup from bs4: A Python library for pulling data out of HTML and XML files.
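
A quick way to confirm that both packages are importable before going further is to print their versions. This is only a sanity-check sketch and is not part of the five-line scraper:

```python
import bs4
import requests

# Both packages expose a __version__ string; an ImportError here means
# the package is not installed in the current environment.
print("requests", requests.__version__)
print("beautifulsoup4", bs4.__version__)
```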

Step 2: Send GET Request

Next, we use the requests library to send a GET request to the desired URL. This fetches the webpage content:

response = requests.get("https://medium.com/@oliver.lovstrom")

Description: This line sends a GET request to the specified URL and stores the response, including the webpage's HTML content.
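
In practice a request can hang or come back with an error status. The sketch below wraps the same call in a small helper; the function name fetch_html and the 10-second default timeout are illustrative assumptions, not part of the original script:

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, raising on HTTP or network errors.

    `fetch_html` and the default timeout are hypothetical choices for this sketch.
    """
    # A timeout keeps the script from waiting forever on an unresponsive server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently handing an error page to the parser.
    response.raise_for_status()
    return response.text
```

Calling fetch_html("https://medium.com/@oliver.lovstrom") inside a try/except requests.RequestException block would let the script report a failed request instead of crashing with a traceback.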

Step 3: Parse the HTML Content

To parse the fetched HTML content, we use BeautifulSoup:

soup = BeautifulSoup(response.text, "html.parser")

Description: This line creates a BeautifulSoup object (soup) that parses the HTML content retrieved from the response.
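
To see what the parsed object gives you without touching the network, here is a minimal sketch that parses a hand-written HTML string. The SAMPLE_HTML content is made up for illustration and stands in for response.text:

```python
from bs4 import BeautifulSoup

# A small hand-written page standing in for response.text
# (an assumption for this sketch, not the real Medium HTML).
SAMPLE_HTML = """
<html>
  <head><title>Example Page</title></head>
  <body>
    <h1>Hello</h1>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching tag; find_all() returns every match.
heading = soup.find("h1").text
links = [a["href"] for a in soup.find_all("a")]

print(heading)  # Hello
print(links)    # ['/first', '/second']
```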

Step 4: Extract the Title

Finally, we extract the title of the page using our BeautifulSoup object:

title = soup.find("title").text

Description: This finds the first <title> tag in the HTML document and extracts its content.
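
One caveat: find() returns None when the page has no matching tag, so calling .text on the result would raise an AttributeError. A slightly more defensive sketch follows; the helper name extract_title is an assumption made for this example:

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the stripped <title> text, or None if the page has no title.

    `extract_title` is a hypothetical helper, not part of the original script.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("title")
    if tag is None:
        # No <title> tag found; report that instead of raising.
        return None
    # strip=True removes leading and trailing whitespace from the text.
    return tag.get_text(strip=True)


print(extract_title("<html><head><title> My Page </title></head></html>"))  # My Page
print(extract_title("<p>No title here</p>"))  # None
```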

Testing

To test our web scraper, save the full code below as web_scraper.py and run it. The script prints the extracted title:

python web_scraper.py
Oliver Lövström – Medium

Full Code

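For the complete code, visit this GitHub repository or see the code snippet below. Compared with the steps above, it stores the URL in a constant, passes a timeout to requests.get so the request cannot hang indefinitely, and prints the title: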

"""Web scraper."""
import requests
from bs4 import BeautifulSoup

# Define URL.
URL = "https://medium.com/@oliver.lovstrom"

# Send a GET request to the URL.
response = requests.get(URL, timeout=120)

# Parse the HTML content of the page using BeautifulSoup.
soup = BeautifulSoup(response.text, "html.parser")

# Extract and print the title of the page.
title = soup.find("title").text
print(title)

Further Reading

Introduction to Programming with Python and Java

Offered by University of Pennsylvania. Boost Your Computer Programming Skills.

Written by Oliver Lövström