Learn Web Scraping using Python in under 5 minutes

Figure 1: Image source: The Data School

What is Web Scraping?

Scraping using BeautifulSoup

Getting Started

pip install beautifulsoup4

Let’s scrape!

Inspecting

Figure 2: Webpage to be scraped
Figure 3: HTML code of the webpage
Figure 4: HTML code showing the required tags
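The browser's developer tools are the usual way to find the right tag, but the same check can be done from Python. The sketch below is an illustration under assumptions: `SAMPLE_HTML` is a made-up snippet that only imitates the structure shown in Figure 4, and `list_div_classes` is a hypothetical helper name, not part of BeautifulSoup. It prints the class list of every `div`, which BeautifulSoup stores as a Python list because `class` can hold several values.

```python
from bs4 import BeautifulSoup

# Made-up HTML that only imitates the structure in Figure 4 (assumption)
SAMPLE_HTML = """
<div class="story-body sp-story-body gel-body-copy">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
<div class="share-tools"></div>
"""

def list_div_classes(html):
    """Return the class list of every <div> that has a class attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # class_=True keeps only tags where the class attribute is present
    return [div.get("class") for div in soup.find_all("div", class_=True)]

for classes in list_div_classes(SAMPLE_HTML):
    print(classes)
```

Running this against the downloaded page instead of the sample is a quick way to confirm the class names before writing the extraction code.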

Parsing

from urllib.request import urlopen
from urllib.error import URLError
from bs4 import BeautifulSoup

url = "https://www.bbc.com/sport/football/46897172"
# Use try-except in case the request fails, e.g. because of a wrong URL
try:
    page = urlopen(url)
except URLError as err:
    # Stop here: without a page there is nothing to parse
    raise SystemExit(f"Error opening the URL: {err}")
soup = BeautifulSoup(page, 'html.parser')
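A plain `urlopen(url)` sends urllib's default User-Agent and waits without a time limit, and some sites reject or stall such requests. The following is a minimal sketch of a helper, assuming you want both a custom header and a timeout. The name `fetch_soup`, the 10-second default and the User-Agent string are all illustrative assumptions, not part of urllib or BeautifulSoup.

```python
from urllib.request import Request, urlopen
from urllib.error import URLError
from bs4 import BeautifulSoup

def fetch_soup(url, timeout=10):
    """Download url and return it parsed with html.parser (hypothetical helper)."""
    # The header value is an arbitrary assumption; some sites block urllib's default
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (tutorial scraper)"})
    try:
        # timeout is in seconds; the with-block closes the response afterwards
        with urlopen(req, timeout=timeout) as resp:
            html = resp.read()
    except URLError as err:
        # HTTPError (404, 500, ...) is a subclass of URLError, so it lands here too
        raise RuntimeError(f"Error opening {url}: {err}") from err
    # BeautifulSoup accepts raw bytes and detects the encoding itself
    return BeautifulSoup(html, "html.parser")
```

Usage would be `soup = fetch_soup("https://www.bbc.com/sport/football/46897172")`. Raising instead of printing means a failed download stops the script at the real cause rather than later with a confusing `NameError`.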

Extracting the required elements

# The article body sits in this div (see Figure 4)
content = soup.find('div', {"class": "story-body sp-story-body gel-body-copy"})
article = ''
# Append the text of every <p> inside that div
for i in content.find_all('p'):
    article = article + ' ' + i.text
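The loop above assumes `soup.find` succeeds; if the page layout differs, `content` is `None` and `content.find_all` raises an `AttributeError`. Passing the full class string to `find` also only matches when the attribute lists the classes in exactly that order. A hedged alternative, sketched below, uses a CSS selector through `select_one`, which requires all three classes in any order, and fails with a clear message when nothing matches. The function name `extract_article` is hypothetical, and the class names are taken from Figure 4, so they reflect the page at the time of writing and may change.

```python
from bs4 import BeautifulSoup

def extract_article(soup, selector="div.story-body.sp-story-body.gel-body-copy"):
    """Join the text of all <p> tags inside the first element matching selector."""
    # A CSS selector matches the div when it has all three classes, in any order
    content = soup.select_one(selector)
    if content is None:
        # Wrong page, or the site's markup has changed
        raise ValueError(f"No element matches {selector!r}")
    # strip() trims whitespace around each paragraph but keeps spaces inside it
    return " ".join(p.get_text().strip() for p in content.find_all("p"))
```

Usage: `article = extract_article(soup)`. Building the string with `" ".join(...)` also avoids the leading space that the `article + ' ' + ...` loop produces.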

Saving the parsed text

# utf-8 keeps characters such as curly quotes intact on every platform
with open('scraped_text.txt', 'w', encoding='utf-8') as file:
    file.write(article)

Contents of scraped_text.txt:

Cristiano Ronaldo’s header was enough for Juventus to beat AC Milan and claim a record eighth Supercoppa Italiana in a game played in Jeddah, Saudi Arabia. The Portugal forward nodded in Miralem Pjanic’s lofted pass in the second half to settle a meeting between Italian football’s two most successful clubs. It was Ronaldo’s 16th goal of the season for the Serie A leaders. Patrick Cutrone hit the crossbar for Milan, who had Ivorian midfielder Franck Kessie sent off. Gonzalo Higuain, reportedly the subject of interest from Chelsea, was introduced as a substitute by Milan boss Gennaro Gattuso in Italy’s version of the Community Shield. But the 31-year-old Argentina forward, who is currently on loan from Juventus, was unable to deliver an equalising goal for the Rossoneri, who were beaten 4–0 by Juve in the Coppa Italia final in May.

Conclusion
