Easily Web Scrape and Summarize News Articles Using Python
In today’s digital world, we’re bombarded with information. Between infinitely scrolling social media feeds and a 24-hour news cycle, there’s more news than anyone can keep up with, so we need to be able to digest it quickly!
So let’s go through an exercise to shrink news articles to a more easily digestible size.
We’ll scrape an example article using the requests and BeautifulSoup packages, then summarize it using the excellent gensim library. You can follow along interactively by downloading this Jupyter Notebook, available on my GitHub.
Let’s get right to it!
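# Imports
import requests
from bs4 import BeautifulSoup
# Note: the gensim.summarization module was removed in gensim 4.0,
# so this import needs a gensim 3.x release
from gensim.summarization import summarize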
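Now let’s pick an interesting article. We’ll use an NPR story from July 2019 about the University of Texas at Austin promising free tuition for low-income students in 2020.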
Now that we have an article, we’ll retrieve its content:
# Retrieve page text
url = 'https://www.npr.org/2019/07/10/740387601/university-of-texas-austin-promises-free-tuition-for-low-income-students-in-2020'
page = requests.get(url).text
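The one-liner above is fine for a quick demo, but it will wait indefinitely if the server never responds, and it will happily hand back the body of an error page. Here’s a slightly more defensive sketch. The helper name fetch_html, its 10-second default timeout, and the optional session argument are my own assumptions for illustration, not something from the original notebook:

```python
import requests


def fetch_html(url, timeout=10, session=None):
    """Return the HTML at `url` as text, raising on HTTP error statuses.

    `fetch_html` is a hypothetical helper. `session` may be a
    requests.Session (or any object with a compatible .get method);
    when omitted, the module-level requests.get is used.
    """
    http = session or requests
    # A timeout keeps the call from hanging forever on a stalled server
    response = http.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning the error page's HTML
    response.raise_for_status()
    return response.text


# Usage, with the article URL defined earlier:
# page = fetch_html(url)
```

If you plan to scrape several articles in a row, passing in a single requests.Session lets the requests reuse one connection.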