Easily Web Scrape and Summarize News Articles Using Python

Chaim Gluck
Towards Data Science
4 min read · Jul 14, 2019
There’s lots of news to read and lots of coffee to drink!

In today’s digital world, we’re bombarded with endless information. We’ve got infinitely scrolling social media feeds and a 24-hour news cycle. So there’s plenty of news to stay aware of and we’ve got to be able to digest it quickly!

So let’s go through an exercise to shrink news articles to a more easily digestible size.

We’ll scrape an example article using the requests and BeautifulSoup packages, then summarize it using the excellent gensim library. You can follow along interactively by downloading the Jupyter Notebook available on my GitHub.

Let’s get right to it!

# Imports
import requests
from bs4 import BeautifulSoup
# Note: gensim.summarization was removed in gensim 4.0, so this import needs gensim < 4.0
from gensim.summarization import summarize
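
If the gensim import is not available in your environment, here is a rough stand-in that uses only the standard library. To be clear, this is not gensim's algorithm: it is a simple word-frequency scorer, and `naive_summarize` is a hypothetical helper name made up for this sketch. It has no stop-word handling, so very common words will influence the scores.

```python
import re
from collections import Counter

# Hypothetical helper (not part of gensim): keeps the sentences whose words
# are, on average, the most frequent in the text.
def naive_summarize(text, max_sentences=2):
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    if len(sentences) <= max_sentences:
        return ' '.join(sentences)

    # Count how often each lowercase word appears in the whole text.
    freqs = Counter(re.findall(r'[a-z]+', text.lower()))

    def score(sentence):
        words = re.findall(r'[a-z]+', sentence.lower())
        # Use the average frequency so long sentences are not favored automatically.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # Pick the top-scoring sentences, then restore their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return ' '.join(sentences[i] for i in keep)
```

Calling `naive_summarize(article_text, max_sentences=3)` on any plain-text string would return up to three of its sentences in their original order.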

Now let’s pick an interesting article:

Link - I want to read it. But I bet it’s TOO long!

Now that we have an article, we’ll retrieve its content:

# Retrieve page text
url = 'https://www.npr.org/2019/07/10/740387601/university-of-texas-austin-promises-free-tuition-for-low-income-students-in-2020'
page = requests.get(url).text

Webscraping:
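
Here is a minimal sketch of what this step might look like with BeautifulSoup. It assumes the article body lives in `<p>` tags, which is a guess about the page markup rather than something confirmed here. The `sample_page` string and the `extract_article_text` helper are made up for illustration so the example runs offline; in the walkthrough you would pass in `page` instead.

```python
from bs4 import BeautifulSoup

# Stand-in HTML so the example runs without a network call; in the
# walkthrough, the input would be the `page` string from requests.
sample_page = """
<html><body>
  <h1>Example headline</h1>
  <article>
    <p>First paragraph of the story.</p>
    <p>  Second paragraph,   with extra spaces. </p>
    <p></p>
  </article>
</body></html>
"""

def extract_article_text(html):
    """Return the non-empty <p> texts joined by newlines (hypothetical helper)."""
    # 'html.parser' is Python's built-in parser, so no extra install is needed.
    soup = BeautifulSoup(html, 'html.parser')
    paragraphs = []
    for p in soup.find_all('p'):
        # Collapse runs of whitespace and trim the ends.
        text = ' '.join(p.get_text().split())
        if text:  # skip empty paragraphs
            paragraphs.append(text)
    return '\n'.join(paragraphs)

article_text = extract_article_text(sample_page)
print(article_text)
```

Real news pages often put navigation, captions, and footers in `<p>` tags too, so you may need to narrow the search to a container element first.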
