playgrdstar
quaintitative
Published in
2 min read · Aug 2, 2018

Summarisation

Sometimes, we just want to quickly get the gist of a document or an article on the web.

A number of Python libraries can help with that (or you can train your own model), and the ‘sumy’ library is a good place to start.

Let’s first get some text to summarise by scraping the article content from a page on Bloomberg. We won’t go into the details of how to scrape the content, which we have covered here.

import requests
import bs4

source = 'https://www.bloomberg.com/news/articles/2018-08-01/u-s-to-consider-higher-tariff-on-200-billion-of-china-imports'

# Download the page and parse it (the 'html5lib' parser needs to be installed)
html = requests.get(source).content
soup = bs4.BeautifulSoup(html, 'html5lib')
paras = soup.find_all('p', class_=None)

# Append each of the paragraphs in the article into a list

text = []
for p in paras:
    text.append(p.text)

# Join the paragraphs

text = " ".join(text)
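
Scraped paragraphs often carry stray newlines, tabs, or empty entries. As a minimal sketch (the helper name join_paragraphs and the sample list are made up for illustration and are not part of the original notebook), the joining step could also normalise whitespace before the text goes to the summariser:

```python
import re


def join_paragraphs(paragraphs):
    """Collapse whitespace in each paragraph, drop empty ones, join with spaces."""
    cleaned = (re.sub(r"\s+", " ", p).strip() for p in paragraphs)
    return " ".join(p for p in cleaned if p)


# Hypothetical sample standing in for the scraped paragraph texts
sample = ["  China said\nthe threat won't work. ", "", "\tTalks may resume."]
print(join_paragraphs(sample))
```

With the scraped page, this could be called as join_paragraphs(p.text for p in paras) in place of the loop and the join.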

Now, we get to the whole point of this article.

We first import the following classes from the ‘sumy’ library:

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

Summarising the text is super simple and takes just 3 lines: parse the text into a document, create a LexRank summariser, and ask it for a 5-sentence summary.

parser = PlaintextParser(text, Tokenizer('english'))
summarizer = LexRankSummarizer()
summary = summarizer(parser.document, 5)

And print out the results.

for sentence in summary:
    print(sentence)

This prints the 5 sentences picked for the summary:

China said that the latest threat by the U.S. to increase tariffs won’t succeed in forcing concessions from Beijing.
The Trump administration said Wednesday that it’s considering increasing the proposed tariff on $200 billion in Chinese imports to 25 percent from 10 percent, in a bid to force China back to the negotiating table.
Along with its pledge to fight back, China also left the door open for a resumption of negotiations.
The first wave of 25 percent tariffs on $34 billion of Chinese goods took effect last month, prompting immediate in-kind retaliation from China, and the next round on $16 billion could be implemented by the U.S. in the coming days or weeks.
Trump said last month that he’s willing to impose tariffs on every good imported from China, which totaled more than $500 billion last year.
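
For a rough feel of what LexRank does under the hood, here is a simplified sketch in plain Python. It is not sumy's implementation: it assumes a naive regex sentence splitter, raw word counts instead of TF-IDF weights, no self-links, and names and default values (lexrank_sketch, threshold, damping, iterations) chosen only for illustration. The idea is to link sentences that share enough words, then rank each sentence PageRank-style by how well connected it is.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_sketch(text, n_sentences=2, threshold=0.1, damping=0.85, iterations=50):
    """Return the n_sentences best-connected sentences, in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]

    # Link two different sentences when their similarity exceeds the threshold.
    linked = [
        [i != j and cosine_similarity(vectors[i], vectors[j]) > threshold
         for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in linked]

    # PageRank-style power iteration: a sentence scores highly when
    # other high-scoring sentences link to it.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] / degree[j] for j in range(n) if linked[j][i])
            for i in range(n)
        ]

    # Pick the top-scoring sentences, then restore their original order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]


# Made-up demo text: three related sentences and one unrelated one
demo = (
    "Tariffs on Chinese imports may rise. "
    "The proposed tariffs on Chinese imports could rise to 25 percent. "
    "China said tariffs on imports will not work. "
    "My cat sleeps all day."
)
for sentence in lexrank_sketch(demo, n_sentences=3):
    print(sentence)
```

In the demo, the sentence about the cat shares no words with the others, so it has no links, ends up with the lowest score, and is left out of the 3-sentence summary.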

The Jupyter notebook with the code is here


ming // gary ang // illustration, coding, writing // portfolio >> playgrd.com | writings >> quaintitative.com