A Wikipedia article summarization tool written in Python

Asiri Hewage
May 24, 2019 · 2 min read

Installation

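Use the package manager pip to install nltk, plus beautifulsoup4 and lxml, which the script uses to parse the downloaded page.
pip install nltk beautifulsoup4 lxml

The tokenizers and the English stopword list are separate NLTK data packages. Download them once:
python -m nltk.downloader punkt stopwords
(Recent NLTK releases look for punkt_tab instead of punkt. Download it as well if word_tokenize reports it missing.)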

Import

import bs4 as bs
import urllib.request
import re
import heapq
import nltk

Usage

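# Fetch the article and collect the text of every <p> element
scraped_data = urllib.request.urlopen('https://en.wikipedia.org/wiki/Object-oriented_programming')
article = scraped_data.read()

parsed_article = bs.BeautifulSoup(article, 'lxml')

paragraphs = parsed_article.find_all('p')
article_text = ' '.join(p.text for p in paragraphs)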
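The fetch-and-parse step above needs network access, BeautifulSoup and lxml. The sketch below shows offline what `find_all('p')` followed by `.text` produces. It swaps BeautifulSoup for Python's built-in `html.parser.HTMLParser`, which is a different parser than the one the tool uses. The class `ParagraphTextParser`, the function `extract_paragraph_text` and the sample HTML are hypothetical illustrations and are not part of the tool.

```python
from html.parser import HTMLParser


class ParagraphTextParser(HTMLParser):
    """Collects the text inside <p> elements (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0                 # > 0 while inside a <p> element
        self._current: list[str] = []   # text pieces of the open paragraph
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Nested tags such as <b> contribute their text, like Tag.text
                self.paragraphs.append("".join(self._current))
                self._current = []

    def handle_data(self, data):
        if self._depth > 0:
            self._current.append(data)


def extract_paragraph_text(html: str) -> str:
    """Join the text of all <p> elements with single spaces."""
    parser = ParagraphTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.paragraphs)


sample = ("<html><body><h1>OOP</h1><p>Objects hold <b>data</b>.</p>"
          "<p>Classes [1] define them.</p></body></html>")
print(extract_paragraph_text(sample))  # Objects hold data. Classes [1] define them.
```

The heading text is dropped because it sits outside any `<p>`. The citation marker `[1]` survives, which is why the next step strips square brackets. Unlike lxml, this sketch does not repair unclosed `<p>` tags.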

# Removing citation markers such as [12] and extra spaces
article_text = re.sub(r'\[[0-9]*\]', ' ', article_text)
article_text = re.sub(r'\s+', ' ', article_text)

# Removing special characters and digits (this copy is used only for word counts)
formatted_article_text = re.sub('[^a-zA-Z]', ' ', article_text)
formatted_article_text = re.sub(r'\s+', ' ', formatted_article_text)
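The two cleaning passes keep different text for different jobs. `article_text` keeps punctuation so it can later be split into sentences. `formatted_article_text` keeps only letters so it can be counted word by word. The sketch wraps the same regular expressions in two hypothetical helper functions and runs them on a made-up sample string, which makes the effect of each pass visible.

```python
import re


def clean_for_sentences(text: str) -> str:
    """First pass: replace citation markers like [12] with a space, collapse whitespace."""
    text = re.sub(r'\[[0-9]*\]', ' ', text)
    return re.sub(r'\s+', ' ', text)


def clean_for_counting(text: str) -> str:
    """Second pass: replace everything except ASCII letters, collapse whitespace."""
    text = re.sub('[^a-zA-Z]', ' ', text)
    return re.sub(r'\s+', ' ', text)


raw = "Objects contain data and code.[1]\nA class has 2  methods.[23]"
article_text = clean_for_sentences(raw)
formatted_article_text = clean_for_counting(article_text)
print(repr(article_text))            # 'Objects contain data and code. A class has 2 methods. '
print(repr(formatted_article_text))  # 'Objects contain data and code A class has methods '
```

Both passes replace removed characters with a space instead of deleting them, so a trailing space can remain. That is harmless for tokenizing.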

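# Count how often each non-stopword occurs. Lowercase the text so it matches
# NLTK's lowercase stopword list and the lowercased sentences scored below.
stopwords = set(nltk.corpus.stopwords.words('english'))

word_frequencies = {}
for word in nltk.word_tokenize(formatted_article_text.lower()):
    if word not in stopwords:
        if word not in word_frequencies:
            word_frequencies[word] = 1
        else:
            word_frequencies[word] += 1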
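The counting loop above is a plain frequency table, and `collections.Counter` expresses the same thing in one line. The sketch makes two substitutions so it runs without NLTK data. First, `STOPWORDS` is a hypothetical, deliberately tiny stand-in for `nltk.corpus.stopwords.words('english')`. Second, `str.split()` replaces `nltk.word_tokenize`, which is reasonable here only because the text has already been reduced to letters and single spaces.

```python
from collections import Counter

# Hypothetical, tiny stand-in for nltk.corpus.stopwords.words('english')
STOPWORDS = {"a", "an", "and", "is", "of", "the"}


def count_words(formatted_text: str) -> Counter:
    """Count non-stopwords in letters-only text (str.split stands in for word_tokenize)."""
    words = formatted_text.lower().split()
    # Same result as the if/else counting loop in the script
    return Counter(w for w in words if w not in STOPWORDS)


freqs = count_words("An object is an instance of a class and a class is a template")
print(freqs.most_common(2))  # [('class', 2), ('object', 1)]
```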

# Turn counts into weights by dividing by the most frequent word's count
maximum_frequency = max(word_frequencies.values())
for word in word_frequencies:
    word_frequencies[word] = word_frequencies[word] / maximum_frequency

# Score every sentence shorter than 30 words by summing the weights of its words
sentence_list = nltk.sent_tokenize(article_text)

sentence_scores = {}
for sent in sentence_list:
    for word in nltk.word_tokenize(sent.lower()):
        if word in word_frequencies:
            if len(sent.split(' ')) < 30:
                if sent not in sentence_scores:
                    sentence_scores[sent] = word_frequencies[word]
                else:
                    sentence_scores[sent] += word_frequencies[word]
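Here is a small, self-contained version of the weighting and scoring steps, run on made-up counts. It assumes `re.findall(r'[a-z]+', ...)` is an acceptable stand-in for `nltk.word_tokenize`, and the helper names `normalise` and `score_sentences` are hypothetical. Dividing every weight by the same maximum scales every sentence score by the same factor. The normalisation therefore makes scores easier to compare, but it does not change which sentences rank highest.

```python
import re


def normalise(freqs: dict[str, int]) -> dict[str, float]:
    """Divide every count by the largest count."""
    top = max(freqs.values())
    return {word: count / top for word, count in freqs.items()}


def score_sentences(sentences: list[str], weights: dict[str, float],
                    max_words: int = 30) -> dict[str, float]:
    """Sum the word weights of every sentence shorter than max_words words."""
    scores: dict[str, float] = {}
    for sent in sentences:
        if len(sent.split(' ')) >= max_words:
            continue  # long sentences are never scored
        # re.findall stands in for nltk.word_tokenize (assumption)
        for word in re.findall(r'[a-z]+', sent.lower()):
            if word in weights:
                scores[sent] = scores.get(sent, 0) + weights[word]
    return scores


counts = {"class": 4, "object": 2, "method": 1}  # made-up counts
sentences = ["A class defines an object.",
             "A method belongs to a class.",
             "Nothing relevant here."]

raw = score_sentences(sentences, counts)
weighted = score_sentences(sentences, normalise(counts))
print(raw)       # {'A class defines an object.': 6, 'A method belongs to a class.': 5}
print(weighted)  # {'A class defines an object.': 1.5, 'A method belongs to a class.': 1.25}
```

A sentence that contains none of the counted words never enters the dictionary, just as in the script's loop.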

# Use the 7 highest-scoring sentences as the summary
summary_sentences = heapq.nlargest(7, sentence_scores, key=sentence_scores.get)
summary = ' '.join(summary_sentences)
print(summary)
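`heapq.nlargest` returns the chosen sentences in score order, so the summary can jump around the article. The sketch below is an optional variation, not the tool's behaviour. It selects the same top sentences but can print them in the order they appeared. This assumes the score dictionary was filled in article order, which the scoring loop does because Python dicts keep insertion order. The function `summarise` and its `keep_order` flag are hypothetical.

```python
import heapq


def summarise(sentence_scores: dict[str, float], n: int = 7,
              keep_order: bool = True) -> str:
    """Join the n highest-scoring sentences, optionally in article order."""
    top = heapq.nlargest(n, sentence_scores, key=sentence_scores.get)
    if keep_order:
        # Dict insertion order is the order sentences were scored
        position = {sent: i for i, sent in enumerate(sentence_scores)}
        top.sort(key=position.__getitem__)
    return ' '.join(top)


scores = {"First.": 1.0, "Second.": 2.0, "Third.": 0.5}  # made-up scores
print(summarise(scores, n=2))                    # First. Second.
print(summarise(scores, n=2, keep_order=False))  # Second. First.
```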


Deployment
You can deploy this project on Google Cloud by creating a simple Debian VM instance on Compute Engine. After installing all dependencies with pip, clone this project and run it. Make sure to use pip3 and python3, since the script is written for Python 3.

For example:

pip3 install nltk beautifulsoup4 lxml
python3 wikipy.py

Contributing
This project was built by Asiri (SLIIT UG, 3rd year) based on an idea from Ms. Sachindi (SLIIT UN, 4th year).
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.

License
MIT

Written by Asiri Hewage: SE Undergraduate | SLIIT | Google Play App Developer | Application Engineering Intern, Pearson Lanka
