Introducing SummarLight — A Chrome Extension That Highlights The Most Important Parts Of An Article

Bilal Tahir
Mar 26, 2019 · 4 min read

TL;DR: I made a Chrome Extension that highlights the most important parts of a web page (post/article) so you can skim through it in no time!

You can find the extension here: https://chrome.google.com/webstore/detail/summarlight/ligjmagakdphdlenhhncfegpdbbendlg?hl=en-US&gl=US

I’m an information junkie and can spend hours at a time reading posts on the internet, most of them articles about tech. And even though I tell myself that it’s better than endlessly scrolling through Facebook or watching funny cat videos on YouTube, I still feel guilty for spending so much time absorbing content rather than working on my projects. As painful as it is to admit, I learn way more when I actually do stuff than when I read posts or watch videos about it.

I have always wanted an extension that would summarize the article I’m reading so I could get the gist of it. This would drastically cut down my reading time.

The problem is that, while there has been huge progress in NLP (Natural Language Processing), particularly in the last few years, text summarization still has a way to go.

We can still get good results, particularly when the content is structured in specific ways that reduce variation and uses a limited vocabulary. Good examples of this are legal and medical documents.

For instance, legal documents can have repetitive patterns like an introduction section, a section describing the case, information about the plaintiff/defendant, etc. As someone who has worked in these… rigorously structured settings, I know that when workers start on a new project, they often take an older document, use it as a template, and copy/paste in the details that change (name, location, etc.). So there is a lot of similarity between these documents.

Additionally, the content is spelled out in the same dry vernacular that lawyers excel at, i.e. there is very little variation in terminology and sentence structure. Using these patterns, we can build algorithms that produce very good summaries of huge documents.

However, most articles online are far more diverse in content than legal documents, so summarizing them down to their most relevant parts can be challenging.

There are two broad approaches to summarization in NLP:

Extractive Summarization: we pick the most important sentences from the text and use them, as they are, to summarize the article.

Abstractive Summarization: we summarize the text in new words. This is closer to how you or I would summarize an article, i.e. by taking in all the information and describing the gist of it in our own words.
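To make the extractive approach concrete, here is a minimal sketch in Python of a classic frequency-based extractive summarizer. It is not the algorithm SummarLight uses (the post doesn't describe one); the tiny stopword list, the naive sentence splitter, and the function names are assumptions for illustration. Each sentence is scored by the average document-wide frequency of its non-stopword words, and the top k sentences are returned in their original order.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "as", "are", "was", "be"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercase words, with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> list[str]:
    sentences = split_sentences(text)
    # How often each content word appears across the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average (not total) frequency, so long sentences aren't automatically favoured.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the k best sentences, but present them in the order they appeared.
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    article = ("Extractive summarization picks sentences. "
               "Abstractive summarization writes new sentences. Cats are cute.")
    print(extractive_summary(article, k=1))  # ['Extractive summarization picks sentences.']
```

Because the output is made of sentences copied verbatim from the input, every piece of the summary can be traced back to an exact spot in the original text.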

Perfecting Abstractive Summarization is considered one of the holy grails of NLP, and significant advances have been made in recent years.

However, for this project I focused on Extractive Summarization. I wanted a middle ground: providing summarized content while still letting users see the original text. Having that context is important because the summarization algorithm will most probably not be perfect, at least for now.

And so I landed on the idea of creating a Chrome Extension that generates an Extractive Summary of any web page you are on and then highlights those sentences. This way you can use the highlighted portions to skim through an article, but quickly read around the highlighted text when you need more context.
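Once the summary sentences are chosen, highlighting them comes down to locating each one in the original text and wrapping it in a marker. The post doesn't say how the extension does this on a live page inside Chrome, so the following is only a simplified, hypothetical Python sketch: it works on plain text rather than a web page's DOM, emits HTML using `<mark>` tags, and the function name `highlight` is an assumption.

```python
import html


def highlight(text: str, chosen: list[str]) -> str:
    """Return HTML for `text` with the first occurrence of each chosen sentence in <mark> tags."""
    # Record the character span of each chosen sentence in the original text.
    spans = []
    for sentence in chosen:
        start = text.find(sentence)
        if start != -1:  # silently skip sentences that can't be found
            spans.append((start, start + len(sentence)))
    spans.sort()

    out, pos = [], 0
    for start, end in spans:
        if start < pos:  # skip spans overlapping one already highlighted
            continue
        out.append(html.escape(text[pos:start]))  # unhighlighted text before the span
        out.append("<mark>" + html.escape(text[start:end]) + "</mark>")
        pos = end
    out.append(html.escape(text[pos:]))  # whatever follows the last highlight
    return "".join(out)


if __name__ == "__main__":
    print(highlight("A b. C d.", ["C d."]))  # A b. <mark>C d.</mark>
```

Working from character offsets in the original text, instead of doing string replacement on HTML that already contains inserted tags, avoids accidentally matching text inside a previously added `<mark>`.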

A Quick Demo Of The Extension

I don’t know whether this is actually a good idea, or even whether the highlighted content will be any good, but I wanted to give it a shot, and so SummarLight was born!

There are improvements still to be made. Because the task is compute-heavy, the extension takes 15–30 seconds to generate the summary, which can be annoying for users.

And even though I have tried to keep costs down by adopting a serverless architecture for my summarization API, I have to keep an eye on how those costs will add up if people do end up using it.

I hope to improve both the quality of the generated summaries and the user experience in future versions.

You can find the extension here: https://chrome.google.com/webstore/detail/summarlight/ligjmagakdphdlenhhncfegpdbbendlg?hl=en-US&gl=US

Please do try it out and let me know what you think. I appreciate any feedback! :)

Bilal Tahir

I’m a scrapper/hacker. I like to build things. https://www.linkedin.com/in/biltahir
