Samuraiser: The YouTube transcript summarizing extension

Dhanyahegde
Published in Newolf Society
5 min read · Sep 12, 2021

Have you ever spent a long time watching a YouTube video, only to realize that it wasn't worth your time? Then Samuraiser is the extension you need.

Problem statement:

The issue with the content on YouTube is that there’s a lot of it. The downside is that this gives rise to endless clickbait videos, which end up wasting the user’s time.

A quick glance at the summary would let the user know whether the video is worth their time and if it contains the topics that they would be looking for. It can also be used to quickly recapitulate useful information in the video.

This would also be beneficial for the hearing-impaired community, as they wouldn’t have to waste time going through the complete transcript to find suitable videos. The main advantage would be for the student community, who would find it useful to cherry-pick lecture/tutorial videos according to their preferences.

How did we implement it?

We created a Chrome extension that makes a request to a backend REST API, which performs the NLP and responds with a summarized version of the YouTube transcript.

High-Level Overview

• Got transcripts/subtitles for a given YouTube video ID using a Python API.

• Performed text summarization on obtained transcripts using LSA.

• Built a Flask backend REST API to expose the summarization service to the client.

• Developed a Chrome extension that utilizes the backend API to display summarized text to the user.

Features:

  1. Generates a summary within a few seconds with a single click.
  2. Presents timestamps alongside the summary; clicking a timestamp jumps to that part of the video.
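
As a rough illustration of the timestamp feature, the sketch below turns a transcript offset in seconds into a readable label and a link that starts playback at that point. The function names are hypothetical and not taken from the project; the `&t=<seconds>s` query parameter is the usual way to deep-link into a YouTube video.

```python
def format_timestamp(seconds: float) -> str:
    """Render a transcript offset in seconds as m:ss or h:mm:ss."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def timestamp_link(video_id: str, seconds: float) -> str:
    """Build a watch URL that starts playback at the given offset."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s"
```

For example, an offset of 83.4 seconds would be labelled 1:23 and linked with `&t=83s`.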

The basic strategy is to apply ML summarization techniques to the transcript of the video.

The project is divided into two separate entities:

  1. the client, which is a Chrome extension
  2. the server, which processes the request and sends the result back to the client as an HTTP response.

This structure follows from implementing the backend as a RESTful service.

Technologies used:

  1. Sumy for summarizing
  2. Flask for deployment
  3. JS for the Chrome extension part

Method of implementation:

Server

The server side is implemented in Flask as a RESTful service. Summarization starts by obtaining the transcript of the video: if the video already has a transcript, it is fetched with the Python library youtube-transcript-api; otherwise, the audio is extracted first and converted with speech-to-text. Python libraries are used for this step as well.
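
The transcript step could look roughly like the sketch below. It is only an illustration, not the project's code: the helper names are made up, and the `YouTubeTranscriptApi.get_transcript` call is the interface of youtube-transcript-api releases before 1.0 (newer releases changed it), so treat that line as an assumption to check against the installed version.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """Return the `v` query parameter of a youtube.com/watch URL."""
    parsed = urlparse(url)
    ids = parse_qs(parsed.query).get("v")
    if parsed.path != "/watch" or not ids:
        raise ValueError(f"not a YouTube watch URL: {url}")
    return ids[0]


def join_transcript(segments) -> str:
    """Concatenate caption segments (dicts with a 'text' key) into one string."""
    return " ".join(seg["text"].strip() for seg in segments if seg["text"].strip())


def fetch_transcript_text(url: str) -> str:
    """Fetch and flatten the transcript for a video URL (needs network access)."""
    # Imported lazily so the helpers above work without the library installed.
    from youtube_transcript_api import YouTubeTranscriptApi

    # Assumption: pre-1.0 interface, which returns a list of
    # {"text", "start", "duration"} dicts.
    segments = YouTubeTranscriptApi.get_transcript(extract_video_id(url))
    return join_transcript(segments)
```

The `start` value of each segment is what makes timestamped summaries possible, so a fuller version would carry it along instead of discarding it.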

After this, the summary can be generated, for example with transformer models. As described in the linked post (Useful Blog), there are two ways to do this:

  1. Extractive summarization
  2. Abstractive summarization

Currently, we use Sumy with its LSA summarizer, which is an extractive summarization method.

The summary is then returned as an HTTP response to a GET request on /api/summarize?youtube_video="a valid url".
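
A minimal sketch of such an endpoint is shown below. The route and the `youtube_video` parameter come from the description above; the JSON response shape, the status codes, and the function names are assumptions made for illustration. The request logic is kept separate from Flask so that it can be exercised without a running server.

```python
def handle_summarize(video_url, summarize):
    """Return (payload, HTTP status) for one /api/summarize request."""
    if not video_url:
        return {"error": "missing youtube_video parameter"}, 400
    try:
        summary = summarize(video_url)
    except Exception as exc:  # e.g. no transcript available for the video
        return {"error": str(exc)}, 500
    return {"summary": summary}, 200


def create_app(summarize):
    """Wire the handler into a Flask app (Flask is imported lazily)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/api/summarize", methods=["GET"])
    def summarize_route():
        # Read the video URL from the query string and delegate to the handler.
        payload, status = handle_summarize(request.args.get("youtube_video"), summarize)
        return jsonify(payload), status

    return app
```

Here `summarize` stands for any callable that takes a video URL and returns the summary text, such as a function that fetches the transcript and passes it to the Sumy summarizer.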

To serve the request over HTTPS, the app needs to run on https rather than http: YouTube is an HTTPS website, and a request from an HTTPS page to an HTTP endpoint fails with a mixed content error. There are two solutions:

  1. An easy option is to use pyngrok (a wrapper library for ngrok), which exposes localhost on a public URL. It offers both https and http URLs, so it fulfills our purpose. It also allows the app to run in environments like Google Colab, where localhost isn't accessible, which makes it good for testing. The Colab notebook used for testing is in the server folder.
  2. A harder option is to deploy it on a host that provides the required https URL and certificates.

CORS also needed to be enabled: if the server's response lacks the required header, the browser blocks the request with the CORS policy error: No ‘Access-Control-Allow-Origin’ header is present on the requested resource. So, CORS is simply allowed for all domains on all routes using the flask_cors library.
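
To make the CORS fix concrete, the sketch below shows both sides of it. `add_cors_headers` is a hypothetical helper that only illustrates what the browser expects to see in the response; `enable_cors` shows the one-line flask_cors call that allows all origins on all routes, which is presumably close to what the project does.

```python
def add_cors_headers(headers: dict, allowed_origin: str = "*") -> dict:
    """Return a copy of the response headers with the allow-origin header set."""
    updated = dict(headers)
    updated["Access-Control-Allow-Origin"] = allowed_origin
    return updated


def enable_cors(app):
    """Allow all origins on all routes of a Flask app via flask_cors."""
    # Imported lazily; requires the flask-cors package to be installed.
    from flask_cors import CORS

    CORS(app)
    return app
```

Allowing every origin is convenient for testing; a deployed service would usually restrict the allowed origins.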

Chrome-Extension

On clicking the summarize button in the popup, if the URL is of the form https://www.youtube.com/watch?v=*, the popup JS makes a GET request to our server API. A div element with placeholder text is added below the YouTube player. Once the summary text is received, it is passed to the content JS, which replaces the content of that div element. Most of the properties are inherited from the parent element, so that it fits in with the page. Extra styling is added in content.css.
The extension can be used by loading it unpacked from chrome://extensions/.
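
The popup itself is written in JS, but the request it builds can be sketched in Python to keep the examples in one language. The server address below is a placeholder, and the function names are hypothetical. The point worth noticing is that the video URL contains `?` and `=` characters, so it should be percent-encoded before being passed as a query parameter.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical server address; the real one depends on where the API is hosted.
API_BASE = "https://summarizer.example.com"


def is_watch_url(url: str) -> bool:
    """True for URLs of the form https://www.youtube.com/watch?v=*."""
    parsed = urlparse(url)
    return (
        parsed.scheme == "https"
        and parsed.netloc == "www.youtube.com"
        and parsed.path == "/watch"
        and bool(parse_qs(parsed.query).get("v"))
    )


def build_summary_request(tab_url: str) -> str:
    """Return the GET URL for the summarize endpoint, percent-encoding the video URL."""
    if not is_watch_url(tab_url):
        raise ValueError("active tab is not a YouTube watch page")
    return f"{API_BASE}/api/summarize?" + urlencode({"youtube_video": tab_url})
```

In the extension, the equivalent JS would typically use `encodeURIComponent` on the tab URL before appending it to the request.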

More about LSA:

Latent Semantic Analysis (LSA) is an unsupervised technique in Natural Language Processing. It is an algebraic-statistical method that extracts latent features of the sentences, that is, features that are not stated directly in the text. These features are essential to the data but are not original features of the dataset.
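
To give a feel for the idea, here is a deliberately simplified, dependency-free sketch. It builds a term-sentence count matrix and scores each sentence by its weight in the dominant latent topic, found by power iteration on the Gram matrix. This is not how Sumy implements LSA: a full implementation uses a singular value decomposition and combines several topics, while this sketch keeps only the strongest one. All function names are made up for the example.

```python
import math


def term_sentence_matrix(sentences):
    """Rows are terms, columns are sentences, entries are raw term counts."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({word for words in tokenized for word in words})
    return [[words.count(term) for words in tokenized] for term in vocab]


def dominant_topic_scores(matrix, iterations=100):
    """Weight of each sentence in the top right-singular vector of the matrix."""
    n = len(matrix[0])
    # Gram matrix G = A^T A; its leading eigenvector is A's top right-singular vector.
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]
    vec = [1.0] * n
    for _ in range(iterations):  # power iteration
        vec = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
    return [abs(x) for x in vec]


def lsa_summary(sentences, count):
    """Keep the `count` highest-scoring sentences, in their original order."""
    scores = dominant_topic_scores(term_sentence_matrix(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:count]
    return [sentences[i] for i in sorted(top)]
```

Given two sentences that share vocabulary and one unrelated sentence, the shared vocabulary forms the dominant topic, so the unrelated sentence scores lowest and is the first to be dropped from the summary.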

Code for the model implementation:

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

LANGUAGE = "english"
SENTENCES_COUNT = 3


def SumySummarize(text):
    # Note: sumy's Tokenizer relies on NLTK's "punkt" tokenizer data.
    # To summarize a web page or a text file instead, sumy also offers:
    #   HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    #   PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))
    parser = PlaintextParser.from_string(text, Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)
    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)
    # Join the selected sentences with spaces so they do not run together.
    return " ".join(str(sentence) for sentence in summarizer(parser.document, SENTENCES_COUNT))

Code for Chrome extension:

Requirements: We created a Chrome extension application directory containing the essential files listed below.

The diagram below briefly indicates the role of each file in building a Chrome extension.

Code for manifest.json and popup.js:

[Embedded code: manifest.json]
[Embedded code: popup.js]

GitHub link:

https://github.com/SammithSB/Samuraiser/tree/final-test

Future scope:

We hope that, in the future, anyone interested can find ways to extend the summarizer to other streaming services as well.

Here we have come to the end of the project on the topic of summarization of YouTube videos. We have tried our best to include all the necessary features that are required and related to the project.

Dhanyahegde
Newolf Society

Passionate about Tech and inclusion and diversity of women in tech!