Developing Microsoft Edge extensions powered by SoTA NLP models

A step-by-step tutorial on building a Microsoft Edge extension for text paraphrasing using PEGASUS, KeyBERT, and WordNet

Tezan Sahu
Data Science at Microsoft


A Microsoft Edge extension is a small program that allows developers to modify or “extend” certain functionality in Microsoft Edge. It enhances the browsing experience of a specific target audience, usually by performing a niche function. Information aggregation, web content enrichment, ad blocking, memory optimization, and password management are just some examples of the plethora of functionality these extensions enable.

Developing an extension for Edge is not very complicated. Such extensions are built on web technologies like HTML, JavaScript, and CSS. Based on the functionality that they are designed to provide, these extensions may also require a server with API endpoints exposed that the user interface can interact with.

Source: Image created by the author.

In this article, I present a detailed overview of how to get started creating your very own Edge extensions that leverage state-of-the-art NLP techniques. I show this with an extension I’ve created to be an AI-powered text paraphraser and synonym finder for keywords in text. I use the PEGASUS transformer model from HuggingFace for paraphrasing, along with the KeyBERT model for keyword extraction, and WordNet to find synonyms. Such an extension might come in handy for numerous purposes, including rewriting a block of sentences in an article, post, or email.

So, let’s get started!

Tl;dr: This repository contains all the code mentioned in this article. Although GitHub gists are used as code snippets throughout this article, if copied directly, they may not work as intended. In case you face such issues, please refer to the repository for the exact directory structure and code that goes into each file.

Bird’s-eye view of how the extension works

Let’s start by giving the extension a name. I call it Janus AI: Smart Paraphrasing Solution. According to Roman mythology, Janus is the god of change and transitions, and the idea of paraphrasing is to “change” how a piece of text is expressed for better clarity.

The user interface of the extension is a popup (as with most other extensions) that allows users to input text for paraphrasing in two ways:

  • The user can type or paste textual content into the input box.
  • The user can select a piece of text from any tab in the browser and click the extension to automatically populate the input box.

When the user clicks the “Rephrase” button, the extension sends the text to the API server, which performs the paraphrasing using PEGASUS, keyword extraction using KeyBERT, and then finds synonyms for these keywords using WordNet. When the result is returned to the extension, the paraphrased text, along with the keywords and their synonyms, is displayed on the popup.

The user can then copy the paraphrased text just as it is or modify it with the synonyms of the keywords identified and then copy the text for further use.

Developing the RESTful API server

Janus AI requires a RESTful API server for the extension to send the user’s input text and receive the paraphrased text along with the keywords and their synonyms. I spin up this server using FastAPI as it is extremely powerful yet very simple to use and has pretty good documentation. The API endpoints of this server use the modules for paraphrasing, keyword extraction, and synonym finding.

Preliminaries

First, I create a folder called server/ for the API server, which contains two files: main.py and paraphrasingModule.py. Then I install the following packages in a virtual environment:

fastapi==0.75.2
keybert==0.5.1
nltk==3.7 # To leverage WordNet for finding synonyms
sentence-splitter==1.4 # To split sentences from input text
sentencepiece==0.1.96 # Required by PEGASUS paraphraser model
torch==1.11.0
transformers==4.18.0
uvicorn==0.17.6 # Lightweight server to run FastAPI code

These packages are found in the server/requirements.txt file.

Module for paraphrasing, keyword extraction and synonym finding

The server/paraphrasingModule.py file contains the PEGASUS paraphraser, KeyBERT keyword extractor, and synonym finder.

PEGASUS for paraphrasing: PEGASUS stands for Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models. Developed by Google Research, it was originally pre-trained for abstractive summarization. The checkpoint used here is a version fine-tuned for paraphrasing that is available on HuggingFace. This model and its tokenizer are loaded when the PegasusParaphraser class is instantiated. Because the user input can contain multiple sentences, SentenceSplitter() is used to split the paragraph into individual sentences, which are passed as a batch to the tokenizer and model. Text generation is configured with the following hyperparameters:

  • num_return_sequences is set to 1 to return only one paraphrase per sentence
  • num_beams is set to 10 so that beam search keeps the 10 most promising candidate sequences at each decoding step
  • max_length is set to 100 to allow the generation of a maximum of 100 tokens per paraphrased sentence
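  • temperature is set to 1.5 with the aim of generating more diverse outputs (temperature rescales the output probabilities and thus controls prediction randomness). Note that in HuggingFace’s generate(), temperature only takes effect when sampling is enabled (do_sample=True); with plain beam search it has no effect.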

The output from the model is a List of paraphrased sentences, which are then concatenated and returned.
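The paraphrasing module described above might look roughly like the following minimal sketch. It assumes the community checkpoint tuner007/pegasus_paraphrase on the HuggingFace Hub, because the article does not name the exact checkpoint. The paraphrase() method and the join_sentences() helper are hypothetical names, so refer to the repository for the actual implementation.

```python
from typing import List


def join_sentences(sentences: List[str]) -> str:
    """Concatenate paraphrased sentences into one paragraph, dropping empty ones."""
    return " ".join(s.strip() for s in sentences if s.strip())


class PegasusParaphraser:
    """Sentence-level paraphraser built on a PEGASUS checkpoint fine-tuned for paraphrasing."""

    def __init__(self, model_name: str = "tuner007/pegasus_paraphrase"):
        # Assumed checkpoint name. Heavy imports are deferred to instantiation so the
        # module can be imported without torch/transformers being installed.
        import torch
        from sentence_splitter import SentenceSplitter
        from transformers import PegasusForConditionalGeneration, PegasusTokenizer

        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = PegasusTokenizer.from_pretrained(model_name)
        self.model = PegasusForConditionalGeneration.from_pretrained(model_name).to(self.device)
        self.splitter = SentenceSplitter(language="en")

    def paraphrase(self, text: str) -> str:
        # Split the input paragraph into sentences and paraphrase them as one batch.
        sentences = self.splitter.split(text=text)
        if not sentences:
            return ""
        batch = self.tokenizer(
            sentences, truncation=True, padding="longest", max_length=100, return_tensors="pt"
        ).to(self.device)
        outputs = self.model.generate(
            **batch,
            max_length=100,          # at most 100 tokens per paraphrased sentence
            num_beams=10,            # beam width used by beam search
            num_return_sequences=1,  # one paraphrase per input sentence
            temperature=1.5,         # mirrors the article; ignored unless do_sample=True
        )
        # batch_decode yields one paraphrase per input sentence, in the original order.
        return join_sentences(self.tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Calling PegasusParaphraser().paraphrase(text) returns one paraphrase per input sentence, joined into a single string. The first instantiation downloads the model weights, which can take a while.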

Keyword extraction and synonym finding using KeyBERT and WordNet: KeyBERT is a minimal keyword extraction technique that uses BERT embeddings and cosine similarity to find the sub-phrases in a piece of text that are most similar to the text as a whole. The model is loaded when the KeywordSynonyms class is instantiated. KeyBERT’s extract_keywords() method returns a list of the keywords along with their similarity scores.

Once the keywords are extracted, the WordNet corpus reader in nltk is used to obtain synonyms for them. WordNet is a huge lexical database of English where words are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. These synsets are used to obtain relevant synonyms for any word.
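Here is a minimal sketch of the KeywordSynonyms class, assuming KeyBERT’s default embedding model and NLTK’s WordNet corpus. The get_synonyms() method, the clean_lemma_names() helper, and the top_n/limit defaults are hypothetical choices rather than the repository’s exact code. WordNet lemma names use underscores for multi-word entries (e.g. motor_car), so the helper converts them to spaces and removes duplicates.

```python
from typing import Dict, Iterable, List


def clean_lemma_names(word: str, names: Iterable[str], limit: int = 5) -> List[str]:
    """Turn WordNet lemma names into display-ready, de-duplicated synonyms of `word`."""
    seen = set()
    synonyms: List[str] = []
    for name in names:
        candidate = name.replace("_", " ")
        key = candidate.lower()
        # Skip the keyword itself and any synonym already collected.
        if key == word.lower() or key in seen:
            continue
        seen.add(key)
        synonyms.append(candidate)
        if len(synonyms) == limit:
            break
    return synonyms


class KeywordSynonyms:
    """Extracts keywords with KeyBERT and looks up their synonyms in WordNet."""

    def __init__(self):
        # Deferred imports keep the helper above usable without keybert/nltk installed.
        import nltk
        from keybert import KeyBERT
        from nltk.corpus import wordnet

        # WordNet data must be downloaded once; recent NLTK versions may also need omw-1.4.
        nltk.download("wordnet", quiet=True)
        nltk.download("omw-1.4", quiet=True)
        self.wordnet = wordnet
        self.kw_model = KeyBERT()

    def get_synonyms(self, text: str, top_n: int = 5) -> Dict[str, List[str]]:
        # extract_keywords returns (keyword, similarity score) pairs.
        keywords = self.kw_model.extract_keywords(
            text, keyphrase_ngram_range=(1, 1), stop_words="english", top_n=top_n
        )
        result: Dict[str, List[str]] = {}
        for keyword, _score in keywords:
            lemma_names = (
                lemma.name()
                for synset in self.wordnet.synsets(keyword)
                for lemma in synset.lemmas()
            )
            result[keyword] = clean_lemma_names(keyword, lemma_names)
        return result
```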

RESTful API server with FastAPI

With this, we are set to write the code in the server/main.py file, which loads our paraphrasing, keyword extraction, and synonym finding modules, and serves them as an API endpoint using FastAPI. Basically, we spin up a server and offer the endpoint POST /paraphrase for our extension to send the user’s text input. It returns a JSON object containing the paraphrased text and the synonyms of the extracted keywords from the text as a response.
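Here is a rough sketch of server/main.py, built on several assumptions. The PegasusParaphraser and KeywordSynonyms classes are assumed to expose hypothetical paraphrase() and get_synonyms() methods. The request and response fields are assumed to be named text, paraphrased, and synonyms. The core logic lives in a framework-independent handle_paraphrase() function. The app itself is built in a create_app() factory, so FastAPI and the models are only loaded when the app is created. The permissive CORS setting is also an assumption, added so the popup can call the server from its own origin.

```python
from typing import Callable, Dict, List

SynonymMap = Dict[str, List[str]]


def handle_paraphrase(
    text: str,
    paraphrase_fn: Callable[[str], str],
    synonyms_fn: Callable[[str], SynonymMap],
) -> dict:
    """Core of POST /paraphrase: paraphrase the text and find keyword synonyms."""
    text = text.strip()
    if not text:
        return {"paraphrased": "", "synonyms": {}}
    return {"paraphrased": paraphrase_fn(text), "synonyms": synonyms_fn(text)}


def create_app():
    """Build the FastAPI app; models load once, when the app is created."""
    from fastapi import FastAPI
    from fastapi.middleware.cors import CORSMiddleware
    from pydantic import BaseModel

    from paraphrasingModule import KeywordSynonyms, PegasusParaphraser

    class ParaphraseRequest(BaseModel):
        text: str  # hypothetical field name for the user's input

    paraphraser = PegasusParaphraser()
    keyword_synonyms = KeywordSynonyms()

    app = FastAPI(title="Janus AI")
    # Allow cross-origin calls from the extension popup (assumed, permissive setting).
    app.add_middleware(
        CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"]
    )

    @app.post("/paraphrase")
    def paraphrase(request: ParaphraseRequest) -> dict:
        return handle_paraphrase(
            request.text, paraphraser.paraphrase, keyword_synonyms.get_synonyms
        )

    return app
```

With this layout, main.py would end with app = create_app() so that uvicorn main:app keeps working. Alternatively, uvicorn’s --factory flag can point directly at main:create_app.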

Now, running uvicorn main:app --host=0.0.0.0 --port=8000 from the server/ directory starts the server on port 8000 after downloading (if not already present) and loading the models. Binding to 0.0.0.0 makes the server listen on all network interfaces. It can therefore be reached both through the loopback address 127.0.0.1 (localhost) and through the IP address of the machine it runs on.

To test the endpoint, visit the interactive documentation that FastAPI generates automatically at http://127.0.0.1:8000/docs and find the POST /paraphrase endpoint. It can be tested by expanding its section and clicking “Try it out”.

Testing of the POST /paraphrase endpoint in the RESTful API server by going to http://localhost:8000/docs
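Besides the interactive docs, the endpoint can be exercised from a short script. The sketch below uses only the Python standard library. It assumes the same hypothetical request body ({"text": ...}) and a server running locally on port 8000.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local server address


def build_paraphrase_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Create a JSON POST request for the /paraphrase endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/paraphrase",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def paraphrase(text: str, base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_paraphrase_request(text, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, paraphrase("The quick brown fox jumps over the lazy dog.") returns the parsed JSON response. The generous timeout allows for requests that may be slow when the models run on CPU.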

Having set up the API server, I proceed to build the actual extension with which the user interacts.

Developing the Edge extension

Microsoft Edge is built on Chromium. Hence, most of the stable Chrome extension APIs can also be used for developing Edge extensions. The complete list of supported APIs can be found in this documentation.

The overall structure of an extension is similar to that of a regular web app. The basic components that it should include are:

  • An app manifest JSON containing the configuration of the extension
  • HTML (and CSS) file(s) defining the user interface
  • JavaScript file(s) defining the functionality
  • Icon for the extension and other images as required by the extension

Preliminaries

To start, I create a new directory extension/ to maintain the files required by the extension. This folder shall contain the following:

extension/
├── assets/            # Folder to contain icons and other images
│   └── icons/         # Folder to contain all the icons
│       └── icon.png
├── popup/
│   ├── popup.html     # Code for the popup UI
│   └── popup.js       # Code for the logic layer of the popup
└── manifest.json      # App manifest file

For this extension, I used an online logo generator to create a single icon for Janus AI and saved it as extension/assets/icons/icon.png. However, one can also choose to save multiple variations of the icon based on different sizes.

Setting up the manifest

Development of extensions usually starts by creating a manifest.json file that contains all the configurations of the extension. A basic manifest file contains the name, description, version (i.e., the version of the extension), and manifest_version.

Note: With the adoption of Manifest V3, Manifest V2 is progressively being phased out. Manifest V3 and its feature-set offer greater privacy, security, and performance. Hence, the manifest_version is set to 3 for Janus AI.

Following is the manifest.json file for Janus AI:

Based on the functionality of Janus AI, several other parameters need to be present in the manifest file:

  • permissions: Specifies the permissions the extension needs for the chrome APIs that implement its functionality. For Janus AI, the tabs and scripting permissions are declared so that the extension can use the chrome.tabs and chrome.scripting APIs.
  • host_permissions: Because the extension must be able to read the selected text from whichever tab is active, it needs access to hosts that are only known at runtime. Thus, "https://*/" is included in the host permissions.
  • action: This is used to control the behavior of the extension in the Edge toolbar. Within this, default_title specifies what is shown in the tooltip, default_icon specifies the path to the icon to be displayed on the Edge toolbar, and default_popup specifies the HTML file defining the UI of the popup that appears on clicking the extension.
  • icons: These specify the paths to various sizes of the extension’s icon.

Here is the full list of parameters supported by Manifest V3.
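To make the fields above concrete, here is a small Python sketch that writes an illustrative manifest.json into the extension/ folder. The name, description, version, and tooltip values are placeholders, and the repository’s manifest may differ. The permissions, host_permissions, action, and icons entries follow the description above, with the single icon reused for every size.

```python
import json
from pathlib import Path

ICON = "assets/icons/icon.png"  # the single icon from the folder layout

# Illustrative Manifest V3 configuration; name/description/version/title are placeholders.
MANIFEST = {
    "manifest_version": 3,
    "name": "Janus AI: Smart Paraphrasing Solution",
    "description": "AI-powered text paraphraser and synonym finder",
    "version": "1.0.0",
    # Permissions needed for the chrome.tabs and chrome.scripting APIs.
    "permissions": ["tabs", "scripting"],
    # Access to any HTTPS host, since the active tab is only known at runtime.
    "host_permissions": ["https://*/"],
    "action": {
        "default_title": "Janus AI",
        "default_icon": ICON,
        "default_popup": "popup/popup.html",
    },
    # The same image is reused for every icon size.
    "icons": {size: ICON for size in ("16", "32", "48", "128")},
}


def write_manifest(extension_dir: Path = Path("extension")) -> Path:
    """Serialize MANIFEST to <extension_dir>/manifest.json and return its path."""
    extension_dir.mkdir(parents=True, exist_ok=True)
    path = extension_dir / "manifest.json"
    path.write_text(json.dumps(MANIFEST, indent=2) + "\n", encoding="utf-8")
    return path
```

Calling write_manifest() from the project root produces extension/manifest.json. All paths inside the manifest are relative to the extension/ folder.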

Creating the user interface

In the manifest file, the default_popup in action is set to popup/popup.html, which defines what the UI of the popup for Janus AI looks like. Below is an outline of what the UI looks like and how it behaves.

When the extension is clicked, the popup should contain:

  • An input box (or textarea) for the user to input the text to be paraphrased (either by typing or by selecting text on a tab and clicking the extension)
  • A “Rephrase” button, which when clicked, sends the input to the API server for paraphrasing

While the extension waits for the server’s response, a spinner can be shown to indicate that the result is loading. Once the response is received, the paraphrased text should be displayed in a textarea so that it can be edited if needed. The keywords and their synonyms should also be displayed to the user.

To facilitate the efficient and elegant design of the UI, Bootstrap v5.1 is used as it provides several customizable off-the-shelf UI components, without the need to write explicit CSS code.

This is what the extension/popup/popup.html file looks like:

Toward the end, we include the popup.js file, which implements the actual functionality of the extension.

Leveraging Chrome APIs to select text in the active tab

The first of the two broad categories that the functionality of Janus AI is divided into involves obtaining the text selected by the user in the browser’s active tab and populating the input textarea. This is achieved by adding the following code in the extension/popup/popup.js file:

This code adds an event listener that runs once the popup’s DOM content has loaded (that is, each time the extension icon is clicked) and fetches the text selected in the active browser tab. It does so programmatically, using the chrome.scripting API to inject the _getSelectedTextFromTab() function into the active tab, which is obtained using the chrome.tabs API.

Adding functionality to interact with the API server

The second category of functionality implemented in the popup.js file involves interacting with the API server to fetch the paraphrased text and synonyms and display the results to the user. The following code implements this functionality:

This adds a click event listener that issues a POST request to the API server for non-empty user input when the “Rephrase” button is clicked. Once the response is received by the extension, the paraphrased text is placed in the output textarea while the synonyms are dynamically displayed in a tabular format.

The extension is now fully functional and can be loaded into Microsoft Edge for testing.

Using the extension on Microsoft Edge

To load the extension into Microsoft Edge, go to the Manage extensions menu using edge://extensions/, enable the Developer mode, and select the extension/ folder after clicking on Load Unpacked.

Once loaded, the icon for Janus AI appears on the Edge toolbar. After ensuring that the API server is up and running at http://localhost:8000, you can use the extension by selecting some text on any browser tab and clicking the extension icon on the toolbar.

Working demo of Janus AI extension on Microsoft Edge

Distribution of the extension for public use

Once the extension has been developed and tested, it is ready for distribution. Before this, however, the server must be hosted online. Although it’s possible to use platforms like Heroku or set up an Azure Compute instance to host the server, I explore a more scalable and hassle-free approach that uses Docker and Azure Container Instances.

Containerizing and hosting the server using Azure Container Instances

Note: This section assumes that you have Docker and Azure CLI installed already. If not, they can be installed using the installation docs for Docker and Azure CLI.

Containerization refers to the packaging of software code with just the OS libraries and dependencies required to run the code into a single lightweight executable — called a container — that runs consistently on any infrastructure.

I use Docker to containerize the API server before hosting it on Azure Container Instances. To build a Docker image of the server, I create a Dockerfile in the server/ directory with the following code:

After creating the Dockerfile, the Docker image for the server is built by running docker build -t janus_ai . from within the server/ folder, which tags the resulting image with the name janus_ai.

To run this image in a container on Azure, we need to create a container registry on Azure. Using Azure CLI, this can be done as follows:

# Log into Azure CLI
az login
# Create a resource group (calling it 'janus')
az group create --name janus --location eastus
# Create a container registry (calling it 'januscr'; registry names may contain only letters and numbers)
az acr create --resource-group janus --name januscr --sku Basic

When the registry is created, the JSON output contains a loginServer value (of the form januscr.azurecr.io). This value should replace <login-server> when the image is pushed to the registry:

# Log into the registry
az acr login --name januscr
# Tag the image with the name of the login server of the registry
docker tag janus_ai <login-server>/janus_ai
# Push the image to the registry
docker push <login-server>/janus_ai

With this, we can deploy the image to Azure Container Instances:

# Create a container instance (instance names and DNS labels may use lowercase letters, numbers, and hyphens, but not underscores)
az container create --resource-group janus --name janus-ai --image <login-server>/janus_ai --registry-login-server <login-server> --registry-username <username> --registry-password <password> --dns-name-label janus-ai --ports 8000

Note: The dns-name-label must be unique within the Azure region where the instance is created. Because the registry is private, the container instance also needs credentials to pull the image; <username> and <password> are placeholders for the registry credentials.

The deployment completes in a few moments, and the fully qualified domain name (FQDN) of the container can be found using az container show --resource-group janus --name janus-ai --out table --query "{FQDN:ipAddress.fqdn}".

Finally, the base_url in the extension/popup/popup.js file must be changed to point to the FQDN returned by the previous command. It must also include the port the container exposes, i.e., http://<FQDN>:8000.

Publish the extension on Microsoft Edge Add-ons website

Having deployed the server, the extension can be published on the Microsoft Edge Add-ons website to increase its reach and make it available to other Microsoft Edge users. The detailed steps to do so are described in this documentation.

Concluding remarks

In this article, I have illustrated how to develop a text paraphrasing extension for Microsoft Edge that is powered by state-of-the-art NLP models. I start by creating the API server with the modules for paraphrasing, keyword extraction, and synonym finding, followed by building the UI and functionality of the extension from scratch. Toward the end, I also discuss how to containerize the server, deploy it using Azure Container Instances, and publish the extension on the Microsoft Edge Add-ons website.

Hopefully, this article has given you a good understanding of how to leverage NLP solutions in building your own extensions for Microsoft Edge.

All the code mentioned in this article is available in this repository.

Tezan Sahu is on LinkedIn and GitHub.

References

  1. How To Paraphrase Text Using PEGASUS Transformer
  2. KeyBERT Documentation
  3. NLTK :: Sample usage for wordnet
  4. Overview of Microsoft Edge extensions — Microsoft Edge Development | Microsoft Docs
  5. Extensions — Chrome Developers
  6. Quickstart — Build a container image on-demand in Azure — Azure Container Registry | Microsoft Docs
  7. Quickstart — Deploy Docker container to container instance — Azure CLI — Azure Container Instances | Microsoft Docs
