“With great models come slower inference speeds.”

Deep Learning has evolved immensely and has Transforme(r)d NLP completely in the past five years. Although these models achieve state-of-the-art results on various NLP tasks, they are large and slow. Bringing them to production can be painful because of their large memory footprint and slow inference. At GumGum, we spend a considerable amount of time building models that are not just accurate but also lean and fast. Verity, the engine that powers our contextual targeting capabilities, requires models that can scale…


“Nobody is perfect.” This quote applies not just to us humans but also to the data that surrounds us. Any data science practitioner needs to understand the imperfections present in the data and handle them accordingly in order to get the desired results. One such imperfection is the inherent Class Imbalance that is prevalent in most real-world datasets. In this blog we will cover different Sample Weighting schemes that can be applied to any Loss Function to account for the Class Imbalance present in your data.
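To make the idea of sample weighting concrete, here is a minimal sketch of one common scheme, inverse class frequency, applied to a cross-entropy loss. It is an illustration only: the function names, the toy labels, and the choice of this particular scheme are assumptions, not necessarily the schemes the blog goes on to cover.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    (Hypothetical helper, used here only for illustration.)
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of per-sample negative log-likelihoods.

    probs[i] is a dict mapping each class to its predicted probability
    for sample i; labels[i] is the true class of sample i.
    """
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum


# Toy imbalanced dataset: 8 negatives, 2 positives.
labels = ["neg"] * 8 + ["pos"] * 2
weights = inverse_frequency_weights(labels)

# A model that is confident on negatives but poor on positives.
probs = [{"neg": 0.9, "pos": 0.1}] * 8 + [{"neg": 0.7, "pos": 0.3}] * 2
unweighted = weighted_cross_entropy(probs, labels, {"neg": 1.0, "pos": 1.0})
weighted = weighted_cross_entropy(probs, labels, weights)
```

With these weights, mistakes on the minority class contribute more to the loss, so `weighted` comes out larger than `unweighted` for this toy model.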

What is the Class Imbalance Problem?

The Class Imbalance problem is a problem…


The world we live in is not a just world. It is infected by different kinds of bias, be it Gender Bias or Racial Bias. More recently, the world was shocked by the tragic news of George Floyd’s death due to extreme police brutality. This brought issues like Systemic Racism, unconscious bias, and the Racial and Gender gaps into focus for many people, organizations, and nations. This blog talks about what we at GumGum can do to bring change by using our Natural Language Processing technology to shed light on potential bias that websites may have in their content. …


It is extremely important to understand the different evaluation metrics and when to use them. Evaluating your model on inadequate metrics, and then judging it by the improvements achieved on those metrics, is a huge trap. Often, especially in industry, these metrics decide whether a newer model goes to production. Therefore, as a Data Scientist, one should be aware of the pros and cons of different evaluation metrics in order to avoid falling into this trap.

Evaluating a Keyword Extraction model is not as straightforward as it is to evaluate a model for a Classification…


Continuing the series of blogs on different keyword extractors, this blog brings us to the graph-based approaches. We will cover what inspired researchers to start exploring a graph-based solution for Keyword Extraction, and we will then discuss four graph-based approaches (TextRank, SingleRank, TopicRank and PositionRank). If you would like to read up on the different statistical approaches, please refer to the first blog in this series.

Introduction — Graph Based Approaches

All the graph-based approaches employ a ranking algorithm like HITS or PageRank. These algorithms compute the importance of a vertex in the graph. …
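As a rough illustration of how such a ranking algorithm scores vertices, here is a minimal PageRank sketch using plain power iteration over a small word graph. The graph, the damping factor of 0.85, the iteration count, and the function name are illustrative assumptions; this is not the exact procedure of TextRank or of any other approach named above.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Score vertices of a graph by power iteration.

    graph maps each vertex to the list of vertices it links to.
    Every linked vertex must also appear as a key in graph.
    """
    vertices = list(graph)
    n = len(vertices)
    scores = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Every vertex starts with the "random jump" share.
        new_scores = {v: (1.0 - damping) / n for v in vertices}
        for v, neighbours in graph.items():
            if not neighbours:
                # Dangling vertex: spread its score evenly over all vertices.
                for u in vertices:
                    new_scores[u] += damping * scores[v] / n
                continue
            # Otherwise split its score equally among its neighbours.
            share = damping * scores[v] / len(neighbours)
            for u in neighbours:
                new_scores[u] += share
        scores = new_scores
    return scores


# Toy undirected co-occurrence graph: each edge is listed in both directions.
word_graph = {
    "keyword": ["extraction", "graph", "ranking"],
    "extraction": ["keyword"],
    "graph": ["keyword", "ranking"],
    "ranking": ["keyword", "graph"],
}
scores = pagerank(word_graph)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the best-connected word, `keyword`, ends up with the highest score, which is the intuition a graph-based extractor relies on when it picks the top-ranked vertices as candidate keywords.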


Exploring Different Keyword Extractors is an ongoing series of three blogs, and this is the first. It provides an introduction to Keyword Extraction and why it is important. I also go into the details of three statistical approaches for Keyword Extraction. The second blog will cover four graph-based approaches for Keyword Extraction, and the third will cover different evaluation metrics and a comparison of the statistical and graph-based approaches.

Introduction

The pace at which data and information are being generated makes summarizing them a challenge. According to Netcraft’s January…


In this blog we will look at the impact of Covid-19 as seen across GumGum’s publisher network from January 10, 2020 to April 8, 2020. I used GumGum’s AI capabilities to classify all the web pages into different IAB categories to see how each category was impacted by Covid-19. Links to interactive versions of each of the graphs in this blog are also included.

Data Collection

I queried GumGum’s database to collect the processed data from the traffic of English web pages seen by GumGum over the course of January, February, March and April. Due to the high data volume of GumGum’s…

Ishan Shrivastava
