“Nobody is Perfect.” This quote applies not only to us humans but also to the data that surrounds us. Any data science practitioner needs to understand the imperfections present in the data and handle them accordingly in order to get the desired results. One such imperfection is the inherent Class Imbalance, which is highly prevalent in real-world datasets. In this blog we will cover different Sample Weighting schemes that can be applied to any Loss Function to account for the Class Imbalance present in your data.
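As a rough illustration of what a sample weighting scheme can look like, here is a minimal sketch in plain Python. It uses inverse class frequency weighting, which is one common heuristic and not necessarily one of the schemes the full post covers. The function names `inverse_frequency_weights` and `weighted_cross_entropy`, the toy labels, and the toy predictions are all assumptions made for this example.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean negative log-likelihood.

    probs[i] maps each class to the predicted probability for sample i.
    Each sample's loss is scaled by the weight of its true class.
    """
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum


# Toy imbalanced dataset: 8 samples of class 0, 2 samples of class 1.
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)

# A hypothetical model that always favors the majority class.
probs = [{0: 0.9, 1: 0.1} for _ in labels]

uniform = {0: 1.0, 1: 1.0}
unweighted_loss = weighted_cross_entropy(probs, labels, uniform)
weighted_loss = weighted_cross_entropy(probs, labels, weights)
```

With these toy numbers, the weighted loss comes out higher than the unweighted one, because the mistakes on the minority class now count for more. The same per-sample scaling can be applied to other loss functions.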

What is the Class Imbalance Problem?

The Class Imbalance problem is a problem…


The world we live in is not a just world. It is infected by different kinds of bias, be it Gender Bias or Racial Bias. More recently, the world was shocked by the tragic news of George Floyd’s death due to extreme police brutality. This brought issues like Systemic Racism, unconscious bias, and the Racial and Gender gaps into focus for many people, organizations, and nations. This blog talks about what we at GumGum can do to bring change by utilizing our Natural Language Processing technology to shed light on potential bias that websites may have in their content. …


It is extremely important to understand the different evaluation metrics and when to use them. Evaluating your model on inadequate metrics and then judging it by the improvements achieved on those metrics is a huge trap. Often, especially in industry, these metrics decide whether a newer model goes to production. Therefore, as a Data Scientist, one should be aware of the pros and cons of different evaluation metrics in order to avoid falling into this trap.

Evaluating a Keyword Extraction model is not as straightforward as it is to evaluate a model for a Classification



With the 2020 U.S. Presidential election approaching, talk of who Democratic nominee Joe Biden will tap as his VP candidate has intensified. Given the large volume of publisher inventory that we have access to for brand safety and contextual analysis, we decided to dig into the “Veepstakes” and analyze the frequency of mentions and dominant document sentiment among the most commonly listed potential running mate candidates.

For this analysis, we mined data from Verity, our Contextual Intelligence platform that leverages Computer Vision and Natural Language Processing, to track the mentions of Joe Biden + each of the potential running mates…



Contextual brand safety is an ongoing series, and this is the second blog in it. Through this series, we talk about the steps involved in doing multi-label text classification in industry. This blog post covers model training and evaluation.

1. Introduction

Brand safety is an important offering of GumGum. Contextual Brand Safety-I talks about the problem and data preprocessing techniques in depth. In this blog post, we will discuss model training, evaluation and steps to production.

2. Experimental setup

We set up a multi-step MLflow project tracking system to track and store artifacts across each step, i.e.,

  1. Data loading and preprocessing (1…



Continuing the series of blogs on different keyword extractors, this blog brings us to the Graph Based approaches. We will cover what inspired the researchers to start exploring a graphical solution for Keyword Extraction and we will then discuss the four Graph Based approaches (TextRank, SingleRank, TopicRank and PositionRank). If you would like to read up on different Statistical Approaches, please refer to the first blog in this series.

Introduction — Graph Based Approaches

All the graph based approaches employ a ranking algorithm like HITS or PageRank. These algorithms compute the importance of a vertex in the graph. …
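To make the idea of vertex importance concrete, here is a minimal PageRank sketch using power iteration in plain Python. The toy graph, its vertex names, and the `pagerank` function are assumptions made for this example. It does not reproduce the graph construction used by any of the four approaches named above.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Compute PageRank scores for a directed graph by power iteration.

    graph maps each vertex to the list of vertices it links to.
    Every linked vertex must also appear as a key in graph.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Rank held by vertices without out-links is spread evenly over all vertices.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}

        # Each vertex passes its rank in equal shares to the vertices it links to.
        for v in nodes:
            for u in graph[v]:
                new_rank[u] += damping * rank[v] / len(graph[v])

        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break

    return rank


# Hypothetical toy graph: "c" receives links from three other vertices.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = pagerank(graph)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the vertex `c` ends up with the highest score because most links point to it, while `d`, which no vertex links to, ends up with the lowest.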



Exploring Different Keyword Extractors is an ongoing series which contains a total of three blogs. This blog is the first in this series. It provides an introduction to Keyword Extraction and why it is important. I also go into the details of three Statistical approaches for Keyword Extraction. The second blog will cover four graph-based Approaches for Keyword Extraction and the third one will cover different Evaluation Metrics and a comparison of different statistical and Graph Based approaches.

Introduction

The pace at which data and information are being generated makes summarizing them a challenge. According to Netcraft’s January…


In this blog we will look at the impact of Covid-19 based on GumGum’s publisher network from January 10th 2020 to April 8th 2020. I utilized GumGum’s AI capabilities to classify all the web pages into different IAB categories to see how each category was impacted by Covid-19. Links to interactive versions of each of the graphs in this blog are also included.

Data Collection

I queried GumGum’s database to collect the processed data from the traffic of English web pages seen by GumGum over the course of January, February, March and April. Due to the high data volume of GumGum’s…


Contextual brand safety is an ongoing series. This is the first blog in this series. Through this series, we talk about steps to be taken to do multi-label text classification in the industry. This blog post sets the stage by talking about the problem and data collection.

Introduction

GumGum is dedicated to ensuring a brand-safe environment for all our clients, advertisers and publishers alike. To implement brand safety, we have a variety of methods that help ensure we deliver ads on safe, relevant and high-quality content.

One of the most important measures taken to ensure brand safety…


With the advent of cloud and Serverless technologies, the number of architectural options available for building systems has increased manyfold. I see many teams struggling to decide whether to use serverless or container-based systems in the organization. My objective is to present one real-life case we encountered at GumGum and the thought process behind choosing containers over serverless.

Verity - Headless Content Extractor

GumGum takes pride in its contextual intelligence capabilities. Our proprietary contextual intelligence API Verity uses Natural Language Processing in combination with Computer Vision to derive context and brand safety information for a webpage. …

GumGum Tech Blog

Thoughts from the GumGum tech team
