Real-time machine learning inference at scale has become an essential part of modern applications. GumGum’s Verity engine powers the industry’s most sophisticated contextual targeting product by analyzing thousands of pieces of digital content every second, around the clock. This is a challenging undertaking that requires deploying deep learning models using an event-driven streaming architecture on an elastic cloud-native cluster.
At GumGum, we use Apache Kafka’s high-throughput, scalable streaming platform to connect the various components of our machine learning pipelines. Until recently, we deployed the underlying inference microservices solely on Amazon ECS, which is a great choice due to its security…
Contextual Brand Safety is an ongoing series, and this is its second post. Throughout the series, we walk through the steps involved in doing multi-label text classification in an industry setting. This post covers model training and evaluation.
Brand safety is an important GumGum offering. Contextual Brand Safety-I covers the problem and data preprocessing techniques in depth. In this post, we discuss model training, evaluation, and the steps to production.
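As a rough illustration of what evaluating a multi-label text classifier can involve, here is a minimal sketch that computes per-label, micro-averaged, and macro-averaged F1 from sets of true and predicted labels. The label names and the toy data are hypothetical and are not taken from the post; the post's actual metrics and tooling are not shown in this excerpt.

```python
from typing import Dict, List, Set


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> Dict[str, float]:
    """Return per-label F1 plus micro- and macro-averaged F1."""
    labels = sorted(set().union(*y_true, *y_pred))
    scores: Dict[str, float] = {}
    tp_all = fp_all = fn_all = 0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        scores[label] = f1_from_counts(tp, fp, fn)
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    # Micro-F1 pools the counts over all labels; macro-F1 averages per-label F1.
    scores["micro_f1"] = f1_from_counts(tp_all, fp_all, fn_all)
    scores["macro_f1"] = sum(scores[l] for l in labels) / len(labels) if labels else 0.0
    return scores


# Hypothetical toy data: each document can carry zero, one, or several labels.
y_true = [{"violence"}, {"violence", "drugs"}, set(), {"drugs"}]
y_pred = [{"violence"}, {"violence"}, {"drugs"}, {"drugs"}]

result = multilabel_f1(y_true, y_pred)
print(result)
```

Micro-averaging weights every label decision equally, so frequent labels dominate; macro-averaging weights every label equally, which makes weak performance on rare labels easier to spot.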
We set up a multi-step MLflow project to track and store artifacts across each step, i.e.,
In the previous post, we talked about using a multilabel classifier for threat classification to address cases where the threat in the image is not the salient object. In this post, we discuss Exploratory Data Analysis (EDA) of our multilabel dataset and its results. Since our model needs to analyze millions of images, keeping its inference time low is of the utmost priority for scalability. To that end, we also present results on using mixed precision training for our EfficientNet-based threat classifier. Since this is a proprietary dataset, the specifics…
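For readers new to mixed precision, the sketch below illustrates one numerical detail that mixed precision training typically has to handle: very small gradient values underflow to zero in 16-bit floats, and scaling the loss (and therefore the gradients) before the cast keeps them representable. It uses only Python's `struct` module to round values to half precision. The gradient value and the scale factor are assumptions chosen for illustration; the excerpt does not say which framework or settings were used for the EfficientNet-based classifier.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LOSS_SCALE = 1024.0  # assumed scale factor, for illustration only
tiny_grad = 1e-8     # hypothetical gradient value, below the fp16 subnormal range

# Cast directly to fp16: the value is too small and rounds to zero.
unscaled = to_fp16(tiny_grad)

# Scale first, cast to fp16, then unscale in higher precision.
scaled = to_fp16(tiny_grad * LOSS_SCALE)
recovered = scaled / LOSS_SCALE

print("direct fp16 cast:   ", unscaled)
print("scaled fp16 value:  ", scaled)
print("recovered gradient: ", recovered)
```

Frameworks that support mixed precision usually automate this scaling and unscaling, so the sketch is only meant to show why the step exists.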
Sanja Stegerer has been working at GumGum as a Data Scientist for the past two years. She has worked on various parts of our Natural Language Processing systems. In the talk below, she explains one of the most important components of GumGum’s NLP stack: the Threat Classification System. The Threat Classification System allows us to identify whether or not a page is suitable for advertising. If a page’s content includes violence, sexual content, illegal acts, disasters, etc., advertisers do not want to show their ads on that page. In this talk, Sanja explains how exactly we make machines identify these…
At GumGum, providing a brand-safe environment for our advertisers is of the utmost priority. To achieve this, the publisher’s inventory is scanned to avoid ad misplacement. As CV scientists, we build systems that can detect and classify threats present in the publisher’s inventory, which could be images and/or videos. To detect and classify these threats, we employ image classification algorithms based on convolutional neural networks. A conventional multiclass image classifier can often work well when the object under consideration is the only one in the image or occupies a large enough area of the…
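To make the multiclass versus multilabel distinction concrete, here is a minimal sketch in plain Python. A multiclass head applies a softmax and reports the single most likely class, while a multilabel head applies an independent sigmoid to each class and reports every class above a threshold. The class names, logits, and the 0.5 threshold are hypothetical and are not taken from GumGum's models.

```python
import math
from typing import Dict, Set


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Normalize logits into probabilities that sum to 1 across classes."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def sigmoid(x: float) -> float:
    """Map a single logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical logits for an image where a person is the salient object
# and a weapon occupies only a small part of the frame.
logits = {"person": 4.0, "weapon": 2.5, "alcohol": -1.0}

# Multiclass: classes compete, so only the top class is reported.
probs = softmax(logits)
multiclass_prediction = max(probs, key=probs.get)

# Multilabel: each class is scored on its own against a threshold.
THRESHOLD = 0.5  # assumed decision threshold
multilabel_prediction: Set[str] = {
    name for name, value in logits.items() if sigmoid(value) > THRESHOLD
}

print("multiclass:", multiclass_prediction)
print("multilabel:", sorted(multilabel_prediction))
```

In this toy case the multiclass output names only the salient object, while the multilabel output also flags the smaller, non-salient class.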
Written by Sanja Stegerer on April 4, 2018
Sanja Stegerer has been working at GumGum as a Data Scientist for the past two years. She has worked on various parts of our Natural Language Processing systems. In the talk below, she explains one of the most important components of GumGum’s NLP stack: the Threat Classification System. The Threat Classification System allows us to identify whether or not a page is suitable for advertising. If a page’s content includes violence, sexual content, illegal acts, disasters, etc., advertisers do not want to show their ads on that page. In this talk, Sanja…
On May 28, 2019, Corey Gale and I (Greg Chu) presented a talk titled “How GumGum Serves its CV at Scale” at the LA Computer Vision Meetup in GumGum’s Santa Monica office.
Given the rapidly growing utility of computer vision applications, how do we deploy these services in high-traffic production environments to generate business value? Here we present GumGum’s approach to building infrastructure for serving computer vision models in the cloud. We’ll also demo code for building a car make-model detection server.
A Presentation and Meetup discussion by Divyaa Ravichandran and Sanja Stegerer
About the authors:
Sanja Stegerer is an NLP Scientist and has been with GumGum for 3 years now.
Divyaa Ravichandran has been a Computer Vision Scientist at GumGum for the past 2 years, and has been in the field for almost 3 years now.
As machine learning engineers, the CV and NLP teams at GumGum work on improving GumGum’s existing CV and NLP capabilities, developing solutions for new advertising campaigns, and maintaining code in a production environment.
We recently (15th May 2019) presented a Meetup talk regarding a product…
Written by Cambron Carter on June 5, 2018
GumGum recently hosted Dr. Genquan Stone Duan and Andrew Pierno of WiZR at our LA Computer Vision Meetup. This presentation is an interactive exploration of WiZR’s machine learning and computer vision infrastructure, which is used to provide real-time analytics for the purposes of security and surveillance. Anyone interested in embedded machine learning and computer vision should check out this technical deep-dive.
Written by Iris Fu and Divyaa Ravichandran, Computer Vision Scientists, on April 19, 2018
GumGum Sports processes enormous amounts of media each day. This media comes from a variety of sources and in a variety of forms, including social media posts and broadcast/streaming videos. Our goal is to identify media that is relevant to our clients in order to estimate the value of their sponsorships and placements.
The challenge is in processing massive volumes of posts in a short period of time. For example, to estimate the value of each brand’s exposure in a basketball game, we would need to consider all of the available data and…
Thoughts from the GumGum tech team