Data Science Engine @ Cuddle.ai — Anomaly Detection Enhancement

Published in Crux Intelligence, Sep 13, 2019

Anomaly Detection

Cuddle runs complex algorithms to help users stay on top of their most important business areas. This information is presented to the users in the simplest ways possible so that every user can consume it with clarity.

One of the most commonly used and relevant concepts in the business world is anomaly detection. At Cuddle, we set out to enhance our anomaly detection algorithm so that it gives users more relevant and actionable nudges.

Here is how we went about improving the data science engine behind Cuddle.

STEP 1: Populating Test Datasets

We assembled time-series data covering a range of patterns to use as test datasets. The goal was to see how well the algorithm works on different types of time-series and to find a way to reduce false positives.
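As a rough illustration of what such test data can look like, here is a minimal Python sketch that builds synthetic series with different patterns and plants a known anomaly. It is not Cuddle's actual dataset or tooling: the function names (`make_series`, `inject_spike`), the chosen patterns, and every parameter value are assumptions made for this example.

```python
import math
import random

def make_series(n=120, trend=0.0, period=0, amplitude=0.0, noise=1.0,
                growing_variance=False, seed=0):
    """Return n synthetic values combining a trend, a seasonal wave and noise."""
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    values = []
    for t in range(n):
        seasonal = amplitude * math.sin(2 * math.pi * t / period) if period else 0.0
        # Optionally let the noise scale grow over time (heteroscedasticity).
        scale = noise * (1 + t / n) if growing_variance else noise
        values.append(100.0 + trend * t + seasonal + rng.gauss(0.0, scale))
    return values

def inject_spike(values, index, size):
    """Return a copy of the series with a known anomaly added at `index`."""
    out = list(values)
    out[index] += size
    return out

# Hypothetical test datasets, one per pattern type.
test_datasets = {
    "flat": make_series(),
    "trend": make_series(trend=0.5),
    "weekly": make_series(period=7, amplitude=10.0),
    "heteroscedastic": make_series(growing_variance=True),
}

# Plant an anomaly at a known position so detections can be checked later.
spiked = inject_spike(test_datasets["weekly"], index=60, size=50.0)
```

Because the planted anomaly's position is known, a series like `spiked` makes it easy to see whether an algorithm finds the real anomaly without also flagging the regular weekly wave.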

Here are a few examples of the different types of time-series used for testing:

STEP 2: Defining Success Criteria

We circulated the test datasets among a group of users (a mix of analysts and non-analysts) and asked them to manually tag anomalies on different cuts of the time-series in the test datasets.

The manual tags showed us what users expect from an anomaly detection algorithm. From these tags, we consolidated a comprehensive set of manually tagged anomalies for each time-series in the dataset, reflecting what made sense to business users.

The tagged dataset served as a baseline: we compared each algorithm's output against it to determine which algorithm returned the best results.
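One simple way to make such a comparison concrete is to treat the manual tags as ground truth and count matches, false positives and misses. The sketch below is an illustration only; the article does not say that Cuddle computed these exact metrics, and the function name `compare_to_tags` and the index values are made up for the example.

```python
def compare_to_tags(tagged, detected):
    """Compare detected anomaly indices with manually tagged ones."""
    tagged, detected = set(tagged), set(detected)
    true_positives = tagged & detected
    false_positives = detected - tagged   # flagged, but no user tagged them
    missed = tagged - detected            # tagged by users, but not flagged
    return {
        "true_positives": sorted(true_positives),
        "false_positives": sorted(false_positives),
        "missed": sorted(missed),
        # Share of detections that users agreed with.
        "precision": len(true_positives) / len(detected) if detected else 0.0,
        # Share of user-tagged anomalies that were detected.
        "recall": len(true_positives) / len(tagged) if tagged else 0.0,
    }

# Hypothetical example: positions in one time-series.
manual_tags = [12, 40, 77]
algorithm_output = [12, 40, 55, 90]
report = compare_to_tags(manual_tags, algorithm_output)
print(report["false_positives"])  # [55, 90]
print(report["precision"])        # 0.5
```

A low precision signals many false positives, which is the failure mode the team was most concerned about.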

STEP 3: Business Validation

The purpose of business validation was to determine which algorithm:
1. Returns fewer false positives
2. Accurately detects anomalies
3. Makes sense from a business user perspective

We prepared a scorecard across our algorithm options and chose to penalize algorithms that returned false positives.

The team was shown the outputs of the 4 algorithms developed and engineered by Cuddle’s data science team, alongside the manual tags for the same time-series, and was asked to rate each output.

Here are some outputs shown to the team for rating:

Each output was scored from -2 to +2. For every time-series in the test dataset, raters looked at the 4 outputs for that time-series at once.

After taking a collective score of each algorithm across every time-series in the test dataset, we had a clear winner, and the team was confident about deploying the new algorithm to the Cuddle platform.
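The scorecard aggregation described above could be sketched as follows. The article does not specify how the collective score was computed, so summing the mean rating per time-series is an assumption, and the algorithm names, series names and ratings are invented purely for illustration.

```python
from statistics import mean

# Hypothetical ratings: algorithm -> time-series -> one rating per rater (-2 to +2).
# Raters give negative scores to outputs with false positives.
ratings = {
    "algo_a": {"flat": [1, 2, 1], "weekly": [-2, -1, -1]},
    "algo_b": {"flat": [1, 1, 2], "weekly": [2, 1, 2]},
}

def collective_scores(ratings):
    """Sum each algorithm's mean rating over all time-series."""
    totals = {}
    for algo, per_series in ratings.items():
        for series, scores in per_series.items():
            if any(not -2 <= s <= 2 for s in scores):
                raise ValueError(f"rating out of range for {algo}/{series}")
        totals[algo] = sum(mean(scores) for scores in per_series.values())
    return totals

scores = collective_scores(ratings)
winner = max(scores, key=scores.get)
print(winner)  # algo_b
```

Averaging per time-series before summing keeps a series with more raters from dominating the total; other weightings would be equally defensible.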

STEP 4: UAT

The algorithm was deployed to the staging environment and tested on other time-series that were not part of the test dataset.

The enhancement was released to a small group of people for a couple of days to further validate the enhanced algorithm. The algorithm was tested against real production data that the entire internal team had access to.

STEP 5: Release

The enhanced algorithm has now been deployed, following extensive business validation and UAT, and it is showing great results.

Key Enhancements

1. Adaptivity: Most algorithms do not account for the different periodicities found across time-series, so they often flag anomalies that are not relevant. A weekend spike in alcohol sales at pubs, for example, is expected rather than anomalous.

Cuddle’s anomaly detection algorithm can now recognize multiple periodicities and patterns in variance (heteroscedasticity) and adapt to them.

[Figure: detection output, BEFORE → AFTER the enhancement]

2. Detecting Local Anomalies: Most algorithms are unable to flag local anomalies, that is, points that look unremarkable against the whole series but stand out from their immediate neighbourhood. With this enhancement, Cuddle’s anomaly detection algorithm considers both local variance and global variance to present the most relevant anomalies as nudges.

3. Significance Test: After anomalies are detected, a fail-safe mechanism weeds out false positives. These are programmatically removed from the system so that users receive the most refined set of nudges.
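To make the three ideas above more tangible, here is a minimal Python sketch that combines them on a toy series. It is not Cuddle's algorithm, whose internals the article does not describe. The techniques used here are stand-ins chosen for simplicity: a per-position median as the seasonal baseline, a robust z-score (median and MAD) computed both over a local window and over the whole series, and a Bonferroni-corrected normal tail test as the significance filter. All function names, parameters and thresholds are assumptions.

```python
import math
from statistics import median

def seasonal_residuals(values, period):
    """Adaptivity: subtract the median of each position in the cycle."""
    baseline = [median(values[phase::period]) for phase in range(period)]
    return [v - baseline[i % period] for i, v in enumerate(values)]

def robust_z(x, reference):
    """Distance of x from the reference median, in robust standard deviations."""
    centre = median(reference)
    spread = 1.4826 * median(abs(r - centre) for r in reference)
    return abs(x - centre) / max(spread, 1e-9)

def detect_anomalies(values, period=7, window=11, threshold=4.0, alpha=0.01):
    """Return indices that pass both the detection and significance steps."""
    resid = seasonal_residuals(values, period)
    flagged = []
    for i, r in enumerate(resid):
        # Local view: nearby points only, excluding the point itself.
        neighbours = resid[max(0, i - window):i] + resid[i + 1:i + window + 1]
        local_z = robust_z(r, neighbours)
        # Global view: the whole series.
        global_z = robust_z(r, resid)
        if max(local_z, global_z) < threshold:
            continue  # not a candidate under either view
        # Significance test (assumes roughly normal noise): two-sided tail
        # probability of the local deviation, Bonferroni-corrected.
        p_value = math.erfc(local_z / math.sqrt(2))
        if p_value < alpha / len(values):
            flagged.append(i)
    return flagged

# Toy series: weekend bump of +30, small deterministic noise, one planted
# weekday anomaly of +12 that stays well below the weekend level.
values = [100.0 + (30.0 if t % 7 >= 5 else 0.0) + ((4 * t) % 11 - 5) * 0.2
          for t in range(77)]
values[38] += 12.0
print(detect_anomalies(values))  # [38]
```

In this toy example the recurring weekend peaks are absorbed by the seasonal baseline and never flagged, while the much smaller weekday jump at index 38 is reported. The sketch assumes a single known period and a series spanning several full cycles; handling multiple periodicities would need a richer baseline.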