Building an Anomaly Detection service for Splunk Cloud Platform

Deven Navani
Splunk Engineering · Sep 4, 2019

By Deven Navani, Summer 2019 Machine Learning Intern

What is the right approach to identifying anomalies in multivariate time-series data?

There isn’t necessarily a correct way, because anomaly detection is (more often than not) an unsupervised machine learning task. Unlike tasks such as image classification, there is no inherent right or wrong answer when classifying a point as an anomaly. How exceptional should an anomaly be? What percentage of the dataset should be labeled as anomalous? The answers to these questions are a function of user preference and use case.

Why does user preference matter? Some users may want our service to mark data points with even the slightest deviation from the norm. This may mean more false positives, but that would be desirable in a high-stakes scenario (think irregular heartbeat detection).

Two considerations motivated our approach to building an Anomaly Detection service for Splunk Cloud’s ML API service:

  1. User accessibility
  2. Tried and tested machine learning methods

In the next two sections, I’ll describe our approach, and I’ll wrap up with some thoughts regarding implementation.

User Accessibility

First and foremost, an anomaly detection service should be accessible to as wide a developer population as possible. By accessible, we mean that using the service requires little to no ML knowledge.

Given that not every Splunk user is a machine learning engineer or a data scientist, we decided to abstract away as much difficulty as possible and expose only a single parameter: sensitivity.

The sensitivity parameter is a real number between 0 and 1, exclusive. If a Splunk user specifies a greater sensitivity value, our service will return more anomalies. In a sense, a higher sensitivity means it is “easier” for a point to be labeled as an anomaly. Our algorithm’s default sensitivity value is 0.15 (more on that later).

The highlight here is that we have dimensionally reduced the inputs our users have at their disposal. Within the Machine Learning Toolkit (MLTK), a separate service that is a machine learning extension to Splunk Enterprise, users are expected to run commands as shown in the image below:

Algorithms Splunk MLTK users can utilize for anomaly detection

The commands above present two challenges to those seeking to perform anomaly detection without the necessary ML knowledge:

  1. Specifying the model (e.g. LocalOutlierFactor): which model is best for the user’s data? As we’ll discuss below, the math behind each algorithm makes it suitable for specific scenarios.
  2. Tuning hyperparameters: once the model is picked, how should the user set the hyperparameters? For example, if the user wants more sensitive anomaly detection, how should they manipulate the leaf_size parameter in the LocalOutlierFactor constructor? (See the sketch below.)
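
For a sense of what that burden looks like, here is a rough local equivalent using scikit-learn, which MLTK’s LocalOutlierFactor is built on. The toy data and hyperparameter values below are invented purely for illustration:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Toy two-feature dataset; values are made up for illustration.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [8.0, 8.0]])

# The user must pick the model *and* its hyperparameters.
lof = LocalOutlierFactor(n_neighbors=2, leaf_size=30, contamination=0.25)
labels = lof.fit_predict(X)  # -1 = outlier, 1 = inlier
print(labels)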

Now, there actually is an anomalydetection command within MLTK, and its creation was very much motivated by user accessibility. However, the brain of this command is a single algorithm — DensityFunction. While this work meets our first requirement, it doesn’t quite make use of the most recent research behind anomaly detection.

Tried and Tested Science

I didn’t use the phrase “state-of-the-art” because that implies superior performance. As we discussed above, performance is difficult to measure when it comes to anomaly detection: datasets don’t come with points that are inherently anomalous.

Relying on a single algorithm is flawed because every algorithm colors the final results with its own bias. For example, LocalOutlierFactor is better at identifying local anomalies because it compares a sample’s abnormality score only with the scores of its neighbors, while OneClassSVM performs exceptionally well on sparse data.
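
As a rough illustration of this bias, two detectors run on the same points can disagree. This is a local scikit-learn sketch with invented data and parameters, not part of the service itself:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Invented data: a dense cluster plus one point slightly off to the side.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.5, 0.5]])

lof_labels = LocalOutlierFactor(n_neighbors=3).fit_predict(X)
svm_labels = OneClassSVM(nu=0.2, gamma='scale').fit_predict(X)

# The two algorithms embed different assumptions, so their
# -1 (outlier) / 1 (inlier) labels need not agree.
print(lof_labels, svm_labels)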

The question now becomes, how do we eliminate these biases? We use an ensemble. Ensemble learning is a popular ML technique because the aggregation of multiple models is often less noisy and more robust than any individual model.

Ensemble learning for anomaly detection purposes has increased in popularity since the publication of the book Outlier Ensembles: An Introduction by Charu C. Aggarwal and Saket Sathe in 2017.

An important consensus we gleaned from this book and various papers is that ensembling several anomaly detection algorithms “blunts” the peak performance of the single most suitable model for a given dataset. However, for consistently strong performance across a multitude of datasets and scenarios, ensembling has emerged as the most dependable approach.

Once we decided on pursuing an ensemble technique, we went with the most straightforward ensembling approach: a simple average.

Our final algorithm is detailed in the diagram below:

Algorithm overview

Each row in the table on the left represents an event in the dataset. The numbers in the first four columns of the table represent anomaly scores we have normalized to the range [0,1]. Anomaly scores are a measure of how anomalous an event is — the closer to 1, the more anomalous. The last column of the table consists of the averages of these anomaly scores.
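
In the service, those four score columns come from the four ML API workflows. Purely as a local illustration of how raw detector outputs can be turned into such a table, here is a sketch using scikit-learn and invented data. Min-max normalization is an assumption (the post doesn’t specify the exact scheme), and only the two algorithms named in this post appear here; the real ensemble uses four:

import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

def to_anomaly_score(raw):
    # Min-max normalize to [0, 1], with 1 meaning most anomalous (an assumed scheme).
    raw = np.asarray(raw, dtype=float)
    return (raw - raw.min()) / (raw.max() - raw.min())

X = np.random.RandomState(0).normal(size=(100, 3))  # invented events

lof = LocalOutlierFactor(n_neighbors=20).fit(X)
svm = OneClassSVM(gamma='scale').fit(X)

scores = pd.DataFrame({
    # Both raw scores are "higher = more normal", so negate before normalizing.
    'LocalOutlierFactor': to_anomaly_score(-lof.negative_outlier_factor_),
    'OneClassSVM': to_anomaly_score(-svm.decision_function(X)),
})
scores['average'] = scores.mean(axis=1)  # the ensemble's per-event score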

We compare these averages to a threshold, which we define as 1 - sensitivity. Since the default sensitivity is 0.15, the default threshold is 0.85. We arrived at 0.15 as the default by tuning the sensitivity parameter until average performance metrics (F1 / precision / accuracy / recall) were maximized across all our available labeled datasets. For an event to be labeled as an anomaly, its average anomaly score must exceed the threshold. This makes sense mathematically: if a user wants a more sensitive anomaly detection service, the threshold for an event to be labeled as an anomaly should be lower.
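
One way to reproduce that tuning procedure is a simple parameter sweep. This is a sketch only: the labeled datasets and helper names are hypothetical, and it optimizes F1 alone rather than the average of several metrics described above:

import numpy as np
from sklearn.metrics import f1_score

def label_events(avg_scores, sensitivity):
    # An event is anomalous (1) when its average score exceeds 1 - sensitivity.
    return (np.asarray(avg_scores) > 1 - sensitivity).astype(int)

def pick_default_sensitivity(labeled_datasets, candidates=np.arange(0.05, 1.0, 0.05)):
    # labeled_datasets: list of (avg_scores, true_labels) pairs (hypothetical).
    return max(candidates,
               key=lambda s: np.mean([f1_score(y, label_events(scores, s))
                                      for scores, y in labeled_datasets]))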

We return a final predictions table to our user, where 1s represent anomalies.

Implementation

First, some definitions. A workflow is a reusable configuration of task(s), and a build is an execution of workflow tasks on a specific dataset. Our proof of concept works by running four ML API workflows (one for each algorithm) in parallel. Here’s how this looks for the LocalOutlierFactor algorithm, using the Python SDK for ML API:

#### LOF ####
# Route this workflow's output events to a dedicated source.
LOF_OUTPUTSOURCE = 'dnavani-lof'
LOF_events_output = OutputData(kind='Events',
                               destination=Events(sourcetype=SOURCETYPE,
                                                  source=LOF_OUTPUTSOURCE))

# One FitTask per algorithm; LocalOutlierFactor here.
LOF_task = FitTask(algorithm='LocalOutlierFactor',
                   fields=Fields(features=features_to_use),
                   parameters={})

# Create the workflow, then execute it on our input data as a build.
workflow = Workflow(tasks=[LOF_task])
LOF_workflow = custom_scloud.ml.create_workflow(workflow=workflow)
wfbuild = WorkflowBuild(input=spl_input, output=LOF_events_output)
LOF_wfbuild = custom_scloud.ml.create_workflow_build(id=LOF_workflow.id, workflow_build=wfbuild)
#### LOF ####

You would run the exact same code as above for each of the other algorithms, except with variable name changes.
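
To avoid the copy-paste, you could loop over the algorithm names with the same SDK calls. This is a sketch under assumptions: only LocalOutlierFactor and OneClassSVM are named in this post, so the list below is incomplete, and the per-algorithm source names are invented:

algorithms = ['LocalOutlierFactor', 'OneClassSVM']  # plus the other two ensemble members

builds = {}
for algo in algorithms:
    output = OutputData(kind='Events',
                        destination=Events(sourcetype=SOURCETYPE,
                                           source=f'dnavani-{algo.lower()}'))
    task = FitTask(algorithm=algo, fields=Fields(features=features_to_use), parameters={})
    wf = custom_scloud.ml.create_workflow(workflow=Workflow(tasks=[task]))
    wfbuild = WorkflowBuild(input=spl_input, output=output)
    builds[algo] = custom_scloud.ml.create_workflow_build(id=wf.id, workflow_build=wfbuild)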

Any user of our Python SDK can replicate the implementation of our ensemble algorithm. For each algorithm, create a FitTask, create a workflow with this task, and run the workflow build. After that, the scores returned by each algorithm can be averaged and compared to a threshold, like so:

scoring_system_threshold = 1 - SENSITIVITY

def scoring_system(row):
    # Label an event anomalous (1) when its average score exceeds the threshold.
    if row['average'] > scoring_system_threshold:
        return 1
    return -1

scores['average'] = scores.mean(axis=1)  # create avg. scores column
predictions = scores.apply(scoring_system, axis=1)
print(predictions)

Full integration into Splunk Cloud’s MLAPI service, however, requires a few more steps. We don’t want our users to have to create four separate workflows. Our ensemble technique would work best with four tasks under a single workflow.

As MLAPI development continues and the product matures from a beta release, workflows will support graph logic, as opposed to just a single task or list of tasks. In our case, the logic of our workflow would function as a MapReduce computational graph, with our four sub-algorithms as the “mappers” and our averaging / ensemble step as the “reducer”. Our users will only have to create one workflow.
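
Since that graph support didn’t exist yet at the time of writing, here is only a conceptual sketch of the shape such a workflow could take; none of these field names come from the actual ML API:

# Hypothetical structure, for illustration only (not the real ML API schema).
ensemble_workflow = {
    'mappers': [
        {'task': 'fit', 'algorithm': 'LocalOutlierFactor'},
        {'task': 'fit', 'algorithm': 'OneClassSVM'},
        # ...the other two anomaly detection algorithms...
    ],
    'reducer': {'task': 'average_scores', 'threshold': '1 - sensitivity'},
}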

ML API’s future is very exciting, and I’m looking forward to its versatility in supporting complex algorithm designs such as ours.

Deven Navani is a Machine Learning intern on Splunk’s MLAPI team within SCP. He is a sophomore in UC Berkeley’s M.E.T. program, where he is majoring in EECS and Business.
