: Open Source Anomaly Detection

Mentat Innovations
Jan 30, 2018

We are proud to launch the very first version of our open-source project for Anomaly Detection and Behavioural Profiling on data streams (dsio on GitHub).

We have a long roadmap ahead of us but, as they say, release early and release often. So here it is, a minimal viable full-stack Python anomaly detector:

pip install -e git+


The purpose of the project is to perform the following functions:

  • Consume data from a variety of file and stream formats.
  • Transform data streams on the fly to derive statistics of interest such as aggregations, counts, sessions, groupings, or extract features.
  • Model the resulting stream via unsupervised machine learning to capture normal baseline behaviour either globally, or at the level of a device/user.
  • Score every new event by comparing it to the baseline model.
  • Visualise anomalous events on a lightweight, customisable dashboard with an equally lightweight back-end and minimal fuss for the user.
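To make the "transform on the fly" step a little more concrete, here is a minimal sketch of one such transformation: counting events per device in fixed (tumbling) time windows. It is an illustration only, not code from dsio; the function name `windowed_counts`, the `(timestamp, device)` event shape and the device ids are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, device id).
Event = Tuple[float, str]


def windowed_counts(
    events: Iterable[Event], window: float
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (window_start, {device: count}) for each tumbling window.

    Assumes events arrive in timestamp order, so a window can be emitted
    as soon as an event belonging to a later window is seen.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, device in events:
        start = (ts // window) * window  # start of the window containing ts
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            current_start, counts = start, Counter()
        counts[device] += 1
    if current_start is not None:
        # Emit the final, possibly partial, window.
        yield current_start, dict(counts)


stream = [
    (0.5, "car-1"), (1.2, "car-2"), (3.9, "car-1"),
    (5.1, "car-1"), (9.0, "car-2"), (10.0, "car-1"),
]
windows = list(windowed_counts(stream, window=5.0))
```

Because the function is a generator, it never holds more than one window in memory, which is what makes it suitable for a stream. Note that windows with no events are simply skipped in this sketch.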

In the spirit of a minimal first release, we start by supporting consumption from CSV files filtered by column and a couple of basic modelling and scoring options, followed by visualisation via an Elastic-Kibana solution whose dashboard is auto-generated from the column names.
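As a rough idea of what "consumption from CSV files, filtered by column" can look like, the sketch below reads a CSV stream with the standard library and keeps only the requested numeric columns. It does not reproduce dsio's own reader; `read_columns` and the column names `time`, `speed` and `rpm` are hypothetical and are not taken from the sample dataset.

```python
import csv
import io
from typing import Dict, Iterator, Sequence, TextIO


def read_columns(handle: TextIO, columns: Sequence[str]) -> Iterator[Dict[str, float]]:
    """Yield one dict per CSV row, restricted to the requested columns."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in columns if c not in header]
    if missing:
        raise KeyError(f"columns not found in CSV header: {missing}")
    for row in reader:
        # Values are parsed as floats; a non-numeric cell raises ValueError.
        yield {c: float(row[c]) for c in columns}


# An in-memory stand-in for a CSV file (hypothetical columns and values).
sample = io.StringIO("time,speed,rpm\n0,42.0,1800\n1,43.5,1850\n2,120.0,5200\n")
rows = list(read_columns(sample, ["speed"]))
```

The same function works on a real file by passing `open(path, newline="")` instead of the `io.StringIO` object.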

Bring-your-own detector

Those of you who read our previous post know that we are about to unleash some pretty powerful anomaly detection models in this project. But like any open-source project, our main ambition is to create a platform. So for the first release, we offer two basic example detectors (see below) as templates for you to build your own! All you need to do is support some basic interfaces: a way to update your model, a way to train it from scratch (this addresses the cold start problem), and a way to detect anomalies, which will often involve a threshold on a scoring function that numerically describes how likely each new event is under the model.
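To give a feel for the three pieces described above (train from scratch, update, detect via a threshold on a score), here is a minimal standalone sketch of a one-dimensional Gaussian-style detector. It is not dsio's actual interface: the class name `RunningGaussianDetector` and the method names `fit`, `update`, `score` and `is_anomaly` are assumptions chosen for illustration, and the score used here is a simple z-score (distance from the baseline mean in standard deviations), so higher means less likely.

```python
import math
from typing import Iterable


class RunningGaussianDetector:
    """Hypothetical 1-D detector; illustrative only, not dsio's API."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold  # flag events more than this many std devs away
        self.n = 0       # number of observations seen
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def fit(self, values: Iterable[float]) -> "RunningGaussianDetector":
        """Train from scratch on an initial batch (the cold start)."""
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        for value in values:
            self.update(value)
        return self

    def update(self, value: float) -> None:
        """Fold one new observation into the baseline (Welford's algorithm)."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def score(self, value: float) -> float:
        """Return the absolute z-score of value against the baseline."""
        if self.n < 2:
            return 0.0  # not enough data to estimate a spread
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0 if value == self.mean else math.inf
        return abs(value - self.mean) / std

    def is_anomaly(self, value: float) -> bool:
        return self.score(value) > self.threshold


detector = RunningGaussianDetector(threshold=3.0).fit([10.0, 11.0, 9.0, 10.5, 9.5])
flags = []
for x in [10.2, 25.0, 9.8]:
    anomalous = detector.is_anomaly(x)
    flags.append(anomalous)
    if not anomalous:
        detector.update(x)  # only normal events refresh the baseline
```

Skipping the update for flagged events is one design choice among several: it keeps outliers from inflating the baseline, at the cost of adapting more slowly if the underlying behaviour genuinely shifts.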

You can try one of our own detectors from the command line like this:

dsio --detector gaussian1d examples/data/cardata_sample.csv

to run against a sample dataset comprising IoT measurements from a car. But if you’d like to write your own, just add your module and run instead:

dsio --modules examples/ --detector Percentile1D examples/data/cardata_sample.csv

Here is the result:

[Image: Kibana dashboard in action]

We are looking forward to your feedback and contributions! We will be adding exciting contributions from our friends and colleagues in UK academia and from our industrial partners.

