Further reduce your time to AI with deep learning templates

Easily set up anomaly detection on multiple hosts without the configuration headache

Samuel Bettevy
Loud ML
6 min read · Dec 24, 2018


One of the goals of Loud ML is to make machine learning simple and accessible to everyone. That’s why, last summer, we created our TICK-L stack 1-Click ML tool: point-and-click machine learning right inside Chronograf (part of the InfluxData TICK stack of Telegraf, InfluxDB, Chronograf and Kapacitor, with Loud ML machine learning added).

Before we made this tool, devops engineers had to compose a model manually, feature by feature, with no data visualization to preview the input data and no way to see the effect of parameter settings such as the group-by interval. Nor was there any visualization to guide the choice of measurements, tag values or training time range, yet these can have a big impact on how well the model fits the data and therefore on prediction quality. Building a model from a data visualization was not a common way to do ML. I was ready for the challenge!

The summer was long and hot.

Invigorated by what we achieved over the summer (yes, we delivered 1-Click ML), we created a mini-hackathon for ourselves. Dylan Ferreira from Fusemail recently spoke of the importance of log error detection and how simply changing a view of some kernel log data alerted him to a serious problem:

All the docker containers on this host were crash looping at this really low frequency. They were crash looping at a rate that was lower than our detection alert would detect. It’s set that way because we don’t want it going off every time we deploy stuff. This ended up being a serious problem with this host that persisted for days because it was happening at such a low frequency.

In such a scenario, the syslog counters will show an abnormal pattern. We know that an unusual count of log messages (whether notice, info, error or another severity) from a given host machine is a warning sign. Good! This gave us an idea: create ML templates for that purpose across all our hosts.
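To see why counters expose this kind of problem, consider even the crudest statistical baseline: an interval whose message count deviates strongly from history stands out immediately. This is not Loud ML’s algorithm (Loud ML trains a neural network on your data); it’s just a toy sketch with invented numbers to illustrate the idea.

```python
# Toy z-score detector over per-interval syslog message counts.
# NOT Loud ML's method; purely an illustration. All data is invented.

def zscore_anomalies(counts, threshold=2.5):
    """Return indices of intervals whose count deviates > threshold sigmas."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    std = var ** 0.5 or 1.0  # guard against a flat series
    return [i for i, c in enumerate(counts) if abs(c - mean) / std > threshold]

# Hypothetical per-minute error counts: a steady baseline, then a burst.
counts = [4, 5, 3, 6, 4, 5, 4, 48, 5, 4]
print(zscore_anomalies(counts))  # → [7]: the burst stands out
```

Of course, a fixed threshold like this is exactly what failed in Dylan’s story: a *low-frequency* crash loop stays under it. A trained model that has learned the host’s normal pattern can flag deviations a static rule misses.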

We wanted to enable our users to easily create models from a standard data format for typical use-cases. Put another way, if I were a devops guy, how many Mechanical Turks would I need to pay to create my growing number of models? Our 1-Click ML was crying out for a more convenient way.

One slow and lonely weekend later, the idea became a smart and simple feature, fitting perfectly with our goal to make ML accessible to all.

Let’s take a look.

Basic setup: Run all five Docker containers in less than five minutes

We’re using the TICK-L stack throughout this tutorial. To run all five containers (Telegraf, InfluxDB, Chronograf, Kapacitor and 1-Click ML) at once, we’ll use the Docker Compose file available on GitHub.

First, clone the public Loud ML repository and change into the docker/compose directory. Then edit the notification settings and, if you need to, the database settings.

All set? Then let’s start the containers.

docker-compose up
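Put together, the setup looks like the commands below. The repository URL is an assumption on my part; check the Loud ML GitHub page for the current location and directory layout.

```shell
# Clone the Loud ML repository (URL assumed; verify on GitHub)
git clone https://github.com/regel/loudml.git
cd loudml/docker/compose

# Edit notification settings (and optionally database settings) here,
# then start all five containers in the foreground:
docker-compose up
```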

Open your browser at http://localhost:8888.

The default database contains no data. We’ll need to set up syslog message data collection before we can use it for machine learning! You can follow this post by David Simmons if you need help with this.
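If you just want the gist of that setup: the heart of it is Telegraf’s syslog input plugin. A minimal telegraf.conf fragment might look like the sketch below; the port, protocol and output settings are assumptions, and your hosts’ rsyslog forwarding must match them.

```toml
# Minimal syslog collection sketch (fragment of telegraf.conf).
# Port/protocol are assumptions; align them with your rsyslog forwarding.
[[inputs.syslog]]
  server = "tcp://:6514"

[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]
  database = "telegraf"
```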

“So, the question is, what will you be able to do now that you can visualize your syslog data in Chronograf?” Good question, David. We’ll detect anomalies!

Create models with Loud ML model builder

Click on the Loud ML logo on the sidebar of the Chronograf application (localhost:8888).

The screen will display ‘0 models’ and prompt you to click on either Create model manually (you create it yourself) or Use model builder (pre-filled).

The model builder provides a step-by-step configuration to create one or more models based on standard data schema and templates.

First, we need to define a model name prefix. This helps you manage your models, so choose a meaningful name. Let’s use ‘Syslog’.

The next step is choosing a model template. Every data science journey starts with a good schema: data must be clean, well organized and structured, and that’s exactly what the open-source Telegraf provides by storing all metrics and logs in the telegraf database. No data silos; all relevant information in one place.

Loud ML provides basic templates that should not only fit your needs, but also meet those high standards. We’ll select the Syslog message template.

Our last step is to choose values for tagged data. In our template for the Syslog model, the host tag is used to filter data for each model. In the model builder, all your host values are fetched from the database, and you can choose which ones to include in your template. The model builder will create one model per host value.
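Under the hood, each model the builder generates is an ordinary Loud ML model definition. As a sketch of what one per-host model might look like (the field names and values below are illustrative, not the template’s exact output):

```yaml
# Illustrative Loud ML model for one host; names and values are assumptions.
name: Syslog_host1            # prefix + host tag value
type: donut                   # Loud ML's unsupervised anomaly detection model
default_datasource: influx
bucket_interval: 5m           # group-by interval
interval: 1m
offset: 10s
max_evals: 10
features:
  - name: count_messages
    metric: count
    measurement: syslog
    field: severity_code
    match_all:
      - tag: host
        value: host1          # one model per selected host value
```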

Leave all the values selected, click on the Create models button, and fasten your seatbelts: the fun is about to begin!

Loud ML will start training on the past seven days of syslog data history.

Each model automatically runs an inference job after successful training. If training fails, you will need to retrain the model before anything else can happen.

What’s next?

Welcome to productivity! Your ML assistant is now working for you in the background, and it will learn how to distinguish normal and abnormal behavior.

We’ve been working hard to improve this functionality for our next release, too, filtering out unnecessary noise so you can get on with what matters. The assistant will spot abnormal data that the average human being would otherwise miss, and it will understand the difference between a basic warning and a serious problem requiring your intervention. Yes, we’re planning a devops AutoML dream with our next release!

In the meantime, there are some other time-saving options and resources that you can explore.

If you enjoyed reading this article, don’t forget to applaud, and if you have any questions, write a comment. Until then, happy learning.

Photo by rawpixel on Unsplash

