Easy Machine Learning Tricks for Linux Developers in 2018

Sébastien Léger
Loud ML
Jan 28, 2018 · 5 min read

How deep learning becomes easy to integrate with your application

Learning ML in 5 minutes

I love the scene in The Matrix where you can learn Kung fu in the blink of an eye! It would be cool if learning machine learning and embedding it in your application were that easy... Well, now it is!

The application in question can really be your own, so let’s do it “Kung fu” style!

Loud ML is your best Kung fu trick

This is 2018, and machine learning APIs are widely available, both with and without cloud service connectivity. Today we will use an API named Loud ML (www.loudml.io) that can be installed on your favorite Linux host.

It facilitates deep learning usage in many ways:

  • It is data source agnostic (it will connect to all major NoSQL databases), which is cool because learning requires a lot of historical data: pulling the data and formatting it for machine learning is no longer a pain.
  • The API is well documented, with a CLI and REST endpoints, so it can be controlled remotely.
  • It is open source.

Getting started: First five minutes with Loud ML

The package is currently available in RPM and DEB (for Debian stretch) formats and as a Docker image, so you can install it on most Linux distributions using standard tools. For example, on EL7:

$ yum install loudml

Or if you prefer, using Docker:

$ docker pull loudml/community

After installation, find the configuration file at /etc/loudml/config.yml.

You must declare the data sources, i.e. where to read the data and how to connect to your NoSQL or SQL databases.

The Loud ML 1.5.0 release already supports the popular databases InfluxDB, Elasticsearch, and MongoDB, and you can contribute support for your own.

To configure InfluxDB, define the name of the data source and the address used to connect to the database:

---
datasources:
  - name: my-influx-datasource
    type: influxdb
    addr: <host>:<port>
    database: <your database name, e.g. telegraf>

To configure Elasticsearch, define the address of your node and name of the index (aka index pattern) to pull the data:

---
datasources:
  - name: my-elastic-datasource
    type: elasticsearch
    addr: <host>:<port>
    index: <your index name or index pattern>
    doc_type: doc

Your first predictive model

Say your InfluxDB database (or your Elasticsearch indexes) contains CPU measurements for the server hosting your web application. Let’s name them cpu_load, and assume you have 30 days of history with 1-minute resolution.

Your first model will predict a single feature avg_cpu_load, and will:

  • average data over five-minute intervals (the bucket_interval); and
  • use the last twelve bucket intervals (span=12, so 1 hour in total) to guess the next cpu_load value.
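
To make the two settings concrete, here is a minimal Python sketch of what bucketing and span mean. It is only an illustration of the idea, not Loud ML's implementation: the function names and the sample data are made up for this example.

```python
from statistics import mean

BUCKET_MINUTES = 5   # mirrors bucket_interval: 5m
SPAN = 12            # mirrors span: 12


def bucket_averages(samples, bucket_minutes=BUCKET_MINUTES):
    """Average 1-minute samples over fixed-size buckets (a trailing partial bucket is dropped)."""
    full = len(samples) - len(samples) % bucket_minutes
    return [mean(samples[i:i + bucket_minutes]) for i in range(0, full, bucket_minutes)]


def training_windows(buckets, span=SPAN):
    """Pair each run of `span` consecutive buckets with the bucket that follows it."""
    return [(buckets[i:i + span], buckets[i + span]) for i in range(len(buckets) - span)]


# Hypothetical data: 2 hours of 1-minute cpu_load samples.
samples = [float(i % 10) for i in range(120)]

buckets = bucket_averages(samples)        # 24 five-minute averages
windows = training_windows(buckets)       # 12 (history, next value) pairs
history_minutes = SPAN * BUCKET_MINUTES   # minutes of history behind each prediction
```

Each window holds twelve 5-minute averages, that is 60 minutes of history, followed by the single value the model should learn to guess.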

Let’s create this model using the CLI.

First, you must write a file that describes your model. This file can be either JSON or YAML.

In the model.yml file, we will define a single ‘feature’ to learn the shape of the average CPU load using 5-minute bucket intervals:

---
name: my-timeseries-model
type: donut
default_datasource: my-influx-datasource
# Size of buckets for data aggregation and prediction
bucket_interval: 5m
# Number of preceding buckets required to predict the next bucket
span: 12
interval: 60s
offset: 30
max_evals: 20
features:
  - name: avg_cpu_load
    measurement: system
    metric: avg
    field: cpu_load
    default: 0
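
Since the model file can be JSON as well as YAML, the same description can also be generated from a script. The sketch below assumes the JSON form uses exactly the same keys as the YAML above, which this article does not confirm, so treat it as a starting point.

```python
import json

# Same settings as model.yml, expressed as a Python dict.
# Assumption: the JSON model file mirrors the YAML keys one to one.
model = {
    "name": "my-timeseries-model",
    "type": "donut",
    "default_datasource": "my-influx-datasource",
    "bucket_interval": "5m",   # size of buckets for aggregation and prediction
    "span": 12,                # preceding buckets needed to predict the next one
    "interval": "60s",
    "offset": 30,
    "max_evals": 20,
    "features": [
        {
            "name": "avg_cpu_load",
            "measurement": "system",
            "metric": "avg",
            "field": "cpu_load",
            "default": 0,
        }
    ],
}

model_json = json.dumps(model, indent=2)
print(model_json)
```

Redirecting the output to a file gives you a JSON model description to pass to the CLI in place of model.yml.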

And now, let’s create this first model:

$ loudml create-model model.yml

The Kung fu lesson is about to begin. Through training, the model will learn how the data evolves over time. You can think of deep learning as a way to approximate *any* function.

The model training is launched with the following command:

$ loudml train <model_name> --from <from_date> --to <to_date>

Accepted date formats are:

  • UNIX timestamp in seconds
  • ISO 8601 format, example: 2018-01-26T16:47:25Z
  • Relative date, example: now-20s, now-45m, now-3h, now-1d, now-3w…
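
If you want to check what a given date string resolves to before launching a training run, the three formats can be mimicked in a few lines of Python. This is a rough sketch, not Loud ML's own parser: the parse_date helper is hypothetical, and the unit letters (s, m, h, d, w) are simply the ones that appear in the examples above.

```python
import re
from datetime import datetime, timedelta, timezone

# Unit suffixes taken from the examples above; anything else is rejected.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_date(value, now=None):
    """Return a timezone-aware UTC datetime for a timestamp, ISO 8601, or relative date."""
    now = now or datetime.now(timezone.utc)
    if value == "now":
        return now
    match = re.fullmatch(r"now-(\d+)([smhdw])", value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(**{UNITS[unit]: amount})
    if re.fullmatch(r"\d+(\.\d+)?", value):
        # UNIX timestamp in seconds
        return datetime.fromtimestamp(float(value), tz=timezone.utc)
    # ISO 8601; a trailing "Z" is rewritten because older Python versions reject it
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


reference = datetime(2018, 1, 28, tzinfo=timezone.utc)
print(parse_date("now-30d", now=reference))
print(parse_date("2018-01-26T16:47:25Z"))
```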

We will train the above model using 30 days of history:

$ loudml train my-timeseries-model --from now-30d --to now

When training is done, the command reports the model’s error (loss): the lower, the better!

To show additional model information, you can run:

$ loudml show-model <model_name>

It is show time! Predictive capabilities

You know Kung fu, so let’s practice. You’ve trained a model. Now you can make this model output predictions for avg_cpu_load at a regular interval. This output may be written to another data source, to your application, or merely to stdout, and compared with the actual values for anomaly detection.
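
As a rough idea of what comparing predictions with actual values can look like on the application side, here is a minimal sketch. The fixed threshold is an assumption chosen for illustration, the numbers are invented, and this is not how Loud ML itself scores anomalies.

```python
def find_anomalies(actual, predicted, threshold):
    """Return the indexes of buckets where |actual - predicted| exceeds the threshold."""
    return [
        i
        for i, (a, p) in enumerate(zip(actual, predicted))
        if abs(a - p) > threshold
    ]


# Hypothetical avg_cpu_load values, one per 5-minute bucket.
actual = [0.42, 0.45, 0.44, 1.90, 0.47]
predicted = [0.40, 0.44, 0.46, 0.45, 0.46]

anomalies = find_anomalies(actual, predicted, threshold=0.5)
print(anomalies)  # indexes of the buckets that deviate from the prediction
```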

You can enable the loudmld service to execute in the background; if you are running EL7 the system command will be:

$ systemctl enable loudmld.service
$ systemctl start loudmld.service

The loudmld process exposes an HTTP API that you can control using curl. To start predicting future avg_cpu_load values, you can issue this curl command:

$ curl -X POST "http://localhost:8077/models/my-timeseries-model/_start?save_prediction=true"

It will tell the loudmld process to wake up at periodic intervals (the interval setting in model.yml, not to be confused with bucket_interval), pull the data from your data source, and finally write the predicted values to the time series database.
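
If you would rather trigger this from your application than from a shell, the same request can be sent with the Python standard library. The sketch below only mirrors the curl call shown above (same host, port, path and query parameter); the helper names are made up, and nothing is assumed about the body of the response.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8077"  # address used in the curl example


def start_url(model_name, save_prediction=True):
    """Build the _start URL for a model, as used in the curl example."""
    query = urlencode({"save_prediction": str(save_prediction).lower()})
    return f"{BASE_URL}/models/{quote(model_name, safe='')}/_start?{query}"


def start_prediction(model_name):
    """POST to the _start endpoint and return the HTTP status code."""
    request = Request(start_url(model_name), method="POST")
    with urlopen(request) as response:
        return response.status


url = start_url("my-timeseries-model")
print(url)
# With loudmld running locally, call: start_prediction("my-timeseries-model")
```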

[Figure: the original data in purple; the predictions in green]

If you’re using InfluxDB, you can visualize the result using Chronograf. The prediction is stored in the same database under the measurement called prediction_<model_name> and must be displayed using a GROUP BY time(bucket_interval) query.
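
For reference, such a query could be assembled as in the sketch below. The measurement name follows the prediction_<model_name> convention described above, but the field name is an assumption (the predicted field is taken to carry the feature's name) and the one-day lookback is arbitrary, so check both against what you actually see in your database.

```python
def prediction_query(model_name, field, bucket_interval, lookback="1d"):
    """Build an InfluxQL query that averages predictions per bucket interval."""
    measurement = f"prediction_{model_name}"
    return (
        f'SELECT mean("{field}") FROM "{measurement}" '
        f"WHERE time > now() - {lookback} "
        f"GROUP BY time({bucket_interval})"
    )


# Assumption: the prediction measurement stores a field named after the feature.
query = prediction_query("my-timeseries-model", "avg_cpu_load", "5m")
print(query)
```

The measurement name is wrapped in double quotes because it contains hyphens.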

Hopefully, this first Kung fu lesson has been a success!

Loud ML documentation and source code are available on GitHub (https://github.com/regel/loudml) and online at www.loudml.io/guide

Also, you can watch the June 6 video recording for an interactive demo and more information: https://register.gotowebinar.com/recording/1761723683175604488

If you enjoyed reading this article, don’t forget to applaud, and if you have any questions, write a comment. Until then, happy learning.

This story is published in The Startup, Medium’s largest entrepreneurship publication followed by 296,127+ people.



Sébastien is passionate about artificial intelligence, which inspired him to build Loud ML — software that enables engineers to implement AI-driven applications