Easy Machine Learning Tricks for Linux Developers in 2018
How deep learning becomes easy to integrate with your application
Learning ML in 5 minutes
I love the scene in The Matrix where Neo learns kung fu in the blink of an eye! It would be cool if learning and embedding machine learning in your application were that easy... Well, now it is!
This application can really be your application, so let’s do it with “Kung fu” style!
Loud ML is your best Kung fu trick
It is 2018, and machine learning APIs are widely available, both with and without cloud service connectivity. Today we will use an API named Loud ML (www.loudml.io) that can be installed on your favorite Linux host.
It facilitates deep learning usage in many ways:
- It is data source agnostic (it connects to all major NoSQL databases), which matters because learning requires a lot of historical data: pulling and formatting the data for machine learning is no longer a pain.
- The API is well documented, with a CLI and REST endpoints, so it can be controlled remotely.
- It is open source
Getting started: First five minutes with Loud ML
The package is currently available in RPM and DEB (for Debian Stretch) formats, and as a Docker image, so you can install it on most Linux distributions using standard tools. For example, on EL7:
$ yum install loudml
Or if you prefer, using Docker:
$ docker pull loudml/community
After installation, edit the configuration file located at /etc/loudml/config.yml.
You must declare the data sources, i.e. where to read the data and how to connect to your NoSQL or SQL databases.
The Loud ML 1.5.0 release already supports the popular databases InfluxDB, Elasticsearch, and MongoDB, and you can contribute support for your own.
To configure InfluxDB, define the name and address used to connect to the database:
---
datasources:
- name: my-influx-datasource
type: influxdb
addr: <host>:<port>
database: <your database name, e.g. telegraf>
To configure Elasticsearch, define the address of your node and name of the index (aka index pattern) to pull the data:
---
datasources:
- name: my-elastic-datasource
type: elasticsearch
addr: <host>:<port>
index: <your index name or index pattern>
doc_type: doc
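Loud ML 1.5.0 also lists MongoDB among the supported databases. A matching entry might look like the sketch below; the field names here are an assumption modeled on the two examples above, so check the official guide for the exact keys:

```yaml
---
datasources:
- name: my-mongo-datasource
  type: mongodb
  addr: <host>:<port>
  database: <your database name>
```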
Your first predictive model
Say your InfluxDB database (or your Elasticsearch index) contains a CPU measurement for the server hosting your web application, let's name it cpu_load, and you have 30 days of history at 1-minute resolution.
Your first model will predict a single feature avg_cpu_load, and will:
- average data over five-minute intervals (the bucket_interval); and
- use the last twelve bucket intervals (span=12, so 1 hour in total) to guess the next cpu_load value.
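A quick sanity check of the arithmetic: the context window fed to each prediction is span times bucket_interval (the values below are the ones used in this article):

```shell
# Context window = span x bucket_interval
span=12           # number of preceding buckets
bucket_minutes=5  # bucket_interval of 5m, expressed in minutes
echo "$((span * bucket_minutes)) minutes of history feed each prediction"
# prints: 60 minutes of history feed each prediction
```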
Let’s create this model using the CLI.
First, you must write a file that describes your model. This file can be either JSON or YAML.
We will define in the model.yml file a single 'feature' to learn the shape of the average CPU load using 5-minute bucket intervals:
---
name: my-timeseries-model
type: donut
default_datasource: my-influx-datasource
# Size of buckets for data aggregation and prediction
bucket_interval: 5m
# Number of preceding buckets required to predict the next bucket
span: 12
interval: 60s
offset: 30
max_evals: 20
features:
- name: avg_cpu_load
  measurement: system
  metric: avg
  field: cpu_load
  default: 0
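As noted above, the CLI accepts JSON as well as YAML. The same model translates directly to a model.json (value types here mirror the YAML values, e.g. span as a number and bucket_interval as a string):

```json
{
  "name": "my-timeseries-model",
  "type": "donut",
  "default_datasource": "my-influx-datasource",
  "bucket_interval": "5m",
  "span": 12,
  "interval": "60s",
  "offset": 30,
  "max_evals": 20,
  "features": [
    {
      "name": "avg_cpu_load",
      "measurement": "system",
      "metric": "avg",
      "field": "cpu_load",
      "default": 0
    }
  ]
}
```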
And now, let’s create this first model:
$ loudml create-model model.yml
The Kung fu lesson is about to begin. Training this model teaches it how the data evolves over time. You can think of deep learning as a way to approximate *any* function.
The model training is launched with the following command:
$ loudml train <model_name> --from <from_date> --to <to_date>
Accepted date formats are:
- UNIX timestamp in seconds
- ISO 8601 format, example: 2018-01-26T16:47:25Z
- Relative date, example: now-20s, now-45m, now-3h, now-1d, now-3w, etc.
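For reference, the first two formats can be produced from the shell with the standard date utility (the %s format sequence is a common extension supported by both GNU and BSD date):

```shell
# UNIX timestamp in seconds
date -u +%s
# ISO 8601 format
date -u +%Y-%m-%dT%H:%M:%SZ
# Relative dates are written literally, e.g.:
echo 'now-30d'
```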
We will train the above model using 30 days of history:
$ loudml train my-timeseries-model --from now-30d --to now
When it’s done, the command will report the actual error loss: the lower the better!
To show additional model information, you can run:
$ loudml show-model <model_name>
It is show time! Predictive capabilities
You know Kung fu, so let's practice. You've trained a model; now you can make it output predictions for avg_cpu_load at a regular interval. This output may be written to another data source, to your application, or merely to stdout, and compared with the actual values for anomaly detection.
You can enable the loudmld service to run in the background; on EL7 the systemd commands are:
$ systemctl enable loudmld.service
$ systemctl start loudmld.service
The loudmld process exposes an HTTP API that you can control using curl. To start predicting future avg_cpu_load values, you can issue this curl command:
$ curl -X POST 'http://localhost:8077/models/my-timeseries-model/_start?save_prediction=true'
It tells the loudmld process to wake up at periodic intervals (the interval setting in model.yml, not to be confused with bucket_interval), pull the data from your data source, and finally write the predicted values to the time series database.
If you're using InfluxDB, you can visualize the result using Chronograf. The prediction is stored in the same database, in a measurement called prediction_<model_name>, and can be displayed using a GROUP BY time(bucket_interval) query.
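With the model above, such a query might look like the following InfluxQL sketch. The field name inside the prediction measurement is an assumption (it mirrors the feature name avg_cpu_load); inspect the measurement in Chronograf to confirm it:

```sql
SELECT mean("avg_cpu_load")
FROM "prediction_my-timeseries-model"
WHERE time > now() - 6h
GROUP BY time(5m)
```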
Hopefully, this first Kung fu lesson has been a success!
Loud ML documentation and source code are available on GitHub (https://github.com/regel/loudml) and online at www.loudml.io/guide.
Also, you can watch the June 6 video recording for an interactive demo and more information: https://register.gotowebinar.com/recording/1761723683175604488
If you enjoyed reading this article, don’t forget to applaud, and if you have any questions, write a comment. Until then, happy learning.