Easy Machine Learning Tricks for Linux Developers in 2018
How deep learning becomes easy to integrate with your application
Learning ML in 5 minutes
I love the scene in The Matrix where you can learn Kung fu in the blink of an eye! It would be cool if learning and embedding machine learning in your application were that easy... Well, now it is!
That application can really be your application, so let's do it Kung fu style!
Loud ML is your best Kung fu trick
It is 2018, and machine learning APIs are widely available, both with and without cloud service connectivity. Today we will use an API named Loud ML (www.loudml.io) that can be installed on your favorite Linux host.
It facilitates deep learning usage in many ways:
- It is data source agnostic (it connects to all major NoSQL databases), which is cool because learning requires a lot of historical data: pulling and formatting data for machine learning is no longer a pain.
- The API is well documented, with a CLI and REST endpoints, so it can be controlled remotely
- The community edition is free!
Getting started: First five minutes with Loud ML
The package is currently available in RPM and DEB (Debian stretch) formats, so you can install it on most Linux distributions using standard tools. For example, on EL7:
$ yum install loudml loudml-influx
After installation, find the configuration file located in /etc/loudml/config.yml
You must declare the data sources, i.e. where to read the data and how to connect to the NoSQL databases.
The Loud ML 1.1 beta release already supports the popular databases InfluxDB and Elasticsearch.
To configure InfluxDB, give the data source a name and define the address used to connect to the database:
- name: my-influx-datasource
  type: influxdb
  addr: localhost
  database: <your database name, e.g. telegraf>
To configure Elasticsearch, define the address of your node and the name of the index (or index pattern) used to pull the data:
- name: my-elastic-datasource
  type: elasticsearch
  addr: localhost:9200
  index: <your index name or index pattern>
Your first predictive model
Say your InfluxDB database (or your Elasticsearch index) contains CPU load measurements for the server hosting your web application, let's name the field cpu_load, and you have 30 days of history at 1-minute resolution.
Your first model will predict a single feature avg_cpu_load, and will:
- average data over five-minute intervals (the bucket_interval); and
- use the last three bucket intervals (span=3, so 15 minutes in total) to guess the next cpu_load value.
Let’s create this model using the CLI.
First, you must write the file that describes your model. This file can be either JSON or YAML.
We will define, in a model.yml file, a single 'feature' that learns the shape of the average CPU load using 5-minute bucket intervals. A minimal model description looks like this (the measurement and field names below are examples; adjust them to your own schema):
name: my-timeseries-model
type: timeseries
default_datasource: my-influx-datasource
# Size of buckets for data aggregation and prediction
bucket_interval: 5m
# Number of preceding buckets required to predict the next bucket
span: 3
features:
  - name: avg_cpu_load
    metric: avg
    measurement: cpu
    field: cpu_load
And now, let’s create this first model:
$ loudml create-model model.yml
The Kung fu lesson is about to begin. Training this model teaches it how the data evolves over time; you can think of deep learning as a way to approximate *any* function.
The model training is launched with the following command:
$ loudml train <model_name> --from <from_date> --to <to_date>
Accepted date formats are:
- UNIX timestamp in seconds
- ISO 8601 format, example: 2018-01-26T16:47:25Z
- Relative date, example: now-20s, now-45m, now-3h, now-1d, now-3w…
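If you need to cross-check a timestamp before passing it to loudml train, GNU date (standard on Linux distributions) can convert between these formats; the dates below are only illustrative:

```shell
# Convert an ISO 8601 date to a UNIX timestamp in seconds
# (GNU date syntax; BSD/macOS date uses different flags)
date -u -d '2018-01-26T16:47:25Z' +%s    # prints 1516985245

# Relative dates similar to now-30d can be computed as well:
date -u -d '30 days ago' +%s
```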
We will train the above model using 30 days of history:
$ loudml train my-timeseries-model --from now-30d --to now
When it's done, the command reports the loss (prediction error) achieved during training: the lower, the better!
To show additional model information, you can run:
$ loudml show-model <model_name>
It is show time! Predictive capabilities
You know Kung fu, so let’s practice. You’ve trained a model. Now, you can make this model output predictions for avg_cpu_load on a regular interval. This output may be written to another data source, your application, or merely stdout, and compared to the actual values for anomaly detection.
You can enable the loudmld service to run in the background; on EL7, the systemd commands are:
$ systemctl enable loudmld.service
$ systemctl start loudmld.service
The loudmld process exposes an HTTP API that you can control using curl. To start predicting future avg_cpu_load values, you can issue a command such as (the default REST port is 8077; adjust it to your config.yml settings):
$ curl -X POST 'localhost:8077/models/my-timeseries-model/_start?save_prediction=true'
This tells the loudmld process to wake up at periodic intervals (the interval setting, not to be confused with bucket_interval), pull the data from your data source, and finally output the predicted values to the time series database.
If you're using InfluxDB, you can visualize the result using Chronograf. The prediction is stored in the same database under a measurement called prediction_<model_name> and must be displayed using a GROUP BY time(bucket_interval) query.
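For instance, with the model name and feature used earlier (my-timeseries-model, avg_cpu_load, and a 5-minute bucket_interval, all assumptions carried over from the model definition), such an InfluxQL query could look like:
SELECT mean("avg_cpu_load") FROM "prediction_my-timeseries-model" WHERE time > now() - 1d GROUP BY time(5m)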
Hopefully, this first Kung fu lesson has been a success!
Also, you can watch the Apr 17 video recording for an interactive demo: https://www.influxdata.com/resources/build-your-a-predictive-model-with-influxdb-and-loud-ml/