Elasticsearch Data Forecasting to predict future possibilities

Suranga jayalath
Jun 28, 2022

What is Elastic Machine Learning?

Machine learning techniques are used to analyze your data and build models of its patterns of behavior. The type of analysis you choose will depend on the topics or problems you want to address and the kind of data you have at your disposal.

Types of Machine Learning

  • Supervised Machine Learning.
  • Unsupervised Machine Learning.

Unsupervised machine learning

Anomaly detection and outlier detection are the two types of analysis that can find patterns and relationships in your data without training or human intervention.

Anomaly detection requires time series data. It builds a probability model and can run continuously to identify unusual events as they occur. The model evolves over time, and you can use its insights to forecast future behavior.

Outlier detection does not require time series data. It is a type of data frame analytics that finds outliers in a data set by examining each data point’s proximity to other points and the density of the cluster of points around it.
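To make the idea of "proximity to other points" concrete, here is a minimal sketch that scores each point by its average distance to its k nearest neighbors. This is only an illustration: Elastic's own outlier detection combines several distance-based and density-based methods, and the function name and sample points below are made up for this example.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the mean distance to its k nearest neighbors.

    A higher score means the point sits farther from its neighbors.
    Assumes len(points) > k.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical data: four points close together and one far away.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(sample, k=2)
most_unusual = sample[scores.index(max(scores))]
print(most_unusual)
```

The point with the largest score is the one that is least like its neighbors, which is the intuition behind distance-based outlier detection.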

Supervised machine learning

Supervised learning, also called supervised machine learning, is a subcategory of machine learning and artificial intelligence. It is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately.

There are two types of data frame analytics that require training data sets: classification and regression.

Classification learns relationships between your data points in order to predict discrete categorical values, such as whether a DNS request originated from a malicious or benign domain.

Regression learns relationships between your data points in order to predict continuous numerical values, such as the response time of a web request.

Forecast future behavior

  • After your anomaly detection job creates baselines of normal behavior for your data, you can use that information to extrapolate future behavior.
  • You can also use it to estimate the probability of a time series value occurring at a future date. For example, you might want to determine how likely it is that your disk utilization will reach 100% before the end of next week.
  • Each forecast has a unique ID, which you can use to distinguish between forecasts that you created at different times.
  • When you create a forecast, you specify its duration, which indicates how far the forecast extends beyond the last record that was processed. By default, the duration is 1 day.
  • Typically, the farther into the future you forecast, the lower the confidence levels become (that is to say, the bounds increase). Eventually, if the confidence levels are too low, the forecast stops.
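The points above map onto the anomaly detection forecast API. The sketch below only builds the request (method, path, and JSON body) and does not send it. The job ID "disk-usage-job" is a hypothetical name, and the helper function is an illustration, not part of any Elastic client.

```python
import json

def build_forecast_request(job_id, duration="1d", expires_in=None):
    """Return (method, path, body) for a forecast request.

    duration: how far the forecast extends past the last processed record.
    expires_in: optional retention period for the forecast results.
    """
    body = {"duration": duration}
    if expires_in is not None:
        body["expires_in"] = expires_in
    return "POST", f"/_ml/anomaly_detectors/{job_id}/_forecast", body

# Hypothetical job: forecast one week ahead and keep the results for 14 days.
method, path, body = build_forecast_request("disk-usage-job", duration="7d", expires_in="14d")
print(method, path)
print(json.dumps(body))
```

You could paste the printed method, path, and body into Kibana Dev Tools. The response should include the forecast ID, which is what lets you tell forecasts apart later.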

Unsupported forecast configurations

There are some limitations that affect your ability to create a forecast:

  • You can generate only three forecasts per anomaly detection job concurrently. There is no limit to the number of forecasts that you retain. Existing forecasts are not overwritten when you create new forecasts. Rather, they are automatically deleted when they expire.
  • If you use an over_field_name property in your anomaly detection job (that is to say, it’s a population job), you cannot create a forecast.
  • If you use any of the following analytical functions in your anomaly detection job, you cannot create a forecast:
      • lat_long
      • rare and freq_rare
      • time_of_day and time_of_week
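As a rough illustration, the restrictions on population jobs and unsupported functions can be checked against a job configuration before you try to forecast. This sketch assumes the job configuration is available as a Python dictionary in the same shape as the anomaly detection job JSON. The helper name and the sample jobs are hypothetical.

```python
# Functions that prevent forecasting, as listed above.
UNSUPPORTED_FUNCTIONS = {"lat_long", "rare", "freq_rare", "time_of_day", "time_of_week"}

def forecast_blockers(job_config):
    """Return a list of reasons why this job cannot be forecast (empty if none found)."""
    reasons = []
    detectors = job_config.get("analysis_config", {}).get("detectors", [])
    for i, det in enumerate(detectors):
        if det.get("over_field_name"):
            reasons.append(f"detector {i}: over_field_name is set (population job)")
        if det.get("function") in UNSUPPORTED_FUNCTIONS:
            reasons.append(f"detector {i}: function '{det['function']}' is not supported")
    return reasons

# Hypothetical job configurations.
ok_job = {"analysis_config": {"detectors": [{"function": "mean", "field_name": "cpu"}]}}
bad_job = {"analysis_config": {"detectors": [
    {"function": "rare", "by_field_name": "status"},
    {"function": "count", "over_field_name": "client_ip"},
]}}

print(forecast_blockers(ok_job))
print(forecast_blockers(bad_job))
```

An empty list means none of these particular restrictions apply. The limit of three concurrent forecasts is a runtime condition and is not covered by this check.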

Set up machine learning features

Requirements overview

To use the Elastic Stack machine learning features, you must have:

  • the appropriate subscription level or the free trial period activated
  • xpack.ml.enabled set to its default value of true on every node in the cluster (refer to Machine learning settings in Elasticsearch)
  • ml value defined in the list of node.roles on the machine learning nodes
  • machine learning features visible in the Kibana space
  • security privileges assigned to the user that:
      • grant use of machine learning features, and
      • grant access to source and destination indices.

Security privileges

Assigning security privileges affects how users access machine learning features. Consider the two main categories:

  • Elasticsearch API user: uses an Elasticsearch client, cURL, or Kibana Dev Tools to access machine learning features via Elasticsearch APIs. It requires Elasticsearch security privileges.
  • Kibana user: uses the machine learning features in Kibana and does not use Dev Tools. It requires either Kibana feature privileges or Elasticsearch security privileges and is granted the most permissive combination of both. Kibana feature privileges are recommended if you control job level visibility via Spaces.

We can configure these privileges under Stack Management > Security in Kibana or via the respective Elasticsearch security APIs.

Elasticsearch API user

If you use machine learning APIs, you must have the following cluster and index privileges:

For full access:

  • machine_learning_admin built-in role or the equivalent cluster privileges
  • read and view_index_metadata on source indices
  • read, manage, and index on destination indices (for data frame analytics jobs only)

For read-only access:

  • machine_learning_user built-in role or the equivalent cluster privileges
  • read index privileges on source indices
  • read index privileges on destination indices (for data frame analytics jobs only)
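The index privileges listed above could be expressed as a custom role body for the Elasticsearch create role API (PUT /_security/role/<name>). This is a minimal sketch: it builds only the index part of the role, on the assumption that the user is also given the built-in machine_learning_admin or machine_learning_user role. The index names and the helper function are hypothetical.

```python
import json

def build_ml_index_role(source_indices, dest_indices=None, read_only=False):
    """Build a role body that grants the index privileges described above."""
    source_privs = ["read"] if read_only else ["read", "view_index_metadata"]
    indices = [{"names": list(source_indices), "privileges": source_privs}]
    if dest_indices:
        # Destination indices only matter for data frame analytics jobs.
        dest_privs = ["read"] if read_only else ["read", "manage", "index"]
        indices.append({"names": list(dest_indices), "privileges": dest_privs})
    return {"indices": indices}

# Hypothetical index names.
full_access = build_ml_index_role(["server-metrics"], ["server-metrics-dfa"])
read_access = build_ml_index_role(["server-metrics"], read_only=True)
print(json.dumps(full_access, indent=2))
print(json.dumps(read_access, indent=2))
```

Keeping the index privileges in a separate role makes it easier to reuse the built-in machine learning roles unchanged.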

Kibana security

Feature visibility in Spaces

In Kibana, the machine learning features must be visible in your space. To control which features are visible in your space, use Stack Management > Kibana > Spaces.

In addition to index privileges, source data views must also exist in the same space as your machine learning jobs. These can be configured in Stack Management > Kibana > Data Views.

Each machine learning job and trained model can be assigned to all, one, or multiple spaces. This can be configured in Stack Management > Alerts and Insights > Machine Learning. You can edit the spaces that a job or model is assigned to by clicking the icons in the Spaces

Kibana user

Within a Kibana space, for full access to the machine learning features, you must have:

  • Machine Learning: All Kibana privileges
  • Data Views Management: All Kibana feature privileges
  • read and view_index_metadata index privileges on your source indices
  • data views for your source indices
  • data views for, and read, manage, and index index privileges on, destination indices (for data frame analytics jobs only)

Within a Kibana space, for read-only access to the machine learning features, you must have:

  • Machine Learning: Read Kibana privileges
  • data views for your source indices
  • read index privilege on your source indices
  • data views and read index privileges on destination indices (for data frame analytics jobs only)

Data Visualizer feature

Within a Kibana space, to upload and import files in the Data Visualizer, you must have:

  • Machine Learning: Read or Discover: All Kibana feature privileges
  • Data Views Management: All Kibana feature privileges
  • ingest_admin built-in role, or manage_ingest_pipelines cluster privilege
  • create, create_index, manage and read index privileges for destination indices

How to set up data forecasting in practice

Step 1 → Add a sample data file in CSV format. You must select a time-based data set.

Step 2 → Create an anomaly detection job

1 → Select the uploaded data file

2 → Select “Single Metric”

3 → Click “Use full data”

4 → Pick fields

5 → Enter a job ID

6 → In the summary section, click “Create job”

7 → When processing is done, click “Start job running in real time”

8 → View results

9 → Click Forecast and add dates

10 → Now you can see the forecast data
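For readers who prefer the API route, the wizard steps above correspond roughly to the sequence of Elasticsearch requests sketched here. The code only builds the requests and does not send them. The job ID, index name, field names, bucket span, and the "datafeed-<job ID>" naming are assumptions made for this example.

```python
import json

def single_metric_requests(job_id, index, time_field, metric_field,
                           function="mean", bucket_span="15m"):
    """Return the ordered (method, path, body) requests for a single metric job."""
    job = {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [{"function": function, "field_name": metric_field}],
        },
        "data_description": {"time_field": time_field},
    }
    datafeed = {"job_id": job_id, "indices": [index], "query": {"match_all": {}}}
    feed_id = f"datafeed-{job_id}"
    return [
        ("PUT", f"/_ml/anomaly_detectors/{job_id}", job),             # create the job
        ("PUT", f"/_ml/datafeeds/{feed_id}", datafeed),               # connect it to the data
        ("POST", f"/_ml/anomaly_detectors/{job_id}/_open", None),     # open the job
        ("POST", f"/_ml/datafeeds/{feed_id}/_start", None),           # start analysing data
        ("POST", f"/_ml/anomaly_detectors/{job_id}/_forecast", {"duration": "1d"}),
    ]

# Hypothetical names for the uploaded CSV data.
for method, path, body in single_metric_requests("sales-forecast", "sales-data", "@timestamp", "amount"):
    print(method, path, json.dumps(body) if body else "")
```

The forecast request only makes sense once the job has processed some data, so in practice you would wait for the datafeed to catch up before sending the last request.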

  • The yellow line in the chart represents the predicted data values. The shaded yellow area represents the bounds for the predicted values, which also gives an indication of the confidence of the predictions.
  • When you create a forecast, you specify its duration, which indicates how far the forecast extends beyond the last record that was processed. By default, the duration is 1 day.
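The predicted line and its shaded bounds can also be inspected as data. The sketch below assumes you have already fetched the forecast result documents for one forecast ID, each with timestamp, forecast_prediction, forecast_lower, and forecast_upper fields. The sample values and the 100% threshold are invented for illustration.

```python
def summarize_forecast(docs, threshold):
    """Return the width of the bounds per point and the first time the upper bound reaches threshold."""
    rows = sorted(docs, key=lambda d: d["timestamp"])
    widths = [d["forecast_upper"] - d["forecast_lower"] for d in rows]
    first_cross = next(
        (d["timestamp"] for d in rows if d["forecast_upper"] >= threshold), None
    )
    return widths, first_cross

# Hypothetical forecast points for disk utilization (percent).
docs = [
    {"timestamp": 1000, "forecast_prediction": 80.0, "forecast_lower": 78.0, "forecast_upper": 82.0},
    {"timestamp": 2000, "forecast_prediction": 85.0, "forecast_lower": 80.0, "forecast_upper": 90.0},
    {"timestamp": 3000, "forecast_prediction": 90.0, "forecast_lower": 82.0, "forecast_upper": 100.5},
]
widths, first_cross = summarize_forecast(docs, threshold=100)
print(widths)
print(first_cross)
```

Widening bounds mean lower confidence. The first time the upper bound reaches the threshold is only a rough warning signal, not the probability of hitting the threshold.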

Forecast with multi-metric view

Step 1 → Select the data file

Step 2 → Select the multi-metric view

Step 3 → In the time range step, click “Use full data”

Step 4 → In the Pick fields section, add multiple metrics using the drop-down menu

Step 5 → After adding the metrics, you will see different sections on this page

Step 6 → Select one to three influencers. Don’t use more than three.

Influencers — Select which categorical fields have influence on the results. Who/what might you ‘blame’ for an anomaly? Recommend 1–3 influencers.

Step 7 → Enter a suitable job ID and description in the job details section

Step 8 → If you completed the steps above correctly, you will see green validation messages

Step 9 → Create the job and view the results

Step 10 → Use the detector drop-down menu to switch between metrics

Step 11 → Forecast data with different metrics

Step 12 → Now you can change the detector type and check the forecast details

You can also view the data according to the selected influencer

Step 13 → After adding a forecast, you can see the forecast details in the bottom section
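The multi-metric steps above produce a job with one detector per metric plus a list of influencers. Here is a minimal sketch of what such a job configuration might look like when built by hand. The field names, the split field, and the helper function are hypothetical, and the one-to-three influencer check mirrors the recommendation in the wizard rather than a hard API limit.

```python
import json

def build_multi_metric_job(time_field, metrics, influencers,
                           split_field=None, bucket_span="15m"):
    """Return (job_config, warnings) for a multi-metric anomaly detection job.

    metrics: list of (function, field_name) pairs, one detector each.
    """
    detectors = []
    for function, field in metrics:
        det = {"function": function, "field_name": field}
        if split_field:
            # Model each value of the split field separately.
            det["partition_field_name"] = split_field
        detectors.append(det)

    warnings = []
    if not 1 <= len(influencers) <= 3:
        warnings.append("recommended number of influencers is 1 to 3")

    config = {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": detectors,
            "influencers": list(influencers),
        },
        "data_description": {"time_field": time_field},
    }
    return config, warnings

# Hypothetical fields from a sales data set.
config, warnings = build_multi_metric_job(
    "@timestamp",
    [("mean", "amount"), ("sum", "quantity")],
    influencers=["region"],
    split_field="region",
)
print(json.dumps(config, indent=2))
print(warnings)
```

Each detector in the list corresponds to one entry in the detector drop-down menu, and each can be forecast separately.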

Conclusion:

This forecasting method gives us an idea of future behaviour. Forecasting allows businesses to set reasonable and measurable goals based on current and historical data. Having accurate data and statistics to analyse helps businesses decide how much change, growth, or improvement counts as success.
