Elasticsearch: working with time series data (+ hands-on)

Alessandro Gangi
Data Reply IT | DataTech
12 min read · Nov 7, 2023


Elasticsearch logo

Elasticsearch is a highly scalable, distributed, open-source search and analytics engine. It is designed to handle large volumes of data and perform fast searches across many fields. Elasticsearch supports real-time data indexing, full-text search, and complex data analysis, making it a popular choice for building applications that require efficient data retrieval and exploration. It covers a variety of use cases, such as application search, log and metrics analytics, and more. We already introduced the tool and discussed its main features in a previous article.

This time, let’s focus on time series data, which many companies ingest and analyze with Elasticsearch.

What are time series data?

Time series data, such as logs and metrics, are data points recorded and organized in chronological order. Logs capture events and activities, while metrics measure and monitor system performance. Analyzing time series data with Elasticsearch offers several benefits. First, Elasticsearch’s fast search capabilities (together with Kibana, a powerful visualization tool) enable efficient querying and filtering of large data volumes, speeding up root cause analysis and troubleshooting. Second, its real-time indexing makes new data immediately available for analysis, enabling real-time monitoring and alerting. Lastly, it allows users to create insightful dashboards and visual representations of time series data, aiding trend analysis and decision-making.

Ingesting and storing time series data

Elasticsearch provides several tools for ingesting time series data, including Beats, Elastic Agent, and APM agents. Beats are a set of lightweight data shippers that can be installed on servers or edge devices to collect and forward various types of data, such as logs, metrics, and network traffic, to Elasticsearch. Elastic Agent is a single, unified agent that can collect a wide variety of data. APM agents, on the other hand, are specifically designed for application performance monitoring, collecting detailed performance metrics and distributed tracing data. These tools offer versatile options for efficiently ingesting time series data into Elasticsearch, enabling effective monitoring, analysis, and visualization.

In Elasticsearch, time series data, like any other type of data, are commonly stored in a data structure called an index, a logical container that holds related documents: each document represents a single data point or event within the time series. Furthermore, Elasticsearch provides a higher-level abstraction for managing and organizing time series data called a data stream.

Data streams offer a convenient way of storing time series data sequentially across multiple indices while exposing a single, unified endpoint for read and write requests. They are particularly suitable for continuously generated data that is rarely (or never) updated, such as logs and metrics.
Each data stream is made of a list of hidden indices called backing indices: the most recent backing index is the data stream’s write index.

Data stream’s indices representation from the official Elastic website — source

Read requests are automatically routed to the proper backing indices while write requests are routed to the write index only.

Data stream’s requests handling representation from the official Elastic website — source
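As a minimal sketch of what this looks like in practice (assuming a data stream named my-app-logs already exists behind a matching data-stream index template, which we create in the next sketch, and a cluster reachable at https://localhost:9200 with the elastic user, like the one we spin up in the hands-on section):

# Append a document: it lands in the data stream's current write index.
# Documents must contain a @timestamp field; data streams only accept "create" operations.
curl -k -u elastic:changeme -X POST "https://localhost:9200/my-app-logs/_doc" \
  -H "Content-Type: application/json" \
  -d '{"@timestamp": "2023-11-07T10:00:00Z", "level": "INFO", "message": "user logged in"}'

# List the data stream and its hidden backing indices (named .ds-my-app-logs-...).
curl -k -u elastic:changeme "https://localhost:9200/_data_stream/my-app-logs?pretty"

# Searches against the data stream name are fanned out to all backing indices.
curl -k -u elastic:changeme "https://localhost:9200/my-app-logs/_search?q=level:INFO&pretty"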

A data stream requires a matching index template that contains the mappings and settings used to configure all of the stream’s backing indices. The index template can also reference an ILM (Index Lifecycle Management) policy in order to automatically and efficiently manage the data stream’s lifecycle, resource allocation, data retention, and performance (a minimal sketch follows the list below). In particular, ILM policies can automatically:

  • create a new backing index and set it as the new data stream’s write index (rollover)
  • determine when the number of primary shards can be reduced for specific backing indices (shrink)
  • decide when to move some backing indices to less performant hardware (data tiers)
  • determine when the number of replica shards can be reduced for specific backing indices
  • determine when the index can be safely deleted
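Here is a rough sketch of how the pieces fit together, using hypothetical names (my-app-logs-policy, my-app-logs-template) and the same local cluster as above: first an ILM policy is created, then a data-stream-enabled index template references it, so every backing index of matching streams inherits both the settings and the lifecycle.

# 1) A minimal ILM policy: roll over daily or at 10 GB, delete backing indices 7 days after rollover.
curl -k -u elastic:changeme -X PUT "https://localhost:9200/_ilm/policy/my-app-logs-policy" \
  -H "Content-Type: application/json" -d '
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_age": "1d", "max_primary_shard_size": "10gb" } } },
      "delete": { "min_age": "7d", "actions": { "delete": {} } }
    }
  }
}'

# 2) An index template that turns matching names into data streams and attaches the policy.
curl -k -u elastic:changeme -X PUT "https://localhost:9200/_index_template/my-app-logs-template" \
  -H "Content-Type: application/json" -d '
{
  "index_patterns": ["my-app-logs*"],
  "data_stream": {},
  "template": {
    "settings": { "index.lifecycle.name": "my-app-logs-policy" },
    "mappings": { "properties": { "@timestamp": { "type": "date" } } }
  }
}'

We will repeat essentially these two steps, through the Kibana UI, in the hands-on section below.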

Analyzing and monitoring time series data

Kibana is a data visualization and exploration tool that works in conjunction with Elasticsearch and provides many useful features that can be used to analyze and monitor time series data.

The Discover tool provides a user-friendly interface to search, explore, and analyze data stored in Elasticsearch indices (and, of course, data streams): it lets you write queries using KQL (Kibana Query Language) or Lucene syntax, add filters, and view the results in a tabular format. The Discover tool is particularly useful for analyzing time series data: it supports time-based analysis by displaying data points on a timeline, allowing you to zoom in and out to view data at different granularities. You can also apply date range filters to focus on specific time periods.

Discover tool view from the official Elastic website — source
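Under the hood, the time picker and filters boil down to ordinary Elasticsearch queries. A small sketch of the kind of time-range search Discover runs, again using the hypothetical my-app-logs stream and local cluster from before:

# Last 15 minutes of ERROR-level documents, newest first.
curl -k -u elastic:changeme -X POST "https://localhost:9200/my-app-logs/_search?pretty" \
  -H "Content-Type: application/json" -d '
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-15m", "lte": "now" } } },
        { "match": { "level": "ERROR" } }
      ]
    }
  },
  "sort": [ { "@timestamp": "desc" } ],
  "size": 20
}'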

When it comes to visualizing and presenting insights, Kibana’s dashboards are invaluable: they enable you to create interactive visualizations and combine multiple visualizations and saved searches into a single, customized dashboard. You can add line charts, area charts, bar charts, pie charts, maps, and more, and configure them to display data from Elasticsearch queries. Dashboards allow you to monitor time series data in real time, track key metrics, identify trends and anomalies, and share these insights with your team.

Dashboards and visualizations example from the official Elastic website — source

Kibana offers alerting capabilities through its Alerting and Actions framework, letting you monitor time series data proactively. With Kibana alerts, you can define different types of rules (conditions) on your time series data, such as Elasticsearch query rules, index threshold rules, transform rules, and others. When a rule’s condition is met, Kibana can automatically perform an action through a connector: there are many connector types, such as index connectors, email connectors, Microsoft Teams connectors, and others. Kibana alerts ensure that you are notified promptly when important changes occur in your time series data, so you can take immediate action.

By utilizing the Discover tool, dashboards, and alerts, you can gain deep insights into your time series data, create visually appealing and informative dashboards, and monitor critical metrics in real time to respond effectively to events and anomalies.

Hands-on

Now it’s time to test some of the concepts described above. Let’s start by spinning up a very basic cluster made of three Elasticsearch nodes and a Kibana node: to achieve that, we will use a docker-compose file (be sure to install and run Docker and Docker Compose first) that is configured through an external .env file. First, create an empty directory and place inside it the following file, which has to be named .env:

#
# .env: place this file in the main directory
#

# Project namespace (defaults to the current folder name if not set)
COMPOSE_PROJECT_NAME=elastic_time_series_data

# Password for the 'elastic' user (at least 6 characters)
ELASTIC_PASSWORD=changeme

# Password for the 'kibana_system' user (at least 6 characters)
KIBANA_PASSWORD=changeme

# Version of Elastic products
STACK_VERSION=8.7.1

# Set the cluster name
CLUSTER_NAME=docker-cluster

# Set to 'basic' or 'trial' to automatically start the 30-day trial
LICENSE=basic
#LICENSE=trial

# Port to expose Elasticsearch HTTP API to the host
ES_PORT=9200

# Port to expose Kibana to the host
KIBANA_PORT=5601

# Increase or decrease based on the available host memory (in bytes)
ES_MEM_LIMIT=1073741824
KB_MEM_LIMIT=1073741824
LS_MEM_LIMIT=1073741824

# SAMPLE Predefined Key only to be used in POC environments
ENCRYPTION_KEY=a23d38b3a14956121ff2170e5030b471551370178f43e5626eec58b04a30fz19

Then create and place in the same directory the docker-compose.yml file: this will take care of creating 3 Elasticsearch containers, a Kibana container, and a Metricbeat container (after performing some setup operations).

#
# docker-compose.yml: place this file in the main directory
#

version: "3.8"

volumes:
  certs:
    driver: local
  esdata01:
    driver: local
  esdata02:
    driver: local
  esdata03:
    driver: local
  kibanadata:
    driver: local
  metricbeatdata01:
    driver: local

networks:
  default:
    name: elastic
    external: false

services:
  setup:
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
    user: "0"
    command: >
      bash -c '
        if [ x${ELASTIC_PASSWORD} == x ]; then
          echo "Set the ELASTIC_PASSWORD environment variable in the .env file";
          exit 1;
        elif [ x${KIBANA_PASSWORD} == x ]; then
          echo "Set the KIBANA_PASSWORD environment variable in the .env file";
          exit 1;
        fi;
        if [ ! -f config/certs/ca.zip ]; then
          echo "Creating CA";
          bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
          unzip config/certs/ca.zip -d config/certs;
        fi;
        if [ ! -f config/certs/certs.zip ]; then
          echo "Creating certs";
          echo -ne \
          "instances:\n"\
          "  - name: es01\n"\
          "    dns:\n"\
          "      - es01\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          "  - name: es02\n"\
          "    dns:\n"\
          "      - es02\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          "  - name: es03\n"\
          "    dns:\n"\
          "      - es03\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          "  - name: kibana\n"\
          "    dns:\n"\
          "      - kibana\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          > config/certs/instances.yml;
          bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key;
          unzip config/certs/certs.zip -d config/certs;
        fi;
        echo "Setting file permissions"
        chown -R root:root config/certs;
        find . -type d -exec chmod 750 \{\} \;;
        find . -type f -exec chmod 640 \{\} \;;
        echo "Waiting for Elasticsearch 01 availability";
        until curl -s --cacert config/certs/ca/ca.crt https://es01:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
        echo "Waiting for Elasticsearch 02 availability";
        until curl -s --cacert config/certs/ca/ca.crt https://es02:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
        echo "Waiting for Elasticsearch 03 availability";
        until curl -s --cacert config/certs/ca/ca.crt https://es03:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
        echo "Setting kibana_system password";
        until curl -s -X POST --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done;
        echo "All done!";
      '
    healthcheck:
      test: ["CMD-SHELL", "[ -f config/certs/es01/es01.crt ]"]
      interval: 1s
      timeout: 5s
      retries: 120

  es01:
    depends_on:
      setup:
        condition: service_healthy
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    labels:
      co.elastic.logs/module: elasticsearch
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - ${ES_PORT}:9200
    environment:
      - node.name=es01
      - cluster.name=${CLUSTER_NAME}
      - discovery.type=multi-node
      - cluster.initial_master_nodes=es01,es02,es03
      - discovery.seed_hosts=es02
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=certs/es01/es01.key
      - xpack.security.http.ssl.certificate=certs/es01/es01.crt
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=certs/es01/es01.key
      - xpack.security.transport.ssl.certificate=certs/es01/es01.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.license.self_generated.type=${LICENSE}
    mem_limit: ${ES_MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  es02:
    depends_on:
      setup:
        condition: service_healthy
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    labels:
      co.elastic.logs/module: elasticsearch
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - esdata02:/usr/share/elasticsearch/data
    environment:
      - node.name=es02
      - cluster.name=${CLUSTER_NAME}
      - discovery.type=multi-node
      - cluster.initial_master_nodes=es01,es02,es03
      - discovery.seed_hosts=es01
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=certs/es02/es02.key
      - xpack.security.http.ssl.certificate=certs/es02/es02.crt
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=certs/es02/es02.key
      - xpack.security.transport.ssl.certificate=certs/es02/es02.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.license.self_generated.type=${LICENSE}
    mem_limit: ${ES_MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  es03:
    depends_on:
      setup:
        condition: service_healthy
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    labels:
      co.elastic.logs/module: elasticsearch
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - esdata03:/usr/share/elasticsearch/data
    environment:
      - node.name=es03
      - cluster.name=${CLUSTER_NAME}
      - discovery.type=multi-node
      - cluster.initial_master_nodes=es01,es02,es03
      - discovery.seed_hosts=es02
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=certs/es03/es03.key
      - xpack.security.http.ssl.certificate=certs/es03/es03.crt
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=certs/es03/es03.key
      - xpack.security.transport.ssl.certificate=certs/es03/es03.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.license.self_generated.type=${LICENSE}
    mem_limit: ${ES_MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  kibana:
    depends_on:
      es01:
        condition: service_healthy
      es02:
        condition: service_healthy
      es03:
        condition: service_healthy
    image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
    labels:
      co.elastic.logs/module: kibana
    volumes:
      - certs:/usr/share/kibana/config/certs
      - kibanadata:/usr/share/kibana/data
    ports:
      - ${KIBANA_PORT}:5601
    environment:
      - SERVERNAME=kibana
      - ELASTICSEARCH_HOSTS=["https://es01:9200","https://es02:9200","https://es03:9200"]
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
      - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt
      - XPACK_SECURITY_ENCRYPTIONKEY=${ENCRYPTION_KEY}
      - XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=${ENCRYPTION_KEY}
      - XPACK_REPORTING_ENCRYPTIONKEY=${ENCRYPTION_KEY}
    mem_limit: ${KB_MEM_LIMIT}
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s -I http://localhost:5601 | grep -q 'HTTP/1.1 302 Found'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  metricbeat01:
    depends_on:
      es01:
        condition: service_healthy
      es02:
        condition: service_healthy
      es03:
        condition: service_healthy
      kibana:
        condition: service_healthy
    image: docker.elastic.co/beats/metricbeat:${STACK_VERSION}
    command: metricbeat -e -strict.perms=false
    user: root
    volumes:
      - certs:/usr/share/metricbeat/certs
      - metricbeatdata01:/usr/share/metricbeat/data
      - "./metricbeat.yml:/usr/share/metricbeat/metricbeat.yml:ro"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "/sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro"
      - "/proc:/hostfs/proc:ro"
      - "/:/hostfs:ro"
    environment:
      - ELASTIC_USER=elastic
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - ELASTIC_HOSTS=["https://es01:9200","https://es02:9200","https://es03:9200"]
      - KIBANA_HOSTS=http://kibana:5601

As you can see from the docker-compose file, we use Metricbeat (one of the best-known Elastic Beats) to collect metrics about the Docker containers running on the host. We configure it with an additional metricbeat.yml file (placed in the same directory) where we define which metrics have to be collected; we also set docker-metrics-index-<version> as the target Elasticsearch index where metrics are written. Please refer to the official Metricbeat reference page for a deep dive into all possible configurations.

#
# metricbeat.yml: place this file in the main directory
#

metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

metricbeat.modules:
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  hosts: ${ELASTIC_HOSTS}
  ssl.certificate_authorities: "certs/ca/ca.crt"
  ssl.certificate: "certs/es01/es01.crt"
  ssl.key: "certs/es01/es01.key"
  username: ${ELASTIC_USER}
  password: ${ELASTIC_PASSWORD}
  ssl.enabled: true

- module: kibana
  metricsets:
    - stats
  period: 10s
  hosts: ${KIBANA_HOSTS}
  username: ${ELASTIC_USER}
  password: ${ELASTIC_PASSWORD}
  xpack.enabled: true

- module: docker
  metricsets:
    - "container"
    - "cpu"
    - "diskio"
    - "healthcheck"
    - "info"
    - "memory"
    - "network"
  hosts: ["unix:///var/run/docker.sock"]
  period: 10s
  enabled: true

processors:
  - add_host_metadata: ~
  - add_docker_metadata: ~

setup:
  template:
    enabled: false

output.elasticsearch:
  index: "docker-metrics-index-%{[agent.version]}"
  hosts: ${ELASTIC_HOSTS}
  username: ${ELASTIC_USER}
  password: ${ELASTIC_PASSWORD}
  ssl:
    certificate: "certs/es01/es01.crt"
    certificate_authorities: "certs/ca/ca.crt"
    key: "certs/es01/es01.key"

Now open a terminal in the directory we just created and start all the containers with docker-compose up: after a few minutes, the Elasticsearch cluster will be up and running and metrics will start being collected.
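Before moving on, you can quickly check that the cluster answers from the host. A small sketch: -k skips certificate verification (the CA lives inside a Docker volume) and the password is the ELASTIC_PASSWORD from the .env file.

# Cluster health should eventually report "status": "green" with 3 data nodes.
curl -k -u elastic:changeme "https://localhost:9200/_cluster/health?pretty"

# List the nodes that joined the cluster.
curl -k -u elastic:changeme "https://localhost:9200/_cat/nodes?v"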

Access Kibana to start exploring time series data: open a browser, go to http://localhost:5601 (where Kibana is listening), and log in using the credentials from the .env file:

  • username: elastic
  • password: changeme

Open Index Management (you can use the search bar) and verify that the docker-metrics-index-8.7.1 index is present under the Indices tab: this index will keep being filled with new Docker metrics documents.
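Equivalently, you can check it from the terminal with the same credentials used above:

# The index should appear, and its docs.count should keep growing between runs.
curl -k -u elastic:changeme "https://localhost:9200/_cat/indices/docker-metrics-index-*?v"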

In order to visualize the metrics that are being collected, go to Discover (the search bar is your best friend) and create a Data View with the following settings:

  • name: docker-metrics-index-view (you can choose any name)
  • index pattern: docker-metrics-index-* (stick to this value)
  • timestamp field: @timestamp

Now you can play with the Discover tool and analyze your metrics.
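If you prefer to automate this step, the same data view can be created through Kibana’s data views API; a sketch assuming the defaults used in this article (note the kbn-xsrf header that Kibana requires for API calls):

# Create the data view via the Kibana API instead of the UI.
curl -u elastic:changeme -X POST "http://localhost:5601/api/data_views/data_view" \
  -H "kbn-xsrf: true" -H "Content-Type: application/json" -d '
{
  "data_view": {
    "name": "docker-metrics-index-view",
    "title": "docker-metrics-index-*",
    "timeFieldName": "@timestamp"
  }
}'

If the call succeeds, the new data view immediately shows up in Discover’s data view picker.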

It’s time to switch from a simple index to a data stream. First, change Metricbeat’s target index from docker-metrics-index-%{[agent.version]} to docker-metrics-stream-%{[agent.version]} in the metricbeat.yml file (don’t restart its container for now).

Then open Index Lifecycle Policies and create a new policy named docker-metrics-ilm-policy with the following settings (leave all other settings at their defaults):

Hot Phase (advanced settings)

  • Switch from Keep data in this phase forever to Delete data after this phase
  • rollover — recommended defaults: false
  • rollover — maximum age: 15 minutes
  • shrink — enabled: true
  • shrink — configure shard count: true
  • shrink — number of primary shards: 1

Delete Phase

  • move data into phase when: 30 minutes

By applying this ILM policy to the data stream we are going to create, we let it automatically create a new backing index and promote it to write index (rollover) every 15 minutes, shrinking the previous one to 1 primary shard; in addition, every backing index will be deleted automatically about 30 minutes after it rolls over.
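For reference, the same policy can also be created from the terminal (or Kibana Dev Tools) instead of the UI; a sketch that should be equivalent to the steps above:

# Hot phase: roll over every 15 minutes and shrink the rolled-over index to 1 primary shard.
# Delete phase: remove each backing index 30 minutes after its rollover.
curl -k -u elastic:changeme -X PUT "https://localhost:9200/_ilm/policy/docker-metrics-ilm-policy" \
  -H "Content-Type: application/json" -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "15m" },
          "shrink":   { "number_of_shards": 1 }
        }
      },
      "delete": {
        "min_age": "30m",
        "actions": { "delete": {} }
      }
    }
  }
}'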

Finally, open Index Management and go to the Index Templates tab, where we can create a new template with the following settings (skip/leave default values for those not mentioned):

Logistics

  • name: docker-metrics-template
  • index-pattern: docker-metrics-stream-*
  • data stream — enabled

Index Settings

{
  "index": {
    "lifecycle": {
      "name": "docker-metrics-ilm-policy"
    },
    "number_of_shards": "2",
    "number_of_replicas": "1"
  }
}

This template will be applied to every new index whose name matches the index pattern docker-metrics-stream-*: since the template is data stream enabled, documents addressed to such a name will be written to a data stream with that name instead of a regular index.

It’s time to restart the Metricbeat container to apply the changes. After restarting it, go back to Index Management and verify that the docker-metrics-stream-8.7.1 data stream is present under the Data Streams tab: you can see all its backing indices by clicking on its name. At first you should see only one index, but check again after 15 minutes: an additional index will have been created automatically through the rollover operation (shrinking indices to 1 primary shard may take a few more minutes).
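You can also follow what is happening from the terminal, in the same hedged curl style used earlier:

# Show the data stream, the ILM policy it uses, and its .ds-docker-metrics-stream-* backing indices.
curl -k -u elastic:changeme "https://localhost:9200/_data_stream/docker-metrics-stream-*?pretty"

# Check which lifecycle step each backing index is currently in (useful to watch rollover and shrink happen).
curl -k -u elastic:changeme "https://localhost:9200/.ds-docker-metrics-stream-*/_ilm/explain?pretty"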

Conclusion

By harnessing Elasticsearch (and Kibana) tools, users can unlock valuable insights and take reactive and proactive actions based on their time series data, ultimately enhancing their understanding and decision-making processes.

We managed to ingest some time series data (Docker metrics) and write it to a data stream with an associated ILM policy. Now it’s your turn to play with the Kibana tools introduced in this article: start by visualizing the data stream’s documents in the Discover tool (you will need a new data view), then build a simple dashboard (you will see how easy it is). Finally, try creating an alert (search for Alerts and Insights) by configuring a connector and a rule.
