Top 5 Elastic aggregations to empower your charts!
Elasticsearch is an excellent tool for indexing large amounts of data and running searches and other analytical queries over it.
Whether you are a data scientist, an engineer, or simply someone interested in understanding your data better, Elastic aggregations provide a powerful and flexible toolset for extracting insights and making more informed decisions. They can uncover trends that would be difficult or impossible to detect by other means.
You are probably aware of some basic aggregations like avg, sum, and max, or slightly more advanced ones like bucket_sort and terms. These are common and do not show the full power of Elasticsearch.
So without further ado let’s dive into some non-trivial aggregations provided by Elastic!
Setup
To experiment alongside this article you can either create a free Elastic cluster here or run Elasticsearch with Kibana locally. I suggest using Docker for this.
Create a Docker network for Kibana and Elasticsearch:
docker network create elastic
Launch Elasticsearch with security disabled:
docker run \
--name elasticsearch \
--net elastic \
-p 9200:9200 \
-e discovery.type=single-node \
-e ES_JAVA_OPTS="-Xms1g -Xmx1g"\
-e xpack.security.enabled=false \
-it \
docker.elastic.co/elasticsearch/elasticsearch:8.5.3
Spin up Kibana (pointing it at the Elasticsearch container and matching the Elasticsearch version):
docker run \
--name kibana \
--net elastic \
-p 5601:5601 \
-e ELASTICSEARCH_HOSTS=http://elasticsearch:9200 \
docker.elastic.co/kibana/kibana:8.5.3
Profit! Now go to http://localhost:5601/app/home#/ and upload some sample data: click “Try sample data”. In this article, I’m using the sample e-commerce data set.
Bucket aggregations
In Elasticsearch, a bucket is a type of aggregation that groups documents together based on specific criteria. For example, you could use a bucket aggregation to group all of the documents in an index by a specific field, such as the date on which they were created.
Date histogram
It is used to group documents in time intervals by hour, day, week, etc. Example:
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "chart": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "1d"
      }
    }
  }
}
This query uses the date_histogram aggregation to group documents by the order_date field with a calendar interval of one day. The aggregation creates one bucket per day, and each bucket contains all documents whose order_date falls on that day.
This aggregation can be used to create a chart showing the number of orders placed each day.
It can help you correlate order volume with events or product launches on a particular day.
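To build intuition for what date_histogram does, here is a minimal Python sketch of the same bucketing logic, truncating each timestamp to the start of its day. The order timestamps are made up for illustration:

```python
from collections import Counter
from datetime import datetime

def daily_buckets(timestamps):
    """Group ISO timestamps into per-day buckets, like a one-day date_histogram."""
    counts = Counter(datetime.fromisoformat(ts).date().isoformat() for ts in timestamps)
    return dict(sorted(counts.items()))

orders = [
    "2023-01-01T09:15:00", "2023-01-01T18:40:00",
    "2023-01-02T11:05:00",
]
print(daily_buckets(orders))
# {'2023-01-01': 2, '2023-01-02': 1}
```

Each key plays the role of a bucket, and the count is what you would chart on the y-axis.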
Significant terms
The significant_terms aggregation is a type of bucket aggregation that groups documents by the values of a specified field and then calculates which values are statistically significant relative to the overall set of documents. This can be useful for identifying which values are over-represented or under-represented in your data.
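The underlying idea can be sketched outside of Elasticsearch: a term is significant when its share of a foreground subset is much larger than its share of the background set. This toy ratio is a simplification of Elasticsearch’s actual scoring heuristics, and the data below is invented:

```python
def significance(term, foreground, background):
    """Ratio of a term's frequency in a subset vs. the whole corpus.
    Values well above 1.0 mean the term is over-represented in the subset."""
    fg_share = foreground.count(term) / len(foreground)
    bg_share = background.count(term) / len(background)
    return fg_share / bg_share if bg_share else float("inf")

# Whole period: skis are a rare category overall...
background = ["shoes"] * 50 + ["hats"] * 45 + ["skis"] * 5
# ...but during one sale week they make up 40% of orders.
week_sale = ["skis"] * 4 + ["shoes"] * 6
print(significance("skis", week_sale, background))  # 8.0
```

A plain terms aggregation would still rank shoes first in that week; significant_terms is what surfaces skis as the anomaly.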
Let’s experiment and nest it inside a date_histogram aggregation:
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "chart": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "1w"
      },
      "aggs": {
        "significant_products_categories": {
          "significant_terms": {
            "field": "products.category.keyword"
          }
        }
      }
    }
  }
}
In this aggregation, we see product categories that are anomalously over- or under-represented during a given week compared with the whole period of time.
Here, these are mostly categories that spiked during weekly sales.
Metric aggregation
Metrics are another type of aggregation in Elasticsearch, used to calculate values such as the average or minimum of a field across a group of documents. But there are also some genuinely interesting and rarely used metric aggregations in Elasticsearch that you can benefit from!
Percentiles
Calculates percentiles of a numeric field over a set of documents. This is useful for understanding the distribution of values in a dataset.
The aggregation returns the percentiles as a set of key-value pairs, where each key is a percentile and each value is the field value at that percentile.
For example, the 95th percentile is the value that is greater than 95% of the observed values.
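As a mental model, here is the simple nearest-rank method in Python. Note that Elasticsearch computes percentiles approximately (via the TDigest algorithm), so results on large datasets may differ slightly from the exact values:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value >= p% of the data."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

prices = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile(prices, 50))  # 50
print(percentile(prices, 95))  # 100
```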
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "countries": {
      "terms": {
        "field": "geoip.continent_name"
      },
      "aggs": {
        "percentile": {
          "percentiles": {
            "field": "taxful_total_price"
          }
        }
      }
    }
  }
}
This query lets you see the distribution of taxful_total_price values for each continent:
Now we can read off the maximum price for seven slices of the distribution: 1, 5, 25, 50, 75, 95, and 99 percent of orders on each continent.
Cardinality
This aggregation estimates the number of unique values in a field. The count is approximate (Elasticsearch uses the HyperLogLog++ algorithm under the hood), which is what makes it cheap at scale. It is useful for finding the number of distinct values, such as the number of unique users or products in a dataset.
The aggregation returns a single value: the approximate number of unique values in the specified field. You can combine the cardinality aggregation with other aggregations:
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "chart": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "1d"
      },
      "aggs": {
        "unique": {
          "cardinality": {
            "field": "customer_id"
          }
        }
      }
    }
  }
}
This query uses a date_histogram aggregation to group documents by the order_date field with a calendar interval of one day. It also has a sub-aggregation called "unique" that uses the cardinality aggregation to estimate the number of unique customer_id values in each date_histogram bucket.
This query can be used to create a chart showing the number of unique customers who placed orders each day:
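Outside of Elasticsearch, the same day-by-day distinct count can be sketched exactly in a few lines of Python (Elasticsearch instead approximates it with HyperLogLog++, trading a small error margin for constant memory). The events below are made up:

```python
from collections import defaultdict

def unique_per_bucket(events):
    """Distinct customers per day: an exact stand-in for
    date_histogram + cardinality over (day, customer_id) pairs."""
    buckets = defaultdict(set)
    for day, customer in events:
        buckets[day].add(customer)
    return {day: len(ids) for day, ids in sorted(buckets.items())}

events = [("2023-01-01", "c1"), ("2023-01-01", "c2"),
          ("2023-01-01", "c1"), ("2023-01-02", "c3")]
print(unique_per_bucket(events))
# {'2023-01-01': 2, '2023-01-02': 1}
```

Note that the repeat order from c1 on January 1st is counted once, which is exactly the distinction between counting orders and counting customers.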
Pipeline aggregation
Pipeline aggregations are used to process the output of other aggregations.
For example, you could use a pipeline aggregation to sort the results of a bucket aggregation by the average price of the products in each group or to calculate the cumulative sum of the total number of products in each group.
Cumulative sum
This aggregation calculates the cumulative sum of a specified metric across the buckets of a parent histogram aggregation: each bucket’s cumulative value is the previous running total plus the current bucket’s value.
The result in each bucket is the running total of the metric up to that point. Now let’s take a look at this aggregation in a slightly more sophisticated query.
Example:
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "country": {
      "terms": {
        "field": "geoip.continent_name"
      },
      "aggs": {
        "chart": {
          "date_histogram": {
            "field": "order_date",
            "calendar_interval": "1w"
          },
          "aggs": {
            "sales": {
              "sum": {
                "field": "taxful_total_price"
              }
            },
            "cumulative": {
              "cumulative_sum": {
                "buckets_path": "sales"
              }
            }
          }
        }
      }
    }
  }
}
An aggregation called “country” uses terms to group the documents by the continent name field in the geoip object. It produces a set of buckets, one for each unique continent name in the index.
The “chart” aggregation, nested within “country”, uses a date_histogram to group the documents within each continent bucket by the order date. It produces a set of sub-buckets, one for each week in the time range of the data.
The “sales” aggregation, nested within “chart”, uses sum to calculate the total sales for each week by summing the values in the taxful_total_price field.
Finally, the “cumulative” aggregation keeps a running total of the sales, adding each subsequent week’s sales to the total as it processes the buckets.
In this example, we can see and compare how much money was earned each week on each continent.
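Conceptually, the pipeline step is just a running total over the weekly sales buckets. In Python, itertools.accumulate does the same thing (the weekly figures below are invented):

```python
from itertools import accumulate

# Per-week sales totals, as the "sales" sub-aggregation would produce them.
weekly_sales = [1200.0, 950.0, 1430.0, 800.0]

# cumulative_sum walks the buckets in order and emits the running total.
cumulative = list(accumulate(weekly_sales))
print(cumulative)  # [1200.0, 2150.0, 3580.0, 4380.0]
```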
BONUS — Derivative
The derivative is a pipeline aggregation that calculates the rate of change of a metric: the difference between each bucket’s value and the previous bucket’s value. This can be useful for identifying trends in a dataset.
Let’s take a look at the same aggregation as the previous one but using a derivative instead of a cumulative sum:
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "country": {
      "terms": {
        "field": "geoip.continent_name"
      },
      "aggs": {
        "chart": {
          "date_histogram": {
            "field": "order_date",
            "calendar_interval": "1w"
          },
          "aggs": {
            "sales": {
              "sum": {
                "field": "taxful_total_price"
              }
            },
            "derivative": {
              "derivative": {
                "buckets_path": "sales"
              }
            }
          }
        }
      }
    }
  }
}
In this case, the derivative aggregation is being applied to the sum of the “taxful_total_price” field, which is the sales metric.
It allows you to track the rate of change for a metric over time. Derivative calculates the difference between the current and previous values. This can be useful for identifying trends and anomalies in your data:
In this case, the derivative aggregation can help you see how the sales numbers are changing from week to week. If the derivative values are consistently increasing, that could indicate that sales are growing. On the other hand, if the derivative values are consistently decreasing, that could indicate that sales are declining.
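The computation itself is tiny: subtract each bucket’s value from the next one. A sketch with invented weekly figures; note that the first bucket produces no derivative, since it has no predecessor:

```python
def derivative(bucket_values):
    """Difference between each bucket and the previous one.
    The first bucket has no predecessor, so it yields no value."""
    return [curr - prev for prev, curr in zip(bucket_values, bucket_values[1:])]

weekly_sales = [1200.0, 950.0, 1430.0, 800.0]
print(derivative(weekly_sales))  # [-250.0, 480.0, -630.0]
```

Positive values mark weeks where sales grew over the previous week, negative values mark declines.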
Wrap up
date_histogram: Groups documents into time intervals (by hour, day, week, etc.).
significant_terms: Identifies unusual occurrences of terms in a given set of documents.
percentiles: Calculates the percentiles of a specified field.
cardinality: Estimates the number of unique values in a field.
cumulative_sum: Calculates the cumulative sum of a specified metric.
derivative: Calculates the difference between the current and previous values of a specified metric.
Thanks for reading, and happy Elasticsearching!
Want to connect?
Follow me on Twitter!