Sitemap

Create BLAZING fast reports with Elastic Transforms and ES|QL

3 min readJul 22, 2024

I recently worked on enhancing the reporting infrastructure for a customer. The client needed to generate various reports efficiently, and their existing setup was struggling with performance issues. This led me to leverage Elasticsearch, specifically the newly announced ES|QL (Elasticsearch Query Language) from ELK 8.14, combined with Elastic Transforms.

The Challenge

The client’s reporting system was slow and cumbersome, as data volumes increased. Generating reports was becoming a bottleneck, preventing them and their tenants from make timely business decisions. A solution was needed to handle large datasets efficiently and provide quick insights, while keeping maintenance and the development of new reports easy and straight forward.

The Solution: Elastic Transforms + ES|QL

I decided to use ES|QL alongside Elastic Transforms to tackle this problem. This combination not only addressed the performance issues but also made querying and aggregating data more intuitive and flexible.

Use Case: eCommerce Platform

Consider a large eCommerce platform that logs individual shipments and products into an index called product-transactions. This index can be used as the source for various reports, such as top-selling products, product returns, and delayed shipments.

These reports could be generated by directly querying the index using KQL (Kibana Query Language). However, this approach has several drawbacks:

  • KQL queries are complex and hard to maintain.
  • Performing multiple aggregations results in a nested JSON response that requires additional server-side processing before being useful for charting or data export.
  • Reports on large time frames become slower as data is kept in cold or warm storage, or as indices grow very large.

An Alternative Approach: Elastic Transforms + ES|QL

Using Elastic Transforms and ES|QL can bypass many of these issues. Here’s how the workflow looks:

  1. Define the desired reports: Determine the columns, time ranges, and filters for each report.
  2. Create a transform for each report: The transform will run continuously, including all fields necessary for grouping, filtering, and sorting, and calculate all metrics needed for the final report.
  3. Use ES|QL to query the data: Use ES|QL to shape the data into the final report format.

Implementation Example: Top-Selling Products Report

For this example, let’s focus on a report that lists top-selling products. We want the report to include the product name, category, and total amount sold during a given time frame. Here’s how you can set it up:

Define the Transform:

PUT _transform/top-selling-products-transform
{
"source": {
"index": "product-transactions",
"query": {
"bool": {
"filter": [
{
"term": {
"order-status": "ordered"
}
}
]
}
}
},
"dest": {
"index": "top-selling-products"
},
"pivot": {
"group_by": {
"product-id": {
"terms": {
"field": "product-id"
}
},
"product-name": {
"terms": {
"field": "product-name"
}
},
"product-category": {
"terms": {
"field": "product-category"
}
},
"order-date": {
"date_histogram": {
"field": "order-date",
"interval": "1d",
"format": "yyyy-MM-dd",
"min_doc_count": 1
}
}
},
"aggregations": {
"amount-sold": {
"sum": {
"field": "order-amount"
}
}
}
},
"frequency": "1h",
"sync": {
"time": {
"field": "order-date",
"delay": "60s"
}
}
}

Start the transform with the following command:

POST _transform/top-selling-products-transform/_start

This transform will create a continuously updating index named top-selling-products, aggregating the total amount sold each day. Once the transform is set up, data from the main index can be frozen, archived, or even deleted, while you can still query the transform index for the report data.

Use ES|QL:

ES|QL simplifies querying the transform index. Here’s an example query for the top-selling products:

FROM top-selling-products
| WHERE order_date > NOW() - 30d
| WHERE product_category == "Shoes"
| STATS total_sold = SUM(amount_sold) BY product_name, product_category
| SORT total_sold DESC
| LIMIT 50

This query performs the following steps:

  1. Defines the source index as the transformed index containing aggregate data.
  2. Filters the data to include only the past 30 days.
  3. Filters by a specific category (Shoes). Omitting this line would include data for all categories.
  4. Sums the amount of items sold by the specified fields.
  5. Sorts the report by the total sold amount.
  6. Limits the report to the top 50 results.

By running this query from your application using an Elasticsearch client, you can create reports or visualizations based on the transformed data. These reports will load almost instantly and without putting additional load on the main data index.

And that’s that! With a few simple steps you can use Elastic to create flexible, fast and easily maintainable infrastructure for your reports and charts.

Additional resources:

  1. Elastic Transforms
  2. ES|QL
  3. Elasticsearch Clients

--

--

Liza Katz (Lizozom)
Liza Katz (Lizozom)

Written by Liza Katz (Lizozom)

Software developer with way too many years of experience, yoga enthusiast, cook, traveler and writer. https://lizozom.com

No responses yet