Logstash — Denormalize documents (Part 2)

Ingrid Jardillier
2 min read · May 2, 2024


This is a three-part article. You can find the other parts here:

* Part 1: highlights the need for denormalization
* Part 2: exposes the problem of not using denormalization
* Part 3: shows how to implement denormalization

In this second part of Logstash — Denormalize documents, we will use the example described in Part 1 to expose the problem caused by not denormalizing documents.

Problem

If you create the prizes-* data view and go to the Discover app, you can take a look at our ingested data:

Documents in prizes-original

We can see that the JSON object containing two prizes is rendered with prize.year and prize.category as arrays.

If you want to look for prizes awarded in 1903 in the “chemistry” category, you can add a query (using KQL syntax):

prize.year : 1903 and prize.category : "chemistry"

This query returns a single result.

But if we check our original JSON file, it contained this:

[
  {
    "year": 1903,
    "category": "physics"
  },
  {
    "year": 1911,
    "category": "chemistry"
  }
]

So the prize awarded in 1903 was for “physics” and the one awarded in 1911 was for “chemistry”.

So, when we query for year 1903 and category “chemistry”, we should not obtain any result!

But Elasticsearch does not keep track of which values belonged to which object inside an array: arrays of objects are flattened at indexing time.

For Elasticsearch, the field prize.year contains both 1903 and 1911, and the field prize.category contains both “physics” and “chemistry”, so the document matches the query.
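To illustrate, here is a simplified sketch of how such a document is effectively seen by Elasticsearch once the array of objects is flattened (the real internal representation is more involved, but the loss of association is the same):

{
  "prize.year": [1903, 1911],
  "prize.category": ["physics", "chemistry"]
}

The link between 1903 and “physics” (and between 1911 and “chemistry”) is lost, which is why the query matches.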

Resolution

One method to resolve this problem is to use the nested type, but it has some limitations: it can easily degrade performance, and it is not fully supported in Kibana, so it is only interesting if you query your documents through the Elasticsearch API.
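As a sketch of that approach (the index name prizes-nested is an assumption, not part of the original example), a nested mapping and a matching query through the Elasticsearch API could look like this:

// Create an index where each object of the prize array is indexed separately
PUT prizes-nested
{
  "mappings": {
    "properties": {
      "prize": {
        "type": "nested",
        "properties": {
          "year": { "type": "integer" },
          "category": { "type": "keyword" }
        }
      }
    }
  }
}

// Query year and category inside the same prize object
GET prizes-nested/_search
{
  "query": {
    "nested": {
      "path": "prize",
      "query": {
        "bool": {
          "must": [
            { "term": { "prize.year": 1903 } },
            { "term": { "prize.category": "chemistry" } }
          ]
        }
      }
    }
  }
}

With a nested mapping, each object of the prize array is stored as a hidden separate document, so this query correctly returns no hit for year 1903 combined with “chemistry”.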

The second method is to denormalize documents in order to create one document per prize. This can be done with Logstash, and that is what we will describe in the next part (Part 3).
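As a sketch of the target shape only (the exact fields produced by the Logstash pipeline are shown in Part 3), denormalizing our example would produce two separate documents:

{
  "prize": { "year": 1903, "category": "physics" }
}

{
  "prize": { "year": 1911, "category": "chemistry" }
}

Each prize lives in its own document, so the query prize.year : 1903 and prize.category : "chemistry" no longer matches anything.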
