<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Ingrid Jardillier on Medium]]></title>
        <description><![CDATA[Stories by Ingrid Jardillier on Medium]]></description>
        <link>https://medium.com/@ingrid.jardillier?source=rss-5fe6d48f202a------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*RBExOrRa1j1kBLwVUfeqfg.jpeg</url>
            <title>Stories by Ingrid Jardillier on Medium</title>
            <link>https://medium.com/@ingrid.jardillier?source=rss-5fe6d48f202a------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Fri, 15 May 2026 16:10:52 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@ingrid.jardillier/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Create a nice bar chart in Kibana Vega step by step from Elasticsearch data]]></title>
            <link>https://medium.zenika.com/create-a-nice-bar-chart-in-kibana-vega-step-by-step-from-elasticsearch-data-0e0a61c052f5?source=rss-5fe6d48f202a------2</link>
            <guid isPermaLink="false">https://medium.com/p/0e0a61c052f5</guid>
            <category><![CDATA[vegas]]></category>
            <category><![CDATA[elastic]]></category>
            <category><![CDATA[kibana]]></category>
            <dc:creator><![CDATA[Ingrid Jardillier]]></dc:creator>
            <pubDate>Wed, 29 Jan 2025 07:50:14 GMT</pubDate>
            <atom:updated>2025-01-29T07:50:14.694Z</atom:updated>
            <content:encoded><![CDATA[<p>In a previous article (<a href="https://medium.com/@ingrid.jardillier/5118615f3415">Using transformations in Kibana Vega to adapt data from query DSL</a>), we saw <strong>how to retrieve data from Elasticsearch, enrich it with static data sources, and transform it for simpler exploitation</strong>. This data represented <strong>JVM memory by cluster (environment) and role (tier)</strong>, derived from monitoring data.<br><br>For the record, the main data source (named <strong>“jvm”</strong>) gave the following results after transformation:</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*NMybMQ1H75plKaC8Eb97fA.png" /></figure><p>In this article, we’ll see how to use our reworked data source to produce a bar chart visualization in Kibana Vega. The expected result is the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ebmdeWtatlufcLSLkcvPpg.png" /></figure><h3>Bar chart visualization</h3><h4>Definition</h4><p>The bar chart we want to set up will have:</p><ul><li>on the <strong>X-axis</strong>: the <strong>JVM memory</strong> in GB</li><li>on the <strong>Y-axis</strong>: the <strong>clusters</strong> (environments)</li></ul><p>Each environment will also have one <strong>series</strong> per <strong>role</strong> (tier).</p><h4>Set up in Vega</h4><p>As we have already defined the data for Vega in the previous article, we will focus only on creating the visualization, i.e., the <strong>scales</strong>, <strong>axes</strong>, <strong>marks</strong>, and <strong>legends</strong> in the Vega definition.</p><pre>{<br>  &quot;$schema&quot;: &quot;https://vega.github.io/schema/vega/v5.json&quot;,<br>  &quot;title&quot;: {<br>    &quot;text&quot;: &quot;JVM by cluster and role&quot;, <br>    &quot;color&quot;: &quot;black&quot;<br>  },<br>  &quot;description&quot;: &quot;Information about JVM on clusters&quot;,<br>  &quot;padding&quot;: 15,<br>  &quot;background&quot;: &quot;#FFFFFF&quot;,<br>  &quot;config&quot;: {<br>    &quot;title&quot;: { &quot;fontSize&quot;: 20 }<br>  },<br>  &quot;data&quot;: [<br>    // already done in previous article<br>  ],<br>  &quot;scales&quot;: [<br>    // TODO<br>  ],<br>  &quot;axes&quot;: [<br>    // TODO<br>  ],<br>  &quot;marks&quot;: [<br>    // TODO<br>  ],<br>  &quot;legends&quot;: [<br>    // TODO<br>  ]<br>}</pre><h3>Axes definition</h3><p>We will start by defining our 2 axes. These axes must be scaled relative to the values we wish to display.</p><h4>X-axis</h4><p>The X-axis will be displayed at the <strong>bottom</strong> of our graph (“<strong>orient</strong>”) and the value <strong>labels</strong> (“<strong>labelColor</strong>”) will be black. It will be based on a scale named “<strong>xscale</strong>”, which we will define just after. 
This gives:</p><pre>&quot;axes&quot;: [<br>  {<br>    &quot;orient&quot;: &quot;bottom&quot;, <br>    &quot;scale&quot;: &quot;xscale&quot;, <br>    &quot;labelColor&quot;: &quot;black&quot;<br>  }<br>]</pre><p>The scale to implement will therefore be of the “<strong>linear</strong>” type, as a function of the JVM memory value (data source “<strong>jvm</strong>”). It has to go from 0 to the “<strong>total_jvm</strong>” value available in the data source, and it occupies <strong>all the available width (“range”)</strong>.</p><pre>{<br>  &quot;name&quot;: &quot;xscale&quot;,<br>  &quot;type&quot;: &quot;linear&quot;,<br>  &quot;domain&quot;: {<br>    &quot;data&quot;: &quot;jvm&quot;, <br>    &quot;field&quot;: &quot;total_jvm&quot;<br>  },<br>  &quot;range&quot;: &quot;width&quot;<br>}</pre><p>This implementation of the X-axis gives:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*azLRGhYYW3Tg8IxkqY8cKg.png" /></figure><h4>Y-axis</h4><p>The Y-axis will be displayed on the <strong>left</strong> and will show the names of the clusters (environments).</p><pre>&quot;axes&quot;: [<br>  // ...<br>  {<br>    &quot;orient&quot;: &quot;left&quot;, <br>    &quot;scale&quot;: &quot;yscale&quot;, <br>    &quot;labelColor&quot;: &quot;black&quot;, <br>    &quot;tickSize&quot;: 0, <br>    &quot;labelPadding&quot;: 25<br>  }<br>]</pre><p>Setting “<strong>tickSize</strong>” to 0 makes the ticks on this axis disappear, and we add some padding after the labels with “<strong>labelPadding</strong>”.</p><p>The scale for this axis will this time be of type “<strong>band</strong>”, to allow us to group by cluster. The name of the cluster (“<strong>cluster_name</strong>”) will be displayed as the <strong>label</strong> on this axis, but the clusters will be <strong>sorted</strong> according to a predefined order contained in the “<strong>cluster_id</strong>” field. The axis will take all the <strong>space available in height</strong> (“<strong>range</strong>”).</p><pre>&quot;scales&quot;: [<br>    // ...<br>    {<br>      &quot;name&quot;: &quot;yscale&quot;,<br>      &quot;type&quot;: &quot;band&quot;,<br>      &quot;domain&quot;: {<br>        &quot;data&quot;: &quot;jvm&quot;, <br>        &quot;field&quot;: &quot;cluster_name&quot;, <br>        &quot;sort&quot;: {<br>          &quot;op&quot;: &quot;median&quot;, <br>          &quot;field&quot;: &quot;cluster_id&quot;, <br>          &quot;order&quot;: &quot;descending&quot;<br>        }<br>      },<br>      &quot;range&quot;: &quot;height&quot;<br>    }<br>]</pre><p>This implementation of the Y-axis gives:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/88/1*zEc-wl_VpABuTpJqiVVaAQ.png" /></figure><h3>Legends</h3><p>Now we’re going to look at how to add a legend. Why not do it last? To work through the difficulties of this article step by step 😉; in fact, it is possible to do it at the very end, or as needed.</p><h4>Legend values</h4><p>First, we will create the legend with only the values of the roles (the identifier used to order them). We want <strong>symbols</strong> (“<strong>type</strong>”) in the shape of a “<strong>square</strong>”. The title and labels will be in <strong>black</strong> (“<strong>titleColor</strong>” and “<strong>labelColor</strong>”). The legend will be placed at the <strong>bottom</strong> (“<strong>orient</strong>”), <strong>horizontally</strong> (“<strong>direction</strong>”). 
Setting “<strong>tickMinStep</strong>” to 1 ensures we only have integer values.</p><pre>&quot;legends&quot;: [<br>  {<br>    &quot;type&quot;: &quot;symbol&quot;,<br>    &quot;symbolType&quot;: &quot;square&quot;,<br>    &quot;fill&quot;: &quot;color&quot;,<br>    &quot;labelColor&quot;: &quot;black&quot;,<br>    &quot;title&quot;: &quot;Roles&quot;,<br>    &quot;titleColor&quot;: &quot;black&quot;,<br>    &quot;orient&quot;: &quot;bottom&quot;,<br>    &quot;direction&quot;: &quot;horizontal&quot;,<br>    &quot;tickMinStep&quot;: 1<br>  }<br>]</pre><p>To <strong>limit</strong> the legend values (to exclude an unused one, like data_cold), we could use the “<strong>values</strong>” attribute instead of “<strong>tickMinStep</strong>”.</p><pre>&quot;values&quot; : [ 1, 3, 4] // remove 2 (data_cold)</pre><p>The “<strong>fill</strong>” field is associated with a new scale (“<strong>color</strong>”), based on the “<strong>role_id</strong>” field, whose “<strong>range</strong>” lets us define the colors we want to use in our legend, one per value (we could also have used predefined color ranges).</p><pre>&quot;scales&quot;: [<br>  //...<br>  {<br>    &quot;name&quot;: &quot;color&quot;,<br>    &quot;type&quot;: &quot;linear&quot;,<br>    &quot;domain&quot;: {<br>      &quot;data&quot;: &quot;jvm&quot;, <br>      &quot;field&quot;: &quot;role_id&quot;, <br>      &quot;sort&quot;: {<br>        &quot;op&quot;: &quot;median&quot;, <br>        &quot;field&quot;: &quot;role_id&quot;, <br>        &quot;order&quot;: &quot;ascending&quot;<br>      }<br>    },<br>    &quot;range&quot;: [&quot;#c0392b&quot;, &quot;#f1c40f&quot;, &quot;#27ae60&quot;, &quot;#3498db&quot;]<br>  }<br>]</pre><h4>Legend labels</h4><p>Now that we have a good base for our legend, we will improve it by <strong>associating each sorted “role_id” with the corresponding role name as its label</strong>. For this, we will need another “<strong>scale</strong>”, which maps the “<strong>role_id</strong>” defined in the “<strong>domain</strong>” to our “<strong>role</strong>” field (“<strong>range</strong>”).</p><pre>&quot;scales&quot;: [<br>  //...<br>  {<br>    &quot;name&quot;: &quot;scale_legend_values&quot;,<br>    &quot;type&quot;: &quot;ordinal&quot;,<br>    &quot;domain&quot;: {&quot;data&quot;: &quot;jvm&quot;, &quot;field&quot;: &quot;role_id&quot;},<br>    &quot;range&quot;: {&quot;data&quot;: &quot;jvm&quot;, &quot;field&quot;: &quot;role&quot;}<br>  }<br>]</pre><p>We need to update our legend to indicate that we want to display the role label defined in the scale created previously, instead of the raw role value. 
To do this, we need to use a new property that we haven’t covered yet, namely “<strong>encode</strong>”, which allows us to customize some properties, such as the title, labels, symbols, etc.</p><p>In our case, we therefore want to update the text of the “<strong>labels</strong>”, using the lookup table defined in the “<strong>scale_legend_values</strong>” scale.</p><pre>&quot;legends&quot;: [<br>    {<br>      //...<br>      &quot;encode&quot;: {<br>        &quot;labels&quot;: {<br>          &quot;update&quot;: {<br>            &quot;text&quot;: {<br>              &quot;signal&quot;: &quot;scale(&#39;scale_legend_values&#39;, datum.value)&quot;<br>            }<br>          }<br>        }<br>      }<br>    }<br>  ]</pre><p>This implementation of the legends gives:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/234/1*Yq1p01BQEV9_JLIRbURnAw.png" /></figure><h3>Graphical Marks</h3><p>We will now cover the last part of this article, which discusses how to display the JVM memory value by cluster (environment) and role (tier). The goal is to use a bar chart visualization. In Vega, marks are a visualization&#39;s basic visual building blocks, providing basic shapes whose properties can be set.</p><h4>Faceting by cluster</h4><p>To create our bar chart, we first need to define a grouping, because we want to group our data by cluster (environment). This will be done by using the “<strong>group</strong>” mark type and creating a “<strong>facet</strong>” on the name of our clusters.</p><pre>&quot;marks&quot;: [<br>  {<br>    &quot;type&quot;: &quot;group&quot;,<br>    &quot;from&quot;: {<br>      &quot;facet&quot;: {<br>        &quot;data&quot;: &quot;jvm&quot;,<br>        &quot;name&quot;: &quot;facet&quot;,<br>        &quot;groupby&quot;: &quot;cluster_name&quot;<br>      }<br>    }<br>    // TODO<br>  }<br>]</pre><p>We also need each cluster (environment) to have its own range on the Y-axis, this time for the bar chart itself; hence a starting point for the “<strong>y</strong>” value, based on the same “<strong>yscale</strong>” defined in a previous section for the axis display.</p><pre>&quot;marks&quot;: [<br>  {<br>    &quot;type&quot;: &quot;group&quot;,<br>    &quot;from&quot;: {<br>      // ...<br>    },<br>    &quot;encode&quot;: {<br>      &quot;enter&quot;: {<br>        &quot;y&quot;: {&quot;scale&quot;: &quot;yscale&quot;, &quot;field&quot;: &quot;cluster_name&quot;}<br>      }<br>    }<br>    // TODO<br>  }<br>]</pre><p>The added lines of code allow us to properly display our clusters (environments) on the Y-axis (one band reserved per cluster).</p><h4>Scaling by role inside each cluster</h4><p>Inside each band reserved for our clusters, we must define a scale that correctly places the bar associated with each role (defined by the “<strong>role_id</strong>” field). This scale will be of type “<strong>band</strong>” and, as for the environments, we will use the “<strong>range</strong>” property set to “<strong>height</strong>” to have a band of fixed height per role. However, this time, we need a height that depends on the “<strong>yscale</strong>” scale and divides the height allocated to each environment by the number of roles. 
Therefore, we must use the “<strong>signals</strong>” functionality, i.e., dynamic variables that parameterize a visualization and can drive interactive behaviors.</p><pre>&quot;marks&quot;: [<br>  {<br>    &quot;type&quot;: &quot;group&quot;,<br>    &quot;from&quot;: {<br>      // ...<br>    },<br>    &quot;encode&quot;: {<br>      // ...<br>    },<br>    &quot;signals&quot;: [<br>      {<br>        &quot;name&quot;: &quot;height&quot;, <br>        &quot;update&quot;: &quot;bandwidth(&#39;yscale&#39;)&quot;<br>      }<br>    ],<br>    &quot;scales&quot;: [<br>      {<br>        &quot;name&quot;: &quot;role&quot;,<br>        &quot;type&quot;: &quot;band&quot;,<br>        &quot;range&quot;: &quot;height&quot;,<br>        &quot;domain&quot;: {&quot;data&quot;: &quot;facet&quot;, &quot;field&quot;: &quot;role_id&quot;, &quot;sort&quot;: true}<br>      }<br>    ]<br>    // TODO<br>  }<br>]</pre><h4>Displaying bars</h4><p>We can finally do what is necessary to display our bars representing JVM memory by cluster and role.</p><p>To do this, we will create a “<strong>rect</strong>”-type <strong>mark</strong> linked to the “facet” created previously. The following properties need to be specified in the “encode” section to customize them:</p><ul><li>“<strong>x</strong>” and “<strong>x2</strong>” are used to define the <strong>min</strong> (0) and <strong>max</strong> (based on the “<strong>total_jvm</strong>” field) of the rectangle&#39;s length on the X-axis.</li><li>“<strong>y</strong>” allows us to define where to place the rectangle on the Y-axis, relative to our “role” subscale, based on the “role_id” field.</li><li>“<strong>height</strong>” indicates that the height of the rectangle will be the height of “<strong>1</strong>” band allocated to a role.</li><li>“<strong>fill</strong>” is used to apply our “<strong>color</strong>” scale to each role.</li></ul><pre>&quot;marks&quot;: [<br>  {<br>    &quot;type&quot;: &quot;group&quot;,<br>    &quot;from&quot;: {<br>      // ...<br>    },<br>    &quot;encode&quot;: {<br>      // ...<br>    },<br>    &quot;signals&quot;: [<br>      // ...<br>    ],<br>    &quot;scales&quot;: [<br>      // ...<br>    ],<br>    &quot;marks&quot;: [<br>      {<br>        &quot;name&quot;: &quot;bars&quot;,<br>        &quot;from&quot;: {&quot;data&quot;: &quot;facet&quot;},<br>        &quot;type&quot;: &quot;rect&quot;,<br>        &quot;encode&quot;: {<br>          &quot;enter&quot;: {<br>            &quot;y&quot;: {&quot;scale&quot;: &quot;role&quot;, &quot;field&quot;: &quot;role_id&quot;},<br>            &quot;height&quot;: {&quot;scale&quot;: &quot;role&quot;, &quot;band&quot;: 1},  <br>            &quot;x2&quot;: {&quot;scale&quot;: &quot;xscale&quot;, &quot;field&quot;: &quot;total_jvm&quot;},<br>            &quot;x&quot;: {&quot;scale&quot;: &quot;xscale&quot;, &quot;value&quot;: 0},<br>            &quot;fill&quot;: {&quot;scale&quot;: &quot;color&quot;, &quot;field&quot;: &quot;role_id&quot;}<br>          }<br>        }<br>      }<br>      // TODO<br>    ]<br>  }<br>]</pre><p>This implementation of the “<strong>bars</strong>” mark gives:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DBvJumJXnUvx39_aOe59AA.png" /></figure><p>The last thing to set up is to display the JVM memory value to the right of each rectangle. To do this, we will create another <strong>mark</strong>, of type “<strong>text</strong>”, based on our previous “<strong>bars</strong>” mark, to display our text relative to each bar, with the associated data.</p><p>This time, we will define the following properties (the sketch after this list puts them together):</p><ul><li>“<strong>x</strong>” places us at the “<strong>x2</strong>” of the “<strong>bars</strong>”, to position ourselves to the right of our bars.</li><li>“<strong>y</strong>” indicates that we start from the “<strong>y</strong>” of our “<strong>bars</strong>”, with an “<strong>offset</strong>” allowing us to center on the “<strong>height</strong>” of the bar.</li><li>“<strong>fill</strong>” to use “<strong>black</strong>” as the text color.</li><li>“<strong>align</strong>” and “<strong>baseline</strong>” (set to “<strong>middle</strong>”) to align the text.</li><li>“<strong>text</strong>” to set the text to the “<strong>total_jvm</strong>” value.</li></ul>
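<p>Putting these properties together, a minimal version of this “<strong>text</strong>” mark could look like the following sketch (the 5-pixel offset is an assumption, to be adjusted to your rendering):</p><pre>&quot;marks&quot;: [<br>  // ... inside the &quot;group&quot; mark, after the &quot;bars&quot; mark ...<br>  {<br>    &quot;type&quot;: &quot;text&quot;,<br>    &quot;from&quot;: {&quot;data&quot;: &quot;bars&quot;},<br>    &quot;encode&quot;: {<br>      &quot;enter&quot;: {<br>        &quot;x&quot;: {&quot;field&quot;: &quot;x2&quot;, &quot;offset&quot;: 5},<br>        &quot;y&quot;: {&quot;field&quot;: &quot;y&quot;, &quot;offset&quot;: {&quot;field&quot;: &quot;height&quot;, &quot;mult&quot;: 0.5}},<br>        &quot;fill&quot;: {&quot;value&quot;: &quot;black&quot;},<br>        &quot;align&quot;: {&quot;value&quot;: &quot;left&quot;},<br>        &quot;baseline&quot;: {&quot;value&quot;: &quot;middle&quot;},<br>        &quot;text&quot;: {&quot;field&quot;: &quot;datum.total_jvm&quot;}<br>      }<br>    }<br>  }<br>]</pre>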
<p>These last lines allow us to arrive at the final version of our visualization:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*EEZYfVOffzY3ZDu-W6AJfw.png" /></figure><p>We were thus able to <strong>create a visualization step by step</strong>, with <strong>Elasticsearch data coming from several data sources</strong>, <strong>fully customized to best meet our needs</strong>.</p><p>This <strong>requires a little practice</strong>, but the <strong>result is rather interesting</strong>. However, be careful: when you are looking for resources on the Internet, you will find some related to Vega and others related to Vega-Lite. Kibana supports both, but some features are missing or differ between the two.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=0e0a61c052f5" width="1" height="1" alt=""><hr><p><a href="https://medium.zenika.com/create-a-nice-bar-chart-in-kibana-vega-step-by-step-from-elasticsearch-data-0e0a61c052f5">Create a nice bar chart in Kibana Vega step by step from Elasticsearch data</a> was originally published in <a href="https://medium.zenika.com">Zenika</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Using transformations in Kibana Vega to adapt data from query DSL]]></title>
            <link>https://medium.zenika.com/using-transformations-in-kibana-vega-to-adapt-data-from-query-dsl-5118615f3415?source=rss-5fe6d48f202a------2</link>
            <guid isPermaLink="false">https://medium.com/p/5118615f3415</guid>
            <category><![CDATA[elasticsearch]]></category>
            <category><![CDATA[transformation]]></category>
            <category><![CDATA[kibana]]></category>
            <category><![CDATA[vegas]]></category>
            <dc:creator><![CDATA[Ingrid Jardillier]]></dc:creator>
            <pubDate>Wed, 15 Jan 2025 07:54:23 GMT</pubDate>
            <atom:updated>2025-01-15T07:54:23.946Z</atom:updated>
            <content:encoded><![CDATA[<p>When using Kibana to create visualizations, we often need to work on <strong>several indices/datastreams</strong> in order to retrieve all the data needed to build our visualization. However, this is <strong>not possible with traditional tools like Lens</strong> (the goal is not to create several layers with our different indices/datastreams but to aggregate them into a single source containing all the necessary information).</p><h3>Kibana custom visualization: Vega</h3><p>One solution is to use Kibana’s <strong>custom visualizations</strong>, such as <strong>Vega</strong>, which allow you to use multiple data sources built from static data or Query DSL queries. In the latter case, <strong>it can be difficult to exploit the result</strong>: the structure is not a simple table but a fairly complex JSON, especially when using aggregations.</p><p>In this article, we will see how to create <strong>several data sources</strong> in Vega, but above all how to <strong>transform</strong> them to produce something simple as output (a table) containing all the relevant information that we want to display.</p><h3>Example based on monitoring metrics</h3><p>To walk through the necessary steps, we will take an example based on data well known to Elasticsearch users, namely the data contained in <strong>.monitoring-es-*</strong>. These indices contain all the metrics needed to monitor the different clusters managed by a team/company.</p><p>We will start with a simple example: summing the total amount of JVM memory by tier, for each cluster (environment) that we manage. These clusters are managed in Elastic Cloud and are therefore identified by a UUID.</p><p>The <strong>tiers</strong> are:</p><ul><li>Hot</li><li>Warm (not used in our case)</li><li>Cold</li><li>Frozen</li></ul><p>The managed <strong>environments</strong> are:</p><ul><li>Production</li><li>PréProduction (Pre-Production)</li><li>Recette (Staging)</li><li>Intégration (Integration)</li><li>Monitoring</li></ul><h3>Creating the main query</h3><h4>Defining the main Query DSL</h4><p>Our main Query DSL will aim to retrieve the total JVM memory by cluster (cluster_uuid) and role, from the data in the node_stats dataset. 
A role can contain several nodes and the metric used to obtain the JVM is at the node level, so we must go down to the node level and then sum this JVM for all nodes in the same role, which gives:</p><pre>POST /.monitoring-es-*/_search<br>{<br>  &quot;size&quot;: 0,<br>  &quot;query&quot;: {<br>    &quot;bool&quot;: {<br>      &quot;filter&quot;: [<br>        {<br>          &quot;term&quot;: {<br>            &quot;event.dataset&quot;: &quot;elasticsearch.node.stats&quot;<br>          }<br>        }<br>      ]<br>    }<br>  }, <br>  &quot;aggs&quot;: {<br>    &quot;cluster&quot;: {<br>      &quot;terms&quot;: {<br>        &quot;field&quot;: &quot;cluster_uuid&quot;,<br>        &quot;size&quot;: 10<br>      },<br>      &quot;aggs&quot;: {<br>        &quot;role&quot;: {<br>          &quot;terms&quot;: { <br>            &quot;field&quot;: &quot;elasticsearch.node.roles&quot;,<br>            &quot;include&quot;: &quot;data_.*&quot;,<br>            &quot;exclude&quot;: &quot;data_content&quot;,<br>            &quot;min_doc_count&quot;: 0<br>          },<br>          &quot;aggs&quot;: {<br>            &quot;node&quot;: {<br>              &quot;terms&quot;: {<br>                &quot;field&quot;: &quot;elasticsearch.node.name&quot;,<br>                &quot;size&quot;: 10<br>              },<br>              &quot;aggs&quot;: {<br>                &quot;max_jvm&quot;: {<br>                  &quot;max&quot;: {<br>                    &quot;field&quot;: &quot;elasticsearch.node.stats.jvm.mem.heap.max.bytes&quot;<br>                  }<br>                }<br>              }<br>            },<br>            &quot;sum_jvm&quot;: {<br>              &quot;sum_bucket&quot;: {<br>                &quot;buckets_path&quot;: &quot;node&gt;max_jvm&quot; <br>              }<br>            }<br>          }<br>        }<br>      }<br>    }<br>  }<br>}</pre><p>To make it easier to read, we can filter the output by using the “<strong>filter_path</strong>” attribute:</p><pre>POST /.monitoring-es-*/_search?filter_path=aggregations,-**.doc_count,-**.doc_count_error_upper_bound,-**.sum_other_doc_count,-**.node</pre><p>This will only keep the data useful for the rest by removing any superfluous and intermediate fields in the calculation of the JVM by tier.</p><p>This query therefore gives in output:</p><pre>{<br>  &quot;aggregations&quot;: {<br>    &quot;cluster&quot;: {<br>      &quot;buckets&quot;: [<br>        {<br>          &quot;key&quot;: &quot;PDIJyZMaSQOtFB2LEz9kwA&quot;,<br>          &quot;role&quot;: {<br>            &quot;buckets&quot;: [<br>              {<br>                &quot;key&quot;: &quot;data_hot&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 47764733952<br>                }<br>              },<br>              {<br>                &quot;key&quot;: &quot;data_cold&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 31893487616<br>                }<br>              },<br>              {<br>                &quot;key&quot;: &quot;data_frozen&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 15921577984<br>                }<br>              }<br>            ]<br>          }<br>        },<br>        {<br>          &quot;key&quot;: &quot;fN5Y2U2HRsK1HKw1X_H77A&quot;,<br>          &quot;role&quot;: {<br>            &quot;buckets&quot;: [<br>              {<br>                &quot;key&quot;: &quot;data_hot&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  
&quot;value&quot;: 5876219904<br>                }<br>              },<br>              {<br>                &quot;key&quot;: &quot;data_cold&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 884998144<br>                }<br>              },<br>              {<br>                &quot;key&quot;: &quot;data_frozen&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 0<br>                }<br>              }<br>            ]<br>          }<br>        },<br>        {<br>          &quot;key&quot;: &quot;ARezM52EQhGoxHoFJrm1oA&quot;,<br>          &quot;role&quot;: {<br>            &quot;buckets&quot;: [<br>              {<br>                &quot;key&quot;: &quot;data_hot&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 5876219904<br>                }<br>              },<br>              {<br>                &quot;key&quot;: &quot;data_cold&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 0<br>                }<br>              },<br>              {<br>                &quot;key&quot;: &quot;data_frozen&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 0<br>                }<br>              }<br>            ]<br>          }<br>        },<br>        {<br>          &quot;key&quot;: &quot;OtrL3B00RsuFpcOVCAQ08Q&quot;,<br>          &quot;role&quot;: {<br>            &quot;buckets&quot;: [<br>              {<br>                &quot;key&quot;: &quot;data_hot&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 15728640000<br>                }<br>              },<br>              {<br>                &quot;key&quot;: &quot;data_cold&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 0<br>                }<br>              },<br>              {<br>                &quot;key&quot;: &quot;data_frozen&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 0<br>                }<br>              }<br>            ]<br>          }<br>        },<br>        {<br>          &quot;key&quot;: &quot;erIavvd3TG6naKWwGag2TQ&quot;,<br>          &quot;role&quot;: {<br>            &quot;buckets&quot;: [<br>              {<br>                &quot;key&quot;: &quot;data_hot&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 884998144<br>                }<br>              },<br>              {<br>                &quot;key&quot;: &quot;data_cold&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 884998144<br>                }<br>              },<br>              {<br>                &quot;key&quot;: &quot;data_frozen&quot;,<br>                &quot;sum_jvm&quot;: {<br>                  &quot;value&quot;: 0<br>                }<br>              }<br>            ]<br>          }<br>        }<br>      ]<br>    }<br>  }<br>}</pre><p>We realize that it is not easy to exploit the resulting JSON to create a visualization.</p><h4>Integrating the main query into a Vega datasource</h4><p>Let’s now integrate our Query DSL into a Vega datasource. 
We will therefore use the data section, respecting the Vega syntax:</p><ul><li>give a <strong>name</strong> to the datasource in order to be able to use it later when creating our visualization, or to debug it easily</li><li>in the <strong>url</strong> part, specify the query and the body with the associated parts of our Query DSL</li><li>set the <strong>format</strong> to make it easier to access fields in the resulting JSON</li><li>prepare the <strong>transform</strong> attribute for future transformations</li></ul><p>This gives:</p><pre>{<br>      &quot;name&quot;: &quot;jvm&quot;,<br>      &quot;url&quot;: {<br>        &quot;%context%&quot;: true,<br>        &quot;%timefield%&quot;: &quot;@timestamp&quot;,<br>        &quot;index&quot;: &quot;.monitoring-es-*&quot;,<br>        &quot;query&quot;: {<br>          &quot;bool&quot;: {<br>            // ...<br>          }<br>        }, <br>        &quot;body&quot;: {<br>          &quot;aggs&quot;: {<br>            // ...<br>          },<br>          &quot;size&quot;: 0<br>        }<br>      },<br>      &quot;format&quot;: {&quot;property&quot;: &quot;aggregations.cluster.buckets&quot;},<br>      &quot;transform&quot;: [<br><br>      ]<br>    }</pre><h4>Debugging the data source in Vega</h4><p>Kibana provides very handy <strong>inspection and debugging tools</strong> for Vega. Everything is done through the <strong>Inspect</strong> button in the toolbar.</p><p>The inspection pane provides 2 distinct views: <br>- <strong>View: Requests</strong> to visualize requests, responses, and statistics on DSL requests<br>- <strong>View: Vega debug</strong> to inspect datasources as they are implemented</p><p>It is the latter that interests us:</p><figure><img alt="Debugging Vega" src="https://cdn-images-1.medium.com/max/957/1*kQAJ6owgIT_Ci3yJdbzTDg.png" /></figure><p>As we have set up a format, we see that it has been taken into account, since we now see the data from the sub-attribute “<strong>aggregations</strong> / <strong>cluster</strong> / <strong>buckets</strong>”. The first column displayed (“<strong>key</strong>”) is none other than the key of our highest-level aggregation, “<strong>cluster</strong>”, i.e., the cluster_uuid. The second column indicates the number of documents used to calculate this aggregation, and the last one is a JSON representing the value of our aggregation, i.e. 
the buckets from the sub-aggregations.</p><h3>Transforming the main data source to make it usable</h3><h4>Contextualization of cluster information</h4><p>The <strong>cluster UUID</strong> is a good starting point to know which cluster it is, but for display purposes, we would rather have a <strong>cluster name</strong> providing the associated <strong>environment</strong>, and perhaps also an identifier allowing us to <strong>order our clusters</strong> by importance.</p><p>To do this, we will add a <strong>new static data source</strong> with the information allowing us to complete the missing information and make the link with our UUID:</p><pre>{<br>      &quot;name&quot;: &quot;clusters&quot;,<br>      &quot;values&quot;: [<br>        {&quot;uuid&quot;: &quot;erIavvd3TG6naKWwGag2TQ&quot;, &quot;id&quot;: 1, &quot;name&quot;: &quot;Monitoring&quot;}, <br>        {&quot;uuid&quot;: &quot;OtrL3B00RsuFpcOVCAQ08Q&quot;, &quot;id&quot;: 2, &quot;name&quot;: &quot;Recette&quot;}, <br>        {&quot;uuid&quot;: &quot;fN5Y2U2HRsK1HKw1X_H77A&quot;, &quot;id&quot;: 3, &quot;name&quot;: &quot;Intégration&quot;}, <br>        {&quot;uuid&quot;: &quot;ARezM52EQhGoxHoFJrm1oA&quot;, &quot;id&quot;: 4, &quot;name&quot;: &quot;PréProduction&quot;}, <br>        {&quot;uuid&quot;: &quot;PDIJyZMaSQOtFB2LEz9kwA&quot;, &quot;id&quot;: 5, &quot;name&quot;: &quot;Production&quot;}<br>      ]<br>}</pre><p>We will now integrate the fields that interest us directly into our initial data source (jvm), using a <strong>transformation</strong>. To do this, we will use the <strong>lookup</strong> transformation:</p><pre>{<br>    &quot;type&quot;: &quot;lookup&quot;,<br>    &quot;from&quot;: &quot;clusters&quot;,<br>    &quot;key&quot;: &quot;uuid&quot;,<br>    &quot;fields&quot;: [&quot;key&quot;],<br>    &quot;values&quot;: [&quot;id&quot;, &quot;name&quot;],<br>    &quot;as&quot;: [&quot;cluster_id&quot;, &quot;cluster_name&quot;]<br>}</pre><p>This transformation will use the “<strong>clusters</strong>” datasource, whose key is the <strong>“uuid”</strong> field, match it against the key of our current data source (“jvm”), which is the <strong>“key”</strong> field, and use this new data source to add the 2 fields named <strong>“id”</strong> and <strong>“name”</strong> to our current datasource, under the specified names <strong>“cluster_id”</strong> and <strong>“cluster_name”</strong>.</p><p>At this step, the jvm dataset becomes the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2dRzV6tg1C2CT-ccZnbubw.png" /></figure><h4>Flattening tier data (roles)</h4><p>We will now <strong>simplify the representation</strong> of the roles associated with each environment (cluster) by flattening the data, i.e., creating one line per output role (and this for each cluster). 
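<p>Schematically, a <strong>flatten</strong> transformation turns one row containing an array into one row per array element, copying the other fields along (a generic illustration with made-up values, not our exact data):</p><pre>// input row<br>{&quot;key&quot;: &quot;cluster-A&quot;, &quot;role&quot;: {&quot;buckets&quot;: [bucket1, bucket2]}}<br><br>// after {&quot;type&quot;: &quot;flatten&quot;, &quot;fields&quot;: [&quot;role.buckets&quot;], &quot;as&quot;: [&quot;role&quot;]}<br>{&quot;key&quot;: &quot;cluster-A&quot;, &quot;role&quot;: bucket1}<br>{&quot;key&quot;: &quot;cluster-A&quot;, &quot;role&quot;: bucket2}</pre>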
In our case, this can be done very easily using the following <strong>flatten</strong> transformation:</p><pre>{<br>  &quot;type&quot;: &quot;flatten&quot;, <br>  &quot;fields&quot;: [&quot;role.buckets&quot;],<br>  &quot;as&quot; : [&quot;role&quot;]<br>}</pre><p>We use the buckets field (resulting from the aggregation) of the role column to flatten the data, and we overwrite the previous value of the column since we keep the same name in the “<strong>as</strong>” parameter:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LOQzvEcpWo27XVghMEAo5Q.png" /></figure><p>The role column now has this kind of value:</p><pre>{<br>  &quot;key&quot;: &quot;data_hot&quot;,<br>  &quot;doc_count&quot;: 25933,<br>  &quot;node&quot;: {<br>    &quot;doc_count_error_upper_bound&quot;: 0,<br>    &quot;sum_other_doc_count&quot;: 0,<br>    &quot;buckets&quot;: [<br>      {<br>        &quot;key&quot;: &quot;instance-0000000031&quot;,<br>        &quot;doc_count&quot;: 8649,<br>        &quot;max_jvm&quot;: {<br>          &quot;value&quot;: 15921577984<br>        }<br>      },<br>      {<br>        &quot;key&quot;: &quot;instance-0000000029&quot;,<br>        &quot;doc_count&quot;: 8646,<br>        &quot;max_jvm&quot;: {<br>          &quot;value&quot;: 15921577984<br>        }<br>      },<br>      {<br>        &quot;key&quot;: &quot;instance-0000000030&quot;,<br>        &quot;doc_count&quot;: 8638,<br>        &quot;max_jvm&quot;: {<br>          &quot;value&quot;: 15921577984<br>        }<br>      }<br>    ]<br>  },<br>  &quot;sum_jvm&quot;: {<br>    &quot;value&quot;: 47764733952<br>  }<br>}</pre><h4>Retrieving the JVM metric for each cluster / role</h4><p>Now that we have one line per cluster and role in the output, we can easily access the metric field that interests us, “<strong>sum_jvm</strong>” (at the end of each <strong>role</strong> value), and rework it to <strong>convert it from bytes to GB</strong>.</p><p>To do such a conversion, we will use the <strong>formula</strong> transformation:</p><pre>{<br>  &quot;type&quot;: &quot;formula&quot;, <br>  &quot;as&quot;: &quot;total_jvm&quot;, <br>  &quot;expr&quot;: &quot;ceil(datum.role.sum_jvm.value / 1024 / 1024 / 1024)&quot;<br>}</pre><p>We create a new field named “total_jvm” using the expression “expr”: for each row, we take the current (“datum”) value of the “role.sum_jvm.value” field, convert it to GB, and round it up. For example, 47764733952 / 1024 / 1024 / 1024 ≈ 44.5, which ceil rounds up to 45. This now gives:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*oNp8V8PVoBVBg4X1qfpCaQ.png" /></figure><h4>Keeping only the role name in the role column</h4><p>Since the remaining per-node buckets in the role column were only needed to calculate the sum of JVM memory by role, we won’t use them further, so we can replace the column content with just the name of the role, again using a <strong>formula</strong> transformation:</p><pre>{<br>  &quot;type&quot;: &quot;formula&quot;, <br>  &quot;as&quot;: &quot;role&quot;, <br>  &quot;expr&quot;: &quot;datum.role.key&quot;<br>}</pre><p>This formula takes each line of results and applies the expression, setting the role value to the current role.key field, i.e., the <strong>role name</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*A5-gdjA9YAWJOrPD2wtGfw.png" /></figure><p>We now have a clean, usable output that can be processed into a nice and easy Vega visualization.</p><h4>Last improvement: ordering the roles</h4><p>In the same way that we added a data source for clusters to easily order them in our future visualization, we will do the same for roles, 
because we want to be able to order them in a <strong>logical order</strong>.</p><p>Let’s add a new data source for roles:</p><pre>{<br>  &quot;name&quot;: &quot;roles&quot;,<br>  &quot;values&quot;: [<br>    {&quot;id&quot;: 1, &quot;name&quot;: &quot;data_hot&quot;}, <br>    {&quot;id&quot;: 2, &quot;name&quot;: &quot;data_warm&quot;}, <br>    {&quot;id&quot;: 3, &quot;name&quot;: &quot;data_cold&quot;}, <br>    {&quot;id&quot;: 4, &quot;name&quot;: &quot;data_frozen&quot;}<br>  ]<br>}</pre><p>We then use the same kind of <strong>lookup</strong> transformation to add a new role_id field with an ordering value:</p><pre>{<br>  &quot;type&quot;: &quot;lookup&quot;,<br>  &quot;from&quot;: &quot;roles&quot;,<br>  &quot;key&quot;: &quot;name&quot;,<br>  &quot;fields&quot;: [&quot;role&quot;],<br>  &quot;values&quot;: [&quot;id&quot;],<br>  &quot;as&quot;: [&quot;role_id&quot;]<br>}</pre><p>The final result is the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NMybMQ1H75plKaC8Eb97fA.png" /></figure><p>A little simpler to exploit than our initial JSON, right?</p><h3>Conclusion about transformations in Kibana Vega</h3><p>When using Vega in a Kibana context, i.e., using Query DSL to retrieve Elasticsearch data, the resulting output is not always easy to manipulate. Moreover, we may need to query multiple indices or merge in information from other static data sources to gather all the information we need.</p><p><strong><em>In these cases, transformations are a good way to adapt the output to better exploit the data in visualizations.</em></strong></p><p>Keep in mind that transformations are applied by Kibana, so on the client side!</p><p>In a further article, we will create a Vega visualization based on the previously transformed data. Stay tuned!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5118615f3415" width="1" height="1" alt=""><hr><p><a href="https://medium.zenika.com/using-transformations-in-kibana-vega-to-adapt-data-from-query-dsl-5118615f3415">Using transformations in Kibana Vega to adapt data from query DSL</a> was originally published in <a href="https://medium.zenika.com">Zenika</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to test a Ruby filter in Logstash]]></title>
            <link>https://medium.zenika.com/how-to-test-a-ruby-filter-in-logstash-804551b4b5e9?source=rss-5fe6d48f202a------2</link>
            <guid isPermaLink="false">https://medium.com/p/804551b4b5e9</guid>
            <category><![CDATA[ruby]]></category>
            <category><![CDATA[elastic-stack]]></category>
            <category><![CDATA[testing]]></category>
            <category><![CDATA[elk]]></category>
            <category><![CDATA[logstash]]></category>
            <dc:creator><![CDATA[Ingrid Jardillier]]></dc:creator>
            <pubDate>Wed, 22 May 2024 07:51:44 GMT</pubDate>
            <atom:updated>2024-05-22T07:51:44.201Z</atom:updated>
            <content:encoded><![CDATA[<p>In a previous article, we saw how to share code in Logstash and create a module in a Ruby filter. In this one, we’ll show how to test our filter in order to verify that the resulting events are the expected ones.</p><h3>About our previous code</h3><p>As a reminder, the code was the following:</p><pre>require &#39;./script/denormalized_by_prizes_utils.rb&#39;<br><br># The value of `params` is the value of the hash passed to `script_params` in the logstash configuration.<br>def register(params)<br>    @keep_original_event = params[&quot;keep_original_event&quot;]<br>end<br><br># The filter method receives an event and must return a list of events.<br># Dropping an event means not including it in the return array.<br># Creating new ones only requires you to add a new instance of LogStash::Event to the returned array.<br>def filter(event)<br><br>    items = Array.new<br><br>    # Keep original event if asked<br>    originalEvent = LogStash::Util::DenormalizationByPrizesHelper::getOriginalEvent(event, @keep_original_event);<br>    if not originalEvent.nil?<br>        items.push originalEvent<br>    end<br><br>    # Get prizes items (to denormalize)<br>    prizes = LogStash::Util::DenormalizationByPrizesHelper::getPrizes(event);<br>    if prizes.nil?<br>        return items<br>    end<br>   <br>    # Create a clone base event<br>    eventBase = LogStash::Util::DenormalizationByPrizesHelper::getEventBase(event);<br><br>    # Create one event by prize item with needed modification<br>    prizes.each { |prize| <br>        items.push LogStash::Util::DenormalizationByPrizesHelper::createEventForPrize(eventBase, prize);<br>    }<br><br>    return items;<br>end</pre><p>And the whole code of denormalized_by_prizes_utils.rb:</p><pre>module LogStash::Util::DenormalizationByPrizesHelper<br>    include LogStash::Util::Loggable<br><br>    # Keep original event if asked<br>    def self.getOriginalEvent(event, keepOriginalEvent)<br>        logger.debug(&#39;keepOriginalEvent is :&#39; + keepOriginalEvent.to_s)<br>        if keepOriginalEvent.to_s == &#39;true&#39;<br>            event.set(&#39;[@metadata][_index]&#39;, &#39;prizes-original&#39;);<br>            return event;<br>        end<br>        return nil;<br>    end<br><br>    # Get prizes items (to denormalize)<br>    def self.getPrizes(event)<br>        prizes = event.get(&quot;prize&quot;);<br>        if prizes.nil?<br>            logger.warn(&quot;No prizes for event &quot; + event.to_s)<br>        end<br>        return prizes;<br>    end<br><br>    # Create a clone base event<br>    def self.getEventBase(event)<br>        eventBase = event.clone();<br>        eventBase.set(&#39;[@metadata][_index]&#39;, &#39;prizes-denormalized&#39;);<br>        eventBase.remove(&quot;prize&quot;);<br>        return eventBase;<br>    end<br><br>    # Create a clone event for current prize item with needed modification<br>    def self.createEventForPrize(eventBase, prize)<br>        eventPrize = eventBase.clone();<br>        # Copy each prize item value to prize object<br>        prize.each { |key,value|<br>            eventPrize.set(&quot;[prize][&quot; + key + &quot;]&quot;, value)<br>        }<br>        return eventPrize;<br>    end<br><br>end</pre><h3>Common syntax</h3><p>In this section, we will show how to write functional tests checking that the resulting events are the expected ones.</p><p>We can write one or more test cases and, for each test case, as many tests as needed. These tests should be written at the end of the Ruby filter file, i.e., our main file, containing the filter with the register / filter functions. 
These tests should be written at the end of the ruby filter file, ie, our main file, containing the filter with the register / filter functions.</p><p>A filter test should follow a specific syntax:</p><pre>test &quot;Test case name&quot; do<br><br>    parameters do<br>    { <br>        # The parameters to pass to the filter<br>    }<br>    end<br>    <br>    in_event { <br>        # The event arriving in the filter process<br>    }<br><br>    # The tests with expect methods<br><br>end</pre><h3>Implement tests on our Ruby filter</h3><p>In our example, we implemented denormalization, so in our tests, we will verify that we have well denormalized our original event, in different cases (keeping original event or not, one prize or two prizes in the prize list for example).</p><h4>Test cases</h4><p>So, we need the four test cases as presented below:</p><pre>test &quot;Case 1: one prize in event / don&#39;t keep original event&quot; do<br><br>    parameters do<br>    { <br>        &quot;keep_original_event&quot; =&gt; false<br>    }<br>    end<br><br>    in_event { <br>        { <br>            &quot;id&quot;        =&gt; 1, <br>            &quot;firstname&quot; =&gt; &quot;Pierre&quot;, <br>            &quot;surname&quot;   =&gt; &quot;Curie&quot;,<br>            &quot;gender&quot;    =&gt; &quot;male&quot;,<br>            &quot;prize&quot;     =&gt; [<br>                {<br>                    &quot;year&quot; =&gt; 1903,<br>                    &quot;category&quot; =&gt; &quot;physics&quot;<br>                }<br>            ]<br>        } <br>    }<br><br>    # The tests with expect methods<br><br>end<br><br>test &quot;Case 2: one prize in event / keep original event&quot; do<br><br>    parameters do<br>    { <br>        &quot;keep_original_event&quot; =&gt; true<br>    }<br>    end<br><br>    in_event { <br>        { <br>            &quot;id&quot;        =&gt; 1, <br>            &quot;firstname&quot; =&gt; &quot;Pierre&quot;, <br>            &quot;surname&quot;   =&gt; &quot;Curie&quot;,<br>            &quot;gender&quot;    =&gt; &quot;male&quot;,<br>            &quot;prize&quot;     =&gt; [<br>                {<br>                    &quot;year&quot; =&gt; 1903,<br>                    &quot;category&quot; =&gt; &quot;physics&quot;<br>                }<br>            ]<br>        } <br>    }<br><br>    # The tests with expect methods<br><br>end<br><br>test &quot;Case 3: two prizes in event / don&#39;t keep original event&quot; do<br><br>    parameters do<br>    { <br>        &quot;keep_original_event&quot; =&gt; false<br>    }<br>    end<br><br>    in_event { <br>        { <br>            &quot;id&quot;        =&gt; 2, <br>            &quot;firstname&quot; =&gt; &quot;Marie&quot;, <br>            &quot;surname&quot;   =&gt; &quot;Curie&quot;,<br>            &quot;gender&quot;    =&gt; &quot;female&quot;,<br>            &quot;prize&quot;     =&gt; [<br>                {<br>                    &quot;year&quot; =&gt; 1903,<br>                    &quot;category&quot; =&gt; &quot;physics&quot;<br>                },<br>                {<br>                    &quot;year&quot; =&gt; 1911,<br>                    &quot;category&quot; =&gt; &quot;chemistry&quot;<br>                }<br>            ]<br>        } <br>    }<br><br>    # The tests with expect methods<br><br>end<br><br>test &quot;Case 4: two prizes in event / keep original event&quot; do<br><br>    parameters do<br>    { <br>        &quot;keep_original_event&quot; =&gt; true<br>    }<br>    end<br><br>    in_event { <br>        { <br>    
        &quot;id&quot;        =&gt; 2, <br>            &quot;firstname&quot; =&gt; &quot;Marie&quot;, <br>            &quot;surname&quot;   =&gt; &quot;Curie&quot;,<br>            &quot;gender&quot;    =&gt; &quot;female&quot;,<br>            &quot;prize&quot;     =&gt; [<br>                {<br>                    &quot;year&quot; =&gt; 1903,<br>                    &quot;category&quot; =&gt; &quot;physics&quot;<br>                },<br>                {<br>                    &quot;year&quot; =&gt; 1911,<br>                    &quot;category&quot; =&gt; &quot;chemistry&quot;<br>                }<br>            ]<br>        } <br>    }<br><br>    # The tests with expect methods<br><br>end</pre><h4>Functional test implementation</h4><p>In this article, we will only implement the most complex test case (the last one). For the other ones, the principle is the same, but since the test cases differ, the expected results won’t be the same.</p>
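<p>For instance, the expect blocks for the first case (one prize, original event dropped), to be placed inside the corresponding test block, could look like this (a sketch; these exact assertions are not from the original code):</p><pre>expect(&quot;Count of events&quot;) do |events|<br>    events.length == 1<br>end<br><br>expect(&quot;Event is denormalized with its prize fields&quot;) do |events|<br>    result = true<br>    result &amp;= events[0].get(&quot;[@metadata][_index]&quot;) == &quot;prizes-denormalized&quot;<br>    result &amp;= events[0].get(&quot;[prize][year]&quot;) == 1903<br>    result &amp;= events[0].get(&quot;[prize][category]&quot;) == &quot;physics&quot;<br>    result<br>end</pre><p>So, for the last test case, we will check that:</p><ul><li>The original event is indeed in the output, without any changes</li><li>Each item in the “prize” array generates one document, so two items must generate two documents</li><li>Each generated item contains the expected common fields and the expected prize fields</li><li>So we will have 3 events in the output, each one in a dedicated index</li></ul><p>Our test can thus be written:</p><pre>test &quot;Case 4: two prizes in event / keep original event&quot; do<br><br>    parameters do<br>    { <br>        &quot;keep_original_event&quot; =&gt; true<br>    }<br>    end<br><br>    in_event { <br>        { <br>            &quot;id&quot;        =&gt; 2, <br>            &quot;firstname&quot; =&gt; &quot;Marie&quot;, <br>            &quot;surname&quot;   =&gt; &quot;Curie&quot;,<br>            &quot;gender&quot;    =&gt; &quot;female&quot;,<br>            &quot;prize&quot;     =&gt; [<br>                {<br>                    &quot;year&quot; =&gt; 1903,<br>                    &quot;category&quot; =&gt; &quot;physics&quot;<br>                },<br>                {<br>                    &quot;year&quot; =&gt; 1911,<br>                    &quot;category&quot; =&gt; &quot;chemistry&quot;<br>                }<br>            ]<br>        } <br>    }<br><br>    expect(&quot;Count of events&quot;) do |events|<br>        events.length == 3<br>    end<br><br>    expect(&quot;Each event has same shared fields&quot;) do |events|<br>        result = true<br>        events.each { |event|<br>            result &amp;= event.get(&quot;[id]&quot;) == 2<br>            result &amp;= event.get(&quot;[firstname]&quot;) == &quot;Marie&quot;<br>            result &amp;= event.get(&quot;[surname]&quot;) == &quot;Curie&quot;<br>            result &amp;= event.get(&quot;[gender]&quot;) == &quot;female&quot;<br>        }<br>        result<br>    end<br><br>    expect(&quot;Each event has good _index&quot;) do |events|  <br>        result = true<br>        result &amp;= events[0].get(&quot;[@metadata][_index]&quot;) == &quot;prizes-original&quot;<br>        result &amp;= events[1].get(&quot;[@metadata][_index]&quot;) == &quot;prizes-denormalized&quot;<br>        result &amp;= events[2].get(&quot;[@metadata][_index]&quot;) == &quot;prizes-denormalized&quot;<br>        result<br>    end<br><br>    expect(&quot;Each event has good prize fields&quot;) do |events| <br>        result = true <br>        result &amp;= events[0].get(&quot;[prize][0][year]&quot;) == 1903<br>        result &amp;= events[0].get(&quot;[prize][0][category]&quot;) == 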
&quot;physics&quot;<br>        result &amp;= events[0].get(&quot;[prize][1][year]&quot;) == 1911<br>        result &amp;= events[0].get(&quot;[prize][1][category]&quot;) == &quot;chemistry&quot;<br>        result &amp;= events[1].get(&quot;[prize][year]&quot;) == 1903<br>        result &amp;= events[1].get(&quot;[prize][category]&quot;) == &quot;physics&quot;<br>        result &amp;= events[2].get(&quot;[prize][year]&quot;) == 1911<br>        result &amp;= events[2].get(&quot;[prize][category]&quot;) == &quot;chemistry&quot;<br>        result<br>    end<br><br>end</pre><p><strong>Take care with the syntax when an expect block contains multiple assertions: you should use the &amp;&amp; or &amp;= operator to combine the assertion results.</strong></p><p>Our test case implementation is ready. Note that all test cases are run on Logstash startup, when the corresponding pipeline is created. Indeed, Logstash is able to discover all tests written in Ruby filters, and you will see all the test results in the Logstash logs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lyls1iqvfPQNNLRUMgOjYg.png" /><figcaption>All tests passed</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_DfdZ1h4FnJOJIESRyGqDA.png" /><figcaption>One test failed</figcaption></figure><p>If a test fails, you will see it clearly, with the test case name and all the information needed (parameters, in_event, results). If at least one test fails, the associated pipeline won’t start.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=804551b4b5e9" width="1" height="1" alt=""><hr><p><a href="https://medium.zenika.com/how-to-test-a-ruby-filter-in-logstash-804551b4b5e9">How to test a Ruby filter in Logstash</a> was originally published in <a href="https://medium.zenika.com">Zenika</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to share Ruby code in Logstash]]></title>
            <link>https://medium.zenika.com/how-to-share-ruby-code-in-logstash-8d772ee42569?source=rss-5fe6d48f202a------2</link>
            <guid isPermaLink="false">https://medium.com/p/8d772ee42569</guid>
            <category><![CDATA[logstash]]></category>
            <category><![CDATA[ruby]]></category>
            <category><![CDATA[elastic-stack]]></category>
            <category><![CDATA[elk]]></category>
            <category><![CDATA[elasticsearch]]></category>
            <dc:creator><![CDATA[Ingrid Jardillier]]></dc:creator>
            <pubDate>Wed, 22 May 2024 07:27:28 GMT</pubDate>
            <atom:updated>2024-05-22T07:27:28.896Z</atom:updated>
            <content:encoded><![CDATA[<p>In the <a href="https://medium.com/@ingrid.jardillier/logstash-denormalize-documents-part-3-4de6b99ca278">previous article</a>, we saw how to denormalize documents by writing a Ruby filter. In this one, we’ll show how to improve our code and potentially share it between filters.</p><h3>About our previous code</h3><p>As a reminder, the code was the following:</p><pre># The value of `params` is the value of the hash passed to `script_params` <br># in the logstash configuration.<br>def register(params)<br>    @keep_original_event = params[&quot;keep_original_event&quot;]<br>end<br><br># The filter method receives an event and must return a list of events.<br># Dropping an event means not including it in the return array.<br># Creating new ones only requires you to add a new instance of LogStash::Event to the returned array.<br>def filter(event)<br><br>    items = Array.new<br><br>    # Keep original event if asked<br>    logger.debug(&#39;keep_original_event is :&#39; + @keep_original_event.to_s)<br><br>    if @keep_original_event.to_s == &#39;true&#39;<br>        event.set(&#39;[@metadata][_index]&#39;, &#39;prizes-original&#39;);<br>        items.push event<br>    end<br><br>    # Get prizes items (to denormalize)<br>    prizes = event.get(&quot;prize&quot;);<br>    if prizes.nil?<br>        logger.warn(&quot;No prizes for event &quot; + event.to_s)<br>        return items<br>    end<br>   <br>    # Create a clone base event<br>    eventBase = event.clone();<br>    eventBase.set(&#39;[@metadata][_index]&#39;, &#39;prizes-denormalized&#39;);<br>    eventBase.remove(&quot;prize&quot;);<br><br>    # Create one event by prize item with needed modification<br>    prizes.each { |prize| <br>        eventPrize = eventBase.clone();<br><br>        # Copy each prize item value to prize object<br>        prize.each { |key,value|<br>            eventPrize.set(&quot;[prize][&quot; + key + &quot;]&quot;, value)<br>        }<br><br>        items.push eventPrize<br>    }<br><br>    return items<br>end</pre><p>As we can see, we only have 2 functions: the <strong>register</strong> one to read the parameters and the other to implement the <strong>filter</strong>’s behavior.</p>
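<p>For context, such a script is typically wired into a pipeline through the ruby filter’s <strong>path</strong> and <strong>script_params</strong> options. A minimal sketch (the file path here is just an assumption for this example):</p><pre>filter {<br>  ruby {<br>    path =&gt; &quot;/etc/logstash/script/denormalize_by_prizes.rb&quot;<br>    script_params =&gt; { &quot;keep_original_event&quot; =&gt; &quot;true&quot; }<br>  }<br>}</pre>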
<em>But implementing the whole feature in only one method is not the best choice, for many reasons: readability, maintainability, testability, …</em></p><h3>Sharing Ruby code</h3><p>A first way to share code is to externalize some functions in another Ruby file and call these functions in our Ruby filter.</p><p>For example, we can externalize some pieces of code into simple functions:</p><ul><li>One to get the original event (the current event, if we want to keep it)</li><li>One to get the prizes array from the event</li><li>One to construct the base event (which will be cloned for each prize)</li><li>One to create an event for each prize</li></ul><pre># Keep original event if asked<br>def getOriginalEvent(event)<br>    logger.debug(&#39;keep_original_event is :&#39; + @keep_original_event.to_s)<br>    if @keep_original_event.to_s == &#39;true&#39;<br>        event.set(&#39;[@metadata][_index]&#39;, &#39;prizes-original&#39;);<br>        return event;<br>    end<br>    return nil;<br>end<br><br># Get prizes items (to denormalize)<br>def getPrizes(event)<br>    prizes = event.get(&quot;prize&quot;);<br>    if prizes.nil?<br>        logger.warn(&quot;No prizes for event &quot; + event.to_s)<br>    end<br>    return prizes;<br>end<br><br># Create a clone base event<br>def getEventBase(event)<br>    eventBase = event.clone();<br>    eventBase.set(&#39;[@metadata][_index]&#39;, &#39;prizes-denormalized&#39;);<br>    eventBase.remove(&quot;prize&quot;);<br>    return eventBase;<br>end<br><br># Create a clone event for current prize item with needed modification<br>def createEventForPrize(eventBase, prize)<br>    eventPrize = eventBase.clone();<br>    # Copy each prize item value to prize object<br>    prize.each { |key,value|<br>        eventPrize.set(&quot;[prize][&quot; + key + &quot;]&quot;, value)<br>    }<br>    return eventPrize;<br>end</pre><p>The previous code is written in a file named<em> denormalized_by_prizes_utils.rb.</em></p><p>The main code of the filter will then be the following:</p><pre>require &#39;./script/denormalized_by_prizes_utils.rb&#39;<br><br># The value of `params` is the value of the hash passed to `script_params` in the logstash configuration.<br>def register(params)<br>    @keep_original_event = params[&quot;keep_original_event&quot;]<br>end<br><br># The filter method receives an event and must return a list of events.<br># Dropping an event means not including it in the return array.<br># Creating new ones only requires you to add a new instance of LogStash::Event to the returned array.<br>def filter(event)<br><br>    items = Array.new<br><br>    # Keep original event if asked<br>    originalEvent = getOriginalEvent(event);<br>    if not originalEvent.nil?<br>        items.push originalEvent<br>    end<br><br>    # Get prizes items (to denormalize)<br>    prizes = getPrizes(event);<br>    if prizes.nil?<br>        return items<br>    end<br>   <br>    # Create a clone base event<br>    eventBase = getEventBase(event);<br><br>    # Create one event per prize item with the needed modifications<br>    prizes.each { |prize| <br>        items.push createEventForPrize(eventBase, prize);<br>    }<br><br>    return items;<br>end</pre><p>The code is much easier to read than the previous version, and we directly see the different steps of the filter’s feature. Maintainability is also improved, with small, well-scoped, easy-to-understand functions.</p><p>But if you have multiple files sharing code and a filter requiring multiple files, you can get name collisions, which partially defeats this maintainability gain.</p>
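<p><em>To illustrate the collision risk, here is a minimal sketch (file and function names are ours, not from the real project): top-level functions defined in required Ruby files all land in the same namespace, so the last definition loaded silently wins.</em></p><pre># a_utils.rb (hypothetical)<br>def getEventBase(event)<br>    return event.clone();<br>end<br><br># b_utils.rb (hypothetical)<br>def getEventBase(event)<br>    # silently replaces the definition from a_utils.rb<br>    return nil;<br>end<br><br># in the filter script<br>require &#39;./script/a_utils.rb&#39;<br>require &#39;./script/b_utils.rb&#39;<br># from here, getEventBase refers to the b_utils.rb version</pre>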
<h3>Creating a module</h3><p>Another way to share code is to create a module. A module groups pieces of code belonging to the same functional scope. No collision is possible, since the module name must be given before each use of a shared function.</p><p>The previous shared functions become:</p><pre>module LogStash::Util::DenormalizationByPrizesHelper<br>    include LogStash::Util::Loggable<br><br>    # Keep original event if asked<br>    def self.getOriginalEvent(event, keepOriginalEvent)<br>        logger.debug(&#39;keepOriginalEvent is :&#39; + keepOriginalEvent.to_s)<br>        if keepOriginalEvent.to_s == &#39;true&#39;<br>            event.set(&#39;[@metadata][_index]&#39;, &#39;prizes-original&#39;);<br>            return event;<br>        end<br>        return nil;<br>    end<br><br>    # Get prizes items (to denormalize)<br>    def self.getPrizes(event)<br>        prizes = event.get(&quot;prize&quot;);<br>        if prizes.nil?<br>            logger.warn(&quot;No prizes for event &quot; + event.to_s)<br>        end<br>        return prizes;<br>    end<br><br>    # Create a clone base event<br>    def self.getEventBase(event)<br>        eventBase = event.clone();<br>        eventBase.set(&#39;[@metadata][_index]&#39;, &#39;prizes-denormalized&#39;);<br>        eventBase.remove(&quot;prize&quot;);<br>        return eventBase;<br>    end<br><br>    # Create a clone event for current prize item with needed modification<br>    def self.createEventForPrize(eventBase, prize)<br>        eventPrize = eventBase.clone();<br>        # Copy each prize item value to prize object<br>        prize.each { |key,value|<br>            eventPrize.set(&quot;[prize][&quot; + key + &quot;]&quot;, value)<br>        }<br>        return eventPrize;<br>    end<br><br>end</pre><p>We need to include the <strong>Loggable</strong> Util module to be able to use the <em>logger</em> instance.</p><p>The main code will then be:</p><pre>require &#39;./script/denormalized_by_prizes_utils.rb&#39;<br><br># The value of `params` is the value of the hash passed to `script_params` in the logstash configuration.<br>def register(params)<br>    @keep_original_event = params[&quot;keep_original_event&quot;]<br>end<br><br># The filter method receives an event and must return a list of events.<br># Dropping an event means not including it in the return array.<br># Creating new ones only requires you to add a new instance of LogStash::Event to the returned array.<br>def filter(event)<br><br>    items = Array.new<br><br>    # Keep original event if asked<br>    originalEvent = LogStash::Util::DenormalizationByPrizesHelper::getOriginalEvent(event, @keep_original_event);<br>    if not originalEvent.nil?<br>        items.push originalEvent<br>    end<br><br>    # Get prizes items (to denormalize)<br>    prizes = LogStash::Util::DenormalizationByPrizesHelper::getPrizes(event);<br>    if prizes.nil?<br>        return items<br>    end<br>   <br>    # Create a clone base event<br>    eventBase = LogStash::Util::DenormalizationByPrizesHelper::getEventBase(event);<br><br>    # Create one event per prize item with the needed modifications<br>    prizes.each { |prize| <br>        items.push LogStash::Util::DenormalizationByPrizesHelper::createEventForPrize(eventBase, prize);<br>    }<br><br>    return items;<br>end</pre><p>The main code does not need many modifications: we only prefix the function calls with the module name. So even if several modules integrated in your filter’s feature define a function named getEventBase (or any other name), there will be no collision, and readability is improved because you explicitly set the module to be used in each case.</p>
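<p><em>As a side note (our suggestion, not part of the original filter), if the fully qualified module name feels verbose, you can alias it to a shorter constant at the top of the script while keeping the calls explicit:</em></p><pre>require &#39;./script/denormalized_by_prizes_utils.rb&#39;<br><br># Hypothetical shorthand for the helper module<br>Helper = LogStash::Util::DenormalizationByPrizesHelper<br><br># Calls then become, for example:<br># prizes = Helper::getPrizes(event)</pre>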
<p>In a future article, we will talk about testing our filter’s code…</p><hr><p><a href="https://medium.zenika.com/how-to-share-ruby-code-in-logstash-8d772ee42569">How to share Ruby code in Logstash</a> was originally published in <a href="https://medium.zenika.com">Zenika</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Logstash — Denormalize documents (Part 3)]]></title>
            <link>https://medium.zenika.com/logstash-denormalize-documents-part-3-4de6b99ca278?source=rss-5fe6d48f202a------2</link>
            <guid isPermaLink="false">https://medium.com/p/4de6b99ca278</guid>
            <category><![CDATA[elastic]]></category>
            <category><![CDATA[elastic-stack]]></category>
            <category><![CDATA[ruby]]></category>
            <category><![CDATA[denormalization]]></category>
            <category><![CDATA[logstash]]></category>
            <dc:creator><![CDATA[Ingrid Jardillier]]></dc:creator>
            <pubDate>Thu, 02 May 2024 07:00:16 GMT</pubDate>
            <atom:updated>2025-01-09T08:32:33.581Z</atom:updated>
            <content:encoded><![CDATA[<h3>Logstash — Denormalizing documents — Implementation</h3><p>As we saw in a <a href="https://medium.com/@ingrid.jardillier/logstash-denormalize-documents-part-2-ab1068eb8228">previous article</a>, one of the solutions to improve the exploitation of documents with arrays is to use Logstash to denormalize documents. In this article, we will implement this denormalization for a simple example.</p><h3>Principle</h3><p>We spoke a lot about <strong>denormalization</strong>, but what does it mean in our case?</p><p>As we saw in previous articles, the default ingestion of our JSON objects results in the prize’s fields being arrays.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RK5-EK0t_y6X0JAph4oG4g.png" /><figcaption>prizes-original index</figcaption></figure><p>The denormalization process will <strong>clone</strong> existing documents with multiple prizes and flatten the prize’s fields. So, for the document with id 2, it will create 2 documents: one for the 1903 physics prize and one for the 1911 chemistry prize.</p><p>The result will be the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qC9MNGNjNKuWyWZxqcLyRg.png" /><figcaption>prizes-denormalized index</figcaption></figure>
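<p><em>Concretely, here is a sketch of the two documents produced for the document with id 2 (values taken from the sample data of Part 1; the exact field set depends on your pipeline):</em></p><pre>{&quot;id&quot;: 2, &quot;firstname&quot;: &quot;Marie&quot;, &quot;surname&quot;: &quot;Curie&quot;, &quot;gender&quot;: &quot;female&quot;, &quot;prize&quot;: {&quot;year&quot;: 1903, &quot;category&quot;: &quot;physics&quot;}}<br>{&quot;id&quot;: 2, &quot;firstname&quot;: &quot;Marie&quot;, &quot;surname&quot;: &quot;Curie&quot;, &quot;gender&quot;: &quot;female&quot;, &quot;prize&quot;: {&quot;year&quot;: 1911, &quot;category&quot;: &quot;chemistry&quot;}}</pre>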
<h3><strong>Implementation</strong></h3><p>To implement our denormalization, we just have to change our logstash configuration to add a ruby filter, which will perform the denormalization.</p><p>As we want to keep the two types of documents (original and denormalized), we will set the index name in the @metadata object and use it in the elasticsearch output. And we’ll use the <em>keep_original_event</em> boolean parameter to indicate whether we want to keep the original document or not.</p><pre>input {<br>  file {<br>    id =&gt; &quot;prizes&quot;<br>    path =&gt; &quot;/usr/share/logstash/pipeline/file/prizes.json&quot;<br>    mode =&gt; &quot;read&quot;<br>    codec =&gt; &quot;json&quot;<br>    start_position =&gt; &quot;beginning&quot;<br>    sincedb_path =&gt; &quot;/dev/null&quot;<br>  }<br>}<br><br>filter {<br> json {<br>  source =&gt; message<br>  remove_field =&gt; message<br> }<br> mutate {<br>  remove_field =&gt; [&quot;@timestamp&quot;, &quot;@version&quot;, &quot;event&quot;, &quot;host&quot;, &quot;log&quot;]<br> }<br>}<br><br>filter {<br> ruby {<br>        id =&gt; &quot;denormalized-by-prizes&quot;<br>        path =&gt; &quot;/usr/share/logstash/pipeline/file/denormalized_by_prizes.rb&quot;<br>        script_params =&gt; {<br>          &quot;keep_original_event&quot; =&gt; true<br>        }<br> }<br> mutate {<br>  remove_field =&gt; [&quot;@timestamp&quot;, &quot;@version&quot;, &quot;event&quot;, &quot;host&quot;, &quot;log&quot;]<br> }<br>}<br><br>output {<br> stdout {<br>  codec =&gt; rubydebug { metadata =&gt; true }<br> }<br>}<br><br>output {<br> elasticsearch {<br>  index =&gt; &quot;%{[@metadata][_index]}&quot;<br>  hosts =&gt; [&quot;https://es01:9200&quot;,&quot;https://es02:9200&quot;,&quot;https://es03:9200&quot;]<br>  ssl_certificate_authorities =&gt; [&quot;/usr/share/logstash/certs/ca/ca.crt&quot;]<br>  user =&gt; &quot;elastic&quot;<br>  password =&gt; &quot;${ELASTIC_PASSWORD}&quot;<br> }<br>}</pre><p>The code of the ruby plugin is then:</p><pre># The value of `params` is the value of the hash passed to `script_params` <br># in the logstash configuration.<br>def register(params)<br>    @keep_original_event = params[&quot;keep_original_event&quot;]<br>end<br><br># The filter method receives an event and must return a list of events.<br># Dropping an event means not including it in the return array.<br># Creating new ones only requires you to add a new instance of LogStash::Event to the returned array.<br>def filter(event)<br><br>    items = Array.new<br><br>    # Keep original event if asked<br>    logger.debug(&#39;keep_original_event is :&#39; + @keep_original_event.to_s)<br><br>    if @keep_original_event.to_s == &#39;true&#39;<br>        event.set(&#39;[@metadata][_index]&#39;, &#39;prizes-original&#39;);<br>        items.push event<br>    end<br><br>    # Get prizes items (to denormalize)<br>    prizes = event.get(&quot;prize&quot;);<br>    if prizes.nil?<br>        logger.warn(&quot;No prizes for event &quot; + event.to_s)<br>        return items<br>    end<br>   <br>    # Create a clone base event<br>    eventBase = event.clone();<br>    eventBase.set(&#39;[@metadata][_index]&#39;, &#39;prizes-denormalized&#39;);<br>    eventBase.remove(&quot;prize&quot;);<br><br>    # Create one event per prize item with the needed modifications<br>    prizes.each { |prize| <br>        eventPrize = eventBase.clone();<br><br>        # Copy each prize item value to prize object<br>        prize.each { |key,value|<br>            eventPrize.set(&quot;[prize][&quot; + key + &quot;]&quot;, value)<br>        }<br><br>        items.push eventPrize<br>    }<br><br>    return items<br>end</pre><p>In this filter, the principle is the following:</p><ul><li>We create an <em>items</em> array that will contain all the documents we want in the output (the original one, if <em>keep_original_event</em> is set to true, and the denormalized ones).</li><li>We keep the prizes array of the current event in memory.</li><li>We create a base event to clone. This step is optional if events are light (everything can be done in the each loop), but it can be better for heavy events, for performance reasons.</li><li>We loop over the <em>prize</em> array, clone the base event, and set all the prize item’s fields in a <em>prize</em> object. We then push the cloned event.</li></ul><h3>Querying on this field</h3><p>Now, let’s add the same KQL filter as in the previous part, but this time on the <em>prizes-denormalized</em> index:</p><pre>prize.year : 1903 and prize.category : &quot;chemistry&quot; </pre><p>This doesn’t return any result, as expected!</p><p>We have to use a relevant filter to obtain results. For example:</p><pre>prize.year : 1903 and prize.category : &quot;physics&quot; </pre><p>will return:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*raO36wRtf-B3cZLBFTmD-Q.png" /></figure><p><strong>Warning</strong>:<strong><em> Be advised that cloning events can be an expensive process. You will have to add performance tests to check that event processing durations conform to your needs.</em></strong></p><p>In a future article, we will show how to improve our code’s readability and how to test our filter.</p><hr><p><a href="https://medium.zenika.com/logstash-denormalize-documents-part-3-4de6b99ca278">Logstash — Denormalize documents (Part 3)</a> was originally published in <a href="https://medium.zenika.com">Zenika</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Logstash — Denormalize documents (Part 2)]]></title>
            <link>https://medium.zenika.com/logstash-denormalize-documents-part-2-ab1068eb8228?source=rss-5fe6d48f202a------2</link>
            <guid isPermaLink="false">https://medium.com/p/ab1068eb8228</guid>
            <category><![CDATA[elastic]]></category>
            <category><![CDATA[elastic-stack]]></category>
            <category><![CDATA[ruby]]></category>
            <category><![CDATA[logstash]]></category>
            <category><![CDATA[denormalization]]></category>
            <dc:creator><![CDATA[Ingrid Jardillier]]></dc:creator>
            <pubDate>Thu, 02 May 2024 06:58:24 GMT</pubDate>
            <atom:updated>2025-01-09T08:35:37.681Z</atom:updated>
            <content:encoded><![CDATA[<h3>Logstash — Denormalizing documents — In which case to use it</h3><p>In this article, we will use the previous example (described in a <a href="https://medium.com/@ingrid.jardillier/logstash-denormalize-documents-part-1-aa674dab6c1d">previous article</a>) to expose the problem of not using denormalization.</p><h3>Problem</h3><p>If you create the <em>prizes-*</em> data view and go to the Discover App, you can have a look at our ingested data:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JFC87sQZiFuNyuQmSeM6gA.png" /><figcaption>Documents in prizes-original</figcaption></figure><p>We can see that the JSON object with two prizes is rendered there with <em>prize.year</em> and <em>prize.category</em> as <strong>arrays</strong>.</p><p>If you want to look for prizes in 1903 for the “chemistry” category, you can add a query (using the KQL syntax):</p><pre>prize.year : 1903 and prize.category : &quot;chemistry&quot;</pre><p>This request returns a single result:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ouB2LZpaicpU1qCyChUmPA.png" /></figure><p>But if we check our original JSON file, we had this:</p><pre>[<br>  {<br>    &quot;year&quot; : 1903,<br>    &quot;category&quot; : &quot;physics&quot;<br>  },<br>  {<br>    &quot;year&quot; : 1911,<br>    &quot;category&quot;:&quot;chemistry&quot;<br>  }<br>]</pre><p>So the prize received in 1903 was for “physics” and the one received in 1911 for “chemistry”.</p><p><strong>So, when we query for year 1903 and category “chemistry”, we should not obtain any result!</strong></p><p><strong>But Elasticsearch doesn’t keep the link between the items at the same position in the different arrays.</strong></p><p>For Elasticsearch, the field <em>prize.year</em> contains 1903 and the field <em>prize.category</em> contains “chemistry”, so the document matches the query.</p><h3>Resolution</h3><p>One method to resolve this problem is to use the <strong>nested</strong> type, but it comes with some limitations: it can easily degrade <strong>performance</strong>, and it is <strong>not fully implemented in Kibana</strong>, so it is only interesting if you are using the Elasticsearch API to query your documents.</p><p>The second method is to <strong>denormalize</strong> documents in order to create one document per prize. This can be done with <strong>Logstash</strong>, and that is what we will describe in our next article.</p><hr><p><a href="https://medium.zenika.com/logstash-denormalize-documents-part-2-ab1068eb8228">Logstash — Denormalize documents (Part 2)</a> was originally published in <a href="https://medium.zenika.com">Zenika</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Logstash — Denormalize documents (Part 1)]]></title>
            <link>https://medium.zenika.com/logstash-denormalize-documents-part-1-aa674dab6c1d?source=rss-5fe6d48f202a------2</link>
            <guid isPermaLink="false">https://medium.com/p/aa674dab6c1d</guid>
            <category><![CDATA[denormalization]]></category>
            <category><![CDATA[elastic]]></category>
            <category><![CDATA[elastic-stack]]></category>
            <category><![CDATA[logstash]]></category>
            <category><![CDATA[ruby]]></category>
            <dc:creator><![CDATA[Ingrid Jardillier]]></dc:creator>
            <pubDate>Thu, 02 May 2024 06:55:36 GMT</pubDate>
            <atom:updated>2025-01-09T08:34:06.225Z</atom:updated>
            <content:encoded><![CDATA[<h3>Logstash — Denormalizing documents — The concept</h3><p>In this article, we will take a simple example which highlights the need for denormalization.</p><p>Feel free to use my ELK docker compose to reproduce this example: <a href="https://github.com/ijardillier/docker-elk">https://github.com/ijardillier/docker-elk</a></p><h3>Why denormalization?</h3><p>When we ingest data, we may need to transform it in order for it to be fully usable and relevant. <strong>Denormalization is a way of creating as many documents as there are items in an array field</strong>. In this way, we <strong>improve querying</strong> on this flattened field.</p><p>That’s what we will explain in the different parts of this article.</p><h3>Simple example</h3><h4>Index template</h4><p>This index template will be used to store prizes with a few fields, just to understand what happens without denormalization:</p><pre>POST _index_template/prizes<br>{<br>  &quot;index_patterns&quot;: [&quot;prizes-*&quot;],<br>  &quot;template&quot;: {<br>    &quot;mappings&quot;: {<br>      &quot;properties&quot;: {<br>        &quot;id&quot;: {<br>          &quot;type&quot;: &quot;long&quot;<br>        },<br>        &quot;firstname&quot;: {<br>          &quot;type&quot;: &quot;keyword&quot;,<br>          &quot;ignore_above&quot;: 256<br>        },<br>        &quot;surname&quot;: {<br>          &quot;type&quot;: &quot;keyword&quot;,<br>          &quot;ignore_above&quot;: 256<br>        },<br>        &quot;gender&quot;: {<br>          &quot;type&quot;: &quot;keyword&quot;,<br>          &quot;ignore_above&quot;: 256<br>        },<br>        &quot;prize&quot;: {<br>          &quot;properties&quot;: {<br>            &quot;category&quot;: {<br>              &quot;type&quot;: &quot;keyword&quot;,<br>              &quot;ignore_above&quot;: 256<br>            },<br>            &quot;year&quot;: {<br>              &quot;type&quot;: &quot;integer&quot;<br>            }<br>          }<br>        }<br>      }<br>    }<br>  }<br>}</pre><h4>Data</h4><p>We have the following prizes.json file containing all our prizes:</p><pre>{&quot;id&quot;:1,&quot;firstname&quot;:&quot;Pierre&quot;,&quot;surname&quot;:&quot;Curie&quot;,&quot;gender&quot;:&quot;male&quot;,&quot;prize&quot;:[{&quot;year&quot;:1903,&quot;category&quot;:&quot;physics&quot;}]}<br>{&quot;id&quot;:2,&quot;firstname&quot;:&quot;Marie&quot;,&quot;surname&quot;:&quot;Curie&quot;,&quot;gender&quot;:&quot;female&quot;,&quot;prize&quot;:[{&quot;year&quot;:1903,&quot;category&quot;:&quot;physics&quot;},{&quot;year&quot;:1911,&quot;category&quot;:&quot;chemistry&quot;}]}<br>{&quot;id&quot;:3,&quot;firstname&quot;:&quot;Frédéric&quot;,&quot;surname&quot;:&quot;Joliot&quot;,&quot;gender&quot;:&quot;male&quot;,&quot;prize&quot;:[{&quot;year&quot;:1935,&quot;category&quot;:&quot;chemistry&quot;}]}<br>{&quot;id&quot;:4,&quot;firstname&quot;:&quot;Irène&quot;,&quot;surname&quot;:&quot;Joliot-Curie&quot;,&quot;gender&quot;:&quot;female&quot;,&quot;prize&quot;:[{&quot;year&quot;:1935,&quot;category&quot;:&quot;chemistry&quot;}]}</pre><p>You can see that one of our JSON objects contains 2 prizes, in two different categories, and not for the same year.</p>
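<p><em>For illustration (a sketch of the flattened view, not an actual API response): once ingested with this mapping, Elasticsearch effectively sees the prize fields of document 2 as parallel arrays, which is what the next parts build on:</em></p><pre>{<br>  &quot;id&quot;: 2,<br>  &quot;firstname&quot;: &quot;Marie&quot;,<br>  &quot;surname&quot;: &quot;Curie&quot;,<br>  &quot;gender&quot;: &quot;female&quot;,<br>  &quot;prize.year&quot;: [1903, 1911],<br>  &quot;prize.category&quot;: [&quot;physics&quot;, &quot;chemistry&quot;]<br>}</pre>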
<h4>Logstash configuration</h4><p>This logstash configuration will just read this JSON file as JSON content and send it to Elasticsearch:</p><pre>input {<br>  file {<br>    id =&gt; &quot;prizes&quot;<br>    path =&gt; &quot;/usr/share/logstash/pipeline/file/prizes.json&quot;<br>    mode =&gt; &quot;read&quot;<br>    codec =&gt; &quot;json&quot;<br>    start_position =&gt; &quot;beginning&quot;<br>    sincedb_path =&gt; &quot;/dev/null&quot;<br>  }<br>}<br><br>filter {<br> json {<br>  source =&gt; message<br>  remove_field =&gt; message<br> }<br> mutate {<br>  remove_field =&gt; [&quot;@timestamp&quot;, &quot;@version&quot;, &quot;event&quot;, &quot;host&quot;, &quot;log&quot;]<br> }<br>}<br><br>output {<br> stdout {<br>  codec =&gt; rubydebug { metadata =&gt; true }<br> }<br>}<br><br>output {<br>  elasticsearch {<br>    index =&gt; &quot;prizes-original&quot;<br>    hosts =&gt; [&quot;https://es01:9200&quot;,&quot;https://es02:9200&quot;,&quot;https://es03:9200&quot;]<br>    ssl_certificate_authorities =&gt; [&quot;/usr/share/logstash/certs/ca/ca.crt&quot;]<br>    user =&gt; &quot;elastic&quot;<br>    password =&gt; &quot;${ELASTIC_PASSWORD}&quot;<br>  }<br>}</pre><p>In this configuration, we:</p><ul><li>read from the beginning and don’t use sincedb (each time you restart logstash, it will re-read the file)</li><li>parse the JSON content to extract fields</li><li>keep only useful fields, to concentrate on the important stuff</li><li>send documents to stdout and to elasticsearch, in a <em>prizes-original</em> index.</li></ul><p>The next <a href="https://medium.com/@ingrid.jardillier/logstash-denormalize-documents-part-2-ab1068eb8228">article</a> will highlight the need for denormalization.</p><hr><p><a href="https://medium.zenika.com/logstash-denormalize-documents-part-1-aa674dab6c1d">Logstash — Denormalize documents (Part 1)</a> was originally published in <a href="https://medium.zenika.com">Zenika</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Using Elasticsearch searchable snapshots in cold / frozen tiers]]></title>
            <link>https://medium.zenika.com/using-elasticsearch-searchable-snapshots-in-cold-frozen-tiers-e01fa72ce8ee?source=rss-5fe6d48f202a------2</link>
            <guid isPermaLink="false">https://medium.com/p/e01fa72ce8ee</guid>
            <category><![CDATA[cold]]></category>
            <category><![CDATA[frozen]]></category>
            <category><![CDATA[ilm]]></category>
            <category><![CDATA[elasticsearch]]></category>
            <category><![CDATA[searchable-snapshots]]></category>
            <dc:creator><![CDATA[Ingrid Jardillier]]></dc:creator>
            <pubDate>Fri, 26 Jan 2024 14:10:39 GMT</pubDate>
            <atom:updated>2024-01-26T14:10:39.378Z</atom:updated>
            <content:encoded><![CDATA[<p>In this article, I’ll present how to use Elasticsearch searchable snapshots in the cold and frozen tiers in order to reduce disk usage.</p><p><strong><em>Warning</em></strong>:<strong><em> Searchable snapshots are a paid feature. You must have an Enterprise license to use them.</em></strong></p><h3>Data tiers architecture</h3><p>In a complete Elasticsearch architecture, you may have the following tiers:</p><ul><li><strong>Hot nodes:</strong> handle the indexing load for time series data such as logs or metrics and hold your most recent, most-frequently-accessed data.</li><li><strong>Warm nodes:</strong> hold time series data that is accessed less frequently and rarely needs to be updated.</li><li><strong>Cold nodes:</strong> hold time series data that is accessed infrequently and not normally updated.</li><li><strong>Frozen nodes:</strong> hold time series data that is accessed rarely and never updated.</li></ul><h3>Warm vs cold tiers</h3><p>Warm and cold tiers share the same hardware resource needs. Indeed, when you create a new deployment on Elastic Cloud Service, you will notice that the same hardware profile is used for the warm and cold tiers.</p><p>The difference between these two tiers lies in the fact that the cold tier enables you to use searchable snapshots.</p><p>When you enable searchable snapshots on the cold tier using ILM (Index Lifecycle Management), as the indices move to the cold tier, they are saved as <strong>snapshots</strong> in the associated repository. The primary shards of the index are then restored with the <strong>“restored-” prefix</strong>. The shards of such indices are <strong>fully cached</strong> in the Elasticsearch cluster.</p>
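<p><em>For reference, here is a minimal sketch of an ILM policy enabling a searchable snapshot in the cold phase (the policy name, repository name, and timing are assumptions to adapt to your context):</em></p><pre>PUT _ilm/policy/my-policy<br>{<br>  &quot;policy&quot;: {<br>    &quot;phases&quot;: {<br>      &quot;cold&quot;: {<br>        &quot;min_age&quot;: &quot;30d&quot;,<br>        &quot;actions&quot;: {<br>          &quot;searchable_snapshot&quot;: {<br>            &quot;snapshot_repository&quot;: &quot;my-repository&quot;<br>          }<br>        }<br>      }<br>    }<br>  }<br>}</pre>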
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gxTJ8ODgt4Rgji_HaIisDg.png" /></figure><p>With such a mechanism, <strong>replicas are no longer needed on this tier for reliability</strong>. If a recovery is needed for an index, it is automatically done using the snapshots. Such indices are called <strong>fully mounted indices.</strong> Fully mounted indices are <strong>read-only</strong>. These indices, as they eliminate the need for replicas, <strong>reduce the required disk space by approximately 50% compared to regular indices</strong>.</p><p>These fully mounted indices contain settings not available in classic indices, as you can see in the following capture, where all the information describing the associated snapshot is set.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*v91_I_IH8BRXnSEwUXjGeg.png" /></figure><p><strong>Search performance is normally comparable to a regular index.</strong> While recovery is ongoing, search performance may be slower than with a regular index, because a search may need some data that has not yet been retrieved into the local cache. <strong>On-disk data is preserved across restarts</strong>, such that the node does not need to re-download data that is already stored on the node after a restart.</p><h3>Frozen tier</h3><p>The frozen tier <strong>requires a snapshot repository</strong>. You can’t use classic indices in this tier: using searchable snapshots is mandatory. This tier stores <strong>partially mounted indices</strong> of searchable snapshots <strong>exclusively</strong>.</p><p>When the indices move to the frozen tier, they are saved as <strong>snapshots</strong> in the associated repository. Indices with the <strong>prefix “partial-”</strong> are created, but with a “0b” storage size. Only the <strong>recently searched parts</strong> of the snapshotted index’s data are stored in a <strong>local cache</strong>. This cache has a fixed size and is shared across shards of partially mounted indices allocated on the same data node.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xh2CjRUX_5B-fd8xp0uvLQ.png" /></figure><p>This mechanism <strong>extends the storage capacity even further</strong> (by up to 20 times compared to the warm tier).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_89bMth7OKfAFMfSGejsVQ.png" /></figure><p><strong>Searches</strong> on the frozen tier are <strong>slower</strong> than on the cold tier, because Elasticsearch must sometimes fetch frozen data from the snapshot repository.</p><h3>Conclusion</h3><p>In an operating environment which is compatible with searchable snapshots, <strong>searchable snapshots reduce the costs</strong> of running a cluster by removing the need for replica shards and for shard data to be copied between nodes.</p><p>Storage offered by all major public cloud providers typically provides <strong>very good protection against data loss or corruption</strong>. If you manage your own repository storage, then you are responsible for its reliability.</p><hr><p><a href="https://medium.zenika.com/using-elasticsearch-searchable-snapshots-in-cold-frozen-tiers-e01fa72ce8ee">Using Elasticsearch searchable snapshots in cold / frozen tiers</a> was originally published in <a href="https://medium.zenika.com">Zenika</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Send .Net application traces to Elasticsearch using Elastic APM / RUM agent]]></title>
            <link>https://medium.zenika.com/send-net-application-traces-to-elasticsearch-using-elastic-apm-rum-agent-d7ff111b1ef?source=rss-5fe6d48f202a------2</link>
            <guid isPermaLink="false">https://medium.com/p/d7ff111b1ef</guid>
            <category><![CDATA[kibana]]></category>
            <category><![CDATA[elastic]]></category>
            <category><![CDATA[elasticsearch]]></category>
            <category><![CDATA[dotnet]]></category>
            <category><![CDATA[apm]]></category>
            <dc:creator><![CDATA[Ingrid Jardillier]]></dc:creator>
            <pubDate>Tue, 25 Apr 2023 12:28:40 GMT</pubDate>
            <atom:updated>2023-04-25T12:28:40.319Z</atom:updated>
            <content:encoded><![CDATA[<h3>Send .Net application traces to Elasticsearch using Elastic APM / RUM agent</h3><p>Good practices to send traces and add log correlation ids to Elasticsearch, using the Elastic APM / RUM agents.</p><h3><strong>What is Elastic APM agent?</strong></h3><p>The Elastic APM .NET Agent automatically measures the performance of your application and tracks errors. It has built-in support for the most popular frameworks, as well as a simple API which allows you to instrument any application.</p><p>The agent auto-instruments supported technologies and records interesting events, like HTTP requests and database queries. To do this, it uses built-in capabilities of the instrumented frameworks like Diagnostic Source, an HTTP module for IIS, or IDbCommandInterceptor for Entity Framework. This means that for the supported technologies, there are no code changes required beyond enabling auto-instrumentation.</p><p>Source : <a href="https://www.elastic.co/guide/en/apm/agent/dotnet/current/intro.html">APM .Net Agent</a></p><p>Real User Monitoring captures user interaction with clients such as web browsers. The <a href="https://www.elastic.co/guide/en/apm/agent/rum-js/5.x">JavaScript Agent</a> is Elastic’s RUM Agent.</p><p>Unlike Elastic APM backend agents which monitor requests and responses, the RUM JavaScript agent monitors the real user experience and interaction within your client-side application. The RUM JavaScript agent is also framework-agnostic, which means it can be used with any front-end JavaScript application.</p><p>You will be able to measure metrics such as “Time to First Byte”, domInteractive, and domComplete, which helps you discover performance issues within your client-side application as well as issues that relate to the latency of your server-side application.</p><p>Source : <a href="https://www.elastic.co/guide/en/apm/guide/current/apm-rum.html">Real User Monitoring</a></p><h3><strong>Supported technologies for APM agent</strong></h3><p>For the APM agent, choosing between Profiler auto instrumentation and NuGet use will depend on your needs and supported technologies.</p><p>See this page for more information: <a href="https://www.elastic.co/guide/en/apm/agent/dotnet/current/supported-technologies.html">Supported technologies</a></p><p>For the RUM agent, Elastic provides a JavaScript agent and adds framework-specific integrations for React, Angular and Vue.</p><p>See this page for more information: <a href="https://www.elastic.co/guide/en/apm/agent/rum-js/5.x/supported-technologies.html">Supported technologies</a></p><h3><strong>Elastic APM agent implementation</strong></h3><p><strong>Profiler auto instrumentation</strong></p><p>In our case, as we use Docker, it would be easy to add Profiler auto instrumentation: we just have to add these lines in our Dockerfile:</p><pre>ARG AGENT_VERSION=1.20.0<br><br>FROM mcr.microsoft.com/dotnet/aspnet:6.0-alpine3.16 AS base<br>    <br># ...<br><br>FROM mcr.microsoft.com/dotnet/sdk:6.0-alpine3.16 AS build<br>ARG AGENT_VERSION<br><br># install zip and wget<br>RUN apk update &amp;&amp; apk add zip wget<br><br># pull down the zip file based on ${AGENT_VERSION} ARG and unzip<br>RUN wget -q https://github.com/elastic/apm-agent-dotnet/releases/download/v${AGENT_VERSION}/elastic_apm_profiler_${AGENT_VERSION}-linux-x64.zip &amp;&amp; \<br>    unzip elastic_apm_profiler_${AGENT_VERSION}-linux-x64.zip -d /elastic_apm_profiler<br><br># ...<br><br>FROM build AS publish<br><br># ...<br><br>FROM base AS final<br><br>WORKDIR /elastic_apm_profiler<br>
COPY --from=publish /elastic_apm_profiler .<br><br># ...<br><br># Configures whether profiling is enabled for the currently running process.<br>ENV CORECLR_ENABLE_PROFILING=1<br># Specifies the GUID of the profiler to load into the currently running process.<br>ENV CORECLR_PROFILER={FA65FE15-F085-4681-9B20-95E04F6C03CC}<br># Specifies the path to the profiler DLL to load into the currently running process (32-bit or 64-bit).<br>ENV CORECLR_PROFILER_PATH=/elastic_apm_profiler/libelastic_apm_profiler.so<br><br># Specifies the home directory of the profiler auto instrumentation. <br>ENV ELASTIC_APM_PROFILER_HOME=/elastic_apm_profiler<br># Specifies the path to the integrations.yml file that determines which methods to target for auto instrumentation.<br>ENV ELASTIC_APM_PROFILER_INTEGRATIONS=/elastic_apm_profiler/integrations.yml<br># Specifies the log level at which the profiler should log. <br>ENV ELASTIC_APM_PROFILER_LOG=warn<br><br># Core configuration options / Specifies the service name (ElasticApm:ServiceName).<br>ENV ELASTIC_APM_SERVICE_NAME=NetApi-Elastic<br># Core configuration options / Specifies the environment (ElasticApm:Environment)<br>ENV ELASTIC_APM_ENVIRONMENT=Development<br># Core configuration options / Specifies the sample rate (ElasticApm:TransactionSampleRate).<br># 1.0 : Dev purpose only, should be lowered in Production to reduce overhead.<br>ENV ELASTIC_APM_TRANSACTION_SAMPLE_RATE=1.0 <br><br># Reporter configuration options / Specifies the URL for your APM Server (ElasticApm:ServerUrl).<br>ENV ELASTIC_APM_SERVER_URL=https://host.docker.internal:8200<br># Reporter configuration options / Specifies if the agent should verify the SSL certificate if using HTTPS connection to the APM server (ElasticApm:VerifyServerCert).<br># Testing purpose only.<br>ENV ELASTIC_APM_VERIFY_SERVER_CERT=false<br># Reporter configuration options / Specifies the path to a PEM-encoded certificate used for SSL/TLS by APM server (ElasticApm:ServerCert).<br># ENV ELASTIC_APM_SERVER_CERT=<br><br># Supportability configuration options / Sets the logging level for the agent (ElasticApm:LogLevel).<br>ENV ELASTIC_APM_LOG_LEVEL=Debug<br><br># ...</pre><p>You can find all the documentation here: <a href="https://www.elastic.co/guide/en/apm/agent/dotnet/current/setup-auto-instrumentation.html">Profiler Auto instrumentation</a></p><p>But, in our case, we don’t need any feature provided by the Profiler auto instrumentation, so this code is only shown as an example.</p><p><strong>NuGet — Zero code change setup</strong></p><p>As we use .Net 6, we can also use the &quot;zero code change&quot; setup to integrate the NuGet agent and be able to use its features without changing any code. 
This is available when using .Net Core and .Net 5+.</p><p>To do this, just add the following environment variables in the Dockerfile:</p><pre>ARG AGENT_VERSION=1.20.0<br><br>FROM mcr.microsoft.com/dotnet/aspnet:6.0-alpine3.16 AS base<br><br># ...<br><br>FROM mcr.microsoft.com/dotnet/sdk:6.0-alpine3.16 AS build<br>ARG AGENT_VERSION<br><br># install zip and wget<br>RUN apk update &amp;&amp; apk add zip wget<br><br># pull down the zip file based on ${AGENT_VERSION} ARG and unzip<br>RUN wget -q https://github.com/elastic/apm-agent-dotnet/releases/download/v${AGENT_VERSION}/ElasticApmAgent_${AGENT_VERSION}.zip &amp;&amp; \<br>    unzip ElasticApmAgent_${AGENT_VERSION}.zip -d /ElasticApmAgent<br><br># ...<br><br>FROM build AS publish<br><br># ...<br><br>FROM base AS final<br><br>WORKDIR /ElasticApmAgent<br>COPY --from=publish /ElasticApmAgent .<br><br># ...<br><br># Inject the APM agent at startup<br>ENV DOTNET_STARTUP_HOOKS=/ElasticApmAgent/ElasticApmAgentStartupHook.dll<br># If the startup hook integration throws an exception, additional detail can be obtained by setting the Startup Hooks Logging variable.<br>ENV ELASTIC_APM_STARTUP_HOOKS_LOGGING=1<br><br># Core configuration options / Specifies the service name (ElasticApm:ServiceName).<br>ENV ELASTIC_APM_SERVICE_NAME=NetApi-Elastic<br># Core configuration options / Specifies the environment (ElasticApm:Environment)<br>ENV ELASTIC_APM_ENVIRONMENT=Development<br># Core configuration options / Specifies the sample rate (ElasticApm:TransactionSampleRate).<br># 1.0 : Dev purpose only, should be lowered in Production to reduce overhead.<br>ENV ELASTIC_APM_TRANSACTION_SAMPLE_RATE=1.0 <br><br># Reporter configuration options / Specifies the URL for your APM Server (ElasticApm:ServerUrl).<br>ENV ELASTIC_APM_SERVER_URL=https://host.docker.internal:8200<br># Reporter configuration options / Specifies if the agent should verify the SSL certificate if using HTTPS connection to the APM server (ElasticApm:VerifyServerCert).<br># Testing purpose only.<br>ENV ELASTIC_APM_VERIFY_SERVER_CERT=false<br># Reporter configuration options / Specifies the path to a PEM-encoded certificate used for SSL/TLS by APM server (ElasticApm:ServerCert).<br># ENV ELASTIC_APM_SERVER_CERT=<br><br># Supportability configuration options / Sets the logging level for the agent (ElasticApm:LogLevel).<br>ENV ELASTIC_APM_LOG_LEVEL=Debug<br><br># ...</pre><p>But, with this implementation, we won’t be able to correlate with logs by adding the transaction id and trace id.</p><p><strong>NuGet — .Net Core setup</strong></p><p>So, we will prefer using the NuGet integration, which adds log correlation and lets us choose the features to integrate.</p><p>The following Elastic for .Net NuGet packages are used:</p><ul><li><a href="https://github.com/elastic/apm-agent-dotnet">Elastic.Apm.NetCoreAll</a></li><li><a href="https://github.com/elastic/ecs-dotnet/tree/main/src/Elastic.Apm.SerilogEnricher">Elastic.Apm.SerilogEnricher</a></li></ul><p>But if you prefer to choose the features you want to integrate, you can pick only the packages you are interested in instead of <em>Elastic.Apm.NetCoreAll</em>. 
The documentation is provided <a href="https://github.com/elastic/apm-agent-dotnet#installation">here</a>.</p><p>To enable Elastic APM, you just have one line to add in your Configure method:</p><pre>public void Configure(IApplicationBuilder app)<br>{<br>    app.UseAllElasticApm(Configuration);            <br>}</pre><p>If you only want to activate some modules, you can use the UseElasticApm method instead, after adding the needed packages:</p><pre>app.UseElasticApm(Configuration,<br>      new HttpDiagnosticsSubscriber(),  /* Enable tracing of outgoing HTTP requests */<br>      new EfCoreDiagnosticsSubscriber()); /* Enable tracing of database calls through EF Core */</pre><p>To define the APM server to communicate with, add the following configuration in the appsettings.json file:</p><pre>{<br>    &quot;AllowedHosts&quot;: &quot;*&quot;,<br>    &quot;ElasticApm&quot;: <br>    {<br>        &quot;ServerUrl&quot;:  &quot;https://host.docker.internal:8200&quot;,<br>        &quot;LogLevel&quot;:  &quot;Information&quot;,<br>        &quot;VerifyServerCert&quot;: false /* Testing purpose */ <br>    }<br>}</pre><p>See <a href="https://www.elastic.co/guide/en/apm/agent/dotnet/current/config-all-options-summary.html">this page</a> for all available options.</p><p>To add the transaction id and trace id to every Serilog log message (see the previous article about logs) that is created during a transaction, you just have to update your configuration in the appsettings.json file:</p><pre>{<br>  &quot;Serilog&quot;: {<br>      &quot;Using&quot;: [&quot;Elastic.Apm.SerilogEnricher&quot;],<br>      /* ... */<br>      &quot;Enrich&quot;: [/* ... */, &quot;WithElasticApmCorrelationInfo&quot;],<br>      /* ... */<br>  }<br>}</pre>
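<p><em>As a side note (a minimal sketch with names of our own, not from the sample project), the agent also exposes a public API to capture custom transactions and spans when auto-instrumentation is not enough:</em></p><pre>using Elastic.Apm;<br>using Elastic.Apm.Api;<br><br>// Inside an async method: capture a custom transaction around a unit of work (hypothetical names)<br>await Agent.Tracer.CaptureTransaction(&quot;ImportPrizes&quot;, ApiConstants.TypeRequest, async transaction =&gt;<br>{<br>    // Capture a child span for a specific step<br>    await transaction.CaptureSpan(&quot;LoadFile&quot;, ApiConstants.TypeExternal, async () =&gt;<br>    {<br>        await Task.Delay(10); // placeholder for the real work<br>    });<br>});</pre>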
<h3><strong>Elastic RUM agent implementation</strong></h3><p>For a common JavaScript application, the implementation only takes a few lines (asynchronous / non-blocking pattern):</p><pre>&lt;script&gt;<br>    ;(function(d, s, c) {<br>        var j = d.createElement(s),<br>        t = d.getElementsByTagName(s)[0]<br><br>        j.src = &#39;js/elastic-apm-rum.umd.min.js&#39;<br>        j.onload = function() {elasticApm.init(c)}<br>        t.parentNode.insertBefore(j, t)<br>    })(document, &#39;script&#39;, {serviceName: &#39;NetClient_Elastic_Front&#39;, serverUrl: &#39;https://localhost:8200&#39;, environment: &#39;Production&#39;})<br>&lt;/script&gt;</pre><p>You can find this implementation in the sample source code, in the _Layout.cshtml of the Blazor App.</p><p>For frameworks like React, Angular and Vue, you can refer to <a href="https://www.elastic.co/guide/en/apm/agent/rum-js/5.x/framework-integrations.html">this page</a>.</p><h3><strong>Sending traces to Elasticsearch</strong></h3><p>The configuration has already been seen in the previous section.</p><p>You just have to ensure you have an APM server available (this is now done with an elastic-agent with an APM integration in Fleet).</p><h3><strong>Analyse traces in Kibana</strong></h3><p>The first thing we can check is the correlation ids in the logs.</p><figure><img alt="Logs correlation ids on Discover" src="https://cdn-images-1.medium.com/max/1024/1*lyv7DHjcnBA3PdfCqA700Q.png" /><figcaption>Logs correlation ids on Discover</figcaption></figure><p>Then, in the APM App in Kibana, we have a lot of information thanks to our traces.</p><ul><li>APM Inventory, which lists all services that send traces:</li></ul><figure><img alt="APM Inventory" src="https://cdn-images-1.medium.com/max/1024/1*g1KgIBwjrCV_TGUijb-shQ.png" /><figcaption>APM Inventory</figcaption></figure><ul><li>APM Service Map, which displays a map of our services (in the case of a complex architecture, it is easy to see the dependencies between services):</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MowdY4JCrhVLaGXfQocZ9g.png" /><figcaption>APM Service map</figcaption></figure><ul><li>APM Overview, which gives an overview of all the information about traces:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*idShvQqhZz1O6YogIndFwA.png" /><figcaption>APM Overview</figcaption></figure><ul><li>APM Transactions, which gives information about all transactions coming from our services:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zl_G_DXqO7b8DtXw5QWdaA.png" /><figcaption>APM Transactions</figcaption></figure><ul><li>APM Dependencies, which lists all dependencies of the current service:</li></ul><figure><img alt="APM Dependencies" src="https://cdn-images-1.medium.com/max/1024/1*e2r7Gq-qRdgKpCsPLBSgUA.png" /><figcaption>APM Dependencies</figcaption></figure><ul><li>APM Errors, which lists all errors not caught by our service:</li></ul><figure><img alt="APM Errors" src="https://cdn-images-1.medium.com/max/1024/1*FwV3WSjhdXTAWLao-Ilotg.png" /><figcaption>APM Errors</figcaption></figure><ul><li>APM Logs, which lists all logs for the current service:</li></ul><figure><img alt="APM Logs" src="https://cdn-images-1.medium.com/max/1024/1*kllksHr5fkV7O1XGCy66rw.png" /><figcaption>APM Logs</figcaption></figure><ul><li>An interesting view is the trace sample on the Transactions view. You can view the detailed trace for a transaction:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-iCBJuQl-mnkQ0JOIYNsfA.png" /><figcaption>APM Transactions — Trace timeline</figcaption></figure><ul><li>The same view for a JavaScript service (RUM):</li></ul><figure><img alt="APM Transactions — Trace timeline — JavaScript service" src="https://cdn-images-1.medium.com/max/1024/1*ysJcjMsNJthG9kQIdNrYWA.png" /><figcaption>APM Transactions — Trace timeline — JavaScript service</figcaption></figure><ul><li>And you can also see the related logs:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dGSIOlpG5aJkscqUS1gkjg.png" /><figcaption>APM Transactions — Trace logs</figcaption></figure><ul><li>A full dashboard for User Experience (RUM):</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Vb29T9_U3IY1DkXQ0_zu5Q.png" /><figcaption>APM User Experience Dashboard</figcaption></figure><h3>Conclusion</h3><p>In this article, we have seen how to use the Elastic APM / RUM agents to send traces to Elasticsearch and add log correlation ids.</p><p>A complete sample, with 2 projects (.Net API and .Net client with Blazor UI), is available on <a href="https://github.com/ijardillier/netclient-elastic">Github</a>.</p><hr><p><a href="https://medium.zenika.com/send-net-application-traces-to-elasticsearch-using-elastic-apm-rum-agent-d7ff111b1ef">Send .Net application traces to Elasticsearch using Elastic APM / RUM agent</a> was originally published in <a href="https://medium.zenika.com">Zenika</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Write and send .Net application metrics to Elasticsearch using Prometheus]]></title>
            <link>https://medium.zenika.com/write-and-send-net-application-metrics-to-elasticsearch-using-prometheus-31f5c21ba54c?source=rss-5fe6d48f202a------2</link>
            <guid isPermaLink="false">https://medium.com/p/31f5c21ba54c</guid>
            <category><![CDATA[prometheus]]></category>
            <category><![CDATA[elastic]]></category>
            <category><![CDATA[elasticsearch]]></category>
            <category><![CDATA[kibana]]></category>
            <category><![CDATA[dotnet]]></category>
            <dc:creator><![CDATA[Ingrid Jardillier]]></dc:creator>
            <pubDate>Thu, 20 Apr 2023 07:34:34 GMT</pubDate>
            <atom:updated>2023-04-26T13:52:08.487Z</atom:updated>
            <content:encoded><![CDATA[<h3>Write and send .Net application metrics to Elasticsearch using Prometheus</h3><p>Good practices to properly write and send metrics to Elasticsearch, using Prometheus.</p><h3><strong>What is Prometheus?</strong></h3><p>Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.</p><p>Source : <a href="https://prometheus.io/">Prometheus</a></p><h3><strong>NuGet packages</strong></h3><p>The following Prometheus for .Net NuGet packages are used:</p><ul><li><a href="https://github.com/prometheus-net/prometheus-net">prometheus-net</a></li><li><a href="https://github.com/prometheus-net/prometheus-net#aspnet-core-exporter-middleware">prometheus-net.AspNetCore</a></li><li><a href="https://github.com/prometheus-net/prometheus-net#aspnet-core-health-check-status-metrics">prometheus-net.AspNetCore.HealthChecks</a></li></ul><p>These are .NET libraries for instrumenting your applications and exporting metrics to Prometheus.</p><h3><strong>Prometheus implementation</strong></h3><p>First, you have to add the following packages in your csproj file (you can update the version to the latest available for your .Net version):</p><pre>&lt;PackageReference Include=&quot;prometheus-net&quot; Version=&quot;8.0.0&quot; /&gt;<br>&lt;PackageReference Include=&quot;prometheus-net.AspNetCore&quot; Version=&quot;8.0.0&quot; /&gt;<br>&lt;PackageReference Include=&quot;prometheus-net.AspNetCore.HealthChecks&quot; Version=&quot;8.0.0&quot; /&gt;</pre><p>By default, the Prometheus .Net library adds some application metrics about .Net (memory, CPU, garbage collection, …). As we plan to use the APM agent, we don’t want it to add these metrics, so we can suppress them. We will also add some static labels to each metric, in order to add contextual information from our application, as we did for logs:</p><pre>public virtual void ConfigureServices(IServiceCollection services)<br>{<br>    // ...<br><br>    Metrics.SuppressDefaultMetrics();<br><br>    Metrics.DefaultRegistry.SetStaticLabels(new Dictionary&lt;string, string&gt;<br>    {<br>        { &quot;domain&quot;, &quot;NetClient&quot; },<br>        { &quot;domain_context&quot;, &quot;NetClient.Elastic&quot; }<br>    });<br><br>    // ...     <br>}</pre>
<p>We also have to map endpoints for metrics:</p><pre>public void Configure(IApplicationBuilder app)<br>{<br>    // ...<br><br>    app.UseEndpoints(endpoints =&gt;<br>    {<br>        // ...<br><br>        endpoints.MapMetrics();<br>    });<br>}</pre><p>This mapping exposes the /metrics endpoint with the Prometheus format.</p><p>If you need the OpenMetrics format, you can easily access it with /metrics?accept=application/openmetrics-text</p><p>The result is shown below:</p><pre># HELP aspnetcore_healthcheck_status ASP.NET Core health check status (0 == Unhealthy, 0.5 == Degraded, 1 == Healthy)<br># TYPE aspnetcore_healthcheck_status gauge<br>aspnetcore_healthcheck_status{name=&quot;self&quot;,domain=&quot;NetClient&quot;,domain_context=&quot;NetClient.Elastic&quot;} 1<br># HELP myapp_gauge1 A simple gauge 1<br># TYPE myapp_gauge1 gauge<br>myapp_gauge1{service=&quot;service1&quot;,domain=&quot;NetClient&quot;,domain_context=&quot;NetClient.Elastic&quot;} 1028<br># HELP myapp_gauge2 A simple gauge 2<br># TYPE myapp_gauge2 gauge<br>myapp_gauge2{service=&quot;service1&quot;,domain=&quot;NetClient&quot;,domain_context=&quot;NetClient.Elastic&quot;} 2403<br># HELP myapp_gauge3 A simple gauge 3<br># TYPE myapp_gauge3 gauge<br>myapp_gauge3{service=&quot;service1&quot;,domain=&quot;NetClient&quot;,domain_context=&quot;NetClient.Elastic&quot;} 3872<br>...</pre><h3><strong>Forward Health checks to Prometheus</strong></h3><p>We can easily forward our health checks (described in the previous article) to the Prometheus endpoint, which avoids using the http module from Metricbeat: we retrieve all metrics, including health checks, from the Metricbeat Prometheus module. This way, we also benefit from our static labels if defined.</p><p>This is done here in our custom extension, which is used in the ConfigureServices of the Startup file:</p><pre>public virtual void ConfigureServices(IServiceCollection services)<br>{<br>    // ...<br>    services.AddCustomHealthCheck(Configuration)<br>    // ...     <br>}<br><br>public static IServiceCollection AddCustomHealthCheck(this IServiceCollection services, IConfiguration configuration)<br>{<br>    IHealthChecksBuilder hcBuilder = services.AddHealthChecks();<br>    hcBuilder.AddCheck(&quot;self&quot;, () =&gt; HealthCheckResult.Healthy());<br><br>    hcBuilder.ForwardToPrometheus();<br><br>    return services;<br>}</pre><h3><strong>Business metrics</strong></h3><p>The Prometheus .Net library offers an easy way to add business metrics.</p><p>To create a new metric, you just have to instantiate a new counter, gauge, …:</p><pre>private readonly Gauge Gauge1 = Metrics.CreateGauge(&quot;myapp_gauge1&quot;, &quot;A simple gauge 1&quot;);</pre><p>If you need to attach labels, you have to add a configuration:</p><pre>private static readonly GaugeConfiguration configuration = new GaugeConfiguration { LabelNames = new[] { &quot;service&quot; }};<br>private readonly Gauge Gauge2 = Metrics.CreateGauge(&quot;myapp_gauge2&quot;, &quot;A simple gauge 2&quot;, configuration);</pre><p>To apply a label and a value to such a metric, use this kind of code:</p><pre>Gauge2.WithLabels(&quot;service1&quot;).Set(_random.Next(1000, 2000));</pre>
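<p><em>Counters and histograms follow the same pattern (a short sketch with metric names of our own, not from the sample project):</em></p><pre>// A counter only goes up, here with an attached label (hypothetical names)<br>private static readonly Counter RequestsTotal = Metrics.CreateCounter(<br>    &quot;myapp_requests_total&quot;, &quot;Total number of processed requests&quot;,<br>    new CounterConfiguration { LabelNames = new[] { &quot;operation&quot; } });<br><br>// A histogram observes a distribution, for example durations in seconds<br>private static readonly Histogram RequestDuration = Metrics.CreateHistogram(<br>    &quot;myapp_request_duration_seconds&quot;, &quot;Duration of processed requests&quot;);<br><br>// Usage:<br>// RequestsTotal.WithLabels(&quot;import&quot;).Inc();<br>// RequestDuration.Observe(0.042);</pre>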
<h3><strong>Sending metrics to Elasticsearch</strong></h3><p>All the metrics are available on the /metrics endpoint.</p><p>In our example, we don’t have any Prometheus server, so metricbeat will directly access the metrics from the application metrics endpoint. But if you have a Prometheus server, you can add a new target in your scrape configuration.</p><p>So, to send the metrics to Elasticsearch, you will have to configure a metricbeat agent with the prometheus module:</p><pre>    metricbeat.modules:<br>    - module: prometheus<br>      period: 10s<br>      metricsets: [&quot;collector&quot;]<br>      hosts: [&quot;host.docker.internal:8080&quot;]<br>      metrics_path: /metrics</pre><p>For more information about this metricbeat configuration, you can have a look at: <a href="https://github.com/ijardillier/docker-elk/blob/master/extensions/beats/metricbeat/config/metricbeat.yml">https://github.com/ijardillier/docker-elk/blob/master/extensions/beats/metricbeat/config/metricbeat.yml</a></p><h3><strong>Analyse metrics in Kibana</strong></h3><p>You can check how metrics are ingested in the Discover module:</p><figure><img alt="Metrics on Discover" src="https://cdn-images-1.medium.com/max/1024/1*shqGBMY0m4WQQXt_YC5z6g.png" /><figcaption>Metrics on Discover</figcaption></figure><p>You can see how metrics are displayed in the Metrics Explorer App:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gTalgNyy_Zdl0a6q-Lt7mg.png" /></figure><h3>Conclusion</h3><p>In this article, we have seen how to use Prometheus to write and send metrics to Elasticsearch.</p><p>A complete sample, with 2 projects (.Net API and .Net client with Blazor UI), is available on <a href="https://github.com/ijardillier/netclient-elastic">Github</a>.</p><p>In the next article, we will focus on traces with the Elastic APM agent.</p><hr><p><a href="https://medium.zenika.com/write-and-send-net-application-metrics-to-elasticsearch-using-prometheus-31f5c21ba54c">Write and send .Net application metrics to Elasticsearch using Prometheus</a> was originally published in <a href="https://medium.zenika.com">Zenika</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>