Elasticsearch: Powerful Tools for Data Discovery and Analysis [5/5]

Published in Intertech, Apr 17, 2024

Elasticsearch is a distributed search and analytics engine, generally used for large-scale data storage and querying. Its ability to process both structured and unstructured data quickly and effectively has made it a preferred solution in many industries.

Elasticsearch is an open-source software solution built on Apache Lucene.

The other articles in this Elasticsearch series can be reached through the links.

Compound and Aggregation Queries

Compound queries let us combine several queries, typically over more than one field, into a single request. The bool compound query has four clause types: must, must_not, should and filter.

Elasticsearch's approach to querying multiple fields differs from that of SQL databases and other NoSQL databases.

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "geoip.city_name": {
              "value": "AnKaRA",
              "case_insensitive": true
            }
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "year": {
              "lte": 2023
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "category.keyword": {
              "value": "Diesel Vehicles"
            }
          }
        }
      ],
      "filter": [
        {
          "term": {
            "manufacturer.keyword": {
              "value": "VolKswagen",
              "case_insensitive": true
            }
          }
        }
      ]
    }
  }
}

MUST: Documents that satisfy these clauses appear in the response, and the clauses contribute to the score. In the query above, the term clause on geoip.city_name has to match, and when it does it adds to the score. A must block can contain more than one query.
MUST_NOT: Holds the conditions we do not want; documents matching them are excluded from the response. It does not contribute to the score.
SHOULD: Behaves like OR. A matching document may satisfy these clauses but does not have to; when it does, the clause adds to the score. If the bool query has no must or filter clause, at least one should clause has to match.
FILTER: These clauses must match, just like must, but they do not contribute to the score. In the query above, every returned document matches the manufacturer.keyword clause, yet that clause adds nothing to the score.
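The four clause types can be illustrated with a small in-memory sketch. This is not how Elasticsearch evaluates or scores queries; the helper names (`term`, `range_lte`, `bool_match`) and the toy score, which simply counts the matching must and should clauses, are assumptions made for illustration only.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

Doc = Dict[str, Any]
Clause = Callable[[Doc], bool]


def term(field: str, value: str, case_insensitive: bool = False) -> Clause:
    """Hypothetical exact-match clause, loosely modelled on the term query."""
    def check(doc: Doc) -> bool:
        actual = doc.get(field)
        if actual is None:
            return False
        if case_insensitive:
            return str(actual).lower() == value.lower()
        return actual == value
    return check


def range_lte(field: str, limit: float) -> Clause:
    """Hypothetical range clause supporting only "lte"."""
    def check(doc: Doc) -> bool:
        actual = doc.get(field)
        return actual is not None and actual <= limit
    return check


def bool_match(doc: Doc,
               must: Iterable[Clause] = (),
               must_not: Iterable[Clause] = (),
               should: Iterable[Clause] = (),
               filters: Iterable[Clause] = ()) -> Tuple[bool, int]:
    """Return (matched, toy_score) for one document."""
    must, must_not = list(must), list(must_not)
    should, filters = list(should), list(filters)
    # must and filter clauses are all required.
    if not all(clause(doc) for clause in must + filters):
        return False, 0
    # A single must_not hit excludes the document.
    if any(clause(doc) for clause in must_not):
        return False, 0
    should_hits = sum(1 for clause in should if clause(doc))
    # Without must/filter, at least one should clause has to match.
    if not must and not filters and should and should_hits == 0:
        return False, 0
    # Toy score: only must and should clauses count; filter and must_not do not.
    return True, len(must) + should_hits


QUERY = dict(
    must=[term("geoip.city_name", "AnKaRA", case_insensitive=True)],
    must_not=[range_lte("year", 2023)],
    should=[term("category.keyword", "Diesel Vehicles")],
    filters=[term("manufacturer.keyword", "VolKswagen", case_insensitive=True)],
)

diesel = {"geoip.city_name": "Ankara", "year": 2024,
          "category.keyword": "Diesel Vehicles", "manufacturer.keyword": "Volkswagen"}
electric = dict(diesel, **{"category.keyword": "Electric Vehicles"})
old = dict(diesel, year=2020)

for doc in (diesel, electric, old):
    print(bool_match(doc, **QUERY))
```

The diesel document matches with the higher toy score because it also satisfies the should clause, the electric one still matches through must and filter alone, and the 2020 document is excluded by must_not.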

Bucket Aggregations | Metric Aggregations

Bucket aggregations behave like the GroupBy method in EF Core. Metric aggregations cover calculations such as sum, count, average, maximum and minimum.

With bucket aggregations, Elasticsearch places documents into separate buckets according to the type or condition of the aggregation. For example, when we ask for the categories and the number of products in each, a separate bucket is created per category. The result reads like: 1000 products in the Car category, 250 products in the Bicycle category.

1.1 — Bucket aggregations | Terms query

For example, let's find out how many products there are in each category.

// Request
{
  "_source": false, // when the document source is not needed
  "aggs": {
    "number_of_products_category": { // any name can be used here
      "terms": {
        "field": "category.keyword",
        "size": 1000, // maximum number of buckets to return
        "order": [ // optional sort criteria, applied in order
          { "_key": "asc" },
          { "_count": "asc" }
        ]
      }
    }
  }
}

//Response
{
  "took": 12,
  "timed_out": false,
  "_shards": {
    //...
  },
  "hits": {
   //...
  },
  "aggregations": {
    "number_of_products_category": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Electronics",
          "doc_count": 572
        },
        {
          "key": "Home and Kitchen Appliances",
          "doc_count": 830
        },
        {
          "key": "Sporting Goods",
          "doc_count": 944
        }
      ]
    }
  }
}

Unlike a regular response, this one contains an aggregations field. Under the custom name we chose (number_of_products_category), the buckets array shows how the documents were split into buckets.
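What the terms aggregation computes can be approximated locally as a group-and-count. The sketch below is an illustration under assumptions: `terms_aggregation` is a hypothetical helper, the sample documents are made up, and the buckets are sorted by key ascending to mirror the request above.

```python
from collections import Counter
from typing import Any, Dict, List


def terms_aggregation(docs: List[Dict[str, Any]], field: str, size: int = 10) -> List[Dict[str, Any]]:
    """Group documents by the value of `field` and count each group."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Sort buckets by key ascending, then keep at most `size` buckets.
    ordered = sorted(counts.items(), key=lambda item: item[0])
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


# Made-up sample data, only for illustration.
products = [
    {"category.keyword": "Sporting Goods"},
    {"category.keyword": "Electronics"},
    {"category.keyword": "Sporting Goods"},
    {"category.keyword": "Home and Kitchen Appliances"},
    {"category.keyword": "Sporting Goods"},
]

buckets = terms_aggregation(products, "category.keyword", size=1000)
for bucket in buckets:
    print(bucket)
```

Each dictionary has the same key and doc_count shape as the buckets in the response above.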

1.2 — Bucket aggregations | Range query

//Request
{
  "_source": false,
  "query": {
    "term": {
      "category.keyword": "Home and Kitchen Appliances"
    }
  },
  "aggs": {
    "category_totalprice_range": {
      "range": {
        "field": "total_price",
        "ranges": [
          {
            "to": 20.00 // price below 20 ("to" is exclusive)
          },
          {
            "from": 20.00, // price from 20 (inclusive) up to 150 (exclusive)
            "to": 150.00
          },
          {
            "from": 300 // price of 300 or more
          }
        ]
      }
    }
  }
}

//Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
   //...
  },
  "hits": {
    //...
  },
  "aggregations": {
    "category_totalprice_range": {
      "buckets": [
        {
          "key": "*-20.0",
          "to": 20.0,
          "doc_count": 65
        },
        {
          "key": "20.0-150.0",
          "from": 20.0,
          "to": 150.0,
          "doc_count": 78
        },
        {
          "key": "300.0-*",
          "from": 300.0,
          "doc_count": 254
        }
      ]
    }
  }
}

The range aggregation counts how many documents fall into each range we define and groups them accordingly. For example, we can find the number of products priced between 100 and 200 lira, or the number of items within a given date range, and then analyze the documents by those price or date groups.
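The bucketing rule of the range aggregation, where "from" is inclusive and "to" is exclusive, can be sketched in a few lines. `range_aggregation` is a hypothetical helper and the prices are made-up sample values; the ranges are the ones from the request above.

```python
from typing import Any, Dict, List


def range_aggregation(values: List[float], ranges: List[Dict[str, float]]) -> List[Dict[str, Any]]:
    """Count how many values fall into each range (from inclusive, to exclusive)."""
    buckets = []
    for rng in ranges:
        low, high = rng.get("from"), rng.get("to")
        count = sum(
            1 for value in values
            if (low is None or value >= low) and (high is None or value < high)
        )
        # Keys follow the "*-20.0" / "20.0-150.0" / "300.0-*" pattern of the response.
        low_label = "*" if low is None else str(float(low))
        high_label = "*" if high is None else str(float(high))
        buckets.append({"key": f"{low_label}-{high_label}", "doc_count": count})
    return buckets


# Made-up prices; 150.0 and 299.0 fall into the gap between the ranges.
prices = [5.0, 19.99, 20.0, 149.99, 150.0, 299.0, 300.0, 450.0]
ranges = [{"to": 20.0}, {"from": 20.0, "to": 150.0}, {"from": 300}]

range_buckets = range_aggregation(prices, ranges)
for bucket in range_buckets:
    print(bucket)
```

Because the request leaves a gap between 150 and 300, values in that gap are not counted in any bucket, which is also why the doc_count values in a real response need not add up to the total number of hits.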

1.3 — Metric aggregations | Avg, Sum, Max, Min query

// AVG
{
  "_source": false,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "total_price"
      }
    }
  }
}

// SUM
{
  "_source": false,
  "query": {
    "term": {
      "category.keyword": "Sporting Goods"
    }
  },
  "aggs": {
    "total_sum_price": {
      "sum": {
        "field": "total_price"
      }
    }
  }
}

// MAX
{
  "_source": false,
  "query": {
    "term": {
      "category.keyword": "Home and Kitchen Appliances"
    }
  },
  "aggs": {
    "maximum_price": {
      "max": {
        "field": "total_price"
      }
    }
  }
}

// MIN
{
  "_source": false,
  "query": {
    "term": {
      "category.keyword": "Electronics"
    }
  },
  "aggs": {
    "minimum_price": {
      "min": {
        "field": "total_price"
      }
    }
  }
}
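The four metric requests above can also be combined, since one request may carry several named aggregations. The sketch below only builds the request body as a Python dictionary; `metrics_request` is a hypothetical helper, and using "size": 0 to skip the hits is a choice made here rather than something the requests above do.

```python
import json
from typing import Any, Dict, Optional


def metrics_request(field: str, category: Optional[str] = None) -> Dict[str, Any]:
    """Build one request body that asks for avg, sum, max and min of `field`."""
    body: Dict[str, Any] = {
        "size": 0,  # assumption: we only want the aggregation results, not the hits
        "aggs": {
            "avg_price": {"avg": {"field": field}},
            "total_sum_price": {"sum": {"field": field}},
            "maximum_price": {"max": {"field": field}},
            "minimum_price": {"min": {"field": field}},
        },
    }
    if category is not None:
        # Restrict the metrics to one category, as in the SUM/MAX/MIN examples.
        body["query"] = {"term": {"category.keyword": category}}
    return body


request_body = metrics_request("total_price", category="Sporting Goods")
print(json.dumps(request_body, indent=2))
```

When all of these values are needed for the same field, the stats aggregation is another option, as it returns count, min, max, avg and sum together.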

Match boolean prefix query | Elasticsearch Guide [8.11] | Elastic

A match_bool_prefix query analyzes its input and constructs a from the terms. Each term except the last is used in a…

www.elastic.co

NEST and Elastic.Clients.Elasticsearch are .NET libraries for interacting with Elasticsearch. Both are used to communicate with Elasticsearch and run queries, but they offer different approaches and feature sets.

NEST 7.17.5

Strongly typed interface to Elasticsearch. Fluent and classic object initializer mappings of requests and responses…

www.nuget.org

Elastic.Clients.Elasticsearch 8.12.1

This strongly-typed, client library enables working with Elasticsearch. It is the official client maintained and…

www.nuget.org

How do you create a sample book index via REST/Postman?

// PUT http://localhost:9200/book

{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "content": {
        "type": "text"
      },
      "user_id": {
        "type": "keyword"
      },
      "tags": {
        "type": "text"
      },
      "published_date": {
        "type": "date"
      }
    }
  }
}
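For readers without Postman, the same call can be prepared from a script. The sketch below uses only the Python standard library and builds the PUT request without sending it; `build_create_index_request` is a hypothetical helper, and the URL assumes a local, unsecured node as in the comment above.

```python
import json
import urllib.request
from typing import Any, Dict

BOOK_MAPPINGS: Dict[str, Any] = {
    "properties": {
        "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "content": {"type": "text"},
        "user_id": {"type": "keyword"},
        "tags": {"type": "text"},
        "published_date": {"type": "date"},
    }
}


def build_create_index_request(base_url: str, index: str,
                               mappings: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request that creates the index."""
    body = json.dumps({"mappings": mappings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request("http://localhost:9200", "book", BOOK_MAPPINGS)
print(request.get_method(), request.full_url)

# To actually send it against a running node:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Sending is left commented out so the sketch runs without a cluster; a secured cluster would additionally need credentials and HTTPS.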

To create the same index from .NET:

// "book" is the index name; a C# string literal needs double quotes.
var createIndexResponse = await elasticsearchClient.Indices.CreateAsync<Book>("book", indexDescriptor => indexDescriptor
    .Mappings(map => map
        .Properties(props => props
            // text field with a keyword sub-field
            .Text(book => book.Title, textDescriptor => textDescriptor.Fields(field => field.Keyword(book => book.Title)))
            .Text(book => book.Content)
            .Keyword(book => book.UserId)
            .Text(book => book.Tags) // "text", matching the REST mapping above
            .Date(book => book.PublishedDate))));

If you are on Elasticsearch 8.x, the Elastic.Clients.Elasticsearch library is recommended; for earlier versions, use NEST.

Should | OR and AND Usage

Inside the Should method, adding the queries as separate arguments (separated by commas) gives them OR semantics, while chaining them with dots gives them AND semantics.

OR with Should

var searchResponse = await _elasticsearchClient.SearchAsync<Bank>(search => search.Index("banks_data")
    .Size(250)
    .Query(query => query
        .Bool(b => b
            .Should(should => should
                .Match(match => match
                    .Field(field => field)
                    .Query(searchText)), // comma: the next lambda is a separate should clause
            should => should.MatchBoolPrefix(prefix => prefix
                .Field(field => field.Title)
                .Query(searchText))))));

AND with Should

var searchResponse = await _elasticsearchClient.SearchAsync<Bank>(search => search.Index("banks_data")
    .Size(250)
    .Query(query => query
        .Bool(b => b
            .Should(should => should
                .Match(match => match
                    .Field(field => field)
                    .Query(searchText)) // dot: the next query is chained onto the same clause
                .MatchBoolPrefix(prefix => prefix
                    .Field(field => field.Title)
                    .Query(searchText))))));
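Independently of the .NET client, the OR and AND behaviour can be expressed directly in the query DSL: several clauses under should act as OR, and several clauses under must act as AND. The sketch below builds both request bodies as Python dictionaries. The helper names are hypothetical, and using the title field for both clauses is an assumption, since the Match field is not specified in the snippets above.

```python
import json
from typing import Any, Dict, List


def search_clauses(search_text: str) -> List[Dict[str, Any]]:
    """The two clauses used above: a match and a match_bool_prefix query."""
    return [
        {"match": {"title": search_text}},              # assumed field name
        {"match_bool_prefix": {"title": search_text}},
    ]


def or_query(clauses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """OR: a document matches if at least one should clause matches."""
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


def and_query(clauses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """AND: a document matches only if every must clause matches."""
    return {"query": {"bool": {"must": clauses}}}


or_body = or_query(search_clauses("garanti ban"))
and_body = and_query(search_clauses("garanti ban"))
print(json.dumps(or_body, indent=2))
print(json.dumps(and_body, indent=2))
```

Setting minimum_should_match to 1 only makes the default explicit for a bool query that has no must or filter clause.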

Scenario | Building a Search Function for an E-Commerce Site

Elastic.Clients.Elasticsearch 8.11.0

This strongly-typed, client library enables working with Elasticsearch. It is the official client maintained and…

www.nuget.org

NEST 7.17.5

Strongly typed interface to Elasticsearch. Fluent and classic object initializer mappings of requests and responses…

www.nuget.org

Fcakiroglublog

fcakiroglu.com

Bucket aggregations | Elasticsearch Guide [8.13] | Elastic

Bucket aggregations don't calculate metrics over fields like the metrics aggregations do, but instead, they create…

www.elastic.co

Metrics Aggregation in Elasticsearch

In my previous blog, I have explained about basic aggregation. Now, let us pick the metrics aggregation and see how we…

faun.pub

Introduction to Elasticsearch Queries

Introduction

medium.com

Elasticsearch - Index APIs

Elasticsearch - Index APIs - These APIs are responsible for managing all the aspects of the index like settings…

Written by Cihat Solak