The Retrieval Part of RAG with Elasticsearch for Non-English Languages

Agenda of the topic

  1. Start Elasticsearch with Docker locally / Elastic Cloud free trial
  2. Play with multiple vector embedders
    Option 1 Host the embedder yourself
    Option 2 With Elastic Cloud
  3. Create index, settings, mapping, and indexing
  4. Search options
  5. RAG improvement for languages that have no re-ranker (Thai language)

Start Elasticsearch with Docker locally / Elastic Cloud free trial

  1. Local environment setup


step1: Make sure the Docker engine is already running.
step2: Create a Docker network

docker network create elastic

step3: Pull the elasticsearch image

docker pull

step4: Run the elasticsearch container

docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB
  • When the container starts, the command prints the elastic user password and an enrollment token for Kibana.

step5: Set up the credentials
Use the password from step4’s output ("your_password")

export ELASTIC_PASSWORD="your_password"

Get cert file

docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .

Test the setup
curl --cacert http_ca.crt -u elastic:$ELASTIC_PASSWORD https://localhost:9200
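
If you prefer to run the same check from Python, here is a minimal sketch using only the standard library. The function names `build_request` and `check_cluster` are my own (hypothetical), and the sketch assumes the `http_ca.crt` file and password from the steps above.

```python
import base64
import json
import ssl
import urllib.request


def build_request(password, url="https://localhost:9200", user="elastic"):
    """Build a GET request carrying HTTP basic auth, like `curl -u`."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def check_cluster(password, ca_path="http_ca.crt"):
    """Call the cluster root endpoint and return the parsed JSON body."""
    # Trust only the CA certificate copied out of the container.
    context = ssl.create_default_context(cafile=ca_path)
    with urllib.request.urlopen(build_request(password), context=context) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `check_cluster(os.environ["ELASTIC_PASSWORD"])` (after `import os`) should return the same cluster information that the curl command prints, provided the container is running.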

  2. If you want to use Eland and Elastic ML nodes, please use Elastic Cloud instead (14-day free trial).

Play with multiple vector embedders

Why Elasticsearch instead of Milvus?

  1. If your input and output are in English, a re-ranker model can do much of the work, so retrieval in RAG is not that difficult.
  2. But what if your users do not speak or read English? If a re-ranker model exists for your language, good. If not, combining the scores of several retrieval methods becomes important.
  3. Elasticsearch offers fuzzy search (apply an analyzer and tokenizer for your language, use n-grams to generate a score, and boost that score later) that you can mix with semantic search.
  4. As of this writing, Milvus allows only one vector field per collection, so if you build RAG over non-English or multi-language documents, you may need more than one collection to cover the languages. Elasticsearch 8.12, however, already supports multiple vector fields in one index!
  5. ref:

Option 1 Host embedder yourself

If you host this environment with Docker, you can build the system with this architecture:

FE (frontend) [Pass text input to backend] →
BE (backend) [Run text2vec → send search payload to Elasticsearch] →
ES (Elasticsearch) [Do search]

Pros: Cost-effective, since we do not have to host an embedder model server.
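
To make the backend step concrete, here is a small sketch of how the BE could turn text into a kNN search clause. `build_knn_payload` and `fake_embed` are hypothetical names of mine; `fake_embed` is only a stand-in so the sketch runs without downloading a model, and the `k` value is an assumption.

```python
def build_knn_payload(text, embed, field, k=10, num_candidates=20):
    """Embed the text in the backend and wrap the vector in a kNN clause."""
    # Plain Python floats serialize to JSON without surprises.
    vector = [float(x) for x in embed(text)]
    return {
        "field": field,
        "query_vector": vector,
        "k": k,  # number of nearest neighbours to return (assumed value)
        "num_candidates": num_candidates,
    }


def fake_embed(text):
    """Stand-in embedder (hypothetical): NOT a real model, demo only."""
    return [len(text), text.count(" "), 1.0]


payload = build_knn_payload("who survived the curse", fake_embed, "vector_en")
print(payload)
```

In a real backend you would pass your embedder's `encode` function instead of `fake_embed`, then hand the payload to the `knn` argument of the search call.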

Option 2 With Elastic Cloud ML node (available on IBM Cloud)

FE (frontend) [Pass text input to backend] →
BE (backend) [Put the text into the search payload, then pass it to the Elasticsearch ML node] →
ES (Elasticsearch) [Get text → embed → do search]

Pros: The backend microservice can be smaller, and you can take advantage of the ML node when a lot of search requests come in.
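
With this option the backend sends raw text and lets Elasticsearch embed it. A rough sketch of such a payload, using the `query_vector_builder` option of kNN search, is below. The function name and the model id `my-thai-embedder` are hypothetical; the model must already be deployed on an ML node.

```python
def build_server_side_knn(text, model_id, field, k=10, num_candidates=20):
    """Build a kNN clause that asks Elasticsearch to embed the text itself."""
    return {
        "field": field,
        "k": k,
        "num_candidates": num_candidates,
        "query_vector_builder": {
            "text_embedding": {
                "model_id": model_id,  # id of a model deployed on the ML node
                "model_text": text,    # raw query text, embedded server-side
            }
        },
    }


# "my-thai-embedder" is a made-up model id for illustration.
knn_clause = build_server_side_knn("ใครรอดจากคำสาป", "my-thai-embedder", "vector_th")
print(knn_clause)
```

Note that there is no `query_vector` here, so the backend needs no embedding library at all.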

Create index, settings, mapping, and indexing

Before we talk about index, settings, mapping, and indexing, let me quickly introduce the terms.

Elasticsearch has: Node → Shard → Index → Field. Node and Shard are infrastructure, so we will skip them in this blog.

  • Index is where you keep semi-structured data. (For structured data you would use SQL tables, where each table has a schema of columns.)
  • Field is a key in the semi-structured data you keep. (Think of it as a column in the SQL scenario.)
  • Setting → Elasticsearch is not only a database but also a search engine; the settings define how text is processed for search (n-grams, analyzer, tokenizer).
  • Mapping → The data type of each field, plus which of your settings apply to it.
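
To get a feel for what word n-grams look like, here is a small pure-Python sketch that mimics the idea behind the shingle filter used in this post's index settings (sizes 2 to 3, unigrams kept). It is an illustration only, not Elasticsearch's implementation, and it splits on spaces instead of using a real tokenizer.

```python
def shingles(tokens, min_size=2, max_size=3, output_unigrams=True):
    """Return word n-grams (shingles) built from a list of tokens."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            # Only emit a shingle if enough tokens remain.
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


tokens = "harry survived the curse".split()
print(shingles(tokens))
```

Matching on these overlapping word groups rewards documents that share phrases with the query, not just single words.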

In this blog I will show how to search for something related to characters from Harry Potter, Naruto, and One Piece.

The data is something like this.

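  • For local Elasticsearch using Docker
from elasticsearch import Elasticsearch

# Host and CA file are taken from the local setup steps above
client = Elasticsearch(
    "https://localhost:9200",
    ca_certs="http_ca.crt",
    basic_auth=('USERNAME', 'PASSWORD'),
)
  • For Elastic Cloud
from elasticsearch import Elasticsearch

# Named `client` so the rest of the code works for both setups;
# add your Elastic Cloud credentials here as well
client = Elasticsearch(
    cloud_id='ELASTIC_CLOUD_ID',
    request_timeout=600,
)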
  • Then prepare for creating the index.
setting_mapping = {
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "analyzer_shingle": {
                        # icu_tokenizer needs the analysis-icu plugin
                        "tokenizer": "icu_tokenizer",
                        "filter": ["filter_shingle"]
                    }
                },
                "filter": {
                    "filter_shingle": {
                        "type": "shingle",
                        "max_shingle_size": 3,
                        "min_shingle_size": 2,
                        "output_unigrams": "true"
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "th_character": {"type": "text"},
            "th_character_description": {"type": "text", "analyzer": "analyzer_shingle"},
            "en_character": {"type": "text"},
            "en_character_description": {"type": "text", "analyzer": "analyzer_shingle"},
            "vector_en": {
                "type": "dense_vector",
                "dims": 1024,
                "index": True,
                "similarity": "cosine"
            },
            "vector_th": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,  # added so this field is searchable with kNN too
                "similarity": "cosine"
            }
        }
    }
}

  • Then create the index
client.indices.create(
    index='search_character_test',
    settings=setting_mapping["settings"],
    mappings=setting_mapping["mappings"],
)
  • Then upload the data!

This step differs between Elastic Cloud and your local Docker container. For Elastic Cloud, please follow the REF here! →

For the local Docker version, please continue reading this blog post.

  • Then prepare the data.
import pandas as pd
from sentence_transformers import SentenceTransformer, models

def get_model(model_name='airesearch/wangchanberta-base-att-spm-uncased', max_seq_length=768):
    # max_seq_length is a token limit, not the embedding size;
    # it must not exceed what the chosen model supports
    word_embedding_model = models.Transformer(model_name, max_seq_length=max_seq_length)
    # We use the [CLS] token as the sentence representation
    pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode='cls')
    return SentenceTransformer(modules=[word_embedding_model, pooling_model])

english_embedder = get_model('BAAI/bge-large-en-v1.5', max_seq_length=768)
thai_embedder = get_model('airesearch/wangchanberta-base-att-spm-uncased', max_seq_length=768)

data_read = pd.read_csv('./mock_data/character_data.csv') # this is what I showed previously
data_set = data_read.to_dict('records')

for data in data_set:
    # encode() returns a NumPy array; convert it to a plain list for JSON
    data['vector_en'] = english_embedder.encode(data['en_character_description']).tolist()
    data['vector_th'] = thai_embedder.encode(data['th_character_description']).tolist()
    client.index(index='search_character_test', document=data)
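
Indexing one document per request is fine for a small CSV. For larger data sets, a common alternative is the bulk helper. The sketch below only prepares the actions; `generate_actions` is a hypothetical helper name of mine.

```python
def generate_actions(records, index_name):
    """Yield one bulk action per record, with array-like values made JSON-safe."""
    for record in records:
        source = {
            # Objects with .tolist() (for example NumPy arrays) become plain lists.
            key: (value.tolist() if hasattr(value, "tolist") else value)
            for key, value in record.items()
        }
        yield {"_index": index_name, "_source": source}


sample = [{"en_character": "Harry Potter", "vector_en": [0.1, 0.2]}]
actions = list(generate_actions(sample, "search_character_test"))
print(actions)
```

You would then pass the generator to `elasticsearch.helpers.bulk(client, generate_actions(data_set, 'search_character_test'))`, which sends the documents in batches.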

Now your index is ready for search (the R in RAG)!

Search options

Elasticsearch supports both semantic search (vector search) and fuzzy logic (n-gram word similarity).

As you can see in the settings, we already applied an analyzer and tokenizer to our index and added vector fields to our mapping.

Warm up.

# Get all data
resp = client.search(index='search_character_test', query={"match_all": {}})
for hit in resp['hits']['hits']:
    print(hit['_score'], hit['_source']['en_character'])

Start the fuzzy logic.

th_question = 'ใครที่รอดจากคำสาปของลอร์ดโวลเดอมอร์'
en_question = 'Who is survived from the dead curse of lord Voldemort'
# Note: `fuzzy` is a term-level query, so the value is not analyzed
fuzzy_en_payload = {
    "fuzzy": {
        "en_character_description": {
            "value": en_question,
            "fuzziness": "AUTO"
        }
    }
}
fuzzy_en_response = client.search(index="search_character_test", query=fuzzy_en_payload)
for hit in fuzzy_en_response['hits']['hits']:
    print(hit['_score'], hit['_source']['en_character'])

fuzzy_th_payload = {
    "fuzzy": {
        "th_character_description": {
            "value": th_question,
            "fuzziness": "AUTO"
        }
    }
}
fuzzy_th_response = client.search(index="search_character_test", query=fuzzy_th_payload)
for hit in fuzzy_th_response['hits']['hits']:
    print(hit['_score'], hit['_source']['th_character'])

Each hit from these fuzzy searches carries its score in ‘_score’.
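
Two details are worth sketching here. First, "fuzziness": "AUTO" picks the allowed number of edits from the term length (0 edits for 1 to 2 characters, 1 for 3 to 5, 2 for longer terms, by default). Second, because the `fuzzy` query does not analyze its value, a full-sentence question usually works better in a `match` query with fuzziness. The helper names below are hypothetical and the second function is an alternative to consider, not the query used above.

```python
def auto_fuzziness(term):
    """Max edits allowed for a term under the default "AUTO" setting."""
    length = len(term)
    if length <= 2:
        return 0  # very short terms must match exactly
    if length <= 5:
        return 1
    return 2


def build_fuzzy_match(field, question):
    """Build a `match` query: the question is analyzed, then matched fuzzily."""
    return {"match": {field: {"query": question, "fuzziness": "AUTO"}}}


for term in ["of", "lord", "Voldemort"]:
    print(term, auto_fuzziness(term))
print(build_fuzzy_match("en_character_description", "lord Voldemrt"))
```

The payload from `build_fuzzy_match` can be passed to `client.search(index="search_character_test", query=...)` in the same way as the fuzzy payloads.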

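Next, let’s look at semantic search.

# encode() returns a NumPy array; convert it to a plain list for JSON
query_vector_en = english_embedder.encode(en_question).tolist()
semantic_query_en = {
    "field": "vector_en",
    "query_vector": query_vector_en,
    "k": 10,  # number of nearest neighbours to return (must be <= num_candidates)
    "num_candidates": 20
}
semantic_resp_en = client.search(index="search_character_test", knn=semantic_query_en)
for hit in semantic_resp_en['hits']['hits']:
    print(hit['_score'], hit['_source']['en_character'])

query_vector_th = thai_embedder.encode(th_question).tolist()
semantic_query_th = {
    "field": "vector_th",
    "query_vector": query_vector_th,
    "k": 10,
    "num_candidates": 20
}
semantic_resp_th = client.search(index="search_character_test", knn=semantic_query_th)
for hit in semantic_resp_th['hits']['hits']:
    print(hit['_score'], hit['_source']['th_character'])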

Each hit from these vector searches also carries its score in ‘_score’.
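
For a `dense_vector` field with "similarity": "cosine", the kNN ‘_score’ is the cosine similarity shifted into the 0 to 1 range, that is (1 + cosine) / 2. A tiny sketch with made-up two-dimensional vectors shows the relationship; the function names are mine.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_cosine_score(a, b):
    """Map cosine similarity from [-1, 1] to a score in [0, 1]."""
    return (1.0 + cosine(a, b)) / 2.0


print(knn_cosine_score([1.0, 0.0], [1.0, 0.0]))   # same direction
print(knn_cosine_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(knn_cosine_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

This matters when you mix scores later: vector scores are bounded, while text scores from fuzzy or match queries are not.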

RAG improvement for languages that have no re-ranker (Thai language)

If your users work in English, it is fine to skip these scores because a re-ranker model is available. (REF:

But for Thai or other non-English languages, we may need to fall back on traditional NLP scores and rank the results by the scores we have collected (semantic plus fuzzy).

With these scores in hand, we can weight and boost them to act as a re-ranker (statistically or manually), depending on what your use case prefers.
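
One simple way to do this is sketched below: rescale each score set to 0..1 with min-max normalization, then take a weighted sum. This is only one possible scheme. The weights, function names, document ids, and score values are all made up for illustration; tune the weights on your own queries.

```python
def min_max(scores):
    """Rescale {doc_id: score} to 0..1 so different scorers are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def rerank(fuzzy_scores, semantic_scores, fuzzy_weight=0.4, semantic_weight=0.6):
    """Combine normalized fuzzy and semantic scores, best document first."""
    fuzzy = min_max(fuzzy_scores)
    semantic = min_max(semantic_scores)
    combined = {}
    for doc_id in set(fuzzy) | set(semantic):
        # A document missing from one result list contributes 0 for that part.
        combined[doc_id] = (fuzzy_weight * fuzzy.get(doc_id, 0.0)
                            + semantic_weight * semantic.get(doc_id, 0.0))
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up scores keyed by document id, for illustration only.
fuzzy_scores = {"harry": 7.2, "naruto": 1.1, "luffy": 0.4}
semantic_scores = {"harry": 0.91, "luffy": 0.80, "ron": 0.85}
ranked = rerank(fuzzy_scores, semantic_scores)
print(ranked)
```

With real responses you could build each input as `{hit['_id']: hit['_score'] for hit in resp['hits']['hits']}` and pass the top-ranked documents to the generation step.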

To learn how to use Elasticsearch with IBM Cloud, please visit:

