Neural Search on OpenSearch

Zelal Gungordu
19 min read · Feb 6, 2024


OpenSearch 2.11 was released in October 2023. This version broadens OpenSearch’s neural search support by adding sparse neural search. In this blog post, we will demonstrate how to use OpenSearch’s neural search support in a local environment using OpenSearch-provided pre-trained models.

Neural search transforms text into vector embeddings at both ingestion and query time. Vector embeddings capture the meaning of text, which allows us to go beyond keywords and facilitate search based on a semantic understanding of user queries. There are two types of vector embeddings used for neural search: dense embeddings and sparse embeddings. This blog post from OpenSearch provides a good overview of these two types of vector embeddings and how they compare with each other in terms of performance, index size, latency, and computational cost. The goal of this blog post is to demonstrate how to use OpenSearch’s support for these two types of embeddings to perform neural search.

As explained in the aforementioned blog post, OpenSearch offers two options for sparse embeddings. We can either use bi-encoder mode, where both documents and search queries are passed through deep encoders, or document-only encoder mode, where documents are passed through deep encoders while search queries are merely tokenized. The document-only encoder mode saves computational resources at search time, since search queries are only tokenized, which also significantly reduces latency. However, quality-wise, bi-encoder mode performs better. For the purposes of this exercise, we will demonstrate how to use both modes of sparse neural search.

The rest of the blog post is organized as follows:

  1. First, we’ll set up our local environment by installing OpenSearch, OpenSearch Dashboards and the OpenSearch plugins required for neural search.
  2. Next, we’ll register and deploy embedding models to be used by neural search.
  3. Then, we will create an ingest pipeline to allow OpenSearch to generate embeddings for documents we add to our index.
  4. Next, we’ll create an index where we’ll index some test data.
  5. Finally, we’ll perform neural search requests against our test index.

1 Prerequisites

First, we need to install OpenSearch and OpenSearch Dashboards locally, along with the plugins required for neural search. The steps below demonstrate how to do this on a Mac using Homebrew. Note that while the installation steps outlined here are documented only for a Mac environment, the rest of this blog post applies to any local environment where these prerequisites are satisfied.

1.1 Install OpenSearch

The following brew command will install the latest version of OpenSearch (2.11.1 as of the writing of this post).

$ brew install opensearch

1.2 Install OpenSearch Dashboards

Note that the version of the dashboards you install needs to match the version of OpenSearch installed in the earlier step. The following brew command will install the latest version of OpenSearch Dashboards (2.11.1 as of the writing of this post).

$ brew install opensearch-dashboards

1.3 Install the Necessary OpenSearch Plugins

We need the following three plugins for neural search. You can download the zip files for these plugins from the Maven Central repository. Note that the version of each of these plugins needs to match the OpenSearch version you installed earlier.

ML Commons plugin

This plugin will allow us to set up ML models to be used in our ingestion pipeline to index embeddings for both dense and sparse neural search. The zip file for version 2.11.1.0 is available here.

k-NN plugin

This plugin provides support for dense embeddings. The zip file for version 2.11.1.0 is available here.

OpenSearch neural plugin

This plugin supports neural search. It will help us set up an ingestion pipeline to create and index embeddings in our test index. It will also help us perform neural search requests against our index. The zip file for version 2.11.1.0 is available here.

Steps to install the plugins

  • Download the plugins from the zip file locations and save them in a directory on your system (we’ll call it PLUGIN_DIRECTORY).
  • Start OpenSearch and OpenSearch Dashboards.
$ brew services start opensearch
$ brew services start opensearch-dashboards
  • Install all three plugins.
$ cd /usr/local/opt/opensearch  # on Apple Silicon, use /opt/homebrew/opt/opensearch
$ bin/opensearch-plugin install file:///PLUGIN_DIRECTORY/opensearch-ml-plugin-2.11.1.0.zip
$ bin/opensearch-plugin install file:///PLUGIN_DIRECTORY/opensearch-knn-2.11.1.0.zip
$ bin/opensearch-plugin install file:///PLUGIN_DIRECTORY/neural-search-2.11.1.0.zip
  • Restart OpenSearch. You can either use ‘restart’ or ‘stop’ and ‘start’.
$ brew services restart opensearch
  • Go to the dev console in OpenSearch Dashboards and get a list of the plugins installed on your OpenSearch cluster. You should see all three plugins in the response.
GET _cat/plugins
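
The _cat/plugins response lists one line per plugin per node. On our single-node local cluster, it should look roughly like the following, where the first column is your node’s name (shown here as a placeholder), possibly alongside any other plugins you have installed:

opensearch-node opensearch-knn           2.11.1.0
opensearch-node opensearch-ml            2.11.1.0
opensearch-node opensearch-neural-search 2.11.1.0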

2 Register and Deploy Embedding Models

Now, it is time to register and deploy the embedding models we need for this exercise.

2.1 Update Cluster Settings

First, we need to update our cluster settings to enable certain ML-specific features. For example, given that we’re operating a local cluster with a single node, we need to allow running a model on a non-ML node (by setting “plugins.ml_commons.only_run_on_ml_node” to false). In addition, sparse embedding models currently require the model URL to be passed to OpenSearch at registration time, so we need to enable registering models via a URL (by setting “plugins.ml_commons.allow_registering_model_via_url” to true). We also enable model access control, which lets us assign an access mode to the model group we create below, and set the native memory circuit-breaker threshold to 99% so that model deployment isn’t blocked by the circuit breaker on a memory-constrained local machine. You can find out more about ML Commons cluster settings on this page and the security-specific settings on this page.

PUT /_cluster/settings
{
  "transient": {
    "plugins.ml_commons.allow_registering_model_via_url": true,
    "plugins.ml_commons.only_run_on_ml_node": false,
    "plugins.ml_commons.model_access_control_enabled": true,
    "plugins.ml_commons.native_memory_threshold": 99
  }
}
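
If the update succeeds, OpenSearch acknowledges it and echoes the transient settings back, in roughly the following form (abbreviated here):

{
  "acknowledged": true,
  "persistent": {},
  "transient": {
    "plugins": {
      "ml_commons": {
        "only_run_on_ml_node": "false",
        ...
      }
    }
  }
}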

2.2 Register a Model Group

Next, we will register a model group to organize our models. This is not required, but it is best practice for access control purposes (see this page for more information on model access control). We will call our model group “neural_search_model_group” and set its access mode to “public”.

POST /_plugins/_ml/model_groups/_register
{
  "name": "neural_search_model_group",
  "description": "A group for models to support neural search",
  "access_mode": "public"
}

This request will return a response of the following form:

{
  "model_group_id": "oFFXN4wBEu5eBjm9rchM",
  "status": "CREATED"
}

We will use the “model_group_id” returned in the response to register our models with this model group. Now, it is time to register our models.

2.3 Register Models

For this exercise, we will use OpenSearch-provided pre-trained models. You can find the full list of pre-trained models on this page. We will register the four models that are used for the benchmarking study presented in this blog post.

Register a Dense Embedding Model

For dense embeddings, we will use the msmarco-distilbert-base-tas-b sentence transformer model from Hugging Face. We’ll choose the latest model version available on this page (1.0.2). Because this is a pre-trained model, we only need to provide the model name, version, and format to register it.

POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b",
  "version": "1.0.2",
  "model_group_id": "oFFXN4wBEu5eBjm9rchM",
  "model_format": "TORCH_SCRIPT"
}

Registering a model is an asynchronous task. The request above will return a task ID.

{
  "task_id": "rlFPy4wBEu5eBjm9zMiS",
  "status": "CREATED"
}

You can check the status of the task using the following GET call.

GET /_plugins/_ml/tasks/rlFPy4wBEu5eBjm9zMiS

Once the task is completed, you will get a response like the following.

{
  "model_id": "r1FPy4wBEu5eBjm9zsgt",
  "task_type": "REGISTER_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "COMPLETED",
  "worker_node": [
    "JUlkUOs2Qz-hIJFdSO3U4g"
  ],
  "create_time": 1704218053771,
  "last_update_time": 1704218146573,
  "is_async": true
}

We’ll next use the model ID returned in that response to deploy the model.

POST /_plugins/_ml/models/r1FPy4wBEu5eBjm9zsgt/_deploy
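
Like registration, deploying a model is asynchronous; the deploy request returns a task ID in roughly the following form:

{
  "task_id": "sFHhy4wBEu5eBjm9jciS",
  "status": "CREATED"
}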

You can check the status of the deployment task using the following GET call.

GET /_plugins/_ml/tasks/sFHhy4wBEu5eBjm9jciS

Once the task is completed, you can test the model by requesting to get embeddings for a sample piece of text.

POST /_plugins/_ml/_predict/text_embedding/r1FPy4wBEu5eBjm9zsgt
{
  "text_docs": ["winter weather"]
}
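
The response contains the dense embedding for the input text: a 768-dimensional vector, matching the output dimension of this model. It has roughly the following form (the values shown here are made up, and the vector is truncated):

{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [768],
          "data": [0.023, -0.081, 0.017, ...]
        }
      ]
    }
  ]
}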

We are now done with setting up a dense embedding model for this exercise.

Register a Bi-encoder Sparse Embedding Model

The sparse embedding models we will use in this exercise are the pre-trained models mentioned on this page. The bi-encoder model is called opensearch-neural-sparse-encoding-v1. Recall that bi-encoder models are used during both ingestion and search. For sparse embedding models, we need to provide a model URL during registration. The following POST command will register the bi-encoder sparse embedding model.

POST /_plugins/_ml/models/_register
{
  "name": "neural-sparse/opensearch-neural-sparse-encoding-v1",
  "version": "1.0.1",
  "model_group_id": "oFFXN4wBEu5eBjm9rchM",
  "description": "This is a neural sparse encoding model: It transfers text into sparse vector, and then extract nonzero index and value to entry and weights. It serves in both ingestion and search.",
  "model_format": "TORCH_SCRIPT",
  "function_name": "SPARSE_ENCODING",
  "model_content_size_in_bytes": 492184214,
  "model_content_hash_value": "d1ebaa26615090bdb0195a62b180afd2a8524c68c5d406a11ad787267f515ea8",
  "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip",
  "created_time": 1696913667239
}

Again, registering a model is an asynchronous task. Use the task ID returned by the request above to check its status before deploying the model. Once the task is completed, deploy the model.

POST /_plugins/_ml/models/pFFrN4wBEu5eBjm9Ycjm/_deploy

Once the deployment task is completed, you can test the model by requesting to get embeddings for a sample piece of text.

POST /_plugins/_ml/_predict/sparse_encoding/pFFrN4wBEu5eBjm9Ycjm
{
  "text_docs": ["winter weather"]
}
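
This time, instead of a dense vector, the response contains a mapping from tokens to weights. Note that the model expands the input beyond its literal tokens, which is what lets sparse neural search match semantically related documents. The response has roughly the following form (the tokens and weights shown here are made up for illustration):

{
  "inference_results": [
    {
      "output": [
        {
          "name": "output",
          "dataAsMap": {
            "response": [
              {
                "winter": 2.1,
                "weather": 2.4,
                "cold": 1.2,
                "snow": 0.9,
                ...
              }
            ]
          }
        }
      ]
    }
  ]
}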

We are now done with setting up a bi-encoder sparse embedding model for this exercise.

Register a Document-Only Encoder Sparse Embedding Model

The document-only encoder model we will use is called opensearch-neural-sparse-encoding-doc-v1. The following POST command will register the document-only encoder model.

POST /_plugins/_ml/models/_register
{
  "name": "neural-sparse/opensearch-neural-sparse-encoding-doc-v1",
  "version": "1.0.1",
  "model_group_id": "oFFXN4wBEu5eBjm9rchM",
  "description": "This is a neural sparse encoding model: It transfers text into sparse vectors, and then extracts nonzero index and value to entry and weights. It serves only in ingestion and customer should use tokenizer model in query.",
  "model_format": "TORCH_SCRIPT",
  "function_name": "SPARSE_ENCODING",
  "model_content_size_in_bytes": 490620545,
  "model_content_hash_value": "9a41adb6c13cf49a7e3eff91aef62ed5035487a6eca99c996156d25be2800a9a",
  "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v1-1.0.1-torch_script.zip",
  "created_time": 1696913667239
}

Again, registering a model is an asynchronous task. Use the task ID returned by the request above to check its status before deploying the model. Once the task is completed, deploy the model.

POST /_plugins/_ml/models/p1HOx4wBEu5eBjm9_shX/_deploy

Once the deployment task is completed, you can test the model by requesting to get embeddings for a sample piece of text.

POST /_plugins/_ml/_predict/sparse_encoding/p1HOx4wBEu5eBjm9_shX
{
  "text_docs": ["winter weather"]
}

We are now done with setting up a document-only encoder embedding model for this exercise. Recall that document-only encoder models are used only during ingestion. At search time, we will use a tokenizer model called opensearch-neural-sparse-tokenizer-v1. The following POST command will register the tokenizer model.

POST /_plugins/_ml/models/_register
{
  "name": "neural-sparse/opensearch-neural-sparse-tokenizer-v1",
  "version": "1.0.1",
  "model_group_id": "oFFXN4wBEu5eBjm9rchM",
  "description": "This is a neural sparse tokenizer model: It tokenizes input sentence into tokens and assigns a pre-defined weight from IDF to each token. It serves only in query.",
  "model_format": "TORCH_SCRIPT",
  "function_name": "SPARSE_TOKENIZE",
  "model_content_size_in_bytes": 567691,
  "model_content_hash_value": "b3487da9c58ac90541b720f3b367084f271d280c7f3bdc3e6d9c9a269fb31950",
  "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/config.json",
  "created_time": 1696913667239
}

Again, registering a model is an asynchronous task. Use the task ID returned by the request above to check its status before deploying the model. Once the task is completed, deploy the model.

POST /_plugins/_ml/models/qlHfx4wBEu5eBjm9-8hb/_deploy

Once the deployment task is completed, you can test the model by requesting to get embeddings for a sample piece of text.

POST /_plugins/_ml/_predict/sparse_tokenize/qlHfx4wBEu5eBjm9-8hb
{
  "text_docs": ["winter weather"]
}

At this point, we have deployed all four of the models required for this exercise.

3 Create an Ingest Pipeline

Now, it is time to start building our ML ingestion pipeline to index embeddings. ML ingestion pipelines allow us to define processors to be used during indexing to generate dense/sparse embeddings. These processors take a “model_id” and a “field_map” as properties. The model ID indicates the model to be used to generate embeddings. The field map determines which input index field(s) should be used to generate embeddings and where to store the resulting embeddings for each such index field.

The sample request below defines an ML ingestion pipeline with three processors. The first one is a “text_embedding” processor that generates dense embeddings using the dense embedding model we deployed earlier. The model will be used to generate dense embeddings for any text stored in the index field “event_text” and the resulting embeddings will be stored in the index field “event_dense_embedding”. The second processor is a “sparse_encoding” processor that generates sparse embeddings using the bi-encoder sparse embedding model we deployed earlier. Notice that the input index field specified in the “field_map” is the same field as in the case of the first processor. The output field is a new one called “event_sparse_embedding”. The third processor is also a “sparse_encoding” processor, but, in this case, we use the document-only sparse embedding model we deployed earlier. Notice that the input index field specified in the “field_map” is again the same field as in the case of the first two processors. The output field is a new one called “event_doc_sparse_embedding”.

PUT /_ingest/pipeline/nlp-ingest-pipeline
{
  "description": "An ingest pipeline for multiple dense/sparse encodings",
  "processors": [
    {
      "text_embedding": {
        "model_id": "r1FPy4wBEu5eBjm9zsgt",
        "field_map": {
          "event_text": "event_dense_embedding"
        }
      }
    },
    {
      "sparse_encoding": {
        "model_id": "pFFrN4wBEu5eBjm9Ycjm",
        "field_map": {
          "event_text": "event_sparse_embedding"
        }
      }
    },
    {
      "sparse_encoding": {
        "model_id": "p1HOx4wBEu5eBjm9_shX",
        "field_map": {
          "event_text": "event_doc_sparse_embedding"
        }
      }
    }
  ]
}
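
Before wiring the pipeline into an index, you can dry-run it with the ingest simulate API to confirm that all three processors produce embeddings for a sample document:

POST /_ingest/pipeline/nlp-ingest-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "event_text": "winter weather"
      }
    }
  ]
}

The response should echo the document back with the “event_dense_embedding”, “event_sparse_embedding”, and “event_doc_sparse_embedding” fields filled in.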

4 Create an Index

Next, we will create an index called my-semantic-search-index. In order to use the ingestion pipeline we created earlier to generate embeddings during ingestion, we need to set the default pipeline in the index settings to nlp-ingest-pipeline. In addition, in order to index dense embeddings, we need to create a k-NN index by setting index.knn to true in the index settings.

Our index will have a field called “event_text” of type “text” into which we’ll ingest the text data. We’ll also have three fields where we’ll index the embeddings generated by the ingestion pipeline for any text we index into “event_text”. We’ll use the first embedding field, “event_dense_embedding”, to index the dense embeddings generated by the dense embedding model; the second, “event_sparse_embedding”, to index sparse embeddings generated by the bi-encoder sparse embedding model; and the third, “event_doc_sparse_embedding”, to index sparse embeddings generated by the document-only encoder sparse embedding model. Note that these field names are the same as the field names we defined in the field maps for the three processors in our ingestion pipeline. To save disk space, we will exclude these embedding fields from the document source in our index mappings.

Within the properties section of the index mappings, we define each of the index fields and indicate their mapping types. The “id” field is of type “keyword” and will be used to index a document ID. The field “event_text” is of type “text”. The field “event_dense_embedding” is of type “knn_vector”, with a dimension that matches the output dimension of the dense embedding model we’ll use to generate the embeddings indexed in that field. OpenSearch uses the “rank_features” field type to store token weights generated by sparse embedding models, so the two index fields where we’ll index sparse embeddings are of type “rank_features”.

PUT /my-semantic-search-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "nlp-ingest-pipeline"
  },
  "mappings": {
    "_source": {
      "excludes": [
        "event_dense_embedding",
        "event_sparse_embedding",
        "event_doc_sparse_embedding"
      ]
    },
    "properties": {
      "id": {
        "type": "keyword"
      },
      "event_text": {
        "type": "text"
      },
      "event_dense_embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "engine": "lucene",
          "space_type": "l2",
          "name": "hnsw",
          "parameters": {}
        }
      },
      "event_sparse_embedding": {
        "type": "rank_features"
      },
      "event_doc_sparse_embedding": {
        "type": "rank_features"
      }
    }
  }
}

5 Index Documents

Next, we’ll add documents into our index. We’ll use OpenSearch’s Index API to put documents in the index. See Appendix A for sample PUT requests to add documents with “event_text” field values. The texts used in the requests were generated by ChatGPT for prompts that requested event descriptions for four different event types: i) family and kids, ii) dramatic writing class, iii) hip hop/rap party, and iv) classical music event.
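
After indexing, you can fetch one of the documents back to confirm the pipeline ran:

GET /my-semantic-search-index/_doc/1

Because we excluded the embedding fields from the document source, the returned “_source” should contain only the “event_text” and “id” fields.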

6 Search

Now, it’s time to perform search requests against our index to demonstrate how dense/sparse neural search works. Our first query is “novels”. First, we’ll perform a “match” query against the “event_text” field to confirm that this query doesn’t match any of the documents in our index using lexical search. The following request should return an empty response. Although not demonstrated here, you can also verify that the singular form “novel” doesn’t match any documents via lexical search either.

GET /my-semantic-search-index/_search
{
  "query": {
    "match": {
      "event_text": "novels"
    }
  }
}
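
An empty response looks like the following (the “took” time will vary):

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}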

6.1 Dense Neural Search

Now, let’s search for “novels” using dense neural search. To do that, we’ll use the neural query clause of the query DSL, provided by the neural search plugin. Notice that we’re performing the query against the “event_dense_embedding” field, where we indexed the dense embeddings coming out of the ingestion pipeline. The model ID is the same one we used during ingestion, so the same model generates the embeddings for the search query.

GET /my-semantic-search-index/_search
{
  "query": {
    "neural": {
      "event_dense_embedding": {
        "query_text": "novels",
        "model_id": "r1FPy4wBEu5eBjm9zsgt",
        "k": 100
      }
    }
  }
}

The response returned by OpenSearch is included in Appendix B. Notice that all four documents in our index are returned in the response. The event for the dramatic writing class is ranked the highest although the relevance scores for all events are very close to each other.

6.2 Sparse Neural Search with Bi-encoder Mode

Now, let’s try searching for the same query using sparse neural search in bi-encoder mode. To perform sparse neural search, we’ll use the neural_sparse query of the query DSL and specify the model ID we want to use to generate the sparse embeddings for our search query. Note that the neural_sparse query also takes a max_token_score parameter for performance optimization purposes; you can find out more about it in the OpenSearch documentation for the neural_sparse query. In this case, we set max_token_score to 3.5 because that’s the value OpenSearch recommends for the opensearch-neural-sparse-encoding-v1 model.

GET /my-semantic-search-index/_search
{
  "explain": false,
  "query": {
    "neural_sparse": {
      "event_sparse_embedding": {
        "query_text": "novels",
        "model_id": "pFFrN4wBEu5eBjm9Ycjm",
        "max_token_score": 3.5
      }
    }
  }
}

The response returned by OpenSearch is included in Appendix C. Notice that this request also returns all four documents in our index. However, in this case, the event for the dramatic writing class is identified as a clear winner based on its relevance score.

6.3 Sparse Neural Search with Document-only Encoder Mode

Now, let’s try searching for the same query using document-only encoder mode. Once again, we’ll use the neural_sparse query. Recall that, in document-only encoder mode, we only use an encoder model at ingestion time; at search time, we use a tokenizer model. Hence, the model ID in this case is the one for the tokenizer model opensearch-neural-sparse-tokenizer-v1. Also note that, in this case, we set max_token_score to 2 because that’s the value OpenSearch recommends for the opensearch-neural-sparse-tokenizer-v1 model.

GET /my-semantic-search-index/_search
{
  "explain": false,
  "query": {
    "neural_sparse": {
      "event_doc_sparse_embedding": {
        "query_text": "novels",
        "model_id": "qlHfx4wBEu5eBjm9-8hb",
        "max_token_score": 2
      }
    }
  }
}

The response returned by OpenSearch is included in Appendix D. Note that this request only returns the event for the dramatic writing class.

6.4 Other Search Queries

Here are some other search queries you can try out to compare the results returned by the three neural search options we demonstrated in this section.

  • “school holiday outing”
  • “rhythm blues”
  • “author”
  • “rave”

Feel free to try out other search queries and/or index your own data and try sample queries from your own domain.

7 Summary

In this blog post, we demonstrated how to use OpenSearch’s neural search support in a local environment using OpenSearch-provided pre-trained models.

Appendix A: Indexing Documents

PUT /my-semantic-search-index/_doc/1
{
"event_text": """Join us for a day of laughter, joy, and endless smiles at our spectacular Family and Kids Event!
This enchanting celebration is designed to bring families together for a memorable experience filled
with excitement and wholesome activities.
Highlights:
1. Magical Adventure Zone:
Step into a world of enchantment where your little ones can embark on a magical journey. From face painting
to interactive story sessions, the Adventure Zone promises endless wonders for the young at heart.
2. Arts and Crafts Wonderland:
Unleash your creativity in our Arts and Crafts Wonderland! Join hands with your kids to create beautiful
masterpieces and cherished memories together. From DIY crafts to painting stations, let imagination run wild.
3. Live Entertainment & Performances:
Be captivated by live performances that will leave the whole family in awe. Magicians, puppet shows, and
lively music will fill the air with laughter and cheer.
4. Delicious Delights:
Indulge in a variety of mouthwatering treats from our food stalls. From cotton candy to popcorn, there's
something to satisfy every craving.
5. Family Games & Contests:
Engage in friendly competition with our exciting games and contests. Prizes await the winners, but the real
reward is the bonding experience with your loved ones.
6. Petting Zoo & Pony Rides:
Connect with furry friends at our delightful petting zoo and give your little ones the joy of pony rides.
A perfect opportunity for animal lovers of all ages.
7. Photo Booth Fun:
Capture the magic of the day at our themed photo booths. Dress up in costumes, strike a pose, and take home
cherished snapshots to remember the event.
Don't miss out on this wonderful day of family bonding and laughter! Bring your loved ones and create lasting
memories at our Family Fun Extravaganza. Purchase your tickets now and let the festivities begin!""",
"id": "d1"
}
PUT /my-semantic-search-index/_doc/2
{
"event_text": """Unlock the Power of Pen: Dramatic Writing Class
Embark on a journey into the captivating world of storytelling with our exhilarating Dramatic Writing Class!
Whether you're a seasoned wordsmith or a budding storyteller, this course promises to ignite your imagination,
refine your narrative skills, and unleash the dramatist within you.
What to Expect:
1. Craft Compelling Narratives:
Dive deep into the art of storytelling as we explore the intricacies of plot development, character arcs, and
creating tension. Uncover the secrets of crafting narratives that leave a lasting impact.
2. Character Development Workshop:
Bring your characters to life! Learn the techniques to create multidimensional characters with relatable motivations
and authentic voices. Delve into the psychology of your characters and make them resonate with your audience.
3. Mastering Dialogue:
Discover the power of dialogue in creating riveting scenes. From natural conversations to intense confrontations,
our class will guide you in crafting dialogues that ring true and enhance the emotional depth of your writing.
4. Exploring Genres:
Whether your heart lies in drama, suspense, or comedy, our class will help you navigate various genres. Unleash
your creativity by experimenting with different styles, and find your unique voice as a dramatic writer.
5. Feedback and Critique Sessions:
Engage in constructive feedback sessions with both peers and the instructor. Learn to give and receive feedback
effectively, fostering a supportive environment that nurtures growth and improvement.
6. Writing Exercises and Prompts:
Sharpen your skills through engaging writing exercises and thought-provoking prompts. Challenge yourself to think
outside the box and push the boundaries of your creativity.
7. Industry Insights:
Gain valuable insights into the world of dramatic writing, including tips on submitting your work, approaching
agents, and navigating the publishing or scriptwriting industry.
Enrollment Details:
Limited spots available
Ignite your passion for dramatic writing and transform your stories into compelling masterpieces! Reserve your
spot now for an unforgettable journey into the heart of storytelling. Unleash the storyteller within you and
let your words resonate with the world.""",
"id": "d2"
}
PUT /my-semantic-search-index/_doc/3
{
"event_text": """Urban Groove Explosion: Hip Hop & Rap Extravaganza!
Get ready to set the stage on fire at the hottest Hip Hop & Rap Party in town! Join us for a night of non-stop
beats, slick rhymes, and an electrifying atmosphere that will have you moving and grooving all night long.
Highlights:
1. Live Performances by Local Talents:
Immerse yourself in the raw energy of local hip hop and rap artists who will be taking the stage by storm.
From lyrically intense performances to mind-blowing beats, get ready for an unforgettable showcase of talent.
2. DJ Showdown:
Our top DJs will be spinning the latest and greatest hip hop and rap tracks, keeping the dance floor alive
with pulsating beats. Expect a mix of old school classics and chart-topping hits that will keep you on your
feet all night.
3. Freestyle Cypher Sessions:
Channel your inner wordsmith and step up to the mic during our freestyle cypher sessions. It's a chance to
showcase your lyrical prowess or simply enjoy the spontaneous flow of words from the crowd.
4. Urban Dance Battles:
Witness jaw-dropping dance battles featuring the slickest moves and mind-blowing choreography. Whether you're
a seasoned dancer or just love to vibe to the rhythm, these battles will leave you in awe.
5. Graffiti Art Showcase:
Immerse yourself in the visual side of hip hop culture with a live graffiti art showcase. Talented artists
will be creating vibrant pieces right before your eyes, adding an extra layer of urban flair to the atmosphere.
6. Street Food Delights:
Refuel with mouthwatering street food options that capture the essence of the urban scene. From gourmet sliders
to creative fusion bites, satisfy your cravings while enjoying the beats.
Dress Code:
Embrace the urban vibe with your freshest streetwear, sneakers, and hip hop swag!
Get ready for an epic night of rhythm, rhymes, and urban vibes at the Urban Groove Explosion! Grab your crew,
secure your tickets, and let's make this an unforgettable night of hip hop and rap celebration,
#UrbanGrooveExplosion #HipHopParty #RapRevolution""",
"id": "d3"
}
PUT /my-semantic-search-index/_doc/4
{
"event_text": """Harmony in Elegance: A Night of Classical Splendor
Indulge your senses in an enchanting evening of refined melodies and timeless compositions at "Harmony in Elegance."
Join us for a classical music event that promises to transport you to a world of sophistication and emotive beauty.
Program Highlights:
1. Masterful Orchestral Performance:
Immerse yourself in the sublime sounds of a world-class orchestra as they bring to life classical masterpieces
from renowned composers. Experience the rich textures and emotive nuances of timeless symphonies.
2. Virtuoso Solo Performances:
Witness the breathtaking skill of accomplished soloists as they showcase their artistry on instruments like the
violin, cello, and piano. Let their virtuosity leave you spellbound in a cascade of musical brilliance.
3. Chamber Music Intimacy:
Delight in the intimacy of chamber music as talented ensembles perform in perfect harmony. Experience the interplay
of instruments in an up-close and personal setting, revealing the intricate beauty of classical compositions.
4. Timeless Compositions Reimagined:
Be captivated by innovative interpretations of classical compositions, breathing new life into familiar pieces
while preserving the essence and charm that has endured through the ages.
5. Acoustic Brilliance:
Revel in the acoustics of our carefully chosen venue, ensuring that every note and nuance resonates with clarity,
allowing you to fully appreciate the depth and emotion within each musical piece.
Dress Code:
Embrace the elegance of the evening with semi-formal or formal attire.
Intermission Delights:
Indulge in the refinement of our intermission reception, featuring light refreshments and an opportunity to
mingle with fellow classical music enthusiasts.
Don't miss this extraordinary evening of musical refinement and emotional resonance. "Harmony in Elegance"
invites you to bask in the timeless beauty of classical music. Secure your tickets now for a night that
promises to be a symphony for the soul.
#HarmonyInElegance #ClassicalMusicEvening #MusicalMastery""",
"id": "d4"
}

Appendix B: Dense Neural Search Response

To save space, we only include the source “id” field in the response.

{
  "took": 42,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 0.013666019,
    "hits": [
      {
        "_index": "my-semantic-search-index",
        "_id": "2",
        "_score": 0.013666019,
        "_source": {
          "id": "d2"
        }
      },
      {
        "_index": "my-semantic-search-index",
        "_id": "4",
        "_score": 0.012640561,
        "_source": {
          "id": "d4"
        }
      },
      {
        "_index": "my-semantic-search-index",
        "_id": "1",
        "_score": 0.011391505,
        "_source": {
          "id": "d1"
        }
      },
      {
        "_index": "my-semantic-search-index",
        "_id": "3",
        "_score": 0.010170446,
        "_source": {
          "id": "d3"
        }
      }
    ]
  }
}

Appendix C: Sparse Neural Search with Bi-encoder Mode Response

To save space, we only include the source “id” field in the response.

{
  "took": 83,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 3.1848598,
    "hits": [
      {
        "_index": "my-semantic-search-index",
        "_id": "2",
        "_score": 3.1848598,
        "_source": {
          "id": "d2"
        }
      },
      {
        "_index": "my-semantic-search-index",
        "_id": "3",
        "_score": 0.69234645,
        "_source": {
          "id": "d3"
        }
      },
      {
        "_index": "my-semantic-search-index",
        "_id": "1",
        "_score": 0.38709638,
        "_source": {
          "id": "d1"
        }
      },
      {
        "_index": "my-semantic-search-index",
        "_id": "4",
        "_score": 0.34997076,
        "_source": {
          "id": "d4"
        }
      }
    ]
  }
}

Appendix D: Document-Only Encoder Mode Response

To save space, we only include the source “id” field in the response.

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.13542217,
    "hits": [
      {
        "_index": "my-semantic-search-index",
        "_id": "2",
        "_score": 0.13542217,
        "_source": {
          "id": "d2"
        }
      }
    ]
  }
}
