Using AWS OpenSearch

How to add a search layer to DynamoDB

Published in NEXT Engineering

We adopted OpenSearch to add a search layer on top of DynamoDB. In this blog post, I’ll share why we chose OpenSearch, explain its core concepts, and show how we integrate and debug it within our systems.

Why OpenSearch?

Before transitioning to OpenSearch, we relied on in-memory search techniques with DynamoDB. However, this approach struggled with scalability and efficiency, especially when dealing with large datasets and complex queries involving vector searches. OpenSearch provides several critical advantages:

Scalability: Efficiently handles large datasets without significant performance degradation.
Aggregation: Supports complex filters and aggregations, enabling sophisticated data analytics and dashboards.
Combined Searches: Facilitates both vector and traditional keyword searches, allowing for versatile querying options.
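
To make the combination of filters, aggregations, keyword search, and vector search concrete, here is a minimal sketch of a search request body built in Python. The field names (`title`, `labels`, `embedding`) and the aggregation name are hypothetical and not taken from our actual indexes. It also assumes that `embedding` is mapped as a k-NN vector field and that the scores of the two `should` clauses are simply added together, which may need tuning in practice.

```python
import json


def build_combined_query(text, vector, label, k=10):
    """Build a search body that mixes a filter, keyword and vector clauses, and an aggregation."""
    return {
        "size": k,
        "query": {
            "bool": {
                # Exact-match filter on a keyword field; it does not affect scoring.
                "filter": [{"term": {"labels": label}}],
                "should": [
                    # Traditional full-text search on an analyzed text field.
                    {"match": {"title": text}},
                    # Approximate nearest-neighbor search on a vector field.
                    {"knn": {"embedding": {"vector": vector, "k": k}}},
                ],
                # A hit must satisfy at least one of the two search clauses.
                "minimum_should_match": 1,
            }
        },
        # Count matching documents per label, e.g. for a dashboard facet.
        "aggs": {"items_per_label": {"terms": {"field": "labels", "size": 20}}},
    }


body = build_combined_query("quarterly review", [0.1, 0.2, 0.3], "customer-call")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the relevant index.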

However, we found one downside: while the rest of our stack is fully serverless and elastic, OpenSearch is not. AWS bills for OpenSearch capacity regardless of whether it is actually used.

Core Concepts of OpenSearch

Understanding OpenSearch’s core concepts is crucial for effective implementation:

Items: The fundamental entities stored in OpenSearch, such as recordings, highlights, and clusters.
Indexing: Organizes data for efficient retrieval. Each item type has its own index, defined by mappings that specify how data fields should be indexed.
Mappings: Define how various data fields are treated. For example, titles might be indexed for free text search but not for aggregation, while dates are indexed for range queries.
Aliases: Allow seamless switching between different indexes, ensuring minimal disruption during updates.
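
As an illustration of mappings and aliases, the sketch below builds two request bodies: one for creating an index and one for repointing an alias. The field names, index names, and vector dimension are assumptions made for the example. The first body would go to the create-index API, the second to the `_aliases` API, which applies all listed actions atomically.

```python
def build_index_body(vector_dimension):
    """Body for creating an index; all field names are hypothetical."""
    return {
        # Enables k-NN search for this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Analyzed for free-text search; not usable for aggregations by default.
                "title": {"type": "text"},
                # Stored as exact values, suitable for filters and aggregations.
                "labels": {"type": "keyword"},
                # Supports range queries such as "last 30 days".
                "created_at": {"type": "date"},
                # Dense vector used for similarity search.
                "embedding": {"type": "knn_vector", "dimension": vector_dimension},
            }
        },
    }


def build_alias_swap(alias, old_index, new_index):
    """Body for the _aliases API that moves an alias from one index to another."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


index_body = build_index_body(vector_dimension=384)
swap_body = build_alias_swap("highlights", "highlights-v1", "highlights-v2")
```

Because the application only ever queries the alias, a new index version can be built and filled in the background and then swapped in with a single request.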

Feeding Data into OpenSearch

We maintain DynamoDB as our source of truth, leveraging its reliability for data storage. Data is then indexed into OpenSearch through two primary methods:

Bootstrapping: A tool creates the OpenSearch index and then copies all relevant data from DynamoDB into it.
Continuous Updates: A Lambda function ensures real-time synchronization by updating OpenSearch with any changes made in DynamoDB. The Lambda function is triggered by DynamoDB stream events.

Continuous update flow between DynamoDB and OpenSearch
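
The continuous update path can be sketched roughly as follows. This is not our production Lambda: it assumes a partition key named `id`, a hypothetical alias `highlights`, and a stream configured to include new images. It only translates DynamoDB stream records into the newline-delimited body expected by the `_bulk` API. Sending that body to the cluster, including authentication and error handling, is left out.

```python
import json

INDEX_ALIAS = "highlights"  # hypothetical alias name


def from_dynamodb(attr):
    """Convert one DynamoDB attribute value to plain Python (common types only)."""
    (type_tag, value), = attr.items()
    if type_tag in ("S", "BOOL"):
        return value
    if type_tag == "N":
        # DynamoDB transports numbers as strings.
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "NULL":
        return None
    if type_tag == "SS":
        return list(value)
    if type_tag == "L":
        return [from_dynamodb(v) for v in value]
    if type_tag == "M":
        return {k: from_dynamodb(v) for k, v in value.items()}
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def stream_records_to_bulk(records):
    """Translate DynamoDB stream records into a _bulk request body (NDJSON)."""
    lines = []
    for record in records:
        doc_id = from_dynamodb(record["dynamodb"]["Keys"]["id"])
        action_meta = {"_index": INDEX_ALIAS, "_id": doc_id}
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": action_meta}))
        else:  # INSERT or MODIFY: (re)index the full new image
            image = record["dynamodb"]["NewImage"]
            doc = {k: from_dynamodb(v) for k, v in image.items()}
            lines.append(json.dumps({"index": action_meta}))
            lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


def handler(event, context):
    """Lambda entry point; here it only builds and returns the bulk body."""
    return stream_records_to_bulk(event["Records"])


sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"id": {"S": "h1"}},
                "NewImage": {"id": {"S": "h1"}, "title": {"S": "Intro"}, "score": {"N": "3"}},
            },
        },
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "h2"}}}},
    ]
}
bulk_body = handler(sample_event, None)
```

Using the document ID from DynamoDB as `_id` keeps the operation idempotent: replaying the same stream event overwrites the same document instead of creating a duplicate.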

Denormalization

Denormalization is vital for efficient querying in OpenSearch. Unlike a relational database, OpenSearch is not designed for joining data across indexes at query time, so everything a query filters on should be stored on the document itself. For instance, if we need to filter highlights based on labels attached to recordings, we store these labels directly within the highlight records.

Denormalization is done by a Lambda function that is triggered by relevant changes in DynamoDB. We also store the denormalized data in DynamoDB, so it can be easily monitored and debugged.
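
The core of such a denormalization step might look like the sketch below. The attribute names (`recording_id`, `labels`, `recording_labels`) are hypothetical, and loading from and writing back to DynamoDB is omitted. The function only shows how the labels of a recording are copied onto a highlight.

```python
def denormalize_highlight(highlight, recordings_by_id):
    """Return a copy of the highlight with its recording's labels embedded."""
    recording = recordings_by_id.get(highlight["recording_id"], {})
    return {
        **highlight,
        # Sorted so that repeated runs produce identical output.
        "recording_labels": sorted(recording.get("labels", [])),
    }


recordings = {"r1": {"id": "r1", "labels": ["sales", "customer-call"]}}
highlight = {"id": "h1", "recording_id": "r1", "title": "Pricing question"}
denormalized = denormalize_highlight(highlight, recordings)
```

With the labels stored on the highlight, filtering highlights by recording label becomes a plain `term` filter on a single index.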

Troubleshooting OpenSearch queries

OpenSearch has a very powerful query syntax, which on the flip side means it is also prone to human error.

OpenSearch comes with a UI called OpenSearch Dashboards, which allows us to inspect indexes, mappings, and aliases. Most importantly, it also allows us to send and refine queries directly.

We use these steps to debug queries with OpenSearch Dashboards:

Trigger the Query: Initiate the query via our app.
Retrieve the Query: Locate the specific query in CloudWatch logs.
Analyze in Dashboard: Paste the query into OpenSearch Dashboards, refine it, and test different parameters to diagnose the problem.
Fix: Bring the modified query back into the source code.
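
One way to make the "retrieve" and "analyze" steps less error-prone is to log each query as a single JSON line and convert that line into the request format used by the Dev Tools console in OpenSearch Dashboards. The sketch below assumes a log format of our own invention (`event`, `index`, `body`); it is not a format that OpenSearch or CloudWatch prescribes.

```python
import json


def format_query_log(index, query_body):
    """Serialize a query as one JSON line, e.g. to print from a Lambda function."""
    return json.dumps({"event": "opensearch_query", "index": index, "body": query_body})


def console_request_from_log_line(line):
    """Turn a logged query line into a request that can be pasted into Dev Tools."""
    entry = json.loads(line)
    return f"GET {entry['index']}/_search\n{json.dumps(entry['body'], indent=2)}"


log_line = format_query_log("highlights", {"query": {"match": {"title": "pricing"}}})
console_request = console_request_from_log_line(log_line)
print(console_request)
```

Logging the full request body on one line keeps it easy to find in CloudWatch, and copying it out requires no manual reformatting.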

Conclusion

OpenSearch enabled us to efficiently manage large datasets and complex queries. By understanding its core concepts, implementing effective data integration strategies, and leveraging its powerful debugging tools, we continue to optimize our search for users.

Happy coding!