Building resilient Amazon OpenSearch cluster with AWS CDK (part 4)

Mikhail Chumakov
Life at Apollo Division


Set up access to cluster from Lambdas

We’ve configured access to the cluster for our engineers; now we should set up access for our code. The idea is the same as above, with small differences. When we define a Lambda function, it comes with an automatically generated role (unless we explicitly provide one), which the function assumes during execution. To give a Lambda access to the OS cluster, we need to configure the mapping between an OS backend role and the Lambda’s role. Instead of mapping each Lambda role one-to-one to an OS backend role, we gather Lambdas into logical groups, create one backend role per group, and map the Lambda roles to it. We already have everything we need for that: we can reuse the function we created previously to configure the OS cluster.

Let’s assume that somewhere in our AWS CDK code we need to configure Lambda access. In this case, our helper method sends two requests to the cluster.

In the first request, we create a role named group1 and give it cluster_composite_ops permissions at the cluster level and full access at the index level. This configuration is for demonstration purposes only: never use it in a production environment, and always follow the principle of least privilege when configuring access for a backend role (you can read more about OS service permissions in this article). In the second request, we add a mapping between the group1 role and the Lambda roles.
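The two requests described above can be sketched in Python as follows. This is a minimal illustration, not the helper from our CDK code: the function name `build_group_requests` and the role ARNs are hypothetical, and `indices_all` is assumed here as the stand-in for "full access at the index level". The paths are the OpenSearch Security plugin endpoints for roles and role mappings; in the plugin's terms, the Lambda IAM role ARNs go into the `backend_roles` list of the mapping.

```python
import json

# Hypothetical Lambda role ARNs; in CDK these would come from each function's role.
LAMBDA_ROLE_ARNS = [
    "arn:aws:iam::123456789012:role/orders-lambda-role",
    "arn:aws:iam::123456789012:role/search-lambda-role",
]


def build_group_requests(group_name, lambda_role_arns):
    """Return the two (method, path, body) requests for one Lambda group."""
    # Request 1: create the role with demo-only, overly broad permissions.
    create_role = (
        "PUT",
        f"/_plugins/_security/api/roles/{group_name}",
        {
            "cluster_permissions": ["cluster_composite_ops"],
            "index_permissions": [
                {"index_patterns": ["*"], "allowed_actions": ["indices_all"]}
            ],
        },
    )
    # Request 2: map the Lambda IAM roles to the role created above.
    map_role = (
        "PUT",
        f"/_plugins/_security/api/rolesmapping/{group_name}",
        {"backend_roles": sorted(lambda_role_arns)},
    )
    return [create_role, map_role]


if __name__ == "__main__":
    for method, path, body in build_group_requests("group1", LAMBDA_ROLE_ARNS):
        print(method, path)
        print(json.dumps(body, indent=2))
```

The sketch only builds the requests; sending them to the domain endpoint (with a signed HTTP call) is left to whatever function already configures the cluster.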

Important: because we chose IAM for our master user, all requests to the cluster must be signed using AWS Signature Version 4.
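To show what signing involves, here is a minimal Python sketch of Signature Version 4 that uses only the standard library. It is an illustration under simplifying assumptions (no query string, a path made of unreserved characters and slashes, a UTC timestamp), and the function name `sign_request` is our own; in real code a maintained signing library is the safer choice. The service name `es` is the one used for Amazon OpenSearch Service domains.

```python
import datetime
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_request(method, host, path, body, region, access_key, secret_key,
                 session_token=None, now=None):
    """Return request headers that carry a SigV4 signature.

    Assumes no query string and a path that needs no URI encoding.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    service = "es"

    headers = {"host": host, "x-amz-date": amz_date}
    if session_token:
        # Lambda credentials are temporary, so the token must be signed too.
        headers["x-amz-security-token"] = session_token

    # Step 1: canonical request (the empty string is the empty query string).
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    canonical_request = "\n".join(
        [method, path, "", canonical_headers, signed_headers, payload_hash]
    )

    # Step 2: string to sign.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Step 3: derive the signing key and compute the signature.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(
        key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers
```

The returned headers are attached to the HTTP request as-is; the body that is sent must be byte-for-byte the one that was hashed, otherwise the cluster rejects the signature.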

The Amazon OpenSearch Service documentation has a good article with examples of how to sign HTTP requests to the cluster for different runtimes. Unfortunately, there is no example for .NET (most of our Lambdas are written in .NET). Following the principle of “do not reinvent the wheel”, we did some investigation and found a great package for the .NET NEST client that solves the problem. Now we are ready to go further.

Set up cross-cluster replication

Let’s assume that we have resolved all the questions mentioned above and rolled out the infrastructure in the second region. It’s time to set up a cross-cluster connection between the domains. Because we could automate this step only partially (maybe we just didn’t find a way; feel free to give us advice in the comments), we decided to execute it manually. The easiest way to connect domains is through the Connections tab of the domain dashboard (this part can be automated). OpenSearch Service then sends a connection request to the destination domain for approval. Once the destination domain approves the request (this is the part we could not automate), you can begin replication.

There are several ways you can set up data replication between clusters:

  • you can replicate a single index
  • you can replicate indexes using a wildcard pattern
  • you can create auto-follow rules which allow you to automatically replicate indexes created on the leader cluster based on matching patterns.

In our case, auto-follow rules fit better than other options.

Replication rules are a collection of patterns that you create against a single follower cluster. When you create a replication rule, it starts by automatically replicating any existing indexes that match the pattern. It will then continue to replicate any new indexes that you create that match the pattern.

Example request body for creating a replication rule on the follower cluster (sent as POST _plugins/_replication/_autofollow):

{
  "leader_alias": "my-connection-alias",
  "name": "replication-rule-name",
  "pattern": "my-index*",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}

It is worth mentioning that you need to specify the leader and follower cluster roles that OpenSearch uses to authenticate requests. As mentioned earlier in this article, we use the admin user with the all_access role for simplicity, but it is recommended to create a replication user on each cluster and map it accordingly (you can find some recommendations here).
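The replication options listed earlier come down to two kinds of requests sent to the follower cluster, sketched below in Python. The function names are hypothetical and the role defaults mirror the demo-only all_access setup; the paths are those of the OpenSearch cross-cluster replication plugin for auto-follow rules and for starting replication of a single index.

```python
import json


def _use_roles(leader_role, follower_role):
    return {
        "leader_cluster_role": leader_role,
        "follower_cluster_role": follower_role,
    }


def autofollow_rule_request(leader_alias, name, pattern,
                            leader_role="all_access",
                            follower_role="all_access"):
    """Build the request that creates an auto-follow rule."""
    body = {
        "leader_alias": leader_alias,
        "name": name,
        "pattern": pattern,
        "use_roles": _use_roles(leader_role, follower_role),
    }
    return "POST", "/_plugins/_replication/_autofollow", body


def start_replication_request(leader_alias, leader_index, follower_index,
                              leader_role="all_access",
                              follower_role="all_access"):
    """Build the request that starts replicating one index."""
    body = {
        "leader_alias": leader_alias,
        "leader_index": leader_index,
        "use_roles": _use_roles(leader_role, follower_role),
    }
    return "PUT", f"/_plugins/_replication/{follower_index}/_start", body


if __name__ == "__main__":
    method, path, body = autofollow_rule_request(
        "my-connection-alias", "replication-rule-name", "my-index*"
    )
    print(method, path)
    print(json.dumps(body, indent=2))
```

Passing dedicated replication roles instead of the defaults is a matter of supplying `leader_role` and `follower_role`, which keeps the demo-only all_access setup out of production calls.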

Now our indexes are replicating between regions, and we are ready for the next step: switching regions if the main one experiences an outage.

We are ACTUM Digital and this piece was written by Mikhail Chumakov, Senior .NET Developer of Apollo Division. Feel free to get in touch.
