AWS OpenSearch/Elasticsearch: secure IAM for bulk index operations
This is a very short post about granting minimal permissions for AWS OpenSearch through IAM policies. AWS's recommendations for setting up IAM policies for OpenSearch say that restrictions can be applied at the index level. That was just what I needed, as I had multiple indexes in one OpenSearch domain, and each had to be restricted to the jobs that write to it.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:user/test-user"
        ]
      },
      "Action": [
        "es:*"
      ],
      "Resource": "arn:aws:es:us-west-1:987654321098:domain/test-domain/test-index*"
    }
  ]
}
The policy above works perfectly fine for all es operations, such as creating, deleting and searching indexes, and restricts them to indexes whose names start with test-index. Access to all other indexes is blocked, which is exactly what we need.
Issue with Bulk Indexing
I used opensearch-py, specifically the helpers.bulk method of the Python client, which has the following signature:
def bulk(client, actions, stats_only=False, ignore_status=(), *args, **kwargs):
When I invoked this API with the index-restricted IAM policy, it failed with an error stating that no IAM policy allows es:ESHttpPost:
"errorReason": "AuthorizationException(403, '{\"Message\":\"User: arn:aws:sts::123456789012:assumed-role/test-role/AssumedRoleSession1 is not authorized to perform: es:ESHttpPost because no identity-based policy allows the es:ESHttpPost action\"}')
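Digging deeper into the OpenSearch documentation, I figured out that bulk operations can be executed in two ways:
1. POST /_bulk, where each action in the request body names its target index
2. POST /<index>/_bulk, where the index in the URL is the default target for every action in the body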
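As a rough sketch of those two request shapes, the snippet below builds the newline-delimited body for each one. The sample documents and the bulk_body helper are made up for illustration; only the two paths and the action/document line format come from the bulk API.

```python
import json

docs = [{"title": "first"}, {"title": "second"}]  # made-up sample documents


def bulk_body(docs, index=None):
    """Build a newline-delimited bulk body; name the index per action only if given."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}} if index else {"index": {}}
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(doc))     # document line
    return "\n".join(lines) + "\n"


# Way 1: the index is named inside every action line; the path is domain level.
domain_level = ("/_bulk", bulk_body(docs, index="test-index"))

# Way 2: the index is named once in the path; the action lines can leave it out.
index_level = ("/test-index/_bulk", bulk_body(docs))
```

The documents end up in the same index either way. What differs is the request path, and the request path is what the IAM policy is evaluated against.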
The helpers.bulk method uses option 1 when no index is passed, so the constructed URL looks like /test-domain/_bulk, but our IAM policy only allows /test-domain/test-index*. Okay, that is the root cause. What's the solution?
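To make the mismatch concrete, here is a small approximation of the resource check. It uses Python's fnmatch as a stand-in for IAM's wildcard matching, which is an assumption for illustration and not how IAM is implemented; is_allowed is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Resource pattern from the policy in this post; "*" is the only wildcard used.
ALLOWED_RESOURCE = "arn:aws:es:us-west-1:987654321098:domain/test-domain/test-index*"
DOMAIN_ARN = "arn:aws:es:us-west-1:987654321098:domain/test-domain"


def is_allowed(request_path):
    """Approximate the check: domain ARN plus request path against the pattern."""
    return fnmatchcase(DOMAIN_ARN + request_path, ALLOWED_RESOURCE)


print(is_allowed("/_bulk"))             # False: domain-level bulk is outside the pattern
print(is_allowed("/test-index/_bulk"))  # True: index-level bulk falls under test-index*
```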
Solution 1: Easy fix but Not Secure
In the Resource section of the IAM policy, add another entry for _bulk at the domain level. This solves the problem and allows _bulk, but it also lets the application or pipeline bulk index into any index within the domain (easy fix, but not secure).
"Action": [
  "es:*"
],
"Resource": [
  "arn:aws:es:us-west-1:987654321098:domain/test-domain/test-index*",
  "arn:aws:es:us-west-1:987654321098:domain/test-domain/_bulk"
]
Solution 2: Right way to solve this
The bulk API looks like this. Even though it does not take an index name as a named parameter, it still accepts **kwargs.
def bulk(client, actions, stats_only=False, ignore_status=(), *args, **kwargs):
As long as the **kwargs are passed on to the underlying low-level method that constructs the final OpenSearch URL, we are good. That final method is client.bulk, which has the index name as a parameter, and _process_bulk_chunk, the method that calls client.bulk, passes the **kwargs along, so all good :)
So in my code, where I invoke helpers.bulk, I changed the call to pass the index field explicitly.
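A minimal sketch of such a call is below. The function names, the document shape, and the default index name are assumptions for illustration, and client is assumed to be an already configured opensearch-py client that signs its requests; the only part taken from this post is passing index to helpers.bulk as a keyword argument.

```python
def build_actions(docs):
    """Turn a dict of id -> document into bulk actions without an _index key."""
    return [{"_id": doc_id, "_source": source} for doc_id, source in docs.items()]


def bulk_index(client, docs, index_name="test-index"):
    """Bulk index docs so that the request goes to /<index_name>/_bulk."""
    # Imported here so build_actions stays usable without opensearch-py installed.
    from opensearchpy import helpers

    # index is not a named parameter of helpers.bulk; it travels through
    # **kwargs down to client.bulk, which puts it into the request URL.
    return helpers.bulk(client, build_actions(docs), index=index_name)
```

With the index in the URL, the request falls under the test-index* resource, so the original index-restricted policy can stay as it is.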
This is exactly what **kwargs is meant for when writing libraries in Python, and the opensearch-py library uses it in exactly that way. It is explained very clearly in the Fluent Python book and on Real Python.
Here is a small example showing the usage of kwargs (this is what is happening within the opensearch-py library too):
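What follows is a minimal sketch of that pattern and not the library's actual code. The three functions are simplified, hypothetical stand-ins for helpers.bulk, _process_bulk_chunk and client.bulk, kept only to show how a keyword argument reaches the function that builds the URL.

```python
def low_level_bulk(body, index=None):
    """Stand-in for client.bulk: builds the request path from the index."""
    path = f"/{index}/_bulk" if index else "/_bulk"
    return path, body


def process_chunk(chunk, *args, **kwargs):
    """Stand-in for _process_bulk_chunk: forwards whatever extras it was given."""
    return low_level_bulk("\n".join(chunk) + "\n", *args, **kwargs)


def bulk(actions, stats_only=False, *args, **kwargs):
    """Stand-in for helpers.bulk: has no index parameter of its own."""
    return process_chunk(list(actions), *args, **kwargs)


print(bulk(["a", "b"])[0])                      # /_bulk
print(bulk(["a", "b"], index="test-index")[0])  # /test-index/_bulk
```

The top-level function never mentions index, yet the caller can still set it, because every layer hands its **kwargs to the next one.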