How to create an Opensearch cluster using Terraform

Hossein Heydari
Published in HeyJobs Tech
Feb 15, 2023 · 8 min read


Introduction

Recently at HeyJobs, we decided to migrate from a third-party search service to our own search service.
I was in charge of provisioning the search engine infrastructure (in our case, Opensearch): I had to configure a custom domain for the Opensearch dashboard, follow the fine-grained access control concept, and set up a private network so that Opensearch is only accessible through VPN.

As I always follow an infrastructure-as-code approach, everything is written in Terraform, and I thought this could be a good topic to write about today :)
In this article I’m going to explain how to set up the resources mentioned above as a Terraform module and walk you through every piece of code.

TL;DR

In case you’re just looking for the whole source code, take a look at the GitHub repository below:

https://github.com/binary01ninja/Opensearch-terraform-module/tree/master

Prerequisites

Before we move ahead, I need to explain that when I created this module I already had some of the data sources I needed, such as the VPC, subnets, Route53 hosted zone and ACM certificate.

Most probably you already have a VPC and subnets configured (the AWS default VPC), but you may need to create the Route53 hosted zone and ACM certificate.

The links below can help you add the Route53 hosted zone and ACM certificate resources to your Terraform codebase:

ACM Certificate

Route53 Zone
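If you don’t have them yet, here is a minimal sketch (not part of the original module) of how the zone and a DNS-validated certificate could look as resources. The domain name example.com is a placeholder, and if you go this route you would reference these resources instead of the data sources used later in this article:

# Placeholder hosted zone; replace example.com with your own domain.
resource "aws_route53_zone" "opensearch" {
  name = "example.com"
}

# Wildcard certificate so subdomains (like the Opensearch custom endpoint) are covered.
resource "aws_acm_certificate" "opensearch" {
  domain_name               = aws_route53_zone.opensearch.name
  subject_alternative_names = ["*.${aws_route53_zone.opensearch.name}"]
  validation_method         = "DNS"
}

# DNS records that prove ownership of the domain to ACM.
resource "aws_route53_record" "opensearch_cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.opensearch.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  zone_id = aws_route53_zone.opensearch.zone_id
  name    = each.value.name
  type    = each.value.type
  ttl     = 60
  records = [each.value.record]
}

# Waits until the certificate is actually issued.
resource "aws_acm_certificate_validation" "opensearch" {
  certificate_arn         = aws_acm_certificate.opensearch.arn
  validation_record_fqdns = [for record in aws_route53_record.opensearch_cert_validation : record.fqdn]
}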

In my case I created a data.tf file to keep all my data sources. Since this is a module that is going to be reusable, most of the attributes will be defined as variables, so let’s create a variables.tf :

variable "vpc" { type = string }
variable "hosted_zone_name" { type = string }

And data.tf looks like this:

data "aws_vpc" "selected" {
tags = {
Name = var.vpc
}
}

data "aws_subnets" "private" {
filter {
name = "tag:Name"
values = ["*private*"]
}
filter {
name = "vpc-id"
values = [data.aws_vpc.selected.id]
}
}


data "aws_route53_zone" "opensearch" {
name = var.hosted_zone_name
}
data "aws_acm_certificate" "opensearch" {
domain = var.hosted_zone_name
}

If you don’t have the word private in your subnets’ names, you can filter them based on other conditions; that is up to you.
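For example, here is a sketch of filtering by a Tier tag instead; the tag key and value are assumptions about your naming scheme, not something this module requires:

# Alternative subnet lookup, assuming your private subnets carry a Tier = "private" tag.
data "aws_subnets" "private" {
  filter {
    name   = "tag:Tier"
    values = ["private"]
  }
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.selected.id]
  }
}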

We also need the AWS caller identity and AWS region data sources, so add them to the data.tf file as well:

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

What resources do we need?

The list below contains all the resources we need to create to reach our goal.

  • Opensearch Domain
  • Route53 Record (to have a custom subdomain/domain for the Opensearch API/Dashboard)
  • Security Group (to control the incoming traffic towards Opensearch)
  • Cloudwatch Log Groups (to enable index slow logs, search slow logs, application logs, etc. for Opensearch)
  • SSM Parameter (to store the Opensearch master user and password)

We keep all the configuration in a file named main.tf.

Locals and Variables

Since we’re developing an Opensearch module, it should be flexible enough to be reused later, so most of the specifications are defined as variables.

In main.tf, create a terraform block and define the required Terraform version; I pinned it to 1.3.3:

terraform {
  required_version = "~> 1.3.3"
}
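Optionally, you can also pin the providers this module uses (the aws and random providers). The version constraints below are only examples, not the versions the original module was tested with:

terraform {
  required_version = "~> 1.3.3"

  required_providers {
    # Used for all Opensearch, Route53, CloudWatch and SSM resources.
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0"
    }
    # Used to generate the Opensearch master user password.
    random = {
      source  = "hashicorp/random"
      version = ">= 3.0"
    }
  }
}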

Since we’re going to refer to some values multiple times while creating resources, it’s better to define them in a Locals block:

locals {
  domain        = "${var.service}-engine"
  custom_domain = "${local.domain}.${data.aws_route53_zone.opensearch.name}"
  subnet_ids    = slice(data.aws_subnets.private.ids, 0, var.instance_count)
  master_user   = "${var.service}-masteruser"
}

domain is the name of the Opensearch domain.

custom_domain is the subdomain under the Route53 zone mentioned earlier; it is what you will use to access the Opensearch Dashboard and API. In my case the zone is a data source because it already existed; if you are creating the zone in the same configuration, reference the resource instead.

subnet_ids will be an array of the subnet IDs retrieved by the data source.

When zone_awareness_enabled is set to true, AWS distributes the nodes across availability zones to prevent downtime and data loss, making the cluster fault tolerant when a zone experiences a service disruption.

To make this distribution work, you need to make sure that the number of subnets in the Opensearch VPC configuration matches the number of nodes you defined. Since for production we have three dedicated master nodes, using the slice function gives us exactly three subnets even if more are available.

slice(list, startindex, endindex)
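A quick example of how slice behaves (startindex is inclusive, endindex is exclusive), using made-up subnet IDs:

locals {
  all_subnet_ids  = ["subnet-aaa", "subnet-bbb", "subnet-ccc", "subnet-ddd"]
  first_three_ids = slice(local.all_subnet_ids, 0, 3) # ["subnet-aaa", "subnet-bbb", "subnet-ccc"]
}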

If your cluster node configuration and the number of availability zones do not add up (one subnet and three master nodes, for example), the AWS API will raise errors while you apply the Terraform configuration.

For more information about configuring a multi-AZ Opensearch domain, take a look at this article.

Update variables.tf ; since this is a module, it makes sense to let every attribute be a variable:

variable "security_options_enabled" { type = bool }
variable "volume_type" {
type = string
}
variable "throughput" {
type = number
}
variable "ebs_enabled" {
type = bool
}
variable "ebs_volume_size" {
type = number
}
variable "service" { type = string }
variable "instance_type" { type = string }
variable "instance_count" { type = number }
variable "dedicated_master_enabled" {
type = bool
default = false
}
variable "dedicated_master_count" {
type = number
default = 0
}
variable "dedicated_master_type" {
type = string
default = null
}
variable "zone_awareness_enabled" {
type = bool
default = false
}
variable "engine_version" {
type = string
}
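If you want the module to fail fast instead of relying on the AWS API errors mentioned earlier, you could also add a validation block to a variable. This is just a sketch of one possible guard (dedicated master nodes either disabled or at least three), not part of the original module:

variable "dedicated_master_count" {
  type    = number
  default = 0

  validation {
    # Either no dedicated masters at all, or the recommended minimum of three.
    condition     = var.dedicated_master_count == 0 || var.dedicated_master_count >= 3
    error_message = "dedicated_master_count must be 0 (disabled) or at least 3 for a stable cluster."
  }
}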

AWS VPN and Security Groups

We need to create a security group for the Opensearch cluster that only allows incoming traffic over port 443 on the TCP protocol within the VPC.

In our case we needed to call the Opensearch API with Postman and access its dashboard via the browser, but at the same time we didn’t want to make Opensearch publicly accessible on the Internet; that’s why we have an AWS VPN configured.

AWS VPN works as a gateway between AWS VPCs and other networks, so if you are connected to the AWS VPN, you are effectively inside the VPC and can access the internal resources available within it.

As we want our cluster to be accessible only via the VPN, we define an ingress rule on the Opensearch security group that references the VPN’s security group and allows incoming traffic from it.

Go ahead and add the resources below to main.tf :

resource "aws_security_group" "opensearch_security_group" {
name = "${local.domain}-sg"
vpc_id = data.aws_vpc.selected.id
description = "Allow inbound HTTP traffic"

ingress {
description = "HTTP from VPC"
from_port = 443
to_port = 443
protocol = "tcp"

cidr_blocks = [
data.aws_vpc.selected.cidr_block,
]
}
}

resource "aws_security_group_rule" "allow_opensearch_ingress_vpn" {
type = "ingress"
from_port = 443
to_port = 443
protocol = "tcp"
source_security_group_id = # VPN's security group source Id
security_group_id = aws_security_group.opensearch_security_group.id
description = "Allow connections from AWS VPN"
}

Cloudwatch Logs

We’re going to add log_publishing_options to the Opensearch cluster configuration, so we need to create several CloudWatch log groups to hold the log types below:

  1. INDEX_SLOW_LOGS
  2. SEARCH_SLOW_LOGS
  3. ES_APPLICATION_LOGS

If you need to know what each log type does and how it works, check this link.

In main.tf, add the resources below:

resource "aws_cloudwatch_log_group" "opensearch_log_group_index_slow_logs" {
name = "/aws/opensearch/${local.domain}/index-slow"
retention_in_days = 14
}


resource "aws_cloudwatch_log_group" "opensearch_log_group_search_slow_logs" {
name = "/aws/opensearch/${local.domain}/search-slow"
retention_in_days = 14
}


resource "aws_cloudwatch_log_group" "opensearch_log_group_es_application_logs" {
name = "/aws/opensearch/${local.domain}/es-application"
retention_in_days = 14
}

Then add a CloudWatch log resource policy that allows the Opensearch service to write into those log groups:


resource "aws_cloudwatch_log_resource_policy" "opensearch_log_resource_policy" {
policy_name = "${local.domain}-domain-log-resource-policy"

policy_document = <<CONFIG
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "es.amazonaws.com"
},
"Action": [
"logs:PutLogEvents",
"logs:PutLogEventsBatch",
"logs:CreateLogStream"
],
"Resource": [
"${aws_cloudwatch_log_group.opensearch_log_group_index_slow_logs.arn}:*",
"${aws_cloudwatch_log_group.opensearch_log_group_search_slow_logs.arn}:*",
"${aws_cloudwatch_log_group.opensearch_log_group_es_application_logs.arn}:*"
],
"Condition": {
"StringEquals": {
"aws:SourceAccount": "${data.aws_caller_identity.current.account_id}"
},
"ArnLike": {
"aws:SourceArn": "arn:aws:es:eu-central-1:${data.aws_caller_identity.current.account_id}:domain/${local.domain}"
}
}
}
]
}
CONFIG
}

Opensearch MasterUser

We create the Opensearch master user and password and save them as an SSM parameter. The password is generated randomly while the infrastructure is created and is not hardcoded anywhere. Once Terraform is done creating resources, you can check the password in the Systems Manager service in the AWS console.

resource "random_password" "password" {
length = 32
special = true
}

resource "aws_ssm_parameter" "opensearch_master_user" {
name = "/service/${var.service}/MASTER_USER"
description = "opensearch_password for ${var.service} domain"
type = "SecureString"
value = "${local.master_user},${random_password.password.result}"
}
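If another Terraform configuration (for example, the one deploying your application) needs these credentials, it could read the parameter back with a data source. This is only a sketch; the service name in the path is a placeholder for whatever you pass as var.service:

data "aws_ssm_parameter" "opensearch_master_user" {
  name = "/service/my-service/MASTER_USER" # "my-service" is a placeholder
}

# The stored value is "<master_user>,<password>", so split it into its two parts.
locals {
  opensearch_credentials = split(",", data.aws_ssm_parameter.opensearch_master_user.value)
  opensearch_user        = local.opensearch_credentials[0]
  opensearch_password    = local.opensearch_credentials[1]
}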

Opensearch Domain

And finally here’s Opensearch domain resource:

resource "aws_opensearch_domain" "opensearch" {
domain_name = local.domain
engine_version = "OpenSearch_${var.engine_version}"

cluster_config {
dedicated_master_count = var.dedicated_master_count
dedicated_master_type = var.dedicated_master_type
dedicated_master_enabled = var.dedicated_master_enabled
instance_type = var.instance_type
instance_count = var.instance_count
zone_awareness_enabled = var.zone_awareness_enabled
zone_awareness_config {
availability_zone_count = var.zone_awareness_enabled ? length(local.subnet_ids) : null
}
}

advanced_security_options {
enabled = var.security_options_enabled
anonymous_auth_enabled = true
internal_user_database_enabled = true
master_user_options {
master_user_name = local.master_user
master_user_password = random_password.password.result
}
}

encrypt_at_rest {
enabled = true
}

domain_endpoint_options {
enforce_https = true
tls_security_policy = "Policy-Min-TLS-1-2-2019-07"

custom_endpoint_enabled = true
custom_endpoint = local.custom_domain
custom_endpoint_certificate_arn = data.aws_acm_certificate.opensearch.arn
}

ebs_options {
ebs_enabled = var.ebs_enabled
volume_size = var.ebs_volume_size
volume_type = var.volume_type
throughput = var.throughput
}

log_publishing_options {
cloudwatch_log_group_arn = aws_cloudwatch_log_group.opensearch_log_group_index_slow_logs.arn
log_type = "INDEX_SLOW_LOGS"
}
log_publishing_options {
cloudwatch_log_group_arn = aws_cloudwatch_log_group.opensearch_log_group_search_slow_logs.arn
log_type = "SEARCH_SLOW_LOGS"
}
log_publishing_options {
cloudwatch_log_group_arn = aws_cloudwatch_log_group.opensearch_log_group_es_application_logs.arn
log_type = "ES_APPLICATION_LOGS"
}

node_to_node_encryption {
enabled = true
}

vpc_options {
subnet_ids = local.subnet_ids

security_group_ids = [aws_security_group.opensearch_security_group.id]
}


access_policies = <<CONFIG
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "es:*",
"Principal": "*",
"Effect": "Allow",
"Resource": "arn:aws:es:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:domain/${local.domain}/*"
}
]
}
CONFIG
}

What are dedicated master nodes?

Dedicated master nodes increase cluster stability: they perform cluster management tasks but do not store data or respond to data requests.
They keep track of the number of indexes in the cluster, track all the nodes, update the cluster state when applicable, and replicate cluster state changes across all the nodes.

The recommended number of dedicated master nodes for a production environment is three.

You cannot use just one dedicated master node, because if it fails there are no other nodes to back things up and manage the cluster.
Two master nodes are also not an option: when a failure occurs, the nodes hold an election to choose a master, and an even number of nodes can ruin the election result.
So the recommended number of dedicated master nodes is three.

For more information, take a look at this article.

What are data nodes?

Data nodes are responsible for holding the data and performing data-related operations such as searching, indexing and deleting.

The instance_type and instance_count in cluster_config are the type and number of your data node instances.

What are EBS Options?

From the AWS official documentation:

An Amazon EBS volume is a durable, block-level storage device that you can attach to your instances. After you attach a volume to an instance, you can use it as you would use a physical hard drive.

So you attach an EBS volume to each of your data nodes; the ebs_options block in the Opensearch configuration specifies attributes of those EBS volumes such as type and size.

For more information, take a look at this article.

Opensearch Custom Subdomain with Route53

AWS generates a random endpoint URL for the Opensearch domain (both the dashboard and the API are served from it). You need to create a Route53 CNAME record that points your custom domain at that endpoint.

resource "aws_route53_record" "opensearch_domain_record" {
zone_id = data.aws_route53_zone.opensearch.zone_id
name = local.custom_domain
type = "CNAME"
ttl = "300"

records = [aws_opensearch_domain.opensearch.endpoint]
}
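It can also be handy to expose a few values from the module. This is an optional sketch of an outputs.tf, not part of the original module:

output "endpoint" {
  description = "The VPC endpoint of the Opensearch domain"
  value       = aws_opensearch_domain.opensearch.endpoint
}

output "custom_domain" {
  description = "The custom domain pointing at the Opensearch domain"
  value       = local.custom_domain
}

output "dashboard_url" {
  description = "URL of the Opensearch Dashboards UI behind the custom domain"
  value       = "https://${local.custom_domain}/_dashboards"
}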

Final step

To use your module, define a module block and fill out the variables; this is an example of how you’d do that:

module "opensearch" {
source = "Path/to/ur/module"
vpc = "vpc_name"
hosted_zone_name = "your_hostedzone"
engine_version = "2.3"
security_options_enabled = true
volume_type = "gp3"
throughput = 250
ebs_enabled = true
ebs_volume_size = 45
service = local.service
instance_type = "m6g.large.search"
instance_count = 3
dedicated_master_enabled = true
dedicated_master_count = 3
dedicated_master_type = "m6g.large.search"
zone_awareness_enabled = true
}

At the end, you can plan and apply your Terraform configuration and then check your Opensearch domain in the AWS console; the cluster should be up and running.

Thanks for reading this article :)

Interested in joining our team? Browse our open positions or check out what we do at HeyJobs.
