Monitoring AWS EKS cluster using AWS Prometheus (AMP) & AWS Grafana (AMG)

Indumathi Jayaraman
Ankercloud Engineering
Aug 2, 2023

Amazon Managed Service for Prometheus (AMP) is a fully managed, Prometheus-compatible backend that ingests, stores, and queries metrics, which you can then visualize with Grafana. It is highly scalable, provides fast and secure access to data, and offers a unified way to monitor containerized applications such as those running on AWS EKS.
With Amazon Managed Grafana (AMG), we can create Grafana dashboards and visualizations to analyze the metrics, logs, and traces of our applications. It also lets us run native Prometheus Query Language (PromQL) queries against the metrics to analyze the data of our Kubernetes cluster.

STEPS TO CREATE AWS PROMETHEUS AND GRAFANA

Step 1: Create an EKS cluster with a node group
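Step 1 does not show a command. A minimal sketch with eksctl is below. The Region, node group name, instance type, and node count are assumptions, and run_eksctl is a hypothetical wrapper that only prints the command unless DRY_RUN=0 is set, since creating a cluster incurs cost.

```shell
#!/bin/bash
# Sketch of Step 1 using eksctl. All values here are assumptions; replace them.
CLUSTER_NAME="eks-amp-prometheus-grafana-demo"
AWS_REGION="us-east-1"          # assumed Region
NODEGROUP_NAME="demo-nodes"     # hypothetical node group name
NODE_TYPE="t3.medium"           # assumed instance type
NODE_COUNT=2                    # assumed number of worker nodes

# Hypothetical wrapper: prints the eksctl command by default (DRY_RUN=1)
# and only runs it when DRY_RUN=0.
run_eksctl() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "eksctl $*"
  else
    eksctl "$@"
  fi
}

# Creates the control plane plus one node group.
run_eksctl create cluster \
  --name "$CLUSTER_NAME" \
  --region "$AWS_REGION" \
  --nodegroup-name "$NODEGROUP_NAME" \
  --node-type "$NODE_TYPE" \
  --nodes "$NODE_COUNT"
```

Review the printed command, then rerun with DRY_RUN=0 to create the cluster. Using the same CLUSTER_NAME here keeps it consistent with the IRSA scripts in Step 4.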

Step 2: Create a workspace in AWS Prometheus (AMP)

Note down the workspace ID and the Endpoint - query URL; both are required later.
Step 3: Set up the Prometheus server in our Kubernetes cluster
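The workspace can also be created from the AWS CLI, and the URLs used later follow a fixed pattern built from the Region and workspace ID. A minimal sketch follows. The function names are hypothetical, and the two aws helpers are only defined, not run.

```shell
#!/bin/bash
# Sketch: create an AMP workspace from the CLI and derive the URLs used later.

# Hypothetical helper (not run here): creates a workspace and prints its ID.
create_amp_workspace() {
  aws amp create-workspace --alias "$1" --query workspaceId --output text
}

# Hypothetical helper (not run here): prints the workspace's base Prometheus endpoint.
describe_amp_endpoint() {
  aws amp describe-workspace --workspace-id "$1" \
    --query workspace.prometheusEndpoint --output text
}

# Remote-write URL used by the Prometheus server (Step 4.3). Args: region, workspace ID.
amp_remote_write_url() {
  echo "https://aps-workspaces.$1.amazonaws.com/workspaces/$2/api/v1/remote_write"
}

# Query URL, i.e. the "Endpoint - query URL" shown in the console. Args: region, workspace ID.
amp_query_url() {
  echo "https://aps-workspaces.$1.amazonaws.com/workspaces/$2/api/v1/query"
}
```

For example, `amp_remote_write_url us-east-1 "$WORKSPACE_ID"` prints the URL that goes into the remoteWrite section of the Helm values file.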

The Prometheus server collects the metrics from inside our EKS cluster and forwards them to AMP using remote write.

3.1) Run the following Helm commands to add the chart repositories:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
helm repo update

Create a namespace for all the Kubernetes objects related to the Prometheus server:

kubectl create namespace prometheus

Step 4: Set up IAM for ingesting metrics into AMP and querying them
This creates IAM policies and IAM roles whose trust relationships are bound to Kubernetes service accounts (one policy/role pair for ingest, one for query).
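The trust policy in the scripts below ties each IAM role to a single service account through two values: the cluster's OIDC provider and the token's sub claim. A minimal sketch of how they are derived; the function names are hypothetical and the issuer URL in the example is a placeholder.

```shell
#!/bin/bash
# Sketch of the values an IRSA trust policy is built from.

# Strips the https:// prefix from the cluster's OIDC issuer URL,
# matching what the sed expression in the scripts below does.
oidc_provider_from_issuer() {
  echo "${1#https://}"
}

# The "sub" claim Kubernetes puts in a service account token. Args: namespace, service account.
service_account_subject() {
  echo "system:serviceaccount:$1:$2"
}

# ARN of the IAM OIDC identity provider for the cluster. Args: account ID, provider.
oidc_provider_arn() {
  echo "arn:aws:iam::$1:oidc-provider/$2"
}
```

For example, `oidc_provider_from_issuer "$(aws eks describe-cluster --name "$CLUSTER_NAME" --query cluster.identity.oidc.issuer --output text)"` yields the value the scripts store in OIDC_PROVIDER.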

4.1) To configure IRSA (IAM Roles for Service Accounts) for ingesting metrics from our Kubernetes cluster (AWS EKS), create a file named ‘createIRSA-AMPIngest.sh’ with the following contents:

#!/bin/bash -e
CLUSTER_NAME=eks-amp-prometheus-grafana-demo # Replace this value with your cluster name
SERVICE_ACCOUNT_NAMESPACE=prometheus # Replace this value with your Prometheus namespace
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
OIDC_PROVIDER=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
SERVICE_ACCOUNT_AMP_INGEST_NAME=amp-iamproxy-ingest-service-account
SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE=amp-iamproxy-ingest-role
SERVICE_ACCOUNT_IAM_AMP_INGEST_POLICY=AMPIngestPolicy
# Create a trust policy that allows only this service account, in this namespace, to assume the role via the cluster's OIDC IdP
cat <<EOF > TrustPolicy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${SERVICE_ACCOUNT_NAMESPACE}:${SERVICE_ACCOUNT_AMP_INGEST_NAME}"
        }
      }
    }
  ]
}
EOF
# Set up the permission policy that grants ingest (remote write) permissions for all AMP workspaces
cat <<EOF > PermissionPolicyIngest.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "aps:RemoteWrite",
        "aps:GetSeries",
        "aps:GetLabels",
        "aps:GetMetricMetadata"
      ],
      "Resource": "*"
    }
  ]
}
EOF
# Print the ARN of an existing role, an empty string if it does not exist, or fail on any other error
function getRoleArn() {
  OUTPUT=$(aws iam get-role --role-name $1 --query 'Role.Arn' --output text 2>&1)
  # Check for an expected exception
  if [[ $? -eq 0 ]]; then
    echo $OUTPUT
  elif [[ -n $(grep "NoSuchEntity" <<< $OUTPUT) ]]; then
    echo ""
  else
    >&2 echo $OUTPUT
    return 1
  fi
}
# Create the IAM role for ingest with the above trust policy
SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE_ARN=$(getRoleArn $SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE)
if [ "$SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE_ARN" = "" ];
then
  # Create the IAM role for the service account
  SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE_ARN=$(aws iam create-role \
    --role-name $SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE \
    --assume-role-policy-document file://TrustPolicy.json \
    --query "Role.Arn" --output text)
  # Create an IAM permission policy
  SERVICE_ACCOUNT_IAM_AMP_INGEST_ARN=$(aws iam create-policy --policy-name $SERVICE_ACCOUNT_IAM_AMP_INGEST_POLICY \
    --policy-document file://PermissionPolicyIngest.json \
    --query 'Policy.Arn' --output text)
  # Attach the required IAM policy to the IAM role created above
  aws iam attach-role-policy \
    --role-name $SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE \
    --policy-arn $SERVICE_ACCOUNT_IAM_AMP_INGEST_ARN
else
  echo "$SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE_ARN IAM role for ingest already exists"
fi
echo $SERVICE_ACCOUNT_IAM_AMP_INGEST_ROLE_ARN
# Associate the cluster's OIDC identity provider with IAM
eksctl utils associate-iam-oidc-provider --cluster $CLUSTER_NAME --approve

The last command associates the cluster's OIDC identity provider (IdP) with AWS IAM, so that IAM can validate and accept the OIDC tokens that Kubernetes issues to service accounts.

Note: Replace the CLUSTER_NAME and SERVICE_ACCOUNT_NAMESPACE values in the above script with your Kubernetes cluster name and namespace, respectively.

Once done, make the file executable and run it:

chmod +x createIRSA-AMPIngest.sh
./createIRSA-AMPIngest.sh
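Step 4.3 needs the ARN of amp-iamproxy-ingest-role. The script prints it, but if that output is lost it can be read back from IAM or rebuilt from the account ID. A minimal sketch; both function names are hypothetical, and role_arn assumes the standard aws partition and a role created without a path, as the script above does.

```shell
#!/bin/bash
# Sketch: recover the value for IAM_PROXY_PROMETHEUS_ROLE_ARN in Step 4.3.

# Hypothetical helper (not run here): reads the role ARN back from IAM.
get_ingest_role_arn() {
  aws iam get-role --role-name amp-iamproxy-ingest-role \
    --query 'Role.Arn' --output text
}

# Offline alternative. Args: account ID, role name.
# Assumes the "aws" partition and the default "/" role path.
role_arn() {
  echo "arn:aws:iam::$1:role/$2"
}
```

For example, `role_arn "$(aws sts get-caller-identity --query Account --output text)" amp-iamproxy-ingest-role` should print the same ARN that get_ingest_role_arn returns.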

4.2) To configure IRSA for querying metrics, create another file named ‘createIRSA-AMPQuery.sh’ with the following contents:

#!/bin/bash -e
CLUSTER_NAME=eks-amp-prometheus-grafana-demo # Replace this value with your cluster name
SERVICE_ACCOUNT_NAMESPACE=prometheus # Replace this value with your Prometheus namespace
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
OIDC_PROVIDER=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
SERVICE_ACCOUNT_AMP_QUERY_NAME=amp-iamproxy-query-service-account
SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE=amp-iamproxy-query-role
SERVICE_ACCOUNT_IAM_AMP_QUERY_POLICY=AMPQueryPolicy
# Create a trust policy that allows a specific combination of K8s service account and namespace to sign in from a Kubernetes cluster hosting the OIDC IdP
cat <<EOF > TrustPolicy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${SERVICE_ACCOUNT_NAMESPACE}:${SERVICE_ACCOUNT_AMP_QUERY_NAME}"
        }
      }
    }
  ]
}
EOF
# Set up the permission policy that grants query permissions for all AMP workspaces
cat <<EOF > PermissionPolicyQuery.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "aps:QueryMetrics",
        "aps:GetSeries",
        "aps:GetLabels",
        "aps:GetMetricMetadata"
      ],
      "Resource": "*"
    }
  ]
}
EOF
# Print the ARN of an existing role, an empty string if it does not exist, or fail on any other error
function getRoleArn() {
  OUTPUT=$(aws iam get-role --role-name $1 --query 'Role.Arn' --output text 2>&1)
  # Check for an expected exception
  if [[ $? -eq 0 ]]; then
    echo $OUTPUT
  elif [[ -n $(grep "NoSuchEntity" <<< $OUTPUT) ]]; then
    echo ""
  else
    >&2 echo $OUTPUT
    return 1
  fi
}
# Create the IAM role for query with the above trust policy
SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE_ARN=$(getRoleArn $SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE)
if [ "$SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE_ARN" = "" ];
then
  # Create the IAM role for the service account
  SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE_ARN=$(aws iam create-role \
    --role-name $SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE \
    --assume-role-policy-document file://TrustPolicy.json \
    --query "Role.Arn" --output text)
  # Create an IAM permission policy
  SERVICE_ACCOUNT_IAM_AMP_QUERY_ARN=$(aws iam create-policy --policy-name $SERVICE_ACCOUNT_IAM_AMP_QUERY_POLICY \
    --policy-document file://PermissionPolicyQuery.json \
    --query 'Policy.Arn' --output text)
  # Attach the required IAM policy to the IAM role created above
  aws iam attach-role-policy \
    --role-name $SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE \
    --policy-arn $SERVICE_ACCOUNT_IAM_AMP_QUERY_ARN
else
  echo "$SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE_ARN IAM role for query already exists"
fi
echo $SERVICE_ACCOUNT_IAM_AMP_QUERY_ROLE_ARN
# Associate the cluster's OIDC identity provider with IAM
eksctl utils associate-iam-oidc-provider --cluster $CLUSTER_NAME --approve

As in the ingest script, the last command associates the cluster's OIDC identity provider (IdP) with AWS IAM, so that IAM can validate and accept the OIDC tokens that Kubernetes issues to service accounts.

Note: Replace the CLUSTER_NAME and SERVICE_ACCOUNT_NAMESPACE values in the above script with your Kubernetes cluster name and namespace, respectively.

Once done, make the file executable (on Linux or macOS) and run it:

chmod +x createIRSA-AMPQuery.sh
./createIRSA-AMPQuery.sh

4.3) Create a new file named ‘my_prometheus_values_yaml’ with the following contents. It configures the Prometheus server to use the ingest service account and to remote-write all the collected metrics to AWS Prometheus (AMP):

serviceAccounts:
  server:
    name: amp-iamproxy-ingest-service-account
    annotations:
      eks.amazonaws.com/role-arn: ${IAM_PROXY_PROMETHEUS_ROLE_ARN}
server:
  remoteWrite:
    - url: https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${WORKSPACE_ID}/api/v1/remote_write
      sigv4:
        region: ${AWS_REGION}
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500

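Note: Replace the following placeholders in the above file (Helm does not expand ${...} variables in a values file):
IAM_PROXY_PROMETHEUS_ROLE_ARN => ARN of the amp-iamproxy-ingest-role IAM role (printed at the end of createIRSA-AMPIngest.sh)
AWS_REGION => Region of the AMP workspace (in this setup, the Region where the cluster is running)
WORKSPACE_ID => AWS Prometheus workspace ID (noted in Step 2)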
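Because Helm does not substitute these placeholders, one option is to let the shell fill them in with an unquoted heredoc, the same technique the IRSA scripts use for their JSON policy files. A minimal sketch; the three values at the top are placeholders, not real identifiers.

```shell
#!/bin/bash
# Sketch: write my_prometheus_values_yaml with the placeholders already filled in.
# The three values below are placeholders; use your own.
IAM_PROXY_PROMETHEUS_ROLE_ARN="arn:aws:iam::111122223333:role/amp-iamproxy-ingest-role"
AWS_REGION="us-east-1"
WORKSPACE_ID="ws-EXAMPLE"

# Unquoted EOF: the shell expands ${...} while writing the file.
cat <<EOF > my_prometheus_values_yaml
serviceAccounts:
  server:
    name: amp-iamproxy-ingest-service-account
    annotations:
      eks.amazonaws.com/role-arn: ${IAM_PROXY_PROMETHEUS_ROLE_ARN}
server:
  remoteWrite:
    - url: https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${WORKSPACE_ID}/api/v1/remote_write
      sigv4:
        region: ${AWS_REGION}
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500
EOF
```

The generated file can then be passed to helm install with -f as shown below.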

Using Helm, run the following command in the terminal:

helm install prometheus-chart-name prometheus-community/prometheus -n prometheus -f my_prometheus_values_yaml

Then check whether the Prometheus server is running in your namespace (i.e., the prometheus namespace) in AWS EKS with the following command:

kubectl get pod -n prometheus
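Instead of reading the output by eye, a small helper can flag pods that are not Running or not fully ready. A minimal sketch; the function names are hypothetical, and not_ready_pods assumes the default column layout of kubectl get pod (NAME, READY, STATUS, ...).

```shell
#!/bin/bash
# Sketch: check the Prometheus pods after the helm install.

# Hypothetical helper (not run here): blocks until every pod in the
# namespace reports Ready, or fails after three minutes.
wait_for_prometheus() {
  kubectl wait --for=condition=Ready pod --all -n prometheus --timeout=180s
}

# Reads `kubectl get pod` output on stdin and prints the names of pods whose
# STATUS is not Running or whose READY column is not n/n (e.g. 1/2).
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

For example, `kubectl get pod -n prometheus | not_ready_pods` prints nothing when every pod is healthy.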

Step 5: Set up AWS Grafana (AMG)

Go to AWS Grafana and click Create workspace.

In Configure settings, select the AWS IAM Identity Center option, and for Permission type choose Service managed. Then click Next.

In the next step, select Current account in the IAM permission access settings, and under Data sources select Amazon Managed Service for Prometheus.

Finally, review the settings and create the workspace.

After the workspace is created, you can see its details as shown in the screenshots below.

Then click the Grafana workspace URL to open Grafana. There, go to the Configuration settings and click Data sources.

Then click Add data source, search for Prometheus in the filter box, and select it.

Then open the data source you just created.
In the URL field, paste the AMP workspace URL: the Endpoint - query URL noted in Step 2 without its trailing /api/v1/query (it should end in /workspaces/<workspace-id>).

Turn on SigV4 auth, and in the SigV4 auth details select the Region where the cluster (and the AMP workspace) is running.
Finally, click Save & test.

5.1) Now navigate to Explore in the left navigation bar to query metrics, and enter the following query in the text box:
apiserver_current_inflight_requests
Then click Run query; you should see metrics similar to those in the following screenshot.

5.2) We can also import an existing dashboard into Grafana: click the + sign, click Import, enter 3119, and click Load.

Then select the Prometheus data source created earlier and click Import.

After the import, the dashboard displays metrics from the EKS cluster through the Amazon Managed Service for Prometheus data source, and we can see and monitor our cluster in Grafana.

Cluster Usage Metric
Pod Usage Metric
