Keeping AWS Registry pull credentials fresh in Kubernetes


Last week I had to turn to the AWS Elastic Container Registry (ECR) as an alternative to the registry we were originally planning to use. The ECR service, like everything in AWS, is tightly integrated with IAM, which would in theory allow us to use EC2 instance roles for image pushes and pulls and avoid having to use passwords. However, integration with Docker tools is still a story being written, and I quickly bumped into the following challenge:

Because the Docker CLI does not support the standard AWS authentication methods, you must authenticate your Docker client another way so that Amazon ECR knows who is requesting to push or pull an image. If you are using the Docker CLI, then use the docker login command to authenticate to an Amazon ECR registry with an authorization token that is provided by Amazon ECR and is valid for 12 hours. The GetAuthorizationToken API operation provides a base64-encoded authorization token that contains a user name (AWS) and a password that you can decode and use in a docker login command. However, a much simpler get-login command (which retrieves the token, decodes it, and converts it to a docker login command for you) is available in the AWS CLI.


The fundamental problem is that we don’t have control over when and how often Docker credentials get used in Kubernetes. It is easy to authenticate the initial pulls from our CI/CD box by creating a Kubernetes docker-registry secret and attaching it to a Service Account. However, the service would fail if Pods get evicted or autoscale more than 12 hours later, once the token has expired.

AWS does provide a Credential Helper to make this process transparent on machines using the Docker client CLI. However, this method does not seem to work in OpenShift and can cause clashes with Red Hat registry authentication.

It seems we can turn on a flag on the kubelets to solve the problem natively when workloads run on EC2 instances.

To get the problem solved quickly, I just pulled together an AWS CLI + kubectl Docker image that would run the following CronJob:

The CronJob runs with a Service Account that is allowed to delete and update secrets.
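The CronJob definition itself is not reproduced here, so what follows is only a rough sketch of the kind of script such a job could run, not the exact one I used. It fetches a fresh authorization token, decodes it, and recreates the docker-registry secret. The secret name `ecr-pull-secret`, the default region, and the `SECRET_NAME`, `AWS_REGION` and `RUN_REFRESH` variables are all hypothetical names picked for illustration.

```shell
#!/bin/sh
# Sketch only: refresh an ECR pull secret from inside the cluster.
# Assumes the aws and kubectl binaries are on PATH and that the Pod's
# Service Account may delete and create secrets in its namespace.
set -eu

# Decode a base64 "AWS:<password>" authorization token and print the password.
token_password() {
  printf '%s' "$1" | base64 -d | cut -d: -f2-
}

refresh_secret() {
  secret_name="${SECRET_NAME:-ecr-pull-secret}"  # hypothetical secret name
  region="${AWS_REGION:-us-east-1}"              # hypothetical default region

  # The token is valid for 12 hours, so this has to run more often than that.
  token=$(aws ecr get-authorization-token --region "$region" \
    --query 'authorizationData[0].authorizationToken' --output text)
  endpoint=$(aws ecr get-authorization-token --region "$region" \
    --query 'authorizationData[0].proxyEndpoint' --output text)

  # Replace the secret: delete the stale one (if any), then create a fresh one.
  kubectl delete secret "$secret_name" --ignore-not-found
  kubectl create secret docker-registry "$secret_name" \
    --docker-server="$endpoint" \
    --docker-username=AWS \
    --docker-password="$(token_password "$token")"
}

# Only talk to AWS and the cluster when explicitly asked to (hypothetical switch).
if [ "${RUN_REFRESH:-}" = "1" ]; then
  refresh_secret
fi
```

In a CronJob the container command would run this with `RUN_REFRESH=1` on a schedule comfortably shorter than the 12 hour token lifetime, for example every few hours.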

The container image being used just contains the AWS CLI and the kubectl binary:

FROM mesosphere/aws-cli
RUN apk --no-cache add openssl \
 && wget -q -O kubectl \
 && chmod +x kubectl \
 && mv kubectl /usr/local/bin

One caveat I encountered is that it is not possible to trigger a CronJob immediately after creating it (GitHub issue), so I had to create a Job to get the initial deployment credentials set up.


PS. Read on how we added a custom domain for ECR here.
