Skaffold, Jenkins G, and AWS IAM Roles: Geoblink to the rescue

Mario Fernández Martínez
Geoblink Tech blog
Mar 25, 2021

Reusing a Docker container’s network stack in Skaffold to provide AWS IAM credentials via annotations.

Geoblink ❤ Open Source

Preface

Geoblink has always been looking into the future. Over the past few years we have tested countless cutting-edge technologies that could fit our purposes. Most of them were dropped because they didn’t match our criteria.

However, several technologies that we tested in the past are still with us today. In this post we are going to focus on Kubernetes –in particular an EKS cluster on AWS–, Jenkins G, and Skaffold, and how we solved authentication problems in ephemeral Jenkins G pods using kube2iam annotations.

If you are interested in the motivation behind using some of these technologies, check our previous post: Airflow on Kubernetes: Data Pipelines.

EKS –aka Kubernetes on AWS–

According to Amazon’s documentation, Amazon Elastic Kubernetes Service –Amazon EKS– gives you the flexibility to start, run, and scale Kubernetes applications in the AWS cloud or on-premises. We have been deeply involved with Amazon Web Services for quite some time, and a managed cluster has helped us a lot by removing much of the boilerplate of running a self-managed Kubernetes cluster.

Jenkins G

At Geoblink we have developed our own CI/CD tool, based on a fork of Jenkins X. We have called it Jenkins G and already talked about it in the following post: Jenkins G: Customized CI/CD for cloud-native applications on Kubernetes.

I also recommend reading this article to understand how Jenkins implements communication between different containers.

kube2iam

kube2iam is a service that runs inside a Kubernetes cluster and provides IAM credentials to the other containers running in the cluster, based on pod annotations.

Traditionally in AWS, service-level isolation is done using IAM roles. IAM roles are assigned through instance profiles and are accessible by services through the aws-sdk’s transparent use of the EC2 metadata API. When the aws-sdk needs credentials, it calls the EC2 metadata API, which provides temporary credentials that are then used to make calls to the AWS service.
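To make this more concrete, here is a rough sketch of what the aws-sdk does under the hood from inside an annotated pod. The role name fake-bucket-reader is just the example role used throughout this post, and the responses are abbreviated:

# kube2iam intercepts calls to the EC2 metadata endpoint and answers on behalf
# of the role declared in the pod's iam.amazonaws.com/role annotation
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
fake-bucket-reader
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/fake-bucket-reader
{ "AccessKeyId": "ASIA...", "SecretAccessKey": "...", "Token": "...", "Expiration": "..." }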

Skaffold

Skaffold is a command line tool that facilitates continuous development for Kubernetes-native applications. Skaffold handles the workflow for building, pushing, and deploying your application, and provides building blocks for creating CI/CD pipelines. This enables you to focus on iterating on your application locally while Skaffold continuously deploys to your local or remote Kubernetes cluster.

> Note: At the time of this research, the latest version of Skaffold API was v2beta9.
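As an illustration –this is a minimal sketch, not our real configuration; the image name and manifest path are made up–, a basic skaffold.yaml looks roughly like this:

apiVersion: skaffold/v2beta9
kind: Config
build:
  artifacts:
    - image: my-service        # hypothetical image name
      context: .
      docker:
        dockerfile: Dockerfile
deploy:
  kubectl:
    manifests:
      - k8s/*.yaml             # hypothetical manifest location

With a file like this in place, skaffold build builds and tags the image, while skaffold dev rebuilds and redeploys on every code change.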

State of the art

Our infrastructure team has been very focused on providing the best possible toolset to the rest of the tech teams to make their lives easier. They have constantly evolved and introduced technologies that help everyone in the company tackle their challenges. After a couple of months of testing and determining which CI/CD tool best fits our structure, they decided to move to Jenkins G. As described above, Jenkins G relies on Kubernetes to run all kinds of Jenkins processes asynchronously.

Up to this point, the tech teams have been happy with this change. Everything runs smoothly and we have found a good balance between developing new features and deploying them to production environments. But new times always bring new challenges, and we are no exception.

The infrastructure team introduced a new tool in our Kubernetes cluster that reads pod annotations and automatically provides pods with AWS IAM credentials –when applicable–: kube2iam.

This tool made our lives even easier. By simply adding an annotation to our pods, we can give IAM roles to our services, with no further interaction and no credentials in the environment whatsoever. That’s awesome!

IAM credentials via kube2iam

Let’s see how this works in practice. With the following command we get a pod with no annotation at all; if we try to reach any AWS data source –AWS S3, for example– we get this response:

kubectl run annotated-alpine --image alpine:latest -it --rm -- sh
# aws s3 ls s3://geoblink-fake-bucket
Unable to locate credentials. You can configure credentials by running "aws configure".

However, by running the following command

kubectl run annotated-alpine --image alpine:latest --overrides='{ "apiVersion": "v1", "metadata": {"annotations": { "iam.amazonaws.com/role": "fake-bucket-reader" } } }' -it --rm -- sh

we get a pod in our cluster with the desired annotation:

Name:         annotated-alpine
Namespace:    qa
Priority:     0
Node:         xxxx
Start Time:   Tue, 16 Mar 2021 14:04:55 +0100
Labels:       run=annotated-alpine
Annotations:  iam.amazonaws.com/role: fake-bucket-reader
              kubernetes.io/psp: eks.privileged
Status:       Running
...
Containers:
  annotated-alpine:
    Container ID:  docker://d84bc84762312f123cbea1e430041412ece983b93e1f9ccc59ad6153e5dd3cfb
    Image:         alpine:latest
...

When we try to connect to AWS using the aws-cli, kube2iam automatically intercepts the request and checks whether the annotated role has permission to reach the data. So, as an example, we run

aws s3 ls s3://geoblink-fake-bucket/ 

and we get the desired response in our pod:

# aws s3 ls s3://geoblink-fake-bucket
An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist

As you can see, there’s a response –of course the bucket doesn’t exist, it’s fake, but the pod was authorized to call S3–.

Nice! We can now connect the pods running in our Kubernetes cluster to AWS services without passing around any credentials that might expose sensitive information.

Building images with Skaffold

The next step is to build images in our Jenkins G cluster. Jenkins G allows us to have a running pod with annotations, so adding the required annotation as described in the previous section grants the running containers access to AWS services. That’s awesome. By simply adding

pipeline {
  agent {
    kubernetes {
      yamlFile 'KubernetesPod.yaml'
    }
  }
}

where KubernetesPod.yaml contains a manifest for the pod to be created, we end up with a running annotated pod, including all the containers defined within. These containers will have access to AWS services with no further action needed. But there are cases where, sadly, that won’t be enough.
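For reference, a stripped-down KubernetesPod.yaml could look like the following –this is a sketch with made-up image names, not our exact manifest–:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    iam.amazonaws.com/role: fake-bucket-reader   # picked up by kube2iam
spec:
  containers:
    - name: base
      image: geoblink/build-tools:latest         # hypothetical build image
      command: ["cat"]
      tty: true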

We heavily rely on Docker images and these images are built with the help of Skaffold. As already mentioned, Skaffold handles the workflow for building, pushing, and deploying your application. In our case, all Docker images are built with this tool.

An important aspect to take into account is that in KubernetesPod.yaml we mount –via a volume– the host’s Docker daemon socket into the pod. With this approach we can spawn new Docker containers from within the pod; those containers will be siblings of the pod’s containers, not children.
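In the manifest sketched above, that boils down to something like the following –again a sketch, assuming the usual hostPath approach–:

spec:
  containers:
    - name: base
      # ... same container as above ...
      volumeMounts:
        - name: docker-socket
          mountPath: /var/run/docker.sock
  volumes:
    - name: docker-socket
      hostPath:
        path: /var/run/docker.sock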

To build services in a homogeneous way, we include a skaffold.yaml file in our builds and then run the skaffold build command. So far so good. But here comes the problem: Skaffold spawns new containers in which all the intermediate Docker steps for building the final image are executed. Because they are created through the shared host Docker daemon, those containers are completely brand new and don’t inherit any annotations, so any communication between them and AWS won’t be authorized.

Unfortunately, we didn’t find a simple way to inject those annotations into the new containers, at least not when we first faced the problem.

Looking for a workaround

As engineers we love challenges, and this particular scenario is exactly what the infrastructure team loves to deal with. Even though we didn’t find a way to inject the annotations, we did find a way for manually spawned containers to access AWS without annotations or credentials in environment variables: reusing the Docker network stack.

As described in the official Docker documentation, a container can reuse the network stack of another container, specified via its name or ID.
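In plain Docker terms, this is just the --network flag pointing at an existing container (the container names here are hypothetical):

# new-container piggybacks on the network namespace of existing-container
docker run --rm -it --name new-container --network container:existing-container alpine sh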

Let’s demonstrate this workaround quickly. Assume we have a running pod annotated with an AWS role, and let’s log into its container:

> kubectl describe pod fake-pod
Name:         fake-pod
Namespace:    default
Priority:     0
Node:         <none>
Labels:       jenkins/jenkins-jenkins-agent=true
              jenkins/label=fake-pod-PR-0
              jenkins/label-digest=3e7751727ac241c171302eb74af7cd9014522530
              kind=fake-pod
Annotations:  iam.amazonaws.com/role: fake-bucket-reader
...
> kubectl exec -it fake-pod -c base -- bash

Now we have access to the Docker daemon:

[root@fake-pod] docker ps -q
f3f82d3221fb
...
cbe5b7e6ad0f

and can run a new Alpine container without sharing any annotation:

[root@fake-pod jenkins] docker run --rm -it --name alpine alpine:latest
/ # apk add aws-cli
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
(1/46) Installing libbz2 (1.0.8-r1)
...
(46/46) Installing aws-cli (1.18.177-r0)
Executing busybox-1.32.1-r3.trigger
OK: 143 MiB in 60 packages
/ # env
HOSTNAME=44c1c60530b4
#### NOTHING FROM AWS
/ # aws s3 ls s3://geoblink-fake-bucket/
Unable to locate credentials. You can configure credentials by running "aws configure".

We have no access to S3 with the current configuration. It’s time to try the workaround then.

First of all, we need to find the Docker container ID for fake-pod. As mentioned in the Jenkins G section above, we rely on Jenkins Remoting. Consequently, there are three running containers for that pod: k8s_base_fake-pod –our expected container–, k8s_jnlp_fake-pod –the Java Web Start agent– and another one called k8s_POD_fake-pod –master-to-agent communication protocols–. We could use any of them to reuse the network stack; we have arbitrarily chosen k8s_POD_fake-pod.

[root@fake-pod]# docker ps | grep `cat /etc/hostname` | grep POD | tr -s " " | cut -d " " -f 1
79c5649f887e

and let’s create a container that reuses its network:

[root@fake-pod]# docker run -it --rm --name alpine-authorized --network="container:79c5649f887e" alpine sh
/ #

Now, if we try to reach AWS S3, we get the following:

/ # apk add aws-cli
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
(1/46) Installing libbz2 (1.0.8-r1)
...
(46/46) Installing aws-cli (1.18.177-r0)
Executing busybox-1.32.1-r3.trigger
OK: 143 MiB in 60 packages
/ # aws s3 ls s3://geoblink-fake-bucket/
PRE FOLDER1/
PRE FOLDER2/
PRE FOLDER3/
2019-11-18 16:42:06 308 fake_file_001.csv
2021-03-17 04:53:34 334577 fake_file_002.csv

Yes! kube2iam sees the request as coming from the original annotated pod and authorizes it.

Contributing to Skaffold

We have found a not-very-complex way of authorizing containers spawned from the k8s_base_fake-pod container. Now we need to tell Skaffold to reuse the Docker network stack as described above, so that all new containers are granted access to our AWS resources based on the fake-bucket-reader role. Unfortunately, Skaffold version v2beta9 offered no way of reusing the network stack of an already running Docker container. The only possibilities were host, bridge, or none, and those weren’t enough for our case.

We decided to open an issue at the Skaffold project, explaining the need to enable reusing another container’s network stack. Since we already had insight into how to address it, we decided to contribute the change ourselves. We opened a Pull Request that implemented the new feature, and the community accepted it. Awesome!

After a couple of weeks, the Google team decided to include our changes in the new version of the API, v2beta11.

Connecting the pieces

The final part of our story was putting the puzzle together. We already had all the pieces on the table, and as soon as Skaffold v2beta11 was released we accomplished our goal.

We created a skaffold.yaml file inside the running fake-pod with the following content:

[root@fake-pod tmp]# cat skaffold.yaml
apiVersion: skaffold/v2beta11
kind: Config
deploy: {}
build:
  artifacts:
    - image: alpine-aws
      context: .
      docker:
        noCache: true
        network: "container:79c5649f887e77637e04ef6d42d2dca84a6ea033e9ceb232bbe5d5901a7e1a1e"
  local:
    useDockerCLI: true
    useBuildkit: false
    push: false
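The network value must contain the full 64-character container ID. One way to grab it –assuming the short ID 79c5649f887e obtained earlier– is:

[root@fake-pod tmp]# docker inspect --format '{{.Id}}' 79c5649f887e
79c5649f887e77637e04ef6d42d2dca84a6ea033e9ceb232bbe5d5901a7e1a1e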

As you can see, we are trying to build a simple image named alpine-aws. To reuse the Docker network stack, we add the network parameter to our file with the container ID of the running pod mentioned above –note that we are using the full ID, extracted with docker inspect as shown earlier–. Finally, we use the following Dockerfile:

[root@fake-pod tmp]# cat Dockerfile
FROM alpine
RUN apk add aws-cli
RUN aws s3 ls s3://geoblink-fake-bucket/

Everything is ready and skaffold build can be triggered. There we go!

[root@fake-pod tmp]# skaffold build
Generating tags...
- alpine-aws -> alpine-aws:latest
Some taggers failed. Rerun with -vdebug for errors.
Checking cache...
- alpine-aws: Not found. Building
Building [alpine-aws]...
Sending build context to Docker daemon 8.704kB
Step 1/3 : FROM alpine
---> 28f6e2705743
Step 2/3 : RUN apk add aws-cli
---> Running in d0562c2ba1c6
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
(1/46) Installing libbz2 (1.0.8-r1)
...
(46/46) Installing aws-cli (1.18.177-r0)
Executing busybox-1.32.1-r3.trigger
OK: 143 MiB in 60 packages
Removing intermediate container d0562c2ba1c6
---> c8b05c0fa7a0
Step 3/3 : RUN aws s3 ls s3://geoblink-fake-bucket/
---> Running in b23f0774af5f
PRE FOLDER1/
PRE FOLDER2/
PRE FOLDER3/
2019-11-18 16:42:06 308 fake_file_001.csv
2021-03-17 04:53:34 334577 fake_file_002.csv
Removing intermediate container b23f0774af5f
---> 9cb165d3468d
Successfully built 9cb165d3468d
Successfully tagged alpine-aws:latest

As expected, the containers created by Skaffold for running the intermediate build steps have access to AWS without injecting any further credentials.

Coming soon: a new contribution

As shown in the previous example, we can reference the Docker container by its full ID in Skaffold. As of now, there is a restriction: we can use neither a short Docker ID (12 characters) nor inject it via environment variables.

To improve this behavior, we’ve opened a new Pull Request in the Skaffold project that fixes these problems.

Conclusions

Security is probably the most critical concern for exposed applications. A tiny breach can easily compromise an entire company, costing it money and, more importantly, credibility. Here at Geoblink we try to reduce as much as possible the number of credentials injected into our services, even if they are only used during build processes. To that end, we rely on annotating our pods with the expected role and avoid environment variables, and that’s the main reason why we needed to get involved with Skaffold –and, by doing so, improve our experience with Jenkins G and AWS–.

However, sometimes the existing libraries won’t match our needs. The open-source community is always exploring new ideas and introducing changes that help us all, but it takes proactivity to report new requirements and to help develop them. Nowadays almost every single project uses open-source libraries –knowingly or not– so please, get involved. The community needs us.
