Moving from Kube2Iam to Kiam

mattreyuk · Published in Building Ibotta · 3 min read · Jul 31, 2018

At Ibotta, we chose kube2iam to assign AWS IAM roles to containers running in our Kubernetes cluster. Lately we've run into some issues with it, specifically when running a job that scores all of our service repos. The job spins up a number of pods in parallel, and those pods have often failed to correctly access their roles.

After further investigation, the future of the project seems uncertain, and there are other open issues around race conditions and similar problems. After some research, Kiam looked like a valid alternative. The Kiam developers have written a post on their experience with kube2iam and why they decided to write Kiam, which goes into a lot of detail.

We tested Kiam by running the scorecard job in our staging environment: first with kube2iam, to verify we could reproduce the errors we'd seen, and then with Kiam. With Kiam there were no pod failures over multiple runs, which showed it was an improvement for this use case.

There are several stages to replacing kube2iam with Kiam; these are outlined below.

Deploying Kiam and Replacing Kube2iam

Kiam uses a namespace annotation containing a regular expression to define which roles are allowed in that namespace. This lets you restrict roles to certain namespaces, which is nice for security, but you must specify the annotation even if all roles are allowed, like this:

kind: Namespace
metadata:
  name: iam-example
  annotations:
    iam.amazonaws.com/permitted: ".*"

This annotation has no effect on your current kube2iam configuration, so it's easy to add at the start before you forget about it.
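For context, a pod then requests its role through a pod annotation, and Kiam only grants it if the role matches the namespace's permitted pattern. A minimal sketch, with a hypothetical role name:

apiVersion: v1
kind: Pod
metadata:
  name: iam-example-pod
  namespace: iam-example
  annotations:
    # hypothetical role name; must match the namespace's permitted regex
    iam.amazonaws.com/role: reporting-reader
spec:
  containers:
    - name: app
      image: myapp:latest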

Both kube2iam and Kiam default to using the same port (8181) for the agent that supplies the role, so you'll need to choose a different one for Kiam while you have both running (we chose 8881).
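The port is set on the agent's command line. This excerpt is a sketch based on the agent daemonset in the Kiam deploy manifests, so verify the exact flags against the Kiam version you run:

# excerpt from the Kiam agent daemonset container args
args:
  - agent
  - --iptables
  - --host-interface=cali+   # interface prefix for your CNI (ours is Calico)
  - --port=8881              # non-default so kube2iam can keep 8181 while both run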

Kiam comes in two parts: the Server, which prefetches credentials for roles (and needs the sts:AssumeRole permission), and the Agent, which intercepts pods' role requests and contacts the Server to retrieve the credentials. The Servers are designed to run on the master nodes and the Agents on the worker nodes (you can't run both on the same node, or the Agent would stop the Server from being able to contact the credential provider).
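As a sketch, the IAM policy attached to the role the Kiam server runs under must allow assuming the pod roles; the account ID and role name pattern below are placeholders you'd scope to your own roles:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::123456789012:role/k8s-*"
    }
  ]
}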

Be sure to take account of any taints on your master nodes that would prevent pods from being scheduled there normally (kops-generated clusters, for instance, will require a toleration for node-role.kubernetes.io/master). You may also need to alter the default path for certs (see this issue for details).
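For a kops cluster, a minimal sketch of the scheduling settings on the server daemonset might look like the following (the node-role label is the usual kops convention, so check it against your own node labels):

# kiam-server daemonset pod spec (excerpt)
nodeSelector:
  node-role.kubernetes.io/master: ""
tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule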

Because kube2iam doesn't clean up after itself, its iptables rule is left in place when its daemonset is deleted. This causes problems because the Kiam agent also needs to add a rule, which will come later in the chain and therefore never be applied (Kiam, in contrast, removes its rule when it's deleted).

Our deployment process was as follows:

  • Deploy the Server daemonset and verify via the logs that it works and can pull roles
  • Deploy the Agent daemonset and verify it has no issues (health checks are passing and logs are clean)
  • Connect to each worker node in turn and delete the kube2iam iptables rule, then validate that the Agent on that node can supply roles; a quick check is sketched after this list. (If you run into issues at this point, deleting the Kiam agent and then restarting the kube2iam pod will re-insert its rule and get you back to running on kube2iam)
  • Delete the kube2iam daemonset
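A minimal way to validate that an agent can supply roles, assuming you can exec into a pod that carries a role annotation and has curl available:

# from inside an annotated pod: list the role the metadata API offers
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
# the pod's role name should be printed; append it to the URL to see
# the temporary credentials being handed out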

Deleting iptables Rules

To delete the old kube2iam rule that allows it to capture role requests:

  • ssh into the worker node
  • run sudo iptables -t nat -n -L PREROUTING --line-numbers

you should see something like:

Chain PREROUTING (policy ACCEPT)
num  target           prot opt source     destination
1    cali-PREROUTING  all  --  0.0.0.0/0  0.0.0.0/0        /* cali:6gwbT8clXdHdC1b1 */
2    KUBE-SERVICES    all  --  0.0.0.0/0  0.0.0.0/0        /* kubernetes service portals */
3    DOCKER           all  --  0.0.0.0/0  0.0.0.0/0        ADDRTYPE match dst-type LOCAL
4    DNAT             tcp  --  0.0.0.0/0  169.254.169.254  tcp dpt:80 to:10.192.102.212:8181
5    DNAT             tcp  --  0.0.0.0/0  169.254.169.254  tcp dpt:80 to:10.192.102.212:8881

Rule 4 (the one forwarding to port 8181) is the kube2iam rule in this case.

  • run sudo iptables -t nat -D PREROUTING 4 to delete it.

I would suggest at least running this manually on a low-risk node to validate your Kiam install before moving to a more automated approach for multiple nodes.
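As a sketch of what that automation could look like, assuming SSH access to the workers, that kops has labeled them kubernetes.io/role=node, and that the kube2iam rule is the only PREROUTING rule forwarding to port 8181:

#!/bin/bash
# Hypothetical sweep: delete the kube2iam DNAT rule on each worker node.
for node in $(kubectl get nodes -l kubernetes.io/role=node -o name | cut -d/ -f2); do
  ssh "$node" '
    rule=$(sudo iptables -t nat -L PREROUTING --line-numbers -n | grep "to:.*:8181" | awk "{print \$1}")
    if [ -n "$rule" ]; then
      sudo iptables -t nat -D PREROUTING "$rule"
      echo "deleted kube2iam rule $rule on $(hostname)"
    fi
  '
done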

If your cluster is very dynamic (nodes coming and going), or you're OK with rolling nodes, you could consider applying an "anti kube2iam" taint to new nodes so its daemonset won't schedule there; new worker nodes will then come up with only the Kiam agent.
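A minimal sketch of that approach, using a hypothetical taint key: kube2iam's daemonset gets no matching toleration, while the Kiam agent daemonset does, so only the agent lands on tainted nodes.

# taint applied to new worker nodes (hypothetical key and value)
kubectl taint nodes <new-node> kube2iam=disabled:NoSchedule

# toleration added to the Kiam agent daemonset pod spec
tolerations:
  - key: kube2iam
    value: disabled
    effect: NoSchedule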

We’re Hiring!

If these kinds of projects and challenges sound interesting to you, Ibotta is hiring! Check out our jobs page for more information.
