My automated RKE update pipeline broke with version 0.2.x — my fault

Nico Meisenzahl
Apr 12, 2019 · 2 min read

I use an automated build pipeline to install, update and destroy my Kubernetes test environments, which are based on Rancher Kubernetes Engine (RKE). This worked perfectly until this week.

Let me briefly explain the important parts of my pipeline before I get into the details:

  1. check out the cluster.yml from a git repository
  2. extract the kube_config_cluster.yml from the secured repository cache
  3. download the latest stable RKE binary from GitHub (this is a test environment, where I want to stay up to date at all times)
  4. run “rke up” or “rke remove”
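
As a rough illustration, step 4 could look like this in Python. This is a minimal sketch and not the actual pipeline code: the function names `rke_command` and `run_rke` are hypothetical, and it assumes the `rke` binary from step 3 is on the PATH and that the cluster file is passed with the `--config` option.

```python
import subprocess


def rke_command(action: str, config: str = "cluster.yml") -> list[str]:
    """Build the argument list for "rke up" or "rke remove"."""
    if action not in ("up", "remove"):
        raise ValueError(f"unsupported action: {action}")
    return ["rke", action, "--config", config]


def run_rke(action: str, workdir: str = ".") -> None:
    """Run RKE in the directory that holds cluster.yml.

    check=True makes the pipeline step fail when rke exits non-zero,
    for example on an unhealthy etcd plane.
    """
    subprocess.run(rke_command(action), cwd=workdir, check=True)
```

Keeping the command construction separate from the execution makes it possible to test the step without a real cluster.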

As I said, this had worked perfectly for some time. Earlier this week I pushed a new version of my cluster.yml to update my Kubernetes cluster to a newer version. The pipeline started and failed a few minutes later with the following error:

Failed to bring up Etcd Plane: [etcd] Etcd Cluster is not healthy

I started debugging but couldn’t find anything wrong. The whole Kubernetes cluster, including etcd, looked healthy. After some time I realized that Rancher had released a new RKE version, 0.2.x. Until then I had been using 0.1.x, but because of step 3 of my pipeline, the build run picked up the latest available stable version.

So why does this even matter? With the new version of RKE, Rancher introduced a new way to store the cluster state. They moved it from a configmap entry (0.1.x) to a file called cluster.rkestate (0.2.0), which is stored next to the cluster.yml. Because I wasn’t aware of this file, my pipeline didn’t store it anywhere, and therefore every “rke up” created a new cluster.rkestate file, which then led to the issue described above. After changing my pipeline configuration to also cache the state file, the update finished successfully without any issues.
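
The caching fix could be sketched as follows. This is only an illustration under assumptions: the cache is modelled as a plain directory, and the helper names (`restore_state`, `save_state`, `check_state`) are hypothetical. Only the two file names come from the setup described above.

```python
import shutil
from pathlib import Path

# Files that have to survive between pipeline runs: the kubeconfig
# and, since RKE 0.2.0, the cluster state file.
STATE_FILES = ("kube_config_cluster.yml", "cluster.rkestate")


def restore_state(cache_dir: Path, work_dir: Path) -> list[str]:
    """Copy cached files next to cluster.yml; return the names restored."""
    restored = []
    for name in STATE_FILES:
        src = cache_dir / name
        if src.is_file():
            shutil.copy2(src, work_dir / name)
            restored.append(name)
    return restored


def save_state(work_dir: Path, cache_dir: Path) -> list[str]:
    """Copy the files back into the cache after a run; return the names saved."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in STATE_FILES:
        src = work_dir / name
        if src.is_file():
            shutil.copy2(src, cache_dir / name)
            saved.append(name)
    return saved


def check_state(work_dir: Path) -> None:
    """Fail early when a kubeconfig exists but the state file does not.

    A kubeconfig without cluster.rkestate suggests an existing cluster
    whose state was not cached (an assumption of this sketch).
    """
    has_kubeconfig = (work_dir / "kube_config_cluster.yml").is_file()
    has_state = (work_dir / "cluster.rkestate").is_file()
    if has_kubeconfig and not has_state:
        raise RuntimeError("cluster.rkestate is missing for an existing cluster")
```

In this sketch, `restore_state` and `check_state` would run before “rke up”, and `save_state` after it. The first 0.2.x run against a cluster that was created with 0.1.x has no state file yet, so the check would probably have to be skipped once in that case.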

What have we learned from this? Always read the release notes. 😏
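
If the pipeline keeps downloading the latest binary, one possible safeguard is to stop when the version jumps to a new minor or major release, so that someone reads the release notes first. The sketch below is an assumption, not part of the pipeline described here: the function names are hypothetical, and it expects version tags in the form v0.2.0.

```python
import re


def parse_version(tag: str) -> tuple[int, ...]:
    """Turn a tag like "v0.2.0" into (0, 2, 0)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unexpected version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def needs_review(previous: str, candidate: str) -> bool:
    """Return True when the major or minor number changed."""
    return parse_version(previous)[:2] != parse_version(candidate)[:2]
```

With illustrative tags, `needs_review("v0.1.5", "v0.2.0")` returns True, while a patch update such as `needs_review("v0.2.0", "v0.2.1")` returns False and could pass through automatically.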

01001101

Stories related to DevOps topics by Nico Meisenzahl. 01001101? First char of my surname.

Nico Meisenzahl

Written by

Senior Cloud & DevOps Consultant at white duck. Docker Community Leader, GitLab Hero, blogger & speaker. 👨‍💻🙋‍♂️ Loves Kubernetes, DevOps & Cloud.
