The Problems With Helm

7Mind GmbH {Dev}
8 min read · Sep 11, 2018


7Mind is Germany’s most popular mindfulness app, helping to easily integrate the practice of meditation into daily life. We believe in the great potential of the human mind, and it is our mission to show individuals and organisations alike how to unlock it in order to live and work feeling more productive, relaxed and happy. 7Mind is also available in French and Dutch, and we will soon be expanding overseas to English-speaking markets.

Managing Kubernetes deployments has always been tricky. Just as Kubernetes has allowed much more powerful Docker deployments, so too has it increased their complexity. Deployments very quickly become a mess of YAML files mixed together in what rapidly becomes an unintelligible lump of dependencies, configuration variables and general confusion. Helm alleviates this headache somewhat by bundling deployments together so that you no longer have to decipher which of the hundreds of files need to be changed and can instead focus on one application at a time. Furthermore, you can dispense with the chore of writing your own charts for common deployments such as Postgres or Cassandra, and can instead rely on a massive open source repository. It sounds great in principle, but as we all know life rarely adheres to principles.
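
For anyone who hasn’t used it, the basic workflow looks roughly like this (Helm 2 syntax, current at the time of writing; the repository URL is the usual default and the release name is just an example):

helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm search postgresql                              # browse the ready-made charts
helm install stable/postgresql --name my-postgres   # one command, a whole Postgres deployment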

Knowing Your Code

xkcd.com/2044

One of the immediate pitfalls that you run into when you start using helm is that you no longer know what exactly is deployed on your cluster. When you run helm install repo/chart you give up almost all configuration control to whoever wrote the original chart. Now, that’s not necessarily a bad thing. Odds are they wrote a good chart, and if not, there is a whole internet out there that can tell them how to fix it. What they wrote was in all likelihood better than you could have done yourself had you attempted it. So what’s the problem?
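
You can at least peek before you leap: helm will show you the default values you are implicitly accepting and will render the chart’s manifests without installing anything (Helm 2 commands, with repo/chart standing in for a real chart):

helm inspect values repo/chart              # the defaults you are silently agreeing to
helm install repo/chart --dry-run --debug   # render every manifest without deploying it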

Let’s say that after a while, you run into an error (this will invariably happen if you are deploying anything to production; Mr. Murphy is not one to be ignored). Perhaps you went to inspect a postgres/patroni deployment and realized that something had gone terribly wrong. Brief inspection of the deployment will tell you that it failed to connect to etcd. Of course, not having written the chart, you might have been flabbergasted to learn that there was even an etcd dependency, or what version it was, where it was installed from and what it was doing. Further googling would reveal that this is a systemic problem with the etcd deployment the chart comes equipped with, and that the bundled etcd deployment can’t actually be overridden.
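
In practice, discovering that hidden dependency looks something like the following; the release and pod names here are made up for illustration:

kubectl logs patroni-0                      # "could not connect to etcd"... what etcd?
helm get manifest postgres | grep -i etcd   # surprise: the release ships its own etcd stateful set
helm get values postgres                    # and nothing in the values lets you swap it out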

Sure, it’s inconvenient, but this is still much easier than writing the chart yourself. So you deploy a more stable etcd chart, because of course you can’t switch to another key-value store — the original chart didn’t come with a configuration option for that. No matter: you now have two charts deployed instead of one, and everything’s fine. Well, almost. It seems that you can’t actually disable the default etcd stateful set that gets deployed, so chuck in a kubectl delete statefulset original-statefulset as part of your deployment process; it’s still not as bad as it used to be.
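
The resulting workaround looks roughly like this; the repository, release and stateful set names are placeholders rather than the ones we actually used:

helm install some-repo/etcd --name patroni-etcd   # run a separate, more stable etcd alongside
kubectl delete statefulset postgres-etcd          # remove the bundled etcd the chart won't let you disable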

So you’re cruising along in a boat that has a few holes in it, but you’re most definitely still afloat. Then one day, someone comes along and tells you they want a newer version of Postgres. Shouldn’t be a problem, right? Well, it wouldn’t be, except of course you’re still not quite sure what you’re deploying, so you take a look. Hmm, it seems the new version of the chart also upgraded the version of Postgres. You’d hope the chart author included some contingencies for migration, but then again, you have no guarantee that they did. After all, it’s not your code.

It turns out the chart doesn’t tell you anything about how persistent volumes are handled, so there’s no guarantee that upgrading the chart won’t also hand the deployment a new persistent volume that doesn’t contain your data. Even if you’re lucky and the author of the chart didn’t change the nature of the persistent volume claim, you still don’t know whether the new version of Postgres will read, or even preserve, your old data. Again, the chart makes no guarantees, and you don’t own the image it’s running anyway. So you hold your breath and hope it works, because truth be told you don’t have any other option.
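
About the only due diligence available is to render the upgrade ahead of time and check how your volumes are currently bound (Helm 2 flags; the release name is a placeholder, and the release label is only a convention that most charts follow):

helm upgrade postgres repo/chart --dry-run --debug   # render the new manifests without applying them
kubectl get pvc -l release=postgres                  # which claims hold your data today
kubectl get pv                                       # and what their reclaim policy is before anything disappears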

This is a common occurrence with helm: it works, it works, and then it breaks, and what would be a minor error in a project you wrote yourself becomes a massive headache of unmaintainable nonsense. I can personally attest to dozens of hours spent debugging weird errors in someone else’s helm chart — time in which I could almost certainly have built something more reliable and better suited to my own needs several times over.

When Helm Goes on Vacation

Now there are of course issues with helm charts as outlined above, but perhaps more serious are the issues with the tool itself. Helm does not necessarily do what you want it to. Case in point: today I ran helm delete postgres --purge, which exited successfully and removed the postgres release from Tiller, but not from Kubernetes. I was then left with the task of deleting the stateful sets, services and persistent volume claims that helm had somehow forgotten about.
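
The cleanup ends up being done by hand, selecting on the release label that most charts apply by convention (so this only works if the chart actually sets it):

kubectl get statefulsets,services,pvc -l release=postgres      # what helm left behind
kubectl delete statefulsets,services,pvc -l release=postgres   # the purge that --purge should have done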

Any tool can fail, but the problem with helm is that it often fails quietly. We get neither a bang nor a whimper. Running helm upgrade doesn’t necessarily restart your pods as you might assume it does; in my experience it only does so if you change their configuration, the image tag for instance, and even then it’s dicey. Forgot to include helm metadata on one of your templates? No problem, unless of course you try to access it later, at which point you get a rather large problem and a very unclear error.
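
If you genuinely need the pods to restart, you end up reaching for Helm 2’s sledgehammer flag, which as far as I can tell simply deletes the pods outright rather than doing a graceful rolling update (the release and chart path below are examples):

helm upgrade my-release ./my-chart --recreate-pods   # force a restart even when nothing in the spec changed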

My point here is that helm comes without guarantees, and using it in a production environment, especially one in which you are deploying charts automatically, can be risky. Occasionally it will fail for some unknown reason, and even more frighteningly it won’t tell you it failed! This leaves you in the unfortunate situation of having to diagnose errors that have nothing to do with anything in your codebase. Imagine if every time you deployed an application, you had no guarantee your dependencies would work as intended.

Alternatives

So, what do you do? Well, chances are, you live with it. As a developer you can hope that the helm tool improves, you can make sure that your charts are solid, and so on. The truth of the matter is that you are likely going to keep running into these problems, but maybe it’s rare enough that it’s worth the hardship. However, for those of us who are stubborn enough not to accept software with compromises, let me propose a few improvements.

1. Own Your Deployments

Don’t rely on someone else to maintain your charts for you. Do it yourself! The same goes for images. Sure, copy someone else’s code for the baseline, but then make it your own! Store your charts in your own repository so that you can change them if you need to, and, more importantly, so that you know what you are deploying.

This may sound like more work, but it’s just a few minutes of pulling open source code into your own repo. You can still pull in upstream updates that way, but you won’t get any unexpected surprises when you upgrade. Maintaining your own repository forces you to know what is going on, and allows a newcomer to understand the history and current state of the project without hunting through hundreds of different links. Furthermore, most helm charts are filled with needless configuration bloat, on the off chance that some user has some specific need. While this is undoubtedly useful for open source software, it leads to much more complicated configurations than almost any scenario requires, making debugging much harder than it needs to be. Maintaining your own chart lets you strip away this bloat and get a more readable and maintainable deployment.
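
Vendoring a chart really is only a few commands; the chart name below is just an example, using Helm 2’s fetch:

helm fetch stable/postgresql --untar --untardir charts/   # pull the upstream chart into your repo
git add charts/postgresql
git commit -m "Vendor the postgresql chart"
helm install ./charts/postgresql --name postgres          # from now on, deploy your own copy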

2. Deploy With Precision

This is a more intense requirement, and one I haven’t attempted yet, as it involves rebuilding the helm deployment tool. The idea is to make a very concise desired-state specification. Consider your deployments as a git repository. When you change something locally and push it to the server, it pushes only the diffs. The server looks at those, interprets what has changed and appropriately creates, deletes and updates Kubernetes deployments. This is actually relatively simple to do (I’m not joking when I say that you can just rip off the git diff process). Instead of looking for deployments by metadata, look at what was deployed according to version control. It’s essentially a matter of integrating the diff tool with the Kubernetes API.
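
A toy version of the idea, assuming one manifest per file under a manifests/ directory and that the change being deployed is the latest commit, might look like this:

# Apply everything that was added or modified in the last commit...
git diff --name-only --diff-filter=AM HEAD~1 HEAD -- manifests/ | xargs -r -n1 kubectl apply -f
# ...and delete everything that was removed, using the old revision of each file.
git diff --name-only --diff-filter=D HEAD~1 HEAD -- manifests/ | while read f; do
  git show "HEAD~1:$f" | kubectl delete -f -
done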

The key point here is that you would have total (and I mean total) control of your entire deployment pipeline. If you want a Slack hook on deployment, go for it. Want to send each update for code review? You can do it. A commit log of the state of your cluster — it’s built in! The advantages of this approach seem almost unlimited, and though I’m sure there would be barriers to overcome, I believe it’s worth doing for the sake of a better deployment system.

The point I’m trying to make here is not that helm is bad software. It really isn’t: it works well a lot of the time, and fails occasionally. For most people it’s good enough for writing solid, maintainable deployments with only a few hiccups. However, if you work in an environment with rapid development, where deployments change frequently, using software that does 90% of what you want but lets you down 10% of the time is not a sustainable option — especially for mission-critical components. In situations like that, don’t reinvent the wheel, but if your cart isn’t rolling, maybe take a look at some improvements.

I’m Sequoia Snow, a junior backend developer at 7Mind. I mainly focus on building our deployment infrastructure and convincing everyone else in the company that we should use Haskell. Currently 7Mind is in its internationalization phase, expanding rapidly across multiple countries while simultaneously migrating from a monolithic Rails application to a more scalable microservice-based architecture. This has made solidifying our deployment process more important than ever, seeing as we often update production services on a weekly or even daily basis.

We are always looking for innovative and talented people at 7Mind. If you feel that you would be a good fit, drop us a line at jobs@7mind.de.
