Your Heroku app deserves 30 seconds to live

Published in CollegeVine Product · 4 min read · Nov 16, 2022

Do you host your app on Heroku? Are you deploying via Docker containers? Not seeing your workloads get killed without notice? Look again: they probably are.

Strap in. This is a murder mystery.

Disclaimer: everything described here is true as of November 2022. If you’re reading this much later, Heroku may have fixed issues or updated docs to reflect reality.

It was a small toy application. Just a trivial Node.js HTTP server, responding “Hello!” to every request.

But this small application had big ambitions: one day, when it grew up, it wanted to be a big application, with some long-running workloads. It dreamed, and it played, imagining how it would wait for the workloads to finish before exiting gracefully.

The small application knew that Heroku, being big, strong, and kind, takes care of little applications. When Heroku needs to redeploy or downscale, it asks the application to shut down politely, by sending it a SIGTERM and then giving it 30 seconds to clean up and wrap up. Heroku said so in its docs.

So the application trapped SIGTERM and waited 3 seconds before exiting, imitating a big application shutdown in the most adorable way. The application was so proud!
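
For the curious, here is roughly what such a toy application could look like. This is a minimal sketch, not the original run.js: the function names, the injectable `log` and `exit` parameters, and the choice to listen only when `PORT` is set are all assumptions made for illustration. The two log messages are the ones that show up in the investigation.

```javascript
// run.js: a toy server in the spirit of the one in the story (a sketch, not the original file).
const http = require("http");

const SHUTDOWN_DELAY_MS = 3000; // the 3-second pretend cleanup

// Responds "Hello!" to every request.
function createServer() {
  return http.createServer((req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("Hello!");
  });
}

// Logs, waits, then exits with status 0.
// `log` and `exit` are injectable (an assumption of this sketch) so the
// handler can be exercised without actually killing the process.
function onSigterm(log = console.log, exit = process.exit, delayMs = SHUTDOWN_DELAY_MS) {
  log("Got SIGTERM, waiting 3 seconds…");
  return setTimeout(() => {
    log("Exiting gracefully");
    exit(0);
  }, delayMs);
}

// Heroku supplies PORT to web dynos; this sketch only starts listening when it is set.
if (process.env.PORT) {
  createServer().listen(Number(process.env.PORT));
  process.on("SIGTERM", () => onSigterm());
}
```

Run locally with `PORT=3000 node run.js` and send the process a SIGTERM (for example with `kill <pid>`), and both messages should appear about three seconds apart.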

Alas, it was not to be. The application didn’t even know what happened. All it saw was a SIGTERM, and then, without a 3-second timeout — darkness.

This is how I found it. Force-terminated without notice. Without the promised 30 seconds. Murdered.

First things first: witness testimony. Heroku logs paint a dark picture:

heroku[web.1]: Stopping all processes with SIGTERM
heroku[web.1]: Process exited with status 143
app[web.1]: Got SIGTERM, waiting 3 seconds…
heroku[web.1]: Starting process with command `/bin/sh -c node\ run.js`
heroku[web.1]: State changed from starting to up

Suspicious thing #1: a conspicuous absence of the “Exiting gracefully” message.
Suspicious thing #2: exit code is 143 instead of zero.
Suspicious thing #3: exit code printed before the “Got SIGTERM” message!
The conclusion is incontrovertible: the little application was killed without a 3-second timeout, and somebody else’s exit code was planted instead.

Second order of business: catalog the scene of the crime.

> ps -aux

USER       PID %CPU %MEM STAT START   TIME COMMAND
nobody       1  0.5  0.0 Ss   04:18   0:00 ps-run
u17082       3  0.0  0.0 S    04:18   0:00 /bin/sh -c node run.js
u17082       5 36.0  0.0 Sl   04:18   0:00 node run.js

PID=1 is Heroku’s bootloader.

PID=5 is my poor little application, running under Node.

But what is PID=3? Where did that come from? Hmm…

A little detective work and some high-volatility local experiments reveal the truth (which, in retrospect, I should have known): PID=3 is the sh process, in which the node command was wrapped. Docker does this whenever CMD is written in shell form, wrapping the command in sh -c, so that the environment variables can be expanded and other shell goodies provided.

And from here, the murder weapon becomes clear: death of container mediated by SIGTERM to the sh. When Heroku needs to stop a dyno, it sends SIGTERM to every process in the container, including the sh process. And since sh doesn’t trap SIGTERM, it exits immediately with code 143. And because sh is the first process in the container, the container exits too.
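
About that status 143: by common convention, a process terminated by a signal is reported with status 128 plus the signal number, and SIGTERM is signal 15. A small sketch that checks the arithmetic using Node's `os.constants.signals` (the helper name `exitStatusForSignal` is made up for illustration):

```javascript
const os = require("os");

// Conventional status reported for a process terminated by a signal:
// 128 + the signal's number.
function exitStatusForSignal(signalName) {
  const signalNumber = os.constants.signals[signalName];
  if (signalNumber === undefined) {
    throw new Error(`Unknown signal: ${signalName}`);
  }
  return 128 + signalNumber;
}

console.log(exitStatusForSignal("SIGTERM")); // 143, since SIGTERM is signal 15
```

In other words, 143 is the signature of something dying from SIGTERM without handling it, which is not what an application that traps the signal and exits cleanly would report.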

RIP little application.

How do we honor the little application’s memory? By telling its story. And by preventing the same thing from happening to others.

The key thing to understand here is that containers on Heroku work ever so slightly differently than containers on your local machine. This is because Heroku needs to insert its magic to control the containers, and to do that, it extracts the CMD (and also ENTRYPOINT) parameter from the container and messes with it in its mysterious ways.

After countless experiments, I have determined that, to avoid creating the extra sh process, the One True Way is the following:

FROM node
ENTRYPOINT ["node"]
CMD ["run.js"]

All hail the One True Way. The end.

Q: Wait, don’t go! Do I have to use the “exec form” (i.e. the brackets) for ENTRYPOINT and CMD? Can’t I use “shell form”, like ENTRYPOINT node and CMD run.js?
A: No, you can’t. If you do that, Heroku, working in its ever mysterious ways, will completely ignore ENTRYPOINT and attempt to start the dyno with sh -c run.js

Q: Can I put run.js inside ENTRYPOINT, as in ENTRYPOINT ["node", "run.js"], and drop CMD completely?
A: This will work on the surface, except your process.argv (i.e. parameters of the program) will duplicate and end up being ["node", "run.js", "node", "run.js"] for some reason. If you don’t care about your parameters, go right ahead!
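
If your program does read its arguments, it may be worth logging them at startup to see whether the duplication affects you. A minimal sketch; `extraArgs` is a hypothetical helper, and it relies only on the fact that Node puts the interpreter path and the script path in the first two slots of `process.argv`:

```javascript
// Everything after the interpreter path and the script path counts as
// extra arguments passed to the program.
function extraArgs(argv = process.argv) {
  return argv.slice(2);
}

// Log both views at startup so any surprise arguments are visible in the dyno logs.
console.log("argv:", process.argv);
console.log("extra args:", extraArgs());
```

With the duplicated shape described above, `extraArgs(["node", "run.js", "node", "run.js"])` returns `["node", "run.js"]`, where a clean start would give an empty array.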

Q: Can I do ENTRYPOINT ["sh"] and then CMD ["-c", "exec", "node", "run.js"]? In theory, exec should replace the sh process, making Node the first process in the container.
A: In theory yes, but in Heroku’s mysterious ways — no. I can’t say why, because there is no visibility into the container when my program isn’t running, but for some reason, this leads to:

Starting process with command `-c exec node run.js`
Process exited with status 0
State changed from starting to crashed

In general, I have discovered that making sh the entry point doesn’t work in any configuration.

Q: What if my application is not in Node? How do I start it then?
A: Well, it depends on your application of course, but in general, one can roughly translate between platforms. For example, if your app is on Rails, bundle can be seen as an equivalent of node, so you might want to do this:

ENTRYPOINT ["bundle"]
CMD ["exec", "rails", "server"]

Or even this:

ENTRYPOINT ["bin/rails"]
CMD ["server"]

More generally, you can make any file an entry point, as long as it’s either an executable image or has a shebang. So in the most general case, you can just pack everything into a script file and make that file the ENTRYPOINT.


Written by Fyodor Soikin