Stopping a Docker Container on COS

Published in Google Cloud - Community, Feb 26, 2023 (7 min read)

Consider starting a Compute Engine instance that we wish to use to run a Docker container. Both the Google Cloud console and the gcloud commands provide nice wrappers to achieve that for us. We can specify a repository image and, on Compute Engine startup, the image in the repository is retrieved and launched on our behalf. Now imagine a corresponding request to shut down the Compute Engine instance. What we find is that the Docker subsystem is terminated as part of the shutdown process but, and this is the key item, the individual containers running on the machine are not gracefully terminated. Instead, they have their legs cut out from under them. They have no opportunity to quiesce themselves, which could be problematic. With no graceful shutdown, the containers can’t clean up: no connections closed, no files purged before closing, no transactions completed.

What we ideally want to happen is that the containers are cleanly stopped before the machine terminates. Looking at the normal way to stop a container within a Docker environment, we run:

docker stop CONTAINER_NAME

What this command does is send a SIGTERM signal to the main process (PID 1) in the container, informing it that it should stop. If the process has not exited after a grace period (10 seconds by default), Docker follows up with SIGKILL. The process hosted in the container can trap the SIGTERM, perform a clean shutdown and, when the process ends, the docker stop command completes and the remaining system shutdown steps continue. With this notion in mind, we can turn to how to run this command at Compute Engine instance shutdown time. Our first thought was to create a Compute Engine shutdown script that would run the command. Sadly, that doesn’t work. By experimentation, we found that by the time the shutdown script is executed, the Docker subsystem has already been terminated and it is too late to shut down the container cleanly.

Fortunately, a solution is available. By careful study of this documentation article, we learned about the cloud-init subsystem and found a recipe that uses it to achieve our goal. Before delving into the actual solution, let’s take a few moments to discuss the concepts.

A subsystem called cloud-init has been developed and is available as open source. Its purpose is to perform cloud VM initialization. Think of it as owning the steps to configure a newly created VM started in a cloud environment. This technology is used by a variety of cloud vendors including Google. The cloud-init subsystem is present in Container-Optimized OS (COS), which is the machine image we will use to host the Docker engine that will run our container.

The way cloud-init works is to read configuration data to gather its instructions on what to do. When we configure a Compute Engine instance, we can explicitly set a metadata variable called user-data to text that is a cloud-init configuration. When the Compute Engine instance boots, cloud-init runs, retrieves the configuration from the user-data metadata, interprets it and performs the corresponding tasks. The configuration of cloud-init has a rich set of commands but we need only concern ourselves with a subset:

users — Create a local user definition
write_files — Create/write a new file on the local file system
runcmd — Run a command
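To see what cloud-init will actually read, the user-data value can be fetched from the Compute Engine metadata server from inside the VM. The sketch below is a minimal illustration; the function names are hypothetical, and fetch_user_data only works when run on a Compute Engine instance.

```python
import urllib.request

# Metadata server endpoint for the custom "user-data" instance attribute.
# It is only reachable from inside a Compute Engine VM.
METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/attributes/user-data")

def build_user_data_request(url=METADATA_URL):
    """Build the request; the metadata server requires the Metadata-Flavor header."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})

def fetch_user_data(timeout=5.0):
    """Return the user-data text. Call this only on a Compute Engine VM."""
    with urllib.request.urlopen(build_user_data_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def looks_like_cloud_config(text):
    """A cloud-config document must begin with the #cloud-config header line."""
    return text.startswith("#cloud-config")
```

A quick check such as looks_like_cloud_config(fetch_user_data()) can catch the common mistake of omitting the header line, in which case the data is not treated as cloud-config.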

Rather than keep you in suspense, here is the cloud-init data that we are going to use:

#cloud-config

users:
- name: cloudservice
  uid: 2000

write_files:
- path: /etc/systemd/system/docker.service.d/override.conf
  permissions: 0644
  owner: root
  content: |
    [Service]
    ExecStop=/usr/bin/docker stop mycloudservice
- path: /etc/systemd/system/cloudservice.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Description=Start a simple docker container
    Wants=gcr-online.target
    After=gcr-online.target

    [Service]
    Environment="HOME=/home/cloudservice"
    ExecStartPre=/usr/bin/docker-credential-gcr configure-docker --registries us-central1-docker.pkg.dev
    ExecStart=/usr/bin/docker run --rm --pull=always --user 2000 --name=mycloudservice us-central1-docker.pkg.dev/test1-305123/docker-shutdown/java-shutdown
    ExecStop=/usr/bin/docker stop mycloudservice
    ExecStopPost=/usr/bin/docker rm mycloudservice

runcmd:
- systemctl daemon-reload
- systemctl start cloudservice.service

We will now explain it and hopefully you will find it makes sense and is actually not as scary as it may first appear. The first section is called users.

users:
- name: cloudservice
  uid: 2000

This causes a new userid to be created within the Compute Engine. The userid will be called “cloudservice” and will be given the UID value of 2000.

Next we have a section called write_files:

write_files:
- path: /etc/systemd/system/docker.service.d/override.conf
  permissions: 0644
  owner: root
  content: |
    [Service]
    ExecStop=/usr/bin/docker stop mycloudservice
- path: /etc/systemd/system/cloudservice.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Description=Start a simple docker container
    Wants=gcr-online.target
    After=gcr-online.target

    [Service]
    Environment="HOME=/home/cloudservice"
    ExecStartPre=/usr/bin/docker-credential-gcr configure-docker --registries us-central1-docker.pkg.dev
    ExecStart=/usr/bin/docker run --rm --pull=always --user 2000 --name=mycloudservice us-central1-docker.pkg.dev/test1-305123/docker-shutdown/java-shutdown
    ExecStop=/usr/bin/docker stop mycloudservice
    ExecStopPost=/usr/bin/docker rm mycloudservice

This creates two new files on the Compute Engine instance. The first is called /etc/systemd/system/docker.service.d/override.conf. The content of the file is:

[Service]
ExecStop=/usr/bin/docker stop mycloudservice

What this file provides is an override to the systemd service that starts and stops Docker. What it adds is a request to stop a named container when the Docker subsystem is asked to end. This is what gives us our clean shutdown.

The second file is called /etc/systemd/system/cloudservice.service. The configuration sets the Linux permissions on it, makes it owned by root and sets the content of the file to be:

[Unit]
Description=Start a simple docker container
Wants=gcr-online.target
After=gcr-online.target

[Service]
Environment="HOME=/home/cloudservice"
ExecStartPre=/usr/bin/docker-credential-gcr configure-docker --registries us-central1-docker.pkg.dev
ExecStart=/usr/bin/docker run --rm --pull=always --user 2000 --name=mycloudservice us-central1-docker.pkg.dev/test1-305123/docker-shutdown/java-shutdown
ExecStop=/usr/bin/docker stop mycloudservice
ExecStopPost=/usr/bin/docker rm mycloudservice

This will need some explanation but we’ll come back to that later.

The final section in the cloud-init data is:

runcmd:
- systemctl daemon-reload
- systemctl start cloudservice.service

This causes two Linux commands to run in sequence. Again, we will describe these commands shortly. My goal is for you to realize that this input configuration data is passed to cloud-init when the Compute Engine instance boots and causes the previously described actions to be performed upon first boot.
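One way to hand the configuration to a new instance is the gcloud --metadata-from-file flag, which loads the user-data value from a local file. The sketch below only assembles the command line; the instance name, zone and file name are hypothetical placeholders, and cos-stable / cos-cloud are the public COS image family and project.

```python
import shlex

def build_create_command(instance_name, user_data_path, zone):
    """Assemble a gcloud command that creates a COS VM with cloud-init user-data."""
    return [
        "gcloud", "compute", "instances", "create", instance_name,
        "--zone", zone,
        "--image-family", "cos-stable",      # Container-Optimized OS image family
        "--image-project", "cos-cloud",      # project that publishes COS images
        "--metadata-from-file", f"user-data={user_data_path}",
    ]

if __name__ == "__main__":
    # Hypothetical names; print the command so it can be reviewed before running.
    cmd = build_create_command("cos-shutdown-demo", "cloud-init.yaml", "us-central1-a")
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch free of side effects; the list could be passed to subprocess.run once the values have been checked.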

Now we can focus on the core of what we wish to achieve. There is a subsystem in Linux called systemd. Its purpose is to start and supervise the background services (units) that make up the running environment. Loosely, you can think of it as having the ability to launch a new program in the background. However, beyond simply starting a program, systemd can track the status of that program and manage more of its lifecycle. For example, we can instruct systemd how to start a program, how to stop a program and what to do after the program has stopped. Importantly, when a Linux machine is told to shut down, systemd first stops the services it previously started. We can now see how this can be useful to us. If we ask systemd to start our Docker container then, when the Compute Engine instance is shutting down, systemd can also stop our Docker container.

One of the directories from which systemd reads its unit definitions is /etc/systemd/system. Each unit file in that directory corresponds to a unit (an application) and is a plain text file containing the systemd instructions for that particular unit. Looking back at our cloud-init instructions, we see the creation of a new file called /etc/systemd/system/cloudservice.service. Inside this file are the instructions necessary to start a Docker container, cleanly shut it down and clean up after it once it has stopped.

Finally, if we look at the runcmd section of the cloud-init configuration, we see it runs two commands. The first command is systemctl daemon-reload, which causes systemd to re-read its configuration files. We need this because we want to be sure that systemd sees the new files just created. The second command is systemctl start cloudservice.service, which asks systemd to start the service called cloudservice. systemd then invokes the command to start the Docker container.

This may seem like a lot of steps but it is actually quite elegant and uses each of these tools in the way it was designed to be used. The most important trick in the above is to orchestrate the Docker container shutdown in the right sequence. We had hoped to achieve our goal using simple startup and shutdown scripts, but we found that the Docker subsystem is already stopped by the time the shutdown scripts run. This was solved through the recipe described above, guided by this Stack Overflow article.

Catching Java

A Java application runs under a Java Virtual Machine (JVM). When the container hosting the Java application is cleanly stopped, a SIGTERM signal is sent to the JVM, which causes it to begin shutting down. We can hook into the shutdown of the JVM using Runtime.getRuntime().addShutdownHook(). This method takes an unstarted Java Thread as a parameter. When the JVM is asked to shut down, the thread is started and the JVM does not exit until the thread has finished. We can use this to stop any ongoing work and then exit the thread.

Runtime.getRuntime().addShutdownHook(new Thread() {
  @Override
  public void run() {
    try {
      System.out.println("Inside termination Shutdown Hook");
      workerThread.interrupt(); // Tell the worker thread to end
      System.out.println("Waiting for the main worker to end");
      workerThread.join(60 * 1000); // Wait for the worker thread to end
      System.out.println("The main worker has ended");
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}); // End of shutdown hook

Catching Node

A Node.js application runs under node. We can register a signal handler using the process.on() method.

let terminate = false; // polled by the application's main loop

process.on('SIGTERM', () => {
    console.log("SIGTERM Caught!");
    terminate = true;
});

Catching Python

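A Python application runs under a Python interpreter. We can register a signal handler using the signal.signal() function. Because docker stop delivers SIGTERM, that is the signal we trap.

import signal

terminate = False  # polled by the application's main loop

def handler(signum, frame):
    global terminate
    print("Application signal handler called, signal=", signum)
    terminate = True

signal.signal(signal.SIGTERM, handler)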

Video

Here is the corresponding video illustrating the technique/technology in practice.

Written by Neil Kolban