What Happens When Deleting a Pod

Meng Yan
Aug 31, 2022


Whenever we remove a pod from Kubernetes, what does Kubernetes do to prevent outside traffic from reaching the dying pod? How does the pod sense that it is about to be removed and perform a graceful shutdown? And what is the order of, and the relationship between, these actions and the lifecycle hooks?

In this post, I will first walk through the whole pod deletion process, and then get hands-on to verify three scenarios:

  1. The timing of postStart and preStop execution in the main container of a pod.
  2. How terminationGracePeriodSeconds affects preStop and the graceful shutdown.
  3. Whether the API Server can be requested during a pod's graceful shutdown.

Pod Deletion Process

As shown above, when you run kubectl delete pod, the API Server updates the pod record in etcd, for example by adding deletionTimestamp and deletionGracePeriodSeconds. Based on the updated record, the pod is displayed with the Terminating status. Next, two processes run in parallel.

  • First, the endpoint controller observes that the pod is marked as Terminating. It then removes the pod's endpoint from the associated Service to prevent external traffic from reaching the pod through the Service. After that, the endpoint is removed from kube-proxy, iptables, Ingress, CoreDNS, and everything else that holds the endpoint information.
  • Meanwhile, the kubelet is notified that the pod has been updated (Terminating). If a preStop hook exists, it is executed; if not, the kubelet immediately sends a SIGTERM signal to the main container. After waiting for the graceful shutdown period, which is set by terminationGracePeriodSeconds (30 seconds by default), the container is forcibly stopped. Finally, the API Server removes the pod from etcd completely.

The endpoint controller flow and the pod shutdown flow happen independently, so the pod IP may still be in use before it has been removed from kube-proxy, iptables, and the other consumers. If the main container receives SIGKILL and stops in the meantime, it cannot finish those in-flight requests. One solution is to extend the graceful shutdown period, for example with kubectl delete pod name --grace-period=100. This leaves a larger gap between the endpoint being removed from all the consumers and the pod being deleted.

Scenario 1: The Timing of postStart and preStop Execution
Step 1: Create the main container with the files main.go, postStart.sh and preStop.sh

postStart=5s preStop=5s terminationGracePeriodSeconds=10s gracefulShutdown=2s

main.go
preStop.sh
postStart.sh
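For reference, here is a minimal sketch of what such a main container could look like. It assumes the process prints # once per second while running and spends two seconds on cleanup after SIGTERM, matching the gracefulShutdown=2s parameter above. The helper name gracefulShutdown and the one-second tick are assumptions for illustration, not the original main.go.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// gracefulShutdown simulates cleanup work that takes `steps` ticks and
// returns the number of ticks it completed. (Hypothetical helper.)
func gracefulShutdown(steps int, tick time.Duration) int {
	done := 0
	for i := 0; i < steps; i++ {
		time.Sleep(tick)
		done++
		log.Printf("graceful shutdown: %d/%d", done, steps)
	}
	return done
}

func main() {
	// Ask the Go runtime to deliver SIGTERM (sent by the kubelet) on a channel
	// instead of terminating the process immediately.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			log.Println("#") // the main container is running
		case s := <-sigs:
			log.Printf("received %v, starting graceful shutdown", s)
			gracefulShutdown(2, time.Second) // assumed gracefulShutdown=2s
			log.Println("graceful shutdown finished, exiting")
			return
		}
	}
}
```

Running it locally and pressing Ctrl+C (SIGINT) exercises the same path as the SIGTERM sent inside the pod.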

Step 2: Wrap all the files with a Dockerfile

Dockerfile

Step 3: Deploy the Pod with the main container and hooks

sample.yaml

As shown above, we deploy the Pod and delete it after a few seconds. I captured the logs and plotted the logging flow, where # indicates that the main container is running.

Pod.log
Scenario 1 flow chart

We can see that postStart and the main container run at the same time. After preStop finishes, the Pod receives a SIGTERM signal. The graceful shutdown then starts, and when it is done, the container process ends as expected.
It is worth noting that the main container's graceful shutdown (2s) is shorter than terminationGracePeriodSeconds (10s), so the main container shuts down gracefully.

Scenario 2: terminationGracePeriodSeconds & preStop & graceful shutdown

Step 1: Extend preStop to 15 seconds, so the parameters are as follows.

postStart=5s preStop=15s terminationGracePeriodSeconds=10s gracefulShutdown=2s

preStop.sh
pod.log

We can see that terminationGracePeriodSeconds is the duration between the start of preStop and the receipt of SIGTERM. After that, preStop kept running until the main container shut down.

Step 2: Set gracefulShutdown to more than terminationGracePeriodSeconds

postStart=5s preStop=8s terminationGracePeriodSeconds=5s gracefulShutdown=8s

pod.log

The following conclusions can be drawn from the above logs.

  1. When the container receives SIGTERM depends on preStop and terminationGracePeriodSeconds. In a nutshell, the delay before SIGTERM = Min(preStop, terminationGracePeriodSeconds).
    To be specific, if preStop < terminationGracePeriodSeconds, SIGTERM arrives once preStop has finished. If preStop >= terminationGracePeriodSeconds, SIGTERM arrives once terminationGracePeriodSeconds has elapsed.
  2. After receiving SIGTERM, the pod begins to shut down. If it is still running when SIGKILL arrives, it is stopped abruptly; the maximum duration is terminationGracePeriodSeconds.
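The first conclusion can be written down as a small function. This is only a sketch of the Min(preStop, terminationGracePeriodSeconds) rule observed in these logs, applied to the parameter sets used so far; the function name sigtermDelaySeconds is my own, and the rule is an empirical summary rather than a statement of how the kubelet is implemented.

```go
package main

import "fmt"

// sigtermDelaySeconds returns how many seconds after preStop starts the main
// container receives SIGTERM, using the rule
// Min(preStop, terminationGracePeriodSeconds). (Hypothetical helper.)
func sigtermDelaySeconds(preStop, terminationGracePeriod int) int {
	if preStop < terminationGracePeriod {
		// preStop finishes first; SIGTERM follows it.
		return preStop
	}
	// The grace period runs out while preStop is still running.
	return terminationGracePeriod
}

func main() {
	scenarios := []struct {
		name    string
		preStop int
		grace   int
	}{
		{"Scenario 1", 5, 10},
		{"Scenario 2, step 1", 15, 10},
		{"Scenario 2, step 2", 8, 5},
	}
	for _, s := range scenarios {
		fmt.Printf("%s: SIGTERM after %ds\n",
			s.name, sigtermDelaySeconds(s.preStop, s.grace))
	}
}
```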

Scenario 3: Request the API Server during the Pod graceful shutdown

postStart=5s preStop=5s terminationGracePeriodSeconds=8s gracefulShutdown=10s

main.go

Since the pod needs to connect to the API Server, we need to grant some privileges to the Pod's ServiceAccount named default in the default namespace.
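A minimal sketch of such a check, using only the Go standard library, might look like the following. It assumes the test lists pods in the default namespace once per second after SIGTERM; which API call the experiment actually made is not shown here, so the endpoint, the helper name buildPodListRequest, and the ten-iteration loop (chosen to match gracefulShutdown=10s) are assumptions. The token path, CA path, and KUBERNETES_SERVICE_HOST/KUBERNETES_SERVICE_PORT variables are the standard in-cluster ones.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net"
	"net/http"
	"os"
	"os/signal"
	"strings"
	"syscall"
	"time"
)

// Standard mount point of the ServiceAccount credentials inside a pod.
const saDir = "/var/run/secrets/kubernetes.io/serviceaccount"

// buildPodListRequest prepares an authenticated GET that lists the pods in a
// namespace. (Hypothetical helper; the endpoint is an assumption.)
func buildPodListRequest(host, port, namespace, token string) (*http.Request, error) {
	url := "https://" + net.JoinHostPort(host, port) +
		"/api/v1/namespaces/" + namespace + "/pods"
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// Outside a cluster the credentials are missing, so the sketch exits early.
	token, err := os.ReadFile(saDir + "/token")
	if err != nil {
		log.Printf("not running in a pod: %v", err)
		return
	}
	caPEM, err := os.ReadFile(saDir + "/ca.crt")
	if err != nil {
		log.Printf("cannot read cluster CA: %v", err)
		return
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		log.Println("no CA certificates found")
		return
	}
	client := &http.Client{
		Timeout:   3 * time.Second,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}},
	}

	// Block until the kubelet sends SIGTERM, then probe during the shutdown window.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	<-sigs
	log.Println("received SIGTERM, requesting the API Server during shutdown")

	for i := 1; i <= 10; i++ {
		req, err := buildPodListRequest(
			os.Getenv("KUBERNETES_SERVICE_HOST"),
			os.Getenv("KUBERNETES_SERVICE_PORT"),
			"default", strings.TrimSpace(string(token)))
		if err != nil {
			log.Printf("cannot build request: %v", err)
			return
		}
		resp, err := client.Do(req)
		if err != nil {
			log.Printf("request %d failed: %v", i, err)
		} else {
			resp.Body.Close()
			log.Printf("request %d: %s", i, resp.Status)
		}
		time.Sleep(time.Second)
	}
}
```

With terminationGracePeriodSeconds shorter than the loop, the process is expected to be killed before the last iteration, so the log only shows the requests made before SIGKILL.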

pod.log

Finally, we verified that the Pod is always able to request the API Server during the graceful shutdown.
