Whenever we remove a pod from Kubernetes, what does it do to prevent outside traffic from entering the dying pod? How does the pod internally sense that it is about to be removed and perform a graceful shutdown? And what is the sequence and relationship between these actions and the hook?
In this post, I will first walk through the whole process of pod deletion, and then get our hands dirty to verify three scenarios:
- The timing of postStart and preStop execution in the main container of a pod.
- How terminationGracePeriodSeconds affects preStop and graceful shutdown.
- Whether the API Server can be requested during a Pod's graceful shutdown.
Pod Deletion Process
As shown above, when you type kubectl delete pod, the API Server updates the pod record in etcd, for example by adding deletionTimestamp and deletionGracePeriodSeconds. Based on the updated etcd record, the pod is displayed with Terminating status. Next, two processes are carried out in parallel.
- First, the endpoint controller sees that the pod is marked as Terminating. It then removes the pod's endpoint from the associated service, so that external traffic can no longer enter the pod through the service. After that, the endpoint starts being removed from kube-proxy, iptables, Ingress, CoreDNS, and everything else that holds the endpoint information.
- Meanwhile, the kubelet is notified that the pod has been updated (Terminating). If a preStop hook exists, it is executed; if not, the kubelet immediately sends a SIGTERM signal to the main container. After waiting for the graceful shutdown period, which is set by terminationGracePeriodSeconds and defaults to 30 seconds, the container is forcibly stopped. Finally, the API Server removes the pod from etcd completely.
The endpoint controller flow and the pod shutdown flow happen independently, so the Pod IP may still be in use before it has been removed from kube-proxy, iptables and the other consumers. If the main container receives SIGKILL and stops in the meantime, it cannot fulfill those ongoing requests. One solution is to extend the graceful shutdown period, like
kubectl delete pod name --grace-period=100
. This adds a bigger gap between the endpoint being removed from all the consumers and the Pod being deleted.
Scenario 1: The Timing of postStart and preStop Execution
Step 1: Create the main container with the files main.go, postStart.sh and preStop.sh
postStart=5s
preStop=5s
terminationGracePeriodSeconds=10s
gracefulShutdown=2s
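The original main.go is not reproduced here. As a rough sketch of what such a main container could look like, the Go program below prints a # heartbeat while it is running and, once SIGTERM arrives, simulates a graceful shutdown of 2 seconds. The function name run, the stop channel, and the log messages are assumptions made for illustration and are not taken from the demo source.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// run emits a "#" heartbeat every tick until stop is closed, then simulates
// cleanup work for gracefulShutdown before returning.
// (Hypothetical helper; all names here are illustrative.)
func run(stop <-chan struct{}, tick, gracefulShutdown time.Duration, logf func(string)) {
	ticker := time.NewTicker(tick)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			logf("#") // the main container is still running
		case <-stop:
			logf("SIGTERM received, graceful shutdown started")
			time.Sleep(gracefulShutdown) // stand-in for draining requests, closing connections
			logf("graceful shutdown finished")
			return
		}
	}
}

func main() {
	// The kubelet sends SIGTERM to the container's main process.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)

	// Translate the first SIGTERM into a closed stop channel.
	stop := make(chan struct{})
	go func() {
		<-sigs
		close(stop)
	}()

	logf := func(msg string) {
		fmt.Println(time.Now().Format("15:04:05"), msg)
	}
	run(stop, time.Second, 2*time.Second, logf) // gracefulShutdown=2s
}
```

Keeping the loop in run, separate from the signal wiring in main, makes it possible to exercise the shutdown path without sending a real signal.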
Step 2: Wrap all the files with a Dockerfile
Step 3: Deploy the Pod with the main container and hooks
As shown above, we deploy the Pod and delete it after a few seconds. I captured the logs and plotted the logging flow, where # indicates that the main container is running.
We can see that postStart and the main container run at the same time. After preStop comes to an end, the Pod receives a SIGTERM signal. The graceful shutdown then starts, and when it is done, the container process ends as expected.
It is worth noting that the main container’s gracefulShutdown (2s) is shorter than terminationGracePeriodSeconds (10s), so the main container shuts down gracefully.
Scenario 2: terminationGracePeriodSeconds & preStop & graceful shutdown
Step 1: Extend preStop to 15 seconds, so that the parameters are as follows.
postStart=5s
preStop=15s
terminationGracePeriodSeconds=10s
gracefulShutdown=2s
We can see that terminationGracePeriodSeconds is the duration between the start of preStop and the receipt of SIGTERM. After that, preStop continued to run until the main container shut down.
Step 2: Set gracefulShutdown to more than terminationGracePeriodSeconds
postStart=5s
preStop=8s
terminationGracePeriodSeconds=5s
gracefulShutdown=8s
The following conclusions can be drawn from the above logs.
- The timing of receiving SIGTERM depends on preStop and terminationGracePeriodSeconds. In a nutshell, the time until SIGTERM is received = Min(preStop, terminationGracePeriodSeconds). To be specific, if preStop < terminationGracePeriodSeconds, SIGTERM arrives after preStop has finished running. If preStop >= terminationGracePeriodSeconds, SIGTERM arrives once terminationGracePeriodSeconds has elapsed.
- After receiving SIGTERM, the pod begins to shut down. If it has not finished in time, it is stopped forcibly by SIGKILL, with terminationGracePeriodSeconds as the maximum duration.
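To make the first rule concrete, here is a small Go sketch that applies Min(preStop, terminationGracePeriodSeconds) to the parameter sets used in Scenarios 1 and 2. The function name sigtermAfterSeconds is made up for illustration; it only models the behaviour observed in the logs above and is not kubelet code.

```go
package main

import "fmt"

// sigtermAfterSeconds returns how many seconds after deletion starts the main
// container receives SIGTERM, following the rule observed in this post:
// Min(preStop, terminationGracePeriodSeconds).
// (Hypothetical helper; a model of the observations, not kubelet logic.)
func sigtermAfterSeconds(preStop, terminationGracePeriodSeconds int) int {
	if preStop < terminationGracePeriodSeconds {
		return preStop // preStop finishes first, SIGTERM follows it
	}
	return terminationGracePeriodSeconds // the grace period cuts preStop short
}

func main() {
	cases := []struct {
		name                 string
		preStop, gracePeriod int
	}{
		{"Scenario 1", 5, 10},
		{"Scenario 2, step 1", 15, 10},
		{"Scenario 2, step 2", 8, 5},
	}
	for _, c := range cases {
		fmt.Printf("%s: preStop=%ds terminationGracePeriodSeconds=%ds -> SIGTERM after %ds\n",
			c.name, c.preStop, c.gracePeriod, sigtermAfterSeconds(c.preStop, c.gracePeriod))
	}
}
```

For the three parameter sets this gives 5s, 10s and 5s respectively.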
Scenario 3: Request the API Server during the Pod graceful shutdown
postStart=5
preStop=5
terminationGracePeriodSeconds=8
gracefulShutdown=10
Since the pod needs to connect to the API Server, we need to grant some privileges to the Pod’s ServiceAccount, which is the one named default in the default namespace.
Finally, we verified that the Pod is always able to request the API Server during the graceful shutdown.
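As a rough illustration of how a container might call the API Server from inside the pod, the sketch below uses only the Go standard library. It assumes the usual in-cluster setup: the ServiceAccount token and CA certificate mounted under /var/run/secrets/kubernetes.io/serviceaccount, and the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables. Listing pods in the default namespace is an assumed example request, and the function names are hypothetical; the demo may do this differently.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net"
	"net/http"
	"os"
	"strings"
	"time"
)

// Standard mount point of the ServiceAccount credentials inside a pod.
const saDir = "/var/run/secrets/kubernetes.io/serviceaccount"

// newAPIRequest builds a GET request for the API Server that authenticates
// with the ServiceAccount bearer token. (Hypothetical helper.)
func newAPIRequest(host, port, token, path string) (*http.Request, error) {
	url := "https://" + net.JoinHostPort(host, port) + path
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

// listPodsStatus calls the API Server and returns the HTTP status line,
// which could be logged repeatedly while the pod is shutting down.
func listPodsStatus() (string, error) {
	token, err := os.ReadFile(saDir + "/token")
	if err != nil {
		return "", err
	}
	caPEM, err := os.ReadFile(saDir + "/ca.crt")
	if err != nil {
		return "", err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return "", errors.New("no CA certificate found in ca.crt")
	}
	client := &http.Client{
		Timeout:   3 * time.Second,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}},
	}
	req, err := newAPIRequest(
		os.Getenv("KUBERNETES_SERVICE_HOST"),
		os.Getenv("KUBERNETES_SERVICE_PORT"),
		strings.TrimSpace(string(token)),
		"/api/v1/namespaces/default/pods", // assumed example request
	)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Status, nil
}

func main() {
	status, err := listPodsStatus()
	if err != nil {
		// Expected when run outside a cluster, where no credentials are mounted.
		fmt.Println("API Server request failed:", err)
		return
	}
	fmt.Println("API Server responded:", status)
}
```

A request like this only succeeds if the ServiceAccount has been granted permission for it, which is why the privileges mentioned above are needed.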
- Demo Source Code