StatefulSet in K8s
Before deploying an application to production, we should understand its underlying architecture. In this context, an application is usually described as either “stateless” or “stateful”.
A stateless application does not keep a record of previous requests, so each request is handled as if it were completely new. A typical example is an application that only serves static content hosted on the cluster. A stateless application has no persistent storage attached to it, so its pods across the cluster can work independently of each other.
A stateful application, on the other hand, keeps data that must survive between requests; a typical example is a database hosted on K8s. Its pods have persistent volumes (PVs) attached to them, and data has to be replicated among the pods to keep them in sync.
Stateful and stateless applications are deployed using different K8s components: stateless applications are deployed using the Deployment component, and stateful applications are deployed using the StatefulSet component.
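To make the two components more concrete, here is a minimal sketch in Python that builds one manifest of each kind as a plain dictionary and prints the StatefulSet as JSON (kubectl accepts JSON as well as YAML). The application names, the image tags, the "/data/db" mount path, the "1Gi" size and the helper function names are assumptions made for illustration, and the StatefulSet assumes a headless Service with the same name already exists.

```python
import json


def pod_template(app, image):
    """Pod template used by both kinds (app name and image are hypothetical)."""
    return {
        "metadata": {"labels": {"app": app}},
        "spec": {"containers": [{"name": app, "image": image}]},
    }


def deployment_manifest(name, image, replicas):
    """Deployment: interchangeable replicas, no per-pod storage."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": pod_template(name, image),
        },
    }


def statefulset_manifest(name, image, replicas, storage):
    """StatefulSet: stable pod names plus one volume claim per replica."""
    template = pod_template(name, image)
    # Mount the per-pod volume; the path is an assumption for a MongoDB image.
    template["spec"]["containers"][0]["volumeMounts"] = [
        {"name": "data", "mountPath": "/data/db"}
    ]
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service, assumed to exist already
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": template,
            # Each replica gets its own PersistentVolumeClaim from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(statefulset_manifest("mongo", "mongo:7.0", 3, "1Gi"), indent=2))
```

The visible difference between the two manifests is the "serviceName" field and the "volumeClaimTemplates" list, which only the StatefulSet has.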
Why different components for different types of applications?
Just like the Deployment component, the StatefulSet component helps us replicate pods. Both components also let us attach storage to the pods, so what is the difference?
A Deployment creates identical, interchangeable replica pods. The pods are created in no particular order, and each gets a random suffix in its name. For example, the replicas might be named “my-backend-98fxc30e”, “my-backend-78qwefg”, etc. If we scale down, we cannot predict which pod will be picked and deleted. Because the replicas are identical, any of them can handle any request, so requests are simply load-balanced across them.
In a StatefulSet, the replica pods are not identical, so by default they are not created or deleted all at the same time, and they cannot be addressed at random. Each pod gets an ordinal ID, assigned in increasing order starting from 0, and this ID sticks with the pod. If one of the pods dies, the newly created pod gets the same ID. Examples of replicas are “mongo-0”, “mongo-1”, “mongo-2” … “mongo-8”. If we scale down, the pod with the highest ID is removed first, which in this case is mongo-8, then mongo-7, and so on.
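The naming and scale-down rules described above can be sketched in a few lines of Python. This is only a simulation of the ordering, not a call into Kubernetes; the function names are hypothetical, and the pod names are written in lowercase because Kubernetes object names must be lowercase.

```python
def pod_names(statefulset, replicas):
    """Names of the pods in a StatefulSet: <name>-0 up to <name>-(replicas-1)."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def scale_down_order(statefulset, current, target):
    """Pods removed when scaling from `current` to `target` replicas,
    listed in removal order (highest ordinal first)."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(current - 1, target - 1, -1)]


if __name__ == "__main__":
    print(pod_names("mongo", 9))
    print(scale_down_order("mongo", 9, 6))
```

Scaling the nine-replica example down to six replicas removes mongo-8 first, then mongo-7, then mongo-6.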
Why do we need StatefulSet and its features?
Let us consider an example in which MongoDB runs in several pods. We define up front that only the master pod can both read and write data, using its own volume. The slave pods copy the data from the master into their own volumes and serve only read operations. If we allowed write operations through the slave pods, the data would become inconsistent.
Each replica has its own PV, and data is continuously synchronized between the master and the slaves so that new read requests get the most up-to-date data from the slave pods. Ordering matters when scaling up as well: before a new pod is created, all of its predecessors must be running and ready, and once the new pod is ready, it replicates data from the pod before it (N-1). For example, once mongo-4 is ready, it synchronizes data from mongo-3.
The benefit of this sequential ID is that when a pod dies or restarts, the new pod keeps the same identity, so it can retain its state and its role (master or slave). When pods in a StatefulSet are deleted, they are terminated in reverse order, from N-1 down to 0.
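The read/write split can be illustrated with a small conceptual sketch in Python. It is not how a real MongoDB driver routes traffic; the ReplicaRouter class, its method name and the pod names are hypothetical and only show the rule that writes go to the master while reads are spread over the slaves.

```python
class ReplicaRouter:
    """Sends writes to the master and rotates reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next_slave = 0  # index of the slave that serves the next read

    def target_for(self, operation):
        if operation == "write":
            # Only the master may write, otherwise replicas would diverge.
            return self.master
        if operation == "read":
            if not self.slaves:
                # With no slaves available, the master serves reads too.
                return self.master
            target = self.slaves[self._next_slave]
            self._next_slave = (self._next_slave + 1) % len(self.slaves)
            return target
        raise ValueError(f"unknown operation: {operation!r}")


if __name__ == "__main__":
    router = ReplicaRouter("mongo-0", ["mongo-1", "mongo-2"])
    for op in ["write", "read", "read", "read"]:
        print(op, "->", router.target_for(op))
```

A router like this only works if each pod can be addressed individually, which is exactly what the fixed pod identities of a StatefulSet provide.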
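The scale-up rule can be sketched in Python as two small checks. This is a simulation under the assumptions of the example above: the function names are hypothetical, and copying data from the previous pod is something the database setup has to implement itself, not something Kubernetes does automatically.

```python
def can_create(ordinal, ready_ordinals):
    """A pod with this ordinal may be created only when every
    predecessor (0 .. ordinal-1) is already running and ready."""
    return all(i in ready_ordinals for i in range(ordinal))


def sync_source(statefulset, ordinal):
    """Pod that the new replica copies its data from: the one before it.
    The first pod (ordinal 0) has no predecessor, so it returns None."""
    if ordinal == 0:
        return None
    return f"{statefulset}-{ordinal - 1}"


if __name__ == "__main__":
    ready = {0, 1, 2, 3}
    if can_create(4, ready):
        print("mongo-4 can start and will sync from", sync_source("mongo", 4))
```

If any of the earlier pods were missing from the ready set, can_create would return False and the new pod would have to wait.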
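Because a pod in a StatefulSet uses its pod name as its hostname, a container can work out its own role at startup from that name. The sketch below assumes a convention that is not enforced by Kubernetes itself: the pod with ordinal 0 acts as master and all others act as slaves. The function names are hypothetical.

```python
import re
import socket


def ordinal_from_hostname(hostname):
    """Extract the trailing ordinal from a name such as 'mongo-3'."""
    match = re.fullmatch(r"(.+)-(\d+)", hostname)
    if match is None:
        raise ValueError(f"not a StatefulSet-style name: {hostname!r}")
    return int(match.group(2))


def role_for(hostname):
    """Assumed convention: ordinal 0 is the master, the rest are slaves."""
    return "master" if ordinal_from_hostname(hostname) == 0 else "slave"


if __name__ == "__main__":
    # Inside a StatefulSet pod this would print something like 'mongo-2 slave'.
    name = socket.gethostname()
    try:
        print(name, role_for(name))
    except ValueError as err:
        print(err)
```

Since a replacement pod comes back with the same name, it computes the same role again after a restart.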
Containerized platforms such as Kubernetes are not a perfect fit for stateful applications. If we want to host a database in Kubernetes, we have to configure cloning and data synchronization ourselves, and also take care of management and backups. Because of these challenges, we may instead choose a cloud-hosted database service that handles all of this for us.
Thank you for reading! Stay tuned for further blogs on K8s components, and feel free to ask any questions.