What are Quality of Service (QoS) Classes in Kubernetes?

Kubernetes provides different levels of Quality of Service to pods depending on what they request and what limits are set for them.

Pods that must stay up and perform consistently can request guaranteed resources, while pods with less exacting requirements can run with weaker or no guarantees.

For each resource, a container specifies a request, which is the amount of that resource the system guarantees to the container, and a limit, which is the maximum amount the system allows the container to use.

Scheduling is based on requests and not limits.
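
As a rough illustration of that rule, the sketch below checks whether a pod fits on a node by summing CPU requests only. The names (Container, fits_on_node, allocatable_cpu_m) are hypothetical and chosen for this example; the real scheduler considers every resource and many other constraints.

```python
from dataclasses import dataclass


@dataclass
class Container:
    cpu_request_m: int  # CPU request in millicores (hypothetical field name)
    cpu_limit_m: int    # CPU limit in millicores (ignored for placement)


def fits_on_node(containers, allocatable_cpu_m, already_requested_m=0):
    """Return True if the summed CPU requests fit in the node's remaining CPU."""
    requested = sum(c.cpu_request_m for c in containers)
    return already_requested_m + requested <= allocatable_cpu_m


# Two containers requesting 250m each, with limits far above the node size.
pod = [Container(250, 2000), Container(250, 4000)]
print(fits_on_node(pod, allocatable_cpu_m=1000))  # True
```

The limits add up to 6000m on a 1000m node, yet the pod still fits, because only the 500m of requests counts toward placement.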

Defining Resource Constraints for Pods

CPU

CPU resources are measured in (v)core equivalents. You can specify them as decimals (e.g. 0.5, meaning half a core) or in millicpu (e.g. 500m, also meaning half a core).
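
To make the equivalence concrete, here is a minimal sketch that normalizes both notations to millicores. The helper cpu_to_millicores is a hypothetical name for this example, not a Kubernetes API, and it handles only the two forms mentioned above.

```python
from decimal import Decimal


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "0.5" or "500m" to millicores."""
    if quantity.endswith("m"):
        # Already expressed in millicpu.
        return int(quantity[:-1])
    # Decimal avoids binary floating-point rounding for values like "0.1".
    return int(Decimal(quantity) * 1000)


print(cpu_to_millicores("0.5"))   # 500
print(cpu_to_millicores("500m"))  # 500
```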

Memory

Memory resources are measured in bytes. You specify them as plain integers or as fixed-point numbers with one of the SI suffixes (E, P, T, G, M, k) or their power-of-two equivalents (Ei, Pi, Ti, Gi, Mi, Ki). For example, the following represent roughly the same value: 128974848, 129e6, 129M, 123Mi.
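
The sketch below converts those four example values to bytes so they can be compared. The helper memory_to_bytes and the two suffix tables are hypothetical names for this example; it is a simplified parser, not the one Kubernetes uses.

```python
from decimal import Decimal

# Power-of-two suffixes, checked first because they are two characters long.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
                   "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
# Decimal (SI) suffixes.
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9,
                    "T": 10**12, "P": 10**15, "E": 10**18}


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "129M" or "123Mi" to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[:-len(suffix)]) * factor)
    for suffix, factor in DECIMAL_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[:-len(suffix)]) * factor)
    # No suffix: a plain number of bytes, possibly in exponent form ("129e6").
    return int(Decimal(quantity))


for q in ("128974848", "129e6", "129M", "123Mi"):
    print(q, memory_to_bytes(q))
```

Here 123Mi is exactly 128974848 bytes, while 129e6 and 129M are both 129000000 bytes, which is why the values are only roughly equal.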

“Requests and Limits” and “QoS Classes” are tightly coupled.

How the request and limit are enforced depends on whether the resource is compressible or incompressible.

Compressible Resource Guarantees:
- Kubernetes currently supports only CPU as a compressible resource.
 - Pods are throttled if they exceed their CPU limit. If no limit is specified, pods can use excess CPU when it is available.

Incompressible Resource Guarantees:
- Kubernetes currently supports only memory as an incompressible resource.
 - Pods get the amount of memory they request. If they exceed their memory request, they may be killed when another pod needs the memory; pods that consume less memory than they requested will not be killed.
 - When a pod uses more memory than its limit, the kernel kills the process using the most memory inside one of the pod's containers.
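
The memory rules above can be summarized as a small decision function. This is only a sketch of the described behavior: memory_outcome, its parameters, and the returned strings are hypothetical, and the real decision is made by the kernel and the kubelet, not by code like this.

```python
def memory_outcome(usage, request, limit, node_under_pressure):
    """Describe what may happen to a pod given its memory usage (all in bytes).

    limit may be None when no memory limit is set.
    """
    if limit is not None and usage > limit:
        # Over the limit: the kernel kills the largest process in the pod.
        return "killed: over limit"
    if usage > request and node_under_pressure:
        # Over the request while other pods need memory.
        return "may be killed: over request"
    # At or below the request: the pod keeps its memory.
    return "not killed"


print(memory_outcome(usage=300, request=200, limit=256, node_under_pressure=False))
print(memory_outcome(usage=220, request=200, limit=256, node_under_pressure=True))
print(memory_outcome(usage=150, request=200, limit=256, node_under_pressure=True))
```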

If the system runs out of CPU or memory (which is possible when the sum of limits exceeds machine capacity), Kubernetes should ideally kill the less important containers first.

For each resource, Kubernetes divides containers into three QoS classes: Guaranteed, Burstable, and Best-Effort, in decreasing order of priority.

Guaranteed (QoS)

Guaranteed pods are considered top priority and are guaranteed not to be killed unless they exceed their limits.

A pod is classified as Guaranteed if limits (and, optionally, requests) are set to non-zero values for every resource in every container, and each request that is set equals its limit.

Burstable (QoS)

Burstable pods have some minimal resource guarantee but can use more resources when available. Under system memory pressure, their containers are more likely to be killed once they exceed their requests and no Best-Effort pods exist.

A pod is classified as Burstable if requests (and, optionally, limits) are set to non-zero values for one or more resources in one or more containers, and they are not equal. When limits are not specified, they default to the node capacity.

Best-Effort (QoS)

Best-Effort pods are treated as lowest priority. Their processes are the first to be killed if the system runs out of memory. These containers can, however, use any amount of free memory on the node.

A pod is classified as Best-Effort if no requests or limits are set for any resource in any container.
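
The three classification rules can be sketched as a single function. This is an illustrative reading of the rules in this article, not the Kubernetes implementation: qos_class and the dictionary layout are hypothetical, values are plain integers (millicores or bytes), and only cpu and memory are considered.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Classify a pod, given as a list of {"requests": {...}, "limits": {...}} dicts."""
    # Best-Effort: nothing non-zero is set anywhere.
    any_set = any(
        value
        for c in containers
        for section in ("requests", "limits")
        for value in c.get(section, {}).values()
    )
    if not any_set:
        return "Best-Effort"

    # Guaranteed: every container has a non-zero limit for every resource,
    # and each request that is set equals that limit.
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        for resource in RESOURCES:
            limit = limits.get(resource)
            if not limit:  # missing or zero
                return "Burstable"
            if requests.get(resource, limit) != limit:
                return "Burstable"
    return "Guaranteed"


print(qos_class([{"limits": {"cpu": 500, "memory": 128}}]))  # Guaranteed
print(qos_class([{"requests": {"memory": 64}}]))             # Burstable
print(qos_class([{}]))                                       # Best-Effort
```

Checking Best-Effort first keeps the remaining logic simple: once something is set, the pod is Guaranteed only if every container passes the limit checks, and Burstable otherwise.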

ref: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-qos.md