AIOps For Improving Pod Descheduling

Eric Muccino
Published in Mindboard
Jan 18, 2023

How AIOps Can Improve Pod Descheduling

What is AIOps? Artificial Intelligence for IT Operations (AIOps) broadly refers to the use of Artificial Intelligence (AI) and Machine Learning (ML) techniques to improve, automate, and optimize the management of IT operations. AIOps combines data analytics and automation to identify and resolve issues efficiently.

AIOps analyzes data from various sources, such as log files, performance metrics, and alerts, to identify patterns and trends that point to potential problems with IT systems or processes. With this information, IT issues can be not only detected but also resolved, either through automation rules and scripts or by alerting IT personnel when manual intervention is required.

A key use case of AIOps is Pod Descheduling. In Kubernetes, Descheduling refers to evicting a pod from the node where it is currently running so that it can be rescheduled onto a more suitable node. This can be necessary when nodes are underutilized or overutilized, when the original scheduling decisions no longer hold, when nodes fail, or when new nodes are added to a cluster. Typically, Descheduling is driven by a heuristic-based policy; AI and ML can make Pod Descheduling more efficient.
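To make the idea of a heuristic-based policy concrete, here is a minimal Python sketch of a threshold rule: if some node is underutilized, evict the smallest pods from overutilized nodes until they drop back under a target. The `Node` class, the `pick_evictions` function, and the 20% and 70% thresholds are illustrative assumptions, not part of Kubernetes or of any descheduler API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    cpu_capacity: float                                    # CPU cores available on the node
    pods: Dict[str, float] = field(default_factory=dict)   # pod name -> CPU cores in use

    @property
    def utilization(self) -> float:
        return sum(self.pods.values()) / self.cpu_capacity


def pick_evictions(nodes: List[Node], low: float = 0.2, high: float = 0.7) -> List[Tuple[str, str]]:
    """Return (pod, node) pairs to evict from nodes above the `high` threshold.

    Nothing is evicted unless at least one node is below the `low` threshold,
    because otherwise the evicted pods would have nowhere better to go.
    """
    if not any(n.utilization < low for n in nodes):
        return []
    evictions = []
    for node in nodes:
        load = sum(node.pods.values())
        # Evict the smallest pods first until the node is back under `high`.
        for pod, cpu in sorted(node.pods.items(), key=lambda kv: kv[1]):
            if load / node.cpu_capacity <= high:
                break
            evictions.append((pod, node.name))
            load -= cpu
    return evictions


# Hypothetical cluster: node-a is busy, node-b is nearly idle.
nodes = [
    Node("node-a", 4.0, {"pod-1": 1.5, "pod-2": 1.2, "pod-3": 0.8}),
    Node("node-b", 4.0, {"pod-4": 0.5}),
]
print(pick_evictions(nodes))  # [('pod-3', 'node-a')]
```

A rule like this only reacts to the current state of the cluster. The approaches below try to act on what the cluster is likely to look like next.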

3 Benefits of AIOps for Pod Descheduling

Image: Descheduling Pods — Moving Pod 3 from Node A to Node B

1 - Predictive Descheduling

By analyzing data on pod resource usage, performance, and other metrics, machine learning algorithms can predict which nodes are most suitable for hosting a particular pod. This can optimize resource utilization and minimize the risk of scheduling pods onto overloaded nodes.
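As a rough illustration, the sketch below forecasts each node's next utilization from recent samples and picks the node that would be least loaded after receiving the pod. It uses an exponentially weighted moving average as a simple stand-in for a trained ML model. The function names, the sample data, and the assumption that utilization is expressed as a fraction of capacity are all hypothetical.

```python
def ewma_forecast(samples, alpha=0.5):
    """One-step-ahead forecast using an exponentially weighted moving average."""
    forecast = samples[0]
    for x in samples[1:]:
        forecast = alpha * x + (1 - alpha) * forecast
    return forecast


def best_node(history, pod_request, capacity=1.0):
    """Return the node with the lowest predicted utilization after adding the pod.

    `history` maps node name -> list of past utilization fractions (oldest first).
    Returns None if the pod is not predicted to fit on any node.
    """
    scored = {}
    for node, samples in history.items():
        predicted = ewma_forecast(samples) + pod_request
        if predicted <= capacity:
            scored[node] = predicted
    return min(scored, key=scored.get) if scored else None


# Hypothetical history: node-a is trending up, node-b is trending down.
history = {
    "node-a": [0.5, 0.7, 0.9],
    "node-b": [0.6, 0.4, 0.2],
}
print(best_node(history, pod_request=0.2))  # node-b
```

In practice the forecast would come from a model trained on richer features than a single utilization series, but the decision step (score each node on its predicted state, then choose) keeps the same shape.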

2 - Predictive auto-scaling

Machine learning models can predict the future resource requirements of a pod from its past usage patterns and other factors, allowing Kubernetes to automatically scale pods up or down in response to changes in demand. Predicting how pods will scale can also help identify pods that are good candidates for Descheduling.
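One simple way to picture this is a trend forecast that feeds a replica count. The sketch below fits a least-squares line to past load samples, extrapolates one step ahead, and converts the result into a number of replicas. The linear trend is a stand-in for a real ML model, and `linear_forecast`, `replicas_needed`, and the per-replica capacity figure are assumptions made for illustration.

```python
import math


def linear_forecast(samples, steps_ahead=1):
    """Extrapolate a least-squares line fitted to evenly spaced samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


def replicas_needed(samples, per_replica_capacity, steps_ahead=1):
    """Replica count required to serve the forecast load (never fewer than 1)."""
    predicted = max(linear_forecast(samples, steps_ahead), 0.0)
    return max(1, math.ceil(predicted / per_replica_capacity))


# Hypothetical load in requests per second, sampled at a fixed interval.
load = [100, 120, 140, 160]
print(linear_forecast(load))                           # 180.0
print(replicas_needed(load, per_replica_capacity=50))  # 4
```

A pod whose forecast footprint is about to outgrow its current node is a natural Descheduling candidate, since moving it early avoids contention later.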

3 - Anomaly detection

Anomaly detection is critical in IT operations. AI and ML algorithms can detect anomalous or unusual patterns in resource usage, container hardware requirements, taints and labels, or other metrics that indicate a potential problem with a pod or its hosting node. This helps IT teams identify and resolve issues before they become serious problems.
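A very basic form of this check is a z-score test: compare a new metric sample against a baseline of recent "normal" samples and flag it if it lies too many standard deviations from the mean. This is a statistical baseline rather than a learned model, and the function name, the threshold of 3, and the sample values below are illustrative assumptions.

```python
from statistics import mean, stdev


def is_anomalous(baseline, value, threshold=3.0):
    """Flag `value` if it is more than `threshold` standard deviations
    away from the mean of the `baseline` samples."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical CPU utilization fractions for one pod.
baseline = [0.50, 0.52, 0.48, 0.51, 0.49]
print(is_anomalous(baseline, 0.95))  # True
print(is_anomalous(baseline, 0.53))  # False
```

Production systems typically go further, for example by accounting for seasonality or by combining several metrics, but the output is used the same way: a flagged pod or node becomes a candidate for investigation or Descheduling.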

The use of AIOps for Pod Descheduling in Kubernetes can improve decision-making and help ensure that pods are scheduled onto suitable nodes. AIOps also enables efficient auto-scaling and robust anomaly detection.
