Cost Optimization — CAST AI Platform for AKS Cost Management

By Chaskarshailesh, published in Javarevisited on Oct 28, 2023

CAST AI is an all-in-one platform for Kubernetes that covers:

  1. Automation
  2. Optimization
  3. Security
  4. Cost management

It abstracts layers of provider-specific technical complexity, so you can manage Kubernetes operations on all three major cloud providers (AWS, Google Cloud, and Azure) with ease.

The platform comes with cost monitoring for real-time and longer-period cost reports at the cluster, namespace, and workload level. It also offers cost optimization suggestions and automatic optimization using autoscaling, spot instance automation, bin packing, and other features.

At the same time, CAST AI checks your cluster's security configuration for misconfigurations and potential vulnerabilities, and automatically prioritizes the fixes to improve your security posture. It also lets you scan your cluster against industry standards, including CIS Benchmarks and many more.

Most important to understand: what is the CAST AI agent?

The CAST AI agent is a component that connects your Kubernetes cluster to the CAST AI platform to enable automation, cost monitoring, and optimization features.

It is read-only and follows the principle of least privilege. That means its access to your data is strictly limited, and it can’t change your cluster configuration without your explicit permission.

The agent code is open-source. You can see it in CAST AI’s GitHub repository. In addition, the team regularly releases updates of the agent.

You can remove the CAST AI agent and all its resources at any time.

What data can the CAST AI read-only agent access?

The agent needs minimal cluster access to deliver meaningful insights. Once connected, it gathers information on how much storage, memory, and CPU units your cluster needs to run efficiently.

Here are the things the agent can access:

  • Main resources such as nodes, pods, and deployments required for running the Available Savings Report.
  • Environment variables of pods, deployments, stateful sets, and daemon sets.

How CAST AI handles sensitive data

CAST AI doesn’t access users' sensitive data. This means that:

  • It doesn’t have access to secrets, config maps, or sensitive environment variables (e.g. containing secrets).
  • Before starting the analysis process, it removes environment variables considered sensitive by their name (passwords, tokens, keys, secrets).

No matter what type of resources your Kubernetes cluster stores, the agent can’t see or access their contents.
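To make the name-based filtering above more concrete, here is a minimal Python sketch. It is not the agent's actual code: the pattern list only reuses the four example words from the text (passwords, tokens, keys, secrets), and the function name and sample variables are made up for illustration.

```python
import re

# Hypothetical pattern built from the example words in the text;
# the real agent may use a different or longer list.
SENSITIVE_NAME_PATTERN = re.compile(r"(password|token|key|secret)", re.IGNORECASE)


def strip_sensitive_env(env_vars: dict[str, str]) -> dict[str, str]:
    """Return a copy of env_vars without entries whose NAME looks sensitive."""
    return {
        name: value
        for name, value in env_vars.items()
        if not SENSITIVE_NAME_PATTERN.search(name)
    }


# Made-up environment variables of a container.
container_env = {
    "DB_PASSWORD": "hunter2",
    "API_TOKEN": "abc123",
    "LOG_LEVEL": "info",
    "REPLICAS": "3",
}

print(strip_sensitive_env(container_env))
# {'LOG_LEVEL': 'info', 'REPLICAS': '3'}
```

Note that the filter looks only at variable names, so a secret stored under a harmless-looking name would slip through a sketch like this one.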

Note: CAST AI is ISO 27001-certified and holds the SOC 2 Type II certification.

Let's explore CAST AI:

Step 1 — Head to console.cast.ai, and open a free account.

Step 2 — Connect your cluster. This step requires installing a read-only agent from your terminal or cloud shell. The console guides you through the process.

CAST AI agent workload and nodes list

Connection Successful.

Step 3 — Explore CAST AI AKS Cluster Dashboard

The dashboard shows the nodes and pods scheduled, along with overall CPU and memory utilization.

It also shows hourly CPU and memory utilization stats.

Step 4 — Run a savings report to see how much you can save by adjusting your cluster configuration settings.

Step 5 — Get even deeper security insights. Onboard the CAST AI security agent to see the vulnerabilities of all your container images and get a deeper analysis of Kubernetes misconfigurations.

Step 6 — Spot nodes added to the cluster later on were detected by CAST AI.

As per AKS

As per CAST AI Dashboard

Cluster Autoscaler was enabled for the spot node pool, and since we did not deploy any workload, the node count was automatically reduced from 3 to 1.

Let's dive deeper. CAST AI’s cost monitoring includes five main sections providing different levels of granularity:

  1. The Cluster report gives you an overview of cluster expenses: compute spend, cost per provisioned resource, average daily cost, and daily compute spend details, including cost per CPU and memory. You also get a forecast of your final monthly bill and the overall change compared to the previous month.

  2. The Workloads report presents the compute cost for each workload, with additional information on its controller type and namespace and the total cost per CPU and memory. You can further filter your results by labels and namespaces. Additionally, Workload Efficiency highlights the difference between the requested and used resources for each workload, helping to put a number on wasted resources.

  3. The Namespaces report provides data on the compute cost for each namespace, including the average CPU and memory requirement per hour and the total cost per resource.

  4. The Allocation Groups report provides insights into the allocation groups you add to your cluster. These custom workload groups allow you to allocate costs by grouping workloads by namespaces or labels.

  5. Cost comparison lets you compare the cost of requested CPUs between different periods to understand the level of delivered savings.
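To give a feel for what the Workloads and Namespaces reports put numbers on, here is a small Python sketch that computes wasted CPU per workload and sums the cost per namespace. All names and figures are made-up sample values, not output from CAST AI, and the real reports are far more detailed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Workload:
    # All fields hold made-up sample values, not CAST AI output.
    name: str
    namespace: str
    requested_cpu: float  # CPU cores requested
    used_cpu: float       # CPU cores used on average
    hourly_cost: float    # compute cost in USD per hour


def wasted_cpu(w: Workload) -> float:
    """CPU cores that were requested but not used."""
    return max(w.requested_cpu - w.used_cpu, 0.0)


def cost_by_namespace(workloads: list[Workload]) -> dict[str, float]:
    """Sum the hourly compute cost of all workloads in each namespace."""
    totals: dict[str, float] = defaultdict(float)
    for w in workloads:
        totals[w.namespace] += w.hourly_cost
    return dict(totals)


workloads = [
    Workload("frontend", "shop", requested_cpu=2.0, used_cpu=0.5, hourly_cost=0.25),
    Workload("backend", "shop", requested_cpu=4.0, used_cpu=3.0, hourly_cost=0.5),
    Workload("metrics", "monitoring", requested_cpu=1.0, used_cpu=1.0, hourly_cost=0.125),
]

for w in workloads:
    print(f"{w.name}: {wasted_cpu(w)} wasted CPU cores")
print(cost_by_namespace(workloads))
```

Grouping by a label instead of a namespace works the same way, which is roughly the idea behind allocation groups.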

One of the key features of CAST AI: the cluster efficiency report

The cluster efficiency report delivers insights on your cluster’s CPU and memory usage and helps you put a number on overprovisioning. If you select the current period, you get real-time insights on your overprovisioned resources.

You can also see exact numbers on your cluster’s provisioned and requested resources, as well as the average hourly cost per CPU and GiB.

Overprovisioning is the share of provisioned resources that you could cut. It follows this formula: overprovisioning = 100% - (requested / provisioned * 100%)

For example, if you provisioned 688 CPUs and requested only 490 CPUs, then overprovisioning would reach a rate of 28.78%.
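The same arithmetic can be checked with a few lines of Python. The function name is my own, and this is only a sketch of the formula above, not how CAST AI computes the figure internally.

```python
def overprovisioning_pct(requested: float, provisioned: float) -> float:
    """Overprovisioning in percent: 100 - (requested / provisioned * 100)."""
    if provisioned <= 0:
        raise ValueError("provisioned must be positive")
    return 100.0 - (requested / provisioned * 100.0)


# The example from the text: 688 CPUs provisioned, 490 CPUs requested.
print(f"{overprovisioning_pct(requested=490, provisioned=688):.2f}%")
# 28.78%
```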

In the current letsailtogether-aks cluster, this is how the efficiency looks:

That’s it for this post. For further details, refer to https://docs.cast.ai/docs/getting-started

Keep learning together and let’s sail together!


I am a Site Reliability Engineer and an aspiring Cloud Solutions Architect, now exploring further into MLOps.