Stories by Avinash Goyal on Medium

Accelerating Delivery Excellence with DevOps-infused Agile Release Train (ART)

Avinash Goyal — Mon, 27 Nov 2023 00:14:15 GMT

In the fast-paced realm of Agile development, where synchronization and automation are key, managing an Agile Release Train (ART) with multiple delivery teams is a formidable challenge. Picture this: multiple delivery teams, each fervently working towards their respective business objectives, yet intricately interconnected by shared applications. The need to Plan, Develop, Test, and Release together became not just a preference but a necessity — an urgency to accelerate our delivery cycle.

Why ART and DevOps Matter:
In the complex landscape of modern software development, the challenges of multiple teams working on shared applications require a transformative approach. Here’s why Agile Release Train (ART) with DevOps emerged as not just a solution but the only solution.

At Landmark Group, our journey with Agile Release Train (ART) has been a transformative experience, and after a considerable period of implementation, I would like to share my key thoughts based on my personal experience:

1. Seamless Collaboration:
ART brings together cross-functional teams into a cohesive unit. This structure encourages seamless collaboration, breaking down silos and fostering a shared understanding of business objectives. In the ART meetings, cross-functional teams come together to discuss, align and collaborate on the features to be released to production.

2. Accelerated Delivery Cycle:
The time-boxed nature of ART, coupled with DevOps practices, accelerates the entire development lifecycle. Continuous Integration and Continuous Deployment ensure that code moves swiftly from development to production. Automated builds and deployments let the engineering team run the pipeline several times a day to validate the latest changes.

3. Synchronized Planning:
PI Planning sessions within ART synchronize the efforts of all teams, providing a dedicated forum for planning, alignment, and prioritization. This ensures that every team is marching to the same beat. Each Program Increment gives teams a quarterly backlog to release over 4–6 sprints, and this is where ART plays an important role in keeping multiple agile teams synchronized.

4. Efficient Handling of Dependencies:
DevOps practices integrated into the ART process efficiently manage dependencies. Teams can work concurrently without bottlenecks, reducing wait times and enhancing overall productivity.

5. Continuous Feedback and Improvement:
The Inspect and Adapt (I&A) workshops within ART, fueled by DevOps metrics, create a culture of continuous improvement. This iterative feedback loop ensures that the development process is refined with each cycle.

6. Quality Assurance Through Automation:
Automated testing, an integral part of DevOps, ensures that the quality of deliverables is maintained throughout the development lifecycle. This minimizes the risk of defects and enhances overall software reliability. Automated regression and performance suites minimize manual effort, save time and help ensure that each product build is of high quality.

7. Security as an Inseparable Component:
DevOps practices embed security seamlessly into the development pipeline. The concept of ‘Security as Code’ ensures that security is not an afterthought but an integral part of every development stage. Tools such as SonarLint, which integrates easily with the IDE, and SonarQube help ensure the code is sound from both a quality and a security standpoint.

8. Proactive Monitoring for Predictive Action:
Real-time monitoring, a crucial component of DevOps, provides insights into system health and performance. This proactive approach allows teams to identify and address issues before they escalate. With observability platforms such as AppDynamics and Dynatrace, teams can detect health and performance issues and act on them.

Conclusion:
In facing the challenge of multiple delivery teams collaborating on shared applications, the Agile Release Train with DevOps emerged as the optimal solution. This integrated approach not only addresses the challenge head-on but brings forth a new era of efficiency, collaboration, and accelerated value delivery.

Closing Thoughts:
In the intricate dance of Agile and DevOps, the collaboration of multiple Agile delivery teams within an ART is not just a response to a challenge — it’s a strategic move towards sustained success. As your organization evolves, so too will this DevOps-infused ART, orchestrating DevOps harmony and driving success at scale.

Accelerating Delivery Excellence with DevOps-infused Agile Release Train (ART) was originally published in FAUN.dev() 🐾 on Medium, where people are continuing the conversation by highlighting and responding to this story.

FinOps Cookbook for Kubernetes

Avinash Goyal — Tue, 18 Jul 2023 09:20:56 GMT

FinOps phases

Elasticity, scalability and resiliency are some of the most common reasons to host microservices on Kubernetes today. But as the saying goes, with great power comes great responsibility, and here the challenge is increased cost.

Challenges

Let’s understand why Kubernetes is so complex when it comes to cost optimization:

Shared compute resources: Applications are packaged in pods and run on shared compute resources. The monthly bill from the cloud provider gives no visibility at pod level, so pod-level spend becomes a black hole.
Right-sizing of pods: If pods are not sized with the right requests and limits, nodes end up underutilized and we pay for unused resources.
Right-sizing of persistent volumes: Cost increases if persistent volumes are not sized correctly.
Unclaimed resources: Over time, unclaimed volumes and unassigned resources accumulate in the cluster and burn cost without being in use.
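To make the right-sizing point concrete, here is a minimal Python sketch of how the cost of requested-but-unused CPU could be estimated per pod. The pod names, usage figures and the price per core-hour are all hypothetical values chosen for illustration; they are not taken from our clusters or from any tool.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    cpu_request_cores: float  # CPU reserved through resources.requests
    cpu_used_cores: float     # average CPU actually consumed


def idle_cpu_cost(pods, price_per_core_hour, hours):
    """Return the cost of CPU that was requested but not used, per pod."""
    return {
        p.name: max(p.cpu_request_cores - p.cpu_used_cores, 0.0)
        * price_per_core_hour
        * hours
        for p in pods
    }


# Hypothetical pods and price, for illustration only.
pods = [PodUsage("checkout", 2.0, 0.5), PodUsage("search", 1.0, 0.75)]
waste = idle_cpu_cost(pods, price_per_core_hour=0.04, hours=720)

for name, cost in waste.items():
    print(f"{name}: {cost:.2f} paid for idle CPU")
```

The same idea applies to memory and to persistent volumes: the gap between what is reserved and what is used is what we pay for without benefit.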

We started migrating our microservices to Kubernetes two years ago and saw the cost increase year on year. We now need to make Kubernetes cost optimization and governance a prime focus, so that resources are used efficiently and insights are available for continuously tracking unused resources.

Solution

We broke this journey into two phases:

Phase I

In the first phase, our focus was to find a solution that not only gives insight into our cost usage but also recommends what needs to be done. After exploring multiple options, we decided to go with Kubecost.

Why Kubecost?

Provides complete visibility of Kubernetes spend through cost allocation, optimization recommendations and governance (Picture 1).
Provides estimated savings against key recommendations (Picture 2 and 3).
Fully on-premise deployment that does not require egressing any data to a remote service.

Picture 1 — Overview page

Picture 2 — Savings page

Picture 3 — Recommendation page

After acting on these recommendations, we were surprised to see the overall Kubernetes spend drop by 44%.

Phase II

Once we had right-sized the resources and cleaned up the abandoned and unclaimed resources, we realized there was further scope for cost optimization: running the non-prod Kubernetes clusters with zero to minimal load during non-business hours.

To achieve this, we needed a solution that scales down the pods in a cluster so that the cluster autoscaler can reduce the number of worker nodes during non-business hours, and then scales the pods back up and restores the worker nodes to their original state during business hours.

With Kube-green, we were able to suspend pods and scale down our Kubernetes clusters during non-business hours.

Kube-green, in short, is a Kubernetes controller that defines a Custom Resource Definition called SleepInfo, which decides when to stop and start the pods in a namespace. We decided to suspend pods for 8 hours on weekdays and for the full day on weekends.
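As a rough illustration of what this schedule means, the Python sketch below computes the share of a week during which non-prod pods are suspended. It assumes that "8 hours on weekdays" means 8 hours on each of the five weekdays and that both weekend days are fully suspended; the function name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_sleep_fraction(weekday_sleep_hours=8, weekdays=5, weekend_days=2):
    """Fraction of the week during which pods are suspended."""
    asleep_hours = weekday_sleep_hours * weekdays + 24 * weekend_days
    return asleep_hours / HOURS_PER_WEEK


fraction = weekly_sleep_fraction()
print(f"Pods suspended for {fraction:.1%} of the week")
```

This is only the share of pod-hours removed. The actual saving is smaller, since it depends on how far the cluster autoscaler can shrink the node pool and on costs that do not sleep, such as storage.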

This reduced the cost by a further 16%, bringing the overall cost reduction to 53%.
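The two figures do not simply add up, because the second reduction applies to a bill that is already smaller. A short Python sketch of the arithmetic, assuming the 16% is measured against the spend remaining after Phase I (an interpretation that is consistent with the 53% total):

```python
def combined_reduction(*stage_reductions):
    """Overall reduction when each stage cuts the spend left by the previous one."""
    remaining = 1.0
    for reduction in stage_reductions:
        remaining *= 1.0 - reduction
    return 1.0 - remaining


overall = combined_reduction(0.44, 0.16)
print(f"Overall reduction: {overall:.0%}")
```

In other words, 56% of the original spend remained after Phase I, and Phase II removed 16% of that remainder.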

Conclusion

Cost governance and optimization are primary needs for organizations that run their modern application workloads on Kubernetes today. Elasticity, scalability and resiliency are key strengths, but they can lead to over-provisioning of resources, resulting in excessive spending. Without visibility into how efficiently pods consume resources within the cluster, getting insights and managing overall costs remains a primary challenge. Tools like Kubecost and Kube-green provide the right FinOps solutions to control pod-level consumption for efficient cost monitoring and governance.

FinOps Cookbook for Kubernetes was originally published in FAUN.dev() 🐾 on Medium, where people are continuing the conversation by highlighting and responding to this story.

Achieving Zero-Downtime deployment in Kubernetes

Avinash Goyal — Tue, 13 Sep 2022 11:30:07 GMT

Over the years, Kubernetes has become the go-to platform for running container workloads. By default, Kubernetes uses the rolling update strategy for deployments. This strategy aims to prevent downtime by ensuring that some container instances are up and running at any point in time while updates are performed. Containers running the old version are shut down only after containers running the new version are ready to receive live traffic.

In theory this seems right, but there’s a twist. Let’s understand this in detail.

Kubernetes Pod Termination Process

Before looking at the pod eviction process, let’s understand two main components of Kubernetes that play a very important role in it:

Kubelet: The kubelet collects the details of the pod (for example, its IP address), reports them back to the control plane, and continuously checks the control plane for updates.

Endpoint: Kube-proxy uses the endpoints to set up iptables rules on the nodes. Every time an endpoint changes, kube-proxy retrieves the new list of IP addresses and ports and configures new iptables rules.

When the pod eviction process is initiated, the API server changes the pod state in the etcd database to Terminating. The node’s kubelet and the endpoints-controller continuously monitor the pod’s state. As soon as they notice the Terminating state, they start the eviction process in no particular order (both operations are asynchronous):

Kubelet sends the SIGTERM signal to terminate the pod
Endpoints-controller handles the endpoint removal process to stop the incoming traffic

Now here’s the twist.

If the endpoint removal process finishes before the SIGTERM signal, no new requests will arrive while the containers are terminating (happy path).

But if the containers start terminating before the endpoint removal process finishes, requests will keep arriving until the endpoint is removed. This results in application downtime, because Kubernetes is still routing traffic to the IP address but the pod is no longer there.
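The two outcomes can be pictured with a deliberately simplified Python model. Times are seconds counted from the moment the pod is marked Terminating, and the numbers used are hypothetical; the real timings depend on the cluster and are not measured here.

```python
def misrouted_window(container_exit_at, endpoint_removed_at):
    """Seconds during which traffic is still routed to a container that is gone."""
    return max(endpoint_removed_at - container_exit_at, 0.0)


# Happy path: the endpoint is removed before the container exits.
happy = misrouted_window(container_exit_at=1.0, endpoint_removed_at=0.25)

# Unhappy path: the container exits first, so requests fail until removal completes.
unhappy = misrouted_window(container_exit_at=0.5, endpoint_removed_at=2.0)

print(f"happy path window: {happy}s, unhappy path window: {unhappy}s")
```

Any request that arrives inside a non-zero window is sent to an address that no longer has anything listening on it.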

Graceful shutdown

We need to ensure the pods are terminated gracefully by closing all persistent connections (databases, queues, WebSockets, etc.) and waiting for all active requests to drain.
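On the application side, graceful shutdown usually starts with catching SIGTERM. The following is a minimal Python sketch of that pattern, not the implementation used in our services: `serve`, `handle_next_request` and `close_connections` are hypothetical names standing in for the application's own request loop and cleanup code.

```python
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record that shutdown was requested; the main loop does the cleanup."""
    shutdown_requested.set()


def install_sigterm_handler():
    """Register the handler; call this once from the main thread at startup."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(handle_next_request, close_connections):
    """Process requests until SIGTERM arrives, then release resources."""
    while not shutdown_requested.is_set():
        handle_next_request()  # the request in progress is allowed to finish
    close_connections()        # e.g. database, queue and WebSocket connections
```

Keeping the signal handler tiny and doing the real work in the main loop avoids running cleanup code in the middle of a request.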

Solution — Pre-stop hooks

To achieve graceful shutdown, we need to handle SIGTERM (the signal asking the pod’s containers to terminate) and finish our cleanup before SIGKILL (the signal that forcibly kills them, which the application cannot catch) is sent.

We can achieve this with a pre-stop hook and a suitable termination grace period:

Adding sleep in the pre-stop hook: This delays the SIGTERM signal, giving the endpoint removal time to propagate before the containers start shutting down. A value between 5 and 10 seconds will be enough in most cases.
Setting terminationGracePeriodSeconds: This is the time limit the kubelet waits before forcibly killing the container. The pre-stop hook runs within this period, so it must be longer than the sleep. Depending on your application and cluster behavior, it can vary between 15 and 45 seconds.
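The two settings can be sketched as a pod template fragment, built here as a Python dictionary and printed as JSON. The container name and image are hypothetical placeholders, and the values are picked from the ranges mentioned above rather than being recommendations for any particular workload.

```python
import json

PRE_STOP_SLEEP_SECONDS = 10  # within the 5 to 10 second range suggested above
GRACE_PERIOD_SECONDS = 45    # upper end of the 15 to 45 second range suggested above

pod_template_spec = {
    "terminationGracePeriodSeconds": GRACE_PERIOD_SECONDS,
    "containers": [
        {
            "name": "app",               # hypothetical container name
            "image": "example/app:1.0",  # hypothetical image
            "lifecycle": {
                "preStop": {
                    # Delay SIGTERM so endpoint removal can propagate first.
                    "exec": {"command": ["sleep", str(PRE_STOP_SLEEP_SECONDS)]}
                }
            },
        }
    ],
}

# The grace period also covers the pre-stop hook, so it must exceed the sleep.
assert GRACE_PERIOD_SECONDS > PRE_STOP_SLEEP_SECONDS

print(json.dumps({"spec": {"template": {"spec": pod_template_spec}}}, indent=2))
```

Note that an exec hook like this only works if the container image actually ships a `sleep` binary.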

With the above two options, we can ensure all in-flight requests are handled gracefully before the pod is terminated, achieving zero downtime.

Conclusion

I would like to conclude this blog by stating that pre-stop hooks play a major role in achieving zero downtime in Kubernetes and ensure that all in-flight requests during the pod eviction journey are handled gracefully.

References:

Graceful shutdown and zero downtime deployments in Kubernetes

Achieving Zero-Downtime deployment in Kubernetes was originally published in FAUN.dev() 🐾 on Medium, where people are continuing the conversation by highlighting and responding to this story.