Demystifying DevOps

Geeky much!

Dec 12, 2023 · 15 min read


DevOps is a collaborative, efficiency-driven approach merging development and operations. For the tech community, it means:

  • Efficiency: Streamlined workflows, breaking down silos for improved efficiency.
  • Speed: Automation ensures swift, continuous delivery, meeting market demands.
  • Reliability: Collaboration prevents errors, ensuring software stability.
  • Integration: Holistic processes from code to deployment for seamless adaptation.
  • Continuous Improvement: Learning culture, adapting swiftly to tech advancements.
  • Collaboration: Breaking barriers, fostering innovation and collective success.

The DevOps Cycle

1. Automation of Building, Testing, and Releasing

DevOps kicks off with automation. As developers push code changes, a series of automated events are set into motion. Continuous Integration (CI) platforms like GitHub, GitLab, and Bitbucket play a pivotal role in automating the build and test phases. Commits and pull requests trigger the CI process, ensuring that every code contribution is thoroughly tested.
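As a concrete illustration, a minimal CI pipeline in the GitHub Actions style might look like the sketch below (workflow, job, and file names are hypothetical; GitLab and Bitbucket use analogous YAML pipelines):

```yaml
# Hypothetical workflow: runs the build and test phases on every push and pull request.
name: ci
on: [push, pull_request]
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt   # install dependencies
      - run: pytest                            # every commit is tested
```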

2. Code Review

Between the continuous integration and deployment phases, there lies a crucial checkpoint — code review. Developers collaboratively review each other’s code, ensuring that the proposed changes meet quality standards and align with the project’s objectives.

3. Deployment Strategies

After the code has successfully passed through the CI pipeline, it’s time for deployment. DevOps offers various deployment strategies, from releasing new builds to a limited user base for initial testing to rolling out new versions seamlessly without downtime. Additionally, the ability to roll back to a previous version ensures flexibility and risk mitigation.

4. Performance Management

Performance management goes beyond deployment: it includes continuous monitoring to gather feedback from users and maintain the health of systems. Logging and alerting mechanisms capture real-time data, enabling teams to respond promptly to issues and optimize performance.

Test-Driven Development (TDD) Integration in the DevOps Cycle

Test-driven development (TDD) is a practice that seamlessly integrates into the DevOps cycle, contributing to a robust software development process. In TDD, the development cycle begins with writing tests before any actual code implementation. This ensures that quality is ingrained in the software from its inception.

1. Unit Testing:

Tests crafted for individual units or components of the software define the expected outcomes. This methodical process allows each piece of code to be examined in isolation, ensuring it functions as expected.

Imagine you’re a developer working on an e-commerce website. You’ve just implemented a new function that calculates discounts based on user loyalty points. Now, before deploying this code to the production environment, you want to ensure it works as expected and doesn’t break any existing functionality.

Here’s where unit testing comes in. You write a set of tests that validate different scenarios:

Test 1: Basic functionality

Input: User with 100 loyalty points.
Expected Output: A 10% discount applied.

Test 2: No loyalty points

Input: User with 0 loyalty points.
Expected Output: No discount applied.

Test 3: High loyalty points

Input: User with 500 loyalty points.
Expected Output: A 20% discount applied.
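The scenario above can be sketched in code. Here is a minimal, hypothetical `calculate_discount` function (the point thresholds are assumptions chosen to match the three tests) together with pytest-style unit tests:

```python
# Hypothetical discount function: thresholds are assumed to match the scenarios above.
def calculate_discount(loyalty_points: int) -> float:
    """Return the discount rate for a given loyalty-point balance."""
    if loyalty_points >= 500:
        return 0.20
    if loyalty_points >= 100:
        return 0.10
    return 0.0

# Unit tests mirroring Tests 1-3.
def test_basic_functionality():
    assert calculate_discount(100) == 0.10

def test_no_loyalty_points():
    assert calculate_discount(0) == 0.0

def test_high_loyalty_points():
    assert calculate_discount(500) == 0.20
```

A CI system would run these with every commit, so a regression in the discount logic is caught before it ever reaches production.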

2. Integration Testing:

Verifying that different components work cohesively. These tests validate the seamless interaction between modules, catching potential integration issues early in the development process.

Scenario: An e-commerce system.

Goal: Ensure seamless interaction between different components.

Test Case: Verify that adding items to the cart updates the inventory, deducts the correct amount, and securely processes payments.

Importance: Validates the collaboration of components for a smooth and error-free shopping experience.
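A minimal sketch of such an integration test, using in-memory stand-ins for the inventory and payment components (all class and method names are illustrative):

```python
# In-memory stand-ins: real tests would wire up actual services or test doubles.
class Inventory:
    def __init__(self, stock):
        self.stock = dict(stock)

    def deduct(self, sku, qty):
        if self.stock.get(sku, 0) < qty:
            raise ValueError(f"insufficient stock for {sku}")
        self.stock[sku] -= qty

class PaymentGateway:
    def charge(self, amount):
        return {"status": "approved", "amount": amount}

class Cart:
    def __init__(self, inventory, payments):
        self.inventory, self.payments, self.items = inventory, payments, []

    def add(self, sku, qty, price):
        self.items.append((sku, qty, price))

    def checkout(self):
        # The components must collaborate: stock is deducted first,
        # then the total is charged.
        for sku, qty, _ in self.items:
            self.inventory.deduct(sku, qty)
        total = sum(qty * price for _, qty, price in self.items)
        return self.payments.charge(total)

def test_checkout_updates_inventory_and_charges():
    inv = Inventory({"book": 5})
    cart = Cart(inv, PaymentGateway())
    cart.add("book", 2, 10.0)
    receipt = cart.checkout()
    assert inv.stock["book"] == 3                             # inventory updated
    assert receipt == {"status": "approved", "amount": 20.0}  # payment processed

test_checkout_updates_inventory_and_charges()
```

Unlike a unit test, this exercises the seam between cart, inventory, and payment, which is exactly where integration bugs hide.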

3. Acceptance Testing:

Ensures that the software aligns with specified requirements and meets stakeholder expectations. By defining acceptance criteria through tests, developers create a more reliable and user-centric application.

Scenario: Launching a new mobile banking application.

Goal: Confirm that the application meets user requirements and is ready for public use.

Test Case: A team of end-users conducts tests to ensure that the app allows secure login, facilitates fund transfers, displays accurate account balances, and performs other expected functionalities.

Importance: Ensures the application aligns with user expectations, providing a positive user experience and avoiding critical issues in a live environment.

These are some of the testing methods that can be configured to run automatically with every commit in the CI system.

CI/CD Pipeline for TDD

The CI server pulls the latest code, executes the defined test suite, and provides immediate feedback.

Branching Strategy

Developing each change in its own branch provides an isolated environment, allowing thorough testing without affecting the main codebase.

Merge to Master

After successful testing, the developer merges their branch into the master branch. This process ensures that only verified and functioning code enters the main codebase.

Linting and Code Quality in CI/CD

In the DevOps cycle, linting and static code analysis belong in the Continuous Integration (CI) process. Embedded in the pipeline, they not only enhance code quality but also detect potential issues early, making the development pipeline smoother.

Linting in CI:

Linting, performed by tools like ESLint and Pylint, ensures code compliance with predefined standards. By making linting a dedicated stage in the CI pipeline, developers swiftly catch syntax errors and maintain a consistent coding style.

Static Code Analysis with tools like SonarQube:

Static code analysis scans the codebase for security vulnerabilities and bugs. Integrated into CI/CD, SonarQube provides actionable insights into code quality, allowing developers to address issues before merging.

Deployment Landscape: Ephemeral Environments, Containers, and VMs

Choosing the Right Tool for the Job

Ephemeral Environments: An ephemeral environment refers to a short-lived and disposable computing environment created dynamically to serve a specific purpose within the software development life cycle. These environments are transient, existing only for the duration of a task, such as testing, development, or deployment.

Key Technical Characteristics:

  • Dynamic Provisioning: Ephemeral environments are created on-demand, typically through automation tools, allowing for quick and automated setup.
  • Isolation: Each ephemeral environment is isolated from others, ensuring independence and preventing interference between tasks.
  • Disposable Nature: Once the task is complete, the ephemeral environment is discarded, freeing up resources and minimizing infrastructure costs.
  • Consistency: Ephemeral environments are designed to be consistent, allowing developers to replicate the same environment for testing or debugging purposes.
  • Automation and Orchestration: Tools like container orchestration platforms (e.g., Kubernetes) or infrastructure-as-code (IaC) frameworks facilitate the automated creation and management of ephemeral environments.
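The characteristics above can be sketched with a simple Python context manager. Here a temporary directory stands in for a real environment such as a container or short-lived VM (purely illustrative):

```python
# Sketch of the ephemeral-environment pattern: dynamically provisioned,
# isolated, and disposed of when the task finishes.
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def ephemeral_environment():
    workdir = Path(tempfile.mkdtemp(prefix="ephemeral-"))  # dynamic provisioning
    try:
        yield workdir                                      # isolated workspace
    finally:
        shutil.rmtree(workdir)                             # disposed when done

with ephemeral_environment() as env:
    (env / "test-results.txt").write_text("all tests passed")
# Once the block exits, the environment and everything in it are gone.
```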

Ephemeral environments are synonymous with agility in the DevOps world, and containers are the primary vehicle for achieving this.

Containers provide a lightweight, scalable solution, making them ideal for microservices architectures. Their ability to encapsulate applications in a bundle along with their dependencies ensures consistency across different environments.

Here are some popular containerization tools and their best use cases:

Docker Use Cases:

  • Packaging applications with dependencies into containers for consistency across different environments.
  • Isolating microservices to enhance scalability and maintainability.

Best Practices:

  • Use a single process per container.
  • Optimize container size by minimizing layers and removing unnecessary dependencies.

Kubernetes Use Cases:

  • Orchestrating and automating the deployment, scaling, and management of containerized applications.
  • Achieving high availability and fault tolerance for microservices.
  • Simplifying the deployment of complex, multi-container applications.

Best Practices:

  • Define applications declaratively using YAML files.
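An illustrative declarative definition of an application, in the form Kubernetes expects (image and names are placeholders):

```yaml
# Placeholder Deployment: three replicas of a containerized web service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example/web:1.0
          ports:
            - containerPort: 8080
```

Because the desired state is declared rather than scripted, Kubernetes continuously reconciles the cluster toward it, replacing failed pods automatically.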

Amazon ECS (Elastic Container Service) Use Cases:

  • Running containers on AWS infrastructure with seamless integration.
  • Simplifying the deployment of applications with AWS-native services.

Best Practices:

  • Leverage AWS Fargate for serverless container execution.

When to Use Containerization

  • Microservices Architecture: Ideal for breaking down applications into smaller, independently deployable units.
  • Scalability: Containers can quickly scale up or down based on demand, ensuring efficient resource utilization.
  • Consistent Development and Deployment: Containers encapsulate dependencies, ensuring consistent behaviour across different environments.
  • Continuous Integration/Continuous Deployment (CI/CD): Containers streamline the CI/CD pipeline, enabling faster and more reliable deployments.

Virtual Machines (VMs): Robustness and Isolation

In situations where robustness and isolation are paramount, virtual machines (VMs) step in. VMs create resilient spaces for applications, encapsulating the entire operating system for more comprehensive isolation. This makes VMs suitable for scenarios with strict requirements, ensuring a reliable and consistent runtime environment. A typical use case is running untrusted code.

Deployment Strategies Unveiled

1. Rolling Deployments

Rolling deployments are supported by tools like AWS Elastic Beanstalk and Kubernetes. With this approach, updates are applied incrementally across the infrastructure, so ongoing operations remain undisrupted. This strategy strikes a balance between maintaining application availability and introducing new features or bug fixes: instances of the previous version are gradually replaced with the new one, ensuring continuous service availability.

As time goes on, the V2 backend replaces the V1 backend


Advantages:

Continuous Availability: Rolling deployments ensure continuous availability of the application during the update process by gradually updating instances without downtime.

Easy Rollback: If issues are encountered, rolling deployments allow for easy rollback to a previous version, minimizing the impact on users.

Resource Efficiency: Rolling updates are resource-efficient as they don’t require doubling the infrastructure, unlike blue-green deployments.


Drawbacks:

Temporary Coexistence: There is a period during the deployment where multiple versions coexist, which may lead to compatibility issues.

Complexity with Stateful Apps: For stateful applications, managing database schema changes during a rolling deployment can be complex.
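The mechanics can be sketched in a few lines of Python. This is a simplified simulation, not a real orchestrator; in practice, tools like Kubernetes handle the batching and health checks:

```python
# Simplified simulation of a rolling deployment: instances are replaced in
# small batches, so the rest of the fleet keeps serving traffic.
def rolling_deploy(fleet, new_version, batch_size=1):
    fleet = list(fleet)
    for start in range(0, len(fleet), batch_size):
        for i in range(start, min(start + batch_size, len(fleet))):
            fleet[i] = new_version  # take one instance out, bring it back on the new version
        # In a real rollout, health checks would gate progression to the next batch;
        # a failed check here would trigger a rollback instead.
    return fleet

print(rolling_deploy(["v1", "v1", "v1", "v1"], "v2", batch_size=2))
# ['v2', 'v2', 'v2', 'v2']
```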

2. Blue-Green Deployments

Blue-green deployments leverage tools like Amazon Elastic Load Balancing to achieve a smooth transition between identical environments. By maintaining two production environments, only one is active at any given time. This method reduces downtime and allows for quick rollback in case of unexpected issues, providing a safety net for critical applications.

The “blue” environment is the one currently in production, while the “green” environment receives the new release. The switch between environments is made once the new release is deemed stable.


Advantages:

Zero Downtime: Blue-green deployments ensure uninterrupted service by directing traffic between two identical environments — blue (current) and green (new). Users experience no downtime during deployment.

Rollback Efficiency: Immediate rollback is possible by directing traffic back to the blue environment if issues arise in the green deployment, ensuring minimal impact.


Drawbacks:

Resource Intensive: Running two fully functional environments simultaneously may require substantial resources, especially for large-scale applications.

Infrastructure Complexity: Maintaining two identical environments and managing the routing between them increases infrastructure complexity.

Database Synchronization: Ensuring that both environments have synchronized databases can be challenging, especially for databases with frequent updates.

Initial Setup Overhead: Implementing blue-green deployments requires additional setup and configuration.
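The essence of the switch can be sketched as a router holding two environments and a single live pointer (all names illustrative; a real setup would flip a load-balancer target instead):

```python
# Toy blue-green router: cutover and rollback are a single pointer flip.
class BlueGreenRouter:
    def __init__(self):
        self.environments = {"blue": "v1", "green": None}
        self.live = "blue"

    def deploy_to_idle(self, version):
        idle = "green" if self.live == "blue" else "blue"
        self.environments[idle] = version   # stage the new release
        return idle

    def switch(self):
        # Atomic cutover: all traffic moves to the other environment.
        self.live = "green" if self.live == "blue" else "blue"

    def serving(self):
        return self.environments[self.live]

router = BlueGreenRouter()
router.deploy_to_idle("v2")
router.switch()             # traffic now goes to green (v2)
print(router.serving())     # v2
router.switch()             # instant rollback to blue (v1)
print(router.serving())     # v1
```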

3. Canary Deployments

Canary deployments take a cautious approach, releasing new versions to a subset of users before the broader audience, ensuring real-world testing without risking a widespread impact.


Advantages:

Gradual Rollout: Canary deployments enable a phased release, minimizing the impact of potential issues. The new version is initially deployed to a small subset of users or servers.

Risk Mitigation: Issues or bugs in the new release are identified early and affect only a small percentage of users, reducing the overall risk associated with deployment.

User Feedback: Allows for real-time feedback from users interacting with the new version, facilitating quick adjustments before a full release.

Rollback Capability: If issues arise, it’s easier to roll back since only a fraction of the infrastructure is running the new version.


Drawbacks:

Complexity: Implementing canary deployments requires additional infrastructure and complexity, especially in managing traffic distribution between different versions.

Resource Overhead: Running multiple versions concurrently consumes additional resources, impacting server and network performance.

Monitoring Challenges: Monitoring becomes more complex with multiple versions in production, requiring robust logging and monitoring systems.
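Traffic splitting for a canary can be sketched with a deterministic hash, so each user consistently lands on the same version (percentages and names are illustrative; real systems do this at the load balancer or service mesh):

```python
# Deterministic canary routing: a stable hash of the user ID picks a bucket,
# so a given user always sees the same version during the rollout.
import hashlib

def route(user_id: str, canary_percent: int) -> str:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"

# Roughly canary_percent of users land on the new version.
users = [f"user-{i}" for i in range(1000)]
share = sum(route(u, 10) == "canary" for u in users) / len(users)
print(f"{share:.0%} of users on the canary")
```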

Scaling Dynamically: Autoscaling Technologies

Autoscaling lies at the heart of modern cloud infrastructure, embodying the dynamic essence of scalability. This transformative concept allows systems to adapt to fluctuating workloads by automatically adjusting the number of resources, ensuring optimal performance and cost efficiency.

AWS EC2 Spot Instances: Efficiency Unleashed

Spot Instances: A Cost-Efficient Choice

Amazon EC2 Spot Instances introduce a game-changing paradigm in cost optimization. Spot Instances provide access to spare computing capacity at significantly lower costs. While their availability fluctuates with demand, strategic use can result in substantial savings, making them an attractive choice for fault-tolerant and flexible workloads.

Kubernetes Horizontal Pod Autoscaling: Adapting to Demand

Dynamic Scaling in Kubernetes

Horizontal Pod Autoscaling (HPA) orchestrates the dynamic scaling of pods based on observed metrics. By automatically adjusting the number of replicas in a deployment, HPA ensures that applications have the necessary resources to meet demand. This level of flexibility is especially beneficial for containerized applications, promoting efficiency in resource utilization.
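An illustrative HPA manifest (names and thresholds are placeholders) that scales a Deployment between 2 and 10 replicas to hold average CPU utilization near 70%:

```yaml
# Placeholder HPA: Kubernetes adjusts the replica count automatically.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```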

In Kubernetes, a pod is the smallest deployable unit that represents a single instance of a running process in a cluster. It encapsulates one or more containers, storage resources, and a unique network IP, which together form a logical host. Containers within a pod share the same network namespace, allowing them to communicate easily with each other using localhost.

Key Points:

  1. Atomic Unit: A pod is the atomic unit of deployment in Kubernetes.
  2. Single-Use Concept: Pods are generally designed to run a single instance of a specific application.
  3. Shared Resources: Containers within the same pod share the same IP address, port space, and storage volumes.
  4. Facilitates Communication: Containers within a pod communicate using localhost, simplifying inter-container communication.
  5. Grouping Logic: Pods serve as a logical grouping for containers that need to work together closely.

Serverless Computing: Non-Traditional Infrastructure

Serverless computing represents the pinnacle of abstraction, where developers focus solely on code, leaving infrastructure management to the platform.

Serverless computing is a cloud computing model where cloud providers manage infrastructure dynamically. Applications are built around stateless functions triggered by events, automatically scaling based on demand. Users are billed for actual execution time. It promotes a microservices architecture, abstracts infrastructure management, and is cost-effective for event-driven workloads with short-lived functions.

Services like AWS Lambda allow functions to be executed in response to events, with the infrastructure automatically scaling to accommodate the workload. This model is particularly advantageous for sporadic workloads and event-driven architectures.
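A sketch of such a function in the AWS Lambda handler style. The event shape loosely mimics an API Gateway request; the field names here are illustrative, not a full specification:

```python
# Stateless, event-driven function: the platform invokes handler() per event
# and scales the number of concurrent executions automatically.
import json

def handler(event, context=None):
    name = event.get("queryStringParameters", {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Simulated invocation (locally; in production the platform calls handler()).
print(handler({"queryStringParameters": {"name": "devops"}}))
```

Note there is no server loop or socket handling: everything outside the function body is the platform's responsibility, which is the point of the model.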

Use Cases and Considerations

Selecting the Right Tool for the Task

  • EC2 Spot Instances: Ideal for fault-tolerant applications, big data analytics, and scenarios where cost optimization is paramount.
  • Kubernetes HPA: Suited for containerized workloads with varying resource requirements, ensuring optimal performance during peaks.
  • Serverless Computing: Perfect for event-driven applications, microservices, and situations where development speed and cost-effectiveness are critical.

Service Discovery

Service discovery facilitates the identification and communication of services within a network. It is needed to manage dynamic, distributed architectures where services scale independently and frequently change location, enabling components to find and interact with each other.

Technologies like DNS, hash tables, or dedicated service discovery tools streamline this process in DevOps environments.

But, as the scale of infrastructure and deployments grows, managing service discovery through manual DNS, IP, and port mapping becomes challenging and impractical. This is particularly evident in dynamic environments where services are added, removed, or scaled based on demand.

In such scenarios, traditional static configurations can’t keep pace with the fluidity of modern microservices architectures. This is where more advanced service discovery mechanisms, like automated DNS solutions or dedicated service discovery tools, become essential. They provide dynamic updates and real-time information about the location and status of services, making it easier to handle the complexities introduced by scale and continuous deployments.

Mastering Zero Downtime Deployments

Zero-downtime deployments orchestrated by a gateway or a reverse proxy such as Nginx prioritize a seamless user experience during application updates. These proxies act as traffic directors, intelligently routing users to the right application version.

Intelligent Routing

Real-Time Decisions

Upon each user request, the reverse proxy evaluates factors like load balancing and response times, dynamically deciding whether to direct traffic to the old or new application version.

Suppose a V2 of the backend and a V2 of the frontend become available alongside the running V1 versions; the gateway can then drain old connections and route new traffic to the newer versions.

Implementation Steps

  1. Parallel Deployment: New and old versions coexist.
  2. Config Update: Proxy rules are adjusted for the new version.
  3. Smart Routing: Traffic is intelligently distributed based on performance.
  4. Version Shutdown: Gradual decommissioning of the old version.

Service Discovery with Hash Tables

Dynamic IP Retrieval

For service discovery, storing service IPs in a hash table within a file adds a layer of dynamism to IP retrieval. This approach involves a sequence of steps for efficient service discovery:

  1. Hash Table Setup: A dedicated file serves as a hash table, with service names as keys and corresponding IP addresses as values.
  2. Service Registration: When a service instance spins up, it registers its IP in the hash table.
  3. Reverse Proxy Interaction: The reverse proxy, responsible for routing traffic, queries the hash table to dynamically fetch the IP for a specific service.
  4. Real-Time Updates: As services scale or migrate, the hash table receives real-time updates, ensuring the reverse proxy always has accurate information.
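The steps above can be sketched with a Python dict standing in for the shared hash-table file (service names and IPs are illustrative):

```python
# Toy service registry: a dict plays the role of the hash-table file
# (service name -> IP) that the reverse proxy consults.
class ServiceRegistry:
    def __init__(self):
        self._table = {}

    def register(self, name, ip):
        self._table[name] = ip       # called when an instance spins up

    def deregister(self, name):
        self._table.pop(name, None)  # called when an instance goes away

    def lookup(self, name):
        # The reverse proxy queries here before routing a request.
        try:
            return self._table[name]
        except KeyError:
            raise LookupError(f"no registered instance for {name}")

registry = ServiceRegistry()
registry.register("auth-service", "10.0.0.12")
print(registry.lookup("auth-service"))          # 10.0.0.12
registry.register("auth-service", "10.0.0.47")  # update when the service migrates
print(registry.lookup("auth-service"))          # 10.0.0.47
```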


Advantages:

Efficiency Boost: Direct IP retrieval from a hash table streamlines the service discovery process.

Real-Time Accuracy: Dynamic updates to the hash table guarantee current service locations for the reverse proxy.

Scalability: The approach easily scales with the growth of services within the system.

This method of service discovery, akin to a dynamic address book, empowers the reverse proxy with real-time knowledge of service locations. As the system evolves, this approach efficiently adapts to changes, ensuring seamless communication within a distributed environment.

But if the IP of a service changes, the hash table needs to be manually updated.

This manual intervention introduces a risk of errors and increases administrative overhead, especially in dynamic environments where services frequently scale or move.

These challenges highlight the need for more dynamic and automated solutions in modern and scalable infrastructures, which is where advanced service discovery tools and techniques come into play. Tools like Consul or etcd can automate service discovery by dynamically updating the information based on the real-time status of services. This ensures more accuracy, reduces manual efforts, and enhances the reliability of service discovery in dynamic environments.

Local DNS Mapping for Service Discovery

Employing DNS for local hostname-to-IP mapping stands out as a robust technique. This technical breakdown unveils the intricacies involved in making it work seamlessly:

Implementation Steps

  1. Hostname Resolution: Each service is assigned a unique hostname, creating a user-friendly identifier.
  2. Cloud DNS Integration: Cloud DNS steps in as the authoritative source, maintaining the mapping of these hostnames to respective service IPs.
  3. Reverse Proxy Coordination: The reverse proxy, responsible for directing incoming requests, consults the Cloud DNS records to determine the current IP associated with a given service hostname.
  4. Dynamic Updates: As services dynamically scale or migrate, Cloud DNS updates its records, ensuring the reverse proxy stays informed about the real-time whereabouts of each service.
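A toy version of this flow, with a dict standing in for Cloud DNS records and the proxy round-robining across the A records for a hostname (hostnames and IPs are illustrative):

```python
# Dict stands in for Cloud DNS: hostname -> list of A records.
import itertools

DNS_RECORDS = {
    "auth-service.internal": ["10.0.1.5", "10.0.1.6", "10.0.1.7"],
}

def make_resolver(records):
    cursors = {}
    def resolve(hostname):
        # The reverse proxy consults DNS, then rotates through the records.
        ips = records[hostname]
        cursors.setdefault(hostname, itertools.cycle(range(len(ips))))
        return ips[next(cursors[hostname])]
    return resolve

resolve = make_resolver(DNS_RECORDS)
print([resolve("auth-service.internal") for _ in range(4)])
# ['10.0.1.5', '10.0.1.6', '10.0.1.7', '10.0.1.5']
```

When the records list changes (an instance scales in or out), subsequent lookups pick up the new set, which is the dynamic-update property the text describes.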

Imagine a scenario in a cloud-native application where microservices are deployed using container orchestration tools like Kubernetes. Each microservice is a component of the larger application and is assigned a unique service name for identification.

Scenario: Scaling a High-Traffic Service

Let’s consider a microservice responsible for handling user authentication in an e-commerce platform. During peak shopping hours, there’s a surge in traffic, and the authentication service needs to scale dynamically to handle the increased load.

  1. The authentication service is running on multiple instances across different nodes in the Kubernetes cluster.
    → Cloud DNS has records mapping the service name (“auth-service”) to the IP addresses of these instances.
  2. The e-commerce platform experiences a sudden increase in user authentication requests.
    → Kubernetes, being aware of the increased load, dynamically scales the number of instances running the authentication service to handle the higher traffic.
  3. As new instances are spawned or existing ones are terminated, Cloud DNS is informed of these changes through Kubernetes events.
    → Cloud DNS automatically updates its records in real time to reflect the current IP addresses of all instances handling the authentication service. (When the pods sit behind a single Kubernetes Service sharing one stable IP, this per-instance update may not even be needed.)
  4. The reverse proxy, configured to route authentication requests to “auth-service,” queries Cloud DNS for the latest IP addresses.
    → Cloud DNS responds with the updated list of IP addresses, ensuring that the reverse proxy is aware of all instances, including the newly scaled ones.
  5. Users continue to experience uninterrupted service during the scaling process.
    → The reverse proxy, equipped with the most recent information from Cloud DNS, efficiently distributes authentication requests across all available instances.


Advantages:

Decentralized Logic: By offloading the mapping logic to a cloud-managed DNS, the system gains decentralization benefits.

User-Friendly Identification: Hostnames provide a human-readable way to identify services, enhancing clarity in the configuration.

Adaptable Scaling: The architecture smoothly adapts to changes in service instances, making it highly scalable.

Under the Hood

This method leverages the power of DNS for efficient service discovery. The orchestration between local hostname assignments, Cloud DNS, and the reverse proxy ensures that, even in dynamic and scalable environments, the system maintains precise knowledge of service locations for seamless user request routing.

A word on Log Aggregation and App Performance Management

For log aggregation and app performance management, a myriad of tools and technologies combine to provide a comprehensive solution.

Metrics Gathering and Analysis

ELK Stack (Elasticsearch, Logstash, Kibana):

  • Elasticsearch: Acts as the heart, indexing and storing logs efficiently.
  • Logstash: Ensures seamless log data ingestion and processing.
  • Kibana: Serves as the visual interface, enabling intuitive log exploration.

AWS CloudWatch:

  • Monitors AWS resources and applications, collecting and tracking performance data.
  • Provides native AWS resource metrics, offering insights into resource utilization and application health.

New Relic:

  • Offers end-to-end visibility, from user interactions to application code and underlying infrastructure.
  • Monitors real-time performance, including response times, error rates, and throughput.


Prometheus:

  • Employs a multi-dimensional data model, efficiently scraping metrics and supporting powerful querying.
  • Scrapes and stores metrics, supporting dimensional data for in-depth analysis.


Grafana:

  • Unifies data from various sources, presenting it in customizable and interactive dashboards.

Time Series Database (TSDB):

Serves as the backend storage for time-stamped metrics, ensuring efficient retrieval.

Quartile Analysis:

Evaluates metrics in quartiles, providing a nuanced understanding of performance distribution.
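For instance, quartiles of a latency sample can be computed with Python's statistics module (the sample values are made up); note how the upper quartile and the mean diverge when the tail is heavy:

```python
# Quartile analysis over response-time samples: quartiles reveal tail
# behaviour that a plain average hides.
from statistics import quantiles

latencies_ms = [12, 14, 15, 15, 16, 18, 21, 25, 40, 180]  # illustrative samples
q1, q2, q3 = quantiles(latencies_ms, n=4)

print(f"p25={q1} ms, median={q2} ms, p75={q3} ms")
# p25=14.75 ms, median=17.0 ms, p75=28.75 ms
print(f"mean={sum(latencies_ms) / len(latencies_ms)} ms")  # mean=35.6 ms, skewed by the 180 ms outlier
```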

Web UI and Alert Management

Grafana Web UI:

Empowers users with a dynamic and visually appealing interface for exploring metrics.

Alert Manager:

Facilitates robust alerting based on predefined thresholds and anomaly detection.