Better Developer Experience

How to Increase Deployment Observability and Simplify Deployment Pipelines

Published in Bondora Engineering and Data · 8 min read · Dec 20, 2022

We had visibility of deployments (what was deployed and when), but we did not have observability: what changes are deployed, by whom, and why.

Observability is the ability to understand what is happening inside a system based on the external data exposed by that system.

For us the observability of deployments meant having more detailed information about the changes that are deployed, so that we can understand what will (or at least predict what should) change in the service after deployment.

We wanted to provide this information to all stakeholders at the time the changes are rolled out. Product managers wanted to know when something was deployed and what the change was about. Engineers wanted to know which service versions were deployed, by whom, how long it took, and whether the deployment succeeded or had problems.

The aim was not to control who is allowed to do what and when, but to raise the overall observability of our system, which also includes deployments.

5 Ws — Who, What, When, Where, Why

The 5 Ws are a set of information-gathering questions that can be applied to almost any problem (not to be confused with the 5 Whys root cause analysis technique, where you keep asking “why”). If you get the answers to these five questions, you will almost certainly understand what happened and why. So, to get to the core reason for the deployment, the provided information must answer five questions (each starting with “w”, hence the 5 Ws).

Who made the changes, aka who initiated the continuous deployment, and who approved the deploy to Production?
What service is deployed and what changes (commits) are included?
When did the deployment start and end (also, how long did it take and was it successful)?
Where are the changes deployed (which pipeline and environment)?
Why was the deployment done (release note as approval comment)?
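To make the five questions concrete, here is a minimal sketch of a record that holds one answer per question. It is written in Python purely for illustration (the article's own script is PowerShell), and the class name `DeploymentNotice` and all field names are assumptions, not taken from our templates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class DeploymentNotice:
    """One deployment described by the five Ws (all names are illustrative)."""
    who_started: str             # Who: person whose action triggered the pipeline
    who_approved: Optional[str]  # Who: approver, if the environment requires one
    service: str                 # What: service name
    version: str                 # What: deployed version
    commits: Tuple[str, ...]     # What: commit messages included
    started_at: datetime         # When: start time
    finished_at: datetime        # When: end time
    succeeded: bool              # When: outcome of the deployment
    pipeline: str                # Where: pipeline name
    environment: str             # Where: target environment
    reason: str                  # Why: release note from the approval comment

    @property
    def duration(self) -> timedelta:
        return self.finished_at - self.started_at

    def summary(self) -> str:
        """Render the record as a single human-readable line."""
        outcome = "succeeded" if self.succeeded else "failed"
        seconds = int(self.duration.total_seconds())
        return (
            f"{self.service} {self.version} -> {self.environment} "
            f"({self.pipeline}) {outcome} in {seconds}s, "
            f"started by {self.who_started}: {self.reason}"
        )
```

If any field cannot be filled from the available data, that is a hint that one of the five questions is still unanswered.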

Let’s get started

You might wonder whether our DevOps tooling (Azure DevOps) already provides this information as dashboards, reports, etc. There are also countless integrations with Slack, which we use as our main communication platform. These were the first questions we asked ourselves, and we guessed that “this should be easy”.

First, dashboards and reports are good for statistics but not good for getting live notifications as events happen.

Second, our choice of DevOps platform, Azure DevOps, did not have very good Slack integration for pipelines. You can either send all events from all pipelines to Slack, which creates too much noise and does not provide enough detail, or you have to set up Slack notifications manually for every pipeline, choosing the environment, stage, result, etc.

I believe in Everything as Code, meaning pipelines should be defined in the code repository together with the rest of the application code. So, no manual setup of notifications that lives apart from the service code, not on my watch!

It’s not only about having all the configuration near the code but also about developer experience: how to be productive and focus on delivering business value rather than spend time on routine activities (like setting up integrations manually for every pipeline). The only thing that you should repeat is the automation mantra: automate everything that is repeated work.

APIs Everywhere

We started with the fact that there are APIs everywhere, including the Azure DevOps Services API and the Slack API.

We had an idea to use API calls to get the information and send a notification to Slack when a deploy starts and when it ends. Quick research confirmed our hypothesis, and we moved on to creating a Proof of Concept for the idea.
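As a rough idea of what such a call involves, the sketch below prepares a request to Slack's `chat.postMessage` Web API method. It is Python for illustration only (our actual script is PowerShell); the function names are made up, and the token and channel are placeholders you would supply from pipeline secrets.

```python
import json
import urllib.request

SLACK_POST_MESSAGE_URL = "https://slack.com/api/chat.postMessage"


def build_slack_request(token: str, channel: str, text: str) -> urllib.request.Request:
    """Prepare (but do not send) a chat.postMessage call with a JSON body."""
    body = json.dumps({"channel": channel, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SLACK_POST_MESSAGE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return Slack's JSON reply ("ok" plus, on success, "ts")."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the message format easy to check without touching the network.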

High level process

The pipeline starts automatically after changes are integrated into the main branch (merge from a feature branch), or it is started manually for a feature branch.

Before Deployment to Environment X is started
- Get service details (name, version) and changes
- Send notification to Slack
Run deployment
After Deployment
- Get result — success, failure (errors), run time, etc.
- Send notification to Slack
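The ordering of these steps can be sketched as a small wrapper. In our setup the steps are pipeline jobs rather than one function, so treat this Python sketch as an illustration of the flow only; `deploy_with_notifications` and its parameters are hypothetical names.

```python
import time
from typing import Callable


def deploy_with_notifications(
    service: str,
    version: str,
    environment: str,
    run_deployment: Callable[[], None],
    notify: Callable[[str], None],
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Notify, run the deployment, then notify again with outcome and run time."""
    notify(f"Deploying {service} {version} to {environment}")
    started = clock()
    try:
        run_deployment()
    except Exception as error:  # any failure is reported before returning
        elapsed = clock() - started
        notify(
            f"Deployment of {service} {version} to {environment} "
            f"failed after {elapsed:.0f}s: {error}"
        )
        return False
    elapsed = clock() - started
    notify(f"Deployed {service} {version} to {environment} in {elapsed:.0f}s")
    return True
```

The important property is that the "after" notification is sent on failure as well as on success, which in a pipeline corresponds to running the notification step with an always-run condition.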

Proposed and investigated solutions

Use Azure pipelines Slack integration — Not good

Could be used to report the bare fact of a deployment, but not much more
It needs to be added manually for every repository through Azure integrations, so it is not visible in the repo (as code)
You cannot add additional information or modify the sent message
It does not show enough information: version, commits, etc.

Template for Slack notifications — Not good enough

A pipeline template that is stored in a separate repository and can be used by other repositories (pipelines)
In the template we can change the implementation without needing to change the pipeline in all the service repos
We can access all the predefined pipeline variables and add additional parameters to the pipeline
PowerShell or bash scripts can be used to post to a Slack channel via the Slack API
Bad: There is not enough information in the predefined pipeline variables (commits, approvals, etc.)

Shared pipeline template, shared scripts, API calls — yay!

Getting all the required information via API calls
Shared scripts: no copy-paste, and we can later change the message format and included data by changing only the shared script or template, not all pipelines
Same approach for all pipelines: a consistent approach also serves as good documentation

Solution

Use an Azure pipeline job in a template for notifying a Slack channel about a deploy, so that this template can be easily added to an existing pipeline’s deploy stage (pre- and post-deploy notifications)
Get information about the service name and version, plus more context like who started the pipeline (in case of an automatic deploy, whose change triggered the pipeline) and a list of the last commit messages
If approval is required for the deploy environment (or specified in the pipeline), get the approval information: approver, comment, time
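One way to fetch the list of changes is the Azure DevOps "Builds - Get Build Changes" REST endpoint. The Python sketch below only builds the URL and summarizes a response; it assumes the response carries a `value` array whose items have `message` and `author.displayName` fields, and it is not necessarily the endpoint our PowerShell script uses. Inside a pipeline, the organization URL, project and build id are available as the predefined variables `System.CollectionUri`, `System.TeamProject` and `Build.BuildId`.

```python
from typing import List
from urllib.parse import quote


def build_changes_url(collection_uri: str, project: str, build_id: int,
                      api_version: str = "7.0") -> str:
    """URL that lists the changes (commits) associated with one build."""
    base = collection_uri.rstrip("/")
    return (
        f"{base}/{quote(project)}/_apis/build/builds/{build_id}/changes"
        f"?api-version={api_version}"
    )


def commit_lines(changes_response: dict, limit: int = 5) -> List[str]:
    """Turn the JSON response into short 'subject (author)' lines."""
    lines = []
    for change in changes_response.get("value", [])[:limit]:
        message = (change.get("message") or "").strip()
        # Keep only the subject line of a multi-line commit message.
        subject = message.splitlines()[0] if message else ""
        author = (change.get("author") or {}).get("displayName", "unknown")
        lines.append(f"{subject} ({author})")
    return lines
```

The request itself would be authenticated with the pipeline's access token (`System.AccessToken`), which has to be mapped into the script's environment explicitly.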

All this information is gathered via API calls and formatted into nice Slack messages in a thread, so that the deploy started and finished messages are in the same thread and do not get lost among messages from other service deploys (we have hundreds of services, each of which is deployed multiple times per day). The Slack channel can be specified, so that we have a main channel per environment and team channels for the services that a team owns (more context, less noise).
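Threading in Slack works by replying with the `ts` value that `chat.postMessage` returned for the first message, passed back as `thread_ts`. The Python sketch below shows only the payloads and the channel selection; the function names, message wording and channel names are illustrative assumptions, and how the `ts` value travels from the pre-deploy job to the post-deploy job is left open here.

```python
from typing import Dict, List, Optional


def target_channels(environment_channel: str,
                    team_channel: Optional[str] = None) -> List[str]:
    """Main channel for the environment, plus the owning team's channel if set."""
    channels = [environment_channel]
    if team_channel and team_channel != environment_channel:
        channels.append(team_channel)
    return channels


def started_message(channel: str, service: str, version: str) -> Dict[str, str]:
    """Top-level message; Slack's reply to it contains the thread timestamp."""
    return {"channel": channel, "text": f"Deploy of {service} {version} started"}


def finished_message(channel: str, thread_ts: str,
                     succeeded: bool, seconds: int) -> Dict[str, str]:
    """Reply posted into the thread opened by the 'started' message."""
    outcome = "succeeded" if succeeded else "failed"
    return {
        "channel": channel,
        "thread_ts": thread_ts,
        "text": f"Deploy {outcome} in {seconds}s",
    }
```

Because a thread belongs to one channel, each target channel needs its own "started" message and its own stored `ts`.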

Deployment started and finished notifications in Slack channel

Deployment finished but failed because of error

Pipeline templates to the rescue

As a welcome side effect, we created templates for jobs, stages and pipelines that can be used by multiple pipeline definitions inside a service repository. For a simple standard microservice that follows the agreed conventions, you can extend the pipeline template, and the whole pipeline definition is fewer than 20 lines.

Show me the Code!

The simplest pipeline for standard microservice:

resources:
  repositories:
  - repository: templates
    type: git
    name: my-devops-project/my-templates
    ref: refs/tags/v1 # Use a specific version tag, so that breaking changes in the templates do not break this pipeline

extends:
  template: pipelines/pipeline-dotnet-helm-kubernetes.yml@templates
  parameters:
    serviceName: 'my-microservice' # Also could be repository name by convention

For more complex aka “non-standard” pipelines you can use individual stage templates, so that the standard parts are included from templates.

variables:
  serviceName: my-service-name

stages:
- template: stages/set-version.yml@templates

- stage: My_Custom_Build_Test_Analyze_Stage
  jobs:
  - job: Build_Job
    steps:
    - bash: echo 'yay!'
    # ...

- template: stages/build-and-push-docker-image.yml@templates
  parameters:
    imageRepository: ${{ variables.serviceName }}

- template: stages/lint-and-publish-helm-files.yml@templates
  parameters:
    helmChartName: ${{ variables.serviceName }}

- template: stages/deploy-to-kubernetes-with-helm.yml@templates
  parameters:
    environmentName: Production
    serviceName: ${{ variables.serviceName }}

Example stage template for building and pushing a Docker image (additional non-critical parameters omitted):

parameters:
- name: imageRepository
  type: string
- name: dependsOn
  type: object
  default: []
- name: condition
  type: string
  default: succeeded()
- name: pushImageToRegistry
  type: boolean
  default: true

stages:
- stage: Build_Publish_Docker_Image
  dependsOn: ${{ parameters.dependsOn }}
  condition: ${{ parameters.condition }}
  variables:
  - group: 'container-registry' # Get registry variables
  jobs:
  - job: Build_Publish_Docker_Image
    steps:
    - task: Docker@2
      displayName: Build image
      inputs:
        command: 'build'
        repository: '${{ parameters.imageRepository }}'
        containerRegistry: $(container-registry)
        arguments: '--build-arg VERSION=$(buildVersion)'
        tags: |
          latest
          $(Build.BuildNumber)

    - ${{ if eq(parameters.pushImageToRegistry, true) }}:
      - task: Docker@2
        displayName: login to ACR
        inputs:
          command: 'login'
          containerRegistry: $(container-registry)

      - task: Docker@2
        displayName: Push an image to container registry
        inputs:
          repository: '${{ parameters.imageRepository }}'
          containerRegistry: $(container-registry)
          command: 'push'
          tags: |
            latest
            $(Build.BuildNumber)

Using shared script

You can refer to a shared script via a git repository using a checkout: git://project/repo@ref step when the git repo is in the same Azure DevOps organization.

steps:
  - checkout: git://${{ variables['System.TeamProject'] }}/my-pipeline-scripts@refs/tags/v1
  - task: PowerShell@2
    name: SendSlackNotification
    condition: always()
    inputs:
      filePath: scripts/notify-deploy-to-slack.ps1
      arguments: >
        -ServiceName "${{ parameters.serviceName }}"
...

What next

As we have common templates and scripts for deployments, we can gather additional information or create additional workflows, like sending deployment annotations to Grafana or storing detailed statistics about deployments for later analysis.

Deployment started and finished annotations to Grafana
Store details for later analysis or monitoring (dashboards), for example detecting when deployments are getting slower (trend)
You can always do better UX: message content and format
Open source the templates and scripts (done, see below for the sample script and templates)
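For the Grafana idea in the list above, the annotations HTTP API accepts a JSON body with `time` and `timeEnd` in epoch milliseconds, plus `tags` and `text`, sent with POST to `/api/annotations`. The Python sketch below only builds that body; the tag choice and the function names are assumptions, and this is not something the current scripts already do.

```python
from datetime import datetime, timezone

GRAFANA_ANNOTATIONS_PATH = "/api/annotations"  # appended to your Grafana base URL


def to_epoch_millis(moment: datetime) -> int:
    """Grafana expects annotation times as epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def build_annotation(service: str, version: str, environment: str,
                     started_at: datetime, finished_at: datetime) -> dict:
    """Region annotation that covers the whole deployment window."""
    return {
        "time": to_epoch_millis(started_at),
        "timeEnd": to_epoch_millis(finished_at),
        "tags": ["deployment", service, environment],
        "text": f"Deployed {service} {version} to {environment}",
    }


if __name__ == "__main__":
    start = datetime(2022, 12, 20, 10, 0, 0, tzinfo=timezone.utc)
    end = datetime(2022, 12, 20, 10, 2, 30, tzinfo=timezone.utc)
    print(build_annotation("my-microservice", "1.2.3", "Production", start, end))
```

Tagging by service and environment lets a dashboard filter annotations to the deployments that are relevant to the graphs it shows.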

For the curious mind

What did we learn?

We learned a lot about the many-many-many ways our pipelines were defined and found common patterns, which we implemented as templates that can be used by multiple pipelines. Minor differences are handled by input parameters, but the main structure is the same for all pipelines that use the same technology and platform. This helped us achieve better consistency by following agreed conventions (Conventional Commits) and other standards (SemVer2, GitHub Flow, Continuous Deployment).

We also created proper documentation about the templates: when and how to use them, how to modify them, and how to avoid breaking changes (which would break all the dependent pipelines).

So, in addition to deployment observability, we raised our knowledge, created documentation, applied best practices, and followed agreed standards. All this gives our software engineers a better developer experience, aka helps them be more productive at delivering value to our customers.

Sample PowerShell script and Azure pipeline templates

You can find the notify-deploy-to-slack.ps1 PowerShell script in the GitHub repo https://github.com/Bondora/azure-pipeline-scripts.

Additionally, you can check out the Azure pipeline template samples that use that script in the GitHub repo https://github.com/Bondora/azure-pipeline-templates.

Our Tech and tools

Azure DevOps as DevOps platform for storing code, building and delivering
- Git Repositories
- Build, Test, Deploy Pipelines
- Artifacts & Container registry
Docker for building containers, shipping as images and running services
Azure Kubernetes Service aka AKS aka Managed Kubernetes aka K8s as
Helm Charts to deploy services to Kubernetes
Slack for communication
GitHub Flow as git workflow / branching strategy
Continuous deploy to Staging, additional approval for Production deploy
SemVer2 for versioning
Conventional Commits for change messages
Grafana, Loki, Prometheus as observability stack

References

Azure DevOps Services REST API
- Timeline — Get
- Commits — Get Changes
- Approvals — Get
Azure Pipelines
- How to use template from other Repository
- Predefined (build) variables
- Check out multiple repositories in your pipeline
- Deployment jobs
Use Grafana API to create annotations

Clap, subscribe and join our engineering organization :)