Leveraging Azure DevOps pipelines to build our Python packages.

We transformed our build process from a complicated, unmaintainable one into an organized, reusable release process. In this post we'll tell you how we used code, Azure DevOps pipelines and DevOps design patterns to create our pipeline and build process.

Shiran Rubin
Microsoft Azure
4 min read · Apr 6, 2021

--

Problem

When developing software, one of the most important things is to make it easy to build, deploy, extend and use. We can achieve that with a clean and clear CI flow.

This was one of the obstacles we encountered while working on Presidio, moving it from V1 to V2. Presidio is an open-source tool to recognize, analyze and anonymize personally identifiable information (PII). Using trained ML models, Presidio was built to ensure sensitive text is properly managed and governed.

In V1, the entire build process consisted of a short YAML file for the Azure DevOps pipelines and a huge, messy Makefile for everything else.

We needed something simpler: a process that is easy to maintain and extend while still supporting the functionality we wanted. This was one of the goals we aimed for in V2.

Solution

First things first, what do we need and want?

We want to have our infrastructure and pipelines as code. We want it clean, clear, extendable and easy to maintain.

Requirements?

We have three Python packages, each should be built and run with Python 3.6, 3.7 and 3.8.

We must be able to build and run the packages locally; validate linting, security and compliance; run tests; create packages and publish them to PyPI and as Docker containers; and deploy to development and production environments or to our demo site.

Simple, right?

How did we get there?

Since there are many requirements, writing the pipeline is not straightforward. We started with a diagram to map the pipeline triggers and their steps. There are three different triggers:

  1. Pull request pipeline — Triggered on each pull request created for Presidio. We run our automatic validations to make sure the pull request aligns with Presidio's defined standards.
  2. Continuous integration pipeline — Before merging to main, we validate the standards, deploy the new version to the development environment (demo site) and test it. We make sure there are no breaking changes for our demo site.
  3. Release pipeline — Release a new version as a Docker container to ACR (Azure Container Registry), to PyPI, to the production demo site and to GitHub (pages, version etc.).

Each trigger works over three packages: Analyzer, Anonymizer and Image redactor. Between pipelines, many stages repeat themselves: unit testing, code linting, building containers/Python packages and running E2E tests are shared between the PR and CI pipelines.

In order to build this beauty, we used our previous knowledge of DevOps patterns and of sharing reusable components:

We started by creating the main templates, the ones that trigger the pipelines. We created a YAML file for each type of trigger.
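For illustration, a top-level trigger file might look like this (a minimal sketch; the file and template names here are assumptions, not necessarily the exact ones in the Presidio repo):

```yaml
# ci.yml: hypothetical top-level pipeline for the CI trigger.
# Runs when commits land on main.
trigger:
  branches:
    include:
      - main

stages:
  # Each stage is pulled in from a reusable template.
  - template: templates/lint-build-test.yml
  - template: templates/e2e-tests.yml
  - template: templates/deploy.yml
    parameters:
      environment: development
```

The top-level file stays tiny: it only declares the trigger and composes templates.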

From here on, we build a template hierarchy. In some cases, we use different template parameterization patterns; parameters help us customize a template for a specific flow.
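As a sketch of the parameterization pattern (the parameter and file names are illustrative, not taken from the actual repo):

```yaml
# templates/lint-build-test.yml: hypothetical parameterized step template
parameters:
  - name: packageFolder
    type: string
  - name: pythonVersion
    type: string
    default: '3.8'

steps:
  # Select the Python version for this run of the template
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '${{ parameters.pythonVersion }}'
  # Install dependencies and lint the requested package folder
  - script: |
      pip install -r requirements.txt
      flake8 .
    workingDirectory: ${{ parameters.packageFolder }}
    displayName: Lint ${{ parameters.packageFolder }}
```

Each pipeline that includes the template passes its own parameter values, so the same steps can serve every package and Python version.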

Let’s look at some examples of reusable pipeline code:

In the release.yml file, the build-and-push-containers.yml template is used twice:

  1. Build and push the container with a version.
  2. Build and push the container as the latest.
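In sketch form (the parameter name is an assumption; the actual template may differ):

```yaml
# release.yml (excerpt): the same template included twice with different tags
stages:
  - template: templates/build-and-push-containers.yml
    parameters:
      imageTag: $(Build.SourceBranchName)  # e.g. the release version tag
  - template: templates/build-and-push-containers.yml
    parameters:
      imageTag: latest
```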

Another cool feature that came to our rescue is the job matrix strategy. We used it to run the Python code over different Python versions.

When we wanted to run the Analyzer package build and tests with different versions of Python, we used the matrix to declare the versions. The build-analyzer.yml template then runs the required steps once per version:
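A minimal sketch of that matrix (the job and parameter names are illustrative):

```yaml
# Run the Analyzer build-and-test steps once per Python version
jobs:
  - job: BuildAnalyzer
    strategy:
      matrix:
        Python36:
          python.version: '3.6'
        Python37:
          python.version: '3.7'
        Python38:
          python.version: '3.8'
    steps:
      - template: build-analyzer.yml
        parameters:
          pythonVersion: $(python.version)
```

Azure DevOps expands the matrix into three parallel jobs, each with its own `python.version` value.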

In the end, our build diagram looks like this:

As can be seen, we reuse many templates. They represent full flows that repeat themselves across the different pipelines:

  • E2E tests.
  • Build and publish containers.
  • Deployment stage.
  • Build Python.
  • Lint, build and test.

By reusing templates with parameters and using the job matrix strategy, we made the pipeline code more readable and easier to use and maintain, while keeping the flow clean and clear.

Conclusion

Many times, we pay attention to clean code and to the design of our code and services, but forget about how the CI, the pipelines and the entire infrastructure are built. A clean, clear, reusable pipeline that is easy to extend can save a lot of time and headaches.

In this post I have shown how we leveraged these best practices for writing and managing pipeline code in Presidio V2, to ensure that adding another Python version, a new service, or another publish or deployment method to our pipeline takes almost no effort and time.
