Automating Build Details and Artifact Retrieval from Azure DevOps using Google Cloud Functions

with Python Scripting and Azure DevOps REST API

Yunus Y · roofstacks-tech · Jul 6, 2023

In this article, we will explore how we automated the retrieval of build details and artifacts from Azure DevOps using Google Cloud Functions. This automation allowed us to work around Azure DevOps’ limited retention period for build information and ensured that critical data was preserved and remained easily accessible. Here, we will share our experience and the story behind setting up this automation.

The Challenge: Limited Retention Period in Azure DevOps

As our software development team relied heavily on Azure DevOps for managing our build pipelines, we encountered a challenge. Azure DevOps had a limited retention period for build details and artifacts, making it difficult to access historical information beyond a certain timeframe. This limitation hindered our ability to analyze past builds and troubleshoot issues that may have arisen in earlier versions of our software.

The Solution: Automation with Google Cloud Functions

To overcome this challenge, we decided to automate the retrieval and storage of build details and artifacts using Google Cloud Functions. This allowed us to store the data in Google Cloud Storage, providing a scalable and long-term solution for preserving and accessing historical build information.

Setting Up the Automation Process

We started by creating a Google Cloud Function with a Pub/Sub trigger. This function would be responsible for executing the automation process daily, ensuring that we had the most up-to-date build information stored in Google Cloud Storage. By scheduling the function to run at a specific time each day, we could consistently retrieve the latest build details and artifacts. This tutorial helped us a lot.
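For context, here is a minimal sketch of what a Pub/Sub-triggered entry point looks like with the background-function signature we use later in main(message, context). The Cloud Scheduler job simply publishes a message once a day, and its body arrives base64-encoded in message["data"]; the message body itself is only illustrative here, since we use the trigger purely as a daily tick.

import base64

def main(message, context):
    # The scheduler's message body is optional and only illustrative here;
    # we treat the Pub/Sub message purely as a daily trigger.
    body = ""
    if message and message.get("data"):
        body = base64.b64decode(message["data"]).decode("utf-8")
    print("Triggered by Cloud Scheduler message:", body)
    # ... fetch and store yesterday's builds, as described below ...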

Example structure of the automation

Retrieving Build IDs and Mapping with Pipeline Names

To fetch the build information from Azure DevOps, we utilized the Azure DevOps REST API. We retrieved the build IDs for a specific date range and extracted the pipeline names from the build definition in the API response using Python. We then created a mapping between the build IDs and pipeline names, allowing us to organize the data effectively. Before providing our main.py file, here is the requirements.txt:

functions-framework==3.*
requests
google-cloud-storage
google-api-python-client

And here is the first part of main.py:


import os
import datetime
import json
import requests
import zipfile
import base64
from google.cloud import storage

azure_organization = "YOUR-ORGANIZATION"
azure_project = "YOUR-PROJECT-NAME"
personal_access_token = "YOUR-PERSONAL-ACCESS-TOKEN"
storage_bucket_name = "bucket-name"

def get_build_ids(date):
    url = (
        f"https://dev.azure.com/{azure_organization}/{azure_project}/"
        f"_apis/build/builds?api-version=7.0&queryOrder=startTimeDescending&minTime={date}T00:00:00Z&maxTime={date}T23:59:59Z"
    )
    authorization = str(base64.b64encode(bytes(':' + personal_access_token, 'ascii')), 'ascii')

    headers = {
        'Accept': 'application/json',
        'Authorization': 'Basic ' + authorization
    }
    response = requests.get(url, headers=headers)
    print("Response Status Code:", response.status_code)
    try:
        response_json = response.json()
        build_ids = []
        pipeline_names = []
        for build in response_json["value"]:
            build_ids.append(build["id"])
            pipeline_names.append(build["definition"]["name"])
        build_mapping = dict(zip(build_ids, pipeline_names))
        return build_mapping
    except (KeyError, ValueError) as e:
        print("Error parsing JSON response:", str(e))
        return {}
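For reference, calling this helper for a given day yields a plain {build_id: pipeline_name} dictionary; the IDs and names below are hypothetical and only show the shape of the mapping.

# Hypothetical local check (build IDs and pipeline names are made up):
mapping = get_build_ids("2023-07-05")
print(mapping)
# e.g. {1234: "frontend-ci", 1235: "backend-ci"}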

Fetching Build Details and Downloading Artifacts

Using the mapping of build IDs to pipeline names, we fetched the build details from Azure DevOps via the REST API. We saved the build details in their respective pipeline folders, creating a dedicated folder for each pipeline and date combination. Additionally, we downloaded the artifact named “trivyFolder” for each build ID and stored it locally. You can adjust the artifact name to match your own pipelines.


def get_build_details(build_id):
    url = (
        f"https://dev.azure.com/{azure_organization}/{azure_project}/"
        f"_apis/build/builds/{build_id}/timeline/timeline?api-version=7.1-preview.2"
    )
    authorization = str(base64.b64encode(bytes(':' + personal_access_token, 'ascii')), 'ascii')

    headers = {
        'Accept': 'application/json',
        'Authorization': 'Basic ' + authorization
    }
    response = requests.get(url, headers=headers)
    print("Response Status Code:", response.status_code)
    try:
        response_json = response.json()
        build_details = response_json
    except (KeyError, ValueError) as e:
        print("Error parsing JSON response:", str(e))
        build_details = {}
    return build_details

def download_build_artifacts(build_id, pipeline_name):
    url = (
        f"https://dev.azure.com/{azure_organization}/{azure_project}/"
        f"_apis/build/builds/{build_id}/artifacts?api-version=7.0"
    )
    authorization = str(base64.b64encode(bytes(':' + personal_access_token, 'ascii')), 'ascii')

    headers = {
        'Accept': 'application/json',
        'Authorization': 'Basic ' + authorization,
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
    }
    response = requests.get(url, headers=headers)
    print("Response Status Code:", response.status_code)
    print("Response Content:", response.text)
    try:
        response_json = response.json()
        artifacts = response_json["value"]
        for artifact in artifacts:
            if artifact["name"] == "trivyFolder":
                download_url = artifact["resource"]["downloadUrl"]
                download_file_name = f"{pipeline_name}-{build_id}_trivy.zip"
                response = requests.get(download_url, headers=headers)  # Add headers to the download request
                with open(download_file_name, "wb") as file:
                    file.write(response.content)
                print("Artifact downloaded:", download_file_name)
                return download_file_name
    except (KeyError, ValueError) as e:
        print("Error parsing JSON response:", str(e))
        return None
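To make the parsing above easier to follow, each entry in response_json["value"] only needs to expose two fields for our purposes; the shape below is illustrative, with made-up values.

# Illustrative shape of a single artifact entry (values are made up):
example_artifact = {
    "name": "trivyFolder",
    "resource": {
        "downloadUrl": "https://dev.azure.com/ORG/_apis/resources/Containers/123?itemPath=trivyFolder&$format=zip"
    }
}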

Storing Build Details and Uploading Artifacts to Google Cloud Storage

To ensure the longevity and accessibility of the build information, we used the Google Cloud Storage Python client library to upload the build details as JSON files and the artifacts as ZIP files to a storage bucket. We organized the data in the bucket by creating subfolders for each pipeline and date combination, allowing for easy navigation and retrieval of specific builds. In our case, the scheduler triggers the function at 4:00 AM, so we fetch the previous day’s builds.


def upload_to_gcs(file_path, folder_name=""):
    storage_client = storage.Client()
    bucket = storage_client.bucket(storage_bucket_name)
    blob_name = os.path.join(folder_name, os.path.basename(file_path)) if folder_name else os.path.basename(file_path)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(file_path)
    print("File uploaded to Google Cloud Storage:", blob_name)


def main(message, context):
    # Get today's date
    today = datetime.date.today()

    # Get build IDs and pipeline names for yesterday
    yesterday = today - datetime.timedelta(days=1)
    build_mapping = get_build_ids(yesterday.strftime("%Y-%m-%d"))
    file_name = f"build_mapping_{yesterday.strftime('%Y%m%d')}.json"
    file_content = json.dumps(build_mapping)
    with open(file_name, "w") as file:
        file.write(file_content)
    print("Build mapping stored successfully:", file_name)

    # Process each build ID and its corresponding pipeline name
    for build_id, pipeline_name in build_mapping.items():
        print("Processing pipeline:", pipeline_name)

        # Get build details and save them to the pipeline folder
        folder_name = f"{pipeline_name}-{yesterday.strftime('%Y%m%d')}"
        os.makedirs(folder_name, exist_ok=True)
        file_name = os.path.join(folder_name, f"build_details_{build_id}.json")
        build_details = get_build_details(build_id)
        file_content = json.dumps(build_details)
        with open(file_name, "w") as file:
            file.write(file_content)
        print("Build details stored successfully:", file_name)

        # Download the "trivyFolder" artifact for each build ID and upload it to Google Cloud Storage
        downloaded_file = download_build_artifacts(build_id, pipeline_name)
        if downloaded_file:
            upload_to_gcs(downloaded_file)
            os.remove(downloaded_file)  # Remove the local file after upload

    # Zip the pipeline folders
    zip_file_name = f"build_details_{yesterday.strftime('%Y%m%d')}.zip"
    with zipfile.ZipFile(zip_file_name, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk('.'):
            if root.startswith('./') and root != './':
                folder = root[2:]
                for file in files:
                    zipf.write(os.path.join(folder, file))

    print("Pipeline folders zipped successfully:", zip_file_name)

    # Upload the zip file to Google Cloud Storage
    upload_to_gcs(zip_file_name)

    # Remove the local pipeline folders and the zip file after upload
    for pipeline_name in set(build_mapping.values()):
        folder = f"{pipeline_name}-{yesterday.strftime('%Y%m%d')}"
        if os.path.isdir(folder):
            for root, dirs, files in os.walk(folder, topdown=False):
                for file in files:
                    os.remove(os.path.join(root, file))
                for dir in dirs:
                    os.rmdir(os.path.join(root, dir))
            os.rmdir(folder)
    os.remove(zip_file_name)

if __name__ == "__main__":
    main(None, None)  # Allow a quick local run without a Pub/Sub message
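One practical note: the script writes its intermediate files to the current working directory. Depending on the Cloud Functions environment you deploy to, that directory may be read-only, in which case a small adjustment like the sketch below, switching to the temporary directory before any files are created, may be needed; the helpers themselves stay unchanged.

import os
import tempfile

def main(message, context):
    # Work in the writable temporary directory (typically /tmp) before
    # creating any local files; the rest of main() is as shown above.
    os.chdir(tempfile.gettempdir())
    # ...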

The Impact: Enhanced Visibility and Collaboration

By automating the retrieval and storage of build details and artifacts, we significantly enhanced the visibility and accessibility of our historical build information. Our team could now easily access and analyze past builds, making it easier to identify trends, diagnose issues, and make informed decisions for future development iterations. This automation process facilitated better collaboration among team members, as everyone had access to a centralized repository of build information. The data is now also readily available for our Data Team to analyze.

Automating the retrieval of build details and artifacts from Azure DevOps using Google Cloud Functions and Pub/Sub proved to be a game-changer for our software development process. By storing the data in Google Cloud Storage, we overcame the limitations of Azure DevOps’ retention period and ensured that critical build information was preserved. This automation process improved visibility, collaboration, and decision-making within our team, ultimately leading to more efficient and successful software development projects.

Note: The steps and script provided in this article are based on our specific implementation and may require customization to fit your environment and requirements. Please refer to the linked documentation and adapt the Python script to your needs.
