Retrieve GitLab CI artifacts from child pipelines

Alexander Chumakin
4 min read · Oct 13, 2022


I’ve been using GitLab CI for 4 years now and still enjoy it, but sometimes I miss an interesting feature that I wish I had out of the box. This story is about one of them.

Problem

I like how GitLab lets independent pipelines in different git repos work together. For example, when we have a development repo and a separate test repo that needs to reuse artifacts produced by the dev code (say, compilation output), we can simply do it like this:

  • dev project

build:
  script:
    - echo "Build project output"
    - ...
  artifacts:
    paths:
      - project/output

trigger-tests:
  variables:
    ARTIFACTS_DOWNLOAD_REF: $CI_COMMIT_REF_NAME
  trigger:
    project: path-to-test-project
    branch: master

  • test project

run-tests:
  script:
    - echo "check dev project artifacts"
    - ls project/output
    - echo "running tests"
    - ...
  needs:
    - project: path-to-dev-project
      job: build
      ref: $ARTIFACTS_DOWNLOAD_REF
      artifacts: true
That’s all! We trigger the test pipeline from the dev pipeline and pass along a variable with the ref name; the test pipeline then grabs the artifacts from the dev pipeline based on that ref and runs tests against them.

But let’s imagine a more complex scenario where we create artifacts not in the top-level pipeline, but in a child (downstream) pipeline. For example, in a multi-module project we can use the dynamic child pipelines feature to generate downstream pipelines on the fly, based on which modules were affected.

Example

(diagram: desired behaviour)

I don’t want to overcomplicate things here, so let’s emulate “affected modules” with a little Python script that produces a list of up to two elements: at random, the output can be empty, contain one of the elements, or contain both.

import random

modules = ['project_1', 'project_2']

for module in modules:
    if random.getrandbits(1):
        print(module)

Now we can simply assign the result of this script to an environment variable and emulate the “affected modules” functionality in CI.

Pipelines configuration

I initially have 3 stages: randomly generate the “affected modules”, generate a dynamic downstream pipeline depending on what was produced in the previous step, then run this downstream pipeline, where each dynamic job generates an artifact.

stages:
  - build
  - generate-downstream
  - trigger-downstream

default:
  image: python:3.8-slim

build-modules:
  stage: build
  script:
    - export modules=$(python3 generate_modules.py)
    - echo "MODULES=$modules" >> build.env
  artifacts:
    reports:
      dotenv: build.env

prepare-pipeline:
  stage: generate-downstream
  script:
    - cd ci && python generate_ci.py
  artifacts:
    paths:
      - .publish-artifacts.yml
  needs:
    - job: build-modules
      artifacts: true

trigger-child-pipeline:
  stage: trigger-downstream
  trigger:
    include:
      - artifact: .publish-artifacts.yml
        job: prepare-pipeline
    strategy: depend
  needs:
    - job: build-modules
      artifacts: true
    - job: prepare-pipeline
      artifacts: true

where generate_ci.py is another Python script that generates the dynamic downstream pipeline with jobs for the affected modules.
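The article doesn’t show generate_ci.py itself, so here is a minimal sketch of what such a script could look like. It is an assumption, not the author’s actual code: it reads the MODULES variable exposed by the build-modules dotenv report and writes one publish job per module, with job names matching the ones the download script looks for later (publish-project_1, publish-project_2).

```python
# Hypothetical sketch of generate_ci.py (not the author's original code).
# Reads MODULES (e.g. "project_1 project_2") and emits a child pipeline
# config with one publish job per affected module.
import os


def generate_pipeline(modules):
    """Build the YAML text for the dynamic child pipeline."""
    if not modules:
        # GitLab rejects an empty pipeline file, so emit a no-op job.
        return (
            "no-op:\n"
            "  script:\n"
            '    - echo "No affected modules, nothing to publish"\n'
        )
    jobs = []
    for module in modules:
        jobs.append(
            f"publish-{module}:\n"
            "  script:\n"
            f'    - echo "artifact for {module}" > {module}.txt\n'
            "  artifacts:\n"
            "    paths:\n"
            f"      - {module}.txt\n"
        )
    return "\n".join(jobs)


if __name__ == "__main__" and "MODULES" in os.environ:
    modules = os.environ["MODULES"].split()
    # The job runs from the ci/ directory, so write to the repo root,
    # where the parent pipeline expects the artifact.
    with open("../.publish-artifacts.yml", "w") as f:
        f.write(generate_pipeline(modules))
```

The only hard requirement is that the generated job names and artifact file names stay in sync with whatever the retrieval script searches for.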

Now we can create a script that fetches artifacts from the downstream pipeline’s jobs (if any exist). I prefer a simple bash script so it can easily be reused in other projects with different technology stacks.

#!/bin/bash

# Call the GitLab API for the current project and print the JSON response.
perform_gitlab_call() {
  path=$1
  result=$(curl --silent --insecure --noproxy '*' --header "PRIVATE-TOKEN:$ACCESS_TOKEN" "$CI_API_V4_URL/projects/$CI_PROJECT_ID/$path")
  echo "$result"
}

# Download and unpack the artifacts archive of a single job, if it exists.
download_artifacts() {
  build_job_id=$1
  if [[ "$build_job_id" == "null" ]]; then
    echo "build job id is null"
  else
    echo "Downloading artifacts from job $build_job_id"
    curl --silent --insecure --noproxy '*' -L -o artifacts.zip --header "PRIVATE-TOKEN:$ACCESS_TOKEN" "$CI_API_V4_URL/projects/$CI_PROJECT_ID/jobs/$build_job_id/artifacts"
    unzip -o -qq artifacts.zip

    echo "Check if artifacts are extracted successfully"
    ls *.txt
    rm -rf artifacts.zip
  fi
}

# Find the id of a job with the given name in the child pipeline ("null" if absent).
find_publish_job_id() {
  job_name=$1
  result=$(perform_gitlab_call "pipelines/$ci_child_pipeline_id/jobs" |
    jq --raw-output '[.[]|select(.name=='\"$job_name\"')][0].id')
  echo "$result"
}

# Look for publish jobs in the child pipeline of the given parent pipeline;
# if none are found, recurse into the next previous pipeline on this branch.
get_pipelines_data() {
  pipeline_id=$1
  echo "Trying to get artifacts from pipeline $pipeline_id"
  ci_child_pipeline_id=$(perform_gitlab_call "pipelines/$pipeline_id/bridges" |
    jq '[.[]|select(.name=="trigger-child-pipeline")][0].downstream_pipeline.id')
  echo "Found child pipeline id $ci_child_pipeline_id"

  project1_build_job_id=$(find_publish_job_id publish-project_1)
  project2_build_job_id=$(find_publish_job_id publish-project_2)
  printf "Project1 publish job id: %s\nProject2 publish job id: %s\n" "$project1_build_job_id" "$project2_build_job_id"

  if [[ "$project1_build_job_id" == "null" ]] && [[ "$project2_build_job_id" == "null" ]]; then
    echo "Cannot find any artifacts for project1 or project2 pipelines, trying to download them from another pipeline"
    echo "Found ${#pipelines_list[@]} pipelines for branch $CI_COMMIT_REF_NAME to scan"
    echo "${pipelines_list[@]}"
    add_id=${pipelines_list[0]}
    if [ -z "$add_id" ]; then return; else echo "Retrieved additional pipeline id $add_id"; fi
    unset 'pipelines_list[0]'
    pipelines_list=( "${pipelines_list[@]}" )
    get_pipelines_data "$add_id"
  else
    download_artifacts "$project1_build_job_id"
    download_artifacts "$project2_build_job_id"
  fi
}

pipelines_list=($(perform_gitlab_call "pipelines?ref=$CI_COMMIT_REF_NAME&status=success" | jq --raw-output '.[].id'))
get_pipelines_data "$CI_PIPELINE_ID"
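For readers less comfortable with jq, both selections in the script (the bridge lookup and the publish-job lookup) boil down to “take the first element whose name matches”. A rough Python equivalent, using sample dicts that imitate the shape of GitLab’s bridges and jobs API responses (trimmed to the fields the script actually uses):

```python
# Python equivalent of the script's jq filters:
#   jq '[.[]|select(.name=="...")][0]'
# i.e. pick the first element with a matching "name", or nothing.


def first_by_name(items, name):
    """Return the first item whose "name" matches, or None."""
    return next((item for item in items if item.get("name") == name), None)


# Sample data shaped like GitLab's /bridges and /jobs responses
# (ids are made up for illustration).
bridges = [
    {"name": "trigger-child-pipeline", "downstream_pipeline": {"id": 4242}},
]
jobs = [
    {"name": "publish-project_1", "id": 101},
    {"name": "publish-project_2", "id": 102},
]

bridge = first_by_name(bridges, "trigger-child-pipeline")
child_pipeline_id = bridge["downstream_pipeline"]["id"] if bridge else None

job = first_by_name(jobs, "publish-project_1")
job_id = job["id"] if job else None  # jq prints "null" in the missing case
```

The `[...][0]` wrapper in jq serves the same purpose as the `None` fallback here: a module that wasn’t affected simply has no publish job, and the script treats that as a signal to look further back.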

All I need to do to configure my project is to provide an ACCESS_TOKEN environment variable in CI; it can be a personal access token or, in paid GitLab tiers, a group/project access token.

This script tries to find artifacts in the latest pipeline for the current branch; if no “affected modules” were generated there, it keeps looking through previous pipelines on the same branch. In the end we have the artifacts at the parent pipeline level, taken either from this run’s downstream jobs or from a previous pipeline. Now we just need to call it from the pipeline:

stages:
  - build
  - generate-downstream
  - trigger-downstream
  - child-artifacts

...

retrieve-artifacts:
  stage: child-artifacts
  image: alexandrchumakin/docker-jq-curl
  before_script:
    - chmod +x ci/download-artifacts.sh
  script:
    - ./ci/download-artifacts.sh
  artifacts:
    paths:
      - '*.txt'

I only need an image with a proper bash shell (not a bare POSIX sh), plus jq and curl, so I prepared one myself and pushed it to Docker Hub.

Conclusion

Even as one of the best (if not the best) CI/CD tools on the market, GitLab may still lack a cool feature here and there, but with the documentation, a few simple API calls, and a bit of enthusiasm and creativity you can implement everything you need quickly and nicely =)

Check out my GitLab repo for the full code base.

Happy CI-ing!
