Migration to Google Cloud, Part 1: Migrating dependencies to Artifact Registry

Akhilesh Mishra
Google Cloud - Community
5 min read · Jun 5, 2023

Last week, I was working on migrating an application to Google Cloud. The developers told me they had some application dependencies (Java artifacts) that they wanted to move to Google Cloud Artifact Registry (AR).


Artifact Registry provides a single location for storing and managing your packages and Docker container images. You can integrate it with Google Cloud CI/CD services or your existing CI/CD tools, and store artifacts from Cloud Build.

Being new to Java, I did not know how complicated it can be to move thousands of Maven-based Java dependencies to AR. If you haven't done it before, know that pushing Maven dependencies isn't as straightforward as pushing Docker images or npm packages. I couldn't find a clear solution on Google, and neither heavyweight ChatGPT nor underdog Bard helped much, which says a lot about AI replacing us humans.

Let's do it together, one step at a time.

First, ask the developers to push all the dependency packages from wherever they are stored to a Cloud Storage bucket you created. Then create a Maven repo, a service account for it, and the storage bucket:

# Service account that will be used to push artifacts and to access the bucket and the AR repo
resource "google_service_account" "gcsmvn-svc" {
  project      = var.project_id
  account_id   = "gcs-mvn-access"
  display_name = "Service Account - gcs-mvn"
}


# Storage bucket to store artifacts temporarily
# Give developers access to upload artifacts to the bucket
resource "google_storage_bucket" "archive-bucket" {
  project                     = var.project_id
  name                        = "${var.project_id}-archive"
  uniform_bucket_level_access = true
  location                    = var.region
  public_access_prevention    = "enforced"

  versioning {
    enabled = true
  }
}

# Add service account permission on the storage bucket
resource "google_storage_bucket_iam_member" "archive-member" {
  bucket = google_storage_bucket.archive-bucket.name
  role   = "roles/storage.admin"
  member = "serviceAccount:${google_service_account.gcsmvn-svc.email}"
}
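The comment above mentions giving developers upload access to the bucket. One way to do that, assuming a hypothetical developer group, is with gsutil (you could equally add another google_storage_bucket_iam_member in Terraform):

gsutil iam ch group:developers@example.com:roles/storage.objectAdmin gs://someproject-archive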

Create an AR repo for Maven and grant the service account permissions on the repo:

resource "google_artifact_registry_repository" "mvn-repo" {
project = var.project_id
location = var.region
repository_id = "${var.prefix}-maven"
description = "Repo to store maven artifacts"
format = "maven"
}

resource "google_artifact_registry_repository_iam_member" "mvn-member" {

for_each = toset([
"roles/artifactregistry.admin",
"roles/artifactregistry.serviceAgent",
])
project = var.project_id
location = var.region
repository = google_artifact_registry_repository.quantexa-mvn-repo.name
role = each.value
member = "serviceAccount:${google_service_account.gcsmvn-svc.email}"
}

Now deploy the above Terraform code.
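From the directory containing the .tf files (init is only needed on the first run):

terraform init
terraform plan
terraform apply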

Let's list the repositories we created:

gcloud artifacts repositories list --project=PROJECT_ID --location=REGION

REPOSITORY  FORMAT  MODE                 DESCRIPTION                    LOCATION    LABELS  ENCRYPTION          CREATE_TIME
main-maven  MAVEN   STANDARD_REPOSITORY  Repo to store maven artifacts  someregion          Google-managed key  2023-06-01T12:56:42

Next, construct the AR Maven URL:

https://{REGION}-maven.pkg.dev/{PROJECT_ID}/{MAVEN_REPO}

# Replace the region, project ID, and repo name, and you have the URL
https://someregion-maven.pkg.dev/someproject/main-maven

On your bastion host, install Maven and create settings.xml in the .m2 directory under your home directory. Create a new directory, let's say mvn, where you will download the artifacts from Cloud Storage, and create a pom.xml under the mvn directory.

apt-get install maven
ls -l ~/.m2  # check that it exists in the right path
touch ~/.m2/settings.xml

mkdir mvn && cd mvn
touch pom.xml
gsutil -m cp -r gs://gcs_bucket_path_for_dependencies .

Create a service account key and download it:

gcloud iam service-accounts keys create gcs-mvn-access.json --iam-account  \
gcs-mvn-access@someproject.iam.gserviceaccount.com

We will use this service account to authenticate Maven with AR. Let's generate the settings.xml and pom.xml content using the gcloud command below, then copy and paste the generated content into the settings.xml and pom.xml you created earlier. Among other things, the generated pom.xml pulls in Google's artifactregistry-maven-wagon build extension, which is what lets Maven resolve the artifactregistry:// URLs used below.

gcloud artifacts print-settings mvn  --project=someproject --repository=main-maven  --location=someregion  --json-key=gcs-mvn-access.json
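The command prints both snippets to stdout; if you would rather copy from a file, redirect the output (the filename here is just an example):

gcloud artifacts print-settings mvn --project=someproject --repository=main-maven --location=someregion --json-key=gcs-mvn-access.json > ar-settings.txt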

You are all set to upload the contents. Typically each artifact directory contains two files, a .pom and a .jar. You can run a simple command to upload an artifact:

mvn deploy:deploy-file -Durl=MAVEN_URL -DpomFile=path/filename.pom -Dfile=path/filename.jar

Note: you need to run the command from the directory where you keep pom.xml.
MAVEN_URL = artifactregistry://{REGION}-maven.pkg.dev/{PROJECT_ID}/{MAVEN_REPO}
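For example, with the repo created earlier (the artifact path and coordinates below are made up for illustration):

mvn deploy:deploy-file \
  -Durl=artifactregistry://someregion-maven.pkg.dev/someproject/main-maven \
  -DpomFile=dependencies/com/example/demo/1.0.0/demo-1.0.0.pom \
  -Dfile=dependencies/com/example/demo/1.0.0/demo-1.0.0.jar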

The above command works fine when you have two or three packages, but in a real scenario you have hundreds or thousands, nested under different paths. You will need some sort of automation to complete the job.

Python to the rescue

I have written a very simple Python script, let's call it deploy-maven.py, that should be run from the root directory where we keep pom.xml and all the dependencies.

cd ~/mvn ; touch deploy-maven.py

Copy and paste the code below into deploy-maven.py:

import os
import subprocess
from datetime import timedelta
from timeit import default_timer as timer


def deploy_artifacts():
    """
    Deploys artifacts to the Maven repository using the 'mvn deploy:deploy-file' command.

    Reads the environment variables for the necessary configuration:
    - PATH_TO_COPY_FROM: The path from where artifacts will be copied.
    - REGION: The region where the Maven repository is located.
    - PROJECT_ID: The ID of the project.
    - MAVEN_REPO: The name of the Maven repository.

    Prints the number of artifacts moved and the time taken for the operation.
    """
    cwd = os.getcwd()
    path_to_copy_from = os.getenv("PATH_TO_COPY_FROM")
    region = os.getenv("REGION")
    project = os.getenv("PROJECT_ID")
    maven_repo = os.getenv("MAVEN_REPO")
    durl = f"artifactregistry://{region}-maven.pkg.dev/{project}/{maven_repo}"

    start_path = os.path.join(cwd, path_to_copy_from)

    start_time = timer()
    count = 0

    for dirpath, dirnames, filenames in os.walk(start_path):
        if not filenames:
            continue

        # Pick up the .pom and .jar files in this directory, if present.
        dpom_file = ""
        dfile = ""
        for file in filenames:
            if file.endswith(".pom"):
                dpom_file = os.path.join(dirpath, file)
            if file.endswith(".jar"):
                dfile = os.path.join(dirpath, file)

        # Skip directories without a .pom; for pom-only artifacts,
        # deploy the .pom itself as the file.
        if not dpom_file:
            continue
        if not dfile:
            dfile = dpom_file

        subprocess.run(
            f"mvn deploy:deploy-file -Durl={durl} -DpomFile={dpom_file} -Dfile={dfile}",
            stdout=subprocess.PIPE,
            shell=True,
            text=True,
        )
        count += 1

    end_time = timer()

    print(f"Moved {count} artifacts and it took {timedelta(seconds=end_time - start_time)}")


# Push the artifacts
if __name__ == "__main__":
    deploy_artifacts()

Set the environment variables described in the script's docstring.
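For example (the first value is whatever directory you copied down from the bucket; the rest match the earlier example names):

export PATH_TO_COPY_FROM=dependencies
export REGION=someregion
export PROJECT_ID=someproject
export MAVEN_REPO=main-maven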

Run the script: python3 deploy-maven.py

List the packages in the Maven repo:

gcloud artifacts packages list --project=PROJECT_ID --repository=MAVEN_REPO --location=REGION
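To spot-check one of the listed packages, you can also list its versions (replace PACKAGE with a name from the output above):

gcloud artifacts versions list --package=PACKAGE --project=PROJECT_ID --repository=MAVEN_REPO --location=REGION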

If you found my content helpful, buy me a coffee to show your support.

Connect with me on LinkedIn: @akhilesh-mishra-0ab886124
Follow me for more content on Google Cloud, Terraform, Python, and other DevOps tools.
