How to carry out CI/CD in Machine Learning (“MLOps”) using Kubeflow ML pipelines (#3)

Set up your ML components to be automatically rebuilt when there is new code (CI) and a retraining Experiment Run to be launched whenever there is new data (CD)

Wouldn’t it be great if we had CI/CD with food? The chef can change an item’s description based on what’s available in the market (CI), and when you submit an order you can adapt it (“extra hummus”) and the right food gets made in the kitchen (CD).

1. Set up Hosted Kubeflow Pipelines

  • An ML Pipeline consists of ML steps, each of which is a Docker container
  • The cluster that comes up has a link to the ML Pipelines dashboard
  • Use the dashboard to manually upload a pipeline and to look at past and ongoing Experiments
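You can also reach the same cluster from the kfp SDK instead of the dashboard. A minimal connectivity check, assuming the Kubeflow Pipelines v1 SDK and with a placeholder for the dashboard URL of your cluster:

import kfp

# Placeholder: substitute the ML Pipelines dashboard URL shown for your cluster
PIPELINES_HOST = 'https://<your-cluster>.pipelines.googleusercontent.com'

client = kfp.Client(host=PIPELINES_HOST)
print(client.list_experiments())   # past & ongoing Experiments
print(client.list_pipelines())     # pipelines uploaded so far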

2a. Set up your personal development environment

SA_NAME=kfpdemo
gcloud iam service-accounts create $SA_NAME \
--display-name $SA_NAME --project "$PROJECT_ID"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member=serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com \
--role=roles/dataflow.developer
gcloud iam service-accounts keys create application_default_credentials.json \
    --iam-account $SA_NAME@$PROJECT_ID.iam.gserviceaccount.com

# Attempt to create a k8s secret. If already exists, override.
kubectl create secret generic user-gcp-sa \
    --from-file=user-gcp-sa.json=application_default_credentials.json \
    -n $NAMESPACE --dry-run -o yaml | kubectl apply -f -
The development environment uses a service account (kfpdemo above) whose key is also stored as a Kubernetes secret (user-gcp-sa) in the KFP cluster, so that pipeline steps can run with the same credentials.
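In the development environment itself (for example, a notebook), you can point Application Default Credentials at the key you just downloaded. A minimal sketch, assuming the key file sits in the working directory:

import os

# Point Application Default Credentials at the service-account key downloaded
# above, so local code runs as the same kfpdemo service account.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'application_default_credentials.json'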

2b. Create Docker containers for pipeline steps

FROM google/cloud-sdk:latest
RUN mkdir -p /babyweight/src && \
    cd /babyweight/src && \
    git clone https://github.com/GoogleCloudPlatform/training-data-analyst
COPY deploy.sh ./
ENTRYPOINT ["bash", "./deploy.sh"]
# deploy.sh (the container's entrypoint): deploy the trained model to AI Platform
gcloud ai-platform versions create ${MODEL_VERSION} \
    --model ${MODEL_NAME} --origin ${MODEL_LOCATION} \
    --runtime-version $TFVERSION

# write out the values that the pipeline step will expose as file_outputs
echo $MODEL_NAME > /model.txt
echo $MODEL_VERSION > /version.txt
gcloud builds submit . --config cloudbuild.yaml
steps:
- name: 'gcr.io/cloud-builders/docker'
  dir: '${DIR_IN_REPO}' # remove-for-manual
  args: [ 'build', '-t', 'gcr.io/${PROJECT_ID}/${CONTAINER_NAME}:${TAG_NAME}', '.' ]
images:
- 'gcr.io/${PROJECT_ID}/${CONTAINER_NAME}:${TAG_NAME}'
!docker run -t gcr.io/${PROJECT_ID}/babyweight-pipeline-deploycmle:latest gs://${BUCKET}/babyweight/hyperparam/17 babyweight local
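Because deploy.sh writes the model name and version to /model.txt and /version.txt, a pipeline step built from this container can declare those paths as file_outputs so downstream steps can consume them (the full pattern is in section 2c). A hedged sketch, with placeholder arguments mirroring the docker run test above:

from kfp import dsl
from kfp.gcp import use_gcp_secret

@dsl.pipeline(name='deploy-only', description='Sketch: deploy an already-trained model')
def deploy_only(model_location='gs://<your-bucket>/babyweight/hyperparam/17'):  # placeholder
    # Same positional arguments as the docker run test above:
    # model location, model name, deployment mode.
    dsl.ContainerOp(
        name='deploycmle',
        image='gcr.io/ai-analytics-solutions/babyweight-pipeline-deploycmle:latest',
        arguments=[model_location, 'babyweight', 'local'],
        # deploy.sh writes these files, so expose them as the step's outputs
        file_outputs={'model': '/model.txt', 'version': '/version.txt'}
    ).apply(use_gcp_secret('user-gcp-sa'))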

2c. Write a Pipeline to connect the steps

from kfp import dsl
from kfp.gcp import use_gcp_secret

@dsl.pipeline(
    name='babyweight',
    description='Train Babyweight model from scratch'
)
def preprocess_train_and_deploy(
        project='ai-analytics-solutions',
        bucket='ai-analytics-solutions-kfpdemo',
        start_year='2000'
):
    """End-to-end Pipeline to train and deploy babyweight model"""
    # Step 1: create training dataset using Apache Beam on Cloud Dataflow
    preprocess = dsl.ContainerOp(
        name='preprocess',
        # image needs to be a compile-time string
        image='gcr.io/ai-analytics-solutions/babyweight-pipeline-bqtocsv:latest',
        arguments=[
            '--project', project,
            '--mode', 'cloud',
            '--bucket', bucket,
            '--start_year', start_year
        ],
        file_outputs={'bucket': '/output.txt'}
    ).apply(use_gcp_secret('user-gcp-sa'))

    # Step 2: Do hyperparameter tuning of the model on Cloud ML Engine
    hparam_train = dsl.ContainerOp(
        name='hypertrain',
        # image needs to be a compile-time string
        image='gcr.io/ai-analytics-solutions/babyweight-pipeline-hypertrain:latest',
        arguments=[
            preprocess.outputs['bucket']
        ],
        file_outputs={'jobname': '/output.txt'}
    ).apply(use_gcp_secret('user-gcp-sa'))
  • Decorate the function with `@dsl.pipeline`
  • The parameters to the function can be used to configure the run
  • Each step in my case is a ContainerOp that refers to the Docker image that we pushed to gcr.io. The image name has to be a static string.
  • You can pass arguments to the container. These will become command-line parameters to the entrypoint
  • Specify where the outputs of the step will show up
  • The outputs of step 1 (bucket) become the inputs to step 2 (preprocess.outputs['bucket']); note that you reference the producing step by name to say which step's output you need. KFP uses these references to work out the execution order, so you can consume any step's output as long as it doesn't introduce a circular dependency.
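Once the pipeline function is written, you can compile it into a package and upload that file through the ML Pipelines dashboard from Step 1, or submit it directly as shown in the next section. A minimal sketch, assuming the kfp v1 compiler:

from kfp import compiler

# Compile the decorated pipeline function into a package that the
# ML Pipelines dashboard (or kfp.Client) can run.
compiler.Compiler().compile(preprocess_train_and_deploy, 'preprocess_train_and_deploy.zip')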

2d. Execute the pipeline manually

import kfp

args = {
    'project': PROJECT,
    'bucket': BUCKET
}
client = kfp.Client(host=PIPELINES_HOST)
pipeline = client.create_run_from_pipeline_func(
    preprocess_train_and_deploy,
    args)
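The call returns as soon as the run has been submitted. If you want the notebook or script to block until the run finishes, you can wait on it; a sketch, assuming the kfp v1 client and a generous timeout:

# Block until the submitted run finishes (timeout is in seconds).
result = client.wait_for_run_completion(pipeline.run_id, timeout=7200)
print(result.run.status)   # e.g. 'Succeeded' or 'Failed'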

3a. Set up continuous integration (CI)

create_github_trigger() {
    DIR_IN_REPO=$(pwd | sed "s%${REPO_NAME}/% %g" | awk '{print $2}')
    gcloud beta builds triggers create github \
        --build-config="${DIR_IN_REPO}/cloudbuild.yaml" \
        --included-files="${DIR_IN_REPO}/**" \
        --branch-pattern="^master$" \
        --repo-name=${REPO_NAME} --repo-owner=${REPO_OWNER}
}

# Create one trigger per container directory, so a commit that touches a
# directory rebuilds only that container
for container_dir in $(ls -d */ | sed 's%/%%g'); do
    cd $container_dir
    create_github_trigger
    cd ..
done

3b. Set up continuous deployment (CD)

def handle_newfile(data, context):
    # GCS finalize events carry the object path in the 'name' field
    filename = data['name']
    mlp_babyweight.finetune_and_deploy(filename)


def finetune_and_deploy(filename):
    """invoked from a Cloud Function or a Cloud Run, it launches a Pipeline on kfp"""
    import kfp
    import sys
    import os

    if 'babyweight/preproc/train' in filename:
        PIPELINES_HOST = os.environ.get('PIPELINES_HOST', "Environment variable PIPELINES_HOST not set")
        PROJECT = os.environ.get('PROJECT', "Environment variable PROJECT not set")
        BUCKET = os.environ.get('BUCKET', "Environment variable BUCKET not set")
        print("New file {}: Launching ML pipeline on {} to finetune model in {}".format(
            filename, PIPELINES_HOST, BUCKET))
        sys.stdout.flush()
        client = kfp.Client(host=PIPELINES_HOST)
        args = {
            'project': PROJECT,
            'bucket': BUCKET,
        }
        # train_and_deploy is the fine-tuning pipeline (cf. preprocess_train_and_deploy in 2c)
        pipeline = client.create_run_from_pipeline_func(train_and_deploy, args)
        return 'Fine tuning job Launched!'
gcloud functions deploy handle_newfile --runtime python37 \
--set-env-vars PROJECT=${PROJECT},BUCKET=${BUCKET},PIPELINES_HOST=${PIPELINES_HOST},HPARAM_JOB=${HPARAM_JOB} \
--trigger-resource=${BUCKET} \
--trigger-event=google.storage.object.finalize
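Before wiring up the trigger, you can smoke-test the function by calling it with a hand-built event (the PROJECT, BUCKET, and PIPELINES_HOST environment variables still need to be set). The payload below is a hypothetical example whose 'name' matches the babyweight/preproc/train check above:

# Hypothetical GCS finalize event; only the 'name' field is used by handle_newfile.
fake_event = {'name': 'babyweight/preproc/train-00000-of-00010.csv'}
handle_newfile(fake_event, None)   # the context argument is unused here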

Next steps:

  • Try out the steps in this README.md file in GitHub
  • Read the Google Cloud solution on this topic — the GitHub repo associated with the solution gives you Terraform scripts, etc. to do this in an enterprise context.
