Running Kubeflow Pipelines on IBM Cloud Private 3.1.0

Jin Chi He
IBM Cloud
Mar 5, 2019

This article explains how to run a sample taxi-cab-classification pipeline by using Kubeflow Pipelines on IBM Cloud Private 3.1.0.

A pipeline is a description of a machine learning workflow, including all of the components in the workflow and how they combine in the form of a graph. A diagram later in this article shows a pipeline graph of the TensorFlow Extended taxi example. The pipeline includes the definition of the inputs (parameters) that are required to run the pipeline and the inputs and outputs of each component. A pipeline component is a self-contained set of user code, packaged as a Docker image, that performs one step in the pipeline. For example, a component can be responsible for data preprocessing, data transformation, model training, and so on.
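
To make these terms concrete, the following minimal sketch shows how a small two-step pipeline can be defined with the Kubeflow Pipelines Python SDK. The pipeline name, images, and arguments are hypothetical and are not part of the taxi sample:

import kfp.dsl as dsl

@dsl.pipeline(
    name='two-step-example',
    description='A hypothetical two-step pipeline.')
def two_step_pipeline(data_dir='/mnt/data'):  # a pipeline input (parameter)
    # Each ContainerOp is one component: user code packaged as a Docker image.
    preprocess = dsl.ContainerOp(
        name='preprocess',
        image='example.com/preprocess:latest',  # hypothetical image
        arguments=['--input', data_dir, '--output', '%s/clean' % data_dir])
    train = dsl.ContainerOp(
        name='train',
        image='example.com/train:latest',  # hypothetical image
        arguments=['--data', '%s/clean' % data_dir])
    train.after(preprocess)  # a parent/child edge in the pipeline graph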

The taxi-cab-classification sample runs an end-to-end pipeline with TensorFlow’s transform and model-analysis components. This example uses Kubeflow Pipelines, which is a platform for creating and managing workflows based on Docker containers. Because the original taxi-cab-classification sample relies heavily on the Google Cloud Platform, it was modified to run in the IBM Cloud Private environment.

Prerequisites

Before you begin, you need an IBM Cloud Private 3.1.0 cluster with Kubeflow Pipelines deployed, a kubectl CLI that is configured for the cluster, and an NFS server that the cluster nodes can mount.

Prepare the Python classes

In the Kubeflow Pipelines component, the Python definition file named taxi-cab-classification-pipeline.py relies on Google Cloud Storage. Replace the taxi-cab-classification-pipeline.py file with the updated taxi-cab-classification-pipeline-on-prem.py file, which uses local storage through a persistent volume instead. The taxi-cab-classification-pipeline-on-prem.py file references a persistent volume claim that’s named pipeline-pvc.
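
For reference, the following sketch shows the kind of change involved: mounting the pipeline-pvc claim into a pipeline step with the Kubernetes client objects. The image name and mount path are illustrative:

from kubernetes import client as k8s_client
import kfp.dsl as dsl

# Inside the pipeline function: mount the pipeline-pvc claim into a step so
# that the component reads and writes data on shared storage instead of GCS.
op = dsl.ContainerOp(
    name='transform',
    image='example.com/tft:latest')  # hypothetical image
op.add_volume(k8s_client.V1Volume(
    name='pipeline-nfs',
    persistent_volume_claim=k8s_client.V1PersistentVolumeClaimVolumeSource(
        claim_name='pipeline-pvc')))
op.add_volume_mount(k8s_client.V1VolumeMount(
    name='pipeline-nfs',
    mount_path='/mnt'))  # mount path is illustrative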

Compile the Kubeflow Pipelines template

Follow the steps that are provided in the Kubeflow guide Build a Pipeline to install the Kubeflow Pipelines SDK. Run the following command to compile the sample Python into a workflow specification:

# dsl-compile --py taxi-cab-classification-pipeline-on-prem.py --output pipeline-tfx-test.tar.gz

The specification takes the form of a YAML file compressed into a .tar.gz file.
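
The compilation can also be done programmatically with the SDK’s compiler, which is handy in scripts or notebooks. A minimal sketch, assuming the pipeline function can be imported as taxi_pipeline (the hyphenated file name means you may need to rename or load the module first):

import kfp.compiler as compiler

# The module and function names are assumed; adjust them to match how the
# pipeline function in taxi-cab-classification-pipeline-on-prem.py is exposed.
from pipeline_definition import taxi_pipeline

compiler.Compiler().compile(taxi_pipeline, 'pipeline-tfx-test.tar.gz')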

Upload the tar package using the Kubeflow Pipelines user interface

Open the Kubeflow Pipelines user interface and click Upload Pipeline to upload the package. The sample is displayed on the Kubeflow Pipelines web page.
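
If you prefer to script the upload, the Kubeflow Pipelines SDK client can push the same package. A minimal sketch, where the host value is a placeholder for your cluster’s ml-pipeline API endpoint:

import kfp

# Host is a placeholder; point it at the Kubeflow Pipelines API service.
client = kfp.Client(host='http://<ml-pipeline-host>:<port>')
client.upload_pipeline('pipeline-tfx-test.tar.gz')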

Create a persistent volume and persistent volume claim

Create the persistent volume and persistent volume claim by running the following commands:

kubectl create -f pv.yaml
kubectl create -f pvc.yaml

The persistent volume file definition (pv.yaml) is shown in the following example:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: kubeflow-pv1
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: ${NFS_SHARED_DIR}
    server: ${NFS_SERVER_IP}
  • NFS_SERVER_IP is the NFS server IP address. The NFS server IP address can be a management node IP address, but only if the management node supports NFS mounting.
  • NFS_SHARED_DIR is the NFS shared path that can be mounted by other nodes in the IBM Cloud Private cluster.

The persistent volume claim file definition (pvc.yaml) is shown in the following example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pipeline-pvc
  namespace: ${K8S_NAMESPACE}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  • K8S_NAMESPACE is the name of the namespace where Kubeflow is installed. By default, this is kubeflow.

Download training data

Download the training and evaluation data taxi-cab-classification from GitHub, and copy the directory to the persistent volume storage by entering the following command:

cp -r taxi-cab-classification ${NFS_SHARED_DIR}/taxi-cab-classification

Run an experiment

Complete the following steps to run an experiment:

  1. From a pipeline’s details page, click Create experiment. This takes you to the experiment’s page, where you can add a name and description for the experiment.
  2. On the experiment’s details page, click Create new run.
  3. Specify the run’s name and description.
  4. Select the pipeline that the run uses.
  5. Add the required information in the pipeline’s parameters form on the page. Some parameters might already have default values entered.
  6. Click Create to start the run. This opens the experiment’s details page.

Limitation: The value of the pvc_name parameter must match the value that is specified in the definition of the taxi-cab-classification-pipeline-on-prem.py file. See the dsl PipelineParam does not work under Image or Command issue for more information about this limitation.
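
The experiment and run can also be created with the SDK client, which makes it easy to keep pvc_name consistent with the pipeline definition. A minimal sketch, with placeholder endpoint, experiment, and run names:

import kfp

client = kfp.Client(host='http://<ml-pipeline-host>:<port>')  # placeholder endpoint
experiment = client.create_experiment('taxi-cab-on-prem')     # experiment name is arbitrary
run = client.run_pipeline(
    experiment.id,
    'taxi-run-1',                         # run name (placeholder)
    'pipeline-tfx-test.tar.gz',           # compiled package from the earlier step
    params={'pvc_name': 'pipeline-pvc'})  # must match the pipeline definition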

Check the status of the pods

You can check the pipeline status from the user interface on the experiment’s details page.

The following graphic shows the status of each component. A pipeline generally consists of a number of components, and the graph shows the steps that the run has completed so far. Arrows indicate parent/child relationships. You can view the graph after the run begins. Each node in the graph corresponds to a step in the pipeline, and each node has an icon that indicates its status.

After the pipeline run finishes, the component pods show a Completed status. The following example output shows the completed pods.

# kubectl get pod -n kubeflow|grep -i taxi
tfx-taxi-cab-classification-pipeline-example-gxlkl-1061333476 0/2 Completed 0 1d
tfx-taxi-cab-classification-pipeline-example-gxlkl-2073587121 0/2 Completed 0 1d
tfx-taxi-cab-classification-pipeline-example-gxlkl-2257762542 0/2 Completed 0 1d
tfx-taxi-cab-classification-pipeline-example-gxlkl-2791178577 0/2 Completed 0 1d
tfx-taxi-cab-classification-pipeline-example-gxlkl-28604577 0/2 Completed 0 1d
tfx-taxi-cab-classification-pipeline-example-gxlkl-3361941409 0/2 Completed 0 1d
tfx-taxi-cab-classification-pipeline-example-gxlkl-4081026581 0/2 Completed 0 1d
tfx-taxi-cab-classification-pipeline-example-gxlkl-650789232 0/2 Completed 0 1d

Ensure that the service is started by entering the following command:

# kubectl get pod -n kubeflow|grep -i taxi |grep -i running
taxi-cab-classification-model-tfx-taxi-cab-classification-dmfhh 1/1 Running

Change the service type to NodePort so that you can request a prediction from outside the cluster, as shown in the following example:

# kubectl get svc -n kubeflow |grep -i taxi
taxi-cab-classification-model-tfx-taxi-cab-classification-pipel ClusterIP 10.0.0.198 <none> 9000/TCP,8000/TCP 8h
# kubectl -n kubeflow patch service taxi-cab-classification-model-tfx-taxi-cab-classification-pipel -p '{"spec": {"type": "NodePort"}}'
# kubectl get svc taxi-cab-classification-model-tfx-taxi-cab-classification-pipel -n kubeflow
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
taxi-cab-classification-model-tfx-taxi-cab-classification-pipel NodePort 10.0.0.192 <none> 9000:32039/TCP,8000:30900/TCP 1d

View a prediction from the service

After the service is started, you can run the pipeline_taxi_client.py client script to view a prediction.

The following example shows the command and output when you view a prediction:

# python pipeline_taxi_client.py --num_examples 3 --examples_file data.csv --server $TF_SERVER:$PORT --model_name taxi-cab-classification-model-tfx-taxi-cab-classification-pipel
/Users/hejinchi/pipeline/taxi/lib/python2.7/site-packages/tensorflow_serving/apis/prediction_service_pb2.py:131: DeprecationWarning: beta_create_PredictionService_stub() method is deprecated. This method will be removed in near future versions of TF Serving. Please switch to GA gRPC API in prediction_service_pb2_grpc.
'prediction_service_pb2_grpc.', DeprecationWarning)
outputs {
  key: "class_ids"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 3
      }
      dim {
        size: 1
      }
    }
    int64_val: 0
    int64_val: 0
    int64_val: 0
  }
}
outputs {
  key: "classes"
  value {
    dtype: DT_STRING
    tensor_shape {
      dim {
        size: 3
      }
      dim {
        size: 1
      }
    }
    string_val: "0"
    string_val: "0"
    string_val: "0"
  }
}
outputs {
  key: "logistic"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 3
      }
      dim {
        size: 1
      }
    }
    float_val: 0.000172115047462
    float_val: 3.72244940081e-05
    float_val: 0.000172115047462
  }
}
outputs {
  key: "logits"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 3
      }
      dim {
        size: 1
      }
    }
    float_val: -8.66717529297
    float_val: -10.1985063553
    float_val: -8.66717529297
  }
}
outputs {
  key: "probabilities"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 3
      }
      dim {
        size: 2
      }
    }
    float_val: 0.999827861786
    float_val: 0.000172115047462
    float_val: 0.999962806702
    float_val: 3.72244940081e-05
    float_val: 0.999827861786
    float_val: 0.000172115047462
  }
}
model_spec {
  name: "taxi-cab-classification-model-tfx-taxi-cab-classification-pipel"
  version {
    value: 1548305926
  }
  signature_name: "predict"
}
  • TF_SERVER is the TF-Serving service IP address.
  • PORT is the port on which the TF-Serving service is exposed. In the example, the NodePort is 32039.
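
For reference, the following minimal sketch approximates the gRPC call that pipeline_taxi_client.py issues. The input key 'inputs' and the empty tf.Example are assumptions; the real script builds its examples from rows of data.csv:

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel('<TF_SERVER>:32039')  # NodePort from the example
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# In practice the features come from a row of data.csv; an empty Example is
# used here only to keep the sketch self-contained.
example = tf.train.Example()
serialized = [example.SerializeToString()]

request = predict_pb2.PredictRequest()
request.model_spec.name = 'taxi-cab-classification-model-tfx-taxi-cab-classification-pipel'
request.model_spec.signature_name = 'predict'
request.inputs['inputs'].CopyFrom(  # input tensor key is assumed
    tf.make_tensor_proto(serialized, shape=[len(serialized)]))
print(stub.Predict(request, 30.0))  # 30-second timeout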

Where to go next

Refer to the Kubeflow Pipelines user guide.
