Sharing GPU amongst container instances of a Python3 application which uses TensorFlow object detection and OpenCV — Part 1

Maninderjit (Mani) Bindra
5 min read · Jan 7, 2019


We were recently involved in a project where the objective of the current phase was to containerize a Python 3 application which used TensorFlow object detection and OpenCV, and then run multiple instances of this container on GPU VM(s). The areas I wanted to explore during this phase of the project were:

  • Containerizing the application so that it can be run on GPU VMs using the nvidia docker runtime
  • Setting up the build pipeline for the container images
  • And most importantly, sharing the VM GPU among multiple running instances of the application's containers

I wanted to share the issues faced, the solutions and workarounds adopted, and the lessons learned during this work. This post (part 1) covers the first two points, and part 2 covers the third.

In the next phase of this project we will be deploying the application containers to Azure Kubernetes Service (AKS); however, this is outside the scope of the current posts (parts 1 and 2).

About the Application

The pseudo code for the application is shown below. It is basically a daemon which checks for messages that need to be processed. If a message is found, the set of images whose location is indicated in the message is processed. Predictions are made on the images using trained models, and the aggregated result for the set of images is returned. The get_image_set_details function shown uses OpenCV, and the get_image_set_prediction function uses TensorFlow object detection.

# PSEUDO CODE

def get_image_set_prediction(prediction_parameter, image_details):
    model_path = ...  # get model file (.pb) path for the prediction parameter

    # Generate the prediction result using TensorFlow and the model
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            ...
            prediction_result = ...  # result of prediction
    return prediction_result

def get_image_set_details(images_location):
    image_details = ...  # get attributes for all the images; uses OpenCV among other things
    return image_details

# Main Daemon Code
while True:
    # If there are messages to process
    if len(messages) > 0:
        message = ...  # get first message; it has the image location and other details
        print("....Message received. Processing Message")
        images_location = ...  # get image location from the message
        # Get details associated with each image. The function uses OpenCV.
        image_details = get_image_set_details(images_location)
        # Get predictions on the different parameters using the
        # different ML models, TensorFlow and object detection
        data_param1 = get_image_set_prediction("param1", image_details)
        data_param2 = get_image_set_prediction("param2", image_details)
        data_param3 = get_image_set_prediction("param3", image_details)
        data_param4 = get_image_set_prediction("param4", image_details)
        # Prepare the output based on the predictions on the 4 different parameters
        prepare_generate_output(data_param1, data_param2, data_param3, data_param4)
    else:  # no messages to process
        print("....Sleeping")
        time.sleep(sleep_duration)  # configurable sleep duration

Baking in OpenCV over the base image

The base image used was tensorflow/tensorflow:1.10.0-gpu-py3. This image is a suitable base image for use with the nvidia docker runtime. To install OpenCV on top of this base image I used the Dockerfile shown at https://hub.docker.com/r/fbcotter/docker-tensorflow-opencv/dockerfile as a reference. Creating the OpenCV layers of the docker image takes a long time (close to 20 minutes in my case) during the initial build. We will see the workarounds we used to keep the application build times low when using the Azure DevOps hosted agents in the build and push pipeline section below.

Adding object detection in to the container image

To add object detection into the image I used the Dockerfile shown in https://medium.com/@sozercan/tensorflow-object-detection-on-azure-part-1-using-docker-and-deep-learning-vms-a439e711092a as a reference.

The initial issue we faced at this stage was that after the object detection layers were added, OpenCV was no longer found on the path (import cv2 failed). To resolve this, OpenCV was added to the path in the Dockerfile as shown:

ENV PYTHONPATH "$PYTHONPATH:/usr/include/opencv"

With this we have the docker container image layers in place for Python 3, TensorFlow, OpenCV and object detection.
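At this point a quick sanity check helps confirm that the layers fit together. A minimal sketch, assuming the intermediate image was tagged tf-opencv-objdetect (a hypothetical name; substitute your own tag):

```shell
# Image tag below is hypothetical; verify OpenCV and TensorFlow import correctly
docker run --rm tf-opencv-objdetect \
  python -c "import cv2, tensorflow as tf; print(cv2.__version__, tf.__version__)"

# Verify the object detection package is importable via the PYTHONPATH set above
docker run --rm tf-opencv-objdetect \
  python -c "from object_detection.utils import label_map_util; print('object detection OK')"
```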

Adding the application layer into the container image

The application layers are added using the following lines:

WORKDIR /opt/app
COPY requirements.txt /opt/app
RUN pip install --no-cache-dir -r requirements.txt
COPY . /opt/app
CMD ["python", "app.py"]

When running this container image we mounted the directory containing the trained models (used in the prediction function) as an external volume.
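As an illustration, a single instance could be started as follows. The image name and host model path are placeholders, and the nvidia docker runtime must already be installed on the GPU VM:

```shell
# Placeholder image name and host path; adjust to your environment.
# --runtime=nvidia gives the container access to the VM's GPU;
# the trained models directory is mounted as an external volume.
docker run -d \
  --runtime=nvidia \
  -v /data/models:/opt/app/models \
  myacr.azurecr.io/objdetect-app:latest
```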

The Build and Push Pipeline

Azure Pipelines are a great way to build, test and deploy applications to any cloud. It is very easy to build container images and push them to a container registry; see https://docs.microsoft.com/en-us/azure/devops/pipelines/languages/docker?view=vsts&tabs=designer for more details.

We were going to use the free hosted agents for building the container image and pushing it to Azure Container Registry (ACR).

The issue with a single Dockerfile and the free hosted agents was that each build would take more than 30 minutes, because the hosted agent does not have the base layers of the container image cached. To shorten the application build time we decided to split the image into a base image and an application image. The base image would have OpenCV and object detection baked into the tensorflow/tensorflow:1.10.0-gpu-py3 image. The build time of the base image would be around 30 minutes; however, this base image would not change very often.

The application image would be built on top of this base image. The hosted agent would download the base image each time and bake the application into it. The build time for this would only be around 5 minutes.

Let us have a look at what the base image and the application image look like.

Base Image Dockerfile:

FROM tensorflow/tensorflow:1.10.0-gpu-py3

# Install OpenCV
RUN apt-get update

# Core linux dependencies
RUN apt-get install -y \
# developer tools
build-essential \
cmake \
git \
wget \
unzip \
yasm \
pkg-config \
# image formats support
libtbb2 \
libtbb-dev \
libjpeg-dev \
libpng-dev \
libtiff-dev \
libjasper-dev \
libhdf5-dev \
# video formats support
libavcodec-dev \
libavformat-dev \
libswscale-dev \
libv4l-dev \
libxvidcore-dev \
libx264-dev
# Python dependencies
RUN pip --no-cache-dir install \
numpy \
hdf5storage \
h5py \
scipy \
py3nvml \
keras
WORKDIR /

RUN wget https://github.com/opencv/opencv_contrib/archive/3.3.0.zip \
&& unzip 3.3.0.zip \
&& rm 3.3.0.zip
RUN wget https://github.com/opencv/opencv/archive/3.3.0.zip \
&& unzip 3.3.0.zip \
&& mkdir /opencv-3.3.0/build \
&& cd /opencv-3.3.0/build \
&& cmake -DBUILD_TIFF=ON \
-DBUILD_opencv_java=OFF \
-DOPENCV_EXTRA_MODULES_PATH=/opencv_contrib-3.3.0/modules \
-DWITH_CUDA=OFF \
-DENABLE_AVX=ON \
-DWITH_OPENGL=ON \
-DWITH_OPENCL=ON \
# cannot download ippicv
-DWITH_IPP=ON \
-DWITH_TBB=ON \
-DWITH_EIGEN=ON \
-DWITH_V4L=ON \
-DBUILD_TESTS=OFF \
-DBUILD_PERF_TESTS=OFF \
-DCMAKE_BUILD_TYPE=RELEASE \
-DCMAKE_INSTALL_PREFIX=$(python -c "import sys; print(sys.prefix)") \
-DPYTHON_EXECUTABLE=$(which python) \
-DPYTHON_INCLUDE_DIR=$(python -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
-DPYTHON_PACKAGES_PATH=$(python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") .. \
&& make install \
&& rm /3.3.0.zip \
&& rm -r /opencv-3.3.0 \
&& ldconfig
# Install Object Detection
WORKDIR /usr/local/lib/python3.5/dist-packages/tensorflow
# RUN git clone https://github.com/tensorflow/models
RUN git clone https://github.com/sozercan/models
WORKDIR /usr/local/lib/python3.5/dist-packages/tensorflow/models/research

ENV PYTHONPATH "$PYTHONPATH:/usr/local/lib/python3.5/dist-packages/tensorflow/models/research:/usr/local/lib/python3.5/dist-packages/tensorflow/models/research/slim:/usr/include/opencv"
ENV PYTHON_HOME "$PYTHON_HOME:/usr/local/lib/python3.5/dist-packages/tensorflow/models/research:/tensorflow/models/research/slim"
RUN curl -L -o /protoc-3.3.0-linux-x86_64.zip https://github.com/google/protobuf/releases/download/v3.3.0/protoc-3.3.0-linux-x86_64.zip \
&& unzip /protoc-3.3.0-linux-x86_64.zip \
&& rm /protoc-3.3.0-linux-x86_64.zip \
&& ./bin/protoc object_detection/protos/*.proto --python_out=. \
&& pip install Pillow lxml

Application Dockerfile:

FROM <baseimage:latest>

ENV PYTHONPATH "$PYTHONPATH:/usr/local/lib/python3.5/dist-packages/tensorflow/models/research:/usr/local/lib/python3.5/dist-packages/tensorflow/models/research/slim:/usr/include/opencv"
ENV PYTHON_HOME "$PYTHON_HOME:/usr/local/lib/python3.5/dist-packages/tensorflow/models/research:/tensorflow/models/research/slim"

ARG APP_ENV=production
ENV APP_ENV $APP_ENV

RUN useradd --user-group --create-home --shell /bin/false app \
&& mkdir -p /opt/app

WORKDIR /opt/app
COPY requirements.txt /opt/app
RUN pip install --no-cache-dir -r requirements.txt
RUN chown -R app:app /opt/app && chgrp -R app /opt/app

USER app
COPY . /opt/app
CMD ["python", "app.py"]
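The two Dockerfiles above can be wired together with plain docker commands. A sketch, assuming hypothetical file names (Dockerfile.base, Dockerfile) and an ACR registry named myacr.azurecr.io:

```shell
# Build and push the slow-changing base image (done rarely, ~30 minute build)
docker build -f Dockerfile.base -t myacr.azurecr.io/tf-opencv-objdetect-base:1.0 .
docker push myacr.azurecr.io/tf-opencv-objdetect-base:1.0

# Build and push the application image on top of it (every commit, ~5 minute build)
docker build -f Dockerfile -t myacr.azurecr.io/objdetect-app:latest .
docker push myacr.azurecr.io/objdetect-app:latest
```

The FROM <baseimage:latest> placeholder in the application Dockerfile would then point at the pushed base image tag.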

Sharing GPUs amongst multiple container instances of this application

Part 2 of this post discusses the issues faced and the solutions/workarounds adopted to enable sharing of GPUs amongst multiple container instances of this application.

