Using Docker Images for Cloud Native Artifacts

Daniel Tan
AI2 Labs
Aug 16, 2020

Continuing from my previous article on Kubernetes, this article focuses on how we can use Docker images to host artifacts in the form of files.

Why?

Typically, most software applications blur the line between code and assets: when scaling up, we simply duplicate whatever bundle of the application we have. However, this is not ideal for applications dealing with artifacts that have their own independent development and release lifecycle, such as AI/ML models. It is especially painful when these files are heavy and resources are strapped, for example when scaling GPU-based solutions, where the price of autoscaling recklessly is prohibitive.

Besides handling resource constraints and independent lifecycles, packaging artifacts this way lets us fold the artifact build into the CI/CD pipelines we already have for our Docker-based software, reaping the same benefits of automation.

This is not a radical idea; it has been explored by Microsoft with their open-source tool ORAS. However, we can also do it with plain Docker.
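For reference, pushing a file to a registry with ORAS looks roughly like the following sketch (the registry, repository, and media type here are placeholders, and exact flags vary between ORAS versions):

# push a model file to an OCI registry as an artifact
oras push registry.example.com/models/densenet:v1 ./densenet_model.pb:application/octet-stream

# pull it back down somewhere else
oras pull registry.example.com/models/densenet:v1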

How?

There are two steps to this. Packaging and deployment.

Packaging

The Dockerfile for this is slightly different from the usual one. While the final artifacts can be merged into one image, you should copy them over using a multi-stage build so that you can take advantage of layer caching. This reduces the amount of repeated copying when the image is rebuilt, since unchanged files hit the cache.

This way, your artifacts can be generated using whatever pipeline your team has already been using.

# each model gets its own build stage so its COPY layer is cached independently
FROM alpine:latest AS model1
COPY ./densenet_model.pb .

FROM alpine:latest AS model2
COPY ./densenet_model_2.pb .

# final image: collect all artifacts under /data
FROM alpine:latest
WORKDIR /data
COPY --from=model1 /densenet_model.pb .
COPY --from=model2 /densenet_model_2.pb .

I also use GitLab CI/CD, because the GitLab Runner can be installed on the specific machine your team has been using to generate its models.

In this case, all you need to do in your Docker GitLab Runner is change the following in config.toml:

...
[[runners]]
  ...
  [runners.docker]
    ...
    privileged = true
    ...
    volumes = ["/cache", "<path in your host>:<path in your container>:ro"]

And in your .gitlab-ci.yml you can access the artifacts from the container path specified in config.toml.
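For example, a minimal .gitlab-ci.yml job might look like the sketch below. The job and image names are placeholders, the CI_REGISTRY_* variables are GitLab's predefined ones, and it assumes your runner is allowed to run docker commands:

build-artifact-image:
  stage: build
  script:
    # the model files come from the read-only host volume mounted via config.toml
    - cp <path in your container>/densenet_model.pb .
    - cp <path in your container>/densenet_model_2.pb .
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE/models:latest .
    - docker push $CI_REGISTRY_IMAGE/models:latest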

Once you’ve built and pushed your Docker image, your artifacts are in the cloud.
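As a quick sanity check, you can pull the image on another machine and list its contents (the image name is a placeholder):

# the final stage put the files under /data
docker pull registry.example.com/models:latest
docker run --rm registry.example.com/models:latest ls /data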

Deployment

Docker-only

Now, if you’re deploying with Docker alone, you can run it like so:

docker volume create shared-volume
# the first mount populates the empty volume with the image's files at <path in container>
docker run -v shared-volume:<path in container> artefact-image
docker run -v shared-volume:<path in container>:ro app-image

When the first container mounts the empty named volume, Docker copies the image’s files at the mount path into the volume, so any container that mounts the volume afterwards shares the artifacts, which is pretty neat.
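If you use Docker Compose, the same pattern might look like this sketch (the /data path follows the Dockerfile above; service and image names are placeholders):

version: "3.8"
services:
  artifacts:
    image: artefact-image
    volumes:
      # Docker copies the image's /data contents into the named volume on first mount
      - shared-volume:/data
  app:
    image: app-image
    volumes:
      - shared-volume:/data:ro
volumes:
  shared-volume: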

Kubernetes (K8s)

K8s doesn’t really recommend host storage. However, we want to use it so that we can take advantage of the Linux page cache: applications on the same node reading the same file share the same cached pages instead of each container holding its own copy. This avoids the situation where multiple containers using the same file blow up the available memory when scaling up.

We can still do it. I will only show the example YAML files here; applying them is a simple matter of kubectl apply -f <file name>.yaml.

First of all, we need to create the K8s PersistentVolume and PersistentVolumeClaim. Since this is host storage, you should create one on each host.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadOnlyMany
  hostPath:
    path: "/tmp/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi

Then we create a DaemonSet, which is basically a deployment that runs one copy of the pod on each machine in the cluster. This makes sure the files are copied onto every machine, including machines added to the cluster later.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example
spec:
  selector:
    matchLabels:
      name: example
  template:
    metadata:
      labels:
        name: example
    spec:
      volumes:
        - name: example
          hostPath:
            path: <path on host>
            type: Directory
      containers:
        - name: first
          image: <artefact-image>
          volumeMounts:
            - name: example
              mountPath: <path on container>
          command: ["/bin/sh", "-c"]
          args:
            # copy the packaged files from the image onto the host-backed mount,
            # then keep the container alive so the DaemonSet doesn't restart it in a loop
            - "mkdir -p <sub path on host>; cp <your files> <sub path on host>; tail -f /dev/null"

Note that, unlike Docker named volumes, a hostPath mount in K8s is one-way: the host directory is mounted into the container and hides whatever the image had at that path, so nothing flows back to the host automatically. That is why the container uses cp to populate the host instead.

Now you can use it in your pods like in the following example:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  volumes:
    - name: example-storage
      persistentVolumeClaim:
        claimName: example-pvc
  containers:
    - name: example
      image: <your image>
      volumeMounts:
        - mountPath: <path in container>
          name: example-storage
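To check that the artifacts are actually visible inside the pod, you can list the mount path (using the pod name from the example above):

kubectl exec example -- ls <path in container>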

And there you have it: “artifact sharing” using Docker images.
