My Slow Internet vs Docker

I work in New York City and live in a micro-studio. The apartment offers Wi-Fi, but it has very low bandwidth and is often unreliable. I also travel frequently and spend a lot of time on trains, on airplanes, and in hotels, which often have slow and unreliable Internet access as well.

I build Docker images frequently — for work, for demos, and for Cloud Spin!

I’ve learned that downloading and pushing Docker images over low-bandwidth connections is unpleasant.

Docker Machine to the Rescue

Rather than building and pushing Docker images locally with boot2docker, I’ve been using Docker Machine on Google Compute Engine for the past couple of months. It’s worked wonderfully in a number of situations — in my apartment, in hotels, on trains, and even on flights.

There are several benefits to using Docker Machine:

  • Fast network — fast container image downloads and fast pushing to container repositories such as Docker Hub
  • Access to private Google Container Registry when you host Docker Machine on Google Compute Engine (even faster image downloads and pushes)
  • Container images are stored on the Docker Machine instance rather than your local desktop/laptop
  • I can free up my laptop resources (including disk, CPU, and even bandwidth)

How?

Setting up Docker Machine on Google Compute Engine is simple!

  1. Sign up to use Google Cloud Platform and create a new project (remember the Project ID)
  2. Install Docker command line tools
  3. Install Docker Machine

The process for provisioning a Docker Machine instance on Google Compute Engine can be seen in this animated GIF (time compressed):

See the original tweet.

Here are the detailed instructions (updated August 30, 2017, thanks to Gustav Maskowitz):

  • If you are on a Mac, the simplest way to install Docker Machine is with Homebrew.
$ brew install docker docker-machine
  • To use Docker Machine with Google Cloud Platform, install Google Cloud SDK. It’s a CLI tool for Google Cloud Platform.
  • Authenticate your account using Google Cloud SDK. You may need both the regular login and the application-default login:
$ gcloud auth login
$ gcloud auth application-default login
  • Create a Docker Machine instance (in this example, the instance is named “docker”). You can determine a suitable machine type for your use case, but I recommend one with at least two CPUs.
$ docker-machine create docker -d google --google-project=PROJECT_ID --google-machine-type n1-highcpu-4 --google-open-port 22
  • The command will prompt you to authorize Docker Machine to access your Google Cloud Platform project.
  • Once authorized, copy and paste the temporary authorization code back into the command line prompt:
Enter code: [PASTE YOUR TEMPORARY CODE HERE]
  • Docker Machine will create a virtual machine instance in Google Compute Engine, and set up all of the necessary components and keys to access the Docker Machine instance securely.
  • Finally, you need to set the environment variables to tell Docker to use the newly created Docker Machine instance:
$ eval $(docker-machine env docker)
  • Consider adding this line to ~/.bash_profile or your choice of shell initialization script to make the Docker Machine the default choice.
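
One caveat with putting the eval line in a startup script: if the instance is stopped, or docker-machine is not installed on that computer, every new shell will print an error. Here is a minimal sketch of a guarded version, assuming the machine is named “docker” as above. The function name use_docker_machine is made up for this example.

```shell
# Point the Docker CLI at a Docker Machine instance, but only when that is
# possible. The function name is hypothetical, not part of Docker Machine.
use_docker_machine() {
  dm_name="${1:-docker}"

  # Do nothing on hosts where docker-machine is not installed.
  command -v docker-machine >/dev/null 2>&1 || return 0

  # `docker-machine status` prints the state, e.g. "Running" or "Stopped".
  if [ "$(docker-machine status "$dm_name" 2>/dev/null)" = "Running" ]; then
    eval "$(docker-machine env "$dm_name")"
  fi
}

use_docker_machine docker
```

With this in ~/.bash_profile, a stopped instance simply leaves the Docker environment variables untouched instead of failing on every login.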

There are a couple of things you can do to validate that your Docker Machine instance is running:

You should be able to list it:

$ docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM
docker * google Running tcp://...

You should be able to SSH into the machine:

$ docker-machine ssh docker
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.16.0-30-generic x86_64)
...
docker-user@docker:~$

And of course, Docker should be able to connect to the Docker Machine instance:

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

Finally, the ultimate test is to run a container in the Docker Machine instance:

$ docker run busybox echo hello world
Unable to find image 'busybox:latest' locally
latest: Pulling from busybox
cf2616975b4a: Pull complete
6ce2e90b0bc7: Pull complete
8c2e06607696: Already exists
busybox:latest: The image you are pulling has been verified. Important: image verification is a tech preview feature and should not be relied on to provide security.
Digest: sha256:38a203e1986cf79639cfb9b2e1d6e773de84002feea2d4eb006b52004ee8502d
Status: Downloaded newer image for busybox:latest
hello world
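
The checks above can be rolled into a single function for when you want a quick yes/no answer, for example after waking your laptop on a new network. This is only a sketch: the function name check_docker_machine is my own, and it assumes that `docker run` writes image pull progress to stderr so that stdout contains only the container’s output.

```shell
# Run the validation steps in one go and stop at the first failure.
# The function name is hypothetical; "docker" is the machine name used above.
check_docker_machine() {
  dm_name="${1:-docker}"

  # The machine must exist and be running.
  state="$(docker-machine status "$dm_name")" || return 1
  if [ "$state" != "Running" ]; then
    echo "machine '$dm_name' is $state, not Running" >&2
    return 1
  fi

  # The Docker client must be able to reach the remote daemon.
  docker ps >/dev/null || return 1

  # Run a throwaway container; --rm removes it when it exits.
  output="$(docker run --rm busybox echo hello world)" || return 1
  [ "$output" = "hello world" ]
}
```

Usage:

$ check_docker_machine docker && echo "ready"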

Tips and Tricks

I learned a few things while using Docker Machine over low-bandwidth Internet connections that may save you some trouble too!

Download Everything from the Dockerfile

ADD/COPY directives will copy files from your local machine to the Docker Machine instance over the network. Rather than using the ADD/COPY directive to copy large files into your Docker image, try to download as much as possible from the Internet, including:

  • Code (use `git clone`, for example)
  • Binaries (use wget/curl to download large binaries)

I do copy the source code I’m currently working on into Docker Machine in order to test the container with the new code quickly. But I try to download as many dependencies (source code and binaries) as possible from the Internet.

If what you need is not available over the Internet, try storing it in Google Cloud Storage, and download from there.
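
As a rough sketch of this tip, the function below writes a Dockerfile that fetches everything from inside the build instead of using ADD/COPY. It is written as a shell function so it can be pasted straight into a terminal. The base image, the repository URL, and the binary URL are all placeholders I picked for illustration, and the function name is hypothetical.

```shell
# Write a Dockerfile into the given directory that downloads its inputs
# during the build rather than copying them from the local machine.
# All URLs below are placeholders; replace them with your own.
write_download_dockerfile() {
  target_dir="$1"
  mkdir -p "$target_dir" || return 1
  cat > "$target_dir/Dockerfile" <<'EOF'
FROM debian:stable-slim
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl git \
 && rm -rf /var/lib/apt/lists/*
# Both downloads run on the Docker Machine instance, so they use its
# network connection instead of the laptop's.
RUN git clone --depth 1 https://example.com/your/project.git /src
RUN curl -fsSL -o /tmp/large-binary.tar.gz https://example.com/large-binary.tar.gz
EOF
}
```

Usage:

$ write_download_dockerfile ./download-demo
$ docker build -t download-demo ./download-demo

The build context here is a single small Dockerfile, so almost nothing has to travel over the slow link.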

Use ONBUILD

Use the ONBUILD directive to download any build dependencies and to compile your source. For example, if you build Java applications with Maven, check out the maven:onbuild base image. It downloads all the Maven dependencies needed to build your project. When used with Docker Machine, the dependencies are downloaded by the Docker Machine instance, which may have a faster Internet connection.

Updated June 16: ONBUILD could result in very large images that contain both the source code and the compiled binary. Use a multi-stage build instead, see below!

Use Multi-Stage Build

Starting with Docker 17.05, you can use multi-stage builds. This allows you to have a build stage that fetches the source and compiles the code, and then copy only the artifacts into a final runtime image. An example for a Java build:

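# Build stage: compile and package the application with Maven
FROM maven:3.5-jdk-8 as BUILD
COPY . /src
RUN mvn -f /src/pom.xml package

# Runtime stage: copy only the build artifacts into the final image
FROM openjdk:8
COPY --from=BUILD /src/target/lib /app/lib
COPY --from=BUILD /src/target/artifact.jar /app/artifact.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/app/artifact.jar"]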

Use Compression

You can compress the Docker context when it’s being sent to the remote Docker daemon:

$ docker build --compress ...
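
To get a feel for how much compression might save before sending a context over a slow link, you can compare the size of the directory as a plain tar stream and as a gzip-compressed one. This is only an approximation: the real build context also honors .dockerignore, which this sketch does not. The function names are my own.

```shell
# Size in bytes of the directory packed as a plain tar stream.
context_bytes() {
  tar -C "$1" -cf - . | wc -c | tr -d ' '
}

# Size in bytes of the same stream after gzip compression.
context_bytes_gz() {
  tar -C "$1" -czf - . | wc -c | tr -d ' '
}
```

Usage:

$ context_bytes .
$ context_bytes_gz .

If the two numbers are close (for example, when the directory is mostly JARs or images that are already compressed), --compress will not help much.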

Where Did That Volume Mount?

When you mount a volume into a container with Docker Machine, the volume is mounted from a directory on the virtual machine itself, not from your local desktop/laptop.

$ docker run -ti -v /myfiles:/data busybox

This will bind the virtual machine’s /myfiles directory into the container’s /data directory.

That also means you’ll need to copy whatever files you wish to share with the container into the Docker Machine instance:

  • First, SSH into the Docker Machine instance to create the directory structure
$ docker-machine ssh docker
...
Last login: Thu Jul 16 03:18:20 2015 from ...
docker-user@docker:~$ sudo mkdir -p /myfiles
  • Then use the `docker-machine scp` command:
$ docker-machine scp somefile docker:/myfiles
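
The two steps can be combined into one helper. This is a sketch under a couple of assumptions: it passes the remote command directly to `docker-machine ssh` rather than opening an interactive session, and it adds a chown so that the SSH user can write to a directory that was created with sudo. That chown is my addition, not part of the steps above, and the function name is hypothetical.

```shell
# Create a directory on the Docker Machine instance and copy files into it.
# Usage: push_to_machine MACHINE REMOTE_DIR FILE...
push_to_machine() {
  dm_name="$1"
  remote_dir="$2"
  shift 2

  # Create the directory remotely and hand it to the SSH user so that
  # the scp step below is allowed to write into it.
  docker-machine ssh "$dm_name" \
    "sudo mkdir -p '$remote_dir' && sudo chown \"\$(id -un)\" '$remote_dir'" \
    || return 1

  # -r lets the same call handle both files and directories.
  for path in "$@"; do
    docker-machine scp -r "$path" "$dm_name:$remote_dir/" || return 1
  done
}
```

Usage:

$ push_to_machine docker /myfiles somefile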

Downloading Outputs from Container

Sometimes, your container may produce outputs, such as a binary build, or an image generated from a Deep Dream container. You can get the data out in a couple of ways:

Use `docker cp` (this works with both running containers and exited containers):

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0dc6f3c9c399 deepdream:latest "/bin/bash" 2 days ago Up 2 days 8888/tcp nostalgic_almeida
$ docker cp 0dc6f3c9c399:/deepdream/deepdream/frames/0001.jpg .

If you mounted a volume to store the output, use `docker-machine scp`:

$ docker run -ti -v /output:/output mycontainer
$ docker-machine scp docker:/output/myfile .

One of my favorite techniques is to simply pipe the output directly to STDOUT. For this, check out my oauth2util written in Go. It can use a container to build the binaries for multiple platforms.

The build script does two things:

  • In the container, tar up all the build artifacts and pipe them to STDOUT:
tar -czf - oauth2util-*
  • Outside of the container, untar the output:
tar -xzf -

Putting them together:

docker run -e GOOS=$GOOS -e GOARCH=$GOARCH oauth2util /bin/bash -c "go build -o oauth2util-$GOOS-$GOARCH$SUFFIX 1>&2 && tar -czf - oauth2util-*" | tar -xzf -
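
If you want to try the pattern without Docker first, here is a small local sketch. The function emit_artifacts stands in for the container side and receive_artifacts for the host side. Both names are made up for this example. The important detail is the same as in the real command: anything that is not part of the archive (build logs, progress messages) goes to stderr, so stdout carries nothing but the tar stream.

```shell
# Stand-in for the container side: log to stderr, archive to stdout.
emit_artifacts() {
  (
    cd "$1" || exit 1
    echo "packing artifacts from $1" 1>&2
    tar -czf - oauth2util-*
  )
}

# Stand-in for the host side: unpack whatever arrives on stdin.
receive_artifacts() {
  mkdir -p "$1" && tar -xzf - -C "$1"
}
```

Usage:

$ emit_artifacts ./build | receive_artifacts ./out

When you swap the first half for a real `docker run`, leave out the -t flag. A pseudo-terminal rewrites line endings and would corrupt the binary stream.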

Lastly, you can also pipe input into STDIN of the Docker container process rather than copying the input into the filesystem:

$ cat myimage.jpg | docker run -i mycontainer > output.jpg

Exposing Ports

If you need to bind a container port onto the host, it will bind to the Docker Machine instance and not your local machine. So, you’ll need to configure firewall rules to allow access to that port. For example, if you bind a port to host port 8888:

$ docker run -p 8888:80 nginx

… then you’ll need to configure the firewall. First, log in to the Google Developer Console, then:

  • Navigate to Compute Engine > Networks
  • Click default network
  • Click Add firewall rule
  • Allow from Any Source, and Allow the protocol and port (e.g., tcp:8888)
  • Click Create

Finally, find out the Docker Machine instance IP address:

$ docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM
docker * google Running tcp://xxx.xxx.xxx.xxx:2376

And navigate to http://xxx.xxx.xxx.xxx:8888/ to access the port.
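
If you prefer the command line, the same thing can be sketched with the Google Cloud SDK installed earlier. The assumptions here: the instance lives on the default network, gcloud is already configured with your project, and the rule name allow-tcp-PORT is just a name I chose. Like the console steps above, this opens the port to any source, so only use it for ports you really want public.

```shell
# Create a firewall rule on the default network that allows a TCP port
# from any source. The rule name is an arbitrary choice for this sketch.
open_port() {
  port="$1"
  gcloud compute firewall-rules create "allow-tcp-$port" \
    --network default \
    --allow "tcp:$port" \
    --source-ranges 0.0.0.0/0
}

# Print the URL for a port on the Docker Machine instance.
# Usage: machine_url PORT [MACHINE]
machine_url() {
  ip="$(docker-machine ip "${2:-docker}")" || return 1
  printf 'http://%s:%s/\n' "$ip" "$1"
}
```

Usage:

$ open_port 8888
$ machine_url 8888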

Stop/Start a Docker Machine

When you are done using a Docker Machine, you can turn it off to reduce the cost:

$ docker-machine stop docker

And start it again when you need it:

$ docker-machine start docker