Raymond Ferguson
Aug 10, 2018

Turtles All the Way Down: Snap(LXD(Docker))


The state of container technology has evolved considerably since Stéphane Graber’s article regarding Docker in LXD over two years ago. That article was a good introduction to the topic, but it is now dated. This article updates and expands on the topic with experience gleaned from the current stable version of LXD on the current Ubuntu LTS as of August 2018.

The walkthrough introduces cloud-init via an LXD profile for automatic provisioning of unprivileged docker service instances, and provides a known-working storage configuration for running the Docker daemon inside an unprivileged LXD guest.

We’ll also introduce the snap application container, which we use to install LXD; tracking current/stable LXD versions through the snap is the recommended approach as of LXD 3.0.

See Turtles on GitHub for companion files, Devendor Tech for the prettiest formatting, and Medium for related discussion.

Summary

The target configurations itemized below are known to work, but the results should carry over to other distributions that support snap packages and run a modern Linux kernel on the host OS.

Storage pools are more finicky when running the Docker service in an unprivileged/userns-confined container. Any block-based storage should work, including Ceph or direct device delegation, when combined with the overlay driver for docker in the guest.

If you choose xfs behind an overlayfs docker filesystem, you’ll need to ensure d_type support is enabled. ext4 supports d_type by default.
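
A quick way to check is sketched below; /var/lib/docker is assumed to be the mount point backing Docker’s graph storage, and /dev/sdXN is a placeholder device.

.. code-block:: bash

    # check whether an existing xfs filesystem has d_type enabled (ftype=1)
    xfs_info /var/lib/docker | grep ftype
    # when creating a new xfs filesystem, enable it explicitly
    sudo mkfs.xfs -n ftype=1 /dev/sdXN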

Another known working option is using a btrfs storage pool in LXD and the btrfs storage driver in Docker for graph storage.

Overlay2 is known to work in a privileged LXD guest, but it fails to unpack some Alpine-based images, such as memcached:alpine, when running Docker/overlay2 in an unprivileged LXD guest. The error is thrown by tar from within the container and seems to be due to an interaction between the busybox implementation of tar and the overlay2 driver when unpacking layers on top of Alpine. Strangely, if you pull the image while running privileged, then stop the LXD guest, switch it to unprivileged, and continue, you can still use these images within the unprivileged guest.

Combining btrfs for docker graph storage with LVM-backed pass-through persistent volumes might give you the best of both: container-optimized graph storage and stable, high-performance persistent storage for your apps.

This approach should work on any combination of private cloud, public cloud, and hardware, allowing deeper continuous deployment and automation and further decoupling solutions from platforms.

LVM Pool with Overlay Docker Graph Storage

Target Host Configuration

Target Guest Configuration

Walkthrough

This example starts on a Google Compute Engine instance with the OS on /dev/sda and an additional empty disk on /dev/sdb.

Remove the default lxd daemon.
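
Something along these lines should work on Ubuntu, assuming the deb-packaged lxd and lxd-client are what is installed:

.. code-block:: bash

    # remove the distribution lxd packages so the snap can take over
    sudo apt-get remove --purge -y lxd lxd-client
    sudo apt-get autoremove -y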

Install lxd and thin provisioning tools.
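
A minimal sketch; thin-provisioning-tools provides the userspace pieces for the LVM thin pool backend:

.. code-block:: bash

    sudo snap install lxd
    sudo apt-get install -y thin-provisioning-tools
    # optional: let your user talk to the daemon without sudo
    sudo usermod -aG lxd "$USER"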

Configure LXD. This shows the dialogue-based lxd init method of configuring your lxd instance.
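
The dialogue itself isn’t reproduced here; the sketch below only notes the answers this walkthrough assumes (LVM backend on the empty /dev/sdb, defaults elsewhere):

.. code-block:: bash

    sudo lxd init
    # answers assumed in this walkthrough:
    #   - clustering: no
    #   - new storage pool: yes, name "default", backend "lvm"
    #   - use an existing empty block device: yes, /dev/sdb
    #   - networking: create a new bridge, lxdbr0, with the suggested defaults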

Create the cloud-init profile for our nested docker daemon. Note that we’ll use the sparse example on git, plus the default profile, which adds a root disk and a nic on our default storage pool and network.
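
The full profile lives in the companion repo; the fragment below is only a minimal sketch of the idea, with security.nesting enabled so docker can run inside the unprivileged guest and a tiny cloud-config that just installs docker.io. The profile name is an assumption.

.. code-block:: bash

    cat > docker-profile.yaml <<'EOF'
    config:
      security.nesting: "true"
      user.user-data: |
        #cloud-config
        package_update: true
        packages:
          - docker.io
    description: cloud-init provisioned docker host
    devices: {}
    EOF
    lxc profile create docker
    lxc profile edit docker < docker-profile.yaml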

Pull the ubuntu bionic lxd image. Note that ‘b’ is just an alias for ubuntu-bionic.
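
For example (the alias is just a local convenience):

.. code-block:: bash

    # copy the bionic image from the public ubuntu remote and alias it locally as "b"
    lxc image copy ubuntu:18.04 local: --alias b
    lxc image list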

Now we can simply launch a new instance and watch it build. Note that the first time you use the new image, container creation is slow; this is due to loading the image onto an LVM sparse volume. Subsequent containers start from a snapshot and initialize much faster.
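
A sketch of the launch, assuming the profile and alias names used above; the cloud-init output log is a convenient place to watch the build:

.. code-block:: bash

    # apply the default profile (root disk + nic) plus our docker profile
    lxc launch b docker1 -p default -p docker
    # watch cloud-init provision the guest
    lxc exec docker1 -- tail -f /var/log/cloud-init-output.log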

BTRFS LXD Pool with BTRFS Docker Graph Storage

Target Host Configuration

Target Guest Configuration

Walkthrough

For this example, I’ve partitioned sdb and will use sdb1 to back my btrfs storage pool, then add an additional LVM storage pool on sdb2 for passthrough persistent volumes.

Listing the partitions for reference.
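
For example:

.. code-block:: bash

    # sdb1 will back the btrfs pool, sdb2 the lvm pool for persistent volumes
    lsblk -o NAME,SIZE,TYPE,FSTYPE /dev/sda /dev/sdb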

Install lxd and thin provisioning tools as we did above.

Configure LXD. This shows the dialogue-based lxd init method of configuring your lxd instance. Note that we select btrfs and /dev/sdb1 in this example.
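
As before, only the assumed answers are sketched here:

.. code-block:: bash

    sudo lxd init
    # answers assumed in this walkthrough:
    #   - new storage pool: yes, name "default", backend "btrfs"
    #   - use an existing empty block device: yes, /dev/sdb1
    #   - networking: defaults, as in the lvm example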

Add the lvm pool for persistent storage.
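
The pool name below is an assumption; pick whatever fits your conventions:

.. code-block:: bash

    # a second pool, lvm-backed, for pass-through persistent volumes
    lxc storage create persist lvm source=/dev/sdb2
    lxc storage list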

Create and load our profile again.
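
Reusing the docker-profile.yaml sketch from the LVM walkthrough:

.. code-block:: bash

    lxc profile create docker
    lxc profile edit docker < docker-profile.yaml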

At this point you can pull in the lxd guest image and launch a docker instance with the same steps we used above, and the root filesystem of your guest will be on btrfs, with docker keeping its containers on btrfs as well.

Enter the lxd guest and verify the results.
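
For example, from the host (container name assumed from the earlier launch):

.. code-block:: bash

    # confirm docker is using the btrfs storage driver inside the guest
    lxc exec docker1 -- docker info | grep -i 'storage driver'
    # and that the guest root filesystem is btrfs
    lxc exec docker1 -- df -hT /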

Build a docker container and verify the results:
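
A trivial build is enough to exercise the graph storage; the image tag and names below are placeholders, and the commands run inside the guest:

.. code-block:: bash

    # inside the guest: lxc exec docker1 -- bash
    mkdir -p /tmp/hello && cd /tmp/hello
    printf 'FROM alpine:3.8\nCMD ["echo", "hello from docker inside lxd"]\n' > Dockerfile
    docker build -t hello-nested .
    docker run --rm hello-nested
    docker system df    # layer usage on the btrfs-backed graph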

Working with the container

The examples below start with the btrfs docker guest setup in the steps above.

LXD Proxy Devices

LXD proxy devices allow you to expose container connections through the host OS. The example below shows the protocol translation feature by forwarding from a unix socket on the host to a tcp socket in the container.
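
A sketch of such a device, assuming a memcached:alpine container is listening on 127.0.0.1:11211 inside the guest; the socket path is arbitrary, and unix-to-tcp translation requires an LXD release with proxy protocol translation:

.. code-block:: bash

    lxc config device add docker1 memcache-sock proxy \
        listen=unix:/var/run/memcached-proxy.sock \
        connect=tcp:127.0.0.1:11211 \
        bind=host
    # quick test from the host (openbsd netcat)
    printf 'stats\r\nquit\r\n' | nc -U /var/run/memcached-proxy.sock | head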

DNS Resolution

By default, lxd guests are added to a dnsmasq nameserver listening on your lxdbr0 interface. The steps below just tell the local resolver to use the dnsmasq instance for resolution.
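
On Ubuntu 18.04 with systemd-resolved, something like the following works; note these settings don’t persist across reboots:

.. code-block:: bash

    # point the resolver for lxdbr0 at LXD's dnsmasq and use the .lxd search domain
    LXD_DNS="$(lxc network get lxdbr0 ipv4.address | cut -d/ -f1)"
    sudo systemd-resolve --interface lxdbr0 --set-dns "$LXD_DNS" --set-domain lxd
    # container names now resolve with the .lxd suffix
    ping -c1 docker1.lxd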

Using persistent lxd data volumes

The myData volume created below persists even when we delete the LXD container it’s attached to and can be used to persist data on ephemeral LXD guests or even ephemeral Docker guests in ephemeral LXD guests.
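
A sketch using the lvm pool created earlier; the pool name and mount path are assumptions:

.. code-block:: bash

    # create a custom volume and attach it to the guest at /srv/data
    lxc storage volume create persist myData
    lxc storage volume attach persist myData docker1 /srv/data
    # the volume outlives any container it was attached to
    lxc storage volume list persist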

You can also pass block devices or bind mounts into the container directly.
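
For example (device names and paths below are placeholders):

.. code-block:: bash

    # bind-mount a host directory into the guest
    lxc config device add docker1 hostdata disk source=/srv/host-data path=/srv/host-data
    # or hand a raw block device node to the guest
    lxc config device add docker1 sdb3 unix-block path=/dev/sdb3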

Exploring the namespaces

Direct namespace exploration and manipulation is extremely useful but seldom covered, as it falls outside the envelope of the container systems built on top of kernel namespaces.

Note that the COMMAND and PID columns in lsns output just show the lowest PID in each namespace and don’t indicate where the namespace was created.
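
For example (the PID is a placeholder):

.. code-block:: bash

    # list mount and pid namespaces visible on the host
    sudo lsns -t mnt -t pid
    # join the mount namespace of a given pid and look around
    sudo nsenter --target <PID> --mount --pid ls /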

Snap namespaces

The lxd application is running in its own mount namespace within snap.

The mount namespace used by the LXD snap is 4026532209. We can view all five processes in that namespace with a few extra output fields on ps.
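
For example (the namespace id below comes from this particular host; mntns is a procps output field):

.. code-block:: bash

    # find the lxd mount namespace, then list the processes in it
    sudo lsns -t mnt | grep -i lxd
    sudo ps -eo pid,ppid,mntns,args | awk '$3 == 4026532209'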

The snap container uses the squashfs snap-core image as its rootfs. This corresponds to /snap/core/4917 outside of the mount namespace, and the host filesystem is relocated to /var/lib/snapd/hostfs with pivot_root.

Snap and LVM Thinpools

Note

TODO: Figure out interaction between lvm_thinpool autoextend and snap mountns.

One of the strange side effects of burying your LVM storage pool behind a mount namespace is that monitoring the pool is less straightforward. LVM events don’t seem to propagate through to the host namespace where dmeventd is running.

I haven’t done the work to examine how this would affect dmeventd and automatic extension of thin pools, but this detail is essential if you intend to oversubscribe thin pools with the expectation that automatic extension will kick in. Failure to extend a full thinpool can result in corruption.
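
Until that interaction is understood, manually checking pool usage from inside the snap’s mount namespace is one reasonable stopgap. A sketch, assuming the first process named lxd is the daemon:

.. code-block:: bash

    LXD_PID="$(pgrep -xo lxd)"   # oldest process named lxd
    sudo nsenter --target "$LXD_PID" --mount -- \
        lvs -o lv_name,data_percent,metadata_percent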

Miscellaneous Tips

Cloud-init in LXD Guests

When working with cloud-init, the key config -> user.user-data is one large string that contains a second yaml document, which is written to the cloud-init seed files via a template in the lxd image. The CentOS images don’t currently ship with cloud-init, but it’s relatively easy to create an image with templates based on the Ubuntu image templates.

The embedded yaml does present a challenge for linting, as it’s seen as a single string and not tested. The yaml2json.py utility can help with this: it makes it easy to extract the embedded user-data yaml document for linting, and you can pass the result back through yaml2json.py to validate nesting and structure as well.
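
If you don’t have yaml2json.py handy, a standalone equivalent of the extraction step can be sketched with python3 and PyYAML (profile name assumed):

.. code-block:: bash

    # pull the embedded cloud-init document out of the profile...
    lxc profile show docker \
      | python3 -c 'import sys, yaml; print(yaml.safe_load(sys.stdin)["config"]["user.user-data"])' \
      > user-data.yaml
    # ...and confirm it parses as yaml in its own right
    python3 -c 'import yaml; yaml.safe_load(open("user-data.yaml")); print("user-data OK")'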

I also recommend working from a file and pushing your edits by passing the file to stdin. By default, lxc profile edit <name> opens the profile in an editor for direct editing, but if you have an error in the embedded yaml string quoting, it will "fix" it for you by converting the document to a quoted and escaped string. That can be repaired with yaml2json, but it’s better to just avoid the direct-edit convenience feature and keep the whitespace clean.

Incidentally, if you notice your profile comes back with an odd double-quoted representation of the embedded cloud-init yaml, it is probably a trailing-whitespace error.

LXD files of interest

Several ephemeral files are generated by LXD. These should not be edited directly, but they can provide useful insight when troubleshooting and tuning various settings. Note that LXD is a web-service-based management layer on top of the lxc engine, so lxc.conf and the LXC documentation are good places to explore when you’re digging deep.

Per container logs and ephemeral lxc.conf: /var/snap/lxd/common/lxd/logs/{container}/

Ephemeral container apparmor files: /var/snap/lxd/common/lxd/security/apparmor/{container}

Ephemeral container seccomp files: /var/snap/lxd/common/lxd/security/seccomp/{container}

SQLite database with LXD settings: /var/snap/lxd/common/lxd/database/local.db

Instance type definitions: /var/snap/lxd/common/lxd/cache/instance_types.yaml

LXD daemon log file: /var/snap/lxd/common/lxd/logs/lxd.log

Overuse of cloud-init

Cloud-init is really very cool, and it is a step toward an image-build technology that is not dependent on the container technology, provided cloud-init is installed in your base image.

My opinion is that it also has a ton of features which are an invitation to use it past its sweet spot.

A better approach may be to use cloud-init to install ansible, puppet, or chef, and let those tools do the complex build. The advantage of something like ansible over cloud-init is that ansible is a more capable state machine. If it fails halfway through, it stops there, and you can troubleshoot and fix the failed step without repeating all of the steps leading up to it on each troubleshooting iteration.

Using the more advanced toolset from the start also gives you more options as you bump into edge cases that are not easily addressed by the cloud-init toolset, and it puts the team’s experience behind a tool with use cases beyond initialization.

That said, between ansible, lxd, and cloud-init, you have momentum behind yaml-defined automation, which can quickly develop into a core skill on a devops team.
