Where did my /mnt go? — Part 1

Amityo
4 min read · Dec 14, 2016


Background

We have recently upgraded our Jenkins slave machines to Yakkety (Ubuntu 16.10) on Azure with the Azure Slave Plugin. The plugin creates and stops machines as necessary. Each machine comes with a temporary storage drive, /dev/sdb1, mounted on /mnt; the data is deleted on every resize, shutdown, or restart…

Why should you use it? The D-Series VMs have a fast SSD mounted on /mnt. This is perfect for our scenario: the build data is temporary and isn’t supposed to persist. Delete it and we won’t care a bit; the advantage is that we get extremely fast builds.

I’ll start with the problem and walk down the line to the cause…

The Problem

We created a new Canonical Ubuntu 16.10 VM.

After two reboots (sudo reboot) we were surprised to find an empty /mnt directory and nothing relevant in mount.

$ ls -ltr /mnt
total 0
$ mount | grep mnt
...
$ mount | grep sdb
...

What happened? No idea. Let’s investigate.

Diving In

The mount must have failed. Let’s look for failed systemd units:

$ systemctl | grep failed
mnt.mount loaded failed failed /mnt
$ systemctl status mnt.mount
Loaded: loaded (/etc/fstab; generated; vendor preset: enabled)
Active: failed
systemd[1]: Mounting /mnt...
systemd[1]: mnt.mount: Mount process exited, code=exited status=32
systemd[1]: Failed to mount /mnt.

mnt.mount is the unit that mounts /mnt and, as you can see, it fails for an unknown reason. Catting the unit tells us why:

$ systemctl cat mnt.mount
# Automatically generated by systemd-fstab-generator
...
[Mount]
What=/dev/disk/cloud/azure_resource
Where=/mnt

The ‘Where’ is right, but the ‘What’ is wrong:

$ ls -ltr /dev/disk/cloud
azure_resource -> ../../sdb
azure_resource-part1 -> ../../sdb1

azure_resource is a symlink to /dev/sdb, but the documentation tells us the temp drive is supposed to be at /dev/sdb1. For some reason, mnt.mount was generated with the wrong device.

What is systemd-fstab-generator?

The first line in mnt.mount tells us the file was generated by systemd-fstab-generator. As the documentation says, systemd-fstab-generator translates /etc/fstab into native systemd units. So the problem must be in /etc/fstab:

$ cat /etc/fstab
...
/dev/disk/cloud/azure_resource /mnt auto defaults,nofail,x-systemd.requires=cloud-init.service,comment=cloudconfig

/mnt is configured to mount azure_resource and not azure_resource-part1. Who generated /etc/fstab? Let’s look at /etc/fstab again:

$ cat /etc/fstab
# CLOUD_IMG: This file was created/modified by the Cloud Image build process
...

The comment doesn’t tell us much, but after some reading, I found a project named cloud-init that is responsible for initializing cloud instances.

Cloud-init

Cloud-init handles initialization of cloud instances. It is installed on all Ubuntu Cloud Images (and can be installed on many other operating systems). One of its responsibilities is to set up ephemeral mount points — in our case, /mnt.

The relevant module is the Mounts module. The Mounts module adds mount points to /etc/fstab.

$ journalctl -u cloud-init | grep cc_mounts
(1) cloud-init[1193]: [CLOUDINIT] stages.py[DEBUG]: Running module mounts (<module 'cloudinit.config.cc_mounts' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_mounts.py'>) with frequency always
(2) cloud-init[1193]: [CLOUDINIT] cc_mounts.py[DEBUG]: Attempting to determine the real name of ephemeral0
(3) cloud-init[1193]: [CLOUDINIT] cc_mounts.py[DEBUG]: Mapped metadata name ephemeral0 to /dev/disk/cloud/azure_resource
(4) cloud-init[1193]: [CLOUDINIT] cc_mounts.py[DEBUG]: changed default device ephemeral0 => /dev/disk/cloud/azure_resource

The problem is indeed in cc_mounts.py. Let’s dive into the code and try to find the problem. First, find the installed version:

$ dpkg-query --show cloud-init
cloud-init 0.7.8-15-g6e45ffb-0ubuntu1
$ git clone https://git.launchpad.net/cloud-init
$ cd cloud-init
$ git checkout ubuntu/0.7.8-15-g6e45ffb-0ubuntu1

cc_mounts.py

We are searching for the part of the code that writes azure_resource (and not azure_resource-part1) to /etc/fstab.

The log lines above come from the sanitize_devname function. Let’s debug it together using the logs.

First the code, just for reference.

devname = "ephemeral0"
device_path = "/dev/disk/cloud/azure_resource"
partition_number = None  # expand_dotted_devname returns None if no dot is found
partition_path = _get_nth_partition_for_device("/dev/disk/cloud/azure_resource", 1)

devname is ephemeral0 (line 6), and the transformer sets device_path to /dev/disk/cloud/azure_resource (line 15).

_get_nth_partition_for_device’s purpose is to find an existing file at one of the following paths:

  • {device_path}{partition_number}
  • {device_path}p{partition_number}
  • {device_path}-part{partition_number}

The third option (-part1) is supposed to match, but from the cloud-init logs

changed default device ephemeral0 => /dev/disk/cloud/azure_resource

we understand that the file didn’t exist, so the method returned None, leaving partition_path set to None.

Back to sanitize_devname:

If partition_path is None (which it is), the default is to return device_path. And finally, this is the result we get:

changed default device ephemeral0 => /dev/disk/cloud/azure_resource

Recap

What we figured out:

  • /etc/fstab is generated by cloud-init
  • cc_mounts.py can’t find the symlink /dev/disk/cloud/azure_resource-part1 and falls back to /dev/disk/cloud/azure_resource

The next question: who is responsible for creating azure_resource-part1?

systemd-udevd

cloud-init installs the file 66-azure-ephemeral.rules at /lib/udev/rules.d:

$ cat /lib/udev/rules.d/66-azure-ephemeral.rules
...
# Create the symlinks
ENV{DEVTYPE}=="partition", SYMLINK+="disk/cloud/$env{fabric_name}-part%n"

systemd-udevd is a device-management daemon that listens to kernel uevents and executes the matching udev rules. One of those rules creates the symlink /dev/disk/cloud/azure_resource-part1.
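To see what that rule produces, here is a rough illustration of the name substitution: %n expands to the kernel partition number, and fabric_name is an environment key set to azure_resource by an earlier rule in the same file. This only illustrates the resulting naming, not udev’s actual substitution engine:

```python
def symlink_for(fabric_name, partition_number):
    # Mirrors SYMLINK+="disk/cloud/$env{fabric_name}-part%n";
    # udev creates the link relative to /dev.
    return "/dev/disk/cloud/%s-part%d" % (fabric_name, partition_number)

# For the Azure temporary disk partition /dev/sdb1 (partition number 1):
print(symlink_for("azure_resource", 1))  # /dev/disk/cloud/azure_resource-part1
```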

On the one hand, when I log in to the machine, the symlink is there; on the other hand, cc_mounts determined that it didn’t exist. We are getting close.

Summary

  • Two reboots caused the mount to /mnt to fail.
  • mnt.mount failed because the ‘What’ section is wrong.
  • systemd-fstab-generator generated the broken mnt.mount unit because /etc/fstab was generated with wrong data.
  • cloud-init (specifically cc_mounts module) generated the broken /etc/fstab file because it couldn’t find the symlink azure_resource-part1.
  • systemd-udevd created the symlink azure_resource-part1.

The last point hints that this all could be a timing issue.

Next Time

We will dive deeper to figure out what went wrong between systemd-udevd and cloud-init.
