Image Segmentation Task using Fastai v1.0

Fei Wu
Published in Analytics Vidhya
3 min read · Nov 24, 2019

Image segmentation is a task where each pixel of an image is classified into a category. For example, given an image containing a cyclist, every pixel belonging to the cyclist should be assigned to the cyclist class (colored cyan in the picture below).

Fastai (an online machine learning course) studies the image segmentation task by training a U-Net on the CamVid dataset in this notebook.

An already trained U-Net can also be tested in this Colab notebook.

Data

Image segmentation data are organized into pairs of images and masks. An image is represented by a tensor of shape (h, w, c), where h, w and c are the height, width and number of channels of the image. A mask is represented by a tensor of shape (h, w, N): if there are N classes (categories) to classify into, each pixel of a mask is a one-hot vector of length N, with 0 everywhere and a 1 at index i, the label of the corresponding class. (Equivalently, the mask can be stored as an (h, w) map of integer labels, which is what fastai does in practice.)
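To make the two representations concrete, here is a small sketch in NumPy (a toy 2×2 "image" with N = 3 classes, not actual CamVid data) showing how an integer label map converts to the one-hot form described above:

```python
import numpy as np

# Toy example: a 2x2 "image" segmented into N = 3 classes.
h, w, N = 2, 2, 3
label_map = np.array([[0, 2],
                      [1, 2]])          # shape (h, w): one class index per pixel

# Convert to one-hot: each pixel becomes a length-N vector with a
# single 1 at the index of its class.
one_hot = np.eye(N)[label_map]          # shape (h, w, N)

print(one_hot.shape)                    # (2, 2, 3)
print(one_hot[0, 1])                    # [0. 0. 1.] -> pixel belongs to class 2
```

Taking an argmax over the last axis recovers the integer label map, which is why the two storage formats are interchangeable.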

The loader that loads and preprocesses images with their respective masks for the training and validation steps is defined by the following lines:

src = (SegmentationItemList.from_folder(path_of_dataset)
       .split_by_fname_file('../valid.txt')
       .label_from_func(get_y_fn, classes=codes))

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

Model

The model used to map an image to its corresponding mask is a U-Net.

The first part of the U-Net (the down-sampling path) is a standard convolutional network (a ResNet in the fastai notebook) that gradually decreases the spatial size of the image and increases its number of channels.

The second part of the U-Net (the up-sampling path), on the contrary, gradually increases the spatial size of the image tensor and decreases its number of channels. Every time the image tensor is up-sampled, it is concatenated (gray arrows in the U-Net diagram) with the corresponding tensor from the down-sampling path. By doing so, the model can better localize the information of the deepest layers in specific regions of the input image.

The U-Net with its specific dataset can be created with one line.

learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)
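The `metrics` argument here is a list of evaluation functions. The fastai CamVid notebook uses a pixel accuracy that ignores the "Void" class; a NumPy sketch of that idea (class index and arrays are hypothetical, for illustration only) looks like this:

```python
import numpy as np

VOID = 0  # hypothetical index of the Void class

def pixel_accuracy(pred, target, void=VOID):
    """Fraction of non-void pixels whose predicted class matches the mask."""
    mask = target != void                # score only labeled pixels
    return (pred[mask] == target[mask]).mean()

pred   = np.array([[1, 2], [1, 0]])
target = np.array([[1, 2], [2, 0]])      # bottom-right pixel is Void
print(pixel_accuracy(pred, target))      # 2 of 3 non-void pixels match
```

The real fastai metric operates on PyTorch tensors of batch predictions, but the masking logic is the same.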

Training

To train the U-Net, fastai uses an Adam optimizer with default parameters, a weight decay of 1e-2, and the following learning rate schedule over the iterations (1 epoch is around 70 iterations/batches).

This learning rate shape can train a model that generalizes better, because the steady increase of the learning rate allows the optimization process to escape local minima.
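The schedule fastai uses here is its one-cycle policy. A minimal sketch of the shape (linear ramps for simplicity; fastai's actual implementation anneals with cosine curves, and `lr_max`, `div` and `pct_start` below are illustrative defaults) looks like this:

```python
# One-cycle-style schedule: the learning rate warms up from lr_max/div
# to lr_max over the first pct_start fraction of training, then
# anneals back down for the rest.
def one_cycle_lr(step, total_steps, lr_max=1e-3, div=25, pct_start=0.3):
    warmup = int(total_steps * pct_start)
    lr_min = lr_max / div
    if step < warmup:                            # rising phase
        return lr_min + (lr_max - lr_min) * step / warmup
    frac = (step - warmup) / (total_steps - warmup)
    return lr_max - (lr_max - lr_min) * frac     # falling phase

total = 700                                  # ~10 epochs x 70 iterations
print(one_cycle_lr(0, total))                # starts at lr_max / 25
print(one_cycle_lr(int(total * 0.3), total)) # peaks at lr_max
print(one_cycle_lr(total, total))            # ends back at lr_max / 25
```

A high `pct_start` such as the 0.9 used in the notebook means the learning rate is still rising for most of training, which matches the "steady boost" described above.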

The loss function is the average of the softmax cross-entropy losses (nn.CrossEntropyLoss in PyTorch) applied to every pixel. The down-sampling path of the U-Net is pre-trained on ImageNet. It is frozen at first, so only the up-sampling path is trained at the beginning; the whole U-Net is then trained after unfreezing the down-sampling path.
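What nn.CrossEntropyLoss computes per pixel can be sketched in NumPy on a toy 2×2 prediction with N = 3 classes (random logits, so only the shape of the computation matters here):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.standard_normal((3, 2, 2))  # (N classes, h, w) raw model outputs
target = np.array([[0, 2],
                   [1, 2]])              # (h, w) ground-truth labels

# Softmax over the class axis (stabilized by subtracting the max),
# then the negative log-likelihood of the true class at each pixel,
# averaged over all pixels.
exp = np.exp(logits - logits.max(axis=0, keepdims=True))
probs = exp / exp.sum(axis=0, keepdims=True)
h, w = target.shape
nll = -np.log(probs[target, np.arange(h)[:, None], np.arange(w)])
loss = nll.mean()
print(loss)                              # scalar loss, averaged over 4 pixels
```

In PyTorch the same thing is done in one call, with logits shaped (batch, N, h, w) and targets shaped (batch, h, w).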

learn.fit_one_cycle(10, slice(lr), pct_start=0.9)
learn.unfreeze()
learn.fit_one_cycle(10, slice(lr/400,lr/4), pct_start=0.8)

Fastai then proposes increasing the size of the input images to train the model further. This reduces overfitting and acts as a transfer learning step: the model was 'pre-trained' on a smaller version of the images and is then trained on a bigger version.

learn.destroy()
data = (src.transform(get_transforms(), size=size*2, tfm_y=True)
.databunch(bs=bs)
.normalize(imagenet_stats))
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)
learn.fit_one_cycle(10, slice(lr), pct_start=0.8)
learn.unfreeze()
learn.fit_one_cycle(10, lrs)

Results

After 40 epochs of training, here are some examples of image segmentation results.

Ground Truth / Prediction
