Understanding U-Net

Bhavesh Goyal
Jul 30, 2020


TABLE OF CONTENTS:

  1. INTRODUCTION
  2. PREREQUISITE
  3. WHAT IS SEGMENTATION?
  4. WHAT IS U-NET?
  5. U-NET STRUCTURE
  6. KAGGLE DATA SCIENCE BOWL 2018 CHALLENGE

INTRODUCTION

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and videos and deep learning models, machines can accurately identify and classify objects — and then react to what they see. (SAS)

Deep Learning has enabled the field of Computer Vision to advance rapidly in the last few years. In this post, I would like to discuss one specific task in Computer Vision called Segmentation. Even though researchers have come up with numerous ways to solve this problem, I will talk about one particular architecture, namely U-Net, which uses a Fully Convolutional Network model for the task.

We will use U-Net to build a first-cut solution to the Kaggle Data Science Bowl 2018 "Spot Nuclei. Speed Cures." challenge.

PREREQUISITE

This post assumes that the reader is already familiar with the basic concepts of Machine Learning and Convolutional Networks, and has some working knowledge of building ConvNets with Python and the Keras library.

WHAT IS SEGMENTATION?

The goal of segmentation is to separate an image into coherent, meaningful parts. There are two types of segmentation:

  1. Semantic Segmentation (pixel-level predictions based on labeled classes)
  2. Instance Segmentation (object detection plus identification of each individual object)

In this post, we will be focusing mainly on Semantic Segmentation.
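
To make the distinction concrete, here is a toy sketch (the array values are made up purely for illustration). In semantic segmentation, every pixel simply gets a class label, so two touching nuclei share the same label; instance segmentation would additionally tell them apart.

import numpy as np

# A toy 4 x 4 semantic mask: 0 = background, 1 = nucleus.
# All nucleus pixels share one label, regardless of which
# individual nucleus they belong to.
semantic_mask = np.array([[0, 0, 1, 1],
                          [0, 1, 1, 0],
                          [0, 1, 0, 0],
                          [0, 0, 0, 0]])
print(semantic_mask.shape)  # the mask covers the same spatial grid as the image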

WHAT IS U-NET?

Introduced in 2015 by Ronneberger et al., U-Net is a CNN originally developed for Biomedical Image Segmentation. It has since become a very popular end-to-end encoder-decoder network for semantic segmentation. Its distinctive U-shaped architecture consists of a Contracting path and an Expansive path.

[Figure: U-Net architecture]

U-NET STRUCTURE

The downsampling (contracting) path in U-Net consists of 4 blocks, each with the following layers:

  1. 3x3 CONV (ReLU + Batch Normalization and Dropout used)
  2. 3x3 CONV (ReLU + Batch Normalization and Dropout used)
  3. 2x2 Max Pooling

The number of feature maps doubles as we go down the blocks, starting at 64, then 128, 256, and 512. The bottleneck at the base of the "U" consists of 2 CONV layers with Batch Normalization and Dropout.
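
As a minimal Keras sketch of one contracting block (down_block is an illustrative helper, not part of the original code; note the full model we build later uses ELU activations with Dropout and no Batch Normalization):

from keras.layers import Conv2D, MaxPooling2D, Dropout, BatchNormalization

# One contracting block: two 3x3 convolutions, then 2x2 max pooling.
# c feeds the skip connection; p continues down the "U".
def down_block(x, filters, dropout=0.1):
    c = Conv2D(filters, (3, 3), activation='relu', padding='same')(x)
    c = BatchNormalization()(c)
    c = Dropout(dropout)(c)
    c = Conv2D(filters, (3, 3), activation='relu', padding='same')(c)
    c = BatchNormalization()(c)
    p = MaxPooling2D((2, 2))(c)
    return c, p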

The upsampling (expansive) path also consists of 4 blocks, each with the following layers:

  1. Deconvolution (transposed convolution) layer
  2. Concatenation with the feature map from the corresponding level of the contracting path
  3. 3x3 CONV (ReLU + Batch Normalization and Dropout used)
  4. 3x3 CONV (ReLU + Batch Normalization and Dropout used)
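
And a matching sketch of one expansive block (again illustrative; up_block is not from the original code):

from keras.layers import Conv2D, Conv2DTranspose, Dropout, concatenate

# One expansive block: upsample, concatenate the skip feature map,
# then apply two 3x3 convolutions.
def up_block(x, skip, filters, dropout=0.1):
    u = Conv2DTranspose(filters, (2, 2), strides=(2, 2), padding='same')(x)
    u = concatenate([u, skip])
    c = Conv2D(filters, (3, 3), activation='relu', padding='same')(u)
    c = Dropout(dropout)(c)
    c = Conv2D(filters, (3, 3), activation='relu', padding='same')(c)
    return c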

KAGGLE DATA SCIENCE BOWL 2018 CHALLENGE

The main task of the challenge was to detect nuclei in an image. By automating nucleus detection, you could help unlock cures faster. Identifying the cells’ nuclei is the starting point for most analyses because most of the human body’s 30 trillion cells contain a nucleus full of DNA, the genetic code that programs each cell. Identifying nuclei allows researchers to identify each individual cell in a sample, and by measuring how cells react to various treatments, the researcher can understand the underlying biological processes at work.

Objective and our approach

● Objective — Automate the generation of the image masks.

● Approach — Use U-Net, a special CNN designed for segmentation tasks, to generate these masks.

You can download the dataset from the Kaggle website.

The entire source code for this project can be downloaded from here.

IMPORTING ALL NECESSARY PACKAGES AND MODULES

import os
import sys
import random
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from itertools import chain
from skimage.io import imread, imshow, imread_collection, concatenate_images
from skimage.transform import resize
from skimage.morphology import label
from keras.models import Model, load_model
from keras.layers import Input
from keras.layers.core import Dropout, Lambda
from keras.layers.convolutional import Conv2D, Conv2DTranspose
from keras.layers.pooling import MaxPooling2D
from keras.layers.merge import concatenate
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras import backend as K
import tensorflow as tf

IMG_WIDTH = 128
IMG_HEIGHT = 128
IMG_CHANNELS = 3
TRAIN_PATH = './U_NET/train/'
TEST_PATH = './U_NET/validation/'

warnings.filterwarnings('ignore', category=UserWarning, module='skimage')
seed = 42
random.seed(seed)
np.random.seed(seed)

Collect our file names for training and test data

# next(os.walk(path)) yields (dirpath, dirnames, filenames);
# index [1] picks the sub-directory names, which are the image ids
train_ids = next(os.walk(TRAIN_PATH))[1]
test_ids = next(os.walk(TEST_PATH))[1]

Creating empty (all-black) arrays of dimension 128 x 128 for our images and masks

print('Getting and resizing training images ... ')
X_train = np.zeros((len(train_ids), IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)
Y_train = np.zeros((len(train_ids), IMG_HEIGHT, IMG_WIDTH, 1), dtype=bool)
# Re-size our training images to 128 x 128
# Flushing stdout first keeps the tqdm progress bar rendering cleanly
sys.stdout.flush()
for n, id_ in tqdm(enumerate(train_ids), total=len(train_ids)):
    path = TRAIN_PATH + id_
    img = imread(path + '/images/' + id_ + '.png')[:,:,:IMG_CHANNELS]
    img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
    X_train[n] = img
    mask = np.zeros((IMG_HEIGHT, IMG_WIDTH, 1), dtype=bool)

    # Take all masks associated with this image and combine them into one single mask
    for mask_file in next(os.walk(path + '/masks/'))[2]:
        mask_ = imread(path + '/masks/' + mask_file)
        mask_ = np.expand_dims(resize(mask_, (IMG_HEIGHT, IMG_WIDTH), mode='constant',
                                      preserve_range=True), axis=-1)
        mask = np.maximum(mask, mask_)
    # Y_train[n] is now the single combined mask for this image
    Y_train[n] = mask

# Get and resize test images
X_test = np.zeros((len(test_ids), IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)
sizes_test = []
print('Getting and resizing test images ... ')
sys.stdout.flush()

# Here we resize our test images
for n, id_ in tqdm(enumerate(test_ids), total=len(test_ids)):
    path = TEST_PATH + id_
    img = imread(path + '/images/' + id_ + '.png')[:,:,:IMG_CHANNELS]
    # Record the original size so predictions can be upsampled back later
    sizes_test.append([img.shape[0], img.shape[1]])
    img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
    X_test[n] = img

print('Done!')

Building our U-Net Model

# my_iou_metric wraps iou_metric_batch (a NumPy IoU helper defined in the
# full source linked above) so it can be used as a Keras metric
def my_iou_metric(label, pred):
    metric_value = tf.py_func(iou_metric_batch, [label, pred], tf.float32)
    return metric_value
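
For readers who don't want to dig through the full source, here is a rough idea of what a batch IoU computation can look like in NumPy (a sketch only; the actual iou_metric_batch in the source may differ, for example by averaging over several thresholds):

# Illustrative only: mean IoU over a batch of binary masks
def iou_batch_sketch(y_true, y_pred, threshold=0.5):
    y_true = y_true > 0.5
    y_pred = y_pred > threshold
    intersection = np.logical_and(y_true, y_pred).sum(axis=(1, 2, 3))
    union = np.logical_or(y_true, y_pred).sum(axis=(1, 2, 3))
    # A small epsilon keeps empty masks from dividing by zero
    return np.mean((intersection + 1e-7) / (union + 1e-7))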
# Build U-Net model

# Note we make our layers variables so that we can concatenate or stack them
# This is required so that we can re-create the skip connections of our U-Net model
inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
s = Lambda(lambda x: x / 255) (inputs)
c1 = Conv2D(16, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (s)
c1 = Dropout(0.1) (c1)
c1 = Conv2D(16, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (c1)
p1 = MaxPooling2D((2, 2)) (c1)
c2 = Conv2D(32, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (p1)
c2 = Dropout(0.1) (c2)
c2 = Conv2D(32, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (c2)
p2 = MaxPooling2D((2, 2)) (c2)
c3 = Conv2D(64, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (p2)
c3 = Dropout(0.2) (c3)
c3 = Conv2D(64, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (c3)
p3 = MaxPooling2D((2, 2)) (c3)
c4 = Conv2D(128, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (p3)
c4 = Dropout(0.2) (c4)
c4 = Conv2D(128, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (c4)
p4 = MaxPooling2D(pool_size=(2, 2)) (c4)
c5 = Conv2D(256, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (p4)
c5 = Dropout(0.3) (c5)
c5 = Conv2D(256, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (c5)
u6 = Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same') (c5)
u6 = concatenate([u6, c4])
c6 = Conv2D(128, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (u6)
c6 = Dropout(0.2) (c6)
c6 = Conv2D(128, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (c6)
u7 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same') (c6)
u7 = concatenate([u7, c3])
c7 = Conv2D(64, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (u7)
c7 = Dropout(0.2) (c7)
c7 = Conv2D(64, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (c7)
u8 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same') (c7)
u8 = concatenate([u8, c2])
c8 = Conv2D(32, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (u8)
c8 = Dropout(0.1) (c8)
c8 = Conv2D(32, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (c8)
u9 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same') (c8)
u9 = concatenate([u9, c1], axis=3)
c9 = Conv2D(16, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (u9)
c9 = Dropout(0.1) (c9)
c9 = Conv2D(16, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same') (c9)
# Note our output is effectively a mask of 128 x 128
outputs = Conv2D(1, (1, 1), activation='sigmoid') (c9)
model = Model(inputs=[inputs], outputs=[outputs])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[my_iou_metric])
model.summary()
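
As an optional sanity check (a hypothetical smoke test, not part of the original code), a dummy batch should map (N, 128, 128, 3) inputs to (N, 128, 128, 1) probability masks:

# Smoke test: confirm input and output shapes of the network
dummy = np.zeros((1, IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)
print(model.predict(dummy).shape)  # expect (1, 128, 128, 1)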

Fit our model

model_path = "./nuclei_finder_unet_1.h5"
checkpoint = ModelCheckpoint(model_path,
                             monitor="val_loss",
                             mode="min",
                             save_best_only=True,
                             verbose=1)

earlystop = EarlyStopping(monitor='val_loss',
                          min_delta=0,
                          patience=5,
                          verbose=1,
                          restore_best_weights=True)

# Fit our model
results = model.fit(X_train, Y_train, validation_split=0.1,
                    batch_size=16, epochs=10,
                    callbacks=[earlystop, checkpoint])
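
The History object returned by model.fit makes it easy to eyeball convergence. For example (a quick, optional plot):

# Plot training vs. validation loss across epochs
plt.plot(results.history['loss'], label='train loss')
plt.plot(results.history['val_loss'], label='val loss')
plt.xlabel('epoch')
plt.legend()
plt.show()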

Generating our predictions for training and validation data

# Predict on training and validation data
# Note we must pass my_iou_metric as a custom object when re-loading
model = load_model('./nuclei_finder_unet_1.h5',
                   custom_objects={'my_iou_metric': my_iou_metric})

# the first 90% was used for training
preds_train = model.predict(X_train[:int(X_train.shape[0]*0.9)], verbose=1)

# the last 10% used as validation
preds_val = model.predict(X_train[int(X_train.shape[0]*0.9):], verbose=1)

#preds_test = model.predict(X_test, verbose=1)

# Threshold predictions
preds_train_t = (preds_train > 0.5).astype(np.uint8)
preds_val_t = (preds_val > 0.5).astype(np.uint8)
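
Note that sizes_test was recorded earlier precisely so that test predictions can be resized back to each image's original resolution. A minimal sketch of that step (it re-uses the commented-out predict call above):

# Upsample each 128 x 128 test prediction back to its original image size
preds_test = model.predict(X_test, verbose=1)
preds_test_upsampled = [resize(np.squeeze(preds_test[i]),
                               (sizes_test[i][0], sizes_test[i][1]),
                               mode='constant', preserve_range=True)
                        for i in range(len(preds_test))]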

Showing our predicted masks on our training data

# Pick a random training example to display
ix = random.randint(0, len(preds_train_t) - 1)
plt.figure(figsize=(20,20))

# Our original training image
plt.subplot(131)
imshow(X_train[ix])
plt.title("Image")

# Our original combined mask
plt.subplot(132)
imshow(np.squeeze(Y_train[ix]))
plt.title("Mask")

# The mask our U-Net model predicts
plt.subplot(133)
imshow(np.squeeze(preds_train_t[ix]))
plt.title("Predictions")
plt.show()

To stay connected follow me here.
