Nuclei Detection using UNet and HRNet
Table of Contents:
- Introduction
- Business Problem
- Problem Formulation
- Data Pipeline
- Custom Performance Metric
- UNet Model
- HRNet Model
- Inference
- Model Comparison
- Model Quantization
- Conclusion
- Future Work
- Links
- References
Introduction
Computer vision is a field of study focused on the problem of helping computers to see. It is a multidisciplinary field that could broadly be called a sub-field of artificial intelligence and machine learning, which may involve the use of specialized methods and make use of general learning algorithms. The goal of computer vision is to understand the content of digital images. Typically, this involves developing methods that attempt to reproduce the capability of human vision. Understanding the content of digital images may involve extracting a description from the image, which may be an object, a text description, a three-dimensional model, and so on. Many popular computer vision applications involve trying to recognize things in photographs; for example:
- Object Classification: What broad category of object is in this photograph?
- Object Identification: Which type of a given object is in this photograph?
- Object Verification: Is the object in the photograph?
- Object Detection: Where are the objects in the photograph?
- Object Landmark Detection: What are the key points for the object in the photograph?
- Object Segmentation: What pixels belong to the object in the image?
- Object Recognition: What objects are in this photograph and where are they?
In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as image objects). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics. There are two types of segmentation: semantic segmentation and instance segmentation. If there are 5 people in an image, semantic segmentation will classify all the people as a single instance, whereas instance segmentation will identify each of these people individually.
Medical image segmentation is the task of segmenting objects of interest in a medical image. Image segmentation is considered the most essential medical imaging process as it extracts the region of interest (ROI) through a semiautomatic or automatic process. It divides an image into areas based on a specified description, such as segmenting body organs/tissues in medical applications for border detection, tumor detection/segmentation, and mass detection. Because segmentation partitions the image into coherent regions, clustering procedures can be applied to extract the global characteristics of the image and cleanly separate the ROI from the background.
Business Problem
- Problem Statement:
Identify the nuclei in images of cells. But why?
Identifying the cells’ nuclei is the starting point for most analyses because most of the human body’s 30 trillion cells contain a nucleus full of DNA, the genetic code that programs each cell. Identifying nuclei allows researchers to identify each individual cell in a sample, and by measuring how cells react to various treatments, the researcher can understand the underlying biological processes at work.
- Data Source:
https://www.kaggle.com/c/data-science-bowl-2018
- Real World / Business Constraints:
* No strict latency requirements, but the model should not take hours to segment an image.
* The cost of incorrect segmentation is high: a model that fails to identify a nucleus correctly will compromise all downstream analyses.
* The model should generalize well and must not overfit.
Problem Formulation
- Data
* Source: https://www.kaggle.com/c/data-science-bowl-2018/overview
* This dataset contains a large number of segmented nuclei images.
- Each image is represented by an associated ImageId. Files belonging to an image are contained in a folder with this ImageId. Within this folder are two sub-folders:
* ‘images’ — contains the image file.
* ‘masks’ — contains the segmented masks of each nucleus. This folder is only included in the training set. Each mask contains one nucleus. Masks are not allowed to overlap (no pixel belongs to two masks).
- Type of Deep Learning Problem: Image Segmentation
We have to identify each nucleus present in the image of cells.
- Performance Metrics:
Since this is an image segmentation task, the two most commonly used metrics are:
1. Intersection over Union: IoU is the area of overlap between the predicted segmentation and the ground truth divided by the area of union between them. This metric ranges from 0 to 1 (0–100%), with 0 signifying no overlap and 1 signifying a perfectly overlapping segmentation.
2. Dice Coefficient: the Dice coefficient is 2 × the area of overlap divided by the total number of pixels in both images. It is very similar to IoU: the two are positively correlated, meaning that if one says model A is better than model B at segmenting an image, the other will say the same. Like IoU, it ranges from 0 to 1, with 1 signifying the greatest similarity between prediction and ground truth. (A short NumPy illustration of both metrics follows.)
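As a quick illustration (not part of the project pipeline), both metrics can be computed for a pair of binary masks with a few NumPy operations; the toy masks here are made up for the example:
import numpy as np

# Toy 2x2 binary masks: the prediction overlaps the ground truth in one pixel
pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])

intersection = np.logical_and(pred, truth).sum()        # 1
union = np.logical_or(pred, truth).sum()                # 3
iou = intersection / union                              # ~0.333
dice = 2 * intersection / (pred.sum() + truth.sum())    # 2*1/(2+2) = 0.5
print(iou, dice)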
Data Pipeline
For this project I created the input data pipeline in the following way:
1. First, I unzipped the provided training and testing archives and stored the contents in ‘train’ and ‘test’ directories:
# Unzipping the training and testing archives into directories
print('Unzipping stage1_train.zip')
!unzip -q "../input/data-science-bowl-2018/stage1_train.zip" -d train/
print('Unzipped stage1_train.zip')
print('Unzipping stage1_test.zip')
!unzip -q "../input/data-science-bowl-2018/stage1_test.zip" -d test/
print('Unzipped stage1_test.zip')
2. After that I created a dataframe with filenames as rows, each corresponding to one sample in the dataset.
# Imports used throughout this post
import os
import pathlib
import numpy as np
import pandas as pd
import cv2
import tensorflow as tf
import matplotlib.pyplot as plt
from tqdm import tqdm
from sklearn.model_selection import train_test_split

# Function to create a dataframe of files which will be used for further processing
def files_df(root_dir):
    subdir = os.listdir(root_dir)
    files = []
    df = pd.DataFrame()
    for dir in subdir:
        files.append(os.path.join(root_dir, dir))
    df['files'] = files
    return df

# Root directories for training and testing
TRAIN_ROOT = './train'
TEST_ROOT = './test'

train_df = files_df(TRAIN_ROOT)
test_df = files_df(TEST_ROOT)
3. Then I created another dataframe, which takes the filenames from the previous dataframe and builds the image and mask paths corresponding to each filename. The following function not only creates this dataframe but also combines the various masks belonging to each cell image into a single mask. The reason for doing so is that there are multiple masks per cell image, where each mask identifies a different nucleus in the same image:
# Hyperparameters
IMG_WIDTH = 256
IMG_HEIGHT = 256
IMG_CHANNELS = 3
CLASSES = 1
BATCH_SIZE = 8

# Function which creates a dataframe of image paths and mask paths,
# combining the multiple masks of each image into a single mask
def image_df(filenames):
    image_paths = []
    mask_paths = []
    df = pd.DataFrame()
    for filename in tqdm(filenames):
        file_path = os.path.join(filename, 'images')
        image_path = os.path.join(file_path, os.listdir(file_path)[0])
        image_paths.append(image_path)
        mask = np.zeros((IMG_WIDTH, IMG_HEIGHT, 1))
        mask_dir = file_path.replace("images", "masks")
        masks = os.listdir(mask_dir)
        for m in masks:
            mask_path = os.path.join(mask_dir, m)
            mask_ = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)
            mask_ = cv2.resize(mask_, (IMG_WIDTH, IMG_HEIGHT), interpolation=cv2.INTER_NEAREST)
            mask_ = np.expand_dims(mask_, axis=-1)
            mask = np.maximum(mask, mask_)
        newmask_dir = mask_dir.replace("masks", "masks_")
        if not os.path.isdir(newmask_dir):
            os.mkdir(newmask_dir)
        newmask_path = image_path.replace("images", "masks_")
        mask_paths.append(newmask_path)
        cv2.imwrite(newmask_path, mask)
    df['images'] = image_paths
    df['masks'] = mask_paths
    return df

# Training dataframe
train_filenames = train_df['files']
train = image_df(train_filenames)
4. After that I split the training data into training and validation sets:
X_train, X_val = train_test_split(train, test_size=0.1, random_state=42)
5. Then I made a data preprocessing function:
# Function to parse image and mask file paths and convert them into an image and a mask
def parse_function(image_path, mask_path):
    image_string = tf.io.read_file(image_path)
    image = tf.image.decode_png(image_string, channels=IMG_CHANNELS)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])
    mask_string = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask_string, channels=IMG_CHANNELS)
    mask = tf.image.convert_image_dtype(mask, tf.float32)
    mask = tf.image.resize(mask, [IMG_HEIGHT, IMG_WIDTH])
    return image, mask
6. Then comes the creation of the training and validation datasets using tf.data:
# Training dataset
train_ds = tf.data.Dataset.from_tensor_slices((X_train['images'], X_train['masks']))
train_ds = train_ds.shuffle(X_train.shape[0])
train_ds = train_ds.map(parse_function, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.map(train_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.batch(BATCH_SIZE)
train_ds = train_ds.prefetch(1)

# Validation dataset
val_ds = tf.data.Dataset.from_tensor_slices((X_val['images'], X_val['masks']))
val_ds = val_ds.shuffle(X_val.shape[0])
val_ds = val_ds.map(parse_function, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.batch(BATCH_SIZE)
val_ds = val_ds.prefetch(1)
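Note that the training pipeline above maps a train_preprocess augmentation function whose definition did not make it into this post. The following is only a plausible minimal sketch of such a function (the actual augmentations used in the project may differ): it applies the same random horizontal flip to the image and its mask, and brightness jitter to the image only.
# Hedged sketch of the train_preprocess function mapped above; the real
# augmentations used in the project are not shown in this post.
def train_preprocess(image, mask):
    # Apply the same random horizontal flip to the image and its mask
    flip = tf.random.uniform([]) > 0.5
    image = tf.cond(flip, lambda: tf.image.flip_left_right(image), lambda: image)
    mask = tf.cond(flip, lambda: tf.image.flip_left_right(mask), lambda: mask)
    # Photometric jitter on the image only; the mask must stay unchanged
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.clip_by_value(image, 0.0, 1.0)
    return image, mask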
7. Sample of the training dataset:
8. Sample of the validation dataset:
Custom Performance Metric
For this work I customized TensorFlow's mean IoU metric. This was done to incorporate a variable threshold for converting the predicted probabilities into a boolean tensor. A toy usage example follows the class definition below.
# Custom MeanIoU metric class
class MeanIoU(tf.keras.metrics.Metric):
    def __init__(self, num_classes, thres=0.5, name='mean_iou', dtype=None):
        super(MeanIoU, self).__init__(name=name, dtype=dtype)
        self.num_classes = num_classes
        self.thres = thres
        self.total_cm = self.add_weight('total_confusion_matrix',
                                        shape=(num_classes, num_classes),
                                        initializer=tf.zeros_initializer())

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, self._dtype)
        y_pred = tf.cast(y_pred, self._dtype)
        if y_pred.shape.ndims > 1:
            y_pred = tf.reshape(y_pred, [-1])
        if y_true.shape.ndims > 1:
            y_true = tf.reshape(y_true, [-1])
        # Threshold the predicted probabilities into a boolean tensor
        y_pred = tf.where(y_pred > self.thres, 1.0, 0.0)
        if sample_weight is not None:
            sample_weight = tf.cast(sample_weight, self._dtype)
            if sample_weight.shape.ndims > 1:
                sample_weight = tf.reshape(sample_weight, [-1])
        current_cm = tf.math.confusion_matrix(y_true,
                                              y_pred,
                                              self.num_classes,
                                              weights=sample_weight,
                                              dtype=self._dtype)
        return self.total_cm.assign_add(current_cm)

    def result(self):
        sum_over_row = tf.cast(tf.reduce_sum(self.total_cm, axis=0), dtype=self._dtype)
        sum_over_col = tf.cast(tf.reduce_sum(self.total_cm, axis=1), dtype=self._dtype)
        true_positives = tf.cast(tf.linalg.tensor_diag_part(self.total_cm), dtype=self._dtype)
        denominator = sum_over_row + sum_over_col - true_positives
        num_valid_entries = tf.reduce_sum(tf.cast(tf.math.not_equal(denominator, 0), dtype=self._dtype))
        iou = tf.math.divide_no_nan(true_positives, denominator)
        return tf.math.divide_no_nan(tf.reduce_sum(iou, name='mean_iou'), num_valid_entries)

    def reset_states(self):
        # The state of the metric will be reset at the start of each epoch.
        tf.keras.backend.set_value(self.total_cm, np.zeros((self.num_classes, self.num_classes)))

    def get_config(self):
        # 'thres' is included so the metric can be re-created when a saved model is loaded
        config = {'num_classes': self.num_classes, 'thres': self.thres}
        base_config = super(MeanIoU, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
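As mentioned above, here is a toy usage example (illustrative, not from the original notebook). With a threshold of 0.4 the predicted probabilities [0.1, 0.8, 0.3, 0.2] become the binary mask [0, 1, 0, 0]; against the ground truth [0, 1, 1, 0] the per-class IoUs are 2/3 (background) and 1/2 (nucleus):
# Toy check of the custom metric
m = MeanIoU(num_classes=2, thres=0.4)
m.update_state(tf.constant([0., 1., 1., 0.]), tf.constant([0.1, 0.8, 0.3, 0.2]))
print(m.result().numpy())  # (2/3 + 1/2) / 2 = ~0.5833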
UNet Model
UNet was developed by Olaf Ronneberger et al. for biomedical image segmentation. The architecture contains two paths. The first path is the contraction path (also called the encoder), which is used to capture the context in the image, i.e. to extract its features. The encoder consists of convolutional and max-pooling layers in a stacked fashion.
The second path is the symmetric expanding path (also called the decoder), which enables precise localization using transposed convolutions. UNet is thus an end-to-end fully convolutional network (FCN): it only contains convolutional layers and no dense layers, because of which it can accept images of any size. To learn more about UNet, please read the research paper — U-Net: Convolutional Networks for Biomedical Image Segmentation.
Architecture
The following code shows the UNet architecture implemented in TensorFlow.
# Keras layers used by both the UNet and HRNet models
from tensorflow.keras.layers import (Input, Conv2D, Conv2DTranspose, MaxPool2D,
                                     Dropout, Concatenate, BatchNormalization,
                                     Activation, Add, UpSampling2D)
from tensorflow.keras.models import Model

mean_iou = MeanIoU(2, 0.4)

# Input layer, shape 256x256x3
inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))

# Left side / downsampling path
# 256 -> 128
conv1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(inputs)
conv1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv1)
pool1 = MaxPool2D((2, 2))(conv1)
pool1 = Dropout(0.25)(pool1)
# 128 -> 64
conv2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(pool1)
conv2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv2)
pool2 = MaxPool2D((2, 2))(conv2)
pool2 = Dropout(0.5)(pool2)
# 64 -> 32
conv3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(pool2)
conv3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv3)
pool3 = MaxPool2D((2, 2))(conv3)
pool3 = Dropout(0.5)(pool3)
# 32 -> 16
conv4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(pool3)
conv4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv4)
pool4 = MaxPool2D((2, 2))(conv4)
pool4 = Dropout(0.5)(pool4)

# Middle part
# 16 -> 16
convm = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(pool4)
convm = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(convm)

# Right side / upsampling path
# 16 -> 32
uconv4 = Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(convm)
uconv4 = Concatenate()([uconv4, conv4])
uconv4 = Dropout(0.5)(uconv4)
uconv4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(uconv4)
uconv4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(uconv4)
# 32 -> 64
uconv3 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(uconv4)
uconv3 = Concatenate()([uconv3, conv3])
uconv3 = Dropout(0.5)(uconv3)
uconv3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(uconv3)
uconv3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(uconv3)
# 64 -> 128
uconv2 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(uconv3)
uconv2 = Concatenate()([uconv2, conv2])
uconv2 = Dropout(0.5)(uconv2)
uconv2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(uconv2)
uconv2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(uconv2)
# 128 -> 256
uconv1 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same')(uconv2)
uconv1 = Concatenate()([uconv1, conv1])
uconv1 = Dropout(0.5)(uconv1)
uconv1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(uconv1)
uconv1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(uconv1)

# Output layer, shape 256x256x1
outputs = Conv2D(CLASSES, (1, 1), activation='sigmoid')(uconv1)

# Named unet_model so the training code below can refer to it
unet_model = Model(inputs=[inputs], outputs=[outputs])
unet_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[mean_iou])
# unet_model.summary()
Callbacks
The ModelCheckpoint callback was used to save the best-performing model.
!rm -rf ./unet_save/
if not os.path.exists('unet_save'):
    os.makedirs('unet_save')

filepath = "unet_save/weights-{epoch:04d}.hdf5"
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=filepath,
                                                save_best_only=True,
                                                mode='max',
                                                monitor='val_mean_iou')
Training
The model was first trained for 30 epochs and then trained again up to 60 epochs, with the second round resuming from the epoch that gave the best performance in the first round.
callbacks_list = [checkpoint]

# First round of training
history = unet_model.fit(train_ds,
                         initial_epoch=0,
                         epochs=30,
                         callbacks=callbacks_list,
                         validation_data=val_ds)

# Second round of training, resuming from the best epoch of the first round
initial_epoch = int(sorted(os.listdir('unet_save'))[-1].split('.')[0].split('-')[-1])
history = unet_model.fit(train_ds,
                         initial_epoch=initial_epoch,
                         epochs=60,
                         callbacks=callbacks_list,
                         validation_data=val_ds)

# Best model (the custom metric must be passed via custom_objects when loading)
unet_model = tf.keras.models.load_model('./unet_save/' + sorted(os.listdir('unet_save'))[-1],
                                        custom_objects={'MeanIoU': MeanIoU})

# Save and reload the model
unet_model.save("unet_model.h5")
unet_model = tf.keras.models.load_model("unet_model.h5",
                                        custom_objects={'MeanIoU': MeanIoU})
HRNet Model
The HRNet model was developed by Jingdong Wang et al. to address the loss of high-resolution representations in previous image segmentation models. Their architecture maintains high-resolution representations through the whole process. The first stage of the architecture is a high-resolution subnetwork; then high-to-low resolution subnetworks are gradually added one by one to form more stages, and the multi-resolution subnetworks are connected in parallel. There are two key benefits: (i) this approach connects the high-to-low resolution subnetworks in parallel rather than in series, as done in most existing solutions, so high resolution can be maintained instead of being recovered through a low-to-high process; and (ii) most existing fusion schemes aggregate low-level and high-level representations, whereas this architecture performs repeated multi-scale fusions to boost the high-resolution representations with the help of the low-resolution representations of the same depth and similar level (and vice versa), resulting in richer high-resolution representations. To learn more about HRNet, please read the research paper — Deep High-Resolution Representation Learning for Visual Recognition.
Architecture
The following code shows the HRNet architecture implemented in TensorFlow.
# Hyperparameters
BN_MOMENTUM = 0.1
BN_EPSILON = 1e-5
INITIALIZER = 'he_normal'

# Functions to build layers
def conv(x, outsize, kernel_size, strides_=1, padding_='same', activation=None):
    return Conv2D(outsize,
                  kernel_size,
                  strides=strides_,
                  padding=padding_,
                  kernel_initializer=INITIALIZER,
                  use_bias=False,
                  activation=activation)(x)
def BasicBlock(x, size, downsample=False):
    residual = x
    out = conv(x, size, 3)
    out = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(out)
    out = Activation('relu')(out)
    out = conv(out, size, 3)
    out = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(out)
    if downsample:
        residual = conv(x, size, 1, padding_='valid')
        residual = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(residual)
    out = Add()([out, residual])
    out = Activation('relu')(out)
    return out
def BottleNeckBlock(x, size, downsample=False):
    residual = x
    out = conv(x, size, 1, padding_='valid')
    out = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(out)
    out = Activation('relu')(out)
    out = conv(out, size, 3)
    out = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(out)
    out = Activation('relu')(out)
    out = conv(out, size * 4, 1, padding_='valid')
    out = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(out)
    if downsample:
        residual = conv(x, size * 4, 1, padding_='valid')
        residual = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(residual)
    out = Add()([out, residual])
    out = Activation('relu')(out)
    return out
def layer1(x):
    x = BottleNeckBlock(x, 64, downsample=True)
    x = BottleNeckBlock(x, 64)
    x = BottleNeckBlock(x, 64)
    x = BottleNeckBlock(x, 64)
    return x
def transition_layer(x, in_channels, out_channels):
    num_in = len(in_channels)
    num_out = len(out_channels)
    out = []
    for i in range(num_out):
        if i < num_in:
            if in_channels[i] != out_channels[i]:
                residual = conv(x[i], out_channels[i], 3)
                residual = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(residual)
                residual = Activation('relu')(residual)
                out.append(residual)
            else:
                out.append(x[i])
        else:
            residual = conv(x[-1], out_channels[i], 3, strides_=2)
            residual = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(residual)
            residual = Activation('relu')(residual)
            out.append(residual)
    return out
def branches(x, block_num, channels):
    out = []
    for i in range(len(channels)):
        residual = x[i]
        for j in range(block_num):
            residual = BasicBlock(residual, channels[i])
        out.append(residual)
    return out
def fuse_layers(x, channels, multi_scale_output=True):
    out = []
    for i in range(len(channels) if multi_scale_output else 1):
        residual = x[i]
        for j in range(len(channels)):
            if j > i:
                y = conv(x[j], channels[i], 1, padding_='valid')
                y = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(y)
                y = UpSampling2D(size=2 ** (j - i))(y)
                residual = Add()([residual, y])
            elif j < i:
                y = x[j]
                for k in range(i - j):
                    if k == i - j - 1:
                        y = conv(y, channels[i], 3, strides_=2)
                        y = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(y)
                    else:
                        y = conv(y, channels[j], 3, strides_=2)
                        y = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(y)
                        y = Activation('relu')(y)
                residual = Add()([residual, y])
        residual = Activation('relu')(residual)
        out.append(residual)
    return out
# Functions to create the model
def HighResolutionModule(x, channels, multi_scale_output=True):
    residual = branches(x, 4, channels)
    out = fuse_layers(residual, channels, multi_scale_output=multi_scale_output)
    return out

def stage(x, num_modules, channels, multi_scale_output=True):
    out = x
    for i in range(num_modules):
        if i == num_modules - 1 and multi_scale_output == False:
            out = HighResolutionModule(out, channels, multi_scale_output=False)
        else:
            out = HighResolutionModule(out, channels)
    return out
def hrnet_keras(input_size=(256, 256, 3)):
    channels_2 = [32, 64]
    channels_3 = [32, 64, 128]
    channels_4 = [32, 64, 128, 256]
    num_modules_2 = 1
    num_modules_3 = 4
    num_modules_4 = 3

    inputs = Input(input_size)

    # Stem: two stride-2 convolutions downsample 256 -> 64
    x = conv(inputs, 64, 3, strides_=2)
    x = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(x)
    x = conv(x, 64, 3, strides_=2)
    x = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(x)
    x = Activation('relu')(x)

    la1 = layer1(x)
    tr1 = transition_layer([la1], [256], channels_2)
    st2 = stage(tr1, num_modules_2, channels_2)
    tr2 = transition_layer(st2, channels_2, channels_3)
    st3 = stage(tr2, num_modules_3, channels_3)
    tr3 = transition_layer(st3, channels_3, channels_4)
    st4 = stage(tr3, num_modules_4, channels_4, multi_scale_output=False)

    # Head: upsample the highest-resolution branch back to 256x256
    up1 = UpSampling2D()(st4[0])
    up1 = conv(up1, 32, 3)
    up1 = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(up1)
    up1 = Activation('relu')(up1)
    up2 = UpSampling2D()(up1)
    up2 = conv(up2, 32, 3)
    up2 = BatchNormalization(epsilon=BN_EPSILON, momentum=BN_MOMENTUM)(up2)
    up2 = Activation('relu')(up2)
    final = conv(up2, 1, 1, padding_='valid', activation='sigmoid')

    model = Model(inputs=inputs, outputs=final)
    return model
hrnet_model = hrnet_keras()
hrnet_model.summary()
hrnet_model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=[mean_iou])
Callbacks
The ModelCheckpoint callback was used to save the best-performing model.
!rm -rf ./hrnet_save/
if not os.path.exists('hrnet_save'):
    os.makedirs('hrnet_save')

filepath = "hrnet_save/weights-{epoch:04d}.hdf5"
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=filepath,
                                                save_best_only=True,
                                                mode='max',
                                                monitor='val_mean_iou')
Training
The model was first trained for 30 epochs and then trained again up to 60 epochs, with the second round resuming from the epoch that gave the best performance in the first round.
callbacks_list = [checkpoint]

# First round of training
history = hrnet_model.fit(train_ds,
                          initial_epoch=0,
                          epochs=30,
                          callbacks=callbacks_list,
                          validation_data=val_ds)

# Second round of training, resuming from the best epoch of the first round
initial_epoch = int(sorted(os.listdir('hrnet_save'))[-1].split('.')[0].split('-')[-1])
history = hrnet_model.fit(train_ds,
                          initial_epoch=initial_epoch,
                          epochs=60,
                          callbacks=callbacks_list,
                          validation_data=val_ds)

# Best model (the custom metric must be passed via custom_objects when loading)
hrnet_model = tf.keras.models.load_model('./hrnet_save/' + sorted(os.listdir('hrnet_save'))[-1],
                                         custom_objects={'MeanIoU': MeanIoU})

# Save and reload the model
hrnet_model.save("hrnet_model.h5")
hrnet_model = tf.keras.models.load_model("hrnet_model.h5",
                                         custom_objects={'MeanIoU': MeanIoU})
Inference
Inference on the validation dataset:
for image, mask in val_ds.take(1):
    for i in range(BATCH_SIZE):
        pred_mask_u = unet_model.predict(image[i][np.newaxis, :, :, :])
        pred_mask_h = hrnet_model.predict(image[i][np.newaxis, :, :, :])

        fig = plt.figure(figsize=(14, 10))
        ax1 = fig.add_subplot(141)
        ax1.title.set_text('Original Image')
        ax1.imshow(image[i])
        ax2 = fig.add_subplot(142)
        ax2.title.set_text('Ground Truth')
        ax2.imshow(mask[i][:, :, 0], cmap='gray')
        ax3 = fig.add_subplot(143)
        ax3.title.set_text('UNet Prediction')
        ax3.imshow(pred_mask_u[0, :, :, 0], cmap='gray')
        ax4 = fig.add_subplot(144)
        ax4.title.set_text('HRNet Prediction')
        ax4.imshow(pred_mask_h[0, :, :, 0], cmap='gray')
        plt.show()
Inference on the test dataset:
test_filenames = test_df['files']
for filename in test_filenames[:5]:
    file_path = os.path.join(filename, 'images')
    image_path = os.path.join(file_path, os.listdir(file_path)[0])
    image_string = tf.io.read_file(image_path)
    image = tf.image.decode_png(image_string, channels=IMG_CHANNELS)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])

    pred_mask_u = unet_model.predict(image[np.newaxis, :, :, :])
    pred_mask_h = hrnet_model.predict(image[np.newaxis, :, :, :])

    fig = plt.figure(figsize=(10, 6))
    ax1 = fig.add_subplot(131)
    ax1.title.set_text('Original Image')
    ax1.imshow(image)
    ax2 = fig.add_subplot(132)
    ax2.title.set_text('UNet Prediction')
    ax2.imshow(pred_mask_u[0, :, :, 0], cmap='gray')
    ax3 = fig.add_subplot(133)
    ax3.title.set_text('HRNet Prediction')
    ax3.imshow(pred_mask_h[0, :, :, 0], cmap='gray')
    plt.show()
Model Comparison
In this section we will see how our two models performed. First we build a dataframe with the IoU score of each model for each data point.
# Function to create a dataframe with per-image IoU scores for each model,
# along with the image and mask paths
def metric_df(data):
    unet_iou_scores = []
    hrnet_iou_scores = []
    m = MeanIoU(2, 0.4)
    for i in tqdm(range(len(data))):
        image_path = data['images'].iloc[i]
        mask_path = data['masks'].iloc[i]

        image_string = tf.io.read_file(image_path)
        image = tf.image.decode_png(image_string, channels=IMG_CHANNELS)
        image = tf.image.convert_image_dtype(image, tf.float32)
        image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])

        mask_string = tf.io.read_file(mask_path)
        mask = tf.image.decode_png(mask_string, channels=CLASSES)
        mask = tf.image.convert_image_dtype(mask, tf.float32)
        mask = tf.image.resize(mask, [IMG_HEIGHT, IMG_WIDTH])

        # Reset the accumulated confusion matrix before each score
        # so every recorded value is per-image and per-model
        pred_mask_u = unet_model.predict(image[np.newaxis, :, :, :])
        m.reset_states()
        m.update_state(mask, pred_mask_u)
        u_iou_score = m.result().numpy()
        unet_iou_scores.append(round(u_iou_score, 4))

        pred_mask_h = hrnet_model.predict(image[np.newaxis, :, :, :])
        m.reset_states()
        m.update_state(mask, pred_mask_h)
        h_iou_score = m.result().numpy()
        hrnet_iou_scores.append(round(h_iou_score, 4))

    data['unet_iou_scores'] = unet_iou_scores
    data['hrnet_iou_scores'] = hrnet_iou_scores
    return data
df = train.copy()
df = metric_df(df)
df = df.sort_values(by=['hrnet_iou_scores','unet_iou_scores'])
Best Output samples
These are the samples with the best IoU scores.
d1 = df.tail()
for i in range(5):
    image_path = d1['images'].iloc[i]
    mask_path = d1['masks'].iloc[i]

    image_string = tf.io.read_file(image_path)
    image = tf.image.decode_png(image_string, channels=IMG_CHANNELS)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])

    mask_string = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask_string, channels=CLASSES)
    mask = tf.image.convert_image_dtype(mask, tf.float32)
    mask = tf.image.resize(mask, [IMG_HEIGHT, IMG_WIDTH])

    pred_mask_u = unet_model.predict(image[np.newaxis, :, :, :])
    pred_mask_h = hrnet_model.predict(image[np.newaxis, :, :, :])

    fig = plt.figure(figsize=(14, 10))
    ax1 = fig.add_subplot(141)
    ax1.title.set_text('Original Image')
    ax1.imshow(image)
    ax2 = fig.add_subplot(142)
    ax2.title.set_text('Ground Truth')
    ax2.imshow(mask[:, :, 0], cmap='gray')
    ax3 = fig.add_subplot(143)
    ax3.title.set_text('UNet: ' + str(round(d1['unet_iou_scores'].iloc[i], 4)))
    ax3.imshow(pred_mask_u[0, :, :, 0], cmap='gray')
    ax4 = fig.add_subplot(144)
    ax4.title.set_text('HRNet: ' + str(round(d1['hrnet_iou_scores'].iloc[i], 4)))
    ax4.imshow(pred_mask_h[0, :, :, 0], cmap='gray')
    plt.show()
Worst Output Samples
These are the samples with the worst IoU scores.
d2 = df.head()
for i in range(5):
    image_path = d2['images'].iloc[i]
    mask_path = d2['masks'].iloc[i]

    image_string = tf.io.read_file(image_path)
    image = tf.image.decode_png(image_string, channels=IMG_CHANNELS)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])

    mask_string = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask_string, channels=CLASSES)
    mask = tf.image.convert_image_dtype(mask, tf.float32)
    mask = tf.image.resize(mask, [IMG_HEIGHT, IMG_WIDTH])

    pred_mask_u = unet_model.predict(image[np.newaxis, :, :, :])
    pred_mask_h = hrnet_model.predict(image[np.newaxis, :, :, :])

    fig = plt.figure(figsize=(14, 10))
    ax1 = fig.add_subplot(141)
    ax1.title.set_text('Original Image')
    ax1.imshow(image)
    ax2 = fig.add_subplot(142)
    ax2.title.set_text('Ground Truth')
    ax2.imshow(mask[:, :, 0], cmap='gray')
    ax3 = fig.add_subplot(143)
    ax3.title.set_text('UNet: ' + str(round(d2['unet_iou_scores'].iloc[i], 4)))
    ax3.imshow(pred_mask_u[0, :, :, 0], cmap='gray')
    ax4 = fig.add_subplot(144)
    ax4.title.set_text('HRNet: ' + str(round(d2['hrnet_iou_scores'].iloc[i], 4)))
    ax4.imshow(pred_mask_h[0, :, :, 0], cmap='gray')
    plt.show()
Distribution of IoU scores of the models
1. Distribution of IoU scores over the range 0 to 1:
* At this scale the plot does not provide much information.
* So we will zoom in.
2. Distribution of IoU scores over the range 0.91 to 0.96:
* The two distributions look almost identical.
* Points with an IoU score greater than 0.93 appear slightly more often for HRNet than for UNet.
3. Distribution of IoU scores over the range 0.92 to 0.935:
* From this plot we can see that there are more points above 0.93 for HRNet than for UNet.
Scatter plot between the IoU scores of the two models
1. Scatter plot between the scores over the range 0 to 1:
* Scatter plot between the IoU scores of the two models.
* Green signifies points where HRNet gave a better IoU score than UNet.
* Red signifies points where UNet gave a better IoU score than HRNet.
2. Scatter plot between the scores over the range 0.92 to 0.94:
* Zooming into the scatter plot, we can see that there are more green points than red points.
* This signifies that HRNet performed better than UNet.
* The points where UNet performed better are spread across the whole range.
3. Scatter plot between the scores over the range 0.925 to 0.9275:
* In this plot we can see that although HRNet performed better than UNet, the difference is very small. (A sketch of how these plots can be generated from df follows.)
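The histogram and scatter plots themselves are rendered in the notebook and are not reproduced here. As a hedged sketch of how they can be generated from the df built above (the plot styling is my own choice, not necessarily what the notebook used):
# Illustrative plotting helpers for the analyses described above
def plot_iou_hist(df, lo=0.0, hi=1.0, bins=50):
    plt.hist(df['unet_iou_scores'], bins=bins, range=(lo, hi), alpha=0.5, label='UNet')
    plt.hist(df['hrnet_iou_scores'], bins=bins, range=(lo, hi), alpha=0.5, label='HRNet')
    plt.xlabel('IoU score')
    plt.ylabel('count')
    plt.legend()
    plt.show()

def plot_iou_scatter(df, lo=0.0, hi=1.0):
    # Green: HRNet scored higher; red: UNet scored higher
    colors = np.where(df['hrnet_iou_scores'] > df['unet_iou_scores'], 'green', 'red')
    plt.scatter(df['unet_iou_scores'], df['hrnet_iou_scores'], c=colors, s=8)
    plt.xlim(lo, hi)
    plt.ylim(lo, hi)
    plt.xlabel('UNet IoU')
    plt.ylabel('HRNet IoU')
    plt.show()

plot_iou_hist(df, 0.91, 0.96)
plot_iou_scatter(df, 0.92, 0.94)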
Model Quantization
tflite_models_dir = pathlib.Path("/content/drive/MyDrive/CaseStudy2/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)
Post-training quantization of the UNet model:
# UNet model (the custom metric must be passed via custom_objects when loading)
unet_model = tf.keras.models.load_model("/content/drive/MyDrive/CaseStudy2/unet_model.h5",
                                        custom_objects={'MeanIoU': MeanIoU})

# Post-training quantized UNet model
converter = tf.lite.TFLiteConverter.from_keras_model(unet_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quant_unet_model = converter.convert()

# Save the quantized UNet model
quant_unet_file = tflite_models_dir/"quant_unet_model.tflite"
quant_unet_file.write_bytes(quant_unet_model)
Post-training quantization of the HRNet model:
# HRNet model (the custom metric must be passed via custom_objects when loading)
hrnet_model = tf.keras.models.load_model("/content/drive/MyDrive/CaseStudy2/hrnet_model.h5",
                                         custom_objects={'MeanIoU': MeanIoU})

# Post-training quantized HRNet model
converter = tf.lite.TFLiteConverter.from_keras_model(hrnet_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quant_hrnet_model = converter.convert()

# Save the quantized HRNet model
quant_hrnet_file = tflite_models_dir/"quant_hrnet_model.tflite"
quant_hrnet_file.write_bytes(quant_hrnet_model)
Sizes of the models before and after quantization:
* We can see a considerable decrease in the file sizes of both the UNet and HRNet models.
* This reduction is valuable when we want to deploy our models on small devices.
print("UNet model in Mb:", os.path.getsize("/content/drive/MyDrive/CaseStudy2/unet_model.h5") / float(2**20))print("Quantized UNet in Mb:", os.path.getsize("/content/drive/MyDrive/CaseStudy2/quant_unet_model.tflite") / float(2**20))print("Float HRNet in Mb:", os.path.getsize("/content/drive/MyDrive/CaseStudy2/hrnet_model.h5") / float(2**20))print("Quantized HRNet in Mb:", os.path.getsize("/content/drive/MyDrive/CaseStudy2/quant_hrnet_model.tflite") / float(2**20))
Evaluation of the unquantized and post-training quantized models:
- Quantized UNet:
# Importing the quantized UNet model
u_interpreter = tf.lite.Interpreter(model_path="/content/drive/MyDrive/CaseStudy2/quant_unet_model.tflite")

# Function to predict segments using the quantized UNet model
def lite_unet_model(images):
    u_interpreter.allocate_tensors()
    u_interpreter.set_tensor(u_interpreter.get_input_details()[0]['index'], images)
    u_interpreter.invoke()
    return u_interpreter.get_tensor(u_interpreter.get_output_details()[0]['index'])
- Quantized HRNet:
# Importing the quantized HRNet model
h_interpreter = tf.lite.Interpreter(model_path="/content/drive/MyDrive/CaseStudy2/quant_hrnet_model.tflite")

# Function to predict segments using the quantized HRNet model
def lite_hrnet_model(images):
    h_interpreter.allocate_tensors()
    h_interpreter.set_tensor(h_interpreter.get_input_details()[0]['index'], images)
    h_interpreter.invoke()
    return h_interpreter.get_tensor(h_interpreter.get_output_details()[0]['index'])
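Before running the full comparison, a quick smoke test (illustrative, not from the original notebook) confirms that the interpreters accept a float32 batch of shape (1, 256, 256, 3) and return a mask batch of shape (1, 256, 256, 1):
# Smoke test of the TFLite interpreters on one validation image
for image, mask in val_ds.take(1):
    batch = image[0][np.newaxis, ...].numpy().astype(np.float32)
    print(lite_unet_model(batch).shape)   # expected: (1, 256, 256, 1)
    print(lite_hrnet_model(batch).shape)  # expected: (1, 256, 256, 1)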
- Sample predictions of all the models:
* For some points UNet performs better than HRNet, and for others HRNet performs better than UNet.
* It can also be seen that for the data points where UNet performed better than HRNet, their quantized versions show the same behaviour.
* Overall, the differences in the IoU scores of all four models are very small.
df = train.sample(n=8, random_state=1)
m = MeanIoU(2, 0.4)
for i in range(len(df)):
    image_path = df['images'].iloc[i]
    mask_path = df['masks'].iloc[i]

    image_string = tf.io.read_file(image_path)
    image = tf.image.decode_png(image_string, channels=IMG_CHANNELS)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])

    mask_string = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask_string, channels=CLASSES)
    mask = tf.image.convert_image_dtype(mask, tf.float32)
    mask = tf.image.resize(mask, [IMG_HEIGHT, IMG_WIDTH])

    # Reset the metric before each score so every value is per-image, per-model
    pred_mask_u = unet_model.predict(image[np.newaxis, :, :, :])
    m.reset_states()
    m.update_state(mask, pred_mask_u)
    u_iou_score = m.result().numpy()

    pred_mask_qu = lite_unet_model(image[np.newaxis, :, :, :].numpy())[0]
    m.reset_states()
    m.update_state(mask, pred_mask_qu)
    qu_iou_score = m.result().numpy()

    pred_mask_h = hrnet_model.predict(image[np.newaxis, :, :, :])
    m.reset_states()
    m.update_state(mask, pred_mask_h)
    h_iou_score = m.result().numpy()

    pred_mask_qh = lite_hrnet_model(image[np.newaxis, :, :, :].numpy())[0]
    m.reset_states()
    m.update_state(mask, pred_mask_qh)
    qh_iou_score = m.result().numpy()

    fig = plt.figure(figsize=(16, 14))
    ax1 = fig.add_subplot(161)
    ax1.title.set_text('Original Image')
    ax1.imshow(image)
    ax2 = fig.add_subplot(162)
    ax2.title.set_text('Ground Truth')
    ax2.imshow(mask[:, :, 0], cmap='gray')
    ax3 = fig.add_subplot(163)
    ax3.title.set_text('UNet: ' + str(round(u_iou_score, 4)))
    ax3.imshow(pred_mask_u[0, :, :, 0], cmap='gray')
    ax4 = fig.add_subplot(164)
    ax4.title.set_text('Quant_UNet: ' + str(round(qu_iou_score, 4)))
    ax4.imshow(pred_mask_qu[:, :, 0], cmap='gray')
    ax5 = fig.add_subplot(165)
    ax5.title.set_text('HRNet: ' + str(round(h_iou_score, 4)))
    ax5.imshow(pred_mask_h[0, :, :, 0], cmap='gray')
    ax6 = fig.add_subplot(166)
    ax6.title.set_text('Quant_HRNet: ' + str(round(qh_iou_score, 4)))
    ax6.imshow(pred_mask_qh[:, :, 0], cmap='gray')
    plt.show()
- Average IoU scores over a sample of 30 images:
* Among UNet and HRNet, HRNet has the better average IoU score.
* Similarly, after quantization HRNet has the better average IoU score.
* For this sample size the quantized HRNet even has a slightly better average IoU score than the float HRNet.
* If we want to deploy the model on smaller devices, the quantized HRNet will be the better option.
df = train.sample(n=30, random_state=1)
unet_iou_scores = []
quant_unet_iou_scores = []
hrnet_iou_scores = []
quant_hrnet_iou_scores = []
m = MeanIoU(2, 0.4)
for i in range(len(df)):
    image_path = df['images'].iloc[i]
    mask_path = df['masks'].iloc[i]

    image_string = tf.io.read_file(image_path)
    image = tf.image.decode_png(image_string, channels=IMG_CHANNELS)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])

    mask_string = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask_string, channels=CLASSES)
    mask = tf.image.convert_image_dtype(mask, tf.float32)
    mask = tf.image.resize(mask, [IMG_HEIGHT, IMG_WIDTH])

    # Reset the metric before each score so every value is per-image, per-model
    pred_mask_u = unet_model.predict(image[np.newaxis, :, :, :])
    m.reset_states()
    m.update_state(mask, pred_mask_u)
    u_iou_score = m.result().numpy()
    unet_iou_scores.append(round(u_iou_score, 4))

    pred_mask_qu = lite_unet_model(image[np.newaxis, :, :, :].numpy())[0]
    m.reset_states()
    m.update_state(mask, pred_mask_qu)
    qu_iou_score = m.result().numpy()
    quant_unet_iou_scores.append(round(qu_iou_score, 4))

    pred_mask_h = hrnet_model.predict(image[np.newaxis, :, :, :])
    m.reset_states()
    m.update_state(mask, pred_mask_h)
    h_iou_score = m.result().numpy()
    hrnet_iou_scores.append(round(h_iou_score, 4))

    pred_mask_qh = lite_hrnet_model(image[np.newaxis, :, :, :].numpy())[0]
    m.reset_states()
    m.update_state(mask, pred_mask_qh)
    qh_iou_score = m.result().numpy()
    quant_hrnet_iou_scores.append(round(qh_iou_score, 4))

print('The average IoU Score for UNet model: ', np.mean(np.array(unet_iou_scores)))
print('The average IoU Score for Quantized UNet model: ', np.mean(np.array(quant_unet_iou_scores)))
print('The average IoU Score for HRNet model: ', np.mean(np.array(hrnet_iou_scores)))
print('The average IoU Score for Quantized HRNet model: ', np.mean(np.array(quant_hrnet_iou_scores)))
Conclusion
- Overall, HRNet performed better than UNet, although the difference was very small.
- If we would like to deploy this work on small devices like smartphones or a Raspberry Pi, then using quantized models would be better.
- Quantized models are very small in size, with little degradation in performance.
- In this case the quantized models performed as well as the unquantized models.
- Quantized models are mostly slightly inferior to unquantized models, but in some cases they gave slightly better results.
- For cases where UNet performed better than HRNet, their quantized versions show the same behaviour, and vice versa.
Future Work
This work can be extended to explore the latest segmentation architectures. Further improving the performance of the models used here is another direction worth exploring.
Links
- Github Link: https://github.com/sahu-mak/Nuclei_Detection_CaseStudy
- LinkedIn Link: https://www.linkedin.com/in/sahumayank/
- Video of the Streamlit app: https://youtu.be/7F6eKNpPep0
References
- https://towardsdatascience.com/metrics-to-evaluate-your-semantic-segmentation-model-6bcb99639aa2
- https://cs230.stanford.edu/blog/datapipeline/
- https://www.tensorflow.org/guide/data#basic_mechanics
- UNet — https://arxiv.org/abs/1505.04597
- HRNet — https://arxiv.org/abs/1908.07919
- https://github.com/1044197988/TF.Keras-Commonly-used-models/blob/master/%E5%B8%B8%E7%94%A8%E5%88%86%E5%89%B2%E6%A8%A1%E5%9E%8B/HRNet.py
- https://www.tensorflow.org/lite/performance/post_training_quant
- https://www.appliedaicourse.com