Day 68 of 100DaysofML

Charan Soneji · Published in 100DaysofMLcode · 4 min read · Sep 7, 2020

Today's topic is image processing. I thought of writing a blog on the basics and usage of image processing with skimage. I shall share the notebook below, but first let's cover the basics. The code essentially represents a pipeline built on NumPy and SciPy.

Let's start the implementation by importing the required libraries and packages.

import pathlib
import imageio
import numpy as np

# Collect the paths of all training images and pick one to work with
training_paths = pathlib.Path('../input/stage1_train').glob('*/images/*.png')
training_sorted = sorted([x for x in training_paths])
im_path = training_sorted[45]
im = imageio.imread(str(im_path))

glob in the lines above matches files with UNIX-style wildcard patterns, so a single path pattern is enough to pick up all the training images. The pathlib documentation covers the syntax of glob and its usage in more detail.
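As a quick illustration of the pattern syntax (a minimal sketch; the printed path shown in the comment is only illustrative), glob expands each wildcard against whatever exists on disk:

import pathlib

# '*' matches any single directory or file name at that level, so this pattern
# finds every PNG sitting inside an 'images' folder one level below the root
paths = pathlib.Path('../input/stage1_train').glob('*/images/*.png')
for p in sorted(paths)[:3]:
    print(p)   # e.g. ../input/stage1_train/<id>/images/<id>.png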

Understanding Image Modality and Working on It

Image modality and visual image characteristics can offer valuable addition to traditional textual metadata for archiving and indexing visual material. Example image modalities include computerized tomography (CT), X-ray, magnetic resonance imaging (MRI), ultrasound, photographs, illustrations, charts, graphs, and sketches. The open-access biomedical literature in PMC is a source for OpenI (pronounced Open “eye”), a multimodal biomedical information retrieval system developed at the NLM that enables users to search for and retrieve relevant images and text.

The images in this dataset can be in RGB, RGBA, or grayscale format, based on the “modality” in which they were acquired. For color images, there is a third dimension that encodes the “channel” (e.g. Red, Green, Blue). To make things simpler for this first pass, we can coerce all these images into grayscale using the rgb2gray function from scikit-image.

# Shape before conversion: (height, width, channels) for a color image
print('Original image shape: {}'.format(im.shape))

from skimage.color import rgb2gray

# Convert to grayscale; the channel dimension is dropped
im_gray = rgb2gray(im)
print('New image shape: {}'.format(im_gray.shape))

The first print statement shows the dimensions of the original image, which include a channel dimension for color images. After converting to grayscale with skimage, that channel dimension is dropped, so the new shape has only the height and width.
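The rgb2gray call above works for RGB inputs, but in newer versions of scikit-image it expects exactly three channels, so RGBA images need their alpha channel removed first with rgba2rgb. A minimal sketch of a helper that coerces any of the three formats to grayscale (to_gray is my own name, not part of the original notebook):

from skimage.color import rgb2gray, rgba2rgb

def to_gray(image):
    # Already single-channel: nothing to do
    if image.ndim == 2:
        return image
    # RGBA: drop the alpha channel before converting
    if image.shape[-1] == 4:
        image = rgba2rgb(image)
    # RGB: standard luminance conversion
    return rgb2gray(image)

im_gray = to_gray(im)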

Comparison of images
import matplotlib.pyplot as plt
plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.imshow(im)
plt.axis('off')
plt.title('Original Image')
plt.subplot(1,2,2)
plt.imshow(im_gray, cmap='gray')
plt.axis('off')
plt.title('Grayscale Image')
plt.tight_layout()
plt.show()

Background Removal

Perhaps the simplest approach for this problem is to assume that there are two classes in the image: objects of interest and the background. Under this assumption, we would expect the data to fall into a bimodal distribution of intensities. If we found the best separation value, we could “mask” out the background data, then simply count the objects we’re left with.

The “dumbest” way we could find the threshold value would be to use a simple descriptive statistic, such as the mean or median. But there are other methods: the “Otsu” method is useful because it models the image as a bimodal distribution and finds the optimal separation value.
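As a quick sanity check (a minimal sketch, reusing im_gray from above), we can compare the naive mean threshold with the value Otsu's method picks:

from skimage.filters import threshold_otsu

naive_thresh = im_gray.mean()          # "dumbest" option: a simple descriptive statistic
otsu_thresh = threshold_otsu(im_gray)  # models the histogram as two classes
print('Mean threshold: {:.4f}'.format(naive_thresh))
print('Otsu threshold: {:.4f}'.format(otsu_thresh))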

from skimage.filters import threshold_otsu

# Find Otsu's optimal separation value and threshold the image
thresh_val = threshold_otsu(im_gray)
mask = np.where(im_gray > thresh_val, 1, 0)

# Make sure the objects of interest are the smaller class; if not, invert the mask
if np.sum(mask==0) < np.sum(mask==1):
    mask = np.where(mask, 0, 1)

# Plot the grayscale histogram with the threshold, and the mask overlaid on the image
plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
im_pixels = im_gray.flatten()
plt.hist(im_pixels, bins=50)
plt.vlines(thresh_val, 0, 100000, linestyle='--')
plt.ylim([0,50000])
plt.title('Grayscale Histogram')
plt.subplot(1,2,2)
mask_for_display = np.where(mask, mask, np.nan)
plt.imshow(im_gray, cmap='gray')
plt.imshow(mask_for_display, cmap='rainbow', alpha=0.5)
plt.axis('off')
plt.title('Image w/ Mask')
plt.show()

We compute the threshold, build the mask, and then display the histogram and the masked image as subplots.

Removing background and creating masks

Counting the Number of Objects in the Image

For this contest, we need a separate mask for each nucleus. One way to do this is to look for all connected objects in the mask and assign each of them a number using ndimage.label. Then we can loop through each label id, build a per-object mask, and append it to an iterable, such as a list. The implementation can be seen in the code below.

from scipy import ndimage

# Assign a unique integer label to each connected component in the mask
labels, nlabels = ndimage.label(mask)

# Build a separate binary mask for each labelled object
label_arrays = []
for label_num in range(1, nlabels+1):
    label_mask = np.where(labels == label_num, 1, 0)
    label_arrays.append(label_mask)

print('There are {} separate components / objects detected.'.format(nlabels))

# Display the labels with a random colormap over the grayscale image
from matplotlib.colors import ListedColormap
rand_cmap = ListedColormap(np.random.rand(256,3))
labels_for_display = np.where(labels > 0, labels, np.nan)
plt.imshow(im_gray, cmap='gray')
plt.imshow(labels_for_display, cmap=rand_cmap)
plt.axis('off')
plt.title('Labeled Cells ({} Nuclei)'.format(nlabels))
plt.show()
Number of objects detected
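Before looking at the result more closely, a quick diagnostic (a minimal sketch of my own, reusing labels and nlabels from above) is to print the pixel count of each labelled component; isolated stray pixels show up as components of just one pixel:

# Count how many pixels each labelled component occupies
for label_num in range(1, nlabels+1):
    size = np.sum(labels == label_num)
    print('Component {}: {} pixels'.format(label_num, size))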

A quick glance reveals two problems (in this very simple image):

  • There are a few individual pixels that stand alone (e.g. top-right)
  • Some cells are combined into a single mask (e.g., top-middle)

I shall discuss how to resolve these problems in tomorrow's blog.

Thanks for reading. Keep Learning.

Cheers.
