Day 68 of 100DaysofML

Charan Soneji · Published in 100DaysofMLcode · 4 min read · Sep 7, 2020

Today's topic is image processing. I thought of writing a blog on the basics and usage of image processing with skimage. I shall share the notebook below, but first let's cover the basics. The code essentially represents a pipeline built on NumPy and SciPy.

Let's start the implementation by importing the required libraries and packages.

import pathlib
import imageio
import numpy as np

# Collect the paths of all training images and pick one to work with
training_paths = pathlib.Path('../input/stage1_train').glob('*/images/*.png')
training_sorted = sorted([x for x in training_paths])
im_path = training_sorted[45]
im = imageio.imread(str(im_path))

glob in the lines above matches files with UNIX-style wildcard patterns, so a single path pattern is enough to pick up all the training images. The pathlib documentation covers the syntax of glob and its usage in more detail.
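As a quick illustration of the pattern syntax (a minimal sketch; the printed path shown in the comment is only illustrative), glob expands each wildcard against whatever exists on disk:

import pathlib

# '*' matches any single directory or file name at that level, so this pattern
# finds every PNG sitting inside an 'images' folder one level below the root
paths = pathlib.Path('../input/stage1_train').glob('*/images/*.png')
for p in sorted(paths)[:3]:
    print(p)   # e.g. ../input/stage1_train/<id>/images/<id>.png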

Understanding Image Modality and Working on It

Image modality and visual image characteristics can offer valuable addition to traditional textual metadata for archiving and indexing visual material. Example image modalities include computerized tomography (CT), X-ray, magnetic resonance imaging (MRI), ultrasound, photographs, illustrations, charts, graphs, and sketches. The open-access biomedical literature in PMC is a source for OpenI (pronounced Open “eye”), a multimodal biomedical information retrieval system developed at the NLM that enables users to search for and retrieve relevant images and text.

The images in this dataset can be in RGB, RGBA, or grayscale format, based on the “modality” in which they were acquired. For color images, there is a third dimension that encodes the “channel” (e.g. Red, Green, Blue). To make things simpler for this first pass, we can coerce all these images into grayscale using the rgb2gray function from scikit-image.

# Shape before conversion: (height, width, channels) for a color image
print('Original image shape: {}'.format(im.shape))

from skimage.color import rgb2gray

# Convert to grayscale; the channel dimension is dropped
im_gray = rgb2gray(im)
print('New image shape: {}'.format(im_gray.shape))

The first print statement shows the dimensions of the original image, which include a channel dimension for color images. After converting to grayscale with skimage, that channel dimension is dropped, so the new shape has only the height and width.
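The rgb2gray call above works for RGB inputs, but in newer versions of scikit-image it expects exactly three channels, so RGBA images need their alpha channel removed first with rgba2rgb. A minimal sketch of a helper that coerces any of the three formats to grayscale (to_gray is my own name, not part of the original notebook):

from skimage.color import rgb2gray, rgba2rgb

def to_gray(image):
    # Already single-channel: nothing to do
    if image.ndim == 2:
        return image
    # RGBA: drop the alpha channel before converting
    if image.shape[-1] == 4:
        image = rgba2rgb(image)
    # RGB: standard luminance conversion
    return rgb2gray(image)

im_gray = to_gray(im)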

Comparison of images
import matplotlib.pyplot as plt
plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.imshow(im)
plt.axis('off')
plt.title('Original Image')
plt.subplot(1,2,2)
plt.imshow(im_gray, cmap='gray')
plt.axis('off')
plt.title('Grayscale Image')
plt.tight_layout()
plt.show()

Background Removal

Perhaps the simplest approach for this problem is to assume that there are two classes in the image: objects of interest and the background. Under this assumption, we would expect the data to fall into a bimodal distribution of intensities. If we found the best separation value, we could “mask” out the background data, then simply count the objects we’re left with.

The “dumbest” way we could find the threshold value would be to use a simple descriptive statistic, such as the mean or median. But there are other methods: the “Otsu” method is useful because it models the image as a bimodal distribution and finds the optimal separation value.
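As a quick sanity check (a minimal sketch, reusing im_gray from above), we can compare the naive mean threshold with the value Otsu's method picks:

from skimage.filters import threshold_otsu

naive_thresh = im_gray.mean()          # "dumbest" option: a simple descriptive statistic
otsu_thresh = threshold_otsu(im_gray)  # models the histogram as two classes
print('Mean threshold: {:.4f}'.format(naive_thresh))
print('Otsu threshold: {:.4f}'.format(otsu_thresh))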

from skimage.filters import threshold_otsu

# Find Otsu's optimal separation value and threshold the image
thresh_val = threshold_otsu(im_gray)
mask = np.where(im_gray > thresh_val, 1, 0)

# Make sure the objects of interest are the smaller class; if not, invert the mask
if np.sum(mask==0) < np.sum(mask==1):
    mask = np.where(mask, 0, 1)

# Plot the grayscale histogram with the threshold, and the mask overlaid on the image
plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
im_pixels = im_gray.flatten()
plt.hist(im_pixels, bins=50)
plt.vlines(thresh_val, 0, 100000, linestyle='--')
plt.ylim([0,50000])
plt.title('Grayscale Histogram')
plt.subplot(1,2,2)
mask_for_display = np.where(mask, mask, np.nan)
plt.imshow(im_gray, cmap='gray')
plt.imshow(mask_for_display, cmap='rainbow', alpha=0.5)
plt.axis('off')
plt.title('Image w/ Mask')
plt.show()

We compute the threshold, build the mask, and then display the histogram and the masked image as subplots.

Removing background and creating masks

Counting the Number of Objects in the Image

For this contest, we need a separate mask for each nucleus. One way to do this is to look for all connected objects in the mask and assign each of them a number using ndimage.label. Then we can loop through each label id, build a per-object mask, and append it to an iterable, such as a list. The implementation can be seen in the code below.

from scipy import ndimage

# Assign a unique integer label to each connected component in the mask
labels, nlabels = ndimage.label(mask)

# Build a separate binary mask for each labelled object
label_arrays = []
for label_num in range(1, nlabels+1):
    label_mask = np.where(labels == label_num, 1, 0)
    label_arrays.append(label_mask)

print('There are {} separate components / objects detected.'.format(nlabels))

# Display the labels with a random colormap over the grayscale image
from matplotlib.colors import ListedColormap
rand_cmap = ListedColormap(np.random.rand(256,3))
labels_for_display = np.where(labels > 0, labels, np.nan)
plt.imshow(im_gray, cmap='gray')
plt.imshow(labels_for_display, cmap=rand_cmap)
plt.axis('off')
plt.title('Labeled Cells ({} Nuclei)'.format(nlabels))
plt.show()
Number of objects detected
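Before looking at the result more closely, a quick diagnostic (a minimal sketch of my own, reusing labels and nlabels from above) is to print the pixel count of each labelled component; isolated stray pixels show up as components of just one pixel:

# Count how many pixels each labelled component occupies
for label_num in range(1, nlabels+1):
    size = np.sum(labels == label_num)
    print('Component {}: {} pixels'.format(label_num, size))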

A quick glance reveals two problems (in this very simple image):

  • There are a few individual pixels that stand alone (e.g. top-right)
  • Some cells are combined into a single mask (e.g., top-middle)

I shall discuss how to resolve these problems in tomorrow's blog.

Thanks for reading. Keep Learning.

Cheers.
