Yearn the Learn: Image Processing with Machine Learning and Python

Nico Aguila · Published in The Startup · Jan 29, 2021 · 4 min read
We can actually generate features from the objects identified in our images and use them as training data for our ML models. How cool is that?

In my previous post, we segmented images using multiple approaches such as blob detection and connected components. Have you ever thought that once these objects are identified, we can generate features from the labeled regions themselves?

Let’s take a look at the red blood cells from my blob detection article once again and copy over the necessary code and outputs. We will reference the following:

  1. Source Image
  2. Data Cleaning Steps
  3. Image Segmentation

Once these steps are done, we can move on to the machine learning part.

Source Image and Data Cleaning

# library imports
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from skimage.io import imread, imshow
from skimage.color import rgb2gray
from skimage.measure import label, regionprops, regionprops_table
from skimage.morphology import (erosion, dilation, closing, opening,
                                area_closing, area_opening)
from glob import glob

# Source image
rbc = imread("medium/rbc.jpg")

# Image preprocessing and data cleaning: grayscale, threshold,
# then morphological area opening/closing to clear small specks and holes
im2 = rgb2gray(rbc)
im2_bw = im2 < 0.85
img_morph = area_closing(area_opening(im2_bw, 200), 200)
imshow(img_morph)
Initial source image (left) and processed image (right)
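As an aside, the 0.85 cutoff above was hand-picked for this image. If you would rather not tune it by hand, scikit-image can estimate one automatically; here is a minimal sketch using Otsu’s method, an alternative I am adding for illustration rather than part of the original walkthrough:

from skimage.filters import threshold_otsu

# Otsu's method picks the threshold that best separates the two
# intensity classes, replacing the hand-picked 0.85 cutoff
t = threshold_otsu(im2)
im2_bw_auto = im2 < t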

Now that preprocessing is complete, we can start segmenting the blood cells.
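Before running the full loop below, here is a tiny illustration of the two workhorses it relies on: label() assigns an integer id to every connected region in a binary image, and regionprops_table() tabulates the measured properties of those regions. This toy example is mine, not from the original post:

import numpy as np
import pandas as pd
from skimage.measure import label, regionprops_table

# Two separate blobs in a tiny binary image
toy = np.array([[0, 1, 1, 0, 0],
                [0, 1, 1, 0, 1],
                [0, 0, 0, 0, 1]])
toy_label = label(toy)  # the blobs get ids 1 and 2

# One row per labeled region, one column per requested property
print(pd.DataFrame(regionprops_table(toy_label, properties=['label', 'area'])))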

# initialize the rbc image file list for labeling and segmentation
rbc_files = glob('medium/rbc.jpg')

# initialize the region properties to extract as features
properties = ['area', 'centroid', 'convex_area',
              'bbox', 'bbox_area', 'eccentricity', 'equivalent_diameter',
              'extent', 'filled_area',
              'major_axis_length', 'minor_axis_length',
              'perimeter', 'orientation', 'solidity']

clean_rbc = []  # list of per-image DataFrames of region properties
fig = plt.figure(figsize=(20, 20))
file_count = len(rbc_files)
thres = 0.4
no_col = 6
no_row = int(np.ceil(file_count * 2 / no_col))
gs = gridspec.GridSpec(no_row, no_col)

for i, file in enumerate(rbc_files):
    # Show images. The subplot grid adjusts to shared image prefixes
    # with different shots (e.g. rbc_1, rbc_2, etc.)
    img = imread(file)
    fn = file.split('/')[-1].split('.')[0]
    ax0 = fig.add_subplot(gs[i * 2])
    ax1 = fig.add_subplot(gs[i * 2 + 1])
    ax0.axis('off')
    ax1.axis('off')

    # Display the source image
    ax0.imshow(img)
    ax0.set_title(fn)

    # Get region properties of the preprocessed binary image
    img_label = label(img_morph)
    df_regions = pd.DataFrame(regionprops_table(img_label,
                                                properties=properties))

    # Filter out small regions using a third of the mean convex area
    area_thres = df_regions.convex_area.mean() / 3
    df_regions = df_regions.loc[df_regions.convex_area > area_thres]

    # Drop regions that span the full height or width of the image
    mask_equal_height = ((df_regions['bbox-2'] - df_regions['bbox-0'])
                         != img_label.shape[0])
    mask_equal_width = ((df_regions['bbox-3'] - df_regions['bbox-1'])
                        != img_label.shape[1])
    df_regions = df_regions.loc[mask_equal_height & mask_equal_width]

    # Compute derived features from the major/minor axis endpoints
    y0, x0 = df_regions['centroid-0'], df_regions['centroid-1']
    orientation = df_regions.orientation
    x1 = (x0 + np.cos(orientation) * 0.5
          * df_regions.minor_axis_length)
    y1 = (y0 - np.sin(orientation) * 0.5
          * df_regions.minor_axis_length)
    x2 = (x0 - np.sin(orientation) * 0.5
          * df_regions.major_axis_length)
    y2 = (y0 - np.cos(orientation) * 0.5
          * df_regions.major_axis_length)
    df_regions['major_length'] = np.sqrt((x2 - x0)**2 + (y2 - y0)**2)
    df_regions['minor_length'] = np.sqrt((x1 - x0)**2 + (y1 - y0)**2)
    df_regions['circularity'] = (4 * np.pi * df_regions.filled_area
                                 / (df_regions.perimeter ** 2))

    # Display the segmented image with the region count in the title
    ax1.imshow(img_label)
    ax1.set_title(f'{fn} segmented: {df_regions.shape[0]}')

    df_regions['target'] = 'target'
    clean_rbc.append(df_regions)
252 red blood cells identified

We can then collect the identified regions into a single pandas DataFrame. (The circularity feature computed above is the standard 4πA/P² measure: it equals 1 for a perfect circle and shrinks for more elongated or irregular shapes.)

# Concatenate the per-image "clean_rbc" DataFrames into one
df_rbc = pd.concat([*clean_rbc])

# Retain the necessary features
req_feat = ['area', 'convex_area', 'bbox_area', 'eccentricity',
            'equivalent_diameter', 'extent', 'filled_area',
            'perimeter',
            'solidity', 'major_length', 'minor_length',
            'circularity', 'target']
df_rbc = df_rbc[req_feat]
Now we have a DataFrame that we can use for machine learning.
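Before handing it to a model, a quick sanity check on the assembled table never hurts. A small sketch, assuming the code above ran as-is:

# One row per detected cell, one column per feature
print(df_rbc.shape)
print(df_rbc.describe().T)  # per-feature summary statistics
df_rbc.head()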

For the machine learning step, I’ll be using PyCaret for a simple, low-code approach. Let’s build a regression example with circularity as the target variable and a 75–25 train-test split.

# where the magic happens
from pycaret.regression import *

df_pycaret = setup(data=df_rbc, target='circularity',
                   numeric_features=['area', 'convex_area', 'bbox_area',
                                     'eccentricity', 'equivalent_diameter',
                                     'extent', 'filled_area', 'perimeter',
                                     'solidity', 'major_length',
                                     'minor_length'],
                   train_size=0.75)
compare_models()
Pycaret’s initial results
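A usage note I am adding here: compare_models() cross-validates every model in PyCaret’s regression library, prints the leaderboard shown above, and also returns the top model, so you can capture it directly:

# Keep a handle on the leaderboard winner for later use
best = compare_models()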

Based on PyCaret’s initial results, the Extra Trees Regressor is the best-performing model for this dataset. Let’s try tuning it using a couple more lines of code.

et = create_model('et')
tuned_et = tune_model(et, choose_better=True)
Pycaret’s original model (left) and tuned model (right)

From the generated models, PyCaret suggests retaining the originally generated Extra Trees Regressor, since it provides the higher R-squared value (if we take that as the main metric). If R-squared is not the metric you have chosen, then the tuned Extra Trees is the way to go.
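If you want to take the chosen model further, PyCaret provides finalize_model to refit it on the full dataset, predict_model to score new data, and save_model to persist the pipeline. A minimal sketch (the variable names are mine, not from the original post):

# Refit the chosen model on the entire dataset (train + holdout)
final_et = finalize_model(et)

# Score the feature table; predict_model appends a prediction column
preds = predict_model(final_et, data=df_rbc.drop(columns='circularity'))

# Persist the fitted pipeline for later reuse
save_model(final_et, 'rbc_circularity_et')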

And there you have it: machine learning combined with image processing!
