Yearn the Learn: Image Processing with Machine Learning and Python
In my previous post, we segmented images using multiple approaches such as blob detection and connected components. Have you ever thought that, once these objects are identified, we can actually generate features from the resulting labels?
Let’s take a look at the red blood cells from my blob detection article once again and copy over the necessary code and outputs. What we will reference back to are the following:
- Source Image
- Data Cleaning Steps
- Image Segmentation
Once these steps are done, we can move on to the machine learning part.
Source Image and Data Cleaning
#library imports
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from skimage.io import imread, imshow
from skimage.color import rgb2gray
from skimage.measure import label, regionprops, regionprops_table
from skimage.morphology import (erosion, dilation, closing, opening,
                                area_closing, area_opening)
from glob import glob

#Source Image
rbc = imread("medium/rbc.jpg")

#Image preprocessing and data cleaning
im2 = rgb2gray(rbc)
im2_bw = im2 < 0.85
img_morph = area_closing(area_opening(im2_bw, 200), 200)
imshow(img_morph)
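To see what the threshold-then-morphology step does in isolation, here is a minimal sketch on a synthetic binary image (the array and sizes are illustrative, not the RBC data); `area_opening` removes small bright objects below the area threshold, and `area_closing` then fills small dark holes:

```python
import numpy as np
from skimage.morphology import area_opening, area_closing

# Synthetic binary image: one large blob (a stand-in "cell") plus speckle noise
img = np.zeros((50, 50), dtype=np.uint8)
img[10:30, 10:30] = 1   # large object, area 400
img[40, 40] = 1         # single-pixel noise

# area_opening removes bright structures with area < 200;
# area_closing then fills dark holes with area < 200
cleaned = area_closing(area_opening(img, 200), 200)

print(int(cleaned.sum()))   # only the 400-pixel blob survives
```

The single noisy pixel disappears while the large blob is untouched, which is exactly why the morphological cleanup runs before labeling.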
Now that preprocessing is complete, we can start segmenting the blood cells.
#initialize rbc for labeling and segmentation
rbc_files = glob('medium/rbc.jpg')

#initialize properties for features
properties = ['area', 'centroid', 'convex_area',
              'bbox', 'bbox_area', 'eccentricity', 'equivalent_diameter',
              'extent', 'filled_area',
              'major_axis_length', 'minor_axis_length',
              'perimeter', 'orientation', 'solidity']

clean_rbc = []  # list of per-image region-property DataFrames
fig = plt.figure(figsize=(20, 20))
file_count = len(rbc_files)
thres = 0.4
no_col = 6
no_row = int(np.ceil(file_count * 2 / no_col))
gs = gridspec.GridSpec(no_row, no_col)

for i, file in enumerate(rbc_files):
    # Show images
    # Subplots were laid out to accommodate image prefixes with
    # different shots (ex. rbc_1, rbc_2, etc.)
    img = imread(file)
    fn = os.path.basename(file).split('.')[0]
    ax0 = fig.add_subplot(gs[i * 2])
    ax1 = fig.add_subplot(gs[i * 2 + 1])
    ax0.axis('off')
    ax1.axis('off')

    # Display threshold image
    ax0.imshow(img)
    ax0.set_title(fn)

    # Get region properties of image
    img_label = label(img_morph)
    df_regions = pd.DataFrame(regionprops_table(img_label,
                                                properties=properties))

    # Filter regions using area
    area_thres = df_regions.convex_area.mean() / 3
    df_regions = df_regions.loc[df_regions.convex_area > area_thres]

    # Drop regions whose bounding box spans the full image height or width
    mask_equal_height = ((df_regions['bbox-2'] - df_regions['bbox-0'])
                         != img_label.shape[0])
    mask_equal_width = ((df_regions['bbox-3'] - df_regions['bbox-1'])
                        != img_label.shape[1])
    df_regions = df_regions.loc[mask_equal_height & mask_equal_width]

    # Compute derived features
    y0, x0 = df_regions['centroid-0'], df_regions['centroid-1']
    orientation = df_regions.orientation
    x1 = (x0 + np.cos(df_regions.orientation) * 0.5
          * df_regions.minor_axis_length)
    y1 = (y0 - np.sin(df_regions.orientation) * 0.5
          * df_regions.minor_axis_length)
    x2 = (x0 - np.sin(df_regions.orientation) * 0.5
          * df_regions.major_axis_length)
    y2 = (y0 - np.cos(df_regions.orientation) * 0.5
          * df_regions.major_axis_length)
    df_regions['major_length'] = np.sqrt((x2 - x0)**2 + (y2 - y0)**2)
    df_regions['minor_length'] = np.sqrt((x1 - x0)**2 + (y1 - y0)**2)
    df_regions['circularity'] = (4 * np.pi * df_regions.filled_area
                                 / (df_regions.perimeter ** 2))

    # Display segmented image
    ax1.imshow(img_label)
    ax1.set_title(f'{fn} segmented: {df_regions.shape[0]}')

    df_regions['target'] = 'target'
    clean_rbc.append(df_regions)
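The feature-extraction step above can be sanity-checked on a synthetic shape: a filled disc should yield a circularity near 1, since circularity is 4πA/P² and equals exactly 1 for a perfect circle. This sketch uses the generic `area` property ('filled_area' in the loop above; identical for a solid shape):

```python
import numpy as np
import pandas as pd
from skimage.draw import disk
from skimage.measure import label, regionprops_table

# Synthetic image: one filled disc of radius 20
img = np.zeros((100, 100), dtype=bool)
rr, cc = disk((50, 50), 20)
img[rr, cc] = True

labels = label(img)
df = pd.DataFrame(regionprops_table(
    labels, properties=['area', 'perimeter', 'eccentricity']))

# Same derived feature as in the loop: 4*pi*area / perimeter^2
df['circularity'] = 4 * np.pi * df.area / df.perimeter ** 2
print(df.circularity.iloc[0])   # close to 1 for a disc
```

Elongated regions score well below 1 on this feature, which is why circularity is a useful discriminator for roughly round objects like red blood cells.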
We can then transform the identified regions into a pandas DataFrame.
#Concatenate the "clean_rbc" DataFrames into a single dataframe
df_rbc = pd.concat([*clean_rbc])

#Retain necessary features
req_feat = ['area', 'convex_area', 'bbox_area', 'eccentricity',
            'equivalent_diameter', 'extent', 'filled_area',
            'perimeter',
            'solidity', 'major_length', 'minor_length',
            'circularity', 'target']
df_rbc = df_rbc[req_feat]
For the machine learning part, I’ll be using PyCaret for a simple, low-code approach. Let’s build a regression example with circularity as the target variable and a 75–25 train–test split.
#where the magic happens
from pycaret.regression import *

df_pycaret = setup(data=df_rbc, target='circularity',
                   numeric_features=['area', 'convex_area', 'bbox_area',
                                     'eccentricity', 'equivalent_diameter',
                                     'extent', 'filled_area', 'perimeter',
                                     'solidity', 'major_length',
                                     'minor_length'],
                   train_size=0.75)
compare_models()
Based on PyCaret’s initial results, the Extra Trees Regressor is the best-performing model for this dataset. Let’s try tuning it with a couple more lines of code:
et = create_model('et')
tuned_et = tune_model(et, choose_better=True)
From the generated models, PyCaret suggests retaining the original Extra Trees Regressor, as it provides a higher R-squared value (if we are to use that as the main metric). On the other hand, if R-squared is not the metric you have chosen, then the tuned Extra Trees may be the way to go.
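PyCaret wraps scikit-learn under the hood, so a rough equivalent of the split-train-evaluate flow can be written directly against scikit-learn. This is only a sketch: the synthetic features below stand in for the real `df_rbc` table, and the seeds are arbitrary:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the RBC feature table:
# target is a noisy linear function of four features
X = rng.random((200, 4))
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.01 * rng.standard_normal(200)

# 75-25 split, matching train_size=0.75 in the PyCaret setup
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.75,
                                          random_state=0)

model = ExtraTreesRegressor(random_state=0).fit(X_tr, y_tr)
r2 = r2_score(y_te, model.predict(X_te))
print(round(r2, 3))
```

This is essentially what `create_model('et')` does behind the scenes, minus PyCaret’s cross-validation and preprocessing pipeline.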
And there you have it! Machine learning combined with image processing.
Interested in my work? You can see more stories over at my profile.