Feature extraction from Medical Images and an introduction to “xtract-features.”

Vatsal Saglani
Mar 14 · 7 min read


With the advent of neural networks and advancements in Deep Learning for Computer Vision, we have stopped thinking about the features that these black-box models extract and how they affect the accuracy of the model. And with the recent advancements in Transfer Learning, we have also neglected the model-building process. We take a pre-trained model and change its last layers, adding our own layers based on the number of classes to be predicted for our data.

Eventually, this process of adding or removing layers becomes a trial-and-error method to get good “accuracy”, and we will probably get outstanding “accuracy” depending on the number of changes we make to the model. But in chasing good accuracy (which, mind you, is important), we have forgotten the essence of extracting various types of features from images: features related to the texture and shape of different regions in the image.

One may ask why we should go through the hassle of extracting features manually and then training a Machine Learning model. I have developed a Python package that removes the trouble of extracting features from each image and storing them sequentially in a data frame so that traditional Machine Learning models can be applied later. The package only works for 2D (single-channel) grayscale images.

Installing the Package

Installing OpenCV dependencies

sudo apt-get update
# OpenCV dependencies
sudo apt-get install build-essential checkinstall cmake pkg-config yasm
sudo apt-get install git gfortran
sudo apt-get install libjpeg8-dev libjasper-dev libpng12-dev
sudo apt-get install libtiff5-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev
sudo apt-get install libxine2-dev libv4l-dev
sudo apt-get install libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev
sudo apt-get install qt5-default libgtk2.0-dev libtbb-dev
sudo apt-get install libatlas-base-dev
sudo apt-get install libfaac-dev libmp3lame-dev libtheora-dev
sudo apt-get install libvorbis-dev libxvidcore-dev
sudo apt-get install libopencore-amrnb-dev libopencore-amrwb-dev
sudo apt-get install x264 v4l-utils
sudo apt-get install libprotobuf-dev protobuf-compiler
sudo apt-get install libgoogle-glog-dev libgflags-dev
sudo apt-get install libgphoto2-dev libeigen3-dev libhdf5-dev doxygen
sudo apt-get install python3-dev python3-pip

A. Install using PIP

# package installation 
pip3 install git+https://github.com/vatsalsaglani/xrayimage_extractfeatures.git

B. Clone and Install

git clone https://github.com/vatsalsaglani/xrayimage_extractfeatures.git
cd xrayimage_extractfeatures
python3 setup.py install

Introduction to the Package

The package includes the following modules:

  1. GLCM Features
  2. Moments
  3. Region Properties
  4. Helpers
  5. Feature Extraction
  6. Extras

1. GLCM Features

  • Correlation
  • Homogeneity
  • Energy
  • Contrast

But, what is GLCM?

The gray-level co-occurrence matrix (GLCM) is a statistical method of examining texture that considers the spatial relationship of pixels. It characterizes texture by counting how often pairs of pixels with specific values, in a specified spatial relationship, occur in an image.
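
To make the co-occurrence idea concrete, here is a minimal sketch in plain Python (nested lists stand in for a numpy image so it runs without dependencies). It counts horizontally adjacent pixel pairs and derives the four features listed above. The function names are hypothetical, and the formulas follow the definitions used by scikit-image (for example, energy as the square root of the summed squared entries); the article does not show how the package computes them internally, so treat this as an illustration only.

```python
from math import sqrt

def glcm_matrix(image, levels, dx=1, dy=0):
    """Normalized, symmetric co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # the pair (i, j)
                counts[j][i] += 1  # its mirror, which makes the matrix symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Contrast, homogeneity, energy and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    energy = sqrt(sum(v * v for _, _, v in cells))
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum(v * (i - mu_i) ** 2 for i, _, v in cells)
    var_j = sum(v * (j - mu_j) ** 2 for _, j, v in cells)
    cov = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells)
    # A perfectly flat image has zero variance; report full correlation then.
    correlation = cov / sqrt(var_i * var_j) if var_i * var_j > 0 else 1.0
    return {
        "contrast": contrast,
        "homogeneity": homogeneity,
        "energy": energy,
        "correlation": correlation,
    }

# A tiny 4-level image; real images would first be quantized to a few gray levels.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm_matrix(image, levels=4)
features = glcm_features(p)
```

Smooth regions put most of the mass on the diagonal of the matrix (low contrast, high homogeneity), while busy textures spread it away from the diagonal.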

Extracting GLCM Features:

from xtract_features.glcms import *
feats = glcm(img)
# energy
energy = feats.energy()
# correlation
corr = feats.correlation()
# contrast
cont = feats.contrast()
# homogeneity
homogeneity = feats.homogeneity()
# all the features at once
_all = feats.glcm_all()

2. Moments

  • 24 variant image moment values
  • Hu Moments

How are moments used for images?

Image moments are used to describe objects after segmentation and play an essential role in object recognition and shape analysis. They may also be employed for pattern recognition in images. Simple image properties that can be derived from raw moments include the area (or the sum of gray levels).
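
As a small worked example of what a raw moment is, the sketch below computes M_pq as the sum of x^p * y^q * I(x, y) over all pixels, again in plain Python. The helper name raw_moment is hypothetical and is not part of the package; it only illustrates how the area and the centroid fall out of the lowest-order moments.

```python
def raw_moment(image, p, q):
    """Raw moment M_pq: sum over all pixels of x**p * y**q * I(x, y)."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )

# A 2x2 white blob in the middle of a 4x4 binary image.
blob = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
area = raw_moment(blob, 0, 0)                # M00: area of a binary image
centroid_x = raw_moment(blob, 1, 0) / area   # M10 / M00
centroid_y = raw_moment(blob, 0, 1) / area   # M01 / M00
```

For a grayscale image the same M00 is the sum of gray levels rather than a pixel count.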

Extracting Moments from an Image

from xtract_features.moments import *
_moments = moment(img).get_moments()
_hu_moments = moment(img).get_HuMoments()

_moments is a list of the 24 variant moments, and _hu_moments is a list of the 7 Hu moments, which are invariant to translation, scale and rotation.

3. Region Properties

Extracting Region Properties

from xtract_features.region_props import *
_rp = region_props(img)
# maximum area region
max_area = _rp.max_area()
# plot regions
# plot black and white
# plot with labels
# mean of areas of all the regions
# eccentricity of the highest area region

Apart from the functions given above, the package contains 20 more functions to extract various other features from an image.

# standard deviation of all the areas of the regions of the given image
# Otsu's threshold
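
Region properties such as the maximum or mean area are statistics over the connected regions of a thresholded image. The sketch below shows that idea with a simple breadth-first flood fill over a binary mask, assuming 4-connectivity. The function region_areas is hypothetical; the package's own implementation is not shown in this article and may well differ.

```python
from collections import deque
from statistics import mean, pstdev

def region_areas(binary):
    """Areas (pixel counts) of the 4-connected foreground regions of a binary mask."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # New region found: flood-fill it and count its pixels.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas

mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
]
areas = region_areas(mask)
largest_area = max(areas)     # area of the biggest region
mean_area = mean(areas)       # mean of the areas of all regions
std_area = pstdev(areas)      # standard deviation of the areas
```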

4. Helpers

This module provides some basic functionality: extracting the numpy-image-arrays from a folder of DICOM files given its path, saving the list of numpy-image-arrays obtained from that path to a pickle file, and loading any previously saved pickle file.

a. Extract list of numpy-image-arrays: extract_img_array()

from xtract_features.helpers import extract_img_array
# get list of numpy image arrays and a list of filename/ids
numpy_list, ids = extract_img_array('path-to-image-folder', getID = True)
# only get a list of numpy image arrays
numpy_list, ids = extract_img_array('path-to-image-folder')
# here the ids list will be an empty list

b. Save Pickle: save_pickle()

Given the extracted list of numpy-image-arrays or list-of-image-ids/names we can save it using the save_pickle() function.

from xtract_features.helpers import save_pickle
save_pickle(numpy_list, "numpy-list")
save_pickle(ids, "ids-list")

c. Load Pickle: load_pickle()

Given any saved .pkl file we can load it using the load_pickle() function.

from xtract_features.helpers import load_pickle
np_list = load_pickle("numpy-list")
ids = load_pickle("ids-list")
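
For readers curious what such helpers typically look like, here is a rough sketch built on the standard pickle module. The names save_pickle_sketch and load_pickle_sketch, the folder parameter, and the choice to append a ".pkl" extension to the given name are all assumptions made for illustration; they are not taken from the package's source.

```python
import pickle
from pathlib import Path

def save_pickle_sketch(obj, name, folder="."):
    """Write obj to <folder>/<name>.pkl and return the path that was written."""
    path = Path(folder) / f"{name}.pkl"
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    return path

def load_pickle_sketch(name, folder="."):
    """Read back the object stored in <folder>/<name>.pkl."""
    with (Path(folder) / f"{name}.pkl").open("rb") as fh:
        return pickle.load(fh)
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.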

d. Show Image: show()

Display the image stored in the form of numpy-image-array using the show() function.

from xtract_features.helpers import show
# show with title
show(np_list[1], title = ids[1])
# show without title

e. Plots: plots()

Use the plots() function to display a list of images stored as numpy-image-arrays.

from xtract_features.helpers import plots
# plots with titles
plots(np_list[:8], titles = ids[:8])
# plots without titles

5. Feature Extraction

By now you may be waiting for the promise made at the start to be fulfilled: extracting a bunch of features from every image inside a folder and saving them into a data frame. Well, everyone, the time has finally come to unveil the functionality that adds the most value to this package.

But before jumping into that let’s see some other functionalities of this module.

a. Entropy

from xtract_features.extract import s_entropy, entropy_simple
# shannon's entropy
s_entr = s_entropy(img)
# simple entropy
entr_simp = entropy_simple(img)
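
As a reminder of what Shannon entropy measures for an image, the sketch below computes it from the histogram of pixel values: H = -sum(p * log2(p)) over the observed gray levels. The function name shannon_entropy is hypothetical and this is not the package's s_entropy implementation, which is not shown here.

```python
from collections import Counter
from math import log2

def shannon_entropy(image):
    """Shannon entropy (in bits) of the gray-level distribution of an image."""
    pixels = [value for row in image for value in row]
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

flat = [[5, 5], [5, 5]]      # a single gray level: no uncertainty
halves = [[0, 0], [1, 1]]    # two equally likely levels
varied = [[0, 1], [2, 3]]    # four equally likely levels

flat_entropy = shannon_entropy(flat)
halves_entropy = shannon_entropy(halves)
varied_entropy = shannon_entropy(varied)
```

The more evenly the gray levels are spread, the higher the entropy: here the three images give 0, 1 and 2 bits respectively.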

b. Feature Dictionary from Image Path: feature_dict_from_imgpath()

  • getId = True: The keys of the generated feature dictionary are the image-names/image-ids extracted while generating the numpy-image-array list.
  • getId = False (default): The keys of the generated feature dictionary are integers, each corresponding to a list of features.

from xtract_features.extract import feature_dict_from_imgpath
# getId = True
data_d = feature_dict_from_imgpath('path-of-image-folder', ids, getId = True)
# getId = False
data_d = feature_dict_from_imgpath('path-of-image-folder', [])

c. Feature Dictionary from Image Array List: feature_dict_from_imgarray()

  • getId = True: The keys of the generated feature dictionary are the image-names/image-ids extracted while generating the numpy-image-array list.
  • getId = False (default): The keys of the generated feature dictionary are integers, each corresponding to a list of features.

from xtract_features.extract import feature_dict_from_imgarray
# getId = True
data_d = feature_dict_from_imgarray(numpy_list, ids, getId = True)
# getId = False
data_d = feature_dict_from_imgarray(numpy_list, [])

d. Get Data frame from Image Path: get_df_from_path()

  • getId = True: if you need the features .csv file to have corresponding image-names/ids for their feature values
  • getId = False (default): the output file will be numbered and will not contain any image-name/id corresponding to the feature value

from xtract_features.extract import get_df_from_path
# getId = True
df = get_df_from_path('path-to-image-folder', ids, getId = True)
# getId = False (default)
df = get_df_from_path('path-to-image-folder', [])

e. Get Data frame from Image Array list: get_df_from_img_array()

  • getId = True: if you need the features .csv file to have corresponding image-names/ids for their feature values
  • getId = False (default): the output file will be numbered and will not contain any image-name/id corresponding to the feature value

from xtract_features.extract import get_df_from_img_array
# getId = True
df = get_df_from_img_array(numpy_list, ids, getId = True)
# getId = False (default)
df = get_df_from_img_array(numpy_list, [])
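
The general pattern behind these functions can be sketched as follows: run a set of feature functions over every image, collect one row per image in a dictionary keyed either by image id or by a running integer, and then flatten that dictionary into a table. Everything in this sketch (build_feature_dict, feature_dict_to_csv, the get_id flag and the two toy features) is hypothetical and only mirrors the behaviour described above; it uses the standard csv module instead of a data frame so that it runs without extra dependencies.

```python
import csv
import io

def build_feature_dict(images, ids, extractors, get_id=False):
    """Map each image to a {feature_name: value} row.

    Keys are the image ids when get_id is True, otherwise running integers.
    """
    keys = ids if get_id else range(len(images))
    return {
        key: {name: fn(img) for name, fn in extractors.items()}
        for key, img in zip(keys, images)
    }

def feature_dict_to_csv(feature_dict, key_column="id"):
    """Serialize the feature dictionary as CSV text, one row per image."""
    first_row = next(iter(feature_dict.values()))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[key_column, *first_row])
    writer.writeheader()
    for key, row in feature_dict.items():
        writer.writerow({key_column: key, **row})
    return buffer.getvalue()

# Two toy features standing in for the GLCM, moment and region features.
extractors = {
    "mean": lambda img: sum(map(sum, img)) / (len(img) * len(img[0])),
    "max": lambda img: max(map(max, img)),
}
images = [[[0, 2], [4, 6]], [[1, 1], [1, 1]]]
features = build_feature_dict(images, ["scan-a", "scan-b"], extractors, get_id=True)
numbered = build_feature_dict(images, [], extractors)
csv_text = feature_dict_to_csv(features)
```

With pandas installed, pandas.DataFrame.from_dict(features, orient="index") would turn the same dictionary into a data frame with one row per image.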

6. Extras

  • 2D Convolutions
  • Segmentation

1. 2D Convolutions: conv2d()

For edge detection, sharpening and blurring the image we use 2D convolutions.

from xtract_features.twodconv import conv2d
conv2d(image, "kernel-name")

There are 14 convolution kernels/matrices available inside the package:

  • identity
  • edge-all
  • edge-H
  • edge-V
  • sharp
  • gauss-3
  • gauss-5
  • boxblur
  • unsharp
  • gradient-H
  • gradient-V
  • sobel-H
  • sobel-V
  • emboss
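
To show what applying such a kernel involves, here is a small dependency-free sketch of a "valid" 2D convolution (no padding, so the output shrinks by the kernel size minus one). The function convolve2d and the two kernels below are illustrative assumptions: the identity kernel and the standard horizontal-gradient Sobel kernel are textbook values, and the article does not list the exact matrices the package uses for its named kernels.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution: the kernel is flipped and slid over the image."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out_rows = len(image) - kh + 1
    out_cols = len(image[0]) - kw + 1
    return [
        [
            sum(
                flipped[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
# Standard Sobel kernel for the horizontal gradient; it responds to vertical edges.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# A dark-to-bright vertical step edge.
step = [[0, 0, 10, 10] for _ in range(4)]

unchanged = convolve2d(step, IDENTITY)   # the interior pixels, untouched
edges = convolve2d(step, SOBEL_X)        # strong response along the edge
```

The sign of the Sobel response depends on the kernel flip, so it is common to look at its absolute value when the goal is simply to find edges.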

2. Segmentation: water_seg()

For now the package includes only one segmentation technique, Watershed Segmentation.

from xtract_features.segmentations import water_seg
water_seg(image)

If this article helped you in any way and you liked it, please show your appreciation by sharing it with your community. It’s my first article on feature extraction from images and image processing, and I have put everything I know about it into this article.

Every piece of feature extraction code is available on the internet, but it is scattered. The sole purpose of this package, “xtract_features”, is to bring those pieces together in a single place and spare everyone the hassle of writing a separate block of code for each feature. Please feel free to fork this package and add some interesting things to it, or point out changes that would make the code more efficient.

*An example notebook is available here; go through it if you run into any difficulties.

To know more about me please click here and if you find something interesting just shoot me a mail and we could have a chat over a cup of ☕️.

Written by Vatsal Saglani

A Full Stack Web developer who also is a Machine Learning and AI enthusiast and a neophyte in NLP and native applications development.
