HumanPartSegmentation: A Machine Learning Model for Segmenting Human Parts

David Cochard
Published in axinc-ai
May 27, 2021

This is an introduction to「HumanPartSegmentation」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

HumanPartSegmentation is based on Self-Correction for Human Parsing (SCHP), a machine learning model released by Baidu Research in October 2019 that segments the different parts of a person.

The following parts are supported.

CATEGORY = (
    'Background', 'Hat', 'Hair', 'Glove', 'Sunglasses', 'Upper-clothes', 'Dress', 'Coat',
    'Socks', 'Pants', 'Jumpsuits', 'Scarf', 'Skirt', 'Face', 'Left-arm', 'Right-arm',
    'Left-leg', 'Right-leg', 'Left-shoe', 'Right-shoe'
)
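To make the list concrete, here is a minimal sketch of how a per-pixel label map (one class index per pixel) could be summarized by part name. The `label_map` values and the `count_parts` helper are hypothetical and are not part of ailia SDK.

```python
from collections import Counter

CATEGORY = (
    'Background', 'Hat', 'Hair', 'Glove', 'Sunglasses', 'Upper-clothes',
    'Dress', 'Coat', 'Socks', 'Pants', 'Jumpsuits', 'Scarf', 'Skirt', 'Face',
    'Left-arm', 'Right-arm', 'Left-leg', 'Right-leg', 'Left-shoe', 'Right-shoe',
)


def count_parts(label_map):
    """Count pixels per part name in a 2D list of class indices."""
    counts = Counter(idx for row in label_map for idx in row)
    return {CATEGORY[idx]: n for idx, n in sorted(counts.items())}


# Hypothetical 3x4 label map: 0 = Background, 2 = Hair, 13 = Face
label_map = [
    [0, 2, 2, 0],
    [0, 13, 13, 0],
    [0, 13, 13, 0],
]
part_counts = count_parts(label_map)
print(part_counts)
```

The class index is simply the position of the part name in `CATEGORY`, so index 0 is the background and index 13 is the face.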

Below is a result on an input image.

Source: https://github.com/PeikeLi/Self-Correction-Human-Parsing/blob/master/demo/demo.jpg
Inference result
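A result image like this one is usually produced by mapping each class index to a color. The sketch below builds a PASCAL VOC-style palette by spreading the bits of the class index over the R, G and B channels. Whether the ailia sample uses exactly this palette is an assumption, and `make_palette` and `colorize` are illustrative names.

```python
def make_palette(num_classes):
    """Return one (R, G, B) tuple per class index, PASCAL VOC style."""
    palette = []
    for label in range(num_classes):
        r = g = b = 0
        bit = 7  # start at the most significant bit of each channel
        while label:
            r |= ((label >> 0) & 1) << bit
            g |= ((label >> 1) & 1) << bit
            b |= ((label >> 2) & 1) << bit
            bit -= 1
            label >>= 3  # consume three bits of the index per round
        palette.append((r, g, b))
    return palette


def colorize(label_map, palette):
    """Replace every class index in a 2D list with its color."""
    return [[palette[idx] for idx in row] for row in label_map]


palette = make_palette(20)
# Hypothetical 2x2 label map: 0 = Background, 2 = Hair, 13 = Face
colored = colorize([[0, 2], [13, 13]], palette)
print(colored)
```

With this scheme the background stays black and every one of the 20 classes gets a distinct color.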

Architecture

HumanPartSegmentation was trained on the 50,000 images of the LIP (Look Into Person) dataset, but this dataset presents some challenges. In ordinary segmentation, all the pixels belonging to one instance share the same semantic label. In human part segmentation, however, the ambiguous boundaries between different semantic parts make annotation more expensive and often result in noise and mislabeling in the ground truth (GT) data.

Example of noise in the GT data (Source: https://arxiv.org/pdf/1910.09777.pdf)

Self-Correction for Human Parsing (SCHP) assumes that the dataset contains noise. It combines a dedicated loss function applied to edges, which produces class-agnostic boundaries, with a self-correction method that refines the GT labels, in order to achieve more accurate segmentation.
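A class-agnostic boundary only records where the label changes, not which parts meet there. As a minimal sketch, assuming an edge target is derived from the part labels by comparing each pixel with its four neighbors (the exact procedure used by the authors may differ, and `boundary_map` is a hypothetical helper):

```python
def boundary_map(label_map):
    """Mark pixels that touch a pixel of a different class (4-neighborhood)."""
    height, width = len(label_map), len(label_map[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and label_map[ny][nx] != label_map[y][x]:
                    edges[y][x] = 1  # 1 = boundary, whatever the classes are
    return edges


# Hypothetical label map: 0 = Background, 5 = Upper-clothes, 9 = Pants
label_map = [
    [0, 0, 0, 0],
    [0, 5, 5, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
edges = boundary_map(label_map)
for row in edges:
    print(row)
```

The output is a binary map, so the boundary between clothes and pants is marked the same way as the boundary between the person and the background.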

The network architecture uses ResNet-101 as the backbone and is known as Context Embedding with Edge Perceiving (CE2P). CE2P was first introduced in Devil in the Details: Towards Accurate Single and Multiple Human Parsing, published in September 2018, and improves accuracy by applying a specific loss function to the edges between segmented parts. Traditionally, learning is based on the assumption that the GT data is correct, but SCHP assumes that the GT segmentation contains noise and handles the data accordingly.

Source: https://arxiv.org/pdf/1910.09777.pdf

One characteristic of SCHP is its self-correcting learning cycle, which modifies the labels of the ground truth data as the model learns. As shown in Distilling the Knowledge in a Neural Network, published in March 2015, multiclass predictions are known to contain "dark knowledge", that is, the relative probabilities assigned to the incorrect classes. By using pseudo-masks (masks predicted by the model), you can generate soft-target labels that contain this dark knowledge, as opposed to one-hot labels that contain only the correct answer.
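The difference between a one-hot label and a soft target can be illustrated with a temperature-scaled softmax, the mechanism used in the distillation paper. This is a minimal sketch: the scores in `logits` are hypothetical, and how SCHP itself builds its soft targets is not shown here.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw class scores into probabilities; higher temperature = softer."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, num_classes):
    """Label that keeps only the correct class."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


# Hypothetical scores for one pixel over three classes:
# Upper-clothes, Coat, Pants
logits = [4.0, 3.0, -2.0]
hard = one_hot(logits.index(max(logits)), len(logits))
soft = softmax(logits)
softer = softmax(logits, temperature=4.0)
print(hard)
print(soft)
print(softer)
```

The one-hot label only says "Upper-clothes", whereas the soft target also records that "Coat" was a plausible alternative and "Pants" was not.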

From the perspective of distillation, SCHP produces less noisy teacher labels and a more accurate model by repeating a cycle: train on the GT labels, re-label the data with the trained model, then train again on the new labels.
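The train / re-label / train cycle can be sketched as a toy loop for a single pixel. Everything here is an assumption made for illustration: `toy_model_predict` is a stand-in for a trained network, and the running-average update in `refine_labels` is a simplified choice, not the authors' implementation.

```python
def refine_labels(current, predicted, cycle):
    """Blend the current label distribution with the model's new prediction.

    After `cycle` cycles the old labels keep a weight of cycle / (cycle + 1),
    so the result is a running average over all cycles seen so far.
    """
    keep = cycle / (cycle + 1)
    return [keep * c + (1 - keep) * p for c, p in zip(current, predicted)]


def toy_model_predict():
    """Stand-in for a trained network that favors class 1 for this pixel."""
    return [0.1, 0.8, 0.1]


# A pixel annotated (possibly wrongly) as class 0 in the original GT
labels = [1.0, 0.0, 0.0]
history = [labels]
for cycle in range(1, 4):
    predicted = toy_model_predict()  # "train", then predict with the model
    labels = refine_labels(labels, predicted, cycle)  # re-label
    history.append(labels)

for step, dist in enumerate(history):
    print(step, [round(p, 3) for p in dist])
```

In this toy run the label starts as a hard vote for class 0 and gradually shifts toward class 1 as the model's predictions accumulate, which is the intuition behind correcting noisy annotations over several cycles.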

The resulting model ranked 1st in the CVPR 2019 LIP Challenge.

Usage

You can use the following command to run HumanPartSegmentation on a webcam video stream with ailia SDK.

$ python3 human_part_segmentation.py -v 0

Here are some results.

Related topic

ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.

ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.
