Predict material properties from an image using a neural net in 50 lines of code with PyTorch

Sagi eppel
8 min read · Dec 11, 2021


The goal of this tutorial is to show how to predict a continuous property, such as color or fill level, from an image in the shortest way possible, using a PyTorch neural net for image classification. We will learn to load an existing net, modify it to predict a specific property, and train it, all in less than 50 lines of code (not including blank lines).

Standard neural nets are usually focused on classification problems, like distinguishing cats from dogs. However, these networks can easily be modified to predict continuous properties from images, such as age, size, or price. Our goal here is to predict the color of a liquid inside a transparent container: we will take an image and predict the color as 3 numbers corresponding to the RGB value of the liquid. Predicting the color of a material might seem trivial, but note that the color that appears in an image depends on the illumination and the scene. Inferring the true color of an object from an image is a rather hard task for computers.

Example images: the goal is to predict the color of the liquid in the vessel from the image, as 3 numbers corresponding to the liquid's RGB color.

For training the net, we will use the TransProteus dataset, which is copyright-free. A sample of this dataset, suitable for fast training, can be downloaded from: https://icedrive.net/0/cfijRDl62i

You will also need to install PyTorch and OpenCV (used for reading images).

OpenCV can be installed using:

pip install opencv-python

First, let’s import packages and define the main training parameters:

import os
import numpy as np
import cv2
import torch
import torchvision.models
import torchvision.transforms as tf
import json

Learning_Rate=1e-4
width=height=900 # image width and height
batchSize=4

Learning_Rate is the step size of the gradient descent during training.

width and height are the dimensions of the image used for training. All images will be resized to this size during the training process.

batchSize is the number of images that will be used in each iteration of the training.

batchSize*width*height is proportional to the memory requirement of the training. Depending on your hardware, it might be necessary to use a smaller batchSize to avoid out-of-memory problems.
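As a very rough illustration (assuming float32 inputs and counting only the input batch itself; the intermediate activations inside the net take far more memory than this):

# Approximate size of one input batch: batchSize * channels * height * width * 4 bytes
print(batchSize * 3 * height * width * 4 / 1e6, "MB")  # 4*3*900*900*4 bytes, about 38.9 MB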

Note that since we train with only a single image size, the trained net is likely to work well only on images of this size. In most cases, you would want to vary the image size between training batches (a sketch of this follows below).
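A minimal sketch of how the size could be varied per batch (the RandomSize helper and the size range are illustrative, not part of the original script); since ReadRandomImage builds its transform from the global height and width, reassigning them before each batch is enough:

def RandomSize(min_size=300, max_size=900): # pick a random training resolution
    s = np.random.randint(min_size, max_size + 1)
    return s, s # (height, width)

# inside the training loop, before loading each batch:
# height, width = RandomSize()
# images, color = LoadBatch()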

Next, we create a list of all images in the dataset:

TrainFolder=r"Transproteus/FlatSurfaceLiquids/"
ListImages=os.listdir(TrainFolder) # Create a list of images in the dataset folder

Here TrainFolder is the folder where the training data is stored.

Next, we create a function that will allow us to load a random image and the color corresponding to the liquid in this image:

def ReadRandomImage():
    idx=np.random.randint(0,len(ListImages)) # Select a random image
    Img=cv2.imread(os.path.join(TrainFolder,ListImages[idx],"VesselWithContent_Frame_0_RGB.jpg"))
    with open(os.path.join(TrainFolder,ListImages[idx],"ContentMaterial.json")) as f: # load liquid data
        MaterialProperties = json.load(f)
    color=MaterialProperties['Base Color'][:3] # read color

    # create transformation and transform the image into a pytorch tensor
    transformImg = tf.Compose([tf.ToPILImage(), tf.Resize((height, width)), tf.ToTensor(), tf.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])

    Img=transformImg(Img)
    return Img,torch.tensor(color)

In the first part, we pick a random index from the list of images and load the image corresponding to this index:

idx=np.random.randint(0,len(ListImages)) # Select a random image
Img=cv2.imread(os.path.join(TrainFolder,ListImages[idx],"VesselWithContent_Frame_0_RGB.jpg"))

Next, we want to load the color of the liquid in the image from the JSON file.

with open(os.path.join(TrainFolder,ListImages[idx],"ContentMaterial.json")) as f: # load data from file
    MaterialProperties = json.load(f)
color=MaterialProperties['Base Color'][:3] # read color

Next, we define a set of transformations that will be performed on the image using the TorchVision transform module:

transformImg=tf.Compose([tf.ToPILImage(),tf.Resize((height,width)),tf.ToTensor(),tf.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])

This defines a set of transformations that will be applied to the image: converting to PIL format (the standard input format for the transforms), resizing, and converting to a PyTorch tensor. We also normalize the pixel intensities by subtracting the mean and dividing by the standard deviation of the pixel intensities. The mean and standard deviation were calculated beforehand on a large set of images (the standard ImageNet statistics).
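For illustration, tf.Normalize is just a per-channel shift and scale; the following small sketch (not part of the tutorial's code, and reusing the torch import from above) computes the same thing manually:

x = torch.rand(3, height, width) # a fake image tensor with values in the range [0,1]
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
x_norm = (x - mean) / std # equivalent to tf.Normalize(mean, std) applied to x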

Once we define the transformations we apply them to the image:

Img=transformImg(Img)

For training, we need to use a batch of images. This means several images stacked on top of each other in a 4D matrix. We create the batch using the function:

def LoadBatch(): # Load a batch of images
    images = torch.zeros([batchSize,3,height,width])
    color = torch.zeros([batchSize,3])
    for i in range(batchSize):
        images[i],color[i]=ReadRandomImage()
    return images,color

The first line creates an empty 4D matrix that will store the images, with dimensions [batchSize, channels, height, width], where channels is the number of layers per image; this is 3 for an RGB image. batchSize is the number of images used in every training step:

images = torch.zeros([batchSize,3,height,width])

The next line creates an array where the color of each image will be stored. Note that it has a size of [batchSize, 3], which means we store 3 values per image (R, G, B):

color = torch.zeros([batchSize,3])

The next part loads a set of images and their colors into the empty matrices, using the ReadRandomImage() function we defined earlier:

for i in range(batchSize):
    images[i],color[i]=ReadRandomImage()

Now that we can load our data, it's time to load the neural net:

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
Net = torchvision.models.resnet50(pretrained=True) # Load net
Net.fc = torch.nn.Linear(in_features=2048,out_features=3,bias=True)
Net=Net.to(device)
optimizer = torch.optim.Adam(params=Net.parameters(),lr=Learning_Rate)

The first part checks whether the computer has a GPU or only a CPU. If a CUDA GPU is available, the training will be done on the GPU:

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

For any practical dataset, training using a CPU is extremely slow.

Next, we load the net for image classification:

Net = torchvision.models.resnet50(pretrained=True)

torchvision.models contains many useful models for image classification. ResNet50 is a strong model that works well in most cases (the number refers to the number of layers in the net).

By setting pretrained=True, we load the net with weights pretrained on the ImageNet dataset. It is always better to start from a pretrained model when learning a new problem, since it allows the net to build on previous experience and converge faster.

We can see all the layers of the net we just loaded by writing:

print(Net)

(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=2048, out_features=1000, bias=True)

This prints the layers of the net in the order they are applied (only the last two are shown above). The final layer of the network is a linear transformation with 2048 inputs and 1000 outputs. The 1000 outputs correspond to the number of output classes (this net was trained on ImageNet, which classifies each image into one of 1000 classes). Since we only want to predict 3 values, we replace this layer with a new linear layer with 3 outputs:

Net.fc = torch.nn.Linear(in_features=2048,out_features=3,bias=True)

To be fair, this part is optional, since a net with 1000 output channels can predict 3 values simply by ignoring the remaining 997 channels (see the sketch below). But it's more elegant this way.
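For illustration only, keeping the original 1000-output layer would look something like this (a sketch, not the approach used in the rest of the tutorial):

Net = torchvision.models.resnet50(pretrained=True) # keep the original fc layer with 1000 outputs
Pred = Net(images)[:, :3] # treat the first 3 outputs as R,G,B and ignore the remaining 997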

Next, we load the net into our GPU or CPU device:

Net=Net.to(device)

Finally, we load an optimizer:

optimizer=torch.optim.Adam(params=Net.parameters(),lr=Learning_Rate) # Create adam optimizer

The optimizer controls how the net's weights are updated during the backpropagation step of the training. The Adam optimizer is one of the fastest optimizers available.

Finally, we start the training loop:

AverageLoss=np.zeros([50]) # Array to store the loss of the last 50 steps
for itr in range(20000): # Training loop
    images,prop=LoadBatch() # Load training batch
    images=torch.autograd.Variable(images,requires_grad=False).to(device)
    color=torch.autograd.Variable(prop,requires_grad=False).to(device)
    Pred = Net(images) # make prediction
    Net.zero_grad()

First, we want to track our average loss during training, so we create an array that will store the loss of the last 50 steps:

AverageLoss=np.zeros([50])

This will be used to track the average loss during training.

We are going to train for 20000 steps (but we can always stop in the middle):

for itr in range(20000):

The LoadBatch function, defined earlier, loads a batch of images and their colors.

torch.autograd.Variable converts the data into variables that can be used by the net. We set requires_grad=False since we don't want to compute gradients with respect to the image, only with respect to the layers of the net. The .to(device) call copies the tensor to the same device (GPU/CPU) as the net.
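Note that in recent PyTorch versions torch.autograd.Variable is deprecated and no longer needed; a minimal modern equivalent is simply moving the tensors to the device:

images = images.to(device) # input images; gradients with respect to them are not computed by default
color = prop.to(device) # ground-truth colors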

Finally, we input the image to the net and get the prediction:

Pred = Net(images) # make prediction

Once we have made a prediction, we can compare the predicted color to the real (ground truth) color and calculate the loss. The loss will simply be the absolute difference (L1) between the predicted and real values.

Loss=torch.abs(Pred - color).mean()

Note that we compute the loss not for one image but for all the images in the batch, so we take the mean to reduce the loss to a single number.

Once we have calculated the loss, we can apply backpropagation and update the net weights.

Loss.backward() # Backpropagate loss
optimizer.step() # Apply gradient descent step to the weights

During training, we want to see if our average loss decreases, to see if the net actually learns anything. We will therefore store the last 50 loss values in an array and display the average in every step:

AverageLoss[itr%50]=Loss.data.cpu().numpy() # Save loss for averaging
print(itr,") Loss=",Loss.data.cpu().numpy(),'AverageLoss',AverageLoss.mean())

This covers the full training stage, but we also need to save the trained model; otherwise, it will be lost once the program stops.
Saving is time-consuming, so we only want to do it about once every 200 steps:

if itr % 200 == 0:
    print("Saving Model " + str(itr) + ".torch")
    torch.save(Net.state_dict(), str(itr) + ".torch")

After running this script for about 2000 steps, the net should give good results.

Altogether, this is 50 lines of code, not including blank lines.

The full code can be found in the repository linked below.

Now, if you actually want to use the neural net after you have trained it, you can use the inference script:

https://github.com/sagieppel/Predict-material-properties-from-image-using-neural-net-in-50-lines-of-code-with-pytorch/blob/main/infer.py

This script loads the net that you trained and saved earlier and uses it to make a prediction.
Most of the code here is the same as the training script, with only a few differences:

Net.load_state_dict(torch.load(modelPath)) # Load trained model

This loads the net weights we trained and saved earlier from the file at modelPath.

Net.eval()

This converts the net from training mode to evaluation mode; this mainly means that batch normalization layers use their stored statistics instead of computing new ones per batch.

with torch.no_grad():

This means the net is run without collecting gradients. Gradients are only relevant for training, and collecting them is resource-intensive.
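Putting these pieces together, a minimal inference sketch might look like this (reusing the imports from the training script; modelPath, imagePath, and the 900x900 input size are illustrative placeholders that follow the training settings above; see infer.py in the repository for the full version):

modelPath = "2000.torch" # example path to a checkpoint saved during training
imagePath = "test.jpg"   # example image to predict the liquid color for
width = height = 900     # same input size used during training

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
Net = torchvision.models.resnet50(pretrained=False)
Net.fc = torch.nn.Linear(in_features=2048, out_features=3, bias=True) # same 3-output layer as in training
Net.load_state_dict(torch.load(modelPath)) # Load trained model
Net = Net.to(device)
Net.eval() # switch the net to evaluation mode

transformImg = tf.Compose([tf.ToPILImage(), tf.Resize((height, width)), tf.ToTensor(), tf.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])
Img = transformImg(cv2.imread(imagePath))

with torch.no_grad(): # run the net without collecting gradients
    Prd = Net(Img.unsqueeze(0).to(device)) # add a batch dimension and predict
print("Predicted RGB color:", Prd[0].cpu().numpy())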


Sagi eppel

I am a researcher at the University of Toronto, focusing on applying computer vision to controlling autonomous chemistry labs.