Deep Learning with Point Clouds, Images, and Different Kinds of Sensor Fusion

Chris Thaliyath
Sep 6, 2023


How do we bring the LiDAR and the camera into the same coordinate frame and overlay the image on the point cloud?

  • Camera object detection
  • LiDAR object detection
  • Project the 3D objects and the 2D objects into the same coordinate space

DIFFERENT KINDS OF FUSION

1) EARLY FUSION

Order: raw sensor data is fused first, then a single detector runs on the fused input.

  • Fusion
  • Detection
  • Output

2) LATE FUSION

Order: each sensor runs its own detector first, then the detections are fused.

  • Detection
  • Fusion
  • Output

The late-fusion pipeline:

  • Camera object detection
  • LiDAR object detection
  • Project the 3D obstacle into the image (see the sketch after this list)
  • Fuse the 3D box (LiDAR) with the 2D box (camera)
  • Build a fused object
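A minimal sketch of the projection step, assuming a hypothetical 3x3 intrinsic matrix K and LiDAR-to-camera extrinsics R, t. These come from calibration; the values below are placeholders, not real calibration data:

import numpy as np

# Hypothetical calibration (placeholders -- use your own calibration values)
K = np.array([[721.5, 0.0, 609.5],
              [0.0, 721.5, 172.8],
              [0.0, 0.0, 1.0]])         # camera intrinsics
R = np.eye(3)                           # LiDAR -> camera rotation
t = np.array([0.0, -0.08, -0.27])       # LiDAR -> camera translation

def project_lidar_to_image(points_3d):
    """Project Nx3 LiDAR points into pixel coordinates."""
    cam = points_3d @ R.T + t           # into the camera frame
    cam = cam[cam[:, 2] > 0]            # keep points in front of the camera
    uvw = cam @ K.T                     # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]     # perspective divide -> (u, v) pixels

# e.g. project the 8 corners of a 3D bounding box, then take the min/max
# over u and v to get the corresponding 2D box in the image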

3) DEEP FUSION

https://www.thinkautonomous.ai/blog/aurora-deep-learning-sensor-fusion-motion-prediction/

4) MID FUSION

Order: features are learned per sensor first, fused at the feature level, then a detector runs on the fused features.

  • Learning
  • Fusion
  • Detection
  • Output

5) SEQUENTIAL FUSION

Order: the detections are fused one after the other, using an algorithm such as a Kalman filter (a particle filter also works). See the sketch after this list.

  • Detection (sensor A)
  • Kalman filter fusion
  • Detection (sensor B)
  • Kalman filter fusion
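A minimal 1D sketch of the idea, showing only the Kalman update step and assuming each sensor gives a noisy position measurement with a known variance (all numbers below are illustrative):

def kalman_update(x, P, z, R):
    """Fuse one measurement z (variance R) into the state x (variance P)."""
    K = P / (P + R)          # Kalman gain
    x = x + K * (z - x)      # corrected estimate
    P = (1 - K) * P          # reduced uncertainty
    return x, P

x, P = 0.0, 1.0              # prior estimate and its variance
x, P = kalman_update(x, P, z=2.1, R=0.5)   # detection from sensor A
x, P = kalman_update(x, P, z=1.9, R=0.3)   # detection from sensor B
print(x, P)                  # fused estimate, lower variance than either sensor alone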

Camera Object Detection

  • The camera frame is in pixels (2D)

LiDAR Object Detection

  • PointRCNN
  • The LiDAR frame is 3D (meters)
  • Extract the corners of the 3D box

Use the Intersection over Union (IoU) between the camera and LiDAR detections to associate the objects.
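A minimal IoU sketch for two axis-aligned 2D boxes in (x1, y1, x2, y2) form, e.g. a camera box and a projected LiDAR box:

def iou(a, b):
    """Intersection over Union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ~= 0.14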

Bipartite Matching

  • Hungarian algorithm, for association of LiDAR and camera objects
https://www.thinkautonomous.ai/blog/hungarian-algorithm/
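A minimal association sketch using SciPy's implementation of the Hungarian algorithm, assuming we already have an IoU matrix between the projected LiDAR boxes and the camera boxes (e.g. from the iou() helper above; the matrix values and the 0.3 threshold are illustrative):

import numpy as np
from scipy.optimize import linear_sum_assignment

# cost = 1 - IoU, so high-overlap pairs are cheap to match
iou_matrix = np.array([[0.8, 0.1],
                       [0.2, 0.7]])   # rows: LiDAR boxes, cols: camera boxes
rows, cols = linear_sum_assignment(1.0 - iou_matrix)
matches = [(r, c) for r, c in zip(rows, cols) if iou_matrix[r, c] > 0.3]
print(matches)   # [(0, 0), (1, 1)] -> build one fused object per match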

Deep Learning

MSE

Mean Squared Error, the standard loss for regression.

Back Propagation

The four training steps, in order (see the sketch after this list):

- Forward pass
- Loss function calculation
- Backpropagation
- Weight update
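The four steps map one-to-one onto PyTorch calls; a minimal sketch with MSE on a toy linear model:

import torch
import torch.nn as nn

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(8, 3), torch.randn(8, 1)

pred = model(x)                         # 1. forward pass
loss = nn.functional.mse_loss(pred, y)  # 2. loss function (MSE)
loss.backward()                         # 3. backpropagation
optimizer.step()                        # 4. weight update
optimizer.zero_grad()                   # clear gradients for the next step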

Sigmoid Function

Squashes a single score into (0, 1); used for binary outputs.

SOFTMAX

For multiclass classification; turns a vector of scores into a probability distribution.
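A quick comparison in PyTorch:

import torch

scores = torch.tensor([2.0, 1.0, 0.1])
print(torch.sigmoid(scores))          # element-wise, each value in (0, 1)
print(torch.softmax(scores, dim=0))   # sums to 1 -> multiclass probabilities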

ENCODER IN DL

CROSS-ENTROPY

To compare the prediction with the target, we use a loss function such as cross-entropy (CE).
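In PyTorch, F.cross_entropy takes the raw class scores and the target class index directly (softmax is applied internally):

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])   # raw scores for 3 classes
target = torch.tensor([0])                 # the correct class index
print(F.cross_entropy(logits, target))     # low loss when class 0 scores highest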

HYPERPARAMETERS

  • learning rate, determines how fast we go down the loss surface
  • batch size, how many samples we send before we do a weight update
  • TRAIN_SPLIT, fraction of the data used for training
  • VAL_SPLIT, fraction of the data held out for validation
  • num_epochs, how many times we are going to see the entire dataset
  • loss_function
  • optimizer
  • device, training happens on a GPU, but the deployment target is usually a CPU
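Collected into a single config dict; the values below are illustrative defaults, not from the original post:

config = {
    "learning_rate": 1e-3,    # step size for the optimizer
    "batch_size": 32,         # samples per weight update
    "train_split": 0.8,       # fraction of the data used for training
    "val_split": 0.2,         # fraction held out for validation
    "num_epochs": 10,         # passes over the full dataset
    "loss_function": "cross_entropy",
    "optimizer": "adam",
    "device": "cuda",         # train on GPU; deployment target is often CPU
}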

TENSORFLOW

  • more like C++, harder to pick up
  • deployment with TensorFlow is better
  • Keras is integrated

PYTORCH

More research friendly, a better choice for starting out (see ptrblck on the PyTorch forums).

  • Syntax
  • Graph (dynamic)
  • Debugging (easier)
  • Deployment is harder
## import libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np

print(torch.__version__)

BATCH_SIZE = 4                       # batch size for the dataloaders below
transform = transforms.ToTensor()    # convert PIL images to tensors in [0, 1]

## download and load train and test dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

## Dataloaders
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE,
                                         shuffle=True, num_workers=2)

def imshow(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

## get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

## show images
imshow(torchvision.utils.make_grid(images))

print(labels)

imshow(images[0])
print(labels[0].item())

for images, labels in trainloader:
    print("Image batch dimensions:", images.shape)
    print("Image label dimensions:", labels.shape)
    break

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

model = Net()

print(model)

learning_rate = 1e-3   # a typical starting point for Adam
num_epochs = 4

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.NLLLoss()   # the model outputs log_softmax, so NLL is the matching loss
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    train_running_loss = 0.0
    model.train()   # set model to train mode

    for i, (images, labels) in enumerate(trainloader):
        optimizer.zero_grad()
        images = images.to(device)
        labels = labels.to(device)

        logits = model(images)             # forward pass
        loss = criterion(logits, labels)   # loss function
        loss.backward()                    # backpropagation

        ## update model params
        optimizer.step()

        train_running_loss += loss.detach().item()

    model.eval()
    print('Epoch: %d | Loss: %.4f' % (epoch, train_running_loss / (i + 1)))



import random

# Get a single batch (images and labels) from the iterator
dataiter = iter(testloader)
images, labels = next(dataiter)

model.eval()

# Select a random index within the batch
random_idx = random.randint(0, len(images) - 1)

# Get the random image and corresponding label
random_image = images[random_idx]
random_label = labels[random_idx]

# Prepare the image for prediction (add a batch dimension)
input_image = random_image.unsqueeze(0).to(device)

# Run prediction
with torch.no_grad():
    output = model(input_image)        # forward pass
    predicted = output.argmax(dim=1)   # class with the highest score

# Show results
print(f"Predicted label: {predicted.item()}")
print(f"Ground truth label: {random_label.item()}")

# Show image
plt.imshow(random_image.squeeze().cpu().numpy(), cmap="gray")
plt.title(f"Predicted: {predicted.item()}, Ground Truth: {random_label.item()}")
plt.show()

Convolutional Neural Network

  • padding & stride (see the shape check after this list)
  • max-pooling
  • average-pooling
    (reference: Introduction To Pooling Layers In CNN – Towards AI)
  • layers
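A minimal shape check of these building blocks in PyTorch:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)   # (batch, channels, H, W)
conv = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
print(conv(x).shape)            # padding=1 keeps the 32x32 spatial size
print(nn.MaxPool2d(2)(x).shape) # max-pooling halves it to 16x16
print(nn.AvgPool2d(2)(x).shape) # average-pooling halves it too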

MAIN COMPONENTS

  • RESIDUAL BLOCKS (see the sketch after this list)
  • TRANSFORMERS
  • BiFPNs
  • DENSE BLOCKS
  • PYRAMID POOLING
  • PANet
  • BACKBONES
  • SPATIAL TRANSFORMERS
  • CONVOLUTIONAL BLOCKS
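A minimal sketch of the first item, a residual block: the skip connection adds the input back to the convolution output, which makes deep networks easier to train:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)   # the skip connection

print(ResidualBlock(16)(torch.randn(1, 16, 8, 8)).shape)   # shape is preserved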

RES-NET VS DENSE-NET

BILINEAR INTERPOLATION

  • basically upsampling!
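In PyTorch this is a single call:

import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 8, 8)
up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
print(up.shape)   # (1, 16, 16, 16) -- spatially upsampled 2x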

All of the above was a refresher on the basics of neural networks.

FINALLY, 3D DEEP LEARNING

This applies to the following kinds of data:

  • meshes
  • point clouds
  • volumetric data

2 Techniques

  1. Process the points directly
    - PointNet
    - PointNet++
    (the PointNet pipeline: T-Net → MLP → output)
  2. Voxelize the points into a grid (see VOXEL GRID below)

3 Problems 3D data has

  1. N! permutations
    The order of the points should not impact the result (see the sketch after this list).
  2. Rotations and translations
    Rotating the object should not impact the result; a tree stays a tree.
  3. Local structures and geometry
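Problem 1 is why PointNet uses a symmetric function over the points. A minimal sketch of that idea: a shared per-point MLP followed by max-pooling gives the same global feature for any point ordering:

import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))
points = torch.randn(1024, 3)             # one cloud, 1024 points

feat = mlp(points).max(dim=0).values      # shared MLP + max-pool -> global feature
shuffled = points[torch.randperm(1024)]
feat2 = mlp(shuffled).max(dim=0).values
print(torch.allclose(feat, feat2))        # True: the point order does not matter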

VOXEL GRID

Voxel Feature Encoding

- max-pooling
- activations
- loss functions
- empty voxels

99% of the space contains empty voxels.
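A minimal voxelization sketch, assuming a cloud in meters and a fixed voxel size (both values below are illustrative); because most voxels are empty, we only store the occupied ones, which is a sparse representation:

import numpy as np

points = np.random.rand(100000, 3) * 50.0   # toy cloud inside a 50 m cube
voxel_size = 0.2                            # 20 cm voxels

coords = np.floor(points / voxel_size).astype(np.int64)
occupied = np.unique(coords, axis=0)        # one entry per non-empty voxel

total_voxels = round(50.0 / voxel_size) ** 3
print(len(occupied), "of", total_voxels)    # well under 1% of the voxels are occupied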

3D CNN

SPARSE CONVOLUTIONS

Sparse convolutions only compute on the occupied voxels, e.g. with the spconv library:

spconv.SparseSequential(
    norm_fn(nPlanes[0]),
    nn.ReLU(),
    spconv.SparseConv3d(nPlanes[0], nPlanes[1], kernel_size=2, stride=2,
                        bias=False, indice_key='bb_spconv{}'.format(1))
)

CODING 3D CLASSIFIERS

  • OpenMMLab
  • Open3D
  • PyTorch3D

https://www.kaggle.com/code/jeremy26/voxels-3d-cnns-starter
