Deep Learning with Point Clouds, Images, and Different Kinds of Sensor Fusion

Chris Thaliyath
Sep 6, 2023


How do we bring the LiDAR and the camera into the same coordinate frame and overlay the image on the point cloud?

  • Camera object detection
  • LiDAR object detection
  • Project the 3D objects and the 2D objects into the same coordinate space

DIFFERENT KINDS OF FUSION

1) EARLY FUSION

Order: raw sensor data is fused first, then a single detector runs on the fused input.

  • Fusion
  • Detection
  • Output

2) LATE FUSION

Order: each sensor runs its own detector first, then the detections are fused.

  • Detection
  • Fusion
  • Output

The late-fusion pipeline:

  • Camera object detection
  • LiDAR object detection
  • Project the 3D obstacle into the image (see the sketch after this list)
  • Fuse the 3D box (LiDAR) with the 2D box (camera)
  • Build a fused object
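A minimal sketch of the projection step, assuming a hypothetical 3x3 intrinsic matrix K and LiDAR-to-camera extrinsics R, t. These come from calibration; the values below are placeholders, not real calibration data:

import numpy as np

# Hypothetical calibration (placeholders -- use your own calibration values)
K = np.array([[721.5, 0.0, 609.5],
              [0.0, 721.5, 172.8],
              [0.0, 0.0, 1.0]])         # camera intrinsics
R = np.eye(3)                           # LiDAR -> camera rotation
t = np.array([0.0, -0.08, -0.27])       # LiDAR -> camera translation

def project_lidar_to_image(points_3d):
    """Project Nx3 LiDAR points into pixel coordinates."""
    cam = points_3d @ R.T + t           # into the camera frame
    cam = cam[cam[:, 2] > 0]            # keep points in front of the camera
    uvw = cam @ K.T                     # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]     # perspective divide -> (u, v) pixels

# e.g. project the 8 corners of a 3D bounding box, then take the min/max
# over u and v to get the corresponding 2D box in the image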

3) DEEP FUSION

https://www.thinkautonomous.ai/blog/aurora-deep-learning-sensor-fusion-motion-prediction/

4) MID FUSION

Order: features are learned per sensor first, fused at the feature level, then a detector runs on the fused features.

  • Learning
  • Fusion
  • Detection
  • Output

5) SEQUENTIAL FUSION

Order: the detections are fused one after the other, using an algorithm such as a Kalman filter (a particle filter also works). See the sketch after this list.

  • Detection (sensor A)
  • Kalman filter fusion
  • Detection (sensor B)
  • Kalman filter fusion
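A minimal 1D sketch of the idea, showing only the Kalman update step and assuming each sensor gives a noisy position measurement with a known variance (all numbers below are illustrative):

def kalman_update(x, P, z, R):
    """Fuse one measurement z (variance R) into the state x (variance P)."""
    K = P / (P + R)          # Kalman gain
    x = x + K * (z - x)      # corrected estimate
    P = (1 - K) * P          # reduced uncertainty
    return x, P

x, P = 0.0, 1.0              # prior estimate and its variance
x, P = kalman_update(x, P, z=2.1, R=0.5)   # detection from sensor A
x, P = kalman_update(x, P, z=1.9, R=0.3)   # detection from sensor B
print(x, P)                  # fused estimate, lower variance than either sensor alone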

Camera Object Detection

  • The camera frame is in pixels (2D)

LiDAR Object Detection

  • PointRCNN
  • The LiDAR frame is 3D (meters)
  • Extract the corners of the 3D box

Use the Intersection over Union (IoU) between the camera and LiDAR detections to associate the objects.
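A minimal IoU sketch for two axis-aligned 2D boxes in (x1, y1, x2, y2) form, e.g. a camera box and a projected LiDAR box:

def iou(a, b):
    """Intersection over Union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ~= 0.14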

Bipartite Matching

  • Hungarian algorithm, for association of LiDAR and camera objects
https://www.thinkautonomous.ai/blog/hungarian-algorithm/
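A minimal association sketch using SciPy's implementation of the Hungarian algorithm, assuming we already have an IoU matrix between the projected LiDAR boxes and the camera boxes (e.g. from the iou() helper above; the matrix values and the 0.3 threshold are illustrative):

import numpy as np
from scipy.optimize import linear_sum_assignment

# cost = 1 - IoU, so high-overlap pairs are cheap to match
iou_matrix = np.array([[0.8, 0.1],
                       [0.2, 0.7]])   # rows: LiDAR boxes, cols: camera boxes
rows, cols = linear_sum_assignment(1.0 - iou_matrix)
matches = [(r, c) for r, c in zip(rows, cols) if iou_matrix[r, c] > 0.3]
print(matches)   # [(0, 0), (1, 1)] -> build one fused object per match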

Deep Learning

MSE

Mean Squared Error, the standard loss for regression.

Back Propagation

The four training steps, in order (see the sketch after this list):

- Forward pass
- Loss function calculation
- Backpropagation
- Weight update
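The four steps map one-to-one onto PyTorch calls; a minimal sketch with MSE on a toy linear model:

import torch
import torch.nn as nn

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(8, 3), torch.randn(8, 1)

pred = model(x)                         # 1. forward pass
loss = nn.functional.mse_loss(pred, y)  # 2. loss function (MSE)
loss.backward()                         # 3. backpropagation
optimizer.step()                        # 4. weight update
optimizer.zero_grad()                   # clear gradients for the next step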

Sigmoid Function

Squashes a single score into (0, 1); used for binary outputs.

SOFTMAX

For multiclass classification; turns a vector of scores into a probability distribution.
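A quick comparison in PyTorch:

import torch

scores = torch.tensor([2.0, 1.0, 0.1])
print(torch.sigmoid(scores))          # element-wise, each value in (0, 1)
print(torch.softmax(scores, dim=0))   # sums to 1 -> multiclass probabilities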

ENCODER IN DL

CROSS-ENTROPY

To compare the prediction with the target, we use a loss function such as cross-entropy (CE).
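In PyTorch, F.cross_entropy takes the raw class scores and the target class index directly (softmax is applied internally):

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])   # raw scores for 3 classes
target = torch.tensor([0])                 # the correct class index
print(F.cross_entropy(logits, target))     # low loss when class 0 scores highest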

HYPERPARAMETERS

  • learning rate, determines how fast we go down the loss surface
  • batch size, how many samples we send before we do a weight update
  • TRAIN_SPLIT, fraction of the data used for training
  • VAL_SPLIT, fraction of the data held out for validation
  • num_epochs, how many times we are going to see the entire dataset
  • loss_function
  • optimizer
  • device, training happens on a GPU, but the deployment target is usually a CPU
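Collected into a single config dict; the values below are illustrative defaults, not from the original post:

config = {
    "learning_rate": 1e-3,    # step size for the optimizer
    "batch_size": 32,         # samples per weight update
    "train_split": 0.8,       # fraction of the data used for training
    "val_split": 0.2,         # fraction held out for validation
    "num_epochs": 10,         # passes over the full dataset
    "loss_function": "cross_entropy",
    "optimizer": "adam",
    "device": "cuda",         # train on GPU; deployment target is often CPU
}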

TENSORFLOW

  • more like C++, harder to pick up
  • deployment with TensorFlow is better
  • Keras is integrated

PYTORCH

More research friendly, a better choice for starting out (see ptrblck on the PyTorch forums).

  • Syntax
  • Graph (dynamic)
  • Debugging (easier)
  • Deployment is harder
## import libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np

print(torch.__version__)

BATCH_SIZE = 4                       # batch size for the dataloaders below
transform = transforms.ToTensor()    # convert PIL images to tensors in [0, 1]

## download and load train and test dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

## Dataloaders
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE,
                                         shuffle=True, num_workers=2)

def imshow(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

## get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

## show images
imshow(torchvision.utils.make_grid(images))

print(labels)

imshow(images[0])
print(labels[0].item())

for images, labels in trainloader:
    print("Image batch dimensions:", images.shape)
    print("Image label dimensions:", labels.shape)
    break

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

model = Net()

print(model)

learning_rate = 1e-3   # a typical starting point for Adam
num_epochs = 4

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.NLLLoss()   # the model outputs log_softmax, so NLL is the matching loss
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    train_running_loss = 0.0
    model.train()   # set model to train mode

    for i, (images, labels) in enumerate(trainloader):
        optimizer.zero_grad()
        images = images.to(device)
        labels = labels.to(device)

        logits = model(images)             # forward pass
        loss = criterion(logits, labels)   # loss function
        loss.backward()                    # backpropagation

        ## update model params
        optimizer.step()

        train_running_loss += loss.detach().item()

    model.eval()
    print('Epoch: %d | Loss: %.4f' % (epoch, train_running_loss / (i + 1)))



import random

# Get a single batch (images and labels) from the iterator
dataiter = iter(testloader)
images, labels = next(dataiter)

model.eval()

# Select a random index within the batch
random_idx = random.randint(0, len(images) - 1)

# Get the random image and corresponding label
random_image = images[random_idx]
random_label = labels[random_idx]

# Prepare the image for prediction (add a batch dimension)
input_image = random_image.unsqueeze(0).to(device)

# Run prediction
with torch.no_grad():
    output = model(input_image)        # forward pass
    predicted = output.argmax(dim=1)   # class with the highest score

# Show results
print(f"Predicted label: {predicted.item()}")
print(f"Ground truth label: {random_label.item()}")

# Show image
plt.imshow(random_image.squeeze().cpu().numpy(), cmap="gray")
plt.title(f"Predicted: {predicted.item()}, Ground Truth: {random_label.item()}")
plt.show()

Convolutional Neural Network

  • padding & stride (see the shape check after this list)
  • max-pooling
  • average-pooling
    (reference: Introduction To Pooling Layers In CNN – Towards AI)
  • layers
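A minimal shape check of these building blocks in PyTorch:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)   # (batch, channels, H, W)
conv = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
print(conv(x).shape)            # padding=1 keeps the 32x32 spatial size
print(nn.MaxPool2d(2)(x).shape) # max-pooling halves it to 16x16
print(nn.AvgPool2d(2)(x).shape) # average-pooling halves it too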

MAIN COMPONENTS

  • RESIDUAL BLOCKS (see the sketch after this list)
  • TRANSFORMERS
  • BiFPNs
  • DENSE BLOCKS
  • PYRAMID POOLING
  • PANet
  • BACKBONES
  • SPATIAL TRANSFORMERS
  • CONVOLUTIONAL BLOCKS
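A minimal sketch of the first item, a residual block: the skip connection adds the input back to the convolution output, which makes deep networks easier to train:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)   # the skip connection

print(ResidualBlock(16)(torch.randn(1, 16, 8, 8)).shape)   # shape is preserved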

RES-NET VS DENSE-NET

BILINEAR INTERPOLATION

  • basically upsampling!
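In PyTorch this is a single call:

import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 8, 8)
up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
print(up.shape)   # (1, 16, 16, 16) -- spatially upsampled 2x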

All of the above was a refresher on the basics of neural networks.

FINALLY, 3D DEEP LEARNING

This applies to the following kinds of data:

  • meshes
  • point clouds
  • volumetric data

2 Techniques

  1. Process the points directly
    - PointNet
    - PointNet++
    (the PointNet pipeline: T-Net → MLP → output)
  2. Voxelize the points into a grid (see VOXEL GRID below)

3 Problems 3D data has

  1. N! permutations
    The order of the points should not impact the result (see the sketch after this list).
  2. Rotations and translations
    Rotating the object should not impact the result; a tree stays a tree.
  3. Local structures and geometry
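Problem 1 is why PointNet uses a symmetric function over the points. A minimal sketch of that idea: a shared per-point MLP followed by max-pooling gives the same global feature for any point ordering:

import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))
points = torch.randn(1024, 3)             # one cloud, 1024 points

feat = mlp(points).max(dim=0).values      # shared MLP + max-pool -> global feature
shuffled = points[torch.randperm(1024)]
feat2 = mlp(shuffled).max(dim=0).values
print(torch.allclose(feat, feat2))        # True: the point order does not matter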

VOXEL GRID

Voxel Feature Encoding

- max-pooling
- activations
- loss functions
- empty voxels

99% of the space contains empty voxels.
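A minimal voxelization sketch, assuming a cloud in meters and a fixed voxel size (both values below are illustrative); because most voxels are empty, we only store the occupied ones, which is a sparse representation:

import numpy as np

points = np.random.rand(100000, 3) * 50.0   # toy cloud inside a 50 m cube
voxel_size = 0.2                            # 20 cm voxels

coords = np.floor(points / voxel_size).astype(np.int64)
occupied = np.unique(coords, axis=0)        # one entry per non-empty voxel

total_voxels = round(50.0 / voxel_size) ** 3
print(len(occupied), "of", total_voxels)    # well under 1% of the voxels are occupied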

3D CNN

SPARSE CONVOLUTIONS

Sparse convolutions only compute on the occupied voxels, e.g. with the spconv library:

spconv.SparseSequential(
    norm_fn(nPlanes[0]),
    nn.ReLU(),
    spconv.SparseConv3d(nPlanes[0], nPlanes[1], kernel_size=2, stride=2,
                        bias=False, indice_key='bb_spconv{}'.format(1))
)

CODING 3D CLASSIFIERS

  • OpenMMLab
  • Open3D
  • PyTorch3D

https://www.kaggle.com/code/jeremy26/voxels-3d-cnns-starter
