Deep Learning with Point Clouds, Images and Different Kinds of Sensor Fusion
How do we bring lidar and camera into the same coordinate frame and overlay the image on the point cloud?
- Camera Object detection
- Lidar object detection
- Project the 3D objects and 2D objects into the same coordinate space
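A minimal sketch of the projection step, assuming a pinhole camera with a known 3x3 intrinsic matrix K and a known 4x4 lidar-to-camera extrinsic matrix. The function name, argument names and the example numbers are made up for illustration; in a real setup the extrinsic also handles the axis swap between the lidar and camera conventions.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project Nx3 lidar points to pixel coordinates.

    T_cam_lidar: 4x4 extrinsic matrix (lidar frame -> camera frame).
    K: 3x3 camera intrinsic matrix.
    Returns (Nx2 pixel coordinates, boolean mask of points in front of the camera).
    Pixel values of points behind the camera are not meaningful; use the mask.
    """
    n = points_lidar.shape[0]
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])  # Nx4 homogeneous points
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]          # Nx3 in the camera frame
    in_front = pts_cam[:, 2] > 0                        # positive depth only
    uvw = (K @ pts_cam.T).T                             # apply the intrinsics
    depth = np.where(in_front, uvw[:, 2], 1.0)          # avoid dividing by <= 0
    pixels = uvw[:, :2] / depth[:, None]                # perspective division
    return pixels, in_front

# Hypothetical calibration: identity extrinsic, 500 px focal length, 640x480 image
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
points = np.array([[0.0, 0.0, 10.0], [1.0, 0.0, 10.0], [0.0, 0.0, -5.0]])
pixels, visible = project_lidar_to_image(points, T, K)
```

Once the points are in pixel coordinates, each visible point can be coloured with the image pixel it lands on, which gives the overlay.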
DIFFERENT KINDS OF FUSION
1)EARLY FUSION
order
- Fusion
- detection
- output
2)LATE FUSION
order
- detection
- Fusion
- output
- Camera object detection
- Lidar object detection
- Project the 3D obstacle into the image
- Fuse the 3D box (lidar) with the 2D box (camera)
- Build a fused object
3)DEEP FUSION
4)MID FUSION
order
- Learning
- Fusion
- detection
- output
5)SEQUENTIAL FUSION
order
- detection (sensor A)
- Kalman filter fusion
- detection (sensor B)
- Kalman filter fusion
- the results are fused one after the other, using an algorithm such as a Kalman filter
- Particle filter
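A minimal sketch of sequential fusion with a scalar Kalman measurement update, assuming we track a single quantity (for example the distance to an object) and that each sensor reports a value with a known variance. The prediction step (motion model) is left out, and all numbers are invented for illustration.

```python
def kalman_update(mean, var, measurement, meas_var):
    """One scalar Kalman measurement update: blend the estimate with a measurement."""
    gain = var / (var + meas_var)                  # Kalman gain in [0, 1]
    new_mean = mean + gain * (measurement - mean)  # move towards the measurement
    new_var = (1.0 - gain) * var                   # uncertainty shrinks
    return new_mean, new_var

# Prior belief about the distance (metres) and its variance
mean, var = 10.0, 4.0
# Detection from sensor A (hypothetical camera estimate, noisy)
mean, var = kalman_update(mean, var, measurement=12.0, meas_var=4.0)
# Detection from sensor B (hypothetical lidar estimate, more precise)
mean, var = kalman_update(mean, var, measurement=11.6, meas_var=1.0)
```

The more precise sensor pulls the estimate harder because its measurement variance is smaller, and the variance decreases after every update.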
Camera Object detection
- The camera frame is in pixels (2D)
LIDAR Object detection
- PointRCNN
- The lidar frame is 3D
- Extract the corners
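A minimal sketch of extracting the 8 corners of a 3D box, assuming the detector outputs a center, a size given as (length, width, height) and a yaw angle about the z axis. This parameterization and the function name are assumptions; detectors differ in their box conventions.

```python
import numpy as np

def box3d_corners(center, size, yaw):
    """Return the 8 corners (8x3) of a 3D box.

    center: (x, y, z); size: (length, width, height); yaw: rotation about z in radians.
    """
    length, width, height = size
    # Corner offsets in the box's own frame (bottom face first, then top face)
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * length / 2.0
    y = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * width / 2.0
    z = np.array([-1, -1, -1, -1, 1, 1, 1, 1]) * height / 2.0
    corners = np.stack([x, y, z], axis=1)
    # Rotate about the z axis, then translate to the box center
    c, s = np.cos(yaw), np.sin(yaw)
    rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return corners @ rotation.T + np.asarray(center, dtype=float)

corners = box3d_corners(center=(5.0, 2.0, 0.0), size=(4.0, 2.0, 1.5), yaw=0.0)
```

Projecting these corners into the image and taking the min and max of the pixel coordinates gives a 2D box that can be compared with the camera detection.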
Use the Intersection over Union (IoU) of the camera and lidar detections to decide whether they belong to the same object.
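A minimal sketch of IoU for two axis-aligned 2D boxes, for example a camera detection and a lidar box that has already been projected into the image. The (x1, y1, x2, y2) box format and the function name are assumptions made for this illustration.

```python
def iou_2d(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = iou_2d((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```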
Bipartite Matching
- Hungarian algorithm, for association of lidar and camera objects
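A minimal sketch of the association step, assuming an IoU matrix with one row per lidar object and one column per camera object has already been computed. It uses SciPy's linear_sum_assignment, which solves the same assignment problem as the Hungarian algorithm. The function name match_detections and the min_iou threshold are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(iou_matrix, min_iou=0.3):
    """Return (lidar_index, camera_index) pairs that maximize the total IoU."""
    iou_matrix = np.asarray(iou_matrix, dtype=float)
    cost = 1.0 - iou_matrix                   # the solver minimizes cost
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    # Drop assigned pairs whose overlap is too small to be the same object
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if iou_matrix[r, c] >= min_iou]

# 3 lidar objects (rows) vs 2 camera objects (columns), made-up IoU values
iou = [[0.80, 0.10],
       [0.20, 0.70],
       [0.00, 0.05]]
matches = match_detections(iou)
```

Objects left without a partner (here the third lidar object) can be kept as single-sensor detections or discarded, depending on the fusion policy.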
Deep learning
MSE
Mean Squared Error
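A minimal sketch of MSE with NumPy: the average of the squared differences between targets and predictions. The function name and the sample values are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two arrays of the same shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # squared errors: 0, 0, 4
```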
Back Propagation
- Forward pass
- Loss function calculation
- backpropagation
- weight update
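The four steps above can be sketched by hand for the smallest possible model, a single weight w fitted to y = 2x with an MSE loss. This is an illustration with made-up data and a hand-derived gradient, not how a framework implements backpropagation internally.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x   # targets: the model should learn w = 2
w = 0.0       # initial weight
lr = 0.01     # learning rate

losses = []
for step in range(200):
    y_pred = w * x                            # 1) forward pass
    loss = np.mean((y_pred - y) ** 2)         # 2) loss function (MSE)
    grad_w = np.mean(2.0 * (y_pred - y) * x)  # 3) backpropagation: dLoss/dw
    w -= lr * grad_w                          # 4) weight update
    losses.append(loss)
```

In PyTorch, steps 3 and 4 correspond to loss.backward() and optimizer.step().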
Sigmoid function
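A minimal sketch of the sigmoid, which squashes any real number into the range 0 to 1. The two branches are only there to avoid overflow in exp for large negative inputs.

```python
import math

def sigmoid(x):
    """Numerically stable logistic sigmoid 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so exp(x) cannot overflow
    return z / (1.0 + z)

midpoint = sigmoid(0.0)
```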
SOFTMAX
For multiclass classification
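A minimal sketch of softmax with NumPy: it turns a vector of raw scores (logits) into probabilities that sum to 1. Subtracting the maximum first does not change the result and avoids overflow.

```python
import numpy as np

def softmax(logits):
    """Convert a 1D vector of logits into class probabilities."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - np.max(logits)  # for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

probs = softmax([2.0, 1.0, 0.1])
```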
ENCODER in DL
CROSS-ENTROPY
To compare the prediction with the ground truth, we use a loss function such as cross-entropy (CE)
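A minimal sketch of cross-entropy for one sample, assuming the model outputs raw logits and the label is a class index. The loss is the negative log-probability that softmax assigns to the correct class. The function name is illustrative.

```python
import numpy as np

def cross_entropy(logits, target_index):
    """Cross-entropy loss of one sample from raw logits and the true class index."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - np.max(logits)                         # numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))     # log-softmax
    return float(-log_probs[target_index])

loss_uniform = cross_entropy([0.0, 0.0], 0)   # two equally likely classes: ln(2)
```

A confident correct prediction gives a loss close to 0, and a confident wrong prediction gives a large loss.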
HYPERPARAMETERS
- learning_rate: determines how fast we go down (the step size of each update)
- batch_size: how many samples we send through before we do an update
- TRAIN_SPLIT
- VAL_SPLIT
- num_epochs: how many times we go through the entire dataset
- loss_function
- optimizer
- device: training usually runs on the GPU, but the deployment target is often a CPU
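A small sketch of how batch size, split and epochs interact: together they determine how many weight updates a training run performs. The config values and the helper name count_updates are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical hyperparameter values, for illustration only
config = {
    "learning_rate": 1e-3,
    "batch_size": 64,
    "train_split": 0.8,
    "val_split": 0.2,
    "num_epochs": 5,
}

def count_updates(num_samples, cfg):
    """Return (training samples, updates per epoch, total updates)."""
    n_train = round(num_samples * cfg["train_split"])
    # One update per batch; the last batch of an epoch may be smaller
    steps_per_epoch = math.ceil(n_train / cfg["batch_size"])
    return n_train, steps_per_epoch, steps_per_epoch * cfg["num_epochs"]

n_train, steps_per_epoch, total_steps = count_updates(60000, config)
```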
TENSORFLOW
- feels more like C++, harder to learn
- Deployment is easier with TensorFlow
- Keras is integrated
PYTORCH
- more research friendly, a better choice for starting
- Syntax
- Graph
- Debugging
- Deployment is hard
## import libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
print(torch.__version__)
import matplotlib.pyplot as plt
import numpy as np
BATCH_SIZE = 4  # number of samples per batch
## transform that converts the PIL images to tensors
transform = transforms.ToTensor()
## download and load train and test dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
## Dataloaders
trainloader = DataLoader(trainset, batch_size=BATCH_SIZE,
                         shuffle=True, num_workers=2)
testloader = DataLoader(testset, batch_size=BATCH_SIZE,
                        shuffle=True, num_workers=2)
def imshow(img):
    # Convert a CxHxW tensor to an HxWxC array, which is what matplotlib expects
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
## get some random training images
dataiter = iter(trainloader)
print(dataiter)
images, labels = next(dataiter)
## show images
imshow(torchvision.utils.make_grid(images))
print(labels)
imshow(images[0])
print(labels[0].item())
for images, labels in trainloader:
    print("Image batch dimensions:", images.shape)
    print("Image label dimensions:", labels.shape)
    break
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)   # 1x28x28 -> 32x26x26
        self.conv2 = nn.Conv2d(32, 64, 3, 1)  # 32x26x26 -> 64x24x24
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)       # 9216 = 64 * 12 * 12 after pooling
        self.fc2 = nn.Linear(128, 10)         # 10 digit classes

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)                # 64x24x24 -> 64x12x12
        x = self.dropout1(x)
        x = torch.flatten(x, 1)               # keep the batch dimension
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        # Log-probabilities; nn.CrossEntropyLoss applies log-softmax again, which leaves them unchanged
        output = F.log_softmax(x, dim=1)
        return output
model = Net()
print(model)
learning_rate = 0.001  # 0.1 is usually too large for Adam; 1e-3 is a common starting point
num_epochs = 4
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()  # classification loss
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for epoch in range(num_epochs):
    train_running_loss = 0.0
    # set the model to training mode (enables dropout)
    model.train()
    for i, (images, labels) in enumerate(trainloader):
        optimizer.zero_grad()
        images = images.to(device)
        labels = labels.to(device)
        logits = model(images)  # forward pass
        loss = criterion(logits, labels)  # loss function
        loss.backward()  # backpropagation
        ## update model params
        optimizer.step()
        train_running_loss += loss.item()
    model.eval()
    # i is zero-based, so the number of batches is i + 1
    print('Epoch: %d | Loss: %.4f' % (epoch, train_running_loss / (i + 1)))
import random
# Get a single batch (images and labels) from the iterator
dataiter = iter(testloader)
images, labels = next(dataiter)
print(len(dataiter))
print(len(images))
model.eval()
# Select a random value
random_idx = random.randint(0, len(images) - 1)
# Get the random image and corresponding label
random_image = images[random_idx]
random_label = labels[random_idx]
# Prepare the image for prediction
input_image = random_image.unsqueeze(0)
input_image = input_image.to(device)
# Run prediction
with torch.no_grad():
    output = model(input_image)  # forward pass
predicted = output.argmax(dim=1)  # index of the highest score is the predicted class
# Show results
print(f"Predicted label: {predicted.item()}")
print(f"Ground truth label: {random_label.item()}")
# Show image
plt.imshow(random_image.squeeze().cpu().numpy(), cmap="gray")
plt.title(f"Predicted: {predicted.item()}, Ground Truth: {random_label.item()}")
plt.show()
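The training loop only reports the loss and the check above looks at a single image. A minimal sketch of measuring accuracy over a whole DataLoader follows. The helper name evaluate_accuracy is an assumption, and the tiny identity model and random data below are stand-ins so that the block runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_accuracy(model, loader, device):
    """Fraction of samples in the loader whose predicted class matches the label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predicted = model(inputs).argmax(dim=1)  # class with the highest score
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Stand-in data: the "model" returns its input, so labels taken from argmax match exactly
scores = torch.randn(10, 3)
targets = scores.argmax(dim=1)
demo_loader = DataLoader(TensorDataset(scores, targets), batch_size=4)
demo_accuracy = evaluate_accuracy(nn.Identity(), demo_loader, torch.device("cpu"))
```

With the objects defined earlier in these notes, the call would be evaluate_accuracy(model, testloader, device).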
Convolutional Neural Network
- padding & stride
- max-pooling
- average-pooling
- layers
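A small sketch of how kernel size, padding and stride determine the output size of a convolution or pooling layer, using the standard formula floor((n + 2p - k) / s) + 1. As a check it reproduces the 9216 input features of the first fully connected layer in the MNIST network above (two 3x3 convolutions followed by a 2x2 max-pool on 28x28 images). The helper name is illustrative.

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Output size along one dimension of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

size = 28                                        # MNIST images are 28x28
size = conv_out_size(size, kernel=3)             # first 3x3 conv, stride 1
size = conv_out_size(size, kernel=3)             # second 3x3 conv, stride 1
size = conv_out_size(size, kernel=2, stride=2)   # 2x2 max-pooling
flat_features = 64 * size * size                 # 64 channels after the second conv
```

With padding=1 a 3x3 convolution at stride 1 keeps the spatial size unchanged.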
MAIN COMPONENTS
- RESIDUAL BLOCK
- TRANSFORMERS
- BiFPNs
- DENSE BLOCK
- PYRAMID POOLING
- PA-NET
- BACKBONES
- SPATIAL TRANSFORMERS
- CONVOLUTIONAL BLOCKS
RES-NET VS DENSE-NET
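The core difference can be sketched in a few lines: a residual block adds its input to the layer output, while a dense block concatenates them, so the feature dimension grows. The layer function here is a stand-in (one matrix multiply plus ReLU) rather than a real convolutional block, and all names are illustrative.

```python
import numpy as np

def layer(x, w):
    """Stand-in for a conv block: linear map followed by ReLU."""
    return np.maximum(0.0, x @ w)

def residual_block(x, w):
    # ResNet style: output = input + F(input); the shape stays the same
    return x + layer(x, w)

def dense_block(x, w):
    # DenseNet style: output = concat(input, F(input)); the features accumulate
    return np.concatenate([x, layer(x, w)], axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))   # batch of 2 samples with 4 features
w = rng.normal(size=(4, 4))
res_out = residual_block(x, w)
dense_out = dense_block(x, w)
```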
BILINEAR INTERPOLATION
- basically a way of upsampling
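A minimal sketch of bilinear interpolation on a 2D array: the value at a fractional position is a distance-weighted average of the four surrounding pixels. Upsampling applies this at every position of the larger output grid. The function name and the in-bounds assumption are mine.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2D array at fractional column x and row y (assumed in bounds)."""
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)  # clamp at the border
    dx, dy = x - x0, y - y0
    # Interpolate along x on the top and bottom rows, then along y
    top = (1 - dx) * img[y0, x0] + dx * img[y0, x1]
    bottom = (1 - dx) * img[y1, x0] + dx * img[y1, x1]
    return (1 - dy) * top + dy * bottom

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])
center_value = bilinear_sample(img, 0.5, 0.5)  # average of the four pixels
```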
All of the above was a refresher on the basics of neural networks
FINALLY, 3D Deep Learning
This applies to the following kinds of data
- meshes
- pointclouds
- volumetric data
2 Techniques
- Process the points directly
- PointNet
- PointNet++
- T-net
- MLP
- Out
3 Problems 3D data has
- N! permutations
the order must not impact the result
- We have rotations and translations
rotating the object must not impact the result; a tree stays a tree
- We have local structures and geometry
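A small sketch of how the permutation problem can be handled: apply the same function to every point and then aggregate with a symmetric operation such as max, which is the idea PointNet uses for its global feature. The single random linear layer below is a stand-in for the shared per-point MLP, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))   # a toy point cloud with 100 points
weights = rng.normal(size=(3, 16))   # shared per-point layer (stand-in for an MLP)

def global_feature(pts):
    """Per-point features followed by a max over all points."""
    per_point = np.maximum(0.0, pts @ weights)  # the same weights for every point
    return per_point.max(axis=0)                # max does not depend on point order

shuffled = points[rng.permutation(len(points))]
feature_original = global_feature(points)
feature_shuffled = global_feature(shuffled)
```

Both feature vectors are the same, so any of the N! orderings of the points gives the same result. This sketch does not address rotations or local structure.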
VOXEL GRID
Voxel Feature Encoding
- maxpooling
- activations
- loss functions
- empty voxels
99% of the space contains empty voxels
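A minimal sketch of voxelization with NumPy, keeping only the occupied voxels, which is the motivation for sparse representations. The function names, the voxel size and the three sample points are made up for illustration.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Return the occupied voxel indices (Mx3) and the number of points in each."""
    indices = np.floor(points / voxel_size).astype(np.int64)
    voxels, counts = np.unique(indices, axis=0, return_counts=True)
    return voxels, counts

def occupancy_ratio(voxels):
    """Occupied voxels divided by all voxels in the bounding grid."""
    extent = voxels.max(axis=0) - voxels.min(axis=0) + 1
    return len(voxels) / float(np.prod(extent))

points = np.array([[0.1, 0.1, 0.1],
                   [0.2, 0.3, 0.4],   # falls in the same voxel as the first point
                   [9.5, 9.5, 9.5]])  # far away, so the grid in between is empty
voxels, counts = voxelize(points, voxel_size=1.0)
ratio = occupancy_ratio(voxels)       # 2 occupied voxels in a 10x10x10 grid
```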
3D CNN
SPARSE CONVOLUTIONS
spconv.SparseSequential(
    norm_fn(nPlanes[0]),
    nn.ReLU(),
    spconv.SparseConv3d(nPlanes[0], nPlanes[1], kernel_size=2, stride=2, bias=False, indice_key='bb_spconv{}'.format(1))
)
CODING 3D classifiers
- OpenMMLab
- Open3D
- PyTorch3D