Stories by Konstantinos Gyftodimos on Medium

1 — Pinhole Camera & Perspective Projection

Konstantinos Gyftodimos — Sun, 15 Jan 2023 17:06:10 GMT

Computer Vision Quick Series— Pinhole Camera & Perspective Projection

Theory :

A pinhole camera is a simple camera without a lens, in which a small aperture, the pinhole, controls the amount of light entering the camera. Light passing through the pinhole forms an inverted image on the opposite side of the camera, called the image plane.

To understand how a 3D object is projected onto the image plane through a pinhole, consider the components shown in Figure 1, where:

Figure 1 : Pinhole Camera Components

Optical Axis: The axis through the pinhole that is perpendicular to the image plane (the z-axis).
Pinhole: The center of projection, which lies on the optical axis.
Effective focal length: The distance between the pinhole and the image plane along the z-axis.
P0, P1: The real point and its projection, described by the vectors r0 and r1 respectively.

To find the relationship between the real point P0 on the 3D object, and the projected point P1 on the image plane, one has to notice the similar triangles in the figure above:

Calculation of x1, y1 coordinates of the projected 3D object on the image plane.
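
Written out, the similar-triangles relation takes the form below. This is a sketch under assumed notation: r0 = (x0, y0, z0) and r1 = (x1, y1, f) are expressed in a frame with its origin at the pinhole and its z-axis along the optical axis, and f is the effective focal length. If the image plane is placed behind the pinhole, both image coordinates pick up a minus sign, which is the inversion mentioned above.

```latex
\frac{\mathbf{r}_1}{f} = \frac{\mathbf{r}_0}{z_0}
\quad\Longrightarrow\quad
x_1 = f\,\frac{x_0}{z_0}, \qquad y_1 = f\,\frac{y_0}{z_0}
```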

Code:

A simple pinhole camera can be modeled and visualized with the Python code below:

import cv2
import numpy as np

# Create a blank image with a black background
width, height = 640, 480
image = np.zeros((height, width, 3), np.uint8)

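# Define the camera matrix (focal length in pixels, principal point at the image center)
# A focal length of 1 pixel would squeeze every projection onto the image center
focal_length = 500
center = np.array([width / 2, height / 2])
camera_matrix = np.array([[focal_length, 0, center[0]],
                          [0, focal_length, center[1]],
                          [0, 0, 1]], dtype="double")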

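# Create a 3D point in front of the camera (z > 0)
# A point at z = 0 lies in the plane of the pinhole and has no well-defined projection
world_points = np.array([[0.5, -0.25, 2.0]], dtype='double')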

# Project the 3D point onto the image plane
projected_points, _ = cv2.projectPoints(world_points, np.zeros((3,1)), np.zeros((3,1)), camera_matrix, None)

# Draw the projected point on the image (OpenCV expects plain Python ints for pixel coordinates)
u, v = np.squeeze(projected_points[0]).astype(int).tolist()
cv2.circle(image, (u, v), 5, (0, 255, 0), -1)

cv2.imshow("Pinhole Camera", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
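
To see what the projection does under the hood, here is a minimal NumPy-only sketch of the same similar-triangles computation. The function name project_point and the example values (focal length of 500 pixels, principal point at (320, 240)) are assumptions made for illustration, and lens distortion is ignored.

```python
import numpy as np

def project_point(point_3d, focal_length, center):
    """Project a 3D camera-frame point to pixel coordinates with the pinhole model."""
    x0, y0, z0 = point_3d
    if z0 <= 0:
        raise ValueError("the point must lie in front of the camera (z > 0)")
    # Similar triangles: image coordinates scale with focal_length / depth
    x1 = focal_length * x0 / z0
    y1 = focal_length * y0 / z0
    # Shift by the principal point to obtain pixel coordinates
    return np.array([x1 + center[0], y1 + center[1]])

pixel = project_point((0.5, -0.25, 2.0), focal_length=500, center=(320, 240))
print(pixel.tolist())  # [445.0, 177.5]
```

Doubling the depth z0 halves the offset from the image center, which is why distant objects appear smaller.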

Outro:

Hope this was helpful!

I can personally code your A.I project! Hire me via Fiverr:

https://www.fiverr.com/share/98GZLA

Optical Flow with Python & OpenCV

Konstantinos Gyftodimos — Sat, 07 Jan 2023 16:22:13 GMT

Optical flow between car pixels in consecutive frames.

Table of Contents:

Intro
Lucas-Kanade method (explanation & code)
Horn-Schunck method (explanation & code)
Farneback method (explanation & code)
Outro

Intro

In this article 3 different methods for optical flow will be briefly explained and implemented.

Optical flow is a technique used to measure the motion of objects in an image or video. It is based on the idea that the apparent motion of objects in an image can be used to estimate the underlying motion of those objects in the real world. Optical flow algorithms are used in a wide range of applications, including video compression, object tracking, and image registration.

To understand optical flow, it is helpful to consider a simple example. Suppose that you are watching a video of a car driving down a road. As the car moves from one frame of the video to the next, the pixels that make up the car will also move. If you were to draw, for each of these pixels, an arrow from its position in one frame to its position in the next, you would obtain a field of displacement vectors. This vector field is called the “optical flow” of the car.

Lucas-Kanade method (explanation & code)

One way to estimate the optical flow between two frames is the Lucas-Kanade method. It assumes that the brightness of a point does not change between frames, that the displacement from one frame to the next is small, and that all pixels within a small window around a point share the same displacement vector. Under these assumptions, the displacement of each window is obtained by solving a small least-squares problem. Because that problem is only well conditioned in textured regions, the method is usually applied to a sparse set of feature points (such as corners) rather than to every pixel.

Here is some example Python code that demonstrates how to use the Lucas-Kanade method to estimate the optical flow between two frames:

import cv2
import numpy as np

# Read two consecutive frames of the video
prev_frame = cv2.imread('frame1.jpg')
next_frame = cv2.imread('frame2.jpg')

# Convert the frames to grayscale
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

# Detect corner features to track in the first frame (Lucas-Kanade is a sparse method)
prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=7)

# Compute the pyramidal Lucas-Kanade Optical Flow for the detected features
next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, prev_pts, None, winSize=(15, 15), maxLevel=2)

# Keep only the features that were tracked successfully
tracked = status.flatten() == 1
good_prev = prev_pts[tracked].reshape(-1, 2)
good_next = next_pts[tracked].reshape(-1, 2)

# Draw the flow vectors on the second frame
result = next_frame.copy()
for (x0, y0), (x1, y1) in zip(good_prev, good_next):
    cv2.line(result, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)
    cv2.circle(result, (int(x1), int(y1)), 3, (0, 0, 255), -1)

# Display the resulting frame
cv2.imshow('Optical Flow', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
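
The least-squares step at the heart of Lucas-Kanade can be sketched in plain NumPy. This is a simplified illustration, not what OpenCV does internally: it treats the whole image as a single window, uses no pyramid and no iterative refinement, and the names lucas_kanade_window and blob are made up for this example. Each pixel contributes one equation Ix*u + Iy*v = -It, and the displacement (u, v) is the least-squares solution of that system.

```python
import numpy as np

def lucas_kanade_window(prev, nxt):
    """Estimate one (u, v) displacement for a whole window by least squares."""
    prev = prev.astype(np.float64)
    nxt = nxt.astype(np.float64)
    # Spatial gradients (np.gradient returns d/dy first, then d/dx)
    Iy, Ix = np.gradient(prev)
    # Temporal difference between the two frames
    It = nxt - prev
    # One brightness-constancy equation per pixel: Ix*u + Iy*v = -It
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic test: a smooth blob shifted by a small, known displacement
ys, xs = np.mgrid[0:64, 0:64].astype(np.float64)

def blob(cx, cy, sigma=8.0):
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

frame1 = blob(32.0, 32.0)
frame2 = blob(32.3, 31.8)  # moved +0.3 px in x and -0.2 px in y

u, v = lucas_kanade_window(frame1, frame2)
print(f"estimated displacement: u = {u:.2f}, v = {v:.2f}")
```

The estimate is only reliable for small displacements, which is why practical implementations combine this step with an image pyramid.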

Horn-Schunck method (explanation & code)

The Horn-Schunck method is another technique used to estimate the optical flow between two images. It is based on the assumption that the flow is smooth, which means that pixels that are close to each other in the image will have similar flow vectors.
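
In formula form, the smoothness assumption is usually written as an energy that the flow field (u, v) should minimize. The notation here is assumed for illustration: I_x, I_y and I_t are the spatial and temporal image derivatives, alpha is the smoothness weight, and the bars denote local neighborhood averages. The first line is the energy and the second line is the iterative update derived from it.

```latex
E(u, v) = \iint \Big[ (I_x u + I_y v + I_t)^2
          + \alpha^2 \big( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \big) \Big] \, dx \, dy

u^{k+1} = \bar{u}^{k} - I_x \, \frac{I_x \bar{u}^{k} + I_y \bar{v}^{k} + I_t}{\alpha^2 + I_x^2 + I_y^2},
\qquad
v^{k+1} = \bar{v}^{k} - I_y \, \frac{I_x \bar{u}^{k} + I_y \bar{v}^{k} + I_t}{\alpha^2 + I_x^2 + I_y^2}
```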

Current OpenCV Python bindings (cv2, versions 3 and 4) do not include a Horn-Schunck function; the legacy cv.CalcOpticalFlowHS from OpenCV 2.x was removed. The method is short enough to implement directly with NumPy: starting from a zero flow field, each iteration replaces the flow at every pixel by the average of its neighbors, corrected along the image gradient so that brightness constancy is better satisfied. The parameter alpha controls the smoothness of the flow and num_iterations the accuracy of the computation. The result is a flow field with one (u, v) vector for each pixel in the image.

Here is an example of how to use the Horn-Schunck method to compute the optical flow between two images:

import cv2
import numpy as np

# Read two consecutive frames of the video
prev_frame = cv2.imread('frame1.jpg')
next_frame = cv2.imread('frame2.jpg')

# Convert the frames to grayscale floating-point images
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY).astype(np.float32)

# Spatial derivatives (on the average of both frames) and temporal derivative
mean_gray = 0.5 * (prev_gray + next_gray)
Ix = cv2.Sobel(mean_gray, cv2.CV_32F, 1, 0, ksize=3) / 8.0
Iy = cv2.Sobel(mean_gray, cv2.CV_32F, 0, 1, ksize=3) / 8.0
It = next_gray - prev_gray

# Kernel that averages the flow over the neighbors of each pixel
avg_kernel = np.array([[1, 2, 1],
                       [2, 0, 2],
                       [1, 2, 1]], dtype=np.float32) / 12.0

# Compute the Horn-Schunck Optical Flow
alpha = 15.0           # smoothness weight: larger values give a smoother flow
num_iterations = 100   # more iterations give a more accurate solution
u = np.zeros_like(prev_gray)
v = np.zeros_like(prev_gray)
for _ in range(num_iterations):
    u_avg = cv2.filter2D(u, -1, avg_kernel)
    v_avg = cv2.filter2D(v, -1, avg_kernel)
    update = (Ix * u_avg + Iy * v_avg + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
    u = u_avg - Ix * update
    v = v_avg - Iy * update

# Calculate the magnitude and angle (in degrees) of the flow vectors
magnitude, angle = cv2.cartToPolar(u, v, angleInDegrees=True)

# Color-code the flow: hue encodes direction, brightness encodes magnitude
hsv = np.zeros_like(prev_frame)
hsv[:, :, 0] = (angle / 2).astype(np.uint8)  # OpenCV hue range is [0, 180)
hsv[:, :, 1] = 255
hsv[:, :, 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
result = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

# Display the resulting frame
cv2.imshow('Optical Flow', result)
cv2.waitKey(0)
cv2.destroyAllWindows()

Farneback method (explanation & code)

The Farneback method is a dense technique for estimating the optical flow between two images. It approximates the neighborhood of each pixel in both frames by a quadratic polynomial and estimates the displacement from how the polynomial coefficients change between the frames.

To compute the optical flow using the Farneback method with Python and OpenCV, you can use the calcOpticalFlowFarneback function. This function takes the previous frame, the current frame, and parameters that control the image pyramid (pyr_scale, levels), the averaging window size (winsize), the number of iterations, and the polynomial expansion (poly_n, poly_sigma). It returns the flow field, an array of shape (height, width, 2) that holds the flow vector (dx, dy) for each pixel in the image.
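
To make the layout of a dense flow field concrete, here is a small NumPy-only sketch. The flow values are made up for illustration (every pixel moves 3 px right and 4 px down); only the (height, width, 2) layout mirrors what a dense optical flow function returns.

```python
import numpy as np

# Hypothetical 4x6 flow field: every pixel moves 3 px right and 4 px down
h, w = 4, 6
flow = np.zeros((h, w, 2), dtype=np.float32)
flow[..., 0] = 3.0  # horizontal displacement (dx)
flow[..., 1] = 4.0  # vertical displacement (dy)

# Per-pixel speed and direction of the motion
magnitude = np.hypot(flow[..., 0], flow[..., 1])
angle = np.degrees(np.arctan2(flow[..., 1], flow[..., 0]))

# Where the pixel at (row=1, col=2) is expected to be in the next frame
row, col = 1, 2
dx, dy = flow[row, col]
new_col, new_row = col + dx, row + dy
print(float(magnitude[row, col]), (float(new_col), float(new_row)))  # 5.0 (5.0, 5.0)
```

Note that the first channel is the displacement along x (columns) and the second along y (rows), while the array itself is indexed as [row, col].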

Here is an example of how to use the Farneback method to compute the optical flow between two images:

import cv2
import numpy as np

# Read the first frame of the video
prev_frame = cv2.imread('frame1.jpg')

# Convert the frame to grayscale
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

# Read the second frame of the video
next_frame = cv2.imread('frame2.jpg')

# Convert the frame to grayscale
next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

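# Compute the Farneback Optical Flow
# Arguments after None, in order: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)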

# Split the flow into its x and y components
flow_x = flow[:, :, 0]
flow_y = flow[:, :, 1]

# Calculate the magnitude and angle (in degrees) of the flow vectors
magnitude, angle = cv2.cartToPolar(flow_x, flow_y, angleInDegrees=True)

# Color-code the flow: hue encodes direction, brightness encodes magnitude
hsv = np.zeros_like(prev_frame)
hsv[:, :, 0] = (angle / 2).astype(np.uint8)  # OpenCV hue range is [0, 180)
hsv[:, :, 1] = 255
hsv[:, :, 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
result = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

# Display the resulting frame
cv2.imshow('Optical Flow', result)
cv2.waitKey(0)
cv2.destroyAllWindows()

Outro

I hope this article helped you have a quick look at some of the optical flow methods!

Vision Transformer for Binary Classification of Custom Dataset with PyTorch

Konstantinos Gyftodimos — Thu, 15 Dec 2022 13:22:55 GMT

Short description:

Vision transformers (ViT) bring the transformer architecture to computer vision. Before they were introduced, complex computer vision tasks were handled almost exclusively by convolutional neural networks. Vision transformers give us another powerful family of models for vision tasks, much as BERT and GPT do for complex NLP tasks. In this article, we will learn how to use a vision transformer for image classification through a hands-on implementation.

The Vision Transformer classification process is summarized in the image below:

https://medium.com/media/a3607c15937007cb5f7283273f9a74f4/href

Coding part:

Step 1 : Create an anaconda environment and set-up required libraries.

Download Anaconda for Windows, then create an Anaconda environment and activate it from the “Anaconda Prompt”:

conda create --name vit_project python=3.8
conda activate vit_project

Download requirements.txt (link below), put it in your ViT project folder, and install it with the Anaconda environment activated:

https://drive.google.com/uc?export=download&id=14xiSObMiBNRPSbwyevZ_hRRk7V3R-txF

pip install -r requirements.txt

Step 2 : Folder structure for your custom dataset.

Make sure the folder structure for your classification dataset is the same as the one in the image below:

Structure your binary data like in the image above
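
As a rough sketch of that layout, the snippet below creates an empty folder tree of the kind torchvision's ImageFolder expects: one sub-folder per split and, inside each split, one sub-folder per class. The split names (train, val, test) and class names (cats, dogs) are assumptions taken from the code used later in this tutorial, and the tree is built in a temporary directory so nothing in your project is touched.

```python
from pathlib import Path
import tempfile

# Assumed layout: one sub-folder per split, one sub-folder per class inside each split
splits = ("train", "val", "test")
classes = ("cats", "dogs")

root = Path(tempfile.mkdtemp()) / "dataset_new_split"
for split in splits:
    for cls in classes:
        (root / split / cls).mkdir(parents=True)

# Print the resulting tree, e.g. dataset_new_split/train/cats
for path in sorted(root.rglob("*")):
    print(path.relative_to(root.parent).as_posix())
```

The images of each class then go directly inside the matching class folder of each split.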

Step 3 : Coding Finally Begins.

Libraries:

from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from linformer import Linformer
from PIL import Image
from torch.optim.lr_scheduler import StepLR
from tqdm.notebook import tqdm
from vit_pytorch.efficient import ViT
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.metrics import confusion_matrix
import torch.utils.data as data
import torchvision
from torchvision import transforms
from torchvision.transforms import ToTensor

# Check whether a CUDA-capable GPU is available
print(torch.cuda.is_available())

Hyperparameters:

# Hyperparameters:
batch_size = 64 
epochs = 20
lr = 3e-5
gamma = 0.7
seed = 142
IMG_SIZE = 128
patch_size = 16
num_classes = 2
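
IMG_SIZE and patch_size together determine how many patches the transformer sees. Here is a quick sanity check in plain Python, assuming square RGB images and one extra class token, which is where the sequence length of 64+1 used for the Linformer in the model-building step comes from:

```python
IMG_SIZE = 128
patch_size = 16

# The image must split evenly into patches
assert IMG_SIZE % patch_size == 0, "image size must be divisible by patch size"

patches_per_side = IMG_SIZE // patch_size
num_patches = patches_per_side ** 2   # 8 x 8 grid of patches
seq_len = num_patches + 1             # +1 for the class token
patch_dim = 3 * patch_size ** 2       # values in one flattened RGB patch

print(num_patches, seq_len, patch_dim)  # 64 65 768
```

If you change IMG_SIZE or patch_size, the sequence length passed to the Linformer has to change with them.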

Optional — Automatic Random Dataset Split:

# import splitfolders  # provided by the split-folders package
# input_folder = "dataset_new/"
# splitfolders.ratio(input_folder, output = "dataset_new_split", 
#                    seed = 42, ratio = (.80, 0.10, .10), 
#                    group_prefix = None)

Tensor Transforms & Data Loaders:

# Tensor Transforms and Pytorch Preprocessing:
# Resize every image to IMG_SIZE x IMG_SIZE (the ViT expects a fixed input size), then convert it to a tensor
transform = torchvision.transforms.Compose([torchvision.transforms.Resize((IMG_SIZE, IMG_SIZE)), ToTensor()])
train_ds = torchvision.datasets.ImageFolder("dataset_new_split/train", transform=transform)
valid_ds = torchvision.datasets.ImageFolder("dataset_new_split/val", transform=transform)
test_ds = torchvision.datasets.ImageFolder("dataset_new_split/test", transform=transform)

# Data Loaders (only the training data needs to be shuffled):
train_loader = data.DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=4)
valid_loader = data.DataLoader(valid_ds, batch_size=batch_size, shuffle=False, num_workers=4)
test_loader = data.DataLoader(test_ds, batch_size=batch_size, shuffle=False, num_workers=4)

Model Building:

# Training device (falls back to the CPU when no GPU is available):
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Linear Transformer:
efficient_transformer = Linformer(dim=128, seq_len=64+1, depth=12, heads=8, k=64)

# Vision Transformer Model: 
model = ViT(dim=128, image_size=128, patch_size=patch_size, num_classes=num_classes, transformer=efficient_transformer, channels=3).to(device)

# loss function
criterion = nn.CrossEntropyLoss()

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=lr)

# Learning Rate Scheduler for Optimizer:
scheduler = StepLR(optimizer, step_size=1, gamma=gamma)

Custom Training:

# Training:
for epoch in range(epochs):
    epoch_loss = 0
    epoch_accuracy = 0
    model.train()
    for images, label in tqdm(train_loader):
        images = images.to(device)
        label = label.to(device)

        output = model(images)
        loss = criterion(output, label)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        acc = (output.argmax(dim=1) == label).float().mean()
        epoch_accuracy += acc.item() / len(train_loader)
        epoch_loss += loss.item() / len(train_loader)

    # Validation: run once per epoch, without gradient tracking
    model.eval()
    with torch.no_grad():
        epoch_val_accuracy = 0
        epoch_val_loss = 0
        for images, label in valid_loader:
            images = images.to(device)
            label = label.to(device)

            val_output = model(images)
            val_loss = criterion(val_output, label)

            acc = (val_output.argmax(dim=1) == label).float().mean()
            epoch_val_accuracy += acc.item() / len(valid_loader)
            epoch_val_loss += val_loss.item() / len(valid_loader)

    # Decay the learning rate by gamma at the end of every epoch
    scheduler.step()

    print(
        f"Epoch : {epoch+1} - loss : {epoch_loss:.4f} - acc: {epoch_accuracy:.4f} - val_loss : {epoch_val_loss:.4f} - val_acc: {epoch_val_accuracy:.4f}\n"
    )

Training Preview

Model Saving & Loading for future use:

# Save Model:
PATH = f"epochs_{epochs}_img_{IMG_SIZE}_patch_{patch_size}_lr_{lr}.pt"
torch.save(model.state_dict(), PATH)

# Load saved model (the architecture must match the one that was trained):
num_patches = (IMG_SIZE // patch_size) ** 2
efficient_transformer = Linformer(dim=128, seq_len=num_patches+1, depth=12, heads=8, k=64)
model = ViT(image_size=IMG_SIZE, patch_size=patch_size, num_classes=num_classes, dim=128, transformer=efficient_transformer, channels=3)
model.load_state_dict(torch.load(PATH, map_location="cpu"))
model.eval()

Model Evaluation — Accuracy:

# Performance on Valid/Test Data
def overall_accuracy(model, test_loader, criterion):
    
    '''
    Model testing 
    
    Args:
        model: model used during training and validation
        test_loader: data loader object containing testing data
        criterion: loss function used
    
    Returns:
        test_loss: calculated loss during testing
        accuracy: calculated accuracy during testing
        y_proba: predicted class probabilities
        y_truth: ground truth of testing data
    '''
    
    y_proba = []
    y_truth = []
    test_loss = 0
    total = 0
    correct = 0
    # Run on whichever device the model currently lives on
    model_device = next(model.parameters()).device
    model.eval()
    with torch.no_grad():
        for X, y in tqdm(test_loader):
            X, y = X.to(model_device), y.to(model_device)
            output = model(X)
            test_loss += criterion(output, y.long()).item()
            # Convert logits to class probabilities; column 1 is the positive class
            proba = torch.softmax(output, dim=1)
            y_proba.extend(proba[:, 1].tolist())
            y_truth.extend(y.tolist())
            correct += (output.argmax(dim=1) == y).sum().item()
            total += y.size(0)

    accuracy = correct / total
    # Average the summed batch losses
    test_loss /= len(test_loader)

    return test_loss, accuracy, np.array(y_proba, dtype=float), np.array(y_truth, dtype=float)


loss, acc, y_proba, y_truth = overall_accuracy(model, test_loader, criterion = nn.CrossEntropyLoss())


print(f"Accuracy: {acc}")

print(pd.Series(y_truth).value_counts())

Accuracy Preview

Model Evaluation — ROC Curve:

# Plot ROC curve:

def plot_ROCAUC_curve(y_truth, y_proba, fig_size):
    
    '''
    Plots the Receiver Operating Characteristic Curve (ROC) and displays Area Under the Curve (AUC) score.
    
    Args:
        y_truth: ground truth for testing data output
        y_proba: class probabilities predicted by the model
        fig_size: size of the output pyplot figure
    
    Returns: void
    '''
    
    fpr, tpr, threshold = roc_curve(y_truth, y_proba)
    auc_score = roc_auc_score(y_truth, y_proba)
    txt_box = "AUC Score: " + str(round(auc_score, 4))
    plt.figure(figsize=fig_size)
    plt.plot(fpr, tpr)
    plt.plot([0, 1], [0, 1],'--')
    plt.annotate(txt_box, xy=(0.65, 0.05), xycoords='axes fraction')
    plt.title("Receiver Operating Characteristic (ROC) Curve")
    plt.xlabel("False Positive Rate (FPR)")
    plt.ylabel("True Positive Rate (TPR)")
#     plt.savefig('ROC.png')
plot_ROCAUC_curve(y_truth, y_proba, (8, 8))

ROC Curve
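
If you are curious what roc_curve and roc_auc_score compute, here is a minimal NumPy-only sketch. It is not scikit-learn's implementation: the helper roc_points is made up for this example, it assumes binary labels in {0, 1} and no tied scores, and the four labels and scores are toy values.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays obtained by sweeping the threshold over the scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)       # highest score first
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)    # true positives at each threshold
    fps = np.cumsum(y_sorted == 0)    # false positives at each threshold
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)

# Area under the curve with the trapezoidal rule
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
print(auc)  # 0.75
```

An AUC of 0.5 corresponds to the dashed diagonal in the plot (random guessing), and 1.0 to a perfect classifier.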

Model Evaluation Confusion Matrix

from sklearn.metrics import confusion_matrix
import seaborn as sn
import pandas as pd

y_pred = []
y_true = []

net = model
net.eval()
net_device = next(net.parameters()).device
# iterate over test data (no gradients are needed for evaluation)
with torch.no_grad():
    for inputs, labels in test_loader:
        output = net(inputs.to(net_device))  # Feed Network

        # The predicted class is the index of the largest logit
        y_pred.extend(output.argmax(dim=1).cpu().numpy())  # Save Prediction
        y_true.extend(labels.cpu().numpy())  # Save Truth

# constant for classes
classes = ('cats', 'dogs')

# Build confusion matrix
cf_matrix = confusion_matrix(y_true, y_pred)
df_cm = pd.DataFrame(cf_matrix/np.sum(cf_matrix), index = [i for i in classes],
                     columns = [i for i in classes])
plt.figure(figsize = (12,7))
sn.heatmap(df_cm, annot=True)
# plt.savefig('cm.png')

Confusion Matrix for cats-dogs dataset

Model Inference on New Images:

# Inference on Single Images (cats-dogs):
test_image = "new_cat_image.jpg"
test_image_null = "new_dog_image.png"
image = Image.open(test_image).convert("RGB")
image_null = Image.open(test_image_null).convert("RGB")

# Define tensor transform (same input size as used for training) and apply it:
data_transform = torchvision.transforms.Compose([torchvision.transforms.Resize((IMG_SIZE, IMG_SIZE)), ToTensor()])
model_device = next(model.parameters()).device
image_t = data_transform(image).unsqueeze(0).to(model_device)
image_null_t = data_transform(image_null).unsqueeze(0).to(model_device)

# Class names in ImageFolder order (alphabetical by folder name):
classes = ('cats', 'dogs')

# Prediction:
model.eval()
with torch.no_grad():
    out_cat = model(image_t)
    out_dog = model(image_null_t)
print("predicted cat tensor:", out_cat)
print("predicted dog tensor:", out_dog)
print("")

# Print the predicted class and show the first image:
print(classes[out_cat.argmax(dim=1).item()])
plt.figure(figsize=(2, 2))
plt.imshow(image)
plt.show()

# Print the predicted class and show the second image:
print(classes[out_dog.argmax(dim=1).item()])
plt.figure(figsize=(2, 2))
plt.imshow(image_null)
plt.show()

Appendix :

ViT Hyper-Parameters:

1. image_size: int (max size of w or h)
2. patch_size: int (side length of each square patch in pixels; image_size must be divisible by patch_size, and the number of patches, (image_size // patch_size) ** 2, MUST be greater than 16)
3. num_classes: int (# of classes)
4. dim: int (last dimension of output tensor after linear transformation nn.Linear(..,dim))
5. depth: int (# of transformer blocks)
6. heads: int (# of heads in Multi-head Attention layer)
7. mlp_dim: int (dimension of the MLP-feedforward layer)
8. channels: int (image channels = 3)
9. dropout: float (between [0,1] — dropout rate of neurons)
10. emb_dropout (between [0,1] — dropout rate of embeddings — usually is 0)

ViT Learning Rate & Loss Function:

Optimizer: ADAM

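Learning Rate Scheduler: StepLR (multiplies the LR by gamma every step_size epochs)

Loss Function: CrossEntropy (for a binary task you can also try binary cross-entropy: nn.BCEWithLogitsLoss() with a single output logit, i.e. num_classes=1, and float labels)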
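
To get a feeling for how quickly the learning rate shrinks with these settings, here is a plain-Python sketch of the StepLR rule. It does not call PyTorch; the helper step_lr is made up for illustration and assumes the scheduler is stepped once at the end of every epoch.

```python
lr = 3e-5
gamma = 0.7
step_size = 1
epochs = 20

def step_lr(epoch):
    """Learning rate in effect during the given (0-based) epoch under StepLR-style decay."""
    return lr * gamma ** (epoch // step_size)

schedule = [step_lr(e) for e in range(epochs)]
for e in (0, 1, 5, 19):
    print(f"epoch {e + 1:2d}: lr = {schedule[e]:.2e}")
```

With gamma = 0.7 and step_size = 1 the learning rate drops by roughly three orders of magnitude over 20 epochs, so a larger step_size or gamma may be worth trying if training stalls early.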

Outro:

I hope my tutorial was of help!