<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Shubham Bhatt on Medium]]></title>
        <description><![CDATA[Stories by Shubham Bhatt on Medium]]></description>
        <link>https://medium.com/@shubhambhatt838?source=rss-efbe65caf3d2------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Stories by Shubham Bhatt on Medium</title>
            <link>https://medium.com/@shubhambhatt838?source=rss-efbe65caf3d2------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Fri, 15 May 2026 19:03:55 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@shubhambhatt838/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Why Anchor boxes?]]></title>
            <link>https://medium.com/@shubhambhatt838/why-anchor-boxes-58913f0449b6?source=rss-efbe65caf3d2------2</link>
            <guid isPermaLink="false">https://medium.com/p/58913f0449b6</guid>
            <dc:creator><![CDATA[Shubham Bhatt]]></dc:creator>
            <pubDate>Sun, 07 Apr 2024 10:49:54 GMT</pubDate>
            <atom:updated>2024-04-07T10:49:54.682Z</atom:updated>
            <content:encoded><![CDATA[<p>Have you ever wondered why anchor boxes come into the picture? Let us understand that today.</p><p>When we pass hundreds of anchor boxes as input, what happens to them?</p><p>Anchor boxes play a crucial role in the loss calculation during the training of object detection models. They primarily aid in training the model to accurately predict bounding boxes for objects of various sizes and aspect ratios.</p><p>Once the anchor boxes are passed to the object detection model, they are adjusted to fit the objects in the image. Once adjusted, they are filtered based on the factors below:</p><ol><li><strong>Prediction and adjustment</strong> —</li></ol><p>a) Objectness — anchor boxes with a low probability of containing an object are discarded.</p><p>b) Predictions on how to adjust the anchor box dimensions (scale, height, and width) and position (centre coordinates) to best fit the actual object in the image.</p><p>2. <strong>NMS</strong> — The next step is to remove the overlapping anchors. NMS keeps the box with the highest objectness score while removing other boxes that overlap it beyond a certain threshold (based on Intersection over Union, or IoU). NMS is applied per class to ensure that objects of different classes are detected separately, even if their bounding boxes overlap significantly.</p><p>3. <strong>Output</strong> — The final output after applying NMS is a set of bounding boxes, each associated with a class label and a confidence score. These represent the model’s predictions for the locations and identities of objects in the image.</p><p>4. <strong>Loss Calculation</strong>: The model is trained using a loss function that penalizes the difference between the predicted bounding boxes and their corresponding ground truth boxes. Anchor boxes are categorized as positive or negative based on their IoU (Intersection over Union) with the ground truth boxes. Positive anchors have a high IoU with a ground truth box and are responsible for detecting objects, while negative anchors have a low IoU and are treated as background or non-object regions. The loss calculation is applied only to positive anchors, focusing on refining their localization and classification predictions. The loss function incorporates terms for localization (bounding box regression) and classification.</p><p><strong>Are anchor boxes used when running inference with a trained model?</strong></p><p>During inference, the trained model uses its learned parameters to make predictions directly on the input image. The anchor boxes are fixed ahead of time, so no matching or filtering against ground truth is needed; the model predicts box offsets relative to the same predefined anchors, along with class probabilities, directly from the features extracted from the input image.</p>
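<p>To make the filtering in step 2 concrete, below is a minimal NumPy sketch of greedy NMS. It is an illustrative implementation, not taken from any particular detector’s codebase; boxes and scores are assumed to be NumPy arrays, boxes in [x1, y1, x2, y2] format, and for per-class NMS you would simply run it once per class.</p><pre>import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes
    that overlap it beyond iou_threshold, and repeat."""
    order = scores.argsort()[::-1]   # indices, highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the kept box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou &lt;= iou_threshold]  # keep low-overlap boxes only
    return keep</pre>]]></content:encoded>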
        </item>
        <item>
            <title><![CDATA[MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications]]></title>
            <link>https://medium.com/@shubhambhatt838/mobilenets-efficient-convolutional-neural-networks-for-mobile-vision-applications-920f2ae12d1d?source=rss-efbe65caf3d2------2</link>
            <guid isPermaLink="false">https://medium.com/p/920f2ae12d1d</guid>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[mobilenet]]></category>
            <category><![CDATA[classification]]></category>
            <category><![CDATA[simple-explanations]]></category>
            <dc:creator><![CDATA[Shubham Bhatt]]></dc:creator>
            <pubDate>Mon, 11 Dec 2023 19:24:54 GMT</pubDate>
            <atom:updated>2023-12-11T19:27:06.132Z</atom:updated>
            <content:encoded><![CDATA[<p>Today we will see the finer details of the MobileNet architecture. It is a lightweight architecture, useful for mobile and embedded vision applications. The paper describes an efficient network architecture and a set of two hyper-parameters to build very small, low-latency models that can be easily matched to the design requirements of mobile and embedded vision applications.</p><p>MobileNets are built primarily from depth-wise separable convolutions, which were used in Inception models to reduce the computation in the first few layers.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/692/1*F9NEkwTINxI4UGkPboGVtg.png" /><figcaption>MobileNet Architecture</figcaption></figure><p>The above table shows the architecture of MobileNet, which utilizes depth-wise and point-wise convolutions, explained further below. The flatten and softmax layers at the end classify the image.</p><p>The diagram below compares standard convolution with the block used in this architecture, which reduces computation by roughly 8 to 9 times.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/706/1*EC8GT6ymtB_5F1xac95aVw.png" /><figcaption>Comparison of standard convolution and MobileNet depthwise separable conv block</figcaption></figure><p>MobileNets use batch norm and ReLU nonlinearities for both layers (depthwise and pointwise).</p><p>The MobileNet architecture is broadly built upon two concepts:</p><p>1) Depthwise separable convolution</p><p>2) Two model-shrinking hyperparameters:</p><blockquote>A) Width multiplier</blockquote><blockquote>B) Resolution multiplier</blockquote><p><strong>Depthwise Separable Convolution:</strong></p><p>The MobileNet model is based on depthwise separable convolutions, a form of factorized convolution which factorizes a standard convolution into a depthwise convolution and a 1×1 convolution called a pointwise convolution. A standard convolution both filters and combines inputs into a new set of outputs in one step. The depthwise separable convolution splits this into two layers: a separate layer for filtering and a separate layer for combining. This factorization has the effect of drastically reducing computation and model size. The figure above shows a <strong>standard convolution</strong> factorized into</p><blockquote><strong>a)</strong> a depthwise convolution and</blockquote><blockquote><strong>b)</strong> a 1 x 1 pointwise convolution.</blockquote><p>Depthwise separable convolutions are thus made up of two layers: depthwise convolutions and pointwise convolutions. Depthwise convolutions apply a single filter to each input channel (input depth).</p><p>A convolutional kernel of size Dk x Dk applied to each of the M input channels contributes a total computation cost of:</p><blockquote>Dk x Dk x M x Df x Df (where Df is the spatial dimension of the feature map)</blockquote><p>Depthwise convolution is extremely efficient relative to standard convolution; however, it only filters input channels, it does not combine them to create new features. So an additional layer that computes a linear combination of the depthwise outputs is needed: a 1 x 1 convolution, also known as a pointwise convolution. Combining both gives a depthwise separable convolution.</p>
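<p>As a concrete illustration of this block, here is a minimal PyTorch sketch of a depthwise separable convolution (a hypothetical helper, not the authors’ reference code; the channel counts are illustrative). The depthwise step is expressed as a grouped convolution with groups equal to the number of input channels:</p><pre>import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    """3x3 depthwise conv followed by 1x1 pointwise conv,
    each with batch norm and ReLU, as in a MobileNet block."""
    return nn.Sequential(
        # depthwise: one 3x3 filter per input channel (groups=in_ch)
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                  padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # pointwise: 1x1 conv that linearly combines the channels
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

block = depthwise_separable(32, 64)  # e.g. an early separable block</pre>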
<p>The total cost of a depthwise separable convolution (depthwise plus pointwise) is:</p><blockquote><strong>Dk x Dk x M x Df x Df + M x N x Df x Df</strong></blockquote><p>By expressing convolution as a two-step process of filtering and combining, we get a reduction in computation of:</p><blockquote><strong>(Dk x Dk x M x Df x Df + M x N x Df x Df) / (Dk x Dk x M x N x Df x Df)</strong></blockquote><p>which is equivalent to:</p><blockquote><strong>1/N + 1/(Dk)²</strong></blockquote><p>Let’s talk about the other specialities of MobileNet:</p><p><strong>Width Multiplier:</strong></p><p>Many times there is a requirement to build an even simpler and less computationally expensive model. This architecture introduces a very simple parameter α called the width multiplier. The role of the width multiplier α is to thin a network uniformly at each layer. For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN.</p><p>The computational cost of a depthwise separable convolution with width multiplier α is:</p><blockquote><strong>Dk x Dk x αM x Df x Df + αM x αN x Df x Df</strong></blockquote><p><strong>Resolution Multiplier:</strong></p><p>The second hyper-parameter to reduce the computational cost of a neural network is the resolution multiplier ρ. It is applied to the input image, and the internal representation of every layer is subsequently reduced by the same multiplier. The computational cost with both multipliers applied is:</p><blockquote>Dk x Dk x αM x ρDf x ρDf + αM x αN x ρDf x ρDf</blockquote><p>Below are a few interesting facts and results:</p><p>The impact of the depthwise separable layer can be seen in the table below, wherein the parameters &amp; MAC operations are reduced to a large extent:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/708/1*i53G9alpcp5ZdkcWMKCs6w.png" /><figcaption>Depthwise Separable vs Full Convolution MobileNet</figcaption></figure><p>MobileNet, with fewer parameters &amp; MAC operations, can match the accuracy of other models that have higher MACs and parameter counts.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/708/1*SrB5UV6b7gS7bsNLfMGbvQ.png" /><figcaption>Accuracy comparison with a few standard models</figcaption></figure>
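<p>A quick arithmetic check of the reduction ratio above (the layer sizes here are only an example for illustration, not a claim from the paper):</p><pre>def standard_cost(Dk, M, N, Df):
    # standard convolution: Dk x Dk x M x N x Df x Df multiply-adds
    return Dk * Dk * M * N * Df * Df

def separable_cost(Dk, M, N, Df):
    # depthwise (Dk x Dk x M x Df x Df) + pointwise (M x N x Df x Df)
    return Dk * Dk * M * Df * Df + M * N * Df * Df

Dk, M, N, Df = 3, 512, 512, 14  # an illustrative middle layer
print(separable_cost(Dk, M, N, Df) / standard_cost(Dk, M, N, Df))  # ~0.113
print(1 / N + 1 / Dk**2)  # same value: 1/N + 1/(Dk)^2, about 8.8x fewer ops</pre>]]></content:encoded>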
        </item>
        <item>
            <title><![CDATA[Intersection Over Union (IoU) Maths]]></title>
            <link>https://medium.com/@shubhambhatt838/intersection-over-union-iou-maths-84513cee6f48?source=rss-efbe65caf3d2------2</link>
            <guid isPermaLink="false">https://medium.com/p/84513cee6f48</guid>
            <dc:creator><![CDATA[Shubham Bhatt]]></dc:creator>
            <pubDate>Sat, 22 Jul 2023 18:36:47 GMT</pubDate>
            <atom:updated>2023-07-22T18:36:47.509Z</atom:updated>
            <content:encoded><![CDATA[<p>This article demonstrates the maths behind IoU (Intersection over Union).</p><p>As the first step, we will calculate the <strong>intersection</strong> (shown as the yellow region) and the <strong>union</strong> (the total area covered by the blue and red boxes).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/680/1*xJmN9NdFCcsSvVNNMdGKiw.png" /><figcaption>Intersection over Union Explained and PyTorch Implementation (www.youtube.com)</figcaption></figure><p>To apply the formula, we start by calculating the <strong>intersection area</strong> as follows:</p><p>box1 (blue) = [x1, y1, x2, y2]</p><p>box2 (red) = [x11, y11, x22, y22]</p><p>x1 (of intersection) = max(box1[0], box2[0])</p><p>x2 (of intersection) = min(box1[2], box2[2])</p><p>y1 (of intersection) = max(box1[1], box2[1])</p><p>y2 (of intersection) = min(box1[3], box2[3])</p><p>If the boxes do not overlap, x2 - x1 or y2 - y1 becomes negative, so the intersection width and height are clamped at zero.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4TQk9zAAkaXbsBsLdVgbMw.png" /><figcaption>Intersection over Union Explained and PyTorch Implementation (www.youtube.com)</figcaption></figure><p>To calculate the <strong>union</strong>, we simply add the area of box 1 and the area of box 2 and then subtract the common area, since we have counted it twice, i.e.</p><blockquote><strong>area of box1 + area of box2 - intersection</strong></blockquote><p>Code snippet for implementing the same:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/745/1*OYrIXNXazBtwhhR1nE8hKQ.png" /></figure>
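<p>For readers who prefer text to a screenshot, here is a minimal plain-Python sketch of the same calculation (the box format is assumed to be [x1, y1, x2, y2], as above):</p><pre>def iou(box1, box2):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    # corners of the intersection rectangle
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    # clamp at 0 so non-overlapping boxes give zero intersection
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

print(iou([0, 0, 4, 4], [2, 2, 6, 6]))  # 4 / 28 = 0.142...</pre><p>Hopefully this simplifies the maths behind the operation.<br>Happy Learning!</p>]]></content:encoded>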
        </item>
        <item>
            <title><![CDATA[Version Control using GIT]]></title>
            <link>https://medium.com/@shubhambhatt838/version-control-using-git-5d0e64580e3a?source=rss-efbe65caf3d2------2</link>
            <guid isPermaLink="false">https://medium.com/p/5d0e64580e3a</guid>
            <category><![CDATA[avoid-code-conflict]]></category>
            <category><![CDATA[git]]></category>
            <category><![CDATA[savings-code]]></category>
            <category><![CDATA[version-control]]></category>
            <dc:creator><![CDATA[Shubham Bhatt]]></dc:creator>
            <pubDate>Sun, 16 Jan 2022 18:44:04 GMT</pubDate>
            <atom:updated>2022-01-16T18:44:04.532Z</atom:updated>
            <content:encoded><![CDATA[<p>A version control system allows saving the changes made to a file or directory in a project.</p><p>The benefits of Git:</p><ul><li>You cannot overwrite files unknowingly, as Git notifies you before you overwrite</li><li>Git can synchronize work done by different people on different machines</li><li>Git stores all its information in a directory called .git, located in the root directory of the project. It is hidden by default to avoid accidental edits or deletion.</li></ul><blockquote><strong>Create a new Git repo:</strong></blockquote><p><strong><em>git init project_name</em></strong> creates a repository for a new project. Inside an existing project directory, use <strong><em>git init</em></strong> to initialize the repository.</p><blockquote><strong>Cloning a repo</strong></blockquote><p>Cloning copies a repo into a new directory (or simply onto a new machine):</p><p><strong><em>git clone </em></strong><a href="http://www.sample.com/project.git"><strong><em>www.sample.com/project.git</em></strong></a></p><blockquote><strong>How can I find out where a cloned repo originated?</strong></blockquote><p>When you clone a repository, Git remembers where the original repo was. It does this by storing a remote in the new repository’s configuration.</p><p><strong><em>git remote -v</em></strong> shows the remote URLs.</p><p>When you clone a repository, Git automatically creates a remote called origin that points to the original repository. You can add more remotes using:</p><p><strong><em>git remote add remote-name URL</em></strong></p><p>and remove existing ones using:</p><p><strong><em>git remote rm remote-name</em></strong></p><p>You can connect any two Git repositories this way, but in practice, you will almost always connect repositories that share some common ancestry.</p><p>Once you have an online repo (on GitHub or Bitbucket, say), you will want to pull changes from it and push changes to it.</p><blockquote><strong>Pull</strong></blockquote><p>Pulling changes is straightforward: the command below gets everything in the given branch of the remote repository identified by remote and merges it into the current branch of your local repository.</p><p><strong><em>git pull remote branch</em></strong></p><p>For example, if you are in the “quarterly-report” branch of your local repository, the command <strong><em>git pull thunk latest-analysis</em></strong> would get changes from the “latest-analysis” branch in the repository associated with the remote called “thunk” and merge them into your “quarterly-report” branch.</p><blockquote><strong>How can I push my changes to a remote repository?</strong></blockquote><p>The complement of git pull is git push, which pushes the changes you have made locally into a remote repository. The most common way to use it is <strong><em>git push remote-name branch-name</em></strong>, which pushes the contents of your branch “branch-name” into a branch with the same name in the remote repository associated with “remote-name”. It’s possible to use different branch names at your end and the remote’s end, but doing this quickly becomes confusing: it’s almost always better to use the same names for branches across repositories.</p><blockquote><strong>What happens if my push conflicts with someone else’s work?</strong></blockquote><p>Overwriting your own work by accident is bad; overwriting someone else’s is worse. To prevent this from happening, Git does not allow you to push changes to a remote repository unless you have merged the contents of the remote repository into your own work.</p><blockquote><strong>Checking the state of the repository</strong></blockquote><p><strong><em>git status</em></strong> checks the status of the repository, i.e. it displays the list of files that have been modified since the last time changes were saved.</p><p>Git has a staging area in which it stores the files with changes you want to save but haven’t saved yet. Committing moves the contents of this staging area into the .git directory, where they can no longer be changed. git status shows the files in the staging area, as well as files with changes that haven’t yet been put there.</p><p><strong><em>git diff filename</em></strong> can be used to check the changes in a single file; plain <strong><em>git diff</em></strong> shows the changes in all files, and <strong><em>git diff directory</em></strong> shows the changes to all files in a directory.</p><p>The <strong><em>diff</em></strong> output shows the formatted difference between two versions of a file.</p><p>Sample output:</p><p><em>diff --git a/report.txt b/report.txt</em></p><p><em>index e713b17..4c0742a 100644</em></p><p><em>--- a/report.txt</em></p><p><em>+++ b/report.txt</em></p><p><em>@@ -1,4 +1,5 @@</em></p><p><em>-# Seasonal Dental Surgeries 2017–18 <br>+# Seasonal Dental Surgeries (2017) 2017–18 <br>+# TODO: write new summary</em></p><ul><li>The first line is the command used to produce the output (in this case, <strong><em>diff --git</em></strong>). In it, a and b are placeholders meaning “the first version” and “the second version”.</li><li>An index line shows keys into Git’s internal database of changes.</li><li>--- a/report.txt and +++ b/report.txt indicate that lines being <em>removed</em> are prefixed with -, while lines being added are prefixed with +.</li><li>A line starting with @@ tells <em>where</em> the changes are being made. The pairs of numbers are the start line and the number of lines in that section of the file where changes occurred. This diff output indicates changes starting at line 1, with 5 lines where there were once 4.</li><li>A line-by-line listing of the changes follows, with - showing deletions and + showing additions (Git can also be configured to show deletions in red and additions in green). Lines that <em>haven’t</em> changed are sometimes shown before and after the ones that have in order to give context; when they appear, they <em>don’t</em> have either + or - in front of them.</li><li>The unique identifier (hash) of a file remains the same if the file is unchanged.</li></ul><blockquote><strong>Adding files</strong></blockquote><p><strong><em>git add filename</em></strong> adds the file(s) to the staging area. If you mistakenly stage a file, you can unstage it using <strong><em>git reset HEAD</em></strong>.</p><blockquote><strong>Comparing files</strong></blockquote><ul><li><strong><em>git diff -r HEAD</em></strong> compares the files in the staging area to the most recently committed versions; the -r flag means “compare to a particular revision”.</li><li><strong><em>git diff -r HEAD path/to/file</em></strong> does the same for a specific file.</li></ul><blockquote><strong>Saving changes from staging</strong></blockquote><p><strong><em>git commit</em></strong> saves the changes in the staging area. Git allows entering a log message, which helps to track the changes, e.g. git commit -m “Program appears to have become self-aware.”</p><p>In case you mistype the commit message, use git commit --amend -m “new message”.</p><p><strong><em>git log</em></strong><em> shows the log of the project’s history.</em></p><p><strong><em>git log path/to/file</em></strong> shows the history of a specific file instead of the entire project.</p><blockquote><strong>HOW DOES GIT STORE THE DATA?</strong></blockquote><p>Git stores data in a three-level structure:</p><p><strong>Commit</strong> contains the metadata, such as the author, the commit message, and the time the commit happened.</p><p><strong>Tree</strong> tracks the names and locations of the files in the repository when that commit happened.</p><p><strong>Blob</strong> contains a snapshot of the contents of a file when the commit happened. Reusing the blobs of unchanged files saves memory.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/753/1*RV3EhG07rqKcwzF2DJ2NyQ.png" /><figcaption>Three-Level Structure of Git Storage</figcaption></figure><p><strong><em>git log</em></strong> displays the overall history of a project or file, while <strong><em>git annotate filename</em></strong> shows who made the changes to each line of a file and when. If <strong><em>git log -3 filename</em></strong> is passed, only the last three changes to that file are shown.</p><p><strong><em>git diff</em></strong> can also compare two commits and show what changed between them.</p><blockquote><strong>Configuring the settings</strong></blockquote><p><strong><em>git config --list</em></strong> shows the settings; you can add the flags --local, --global, or --system. Local settings take precedence over global settings. To check a specific setting, use <strong><em>git config --global setting</em></strong> (e.g. git config --global user.email).</p><blockquote><strong>How can I undo changes to unstaged files?</strong></blockquote><p>Suppose you have made changes to a file and then decide you want to undo them. Your text editor may be able to do this, but a more reliable way is to let Git do the work. The command:</p><p><strong><em>git checkout -- filename</em></strong></p><p>will discard the changes that have not yet been staged. (The double dash -- must be there to separate the <strong><em>git checkout</em></strong> command from the names of the file or files you want to recover.)</p><blockquote><strong>How can I undo changes to staged files?</strong></blockquote><p>If you use just <strong><em>git reset</em></strong>, it will unstage everything in the staging area, while <strong><em>git reset HEAD folder</em></strong> will unstage all the files in that folder.</p><p>By combining <strong><em>git reset</em></strong> with <strong><em>git checkout</em></strong>, you can undo changes to a file that you had already staged. The syntax is as follows:</p><p><strong><em>git reset HEAD path/to/file</em></strong></p><p><strong><em>git checkout -- path/to/file</em></strong></p><blockquote><strong>How do I restore an old version of a file?</strong></blockquote><p>The syntax for restoring an old version takes two arguments: the hash that identifies the version you want to restore, and the name of the file.</p><p>For example, if <strong><em>git log</em></strong> shows this:</p><p><em>commit ab8883e8a6bfa873d44616a0f356125dbaccd9ea <br>Author: Rep Loop <br>Date: Thu Oct 19 09:37:48 2017 -0400 <br> <br> Adding graph to show latest quarterly results.</em></p><p><em>commit 2242bd761bbeafb9fc82e33aa5dad966adfe5409 <br>Author: Rep Loop <br>Date: Thu Oct 16 09:17:37 2017 -0400</em></p><p><em>Modifying the bibliography format.</em></p><p>then <strong><em>git checkout 2242bd report.txt</em></strong> would replace the current version of report.txt with the version that was committed on October 16. Notice that this is the same syntax you used to undo unstaged changes, except -- has been replaced by a hash.</p><p>Restoring a file doesn’t erase any of the repository’s history. Instead, the restored version is staged as a new change, and committing it records the restoration as another commit, because you might later want to undo your undoing.</p><blockquote><strong>What is a branch?</strong></blockquote><p><strong>Branches</strong> allow you to have multiple versions of your work and let you track each version systematically. Work on one branch does not affect other branches (until you <strong>merge</strong> them back together).</p><p>By default, master is the branch every repository starts with. To get the list of branches a project has, use git branch.</p><p><strong><em>git checkout -b branch-name</em></strong> creates a new branch.</p><p>To compare two versions of the repo, you can use <strong><em>git diff revision-1..revision-2</em></strong>; likewise, <strong><em>git diff branch-1..branch-2</em></strong> shows the difference between two branches.</p><p>To switch branches, use <strong><em>git checkout branch_name</em></strong>.</p><blockquote><strong>Merging two branches:</strong></blockquote><p>While branches let work proceed in parallel, merging brings them back together: <strong><em>git merge source destination</em></strong>. If the changes don’t overlap, the result is a new commit in the destination branch that includes everything from the source branch.</p>
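<p>To tie the basic commands together, here is a small sketch that scripts the init-add-commit cycle from Python with the standard subprocess module (the file name and commit message are just placeholders; it assumes git is installed and on your PATH):</p><pre>import os, pathlib, subprocess, tempfile

def git(*args):
    """Run a git command and return its captured output."""
    result = subprocess.run(["git", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout

os.chdir(tempfile.mkdtemp())      # throwaway directory for the demo
git("init")                       # creates the hidden .git directory
git("config", "user.name", "Demo User")          # so commit won't fail
git("config", "user.email", "demo@example.com")  # on a fresh machine
pathlib.Path("report.txt").write_text("draft\n") # a placeholder file
print(git("status"))              # report.txt shows up as untracked
git("add", "report.txt")          # move the change into the staging area
git("commit", "-m", "Add first draft of report")
print(git("log", "--oneline"))    # one commit in the history</pre>]]></content:encoded>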
        </item>
        <item>
            <title><![CDATA[SSD explained]]></title>
            <link>https://medium.com/@shubhambhatt838/ssd-explained-b4ecfe16ceb3?source=rss-efbe65caf3d2------2</link>
            <guid isPermaLink="false">https://medium.com/p/b4ecfe16ceb3</guid>
            <category><![CDATA[ssd]]></category>
            <category><![CDATA[single-shot-detection]]></category>
            <dc:creator><![CDATA[Shubham Bhatt]]></dc:creator>
            <pubDate>Mon, 03 Jan 2022 02:27:44 GMT</pubDate>
            <atom:updated>2022-01-03T02:27:44.174Z</atom:updated>
            <content:encoded><![CDATA[<p>Object detection models are broadly classified into two categories: single-stage and multi-stage. SSD is one of the single-stage object detection models, which makes it much faster than multi-stage architectures, including Faster R-CNN.</p><p>SSD has the following key features:</p><ul><li>Multi-scale feature maps for detection — after the base network, i.e. VGG16, convolutional layers progressively decrease the feature map size, which allows predictions at multiple scales.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*moJxh0OwA7ZXwRDr8PtuVg.png" /></figure><ul><li>Convolutional predictors for detection — at the end of the base network, several feature layers are added, which produce a fixed set of detection predictions using a set of convolutional filters. Each feature layer, with a small kernel, predicts either a score for a category or a shape offset relative to the default box coordinates.</li><li>Default boxes and aspect ratios — default boxes are similar to anchor boxes in Faster R-CNN. However, unlike Faster R-CNN, they are applied at several feature maps of different resolutions. We predict the offsets relative to the default box shapes in each cell, as well as the per-class scores that indicate the presence of a class instance in each of those boxes. By default, SSD uses 6 default boxes per location.</li><li>Matching strategy — during training, default boxes need to be matched with the ground truth boxes. This is done using the Jaccard overlap (IoU).</li><li>Training objective — the training objective is a weighted sum of the localization loss and the confidence loss (computed with softmax). The localization loss is a Smooth L1 loss between the predicted box and the ground truth box.</li></ul><p>L(x, c, l, g) = 1/N (L<em>conf</em>(x, c) + αL<em>loc</em>(x, l, g)), where N is the number of matched default boxes (if N = 0, the loss is set to 0).</p><ul><li>Choosing scales and aspect ratios for default boxes — instead of preprocessing the image at different sizes and combining the results afterwards, SSD tiles default boxes so that specific feature maps learn to be responsive to particular scales of objects. Lower layers’ feature maps capture finer details of the input objects, which can improve detection quality, so SSD uses both lower and upper feature maps for detection.</li><li>Hard negative mining — after the matching step, most default boxes are negative, especially when the number of possible default boxes is large. This introduces a significant imbalance between positive and negative training examples. Instead of using all negatives, SSD sorts the negative examples by their confidence loss and picks the top ones so that the ratio between negatives and positives is at most 3:1 (see the sketch at the end of this post). This leads to faster and more stable optimization.</li><li>Data augmentation for small-object accuracy — augmenting the dataset with zoom-in and zoom-out operations on the images, especially for small objects, improved the results by 2%-3% mAP across multiple datasets.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*R0kVJFqteTK3UdB7tS2PLw.png" /></figure><p>Accuracy over the PASCAL VOC2007 dataset is as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BpZE4KcAYVMruCKI7s18EQ.png" /></figure>
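<p>As a rough illustration of the hard negative mining step described above, here is a minimal NumPy sketch of the 3:1 selection rule (the function, array names, and shapes are illustrative, not taken from the paper’s code):</p><pre>import numpy as np

def hard_negative_mine(conf_loss, positive_mask, neg_pos_ratio=3):
    """Select the highest-loss negative default boxes so that
    negatives : positives is at most neg_pos_ratio : 1.
    conf_loss: (N,) confidence loss per default box
    positive_mask: (N,) bool, True where a box matched a ground truth"""
    num_pos = int(positive_mask.sum())
    num_neg = min(neg_pos_ratio * num_pos, int((~positive_mask).sum()))
    # rank only the negatives: positives are pushed to the bottom
    neg_loss = np.where(positive_mask, -np.inf, conf_loss)
    hardest = np.argsort(neg_loss)[::-1][:num_neg]
    selected = positive_mask.copy()
    selected[hardest] = True
    return selected  # boxes that contribute to the confidence loss</pre>]]></content:encoded>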
        </item>
    </channel>
</rss>