Comprehensive Guide: Top Computer Vision Resources All in One Blog
Save this blog for comprehensive resources for computer vision
Working in computer vision and deep learning is fantastic because, every few months, someone comes up with something crazy that completely changes your perspective on what is feasible.
After spending 2+ years in this field, I have found many interesting and helpful resources for computer vision work. They also show how huge this domain is. I have arranged the topics in the order you would meet them while working on a computer vision project. So, without further ado, let's start.
Dataset generation
To train and evaluate computer vision models, we need data. A dataset is a collection of samples (in this case, photos or videos), usually drawn from a specific topic or domain. Open datasets are those that anybody may access, download, and use, subject to each dataset's license. Your dataset should contain images of every class you want the model to predict. You can use the resources below to create your data.
★Kaggle image datasets: Link
Users of Kaggle may discover and share data sets, study and develop models in a web-based data science environment, and collaborate with other data scientists and computer vision experts.
★ Datagen: Link
There are other sites where you can download basic images and then augment or process them. The sites below are free to use and host millions of images.
★ Unsplash
★ Pexels
★ Pixabay
I found these websites especially helpful because the images are easy to download and process. I also described 2 techniques for dataset creation in my blog, so you can refer to it for further resources.
Annotation:
I purposefully placed annotation before augmentation because many annotation tools now have a facility for augmentation.
Annotation is the process of marking a mask or bounding box around each target in an image so that the model can learn the features of each category.
★ Roboflow: inbuilt facility for augmentation and annotation
★ makesense.ai: Very helpful for annotation, and it also has a collaboration facility
★ VGG annotator: Useful for faster masking
★ LabelImg
★ V7
★ Labelbox
★ Scale AI
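To make the bounding-box idea concrete, here is a minimal sketch of how a box drawn in an annotation tool can be stored as a label. It assumes the commonly used YOLO-style text convention (class id, then box centre and size, all divided by the image size). The function name `to_yolo_label` and the sample numbers are made up for illustration and are not taken from any of the tools listed above.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box into a YOLO-style label line.

    Assumed output layout: "<class> <x_center> <y_center> <width> <height>",
    with the four box values normalised to the range 0..1.
    """
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive size")
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: one object of class 0 in a 640x480 image.
print(to_yolo_label(0, 100, 50, 300, 250, 640, 480))
# -> 0 0.312500 0.312500 0.312500 0.416667
```

Normalised coordinates keep the label valid even if the image is resized later, which is one reason this layout is popular.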
Augmentation of images:
Image data augmentation is the process of generating new transformed versions of images from the given image dataset to increase its diversity.
★ Roboflow: [Again !!!!] One of the best augmentation tools for huge datasets
★ image_augmentor: Helps you apply a variety of augmentations quickly
★ My blog: I wrote code that uses a library to augment images; you can check it out and customize it.
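As a rough illustration of what an augmentation does, the sketch below mirrors a tiny image left to right and moves its bounding box to match. It is plain Python written for this post, not the API of Roboflow or image_augmentor. The helper names `hflip_image` and `hflip_box` are hypothetical, and the box is assumed to be given as pixel edges `(x_min, y_min, x_max, y_max)`.

```python
def hflip_image(image):
    """Mirror an image left to right; `image` is a list of pixel rows."""
    return [row[::-1] for row in image]


def hflip_box(box, img_w):
    """Mirror an (x_min, y_min, x_max, y_max) box inside an image of width img_w.

    Coordinates are treated as pixel edges, so a box covering only the first
    column is (0, 0, 1, height).
    """
    x_min, y_min, x_max, y_max = box
    return (img_w - x_max, y_min, img_w - x_min, y_max)


# A 2x4 "image" whose values stand in for pixel intensities.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
print(hflip_image(image))               # -> [[4, 3, 2, 1], [8, 7, 6, 5]]
print(hflip_box((0, 0, 1, 2), img_w=4))  # -> (3, 0, 4, 2)
```

The important detail is that the label is transformed together with the pixels. If you augment already-annotated images, the boxes or masks have to follow the same transformation.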
Training of model:
Training is teaching a model to learn patterns from data and draw predictions from them so that it can carry out a task accurately.
There are many models, but I am providing links to papers and repos for a few famous ones. I will provide the paper for each model so you can understand its structure and layer-wise arrangement.
* Object detection
Object detection is a computer vision technique for locating instances of objects in images or videos.
★ YOLO v8: [No paper had been released when this blog was written] and repo
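Since object detection is about locating objects with boxes, it helps to know how a predicted box is compared with an annotated one. A common measure for this is intersection over union (IoU). The sketch below is a generic implementation written for this post, assuming boxes in `(x_min, y_min, x_max, y_max)` form; it is not code from the YOLO repo.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes that share a 1x1 corner: overlap 1, union 4 + 4 - 1 = 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1 means the boxes match exactly and 0 means they do not overlap at all. Evaluation scripts usually count a detection as correct when its IoU with a labelled box exceeds a chosen threshold.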
* Segmentation
Segmentation is the process of dividing an image into regions based on the characteristics of its pixels, so that objects and boundaries can be identified and the image can be analyzed more efficiently.
★ YOLOv8 Instance Segmentation
★ YOLOv7 Instance Segmentation
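To show what "dividing an image into regions based on pixel characteristics" can mean in the simplest case, here is a small sketch that thresholds a grayscale image and labels each connected bright region. This is a classical technique chosen for illustration and is not how the YOLO models above perform segmentation. The function name `label_regions` is hypothetical, and the image is assumed to be a non-empty list of rows.

```python
from collections import deque


def label_regions(image, threshold):
    """Label 4-connected regions of pixels brighter than `threshold`.

    Returns (labels, count): `labels` has the same shape as `image`, with 0
    for background and 1..count for the regions found.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] <= threshold or labels[y][x]:
                continue  # background pixel, or already part of a region
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] > threshold and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


image = [
    [9, 9, 0, 0],
    [9, 0, 0, 8],
    [0, 0, 8, 8],
]
labels, count = label_regions(image, threshold=5)
print(count)   # -> 2
print(labels)  # -> [[1, 1, 0, 0], [1, 0, 0, 2], [0, 0, 2, 2]]
```

Each region gets its own id, which is the same idea as an instance mask: every pixel is tagged with the object it belongs to.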
Libraries to know
A few libraries are helpful across all the sub-activities of a computer vision project.
OpenCV, SimpleCV, TensorFlow, Keras, MATLAB, PCL, DeepFace, NVIDIA CUDA-X, NVIDIA Performance Primitives, BoofCV, OpenVINO, PyTorch, Albumentations, Caffe, Detectron2, CUDA, YOLO
Interesting and simple projects that I found to improve your computer vision thinking:
image-net — computer vision challenge
1. How to read an image in Python using OpenCV — 2023
2. Sketchy — Sketch making Flask App — Interesting Project — 2023
3. How to detect shapes using cv2- with source code — easy project — 2023
4. Rotating and Scaling Images using cv2 — a fun Python application — 2023
5. How to use mouse clicks to draw circles in Python using OpenCV — easy project — 2023
7. Object Detection using SSD — with source code — easiest way — fun project –2023
8. Face Recognition Based Attendance System with source code — Flask App — With GUI — 2023
9. Face Recognition — GitHub Link 1, GitHub Link 2, Video Tutorial
10. Easiest way to Train yolov7 on the custom dataset — 2023
11. Template Matching — Video Tutorial, Written Tutorial
12. Semantic and Instance Segmentation on Videos using PixelLib in Python — Video Tutorial, Code
13. Object Detection using Deep Learning — Video Tutorial, Written Tutorial
14. Drowsiness Detection using cv2 in Python — interesting project — 2023
15. Realtime Number Plate Detection using Yolov7 — Easiest Explanation — 2023
Simultaneous localization and mapping [SLAM] systems:
Simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent’s location within it.
★ DROID-SLAM: DROID-SLAM consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense Bundle Adjustment layer.
★ DynaSLAM: DynaSLAM is a visual SLAM system that is robust in dynamic scenarios for monocular, stereo and RGB-D configurations. It detects moving objects, and having a static map of the scene allows inpainting the frame background that those dynamic objects have occluded.
*RGB (Monocular):
★ ORB-SLAM: ORB-SLAM is a versatile and accurate SLAM solution for Monocular, Stereo and RGB-D cameras.
★ Kimera: An Open-Source Library for Real-Time Metric-Semantic Localization and Mapping.
★ PTAM: PTAM (Parallel Tracking and Mapping) is a camera tracking system for augmented reality.
★ LSD-SLAM: LSD-SLAM is a novel, direct monocular SLAM technique. Instead of using keypoints, it directly operates on image intensities both for tracking and mapping.
★ SVO-SLAM: SVO uses a semi-direct paradigm to estimate the 6-DOF motion of a camera system from both pixel intensities and features.
Neural Radiance Field (NeRF)
A neural radiance field (NeRF) is a fully-connected neural network that can generate novel views of complex 3D scenes, based on a partial set of 2D images.
★ bmild/nerf
★ nerf_pl
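To give a feel for how a NeRF turns network outputs into a pixel, the sketch below composites samples along a single camera ray using the standard volume rendering rule: each sample contributes its colour weighted by its own opacity and by how much light survives the samples in front of it. In a real NeRF the densities and colours come from the neural network; here they are hard-coded, hypothetical values, and `render_ray` is a name invented for this example.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one ray into a single RGB colour.

    densities: non-negative volume density (sigma) at each sample
    colors:    (r, g, b) colour at each sample
    deltas:    distance from each sample to the next one along the ray
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha
    return tuple(pixel)


# Hypothetical ray: empty space first, then a dense red surface.
print(render_ray(
    densities=[0.0, 5.0, 50.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    deltas=[0.1, 0.1, 0.1],
))
```

Because every step is differentiable, the rendered pixel can be compared with the real photo and the error pushed back into the network, which is how the scene is learned from 2D images alone.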
Interesting blogs related to computer vision:
★ Introduction to object detection by Analytics Vidhya: Part 1, Part 2, and Part 3
★ Instance segmentation: To understand all about instance segmentation
★ Semantic segmentation: To understand all about semantic segmentation
★ Ultimate Guide to Object Detection Using Deep Learning: Step-by-step approach to understanding deep learning
★ Image processing: To understand the basics of image processing
★ All CNN architectures: An overview of the basic CNN architectures
Informative videos related to computer vision:
☆ MIT 6.S094: Computer Vision by Lex Fridman
☆ CNN Architectures by Michigan online
☆ TensorFlow Object Detection by Nicholas Renotte
☆ Detection and Segmentation by Stanford
☆ CNN by Andrej Karpathy (2016)
☆ CNN by Stanford University School of Engineering (2017)
☆ Introduction to Deep Learning and Self-Driving Cars by Lex Fridman [MIT 6.S094]
Research Papers
These are a few research paper sources where you can easily find papers for any model or method you need.
★ ICLR
Other Resources
★ GitHub — A famous host of open-source software projects.
★ Quora — Seek help and ask any questions here if you have any difficulties!
I will be updating this blog frequently because there are many things that are not covered in this blog. You can follow me for new updates. Also, you can suggest topics to add to make it more useful for newbies as well as for all computer vision engineers. Let’s embrace AI!
If you have found this article insightful
If you found this article insightful, follow me on LinkedIn and Medium. You can also subscribe to get notified when I publish articles. Let’s create a community! Thanks for your support!
If you want to support me :
Your follows and claps matter most, but you can also support me by buying me a coffee. COFFEE.