Mastering Computer Vision: A Guide for Aspiring Practitioners

Akanksha · Published in Nerd For Tech · 3 min read · Oct 4, 2023
Andrew Ng on building a career in machine learning

I landed my dream job as a Computer Vision Data Scientist, and I was extremely excited to get started. After a while, however, my learning specific to Data Science hit a wall. Frankly, there is a multitude of non-data-science aspects to grasp when working in a professional environment, especially when you're part of a small team with fluid roles, a situation I found quite rewarding. I delved into topics like MLOps, establishing data pipelines, adhering to coding best practices, and engaging in peer code reviews. While this was a valuable learning experience, I still recognise the need to further expand my expertise, particularly in the realms of Data Science and Computer Vision.

To enable that, I am planning to follow the guideline shared by Andrew Ng in a recent lecture, where he recommends that the best way to learn is by replicating the results of research papers and doing the dirty work, and doing so at least 20–25 times.

So, to achieve that goal, I will list 25 research papers, and for each one I will:

  1. Replicate it and share the code on GitHub
  2. Write a blog post summarising my learnings
  3. Add the respective links next to the list below for easy access

The goal of this exercise is to get a better sense of what has happened in the ML research field and, hopefully, to do away with the fear of approaching something new.

Note: I am starting with 24 papers and will keep adding to the list until the number hits 25.

  1. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. Original Paper, Summary
  2. Generative Adversarial Networks Original Paper, Summary
  3. You Only Look Once: Unified, Real-Time Object Detection (YOLO): https://arxiv.org/abs/1506.02640
  4. Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs: https://arxiv.org/abs/2203.06717
  5. ImageNet Classification with Deep Convolutional Neural Networks (AlexNet): https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
  6. Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet)
  7. Deep Residual Learning for Image Recognition(ResNet)
  8. Going Deeper with Convolutions (Inception/GoogLeNet)
  9. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
  10. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN)
  11. SSD: Single Shot MultiBox Detector
  12. A ConvNet for the 2020s: https://arxiv.org/abs/2201.03545
  13. Attention Is All You Need: https://arxiv.org/abs/1706.03762
  14. End-to-End Object Detection with Transformers (DETR): https://arxiv.org/abs/2005.12872
  15. Emerging Properties in Self-Supervised Vision Transformers (DINO): https://arxiv.org/abs/2104.14294
  16. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
  17. Trainable Projected Gradient Method for Robust Fine-Tuning (CVPR 2023)
  18. Deep Learning for Time Series Classification
  19. MobileNeRF: Exploiting the Polygon Rasterisation Pipeline for Efficient Neural Field Rendering on Mobile Architectures
  20. DynIBaR: Neural Dynamic Image-Based Rendering
  21. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
  22. MAGVIT: Masked Generative Video Transformer
  23. REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
  24. On Distillation of Guided Diffusion Models

Hopefully this list will be helpful if you are also looking for what to pick up next!

Happy learning.
