Mastering Computer Vision: A Guide for Aspiring Practitioners

Akanksha · Published in Nerd For Tech · 3 min read · Oct 4, 2023
Andrew Ng on building a career in machine learning

I landed my dream job as a Computer Vision Data Scientist, and I was extremely excited to get started. After a while, however, my learning specific to Data Science hit a wall. Frankly, there is a multitude of non-data-science aspects to grasp when working in a professional environment, especially when you're part of a small team with fluid roles, a situation I found quite rewarding. I delved into topics like MLOps, establishing data pipelines, adhering to coding best practices, and engaging in peer code reviews. While this was a valuable learning experience, I still recognise the need to further expand my expertise, particularly in the realms of Data Science and Computer Vision.

To enable that, I am planning to follow the guideline shared by Andrew Ng in a recent lecture, where he recommends that the best way to learn is by replicating the results of research papers and doing the dirty work, and doing so at least 20–25 times.

So, to achieve that goal, I will list 25 research papers, and for each one I will:

  1. Replicate it and share the code on GitHub
  2. Write a blog post summarising my learnings
  3. Add the respective links next to the list below for easy access

The goal of this exercise is to get a better sense of what has happened in the ML research field and, hopefully, to do away with the fear of approaching something new.

Note: I am starting with 24 papers and will keep adding to the list until the number hits 25.

  1. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. Original Paper, Summary
  2. Generative Adversarial Networks Original Paper, Summary
  3. You Only Look Once: Unified, Real-Time Object Detection (YOLO): https://arxiv.org/abs/1506.02640
  4. Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs: https://arxiv.org/abs/2203.06717
  5. ImageNet Classification with Deep Convolutional Neural Networks (AlexNet): https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
  6. Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet)
  7. Deep Residual Learning for Image Recognition(ResNet)
  8. Going Deeper with Convolutions (Inception/GoogLeNet)
  9. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
  10. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN)
  11. SSD: Single Shot MultiBox Detector
  12. A ConvNet for the 2020s: https://arxiv.org/abs/2201.03545
  13. Attention Is All You Need: https://arxiv.org/abs/1706.03762
  14. End-to-End Object Detection with Transformers (DETR): https://arxiv.org/abs/2005.12872
  15. Emerging Properties in Self-Supervised Vision Transformers (DINO): https://arxiv.org/abs/2104.14294
  16. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
  17. Trainable Projected Gradient Method for Robust Fine-Tuning (CVPR 2023)
  18. Deep Learning for Time Series Classification
  19. MobileNeRF: Exploiting the Polygon Rasterisation Pipeline for Efficient Neural Field Rendering on Mobile Architectures
  20. DynIBaR: Neural Dynamic Image-Based Rendering
  21. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
  22. MAGVIT: Masked Generative Video Transformer
  23. REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
  24. On Distillation of Guided Diffusion Models

Hopefully this list will be helpful if you are also looking for what to pick up next!

Happy learning.
