Introduction to Computer Vision with PyTorch (1/6)

4 min read · Sep 1, 2023

Computer Vision (CV) is a field that studies how computers can gain some degree of understanding from digital images and video. Understanding here has a rather broad meaning: it can range from distinguishing a cat from a dog in a picture to more complex tasks such as describing the image in natural language.

The most common problems of computer vision include:

Image Classification is the simplest task: we need to classify an image into one of several pre-defined categories, for example, to distinguish a cat from a dog in a photograph or to recognize a handwritten digit.
Object Detection is a somewhat more difficult task, in which we need to find known objects in the picture and localize them, that is, return a bounding box for each recognized object.
Segmentation is similar to object detection, but instead of a bounding box we need to return an exact pixel map outlining each recognized object.
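
To make the difference between the three tasks concrete, here is a minimal sketch of what each one might return for a single tiny image. The image size, the class names, the box values and the (x_min, y_min, x_max, y_max) box layout are all assumptions chosen for illustration. They are not the output format of any particular model.

```python
import torch

# Hypothetical 4x6 image containing a single cat (sizes are illustrative).
height, width = 4, 6

# Classification: one label for the whole image.
classes = ["cat", "dog"]
class_index = 0  # index into `classes`, i.e. "cat"

# Object detection: one bounding box per recognized object.
# Assumed layout: (x_min, y_min, x_max, y_max) in pixels, max edges exclusive.
boxes = torch.tensor([[1, 1, 4, 3]])

# Segmentation: a per-pixel map, True where the object is.
mask = torch.zeros(height, width, dtype=torch.bool)
mask[1:3, 1:4] = True   # rows 1-2, columns 1-3: the area inside the box
mask[1, 1] = False      # the object does not have to fill its whole box
```

Note how the amount of detail grows from task to task: classification yields a single number, detection yields four numbers per object, and segmentation yields one value per pixel.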

We’ll focus on the image classification task and on how neural networks can be used to solve it. As with any other machine learning task, training a model to classify images requires a labeled dataset, that is, a large number of images for each class.

Images as Tensors

Computer vision works with images. As you probably know, an image consists of pixels, so it can be thought of as a rectangular grid of pixels.
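
A rectangular grid of pixels maps naturally onto a PyTorch tensor. The following is a minimal sketch with made-up pixel values. The 2x3 size, the 0-255 brightness range and the scaling to [0, 1] are assumptions for illustration, although they match common practice.

```python
import torch

# A hypothetical 2x3 grayscale image: one brightness value (0-255) per pixel.
gray = torch.tensor([[0, 128, 255],
                     [64, 192, 32]], dtype=torch.uint8)
height, width = gray.shape  # 2 rows, 3 columns

# A color image of the same size adds a channel dimension (red, green, blue).
# PyTorch conventionally orders the dimensions as channels x height x width.
color = torch.zeros(3, height, width, dtype=torch.uint8)
color[0] = gray  # fill only the red channel; green and blue stay at zero

# Models usually work with floating-point values, commonly scaled to [0, 1].
color_float = color.float() / 255.0
```

Here `gray.shape` is `(2, 3)` and `color.shape` is `(3, 2, 3)`, so a single pixel is addressed as `gray[row, column]` or `color[channel, row, column]`.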

Written by The V Notebook