Introduction to Computer Vision with PyTorch (1/6)
Computer Vision (CV) is a field that studies how computers can gain some degree of understanding from digital images and/or video. Understanding here has a rather broad meaning: it can range from distinguishing a cat from a dog in a picture to more complex tasks such as describing the image in natural language.
The most common problems of computer vision include:
- Image Classification is the simplest task, in which we need to classify an image into one of many pre-defined categories, for example, distinguish a cat from a dog in a photograph, or recognize a handwritten digit.
- Object Detection is a somewhat more difficult task, in which we need to find known objects in the picture and localize them, that is, return a bounding box for each recognized object.
- Segmentation is similar to object detection, but instead of a bounding box we need to return an exact pixel map outlining each recognized object.
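The three tasks above differ mainly in the shape of what the model returns. The following is a minimal sketch of those output shapes, not the output of any real model: the image size, the class names, the box coordinates, and the `(x_min, y_min, x_max, y_max)` box layout are all assumptions made for illustration.

```python
import torch

height, width = 4, 6  # size of a tiny hypothetical image

# Image classification: a single label for the whole image.
class_names = ["cat", "dog"]   # hypothetical categories
predicted_class = 1            # index into class_names

# Object detection: one bounding box per recognized object,
# stored here as (x_min, y_min, x_max, y_max) in pixel coordinates.
boxes = torch.tensor([[1, 0, 4, 3]])

# Segmentation: a per-pixel map with the same height and width as the image.
# For simplicity the mask below just fills the box area; a real mask would
# follow the outline of the object instead.
mask = torch.zeros(height, width, dtype=torch.bool)
x_min, y_min, x_max, y_max = boxes[0].tolist()
mask[y_min:y_max, x_min:x_max] = True

print(class_names[predicted_class])  # one label per image
print(boxes.shape)                   # (number of objects, 4)
print(mask.shape)                    # (height, width)
```

Note how the amount of information grows from one task to the next: a single index, then four numbers per object, then one value per pixel.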
We’ll focus on the image classification task, and on how neural networks can be used to solve it. As with any other machine learning task, to train a model for classifying images we’ll need a labeled dataset, that is, a large number of images for each of the classes.
Images as Tensors
Computer vision works with images. As you probably know, images consist of pixels, so they can be thought of as a rectangular grid of pixels.
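To make the idea of a rectangular grid of pixels concrete, here is a small sketch that builds a tiny image by hand as a PyTorch tensor. The 4x5 size and the pixel values are made up for illustration. The channels-first layout used for the color image is the convention commonly used with PyTorch, and is stated here as an assumption rather than something defined in the text above.

```python
import torch

# A hypothetical 4x5 grayscale image: one brightness value (0-255) per pixel.
height, width = 4, 5
gray = torch.zeros(height, width, dtype=torch.uint8)
gray[1:3, 1:4] = 255  # paint a white 2x3 rectangle in the middle

# A color image adds a channel dimension. The layout assumed here is
# (channels, height, width), with channels ordered red, green, blue.
rgb = torch.zeros(3, height, width, dtype=torch.uint8)
rgb[0] = gray  # the rectangle appears in the red channel only

print(gray)        # the grid of pixel values, row by row
print(gray.shape)  # (height, width)
print(rgb.shape)   # (channels, height, width)
```

Indexing the tensor as `gray[row, column]` returns the brightness of a single pixel, which is all an image is from the model's point of view.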