Computer Vision AI: Explainer and Examples

An introduction for the wider general public.

Steven Vuong
Analytics Vidhya
4 min read · May 9, 2020


Broadly defined, computer vision artificial intelligence (AI) aims to make sense of visual inputs, namely images. The types of image and use case range from satellite imaging to monitor crop yields, to retinal scans for detecting eye diseases. Ultimately, the aim is to extract meaning from images.

In radiology, an AI system trained by Google even outperformed medical experts at spotting breast cancer in screening mammograms, reducing the number of false negatives by 2.7% and false positives by 1.2% for women in the UK.

Source: Northwestern University

So how does this system outperform medical professionals who have dedicated years of their lives to training? One answer is that the AI has also spent a long time training, and has seen far more examples than a medical expert could hope to see in a lifetime.

To answer this question more clearly, we will introduce one form of deep neural network: the convolutional neural network, which is used extensively in computer vision research and applications. For those who may shy away from maths, stick around; I have tried my best to make this as accessible as possible.

Aside from the cat, what we see above is a convolutional neural network (CNN). It consists of multiple layers, in this case 5 layers that shrink in size. Starting at the bottom, an image is fed as input to the first convolution layer.

Now comes the 'network' part, which connects one layer to the next. In a CNN, this is done by sliding a small window, or kernel, over the pixel values that represent the image.

Source: http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

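The figure above shows a simple kernel that sums all the values within each window and maps the result to a single value in the next layer. In practice, each value in the window is first multiplied by a weight in the kernel, so the kernel computes a weighted sum; these weights are what the network learns. Whilst there are many additional operations you can apply, such as pooling or dropout, this is now a neural network, and by stacking many of these layers one can form a deep neural network.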
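The sliding-window idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than how real libraries implement it: it assumes a single-channel image, a stride of 1 and no padding, and the image and kernel values are made up. `conv2d` computes the weighted sum at each window position, and `max_pool` shows pooling, one of the extra operations mentioned above, which keeps only the largest value in each block.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding, returning the
    weighted sum at each position. Like most CNN libraries, this does not flip
    the kernel (strictly speaking, it is cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    # Multiply each pixel in the window by its kernel weight
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block.
    Rows/columns that do not fill a whole block are dropped."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[i + di][j + dj]
                     for di in range(size) for dj in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


# Made-up 5x5 greyscale "image": dark on the left, bright on the right
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

# All-ones kernel: simply sums each 3x3 window
sum_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# Hand-picked vertical-edge kernel; a real CNN learns its kernel weights
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

print(conv2d(image, sum_kernel))   # each of the 3 rows is [0.0, 27.0, 54.0]
print(conv2d(image, edge_kernel))  # each of the 3 rows is [0.0, 27.0, 27.0]
print(max_pool(conv2d(image, edge_kernel)))  # [[27.0]]
```

With no padding and a stride of 1, an n x n input and a k x k kernel give an (n - k + 1) x (n - k + 1) output, which is one reason layers shrink as the network gets deeper; pooling shrinks them further. The edge kernel responds only where a window straddles the dark-to-bright boundary, a hand-made version of the simple line features that early layers tend to learn.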

Once the data reaches the last convolution layer, it is flattened into a one-dimensional vector and passed to a fully connected layer, called a dense layer, which makes a prediction about what the input is. The output usually goes through a softmax function, which converts the raw scores into a probability for each possible class; the class with the highest probability is then taken as the final prediction.
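To make this last step concrete, here is a hedged sketch in plain Python of flattening, a dense layer, softmax, and picking the most likely class. The class names, feature values, weights and biases are all hypothetical; in a trained network the weights would be learned rather than typed in.

```python
import math


def flatten(feature_maps):
    """Turn a list of 2D feature maps into one flat list of numbers."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(w_row, inputs)) + b
            for w_row, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical classes and a single made-up 2x2 feature map from the last layer
labels = ["cat", "dog", "bird"]
features = flatten([[[1.0, 0.5], [0.0, 2.0]]])  # [1.0, 0.5, 0.0, 2.0]

# Made-up weights: one row of 4 weights per class
weights = [
    [0.5, 0.2, 0.1, 0.8],
    [0.1, 0.4, 0.3, 0.2],
    [-0.3, 0.1, 0.5, 0.1],
]
biases = [0.0, 0.0, 0.0]

scores = dense(features, weights, biases)
probabilities = softmax(scores)
# Softmax only produces the probabilities; picking the largest is a separate step
prediction = labels[probabilities.index(max(probabilities))]
print(prediction, [round(p, 3) for p in probabilities])
```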

Source: https://i2.wp.com/adventuresinmachinelearning.com/wp-content/uploads/2019/08/Layers-and-abstractions.png?w=391&ssl=1

Typically, the initial layers extract simple features, such as lines and corners, whereas later convolution layers capture more complex, abstract shapes. When training the model, we compare its predictions with the actual values, and the resulting error is used in a process called back-propagation, which works backwards through the network layer by layer to work out how each parameter should be updated.
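Back-propagation is easiest to see on a deliberately tiny network. The sketch below is a toy under loud assumptions, not a CNN: two "layers" with a single weight each (the first followed by a ReLU activation), a squared-error loss, and made-up values for the input, target and learning rate. The backward pass applies the chain rule from the loss back through each layer, and plain gradient descent then updates both weights.

```python
def relu(z):
    return max(0.0, z)


def relu_grad(z):
    # Slope of ReLU: 1 where the input was positive, 0 otherwise
    return 1.0 if z > 0 else 0.0


def train(x, target, w1, w2, learning_rate=0.05, steps=50):
    losses = []
    for _ in range(steps):
        # Forward pass: layer 1 (weight + ReLU), then layer 2 (weight)
        z = w1 * x
        h = relu(z)
        prediction = w2 * h
        loss = (prediction - target) ** 2  # compare with the actual value
        losses.append(loss)

        # Backward pass: chain rule from the loss back to each layer
        d_prediction = 2 * (prediction - target)  # dLoss/dPrediction
        d_w2 = d_prediction * h                   # gradient for layer 2's weight
        d_h = d_prediction * w2                   # gradient passed back to layer 1
        d_w1 = d_h * relu_grad(z) * x             # gradient for layer 1's weight

        # Gradient-descent update: nudge each weight against its gradient
        w1 -= learning_rate * d_w1
        w2 -= learning_rate * d_w2
    return w1, w2, losses


# Made-up example: input 2.0 should map to an actual value of 3.0
w1, w2, losses = train(x=2.0, target=3.0, w1=0.5, w2=0.5)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.6f}")
print(f"prediction after training: {w2 * relu(w1 * 2.0):.3f}")
```

Real frameworks compute the same chain-rule products automatically for millions of weights at once, but the order of operations is the same: forward pass, compare with the actual value, pass gradients backwards, update.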

In general, the more layers these networks have, the more complex the patterns they can learn, which tends to improve their predictive ability. To achieve outstanding results, the number of images required is usually in the tens of thousands or more. On top of that, training can run for multiple days on energy-hungry GPUs in some cases, which may add up to thousands of hours of training time.
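One way to see why deeper networks are so data- and compute-hungry is to count the weights each layer adds. The sketch below assumes a hypothetical stack of 3x3 convolution layers on a 3-channel (RGB) colour image, with made-up channel counts; each output channel has one weight per kernel position per input channel, plus a bias.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Trainable parameters in one convolution layer: a kernel_size x kernel_size
    weight per input channel for every output channel, plus one bias each."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


# Hypothetical stack: (kernel size, input channels, output channels),
# starting from a 3-channel (RGB) colour image
layers = [(3, 3, 32), (3, 32, 64), (3, 64, 128)]

total = 0
for depth, (k, c_in, c_out) in enumerate(layers, start=1):
    added = conv_params(k, c_in, c_out)
    total += added
    print(f"layer {depth}: +{added:,} parameters, {total:,} in total")
```

In this example each extra layer adds far more weights than the one before it, and real image models stack many more layers, so they need a large volume of labelled images to learn all of those values.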

With these big advances in computer vision research, we are now seeing use cases emerge in more public spheres:

  • COVID-19: Even with global flight demand down 70% in May 2020 compared with May 2019, countries are introducing AI-equipped thermal imaging cameras at airports to automatically flag people with abnormally high temperatures, a symptom of COVID-19. The results are imperfect, but this is a cost-effective way of quickly screening dense crowds. Because of COVID-19, AI-equipped thermal cameras are also becoming more prevalent in workplaces and hospitals.
  • Criminal Detection: Researchers in China have developed a 500-megapixel camera that, combined with AI, can recognise thousands of faces in a dense crowd and could potentially pick out criminals in vast public spaces such as stadiums or busy street crossings.
  • Transport: Innovative start-ups are partnering with large incumbent organisations to use AI to optimise logistics and automate processes. In such huge industries, even small percentage improvements can result in big savings.
Source: https://media-cdn.tripadvisor.com/media/photo-s/0a/ec/e9/b5/the-shinjuku-crossing.jpg

This is my first Medium article, written in the hope of introducing the general public to computer vision. If it gets 10 likes, I will write another one on Generative Adversarial Networks (GANs), which have been behind the deepfake videos imitating celebrities and politicians in recent news. Maybe this can even develop into a series.

Steven Vuong, Data Scientist at HackPartners
Open to comments & feedback: stevenvuong96@gmail.com
https://www.linkedin.com/in/steven-vuong/
https://github.com/StevenVuong/
