Computer Vision Part 1: An Introduction

Ilias Mansouri
6 min read · Feb 21, 2019


In 1966, a summer project was organised at MIT in Massachusetts with the goal of dividing a picture into regions such as:

  • likely objects
  • likely background areas
  • chaos

The goal was that, by the end of July, non-overlapping objects could be detected in scenes with a homogeneous background. The plan for August was to prioritise handling overlapping objects, complex surfaces and backgrounds. If there was some time left, the summer workers would extend the class of objects. To say that this ambitious summer project did not end in success is an understatement.

But it can be argued that this project gave a major boost to a series of studies in the 1970s which became the ancestors of many of today's CV (computer vision) algorithms, such as edge detection and motion estimation. The 1980s saw a set of more rigorous mathematical studies; it was in this era that linear algebra was applied to the problem of facial recognition. The next decade was arguably more practically inclined: 3D reconstruction, camera calibration and image segmentation, for example. Especially in the second half of the 90s, things started to heat up quickly thanks to:

  • major hardware improvements
  • more data available
  • data easily shared thanks to the internet

Almost 20 years later (feelin' old yet?), we find ourselves in a world where companies no longer use badges but their employees' facial features to grant access. Algorithms outperform human pathologists in cancer detection. Amazon Go stores need no checkout stations or cashiers. Tesla has its famous Autopilot feature.

You get the idea: Computer Vision finds applications everywhere, with arguably a lesser presence in the insurance and banking sectors.

What is Computer Vision?

In a punchy one-liner: “Computer Vision is the art of mimicking the human visual system.”

And this folks, is probably the easiest part of the CV-challenge. Let us elaborate, shall we?

How does it work?

Human visual system

The story goes as follows: incoming light is focused by the cornea and then passes through the crystalline lens. Behind the cornea we find the iris, whose opening, the pupil, expands and contracts to control the amount of light entering the eye and striking the retina. The retina consists of rods and cones, which are both photosensitive cells. Cones exist in three variants, capturing and converting the respective wavelengths of light for red, green and blue. Rods are responsible for monochrome vision at night; that is, they respond to light intensity. The electrical signals generated by the rods and cones are then sent to the brain through the optic nerve.

And here we stop because, at the moment, no one really knows what’s happening in the brain.

Computer vision system

Similar to the workings of an eye, cameras capture light, which is then converted into a signal. In traditional cameras, this conversion happens on photographic film; digital and video cameras use an electronic sensor. After the conversion, the image is represented as one big matrix of pixel values.
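To make that "big matrix of pixel values" concrete, here is a minimal sketch in Python using NumPy. The tiny hand-written arrays are purely illustrative stand-ins for what a real sensor would produce:

```python
import numpy as np

# A tiny 4x4 grayscale "image": each entry is one pixel's
# intensity, from 0 (black) to 255 (white).
img = np.array([
    [  0,  50, 100, 150],
    [ 50, 100, 150, 200],
    [100, 150, 200, 250],
    [150, 200, 250, 255],
], dtype=np.uint8)

print(img.shape)   # (4, 4): height x width
print(img[0, 3])   # 150: the pixel at row 0, column 3

# A colour image simply adds a third axis holding the red,
# green and blue channels, mirroring the eye's three cone types.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255  # set the red channel everywhere: a pure-red image
print(rgb.shape)   # (4, 4, 3)
print(rgb[0, 0])   # [255   0   0]
```

Every CV operation discussed later, from edge detection to segmentation, ultimately works on arrays like these.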

Use cases

We are all aware of the heavy research into autonomous vehicles using video data. Other techniques, such as AR emojis and filtering effects, have also become ubiquitous. As an appetizer, let's have a short overview of real-life applications across different industries, shall we?

Manufacturing

Impurities in microchips, calibration of aircraft wings, blemishes in bottles: quality control is necessary to ensure that a manufacturer's production lines respect assembly standards. By using CV, manufacturers ensure customer satisfaction and avoid product recalls. With predictive and preventive maintenance, higher throughput of production lines can be guaranteed. Deploying CV early on, and potentially across the whole production line, can for example detect a calibration error in a robot tool which later results in goods not being properly placed on a belt, ending in a jam where a rushed employee forgets the safety guidelines in an attempt to keep downtime to a minimum.

Healthcare

CV techniques are increasingly used to analyse medical images with the goal of detecting or classifying all kinds of conditions and illnesses. While these techniques achieve higher accuracies, they are still employed as assisting tools for professionals, to minimize inaccurate diagnoses, incorrect treatments and other human errors. The same techniques also assist therapists with the rehabilitation of patients with walking disabilities, for example by detecting (in)correct pose or the risk that a patient will fall.

Banking

It can be argued that in the banking sector CV will act more as an enabling technology and less as a disruptive one, due to the heavy paperwork present in the back office. Digitisation of documents can happen with the help of OCR (Optical Character Recognition); the extracted text can then be classified for further handling using text analysis. Some banks allow you to open a bank account from your smartphone with a selfie and a video call. ATMs can be found equipped with facial recognition systems to combat fraudulent transactions.

Retail

Let’s start with arguably the worst part of the shopping experience: infinite queueing time. While the Amazon Go Store completely removed the checkout process, giving the term “frictionless retail” a whole new dimension, brick-and-mortar retail stores could easily use their already installed cameras to predict and detect long queueing times, automatically notifying an employee to go and support the checkout of customers. Furthermore, real-time sentiment analysis can be applied by reading a customer’s emotions when approaching a shelf. This can be extended by placing eye-tracking sensors to detect which cheese is most popular among the dairy products. Also, by recognising loyal customers and going through their purchasing history, real-time recommender systems can offer discounts, encouraging brand loyalty, which ultimately resonates with the “customer is king” motto.

Security

A security system already uses cameras whose video streams run to a centralised computer or server. The logical next step is to run algorithms, preferably in real time, which detect anomalies in that video data. Such CV algorithms could detect a person pointing a gun at a cashier or someone trying to break into a home. Crowd congestion could be prevented, or anomalies such as a fleeing crowd could be detected. Furthermore, more complex behavioural situations could be addressed, such as a blind person lost in a train station or two kids bullying a classmate during the break.

Agriculture

Drones equipped with cameras could detect (un)healthy crops and thus determine which crops are preferred for longer shipments and which are more suited to the local market. Similarly, livestock can be monitored to detect potential disease outbreaks and to track growth. Finally, by screening the harvest, the best crop breeds can be identified, resulting in a next generation of crops with an improved genotype.

Who are we?

Overture makes artificial intelligence accessible to both experts and non-experts. We are a development tool which brings the most powerful computer vision algorithms to data scientists, AI engineers and developers. With our GUI and REST API we aim for total transparency and ease of use throughout the creation, training and deployment of your models.

What’s to come?

In this part, we discussed how a CV system takes camera data and converts it into pixels, and then looked at some use cases across different industries. In the next chapter, we will dive into the basic building blocks of CV systems. We will discuss how certain operations on a matrix of pixel values can produce surprising effects and form the fundamental building blocks of AI in CV. Once the necessary elements have been broached, we will elaborate on the computer vision algorithms most used today and their respective usages.
