Creating Face Recognition Model & Connecting it to AWS and more
Task Description 📄
❄️ Create a program that performs the tasks below upon recognizing a particular face.
📌 When it recognizes your face —
👉 It sends a mail to your mail id.
👉 It sends a WhatsApp message to your friend.
📌 When it recognizes a second face (it can be your friend's or a family member's face) —
👉 It creates an EC2 instance in AWS.
👉 It creates a 5 GB EBS volume and attaches it to the instance.
For this task we will be using Computer Vision in Python along with AWS.
First, let us look at what computer vision is.
Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and videos and deep learning models, machines can accurately identify and classify objects — and then react to what they “see.”
Computers assemble visual images in the same way you might put together a jigsaw puzzle.
Think about how you approach a jigsaw puzzle. You have all these pieces, and you need to assemble them into an image. That’s how neural networks for computer vision work. They distinguish many different pieces of the image, they identify the edges and then model the subcomponents. Using filtering and a series of actions through deep network layers, they can piece all the parts of the image together, much like you would with a puzzle.
The computer isn’t given a final image on the top of a puzzle box — but is often fed hundreds or thousands of related images to train it to recognize specific objects.
Instead of training computers to look for whiskers, tails and pointy ears to recognize a cat, programmers upload millions of photos of cats, and then the model learns on its own the different features that make up a cat.
The effects of these advances on the computer vision field have been astounding. Accuracy rates for object identification and classification have gone from 50 percent to 99 percent in less than a decade — and today’s systems are more accurate than humans at quickly detecting and reacting to visual inputs.
Computer vision users in many industries are seeing real results — and we’ve documented many of them in this infographic. For example, did you know:
- Computer vision can distinguish between staged and real auto damage.
- Computer vision enables facial recognition for security applications.
- Computer vision makes automatic checkout possible in modern retail stores.
How computer vision works
Computer vision works in three basic steps:
Acquiring an image: Images, even large sets, can be acquired in real-time through video, photos or 3D technology for analysis.
Processing the image: Deep learning models automate much of this process, but the models are often trained by first being fed thousands of labeled or pre-identified images.
Understanding the image: The final step is the interpretative step, where an object is identified or classified.
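The three steps above can be sketched end-to-end. In this minimal sketch a tiny synthetic NumPy frame stands in for a real camera grab, and a trivial brightness rule stands in for a trained model — both are illustrative assumptions:

```python
import numpy as np

# Step 1 - acquiring: in practice this is a webcam grab or an image read;
# a synthetic 4x4 RGB frame stands in here (assumption).
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:, 1:] = 255  # right three-quarters white

# Step 2 - processing: collapse RGB to the grayscale buffer a model
# consumes, using the standard luminance weights.
gray = (frame @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

# Step 3 - understanding: a trained model would classify here; a trivial
# brightness rule stands in for the classifier (assumption).
label = "bright" if gray.mean() > 127 else "dark"
print(label)
```

The same three-part shape holds whether the "understanding" step is a one-line rule or a deep network.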
There are many types of computer vision that are used in different ways:
- Image segmentation partitions an image into multiple regions or pieces to be examined separately.
- Object detection identifies a specific object in an image. Advanced object detection recognizes many objects in a single image: a football field, an offensive player, a defensive player, a ball and so on. These models use X,Y coordinates to create a bounding box and identify everything inside it.
- Facial recognition is an advanced type of object detection that not only recognizes a human face in an image, but identifies a specific individual.
- Edge detection is a technique used to identify the outside edge of an object or landscape to better identify what is in the image.
- Pattern detection is a process of recognizing repeated shapes, colors and other visual indicators in images.
- Image classification groups images into different categories.
- Feature matching is a type of pattern detection that matches similarities in images to help classify them.
Simple applications of computer vision may only use one of these techniques, but more advanced uses, like computer vision for self-driving cars, rely on multiple techniques to accomplish their goal.
Below is a simple illustration of the grayscale image buffer which stores our image of Abraham Lincoln. Each pixel’s brightness is represented by a single 8-bit number, whose range is from 0 (black) to 255 (white):
This way of storing image data may run counter to your expectations, since the data certainly appears to be two-dimensional when it is displayed. Yet this is how it is stored, since computer memory consists simply of an ever-increasing linear list of addresses.
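A tiny NumPy example makes the flat layout concrete; the 3×4 image here is a stand-in for the Lincoln buffer described above:

```python
import numpy as np

# A tiny 3x4 grayscale image: one 8-bit brightness value per pixel,
# 0 = black, 255 = white (a stand-in for the Lincoln buffer).
img = np.array([[ 10,  20,  30,  40],
                [ 50,  60,  70,  80],
                [ 90, 100, 110, 120]], dtype=np.uint8)

# In memory the rows sit one after another in a single linear run of
# addresses, so pixel (row, col) lives at offset row * width + col.
flat = img.ravel()
row, col, width = 1, 2, img.shape[1]
print(flat[row * width + col], img[row, col])  # same pixel both ways: 70 70
```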
Training an object detection model
Viola and Jones approach
There are many ways to address object detection challenges. For years, the prevalent approach was one proposed by Paul Viola and Michael Jones in the paper, Robust Real-time Object Detection.
Although it can be trained to detect a diverse range of object classes, the approach was first motivated by the objective of face detection. It is so fast and straightforward that it was the algorithm implemented in point-and-shoot cameras, which allows for real-time face detection with little processing power.
The central feature of the approach is to train a potentially large set of binary classifiers based on Haar features. These features represent edges and lines, and are extremely simple to compute when scanning an image.
Although quite basic, in the specific case of faces these features allow for the capturing of important elements such as the nose, mouth, or the distance between the eyebrows. It is a supervised method that requires many positive and negative examples of the type of object to be discerned.
Deep learning has been a real game changer in machine learning, especially in computer vision, where deep-learning-based approaches are now cutting edge for many of the usual tasks.
Among the different deep learning approaches proposed for accomplishing object detection, R-CNN (Regions with CNN features) is particularly simple to understand. The authors of this work propose a three-stage process:
- Extract possible objects using a region proposal method.
- Identify features in each region using a CNN.
- Classify each region utilizing SVMs.
The region proposal method opted for in the original work was Selective Search, although the R-CNN algorithm is agnostic regarding the particular region proposal method adopted. The first step is very important, as the region proposal drastically decreases the number of object candidates compared to exhaustive search, which makes the method less computationally expensive.
The features extracted here are less intuitive than the Haar features previously mentioned. To summarize, a CNN is used to extract a 4096-dimensional feature vector from each region proposal. Given the nature of the CNN, it is necessary that the input always have the same dimension. This is usually one of the CNN’s weak points and the various approaches address this in different ways. With respect to the R-CNN approach, the trained CNN architecture requires inputs of a fixed area of 227 × 227 pixels. Since the proposed regions have sizes that differ from this, the authors’ approach simply warps the images so that they fit the required dimension.
We will be using the Haar cascade face classifier for this task.
The rectangle on the left is a sample representation of an image with pixel values 0.0 to 1.0. The rectangle at the center is a haar kernel which has all the light pixels on the left and all the dark pixels on the right. The haar calculation is done by finding out the difference of the average of the pixel values at the darker region and the average of the pixel values at the lighter region. If the difference is close to 1, then there is an edge detected by the haar feature.
The darker areas in the Haar feature are pixels with values 1, and the lighter areas are pixels with values 0. Each of these is responsible for finding one particular feature in the image, such as an edge, a line, or any structure where there is a sudden change of intensity. For example, in the image above, the Haar feature can detect a vertical edge with darker pixels on its right and lighter pixels on its left.
The objective here is to find out the sum of all the image pixels lying in the darker area of the haar feature and the sum of all the image pixels lying in the lighter area of the haar feature. And then find out their difference. Now if the image has an edge separating dark pixels on the right and light pixels on the left, then the haar value will be closer to 1. That means, we say that there is an edge detected if the haar value is closer to 1. In the example above, there is no edge as the haar value is far from 1.
This is just one representation of a particular haar feature separating a vertical edge. Now there are other haar features as well, which will detect edges in other directions and any other image structures. To detect an edge anywhere in the image, the haar feature needs to traverse the whole image.
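The Haar value described above is simply a difference of means, which is easy to sketch in NumPy; the 4×4 windows below are illustrative:

```python
import numpy as np

def haar_value(window):
    """Mean of the darker (right) half minus mean of the lighter (left)
    half of a pixel window with values in [0, 1]; a value close to 1
    indicates a strong vertical edge."""
    w = window.shape[1] // 2
    return window[:, w:].mean() - window[:, :w].mean()

edge = np.hstack([np.zeros((4, 2)), np.ones((4, 2))])  # light | dark
flat = np.full((4, 4), 0.5)                            # uniform region
print(haar_value(edge), haar_value(flat))  # 1.0 vs 0.0
```

Sliding this window across the whole image — as the cascade does — reports where vertical edges occur.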
To learn about OpenCV: https://python.plainenglish.io/image-processing-using-opencv-in-python-857c8cb21767
First we will collect the training data by capturing 200 images through the webcam to train our model.
Next, we will load the 200 captured images and train the model.
After training the model we will run face recognition for detection; it will also show the confidence level upon recognition.
Now, per the task description, upon detecting the user the program will send a WhatsApp message through WhatsApp Web and send an e-mail too. For this to happen, the confidence score needs to be greater than 90%. If it detects any other person, such as a friend or family member, it will launch an EC2 instance in AWS, create a 5 GB EBS volume, and attach the volume to the instance. (We need to sign in to the AWS account using the AWS CLI in the terminal or command prompt in advance.) The program waits 120 seconds, i.e. 2 minutes, before attaching the EBS volume so that the EC2 instance is fully initialized and there are no errors while attaching.
We just need to press Enter to close the program.