How I Trained AI To Like A Post Based on Demeanor

Seth Hamm
14 min read · Dec 31, 2022

I recently caught the AI bug after playing with ChatGPT. The insanely accurate replies it gave when fed the right prompt astonished me, and that inspired me to create my own AI projects. Obviously nothing as advanced as ChatGPT.

I had toyed with AI projects in the past but never committed to anything seriously. Out of all the applications for AI, I found myself most drawn to Computer Vision. In short, this is when an AI is able to recognize certain objects or features in an image or video. Admittedly, I would love to use this to spy on my dogs and have it report back to me when they do something wrong. However, I am not yet familiar enough with AI to make that a reality. I needed to start with something smaller.

After playing with a few concepts I settled on one: a computer vision application that determines the demeanor of my face at any given moment. I would take this information and use it to like or dislike a social media post when prompted. Seems simple enough, so let's build it.

The Final Product

This is the end result. A webcam feeds frames of my face to a computer vision program, which determines my current sentiment about a prompted post.

Prerequisites

  • Webcam (optional, but highly recommended if you want to follow along)
  • The latest stable version of Python (3.11.1 at the time of writing)
  • The latest stable version of Node.js (18.12.1 LTS at the time of writing)

Disclaimer

First off, I am no expert in AI, and some of the decisions I make in this tutorial may be wrong, but this is how I did it. Secondly, I am aware that there are more efficient AI libraries that use the GPU to speed up model training. However, OpenCV was easy to set up and use, and it was also the first result I found, so it is the one I decided to use.

Setting Up OpenCV

First, verify that Python is installed and at least version 3.11.1 by opening a new Command Prompt window.

Input the following:

python --version

If your output shows the expected version you should be good to continue. If not, you may need to restart your Command Prompt or your computer (if you just installed Python).

Now that you've verified your Python installation we can proceed to install OpenCV. Input the following into your Command Prompt.

pip install opencv-contrib-python
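
If you'd like to double-check that the install went through, a quick sanity check (just a suggestion; the exact version number you see will differ) is to import the module and print its version:

python -c "import cv2; print(cv2.__version__)"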

Assuming your installation was successful we can proceed to the next stage.

Creating The Trainer

We’re going to start off just creating a folder to keep our project in. Visual Studio Code is my editor of choice. Once you’ve opened the folder in your editor we can add a required file.

The file is a Haar Cascade, and it is what we will use to detect and isolate faces in our video feed. You can download it here. Place it in your folder and rename it to ‘haar_face.xml’.

Now that that's done we can create the first Python file. We'll call it ‘train.py’. In it we will start by importing the required libraries.

import os
import cv2 as cv
import numpy as np

Then we want to define our categories. These will be used to locate the respective folders of training data and to differentiate between the states. Since we are detecting demeanor we will keep it limited to 3 simple states.

categories = ['positive', 'negative', 'neutral']

Now we will need to define the directory our training data will be stored in. I created a folder in the root of my C: drive called ‘train’ and pointed the path accordingly. Inside it, the subfolder names need to match the categories exactly: ‘train/positive’, ‘train/negative’, and ‘train/neutral’.

DIR = os.path.join('C:', '/', 'train')

Add two lists, features and labels. These will hold the cropped face images and their matching category indices. We also load the Haar Cascade we downloaded earlier. So far your code should look like this:

import os
import cv2 as cv
import numpy as np

categories = ['positive', 'negative', 'neutral']
DIR = os.path.join('C:', '/', 'train')

haar_cascade = cv.CascadeClassifier('haar_face.xml')

features = []
labels = []

Once the libraries and variables have been established we can create the function that will build our training set, starting with a for loop.

def create_train():
    for state in categories:

This iterates through all of the states in the aforementioned ‘categories’ variable.

def create_train():
    print(' ------- Training Set Initialized ------- ')
    for state in categories:
        path = os.path.join(DIR, state)
        path = path.replace("\\", '/')
        label = categories.index(state)

First, we are taking the state text and creating a path from it.

This is why I mentioned that the folder names inside your ‘train/’ folder need to match the categories.

Next, I do some formatting to the path that you may or may not have to do. I was getting missing directory errors without it.

Finally, we define a ‘label’ variable that stores the index of the current state. OpenCV's recognizer expects integer labels, and using the index also means the computer doesn't have to store and compare entire strings.

for state in categories:
    path = os.path.join(DIR, state)
    path = path.replace("\\", '/')
    label = categories.index(state)

    for img in os.listdir(path):
        img_path = os.path.join(path, img)
        img_path = img_path.replace("\\", '/')

Then we create a nested for loop that iterates through all of the images in each state's directory using os.listdir(). I also repeated the path formatting here.

Now we need to read the image at img_path into memory. OpenCV makes this really easy:

img_array = cv.imread(img_path)

Next, we convert the image to grayscale, and input it into our HAAR Cascade to detect any faces.

#Convert to grayscale
gray = cv.cvtColor(img_array, cv.COLOR_BGR2GRAY)
#Detect any faces present in the image
faces_rect = haar_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=8)

The scaleFactor and minNeighbors can be adjusted to increase or decrease the face detection sensitivity. Additionally, I like to add a printout so I can view the progress of the loading and labeling process.

This is optional

print(f'Labeled {img_path} as {state}')

Once faces have been detected in the image, we need to iterate through them, crop them out, and add them to the training data along with their labels.

for (x,y,w,h) in faces_rect:
    faces_roi = gray[y:y+h, x:x+w]
    features.append(faces_roi)
    labels.append(label)

Here we unpack each detected rectangle into x, y, width, and height values; this is the region where a face was detected. For every face in the image being evaluated we take these values, crop that area out of the grayscale image, and save it to faces_roi. We then append it to the features list, and the corresponding label to the labels list. This way the index of a feature will always correspond to its label.

You should have the following code thus far (give or take some printouts):

import os
import cv2 as cv
import numpy as np

categories = ['positive', 'negative', 'neutral']
DIR = os.path.join('C:', '/', 'train')

haar_cascade = cv.CascadeClassifier('haar_face.xml')

features = []
labels = []

def create_train():
    print(' ------- Training Set Initialized ------- ')
    for state in categories:
        path = os.path.join(DIR, state)
        path = path.replace("\\", '/')
        label = categories.index(state)

        for img in os.listdir(path):
            img_path = os.path.join(path, img)
            img_path = img_path.replace("\\", '/')
            img_array = cv.imread(img_path)
            gray = cv.cvtColor(img_array, cv.COLOR_BGR2GRAY)

            faces_rect = haar_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=8)
            print(f'Labeled {img_path} as {state}')
            for (x,y,w,h) in faces_rect:
                faces_roi = gray[y:y+h, x:x+w]
                features.append(faces_roi)
                labels.append(label)

Our training function is now finished. All we need to do is execute it, take the labeled data, and convert it to numpy arrays. This makes the data easier to save.

#Load and label the training data
create_train()
#Convert to numpy arrays
features = np.array(features, dtype="object")
labels = np.array(labels)

Up until this point we have been using Face Detection, which only recognizes the presence of a face. Now we want to use Face Recognition. You can use this to distinguish between different people. However, as you may have already guessed, I will be using it to distinguish between different expressions on a single face.

#Creates a new face recognizer(Builtin OpenCV function)
face_recognizer = cv.face.LBPHFaceRecognizer_create()
#Train the face recognizer on our previously created data set
face_recognizer.train(features, labels)

Finally, you need to save the finished training set. You can name the files however you'd like; just remember what you choose, since we will load them by name in the next script.

face_recognizer.save('readmyface_train.yml')
np.save('readmyface_features.npy', features)
np.save('readmyface_labels.npy', labels)

Now our trainer is complete and we can use it.

Finished Trainer:

import os
import cv2 as cv
import numpy as np

categories = ['positive', 'negative', 'neutral']
DIR = os.path.join('C:', '/', 'train')

haar_cascade = cv.CascadeClassifier('haar_face.xml')

features = []
labels = []

def create_train():
    print(' ------- Training Set Initialized ------- ')
    for state in categories:
        path = os.path.join(DIR, state)
        path = path.replace("\\", '/')
        label = categories.index(state)

        for img in os.listdir(path):
            img_path = os.path.join(path, img)
            img_path = img_path.replace("\\", '/')

            img_array = cv.imread(img_path)
            gray = cv.cvtColor(img_array, cv.COLOR_BGR2GRAY)

            faces_rect = haar_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=8)
            print(f'Labeled {img_path} as {state}')
            for (x,y,w,h) in faces_rect:
                faces_roi = gray[y:y+h, x:x+w]
                features.append(faces_roi)
                labels.append(label)

create_train()

print(' ------- Training Set Created ------- ')

features = np.array(features, dtype="object")
labels = np.array(labels)

print(' ------- Training Initialized ------- ')
face_recognizer = cv.face.LBPHFaceRecognizer_create()
face_recognizer.train(features, labels)

print(' ------- Training Completed ------- ')

print(' ------- Saving... ------- ')
face_recognizer.save('readmyface_train.yml')
np.save('readmyface_features.npy', features)
np.save('readmyface_labels.npy', labels)
print(' ------- Saved! ------- ')

Establishing Your Data Set

You must create or find your data set. I chose to take thousands of pictures of my face in variations of each expression. This works, but the resulting model will only work for the individual you take pictures of. What I have determined from my limited AI experience is that a model is only ever as good as the data you give it. Remember that the images need to end up in the category subfolders of your ‘train’ directory (‘train/positive’, ‘train/negative’, ‘train/neutral’). If you'd rather not take the pictures by hand, a small capture script like the sketch below can help.
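
This is just a rough sketch of how I would grab labeled images with OpenCV (it isn't part of the original project, and the category name, output folder, and key bindings are assumptions you can change). It saves the current webcam frame into the chosen category folder every time you press the spacebar, and quits on ‘q’:

import os
import cv2 as cv

#Change these to suit your setup (assumed values)
category = 'positive'
out_dir = os.path.join('C:', '/', 'train', category)
os.makedirs(out_dir, exist_ok=True)

capture = cv.VideoCapture(0)
count = 0

while True:
    isTrue, frame = capture.read()
    if not isTrue:
        break

    cv.imshow('capture', frame)
    key = cv.waitKey(20) & 0xFF

    #Press space to save the current frame, 'q' to quit
    if key == ord(' '):
        cv.imwrite(os.path.join(out_dir, f'{category}_{count}.jpg'), frame)
        count += 1
    elif key == ord('q'):
        break

capture.release()
cv.destroyAllWindows()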

Creating The Face Reader

Now that you have trained your AI we can proceed to using the training set. Create a second Python file; I labeled mine ‘readmyface.py’. Now import the necessary libraries and capture our webcam feed.

import numpy as np
import cv2 as cv

#Live Webcam Feed
capture = cv.VideoCapture(0)

cv.VideoCapture() works by grabbing the video capture device at the specified index. In my case I used 0, because I only have a single webcam plugged in. If you have multiple webcams or an additional built-in webcam you may need to change this number.

If you elected not to use a webcam you can ignore the next couple of steps and simply plug in an image loaded with cv.imread(‘path’), as in the sketch below.
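
For example, a minimal version of that swap might look like this (the file name 'test_face.jpg' is just a placeholder for one of your own images; you would also run the detection once on this frame instead of inside the webcam while loop):

import cv2 as cv

#Load a single test image instead of capturing from a webcam
frame = cv.imread('test_face.jpg')

if frame is None:
    print('Could not read the image, check the path')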

def changeRes(width, height):
    capture.set(3, width)   #3 is cv.CAP_PROP_FRAME_WIDTH
    capture.set(4, height)  #4 is cv.CAP_PROP_FRAME_HEIGHT

changeRes(640, 640)

I added this helper function to lower the capture resolution. All it does is reduce the strain on my computer by limiting the number of pixels it has to analyze.

Now we need to load our training data. We start with the Haar Cascade, since we still need to detect and pick out faces from the frame. Then we establish our categories again and load the previously saved training data.

haar_cascade = cv.CascadeClassifier('haar_face.xml')
categories = ['positive', 'negative', 'neutral']
features = np.load('readmyface_features.npy', allow_pickle=True)
labels = np.load('readmyface_labels.npy', allow_pickle=True)

face_recognizer = cv.face.LBPHFaceRecognizer_create()
face_recognizer.read('readmyface_train.yml')

I enable allow_pickle because the features array contains Python objects, and loading it without that flag raises an error on my machine. You can enable it globally, but I elected not to do so.

In order to read every incoming frame from our webcam we have to constantly check for one. We do this with a while loop.

while True:
    #Reads a new frame from the previously specified capture device
    isTrue, frame = capture.read()

To stop this loop we add a check at the end that detects when we press the ‘D’ key. Outside of the loop we need to destroy the window and stop the webcam capture.

while True:
    isTrue, frame = capture.read()

    #Add the rest of the code here

    if(cv.waitKey(20) & 0xFF==ord('d')):
        break

#Stop webcam capture
capture.release()
#Destroy the window
cv.destroyAllWindows()

Now we can move on to adding the frame analysis.

while True:
    isTrue, frame = capture.read()

    #convert to grayscale
    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    #find all faces in the given frame
    faces_rect = haar_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=8)

    #these are used to narrow the face detection down to one face
    start_x = 0
    start_y = 0
    width = 0
    height = 0

    #check if there is even a face present in the frame
    if (len(faces_rect) > 0):
        isFace = True
    else:
        isFace = False

    #initialize the label and default it to neutral (at index 2)
    label = 2
    confidence = 0.0

    if (isFace == True):
        #iterate through all of the detected faces and pick the closest one based on the size of its rectangle
        for (x,y,w,h) in faces_rect:
            if (w>=width):
                width = w
                height = h
                start_x = x
                start_y = y
        #draw a rectangle around the closest detected face
        cv.rectangle(frame, (start_x,start_y), (start_x+width,start_y+height), (0, 255, 0), thickness=1)

    if(cv.waitKey(20) & 0xFF==ord('d')):
        break

capture.release()
cv.destroyAllWindows()

This seems like a lot to process, but it is actually pretty simple. We start by converting the frame to grayscale and detecting any faces in it. Then we define variables that we use to check against when iterating over all the faces. This is done so that we can determine the closest face and only analyze it.

Next, we check if there are any faces detected at all and set isFace accordingly. If a face is present we proceed to iterate through all the potential faces and find the closest one based on the size of the face detection’s encapsulating rectangle. Then draw that rectangle in the window as an indicator to the user.

faces_roi = gray[start_y:start_y+height, start_x:start_x+width]
label, confidence = face_recognizer.predict(faces_roi)

After drawing the debug rectangle we need to isolate the face into its own image, faces_roi, and plug that into the face recognizer to analyze against our trained model. It will then output a label (the index of the predicted state) and a confidence value. Note that for the LBPH recognizer the confidence is a distance, so a lower number means a closer match. If you want to ignore weak matches, a simple guard like the sketch below works.
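
This threshold check is not part of my original script, and the cutoff of 80 is an assumption you would tune for your own data set:

label, confidence = face_recognizer.predict(faces_roi)

#Lower LBPH confidence means a closer match, so fall back to
#neutral (index 2) when the distance is above our cutoff
CONFIDENCE_CUTOFF = 80  #assumed value, tune for your data
if confidence > CONFIDENCE_CUTOFF:
    label = 2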

Outside of the isFace check, add the following overlay labels and display the frame in a window.

#Background
cv.rectangle(frame, (0,0), (180,40), (0,0,0), -1)
#Is a face detected label
cv.putText(frame, f'Face Detected: {isFace}', (6, 12), cv.FONT_HERSHEY_TRIPLEX, 0.4, (255, 255, 255), thickness=1)
#The current sentiment of a detected face
cv.putText(frame, f'Face Sentiment: {categories[currentSentiment]}', (6, 24), cv.FONT_HERSHEY_TRIPLEX, 0.4, (255, 255, 255), thickness=1)
#The confidence of that sentiment
cv.putText(frame, f'Confidence: {int(confidence)}', (6, 36), cv.FONT_HERSHEY_TRIPLEX, 0.4, (255, 255, 255), thickness=1)

#This draws the current frame to a window labeled "face detect"
cv.imshow('face detect', frame)

Now, for my usage, I wanted to average out the detected sentiment over several frames, so I put this into a function labeled ‘analyzeSentiment’.

numFrames = 1
numFramesTo = 3

numPositive = 1
numNegative = 1
numNeutral = 1

currentSentiment = 2

def analyzeSentiment(sentiment):
    global numFrames, numPositive, numNegative, numNeutral, currentSentiment, numFramesTo

    numFrames+=1

    if (sentiment == 0):
        numPositive+=1
    if (sentiment == 1):
        numNegative+=1
    if (sentiment == 2):
        numNeutral+=1

    if (numFrames >= numFramesTo):
        if (numPositive > numNegative and numPositive >= numNeutral):
            currentSentiment = 0
        elif (numNegative > numPositive and numNegative >= numNeutral):
            currentSentiment = 1
        else:
            currentSentiment = 2
        resetSentimentNormalizer()

def resetSentimentNormalizer():
    global numFrames, numPositive, numNegative, numNeutral
    numPositive = 0
    numNegative = 0
    numNeutral = 0
    numFrames = 0

This function simply takes the current sentiment and increments the relevant counter. Once the number of frames to average reaches the set amount, in this case 3 frames, it picks the sentiment with the highest count, sets currentSentiment to reflect that, and resets the counters.

I still have yet to determine whether this is actually helpful or just a waste of time so this is completely optional.

If you do choose to use it, just call the function right after the face recognizer makes its prediction:

analyzeSentiment(label)

Finished Face Demeanor Script

import numpy as np
import cv2 as cv
import requests

#Live Feed
capture = cv.VideoCapture(0)

def changeRes(width, height):
    capture.set(3, width)
    capture.set(4, height)

changeRes(640, 640)

haar_cascade = cv.CascadeClassifier('haar_face.xml')
categories = ['positive', 'negative', 'neutral']
features = np.load('readmyface_features.npy', allow_pickle=True)
labels = np.load('readmyface_labels.npy', allow_pickle=True)

#Face Recognition
face_recognizer = cv.face.LBPHFaceRecognizer_create()
face_recognizer.read('readmyface_train.yml')

isFace = False

numFrames = 1
numFramesTo = 3

numPositive = 1
numNegative = 1
numNeutral = 1

currentSentiment = 2

def analyzeSentiment(sentiment):
    global numFrames, numPositive, numNegative, numNeutral, currentSentiment, numFramesTo

    numFrames+=1

    if (sentiment == 0):
        numPositive+=1
    if (sentiment == 1):
        numNegative+=1
    if (sentiment == 2):
        numNeutral+=1

    if (numFrames >= numFramesTo):
        if (numPositive > numNegative and numPositive >= numNeutral):
            currentSentiment = 0
        elif (numNegative > numPositive and numNegative >= numNeutral):
            currentSentiment = 1
        else:
            currentSentiment = 2
        resetSentimentNormalizer()

def resetSentimentNormalizer():
    global numFrames, numPositive, numNegative, numNeutral
    numPositive = 0
    numNegative = 0
    numNeutral = 0
    numFrames = 0

while True:
    isTrue, frame = capture.read()

    #convert to grayscale
    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    #find all faces in the given frame
    faces_rect = haar_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=8)

    #these are used to narrow the face detection down to one face
    start_x = 0
    start_y = 0
    width = 0
    height = 0

    #check if there is even a face present in the frame
    if (len(faces_rect) > 0):
        isFace = True
    else:
        isFace = False

    #initialize the label and default it to neutral (at index 2)
    label = 2
    confidence = 0.0

    if (isFace == True):
        #iterate through all of the detected faces and pick the closest one based on the size of its rectangle
        for (x,y,w,h) in faces_rect:
            if (w>=width):
                width = w
                height = h
                start_x = x
                start_y = y
        #draw a rectangle around the closest detected face
        cv.rectangle(frame, (start_x,start_y), (start_x+width,start_y+height), (0, 255, 0), thickness=1)

        faces_roi = gray[start_y:start_y+height, start_x:start_x+width]
        label, confidence = face_recognizer.predict(faces_roi)
        analyzeSentiment(label)

    #Background
    cv.rectangle(frame, (0,0), (180,40), (0,0,0), -1)
    #Is a face detected label
    cv.putText(frame, f'Face Detected: {isFace}', (6, 12), cv.FONT_HERSHEY_TRIPLEX, 0.4, (255, 255, 255), thickness=1)
    #The current sentiment of a detected face
    cv.putText(frame, f'Face Sentiment: {categories[currentSentiment]}', (6, 24), cv.FONT_HERSHEY_TRIPLEX, 0.4, (255, 255, 255), thickness=1)
    #The confidence of that sentiment
    cv.putText(frame, f'Confidence: {int(confidence)}', (6, 36), cv.FONT_HERSHEY_TRIPLEX, 0.4, (255, 255, 255), thickness=1)
    #This draws the current frame to a window labeled "face detect"
    cv.imshow('face detect', frame)

    if(cv.waitKey(20) & 0xFF==ord('d')):
        break

capture.release()
cv.destroyAllWindows()

When running this program you should see something like the following:

Creating The ‘Social Network’

Since this tutorial is primarily AI-oriented and my copy of Instagram leaves a lot to be desired, I'll just link the GitHub Here and you can download the Node.js app, or build your own.

Once it's downloaded, open a command prompt in the project folder and install the dependencies:

npm install

Then just run the project:

npm run dev

After that you should be able to visit the clone at:

http://localhost/instagram.html

The background will be offset because I designed it to be run in full screen, and again this is not the way to make web apps.

Integrating The Python and Node.js Apps

Now that we have a working facial sentiment recognizer and a ‘working’ web app, we can proceed to integrate the two. Assuming you downloaded the Node web app, there is really only one step.

Add These Lines

#Add this at the top with the other imports
import requests

#Add this outside of the 'isFace' check
req = requests.post("http://localhost/sentiment?s=%s&f=%s" % (categories[currentSentiment], isFace))

It will look like this:

#Imports
import numpy as np
import cv2 as cv
import requests

#while True:
#    ...
    if (isFace == True):
        #iterate through all of the detected faces and pick the closest one based on the size of its rectangle
        for (x,y,w,h) in faces_rect:
            if (w>=width):
                width = w
                height = h
                start_x = x
                start_y = y
        #draw a rectangle around the closest detected face
        cv.rectangle(frame, (start_x,start_y), (start_x+width,start_y+height), (0, 255, 0), thickness=1)

        faces_roi = gray[start_y:start_y+height, start_x:start_x+width]
        label, confidence = face_recognizer.predict(faces_roi)
        analyzeSentiment(label)

    req = requests.post("http://localhost/sentiment?s=%s&f=%s" % (categories[currentSentiment], isFace))

All this does is post to the REST API running in the Node server with URL parameters describing the current sentiment state (s) and whether or not a face is even present (f). If no face is present, the web app will see f=False and reflect that. One caveat: requests.post will raise an exception if the Node server isn't running, which would kill the loop; a small guard like the sketch below avoids that.
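
This wrapper is optional and not part of my original script; the 0.5 second timeout is an assumed value:

try:
    #Short timeout so a slow server doesn't stall the video loop
    req = requests.post(
        "http://localhost/sentiment?s=%s&f=%s" % (categories[currentSentiment], isFace),
        timeout=0.5
    )
except requests.exceptions.RequestException:
    #Skip this frame's update if the web app isn't reachable
    pass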

Conclusion

I am clearly no expert in AI, but I was able to take what I know and create a fun project out of it. Maybe you can do the same. This project gave me a lot of insight into AI and, more specifically, Computer Vision. I plan to continue doing projects like this, and to publish more tutorials and general write-ups on my findings in the future. I hope you got something out of this, and thank you for reading. Make sure to check out My GitHub, and the sources listed below.

Sources:

