Image generated using DALL-E 3

Emotion Detection using Machine Learning

Varun Tyagi
10 min read · Feb 22, 2024


In this blog post, we will explore the process of building an emotion detection system using machine learning. The goal is to create a real-time system that can detect emotions from faces captured by a webcam. The code will be broken down into sections, and each section will be explained in detail to provide a comprehensive understanding of the implementation. We’ll start with the basics and gradually delve into more complex concepts. But why is real-time emotion detection so important? Let us first understand that.

Why real-time emotion detection?

Real-time emotion detection holds significant importance in today’s world, especially in corporate and industrial settings. The ability to accurately and swiftly detect emotions has various applications across different industries, contributing to improved user experiences, employee well-being, and overall business efficiency. Here are some use cases where real-time emotion detection is valuable today.

Enhanced User Interaction

In customer service and user experience design, real-time emotion detection allows systems to adapt and respond based on users’ emotional states. This can lead to more personalized and engaging interactions, enhancing overall customer satisfaction. But everything has its pros and cons, and the biggest con is privacy. Let us delve into both.

Pros

  1. Personalized Experiences: Real-time emotion detection enables systems to provide personalized content and interactions based on users’ emotional states.
  2. Increased Engagement: Adaptive responses to user emotions lead to more engaging and enjoyable experiences.
  3. Improved Customer Satisfaction: Understanding user emotions allows businesses to address concerns promptly, enhancing overall customer satisfaction.

Cons

  1. Privacy Concerns: Continuous monitoring of user emotions raises privacy concerns, as users may be uncomfortable with the collection and analysis of their emotional data.
  2. Accuracy Challenges: Emotion detection algorithms may not always accurately interpret complex human emotions, leading to misinterpretations and potentially inappropriate responses.

Employee Well-being and Productivity

In corporate environments, understanding the emotional well-being of employees is crucial for fostering a positive workplace culture. Real-time emotion detection can be used to assess stress levels, job satisfaction, and overall mental health, allowing organizations to implement measures to improve employee well-being and productivity.

Pros

  1. Health Monitoring: Real-time emotion detection can contribute to monitoring employee stress levels and mental health, fostering a healthier workplace.
  2. Adaptive Work Environment: Organizations can adapt work environments based on employee emotions, promoting a positive and productive atmosphere.

Cons

  1. Invasion of Privacy: Just like customers, employees may feel uncomfortable with continuous monitoring of their emotional states and consider it an invasion of privacy.
  2. Ethical Considerations: The use of emotion detection data in employment decisions may raise ethical concerns, especially if it impacts performance evaluations or promotions.

Human-Computer Interaction

In human-computer interaction scenarios, such as gaming or virtual reality, real-time emotion detection can enhance the immersive experience. Systems can dynamically adjust content, difficulty levels, or responses based on the user’s emotional cues, providing a more tailored and engaging experience.

Pros

  1. Enhanced User Experience: Real-time adaptation based on emotional cues improves the overall user experience in applications like gaming and virtual reality.
  2. Increased Immersion: Emotion-aware systems contribute to a more immersive and realistic interaction between humans and computers.

Cons

  1. Technical Challenges: Developing accurate emotion detection algorithms for diverse user emotions and expressions can be technically challenging.
  2. User Acceptance: Users may find the constant monitoring of emotions intrusive or uncomfortable, affecting their acceptance of such systems.

Security and Surveillance

Real-time emotion detection can be integrated into security and surveillance systems for public safety. Identifying suspicious behavior or detecting distress signals in public spaces can enhance security measures and enable quick responses in critical situations.

Pros

  1. Public Safety: Real-time emotion detection enhances security by identifying potential threats or distress signals in public spaces.
  2. Quick Responses: Security personnel can respond swiftly to situations that may require intervention.

Cons

  1. Privacy Invasion: Continuous monitoring of individuals in public spaces may infringe on privacy rights.
  2. Bias and Misinterpretation: Emotion detection systems may exhibit bias and misinterpretations, leading to false alarms or misidentifications.

Banking Sector

Facial emotion detection also has great applications in the banking sector. This technology can be utilized for a multitude of purposes:

Pros

  1. Enhanced Customer Service: Implementing facial emotion detection at customer service desks or during interactions with bank representatives can help assess customer satisfaction and identify any issues that may need immediate attention.
  2. Fraud Prevention: Facial emotion detection can be integrated into security systems to identify suspicious or abnormal behavior that may indicate fraudulent activity. Unusual emotional responses during transactions or account access could trigger alerts for further investigation.
  3. ATM Security: This technology can also be installed in the ATM machine to monitor signs of distress or coercion during cash withdrawals, providing an additional layer of safety for users.

Cons

  1. Privacy Concerns: Continuous monitoring of facial expressions raises privacy concerns, as customers and employees may feel uncomfortable with the collection and analysis of their emotional data.
  2. Accuracy Challenges: Emotion detection algorithms may not always accurately interpret complex human emotions, leading to potential misinterpretations and impacting the reliability of the system.
  3. Ethical Considerations: Using facial emotion detection in decision-making processes, such as loan approvals, may raise ethical concerns, as decisions based on emotional data could lead to bias or discrimination.

Having developed a detailed understanding of emotion detection applications, let us dive into the code.

Libraries and Dependencies

Let’s begin by importing the necessary libraries

import cv2
import cvlib as cv
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
import joblib
import pandas as pd
import numpy as np

Let’s also understand the use of all the libraries

  • cv2: OpenCV, a popular computer vision library.
  • cvlib: A simple library for object detection using OpenCV.
  • train_test_split: A function from scikit-learn to split data into training and testing sets.
  • RandomForestClassifier: A machine learning classifier from scikit-learn.
  • Pipeline: A tool for building composite estimators in scikit-learn.
  • joblib: A library for lightweight pipelining in Python.
  • pandas: A data manipulation library.
  • numpy: A library for numerical operations.

Synthetic Data Generation

The first section involves generating synthetic data for training our emotion detection model. The generate_synthetic_data function creates a synthetic dataset containing images and corresponding emotion labels, which is useful for initial model training when real-world labeled data is limited. For simplicity, the function generates random image data and assigns random emotions; you’ll need to replace this with your actual image-loading mechanism (one possible way to do that is sketched after the function below). Let us break down the code and understand each line.

  • num_samples (default value: 1000): Specifies the number of samples to generate in the synthetic dataset.
  • np.random.seed(42) sets the seed for NumPy’s random number generator to ensure reproducibility: anyone using the same seed will get the same synthetic dataset.
  • An array emotions is defined, containing four emotion labels: 'happy', 'sad', 'angry', and 'neutral'. These labels will be randomly assigned to the synthetic images.
  • A dictionary data is initialized with two empty lists: 'image' for storing synthetic image data and 'emotion' for storing corresponding emotion labels.
  • The function uses a loop to generate synthetic data for the specified number of samples (num_samples). In each iteration, image_data, a random synthetic image of shape (100, 100, 3), is generated, simulating a 100x100 RGB image with values between 0 and 255, and emotion, a random emotion label, is chosen from the defined list using np.random.choice. The generated image data and corresponding emotion label are appended to the ‘image’ and ‘emotion’ lists in the data dictionary.
  • The collected data dictionary is used to create a pandas DataFrame (df), where each row represents a synthetic sample with image data and emotion label.
  • Finally, the function returns the synthetic dataset as a pandas DataFrame, allowing it to be easily used for further analysis, training machine learning models, or other applications.
def generate_synthetic_data(num_samples=1000):
    np.random.seed(42)
    emotions = ['happy', 'sad', 'angry', 'neutral']
    data = {'image': [], 'emotion': []}

    for _ in range(num_samples):
        # Generate random synthetic image data (replace this with your actual image data loading)
        image_data = np.random.rand(100, 100, 3) * 255  # Example random image
        emotion = np.random.choice(emotions)
        data['image'].append(image_data)
        data['emotion'].append(emotion)

    df = pd.DataFrame(data)
    return df
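
If you already have labeled face images on disk, swapping out the random generation is straightforward. The sketch below is one possible loader, assuming a hypothetical directory layout of data/<emotion>/*.jpg (for example, data/happy/img001.jpg); adjust the root path, file extensions, and target size to match your dataset.

import os
import glob

import cv2
import pandas as pd

def load_labeled_images(root_dir='data', size=(100, 100)):
    # Hypothetical layout: root_dir/<emotion>/<image files>
    data = {'image': [], 'emotion': []}
    for emotion in sorted(os.listdir(root_dir)):
        for path in glob.glob(os.path.join(root_dir, emotion, '*.jpg')):
            img = cv2.imread(path)  # BGR image as a NumPy array, or None on failure
            if img is None:
                continue  # skip unreadable files
            data['image'].append(cv2.resize(img, size))  # match the 100x100 size used in this post
            data['emotion'].append(emotion)
    return pd.DataFrame(data)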

Data Splitting

Next, let’s split the data into training and testing sets using scikit-learn’s train_test_split. Here, X_train and y_train represent the features (images) and labels (emotions) of the training set, while X_test and y_test represent the testing set, which we will use for a quick evaluation after training.

# Generate synthetic training data
train_data = generate_synthetic_data()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(train_data['image'], train_data['emotion'], test_size=0.2, random_state=42)

Building the Pipeline

Now, let’s build a machine learning pipeline. In this case, we’ll use a RandomForestClassifier within a scikit-learn Pipeline. The pipeline simplifies the workflow by combining preprocessing steps and the classifier into a single object.

# Build a simple pipeline with a classifier
model = Pipeline([
    ('classifier', RandomForestClassifier(n_estimators=100))  # Use RandomForestClassifier for image data
])

Flattening Images and Training

Before training the model, we need to flatten the images. The RandomForestClassifier expects a 2D feature matrix with one row per sample, so each 100x100x3 image is flattened into a single 1D feature vector (of length 30,000) before fitting.

# Flatten the images for training
X_train_flat = [img.flatten() for img in X_train]

# Train the model
model.fit(X_train_flat, y_train)
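
Since we already held out a test set, it is worth a quick sanity check. The snippet below is a minimal evaluation sketch using the same flattening step; with purely random synthetic images the accuracy will hover around chance (about 25% for four classes), so treat it as a plumbing check rather than a meaningful score.

from sklearn.metrics import accuracy_score, classification_report

# Flatten the test images the same way as the training images
X_test_flat = [img.flatten() for img in X_test]

# Evaluate the trained pipeline on the held-out test set
y_pred = model.predict(X_test_flat)
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred))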

Saving the Trained Model

After training, we save the model using joblib . This step allows us to reuse the trained model without retraining every time. The model will be saved into your present working directory.

# Save the trained model
model_filename = 'emotion_detection_model.pkl'
joblib.dump(model, model_filename)
print(f"Trained model saved to {model_filename}")

Loading the Trained Model

Now, let’s load the trained model for real-time emotion detection

# Load the trained model
loaded_model = joblib.load(model_filename)

Capturing Video

We use OpenCV to capture video from the default camera

# Start capturing video from the default camera (usually the built-in webcam)
cap = cv2.VideoCapture(0)

Preprocessing and Prediction

In the main loop, we detect faces, preprocess them, and make predictions.

Let’s go through each section of the code:

  1. Infinite Loop: The script uses an infinite loop (while True) to continuously capture and process video frames until the user decides to exit.
  2. Capture Frame: ret, frame = cap.read() captures a frame from the video source (e.g., webcam). cap is a video capture object created earlier in the code.
  3. Face Detection: faces, confidences = cv.detect_face(frame) uses the detect_face function from the cvlib library to identify faces in the captured frame.
  4. Face Processing Loop: The script then loops through each detected face, extracting the bounding box coordinates, cropping the face from the frame, and resizing it to a fixed size.
  5. Emotion Prediction: The flattened face image is used as input to a pre-trained machine learning model (loaded_model) to predict the emotion.
  6. Drawing Bounding Box and Label: The script draws a bounding box around each detected face and displays the predicted emotion label on the frame.
  7. Display Resulting Frame: cv2.imshow('Emotion Detection', frame) displays the resulting frame with bounding boxes and emotion labels.
  8. Break the Loop: The loop can be terminated if the ‘q’ key is pressed (if cv2.waitKey(1) & 0xFF == ord('q')).
  9. Release Resources: After exiting the loop, the video capture object is released (cap.release()), and all OpenCV windows are closed (cv2.destroyAllWindows()).

The code essentially captures video frames, detects faces, predicts emotions for each face, and displays the results in real-time. The loop continues until the user presses the ‘q’ key to exit the program.

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()

    # Detect faces in the frame
    faces, confidences = cv.detect_face(frame)

    # Loop through detected faces
    for face, confidence in zip(faces, confidences):
        (start_x, start_y, end_x, end_y) = face

        # Crop the face from the frame
        face_crop = frame[start_y:end_y, start_x:end_x]

        # Resize the face for prediction (adjust the size as needed)
        face_resize = cv2.resize(face_crop, (100, 100))

        # Flatten the face image for prediction
        face_flat = face_resize.flatten()

        # Perform emotion prediction
        emotion = loaded_model.predict([face_flat])[0]

        # Draw bounding box and label on the frame
        label = f'Emotion: {emotion}'
        cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), (0, 255, 0), 2)
        cv2.putText(frame, label, (start_x, start_y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

    # Display the resulting frame
    cv2.imshow('Emotion Detection', frame)

    # Break the loop if 'q' key is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the capture
cap.release()
cv2.destroyAllWindows()

Alternative Models: Pre-trained Models

While training a model from scratch is one approach, leveraging pre-trained models can provide better accuracy. Here are a few popular alternatives:

OpenCV Pre-trained Haarcascades

OpenCV comes with pre-trained Haarcascades for face detection. We can combine this with a pre-trained deep learning model for emotion recognition.

import cv2

# Load pre-trained face detector
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Load pre-trained emotion recognition model (replace with your actual path)
emotion_model = cv2.dnn.readNet('path/to/emotion_model.prototxt', 'path/to/emotion_model.caffemodel')

# ... (capture video and loop through frames)
# Within the loop, use face detection and emotion recognition
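
The two placeholder comments above elide the actual capture-and-classify loop. Below is one possible sketch of that loop, continuing from the block above. The input size (64x64 grayscale), the scaling, and the emotion label order are assumptions that depend entirely on which Caffe model you download, so verify them against your model’s documentation.

import numpy as np

# Assumed label order for the downloaded model; verify before use
emotion_labels = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Haar cascades operate on grayscale images
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

    for (x, y, w, h) in faces:
        # Assumed preprocessing: 64x64 grayscale, scaled to [0, 1]
        face_gray = cv2.resize(gray[y:y + h, x:x + w], (64, 64))
        blob = cv2.dnn.blobFromImage(face_gray, scalefactor=1.0 / 255, size=(64, 64))

        # Run the face through the emotion network and pick the top-scoring class
        emotion_model.setInput(blob)
        scores = emotion_model.forward()
        label = emotion_labels[int(np.argmax(scores))]

        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

    cv2.imshow('Emotion Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()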

Deep Learning with TensorFlow and Keras

Another option is to use a deep learning model, such as one trained on the FER2013 dataset. Here, we’ll load a pre-trained model with Keras

import cv2
from keras.preprocessing import image
from keras.models import load_model
import numpy as np

# Load pre-trained emotion recognition model (replace with your actual path)
emotion_model = load_model('path/to/emotion_model.h5')

# ... (capture video and loop through frames)
# Within the loop, use emotion recognition
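
The capture loop is the same as in the Haar-cascade sketch above; only the per-face preprocessing and prediction change. The helper below is a sketch that assumes, hypothetically, a FER2013-style model expecting 48x48 grayscale inputs scaled to [0, 1] and emitting scores over seven classes; check the real input shape and label order used when your .h5 model was trained.

import numpy as np

# Assumed FER2013-style label order; verify against your model's training setup
fer_labels = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']

def predict_emotion(face_bgr):
    # Assumed preprocessing: 48x48 grayscale, scaled to [0, 1], shaped (1, 48, 48, 1)
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    face = cv2.resize(gray, (48, 48)).astype('float32') / 255.0
    face = face.reshape(1, 48, 48, 1)
    scores = emotion_model.predict(face, verbose=0)
    return fer_labels[int(np.argmax(scores))]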

DeepFace

We can also utilize pre-trained deep learning models for facial expression recognition, such as those available in deep learning libraries like TensorFlow or PyTorch. We can achieve higher accuracy and faster time to insight using the pre-trained models bundled with the deepface library.

# Make sure to install deepface library using
# pip install deepface

import cv2
from deepface import DeepFace

# Start capturing video from the default camera (usually the built-in webcam)
cap = cv2.VideoCapture(0)

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        break

    # Detect faces and analyze emotions in a single call.
    # enforce_detection=False avoids an exception when no face is found in the frame.
    results = DeepFace.analyze(frame, actions=['emotion'],
                               detector_backend='opencv',
                               enforce_detection=False)

    # Depending on the deepface version, analyze returns a dict or a list of dicts
    if isinstance(results, dict):
        results = [results]

    # Loop through detected faces
    for result in results:
        region = result['region']
        start_x, start_y = region['x'], region['y']
        end_x, end_y = start_x + region['w'], start_y + region['h']

        # Get the dominant emotion for this face
        emotion = result['dominant_emotion']

        # Draw bounding box and label on the frame
        label = f'Emotion: {emotion}'
        cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), (0, 255, 0), 2)
        cv2.putText(frame, label, (start_x, start_y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

    # Display the resulting frame
    cv2.imshow('Emotion Detection', frame)

    # Break the loop if 'q' key is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the capture
cap.release()
cv2.destroyAllWindows()

Conclusion

In conclusion, while the initial code provided a basic machine learning approach, leveraging pre-trained models can significantly enhance the accuracy of emotion detection. Whether using Haarcascades for face detection or deep learning models, the choice depends on the specific requirements and available resources. Feel free to experiment with different models and datasets to find the best approach for your application.

Code

Emotion Detection using RF
