Multimodal Sentiment and Stance Detection With Red Hen Lab

Mohamed Ahmed Krichen
5 min read · May 3, 2024


Ever wondered what lies beneath the surface of televised news? Dive into a groundbreaking project that aims not only to uncover the emotions behind news content but also to dissect the positions taken on critical issues.

By analyzing text, audio, and video, the goal is to understand how emotions and specific positions on issues interact in news narratives. The project involves developing sentiment analysis models for news transcripts, building stance detection models for specific topics, and integrating these into a unified system. The outcome could provide researchers and analysts with a valuable tool for nuanced analysis of news broadcasts.

This blog, curated by Mohamed Ahmed Krichen, offers insights into the ongoing progress of the ‘Multimodal Sentiment and Stance Detection’ project, a collaboration with Red Hen Lab under the GSoC 2024 program.

*Coding officially begins*

Week 1

Objective:

This week, I aimed to find samples of the desired data on three main topics: Immigration, Coronavirus, and Gun Control.

Steps Involved:

I sourced the videos from CNN, Fox News, and MSNBC, finding four that ranged from slightly biased to obviously biased.

Video 1 · Video 2 · Video 3 · Video 4

Week 2

Objective:

This week, I focused on developing a method to extract features from videos, specifically to analyze the emotions in the video frames.

Steps Involved:

Emotion Detector Initialization:

  • Set up a pre-trained model for facial emotion recognition.

Video Analysis Function:

  • I created a function to read and process the video file frame by frame.
  • Within the function, emotions are detected at specified intervals to avoid processing every single frame, which optimizes performance.
  • For each processed frame, the dominant emotion is identified and recorded.
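
As a rough illustration of the function described above, here is a minimal sketch in Python, assuming OpenCV for frame capture and the DeepFace library for emotion recognition (the specific detector and the `frame_interval` value are my assumptions, not the exact setup used):

```python
import cv2
from deepface import DeepFace

def analyze_video(path, frame_interval=30):
    """Detect the dominant facial emotion every `frame_interval` frames."""
    cap = cv2.VideoCapture(path)
    emotions, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_interval == 0:
            # enforce_detection=False keeps the loop running on frames with no visible face
            result = DeepFace.analyze(frame, actions=["emotion"], enforce_detection=False)
            face = result[0] if isinstance(result, list) else result  # return type differs across versions
            emotions.append(face["dominant_emotion"])
        idx += 1
    cap.release()
    return emotions
```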

Video Processing:

  • Provided a video file for analysis.
  • The function processes the video, capturing and analyzing frames to detect emotions.

Result Aggregation:

  • After processing, the detected emotions are counted to provide a summary of the emotional content throughout the video.
  • This summary includes the frequency of each detected emotion, offering an overview of the emotional landscape of the video.
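
Continuing from the sketch above, aggregating the per-frame labels is a one-liner with `collections.Counter` (the file name is just a placeholder):

```python
from collections import Counter

emotions = analyze_video("news_clip.mp4")  # list of labels from the sketch in the previous step
summary = Counter(emotions)
for emotion, count in summary.most_common():
    print(emotion, count)
```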

Example Outcome:

The analysis might reveal that certain emotions, such as neutrality or anger, dominate the video, while others, like sadness, appear less frequently. This data can be used to understand the overall emotional tone and to identify specific segments of the video where particular emotions are prevalent.

Week 3

Objective:

To process and analyze a video to identify key speakers, specifically the host speaker, using advanced machine learning models.

Steps Involved:

Video Input Preparation:

  • Obtain the video file containing the conversation.
  • Ensure the video is in a suitable format for processing.

Speaker Diarization:

  • Use the pre-trained “pyannote/speaker-diarization-3.1” model from the pyannote.audio Python library.
  • Extract audio features from the video to identify different speakers.
  • Determine the time spans during which each speaker is active.
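
A minimal sketch of this step with pyannote.audio, assuming the audio track has already been extracted from the video (e.g., with ffmpeg) and that a Hugging Face access token is available:

```python
from pyannote.audio import Pipeline

# Gated model: requires accepting the conditions on Hugging Face and passing a token
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # placeholder token
)

diarization = pipeline("interview_audio.wav")  # audio extracted from the video beforehand

# Each turn gives a speaker label and the time span during which that speaker is active
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```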

Creating a Conversation Dictionary:

  • Organize the extracted speaker data into a structured format.
  • Create a Python dictionary to represent the conversation, with each entry detailing a speaker’s identity and their corresponding time spans.
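
Building the conversation dictionary from the diarization output could look roughly like this (the speaker labels and time spans below are made-up examples):

```python
from collections import defaultdict

# Hypothetical (speaker, start, end) turns produced by the diarization step
turns = [("SPEAKER_00", 0.0, 4.2), ("SPEAKER_01", 4.2, 9.8), ("SPEAKER_00", 9.8, 15.1)]

conversation = defaultdict(list)
for speaker, start, end in turns:
    conversation[speaker].append({"start": start, "end": end})

# {'SPEAKER_00': [{'start': 0.0, 'end': 4.2}, {'start': 9.8, 'end': 15.1}], 'SPEAKER_01': [...]}
```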

Data Preprocessing:

  • Clean and preprocess the conversation data to ensure compatibility with subsequent models.
  • Ensure the conversation dictionary is accurately formatted and free from errors.

Host Speaker Identification:

  • Utilize the “Microsoft/Phi-3” LLM to analyze the conversation dictionary.
  • Feed the dictionary into the model to assess the context of the conversation.
  • Identify the host speaker based on contextual cues and interactions within the conversation.
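
A sketch of the host-identification prompt using the transformers library; the exact Phi-3 checkpoint (`microsoft/Phi-3-mini-4k-instruct`) and the prompt wording are my assumptions:

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # assumed Phi-3 variant
    device_map="auto",
    trust_remote_code=True,
)

# Hypothetical conversation dictionary from the previous steps
conversation = {"SPEAKER_00": [(0.0, 4.2), (9.8, 15.1)], "SPEAKER_01": [(4.2, 9.8)]}
prompt = (
    "Below is a diarized TV-news conversation, given as a dictionary of speakers "
    f"and their speaking time spans in seconds:\n{conversation}\n"
    "Which speaker is most likely the host? Answer with the speaker label only."
)

answer = generator(prompt, max_new_tokens=20, do_sample=False)
print(answer[0]["generated_text"])
```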

Result Verification and Output:

  • Verify the accuracy of the identified host speaker.
  • Output the results in a clear and understandable format.
  • Review the process and make adjustments if necessary to improve accuracy and efficiency.

Week 4

Objective:

To process and optimize audio data by segmenting it into manageable chunks and embedding these segments for efficient storage.

Steps Involved:

Utilize Previous Data:

  • Reuse the conversation dictionary and the extracted features from the previous weeks’ work.
  • Ensure all necessary data from previous analyses is readily available.

Segmenting Audio Data:

  • Divide the audio data into 5-second chunks.
  • Ensure each chunk carries the following key information:
      • Tone: the tone of the audio.
      • Audio Emotions: the emotional content of the audio.
      • Frame Text: the textual representation (transcript) of the audio chunk.
      • Speaker: the speaker identified for the chunk.
      • Rank: the time rank of the chunk relative to the video.
      • Audio: the extracted audio itself.
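
One way to cut the audio into 5-second chunks is with pydub (my choice of library; the metadata fields are filled in later from the earlier steps):

```python
from pydub import AudioSegment

audio = AudioSegment.from_file("interview_audio.wav")
chunk_ms = 5_000  # 5-second chunks

chunks = []
for rank, start in enumerate(range(0, len(audio), chunk_ms)):
    chunks.append({
        "rank": rank,                             # time rank of the chunk relative to the video
        "audio": audio[start:start + chunk_ms],   # the extracted 5-second slice
        "tone": None,                             # filled in by the tone model
        "audio_emotions": None,                   # filled in by the audio-emotion model
        "frame_text": None,                       # filled in from the transcript
        "speaker": None,                          # filled in from the diarization dictionary
    })
```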

Embedding Chunks:

  • Apply Principal Component Analysis (PCA) to each chunk.
  • Reduce the dimensionality of the data to ensure efficient storage.
  • Verify that the embedding process preserves essential information while minimizing space usage.
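
The dimensionality reduction could be done with scikit-learn's PCA; the feature and component sizes below are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 512))       # placeholder: 120 chunks x 512 raw features per chunk

pca = PCA(n_components=64)            # assumed target dimensionality
embeddings = pca.fit_transform(X)     # one compact vector per 5-second chunk

print(embeddings.shape)                      # (120, 64)
print(pca.explained_variance_ratio_.sum())   # fraction of variance preserved
```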

Data Optimization:

  • Check the embedded chunks for any data loss or inaccuracies.
  • Optimize storage without compromising the integrity of the key information.
  • Ensure the embedded data is easily retrievable for future use.

Result Validation:

  • Validate the embedded chunks to ensure they meet the required standards.
  • Review the entire process to confirm that the objectives of efficient storage and data integrity are achieved.

Week 5

Objective:

To manually annotate selected videos with time segmentations that categorize content as “neutral,” “biased with,” or “biased against” specific topics using the ELAN software.

Steps Involved:

Video Selection:

  • Refer to the videos chosen in the first week, ensuring they cover specific topics such as immigration or gun control.
  • Confirm that the selected videos are suitable for detailed annotation.

Manual Annotation with ELAN:

  • Use the ELAN software for precise annotation of the videos.
  • Load each video into the ELAN software, preparing it for time segmentation.

Topic-Based Segmentation:

  • Identify and mark the time segments within each video.
  • Categorize these segments based on the content’s stance toward the topic:
      • Neutral: content that is unbiased and presents information without a discernible stance.
      • Biased With: content that shows a positive bias or support for the topic.
      • Biased Against: content that shows a negative bias or opposition to the topic.
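
Once the segments are annotated, they can be read back programmatically, for example with the pympi-ling package (the `.eaf` file name and the tier name "stance" are assumptions):

```python
import pympi  # pympi-ling

# Assumes each annotated video has an ELAN .eaf file with a tier named "stance"
# whose values are "neutral", "biased with", or "biased against"
eaf = pympi.Elan.Eaf("video1_annotations.eaf")
for start_ms, end_ms, label in eaf.get_annotation_data_for_tier("stance"):
    print(f"{start_ms / 1000:.1f}s - {end_ms / 1000:.1f}s: {label}")
```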

Annotation Consistency:

  • Ensure that the annotations are consistent across all videos.
  • Apply the same criteria and standards for determining the bias or neutrality of each segment.

Review and Verification:

  • Review the annotated segments to ensure accuracy and consistency.
  • Make any necessary adjustments to the annotations for improved reliability.

Week 6

Objective:

To leverage an HPC cluster with Singularity for the first time, creating a script that processes videos and outputs a JSON file summarizing the conversation topic and bias orientation.

Steps Involved:

HPC Cluster Setup:

  • Access and configure the HPC cluster for use.
  • Install and set up Singularity to manage containerized applications.

Script Development:

  • Create a script designed to take a video as input.
  • Ensure the script can be run within the Singularity environment on the HPC cluster.

Utilize Previous Models:

  • Incorporate the models developed in previous weeks into the script.
  • Ensure the script uses these models to analyze the video content accurately.

Video Processing:

  • Run the script to process each video.
  • Analyze the video’s content to determine the conversation topic, bias presence, and bias orientation.

Model Integration:

  • Use the meta-llama/Meta-Llama-3-8B-Instruct model within the script.
  • Leverage this model to enhance the accuracy of topic and bias detection.
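
A rough sketch of this step with the transformers library; the prompt wording and the plain-text (non-chat-template) call are simplifications on my part:

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # gated model: requires Hugging Face access
    device_map="auto",
)

transcript_text = "..."  # placeholder for the transcript and features gathered in earlier steps
prompt = (
    "Given the following news segment, name the conversation topic, say whether the "
    "coverage is biased (Yes/No), and give the orientation of the bias (For/Against). "
    "Answer as JSON.\n\n" + transcript_text
)

output = generator(prompt, max_new_tokens=100, do_sample=False)
print(output[0]["generated_text"])
```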

JSON Output Creation:

  • Format the analysis results into a JSON structure.
  • Ensure the JSON output includes:
      • Conversation Topic: the main topic of the conversation (e.g., “Immigration”).
      • Biased: whether the content is biased (“Yes” or “No”).
      • Orientation: the orientation of the bias (“For” or “Against”).
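
The final JSON could be written out like this (field names follow the list above; the values are example output):

```python
import json

result = {
    "Conversation Topic": "Immigration",   # example topic
    "Biased": "Yes",                       # whether the content is biased
    "Orientation": "Against",              # orientation of the bias
}

with open("analysis.json", "w") as f:
    json.dump(result, f, indent=2)
```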
Week 7

Week 8

Week 9

Week 10

Week 11

Week 12
