Case study: Read your audience
Computer Vision and AI — an opportunity to make online presentations more intuitive.
Like most people, I enjoy getting instant feedback when presenting to an audience. So presenting to a screen for the past year has been challenging: it is less engaging, and it is harder to read the audience's emotions.
It made me wonder whether there is a way to make presenting online more engaging, and how I could read people's reactions to my presentation better while being stuck behind my screen.
As a result, I spent the last few weeks experimenting with how I could make emotions visible to myself, so that I could feel the same way I did when demoing my work in front of real people.
The problem
My initial thought was that because the audience was not visible to me, I couldn't empathize with them as well as I would in a real-life setting. But then I realized we need to communicate in a different way. The challenge, therefore, lies in getting the audience to participate more during a demo.
Focus and frame of the project
Online presentations are the new norm, and it seems we must investigate and enhance the experience so that working from home or on a workcation does not become monotonous or leave people frustrated over time by a lack of personal feedback. Cameras have already demonstrated tremendous potential as general-purpose sensors in various applications.
Why not use our webcam, with the help of computer vision and AI, to generate emotional feedback?
Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos.
With this technology, I truly believe that video conferencing can become more exciting and engaging. To keep things simple at first, the project will concentrate on two emotional expressions: happiness and disapproval.
Why happiness and disapproval
It seems that happiness and disapproval are opposing emotions that commonly occur during a presentation. People are happy when things are going well or when they enjoy what they see, and knowing that people enjoy our presentation gives us a sense of happiness. Disapproval is the lack of approval, or the belief that someone or something is terrible or incorrect. Therefore, experiencing disapproval from our audience automatically makes us feel less confident.
Focus group
Listeners and presenters have different needs when using a video conferencing tool. Concentrating on the listeners and on how computer vision might help them contribute more effectively could benefit the presenter as well. The study will therefore focus on the listeners rather than the presenter. My research will keep a wide range of listeners in mind and will therefore not be based only on a focus group of white males. As a foreigner, I have first-hand experience of how emotions can be expressed differently across cultures, and this aspect of inclusion is something I want to keep in mind going forward.
Theory on what emotions are
When people think about emotions, they often think about feelings at the same time, but emotions and feelings are different things. Feelings are subjective emotional experiences driven by conscious thoughts and reflections. Emotions, on the other hand, describe physiological conditions and are generated subconsciously. There is a set of basic emotions whose exact definition is still widely debated. The following is a definition of emotions by Robert Plutchik.
Plutchik’s Wheel of Emotions
Robert Plutchik, a psychologist, organized emotions into a wheel called “Plutchik’s Wheel of Emotions,” which contains eight primary emotions.
- Anger: opposite of calmness
- Sadness: the feeling of loss
- Fear: the feeling of being afraid, frightened, or scared
- Joy: feeling happy
- Disgust: the feeling of aversion or disapproval
- Surprise: being unprepared for something
- Trust: the confidence placed in someone or something
- Anticipation: looking forward positively to something
These emotions can then be combined to create other emotions; for example, surprise + sadness = disapproval.
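To make the idea of combining emotions a bit more concrete, here is a purely illustrative sketch that averages two basic-emotion scores (values between 0 and 1, the kind an expression classifier typically outputs) into a rough score for the combined emotion. The averaging rule is my own simplification, not something defined by Plutchik's model.

```javascript
// Purely illustrative: approximate a combined ("dyad") emotion from two
// basic-emotion scores between 0 and 1. The simple average is an assumption,
// not a rule from Plutchik's model.
function dyadScore(scoreA, scoreB) {
  return (scoreA + scoreB) / 2;
}

// Example: strong surprise (0.8) plus moderate sadness (0.6)
// gives a rough disapproval score of 0.7.
console.log(dyadScore(0.8, 0.6)); // 0.7
```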
Ideation
An affinity diagram was created during the ideation phase to organize the findings. What fascinated me most during my research was the idea of “making recommendations based on emotions”, so I decided to explore it more deeply.
The Prototype
For testing purposes, a high-fidelity prototype was built in HTML and JavaScript, together with a JavaScript library called face-api.js. face-api.js is a basic face tracking toolkit that can recognize basic emotions in a video or image, and it uses the tensorflow.js core to do so.
While the prototype is running, the laptop's built-in camera tracks the emotions of the person who started the prototype; this person is one of two listeners in this simulated video conference. To make the scene as realistic as possible, another person gives a PowerPoint presentation about ocean fish.
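For reference, the sketch below shows how such expression tracking can be wired up with face-api.js. It assumes the model files are served from a local /models directory and uses an illustrative element id; it is a minimal sketch, not the prototype's exact code.

```javascript
// Minimal face-api.js expression tracking sketch.
// Assumptions: the face-api.js models are served from /models and the page
// contains a <video id="listener-video"> element.
import * as faceapi from 'face-api.js';

async function startExpressionTracking(videoEl, onExpressions) {
  // Load a lightweight face detector plus the expression classifier.
  await faceapi.nets.tinyFaceDetector.loadFromUri('/models');
  await faceapi.nets.faceExpressionNet.loadFromUri('/models');

  // Use the laptop's built-in camera as the video source.
  videoEl.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
  await videoEl.play();

  // Analyze the current frame a couple of times per second.
  setInterval(async () => {
    const result = await faceapi
      .detectSingleFace(videoEl, new faceapi.TinyFaceDetectorOptions())
      .withFaceExpressions();
    if (result) {
      // result.expressions holds probabilities for neutral, happy, sad,
      // angry, fearful, disgusted and surprised.
      onExpressions(result.expressions);
    }
  }, 500);
}

// Usage: log the listener's currently dominant expression.
startExpressionTracking(document.getElementById('listener-video'), (expressions) => {
  const [name, score] = Object.entries(expressions).sort((a, b) => b[1] - a[1])[0];
  console.log('Dominant expression:', name, score.toFixed(2));
});
```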
Testing
The testing was carried out with three participants and was followed by an interview. While three participants provide only a limited level of insight, additional testing is ongoing.
Feedback on happiness
If a listener shows happiness for a brief period of time, the prototype reacts with an overlay of suggested reactions, for example clapping hands or a smile, which the listener can pick to convey their sentiment to the presenter.
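One simple way to implement such a trigger is to require the “happy” score to stay above a threshold for a short, sustained period before showing the overlay. The sketch below illustrates this idea; the threshold, the hold duration, and the showReactionOverlay helper are my own placeholders, not values taken from the prototype.

```javascript
// Hypothetical trigger: show the recommendation overlay only when happiness
// persists for a short while. Threshold and duration are illustrative.
const HAPPY_THRESHOLD = 0.7; // minimum "happy" probability
const HOLD_MS = 2000;        // how long happiness must persist

let happySince = null;
let overlayShown = false;

function trackHappiness(expressions) {
  const now = Date.now();
  if (expressions.happy >= HAPPY_THRESHOLD) {
    happySince = happySince ?? now;
    if (!overlayShown && now - happySince >= HOLD_MS) {
      overlayShown = true; // show the overlay only once in this sketch
      showReactionOverlay(['👏', '🙂']);
    }
  } else {
    happySince = null;     // reset the timer when happiness drops
  }
}

function showReactionOverlay(reactions) {
  // In the prototype this renders clickable reaction buttons;
  // here a log statement stands in for the UI.
  console.log('Pick a reaction to send to the presenter:', reactions.join(' '));
}
```

Here, trackHappiness would be passed as the callback of the expression-tracking loop sketched earlier.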
Expectations for the testing
- The listener's facial expressions would be recognized.
- The participant would therefore be encouraged to leave feedback.
Testing results
- The sliding-in overlay was perceived as a surprise. The participants recommended more modest feedback that does not obscure the presentation screen.
- The selection of emotions was not well understood. Suggestions for better emojis included a thumbs-up, a short “great work” text, and something that expresses a “wow!”.
Improvements
As an enhancement, I included a bar at the bottom of the shared screen that functions like a toast (snack bar) and contains buttons for sending an emoji to the presenter. The emoji is shown for a brief moment at the bottom of the shared screen for all video conference participants. To the list of sendable emotional feedback, I added thumbs-up and star-eyes emoji buttons.
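The sketch below outlines this feedback bar. How the emoji actually reaches the other participants is not shown here, so broadcastEmoji is only a placeholder, and the container element id and class name are illustrative.

```javascript
// Sketch of the improved feedback bar: a toast-style strip with emoji buttons.
// broadcastEmoji is a placeholder; the real transport to the other
// participants is not part of this sketch.
function broadcastEmoji(emoji) {
  console.log('Show briefly for all participants:', emoji);
}

function renderFeedbackBar(containerEl) {
  const bar = document.createElement('div');
  bar.className = 'feedback-toast'; // styled like a snack bar
  for (const emoji of ['👍', '🤩', '👏']) { // thumbs-up, star-eyes, clapping hands
    const button = document.createElement('button');
    button.textContent = emoji;
    button.onclick = () => broadcastEmoji(emoji);
    bar.appendChild(button);
  }
  containerEl.appendChild(bar);
}

// Usage: attach the bar to the element wrapping the shared screen
// (the id "shared-screen" is an assumption).
renderFeedbackBar(document.getElementById('shared-screen'));
```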
Feedback on disapproval
As previously mentioned, disapproval is a combination of two basic emotions; people experience it when they lack approval or believe that someone or something is terrible or wrong.
I realized that if something suddenly becomes unclear during a presentation, it would be extremely beneficial to either ask a question there and then or make a note to discuss it later. The ability to save a question and pick it up at the end of the presentation has proven incredibly useful; in other video conferencing tools, by comparison, you have to scroll back through the group chat to find your original comment. The presenter's speech is tracked with the help of AI; any signs of disapproval are recognized and converted into a series of questions. These questions can then be forwarded to the presenter in the group chat, once again enhancing the interaction between the audience and the presenter.
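The sketch below shows one way such a flow could be wired together, reusing the surprise + sadness dyad from above for the disapproval score and the browser's Web Speech API (available only in some browsers) for the transcript. The threshold and the question template are my own placeholders rather than the prototype's actual logic.

```javascript
// Hypothetical disapproval flow: when the listener's expressions suggest
// disapproval, wrap the presenter's most recent words into a draft question.
// Requires a browser that supports the Web Speech API.
const SpeechRecognitionImpl = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognitionImpl();
recognition.continuous = true;
recognition.interimResults = false;

let lastTranscript = '';
recognition.onresult = (event) => {
  // Keep only the most recent chunk of the presenter's speech.
  lastTranscript = event.results[event.results.length - 1][0].transcript.trim();
};
recognition.start();

const pendingQuestions = [];

// Called with the expression scores from the tracking loop shown earlier.
function trackDisapproval(expressions) {
  const disapproval = (expressions.surprised + expressions.sad) / 2; // dyad average
  if (disapproval > 0.6 && lastTranscript) { // threshold is illustrative
    pendingQuestions.push(`Could you clarify: "${lastTranscript}"?`);
    lastTranscript = ''; // avoid generating duplicate questions
  }
}
```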
Expectations for the testing
- The listeners would express disapproval through their facial expressions.
- The listeners would send the generated question to the presenter.
Test results
- It would be helpful to save the question to ask after the presentation.
- It would help to send the questions automatically after the presentation.
- “Send in chat” was perceived as helpful.
- Questions could be collected in a sidebar so that the listener can ask them in person after the presentation.
- The sliding-in overlay was again perceived as surprising.
Improvements
As an upgrade, I included a bar at the bottom of the shared screen that functions like a toast (snack bar) and contains the generated question together with two actions. The first action sends the question directly, and the second keeps the question for later, allowing all collected questions to be sent at the end of the presentation.
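A minimal sketch of this question toast is shown below; the markup, the class name, and the sendToChat helper are illustrative placeholders, not the prototype's actual implementation.

```javascript
// Sketch of the question toast: each generated question is shown with two
// actions, "Send in chat" and "Ask later". sendToChat is a placeholder.
const savedQuestions = [];

function showQuestionToast(question, containerEl) {
  const toast = document.createElement('div');
  toast.className = 'question-toast';
  toast.textContent = question;

  const sendNow = document.createElement('button');
  sendNow.textContent = 'Send in chat';
  sendNow.onclick = () => { sendToChat(question); toast.remove(); };

  const askLater = document.createElement('button');
  askLater.textContent = 'Ask later';
  askLater.onclick = () => { savedQuestions.push(question); toast.remove(); };

  toast.append(sendNow, askLater);
  containerEl.appendChild(toast);
}

// At the end of the presentation, all kept questions can be sent in one go.
function sendSavedQuestions() {
  savedQuestions.forEach(sendToChat);
  savedQuestions.length = 0;
}

function sendToChat(message) {
  console.log('To the group chat:', message); // placeholder for the real chat
}
```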
Learnings
- Using computer vision to react to emotions can be highly beneficial for both the presenter and the audience.
- However, the overall interaction needs to be subtle and must not dominate the presentation.
- Subconscious interaction through emotions can enhance video calls, give the presenter more confidence, and make the audience feel included.
- Interaction based on emotions can be used to educate people about various functionalities.
Limitations
- Emotional tracking could be used to urge users to participate more as the program initiates a dialog with the users through provided interactions.
- Because facial expressions can be faked, emotion tracking with a single sensor has only medium-to-low accuracy; multiple sensors would be needed for better accuracy.
Next steps
- I believe it is very important for a designer to create inclusive technology. For this reason, I ask myself: “Does this technology work equally well for different genders and skin colors?” This is something I am currently putting a lot of energy into.
- As a foreigner, I have first-hand experience of how different cultures express emotions, so I will research precisely what the differences are and how to create an equally good experience across all cultures.
Thank you for taking the time to read! Please clap and subscribe if you enjoyed it. It would mean a lot to me and encourage me to write more stories like this.
I would appreciate it if you would leave a comment to share your thoughts and experiences with the community and me.
If you want to support me, please buy me a Coffee ☕ 👉 https://www.buymeacoffee.com/flowr

