Sandeep M D
Published in hubbleconnected
May 28, 2018 · 6 min read


Emotionally Intelligent Smart Environments

Photo by Ramón Salinero on Unsplash

“My father tried to teach me human emotions. They are… difficult”

— Sonny (I, Robot)

Introduction

Imagine walking into your home after a tiring work day and finding everything pre-set to your tastes and likings: lights dimmed to relax you, a hot teapot ready to serve, room temperature set to soothe, a bath drawn with pre-heated water and psychedelic music playing from your usual playlist…

Imagine getting into your car and finding everything pre-set to your usual preferences: the seat adjusted to ease your drive, the AC set to your usual level or to match the temperature outside, the dashboard presenting a map to your destination based on your usual commute or an appointment in your calendar, the car auto-checking its health and alerting you to pending service appointments…

Autonomous, self-sustaining environments that make their own decisions are an inevitable part of the future. Every object talking to every other object within a system, ensuring that the objective of providing the best experience to the user in context is consistently achieved, is no longer a distant reality. The most crucial part of this process, though, is achieving seamless interactions: interactions that do not depend on constant intervention from humans, and that make sense under a well-established, consistent language of communication between machine and human.

Machine-human interaction has been a fascinating area of technological advancement, opening several pathways for machines to better their understanding of human behavior. As we advance further into this area, we will push more and more towards personalizing interaction in favor of human comfort and ease. The key issues, though, will be how far a human will voluntarily interact with a machine, and how far a machine can understand a human without any natural interactions. In the latter case, assessing human conduct purely by observation is a mammoth task for a machine; it is almost mimicking another human being, and rightly so, in the direction of bettering its process of assessment. It is a process of constant learning and deduction, not easy but certainly not impossible. Humans have a natural tendency to trust those who understand them, and machines will be no different in establishing this trust to improve their communication with humans. Although there are several factors to consider in building a relationship between machine and human, this article looks purely at recognizing emotions, understanding natural interactions and establishing the context of a scene, and at how these factors will affect the efficiency of a smart environment.

Emotion Recognition

On a very basic level, human beings have six distinct types of emotions: anger, disgust, fear, happiness, sadness and surprise. More complex emotions can be derived from these basic building blocks. The best way to assess these emotions from the machine’s point of view is to look at which facial or verbal expressions can be consistently mapped to which emotion, and how to code that mapping for reference. This may be a broad set of data to reference, but it provides the fundamental blocks to analyze emotion with a certain probability of accuracy.
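As a toy illustration of that mapping (the scores below are made up and do not come from any particular model), a per-frame assessment can be treated as a distribution over the six basic emotions:

```python
# Six basic emotions; the per-frame scores below are purely illustrative.
BASIC_EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def most_likely_emotion(scores: dict) -> tuple:
    """Normalize raw per-frame scores into probabilities and return the top emotion."""
    total = sum(scores.get(e, 0.0) for e in BASIC_EMOTIONS)
    probs = {e: scores.get(e, 0.0) / total for e in BASIC_EMOTIONS}
    top = max(probs, key=probs.get)
    return top, probs[top]

frame_scores = {"happiness": 4.2, "surprise": 1.1, "sadness": 0.3,
                "anger": 0.2, "disgust": 0.1, "fear": 0.1}
print(most_likely_emotion(frame_scores))  # roughly ('happiness', 0.7)
```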

Hubble HUGO Demo

The Facial Action Coding System (FACS), developed by Carl-Herman Hjortsjö and adopted by Paul Ekman, has been used effectively to develop such a reference system and has featured in emotion research since as early as the 1980s. The system has been constantly updated by multiple researchers across multiple application domains and has remained a standard for categorizing the physical expression of emotions.
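For a flavor of how FACS-style coding maps to emotions, here is a small sketch using commonly cited Action Unit (AU) prototypes for the six basic emotions; the exact AU combinations differ between sources, so treat these sets as illustrative rather than definitive:

```python
# Commonly cited FACS Action Unit (AU) prototypes for the six basic emotions
# (EMFACS-style combinations; exact AU sets vary between sources).
AU_PROTOTYPES = {
    "happiness": {6, 12},                 # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},              # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},           # brow raisers + upper lid raiser + jaw drop
    "fear":      {1, 2, 4, 5, 7, 20, 26},
    "anger":     {4, 5, 7, 23},           # brow lowerer + lid raiser/tightener + lip tightener
    "disgust":   {9, 15, 16},             # nose wrinkler + lip corner depressor + lower lip depressor
}

def match_emotions(detected_aus: set) -> dict:
    """Score each basic emotion by the fraction of its prototype AUs that were detected."""
    return {emotion: len(aus & detected_aus) / len(aus)
            for emotion, aus in AU_PROTOTYPES.items()}

print(match_emotions({6, 12, 25}))  # happiness scores 1.0, everything else lower
```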

There are several companies that have published solutions on emotion recognition, but few have presented sufficient data to prove that their methodology provides adequate detection. We recently tried out an approach internally for an emotion-recognition detection engine, and the results were quite deterministic. The engine provides more than 30 assessment parameters for each image frame and plots scores to determine which emotions were diagnosed more prominently; collectively, the scores point out which emotions to pick from the whole set for the analyzed video. The engine was tried against multiple videos and the results were consistent across the scores of the individual emotion parameters assessed. It uses deep learning to train models for emotion detection with varying criteria around the datasets we have [will provide additional data in a separate overview].
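The engine’s actual parameters are not detailed here, but the aggregation idea can be sketched roughly as follows: each frame produces a set of emotion scores, and the per-video result is the list of emotions whose average score crosses a threshold (the scores and the threshold below are placeholders, not the engine’s real values):

```python
from collections import defaultdict

def prominent_emotions(frame_scores: list, threshold: float = 0.5) -> list:
    """Given per-frame emotion score dicts, return emotions whose mean score exceeds the threshold."""
    totals, frame_count = defaultdict(float), 0
    for scores in frame_scores:
        frame_count += 1
        for emotion, score in scores.items():
            totals[emotion] += score
    means = {e: s / frame_count for e, s in totals.items()}
    return sorted((e for e, m in means.items() if m >= threshold),
                  key=lambda e: means[e], reverse=True)

video = [{"happiness": 0.8, "surprise": 0.4},
         {"happiness": 0.7, "surprise": 0.6},
         {"happiness": 0.9, "surprise": 0.2}]
print(prominent_emotions(video))  # ['happiness']
```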

Hubble HUGO Demo

Multiple companies are jumping at the opportunity to deliver something concrete in this area, as they see a key differentiator in bettering the user experience through this skill.

Determining emotions correctly will go a long way toward helping make the right decisions with respect to the context in question. An important skill for a machine to have in order to engage on the next level!

Interaction Recognition

This part of the problem is relatively complex, and the types of data to be assessed vary vastly. The approach most suited to solving it is deep learning, where we prepare training models around a specific set of interactions that can be recognized very clearly from visual content. Designating these activities and varying the datasets may give us the opportunity to recognize the interactions more accurately. For example, “Closing a window”, “Cutting a cake”, “Riding a bike”, “Lifting a box”, “Exchanging gifts”, “Waving goodbye” and “Handshake” may each have different varieties of visual information available, but they can be distinguished clearly to a certain extent.
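As a rough sketch of this kind of setup (the interaction labels come from the examples above, while the choice of a Kinetics-pretrained torchvision backbone is an assumption for illustration, not our actual model):

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

# Explicitly designated interactions we want to distinguish (from the examples above).
INTERACTIONS = ["closing_window", "cutting_cake", "riding_bike",
                "lifting_box", "exchanging_gifts", "waving_goodbye", "handshake"]

# A 3D-ResNet backbone pretrained on Kinetics-400, with its final layer replaced
# by a head over our small interaction vocabulary; the model is then fine-tuned
# on clips labelled with those interactions.
model = r3d_18(weights="KINETICS400_V1")
model.fc = nn.Linear(model.fc.in_features, len(INTERACTIONS))

# A clip is a batch of (channels, frames, height, width) tensors.
clip = torch.randn(1, 3, 16, 112, 112)
logits = model(clip)
predicted = INTERACTIONS[logits.argmax(dim=1).item()]
print(predicted)
```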

Photo by rawpixel on Unsplash

We are currently in the process of coming up with a suitable approach to design this component, and we are sufficiently convinced that in a given context (indoor/home monitoring, in Hubble’s case) certain human-human interactions can be recognized by machines with appropriate training. Though this effort is in its early stages, with a working prototype, we are in the process of establishing it as the base for a derivative offering called “Key Moments”, generated from the visual media uploaded from the user’s cameras.

The core criterion is to rank frames that contain “recognizable & relevant” human activity, animal/pet activity or object-based activity, and then choose the top-ranked frames to present as “Key Moments” for a specific time period.
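A minimal sketch of that ranking step might look like the following; the detection categories, relevance weights and frame scores are hypothetical placeholders for whatever detectors are actually run, and the weights could later be adjusted per user from the feedback described below:

```python
# Hypothetical relevance weights per activity category; real weights would come
# from the detectors in use and could be tuned per user via feedback.
RELEVANCE = {"human_activity": 1.0, "pet_activity": 0.8, "object_activity": 0.5}

def key_moments(frames: list, top_k: int = 5) -> list:
    """Rank frames by weighted detection confidence and return the top-k as key moments."""
    def score(frame):
        return sum(RELEVANCE.get(category, 0.0) * confidence
                   for category, confidence in frame["detections"].items())
    return sorted(frames, key=score, reverse=True)[:top_k]

frames = [
    {"timestamp": "10:01:12", "detections": {"human_activity": 0.9}},
    {"timestamp": "10:04:37", "detections": {"pet_activity": 0.7}},
    {"timestamp": "10:09:02", "detections": {"object_activity": 0.4}},
]
for frame in key_moments(frames, top_k=2):
    print(frame["timestamp"])  # prints the two highest-ranked timestamps
```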

It is difficult to determine what qualifies as “key” for individual users, but based on a feedback system from the consumer, a learning system can determine what content to serve (or avoid serving) to a specific customer, and essentially keep them happy with curated content!

Scene Recognition

Emotion recognition together with interaction recognition forms the basis for recognizing scenes. The capability to recognize scenes accurately will assist in developing many interesting follow-up use cases (as mentioned at the start of the article) that can collectively raise the quality of a smart environment extensively.

Scene recognition in its totality will guide the decision making of multiple machines within a system: they arrive at a consensus and decide that, at this moment, they need to perform Action1, Action2 … ActionN in order to provide a particular type of experience. In accident-, error- or damage-prone surroundings, such a system will play a crucial role in saving lives, declaring warnings and possibly predicting issues quite early, thereby improving the overall quality of living. For an entertainment-driven society, the constant delivery of likable content through real-time assessment is a major improvement. For an impatient world, on-demand decisions based on such recognition will be hugely conclusive and imperative, inviting quick actions of progress.
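As a toy sketch of that consensus step, a recognized scene (an emotion plus an interaction) could be mapped to a coordinated set of device actions; the scenes, devices and actions here are invented purely for illustration:

```python
# Illustrative mapping from a recognized scene to coordinated device actions.
SCENE_ACTIONS = {
    ("tired", "arriving_home"): ["dim_lights", "heat_teapot", "start_bath", "play_playlist"],
    ("distressed", "fall_detected"): ["raise_alarm", "notify_emergency_contact"],
}

def decide_actions(emotion: str, interaction: str) -> list:
    """Return the actions the devices in the environment agree to perform for this scene."""
    return SCENE_ACTIONS.get((emotion, interaction), [])

print(decide_actions("tired", "arriving_home"))
```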

Smart homes, smart farms, automated factories, autonomous automobiles and the like all fall under such use cases, and the best way to improve their accuracy is to develop a framework that understands the context in its completeness, with audio-visual capability rich in recognition specialties.

Summary

Smart environments will stand firmly on strong communication between the machines in the environment, humans included: a collective consciousness that drives every human and machine towards the shared objective of achieving a certain goal. Reading emotions and understanding interactions in these environments will be the key to pushing it further forward. A world where we no longer have to depend extensively on buttons, switches and clickables to make machines work!

References

https://en.wikipedia.org/wiki/Facial_Action_Coding_System
