Attendance or Attention: What should be monitored in online meetings?

Pragmakar
Published in ACM CSCW
8 min read · Oct 13, 2022

This post is a summary of our paper “Bifurcating Cognitive Attention from Visual Concentration: Utilizing Cooperative Audiovisual Sensing for Demarcating Inattentive Online Meeting Participants” by Pragma Kar, Samiran Chattopadhyay, and Sandip Chakraborty, which has been accepted to CSCW ’22.

Public Image on Unsplash

The ABCs of attention.

“Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought.” - William James, The Principles of Psychology.

Attention can be understood as the process of focusing on, or being conscious of, a particular piece of information out of the pool of information dispersed in the environment at any given instant. Attention thus involves a person’s working memory, and it is facilitated by top-down sensitivity control or bottom-up salience filters. Among the many other definitions and studies of attention, Posner analyzed it from the angles of alertness, selectivity of information, and a person’s ability to process information.

Why is attention estimation essential in online meetings?

A participant’s engagement in an online meeting improves its quality and helps the organizer plan efficiently. For online classrooms, attention ensures that a course is not just delivered but also absorbed by the learners. Because meetings are easy to join and distractions such as reading a storybook or an online article are common, online meetings often see high attendance in terms of presence, yet low attendance in terms of attention. Attention estimation is essential in such scenarios so that the speaker or the organizer can take the necessary actions, but it is difficult because the participants are only partially visible.

Being one of the most well-researched topics in Psychology, attention has been classified into several sub-types and studied discretely. These sub-types lie at an intersection of Psychology and Technology.

  • Sustained attention: This type of attention indicates whether a person can stay focused on a piece of information or a task over a period of time. Since attention can be affected by factors like mind-wandering, required effort, motivation, task difficulty, and visuals, sustained attention accounts for time rather than an instantaneous assessment of focus.
  • Alternating attention: In this type of attention, a person switches their focus between two or more types of information.
  • Divided attention: In this type of attention, a person processes multiple pieces of information and performs two or more tasks simultaneously. While the existence of this type of attention has been experimentally validated, the argument that it is only a rapid case of alternating attention still persists. This type of attention is also commonly termed multitasking.

In the field of Human-Computer Interaction (HCI), several works have aimed at automating attention estimation. There are systems like OneMind, Guru, and Gestatten, as well as datasets like MEBAL, that serve this purpose.

Which type of attention is relevant to online meetings?

In online meetings, a participant’s sustained attention level needs to be estimated. However, this can be done only by continuously monitoring the participant’s instantaneous attention level. Since multitasking can both promote and reduce sustained attention, the instances and types of divided attention need to be detected too. While the existing solutions efficiently capture the attention levels of participants through physiological signals (intrusively) or visual patterns (ubiquitously), they are more suitable for MOOCs, where the online videos are pre-recorded. For live sessions like online meetings, we cannot rely on gaze patterns alone. Hence, in this work, we use features like expressions, vocal emotions and intent, mouth movement, and ambient light reflection to capture the attention level, as well as the multitasking instances and types, of individual participants in online meetings.
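The idea of deriving a sustained attention level from continuously monitored instantaneous levels can be illustrated with a simple running average. This is only a minimal sketch under our own assumptions: the function name `sustained_attention`, the binary 0/1 labels, and the window size are hypothetical and are not taken from the paper.

```python
from collections import deque


def sustained_attention(instant_scores, window=5):
    """Running mean over the last `window` instantaneous scores.

    Hypothetical smoothing for illustration; not the paper's method.
    """
    recent = deque(maxlen=window)  # keeps only the most recent scores
    smoothed = []
    for score in instant_scores:
        recent.append(score)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Assumed instantaneous labels: 1 = attentive, 0 = inattentive.
instant = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1]
print(sustained_attention(instant, window=4))
```

A single inattentive instant barely moves the running mean, whereas a prolonged lapse pulls it down, which is the intuition behind assessing attention over time rather than at one instant.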

Can visual engagement indicate attention?

Gaze, being one of the most widely accepted indicators of attention, has often been used to detect it. However, we first need to discuss the forms of visual attention and the events associated with them.

  • Overt attention: In this type of visual attention, a person physically shifts their gaze to the object of interest, so it is accompanied by visible movement of the eyes or head. This type of attention is comparatively easy to detect with image processing techniques.
  • Covert attention: In covert attention, even though the person is looking at a particular object, they might be attending to the surrounding objects as well. This is usually not indicated by eye movement and is often caused by a “mental” allocation of attention.

Why is gazing not enough in online meetings?

In online meetings, a person can be a passive attendee. They might simply be staring at the screen without “seeing”, i.e., they might be gazing blankly without paying attention. On the other hand, someone might not be looking at the screen but still be paying attention to the auditory information being discussed. In terms of Psychology, a participant might also be experiencing inattentional blindness, which will adversely affect the performance of a gaze-based technique. There can also be cases where the participant has opened a new tab and is reading an article completely irrelevant to the meeting. Hence, gaze is not a reliable indicator of attention in live online meetings. We therefore focus on the cognitive aspect of attention, which relates to the participant’s awareness of the meeting’s content.

Attention in online meetings — An analysis of the characteristics and impacts.

The general attentional characteristics of participants in online meetings were studied through a large-scale public survey and an annotation-based task. This section presents the key takeaways of this human study.

Lack of attention impairs the quality of the online meeting: the responses of 39.5% of the survey participants indicated this. This finding necessitates the development of a real-time attention estimation system.

Multitasking, positive or negative, is inevitable in online meetings: 83.2% of the participants agreed that they perform parallel tasks (relevant and/or irrelevant to the meeting) while attending online meetings. This finding calls for a system that not only detects the instances of multitasking but also classifies them as positive (relevant to the meeting) or negative (irrelevant to the meeting).

Visual focus does not always guarantee cognitive attention, and vice versa: the survey indicated the existence of multitasking instances with low visual attention but high cognition of the meeting context, high visual attention but average cognition caused by visual context switching, high visual attention but low cognition due to inattentional blindness, and so on. These findings motivated us to develop a system that does not depend on visual patterns but uses features like expressions to understand the cognitive involvement of the participants.

○ Multitasking does not always hinder attention — 91.4% of the participants supported this claim by answering that positive multitasking is always or sometimes necessary in online meetings.

○ Active participation is related to attention — 62.4% of the responses indicated that inattentiveness causes passive attendance.

○ Facial emotions vary over time in an online meeting — In the annotation task, the annotators independently agreed that the expressions of 91.6% of the meeting participants change over time.

○ Correlated emotions among participants indicate greater attention — The annotation task revealed that for attentive participants, the facial expressions mostly match at any given instance.

Mapping the expressions of attention and throwing light on multitasking with EmotiConf.

In light of the above findings, we developed EmotiConf, a system that ubiquitously detects the attention level of the participants. The context of the meeting is well reflected by the overall expression of the participants. For example, a humorous discussion is likely to generate an overall expression of “Happiness” among attentive participants. Similarly, a participant who is reading an irrelevant article is likely to show expressions that differ from the meeting’s context. EmotiConf thus uses facial expressions as a parameter of cognitive involvement, along with the active verbal participation of the participants.
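The intuition of comparing one participant’s expression with the overall expression of the others can be sketched as a simple majority check. This is an illustrative sketch only: the function `expression_agreement`, the participant IDs, and the expression labels are hypothetical, and the actual mapping used by EmotiConf is described in the paper, not here.

```python
from collections import Counter


def expression_agreement(expressions, participant):
    """Check whether a participant's expression matches the dominant
    expression among the other participants at one instant.

    `expressions` maps a participant ID to an expression label.
    Returns None when there is nobody to compare against.
    """
    others = [label for pid, label in expressions.items() if pid != participant]
    if not others:
        return None
    dominant, _count = Counter(others).most_common(1)[0]
    return expressions[participant] == dominant


# Hypothetical snapshot of one instant during a humorous discussion.
frame = {"A": "happy", "B": "happy", "C": "neutral", "D": "happy"}
for pid in frame:
    print(pid, expression_agreement(frame, pid))
```

In this snapshot only participant C deviates from the dominant expression. A single mismatch is weak evidence on its own, so in practice such agreement would be tracked over time.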

Empirically, we also observe that changing a tab (visual multitasking) causes a sudden and significant variation in the light reflected from the participants’ faces. EmotiConf thus uses light as an indicator of visual multitasking and further uses a rule-based approach to determine whether the parallel task is positively related to the meeting. The following figure shows the overview of EmotiConf.

The overview of EmotiConf. The system works in a real-time mode for attention detection by detecting the active speakers and mapping the facial expressions of the non-active participants with the rest. In the asynchronous mode, the system detects the instances of visual multitasking (opening of a new tab) by analyzing the change in the amount of light reflected from the user’s face. These instances are classified as positive or negative based on a behavioral model. Here, the system analyses the vocal emotion and speech intent (question / command / statement) of the active speaker during the detected instance, and maps it with the facial expression of the participant who performed the parallel task. The system also considers the behavior of the participant like their head movement and active verbal participation in the meeting, for classifying the multitasking instance.
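The light-based cue described above can be sketched as a search for sudden jumps in a per-frame brightness series. This is a rough sketch under our own assumptions: the brightness values are assumed to be the mean intensity of the face region per frame (0 to 255), and the function name and threshold are hypothetical rather than values from the paper.

```python
def detect_light_jumps(brightness, threshold=15.0):
    """Return the frame indices where brightness differs from the
    previous frame by more than `threshold`.

    Hypothetical detector for illustration; the threshold is an
    assumed value, not one reported for EmotiConf.
    """
    jumps = []
    for i in range(1, len(brightness)):
        if abs(brightness[i] - brightness[i - 1]) > threshold:
            jumps.append(i)
    return jumps


# Assumed series: a brighter tab is opened at frame 4 and closed at frame 8.
series = [100, 101, 99, 100, 140, 141, 139, 140, 102, 101]
print(detect_light_jumps(series))  # [4, 8]
```

Small frame-to-frame fluctuations stay below the threshold, while switching to a tab with a different overall brightness produces a distinct step in the series.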

Experiments and Findings.

To analyze the performance of EmotiConf, we considered 30 online meetings (online classes, general discussions and formal presentations) with 3–12 participants. For the ground truth generation, the participants were presented with Multiple Choice Questions based on the respective meeting discussions and the recorded videos were manually annotated by 4 independent annotators.

Overall, the system achieves an F1-score of 0.91, with a Precision of 0.89 and a Recall of 0.94, for attention estimation in different types of online meetings. In an improvement analysis, EmotiConf was found to perform better than gaze-based attention estimators. The following figure also illustrates EmotiConf’s potential in differentiating attentive and inattentive participants.

Sensitivity of EmotiConf in differentiating between attentive and inattentive participants
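As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall, so the reported value can be recomputed from the other two. The helper name `f1_score` below is our own.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for attention estimation.
print(round(f1_score(0.89, 0.94), 2))  # 0.91
```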

The detected multitasking instances were compared with the ground truth for the different meeting types, under normal and bright light, and for different face-to-device distances and screen sizes. It was observed that EmotiConf can detect the instances with significant accuracy. The following figure shows the effect of screen size on the multitask detection module.

Performance of EmotiConf in visual multitask detection under different screen sizes

In terms of visual multitask classification, the following figure reveals that although the rule-based approach may not classify all the multitasking instances, the positive and negative instances that it returns are mostly correct. This can largely be attributed to the challenge of detecting the vocal emotion of the speaker with considerable accuracy.

Performance of EmotiConf in visual multitask classification

In terms of usability, EmotiConf was widely accepted in an in-the-wild study with 96 participants, who gave the system an overall System Usability Scale (SUS) score of 80.5.

Attendance or Attention?

The extensive study of the different aspects of attention and its relation to online meetings, together with the development and thorough evaluation of an automated attention estimator like EmotiConf, leads to the conclusion that mere attendance in online meetings is not sufficient. What fulfills the objective of a meeting is the attention of those attending it. Thus, with the help of systems like EmotiConf, meeting organizers can focus more on the attention of the participants than on their (passive) attendance.

I am a researcher in the field of Ubiquitous Computing and HCI.