Ethical Concerns in Speech Emotion Recognition, Part 1: Privacy
This is the first article in my series on ethics in Speech Emotion Recognition.
I have been working on Speech Emotion Recognition (SER) for some time now. The technology is growing fast, and before you know it, it will be built into many automated speech recognition systems around us. These range from smart speakers to all those automated customer service lines we shout at to get connected to a live rep. However, using this technology raises many ethical concerns. They need to be addressed as soon as possible to prevent significant damage to users' privacy and the major lawsuits that problems created by SER could trigger.
Yeah, fun field to work in!
In the following weeks, I will try to highlight some of those concerns. But first, let's properly introduce SER and its use cases. The following is taken from one of my 2021 papers.
As I mentioned before, speech emotion recognition (SER) is the task of recognizing emotions from speech signals, and it is a significant step toward more advanced human-computer interaction. Understanding a speaker's feelings at the time of communication helps in comprehending the conversation and responding appropriately. This is mainly enabled by applying deep learning to features produced by signal processing techniques, and it has many use cases.
For instance, automatic SER helps smart speakers and virtual assistants understand their users better, especially when they encounter ambiguous words. The word "really," for example, can be used to question a fact or to emphasize a statement, positively or negatively. Try reading the following sentence in different tones: "I really liked having that tool." In addition, the same capability can help with translation from one language to another, especially since different languages project emotions through speech in different ways.
SER is also beneficial in online interactive tutorials and courses. Understanding the student's emotional state helps the machine decide how to present the rest of the course content. SER can also be instrumental in vehicle safety features: it can recognize the driver's state of mind and help prevent accidents and disasters. Another related application is in therapy sessions, where SER can help therapists better understand their patients' current state and possibly their underlying hidden emotions as well.
It has been shown that in stressful and noisy environments such as aircraft cockpits, applying SER can significantly improve the performance of automatic speech recognition systems. Furthermore, the service industry and e-commerce can use SER in call centers to alert customer service agents and supervisors early to the caller's state of mind. SER has also been proposed for interactive movies. There, it would gauge viewers' emotions so that the film can take different routes and reach different endings.
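The excerpt above says SER is mainly enabled by applying deep learning to signal processing. The minimal sketch below illustrates that two-stage shape, assuming a 16 kHz mono signal. The front end is a log-magnitude spectrogram. It feeds a stand-in for a trained network made of time pooling, one linear layer and a softmax. The label set `EMOTIONS`, the function names and the random weights are all hypothetical. A real SER system would use a trained deep model, usually on richer features such as mel spectrograms.

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set


def log_spectrogram(signal, frame_len=400, hop=160):
    """Frame a mono signal and return log-magnitude spectra of shape
    (frames, frame_len // 2 + 1). At an assumed 16 kHz sampling rate,
    400 and 160 samples are 25 ms windows with a 10 ms hop."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-8)  # small epsilon avoids log(0)


def predict_emotion(features, weights, bias):
    """Hypothetical stand-in for a trained network: average the frames over
    time, apply one linear layer, then a softmax over EMOTIONS."""
    pooled = features.mean(axis=0)
    logits = pooled @ weights + bias
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return EMOTIONS[int(np.argmax(probs))], probs


# Usage with synthetic audio and untrained (random) weights,
# so the predicted label itself is meaningless.
sr = 16_000
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 220 * t)  # one second of a 220 Hz tone standing in for speech
features = log_spectrogram(signal)
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.01, size=(features.shape[1], len(EMOTIONS)))
bias = np.zeros(len(EMOTIONS))
label, probs = predict_emotion(features, weights, bias)
print(features.shape, label, probs.round(3))
```

Pooling over time turns a variable-length utterance into a fixed-size vector. Real models often replace this with recurrent or attention layers that keep temporal detail.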
In all of these applications, there's a gigantic elephant in the room, eating popcorn and watching us: consumer privacy!
Privacy is one of the most significant ethical concerns regarding SER. The technology can be used to analyze an individual's emotions without their knowledge or consent. SER-capable systems can make decisions for you, and companies can offer or deny you services based on their evaluation of your emotions and your "state of mind." This goes well beyond facial or speaker recognition: such a system may decide that someone is cuckoo! This level of privacy invasion is unprecedented. And given the technology already available, SER might already be running in one of your beloved smart speakers.
Based on my research, there is not yet any comprehensive study of how to control and tame this beast. However, a few papers have built privacy into their recognition methods to achieve privacy-first recognition systems. Several other published works have fought the invasion by creating adversarial methods.
One example of these efforts to put privacy first in SER is the work of Vasileios Tsouvalas et al.
They propose a privacy-preserving approach to SER based on a distributed machine learning paradigm called Federated Learning (FL). Existing SER approaches are centralized and do not consider users' privacy. Federated Learning, by contrast, lets participants collaborate on training a model without sharing their local data, so user privacy is not compromised. The proposed approach unifies semi-supervised learning with federated learning. This addresses two significant challenges SER faces: the scarcity of labeled data and privacy regulations. Experiments on the IEMOCAP dataset show that, with as little as 10% labeled data, the approach improves the recognition rate by 8.67% on average compared with other fully supervised federated approaches. The authors describe it as the first federated SER approach that learns from both labeled and unlabeled samples on user devices. It also uses an attention mechanism to improve the representational power of SER models without increasing their complexity.
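The core federated idea can be sketched in a few lines. What follows is a minimal, hypothetical illustration of plain Federated Averaging (FedAvg) with a logistic-regression model on synthetic data. It is not Tsouvalas et al.'s method: it leaves out their semi-supervised training and attention mechanism. The feature dimension, client sizes and binary "arousal" label are assumptions. The point is only that clients send model weights to the server, never raw audio or features.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """Client side: a few epochs of full-batch gradient descent for logistic
    regression. Only the returned weights leave the device; X and y stay local."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (preds - y) / len(y)
        w -= lr * grad
    return w


def federated_averaging(clients, dim, rounds=10):
    """Server side (FedAvg): broadcast the global weights, collect local updates,
    and average them weighted by each client's number of samples."""
    global_w = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        updates = np.stack([local_update(global_w, X, y) for X, y in clients])
        global_w = np.average(updates, axis=0, weights=sizes)
    return global_w


# Usage: three simulated devices holding hypothetical 5-dimensional "acoustic
# features" with a binary label (e.g. high vs. low arousal).
rng = np.random.default_rng(42)
true_w = rng.normal(size=5)
clients = []
for n in (80, 120, 50):
    X = rng.normal(size=(n, 5))
    clients.append((X, (X @ true_w > 0).astype(float)))

global_w = federated_averaging(clients, dim=5)

# Pooling is done here only to score the simulation; a real deployment never does this.
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
accuracy = np.mean(((X_all @ global_w) > 0) == y_all)
print(f"accuracy on pooled data: {accuracy:.2f}")
```

Weighting the average by sample count means clients with more data pull the global model harder. This is the standard FedAvg choice.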
On the other side, as an example of adversarial techniques that protect users' privacy against SER systems, Brian Testa et al. present a way to evade black-box SER classifiers tied to smart speakers.
Their method uses genetic programming to generate non-invasive additive audio perturbations (AAPs) that preserve transcription accuracy while degrading SER classifier performance. They call it Defeating Acoustic Recognition of Emotion via Genetic Programming (DARE-GP). The system relies on spectral features, which allows the AAPs to transfer to previously unseen SER classifiers. The paper's evaluations culminate in acoustic tests against two off-the-shelf commercial smart speakers, where a single AAP could evade a black-box classifier more than 70% of the time. The authors conclude that DARE-GP outperforms state-of-the-art SER evasion techniques and is robust against defenses employed by a knowledgeable adversary.
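The evasion idea can be illustrated with a toy search. DARE-GP uses genetic programming. The sketch below swaps in a simpler genetic algorithm that evolves raw perturbation samples directly. It attacks a made-up "black-box" classifier (`ser_confidence`) that is only queried, never inspected. As in the paper, a single perturbation is shared across several clips. Everything else is hypothetical: the clip length, the classifier, and especially the energy penalty. That penalty is a crude stand-in for the paper's requirement of preserving transcription accuracy.

```python
import numpy as np

N = 256  # samples per toy clip (hypothetical)
_rng = np.random.default_rng(0)
CLASSIFIER_W = _rng.normal(size=N // 2 + 1)  # hidden weights of a stand-in SER model


def ser_confidence(audio):
    """Toy black-box SER classifier: probability of an 'emotional' class computed
    from spectral magnitudes. The attacker only queries this function."""
    spectrum = np.abs(np.fft.rfft(audio))
    return 1.0 / (1.0 + np.exp(-(spectrum @ CLASSIFIER_W) / N))


def fitness(clips, perturbation, penalty=5.0):
    """Higher is better: low mean SER confidence across all clips (one shared
    perturbation) and low perturbation energy, a crude proxy for being
    non-invasive and keeping speech intelligible for transcription."""
    evasion = -np.mean([ser_confidence(c + perturbation) for c in clips])
    return evasion - penalty * np.mean(perturbation ** 2)


def evolve(clips, pop_size=30, generations=40, max_amp=0.05, seed=1):
    """Simple genetic algorithm: keep the fittest half, refill the population
    with uniform crossover plus Gaussian mutation, and clip to max_amp."""
    rng = np.random.default_rng(seed)
    length = clips.shape[1]
    pop = rng.uniform(-max_amp, max_amp, size=(pop_size, length))
    n_keep = pop_size // 2
    for _ in range(generations):
        scores = np.array([fitness(clips, p) for p in pop])
        parents = pop[np.argsort(scores)[-n_keep:]]  # elitism: survivors pass unchanged
        pairs = rng.integers(0, n_keep, size=(pop_size - n_keep, 2))
        mask = rng.random((pop_size - n_keep, length)) < 0.5
        children = np.where(mask, parents[pairs[:, 0]], parents[pairs[:, 1]])
        children += rng.normal(scale=max_amp / 10, size=children.shape)  # mutation
        pop = np.clip(np.vstack([parents, children]), -max_amp, max_amp)
    scores = np.array([fitness(clips, p) for p in pop])
    return pop[np.argmax(scores)]


# Usage: a few synthetic "utterances" (sinusoids) share one perturbation.
t = np.arange(N)
clips = np.stack([0.5 * np.sin(2 * np.pi * k * t / N) for k in (5, 9, 13)])
aap = evolve(clips)
print("mean confidence before:", np.mean([ser_confidence(c) for c in clips]))
print("mean confidence after: ", np.mean([ser_confidence(c + aap) for c in clips]))
```

The fittest half survives each generation unchanged, so the best fitness never decreases from one generation to the next. How much the evolved perturbation actually lowers the confidence depends on the penalty weight and the amplitude bound.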
In the next mini-article, I will discuss transparency and consent, as well as bias and fairness, in the SER world. Stay tuned!