Ethical Concerns in Speech Emotion Recognition, Part 2: Consent, Transparency, Bias, and Fairness

Babak Abbaschian
Apr 12, 2023


This is the second article in the series on Ethics in Speech Emotion Recognition. You can find the other articles here:

In the last article, we talked about privacy in Speech Emotion Recognition. Now, we want to talk about “transparency and consent” and “fairness and bias.”

First, consent and transparency, very important legally!

It seems we’re going into law territory, so let’s speak legal for a quick second! I’m by no means a legal expert, and what you are reading in this article is a layman’s understanding of privacy laws based on a few GDPR and CCPA trainings I have taken.

Based on the personal data privacy laws enacted in the past few years, such as GDPR, CCPA, CPRA, VCDPA, and CPA, we have gained the right to own our data and decide how companies can use it.

This is a significant win for us as users. Based on those laws, companies must ensure that their privacy policy is written in simple language and presented in an accessible form. It must cover all aspects of their personal data processing activities and be easily accessible to everyone free of charge.

And their privacy policy has to articulate, clearly and in simple, straightforward language, all of the following:

1) Who processes the data: the name of every legal entity that accesses or processes your data, and their contact information.

2) What legal basis allows them to process our data? In other words, what gives this company the right to collect our data?

The legal basis companies prefer most is “your consent!” Before privacy laws, they would bury these consents somewhere deep in the privacy policy or end-user license agreement, and they would force you to accept their data collection and processing policies or refuse to provide any service to you.

Twelve years ago, when I bought an expensive Samsung TV, I had to accept their terms and conditions and privacy policy to use my TV’s smart features. Even when I connected a camera to the TV, it said I had to consent that they could record and send my video to Samsung servers at any time. Thankfully, the camera stopped working soon after. But that level of forced consent had pissed me off enough that I even considered returning the TV.

Now with GDPR, they cannot just say we want your data. They have to prove that it is for the performance of the service!

The following are five other legal grounds that are not as widely used as consent:

1. processing is necessary for the performance of the contract between the service provider and the data owner;

2. processing is necessary for compliance with the legal obligations of the service provider;

3. processing is necessary to protect the vital interests of the data owner or another natural person;

4. processing is necessary for the performance of a task carried out in the public interest or the exercise of official authority;

5. processing is necessary for the legitimate interests pursued by the service provider or a third party, except where such interests conflict with the interests or rights and freedoms of the data owner.

3) The purpose for collecting personal data: they can’t just say they are processing your data based on your consent. They have to spell out the reason and what they will achieve with it.

Like what Tesla does with all those cameras that record almost all the time, based on our consent, for vague, legally crafted reasons. One of those reasons, apparently, is the enjoyment of Tesla Inc. personnel, who watch the clips and send the funniest ones to their peers.

4) What types of personal data do they collect? Again, merely stating that they collect personal data is insufficient; they need to go into the details of each category.

5) How long will they store the data? They are obliged to inform users about the period of data storage or, if that is not possible, the criteria used to determine that period.

6) Whether they transfer the data internationally or not.

All the problems with TikTok arise from the fact that they haven’t yet proved/lobbied well enough that they won’t ship customer data to the government of China.

7) Whether they use the data in automated decision-making. For example, if they use automated decision-making in credit scoring or user profiling to provide services or products, they must disclose this.

8) Which third parties do they share the data with? They have to mention the recipients or categories of recipients of the personal data, a description of each third party, a link to each third party’s data retention, location, and storage policies, the purposes of data processing, and the legal basis under which they process data using a given tool. They also have to reference a signed data processing agreement with those third-party companies.

9) What are the data owner rights? They have to inform users about their rights and how to exercise them.

  • the right to be informed
  • the right of access
  • the right to rectification
  • the right to erasure
  • the right to restrict processing
  • the right to data portability
  • the right to object

10) And last but not least, how will they inform users about changes to the policy? They have to clearly note how they will announce changes and make sure that notification is available and visible to everyone using the service.

Why did I write this long story about data privacy laws?

Understanding that our data is not just texts, chats, emails, likes, and comments is essential. Our voice, video, and pictures are also our data. Likewise, anything derived, deduced, or inferred from our data is also ours, sometimes called metadata! Hence the name, of course: our data belongs to Meta; they have registered its trademark!

And among all these, the new kids on the block, speech and facial emotions, are also our data, perhaps more intimate than sitting on a Zoom call in a jacket and shorts. We want the world to know us as serious and angry people. We don’t want anybody to know that, despite corporate America’s efforts, we’re a bunch of happy people inside!

Here come transparency and consent, crucially important ethical considerations when using SER. We should be made aware if a service is analyzing, or will in the future analyze, our speech for emotion recognition purposes, and we should have the option to easily opt out if we don’t wish to participate.

There goes the legal jargon; let’s get back to more tech stuff!

Bias is another concern in SER. The algorithms used to analyze emotions may be biased by many factors. For example, excitement and anger both have high arousal content. Now, suppose the training data has many more angry samples than excited ones. In that case, the model may develop a bias toward anger and against excitement and classify excited samples as angry, or perhaps happy, rather than excited.

[figure source: https://www.researchgate.net/publication/335190986_Fast_Emotion_Recognition_Based_on_Single_Pulse_PPG_Signal_with_Convolutional_Neural_Network]

This type of bias that I described could be either “Sample Bias” or “Label Bias.” We have a few more biases, which all result in the same phenomenon. The model develops a bias against a group of samples and towards another.
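To make the idea concrete, here is a minimal sketch of one common way to soften that kind of sample bias: weighting the loss inversely to class frequency so the rare “excited” class isn’t drowned out by “angry.” The emotion labels, counts, and PyTorch setup below are made up for illustration; they don’t come from any particular SER corpus or from a specific paper.

```python
# Minimal sketch: counter class imbalance with inverse-frequency loss weights.
# All labels and counts below are hypothetical, for illustration only.
import torch
import torch.nn as nn

emotions = ["angry", "excited", "happy", "sad", "neutral"]
# Hypothetical training-set counts: "angry" over-represented, "excited" rare.
counts = torch.tensor([4000.0, 400.0, 1500.0, 1200.0, 2000.0])

# Inverse-frequency weights, normalized so they average to 1.
weights = counts.sum() / (len(counts) * counts)
criterion = nn.CrossEntropyLoss(weight=weights)

# Dummy batch: 8 samples, one logit per emotion class.
logits = torch.randn(8, len(emotions))
targets = torch.randint(0, len(emotions), (8,))
loss = criterion(logits, targets)
print(dict(zip(emotions, weights.tolist())), loss.item())
```

With this weighting, a misclassified “excited” sample costs the model roughly ten times more than a misclassified “angry” one, which nudges the decision boundary back toward the rare class.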

Think about a model trained with many cat and dog pictures but few bats, so it learns to classify bats as cats or maybe dogs. Nothing seems unethical here; more than an ethics problem, it is a low-accuracy model. Ethics are not generally involved with bias, because bias is a technical property that can be measured and corrected. However, maliciously training a model with bias against or toward a class of samples, and then using it, would definitely be unethical.

Generally, an important source of bias is the sample population. However, looking at SER and bias, for a short and sweet review, I can proudly tell you that most of the SER training datasets that I know and have worked with are balanced gender-wise. CR LF!!!

That is it!!!

They are not balanced from any other point of view.

What does this mean? A model trained on a New York accent dataset and tested on Kentucky samples will show that we are calm, while we may actually be angry. That’s because people in Kentucky don’t speak as fast, which means lower arousal, and there’s a musicality in their accent that adds to the valence content of the sample.

There you go! With respect to a protected attribute (ethnicity, or regional origin, in this case), the system trained on northeastern speakers is biased against southern speakers and rates them calmer.

But what about fairness? Fairness is very close to bias; in fact, fairness is where we can judge a system as ethical or not. So I changed my language a bit and spoke about the system instead of the model. Fairness is not a straightforward calculable value like bias: bias can be measured through the false positives and false negatives of groups of samples, but fairness is more complicated. Fairness is subjective to your policies and standards. For example, in a society that doesn’t value gender parity, a system that never suggests hiring women is acceptable, while in our society, something is very wrong with that system.
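As a quick illustration of how measurable bias is, here is a tiny sketch that compares false-positive and false-negative rates across two speaker groups for one target label (say, “angry”). The arrays and group names are toy values, not real SER results.

```python
# Minimal sketch: per-group false-positive and false-negative rates.
# y_true / y_pred hold binary "angry vs. not angry" labels; group marks
# each sample's speaker group. All values are toy data.
import numpy as np

y_true = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 0])
group  = np.array(["NE", "NE", "NE", "NE", "NE", "S", "S", "S", "S", "S"])

def group_error_rates(y_true, y_pred, group, g):
    """False-positive and false-negative rates for one group."""
    m = group == g
    t, p = y_true[m], y_pred[m]
    fpr = ((p == 1) & (t == 0)).sum() / max((t == 0).sum(), 1)
    fnr = ((p == 0) & (t == 1)).sum() / max((t == 1).sum(), 1)
    return fpr, fnr

for g in np.unique(group):
    fpr, fnr = group_error_rates(y_true, y_pred, group, g)
    print(f"group {g}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```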

There are various definitions for mathematically gauging fairness; I’ll just mention their names here, with a small code sketch of two of them after the list.

1. Equalized Odds

2. Equal Opportunity

3. Demographic Parity

4. Fairness Through Awareness

5. Fairness Through Unawareness

6. Treatment Equality

7. Test Fairness

8. Counterfactual Fairness

9. Fairness in Relational Domains

10. Conditional Statistical Parity
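As promised, here is a small sketch of what two of these definitions look like in code: demographic parity compares positive-prediction rates across groups, and equal opportunity compares true-positive rates. The inputs mirror the toy arrays from the bias example above and are purely illustrative.

```python
# Minimal sketch of two fairness definitions from the list above.
import numpy as np

def demographic_parity_gap(y_pred, group, g_a, g_b):
    """|P(pred=1 | group A) - P(pred=1 | group B)|"""
    rate = lambda g: y_pred[group == g].mean()
    return abs(rate(g_a) - rate(g_b))

def equal_opportunity_gap(y_true, y_pred, group, g_a, g_b):
    """|TPR_A - TPR_B|: recall difference on truly positive samples."""
    def tpr(g):
        m = (group == g) & (y_true == 1)
        return y_pred[m].mean() if m.any() else 0.0
    return abs(tpr(g_a) - tpr(g_b))

# Toy data: binary "angry vs. not angry" labels and two speaker groups.
y_true = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 0])
group  = np.array(["NE", "NE", "NE", "NE", "NE", "S", "S", "S", "S", "S"])

print("demographic parity gap:", demographic_parity_gap(y_pred, group, "NE", "S"))
print("equal opportunity gap:", equal_opportunity_gap(y_true, y_pred, group, "NE", "S"))
```

A gap of zero on either metric means the two groups are treated identically by that definition; which definition matters depends on your policies and standards, as noted above.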

To read more about fairness and bias, I’d suggest the following paper by Mehrabi et al. https://arxiv.org/abs/1908.09635

Now what does it mean for SER?

In the New York accent example, if the model is used to provide emergency services to distressed callers and I call from Kentucky, the system classifies me as “content,” and I won’t get the service, which might endanger my health. Not fair!

On the other hand, if a group of people culturally speak louder and more assertively, and the system is used in a job interview session, it might score them as candidates with anger issues, and they won’t get the job. Not fair!

Research by Gorrostieta et al. assesses the effect of gender bias in speech emotion recognition and finds that emotional activation model accuracy is consistently lower for female audio samples than for male ones. The paper then proposes a simple de-biasing training technique to achieve fairer and more consistent model accuracy.

The paper also discusses the various factors that can contribute to producing negative bias in machine learning models, including incomplete or skewed training data, biased human labeling, and modeling techniques. The paper concludes by highlighting the importance of mitigating negative bias in machine learning and the potential risks of perpetuating or amplifying bias contained in the label data.
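The paper’s exact de-biasing technique isn’t reproduced here, but to give a flavor of how such a fix can look in practice, below is a generic sketch of one simple family of approaches: reweighting training samples so the group with lower accuracy contributes more to the loss. The group labels and the weight value are illustrative assumptions, not values from the paper.

```python
# Generic sketch: group-wise sample reweighting during training.
# Not the technique from the cited paper; purely an illustration.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(reduction="none")  # keep per-sample losses

logits = torch.randn(6, 4)                  # 6 samples, 4 emotion classes
targets = torch.randint(0, 4, (6,))
is_female = torch.tensor([1, 0, 1, 1, 0, 0], dtype=torch.float32)

# Up-weight the group observed to have lower accuracy (factor chosen arbitrarily).
sample_w = torch.where(is_female.bool(), torch.tensor(1.5), torch.tensor(1.0))
loss = (criterion(logits, targets) * sample_w).mean()
print(loss.item())
```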

Another recent study by Chien et al. focuses on a particular bias in SER: rater, or labeling, bias.

https://www.researchgate.net/publication/368719577_Achieving_Fair_Speech_Emotion_Recognition_via_Perceptual_Fairness

SER has a particular problem with fairness because the people who label emotions have their own biases. Emotion is a perception that is closely tied to the labeler’s background and mood at the moment, and we need to do something about these biases to make SER work better and more fairly.

The authors propose a two-stage framework for SER that mitigates rater biases and improves fairness. In the first stage, the model produces debiased representations using an adversarial framework with a fairness constraint that reduces the distributional distance between different rater gender groups. In the second stage, after gender-wise perceptual learning, users can toggle between specified gender-wise perceptions on demand. The system is evaluated on two fairness metrics, statistical parity and consistency score, to show that the distributions and predictions across genders are fair. The paper argues that fair SER should also provide transparent outcomes, making users feel the system is fair and letting them toggle between results. The proposed model shows promising results, with a slight drop in recognition performance but improved fairness.
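To give a feel for the adversarial idea in that first stage, here is a minimal sketch using a gradient-reversal layer that discourages the learned representation from encoding rater gender. This is only a simplified illustration of the general technique, not the authors’ exact architecture or training recipe; the feature size, heads, and data below are made up.

```python
# Minimal sketch of adversarial debiasing via gradient reversal.
# Simplified illustration only; not the architecture from the cited paper.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient flowing back into the encoder.
        return -ctx.lam * grad_output, None

encoder = nn.Sequential(nn.Linear(40, 64), nn.ReLU())  # acoustic features -> embedding
emotion_head = nn.Linear(64, 4)                        # 4 emotion classes
gender_head = nn.Linear(64, 2)                         # adversary: rater gender

feats = torch.randn(8, 40)                  # dummy batch of acoustic features
emo_y = torch.randint(0, 4, (8,))
gen_y = torch.randint(0, 2, (8,))

z = encoder(feats)
emo_loss = nn.functional.cross_entropy(emotion_head(z), emo_y)
adv_loss = nn.functional.cross_entropy(gender_head(GradReverse.apply(z, 1.0)), gen_y)
(emo_loss + adv_loss).backward()  # encoder learns emotion, "unlearns" gender
print(emo_loss.item(), adv_loss.item())
```

The gradient reversal means the gender adversary gets better at its job while the encoder is pushed in the opposite direction, so the shared representation carries as little rater-gender information as possible.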

In the next article, we will discuss accuracy, data security, application control, responsibility, and accountability.

Stay tuned!
