Most of us have at one point felt anger toward an automated system. An incoherent chatbot. A GPS voice that misdirects. A self-checkout scale that registers the weight of your purse instead of a bunch of bananas. Generally, we expect that anger to go unnoticed by the machine. But technology is increasingly able to detect and categorize an angry look on your face, or recognize frustration in your voice, as you interact with an automated device or program.
One of the largest emotion AI companies, Affectiva, has long surveilled faces to categorize the reactions of users. Gabi Zijderveld, chief marketing officer at Affectiva, told me there are “seven emotions and 20 facial expressions,” which can be detected across multicultural and multigenerational populations. Now the company is working to recognize emotion in a person’s voice. It has just launched a cloud-based API designed to track changes in tone, volume, speed, and other vocal qualities. The company interprets this data as anger, laughter, arousal, and other emotional states. I spoke with Zijderveld about the new product and Affectiva’s other methods of emotion categorization.
Why is Affectiva using the term “emotion AI” instead of “affective computing”?
Gabi Zijderveld: We develop facial analysis to derive emotional expressions from the human face, and speech analysis to derive the vocal expressions of emotion. Affective computing is a much broader field of research. It encompasses biosensors and wearables and a host of other technologies that we don’t even touch on. There were a number of different descriptors floating around, including emotion recognition software, but essentially what we build is artificial emotional intelligence.
We’re surrounded by all these hyperconnected devices and advanced AI systems: supersmart technologies, but not at all emotion-aware. It makes a lot of our technology interactions very transactional and very superficial. What if we could introduce that notion of emotional intelligence in technology? It’s really where we see the future: the inevitable merger of this IQ and EQ in technology. Our technology is an emotion AI, an engine of artificial emotional intelligence.
How is Affectiva working to clean the data of bias? For example, if people of a specific age and race were systematically misread, with a neutral facial expression modeled as, say, anger, how would Affectiva correct this?
We take bias in data very seriously. People look different around the world: age, gender, ethnicity. But cultural norms also influence how you express emotions, depending on where you are in the world. For example, in collectivist cultures like Southeast Asia and Japan, people tend to dampen their expressions in group settings, whereas in more individualistic cultures, such as North America, people in group settings tend to be very expressive and manifest their individuality. So we combat bias in data all the time for our technology to work accurately in the real world.
Getting diverse data that reflects age, gender, ethnicity — that also has to work accurately regardless of lighting conditions and camera angles — is incredibly critical. We have analyzed over 5.6 million faces in 75 countries.
How you collect the data, what kind of data you collect, where you get it from, and how you label it is incredibly important. If we want our technology to be accurate in the real world, we have to avoid bias, because otherwise it won’t work. One of the ethical standpoints that many of us in the organization hold is that it’s just wrong to introduce algorithms with bias. It’s not ethical, and we have a lot of internal debate about this.
Do you have recommendations that you pass on to clients so they don’t use this data to stigmatize marginalized groups?
I don’t think we’ve come across a scenario where we had to do that. I will say that there are some scenarios or use cases where asking for opt-in from people becomes more difficult. So, for example, if this type of technology is used in retail stores, how do you then make clear to the shoppers that it’s there? We have one client that ensures there’s signage explaining that this technology is used, so people can turn around at that point if they don’t want to be on film. There are clients that have used our technology at events, maybe a concert or a sports arena. The back of the ticket stub will say, “Hey, if you buy this ticket and you enter this venue, you consent to being recorded.” We’ve asked some of our clients to consider expanding or altering some of their branding.
What if an educator offered their own interpretation of this data and determined that one emotion means “struggling,” which then creates a category that perhaps wasn’t intended on your end? Have cases like that come up at the company?
I think it’s too early for that. There are companies we’re talking to about using this in educational settings, but nothing that we have [yet] deployed. At the end of the day, we at Affectiva don’t build the education solutions. We’re basically an emotion AI engine. We power the emotion-detection capability. So, for it to be in a classroom, there would have to be a whole app or application built that at its core would use our technology.
Any solution built with that, one that interfaces with consumers, will need a tremendous amount of deliberation on data privacy concerns and on allowing people to opt in. I think one of the most important things to do there is [give] these people access to their own data. Education, especially when you start thinking about K to 12 involving data on children, is very complex. Even though we know this technology might have an impact in that field, it’s still early, and there’s no real solution in the market just yet.
Could you tell me more about the new product to detect emotion in speech? [Note: This interview was conducted before Affectiva’s announcement, and Zijderveld could only provide limited details.]
We’re not necessarily looking at and analyzing the words that are spoken — we’re not doing kind of the natural language processing or the text analysis — but we analyze how things are said tonally. In speech science, they refer to this as prosody. It’s the tone and strength of how things are said. You can derive joy, anger, frustration, but also even gender.
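As a rough illustration of the prosodic qualities Zijderveld describes, the sketch below (not Affectiva’s actual pipeline; the function names and thresholds are hypothetical) extracts two simple proxies from an audio frame with NumPy: RMS energy for volume and an autocorrelation-based pitch estimate for tone.

```python
# Toy prosodic feature extraction (illustrative only, NOT Affectiva's API):
# estimate volume (RMS energy) and pitch (autocorrelation peak) for one frame.
import numpy as np

def rms_energy(frame: np.ndarray) -> float:
    """Root-mean-square energy: a simple proxy for loudness."""
    return float(np.sqrt(np.mean(frame ** 2)))

def estimate_pitch(frame: np.ndarray, sample_rate: int,
                   fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Crude fundamental-frequency estimate: find the lag at which the
    signal best correlates with itself, within typical speech pitch range."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)        # shortest period considered
    lag_max = int(sample_rate / fmin)        # longest period considered
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max]))
    return sample_rate / lag

# Synthetic check: a 200 Hz sine wave sampled at 16 kHz.
sr = 16000
t = np.arange(sr // 10) / sr                 # 100 ms frame
frame = 0.5 * np.sin(2 * np.pi * 200 * t)
print(round(estimate_pitch(frame, sr)))      # 200
print(round(rms_energy(frame), 3))           # 0.354
```

A real system would track how features like these change over time and across utterances, then classify those trajectories into states such as joy, anger, or frustration.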
Do you have any other plans or goals for the company in the coming years?
Lots of ideas but nothing specific just yet on the roadmap. Of course, at the end of the day, we want to build what we refer to as a “multimodal emotion AI engine.” Humans express emotions in all these different ways — it’s facial expressions, it’s voice, but it’s also gesture, body language. It’s how words are spoken. We want to build a really robust emotion AI engine and build stronger partnerships — team up with other companies.