The Artificial Gaze (and Gait Interpretation) of Algorithms
It’s around Independence Day in the US, and I’m thinking about what we are gaining independence from. With machine learning, AI, and other assorted algorithms running our lives, we aren’t gaining independence from much; rather, we’re becoming more dependent.
This isn’t necessarily our fault (unless we are the people building and deploying these systems). We’ve slipped into these things as a way of coping with an onslaught of never-ending data that comes our way on a daily, and nightly, basis. We are constantly filtering emails, social media, texts, voicemails, and other messages, while simultaneously using apps to do a myriad of tasks that used to be done by others. For the most part, we are our own bankers; we fill out our own order forms and payments; we process our own data, trying to manage it all even as more keeps flowing our way. We’re drowning in data, and this is, quite frankly, more than overwhelming. Unfortunately, the more we use algorithm-based tools to help us with this onslaught, the more dependent on them we become, and that is where we may get ourselves into situations that are harmful either to us or to those we wish to cooperate with.
To help, technologists and researchers keep “innovating” ideas that use tech to solve human social problems. With any technology, each new tool creates new capabilities, and new outcomes, many of which cause more problems than they solve.
Apple has added a FaceTime feature in iOS 13’s third developer beta that “makes it look like you’re staring directly at your front-facing camera during FaceTime calls, even when looking away at the person on your screen.” This feature is expected to be released to the public in the next few weeks. It is likely intended to fix the problem of gaze. Because of the physical construction of mobile devices and computers, cameras cannot be embedded in the screen itself. Thus, when one is looking at the video of a conversation partner on the screen, the camera recording them is not located in the same viewing plane.
This “correction” from Apple is likely intended to adjust eye position so that a person appears to be looking at the other party on their screen, rather than at the camera above it, which currently makes people look like they are gazing down when they video chat. However, if both parties use this technology, the eye gaze on each side will be mediated by an algorithm. Thus, instead of having human gaze-to-camera-to-human, there will be human gaze-to-camera-algorithm-to-human.
If this feature is utilized on both sides, it means that the algorithm-corrected eyes will be looking at each other, while we merely occupy the bodies behind them. We’ll still be connecting, but through algorithms representing our gaze. If eyes are the “windows to the soul,” we become bystanders to our own gaze, deferring to algorithms.
A synthetic gaze is reminiscent of Harlow’s classic bonding experiments, in which he provided wire and cloth surrogate mothers to infant monkeys to study how they bonded (or not). If we are entrusting our gazes to an algorithm, it could be a displacement of bonding, creating unknown repercussions for how we perceive others and our relationships with them. Relying on an algorithm to alter our gaze as we seemingly converse with others might be “creepy,” even if we are present. Additionally, more distracted, busy, and/or overwhelmed people might use this new capability as a way to continue uninterrupted multitasking while seemingly connecting in a meaningful way to another human being via “algorithmic focus.”
Divided-attention research already shows us that we cannot focus successfully on multiple things at once, and what is forfeited is comprehension. If we do not understand the intent of others, it can lead to offense, which means we do not cooperate successfully. Cooperation is how we live and work, and how our species survives. If we aren’t able to cooperate because we aren’t taking the time to understand each other, we can get into real trouble. It’s hard enough to understand someone’s intent when they are in the room and we are able to look them in the eyes. What will it mean when our algorithms are removing our social cues?
But wait, technology researchers have a solution for that, too.
In a related story, researchers from the University of North Carolina at Chapel Hill and the University of Maryland have published a paper claiming that they can “identify a person’s perceived emotion, valence (e.g., negative or positive), and arousal (calm or energetic) from their gait alone.” In their own words, the researchers state that “our goal is to exploit the gait features to classify the emotional state of the human into one of four emotions: happy, sad, angry, or neutral.” As of 2017, prior researchers had identified at least 27 distinct human emotions that are combined, blended, and processed as part of the human experience. Using gait as a way to define just four seems limited and dangerous. Furthermore, there is evidence that using a phone while walking changes a person’s gait. Since many people walk while using devices, having an algorithm accurately infer their emotional state in such cases would not be possible. While the researchers are well-intentioned, “offshoring our interpretation of others’ emotional states” to algorithms seems premature and risky.
This research, combined with Apple’s new feature, does point to the idea that people are realizing that the tech we have in place is removing social cues. However, in using technology to fix it, they may be creating even fewer social cues, which in time, and in aggregate, will make it much worse for all of us trying to cooperate.