Inclusivity in AI: Doing Gender Better
Tempted to use AI to predict someone’s gender based on an image of them? Just don’t.
Calculation without consideration
Hardly a week goes by without some new AI application wowing us with its borderline magical power. Seemingly every major enterprise organization is making its play for the space, and a slew of start-ups are finding ingenious niches to exploit. While developments in AI move at a breakneck pace, our thinking about the effects of these innovations lags behind. Lots of ink is spilled about the privacy implications of data use, and we’re waking up to the impending social revolution that widespread automation will bring, but other concerns also deserve our attention.
Microsoft provides an impressive image recognition algorithm, and you can try it out for yourself. Simply capture two photos via your laptop’s camera, and it will tell you with uncanny accuracy whether or not they show the same person, each person’s approximate age, and the emotions they’re displaying.
This is an amazing feat. In mere seconds, the system can process an image and make sense of it in a similar way to how we humans do — a persuasive demonstration of the power of AI. And yet, as elegant and effective as this solution appears, it’s also fundamentally flawed.
The problem is that the system attempts to predict the gender of the faces. And it displays that as one of two binary values: male or female.
There are two issues here. Firstly, gender is not just another superficial visual attribute, like whether you wear glasses or have a beard, but rather an integral part of your identity. There is a cost attached to faulty predictions of any kind, but this cost is far higher when the user feels like a core facet of their identity is unacknowledged, negated, or disrespected. In the moment, it might not seem like much for an AI in a quick tech demo to guess your gender wrong. But for many people, being misgendered by a piece of software is yet another contribution to the pattern of prejudice they face across a multitude of interactions, and reinforces the notion that they are not welcome in society.
The second issue is that representing gender as a simple male / female binary replicates a reductive and inaccurate view of gender — one that’s harmful in its own right. Even if all binary people were satisfied with being pigeonholed correctly by the system, the mere existence of the exclusionary binary in the platform hurts people who fall outside that binary by implying their gender identities are illegitimate.
In short, this feature enables the system to cause harm, both directly and indirectly. And the lack of warnings or opt-ins suggests that this potential for harm is something that the software designers are oblivious to.
Compassion in light of complexity
Let’s imagine we work at Microsoft and want to make our demo more inclusive. In order to do that we need to add a couple of requirements that are missing from the demo currently. Obviously, we’ll need our improved system to acknowledge the risk of misgendering, to reduce its chance of occurrence, and to mitigate its effects if it does occur. But before we can even start to achieve that, we need to build a more accurate model of gender, one that doesn’t rely on a false male / female binary.
Here we’ll face our first key issue: there is no consensus on what gender actually is, or how many genders there are. Regardless, we know in practice that there exists greater variety than male or female, and we should aim to expand our catalogue to include these other positions.
So let’s say we start with cisgender and transgender people who identify as binary male or female. We know we should add transgender non-binary people and non-binary people who do not identify as trans, too, so that’s next. Now we also need to unpack the non-binary position, to distinguish between people who are bigender, those who are agender, those who are demigender, and so on. We also need to add some understanding of those people whose gender is largely static over time versus those who are genderfluid. We’ll need to track the many gender identities with cultural specificities that don’t map onto any of the above, such as Two-Spirit people whose gender is bound into the cultural context of Indigenous Canadians. And we ought to make sure our system can understand performative gender identities, such as drag kings and queens, as well as those who are genderqueer or genderfucking (that is, who have, as part of their gender identity, the process of actively working to deconstruct the very categories of gender that we’re trying to codify).
Critics here may argue that this all sounds like a lot of work when male and female will suffice for the majority of instances. In particular when it comes to non-binary identities, why spend time chasing edge cases? There are several reasons, not least of which is that each edge case here is a human deserving of equal treatment and their share of dignity and respect. But even beyond that, the number of people actually affected could be substantial.
As of 2011, according to the Equality and Human Rights Commission, 0.4% of the UK population rejected identifying with binary gender. That equates to at least 250,000 individuals in the UK alone who would be ill-served by binary gender options.
If anything, that figure may be too low, both now and moving forward. Recent studies from California found not only that a quarter of LGBT teens identified as non-binary, but also that a quarter of all teens identified with some form of gender non-conformity.
So yes, it’s complicated. The critics aren’t wrong about that. But it is absolutely worth doing, because it matters for a lot of people. In other areas, engineers are often well aware of how simple assumptions hide a multitude of concerns, and how what appears to be “common sense” can be anything but. Think about how complex it can be to agree on what the time is, or how much variation is required to accurately capture people’s names. Where we know that there is a problem, we are more than capable and willing to develop solutions. So knowing this, we should enthusiastically grapple with this complexity and embrace the opportunity to improve the way things work.
And if even slow-acting bureaucracies like governments can introduce a third gender option to official documents, as Canada has recently done (albeit in the somewhat inadequate and obfuscating “X”), then surely forward-thinking software companies can do better?
Culture, context, and categorization
Going back to our earlier effort, let’s assume — for the sake of argument — that our improved system contains gender variables with enough granularity and nuance to satisfactorily describe almost all people. That’s a necessary but far from sufficient condition to ensure the accuracy of our system. As with any machine learning process, its success will be dependent on how we train the system with sample data — and here lies our second pitfall.
As others have discovered, if you feed a system the top one hundred images returned by Google when searching “Women” and do the same for “Men”, you can build a system that has remarkable accuracy in detecting which image belongs to which search term. But examine the images yourself, and something else becomes apparent: differences between the two pools of images that are not directly related to the morphological features of each person. Things like the lighting, the palette, the poses, the backgrounds, and so on — there is a recognisable difference between how individuals from one term are photographed and depicted compared to the other. What you might call “gender recognition” is actually just distinguishing between two modes of aesthetic representation: modes that are aligned with gender, but are not the genders of the people themselves.
One way of attempting to overcome this bias is to train the system with a more diverse range of imagery. It’s a step in the right direction, but does little to address the underlying concerns about what is really being measured. If a system convincingly sorts images of people into two groups, it could be doing so on the basis of abstract properties of the imagery (palette, composition, etc), or it could have identified cultural markers within the image (glasses, make-up, accessories), or it could have distinguished on the basis of bodily and facial features.
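To make that concern concrete, here is a small illustrative sketch using entirely synthetic, hypothetical data. It builds two pools of “photos” that differ only in lighting and palette, then sorts them with a rule that looks at nothing but mean brightness. The near-perfect accuracy shows how a classifier can appear to recognise people while never examining a face at all.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_image(mean_brightness: float) -> np.ndarray:
    """Synthesise a 64x64 RGB 'photo' with pixel values drawn around a target brightness."""
    return np.clip(rng.normal(mean_brightness, 0.1, (64, 64, 3)), 0.0, 1.0)

# Two pools that differ only in lighting/palette, not in anything about a depicted person.
pool_a = [make_image(0.35) for _ in range(50)]  # darker, moodier shots
pool_b = [make_image(0.65) for _ in range(50)]  # brighter, airier shots

def classify(img: np.ndarray) -> str:
    """Assign a pool label from a single global image statistic: mean brightness."""
    return "A" if img.mean() < 0.5 else "B"

correct = sum(classify(img) == "A" for img in pool_a) + \
          sum(classify(img) == "B" for img in pool_b)
accuracy = correct / (len(pool_a) + len(pool_b))
print(f"accuracy: {accuracy:.0%}")  # very high, yet no facial feature was ever used
```

The point is not that real systems are this crude, but that high sorting accuracy on two differently-photographed pools is fully compatible with the model having learned nothing about the people in the images.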
One notorious study from Stanford University used machine learning to categorise images of people based on sexual orientation. The resulting claim was that people with different sexual orientations have correspondingly different facial features. In fact, when the experiment was replicated by another researcher using profile photos illicitly harvested from dating sites, they found that it was possible to entirely blur the photo and still produce the same results. In other words, not only was the system not relying on facial features, it didn’t need to process any pictorial content at all. Something in the lighting and colours of typical dating profile photos was enough to distinguish different groups of people. This is a fascinating phenomenon, but far from what was originally claimed.
The sexual orientation study was widely criticised, because we understand that it was based on a faulty premise. You can, of course, have any combination of facial features and any sexual orientation. Naturally, we should criticise algorithmic gender recognition in a similar way. Many individuals’ gender presentation as perceived by the public at large doesn’t match their actual gender.
Even with a more detailed set of gender variables, even with extensive training across a variety of images, for some people there is simply no way to correctly extrapolate their gender from their appearance. Just as with sexual orientation, the only way to actually ascertain an individual’s gender is to ask them in the form of an open question, allowing whatever answer they wish to provide.
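As a minimal sketch of what “just ask” looks like in practice (the field and function names here are hypothetical, not any particular product’s API), gender can be stored as optional free text supplied by the person themselves, never inferred:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Profile:
    name: str
    gender: Optional[str] = None    # open-ended, self-described, always optional
    pronouns: Optional[str] = None  # e.g. "they/them", also self-supplied

def set_gender(profile: Profile, self_description: Optional[str]) -> Profile:
    """Record whatever the person tells us, verbatim, including nothing at all."""
    profile.gender = self_description.strip() if self_description else None
    return profile

alice = set_gender(Profile(name="Alice"), "genderfluid")
bram = set_gender(Profile(name="Bram"), None)  # declining to answer is a valid answer
```

A free-text field sidesteps the impossible task of enumerating every identity in advance, and an optional one respects that some people will not want to share this at all.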
AI gender recognition can cause harm through misgendering and by reinforcing a narrow and exclusionary model of gender. We can attempt to build an improved system with a more expansive view of gender, and aim to increase its reliability through better data and training, but ultimately, the system is doomed to fail. It’s not capable of achieving what it’s intended to. It’s just not possible to know a person’s gender from an image of them. The best the system can achieve is detecting differences in image attributes that arbitrarily correspond to different groups of people.
When the fallibility of the system is guaranteed, then, and the harm of misgendering is impossible to mitigate, we need to think carefully about when (and if) this functionality is required and how it will be of benefit to anyone. Arguably, getting an AI to guess gender is the equivalent of a cheap party trick — it’s novel, fun, and might impress in the moment, but it serves little purpose for most users. Security and surveillance systems may find uses for identifying gender, and marketers could use it to serve targeted ads, but for most user-facing applications, if the goal is to assign a gender to a person’s image, it’s best to ask the person to provide that data themselves.
And of course, AI is just one piece of the puzzle. The wider lesson in working towards inclusivity in AI — or technology in general — is that when we focus solely on building widgets without thinking about the context and culture of the users, or when we capture and generate data without a greater curiosity about the nature of that data, we inevitably replicate the worst biases in society to the detriment of all.
There are positive signs that the industry is aware of these issues and taking steps to mitigate some of them. Microsoft themselves published an insightful set of guidelines for human-AI interaction. It’s not a perfect guide — the best practice example to “mitigate social biases” compels builders not to autocomplete in favour of “him” over “her”, with no mention of other gender options at all — but the direction of travel is right. Even though it’s promising to see a major player in the world of AI addressing these issues directly, it’s clear more work is required to properly understand the path ahead.
The threat to us now is that we are making long-term decisions without big-picture thinking. We need more than the will to do right — we need a nuanced and empathetic view of each challenge. The risk is that by codifying these biases into the infrastructure of our everyday future, we are not just reproducing them, but solidifying them for years to come.
No one has all the answers when it comes to the future of AI software, but one thing we’re sure of at Myplanet is that new capabilities bring new challenges in terms of how users understand, trust, and control these applications. That’s why we’ve created Smarter Patterns — an AI UI pattern library that explores how AI issues affect interface design. It’s a practical resource to inform best practice and inspire better products.
Visit the site, bookmark it for future reference, and send us any AI patterns you notice in the wild that you think should be included in the library. We’ll update it with credit where appropriate.
AI has the power to create incredible new opportunities and experiences for us all, but only when we are truly considerate in how we approach creating and applying it to the real lives of users.