Artificial Intelligence “Gaydar”
Those of us who study the digital society face a complex set of issues around the data we use to study it. A recent public debate about a new research article on artificial intelligence is an example of the tensions facing all social scientists in the digital society. A summary from The Guardian is an example of how the media first reported the new article’s findings:
Artificial intelligence can accurately guess whether people are gay or straight based on photos of their faces, according to new research that suggests machines can have significantly better “gaydar” than humans.
The study from Stanford University — which found that a computer algorithm could correctly distinguish between gay and straight men 81% of the time, and 74% for women — has raised questions about the biological origins of sexual orientation, the ethics of facial-detection technology, and the potential for this kind of software to violate people’s privacy or be abused for anti-LGBT purposes.
The issues are manifold. First, that isn’t really what the study is about. Second, the study’s authors argue that they were studying technologies already in use in order to explore how they might further marginalize already marginalized groups of people. Third, the question of whether the data we share on social media sites, in this instance a dating app, can be used by researchers engages a host of issues around terms of service agreements, copyright, fair use, and research ethics.
Philip Cohen, a sociologist at the University of Maryland (among other notable distinctions), had this to say about the issue in Inside Higher Ed today:
Philip Cohen, a professor of sociology at the University of Maryland at College Park who quibbled with Kosinski’s conclusions on his blog, Family Inequality, said via email that journals usually defer to institutional IRBs. “If they approved it, then I don’t see what the issue is,” he said.
Cohen has used his blog to offer much harsher criticism of past research involving gays and lesbians, including the now-infamous Regnerus study suggesting that children of parents who are gay fare worse than those with straight parents.
Some research, he said, “is so harmful that it should be protested through channels other than scientific debate, so unethical that the authors should lose their jobs or be ostracized.” But of Kosinski and Wang’s paper, he said, “I don’t see that here, based on the information I’ve seen.”
Cohen added, “I don’t think we need to villainize these researchers in order to object to or disagree with their research. The paper was reviewed and accepted. It’s part of the scholarly record, like a lot of other research I don’t like.”
As for the copyright issue, Cohen said that studying publicly available information, such as dating service profiles, is “fine” as long as confidentiality is maintained: no names or other information that could be used to identify them.
The issues of copyright, ethics, reproducibility, and practical significance are all challenges for data collection, systematic research, and analysis in the digital era. As in most debates in social science, the guiding concern should be whether the practical significance of a study outweighs the risks to the individuals and actors involved. Whether or not we always agree on the answer, it is a question sure to become ever more salient as we wrestle with how to study inequality in the digital society.