Attributing Cause in Algorithm Audits

Human causes of problems are hard to distinguish from technical ones. Simple experiments can sometimes untangle them.

When people experience discrimination online (or other problems like political polarization, unfair pricing, etc), where should we attribute the underlying cause? Would we point to a site’s design, algorithms, human biases, or something else? How could we tell?

Studies that audit tech companies can sometimes fail to achieve impact if researchers can’t pinpoint the source of the problem. When we fail to understand a problem’s cause, we risk blaming the wrong people and developing ineffective solutions.


Many of the patterns we love and fear about the internet arise from complex interactions of psychology, social relations, network structures, the behavior of AI systems, and software design. When we look back at what went wrong, it’s often hard to untangle those factors. That’s a problem, because if we can’t point to a cause, we can’t be sure who to hold responsible. More importantly, identifying causes can help us imagine ways to resolve a complex problem.

By using your curiosity in a good cause, anyone can add to public knowledge on the causes of systemic online problems. That’s the idea behind CivilServant, which organizes volunteer citizen behavioral scientists for a fairer, safer, more understanding internet. Last month, I showed how to audit Facebook’s news feed with a poetry experiment I did with my own Facebook friends. In this post, I show how a followup experiment can untangle the underlying causes behind a discovery.

Using Poetry to Audit Facebook’s News Feed

Here are some of the poems I posted to Facebook to study how its algorithms promote text vs. images.

Last time, I investigated the Facebook feature that displays colored backgrounds behind short status updates. I posted 22 poems over a month-long period and used a spreadsheet to decide which should get colored backgrounds. I found that using Facebook’s colored backgrounds on poems roughly doubled the rate of likes and comments.
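I used a spreadsheet, but the same coin-flipping can be scripted. Here is a minimal sketch in Python (with hypothetical poem labels) that balances 22 poems evenly between the two conditions:

```python
import random

random.seed(42)  # fix the seed so the schedule is reproducible

poems = [f"poem_{i}" for i in range(1, 23)]          # 22 hypothetical poem labels
conditions = ["color", "plain"] * (len(poems) // 2)  # 11 posts in each condition
random.shuffle(conditions)                           # randomize the order
schedule = dict(zip(poems, conditions))              # poem -> assigned condition

for poem, condition in schedule.items():
    print(poem, condition)
```

Shuffling a balanced list (rather than flipping a coin per poem) guarantees an equal number of posts in each condition, which matters with a sample this small.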

Had I really discovered something about Facebook, or just something about my Facebook friends? When I shared the results, some people rightly pointed out that even without Facebook, my friends might pay more attention to larger, brightly-colored poems compared to smaller, black text on a white background. How could I be sure that I had made a discovery about Facebook? They were asking questions about the underlying cause behind the effect I observed. Only a followup experiment could answer the question.


At this point, I could have taken one of two directions: focus on (a) the behavior of Facebook’s platform or (b) the behavior of my friends.

A psychologist interested in perception would have studied my friends. They would ask their friends to respond to poems outside of Facebook, to remove all influence from the platform and its algorithms. But I wanted to learn more about Facebook: might Facebook’s algorithms promote poems differently to my friends if I posted the poem as an image?

To test this idea, I continued to post poems from late November through December. This time, all of the poems received a colored background. For some poems, I just used Facebook’s colored backgrounds. For others, I posted a screenshot of a poem displayed with a colored background. Both poems would appear nearly identical to my friends, but Facebook’s algorithms would know the difference between them. I could then attribute a technical explanation to differences in outcomes between these visibly-similar posts.

If I and my friends had been more patient, I could have used what’s called a “factorial design,” where I tested combinations of different ways to display poems: with/without images, with/without colored backgrounds.
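A factorial schedule like that is straightforward to generate. Here is a sketch, assuming 20 hypothetical poems balanced evenly across the four combinations of the two factors described above:

```python
import random
from itertools import product

random.seed(1)  # reproducible schedule

# The two factors: how the poem is posted, and whether it gets a colored background
conditions = list(product(["image", "text"], ["color", "plain"]))  # 4 combinations

poems = [f"poem_{i}" for i in range(1, 21)]                  # 20 hypothetical poems
assignments = conditions * (len(poems) // len(conditions))   # 5 poems per cell
random.shuffle(assignments)
schedule = dict(zip(poems, assignments))

for poem, (medium, background) in schedule.items():
    print(poem, medium, background)
```

A factorial design would let one experiment estimate both effects, and any interaction between them, at the cost of needing more posts per condition.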

To create images that were equivalent to Facebook’s color backgrounds, I typed in the poem then used my browser’s developer tools to remove parts of the webpage that would make the image look different.

(During this project, I learned that Facebook uses machine learning to describe pictures to people with vision impairment. My poems were tagged with a message that said “Image may contain text.” When I posted a sailboat picture, Facebook reported that the “Image may contain: sky, ocean, outdoor, water, and nature.”)

Do Facebook’s Algorithms Promote Poetry in Images Differently from Text?

Over 30 days, I learned that posting poems as images caused my friends on Facebook (I have just over 2,000) to like and comment on poems 30% less, compared to the standard text on a colored background. Poem conversations with the regular colored background received an average of 14 likes and comments, ranging from 4 to 37. Conversations with the poem image received an average of 7 likes and comments, ranging from 2 to 19. To estimate the effect, I used a log-transformed linear regression.
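With a single binary predictor, a log-transformed linear regression reduces to comparing mean log-engagement between the two groups and exponentiating the difference. Here is a sketch with made-up engagement counts (not my actual data):

```python
import math

# Hypothetical likes + comments per poem; illustrative numbers, not my real data
text_posts = [14, 9, 21, 37, 4, 12, 16]   # colored-background text
image_posts = [7, 5, 11, 19, 2, 6, 8]     # screenshot of the same design

def log_ratio_effect(treated, control):
    """OLS of log(y) on one binary predictor is equivalent to a difference
    of log-means; exponentiating that difference gives the multiplicative effect."""
    diff = (sum(math.log(y) for y in treated) / len(treated)
            - sum(math.log(y) for y in control) / len(control))
    return math.exp(diff)

ratio = log_ratio_effect(image_posts, text_posts)
print(f"image posts received {ratio:.2f}x the engagement of text posts")
```

Working on the log scale is a common choice for count-like engagement data, since effects on likes and comments tend to be multiplicative rather than additive.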

Anyone can do this study with a spreadsheet, patience, and longsuffering friends. Learn how you can audit Facebook’s News Feed.

In this personal experiment, I posted poems to discover whether Facebook treats image posts differently from text posts.

Based on this small experiment, I think it’s likely that among my friends, Facebook’s News Feed promotes images differently than text posts. Yet in our discussions about the finding, my friends also raised a few questions.

First, while this difference likely has a technical explanation, the effect might come from download speeds rather than Facebook’s algorithm. The problem is that images take longer to download than text. Facebook tended to compress my images to around 19kb, which takes a fifth of a second on 2G internet and much less time on 3G. Is it possible that a 1/5 second delay caused a 30% drop in likes and comments?

One friend wondered: might the effect come from differences in the relative popularity of the poems or their authors? To check, I rated the poems by whether their authors were famous and didn’t find any relationship between author fame and reader likes and comments (Pearson correlation: 0).
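Checking for a confound like this takes one correlation. A sketch with hypothetical fame ratings and engagement counts (the values are illustrative, not my data):

```python
import math

# Hypothetical ratings: 1 = famous author, 0 = not; engagement = likes + comments
fame = [1, 0, 1, 0, 0, 1, 0]
engagement = [14, 9, 21, 37, 4, 12, 16]

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(fame, engagement)
print(f"correlation between author fame and engagement: {r:.2f}")
```

A correlation near zero, as I found, suggests author fame doesn’t explain the difference, though with so few poems the check is only suggestive.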

Finally, it’s also possible that my sample size was too small; my first poem experiment had only a 37% chance of detecting an effect of the size I observed. If several other people tried this experiment with their own Facebook friends, our confidence in the answer would improve.
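One way to see how sample size limits confidence is a simulation-based power estimate: generate fake engagement data under an assumed effect and count how often a simple test detects it. A sketch with illustrative parameters (the means, spread, and effect size below are guesses, not my measured values):

```python
import math
import random

random.seed(0)

def estimate_power(n_per_arm, ratio=0.7, log_sd=0.6, trials=1000):
    """Fraction of simulated experiments in which a simple z-test on
    log(engagement) detects a multiplicative effect of `ratio`.
    All parameters are illustrative guesses, not measured values."""
    hits = 0
    for _ in range(trials):
        control = [random.gauss(math.log(14), log_sd) for _ in range(n_per_arm)]
        treated = [random.gauss(math.log(14) + math.log(ratio), log_sd)
                   for _ in range(n_per_arm)]
        diff = sum(treated) / n_per_arm - sum(control) / n_per_arm
        se = log_sd * math.sqrt(2 / n_per_arm)  # known-sd z-test, for simplicity
        if abs(diff / se) > 1.96:  # two-sided test at the 5% level
            hits += 1
    return hits / trials

power_small = estimate_power(11)  # roughly one month of poems per condition
power_large = estimate_power(44)  # what several replications could provide
print(f"power with 11 per arm: {power_small:.2f}, with 44 per arm: {power_large:.2f}")
```

The point of the simulation is the comparison: pooling results from several people running the same experiment raises the chance of detecting a real effect considerably.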

Citizen Behavioral Science

Why did I do this project? I have often argued that we need independent testing of social tech, especially when a company’s promises are great or the risks are substantial. At my new nonprofit CivilServant and as a post-doc at Princeton University, I’m working to develop citizen behavioral science, methods for people to imagine and test improvements to our digital lives, while also holding platforms accountable for their role in society.

If you’re interested in citizen behavioral science, send me a note (I’m @natematias on Twitter). CivilServant is a young project, and we can use all the help we can get!

Acknowledgments

I’m incredibly grateful to my longsuffering friends, who participated in this experiment with me. Thanks everyone! ❤ 📈 Special thanks to Cesar Hidalgo, Ken Anderson, and Nick Seaver, whose questions about my first poem experiment prompted me to follow up with this study.