Using public data for research? Consider the ethical implications with these insights from fandom

Brianna Dym
Published in CUInfoScience
5 min read · Jun 26, 2020
A series of ones and zeros cascade down a blank screen. They are colored in the pattern of a rainbow.
Image credit: Discovod @ Dreamstime.com

What’s the worst that could happen if your online personal data were shared elsewhere without your permission? For people making wildly inappropriate statements, we’ve seen that they might lose their jobs. But what if you’re sharing something seemingly harmless? What’s the worst that could happen if that fanart of Kirk and Spock holding hands or statement about how much you love the new She-Ra series wound up in a BuzzFeed article, or even in an academic article? And what ethical considerations should journalists and researchers using public data from online spaces like Twitter or Tumblr be taking into account?

For most people the outcome of amplifying their online content is mild embarrassment at worst, but for many, sharing even seemingly harmless content carries a certain measure of risk. In an interview study, we explored transformative fandom as a case study for understanding the concerns of vulnerable communities online more broadly. This type of fandom is a corner of the internet where people gather to geek out over their favorite shows, movies, or books while sharing fanart and fanfiction about those stories within their community.

Fandom provides an interesting case study for considering ethical use of public data because the type of content seems harmless. But the community itself has faced stigma and marginalization. A recent survey study we conducted matches known demographics of transformative fandom: mostly women and non-binary participants, with 80% identifying as LGBTQ. Moreover, our other recent work on fandom as an LGBTQ support community revealed that many participants aren’t out in their offline lives. Fandom also has strong, longstanding privacy norms for a variety of reasons. In other words, though sharing artwork online might seem like a fairly low-stakes way to be involved in an online community, many fandom participants are in vulnerable positions and have reason to fear for their privacy.

Like many online communities, the interactions between folks in fandom happen on publicly viewable sites like Tumblr (or they did in the pre-Tumblr-forbidden-nipple-algorithm days). People might write that fanfiction or rant about how they love x, y, and z about the latest season of The Clone Wars, but they’re expecting that content to remain disconnected from their offline lives. People share this content within fandom with the expectation that it will stay there while still taking precautions to preserve privacy, such as using pseudonyms. This expectation is hardly unreasonable, considering that most social media users are unaware that data like their tweets is of interest to researchers.

What’s the worst that could happen if your content showed up somewhere else without your permission? Participants described fears related to losing jobs or accidentally being outed as LGBTQ to homophobic and transphobic family members. People shared data in fandom “for fandom” and were frustrated they didn’t have any way to protect it from secondary use short of choosing not to participate in these communities, which can further isolate marginalized people from important communities of support.

The concerns our participants expressed are not unique to fandom. However, thanks to a long history of navigating the social stigma attached to fandom as well as dealing with fraught copyright battles, community members of fandom are aware of the risks associated with sharing their data online, and were all too happy to share their thoughts on how they feel outsiders to the community ought to use their data. These insights inform how professionals like researchers and journalists can more ethically interact with public data online, particularly when it comes from vulnerable or marginalized communities, even if the data itself does not seem sensitive.

So what can researchers (and other folks making secondary use of public data, like journalists) do to help vulnerable people online stay safe?

  1. Obtain permission: It doesn’t really matter if the data is public. If you’re seeking to minimize the harm you might cause, obtaining permission from folks before amplifying their content is the #1 way you can assess the risks involved for this person. Not everyone wants to be featured in the next “45 wholesome tweets to brighten your day” list, often because of legitimate privacy concerns. And in cases where content might be quoted verbatim, e.g., in a research paper or in a news article, it may not be laborious to obtain permission from just those whose content will be amplified, even if you can’t do so for everyone whose content you analyzed. (A caution: It is possible that people may react poorly to knowing that they were part of a research study, so keep in mind possible harms when navigating this.)
  2. Obscure data: If you can’t obtain permission or if you just want to reduce traceability to cut down on unintended harms, consider obscuring the data you quote (through techniques like paraphrasing or other ethical fabrication) to limit how easily people can trace the content to its source. Even if your participants might not think the data you collect will harm them, you never know when your data set could suddenly turn into a source of targets for trolls and other bad actors.
  3. Spend time with the community: Most importantly, get to know the space that you’re taking information from. This important practice applies even if you are only analyzing data in aggregate and there is no risk of amplification or other privacy violations. The better you know a community, the better you can understand its norms and the potential harms of researching it.

When you share something in an online community, odds are you aren’t thinking about that data being used in a research context, picked up by a journalist, or weaponized by trolls. Usually, people are sharing to connect with a community and be part of a conversation. When we as researchers scoop up that data and share it in a different context, we run the risk of amplifying it to the wrong audience as well as exposing someone to risks they perhaps didn’t consider before making that post, whatever seemingly harmless content it might contain. By taking just a few precautions into consideration, we can help reduce the unintended harms our work might otherwise cause.

For more detail, check out our paper recently published in Transformative Works and Cultures:

Dym, Brianna, and Casey Fiesler. “Ethical and privacy considerations for research using online fandom data.” Transformative Works and Cultures 33 (2020).

Brianna Dym
PhD Student of information science @ CU Boulder. Internet Rules Lab researcher.