Mastodon is a zero-privacy network — here is what you need to know

Stefan Kopf
Jan 15, 2023

The Fediverse, and especially Mastodon, has gained a lot of media attention recently. Many see it as the “better social network” and project whatever they consider “better” onto it. This can be dangerous, especially with respect to privacy.

When you provide information to a Mastodon server, like your name, picture, bio or a post, you effectively give your irrevocable, unlimited, worldwide consent that anybody can do whatever they want with your data. It will be displayed in any context on arbitrary websites, it will be used to find your profile, it can be analyzed, and it will be archived long term.

Unfortunately, many Mastodon instances do not emphasize this aspect properly, or even promote themselves as “privacy-friendly”. This is dangerous. It might entice users to provide sensitive information they do not want to be shared, indexed and archived.

You can observe this regularly whenever someone attempts to build a search engine for the Fediverse. It happened twice recently with two independent projects, fedsearch.io and fedisearch.io. Both were dogpiled immediately and received massive backlash over privacy concerns.

How the Fediverse works

The Fediverse is a decentralized social network, consisting of thousands of independent servers communicating via the ActivityPub protocol. When you set up an account on a Mastodon server, your information is initially stored only on this instance (1). With your first post, or any other interaction, your data is transferred to other peer servers and replicated there (2). These servers then use this information to display your post on their website (3), to allow their users to search for your profile, or to interact with your content.

Replication of data between instances

ActivityPub does not initially replicate your data to all instances. It only sends your post to instances which house at least one of your followers. Your followers can then “boost” your post by announcing it to their own followers (1). The instances of those followers will in turn contact your home instance and ask for the announced post (2), which will then be transferred to them as well (3).

Boosting a post

A similar mechanism applies when someone replies to your post. Depending on the popularity of your post, it might reach a broad subset of the currently 25K+ instances.
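The replication and boost steps described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: it does not exchange any real ActivityPub messages, and the handles, instance names and function names are all hypothetical.

```python
from typing import Dict, Iterable, Set


def instance_of(handle: str) -> str:
    """Return the host part of a 'user@host' handle."""
    return handle.rsplit("@", 1)[1]


def follower_instances(account: str, followers: Dict[str, Set[str]]) -> Set[str]:
    """Instances that house at least one follower of the given account."""
    return {instance_of(f) for f in followers.get(account, set())}


def instances_holding_post(
    author: str, boosters: Iterable[str], followers: Dict[str, Set[str]]
) -> Set[str]:
    """Every instance that ends up storing a copy of the author's post."""
    # The post starts on the home instance and is sent to follower instances.
    holders = {instance_of(author)} | follower_instances(author, followers)
    for booster in boosters:
        # A boost is announced to the booster's followers; their instances
        # then fetch the post from the author's home instance and keep a copy.
        holders |= follower_instances(booster, followers)
    return holders


# Hypothetical follower graph: account -> set of accounts following it.
followers = {
    "alice@home.example": {"bob@one.example", "carol@two.example"},
    "bob@one.example": {"dave@three.example", "erin@two.example"},
}

before_boost = instances_holding_post("alice@home.example", [], followers)
after_boost = instances_holding_post(
    "alice@home.example", ["bob@one.example"], followers
)
print(sorted(before_boost))
print(sorted(after_boost))
```

In this model a single boost is enough to put a copy on an instance where the author has no followers at all, which is the effect the article describes.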

How your information is used is solely up to these instances and completely out of your control. You can easily see this when looking at any Mastodon instance, like mastodon.social. It displays a “federated timeline” of posts originating from other instances. This information was transferred there from these instances and is now permanently stored.

mastodon.social displaying posts from journa.host

The information is further indexed and used for a profile search across the known Fediverse. It is up to the software and configuration of each instance how powerful this search feature is. Some instances only search in the profile name, others also search across the bio or take other attributes into account as well. A search for the term “Böhmermann”, the name of a famous German comedian, returns more accounts than just those with an exact match in the name:

mastodon.social search result for “Böhmermann”

And there is not only Mastodon. The Fediverse consists of thousands of servers running dozens of different software projects, all implementing the same ActivityPub protocol. There is no guarantee that the same limited search capability provided by your home instance is implemented across the entire Fediverse.

Long-term archiving

Once your information is replicated across hundreds, if not thousands of instances, it will be permanently stored there. You can still find old posts even from banned accounts or dead instances on other servers.

Getting a post removed is close to impossible. You would need to contact all instances in the Fediverse that replicated and stored this post and ask each of them for removal.

This is a serious problem for sensitive content, especially when the creator intends to remove it in the future. For example, you can still find posts from switter.at, a now-closed instance dedicated to sex workers, on various other instances:

Redacted screenshot showing posts from switter.at

But I opted out from search engines

Mastodon has a horribly misnamed option to “opt-out of search engine indexing”. Many users assume that this would prevent them from being indexed across the Fediverse. A good part of the negative comments on Fediverse search engines is related to this option.

The problem is: even Mastodon itself does not respect this option. If you have it set on your profile, just try to search for your name on your own home instance, or any other instance in the Fediverse. Your profile will come up.

This option was intended to limit how your profile appears in web search engines, like Google or Bing. However, that does not work as expected. When activated, it sets the “noindex, noarchive” meta tag on your profile page.

HTML header snippet of a profile opting out of search engine indexing
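As a minimal sketch of what this setting amounts to, the following snippet reads the robots directives from a page's HTML. The markup is hypothetical sample data, and it assumes the directives sit in a standard `<meta name="robots">` tag and that a replicated copy on another instance is rendered without that tag.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> Set[str]:
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical markup: the profile page on the home instance carries the tag,
# a replicated copy on another instance (assumed here) does not.
home_page = '<head><meta name="robots" content="noindex, noarchive"></head>'
remote_copy = "<head><title>Replicated profile</title></head>"

print(robots_directives(home_page))
print(robots_directives(remote_copy))
```

The tag only affects the one page that carries it. A crawler that reaches the same content through a page without the tag sees no instruction to skip it.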

But since all your data is replicated to other instances, and these instances display your information as part of their website, Google will simply pick it up from there. In some cases, it even picks it up from a different page on the same instance:

Google indexing a post from a profile that opted out of search engine indexing

Several discussions about this feature on GitHub show how widely it is misunderstood by users.

Feeding data into Meta’s ad network

Facebook has defined the Open Graph Protocol to “allow any web page to have the same functionality as any other object on Facebook”. Mastodon, like many other software projects, uses this standard to make its content easily accessible to other networks. This applies to profile information as well as individual posts:

Usage of Open Graph og:title and og:description tags on profile page
Usage of Open Graph og:title and og:description tags on post detail page
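To show how little effort it takes to consume these tags, here is a small sketch that extracts all `og:` properties from a page. The sample markup and its values are hypothetical; only the `og:title` and `og:description` property names are taken from the screenshots above.

```python
from html.parser import HTMLParser
from typing import Dict


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:"):
            self.tags[prop] = attributes.get("content") or ""


def open_graph_tags(html: str) -> Dict[str, str]:
    """Return all Open Graph properties found in the given HTML."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


# Hypothetical profile page header.
sample_page = """
<head>
  <meta property="og:title" content="Jane Doe (@jane@example.social)">
  <meta property="og:description" content="Writer, cyclist, cat person.">
  <meta name="robots" content="noindex, noarchive">
</head>
"""

for name, value in open_graph_tags(sample_page).items():
    print(name, "=", value)
```

Any site that receives a link to the page can run the same kind of extraction, which is why the name and bio travel so easily to other networks.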

This allows Facebook, among others, to consume this content easily and display it on their website:

Original post on Mastodon and the same post displayed on Facebook

Facebook even offers a full text search across posts:

Search for post message on Facebook finds post — demonstrating that post message is indexed

This is limited to posts Facebook knows about, for example after someone shared a link, but it is still concerning considering the privacy expectations of some users.

Setting the right expectations

What we see here is pretty much what you would expect from a federated Twitter clone. Gnusocial, the initial project which started the Fediverse, is very clear in its privacy policy:

Typically that means that the data can be copied far and wide, for commercial and non-commercial purposes, and in modified or unmodified form. If you’re not OK with that, don’t use the service.

Unfortunately, Mastodon, and other projects in the Fediverse, are not as clear in their privacy policies:

We do not sell, trade, or otherwise transfer to outside parties your personally identifiable information.

There are multiple instances out there in the Fediverse dedicated to marginalized groups that have been facing harassment on other social networks. These instances typically apply a strict moderation policy and curate the content of their timeline. This happens via “defederation”, a mechanism where an instance owner can block incoming content from other instances. The strictly curated timeline on such an instance creates the feeling of a safe space, which can entice users to share very personal information.

While “defederation” limits incoming data, it has (almost) no impact on outgoing data. You can easily find profile information with sensitive PII announced publicly on prominent websites, where it is possible that the profile owner only intended to share this information on the home instance.

Legal aspects around GDPR and copyright

Many comments about Fediverse search engines question their compliance with GDPR in the absence of an explicit opt-in. These comments then sparked more general discussions around the Fediverse and GDPR.

Is it even legal to run a Mastodon server?

We all know that you cannot simply download a picture from the internet and use it on your own website. But this is exactly how the Fediverse works: After an instance receives a post or a boost referencing an image, it reaches out to the home instance, downloads the image, stores it locally, and then uses it on its own website.

I am not a lawyer, but one could argue that the author has given implicit consent to this usage by posting the image on the Fediverse. Without this consent, the entire Fediverse would not work. Gnusocial, again, is very explicit about this by requiring all content to be uploaded under CC-BY. Mastodon does not make any statements about copyright.

The Fediverse has recently seen a big influx of professional authors, like journalists or artists. How can they protect their work? What if they do not want their content to be shown in a specific context? One could set up a Mastodon instance, follow accounts boosting a couple of journalists and artists, create a custom theme making this instance look like a news site, and then earn money via advertising.

If you argue that by posting on a Mastodon instance, you did not give your implicit consent to any usage, this would be a serious problem for all Mastodon instance owners. One could simply post a popular photo, wait for it to be boosted and replicated far and wide, and then start copyright infringement lawsuits against instance owners.

The discussion around GDPR is similar, but more complicated. Especially since the Fediverse uses the industry standard WebFinger to publish profile information, I think it might be fair to assume that this information has been made “manifestly public”. But I would really like to hear the opinion of a specialised lawyer on this topic.
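For readers unfamiliar with WebFinger, this sketch shows how a lookup URL is formed from an account handle. The `/.well-known/webfinger` path and the `acct:` resource scheme come from the WebFinger standard (RFC 7033); the handle and the function name are hypothetical, and the snippet only builds the URL without sending any request.

```python
from urllib.parse import urlencode


def webfinger_url(handle: str) -> str:
    """Build the WebFinger lookup URL for a 'user@host' handle."""
    # Accept both 'jane@example.social' and '@jane@example.social'.
    user, host = handle.lstrip("@").rsplit("@", 1)
    query = urlencode({"resource": f"acct:{user}@{host}"})
    return f"https://{host}/.well-known/webfinger?{query}"


url = webfinger_url("@jane@example.social")
print(url)
```

The lookup needs nothing but the handle itself, which is part of why the author considers the published profile information to be public by design.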

Conclusion

Please toot responsibly!

Mastodon is a great software project and ActivityPub is a decent protocol for its intended use case: a federated, Twitter-like microblogging service. Unlike a centralized service, it replicates your data between different instances, and each instance decides on its own how to handle that data. This is a great approach to prevent a single entity from having full control over the network. The flip side of this coin is that no single entity has full control over the network. Nobody can enforce privacy guarantees. Meta can be held accountable if companies scrape profile information; in the Fediverse, this information is inherently public and accessible. There is no single entity to send take-down notices to. It is already hard to get content removed from Twitter, Facebook or Instagram, but there you can at least delete your own content. In the Fediverse, it is close to impossible to identify all instances on which content is replicated, let alone to get it removed.

In addition, the Fediverse requires you to provide your content under a very permissive license, so that every instance is able to display your content. It is a basic principle of the Fediverse that you do not have any influence on which instances actually display your content and in what context.

You need to be fully aware of these aspects and keep them in mind for every piece of information you share in the Fediverse.

Gnusocial is very clear about these aspects; Mastodon is not.

Proposal for a better approach

If you are concerned about privacy and data sovereignty, and you are interested in technology, you might want to take a look at the Social Profile Exchange Protocol.

I started this protocol initiative a while ago after being disappointed by existing social networks and finding that none of the proposed decentralised alternatives seemed to get it right.

Written by Stefan Kopf

Software Architect in my day job, initiator of SPXP as a side project