Twitter's Algorithm Is Now Open Source

Manoj Ahirwar
Published in Geek Culture · 2 min read · Apr 1, 2023

So, the Twitter algorithm is out. Here are the high-level points from the release:

On GitHub, you'll find two new repositories containing the source code for many parts of Twitter, including its recommendation algorithm, which controls the Tweets you see on the For You timeline.

The foundation of Twitter's recommendations is a set of core models and features that extract latent information from Tweet, user, and engagement data.

These models aim to answer important questions about the Twitter network, such as, “What is the probability you will interact with another user in the future?” or, “What are the communities on Twitter and what are trending Tweets within them?”.

In effect, Twitter uses Topics as invisible subreddits to organize Tweets algorithmically. Because the For You page is no longer chronological, viral Tweets can't be as timely as they used to be.

They have to be more evergreen. It helps if they comment on something that's already going viral, and it helps if you post a thread, reply to yourself, or start some kind of discussion in the replies. There also seems to be a bigger emphasis on video now.

One of Twitter’s most useful embedding spaces is SimClusters. SimClusters discover communities anchored by a cluster of influential users using a custom matrix factorization algorithm. There are 145k communities, which are updated every three weeks.
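To make the idea concrete, here is a toy Python sketch of how a user's interest in communities could be derived from the influential accounts they follow. It is not Twitter's code: it assumes the hard part (finding the communities with the matrix factorization step) is already done and supplied as a hypothetical known_for mapping, and the function name interested_in and all data are illustrative.

```python
# Toy sketch of the SimClusters "InterestedIn" idea, NOT Twitter's code.
# Assumption: communities of influential producers are already computed
# and given as known_for; the follow graph below is hypothetical.

def interested_in(follows, known_for, num_communities):
    """Map each user to the share of their followed producers per community.

    follows:   {user: [producer, ...]}      hypothetical follow graph
    known_for: {producer: community_index}  hypothetical community labels
    """
    embeddings = {}
    for user, producers in follows.items():
        vec = [0.0] * num_communities
        labelled = [p for p in producers if p in known_for]
        for producer in labelled:
            vec[known_for[producer]] += 1.0
        if labelled:
            # Normalize so each user's community weights sum to 1.
            vec = [x / len(labelled) for x in vec]
        embeddings[user] = vec
    return embeddings


follows = {"alice": ["nasa", "esa", "espn"], "bob": ["espn", "random_friend"]}
known_for = {"nasa": 0, "esa": 0, "espn": 1}  # 0 = space, 1 = sports
emb = interested_in(follows, known_for, num_communities=2)
print(emb["bob"])  # → [0.0, 1.0]
```

Alice ends up two-thirds "space" and one-third "sports", while Bob's unlabelled friend is simply ignored.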

Communities range in size from a few thousand users for individual friend groups, to hundreds of millions of users for news or pop culture. The more that users from a community like a Tweet, the more that Tweet will be associated with that community.
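A minimal sketch of "the more a community likes a Tweet, the more the Tweet belongs to it": every like adds the liker's community vector to the Tweet's embedding. This is an illustrative assumption, not Twitter's implementation; it ignores any time decay or normalization the real system may apply, and tweet_embedding and the sample vectors are hypothetical.

```python
from collections import defaultdict

# Toy sketch, NOT Twitter's code: each like adds the liker's community
# vector to the Tweet's embedding, so Tweets liked by many members of a
# community drift toward it. Decay and normalization are omitted.

def tweet_embedding(likers, user_embeddings):
    """Sum the community vectors of every user who liked the Tweet."""
    emb = defaultdict(float)
    for user in likers:
        for community, weight in enumerate(user_embeddings.get(user, [])):
            if weight > 0:
                emb[community] += weight
    return dict(emb)


# Hypothetical per-user community vectors (0 = space, 1 = sports).
user_embeddings = {"alice": [0.5, 0.5], "bob": [0.0, 1.0], "carol": [0.0, 1.0]}
emb = tweet_embedding(["alice", "bob", "carol"], user_embeddings)
top = max(emb, key=emb.get)
print(emb, top)  # → {0: 0.5, 1: 2.5} 1
```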

For each user session, Twitter extracts around 1,500 Tweets that it believes may interest that person, before ranking them in the For You feed. On average, the For You timeline currently consists of 50% In-Network Tweets (from people you follow) and 50% Out-of-Network Tweets.
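The candidate-gathering step can be pictured as filling a fixed budget from two pools. This sketch is an assumption, not Twitter's code. The 1,500 total and the 50/50 split come from the text, but the function mix_candidates is hypothetical, and it assumes each pool arrives already sorted best-first by its upstream source.

```python
# Toy sketch, NOT Twitter's code. Assumes each pool is already sorted
# best-first. The 1,500 total and 50/50 default come from the text;
# the real mix is an average and varies per user.

def mix_candidates(in_network, out_of_network, total=1500, in_network_share=0.5):
    """Pick about `total` candidates, split between the two pools by share."""
    n_in = min(len(in_network), round(total * in_network_share))
    # If the in-network pool runs short, out-of-network fills the gap.
    n_out = min(len(out_of_network), total - n_in)
    return in_network[:n_in] + out_of_network[:n_out]


in_pool = [f"in_{i}" for i in range(2000)]
out_pool = [f"out_{i}" for i in range(2000)]
mixed = mix_candidates(in_pool, out_pool)
print(len(mixed), sum(t.startswith("in_") for t in mixed))  # → 1500 750
```

Letting the out-of-network pool fill any shortfall is one design choice. A user who follows few accounts still gets a full candidate set.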

Twitter also predicts the likelihood of engagement between two users, using a model called Real Graph: 'The higher the Real Graph score between you and the author of the Tweet, the more of their tweets we'll include'.
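One simple way to turn "higher score, more Tweets" into numbers is to split the in-network budget across authors in proportion to their score. The proportional rule, the function allocate_by_real_graph, and the scores below are all hypothetical illustrations, not how Twitter actually does it.

```python
# Toy sketch, NOT Twitter's code. Assumption: in-network slots are shared
# out in proportion to each author's (made-up) Real Graph score.

def allocate_by_real_graph(scores, budget):
    """Split `budget` slots across authors proportionally to their score."""
    total = sum(scores.values())
    if total == 0:
        return {author: 0 for author in scores}
    raw = {a: budget * s / total for a, s in scores.items()}
    alloc = {a: int(v) for a, v in raw.items()}
    # Largest-remainder rounding so the slots add up exactly to budget.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda a: raw[a] - alloc[a], reverse=True)
    for a in by_remainder[:leftover]:
        alloc[a] += 1
    return alloc


scores = {"close_friend": 0.5, "coworker": 0.25, "celebrity": 0.25}
print(allocate_by_real_graph(scores, budget=10))
# → {'close_friend': 5, 'coworker': 3, 'celebrity': 2}
```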

Another factor is which Tweets the people you follow are engaging with. This is not a revelation, just a point worth noting.
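As a rough illustration of this signal, the sketch below ranks Tweets by how many of the accounts you follow engaged with them. It is a toy counter, not Twitter's production service. The function engaged_by_follows and the engagement data are hypothetical.

```python
from collections import Counter

# Toy sketch, NOT Twitter's code: rank Tweets by how many of the
# accounts you follow engaged with them. Data below is hypothetical.

def engaged_by_follows(following, engagements):
    """Return Tweet IDs ordered by number of followed accounts that engaged."""
    counts = Counter()
    for account in following:
        # dict.fromkeys dedupes while keeping order, so each account
        # counts at most once per Tweet.
        for tweet_id in dict.fromkeys(engagements.get(account, ())):
            counts[tweet_id] += 1
    # most_common() keeps first-seen order among equal counts.
    return [tweet_id for tweet_id, _ in counts.most_common()]


following = ["alice", "bob", "carol"]
engagements = {
    "alice": ["t1", "t2"],
    "bob": ["t2", "t2"],
    "carol": ["t2", "t3"],
    "dave": ["t4"],  # not followed, so ignored
}
print(engaged_by_follows(following, engagements))  # → ['t2', 't1', 't3']
```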

Tweet ranking is via a ‘~48M parameter neural network which is continuously trained on Tweet interactions to optimize for positive engagement (e.g. Likes, Retweets, and Replies)’. There’s no note, however, on how Twitter determines positive vs negative engagement in this context.
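A common pattern for models that predict several engagement types is to combine the predicted probabilities into one score with a weighted sum. The sketch below assumes that pattern. The weights, including the negative one for a hypothetical "negative_feedback" signal, are invented for illustration and do not describe how Twitter separates positive from negative engagement.

```python
# Toy sketch, NOT Twitter's code. Assumption: the ranker outputs one
# probability per engagement type, and a weighted sum turns them into a
# single score. All weights here are made up for illustration.
WEIGHTS = {"like": 1.0, "retweet": 2.0, "reply": 3.0, "negative_feedback": -10.0}


def rank_score(predicted, weights=WEIGHTS):
    """Weighted sum of predicted engagement probabilities."""
    return sum(w * predicted.get(kind, 0.0) for kind, w in weights.items())


def rank(candidates, weights=WEIGHTS):
    """Sort candidate Tweets from highest to lowest score."""
    return sorted(candidates, key=lambda c: rank_score(c["pred"], weights), reverse=True)


candidates = [
    {"id": "b", "pred": {"like": 0.6, "negative_feedback": 0.05}},
    {"id": "a", "pred": {"like": 0.5, "retweet": 0.1, "reply": 0.05}},
]
print([c["id"] for c in rank(candidates)])  # → ['a', 'b']
```

Tweet "b" has the higher like probability, but the small chance of negative feedback outweighs it under these made-up weights.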
