In my vision Syften.com was a tool that scans developer-friendly places, like Hacker News, Stack Exchange and Reddit. But my users kept asking for Twitter. So I set out to see what it would take to scan Twitter in its entirety.
A firehose is a complete stream of public messages. I’m already receiving all available Reddit and Stack Exchange content, and Twitter appears to have a comparable amount of traffic, so consuming its firehose seemed reasonable.
Having found very little information about pricing on twitter.com, I dug around the internet and found this Hacker News thread. The large amount of traffic that they mention is not due to a large number of tweets. It’s because the API sends so much metadata! For example, each tweet contains the full user profile with details such as profile_background_color. One tweet takes up roughly 14kB on the wire. That’s a minor road bump though; the real blocker is the price. The thread talks about hundreds of thousands of dollars per month. I can’t afford that.
Let’s try a different approach.
Standard Search API
Perhaps I can use the Standard Search API — simulate a firehose by querying the API for all of my users’ keywords.
Even with the quota of 450 calls per 15 minutes I thought I could pull it off, until I read this: “Not all Tweets will be indexed or made available via the search interface.”
It was wishful thinking anyway, let’s investigate further.
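For reference, here is a rough sketch of what the 450 calls per 15 minutes quota would have allowed. The function names are made up for illustration, and the sketch assumes one call per keyword per poll, which is optimistic because a busy keyword needs several paginated calls.

```python
# Quota quoted in the text: 450 calls per 15-minute window.
CALLS_PER_WINDOW = 450
WINDOW_MINUTES = 15


def calls_per_day() -> int:
    """Total search calls available in 24 hours under the quota."""
    windows_per_day = (24 * 60) // WINDOW_MINUTES
    return CALLS_PER_WINDOW * windows_per_day


def min_poll_interval_minutes(keywords: int, calls_per_keyword: int = 1) -> float:
    """Shortest interval at which every keyword can be polled once.

    calls_per_keyword=1 is an assumption, not a measured value.
    """
    calls_needed = keywords * calls_per_keyword
    return calls_needed * WINDOW_MINUTES / CALLS_PER_WINDOW


print(calls_per_day())                 # 43200
print(min_poll_interval_minutes(900))  # 30.0
```

So on paper the quota is workable for a few thousand keywords; the incomplete index is what rules the approach out.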
Premium Search API
The Premium Search API does not seem to have that limitation: it provides Tweets posted within the last 30 days. Presumably all of them. Access to this endpoint isn’t free, but I’m sure Twitter is reasonable.
Let’s see how much it would cost me. $2,499.00/10,000 requests, or about $0.25 per request. If I wanted to implement notifications with a delay of up to 24 hours, I would need one request per filter per day, so a user on the Premium plan (100 filters) would cost me 100 filters * 30 days * $0.25 = $750/month. Ouch.
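The same estimate as a small sketch, so the polling frequency can be varied. The function name and the polls_per_day parameter are hypothetical, and the sketch assumes the per-request price stays flat regardless of volume, which the quoted price does not guarantee.

```python
PRICE_PER_TIER_USD = 2499.00  # price quoted in the text
REQUESTS_PER_TIER = 10_000    # requests included at that price


def monthly_cost_usd(filters: int, polls_per_day: int = 1, days: int = 30) -> float:
    """Cost of polling every filter `polls_per_day` times a day for `days` days.

    Assumes a flat per-request price (an assumption, not a published rate).
    """
    price_per_request = PRICE_PER_TIER_USD / REQUESTS_PER_TIER
    return filters * polls_per_day * days * price_per_request


# One user with 100 filters, one poll per filter per day.
print(round(monthly_cost_usd(100), 2))  # 749.7
```

Under the same assumptions, hourly polling (polls_per_day=24) multiplies the bill by 24, to roughly $18,000 per user per month.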
Let’s conclude the Search API was not meant for this and look at other options.
The PowerTrack API
The PowerTrack API is almost like a firehose. It’s a live stream of all Tweets that match your filters. Up to 250,000 filters per stream, up to 2,048 characters each. Perfect, how much does it cost? “To use this API, you must first set up an account with our enterprise sales team.” Okay, fine.
Creating A Twitter Developer Account
I created the account on 13 August, 2019. Twitter makes a point of showing that it cares about user privacy, so after exchanging around 5 emails and promising I won’t do business with governments, analyze the tweets or display them anywhere, they approved my account 25 hours later. Not great, but not bad either. Let’s contact sales. Twitter recently started focusing on making money and promised to treat businesses better, so we should be able to make a deal.
Welcome To Twitter’s Corporate Nightmare
I submitted the contact request form to sales. A few days passed without an answer. I resubmitted the form. Again no answer. It seems I’m not the only one ignored by Twitter’s sales team. I submitted the form a bunch more times, this time providing email addresses from all of my different domains.
9 days later, on 22 August, I finally got an email. I was told we could discuss Twitter’s API plans over video chat, the soonest available time being 27 August. What’s 5 more days? I waited.
Discussing numbers over video is rather inconvenient, and they didn’t seem to want to give me any concrete information anyway; it was just chitchat. I asked them to send me the proposed plan with all the numbers by email, which they promised to do “once they get back to their desk”. It must have been rather far away because I got the email 12 hours later.
The pricing was ridiculous and the number of Rules I could track was too low to be usable. I asked for pricing of plans with a different number of Rules, to which they asked me to clarify what I meant by “Rules”, a term they had used themselves in the previous email… Infuriating! Are they paid by the number of emails they send? After clarifying I got an automatic response telling me my salesperson was away on vacation until Friday, 30 August. Sure, let’s pick it up on Monday, 2 September.
I got a reply on Tuesday, but it wasn’t the answer I was hoping for. After an unproductive video call, an infuriating number of beat-around-the-bush emails and 21 wasted days I concluded I probably can’t afford any of Twitter’s premium plans. I say probably, because they didn’t want to share them with me.
Is it impossible for Syften to support Twitter?
One last option: the arcanely named POST statuses/filter stream. The default access level allows up to 400 track keywords, 5,000 follow user IDs and 25 0.1–360 degree location boxes. “If you need access to more rules and filtering tools, please apply for enterprise access.” I’m not falling for that one again. Can we make the basic limits work?
On Syften’s Premium plan each user gets 100 filters, so the 400 track keywords would let me support only 4 users.
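To make those limits concrete, here is a minimal sketch that checks a set of filters against the default access level before building the request parameters. The helper names are made up for illustration. Sending keywords as one comma-separated track value is how I understand the endpoint to work, and the sketch deliberately stops short of authentication and the actual HTTP request.

```python
from typing import Dict, Iterable, Optional

# Default access limits quoted in the text.
MAX_TRACK_KEYWORDS = 400
MAX_FOLLOW_USER_IDS = 5000


def build_filter_params(track: Iterable[str] = (),
                        follow: Iterable[int] = (),
                        language: Optional[str] = None) -> Dict[str, str]:
    """Build form parameters for a statuses/filter request (hypothetical helper)."""
    keywords = list(track)
    user_ids = list(follow)
    if len(keywords) > MAX_TRACK_KEYWORDS:
        raise ValueError(f"{len(keywords)} keywords exceed the limit of {MAX_TRACK_KEYWORDS}")
    if len(user_ids) > MAX_FOLLOW_USER_IDS:
        raise ValueError(f"{len(user_ids)} user IDs exceed the limit of {MAX_FOLLOW_USER_IDS}")
    params: Dict[str, str] = {}
    if keywords:
        params["track"] = ",".join(keywords)
    if user_ids:
        params["follow"] = ",".join(str(uid) for uid in user_ids)
    if language:
        params["language"] = language
    return params


def max_supported_users(filters_per_user: int) -> int:
    """How many users fit into a single stream's keyword budget."""
    return MAX_TRACK_KEYWORDS // filters_per_user


print(max_supported_users(100))  # 4
print(build_filter_params(track=["syften", "hacker news"], language="en"))
```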
I could ask my users to provide their Twitter Developer Account credentials and process their streams. It seems like a grey area though, and asking my users to apply for a Twitter Developer Account is not what I would call convenient.
But what if we monitor the 400 most popular keywords, and construct an almost firehose? A nice try, but unfortunately POST statuses/filter will limit the stream to no more than 50 tweets/second.
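To put the cap in perspective, a back-of-the-envelope sketch of what a fully saturated stream would deliver. It reuses the roughly 14kB per tweet figure mentioned earlier, which is an assumption here: payloads on this endpoint may well be a different size.

```python
TWEETS_PER_SECOND_CAP = 50  # limit quoted in the text
KB_PER_TWEET = 14           # rough payload size; assumed to apply to this endpoint too
SECONDS_PER_DAY = 24 * 60 * 60


def tweets_per_day(rate: int = TWEETS_PER_SECOND_CAP) -> int:
    """Tweets delivered in a day by a stream running at the cap."""
    return rate * SECONDS_PER_DAY


def gigabytes_per_day(rate: int = TWEETS_PER_SECOND_CAP,
                      kb_per_tweet: float = KB_PER_TWEET) -> float:
    """Daily traffic in decimal gigabytes (1 GB = 1,000,000 kB)."""
    return tweets_per_day(rate) * kb_per_tweet / 1_000_000


print(tweets_per_day())               # 4320000
print(round(gigabytes_per_day(), 2))  # 60.48
```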
The Best I Can Do
If I can’t get all tweets perhaps I can at least get the most relevant ones. I created a new filter: track="https", language="en". This allowed me to receive around 12% of English tweets containing a link. If a tweet gets retweeted 10 times, and each copy has an independent 12% chance of reaching me, this gives me a roughly 1 - 0.88¹⁰ ≈ 72% chance of receiving it. Not great, but not bad either.
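The same calculation as a short sketch, to show how the odds change with the number of retweets. It assumes every copy of the tweet matches the filter and is sampled independently at the same rate, which is a simplification. The function name is made up.

```python
def chance_of_receiving(sample_rate: float, copies: int) -> float:
    """Probability that at least one of `copies` independent copies gets through."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return 1.0 - (1.0 - sample_rate) ** copies


# 12% sample rate, as observed with the "https" filter.
for copies in (1, 5, 10, 20):
    print(copies, round(chance_of_receiving(0.12, copies), 2))
```

For 10 copies this gives 0.72, matching the estimate above; a tweet that nobody retweets stays at 12%.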
It has to do for now.
But Not The Best You Can Do
You can create your own app and enjoy the 400 keyword limit of POST statuses/filter. It’s not enough for someone running a keyword notification product, but it’s generous enough for an individual.
Twitter said it’s working on “a more flexible and powerful endpoint to replace statuses/filter”. The roadmap card was created on 5 Apr 2017, and then on 17 May 2019… it got archived. Luckily a quick look around revealed it had been replaced by a new card:
Filtered Stream (Labs)
A new filtered streaming service to replace POST statuses/filter. This will include the same powerful filtering capabilities supported today for enterprise APIs. Filter Tweets for a realtime stream that delivers just the Tweets you care about.
Cool. If I’m able to filter out retweets I just might get all the data I want and still fit in the 50 tweets/second limit. Let’s just hope Twitter’s development team moves faster than their sales team.