The previous post in this series, How I Built Emojitracker, was quite popular despite being long and rambling, and continues to be linked to fairly heavily. However, quite a bit has changed since then, so 1.5 years later, it seems an update about Emojitracker was overdue. (P.S. If you haven’t read that one, this one won’t make any sense, so do that first.)
Traffic and press coverage have continued relatively unabated, and at this point I’m reasonably confident the project is here to stay. Emojitracker has now analyzed 8.9 billion tweets. It was even used to determine the choice of ❤️ as the word of the year for 2014, and then there was this bombshell, confirming Emojitracker’s acceptance into the Emoji Illuminati:
While I will only focus on the major technical updates in this article, I feel I should note that a fair amount of nontechnical work goes into the Emojitracker “empire.” I wish there existed a better open-source contribution model for things such as marketing, community management, press relations, etc., because there is a ton to be done to keep this thing going that could benefit from the help of not just technical people. If someone knows how to make this happen, I’d love to hear your ideas.
But this post is going to be highly technical. Worth caveating: while the previous post had a narrative arc that tried to explain things to newcomers, this one is going to be mostly a brain dump of pure technical updates due to time constraints — so that people relying on the previous post as a technical guide to Emoji or Emojitracker’s architecture aren’t left with out-of-date information. I started the last post with the caveat “I am not an engineer” (which I caught a bit of flak for) — I’ll start this one with the caveat “I am not a writer.” Here be dragons! (🐲🐲🐲)
In this post, my plan is to discuss:
- The modularization of Emojitracker into over a dozen independent open source projects.
- Updates to UTF encoding of Emoji (“variant encoding”) that offered a new solution to one of the frontend rendering issues discussed in the previous post.
- The optimization process of the emojitrack-streamer service — including standardizing the API, building an acceptance framework for that API, building benchmarking tools, and then finally doing a platform migration to observe the results — twice. (The end result being a 25x increase in concurrent streaming connections per server.)
- The optimization of the emojitrack-feeder service — lots of twiddly micro-optimizations involving memory allocation, event loop handling, and hopes and dreams.
Modularization of services
As Emojitracker has grown, it’s become necessary and helpful to break out components of it into independent projects, which can be tracked and updated independently, with different dependency chains.
To see this idea at a glance, below are the Emojitracker repositories on GitHub, with activity visualized over time:
While “service oriented architecture” is increasingly becoming a meaningless buzzword, it does pretty accurately describe the Emojitracker platform today. Components communicate strictly via defined APIs over the network.
The key advantage of this modularization has been that it is now significantly easier to upgrade features for certain components of the project in isolation, or even perform a number of platform migrations completely invisibly to the other components of the overall website.
One of the tricky things I’ve noticed is that when a project becomes complex enough to span so many separate repos, it’s hard to get new contributors up to speed. I’ve turned the main Emojitracker repo into a “table of contents” of sorts, but I’d love to figure out how to do more here.
Unicode Variation Selectors
In the previous post, I discussed the problem stemming from trying to render emoji characters that also had a Unicode plaintext equivalent. If you recall, we relied upon some pretty nasty font-face hacks to work around the issue.
Thankfully, with Unicode 6.3 becoming more widely adopted, there is a much cleaner solution. Unicode specifies what are known as “variation selector” code points, which can come after a codepoint and are used to indicate an alternate variation of the glyph. There are two in particular that we care about here.
VS15 (U+FE0E) specifies the “text” variation of the glyph, whereas VS16 (U+FE0F) will get you the Emoji bitmap glyph. Support for this has been built into Mac OS X and iOS for a while now, so Emojitracker has been switched over to using that mechanism, eliminating the complex hacks from before. Instead, we simply rewrite all Emoji that support variants (but don’t have one specified) to use the optional variant form. This means more multiple-codepoint characters, but we were already dealing with that anyhow.
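For example, a minimal sketch of that rewrite step (the method name is mine, and in the real app the set of variant-capable characters comes from the emoji data tables, not a blanket rule):

```ruby
# VS15 (U+FE0E) requests the text glyph; VS16 (U+FE0F) the emoji glyph.
VS16 = "\u{FE0F}"

# Append VS16 to a character that supports variants but doesn't
# already specify one, leaving characters with a selector untouched.
def emoji_variant(char)
  char.end_with?("\u{FE0E}", VS16) ? char : char + VS16
end

heart = "\u{2764}"   # BLACK HEART, renders as a text glyph by default
emoji_variant(heart).codepoints.map { |c| c.to_s(16) }  # => ["2764", "fe0f"]
```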
Of course, as you probably already know from this series of posts, it’s never actually that simple. There are some gotchas to be aware of: for example, as seen above, in some of the existing double-codepoint Emojis, the variation selector actually goes in the middle. Wat. Oh well.
Optimizing for High Volume Streams
The heaviest portion of Emojitracker is the web streamer boxes, which consume the internal Pubsub event feed from Redis and push out relevant events to connected clients via SSE.
This is the part of Emojitracker that needs to scale with traffic — in the previous post, when I mentioned spinning up extra servers during usage spikes, these were the types of boxes we need more of.
There are a lot of websocket/SSE libraries out there that claim to be “high performance,” but their numbers typically assume a published event rate on the order of a chat app broadcasting out messages a few times a second. Emojitracker streaming servers need to send a minimum of 60 messages per second to every single client (more if they are subscribed to detail streams as well), so it’s considerably more demanding. The original production version I wrote in Ruby could handle roughly 40–50 simultaneous clients per server instance before it started to see some latency degradation.
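For a sense of what each of those messages looks like on the wire, here is a sketch of SSE framing (the event name shown is illustrative, not necessarily Emojitracker’s actual channel name):

```ruby
# Each pushed message is framed per the SSE spec: an optional "event:"
# line, one or more "data:" lines, and a terminating blank line.
def sse_frame(data, event: nil)
  frame = +""
  frame << "event: #{event}\n" if event
  frame << "data: #{data}\n\n"
  frame
end

# At 60 updates/sec per client, a server with 1,000 connected clients
# is writing 60,000 of these frames every second.
sse_frame('{"2764":1234}', event: "score_updates")
# => "event: score_updates\ndata: {\"2764\":1234}\n\n"
```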
That number is actually not terrible, but as people tend to leave Emojitracker up and running in a window (sometimes for hours or even days), usage spikes related to major press coverage would sometimes mean I could need a few dozen streamer servers running to handle the load — maybe not terrible for a business, but for me this was all paid for out of pocket. With the popularity of Emojitracker still continuing to grow, I needed a more sustainable solution.
The ideal first move in these situations is to rely on something off-the-shelf; hopefully someone else has solved the problem for you. I tested a number of the popular existing libraries with mock data to see if they could do better, and found that my results were, unfortunately, pretty consistent with everything else out there. But could this really be the best possible? After doing some back-of-the-envelope math, it seemed like there was no theoretical reason the hardware shouldn’t be able to support at least an order of magnitude more capacity.
In order to do better, I was going to have to roll my own.
Developing a Streamer Spec
Before embarking on a port, I knew I was going to have to be careful to make sure the new version handled everything in the API exactly the same way, so that I could seamlessly swap it in. This was a critical part of Emojitracker’s infrastructure, and replacing it would be a risky operation.
The best way to document is to document in code, so I created an acceptance framework that could be run directly against a staging server to make sure it did everything properly. This is a situation where unit testing is not the best solution, as I wanted to verify behavior cross-platform, and in environments where other factors (the routing layer, for example) would have an impact on the results.
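To give a flavor of the checks involved, here is a toy version of the kind of assertions such a spec makes, run against a canned response rather than a live staging server (the real suite’s endpoints and checks are Emojitracker-specific):

```ruby
# Toy acceptance checks against a canned response: correct status, the
# SSE content type, and well-formed "field:" framing in the body.
def check_sse_response(status, headers, body)
  errors = []
  errors << "bad status"       unless status == 200
  errors << "bad content-type" unless headers["Content-Type"] == "text/event-stream"
  well_formed = body.split("\n\n").all? do |frame|
    frame.lines.all? { |line| line =~ /\A(event|data|id|retry):/ }
  end
  errors << "malformed frame" unless well_formed
  errors
end

check_sse_response(200, { "Content-Type" => "text/event-stream" }, "data: {}\n\n")
# => []
```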
This proved to be invaluable, and additionally helped nail down some of the “taken for granted” particulars of the API in more explicit detail than before. (I also owe a debt of gratitude to the excellent Heroku routing team who spent a great deal of time and energy to help debug some of the more arcane interactions experienced at the routing layer.)
Of course, even if the port worked, I still needed to know “is it faster?” Turns out this is actually a nontrivial question to answer when it comes to HTTP streaming. Most HTTP benchmarking tools try to open as many connections as possible, primarily in serial, and measure how many connections can be completed (e.g. closed) in a time period.
SSE is an entirely different ballgame despite occurring over normal HTTP connections. I knew from previous monitoring that what I needed to measure was the actual message receive rate spread across multiple connected clients, versus the expected rate of sent messages. Additionally, there were cross-interactions between the number of existing streams, their message receive rate, and the latency of new stream subscriptions. No existing benchmarking tools monitored the proper dynamics for this situation.
So in order to do this, I again had to roll my own (sigh). I reluctantly created a benchmarking tool, sse-bench, which tests exactly that. It still needs some cleanup, but it was serviceable enough for the task at hand.
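The core metric is simple to state even if measuring it isn’t: aggregate messages actually received across all clients versus what the server should have sent. A sketch (names here are illustrative, not sse-bench’s actual API):

```ruby
# Messages actually received, summed across all connected clients,
# divided by the total the server should have sent in the window.
def delivery_ratio(received_counts, expected_rate:, duration:)
  expected = expected_rate * duration * received_counts.size
  received_counts.sum / expected.to_f
end

# 3 clients watching a 60 msg/s stream for 10s should each see 600 messages.
delivery_ratio([600, 600, 600], expected_rate: 60, duration: 10)  # => 1.0
delivery_ratio([600, 600, 480], expected_rate: 60, duration: 10)  # => ~0.93, server shedding load
```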
Enabling Production Testing
Testing in isolation isn’t enough — the real world is messy, so to avoid unexpected results, you need to be able to test in production, with real traffic. …But of course, you also don’t want to break everything.
The emojitrack-web frontend was modified so that based on configuration variables, a certain percentage of clients could be directed to different streaming servers. This allowed me to test alternate implementations of the streaming server with real traffic by handing them some small percentage of production traffic and seeing how they handled the load.
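A sketch of how such a config-driven split can work (the variable and host names here are made up for illustration):

```ruby
# Bucket each client into 0..99 and compare against the configured
# percentage. Hashing keeps a given client sticky to one host for the
# lifetime of the process (Ruby's String#hash is seeded per process).
def streamer_host_for(client_id, alt_percent:, alt_host:, default_host:)
  (client_id.hash.abs % 100) < alt_percent ? alt_host : default_host
end

# With alt_percent: 10, roughly 10% of clients land on the experimental box:
streamer_host_for("client-123", alt_percent: 10,
                  alt_host: "stream-go.example.com",
                  default_host: "stream.example.com")
```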
Porting to NodeJS
The ideal candidate for the job (to my naive mind) seemed to be NodeJS. After all, it excels at I/O, and this was I/O, right?
Using the streamer-spec, it was easy to write a complete port to NodeJS, which ended up feeling pretty clean to me (using CoffeeScript made it easy to port the Ruby code, since they use similar idioms). But when I benchmarked it, I was disappointed. It was better, but not significantly better. Depending on the situation, it seemed to buy about an overall 20% increase in capacity, and also gained a more linear degradation curve once it went over capacity. Still, being able to handle usage spikes with 10 servers instead of 12 didn’t seem like a massive win to me, and my pencil calculations suggested there was still a lot of potential remaining. This is the future, I was promised jetpacks!
I’d still like to know why it didn’t perform better, but at the time I asked 5–6 people who knew NodeJS well to review it and tell me what was wrong, and each of them came up empty-handed in terms of yielding big perf changes. I still think someone smart can find the answer and make it super fast, and I’m sure that now that I’m publishing this, someone will do just that and embarrass me (please do!). But at the time, I got frustrated and moved on. Performance shouldn’t need to be a black art with arcane rituals and secret knowledge. I needed something better suited for my use case.
Porting to Go
Finally, I tried porting it to Go. Go’s concurrency primitives were actually extremely well suited to the data pipeline here, and it ended up being mostly quite fun to write (albeit requiring an extremely different way of thinking about the pipeline, and significantly more code).
After a little tuning, the results were surprising: I was able to support about ~1200 simultaneous clients per server before seeing latency degradation, a 25x improvement. This means that in almost all cases, 100% of Emojitracker’s stream traffic, even during usage spikes, can be handled with only two streaming server instances. Phew!
(I carved out the generic portion into a library that can be used for your own streaming needs, found on GitHub as sseserver. In my casual benchmarking, it handles streaming 100,000 messages per second to web clients without breaking a sweat. Share and enjoy.)
When the time came to bring in this replacement for real, the aforementioned A/B infrastructure enabled me to ramp it in slowly, progressively moving the traffic balance over time. For the first day it handled 20% of all production traffic, then for a week I sent 50% to each platform, comparing how they handled various production situations and tuning. Finally, full cutover. All completely invisible to the web clients. And I could afford to buy lunch at a New York restaurant again (well, almost).
Optimizing emojitrack-feeder

While the streamers bear the bulk of Emojitracker’s scaling needs, the feeder is of utmost importance, and the pressure on it had been increasing steadily as the global popularity of emoji grew.
Emojitrack-feeder does three primary things:
- Streams from Twitter Streaming API: handles connections, reconnects and backoffs, and event/error handling for streamed items from the API.
- Process entities: deserialize JSON, pattern match on content to identify Emojis, pick from and transmute a number of fields to build our optimized output data.
- Push to Redis: Updates a number of keys for each emoji symbol, and pushes to PUBSUB streams.
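As a toy illustration of the middle step, matching emoji in tweet text might look something like this (the real feeder matches against the full emoji data set, including multi-codepoint sequences; the ranges below are deliberately incomplete):

```ruby
# Scan tweet text for emoji codepoints and report which ones matched.
# These ranges cover only a slice of the emoji space (misc symbols,
# pictographs, emoticons); a production matcher is generated from the
# full emoji data tables.
EMOJI_PATTERN = /[\u{2600}-\u{27BF}\u{1F300}-\u{1F5FF}\u{1F600}-\u{1F64F}]/

def matched_emoji(text)
  text.scan(EMOJI_PATTERN).uniq
end

matched_emoji("I \u{1F49C} emoji \u{1F49C}\u{1F49C}!")  # => ["💜"]
```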
Relatively simple, right? But as we’ve learned: at very high volume, nothing is simple.
Stream processing of this kind can be notoriously difficult to parallelize. Since we only get the source from Twitter as a single stream, we’d have to manually fan out to distributed systems on our side, and the overhead of doing this communication is actually far greater than the gains from parallelizing in our configurations, since we are significantly more IO bound than CPU bound (e.g. we already spend more time moving things around than performing operations on them). So keeping things on a single system is better in our case.
At a current average of ~500 tweets/second, this means the total time to fully process each tweet has to be under 2 milliseconds. Peaks and bursts push that significantly lower. So to scale emojitrack-feeder to match expected growth of Emoji usage worldwide, I had to get into some serious optimizations. Let’s talk about some of the more interesting ones.
Process entities: Avoiding unnecessary type conversion and memory allocation
The Ruby gems deserialize JSON into Twitter::Tweet objects, which are highly optimized and nice to use in Ruby. Some things are almost too nice. For example, if you access a field that represents a specific data type, you get back a native Ruby class (a Date or URI, for example) instead of a string.
Now, the Twitter gem is very smart about this, and only does this conversion upon first demand (via method name overloading), and then memoizes the results so subsequent access is fast.
However, in our case, we’re going to be translating those URLs right back into strings anyhow, so we’re doing a back and forth double type conversion that’s unneeded. What’s worse, the memoization is not needed by us since we only access the fields once, but it creates a heap memory allocation which later has to be garbage collected. Why does this matter? Bear in mind, in our use case we create and then discard thousands of these objects per second.
To get around this, I modified our code to sneak past the accessor methods and peek at the attribute properties directly, using them as-is when creating our new, ensmallened data object. This cut out most of the unnecessary conversions/allocations and sped up that hot path.
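The gist of the change, with a plain hash standing in for the raw attributes underlying a real Twitter::Tweet:

```ruby
# A plain hash standing in for the raw attrs behind a Twitter::Tweet.
attrs = { text: "hello \u{1F49C}",
          entities: { urls: [{ expanded_url: "https://example.com" }] } }

# Accessor style (tweet.urls) would build and memoize entity objects
# wrapping each URL. Reading the raw attributes skips both the object
# construction and the memoization allocation:
raw_urls = attrs[:entities][:urls].map { |u| u[:expanded_url] }
# => ["https://example.com"] -- already strings, nothing extra allocated
```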
ALWAYS BENCHMARK these sorts of changes. In this case I used the benchmark-ips library to compare the performance of the relevant methods with every single micro-change, before and after, so I could see what made a difference and by how much. In one iteration of my improvements, I managed to destroy nearly all the performance gains with one tiny change, which I would never have noticed had I looked only at the changeset as a whole instead of benchmarking each part comparatively. Real-world applications are complex and can defy all common sense when it comes to performance—always measure when it matters.
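benchmark-ips reports iterations per second with a statistical comparison; as a gem-free stand-in, the same before/after discipline can be sketched with stdlib Benchmark (the payload and iteration counts here are arbitrary):

```ruby
require "benchmark"
require "uri"

attrs = { entities: { urls: [{ expanded_url: "https://example.com/some/path" }] } }

n = 50_000
# "Nice" accessor style: allocate a URI object, then convert right back.
accessor_style = Benchmark.realtime do
  n.times { attrs[:entities][:urls].map { |u| URI.parse(u[:expanded_url]).to_s } }
end
# Raw attribute style: the string is already what we want.
raw_style = Benchmark.realtime do
  n.times { attrs[:entities][:urls].map { |u| u[:expanded_url] } }
end

puts format("accessor: %.3fs  raw: %.3fs", accessor_style, raw_style)
```

With benchmark-ips you would instead wrap each variant in an `x.report` block and finish with `x.compare!`, which also reports the margin of error.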
Pushing to Redis: Move pipeline creation to the server
If you recall from the previous blog post, our Redis writes were fairly simple:
For each “update”, we were actually making several distinct Redis queries. To make it more efficient, they were pipelined such that they are all sent in bulk, avoiding back and forth round-trip time over the network.
However, I realized there was some unnecessary duplication with this approach — some of those queries contained the same data, so the outbound network IO (in terms of bandwidth) was larger than it needed to be. Additionally, our client still had to parse the success status of all those commands, which added a tiny bit of overhead.
Interestingly enough, Redis allows for server-side execution of scripts. So what we can do instead is push as much of the logic as possible into a Lua script, which Redis can execute natively, and invoke it with a single remote procedure call. So instead, we use the following Lua script:
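(A sketch of the script’s shape follows — the key and channel names here are assumptions for illustration, not necessarily the production ones.)

```lua
-- Sketch of the per-tweet update script. KEYS is unused; ARGV[1] is the
-- emoji's unified codepoint id, ARGV[2] the pre-serialized tiny tweet JSON.
redis.call('ZINCRBY', 'emojitrack_score', 1, ARGV[1])
redis.call('LPUSH', 'emojitrack_tweets_' .. ARGV[1], ARGV[2])
redis.call('LTRIM', 'emojitrack_tweets_' .. ARGV[1], 0, 9)
redis.call('PUBLISH', 'stream.score_updates', ARGV[1])
redis.call('PUBLISH', 'stream.tweet_updates.' .. ARGV[1], ARGV[2])
return 1
```

Bumping the score, trimming the recent-tweets list, and publishing to both pubsub channels happen atomically on the server, in one round trip instead of several.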
When emojitrack-feeder boots, this script is loaded directly onto the Redis server via the SCRIPT LOAD command, which returns a SHA1 digest to reference the script. We then can use that digest to make a single Redis call via EVALSHA and pass along just our two variables — everything else then happens server side. The new code execution on our side therefore simply looks like this:
REDIS.evalsha(sha, [], [matched_emoji.unified, status.tiny_json])
That’s it! By reducing down to a single remote call, this not only cuts down on bandwidth and feeder processing requirements, it opens it up to more easily adding additional database side functionality per each update in the future, moving responsibility out of the feeder app itself.
Longer term: migration to another platform?
TLDR: Probably not.
I started by looking at a full port to a different language. After all, “Ruby is slow lol,” and as seen in the previous section, I’d had great success with the Go port of emojitrack-streamer.
However, never blindly trust prevailing wisdom. The maturity of the libraries you are going to use and the methods you use them with will make a lot more difference than the platform itself.
In my case, emojitrack-feeder is highly dependent on Twitter Streaming API and Redis libraries. There are thankfully good off-the-shelf Redis libraries out there for most platforms, but Twitter Streaming API libraries are another story.
The twitter and tweetstream gems for Ruby are excellent — especially when compared to what exists on other platforms. Most other implementations lack proper error/event handling, and I found that even the “fast lol” languages such as NodeJS and Go had libraries with performance characteristics that were lacking in comparison. I have a very detailed, ongoing investigation of this with benchmarks that you can read about here: the twitter streaming showdown.
Current library benchmarks aside, if I ever do decide to do a platform migration, Elixir is the most promising bet, as it offers some capabilities that would be very labor-intensive to replicate on other platforms. Being built on the Erlang VM (BEAM) means amazing support for distributed message passing, supervision trees, and even code hot-swapping, where a module can be updated in place without restarting the app.
Well, perhaps most pressingly, there’s now this:
With these newly standardized Fitzpatrick Skin Type Emoji Modifiers, there’s still a ton of work to be done just to stay up to date with the evolving Emoji landscape. Think that one is going to be easy? If you’ve been paying attention to anything in these articles thus far, you can guess that’s pretty unlikely. So, if you’d like to get involved in helping make Emojitracker more diverse (there should be some interesting data with this one!), ping me.
As Emojitracker now approaches 10 billion tweets I know the scaling work will also have to continue. These Emoji thingamabobs aren’t going away anytime soon, and the rate continues to rise.
As for everything from the first article I wanted to do — history over time, predictions, trend detection — nope! Just scaling the service has been pretty much my full-time side-project job. I’d love for some smart people to take this thing further than I can on my own. Maybe you are a data science student looking for a huge realtime dataset to analyze? I have millions of smiling poops for you. Perhaps you’re interested in making dynamic visualizations to do something interesting with hundreds of emoji tweets per second? The streaming API is public, and I’d love to help you get started.
If you’d like to contribute with code (or just hang out), you can now find me in #emojitracker on Freenode IRC. If you aren’t an open-source developer or don’t have time, you can also help out by donating via Square Cash or Gratipay to help pay the server bills.
Thanks everyone for the continuing wild ride.