Scaling the 8tracks Music API
A story of data management
At 8tracks, the most important API system is the Player. The Player is a set of HTTP endpoints that controls which song a client should currently be playing and maintains the user’s state while “playing” through a playlist. Examples of state data include the tracks the user has played, the number of skips the user has attempted, and whether the user has listened to the playlist. Over half of the requests to our application servers are Player requests, and the Player has historically been a scaling pain for us.
Scaling the Player is mostly about managing the data in our databases. The Player endpoints all do essentially the same thing with regards to the state:
1. Read the current state from the database
2. Manipulate the state (move to next song, skip the current song, start a playlist)
3. Write the state to the database
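Sketched in ruby, the cycle looks something like this. This is a hedged illustration, not our actual code: the class and field names are hypothetical, and a plain Hash stands in for the real database.

```ruby
# A sketch of the Player's read-manipulate-write cycle. Names are
# hypothetical, and a plain Hash stands in for the real database.
class PlayerSession
  def initialize(store)
    @store = store # any get/set style backend keyed by [user_id, playlist_id]
  end

  # 1. Read the current state from the "database"
  def load(user_id, playlist_id)
    @store[[user_id, playlist_id]] || { played: [], skips: 0 }
  end

  # 2. Manipulate the state, then 3. write it back
  def play_next(user_id, playlist_id, track_id)
    state = load(user_id, playlist_id)
    state[:played] << track_id
    @store[[user_id, playlist_id]] = state
    state
  end

  def skip(user_id, playlist_id, track_id)
    state = load(user_id, playlist_id)
    state[:skips] += 1
    state[:played] << track_id
    @store[[user_id, playlist_id]] = state
    state
  end
end
```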
The endpoints are backed by ruby Rack handlers and are fast enough, clocking in at ~35ms per request. Retrieval and storage of the state are simple get-and-set style queries to the database. Nothing complicated.
The problem emerges when we manage the data in our database. We keep state for a user and the playlist being listened to for at least a month. This allows a user to listen to a long playlist partially and resume it at a later time. We assume a month is a long enough time for a user to remember they should continue a playlist. After a month we are free to discard the state.
8tracks currently receives ~500K unique listeners a day. An average listener will listen to ~3 mixes per day. If the player state were represented as a single record in a SQL table with a unique index on user and playlist, we’d create 45 million (30 days x 500k users x 3 playlists) records a month. Even with an index on an updated_at column, deleting 1.5 million records that are older than 30 days from a 45 million record table in MySQL isn’t exactly performant.
A little history
I mentioned storing the state in a MySQL table because that was the first implementation of the Player system. 8tracks is a Rails shop and we have limited resources, so early on the Player system was built with Rails’ ActiveRecord backed by a MySQL table. Every night a cron job ran a query to purge data older than a month. It was simple and very effective.
It worked for the first three years of 8tracks. Since there weren’t many users and playlists, data management was easy. The table had an updated_at column with a MySQL index, so querying for stale data was efficient. When the data grew large enough that a single delete query caused problems, we changed the script to delete records in batches. No big deal.
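The batched purge can be sketched like this. It’s a hedged illustration: the batch size is arbitrary, and the caller supplies the actual delete (hypothetically something like DELETE FROM player_states WHERE updated_at < ? LIMIT n) as a block that returns the number of rows it removed.

```ruby
# A sketch of the nightly purge, deleting stale rows in small batches so a
# single huge DELETE doesn't lock up the table. The block performs one
# batched delete and returns how many rows it actually removed.
def purge_stale!(batch_size:)
  total = 0
  loop do
    deleted = yield(batch_size)
    total += deleted
    # a short, partial batch means we've caught up; stopping here (and
    # optionally sleeping between batches) keeps IO pressure low
    break if deleted < batch_size
  end
  total
end
```

In the real app something along these lines ran from cron every night.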
Then growth happened
When we started seeing growth we stored more records. Storing more records meant more delete queries (in batches). More delete queries on a highly active read/write table meant slower delete queries. The MySQL server saw more CPU and disk IO spikes. This led to slower query times across the server. Quality of service for the entire app dropped. No bueno.
We weren’t a large shop at the time (and still aren’t), so we couldn’t throw money at the problem and upgrade our database. Deleting stale data from the database shouldn’t affect quality of service. We had to do something.
At this point I’d been working at 8tracks for six to nine months. I was already using redis successfully for other features, so I decided to use redis for the Player.
The main property of redis that made it extremely attractive for this problem was key expiration. I was unable to detect any noticeable slowdown when a large number of keys expired. I also went the extra mile and dug into the redis source code to figure out how expiration worked. I wanted to knock this scaling problem out of the park!
So I rewrote the Player to use redis as a data store (side note, this was a fun refactoring which I’ll write about). All redis keys tied to a playlist and user were set to expire after one month. I no longer needed to maintain a script to remove stale data. Code deleted!
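In code, the expiring-key approach looks roughly like this. The key layout and JSON encoding are my assumptions here; `client` is anything exposing redis-rb’s `setex`/`get` interface (SETEX writes the value and its time to live in a single command, so stale state deletes itself).

```ruby
require "json"

ONE_MONTH = 30 * 24 * 60 * 60 # seconds

# Key layout is a hypothetical example, not the real 8tracks scheme.
def state_key(user_id, playlist_id)
  "player:uid:#{user_id}:pid:#{playlist_id}"
end

# SETEX sets the value and its TTL atomically; after ONE_MONTH the key
# (and with it the player state) simply disappears on its own.
def save_state(client, user_id, playlist_id, state)
  client.setex(state_key(user_id, playlist_id), ONE_MONTH, JSON.generate(state))
end

def load_state(client, user_id, playlist_id)
  raw = client.get(state_key(user_id, playlist_id))
  raw && JSON.parse(raw)
end
```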
The only problem was that redis is an in-memory database, so the amount of data I could store was constrained by the memory the servers had. Luckily, we weren’t storing that much data at the time, so one ~30GB server on Amazon EC2 was enough to hold all Player data for the entire service with plenty of room to spare.
Several months later
This solution started to show cracks. We were growing faster than expected and memory on the redis box filled up. When I initially configured redis, I set maxmemory-policy to allkeys-lru so that when memory maxed out, redis would evict the least recently used keys. (Side note: I once set up a redis server with maxmemory-policy = noeviction. I had an awesome wakeup call a week later when the server ran out of memory and the site went down. I no longer set up servers with noeviction.) Well, traffic increased and the average lifetime of redis keys dropped from a little over a month to three or four days. Users started complaining that 8tracks would reset the playlist they were listening to the day after they listened to it. No bueno.
The solution? Obviously, build a bigger box! I built a new server with more memory on Amazon’s Elastic Compute Cloud (EC2). More growth meant we could spend more money!
This calmed our users down a bit, but it obviously wasn’t the long-term solution we wanted. Memory is very costly to scale on EC2. And 8tracks ain’t rich!
We needed another solution. A solution (database) that could store more data than the available server memory. Elastic Block Store (EBS) volumes were cheaper than memory, so I wanted software that utilized both disk and memory.
I looked around for open source software that could fit the bill. After a few days of research I settled on Couchbase, as it essentially behaves like redis from a programmer’s perspective: a key-value store. The feature that caught my eye was that the data set could be larger than available system memory. This wasn’t the case with redis.
I spent a few days reading documentation, playing with the tools, and finally refactoring and deploying the 8tracks app with Couchbase set up. Everything worked pretty smoothly for a few days. Then the memory on the machines maxed out. I remember being on BART when the site slowed to a crawl. My boss was trying to contact me during this time and was freaking out a bit. Good times.
I couldn’t figure out what had happened, but after looking at the graphs (Couchbase’s built-in monitoring tools were awesome, by the way) I noticed it had to do with moving in-memory data to disk. I spent two to three weeks scouring the web and trying many different Couchbase configurations to resolve the problem. Sadly, nothing worked. Every time a Couchbase machine ran out of memory, the server’s disk IO spiked and all operations on the database slowed for several minutes. This in turn backed up our application severely. No bueno.
I scrapped Couchbase and went back to the drawing board. Feeling a bit burned, I was more cautious about using an unknown piece of technology for a very critical part of the application. At this point I looked back at the two databases I knew very well: redis and MySQL. Was there some way to use either one of them again? Could I re-architect the code to let us scale horizontally without slowing down the service when the app needed to remove stale data?
I hunkered down and looked at the strengths and weaknesses of MySQL more.
Deleting rows from a large table by iterating over an indexed datetime column is slow in MySQL.
But one day while doing other database administration, I deleted a very large table (~100 million records) from the database. I did it during off peak hours. I had every possible monitoring tool open in my browser (New Relic, our internal monitoring tool, Amazon’s RDS monitoring tools, an SSH tunnel tailing logs from one of our application servers, a terminal tailing the log of my development server…).
Then I deleted the table.
And … nothing.
The table was removed. Really fast actually. Even with ~100 million records in the table. And it didn’t affect the database or application like I thought it would.
The devil we know
With this knowledge I re-architected the Player code once again to use MySQL.
The way I changed the architecture was to partition the data by date. Each day a new table was created, and all new and updated player data was stored in that table. Every night, tables older than 30 days were dropped (a simple ruby script looked at each table’s name, which had a date suffix like _YYYYMMDD). So far so good.
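The partition naming and nightly rotation can be sketched as two small functions. The table names here are hypothetical (the post only specifies the _YYYYMMDD suffix); the real cleanup job then issues one DROP TABLE per stale name, which is fast even for tables with millions of rows.

```ruby
require "date"

# Hypothetical table name for a given day's partition; only the
# _YYYYMMDD suffix matches the scheme described in the post.
def partition_for(date)
  "player_states_#{date.strftime('%Y%m%d')}"
end

# Given today's date and the existing table names, pick the partitions
# old enough to drop. The nightly job would run DROP TABLE on each.
def stale_partitions(tables, today, keep_days: 30)
  cutoff = today - keep_days
  tables.select do |name|
    suffix = name[/_(\d{8})\z/, 1]
    suffix && Date.strptime(suffix, "%Y%m%d") < cutoff
  end
end
```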
There was still a problem though. What about users who wanted to continue a playlist they started a week ago? If a user listened to a playlist a week ago, stopped it to go on spring break, and came back to the playlist again the application should continue where the user left off. I needed a solution to find the player state from a week ago for this user.
In MySQL, I could search each table in reverse chronological order, but that’s up to 30 queries and 30 round trips to MySQL, not the most efficient set of queries. Or I could create a giant UNION query that searches all tables at once, but that seemed inelegant and would probably cause other problems. It felt risky, and at that point I wasn’t interested in risk.
Being the simple programmer that I am, I started with a naive approach. I needed a reference table, or lookup table. Something that looked like:
> select * from player_lookup_table limit 3;
+---------+-------------+----------------+
| user_id | playlist_id | data_partition |
+---------+-------------+----------------+
|       1 |         100 |       20150220 |
|       1 |         101 |       20150221 |
|       9 |        1423 |       20150301 |
+---------+-------------+----------------+
Now, given user_id 1 and playlist_id 100, I knew to look in the table suffixed with 20150220 for the user’s data. Putting this lookup data in MySQL wouldn’t solve my problem though, as it put me right back at square one: I’d still have to delete the stale records from a massive table.
This data looked awfully like something a key-value database would be good for though… maybe redis?!
What if that reference table lived in redis? Given a user_id and a playlist_id, I could call a redis GET on a computed key (maybe something like “uid:1:pid:100”) and retrieve the partition (“20150220”) where the data was stored. I could then set an expiration on that lookup key so it expires without my intervention. Since the data I had to store in redis (an 8-character date string) was significantly smaller than the data in MySQL, I wouldn’t have to scale out the redis boxes as fast as in the original redis implementation!
The scaled solution
This is the final and current solution used at 8tracks to handle the Player system. When the app loads state for a user and playlist, it looks into redis first to see which MySQL partition the data is stored in. If the redis key doesn’t exist, the app falls back to the current day’s partition. Then the app queries that MySQL partition to load the player data.
After manipulating the data (moving to the next track, etc.), the data is inserted into the current day’s partition, and the lookup key in redis is created or updated to point to the new partition. We change the partition to keep the data fresh (remember the user who went on spring break?): if a user is listening to a previously played playlist, we want to keep that data around for another 30 days, effectively refreshing the time to live for the state.
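Putting it together, the final flow can be sketched like so. All names are hypothetical, and the redis and MySQL clients are reduced to minimal interfaces: redis needs get/setex, MySQL needs read/write against a named partition table.

```ruby
require "date"

THIRTY_DAYS = 30 * 24 * 60 * 60 # lookup key TTL in seconds

# A sketch of the two-database Player store: redis maps (user, playlist)
# to a daily MySQL partition; writes always land in today's partition.
class PlayerStore
  def initialize(redis, mysql, today: -> { Date.today })
    @redis = redis
    @mysql = mysql
    @today = today # injectable clock, handy for testing
  end

  def lookup_key(user_id, playlist_id)
    "uid:#{user_id}:pid:#{playlist_id}"
  end

  def current_partition
    "player_states_#{@today.call.strftime('%Y%m%d')}"
  end

  # Ask redis which partition holds the state; fall back to today's.
  def load(user_id, playlist_id)
    partition = @redis.get(lookup_key(user_id, playlist_id)) || current_partition
    @mysql.read(partition, user_id, playlist_id)
  end

  # Write to today's partition and refresh the lookup key's TTL, so
  # active listeners keep their state for another 30 days.
  def save(user_id, playlist_id, state)
    @mysql.write(current_partition, user_id, playlist_id, state)
    @redis.setex(lookup_key(user_id, playlist_id), THIRTY_DAYS, current_partition)
  end
end
```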
The last piece is a nightly job that runs to remove any MySQL partitions older than 30 days. Piece of cake!
For MySQL, our main concerns are disk space and disk IO. With AWS we can easily add disk space by resizing the EBS volumes backing MySQL. For disk IO we can use Provisioned IOPS and increase them as needed.
For redis, our main concerns are memory usage and the staleness of lookup keys. If we see that the redis server isn’t holding lookup keys for at least 30 days, we increase the size of the server or add more servers with twemproxy or redis cluster. Luckily, this data won’t grow that fast, so scaling it is much more cost-efficient.
Finally, I do have to mention that while this solution works really well, there is the added complexity of managing two databases for the Player endpoints. If either redis or MySQL fails, all Player endpoints are out for the count. Luckily, we rely on Amazon RDS for our databases, so we’ve set up Multi-AZ deployments for the MySQL database. As for redis, we use Twitter’s twemproxy heavily for many of our redis clusters, so if one redis server in the twemproxy cluster fails, twemproxy removes it from the list of available servers and the app continues humming along. We do lose the data on the failed redis server, though; we acknowledge that and are content with the tradeoff.
I wish I could say this was easy. It wasn’t. This project probably took a year of effort. Plenty of research. Plenty of trials. And a crap ton of errors.