50,000 new users in a day: how we built a product from scratch to handle scale

Aidan Sliney
Published in Soundwave Stories
8 min read · Sep 7, 2015

The 18th of September, 2013, started off like any other day. Soundwave had been live for two months. Technically, we'd had a flawless launch. Of course there were UI bugs and features that needed improving, but from an architecture point of view it was a home run. We knew iOS 7 was going live that day and we were ready. We were so confident in Apple's new OS that we had rebuilt our iOS app solely for iOS 7, dropping iOS 6 entirely in the process. Ballsy move. iOS 7's UI was such a drastic change that we placed a bet on a massive early uptake. The bet paid off (thanks @bboyle18).

Excited by the release of our new iOS 7 build, we had all gathered in the office. Apple's release was in the evening (Irish time), so work had ceased for the day. We had our Datadog dashboards (@datadog) up and running as we waited patiently.

Right on time, Apple being punctual as ever, we began to see movement on our real-time graphs, one of the benefits of putting time into internal metrics development (thanks Dave @DavefromDublin and Clodagh @ClodaRgh). Users were starting to update to our new app. Unexpectedly, we began to see a lift in registrations. We put it down to extra activity in the App Store from users upgrading their OS and thought it would level off. We were wrong. The registrations kept coming.

50,000 downloads in one day

It was then we spotted the reason for the surge in activations: Apple had placed us smack bang on the front of the App Store. We flicked through the different stores: Ireland, the UK, America, Japan, Germany. We were featured in all of them, on what I can only assume was one of the busiest days ever on the App Store.

“I think iOS 7 is the biggest day in technology ever. There’s never been another day like this in the history of the universe where hundreds of millions of people will see a big change to something that they’re used to.” — Phil Libin, CEO, Evernote

We had the honour of being placed in the “Designed for iOS 7” banner beside Runkeeper and Evernote, as well as in the “Best New Apps” section of the App Store.

Soundwave in the App Store’s “Designed for iOS 7” feature

Over a 24-hour period, more than 50,000 new users registered for Soundwave. This usage was on top of an already active user base. We were handling a couple of hundred API calls per second. Not too shabby. To handle the load we scaled horizontally, adding extra instances to our Amazon Web Services (AWS) infrastructure, and that was all we needed to do. One of the scaling decisions Dave had made was not to depend on autoscaling. This put more dependency on us to scale when required, but gave us more visibility on performance (a rough sketch of this manual approach follows the list of pillars below). Most prefer to have autoscaling on from the start, but I still stand by our decision. Handling that scale from a standing start is no mean feat, and we did it with ease. Once the numbers started to settle, we were able to scale vertically, swapping in the correct “bigger” boxes to handle the activity.

Some may think web services like AWS have taken the complexity out of scaling (to the size we cater for), but they are wrong. The complexities and nuances of building something to scale are still very much there. This article does not get into the hidden complexities like idempotent updates, relaxing ACID, multi-threaded design and distributed queues. If you are after those details, check out our articles on metrics, our metrics engine and our map. These tend to be the stories in a sprint that seem inconsequential and underwhelming to the success of your product, whereas in fact the knowledgeable out there know they are the stories that will keep your ship afloat. Dave championed these stories and had us ready when they were required.

Here I will outline the three key pillars, as I see them, to conquering scale:

  1. The Team;
  2. The Process;
  3. The Architecture.
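
To make that manual approach concrete, here is a minimal sketch of the kind of call involved, using the AWS SDK for Java. The AMI ID and instance type are hypothetical, and this is illustrative rather than our actual tooling; the point is that a human, not a policy, decides when capacity is added.

    import com.amazonaws.services.ec2.AmazonEC2;
    import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
    import com.amazonaws.services.ec2.model.RunInstancesRequest;

    public class ManualScaleUp {
        public static void main(String[] args) {
            AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

            // Launch two more application servers from a pre-baked image.
            // Doing this by hand, rather than via an autoscaling policy,
            // keeps the decision (and the visibility) with the team.
            ec2.runInstances(new RunInstancesRequest()
                    .withImageId("ami-0123456789abcdef0") // hypothetical app-server AMI
                    .withInstanceType("m1.large")         // hypothetical size
                    .withMinCount(2)
                    .withMaxCount(2));
        }
    }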

The Team

Before we put together our engineering team, we surrounded ourselves with tech advisors: people who had been there before and had the battle scars to prove it. Our key advisor was Paddy Benson (@pbenson), skilled in building tech teams and scaling product. We leaned heavily on his experience in building the right team. We had decided on the team size and the skillsets we needed. One of the key mindset changes was understanding that we were building a “Web Service” with an Android and iOS frontend (still native, of course) rather than an Android and iOS app. The difference is that the key feature being built was the interaction between client (app) and server rather than features within the client app. This thinking focused our search when hiring. We were not looking for Android or iOS engineers; we were looking for full stack engineers with skills in Android, iOS and architecture.

We now had our team: three full stack engineers (@davefromdublin, @bboyle18 and @Dockheas23). Dave was highly skilled and experienced in architecture, Brian in iOS, and George, gifted at anything he touched, fancied the Android challenge. Although our engineers were full stack, they were each leaders in their field. They each had the autonomy to build their products how they wanted, as long as it linked into the overall setup. Dave led all decisions on scale and reliability. If it fell over, we would be looking to him. If it scaled, it would be because of him. Brian built the iOS 7 app and made sure the client adhered to the setup. His high quality builds got us featured multiple times by Apple. Check out his article here. George took the lead on what is now our ‘Editors’ Choice’ Android app. The Michelin star of Android development. Over the years we have continued to expand our engineering team with the same methodology: language-agnostic full stack engineers. Take Clodagh (@ClodaRgh) for example. She has led iOS development, our metrics engine, API and database designs, and security decisions, and has tackled problems from smoothing music playlists to iOS performance tuning to tough distributed concurrency bugs. This has allowed the team to grow without anyone getting siloed as the “only one who knows”; if we were to lose an engineer, multiple others were ready to take their seat (bus syndrome).

The Process

Being a small team, it is very easy to fall into the trap of starting to build too early. Although we were always tinkering with prototypes and demos, we didn’t begin to build our MVP until our development process was in place. Our engineers were experienced, and we utilised this. We brought in, and embraced, Jira, Bitbucket, two-week sprints, demos, estimates, code reviews, testing and, later on, continuous integration. As everyone says, “our own form of Scrum”. Unlike some who say that without fully understanding Scrum, we knew the rules and broke only the ones we chose to. We were ready to build.

We knew the ability to scale was going to be essential to the success of Soundwave. We also knew there would be no scale without the right product. These were our initial goals: build a product that could scale, but also build the right product. The first thing you learn in the start-up / product world is to build lean. Excellent. We all now know to build your Minimum Viable Product (MVP) and get your learnings. What most people forget to mention are the unintentional foundations that, more often than not, are laid when building your MVP. Architecture decisions made when building your MVP are difficult to change, if not for technical reasons then due to time constraints. How our product was going to scale was almost certainly going to stay consistent regardless of product changes. This was the section of the product we were going to build right the first time.

The server needed to be able to communicate with the clients, ingest data (asynchronously and synchronously) and make data available to the clients. Infrastructure and languages were discussed and locked down. For us, AWS, MongoDB, Java, JSON, and native Android / iOS were our spine. Of course the infrastructure could change vertically or horizontally, and other languages were used for certain tasks, but the core remained the same. Our foundation was set.

To learn what product we needed to build, we worked lean. “Keep it simple, stupid.” Before we launched we had over 800 beta users on our Android and iOS apps (don’t tell Apple!). Building features, changing features and dumping features all happened over a three-month period. None of this took away from the stability we were building on the backend. We were ready to scale.

The Architecture

A picture paints a thousand words. Rather than show you our architecture independent of the other touch points and items that take up our mental resources, we put it all together in one diagram.

The key to our scaling capabilities lies in the skills of our team, with Dave leading the scaling conversations and the others keeping him on his toes. Every move was discussed at length. We built strong relationships with the AWS (Declan Kavanagh) and MongoDB (@samuel_weaver, @MongoDB) teams, both based in Dublin, Ireland. We learned from their other customers and used their teams to review what we had built. Both teams were amazingly open to conversations and more than happy to help. We repaid them wherever possible, usually by presenting at their conferences or sitting on their panels.

Here is a quick, high-level explanation of the key components of our architecture:

ELB: The Elastic Load Balancer (ELB) receives all the requests coming in from our Android and iOS apps. If we add more machines to handle high loads, the ELB makes sure the requests are shared across these machines.
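
Launching a machine is only half the job; it also has to be put behind the load balancer before it takes traffic. A minimal sketch using the classic ELB API from the AWS SDK for Java, with a hypothetical load balancer name and instance ID:

    import com.amazonaws.services.elasticloadbalancing.AmazonElasticLoadBalancing;
    import com.amazonaws.services.elasticloadbalancing.AmazonElasticLoadBalancingClientBuilder;
    import com.amazonaws.services.elasticloadbalancing.model.Instance;
    import com.amazonaws.services.elasticloadbalancing.model.RegisterInstancesWithLoadBalancerRequest;

    public class RegisterWithElb {
        public static void main(String[] args) {
            AmazonElasticLoadBalancing elb =
                    AmazonElasticLoadBalancingClientBuilder.defaultClient();

            // Tell the (classic) ELB to start routing requests to the new box.
            elb.registerInstancesWithLoadBalancer(
                    new RegisterInstancesWithLoadBalancerRequest()
                            .withLoadBalancerName("api-elb")                  // hypothetical name
                            .withInstances(new Instance().withInstanceId("i-0abc1"))); // hypothetical ID
        }
    }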

Application Servers: The application servers are the heart of our architecture. Written in Java and running on Tomcat, the application layer ingests data synchronously into our datastore, MongoDB, and asynchronously onto our SQS queues.
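
Our actual endpoint code was more involved, but the shape of that dual write can be sketched like this, using the MongoDB Java driver and the AWS SDK for Java. The connection string, database, collection, queue URL and field names are all hypothetical:

    import java.util.Date;
    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class PlayIngest {
        private final MongoCollection<Document> plays = MongoClients
                .create("mongodb://localhost")      // hypothetical connection string
                .getDatabase("soundwave")
                .getCollection("plays");
        private final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        private final String queueUrl =
                "https://sqs.eu-west-1.amazonaws.com/123/play-roadie"; // hypothetical

        public void ingest(String userId, String artist, String track) {
            Document play = new Document("userId", userId)
                    .append("artist", artist)
                    .append("track", track)
                    .append("playedAt", new Date());

            plays.insertOne(play);                    // synchronous: stored before we respond
            sqs.sendMessage(queueUrl, play.toJson()); // asynchronous: enrichment happens later
        }
    }

The design choice is that the client's request only waits on the cheap write; everything slow is pushed behind the queue.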

SQS: Our SQS queues feed our roadies with data. They queue up any asynchronous work that needs to be done by a roadie. If the roadies are under pressure, the queues give them breathing room by holding the backlog of work.
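
A nice side effect is that the backlog is observable: queue depth tells you when the roadies are falling behind. A hypothetical check (the attribute name is a real SQS attribute; the queue URL is made up):

    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
    import com.amazonaws.services.sqs.model.GetQueueAttributesRequest;

    public class QueueDepth {
        public static void main(String[] args) {
            AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
            String queueUrl = "https://sqs.eu-west-1.amazonaws.com/123/play-roadie"; // hypothetical

            // ApproximateNumberOfMessages is the visible backlog; if it keeps
            // climbing, it is time to add more roadies.
            String backlog = sqs.getQueueAttributes(
                    new GetQueueAttributesRequest(queueUrl)
                            .withAttributeNames("ApproximateNumberOfMessages"))
                    .getAttributes().get("ApproximateNumberOfMessages");
            System.out.println("Backlog: " + backlog);
        }
    }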

Roadies: Our roadies are our workers. They are mainly used to populate data. Our primary roadie (the play roadie) fetches images, ISRCs, 30-second clips, etc. for each song play captured. We also use roadies for sending out notifications and for business intelligence. Essentially, anything that can be done asynchronously is done by a roadie.
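
At its core a roadie is just a poll-process-delete loop. A minimal sketch with the AWS SDK for Java; the enrich method is a hypothetical stand-in for the real work of looking up ISRCs, artwork and clips:

    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
    import com.amazonaws.services.sqs.model.Message;
    import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

    public class PlayRoadie {
        public static void main(String[] args) {
            AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
            String queueUrl = "https://sqs.eu-west-1.amazonaws.com/123/play-roadie"; // hypothetical

            while (true) {
                for (Message m : sqs.receiveMessage(
                        new ReceiveMessageRequest(queueUrl)
                                .withMaxNumberOfMessages(10)
                                .withWaitTimeSeconds(20)) // long polling keeps idle roadies cheap
                        .getMessages()) {
                    enrich(m.getBody()); // hypothetical: look up ISRC, artwork, 30-second clip
                    sqs.deleteMessage(queueUrl, m.getReceiptHandle()); // delete only after success
                }
            }
        }

        private static void enrich(String playJson) {
            // Stand-in for the real enrichment work.
        }
    }

Deleting the message only after the work succeeds means a crashed roadie simply lets the message reappear for another worker.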

MongoDB: MongoDB is our DBMS, chosen for its geo features (blog post), its non-relational structure and its ability to scale (once sharded). At the time of the iOS 7 release we had yet to shard.
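
Those geo features are what power our music map. A minimal sketch of the kind of query involved, using the MongoDB Java driver; the connection string, collection and field names are hypothetical:

    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import com.mongodb.client.model.Indexes;
    import com.mongodb.client.model.geojson.Point;
    import com.mongodb.client.model.geojson.Position;
    import org.bson.Document;

    public class NearbyPlays {
        public static void main(String[] args) {
            MongoCollection<Document> plays = MongoClients
                    .create("mongodb://localhost")  // hypothetical connection string
                    .getDatabase("soundwave")
                    .getCollection("plays");

            // A 2dsphere index makes geo queries cheap.
            plays.createIndex(Indexes.geo2dsphere("location"));

            // Plays within 1 km of central Dublin (longitude, latitude).
            for (Document play : plays.find(Filters.nearSphere("location",
                    new Point(new Position(-6.26, 53.35)), 1000.0, 0.0))) {
                System.out.println(play.toJson());
            }
        }
    }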

Summary

We were able to scale to 50,000 new users in a day because of our three pillars: team, process and architecture. Employing the lean mentality is great for finding product-market fit (PMF), but make sure you know the potential consequences. If you don’t architect your product right from the start, know that you are building “a house without a foundation”.
