How to Scale a 100% Load Web App
Last week I published a side project called Wurstify. It’s a Chrome extension that adds beards to all faces on the Internet. More specifically, the Conchita Wurst beard. The core algorithm was implemented by David Fankhauser; I developed the web proxy and the Chrome extension.
You can have a look at it here; it’s still online. And the only reason for that is Cloud Scalability (buzzword, I know). Let me explain…
Wurstify works as follows: The Chrome extension intercepts all image requests before the browser even sends them off and redirects them to our server (rest assured, we use SSL whenever the original request used SSL, and we do not log any IPs, URLs or traffic). Our server downloads the image, runs a face detection algorithm on it, does some heavy lifting with texture transformation/mapping and returns the resulting wurstified image to the browser.
When we developed the first prototype, we were surprised that it worked at all. It took about 2–3 seconds for an average page to load. That might not be a good loading time for a regular web page, but considering the number of indirections involved, we were pretty happy with it.
So we installed the extension in a second Chrome and both surfed around the web. Now it took about 4–5 seconds for all images to load, in each browser. Had we added a third Chrome, load times would probably have been around 7–8 seconds. Can you see a pattern here?
The problem: Downloading each and every image, running the algorithm on it, and then sending the result off to the user basically means 100% IO load, followed by 100% CPU load, followed by 100% IO load again¹. For one image. A typical web page has somewhere between 20 and 60 images. And this is just for one concurrent user on one page.
“That thing is definitely not scalable.” — Me, last week
Anyway, we bootstrapped a simple landing page and put it online — what could possibly go wrong?
Long story short: it took us 3 days to be featured on Product Hunt, another day for some blogs and news media to write about us, and the next day the team around Conchita Wurst (!) called us on the phone to tell us how awesome Wurstify is and how much fun they were having with it. Conchita Wurst herself tweeted about Wurstify another 2 days later. Hundreds of tweets, thousands of installations and tens of thousands of visitors, all within one week. Now that escalated quickly.
But wait, how is it even possible that Wurstify is still online, with ~500 concurrent users, good response times, and 5-star ratings on the Chrome Web Store? Because that “definitely not scalable” thing turned out to be very scalable after all!
When we noticed that the load was going up, we just added a second server. Three minutes, boom, load was down again. When we noticed that we were ranked #8 on Product Hunt, we added another 3 servers. Five minutes, boom, load back down again. When Conchita Wurst tweeted about us and we wanted to be prepared to serve her 130 000 followers, we added another 15 servers. A few minutes later, we were done. (That last one turned out to be a little overprovisioned, but hey, never underestimate fandom.)
In the last few days we have been busy reading and responding to users, finding out where our visitors were coming from, seeing who wrote about us, and monitoring server load to make sure everything worked as it should. And while this whole project exists just for fun and we don’t depend on it in any way, we were very happy that the following things were not on our to-do list last week:
- Connecting to 100% busy servers to diagnose load issues
- Phoning someone at the data centre
- Frantically comparing server hosting plans
- Manually setting up one new server after the other
How did we manage to keep those things off the list?
I know there are lots of good PaaS providers out there, probably handling some of the following things as well (or even better). However, we decided to go the bare-bones IaaS way instead. We are hosting at DigitalOcean, which allows us to add a new server in 55 seconds. We wrote a bash script that sets up and provisions each server once it is up and running, which takes another ~3 minutes. Together with some binaries, the script forms a self-contained installation package that turns any Ubuntu server into a Wurstify proxy and finally prints out a few commands, which we then copy over to the load balancer to add the server to the cluster.
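We never published the actual script or binaries, but as a rough, hypothetical sketch of the idea (the package names, paths, port, and upstream line below are all made up), a self-contained setup script looks something like this:

```shell
#!/usr/bin/env sh
# Hypothetical provisioning sketch -- package names, paths, and the
# load-balancer line are illustrative, not the real Wurstify script.
set -eu

APP_DIR="/opt/wurstify"
PORT=8080

# 1. Install runtime dependencies (illustrative; the real steps need root):
#      apt-get update -qq && apt-get install -y imagemagick

# 2. Unpack the bundled proxy binaries shipped alongside this script:
#      mkdir -p "$APP_DIR" && tar -xzf wurstify-proxy.tar.gz -C "$APP_DIR"

# 3. Start the proxy (e.g. via an init script):
#      "$APP_DIR/proxy" --port "$PORT" &

# 4. Print the line to paste into the load balancer's config so the
#    new machine joins the cluster:
IP="$(hostname -I 2>/dev/null || echo 127.0.0.1)"
IP="${IP%% *}"   # keep only the first address
echo "server ${IP}:${PORT};  # paste into the upstream block"
```

The key property is that the script needs nothing but a fresh Ubuntu box and the bundled binaries; everything else is baked in.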
To keep an eye on the current server load, we also wrote some bash scripts that grep the server logs and count requests.
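Those scripts weren’t much more than a grep/awk pipeline. As a minimal sketch of the idea (the combined log format and the log path are assumptions, not our actual setup), counting recent requests per status code can look like this:

```shell
#!/usr/bin/env sh
# Minimal load check: tally recent requests per HTTP status code.
# The combined log format (status code in field 9) is an assumption.
count_status() {
    # Only look at the last 10k lines so the numbers reflect recent
    # load rather than all-time totals.
    tail -n 10000 "$1" |
        awk '{ codes[$9]++ } END { for (c in codes) print c, codes[c] }' |
        sort
}

# Typical invocation on a proxy node (log path is an assumption):
# count_status /var/log/nginx/access.log
```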
What our strategy is still missing:
- Automatically ordering new servers when load goes up (and shutting existing ones down when it drops)
- Batch management of the existing servers (beyond a certain number of servers, everything otherwise becomes very repetitive), e.g. using Chef
- More sophisticated load-monitoring strategies, e.g. using Datadog
- Centralized logging
Most of these things are probably taken care of automatically by PaaS providers.
What did we learn from all of this?
“Wurstify is definitely scalable.” — Me, today
Wurstify scales because it’s completely stateless. There is no database and no user session. We added a memcached server a few days in to improve performance, but that’s as far as the state goes. And stateless things scale very easily: just add another server behind a load balancer.
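As an illustration of how little “just add another server” involves for a stateless app (the config below is a made-up nginx example, not our actual load-balancer setup):

```nginx
# Illustrative nginx config -- hostnames and ports are made up.
upstream wurstify_proxies {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;   # scaling out = appending one line + reload
}

server {
    listen 443 ssl;
    location / {
        proxy_pass http://wurstify_proxies;
    }
}
```

Because no request depends on which backend served the previous one, the balancer can distribute traffic however it likes.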
Mostly for testing purposes we have also deployed a second load balancer, using DNS Round Robin — which Just Works™. Also, some requests are handled entirely by a CDN (CloudFlare) and don’t even hit our servers.
When scaling Wurstify, the only actual bottleneck is the budget. Unlike a commercially-aimed startup that needs to be able to scale quickly, we run our servers for the sake of making people’s days, and we have no plans to monetize the traffic, so Wurstify is limited to what we are willing to invest in it (so far, as a learning experience in scalability, it has been worth every cent).
Wurstify started off as an experiment — the last few days have been very exciting and we are thankful for all the great feedback and interest from beard-spirited folks around the world. Right now we are looking into ways to keep the project alive for the longer term. (Like, not running a whole cluster of servers in a data centre just to add beards to some photos on the web. Maybe we can move more code to the client-side? If you have any ideas, let us know!)
And now I am going to leave you with some Wurst Beards. Enjoy!