Bootstrap to Billions on AWS
Guest post from Eric Anderson, CTO of CopperEgg
I talk to a lot of startup founders, and many of them ask similar questions. Most of these founders are very technical, very sharp, and very ambitious. Nearly all of them end up choosing Amazon Web Services (AWS) for their business hosting (I really dislike ‘hosting’ as a term for this, since it is so much more than that).
I think we’re all pretty familiar with why people choose AWS for running their new startups: It comes down to the availability of many different services that can serve as building blocks for the technology portion of your company. Things like compute are the most obvious, but the availability of databases (regular SQL, key-value stores like DynamoDB, and more), as well as object stores (like Amazon S3), queues, and scaling can all help accelerate the building of your business.
Worry-free Scaling … or Not?
Even though AWS provides many services and offerings to help bootstrap your new business, this does not mean you can simply forget about the details of scaling, performance, and availability. It is critical to start thinking about growing the business without taking too many shortcuts. You don’t want to get into a situation where you have a successful product-market fit, considerable traction, rapidly accelerating growth, and lots of goodwill from customers, only to be forced to spend all your development time rebuilding a bad architecture instead of providing the features your new customers are asking for. Going stale at this time will kill your business!
However, you also need to steer clear of the polar opposite: overengineering your business to scale past where it needs to be in the near term.
Finding the Balance
So how exactly do you find the balance between leveraging AWS without over- or underengineering? I like to approach things with Einstein’s idea: “As simple as possible, but no simpler.” Here are some rough steps you can take to help you find the right balance when scaling your business:
1. Learn the available services at AWS. This is critical. Without this, you are likely to fall into the standard engineer trap: building something that is not part of your core business and is already available cheaply. Lots of people I know have built tools and services only to find out later that an existing service does it better for dirt cheap. So get to know the various options and think about ways you could leverage each one before you really start building a large back-end system or platform.
2. Whiteboard it. I always like to start with a sketch depicting how I think I should attack a business problem. If I can’t draw it, it’s probably too complex for this stage of the company.
3. Mental scaling. Once I have a pretty good idea how I am going to solve this business problem, I go through a bunch of mental scaling exercises. This is actually pretty fun, since you are pre-breaking your app before you do any real work.
The best way to do this is to think about everything going 1,000 to 10,000 times over what you initially think your company will need. If you can imagine what a customer looks like as they use your service or application, then imagine what it looks like when ten thousand people are using it. Go through the flow of your app and see where you think things are going to get ugly.
The usual culprits are how many requests a server can handle, how much memory your app takes (and as it scales up, does it fill up the system it is running on?), and how much your database can handle. Make note of all the issues you can think of as a potential problem.
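The mental scaling exercise above can also be written down as back-of-the-envelope arithmetic. The following is a minimal sketch, assuming a simple model in which every user generates a steady request rate and every server can absorb a fixed number of requests per second. The function names and all of the input numbers are placeholders for illustration, not measurements from any real system.

```python
import math


def servers_needed(users, requests_per_user_per_sec, requests_per_server_per_sec):
    """Return how many servers are needed to absorb the total request load."""
    total_rps = users * requests_per_user_per_sec
    # Always keep at least one server, even for a trivial load.
    return max(1, math.ceil(total_rps / requests_per_server_per_sec))


def scaling_table(base_users, requests_per_user_per_sec, requests_per_server_per_sec,
                  multipliers=(1, 1_000, 10_000)):
    """Map each growth multiplier to the server count it would require."""
    return {
        m: servers_needed(base_users * m,
                          requests_per_user_per_sec,
                          requests_per_server_per_sec)
        for m in multipliers
    }


if __name__ == "__main__":
    # Placeholder inputs: 100 users, 0.5 requests/s each, 500 requests/s per server.
    for multiplier, count in scaling_table(100, 0.5, 500).items():
        print(f"{multiplier:>6}x load -> {count} server(s)")
```

The same pattern works for memory or database connections: swap in the per-user cost and the per-system limit, and note every multiplier at which the answer stops being comfortable.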
4. What if ___? Here’s where you test your ability to leverage the services that already exist, as well as identify the areas where you are least knowledgeable. For every little box and line you drew on the whiteboard, think about how you would scale it out.
Note I said “out” and not “up.” Why? Up means buying bigger, faster systems, disks, etc., whereas out means getting more of those things that may not be faster: more systems, more disks, more databases. What happens when you go from one Amazon EC2 system to 100? What happens when your database is overloaded and you cannot scale to a larger size? What happens when the network pipe between your EC2 servers and your Redis server is full?
You don’t have to solve them all now, but you need to know about them. Document them somewhere so anyone building code, running systems, etc., can see and reference them. This will be a constant reminder to build software that scales. You should always be asking yourself, “Does my app run okay on 200 systems as well as 1?”
5. Tool chest. What tools do you have in your tool chest to help you as you grow? How do you know when to scale these out and what needs to be scaled?
At CopperEgg, we use a bunch of tools to help us do this. We leverage Amazon’s Elastic Load Balancing for load balancing our app and API servers, even when we start with a single server. We know that we’ll eventually need more, and an elastic load balancer is pretty cheap. We also use autoscaling groups to help us automatically spin up new servers if one dies. You can tie this into existing metrics to automatically spin up additional servers if you need to.
Of course we leverage monitoring as much as possible (yes, we use CopperEgg to monitor CopperEgg’s own service). This gives us a view into how the EC2 systems and Amazon services (like Amazon ElastiCache, Amazon DynamoDB, and Amazon Relational Database Service) are all behaving and whether we are getting close to any limits. We watch for things like iowait (is a server spending a lot of time waiting on disk or network I/O?), steal (is a server’s CPU being stolen by a neighboring instance?), file system usage (ahem, left a log spewing to disk? Oops!), etc.
Seeing this and keeping tabs on it continually will help you scale your business easily without spending time and effort on either rolling your own monitoring or scripting up data to other services. You don’t want to be spending your time on those; you need to be building your new business.
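To make the iowait and steal metrics mentioned in this step concrete, here is a minimal sketch of how they can be derived on Linux from two samples of the aggregate `cpu` line in `/proc/stat`. It only illustrates what the numbers mean; it is not how CopperEgg or any other monitoring product collects them. The function names and the sample counter values are made up for the example.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line from /proc/stat into named tick counters."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:1 + len(names)]]
    return dict(zip(names, values))


def cpu_percentages(before, after):
    """Return the share of CPU time spent in each state between two samples."""
    deltas = {key: after[key] - before[key] for key in before}
    total = sum(deltas.values())
    if total <= 0:
        return {key: 0.0 for key in deltas}
    return {key: 100.0 * value / total for key, value in deltas.items()}


if __name__ == "__main__":
    # Made-up samples; on a Linux host these would be the first line of
    # /proc/stat read at two points in time.
    first = parse_cpu_line("cpu  1000 0 500 8000 300 0 0 200")
    second = parse_cpu_line("cpu  1400 0 700 8250 400 0 0 250")
    usage = cpu_percentages(first, second)
    print(f"iowait: {usage['iowait']:.1f}%  steal: {usage['steal']:.1f}%")
```

A sustained high iowait share suggests the instance is waiting on storage, while a high steal share suggests the hypervisor is giving CPU time to other tenants. Either one is a hint to look at instance type or placement before adding more servers.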
6. Apply what you learn. Now here’s the key: Don’t build everything to scale to 400 million users. In fact, build it for one. However, when you do, consider all these points above and build your software with this in mind, all the time. It will take a tiny fraction more effort to do, but the dividends will pay off big time later on down the line.
You do not want to end up at a period of high growth with massive technical debt to pay off. As with all debt, technical debt has a tax. And that tax is a loss for your business. The goal here is to make solid decisions all through the process of building and growing your startup.
But what if you have already started your company, and you didn’t do those steps? That’s okay! You can do this at any time, and you should probably do it continually anyway, as a reference check, to make sure you have not lost your way.
Obviously many startups are cash strapped, either because they are self-funded or running on some angel cash. Either way, cost is a big concern. Amazon can get expensive if you don’t pay attention to what you are using.
I recommend using spot instances for bursting, development work, and test systems. Keep in mind that spot instances can go away at any time as well, so don’t depend on them for long-lived processes unless your application can handle that. Keep an eye out for the right system for the right job: Right sizing your instance to the application is an important cost-saving technique and not that tricky if you use available tools.
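Right sizing can be reasoned about as a small selection problem: take the peak usage you have observed, add some headroom, and pick the cheapest instance type that still fits. The sketch below assumes a hypothetical catalog. The type names, sizes, and hourly prices are invented placeholders rather than real AWS offerings or prices, and the 25% headroom default is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


def right_size(catalog, peak_vcpus, peak_memory_gib, headroom=0.25):
    """Return the cheapest type that covers peak usage plus headroom, or None."""
    need_cpu = peak_vcpus * (1 + headroom)
    need_mem = peak_memory_gib * (1 + headroom)
    candidates = [
        t for t in catalog
        if t.vcpus >= need_cpu and t.memory_gib >= need_mem
    ]
    return min(candidates, key=lambda t: t.hourly_usd) if candidates else None


# Hypothetical catalog: names, sizes, and prices are placeholders only.
CATALOG = [
    InstanceType("example.small", 1, 2.0, 0.02),
    InstanceType("example.medium", 2, 4.0, 0.04),
    InstanceType("example.large", 4, 16.0, 0.10),
    InstanceType("example.xlarge", 8, 32.0, 0.20),
]

if __name__ == "__main__":
    choice = right_size(CATALOG, peak_vcpus=1.5, peak_memory_gib=3.0)
    print(choice.name if choice else "nothing in the catalog fits")
```

A real sizing decision would also weigh disk and network I/O, which this sketch leaves out to stay short.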
Of course CopperEgg has a few tools you can use to help with performance tuning, sizing, cost, etc. One is an AWS sizing tool: You run CopperEgg on your instances, and it matches the appropriate instance type at AWS to your application’s actual usage (including disk I/O and whether or not you should use Provisioned IOPS EBS volumes).
Another is a cloud pricing cheat sheet, available here. This helps you find the best price/performance, depending on whether you need high memory, CPU, disk I/O, etc. It also shows the cost savings of using a reserved instance (which you should definitely know about, since it can save you a lot of money) and the monthly costs of the instances (in case you are multiplying each hourly charge by 744, the number of hours in a 31-day month).
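The arithmetic behind those monthly and reserved-instance figures is simple enough to sketch. The example below assumes a reserved instance is priced as an upfront fee plus a lower hourly rate over a fixed term, which is one common pricing shape. The function names are illustrative and every dollar amount is a placeholder, not a real AWS price.

```python
HOURS_PER_31_DAY_MONTH = 24 * 31   # 744, the multiplier mentioned above
HOURS_PER_YEAR = 24 * 365          # 8,760


def monthly_cost(hourly_usd, hours=HOURS_PER_31_DAY_MONTH):
    """Cost of running one instance around the clock for a month."""
    return hourly_usd * hours


def reserved_savings(on_demand_hourly, reserved_hourly, upfront_usd, years=1):
    """Savings from a reservation over its term; negative means it costs more."""
    term_hours = HOURS_PER_YEAR * years
    on_demand_total = on_demand_hourly * term_hours
    reserved_total = upfront_usd + reserved_hourly * term_hours
    return on_demand_total - reserved_total


if __name__ == "__main__":
    # Placeholder prices chosen only to show the calculation.
    print(f"monthly on-demand: ${monthly_cost(0.10):.2f}")
    print(f"one-year savings:  ${reserved_savings(0.10, 0.04, 300.0):.2f}")
```

Note that a reservation only pays off if the instance really runs for most of the term, so the comparison is most useful for the steady baseline of your fleet rather than for burst capacity.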
CopperEgg also gives you visibility into your Amazon bill, so you can see how your costs have grown and where they are heading over time, and keep them under control.
Onward and Upward
Try these tips to help you get your startup ready for rapid growth and avoid some costly pitfalls along the way. Grow your business in a smart way and it could become the next InstaWhatsAppOculusBook thing.