The Move to Serverless… Digital Ocean, Heroku, AWS, OH MY!
I have vowed to write more about what I am doing in the technology space from day to day, but this is actually something we have been experimenting with at 38th Street Studios for a while now, and I’m only documenting it now. The topic is the craze and buzz surrounding “serverless computing”, “serverless architecture”, “the cloud”, etc.
For the longest time, we used Digital Ocean droplets to host everything from marketing websites to web apps. It was great for learning, plus you had complete control over everything. That was great; however, the downside was… you had complete control of everything. For example, as some of our projects began to get more traffic, our droplet was getting pounded by more requests than it could reasonably handle, not to mention the time we had to spend understanding and debugging things like nginx and firewalls. To scale, we set up a droplet to act as a load balancer and distribute the requests it received to other droplets holding replicas of the same code base, all of which talked to a database that needed failover and automatic mirroring. At the end of the day, our bill was high, we were debugging dev ops, and we were doing everything except making progress coding.
This is when we found out about Heroku. Ah yes, I remember the day when I read their homepage for the first time and all of my dev ops concerns were suddenly lifted off my shoulders by their free package. If we wanted to upgrade, it was only $7 a month per app! Gone were the days of writing automation code to replicate code across servers, debugging ssh keys that were out of date on new servers, and setting up failovers ourselves. You simply specify the launch command that starts the server for your framework, include a requirements.txt file for Python projects, push to your normal GitHub repo, and deploy from GitHub. It seemed so magical, with autoscaling servers and easy add-ons (many of which are free) all within a single button click, while they still took care of and optimized all of the dev ops for our apps. It was cheaper, but we were still paying for servers during non-peak times. Provisioning new autoscaled servers cost less time and money than before, but still more money than we wanted to be paying. Then came the era of accepting AWS and a serverless architecture.
I want to note real fast that we still use Heroku for certain things. It is really nice to have all of your operations and items in a single, fully managed dashboard. AWS does have a bit of a learning curve, but if you have someone with experience, or you want to gain experience yourself, it can be well worth it. Just be prepared to bang your head against a wall for about 8 hours a week for your first month, but as technologists that’s sort of our thing, so you are semi-prepared already.
AWS came on to the scene, revolutionizing the cloud computing game with their EC2 instances. You could spin up servers with a ton of flexibility and turn them off and on as you pleased. They also introduced Elastic Beanstalk, which lets you think about your app as a project while it takes care of spinning up load balancers and autoscaling groups of EC2 instances. This is great because you get an experience similar to Heroku (I still think Heroku is actually easier to work with than Elastic Beanstalk) while also being able to customize every component of the dev ops to your liking. However, something truly magical happened a couple of years later….. Lambda was born, and with it came the realm of “serverless” computing.
AWS Lambda is what is known as Functions as a Service (FaaS). Unlike an EC2 instance, where you pay whether your server is idle or processing, with Lambda you only pay for the compute time you use. One more time, YOU ONLY PAY FOR THE COMPUTE TIME YOU USE. Being a data scientist, this sounds amazing, because now instead of using something like Flask on a normal server to serve different types of models, I just put the model in Lambda, ping it via API Gateway, and cut my operating cost to a fraction! Woah woah woah, API Gateway…. What’s that???
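To make the pay-for-what-you-use idea a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the usual shape of the two billing models (an always-on server billed per hour, and a function billed per request plus per GB-second of compute). The function names are made up for illustration, and every price in the example call is a placeholder, not a real AWS rate, so plug in current pricing before drawing conclusions.

```python
def always_on_monthly_cost(hourly_rate, hours_per_month=730):
    """Cost of a server that is billed every hour, busy or idle."""
    return hourly_rate * hours_per_month


def pay_per_use_monthly_cost(requests, avg_duration_s, memory_gb,
                             price_per_gb_second, price_per_million_requests):
    """Cost of a function billed only while it is actually running."""
    # Compute charge: seconds of execution, scaled by the memory allocated.
    compute = requests * avg_duration_s * memory_gb * price_per_gb_second
    # Flat fee per request, quoted per million requests.
    request_fee = requests / 1_000_000 * price_per_million_requests
    return compute + request_fee


if __name__ == "__main__":
    # Placeholder numbers for illustration only (NOT real prices).
    server = always_on_monthly_cost(hourly_rate=0.02)
    function = pay_per_use_monthly_cost(
        requests=100_000,
        avg_duration_s=0.2,
        memory_gb=0.5,
        price_per_gb_second=0.00002,
        price_per_million_requests=0.20,
    )
    print(f"always-on server: ${server:.2f}/month")
    print(f"pay-per-use function: ${function:.2f}/month")
```

The gap shrinks as traffic grows, so the comparison is worth redoing with your own request volume and durations.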
So before I get into API Gateway, let me briefly mention how AWS is structured. AWS can be thought of as a suite of different dev ops microservices, each with a unique name, that can all be connected to one another easily. In your AWS account, for example, you get DNS setup via Route 53, object storage with S3, a simple virtual server with EC2, a load balancer with Elastic Load Balancing, and many, many more. Go into the console and look at all of the services. You will forget the first half of the list before you finish the second half.
Now that we know the basics of AWS, API Gateway is simply what its name states. It can be configured to accept HTTP requests, provision API key credentials, set up paid API plans, easily do rate limiting, etc. The main feature that we have been using is triggering Lambda functions and returning their responses.
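As a minimal sketch of what that looks like on the Lambda side, here is a Python handler shaped for API Gateway's Lambda proxy integration, where the request arrives as an `event` dict and the function returns a dict with `statusCode`, `headers`, and `body`. The `predict` function is a hypothetical stand-in for a real model, and the `features` field in the request body is an assumption made for this example.

```python
import json


def predict(features):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features)


def _response(status_code, payload):
    """Build the response shape API Gateway expects from a proxy integration."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    """Entry point that Lambda calls for each request forwarded by API Gateway."""
    try:
        # With proxy integration the HTTP body arrives as a string (or None).
        payload = json.loads(event.get("body") or "{}")
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("features must not be empty")
    except (KeyError, TypeError, ValueError) as exc:
        return _response(400, {"error": f"bad request: {exc}"})
    return _response(200, {"prediction": predict(features)})
```

You can exercise it locally without AWS by calling `lambda_handler({"body": '{"features": [1, 2, 3]}'}, None)` and inspecting the returned dict.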
So now that we have briefly talked about some components that can be used in a serverless setup of an application, what does a full example look like?
The components that we will be talking about at a high level:
- Zappa (API Gateway & Lambda)
This is great if your users hit the index page and then start navigating around your site. However, what if we want to allow users to go directly to a certain page that isn’t the index? Introducing CloudFront!
CloudFront serves a couple of really great purposes, but it is not completely intuitive right away how to use it appropriately. CloudFront is a content delivery network (CDN) that caches your files on servers closer to your audience all around the world. What this means for you is that your site will load faster, in addition to another point that blew my mind: IT CAN BE CHEAPER THAN JUST USING S3!!! With CloudFront you can also catch certain errors and respond with a page of your choice. For example, if someone goes to yoursite.com/page2, you can add a custom Error Page that sends them your index.html, and your single page app will then render the second page instead of the index. At first it feels sort of hacky, but it works perfectly.
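For reference, the Error Page trick can be sketched as data. The Python below builds the `CustomErrorResponses` section of a CloudFront distribution config in the shape the CloudFront API uses. The helper name is made up for this example, and handling both 403 and 404 is an assumption (which code you actually see for a missing file depends on how your bucket is set up).

```python
def spa_error_responses(error_codes=(403, 404), index_path="/index.html"):
    """Map origin errors to the single page app's index with a 200 status."""
    items = [
        {
            "ErrorCode": code,              # error returned by the origin (S3)
            "ResponsePagePath": index_path, # page CloudFront serves instead
            "ResponseCode": "200",          # status the browser sees (a string in the API)
            "ErrorCachingMinTTL": 0,        # do not cache the error itself
        }
        for code in error_codes
    ]
    return {"Quantity": len(items), "Items": items}


if __name__ == "__main__":
    import json
    print(json.dumps(spa_error_responses(), indent=2))
```

The same values can be entered by hand on the Error Pages tab of the distribution in the console; the dict is just a way to see all of the fields in one place.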
Great, so now that we can serve our single page app without a server, what do we do about a database? Depending on the size and use case of your project, Firebase can be an incredible tool, and we have used it on various projects.
Firebase is one of the quickest ways to get web and mobile apps up and running. They were acquired by Google a couple of years back and revamped a ton of their system to make it extremely fast to get started. They do this by providing prebuilt pieces such as authentication, linking of different auth accounts like Google, Twitter, etc., a robust API wrapper for numerous languages, and a scalable data solution that is lightning fast and REALTIME. For example, if you are building a transportation application and you want to monitor the movement of vehicles around a city, simply have the mobile app save location updates to Firebase, and if you are loading the data on a map, you’ll get updates pushed to your page automatically via their API. That means no configuration of webhooks, websockets, or APIs, and what would be considered a difficult engineering problem if built from scratch is up and running in no time at all. Firebase is great for prototypes as well as web apps that don’t need massive scale, because you get a ton of usage without ever leaving the free tier.
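To give a feel for the write side of the vehicle example, here is a small Python sketch that prepares a location update for the Firebase Realtime Database REST API, which addresses data by appending `.json` to a path. The database URL, the `vehicles/<id>` path, and the `lat`/`lng` field names are all hypothetical, and authentication is left out entirely. The realtime push to the map happens in the client SDK listening on that same path, which is not shown here.

```python
import json
import urllib.request


def build_location_update(database_url, vehicle_id, lat, lng):
    """Prepare (but do not send) a PUT that overwrites one vehicle's location."""
    # The REST API exposes every database path as <path>.json
    url = f"{database_url.rstrip('/')}/vehicles/{vehicle_id}.json"
    payload = json.dumps({"lat": lat, "lng": lng}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical project URL; sending it would need a real database and auth.
    request = build_location_update(
        "https://example-project.firebaseio.com", "bus-42", 39.7684, -86.1581
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```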
Another component to make this all work is Route 53, Amazon’s DNS service.
Route 53 is just a DNS service, so you can really use anything you want for this. We chose it so that we could have all of our components in one place and get nice linking between everything, since CloudFront lets you easily pick the Route 53 domains you have set up. Also, you can get free SSL certificates through AWS Certificate Manager, which is always nice as opposed to the archaic services still charging outrageous amounts for SSL like it’s still the 90s.
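If you would rather script the link between your domain and CloudFront than click through the console, the record looks roughly like the sketch below. It builds the change batch for an alias A record in the shape Route 53's `ChangeResourceRecordSets` API expects (for example via boto3's `change_resource_record_sets`). The helper name and the domain names in the example are hypothetical; the hosted zone ID is the fixed value AWS documents for CloudFront alias targets.

```python
# Fixed hosted zone ID that AWS documents for alias records pointing at CloudFront.
CLOUDFRONT_ALIAS_ZONE_ID = "Z2FDTNDATAQYW2"


def cloudfront_alias_change(domain_name, distribution_domain):
    """Build a change batch that points a domain at a CloudFront distribution."""
    return {
        "Comment": f"Alias {domain_name} to CloudFront",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or update it if it exists
                "ResourceRecordSet": {
                    "Name": domain_name,
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": CLOUDFRONT_ALIAS_ZONE_ID,
                        "DNSName": distribution_domain,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names; use your own domain and distribution domain.
    batch = cloudfront_alias_change("yoursite.com.", "d1234example.cloudfront.net.")
    print(batch["Changes"][0]["ResourceRecordSet"]["AliasTarget"])
```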
At this point, we have a pretty great stack in terms of extremely low cost and great performance with a modern feel. Even though there are a lot of articles out there about how front end programming is becoming more and more like full stack programming compared to the PHP days, you will still have many tasks that you want to perform server side. Right now everything is being done by our infrastructure components or by our front end React single page app and Firebase. To accomplish stuff server side in a “serverless” setup, we chose to use a package called Zappa for Python.
Zappa is an amazing package that allows you to write server side code as if you were writing a Flask or Django application, but instead of using the web servers that come with these frameworks, it takes the code as is and packages it up to be sent to the cloud for “serverless” compute. What I mean here is that it uploads your code and configures API Gateway and Lambda on your AWS account with a simple one-line command from the terminal. It also provides extremely simple configuration for scheduling Lambda functions to fire on certain time schedules via CloudWatch Events and for storing extra files in S3.
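Here is a minimal sketch of what Zappa works with. Zappa deploys any WSGI app, so to keep the example dependency-free it uses a bare WSGI callable where you would normally have your Flask or Django app object. The module name `my_service`, the bucket name, the region, and the schedule are all assumptions for illustration; the settings keys (`app_function`, `aws_region`, `s3_bucket`, `events`) are the ones Zappa reads from `zappa_settings.json`.

```python
import json


def app(environ, start_response):
    """Bare WSGI app; a Flask app object would be referenced the same way."""
    body = json.dumps({"path": environ.get("PATH_INFO", "/"), "status": "ok"})
    body = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def nightly_batch(event=None, context=None):
    """Placeholder for a scheduled job such as an overnight batch run."""
    return "batch complete"


# Contents for zappa_settings.json, assuming this file is saved as my_service.py.
ZAPPA_SETTINGS = {
    "production": {
        "app_function": "my_service.app",      # module.path to the WSGI callable
        "aws_region": "us-east-1",             # assumed region
        "s3_bucket": "my-zappa-deployments",   # hypothetical bucket for uploads
        "events": [
            {
                "function": "my_service.nightly_batch",
                "expression": "cron(0 6 * * ? *)",  # every day at 06:00 UTC
            }
        ],
    }
}

if __name__ == "__main__":
    print(json.dumps(ZAPPA_SETTINGS, indent=4))
```

With the settings saved as `zappa_settings.json`, the first deploy is `zappa deploy production` and later code changes go out with `zappa update production`.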
We use Zappa to move code into production for things such as large batch processing jobs that run overnight, as well as for building various microservices to power our applications. It also allows for extremely easy versioning and deployment of scalable models with API endpoints.
Wrapping Things Up
You will always want to pick the infrastructure that works for your use case and application. Hopefully the options I have laid out here will help inform which route to take when making those decisions, as they all have their pros and cons. They all teach different skills as well, so if you’re on the path to becoming a unicorn / jack of all trades, try them all and more! Like I mentioned earlier, we still use a mix of these solutions, and we mix and match different AWS offerings in various ways every day at 38th Street Studios.
One thing I didn’t go into much detail about in this post was database selection. That is for another time as I intend to discuss different options when it comes to applications and analytical database processing covering OLTP and OLAP options :)
Also, I’m a huge advocate of Docker for different deployment options as well, which I’ll hopefully be covering in another post too!