Embracing Serverless — Part II
Inspired by S3, Going S3rverless
In this second part of our blog series about the journey of adopting serverless architecture, Fajrin takes over the baton from Salvian’s overview of serverless benefits to talk about the serverless database options that promise the experience pioneered by Amazon S3. How do they fare? Let’s dive in to find out.
Fajrin Azis is a Software Engineer on the Backend Infra team, whose responsibilities, among others, include maintaining best practices for backend development.
It’s funny that we often take the most sophisticated technologies for granted. For example, we are curious about cryptocurrencies and NFTs. But how about something old-fashioned, say, concrete? Yes, the concrete that builds pretty much every building around you. Concrete has existed since ancient times. At one point, Roman concrete was invented, and it triggered the concrete revolution. Then… voila, there were coliseums, baths, aqueducts, roads, and an advanced civilization built on concrete!
What Roman concrete unlocked was the capability to build non-right-angle structures such as arches, vaults, and domes. It’s easy to imagine how much harder it is to build with rocks and bricks than with concrete, which is just a paste that hardens over time. And yet, surprisingly to many if not all, concrete is the second most used resource in the world after water.
I believe there are also a lot of similar stories in the whole span of software engineering history; a simple but, in hindsight, breakthrough idea. This time, I would like to tell you a little bit about AWS Simple Storage Service (S3).
S3 is an object storage service; you store and retrieve objects (files) to and from a certain path (bucket name + directory). You cannot modify the contents of an object. You can only replace it with another object. GCP and Azure also have their own object storage service: Google Cloud Storage and Blob Storage respectively.
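As a sketch of that interface, here is roughly how the put/get/replace workflow looks, assuming a boto3-style client (injected here so the example stays self-contained; the bucket and key names are hypothetical):

```python
def upload_object(s3, bucket, key, body):
    # put_object always writes a complete object; there is no partial write.
    s3.put_object(Bucket=bucket, Key=key, Body=body)

def read_object(s3, bucket, key):
    # get_object retrieves the whole object stored at bucket + key.
    return s3.get_object(Bucket=bucket, Key=key)["Body"]

def replace_object(s3, bucket, key, new_body):
    # No in-place modification exists: "updating" content is just another put
    # of a full replacement object to the same key.
    upload_object(s3, bucket, key, new_body)
```

In production, `s3` would be `boto3.client("s3")`; the point is that a whole-object put is the only write primitive.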
S3 kickstarted a revolution in 2006 and probably will do another one in the near future. Here’s the story, be prepared!
Traveloka maintains various lifestyle business units, three of which are the nine-year-old Flight, the five-year-old Xperience, and the one-year-old Eats. You can imagine each business unit as a distinct start-up that explores different unknown territories.
Exploration is expensive because not only do we have to guesstimate almost everything — traffic, user behavior, time-to-launch of a product or promotional event, growth, and risks — until sufficient data is gathered and analyzed (which leads us to deeper unknowns), but we also have to guesstimate the infrastructure size that will serve our products to customers through all the turbulence in the journey.
Thanks to the flexibility of cloud services, infrastructure cost for a start-up is significantly lower than it was 20 years ago.
At Traveloka, for example, engineers can scale the infrastructure up X times on demand just before major events such as the Epic Sales, and revert it post-event. Eats’ production environment can be set up in just days instead of weeks, and Cinema was frozen during COVID-19, then “defrosted” in October 2021. In addition, clusters set up with auto-scaling can have different scaling policies for day and night traffic. And to maximize team autonomy, we apply a multi-account organization as well.
Offering that flexibility is how the cloud services we know today — started by S3 and EC2 — revolutionized the way people build internet companies. Imagine wanting to start an internet company in the early 2000s: you, as well as everyone else in the same boat, were not sure how much to spend on all the hardware. Then S3 and EC2 launched in 2006, and… voila! Pinterest, Airbnb, Netflix, and many of the internet companies we know to this day came to be, including Traveloka!
Despite the merits, operating the cloud itself was not an easy task. The nine-year-old Flight and the one-year-old Eats each have their own set of engineers, with varying degrees of expertise and bandwidth, and their own set of challenges. For Flight, it is how to establish a reliable service for traffic from Asia-Pacific users. For Eats, it is how to scale the business up exponentially. There are load tests, migrations, refactors, firefighting, adjustments, etc.
Recently, as the number of Traveloka’s products keeps increasing (to over 20 today), while we still regularly go through all the safety drills to keep the system running, I realized that I have never had to manage S3. It just works, all the time: storing Traveloka’s invoices and tickets, web assets, static websites, backups, logs, and data for analysis.
If we take a closer look, S3 does not just adapt to our traffic automatically; it is also highly efficient. S3 charges based on stored data and the number of operations; no wasted resources!
All the complex operational procedures and firefighting mentioned before were done diligently because we thought we had to! Operational cost — I mean, the cost to adjust our infrastructure in any given situation — is the price for keeping the system/product reliable at a reasonable infrastructure cost; no shortcuts.
S3 and similar storage services show us that operational tasks can be completely outsourced. Combining elasticity with a pay-per-usage pricing model gives us the exact benefits of maintaining the resource ourselves. For me, this is why S3 has Simple in its name.
It’s hard to imagine an idea that is more Simple. We can call that idea serverless.
Emergence of Simple S3rverless Databases and Joining the Bandwagon
You may disagree with me and believe AWS Lambda coined the term serverless. Or that serverless is not revolutionary in the first place. Or that it’s just another term for XaaS, which was already popular before serverless became a buzzword. Or that S3 was not the pioneer of the pay-per-usage pricing model. That’s fine (well, Lambda was created to handle S3 events).
Regardless of how important S3’s influence on the serverless revolution actually has been (which I think is very important), a lot of S3-like serverless databases (s3rverless? I stole it from this article) have emerged on the market in the last five years. If we realize how awesome it is to work with S3, why don’t we use similar services?
Serverless computing and serverless databases are independent of each other. You could bridge Lambda and RDS Postgres/MySQL via RDS Proxy, or use the Data API to call an Aurora Serverless cluster from Lambda. I’m not saying that you can easily replace all your databases with their serverless equivalents, which we’ll get to later. But I think we can start considering serverless database adoption regardless of our view of serverless computing.
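To sketch what that bridge looks like, here is roughly how a Lambda handler could query an Aurora Serverless cluster through the RDS Data API. The ARNs, database name, and SQL are hypothetical placeholders, and the client is injected so the sketch stays self-contained:

```python
def build_statement(cluster_arn, secret_arn, order_id):
    # Parameter shape of the RDS Data API's ExecuteStatement call:
    # the cluster and its credentials are identified by ARNs, and the SQL
    # uses named parameters with typed values.
    return {
        "resourceArn": cluster_arn,
        "secretArn": secret_arn,
        "database": "eats",
        "sql": "SELECT status FROM orders WHERE id = :id",
        "parameters": [{"name": "id", "value": {"longValue": order_id}}],
    }

def handler(event, context, rds_data):
    # In a real Lambda, rds_data would be boto3.client("rds-data"); it is a
    # plain HTTP-backed client, so no connection pool or VPC wiring is needed.
    params = build_statement(
        "arn:aws:rds:ap-southeast-1:123456789012:cluster:example",
        "arn:aws:secretsmanager:ap-southeast-1:123456789012:secret:example",
        event["order_id"],
    )
    return rds_data.execute_statement(**params)
```

This is a sketch of the wiring, not a production handler; error handling and pagination are omitted.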
Let’s start by clearly defining what a serverless database is, which I’ve described as Simple in the previous section. The properties below are compiled from various sources:
- (Near-)zero operational cost: no need to do any kind of operation for patching, backup, scaling, etc. However, you still have to take care of your data modeling and queries.
- Elastic, in both storage and compute: elastic means adaptive to traffic; not only does it scale up at spikes or busy hours, but it also scales down when traffic is low.
- High availability and durability: serverless databases usually promise five 9s (99.999%) of availability, and S3 promises eleven 9s of durability. Your data is also replicated to multiple physical servers across different availability zones or regions.
- Pay-per-usage: you pay as you use, charged based on stored data, the number of queries, and query complexity/latency. The cost is not based on internal server usage (e.g. uptime of provisioned servers, CPU and memory usage, spikes, etc.) and can be easily calculated and forecasted.
- Provisioned setting (optional, not available in S3): if your business is running really well with a large, predictable workload, provisioning capacity and hiring experts could be cheaper than pure pay-per-usage.
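To make the pay-per-usage point concrete, here is a back-of-the-envelope sketch. The unit prices are hypothetical placeholders for illustration, not any vendor’s actual rates:

```python
def monthly_cost(gb_stored, million_reads, million_writes,
                 gb_price=0.25, read_price=0.25, write_price=1.25):
    """Linear pay-per-usage bill: storage plus request volume, nothing else.

    Unit prices (USD per GB-month and per million requests) are made up for
    illustration; real pricing varies by service, region, and tier.
    """
    return (gb_stored * gb_price
            + million_reads * read_price
            + million_writes * write_price)

# A small workload: 10 GB stored, 30M reads, 5M writes per month.
print(monthly_cost(10, 30, 5))  # 16.25
```

Note that there is no term for idle servers: the forecast is a straight multiplication of your traffic projections.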
Here are some serverless database examples you can try on:
- RDBMS: AWS Aurora (and AWS Aurora Serverless V2 for serverless compute) or CockroachDB Serverless.
- NoSQL: MongoDB Atlas (MongoDB Atlas Serverless is in preview), AWS DynamoDB, DataStax Serverless, or Firebase.
- Caches: Redis Enterprise or Upstash.
- …and many more!
The serverless database examples above generally have all of the aforementioned properties to some degree. There are also limits and extra configurations here and there. Ultimately, you must decide carefully by studying each option in detail and weighing it against your requirements.
We, Traveloka, decided to be an early adopter of serverless databases (see Figure 1 above). For us, unless there is a specific need, less operational cost is always better as it lets the teams focus more on feature development and delivery, while developing various business units. It also reduces the number of incidents and infrastructure costs. Ultimately, improvements from all teams strengthen Traveloka’s identity as a super-app.
I, alongside some peers, incorporated DynamoDB into the system when we built Traveloka Eats. After we understood (1) how the business was expected to grow and (2) how complex the system would be, we hypothesized that DynamoDB could be a perfect fit. We created a single DynamoDB table to handle delivery transactions, while other parts of the system use Postgres.
It was a hit. While there were small incidents caused by wrong guesstimations across the system, the parts that interface with DynamoDB have (so far) had none. The cost is also reasonable; Traveloka Eats traffic spikes at busy hours and is extremely small at night.
Where the Operational Cost Actually Goes
When we talk about serverless’ disadvantages, it’s usually about vendor lock-in and an expensive cost baseline. Both are valid issues. Please discuss them with your business stakeholders.
I think there is one topic that needs more attention: using serverless databases means working with distributed data storage. There’s no way serverless databases could achieve all of their expected qualities by relying solely on a single-master architecture. Your data is distributed across hundreds or thousands of physical servers.
Adapting to the “work with distributed data storage” mindset is probably the highest cost of adopting serverless databases because, at least in my experience, we are all used to single-master databases.
Traveloka also faces the same problem. So far, the system is mainly using Postgres and MongoDB as databases, with 1 master and X read replicas. We plan to adopt more DynamoDB and Aurora Serverless. But, we have to do it slowly and incrementally.
Let’s elaborate on the issues:
- Distributed architecture makes familiar features unavailable or more expensive than usual.
There are no transactions in S3. There are no joins or any kind of aggregation in DynamoDB, and you are charged more for transactional writes than for writing items one by one.
AWS Aurora, even though it is functionally compatible with Postgres/MySQL, is still a proprietary fork of those open-source solutions on top of a distributed storage layer. You should not assume everything will run the same. Please discuss with your AWS customer support, and test against something that resembles production data and traffic (or in production itself).
At Traveloka, two teams are using Aurora in production; my team is working on adding more clusters across more teams. Rather than migrating all the remaining RDS Postgres clusters to Aurora in one go, we chose to do it conservatively.
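The DynamoDB transaction surcharge mentioned above can be sketched numerically: a standard write consumes one write request unit per 1 KB of item size (rounded up), and a transactional write consumes twice that. The helper below is a simplified illustration of that billing rule:

```python
import math

def write_request_units(item_size_bytes, transactional=False):
    # One write request unit per 1 KB of item size, rounded up;
    # transactional writes are billed at twice the standard rate.
    units = math.ceil(item_size_bytes / 1024)
    return units * 2 if transactional else units

# Ten 500-byte items written one by one cost 10 units;
# the same ten items inside one TransactWriteItems call cost 20.
print(sum(write_request_units(500) for _ in range(10)))                      # 10
print(sum(write_request_units(500, transactional=True) for _ in range(10)))  # 20
```

The same 2x multiplier applies to transactional reads, so transactions are a deliberate trade-off rather than a default.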
- NoSQL serverless databases introduce data modeling and query mechanisms that suit distributed data more.
The new concepts are usually a little hard to grasp and can lead to issues: a slower development process, bad table design, harder knowledge transfer (e.g. to a new joiner), etc.
For example, DynamoDB has the single-table design practice. At Traveloka, even though Eats has piloted the practice successfully, so far that has not been the case with the other teams I’ve been helping, mainly because the requirements are always changing, so the access patterns become obsolete fast, and it’s hard for engineers with no DynamoDB experience to adapt their design smoothly.
It was the same with MongoDB. I observed our engineers gradually turn the document design into “SQL tables”: the documents were normalized, “joins” were done on the application server, and of course it was not effective. We did a couple of migrations from MongoDB to Postgres to stop further bad implementations.
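For readers unfamiliar with single-table design, the core idea is that one table holds several entity types, and composite keys are laid out so that each access pattern becomes a single Query. The key schema below is a hypothetical sketch loosely based on a delivery-order flow, not Eats’ actual design:

```python
def order_item_key(order_id):
    # The order's metadata lives under a fixed sort key in its partition.
    return {"PK": f"ORDER#{order_id}", "SK": "META"}

def order_event_key(order_id, timestamp):
    # Events share the order's partition and sort chronologically, so a
    # single Query on PK = ORDER#<id> returns the order plus its history.
    return {"PK": f"ORDER#{order_id}", "SK": f"EVENT#{timestamp}"}

print(order_event_key("42", "2021-10-05T12:00:00Z"))
```

The catch is exactly the one described above: when requirements change, the key layout baked in for the old access patterns has to change with them.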
- Using distributed databases means embracing eventual consistency.
Data consistency delay is now part of your requirements; it could be negligible or it could be severe (e.g. in a banking system). If your system could work better asynchronously, at some point you also have to change the way you architect and implement it.
The majority of Traveloka’s business flows are naturally asynchronous. For example, in Eats, the system waits for the restaurant to accept the order, waits for the driver to accept the assignment, waits for the user to confirm delivery, etc. A 1–2 second delay between those events is practically unimportant. We built the system accordingly, with a lot of message-passing, non-blocking communication.
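That message-passing style can be sketched minimally, with the Eats events simplified into an in-memory queue standing in for whatever broker is actually used; the producer never blocks waiting for the consumer:

```python
from collections import deque

events = deque()  # stand-in for a message broker

def emit(event):
    # The producer appends and returns immediately; nothing blocks.
    events.append(event)

def apply_events(order):
    # A consumer catches up later; until it does, readers may briefly
    # see stale state, which is exactly the eventual-consistency window.
    while events:
        order["status"] = events.popleft()
    return order

order = {"id": "42", "status": "created"}
emit("accepted_by_restaurant")
emit("accepted_by_driver")
emit("delivery_confirmed")
print(apply_events(order)["status"])  # delivery_confirmed
```

If a second or two of lag between `emit` and `apply_events` is acceptable to the business, the system tolerates eventual consistency by construction.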
S3 is a magnificent work. It revolutionized how we build an Internet company. First, by becoming the earliest building block of the concept of cloud infrastructure. Then, by being the epitome of serverless.
While S3 is easy to use, S3-like databases are more challenging to adopt. This comes from how different a distributed data storage is from the single-master RDBMS we are all familiar with. I think the challenge is worth overcoming; we demand more “zero operational cost”. The trend is geared towards more s3rverless.
Do you think this topic is interesting? A bit of trivia: did you know that S3 stored over 100 trillion (yes, 14 zeroes) objects globally as of 2021? Let’s have an exciting discussion then! Lucky for you, Traveloka is also hiring.