The Architecture of Amazon’s Aurora Database and Why It Is Better

Meg’s tech corner · Published in The Startup · Jul 22, 2020

Abstract

Aurora Database is Amazon’s cloud-native relational database. It can hold up to 64TB of data and is significantly faster than standard MySQL. Many companies have adopted Aurora Database.

In this article, we will introduce the architecture of Amazon’s Aurora Database. We will start with the architecture of a traditional relational database system, such as MySQL, and discuss its limitations. We will then discuss how Aurora Database extends the functionality of a traditional database to improve availability, reliability and scalability.

If you are interested in the architecture of DynamoDB, another featured database from AWS, you can read my other blog post [Click here].

Architecture of a traditional database

A database is a collection of data, typically describing entities and activities of an organization. An e-commerce database, for example, may contain information about the customers, products and sales.

A database management system (DBMS) is designed to maintain and make use of that data. Specifically, a DBMS normally provides the following functionality.

  • Reliable storage of the data.
  • A simple interface to create, update, remove and query the data.
  • Transaction support.

A DBMS normally consists of a Query Evaluation Engine, a Transaction Manager, File and Access Methods, a Buffer Manager and a Disk Space Manager, as illustrated in the following figure. We will use two simple examples to illustrate how the different components work together.

Architecture of a traditional DBMS
  • Example 1: Write To Database

Imagine that we want to insert two rows into a database table. We would issue the following command to the database:

INSERT INTO task(title, priority) VALUES ('Aurora', 1), ('DBMS', 2);

The request is first forwarded to the Query Evaluation Engine, which parses the SQL command and understands that it inserts two rows into the task table. The three layers below the engine, File and Access Methods, Buffer Manager and Disk Space Manager, then work together to persist the data to the disk.

Disk structure

A DBMS reads data from and writes data to the disk in fixed-size units called pages. A page is normally 4KB or 8KB; the exact size is a configurable parameter of the DBMS. Each page is identified by a unique id, the page id. The data inside a database table is stored in a set of pages. To make searches faster, the data is kept sorted across pages: page 1, for example, contains the rows with keys in the range [a, c], and page 2 the rows with keys in the range [d, f].

Besides the pages, a DBMS also stores a changelog on the disk. The reason we need a changelog is that some operations modify several pages, and we cannot flush multiple pages to the disk atomically. The DBMS may crash after the first page is flushed, leaving the database in an inconsistent state. We therefore first log the change we want to make, which can be flushed to the disk atomically. If the DBMS crashes, we can reconstruct a consistent state from the changelog, as the example below shows.
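Before continuing the example, here is a minimal sketch in Python of the page lookup itself. The key ranges and page ids are hypothetical, mirroring the figures above; a real DBMS would use an index structure such as a B+ tree rather than a flat list.

import bisect

# Each entry maps the first key a page covers to its page id.
# Page 1 covers [a, c], page 2 covers [d, f], page 3 covers [g, ...).
PAGE_INDEX = [("a", 1), ("d", 2), ("g", 3)]

def page_for_key(key):
    # Binary-search the sorted first keys for the page covering `key`.
    first_keys = [first for first, _ in PAGE_INDEX]
    pos = bisect.bisect_right(first_keys, key.lower()) - 1
    return PAGE_INDEX[max(pos, 0)][1]

print(page_for_key("Aurora"))  # 1, since "aurora" falls in [a, c]
print(page_for_key("DBMS"))    # 2, since "dbms" falls in [d, f]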

In our example, the DBMS would identify that the row ("Aurora", 1) belongs to page 1 and the row ("DBMS", 2) to page 2. Before modifying the data in pages 1 and 2, the DBMS first writes an entry to the changelog and flushes it to the disk.

Table: task
Operation: insertion
Data: ("Aurora", 1), ("DBMS", 2)

If the DBMS crashes after partially updating the database, the Recovery Manager reads the changelog, finishes the update and makes sure both rows are inserted.

After the changelog is flushed to the disk, the DBMS inserts ("Aurora", 1) into page 1 and flushes the page to the disk. It then inserts ("DBMS", 2) into page 2 and flushes that page to the disk.

Now the write operation has completed successfully and the DBMS can return to the client application.
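The whole write path can be condensed into a short sketch. This is a simplified model, not actual MySQL or Aurora code: log and pages are in-memory stand-ins for disk state, and flushes are implicit. The essential property is the ordering: the changelog entry is persisted before any page is modified, so recovery can always complete a partial write.

log = []                  # the changelog on disk (append-only)
pages = {1: [], 2: []}    # page id -> rows stored in that page

def insert_rows(rows):
    # Step 1: log the intended change and flush it to the disk first.
    log.append({"table": "task", "op": "insert", "data": list(rows)})
    # Step 2: only then modify and flush the affected pages, one by one.
    for title, priority in rows:
        page_id = 1 if title.lower() < "d" else 2   # toy routing rule
        pages[page_id].append((title, priority))

def recover():
    # After a crash, replay the changelog so every logged row is present.
    for entry in log:
        for title, priority in entry["data"]:
            page_id = 1 if title.lower() < "d" else 2
            if (title, priority) not in pages[page_id]:
                pages[page_id].append((title, priority))

insert_rows([("Aurora", 1), ("DBMS", 2)])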

  • Example 2: Query Database

The steps to query the database are similar. Consider the following command:

SELECT * FROM task WHERE title = 'Apache Flink';

The Query Evaluation Engine first parses the command. The DBMS then identifies which pages might contain the data; in our case, page 1 could contain the rows we want. The Disk Space Manager reads that page into memory if it is not already present in the buffer. The DBMS then searches through the page for the matching rows and returns the result to the client application.
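A matching sketch of the read path, under the same simplified model as the write sketch: the buffer is consulted first, and the page is read from the disk only on a miss.

# Pages on "disk"; in a real DBMS these live in 4KB/8KB blocks.
disk_pages = {1: [("Apache Flink", 3), ("Aurora", 1)], 2: [("DBMS", 2)]}
buffer_pool = {}          # page id -> page cached in memory

def read_page(page_id):
    if page_id not in buffer_pool:                   # buffer miss
        buffer_pool[page_id] = disk_pages[page_id]   # Disk Space Manager reads it
    return buffer_pool[page_id]

def query_by_title(title):
    page_id = 1 if title.lower() < "d" else 2        # same toy routing rule
    return [row for row in read_page(page_id) if row[0] == title]

print(query_by_title("Apache Flink"))   # -> [('Apache Flink', 3)]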

Limitations of traditional DBMS

The traditional DBMS has worked well for decades and is a critical component of almost all software applications. However, with the emergence of the cloud and higher requirements for scalability, reliability and availability, the traditional DBMS increasingly struggles to keep up with expectations.

  • Scalability

A traditional DBMS runs on a single machine, so we can only scale it up by using a more powerful computer. This approach is expensive and scales poorly. The number of IO operations per second (IOPS) the disk supports can quickly become the bottleneck of the system.

  • Reliability

Though the traditional DBMS employs techniques such as the changelog to improve reliability, it remains vulnerable: if the disk is corrupted, the DBMS may lose all of its data.

  • Availability

The traditional DBMS is also limited in availability. If the machine crashes, the DBMS cannot serve user requests until the machine is repaired and recovery completes.

How Aurora improves a traditional database

Aurora Database is built on top of the traditional DBMS. It reuses the majority of the components of the traditional DBMS, such as the Query Evaluation Engine, Transaction Manager and Recovery Manager. (That is why we spent some time on the architecture of the traditional DBMS in the previous section.) It, however, makes several changes to improve availability, reliability and scalability. Those changes are

  • Replicating the data to remote storage
  • Storing only the changelog on the remote disk
  • Employing a primary-replica setup

Replication

The first change Aurora makes is to store the data remotely instead of on the local disk. As shown in the figure below, Aurora Database extends the Disk Space Manager to make it compatible with remote storage.

Remote disk

To improve reliability, Aurora Database replicates the data: it normally keeps 6 copies across 3 different data centers (Availability Zones). With this degree of replication, it is very unlikely that user data is ever lost.

Aurora Database uses one virtual machine (an Amazon EC2 instance) to manage each copy of the data, which is stored on the instance’s local disk. In our case, Aurora Database therefore uses 6 EC2 instances in 3 different data centers to manage the replicated data.

However, one possible issue with this setup is that the Disk Space Manager needs to make sure the data is successfully sent to 6 different EC2 instances. The latency increases if any of them is slow or busy handling other requests, and with 6 instances it is quite likely that at least one is slow at any given moment. To deal with this, Aurora Database waits for responses from only a subset of the EC2 instances, the write quorum Vw, before returning to the user application. Likewise, read operations fetch data from only a subset of instances, the read quorum Vr. As long as Vw + Vr is greater than the total number of copies, every read quorum overlaps every write quorum, so read operations are guaranteed to see the results of previous write operations. As shown in the figure below, with Vw = 4 and Vr = 3 out of 6 copies, at least one machine has seen the write and can serve the read. This is the quorum-based replication mechanism used in Aurora Database; interested readers can refer to the paper for more information. [Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. In SIGMOD 2017]

Quorum-based replication

To make the system more efficient, Aurora Database normally reads data from a single replica, since it tracks which replicas are up to date: these are the EC2 instances from which it has received write acknowledgements.
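Here is a minimal sketch of the quorum rule, with hypothetical replica objects. The assertions encode the two standard quorum conditions: Vw + Vr > V, so that every read quorum overlaps the latest write quorum, and 2Vw > V, so that two conflicting writes cannot both succeed on disjoint quorums.

V, VW, VR = 6, 4, 3            # copies, write quorum, read quorum

assert VW + VR > V             # every read overlaps the latest write
assert 2 * VW > V              # any two write quorums overlap

class Replica:
    def __init__(self):
        self.log = []
    def append(self, entry):
        self.log.append(entry)  # a real node persists this to its local disk
        return True             # acknowledge the write

def quorum_write(replicas, entry):
    # Toy version: contacts every replica and counts acknowledgements.
    # A real system sends the requests in parallel and returns as soon
    # as VW acks arrive, so one slow instance does not add latency.
    acks = sum(1 for r in replicas if r.append(entry))
    return acks >= VW

replicas = [Replica() for _ in range(V)]
assert quorum_write(replicas, {"table": "task", "op": "insert"})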

  • Changelog as the data

Aurora Database takes one more step to make the system even more efficient: it stores only the changelog in the remote storage. In our write example, Aurora Database sends only the changelog entry to the 6 EC2 instances. As shown in the figure below, when an EC2 instance receives a request to persist a changelog entry, it first appends the entry to the changelog on its disk and then applies it to the pages. Since a log record is much smaller than a full page, this greatly saves network bandwidth.

Read operations behave the same as in the traditional DBMS. If the page is not available in the buffer, the Disk Space Manager sends a request to an EC2 instance, which returns the page.
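A sketch of such a storage node, with hypothetical method names: the primary ships compact log records rather than whole 4KB/8KB pages; the node logs each record durably first, then applies it to the page it targets. Reads still return full pages.

class StorageNode:
    """One of the 6 EC2 instances managing a copy of the data."""
    def __init__(self):
        self.changelog = []    # durable changelog on the node's local disk
        self.pages = {}        # page id -> list of rows

    def persist(self, record):
        # 1. Append the record to the durable changelog first.
        self.changelog.append(record)
        # 2. Then apply it to the page it targets.
        self.pages.setdefault(record["page_id"], []).append(record["row"])

    def read_page(self, page_id):
        # Reads are unchanged: return the whole page to the Disk Space Manager.
        return self.pages.get(page_id, [])

node = StorageNode()
node.persist({"page_id": 1, "row": ("Aurora", 1)})
print(node.read_page(1))       # -> [('Aurora', 1)]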

  • Primary-replica configuration

So far, Aurora Database has made the data more reliable. However, the performance of the database is still limited by a single machine.

To make the system more scalable, Aurora Database supports a primary-replica configuration, as shown in the figure below. [The figure is from Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. In SIGMOD 2017]

Primary-replica configuration

The primary instance serves both read and write requests. Its changelog is sent to all 6 EC2 instances in the 3 data centers.

A replica instance serves only read requests. When there is an update to a page, the primary instance sends a notification to the replica instance, informing it that the page is stale. If that page is inside the replica instance’s buffer, the replica evicts it from the buffer. When the replica instance receives a read request, it sends a request to an EC2 instance to fetch the page if the page is not present in its buffer.
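A sketch of this cache invalidation, reusing the StorageNode sketch above; the notification channel from the primary is hypothetical.

class ReplicaInstance:
    def __init__(self, storage):
        self.buffer = {}        # page id -> page cached in memory
        self.storage = storage  # remote storage shared with the primary

    def on_page_stale(self, page_id):
        # The primary tells us this page has changed: evict our stale copy.
        self.buffer.pop(page_id, None)

    def read(self, page_id):
        # On a miss (or right after an eviction), fetch from remote storage.
        if page_id not in self.buffer:
            self.buffer[page_id] = self.storage.read_page(page_id)
        return self.buffer[page_id]

replica = ReplicaInstance(node)    # `node` from the StorageNode sketch above
replica.read(1)                    # caches page 1
replica.on_page_stale(1)           # the primary updated page 1; drop our copy
print(replica.read(1))             # re-fetches the fresh page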

With remote storage, Aurora Database supports multiple instances running in parallel without much coordination between them. With the primary-replica setup, Aurora Database can support a much higher read throughput than the traditional DBMS.
