An Introduction To Databases on AWS

Steve Barker
Oct 27, 2019

This article is part of a collection of my own informal notes on AWS that might be useful to others.

Amazon offers a range of database services. The AWS relational database service is known as RDS (Relational Database Service), and the NoSQL service is called DynamoDB. AWS also offers Redshift for data warehousing and ElastiCache for in-memory caching.

RDS

RDS supports several database engines. I won’t run through all of them, but they include popular relational solutions such as MySQL, PostgreSQL, and Amazon’s proprietary Aurora.

Two key selling points of RDS are Multi-AZ deployments and read replicas. I’ll start by explaining what they do:

Multi-AZ

‘AZ’ refers to ‘availability zone’, which you can think of as roughly equivalent to an Amazon data centre. By keeping a copy of your database in more than one AZ, you can recover from an outage easily.

Should your primary database fail or be taken down for maintenance, AWS will automatically route any incoming requests to the failover (standby) database without you having to do anything; you aren’t required to update the connection details that your application uses.

Multi-AZ is not available for Amazon Aurora, because Aurora has its own disaster recovery solution built in. See the Aurora section below.

Read Replicas

Read replicas improve the read performance of your database. A read replica is an asynchronous copy of your database: whenever data changes in the master database, the change is replicated to the copy shortly afterwards.

Read replicas are really useful for taking some of the load away from your master database, by directing some of the read queries to the replica. This is an effective technique for scaling a database. You can have up to five read replicas of each database, and each replica can have its own replicas, though be aware that every extra hop adds replication latency.

Each replica is given its own DNS endpoint, can be in a different region from its master, and can have multi-AZ availability. If you wish, it is also possible to promote a replica to be a master database in its own right, but this of course stops replication to it.
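
Because each replica has its own endpoint, it is up to your application to decide which endpoint a query goes to. Here is a minimal sketch of that idea in Python. The `ReadWriteRouter` class and the endpoint names are made up for illustration; nothing here is part of RDS itself, and the "starts with SELECT" check is a deliberately naive assumption.

```python
import itertools


class ReadWriteRouter:
    """Pick an endpoint for a query: writes go to the master, reads rotate over replicas."""

    def __init__(self, master_endpoint, replica_endpoints):
        self.master_endpoint = master_endpoint
        self.replica_endpoints = list(replica_endpoints)
        # Round-robin iterator over the replicas; None when there are no replicas.
        self._replica_cycle = (
            itertools.cycle(self.replica_endpoints) if self.replica_endpoints else None
        )

    def endpoint_for(self, sql):
        """Return the endpoint that should serve this statement."""
        # Naive assumption: anything starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replica_cycle is not None:
            return next(self._replica_cycle)
        # Writes (and reads when no replica exists) go to the master.
        return self.master_endpoint


# Hypothetical endpoints, for illustration only.
router = ReadWriteRouter(
    "mydb.example.internal",
    ["mydb-replica-1.example.internal", "mydb-replica-2.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("UPDATE orders SET status = 'shipped' WHERE id = 1"))
```

Since replication is asynchronous, a read sent to a replica straight after a write may not see that write yet. Queries that must see the latest data are safer sent to the master.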

RDS Backups

The RDS service offers two kinds of backup: automated backups and snapshots.

Automated Backups

Automated backups of your database allow you to recover your database to any specific point in time, within a retention period. You can define your own retention period, between 1 and 35 days. When you set up a new database instance, automated backups are enabled by default.

Each day a full backup of the data is taken, and the transaction logs for that day are stored alongside it. This allows a restore that is accurate to the second: the most recent daily backup is restored first, then the changes recorded in the logs are replayed up to the time that you wish to restore to.
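
To make the "restore the backup, then replay the logs" idea concrete, here is a toy simulation in Python. It is only a sketch of the principle and not how RDS is implemented; the `restore_to_point_in_time` function, the snapshot layout and the sample data are all assumptions for illustration.

```python
from datetime import datetime


def restore_to_point_in_time(snapshot, log, target_time):
    """Rebuild state at target_time from a snapshot plus a time-ordered change log.

    snapshot: dict with "taken_at" (datetime) and "data" (dict of key -> value)
    log: list of (timestamp, key, value) tuples, sorted by timestamp
    """
    if target_time < snapshot["taken_at"]:
        raise ValueError("target_time is earlier than the snapshot")
    state = dict(snapshot["data"])  # copy so the snapshot itself is untouched
    for timestamp, key, value in log:
        if timestamp <= snapshot["taken_at"]:
            continue  # already reflected in the snapshot
        if timestamp > target_time:
            break  # changes after the restore point are ignored
        state[key] = value
    return state


# Hypothetical daily backup and change log.
snapshot = {"taken_at": datetime(2019, 10, 27, 3, 0, 0), "data": {"balance": 100}}
log = [
    (datetime(2019, 10, 27, 9, 15, 0), "balance", 80),
    (datetime(2019, 10, 27, 14, 30, 5), "balance", 50),
]

# Restore to midday: only the 09:15 change is replayed.
print(restore_to_point_in_time(snapshot, log, datetime(2019, 10, 27, 12, 0, 0)))
# prints {'balance': 80}
```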

Where are automated backups stored? Great question. RDS instances include free storage space in S3 equal to the size of the database, which is used to store the backup.

Backups are taken within a defined window. While a backup is running, you might see increased latency on the database.

Snapshot backups

Database snapshots are manually initiated, and unlike automated backups, are retained after the RDS instance has been terminated.

Restoring backups

When restoring either an automated backup or a snapshot, the database is restored as a brand new RDS instance with its own DNS endpoint.

How are RDS instances encrypted?

RDS instances can be encrypted at rest using the AWS Key Management Service (KMS). Once encryption is enabled on an instance, data is encrypted not only in the database itself, but also in its automated backups, snapshots and read replicas.

RDS Reserved Instances

Reserving RDS instances is more cost-effective than paying for them on-demand. It doesn’t change the way RDS works, and is available with all database engines in all regions. Reserved instances come in one- or three-year terms, and can be paid for entirely up-front, partially up-front, or month-by-month. The greater the up-front payment, the lower the overall cost.

DynamoDB

DynamoDB is Amazon’s managed NoSQL offering, designed for applications that need consistent, single-digit millisecond latency at any scale.

DynamoDB data is stored on SSDs and replicated across at least three separate physical data centres, providing great redundancy.

Dynamo offers two main kinds of read model: eventually consistent, and strongly consistent.

Eventually consistent reads (the default) give the best read performance, but a read might not reflect a write that completed within roughly the last second.

If data needs to be read less than one second after it is written, strongly consistent reads should be used: this ensures that any read reflects the latest successful write.
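
In practice the read model is chosen per request. The sketch below builds the arguments for a DynamoDB `GetItem` call, where the `ConsistentRead` parameter switches between the two models. The helper function, the `Orders` table and the `order_id` key are hypothetical names for illustration. The snippet only builds the request and does not talk to AWS.

```python
def build_get_item_request(table_name, key, strongly_consistent=False):
    """Build keyword arguments for a DynamoDB GetItem call.

    ConsistentRead is the request parameter that switches between the two read
    models; leaving it False gives the default, eventually consistent read.
    """
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


# "Orders" and "order_id" are hypothetical names.
key = {"order_id": {"S": "1234"}}
default_read = build_get_item_request("Orders", key)
fresh_read = build_get_item_request("Orders", key, strongly_consistent=True)
print(default_read["ConsistentRead"], fresh_read["ConsistentRead"])

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("dynamodb").get_item(**fresh_read)
```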

Dynamo Pricing Model

You will always pay for data storage, as well as reads and writes. If you stay within one region, there is no data transfer charge for using Dynamo. Other charges will apply for features such as continuous backups.

Redshift

AWS Redshift is a fast, fully-managed, scalable data warehousing service.

Redshift stores and compresses data by column rather than by row. The values in a single column share a type and often repeat, so they compress far better than mixed rows do, which gives Redshift a smaller storage footprint than traditional row-based relational storage.

Redshift doesn’t require materialised views or indexes, which also saves on space. Helpfully, Redshift selects the most appropriate compression technique based on a sample of your data.
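
To see why columns compress well, here is a toy example in Python using run-length encoding, which is just one simple compression technique. This is a sketch of the general idea only, not a description of Redshift’s internals, and the sample table is made up.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    return [[value, len(list(group))] for value, group in groupby(values)]


# Hypothetical table, stored row by row.
rows = [
    ("UK", "shipped"),
    ("UK", "shipped"),
    ("UK", "pending"),
    ("FR", "pending"),
    ("FR", "pending"),
    ("FR", "pending"),
]

# Pivot to columns: each column holds values of one kind, so repeats sit together.
country_column, status_column = (list(column) for column in zip(*rows))

print(run_length_encode(country_column))  # prints [['UK', 3], ['FR', 3]]
print(run_length_encode(status_column))   # prints [['shipped', 2], ['pending', 4]]
```

Six values per column shrink to two pairs each. Encoding the same data row by row would find far fewer repeats, because neighbouring values come from different columns.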

Backups in Redshift

Backups are enabled by default with a 1 day retention period, but this can be increased up to 35 days. In addition, Redshift always tries to maintain at least three copies of your data in total: the original and a replica on the compute nodes, as well as the backup copy mentioned above, which is stored in S3.

When using Redshift you will be billed for compute node hours, backups and data transfer within a VPC.

Redshift Security

Data in Redshift can be encrypted both in transit and at rest; keys are managed by Amazon by default, but can be self-managed.

Amazon Aurora — In Depth

Amazon’s Aurora database solution claims to offer the speed and availability of commercial databases with the simplicity and low cost of open source. It claims up to five times better performance than MySQL, while remaining highly available.

Aurora is compatible with MySQL as well as PostgreSQL.

What resources are available with Aurora?

Aurora storage starts at 10 GB, and automatically scales in 10 GB increments up to 64 TB. Compute resources can scale up to 32 vCPUs and 244 GB of memory.

All data stored in Aurora is stored in a minimum of three availability zones, each with two copies (6 copies total). Aurora is designed to be able to lose two copies without affecting write availability, and three copies without affecting read availability.
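
The numbers above can be turned into a small availability check. The sketch below only encodes the thresholds stated in the text (six copies, writes tolerate two lost, reads tolerate three lost); the function name is made up and this is not Aurora code.

```python
TOTAL_COPIES = 6          # two copies in each of three availability zones
MAX_LOST_FOR_WRITES = 2   # from the text: writes survive losing two copies
MAX_LOST_FOR_READS = 3    # from the text: reads survive losing three copies


def availability(copies_lost):
    """Return (can_write, can_read) for a given number of lost copies."""
    if not 0 <= copies_lost <= TOTAL_COPIES:
        raise ValueError("copies_lost must be between 0 and 6")
    return copies_lost <= MAX_LOST_FOR_WRITES, copies_lost <= MAX_LOST_FOR_READS


for lost in range(5):
    can_write, can_read = availability(lost)
    print(f"lost={lost} write={can_write} read={can_read}")
```

One consequence: losing a whole availability zone takes out two copies, which still leaves the database writable. Losing a zone plus one more copy leaves it readable but not writable.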

The underlying storage in Aurora is automatically scanned for errors and repaired.

Aurora Read Replicas

Aurora databases can be replicated using either MySQL read replicas or Aurora replicas. Each option has its own attributes.

MySQL replicas

  • Number of replicas: up to 5.
  • Replication type: asynchronous (seconds).
  • Performance impact on primary: high.
  • Automated failover: no.
  • Act as failover target: yes (with some temporary data loss).
  • Supports user-defined replication delay: yes.
  • Supports a schema that differs from the primary: yes.

Aurora replicas

  • Number of replicas: up to 15.
  • Replication type: asynchronous (milliseconds).
  • Performance impact on primary: low.
  • Automated failover: yes.
  • Act as failover target: yes (without data loss).
  • Supports user-defined replication delay: no.
  • Supports a schema that differs from the primary: no.

Aurora Backups

Automated Aurora backups are always enabled and do not affect the performance of the database. Snapshots are also available and likewise do not affect performance. Snapshots can be shared with other AWS accounts.

ElastiCache

What is ElastiCache? It’s a service that makes it easy to deploy, operate, and scale an in-memory cache. It improves the performance of applications by allowing you to retrieve data from in-memory caches, instead of slower disk-based databases.

ElastiCache is compatible with Memcached and Redis. To summarise, Memcached is a simpler solution, but with a more limited feature set than Redis.
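
A common way to use a cache like this is the "cache-aside" pattern: look in the cache first, and only go to the database on a miss. Here is a minimal sketch in Python. A plain dictionary stands in for Redis or Memcached, and the `CacheAside` class and `slow_lookup` function are made up for illustration.

```python
import time


class CacheAside:
    """Cache-aside lookup: try the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60):
        self._load_from_db = load_from_db   # slow lookup, e.g. a SQL query
        self._ttl = ttl_seconds
        self._cache = {}                    # stand-in for Redis or Memcached

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                 # cache hit, entry not yet expired
        value = self._load_from_db(key)     # cache miss: go to the database
        self._cache[key] = (value, time.monotonic() + self._ttl)
        return value


db_calls = []


def slow_lookup(user_id):
    db_calls.append(user_id)                # record each trip to the "database"
    return {"id": user_id, "name": f"user-{user_id}"}


users = CacheAside(slow_lookup)
users.get(42)
users.get(42)
print(len(db_calls))  # prints 1: the second call was served from the cache
```

The expiry time (TTL) is the trade-off to tune: a longer TTL saves more database reads but serves stale data for longer.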
