How fast can one create records in a low-budget Amazon Cloud PostgreSQL?

Vadzim Tashlikovich
6 min read · Jun 17, 2022


In short, it’s really fast. An AWS PostgreSQL instance for $110 per month can create around 370 records per second in a heavily indexed table suitable for storing banking transactions. In a simple table with almost no indexes, the speed is 490 rps.

In this article I will share the details of my experiment. I will show you the test stand I used and the exact table configuration I created for the tests. You will also see how quickly DB queries were executed on 15M+ data sets. I hope you are already excited!

What is low-budget?

Amazon’s cloud offerings start from installations with 2 vCPUs and 8GB of memory, and that’s the minimum you can get. I focused on this initial setup, the one marked as db.t3.large. It costs $110 per month (as of Sep 2022) with 30GB of disk storage and 80% CPU usage (US region).

The test stand

Below is the test stand I used in my experiment. The Loader app connects to a PostgreSQL instance over the local network and executes INSERT statements directly. It runs on a simple EC2 instance, t4g.micro: 2 vCPUs and 1GB of memory.

Test stand component diagram

For the software service implementation, I used Node.js (v16.15) + Sequelize.

Dummy data was generated using the withshepherd/faker library. The loader can generate up to 3.8k records per second locally, without a database connection. The average ping between the application and the database is ~0.151ms.
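For illustration, here is a minimal sketch of what such a loader loop can look like with Sequelize and faker. The model and column names below are assumptions made for this example; the actual loader and the create scripts live in the project repository.

```typescript
// Minimal loader sketch: one INSERT per record, executed sequentially,
// with a rough records-per-second figure printed at the end.
import { Sequelize, DataTypes } from "sequelize";
import faker from "@withshepherd/faker"; // faker fork named in the article

const sequelize = new Sequelize(process.env.DB_URL!, { logging: false });

// A stripped-down, hypothetical Transaction model, just enough for the loop.
const Transaction = sequelize.define(
  "Transaction",
  {
    uid: { type: DataTypes.UUID, primaryKey: true },
    amount: { type: DataTypes.DECIMAL(20, 2), allowNull: false },
    currency: { type: DataTypes.STRING(3), allowNull: false },
    description: { type: DataTypes.STRING },
  },
  { tableName: "Transaction" }
);

async function run(totalRecords: number): Promise<void> {
  const started = Date.now();
  for (let i = 0; i < totalRecords; i++) {
    // Direct insert over the local network, one record at a time.
    await Transaction.create({
      uid: faker.datatype.uuid(),
      amount: faker.finance.amount(),
      currency: faker.finance.currencyCode(),
      description: faker.finance.transactionDescription(),
    });
  }
  const seconds = (Date.now() - started) / 1000;
  console.log(`inserted ${totalRecords} records, ~${Math.round(totalRecords / seconds)} rps`);
}

run(1_000_000).finally(() => sequelize.close());
```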

I tested two PostgreSQL services provided by AWS:

  • RDS for PostgreSQL
  • Aurora PostgreSQL

In my tests I used the default DB configuration. I’m sure the numbers I got during my experiments could be even higher after some tuning.

Tables

When filling test tables with data, I used the fintech industry as the subject area, so my experiment project operates on records in two key tables named Account and Transaction. You can find the structure of the tables (create scripts) in the repository of this project.

It’s important to mention the overall size of each table record and the indexes involved:

  • Account record ~ 964 bytes. 1 index, size: 19 bytes.
  • Transaction record ~ 1097 bytes. 4 indexes: 33 bytes, 8 bytes, 29 bytes, 45 bytes (the heaviest one).

The Account table will serve as the example with a minimum of indexes. The Transaction table is the complex, heavy one and will get all the heat.
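If you want to check numbers like these on your own tables, here is a small sketch (not taken from the project itself) that reports the average record size and the size of every index through Sequelize. Only the table name "Transaction" comes from the article; everything else is an assumption.

```typescript
// Report average row size and per-index size for a given table.
import { Sequelize, QueryTypes } from "sequelize";

const sequelize = new Sequelize(process.env.DB_URL!, { logging: false });

async function reportSizes(table: string): Promise<void> {
  // Average on-disk size of one record (table name is trusted input here).
  const [stats] = await sequelize.query<{ row_count: string; avg_row_bytes: string }>(
    `SELECT count(*) AS row_count, avg(pg_column_size(t.*))::int AS avg_row_bytes
       FROM "${table}" t`,
    { type: QueryTypes.SELECT }
  );
  console.log(`${table}: ${stats.row_count} rows, ~${stats.avg_row_bytes} bytes per record`);

  // Total size of each index, plus a rough per-entry estimate.
  const indexes = await sequelize.query<{ indexname: string; bytes: string }>(
    `SELECT indexname,
            pg_relation_size((quote_ident(schemaname) || '.' || quote_ident(indexname))::regclass) AS bytes
       FROM pg_indexes
      WHERE tablename = :table`,
    { replacements: { table }, type: QueryTypes.SELECT }
  );
  for (const idx of indexes) {
    const perEntry = Number(idx.bytes) / Math.max(Number(stats.row_count), 1);
    console.log(`  ${idx.indexname}: ${idx.bytes} bytes total, ~${perEntry.toFixed(0)} bytes per entry`);
  }
}

reportSizes("Transaction").finally(() => sequelize.close());
```

Note that the per-entry estimate includes page and header overhead, so it will be slightly larger than the raw key size.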

1M transactions

So how fast are records inserted when the big table already contains 1M records?

Record insertion speed (records per second)

🚀 It runs fast! The storage is almost empty; 1M records is nothing for a database these days.

And here comes a surprise: the RDS installation dominates. Why? I tried different configurations and found out that the difference is in … disks. The Aurora services just do not let you choose the type of disk. There is no SSD/HDD switch at all, so you can only guess what’s used there.

But for RDS it’s all clear: you can select the disk type, from General Purpose SSD to Provisioned IOPS SSD (io1). For 1M records the insertion rate varies from 383 to 396 rps for the Transaction table (the complex one).

In my tests, Provisioned IOPS SSD was only 3% faster but a lot more expensive, at least $100 more. So a simple General Purpose SSD in RDS is already a brilliant choice.

Now, let me show you how select queries performed with 1M records. Checking the performance with the standard psql tool gave:

  • Accounts’ select for 30 records: ~5 ms
  • Transactions’ count(*): ~140 ms
  • Transactions’ select for 30 records with filters: ~1.5 ms
  • Transactions’ grouping by account: ~1.6 ms

The difference between RDS and Aurora was not so noticeable here.
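For reference, these are roughly the kinds of queries behind those timings, executed through Sequelize. The column names (createdAt, status, accountId) are assumptions for the sake of the example; the real schema is in the project’s create scripts.

```typescript
// Run the four benchmark-style queries and print their round-trip times.
import { Sequelize, QueryTypes } from "sequelize";

const sequelize = new Sequelize(process.env.DB_URL!, { logging: false });

async function timed(label: string, sql: string): Promise<void> {
  const started = process.hrtime.bigint();
  await sequelize.query(sql, { type: QueryTypes.SELECT });
  const ms = Number(process.hrtime.bigint() - started) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
}

async function main(): Promise<void> {
  await timed("accounts, 30 records", `SELECT * FROM "Account" ORDER BY "createdAt" DESC LIMIT 30`);
  await timed("transactions, count(*)", `SELECT count(*) FROM "Transaction"`);
  await timed(
    "transactions, 30 records with filters",
    `SELECT * FROM "Transaction" WHERE "status" = 'completed' ORDER BY "createdAt" DESC LIMIT 30`
  );
  await timed(
    "transactions, grouping by account",
    `SELECT "accountId", sum("amount") FROM "Transaction" GROUP BY "accountId" LIMIT 30`
  );
  await sequelize.close();
}

main();
```

Unlike psql on the DB host, these timings include the application-to-database round trip, so expect them to be slightly higher.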

15M transactions

The charts below show what happened when the number of records was increased to 15M.

Record creation speed (records per second)

The speed of record insertion into the simple Account table stayed the same, but for the Transaction table it dropped a bit, down to 370 rps. Aurora tried to climb higher but had no chance; it had already lost the battle 🏁.

With 15M records, queries started working significantly slower, around 11x. Grouping operations fared a bit better; it seems the database’s caching worked well there.

Query execution time (ms)

As you can see, in these tests Aurora performed almost the same as RDS. However, I stopped checking Aurora on larger data sets, so the further statistics are for RDS only.

30M transactions

Insert rate (records per second) vs data volume

With 30M records in the table, the insert rate decreased to 348 rps. That’s 12% lower than with 1M records, but still quite impressive.

Interesting fact: at around 22M records, the insertion rate suddenly dropped to 99 rps. After a 15-minute pause it returned to normal. I ran a separate experiment without the heaviest index (45 bytes), and the insertion speed stayed constant the whole time. So the database could not keep up with such a big index and needed some time to restructure it.
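If you want to reproduce that “no heavy index” experiment yourself, one way is to drop the index before the bulk load and rebuild it afterwards. A minimal sketch, assuming a hypothetical index name and column set:

```typescript
// Drop the heaviest index for the duration of a bulk load, then rebuild it.
import { Sequelize } from "sequelize";

const sequelize = new Sequelize(process.env.DB_URL!, { logging: false });

async function loadWithoutHeavyIndex(load: () => Promise<void>): Promise<void> {
  // Remove the expensive index so each INSERT maintains three indexes, not four.
  await sequelize.query(`DROP INDEX IF EXISTS "transaction_heavy_idx"`);
  try {
    await load(); // e.g. the loader loop from earlier
  } finally {
    // Rebuild afterwards; CONCURRENTLY avoids blocking other writers.
    await sequelize.query(
      `CREATE INDEX CONCURRENTLY "transaction_heavy_idx"
         ON "Transaction" ("accountId", "createdAt", "status")`
    );
  }
}
```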

What about queries? Doubling the data required 50% more time for select query execution, which is not bad. Alas, grouping query execution time increased 5x.

Does table size affect creation speed?

Yes, but not so significantly at the 30M-record scale: only a 12% decrease, and still fast at 348 rps.

The same experiment was performed on a bigger setup: 4 vCPUs and 16GB of memory. The insertion speed didn’t change. What really made a difference was using an SSD disk. So when selecting your next database setup, mind the storage access speed.

Summary

My experiments have shown that even the minimal Amazon cloud PostgreSQL setup can give you great insertion speed. However, using an SSD disk is strongly recommended.

Trying both RDS and Aurora, I found out that only RDS allows you to select a disk type for your needs. For this reason, I can’t recommend Aurora for software systems under high load.

The size of a table’s indexes really matters when designing your database structure. Heavier indexes require more PostgreSQL resources, which leads to a decrease in insertion speed.

If you expect data volumes bigger than 30M records, start using PostgreSQL partitions. They will help you segregate the load by some criterion, though which one depends on the business model of your application.
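As a starting point, here is a minimal sketch of declarative range partitioning by month. All table and column names are hypothetical; adapt them to your own schema.

```typescript
// Create a range-partitioned variant of the Transaction table.
import { Sequelize } from "sequelize";

const sequelize = new Sequelize(process.env.DB_URL!, { logging: false });

async function createPartitionedTransactions(): Promise<void> {
  // The primary key of a partitioned table must include the partition key.
  await sequelize.query(`
    CREATE TABLE "TransactionPartitioned" (
      "uid"       uuid        NOT NULL,
      "accountId" uuid        NOT NULL,
      "amount"    numeric     NOT NULL,
      "createdAt" timestamptz NOT NULL,
      PRIMARY KEY ("uid", "createdAt")
    ) PARTITION BY RANGE ("createdAt");
  `);

  // One partition per month; PostgreSQL routes inserts automatically.
  await sequelize.query(`
    CREATE TABLE "TransactionPartitioned_2022_06"
      PARTITION OF "TransactionPartitioned"
      FOR VALUES FROM ('2022-06-01') TO ('2022-07-01');
  `);
}

createPartitionedTransactions().finally(() => sequelize.close());
```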

P.S. If you are interested, the tests took around 48 hours to run. It’s hard to believe, but it was boring (filling up tables with millions of records) and exciting at the same time :)

Speed icon by Icons8. Repository of the article.
