Hack Reactor SDC | Part III | PostgreSQL vs MySQL

Chase Norton
Published in Glitter Guys
3 min read · Mar 21, 2018

10 million…seems like a big number, right? As I’ve come to find out, in the world of Big Data, 10 million is your typical Tuesday. After creating and seeding 10 million data points into MongoDB using mongoose/faker/async/await, I shift my focus to the establishment: you know, SQL. Why, you might ask?

Chase, your Schema is so simple! You’ll never need joins or complicated relationships!

I know, and I agree to a point. But just because it is simple does not mean that NoSQL is any faster than SQL. As scientists and engineers, we are allowed to hypothesize, but we must then test to determine which is better for our current project/Schema.

The first question in the SQL world: which SQL database should we work with? MySQL has an easier setup and tends to have more documentation and articles written about it, while PostgreSQL is the ideal choice for production databases where read and write speeds are critical and first-class JSON support matters. The trade-off is that user-friendly PostgreSQL documentation is not as readily available, and the setup is significantly more involved.

I decided to attempt PostgreSQL, but would timebox it to 4 hours: if I could not get PostgreSQL set up, running, and seeded in that window, I would fall back to MySQL.

Turns out, it takes about 25 minutes to set up PostgreSQL on a Mac using the Postgres GUI install. From there, I was able to create two tables and connect them via a foreign key, all in under an hour. The next step was to find a way to allow my JavaScript to connect to the database, seed it, and then query it.
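For context, here is a rough sketch of what that initial two-table layout might look like. The table and column names are my own placeholders for illustration, not the project’s actual schema:

```javascript
// Hypothetical starting point: meetup details in one table, photos in another,
// linked back to details by a foreign key. Run the DDL via psql or any client.
const createTwoTables = `
  CREATE TABLE details (
    id          SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    address     TEXT
  );

  CREATE TABLE photos (
    id        SERIAL PRIMARY KEY,
    detail_id INTEGER REFERENCES details(id),  -- foreign key joining photos to details
    url       TEXT NOT NULL
  );
`;
```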

UPDATE: I switched to one table (a sketch of the consolidated layout follows this list) for the following reasons:

  • Postgres is able to handle arrays
  • My one-to-many relationship of details to photos did not yield any speed improvement and only confused the matter
  • Most importantly, I would never need to access a table of photos separately from the meetup details
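For reference, a sketch of the consolidated layout, again with placeholder names; the separate photos table collapses into a Postgres array column on the details row:

```javascript
// The two hypothetical tables above folded into one: photo URLs live in a TEXT[] column.
const createMeetupsTable = `
  CREATE TABLE meetups (
    id          SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    address     TEXT,
    photos      TEXT[]  -- Postgres array column replaces the separate photos table
  );
`;
```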

The first step most people attempting this will take is to reach for a library called pg. It sets up a connection with your database, lets you send queries, and is fairly straightforward to configure. In total, it took about 30 minutes of reading to get the configuration correct and begin seeding my database with faker. This is when I began to see the limitations of pg: out of the box, I was inserting one row at a time. Inserting 10,000 data points took 4.3 seconds, which works out to roughly 71 minutes for all 10 million. Ugh :( Wasn’t Postgres supposed to be fast? Especially compared to MongoDB? How do I improve this?
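To make that bottleneck concrete, here is a minimal sketch of the one-row-at-a-time approach with pg and faker, using the hypothetical meetups table from above; the connection details and generated fields are assumptions for illustration.

```javascript
const { Pool } = require('pg');
const faker = require('faker');

// Connection details are placeholders.
const pool = new Pool({ database: 'meetups_dev' });

const seedOneAtATime = async (count) => {
  for (let i = 0; i < count; i++) {
    // One INSERT, and therefore one round trip to the database, per row.
    await pool.query(
      'INSERT INTO meetups (title, description, address, photos) VALUES ($1, $2, $3, $4)',
      [
        faker.company.catchPhrase(),
        faker.lorem.paragraph(),
        faker.address.streetAddress(),
        [faker.image.imageUrl(), faker.image.imageUrl()], // pg serializes JS arrays to TEXT[]
      ]
    );
  }
  await pool.end();
};

// At the pace described above (~4.3 seconds per 10,000 rows),
// 10 million rows would take over an hour.
seedOneAtATime(10000).catch(console.error);
```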

The answer? PG-Promise — I promise.

The pg-promise library has many features, and I will only touch on a few. After the 4.0 update, it has a helpers.insert method that builds a single insert for many rows at once, which reduces the number of connections and round trips to the database. It also fully supports async/await (ES2017), which makes communication between the server and the database much cleaner and easier to read and work with.
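Here is a sketch of what the batched seeding might look like with pg-promise’s helpers, again against the hypothetical meetups table; the connection string, batch size, and row builder are my own choices for illustration.

```javascript
const pgp = require('pg-promise')({ capSQL: true });
const faker = require('faker');

// Connection string is a placeholder.
const db = pgp('postgres://localhost:5432/meetups_dev');

// Reusable column definition for the multi-row insert.
const cs = new pgp.helpers.ColumnSet(
  ['title', 'description', 'address', 'photos'],
  { table: 'meetups' }
);

const makeRow = () => ({
  title: faker.company.catchPhrase(),
  description: faker.lorem.paragraph(),
  address: faker.address.streetAddress(),
  photos: [faker.image.imageUrl(), faker.image.imageUrl()],
});

const seedInBatches = async (total, batchSize = 10000) => {
  for (let inserted = 0; inserted < total; inserted += batchSize) {
    const rows = Array.from({ length: batchSize }, makeRow);
    // helpers.insert builds a single multi-row INSERT for the whole batch,
    // so each batch costs one round trip instead of 10,000.
    await db.none(pgp.helpers.insert(rows, cs));
  }
};

seedInBatches(10000000)
  .then(() => pgp.end())
  .catch(console.error);
```

The win comes from collapsing thousands of round trips into one statement per batch; the database does roughly the same amount of work, but the per-query overhead disappears.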

After more reading of the pg-promise docs, blogs, and a few videos, everything was running like a well-oiled machine. The result?

10 million data points seeded in 13.2 minutes.

Compared to the first attempt, we achieved an 81% decrease in seed time by switching over to pg-promise. Crazy? Nah, just another day at Hack Reactor.

Just Keep Coding,

Chase Norton
