HBS Relational databases topic guide

Najaf Ali
Happy Bear Software
7 min readJan 17, 2017

Working with relational databases is part of the core skill set for full-stack developers at Happy Bear Software. It’s a topic that new developers get to last because it’s seen as “back-end” and perhaps more difficult than front-end development. While it may require more imagination and visualisation ability to master, it’s nevertheless an essential part of of web developers toolkit.

Relational databases represent the core data model of your web application. How you structure a relational database in your application has a significant long-term effect on how that application will work. The right structure will lend itself more easily to modification and extension later. The wrong one will lead to architectural, performance-related, and other limitations very quickly.

This article drills down into the specifics of what we expect developers to know while working at Happy Bear Software. For each sub-topic, we’ll list what the thing is, why you should know the thing, and how to get better at the thing if you’re not already good at it.

The things are, in no particular order:

  • ACID
  • Basic SQL for CRUD
  • Designing relational databases
  • Querying relational databases
  • Query Performance
  • PostgreSQL
  • ActiveRecord and ORMs

ACID

ACID stands for Atomicity, Consistency, Isolation, Durability. It is a set of properties of database transactions. Relational databases that we use to deliver software will satisfy some or all of these criteria.

You should understand ACID because you’ll often be presented with the choice of what data store to use in a given project. Non-relational databases typically sacrifice one or more of the ACID guarantees to better suit a specific use case. In order to make an intelligent decision either way, you’ll need to understand what ACID guarantees are and the implications of sacrificing one or more of them.

A good start is to read the wikipedia page on ACID, and to look for articles about how and to what degree ACID is implemented in RDBMS’s like PostgreSQL, MySQL, and Oracle.

Basic SQL for CRUD

SQL stands for Structured Query Language, and is the pre-dominant interface you will use for interacting with relational databases. CRUD stands for Create, Read, Update, Delete. Aside from manipulating a database schema itself, these are the operations you will use SQL for.

You should learn to do basic CRUD operations in SQL because these are the SQL statements that web applications you create will be sending to the database they’re integrating with. Without knowledge of how those SQL statements work (beginning with how to write them yourself) you won’t be able to reason about their operation in terms of functionality or performance.

You should learn basic SQL by working through an introductory text, including plenty of exercises that get you writing queries by hand in an SQL prompt. At this level, there’s little difference between the most commonly used database systems, so SQLite is fine to start with.

Designing relational databases

The design of a relational database is the high-level architecture of a web application. When talking to a potential client about a feature, you may already be mentally visualising what the data model would look like. Relational databases are designed to handle just about any use case you can come up with, provided you get the design of the schema right.

Relational database design gives you a number of tools to put to use when designing a schema. One-many relationships model one thing being related to many other things. Many-many relationships model two-way relationships with many things related to zero or more many other things, often using a join table. Parent-child relationships model things in the same table related to one other thing in that table. Database normalisation gives you a framework for thinking how to decide what fields should go in what tables so that the database is overall more amenable to good operation and future change.

To learn to be better at relational database design, read about these topics in depth and do plenty of design exercises. Read about “normal form” (no need to understand it in depth, but it’s worth reading through the first three stages). Think about how you would model Eventbrite, GitHub, Twitter, or other services you use. What would be a many-one relationship? What would be many-many? Write out the SQL required to create your database schema designs and run them, just like you would with a programming exercise.

Querying relational databases

Besides the basic ability to read fields out of database, much of the power of using a relational database lies in the ability to query it in ways you won’t foresee at the design stage. Your ability to query the database in flexible ways while still being performant will depend heavily on how you designed the schema.

At the most basic level, querying a database schema involves selecting, filtering, and ordering the data you want from a single table. More advanced queries using inner/outer left/right joins allow you traverse different tables, returning data in a second table “joined” to data in a first. SQL also provides aggregate functions that allow you to count, sum, average, and perform other operations produce a single value from a set of data. Yet more advanced queries can be built by using the result of one query as part of another, a facility known as subqueries.

You should learn these techniques because they will often be put to use in the web applications we build for our clients. In some cases you will be writing these queries by hand and in others an ORM will be generating them for you. Without an understanding of how non-trivial queries are put together, it will be difficult for you to reason about the capabilities of a relational database system. Understanding how these queries work may also inform your database schema design decisions and help you make good decisions about query performance.

You can learn how to write these queries by practicing. You can do this using sets of exercises like SQLBolt, by doing exercises out of books, or by getting access to publicly available data and experimenting with SQL queries on it.

Query performance

The performance of SQL queries is a complex topic. Your role will be to determine what there performance characteristics of the SQL your application runs are and take steps to make those performance characteristics better.

This will involve looking at the queries being made to the database and using debugging tools to analyse how the database is executing those queries (typically the query planner). Based on this you will need to make decisions about creating/removing indexes of various types, keeping in mind the trade-off in terms of keeping those indices up to date.

A good place to start learning about query planning and indexing is the documentation of various database management systems. MySQL for example will have a different approach to optimising query performance than PostgreSQL will. There are also several books on the topic. You can also jump into most Rails codebases and identify at least one or two query performance issues that can be optimised by the intelligent use of indexes.

PostgreSQL

While it’s good to have a general knowledge of different relational database systems, the one we end up using the most is PostgreSQL. This is partly because it is the default database provided on our PaaS of choice, Heroku. It’s also because it’s a rock-solid piece of software that has been around for thirty years. For the vast majority of the software you will build with us, PostgreSQL will handle all of your use cases just fine.

You can learn more about PostgreSQL by working through documentation, books, and any other written material you can get your hands on. Get it running locally and do all the exercises you’re using to learn SQL in PostgreSQL if you haven’t been doing so until now. Have a look at the special index types that it makes available to you that other databases don’t. The JSON data type is another PostgreSQL specific feature it may be worth getting acquainted with.

Using ORMs like ActiveRecord

We’re a Ruby on Rails consultancy so much of your interaction with the database will be via the ActiveRecord ORM. Using ActiveRecord effectively is a skill in itself that it’s worthy of your time to spend developing.

Learn how migrations work. In addition to the documented interface, have a look at how indexes are created and how you might create some of the more esoteric index types in PostgreSQL using the ActiveRecord migration interface. If you can’t bend the migration code to your will, can you make it so that you can send raw SQL to do what you want?

Learn the ActiveRelation query interface, and take a look at the query that gets output to the database. This information is usually available in the development logs, or you can get it inline by calling `.to_sql` on an ActiveRelation object. Learn to name parts of a relation and group them into chainable named scopes.

More generally, think about the queries that the ORM is generating and what this means for the performance of your application. Here we’re talking not about the performance characteristics of a particular query, but instead the number of queries your application is sending to the database. If you’re getting a big list of records, only to iterate over and do an additional query for each one, you’ve run into the classic N+1 query problem, for example. A single query represents a trip across the network, so one single query loading tons of data is almost always better than lots of queries loading a small amount of data.

Learning how and how best to use an ORM like ActiveRecord takes time and experience. The relevant Ruby on Rails Guides are a good place to start, but the only meaningful way to get practice with it is to create real software with it.

That’s a broad overview of the range and extent of skills and knowledge we want from developers at Happy Bear Software. I’ve left a lot of things (e.g. stored procedures, views, constraints) off this list that may be useful but we just haven’t seen come up yet on client projects. This list, the specific points of emphasis and the guides to improving could also do with a bit of work. This article should however give you ideas for where to improve your abilities with databases if you have time to focus on them.

--

--