Stop Crippling Your Database (3/6): Types and Constraints

Alessandro Desantis
Ruby on Rails
Published in
4 min readNov 15, 2016

This is the third of a series of six articles on how to improve the way you interact with your DBMS, especially in the context of Rails development. The previous one was entitled “Meet Triggers, SQL’s Callbacks”. Enjoy!

Welcome back! It’s been a while since I’ve talked about how to make your database do more for you. Today we’re going to talk about some of the most common mistakes we make when creating our tables.

Let’s dive right in.

You’re Not My (Native) Type

ActiveRecord has a nifty little feature that allows you to serialize any kind of object in your database. You can just create a text column and define it as serialized and ActiveRecord will automatically serialize and deserialize whatever you put into it.

Let’s try it with an array. Suppose we want to have a tags array in our Post model. We can have the following migration and model:

Internally, ActiveRecord will use YAML to serialize and store the value of the tags column. We can now write code like this:

Now, what if we wanted to retrieve all the posts with the foo tag? That’s right, we can’t. Not unless we use some ugly and error-prone regular expressions.

It would be awesome if our DBMS allowed us to define the tags column as its real type — an array — rather than a text column which has to be serialized by our app, wouldn’t it? Well, that’s what PostgreSQL arrays are for.

Let’s rewrite our migration and model the right way:

We can still use the tags column just as if it were a native Ruby array, but now we can also retrieve all posts with the “foo” tag:

The above was just an example of a native DB type that’s rarely leveraged in Rails apps. PostgreSQL has built-in types for a lot of different data structures that you probably don’t know about: hashes, ranges, JSON objects… you name it! For more information, refer to the ActiveRecord documentation.

To NULL or NOT to NULL?

I bet you’re familiar with the ActiveRecord presence validation and use it daily in your Rails applications. For instance, if you wanted to ensure your posts all have a title, you could write this migration for the title column:

However, nothing prevents a third-party with access to your database (or even a call to update_column, which skips AR validations) to set the title column to null.

This can be solved with a NOT NULL migration:

Attempting to create a post without a title or setting an existing post’s title to to null will now raise an exception.

However, this does not mean that you can remove the presence validation from the Post model:

  • it is much more user-friendly to present a validation error than raise an exception when a field is missing;
  • even though the column cannot be set to null, it can still be set to an empty string, which is only caught by the Rails presence validation.

Yes, Size Does Matter

Did you know that not all string columns are created equal? When you create a string column, you can specify its maximum length.

Suppose you have a credit_cards table where you store, among other information, the last 4 digits of each card, so that the customer can easily identify it. (You’re not storing the full card number, right?!) In that case, there is no need for this column to allow strings longer than 4 characters:

This will throw an error if you try to enter a string longer than 4 characters in the last4 column (unless the additional characters are all spaces, as specified by SQL-92).

You can also specify the precision and scale of a decimal, just like you specify the maximum length of a string. The precision is the maximum number of digits that can be used to represent the number, while the scale is the maximum number of decimal digits. The biggest number representable with a precision of 5 and a scale of 2, for instance, would be 999.99. The migration syntax is quite similar:

As one would expect, numbers are handled a little differently than strings in this case. Here’s what the PostgreSQL manual says about it:

If the scale of a value to be stored is greater than the declared scale of the column, the system will round the value to the specified number of fractional digits. Then, if the number of digits to the left of the decimal point exceeds the declared precision minus the declared scale, an error is raised.

In the case of our price column, here’s what will happen for different values:

  • 999.989 will be rounded to 999.99;
  • 1000.0 will throw an error;
  • 999.999 will be rounded up to 1000.0 and throw an error.

As you have seen, size matters.

Note: There is a widespread myth that a string column with a maximum length of, for instance, 100 characters always requires more storage space than a column with a maximum length of 50 characters. This is not true. Your DBMS might use 2 bytes instead of 1 to store the column’s content’s size if the maximum length is more than 255 characters, but it’s definitely not the kind of optimization worth pursuing. The same is true for numeric columns: the size of the column is only determined by the data you put into it.

This was the third of a series of six articles on how to improve the way you interact with your DBMS, especially in the context of Rails development. The next one will be about indexes and foreign keys. Stay tuned!

--

--

Alessandro Desantis
Ruby on Rails

Technology leader and strategic advisor. I work at the intersection of people, software and words.