7 reasons not to use composite keys

Pablo Dall'Oglio
Jan 26 · 13 min read

You probably know people who advocate single primary keys, and others who advocate composite primary keys, both passionately. In this article, I will try to present technical arguments whereby I believe that the use of single primary keys makes the development and maintenance of complex systems easier. I will be quite rational and I will support my considerations in respected authors like Eric Evans, Martin Fowler, besides my own experiences.

To begin, let’s imagine a (partial) scenario. An academic system, where we have the concept of person, contract, class, enrollment, course, subject, and others. A student may have a contract, which in turn links the student to a course. Let’s assume there is a business rule, which says that a student can only have one contract with each course. So if he leaves the course and returns to the course, we are talking about the same contract that is reactivated. The contract then binds the student to a course and has enrollments, which are records that link that contract to the classes. An enrollment is the record of that student (who is represented by the contract) in a class. The enrollment is not directly linked to the person, because the school records need to know which course (contract) it refers to. Enrollment is linked to a class, which represents a subject within a semester.

Obviously the system is incomplete, lacking many attributes, relationships and others. For example, we would have to link the subjects to the courses (N-N), but this does not contribute to the understanding of the purposes of this article. So let’s simplify things.

Reference model using simple keys

First let’s see the model using single primary keys. Note that each table necessarily has an ID to identify a record row in that table. So whenever we need to perform an operation on a student’s enrollment, for example, it would enough to know the record ID. The relation to the contract is performed by a simple foreign key, the same for the class. We can also create a unique index for foreign keys, so that the database itself prevents the insertion of repeated combinations of records (contract + class).

Reference model with single primary keys

What not to choose as primary key

When modeling classes, some of them are Entities, other Value Objects, some Aggretates. For Evans, an object of type ENTITY is an object that has a unique identity, which lasts in its life cycle. Take a person, for example, his attributes may change, but his identity remains. An ENTITY can be a person, a contract, an enrollment, etc.

An ENTITY can be transmitted by a Web Service, stored and retrieved from the database multiple times. In each process, a new object is instantiated in memory. What ensures identity is a unique and immutable attribute.

Many people are tempted to use real-world attributes as the key for an object. Identifying a person by Social Security number, a book by ISBN are some examples. But why avoid these strategies that use real-world attributes to identify objects?

The identification of an object must be unique and immutable. Otherwise, what would happen if the keys kept changing at all times? Why would fields like ISBN or Social Security change? I explain … SSN represents a person for the national record system, it does not uniquely identify that person in the universe. That way, the country can change its registry system, changing also the values of these keys, and even their key names. So your database would be susceptible to these changes. Imagine running UPDATE on several tables that refer to the person’s Social Security Number, which is no longer called "Social Security Number", and now has a different structure.

So to avoid these problems, remember that a key must be unique and immutable. You must ensure these conditions and link the database records so that the structure does not have to undergo future changes. To do this, use artificial keys (surrogate), generated by the database itself (autoincrement). Artifitial keys will ensure that identification will not undergo changes, and so will bring greater stability to the structure (primary key and foreign key relationships).

Elton Minetto adds that “a major trend is to use UUIDs for key fields, avoiding using database auto_increment.” For him, UUIDs “help migrate from databases or even have distributed databases at some point in the project.” The friend Renato Mendes Figueiredo complements that UUIDs “relieve the database load in generating ids and make the application not worry about order or oneness”.

Reference Model Using Composite Keys

Even if you persist in the idea of using composite primary keys, in this version of the model, we would have some changes. The simplest tables (person, discipline, course) would continue the same way, since they only have one attribute for identification, anyway. Things start to change for contracts. As a contract is unique by the combination person + course, it would have a primary key composed by person_id and course_id. So, whenever we want to perform some operation on the contract, we would need to enter the two keys. The same is true of enrollment, which previously had the key to the contract and the class. Since the contract key is composed, it is now represented by a triple composite key (person, course, and class) to prevent this combination from recurring. And whenever we want to perform any operation on a registration (select, update), all keys must be informed.

Model with composite primary keys

Now we return to the original motivation of this article. Motivate the reader to build systems with the use of single, non-composite primary keys. We try to organize the reasons in topics to make the discussion easier, as well as the visualization of the reasons.

1. Persistence is much simpler

A known Design Pattern to implement the persistence of an ENTITY is an ACTIVE RECORD. For Fowler, an ACTIVE RECORD is “An object that represents a row in a table, and encapsulates access to data and adds domain logic”. ACTIVE RECORD is possibly the most commonly used pattern in persistence layers because of its ease of use. The following is an example where we load the Contract object into memory based on its ID. Think how easy is to map objects between database and memory with a single field to identify that object. Now imagine loading this object from various attributes that would identify it when using composite keys. You can use other patterns like DATA MAPPER, but then the game is not as fun, because each class would require a mapper for it, that is, more code for maintenance.

<?php
$contract = Contract::find($param['id']);
$contract->deactivation_date = date('Y-m-d');
$contract->save();
?>

One of the requirements for using ACTIVE RECORD is another pattern called IDENTITY FIELD. For Fowler, an IDENTITY FIELD “Saves a database ID field in the object to keep the identity between the memory object and the row in the database.”

While in the Database we use primary keys to reference records, in memory we use pointers (variables) to reference objects. To maintain the relationship between Database and memory, an identification field is necessary that must be preserved in memory, to know that when it is updated in the database, it is the same record, not a new one. Usually persistence mechanisms do much better with single key. Remember to use a surrogate key, less susceptible to real-world changes.

2. Object Cache (Identity Map)

A few years ago, we had a performance problem on a customer, Univates University. During the enrollment there were about 5,000 students doing the process simultaneously, and the process loaded many objects into memory to do financial calculations, enrollment records, and a number of other business rules. The server was on the edge and upgrading the hardware was an expensive option.

We found that some objects (such as configuration objects) were loaded many times during the same enrollment. We decided to create a cache of objects in RAM using PHP’s APC technology. Thus, once the object was loaded from the database, it was stored also in the cache. In later requests, we always fetched from the cache first. Only one table was queried 250 times in the enroll process, since it was a key/value table used to fetch different configuration parameters). We decided to apply this technique also to tables like courses, cities, and person. This technique saved thousands of queries in the database by enrollment process, and millions of queries over a month. It was not necessary to upgrade hardware, and the process was much more fluid, since the query time in the cache was much shorter than the query time in the relational database.

This technique is described by Fowler in the Design Pattern IDENTITY MAP. For Fowler, an IDENTITY MAP, “Ensures that each object is loaded only once by storing it on a map. Search for the objects on the map when necessary to reference them".

The IDENTITY MAP search engine searches the map, usually by PK (surrogate). Saves resources, loads only one version of the object per transaction. When it loads from DB, it first checks the map. If it is not on the map, put it on the map.

// open transaction
Course::find($course_id); // load and write on cache
Course::find($course_id); // load from cache
Course::find($course_id); // load from cache
// close transaction

You would probably ask, why would someone load the same object over and over again? Notice that we are not talking about the same method, but methods of different classes, executed within the same transaction. Methods that load an object that has already been loaded earlier, in the same transaction, or even by another user in another transaction.

Imagine an enrollment program where we will need the Course object in several places, such as academic records, financial records. This Course object would be loaded N times for only one student during the enrollment process. When using cache, this object would not be loaded from the database anymore, but from the cache, by all students who enroll in that course. In addition, administrative employees performing back-office operations (reports, academic record operations) would also use the cached version of the registry, making the whole system faster and more fluid.

Cache tools, such as APC, Redis, generally operate on the key-value pattern. Now imagine the complexity of caching with composite keys? Note that in this type of implementation (single key), the simple primary key itself identifies the object in the cache.

Referências:

3. Registration maintenance

Forms

Now imagine basic day-to-day operations like the registration form of an object. When working with single primary keys it is very simple to decide if the application will perform an INSERT or UPDATE in the save action of the form. Imagine that we are registering a new contract. If the object has an ID, we execute an UPDATE, otherwise we execute an INSERT in the database. The INSERT in a serial / sequence primary key field is simpler, since we just have to leave field empty, than the database uses auto increment (eg PostgreSQL). But with composite keys, we must first consult the database with a combined criterion (Ex: personal_id + course_id) to check if that record exists before the recording.

if (empty( $object->id ))
{
insert...
}
else
{
update...
}

Datagrids

Now imagine a datagrid with contract listing. This datagrid has some actions (cancel, reactivate, delete). When working with single primary keys, you just have to pass the ID field in the URL (GET method) to identify the contract that will be handled. But when working with composite primary keys, we need to pass more fields through the url (person_id + course_id) to identify the same record. In addition, the object’s load time is generally shorter when using a single field, than when using multiple fields (composite key).

Now imagine you performing database maintenance task on the terminal. Someone did something wrong and you need to fix this by the console with some UPDATE or a DELETE on some records. With single primary keys you can do this maintenance with IN statement, for example. Now imagine how you would change these records with composite primary keys.

// delete some records
DELETE from contract where ID IN (5000, 5020, 5030);
// fix some records
UPDATE enroll set cancel_date=NULL where ID IN (40300, 40301);

Widgets

Now imagine that you need to do a Combo (SELECT) to list the contracts of a student, to select the contract to do an operation. Generally, these visual components (Widgets) are prepared for a key (ID) and a value. But if the contract has a compound key, what will COMBO’s key look like? Then you will begin to bypass the situation by concatenating the fields, and parsing to the other side of POST, splitting the fields.

4. Proliferation of fields in the structure

Now imagine that the student no longer has only one grade per enrollment, but several. So you need to create a table that references the enroll. In the single primary key model, you need to add a single foreign key, and record the note date and value.

Notes table with simple keys

In the composite primary key model, you need to add three foreign keys, since they are the three that uniquely identify a enroll.

Notes table with composite keys

Note that there was a proliferation of keys. This cascading of keys only tends to worsen with the size of the database. More tables, more keys, and more joins in queries.

5. Join Queries

Now imagine that you need to write reports with queries across multiple tables, since you need to search data from different places. By using single primary keys, we have simpler links in Joins as well. When we use composite primary keys, we have more fields to remember when making connections. Because our model is minimalist, this difference will not be as evident as we would like. But as the system becomes larger, the differences become larger.

When using single primary keys, note that the enrollment table will be linked with contract by only one connection field.

SELECT person.name,
subject.name,
enroll.enroll_date,
partial_grade.grade
FROM person, contract, enroll, class, subject
WHERE person.id = contract.person_id
AND contract.id = enroll.contract_id
AND enroll.class_id = class.id
AND class.subject_id = subject.id
AND class.id = 5003;

When using composite primary keys, note that it is necessary to connect each of the foreign key fields.

SELECT person.name,
subject.name,
enroll.enroll_date,
enroll.grade
FROM person, contract, enroll, class, subject
WHERE person.id = contract.person_id
AND contract.person_id = enroll.person_id
AND contract.course_id = enroll.course_id
AND enroll.class_id = class.id
AND class.subject_id = subject.id
AND class.id = 5003;

In principle, we have seen that more Joins will be needed. But in addition to performance, note that more Joins are needed to get the same result. By using single primary keys, the indexes become more compact, which favors the search strategies.

6. Change logs

Nowadays it is almost unthinkable to build a good business application without logs. In our applications, we have logged accesses (login, logout), SQL executed (INSERT, UPDATE), and change logs. Change logs store the state of an object before and after the change.

The following table shows the change logs. Note that we have a very simple structure with: the record primary key, change date, user login, table, column name, operation, old value, and new value. The structure is always the same, regardless of the table, and the field to be logged.

To implement this Log is also quite simple. Since we use an Active Record, the object itself performs the comparison with its previous state before writing to the database. Now, how would we perform these logs with composite keys? Creating multiple columns for identification would be impractical because the number of columns that identifies a record using composite key is variable per table. We would have to get around the situation again by making the ID (PK) column something serialized or concatenated. If we chose this way, how would you easily locate those logs later? Imagine that you need to identify all changes in a particular contract. Go thinking …

Change logs

7. REST API

Most of the REST APIs are built around the pattern shown in the following table, where the resource to be manipulated is uniquely and exclusively identified by a simple ID. See that not even the name of the attribute (ID) is identified. Now imagine that we need to get the data from a contract via API (GET operation by ID). When working with single key, the operation is simple GET https: //domain/contract /1.

REST API Operations with single keys

Now imagine the same operations with composite primary keys. We would have to define routes formed by several attributes to identify the object, which is not the common language of the REST universe. It got complicated, right? Of course you will find a way out, but think about it, it may be another “workaround” in your project. The "default" everyone knows how to implements, but the different one generates another extra, unexpected work.

REST API Operations with composite keys

The second idea is bad, because it links (couples) the resource to the database schema. Renato Mendes Figueiredo also adds that “In the case of restful, the composite key model has another disadvantage, creating all resources with a very segmented data model makes several calls are necessary in the API to create the entire data model.”

Final considerations

In this article I tried to present design patterns from two of the world’s most respected Software Engineers, Eric Evans and Martin Fowler, as well as hands-on examples we work on every day. My goal with this article is to show that the sum of details can make a big difference. When we implement a large system, we have to eliminate small obstacles and make our life easier, not complicate. It is possible to use both the single and composite primary keys approach. But realize that when choosing single primary keys, we will have several shortcuts, facilities. These facilities translate into time gain, leaner code, and less headache trying to adapt techniques that were born to be simple, not complex.

I would like to thank the friends Renato Mendes Figueiredo and Elton Minetto for the review of the article and precious contributions.

Pablo Dall'Oglio

Written by

Teacher, writer, speaker, creator at @adiantisolution. http://www.dalloglio.net

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade