How I do “ORM”, Part 1

Khun Yee Fung, Ph.D.
Programming is Life
7 min read · Jan 20, 2024

Well, I guess it might be interesting to talk about one specific way I have been using for years, ever since I discovered jOOQ, to bridge Java and the Postgres databases I use in quite a few of the projects I have written by now.

Methods

How many ways can we specify queries in our programs to be run on a relational database, get the result, and process the result?

Literal strings

Use strings like “select * from emp where emp_id = 10” in the program. How the database and the program were linked together was very hairy, but you usually only had to do it once.

You can certainly imagine what is wrong with this approach. We have all worked with JDBC (or ODBC) directly before; I don’t know about you, but for me, it was a pain in the you know what to maintain the code. One typo somewhere, and the application crashed when it got to that somewhere. Writing the scaffolding for the queries turned out to be more work than writing the queries themselves, if you wanted to control and manage the queries.
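To make the pain concrete, here is a minimal sketch of that style over plain JDBC. The connection details, the emp table, and the name column are assumptions for illustration, not anything from a real project.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LiteralQueryExample {
    public static void main(String[] args) throws Exception {
        // The query lives in a string; nothing checks it until run time.
        String sql = "select emp_id, name from emp where emp_id = ?";

        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, 10);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // A typo in "name" here, or in the SQL string above,
                    // only shows up when this line is reached at run time.
                    System.out.println(rs.getInt("emp_id") + " " + rs.getString("name"));
                }
            }
        }
    }
}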

Preprocessor-based framework

For this method, you have literal SQL statements in the program, but not as strings. Instead, you would mark off the literal SQL statement in the program like

==SQL
select * from emp where emp_id = 10
==SQL

You would then run the preprocessor, and code would be generated to replace this literal query with code in the language of the program in question.

We all know the danger of Bobby Tables (https://xkcd.com/327/) by now. But it wasn’t really an issue in the old days, as only operators had access to computer programs. The Internet (circa 1981 for TCP) was not widely used in the industry.

Still, it was very tedious to have to bother with two languages in the same program, especially since the names of tables and attributes in the database and the names of tables and attributes in the application program (usually COBOL, PL/I, and later C and C++) were “disconnected”, meaning that changing a name in one did not cause anything to fail until the application was run after the change.

Manual ORM

Of course, once we started programming in C++, there was a desire to somehow map classes in C++ to tables in the database. And the shallow mapping was quite straightforward: one class in C++ mapped to one table in the database, and the fields of the C++ class mapped to the attributes of the table. So, on the surface, it looked neat. But of course, the devil is in the details. In any case, it was still better than the manual literal SQL queries.

It was tons of work to write the classes. And obviously, if the classes were written manually, they would be quite uneven, whether they inherited from a common abstract class or not. Such a base class would help, obviously, but not by much.
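For illustration only, a hand-written mapping class in Java might have looked something like the sketch below; the class, table, and column names are assumptions, and every such class repeated this boilerplate by hand.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// One class per table, one field per attribute, all written by hand.
public class Emp {
    private int empId;
    private String name;

    public static Emp load(Connection conn, int empId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "select emp_id, name from emp where emp_id = ?")) {
            ps.setInt(1, empId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                Emp e = new Emp();
                e.empId = rs.getInt("emp_id");
                e.name = rs.getString("name");
                return e;
            }
        }
    }
}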

Generated ORM

After that, it was just one more step to generate the classes themselves. And if the database allowed a program to access its catalogues, that was even better, as the generated classes would reflect the database schema so much better.

To prevent the literal names from being used directly, a two-layer framework would be very helpful: the bottom layer does no logic other than reflecting the database exactly. All the literals, like table and attribute names, are in this layer. Every time the database is modified, this layer is re-generated. Now, you get compile errors if you still refer to the old names, or the old schema.
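As a sketch of what that bottom layer might contain (the class and constant names here are made up for illustration), it is nothing but the literal names, re-generated from the catalogue whenever the schema changes:

// Generated code: do not edit by hand. Re-generated on every schema change.
public final class EmpTable {
    public static final String TABLE_NAME = "emp";
    public static final String EMP_ID = "emp_id";
    public static final String NAME = "name";

    private EmpTable() {
    }
}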

Still, it is difficult to write elaborate queries with this kind of ORM. So, something like a query language would be used. For JPA, it was JPQL. Now, it looks like SQL, but it is not SQL. It sort of works like SQL, but it does not really work like SQL. At least it does not have the Bobby Tables issue. It still has the same issues with names, now in two places: the entity classes, and the JPQL queries. You have literal strings like

select a from Emp a where upper(a.name) = :name

and you have literal strings like

@Entity
@Table(name = "emp")
public class Emp implements Serializable
{

Change the name of the class “Emp”, and you have to hunt for its references in JPQL strings. Change the name of the table “emp” in the database, and you have to go to this entity class to change the name. And perhaps in the persistence.xml file as well, if you do JPA that way.

Add or subtract an attribute, change a relationship, etc., and you will be hunting for a while to change your program. Programmers are lazy. So, faced with that, most programmers would just refuse to change the database schema. And development DBAs… Okay, I think there are not that many development DBAs any more. Database schemas seem to be designed by application programmers nowadays. That is actually a scary thing. A very scary thing.

Issues?

The main issue with the literal queries is quite obvious: SQL queries in strings are a completely different thing from SQL queries in a database. SQL queries are not literal strings. They are programs in a language. In a string, no syntax checking is possible (the preprocessor approach does check the syntax of the query, but it is not syntax checking while you are programming).

How about the ORMs? Well, it is the mismatch of semantics: relational databases are relational; object-oriented languages are object-oriented. This means there will be a mismatch in whatever mapping you can think of, and that mismatch is usually the crucial bit of the mapping. Relational databases simply don’t do object-oriented stuff well, and vice versa. So, thinking that an ORM is going to do the mapping perfectly is mistaken. What you get is a compromised mapping. You can choose to design your database schema to fit the object-oriented way of working, or you compromise your program to match the relational way of the database. Neither is satisfactory.

jOOQ

The jOOQ framework does it in a different way: it does not pretend to be an ORM. It uses object-oriented features in Java to provide a framework to work with relational databases, but the tables are still tables, attributes are still attributes. There is a syntactic mapping, but no semantic mapping.

What this means is that jOOQ is extremely good at presenting the database schema as it actually is in the database (if you don’t use any features that transform the queries for you). With its fluent API, SQL queries look like queries again. Mostly.

It also has two layers: the literal name layer, which you don’t usually see and in my opinion you should not use, and the abstracted layer that you can use.

The process is very simple:

  1. Design your database schema; deploy it to a relational database. Or you already have a database somewhere.
  2. Set up the jOOQ XML file to point to the database.
  3. Run the jOOQ code generation (see the programmatic sketch just below).
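Here is a sketch of steps 2 and 3 done programmatically instead of through the XML file, assuming jOOQ’s code generation API; the JDBC details, package name, and output directory are made-up placeholders, and the class names can differ between jOOQ versions.

import org.jooq.codegen.GenerationTool;
import org.jooq.meta.jaxb.Configuration;
import org.jooq.meta.jaxb.Database;
import org.jooq.meta.jaxb.Generator;
import org.jooq.meta.jaxb.Jdbc;
import org.jooq.meta.jaxb.Target;

public class GenerateJooqClasses {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration()
            .withJdbc(new Jdbc()
                .withDriver("org.postgresql.Driver")
                .withUrl("jdbc:postgresql://localhost:5432/mydb")
                .withUser("user")
                .withPassword("password"))
            .withGenerator(new Generator()
                .withDatabase(new Database()
                    .withName("org.jooq.meta.postgres.PostgresDatabase")
                    .withInputSchema("public"))
                .withTarget(new Target()
                    .withPackageName("com.example.db")
                    .withDirectory("src/main/java")));

        // Reads the catalogue and writes the generated classes to the target directory.
        GenerationTool.generate(configuration);
    }
}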

The end result is a set of classes that you can then use in your application.

For instance, you get

Result<Record> result = dsl.select()
.from(Tables.EMP)
.where(Tables.EMP.EMP_ID.equal(10))
.fetch();
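For context, the dsl above is a DSLContext. A minimal sketch of obtaining one and walking the result might look like the following, where the connection details, the generated package name, and the NAME column are all assumptions.

import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.DSLContext;
import org.jooq.Record;
import org.jooq.Result;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

import com.example.db.Tables; // the generated classes; package name assumed

public class EmpQueryExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            DSLContext dsl = DSL.using(conn, SQLDialect.POSTGRES);

            Result<Record> result = dsl.select()
                    .from(Tables.EMP)
                    .where(Tables.EMP.EMP_ID.equal(10))
                    .fetch();

            for (Record record : result) {
                // Typed access through the generated column references.
                System.out.println(record.get(Tables.EMP.NAME));
            }
        }
    }
}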

What is so good about this? Well, the SQL is recognizable right away. And if you modify the name of the emp table, you simply re-run the jOOQ code generation. Suppose you changed the table to employee, and you have re-run the code generation. The statement above will now give you a compile error, and you will need to modify it to become

Result<Record> result = dsl.select()
.from(Tables.EMPLOYEE)
.where(Tables.EMPLOYEE.EMP_ID.equal(10))
.fetch();

This seems to be a very simple thing. But if you need to maintain code like I do, it is such a lifesaver. And if you need to refactor the database schema, for whatever reason, it is even more useful.

Now, I don’t like the fact that we have Tables.EMPLOYEE.EMP_ID.equal(10), as EMP_ID is really not an integer. Syntactically it is a number, but semantically it is not. It is like your SSN (or SIN here in Canada) or whatever numeric ID you might have: they are numbers, but arithmetic operations on them are just silly. Semantics matter, and a lot of run-time errors are due to a mismatch of semantics. For instance, if emp_id and employer_id are both numeric, the expression below is completely fine for the database and the program, as syntactically there is nothing wrong with it.

Tables.EMP.EMP_ID.equal(Tables.EMPLOYER.EMPLOYER_ID)

And yet it is nonsense in terms of semantics. But you would not know it, and the program and the database will happily run it without a worry in the world. Except that when you get some weird records in the database, or a result that does not make any sense, you just would not know how they got there or why you got the result that you did. And finding the cause means, in the worst case, going through each and every query in the system. That is not fun at all.

Still, jOOQ is wonderful in what it provides.

If you wish, jOOQ can generate POJOs, and it generates Record classes for you. For instance, the DBEmpRecord class would be generated for the emp table (if you set the prefix to “DB”, a configurable thing).
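As a sketch of what using such a generated record might look like (the DBEmpRecord name follows the “DB” prefix mentioned above, the setters follow the assumed emp_id and name columns, and dsl is the DSLContext from the earlier sketch):

// Create, populate, and insert a new row through the generated record class.
DBEmpRecord rec = dsl.newRecord(Tables.EMP);
rec.setEmpId(11);
rec.setName("Alice");
rec.store(); // issues an INSERT, assuming emp has a primary key

// Fetch it back as a typed record rather than a generic Record.
DBEmpRecord fetched = dsl.selectFrom(Tables.EMP)
        .where(Tables.EMP.EMP_ID.equal(11))
        .fetchOne();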

Next

So, jOOQ is very good at what it does. It makes interfacing with the database an almost enjoyable thing to do. Okay, that is overstating it. Still, using jOOQ is mostly a non-event.

It does not handle common table expressions (CTEs) very well. However, that is really a limitation of Java: Java does not allow you to specify context-sensitive things, and that is to be expected from the current form of the language. Whether it is possible to provide such a thing in a language like Java, I don’t know. It is certainly a research project someone could pick up and run with; if it is done well, I think there is a PhD in it. Anyhow, that is the fundamental issue in a language like Java. I can’t do a thing about it, other than designing a preprocessor to do the context-sensitive bits, which I hate to do.

I also can’t get over the semantic mismatch thing. That one, however, I know I can do something about.

In the next part, I will talk about what “enhancements” can be had that would make jOOQ even more enjoyable to use, and mostly remove the semantic mismatches.


Khun Yee Fung, Ph.D.
Programming is Life

I am a computer programmer. Programming is a hobby and also part of my job as a CTO. I have been doing it for more than 40 years now.