ORM: The deepening division of labor

alex_ber · Published in Geek Culture · 21 min read · Feb 12, 2022

In this story I will give an overview of the history of ORM frameworks and explain why I think they eventually fail. I will use specific examples from Java and Python.

Quote from Adam Smith:

The greatest improvement in the productive powers of labour, and the greatest part of skill, dexterity, and judgment with which it is any where directed, or applied, seem to have been the effects of the division of labour.

Chapter I, p. 7 — The Wealth of Nations (1776) — Book I

See also:

Full Stack: The deepening division of labor

Let’s start from the very beginning — what problem the ORM framework was meant to solve.

SQL

SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e. data incorporating relations among entities and variables.

Originally based upon relational algebra and tuple relational calculus, SQL consists of many types of statement…SQL was one of the first commercial languages to use Edgar F. Codd’s relational model. The model was described in his influential 1970 paper, “A Relational Model of Data for Large Shared Data Banks”. Despite not entirely adhering to the relational model as described by Codd, it became the most widely used database language.

SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987.

By 1986, ANSI and ISO standard groups officially adopted the standard “Database Language SQL” language definition. New versions of the standard were published in 1989, 1992, 1996, 1999, 2003, 2006, 2008, 2011, 2016 and 2019.

SQL deviates in several ways from its theoretical foundation, the relational model and its tuple calculus. In that model, a table is a set of tuples, while in SQL, tables and query results are lists of rows; the same row may occur multiple times, and the order of rows can be employed in queries (e.g. in the LIMIT clause).

Object–relational impedance mismatch

The object–relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is being served by an application program written in an object-oriented programming language or style, particularly because objects or class definitions must be mapped to database tables defined by a relational schema.

Objects (instances) reference one another and therefore form a graph. Relational schemas are, in contrast, tabular and based on relational algebra, which defines linked heterogeneous tuples (groupings of data fields into a “row” with different types for each field).

The relational model has an intrinsic, relatively small and well-defined set of primitive operators for usage in the query and manipulation of data, whereas OO languages generally handle query and manipulation through custom-built or lower-level, case- and physical-access-path-specific imperative operations. Some OO languages do have support for declarative query sublanguages, but because OO languages typically deal with lists and perhaps hash tables, the manipulative primitives are necessarily distinct from the set-based operations of the relational model.

The concurrency and transaction aspects are significantly different as well. In particular, transactions, the smallest unit of work performed by databases, are much larger in relational databases than are any operations performed by classes in OO languages. Transactions in relational databases are dynamically bounded sets of arbitrary data manipulations, whereas the granularity of transactions in an OO language is typically much finer.

In object-oriented programming, data-management tasks act on objects that are almost always non-scalar values. For example, consider an address book entry that represents a single person along with zero or more phone numbers and zero or more addresses. This could be modeled in an object-oriented implementation by a “Person object” with an attribute/field to hold each data item that the entry comprises: the person’s name, a list of phone numbers, and a list of addresses. The list of phone numbers would itself contain “PhoneNumber objects” and so on. Each such address-book entry is treated as a single object by the programming language (it can be referenced by a single variable containing a pointer to the object, for instance). Various methods can be associated with the object, such as methods to return the preferred phone number, the home address, and so on.
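To make this concrete, here is a minimal sketch of such an object graph in Python; all class and field names are illustrative:

# A sketch of the address-book object graph described above.
# All names are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PhoneNumber:
    number: str
    kind: str = "home"  # e.g. "home", "work", "mobile"

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Person:
    name: str
    phone_numbers: List[PhoneNumber] = field(default_factory=list)
    addresses: List[Address] = field(default_factory=list)

    def preferred_phone_number(self) -> PhoneNumber:
        # Trivial policy, for illustration only: the first number wins.
        return self.phone_numbers[0]

# The whole entry is a single object, referenced by a single variable:
entry = Person(
    name="John Doe",
    phone_numbers=[PhoneNumber("555-1234"), PhoneNumber("555-5678", "work")],
    addresses=[Address("Main St 1", "Springfield")],
)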

By contrast, until recently many popular database products, such as SQL database management systems (DBMS), were not object-oriented and could only store and manipulate scalar values such as integers and strings organized within tables. The programmer must either convert the object values into groups of simpler values for storage in the database (and convert them back upon retrieval), or use only simple scalar values within the program.

The heart of the problem involves translating the logical representation of the objects into an atomized form that is capable of being stored in the database while preserving the properties of the objects and their relationships so that they can be reloaded as objects when needed. If this storage and retrieval functionality is implemented, the objects are said to be persistent.

Naïve solution

The first web applications were pretty simple; there was no separation into tiers — everything was done in one tier, including connecting to the DB, retrieving result sets and converting them into objects. For a simple web application this was good enough. When the complexity of web applications started to grow, the “division of labor” principle was applied. The initial separation was the 3-tier application or the Model-View-Controller (MVC) design pattern. Nowadays MVC is considered obsolete, but some of its variants, such as MVP or MVVM, are still in use.
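Here is a minimal sketch of such a one-tier handler, assuming a SQLite database and a made-up schema, purely to illustrate the style:

# A naive one-tier "request handler": connecting, querying and converting
# rows into objects all happen in one place. Schema and names are made up.
import sqlite3

def handle_get_person(person_id: int) -> dict:
    conn = sqlite3.connect("app.db")  # connect to the DB...
    try:
        row = conn.execute(
            "SELECT name FROM persons WHERE id = ?", (person_id,)
        ).fetchone()  # ...retrieve the result set...
        if row is None:
            return {"error": "not found"}
        phones = [r[0] for r in conn.execute(
            "SELECT number FROM phone_numbers WHERE person_id = ?",
            (person_id,))]
        # ...and convert rows into an "object" (here, a dict), all inside
        # the same handler that will also render the view.
        return {"id": person_id, "name": row[0], "phone_numbers": phones}
    finally:
        conn.close()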

Data Access Layer (DAL)

Eventually, a designated layer for accessing the RDBMS became the standard thing to have in any web application.

The rest of the story is a list of the different implementation techniques and their evolution.

But first, I want to reiterate what a DAL is. DAL is an architectural term.

The Data Access Layer (DAL) is the layer of a system that exists between the business logic layer and the persistence/storage layer. A DAL might be a single class, or it might be composed of multiple objects. Those objects can be really anything: POJOs (plain objects), DAOs (Data Access Objects — more about them below), or Repositories (more about them below). It might be a third-party tool, possibly with object-relational mapping capability (ORM, more about this below), such as Spring JDBC Template, Spring Data, Hibernate, TopLink, JPA or SQLAlchemy (with or without its ORM capability). It serves as a façade to hide the complexity of the data access logic. The business layer talks to this façade, and from its perspective it deals only with objects.

Introducing a distinct layer for dealing with data access logic opened the door to further specialization. First, it was noticed that the same DAL can be used in different web applications: two applications may have totally different business requirements, but if both have a similar data schema stored in an RDBMS, the same DAL can be reused.

The next step was outsourcing this layer. It can be done in two very different ways — as a separate library or as a framework. Examples of the library approach are Spring JDBC and SQLAlchemy without ORM. Examples of the framework approach are Spring Data, SQLAlchemy with ORM, JPA, Entity Framework Core (.NET) or any other ORM framework.

What is the difference between a library and a framework? See Framework vs Library.

For example, Spring is a non-intrusive framework: the application code doesn’t need to depend on any of the Spring objects directly. ORM frameworks are typically very intrusive. I do have experience using them, but more on this below. Another example of an intrusive framework is JPA.

Let’s go back to the DAL. As I’ve said above, outsourcing it creates two different ways to use it: as a library (Spring JDBC, for example) or as a framework (Spring Data, for example, or SQLAlchemy with ORM). In the 2010s, however, a totally different approach emerged that solves this problem in a totally different way. This is the best evidence of the failure of ORM frameworks across the board. This is an example not of a deepening division of labor, but of deepening, new knowledge. I’m talking about NoSQL, of course.

For example, MongoDB is a document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. It was created in 2009 and became quite popular around 2013. If you opt in to MongoDB (or another NoSQL database), you don’t need to do any ORM; you can store your entities directly in MongoDB, just convert them to JSON (as you do in the Controller to communicate with the client anyway). So, with deepening, new knowledge, we can exclude the ORM step altogether.

Unfortunately, this approach has its own downsides; I will talk about them below. So, in the last few years we’ve started to see a new trend: the convergence of SQL and NoSQL databases. More on this below. At the end of the story I will describe some of my architectural choices in the current Python project that I’m working on.
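Going back to the MongoDB example, here is a minimal sketch with pymongo (the connection string, database and collection names are made up) of this “no ORM step” style:

# Storing an entity directly in MongoDB: the object is converted into a
# JSON-like document; there is no relational mapping step in between.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # illustrative URL
people = client["app_db"]["people"]                # illustrative names

person = {
    "name": "John Doe",
    "phone_numbers": ["555-1234", "555-5678"],
    "addresses": [{"street": "Main St 1", "city": "Springfield"}],
}
people.insert_one(person)  # the whole object graph goes in as one document
found = people.find_one({"name": "John Doe"})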

Data access object (DAO)

DAO is the classic way to implement a Data Access Layer. As of 2021, it is still my preferred way to abstract data persistence. Quote:

2. DAO Pattern

The Data Access Object Pattern, aka DAO Pattern, is an abstraction of data persistence and is considered closer to the underlying storage, which is often table-centric.

Therefore, in many cases, our DAOs match database tables, allowing a more straightforward way to send/retrieve data from storage, hiding the ugly queries.

Let’s examine a simple implementation of the DAO pattern.

2.1. User

First, let’s create a basic User domain class:

public class User {
    private Long id;
    private String userName;
    private String firstName;
    private String email;

    // getters and setters
}

2.2. UserDao

Then, we’ll create the UserDao interface that provides simple CRUD operations for the User domain:

public interface UserDao {
    void create(User user);
    User read(Long id);
    void update(User user);
    void delete(String userName);
}

2.3. UserDaoImpl

Last, we’ll create the UserDaoImpl class that implements the UserDao interface:

public class UserDaoImpl implements UserDao {
    private final EntityManager entityManager;

    public UserDaoImpl(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    @Override
    public void create(User user) {
        entityManager.persist(user);
    }

    @Override
    public User read(Long id) {
        return entityManager.find(User.class, id);
    }

    // ...
}

Here, for simplicity, we’ve used the JPA EntityManager interface to interact with underlying storage and provide a data access mechanism for the User domain.

https://www.baeldung.com/java-dao-vs-repository

A DAO can use any library, such as Spring JDBC, or even a framework, in order to implement the data access logic.

I’ve seen many DAL layers implemented via many DAOs. I’ve even seen one that supports MySQL, PostgreSQL and Hive. It had a very good design on Java 1.6. I’ve modernized it to Java 8, when lambdas and default methods on interfaces were added — that is, some elements of functional programming were added. So, some design decisions that were very good on Java 1.6 could be reshaped. In my latest Python project I’ve implemented the DAL layer via many DAOs using SQLAlchemy without ORM. More on this below. It looks roughly like this:
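Below is a minimal sketch with SQLAlchemy Core, that is, without ORM; the table, the columns and the plain User class are illustrative:

# A DAO built on SQLAlchemy Core (no ORM). All names are illustrative.
from sqlalchemy import Table, Column, BigInteger, String, MetaData, select

metadata = MetaData()

users_table = Table(
    "users", metadata,
    Column("id", BigInteger, primary_key=True),
    Column("user_name", String(50)),
    Column("first_name", String(50)),
    Column("email", String(100)),
)

class User:
    def __init__(self, id, user_name, first_name, email):
        self.id = id
        self.user_name = user_name
        self.first_name = first_name
        self.email = email

class UserDao:
    def __init__(self, engine):  # e.g. engine = create_engine(db_url)
        self._engine = engine

    def read(self, user_id):
        # Build the SELECT with SQLAlchemy Core, execute it, and map the
        # row back to a plain User object, all inside one DAO method.
        stmt = select(users_table).where(users_table.c.id == user_id)
        with self._engine.connect() as conn:
            row = conn.execute(stmt).first()
        if row is None:
            return None
        return User(row.id, row.user_name, row.first_name, row.email)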

All of the query and row-mapping logic is organized into a single method on a DAO; read here is the analogue of public User read(Long id) in the Java code above.

Repository

Repository also deals with data and hides queries similar to DAO. However, it sits at a higher level, closer to the business logic of an app.

Repository is an abstraction of a collection of objects.

A Repository could be implemented using DAOs, but you wouldn’t do the opposite.

Repository would be considered closer to the Domain, dealing only in Aggregate Roots.

From Evans DDD:

An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes. Each AGGREGATE has a root and a boundary. The boundary defines what is inside the AGGREGATE. The root is a single, specific ENTITY contained in the AGGREGATE.

And:

The root is the only member of the AGGREGATE that outside objects are allowed to hold references to[.]

This means that aggregate roots are the only objects that can be loaded from a repository.

An example is a model containing a Customer entity and an Address entity. We would never access an Address entity directly from the model as it does not make sense without the context of an associated Customer. So we could say that Customer and Address together form an aggregate and that Customer is an aggregate root.

https://stackoverflow.com/a/1958765/1137529

Repository is generally a narrower interface. It should be simply a collection of objects, with a Get(id), Find(ISpecification), Add(Entity).

A method like Update is appropriate on a DAO, but not on a Repository; when using a Repository, changes to entities would usually be tracked by a separate UnitOfWork.
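In Python, such a narrow, collection-like Repository interface might look like the following sketch (names are illustrative; a plain predicate stands in for ISpecification):

# A Repository as "an abstraction of a collection of objects":
# Get/Find/Add, and deliberately no Update. Names are illustrative.
from typing import Callable, Dict, List, Optional

class Customer:
    def __init__(self, id: int, name: str):
        self.id = id
        self.name = name

class CustomerRepository:
    def __init__(self):
        self._storage: Dict[int, Customer] = {}  # in-memory, for illustration

    def get(self, id: int) -> Optional[Customer]:
        return self._storage.get(id)

    def find(self, specification: Callable[[Customer], bool]) -> List[Customer]:
        # The "specification" here is just a predicate function.
        return [c for c in self._storage.values() if specification(c)]

    def add(self, entity: Customer) -> None:
        self._storage[entity.id] = entity

# Note: no update() method; changes to loaded entities would be tracked
# and flushed by a separate Unit of Work.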

Example:

https://examples.javacodegeeks.com/enterprise-java/spring/data/spring-data-jparepository-example/

See the full example details in the link above.

Another example, quote:

…How to add a custom method to Spring Data JPA CrudRepository and MongoDB MongoRepository

1. CrudRepository

1.1 Review a CustomerRepository, we will add a custom method to this repository.

1.2 Create an interface.

1.3 For the implementation class, the class name is very strict: you need to follow the “core repository interface + Impl” pattern. Failing to follow this pattern will cause the “Spring property unable to find” error message.

1.4 Update CustomerRepository to extend the new CustomerRepositoryCustomAbc interface.

Done.

2. MongoRepository

2.1 Another example to add a new “update a particular field” method to MongoRepository

2.2 Custom interface.

2.3 Custom implementation.

Done.

https://mkyong.com/spring-data/spring-data-add-custom-method-to-repository/

For reference see https://docs.spring.io/spring-data/jpa/docs/2.6.x/reference/html/#repositories.create-instances

https://docs.spring.io/spring-data/jpa/docs/2.6.x/reference/html/#repositories.custom-implementations

Repository as reinventing SQL DML

I want to expand a little on one line from the MongoRepository example above:

Query query = new Query(Criteria.where("domain").is(domain));

SQL DML is short for Data Manipulation Language; it deals with data manipulation and includes the most common SQL statements, such as SELECT, INSERT, UPDATE and DELETE, and it is used to store, modify, retrieve and delete data in a database.

Using the Repository design pattern is basically reinventing the Data Manipulation Language in an object-oriented way.

I want to illustrate my point with one more example, from https://blog.marcnuri.com/spring-data-mongodb-custom-repository-implementation. Again, you can follow the link to see all the details.

Although all the examples provided above were in Java (with Spring), they are not Java-specific. Here is the same Repository idea in Python (with SQLAlchemy):

https://docs.sqlalchemy.org/en/14/core/tutorial.html#conjunctions
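Here is a minimal sketch in the same spirit (illustrative table and columns); printing the statement reveals the plain SQL it compiles to:

# An object-oriented query that the library compiles down to plain SQL.
# Table and column names are illustrative.
from sqlalchemy import Table, Column, Integer, String, MetaData, select, and_

metadata = MetaData()
accounts = Table(
    "accounts", metadata,
    Column("id", Integer, primary_key=True),
    Column("domain", String(100)),
    Column("status", String(20)),
)

stmt = select(accounts).where(
    and_(accounts.c.domain == "example.com", accounts.c.status == "active")
)
print(stmt)
# SELECT accounts.id, accounts.domain, accounts.status
# FROM accounts
# WHERE accounts.domain = :domain_1 AND accounts.status = :status_1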

You’re using object-oriented wrapper code, both in Python and in Java, that will be converted internally by the library/framework into plain SQL. In the sketch above, such SQL is shown in the trailing comments.

Personal take on Repository

I’ve used the Repository pattern successfully in a number of projects. But I did it in a somewhat non-standard way. My bottom line is:

Repository design pattern is good for:

1. Simple CRUD applications.

2. If you have a small number of entities.

3. If you typically need to deal with (SELECT, UPDATE, etc.) a small number of entities at a time.

Otherwise, you have to bring SQL DML into your application code, which feels wrong to me.

What I did in practice: I used Repository as DAO. So every entity was an Aggregate Root, and I wrote some data-manipulation code that glues different entities together. If your application fulfills the points above, this is OK to do. Otherwise, the complexity of writing such gluing code is significantly higher than using the DAO design pattern.

If your application fulfills the points above, you can safely use DAO. You will indeed write some boilerplate code (especially with a simple CRUD application) that you could avoid with the Repository design pattern (I suspect that this was the main motivation for introducing the Repository design pattern in the first place), and this boilerplate code will be scattered through many DAO classes, but this is a low price to pay compared to mastering SQL DML in an object-oriented way. I will expand more on this point in the next section below.

As I’ve said above, I prefer to use the DAO design pattern, but I have no problem working with the Repository design pattern.

TopLink, Hibernate and JPA

Until the JPA 2.1 specification, which was released in 2013, I had never heard about JPA or seen in practice that somebody was using it.

After 2013, I used Hibernate as the implementation of the JPA specification.

But I did hear about Hibernate before. I had read about it theoretically, and I had talked with people who had some practical experience with it. In retrospect, I’ve found that they were right.

One of the main selling points of Hibernate was that the developer

doesn’t need to know SQL in order to write data manipulation code.

Moreover, the Hibernate advice was to model entities as objects with some additional Hibernate metadata (back then it was in XML files; later this was replaced with Java annotations) and make Hibernate generate DB tables from the entities.

As far as I know, this never actually worked in practice. The DB schema that Hibernate created was not optimal. Querying/inserting/updating such a DB schema was extremely slow.
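For illustration, the “generate the tables from the entities” mode looks roughly like this in SQLAlchemy’s declarative ORM (all names are made up); note that nothing beyond what you explicitly declare, indexes included, gets created:

# Schema generated from entity definitions (SQLAlchemy 1.4 declarative).
# All names are illustrative.
from sqlalchemy import Column, BigInteger, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(BigInteger, primary_key=True)
    user_name = Column(String(50))  # no index unless you declare one yourself
    email = Column(String(100))

engine = create_engine("sqlite://")  # illustrative; any RDBMS URL works
# Emits CREATE TABLE statements derived from the entity definitions.
Base.metadata.create_all(engine)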

I actually tried to do it with JPA in ~2014 (and with SQLAlchemy in ~2020), and I reviewed the resulting DB schema. In both cases some essential indexes weren’t created, for example. In the SQLAlchemy case I had very good control over the mapping from an object’s attribute to a DB table’s column. In the JPA case some common DB mappings were hard to model without thoroughly reading the whole JPA spec (yep, I was forced to read it all).

There are some additional considerations for bidirectional mappings with indexed collections (where one end is represented as a <list> or <map>) when using Hibernate mapping files. If there is a property of the child class that maps to the index column you can use inverse="true" on the collection mapping.

There are three possible approaches to mapping a ternary association. One approach is to use a Map with an association as its index…

A second approach is to remodel the association as an entity class. This is the most common approach. A final alternative is to use composite elements, which will be discussed later.

https://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch07.html#collections-advancedmappings

See also https://hibernate.atlassian.net/browse/HHH-8229 and https://stackoverflow.com/questions/40379807/what-is-the-different-between-manytoone-optional-flag-vs-joincolumn-nullable-f for an explicit example of why “it is not obvious” how to use Hibernate for attribute mapping.

Let’s go back to plain Hibernate. The main difficulties were the following:

  1. It is hard to reason about which attribute mapping is the right one. Eventually, the only reliable way to find out is to look at the generated SQL. But if I’m forced to deal with SQL anyway, then this layer is unnecessary.
  2. There are many levels of caches. See also https://alex-ber.medium.com/hibernate-recursive-initialization-d738cd01035 You can have many annoying bugs when you’re changing your entities/DB tables. You can have a stale copy of an older version of an entity somewhere in the cache, and it will kick in only on some execution path. Because of Hibernate’s optimizations, rerunning the whole application may not be sufficient; you may also need to clean and rebuild all classes. This is pretty annoying. This also happens with JPA.
    A similar thing happens with SQLAlchemy with ORM. It also has a cache for all entities. Rerunning the Python application will, however, do the job (I’ve never needed to manually delete *.pyc files).
  3. Creating the DB schema from the entity definitions has significant performance problems. This is true for Hibernate, for JPA, for SQLAlchemy with ORM and for SQLAlchemy without ORM. In the case of SQLAlchemy without ORM it is theoretically possible to create an optimal DB schema (using a custom compiler plugin), but the effort is pretty big. Out of the box, it was suboptimal.

JPA

JPA 1.0 was created as a merge of Hibernate and TopLink; about 2/3 was taken from Hibernate and about 1/3 from TopLink. After the spec for JPA 1.0, both products released new versions that were compliant with the JPA spec by default. You may choose to use Hibernate-specific (or TopLink-specific) features alongside JPA, or continue to use Hibernate (or TopLink) directly, but most projects that I’m aware of did switch to JPA eventually. I’ve worked on many projects that used Hibernate as the implementation of the JPA spec.

As I’ve said above, in ~2014 I worked with JPA. In order to be able to do it, I was forced to read the whole JPA spec. I chose another mode —

I wrote SQL DDL to create the DB schema, and then I used annotations to map entities’ attributes to tables’ columns.

So, I didn’t skip the step of designing the DB schema; I did it the old-school way, with normalization, indexing, etc.

With this approach I didn’t have performance problems, at least with small-to-average amounts of data.

I’ve used this approach successfully both with JPA 2.1 and with SQLAlchemy without ORM. It works very well.
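A minimal sketch of this DDL-first mode with SQLAlchemy (SQLite and all names are purely illustrative): the schema is hand-written DDL, and the mapping layer merely reflects it:

# Old-school mode: the DB schema is designed by hand in SQL DDL
# (normalization, indexes, etc.), and the code maps to the existing tables.
from sqlalchemy import MetaData, Table, create_engine, text

engine = create_engine("sqlite://")  # illustrative

with engine.begin() as conn:
    conn.execute(text("""
        CREATE TABLE users (
            id        BIGINT PRIMARY KEY,
            user_name VARCHAR(50) NOT NULL,
            email     VARCHAR(100) NOT NULL
        )"""))
    conn.execute(text(
        "CREATE UNIQUE INDEX ix_users_user_name ON users (user_name)"))

# The mapping layer reflects the hand-designed schema; it does not generate it.
metadata = MetaData()
users = Table("users", metadata, autoload_with=engine)
print([c.name for c in users.columns])  # ['id', 'user_name', 'email']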

Note: I want to emphasize a few things:

  1. While most of the examples are from Java (and some from Python), the same problem also exists in .NET and in any other programming language.
  2. JPA was historically developed by Sun as part of Java EE. JPA stood for Java Persistence API. Hibernate was the de facto reference implementation. After Oracle acquired Sun, because of the complexity of JPA, Oracle handed it over, and now it is known as Jakarta Persistence. The official reference implementation for JPA is EclipseLink.

Eventually, I see very little benefit in having this extra layer. It is far easier for me to skip it and use a DAO for handling the object-relational impedance mismatch.

ORDBMS: object–relational database management system

ORDBMS or ORD (object–relational database) is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, just as with pure relational systems, it supports extension of the data model with custom data types (more on this in a while) and methods.

An object–relational database can be said to provide a middle ground between relational databases and object-oriented databases. In object–relational databases, the approach is essentially that of relational databases: the data resides in the database and is manipulated collectively with queries in a query language; at the other extreme are OODBMSes in which the database is essentially a persistent object store for software written in an object-oriented programming language, with a programming API for storing and retrieving objects, and little or no specific support for querying.

The basic need for an object–relational database arises from the fact that both relational and object databases have their individual advantages and drawbacks. The isomorphism of the relational database system with a mathematical relation allows it to exploit many useful techniques and theorems from set theory. But these types of databases are not optimal for certain kinds of applications. An object-oriented database model allows containers like sets and lists, arbitrary user-defined datatypes, as well as nested objects. This brings commonality between the application type systems and database type systems, which removes any issue of impedance mismatch. But object databases, unlike relational ones, do not provide any mathematical basis for their deep analysis.

Object–relational database management systems grew out of research that occurred in the early 1990s. That research extended existing relational database concepts by adding object concepts. The researchers aimed to retain a declarative query-language based on relational algebra as a central component of the architecture. Probably the most notable research project, Postgres (UC Berkeley), spawned two products tracing their lineage to that research: Illustra and PostgreSQL. By the next decade, PostgreSQL had become a commercially viable database, and is the basis for several current products that maintain its ORDBMS features.

Many of the ideas of early object–relational database efforts have largely become incorporated into SQL:1999 via structured types. In fact, any product that adheres to the object-oriented aspects of SQL:1999 could be described as an object–relational database management product. We will take a look at SQL:1999 below.

NoSQL

Obviously, I wasn’t the only one to see that using an ORM framework to solve the object-relational impedance mismatch doesn’t work very well. At the very least, it is time-consuming and hard to debug. In order to master it, you not only need to learn the new tool — Hibernate, JPA, SQLAlchemy, etc. — but you should also understand the underlying generated SQL and caching.

Another problem with traditional SQL-based RDBMSs was the explosion of data — the size and number of available data sets have grown rapidly. For more about this, see NoSQL, MongoDB, HiveQL.

SQL:2016

SQL:2016 introduced 44 new optional features. 22 of them belong to the JSON functionality, ten more are related to polymorphic table functions. The additions to the standard include:

  • JSON: Functions to create JSON documents, to access parts of JSON documents and to check whether a string contains valid JSON data
  • Row Pattern Recognition: Matching a sequence of rows against a regular expression pattern
  • Date and time formatting and parsing
  • LISTAGG: A function to transform values from a group of rows into a delimited string
  • Polymorphic table functions: table functions without predefined return type
  • New data type DECFLOAT

SQL:1999

The SQL:1999 standard calls for a Boolean type, but many commercial SQL servers (Oracle Database, IBM DB2) do not support it as a column type or variable type, or allow it in result sets. Every 1-to-8-bit field occupies one full byte of space on disk. MySQL interprets “BOOLEAN” as a synonym for TINYINT (8-bit signed integer). PostgreSQL provides a standard-conforming Boolean type.

Structured user-defined types

These are the backbone of the object–relational database extension in SQL:1999. They are analogous to classes in object-oriented programming languages. SQL:1999 allows only single inheritance.

The SQL:1999 standard introduced a number of object–relational database features into SQL, chief among them structured user-defined types, usually called just structured types. These can be defined either in plain SQL with CREATE TYPE or in Java via SQL/JRT. SQL structured types allow single inheritance.

Structured types are supported to varying degrees in Oracle database, IBM DB2, PostgreSQL and Microsoft SQL Server, although the latter only allows structured types defined in CLR.

SQL examples
Object structured type

In order to define a custom structured type using Oracle Database, one could use statements such as these:

CREATE TYPE Person_Type AS OBJECT (
    person_title      VARCHAR2(10),
    person_first_name VARCHAR2(20),
    person_last_name  VARCHAR2(20)
)
NOT FINAL;

Such a structured type can then be used to create a table that also holds all columns defined in Person_Type:

CREATE TABLE Person_Table OF Person_Type;

Custom structured types support inheritance, which means that one can create another type that inherits from a previous one. The NOT FINAL clause must, however, be included in the base type definition in order to allow the creation of any subtypes.

CREATE TYPE Student_Type UNDER Person_Type (
    matriculation_number NUMBER(10)
);

Student_Type can then be used to create a Student_Table, which will include all columns defined in Person_Type as well. The primary key and constraints should be defined during or after creation of the table, and cannot be defined inside the structured type itself.

CREATE TABLE Student_Table OF Student_Type (
    matriculation_number PRIMARY KEY,
    CONSTRAINT person_title_not_null_constraint NOT NULL (person_title)
);

Each custom structured type can also contain other types in order to support more complex structures:

CREATE TYPE Address_Type AS OBJECT (
    address_street VARCHAR2(30),
    address_city   VARCHAR2(30)
);

CREATE TYPE University AS OBJECT (
    university_name    VARCHAR2(30),
    university_address Address_Type
);

SQL after NoSQL

The current ISO SQL standard doesn’t mention the relational model or use relational terms or concepts. SQL deviates from the relational model in several places.

Note that few database servers implement the entire SQL standard and in particular do not allow some of these deviations. Whereas NULL is ubiquitous, for example, allowing duplicate column names within a table or anonymous columns is uncommon.

  • Duplicate rows. The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a relation.
  • Anonymous columns. A column in an SQL table can be unnamed and thus unable to be referenced in expressions. The relational model requires every attribute to be named and referenceable.
  • Duplicate column names. Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, on account of the obvious ambiguity. The relational model requires every attribute to be referenceable.
  • Column order significance. The order of columns in an SQL table is defined and significant, one consequence being that SQL’s implementations of Cartesian product and union are both noncommutative. The relational model requires there to be no significance to any ordering of the attributes of a relation.
  • Views without CHECK OPTION. Updates to a view defined without CHECK OPTION can be accepted, but the resulting update to the database does not necessarily have the expressed effect on its target. For example, an invocation of INSERT can be accepted but the inserted rows might not all appear in the view, or an invocation of UPDATE can result in rows disappearing from the view. The relational model requires updates to a view to have the same effect as if the view were a base relvar.
  • Columnless tables unrecognized. SQL requires every table to have at least one column, but there are two relations of degree zero (of cardinality one and zero), and they are needed to represent extensions of predicates that contain no free variables.
  • NULL. This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a column value in some row. The deviation from the relational model arises from the fact that the implementation of this ad hoc concept in SQL involves the use of three-valued logic, under which the comparison of NULL with itself does not yield true but instead yields the third truth value, unknown; similarly, the comparison of NULL with something other than itself does not yield false but instead yields unknown. It is because of this behavior in comparisons that NULL is described as a mark rather than a value…

Arrays became part of the SQL standard with SQL:1999, which added nonscalar types (arrays).

JSON became part of the SQL standard with SQL:2016, which added functions to create JSON documents, to access parts of JSON documents, and to check whether a string contains valid JSON data. All of these are optional features.

https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/
https://clarkdave.net/2013/06/what-can-you-do-with-postgresql-and-json/
https://www.postgresql.org/docs/release/9.4.0/
https://en.wikipedia.org/wiki/PostgreSQL (search for JSON)

In January 2016, with the PostgreSQL 9.5 release, PostgreSQL was the first FOSS OODBMS to offer an efficient JSON internal datatype (JSONB) with a complete set of functions and operations, for all basic relational and non-relational manipulations.

https://en.wikipedia.org/wiki/Object_database

  • 2006: XML as CLOB in Oracle
  • 2012: JSON as CLOB in PostgreSQL

Convergence of NoSQL and SQL

As you can see, after the drastic divergence of SQL and NoSQL, which for a while developed in parallel as two different branches, in 2012 some initial support for JSON was added to PostgreSQL (and to other RDBMSs at approximately the same time); then JSON support was added to the SQL:2016 standard and was reimplemented in PostgreSQL and other RDBMSs.

On the other hand, many features NoSQL lacked were added in newer versions; the canonical example is HiveQL. When Hive was started, it was a deliberate choice not to be SQL-92 compliant, but practice eventually pushed them to implement it.

It was a very long journey; parts of it I’ve extracted into separate stories, so you have the option to skip them. Now I want to give you a bird’s-eye view of what you have read:

  • Traditional SQL requires some mapping to the objects used in Python/Java. It is better done in a DAL (for reuse in other applications).
  • The DAL layer can be implemented as DAO or Repository. My personal choice is DAO.
  • The DAL layer can also, optionally, be implemented with an ORM, but I prefer not to use one.
  • You can use a NoSQL DB altogether, for example Hive or MongoDB. Your application code shouldn’t care; just use the DAL as a façade.
  • You can use PostgreSQL as SQL, but still use some of the object types, such as arrays and JSON, and this is totally fine (see the sketch after this list). It “violates” the spirit of the early SQL standard by not being part of the relational model, but the latest SQL standard abolishes any mention of the relational model.
  • Trying to use ORM as a division of labour has failed (IMHO). The answer was deepening, new knowledge: NoSQL, going back to basics by abolishing the relational model on the one hand, and going back to a basic DAO layer (Repository is also OK) with some manually written code that is nevertheless easy to maintain, on the other.
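As an illustration of the last two points, here is a minimal sketch (illustrative names) of using PostgreSQL “as SQL” while keeping non-scalar object types in the schema:

# A plain relational table that also uses ARRAY and JSONB columns,
# i.e. non-scalar types in the spirit of SQL:1999/SQL:2016.
from sqlalchemy import Table, Column, BigInteger, String, MetaData
from sqlalchemy.dialects.postgresql import ARRAY, JSONB

metadata = MetaData()

persons = Table(
    "persons", metadata,
    Column("id", BigInteger, primary_key=True),
    Column("name", String(100), nullable=False),
    Column("phone_numbers", ARRAY(String(20))),  # nonscalar: SQL:1999 arrays
    Column("addresses", JSONB),                  # JSON: SQL:2016 territory
)

stmt = persons.insert().values(
    id=1,
    name="John Doe",
    phone_numbers=["555-1234", "555-5678"],
    addresses=[{"street": "Main St 1", "city": "Springfield"}],
)

Note how the whole address-book “object” from the impedance-mismatch section above fits into a single row, with no ORM in sight.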
