Data Abstraction Layers

And Why They’re Better than ORMs

Greg Prisament
5 min read · Jun 26, 2015

In this article I discuss the Siren’s call of the ORM and propose an alternative, which is to write your own Data Abstraction Layer (DAL).

Many software programmers, especially newer ones, tend never to write database queries directly. Instead they use an ORM, or Object-Relational Mapping, for database access.

An ORM maps database rows to objects in a programming language. This is very convenient for developers because it simplifies interactions with persistent data. An ORM makes DB interactions feel natural and familiar.

The argument for ORMs goes like this: in today’s fast-paced world of rapid prototyping and agile development, why spend time writing all this SQL code (here in Python):

sql = """INSERT INTO USERS(FIRST_NAME, LAST_NAME, AGE, SEX)
         VALUES ('Philip', 'Fry', 29, 'M')"""
try:
    cursor.execute(sql)
    db.commit()
except:
    db.rollback()

When you can write something like this (using the Django ORM):

u = User(first_name="Philip", last_name="Fry", age=29, sex="M")
u.save()

After all, the second version is much cleaner, easier to code up and read, and better conveys the code’s intent.

While true, I’ve seen from experience that the convenience of an ORM can lead to abuse, especially in larger projects.

The problem: ORMs make database access too easy, encouraging direct database access in situations when it should be abstracted. For rapid prototyping this may not be a problem, but it can be a real danger for larger software projects, especially once data denormalization enters the picture (which I’ll come back to later).

I experienced this danger firsthand on a project that used the Django ORM. The convenience of Django led the team’s frontend developers to make a practice of accessing the database directly in the application’s “view” code. This practice made it difficult to make schema changes later: every place a table was accessed had to change. Soon hard-to-track-down bugs were cropping up as data hazards and consistency issues were unintentionally introduced.

Had the team been forced to write SQL statements manually, I believe there would have been a more positive outcome. The burden of writing SQL would have encouraged the team to write utility routines for database access earlier in the development cycle. Those utility routines could then have been modified when the data-model changes were introduced, with minimal effort and disruption.

I’ve found that by taking this concept of DB utility routines a step further and designing them in from the start of a project, we end up with highly maintainable and robust code for accessing persistent data, whether SQL, NoSQL, or a combination is used.

Data Abstraction Layers

Instead of using an ORM, I’ve found a better solution is to create what I call a Data Abstraction Layer or DAL. This is something you code yourself, thinking carefully about the interface to your system’s persistent data.

For example, something like:

class DAL:
    def CreateUser(self, first, last, age, sex):
        sql = """INSERT INTO USERS(FIRST_NAME, LAST_NAME, AGE, SEX)
                 VALUES (%s, %s, %s, %s)"""
        try:
            cursor.execute(sql, (first, last, age, sex))
            db.commit()
            return True
        except:
            db.rollback()
            return False

Your project’s other software modules (such as view code) can now easily create a user by calling:

dal = DAL()
dal.CreateUser("Philip", "Fry", 29, "M")

In this example the abstraction provided by the DAL is similar to the abstraction provided by an ORM, but the implementation is now something we have full control over. In particular, if the underlying data model changes, or even if we switch to a completely different database (such as MongoDB), the rest of the software stack can still call “dal.CreateUser”.

I’d like to point out the main difference between an ORM and a DAL:

  • An ORM provides a direct mapping to the underlying tables in the database.
  • A DAL provides an abstracted mapping to the underlying data in the database.

In particular, the DAL provides a normalized view of your persistent data, regardless of whether the underlying data is normalized or denormalized. The other software modules in your project shouldn’t know (or care) if it takes 1 query or 5 queries to update a user. They should just be able to call “dal.UpdateUser(age=30)” and expect the DAL implementation to do the right thing.

Normalized and Denormalized Data

As a refresher, data in a database can be “normalized” or “denormalized”. Normalized essentially means that each piece of data is stored in exactly one place. Denormalized means that data is repeated in multiple places.

For example:

NORMALIZED DATA MODEL

TABLE "users"
ID  FIRST_NAME  LAST_NAME    AGE  SEX
--------------------------------------
0   "Philip"    "Fry"        29   "M"
1   "Turanga"   "Leela"      39   "F"
2   "Bender"    "Rodriguez"  1    "O"

TABLE "teams"
ID  NAME
--------------------------------------
0   "The Humans"

TABLE "team_members"
TEAM_ID  USER_ID
--------------------------------------
0        0
0        1

Whereas if the data is denormalized:

DENORMALIZED DATA MODEL

TABLE "users"
ID  FIRST_NAME  LAST_NAME    AGE  SEX
--------------------------------------
0   "Philip"    "Fry"        29   "M"
1   "Turanga"   "Leela"      39   "F"
2   "Bender"    "Rodriguez"  1    "O"

TABLE "teams"
ID  NAME
--------------------------------------
0   "The Humans"

TABLE "team_members"
TEAM_ID  TEAM_NAME     USER_ID  USER_FIRST  USER_LAST
-----------------------------------------------------
0        "The Humans"  0        "Philip"    "Fry"
0        "The Humans"  1        "Turanga"   "Leela"

In general, normalized data is less prone to bugs and data hazards (data inconsistency). On the other hand, denormalized data is typically more efficient to read. In the above example, it is faster to list the names of all team members using the denormalized data model because no joins are necessary. It is increasingly common, particularly among large-scale web services, for data to be denormalized for maximum performance. The challenge is then ensuring consistency across tables when data is modified.
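This trade-off is easy to see in code. The following is a runnable sketch of my own (using Python’s built-in sqlite3 module, not from the article); table and column names mirror the example tables above, and the "_dn" suffix is an assumption of mine:

```python
import sqlite3

db = sqlite3.connect(":memory:")
cur = db.cursor()

# Normalized model: names live only in the "users" table.
cur.execute("CREATE TABLE users (id INTEGER, first_name TEXT, last_name TEXT)")
cur.execute("CREATE TABLE team_members (team_id INTEGER, user_id INTEGER)")
cur.executemany("INSERT INTO users VALUES (?, ?, ?)",
                [(0, "Philip", "Fry"), (1, "Turanga", "Leela")])
cur.executemany("INSERT INTO team_members VALUES (?, ?)", [(0, 0), (0, 1)])

# Listing the members of team 0 requires a join:
cur.execute("""SELECT u.first_name, u.last_name
               FROM team_members tm JOIN users u ON u.id = tm.user_id
               WHERE tm.team_id = 0 ORDER BY u.id""")
print(cur.fetchall())  # [('Philip', 'Fry'), ('Turanga', 'Leela')]

# Denormalized model: names are copied into the team-membership table.
cur.execute("""CREATE TABLE team_members_dn (team_id INTEGER, team_name TEXT,
               user_id INTEGER, user_first TEXT, user_last TEXT)""")
cur.executemany("INSERT INTO team_members_dn VALUES (?, ?, ?, ?, ?)",
                [(0, "The Humans", 0, "Philip", "Fry"),
                 (0, "The Humans", 1, "Turanga", "Leela")])

# The same listing is now a single-table read, with no join needed:
cur.execute("""SELECT user_first, user_last FROM team_members_dn
               WHERE team_id = 0 ORDER BY user_id""")
print(cur.fetchall())  # [('Philip', 'Fry'), ('Turanga', 'Leela')]
```

Both queries return the same names; the denormalized one simply pays for its speed by keeping two copies of each name that must now be kept in sync.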

DAL (Normalized) → DB (Denormalized)

A DAL helps you to achieve the best of both worlds. The DAL provides a normalized view of the data to other software components, preventing them from making “mistakes” in the form of inconsistent data updates. However, the underlying data can be denormalized for faster access because the DAL implementation ensures consistency.

Pseudocode for updating a user’s name in the denormalized example looks like this:

class DALUser:
    def UpdateName(self, first, last):
        # 1) Update first & last name in the "users" table
        # 2) Look up all teams that have this user as a member
        # 3) For each such team, update first & last name in
        #    the "team_members" table

An ORM simply does not provide this type of abstraction.
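To make the pseudocode concrete, here is one possible sketch of my own (the article doesn’t prescribe an implementation) using Python’s sqlite3 module. It assumes the denormalized tables from the earlier example; both updates happen in a single transaction so the denormalized copies never drift out of sync:

```python
import sqlite3

class DALUser:
    """Sketch of the UpdateName pseudocode. Assumes the denormalized
    tables from the earlier example; schema details are illustrative."""

    def __init__(self, db, user_id):
        self.db = db
        self.user_id = user_id

    def UpdateName(self, first, last):
        cur = self.db.cursor()
        try:
            # 1) Update the name in the "users" table.
            cur.execute(
                "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
                (first, last, self.user_id))
            # 2 & 3) Update every denormalized copy in "team_members".
            cur.execute(
                """UPDATE team_members SET user_first = ?, user_last = ?
                   WHERE user_id = ?""",
                (first, last, self.user_id))
            # Commit both updates together so other readers never see
            # one table changed but not the other.
            self.db.commit()
        except Exception:
            self.db.rollback()
            raise
```

Callers just invoke UpdateName; how many tables (or queries) sit behind it remains an implementation detail of the DAL.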

Roles and Advantages of a DAL

In a real project the DAL code is much more complex than in the example above. Typically, the DAL has the following responsibilities:

  • Abstract all database access.
  • Provide a normalized view of the data to other software modules.
  • Perform input validation before inserting any data.
  • Ensure data consistency in cases when multiple tables need to be updated at once (for example, when data has been denormalized for performance reasons).
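As a sketch of the input-validation responsibility, the DAL can reject bad records before any SQL runs. The specific rules and names below are my own, purely for illustration:

```python
class ValidationError(ValueError):
    """Raised by the DAL when input data fails validation."""

def validate_user(first, last, age, sex):
    # Hypothetical checks a DAL's CreateUser might run before INSERT.
    if not first or not last:
        raise ValidationError("first and last name are required")
    if not isinstance(age, int) or age < 0:
        raise ValidationError("age must be a non-negative integer")
    if sex not in ("M", "F", "O"):
        raise ValidationError("sex must be 'M', 'F', or 'O'")
```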

To summarize the advantages of implementing a DAL:

  • Puts all DB access code in one place, so that it is easy to find and modify.
  • Helps you avoid data hazards and inconsistency issues, particularly if your data is denormalized.
  • Makes it easier to switch databases or even support multiple backends.
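To illustrate the last point, the DAL can be defined as an interface with one implementation per backend. This sketch (all names are mine, not from the article) uses Python’s abc module; a real project might add a SQL or Cassandra implementation of the same interface:

```python
from abc import ABC, abstractmethod

class DataLayer(ABC):
    """Hypothetical DAL interface shared by all backends."""

    @abstractmethod
    def CreateUser(self, first, last, age, sex):
        """Create a user; returns an opaque user id."""

    @abstractmethod
    def LookupUser(self, user_id):
        """Return the user's fields as a dict, or None if not found."""

class InMemoryDataLayer(DataLayer):
    """Toy backend, handy for unit tests; other modules depend only
    on the DataLayer interface, never on a specific backend."""

    def __init__(self):
        self._users = {}
        self._next_id = 0

    def CreateUser(self, first, last, age, sex):
        uid = self._next_id
        self._next_id += 1
        self._users[uid] = {"first": first, "last": last,
                            "age": age, "sex": sex}
        return uid

    def LookupUser(self, user_id):
        return self._users.get(user_id)
```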

Using Both an ORM and a DAL

You can certainly use both an ORM and a DAL. In this case, the DAL should call the ORM instead of making SQL calls directly.

class DAL:
    def CreateUser(self, first, last, age, sex):
        u = User(first_name=first, last_name=last, age=age, sex=sex)
        u.save()

In this case, the ORM is an implementation detail of the DAL. Your other software modules should always go through the DAL.

Real World Example of a DAL

My current venture Canopy Services (http://canopy.link) is developing an open source cloud engine for IoT. The server-side code is written in Golang and uses the Cassandra NoSQL DB. The DAL is still a work in progress, but take a look to see a real-world example: https://github.com/canopy-project/canopy-server/tree/master/src/canopy/datalayer


Greg Prisament

Software expert and entrepreneur. Currently working on Canopy: the open cloud for IoT. @CanopyIoT http://canopy.link