A Beginner’s Guide to Data Persistence and Object-Relational Impedance Mismatch

Chetansharma
May 29, 2024

Data Persistence:

In computer science, data persistence means making a modification to data permanent in a storage system, so that the data outlives the program that produced it.

Implementing data persistence is one of the most critical challenges in an enterprise application. In any domain, an enterprise application's greatest concern is its data, which is large in volume and critical in nature. A sound data persistence strategy therefore helps the application stay scalable, maintainable, and efficient.

Components Of Data Persistence:

The persistence phenomenon consists of three core components: Data, Medium, and Storage.

The data that needs to be persisted can be:

  • Raw data: data collected from a file or any other source in the form of bytes.
  • Java object: data held in the fields of an object of a Java class.

Data can be stored in RAM, which is volatile and loses its contents when the program ends, or on a secondary storage device such as a hard drive. On top of these, logical storage such as a database or files is used for storing the data.

To persist the data, Java provides mediums like:

  • I/O Streams and Serialization
  • JDBC
  • ORM Frameworks like Hibernate

Java I/O

The Java Input/Output (I/O) API provides classes for performing input and output operations on raw data.

  • These classes are available in the java.io package.
  • The Java I/O API is built on four abstract classes, grouped by the type of data they handle (bytes or characters).
  • InputStream and OutputStream deal with bytes.
  • Reader and Writer deal with characters.
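The split between byte streams and character streams can be sketched as follows. This is a minimal illustration, not a recommended file-handling pattern: it uses in-memory implementations (ByteArrayInputStream, StringReader, and so on) so that it runs without touching the file system, and the class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demo class: shows the four abstract base types of java.io.
class IoStreamsSketch {

    // Byte-oriented copy: works with any InputStream/OutputStream pair.
    static void copy(InputStream in, OutputStream out) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
    }

    // Character-oriented copy: works with any Reader/Writer pair.
    static void copy(Reader in, Writer out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            out.write(c);
        }
    }

    // Copies raw bytes through in-memory byte streams.
    static byte[] copyBytes(byte[] source) {
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copy(in, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies text through in-memory character streams.
    static String copyChars(String source) {
        try (Reader in = new StringReader(source);
             StringWriter out = new StringWriter()) {
            copy(in, out);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyBytes(new byte[] {1, 2, 3}).length); // 3
        System.out.println(copyChars("persist me"));                // persist me
    }
}
```

Swapping the in-memory streams for FileInputStream/FileOutputStream or FileReader/FileWriter would give the file-based variant without changing the copy methods.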

Serialization

Serialization makes it possible to send Java objects over a network and to store them in a file.

  • A class is marked as serializable by implementing the java.io.Serializable interface.
  • Serializable objects can be converted into a stream of bytes.
  • This stream of bytes can be written into a file.
  • These bytes can be read back to re-create the object.
  • Deserialization is the process of re-creating an object from a byte stream.
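The steps above can be sketched as a small round trip. The Customer class and its fields are hypothetical, and the bytes are kept in memory rather than written to a file so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationSketch {

    // Hypothetical class; implementing Serializable marks it as serializable.
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        final int customerId;
        final String name;

        Customer(int customerId, String name) {
            this.customerId = customerId;
            this.name = name;
        }
    }

    // Serialization: object -> stream of bytes.
    static byte[] serialize(Customer customer) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(customer);
            out.flush(); // push buffered data into the byte array
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialization: stream of bytes -> new object with the same state.
    static Customer deserialize(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Customer) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Customer copy = deserialize(serialize(new Customer(1001, "Rick")));
        System.out.println(copy.customerId + " " + copy.name); // 1001 Rick
    }
}
```

Replacing ByteArrayOutputStream with a FileOutputStream would write the same bytes to a file instead.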

Java I/O And Serialization — Shortcomings

The Java I/O API covers the basic functionality expected of a data persistence medium.

  • However, working directly with the file system is difficult and inefficient when the data is large and complex.
  • Using Java I/O also requires knowledge of the low-level details of the data to be retrieved, stored, or manipulated.

Serialization too has its own disadvantages:

  • Because the entire object graph is stored and retrieved at once, it is not a suitable approach for large amounts of data.
  • It offers no support for concurrent access to the stored data.
  • It provides no query capabilities.
  • The data cannot be retrieved without deserialization.
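To make the "whole graph at once" and "no query" points concrete, here is a hedged sketch in which looking up a single customer means deserializing every stored customer first. The Customer class, the sample names, and the helper methods are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class WholeGraphSketch {

    // Hypothetical serializable class.
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        final int customerId;
        final String name;

        Customer(int customerId, String name) {
            this.customerId = customerId;
            this.name = name;
        }
    }

    // Stores all customers as one serialized object graph.
    static byte[] store(Customer[] customers) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(customers);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // There is no way to ask the byte stream for "customer 1002":
    // the whole array is deserialized, then filtered in Java code.
    static Customer findById(byte[] stored, int customerId) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(stored))) {
            Customer[] all = (Customer[]) in.readObject();
            for (Customer c : all) {
                if (c.customerId == customerId) {
                    return c;
                }
            }
            return null; // not found
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = store(new Customer[] {
            new Customer(1001, "Rick"), new Customer(1002, "Sam")});
        System.out.println(findById(stored, 1002).name); // Sam
    }
}
```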

JDBC API

JDBC, or ‘Java Database Connectivity’, is a core Java API for interacting with databases.

  • Using JDBC API, a Java application can access a variety of databases such as MySQL, Oracle, etc.
  • JDBC follows a relational database-oriented approach to work with the data using SQL queries.

JDBC solves the problems of serialization, but it does not store Java objects directly. The data in the objects needs to be converted into a SQL statement, which is then executed to persist it.

  • SQL code has to be embedded within Java programs, which ties the code to a particular database and makes it less portable.
  • The JDBC API lets the developer run SQL queries from Java code. This means the developer needs to know the SQL constructs specific to the Relational Database Management System (RDBMS) in use.
  • It is also the programmer's responsibility to keep the data model and the object model synchronized.

For these reasons, plain JDBC is hard to maintain in enterprise applications.
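A rough sketch of what this manual mapping looks like with plain JDBC follows. The Customer class, the CUSTOMER table, and its column names are assumptions for illustration. The Connection is expected to come from whichever JDBC driver the application uses, which is not shown, so the main method only prints the SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class JdbcSketch {

    // Hypothetical domain class.
    static class Customer {
        final int customerId;
        final String name;

        Customer(int customerId, String name) {
            this.customerId = customerId;
            this.name = name;
        }
    }

    // SQL embedded in Java code; table and column names are assumptions.
    static final String INSERT_SQL =
        "INSERT INTO CUSTOMER (CUSTOMER_ID, NAME) VALUES (?, ?)";
    static final String SELECT_SQL =
        "SELECT CUSTOMER_ID, NAME FROM CUSTOMER WHERE CUSTOMER_ID = ?";

    // Object -> row: every field is copied into a statement parameter by hand.
    static void save(Connection conn, Customer customer) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setInt(1, customer.customerId);
            ps.setString(2, customer.name);
            ps.executeUpdate();
        }
    }

    // Row -> object: every column is copied back into a field by hand.
    static Customer findById(Connection conn, int customerId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SELECT_SQL)) {
            ps.setInt(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null; // no such customer
                }
                return new Customer(rs.getInt("CUSTOMER_ID"), rs.getString("NAME"));
            }
        }
    }

    public static void main(String[] args) {
        // No database is contacted here; a real run needs a driver and a Connection.
        System.out.println(INSERT_SQL);
        System.out.println(SELECT_SQL);
    }
}
```

If a field is added to Customer, both SQL strings and both methods have to be updated by hand, which is the synchronization burden described above.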

Object-Relational Impedance Mismatch:

JDBC, I/O, and serialization do not solve the problem of data persistence effectively. For a medium to be effective, it needs to take care of the fundamental difference in the way Object-Oriented Programming (OOP) and an RDBMS deal with data.

  • In programming languages like Java, related data is represented as hierarchical and interrelated objects.
  • In a relational database, the data is stored in tables (relations).

The greatest challenge in integrating the concepts of RDBMS and OOP is mapping Java objects to database tables. When the object and relational paradigms have to work together, many technical and conceptual difficulties arise, because mapping an object to a table is not possible in every context.

Storing and retrieving Java objects using a relational database exposes a paradigm mismatch called “Object-Relational Impedance Mismatch”. It stems from differences in the perception, style, and patterns of the two paradigms, and it shows up as the following mismatches:

  • Granularity: A mismatch between the number of classes in the object model and the number of tables in the relational model.
  • Inheritance or Subtype: Inheritance is an object-oriented concept that is generally not available in an RDBMS.
  • Associations: In object-oriented programming, an association is represented using reference variables, whereas in the relational model foreign keys are used to associate two tables.
  • Identity: In Java, sameness is checked with the “==” operator (same instance) or the “equals()” method (logical equality), whereas an RDBMS uses the primary key to uniquely identify records.
  • Data Navigation: In Java, the dot (.) operator is used to travel through the object network, whereas in an RDBMS a join operation is used to move between related records.

Object-Relational Impedance Mismatch — Problem Of Granularity

Consider the object model of Customer and Address depicted below, in which each Customer has an Address.

In the database, the Customer details can be represented as a single Customer table as shown below.

Number of tables = 1

In the object model, there are two Java classes, Customer and Address. However, the data of both classes is pushed into only one table (Customer) in the database.

The Granularity problem arises when the number of classes in the object model does not match the number of tables they map to in the database.
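As a hedged illustration of this two-classes-to-one-table situation, the sketch below flattens a Customer and its Address into the columns of a single row. Only customerId and zipCode appear in the text; the other fields, the column names, and the sample values are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GranularitySketch {

    // Fine-grained object model: two classes.
    static class Address {
        final String city;    // hypothetical field
        final String zipCode;

        Address(String city, String zipCode) {
            this.city = city;
            this.zipCode = zipCode;
        }
    }

    static class Customer {
        final int customerId;
        final String name;    // hypothetical field
        final Address address;

        Customer(int customerId, String name, Address address) {
            this.customerId = customerId;
            this.name = name;
            this.address = address;
        }
    }

    // Coarse-grained data model: both objects end up in one CUSTOMER row.
    static Map<String, Object> toCustomerRow(Customer customer) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("CUSTOMER_ID", customer.customerId);
        row.put("NAME", customer.name);
        row.put("CITY", customer.address.city);
        row.put("ZIP_CODE", customer.address.zipCode);
        return row;
    }

    public static void main(String[] args) {
        Customer rick = new Customer(1001, "Rick", new Address("Springfield", "12345"));
        // Two objects in, one row out.
        System.out.println(toCustomerRow(rick));
    }
}
```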

Object-Relational Impedance Mismatch — Problem Of Subtype

In the Object-Oriented paradigm, the parent-child or base-derived class relationships are implemented using inheritance. Consider the object model for the types of Customers in a retail application. The Customer can be a Corporate or Retail Customer.

The object model is represented with inheritance as shown below.

In the data model, since inheritance is not available, we have to create two tables even though the common columns are repeated in both.

Most relational databases do not support the concept of super-tables and sub-tables: you typically cannot create a table that inherits certain columns from a parent table.

The Inheritance or Subtype paradigm mismatch occurs because most RDBMS products define nothing equivalent to class inheritance.
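The contrast can be sketched as follows. In Java the shared fields are declared once in the parent class, while the two tables each repeat the shared columns. The subclass-specific fields and all column names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SubtypeSketch {

    // Object model: shared fields live once, in the parent class.
    static class Customer {
        int customerId;
        String name; // hypothetical field
    }

    static class CorporateCustomer extends Customer {
        String companyName; // hypothetical field
    }

    static class RetailCustomer extends Customer {
        String loyaltyCardNumber; // hypothetical field
    }

    // Data model: one table per customer type, each with its own full column list.
    static final List<String> CORPORATE_COLUMNS =
        List.of("CUSTOMER_ID", "NAME", "COMPANY_NAME");
    static final List<String> RETAIL_COLUMNS =
        List.of("CUSTOMER_ID", "NAME", "LOYALTY_CARD_NUMBER");

    // Columns that appear in both tables, i.e. the ones inheritance would have shared.
    static List<String> repeatedColumns() {
        List<String> shared = new ArrayList<>(CORPORATE_COLUMNS);
        shared.retainAll(RETAIL_COLUMNS);
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(new RetailCustomer() instanceof Customer); // true
        System.out.println(repeatedColumns()); // [CUSTOMER_ID, NAME]
    }
}
```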

Object-Relational Impedance Mismatch — Problem Of Association

In object-oriented languages like Java, an association is called a has-a relationship (a reference to one class is held as an instance variable in the other). For example, a Customer has an Address.

Let us revisit the object model of Customer and Address. Each Customer has an Address as shown below.

A Customer may also have multiple Addresses. In Java, this is represented with an array or a collection such as List.

addressId is Primary Key in the Address table

In the relational model, the association between tables is represented using primary keys and foreign keys.

The relational model supports one-to-one, one-to-many, and many-to-many relationships. A customer having multiple addresses is a one-to-many relationship.

The Association paradigm mismatch exists because Java represents associations (has-a relationships) using object references, whereas an RDBMS represents them using a foreign key column.
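A minimal sketch of the two representations: in Java the Customer holds references to its Address objects, while in the rows the link is a foreign key value stored with each address. The column names and the placement of the foreign key in the ADDRESS table are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AssociationSketch {

    static class Address {
        final int addressId;
        final String zipCode;

        Address(int addressId, String zipCode) {
            this.addressId = addressId;
            this.zipCode = zipCode;
        }
    }

    // Object model: the association is a list of object references.
    static class Customer {
        final int customerId;
        final List<Address> addresses = new ArrayList<>();

        Customer(int customerId) {
            this.customerId = customerId;
        }
    }

    // Relational model: each ADDRESS row carries the owning customer's key.
    static List<Map<String, Object>> toAddressRows(Customer customer) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Address address : customer.addresses) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("ADDRESS_ID", address.addressId);      // primary key
            row.put("ZIP_CODE", address.zipCode);
            row.put("CUSTOMER_ID", customer.customerId);   // foreign key (assumed column)
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Customer customer = new Customer(1001);
        customer.addresses.add(new Address(1, "12345"));
        customer.addresses.add(new Address(2, "67890"));
        System.out.println(toAddressRows(customer));
    }
}
```

Note the direction flips: the Java reference points from Customer to Address, while the foreign key in this sketch points from the ADDRESS row back to the customer.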

Object-Relational Impedance Mismatch — Problem Of Identity

In object-oriented terminology, identity is the notion that determines whether two comparable units are the same.

Consider the object model of Customer depicted below:

Java provides the following ways to check if two customers (c1 and c2) are the same.

  • The equals() method checks the equality of two objects. It can be overridden to compare objects by their attributes.
  • The == operator checks whether two object references denote the same instance.

However, SQL gives exactly one notion of ‘sameness’: the primary key. The equality of two rows of a table is determined by checking the primary key value. The customers are identified by primary key values (1001 and 1002). Two customers with the same customerId are treated as equal in SQL.

CustomerId is the primary key

The Identity paradigm mismatch occurs because Java defines sameness using == and equals(), whereas an RDBMS uses a primary key.
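The two Java notions of sameness can be seen in a short sketch. The Customer class here is hypothetical, and overriding equals() to compare customerId is one possible choice that mirrors the primary key, not the only one.

```java
class IdentitySketch {

    static class Customer {
        final int customerId;
        final String name; // hypothetical field

        Customer(int customerId, String name) {
            this.customerId = customerId;
            this.name = name;
        }

        // Logical equality: two customers are equal if their ids match,
        // which mirrors how the primary key works in the table.
        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Customer)) {
                return false;
            }
            return customerId == ((Customer) other).customerId;
        }

        // Kept consistent with equals().
        @Override
        public int hashCode() {
            return Integer.hashCode(customerId);
        }
    }

    public static void main(String[] args) {
        Customer c1 = new Customer(1001, "Rick");
        Customer c2 = new Customer(1001, "Rick");

        System.out.println(c1 == c2);      // false: two distinct instances
        System.out.println(c1.equals(c2)); // true: same customerId
    }
}
```

Both objects would correspond to the same row in the table, yet == reports them as different, which is the mismatch in miniature.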

Object-Relational Impedance Mismatch — Problem Of Data Navigation

Data Navigation refers to the procedure of traversing the Java object network.

Take the same object model of Customer and Address. The zipCode of a customer is held in its Address, so in Java it is retrieved by navigating through the object network, for example: customer.getAddress().getZipCode();

addressId is Primary Key in the Address table

An RDBMS uses SQL joins to navigate from one table to another. The data needed to find the zipCode of the customer Rick is spread across two tables, CUSTOMER and ADDRESS, so retrieving it requires a join operation.

The Data Navigation paradigm mismatch occurs due to the dissimilarities in the ways we access the data in Java using objects and in an RDBMS.
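The two navigation styles can be put side by side in a hedged sketch. The getter chain is taken from the example above; the join query is an illustration only, and it assumes column names (NAME, ZIP_CODE, ADDRESS_ID) and a foreign key from CUSTOMER to ADDRESS that the article does not spell out.

```java
class NavigationSketch {

    static class Address {
        private final int addressId;
        private final String zipCode;

        Address(int addressId, String zipCode) {
            this.addressId = addressId;
            this.zipCode = zipCode;
        }

        int getAddressId() {
            return addressId;
        }

        String getZipCode() {
            return zipCode;
        }
    }

    static class Customer {
        private final String name; // hypothetical field
        private final Address address;

        Customer(String name, Address address) {
            this.name = name;
            this.address = address;
        }

        String getName() {
            return name;
        }

        Address getAddress() {
            return address;
        }
    }

    // Relational navigation: a join between the two tables.
    // Column names and the foreign key column are assumptions.
    static final String ZIP_CODE_QUERY =
        "SELECT a.ZIP_CODE "
            + "FROM CUSTOMER c "
            + "JOIN ADDRESS a ON c.ADDRESS_ID = a.ADDRESS_ID "
            + "WHERE c.NAME = ?";

    public static void main(String[] args) {
        Customer rick = new Customer("Rick", new Address(1, "12345"));

        // Object navigation: follow references with the dot operator.
        System.out.println(rick.getAddress().getZipCode()); // 12345

        // The relational equivalent, shown as text only (no database is contacted).
        System.out.println(ZIP_CODE_QUERY);
    }
}
```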
