How to choose a data model?

Jhansi Karee
4 min readFeb 4, 2020

--

Data models, here I am referring to data storage types. There are majorly three different data models:

  1. Relational
  2. Document or No-Sql
  3. Graph

As part of this blog, we will understand each of these data models and which to use when? It is important to understand this because each of the data models assumes on how it is going to be used. If a data model is used for something it is not meant for, we might fail miserably.

This blog might give some pointers if you are in the process of choosing a right data store for your application.

1. Relational:

Relational is a data model where data is organised into Relations called Tables. Data is stored in a structured form, in rows and columns.

When are Relational data models preferred:

  1. ACID-compliant: Major Relational data models are ACID compliant. Hence many systems like Banks where ACID compliance can’t be compromised go for Relational databases.
  2. Joins: Joins are very well supported in Relational data models. Data can be easily joined from multiple tables. Hence it works for data involved with one-to-many relationships. To an acceptable way even for many-many relationships.

This comes with a cost:

  1. Pre-defined schema: Schema is pre-defined in Relational data models. It can’t handle data with dynamic schema.
  2. Limited Scalability: Traditionally Relational data models are scalable vertically. This scaling can go only to a certain point and it becomes really expensive to scale them vertically. However, Relational data models can still be scaled horizontally, we will discuss this in my coming blog.

This means you are good to choose a relational data store, if you can’t compromise with the ACID properties. Along with it, if you are okay with the limited scalability and the schema restrictiveness.

Examples: Oracle, Mysql, Postgresql..

2. Document or No-Sql:

No-Sql is a data model which was introduced, because of the frustration with the restrictive schema of SQL. Along with it, the need for higher scalability and throughput. No-Sql was originally referred to as Not SQL, but now they are called as Not only Sql.

When are No-Sql data models preferred:

  1. Highly scalable: No-Sql data models are designed to be horizontally scalable. Hence, they are highly scalable. This makes it easy to store huge volumes of data.
  2. Schema flexibility: When there is no control over the incoming data, there is a need for a data model that supports dynamic schema. Hence, No-Sql data models are preferred for this kind of use case. Although there is no schema defined, clients who are reading the data would expect it to be of a certain format. Hence, they are called as schema-on-read.

But, this comes at a cost:

  1. Eventual consistency: Out of consistency and availability, No-Sql data models focus on the availability aspect than consistency. This means they are okay to return the stale data than not being available at all. But eventually, they promise to be consistent with the latest data.
  2. Support for Joins is weak: Most of the No-Sql data models don’t support joins. This means if there is a need to join, application in-turn might have to handle it instead of a data model.

This means you are good to choose a No-Sql data model if you are looking for a highly scalable data model. Along with it, if you are okay to be eventually consistent and don’t see much need for a data model to do joins.

Examples: Cassandra, MongoDB..

3. Graph:

Graph is a data model where data is interconnected. When data is highly connected with many-many relationships, it is natural to use Graph.

When are Graph data models preferred:

Many-to-Many relationships: Systems where the data is highly interconnected like social media. This data model is highly preferred.

Examples: Neo4j

There are different data models preferred in different use cases. But, what if we need the best of all these data models. Confused? let me give more details:

Polygot persistence:

Let’s take an example of an e-commerce site where there is a huge number of reviews for the products (which is ever-increasing). At the same time, there are financial transactions where consistency can’t be compromised.

We need the best of Relational to be ACID compliant and the best of No-Sql to be Highly scalable. Which data store to choose now? Why do we need to restrict to one? Why can’t we use both? We could store data of an application in multiple data stores like below:

  1. No-Sql for Reviews: No-Sql for the data set which is huge and not highly related.
  2. Relational for transactions: Relational Data model for the data where ACID compliance can’t be compromised.

Conclusion:

We have discussed different data models and which is preferred when and the cost involved in them. Hopefully, this has given you some pointers on how to choose a data model for your application.

--

--