Designing the Data Layer in an Application

Nitin Khaitan
Towards Polyglot Architecture
6 min readAug 27, 2022

--

The data layer is a key layer in the overall architecture. It comprises of Database layer, Database abstraction layer and Data access layer. Its design is key to the success of an application from data scalability and security perspective. By-Nitin Khaitan
Design Database layer, Database abstraction layer and Data access layer

In a product built on microservice architecture, each microservice talks to one or multiple data sources and produces and consumes data as required. Different services require different data sources to solve other business problems for which that specific data source might be good.

The data layer, a crucial component in a microservice architecture, communicates with the desired data source. In this article, I will explore the perspectives we should consider when designing this pivotal data layer for an application or service.

Database layer

Known as the database tier

This layer is the database tier, where the actual data resides. Every data source can have one or more enslavers and enslaved people.

Application/services should write data to the master, and replication should happen to the slave via replication logs to maintain consistency.

We should serve read load from the read database wherever possible.

Store type

Different data sources deal with data at the persistent or non-persistent stores per the underlying architecture of the data source/database.

We will keep the data in the persistent store, which will be available for future reference and serve as a source of truth for most cases. However, we use non-persistent stores to serve performance-related needs in most cases.

Hosting

The data source hosting strategy can be on-premises, in the cloud, or hybrid.

We can also take a different approach depending on the environment. For example, the dev and test databases can be on-premises, while the staging and production load can be in the cloud.

Security architecture and considerations should be well planned, as hackers look for any possible loophole to access or tamper data.

Security

From a data security point of view, we should always enable data encryption while at rest at the data source level. This induces some processing overhead, but it must be done from a security and compliance perspective.

We should also ensure that any application service or client tool that accesses data should connect using an SSL certificate. This ensures data in motion is encrypted as well.

From a deployment perspective, it should reside in a private subnet and have restricted access to the individuals/tools who can access it.

Data Authorisation

Data authorisation should be denied by default. Users should be associated with groups, and based on the authorisation provided for the group they are associated with, they should be able to access data.

The authorisation rule should govern what data he can:

  • view/read
  • write
  • alter

Users with the proper privileges should be allowed to access data.

Database architecture

Based on the usage pattern and the load on the microservice, the database architecture needs and requirements change. On a high level, we can think of it from the perspective:

  • Master slave
  • Multi-master
  • Data transfer through data pipelines
  • Data replication strategy
  • Partition strategy
  • Serving specific needs from cache/graph/search or document database to achieve performance.

Supports querying on data

All the data sources provide their querying language to fetch or store data in the physical store. i.e., SQL

We should have constraints in the code to ensure that the service layer should not query beyond a limit to ensure we are not clogging the database.

We should have a mechanism in place to ensure that the unused indexes are removed and new indexes are added according to the changes in the query.

Data source examples

A few example data sources are:

  • MySql
  • MongoDB
  • Redis
  • Cassandra
  • Oracle
  • Elastic Search
  • Neo4j
  • And many more; we have more than 100 databases that exist…

Database abstraction layer (DBAL)

Abstract database interaction complexity

This is part of the application layer that unifies communication between the application layer and the database.

It provides easy-to-use interfaces for the programmers to interact with the actual data source.

These well-defined interfaces hide the complexity of the underlying connection and pooling logic and requirements from the end user.

Database connectivity

This layer is responsible for establishing connectivity with the database layer when the application starts and closing it when the application layer shuts down.

The database connectivity interface manages attributes like connection pool, SSL, and validity.

This area must be handled differently with serverless applications like lambda functions.

Security

An SSL certificate should always encrypt the data from the data store to the service consuming it. This will ensure data security in motion. This should always be enabled whenever data travels on the public network or is interchanged between the services.

Connection Strategy

A data source is hosted on an IP address. We should not use public IPs. Instead, we should always use a DNS name instead of an IP address to connect to the database. This will allow us to change the data source if required.

Read Write Strategy

In the database calls, we can strategise and delegate control and decision-making regarding which call should go to the write/master data instance and which should go to the read instance. This can be used to control the number of invocations going to the write instance and thus will help us keep the write operations performance optimal from the end user perspective.

Databases supported

This layer supports various data sources. The required library, as per the programming language, must be imported into the application layer for every data source. Once imported, we must configure the data source interaction via the provided interface.

A few example data sources concerning the type that are supported:

  • Relational database: MySql, SQL Server, Db2, PostgreSQL, Oracle, MariaDB, AWS Aurora, AWS RDS
  • NoSql database: MongoDB, Apache Cassandra, Couchbase, etc
  • Key value database: Redis, etc
  • Comprehensive column store: Scylla, HBase, Cassandra, etc
  • Columnar database: Amazon DynamoDB, Apache Cassandra, etc
  • Search database: Solr, Splunk, Exasticsearch, etc
  • Graph database: Neo4j, DataStax Enterprise Graph, etc

Data access layer

Database interaction

This layer is responsible for interacting with the database to perform read and write operations via the interfaces exposed. Different libraries in different programming languages exist to support this interaction.

This layer should be singleton per database by design and injected into the code via dependency injection.

Transaction Management

Though transaction management happens at the service/business layer, this layer is also responsible for supporting transactional consistency around multiple database updates or fetch in case they need to be considered part of one transaction.

With the growth in the business, transaction volume increases, leading to a high concurrency system. If the system is not designed to manage this, various reading and writing problems might arise.

Object-relational mapping (ORM)

It is a mechanism via which we create objects concerning database tables so that the application layer always talks about things and their attributes instead of tables and columns. The layer abstracts the table and column name from the application layer.

A few commonly used ORMs in different programming languages are:

Java: i.e. — Hibernate, iBatis, etc

Node: i.e. — Sequeize, Prisma, etc

Python: i.e., SqlAlchemy, etc

.NET: i.e. — LINQ, etc

PHP: i.e., Doctrine, etc

No business logic

This layer should not have business logic-related stuff in place. It should be clean and work with entities to interact with the database layer.

However, we can have database layer-related transformation happen in this layer.

Thinking from the tenancy perspective

Single tenancy

This is an architecture where one application and its respective data sources support one client.

A few critical pointers associated with this are:

  • Security and Isolation: The entire stake is isolated per client, which is more secure. Every application can follow its life cycle.
  • Deployment: A release impacts one client only. Suitable for non-SAAS applications.
  • High cost: As this is associated with one client only, its resource and maintenance costs are high.

Multi-tenancy

This is an architecture upon which any SAAS application is designed.

This is an architecture where one application deployment supports different data sources, varying per client.

A few key benefits associated with this are:

  • Data security: Every client's data resides in a separate data source. Clients can keep their data sources on-premise or in a cloud data centre.
  • Deployment: As the single application layer supports multiple clients, a single deployment at the application level will propagate new changes for all the clients.
  • Low cost: All the clients share the same computing and memory concerning the application layer deployment. Its resource and maintenance costs are low.

Conclusion

The data layer is critical in the overall architecture. It comprises the Database layer, the Database abstraction layer, and the Data access layer. Its design is crucial to an application's success from a data scalability and security perspective.

--

--

Nitin Khaitan
Towards Polyglot Architecture

Strategic thinker, a technically astute developer/architect with 15+ years of experience owning engineering, backend, and frontend from infancy to success