Twelve-factor and the data layer

Nitin Khaitan
Towards Polyglot Architecture
12 min read · Oct 31, 2022



Twelve-factor is a methodology for building software-as-a-service applications. It helps achieve clean code, maximum portability, high automation, minimum divergence between environments, continuous deployment, and the ability to scale without significant changes to tooling, architecture, or development practices.

An application's data layer comprises the data-access code and the database it interacts with.

In this article, I would like to show how we can apply each of the twelve factors to the data layer, or database, of an application.

1. Codebase

One codebase is tracked in revision control, and there are many deploys.

Twelve-Factor: same codebase for all environments

Codebase means the code residing in the repository. We should be able to deploy to different environments using the same codebase, though the version might vary from environment to environment.

A few rules that make a repo twelve-factor compliant from a codebase perspective are:

  • One repo for one app or microservice.
  • If code must be shared across multiple apps or microservices, we should treat it as a library and include it via a dependency manager.

We can make the data layer twelve-factor compliant for the codebase by:

  • The microservice's data definition language (DDL) and config data should be in the code.
  • Its deployment to all environments should use the same script we wrote for the microservice.
  • For deployment, we can use tools like Liquibase, Flyway, or Mongoose.
  • This script should execute as part of the CI/CD pipeline whenever we deploy to an environment, as sketched below.
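
As a minimal sketch of this idea (assuming a PostgreSQL database, the `pg` driver, and versioned `.sql` DDL files kept in a `migrations/` folder inside the repo), the same script can apply the repo's DDL to whichever environment the pipeline targets. Tools like Liquibase or Flyway add change tracking and rollback on top of the same principle.

```typescript
// migrate.ts (hypothetical sketch): apply versioned DDL files from the repo
// to whichever database the current environment points at.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { Client } from "pg";

async function migrate(): Promise<void> {
  // The connection string comes from the environment, never from the repo.
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // Apply files in name order, e.g. 001_create_users.sql, 002_add_index.sql.
    const files = readdirSync("migrations")
      .filter((f) => f.endsWith(".sql"))
      .sort();
    for (const file of files) {
      console.log(`applying ${file}`);
      await client.query(readFileSync(join("migrations", file), "utf8"));
    }
  } finally {
    await client.end();
  }
}

migrate().catch((err) => {
  console.error(err);
  process.exit(1);
});
```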

2. Dependencies

Explicitly declare and isolate dependencies

Twelve-Factor: manage dependencies

Dependencies are the libraries, tools, or external services the app or the microservice relies upon to work smoothly.

A few rules that make a repo twelve-factor compliant from a dependency perspective are:

  • We should not rely upon the implicit existence of a dependency.
  • The repo should use dependency isolation tools to ensure no implicit dependency leaks in from the surrounding system.

We can make the data layer twelve-factor compliant for the dependencies by:

  • We should keep database-layer dependencies in artifactory servers like JFrog, Nexus, S3, or npm, as per the programming language and architecture requirements.
  • The application stack should resolve dependencies from the organisation's artifactory server and not from any public repo.
  • The private artifactory source should not be exposed in the public domain.
  • Shared code should be published to the artifactory tagged with a version, and all the consumer apps or microservices should pin the versions they consume, as sketched below.
  • Our deployment artifacts, like Docker images, should also reside in a Docker registry with versions and tags.
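
As a minimal sketch of publishing a versioned shared library to a private registry (the registry URL and the `NPM_REGISTRY` variable are assumptions of this sketch):

```typescript
// publish.ts (hypothetical sketch): publish a versioned shared library to the
// organisation's private registry so consumers can declare it explicitly.
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

const pkg = JSON.parse(readFileSync("package.json", "utf8"));
// Assumed private registry URL; consumer apps resolve the dependency from here.
const registry =
  process.env.NPM_REGISTRY ?? "https://artifactory.example.com/api/npm/npm-local/";

console.log(`publishing ${pkg.name}@${pkg.version} to ${registry}`);
execSync(`npm publish --registry ${registry}`, { stdio: "inherit" });
```

Consumer apps then pin the published version in their own manifest, so nothing is resolved implicitly from the surrounding system.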

3. Config

Store config in the environment

Twelve-Factor: externalise the config files from the codebase

Config is anything likely to vary across environments, e.g., resource handles for the database, credentials for internal or external services, per-deployment values, etc.

A few rules that make a repo twelve-factor compliant from the config perspective are:

  • Strict separation of config from the code.
  • A way to check whether the code is twelve-factor compliant is to ask whether we could open-source the codebase at any given time without making any changes or compromising any credentials.
  • Config that does not vary across deployments can be part of the repo, but it should not contain anything sensitive from a security perspective.
  • Config files should not be checked into the code repo.

We can make the data layer twelve-factor compliant for the config by:

  • Creating environment files for the application or the microservice.
  • Keeping them separate from the deployment bundle.
  • The environment file should isolate the database connection details and other environment-specific settings rather than having them inferred from system variables.
  • One environment file should be used per environment deployment.
  • The database configuration should be part of a script executed per environment via the CI/CD deployment pipeline, as in the sketch below.
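
For example, a minimal sketch of externalised database config, assuming the `dotenv` and `pg` packages; the variable names (`ENV_FILE`, `DB_HOST`, and so on) are assumptions of this sketch:

```typescript
// db-config.ts (hypothetical sketch): database connection details come from the
// environment file injected at deploy time, not from the codebase.
import dotenv from "dotenv";
import { Pool } from "pg";

// Load the per-environment file, e.g. .env.staging or .env.production.
dotenv.config({ path: process.env.ENV_FILE ?? ".env" });

export const pool = new Pool({
  host: process.env.DB_HOST,
  port: Number(process.env.DB_PORT ?? 5432),
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
});
```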

4. Backing Service

Treat backing services as attached resources

Twelve-Factor: the backing service for the application or the microservice should be changeable without a code change

A backing service is any service consumed by the application or the microservice over the network, such as databases, key stores, session stores, event systems, etc.

A few rules that make a repo twelve-factor compliant from a backing service perspective are:

  • The backing service source should be changeable without any change in the repo.
  • The repo should not distinguish between local and third-party backing services.
  • Changing the third-party backing source might require a restart but should not require any code change.

We can make the data layer twelve-factor compliant for the backing service by:

  • Externalise the connection and config details from the code repo.
  • Code should be tested against different backing service sources so that switching, when needed, does not impact the codebase.
  • We can develop shared libraries for this, which can be deployed to the artifactory server and consumed by the code repo, as in the sketch below.
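
A minimal sketch, assuming MongoDB accessed via Mongoose and an assumed `MONGODB_URI` variable: the database is just an attached resource identified by a URL, so swapping a local container for a managed cluster changes the environment, not the code:

```typescript
// datastore.ts (hypothetical sketch): the database is an attached resource,
// identified only by a URL taken from the environment.
import mongoose from "mongoose";

export async function connectDatastore() {
  // Swapping a local MongoDB for a managed cluster only changes this variable.
  const uri = process.env.MONGODB_URI ?? "mongodb://localhost:27017/app";
  return mongoose.connect(uri);
}
```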

5. Build, Release and Run

Strictly separate build and run stages

Twelve-Factor: build once, deploy and run in all environments

The codebase resides in a repository and can have different code versions on different branches, and different environments might have code deployed from different branches. Here, we will talk about how to keep the build and the release decoupled.

A few rules that make a repo twelve-factor compliant from a build, release, and run perspective are:

  • The build stage converts the code and config to an executable bundle from a branch.
  • The executable bundle created in the build stage can be seamlessly deployed in all environments.
  • Every build and executable bundle should have a release ID associated with it.
  • Different environments can deploy separate builds on them per the project's need.
  • The deployment tool should support a rollback option as well.

We can make the data layer twelve-factor compliant for the build, release and run by:

  • The DDL and config scripts should be built and deployed via the CI/CD pipeline to the different environments.
  • New library changes should be built under a new version number and deployed to the artifactory server, and the consumer apps or microservices should upgrade the version as needed.
  • The artifactory server and database release-management tools like Liquibase, Flyway, and Mongoose support this, as in the release-stamping sketch below.
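
As a small illustrative sketch (the `BUILD_NUMBER` variable and the `release.json` file name are assumptions), the build stage can stamp every artifact with an immutable release ID so the same bundle can be promoted, or rolled back, across environments:

```typescript
// release.ts (hypothetical sketch): stamp the build with an immutable release ID.
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

const commit = execSync("git rev-parse --short HEAD").toString().trim();
const release = {
  // BUILD_NUMBER is assumed to be injected by the CI server.
  id: `${process.env.BUILD_NUMBER ?? "local"}-${commit}`,
  builtAt: new Date().toISOString(),
};

// Bundled into the artifact; the run stage only reads it and never changes it.
writeFileSync("release.json", JSON.stringify(release, null, 2));
console.log(`built release ${release.id}`);
```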

6. Processes

Execute the app as one or more stateless processes

Twelve-factor: all the instances should be able to serve different requests from the same client in the same session seamlessly

With cloud deployments and auto-scaling becoming popular, we expect the app and the microservice to behave statelessly and keep session state in a shared store. Every request can then be served independently of the process that served the last request. This is equally achievable from the database perspective.

A few rules that make a repo twelve-factor compliant from a process perspective are:

  • All requests should be served independently of the instance that served the last request.
  • Under load, new instances should spawn, and as the load reduces, the number of instances should reduce.
  • An instance should hold a cache only for the duration of a single transaction.
  • Logs should not reside on the instance but should be externalised to a centralised store so that logs from all instances are published to it.

We can make the data layer twelve-factor compliant for the process by:

  • Having one or more masters and one or more slaves.
  • We should tune the database instances to keep the replication delay to a minimum.
  • We can do sharding on the master to distribute the master's load across multiple instances.
  • The application or the microservice should be configured to connect to the master or a slave instance based on the functional flow and its real-time versus near-real-time needs.
  • The configuration and connection handling for the microservice and the application should be designed to work independently of which instance we write to versus which instance we read from, as in the sketch below.
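
A minimal sketch of such routing, assuming PostgreSQL with the `pg` driver; the `PRIMARY_URL` and `REPLICA_URL` variables are assumptions. Writes and real-time reads go to the master (primary), other reads go to a slave (replica):

```typescript
// db-routing.ts (hypothetical sketch): stateless read/write routing chosen per
// query, so any app instance can serve any request.
import { Pool } from "pg";

const primary = new Pool({ connectionString: process.env.PRIMARY_URL });
const replica = new Pool({ connectionString: process.env.REPLICA_URL });

// Real-time flows read from the primary to avoid replication lag;
// everything else can read from a replica.
export function query(sql: string, params: unknown[], opts = { realTime: false }) {
  const isWrite = /^\s*(insert|update|delete)/i.test(sql);
  const pool = isWrite || opts.realTime ? primary : replica;
  return pool.query(sql, params);
}
```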

7. Port Binding

Export services via port binding

Twelve-factor: The client for the service should be exposed to a URL via port binding

A twelve-factor app is wholly self-contained and should export HTTP as a service by binding to a port and listening to the requests coming in on that port.

A few rules that make a repo twelve-factor compliant from a port-binding perspective are:

  • The app or service should export HTTP as a service by binding to a port.
  • A routing layer handles routing requests from the public domain to the port-bound process.
  • We can expose not just HTTP but any other kind of service via port binding as well.
  • One app can become the backing service for another app via port binding.

We can make the data layer twelve-factor compliant for the port binding by:

  • We can have databases like MySQL, MongoDB, Redis, Neo4J or ElasticSearch deployed on stand-alone EC2 instances.
  • They can all be running on a host at a particular port.
  • We can use the AWS Route53 service to configure private DNS and map the DNS name with the host and the port of the backing service.
  • If we use an AWS-managed service, it exposes a private/public DNS endpoint to us that fronts the actual master and slave combination.
  • If the underlying application or microservice uses this mapped DNS name to reach the database, then we can say that the database layer is compliant from the port-binding perspective, as in the sketch below.
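
A minimal sketch (the DNS name `db.internal.example.com` and the variable names are assumptions of this sketch): the service exports HTTP by binding to a port and reaches its database through the mapped DNS name and port rather than a hard-coded host:

```typescript
// server.ts (hypothetical sketch): export HTTP via port binding and reach the
// database through a private DNS name mapped to its host and port.
import { createServer } from "node:http";
import { Pool } from "pg";

const db = new Pool({
  host: process.env.DB_HOST ?? "db.internal.example.com", // assumed Route 53 record
  port: Number(process.env.DB_PORT ?? 5432),
});

createServer(async (_req, res) => {
  const { rows } = await db.query("SELECT now() AS now");
  res.end(JSON.stringify(rows[0]));
}).listen(Number(process.env.PORT ?? 3000));
```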

8. Concurrency

Scale out via the process model

In a twelve-factor app, processes are first-class citizens; they take cues from the UNIX process model for running service daemons.

A few rules that make a repo twelve-factor compliant from a concurrency perspective are:

  • The developer should assign each type of work to a process type.
  • A web process should handle HTTP requests, and a worker process should manage background tasks.
  • This does not prevent individual processes from handling their own internal multiplexing via threads inside the runtime VM.
  • We should think from the perspective of horizontal versus vertical scaling.
  • It suggests a share-nothing, horizontally partitionable model so that adding more concurrency is a simple and reliable operation.

We can make the data layer twelve-factor compliant for the concurrency by:

  • Having a master and one or more slaves.
  • The system should measure the load and automatically increase or reduce the number of slaves.
  • We can do sharding on the master to distribute the master's load across multiple instances.
  • We should use an architecture of at least three shards, each with at least two replica-set members, all sitting behind a group proxy, to achieve auto or manual scaling at the database layer without inducing any downtime, as sketched below.
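
A minimal sketch of key-based shard routing; the three `SHARD_n_URL` variables are assumptions of this sketch. Capacity grows by adding shards and replicas rather than by scaling a single instance vertically:

```typescript
// sharding.ts (hypothetical sketch): route each partition key to one of the
// shards by hashing, so load spreads horizontally across shard masters.
import { createHash } from "node:crypto";
import { Pool } from "pg";

// Each shard is its own master plus replicas, reachable via its own URL.
const shards = [
  new Pool({ connectionString: process.env.SHARD_0_URL }),
  new Pool({ connectionString: process.env.SHARD_1_URL }),
  new Pool({ connectionString: process.env.SHARD_2_URL }),
];

export function shardFor(key: string): Pool {
  const hash = createHash("md5").update(key).digest();
  return shards[hash.readUInt32BE(0) % shards.length];
}

// Usage: shardFor(customerId).query("INSERT INTO orders (...) VALUES (...)");
```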

9. Disposability

Maximise robustness with fast startup and graceful shutdown

This means we should be able to start or stop a process at any moment, which implies easy scaling, minimal startup time, and a process that is robust against sudden death.

A few rules that make a repo twelve-factor compliant from a disposability perspective are:

  • It should have a minimum startup time.
  • The process should shut down gracefully when it receives a SIGTERM signal.
  • The process should be robust against sudden death.

We can make the data layer twelve-factor compliant for disposability by:

  • A failure can happen to the master or the slave.
  • We should architect the database layer so that the system continues with zero downtime, even in the event of a database failure.
  • We can design it from two different perspectives. The first is that an extra small instance should be kept in the stack as a replica-set member so that we can increase its size and capacity as required and use it.
  • Another way is to have an automated process to copy an existing instance, add it as a replica-set member, and enable binlog replication. Once the data is in sync, it should be allowed to serve the application or microservice load.
  • We should also have a DR database in place so that we can start using it the moment we require it; a graceful connection-draining sketch follows this list.
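
On the application side of the data layer, a minimal graceful-shutdown sketch (assuming the `pg` driver): the process drains its database connections on SIGTERM so instances can be stopped or replaced at any moment without leaking connections:

```typescript
// shutdown.ts (hypothetical sketch): close database connections gracefully when
// the platform asks the instance to stop.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

process.on("SIGTERM", async () => {
  console.log("SIGTERM received, draining database connections");
  await pool.end(); // closes the pool's clients before the process exits
  process.exit(0);
});
```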

10. Dev/Prod Parity

Keep development, staging, and production as similar as possible.

Twelve-factor app helps keep parity between dev and prod

In the early days, the gap between development and production was too wide. The gap used to be large in monolith applications due to the following:

  • The time gap: Long release cycle
  • The personnel gap: Developers wrote the code, and a separate team deployed it
  • The tools gap: At times, the developer stack and the prod stack were different as well

A few rules that make a repo twelve-factor compliant from dev/prod parity are:

  • The twelve-factor methodology is designed from a CI/CD perspective to minimise the gap between development and deployment.
  • Backing services should be at parity between the dev and prod.
  • We can achieve this by running local Docker containers for the backing services or by connecting to dev deployment for them.

We can make the data layer twelve-factor compliant for the dev/prod parity by:

  • Dev should point to a dev database instance via port binding or run the same database locally in a container.
  • The dev database instance should have the same configuration and version as prod (though the instance size can be smaller to keep the cost in control).
  • The dataset on dev should be similar to prod, but data masking should be done to ensure data security, as sketched below.
  • The DDL or config changes on the dev database should be made using the same script we plan to execute on prod via the CI/CD pipeline.
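
A minimal sketch of the masking step; the table, columns, and `DEV_DATABASE_URL` variable are assumptions of this sketch:

```typescript
// mask.ts (hypothetical sketch): mask personally identifiable fields in the
// dev copy of a prod-like dataset.
import { createHash } from "node:crypto";
import { Pool } from "pg";

const dev = new Pool({ connectionString: process.env.DEV_DATABASE_URL });

// A one-way hash keeps values consistent (useful for joins) but unreadable.
function mask(value: string): string {
  return createHash("sha256").update(value).digest("hex").slice(0, 16);
}

export async function maskUsers(): Promise<void> {
  const { rows } = await dev.query("SELECT id, email, phone FROM users");
  for (const row of rows) {
    await dev.query("UPDATE users SET email = $1, phone = $2 WHERE id = $3", [
      `${mask(row.email)}@example.com`,
      mask(row.phone),
      row.id,
    ]);
  }
}
```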

11. Logs

Treat logs as event streams.

Twelve-factor: Logs should be published via an event stream to a centralised source

Logs provide insight into the behaviour of the running application or the microservice, which writes its logs to an output log file. We should have an event stream that reads from the output log file and publishes the logs to a backing service like Splunk or CloudWatch Logs, per the system requirements.

A few rules that make a repo twelve-factor compliant from a logging perspective are:

  • Every service should write its logs to its output log file.
  • The event system should continuously collate the logs from the log file and publish them to the log service via the event stream.
  • We can then have an alert or dashboard configured on the log system to generate insight.
  • We can also search in these logs as required.

We can make the data layer twelve-factor compliant for the log by:

  • All the database logs should reside in the database log tables.
  • We can create a data pipeline to read the logs and publish them to a columnar database like BigQuery.
  • As all the database logs now reside in BigQuery, we can create dashboards and alerts on top of it as required, as in the pipeline sketch below.
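
A minimal pipeline sketch, assuming PostgreSQL, the `@google-cloud/bigquery` client, and an assumed `db_logs` source table plus an `ops.db_logs` BigQuery dataset and table:

```typescript
// log-pipeline.ts (hypothetical sketch): read new database log rows and stream
// them to BigQuery, where dashboards and alerts are built.
import { BigQuery } from "@google-cloud/bigquery";
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });
const bigquery = new BigQuery();

export async function shipLogs(sinceId: number): Promise<number> {
  const { rows } = await db.query(
    "SELECT id, logged_at, level, message FROM db_logs WHERE id > $1 ORDER BY id",
    [sinceId]
  );
  if (rows.length > 0) {
    await bigquery.dataset("ops").table("db_logs").insert(rows);
  }
  // Return the last shipped id so the next run picks up where this one stopped.
  return rows.length > 0 ? rows[rows.length - 1].id : sinceId;
}
```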

12. Admin processes

Run admin/management tasks as one-off processes.

Admin or maintenance tasks should be executed as one-off processes in an environment identical to the one the app's regular long-running processes run in. Twelve-factor strongly favours languages that provide a REPL shell out of the box.

A few rules that make a repo twelve-factor compliant from an admin process perspective are:

  • An admin process should be treated as an independent, standalone running process.
  • Any maintenance or one-off administrative work should be done via this process.
  • All such work should be planned as part of a release and should follow release ceremonies.
  • A few examples are database migrations and executing a script or some code via a shell.

We can make the data layer twelve-factor compliant for the admin process by:

  • The database layer requires maintenance and upgrade activity to be performed on it.
  • It should be done via a release cycle and follow its respective ceremonies.
  • The change should first be made on a lower environment using the script from the repository, and then the same script should be executed on all higher environments via the admin process, just as in the first environment.
  • The scripts for all the published changes should be managed in the repository.
  • A shell prompt should be used to execute those changes, and a change log should be maintained and kept for future tracking, as in the sketch below.
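
A minimal sketch of such a one-off admin process; the task, the `admin_change_log` table, and the column names are assumptions of this sketch. It runs as a standalone process against the target environment and records what was executed:

```typescript
// admin/backfill-country-code.ts (hypothetical one-off admin process): run once
// per environment via the release pipeline or a shell, and log what ran.
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

async function run(): Promise<void> {
  // The maintenance task itself.
  await db.query("UPDATE users SET country_code = 'IN' WHERE country_code IS NULL");

  // Keep a change log so future audits can see what ran, where, and when.
  await db.query(
    "INSERT INTO admin_change_log (task, executed_at) VALUES ($1, now())",
    ["backfill-country-code"]
  );
  await db.end();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```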
