How I Use the Twelve-Factor App Methodology for Building SaaS Applications with Java & Scala

Streamlining the Development to Release Process

Hashmap
Mar 20, 2018 · 10 min read

by Jay Kapadnis, Tempus Architect


What is the Twelve-Factor App Methodology?

The Twelve-Factor App Methodology is a set of practices recommended for smoothly building and delivering Software as a Service (SaaS) applications or web apps, with a focus on Microservices. For that reason, for purposes of this discussion, I will use Service and Application interchangeably.

A Twelve-Factor App as described at 12factor.net will be:

1. Automated from a development perspective

2. Portable across execution environments

3. Deployable in the cloud minimizing the need for servers and server administration

4. Enabled for continuous deployment with minimal divergence between Dev and Prod

5. Scalable without significant change or effort

At Hashmap, we do our best to follow the methodology as much as possible, but reality dictates that sometimes we must deviate a bit due to limitations that are encountered. Regardless, it's always a best practice to keep the Twelve Factors top of mind while building SaaS applications and to keep reinforcing why they matter.

So let’s explore the Twelve-Factors now…

Codebase

An application's codebase should always be tracked in a Version Control System (VCS). Git is by far the most widely used, and today you will rarely find a development team that isn't using a VCS. It provides substantial benefits such as change tracking and code versioning, and it eases collaboration within a team working on the same application.

In general, you should have one repository per application to keep CI/CD pipelines simple (we will discuss these shortly). That said, one service can have multiple running instances (the official documentation terms these deploys).

Code should not be shared between applications: you can't add dependencies between deployable services (think Spring Boot or Play applications). So, if there is common code you want to use across applications, put it in its own repo as a library, publish it to a Maven repository, and integrate it using Maven, sbt, Gradle, or any other build tool you prefer.
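For example, if that shared library is published to a Maven repository, each consuming application declares it like any other dependency. The coordinates below are hypothetical, just to illustrate the shape:

```xml
<!-- In each consuming application's pom.xml; coordinates are hypothetical -->
<dependency>
    <groupId>com.example.common</groupId>
    <artifactId>shared-utils</artifactId>
    <version>1.2.0</version>
</dependency>
```

The library then gets its own repository, its own versioning, and its own release cycle, while the applications stay independent of each other.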

There are workflow tools that help you manage your codebase through release, such as GitFlow (a release-management workflow). Developers following Extreme Programming principles (frequent releases in short development cycles) generally don't enjoy these release-management workflows; that's a discussion for another day.

Dependencies

Never copy dependencies into the project codebase. Instead, use a dependency management tool to fetch the dependencies declared in a manifest from a repository server.

With Maven as the dependency management tool, the manifest is pom.xml, and dependencies are fetched as JAR artifacts from various repositories (a Maven repository, not a Git repository).

There are times, though, when you will need to rely on a few JARs declared with provided scope; in other words, you assume their existence on the server, like tools.jar from the JDK. Always ensure you use the correct dependency versions so that all environments stay in sync and reproduce the same behavior. I can tell you from experience that it's a nightmare to track down issues caused by mismatched dependency versions across environments.

In the Java/Scala world there are also tools such as sbt and Gradle, which differ in how the manifest is written (build.sbt and build.gradle, respectively). There are other differences too, but the manifest is the most obvious one.
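As a sketch of that manifest difference, here is a dependency declared the sbt way; the artifact shown is just an example:

```scala
// build.sbt, sbt's manifest: one line per dependency, the analogue of a
// <dependency> block in Maven's pom.xml ("%%" appends the Scala version)
libraryDependencies += "com.typesafe.play" %% "play-json" % "2.6.9"
```

Whichever tool you choose, the principle is the same: versions are pinned in the manifest, and the artifacts themselves never live in your codebase.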

Backing Services

Backing services are the infrastructure and other services the application communicates with over the network: databases, message brokers, and other API-accessible services such as an authorization service, Twitter, GitHub, etc. They are loosely coupled to the application, which treats each one as an attached resource.

You may be asking yourself, "So don't all Microservices generally follow this principle? What's new here, Jay?"

The difference here is that you must be able to easily swap the backing service from one provider to another without code changes.

For example, take an app acting as an OAuth2 Authorization Server with its own database, and suppose we need to switch to a different client database maintained on a different server. No code changes should be needed; only configuration should change.

The focus is on Configurations over Conventions.

For use cases like this, it can be easy with something like Spring Data JPA, but you can't code for each and every possible competing technology an application might use; that's unrealistic.

To address this challenge, we generally suggest coding to interfaces: create a Façade that the application uses to integrate with its backing services. Depending on the configuration (e.g., in a Spring Boot .yml file), the appropriate backing service implementation is injected at runtime. A plugin-based architecture also provides a way to add support for different types of services behind a Façade.

The reality is though that most applications can’t be 12 Factor apps if this is a “rule” vs an “exception”. Do your best to avoid over-engineering your application if it’s not necessary. Make a judicious call on supporting backing services depending upon the need of the business and project itself.
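As a minimal sketch of that Façade idea in plain Java (all names here are hypothetical, not from any particular framework): the application codes against an interface, and a configuration value decides which implementation gets wired in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// The Façade: the application only ever talks to this interface.
interface SessionStore {
    void put(String key, String value);
    Optional<String> get(String key);
}

// One backing-service implementation (handy for dev/test).
class InMemorySessionStore implements SessionStore {
    private final Map<String, String> data = new HashMap<>();
    public void put(String key, String value) { data.put(key, value); }
    public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
}

// A RedisSessionStore would implement SessionStore the same way; swapping
// providers then becomes a config change, not a code change.
class SessionStores {
    static SessionStore fromConfig(String provider) {
        switch (provider) {             // value read from a .yml/.properties file
            case "memory": return new InMemorySessionStore();
            // case "redis": return new RedisSessionStore(host, port);
            default: throw new IllegalArgumentException("unknown store: " + provider);
        }
    }
}
```

In a Spring Boot app the same effect is usually achieved with conditional bean wiring rather than a hand-rolled factory, but the principle is identical: the code depends on the interface, the configuration picks the provider.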

Config

Configurations are a central part of any application, specifically when there is a need to support multiple environments or clients. Use cases are as follows:

· Database connection properties

· Backing services credentials and connection information

· Application environment specific information such as Host IP, Port, etc.

The assumption is that all configuration is stored in .config, .yml, .json, or .properties files (depending on the framework you are using) and not in the code.

There should be a strict separation between config and code. Code should remain the same irrespective of where the application is being deployed, but configurations can vary.

The rule of thumb: properties that vary by environment or client, as in the examples above, go into config files; REST endpoints (URIs) and other components that don't vary can safely live in the code.

In order to achieve the above, most frameworks support the use of System Environment variables inside configuration files. For instance, in the Spring Boot yaml file you can write:

security.oauth2.client.clientSecret: ${CL_SERVICE_PASSWORD}

The above assumes that CL_SERVICE_PASSWORD is set as a sys env variable for deployment. Similar approaches apply to the Play Framework as well.

This enables you to ensure that no sensitive information is exposed within the codebase and you can customize depending on the environment where the application is deployed. Maven also provides a feature of “profiles” which can be used to toggle between deployments such as dev, test, staging, and production.
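A hedged sketch of such Maven profiles (the profile ids and the property are illustrative):

```xml
<!-- In pom.xml: one profile per deployment target -->
<profiles>
    <profile>
        <id>dev</id>
        <activation><activeByDefault>true</activeByDefault></activation>
        <properties>
            <env.name>dev</env.name>
        </properties>
    </profile>
    <profile>
        <id>prod</id>
        <properties>
            <env.name>prod</env.name>
        </properties>
    </profile>
</profiles>
```

You would then build with, e.g., `mvn package -P prod`. Keep in mind that baking environment choices into the build works against factor five (Build, Release, Run), so environment variables are usually the cleaner option.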

Build, Release, Run

A twelve-factor application requires a strict separation between Build, Release and Run stages. Let’s look at each stage in detail.

Build

The Build phase takes code from VCS and builds an executable bundle. For Java/Scala environments, it’s mostly JAR, though with Spring Boot you can generate executables such as deb, rpm, etc. For now, I’ll focus on JAR.

Ideally, this stage should also take care of executing all Unit Tests available in an application. If tests fail, then the entire process should be abandoned as you don’t want a failing application deployed.

This stage maps to Continuous Integration (CI): code is pulled from the VCS whenever a team member pushes a change and is built on a server. Keep your builds fast to get quicker feedback, since this is the stage that tends to fail.

Release

In this stage, an executable build is combined with environment specific configurations, assigned a unique release number, and made ready to execute on the environment.

Run

Finally, the package is executed in an environment using the necessary execution commands. This can be seen as Continuous Deployment once the pipeline and all previous stages pass.

The typical Development Pipeline is depicted below:

(Figure: typical CI/CD development pipeline)

Every deployment, whether Dev, QA, Stage, or Prod, needs to follow all of the stages mentioned earlier. Software moving through this lifecycle is deployed quickly, continuously, and without any manual intervention. Tools that help build a full CI/CD pipeline include Jenkins, GoCD (from ThoughtWorks), and Codeship, to name a few.

Processes

This factor is focused on executing the app as one or more stateless processes. A process is an application instance running on a server, and an application can be deployed as multiple instances/processes depending on network traffic. Generally, a load balancer routes traffic to an app instance, which enables quick request handling.

We can't guarantee that consecutive requests from the same client will land on the same instance, so we should not rely on data from previously processed requests. Think about this in terms of user session information.

With multi-node deployments in the Cloud, where application scaling can be automated, never rely on data written to the file system or held in memory, because you can't be sure it will be available or accessible to all nodes. A failure could wipe that data out or make it inaccessible.

The file system can be used briefly as scratch space within a single request, but to keep processes stateless:

· Use a database to store state if needed for subsequent requests

· Avoid using sticky-session, and instead use scalable cache stores such as Memcached or Redis for storing session information

· Package assets in executables (e.g. by using webjars at build time)
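With Spring Boot and Spring Session, for example, moving session state out of the process and into Redis is largely a configuration exercise. The sketch below follows Spring Boot's property names; the host value is a placeholder:

```yaml
# application.yml: store HTTP session state in Redis instead of in-process memory
spring:
  session:
    store-type: redis
  redis:
    host: ${REDIS_HOST}
    port: 6379
```

Any instance behind the load balancer can then serve any request, because no instance owns the session data.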

Port Binding

Unlike web apps that execute inside a web server container, a Twelve-Factor app acts as a standalone, self-contained service: it doesn't rely on an existing/running application server to execute. This implies that the port the application binds to is also stored in Config (discussed earlier).

Spring Boot really helps here: we can use embedded Tomcat, Jetty, or Undertow simply by declaring them in the dependency manifest (pom.xml). Similarly, the Play Framework ships with Akka HTTP and Netty servers.
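For example, with Spring Boot's embedded server the bound port comes straight from configuration, so each environment can supply its own:

```yaml
# application.yml: bind to the port exported by the environment, default 8080
server:
  port: ${PORT:8080}
```

The `${PORT:8080}` placeholder reads the PORT environment variable and falls back to 8080 when it isn't set.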

This approach also means one app can become a backing service for another simply by exposing its URL.

Concurrency

As mentioned earlier, processes are first-class citizens in a twelve-factor app. Concurrency overlaps somewhat with the Processes factor, but the key point is that because of processes, concurrency is simple and reliable. Here are some guidelines:

· Don't rely too heavily on threads in an application, as vertical scaling of a single process on a server is limited

· Adhere to the Processes guidelines to achieve horizontal scaling

· Don't daemonize the app or write PID files; the platform's process manager handles processes better

Disposability

Processes in twelve-factor apps should start and stop in minimal time. As we discussed under Build above, builds should be fast; similarly, startup and shutdown should be quick to avoid cascading failures between applications that serve as each other's backing services.

Consider a retail website composed of services such as web-app, auth-service, catalogue-service, order-service, etc. If an order-service process fails, the user should still be able to access the website, and should be able to place an order again as soon as order-service comes back up.

Disposability is also about keeping HTTP requests short, which is sometimes not possible. With WebSocket connections, where data is streamed continuously, a dropped connection puts the responsibility on the client to reconnect seamlessly.

The implicit message in this factor is to provide resiliency and automated scaling of application processes, which is generally easier to achieve with Cloud and Containerized deployments.
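As one small sketch of quick, clean shutdown in plain Java (the resource handling here is illustrative): a JVM shutdown hook lets the process release resources when the platform stops it, for example via SIGTERM from a container runtime.

```java
// Minimal disposability sketch: register cleanup to run on JVM shutdown.
class GracefulShutdown {
    static volatile boolean resourcesReleased = false;

    static void releaseResources() {
        // Close connection pools, flush buffers, finish in-flight work here.
        resourcesReleased = true;
        System.out.println("shutting down cleanly");
    }

    public static void main(String[] args) {
        // Runs when the JVM receives a normal termination signal.
        Runtime.getRuntime().addShutdownHook(new Thread(GracefulShutdown::releaseResources));
        System.out.println("service started");
        // ... serve requests until the platform asks the process to stop ...
    }
}
```

Frameworks add their own layer on top of this (Spring Boot, for instance, supports graceful shutdown of the embedded server), but the underlying mechanism is the same hook.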

Dev/Prod Parity

Your development environment should be as similar to your production environment as possible. Twelve-factor applications are designed for continuous deployment by keeping the gap between production and development small. This avoids unforeseen issues after an application goes live that never appeared in development; it doesn't necessarily mean running the same OS in both environments.

This also implicitly encourages a DevOps culture, where software development and operations are unified. Containerization is a huge help here, enabling developers to simulate the production environment and bridge any gaps.

Logs

Twelve-factor apps should not be concerned with routing or storing their output stream, or with writing and managing logfiles; the app simply writes its event stream to stdout.

There are plenty of logging frameworks for Java/Scala: SLF4J with Logback manages logs well, and aggregation frameworks such as Graylog or Logstash can gather logs and provide analytics plus alerting capabilities. You can also analyze logs using tools like the ELK stack.

Logging can be customized per environment through configuration; for example, in development a developer might want different log levels and to see logs on stdout.
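With Logback, for example, sending the event stream to stdout is a one-appender configuration (the pattern and variable name below are illustrative; Logback's `${VAR:-default}` syntax supplies a fallback):

```xml
<!-- logback.xml: write all log events to stdout; the platform handles routing -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="${LOG_LEVEL:-INFO}">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```

The log level then comes from the environment, so dev can run at DEBUG while prod stays at INFO without touching the codebase.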

Admin Processes

Twelve-factor apps aim to run admin/management tasks as one-off processes — tasks like database migration or executing one-off scripts in the environment.

That seems fine for a genuine one-off task, but what if your database migration is periodic? Then you need to handle it with a scheduler and perform it automatically.

One-off tasks should ideally run before the application process starts up. Make them part of an automated process so you don't need to execute scripts manually on every server and the result is identical across the board.
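Database migrations are a good candidate for that automation. With Spring Boot, for example, adding Flyway to the classpath runs pending versioned migrations at startup, before the app serves traffic; the sketch below uses Spring Boot 2's property names:

```yaml
# application.yml: run versioned SQL migrations automatically on startup
spring:
  flyway:
    enabled: true
    locations: classpath:db/migration
```

Every instance then checks the schema version against the same migration scripts shipped in the build, so no server drifts out of sync.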

This factor also favors languages that provide a Read-Eval-Print Loop (REPL) for an application; take into consideration how many times these one-off commands will be executed.

Wrapping Up

We have seen how the twelve factors can make the development-to-release process hassle free and predictable, with higher scalability, and we experience this daily in our ongoing development of the Tempus IIoT Cloud-Edge-ML Framework.

There are times when it makes sense to deviate from a few of the factors, such as Backing Services and Logs, but it's best to adhere to all twelve as much as possible.

In future posts, I’ll do my best to combine the Twelve-Factor methodology and concepts with Microservices architectures, Cloud Computing, and Containerization — stay tuned!

If you'd like to share your thoughts on Twelve-Factor apps and learn more about what I'm working on daily with Tempus, reach out to me at jay.kapadnis@hashmapinc.com.

You can also give Tempus a test drive now or schedule a personal demonstration.

Feel free to share across other channels, and be sure to keep up with all new content from Hashmap at https://medium.com/hashmapinc.

Jay Kapadnis is a Tempus Architect at Hashmap working on the engineering team across industries with a group of innovative technologists and domain experts accelerating high value business outcomes for our customers.
