A couple of weeks ago I got a frantic email from one of our marketing teams: they were sending out a campaign, and the functionality they needed wasn’t quite working. I needed to fix it quickly!
In days of yore (pre-2014) SpareBank 1, the customer I am currently working with, ran its entire net banking application on a monolithic WebLogic portal. Every single developer worked on the same codebase. Releases were arduous processes. Developers checked their code into the monolithic repository. Checked-in code had to be deployed to various environments for integration and acceptance testing. Delivery approvals were needed. Documentation had to be updated. In many cases, the approval cycle took longer than the development and testing phases. This meant that releases were limited to 10 a year.
In 2015, the bank decided to improve delivery times and efficiency. A goal of 24/30 was planned, which meant:
- Conception to production could be performed in 24 hours
- Code could be deployed in 30 minutes
To achieve this, the bank made the following changes, which we adhere to even today:
1. The codebase was split into manageable chunks
The code was broken up into a microservices architecture, porting functionality from the WebLogic monolith to smaller applications that communicate with each other via a Single Sign-On token. For more details on this, please watch Stian and Vidar’s beautiful JavaZone presentation.
2. Developers are split up into autonomous teams, with full liberty when it comes to releases, innovation, and coding.
Using an inverse version of Conway’s Law, teams are split up by net banking functionality, to reflect the new architecture. Each team has its own code repositories and is responsible for its own releases.
3. Every developer uses the same environment.
Every single developer uses the exact same setup, on the same OS, and codes in the same IDE.
Branch creation, Pull Requests, and merging are kept consistent by an internal builder script that automates tasks such as creating branches and Jenkins jobs.
Running the script with a “begin” option, for example, creates a feature branch in Git and a corresponding Jenkins job for building and unit testing.
Git hooks ensure that branches cannot be merged unless they go through a Pull Request, with the corresponding approvals from fellow developers.
Once the Pull Request is approved, the changes can be merged, and the builder script run with a “complete” option to delete the feature branch and the Jenkins job.
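The builder script itself is internal, so what follows is only a minimal Python sketch of the “begin” and “complete” workflow described above. The `plan` function, the `feature/` branch prefix, and the job naming scheme are assumptions made for illustration, and the Jenkins steps are shown as plain descriptions rather than real Jenkins calls.

```python
from typing import List


def plan(option: str, feature: str) -> List[str]:
    """Return the steps a builder-style script might perform (hypothetical)."""
    branch = f"feature/{feature}"  # assumed branch naming convention
    job = f"build-{feature}"       # assumed Jenkins job naming convention

    if option == "begin":
        return [
            f"git checkout -b {branch}",
            f"git push --set-upstream origin {branch}",
            f"create Jenkins job '{job}' to build and unit test {branch}",
        ]
    if option == "complete":
        # Intended to run only after the Pull Request is approved and merged.
        return [
            f"git push origin --delete {branch}",
            f"git branch -d {branch}",
            f"delete Jenkins job '{job}'",
        ]
    raise ValueError(f"unknown option: {option!r}")


if __name__ == "__main__":
    for step in plan("begin", "campaign-fix"):
        print(step)
```

Returning the steps as a list, instead of executing them, keeps the sketch safe to run anywhere and makes the two options easy to compare side by side.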
Having a consistent development environment ensures that developers can easily move between teams if they so wish.
4. Every application shares the same architecture.
Of the more than 20 applications that were splintered from the monolith, every single one has the same architecture and file structure. Front end and back end directory and module structures are consistent across applications.
The nuts and bolts of the framework running the applications are separated from domain code. So, if a developer from another team were to come in, the framework code would already be familiar.
5. There is a common build process.
Every application contains a build script. Since file structures and modules are consistent across applications, this build script is mostly the same across applications. Remember the Jenkins job that was created by the builder script when developers started a feature? All that job needs to do is run this build script.
6. Dependencies on common libraries are updated frequently, and automatically.
Dependencies on common code libraries are updated weekly. Application developers must still approve the Pull Requests and merge the changes manually, to guard against unforeseen changes, but the bulk of the dependency updating is automatic.
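To make the weekly update a little more concrete, here is a rough Python sketch of the comparison step: given the versions an application currently pins and the latest published versions, it works out which bumps belong in the Pull Request. The library names, the plain dotted version format, and both function names are assumptions for illustration, not the bank's actual tooling.

```python
from typing import Dict, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def propose_updates(
    pinned: Dict[str, str], latest: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {library: (old, new)} for every library with a newer release."""
    updates = {}
    for name, old in pinned.items():
        new = latest.get(name)
        if new is not None and parse(new) > parse(old):
            updates[name] = (old, new)
    return updates


def pull_request_body(updates: Dict[str, Tuple[str, str]]) -> str:
    """Render the proposed bumps as text for a developer to review."""
    lines = [f"- {name}: {old} -> {new}" for name, (old, new) in sorted(updates.items())]
    return "Weekly dependency update\n" + "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical library names and versions.
    pinned = {"common-auth": "1.9.0", "common-ui": "2.3.1"}
    latest = {"common-auth": "1.10.0", "common-ui": "2.3.1"}
    print(pull_request_body(propose_updates(pinned, latest)))
```

The sketch stops at producing the Pull Request text, which mirrors the split described above: the update is prepared automatically, while approval and merging stay with the developers.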
7. Everything is automated!
Well, almost everything. Every commit to Git triggers a Jenkins build. Builds on feature branches run the full suite of front-end tests with npm and back-end tests with JUnit. Dev and release branches additionally run integration tests with Finesse.
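The branch rules above can be sketched as a small Python function. The branch names (`dev`, and anything starting with `release`) and the function name are assumptions for illustration; the real Jenkins jobs may well decide this differently.

```python
from typing import List


def suites_for(branch: str) -> List[str]:
    """Pick the test suites to run for a branch (hypothetical naming rules)."""
    # Every build runs the front-end and back-end test suites.
    suites = ["front-end tests (npm)", "back-end tests (JUnit)"]
    # Dev and release branches additionally run the integration tests.
    if branch == "dev" or branch.startswith("release"):
        suites.append("integration tests")
    return suites


if __name__ == "__main__":
    for name in ("feature/campaign-fix", "dev", "release/1.4.0"):
        print(name, "->", suites_for(name))
```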
Documentation is now part of the source code and is built and packaged when a release branch is built. The only thing a developer who creates a release must do is list the IDs of the Jira issues (Jira is a task tracking tool) that went into the release.
When the release is delivered, a script uploads this list to Confluence, so a release manager can see which Jira IDs were involved in a release simply by looking at a Confluence page.
As you can see, the entire release approval process was eliminated. Well, almost. The infrastructure group still needs to be kept informed about the dates of planned releases, the content of these releases (as Jira IDs) and functionality that could be affected. This prepares them for possible incidents that may occur post-release.
According to Puppet’s 2018 State of DevOps Report, this is acceptable:
Having an external change approval board had a negligible impact on stability, but a detrimental effect on agility. Despite this evidence, we see all too often that the authority to make decisions is removed from the people who have the relevant information and are doing the actual work.
8. Everything is logged!
Calls to back end services are logged as JSON, which makes it easy for log analysis tools to index events.
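As a minimal sketch of what JSON logging can look like, the Python example below uses the standard `logging` module with a custom formatter that writes one JSON object per line. The field names (`service`, `status`, `duration_ms`) and the logger name are assumptions for illustration, not the fields the bank's applications actually log.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in ("service", "status", "duration_ms"):
            if hasattr(record, key):
                event[key] = getattr(record, key)
        return json.dumps(event)


def build_logger(name: str = "backend-calls") -> logging.Logger:
    """Create a logger that writes JSON lines to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info(
        "call finished",
        extra={"service": "accounts", "status": 200, "duration_ms": 42},
    )
```

Because every event is a self-describing object, a log analysis tool can index fields such as `status` directly instead of parsing free-form text.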
9. Every environment is provisioned
Servers are now virtualized, which means they can be instantiated from scripts. This ensures that every environment (test, QA, pilot, and production) has the same setup.
10. Releases are performed by the development teams themselves
When the code has passed unit, integration and system tests, the developers themselves can choose to release it. The process itself is extremely simple because we use a web-based deployment platform. All it boils down to is choosing the application by name, the correct version number and clicking a button.
This process didn’t happen overnight, or even over the course of a few weeks. We actually went through several iterations over several years before we realized what worked and what didn’t.
When I released the fix for the marketing team two weeks ago, the entire process from frantic email to deployment in production took 1 hour and 37 minutes. This included going through every test phase, updating documentation, deploying and testing on QA and pilot environments, and finally what ended up taking longest of all: waiting 30 minutes for logged-in sessions to drain before deploying the release to production.
As we move towards API and Open Banking concepts, our DevOps strategy will evolve further. An inevitable move to the Cloud will also bring its own challenges and rewards. But the changes made over the last few years will allow us to be nimble enough to adapt to these challenges.
If you have any comments or suggestions, please feel free to note them below.