Modernising Our Channel Services Layer — Our Journey

Ryan Krull
Standard Bank Engineering
Nov 22, 2017

A few years ago, Standard Bank started embracing modern Agile engineering practices. Empowered, self-organizing teams have the highest chance of success, but a team can only be empowered and self-organizing if its environment allows it. So just as we have been changing our team structures and adopting new Agile processes, we have been continuously reworking our operational and infrastructural components to create an environment conducive to building and supporting empowered, self-organizing Agile teams. One of the major initiatives in aid of this was our transition away from a monolithic application development and deployment model.

Where We Started

4 years ago, Standard Bank was just starting its digital and Agile transformation journey. The dream was to build a multi-channel services layer, but we had to start somewhere. The focus at the time was simply to build the services needed to support the new mobile channel. The result was a complex, monolithic, Spring-based Java application running in WebSphere Application Server (WAS).

A separate team was responsible for operating the WAS servers and handled ALL application deployments across all environments. These deployments were done manually, triggered by a request our developers logged in a task management system and followed up by email. The WAS support team was also responsible for first-level support of these application servers because our developers were not allowed direct access. So when our application misbehaved, we had to log a request asking the WAS support team to send us a copy of the server logs from the WAS boxes.

In production, we had a traditional PROD (HOT) + DR (COLD) setup. Code deployments required downtime and had to be done at night, usually at 22:00. These deployments affected customers, since in-flight traffic was dropped when the application servers were brought down to apply the change. In a DR situation, we had to ask yet another team to update the load balancer to point to the DR machines instead of the PROD machines.

This model presented so many challenges that it’s hard to believe we actually worked like this and were able to deliver anything at all.

Where We Are Now

Standard Bank then decided to embark on a bit of an engineering renaissance program. We knew that we had to:

  • build smaller, domain focused feature teams
  • reduce coupling between these teams to allow them to move at their own pace
  • give these teams the space to take ownership of, and be accountable for, the development, deployment, and operation of their features.

A — Passive Test App

A customized build of the current version of the mobile application was created to let us test our production deployments in the passive zone before they are made available to all customers. This “Passive Zone” testing app has two modes of operation:

HA Proxy Test Mode

This mode allows us to test changes and upgrades to our HA Proxies while they are in a passive state or zone. In this mode the app is configured to make calls to a special-purpose port that routes only to the HA Proxy instances reporting a passive state.

Domain Service Testing Mode

This mode allows us to test changes to our Domain Service instances while they are in a passive state or zone. In this mode the app is configured to add an extra request header that tells our HA Proxies to route the traffic to the Domain Services in the passive zone.

These two modes of operation allow us to test HA Proxy changes independently from Domain Service changes.
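
To make the second mode concrete, here is a minimal HAProxy configuration sketch of what header-based routing to a passive zone can look like. The header name, backend names, ports, and addresses are illustrative assumptions rather than our actual configuration (the special-purpose port used by the first mode lives upstream on the load balancer, so it is not shown here).

```
# Hypothetical HAProxy snippet illustrating Domain Service Testing Mode.
frontend channel_services
    bind *:8080
    mode http
    # The passive test app adds this header; normal customer traffic does not.
    acl wants_passive_zone hdr(X-Test-Passive-Zone) -i true
    use_backend domain_services_passive if wants_passive_zone
    default_backend domain_services_active

backend domain_services_active
    mode http
    balance roundrobin
    server ds1 10.0.1.11:8443 check
    server ds2 10.0.1.12:8443 check

backend domain_services_passive
    mode http
    balance roundrobin
    server ds3 10.0.2.11:8443 check
```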

B — Application Firewall + Load Balancer (IBM DataPower)

The load balancer groups on the IBM DataPower appliance were changed from a static configuration based on a fixed list of target servers to a dynamic model based on the Active/Passive “status” reported by each HA Proxy server. If the status reported by an HA Proxy is “1”, that HA Proxy is included in the Active load balancer group; otherwise it is included in the Passive load balancer group. The IBM DataPower appliance polls this status service and automatically reconfigures the load balancer groups when the status changes. This reconfiguration is done in a way that has absolutely no impact on in-flight traffic.

This allows us to take a single HA Proxy server out of the live traffic group, make all our changes, test them via the dedicated testing app, and then reintroduce the HA Proxy into the live traffic group, all with zero impact to the customer.
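
To sketch the HA Proxy side of this contract, a dedicated status frontend can answer the appliance’s polls. Everything below is illustrative: our real setup reports “1” in the response body, whereas this sketch approximates the Active/Passive signal with HTTP status codes and ties “going passive” to one hypothetical trigger condition.

```
# Hypothetical status frontend on each HA Proxy host; the load balancer
# polls /status and regroups the node when the answer changes.
frontend lb_status
    bind *:8404
    mode http
    monitor-uri /status   # 200 response => keep this node in the Active group
    # Report failure (non-200 => Passive group) when, for example, no servers
    # remain in the active backend from the previous sketch.
    monitor fail if { nbsrv(domain_services_active) lt 1 }
```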

C — HA Proxy (Routing + Load Balancing)

Previously, all load balancing and PROD/DR switching was handled by an IBM DataPower appliance operated by a separate team. We wanted our feature teams to take ownership of these responsibilities, and introducing a set of HA Proxies between the IBM DataPower appliance and our Domain Services allowed us to do this. It added some complexity that could be perceived as unnecessary (load balancing to load balancers?) but has brought significant improvements in operational and delivery efficiency.

Some of the highlights in this regard are:

  • ALL production deployments can be done during the day with no impact to the customer (no downtime required). This also applies when we make changes to the HA Proxies themselves.
  • Feature team specific changes can be deployed independently from one another.
  • We have now moved from a PROD/DR model to an Active/Active model, meaning that we can make all of our servers in both data centers “LIVE”. We now get more value out of our DR infrastructure.
  • We are no longer dependent on another team to do DR switches.

D — Monolith to Domain Services

Separate domain-focused feature teams were set up and given the mandate to carve their domain-specific functionality out of the monolith. Each team created its own separate projects containing its domain-specific services. Each project/application also ran in its own virtual machine. We called this layer our “Domain Services” layer.

Part of the evolution to this Domain Services model involved extensive automation of infrastructure setup and application deployments. Chef was introduced to automate infrastructure provisioning, and we used Bamboo together with Chef to automate application deployment.
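
For a flavour of what this automation looks like, here is a stripped-down, hypothetical Chef recipe for rolling out a Domain Service artifact published by a Bamboo build. All names, paths, and URLs are illustrative; our real cookbooks are considerably more involved.

```ruby
# Hypothetical deployment recipe; every identifier here is illustrative.
artifact_version = node['domain_service']['version']

# Fetch the jar that the Bamboo build published to the artifact repository.
remote_file '/opt/domain-service/domain-service.jar' do
  source "https://artifacts.example.com/domain-service/#{artifact_version}.jar"
  owner  'appuser'
  mode   '0644'
  notifies :restart, 'service[domain-service]', :delayed
end

# Keep the service enabled and running; it restarts only when the jar changes.
service 'domain-service' do
  action [:enable, :start]
end
```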

Our teams now had full control of their application deployments and operational environments.

We also adopted a Blue/Green deployment approach, which allows us to deploy changes into a “passive” environment in production and do some pre-verification before the changes are made available to all customers. This operational model has been enabled to a large extent by the set of HA Proxy servers in front of our Domain Services, which our teams control.
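
As one way such a switch can be scripted, here is a sketch using HAProxy’s runtime API (commands sent to the admin stats socket, for example with socat). The backend and server names are illustrative, and this is not necessarily how our own switch-over is implemented.

```
# 1. Take ds1 out of live rotation (it stops receiving new traffic):
set server domain_services_active/ds1 state maint

# 2. Deploy the new version to ds1 and verify it via the passive test app.

# 3. Put ds1 back into live rotation:
set server domain_services_active/ds1 state ready
```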

Our transition to this architectural state and deployment model has yielded a number of benefits but we’d be the first to admit that it is far from perfect.

Where to Next

Standard Bank is now well into the third phase of the journey to further modernize our channel services layer. The primary themes driving this phase of the evolution are that our services must be:

  • Multi-Territory aware
  • Multi-Segment
  • Omni-Channel
  • Cloud Native
  • Microservices-based

Underpinned by the following secondary themes:

  • Resilience
  • Simplicity
  • Convergence

A — Multi-Channel, Multi-Segment

4 years ago, our team’s mandate was to build the new mobile retail banking application for Standard Bank South Africa. Today our mandate also includes building the new internet banking websites and the USSD channels across the Standard Bank group. The Standard Bank mobile banking application has already been engineered so that customers from 12 of the countries in which Standard Bank operates can do their banking through a single app published to the Google Play and Apple App stores. Soon our mandate may also include the ATM and Branch channels. This channel convergence strategy will help us realize significant productivity gains.

B — API Gateway

We’re moving to a model where we’ll be publishing our channel APIs on an API Gateway (IBM API Connect). This will allow internal and external customers to consume these APIs.

The API Gateway pattern not only gives us the benefit of maintaining control over the consumers of our APIs, but will also allow us to respond to regulatory requirements such as PSD2 (Payment Services Directive 2).

C — Microservices

We’re now in the process of splitting our domain services into finer-grained microservices fronted by client-specific gateways. The need for microservices has been driven primarily by the mandate to build services that can be used by multiple channels. In this environment, request volumes can be difficult to predict. A microservices architecture gives us a solution in that it allows us to scale horizontally with ease, especially when these services run in a Platform as a Service (PaaS) environment. We’re currently building these services in WildFly Swarm and Spring Boot and will soon be running them in a Kubernetes cluster. The move to the PaaS-based execution environment will significantly simplify our current infrastructure setup and application deployment processes.
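
To illustrate the horizontal-scaling point, here is a minimal, hypothetical Kubernetes Deployment for one of these services. The service name, image, and probe path are assumptions made for the sketch, not one of our real manifests.

```yaml
# Hypothetical Deployment for a Spring Boot microservice; all names illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-service
spec:
  replicas: 3                 # scaling out is a one-line (or one-command) change
  selector:
    matchLabels:
      app: payments-service
  template:
    metadata:
      labels:
        app: payments-service
    spec:
      containers:
        - name: payments-service
          image: registry.example.com/payments-service:1.0.0
          ports:
            - containerPort: 8080
          readinessProbe:     # receive traffic only once the service is ready
            httpGet:
              path: /health
              port: 8080
```

Scaling out then becomes a single command (kubectl scale deployment payments-service --replicas=6), or can be automated with a HorizontalPodAutoscaler.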
