Migrating tens of thousands of charging stations to a new microservice

This April the technology unit of Virta achieved a major milestone: for the first time, 100% of all charging stations connected to our platform were talking to our new Chargers Hub service. This is the story of how we rebuilt a central piece of the Virta platform on top of a microservice architecture and how it was gradually rolled out to become the contact point for many tens of thousands of charging stations. Today the Chargers Hub handles hundreds of thousands of messages per minute while talking to over 150 different charging station models over a variety of protocols.

Chargers Hub is a testament to the technical excellence Virta delivers to bring industry-leading scalability, flexibility and reliability to EV charging.

The graph shows all charging stations connected to the Virta platform over a period of one year. The striped areas are stations connecting to the old system, while the solid colors are stations connected to the new Chargers Hub.

The problem

Around 2019 our senior developer Mostafa Aghajani had a project to estimate the energy delivered during an ongoing charge. Some stations don’t provide that information until a charging session has ended, but he figured out that it was possible to estimate the delivered energy from other data points. At the heart of the Virta platform is a system called virta-core. Diving into the charger communication code inside virta-core made it evident that implementing the energy delivery estimation for all the different charging station models would not be easy. The system, which originally supported a single proprietary charging station communication protocol in 2014, had grown to support multiple protocols, different versions of those protocols and the quirks of specific charging station models.

At the same time, the Virta platform was experiencing high growth in the number of stations connected to the system. The rapidly increasing station count also exposed a scaling issue: at minimum, each station sends a heartbeat message at short intervals, and on top of that a large number of other messages flow back and forth between the Virta system and the charger. The part of the system that communicates with the stations would be the first to hit its limits and require scaling of its infrastructure. It seemed fruitful to spin the functionality of communicating with the physical charging stations out of virta-core into a new service: the Chargers Hub.
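
The original estimation approach isn’t described in detail, but the basic idea of deriving energy from other data points can be sketched as integrating periodic power readings over time. This is a minimal illustration, not Virta’s actual implementation; all names and the sampling model are assumptions.

```typescript
// Hypothetical sketch: estimating delivered energy from periodic power
// readings when the charger does not report energy until the session ends.
// The types and sampling model are illustrative, not Virta's actual code.

interface PowerSample {
  timestampMs: number; // when the reading was taken
  powerKw: number;     // instantaneous charging power
}

// Trapezoidal integration of power over time yields an energy estimate in kWh.
function estimateDeliveredEnergyKwh(samples: PowerSample[]): number {
  let energyKwh = 0;
  for (let i = 1; i < samples.length; i++) {
    const hours = (samples[i].timestampMs - samples[i - 1].timestampMs) / 3_600_000;
    energyKwh += ((samples[i].powerKw + samples[i - 1].powerKw) / 2) * hours;
  }
  return energyKwh;
}

// One hour of charging at a steady 11 kW is roughly 11 kWh.
const samples: PowerSample[] = [
  { timestampMs: 0, powerKw: 11 },
  { timestampMs: 1_800_000, powerKw: 11 },
  { timestampMs: 3_600_000, powerKw: 11 },
];
console.log(estimateDeliveredEnergyKwh(samples)); // 11
```

The real difficulty, as the text notes, was not the arithmetic but implementing something like this consistently across all the different station models and protocol variants.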

The work begins

The newly formed device communication team, made up of Mostafa Aghajani and Hieu Cao, set to work on solving the issue. There was an initial proposal to develop a Virta-specific abstraction of a charger communication protocol. The idea was that the new protocol would be the only one implemented in virta-core for the purpose of talking to charging stations. Multiple independent middleware services would be built to convert between the abstract protocol and a specific protocol that physical charging stations support, like OCPP-J 1.5.

With the initial idea in hand, the high-level design work began. Consultations with the technical owners, senior developers and the infrastructure team uncovered a lot of lessons from previous work done on charger communications. It was agreed that a separate microservice implementing the middleware layer was the best option, and Node.js/TypeScript was selected as the technology stack. A new microservice would allow for separation of concerns and fast iteration speed. A clear integration point in the abstract Chargers Hub protocol between the microservice and virta-core would ensure the design was kept clean. After a lot of discussion and research, the initial architecture took shape: the request/response style API was scrapped in favour of an evented approach, and RabbitMQ was chosen as the transport between the new microservice and virta-core.

The Chargers Hub is a collection of independent microservices. Each supported protocol in the middleware layer has its own service that acts as the adapter between the device protocol and the Chargers Hub abstract protocol. There is a manager service that the middlewares connect to, which is the single integration point between Chargers Hub and the rest of the system. The manager later also gained a caching layer and publishes domain events to be consumed by other systems interested in station events.
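
The adapter idea above can be sketched in TypeScript. The interface and message shapes here are hypothetical illustrations of the pattern, not the actual CHUB protocol types.

```typescript
// Hedged sketch of the middleware layer: each protocol service adapts between
// a device protocol and an abstract message format. All names are assumptions.

interface ChubMessage {
  type: string;                     // e.g. "StatusUpdate"
  stationId: string;
  payload: Record<string, unknown>;
}

interface ProtocolMiddleware {
  // Translate a raw device frame into the abstract protocol, and back.
  toChub(stationId: string, rawFrame: string): ChubMessage;
  fromChub(message: ChubMessage): string;
}

// A toy adapter for a JSON-based device protocol.
class JsonProtocolMiddleware implements ProtocolMiddleware {
  toChub(stationId: string, rawFrame: string): ChubMessage {
    const frame = JSON.parse(rawFrame);
    return { type: frame.action, stationId, payload: frame.data ?? {} };
  }

  fromChub(message: ChubMessage): string {
    return JSON.stringify({ action: message.type, data: message.payload });
  }
}

const mw = new JsonProtocolMiddleware();
const msg = mw.toChub(
  "station-42",
  '{"action":"StatusUpdate","data":{"status":"Available"}}'
);
console.log(msg.type); // StatusUpdate
```

Because the manager only ever sees `ChubMessage`, each new device protocol only requires a new adapter service, not changes to the rest of the system.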

Early stage architecture diagram of the Chargers Hub

When the plan had been iterated over and was starting to look good, the actual implementation work started in the latter half of 2019. The device communications team worked on the project with Mostafa taking the lead in implementation and Hieu assisting. Much-needed focus was made possible by a division of labour: Mostafa worked mainly on the new service, while Hieu provided support and implemented the development requests coming in as an endless stream from customers and other stakeholders in the company.

The Chargers Hub Protocol and Protocol Builder

A really important part of the project was the definition of the Chargers Hub Protocol (CHUB Protocol). Since the microservice, to be implemented in Node.js/TypeScript, and virta-core, written in PHP 7, would be talking to each other a lot, it was important to create a common, type-safe way to do so. A shared source of truth for all message types and their data formats was built to implement the protocol. This master protocol is then automatically translated into strongly typed data transfer objects in each language. Since the protocol is defined programmatically using code, it is also strictly versioned, avoiding issues on deployments with mismatching code. There was already a similar need in the Virta mobile apps, which share implementation between the Android and iOS apps. The initial version used for the mobile apps was further developed for the purpose of defining the unified protocol. This project has been released as open source: rhythmicode/protocol-builder. The Protocol Builder is used to define the actual data models, and the language-specific implementations are then generated into their own git repositories. These git repositories are imported as git submodules in the repository of each service that uses the protocol. This allows fast iteration and ensures that there are no version or type errors when communicating between the services. The use of the Protocol Builder has later been expanded to cover other inter-service communications in the Virta platform, supporting additional output languages like Kotlin.

Example definition from the protocol-builder master file and the generated source code for TypeScript and PHP:

Example definition of Protocol Builder and the generated DTO files
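
To give a rough idea of the shape of such generated code, here is a guess at what a generated TypeScript DTO might look like. The class name, fields and version constant are invented for illustration; the real generated output is what the screenshot above shows.

```typescript
// Illustrative, hypothetical example of a generated data transfer object.
// A versioned envelope keeps deployments with mismatching protocol versions
// from silently misreading each other's data.

const PROTOCOL_VERSION = "1.4.0"; // assumed versioning scheme

class StationStatusUpdatedDto {
  constructor(
    public readonly stationId: string,
    public readonly status: "Available" | "Charging" | "Faulted",
    public readonly updatedAt: string // ISO 8601 timestamp
  ) {}

  static fromJson(json: string): StationStatusUpdatedDto {
    const data = JSON.parse(json);
    return new StationStatusUpdatedDto(data.stationId, data.status, data.updatedAt);
  }

  toJson(): string {
    return JSON.stringify({
      version: PROTOCOL_VERSION,
      stationId: this.stationId,
      status: this.status,
      updatedAt: this.updatedAt,
    });
  }
}

const dto = new StationStatusUpdatedDto("st-1", "Charging", "2021-04-22T10:00:00Z");
console.log(StationStatusUpdatedDto.fromJson(dto.toJson()).status); // Charging
```

The point of generating one such class per language from a single master definition is that a field added or renamed in the protocol shows up as a compile-time error on both sides, rather than a runtime surprise.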

Simulator

Fast iteration also requires that you can constantly test and verify your implementation. When the main purpose of the system is to communicate with third-party hardware devices, fast iteration becomes difficult quite quickly without a simulator. A simple charging station simulator was developed to allow a fast turnaround time in development. The simulator allows for automated or interactive workflows. It has later been expanded with more functionality and has also been released as open source: virta-ltd/charge-device-simulator.

Screenshot of the charging station simulator in interactive mode

Starting small and expanding little by little

The task of migrating tens of thousands of charging stations to a new service is quite gargantuan. One needs to start small. The first step was to implement a small subset of the OCPP-J protocol using the simulator, and later to connect to a real test charging station. The bare minimum to get a charger to communicate back and forth was the first goal: boot notification (the message the charger sends when it comes online), heartbeat (a periodic message that confirms the connection between the charger and the backend is working) and status update (the message the charger sends when it changes state).
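
For context on what this minimal subset looks like on the wire: OCPP-J frames are plain JSON arrays, `[2, uniqueId, action, payload]` for a CALL from the charger and `[3, uniqueId, payload]` for the backend’s CALLRESULT. The helper functions below are our own sketch; the frame layout follows the OCPP 1.6J specification.

```typescript
// Sketch of the OCPP-J wire format for the bootstrap messages.
// Helper names are ours; the array layout comes from the OCPP 1.6J spec.

type OcppCall = [2, string, string, Record<string, unknown>];
type OcppCallResult = [3, string, Record<string, unknown>];

function call(uniqueId: string, action: string, payload: Record<string, unknown>): string {
  const frame: OcppCall = [2, uniqueId, action, payload];
  return JSON.stringify(frame);
}

function callResult(uniqueId: string, payload: Record<string, unknown>): string {
  const frame: OcppCallResult = [3, uniqueId, payload];
  return JSON.stringify(frame);
}

// A charger announces itself when it comes online...
const boot = call("1001", "BootNotification", {
  chargePointVendor: "ExampleVendor",
  chargePointModel: "ExampleModel",
});

// ...and the backend accepts it and sets the heartbeat interval (seconds).
const bootAck = callResult("1001", {
  status: "Accepted",
  currentTime: "2020-01-01T00:00:00Z",
  interval: 300,
});

console.log(boot);
console.log(bootAck);
```

Heartbeat is the same pattern with an empty payload, which is what makes it such a natural first message to implement and, later, to optimize.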

Test chargers at the Virta Headquarters

The first victory was connecting a real charging station at the office that was able to boot up and update its status! With this first version you couldn’t actually deliver energy, as the relevant messages to start a charge were not yet implemented. It did however prove that the chosen architecture and system could be made to work. After the initial success, the next step was not implementing the remaining messages of the protocol. Instead, the team focused on building really good monitoring and logging infrastructure. This was deemed necessary as it would allow visibility at large scale into things like how many chargers are connected and what their status is, and also provide debugging capabilities at a very detailed level. The debug tools would allow inspecting individual messages from a charging station and what the system’s response was.

This screenshot shows a small sample of the performance metrics available for Chargers Hub

When all of these systems were built, it became evident that running the whole microservice setup on a developer laptop was too resource intensive and slow. A development server was set up which ran parts of the system and could be connected to from the local environment.

Once the new development environment and the monitoring capabilities were up and running, it was time for a full protocol implementation of OCPP-J 1.6. In addition to supporting the protocol, the system would need to match the features and capabilities of the old platform for a seamless switchover. At this point it was clear that with the new architecture there were quick and easy optimizations that could be done. The optimizations could lessen the load that virta-core had to endure and create a better user experience in the form of faster charging station reaction times. A caching layer was added to the otherwise stateless microservice, so that certain messages from the charging station did not require a round trip to virta-core.

Before going live with the first stations in production, careful load testing was performed. A large number of simulated chargers was created in a test environment to model the load that real chargers would bring in the production environment. With the load testing it was possible to demonstrate the performance gains achieved with the caching and the more efficient architecture.
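
The load-testing setup itself isn’t detailed in the text, but its essence can be sketched: many simulated chargers funnel heartbeat traffic through the same handler so throughput can be measured. The in-process fan-out below stands in for real simulator instances over WebSocket connections; all names are assumptions.

```typescript
// Hedged sketch of a load-test fan-out: N simulated chargers each push M
// heartbeat frames through a handler and the delivered count is returned.
// A real test would use network connections; this is an in-process stand-in.

type Handler = (stationId: string, message: string) => void;

function runSimulatedFleet(
  stations: number,
  messagesPerStation: number,
  handle: Handler
): number {
  let delivered = 0;
  for (let s = 0; s < stations; s++) {
    const stationId = `sim-${s}`;
    for (let m = 0; m < messagesPerStation; m++) {
      // OCPP-J style CALL frame with an empty Heartbeat payload.
      handle(stationId, JSON.stringify([2, `${s}-${m}`, "Heartbeat", {}]));
      delivered++;
    }
  }
  return delivered;
}

let received = 0;
const total = runSimulatedFleet(1000, 10, () => {
  received++;
});
console.log(total, received); // 10000 10000
```

Comparing such a run against the handler with and without the caching layer is one way the performance gains mentioned above could be demonstrated.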

Slow and steady production rollout

The production rollout required infrastructure to deploy the new services to. Our infrastructure team set up a Kubernetes cluster to run the different parts of the services that make up the Chargers Hub. Once the CI/CD pipelines were configured and the deployments to production ran flawlessly, it was time to connect production charging stations.

From the start it was clear that it would be necessary to easily toggle which backend the charging stations connect to: the new Chargers Hub services or the old virta-core implementation. Issues were sure to crop up during the rollout, and minimizing their impact required being able to switch the new implementation on and off at a granular level. The charging stations in the Virta network are configured to connect to a group of specific hostnames depending on their protocol and a variety of other factors. DNS and the load balancer were used as the toggle points between the old and the new system. This allowed easy and granular control over which chargers connect to which backend at any time.

Production rollout started with a single device and quickly expanded to a handful of chargers near the Virta headquarters. There were issues of course, but quick feedback from the EV drivers inside the company using the chargers allowed ironing out the initial issues very quickly. Pretty soon the rollout was expanded to more stations and charger models. The task was integrating tens of thousands of physical chargers and ensuring they all work flawlessly in all the different scenarios. This boiled down to doing the following steps on repeat:

  1. Connect additional chargers to the new backend
  2. Monitor closely
  3. On any issues or anomalies, revert back to the old backend
  4. Use the simulator to replay the captured messages from chargers with issues and debug
  5. Deploy fix
  6. Start over

The Virta platform supports around 150 different models of charging stations with over 600 different configurations in total. Verifying that each one of these works without any issues is just not possible in a lab environment. Being able to quickly and easily toggle between the new and old implementations allowed testing and building support for the quirks of each model with a minimum of disruption. Migrating all the chargers and all the protocols took over a year. Lots of surprises were encountered along the way, like some devices being very picky about the whitespace in the XML-formatted messages they received.

The last chargers were successfully migrated on 22 April 2021.

What’s next

Being able to toggle between the old system and the new one has provided a fail-safe mechanism to avoid network-wide outages in the event something goes wrong. But keeping the old system around just for that purpose is not desirable. For failover purposes, an independent backup cluster will be set up before the old system is completely retired. Traffic can be redirected to the backup cluster in case of a disaster.

Another development is that the stateless middleware will be supplemented with stateful services holding the source-of-truth information on everything related to the charging stations’ current and historical status. This is accomplished in projects called Chargers Hub Sage, Historian and Charge Data Mart.

There are also developments regarding the Plug & Charge implementation and many more features adding capabilities to the charging networks powered by the Virta platform.
