Introducing Rafiki

Our new Interledger Connector implementation

A few months ago the Cape Town-based Coil team set out on a mission to re-think the architecture of an Interledger connector. We have since finished implementing this new design, and describe it below, along with some of the decisions we made and motivations we had along the way.

Rafiki

We could have just called our project Connector v2, but that would have suggested it was simply an update of the old one. We have re-used significant parts of the old code-base and, where possible, ported all of the existing tests, but the reality is that this is a whole new beast.

We also believe that for a while (maybe forever) both connector implementations will be used on the network, so rather than confuse things by suggesting our implementation is an “upgrade” of the other, we decided to give it its own name.

We settled on the name Rafiki for a few reasons. It appeals to our team’s African roots, being the Swahili word for friend (exactly the kind of connector you want to peer with). It is the name of the coolest character in The Lion King, voiced by South African legend John Kani. It is also the title of a Kenyan film, released to critical acclaim but banned in its country of origin for breaching archaic local laws. (So you can see why we like the name.)

Motivations

The motivations for this work come from observations and experiments, some recent, with the team in Cape Town, and some spanning the last few years, when I was part of the team that invented Interledger at Ripple.

1. Isolate the router

I have always had an issue with the format of the ILPv4 packet and how it reconciles with my mental model of the protocol layering in Interledger.

The debates over this are long and well-documented in the mailing list archives and GitHub issues, but in summary, I don’t like having the amount and expiry in the packet headers. To my mind this data is only relevant bilaterally, so while it’s useful to provide standards for encoding it (uint64 for amounts and a fixed timestamp format for the expiry), I wasn’t convinced it belonged in the packet headers.

The argument in favour of the amount and expiry being in the packet headers has always been that they are part of the Interledger layer because it is crucial for the ILP module to know the incoming amount, asset, scale and expiry in order to set the correct outgoing amount, asset, scale, and expiry.

It was only when Stefan Thomas described a potential optimised design for a connector a few months ago that it dawned on me why this had a bad smell.

Stefan’s idea was to replace the existing rates backend with static rate tables, normalise all incoming packets to a single currency, and then convert them again to the outgoing currency AFTER routing. Likewise, one could apply adjustments to the expiry on incoming and outgoing packets, outside of the routing module.
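To make Stefan’s idea concrete, here is a rough TypeScript sketch of normalising before routing and converting after it. Everything in it is hypothetical and not taken from Rafiki or ilp-connector: the rate table, the peer names `peerUS` and `peerEU`, the one-second expiry margin and the helper functions are all assumptions, and floating-point maths stands in for the integer or decimal arithmetic a real connector would need. The point is simply that only `normalise` and `convertOut` touch amounts, while `route` looks at nothing but the address.

```typescript
// Hypothetical static rate table: value of one whole unit of each asset in a
// single base currency (USD). The figures are illustrative, not real rates.
const RATES_TO_BASE: Record<string, number> = { USD: 1, EUR: 1.1, XRP: 0.3 };

interface Leg {
  assetCode: string;
  assetScale: number; // integer amounts are in units of 10^-assetScale
}

// Hypothetical per-peer asset details, looked up only after routing.
const PEER_LEGS: Record<string, Leg> = {
  peerUS: { assetCode: "USD", assetScale: 2 },
  peerEU: { assetCode: "EUR", assetScale: 2 },
};

const EXPIRY_MARGIN_MS = 1000; // assumed safety margin subtracted per hop

function rateFor(assetCode: string): number {
  const rate: number | undefined = RATES_TO_BASE[assetCode];
  if (rate === undefined) throw new Error(`no rate for ${assetCode}`);
  return rate;
}

// Before routing: turn an integer amount into base-currency units.
function normalise(amount: number, leg: Leg): number {
  return (amount / 10 ** leg.assetScale) * rateFor(leg.assetCode);
}

// After routing: convert base units into the next peer's asset, rounding
// down so the connector never offers more than it was given.
function convertOut(baseAmount: number, leg: Leg): number {
  return Math.floor((baseAmount / rateFor(leg.assetCode)) * 10 ** leg.assetScale);
}

// The router itself looks only at the address (a toy rule for illustration).
function route(destination: string): string {
  return destination.startsWith("g.eu.") ? "peerEU" : "peerUS";
}

interface Packet {
  destination: string;
  amount: number;
  expiresAt: Date;
}

function forward(packet: Packet, incoming: Leg): { nextHop: string; packet: Packet } {
  const base = normalise(packet.amount, incoming); // no rates backend call
  const nextHop = route(packet.destination); // routing sees the address only
  const outgoing = PEER_LEGS[nextHop];
  if (!outgoing) throw new Error(`no asset details for ${nextHop}`);
  return {
    nextHop,
    packet: {
      destination: packet.destination,
      amount: convertOut(base, outgoing),
      // Expiry is adjusted outside the router too.
      expiresAt: new Date(packet.expiresAt.getTime() - EXPIRY_MARGIN_MS),
    },
  };
}
```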

As I understand it, his motivation was to remove the expensive calls to a rates backend during each routing operation so that routing could be very fast. But for me, this was a eureka moment because it helped me identify where I thought the layering had leaked.

In connector implementations today the ILP module operates on all of the packet headers, so they are all required. But by applying Stefan’s design, the dependency on this data is removed and the ILP module is back to simply looking at the address and routing the packet.

The ILP module is just a routing module.

The caveat is that this data IS needed on the other side of the router to decide how much to offer the next peer. So, while this new design makes a lot of sense, it also makes me realise that the most efficient way to pass the amount and expiry through the ILP module is in the packet headers.

As we talked this design over in Cape Town, Matt pointed out that what we actually had in the current connector was a system combining both protocol and business rules around a router. So we set out to free the router from business rules and let the ILP module stand alone.

2. Make things modular

Our second motivation was driven by the recent challenges we’ve noted in scaling up connector operations when the volume of packets is too high for a single instance.

In theory one should be able to shard out the existing connector and spread the load, but in practice this has been challenging for a variety of reasons.

Inspired by some of the ways this problem has been approached in the Mojaloop project, where the services are designed from the start to run in cloud environments and scale dynamically, we set about seeing if we could break the current connector up into atomic units that could easily be connected together in ad-hoc configurations.

3. Support dynamic configuration

I’ve been banging this drum for a while, so I’m glad to have had Matt and Don join me in the last few months to help me make it a reality. This is probably one of the biggest issues with existing connector deployments today. DJ has made some great efforts to improve things in the last few months, but the plugin model and the fact that we hot-load plugin modules have made this difficult.

Needless to say, having to bounce your connector every time you want to add a peer or change a setting is a nightmare. Not only that, but you almost certainly lose money, because the balances you’re tracking for peers are reset and you lose track of everything that has happened since you last settled.

So, without completely abandoning the current configuration options, we set about seeing if we could design something that supports hot reloads of configuration for new or existing peers, and an architecture that makes sense in the context of cloud platforms where control and operations are well segregated.

4. A better way to peer

I’ve never been a huge fan of the plugin system, but the reality is, it has worked pretty damn well until now. The real challenge is that as the network and some operators scale, we have started to see a need to manage business operations like settlement (a primary function of plugins) separately from packet processing (a separation the traditional payments industry considers old hat).

Settlement is a business function that needs to sit far away from the payment processing pipeline but this doesn’t mean it can’t be fast and keep the settlement risk close to zero. The proposals and experiments I presented over the last few months looking at potential new bilateral protocols were all done to explore ways to achieve that separation.

In an ideal architecture connectors establish a very simple channel with their peers over which they exchange nothing but ILP packets. These may contain messages related to settlement but it should also be possible for settlement to be handled entirely outside the scope of the connector.

Design

So, what did we come up with? Here’s a summary:

1. A stand-alone routing module and connector

We have built an entirely stand-alone module for routing, which consists of two key components: the routing table and the route manager.

The routing table is, as its name suggests, a simple and efficient routing table, with an API for adding routes and an API for getting the next hop for an address.
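A routing table with those two operations could be as small as the sketch below. The class and method names (`addRoute`, `removeRoute`, `getNextHop`) are illustrative rather than Rafiki’s actual API, and matching on whole dot-separated segments, so that a route for `g.alice` does not capture `g.alicebob`, is a design choice made for this example.

```typescript
// Minimal routing table sketch: maps address prefixes to next-hop peer ids
// and resolves an address by longest-prefix match on whole segments.
class RoutingTable {
  private routes = new Map<string, string>(); // prefix -> next-hop peer id

  addRoute(prefix: string, nextHop: string): void {
    this.routes.set(prefix, nextHop);
  }

  removeRoute(prefix: string): void {
    this.routes.delete(prefix);
  }

  // Walk from the full address towards the root, dropping one segment at a
  // time; the first hit is therefore the longest matching prefix.
  getNextHop(address: string): string | undefined {
    let candidate = address;
    while (candidate.length > 0) {
      const hop = this.routes.get(candidate);
      if (hop !== undefined) return hop;
      const lastDot = candidate.lastIndexOf(".");
      if (lastDot === -1) break;
      candidate = candidate.slice(0, lastDot);
    }
    return this.routes.get(""); // optional default route, if one was added
  }
}
```

Because the lookup walks up the address one segment at a time, its cost grows with the depth of the address rather than with the size of the table.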

The route manager is responsible for keeping the routing table up to date based on events it processes such as route updates from peers, new peers coming online or existing peers going offline.
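The route manager’s job can be sketched as an event handler that rebuilds the table whenever something changes. The event names and shapes here (`peerUp`, `routeUpdate`, `peerDown`) are hypothetical, and rebuilding from scratch on every event is the simplest possible strategy, not necessarily what Rafiki does.

```typescript
// Hypothetical events the route manager reacts to.
type RouteEvent =
  | { type: "peerUp"; peerId: string; prefixes: string[] }
  | { type: "routeUpdate"; peerId: string; newRoutes: string[]; withdrawnRoutes: string[] }
  | { type: "peerDown"; peerId: string };

class RouteManager {
  // Prefixes each peer currently advertises, in the order peers came online.
  private readonly advertised = new Map<string, Set<string>>();
  // The routing table it maintains: prefix -> next-hop peer id.
  readonly table = new Map<string, string>();

  handle(event: RouteEvent): void {
    switch (event.type) {
      case "peerUp":
        this.advertised.set(event.peerId, new Set(event.prefixes));
        break;
      case "routeUpdate": {
        const prefixes = this.advertised.get(event.peerId) ?? new Set<string>();
        event.newRoutes.forEach((p) => prefixes.add(p));
        event.withdrawnRoutes.forEach((p) => prefixes.delete(p));
        this.advertised.set(event.peerId, prefixes);
        break;
      }
      case "peerDown":
        this.advertised.delete(event.peerId);
        break;
    }
    this.rebuild();
  }

  // Recompute the table from scratch. When two peers advertise the same
  // prefix, the one that came online first wins; a real implementation
  // would compare path length, peer relationship and so on.
  private rebuild(): void {
    this.table.clear();
    this.advertised.forEach((prefixes, peerId) => {
      prefixes.forEach((prefix) => {
        if (!this.table.has(prefix)) this.table.set(prefix, peerId);
      });
    });
  }
}
```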

The routing module is protocol agnostic. It can manage and route packets for anything that has an address in a hierarchical addressing scheme. This should make it possible to focus specifically on this component when optimising the connector, possibly even building an implementation in C that could be imported into the connector.

We also plan to use this module in our work with the Mojaloop project, as a core dependency of the Mojaloop stack, enabling it to route payments across multiple networks using Moja addresses (which follow the ILP format).

The connector is a wrapper around this module, adding the necessary protocol logic to the packet processing pipelines.

It also adds a special peer to the routing table at startup, representing the connector itself (self), and then adds the necessary protocols to the outgoing pipeline for self (e.g. the echo protocol).
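A simplified sketch of how a connector might wrap a routing table and register itself as the `self` peer at startup. The `Connector` class, its methods and the string-based echo handler are assumptions for illustration: the real ILP echo protocol defines its own binary packet format, whereas this handler just checks for the `ECHOECHOECHOECHO` prefix and returns a reply string.

```typescript
interface Prepare {
  destination: string;
  data: string;
}

// An outgoing handler for a peer: takes a Prepare and returns reply data.
type Handler = (prepare: Prepare) => string;

// Simplified echo handler (not the real binary echo format).
const ECHO_PREFIX = "ECHOECHOECHOECHO";
const echoProtocol: Handler = (prepare) => {
  if (!prepare.data.startsWith(ECHO_PREFIX)) throw new Error("unsupported packet for self");
  return `${ECHO_PREFIX} reply from ${prepare.destination}`;
};

class Connector {
  private readonly routes = new Map<string, string>(); // prefix -> peer id
  private readonly handlers = new Map<string, Handler>(); // peer id -> outgoing pipeline

  constructor(ownAddress: string) {
    // At startup, add a special "self" peer for the connector's own address,
    // whose outgoing pipeline ends in the protocols it serves (here, echo).
    this.addPeer("self", ownAddress, echoProtocol);
  }

  addPeer(peerId: string, prefix: string, handler: Handler): void {
    this.routes.set(prefix, peerId);
    this.handlers.set(peerId, handler);
  }

  send(prepare: Prepare): string {
    const peerId = this.nextHop(prepare.destination);
    const handler = peerId === undefined ? undefined : this.handlers.get(peerId);
    if (handler === undefined) throw new Error(`no route to ${prepare.destination}`);
    return handler(prepare);
  }

  // Longest-prefix match on dot-separated segments.
  private nextHop(address: string): string | undefined {
    for (let a = address; a.length > 0; ) {
      const peerId = this.routes.get(a);
      if (peerId !== undefined) return peerId;
      const dot = a.lastIndexOf(".");
      if (dot === -1) return undefined;
      a = a.slice(0, dot);
    }
    return undefined;
  }
}
```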

2. Separate business logic and protocol logic

With the connector component handling all of the protocol stuff we then needed a way to add business rules to the processing of incoming and outgoing packets.

Rafiki ships with an app implementation that does this (and more), but it’s also possible for someone to take the connector on its own and use it in their own app.

ilp-connector uses middleware pipelines to impose a mixture of protocol and business rules. This makes managing the addition or removal of peers a challenge as the pipeline of middleware is established when the app starts and can’t be changed or customised for each peer.

In order to use connectors in more complex clustered configurations we wanted to allow each peer to have a different set of rules applied to their messages.

Rules

We have replaced the existing middleware with rules. Functionally, a rule is very similar to middleware: it processes incoming and outgoing packets on both the request and reply legs. The major difference is that we create an instance of each rule for each peer. Also, rules are very lightweight and not bound to a pipeline; rather, they are chained together when the peer is set up.

This allows peers to be set up ad hoc with different rule pipelines. For example, an internal peer that doesn’t settle with its peers could choose not to use the balance rule, as a way to optimise its processing pipeline.
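Here is one way rules and per-peer chaining could look. The `Rule` interface, the two example rules and `chainRules` are hypothetical and not Rafiki’s actual classes. For brevity the sketch handles a single direction and runs synchronously, but it shows a rule acting on both the request leg and the reply leg, per-peer rule state, and two peers set up with different pipelines.

```typescript
interface Packet {
  destination: string;
  amount: number;
}

interface Reply {
  fulfilled: boolean;
  reason?: string;
}

type Next = (packet: Packet) => Reply;

// Hypothetical rule shape: sees the request on the way in and can act on
// the reply on the way back (the reply leg).
interface Rule {
  process(packet: Packet, next: Next): Reply;
}

class MaxPacketAmountRule implements Rule {
  private readonly max: number;
  constructor(max: number) {
    this.max = max;
  }
  process(packet: Packet, next: Next): Reply {
    if (packet.amount > this.max) return { fulfilled: false, reason: "amount too large" };
    return next(packet);
  }
}

// One instance per peer, so this balance belongs to a single peer.
class BalanceRule implements Rule {
  private balance = 0;
  private readonly maximum: number;
  constructor(maximum: number) {
    this.maximum = maximum;
  }
  process(packet: Packet, next: Next): Reply {
    if (this.balance + packet.amount > this.maximum) {
      return { fulfilled: false, reason: "exceeded maximum balance" };
    }
    const reply = next(packet);
    if (reply.fulfilled) this.balance += packet.amount; // reply leg
    return reply;
  }
}

// Chain a peer's rules when the peer is set up; `handler` sits at the end.
function chainRules(rules: Rule[], handler: Next): Next {
  return rules.reduceRight<Next>((next, rule) => (packet) => rule.process(packet, next), handler);
}

const forward: Next = () => ({ fulfilled: true });

// A peer we settle with gets a balance rule; an internal peer skips it.
const externalPeer = chainRules([new MaxPacketAmountRule(100), new BalanceRule(150)], forward);
const internalPeer = chainRules([new MaxPacketAmountRule(100)], forward);
```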

Some rules require shared state across instances so, where necessary, we set this up and pass it into the constructor of the rule from the app.

Examples of this are the token buckets and caches required by certain rules or cross-cutting services like stats.
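Shared state can be handed to rule instances through their constructors. In this sketch the app creates one token bucket and passes the same instance to two throttling rules, so they draw on a single limit. `TokenBucket` and `ThrottleRule` are illustrative names, and time is passed in explicitly (rather than read from the clock) to keep the example deterministic.

```typescript
// A simple token bucket: holds up to `capacity` tokens, refilled continuously.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, now: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Refill for the elapsed time, then try to take one token.
  take(now: number): boolean {
    const elapsedMs = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + (elapsedMs * this.refillPerSecond) / 1000);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Hypothetical throttling rule. The app creates one bucket and passes the
// same instance into several rule instances, so they share one limit.
class ThrottleRule {
  private readonly bucket: TokenBucket;
  constructor(bucket: TokenBucket) {
    this.bucket = bucket;
  }
  allow(now: number): boolean {
    return this.bucket.take(now);
  }
}
```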

Protocols

The logic required for the different protocols, such as ILP, IL-DCP, CCP and echo, fits nicely into this model, so we have also implemented these as special rules, called protocols, that are also added to the processing pipeline.

The difference between them is that protocols are always applied at the end of the pipeline and are managed by the connector, whereas business rules are created and managed by the app.
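The split between app-managed business rules and connector-managed protocols might be expressed like this. The rule-as-function shape and `buildPeerPipeline` are assumptions; the `peer.config` and `peer.route.*` addresses follow the IL-DCP and CCP conventions, but the replies are placeholders rather than real protocol messages.

```typescript
interface Prepare {
  destination: string;
  data: string;
}

type Handler = (prepare: Prepare) => string;
type Rule = (next: Handler) => Handler;

// Protocol "rules" managed by the connector. Each answers packets for its
// own peer.* address and passes everything else along.
const ildcp: Rule = (next) => (p) =>
  p.destination === "peer.config" ? "ildcp: address assigned" : next(p);
const ccp: Rule = (next) => (p) =>
  p.destination.startsWith("peer.route.") ? "ccp: routes accepted" : next(p);

// The connector appends its protocols after whatever business rules the app
// supplied for this peer; `forward` (route and send on) is the final step.
function buildPeerPipeline(appRules: Rule[], forward: Handler): Handler {
  const protocols: Rule[] = [ildcp, ccp];
  return appRules.concat(protocols).reduceRight<Handler>((next, rule) => rule(next), forward);
}
```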

It’s possible to re-use the routing module and write a different connector that uses a per-protocol controller pattern (like ilp-connector), but our plan is to allow specific protocols to be disabled or configured differently per peer in future.

3. Ad-hoc configuration

By default Rafiki starts with no peers configured and some sane defaults for the services. An instance of the app is started as an empty connector and then peers are configured one-by-one.

This could be via config read in by the process or via the API exposed by the app.

The key here is that all components are designed to handle this ad-hoc addition and removal of peers from the ground up and each peer can have an entirely custom business and protocol pipeline.
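A minimal sketch of an app that starts with no peers and has them added, replaced and removed while it runs, each with its own rule pipeline. `PeerConfig`, the rule registry and the `App` methods are hypothetical stand-ins for whatever the config loader or admin API would call.

```typescript
// Hypothetical peer configuration, as it might arrive from config or the API.
interface PeerConfig {
  id: string;
  rules: string[]; // business rules to apply to this peer, in order
}

type Handler = (amount: number) => string;
type Rule = (next: Handler) => Handler;

// Hypothetical registry of rules the app can build. Each factory call
// creates fresh state, so every peer gets its own rule instances.
const RULE_FACTORIES: Record<string, () => Rule> = {
  maxAmount: () => (next) => (amount) => (amount > 1000 ? "rejected: too large" : next(amount)),
  counter: () => {
    let count = 0;
    return (next) => (amount) => `${next(amount)} (#${++count})`;
  },
};

class App {
  private readonly peers = new Map<string, Handler>();

  // Add or replace a peer while the app is running: no restart needed.
  addPeer(config: PeerConfig): void {
    const send: Handler = (amount) => `sent ${amount} to ${config.id}`;
    const pipeline = config.rules.reduceRight<Handler>((next, ruleName) => {
      const factory: (() => Rule) | undefined = RULE_FACTORIES[ruleName];
      if (factory === undefined) throw new Error(`unknown rule: ${ruleName}`);
      return factory()(next);
    }, send);
    this.peers.set(config.id, pipeline);
  }

  removePeer(id: string): void {
    this.peers.delete(id);
  }

  peerIds(): string[] {
    return Array.from(this.peers.keys());
  }

  sendTo(id: string, amount: number): string {
    const pipeline = this.peers.get(id);
    if (pipeline === undefined) throw new Error(`unknown peer: ${id}`);
    return pipeline(amount);
  }
}
```

Calling `addPeer` again with the same id simply replaces that peer’s pipeline, which is one way of applying a configuration change without a restart.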

It should now be trivial to Dockerise the basic app and manage instances spawned in an environment like Kubernetes from a central command process.

A nice side effect of this architecture is that there is no lifecycle management of the endpoints required within the connector. None of the routing code needs to track lifecycle (connect and disconnect) events on endpoints as they do today with plugins.

Instead we added a heartbeat service that sends a ping to peers. Failed heartbeats can be handled by a rule that notifies the route manager to stop routing down the broken route and successful heartbeats can be used to re-enable a previously disabled route.
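The heartbeat behaviour described above could be sketched as follows. The `HeartbeatService`, the `RouteControl` hook and the threshold of three consecutive failures are assumptions, and the sketch folds the notifying rule into the service for brevity. In a real deployment `tick` would run on a timer and the pings would be asynchronous; here both are synchronous so the behaviour is easy to follow.

```typescript
// Hypothetical hook into the route manager: stop or resume routing to a peer.
interface RouteControl {
  disable(peerId: string): void;
  enable(peerId: string): void;
}

// Minimal RouteControl that just remembers which peers are disabled.
class DisabledRoutes implements RouteControl {
  readonly disabled = new Set<string>();
  disable(peerId: string): void {
    this.disabled.add(peerId);
  }
  enable(peerId: string): void {
    this.disabled.delete(peerId);
  }
}

// Pings every peer on each tick. After `maxFailures` consecutive failures the
// peer's routes are disabled; the next successful ping re-enables them.
class HeartbeatService {
  private readonly failures = new Map<string, number>();
  private readonly ping: (peerId: string) => boolean;
  private readonly routes: RouteControl;
  private readonly maxFailures: number;

  constructor(ping: (peerId: string) => boolean, routes: RouteControl, maxFailures = 3) {
    this.ping = ping;
    this.routes = routes;
    this.maxFailures = maxFailures;
  }

  tick(peerIds: string[]): void {
    peerIds.forEach((peerId) => {
      const previous = this.failures.get(peerId) ?? 0;
      if (this.ping(peerId)) {
        if (previous >= this.maxFailures) this.routes.enable(peerId);
        this.failures.set(peerId, 0);
      } else {
        this.failures.set(peerId, previous + 1);
        if (previous + 1 === this.maxFailures) this.routes.disable(peerId);
      }
    });
  }
}
```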

Like ilp-connector, Rafiki also ships with a default executable script that will start an instance of the app and an instance of the Admin API (no longer bundled into the app) which exposes a REST interface that can be used to manage the app.

We’ve also got a simple, high-performance settlement engine implementation that is started by the script, but more on that below.

4. Peering and Settlement using Endpoints and a Settlement Engine

We have done away with plugins and instead introduced the concept of endpoints which are solely responsible for serialising and deserialising ILP packets, exchanging them with peers and ensuring that requests (ILP Prepare packets) are correctly matched with responses (ILP Fulfill or ILP Reject packets).
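The request/response matching an endpoint performs can be sketched with a map of pending requests keyed by a frame id. This is an illustration under stated assumptions, not Rafiki’s endpoint API: the `WireFrame` format is made up, JSON stands in for the binary (OER) encoding real ILP packets use, and a production endpoint would also reject pending requests once they expire.

```typescript
// Simplified ILP packet shapes (real packets are binary, not JSON).
interface IlpPrepare {
  destination: string;
  amount: string;
  data: string;
}

interface IlpReply {
  type: "fulfill" | "reject";
  data: string;
}

// Hypothetical wire frame: the id lets replies be matched to their requests.
type WireFrame =
  | { id: number; kind: "prepare"; body: IlpPrepare }
  | { id: number; kind: "reply"; body: IlpReply };

class Endpoint {
  private nextId = 1;
  private readonly pending = new Map<number, (reply: IlpReply) => void>();
  private readonly write: (raw: string) => void;
  private readonly onPrepare: (prepare: IlpPrepare) => IlpReply;

  // `write` sends a raw frame over the transport (HTTP, WebSocket, ...);
  // `onPrepare` hands incoming requests to the app and returns its reply.
  constructor(write: (raw: string) => void, onPrepare: (prepare: IlpPrepare) => IlpReply) {
    this.write = write;
    this.onPrepare = onPrepare;
  }

  // Outgoing request: serialise it, remember it, resolve when the reply arrives.
  request(prepare: IlpPrepare): Promise<IlpReply> {
    const id = this.nextId++;
    return new Promise<IlpReply>((resolve) => {
      this.pending.set(id, resolve);
      this.write(JSON.stringify({ id, kind: "prepare", body: prepare }));
    });
  }

  // Called by the transport for every frame received from the peer.
  handleFrame(raw: string): void {
    const frame = JSON.parse(raw) as WireFrame;
    if (frame.kind === "prepare") {
      const reply = this.onPrepare(frame.body);
      this.write(JSON.stringify({ id: frame.id, kind: "reply", body: reply }));
      return;
    }
    const resolve = this.pending.get(frame.id);
    if (resolve === undefined) return; // unknown or duplicate reply: ignore it
    this.pending.delete(frame.id);
    resolve(frame.body);
  }

  pendingCount(): number {
    return this.pending.size;
  }
}
```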

There is unlikely to be a huge variety of endpoint implementations, probably one for each possible transport (HTTP, gRPC, WebSockets, raw TCP). This means we can ship these with the connector, and the custom logic related to settlement can stand alone.

That has huge benefits in terms of being able to add peers to a connector in an ad-hoc fashion as we no longer need to hot-load plugins from external modules.

Inside the app, packets are already deserialised, so there is also no performance hit on the processing pipelines. (We have some ideas on how to optimise this further.)

There is also no longer a concept of “sendMoney”, or of rules specifically for that pipeline. Where peers wish to exchange settlement-related messages with one another, we expect them to use ILP packets in the address space peer.settle.*.

However, we are aiming to implement some of the popular settlement integrations such that they are entirely disconnected from the processing pipeline.

Settlement Models

The way we anticipate settlement happening is through different rules and engines deployed for different settlement models.

We have started with rules optimised for the ILSP use-case where high throughput is required and some flexibility is allowed when deciding if a packet should be forwarded or not based on the current balance.

We expect to ship some other rules for different strategies soon that support strict balance checking and distributed balance tracking.
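One possible reading of the difference between the ILSP-style rules and strict balance checking is sketched below. This is an assumption about what “some flexibility” could mean, not a description of Rafiki’s rules: the lenient mode only looks at the balance of packets already fulfilled, so concurrent packets can briefly push a peer past its limit, while the strict mode also counts packets still in flight. `PeerBalance` and its methods are hypothetical.

```typescript
type BalanceMode = "lenient" | "strict";

// Hypothetical per-peer balance tracker.
class PeerBalance {
  private balance = 0; // fulfilled amounts not yet settled
  private inFlight = 0; // prepared but not yet fulfilled or rejected
  private readonly maximum: number;
  private readonly mode: BalanceMode;

  constructor(maximum: number, mode: BalanceMode) {
    this.maximum = maximum;
    this.mode = mode;
  }

  // Request leg: decide whether to forward a Prepare of this amount.
  tryPrepare(amount: number): boolean {
    const ok =
      this.mode === "strict"
        ? this.balance + this.inFlight + amount <= this.maximum // never overshoot
        : this.balance < this.maximum; // cheap check, tolerates overshoot
    if (ok) this.inFlight += amount;
    return ok;
  }

  // Reply leg: release the in-flight amount, counting it only if fulfilled.
  onReply(amount: number, fulfilled: boolean): void {
    this.inFlight -= amount;
    if (fulfilled) this.balance += amount;
  }

  current(): number {
    return this.balance;
  }
}
```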

Settlement Ledgers

Our vision for settlement is for the settlement engine to track balances (via a rule that notifies it of the packets the rule processes), trigger settlement, and then notify the app as necessary, depending on the settlement model.

The actual settlement is triggered by the settlement engine and could be facilitated through ledger-specific plugins, or even a settlement engine specific to the ledger.
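A sketch of a settlement engine that is told about fulfilled packets by a rule, settles once a threshold is reached and then notifies the app. The class, the `threshold` and `settleTo` parameters and the `SettleFn` callback (which would wrap a ledger-specific plugin or integration) are all hypothetical. Settlement is treated as synchronous and always successful here; a real engine would need to handle asynchronous, failing and retried settlements.

```typescript
// Hypothetical ledger-specific settlement call (e.g. an on-ledger payment).
type SettleFn = (peerId: string, amount: number) => void;

class SettlementEngine {
  private readonly owed = new Map<string, number>(); // amount owed to each peer
  private readonly settle: SettleFn;
  private readonly threshold: number;
  private readonly settleTo: number;
  private readonly notifyApp: (peerId: string, amount: number) => void;

  constructor(
    settle: SettleFn,
    threshold: number,
    settleTo: number,
    notifyApp: (peerId: string, amount: number) => void,
  ) {
    this.settle = settle;
    this.threshold = threshold;
    this.settleTo = settleTo;
    this.notifyApp = notifyApp;
  }

  // Called by a balance-tracking rule for each fulfilled outgoing packet.
  packetFulfilled(peerId: string, amount: number): void {
    const total = (this.owed.get(peerId) ?? 0) + amount;
    if (total < this.threshold) {
      this.owed.set(peerId, total);
      return;
    }
    // Threshold reached: settle down to `settleTo`, then tell the app.
    const amountToSettle = total - this.settleTo;
    this.settle(peerId, amountToSettle);
    this.owed.set(peerId, this.settleTo);
    this.notifyApp(peerId, amountToSettle);
  }

  owedTo(peerId: string): number {
    return this.owed.get(peerId) ?? 0;
  }
}
```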

Conclusion

While we still have a lot to do on this project we think this is ready for the community to start tinkering with so we’re shipping a first BETA today.

Please give us your feedback and, if you feel up for it, consider contributing new rules, endpoints or app implementations.

Thanks to Matt and Don for their awesome work making this happen, and to DJ, Evan, Stefan, and the teams at Strata and Kava, who have all given us valuable feedback along the way.

Discussion

If you have comments or questions please post them on the forum.

Rafiki Image Credit: Poster by Felicia Ray