Machine Reasonable APIs and Regulations

Our world runs on many distributed systems. This is especially true of financial markets – an industry plagued by expensive inefficiencies. The barrier to meaningful progress is the ambiguity of rules by which financial systems are expected to operate and communicate. Recent breakthroughs in artificial intelligence empower machines to reason about infinitely many behaviours of complex algorithms automatically. These advances will fundamentally change how financial markets operate and pave the way to further innovation.

Why there’s so much noise and waste

In February, 2018 the Financial Conduct Authority (top UK financial regulator) updated its guidance on algorithmic trading in the wake of new MiFID II regulations. Here’s an excerpt:

If you’d like to trade on a public exchange (e.g. The London Stock Exchange), you need to understand their FIX specification describing what types of orders they support and their relevant business logic (FIX is one of several protocols used in trading). As an example, let’s look at the FIX ‘specs’ of the top 5 European venues (there’re hundreds of public exchanges and thousands of systems that use a similar approach):

  • London Stock Exchange – 196 pages
  • Xetra – 158 pages
  • Eurex – 103 pages
  • Euronext– 122 pages
  • CBOE Europe – 74 pages

That’s a total of 653 pages of dense highly detailed information about how these venues expect you to connect and interact with them. And that’s just the ‘meat,’ so to say; there are other specifications for market data dissemination, trade reporting, etc.

Let’s assume we’re creating a system that will trade on these five venues. How many distinct scenarios (“edge cases”) must be considered to ensure the system is compliant with the FCA guidance?

Technical note: Folks in finance refer to these connectivity APIs as ‘specifications,’ but they’re no different than APIs used in general software development, just more complex than, e.g., a typical Stripe API reference (and with much greater risks if you get them wrong). So we will use the terms ‘APIs’ and ‘specifications’ interchangeably going forward.

There’s also the term fully tested that the FCA uses: notice that neither the FCA nor any other regulator (e.g., SEC, CFTC, ESMA) has defined “fully tested,” “thoroughly tested,” etc. The way it typically works in banking is that you measure thoroughness of testing by the amount of money you spend doing it.

Why? Well, that merits another post – but for now, rest assured that regulators have not put forth any quantitative metrics to measure the quality of testing nor complexity of the systems they’re supposed to regulate.

By the way, it’s not that regulators don’t like quantitative metrics; they love them: look up Risk-Based Capital Requirements and Leverage Limits from The Dodd-Frank Act! It’s just that when it comes to balance sheets, accounting, market risk, etc., you should apply mathematics. When it comes to algorithms, you should apply handwaving. Why? Perhaps they’re unaware that there’s a dedicated field of science that can help them analyse algorithms and how they’re tested, in much the same way that accounting allows you to understand the “health” of a balance sheet…

To make matters worse, venues themselves constantly undergo upgrades and changes to their infrastructure. Consider the Irish Stock Exchange, which was recently bought out by Euronext. Before the purchase, it operated on technology developed by Xetra – the German competitor of Euronext. I highly doubt that ISE will continue to run the Xetra technology stack long after the buyout. Will they have to switch over to Euronext technology? Most likely. If so, will all of those folks that connect to ISE need to retest? Yep. And this is not just about outright exchange purchases – exchanges undergo upgrades because of new features (e.g. new order types), new technology, etc. all the time!

This issue of software systems being unable to clearly communicate with each other is actually much deeper than a connection between a firm (broker, bank, etc…) and an exchange. All of the financial markets run on a multitude of distributed systems. If you slice a typical investment bank in half, you’ll see thousands of trading systems, booking systems, databases, etc. all ‘speaking’ with each other. Why are there so many legacy systems inside big financial organisations? Because no one truly understands how they’re all supposed to work with one another, and replacement and upgrades are thus often far too risky.

Why is that? For the same reason it’s been impossible to comply with FCA’s guidance – people responsible for a particular system typically describe how it’s supposed to interact with another through word-of-mouth storytelling and outdated/imprecise documentation. Once they leave the firm (people in finance move between jobs frequently) or work on a different project, most of that knowledge is lost. Since the information was imprecise in the first place, it’s now very difficult to change the system because it’s difficult to understand what consequences it has on those that rely on it. And you never want to be the person responsible for some client not getting his fills at the end of the day…

And then there was Aesthetic Integration…

We started Aesthetic Integration because we saw a fundamental need and an incredible opportunity. As software in finance (and elsewhere, like autonomous vehicles) is becoming increasingly complex, mission-critical and regulated, old ways of hacking things together with a hope they don’t break no longer scale. Hacking must be replaced with mathematical rigour. The ability to reason about the infinitely many possible behaviours of systems we rely upon is no longer a nice-to-have. It’s a must-have. The opportunity comes from recent breakthroughs allowing us to deliver methods historically reserved to highly specialised teams at places like NASA into the hands of developers (myself included) without a PhD in mathematics.

To realise this opportunity, we’ve been busy:

  • Technology: We’ve developed a best-in-class automated reasoning engine packed with the latest breakthroughs in artificial intelligence and formal verification. It’s called Imandra and has now been successfully deployed and hardened in some of the most complex and highly regulated industrial environments in the world. We will soon open Imandra up to the public via our Reasoning as a Service (TM) APIs… Stay tuned!
  • Coverage: We’ve focused first on venues and protocols (e.g., FIX) used to connect clients to the venues. There are many types of systems in finance – take a Smart Order Router for example. We can analyse it for safety and test it rigorously, but unless we have precise knowledge about the behaviour of the many systems it connects to, our results would be limited. (You can read more about the application of Imandra to various types of financial algorithms in our recent paper published as part of CADE 26, one of the top academic conferences in computational logic and formal verification.)

Speaking of opportunities… Last May, a venture capitalist from SF wrote an interesting article in the American Banker, “Fintech’s future is in the back end”. Here’s his thesis:

“And so, we have a $241 billion opportunity that, to date, the venture community has almost completely ignored. The irony is that while the first wave of fintech focused on consumer facing products, the real innovation for consumers will come from back-end improvements.”

He gets it.

The first step to unlocking this tremendous value (the $241 billion refers to annual spend on technology and compliance within financial markets) – is to ensure we can precisely understand and control how these systems communicate with each other. To make this a reality, the information about how systems “speak” to each other must be encoded with mathematically precise meaning about all of the possible interactions – we call this being Machine Reasonable.

After nearly four years of working on the underlying reasoning engine, Imandra, and 18 months of intense work on a new language on top of Imandra tailored for encoding financial Machine Reasonable APIs, we’re happy to announce the public release of Imandra Protocol Language. You can download it from Microsoft VS Code as a plugin (at the time of writing, it already has over 3k downloads!).

With the Imandra Protocol Language (IPL), machine reasonable API specs are now a reality for financial markets.

This is a big step forward for the industry.

How does it work now?

Let’s step back a bit and see how things have worked in a pre-IPL world. We’ll use the client/venue/regulator example, but this as equally applies to connections between internal systems within a bank, or two banks connecting with each other, or a provider like Bloomberg connecting a client with a broker through its famous terminal…

When you wish to connect to a trading venue, you need to decipher their rules. The rules typically follow a common trading protocol (e.g. Financial Information eXchange, or FIX) and would be available via a PDF document that may or may not have errors:

What’s wrong with this picture?

On the Venue side:

  • The PDF is not connected to the actual production system. Hence it will be difficult to systematically spot any behavioural deviations between the actual system and the documentation being sent to clients and regulators.
  • Because the PDF is ambiguous, many clients will have questions – so the venue would have to staff up to ensure they can support the product (and ensure that ‘clarifications’ made to clients are made to all market participants).

On the Client side:

  • Since the client is connecting to many firms/venues, they have to hire staff to decipher all the PDFs they have to deal with.
  • The client has to understand how to incorporate changes within the PDF into its own systems.
  • The client will have to test the systems. Typically a separate team altogether would be involved (regulators demand clear separation of design/development/testing functions).
  • The client also has to hire compliance to demonstrate (e.g., with respect to the FCA directive at the beginning of this post) that they’re connecting to all of these venues safely. Since there’s no quantitative measure for quality of testing, compliance officers would need to have many meetings with business and technical owners of this connectivity function to feel comfortable with their ‘story’ to the regulator.

“This is why we can’t have nice things…”

There’s a lot of hype around Machine Executable specifications, regulations, etc. How would these work? What if we code up this PDF using a common language like Java, C++ or XML?

Now we can connect the specification to the Venue System – we can run it against the Venue System data and confirm that it is faithful to the actual production system. Great!

What about the other issues? Are they solved now that we have a Machine Executable specification? Not really. The staff on the Client side wouldn’t be able to easily understand changes to intricate edge cases in the behaviour of the API just by reading the code. Similarly, this would not help in any significant way with ensuring/demonstrating compliance, testing the system, etc.

So, what’s missing?

Imagine we had unlimited developer resources and that for each one of these specifications, there was a magical group of 10 million developers that could analyse this specification and:

  • Run it against the Venue system and ensure they’re consistent
  • Tell you precise behaviour that needs to be implemented within your trading system
  • Describe how you should change your test plan
  • Wait for you to test your system and then generate a quantitative compliance report you can send to a regulator

This setup would pretty much solve the issues we discussed.

While we don’t have unlimited resources, we can get a computer to perform these exact tasks. We need to have a Machine Reasonable API and use a powerful automated reasoning engine like Imandra to tell us about edge cases, what they mean for the client system, what should be tested, etc.

Let’s dig in deeper.

Syntax, Semantics and Abstraction

For us to be able to create APIs which machines may reason about, we must ensure that (1) the meaning of these API specifications is precise and unambiguous, and (2) they model the business logic at the right level of abstraction, capturing the relevant properties of the system design.

Let’s start with (1): what does it mean to ensure that these APIs are precise?

There’s quite a great effort underway among UK regulators, the industry and tech companies focused on Machine Executable reporting and regulation as it applies to the numerous responsibilities UK firms have to abide by. Similarly, the FIX trading community has been working on FIX Orchestra. And of course, the International Organization for Standardization has ISO 20022 standard in the works – its purpose is to subsume all other standards and become the single standard for financial services, encompassing everything from credit card payments to derivative securities. All of these are great efforts, but they’re still short of the Promised Land.

Readable, Executable, Reasonable – what’s the difference?

  • Machine Readable – no doubt you’ve heard this one before. People say a document is Machine Readable when a computer can parse a document. Perhaps with some minimal syntax rules. The issue? Junk in, junk out…
  • Machine Executable – this is the one with a lot of current hype. For scenarios like reporting financial statements to regulators, this would do wonders. For example, right now firms may submit their financials via PDFs. But for being able to reason about the possible behaviours and knock-on effects of a specification? We need more…
  • Machine Reasonable = Machine Executable + amenable to automated reasoning. For example, when you have transactional systems that engage in “dialogues” (with a notion of a session state, complex response scenarios, logical conditions on validity of messages with respect to the current state, etc.), you need to encode not only the format of the messages in a Machine Executable format, but also in a manner that facilitates describing symbolically all of the possible interactions this system may have with its clients. In the dark ages, exchanges have had to hire staff to explain to clients how to connect to them. With a Machine Reasonable API, you can replace many (most? all?) of those humans with a computer and get the job done so that clients don’t notice the upgrade.

For us to encode a Machine Reasonable API, we need to use a programming language, but what will it look like? There’re a few requirements we need to discuss, but first, we need to get some computer science vocabulary out of the way:

  • Syntax – the grammar of a program (how it looks)
  • Semantics – the meaning of a program (how it will behave)

(If you’d like a nice description of their differences, check out this link.)

What does it mean for a programming language to have a ‘good’ syntax, but an ‘informal’ or ‘imprecise’ semantics? You may be able to write a program that looks good to you, you can even compile and run it, but you may notice that it doesn’t always behave in a way that you can explain. There are ‘edge cases’ in the meaning of the program that do not agree with your understanding, perhaps because the semantics of the language itself was underspecified and different compilers have the freedom to attach their own interpretation to certain subtle aspects of your program. Imagine we’re writing a program that adds two integers, most of the time it returns the right number, but once a while it’ll tell you that “10 + 5” is “14”. I’m obviously oversimplifying, but this sort of thing happens all the time with informally specified programming languages. (For the more technology-savvy readers, here’s a good paper that cuts to the chase.)

So, it’s very important for us to define a language where we can ensure clear, consistent behaviours of the programs written in it – we want to give it a formal semantics so that all programs written in this language will have a mathematically-precise meaning that can be reasoned about by machine.

To phrase this differently, if I say “Let’s meet tomorrow after sunrise” versus “Let’s meet tomorrow at 10 am CET at Exit 3 of Bank Station”, you can see that the second phrase has much more meaning. The second form doesn’t require you to ask follow-up questions or request clarification. This is a key ingredient – we need to describe API information in such a way that is easily understood by a machine and doesn’t require a human to ask follow-up questions.

And now for (2): what does it mean to ensure that we use the right level of abstraction?

If I was to ask you “Would you like a Guinness?”, then you’d visualise this beautiful glass on the left and have an answer ready. Alternatively, what if I was to ask “Would you like 473.176 CCs of malted fluid coloured by polymeric structures created during the Maillard reaction?”

“Same” question, different form.

We can encode APIs in many languages (e.g. C++/Java) and still have precise meanings, but it’d be very (truly) difficult to reason about their high-level properties that encode business logic. So we must be careful to allow users to encode logic in a way that’s close to their requirements and their understanding of business processes, so when machines provide feedback, it’s meaningful.

In summary, we want to describe APIs with grammar close to the business logic, with formal semantics and at the right level of abstraction.

Interfaces with discipline

It is in our human nature to be ambiguous. We’re simply not wired for diligent step-by-step descriptions with thousands of intricate details. So how can we extract required information about how a complex trading system works from an ordinary human?

We need to have the right user interface. This is as important as designing the language requirements we discussed above.

To give you some intuition – no doubt you’ve seen something like this before:

You were trying to purchase something and left out a digit on your credit card. You get immediate feedback, highlighting in red the error. The same thing happens if you leave out your name or the security code. Also, in the text edit box where the credit card number goes, you cannot enter alphabetical characters (e.g., if you type ‘a’ on your keyboard, nothing will be reflected in your browser).

This user interface is designed to constrain you so you may provide information with the right meaning over to the service as quickly as possible.

If you think having this form online for credit cards is useful and makes your shopping faster, then you’d probably agree with us that a UI that forces a venue to encode its API information in the right way with the precise meaning would also be useful.

Can we have such an UI? I have bad news and good news…

The bad news: creating this kind of user experience for encoding trading system APIs/specs is very **** difficult.

The good news: we figured it out.

Introducing Imandra Protocol Language (IPL)

IPL is a strongly statically typed language with a powerful interface for expressing Machine Reasonable financial APIs (aka specifications) that you can use today. It’s designed to be protocol independent, but our current focus is on Financial Information eXchange (FIX), the most widely used protocol in trading.

It took us a lot of work, but we pulled it off. Combined with our Imandra Reasoning as a Service (TM) cloud infrastructure, it’s designed to green-check those question marks we went through above:

Here’re some of the features (for the complete list, see the VS Code page or the documentation):

IPL has built-in support for all recent versions of FIX data dictionaries.

Look up available fields and values with built-in editor autocompletion.

The editor will highlight any missing required fields that your model must support.

Locally declared records with linkable fields to messages and record types from FIX data dictionary.

Explicitly set Float types (Qty, Price, etc.) precision for your model.

Access documentation with a mouse-over.

IPL is a strongly typed language, making it harder for you to introduce errors.

A user cannot introduce two fields with the same name - an error will be thrown by the editor.

Extend the standard FIX enums, records and messages with custom entries.

What can you do with IPL model and Imandra?

  • Automatically generate precise documentation – elegant, error-free and consistent with the actual Venue system
  • Clearly demonstrate behavioural differences between different versions
  • Automatically generate test cases and explain what they’re actually testing
  • Create on demand test environments for your clients and then provide them with a quantitative description of the behaviours they have tested (or not)
  • Automatically generate a skeleton of the IPL model from your existing FIX sessions
  • Much, much more…

Here’s a tease of an automatically generated analysis page from the IPL model (demo site coming very soon!):

You’ll notice many sequence diagrams like this:

They’re derived by Imandra automatically from the IPL model – it understands all of the symbolic paths through the model and can explain conditions leading to each symbolic behaviour using plain English! And, from this information Imandra derives concrete test cases and can audit real-life test sessions and show you behaviours that you’ve hit (i.e., your FCA compliance report).

Here’s a quick example of auto-generated documentation:

We’re also working on technology that will eliminate many kinds of tests altogether – stay tuned!

Structure of an IPL model

Let’s pop the trunk and see what’s inside (for further details, please see the documentation page).

What does a model of Machine Reasonable API written in IPL look like? How does it allow us to encode infinitely many behaviours in a concise and precise way?

Think of modelling an API (equivalently, a specification of a trading system interface) in IPL as describing an Infinite State Machine. Note that the word “Infinite” is there because a typical API must account for infinitely many possible behaviours. For example, you might send a request for a new order and get a rejection back. Or get 1 fill, 2 fills, 3 fills, or 1000 fills. How can we elegantly encode all of these possibilities within a succinct, yet precise model?

It is true that protocols like FIX contain a finite number of states of a session (New, Cancelled, Expired, …), but if we treat that as the entire state, then we’re limited in our ability to express the actual behaviours that this system may have. So, our state must be much richer than that.

There are three main components to an IPL model:

  1. Inbound/Outbound messages
    Messages are the inputs and outputs of the model. Some inbound messages are valid and some are invalid, and must be rejected. The logic to encode validity of a message must be encoded and it may depend on the current state (see below).
  2. State
    The state is the variable containing all ‘data’ for this client session. For example, it may contain the current order status – Ack’ed, Rejected, Filled, etc… It may also contain the filled quantity of the order and the average price of the executions thus far.
  3. Actions
    These represent events internal to the system that may cause the model to change the state and potentially submit an outbound message. For example, a time change event would be an action and logic that defines how it should be handled would check whether the order should be expired. For those of you with a more technical background, this is not too far from the concept of actions in the TLA+ language.

For concrete examples, please see our IPL Tutorials and our GitHub repo.

Blockchain, blockchain, blockchain!

What post on innovation in finance can afford to leave out blockchain?

A lot of the early excitement about blockchain was that it would replace the old and obsolete financial systems. The logic was (is?) that you can simply swap out the old databases/communication processes with Ethereum (or Corda, or …) and things will just fix themselves. Well…

Here’s a nice description of the recently concluded blockchain PoC that SWIFT ran for their Nostro accounts. They figured out that it was very difficult for them to do anything meaningful without first getting a grip on their current systems and how they communicate with each other:

I see this as a major setback for banks’ adoption of Blockchain. If banks’ back office processes and technologies need to be transformed before DLT can be used, we may have to wait for a good part of another decade.

It all comes down to Machine Reasonable APIs… So now you know something that SWIFT doesn’t.

What’s next?

We’re working on the public launch of Imandra Markets – a cloud ecosystem for sharing Machine Reasonable APIs encoded in IPL with your clients and counterparts. We should have a demo site up and running within the next few weeks. Email us at contact@aestheticintegration.com if you’d like to stay in touch.

Stay tuned!