Formalising the FIX Protocol in Imandra

Financial markets run on many distributed systems – interconnected with multiple loosely-specified protocols such as FIX, SWIFT and ITCH. In this post, we describe our work on formalising the FIX protocol — converting the English-prose description of the session-level logic into a mathematically-precise executable model — and using our automated reasoning engine, Imandra, to mathematically verify the various claims made within the official specification.

Introduction

Protocols for the exchange of financial information form the foundation of modern financial markets. As “the TCP/IP of financial markets,” the FIX protocol relies on English-prose specifications even for its session-level behaviour (e.g., see this link). The only way to ensure that the rules described therein are clear, complete and logically consistent, is to create a formalised model of this protocol logic and mathematically verify its key properties.

At Aesthetic Integration we’ve created Imandra — an automated reasoning engine allowing one to formally (mathematically) verify statements about programs. At its lowest level, Imandra analyses programs written in the Imandra Modeling Language (IML), a subset of OCaml for which we’ve built a mechanised formal semantics. IML defines a functionally pure part of OCaml that can be treated as formal mathematical logic. Imandra is able to reason about code written in IML and, being valid OCaml, the same code can be compiled and executed with standard OCaml.

In this post we’ll describe how we’ve taken the English-prose specification of the FIX 4.4 protocol and created a formal model of this logic in IML and verified the key claims the FIX specification makes (with nuances, of course). What’s special is the ease with which such deep work can be done with Imandra. Indeed, without Imandra, work of this nature is typically only accessible to teams of PhDs (similar to those that work at NASA verifying safety of autopilots). Imandra breaks down the many barriers to this approach, democratising formal verification and clearing the way to further engineering rigour within financial markets.

Financial Information eXchange (FIX) Protocol

The Financial Information eXchange (FIX) protocol is the most widely used protocol for trading within financial markets. The most popular syntax for sending FIX messages is a sequence of tag=value pairs separated by an ASCII \001 character. For example, the following snippet contains a simple NewOrderSingle message (typically used to request a new order to be created by the service provider). For readability, we’ve replaced the\001 character with a vertical bar |:

8=FIX.4.4|9=110|35=D|34=3|49=BANZAI|52=20180531-11:39:25.885|56=IMANDRA|11=ID|21=3|40=1|54=1|55=INTC|60=20180531-11:39:25.885|10=054|

Here’s what it says: the 35=D field tells us that this is a NewOrderSingle message as35 is a tag for the “message type” and D is a value that encodes NewOrderSingle. Other fields of the message are decoded in a similar manner: 60=20180531–11:39:25.885 tells us the transaction time (tag 60) of the order and 54=1 refers to the side (tag 54) of our order which is “Buy” (value code 1).

The very first field of this message, 8=FIX.4.4, specifies the version of the protocol used in this transmission. Version 4.4 of the FIX protocol is perhaps the most popular version, so we’ll cover it here.

Over time, to take into account expanding electronification of financial services, the FIX community has introduced more and more message types, field tags and ways to encode field values. For example, FIX.4.4 describes 84 message types with 956 field tags. These rules are not typically followed precisely – virtually all firms have their own ‘views’ on what FIX protocol should be. This manifests itself in two ways:

  • They add custom fields and values, sometimes contradicting the original specifications;
  • Quite often, they mix message types from different protocol versions. For example, FIX 4.4 has a OrderMassCancelRequest message (4.2 does not) which is used within specs that are supposed to adhere to version 4.2.

Model architecture

All FIX messages are split into two categories: Application-level and Administrative (or session)-level.

Application-level messages

These are used to carry actual business-related information. Most commonly these are order creation and cancellation messages, various information requests and execution reports, etc. These are the ones that exhibit the most variation between systems implementing FIX.

Dealing with business-related messages requires quite a lot of bookkeeping. To make it simple for our users to encode their business rules quickly, yet precisely, we’ve created a domain specific language — the Imandra Protocol Language (IPL). IPL consolidates the description of all the business-dependent parts of the system in a compact and comprehensible way. From the IPL model, the IPL compiler automatically generates the OCaml/IML code for the parsers-encoders, type-converters and the fix-venue-model, while all the session-level logic is covered by thefix-engine model we’ll cover below.

OCaml (hence IML) is a strictly typed language and even before our work on formalisation of FIX specifications, we’ve discovered a number of inconsistencies in the FIX XML data dictionaries at the type-checking level. For example, the MiscFeeType field, introduced in FIX.4.0, is defined to have a “char” type:

But in later versions new values are added to the specification, that seem to assume that the value encoding is integer:

The IPL compiler automatically finds and corrects such typing errors in the FIX protocol data dictionaries.

You can read more about the IPL language here and here. We will now continue examining the fix-engine session-level model and the ways we can use Imandra to reason about it.

Administrative or session-level messages

These handle the low-level details of session control – ensuring that the business (application) messages arrive in the correct order and are recovered if lost.

There are Logon and Logoff messages for starting and terminating the session, Heartbeat and TestRequest for monitoring the connection, and ResentRequest and GapFill messages for retrieving missing messages if a sequence gap was detected. Representing the session-level logic formally is a complex task, which is explained in detail in the discussion that follows.

Creating a formal model of the FIX protocol

Analogously to the separation between application- and session-level logic, we’re going to create two models. The fix-engine model will be covered in this post. This model contains session-level logic. The second model,fix-venue-model, is for application logic and is generated by IPL models as describe above. The two models are loosely coupled and can be formally verified separately (or analysed modularly in tandem).

In order to be able to compile and execute the code we also need two communication layers:

  • At the front-end of the system, we have FIX parsers and encoders that read/write the raw messages on the wire and convert them to a strictly-typed IML representation. The layer is also responsible for detecting garbled/malformed messages and informing fix-engine about them.
  • Between the fix-engine and the fix-venue-model a type converter is needed to translate between datatypes used to represent message data in both models —in the fix-venue-model only business-logic relevant information is kept in message datatypes, while infix-engine we keep all fields and messages of the FIX specification.

The fix-engine model is almost completely independent of the business logic of the modelled system. It only requires administrative information in the headers of application messages and passes business payload to the fix-venue-model. On the other hand in the fix-venue-model, the parsers-encoders and type-converters are all highly dependent on the business logic.

Modeling the fix-engine

At Aesthetic Integration we’ve adopted a standard way to construct formal models of message-driven systems. At the top of the model we have a single datatype that holds all the data needed to describe the system at a given moment, including incoming and outgoing messages. We call this record type state. Together with this state type we define a one_step transition state -> state function, which performs a single logically isolated step of the simulation and returns the new state after the transition.

Following this pattern, for the fix-engine we introduce a global record type fix_engine_state which contains all the necessary information to describe the state of the FIX session. The fix_engine_state includes the fields for incoming and outgoing FIX messages and another pair of fields for the so-called “internal” messages:

Internal messages are necessary to inform the fix-engine of other non-FIX events that might occur. Most notable is the IncIntMsg_TimeChange
internal message — it is the “clock tick” event that allows the model to deal with various time delays. Communication of the application data with the business-level fix-venue-model also happens via internal messages: if a FIX application message is received normally, then its payload is wrapped into an OutIntMsg_ApplicationData internal message and placed into outgoing_int_msg. And vica-versa, the model can communicate outgoing business data, using the IncIntMsg_ApplicationData internal message.

The state record also contains a “mode” entry of fix_engine_mode type:

It describes the current “mode” of operation of the fix-engine model at a given moment. The behavior of the model can then be roughly described as a state-machine transitioning between possible modes.

The diagram above gives a bird’s-eye overview of the model transitions. Before the logon handshake and after the logoff is performed the model is in the NoActiveSession mode. Both logon and logoff can be initiated either by an incoming FIX message or by the model itself.

Most of the time the model is expected to be in the ActiveSession mode — processing incoming messages normally and passing business application data on to the fix-venue-model. Anomalies in the incoming FIX messages invoke transitions to corresponding modes, allowing the model to recover from the encountered problem and return to the ActiveSession mode:

  • If a sequence number gap in the incoming messages is detected — the ResendRequest administrative FIX message is sent and we transition to the Recovery mode. The model then collects and replays the resent FIX messages, filling the detected gap.
  • Upon receiving ResendRequest ourselves, we gather the requested messages and transition to the Retransmit mode, where we send out the collected messages.
  • When processing the TimeChange internal message, we might notice that we didn’t receive a Heartbeat for a period, exceeding the pre-agreed heartbeat interval. In that case we “ping” the other end with a TestRequest message and transition to the WaitingForHeartbeat mode. Failure to receive the requested heartbeat message in that mode results in an abnormal session termination.

Verifying and using the model

To formalise “the English-prose” specifications we break them into logically distinct Verification Goals (VGs) – functions encoding expected behaviour of the model — often covering infinitely many possible scenarios — that return a Boolean value. Under the hood, Imandra will convert these into theorems and will attempt find a concrete counterexample demonstrating why the property is False or, in most instances, return a proof that the property holds.

Consider a simple requirement: “Messages that are garbled will be ignored (sequence counter will not be incremented).”

The theorem has a typical implicational structure for a VG:

(cond1 state && cond2 state && cond3 state) ==> conclusion state

Meaning that the conditions 1, 2 and 3 imply the conclusion (for convenience, Imandra introduces the logical implication operator ==>). The conditions and conclusion are dependent on the state parameter of the theorem. In the example the theorem assumes the following conditions:

  • the state has an incoming FIX message and it is Garbled
  • the state doesn’t have any incoming internal messages (these always have priority over FIX messages)
  • the state is in an appropriate mode for receiving FIX messages

If all these conditions are met, then the theorem concludes that calling one_step on this state results in another state' that is equal to the original one except for the incoming_fix_msg field (which gets cleared). Executing the theorem code in Imandra we check that the theorem is, indeed, formally correct:

val incoming_msg_garbled : fix_engine_state -> bool = <fun>
val garbled_are_ignored : fix_engine_state -> bool = <fun>
3 base ground recursive instances
[✓] Theorem proved.

As one can see, the formal language of the VG reveals a number of implicit details in the “English-prose” formulation of the specification, making it much more precise. Furthermore, we can use Imandra to discover these implicit assumptions by first encoding a “naive” version of the theorem and having Imandra disprove it, providing us with a counterexample — a concrete state that violates the statement of the theorem.

Say, we want to formally check that the following property of the fix-engine holds: “If the incoming sequence number is greater than expected, it indicates that messages were missed”. We encode it in IML like that:

Again, we require the state to be in ActiveSession mode without any incoming internal messages, and some incoming FIX message with a sequence number exceeding the expected one. The theorem concludes that the state should transition to the GapDetected mode. Everything looks OK, but Imandra fails to prove it:

module CX : sig val state : Fix_engine_state.fix_engine_state end
Counterexample (after 3 steps, 0.124s):
[✗] Conjecture refuted.
Error: out_of_seq_leads_to_gapdetected is not a theorem.

Examining the state in the generated counterexample CX module, we discover that its incoming message is an administrative sequence reset message without the GapFill flag:

And indeed, we have missed a special case mentioned in the FIX specification: “For SeqReset-Reset ignore the incoming sequence number. The NewSeqNo field will contain the sequence number of the next message to be transmitted.

Adding this sequence reset message as an exemption in our theorem, we get another counterexample. This time, it is a business message with a Possible Duplicate flag:

The FIX specification is not very clear on what one has to do with the Possible Duplicate messages that have gaps in sequence numbers. At this point we can either treat this as a bug in our model or as something that we shall add as an explicit assumption in our VG. Choosing the second option we arrive at the final formal specification of the property “If the incoming sequence number is greater than expected, it indicates that messages were missed” with all the implicit assumptions made explicit:

The theorem gets proven successfully:

val incoming_wrong_sequence_number: fix_engine_state -> bool = <fun>
val incoming_not_a_sequence_reset : fix_engine_state -> bool = <fun>
val incoming_not_a_poss_duplicate : fix_engine_state -> bool = <fun>
val out_of_seq_leads_to_gapdetected:fix_engine_state -> bool = <fun>
3 base ground recursive instances
[✓] Theorem proved.

The procedure outlined above illustrates how one can use Imandra to iteratively create the Verification Goals and prove their correctness (or find logical inconsistencies in either the model or the specification). The resulting VGs formulate precise properties of the model, explicitly mentioning all the possible edge cases.

The fix-engine model and the collection of the formally verified VGs establish the mathematically rigorous description of the FIX protocol. In addition to that, the model is also executable — since the model is written in valid OCaml, it can be compiled and run as session control module of a FIX client. In fact, we use this FIX engine model in our work for live on-exchange trading.

As a result, we have a formally verified executable model for a session level logic of a FIX protocol, together with a collection of unambiguous, mathematically rigorous and proven for correctness statements about the properties of this model. We hope this project will be the beginning of an industry-wide effort to mathematically formalise the rules and algorithms that run the financial markets. This formalisation will significantly benefit the industry, cutting downtime and costs for connecting (and ensuring regulatory compliance of) the numerous systems relying on this protocol.

To stay in touch with our updates, please sign up on our website.