How to Comply with MiFID II Recordkeeping without Killing your Low Latency Trading Performance
The new Corvil Application Agent lets you instrument your low-latency applications with just 10 nanoseconds of overhead.
You really can’t get a lot done in 10 nanoseconds. In that time, light travels just three meters. Today, the things most commonly measured in nanoseconds are nuclear reactions and low-latency trading.
In 2011, when Corvil introduced nanosecond resolution for our latency reporting, even microseconds seemed like overkill to most people. But today, firms are routinely tracking changes in the fill-rates of latency-sensitive algorithms in increments of 100 nanos. In the right places, for the right trading strategies, raw speed still wins.
In these environments, MiFID II presents particular additional challenges. All trading covered under the MiFID II HFT rules will be obliged to log decisions to trade, and the receipt and transmission of orders, with high accuracy: RTS-25 specifies 1 microsecond resolution with divergence from UTC of no more than 100 microseconds (see our earlier posts for more discussion on MiFID II and RTS-25).
The extra headache for those covered by these rules is that this kind of logging generally means you’re going to slow down your low-latency components: feed-handling, trading decisions, and order processing. If 100 nanos is going to impact your fill-rates, that really doesn’t leave you any slack for adding in the required event logging.
Through late 2015 and into 2016, this topic came up increasingly frequently in my conversations with customers, as MiFID II moved from something distant to a visible deadline. So we started to think about the problem. Here’s what we came up with.
Key observation 1: Not all events need to be logged with software timestamps.
The MiFID II specifications are explicit about when you must log HFT events with microsecond resolution and < 100us divergence from UTC. But, reading the regulations, it seems clear that many reportable events can be measured directly from the network, with zero changes to applications. The words used are “The exact date and time of the submission of an order to the trading venue or other investment firm” and “shall be able to identify the exact point at which a timestamp is applied and demonstrate that the point within the system where the timestamp is applied remains consistent…”
Industry bodies such as STAC, as well as customers I’ve met, are proceeding under the assumption that wire timestamps are ideal for defining the time of receipt and transmission of orders, as well as gateway-gateway response time.
This lets you offload the requirement for logging from the applications themselves, and instead timestamp the network packets carrying the data directly. Using well-defined network timestamps to log the receipt and transmission of orders, or the gateway-gateway latency, should comply with the regulations.
This allows you to capture and timestamp the events you need, accurately, with zero impact on your apps.
Key observation 2: Well implemented application instrumentation can have very low overhead.
Some MiFID II reportable events, such as matching, occur in software and are most naturally timestamped in software as opposed to on the network. On a case-by-case basis, arguments based on the system design may show that network events are guaranteed to be “close enough” to comply with the strictures of RTS-25 (e.g. FPGA “software” events publishing to the network in a fraction of a microsecond).
But that still leaves us with cases where events need to be captured from inside apps, with accurate UTC-synchronised software timestamps.
Setting aside the issue of server UTC synchronisation (we’ve written a whole eBook about clock synchronization if you’re interested), the remaining problem is how to capture events with the lowest possible overhead.
Traditional approaches fall into a few categories, with decreasing overhead:
- Traditional logging via log-files or syslog (tens of microseconds)
- Logging offload to separate thread (microseconds)
- Hardware assisted offload, using shared memory to log to a hardware logger (hundreds of nanoseconds)
Even with the hardware-assisted approach, getting per-event latency much below 100ns has generally not been possible until now.
When our engineers focussed on this problem, the first point they jumped on is that the minimum possible overhead for logging an event is the time it takes to read the clock. So, let’s start with the fastest available clock source. It turns out that this is the hardware clock-cycle counter provided by the CPU. It doesn’t tell you the time of day, just the number of clock cycles since the machine was switched on. But you can read it in under 10 nanoseconds.
You then have two further tasks:
- Having taken the timestamp, you have to publish the timestamp and data about the event — let’s do this work after the end of a critical code section (such as, completing the matching event)
- You have to convert the hardware time to UTC — we can do this by carefully interpolating between observations of the PTP-synchronized system time and the hardware clock
This, in a nutshell, is the approach we took when building the Corvil Application Agent, which means an application developer can use our library to publish software events with less than 10 nanoseconds of overhead on the critical code sections.