Find My Fills: A Technical Deep Dive

Prerak Sanghvi
Published in Proof Reading
Aug 22, 2019

Last week, we announced our newest tool: Find My Fills, a single-stock exploration tool that lets a user drill down to market events at the millisecond timescale. The tool aims to provide some insight into the market circumstances that preceded and followed the executions for a user’s order. It does this by applying a set of heuristics to match the uploaded executions to specific trades on the consolidated trade feed and calculating execution metrics such as markouts, slippage, and reversion.

In this post, we take a technical deep dive into how we built this tool and some of the things we learned along the way. We are making the full codebase for this tool public here and here under MIT license.

High-Level Goals

We wanted to build a tool that could one day become a part of the standard trading workflow for our prospective customers as well as the broader trading community. We were acutely aware that running such a data- and computation-intensive tool at scale can have pitfalls, and we didn’t want this to become a thing that we have to babysit every single day.

Below were some of the high-level goals and the corresponding decisions and trade-offs that we made.

Rich and responsive user experience

  1. We decided to use React + Redux as we were familiar with those, and they are well-established frameworks for creating interactive web applications. React is responsible for view creation, while Redux acts as a state management layer and a pub/sub mechanism that facilitates communication between the model and the view.
  2. For charts, we picked HighCharts for the breadth and maturity of its features, as well as the accessibility and completeness of its documentation. However, HighCharts was created a decade ago and it does not integrate cleanly with React, so an adapter layer was needed that translates React state changes into HighCharts imperatives.
  3. The entire web app was server-rendered and converted to a static site using Next.js and served up using Gitlab Pages.

Keep things simple and standard

We wanted to minimize the number of moving parts and avoid reinventing the wheel at all costs.

  1. Use standard tools for processing data: Python, especially Pandas, is the clear choice for data science applications these days, primarily because of the maturity of the ecosystem. As mentioned in a previous post, we’re fans of kdb+/q, but that is currently cost-prohibitive for a company like us.
  2. Avoid hosting or managing any servers: We were already familiar with Google Cloud Platform and decided to keep it super simple and go with Google App Engine (GAE). For hosting a Python web application, GAE offers a choice of environments depending on how much control you need over the runtime. We picked the GAE “Standard” environment with support for Python 3.7 and Flask (as opposed to the GAE “Flexible” environment, which has more capabilities but also requires more upfront setup).
  3. Avoid hosting or managing any databases: This was an easy decision as we were sure we did not want to host a tick database containing historical market data. We think that a tick database should be a commodity item by now that is essentially the same in every firm across the industry, yet there are few compelling solutions available where you don’t have to go through the hassle of hosting and back-populating a multi-terabyte time-series database. Polygon.io comes close, and we decided to continue using them. As you’ll see below, this isn’t without its trade-offs.

Avoid information security pitfalls

Most users of this tool would consider their execution information to be sensitive data. We decided right up front that we would not store this data in any layer of the system. In the immortal words of Mr. Miyagi, “Best way to avoid punch, not be there.” We even considered doing all or most of the analysis using JavaScript in the browser, but eventually decided against it for various reasons, the biggest being that JavaScript is simply terrible with numbers (all numbers are 64-bit floats with only 53 bits of integer precision).

System Architecture

Polygon.io — Works Fairly Well

Polygon.io provides a simple, powerful, and reliable RESTful API for retrieving chunks of historical market data based on parameters such as Symbol, Type of data, and Start/End times. Note that we were working with the v2 API, which is not fully public on the website as of the time of this writing. The main features we used were:

  1. Aggregated OHLC (Open, High, Low, Close) bars: the most granular bar duration available is 1min, and you can only query for one or more full days at a time
  2. Full trades for a given time range: includes Venue codes, Sale Condition codes, and High-resolution timestamps (nanoseconds)
  3. NBBO for a given time range: available as two-sided quotes with Price/Size/Venue for bid and ask, although only one Venue code is included on each side. We assume this is the venue that set that side of the NBBO.

The REST endpoints for these features include these notable parameters:

  1. Limit: This is the maximum number of records that a query response will include. The max value of this parameter is 50,000. It is easy to see that this parameter was added as a safeguard against very large requests, but it was still inconvenient as we had to write our own pagination logic to ensure that we didn’t inadvertently retrieve partial data for a request.
  2. Reverse: When set to True, this parameter reverses the time direction of the query scan. We combined reverse=True with limit=1 in the NBBO query to get the prevailing NBBO at a given point in time, as sketched below.
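
The prevailing-NBBO lookup then looks roughly like the sketch below. The endpoint path and parameter names here are assumptions about the v2 API and should be checked against the documentation:

    import requests

    def prevailing_nbbo(symbol, date, ts_ns, api_key):
        """Fetch the last NBBO at or before ts_ns (nanoseconds since the epoch).

        The endpoint path and parameter names are assumptions based on the
        Polygon.io v2 historic-quotes API; verify them against the docs.
        """
        url = f"https://api.polygon.io/v2/ticks/stocks/nbbo/{symbol}/{date}"
        params = {
            "timestamp": ts_ns,   # anchor the scan at this timestamp...
            "reverse": "true",    # ...and walk backwards in time
            "limit": 1,           # only the single prevailing quote is needed
            "apiKey": api_key,
        }
        resp = requests.get(url, params=params, timeout=10)
        resp.raise_for_status()
        results = resp.json().get("results", [])
        return results[0] if results else None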

Comparison to a full tick database

While setting up a tick database is a hairy and expensive proposition, it does have its advantages:

Data Proximity
A typical pattern in the analysis of tick data, or any big data, is that the query engine needs to scan, filter, and aggregate vast amounts of data, only to produce a result set that is much smaller in comparison. There are two ways to achieve this result:

  1. Transport all of the underlying data to an analytics server, where it can be filtered and aggregated to produce the results
  2. Bring the analytics code to the data, where the analysis can be performed inline with the large scan to produce the results

There are pros and cons to both approaches, but the obvious benefit of the second approach is that you can avoid shuttling large amounts of throw-away data to the analytics server. For that to be possible, you need to have access to a database with server-side scripting capabilities (sometimes also known as Stored Procedures or User Defined Functions). When using a hosted tick data service like Polygon.io with Pandas, you are constrained to the first approach for the most part.

Richer set of operations
We expect that Polygon.io will continue to expand their API to support new operations as requested by their user base, but for now, we really missed the ability to perform as-of joins and window joins, which would have simplified the whole process significantly.
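
For reference, this is roughly what an as-of join looks like when done locally in Pandas: pandas.merge_asof pairs each execution with the most recent quote at or before its timestamp. The frames and column names below are illustrative:

    import pandas as pd

    # Illustrative data: a few NBBO quotes and two executions
    quotes = pd.DataFrame({
        "time": pd.to_datetime(["2019-08-22 09:30:00.000001",
                                "2019-08-22 09:30:00.250000",
                                "2019-08-22 09:30:01.100000"]),
        "bid": [10.00, 10.01, 10.02],
        "ask": [10.02, 10.03, 10.04],
    })
    executions = pd.DataFrame({
        "time": pd.to_datetime(["2019-08-22 09:30:00.300000",
                                "2019-08-22 09:30:01.500000"]),
        "price": [10.03, 10.02],
    })

    # As-of join: each execution picks up the most recent quote at or before its time
    marked = pd.merge_asof(executions.sort_values("time"),
                           quotes.sort_values("time"),
                           on="time", direction="backward")
    print(marked)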

Performance
Polygon.io API calls take anywhere from tens to hundreds of milliseconds, depending on the type and size of the query. We suspect that much of this latency arises simply from having to make an HTTPS call and the serialization of a relatively large chunk of data to JSON.

This latency quickly became noticeable with our execution analytics feature, where we were analyzing tens or hundreds of executions at a time. To calculate markouts, we needed to retrieve the prevailing NBBO at one second past the time of each execution, which required us to make one REST API call per execution. With each call taking ~500ms, this was an unworkable situation. The only mitigation we could come up with was to make the API calls in parallel from 16 worker threads, as sketched below. This solution is neither ideal nor scalable, but since we don’t have access to all of the underlying NBBO data, and Polygon.io does not support server-side join operations, this is the best we can do for now.
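
A minimal sketch of that work-around, assuming get_nbbo_at(ts_ns) is a helper that wraps the single-quote query sketched earlier:

    from concurrent.futures import ThreadPoolExecutor

    def markout_quotes(execution_times_ns, get_nbbo_at):
        """Fetch the prevailing NBBO one second after each execution, in parallel.

        get_nbbo_at(ts_ns) is a hypothetical helper wrapping the single-quote
        REST call; 16 threads mirrors the worker count mentioned above.
        """
        one_second_ns = 1_000_000_000
        with ThreadPoolExecutor(max_workers=16) as pool:
            futures = [pool.submit(get_nbbo_at, t + one_second_ns)
                       for t in execution_times_ns]
            return [f.result() for f in futures]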

Tape Matching — Fun And Easy

The process of finding one’s own executions/fills on the consolidated trade feed is known as Tape Matching. The key insight here is that if you want a view into the market microstructure events around your executions, you need to have precise timestamps. While the market data feeds include nanosecond timestamps, most buy-side firms, and even brokers, only receive their execution timestamps in either seconds or milliseconds.

Luckily, a relatively straightforward algorithm can be used to find most executions on the tape, making it possible to enrich the executions with precise timestamps and, if necessary, information about the executing venue (a Pandas sketch follows the list):

  1. Get all trades in the symbol within +/- 1 second of the execution
  2. Find any trades that match the size and price of the execution
  3. If venue information is available on the execution, use that to narrow down the list of potential matches
  4. If the list of potential matches still includes multiple trades, and if the execution timestamp includes milliseconds, use that information to pinpoint the trade
  5. If there are still multiple potential matches, then, unfortunately, the algorithm has failed to find the execution on the tape. There are a few different ways to deal with this situation, and in our case, we make the naive assumption that the first potential match is the correct one.
  6. If the algorithm does find a match, care must be taken to avoid matching the same trade to other executions that may have occurred around the same time.
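
A condensed Pandas sketch of the steps above, assuming the tape is a DataFrame of consolidated trades with nanosecond timestamps and the execution is a dict with second- or millisecond-level times; the column names are illustrative:

    import pandas as pd

    def match_execution(execution, tape, used_idx):
        """Find the tape print that best matches a single execution.

        execution: dict with 'time' (pd.Timestamp), 'price', 'size', optionally 'venue'
        tape: DataFrame of consolidated trades with 'time', 'price', 'size', 'venue'
        used_idx: set of tape row indices already claimed by earlier executions
        """
        window = pd.Timedelta(seconds=1)
        # Exact price/size equality is used here for simplicity; real data
        # may need rounding or a tolerance
        candidates = tape[(tape["time"] >= execution["time"] - window) &
                          (tape["time"] <= execution["time"] + window) &
                          (tape["price"] == execution["price"]) &
                          (tape["size"] == execution["size"]) &
                          (~tape.index.isin(used_idx))]

        # Narrow down by venue, if the execution report carries one
        if execution.get("venue") is not None and len(candidates) > 1:
            by_venue = candidates[candidates["venue"] == execution["venue"]]
            candidates = by_venue if not by_venue.empty else candidates

        # Narrow down by millisecond, if the execution timestamp has that precision
        if len(candidates) > 1:
            same_ms = candidates[candidates["time"].dt.floor("ms") ==
                                 execution["time"].floor("ms")]
            candidates = same_ms if not same_ms.empty else candidates

        if candidates.empty:
            return None                     # failed to find the execution on the tape
        match = candidates.index[0]         # naive assumption: take the first candidate
        used_idx.add(match)                 # don't let another execution claim this print
        return match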

Google App Engine — Works As Advertised

Google App Engine (GAE) ended up being the right turnkey solution for our application hosting needs. Once we set up our simple app.yaml configuration file, GAE provided a seamless deployment experience with minimal setup and no tedious DevOps processes or pipelines. Below are some of the features that we appreciated:

  1. Application dependencies are automatically installed simply by creating a standard requirements.txt file for your Python project
  2. Each deployment automatically creates a new version and traffic is gracefully migrated to the new version
  3. Automatic scaling of instances based on number of requests, CPU load and response times
  4. Log aggregation and Error monitoring: With a simple selection, we could get notified of any errors in the application with a full stack trace
  5. Continuous and automated latency monitoring
  6. We created a hosted PostgreSQL instance as a temporary market data cache, and it was trivial to set up, including multi-zone replication (high availability) and automated backups.

Some of the things we learned:

  1. If you do not specify a min_instances: 1 element in the app.yaml file, your application may be shut down entirely when there are no requests. This can add significant latency to a future request because a whole new instance must be started before the request can be serviced.
  2. You can configure the CPU and Memory Limit of the server that your application runs on by specifying the instance_class element. The default value of F1 only gives you a 600MHz CPU.
  3. We played around with Bigtable as a market data cache but were underwhelmed. It may be a scalable database with high query throughput, but for our use case, it was slow and cumbersome.

Authentication — Harder Than It Looks

As a team, we had very little experience building a web application with a login screen. We didn’t want to get into the business of managing a user database with emails and passwords. We also didn’t know how we could create a functional login screen when our entire web app was prerendered into a static site. Enter Firebase Authentication, which offers a full-featured authentication system with multiple OAuth providers entirely within the browser. The integration with the server API took a bit of figuring out, and the final flow looks like this (a server-side sketch follows the list):

  1. The user authenticates in the browser using FirebaseUI.
  2. Once authenticated, Firebase issues a callback to our application with an ID Token (essentially a JWT). We send this token in a POST request to the server at an endpoint named /initSession.
  3. The server independently validates this token with Firebase and asks Google to create a signed version of the “user claims” associated with this token. This signed version of the claims is set as a cookie in the response.
  4. For any further requests to the REST application, such as getting chart data or analyzing fills, the browser will automatically include this auth cookie with the request.
  5. For each request, the server uses Google’s public certificates to validate the signature on the cookie, and once validated, the parsed claims in the cookie are used without further authentication. The benefit of this approach is that we do not need to make a call to Google each time we need to validate the cookie, which would, of course, be expensive.
  6. If the user clicks the ‘Log Out’ button, the web app makes a POST request to the server at /endSession endpoint, which clears the previously set auth cookie.
  7. If a request is received by the server without the auth cookie, it will be rejected with a 401 Unauthorized message.
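
A minimal Flask sketch of steps 2 through 5, using the firebase_admin package; the endpoint name comes from the flow above, everything else is illustrative:

    import datetime
    from flask import Flask, abort, jsonify, make_response, request
    import firebase_admin
    from firebase_admin import auth

    firebase_admin.initialize_app()     # picks up default credentials on GAE
    app = Flask(__name__)

    @app.route("/initSession", methods=["POST"])
    def init_session():
        id_token = request.form["idToken"]          # ID token sent by the browser
        expires_in = datetime.timedelta(days=5)
        # Exchange the short-lived ID token for a signed session cookie (signed claims)
        session_cookie = auth.create_session_cookie(id_token, expires_in=expires_in)
        resp = make_response(jsonify(status="ok"))
        resp.set_cookie("session", session_cookie,
                        max_age=int(expires_in.total_seconds()),
                        httponly=True, secure=True)
        return resp

    def current_user_claims():
        """Validate the auth cookie using Google's public certs (no per-request RPC)."""
        cookie = request.cookies.get("session")
        if not cookie:
            abort(401)
        try:
            return auth.verify_session_cookie(cookie)
        except Exception:
            abort(401)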

Precision Navigation — A UX Challenge

A typical stock chart consists of the main chart area and a navigator mini-chart below it. The navigator includes a slider that controls the range of data that is displayed in the main chart.

In our case, since our chart can zoom in to sub-second time scales, the slider would simply not be able to provide efficient navigation or context. For one, a slider region representing one second of a full 6.5-hour trading day (23,400 seconds) would be only about 0.08 pixels wide on a 1920x1080 resolution screen. For another, the slider could never be used to move around with any sort of precision at that time scale.

We solved this problem by creating a UX component with multiple navigation sliders, each of which represents a different resolution of time. This custom navigation and control mechanism presents different sets of sliders to the user depending on the time scale at play. Much like the beam and rider system on a triple beam scale, multiple independently calibrated controls work together to provide arbitrary precision and control when navigating.

Web Security — Details Matter

Even though we were not storing any sensitive data, we wanted to ensure that we were following best practices. Some of the considerations were as follows (a Flask sketch follows the list):

  1. HTTP Strict Transport Security (HSTS): ensures that the website is always accessed using a secure connection. This prevents inadvertent access over insecure HTTP and related man-in-the-middle attacks. To address this concern, we add the following header to each response:
    Strict-Transport-Security: max-age=31536000; includeSubDomains
  2. Set-Cookie options: When setting an authentication-related cookie, the cookie is marked as httponly and secure. Setting httponly=True helps thwart cross-site scripting attacks by ensuring that the cookie cannot be accessed through a client-side script. Setting secure=True avoids inadvertent transmission of the cookie over insecure connections.
  3. Cross-Origin Resource Sharing (CORS): is a mechanism that ensures that only web pages running at approved domains can access certain resources. In our case, the application only allows cross-origin requests to the RESTful API from one domain: https://findmyfills.com
  4. Cross-Site Request Forgery (CSRF): is typically used in phishing or other social engineering attacks to coerce an already authenticated user to take some undesirable action, such as deleting a record or otherwise updating some state on the server. This attack takes advantage of the fact that once logged-in, the browser will automatically send the auth cookie to the server with each request. In our case, we deemed this not to be a real concern because the server application is essentially stateless.
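
A Flask-flavored sketch of how items 1 through 3 might be wired up; the allowed origin comes from the list above, the rest is illustrative:

    from flask import Flask

    app = Flask(__name__)
    ALLOWED_ORIGIN = "https://findmyfills.com"

    @app.after_request
    def add_security_headers(resp):
        # Item 1 (HSTS): always use HTTPS for this domain and its subdomains
        resp.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
        # Item 3 (CORS): only the findmyfills.com front end may call this API cross-origin
        resp.headers["Access-Control-Allow-Origin"] = ALLOWED_ORIGIN
        resp.headers["Access-Control-Allow-Credentials"] = "true"
        return resp

    # Item 2: when setting the auth cookie (see the /initSession sketch above)
    #   resp.set_cookie("session", session_cookie, httponly=True, secure=True)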

Python and Pandas — Mostly Good

Let’s get one thing out of the way — Python is slow. Python is popular because it is an expressive language, and the power of horizontal scaling afforded by the cloud makes it so that the slowness does not matter much.

But, it is slow — to the point that the only way to do fast computation within Python is to not use Python. Case in point — Pandas and Numpy.

Pandas is a Python library for data science that is highly optimized for performance, with critical code paths written in Cython or C. Pandas also depends upon and interoperates with NumPy, another high-performance library written in C. NumPy provides an efficient implementation of multi-dimensional arrays and matrices, with support for fast vectorized operations.

The code blocks below demonstrate the dramatic difference in performance between vectorized operations and their non-vectorized counterparts. In this instance, the vectorized calculations are 60x faster.

Vectorized OHLC calculation
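
A minimal sketch, assuming a trades DataFrame with a DatetimeIndex and a price column; the resample().ohlc() call does the per-minute aggregation inside Pandas’ compiled code:

    import pandas as pd

    def ohlc_vectorized(trades: pd.DataFrame) -> pd.DataFrame:
        """One-minute OHLC bars via vectorized Pandas operations.

        trades must have a DatetimeIndex and a 'price' column.
        """
        return trades["price"].resample("1min").ohlc()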

Non-vectorized OHLC calculation
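
And the equivalent calculation done one row at a time in pure Python:

    import pandas as pd

    def ohlc_loop(trades: pd.DataFrame) -> pd.DataFrame:
        """The same one-minute OHLC bars, built row by row in a Python loop."""
        bars = {}
        for ts, price in trades["price"].items():
            minute = ts.floor("1min")
            if minute not in bars:
                bars[minute] = {"open": price, "high": price,
                                "low": price, "close": price}
            else:
                bar = bars[minute]
                bar["high"] = max(bar["high"], price)
                bar["low"] = min(bar["low"], price)
                bar["close"] = price
        return pd.DataFrame.from_dict(bars, orient="index")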

Python Datetimes and Timezones — Mostly Bad

Nanoseconds

A quick word on JavaScript before we talk about Python. JavaScript Date only supports timestamps with a granularity of milliseconds. Not only that, the JavaScript Number type is not even capable of representing the number of nanoseconds elapsed since the Unix Epoch. The disappointing consequence is that we were not able to coerce JavaScript or HighCharts to plot trades or quotes at the sub-millisecond granularity.

Python datetime supports microseconds, but not nanoseconds. If you really want to use nanoseconds, you must use the NumPy datetime64 data type, although you will invariably need to also deal with pandas.Timestamp and data type strings like datetime64[ns] and <M8[ns].

Fun fact: NumPy datetime64 also supports picoseconds, femtoseconds, and attoseconds, although there are caveats regarding the range of timestamps that can be represented when using the increased granularity.

An illustration of the multiple different date-related types that are available:
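
A minimal sketch with illustrative values:

    import datetime
    import numpy as np
    import pandas as pd

    # Standard library: microsecond resolution, time zone naive by default
    py_dt = datetime.datetime(2019, 8, 22, 9, 30, 0, 123456)

    # NumPy: nanosecond resolution, stored as a 64-bit offset from the epoch
    np_dt = np.datetime64("2019-08-22T09:30:00.123456789")

    # Pandas: wraps datetime64[ns] and adds time zone handling
    pd_ts = pd.Timestamp("2019-08-22 09:30:00.123456789", tz="UTC")

    print(type(py_dt))    # <class 'datetime.datetime'>
    print(np_dt.dtype)    # datetime64[ns]
    print(pd_ts.tz)       # UTC
    print(np.dtype("datetime64[ns]") == np.dtype("<M8[ns]"))   # True: one dtype, two spellings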

And here’s a function that demonstrates how complex this can get, even just to retrieve nanoseconds since Epoch:
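
A sketch of one such function, assuming the three types above are the ones that need handling:

    import datetime
    import numpy as np
    import pandas as pd

    def nanos_since_epoch(ts) -> int:
        """Best-effort conversion of assorted timestamp types to ns since the Unix epoch."""
        if isinstance(ts, pd.Timestamp):
            return ts.value                                   # stored internally as ns
        if isinstance(ts, np.datetime64):
            return int(ts.astype("datetime64[ns]").astype("int64"))
        if isinstance(ts, datetime.datetime):
            if ts.tzinfo is None:                             # assume naive means UTC
                ts = ts.replace(tzinfo=datetime.timezone.utc)
            epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
            micros = (ts - epoch) // datetime.timedelta(microseconds=1)
            return micros * 1000                              # datetime tops out at microseconds
        raise TypeError(f"Unsupported timestamp type: {type(ts)}")

    # All three represent the same instant
    print(nanos_since_epoch(pd.Timestamp("2019-08-22 13:30:00", tz="UTC")))
    print(nanos_since_epoch(np.datetime64("2019-08-22T13:30:00")))
    print(nanos_since_epoch(datetime.datetime(2019, 8, 22, 13, 30, 0)))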

Time zones

If you thought date-related data types were complicated, wait till you read about time zones.

To begin with, all Python datetime objects are time zone “naive” by default. So, let’s say you want a time zone “aware” object, and after some digging, you find that the datetime constructor takes a datetime.tzinfo argument. You’d think you found the solution, until you realize that Python does not ship with an implementation for known or common time zones. For that, you need to download and install a whole separate package, typically pytz. Except that doesn’t work either: using the time zone argument of the standard datetime constructor simply does not work with pytz for many time zones. To create a localized datetime object, you need to use the localize() method provided by the pytz.timezone class. And after all that, once you do manage to create a time zone aware datetime object, be careful about performing date arithmetic, because if you happen to cross a DST boundary, you may end up with an invalid time.
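
A quick illustration of that pitfall:

    import datetime
    import pytz

    eastern = pytz.timezone("US/Eastern")

    # Passing the pytz zone to the constructor silently uses its base (LMT) offset,
    # which is a few minutes off from the real Eastern offset
    wrong = datetime.datetime(2019, 8, 22, 9, 30, tzinfo=eastern)
    print(wrong)    # 2019-08-22 09:30:00-04:56

    # The supported way: let pytz attach the correct, DST-aware offset
    right = eastern.localize(datetime.datetime(2019, 8, 22, 9, 30))
    print(right)    # 2019-08-22 09:30:00-04:00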

We have come to the conclusion that time zones and local times should be avoided altogether when using Python.

Use of Pandas does improve the situation, as it includes a full complement of time zone functionality through the appropriate use of the pytz and dateutil packages. However, if you do end up with a DataFrame time column containing values in the US Eastern time zone and you want to convert these values to UTC, this is the level of acrobatics that it takes.
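
A minimal sketch, assuming a naive time column whose values represent US Eastern wall-clock times:

    import pandas as pd

    df = pd.DataFrame({"time": pd.to_datetime(["2019-08-22 09:30:00.123",
                                               "2019-08-22 09:30:01.456"])})

    df["time"] = (df["time"]
                  .dt.tz_localize("US/Eastern")   # declare what the naive values mean
                  .dt.tz_convert("UTC")           # convert the aware values to UTC
                  .dt.tz_localize(None))          # drop tz info if naive UTC is expected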

Closing Thoughts

If you have questions or comments about the tool, please drop us a note at fmf@prooftrading.com.

All in all, this was a fun exploratory project that forced us to look into technical areas that we were not entirely comfortable with. We expect this experience to come in handy as we embark on the journey of building out an entire sell-side algorithmic trading system in the next few months!
