Market data pricing: part 3 of many

Published in

Proof Reading

9 min read · Jun 26, 2019

Previously on market data pricing, we discussed the top level framework of real-time market data billing, and how it represents a high barrier to entry for new brokers and reinforces the status quo. In this episode, we will introduce a visualization tool we have built that you can use to see what the total monthly market data bill would be for various scenarios and how it breaks down.

This tool is designed to give you a view of how the underlying variables (that we defined in part 1 of this series) combine to drive the price of each individual data product, and then how the individual product prices combine to form the total market data bill. Naturally, when we say “total market data bill,” this should be imagined with an asterisk and a footnote read in the voiceover style of a side-effect list in a pharmaceutical ad: *not including physical connectivity costs, logical connectivity costs, or co-location fees, and subject to change upon immediately effective filings as prescribed as part of Dodd-Frank for some reason…

The tool currently does not include data products from IEX, NYSE National, or NYSE Chicago. For IEX and NYSE National, this is because there are currently no market data fees. For NYSE Chicago, the data product offerings are a bit in flux due to the recent acquisition. The tool also does not currently account for “enterprise” options, which substitute a high flat fee for scaling user fees. These typically only make sense once the number of users reaches several thousand and beyond. We left these out for simplicity for now, since we are especially interested in elucidating the costs faced even by a small broker-dealer like the one Proof will be, for whom enterprise options are unlikely to be relevant.

For a small broker-dealer, we can set modest values for the variables and see the huge variance in our potential bill based on which data products we choose. For example, if we were to consume only the UTP and CTA SIPs, and did so directly with a modest infrastructure of 2 servers, 5 user accounts, and 2 people viewing displays, using all of this for agency trading as well as a small amount of proprietary trading (for algorithm testing purposes, etc.) and not redistributing data to any external parties, our monthly bill would be a bit greater than $14,000.

If we additionally consumed depth of book feeds from just the two largest exchanges (NYSE and Nasdaq) with the same setup, our monthly bill would be above $60,000 (above $80,000 if we consume the Nasdaq TotalView feed in ITCH/FPGA format). If we consumed top-of-the-line depth of book feeds from all of the exchanges, our monthly bill would jump to above $140,000 a month.

Thus, the tool gives us a quick way of determining and visualizing the differences in cost between different patterns of market data consumption. For a small broker-dealer, consuming the minimal products versus consuming the top of the line products represents a multiplicative factor of 10 difference in the monthly bill! Over the course of a year, consuming depth of book feeds across the market would cost Proof somewhere between $1 million and $2 million, and that’s before we account for connectivity and other related costs.
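The arithmetic behind those comparisons can be checked in a few lines. This is only a sketch: the two monthly figures are the rounded lower bounds quoted above ("a bit greater than $14,000" and "above $140,000"), not exact outputs of the tool, and the variable names are ours.

```python
# Rounded monthly figures quoted in the text (lower bounds, not exact bills).
minimal_monthly = 14_000   # SIPs only
top_monthly = 140_000      # top-of-the-line depth feeds from all exchanges

# Ratio between the two consumption patterns.
ratio = top_monthly / minimal_monthly

# Annualized cost of the top-of-the-line pattern, before connectivity costs.
annual_top = top_monthly * 12

print(f"ratio: {ratio:.0f}x")
print(f"annual: ${annual_top:,}")
```

Since both inputs are lower bounds, the annual figure is a floor, which is consistent with the "$1 million to $2 million" range above.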

You can play around with the parameters and the visualization yourself to get a better feel for how the fees add up. Here are a few more examples to get you started:

What a market data bill might look like for a proprietary trading firm.
What a market data bill might look like for a full service broker-dealer.

To describe what is happening behind the scenes in the tool, we’ll dig into a few specific data products and understand how they are billed in excruciating detail. We will take one data product from each of the three major exchange families as an example and compute its price as a formula over our underlying variables. We’ll compute this from the perspective of a broker-dealer company C that is consuming the data for its own purposes and possibly redistributing it to others as well.

Sounds fun, right? Don’t worry, it weirdly is. Mostly because market data pricing minutiae are full of little oddities that tickle me. Did you know the CBOE “One feed” has two versions? Come on, that’s funny!

CBOE family: BZX depth

In the CBOE family of market data products, we’ll pick BZX depth data as our illustrative example. This data feed immediately distributes information about quotes and executions on the BZX exchange, covering all quotes at all price levels. This is in contrast to “top of book” data products that only distribute information about quotes to buy at the highest current price level for buyers or quotes to sell at the current lowest price level for sellers.

The variables that drive market data bills in the CBOE-verse are:

A:= whether the data is being used by broker-dealer C to run a dark pool or other trading platform. We’ll set A = 1 if yes, and A=0 if no.

V:= the number of human employees of the broker-dealer C who will have access to the data (we’ll assume these are all classified as professional users).

E:= the number of external parties the broker-dealer C is redistributing the data to

F:= the number of external humans whose access/entitlements to the data is controlled by broker-dealer C and who are classified as professional users

G:= the number of external humans whose access/entitlements to the data is controlled by broker-dealer C and who are classified as non-professional users

As a function of these variables, we can compute the monthly amount that broker-dealer C will owe to CBOE for its consumption of BZX depth data:

The flat fee will be $1500 if E=0 (no external distribution), and $5000 if E>0 (some external distribution)
The non-display fee will be $2000 + $5000*A
The user fees will be $40*V + $40*F + $5*G

The $1500 distribution fee if there is no external distribution appears as the “internal distribution fee” in the price list, whereas the $5000 value appears as the “external distribution fee,” and it replaces the internal distribution fee when it applies. To express this relationship in concise mathematical notation, we’ll define a binary variable H that = 0 if E = 0 and = 1 if E > 0. We can then express the monthly bill for this particular data product as:

1500*(1 - H) + 5000*H + 2000 + 5000*A + 40*V + 40*F + 5*G
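As a minimal sketch, the BZX depth formula can be written as a small Python function. The function name and the example inputs are our own illustration, not taken from the tool, and H is taken to be 1 whenever there is any external distribution (E > 0), matching the fee description above.

```python
def bzx_depth_monthly_fee(A: int, V: int, E: int, F: int, G: int) -> int:
    """Monthly BZX depth bill in dollars, following the formula in the text.

    A: 1 if the data is used to run a trading platform, else 0
    V: internal (professional) users
    E: external parties receiving redistributed data
    F: external professional users, G: external non-professional users
    """
    H = 1 if E > 0 else 0  # any external distribution at all
    distribution = 1500 * (1 - H) + 5000 * H  # internal fee is replaced, not added
    non_display = 2000 + 5000 * A
    user_fees = 40 * V + 40 * F + 5 * G
    return distribution + non_display + user_fees


# Hypothetical small firm: no trading platform, 2 internal users, no redistribution.
print(bzx_depth_monthly_fee(A=0, V=2, E=0, F=0, G=0))  # 1500 + 2000 + 80 = 3580
```

Writing the distribution fee as `1500 * (1 - H) + 5000 * H` keeps the code term-for-term aligned with the formula, at the cost of being slightly less direct than an if/else.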

Nasdaq family: Nasdaq TotalView

In the Nasdaq family of market data products, we’ll pick Nasdaq TotalView as our illustrative example. This is also a depth-of-book data feed that immediately disseminates information about trades and quotes at all price levels on the Nasdaq exchange. It also includes imbalance information leading up to the opening and closing auctions. As Nasdaq’s website describes it, TotalView is “the standard Nasdaq data feed for serious traders.”

The variables that drive market data bills in the Nasdaq-verse are:

R := whether data is being consumed in the native Nasdaq format, as opposed to post-normalization by some other party. We’ll set R = 1 for native Nasdaq format, and R=0 for normalized data.

Z:= the number of servers company C uses to process and compute on the data

W:=the number of internal user accounts for employees of company C to access the data

E:= the number of external parties the broker-dealer C is redistributing the data to

F:= the number of external humans whose access/entitlements to the data is controlled by broker-dealer C and who are classified as professional users

G:= the number of external humans whose access/entitlements to the data is controlled by broker-dealer C and who are classified as non-professional users

As a function of these variables, we can compute the monthly amount that broker-dealer C will owe to Nasdaq for its consumption of TotalView data:

The flat fee will be $3000*R, plus $1500 if E=0 (no external distribution) or $3750 if E>0 (some external distribution)
The non-display fee will be $375*Z if Z<40, or $15000 if 40 ≤ Z < 100, or $30000 if 100 ≤ Z < 250, or $75000 if Z ≥ 250.
The user fees will be $76*(W+F) + $15*G

To express this relationship in concise mathematical notation, we’ll again define a binary variable H that = 0 if E = 0 and = 1 if E > 0. We’ll also define a function f(Z) such that f(Z) := 375*Z for Z < 40, f(Z) = 15000 for 40 ≤ Z < 100, f(Z) = 30000 for 100 ≤ Z < 250, and f(Z) = 75000 for Z ≥ 250. We can then express the monthly bill for this particular data product as:

3000*R + 1500*(1 - H) + 3750*H + f(Z) + 76*(W+F) + 15*G.

It should be noted that to receive this feed as ITCH/FPGA (which is desirable for speed), the monthly internal distribution fee jumps up to $25,000 (rather than the $1500 above).
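Here is a rough Python sketch of the TotalView formula, including the tiered non-display fee f(Z). The function names and example inputs are ours, H is taken to be 1 whenever E > 0, and the sketch covers only the standard format: the ITCH/FPGA pricing is left out because we only quoted its internal distribution fee.

```python
def totalview_non_display_fee(Z: int) -> int:
    """Tiered non-display fee f(Z) in dollars, where Z is the number of servers."""
    if Z < 40:
        return 375 * Z
    if Z < 100:
        return 15000
    if Z < 250:
        return 30000
    return 75000


def totalview_monthly_fee(R: int, Z: int, W: int, E: int, F: int, G: int) -> int:
    """Monthly Nasdaq TotalView bill in dollars (standard format only)."""
    H = 1 if E > 0 else 0  # any external distribution at all
    flat = 3000 * R + 1500 * (1 - H) + 3750 * H
    user_fees = 76 * (W + F) + 15 * G
    return flat + totalview_non_display_fee(Z) + user_fees


# Hypothetical small firm: native format, 2 servers, 5 internal accounts.
print(totalview_monthly_fee(R=1, Z=2, W=5, E=0, F=0, G=0))  # 4500 + 750 + 380 = 5630
```

One detail the tiers make visible: at 39 servers the per-server fee comes to $14,625, so the $15,000 flat tier starting at 40 servers is nearly continuous with it.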

NYSE family: NYSE Integrated feed

In the NYSE family of market data products, we’ll pick NYSE Integrated feed as our illustrative example. Similar to the BZX depth and the Nasdaq TotalView products above, the NYSE Integrated feed includes full information about trades, quotes at all price levels, and auction data for the NYSE exchange.

The variables that drive market data bills in the NYSE-verse are:

A’:= the number of dark pools or other trading platforms that are being run by the broker dealer C using this data product, capped at 3. In other words, if the broker dealer C is not running any trading platforms, then A’=0. If it is running one trading platform, then A’ =1. If it is running 2 trading platforms, then A’ = 2. If it is running three trading platforms, then A’ = 3. If it is running four or more trading platforms, then A’ still = 3. (We are calling this A’ instead of A because it is related to the A above, but not quite the same.)

B:= whether the data is being used for proprietary trading or not. B = 1 if yes, B = 0 if no.

C:= whether the data is being used for agency trading or not. C = 1 if yes, C = 0 if no. (This variable C is distinct from the company C; context should make clear which one we mean.)

W:=the number of internal user accounts for employees of company C to access the data

E:= the number of external parties the broker-dealer C is redistributing the data to

F:= the number of external humans whose access/entitlements to the data is controlled by broker-dealer C and who are classified as professional users

G:= the number of external humans whose access/entitlements to the data is controlled by broker-dealer C and who are classified as non-professional users

As a function of these variables, we can compute the monthly amount that broker-dealer C will owe to NYSE for its consumption of the NYSE Integrated feed:

The flat fee will be $7500, plus $4000 if E>0 (some external distribution)
The non-display fee will be $20000*(A’+B+C)
The user fees will be $70*(W+F) + $16*G

To express this relationship in concise mathematical notation, we’ll again define a binary variable H that = 0 if E = 0 and = 1 if E > 0. We can then express the monthly bill for this particular data product as:

7500 + 4000*H + 20000*(A’+B+C) + 70*(W+F) + 16*G.
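The NYSE Integrated formula can be sketched the same way. The function name, the `platforms` parameter (the raw count of trading platforms, from which A’ is derived), and the example inputs are our own illustration. As before, H is taken to be 1 whenever E > 0.

```python
def nyse_integrated_monthly_fee(
    platforms: int, B: int, C: int, W: int, E: int, F: int, G: int
) -> int:
    """Monthly NYSE Integrated feed bill in dollars, following the formula in the text.

    platforms: number of trading platforms run on this data (capped at 3 to give A')
    B: 1 if used for proprietary trading, C: 1 if used for agency trading
    """
    A_prime = min(platforms, 3)  # the cap described in the definition of A'
    H = 1 if E > 0 else 0        # any external distribution at all
    flat = 7500 + 4000 * H       # here the external fee is added, not substituted
    non_display = 20000 * (A_prime + B + C)
    user_fees = 70 * (W + F) + 16 * G
    return flat + non_display + user_fees


# Hypothetical small firm: no platforms, agency and proprietary use, 5 internal accounts.
print(nyse_integrated_monthly_fee(platforms=0, B=1, C=1, W=5, E=0, F=0, G=0))  # 47850
```

Note the structural difference from the other two products: the external distribution fee here is added on top of the base fee rather than replacing it.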

We have computed these formulas for each data product, which we derived from the various pricing information and policies provided by the exchanges. These form the basis of our tool, and we will update them as policies evolve, or as corrections are needed. The market data pricing policies are long and convoluted, so while we have done our best and completed a thorough review of all the pricing documentation and all of our formulas, it’s always possible we’ve misinterpreted a few things or lost something in translation. If you notice any inaccuracies, please let us know and we will gratefully fix them!

Over time, we plan to supplement this tool with effective representations of other components of the costs and incentives that broker-dealers face, such as connectivity and transaction costs.

Written by Allison Bishop