Trading Data Should Belong to the Customer, not the Broker

Daniel Aisen
Published in Proof Reading
Jan 13, 2022

Proof Trading is enacting a new data policy that hands back control over trading data to each respective customer. The full legal agreement can be found here, and please see additional background information below.

Investment firms tend to be extremely secretive, even paranoid, when it comes to protecting their trading data, and understandably so. Some family offices and proprietary trading firms go so far as to try to hide their very existence — presumably this is only feasible when you don’t need to attract outside money, although I’m sure it poses serious challenges for recruiting talent as well. There are many reasons why an asset manager might be private, such as avoiding unwanted regulatory scrutiny or reducing the likelihood that skittish investors pull their funds during a performance dip, but the most common reasons are:

  1. Concern that others will reverse engineer / copy their strategy and capture their alpha.
  2. Concern that others will trade alongside or in front of them while they are putting on or taking off a position (i.e., information leakage). Avoiding information leakage is a near-universal concern on the buy side.

Most institutional investors spread their trading across multiple execution brokers. The most secretive funds split their holdings across multiple custodians, so that no one firm is in a position to try to reverse-engineer their strategies.

Bottom line: trading data is valuable.

Who owns the data

The general attitude on Wall Street is that the middlemen — namely, exchanges and brokers — own all the trading data. Exchange and broker trading agreements have lengthy sections declaring that all data generated or aggregated from orders passing through their system belongs exclusively to them, regardless of who originated the order. For example, the agreement will say something along the lines of: “any information or data relating to, processed or created in connection with, the content or operation of [the Broker’s] services is confidential and proprietary to [the Broker].” This dynamic is not particularly questioned — it is just the way data ownership works on Wall Street.

Because of the dynamic of the broker-client relationship, brokers will usually make an effort to accommodate client requests around data, for example generating reports or packaging up raw data related to a customer’s orders, but the broker still has full control.

Exchanges are even less accommodating: they make boatloads of money off of proprietary and public data feeds, but they are much less willing to work with their customers. For example, Cboe recently introduced a monthly fee that customers must pay just to download basic reports on their own trading activity! Additionally, exchange legal agreements around data are absolutely ironclad. The one redeeming aspect of the exchanges is that they are required to disclose their offerings and pricing publicly, so at least the information is out there (1, 2, 3).

I believe some brokers even offer data feeds (e.g. “alpha signals”) to high-tier clients based on aggregated customer holdings and/or order flow, but I have not seen any public information on the subject. If true, this seems like dubious legal territory. Unfortunately, there is little to no public information about what brokers do with the data they amass.

Data scandals

Speaking of dubious legal territory, institutional broker-dealers in US equities have a storied history of getting whacked by regulators for misusing client data. Here are a handful of examples that come to mind:

SEC Charges ITG With Operating Secret Trading Desk and Misusing Dark Pool Subscriber Trading Information (2015)

  • “An SEC investigation found that despite telling the public that it was an “agency-only” broker whose interests don’t conflict with its customers, ITG operated an undisclosed proprietary trading desk known as “Project Omega” for more than a year. While ITG claimed to protect the confidentiality of its dark pool subscribers’ trading information, during an eight-month period Project Omega accessed live feeds of order and execution information of its subscribers and used it to implement high-frequency algorithmic trading strategies, including one in which it traded against subscribers in ITG’s dark pool called POSIT.”

Bank of America to Pay $42 Million to Settle New York AG Probe in Electronic Trading

  • “Bank of America Merrill Lynch went to astonishing lengths to defraud its own institutional clients about who was seeing and filling their orders, who was trading in its dark pool, and the capabilities of its electronic trading services.”

Alternative Trading System [Pipeline] Agrees to Settle Charges That It Failed to Disclose Trading by an Affiliate (2011)

  • “The SEC’s order also found that Pipeline failed adequately to protect customers’ confidential trading information, allowing access to it by the research director at Pipeline’s parent company, who acted as the manager for the affiliated trading entity from 2004 to 2006.”

Barclays, Credit Suisse Charged With Dark Pool Violations (2016)

  • “Credit Suisse failed to treat subscriber order information confidentially and failed to disclose to all Crossfinder subscribers that their confidential order information was being transmitted out of the dark pool to other Credit Suisse systems.”
  • “Finally, CSSU also failed to disclose that it operated a technology called Crosslink that alerted two high frequency trading firms to the existence of orders that CSSU customers had submitted for execution.”

Citigroup unit to pay $5 million to settle U.S. SEC charges (2014)

  • “The SEC said LavaFlow failed to put adequate safeguards and procedures in place to protect its subscribers’ confidential trading information from March 2008 through March 2011. As a result, another affiliate was then able to access the data and use it to help determine where to route certain orders, the SEC said.”

Dark Pool Operator Liquidnet to Pay $2 Million to Settle SEC Charges (2014)

  • “The SEC said an investigation found that Liquidnet violated its regulatory obligations and its own promises to subscribers during a nearly three-year period when it allowed a business unit outside the dark pool to access confidential trading data, the agency said on Friday.”

SEC Charges Mizuho Securities for Failure to Safeguard Customer Information (2018)

  • “According to the SEC’s order, during a two-year period, Mizuho traders regularly disclosed material nonpublic customer buyback information to other traders and Mizuho’s hedge fund clients. That information included the identity of the party placing the order, the order size, limit price, and indications that the orders were buyback orders. Such information was routinely communicated across trading desks, notwithstanding that during the relevant period Mizuho executed over 99.8 percent of all buyback orders by using algorithms, rather than through trader-negotiated open market trades.”

Clearly, there is great value embedded in institutional investor trading data, and institutional brokers have repeatedly pushed the boundaries of the law in trying to monetize said data.

Retail trading data / payment for order flow

Although we are less well-versed in the dynamics and common practices on the retail trading side, there appear to be parallels inherent in the payment for order flow model. While there is certainly a direct value to a retail wholesaler in trading against retail orders, presumably there is also substantial value inherent in the data itself, particularly in the aggregate.

As the old adage goes, if you’re not paying for it, you are the product. But at least in the case of retail trading, the customer isn’t also paying commissions on top of their order flow/data being monetized behind the scenes.

Our new trading data policy at Proof

So what are we doing about it at Proof? We have decided to self-impose the following policies around our customers’ trading data. We have offered these commitments to our pilot customers, and we commit to offering them to future customers as well. The full legal language can be found here, but in summary:

  1. The client owns all trading data related to their orders.
  2. We cannot analyze a client’s data without their explicit permission; they can revoke their permission at any time. If they do grant permission we can include them in aggregated/anonymized studies that we publish, but we may only publish analyses where the results do not qualitatively change when any given client’s data is removed. This kind of robustness check ensures that any one client’s influence on the results is not noticeable, and is in the spirit of a guarantee like differential privacy. Additionally, anything we share privately that includes their data must also be made available to them.
  3. We cannot sell their data in raw or aggregated form.
  4. The client can request any or all of their trading data (parent orders, child orders, executions, cancellations, amendments, etc.) to be delivered to themselves or to a third party. We will try to accommodate requests in terms of formatting/filtering/data fields/etc.
  5. At any point, the client can request that we permanently delete all their data from our research databases. Note there is one caveat to this point: regulatory requirements. We would continue to keep all data in cold storage to fulfill our 17a-4 record retention requirements, but we would no longer be allowed to pull or analyze their data ourselves unless explicitly in response to a regulatory/legal inquiry.
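
The robustness check described in point 2 can be illustrated with a small leave-one-client-out sketch. This is only an illustration under stated assumptions, not Proof's actual methodology: the function name `leave_one_client_out_ok`, the per-client lists of numbers, and the 10% relative tolerance are all hypothetical, and a numeric tolerance is only a rough stand-in for "does not qualitatively change". It is also not a formal differential privacy guarantee.

```python
from statistics import mean
from typing import Callable, Dict, List


def leave_one_client_out_ok(
    data_by_client: Dict[str, List[float]],
    statistic: Callable[[List[float]], float] = mean,
    rel_tolerance: float = 0.10,  # hypothetical threshold, not from the policy
) -> bool:
    """Return True if removing any single client's data moves the
    statistic by at most rel_tolerance relative to the full-sample value."""
    pooled = [x for values in data_by_client.values() for x in values]
    baseline = statistic(pooled)

    for excluded in data_by_client:
        # Recompute the statistic with one client's data left out.
        rest = [
            x
            for client, values in data_by_client.items()
            if client != excluded
            for x in values
        ]
        if not rest:
            # Only one client contributed, so the result is entirely theirs.
            return False
        if abs(statistic(rest) - baseline) > rel_tolerance * abs(baseline):
            return False
    return True


# Hypothetical per-client measurements (e.g. some per-order metric).
balanced = {"a": [1.0, 1.1, 0.9], "b": [1.0, 1.2, 0.8], "c": [0.9, 1.0, 1.1]}
dominated = {"a": [1.0, 1.0], "b": [1.0, 1.0], "c": [10.0, 10.0]}

print(leave_one_client_out_ok(balanced))   # no single client drives the mean
print(leave_one_client_out_ok(dominated))  # client "c" drives the mean
```

In this sketch a study over `balanced` would pass the check, while one over `dominated` would not, because dropping client "c" changes the result substantially. Note that a baseline of exactly zero makes the relative tolerance zero, so any change at all would fail the check; a real implementation would need a more careful definition of "qualitatively the same".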

These commitments are no joke, and they do create a significant amount of additional burden and overhead internally in order to enforce them. We would not expect that many other institutional brokers could fulfill these promises even if they wanted to (which of course, they do not). But on that note, if anyone else does want to use this data agreement in part or in whole, please feel free! It is our dream that other brokers follow suit when we do the crazy things that we believe are right. Any feedback or suggestions would also be very welcome!

How we think about transparency and data privacy

Naturally, our primary business goal is to make our execution algorithms perform as well as possible. We believe that our whole-hearted embrace of transparency is entirely supportive of this goal. The individual clauses of this new data policy, however, are less cut and dried, as they may limit our access to some potential research data. Nevertheless, we are doing it because we believe it is the right thing to do.

We’ll also note that the vast majority of our quantitative research is carried out on public historical market data, not actual client data. Examining our clients’ order flow can be very useful, especially for verifying that the algorithms are behaving as intended and building intuition around their real-world impact. Self-generated trading data is an exceptionally clean data set, but it is inherently far smaller than the full market-wide historical record. It is our hope, though, that as we grow we will be able to supplement our historical quantitative research efforts with rigorous and robust analysis of client data (from clients who have opted in to allowing us to analyze their order flow).

Below is a framework for how we perceive various commitments that we have made, and how they impact our additional business goals.

Goal: hold Proof accountable / create a positive feedback loop to improve the algos

  • Transparency into our research process and algorithm design
  • Publishing research findings and performance metrics
  • Making full data accessible to clients and their third parties upon request

Goal: protect client information

  • Committing to never sell client data in raw or aggregated form
  • Allowing customers to opt out from their data being included in research
  • Deleting customer data upon request

Conclusion

This new trading data policy is most likely a case where we impose a significant technological and legal burden upon ourselves that will not particularly move the needle with our average client. We are not so naïve as to think that planting a flag on this hill will win us significant new business. Nevertheless, we are moving forward with this initiative for the simple reason that if Proof were a buy-side firm, this is exactly the type of data policy we would love to see from our execution partners.
