Outlier Analysis

FactSet

What are outliers in Transaction Cost Analysis (TCA), and how are they identified?

First, let’s define what we mean by an outlier. An outlier is a measurement that is drastically different from the rest of the group of measurements it belongs to. For example, suppose we’re measuring the performance of child orders against a VWAP benchmark. We would expect most orders to fall within a range close to the benchmark, but occasionally we notice orders performing very differently from the rest. We call these outliers.

Defining exactly what constitutes an outlier is a subjective process. One approach is to compute the mean and standard deviation of our measurements. For example, if we have a few hundred child orders and measure the performance of each against a benchmark (e.g., VWAP), we can flag as outliers the orders that lie more than some number of standard deviations from the mean (e.g., three standard deviations).
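A minimal sketch of the standard-deviation rule follows. It assumes that each child order's performance against the benchmark is already expressed in basis points. The function name `flag_outliers_zscore` and the default threshold of three standard deviations are illustrative choices, not part of BEAST.

```python
import statistics


def flag_outliers_zscore(perf_bps, k=3.0):
    """Return the indices of observations more than k standard deviations from the mean.

    perf_bps: per-order performance vs. a benchmark (e.g. VWAP), in basis points.
    k: hypothetical threshold; 3.0 mirrors the "three standard deviations" example.
    """
    mu = statistics.mean(perf_bps)
    sigma = statistics.pstdev(perf_bps)  # population standard deviation
    if sigma == 0:
        return []  # all observations identical: nothing stands out
    return [i for i, x in enumerate(perf_bps) if abs(x - mu) > k * sigma]


# Hypothetical data: 20 orders near the benchmark plus one extreme order.
orders = [1.0, -1.0] * 10 + [400.0]
print(flag_outliers_zscore(orders))  # [20]
```

The extreme value also inflates the standard deviation it is measured against. In a small sample, this rule can therefore miss an outlier entirely. It behaves better on the few hundred orders described above.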

Alternatively, we can determine the range that contains a certain percentage of the data (e.g., between the 1st and 99th percentiles) and define orders outside that range as outliers. It is a bit like the Sesame Street skit “one of these things is not like the others,” except that we may have many “things” to compare.
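The percentile-range alternative can be sketched the same way, again as an illustration rather than BEAST's actual rule. The function names and the `tail_pct` cut-off are hypothetical. A value of 1 means the 1st and 99th percentiles. `method="inclusive"` is one of the two interpolation options that Python's `statistics.quantiles` offers.

```python
import statistics


def percentile_range(perf_bps, tail_pct=1):
    """Return (lower, upper): the tail_pct-th and (100 - tail_pct)-th percentiles.

    Orders inside this range make up roughly the central (100 - 2 * tail_pct)% of the data.
    """
    if not 1 <= tail_pct <= 49:
        raise ValueError("tail_pct must be an integer between 1 and 49")
    # quantiles(n=100) returns 99 cut points; cuts[i] is the (i + 1)-th percentile.
    cuts = statistics.quantiles(perf_bps, n=100, method="inclusive")
    return cuts[tail_pct - 1], cuts[99 - tail_pct]


def flag_outliers_percentile(perf_bps, tail_pct=1):
    """Return the indices of observations that fall outside the percentile range."""
    lower, upper = percentile_range(perf_bps, tail_pct)
    return [i for i, x in enumerate(perf_bps) if x < lower or x > upper]


perf = [float(x) for x in range(101)]  # hypothetical, evenly spaced 0..100 bps
print(percentile_range(perf, 1))          # (1.0, 99.0)
print(flag_outliers_percentile(perf, 1))  # [0, 100]
```

The range is defined by the data's own percentiles, so this rule flags roughly the same fraction of orders in every sample, whether or not anything went wrong. Flagged orders are best treated as candidates for review rather than confirmed problems.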

There are several reasons we want to identify outliers. An order may show up as an outlier because of a data issue. If so, we can repair the data and recompute the analytics to see whether the order is still an outlier. Another possibility is that the data are good but our calculation has a problem. Again, by identifying the outlier and investigating it, we can find and fix any calculation bug and then recompute.

Things get more interesting when we have ruled out data and calculation problems and determine that the performance of the outlier is “real.” In general, when looking at aggregated TCA reports, orders that did what we expected do not concern us. Our interest is in the orders that didn’t do what we expected. Learning from these outliers helps us adjust how we engage with the market.

Outliers can have a disproportionate impact on the overall performance of a group when we aggregate performance. A single outlier can skew the group average. We often use value-weighted averages when aggregating TCA metrics, so an outlier on a large order carries even more weight. This is why we want to handle outliers appropriately.
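To see how much a single order can move a value-weighted aggregate, here is a small hypothetical example. The orders are made up, not FactSet data. `value_weighted_avg` is an illustrative helper, not a BEAST function.

```python
def value_weighted_avg(perf_bps, notionals):
    """Average performance weighted by each order's traded value (notional)."""
    total = sum(notionals)
    if total == 0:
        raise ValueError("total traded value must be non-zero")
    return sum(p * w for p, w in zip(perf_bps, notionals)) / total


# Hypothetical orders: performance in bps and traded value in currency units.
perf = [2.0, -1.0, 0.5, 400.0]
value = [1_000_000, 2_000_000, 1_000_000, 500_000]

print(value_weighted_avg(perf, value))            # ~44.56 bps, dominated by one order
print(value_weighted_avg(perf[:-1], value[:-1]))  # 0.125 bps without it
```

The extreme order is the smallest of the four by value, yet it still moves the aggregate from about 0.1 bps to about 45 bps.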

What do we do with these outliers? Our Best Execution Analytics for Smarter Trading (BEAST) service uses two basic approaches. The simplest is to remove the outliers from the analysis, which is appropriate when we suspect bad data. We may refer to these observations as “exceptions” because we remove them from the calculations. When we have “real” outliers (extreme outcomes that happened in the market and are accurately measured), we often use a process called winsorization.

Winsorizing the data involves replacing extreme values (i.e., outliers) with less extreme values. For example, if an observation lies five standard deviations from the mean, we can replace it with the value three standard deviations from the mean. The outlier still contributes to the aggregate performance, but without such an exaggerated impact.
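Below is a minimal sketch of standard-deviation-based winsorization. It assumes the mean and standard deviation are computed once on the raw data and that values are clamped at the mean ± k standard deviations. The helper name and the single-pass design are assumptions. Production implementations may differ, for example by clipping at percentiles or by recomputing the statistics iteratively.

```python
import statistics


def winsorize_sigma(perf_bps, k=3.0):
    """Clamp each value to [mean - k*sd, mean + k*sd], keeping every observation.

    Mean and standard deviation are computed once on the raw data (single pass).
    """
    mu = statistics.mean(perf_bps)
    sigma = statistics.pstdev(perf_bps)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [min(max(x, lower), upper) for x in perf_bps]


orders = [1.0, -1.0] * 10 + [400.0]  # hypothetical data
clipped = winsorize_sigma(orders)
print(clipped[-1])                  # the 400 bps order, pulled in to mean + 3 sd
print(len(clipped) == len(orders))  # True: nothing is dropped
```

Unlike removing an exception, every order is kept. Only the most extreme order's contribution to the average is reduced.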

Figure 1 shows an example.

Figure 1: A plot of performance versus the mid-point at the time of first fill for a set of simulated TCA orders. There are three outliers at around 400 bps on the positive side of the distribution, as well as a handful of outliers on the negative side. The outliers near 400 bps have a large effect on the average of the group. If we include them without any special handling, we get an average performance of -0.788 bps. If we winsorize the data by replacing extreme values with less extreme values (the x-values of the yellow dashed lines), the average performance becomes -1.10 bps. The choice of limits is arbitrary, and charting the data can help the analyst determine appropriate values.
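The caption's point, that the limits are a judgment call, can be illustrated by winsorizing to explicit, analyst-chosen limits and comparing the averages. The data and the ±25 bps limits below are hypothetical and do not reproduce Figure 1. `winsorize_limits` is an illustrative helper.

```python
import statistics


def winsorize_limits(perf_bps, lower, upper):
    """Replace values outside [lower, upper] with the nearest limit.

    lower/upper are analyst-chosen cut-offs, e.g. read off a chart.
    """
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    return [min(max(x, lower), upper) for x in perf_bps]


# Hypothetical simulated orders (bps vs. mid at first fill), not the Figure 1 data.
perf = [-3.0, -1.5, -0.5, 0.0, 0.5, 1.0, -2.0, -60.0, 400.0]
limits = (-25.0, 25.0)  # hypothetical analyst-chosen limits

raw_avg = statistics.mean(perf)
wins_avg = statistics.mean(winsorize_limits(perf, *limits))
print(f"raw: {raw_avg:.2f} bps, winsorized: {wins_avg:.2f} bps")
# raw: 37.17 bps, winsorized: -0.61 bps
```

Moving the limits changes the winsorized average, which is why plotting the distribution before choosing them is useful.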

When working with real-world data such as TCA data, we need to address real-world problems such as bad input data. We do not want our aggregate analytics disproportionately affected by a few outliers. There is no perfect solution in the messy world of large data sets, so we can use techniques such as winsorization to keep an observation while reducing its influence on aggregate metrics like value-weighted performance.

As we build our analytics reports and integrate them into Portware, we monitor traders’ views on issues such as how to handle outliers. We would love to hear from you about outliers and any other issues related to trading analytics. Please feel free to reach out directly so we can include your feedback in our trading analytics service.

Authored by Chris Sparrow (Principal Product Manager)

FactSet delivers data, analytics, and open technology in a digital platform to help the financial community see more, think bigger, and do their best work.