Evaluating the Performance of Data-Driven Biomechanical Analysis

Patrick Duncan
Apr 18, 2023

A retrospective analysis of the 2019 BSA National Yearling Sale

With the 2023 BSA National Yearling Sale just around the corner, I thought it would be interesting to look back at the first sale I analysed when launching Bio-Cal Thoroughbreds: the 2019 BSA National Yearling Sale. Its graduates are now 5 years old and have had sufficient time to prove their athletic ability.

I want to emphasise that I was not involved in any of the selections or purchases at that sale. What follows is a retrospective analysis of the biomechanical evaluation results for its lots.

What is Bio-Cal Thoroughbreds?

Simply put, we use objective measurement data and advanced algorithms to conduct a comprehensive biomechanical analysis of each yearling’s conformation against an extensive database. Based on the biomechanical profile of each lot, we assign a prediction of its potential athletic ability.

The biomechanical profile of a yearling is an evaluation of its prospective athletic aptitude, built from objective measurements of the dimensions that contribute to athletic efficiency. Essentially, it is akin to the conventional physical inspections buyers undertake to assess the balance and proportions of a yearling’s conformation. However, the human eye cannot measure these dimensions precisely, and judgement relies on long-term recall, because the results of an inspection may not show until years later. The sheer number of inspections carried out at sales each year compounds the problem.

Our goal is to use data science to increase the likelihood of selecting elite horses, recognising that complete accuracy is unattainable. By generating biomechanical ratings before the sale, we seek to tilt the odds in our favour. It is important to note that these ratings do not incorporate physical inspections or veterinary evaluations. By combining a trained eye, traditional horsemanship and veterinary assessments with our predictions, we can further improve our ability to identify elite Thoroughbreds.

2019 BSA National Yearling Sale

In order to gain a deeper understanding of the 2019 BSA National Yearling Sale, I have included statistical data for all lots from the Arion database. This provides an accurate representation of the realities of the Thoroughbred industry.

Firstly, looking at the overall strength of the sale, the average price of R320,346 was moderate compared with previous and subsequent years, sitting on the average trend line. The sale produced a 68.9% winners-to-runners ratio, with only 1% of runners scoring at Group 1 level. Elite horses, classified here as Group winners and performers, made up only 8.5% of runners.

It goes without saying that the likelihood of acquiring an elite Thoroughbred is relatively low.

Let’s take a closer look at our sample set.

Using the conformation images that Bloodstock South Africa made available in its online catalogue, I included every lot that was not withdrawn and had a conformation image available. This reduced the sample to 394 of the 564 lots catalogued.

I have arranged the categories as follows:

A+ = Group 1 Performer

A = Group Performer

B = Winner

C = Non-Winner.
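
As a rough illustration, the grading scheme above can be written in a few lines of Python. This is a minimal sketch and not Bio-Cal's actual code: the `RaceRecord` class, its field names and the `grade` function are hypothetical, and what counts as a "performer" is left to whoever fills in the record. How unraced lots are graded is not stated in this article, so the sketch does not model them.

```python
from dataclasses import dataclass


@dataclass
class RaceRecord:
    """Hypothetical career summary for one sale graduate (illustrative fields)."""
    wins: int = 0
    group1_performer: bool = False  # assumption: flagged by the caller
    group_performer: bool = False   # assumption: flagged by the caller


def grade(record: RaceRecord) -> str:
    """Map a career summary to the article's A+ / A / B / C categories."""
    # Check the highest category first so a Group 1 performer is never
    # downgraded to a lower band.
    if record.group1_performer:
        return "A+"
    if record.group_performer:
        return "A"
    if record.wins > 0:
        return "B"
    return "C"


print(grade(RaceRecord(wins=2)))
```

Checking the categories from the top down means each horse is recorded at its highest level of achievement.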

Looking at the sample set of 394 lots included in the analysis, the percentage breakdown of graduate success is similar to that of the full sale results. The table reveals the success of the market, as the price of a yearling reflects the market’s belief in its potential as an elite racehorse. Despite this apparent consistency, the median price for the A+ category (Group 1 Winner) is only R225,000, with only one A+ lot selling above that figure, which indicates a certain degree of inconsistency.
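
Because median and average prices are both quoted in this analysis, it is worth showing how far apart the two can sit in a small group of lots. The sketch below uses Python's standard `statistics` module. The prices are made-up toy values chosen only to show the effect; they are not figures from the sale, and `price_summary` is a hypothetical helper.

```python
from statistics import mean, median


def price_summary(prices):
    """Return the median and mean of a list of sale prices (in rand)."""
    return {"median": median(prices), "mean": mean(prices)}


# Toy prices only: three modest lots and one expensive one.
toy_prices = [100_000, 150_000, 200_000, 950_000]
summary = price_summary(toy_prices)
print(summary)
```

A single expensive lot pulls the mean well above the median, which is why the median is the steadier guide to what a typical lot in a small class cost.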

The table below offers another perspective on the overall success of the market. It shows that spending upwards of R600,000 on a single lot resulted in only 25.6% of those purchases achieving elite status, while the next-highest price band, R400,000 to R599,999, yielded just a 4.8% chance of purchasing an elite racehorse, making it the worst-performing band. In contrast, the R200,000 to R399,999 band proved the most successful, with 4 Group 1 winners and 18 Group performers (including winners of Listed races), all bought for under R300,000. The two lowest price bands, which include lots that were passed in, achieved 7% and 6.3% respectively.
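
The price-band comparison above comes down to grouping lots by price and counting the share that reached elite status. A minimal sketch of that calculation follows. The three upper bands are taken from the text; the single "Under R200,000" catch-all is an assumption, since the boundaries of the two lowest bands are not given here. The function names and the toy lots are hypothetical and are not sale data.

```python
from collections import defaultdict

ELITE = {"A+", "A"}

# (label, lower bound, upper bound); None means no upper limit.
BANDS = [
    ("R600,000+", 600_000, None),
    ("R400,000-R599,999", 400_000, 599_999),
    ("R200,000-R399,999", 200_000, 399_999),
]


def band_for(price):
    """Return the label of the price band a lot falls into."""
    for label, low, high in BANDS:
        if price >= low and (high is None or price <= high):
            return label
    return "Under R200,000"  # assumption: one catch-all for cheaper lots


def elite_rate_by_band(lots):
    """lots: iterable of (price, grade). Returns {band: share of elite lots}."""
    totals = defaultdict(int)
    elite = defaultdict(int)
    for price, grade in lots:
        band = band_for(price)
        totals[band] += 1
        if grade in ELITE:
            elite[band] += 1
    return {band: elite[band] / totals[band] for band in totals}


# Toy lots for illustration only.
toy_lots = [(650_000, "A"), (700_000, "C"), (250_000, "A+"), (450_000, "B")]
rates = elite_rate_by_band(toy_lots)
print(rates)
```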

From a data science perspective, these results are somewhat disappointing. They also indicate that there is an opportunity to outperform the market, whilst acknowledging that perfection is unattainable in this sport.

Evaluating the success of the biomechanical analysis algorithm

The table below indicates that, as a whole, the algorithm produces a percentage breakdown similar to that of the graduates’ actual athletic ability. The average price of each band also aligns with the market, descending from the top band to the bottom. Note once again, however, that the median price of the A+ (G1 Winner) band is R225,000.

A confusion matrix allows us to assess the algorithm’s accuracy in predicting the athletic ability of each lot:

The overall accuracy of the algorithm is a promising 70%. The confusion matrix indicates that the algorithm predicted elite Thoroughbreds (A+ and A), as well as those that turned out to be winners (B) and non-winners (C), to a high degree of accuracy. Correctly identified elite runners sit in the top-left corner of the matrix, and correctly identified non-elite runners in the bottom-right corner.
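
For readers unfamiliar with the term, a confusion matrix simply counts how many lots fall into each pairing of predicted and actual grade. The sketch below is illustrative and is not the algorithm itself. The function names, the orientation (rows are predictions, columns are outcomes) and the toy labels are all assumptions. It treats accuracy as the share of exact grade matches, which may differ from how the 70% figure above was calculated.

```python
from collections import Counter

GRADES = ["A+", "A", "B", "C"]


def confusion_matrix(predicted, actual):
    """Rows follow the predicted grade, columns the actual grade."""
    counts = Counter(zip(predicted, actual))
    return [[counts[(p, a)] for a in GRADES] for p in GRADES]


def accuracy(matrix):
    """Share of lots on the diagonal, i.e. predicted grade equals actual."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels for illustration only.
toy_predicted = ["A+", "A", "B", "C", "B"]
toy_actual = ["A+", "B", "B", "C", "C"]
matrix = confusion_matrix(toy_predicted, toy_actual)
for grade_label, row in zip(GRADES, matrix):
    print(grade_label, row)
print(accuracy(matrix))
```

With the grades ordered from A+ down to C, correct elite calls land in the top-left of the grid and correct non-winner calls in the bottom-right.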

Of the runners that received an elite prediction of A+ or A, 70% achieved elite status and performed according to their prediction. Of those that received a B prediction, 72% achieved at least one maiden or handicap win but could not perform at Group level. Of those that received a C prediction, 60.4% were unable to win a race. Interestingly, of the 46 lots that did not start in a race, 87% received a B or C score, split evenly between the two.
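
The percentages above are per-prediction hit rates: of the lots given a certain prediction, the share whose outcome matched it. A minimal sketch of that calculation follows, with A+ and A merged into one elite group as in the text. `hit_rate` is a hypothetical helper and the labels are toy values, not results from the sale.

```python
def hit_rate(predicted, actual, predicted_group, actual_group):
    """Share of lots predicted in predicted_group whose outcome is in actual_group.

    Returns None when no lot received a prediction in predicted_group.
    """
    hits = 0
    total = 0
    for p, a in zip(predicted, actual):
        if p in predicted_group:
            total += 1
            if a in actual_group:
                hits += 1
    return hits / total if total else None


ELITE = {"A+", "A"}

# Toy labels for illustration only.
toy_predicted = ["A+", "A", "A", "B", "C"]
toy_actual = ["A", "A+", "B", "B", "C"]

elite_hit_rate = hit_rate(toy_predicted, toy_actual, ELITE, ELITE)
winner_hit_rate = hit_rate(toy_predicted, toy_actual, {"B"}, {"B"})
print(elite_hit_rate, winner_hit_rate)
```

Grouping A+ and A means an A prediction that turns out to be an A+ horse still counts as a correct elite call.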

While a larger sample set would be preferable when constructing a confusion matrix, the algorithm’s ability to push the good horses to the top and the poorer horses to the bottom is impressive. On this sample it proved effective at identifying elite Thoroughbreds, and combining it with conventional methods such as physical inspections and vet checks should further increase the chances of finding an elite racehorse.

This retrospective analysis suggests that leveraging objective data has great potential to enhance the probability of purchasing an elite racehorse. The goal is to use data-inclusive approaches to substantiate decisions at every stage of the process that contributes to overall success, not just at the selection stage. Like any other athlete, a Thoroughbred must be biomechanically efficient to succeed. I believe that conformation is a critical component in this regard, as it is a limiting factor on the athleticism of the equine athlete.

In our industry, focusing on probabilities rather than possibilities is key, and incorporating objective approaches alongside conventional methods can increase the likelihood of success. Those who understand and effectively utilise innovative tools have the potential to enhance their probability of selecting elite racehorses.

Where tradition and innovation meet, winning begins.

If you’re intrigued by the potential of incorporating data into your decision-making process at bloodstock sales, please feel free to contact me at info@biocalthoroughbreds.com

Patrick Duncan

Imagine the future and fill in the gaps. BSc (Hons) Racehorse Performance. Founder of Bio-Cal Thoroughbreds