How I Created An Algorithm That Could Save The UK Fashion Retail Industry £150 Million Per Year

Akhil Sonthi
Published Feb 24, 2021


Earlier this year, I worked with a startup in London which advises fashion retailers on how to drive profitable growth and better understand customers’ preferences.

The global retail apparel market is estimated to be worth a staggering $1.4 trillion and includes brands such as Nike, Zara, Ralph Lauren and Levi’s.

Retailers typically make initial purchases of stock between six weeks and six months before a customer is able to buy the clothes in-store or online.

One of the biggest pain points retail buyers face is fragmented stock: spare stock in some sizes and no stock in the sizes customers are actually looking for. It stems from incorrect predictions of size ratios for new products, which are primarily based on limited data and human decisions. This forces retailers either to discount products, which lowers revenue, or to place them back in warehouses, which increases storage costs.
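To make fragmentation concrete, here is a small illustrative sketch. The unit counts and the `fragmentation` helper are hypothetical and not taken from the project; they only show how a mismatch between the size ratio bought and the size ratio demanded produces surplus in one size and shortfalls in others.

```python
def fragmentation(purchased, demand):
    """Return (surplus, unmet) units per size for one product."""
    # Surplus: units bought that nobody wanted in that size.
    surplus = {s: max(purchased[s] - demand.get(s, 0), 0) for s in purchased}
    # Unmet: units customers wanted but the retailer did not stock.
    unmet = {s: max(demand[s] - purchased.get(s, 0), 0) for s in demand}
    return surplus, unmet


# Hypothetical buy of 1,000 units split 30/40/30 across S/M/L,
# against demand that actually split 20/45/35.
purchased = {"S": 300, "M": 400, "L": 300}
demand = {"S": 200, "M": 450, "L": 350}

surplus, unmet = fragmentation(purchased, demand)
print(surplus)  # spare stock per size
print(unmet)    # missed sales per size
```

In this made-up case the total quantity bought matches total demand, yet 100 units of S are left over while M and L each miss 50 sales, purely because the size ratio was off.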

My project focused on creating a machine learning algorithm that could better predict these size ratios and allow retailers to claim back this ‘lost’ revenue.

Python was identified as the best language for building the prediction model based on the following advantages:

  • Free & Open Source
  • Easy to read (by all team members)
  • Extensive collection of ML libraries
  • Well suited to building a functional prototype
  • Powerful data manipulation

Python’s ecosystem offers a vast array of libraries. After some online research, I identified five key libraries, each used at a different stage of the prediction process (e.g. data gathering, data analysis):

  • NumPy — data structures, mathematical computations
  • Pandas — data structures/analysis/manipulation
  • Seaborn — data visualisation
  • Scikit-learn — machine learning algorithms
  • Keras/TensorFlow — neural network algorithms

Check out my Medium post on my Top 10 Python Libraries for Data Science

1. Data Cleaning

Data cleaning means identifying and eliminating errors in the data.

The product attributes data was extracted from the database, loaded into a pandas DataFrame, then pivoted and compressed to make it easier to understand and use later on. Any ‘NaN’ values were converted into zeros (to avoid errors during mathematical computations) and all products with low sales were removed to minimise ‘noise’ for the prediction model.
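The cleaning steps above could look roughly like the sketch below. The column names (`product_id`, `size`, `units_sold`), the sample rows and the `min_units` threshold are all assumptions made for illustration; the real schema and cut-off were specific to the project.

```python
import pandas as pd

# Hypothetical long-format sales records: one row per product and size.
sales = pd.DataFrame({
    "product_id": ["A", "A", "A", "B", "B", "C"],
    "size":       ["S", "M", "L", "S", "M", "M"],
    "units_sold": [30, 50, 20, 40, 60, 2],
})


def clean_sales(df, min_units=10):
    """Pivot to one row per product, zero-fill gaps, drop low sellers."""
    wide = df.pivot_table(index="product_id", columns="size",
                          values="units_sold", aggfunc="sum")
    wide = wide.fillna(0)  # missing size combinations become 0 units
    # Keep only products whose total sales reach the (assumed) threshold.
    return wide[wide.sum(axis=1) >= min_units]


def to_size_ratios(wide):
    """Convert units per size into each size's share of the product's sales."""
    return wide.div(wide.sum(axis=1), axis=0)


cleaned = clean_sales(sales)
ratios = to_size_ratios(cleaned)
print(ratios)
```

Here product C is dropped for low sales, and product B gets a zero for size L rather than a NaN, so each remaining row of `ratios` sums to 1.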

2. Data Analysis

Data analysis means understanding the data to gain useful insights which will drive model development and future decisions.

I trawled through the vast amounts of data with three aims: to better understand how size ratios vary for different products over time, to identify ‘significant’ features which might require more weight in the model, and to spot any correlations between the features themselves or with external variables, e.g. economic indicators.
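One simple way to look for such correlations is a pandas correlation matrix. The sketch below uses a tiny invented table; `season_index`, `share_small`, `share_large` and `consumer_index` are hypothetical column names standing in for a time feature, two size ratios and an external economic indicator.

```python
import pandas as pd

# Invented history: size shares per season plus an external indicator.
history = pd.DataFrame({
    "season_index":   [1, 2, 3, 4, 5, 6],
    "share_small":    [0.30, 0.29, 0.27, 0.26, 0.24, 0.23],
    "share_large":    [0.20, 0.21, 0.23, 0.24, 0.26, 0.27],
    "consumer_index": [100, 101, 99, 102, 103, 101],
})

# Pairwise Pearson correlation between every pair of columns.
corr = history.corr()

# Rank the other columns by how strongly they move with the target ratio.
ranked = (corr["share_large"]
          .drop("share_large")
          .abs()
          .sort_values(ascending=False))
print(ranked)
```

A ranking like this is only a first pass: it captures linear relationships and says nothing about causation, but it is a quick way to shortlist features worth a closer look.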

The data was then split into training and test data for the models to use.
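A split like this can be done with scikit-learn's `train_test_split`. The sketch below is illustrative only: the feature matrix and size ratios are randomly generated, and the 80/20 proportion is an assumption rather than the split used in the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: 100 products, 4 numeric attributes each.
X = rng.random((100, 4))

# Hypothetical targets: size ratios for (S, M, L) that sum to 1 per product.
raw = rng.random((100, 3))
y = raw / raw.sum(axis=1, keepdims=True)

# Hold out 20% of products for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```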

3. Model Development & Testing

Model development & testing means designing, developing and testing different types of machine learning algorithms.

Prior to development, root mean squared error (RMSE) was chosen as the success metric. It avoids taking the absolute value of the error and, because each error is squared before averaging, it penalises large errors more heavily than small ones.
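For reference, root mean squared error can be computed in a few lines of NumPy. The sketch below also computes mean absolute error on two invented sets of predictions, to show how squaring changes the weight given to a single large error; the numbers are illustrative and not project data.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: square, average, then take the root."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error, shown only for comparison."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


actual = [10, 10, 10, 10]
even_errors = [11, 11, 11, 11]    # four errors of 1 unit each
one_big_error = [10, 10, 10, 14]  # one error of 4 units

print(mae(actual, even_errors), rmse(actual, even_errors))
print(mae(actual, one_big_error), rmse(actual, one_big_error))
```

Both sets of predictions have the same mean absolute error, but the RMSE doubles when the same total error is concentrated in one prediction.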

I explored regression (e.g. k-neighbours), classification (e.g. Gaussian naive Bayes), clustering (e.g. mean shift) and artificial neural network (built with Keras/TensorFlow) algorithms to identify which had the smallest prediction error, and hence the most accurate predictions, when backtesting.
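Comparing candidate models on the same held-out data can be sketched as below. This is a simplified illustration, not the project's pipeline: the data is synthetic, only a k-neighbours regressor and a predict-the-mean baseline are compared, and the `rmse` helper and dictionary names are my own.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic products: two attributes that drive the S and L shares.
X = rng.random((200, 2))
small = 0.2 + 0.2 * X[:, 0]
large = 0.2 + 0.2 * X[:, 1]
y = np.column_stack([small, 1.0 - small - large, large])  # S, M, L ratios

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


candidates = {
    "mean_baseline": DummyRegressor(strategy="mean"),
    "k_neighbours": KNeighborsRegressor(n_neighbors=5),
}

# Fit each candidate on the training set and score it on the test set.
scores = {
    name: rmse(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in candidates.items()
}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

Including a naive baseline is a deliberate choice: it shows whether a model is actually learning something from the attributes or merely reproducing the average size ratio.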

Hyperparameter optimisation (varying a large range of parameters to optimise a model’s learning process) was then conducted to find a local minimum, i.e. the lowest error I could reach within the time and computing power constraints.
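A common way to run this kind of search is scikit-learn's `GridSearchCV`. The sketch below tunes a k-neighbours regressor on synthetic data; the parameter grid, the 5-fold cross-validation and the single-size target are assumptions chosen to keep the example small, not the settings used in the project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic data: the share of one size depends on one attribute plus noise.
X = rng.random((120, 2))
y = 0.2 + 0.3 * X[:, 0] + rng.normal(0, 0.01, size=120)

# Every combination in this (assumed) grid is trained and cross-validated.
param_grid = {
    "n_neighbors": [1, 3, 5, 9],
    "weights": ["uniform", "distance"],
}

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximises scores
    cv=5,
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # flip the sign back to a positive RMSE
print(best_params, best_rmse)
```

A grid search only finds the best combination among the values it is given, which is why the result is best described as the lowest error found rather than a guaranteed global optimum.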

These errors were also analysed on a size-by-size basis and with limited training data to determine the robustness of the model.
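A size-by-size breakdown can be obtained by computing the error column by column instead of over the whole prediction matrix. The helper name `rmse_by_size` and the numbers below are hypothetical, chosen so the result is easy to verify by hand.

```python
import numpy as np


def rmse_by_size(y_true, y_pred, sizes):
    """RMSE per size: rows are products, columns are sizes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Averaging over axis 0 (products) leaves one error value per size.
    per_size = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
    return dict(zip(sizes, per_size.tolist()))


sizes = ["S", "M", "L"]
actual_ratios = [[0.3, 0.5, 0.2],
                 [0.2, 0.5, 0.3]]
predicted_ratios = [[0.3, 0.4, 0.3],
                    [0.2, 0.6, 0.2]]

errors = rmse_by_size(actual_ratios, predicted_ratios, sizes)
print(errors)
```

In this toy case the model is perfect on S but off by 0.1 on both M and L, a pattern that a single overall RMSE figure would hide.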

Finally, a comprehensive backtest was conducted to determine how much ‘lost’ value the optimised model would have allowed retailers to capture and the grand total came to a surprising £150 million per year for just the UK fashion retail market. Imagine the global potential if this algorithm was carefully scaled up and integrated into the retail buying process…

So what did I learn apart from some data science techniques and Python? Never underestimate the value you can provide, even with little experience, particularly to more ‘mature’ industries that haven’t yet realised the full potential of the data they have been chucking away into a data server somewhere in the heart of Nevada.



Akhil Sonthi
Tech Enthusiast | Entrepreneur | Music Artist | MEng @ Cambridge