Published in The Startup

Building a Comprehensive Stock Valuation Ratio With Machine Learning

Like many amateur investors, I spent the early COVID period frantically shuffling my money around — trying to limit losses. Later, as the market reversed in early April, I instead sought out undervalued stocks I could use to recover quickly. While I somehow came out ahead, the madness of the period made it clear how crude my investment strategy was.

Amateur investors like myself have rather basic tools. Most of us will just look at the P/E ratio, decide whether that number “feels fair”, and go from there. Some more sophisticated investors might also look at the P/S or the EV/EBITDA. Ultimately though, regardless of preferred metric, we tend to make decisions based on a hunch and a prayer. It’s no wonder that there are professional traders who make a living off of us.

(Mis)using machine learning

My background is in software development. As part of that, I enjoy dabbling with machine learning. Machine learning is a scary-sounding term but it’s actually quite simple. If you ever did linear regression in high school, that’s machine learning. Given a set of points, a machine learning algorithm can derive a “best-fit” line such that for any x, we can predict the y. A simple example might be predicting the price of a home. We know intuitively that larger homes are usually more expensive. But given a specific size, say 2000 sq ft, could we guess how much that house would cost? If we find a dataset of recent home sales, we could draw a line through the data and make a well-informed estimate.
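To make that concrete, here is a minimal sketch of fitting a best-fit line with ordinary least squares. The home-sale figures are invented for illustration:

```python
# Hypothetical dataset of recent home sales: (size in sq ft, price in $)
sales = [(1100, 210000), (1400, 265000), (1700, 310000),
         (2100, 370000), (2500, 425000), (3000, 490000)]

xs = [size for size, _ in sales]
ys = [price for _, price in sales]
n = len(sales)

# Ordinary least squares for a single input: slope and intercept
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in sales)
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Predict the price of a 2000 sq ft house from the fitted line
predicted = slope * 2000 + intercept
print(f"Estimated price of a 2000 sq ft house: ${predicted:,.0f}")
```

With real data you would reach for a library, but the underlying idea is exactly this: draw the line, then read a y off it for any x.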

[Figure: a basic linear-regression example, reproduced from another Medium article]

Machine learning algorithms can have more than one input dimension. To get a better estimate of a house’s price, we might want to also look at whether it has a garage. A house with a garage will probably cost more than one without a garage. Other possible input dimensions might include the number of bathrooms or the age of the house. Generally, the more informative dimensions our model has, the better the price estimate, at least up to a point.

Most algorithmic stock trading uses machine learning. That’s not what this is about though. Building an algorithmic trading bot is a naive pipe-dream of many novice ML students. While it might have been possible 20 years ago, today, there are thousands of algo-trading houses with incredibly complex models and exotic data inputs. It is simply too difficult to build an effective trading algorithm with off-the-shelf data. There is too much competition and the market is too efficient.

Instead, I decided to approach the problem from a different angle. What if, instead of predicting the future, I predicted the present?


The FFER (Fundamental Fitted Estimate Ratio) is the ratio between a stock’s actual price and its predicted price. Almost by definition, this is a useful ratio. If a stock’s actual price is 2x more than its predicted price, that implies a 2x overvaluation. But how do we predict the price of a stock? We can use the same method as we would use to predict the price of a house — machine learning.
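The ratio itself is trivial to compute; all of the work is in producing the predicted price. As a sketch (the function name is mine, not code from the FFER project):

```python
def ffer(actual: float, predicted: float) -> float:
    """Ratio of a stock's actual price (or market cap) to its
    model-predicted value. Above 1.0 suggests overvaluation,
    below 1.0 suggests undervaluation."""
    if predicted <= 0:
        raise ValueError("predicted value must be positive")
    return actual / predicted
```

So a stock trading at $200 with a predicted price of $100 gets an FFER of 2.0, the 2x overvaluation described above.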

To build my model, I used a dataset of the 1500 stocks in the S&P 1500. To simplify, instead of the per-share price, I used the market cap for my output dimension. For inputs, I used 16 different fundamental financial figures. These 16 inputs were (roughly in order of importance):

  • Net income
  • Total revenue
  • Total assets
  • Total liabilities
  • Operating income
  • Dividends paid
  • Cash
  • Return on assets
  • Debt ratio
  • Asset turnover ratio
  • Cash flow to debt ratio
  • Quarterly change in cash
  • Yearly change in revenue
  • Yearly change in return on assets
  • Yearly change in sales on assets
  • Yearly change in working capital

(Note: I have iterated on the model since this article was published and there are now a different number of inputs.)

Why 16? Well, I started with 60. The problem with having many dimensions is the tendency to “overfit.” Overfitting is what happens when your ML model fits to every nook, outlier, and weird exception. By contrast, a good ML model captures only the “essence” of the data. And with 60 inputs and only ~1,500 stocks, the model was deeply overfit. Luckily, it was possible to rank the importance of the input dimensions in my model. So, to fix the problem, I repeatedly removed the least important dimension and re-trained. Each iteration generally lowered the “test error” until I found a minimum at 16 dimensions.
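That elimination loop can be sketched generically. Here `train_and_score` stands in for training the model on a feature subset and returning its test error along with per-feature importances; it is a placeholder of my own, not code from the actual project:

```python
def prune_features(features, train_and_score):
    """Backward feature elimination: repeatedly retrain without the
    least-important feature for as long as the test error improves.

    `train_and_score(feats)` must return (test_error, importances),
    where `importances` maps each feature name to a score.
    """
    best = list(features)
    best_err, importances = train_and_score(best)
    while len(best) > 1:
        # Drop the feature the current model ranks least important
        weakest = min(importances, key=importances.get)
        candidate = [f for f in best if f != weakest]
        err, imp = train_and_score(candidate)
        if err >= best_err:
            break  # removing more features no longer helps
        best, best_err, importances = candidate, err, imp
    return best, best_err
```

This greedy drop-one-retrain strategy is the same idea as scikit-learn’s recursive feature elimination, just written out by hand.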

For the actual algorithm, I used 100 XGBoost models trained with identical data. I then averaged the model outputs to get a final price prediction. This strategy is known as an “ensemble.” In this case, the ensemble was important for “smoothing” the price prediction. In addition to XGBoost, I experimented with other algorithms like SVMs and neural networks. However, XGBoost consistently performed as well as or better than the alternatives. Given that XGBoost is also a relatively “lightweight” algorithm, it seemed like the appropriate choice.
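The averaging step itself is simple. Conceptually (assuming each trained model exposes a `predict` method, as XGBoost’s scikit-learn wrapper does; how the 100 models were varied between runs is not spelled out in the article):

```python
from statistics import mean

def ensemble_predict(models, features):
    """Final prediction = the average of every model's prediction.
    Averaging many independently trained models smooths out the
    quirks any single model picks up from the training data."""
    return mean(m.predict(features) for m in models)
```

Any objects with a compatible `predict` method work here, which also makes the smoothing behavior easy to verify with stubs.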


[Table: a selection of stocks and their FFERs from January 4th, the first trading day of 2021.]

I also generated a much larger table, recalculated daily. In general, the numbers “feel” correct. Tesla has the highest FFER of the S&P 500 followed by trendy smaller tech companies. At the opposite end, the stocks with the lowest FFERs are generally experiencing financial difficulty.

Using the FFER for investment decisions

Ultimately, how you use the FFER is up to you. To address what might be your first question, I have tried using backdated FFERs to predict future returns. The results are the opposite of what I would have expected. While the relationship is somewhat weak, stocks with higher historical FFERs tend to generate better returns. I have several hypotheses for why this might be. Expect that in another post.

Personally, to make my future investment decisions, I plan to combine the FFER with qualitative intuition. This strategy is well illustrated by a Warren Buffett quote:

“It’s far better to buy a wonderful company at a fair price than a fair company at a wonderful price.”

In other words, I plan to scour my FFER table looking for “wonderful” companies that, nevertheless, seem to be cheap.

Scott Rogowski co-publishes the FFER and maintains fastmap. This article was originally published at



