Return Forecasting with Moments, Part 3: The importance of T scores

NTTP
6 min read · Dec 5, 2023


T? Or gamma, in a mirror, darkly? That is the question… Photo by Gustavo Sánchez on Unsplash

Lest analysts doubt the importance of coefficient T scores when deciding which variables to keep in or drop from a model such as this one, we provide a counter-example here to show that these T scores are indeed important.

When you finish reading this article, check out Part 4 of this series, where we add moments from SPY returns to boost our TSLA forecast quality:

https://medium.com/@nttp/return-forecasting-via-moments-part-4-vectorization-of-input-streams-8e2004e051af

Here we take our model setup from the prior article and replace one candidate predictor (200 day mean return in this case) with a column of random numbers. First, we do this with a constant column (a random column that stays the same over all backtests). This can be achieved by replacing the formula in the mean column D cells with =RAND(). Then we copy that newly generated random column and “paste special” as “values” into that same column. This becomes a nice column of fixed random numbers.
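For readers following along outside of Excel, the difference between a pasted-as-values random column and a live =RAND() column can be sketched in Python (a hypothetical stand-in for the sheet, not the actual workbook):

```python
import random

random.seed(42)  # for reproducibility of the "pasted" column

N_ROWS = 252  # one value per trading day in the backtest window

# "Paste special" as "values": generate the random column once, then
# reuse the exact same numbers at every step of the backtest.
fixed_random_col = [random.random() for _ in range(N_ROWS)]

# A live =RAND() column: regenerated every time the sheet recalculates.
def live_random_col():
    return [random.random() for _ in range(N_ROWS)]

# The fixed column never changes; the live column changes on every call.
print(live_random_col()[0] != live_random_col()[0])
```

The first setup is what we test below; the second (live) variant comes later in the article.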

Then we set the T cutoff to 0 at J18 (telling the backtester not to remove any variables) and run a 252 day backtest. Next, we examine the output T scores from each step of the backtest for this particular variable (still labeled “mean” but now actually our random column), which is stored way out at column AG (second from the end of the column sequence holding the backtest T scores). The last column of that sequence, AH, holds the T score of the model’s constant term.

We plot AG, the T scores of the random column here:

Figure 1: T scores per backtest for the fixed randomly generated X variable column

The LINEST function does its level best to include this random column in the model and computes a coefficient for it at each step of the backtest. It also computes the associated standard error, and then we compute the T score as coeff/stderr in the sheet, as described in our prior articles about this sheet.

Notice a couple of things. First, the T scores seem to stay within the range of -1.8 to 1.0. This is a larger range than we had expected when setting up this test; we thought the T scores would stay below 1.0 in absolute value most of the time. This strongly demonstrates the hazard of keeping a variable in a model when its T score is much less than 2. Even a random column of data gives abs(T) > 1 more often than is comfortable during this 1 year backtest. Do we think that this random sequence has any predictive power for our output? No, of course not. It is random: by definition, disconnected from everything in the past and the future.

But also notice the wave pattern in the T score data… the wave pattern does not seem random at all. This suggests that the linear model can latch onto whatever accidental patterns exist in the random sequence (and there may be patterns, though patterns with no predictive quality to them) and tie those accidental patterns to the patterns in the observed phenomenon, the target 5 day return that we are trying to model.

If you think this is unusual for a linear model, wait until you try this with TensorFlow. TF models can be (practically) infinitely flexible and will fit any random data thrown at them, if the analyst lets them. Hence the need for cross validation, feature selection, and so on in ML models. The great thing about these linear models (with standard errors reported per term) is that we can discard variables via T score methods like this without resorting to time-consuming cross validation procedures.
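To make the coeff/stderr computation concrete, here is a minimal Python sketch of the one-predictor case of what LINEST does, with a random predictor and an unrelated random target; all names and sizes are hypothetical:

```python
import math
import random

random.seed(0)

n = 60
x = [random.gauss(0, 1) for _ in range(n)]  # the random "predictor" column
y = [random.gauss(0, 1) for _ in range(n)]  # an unrelated random "target"

# Simple OLS fit y = a + b*x (the one-variable case of LINEST)
mx = sum(x) / n
my = sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
b = sxy / sxx
a = my - b * mx

# Residual sum of squares, then the standard error of the slope b
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
stderr_b = math.sqrt(sse / (n - 2) / sxx)

# The T score the article discusses: coefficient divided by its stderr
t_score = b / stderr_b
print(round(t_score, 3))
```

Running this with different seeds shows the same behavior as the figure: the T score of a purely random predictor is usually small, but not always.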

Now we might make the case that T scores of around 2 are okay if they are consistent during the backtest, rather than wandering all over the place as in this example. That seems reasonable. Still, the situation gives us pause, right? What kind of model are we building if a random column shows up as even remotely predictive!?

Next, we allow the RAND() randomizer to change the random values every time the sheet is recalculated (which happens many times while we are backtesting). Just put the formula =RAND() in every cell of column D (aside from the header at row 1, of course) and re-run the backtest.

Now we plot column AG again, the T scores of this continually changing random column:

Figure 2: T scores for random variable X column, RAND() regenerated often during the backtesting (at least once per every step of the backtest). Horizontal axis = backtest count. Some T scores are uncomfortably outside the +- 2 range, implying that the random column has predictive ability on the outcome during those particular backtests… uh oh…

Two things stand out: A) this output seems much more random than in the prior test (as we expected the first test to be), and B) the range of T scores is even greater than in our first test, jumping past +-2 on more than a few occasions. Counting by inspection, it looks like 11 of the 252 backtest trials gave T scores outside the +-2 range. That is 4.4%, slightly less than the p-value cutoff of 0.05 (5%) sometimes recommended in model building. Hmm, maybe those statisticians are on to something… From theory, we know that even if we use an absolute T cutoff of 2, there is still a chance that a variable which passes it is not truly significant and should have been removed from the model. We are not going to report the particular p-value that corresponds to this T=2 because, again… it gets confusing fast. If you don’t believe me, look at this:
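That roughly-5% exceedance rate is easy to check by simulation. A pure Python sketch, with hypothetical parameter choices (60 observations per regression, 2000 trials):

```python
import math
import random

random.seed(1)

def t_score_random_pair(n):
    """T score (coeff / stderr) of a random x regressed on an unrelated random y."""
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return b / math.sqrt(sse / (n - 2) / sxx)

trials = 2000
n_obs = 60

# Count how often a purely random predictor still clears the |T| > 2 bar
exceed = sum(1 for _ in range(trials) if abs(t_score_random_pair(n_obs)) > 2)
print(exceed / trials)  # close to 0.05 in theory
```

The fraction lands near 5%, matching the two-tailed p-value story for a cutoff of about 2.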

https://www.socscistatistics.com/pvalues/tdistribution.aspx

Do you mean to use a “one tailed” hypothesis or a “two tailed” one? What are your “degrees of freedom”? Do you count the degrees of freedom before or after you eliminate variables? Oh wait, no, this is the wrong math, this is the wrong type of p-value, this is for a paired t-test comparing means of distributions… or is it? Wait, I forget… is the t in a lower case t-test the same as the T in an upper case T score? You need p-values for regression. And so on. All that setup in Excel, and now they want to ask us about 1 or 2 tails? It’s just too much for now, right?
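That said, when the degrees of freedom are large, the t distribution is close to normal, so a rough two-tailed p-value for a given T score can be sketched with just the Python standard library (an approximation, not the exact regression p-value):

```python
from statistics import NormalDist

def two_tailed_p_approx(t_score):
    """Approximate two-tailed p-value via the normal distribution.

    Reasonable when degrees of freedom are large; the exact value
    would come from the t distribution with the model's residual dof.
    """
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_score)))

p_two = two_tailed_p_approx(2.0)
p_one = p_two / 2  # the one-tailed p-value is half the two-tailed one
print(round(p_two, 4))  # 0.0455
```

This also answers the one-vs-two tails question operationally: the two-tailed value doubles the single tail, and for our "is this coefficient different from zero in either direction" question, two tails is the relevant one.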

But here we see visually, in an experimental test with data we care about, that T scores of even greater than 2 can come from a completely random variable with no connection to the response we are trying to predict. Wow. The author did not expect this to be the case. With a random variable, we figured that the T scores would rumble around close to zero, probably less than 1 in absolute value. But nope.

So we have experimental evidence in a model we are interested in, not in some unrelated abstract textbook situation, that T scores need to be studied thoroughly for this type of model. If you do choose to keep variables with T scores of around 2 in a model, you had better test them thoroughly to make sure you are not just accidentally getting a good result from spurious correlations, and that the T score is consistent over time. Because that is what seems to be happening here: the random generator occasionally happens to make a pattern (yes, random generators can output patterns) that matches the pattern in the target Y data, and boom! Our linear regressor finds this pattern, computes a coefficient for it, and outputs a decent T score. Wow, right?
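The consistency-over-time check can itself be simulated: compute the T score of a fixed random predictor over rolling windows, one per backtest step, and watch it wander. A pure Python sketch with hypothetical window sizes:

```python
import math
import random

random.seed(7)

WINDOW = 60    # hypothetical lookback per regression
N_STEPS = 252  # one T score per backtest step, as in the figures
total = N_STEPS + WINDOW - 1

x = [random.gauss(0, 1) for _ in range(total)]  # fixed random "predictor"
y = [random.gauss(0, 1) for _ in range(total)]  # unrelated "target" series

def t_score(xs, ys):
    """T score (coeff / stderr) of a simple OLS fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((v - (a + b * u)) ** 2 for u, v in zip(xs, ys))
    return b / math.sqrt(sse / (n - 2) / sxx)

# One T score per rolling window: this is the analogue of column AG
scores = [t_score(x[i:i + WINDOW], y[i:i + WINDOW]) for i in range(N_STEPS)]
print(min(scores), max(scores))  # the T score wanders from window to window
```

A variable worth keeping should show a T score that stays comfortably large across windows, not one that drifts through zero the way a random column's does.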

Hence:

T scores are valuable in linear modeling; proof by random variable example. Well… maybe not “proof” proof, but… evidence.
