Interesting writeup, but very little discussion of actual markets.
Have you looked into overfitting? It also goes by p-hacking, the garden of forking paths, etc. Search for the XKCD comic on significance and you'll see the problem.
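To see why this bites, here's a toy simulation (entirely made-up numbers, just to illustrate the multiple-comparisons point): generate a bunch of strategies with zero true edge, t-test each one naively, and the best of them will look "significant" by luck alone.

```python
# Toy multiple-comparisons demo: many zero-edge strategies, naive t-tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_strategies, n_days = 200, 252

# 200 strategies whose daily returns are pure noise (no true edge at all).
returns = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))

# Naive per-strategy t-test of mean return against zero.
t, p = stats.ttest_1samp(returns, 0.0, axis=1)

# With 200 independent tests at the 5% level, roughly 10 "discoveries"
# are expected by chance, and the minimum p-value looks impressive.
print(f"best strategy p-value: {p.min():.4f}")
print(f"strategies 'significant' at 5%: {(p < 0.05).sum()}")
```

Selecting the best backtest out of many and then reporting its stand-alone p-value is exactly the charlatanism those papers complain about.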
It's a real issue in financial markets, and it undermines even simple regressions. Recent papers discuss it, including Bailey et al., "Financial Charlatanism…" and Harvey-Liu, "Backtesting".
It's also a problem in Kaggle competitions! White's Reality Check (White 2000) and its variants (e.g., the Model Confidence Set) are used quite a lot by algo traders and time-series analysts, Diebold-Mariano is used by econometricians, and the Reusable Holdout is getting some traction in other applications.
AFAIK, ML works better when the cross-sectional dimension is large, the features are not too correlated, and the signal-to-noise ratio is high. If you are using market data alone, the features are highly correlated and the signal-to-noise ratio is small. There are datasets in certain financial markets that may be pertinent here.
The question of whether an ML model that does so well in-sample (IS) will continue to work out-of-sample (OOS) is known as robust ML, and it's relatively new. My sense is that overfitting is a big issue in most ML problems too.
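A toy example of that regime (entirely simulated, my own assumptions about the data-generating process, not anyone's real strategy): one common factor makes the features highly correlated, the true signal is tiny, and plain OLS produces a flattering in-sample fit that falls apart out of sample.

```python
# Correlated features + low signal-to-noise: IS fit vs OOS fit.
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test, n_feat = 120, 120, 50

def make_X(n):
    # One common factor drives everything; idiosyncratic noise is small,
    # so pairwise feature correlations are very high (market-data-like).
    common = rng.normal(size=(n, 1))
    return common + 0.1 * rng.normal(size=(n, n_feat))

X_tr, X_te = make_X(n_train), make_X(n_test)
beta = np.zeros(n_feat)
beta[0] = 0.1                                  # tiny true signal
y_tr = X_tr @ beta + rng.normal(size=n_train)  # noise dominates the target
y_te = X_te @ beta + rng.normal(size=n_test)

# Plain least squares with many features relative to samples.
coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

def r_squared(X, y):
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

is_r2, oos_r2 = r_squared(X_tr, y_tr), r_squared(X_te, y_te)
print(f"in-sample R^2:     {is_r2:.2f}")   # flattering
print(f"out-of-sample R^2: {oos_r2:.2f}")  # typically near or below zero
```

The gap between the two R^2 numbers is the overfitting; in this regime the IS number tells you almost nothing about OOS performance.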
A lot of people have ML tools, and a lot of people look at markets.
Now, maybe you have some rationale for using your models even if they fail the various robustness checks. Can they continue to predict well for a short period even if they are ultimately not good longer-term predictors? Are the various robustness tests appropriate? This is a matter for deep thought.