Notes on the Numerai ML Competition
Jim Fleming

Thanks for sharing this. Extremely informative. I have just re-read this article after reading it for the first time a few months back, and I have two more questions regarding the use of t-SNE, over and above the ones I asked you a few months back.

  1. When you say you ‘added’ the t-SNE features to the existing 21 features from the Numerai data, what do you mean exactly?
    - Do you run t-SNE in, say, 2D, then take the resulting x and y coordinates and add them as two extra columns, ending up with 23 features (the original 21 plus the new x and y from t-SNE)?
    - And for the case you describe above, where you ran t-SNE five times in 2D and once in 3D, do you add all of those features, so that the new feature set is the 21 given Numerai features + 5*2 2D t-SNE features + 1*3 3D t-SNE features = 34 features in total, all of which are then fed to the ensemble? That is:
    - feature 1
    - feature 2
    - feature 3
    - ...
    - feature 21

    - tsne 2D x1
    - tsne 2D y1
    - tsne 2D x2
    - tsne 2D y2
    - tsne 2D x3
    - tsne 2D y3
    - ...
    - tsne 2D x5
    - tsne 2D y5

    - tsne 3D x
    - tsne 3D y
    - tsne 3D z
  2. A few months back I asked about using a parametric implementation of t-SNE to get a reusable mapping that can then be applied to new data. However, after re-reading your article, I get the impression that you did not use a parametric version of t-SNE for this experiment. If I’m reading correctly, you just used the standard (visualization) version of t-SNE and then appended the resulting features to the original 21. But if you do that, the trained model will expect those additional features in the live/tournament dataset. You could run t-SNE the same way on the live data and add the resulting features, but since it is not a parametric implementation, the new t-SNE features will not be mapped in the same way as the ones computed for the training data.
    Am I reading what you did correctly, or did I miss something?
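
To make question 1 concrete, here is a rough sketch of what I imagine the feature construction might look like. This is only my guess, not necessarily what the article did: it assumes scikit-learn's `TSNE`, uses random numbers as a stand-in for the 21 Numerai features, and assumes the five 2D runs differ only by random seed (they could just as well differ in perplexity). The function name `add_tsne_features` and its parameters are made up for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def add_tsne_features(X, n_runs_2d=5, perplexity=10.0, seed=0):
    """Append t-SNE coordinates to X as extra columns (hypothetical helper)."""
    blocks = [X]

    # Assumption: each 2D run uses a different seed -> 2 new columns per run.
    for run in range(n_runs_2d):
        tsne_2d = TSNE(n_components=2, perplexity=perplexity,
                       random_state=seed + run)
        blocks.append(tsne_2d.fit_transform(X))

    # One 3D run -> 3 new columns.
    tsne_3d = TSNE(n_components=3, perplexity=perplexity,
                   random_state=seed + n_runs_2d)
    blocks.append(tsne_3d.fit_transform(X))

    # Original features first, then the t-SNE columns, side by side.
    return np.hstack(blocks)


rng = np.random.default_rng(0)
X = rng.random((60, 21))      # synthetic stand-in for the 21 Numerai features
X_aug = add_tsne_features(X)
print(X_aug.shape)            # (60, 34): 21 + 5*2 + 1*3
```

This also illustrates question 2: scikit-learn's `TSNE` only offers `fit` and `fit_transform`, with no `transform` for unseen rows, so an embedding fitted on the training rows cannot simply be reused on the live/tournament rows.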