I still think that there is data leakage.
The prepare_windows function is used to create time windows for both the input data and the target labels. However, in both cases the same DataFrame (data_x_df) is passed as the first argument, and that DataFrame includes the binary target variable (Close_target). The input therefore contains the exact information the model is trying to predict, which turns training into a trivial task: the model simply learns to copy the target variable from the input to the output.
As a result, the model achieves near-perfect accuracy and other performance metrics during training and validation, but that performance will not generalize to real-world scenarios where the future target values are unknown.
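To see why, here is a minimal, self-contained sketch (hypothetical data, not your pipeline) showing that any model given the label as an input feature scores perfectly, even when every other feature is pure noise:

```python
# Sketch: a depth-1 decision tree on random noise, with and without the
# label leaked into the input features. Column names are made up.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)      # binary target
noise = rng.normal(size=(1000, 3))     # uninformative features

# Leaky inputs: the target itself is one of the feature columns.
X_leaky = np.column_stack([noise, y])
leaky_score = DecisionTreeClassifier(max_depth=1).fit(X_leaky, y).score(X_leaky, y)

# Clean inputs: target removed.
clean_score = DecisionTreeClassifier(max_depth=1).fit(noise, y).score(noise, y)

print(leaky_score)  # 1.0 -- the tree just splits on the leaked label column
print(clean_score)  # near chance level on pure noise
```

The same thing happens with a TCN or any other model: the leaked column makes the task trivial regardless of architecture.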
This is how your code was written:
train_data, train_exog_data, ytrain_data = prepare_windows(data_x_df, data_x_df[TARGET_LABEL])
test_data, test_exog_data, ytest_data = prepare_windows(data_t_df, data_t_df[TARGET_LABEL])
This is how I changed it to avoid the issue described above:
data_x_df_no_target = data_x_df.drop(columns=[TARGET_LABEL])
data_t_df_no_target = data_t_df.drop(columns=[TARGET_LABEL])
train_data, train_exog_data, ytrain_data = prepare_windows(data_x_df_no_target, data_x_df[TARGET_LABEL])
test_data, test_exog_data, ytest_data = prepare_windows(data_t_df_no_target, data_t_df[TARGET_LABEL])
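As a quick sanity check, this is the pattern with dummy data (hypothetical column names; only TARGET_LABEL matches your code). An assertion like this catches the leak before any windowing happens:

```python
# Self-contained sketch of the fix, with dummy data standing in for data_x_df.
import numpy as np
import pandas as pd

TARGET_LABEL = "Close_target"
rng = np.random.default_rng(0)
data_x_df = pd.DataFrame({
    "Close": rng.normal(size=100),
    "Volume": rng.normal(size=100),
    TARGET_LABEL: rng.integers(0, 2, size=100),
})

# Drop the target from the inputs; pass it separately as the labels.
data_x_df_no_target = data_x_df.drop(columns=[TARGET_LABEL])

# Guard against the target sneaking back into the inputs.
assert TARGET_LABEL not in data_x_df_no_target.columns, "target leaked into inputs"
print(list(data_x_df_no_target.columns))  # ['Close', 'Volume']
```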
Close_target is also inside SELECTED_FEATURES, so it ends up appended to the training windows here:
input_window = data_df[prime_ts].iloc[i : i + window_size].values
X.append(input_window)
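For reference, a leakage-free version of that windowing logic might look like the sketch below. This is not your prepare_windows (I'm assuming the label for each window is the target value at the step right after it, and I'm omitting the exogenous output your function also returns); the point is that feature columns and the target come from separate arguments and the target column is excluded from the inputs:

```python
# Sketch of a leakage-free windowing helper (assumed semantics, not the
# actual prepare_windows from the repo).
import numpy as np
import pandas as pd

def prepare_windows_sketch(data_df: pd.DataFrame, target: pd.Series, window_size: int):
    # Build input windows from feature columns only; exclude the target
    # column even if it is present in data_df.
    feature_cols = [c for c in data_df.columns if c != target.name]
    X, y = [], []
    for i in range(len(data_df) - window_size):
        X.append(data_df[feature_cols].iloc[i : i + window_size].values)
        y.append(target.iloc[i + window_size])  # label strictly after the window
    return np.array(X), np.array(y)

# Tiny usage example with dummy data.
df = pd.DataFrame({"a": range(10), "b": range(10), "t": [0, 1] * 5})
X, y = prepare_windows_sketch(df, df["t"], window_size=3)
print(X.shape, y.shape)  # (7, 3, 2) (7,)
```

Because feature_cols filters out target.name, the windows stay clean even if the caller forgets to drop the target column first.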
If you fix those things (as I did), you will see that the model performs very poorly: nowhere near 0.99 accuracy and precision. It is basically useless.
I have not checked the TCN model for missing causal layers, but I can say with complete certainty that there is data leakage.
The test metrics I got after making only the changes described in this comment are:
Accuracy    Precision   Recall    F1 Score   ROC AUC
0.720984    0.0         0.0       0.55054    0.504532