I still think that there is data leakage.
The prepare_windows function is used to create time windows for both the input data and the target labels. However, in both cases the same DataFrame (data_x_df) is passed as the first argument, and that DataFrame includes the binary target variable (Close_target). The input therefore contains the exact information the model is trying to predict, which turns training into a trivial task: the model simply learns to copy the target variable from the input to the output.
As a result, the model achieves near-perfect accuracy and other performance metrics during training and validation, but that performance will not generalize to real-world scenarios where the future target values are unknown.
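To see why, here is a minimal, self-contained sketch (hypothetical data, not your pipeline) showing that any model given the label as an input feature scores perfectly, even when every other feature is pure noise:

```python
# Sketch: a depth-1 decision tree on random noise, with and without the
# label leaked into the input features. Column names are made up.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)      # binary target
noise = rng.normal(size=(1000, 3))     # uninformative features

# Leaky inputs: the target itself is one of the feature columns.
X_leaky = np.column_stack([noise, y])
leaky_score = DecisionTreeClassifier(max_depth=1).fit(X_leaky, y).score(X_leaky, y)

# Clean inputs: target removed.
clean_score = DecisionTreeClassifier(max_depth=1).fit(noise, y).score(noise, y)

print(leaky_score)  # 1.0 -- the tree just splits on the leaked label column
print(clean_score)  # near chance level on pure noise
```

The same thing happens with a TCN or any other model: the leaked column makes the task trivial regardless of architecture.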
This is how your code was written:
train_data, train_exog_data, ytrain_data = prepare_windows(data_x_df, data_x_df[TARGET_LABEL])
test_data, test_exog_data, ytest_data = prepare_windows(data_t_df, data_t_df[TARGET_LABEL])
This is how I changed it to avoid the issue described above:
data_x_df_no_target = data_x_df.drop(columns=[TARGET_LABEL])
data_t_df_no_target = data_t_df.drop(columns=[TARGET_LABEL])
train_data, train_exog_data, ytrain_data = prepare_windows(data_x_df_no_target, data_x_df[TARGET_LABEL])
test_data, test_exog_data, ytest_data = prepare_windows(data_t_df_no_target, data_t_df[TARGET_LABEL])
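As a quick sanity check, this is the pattern with dummy data (hypothetical column names; only TARGET_LABEL matches your code). An assertion like this catches the leak before any windowing happens:

```python
# Self-contained sketch of the fix, with dummy data standing in for data_x_df.
import numpy as np
import pandas as pd

TARGET_LABEL = "Close_target"
rng = np.random.default_rng(0)
data_x_df = pd.DataFrame({
    "Close": rng.normal(size=100),
    "Volume": rng.normal(size=100),
    TARGET_LABEL: rng.integers(0, 2, size=100),
})

# Drop the target from the inputs; pass it separately as the labels.
data_x_df_no_target = data_x_df.drop(columns=[TARGET_LABEL])

# Guard against the target sneaking back into the inputs.
assert TARGET_LABEL not in data_x_df_no_target.columns, "target leaked into inputs"
print(list(data_x_df_no_target.columns))  # ['Close', 'Volume']
```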
Close_target is also inside SELECTED_FEATURES, so it ends up appended to the training windows here:
input_window = data_df[prime_ts].iloc[i : i + window_size].values
X.append(input_window)
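For reference, a leakage-free version of that windowing logic might look like the sketch below. This is not your prepare_windows (I'm assuming the label for each window is the target value at the step right after it, and I'm omitting the exogenous output your function also returns); the point is that feature columns and the target come from separate arguments and the target column is excluded from the inputs:

```python
# Sketch of a leakage-free windowing helper (assumed semantics, not the
# actual prepare_windows from the repo).
import numpy as np
import pandas as pd

def prepare_windows_sketch(data_df: pd.DataFrame, target: pd.Series, window_size: int):
    # Build input windows from feature columns only; exclude the target
    # column even if it is present in data_df.
    feature_cols = [c for c in data_df.columns if c != target.name]
    X, y = [], []
    for i in range(len(data_df) - window_size):
        X.append(data_df[feature_cols].iloc[i : i + window_size].values)
        y.append(target.iloc[i + window_size])  # label strictly after the window
    return np.array(X), np.array(y)

# Tiny usage example with dummy data.
df = pd.DataFrame({"a": range(10), "b": range(10), "t": [0, 1] * 5})
X, y = prepare_windows_sketch(df, df["t"], window_size=3)
print(X.shape, y.shape)  # (7, 3, 2) (7,)
```

Because feature_cols filters out target.name, the windows stay clean even if the caller forgets to drop the target column first.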
If you fix those things (as I did), you will see that the model performs very poorly: nowhere near 0.99 accuracy and precision. It is basically useless.
I have not checked the TCN model for missing causal layers, but I can say with complete certainty that there is data leakage.
The test metrics I got after making only the changes described in this comment are:
Accuracy    Precision   Recall    F1 Score   ROC AUC
0.720984    0.0         0.0       0.55054    0.504532