Day-Trading with AI: When to Hold, When to Fold, and When to Not Play!

Market Clustering with Transdimensional Machine Learning

Andy Carl
5 min read · Nov 16, 2020
Photo by clearviewstock on iStock.

TLDR:

Characterizing day-trading markets with TML enables tailoring strategies to specific market “personalities”. Those “personalities” suggest that profitable trading is effectively impossible on roughly 40 percent of trading days!

Background Post: The New AI Gold Rush — Transdimensional Machine Learning (Pan Provided!)

A Jupyter Notebook containing the code is provided in the GitHub repository HERE.

An Example:

  • For a given underlying instrument, 10-minute OHLC candles were built from tick data covering an approximately 10-year period.
  • Backtesting was performed on the candle data using a baseline “Buy/Sell” indicator for nine (9) cases per day, spanning fixed stop-loss, ATR-based stop-loss, and no-stop-loss variants.
  • Each case was optimized over the baseline indicator’s hyperparameter, asymmetrically for LONG and SHORT positions and for both BEST and WORST day P/L, recording the max and min day-trading P/L, the number of trades, and the associated optimal hyperparameter value.
  • For each day, the results of the nine (9) cases were averaged to form that day’s “raw-data” vector.
  • Preliminary data exploration using TML established the probable number of underlying clusters to be approximately nine (9).
  • The “Fitness Function” was modified to focus attention on nine-cluster solutions during the Hybrid-NEAT evolutionary process.
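The per-day vector construction described above can be sketched as follows. This is illustrative only: the array shapes and the random stand-in data are assumptions, not the article’s actual backtest output.

```python
import numpy as np

# Hypothetical backtest output: for each trading day, nine stop-loss cases,
# each recording the same 12 metrics (max/min P/L, trade counts, optimal
# hyperparameter values, ...). Shapes are assumptions for illustration.
rng = np.random.default_rng(0)
n_days, n_cases, n_metrics = 2500, 9, 12
case_results = rng.normal(size=(n_days, n_cases, n_metrics))

# Average the nine cases to form one 12-D "raw-data" vector per day.
raw_data = case_results.mean(axis=1)
print(raw_data.shape)  # (2500, 12)
```

Each day thus contributes a single 12-dimensional point to the clustering input, regardless of how many backtest cases were run.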

Resulting “Typical” Day-Trading Market “Personalities” by relative size:

“BEFORE” Applying TML (12-D to 2-D via tSNE):

import time

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE

time_start = time.time()
tsne = TSNE(n_components=2, verbose=1, perplexity=100, n_iter=1000)
tsne_results_orig = tsne.fit_transform(data_orig)
print('t-SNE done! Time elapsed: {} seconds'.format(time.time() - time_start))

df_subset['tsne-2d-one'] = tsne_results_orig[:, 0]
df_subset['tsne-2d-two'] = tsne_results_orig[:, 1]

plt.figure(figsize=(16, 10))
sns.scatterplot(
    x="tsne-2d-one", y="tsne-2d-two",
    hue="y",
    palette=['purple', 'red', 'darkcyan', 'brown', 'blue',
             'dodgerblue', 'green', 'lightgreen', 'black'],
    data=df_subset,
    legend="full",
    alpha=0.3,
)

Applying TML (12-D to 1000-D via TML):

...
metric = "jaccard"
n_neighbors_max = 100
n_neighbors_min = 2
min_dist_max = 0.99
min_dist_min = 0.0
n_components_max = 1000
n_components_min = 1
min_samples_max = 1000
min_samples_min = 2
min_cluster_size_max = 2
min_cluster_size_min = 2
...
if num_clusters_found == 9:
    genome.fitness = 10000.0 / abs(clustered_COMB_sum_SE + 1)
elif num_clusters_found == 0:
    genome.fitness = -99999.0
else:
    genome.fitness = 10000.0 / abs(clustered_COMB_sum_SE + 1) - abs(num_clusters_found - 9) * 1000.0
...
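The fitness shaping above can be pulled out into a standalone function (a hypothetical helper, not part of the original notebook) to sanity-check the penalty behaviour. Plugging in the values from the 12-cluster run logged below reproduces its fitness:

```python
# Standalone version of the fitness shaping above (hypothetical helper,
# not from the original notebook) for checking penalty behaviour.
def shaped_fitness(num_clusters_found, clustered_comb_sum_se, target=9):
    """Reward tight clustering; heavily penalize missing the target count."""
    if num_clusters_found == 0:
        return -99999.0
    base = 10000.0 / abs(clustered_comb_sum_se + 1)
    if num_clusters_found == target:
        return base
    # 1000-point penalty per cluster away from the target count
    return base - abs(num_clusters_found - target) * 1000.0

print(shaped_fitness(12, 655.644833220715))  # ≈ -2984.771
```

The 1000-point-per-cluster penalty dominates the reward term, so the evolutionary search is pushed hard toward exactly nine clusters before it starts minimizing the combined squared error.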
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
New best_fitness_so_far = -2984.7710672587614 1
New best: metric = jaccard
New best: n_neighbors = 98
New best: min_dist = 0.06658783809866256
New best: n_components = 1000
New best: min_samples = 3
New best: min_cluster_size = 2
New best: cluster_selection_epsilon = 0.6658783809866257
OUT: num_clusters_found = 12
OUT: ratio_clustered = 1.0
OUT: clusterer_probabilities_sum = 0.9558447965277097
OUT: clusterer_probabilities_sum_SE = 184.0931575208609
OUT: clusterer_outlier_scores_sum = 0.13366803680011266
OUT: clusterer_outlier_scores_sum_SE = 471.55167569985406
OUT: clustered_COMB_sum_SE = 655.644833220715
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
New best_fitness_so_far = 182.75239956493343 104
New best: metric = jaccard
New best: n_neighbors = 100
New best: min_dist = 0.9899882983797373
New best: n_components = 1000
New best: min_samples = 2
New best: min_cluster_size = 2
New best: cluster_selection_epsilon = 9.899882983797372
OUT: num_clusters_found = 9
OUT: ratio_clustered = 1.0
OUT: clusterer_probabilities_sum = 0.9978606463926271
OUT: clusterer_probabilities_sum_SE = 1.649803561162849
OUT: clusterer_outlier_scores_sum = 0.03316588079964379
OUT: clusterer_outlier_scores_sum_SE = 52.31724504377773
OUT: clustered_COMB_sum_SE = 53.96704860494058
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

“AFTER” Applying TML (1000-D to 2-D via tSNE):

time_start = time.time()
raw_data = fit_HDBSCAN._raw_data
tsne = TSNE(n_components=2, verbose=1, perplexity=100, n_iter=1000)
tsne_results_1 = tsne.fit_transform(raw_data)
print('t-SNE done! Time elapsed: {} seconds'.format(time.time() - time_start))

df_subset['tsne-2d-one'] = tsne_results_1[:, 0]
df_subset['tsne-2d-two'] = tsne_results_1[:, 1]

plt.figure(figsize=(16, 10))
sns.scatterplot(
    x="tsne-2d-one", y="tsne-2d-two",
    hue="y",
    palette=['purple', 'red', 'darkcyan', 'brown', 'blue',
             'dodgerblue', 'green', 'lightgreen', 'black'],
    data=df_subset,
    legend="full",
    alpha=0.3,
)
...
unique_elements, counts_elements = np.unique(fit_HDBSCAN.labels_, return_counts=True)
print("Frequency of unique values of the said array:")
print(np.asarray((unique_elements, counts_elements)))
# Frequency of unique values of the said array:
# [[  0   1   2   3   4   5   6   7   8]
#  [638 893 269  19  41  23 486 159 225]]

threshold = pd.Series(fit_HDBSCAN.outlier_scores_).quantile(0.9)
# threshold = 0.09822259079456185
outliers = np.where(fit_HDBSCAN.outlier_scores_ > threshold)[0]
sns.distplot(fit_HDBSCAN.outlier_scores_[np.isfinite(fit_HDBSCAN.outlier_scores_)], rug=True)
...
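The cluster sizes printed above translate directly into the relative sizes of the day-trading “personalities”. A minimal sketch using the reported counts (the `fraction_of_days` helper is illustrative, not from the notebook):

```python
import numpy as np

# Cluster sizes as reported by np.unique on the HDBSCAN labels above.
cluster_ids = np.arange(9)
counts = np.array([638, 893, 269, 19, 41, 23, 486, 159, 225])

def fraction_of_days(avoid_ids):
    """Fraction of trading days that fall in a given set of cluster IDs."""
    mask = np.isin(cluster_ids, avoid_ids)
    return counts[mask].sum() / counts.sum()

# Relative size of each cluster "personality":
for cid, c in zip(cluster_ids, counts):
    print(f"cluster {cid}: {c / counts.sum():.1%}")
```

Any “avoid trading” rule then reduces to summing the fractions of the flagged clusters, e.g. `fraction_of_days([0, 2, 5, 8])` for a SHORT-avoid list.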

LONG trade characteristics: avoid LONG trades on the 38.5 percent of trading days associated w/ Cluster ID’s 6, 7, 8 and 9.

SHORT trade characteristics: avoid SHORT trades on the 40.3 percent of trading days associated w/ Cluster ID’s 0, 2, 5 and 8.

Summary:

  • Avoid trading LONG on the 38.5 percent of trading days associated w/ Cluster ID’s 6, 7, 8 and 9.
  • Avoid trading SHORT on the 40.3 percent of trading days associated w/ Cluster ID’s 0, 2, 5 and 8.
  • The market has multiple “personalities”, rendering any single “one-size-fits-all” strategy inadequate!
  • The ability to identify individual market “personalities” enables tailoring strategies to each specific “persona” in pursuit of profitability.
  • Risk management dictates knowing “How-to-play”, but more importantly, knowing “When-not-to-play”!
  • The 12-dimensional raw-data vectors were transformed into 1000-dimensional vectors using TML to achieve cluster separation, then reduced to 2-D via tSNE for visualization purposes.
  • Transdimensional Machine Learning (TML) can be defined as a holistic application perspective in which the view of the data, metric selection/creation, manifold mapping, AI/ML/DL tool selection, and fitness-function determination are driven only by the specifics of the intended use-case and, more importantly, are independent of the dimensionality of both the underlying raw data and the manifold mapping.

About Andrew (Andy) Carl:

The enthusiastic developer of the “GitHub AI Brain-of-Brains” and “GITHUB2VEC” NLP productivity tools. A passionate multi-discipline Aerospace Mechanical Engineer with extensive experience integrating Artificial Intelligence, Hybrid Reinforcement Machine Learning (Hybrid-NEAT), data science, and multi-discipline simulation into Hybrid Reinforcement Learning based Optimization, the design and analysis of complex air, space and ground-based systems, and engineering tool development.
