Aesthetic Data Visualization
a selection of works (media: python + seaborn)
MNIST data (a set of handwritten digits 0–9)
These visualizations show MNIST bottlenecked through an autoencoder: each is a heatmap of pairwise distances between the data points in the encoding space. Blue is close, black is in the middle, red is far. Note the distinct blue squares along the diagonal: these are the pairwise distances between data points that are the same digit. The squares make it easy to see how many distinct classes there are (digits 0–9).
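A minimal sketch of how such a heatmap could be produced, not the original code. It assumes the encodings are already available as an array, that the points are ordered by digit label before plotting (which is what would make same-digit blocks line up along the diagonal), and that distances are Euclidean. The function names and the choice of seaborn's `icefire` palette (blue through black to red) are assumptions for illustration.

```python
import numpy as np


def pairwise_distances_sorted(encodings, labels):
    """Euclidean distance matrix with rows/columns grouped by label.

    Assumption: grouping by label is what produces the diagonal blocks.
    Builds an (n, n, d) intermediate, so use a modest sample of points.
    """
    encodings = np.asarray(encodings, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(labels, kind="stable")  # same-digit points become adjacent
    z = encodings[order]
    diff = z[:, None, :] - z[None, :, :]       # all pairwise difference vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist, labels[order]


def plot_distance_heatmap(dist):
    """Draw the matrix; plotting libraries are imported lazily."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    # "icefire" is a guess at a blue (close) / black / red (far) colormap.
    ax = sns.heatmap(dist, cmap="icefire", square=True,
                     xticklabels=False, yticklabels=False)
    ax.set_title("Pairwise distances in encoding space")
    plt.show()
    return ax
```

With real data, `dist, _ = pairwise_distances_sorted(codes, digits)` followed by `plot_distance_heatmap(dist)` would draw the figure, where `codes` and `digits` stand for the autoencoder outputs and their labels.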
NBA Stats
The following data comes from NBA stats. The original data consists of 13-dimensional vectors, one per player per game, each holding 13 chosen stats. These are t-SNE projections of that data, where the coloring is based on k-nearest neighbors.
This graph is generated using the average values for each player, i.e. each point is a player.
This is a combination heatmap and hierarchical clustering diagram of relative defensive strength among NBA teams for the 2019 season. The heatmap is based on the clustering distance.
This is a pair plot of the scoring distributions for selected players from LAC and DEN over the 2019 season. The (x, y) coordinates of each dot are points scored by (LAC_i, DEN_j) against each other, oversampled using a multi-dimensional kernel density estimator. For example, a single scatterplot could be the oversampled distribution between Leonard and Murray. The colors are clustering results, with the interpretation that each cluster represents a particular style of play with respect to the player in question. The derivation is complex, but this was ultimately extremely helpful.
Market Data
The following is a graph of Treasury yield curves over time (time is represented by color). This was a study of the rapid, pandemic-fueled market change from January 2020 through the end of April 2020. Yellow is April, purple is January.
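One way to encode time as color, sketched under assumptions: each curve's date is rescaled to the interval [0, 1] and passed to matplotlib's `viridis` colormap, which happens to run from purple to yellow. Whether the original used `viridis` is a guess, and the function names and input layout (one row of yields per date) are hypothetical.

```python
import numpy as np


def time_to_unit_interval(dates):
    """Rescale dates linearly so the earliest maps to 0 and the latest to 1."""
    days = np.asarray(dates, dtype="datetime64[D]").astype(np.int64)
    span = days.max() - days.min()
    if span == 0:
        return np.zeros(len(days))
    return (days - days.min()) / span


def plot_yield_curves(maturities, curves, dates):
    """Draw one line per date, colored from purple (early) to yellow (late)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for curve, frac in zip(curves, time_to_unit_interval(dates)):
        ax.plot(maturities, curve, color=plt.cm.viridis(frac), linewidth=0.8)
    ax.set_xlabel("maturity (years)")
    ax.set_ylabel("yield (%)")
    plt.show()
    return ax
```

Drawing the curves in chronological order means later dates are painted on top, so the most recent shape of the curve stays visible.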
This was a study examining the discrepancy between open-to-open and open-to-close quotes for 5-minute SPX data over a 30-day period. It is represented by two distributions of frequency vs. percent change, overlaid on each other.
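The two return series can be sketched as follows. The definitions here are assumptions, since the text does not spell them out: open-to-open is taken as the percent change from one bar's open to the next bar's open, and open-to-close as the percent change within a single bar. The function names are hypothetical.

```python
import numpy as np


def open_to_open(opens):
    """Percent change from each bar's open to the next bar's open."""
    opens = np.asarray(opens, dtype=float)
    return (opens[1:] - opens[:-1]) / opens[:-1] * 100.0


def open_to_close(opens, closes):
    """Percent change from open to close within each bar."""
    opens = np.asarray(opens, dtype=float)
    closes = np.asarray(closes, dtype=float)
    return (closes - opens) / opens * 100.0


def plot_overlaid(o2o, o2c):
    """Overlay the two distributions on shared axes."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.histplot(o2o, stat="density", element="step",
                      label="open-to-open")
    sns.histplot(o2c, stat="density", element="step", color="orange",
                 label="open-to-close", ax=ax)
    ax.set_xlabel("pct change")
    ax.legend()
    plt.show()
    return ax
```

Under these definitions the two series differ only by the gap between one bar's close and the next bar's open, so any visible discrepancy between the distributions reflects that gap.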
This is an example of a pairwise difference heatmap for possible buy/sell points at 1-minute intervals for TSLA around November 2021. It was a necessary input for my optimal buy/sell algorithm (finding optimal buy and sell points given a specified number of transactions), and is perhaps the most practically useful example of the bunch.
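A sketch of what the pairwise difference matrix might look like, followed by a textbook dynamic program for the best total profit with at most `k` buy/sell round trips. The dynamic program is a stand-in for illustration and not the algorithm referred to above: it returns only the profit rather than the buy and sell points, and it works from the price series directly rather than from the matrix. All names are hypothetical.

```python
import numpy as np


def pairwise_profit(prices):
    """Matrix with entry [i, j] = prices[j] - prices[i] for j > i.

    Row i is a buy time and column j a later sell time. Entries where the
    sell would not come after the buy are NaN, which seaborn's heatmap
    leaves blank.
    """
    p = np.asarray(prices, dtype=float)
    diff = p[None, :] - p[:, None]
    valid = np.triu(np.ones(diff.shape, dtype=bool), k=1)
    return np.where(valid, diff, np.nan)


def max_profit(prices, k):
    """Best total profit using at most k non-overlapping buy/sell pairs."""
    prices = [float(x) for x in prices]
    if k <= 0 or len(prices) < 2:
        return 0.0
    hold = [float("-inf")] * (k + 1)  # hold[t]: best cash while in trade t
    cash = [0.0] * (k + 1)            # cash[t]: best cash after <= t trades
    for price in prices:
        for t in range(1, k + 1):
            hold[t] = max(hold[t], cash[t - 1] - price)  # buy for trade t
            cash[t] = max(cash[t], hold[t] + price)      # sell trade t
    return cash[k]
```

For a single transaction the two views agree: the largest entry of `pairwise_profit(prices)` equals `max_profit(prices, 1)` whenever a profitable trade exists. Passing the matrix to `seaborn.heatmap` would draw a figure of the kind described.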