Last week, my childhood best friend Riley Gale passed away. Most people knew him as the lead singer of the thrash metal band Power Trip.

Losing Riley has been painful. Riley was an incredible person, and I will miss him profoundly.

I would like to share some memories of Riley from when we were kids.

Riley and I grew up in Dallas, Texas, and we went to school together from first grade through high school. We were closest during our elementary school years in the mid-90s.

Riley often played the role of mischievous troublemaker, while I was the nervous sidekick…

UMAP is like t-SNE, but faster and more general-purpose.

When it comes to visualizing high-dimensional data, there are a number of options available. The most tried-and-true technique is PCA, which stands for Principal Component Analysis. PCA has been around for over a century. It is fast, deterministic, and linear. Because it is a linear projection, it also has an inverse transform: you can map the reduced coordinates back into the original space, exactly if you keep every component and approximately if you keep only a few. However, this linearity limits its usefulness in complex domains like natural language or images, where non-linear structure is the norm.
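To make "linear, deterministic, and reversible" concrete, here is a minimal pure-Python sketch of PCA restricted to 2-D points, using the closed-form eigenvector of a 2x2 covariance matrix instead of a library. The function names and sample points are hypothetical illustrations, not code from the post.

```python
import math

def fit_pca_2d(points):
    """Fit PCA to 2-D points. Returns the mean and both principal axes
    (unit vectors), ordered by decreasing variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / n
    c = sum((y - my) ** 2 for _, y in points) / n
    # Closed-form direction of maximum variance for a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))  # orthogonal to pc1
    return (mx, my), [pc1, pc2]

def transform(points, mean, axes):
    """Project centered points onto the given axes (a linear map)."""
    mx, my = mean
    return [[(x - mx) * ux + (y - my) * uy for ux, uy in axes] for x, y in points]

def inverse_transform(scores, mean, axes):
    """Map scores back to the original space (also a linear map)."""
    mx, my = mean
    return [
        (mx + sum(z * ux for z, (ux, _) in zip(row, axes)),
         my + sum(z * uy for z, (_, uy) in zip(row, axes)))
        for row in scores
    ]

points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
mean, axes = fit_pca_2d(points)

# Keeping both components: the round trip recovers the input (up to rounding).
exact = inverse_transform(transform(points, mean, axes), mean, axes)

# Keeping only the first component: a 1-D summary with a lossy reconstruction.
approx = inverse_transform(transform(points, mean, axes[:1]), mean, axes[:1])
```

Running `fit_pca_2d` twice on the same input gives identical axes, and the round trip through both components returns the original points. Dropping the second axis is where the "approximate" inverse comes from.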

A more recent technique that does capture non-linear structure is t-SNE, which stands for t-distributed Stochastic Neighbor Embedding. This technique is great at capturing the non-linear structure…

In 2009, I was in animation school. One thing that our teachers repeatedly emphasized was how fast the field of computer graphics evolves. If you want to stay employable, you have to constantly learn new tools and techniques. You have to keep up.

In 2011, I switched career paths and enrolled in a software development bootcamp, where I heard a similar story. Software changes fast. Tomorrow’s new frameworks and libraries will make today’s tools obsolete. You have to keep up.

Over the past couple of years, as I’ve transitioned from software into data science, the story has been the same. Things change…

When dealing with timestamp data in the context of machine learning, it’s important to encode the properties of time so that your model can utilize the information properly.

In many cases, the cyclic properties of time can be relevant to the problem you’re trying to solve. For example, if you’re building a model to predict road traffic, the time of day is an important factor.

One approach might be to encode the time of day as a number between zero and one, where midnight is 0 and 11:59 PM is just under 1. Unfortunately, that distorts the proximity of 11:59 PM and midnight: they are only a minute apart, but they land at opposite ends of the scale.

A better way is to represent time of day as a point on the unit circle, using sine and cosine.

The Python code below is an example of how to do this with datetime objects.
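Here is a minimal sketch of that encoding, assuming only the standard-library `datetime` and `math` modules. The helper names (`encode_linear`, `encode_cyclic`) are hypothetical and not necessarily how the post implements it. The sketch compares how far apart 11:59 PM and midnight land under each scheme.

```python
import math
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60

def seconds_since_midnight(dt: datetime) -> float:
    return dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1_000_000

def encode_linear(dt: datetime) -> float:
    """Naive encoding: fraction of the day elapsed, in [0, 1)."""
    return seconds_since_midnight(dt) / SECONDS_PER_DAY

def encode_cyclic(dt: datetime) -> tuple[float, float]:
    """Map the time of day to a point (sin, cos) on the unit circle."""
    angle = 2 * math.pi * seconds_since_midnight(dt) / SECONDS_PER_DAY
    return math.sin(angle), math.cos(angle)

late = datetime(2020, 1, 1, 23, 59)
midnight = datetime(2020, 1, 2, 0, 0)

# One minute apart in real time:
linear_gap = abs(encode_linear(late) - encode_linear(midnight))
cyclic_gap = math.dist(encode_cyclic(late), encode_cyclic(midnight))
print(f"linear gap: {linear_gap:.4f}, cyclic gap: {cyclic_gap:.4f}")
# linear gap: 0.9993, cyclic gap: 0.0044
```

Using both sine and cosine matters. Sine alone would give the same value to two different times of day (for example, 6 AM and 6 PM are not distinguished from each other by sine, and 3 AM and 9 AM share a sine value), while the pair identifies each point on the circle uniquely.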

Algorithmic bias in machine learning systems has been a hot topic recently, but statistical bias more generally is as old as statistics itself.

In this post, I’ll cover a specific kind of selection bias called Survivorship Bias and some of its causes in the context of database systems.

What is Survivorship Bias?

Survivorship Bias happens when you have data that is the result of a hidden filtering process.

For example, let’s say we are evaluating a weight loss program, and we see that the average weight of participants is 210 pounds before the program, and 170 pounds after the program. …
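As an illustration of how a hidden filter can creep into a database query, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and weights are invented for illustration, and this is not necessarily one of the causes the post goes on to describe. An inner join silently drops participants who never had a final weigh-in, and comparing averages over different populations exaggerates the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: participants 2 and 4 dropped out before the final weigh-in.
conn.executescript("""
CREATE TABLE participants (id INTEGER PRIMARY KEY, before_lbs REAL);
CREATE TABLE final_weigh_ins (participant_id INTEGER, after_lbs REAL);
INSERT INTO participants VALUES (1, 200), (2, 240), (3, 180), (4, 260);
INSERT INTO final_weigh_ins VALUES (1, 195), (3, 176);
""")

# Naive comparison: each average is taken over a different population.
naive_before = conn.execute("SELECT AVG(before_lbs) FROM participants").fetchone()[0]
naive_after = conn.execute("SELECT AVG(after_lbs) FROM final_weigh_ins").fetchone()[0]

# The INNER JOIN keeps only participants who "survived" to the final weigh-in.
paired = conn.execute("""
    SELECT AVG(p.before_lbs - f.after_lbs), COUNT(*)
    FROM participants AS p
    JOIN final_weigh_ins AS f ON f.participant_id = p.id
""").fetchone()

# A LEFT JOIN makes the filtered-out rows visible.
dropouts = conn.execute("""
    SELECT COUNT(*)
    FROM participants AS p
    LEFT JOIN final_weigh_ins AS f ON f.participant_id = p.id
    WHERE f.participant_id IS NULL
""").fetchone()[0]

print(f"Naive change: {naive_before - naive_after:.1f} lbs")
print(f"Paired change among {paired[1]} completers: {paired[0]:.1f} lbs")
print(f"Participants with no final weigh-in: {dropouts}")
```

With these made-up numbers, the naive comparison suggests a 34.5 lb drop. The paired comparison shows about 4.5 lbs per completer, and it still says nothing about the two people who left.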

Dan Allison

Software Engineer, Coffee Drinker, Sketchbook Doodler
