One of the best qualities of machine learning is that it applies to almost every type of work imaginable. Healthcare in particular has benefited, with advances in genomic sequencing, object recognition systems that detect cancer, and drug discovery. The healthcare industry has a plethora of data that could revolutionize the way we think about medicine. I recently came across a piece of that data on Kaggle, which was hosting a competition to predict the probability of a patient dying in a hospital. …


I recently decided to participate in a Kaggle competition where the goal is to predict whether or not a tweet is referencing a disaster, such as a terrorist attack, a natural disaster, or an act of war. The dataset comes with 5 columns, but this article will focus only on using the text to predict the target variable.

The data comes as a CSV, so we can use Pandas to load it into a DataFrame, which we will then pass into TensorFlow. Lastly, we need TensorFlow Datasets so we can encode the text. The from_tensor_slices method allows us to convert the Pandas…
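Encoding text means mapping every word to an integer id the model can consume. The sketch below is a hypothetical, hand-rolled stand-in for the encoder the article gets from TensorFlow Datasets; it is not the TFDS API. It assumes simple lowercase word tokenization and reserves id 0 for words the vocabulary has never seen.

```python
import re


def tokenize(text):
    """Lowercase the tweet and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts):
    """Assign each new token the next integer id; 0 is reserved for unknown words."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode(text, vocab):
    """Replace each token with its id, using 0 for words never seen when building the vocab."""
    return [vocab.get(token, 0) for token in tokenize(text)]


# Two made-up tweets, used only to illustrate the idea.
tweets = ["Forest fire near the highway", "I love this song"]
vocab = build_vocab(tweets)
print(encode("Fire near the lake", vocab))
```

Here "lake" was never seen, so it is encoded as 0. A library encoder adds conveniences such as padding and saving the vocabulary, but the core mapping is the same idea.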


I was looking for a new project to work on and decided the best place to start would be Kaggle. I checked out their competitions and saw an interesting problem: analyzing Twitter data to predict whether a disaster was occurring. It was a very simple CSV with only 5 columns:

  1. Id — just an extra index starting at 1
  2. Keyword — A one word topic of the tweet
  3. Location — The location of the text
  4. Text — The text of the tweet
  5. Target — the y_label, 1 for a disaster and 0 otherwise
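Because only the text and target columns are used, reading the file can be as simple as keeping those two fields. This is a minimal standard-library sketch, not the article's code, and it assumes the header row uses the lowercase names `id`, `keyword`, `location`, `text`, `target` (adjust the keys if your copy of the file differs). The two rows are made up to show the format, including empty keyword and location fields.

```python
import csv
import io


def load_text_and_target(file_obj):
    """Read the five-column CSV and keep only the tweet text and its 0/1 target."""
    texts, targets = [], []
    for row in csv.DictReader(file_obj):
        texts.append(row["text"])
        targets.append(int(row["target"]))
    return texts, targets


# Made-up rows in the assumed format; keyword and location may be empty.
sample = io.StringIO(
    "id,keyword,location,text,target\n"
    '1,fire,,"Forest fire spreading near the highway",1\n'
    "2,,London,What a lovely sunny day,0\n"
)
texts, targets = load_text_and_target(sample)
print(texts, targets)
```

With a real file you would pass `open("train.csv", newline="")` (a hypothetical path) instead of the in-memory sample.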

Since the data seemed…


Using Dynamic Programming to find the optimal policy in Grid World

In two previous articles, I broke down the first topics most people come across when they delve into reinforcement learning: the Multi Armed Bandit Problem and Markov Decision Processes. Many games can be framed as a Markov Decision Process, and the one we will be using is called Grid World. For those who are not familiar, the environment is a two-dimensional grid in which each tile/node is either a regular tile with a value (usually 0, though some variations use negative values), a dead space, or a reward that ends the game (called a terminal state). The agent is able to move…
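To make the dynamic-programming idea concrete, here is a minimal sketch of value iteration, one standard dynamic-programming method for finding an optimal policy (the article may use policy iteration instead). The layout is hypothetical: a 3x4 grid with one dead space at (1, 1), a +1 terminal at (0, 3), a -1 terminal at (1, 3), regular tiles worth 0, deterministic moves, and a discount factor of 0.9.

```python
# Hypothetical 3x4 Grid World layout (row, column), chosen only for illustration.
ROWS, COLS = 3, 4
WALLS = {(1, 1)}                             # dead space the agent cannot enter
TERMINALS = {(0, 3): 1.0, (1, 3): -1.0}      # rewards that end the game
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
GAMMA = 0.9                                  # assumed discount factor


def step(state, action):
    """Deterministic move; bumping into an edge or a wall leaves the agent in place."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in WALLS:
        return state
    return (r, c)


def reward(next_state):
    """Reward for entering a tile: the terminal value, or 0 for a regular tile."""
    return TERMINALS.get(next_state, 0.0)


def value_iteration(theta=1e-6):
    states = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in WALLS]
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in TERMINALS:
                continue  # terminal states keep value 0; their reward is paid on entry
            # Bellman optimality backup: best one-step reward plus discounted next value.
            best = max(reward(step(s, a)) + GAMMA * V[step(s, a)] for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # The optimal policy acts greedily with respect to the converged values.
    policy = {
        s: max(ACTIONS, key=lambda a: reward(step(s, a)) + GAMMA * V[step(s, a)])
        for s in states
        if s not in TERMINALS
    }
    return V, policy


V, policy = value_iteration()
for r in range(ROWS):
    print([policy.get((r, c), "#" if (r, c) in WALLS else "T") for c in range(COLS)])
```

Each sweep updates every non-terminal tile from its neighbours' current values, and the loop stops once no value changes by more than `theta`.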


In my previous article I discussed my first attempt at reinforcement learning, using it to tackle the Multi Armed Bandit problem. This problem covers the explore-exploit dilemma: the tradeoff between exploring the environment to better understand each possible action and choosing the best known action to maximize our reward and minimize our loss. I used a simple algorithm known as epsilon-greedy to teach an agent to play tic-tac-toe. The agent explores all possible states of tic-tac-toe and then assigns predicted rewards to each move it can make.
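The epsilon-greedy rule itself fits in a few lines. This is a sketch of the selection step only, assuming the agent already stores an estimated reward for each available move; the cell numbers and values below are hypothetical.

```python
import random


def epsilon_greedy(move_values, epsilon, rng=random):
    """With probability epsilon explore a random move; otherwise exploit the highest estimate."""
    if rng.random() < epsilon:
        return rng.choice(list(move_values))
    return max(move_values, key=move_values.get)


# Hypothetical estimated rewards for three open tic-tac-toe cells (cells numbered 0-8).
move_values = {4: 0.6, 0: 0.4, 1: 0.1}
rng = random.Random(0)
picks = [epsilon_greedy(move_values, 0.1, rng) for _ in range(10_000)]
print(picks.count(4) / len(picks))
```

With epsilon = 0.1 the best-looking cell is chosen roughly 93% of the time (90% exploitation plus its share of the random picks), while the remaining picks keep testing the other moves.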

As I continued to learn more, I soon…


Going back to the beginning of reinforcement learning and starting with a gambling problem.

As a Data Scientist who has always been obsessed with video games and robotics, I was, unsurprisingly, drawn to reinforcement learning immediately. It seemed like such a fun thing to build that, instead of being practical and starting with the basics, I tried to start with a Neural Network that plays Super Mario Bros. As most of you can guess, this did not turn out well. After searching for a good introduction to reinforcement learning, I came across the Multi Armed Bandit Problem.

The multi armed bandit problem goes as follows: Suppose you are in a casino…
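The setup can be simulated directly: each slot machine (arm) pays out with a hidden probability, and the player can only learn those probabilities by pulling. The sketch below is hypothetical (three arms with made-up payout rates) and explores purely at random, keeping an incremental average of each arm's rewards so past rewards never need to be stored.

```python
import random


class Bandit:
    """A hypothetical row of slot machines; each arm pays 1 with a hidden probability."""

    def __init__(self, probabilities, seed=0):
        self.probabilities = probabilities
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1.0 if self.rng.random() < self.probabilities[arm] else 0.0


def estimate_arm_values(bandit, pulls):
    """Pull arms uniformly at random and keep a running average reward per arm."""
    n_arms = len(bandit.probabilities)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(pulls):
        arm = bandit.rng.randrange(n_arms)
        reward = bandit.pull(arm)
        counts[arm] += 1
        # Incremental mean: Q_n = Q_(n-1) + (R - Q_(n-1)) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


bandit = Bandit([0.2, 0.5, 0.75])
estimates, counts = estimate_arm_values(bandit, 30_000)
best_arm = max(range(len(estimates)), key=lambda i: estimates[i])
print(best_arm, [round(e, 2) for e in estimates])
```

Pure exploration finds the best arm but spends about two thirds of the pulls on worse machines; balancing that waste against learning is exactly the explore-exploit dilemma.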


Using Deep Learning to bring black and white photos to the present.

Ever since I started learning about data science and machine learning, one algorithm has continually grabbed my attention: Generative Adversarial Networks (GANs). So much so that the second blog post I ever wrote covered, in detail, how these models work and what they can create. When I first learned about them, most of the articles (including my own) only covered how they take in a vector of random noise and produce life-like photos.

Shortly after I wrote that article, I came across CycleGAN (Zhu et al.), which allows image-to-image translation. This…


Neural Networks have exploded in popularity over the past couple of decades, and because of this we have developed many variations to accomplish a wide range of tasks. Although most people have only recently discovered these amazing machines, they date back further than most of us realize: we can trace the first example of a neural network back to 1958, when psychologist Frank Rosenblatt invented the Perceptron.

Source: Research Gate

The Perceptron was a simple model loosely inspired by neurons in the human brain, and it was the beginning of what we know today as Neural Networks. These algorithms continued to be worked…
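The classic perceptron learning rule is short enough to write out: compute a weighted sum, fire if it is above zero, and nudge the weights by the error whenever the prediction is wrong. This is a minimal sketch, not the article's code; training on logical AND is our own illustrative choice because it is linearly separable, so the rule is guaranteed to converge.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is above 0, otherwise 0."""
    total = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Perceptron rule: w <- w + lr * (y - y_hat) * x, applied one sample at a time."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_and = [0, 0, 0, 1]
weights, bias = train_perceptron(X, y_and)
print([predict(weights, bias, x) for x in X])
```

Weights only change on mistakes, so once every sample is classified correctly the model stops updating. A single perceptron cannot learn a problem like XOR, which is not linearly separable.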


Automated Machine Learning (AutoML) is currently one of the most explosive subfields within Data Science. It sounds great for those who are not fluent in machine learning and terrifying for current Data Scientists. The way AutoML has been portrayed in the media makes it seem capable of completely revolutionizing the way we create models by removing the need for Data Scientists. …


In statistics, when trying to compare samples, our first thought is usually to perform a Student's t-test. It compares the means of two samples (or a sample and a population) relative to the standard error of the mean or the pooled standard deviation. While the t-test is a robust and useful test, it is limited to comparing only two groups at a time.
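For two independent samples, the pooled version of the test divides the difference in means by a standard error built from the pooled standard deviation. This is a standard-library sketch that assumes equal variances in both groups; the sample data is made up. If SciPy is available, `scipy.stats.ttest_ind(a, b)` (equal variances is its default) should report the same statistic along with a p-value.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic using a pooled standard deviation (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)  # sample variances (divide by n - 1)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    degrees_of_freedom = n1 + n2 - 2
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, degrees_of_freedom


t, df = pooled_t_statistic([5, 6, 7], [1, 2, 3])
print(round(t, 3), df)
```

Here the means differ by 4, the pooled standard deviation is 1, and the standard error is about 0.816, giving t of about 4.899 on 4 degrees of freedom.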

To compare multiple groups at once, we can use ANOVA, or Analysis of Variance. Unlike the t-test, it compares the variance between the sample means to the variance within each sample. Ronald Fisher introduced the term…

Justin Tennenbaum

Data Scientist at Flatiron passionate about Math and Technology
