Photo by Paola Aguilar on Unsplash

Docker has grown increasingly popular in recent years. Compared to a traditional virtual machine, a Docker container is more lightweight and portable. However, the learning curve can be a little steep for people who are not familiar with containers. Below are twelve basic commands to get you started.

The “docker images” command lists all the Docker images stored on your local machine.

I deleted all the docker images stored on my local machine before I ran “docker images”, so my output looks like this:



In this post, I am going to walk you through a simple exercise to understand two common ways of splitting data into a training set and a test set in scikit-learn. The Jupyter Notebook is available here.

Let’s get started and create a Python list of numbers from 0 to 9 using range():

X = list(range(10))
print(X)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Then, we create another list which contains the square values of numbers in X using list comprehension:

y = [x * x for x in X]
print(y)
[0, 1, 4, 9, 16…
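As a rough sketch of how such a split might look, assuming scikit-learn is installed and that train_test_split is one of the two approaches (the test_size and random_state values below are illustrative choices of mine, not taken from the post):

```python
from sklearn.model_selection import train_test_split

# Same toy data as above: the numbers 0 to 9 and their squares.
X = list(range(10))
y = [x * x for x in X]

# Hold out 30% of the samples for testing. test_size and random_state
# are illustrative assumptions, not values taken from the post.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))
print(X_train, y_train)
```

Because X and y are shuffled together, every value in y_train is still the square of the value at the same position in X_train.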



A large part of my research is devoted to improving the accuracy of binding affinity prediction for drug candidates. You may think that the medicines we take are all made of substances with overwhelmingly complex chemical formulas. On the contrary, orally active drugs are often relatively small molecules (molecular mass <500 daltons, according to the famous Lipinski’s rule of five) with rather simple structures (10 or fewer rotatable bonds, a criterion usually attributed to the related Veber’s rule).
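To make the two size thresholds quoted above concrete, here is a minimal sketch of a check in Python. The function name and the example molecules are hypothetical, and the check covers only these two thresholds, not a full drug-likeness filter:

```python
# Thresholds as quoted in the text above.
MAX_MOLECULAR_MASS_DA = 500  # mass must be strictly below 500 daltons
MAX_ROTATABLE_BONDS = 10     # 10 or fewer rotatable bonds


def within_quoted_limits(molecular_mass_da: float, rotatable_bonds: int) -> bool:
    """Return True if a molecule satisfies both quoted thresholds."""
    return (
        molecular_mass_da < MAX_MOLECULAR_MASS_DA
        and rotatable_bonds <= MAX_ROTATABLE_BONDS
    )


# Hypothetical inputs, made up for illustration only.
print(within_quoted_limits(350.0, 4))    # a small, fairly rigid molecule
print(within_quoted_limits(1200.0, 25))  # a large, flexible molecule
```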

Shown below is one of the most important drugs ever developed in human history: penicillin. More accurately, it is penicillin G, one of the many derivatives of penicillin (derivatives are a group of compounds…



Kaggle is currently running a prediction competition on the survival chances of Titanic passengers. It immediately grabbed my attention because Titanic is my favorite movie of all time (my second favorite is the original Jurassic Park).

Molly Brown: Hey, uh, who thought of the name Titanic? Was it you, Bruce?
J. Bruce Ismay: Yes, actually. I want to convey sheer size, and size means stability, luxury, and above all, strength.
Rose DeWitt Bukater: Do you know of Dr. Freud, Mr Ismay? His ideas about the male preoccupation with size might be of particular interest to you.
J…


In this post, I am going to illustrate how to use logistic regression, combined with the one-hot encoding technique, to reveal some interesting facts from the UCI adult income dataset. This dataset can be downloaded from here, along with the data description and some basic analysis.

My Jupyter Notebook is available here. The analysis builds on a post written by Valentin Mihov, to which I added one-hot encoding for further data interpretation.

The first step is to load the CSV data into a pandas DataFrame:

import pandas as pd

income_df = pd.read_csv("data/income_data.csv")

# List the column names, then the (rows, columns) shape.
print(list(income_df))
print(income_df.shape)
Output:['age', 'workclass', 'fnlwgt', 'education', 'educational-num', 'marital-status', 'occupation', 'relationship'…
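For readers new to one-hot encoding, here is a minimal sketch using pandas.get_dummies on a tiny hand-made frame. The two column names mirror columns in the dataset, but the three rows are made up for illustration, and this is not necessarily how the notebook implements the encoding:

```python
import pandas as pd

# Hypothetical three-row frame. The column names mirror two of the
# dataset's columns, but the values are made up for illustration.
toy_df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Private", "Private"],
})

# One-hot encode the categorical column: each category becomes its own
# 0/1 indicator column, while numeric columns pass through unchanged.
encoded = pd.get_dummies(toy_df, columns=["workclass"], dtype=int)

print(encoded)
```

Each resulting indicator column can then be fed to logistic regression, so that every category gets its own coefficient.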


You left your data science classroom, real or virtual… with a fancy Jupyter notebook saved to your laptop. You felt like the time and the dollars you spent may be well worth it… But deep inside you were insecure: What is this analysis for? What did it really tell me?

To illustrate how to make your data analysis sexier and more meaningful, I will give an example using the famous Iris dataset, analyzed with a classic supervised machine learning algorithm, the support vector machine (SVM), as well as a frequently used dimensionality-reduction technique, principal component analysis (PCA).
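As a rough idea of what combining the two techniques can look like, here is a minimal sketch, assuming scikit-learn is installed. The split ratio, the number of components, and the kernel are illustrative choices of mine and not necessarily those used in the post:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the bundled Iris data (150 samples, 4 features, 3 classes).
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Fit PCA on the training data only, then project both splits to 2-D.
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train)
X_test_2d = pca.transform(X_test)

# Train a support vector classifier on the two principal components.
clf = SVC(kernel="rbf")
clf.fit(X_train_2d, y_train)

accuracy = clf.score(X_test_2d, y_test)
print(f"Test accuracy on 2 principal components: {accuracy:.2f}")
```

Fitting PCA on the training split alone keeps information from the test set out of the projection.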

The Iris dataset…

Julie Yin

Computational Scientist, Ph.D.
