Machine Learning in 30 minutes with Python and Google Colab

Yosi Kristian
5 min read · Apr 18, 2019


This article is from my sharing session at STTS on 10 April 2019.

Preparing a machine for machine learning can be a nightmare. Installing dependencies, dealing with deprecated methods, disk space consumption, and frequent changes in libraries can be very difficult for beginners.

Google provides a very convenient platform to try machine learning: Google Colaboratory (Google Colab).

Google Colaboratory is a free Jupyter Notebook environment that requires no setup or installation and runs entirely in Google’s cloud.

With Google Colaboratory we can write and execute Python code efficiently through a browser (even a mobile browser).

We can also share code and results easily through Google Drive.

Let’s dive in, and try Google Colab to solve a simple machine learning problem. Go on and head to:

https://colab.research.google.com

Google Colab Initial Interface

Create a new notebook:

Google Colab uses Jupyter Notebooks:

The Jupyter Notebook is part of Project Jupyter, a nonprofit that develops open-source software, standards, and services for interactive computing. It began with Julia, Python, and R, and now supports over 70 programming languages.

A Jupyter notebook provides general notebook functionality:

1. a word processor — handles formatted text (using Markdown)

2. a “kernel” — executes programming language code and includes the output inline

3. a rendering engine — renders output as HTML, in addition to plain text

Jupyter Notebook can contain multiple cells, and each cell can be a code cell or a text cell.

Code cells can be executed individually by pressing the play button to the left of each cell.

Text Cell and Code Cell

In a Jupyter Notebook, variables and functions are globally defined and can be used across cells.

variable
function declaration and function call
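
For example, a name defined in one cell remains available in the next (a minimal sketch):

    # Cell 1: define a variable and a function
    greeting = "Hello"

    def greet(name):
        return greeting + ", " + name + "!"

    # Cell 2: both names are still defined here
    print(greet("Colab"))  # prints: Hello, Colab!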

Simple Iris Classification Problem

In this case, we will try to classify the Iris dataset into three classes (Iris Setosa, Iris Virginica, and Iris Versicolor) based on four attributes: Sepal Length, Sepal Width, Petal Length, and Petal Width.

The Iris dataset description can be found at http://archive.ics.uci.edu/ml/datasets/iris

The data itself can be downloaded from http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

Data Preparation

First, we must import the necessary libraries.
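
A minimal sketch of the imports this walkthrough relies on (the original notebook may differ):

    import pandas as pd                                   # data loading and manipulation
    from sklearn.model_selection import train_test_split  # splitting the dataset
    from sklearn.neural_network import MLPClassifier      # the neural-network classifier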

Using pandas, we can download the data from a given URL and load it into a DataFrame.
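
Something like the following should work; the column names are my own, chosen to match the UCI attribute descriptions, since iris.data ships without a header row:

    import pandas as pd

    url = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
    columns = ["sepal_length", "sepal_width", "petal_length",
               "petal_width", "iris_class"]
    dataset = pd.read_csv(url, names=columns)  # no header row in the file
    dataset.info()                             # 150 rows: 4 float columns + 1 object column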

the output of dataset.info()

Let’s analyze our dataset. Use dataset.head(n) to display the first n rows. Change dataset.head(n) to dataset.sample(n) to display n random rows.
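
For example:

    dataset.head(10)    # the first 10 rows
    dataset.sample(10)  # 10 rows chosen at random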

Now that our dataset is ready, we can separate the input features (x) from the target class (y). The input features form a 150x4 matrix (150 rows x sepal_length, sepal_width, petal_length, and petal_width) and the target a 150x1 vector (iris_class).

separating input (x) and target (y)
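
A minimal sketch of that separation, assuming the column names used when loading the data:

    x = dataset.drop(columns=["iris_class"])  # 150 x 4 feature matrix
    y = dataset["iris_class"]                 # 150 x 1 target vector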

In this session, we will use a Multi-Layer Perceptron (MLP) classifier. For a neural-network-based classifier, we need to encode our target attribute in one-hot format. We can do this by calling the pandas method get_dummies(y), which converts:
Iris setosa: 1 0 0
Iris versicolor: 0 1 0
Iris virginica: 0 0 1
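
In code this is a one-liner (a sketch, reusing the y defined earlier):

    import pandas as pd

    y_onehot = pd.get_dummies(y)  # 150 x 3 one-hot matrix, one column per class
    y_onehot.head()               # inspect the first few encoded rows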

Now that our input and target are ready, we can split them into training and testing sets using scikit-learn’s train_test_split() method.

you should get these dimensions after splitting

You can try a smaller or larger test set by changing the test_size parameter.
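
A sketch of the split; the test_size of 0.2 and the random_state value here are my assumptions:

    from sklearn.model_selection import train_test_split

    x_train, x_test, y_train, y_test = train_test_split(
        x, y_onehot, test_size=0.2, random_state=1)

    print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
    # with test_size=0.2: (120, 4) (30, 4) (120, 3) (30, 3)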

The machine learning part

For the machine learning part, we will use the scikit-learn implementation of the Multi-Layer Perceptron (a neural network architecture): https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html.

Our neural net will be 4 x 10 x 5 x 3, so the hidden layers have 10 and 5 units.
Set max_iter to 2000 to train for up to 2000 epochs, and alpha to 0.01. Note that in MLPClassifier, alpha is the L2 regularization term, not the learning rate; the learning rate is controlled separately by learning_rate_init.
Set verbose to 1 to log the training process.
random_state acts as a random seed, so repeated runs produce the same output.
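
Putting those settings together, here is a sketch of constructing and training the model (the specific random_state value is my assumption):

    from sklearn.neural_network import MLPClassifier

    # network shape: 4 inputs -> 10 -> 5 hidden units -> 3 outputs
    model = MLPClassifier(hidden_layer_sizes=(10, 5),
                          max_iter=2000,   # train for up to 2000 epochs
                          alpha=0.01,      # L2 regularization strength
                          verbose=1,       # print the loss at every iteration
                          random_state=1)  # fixed seed for reproducible runs
    model.fit(x_train, y_train)            # y_train is one-hot encoded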

Training Log

After the training process finishes, we can use the trained model through its model.predict() method. To get a classification report, import classification_report from sklearn.metrics and call classification_report(real_target, prediction). To show the results as a confusion matrix together with the accuracy, you also need to import confusion_matrix and accuracy_score from sklearn.metrics.
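
A sketch of the evaluation step; because the targets are one-hot encoded, confusion_matrix needs them converted back to single labels, which argmax does:

    from sklearn.metrics import (classification_report, confusion_matrix,
                                 accuracy_score)

    prediction = model.predict(x_test)

    print(classification_report(y_test, prediction))
    # confusion_matrix expects plain class labels, not one-hot rows
    print(confusion_matrix(y_test.values.argmax(axis=1),
                           prediction.argmax(axis=1)))
    print(accuracy_score(y_test, prediction))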

The classification results

Lastly, if we want to show the training loss history, we can plot the loss_curve_ attribute of our trained model.
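
For example:

    import matplotlib.pyplot as plt

    plt.plot(model.loss_curve_)  # loss recorded at every training iteration
    plt.xlabel("Iteration")
    plt.ylabel("Loss")
    plt.show()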

Loss history of our training

So there you go: you can write a simple machine learning program in just 30 minutes using Python and Google Colab.

To see my Colab notebook in Bahasa Indonesia, visit https://colab.research.google.com/drive/1bQUQ6KuS2E3ryUNu3A_q7iut4vb8t1-D
