Exposing Machine Learning

Part I: Machine Learning is “easy”

Blair Hudson
4 min readJun 26, 2017

Welcome to Locally Optimal! 👋 It’s great to have you here. Together we are going to explore the human side of artificial intelligence, starting with the mystical machine learning. We’re going to do so with real examples.

By the end of this series, you certainly won’t be feeling like this... Fingers crossed 🤞

For anyone new to programming, we’re going to keep this simple and explain each line — so you won’t have to guess what is going on! For those who do want to try this at home, you can run the examples yourself. (Instructions below!)

Ok. To start things simple, we’re going to build a machine learning model in just five lines of code (and the only five lines of code we’ll look at in this post):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
data = load_breast_cancer()
model = LogisticRegression()
model.fit(X = data.data, y = data.target)

Easy, right? If you’re already well acquainted with Scikit-learn and the principles of machine learning, then maybe so. Perhaps less so if you only have general programming experience in Python or another language.

And if you’re part of the vast majority that has not yet had the chance to learn about any of these things, then you’re in for a treat.

In the next code block, we repeat the five lines from above with some commentary to explain a little bit about what is going on:

# Comments, like this one, are lines that begin with a '#' 
# They help us describe what is happening in the code,
# without changing the way the code executes
# On the next two lines, we import some functions from the
# popular Python machine learning package Scikit-learn

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
# Now by calling load_breast_cancer() we can load a sample
# dataset into a variable named 'data' to use later on

data = load_breast_cancer()
# Next we create a LogisticRegression model and store it in
# a variable aptly named 'model'

model = LogisticRegression()
# Finally, we call fit(), which is a function provided by
# the LogisticRegression model. This function fits our
# model using features X and labels y from our dataset

model.fit(X = data.data, y = data.target)
# ... and we're done! *emoji clap*

To summarise, in these five lines of code we do three important things:

  1. Load some useful functionality from the Scikit-learn machine learning package
  2. Import a sample dataset to build our model, which includes a number of patient observations with various data features, each labelled with breast cancer occurrence
  3. Build a model using the patient data (making lots of assumptions), which is able to predict cancer occurrence using the various data features captured in the sample dataset
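To see what the fitted model can actually do, we can ask it to make predictions. Here’s a minimal sketch — it repeats our five lines and then calls predict(), a LogisticRegression function we haven’t formally introduced yet:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

# The same five lines from above
data = load_breast_cancer()
model = LogisticRegression()
model.fit(X = data.data, y = data.target)

# Ask the fitted model to predict a label (0 or 1) for each patient
predictions = model.predict(data.data)
print(predictions[:10])
```

Each prediction is a 0 or a 1 — the model’s guess at the cancer occurrence label for that patient, based on the data features it was fitted with.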

If you’re thinking ‘that was too simple’, ‘how do we know if the model is any good?’ or ‘what do you mean assumptions?’, then great!

If instead you’re thinking ‘wtf was all that about, Blair? 🤷‍♀️🤷‍♂️’ — better again!

We will cover all of this (and more!) in subsequent parts of this series. For further technical reading and later reference, underlined code elements link to their corresponding pages in the brilliant Scikit-learn documentation.
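As a small teaser for the ‘how do we know if the model is any good?’ question: one common approach is to hold out some patients when fitting, then check how often the model predicts their labels correctly. A minimal sketch (using train_test_split and score(), which we’ll properly cover later):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()

# Set aside 25% of the patients as a 'test' set the model never sees
# during fitting, so we can check it on genuinely unseen data
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0)

# Allow more fitting iterations so the model converges cleanly
model = LogisticRegression(max_iter=5000)
model.fit(X = X_train, y = y_train)

# score() reports the fraction of held-out patients labelled correctly
accuracy = model.score(X_test, y_test)
print(accuracy)
```

The random_state and test_size values here are just illustrative choices — we’ll dig into what they mean (and why evaluating on the same data you fitted with is cheating) in later parts.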

That’s all for now! ✌️

Want to try this at home? 👇

To follow along with the code examples, you’ll need to install Anaconda (graphical installer recommended, version 4.3.21 or newer). Anaconda is a bundled distribution of many common data science tools and the Python programming language.

Everything you need to follow along with this series is included, including the Jupyter Notebook environment! When you’re done installing, type jupyter notebook (with a space) in your command line, navigate to http://localhost:8888 and create a new Python notebook to get started.

