Comprehensive tutorial: deep learning to diagnose skin cancer with the accuracy of a dermatologist

The core components of our skin cancer diagnostic software were recently open sourced. The objective of this effort is to release a free and open source product in early May that has been validated to diagnose skin cancer with dermatologist-level accuracy or better.

To learn more about this project and why we are doing this, see:

[Figure: the model diagnosing an image of basal cell carcinoma]

This post will:

  1. Build intuition behind the fundamentals of deep learning
  2. Take a broad look at some of the available frameworks and tools in the deep learning eco-system (TensorFlow, PyT🔥rch and Keras)
  3. Dive into the clean and simple code we have open-sourced, which serves as a general starting point for image classification and other tasks in computer vision, and is used to train our production skin-cancer diagnosis models
  4. Get up-and-running with this code, training a deep convolutional neural network to diagnose skin-cancer with dermatologist-level accuracy!

The code we’ll be referencing lives here (contributions are very welcome):


Working with computer vision/image classification is one of the best paths towards a fundamental understanding of deep learning. It will enable you to visualize concepts effectively and build a solid understanding that transfers to other areas of deep learning/AI. In addition, there are troves of prior work and public data sets, allowing us to get started quickly and learn by doing (how Jobs and Edison did it)!


Read the beginning (sections "Intuition behind deep learning", "Motivating unsupervised learning", and "Unsupervised learning — a concrete example") of a Medium post I made in February 2017. Reading the entire article is optional; there is no need to understand GANs yet:

Let’s consider a deep neural network, referred to as a model in this post, as a black box with millions of tunable knobs. Our skin cancer diagnosis model has ~25 million of these knobs (parameters).

Our goal then is to train the model to learn the optimal position (value) of each knob (parameter) such that the model transforms a sample of data (input) into its true annotation/label (output). Our skin cancer diagnosis model takes images of dimension 299 pixels x 299 pixels x 3 channels as input (89,401 pixels, or 268,203 values) and outputs the probability that the image is benign or malignant.
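As a quick check of the input size, here is a small sketch in Python. The variable names are illustrative and are not taken from the project code.

```python
# Input dimensions stated above: 299 x 299 pixels with 3 color channels.
height, width, channels = 299, 299, 3

n_pixels = height * width        # spatial positions in the image
n_values = n_pixels * channels   # individual numbers the model receives

print(f"{n_pixels:,} pixels, {n_values:,} input values")
# prints "89,401 pixels, 268,203 input values"
```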

You can imagine there is an astronomically large number of possible inputs. So how can the model possibly learn to take pixels as input and predict the probability of malignancy? The key:

Deep learning builds hierarchical representations of data.

We call it deep learning because models are composed of many layers (deep). The first layer in a model takes a data sample as input and learns to transform this data into a form that makes the given task easier to solve.

Early layers in a neural network learn to represent their input data as a map of simple features (e.g. edges/color gradients in a convolutional neural network trained on images).

The next layer in the model takes the previous layer’s output as its input, and learns to transform this data into a form that makes the given task even easier to solve! As the data flows through the model’s layers, it continues to be transformed in this way.
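To make the "each layer feeds the next" idea concrete, here is a minimal pure-Python sketch of a forward pass through two stacked layers. The helper names (`relu`, `dense`, `forward`) and the hand-picked weights are assumptions for illustration only. A real model learns its weights and has far more of them.

```python
def relu(values):
    # Non-linearity: negative values are clipped to zero.
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    # One row of weights per output unit: weighted sum of inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    # Each layer transforms the previous layer's output.
    # We keep every intermediate representation so it can be inspected.
    activations = [inputs]
    for weights, biases in layers:
        activations.append(relu(dense(activations[-1], weights, biases)))
    return activations


# Two tiny layers with hand-picked (not learned) weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # layer 1: 2 inputs -> 2 features
    ([[2.0, 1.0]], [0.5]),                    # layer 2: 2 features -> 1 output
]
activations = forward([1.0, 2.0], layers)
print(activations)  # prints [[1.0, 2.0], [0.0, 1.5], [2.0]]
```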

Why is this concept of building hierarchical representations so important?

The fundamental truth here is that our world is hierarchical by nature — to model it effectively, our model should also be hierarchical.

For the electrical and computer engineers, imagine building a computer using only transistors directly 🤔. Instead:

Transistors => Logic gates (e.g. NAND gates) => Combinational circuits (e.g. multiplexers, comparators, etc…) => Sequential circuits (e.g. flip-flops) => … => CPU, … The hardware industry is enabled by hierarchical art.

For the programmers, imagine building Soylent’s website 😂🤗 in machine code (not even assembly), and remember you have no operating system or anything to build on. Not even a text editor to code in. Again, 🤔. The software industry is also enabled by hierarchical art.

When compared to deep learning, many of the classical machine learning algorithms/approaches seem the equivalent of coding in assembly.

For the art majors 😉, here is a cool 😎, MRI-like visualization of ResNet50 (the default base model used in our skin cancer diagnosis model):

How do we train the model to learn these hierarchical representations (or in our knob analogy, the correct position of each of the many knobs responsible for transforming our input to output)? The back-bone of training any deep learning model is back-propagation and gradient descent.

This training is composed of 3 steps:

  1. Feed our input data through the model and compute its predicted output/label (the forward pass ▶️).
  2. Compare this predicted output/label to the data’s ground-truth label and compute an error function (AKA the objective function) that tells us how far off we are 🎯.
  3. Back-propagate this error through our network, nudging each parameter (i.e. knob) in the direction opposite to its gradient so that the error decreases (the backward pass ⏪).

Repeat this process for all the data in our training set over and over again (AKA gradient descent) until learning plateaus (at this point the model will start to memorize the training data set instead of learning general features).
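The three steps above can be sketched on a toy problem. The example below is not the project's training code. It is a minimal pure-Python illustration, assuming a model with only two knobs (`w` and `b`), a cross-entropy error function, and made-up one-dimensional data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, learning_rate=0.5, epochs=200):
    w, b = 0.0, 0.0  # the model's two knobs, starting at an arbitrary position
    losses = []
    n = len(samples)
    for _ in range(epochs):  # repeat over the training set again and again
        grad_w = grad_b = total_loss = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(w * x + b)  # 1. forward pass: predicted probability
            # 2. error function: cross-entropy between prediction and label
            total_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            # 3. backward pass: gradient of the error w.r.t. each knob
            grad_w += (p - y) * x
            grad_b += (p - y)
        # Gradient descent: move each knob against its average gradient.
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
        losses.append(total_loss / n)
    return w, b, losses


# Toy data: negative inputs are labeled 0, positive inputs are labeled 1.
samples = [-2.0, -1.0, 1.0, 2.0]
labels = [0, 0, 1, 1]
w, b, losses = train(samples, labels)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

The loss recorded at each pass shrinks as the knobs settle, which is the same behavior we watch for (at a much larger scale) when training a deep model.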

Convolutional neural networks

There are various types of neural networks that are used to accomplish specific tasks in deep learning/AI. These architectures arise because different data has different characteristics (e.g. images are organized spatially, sound is organized temporally, etc…). We can take advantage of our data’s characteristics and modify our neural network’s architecture to make it easier to train! The important thing to realize is that these architectures are just variations of the standard artificial neural network.

A convolutional neural network makes it much easier to train a model for computer vision tasks such as this one. The underlying assumption is that features in the data set are spatially (translation) invariant (e.g. an object in the upper right-hand corner of an image is represented by the same pattern of pixels as the same object in the lower left-hand corner). This allows us to share weights across image locations and reduce the dimensionality of our data at each layer in our network, which reduces the number of parameters (knobs) the model has to learn by orders of magnitude (so much less training data is required, etc…) with little trade-off in practice. When it comes to computer vision, convolutional networks are the right tool for the job 🔨, and we will be using these to diagnose skin cancer!
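Weight sharing can be illustrated with a small pure-Python sketch. The `conv2d_valid` helper, the 5 x 5 toy image, and the layer sizes used in the parameter count (64 units, 3 x 3 filters) are all assumptions chosen for illustration. They are not taken from the project code.

```python
def conv2d_valid(image, kernel):
    # Slide one small kernel over every position of the image ("valid" mode,
    # no padding). The same kernel weights are reused at every location.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# The same 2 x 2 blob appears in the top-left and bottom-right corners.
image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
kernel = [[1, 1], [1, 1]]  # a hand-picked "blob detector"
feature_map = conv2d_valid(image, kernel)
# The single shared kernel responds equally strongly at both locations.
print(feature_map[0][0], feature_map[3][3])  # prints "4 4"

# Parameter counts for a first layer on a 299 x 299 x 3 input:
dense_params = 299 * 299 * 3 * 64 + 64  # every input value wired to 64 units
conv_params = 3 * 3 * 3 * 64 + 64       # 64 shared 3 x 3 filters over 3 channels
print(f"{dense_params:,} vs {conv_params:,}")  # prints "17,165,056 vs 1,792"
```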

Some details

There are other important fundamental concepts in deep learning like activations (non-linearities), error/objective functions (briefly mentioned above), optimizers, regularization, etc… and higher level concepts like transfer learning, adversarial learning, etc… The important thing is that we continue to build up our intuition behind deep learning and learn these concepts as we go.

The following materials have been helpful to me and I’m sharing these in particular because they are very concise:

Deep learning frameworks

We’ll be using Python as it’s the language of choice for deep learning. We’ll need to choose a deep learning framework to work with and I’ll review that below. Deep learning has been an extremely collaborative field with the major players publishing research and open sourcing much of their software.

The frameworks I want to mention are:

  • TensorFlow. Pros: Maintained by Google, massive ecosystem, mature and extremely powerful with many features. Cons: Getting very bloated (~6,000 files and ~860,000 lines of code).
  • PyT🔥rch. Pros: Maintained by Facebook, minimal and lightweight (~1,000 files and ~120,000 lines of code), extremely powerful at its core. Cons: Early in development, smaller ecosystem and far fewer features than TensorFlow.
  • Keras. Pros: A front-end/higher level API for TensorFlow and Theano, easy to use and rapidly experiment with, no loss of control, solid ecosystem, documentation and examples. Cons: See 4. Shallow Wrappers, implementation is messy.

Most of our open-sourced code is written on top of Keras. Keras is the framework I would recommend to anyone getting started with deep learning. TensorFlow is difficult to use. Starting with Keras will provide the Pros listed above and help you learn to use TensorFlow correctly and to leverage its features (putting you in a great position to migrate to direct usage of TensorFlow in the future if necessary; see Keras’s Cons listed above). Note: Keras is officially set to be merged into TensorFlow.

Diving in

The above repo is a simple, solid and general starting point for image classification tasks. It is built on some simple but powerful concepts:

  1. Transfer learning: As we saw above, deep learning models learn hierarchical representations of data. We saw that the lower layers in a convolutional neural network learn simple and general data representations that should be applicable to a variety of data sets. Therefore, we can re-use the lower layers of a model pre-trained on a much larger data set than ours (even if the data sets are different) as these low-level features generalize well. We do this by freezing the parameters of the pre-trained base model and adding some layers on top of it that will be trained to classify images of skin cancer on our data sets.
  2. Fine-tuning: As we train our model and learning starts to plateau, we can reduce our learning rate and start to make the top layers in our pre-trained base model trainable — fine-tuning them to learn better representations of our specific data set. Just be careful, you don’t want to un-freeze layers too early or else large back-propagated loss gradients will botch the pre-trained weights! And you also want to make sure your model’s learning rate is sufficiently low when you start to un-freeze layers for this same reason.
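The two concepts above can be sketched with Keras as follows. This is not the repo's actual code. It is a minimal illustration, assuming the Keras API bundled with TensorFlow 2, a ResNet50 base, a two-class softmax output, and hypothetical helper names (`build_model`, `unfreeze_top_layers`) with illustrative learning rates.

```python
from tensorflow import keras


def build_model(weights="imagenet", input_shape=(299, 299, 3), n_classes=2):
    """Transfer learning: frozen pre-trained base plus a small trainable top."""
    base = keras.applications.ResNet50(
        weights=weights, include_top=False,
        input_shape=input_shape, pooling="avg",
    )
    base.trainable = False  # keep the pre-trained knobs fixed

    inputs = keras.Input(shape=input_shape)
    features = base(inputs, training=False)  # batch-norm stays in inference mode
    outputs = keras.layers.Dense(n_classes, activation="softmax")(features)
    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model, base


def unfreeze_top_layers(model, base, n_layers=10, learning_rate=1e-5):
    """Fine-tuning: train the last n_layers of the base at a low learning rate."""
    base.trainable = True
    for layer in base.layers[:-n_layers]:
        layer.trainable = False  # lower layers stay frozen
    # Re-compile so the new trainable set and the lower rate take effect.
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

A typical flow under these assumptions would be to call `build_model()`, train until validation loss plateaus, then call `unfreeze_top_layers(model, base)` and continue training. The number of layers to un-freeze and the learning rates are judgment calls to tune for your data set.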

The code is well documented, read through it! As you come across unfamiliar concepts take some time to learn and understand them.

Getting started

It’s time to get going 🏃! The repo’s README has detailed instructions on starting training! Feel free to create issues, make pull requests and get involved with this effort!

Getting involved

The best way to get involved with this effort is to start contributing on GitHub and to join this project’s Slack channel:

Please share any interesting results/research you do on this project in whatever way you see fit (e.g. a GitHub contribution, a blog post, updates in Slack, etc…).

If you want to get involved in other ways (e.g. contribute data, sponsor this project, help validate our algorithms, or whatever) please get in contact with me!

About

We are a company whose vision is a world where medical conditions are addressed early on, in their infancy. This approach will shift the health-care industry from a constant fire-fight against symptoms to a preventative approach where root causes are addressed and fixed. Our first step to realize this vision is easy, accurate and available diagnosis. Our current focus is concussion diagnosis, recovery tracking and brain health monitoring. Please get in contact with me if this resonates with you!