Getting Started with AI: How to deploy an image classifier using HuggingFace? (Part 1)

Ada Choudhry
9 min read · Jul 29, 2023


A top-down approach to learning tricky DL concepts, building an image classifier in just 10 minutes! This tutorial is best for those who get overwhelmed coding neural networks.

Our dog-cat image classifier on HuggingFace. Image Source: course.fast.ai

Hi!!! Welcome to Getting Started with AI series!

Wait, when did it become a series?

Well, the last time I built an RNN from scratch and wrote about my learning process, it turned out to be quite cathartic. If you’ve read my previous post, you know I have struggled with programming. So writing down all my learnings, both in ML and about myself, not only cemented my understanding of some abstract concepts but also gave me insights into how I learn and what challenges I still have to overcome.

Apart from improving my understanding, these articles also help me battle impostor syndrome. Being open about my struggles in programming has helped me find ways to overcome them by asking for help and resources. Bringing your fears to light can often help in getting rid of them. I have also gotten positive responses to my articles, which has helped me realize that, wait, I am not alone in these struggles. Feeling like I was the only one struggling with errors held me back from pursuing machine learning, but that is quite far from the truth.

So, if you are struggling with deep learning, we can get through this together!

So, how is my love-hate relationship with programming right now?

It is definitely more on the love side as the model I built last time actually worked (though it gave me gibberish results the first 2 times I trained it) and my laptop did not display any red lines showing how I had made *grave* errors.

My first time coding on my own was through a project in Computational Neuroscience. As my topic was niche, the Python libraries were totally new and there were no reliable tutorials to help me. As I eventually left the project, and later the field altogether, all I really remembered about coding was frustration and feeling overwhelmed. It was like beating my head against a brick wall. Not pretty.

So I wanted to test this newfound admiration I had for AI by building more projects. I initially started replicating a transformer but its architecture kept getting more and more interesting and complex, and I found myself going down a rabbit hole.

Thankfully, after reading my RNN article, Daniel Ching recommended the fast.ai course, which aims to teach deep learning in a practical manner (*this is music to my ears*) using a top-down approach to learning.

Here is my experience with top-down learning:

  1. There is never a moment of ‘why am I doing this?’, so it keeps you on your toes
  2. The flip side is that it can also be confusing at times. We are so used to the bottom-up approach, where we put the pieces together at the end; here, you start with a vague understanding and the fog clears as you go. It can be unsettling not to know the details of how things work at first (it definitely was for me!).

In the fast.ai course, Jeremy Howard teaches through recorded lectures that are about an hour long. I find them helpful for getting a broad overview of what we’re going to be working with. Along with each video, there are associated chapters of his book, Deep Learning for Coders with fastai and PyTorch, which help me get a deeper understanding of what was taught in the lectures.

The fun part is that the book is available for free as Jupyter Notebooks, which means you can run the code yourself to get the results. As my laptop doesn’t have an NVIDIA GPU, I have been using the free versions of Google Colab and Kaggle.
There is also a small quiz at the end of each chapter which helps you recall the interesting and important bits.

The last part of the homework is to experiment with the Kaggle Notebook Jeremy taught in the lecture.

This course is great for those who are looking to get a practical understanding of deep learning in a short amount of time. It was updated in 2022, so it is quite relevant as well!

Tip: Use ChatGPT to help recall certain concepts. But try recalling it yourself first!

While writing this article, whenever I found myself struggling to break down a line of code further, I used ChatGPT to help me understand the fundamentals better.

Let’s dive into deploying our image classifier!!

This tutorial consists of two parts:

  1. Creating our image classifier
  2. Deploying it on the web using HuggingFace (such a cute name!)

In this tutorial, we will be tackling the first part!

Creating our image classifier

This image classifier can distinguish between cats and dogs. You can duplicate the original Kaggle notebook by Jeremy Howard and build alongside me!

This tutorial is quite short and easy to implement, so let’s get started!

We will be using the fastai library which is built on top of PyTorch and is more intuitive to work with for beginners. It aims to make deep learning more accessible and easier to use.

This command ensures that we download the latest version of fastai.
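On Kaggle or Colab, the install cell looks something like this (a sketch; the exact flags in your copy of the notebook may differ):

```python
# Upgrade fastai to the latest release (-U upgrades, -qq keeps pip quiet)
!pip install -Uqq fastai
```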

Then, we import all the classes we need for vision tasks.
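fastai is designed to be used with a star-import, so one line brings in everything the vision examples below rely on:

```python
# Import everything needed for computer vision tasks
from fastai.vision.all import *
```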

Next, we will download pictures of dogs and cats from a URL and decompress them.
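A sketch of that step, assuming the Oxford-IIIT Pet dataset bundled in fastai’s URLs.PETS (the dataset the course uses):

```python
# Download and decompress the dataset; untar_data caches the download
# and returns the path to the extracted folder of images
path = untar_data(URLs.PETS) / 'images'
```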

This is our labeling function. It returns True if the image (x) is a cat: in this dataset, the filenames of cat images start with a capital letter, so the function just checks whether the first character of the filename is uppercase. This gives the model the labels it needs to differentiate between cats and dogs.
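Here is the function as it appears in the course notebook:

```python
def is_cat(x):
    # In this dataset, cat filenames start with an uppercase letter
    # (e.g. 'Birman_1.jpg') and dog filenames with a lowercase one
    return x[0].isupper()
```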

Now, we will create our DataLoaders object, which wraps our dataset. DataLoaders is a thin class that just stores whatever DataLoader objects you pass to it and makes them available as train and valid. Although it’s a simple class, it’s important in fastai: it provides the data for your model. It takes care of creating data loaders, which are responsible for loading batches of data efficiently during the training process.

The DataLoaders class simplifies the process of creating data loaders by providing a high-level API to handle common data-related tasks, such as data augmentation, shuffling, and batching. It is particularly useful when working with structured data, image data, text data, or any other types of data that need preprocessing before being fed into a deep learning model.

Jargon: Data Augmentation

Data augmentation refers to creating random variations of our input data, such that they appear different but do not change the meaning of the data. Examples of common data augmentation techniques for images are rotation, flipping, perspective warping, brightness changes, and contrast changes.
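As a side note (not used in this tutorial’s minimal example), fastai bundles a standard set of such augmentations in aug_transforms(), which you could pass as batch transforms when building the data loaders; the parameter values below are illustrative:

```python
# Random flips, rotations, zooms, warps, and lighting changes,
# applied to each batch of images
batch_tfms = aug_transforms(max_rotate=10.0, max_lighting=0.2)
```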

A DataLoaders includes validation and training DataLoaders. A DataLoader is a class that provides batches of a few items at a time to the GPU.

Here, we have used ImageDataLoaders, which is a class specifically for image data files.
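The cell that builds the data loaders looks like this (a sketch following the notebook; from_name_func is the constructor used when the labels come from filenames):

```python
dls = ImageDataLoaders.from_name_func(
    path,                    # where the images live
    get_image_files(path),   # the list of image files to use
    valid_pct=0.2,           # hold out 20% of images for validation
    seed=42,                 # fixed seed so the split is reproducible
    label_func=is_cat,       # label each file with our function
    item_tfms=Resize(192),   # resize every image to 192x192 pixels
)
```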

Let’s break this function down:

  • get_image_files(path) : This function takes a path, and returns a list of all of the images in that path (recursively, by default). We had previously defined path as the variable where all the images of cats and dogs are decompressed and stored.
  • valid_pct = 0.2: From the images we have passed, 20% of them will be randomly stored in the validation dataset. (There are two types of datasets we use while building models: training data, which is used to train the model to make predictions, and validation data, which is used to measure the performance of the model against the correct labels. There is also another subset called testing data, but we can tackle this later).
  • seed = 42: This allows us to use the same validation dataset in all our iterations. This way if we tweak the model later, we will know the changes in the results were because of those changes and not due to the changes in the validation dataset. Computers don’t really know how to create random numbers at all, but simply create lists of numbers that look random; if you provide the same starting point for that list each time — called the seed — then you will get the exact same list each time.
  • label_func = is_cat: This function is a call to create labels for our dataset. In this, we have passed the is_cat function which we defined earlier.
  • item_tfms = Resize(192): Transforms each image into a size of 192 by 192 pixels. This ensures all the images are of the same size and reduces computational workload.

Now that we have our dataset, we can move on to the architecture. And this is why this tutorial is so easy to implement. Instead of building and training a model from scratch, we access pre-trained models which have been trained on huge datasets and fine-tune them according to our requirements.

Using a pre-trained model for a task different from what it was originally trained for is known as transfer learning.

There are many advantages to this approach:

  • It uses less computational power and time.
  • It is easier to implement for beginners.
The code and output of training our model in Kaggle
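A sketch of that training cell, assuming the notebook’s defaults (a pre-trained ResNet-18 fine-tuned for a few epochs; the epoch count here is illustrative):

```python
# Wrap the data and a pre-trained ResNet-18 in a vision learner,
# then fine-tune the network on our cats-vs-dogs data
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)
```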

Let’s break down this line of code:

  1. vision_learner: This is a function from the fastai library that creates a vision learner for computer vision tasks. It takes several arguments, including dls, resnet18, and metrics.
  2. dls: This variable contains our data loaders, which are responsible for loading and preparing your data for training, validation, and testing.
  3. resnet18: This is the architecture of the model being used for transfer learning. ResNet-18 is a popular pre-trained model for computer vision tasks, consisting of 18 layers.
  4. metrics: This parameter specifies the evaluation metric(s) to be used during training and validation. In this case, the error_rate metric is used. The error rate represents the percentage of misclassifications, which is a common metric for classification tasks.

A vision learner, in the context of the fastai library, is an object that facilitates the training and evaluation of deep learning models for computer vision tasks. It is a high-level abstraction built on top of PyTorch that simplifies the process of creating, training, and fine-tuning neural networks for image classification, object detection, segmentation, and other computer vision applications. It’s a powerful tool for researchers and practitioners working on computer vision problems who want to leverage the latest deep learning techniques without getting bogged down in low-level implementation details.

Finally, we export our deep learning model so we can later upload it to HuggingFace.
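The export is a single call; ‘model.pkl’ is the filename the notebook uses, but any .pkl name works:

```python
# Serialize the trained Learner (architecture, weights, and transforms)
# to a file we can upload to HuggingFace in Part 2
learn.export('model.pkl')
```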

You can download this file from the output section in Kaggle. If you’re using Colab, you can download the file from the Files section.

And that’s it!

The next part is to deploy our model on the web. But for now, celebrate the creation of your image classifier!

Skills I practiced:

  • Patience: It took time to understand some of the theory behind the code (even though it was less than in previous tutorials), but I told myself that there is no substitute for putting in the work to make progress.

TL;DR

  • Using a pre-trained model through transfer learning can help with learning all the major deep learning concepts without getting into the complexity of the code.

I hope this tutorial was helpful! I’ll be back soon with the second part.

But until then, keep building, keep learning!

Resources

https://www.kaggle.com/code/adachoudhry/saving-a-basic-fastai-model

Chapter 2 of the book associated with the course: https://github.com/fastai/fastbook/blob/master/02_production.ipynb
