Dog Not Dog

Roshan Noronha
Algorithms For Life
10 min read · Sep 25, 2019

In 2017, I completed my first machine learning project — a number recognizer.

My goal after completing that project was to take what I had learnt and use it to make an app that only recognized hotdogs. Why? Check out the video below.

I started with the best of intentions but I’m ashamed to admit that I slacked off and ended up giving a TED talk.

Dunno if they chose the worst possible thumbnail on purpose

So overall, 2018 was a very busy year.

Jokes aside, another reason for using hotdogs was that hotdog datasets are much smaller. That makes it a lot harder to create and train a model that can accurately classify "hotdog" vs "not hotdog". As such, the approach used to make this classifier can be applied to other problems where a lot of data isn't available.

Examples of machine learning applied to more niche problems, such as finding novel star systems (left) or identifying tumours (right)

Before I made this hotdog recognition app, I wanted to start with an easier example. As such, I went with a classifier that could only recognize dogs, just to keep things simple. I’m not a machine learning expert so this helped to minimize errors. And believe me, even with this “simpler” approach I still ran into a number of issues.

Speaking of approach, I used a different one for this project. Previously, I had used a recurrent neural network. For this project, a convolutional neural network (CNN) was my network of choice. CNNs are a type of neural network that's really good at recognizing images. What I find very cool about them is that their design was inspired by the mammalian visual cortex. They "see" images much like you or I do!

A recurrent neural network
A convolutional neural network

If you’d like to learn more about the differences, there are a number of great explanations available. I’ll link those below.

Before I start any project I usually break it down into a list of manageable steps. Here’s what the steps for dognotdog were.

  1. Implement a basic CNN from scratch that can classify dogs.
  2. Use transfer learning to increase the accuracy of the dognotdog CNN.
  3. Create a Shiny webapp that allows users to input a picture and uses the trained model to classify the image.

It’s important to note that I’m not a machine learning expert. I basically tried a bunch of things out and figured out what to do as I went along. So definitely take anything I mention here with a grain of salt.

Since I don’t have a super powerful computer and I’m too cheap to pay for AWS, I used Google Colab to train my models. The great thing about Colab is that I was able to train my models using a GPU instead of a CPU, for free!

While there were challenges every step of the way, I won’t be explaining things step by step. I found that even with tutorials, I still ran into issues. So I’ll be focusing more on what went wrong, why it went wrong and the thinking needed to overcome those challenges.

Of course, if you don’t like this approach then tough shit. It’s my project and I can do what I want.

Implementing a Basic CNN

The goal for this step was to train a CNN to recognize dogs. So, the dogs vs cats dataset on Kaggle came in handy.

Since there were two categories in this dataset, dog and cat, my thinking was to use binary cross-entropy during training. As you’ll see in the next section, this decision ended up being problematic.

Compared to the other parts of this project, creating this basic CNN was fairly straightforward. After training, it ended up being ~79% accurate. I've attached the Jupyter notebook I used for this part below.
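To give a sense of what "basic" means here, this is a minimal sketch of that kind of CNN in Keras. The layer sizes, image dimensions and single sigmoid output are illustrative assumptions, not the exact architecture from my notebook.

```python
# A minimal dog-vs-cat CNN sketch. Layer counts and sizes are illustrative;
# the sigmoid output pairs with the binary cross-entropy loss used at this stage.
from tensorflow import keras
from tensorflow.keras import layers

def build_basic_cnn(img_width=150, img_height=150):
    model = keras.Sequential([
        layers.Input(shape=(img_width, img_height, 3)),
        layers.Conv2D(32, (3, 3), activation="relu"),  # low-level features (edges, textures)
        layers.MaxPooling2D(2),
        layers.Conv2D(64, (3, 3), activation="relu"),  # higher-level features
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),         # probability of "dog"
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model
```

Feeding this batches from an image generator (as in the notebook) is all that's left; the interesting part of this project starts when this simple setup stops being enough.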

Using Transfer Learning to Optimize a CNN

As someone who appreciates not having to do unnecessary work but is definitely not lazy, the idea of transfer learning really appealed to me.

Basically, you take a model that has been trained on a different dataset and reuse it for the task that you want. Makes things a lot more efficient.

I started by modifying the code used to make the basic CNN. Keras offers a number of pretrained networks to choose from so after a bit of research I went with VGG16.

The layers of the imported VGG16 model

From there, all I needed to do was to freeze the pretrained layers before adding a dense layer with two output nodes at the end.

The final model with the dense layer as the output
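The whole transfer-learning setup can be sketched in a few lines. Note one assumption for the sake of a self-contained example: `weights=None` is used here so nothing has to be downloaded, whereas the real model loaded `weights="imagenet"`.

```python
# Transfer-learning sketch: load VGG16 without its top layers, freeze the
# pretrained base, then add a two-node softmax classifier on the end.
# weights=None keeps this example download-free; the real run used "imagenet".
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

def build_transfer_model(img_width=150, img_height=150):
    base = VGG16(include_top=False, weights=None,
                 input_shape=(img_width, img_height, 3))
    base.trainable = False  # freeze all pretrained layers

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(2, activation="softmax"),  # dog / not dog
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["categorical_accuracy"])
    return model
```

Freezing the base means only the final dense layer's weights get updated during training, which is why this is so much cheaper than training VGG16 from scratch.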

This approach resulted in two errors.

Error 1: The trained model was not able to be loaded after training

Error 1

Error 2: During training, the accuracy stayed around 50%.

Error 2

Let’s focus on error 1 for a minute. When I tried to load the trained model back in, it failed. Googling around, it sounded like there was an issue either with the version of Keras I was using or with the first layer of the network.

Downgrading to an earlier version of Keras didn’t work but changing the input layer did. In retrospect, it’s funny how simple the fix ended up being.

vggmodel = VGG16(include_top = False, weights = "imagenet", input_shape = (img_width, img_height, 3))

Simply setting the include_top parameter to False fixed the issue. This drops the pretrained network's fully connected top layers, which lets you define your own input size rather than being stuck with the 224×224 default that VGG16 expects.

Error 1 fixed!

Error 2 was much more complex to figure out. As a refresher, since there were only two categories, dog or not dog, I was using binary classification. However, this resulted in an accuracy of ~50%, no better than a coin flip.

The great thing about having a basic foundation in machine learning is that I had a couple ideas on which parameters to tweak. Thanks Andrew Ng!

The original settings that were potentially causing issues

I played around with changing four settings.

  1. The learning rate.
  2. The batch size.
  3. The number of iterations over the entire training set.
  4. The number of batches used for training per epoch.

To ensure things didn’t get too confusing, only one parameter was changed at a time while the other three stayed fixed. At the end of the day, however, the accuracy still hovered around 50%.
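The one-parameter-at-a-time idea is simple enough to sketch in plain Python. The baseline values below match the settings mentioned in this project; the sweep values are illustrative.

```python
# Sketch of one-at-a-time hyperparameter tweaking: start from a baseline
# config, then vary a single parameter while holding the rest fixed.
def configs_one_at_a_time(baseline, sweeps):
    """Yield one config per (parameter, value) pair, all else at baseline."""
    for name, values in sweeps.items():
        for value in values:
            config = dict(baseline)
            config[name] = value
            yield config

baseline = {"learning_rate": 1e-3, "batch_size": 25,
            "epochs": 10, "steps_per_epoch": 100}
sweeps = {"learning_rate": [1e-2, 1e-4],   # illustrative sweep values
          "batch_size": [16, 64]}

for config in configs_one_at_a_time(baseline, sweeps):
    print(config)  # in practice, each config would be a full training run
```

Changing one thing at a time makes it obvious which parameter (if any) is responsible for a change in accuracy. In my case, none of them were, which is what pointed me at the loss function instead.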

As a last resort I changed the loss function from binary to categorical cross-entropy and… success! But why? I reached out to friends and colleagues but the explanations were varied and didn’t make a lot of sense. So at this point, I was stumped. When this happens I usually stop coding and then do one of three things. Get a drink, go out for a run or grab some spicy dumplings.

After a plate of these you’re practically invincible

A plate of dumplings later… back to the solution.

To figure out the why, experience has taught me to take a closer look at the documentation. Going back and reading the basics can be incredibly helpful and saves time in the long run. The official Keras documentation, as well as the source code on GitHub, helped provide insight.

It turns out that binary cross-entropy, applied to a layer with multiple output nodes, treats each node as an independent yes/no question. The implication is that every input can belong to multiple classes at once. For example, a dog could be grouped with other dogs as well as with other mammals; technically, it fits into both groups.

The assumption with categorical cross-entropy is that given multiple classes, exactly one is correct. For the purposes of dognotdog, there is only one correct answer per image, so categorical cross-entropy has to be used.
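A toy NumPy calculation makes the distinction concrete. The prediction below is an assumed example where the two node activations don't sum to 1, which is exactly what independent sigmoid outputs allow: categorical cross-entropy only scores the correct-class node, while binary cross-entropy scores every node separately.

```python
# Toy comparison of the two losses on a single prediction.
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    # -sum(true * log(pred)); with a one-hot label, only the correct
    # class's node contributes to the loss
    return -np.sum(y_true * np.log(y_pred))

def binary_crossentropy(y_true, y_pred):
    # mean over nodes of an independent yes/no loss per node
    per_node = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return np.mean(per_node)

y_true = np.array([1.0, 0.0])  # one-hot label: this image is a dog
y_pred = np.array([0.8, 0.3])  # independent sigmoid outputs (needn't sum to 1)

print(categorical_crossentropy(y_true, y_pred))  # only the "dog" node matters
print(binary_crossentropy(y_true, y_pred))       # both nodes are penalized
```

The binary loss comes out higher here because the "not dog" node's 0.3 is also penalized, even though the "dog" node already answered the question. With a softmax output and one-hot labels, the categorical loss matches the single-correct-answer setup of dognotdog.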

Based on that, I modified my original approach.

train_gen = datagen.flow_from_directory(train_data, target_size = (img_width, img_height), batch_size = 25, shuffle = True, class_mode = "categorical")

validation_gen = datagen.flow_from_directory(test_data, target_size = (img_width, img_height), batch_size = 25, shuffle = True, class_mode = "categorical")

model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['categorical_accuracy'])

Modifying the class_mode parameter in the training and validation generators to categorical and specifying categorical_crossentropy for the loss function fixed the issue!

Accuracy is now closer to 90%

Although ~90% accuracy was definitely an improvement, I’m sure it could be pushed further with more tuning. If I had the time, I would play around with the learning rate, batch size and a number of other parameters.

But for now I think it’s safe to say….error 2 solved!

A Shiny Webapp that Classifies an Image

Making a webapp can be a daunting task. You have to know HTML, CSS and a bunch of other frameworks. And since I’m not that kind of programmer, learning all of that isn’t the best use of my time.

Fortunately, I have some experience with this. A couple of years ago, I made a webapp using R and Shiny as part of a hackathon, and I’ve made a couple more since then.

Shiny is an R package that lets you develop interactive webapps. I like using it because as long as you know R, you really don’t need to learn HTML, CSS or other frameworks.

That being said, I wasn’t sure how to actually use my trained model in R. All my code so far had been written in Python and I had no idea how to convert it into something R would understand. Luckily, during the ungodly amount of time it took to complete my undergrad, I learnt one very important lesson.

Specific questions get specific answers.

  1. Can I use Keras and Tensorflow in R?
  2. How can I run Python code in R?

Here’s the result of question 1.

And the result for question 2.

… well, that’s lucky.

Before we get into how to use all this you should set up R, RStudio and Python in some kind of Linux/Unix environment. I used a virtual machine running Ubuntu 16.04.

DO NOT USE WINDOWS!

You’ll be shaking your fist at the cold, cruel world when things don’t work.

You may also run into issues where packages do not load due to missing .h files. Running the following code in the command line should help.

sudo apt-get install libxt-dev xvfb xauth xfonts-base libcairo2-dev libgtk2.0-dev

To set this up, I started with installing and importing Shiny, Keras, Tensorflow and Reticulate into R.

install.packages(c("shiny", "keras", "tensorflow", "reticulate"))
library(shiny)
library(keras)
library(tensorflow)
library(reticulate)
py_install("Pillow")
PIL <- import("PIL")

From there I ran the same python functions that were in my Jupyter notebook. And this is where I ran into my first error message.

ImportError: Could not import PIL.Image. The use of array_to_img requires PIL.

Now, ordinarily I like pickles but this one, not so much. Pillow (PIL) is a Python package, but it wasn’t being loaded despite reticulate being installed. My guess was that R didn’t know where to look.

Now I could have given R the file path where Python was installed. However, since this application would eventually be hosted on another server, that would have been problematic. This is where the idea of a virtual environment came in. When the app was run, reticulate would create a local environment that contained Python along with all the needed libraries.

# create the virtual environment with the needed python packages
virtualenv_install("env", packages = c("Pillow", "keras", "tensorflow"), ignore_installed = FALSE)
use_virtualenv("env", required = TRUE)

# import python libraries
PIL <- import(module = "PIL")
keras <- import("keras")

Essentially, virtualenv_install creates an environment containing Pillow, keras and tensorflow. Once that is done, import can be used to load those packages into R.

Everything should work but if not, restart RStudio. If that still doesn’t work, cry for a bit then send me a message.

The completed application was uploaded to ShinyStudio!

This output == SUCCESS!

After all that work, here’s the end result.

Some last minute notes.

  • At the time of writing, TensorFlow 2.0 had just been released. So if you use my code and run into issues, that’s probably why. When installing TensorFlow, I recommend sticking to version 1.14.
  • This app is 92% accurate at distinguishing dogs from cats. If you show it another animal, results may vary. In the future, I’d use a more comprehensive dataset with more animals to get around this.

If you enjoyed this article, and want to read more, just click the link below…

… or drop by my website to check out my research or to send me a message!

If you have any comments or suggestions about this article, feel free to leave a comment!

Thanks for reading :)
