Keras shoot-out, part 2: a deeper look at memory usage

Julien Simon
Sep 8, 2017 · 3 min read

In a previous article, I used Apache MXNet and Tensorflow as Keras backends to learn the CIFAR-10 dataset on multiple GPUs.

One of the striking differences was memory usage. Whereas MXNet allocated a conservative 670MB on each GPU, Tensorflow allocated close to 100% of available memory (a tad under 11GB).

I was a little shocked by this state of affairs (must be the old-school embedded software developer in me). The model and data set (respectively Resnet-50 and CIFAR-10) didn’t seem to require that much memory after all. Diving a little deeper, I learned that this is indeed the default behaviour in Tensorflow: use all available RAM to speed things up. Fair enough :)

Still, a fact is a fact: in this particular setup, MXNet is both faster AND more memory-efficient. I couldn’t help but wonder how Tensorflow would behave if I constrained its memory usage. Let’s find out, shall we?

Tensorflow settings

As a number of folks pointed out, you can easily restrict the number of GPUs that Tensorflow uses, as well as the fraction of GPU memory that it allocates (a float value between 0 and 1). Additional information is available in the Tensorflow documentation.

Just take a look at the example below.

With this in mind, let’s start restricting memory usage. I’m curious to find out how low we can actually go and if there’s any consequence on training time.

Test setup

I’ll run the same script as in the previous article (keras/examples/), with the following parameters:

  • 1 GPU on a p2.8xlarge instance,

Our reference point will be MXNet: 658MB of allocated memory, 155 seconds per epoch.

Test results

After a little while, here are the results for memory usage and epoch time.

  • No restriction: 10938MB, 211 seconds.


Again, this is a single test and YMMV. Still, a few remarks.

By default, Tensorflow allocates as much memory as possible, but more memory doesn’t mean faster training. So why behave like a hog in the first place? Especially since Tensorflow can actually get down to a memory footprint similar to MXNet’s (although finding the right fraction is really a trial-and-error process).

This behaviour still raises a lot of questions that trouble my restless mind :)

  1. What about very large models? Would they run out of memory and would I need to tweak the memory setting to make them fit?

Oh boy. More questions than when I started. Typical :) I’ll have to investigate!

All in all, I guess I’m more comfortable with a library like MXNet that allocates memory as needed and gives me a clear view of how much is left, what the impact is when parameters are tweaked, etc.

Call it personal preference. And of course, MXNet is quite a bit faster too.

Thanks for reading. Stay tuned for more articles!
