Dealing with memory leak issue in Keras model training

Anuj Arora
Dec 3, 2020 · 3 min read

Recently, I was trying to train my Keras (v2.4.3) model with the tensorflow-gpu (v2.2.0) backend on NVIDIA’s Tesla V100-DGXS-32GB. When training for a large number of epochs, I observed a memory build-up / leak: as training progressed, the job consumed more and more memory until none was left, crashing the job or the system.
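One way to confirm this kind of build-up is to log the process's memory after every epoch and check whether it keeps climbing. The snippet below is only a minimal sketch, not something from my original setup: `MemoryLogger` is a hypothetical helper name, it assumes a Unix-like system (it relies on Python's `resource` module), and it reports host RAM only, not GPU memory.

```python
import gc
import resource

try:
    # Use the real Keras base class when TensorFlow is installed.
    from tensorflow.keras.callbacks import Callback
except ImportError:
    # Fallback so the sketch can still be exercised without TensorFlow.
    Callback = object


class MemoryLogger(Callback):
    """Hypothetical helper: records peak resident memory after each epoch."""

    def __init__(self):
        super().__init__()
        self.history = []  # list of (epoch, peak_rss) tuples

    def on_epoch_end(self, epoch, logs=None):
        # Drop unreachable Python objects first, so they do not mask a real leak.
        gc.collect()
        # ru_maxrss is the peak resident set size so far:
        # kilobytes on Linux, bytes on macOS.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        self.history.append((epoch, peak))
        print(f"epoch {epoch}: peak RSS = {peak}")
```

It would be passed to training like any other callback, for example `model.fit(x, y, epochs=100, callbacks=[MemoryLogger()])`. Because `ru_maxrss` is a peak value, it never decreases; a number that keeps rising epoch after epoch, rather than levelling off after the first few, is the signal worth investigating.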

A quick search online made it clear that this problem has been around for some time now. Some users linked the issue to model.predict(), which I had included in my…