Profiling Keras Model Using TFprofile and CProfile

Shubham Agnihotri
Published in Analytics Vidhya
3 min read · Nov 29, 2019

While exploring some challenges in machine learning, I stumbled upon a challenge by Stanford called the Dawn Benchmark (DAWNBench). The challenge is to reach 94% test accuracy in the least training time and at the lowest cost. There are three parts to this problem: getting the model right (being accurate), in record time (being fast), and at limited cost (being efficient). Here I will show an approach to being fast. To do so, you first need to understand where the network is spending its time and then make the necessary changes to reduce it.
To time the code, I will be using TFProfile and CProfile.

Model Used…

The Model used for Profiling the code.

Profiling Using TFProfile

Using TF Profile to profile the model.

TensorFlow provides its own profiling module, called TFProfile. It makes it easy to record the time taken by each operation, and the results can be visualised in TensorBoard.

Understanding tf.keras.callbacks.TensorBoard Parameters

Here we have used two parameters: log_dir and profile_batch. log_dir sets the location of the logging directory, and profile_batch selects which batch to profile. By default it is the 2nd batch, because the 1st batch takes more time than the rest due to various initialisations.
Other parameters can be added to the call as required. For more details on the callback and its parameters, see: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/TensorBoard
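Since the original code screenshot is not reproduced here, the following is only a minimal sketch of how such a callback might be created. The helper name make_profiling_callback and the default directory "logs" are assumptions for illustration, not the author's code.

```python
def make_profiling_callback(log_dir="logs", profile_batch=2):
    """Return a TensorBoard callback that profiles one training batch.

    The function name and defaults are hypothetical; only the two keyword
    arguments discussed in the article are passed through.
    """
    # Imported inside the function so the helper can be defined even on a
    # machine where TensorFlow is not installed.
    import tensorflow as tf

    return tf.keras.callbacks.TensorBoard(
        log_dir=log_dir,              # directory for event and profile files
        profile_batch=profile_batch,  # index of the batch to profile
    )
```

The callback would then be passed to training, for example model.fit(x_train, y_train, callbacks=[make_profiling_callback()]), where model, x_train and y_train stand in for the article's own model and data.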

Colab does not currently run TensorBoard, so to visualise the profile the logs folder has to be zipped, downloaded and opened on a local machine. To zip the folder, use the command: tar -zcvf logs.tar.gz logs/

This will create logs.tar.gz in the Files section in Colab. Download the file and untar it on your local system (tar -zxvf logs.tar.gz). Go to the directory that contains the extracted logs folder and run: tensorboard --logdir=logs/ --port=6006 to visualise the profile.
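If you would rather stay in Python than use the shell, the same pack and unpack steps can be sketched with the standard tarfile module. The function names below are made up for this example, and the folder is assumed to be called logs as in the text.

```python
import tarfile
from pathlib import Path


def archive_logs(log_dir="logs", archive="logs.tar.gz"):
    """Pack log_dir into a gzip-compressed tarball (run this in Colab)."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the folder name, so extraction yields ./logs
        tar.add(log_dir, arcname=Path(log_dir).name)
    return archive


def extract_logs(archive="logs.tar.gz", dest="."):
    """Unpack the downloaded tarball into dest (run this locally)."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
```

After extract_logs() has run locally, the tensorboard command can be pointed at the extracted logs folder.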

Understanding the Visualisation

Time Consumed via GPU and CPU for a batch

With this visualisation we can get insights such as the time taken by batch normalization and ReLU, the time required to load the data, the time for MAC (multiply-accumulate) operations, etc. We can then make the necessary changes to the model, like using prefetch to load the data, making optimal use of dropout and batch normalization, etc.
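As one illustration of the prefetch idea, here is a rough sketch of a tf.data input pipeline that prepares the next batch while the current one is being trained on. The function name, batch size and shuffle settings are assumptions; tf.data.AUTOTUNE is available in TensorFlow 2.4 and later (older releases expose it as tf.data.experimental.AUTOTUNE).

```python
def make_input_pipeline(features, labels, batch_size=128):
    """Build a shuffled, batched dataset that prefetches upcoming batches.

    Hypothetical helper: features and labels are array-like objects with
    the same first dimension, for example NumPy arrays.
    """
    # Imported lazily so the helper can be defined without TensorFlow.
    import tensorflow as tf

    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    dataset = dataset.shuffle(buffer_size=len(features)).batch(batch_size)
    # Overlap data preparation with training; AUTOTUNE lets TensorFlow
    # choose the prefetch buffer size.
    return dataset.prefetch(tf.data.AUTOTUNE)
```

Whether this helps depends on what the profile shows: it is mainly useful when the trace reveals the accelerator waiting on data loading.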

To know more about profiling in TensorBoard, check out: https://www.tensorflow.org/tensorboard/r2/tensorboard_profiling_keras

Profiling Using CProfile

C Profile Code

Import the library and profile the train_model function to understand where the system is consuming time.
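The original screenshot is not available, so the snippet below is only a sketch of what profiling a function with Python's built-in cProfile module can look like. The train_model body here is a dummy stand-in for the article's real training function, and the output filename is an assumption.

```python
import cProfile


def train_model(steps=200_000):
    """Hypothetical stand-in for the article's training function."""
    total = 0.0
    for step in range(steps):
        total += (step % 7) * 0.5
    return total


def profile_to_file(func, filename="train_model.prof"):
    """Run func under cProfile and dump the raw stats to filename."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func()
    finally:
        # Stop collecting even if func raises.
        profiler.disable()
    profiler.dump_stats(filename)
    return result
```

Calling profile_to_file(train_model) writes a binary stats file that tools such as pstats or a visualiser can read afterwards.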

C Profile Output

The raw output of CProfile is ugly and difficult to understand, so the best way to interpret it is to visualise it. There are various free and open-source tools that solve this problem, such as SnakeViz.
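Before reaching for a visualiser, the raw stats can also be made somewhat more readable with the standard pstats module. This is a small sketch, assuming a stats file has already been saved; the helper name top_functions is made up for the example.

```python
import io
import pstats


def top_functions(profile_file, limit=10):
    """Return a text report of the calls with the highest cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(profile_file, stream=buffer)
    # Drop directory prefixes, sort by cumulative time, keep `limit` rows.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

Printing top_functions("train_model.prof") gives a shortened table, which is often enough to spot the heaviest calls before opening a graphical view.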

Snakeviz

Install SnakeViz and run the command snakeviz <filename>; it will visualise the output of CProfile.

Visualization of CProfile output via SnakeViz
Reading individual time

Once plotted, the processes that take more time occupy more space on the plot. Processes running in parallel can also be identified by hovering over the plot: if they run in parallel, multiple processes get highlighted. Thus one can identify places where parallel processing can take place and reduce the running time of the program.
