Profiling a Keras Model Using TFProfile and cProfile
While exploring some challenges in machine learning, I stumbled upon a benchmark by Stanford called DAWNBench. The challenge is to reach 94% test accuracy in the least training time and at the lowest cost. So there are three parts to this problem: getting the model right (being accurate), in record time (being fast), and within a limited budget (being efficient). Here I will show an approach to being fast. To do so, you first need to understand where the network is consuming time, and then make the necessary changes to reduce it.
Thus, to timestamp the code, I will be using TFProfile and cProfile.
Model Used…
Profiling Using TFProfile
TensorFlow provides its own profiling module called TFProfile. This module makes it easy to record the time taken by each operation. The results can be visualised using TensorBoard.
Understanding tf.keras.callbacks.TensorBoard Parameters
Here, we have used two parameters: log_dir and profile_batch. log_dir sets the location of the logging directory, and profile_batch selects which batch to profile. By default it is the 2nd batch, because the 1st batch takes more time than the rest due to various one-time initialisations.
There are various other parameters that can be added to the function call based on your requirements. For more details on the function and its parameters, check out: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/TensorBoard
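Wiring the callback into training can be sketched as follows. This is a minimal example; the model architecture and the x_train/y_train names are placeholders for your own model and data:

```python
import numpy as np
import tensorflow as tf

# Hypothetical tiny model standing in for the real architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# log_dir: where TensorBoard event files (including the profile) are written.
# profile_batch: which batch to profile; 2 by default, since batch 1 pays
# one-time initialisation costs.
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs", profile_batch=2)

# Synthetic data so the sketch is runnable end to end.
x_train = np.zeros((64, 784), dtype="float32")
y_train = np.zeros((64,), dtype="int32")
model.fit(x_train, y_train, epochs=1, callbacks=[tb_callback], verbose=0)
```

After training, the logs directory contains the profile that TensorBoard will render.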
As Colab currently does not run TensorBoard, to visualise the profile the log directory has to be zipped, downloaded, and opened on a local machine. To zip the directory, use the command: tar -zcvf logs.tar.gz logs
This will create logs.tar.gz in the Files section of Colab. Download the file and untar it on your local system. Go to the directory that contains the unzipped logs folder and run: tensorboard --logdir=logs/ --port=6006
to visualise the profile.
Understanding the Visualisation
With these visualisations we can get insights like the time spent in batch normalisation and ReLU, the time required to load the data, the time for MAC operations, etc. We can then make the necessary changes to the model, like using prefetch to load the data, making optimal use of dropout and batch normalisation, etc.
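For instance, input-pipeline stalls visible in the trace can often be hidden with tf.data prefetching. A minimal sketch with synthetic data (the dataset contents are placeholders):

```python
import tensorflow as tf

# Synthetic dataset standing in for the real training data.
dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([256, 784]))

dataset = (
    dataset.batch(32)
    # Overlap input preparation with training: while the accelerator
    # works on batch N, the CPU prepares batch N+1.
    .prefetch(tf.data.AUTOTUNE)
)

first_batch = next(iter(dataset))
```

Passing this dataset directly to model.fit lets the prefetch buffer hide data-loading latency that would otherwise show up as idle gaps in the profile.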
To know more about profiling in Tensorboard, checkout the link: https://www.tensorflow.org/tensorboard/r2/tensorboard_profiling_keras
Profiling Using CProfile
Import the library and profile the train_model function to understand where the system spends its time.
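A minimal sketch of this step, where train_model is a stand-in for the real training loop:

```python
import cProfile
import io
import pstats

def train_model():
    # Placeholder for the real training loop (model.fit(...) etc.).
    total = 0
    for i in range(100_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
train_model()
profiler.disable()

# Dump the stats to a file that snakeviz can later visualise.
profiler.dump_stats("train_model.prof")

# Also print the slowest calls, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The train_model.prof file is the input you hand to snakeviz in the next step.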
The raw cProfile output is ugly and difficult to understand, so the best way to interpret it is by visualising it. There are various free and open-source tools that solve this problem, like SnakeViz.
Snakeviz
Install SnakeViz and run the command snakeviz <filename>; it will visualise the cProfile output.
Once plotted, the processes taking more time occupy more space on the plot. Processes running in parallel can also be identified by hovering over the plot: if they run in parallel, multiple processes get highlighted. Thus one can identify places where parallel processing can be introduced to reduce the program's running time.