I’ve already published an independent review comparing CatBoost CPU vs GPU training times on a dataset of 10 million rows. Read it here. When training a CatBoost model on a dataset that large, hardware components such as the CPU, GPU and RAM are pushed close to their limits.
It’s worth collecting and visualizing hardware stats while the model is training. For this, there is a Python library called lazyprofiler. You can install it by running the following command at the Anaconda prompt.
pip install lazyprofiler
Then, all you need to do is insert the following code block into your program.
Place the first two lines of the code before the model training call and the last two lines after it. You will then get hardware stats recorded for the whole training run.
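The lazyprofiler code block itself is not reproduced here, so as an illustration of the same pattern it implements — sampling hardware readings in a background thread while training runs — here is a minimal standard-library sketch. The names `sample_during` and `fake_train` are my own, not part of lazyprofiler; with psutil installed, the `read_metrics` callback could return real readings such as `psutil.cpu_percent()` and `psutil.virtual_memory().percent`.

```python
import threading
import time

def sample_during(train_fn, read_metrics, interval=0.5):
    """Run train_fn while sampling read_metrics() every `interval` seconds.

    Returns (train_result, samples), where samples is a list of
    (elapsed_seconds, metrics) tuples collected during training.
    """
    samples = []
    stop = threading.Event()
    t0 = time.monotonic()

    def sampler():
        # Take one reading, then sleep (or exit early if training finished).
        while not stop.is_set():
            samples.append((time.monotonic() - t0, read_metrics()))
            stop.wait(interval)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        result = train_fn()   # e.g. model.fit(X, y) in a real script
    finally:
        stop.set()
        thread.join()
    return result, samples

# Dummy stand-ins for demonstration: a fake "training" job and a
# fake metric reader (a real one would query psutil / the GPU driver).
def fake_train():
    time.sleep(1.2)
    return "model"

result, samples = sample_during(fake_train, lambda: {"cpu": 0.0}, interval=0.2)
```

After the run, `samples` holds timestamped readings that can be plotted against elapsed time, which is essentially what lazyprofiler's graphs show.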
Let’s try it out to see what type of output we’ll get.
Collecting hardware stats: Training on CPU
By looking at the graphs, we can observe the following.
- The CPU runs at full utilization throughout training.
- RAM usage is 100% at the beginning, then drops sharply and stays between 75–80%.
- The GPU is not used during training.
- The temperature increases over time.
- It takes about 110 seconds to train…