On-premise (DIY) vs Cloud GPU

My first story made a very simple comparison between CPU and GPU; in this one I will expand on that by comparing a DIY GPU against cloud GPUs in terms of speed and price.

I will cover Amazon and Alicloud. Amazon uses an NVIDIA K80 (instance type p2.xlarge), while Alicloud uses an NVIDIA P100 (instance type ecs.gn5-c4g1.xlarge). The on-premise DIY machine uses an NVIDIA GTX 1070.

The test uses Fast Style Transfer, which takes advantage of the GPU for processing. It uses rain-princess.ckpt to process a JPEG at 4242 × 2828 resolution, with the following command:

time python evaluate.py --checkpoint ./rain-princess.ckpt --in-path lesu-0596.jpg --out-path lesu-0596-style.jpg

A quick note about Amazon: you need to apply for a limit increase first, otherwise you will not be able to start the GPU instance!

Changing the instance type on Alicloud is quite a hassle, as you need to release and re-create the instance. On Amazon it is straightforward and does not require destroying the instance; hopefully Alicloud will improve on this over time. This is a real advantage when you have scripts that switch the instance type only for deep learning runs rather than keeping it up 24x7, since GPU instance types in the cloud are quite costly.
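As a sketch of what such a script might look like on Amazon, the AWS CLI lets you change the instance type in place (the instance ID and target type below are placeholders, and the script assumes configured AWS credentials):

```shell
#!/bin/sh
# Hypothetical values -- substitute your own instance ID and target type.
INSTANCE_ID="i-0123456789abcdef0"
TARGET_TYPE="p2.xlarge"

# The instance must be stopped before its type can be changed.
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"

# Switch the instance type in place -- no release and re-create needed.
aws ec2 modify-instance-attribute --instance-id "$INSTANCE_ID" \
    --instance-type "{\"Value\": \"$TARGET_TYPE\"}"

aws ec2 start-instances --instance-ids "$INSTANCE_ID"
```

You could run this before a training session to scale up to a GPU type, then switch back to a cheap instance type afterwards, paying GPU rates only while training.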

nvidia-smi output on Amazon:

nvidia-smi output on Alicloud:

The timing for each platform is as follows:

As for pricing, Amazon has the following pricing tiers:

Alicloud currently doesn’t offer subscription-based / reserved pricing for its GPU instances.

A rough pricing estimate, assuming the instance runs only 12 hours a day for 30 days:
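The estimate boils down to simple arithmetic. A minimal sketch, where the hourly rates are placeholder assumptions rather than current list prices (check the on-demand price for your region):

```python
# Rough monthly cost at 12 hours/day for 30 days.
HOURS_PER_MONTH = 12 * 30  # 360 billable hours

# Assumed USD/hour rates -- placeholders, not current list prices.
rates = {
    "AWS p2.xlarge (K80)": 0.90,
    "Alicloud ecs.gn5-c4g1.xlarge (P100)": 1.00,
}

for name, usd_per_hour in rates.items():
    monthly = usd_per_hour * HOURS_PER_MONTH
    print(f"{name}: USD {monthly:.2f} / month")
```

Swapping in reserved or subscription rates (where available) against the same 360 hours shows how much a commitment discount changes the monthly figure.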

Compare this with the DIY approach, which has an initial investment of USD 1,305.33 (excluding electricity) that could be recovered in roughly two months. Bear in mind, though, that my build uses a Core i3, while the cloud instances come with better CPUs.
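The break-even point can be sketched the same way. The build cost is from this article; the monthly cloud figure below is an assumed placeholder consistent with a roughly two-month recovery:

```python
# Months until the DIY build pays for itself versus renting cloud GPUs.
diy_cost = 1305.33      # USD, initial DIY investment (GTX 1070 build)
cloud_monthly = 650.0   # assumed USD/month at 12 h/day usage -- placeholder

months_to_break_even = diy_cost / cloud_monthly
print(f"Break-even after roughly {months_to_break_even:.1f} months")
```

Electricity cost would push the break-even point out somewhat, but for sustained daily training the DIY machine still wins quickly.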

Finally, I guess the takeaway is to try to do training on-premise to save cost before turning to the cloud, which can cost more. Another advantage of on-premise is that you are not sharing resources with other people. I always believe in a hybrid approach, something like what NVIDIA Cloud is offering, to maximise both performance and cost.