Photo by Nana Dua on Unsplash

Azure Compute GPU vs CPU DCGAN

Allan Graves
Nov 26, 2020 · 15 min read

Azure offers many, many functional areas. One of the most useful, though, is the ability to spin up a machine on demand and shut it down again when you are done, so you only pay for the compute you actually use. First, let's see which GPU-enabled VM sizes are available to us:

agraves@LAPTOP-I5LSJI5R:~/gitrepos/Public Azure ML$ python 07-azure-list-vmsizes.py --gpus
Only showing GPU enabled instances
{'name': 'Standard_NC6s_v3', 'vCPUs': 6, 'gpus': 1, 'memoryGB': 112.0, 'maxResourceVolumeMB': 344064}
{'name': 'Standard_NC12s_v3', 'vCPUs': 12, 'gpus': 2, 'memoryGB': 224.0, 'maxResourceVolumeMB': 688128}
{'name': 'Standard_NC24rs_v3', 'vCPUs': 24, 'gpus': 4, 'memoryGB': 448.0, 'maxResourceVolumeMB': 1376256}
{'name': 'Standard_NC24s_v3', 'vCPUs': 24, 'gpus': 4, 'memoryGB': 448.0, 'maxResourceVolumeMB': 1376256}
{'name': 'Standard_NV6', 'vCPUs': 6, 'gpus': 1, 'memoryGB': 56.0, 'maxResourceVolumeMB': 389120}
{'name': 'Standard_NV12', 'vCPUs': 12, 'gpus': 2, 'memoryGB': 112.0, 'maxResourceVolumeMB': 696320}
{'name': 'Standard_NV24', 'vCPUs': 24, 'gpus': 4, 'memoryGB': 224.0, 'maxResourceVolumeMB': 1474560}
{'name': 'Standard_NC6', 'vCPUs': 6, 'gpus': 1, 'memoryGB': 56.0, 'maxResourceVolumeMB': 389120}
{'name': 'Standard_NC12', 'vCPUs': 12, 'gpus': 2, 'memoryGB': 112.0, 'maxResourceVolumeMB': 696320}
{'name': 'Standard_NC24', 'vCPUs': 24, 'gpus': 4, 'memoryGB': 224.0, 'maxResourceVolumeMB': 1474560}
{'name': 'Standard_NC24r', 'vCPUs': 24, 'gpus': 4, 'memoryGB': 224.0, 'maxResourceVolumeMB': 1474560}
{'name': 'Standard_NV12s_v3', 'vCPUs': 12, 'gpus': 1, 'memoryGB': 112.0, 'maxResourceVolumeMB': 344064}
{'name': 'Standard_NV24s_v3', 'vCPUs': 24, 'gpus': 2, 'memoryGB': 224.0, 'maxResourceVolumeMB': 688128}
{'name': 'Standard_NV48s_v3', 'vCPUs': 48, 'gpus': 4, 'memoryGB': 448.0, 'maxResourceVolumeMB': 1376256}
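The listing above comes straight from the AzureML SDK; AmlCompute.supported_vmsizes() returns exactly these dictionaries. A minimal sketch of what 07-azure-list-vmsizes.py might look like (the --gpus flag handling and the Workspace.from_config() setup are assumptions here, not a copy of the actual script):

import argparse
from azureml.core import Workspace
from azureml.core.compute import AmlCompute

parser = argparse.ArgumentParser()
parser.add_argument('--gpus', action='store_true',
                    help="Only show GPU enabled instances")
args = parser.parse_args()

# Assumes a config.json for the workspace in the working directory.
ws = Workspace.from_config()

if args.gpus:
    print("Only showing GPU enabled instances")

# supported_vmsizes() returns dicts with name, vCPUs, gpus, memoryGB, etc.
for size in AmlCompute.supported_vmsizes(workspace=ws):
    if not args.gpus or size['gpus'] > 0:
        print(size)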
If your subscription's vCPU quota is too small for the node you ask for, provisioning fails with an error like this:

# Provisioning errors: [{'error': {'code': 'InvalidPropertyValue', 'message': 'The specified subscription has a total vCPU quota of 4 and cannot accomodate for at least 1 requested managed compute node which maps to 6 vCPUs', 'details': []}}]

Once the quota covers the requested VM size, the script runs cleanly:
agraves@U18.04:~/gitrepos/Public Azure ML$ python 02-create-compute-cuda.py
Creating a compute cluster
Creating
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned
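For reference, a minimal sketch of a compute-creation script along the lines of 02-create-compute-cuda.py (the cluster name and node counts are assumptions; Standard_NC6 is the smallest single-GPU size from the listing above):

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

print("Creating a compute cluster")
# min_nodes=0 lets the cluster scale to zero when idle, so we only pay while a job runs.
provisioning_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6',
                                                            min_nodes=0,
                                                            max_nodes=1)
cluster = ComputeTarget.create(ws, 'nc6-gpu-cluster', provisioning_config)
cluster.wait_for_completion(show_output=True, min_node_count=1)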
Next, we need an environment with PyTorch and CUDA already set up. Azure ML ships a set of curated environments; running our environment-listing script and grepping for PyTorch gets us:

agraves@LAPTOP-I5LSJI5R:~/gitrepos/Public Azure ML$ python 09-azure-list-environments.py | grep -i pytorch
Name AzureML-PyTorch-1.2-CPU
Name AzureML-PyTorch-1.1-CPU
Name AzureML-PyTorch-1.0-GPU
Name AzureML-PyTorch-1.0-CPU
Name AzureML-PyTorch-1.2-GPU
Name AzureML-PyTorch-1.1-GPU
Name AzureML-PyTorch-1.3-GPU
Name AzureML-PyTorch-1.3-CPU
Name AzureML-PyTorch-1.4-GPU
Name AzureML-PyTorch-1.4-CPU
Name AzureML-PyTorch-1.5-CPU
Name AzureML-PyTorch-1.5-GPU
Name AzureML-Designer-PyTorch
Name AzureML-Designer-PyTorch-Train
Name AzureML-PyTorch-1.6-CPU
Name AzureML-PyTorch-1.6-GPU
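A sketch of how 09-azure-list-environments.py might produce this output with the Environment API (the --env detail mode shown next follows the same pattern; the argument handling is an assumption):

import argparse
from azureml.core import Workspace, Environment

parser = argparse.ArgumentParser()
parser.add_argument('--env', action='store',
                    help="Show the details of a single environment")
args = parser.parse_args()

ws = Workspace.from_config()

if args.env:
    # Dump the base Docker image and conda dependencies for one environment.
    env = Environment.get(workspace=ws, name=args.env)
    print("Base Docker:", env.docker.base_image)
    print(env.python.conda_dependencies.serialize_to_string())
else:
    # Environment.list() returns a dict of name -> Environment.
    for name in Environment.list(workspace=ws):
        print("Name", name)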
agraves@LAPTOP-I5LSJI5R:~/gitrepos/Public Azure ML$ python 09-azure-list-environments.py --env AzureML-PyTorch-1.6-GPU
Base Docker: mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04:20201112.v1
channels:
- conda-forge
dependencies:
- python=3.6.2
- pip:
- azureml-core==1.18.0.post1
- azureml-defaults==1.18.0
- azureml-telemetry==1.18.0
- azureml-train-restclients-hyperdrive==1.18.0
- azureml-train-core==1.18.0
- cmake==3.18.2
- torch==1.6.0
- torchvision==0.5.0
- mkl==2018.0.3
- horovod==0.20.0
- tensorboard==1.14.0
- future==0.17.1
name: azureml_9d2a515d5c77954f2d0562cc5eb8a1fc
curated_env_name = 'AzureML-PyTorch-1.6-GPU'
env = Environment.get(workspace=ws, name=curated_env_name)
config = ScriptRunConfig(source_directory='./src', script='hello-cuda.py', compute_target='nc6-gpu-cluster')
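Putting those pieces together, a minimal driver that submits hello-cuda.py could look like this (the experiment name 'cuda-experiment' is taken from the run URLs later in the post; the rest mirrors the snippets above):

from azureml.core import Workspace, Environment, Experiment, ScriptRunConfig

ws = Workspace.from_config()

# Use the curated PyTorch 1.6 GPU environment rather than building our own image.
curated_env_name = 'AzureML-PyTorch-1.6-GPU'
env = Environment.get(workspace=ws, name=curated_env_name)

config = ScriptRunConfig(source_directory='./src',
                         script='hello-cuda.py',
                         compute_target='nc6-gpu-cluster')
config.run_config.environment = env

run = Experiment(workspace=ws, name='cuda-experiment').submit(config)
run.wait_for_completion(show_output=True)

The hello-cuda.py script itself just reports what hardware the node can see: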
import torch
import os
# Number of workers for dataloader
# set to the number of cpus we are going to have access to.
workers = len(os.sched_getaffinity(0))
# Number of GPUs available. Use 0 for CPU mode.
# Since we can run this in multiple scenarios, we want
# to set this dynamically - we might suddenly have more GPUs
# or 0 GPUs.
ngpu = torch.cuda.device_count()
print("CUDA GPU? " + str(torch.cuda.is_available()))
print ("NGPU: " + str(ngpu))
print ("Workers: " + str(workers))
Submitting that to the nc6-gpu-cluster, the run log confirms the node really does have a GPU:

After variable expansion, calling script [ hello-cuda.py ] with arguments: []
CUDA GPU? True
NGPU: 1
Workers: 6
Starting the daemon thread to refresh tokens in background for process with pid = 98
With the environment verified, the DCGAN training run uses the same pattern, pointing ScriptRunConfig at the training script instead:

curated_env_name = 'AzureML-PyTorch-1.6-GPU'
env = Environment.get(workspace=ws, name=curated_env_name)
config = ScriptRunConfig(source_directory='./src',
                         script='train-dcgan_azure-cuda.py',
                         compute_target='nc6-gpu-cluster')
config.run_config.environment = env
The driver script also takes a --target argument so the same code can be pointed at either a GPU or a CPU cluster:

parser.add_argument('--target', action='store', default="nc6-gpu-cluster", help="the compute target to run against")
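A sketch of how that flag might feed the run configuration (everything except args.target mirrors the ScriptRunConfig call above):

args = parser.parse_args()
config = ScriptRunConfig(source_directory='./src',
                         script='train-dcgan_azure-cuda.py',
                         compute_target=args.target)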
Inside train-dcgan_azure-cuda.py, the training loop is wrapped in a simple timer:

start_time = time.perf_counter()
# ... the DCGAN training loop runs here ...
stop_time = time.perf_counter()
print(f"Finished in {stop_time - start_time:0.6f} seconds")
parser.add_argument('--cpu', action='store_true', help="Force the training to happen on the CPU")
parser.add_argument('--timing', action='store_true', help="Time the training loop, with no output or logging")
# Note - if run with the --cpu option, turn off the GPU entirely by setting it to 0.
if not args.cpu:
ngpu = torch.cuda.device_count()
else:
ngpu = 0
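From there, ngpu drives device selection the same way it does in the standard PyTorch DCGAN tutorial:

# Use the first GPU when one is available and allowed; otherwise fall back to the CPU.
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")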
Now we can kick off the timing runs: one on the GPU, one forced onto the CPU of the same GPU node, and one on a separate CPU-only cluster:

agraves@U18.04:~/gitrepos/Public Azure ML$ python 06-dcgan_azure-cuda.py --timing
TutorialWorkspace eastus2 TutorialResourceGroup
Calling ScriptRunConfig with Arguments:
--data_path
<azureml.data.dataset_consumption_config.DatasetConsumptionConfig object at 0x7f658bdb4f28>
--timing
https://ml.azure.com/experiments/cuda-experiment/runs/cuda-experiment_1606390894_e41c31e5?wsid=/subscriptions/c14a37bd-a658-463c-9d44-9a9326fe5fbe/resourcegroups/TutorialResourceGroup/workspaces/TutorialWorkspace
agraves@U18.04:~/gitrepos/Public Azure ML$ python 06-dcgan_azure-cuda.py --timing --cpu
TutorialWorkspace eastus2 TutorialResourceGroup
Calling ScriptRunConfig with Arguments:
--data_path
<azureml.data.dataset_consumption_config.DatasetConsumptionConfig object at 0x7f189df8c7b8>
--timing
--cpu
https://ml.azure.com/experiments/cuda-experiment/runs/cuda-experiment_1606390928_95f6ca45?wsid=/subscriptions/c14a37bd-a658-463c-9d44-9a9326fe5fbe/resourcegroups/TutorialResourceGroup/workspaces/TutorialWorkspace
agraves@U18.04:~/gitrepos/Public Azure ML$ python 06-dcgan_azure-cuda.py --target d2-cpu-cluster --timing --cpu
TutorialWorkspace eastus2 TutorialResourceGroup
Calling ScriptRunConfig with Arguments:
--data_path
<azureml.data.dataset_consumption_config.DatasetConsumptionConfig object at 0x7fab05761518>
--timing
--cpu
https://ml.azure.com/experiments/cuda-experiment/runs/cuda-experiment_1606407710_a1296b65?wsid=/subscriptions/c14a37bd-a658-463c-9d44-9a9326fe5fbe/resourcegroups/TutorialResourceGroup/workspaces/TutorialWorkspace
After variable expansion, calling script [ train-dcgan_azure-cuda.py ] with arguments: ['--data_path', '/tmp/tmpzwgr0tsp', '--timing']
After variable expansion, calling script [ train-dcgan_azure-cuda.py ] with arguments: ['--data_path', '/tmp/tmpk_hiwq9s', '--timing', '--cpu']
The standard GPU run simply shows how much faster it is than either CPU run.
