Deploy a GPU Server for AI Workloads with IBM Cloud Schematics

Kevin Huang
Towards Generative AI
5 min read · Aug 11, 2023

In recent years, we have seen the proliferation of GPUs, which let us enjoy the most graphically demanding games on personal computers and shoot stunning photos and videos on mobile devices. With its vast parallel processing capabilities, originally built for graphics rendering, the GPU has evolved to increasingly support general-purpose computing. CUDA is a parallel computing platform and programming model developed by NVIDIA that leverages the power of its GPUs for general-purpose computing, helping developers speed up their applications by harnessing GPU accelerators. The development and wide adoption of NVIDIA GPUs and CUDA have played a pivotal role in the rapid advancement of several application areas, including Machine Learning (ML) and Deep Learning (DL), most visibly in the rise of Generative AI and ChatGPT.

IBM Cloud Schematics is a free IBM Cloud service that provides Infrastructure as Code (IaC) capabilities to automate the provisioning and management of resources on IBM Cloud. It builds on open-source Terraform and Ansible to deliver a powerful set of IaC tools as a service for programming your cloud infrastructure. Schematics Workspaces provides Terraform-as-a-service capabilities that let you rapidly build, duplicate, and scale complex, multi-tiered cloud environments.

In this tutorial, we will walk through the steps to deploy a CUDA-capable GPU server on IBM Cloud using Schematics Workspaces.

Prerequisites

  • IBM Cloud account with appropriate permissions

Choose the type of GPU server

There are several options for deploying GPU servers on IBM Cloud: a virtual server on classic infrastructure, a virtual server for VPC, or a bare metal server. Each of them provides several GPU compute options. For this tutorial, we'll deploy a virtual server instance (VSI) for VPC, which comes with per-second billing and a suspend-billing feature, making it ideal for temporary use with no commitment.
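If you have the IBM Cloud CLI handy, you can also check which GPU profiles a region offers before committing to one. A quick sketch, assuming the vpc-infrastructure plugin is installed and you're already logged in; us-south is just an example region:

# Install the VPC plugin once, if you haven't already
ibmcloud plugin install vpc-infrastructure

# Target a region, then list instance profiles and filter for GPU families (gx*)
ibmcloud target -r us-south
ibmcloud is instance-profiles | grep gx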

Create a workspace

Before you create the workspace, make sure that you have the required access to create and work with Schematics Workspaces. You also need permissions to create and use VPC infrastructure resources.

Workspace settings define the Terraform template to be used, along with any input variables for customizing it. A Terraform template consists of one or more Terraform configuration files that declare the desired state for your resources. A ready-to-use Terraform template for this tutorial has been published in a GitHub repository. The IBM Cloud documentation has more information on writing your own Terraform templates for Schematics Workspaces, but creating templates is beyond the scope of this tutorial.

Now you have everything you need to create a new workspace for deploying a GPU-enabled VSI for VPC. The simplest way to do that is to use the web UI:

a) Log in to IBM Cloud Console

b) Go to Schematics > Workspaces > Create workspace. In the Specify template section, provide the URL of the Terraform template on GitHub, and select a Terraform version that works with the template. Click Next to go to the next step.

c) In the Workspace details section, enter a name for the workspace, select a Resource Group and a Location to create the workspace in. Click Next to go to the next step.

d) Review the settings, and click Create when ready.
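If you prefer a terminal over the web UI, the Schematics CLI plugin can create the workspace from a settings file. Here's a rough sketch; workspace.json and its field values are illustrative, so check ibmcloud schematics workspace new --help for the exact schema your plugin version expects (the template repo URL placeholder stands in for the tutorial's repository):

# Install the Schematics plugin once, if needed
ibmcloud plugin install schematics

# workspace.json is a hypothetical settings file mirroring the UI fields:
# template repo URL, Terraform version, workspace name, resource group, location
cat > workspace.json <<'EOF'
{
  "name": "gpu-vsi-workspace",
  "location": "us-south",
  "resource_group": "Default",
  "type": ["terraform_v1.5"],
  "template_repo": { "url": "<TEMPLATE_REPO_URL>" }
}
EOF

ibmcloud schematics workspace new --file workspace.json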

Customize the template

The Terraform template exposes a set of variables that you can customize by setting override values. Once the workspace is created, you can find them on the workspace's Settings page. If you don't set an override value for a variable, its default value (if one exists) is used; you must set an override value for any variable that has no default.

NOTE:

  1. At least one VPC SSH key must be available for accessing the VSI. You can verify this with the CLI sketch below.
  2. Not all GPU profiles are available in all regions. The VPC VSI profile gx2-8x64x1v100 comes with a single NVIDIA V100 16GB GPU, one of NVIDIA's CUDA-enabled data center products.
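To confirm that a VPC SSH key is in place, and to upload one if it isn't, here's a short sketch with the VPC CLI; the key name my-ssh-key and the key file path are just examples:

# List the SSH keys registered with VPC in the targeted region
ibmcloud is keys

# If the list is empty, upload an existing public key
ibmcloud is key-create my-ssh-key @~/.ssh/id_rsa.pub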

Generate a workspace plan

A workspace plan performs a Terraform plan to determine which IBM Cloud resources will be created, modified, or deleted by the subsequent apply operation. Click Generate plan to start a plan job against your workspace. You can use the plan summary logs to verify the resource changes before the template is applied. If the plan job runs successfully, its status appears as successful on the Jobs page.
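The same plan can also be triggered from the CLI. A sketch, with $WORKSPACE_ID and $ACTIVITY_ID standing in for the ID of your workspace and of the plan job it starts:

# Start a plan job against the workspace
ibmcloud schematics plan --id $WORKSPACE_ID

# Follow the plan logs, using the activity ID returned by the plan command
ibmcloud schematics logs --id $WORKSPACE_ID --act-id $ACTIVITY_ID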

Perform a workspace apply

Once the workspace plan is generated and you’ve verified the plan details, it’s time to run an apply job against your workspace. A workspace apply job provisions, modifies, or removes the IBM Cloud resources as described in the Terraform template that you’ve pointed the workspace to. Click Apply plan to start the apply job. If all goes well, it should only take a few minutes for the apply job to finish the deployment of all the resources, including a GPU-enabled VSI for VPC.
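The CLI equivalent, again with $WORKSPACE_ID and $ACTIVITY_ID standing in for your own values:

# Start the apply job against the workspace
ibmcloud schematics apply --id $WORKSPACE_ID

# Watch the apply logs until the job completes
ibmcloud schematics logs --id $WORKSPACE_ID --act-id $ACTIVITY_ID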

You can find the list of the deployed resources on the Resources page, with clickable links for some of them. You can click on those links to get more details about the resources.

Access the GPU server

Now the GPU server is up and running. Before you can connect to it, though, you need its public IP address, which you can find either by clicking vsi on the Resources page or by looking for public_ip near the bottom of the apply job's logs. Once you have it, you should be able to SSH into the server with the following command.

ssh -i ~/.ssh/<YOUR_SSH_PRIVATE_KEY> root@<PUBLIC_IP_ADDRESS>
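If you'd rather script this step, the workspace outputs can be queried from the CLI, and once you're on the server you can confirm the GPU is visible even before any NVIDIA driver is installed. A sketch, assuming the template exposes a public_ip output as the apply logs suggest:

# Query the workspace's Terraform outputs (look for public_ip)
ibmcloud schematics output --id $WORKSPACE_ID

# On the server: the V100 should already show up on the PCI bus
lspci | grep -i nvidia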

What’s next

In this tutorial, you've learned how to deploy a GPU server from a Terraform template in just a few simple steps by leveraging the IaC capabilities of IBM Cloud Schematics Workspaces, with no Terraform coding skills required. In the next tutorial, you'll learn how to prepare the GPU server for various AI workloads by installing and configuring GPU drivers, libraries, tools, and other software components.
