How to Run Your Jupyter Notebook on a GPU in the Cloud

Sarah Johnson
Published in Coiled
Oct 11, 2023

You can often significantly reduce the time it takes to train your neural network by using accelerated hardware like GPUs. In this example, we’ll walk through how to train a PyTorch neural network on a GPU in the cloud using Coiled notebooks.
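For context, “using a GPU” in PyTorch just means putting your model and tensors on a CUDA device. Here’s a minimal sketch of the standard device-selection pattern (illustrative only, not part of the example below):

import torch

# Use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(10, 2).to(device)  # move the model's parameters to the device
x = torch.randn(8, 10, device=device)      # allocate the input on the same device
y = model(x)                               # the forward pass runs on the GPU, if present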

You can also watch this demo on YouTube to follow along.


Start your Jupyter notebook on a GPU

You can use Coiled notebooks to start a JupyterLab instance on a GPU-enabled VM in the cloud.

coiled notebook start \
--vm-type g5.xlarge \
--container coiled/gpu-examples:latest \
--region us-west-2
Screencast of using `coiled notebook start` to start a Jupyter notebook on a GPU.

We used a few different arguments:

  • --vm-type g5.xlarge to request a g5.xlarge AWS EC2 instance, which has 1 GPU with 24 GiB of memory.
  • --container coiled/gpu-examples:latest to use this publicly available Docker image, which comes with the necessary packages installed, like CUDA, PyTorch, and Optuna (see the Dockerfile for details).
  • --region us-west-2 to start the VM in the US West (Oregon) AWS region, where we find GPUs are usually easier to get.

See our documentation for more details.
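Once the notebook is running, it’s worth a quick sanity check that PyTorch can actually see the GPU before kicking off training (standard PyTorch calls; the device name is what we’d expect on a g5.xlarge):

import torch

print(torch.cuda.is_available())      # True on a GPU-enabled VM
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA A10G" on a g5.xlarge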

Define the PyTorch neural network

Now that we have a notebook running, we can define the model. We modified this example from the Optuna examples GitHub repo.

In this example, we optimize the validation accuracy of fashion product recognition using PyTorch and the FashionMNIST dataset. We optimize the neural network architecture as well as the optimizer configuration. For demonstration purposes, we use a subset of the FashionMNIST dataset.

import os
import optuna
from optuna.trial import TrialState
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
from torchvision import datasets, transforms

BATCHSIZE = 128
CLASSES = 10
EPOCHS = 10
N_TRAIN_EXAMPLES = BATCHSIZE * 30
N_VALID_EXAMPLES = BATCHSIZE * 10

def define_model(trial):
    # We optimize the number of layers, hidden units, and dropout ratio in each layer.
    n_layers = trial.suggest_int("n_layers", 1, 3)
    layers = []

    in_features = 28 * 28
    for i in range(n_layers):
        out_features = trial.suggest_int("n_units_l{}".format(i), 4, 128)
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
        p = trial.suggest_float("dropout_l{}".format(i), 0.2, 0.5)
        layers.append(nn.Dropout(p))

        in_features = out_features
    layers.append(nn.Linear(in_features, CLASSES))
    layers.append(nn.LogSoftmax(dim=1))

    return nn.Sequential(*layers)

def get_mnist():
    # Load FashionMNIST dataset.
    train_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST(
            os.getcwd(), train=True, download=True,
            transform=transforms.ToTensor()),
        batch_size=BATCHSIZE,
        shuffle=True,
    )
    valid_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST(
            os.getcwd(), train=False, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE,
        shuffle=True,
    )

    return train_loader, valid_loader

def objective(trial):
    # Requires a GPU to run.
    DEVICE = torch.device("cuda")

    # Generate the model.
    model = define_model(trial).to(DEVICE)

    # Generate the optimizers.
    optimizer_name = trial.suggest_categorical(
        "optimizer", ["Adam", "RMSprop", "SGD"])
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    optimizer = getattr(optim, optimizer_name)(model.parameters(), lr=lr)

    # Get the FashionMNIST dataset.
    train_loader, valid_loader = get_mnist()

    # Training of the model.
    for epoch in range(EPOCHS):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            # Limiting training data for faster epochs.
            if batch_idx * BATCHSIZE >= N_TRAIN_EXAMPLES:
                break

            data, target = data.view(data.size(0), -1).to(DEVICE), \
                target.to(DEVICE)

            optimizer.zero_grad()
            output = model(data)
            loss = F.nll_loss(output, target)
            loss.backward()
            optimizer.step()

        # Validation of the model.
        model.eval()
        correct = 0
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(valid_loader):
                # Limiting validation data.
                if batch_idx * BATCHSIZE >= N_VALID_EXAMPLES:
                    break
                data, target = data.view(data.size(0), -1).to(DEVICE), \
                    target.to(DEVICE)
                output = model(data)
                # Get the index of the max log-probability.
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()

        accuracy = correct / min(len(valid_loader.dataset), N_VALID_EXAMPLES)

        trial.report(accuracy, epoch)

        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    return accuracy
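Note that the objective hard-codes torch.device("cuda"), so it will raise an error if no GPU is present. If you want the same notebook to also run locally, one small tweak (the usual fallback pattern, slower on CPU but runnable anywhere) is:

# Optional: fall back to CPU when no GPU is available.
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")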

Optimize with Optuna

We’ll use Optuna to search for the hyperparameters that result in the best model predictions. With n_trials=5, we train the model five times, each with a different set of parameters.

import optuna

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=5, timeout=600, show_progress_bar=True)

This took about 25 seconds to run. We can scale this up and run 100 trials, which takes about 4 minutes and 20 seconds.

study.optimize(objective, n_trials=100, timeout=600, show_progress_bar=True)
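A note on pruning: the objective reports its intermediate accuracy each epoch and calls trial.should_prune(), so unpromising trials are stopped early. By default, optuna.create_study uses a MedianPruner; if you want to make the pruning strategy explicit (or swap it for another one), you can pass a pruner in when creating the study:

# Equivalent to the default behavior; shown here for illustration only.
study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(),
)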

Now we can analyze the results to find the best set of parameters.

pruned_trials = study.get_trials(deepcopy=False, states=[TrialState.PRUNED])
complete_trials = study.get_trials(deepcopy=False, states=[TrialState.COMPLETE])

print("Study statistics: ")
print(" Number of finished trials: ", len(study.trials))
print(" Number of pruned trials: ", len(pruned_trials))
print(" Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print(" Value: ", trial.value)

print(" Params: ")
for key, value in trial.params.items():
print(" {}: {}".format(key, value))

Which returns the following output:

Study statistics: 
  Number of finished trials:  100
  Number of pruned trials:  61
  Number of complete trials:  39
Best trial:
  Value:  0.84609375
  Params: 
    n_layers: 1
    n_units_l0: 109
    dropout_l0: 0.3822970315388142
    optimizer: Adam
    lr: 0.007778083042789732

Looks like the best objective value from training our model 100 times is ~0.846, i.e. the best trial reached about 84.6% validation accuracy.
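If you want to instantiate the winning architecture outside of the study, one option is Optuna’s FixedTrial, which replays a fixed parameter dictionary through the same suggest_* calls that define_model makes (a sketch, using the study from above):

from optuna.trial import FixedTrial

# Rebuild the best-performing model by replaying the best parameters.
best_model = define_model(FixedTrial(study.best_trial.params))
print(best_model)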

Next steps

In this example, we used Coiled notebooks to run a simple PyTorch model in a Jupyter notebook on a GPU in the cloud. It cost ~$0.10 and took ~4 minutes to train the model 100 times.

If you’d like to run this example yourself, you can get started with Coiled at coiled.io/start. This notebook is available in the coiled/examples repo and runs well within the Coiled free tier (though you’ll still need to pay your cloud provider).

Originally published at https://blog.coiled.io.
