Introduction to Deep Learning: PyTorch
How can we bring AI to life? Creating neural networks using PyTorch!
Over time there has been a variety of deep-learning frameworks: Theano, Torch, MXNet… the list goes on. From all of them emerged two giants: TensorFlow and PyTorch. TensorFlow was once the almighty, near-unanimous framework of choice, but the era of PyTorch has begun as more and more people move towards it.
PyTorch provides an ultra-flexible way to progressively code our network.
It’s simple, it’s robust and best of all, it’s elegant!
Creating and training neural networks with PyTorch follows the general structure of any data science project:
- Start with data — Creating dataset classes and defining transformations
- Create models — Defining the architecture (i.e. layers, loss and optimiser used) and whether to use a pretrained network
- Training — Stitching everything together and then evaluating it
We will be discussing this with a focus on images, but remember that these same techniques (often further refined) are used to process textual and video data!
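To make the "create models" step a little more concrete, here is a minimal sketch of a small convolutional classifier. The class name `TinyCNN`, the layer sizes, and the assumption of 3-channel 32×32 inputs with 10 classes are all hypothetical choices made for this illustration, not something PyTorch prescribes.

```python
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    """Hypothetical toy classifier for 3x32x32 images (all sizes are assumptions)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x32x32 -> 16x32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x32x32 -> 16x16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 16x16x16 -> 32x16x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x16x16 -> 32x8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)  # keep the batch dimension
        return self.classifier(x)          # raw scores (logits), one per class


model = TinyCNN()
dummy_batch = torch.randn(4, 3, 32, 32)  # a fake batch of 4 images
logits = model(dummy_batch)
print(logits.shape)
```

Returning raw logits rather than probabilities is a deliberate choice here, since `nn.CrossEntropyLoss` applies the softmax step internally.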
The first step in any data science project is ensuring you can load your data.
To do so, we need to describe in our code how we read the data and which transformations we are going to apply.
With PyTorch, handling the flow of data is split into three distinct components:
- Data class — How to read the data
- Transformations — How to modify the data
- Data loader — Reads the data, as specified by the data and transformation classes
Note that each of these components has a distinct role. The roles are deliberately generic so that we can read any kind of data in exactly the same way! Hence, unlike with frameworks like TensorFlow, we don’t need specialised classes to handle different file types. Instead we just create a data class specifically for the scenario at hand (or even several, using any tools/libraries we wish) and PyTorch’s built-in data loader will do the rest of the heavy lifting (like multiprocessing) for us.
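import torch
from PIL import Image
from torch.utils.data import Dataset

class ImageDataset(Dataset):  # class name is illustrative
    def __init__(self, image_paths):
        self.image_paths = image_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        # A sampler may pass a tensor index; convert it to a plain Python int
        if torch.is_tensor(index):
            index = index.tolist()
        image_path = self.image_paths[index]
        # Assumption: the image is loaded with Pillow and returned as RGB
        return Image.open(image_path).convert("RGB")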
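For experimenting without any image files on disk, the same idea can be sketched with tensors held in memory. The class name `InMemoryImageDataset`, the optional `transform` argument, and the fake data shapes are assumptions made for this example; the only real requirement is that a dataset class implements `__len__` and `__getitem__`.

```python
import torch
from torch.utils.data import Dataset


class InMemoryImageDataset(Dataset):
    """Hypothetical dataset holding images and labels as tensors in memory."""

    def __init__(self, images, labels, transform=None):
        assert len(images) == len(labels)
        self.images = images
        self.labels = labels
        self.transform = transform  # optional callable applied to each image

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[index]


# Fake data: 8 RGB "images" of 32x32 pixels with labels from 10 classes
images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
dataset = InMemoryImageDataset(images, labels)

image, label = dataset[0]
print(len(dataset), image.shape, label.item())
```

Returning an `(image, label)` pair from `__getitem__` is what lets a training loop unpack each batch as `inputs, labels`.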
Perhaps even more important than how we read in our data is how we process it before it reaches our model.
This is what we call transformations (which we discussed in depth in the previous article).
We can tweak these to try and improve our results (please try, it’ll give you an intuitive feel for how things come together).
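from torchvision import transforms

data_transform = transforms.Compose([
    transforms.ToTensor(),  # Normalize works on tensors, so convert the image first
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])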
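To see what the normalisation step does numerically, here is a small sketch in plain PyTorch that mirrors the per-channel arithmetic: subtract the channel mean, then divide by the channel standard deviation. The `normalize` helper is a hypothetical stand-in written for this illustration; the mean and standard deviation values are the ones used in the transform above (the widely used ImageNet channel statistics).

```python
import torch

# Per-channel statistics, reshaped so they broadcast over (3, H, W) images
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def normalize(image: torch.Tensor) -> torch.Tensor:
    """Subtract the channel mean and divide by the channel std.

    Expects a float tensor of shape (3, H, W) with values in [0, 1].
    """
    return (image - mean) / std


# A fake image in which every pixel equals its channel's mean
image = mean.expand(3, 4, 4).clone()
normalized = normalize(image)
print(normalized.abs().max())  # every value should be zero

# A pure white image ends up at a different value in each channel
white = normalize(torch.ones(3, 4, 4))
print(white[:, 0, 0])
```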
When it comes to the data loaders, we can specify several arguments which dictate how much data we process at once. The important ones are the number of workers (processes) we use simultaneously, and the batch-size (number of samples processed at a time).
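Practical advice: a common starting point is to set the number of workers to your CPU core count and the batch size to somewhere between 4 and 16 (depending on GPU VRAM and input size/resolution); larger batches, such as the 32 used in the loaders below, are fine when memory allows.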
training_loader = torch.utils.data.DataLoader(training_dataset, batch_size=32, shuffle=True)
validation_loader = torch.utils.data.DataLoader(validation_dataset, batch_size=32, shuffle=False)
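As a quick way to see how the batch size shapes what the loader hands back, here is a minimal sketch using fake in-memory data. The dataset contents and sizes are assumptions for illustration; `num_workers=0` keeps loading in the main process so the example runs anywhere.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Fake dataset: 10 RGB images of 32x32 pixels with integer labels
images = torch.rand(10, 3, 32, 32)
labels = torch.randint(0, 5, (10,))
dataset = TensorDataset(images, labels)

# num_workers=0 loads data in the main process; raise it to use worker processes
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_sizes = []
for batch_images, batch_labels in loader:
    batch_sizes.append(batch_images.shape[0])
    print(batch_images.shape, batch_labels.shape)

print(batch_sizes)  # 10 samples in batches of 4 -> [4, 4, 2]
```

The final batch is smaller because 10 is not a multiple of 4; passing `drop_last=True` would discard it instead.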
The process we go through to train our network:
- Predict output
- Compare prediction to answer (loss function)
- Improve (back propagate)
for inputs, labels in training_loader:
    # Zero the parameter gradients
    optimizer.zero_grad()

    # Forward pass, backward pass and optimisation step
    outputs = model(inputs)
    loss = loss_fn(outputs, labels)
    loss.backward()
    optimizer.step()
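Putting the three steps together, here is a self-contained sketch of a full training run on a toy problem. Everything about the data (two random features, a label that is 1 when they sum to more than zero), the single linear layer, the learning rate, and the number of epochs is an assumption chosen so the example runs in a second or two.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy binary classification: the label is 1 when the two features sum to > 0
features = torch.randn(256, 2)
targets = (features.sum(dim=1) > 0).long()
training_loader = DataLoader(TensorDataset(features, targets), batch_size=32, shuffle=True)

model = nn.Linear(2, 2)  # two inputs, two class scores
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.05)

epoch_losses = []
for epoch in range(10):
    model.train()
    running_loss = 0.0
    for inputs, labels in training_loader:
        optimizer.zero_grad()            # clear gradients from the previous step
        outputs = model(inputs)          # predict
        loss = loss_fn(outputs, labels)  # compare prediction to answer
        loss.backward()                  # back propagate
        optimizer.step()                 # improve the parameters
        running_loss += loss.item() * inputs.size(0)
    epoch_losses.append(running_loss / len(training_loader.dataset))
    print(f"epoch {epoch + 1}: mean training loss {epoch_losses[-1]:.4f}")
```

Weighting each batch loss by its size before averaging keeps the epoch figure correct even when the last batch is smaller than the rest.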
There are also a few decisions to be made ahead of time. First and most importantly, the loss function, which dictates how to compare a prediction with its real value numerically (i.e. to gauge how well or badly the model fares). Mean Squared Error is normally used for regression problems (predicting a number), whilst Cross Entropy Loss is used for classification (attaching labels). Next, which optimiser to use (i.e. the procedure used to improve results): normally just Adam (a solid go-to choice that tends to converge quickly) or SGD (which often generalises well when tuned carefully).
import torch.nn as nn
import torch.optim as optim

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
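To make the two loss functions less abstract, the sketch below computes each one twice: once with the built-in class and once by hand. The numbers are made up for illustration.

```python
import torch
import torch.nn as nn

# Regression: mean squared error between predictions and targets
predictions = torch.tensor([2.0, 4.0, 6.0])
targets = torch.tensor([1.0, 4.0, 8.0])
mse = nn.MSELoss()(predictions, targets)
mse_by_hand = ((predictions - targets) ** 2).mean()  # (1 + 0 + 4) / 3

# Classification: cross entropy takes raw scores (logits) and integer class labels
logits = torch.tensor([[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]])
labels = torch.tensor([0, 2])
cross_entropy = nn.CrossEntropyLoss()(logits, labels)
# Equivalent: negative log of the softmax probability given to the true class
log_probs = torch.log_softmax(logits, dim=1)
ce_by_hand = -log_probs[torch.arange(2), labels].mean()

print(mse.item(), mse_by_hand.item())
print(cross_entropy.item(), ce_by_hand.item())
```

Note that `nn.CrossEntropyLoss` expects unnormalised logits, so the model should not apply a softmax layer before it.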
On top of this we tend to log the progress/state of our network as we go (e.g. at the end of each epoch). We can log anything from the loss to metrics like accuracy and F1 score (several more sophisticated metrics exist).
It is common to just print out the results, but tools like Weights & Biases can additionally be used to provide better visualisation of results (extremely useful for any deep-learning project).
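As one example of a metric worth logging, here is a minimal sketch of computing accuracy from a batch of model outputs and storing it per epoch. The `accuracy` helper, the `history` list, and the example logits are hypothetical names and values made up for this illustration.

```python
import torch


def accuracy(outputs: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    predicted = outputs.argmax(dim=1)
    return (predicted == labels).float().mean().item()


# Hypothetical model outputs (logits) for 4 samples and 3 classes
outputs = torch.tensor([
    [2.0, 0.1, 0.3],  # predicts class 0
    [0.2, 1.5, 0.1],  # predicts class 1
    [0.3, 0.2, 0.9],  # predicts class 2
    [1.2, 0.4, 0.1],  # predicts class 0
])
labels = torch.tensor([0, 1, 2, 1])  # the last prediction is wrong

history = []  # one dict of metrics per epoch, ready to print or pass to a logger
epoch_accuracy = accuracy(outputs, labels)
history.append({"epoch": 1, "accuracy": epoch_accuracy})
print(f"epoch 1: accuracy {epoch_accuracy:.2f}")  # 3 of 4 correct -> 0.75
```

Keeping the metrics in a plain list of dictionaries makes it easy to print them now and hand them to a dedicated logging tool later.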
Lastly, note that we often write a very similar loop for validation!
In our example here we’ve left out any fancy code to track/log metrics (which in reality you should do), but instead focused on the fact that in validation we don’t adjust the model in any way. For more background on validation, check out the previous article.
model.eval()  # puts layers such as dropout and batch norm in evaluation mode
with torch.no_grad():  # no gradients are needed because the model is not updated
    for inputs, labels in validation_loader:
        # Forward pass only: no backward pass or optimiser step
        outputs = model(inputs)
        loss = loss_fn(outputs, labels)
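The point that validation leaves the model untouched can be checked directly. The sketch below wraps the loop in a hypothetical `evaluate` helper and compares the weights before and after; the stand-in model, the random validation data, and their shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)


def evaluate(model, loader, loss_fn):
    """Return the mean loss over a loader without changing the model."""
    model.eval()  # evaluation behaviour for layers such as dropout
    total_loss, total_samples = 0.0, 0
    with torch.no_grad():  # no gradients are tracked inside this block
        for inputs, labels in loader:
            outputs = model(inputs)
            loss = loss_fn(outputs, labels)
            total_loss += loss.item() * inputs.size(0)
            total_samples += inputs.size(0)
    return total_loss / total_samples


# Stand-in model and validation data (hypothetical shapes)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 3))
validation_loader = DataLoader(
    TensorDataset(torch.randn(20, 4), torch.randint(0, 3, (20,))), batch_size=8
)

weights_before = [p.detach().clone() for p in model.parameters()]
validation_loss = evaluate(model, validation_loader, nn.CrossEntropyLoss())
weights_unchanged = all(
    torch.equal(before, after.detach())
    for before, after in zip(weights_before, model.parameters())
)
print(f"validation loss {validation_loss:.4f}, weights unchanged: {weights_unchanged}")
```

Remember to call `model.train()` again before the next training epoch, since `model.eval()` stays in effect until it is switched back.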
Note that it is absolutely normal not to understand *everything* mentioned above. We’ve only just learnt the basics, and how everything ties together can be difficult to grasp the first time (let alone how to push it further with more custom code). If so, remember that practice makes perfect!
Try out all the code above. Once you feel slightly familiar with it, go through it all once again, this time tweaking whatever you’re intrigued or, more likely, bamboozled by. It’s a lot to take in, but using PyTorch only gets easier from here!