The Dog Pooping in My Yard Transformer Detectron PyTorch Tutorial
Part 1
The Problem
The problem is that some owners do not pick up after their dogs. This is not a problem isolated to where I live, in the Bay Area, but one found the world over. To combat it, we are going to develop the Dog Pooping in My Yard Transformer Detectron.
This is a 3-part tutorial wherein:
- Part 1 gets something working using Transformers via the timm library
- Part 2 unpacks what is going on under the hood with a code-first approach
- Part 3 puts this into production so we can catch dogs pooping in my yard in the wild
Here is a link to Google Colab and Github.
This tutorial assumes a working knowledge of PyTorch; the series will focus on learning Transformers.
The Data
Fortunately for us, we can simply `wget` and `unzip` some images I scraped. The zip has two folders, one called `poop` and one called `no-poop`. There are 19 pooping images and 17 not-pooping. We'll save 4 for validation, which leaves us 32 images for training.
!wget https://github.com/matthewchung74/blogs/raw/main/data.zip
!unzip -qq -n data.zip -d data
Now you may be wondering: how can we train on just 32 images? If we need more images, I'm in trouble, since I'm tired of downloading pictures of dogs pooping. Well, lucky for us, the answer is below; just look for pretraining in the TIMM section.
Let’s take a look at the data. We’re going to create a dataset, split, do some transforms, and print out a batch.
# skipping a bunch for brevity; refer to the Colab above for the complete source
from PIL import Image
from torch.utils.data import Dataset

class PoopDataset(Dataset):
    def __init__(self, file_list, labels, transform=None):
        self.file_list = file_list
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.file_list)

    def __getitem__(self, idx):
        img_path = self.file_list[idx]
        # convert to RGB so grayscale or RGBA images don't break the transforms
        img = Image.open(img_path).convert("RGB")
        # guard against transform=None, which the default allows
        if self.transform is not None:
            img = self.transform(img)
        label = self.labels[idx]
        return img, label
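The split, transforms, and data loaders are part of what was skipped for brevity. To keep the walkthrough self-contained, here is a minimal sketch of what that setup could look like; the glob paths, batch size, and stratified split are my assumptions, so check the Colab for the actual code.

import glob
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader
from torchvision import transforms

# label 1 = poop, 0 = no poop; folder layout matches the unzipped data above
files = sorted(glob.glob("data/poop/*") + glob.glob("data/no-poop/*"))
labels = [0 if "no-poop" in f else 1 for f in files]

# hold out 4 images for validation, matching the counts above
train_files, valid_files, train_labels, valid_labels = train_test_split(
    files, labels, test_size=4, stratify=labels, random_state=42)

# vit_base_patch16_224 expects 224x224 inputs
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_data = PoopDataset(train_files, train_labels, transform=train_transforms)
valid_data = PoopDataset(valid_files, valid_labels, transform=train_transforms)
train_loader = DataLoader(train_data, batch_size=4, shuffle=True)
valid_loader = DataLoader(valid_data, batch_size=4, shuffle=False)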
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
images, labels = next(iter(train_loader))
for idx, ax in enumerate(axes.ravel()):
    img = transforms.ToPILImage()(images[idx])
    title = "poop" if labels[idx] == 1 else "no poop"
    ax.set_title(title)
    ax.imshow(img)
Aren’t Transformers for NLP?
Nope. The paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale changed all that. We will get more into the details in future parts. This tutorial is all about quickly getting a pooping-dog detector working, an approach you can then apply to your own problem, such as cats pooping in your yard!
What is TIMM?
timm is the open-source library we're going to use to get up and running. If you have time, check out Ross Wightman's repo; it is amazing. In a nutshell, it is a library of SOTA architectures with pre-trained weights.
- First, let's `pip` install it.
- Then let's list all the models that have pre-trained weights, using the method `list_models` and matching the wildcard `vit`, which stands for Vision Transformer. We'll pick the one we're interested in: `vit_base_patch16_224`.
- Let's look at the name of the model for a second. `patch16` means the model works on 16x16-pixel patches: to save on resources, the algorithm divides the image into 16x16 squares and processes them separately, which for a 224x224 input works out to (224/16)^2 = 196 patches. We'll dive more into this later. `224` is the size of the image we're training on; if you look at the GitHub notebook, you'll see the images are resized to 224.
- Let's print out a summary.
!pip install timm

import timm
from pprint import pprint

model_names = timm.list_models('*vit*', pretrained=True)
pprint(model_names)

model = timm.create_model('vit_base_patch16_224', pretrained=True)
model = model.to(device)
pprint(model)

output:
(head): Linear(in_features=768, out_features=1000, bias=True)
Now, this summary above is large, and we'll cover what it all means later. What is important now is the last line, `(head)`. This is the final layer, which classifies into ImageNet's 1000 classes. We only have 2: pooping dog or not. So let's change that.
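One simple way to swap the head, sketched below: timm's `create_model` accepts a `num_classes` argument and will build the model with a fresh 2-way `Linear` classifier in place of the 1000-way one (the Colab may wire this up differently).

# rebuild the model with a 2-class head: poop vs. no-poop
model = timm.create_model('vit_base_patch16_224', pretrained=True, num_classes=2)
model = model.to(device)

# the head is now Linear(in_features=768, out_features=2, bias=True)
print(model.head)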
Standard training stuff
Now we're going to set up our `criterion`, `optimizer`, and `scheduler` like any other PyTorch project; we're just swapping one model out for another. Then we run a standard training loop.
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR
from tqdm import tqdm

# lr, gamma, epochs, and device come from the setup code skipped above
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = StepLR(optimizer, step_size=1, gamma=gamma)

for epoch in range(epochs):
    model.train()  # enable dropout for training
    epoch_loss = 0
    epoch_accuracy = 0
    for data, label in tqdm(train_loader):
        data = data.to(device)
        label = label.to(dtype=torch.long, device=device)

        output = model(data)
        loss = criterion(output, label)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        acc = (output.argmax(dim=1) == label).float().mean()
        epoch_accuracy += acc.item() / len(train_loader)
        epoch_loss += loss.item() / len(train_loader)

    model.eval()  # disable dropout for validation
    with torch.no_grad():
        epoch_val_accuracy = 0
        epoch_val_loss = 0
        for data, label in valid_loader:
            data = data.to(device)
            label = label.to(dtype=torch.long, device=device)

            val_output = model(data)
            val_loss = criterion(val_output, label)

            acc = (val_output.argmax(dim=1) == label).float().mean()
            epoch_val_accuracy += acc.item() / len(valid_loader)
            epoch_val_loss += val_loss.item() / len(valid_loader)

    scheduler.step()  # decay the learning rate once per epoch
    print(
        f"Epoch {epoch + 1} - loss: {epoch_loss:.4f} - acc: {epoch_accuracy:.4f} "
        f"- val_loss: {epoch_val_loss:.4f} - val_acc: {epoch_val_accuracy:.4f}"
    )
The results
You'll see that after 3 epochs, we're already getting pretty good accuracy.
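If you want to sanity-check the trained model on a single image, a minimal inference sketch looks like this; the image path is a placeholder, and `train_transforms` is the transform pipeline from the dataset sketch above.

from PIL import Image
import torch

model.eval()
# placeholder path; point this at any image you want to classify
img = Image.open("data/poop/some_image.jpg").convert("RGB")
batch = train_transforms(img).unsqueeze(0).to(device)  # shape: [1, 3, 224, 224]

with torch.no_grad():
    pred = model(batch).argmax(dim=1).item()

print("poop" if pred == 1 else "no poop")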
Check out part 2 to see how this works under the hood with a code-first approach.