CNN-based classification applied to concrete bridge crack images

Hajar Zoubir
6 min read · Apr 24, 2022

Photo by Amanda Forrest on Unsplash

Cracks in bridges

Cracks in reinforced concrete bridges occur due to several deterioration mechanisms related to mechanical (e.g. loading conditions), physical (e.g. drying shrinkage), chemical (e.g. alkali-aggregate reactions), and thermal (e.g. temperature gradients) factors. Their presence is hazardous to the bridge's durability and sometimes can be a sign of structural deficiency.

Identifying, localizing, and quantifying this type of damage is paramount to assess the condition of structural elements (e.g. girders and piers) and understand the possible effects of cracks on the bridge's structural reliability. To this end, bridge elements are regularly examined by trained inspectors who record the extent and severity of existing defects based on established standards and guidelines.

Bridge inspection is sometimes conducted using Unmanned Aerial Vehicles (UAVs) to access hard-to-reach areas of the bridge. However, this practice generates a considerable amount of image data from mounted sensors (e.g. optical and thermal cameras), so processing and analyzing these images must be automated to efficiently detect cracks and other defects and evaluate the condition of the inspected bridge.

Crack detection automation

Several techniques have been applied to automate crack detection in concrete images, ranging from traditional Image Processing Techniques (e.g. edge detectors) to Deep Learning models (e.g. Convolutional Neural Networks, a.k.a. CNNs). I will write more about these techniques and compare their performance in the crack detection task in future posts.

As features (i.e. representative properties that describe an object in an image) are automatically learned from a set of training data in a CNN-based learning framework, these deep networks have become the solution of choice among researchers for crack detection in concrete images.

Generally, a CNN architecture consists of convolutional layers that extract features from a set of labeled data (e.g. cracked and uncracked images), pooling layers that downsample the feature maps (which reduces the computation and the number of parameters needed in later layers), and fully connected layers that map the flattened features to the Softmax layer, where target class probabilities are computed. Activation functions (e.g. the Rectified Linear Unit) introduce non-linearity into the model. These networks learn by optimizing, through backpropagation, a loss function (e.g. Binary Cross-Entropy loss) that measures the error between the predicted outputs and the ground truth targets.
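To make the last two steps more concrete, here is a minimal sketch of how a Softmax layer turns two raw scores into class probabilities and how a cross-entropy loss penalizes a wrong prediction. The scores and the class order (index 0 = cracked, index 1 = uncracked) are made up for illustration and are not taken from the trained model.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[target_index])

# Hypothetical scores for one image: index 0 = cracked, index 1 = uncracked
logits = [2.0, 0.5]
probs = softmax(logits)

loss_if_cracked = cross_entropy(probs, 0)    # true label matches the higher score
loss_if_uncracked = cross_entropy(probs, 1)  # true label contradicts it, so the loss is larger
print(probs, loss_if_cracked, loss_if_uncracked)
```

The loss is small when the network puts most of the probability on the true class and grows quickly when it does not, which is what backpropagation then tries to correct.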

Example of the VGG16 model architecture

Crack image classification using a CNN

The rest of this post focuses on the application of a CNN for crack image classification.

So let’s dive into some code!

First of all, the images I am using are extracted from this dataset where we shared more than 6900 cracked and uncracked images of concrete bridges following our participation in the 6th International Conference of Engineering Against Failure.

I am building a dataset of 1304 cracked and 1806 uncracked images, which is randomly split into training (70%), validation (10%), and testing (20%) sets.
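For reference, a random 70/10/20 split can be sketched as follows. This is only an illustration under assumptions: the file names are placeholders, the function name `split_dataset` and the fixed seed are hypothetical, and the split is not stratified by class.

```python
import random

def split_dataset(items, train_pct=70, val_pct=10, seed=0):
    """Shuffle items and split them into train/val/test lists by percentage."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains (about 20%)
    return train, val, test

# Placeholder file names standing in for the 1304 cracked and 1806 uncracked images
images = [f"cracked_{i}.jpg" for i in range(1304)] + \
         [f"uncracked_{i}.jpg" for i in range(1806)]

train, val, test = split_dataset(images)
print(len(train), len(val), len(test))
```

With 3110 images in total, this gives 2177 training, 311 validation, and 622 testing images.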

For model implementation, I am using Google Colaboratory with the 12GB NVIDIA Tesla K80 GPU provided by the platform, PyTorch, and the necessary libraries for data preprocessing, model training, and result visualization.

from torchvision import datasets, models, transforms
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import time
import os
import copy  # used by train_model (copy.deepcopy)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

Since my dataset is small, I am using a pre-trained VGG16 model with ImageNet weights and I am adjusting the number of outputs of the last layer to two (since we have two target classes). This is one of the most common Transfer Learning techniques applied to transfer the knowledge of a source domain (large datasets like ImageNet) to a specific domain (small-scale datasets like our concrete crack dataset).

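# Load VGG16 with ImageNet weights and replace the last classifier layer with a two-class output
model = models.vgg16(pretrained=True)
num_ftrs = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_ftrs, 2)
model = model.to(device)  # keep the model on the same device as the data batches
input_size = 224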

I am also applying some data augmentation techniques to avoid overfitting.

# data_dir (dataset root with train/val/test subfolders) and batch_size must be defined beforehand
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomVerticalFlip(),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(45),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val']}
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                                   shuffle=True, num_workers=4)
                    for x in ['train', 'val']}

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# The test set is only converted to tensors and normalized (no augmentation)
data_transforms_test = {
    'test': transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
}
dataset_test = {'test': datasets.ImageFolder(os.path.join(data_dir, 'test'), data_transforms_test['test'])}
dataloader_test = {'test': torch.utils.data.DataLoader(dataset_test['test'], batch_size=1,
                                                       shuffle=False, num_workers=4)}
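As a side note on the Normalize step used in every pipeline above: it subtracts a per-channel mean and divides by a per-channel standard deviation (here the usual ImageNet statistics), after ToTensor has scaled pixel values to the [0, 1] range. A small sketch of that arithmetic on a single pixel, with a hypothetical helper name:

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std per channel to one RGB pixel scaled to [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

mid_gray = normalize_pixel([0.5, 0.5, 0.5])  # a mid-gray pixel
at_mean = normalize_pixel(IMAGENET_MEAN)     # a pixel equal to the mean maps to zero
print(mid_gray, at_mean)
```

Using the same statistics as the pre-trained weights keeps the inputs in the range the VGG16 layers were originally trained on.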

I am keeping all the learnable parameters frozen and only retraining the last convolutional layer (features[28]) and the fully connected layers classifier[0] and classifier[3], which are more sensitive to the target dataset.

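# Freeze every parameter first
for param in model.parameters():
    param.requires_grad = False

# Then unfreeze the layers to retrain
for param in model.classifier[3].parameters():
    param.requires_grad = True
for param in model.classifier[0].parameters():
    param.requires_grad = True
for param in model.features[28].parameters():
    param.requires_grad = True

# Note: with the code ordered as in this post, the new classifier[6] layer is frozen by the first loop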
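A quick way to check which parts of a network will actually be updated is to count the parameters that still require gradients. The sketch below uses a tiny stand-in network (not VGG16) so it runs instantly; the helper name `count_parameters` is hypothetical.

```python
import torch.nn as nn

def count_parameters(model):
    """Return (trainable, total) parameter counts for a model."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total

# Tiny stand-in network (not VGG16), used only to demonstrate the pattern
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 2),
)

for param in toy.parameters():
    param.requires_grad = False   # freeze everything
for param in toy[3].parameters():
    param.requires_grad = True    # unfreeze only the final linear layer

trainable, total = count_parameters(toy)
print(f"trainable: {trainable} / {total}")
```

Calling `count_parameters(model)` on the VGG16 model after the freezing step should show in the same way how small the retrained fraction is.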

Defining the loss function and optimizer (I am using the Stochastic Gradient Descent optimizer)

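# Only the parameters left trainable above are passed to the optimizer
params_to_update = [p for p in model.parameters() if p.requires_grad]
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(params_to_update, lr=0.001, momentum=0.9)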

Defining training and testing functions

# Training function
def train_model(model, dataloaders, criterion, optimizer, num_epochs):
    since = time.time()
    val_acc_history = []
    train_acc_history = []
    val_loss_history = []
    train_loss_history = []
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training phase and a validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
            else:
                model.eval()

            running_loss = 0.0
            running_corrects = 0

            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                optimizer.zero_grad()

                # Track gradients only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)

                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # Keep a copy of the weights with the best validation accuracy
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)
                val_loss_history.append(epoch_loss)
            if phase == 'train':
                train_acc_history.append(epoch_acc)
                train_loss_history.append(epoch_loss)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    model.load_state_dict(best_model_wts)
    return model, (train_acc_history, val_acc_history, train_loss_history, val_loss_history)

# Testing function
def check_accuracy(loader, model):
    num_correct = 0
    num_samples = 0
    model.eval()

    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device)
            y = y.to(device=device)
            scores = model(x)
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)

    print(f'Got {num_correct} / {num_samples} with accuracy {float(num_correct)/float(num_samples)*100:.2f}')
    model.train()

Training the model for 20 epochs

model, hist = train_model(model, dataloaders_dict, criterion, optimizer, num_epochs=20)
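The returned `hist` tuple holds the per-epoch training and validation accuracies and losses. One possible way to inspect it is sketched below: the helper `summarize_history` is hypothetical, and the numbers in `example_hist` are made up purely to show the call, not results from this model.

```python
def summarize_history(hist):
    """Return the best validation epoch and its metrics from train_model's history tuple."""
    train_acc, val_acc, train_loss, val_loss = hist
    # float() also works on the 0-dim tensors that train_model stores for accuracies
    train_acc = [float(a) for a in train_acc]
    val_acc = [float(a) for a in val_acc]
    best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best_epoch,
        "val_acc": val_acc[best_epoch],
        "train_acc": train_acc[best_epoch],
        "val_loss": float(val_loss[best_epoch]),
        # a large positive gap between training and validation accuracy hints at overfitting
        "gap": train_acc[best_epoch] - val_acc[best_epoch],
    }

# Made-up numbers for three epochs, only to show the call
example_hist = (
    [0.80, 0.90, 0.95],  # training accuracy
    [0.78, 0.91, 0.89],  # validation accuracy
    [0.50, 0.30, 0.20],  # training loss
    [0.55, 0.28, 0.33],  # validation loss
)
summary = summarize_history(example_hist)
print(summary)
```

The same four lists can also be passed to matplotlib to plot the learning curves.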

And testing the model


check_accuracy(dataloader_test['test'], model)

After testing the trained model on the test set, I got a 96.15% classification accuracy.

I then visualized the prediction results of sample crack images:

Finally, to check if the model has learned the right features for classification, I used the GradCAM interpretation technique to visualize the regions of images that contributed to the classification result:
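For readers curious about how this kind of visualization works, here is a minimal Grad-CAM sketch. It is not the exact code used for the figures above: the function name `grad_cam`, the toy network, and the choice of target layer are assumptions for illustration. The idea is to weight each feature map of a chosen layer by the average gradient of the class score with respect to that map, sum the weighted maps, and keep only the positive part.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx=None):
    """Return a Grad-CAM heatmap (H x W, values in [0, 1]) for one image tensor (C x H x W)."""
    stored = {}
    handle = target_layer.register_forward_hook(
        lambda module, inputs, output: stored.update(acts=output))
    try:
        model.eval()
        # requires_grad on the input keeps the graph alive even if earlier layers are frozen
        batch = image.detach().clone().unsqueeze(0).requires_grad_(True)
        scores = model(batch)                          # shape (1, num_classes)
    finally:
        handle.remove()

    if class_idx is None:
        class_idx = int(scores.argmax(dim=1))          # explain the predicted class

    acts = stored["acts"]                              # feature maps, shape (1, C, h, w)
    grads = torch.autograd.grad(scores[0, class_idx], acts)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)     # one weight per feature map
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True)).detach()
    cam = F.interpolate(cam, size=tuple(image.shape[-2:]),
                        mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    return (cam / (cam.max() + 1e-8))[0, 0]

# Toy network and random image, used only to show the call (not the trained VGG16)
torch.manual_seed(0)
toy = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 2),
)
heatmap = grad_cam(toy, toy[1], torch.rand(3, 32, 32))
print(heatmap.shape)
```

The resulting heatmap has the same height and width as the input image and can be overlaid on it; bright regions are those that pushed the score of the chosen class up the most.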

Cheers :)

Hajar Zoubir

Senior Civil Engineer and Bridge Inspector with a PhD. Blending technical expertise with a passion for cutting-edge technology, particularly in Machine Learning