Build Exceptional Deep Learning Models Faster by Using Detailed Visuals of Neural Networks

Case studies on new visualization software to identify problems sooner and reduce guesswork when auditing datasets and the architecture of complex models.

Mina Amiri
Zetane
12 min read · Jan 18, 2021


Display of feature maps and a tensor with its associated histogram for a convolutional layer of the U-Net image segmentation model, shown in the Zetane Engine.

You train your artificial neural network for a few hours, watching as tens of epochs pass and the loss decreases beautifully. But when it comes time to test your model, it fails! How unexpected. You note that the error — whether it be a detection, classification, or segmentation error — is high and the precision and accuracy are low. You wonder what the problem is and how to solve it. If this situation sounds familiar, you are not alone. Here we present a collection of case studies on this topic, focusing on how we overcame lackluster results by using new tools to inspect the internal metrics and architecture of neural networks and related datasets.

Our discussion here will focus on segmentation tasks for biomedical images, where we employ a suite of software tools in the Zetane Engine to inspect the internal components of complex neural networks. For the cases involving the U-Net model and CT scans of lungs, you can conduct the same visual audits shown here using the free Zetane Viewer for visualizing machine learning models, along with the trained U-Net model; both are available at our Gallery of models. We at Zetane Systems designed the Engine to help us gain a better understanding of proverbial black-box algorithms when building AI solutions for industrial processes. One aim of these projects was to assess how we can streamline common tasks in machine learning by analysing visuals of items such as feature maps and tensor histograms. Prior to this project, we were unsure of the benefits these visual techniques could provide towards optimizing deep learning models. We were pleasantly surprised by the utility and insights gained from this novel approach, so we aim to raise greater awareness among machine learning professionals about our observations. These case studies also aim to ignite a hearty discussion on how experts in the data science and machine learning communities can further strengthen the techniques outlined here and how best to employ them in daily workflows.

Before diving into our exposé, we note that anyone can conduct the detailed visual assessments shown here by employing a diversity of software tools and skillful Python coding. We chose to use the Zetane Engine for practical reasons since this software contains standard features to assess the training process and test-time performance, as well as the design, implementation, and elements to debug in model architecture — all in one digital workspace. To familiarize yourself with the workspace and its display of neural networks, consider viewing this short tutorial video.

Tutorial video about inspecting tensors and the internal metrics of neural networks using the Zetane Engine.

Part 1) Searching for weaknesses in a dataset

We all know that verifying the quality of our image dataset is one of the most important steps when developing a deep learning model. It is crucial that we feed the correct data to the network. Basic data-related questions to ask yourself when developing a new model include:

  • Are the shape of the input data and related tensors set correctly?
  • Are the height and width of images adjusted right?
  • Did we compromise the data when resizing an image with common image-processing libraries?
  • Did we normalize the data correctly?
  • Should the intensity values of the input image be in the range [0, 1], [-1, 1] or [0, 255]?
  • Did the pre-processing step work as it should?
  • Should we expect a specific data distribution for the inputs?
  • Does the data-generator work correctly? Does the implemented data augmentation work fine?
Figure 1) An example of a visualization of a 4D tensor in the Zetane Engine with additional statistical measures from a previous project unrelated to this case study.

We made the above data-related assessments using the Zetane Engine. This included inspecting the data distribution and statistical measures (such as min., max., mean and standard deviation) with a few clicks of a mouse (see figure one for an example of the displayed statistics from an unrelated project). Our focus then turned to visuals of 3D, 4D and 5D data to look for any potential abnormalities.
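Outside the Engine, the same first-pass checks can also be scripted. The sketch below is a minimal NumPy helper of our own devising (not part of any particular toolkit); it gathers the shape, range, and distribution statistics mentioned above and flags data that still looks like raw [0, 255] pixels:

```python
import numpy as np

def tensor_report(x):
    """Quick sanity checks from the checklist above: shape, range, and
    distribution statistics of a batch tensor, returned as a dict."""
    x = np.asarray(x, dtype=np.float64)
    return {
        "shape": x.shape,
        "min": float(x.min()),
        "max": float(x.max()),
        "mean": float(x.mean()),
        "std": float(x.std()),
        # heuristic flag: values above 1 suggest raw [0, 255] pixels
        # slipped past the normalization step
        "maybe_unnormalized": bool(x.max() > 1.0),
    }
```

Logging such a report for the first batch out of the data generator catches many of the questions above before a single epoch is wasted.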

To set the foundation for the main project involving segmenting CT scans of lung tissue to diagnose COVID-19 morbidities, we began by testing our capabilities on a simpler dataset comprised of x-ray images of lungs; we aimed to train a U-Net model [1] to segment typical lung tissue from pulmonary pathologies. This segmentation task proved to be quite easy: we obtained a Dice score of 0.97, the best result for the analysed dataset. For readers unfamiliar with it, the Dice score is a common metric to evaluate segmentation tasks, defined as 2|A∩B| / (|A| + |B|), where A and B are the ground truth and predicted masks, respectively. A Dice score of 1 represents complete alignment of the masks, and 0 signifies no overlap between them. Having obtained state-of-the-art results with this dataset, we assumed the performance of our model to be top-notch. Further scrutiny showed that our assumption was wrong.
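For reference, the Dice score is straightforward to compute for binary masks; a minimal NumPy sketch (the empty-mask convention is our own choice):

```python
import numpy as np

def dice_score(truth, pred):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    truth, pred = np.asarray(truth, bool), np.asarray(pred, bool)
    denom = truth.sum() + pred.sum()
    if denom == 0:
        # both masks empty: treat as perfect agreement by convention
        return 1.0
    return 2.0 * np.logical_and(truth, pred).sum() / denom
```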

A quick way to identify bias in our dataset

To conduct routine assessments of our project, we scrutinized our model in the Zetane Engine. From the first convolution layer to the last, we saw that feature maps contained an unusual and strong activation at the top-left corner of an exemplary image in the dataset. We realized the activation coincided with a white tag added to the x-rays to hide confidential patient information (Fig. 2). Probing further, we identified additional images that contained similar tags; these too produced the same strong activation in our model. Since tags should be irrelevant to segmenting lung images based on pathologies, our observations suggested that the initial model contained a notable flaw.

Figure 2) Left: an exemplary image in the dataset with a prominent tag in the top-left corner. Right: Feature maps show an unusual and strong activation in the top-left corner, being the location of a tag used to anonymise the images of our dataset.

Armed with this insight, we cleaned the dataset by removing the tags with inpainting techniques and retrained the model on the improved images. We hypothesized that removing the tags would help the model refocus on the relevant textures and patterns within the x-rays. On average, this optimization did indeed improve the Dice score by 1%. For a handful of images, the old model had identified more than two objects; the new model returned better results in terms of the number of detected objects. A typical result appears in figure three. Note the richer, less skewed and broader histogram for a given feature map in the new model.
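As an illustration of the cleaning step, a toy stand-in for the inpainting we used might simply locate the bright tag and fill its bounding box with the background median; a production pipeline would more likely use a dedicated routine such as OpenCV's cv2.inpaint. The function name and thresholds below are hypothetical:

```python
import numpy as np

def remove_tag(image, threshold=250, pad=2):
    """Naive tag removal for a grayscale x-ray: find the near-white tag
    region and fill its padded bounding box with the median background
    intensity. A toy stand-in for a real inpainting routine."""
    img = image.copy()
    mask = img >= threshold          # pixels bright enough to be the tag
    if not mask.any():
        return img
    rows, cols = np.where(mask)
    r0 = max(rows.min() - pad, 0)
    r1 = min(rows.max() + pad + 1, img.shape[0])
    c0 = max(cols.min() - pad, 0)
    c1 = min(cols.max() + pad + 1, img.shape[1])
    img[r0:r1, c0:c1] = np.median(img[~mask])   # fill with background level
    return img
```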

Figure 3) The predicted mask. Left: using the unprocessed image dataset; the strong activation from the white tag in the top-left corner is evident in the feature map. Right: using the processed dataset without the labels, feature maps focus attention towards objects in the lungs and produce a less skewed and broader histogram.

Part 2) Why does my data look weird?

Moving on to the second part of our project, our efforts now aimed to develop a model to automate the identification of CT scans of lungs that contain lesions due to COVID-19. The dataset consisted of 3D CT images in the Nifti format (this file format is typical for medical images). Using libraries designed for the processing and visualizing of medical images, we reviewed the data, did typical pre-processing tasks (cropping and normalization), and implemented the EfficientNet [2] model to conduct the segmentation. To our dismay, the resultant model performed poorly. What went wrong?

Further inspection of the model in the Engine indicated once again that the problem was due to our input data; it looked very strange, to say the least (Fig. 4). This was particularly surprising since displaying the data in Python with simple plotting techniques showed nothing unusual. That’s quite mysterious!

Further investigation uncovered that we had overlooked the header of the Nifti images, which contains important information about the voxel size, the transformation, the coordinate system, and other factors; ignoring it compromised the pre-processing procedure. We could see the correct images when plotting in Python because the header information was taken into account during plotting. Converting the images into arrays was another story: that important header information was lost in the process, causing an incorrect display of the images. Converting the data to standard NumPy arrays with the header information applied first, then repeating the pre-processing step, solved the problem and saved us a lot of time. Eureka!
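To make the lesson concrete, here is a hedged sketch of header-aware loading using the nibabel library (one common choice for NIfTI files; the helper names and the Hounsfield window values below are our own assumptions, not the project's exact code):

```python
import numpy as np

def hu_window(volume, lo=-1000.0, hi=400.0):
    """Clip CT intensities to a Hounsfield window and scale to [0, 1].
    The window bounds are illustrative defaults for lung tissue."""
    v = np.clip(np.asarray(volume, dtype=np.float64), lo, hi)
    return (v - lo) / (hi - lo)

def load_canonical(path):
    """Load a NIfTI volume with its header applied (hypothetical helper).

    nibabel is imported lazily so the windowing helper above stays
    usable without it installed.
    """
    import nibabel as nib
    img = nib.as_closest_canonical(nib.load(path))  # reorient via the affine
    data = img.get_fdata(dtype=np.float32)          # header scaling applied
    zooms = img.header.get_zooms()                  # voxel size in mm
    return data, zooms
```

The key point is to reorient and scale via the header before handing a bare NumPy array to the rest of the pipeline.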

Figure 4) Left: Display of the initial input data, which shows nonsense results. Right: The input data after correcting the faulty pre-processing task; here the data appears as expected.

Part 3) A better method to assess the architecture of neural networks

When designing a new model, it is crucial to verify that your conceptual plan of the model’s architecture matches the actual model you train; this is particularly important when implementing a deep neural network with many skip connections and residual blocks. During the design process, copying and pasting a previous layer or block of layers is a great way to save time; however, forgetting to alter and fine-tune all the necessary parameters of the copied layers, especially the input, happens all the time. Deep learning frameworks offer one way to confirm that a model’s architecture is correct; in Keras, for example, we can use the model.summary() method to print the structure of the model. Even so, such printouts remain challenging to comprehend, and it is hard to assess the structure of a neural network from a text-based depiction. Another common approach to validate neural network architecture is to draw it manually using simple graphic design tools or even pen and paper. Both the drawings and the printouts are too prone to error.

We wanted to move away from these error-prone methods and decided to try the Zetane Engine. The ability to display complex neural networks with detailed metrics using that software should provide novel means to visually examine a model and inspect its architecture and layer connections. Indeed, this was the case. Detailed visuals make it simpler to answer questions like:

  • Are all layers connected correctly?
  • Have we used the correct activation functions?
  • Have we forgotten to connect a specific layer to the rest of the neural network?
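The last question in particular lends itself to a mechanical check. As a toy illustration, if we represent the network as a mapping from each layer to its downstream layers (a hypothetical simplification of what frameworks actually expose), a simple graph walk finds layers that are never reached from the inputs:

```python
def unreachable_layers(graph, inputs):
    """Return the set of layers never reached from the model inputs.

    graph: dict mapping layer name -> list of downstream layer names
    (a toy stand-in for a real framework's layer graph).
    """
    seen, stack = set(inputs), list(inputs)
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return set(graph) - seen
```

An empty result means every declared layer is wired into the forward pass; anything else is a likely copy-paste leftover.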

The following visuals are exemplary. Figure five displays the architecture of a U-Net neural network in the Zetane Engine; note the apparent U-shaped architecture. Upon zooming in for closer inspection, we are able to access the connections of each layer of the network. For instance, figure six shows a section of a ResNet model [3] in greater detail. Here we can easily inspect the connections between different layers, the activation functions, the batch normalization, the concatenation and more.

Figure 5) The original U-Net network presented in the Zetane Engine. The full length of the neural network extends beyond the field of view.
Figure 6) A section of a ResNet model appearing in the Zetane Engine.

Part 4) Dead neurons are easy to spot

An additional benefit of rich visuals is the convenience with which we can see all the kernels and feature maps for an entire network. For each individual layer, we can assess detailed metrics for its input, bias, kernels (filters) and outputs using the Engine. These features provide opportunities to investigate the model in depth and identify factors that inhibit a well-performing model. Such assessments include the ability to review the shape and histogram of every tensor in the model, providing new ways to visually investigate the features and gain insight into the low- and high-level features the network is learning. Moreover, we can analyse the histogram, range and distribution of all feature maps. Figure seven shows an example of a convolution layer with all 3x3 kernels, feature maps, and a collection of other statistical measurements of interest.

Figure 7) A convolutional layer with associated feature maps and filters, displayed in the Zetane Engine.

When our chosen neural network for the previous COVID-19 segmentation task did not perform as well as we had hoped, we decided to investigate our model further. Our model consisted of a feature-pyramid structure as the backbone with a few bi-directional feature layers as the decoder section. After auditing the inputs and outputs of internal convolutional layers, we realized that data was not passing through some branches of the network. Artificial neurons appeared “dead” in those branches (Fig. 8). Adding batch normalization to all layers, to ensure that each layer’s input is normalized and not skewed, offered a solution. Another option is to replace ReLU with a different activation function, such as ELU or leaky ReLU.
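Outside a visual tool, a rough numerical proxy for this symptom is the share of channels whose post-ReLU activations are zero everywhere across a batch. A small NumPy sketch (the tensor layout and tolerance are assumptions of ours):

```python
import numpy as np

def dead_fraction(activations, eps=1e-7):
    """Fraction of channels whose post-ReLU activations are (near-)zero
    for every sample and spatial position in the batch.

    activations: array of shape (batch, channels, height, width).
    """
    per_channel_max = np.abs(activations).max(axis=(0, 2, 3))
    return float((per_channel_max < eps).mean())
```

A value creeping towards 1.0 for a given layer is the numerical signature of the dead branches we saw visually.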

After adding batch-normalization, the dead neurons were active and working again. This quick and simple trick resulted in a considerable improvement in model performance. Identifying this problem of dead neurons would have otherwise been a difficult and a drawn-out process without these visuals.

Figure 8) Left: no signal is passing through a branch of the network, such that the neurons appear ‘dead’; right: adding batch normalization is effective in making the data pass through the ReLU activation.

The ability to investigate tensors and their associated histograms can thus facilitate error analysis in many ways. For one, you can inspect with ease different tensors when feeding the network with erroneous cases to recognize the possible causes of the error. With that known, it becomes easier to plan strategies to remedy the problem.

Part 5) A closer look at tensor metrics

Now consider the following project concerning the segmentation of ultrasound images of breast tumors as benign or malignant. We were able to segment the tumor classes with high accuracy for a subset of images. This was not always the case, where a significant set of ultrasound images proved too difficult to assign a correct segmentation. To expand our analysis of the problem, we once again reviewed the images from both tumor types in greater detail using the Engine.

The histogram of the tensor before the last layer (the sigmoid layer) showed a distinct pattern: images with a good segmentation result had a sharp drop in the histogram, while images with a weak segmentation had a histogram that was more spread out (Fig. 9). These observations guided us to adapt the thresholding based on the histogram of the output; for each image in the test set, we found the most similar histograms in the training set and calculated the optimum threshold. With the new thresholding approach, the average Dice score for images with weak segmentation (Dice score < 0.80) improved by 2.1%. Figure ten presents example images with good and weak segmentation.
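A minimal sketch of such an adaptive scheme (not the exact code we used; it assumes activations scaled to [0, 1], so the histogram range would need adjusting for raw logits):

```python
import numpy as np

def histogram_signature(tensor, bins=32):
    """Normalized histogram of an output tensor, used as a shape signature."""
    hist, _ = np.histogram(tensor, bins=bins, range=(0.0, 1.0), density=True)
    return hist

def adaptive_threshold(test_tensor, train_signatures, train_thresholds):
    """Pick the threshold of the training image whose histogram is closest
    (in L1 distance) to the test image's histogram."""
    sig = histogram_signature(test_tensor)
    dists = [np.abs(sig - s).sum() for s in train_signatures]
    return train_thresholds[int(np.argmin(dists))]
```

In words: sharp, bimodal histograms keep their usual threshold, while spread-out histograms inherit the threshold that worked best on similarly spread training images.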

Figure 9) Histograms of the feature maps generated in the last layer of the neural network. Top) typical histogram plots for images with good segmentation results. Bottom) exemplary histogram plots of images with poor segmentation results.
Figure 10) Exemplary images with poor segmentation results. From left to right: the ground truth; the output with fixed thresholding (Dice = 0.74); the output with the adaptive thresholding (Dice = 0.82).

Analysis of the histogram provided helpful insights towards identifying images with poor segmentation, which is important on its own because we can then identify images known to have bad results. If we are able to determine the types of images incompatible with our model, it will be simple to separate them and apply another more-performant model, or we can employ manual segmentation techniques by visual inspection. Overall, these tasks will reduce the amount of data that needs further investigation.

Visuals provide diverse insights and guidance to streamline tasks in deep learning

This series of short case studies demonstrates how visual inspections of data and models can provide a wealth of insights that we can leverage to improve a model’s performance, prune the model, or guide additional fine-tuning. In addition to the examples described here, there is a growing list of ways that rich visuals can help deep learning researchers and developers improve their models. Here is a non-exhaustive list of tasks where better visuals of datasets and neural networks can streamline work in practice:

  • Data preparation
  • Pre-processing optimization
  • Model structure optimization (e.g., adding attention gates, skip connections, etc.)
  • Layer activation analysis (e.g., identifying skewness)
  • Hyperparameter optimization
  • Error analysis
  • Feature visualization and attribution using explainability methods
  • Training-procedure inspection
  • Model pruning
  • Pre-trained model analysis

We will discuss further topics from the above list in subsequent articles from members of Zetane, so keep your eye out for future articles. In the meantime, we encourage you to try these optimization techniques with your own projects; you can start a free trial of the Zetane Engine today to access detailed visuals of your data and models. We would love to see your results and encourage you to send them to info@zetane.com or post summaries of your work in the Comments section below.

Works cited

[1] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation.” International Conference on Medical image computing and computer-assisted intervention. Springer, Cham. (2015).

[2] Tan, Mingxing, and Quoc V. Le. “EfficientNet: Rethinking model scaling for convolutional neural networks.” arXiv preprint arXiv:1905.11946 (2019).

[3] He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. (2016).
