Drawing Deep Neural Network

Visualizing Deep Neural Network Using Matplotlib

Oliver Lövström

Published in Internet of Technology

5 min read · Mar 1, 2024

This is the first part of the Data Science Portfolio that I’m doing for March.

This image was created with the assistance of DALL·E

In this first part, I show a way to draw deep neural networks using Matplotlib.

From the YOLOv8-n model, we parse the following architecture:

[{'type': 'Conv', 'conv': {'in_channels': 3, 'out_channels': 16, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 16, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, {'type': 'Conv', 'conv': {'in_channels': 16, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, {'type': 'C2f', 'cv1': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 48, 'out_channels': 32, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 16, 'out_channels': 16, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 16, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 16, 'out_channels': 16, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 16, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]}, {'type': 'Conv', 'conv': {'in_channels': 32, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, {'type': 'C2f', 'cv1': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 
'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 128, 'out_channels': 64, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}, {'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]}, {'type': 'Conv', 'conv': {'in_channels': 64, 'out_channels': 128, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, {'type': 'C2f', 'cv1': {'conv': {'in_channels': 128, 'out_channels': 128, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 256, 'out_channels': 128, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 
'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}, {'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]}, {'type': 'Conv', 'conv': {'in_channels': 128, 'out_channels': 256, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 256, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, {'type': 'C2f', 'cv1': {'conv': {'in_channels': 256, 'out_channels': 256, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 256, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 384, 'out_channels': 256, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 256, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': 
{'in_channels': 128, 'out_channels': 128, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 128, 'out_channels': 128, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]}, {'type': 'SPPF'}]
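The parsing code itself is not shown here. As a rough sketch, assuming the model's layers expose PyTorch-style attributes (`conv`, `bn`, `act` on a Conv block; `cv1`, `cv2`, `m` on a C2f block), a nested dictionary similar to the one above could be built by walking the layers. The helper names `describe_conv` and `describe_layer` are hypothetical, and this version omits the `bias` entry, so its output differs slightly from the dump above.

```python
CONV_KEYS = ("in_channels", "out_channels", "kernel_size", "stride", "padding")
BN_KEYS = ("num_features", "eps", "momentum", "affine", "track_running_stats")


def describe_conv(block):
    """Describe a block that holds .conv, .bn and .act submodules (assumed layout)."""
    return {
        "conv": {key: getattr(block.conv, key) for key in CONV_KEYS},
        "bn": {key: getattr(block.bn, key) for key in BN_KEYS},
        "act": type(block.act).__name__,
    }


def describe_layer(layer):
    """Return a nested dict for one outer layer, keyed by its class name."""
    kind = type(layer).__name__
    if kind == "Conv":
        return {"type": kind, **describe_conv(layer)}
    if kind == "C2f":
        return {
            "type": kind,
            "cv1": describe_conv(layer.cv1),
            "cv2": describe_conv(layer.cv2),
            "m": [
                {
                    "type": type(b).__name__,
                    "cv1": describe_conv(b.cv1),
                    "cv2": describe_conv(b.cv2),
                }
                for b in layer.m
            ],
        }
    # Anything else (for example SPPF) is recorded by type only.
    return {"type": kind}
```

With the Ultralytics package installed, something like `[describe_layer(l) for l in YOLO("yolov8n.pt").model.model[:10]]` should give the ten backbone layers, but treat that entry point as an assumption: it was not run against the real model for this sketch.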

Note: We will draw only the backbone of YOLOv8, which is why the list ends at the SPPF layer.

We draw each outer layer as a separate block:
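The plotting code is not included in this post, so here is a minimal sketch of how this step might look, assuming one Matplotlib `Rectangle` per outer layer. The function names, block sizes, and output file name are illustrative choices, not the original implementation.

```python
def block_positions(layers, width=1.0, gap=0.5):
    """Return (x_offset, label) for each outer layer, laid out left to right."""
    return [(i * (width + gap), layer["type"]) for i, layer in enumerate(layers)]


def draw_blocks(layers, width=1.0, height=3.0, gap=0.5, path="blocks.png"):
    """Draw one unfilled rectangle per outer layer and save the figure."""
    # Imported lazily so block_positions can be used without Matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.patches import Rectangle

    fig, ax = plt.subplots(figsize=(10, 3))
    for x, label in block_positions(layers, width, gap):
        ax.add_patch(Rectangle((x, 0), width, height, fill=False, edgecolor="black"))
        ax.text(x + width / 2, height / 2, label,
                ha="center", va="center", rotation=90)
    # Set the limits explicitly so every block is visible.
    ax.set_xlim(-gap, len(layers) * (width + gap))
    ax.set_ylim(-gap, height + gap)
    ax.set_aspect("equal")
    ax.axis("off")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)


# Stand-in for the parsed backbone: only the 'type' key is needed for this step.
layers = [{"type": t} for t in
          ["Conv", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "SPPF"]]
positions = block_positions(layers)
```

Calling `draw_blocks(layers)` then writes a row of ten labeled outlines to `blocks.png`.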

Let us add color:
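One simple way to do this, sketched under the assumption that each layer type gets its own fill color, is a lookup table passed to the rectangle's `facecolor`. The palette below is hypothetical; the colors used in the original figure are not known.

```python
# Hypothetical palette: one color per layer type.
PALETTE = {"Conv": "tab:blue", "C2f": "tab:orange", "SPPF": "tab:green"}


def layer_colors(layers, palette=PALETTE, default="lightgray"):
    """Return one color per layer, falling back to a default for unknown types."""
    return [palette.get(layer["type"], default) for layer in layers]


def draw_colored_blocks(layers, width=1.0, height=3.0, gap=0.5,
                        path="blocks_color.png"):
    """Draw one filled rectangle per outer layer, colored by layer type."""
    # Imported lazily so layer_colors can be used without Matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.patches import Rectangle

    fig, ax = plt.subplots(figsize=(10, 3))
    for i, color in enumerate(layer_colors(layers)):
        x = i * (width + gap)
        ax.add_patch(Rectangle((x, 0), width, height,
                               facecolor=color, edgecolor="black"))
    ax.set_xlim(-gap, len(layers) * (width + gap))
    ax.set_ylim(-gap, height + gap)
    ax.set_aspect("equal")
    ax.axis("off")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```

Keeping the type-to-color mapping in a separate function makes it easy to swap the palette without touching the drawing code.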

This doesn’t look very interesting yet, so let’s draw the shapes in 3D space:
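A possible way to get 3D blocks, assuming each layer becomes an axis-aligned box, is to build the six faces of the box by hand and add them to a 3D axes as a `Poly3DCollection`. The names `cuboid_faces` and `draw_blocks_3d`, the box size, and the single fill color are assumptions made for this sketch.

```python
def cuboid_faces(origin, size):
    """Return the six faces of an axis-aligned box, each as four (x, y, z) corners."""
    x, y, z = origin
    dx, dy, dz = size
    # Eight corners, indexed as c[i][j][k] with each index 0 (near) or 1 (far).
    c = [[[(x + i * dx, y + j * dy, z + k * dz) for k in (0, 1)]
          for j in (0, 1)] for i in (0, 1)]
    return [
        [c[0][0][0], c[0][1][0], c[0][1][1], c[0][0][1]],  # left   (x fixed, near)
        [c[1][0][0], c[1][1][0], c[1][1][1], c[1][0][1]],  # right  (x fixed, far)
        [c[0][0][0], c[1][0][0], c[1][0][1], c[0][0][1]],  # front  (y fixed, near)
        [c[0][1][0], c[1][1][0], c[1][1][1], c[0][1][1]],  # back   (y fixed, far)
        [c[0][0][0], c[1][0][0], c[1][1][0], c[0][1][0]],  # bottom (z fixed, near)
        [c[0][0][1], c[1][0][1], c[1][1][1], c[0][1][1]],  # top    (z fixed, far)
    ]


def draw_blocks_3d(layers, size=(1.0, 2.0, 2.0), gap=0.5, path="blocks_3d.png"):
    """Draw one box per outer layer along the x axis and save the figure."""
    # Imported lazily so cuboid_faces can be used without Matplotlib installed.
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d.art3d import Poly3DCollection

    fig = plt.figure(figsize=(10, 4))
    ax = fig.add_subplot(projection="3d")
    dx, dy, dz = size
    for i, _layer in enumerate(layers):
        faces = cuboid_faces((i * (dx + gap), 0.0, 0.0), size)
        ax.add_collection3d(Poly3DCollection(
            faces, facecolors="tab:blue", edgecolors="black", alpha=0.8))
    # Set the limits and box aspect by hand so the boxes keep their proportions.
    total = len(layers) * (dx + gap)
    ax.set_xlim(0, total)
    ax.set_ylim(0, dy)
    ax.set_zlim(0, dz)
    ax.set_box_aspect((total, dy, dz))
    ax.set_axis_off()
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```

Each face lists its corners in perimeter order, which is what `Poly3DCollection` needs to fill the polygon correctly.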

Now it looks better. Let’s change the dimensions of each block to match the output size and number of channels of its layer.
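To size the blocks, we first need the output shape of every layer. The sketch below assumes a 640×640 input image and applies the standard convolution output formula, floor((n + 2p − k) / s) + 1, to each Conv layer; C2f takes its channel count from `cv2`, and SPPF is assumed to leave the shape unchanged. The helpers `conv_layer` and `c2f_layer` are hypothetical stand-ins that keep only the keys needed from the parsed architecture, and the scale factor in `block_size` is arbitrary.

```python
def conv_layer(cin, cout):
    """Abbreviated Conv entry with the 3x3, stride-2, padding-1 settings from the dump."""
    return {"type": "Conv", "conv": {"in_channels": cin, "out_channels": cout,
                                     "kernel_size": (3, 3), "stride": (2, 2),
                                     "padding": (1, 1)}}


def c2f_layer(cout):
    """Abbreviated C2f entry: only the output channels of cv2 matter here."""
    return {"type": "C2f", "cv2": {"conv": {"out_channels": cout}}}


def output_shapes(layers, input_size=640, input_channels=3):
    """Return (channels, spatial_size) after each outer layer."""
    size, channels = input_size, input_channels
    shapes = []
    for layer in layers:
        if layer["type"] == "Conv":
            cfg = layer["conv"]
            k, s, p = cfg["kernel_size"][0], cfg["stride"][0], cfg["padding"][0]
            size = (size + 2 * p - k) // s + 1
            channels = cfg["out_channels"]
        elif layer["type"] == "C2f":
            channels = layer["cv2"]["conv"]["out_channels"]
        # Other layers (here SPPF) are assumed to keep the shape unchanged.
        shapes.append((channels, size))
    return shapes


def block_size(channels, size, scale=1 / 64):
    """Map a layer's shape to box dimensions: thickness from channels, faces from size."""
    return (channels * scale, size * scale, size * scale)


layers = [conv_layer(3, 16), conv_layer(16, 32), c2f_layer(32),
          conv_layer(32, 64), c2f_layer(64), conv_layer(64, 128), c2f_layer(128),
          conv_layer(128, 256), c2f_layer(256), {"type": "SPPF"}]
shapes = output_shapes(layers)
sizes = [block_size(c, n) for c, n in shapes]
```

Under these assumptions the backbone goes from 16 channels at 320×320 after the first layer to 256 channels at 20×20 at the end, so the blocks get thicker and smaller from left to right. The resulting `sizes` can be used as the box dimensions when drawing each block in 3D.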

And that’s it! It looks very nice, and the diagram is ready to be annotated with dimensions and layer descriptions.

Currently, the code is not very general, but it should be able to draw the backbone of any YOLOv8 model. If there’s interest, I’ll consider updating and publishing the code. Just reach out to me, and I’ll send the GitHub link.
