Drawing Deep Neural Networks

Visualizing Deep Neural Networks Using Matplotlib

Oliver Lövström
Internet of Technology
5 min read · Mar 1, 2024

This is the first part of the Data Science Portfolio I’m building throughout March.

This image was created with the assistance of DALL·E

In this first part, I’ll show one way to draw deep neural networks using Matplotlib.

From the YOLOv8n model, we parse the following architecture, a list with one dictionary per top-level layer:

[{'type': 'Conv', 'conv': {'in_channels': 3, 'out_channels': 16, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 16, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'},
 {'type': 'Conv', 'conv': {'in_channels': 16, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'},
 {'type': 'C2f', 'cv1': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 48, 'out_channels': 32, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 16, 'out_channels': 16, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 16, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 16, 'out_channels': 16, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 16, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]},
 {'type': 'Conv', 'conv': {'in_channels': 32, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'},
 {'type': 'C2f', 'cv1': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 128, 'out_channels': 64, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}, {'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]},
 {'type': 'Conv', 'conv': {'in_channels': 64, 'out_channels': 128, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'},
 {'type': 'C2f', 'cv1': {'conv': {'in_channels': 128, 'out_channels': 128, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 256, 'out_channels': 128, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}, {'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]},
 {'type': 'Conv', 'conv': {'in_channels': 128, 'out_channels': 256, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 256, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'},
 {'type': 'C2f', 'cv1': {'conv': {'in_channels': 256, 'out_channels': 256, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 256, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 384, 'out_channels': 256, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 256, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 128, 'out_channels': 128, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 128, 'out_channels': 128, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]},
 {'type': 'SPPF'}]
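To turn this list into something drawable, a helper only needs a few fields per top-level layer. The sketch below is an assumption about how that could be done, not the author’s code: it reads `out_channels` and `stride` from each `Conv`, takes a `C2f` block’s channel count from its final 1×1 convolution (`cv2`), and, because the parsed `SPPF` entry carries no details, assumes it keeps the previous channels and resolution (which holds for YOLOv8n, where SPPF maps 256 channels to 256). The `layers` list is a trimmed excerpt of the architecture above, and `summarize` is a hypothetical name.

```python
from typing import Any

# Trimmed excerpt of the parsed YOLOv8n backbone above: only the fields the helper reads
layers: list[dict[str, Any]] = [
    {"type": "Conv", "conv": {"out_channels": 16, "stride": (2, 2)}},
    {"type": "Conv", "conv": {"out_channels": 32, "stride": (2, 2)}},
    {"type": "C2f", "cv2": {"conv": {"out_channels": 32}}},
    {"type": "Conv", "conv": {"out_channels": 64, "stride": (2, 2)}},
    {"type": "C2f", "cv2": {"conv": {"out_channels": 64}}},
    {"type": "Conv", "conv": {"out_channels": 128, "stride": (2, 2)}},
    {"type": "C2f", "cv2": {"conv": {"out_channels": 128}}},
    {"type": "Conv", "conv": {"out_channels": 256, "stride": (2, 2)}},
    {"type": "C2f", "cv2": {"conv": {"out_channels": 256}}},
    {"type": "SPPF"},
]


def summarize(layers: list[dict[str, Any]]) -> list[tuple[str, int, int]]:
    """Return (type, out_channels, stride) for every top-level layer."""
    summary = []
    channels = 0  # output channels of the previous layer
    for layer in layers:
        kind = layer["type"]
        if kind == "Conv":
            channels = layer["conv"]["out_channels"]
            stride = layer["conv"]["stride"][0]  # strides are square in this model
        elif kind == "C2f":
            # C2f keeps the resolution; its output comes from the 1x1 conv cv2
            channels = layer["cv2"]["conv"]["out_channels"]
            stride = 1
        else:
            # Assumption: layers parsed without details (SPPF here) keep channels and resolution
            stride = 1
        summary.append((kind, channels, stride))
    return summary


for row in summarize(layers):
    print(row)
```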

Note: We draw only the backbone of YOLOv8, which is what the list above contains (it ends with the SPPF layer).

We start by drawing each top-level layer (each entry in the list) as a separate block:

Image by Author
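This first step, one block per layer, can be sketched with outlined `matplotlib.patches.Rectangle` patches laid out left to right. This is a minimal illustration under assumptions, not the author’s implementation: the function name `draw_blocks`, the block size, and the spacing are hypothetical choices.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Top-level layer types of the YOLOv8n backbone, in order
LAYER_TYPES = ["Conv", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "SPPF"]


def draw_blocks(types, width=1.0, height=3.0, gap=0.5):
    """Draw one outlined rectangle per layer, left to right, with its type underneath."""
    fig, ax = plt.subplots(figsize=(10, 3))
    for i, kind in enumerate(types):
        x = i * (width + gap)
        ax.add_patch(Rectangle((x, 0), width, height, fill=False, edgecolor="black"))
        ax.text(x + width / 2, -0.3, kind, ha="center", va="top", fontsize=8)
    ax.set_xlim(-gap, len(types) * (width + gap))
    ax.set_ylim(-1.0, height + 0.5)
    ax.set_aspect("equal")
    ax.axis("off")
    return fig, ax


fig, ax = draw_blocks(LAYER_TYPES)
```

To write the image to disk, call something like `fig.savefig("backbone.png", dpi=150, bbox_inches="tight")`.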

Let us add color:

Image by Author
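One way to add color, sketched here as an assumption, is to map each layer type to a face color and build a legend from proxy patches. The palette and the names `COLORS` and `draw_colored_blocks` are hypothetical; the colors in the author’s figure may differ.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

LAYER_TYPES = ["Conv", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "SPPF"]

# Hypothetical palette: one face color per layer type
COLORS = {"Conv": "#4C72B0", "C2f": "#DD8452", "SPPF": "#55A868"}


def draw_colored_blocks(types, width=1.0, height=3.0, gap=0.5):
    """Draw one filled rectangle per layer, colored by layer type, plus a legend."""
    fig, ax = plt.subplots(figsize=(10, 3))
    for i, kind in enumerate(types):
        x = i * (width + gap)
        face = COLORS.get(kind, "lightgray")  # fallback for types without a color
        ax.add_patch(Rectangle((x, 0), width, height, facecolor=face, edgecolor="black"))
    # Proxy patches for the legend; they are never added to the axes themselves
    handles = [Rectangle((0, 0), 1, 1, facecolor=c, edgecolor="black") for c in COLORS.values()]
    ax.legend(handles, list(COLORS), loc="upper center", ncol=len(COLORS),
              frameon=False, bbox_to_anchor=(0.5, 0.0))
    ax.set_xlim(-gap, len(types) * (width + gap))
    ax.set_ylim(-0.5, height + 0.5)
    ax.set_aspect("equal")
    ax.axis("off")
    return fig, ax


fig, ax = draw_colored_blocks(LAYER_TYPES)
```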

This doesn’t look very interesting yet, so let’s draw the blocks in 3D:

Image by Author
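Matplotlib’s `mpl_toolkits.mplot3d` is one option for this step, but a lighter approach, assumed here and not necessarily what the author used, is an oblique projection: draw each block’s front face as a rectangle, then add a top and a side face shifted by a fixed offset and shaded lighter and darker to suggest depth. The helpers `cuboid_faces`, `shade`, and `draw_box`, the 45° angle, and the 0.5 depth scale are hypothetical choices.

```python
import math

import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb
from matplotlib.patches import Polygon


def cuboid_faces(x, y, w, h, d, angle_deg=45.0, depth_scale=0.5):
    """Front, top and right-side faces of a box in oblique projection.

    The front face is the rectangle at (x, y) with size (w, h); depth d is drawn
    as an offset of d * depth_scale along angle_deg.
    """
    dx = d * depth_scale * math.cos(math.radians(angle_deg))
    dy = d * depth_scale * math.sin(math.radians(angle_deg))
    front = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    top = [(x, y + h), (x + w, y + h), (x + w + dx, y + h + dy), (x + dx, y + h + dy)]
    side = [(x + w, y), (x + w + dx, y + dy), (x + w + dx, y + h + dy), (x + w, y + h)]
    return {"front": front, "top": top, "side": side}


def shade(color, amount):
    """Blend toward white (amount > 0) or black (amount < 0); amount in [-1, 1]."""
    rgb = to_rgb(color)
    if amount >= 0:
        return tuple(c + (1.0 - c) * amount for c in rgb)
    return tuple(c * (1.0 + amount) for c in rgb)


def draw_box(ax, x, y, w, h, d, color):
    """Add the three visible faces of one box, front face drawn last."""
    faces = cuboid_faces(x, y, w, h, d)
    for name, fill in (("side", shade(color, -0.3)), ("top", shade(color, 0.3)), ("front", color)):
        ax.add_patch(Polygon(faces[name], closed=True, facecolor=fill,
                             edgecolor="black", linewidth=0.8))


COLORS = {"Conv": "#4C72B0", "C2f": "#DD8452", "SPPF": "#55A868"}
LAYER_TYPES = ["Conv", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "SPPF"]

fig, ax = plt.subplots(figsize=(10, 3))
# Drawing left to right lets each front face cover the previous box's side face
for i, kind in enumerate(LAYER_TYPES):
    draw_box(ax, x=i * 1.5, y=0.0, w=1.0, h=3.0, d=2.0, color=COLORS[kind])
ax.set_xlim(-0.5, len(LAYER_TYPES) * 1.5 + 1.0)
ax.set_ylim(-0.5, 4.5)
ax.set_aspect("equal")
ax.axis("off")
```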

That looks better. Next, let’s resize each block to match its layer’s output spatial dimensions and number of channels:

Image by Author
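Sizing the blocks requires each layer’s output resolution. For a convolution with kernel size k, stride s, and padding p (dilation 1), the output side length is H_out = ⌊(H_in + 2p - k) / s⌋ + 1, so each 3×3, stride-2, padding-1 `Conv` halves the input: ⌊(640 + 2 - 3) / 2⌋ + 1 = 320. `C2f` and `SPPF` keep the resolution. The sketch below assumes the default 640×640 YOLOv8 input and maps channels to block width and resolution to height and depth. The linear scale factors and the `block_dims` name are arbitrary, hypothetical choices, not taken from the author’s code.

```python
# (type, kernel, stride, padding, out_channels) for each top-level backbone layer.
# C2f and SPPF keep the spatial size, so they are modelled as kernel 1, stride 1, padding 0.
BACKBONE = [
    ("Conv", 3, 2, 1, 16),
    ("Conv", 3, 2, 1, 32),
    ("C2f", 1, 1, 0, 32),
    ("Conv", 3, 2, 1, 64),
    ("C2f", 1, 1, 0, 64),
    ("Conv", 3, 2, 1, 128),
    ("C2f", 1, 1, 0, 128),
    ("Conv", 3, 2, 1, 256),
    ("C2f", 1, 1, 0, 256),
    ("SPPF", 1, 1, 0, 256),
]


def conv_out(size, kernel, stride, padding):
    """Output side length of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


def block_dims(layers, input_size=640, hw_scale=64.0, ch_scale=64.0):
    """Map each layer's output (size x size x channels) to block width, height and depth."""
    size = input_size
    dims = []
    for kind, kernel, stride, padding, channels in layers:
        size = conv_out(size, kernel, stride, padding)
        dims.append({
            "type": kind,
            "size": size,
            "channels": channels,
            "width": channels / ch_scale,  # horizontal extent follows the channel count
            "height": size / hw_scale,     # vertical extent follows the output height
            "depth": size / hw_scale,      # oblique extent follows the output width
        })
    return dims


for d in block_dims(BACKBONE):
    print(f"{d['type']:>4}: {d['size']}x{d['size']}x{d['channels']}")
```

The resulting `width`, `height`, and `depth` values can be passed straight to a box-drawing routine such as an oblique-projection cuboid. A logarithmic scale for channels is a reasonable alternative if the largest blocks dominate the figure.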

And that’s it! The diagram looks clean and is ready to be annotated with dimensions and layer descriptions.
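Annotating mostly comes down to placing text above each block. A minimal sketch, assuming the output sizes for a 640×640 input and hypothetical `label` and `annotate` helpers; the layout constants are illustrative and do not reproduce any figure from this article.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# (type, output side length, output channels), assuming a 640x640 input
LAYERS = [
    ("Conv", 320, 16), ("Conv", 160, 32), ("C2f", 160, 32), ("Conv", 80, 64),
    ("C2f", 80, 64), ("Conv", 40, 128), ("C2f", 40, 128), ("Conv", 20, 256),
    ("C2f", 20, 256), ("SPPF", 20, 256),
]


def label(kind, size, channels):
    """Two-line annotation: layer type, then output shape as H x W x C."""
    return f"{kind}\n{size}x{size}x{channels}"


def annotate(ax, layers, width=1.0, height=3.0, gap=0.5):
    """Write a label centered above each block of a left-to-right layout."""
    for i, (kind, size, channels) in enumerate(layers):
        x_center = i * (width + gap) + width / 2
        ax.text(x_center, height + 0.2, label(kind, size, channels),
                ha="center", va="bottom", fontsize=7)


fig, ax = plt.subplots(figsize=(10, 3))
# Plain placeholder blocks so the example is self-contained
for i in range(len(LAYERS)):
    ax.add_patch(Rectangle((i * 1.5, 0), 1.0, 3.0, fill=False, edgecolor="black"))
annotate(ax, LAYERS)
ax.set_xlim(-0.5, len(LAYERS) * 1.5)
ax.set_ylim(-0.5, 4.5)
ax.axis("off")
```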

Currently, the code is not very general, but it should be able to draw the backbone of any YOLOv8 model. If there’s interest, I’ll consider updating and publishing the code; just reach out, and I’ll send you the GitHub link.

Further Reading

If you want to learn more about programming and, specifically, machine learning, see the following course:

Note: If you use my links to order, I’ll get a small kickback.
