Drawing Deep Neural Networks
Visualizing Deep Neural Networks Using Matplotlib
This is the first part of the Data Science Portfolio that I’m doing for March.
In it, I’ll show one way to draw deep neural networks using Matplotlib.
From the YOLOv8-n model, we parse the following architecture, with one dictionary per top-level layer:
[
{'type': 'Conv', 'conv': {'in_channels': 3, 'out_channels': 16, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 16, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'},
{'type': 'Conv', 'conv': {'in_channels': 16, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'},
{'type': 'C2f', 'cv1': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 48, 'out_channels': 32, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 16, 'out_channels': 16, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 16, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 16, 'out_channels': 16, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 16, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]},
{'type': 'Conv', 'conv': {'in_channels': 32, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'},
{'type': 'C2f', 'cv1': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 128, 'out_channels': 64, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}, {'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 32, 'out_channels': 32, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 32, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]},
{'type': 'Conv', 'conv': {'in_channels': 64, 'out_channels': 128, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'},
{'type': 'C2f', 'cv1': {'conv': {'in_channels': 128, 'out_channels': 128, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 256, 'out_channels': 128, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}, {'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 64, 'out_channels': 64, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 64, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]},
{'type': 'Conv', 'conv': {'in_channels': 128, 'out_channels': 256, 'kernel_size': (3, 3), 'stride': (2, 2), 'padding': (1, 1), 'bias': None}, 'bn': {'num_features': 256, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'},
{'type': 'C2f', 'cv1': {'conv': {'in_channels': 256, 'out_channels': 256, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 256, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 384, 'out_channels': 256, 'kernel_size': (1, 1), 'stride': (1, 1), 'bias': False}, 'bn': {'num_features': 256, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'm': [{'type': 'Bottleneck', 'cv1': {'conv': {'in_channels': 128, 'out_channels': 128, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}, 'cv2': {'conv': {'in_channels': 128, 'out_channels': 128, 'kernel_size': (3, 3), 'stride': (1, 1), 'padding': (1, 1), 'bias': False}, 'bn': {'num_features': 128, 'eps': 0.001, 'momentum': 0.03, 'affine': True, 'track_running_stats': True}, 'act': 'SiLU'}}]},
{'type': 'SPPF'}
]
Note: We will draw only the backbone of YOLOv8.
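To drive the drawing, the nested dictionaries can be reduced to one (type, output channels, stride) tuple per top-level layer. The sketch below is a hypothetical helper. `summarize` is not from the original code. It assumes that layers parsed without details, such as SPPF, keep the channel count and resolution of their input. That assumption holds for SPPF in YOLOv8n.

```python
def summarize(layers, in_channels=3):
    """Reduce the parsed layer dicts to (type, out_channels, stride) tuples."""
    summary = []
    channels = in_channels
    for layer in layers:
        kind = layer["type"]
        if kind == "Conv":
            channels = layer["conv"]["out_channels"]
            stride = layer["conv"]["stride"][0]  # strides are square, e.g. (2, 2)
        elif kind == "C2f":
            # the C2f output width is set by its final 1x1 conv, cv2
            channels = layer["cv2"]["conv"]["out_channels"]
            stride = 1  # every conv inside C2f uses stride (1, 1)
        else:
            # assumption: layers parsed without details (here SPPF) keep
            # the channel count and resolution of their input
            stride = 1
        summary.append((kind, channels, stride))
    return summary


# a trimmed excerpt of the parsed list, keeping only the keys summarize() reads
layers = [
    {"type": "Conv", "conv": {"out_channels": 16, "stride": (2, 2)}},
    {"type": "Conv", "conv": {"out_channels": 32, "stride": (2, 2)}},
    {"type": "C2f", "cv2": {"conv": {"out_channels": 32}}},
    {"type": "SPPF"},
]
print(summarize(layers))
# [('Conv', 16, 2), ('Conv', 32, 2), ('C2f', 32, 1), ('SPPF', 32, 1)]
```

When you apply it to the full parsed list, the last two entries come out as ('C2f', 256, 1) and ('SPPF', 256, 1).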
We start by drawing each top-level (outer) layer as a separate block:
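Here is a minimal sketch of this first step using plain 2D Matplotlib patches. It draws one `Rectangle` per top-level layer, laid out in a row. The `draw_blocks` function, the spacing values, and the condensed `LAYERS` list are illustrative choices. They are not the original code.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to open a window
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# (type, out_channels) per top-level backbone layer, condensed from the parsed list
LAYERS = [("Conv", 16), ("Conv", 32), ("C2f", 32), ("Conv", 64), ("C2f", 64),
          ("Conv", 128), ("C2f", 128), ("Conv", 256), ("C2f", 256), ("SPPF", 256)]


def draw_blocks(layers, width=0.8, height=2.0, gap=0.4):
    """Draw one outlined rectangle per layer, left to right, labeled with its type."""
    fig, ax = plt.subplots(figsize=(10, 3))
    for i, (kind, _) in enumerate(layers):
        x = i * (width + gap)
        ax.add_patch(Rectangle((x, 0), width, height, fill=False, edgecolor="black"))
        ax.text(x + width / 2, height / 2, kind, ha="center", va="center", rotation=90)
    ax.set_xlim(-gap, len(layers) * (width + gap))
    ax.set_ylim(-0.5, height + 0.5)
    ax.set_aspect("equal")
    ax.axis("off")
    return fig, ax


fig, ax = draw_blocks(LAYERS)
```

Call `fig.savefig("blocks.png")` to write the result to disk.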
Let us add color:
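One straightforward option is to color each block by layer type. The palette below (`COLORS`) is a hypothetical choice. The legend is built from proxy patches that are never added to the axes.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to open a window
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

LAYER_TYPES = ["Conv", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "SPPF"]
# hypothetical palette: one fill color per layer type
COLORS = {"Conv": "tab:blue", "C2f": "tab:orange", "SPPF": "tab:green"}

fig, ax = plt.subplots(figsize=(10, 3))
for i, kind in enumerate(LAYER_TYPES):
    ax.add_patch(Rectangle((i * 1.2, 0), 0.8, 2.0, facecolor=COLORS[kind], edgecolor="black"))
ax.set_xlim(-0.4, len(LAYER_TYPES) * 1.2)
ax.set_ylim(-0.5, 3.0)
ax.set_aspect("equal")
ax.axis("off")

# proxy patches for the legend; they are not added to the axes
handles = [Rectangle((0, 0), 1, 1, facecolor=c, edgecolor="black") for c in COLORS.values()]
ax.legend(handles, list(COLORS), loc="upper center", ncol=len(COLORS), frameon=False)
```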
This still looks rather flat, so let’s draw the blocks in 3D space:
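One way to get the 3D look is Matplotlib’s `mplot3d` toolkit. In that toolkit, `Axes3D.bar3d` draws an axis-aligned cuboid from an anchor corner and three extents. This sketch still gives every block the same size. The original post may have used a different 3D technique, and the palette is a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to open a window
import matplotlib.pyplot as plt

LAYER_TYPES = ["Conv", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "SPPF"]
COLORS = {"Conv": "tab:blue", "C2f": "tab:orange", "SPPF": "tab:green"}  # hypothetical palette

fig = plt.figure(figsize=(10, 4))
ax = fig.add_subplot(projection="3d")  # mplot3d axes, registered as "3d"
for i, kind in enumerate(LAYER_TYPES):
    # bar3d(x, y, z, dx, dy, dz) draws a cuboid with one corner at (x, y, z);
    # every block has the same size at this stage
    ax.bar3d(i * 1.5, 0, 0, 0.5, 2.0, 2.0, color=COLORS[kind], edgecolor="black")
ax.set_box_aspect((len(LAYER_TYPES) * 1.5, 2.0, 2.0))  # keep the blocks from stretching
ax.view_init(elev=20, azim=-60)
ax.set_axis_off()
```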
That looks better. Next, let’s scale each block so that its size reflects the layer’s output spatial dimensions and number of channels.
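To scale the blocks, we first need each layer’s output size. Assume a 640×640 input, the usual YOLOv8 image size. Under that assumption, every stride-2 `Conv` (3×3 kernel, padding 1) halves the height and width, while `C2f` and SPPF keep them. The sketch maps channels to block thickness and spatial size to the block face, both on a log2 scale, so that the 20×20 and 320×320 maps both stay visible. The scale factors, `block_sizes`, and `draw_scaled` are hypothetical choices.

```python
import math

import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to open a window
import matplotlib.pyplot as plt

# (type, out_channels, stride) per top-level layer of the parsed YOLOv8-n backbone
LAYERS = [("Conv", 16, 2), ("Conv", 32, 2), ("C2f", 32, 1), ("Conv", 64, 2),
          ("C2f", 64, 1), ("Conv", 128, 2), ("C2f", 128, 1), ("Conv", 256, 2),
          ("C2f", 256, 1), ("SPPF", 256, 1)]
COLORS = {"Conv": "tab:blue", "C2f": "tab:orange", "SPPF": "tab:green"}  # hypothetical palette


def block_sizes(layers, input_size=640):
    """Return (spatial, channels, dx, dyz) per layer.

    input_size=640 is an assumption. dx (thickness) is log2(channels) / 4 and
    dyz (square face) is log2(spatial) / 2; both scale factors are arbitrary.
    """
    size = input_size
    sizes = []
    for _, channels, stride in layers:
        # a 3x3, padding-1, stride-2 conv halves an even H and W; stride 1 keeps them
        size //= stride
        sizes.append((size, channels, math.log2(channels) / 4, math.log2(size) / 2))
    return sizes


def draw_scaled(layers, gap=0.6):
    fig = plt.figure(figsize=(12, 5))
    ax = fig.add_subplot(projection="3d")
    x = 0.0
    for (kind, _, _), (_, _, dx, dyz) in zip(layers, block_sizes(layers)):
        # center each block on the y and z axes so smaller maps sit in the middle
        ax.bar3d(x, -dyz / 2, -dyz / 2, dx, dyz, dyz, color=COLORS[kind], edgecolor="black")
        x += dx + gap
    ax.set_axis_off()
    return fig, ax


fig, ax = draw_scaled(LAYERS)
```

With these assumptions, the output sizes run 320, 160, 160, 80, 80, 40, 40, 20, 20, 20.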
And that’s it! The diagram looks clean and is ready to be annotated with dimensions and layer descriptions.
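For annotations, `Axes3D.text` places a label at a 3D coordinate. This sketch writes each layer’s type and output shape above its block. The shapes assume a 640×640 input, and the fixed block size is a simplification to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to open a window
import matplotlib.pyplot as plt

# (type, output H = W, channels), assuming a 640x640 input
LAYERS = [("Conv", 320, 16), ("Conv", 160, 32), ("C2f", 160, 32), ("Conv", 80, 64),
          ("C2f", 80, 64), ("Conv", 40, 128), ("C2f", 40, 128), ("Conv", 20, 256),
          ("C2f", 20, 256), ("SPPF", 20, 256)]

fig = plt.figure(figsize=(12, 5))
ax = fig.add_subplot(projection="3d")
for i, (kind, size, channels) in enumerate(LAYERS):
    x = i * 1.5
    ax.bar3d(x, 0, 0, 0.5, 2.0, 2.0, color="lightgray", edgecolor="black")
    # label above each block: layer type plus its output tensor shape (H x W x C)
    ax.text(x, 0, 2.4, f"{kind}\n{size}x{size}x{channels}", fontsize=7, ha="center")
ax.set_axis_off()
```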
Currently, the code is not very general, but it should be able to draw the backbone of any YOLOv8 model. If there’s interest, I’ll consider cleaning up and publishing the code. Just reach out to me, and I’ll send you the GitHub link.
Further Reading
If you want to learn more about programming and, specifically, machine learning, see the following course:
Note: If you use my links to order, I’ll get a small kickback.