A Guide to Hand-Calculating FLOPs and MACs
Why is Understanding MACs and FLOPs in Neural Networks Important?
In this session, we are going to delve deep into the concepts of MACs (Multiply-Accumulate Operations) and FLOPs (Floating Point Operations) within the context of neural networks. By learning how to calculate these manually using pen and paper, you’ll acquire a foundational understanding of the computational complexity and efficiency of various network structures.
Understanding MACs and FLOPs is not just an academic exercise; it is a critical component in optimizing neural networks for performance and efficiency. It helps in designing models that are both computationally efficient and effective, ultimately saving time and resources during the training and inference phases.
A fully functioning example is available as a Colab notebook.
Resource Efficiency:
Understanding FLOPs helps in estimating the computational cost of a neural network. By optimizing the number of FLOPs, one can potentially reduce the time taken to train or run a neural network.
Memory Efficiency:
MAC operations often dictate the memory usage of the network since they are directly related to the number of parameters and activations in the network. Reducing MACs can help in making the network memory efficient.
Energy Consumption:
Power Efficiency:
Both FLOPs and MAC operations contribute to the power consumption of the hardware on which the neural network is running. By optimizing these metrics, one can potentially reduce the energy requirements of running the network, which is particularly important in mobile and embedded devices.
Model Optimization:
Pruning and Quantization:
Understanding FLOPs and MACs can assist in optimizing a neural network through techniques like pruning (removing unnecessary connections) and quantization (reducing the precision of weights and activations), which aim to reduce computational and memory costs.
Performance Benchmarking:
Comparison between Models:
FLOPs and MACs provide a means to compare different models in terms of their computational complexity, which can be a criterion for selecting models for specific applications.
Hardware Benchmarking:
These metrics can also be used to benchmark the performance of different hardware platforms in running neural networks.
Deployment on Edge Devices:
Real-time Applications:
For real-time applications, especially on edge devices with limited computational resources, understanding and optimizing these metrics is critical in ensuring that the network can run within the time constraints of the application.
Battery Life:
In battery-powered devices, reducing the computational cost (and hence energy consumption) of neural networks can help in extending the battery life.
Research and Development:
Designing New Algorithms:
Researchers can use these metrics as guidelines when developing new algorithms or neural network architectures, aiming to improve computational efficiency without sacrificing accuracy.
Step 1: Understand the Definitions
FLOP
A FLOP (Floating Point OPeration) is considered to be either an addition, subtraction, multiplication, or division operation.
MAC
A MAC (Multiply-Accumulate) operation is essentially a multiplication followed by an addition, i.e., MAC = a * b + c. It counts as two FLOPs (one for the multiplication and one for the addition).
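To make the two-FLOPs-per-MAC relationship concrete, here is a tiny illustrative snippet (a hypothetical sketch, not part of the models below): a dot product of two length-n vectors costs n MACs, and therefore 2n FLOPs.

def dot_product_cost(n):
    # each step of a dot product computes acc = acc + a[i] * b[i]: one multiply + one add
    macs = n
    flops = 2 * macs
    return macs, flops

print(dot_product_cost(256))  # (256, 512)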
Step 2: Analyze Each Layer
For each layer, we calculate the number of multiply-accumulate or floating-point operations to understand its computational complexity.
1. Fully Connected Layer (Dense Layer)
Now, we will create a simple neural network with 3 layers and begin counting the operations involved. Here is the formula for calculating the operations in a linear layer, i.e., a fully connected (or dense) layer:
For a fully connected layer with I inputs and O outputs, the number of operations is as follows:
- MACs: I × O
- FLOPs: 2 × (I × O), since each MAC counts as two FLOPs
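As a quick sanity check before we build an actual model, here is a minimal helper (a hypothetical sketch, not a library function) that applies this formula:

def linear_macs(in_features, out_features):
    # one MAC per (input, output) pair in a bias-free fully connected layer
    return in_features * out_features

def linear_flops(in_features, out_features):
    return 2 * linear_macs(in_features, out_features)

print(linear_macs(10, 20), linear_flops(10, 20))  # 200 400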
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleLinearModel(nn.Module):
    def __init__(self):
        super(SimpleLinearModel, self).__init__()
        # Bias-free layers, so each layer's MAC count is simply in_features * out_features
        self.fc1 = nn.Linear(in_features=10, out_features=20, bias=False)
        self.fc2 = nn.Linear(in_features=20, out_features=15, bias=False)
        self.fc3 = nn.Linear(in_features=15, out_features=1, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)  # assign back so the activation is actually applied
        x = self.fc3(x)
        return x
linear_model = SimpleLinearModel().cuda()
sample_data = torch.randn(1, 10).cuda()
Step 1: Identifying Layer Parameters
For the given model, we have three linear layers defined as:
- fc1: 10 input features, 20 output features
- fc2: 20 input features, 15 output features
- fc3: 15 input features, 1 output feature
Step 2: Calculating FLOPs and MACs
Now, calculate the MACs and FLOPs for each layer:
Layer fc1:
- MACs = 10 × 20 = 200
- FLOPs = 2 × MACs = 2 × 200 = 400
Layer fc2:
- MACs = 20 × 15 = 300
- FLOPs = 2 × MACs = 2 × 300 = 600
Layer fc3:
- MACs = 15 × 1 = 15
- FLOPs = 2 × MACs = 2 × 15 = 30
Step 3: Summing Up the Results
Finally, to find the total number of MACs and FLOPs for a single input passing through the entire network, we sum the results from all layers:
- Total MACs = MACs(fc1) + MACs(fc2) + MACs(fc3) = 200 + 300 + 15 = 515
- Total FLOPs = FLOPs(fc1) + FLOPs(fc2) + FLOPs(fc3) = 400 + 600 + 30 = 1030
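If you prefer to let code do the bookkeeping, a minimal sketch like the one below (assuming every layer is a bias-free nn.Linear and a batch size of 1) walks the model and produces the same totals:

import torch.nn as nn

def count_linear_macs(model):
    # sum in_features * out_features over every nn.Linear module in the model
    return sum(
        m.in_features * m.out_features
        for m in model.modules()
        if isinstance(m, nn.Linear)
    )

total_macs = count_linear_macs(linear_model)
print(total_macs, 2 * total_macs)  # 515 1030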
Verifying FLOPs and MACs with the torchprofile Library
You can use the torchprofile library to verify the FLOPs and MACs calculations for the given neural network model. Here's how to do it:
from torchprofile import profile_macs

macs = profile_macs(linear_model, sample_data)
print(macs)
# 515
2. Convolutional Neural Networks (CNNs)
Now, let’s determine the MACs (Multiply-Accumulates) and FLOPs (Floating-Point Operations) for a straightforward convolutional model. This calculation can be a bit more involved than our previous example with dense layers, mainly due to factors like stride, padding, and kernel size. However, I’ll break it down step by step to make it easier to follow.
class SimpleConv(nn.Module):
    def __init__(self):
        super(SimpleConv, self).__init__()
        # 3x3 kernels with stride 1 and padding 1 keep the 28x28 spatial size
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.fc = nn.Linear(in_features=32*28*28, out_features=10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = x.view(x.shape[0], -1)  # flatten to (batch, 32*28*28)
        x = self.fc(x)
        return x
x = torch.rand(1, 1, 28, 28).cuda()
conv_model = SimpleConv().cuda()
Important Consideration for Calculating Convolutional Operations
- When calculating operations for convolutional kernels, remember that the number of channels in the kernel must match the number of channels in the input. For instance, if our input is an RGB image with three color channels, the kernel’s dimensions will be 3x3x3 to account for the input’s three channels.
- For the purpose of our demonstration, we’ll keep the image size constant throughout the convolutional layers. To achieve this, we’ll set both the padding and stride values to 1 (see the sketch after this list).
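To see why the image size is preserved, here is the standard output-size formula for a convolution, written as a small helper (a hypothetical sketch for illustration):

def conv_output_size(size, kernel, stride=1, padding=0):
    # standard formula: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

print(conv_output_size(28, kernel=3, stride=1, padding=1))  # 28, so 28x28 stays 28x28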
Step 1: Identifying Layer Parameters
For the given model, we have two convolutional layers and one linear layer defined as:
- conv1: 1 input channel, 16 output channels, kernel size 3
- conv2: 16 input channels, 32 output channels, kernel size 3
- fc: 32*28*28 input features, 10 output features (the spatial size is unchanged by the convolutional layers, so the flattened feature map is still 28 × 28 per channel)
Step 2: Calculating FLOPs and MACs
Now, calculate the MACs and FLOPs for each layer. For a convolutional layer, the formula is:
MACs = output height × output width × kernel height × kernel width × input channels × output channels
Layer conv1:
- MACs = 28 × 28 × 3 × 3 × 1 × 16 = 112,896
- FLOPs = 2 × MACs = 2 × 112,896 = 225,792
Layer conv2:
- MACs = 28 × 28 × 3 × 3 × 16 × 32 = 3,612,672
- FLOPs = 2 × MACs = 2 × 3,612,672 = 7,225,344
Layer fc:
- MACs = 32 × 28 × 28 × 10 = 250,880
- FLOPs = 2 × MACs = 2 × 250,880 = 501,760
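A small helper in the same spirit (hypothetical, and reusing the linear_macs sketch from the dense-layer section) reproduces these per-layer numbers:

def conv2d_macs(h_out, w_out, kernel, c_in, c_out):
    # output_height * output_width * kernel_height * kernel_width * in_channels * out_channels
    return h_out * w_out * kernel * kernel * c_in * c_out

print(conv2d_macs(28, 28, 3, 1, 16))    # 112896  (conv1)
print(conv2d_macs(28, 28, 3, 16, 32))   # 3612672 (conv2)
print(linear_macs(32 * 28 * 28, 10))    # 250880  (fc)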
Step 3: Summing Up the Results
Finally, to find the total number of MACs and FLOPs for a single input passing through the entire network, we sum the results from all layers:
- Total MACs = MACs(conv1) + MACs(conv2) + MACs(fc) = 112,896 + 3,612,672 + 250,880 = 3,976,448
- Total FLOPs = FLOPs(conv1) + FLOPs(conv2) + FLOPs(fc) = 225,792 + 7,225,344 + 501,760 = 7,952,896
Verifying Operations with the torchprofile Library
macs = profile_macs(conv_model, (x,))
print(macs)
# 3976448
3. Self-Attention Block
Having covered MACs for linear and convolutional layers, our next step is to determine the FLOPs (Floating-Point Operations) for a Self-Attention block, a crucial component in large language models. This calculation is essential for understanding the computational complexity of such models. Let’s delve into it.
class SimpleAttentionBlock(nn.Module):
    def __init__(self, embed_size, heads):
        super(SimpleAttentionBlock, self).__init__()
        self.embed_size = embed_size
        self.heads = heads
        self.head_dim = embed_size // heads

        assert (
            self.head_dim * heads == embed_size
        ), "Embedding size needs to be divisible by heads"

        # Projections for values, keys, and queries, plus the output projection
        self.values = nn.Linear(self.embed_size, self.embed_size, bias=False)
        self.keys = nn.Linear(self.embed_size, self.embed_size, bias=False)
        self.queries = nn.Linear(self.embed_size, self.embed_size, bias=False)
        self.fc_out = nn.Linear(heads * self.head_dim, embed_size)

    def forward(self, values, keys, queries, mask):
        N = queries.shape[0]
        value_len, key_len, query_len = values.shape[1], keys.shape[1], queries.shape[1]

        # Project, then split the embedding dimension into (heads, head_dim)
        values = self.values(values).reshape(N, self.heads, value_len, self.head_dim)
        keys = self.keys(keys).reshape(N, self.heads, key_len, self.head_dim)
        queries = self.queries(queries).reshape(N, self.heads, query_len, self.head_dim)

        # Attention scores: queries x keys^T (this simplified block omits 1/sqrt(head_dim) scaling)
        energy = torch.matmul(queries, keys.transpose(-2, -1))
        if mask is not None:
            energy = energy.masked_fill(mask == 0, float("-1e20"))
        attention = torch.nn.functional.softmax(energy, dim=3)

        # Weighted sum of values, then merge the heads back together
        out = torch.matmul(attention, values).reshape(
            N, query_len, self.heads * self.head_dim
        )
        return self.fc_out(out)
Step 1: Identifying Layer Parameters
Linear Transformations
Let’s define the hyperparameters:
- batch_size = 1
- seq_len = 10
- embed_size = 256
- heads = 8
In the attention block, we have three linear transformations (for queries, keys, and values), and one at the end (fc_out).
- Input Size: [batch_size, seq_len, embed_size]
- Linear transformation matrix: [embed_size, embed_size]
- MACs: batch_size × seq_len × embed_size × embed_size
Query, Key, and Value linear transformations:
- MACs for the Query transformation = 1 × 10 × 256 × 256 = 655,360
- MACs for the Key transformation = 1 × 10 × 256 × 256 = 655,360
- MACs for the Value transformation = 1 × 10 × 256 × 256 = 655,360
Energy Calculation
Calculation: queries (reshaped) dot keys (reshaped), a batched dot-product operation.
MACs: batch_size × seq_len × seq_len × heads × head_dim
Query and key dot product:
MACs = 1 × 10 × 10 × 8 × 32 = 25,600 (head_dim = 32, because embed_size 256 is split across 8 heads)
Output from Attention Weights and Values
Calculation: attention weights dot values (reshaped), another batched dot-product operation.
MACs: batch_size × seq_len × seq_len × heads × head_dim
Attention and value dot product:
MACs = 1 × 10 × 10 × 8 × 32 = 25,600
Fully Connected Output (fc_out)
MACs: batch_size × seq_len × heads × head_dim × embed_size
MACs = 1 × 10 × 8 × 32 × 256 = 655,360
Step 2: Summing Up the Results
- Total MACs = MACs(queries) + MACs(keys) + MACs(values) + MACs(energy) + MACs(attention × values) + MACs(fc_out) = 655,360 + 655,360 + 655,360 + 25,600 + 25,600 + 655,360 = 2,672,640
- Total FLOPs = 2 × Total MACs = 5,345,280
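To tie the pieces together, here is a compact sketch (a hypothetical helper; bias additions are ignored, as in the hand calculation above) that reproduces the total directly from the hyperparameters:

def attention_block_macs(batch_size, seq_len, embed_size, heads):
    head_dim = embed_size // heads
    # four embed_size x embed_size projections: queries, keys, values, and fc_out
    projections = 4 * batch_size * seq_len * embed_size * embed_size
    # two batched matmuls across all heads: Q x K^T and attention x V
    matmuls = 2 * batch_size * heads * seq_len * seq_len * head_dim
    return projections + matmuls

print(attention_block_macs(batch_size=1, seq_len=10, embed_size=256, heads=8))  # 2672640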
Verifying Operations with the torchprofile Library
# Create an instance of the model
model = SimpleAttentionBlock(embed_size=256, heads=8).cuda()
# Generate some sample data (a batch of 1 sequence of length 10, embedding size 256)
values = torch.randn(1, 10, 256).cuda()
keys = torch.randn(1, 10, 256).cuda()
queries = torch.randn(1, 10, 256).cuda()
# No mask for simplicity
mask = None
# Forward pass with the sample data
macs = profile_macs(model, (values, keys, queries, mask))
print(macs)
#2672640
Summary: Scaling MACs and FLOPs for Different Batch Sizes
Throughout our calculations, we’ve primarily considered a batch size of 1. However, it’s important to note that scaling MACs and FLOPs for larger batch sizes is straightforward.
To compute MACs or FLOPs for a batch size greater than one, you can simply multiply the total MACs or FLOPs obtained for batch size 1 by the desired batch size value. This scaling allows you to estimate computational requirements for various batch sizes in your neural network models.
Keep in mind that the results scale linearly with the batch size. For instance, if you have a batch size of 32, you can obtain the MACs or FLOPs by multiplying the values for batch size 1 by 32, as the quick check below illustrates.
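For example, you can sanity-check this linear scaling with torchprofile on the first model (this assumes the linear_model and the profile_macs import from earlier; the exact numbers reported for other layer types may differ):

batch_sample = torch.randn(32, 10).cuda()
print(profile_macs(linear_model, batch_sample))  # expected: 32 * 515 = 16480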
Conclusion
I hope you found this article insightful and valuable. If you enjoyed the content and wish to stay updated with future posts, please consider following me on LinkedIn. Your support motivates me to continue sharing knowledge. Thank you for reading!