
A Guide to Hand-Calculating FLOPs and MACs

8 min read · Sep 20, 2023

Why is Understanding MACs and FLOPs in Neural Networks Important?

In this session, we are going to delve deep into the concepts of MACs (Multiply-Accumulate Operations) and FLOPs (Floating Point Operations) within the context of neural networks. By learning how to calculate these manually using pen and paper, you’ll acquire a foundational understanding of the computational complexity and efficiency of various network structures.

Understanding MACs and FLOPs is not just an academic exercise; it is a critical component in optimizing neural networks for performance and efficiency. It helps in designing models that are both computationally efficient and effective, ultimately saving time and resources during the training and inference phases.

A fully functioning example is available in a Colab notebook.

Resource Efficiency:

Understanding FLOPs helps in estimating the computational cost of a neural network. By optimizing the number of FLOPs, one can potentially reduce the time taken to train or run a neural network.

Memory Efficiency:

MAC operations often dictate the memory usage of the network since they are directly related to the number of parameters and activations in the network. Reducing MACs can help in making the network memory efficient.

Energy Consumption:

Power Efficiency:

Both FLOPs and MAC operations contribute to the power consumption of the hardware on which the neural network is running. By optimizing these metrics, one can potentially reduce the energy requirements of running the network, which is particularly important in mobile and embedded devices.

Model Optimization:

Pruning and Quantization:

Understanding FLOPs and MACs can assist in optimizing a neural network through techniques like pruning (removing unnecessary connections) and quantization (reducing the precision of weights and activations), which aim to reduce computational and memory costs.

Performance Benchmarking:

Comparison between Models:

FLOPs and MACs provide a means to compare different models in terms of their computational complexity, which can be a criterion for selecting models for specific applications.

Hardware Benchmarking:

These metrics can also be used to benchmark the performance of different hardware platforms in running neural networks.

Deployment on Edge Devices:

Real-time Applications:

For real-time applications, especially on edge devices with limited computational resources, understanding and optimizing these metrics is critical in ensuring that the network can run within the time constraints of the application.

Battery Life:

In battery-powered devices, reducing the computational cost (and hence energy consumption) of neural networks can help in extending the battery life.

Research and Development:

Designing New Algorithms:

Researchers can use these metrics as guidelines when developing new algorithms or neural network architectures, aiming to improve computational efficiency without sacrificing accuracy.

Step 1: Understand the Definitions

FLOP

A FLOP (Floating Point OPeration) is considered to be either an addition, subtraction, multiplication, or division operation.

MAC

A MAC (Multiply-ACCumulate) operation is essentially a multiplication followed by an addition, i.e., MAC = a * b + c. It counts as two FLOPs (one for multiplication and one for addition).
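
To make the convention concrete, here is a minimal sketch in plain Python (nothing framework-specific, just the counting rule applied to a single operation):

def mac(a, b, c):
    # One multiply-accumulate: 1 multiplication + 1 addition = 2 FLOPs
    return a * b + c

print(mac(2.0, 3.0, 4.0))  # 10.0 -> 1 MAC, counted as 2 FLOPs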

Step 2: Analyze Each Layer

For each layer, we calculate the number of multiply-accumulate and floating-point operations to understand its computational complexity.

1. Fully Connected Layer (Dense Layer)

Now, we will create a simple neural network with 3 layers and begin counting the operations involved. Here is the formula for calculating the operations in the first linear layer, which is a fully connected (or dense) layer:

For a fully connected layer with I inputs and O outputs, the number of operations are as follows:

  • MACs: I × O
  • FLOPs: 2 × (I × O) (since each MAC counts as two FLOPs)
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleLinearModel(nn.Module):
    def __init__(self):
        super(SimpleLinearModel, self).__init__()
        self.fc1 = nn.Linear(in_features=10, out_features=20, bias=False)
        self.fc2 = nn.Linear(in_features=20, out_features=15, bias=False)
        self.fc3 = nn.Linear(in_features=15, out_features=1, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        return x

linear_model = SimpleLinearModel().cuda()
sample_data = torch.randn(1, 10).cuda()

Step 1: Identifying Layer Parameters

For the given model, we have three linear layers defined as:

  • fc1: 10 input features, 20 output features
  • fc2: 20 input features, 15 output features
  • fc3: 15 input features, 1 output feature

Step 2: Calculating FLOPs and MACs

Now, calculate MACs and FLOPs for each layer:

Layer fc1:

MACs = 10 × 20 = 200

FLOPs = 2 × MACs = 2 × 200 = 400

Layer fc2:

MACs = 20 × 15 = 300

FLOPs = 2 × MACs = 2 × 300 = 600

Layer fc3:

MACs = 15 × 1 = 15

FLOPs = 2 × MACs = 2 × 15 = 30

Step 3: Summing Up the Results

Finally, to find the total number of MACs and FLOPs for a single input passing through the entire network, we sum the results from all layers:

  • Total MACs = MACs(fc1) + MACs(fc2) + MACs(fc3) = 200 + 300 + 15 = 515
  • Total FLOPs = FLOPs(fc1) + FLOPs(fc2) + FLOPs(fc3) = 400 + 600 + 30 = 1030
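
Under the same assumptions (bias-free layers, a single input vector), the hand calculation above can be reproduced with a few lines of plain Python; the (in_features, out_features) pairs below simply mirror fc1, fc2, and fc3:

# (in_features, out_features) for fc1, fc2, fc3
layers = [(10, 20), (20, 15), (15, 1)]

total_macs = sum(i * o for i, o in layers)
total_flops = 2 * total_macs

print(total_macs, total_flops)  # 515 1030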

Verifying FLOPs and MACs with torchprofile Library

You can use the torchprofile library to verify the FLOPs and MACs calculations for the given neural network model. Here's how to do it:

from torchprofile import profile_macs

macs = profile_macs(linear_model, sample_data)
print(macs)

#515

2. Convolutional Neural Networks (CNNs)

Now, let’s determine the MACs (Multiply-Accumulates) and FLOPs (Floating-Point Operations) for a straightforward convolutional model. This calculation can be a bit more involved than our previous example with dense layers, mainly due to factors like stride, padding, and kernel size. However, I’ll break it down to make it easier for our learning purpose.

class SimpleConv(nn.Module):
    def __init__(self):
        super(SimpleConv, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.fc = nn.Linear(in_features=32*28*28, out_features=10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = x.view(x.shape[0], -1)
        x = self.fc(x)
        return x

x = torch.rand(1, 1, 28, 28).cuda()
conv_model = SimpleConv().cuda()

Important Consideration for Calculating Convolutional Operations

  • When calculating operations for convolutional kernels, it’s crucial to remember that the number of channels in the kernel should match the number of channels in the input. For instance, if our input is an RGB image with three color channels, the kernel’s dimensions will be 3x3x3 to account for the input's three channels.
  • For the purpose of our demonstration, we’ll maintain a consistent image size throughout the convolutional layers. To achieve this, we’ll set both the padding and stride values to 1 (see the output-size check just after this list).
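
As a quick check of why the spatial size stays at 28 × 28, here is the standard convolution output-size formula as a small sketch (the formula is standard; the function name is just for illustration):

def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    # Standard convolution output-size formula
    return (in_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(28, kernel_size=3, stride=1, padding=1))  # 28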

Step 1: Identifying Layer Parameters

For the given model, we have two conv layers and one linear layer defined as:

  • conv1: 1 input channel, 16 output channels, kernel size 3
  • conv2: 16 input channels, 32 output channels, kernel size 3
  • fc: 32*28*28 input features, 10 output features (the spatial size is unchanged by the convolutional layers, so the flattened feature map is 32 × 28 × 28)

Step 2: Calculating FLOPs and MACs

Now, calculate MACs and FLOPs for each layer:

For a convolutional layer, MACs = output_height × output_width × kernel_height × kernel_width × input_channels × output_channels.

Layer conv1:

MACs = 28 × 28 × 3 × 3 × 1 × 16 = 112,896

FLOPs = 2 × MACs = 2 × 112,896 = 225,792

Layer conv2:

MACs = 28 × 28 × 3 × 3 × 16 × 32 = 3,612,672

FLOPs = 2 × MACs = 2 × 3,612,672 = 7,225,344

Layer fc:

MACs = 32 × 28 × 28 × 10 = 250,880

FLOPs = 2 × MACs = 2 × 250,880 = 501,760

Step 3: Summing Up the Results

Finally, to find the total number of MACs and FLOPs for a single input passing through the entire network, we sum the results from all layers:

  • Total MACs = MACs(conv1) + MACs(conv2) + MACs(fc) = 112,896 + 3,612,672 + 250,880 = 3,976,448
  • Total FLOPs = FLOPs(conv1) + FLOPs(conv2) + FLOPs(fc) = 225,792 + 7,225,344 + 501,760 = 7,952,896
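
The same arithmetic can be scripted. Here is a minimal sketch under the assumptions above (stride 1, padding 1, so the 28 × 28 spatial size is preserved; biases ignored):

def conv2d_macs(out_h, out_w, k_h, k_w, in_ch, out_ch):
    # MACs = output spatial size x kernel size x input channels x output channels
    return out_h * out_w * k_h * k_w * in_ch * out_ch

def linear_macs(in_features, out_features):
    return in_features * out_features

total_macs = (
    conv2d_macs(28, 28, 3, 3, 1, 16)     # conv1 -> 112,896
    + conv2d_macs(28, 28, 3, 3, 16, 32)  # conv2 -> 3,612,672
    + linear_macs(32 * 28 * 28, 10)      # fc    -> 250,880
)
print(total_macs, 2 * total_macs)  # 3976448 7952896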

Verifying Operations with torchprofile Library

macs = profile_macs(conv_model, (x,))
print(macs)

#3976448

3. Self-Attention Block

Having covered MACs for linear and convolutional layers, our next step is to determine the FLOPs (Floating-Point Operations) for a Self-Attention block, a crucial component in large language models. This calculation is essential for understanding the computational complexity of such models. Let’s delve into it.

class SimpleAttentionBlock(nn.Module):
    def __init__(self, embed_size, heads):
        super(SimpleAttentionBlock, self).__init__()
        self.embed_size = embed_size
        self.heads = heads
        self.head_dim = embed_size // heads

        assert (
            self.head_dim * heads == embed_size
        ), "Embedding size needs to be divisible by heads"

        self.values = nn.Linear(self.embed_size, self.embed_size, bias=False)
        self.keys = nn.Linear(self.embed_size, self.embed_size, bias=False)
        self.queries = nn.Linear(self.embed_size, self.embed_size, bias=False)
        self.fc_out = nn.Linear(heads * self.head_dim, embed_size)

    def forward(self, values, keys, queries, mask):
        N = queries.shape[0]
        value_len, key_len, query_len = values.shape[1], keys.shape[1], queries.shape[1]

        # Project and split into heads: (N, heads, seq_len, head_dim)
        values = self.values(values).reshape(N, self.heads, value_len, self.head_dim)
        keys = self.keys(keys).reshape(N, self.heads, key_len, self.head_dim)
        queries = self.queries(queries).reshape(N, self.heads, query_len, self.head_dim)

        # Attention scores: (N, heads, query_len, key_len)
        energy = torch.matmul(queries, keys.transpose(-2, -1))

        if mask is not None:
            energy = energy.masked_fill(mask == 0, float("-1e20"))

        attention = torch.nn.functional.softmax(energy, dim=3)

        # Weighted sum of values, then merge the heads back together
        out = torch.matmul(attention, values).reshape(
            N, query_len, self.heads * self.head_dim
        )

        return self.fc_out(out)

Step 1: Identifying Layer Parameters

Linear Transformations

Let’s define the hyperparameters:

  • batch_size = 1
  • seq_len = 10
  • embed_size = 256

In the attention block, we have three linear transformations (for queries, keys, and values), and one at the end (fc_out).

  • Input Size: [batch_size, seq_len, embed_size]
  • Linear transformation matrix: [embed_size, embed_size]
  • MACs: batch_size × seq_len × embed_size × embed_size

Query, Key, Value linear transformation:

MACs for Query Transformation = 1 × 10 × 256 × 256 = 655,360

MACs for Key Transformation = 1 × 10 × 256 × 256 = 655,360

MACs for Value Transformation = 1 × 10 × 256 × 256 = 655,360

Energy Calculation: queries (reshaped) dot keys (reshaped), a batched dot-product operation.

MACs: batch_size × heads × seq_len × seq_len × head_dim

Query and key dot product:

MACs = 1 × 8 × 10 × 10 × 32 = 25,600 (head_dim = 32, because embed_size 256 is split across 8 heads)

Output from Attention Weights and Values: attention weights dot values (reshaped), another batched dot-product operation.

MACs: batch_size × heads × seq_len × seq_len × head_dim

Attention and value dot product:

MACs = 1 × 8 × 10 × 10 × 32 = 25,600

Fully Connected Output (fc_out)

MACs: batch_size × seq_len × (heads × head_dim) × embed_size

MACs = 1 × 10 × (8 × 32) × 256 = 655,360

Step 2: Summing Up the Results

  • Total MACs = MACs(query) + MACs(key) + MACs(value) + MACs(energy) + MACs(attention·values) + MACs(fc_out) = 655,360 + 655,360 + 655,360 + 25,600 + 25,600 + 655,360 = 2,672,640
  • Total FLOPs = 2 × Total MACs = 5,345,280
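
Here is a small sketch that reproduces this count from the hyperparameters above (bias terms are ignored, matching what the hand calculation counts):

def attention_block_macs(batch_size, seq_len, embed_size, heads):
    head_dim = embed_size // heads

    # Q, K, V projections plus the output projection (fc_out)
    projections = 4 * batch_size * seq_len * embed_size * embed_size
    # QK^T scores and the attention-weighted sum of values
    dot_products = 2 * batch_size * heads * seq_len * seq_len * head_dim

    return projections + dot_products

print(attention_block_macs(batch_size=1, seq_len=10, embed_size=256, heads=8))  # 2672640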

Verifying Operations with torchprofile Library

# Create an instance of the model
model = SimpleAttentionBlock(embed_size=256, heads=8).cuda()

# Generate some sample data (a batch of 1 sequence of length 10, embedding size 256)
values = torch.randn(1, 10, 256).cuda()
keys = torch.randn(1, 10, 256).cuda()
queries = torch.randn(1, 10, 256).cuda()

# No mask for simplicity
mask = None
# Forward pass with the sample data
macs = profile_macs(model, (values, keys, queries, mask))
print(macs)

#2672640

Summary: Scaling MACs and FLOPs for Different Batch Sizes

Throughout our calculations, we’ve primarily considered a batch size of 1. However, it’s important to note that scaling MACs and FLOPs for larger batch sizes is straightforward.

To compute MACs or FLOPs for a batch size greater than one, you can simply multiply the total MACs or FLOPs obtained for batch size 1 by the desired batch size value. This scaling allows you to estimate computational requirements for various batch sizes in your neural network models.

Keep in mind that the results will directly scale linearly with the batch size. For instance, if you have a batch size of 32, you can obtain the MACs or FLOPs by multiplying the values for batch size 1 by 32.
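
For example, taking the linear model from earlier (515 MACs for a single input), a batch of 32 looks like this:

batch_size = 32
macs_per_sample = 515               # total MACs of SimpleLinearModel for one input
total_macs = macs_per_sample * batch_size
print(total_macs)                   # 16480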

Conclusion

I hope you found this article insightful and valuable. If you enjoyed the content and wish to stay updated with future posts, please consider following me on LinkedIn. Your support motivates me to continue sharing knowledge. Thank you for reading!
