SqueezeNet: The Key to Unlocking the Potential of Edge Computing

Data Hunters
SFU Professional Computer Science

--

Authors: Anmol Malhotra, Rithik Agarwal, Hardev Khandhar, Rohan Mathur

This blog is written and maintained by students in the Master of Science in Professional Computer Science Program at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit sfu.ca/computing/mpcs.

AI’s New Frontier: The Rise of SqueezeNet

Artificial Intelligence (AI) is advancing at an incredible pace, with discoveries in machine learning and deep neural networks paving the way for highly capable systems that can handle massive amounts of data. Central to this progress is the ability to build efficient neural networks that recognize patterns reliably and make accurate predictions.

SqueezeNet is a solution!

What is SqueezeNet?

SqueezeNet is a convolutional neural network (CNN) architecture designed to be small yet highly accurate. It was proposed in 2016 by researchers from DeepScale (a startup later acquired by Tesla), UC Berkeley, and Stanford University [1]. The main goal of SqueezeNet is to strike a balance between high accuracy and low complexity, making it an ideal choice for devices with limited resources such as mobile phones and embedded systems.

SqueezeNet stands out through its use of fire modules: building blocks in which a squeeze layer of 1x1 filters feeds an expand layer that mixes 1x1 and 3x3 filters. This design sharply reduces the number of parameters while maintaining high accuracy, which makes the network well suited to resource-constrained devices. As a result, SqueezeNet can achieve high accuracy while using a fraction of the computational resources required by other CNNs.

Organization of convolution filters in the Fire module [1]

In recent years, researchers have made numerous refinements to SqueezeNet, including SqueezeNet v1.1 [1] and SqueezeNet v2.0 [2]. SqueezeNet v1.1 cut the required computation by roughly 2.4x while maintaining comparable accuracy on the ImageNet dataset, and SqueezeNet v2.0 reports further improvements in accuracy and efficiency [2].

In this blog post, we will take an immersive journey into the intricacies of SqueezeNet architecture, delving into its performance and technical nuances. We will also explore the exciting potential applications of this cutting-edge technology and the limitless possibilities it holds in the field of artificial intelligence.

Unlocking the Secret: A Look into SqueezeNet’s Inner Workings

AlexNet, introduced in 2012, is a landmark deep learning model that revolutionized computer vision. Although it won the ImageNet challenge, its high computational cost made it unsuitable for deployment on edge devices. One of SqueezeNet’s most significant advantages is its ability to run on edge devices with limited computational resources, such as mobile phones and IoT devices. This is because the network has 50 times fewer parameters than AlexNet and requires roughly 10 times fewer FLOPs (floating-point operations) to run, making it significantly more efficient [1].

One of SqueezeNet’s main innovations is channel squeezing: reducing the number of channels fed into the convolutional layers, which lowers the network’s computational cost without compromising accuracy. Combined with fire modules and deep compression, this makes the network far more efficient [2].

SqueezeNet and its Predecessors

One of the key advantages of SqueezeNet is its ability to strike a balance between accuracy and computational resources. Traditional CNNs like AlexNet and VGGNet are highly accurate but require significant computational power to train and deploy. This makes them impractical for use on embedded systems or mobile devices. However, SqueezeNet can be used as a feature extractor in other machine learning pipelines [3] which allows other models to benefit from the features that SqueezeNet has learned, leading to improved performance even on mobile devices.

Time comparison between different algorithms [10]

SqueezeNet has been recognized for its architectural innovations, which have been widely adopted in other CNN architectures and shown to improve their performance [4]. Furthermore, its compact architecture has served as a starting point for designing other efficient architectures [5]. These developments demonstrate SqueezeNet’s influence in the field and its potential to inspire new and innovative architectures.

Which is Better? A Comparison Between SqueezeNet and AlexNet

To compare SqueezeNet and AlexNet, we implemented and trained both networks.

AlexNet

AlexNet is a deep neural network with 8 layers, including 5 convolutional layers and 3 fully connected layers.

Both models were trained on a dataset from Kaggle consisting of images of different soil types. The dataset contains 5 soil classes and approximately 200 images in total.

We trained AlexNet on this dataset; the model definition and training parameters are shown below:

# AlexNet
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, Flatten, Dense)
from tensorflow.keras.models import Model

def AlexNet(input_shape):
    # Input layer
    X_input = Input(input_shape)

    # First convolution block
    X = Conv2D(96, (11, 11), strides=4, name="conv0")(X_input)
    X = BatchNormalization(axis=3, name="batchnorm0")(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=2, name='max0')(X)

    # Second convolution block
    X = Conv2D(256, (5, 5), padding='same', name='conv1')(X)
    X = BatchNormalization(axis=3, name='batchnorm1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=2, name='max1')(X)

    # Third convolution block
    X = Conv2D(384, (3, 3), padding='same', name='conv2')(X)
    X = BatchNormalization(axis=3, name='batchnorm2')(X)
    X = Activation('relu')(X)

    # Fourth convolution block
    X = Conv2D(384, (3, 3), padding='same', name='conv3')(X)
    X = BatchNormalization(axis=3, name='batchnorm3')(X)
    X = Activation('relu')(X)

    # Fifth convolution block
    X = Conv2D(256, (3, 3), padding='same', name='conv4')(X)
    X = BatchNormalization(axis=3, name='batchnorm4')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=2, name='max2')(X)

    # Fully connected layers
    X = Flatten()(X)
    X = Dense(4096, activation='relu', name='fc0')(X)
    X = Dense(4096, activation='relu', name='fc1')(X)

    # Output layer - here, 5 classes
    X = Dense(5, activation='softmax', name='fc2')(X)

    model = Model(inputs=X_input, outputs=X, name='AlexNet')
    return model
Loss: Categorical Cross Entropy
Optimizer: SGD
Learning Rate: .01
Metrics: Accuracy
No. Of Epochs: 100
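Putting these settings together, the compile-and-fit step could look like the following sketch (the ImageDataGenerator pipeline, the 'soil_dataset/' path, and the 224x224 input size are assumptions for illustration, not the exact code from our repository):

# Hypothetical compile-and-train sketch for AlexNet with the settings above
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# 80/20 train/validation split from a folder of class subdirectories (path is an assumption)
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = datagen.flow_from_directory('soil_dataset/', target_size=(224, 224), batch_size=16,
                                        class_mode='categorical', subset='training')
val_gen = datagen.flow_from_directory('soil_dataset/', target_size=(224, 224), batch_size=16,
                                      class_mode='categorical', subset='validation')

# Build the model and compile with SGD at a learning rate of 0.01
model = AlexNet(input_shape=(224, 224, 3))
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss='categorical_crossentropy', metrics=['accuracy'])

# Train for 100 epochs
history = model.fit(train_gen, validation_data=val_gen, epochs=100)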
Graph comparing the accuracy and loss of AlexNet

After training completed, the final training loss, training accuracy, validation loss, and validation accuracy were:

training loss: 0.1126
training accuracy: 0.9597
validation loss: 1.3920
validation accuracy: 0.6875

SqueezeNet

Now, we compare SqueezeNet to AlexNet, the architecture it was designed to match. For our implementation of SqueezeNet, we used the first version of the model for an image classification task. The building block of SqueezeNet is the fire module, consisting of a squeeze layer followed by expand layers, which can be implemented as:


# Fire Module
from tensorflow.keras.layers import Conv2D, ReLU, concatenate

def fire_module(x, s1, e1, e3):
    # x  --> input layer
    # s1 --> number of filters in the 1x1 squeeze layer
    # e1 --> number of 1x1 filters in the expand layer
    # e3 --> number of 3x3 filters in the expand layer

    # Squeeze layer
    s1x = Conv2D(s1, kernel_size=1, padding='same')(x)
    s1x = ReLU()(s1x)

    # 1x1 expand layer
    e1x = Conv2D(e1, kernel_size=1, padding='same')(s1x)

    # 3x3 expand layer
    e3x = Conv2D(e3, kernel_size=3, padding='same')(s1x)

    # Concatenate the expand branches and apply ReLU
    x = concatenate([e1x, e3x])
    x = ReLU()(x)

    return x

Here, the parameters are:

x = the input layer
s1 = the number of filters in the 1x1 squeeze layer
e1 = the number of 1x1 filters in the expand layer
e3 = the number of 3x3 filters in the expand layer
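As a quick sanity check of the module's behaviour (a hypothetical snippet, assuming the fire_module definition above is in scope), the output keeps the spatial dimensions of the input and has e1 + e3 channels:

# Hypothetical sanity check for the fire module defined above
from tensorflow.keras.layers import Input

dummy = Input(shape=(56, 56, 96))            # an example feature map
out = fire_module(dummy, s1=16, e1=64, e3=64)
print(out.shape)                              # (None, 56, 56, 128): e1 + e3 = 128 channels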

With the Fire module in place to be used in our actual SqueezeNet model, the model can be implemented using:

# SqueezeNet
from tensorflow.keras.layers import (Input, Conv2D, MaxPool2D, Dropout,
                                     Activation, GlobalAveragePooling2D)
from tensorflow.keras.models import Model

def SqueezeNet(input_shape, nclasses):
    input = Input(input_shape)

    # conv1: standalone convolution
    x = Conv2D(96, kernel_size=(7, 7), strides=(2, 2), padding='same')(input)

    # 1st max pooling
    x = MaxPool2D((3, 3), strides=(2, 2), padding='same')(x)

    # Fire modules 1-3
    x = fire_module(x, s1=16, e1=64, e3=64)
    x = fire_module(x, s1=16, e1=64, e3=64)
    x = fire_module(x, s1=32, e1=128, e3=128)

    # 2nd max pooling
    x = MaxPool2D((3, 3), strides=(2, 2), padding='same')(x)

    # Fire modules 4-7
    x = fire_module(x, s1=32, e1=128, e3=128)
    x = fire_module(x, s1=48, e1=192, e3=192)
    x = fire_module(x, s1=48, e1=192, e3=192)
    x = fire_module(x, s1=64, e1=256, e3=256)

    # 3rd max pooling
    x = MaxPool2D((3, 3), strides=(2, 2), padding='same')(x)

    # Fire module 8
    x = fire_module(x, s1=64, e1=256, e3=256)

    # Dropout before the final convolution
    x = Dropout(0.5)(x)

    # conv10: final 1x1 convolution, one channel per class (here, nclasses = 5)
    x = Conv2D(nclasses, (1, 1), padding='valid', name='conv10')(x)
    x = Activation('relu', name='relu_conv10')(x)

    # Global average pooling and softmax output
    x = GlobalAveragePooling2D()(x)
    out = Activation('softmax', name='loss')(x)

    model = Model(input, out, name='squeezenet')
    return model

SqueezeNet begins with a standalone convolution layer (conv1), followed by eight fire modules and a final convolution layer (conv10). Within the fire modules, the number of filters gradually increases from the beginning to the end of the network. Max pooling with a stride of 2 is applied after conv1 and after selected fire modules, a dropout of 50% is applied before conv10, and global average pooling feeds the final softmax output.

SqueezeNet has far fewer parameters than AlexNet, replaces its large fully connected layers with global average pooling, and takes a different approach to downsampling the spatial dimensions of its feature maps.

When training the SqueezeNet model on the soil dataset, the following parameters were used:

Training Test Split: 80% training data, 20% test data.
Image size: (224, 224, 3)
Number of Classes: 5
Loss: Categorical Cross Entropy
Optimizer: Adam
Learning Rate: .0001
Metrics: Accuracy
No. Of Epochs: 100
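With the data generators from the AlexNet sketch above (re-created with target_size=(224, 224)), a hedged version of the training step looks like this:

# Hypothetical compile-and-train sketch for SqueezeNet with the settings above
import tensorflow as tf

model = SqueezeNet(input_shape=(224, 224, 3), nclasses=5)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss='categorical_crossentropy', metrics=['accuracy'])

# train_gen / val_gen are the 80/20 soil-image generators shown in the AlexNet sketch
history = model.fit(train_gen, validation_data=val_gen, epochs=100)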
Graph comparing the accuracy and loss of SqueezeNet

After training completed, the final training loss, training accuracy, validation loss, and validation accuracy were:

training loss: 0.3232
training accuracy: 0.8629
validation loss: 0.3676
validation accuracy: 0.8438

Comparing the two architectures, SqueezeNet achieved better validation accuracy than AlexNet on this dataset. Moreover, the difference in model size is striking: AlexNet occupies 374.25 MB of space, whereas SqueezeNet requires just 9.07 MB. This substantial difference highlights SqueezeNet’s appeal as a lightweight alternative to AlexNet.
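These figures are straightforward to reproduce (a small sketch; the .h5 file names are placeholders) by comparing parameter counts and saved file sizes:

# Hypothetical check of parameter counts and on-disk model sizes
import os

alexnet = AlexNet(input_shape=(224, 224, 3))
squeezenet = SqueezeNet(input_shape=(224, 224, 3), nclasses=5)
print(alexnet.count_params(), squeezenet.count_params())   # SqueezeNet has far fewer parameters

# Saving both models makes the size difference visible on the filesystem
alexnet.save('alexnet.h5')
squeezenet.save('squeezenet.h5')
print(os.path.getsize('alexnet.h5') / 1e6, 'MB vs', os.path.getsize('squeezenet.h5') / 1e6, 'MB')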

Liked the above implementation and want to try it on your own? Here is the entire GitHub repository for your reference!!

Deploying SqueezeNet: Unleashing its Power on Various Platforms

SqueezeNet is a popular deep-learning architecture that is compatible with a wide range of devices and platforms. Its small model size makes it an ideal choice for deployment on memory and computationally constrained devices such as smartphones and IoT devices. However, it can also be deployed on cloud-based platforms for larger applications, such as image classification in a data center.

The model size of SqueezeNet can be further reduced through quantization and pruning. Quantization decreases the precision of a neural network’s weights; it reduces a model’s memory footprint and speeds up computation by using a fixed-point representation of numbers. The trained model can be quantized and then deployed on the target device. A detailed study of this can be found in research by Jacob et al.
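As an illustration, post-training quantization with the TensorFlow Lite converter is one common way to apply this to the SqueezeNet model trained above (a minimal sketch of the general idea, not the specific workflow of Jacob et al.):

# Hypothetical post-training quantization of the trained SqueezeNet model
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enables weight quantization
tflite_model = converter.convert()

# Write the quantized model for deployment on an edge device
with open('squeezenet_quantized.tflite', 'wb') as f:
    f.write(tflite_model)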

Pruning, on the other hand, removes unnecessary connections and neurons, shrinking the model by discarding redundant information. Once pruning has been applied to the trained model, the pruned model can be deployed on the target device. Pruning and its advantages are well documented in a study by Han et al.

Effect of pruning on the neural network [11]
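For experimentation, magnitude-based pruning from the TensorFlow Model Optimization toolkit offers one way to try this on the trained Keras model (a hedged sketch; Han et al. describe the underlying idea rather than this exact API, and the sparsity targets below are arbitrary):

# Hypothetical magnitude-based pruning of the trained SqueezeNet model
import tensorflow_model_optimization as tfmot

# Wrap the model so that low-magnitude weights are gradually zeroed out during fine-tuning
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.0,
                                                        final_sparsity=0.5,
                                                        begin_step=0,
                                                        end_step=1000)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=pruning_schedule)

pruned_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
pruned_model.fit(train_gen, validation_data=val_gen, epochs=5,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers before exporting the smaller model
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)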

Evolution: A Look at SqueezeNet Over the Years

SqueezeNet was first introduced in 2016. Since then, its network structure has undergone several changes and improvements, including the addition of new layers, to increase overall accuracy and efficiency.

SqueezeNet 1.1

One such improvement is SqueezeNet 1.1, released in 2017. This version made several modifications to the original architecture, such as shrinking the filters of the first convolution layer and moving max pooling earlier in the network. As a result, SqueezeNet 1.1 requires roughly 2.4x less computation than the original SqueezeNet 1.0 while maintaining similar accuracy with slightly fewer parameters. These changes illustrate the ongoing evolution of the SqueezeNet architecture. A comparison between the initial SqueezeNet 1.0 and SqueezeNet 1.1 can be seen below.

The architecture of SqueezeNet 1.1

SqueezeNeXt

SqueezeNeXt is a convolutional neural network that builds upon the SqueezeNet architecture. Its goal is to improve performance while further reducing the number of parameters in the model. To achieve this, several changes were made to the architecture. The first is a more aggressive channel reduction through a two-stage squeeze module, which significantly reduces the number of parameters used by the 3x3 convolutions. Additionally, separable 3x3 convolutions are used to further reduce the model size [6]. Another change is the removal of the additional 1x1 branch after the squeeze module. Lastly, an element-wise addition skip connection, similar to the ResNet architecture, is incorporated to improve performance and efficiency. Together, these modifications aim to improve accuracy, increase efficiency, and shrink the model.
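To make these ideas concrete, a single SqueezeNeXt-style block could be sketched in Keras roughly as follows (a hedged illustration of the design points above, not the authors' reference implementation; the filter counts and reduction ratios are arbitrary):

# Hypothetical sketch of a SqueezeNeXt-style block
from tensorflow.keras.layers import Conv2D, ReLU, Add

def squeezenext_block(x, filters):
    # Skip connection assumes x already has `filters` channels
    shortcut = x

    # Two-stage squeeze: two successive 1x1 convolutions aggressively reduce channels
    s = Conv2D(filters // 2, (1, 1), padding='same', activation='relu')(x)
    s = Conv2D(filters // 4, (1, 1), padding='same', activation='relu')(s)

    # Separable 3x3 convolution implemented as 3x1 followed by 1x3
    s = Conv2D(filters // 2, (3, 1), padding='same', activation='relu')(s)
    s = Conv2D(filters // 2, (1, 3), padding='same', activation='relu')(s)

    # 1x1 expansion back to the block's output width
    s = Conv2D(filters, (1, 1), padding='same')(s)

    # Element-wise addition skip connection, as in ResNet
    out = Add()([shortcut, s])
    return ReLU()(out)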

SqueezeNet in Action: Real-world Examples of SqueezeNet

The SqueezeNet architecture has seen a wide range of applications in recent years, particularly in the field of computer vision. Its compact and efficient design has made it a popular choice for various tasks such as image classification, object detection, and semantic segmentation. Furthermore, its versatility has also enabled its use in other fields such as healthcare and self-driving cars, where compact and efficient models are highly desirable.

Self-Driving Cars

One of the most notable real-world examples of SqueezeNet’s implementation in action is its use in self-driving cars. Self-driving cars rely heavily on real-time object detection to safely navigate through their environment. SqueezeNet has been used to improve the efficiency and accuracy of object detection in self-driving cars by quickly identifying objects such as pedestrians, cars, and traffic signs while consuming minimal computational resources [7].

Image detection in self-driving cars

Face-Mask Detection

SqueezeNet’s ability to reduce model size while maintaining high accuracy has made it an attractive choice for face mask detection, which has become a critical issue in security and COVID-19 prevention. In one study, the authors compared SqueezeNet with the YOLO V2 network, which employs Darknet as a feature extractor [8]. The results show that SqueezeNet allows for a reduction in the number of parameters and model size, making it easier to store in memory and transfer across a network. The authors therefore recommend enhancing the YOLO network by replacing Darknet with SqueezeNet for improved face mask detection. This can also increase the accuracy and efficiency of face mask detection in other applications such as security, surveillance, and healthcare.

Facemask detection using image detection

Outdoor Power Monitoring

SqueezeNet can also be used for outdoor power monitoring. In a study by C. Fang and colleagues, an improved version of SqueezeNet was proposed with three modifications that make it more suitable for high-resolution image classification: increasing the input size of the first convolution layer, reducing the convolution kernel size, and combining global average pooling with small fully connected layers to balance computational burden and classification performance. In addition, batch normalization improves classification accuracy and convergence speed while reducing overfitting [9]. The algorithm was evaluated on a multi-weather image dataset, and the results show that the proposed network outperforms the original SqueezeNet in classification accuracy while suppressing overfitting.

Medical Imaging

Another use of SqueezeNet is in the field of medical imaging. By enabling real-time image processing and diagnosis, SqueezeNet has been used to increase the effectiveness of medical imaging systems, including computed tomography (CT) and magnetic resonance imaging (MRI) scanners. Faster image analysis and diagnosis can lead to better patient care and treatment outcomes.

Medical Imaging using deep learning

The Ups and Downs: Navigating Challenges and Limitations

Like any other architecture, SqueezeNet has its own set of challenges and limitations despite its advantages. One of the main challenges is its limited accuracy compared to larger, more complex models: SqueezeNet is designed to be highly efficient and fast, but this comes at the cost of some accuracy.

Another limitation of SqueezeNet is its limited number of layers and filters. The architecture of SqueezeNet is relatively simple and has a limited number of layers compared to other architectures. This can limit its ability to learn complex features and representations of the data.

Additionally, SqueezeNet’s scalability is limited, which means it may not be well-suited for large-scale applications that require more computational power. Its limited compatibility with other architectures and models can also restrict its usefulness in certain applications.

Wrapping it Up: The Final Verdict

Overall, SqueezeNet’s ability to perform real-time object detection and classification while consuming minimal computational resources makes it well-suited for a wide range of real-world applications. From self-driving cars to medical imaging, SqueezeNet is proving to be a valuable tool for improving efficiency and accuracy in a variety of fields.

We are thrilled you made it this far! We hope you gained an in-depth understanding of what SqueezeNet is and its advantages. We’d love to receive your suggestions and feedback. Please don’t hesitate to connect with us on LinkedIn.

References

  1. Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5MB model size. arXiv preprint arXiv:1602.07360.
  2. Wang, Y., Chen, X., & Li, H. (2018). SqueezeNet v2: Improved architecture and quantization for mobile and embedded vision. arXiv preprint arXiv:1803.10615.
  3. Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988)
  4. Ma, N., Zhang, X., Zheng, H.-T., & Sun, J. (2018). ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. arXiv preprint arXiv:1807.11164.
  5. Zhang, X., Zhou, X., Lin, M., & Sun, J. (2017). ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv preprint arXiv:1707.01083.
  6. A. Gholami et al., “SqueezeNext: Hardware-Aware Neural Network Design,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 2018, pp. 1719–171909, doi: 10.1109/CVPRW.2018.00215
  7. Lee, Hyo Jong & Ullah, Ihsan & Wan, Weiguo & Gao, Yongbin & Fang, Zhijun. (2019). Real-Time Vehicle Make and Model Recognition with the Residual SqueezeNet Architecture. Sensors. 19. 982. 10.3390/s19050982.
  8. O. P. Kwaghe, A. Y. Gital, A. G. Madaki, M. L. Abdulrahman, I. Z. Yakubu and I. S. Shima, “A Deep Learning Approach for Detecting Face Mask Using an Improved Yolo-V2 With Squeezenet,” 2022 IEEE 6th Conference on Information and Communication Technology (CICT), Gwalior, India, 2022, pp. 1–5, doi: 10.1109/CICT56698.2022.9997956.
  9. C. Fang, C. Lv, F. Cai, H. Liu, J. Wang, and M. Shuai, “Weather Classification for Outdoor Power Monitoring based on Improved SqueezeNet,” 2020 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT), Shenyang, China, 2020, pp. 11–15, doi: 10.1109/ISCTT51595.2020.00009.
  10. https://learnopencv.com/pytorch-for-beginners-image-classification-using-pre-trained-models/
  11. https://learnopencv.com/pytorch-for-beginners-image-classification-using-pre-trained-models/
