A Reduced Order Approach for the problem of Object Recognition

Laura Meneghetti
SISSA mathLab
Feb 5, 2023

In recent decades, Artificial Intelligence (AI) has attracted growing interest thanks to its capability of letting machines reproduce abilities of the human mind. This power has found application in many contexts, such as healthcare, robotics, autonomous vehicles, facial recognition, voice assistants, and gaming.

A challenging and fascinating problem is that of vision, or rather the integration of vision systems into commonly used objects, from cars to smartphones and professional machines. The key ingredient for implementing this feature is Artificial Neural Networks (ANNs), powerful models able to solve difficult tasks such as those mentioned above. The complexity of these problems leads to deep ANNs, characterised by a huge number of parameters, that require long training times, large computational power and memory storage. A practical application in embedded vision devices may thus be infeasible, making it necessary to create a well-performing, light-weight version of the original model.

Source: https://www.viatech.com/en/2018/05/history-of-artificial-intelligence/

Object Recognition: the model

From a more technical point of view, the aforementioned topics fall under what is called Object Recognition: the problem of identifying, classifying and locating what is present in an image or a video. As already introduced, a solution to this kind of problem is represented by ANNs, and in particular by Convolutional Neural Networks (CNNs), a special type of ANN designed for this purpose.

A schematic representation of a Convolutional Neural Network (CNN).

The structure of a CNN can be mainly subdivided into two parts:

  • a feature learning part, responsible for detecting several features of the objects, such as shapes, colors and vertical lines, thanks to the presence of filters (convolutional kernels);
  • a classification part, which uses the features extracted by the previous block to differentiate between items and obtain the final classification.
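These two parts can be sketched in plain NumPy. This is only a toy illustration of the idea, not one of the architectures discussed later: the edge filter, the layer sizes and the number of classes are all arbitrary choices.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide the filter over the image."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))                   # toy 8x8 grayscale input

# Feature-learning part: a vertical-edge filter followed by a ReLU
vertical_edge = np.array([[1., 0., -1.],
                          [1., 0., -1.],
                          [1., 0., -1.]])
features = np.maximum(conv2d(image, vertical_edge), 0.0)   # 6x6 feature map

# Classification part: flatten the features and apply a dense layer
W = rng.random((2, features.size))           # 2 hypothetical classes
logits = W @ features.flatten()              # one score per class
```

In a real CNN the filters are not hand-crafted as here, but learned during training, and many convolutional and dense layers are stacked.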

Practical application in embedded systems

In order to place a CNN inside an embedded device, we first need to let the model learn the task it has to solve, so as to obtain a well-performing network. The fundamental steps needed to achieve this are summarized in the figure below for the sake of clarity.

Key steps for the development of an AI model.

The first key ingredient is the dataset, i.e., the data from which the model learns. There are two possible choices, depending also on the goal of the project: a benchmark dataset or a custom one. The latter is the common case in the industrial field, where the data need not be general but specific to the problem under consideration. In the research framework, instead, the interest is slightly different, more connected to comparing the developed network with the state of the art, which is done using benchmark datasets.

The second fundamental choice is the model. In the beginning, it is typical to use an already implemented CNN model, created to solve a particular task, but this can also be the starting point for developing a new architecture. The careful selection of the model and the preparation of the data, so that they have the correct structure for that model, are thus the core of the second step.

The third phase is crucial because the accuracy of the final model depends on it: it is here that learning takes place, through the training of the CNN. We provide some input-output pairs and we want the model to learn the association rule existing between them. Once a trained model has been obtained, we should use the acquired knowledge to understand how accurate our CNN is in making predictions on new samples. We are thus testing the generalization capability of our algorithm, namely whether it can solve the same problem on different inputs, never seen before. At the end of this phase, we should have a well-performing CNN for the required task; otherwise, an additional fine-tuning of the CNN parameters is needed.
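The train-then-test workflow can be illustrated on a deliberately tiny example. Here the "association rule" is a made-up linear map y = 3x − 1, and the model is a single weight and bias trained by gradient descent on a mean squared error; a CNN follows the same scheme with many more parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical association rule the model should learn: y = 3*x - 1
x_train = rng.random((100, 1)); y_train = 3 * x_train - 1
x_test  = rng.random((20, 1));  y_test  = 3 * x_test - 1   # unseen samples

w, b = 0.0, 0.0                      # model parameters
lr = 0.5                             # learning rate
for epoch in range(500):             # training: minimise mean squared error
    pred = w * x_train + b
    grad_w = 2 * np.mean((pred - y_train) * x_train)
    grad_b = 2 * np.mean(pred - y_train)
    w -= lr * grad_w
    b -= lr * grad_b

# Testing: evaluate generalisation on inputs never seen before
test_error = np.mean((w * x_test + b - y_test) ** 2)
# w and b end up close to 3 and -1, and test_error close to 0
```

If the test error were large while the training error was small, the model would be overfitting, and that is exactly the situation the fine-tuning mentioned above tries to correct.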

The last step coincides with the practical application of our algorithm to solve a real-world problem in the industrial field. In that context, the network has to be placed inside small devices or embedded systems, which do not have the same computational power as the server on which we trained our model.

A reduced model for embedded systems

As already introduced, the final application in an embedded system may not be feasible, due to the constraints characterizing the device itself. For example, there can be strict memory constraints connected with restricted resources and limited hardware. This has led researchers to develop techniques to overcome this difficulty by creating light-weight models with fewer parameters but performance similar to that of the original network.

Schematic representation of the reduction method proposed in [3, 4].

A possible solution is the development of a reduction technique based on methods commonly employed in the context of Model Order Reduction, such as Proper Orthogonal Decomposition (POD) [1] and Active Subspaces [2]. Starting from the original neural network, we create a compressed version of it by analysing, with the aforementioned methods, the important information stored in its structure [3, 4]. Only some layers of the full model are kept, while the remaining ones are discarded and replaced by a reduction layer, implementing the parameter compression, and an input-output layer producing the final expected output. The resulting ANN has fewer layers and thus fewer parameters, providing a reduction in the required memory and in the training time, with a comparable level of accuracy. Thanks to these improvements, the final application of this kind of model in an embedded system becomes more feasible, also leading to faster (almost real-time) reliable predictions.
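In spirit, the POD step can be sketched as follows. This is a simplified illustration on synthetic data, not the actual pipeline of [3, 4]: the matrix of "snapshots" stands for the intermediate activations collected at the last retained layer, and the singular value decomposition provides the low-dimensional basis used by the reduction layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations at the cut: 200 input samples ("snapshots"),
# each producing a 512-dimensional feature vector.
snapshots = rng.random((200, 512))
mean = snapshots.mean(axis=0)

# POD: the leading right singular vectors of the centred snapshot matrix
# form a basis capturing most of the variance of the features.
U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
rank = int(np.searchsorted(energy, 0.99)) + 1   # keep 99% of the "energy"

basis = Vt[:rank]                        # (rank, 512): the reduction layer
reduced = (snapshots - mean) @ basis.T   # compressed features, fed to the
                                         # final input-output layer
```

The compression comes from replacing all the discarded layers with the single projection `basis`, whose size is controlled by the retained energy threshold.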

Test cases: benchmark datasets for the problem of Object Recognition

We have created reduced versions of CNNs for both image recognition and object detection, using VGG-16 and SSD300 as models, respectively. We have then tested our methodology against datasets commonly employed in these fields, such as CIFAR-10 and ImageNet.

In both cases, the compressed neural networks reduce the space required for their storage by 90% [3, 4, 5], addressing the memory constraints that often characterise embedded devices. On the other hand, the accuracy of the compressed models is comparable to, or only slightly lower than, that of the original ones, while requiring less training and inference time to obtain performing networks. The pictures below compare the output images obtained with the full network and with our compressed version of it. It can be observed that the reduced model is able to correctly classify and detect the position of cats and dogs, even if the boxes surrounding the objects are not as accurate as in the full case.

Output image full CNN.
Output image reduced CNN.
Output image full CNN.
Output image reduced CNN.

References:

[1] J. S. Hesthaven, G. Rozza, and B. Stamm. Certified Reduced Basis Methods for Parametrised Partial Differential Equations (2015), SpringerBriefs in Mathematics. Springer International Publishing.

[2] Paul G. Constantine. Active Subspaces: Emerging Ideas for Dimension Reduction in Parameter Studies (2015), Society for Industrial and Applied Mathematics.

[3] L. Meneghetti, N. Demo and G. Rozza, A Dimensionality Reduction Approach for Convolutional Neural Networks (2021), arXiv:2110.09163.

[4] L. Meneghetti, N. Demo and G. Rozza, A Proper Orthogonal Decomposition Approach for Parameters Reduction of Single Shot Detector Networks (2022), IEEE International Conference on Image Processing (ICIP), pp. 2206–2210, doi: 10.1109/ICIP46.

[5] S. Zanin, L. Meneghetti, N. Demo and G. Rozza, Deep neural network compression via tensor decomposition, in preparation, 2023.
