Deep Learning for Human Activity Recognition

Khushil Modi
8 min read · Apr 4, 2022

--

Human Activity Recognition is a huge research field in which human activity is recognized with the help of the concepts, methods, and models of Deep Learning. In this article we will discuss one piece of research on Human Activity Recognition using wearable sensors, covering its literature review, the challenges involved, and the evaluation benchmark it proposes.

Abstract: Recognizing human activity is important for improving human-interaction applications in healthcare, personal fitness, and smart devices. Many papers have discussed various techniques for representing human activity, resulting in discernible progress. This article provides a comprehensive assessment of contemporary, high-performing approaches for recognizing human movement using wearable sensors. Because standardized evaluation was lacking, the authors applied a standardized evaluation benchmark to the state-of-the-art techniques using six publicly available datasets: MHealth, USCHAD, UTD-MHAD, WISDM, WHARF, and OPPORTUNITY, to assess them and ensure a fair comparison. Furthermore, they proposed an improved experimental strategy, a mix of better hand-crafted features and a neural network architecture, that outperformed the top-performing strategies under the same benchmark on the MHealth, USCHAD, and UTD-MHAD datasets.

Introduction: Human Activity Recognition (HAR) is a difficult topic aimed at predicting human motions via computer interaction. It improves people’s lives through a variety of uses. Human activity detection may be divided into two categories: video image-based recognition and wearable sensor-based recognition. In video systems, a camera is used to recognize human activities. This technique not only necessitates costly camera infrastructure deployments, but it also faces several issues with background, illumination, and scale, all of which make motion detection problematic. In the second technique, human activity detection based on wearable sensors such as barometers, accelerometers, gyroscopes, and others translates motion into identifiable signals. It provides an alternate method of acquiring motion without the environmental restrictions of the video-based method, while also giving consumers privacy. However, activity recognition based on this approach has limitations in gathering enough information on all posture movements in the human body, which might have a detrimental impact on performance. For more precise capture of human gestures and improved performance in industrial applications, it is recommended to employ several input sensors. The focus of this research is on wearable sensor-based human activity tracking approaches. Despite significant advancements in this approach, the absence of systematic assessment makes it difficult to judge the quality of work in this field. The article makes the following contributions:

  1. Extensive literature search for contemporary, high-performing human activity approaches based on sensor data.
  2. It’s difficult to make a fair comparison across modern strategies because of differing assessment procedures. As a result, six publicly accessible datasets were used with three alternative temporal-window techniques: Full-Non-Overlapping, Semi-Non-Overlapping, and Leave-One-Trial-Out, to provide a uniform evaluation baseline for current efforts.
  3. Implementation, training, and re-assessment of recent literature work using the suggested standardized evaluation criteria to ensure a fair comparison of all methodologies.
  4. Proposal of an experimental hybrid strategy that combines increased feature extraction with neural networks, as well as assessment using the given evaluation benchmark criteria, resulting in competitive accuracy.

Datasets and their Preparation: There are two sorts of datasets in HAR: vision-based datasets and sensor-based datasets. KTH and Weizmann are two examples of vision-based datasets. Sensor-based datasets fall into four categories: object sensors, body-worn sensors, hybrid sensors, and ambient sensors. Object sensor datasets include the Van Kasteren benchmark and Ambient Kitchen, body-worn sensor datasets include UCI-HAR and WISDM, OPPORTUNITY is a hybrid sensor dataset, and AAL is an ambient sensor dataset. For the article, experiments are conducted on these datasets: MHealth, USC-HAD, UTD-MHAD, WISDM, WHARF, and OPPORTUNITY.

Raw data must be processed into labeled samples before being fed to the model. During this preparation procedure, the data is separated into equal-sized small windows, also known as temporal windows. The temporal windows are then split into training and testing sets. If temporal windows overlap, content can leak between training and testing, which biases the evaluation. For a fair experimental evaluation, there are three ways of generating temporal windows (a minimal sketch of the first two follows the list):

  1. Full-Non-Overlapping Window is a generating approach that ensures that no temporal windows overlap.
  2. With a 50% overlap between consecutive temporal windows, Semi-Non-Overlapping Window is an alternative to Full-Non-Overlapping Window for sample generation. Unlike the Full-Non-Overlapping technique, this method generates a large number of samples. However, because of the overlapping content between training and testing, the results can be skewed.
  3. Leave-One-Trial-Out is a novel sample generation strategy. A trial is a single subject’s raw activity signal. It ensures unbiased evaluation and a proper number of samples: trials with the same raw signals are not duplicated across the training and testing sets.
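
To make the first two strategies concrete, below is a minimal NumPy sketch of temporal-window generation. The window size, signal length, and majority-vote labeling are illustrative assumptions, not the paper’s exact settings; Leave-One-Trial-Out would instead split whole trials between training and testing before windowing.

```python
import numpy as np

def make_windows(signal, labels, window_size=128, overlap=0.0):
    """Slice a signal of shape (n_timesteps, n_channels) into fixed-size windows.

    overlap=0.0 gives Full-Non-Overlapping windows;
    overlap=0.5 gives Semi-Non-Overlapping (50% overlap) windows.
    """
    step = int(window_size * (1.0 - overlap))
    windows, window_labels = [], []
    for start in range(0, len(signal) - window_size + 1, step):
        windows.append(signal[start:start + window_size])
        # Label each window by the majority label of its timesteps.
        values, counts = np.unique(labels[start:start + window_size],
                                   return_counts=True)
        window_labels.append(values[np.argmax(counts)])
    return np.stack(windows), np.array(window_labels)

# Example: 10 seconds of 3-axis accelerometer data sampled at 50 Hz.
signal = np.random.randn(500, 3)
labels = np.zeros(500, dtype=int)
X_full, _ = make_windows(signal, labels, overlap=0.0)  # no overlap
X_semi, _ = make_windows(signal, labels, overlap=0.5)  # 50% overlap
print(X_full.shape, X_semi.shape)  # (3, 128, 3) (6, 128, 3)
```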

Literature Review:

i. Hand-Crafted Method: Instead of employing deep learning, a traditional machine learning approach is used. The WISDM dataset was used by Kwapisz, who extracted features per sensor reading. To find the best classifier, the authors compared three options: Multi-Layer Perceptron (MLP), J48, and logistic regression. MLP outperformed the other classifiers, with an accuracy of 91.7 percent. A sketch of this style of pipeline appears below.
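
As a rough illustration of the hand-crafted pipeline, the sketch below computes simple per-channel statistics for each window and fits classical classifiers. The feature set is an assumption, not Kwapisz’s exact features, and scikit-learn’s DecisionTreeClassifier stands in for Weka’s J48 (C4.5).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier  # stand-in for J48 (C4.5)

def extract_features(windows):
    """windows: (n_windows, window_size, n_channels) -> per-channel statistics."""
    return np.concatenate([
        windows.mean(axis=1),  # mean per channel
        windows.std(axis=1),   # standard deviation per channel
        windows.min(axis=1),   # minimum per channel
        windows.max(axis=1),   # maximum per channel
    ], axis=1)

# Dummy data: 200 windows of 3-axis accelerometer data, 6 activity classes.
X = extract_features(np.random.randn(200, 128, 3))
y = np.random.randint(0, 6, size=200)

for clf in (MLPClassifier(max_iter=500),
            DecisionTreeClassifier(),
            LogisticRegression(max_iter=500)):
    clf.fit(X, y)
    print(type(clf).__name__, "train accuracy:", clf.score(X, y))
```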

ii. CNN-Based Methods: Ha and Choi introduced two CNN models: CNN-pf and CNN-pff. The mean accuracy for CNN-pf was 91.33%, and for CNN-pff, 91.94%. Burns and Whyne developed two alternative models: the FCN (Fully Convolutional Network) and the PTN (Personalized Triplet Network). PTN had the best outcomes, with MHealth 99.9% ± 0.003, WISDM 91.3% ± 0.053, and SPAR 99.0% ± 0.017.

iii. LSTM-CNN Method: The model achieved 95.80% on UCI-HAR, 95.75% on WISDM, and 92.63% on the OPPORTUNITY gesture recognition task.

iv. CNN-LSTM Method: The CNN-LSTM-ELM model achieved 91.8% accuracy for gesture recognition, while the CNN-LSTM-Fully-Connected model achieved 89.7%. A minimal sketch of a CNN-LSTM pipeline follows.
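
For intuition, here is a minimal Keras sketch in the spirit of these CNN-LSTM pipelines: 1D convolutions extract local motion features and an LSTM models their temporal order. The layer sizes and the plain softmax head are assumptions, not the exact architectures (e.g., the ELM head) from the cited work.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(window_size=128, n_channels=3, n_classes=6):
    model = models.Sequential([
        layers.Input(shape=(window_size, n_channels)),
        layers.Conv1D(64, kernel_size=5, activation="relu"),  # local features
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.LSTM(128),                                     # temporal modeling
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn_lstm()
model.summary()
```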

Proposed Hybrid Approach: The proposed design consists of three dense layers, followed by a softmax over the number of activity categories in the dataset. Our design is made up of a fully connected layer of size 128, another fully connected layer of size 64, and a third of size 32. Adam is used as the optimizer with a batch size of 16, and Leaky-ReLU as the activation. For each sample window in the dataset, the 12 hand-crafted features are computed. Then, using the extracted features as input, our lightweight neural network is trained. The neural network learns the hidden characteristics and tunes its weights to reach a higher recognition accuracy than other traditional machine learning algorithms. In comparison to existing strategies, our suggested hybrid strategy is stable and lightweight.
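
A sketch of that classifier in Keras might look like the following. The 128-64-32 layout, Leaky-ReLU activations, Adam optimizer, and batch size of 16 come from the text; the number of classes and the exact 12 hand-crafted input features are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_hybrid_mlp(n_features=12, n_classes=6):
    model = models.Sequential([
        layers.Input(shape=(n_features,)),  # 12 hand-crafted features per window
        layers.Dense(128), layers.LeakyReLU(),
        layers.Dense(64), layers.LeakyReLU(),
        layers.Dense(32), layers.LeakyReLU(),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_hybrid_mlp()
# model.fit(features, labels, batch_size=16, epochs=250)  # per the text
```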

Results:

Lyu had the highest accuracy compared to our model, with an 11 percent difference in accuracy. On the OPPORTUNITY dataset, Table 6 shows mean accuracy results using the Hold-Out validation approach (Semi-Non-Overlapping). The impact of hyperparameter tuning is also investigated: Proposed Approach V1, which is trained for 250 epochs, came in second with 86.24 percent accuracy when compared to the other techniques.

The effect of training our model for different numbers of epochs (Proposed Approach V1 and V2 via Hold-Out validation) is investigated in Tables 7, 8, and 9. For the three generation approaches, the findings are reported on the MHealth, USCHAD, UTD-1, UTD-2, WHARF, and WISDM datasets, respectively. In comparison to the K-Folds validation results shown in Tables 2, 3, and 4, the model’s overall performance has improved. We believe this is because the model can optimize its parameters over the full training dataset when the hold-out validation approach is used.

In our tests, we looked at how well our suggested method performed not just in terms of recognition accuracy, but also in terms of time complexity. We measured how long our suggested strategy takes to extract the hand-crafted features described in Sect. 4.1 per sample window under the standardized benchmark above. The time to extract the features per sample varies between 0.008 s and 0.03 s. Because of MHealth’s large temporal window, we believe it takes longer to extract features there. A minimal timing sketch follows.
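
For reference, per-window extraction time can be measured along these lines; the feature function here is a simple stand-in, not the paper’s 12-feature set.

```python
import time
import numpy as np

def extract_features(window):
    # window: (window_size, n_channels) -> simple per-channel statistics
    return np.concatenate([window.mean(0), window.std(0),
                           window.min(0), window.max(0)])

windows = np.random.randn(1000, 128, 3)
start = time.perf_counter()
features = [extract_features(w) for w in windows]
elapsed = time.perf_counter() - start
print(f"avg extraction time per window: {elapsed / len(windows):.5f} s")
```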

Conclusion: A detailed literature overview of contemporary, high-performing techniques in human activity recognition using wearable sensors is covered in this paper. Recent approaches are reimplemented and reevaluated under our standardized benchmark, using three data sample generation strategies so that all methods follow the same experimental setting for a fair assessment, given the lack of standardized evaluation. Six open-source datasets were used in our investigations. For human activity recognition, a hybrid experimental technique is presented: our feature engineering is used to extract features, followed by a 3-layer neural network design. For the MHealth, USCHAD, UTD-1, and UTD-2 datasets, our experimental findings show that the suggested hybrid method has significant generalization ability and high recognition accuracy, outperforming all state-of-the-art techniques. Future research should look at the influence of datasets with low sample rates and large numbers of activities, such as WHARF and WISDM. More features should be added to our feature extraction technique, and our neural network approach should be hyper-tuned for better recognition of human activity.

References: https://link-springer-com.libaccess.sjlibrary.org/content/pdf/10.1007%2F978-981-16-0575-8.pdf
