Human Activity Recognition using CNN & LSTM

Chamani Shiranthika
Feb 14


If you are a working mother or father, you may wonder what your small kid is doing at home or at the day care centre! If you are an adult, you may be wondering what your old mother or father is doing at home, and whether he or she is safe!!!

Here comes Human Activity Recognition..!!

Recent advancements in Artificial Intelligence (AI) have made researchers more inclined towards novel research aims in recognizing objects, learning the environment, analyzing time series and predicting forthcoming sequences. Nowadays there is a growing interest among AI researchers in Recurrent Neural Networks (RNNs), which power a wide range of applications in speech recognition, language modeling, video processing and time series analysis. Human Activity Recognition (HAR) is one of the challenging problems that seeks answers in this wonderful AI field. Combined with technologies like the Internet of Things (IoT), it can serve as an assistive technology for eldercare and childcare. This article presents an approach to predicting human activities using a CNN and Long Short-Term Memory (LSTM) network on the basis of the UCI HAR dataset.

Human Activity Recognition is the process of identifying, analyzing and interpreting the actions and goals that one or more agents or persons are performing. The decisions are made based on their previously observed actions and behavior.

One, Two, Three, Action !!!!

Dataset

The dataset used in this system is the standard Human Activity Recognition (HAR) dataset, also known as the ‘Activity Recognition using smartphones’ dataset, which was made available in 2012 and can be downloaded from the UCI Machine Learning Repository. It contains 10,299 instances. The data was collected from 30 persons aged between 19 and 48, each performing six standard activities: walking, walking upstairs, walking downstairs, sitting, standing and laying. Each person performed this sequence of activities twice, once with the device on their left-hand side and once with the device on their right-hand side. A single sensor device, worn at the waist, was used: a Samsung Galaxy S II smartphone. Its accelerometer and gyroscope 3-axial raw signals (tAcc-XYZ and tGyro-XYZ) were captured at a 50 Hz sampling rate. The experiments were video-recorded so that the data could be labeled manually. The resulting dataset was randomly partitioned into two sets, with 70% of the volunteers selected for generating the training data and 30% for the test data.


CNN LSTM Architecture

The CNN LSTM architecture uses Convolutional Neural Network (CNN) layers for feature extraction on the input data, combined with LSTM layers to support sequence prediction. This model is also referred to as the Long-term Recurrent Convolutional Network (LRCN) model.

Methodology

In this article I present a CNN and LSTM hybrid approach, as shown in the following figure.

Fig. 1. System Design

Here a CNN LSTM architecture is used, in which the CNN layers perform feature extraction on the input data and the LSTM supports sequence prediction. The basic steps of constructing the CNN LSTM neural network are as follows.

1. Load Data

2. Fit and Evaluate Model

1. Load Data

The first step is loading the raw dataset into memory. There are three main signals in the raw data: total acceleration, body acceleration and body gyroscope, and each has 3 axes of data (x, y, z). Thus, there are a total of nine variables for each time step. Each series of data has further been partitioned into overlapping windows of 2.56 seconds, or 128 time steps. Therefore, each row of data has 128*9 = 1152 elements. Below is a sample of the data collected from the HAR dataset for the walking activity.

Fig. 2. Variations of walking activity

The output data is defined as an integer for the class number. These output labels are first one hot encoded so that the data is suitable for fitting a neural network multi-class classification model.

2. Fit and Evaluate Model

Here the train and test datasets are loaded, the model is fit on the training dataset, evaluated on the test dataset, and an estimate of the model’s performance is returned. The model is defined as a Keras Sequential model. First, the entire CNN model is wrapped in a ‘TimeDistributed’ layer. The extracted features are then flattened and fed to the LSTM; thus, a single LSTM hidden layer is added next. This is followed by a dropout layer intended to reduce overfitting of the model to the training data. Finally, a dense fully connected layer is used to interpret the features extracted by the LSTM hidden layer, before a final output layer is used to make predictions. The softmax activation function is used in the final layer since we need 6 outcomes as the result.

The efficient Adam version of stochastic gradient descent is used to optimize the network, and the categorical cross entropy loss function is used to calculate the loss in the training process.

The model is fit for 100 epochs with a batch size of 64 samples. Once the model is fit, it is evaluated on the test dataset and the accuracy of the fit model on the test dataset is returned.

Fig. 3. Organization of the layers in the network

Implementation

The input data is stored in CSV-like text files where the columns are separated by whitespace. Each of these files can be loaded as a NumPy array. The load_file() function below loads a dataset given the full path to the file and returns the loaded data as a NumPy array.
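The original code is in the linked notebook; the following is a minimal sketch of what load_file() can look like, using pandas to parse the whitespace-separated columns.

```python
from pandas import read_csv

def load_file(filepath):
    # header=None: the HAR signal files carry no column names;
    # delim_whitespace handles the space-separated columns
    dataframe = read_csv(filepath, header=None, delim_whitespace=True)
    return dataframe.values
```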

We can then load all data for a given group (train or test) into a single three-dimensional NumPy array.

The load_group() function below implements this behavior. The dstack() NumPy function allows us to stack each of the loaded 2D arrays into a single 3D array.
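A sketch of load_group(), assuming each file loads to a (samples, 128) array so that dstack() produces a (samples, 128, features) array:

```python
from numpy import dstack

def load_group(filenames, prefix=''):
    loaded = []
    for name in filenames:
        # each file is a 2D array of shape (samples, time steps)
        loaded.append(load_file(prefix + name))
    # stack along a new third axis: (samples, time steps, features)
    return dstack(loaded)
```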

The load_dataset_group() function below loads all input signal data and the output data for a single group using the consistent naming conventions between the directories.
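A possible load_dataset_group(), assuming the standard file names and the ‘Inertial Signals’ directory layout of the UCI HAR dataset:

```python
def load_dataset_group(group, prefix=''):
    filepath = prefix + group + '/Inertial Signals/'
    # nine signal files: total acceleration, body acceleration and
    # body gyroscope, each with x, y and z components
    filenames = []
    for signal in ['total_acc', 'body_acc', 'body_gyro']:
        for axis in ['x', 'y', 'z']:
            filenames.append(signal + '_' + axis + '_' + group + '.txt')
    X = load_group(filenames, filepath)
    # class labels for each window
    y = load_file(prefix + group + '/y_' + group + '.txt')
    return X, y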

The output data is defined as an integer for the class number. We must one hot encode these class integers so that the data is suitable for fitting a neural network multi-class classification model. We can do this by calling the to_categorical() Keras function.

The load_dataset() function below implements this behavior and returns the train and test X and y elements ready for fitting and evaluating the defined models.
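A minimal sketch of load_dataset(), assuming the dataset has been unpacked into a ‘HARDataset/’ directory; the class labels on disk are 1–6, so they are shifted to 0–5 before one hot encoding with to_categorical():

```python
from tensorflow.keras.utils import to_categorical

def load_dataset(prefix=''):
    # load the train and test groups
    trainX, trainy = load_dataset_group('train', prefix + 'HARDataset/')
    testX, testy = load_dataset_group('test', prefix + 'HARDataset/')
    # zero-offset the class values, then one hot encode them
    trainy = to_categorical(trainy - 1)
    testy = to_categorical(testy - 1)
    return trainX, trainy, testX, testy
```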

Fit and evaluate model

Now that we have the data loaded into memory ready for modeling, we can define, fit, and evaluate an LSTM model.

We can define a function named evaluate_model() that takes the train and test dataset, fits a model on the training dataset, evaluates it on the test dataset, and returns an estimate of the model’s performance.

The model is defined as a Sequential Keras model, for simplicity.

It is common to use two consecutive CNN layers followed by dropout and a max pooling layer, and that is the simple structure used in the CNN LSTM model here.

We will define the model as having a single LSTM hidden layer. This is followed by a dropout layer intended to reduce overfitting of the model to the training data. Finally, a dense fully connected layer is used to interpret the features extracted by the LSTM hidden layer, before a final output layer is used to make predictions.

The efficient Adam version of stochastic gradient descent will be used to optimize the network, and the categorical cross entropy loss function will be used given that we are learning a multi-class classification problem.

The model is fit for a fixed number of epochs, in this case 15, and a batch size of 64 samples will be used, where 64 windows of data will be exposed to the model before the weights of the model are updated.
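A sketch of evaluate_model() under the assumptions stated above: each 128-step window is split into 4 sub-sequences of 32 steps so the CNN, wrapped in TimeDistributed, extracts features from each sub-sequence before the LSTM reads the resulting feature sequence. The specific layer sizes (64 filters, kernel size 3, 100 LSTM units, dropout of 0.5) are illustrative choices, not necessarily the exact values used in the notebook.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, MaxPooling1D, Dropout, Flatten,
                                     TimeDistributed, LSTM, Dense)

def evaluate_model(trainX, trainy, testX, testy):
    verbose, epochs, batch_size = 0, 15, 64
    n_features, n_outputs = trainX.shape[2], trainy.shape[1]
    # reshape each 128-step window into 4 sub-sequences of 32 steps
    n_steps, n_length = 4, 32
    trainX = trainX.reshape((trainX.shape[0], n_steps, n_length, n_features))
    testX = testX.reshape((testX.shape[0], n_steps, n_length, n_features))
    model = Sequential()
    # two consecutive CNN layers, dropout and max pooling,
    # applied to every sub-sequence via TimeDistributed
    model.add(TimeDistributed(Conv1D(64, 3, activation='relu'),
                              input_shape=(None, n_length, n_features)))
    model.add(TimeDistributed(Conv1D(64, 3, activation='relu')))
    model.add(TimeDistributed(Dropout(0.5)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    # a single LSTM hidden layer followed by dropout
    model.add(LSTM(100))
    model.add(Dropout(0.5))
    # dense layer to interpret the LSTM features, softmax output for 6 classes
    model.add(Dense(100, activation='relu'))
    model.add(Dense(n_outputs, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)
    _, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=verbose)
    return accuracy
```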

Fig. 4. Model Summary

Once the model is fit, it is evaluated on the test dataset and the accuracy of the fit model on the test dataset is returned.


Summarize Results

We cannot judge the skill of the model from a single evaluation.

The reason for this is that neural networks are stochastic, meaning that a different specific model will result when training the same model configuration on the same data.

Therefore, we will repeat the evaluation of the model multiple times and then summarize the model’s performance across those runs. For example, we can call evaluate_model() a total of 10 times, resulting in a population of model evaluation scores that must be summarized.
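A small sketch of such a driver, assuming the load_dataset() and evaluate_model() functions defined above; the helper name run_experiment() is illustrative:

```python
from numpy import mean, std

def run_experiment(repeats=10):
    trainX, trainy, testX, testy = load_dataset()
    scores = []
    for r in range(repeats):
        score = evaluate_model(trainX, trainy, testX, testy) * 100.0
        print('>#%d: %.3f' % (r + 1, score))
        scores.append(score)
    # summarize the population of scores with mean and standard deviation
    print('Accuracy: %.3f%% (+/-%.3f)' % (mean(scores), std(scores)))
```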


Experimental results

The constructed model was run 5 times, checking the accuracy in each repetition. The reason for using 5 repeated training runs is that we cannot judge the skill of a model from a single evaluation: neural networks are stochastic, meaning that a different specific model results when training the same model configuration on the same data. After each run, the performance of the model was evaluated. Below are the plotted predictions with the corresponding accuracy measures after each run. The blue line shows the actual data and the red line shows the predicted data.

Fig. 5. Actual data(b) and predicted value(r) after 1st repetition

Accuracy: 90.15948422124194

Fig. 6. Actual data(b) and predicted value(r) after 2nd repetition

Accuracy: 93.17950458092976

Fig. 7. Actual data(b) and predicted value(r) after 3rd repetition

Accuracy: 93.34916864608076

Fig. 8. Actual data(b) and predicted value(r) after 4th repetition

Accuracy: 90.80420766881574

Fig. 9. Actual data(b) and predicted value(r) after 5th repetition

Accuracy: 91.95792331184255


The overall accuracy was then calculated by averaging the above 5 accuracy scores, giving a value of 91.89%. Finally, I plotted the training accuracy against the validation accuracy in each of the 5 repetitions and examined how the two curves change with the number of epochs. The graph below shows the training and validation accuracy behavior in the 5th iteration. It is noticeable that both accuracy curves rise over time, with the training accuracy rising gradually while the validation accuracy fluctuates as it rises.

Fig. 10. Training and validation accuracy

Similarly, the training loss and the validation loss were plotted against the number of epochs. The training loss decreases gradually with the epochs, while the validation loss goes up and down during the process.

Fig. 11. Training and validation loss
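A sketch of how such curves can be produced, assuming the History object returned by model.fit(..., validation_data=(testX, testy)) is available as history; the key names ('accuracy', 'val_accuracy') assume a TensorFlow 2 Keras model:

```python
import matplotlib.pyplot as plt

def plot_history(history):
    # training vs. validation accuracy over the epochs
    plt.figure()
    plt.plot(history.history['accuracy'], label='train accuracy')
    plt.plot(history.history['val_accuracy'], label='validation accuracy')
    plt.xlabel('epoch')
    plt.legend()
    # training vs. validation loss over the epochs
    plt.figure()
    plt.plot(history.history['loss'], label='train loss')
    plt.plot(history.history['val_loss'], label='validation loss')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()
```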

Finally, the accuracy was compared using different activation functions and optimizers. The model was run repeatedly and the overall accuracy was computed for each case.

Fig. 12. Comparison with different activation functions and optimizers

Here, using the ReLU activation function with the RMSprop optimizer gives the highest accuracy of 92.6546%.

I also drew a confusion matrix, which makes it easy to compare the true labels and the predicted labels graphically; a plotting sketch is given after the figure. As the figure shows, the diagonal regions have bright colors within the 13–16 range of the color bar; thus, the predicted label matches the true label in most of the experimental cases.

Fig. 13. Confusion matrix of results [1]
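One possible way to build such a confusion matrix (not the author’s exact code), assuming model, testX and the one hot encoded testy are available and that testX has already been reshaped as in evaluate_model():

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_true = np.argmax(testy, axis=1)                  # true class indices
y_pred = np.argmax(model.predict(testX), axis=1)   # predicted class indices
cm = confusion_matrix(y_true, y_pred)
plt.imshow(cm, cmap='viridis')                     # bright diagonal = correct predictions
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.colorbar()
plt.show()
```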

Source for confusion matrix (Fig. 13): Guillaume Chevalier, LSTMs for Human Activity Recognition, 2016 [1].

Last but not least, I suggest integrating HAR technology with IoT in order to generate smart solutions to problems in the childcare industry. This is a future direction that I will also focus on. Let’s dig more!!!

Hey..!! What are you doing……!!

Please go through my Github repository at https://github.com/ChamaniS/ANN-exercises/blob/master/HAR_using_CNN__LSTM_Final_Model.ipynb to view the full code of the project.

References:

[1] Guillaume Chevalier, LSTMs for Human Activity Recognition, 2016, https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition
