Fall Detection with PyTorch

Nithiroj Tripatarasit
Diving in Deep
Apr 30, 2020

Introduction

Today, fall detection is a major concern in the healthcare sector, since falls cause both physical and mental harm, especially to the elderly. As our society ages, the problem is becoming even more pressing. Many technologies are being developed to address it (you can find more details in this survey).

What we are going to do here is just one alternative, which we can categorize as a vision-based fall detection system. The objective is to build a model that can detect falls from a plain video. Most of this project is based on, and credited to, the paper Real-time Vision-based Fall Detection with Motion History Images and Convolutional Neural Networks, 2018, T. Haraldsson. [1]

All the code is here.

Motion Tracking

To capture movement, which is temporal, one computer-vision technique we can use is the Motion History Image (MHI). The concept is to incorporate previous movement information into the current image by decaying the person's previous positions over a specific period, or number of frames.

Motion History Image (MHI)

The MHI algorithm we are going to use for pre-processing is shown below. For more intuition, there is further explanation in this video. I hope it will become clearer when you look into the script when we implement it.

Real-time Vision-based Fall Detection with Motion History Images and Convolutional Neural Networks
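To make the update rule concrete, here is a minimal NumPy sketch of the same idea; the function name and the normalization of pixel values to [0, 1] are my own choices, not the exact code used later in this article.

import numpy as np

def update_mhi(prev_mhi, prev_frame, frame, threshold=0.1, decay=1/40):
    """One MHI update step on grayscale frames normalized to [0, 1]."""
    # Pixels that changed more than the threshold count as motion now.
    motion = (np.abs(frame - prev_frame) >= threshold).astype(np.float32)
    # Moving pixels jump to full intensity; all other pixels keep their
    # previous MHI value, faded by the decay rate and clipped at zero.
    return np.where(motion == 1, 1.0, np.maximum(prev_mhi - decay, 0.0))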

Implementation

Data Source

The Fall Detection Dataset (FDD) [3] consists of videos from a single camera in separate locations: Home, Coffee room, Office, and Lecture room. The frame rate is 25 frames/s and the resolution is 320x240 pixels.

For this experiment, Home (Home_01.zip and Home_02.zip) and Coffee room (Coffee_room_01.zip and Coffee_room_02.zip) were selected because they come with complete annotations.

data
├── Coffee_room
│ ├── Annotation_files
│ │ ├── video (1).txt
│ │ ├── ...
│ └── Videos
│ ├── video (1).avi
│ ├── ...
└── Home
├── Annotation_files
│ ├── video (1).txt
│ ├── ...
└── Videos
├── video (1).avi
├── ...

For each video, the annotation is given as video (i).txt in Annotation_files. Each annotation file contains:

  • the frame number of the beginning of the fall
  • the frame number of the end of the fall
  • the height, the width and the coordinates of the center of the bounding box for each frame
48
80
1,1,0,0,0,0
2,1,0,0,0,0
3,1,0,0,0,0
4,1,0,0,0,0
5,1,292,152,311,240
6,1,292,152,311,240
...
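For illustration, a minimal parser for this layout might look like the sketch below. The column meaning of each per-frame line (frame number, a per-frame label, then four bounding-box values) is my reading of the description above, so treat it as an assumption.

def read_annotation(path):
    """Parse one 'video (i).txt' file from Annotation_files."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    fall_start = int(lines[0])   # frame number where the fall begins
    fall_end = int(lines[1])     # frame number where the fall ends
    frames = []
    for line in lines[2:]:
        values = [int(v) for v in line.split(",")]
        frames.append({"frame": values[0],
                       "label": values[1],
                       "bbox": values[2:]})
    return fall_start, fall_end, frames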

Pre-processing

While exploring the data, I found missing start/stop fall frame numbers in a couple of annotation text files. So before you proceed, please correct them by adding the start and stop fall frame numbers as the first two lines of these files, as follows:

Coffee_room/Annotation_files/
'video (26).txt'
197
227
'video (50).txt'
1816
1852
'video (52).txt'
87
113
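If you prefer to patch these files with a script rather than by hand, a one-off sketch like the following would do the same thing; the paths assume the data layout shown earlier.

fixes = {
    "data/Coffee_room/Annotation_files/video (26).txt": (197, 227),
    "data/Coffee_room/Annotation_files/video (50).txt": (1816, 1852),
    "data/Coffee_room/Annotation_files/video (52).txt": (87, 113),
}

for path, (start, stop) in fixes.items():
    with open(path) as f:
        body = f.read()
    # Prepend the missing start/stop fall frame numbers as the first two lines.
    with open(path, "w") as f:
        f.write(f"{start}\n{stop}\n{body}")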

Prepare Train, Validation, and Test Datasets

Datasets for each location were allocated to train, validation, and test sets in a ratio of 0.7, 0.2, and 0.1 respectively. Each clip is assigned to exactly one of the datasets, as sketched below.
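A simple way to do this per-clip split is shown here; the helper name and random seed are placeholders, and the actual logic lives in preprocess.py.

import random

def split_clips(clip_names, seed=42):
    """Assign each clip to exactly one of train/val/test (0.7/0.2/0.1)."""
    clips = sorted(clip_names)
    random.Random(seed).shuffle(clips)
    n_train = int(0.7 * len(clips))
    n_val = int(0.2 * len(clips))
    return {
        "train": clips[:n_train],
        "val": clips[n_train:n_train + n_val],
        "test": clips[n_train + n_val:],
    }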

MHI Processor

The MHI algorithm is implemented in the MHIProcessor class, which generates the MHIs for training. The parameter dim=128 is the input size for the MobileNetV2 model. Each MHI records a frame every two frames (interval=2), over 40 frames (duration=40), with a decay rate of 1/duration.

import cv2
import numpy as np

class MHIProcessor:
    def __init__(self, dim=128, threshold=0.1, interval=2, duration=40):
        # initialize MHI params
        self.index = 0
        self.dim = dim
        self.threshold = threshold
        self.interval = interval
        self.duration = duration
        self.decay = 1 / self.duration

    def process(self, frame, save_batch=True):
        ...
        # convert to grayscale and resize to the model input size
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        ...
        frame = cv2.resize(frame, (self.dim, self.dim),
                           interpolation=cv2.INTER_AREA)
        # pixels that changed more than the threshold count as motion
        diff = cv2.absdiff(self.prev_frame, frame)
        binary = (diff >= (self.threshold * 255)).astype(np.uint8)
        # set moving pixels to 1 and decay the rest of the previous MHI
        mhi = binary + (binary == 0) * np.maximum(self.mhi_zeros,
                                                  (self.prev_mhi - self.decay))
        ...
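To use it on a video, you would feed frames from cv2.VideoCapture into process(). A rough usage sketch follows; the exact return value and frame-saving behavior are assumptions, since the full body of process() is elided above.

import cv2

processor = MHIProcessor(dim=128, threshold=0.1, interval=2, duration=40)
cap = cv2.VideoCapture("data/Home/Videos/video (1).avi")

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # process() updates the running MHI with the new frame.
    mhi = processor.process(frame, save_batch=False)

cap.release()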

A sample of MHIs generated by MHIProcessor:

Fall
Not Fall

All pre-processing can be done by running preprocess.py.

python preprocess.py --source data --dest dataset

The train, validation, and test datasets will be created:

dataset
├── train
│ ├── fall
│ │ ├── Coffee_room_1_80.png
│ │ ├── ...
│ └── not_fall
│ ├── Coffee_room_1_82.png
│ ├── ...
├── val
│ ├── fall
│ │ ├── Coffee_room_2_192.png
│ │ ├── ...
│ └── not_fall
│ ├── Coffee_room_2_80.png
│ ├── ...
└── test
├── fall
│ ├── Coffee_room_11_376.png
│ ├── ...
└── not_fall
├── Coffee_room_11_80.png
├── ...
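Since the folders follow a class-per-directory layout, they can be loaded with torchvision's ImageFolder. A minimal sketch is below; the batch size and resize transform are my own choices and may differ from the training notebook.

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

train_ds = datasets.ImageFolder("dataset/train", transform=transform)
val_ds = datasets.ImageFolder("dataset/val", transform=transform)

train_loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_ds, batch_size=32)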

Model

MobileNetV2 was selected as the pre-trained model. The model is fast for inference with acceptable performance.

import torch.nn as nn
from torchvision import models

class FDNet(nn.Module):
    def __init__(self, out_features=2):
        super(FDNet, self).__init__()
        mnet = models.mobilenet_v2(pretrained=True)
        # freeze the pre-trained backbone, except the batch-norm parameters
        for name, param in mnet.named_parameters():
            if "bn" not in name:
                param.requires_grad_(False)

        # replace the classifier head with a small fall / not-fall head
        in_features = mnet.classifier[1].in_features
        mnet.classifier = nn.Sequential(
            nn.Dropout(p=0.2, inplace=False),
            nn.Linear(in_features, 500),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(500, out_features))
        self.mnet = mnet

    def forward(self, images):
        features = self.mnet(images)
        return features
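A quick sanity check of the head dimensions with a dummy batch; the 3-channel 128x128 input shape is assumed to match the MHIs after loading.

import torch

model = FDNet(out_features=2)
model.eval()

# A dummy batch of four 3-channel 128x128 images.
dummy = torch.randn(4, 3, 128, 128)
with torch.no_grad():
    logits = model(dummy)
print(logits.shape)  # torch.Size([4, 2])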

Training

Before running training.ipynb on Google Colab, do not forget to upload the dataset to the same location.
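The notebook has the full training loop. A compressed sketch of a standard cross-entropy fine-tuning loop is shown here, reusing FDNet and the train_loader sketch above; the optimizer, learning rate, and epoch count are placeholders, not necessarily what the notebook uses.

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = FDNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)

for epoch in range(10):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()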

Evaluation

The evaluation on the test dataset gave an accuracy of about 95%, with the confusion matrix below:

from sklearn.metrics import confusion_matrix

cf = confusion_matrix(targets_np, outputs_np)
# cf:
# array([[  61,   59],
#        [  21, 1317]])
tn, fp, fn, tp = (61, 59, 21, 1317)
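From those counts you can recover the headline number (roughly 95% accuracy). Which class counts as positive depends on the label ordering, so the precision and recall below are only illustrative.

tn, fp, fn, tp = 61, 59, 21, 1317

accuracy = (tp + tn) / (tp + tn + fp + fn)   # ~0.945
precision = tp / (tp + fp)                   # ~0.957
recall = tp / (tp + fn)                      # ~0.984

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")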

Visualize the Result

Let’s look at the result on a sample from the test set.

Fall Detection Dataset [3]
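One way to eyeball predictions on a few test MHIs is the matplotlib sketch below; it reuses the model, device, and transform from the sketches above, and the class names come from the ImageFolder layout.

import matplotlib.pyplot as plt
import torch
from torchvision import datasets

test_ds = datasets.ImageFolder("dataset/test", transform=transform)
test_loader = torch.utils.data.DataLoader(test_ds, batch_size=8, shuffle=True)

model.eval()
images, labels = next(iter(test_loader))
with torch.no_grad():
    preds = model(images.to(device)).argmax(dim=1).cpu()

fig, axes = plt.subplots(1, len(images), figsize=(16, 3))
for ax, img, pred in zip(axes, images, preds):
    ax.imshow(img.permute(1, 2, 0))  # CHW -> HWC for display
    ax.set_title(test_ds.classes[pred])
    ax.axis("off")
plt.show()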

Conclusion

We just finished building a very preliminary model that detects falls using a plain video recording as input. The result looks promising but still needs more evaluation. There is plenty of room for improvement: you may try different inputs, different models, hyper-parameter tuning, and much more, depending on your resources and application. I hope this article helps you get started on your own.

In the next article, I will show you how to deploy our PyTorch model on Amazon SageMaker. I hope it will be useful as well. For now, if you like this article or find it helpful, please give it a clap. Any comments or queries are very welcome.

P.S. Unfortunately, I found that the link to the Fall Detection Dataset seems to be broken, and I am not sure when it will be available again. However, with some adaptation, you might consider using another dataset such as the UR Fall Detection Dataset for training, using the MHI datasets already created for training, or using the already-trained PyTorch model for inference.

References:

[1] T. Haraldsson, Real-time Vision-based Fall Detection with Motion History Images and Convolutional Neural Networks, 2018.
[3] Fall Detection Dataset (FDD).