Mobile Crane Usage Monitoring with CV

5 min read · Jan 23, 2023

Mobile cranes are important pieces of construction equipment. Following proper operating procedures is essential to prevent accidents, but human error is always a factor. Here I will give an example of how computer vision (CV) can help address this issue.

For simplicity, I will break the mobile crane into 3 parts: the body, the outrigger, and the boom.

The 3 main parts of a mobile crane: the boom, the body (main vehicle), and the outriggers (the 4 extendable legs that provide crane stability)

The crane can be in 3 possible states: mobile, extended outrigger, and open boom.

Left (mobile state), middle (extended outrigger), right (open boom)

When a crane is in the mobile state, its boom must be closed and its outriggers must be retracted. Before the crane can lift its boom and start carrying heavy objects, it must extend its outriggers to increase stability. Lastly, the open boom state is when the boom is lifted up. The correct sequence is to park the crane, extend the outriggers evenly, and only then lift the boom. The same steps are followed in reverse order before the crane can move again.

Correct sequence: park, extend outrigger, then open boom
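The sequence rule can be written down as a small table of allowed transitions. The following is a minimal sketch, not the code used in this project; the state names and the helper functions `is_valid_transition` and `find_violations` are hypothetical names chosen for illustration.

```python
# Allowed transitions between crane states. Staying in the same state is
# always allowed; jumping between "mobile" and "open_boom" directly is not.
ALLOWED_TRANSITIONS = {
    "mobile": {"mobile", "extended_outrigger"},
    "extended_outrigger": {"mobile", "extended_outrigger", "open_boom"},
    "open_boom": {"extended_outrigger", "open_boom"},
}


def is_valid_transition(previous, current):
    """Return True if the crane may go from `previous` to `current`."""
    return current in ALLOWED_TRANSITIONS[previous]


def find_violations(states):
    """Return (index, previous, current) for every illegal step in a sequence."""
    violations = []
    for i in range(1, len(states)):
        if not is_valid_transition(states[i - 1], states[i]):
            violations.append((i, states[i - 1], states[i]))
    return violations


# A full, correct work cycle: set up, lift, pack up, drive away.
good_cycle = ["mobile", "extended_outrigger", "open_boom",
              "extended_outrigger", "mobile"]
# The boom is lifted without extending the outriggers first.
bad_cycle = ["mobile", "open_boom"]

print(find_violations(good_cycle))
print(find_violations(bad_cycle))
```

The same table covers the reverse order as well, since packing up goes from open boom back through extended outrigger to mobile.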

The problem arises when crane operators do not follow the proper sequence. For example, they may lift the boom without extending the outriggers, or drive the crane while the boom is still open or the outriggers are still extended.

Driving the crane around while its outriggers are extended. BAD!

Driving the crane while the boom is still extended. BAD!

This can be very dangerous! One consequence could be a collision, but a more dangerous scenario is the crane tipping over.

A crane tipped over because it did not extend all of its outriggers.

This problem can be addressed by using CCTV cameras to monitor crane usage. Using deep learning, I developed an AI pipeline that (1) detects the location of the crane within the frame, (2) tracks the crane’s movement, and (3) determines the state of the crane.

Model Pipeline

Below is a diagram of the model pipeline.

The pipeline starts with YOLOv5, an object detection model, which detects 2 classes: person and crane (person detection was used for another task). Since I wanted to track the movement of the crane, I applied DeepSORT. Next, the image of the crane is cropped out and used as input to an image classifier. The classifier outputs 3 possible states: mobile, extended outrigger, and open boom.
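To make the data flow concrete, here is a rough sketch of how one frame could pass through the stages. It is not the project code: `process_frame` is a hypothetical function, and `detect`, `track`, and `classify` are placeholder callables standing in for YOLOv5, DeepSORT, and the state classifier, with input and output shapes assumed for illustration. The frame is a plain list of pixel rows so the example runs without extra libraries.

```python
def process_frame(frame, detect, track, classify):
    """Run one frame through detect -> track -> crop -> classify.

    Assumed (hypothetical) interfaces:
      detect(frame)  -> list of (x1, y1, x2, y2, class_name)
      track(boxes)   -> list of (track_id, (x1, y1, x2, y2))
      classify(crop) -> crane state as a string
    """
    # Keep only crane detections; person detections serve another task.
    crane_boxes = [d[:4] for d in detect(frame) if d[4] == "crane"]

    results = []
    for track_id, (x1, y1, x2, y2) in track(crane_boxes):
        # Crop the crane region: rows y1..y2, columns x1..x2.
        crop = [row[x1:x2] for row in frame[y1:y2]]
        results.append((track_id, classify(crop)))
    return results


# Stand-in components, used only to show the call pattern.
frame = [[0] * 10 for _ in range(8)]  # 8 rows x 10 columns
fake_detect = lambda f: [(1, 2, 5, 6, "crane"), (0, 0, 2, 2, "person")]
fake_track = lambda boxes: [(7, box) for box in boxes]
fake_classify = lambda crop: "open_boom"

print(process_frame(frame, fake_detect, fake_track, fake_classify))
```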

Data Collection

Video footage of the work site was collected over a period of 4 months and then processed to find frames containing a crane. A total of 34 video clips were extracted, from which individual frames were sampled. A total of 2523 frames were labeled for the training dataset and 2071 frames were used for validation.

Object Detection & Tracking Results

I started from the pre-trained weights and froze the model’s backbone during fine-tuning. I experimented with all model sizes and found that YOLOv5m yielded the best result.

The mAP@0.5 for the crane class was 0.953. At a confidence threshold of 0.5, crane recall was 0.95, crane precision was 0.96, and the F1 score was 0.95.
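As a quick sanity check, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The helper `f1_score` below is written just for this check. Because the inputs are already rounded to two decimals, the result only agrees with the reported 0.95 up to rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


crane_f1 = f1_score(precision=0.96, recall=0.95)
print(round(crane_f1, 3))  # 0.955
```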

Samples of prediction frames from test/val set.

For the tracking part, the original DeepSORT was used with minor modifications. The velocity components of the crane’s Kalman filter state (the velocities of x, y, r, and h) were monitored for movement. Crane tracking accuracy was 100%: the tracker correctly tracked and identified the crane in every frame of the test set. This is because there is usually only 1 crane in the scene and it moves slowly relative to the frame rate.

Crane State Classifier

Images of the crane were collected and grouped into 3 classes: mobile, extended-outrigger, and open-boom. The number of images per class is summarized below.

Distribution of classes used to train the classifier. The dataset is small and imbalanced.

There was a class imbalance problem: the majority of the dataset was made up of the open-boom class. This was because the crane entered the scene only when it was being used, so it was in the open-boom state most of the time. Examples of the mobile and extended-outrigger states were available only during the short periods when the crane entered the frame and drove to the work spot. Class imbalance has a negative effect on supervised learning. In addition, the training dataset is quite small, so the model is at risk of overfitting.

So I went with contrastive learning for the crane-state classifier. Please check out my other post about using triplet loss to train an encoder that converts an image into a vector representation, then using an SVM to classify the state. This method yielded a more accurate model than a classic multi-class image classifier.
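For readers unfamiliar with triplet loss, here is a minimal sketch of the idea on plain Python lists. It uses the common form max(0, d(anchor, positive) - d(anchor, negative) + margin) with Euclidean distance; the margin value and the helper names are assumptions for illustration, and the encoder from the other post may use a different variant (for example squared distances).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer to the anchor than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings: anchor and positive share a class, negative does not.
anchor = [0.0, 0.0]
same_class = [0.0, 1.0]
other_class = [3.0, 4.0]

# Well separated: no loss, nothing to learn from this triplet.
print(triplet_loss(anchor, same_class, other_class))
# Roles swapped: the "positive" is far away, so the loss is large.
print(triplet_loss(anchor, other_class, same_class))
```

Training pushes embeddings of the same crane state together and different states apart, which is what lets a simple classifier such as an SVM separate them afterwards.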

Crane Usage Monitoring

To monitor crane usage, two things were checked: the crane state sequence and the crane’s movement in each state. The correct crane sequence is summarized below.

For the sequence check, the crane state should always progress from mobile to extended outrigger to open boom. The predictions from the classifier were stored in a fixed-size FIFO buffer. The states stored in the buffer were used collectively to determine the actual state of the crane. This makes the overall prediction more robust to noise or mispredictions on individual frames.

FIFO buffer to store predicted crane state for each frame.
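One simple way to combine the buffered predictions is a majority vote, sketched below. The voting rule, the buffer size, and the class name `StateSmoother` are assumptions for illustration; the article only states that the buffered states are used collectively.

```python
from collections import Counter, deque


class StateSmoother:
    """Fixed-size FIFO buffer of per-frame predictions with a majority vote."""

    def __init__(self, size=15):
        # A deque with maxlen drops the oldest entry automatically when full.
        self.buffer = deque(maxlen=size)

    def update(self, predicted_state):
        """Add the newest prediction and return the most common buffered state."""
        self.buffer.append(predicted_state)
        return Counter(self.buffer).most_common(1)[0][0]


smoother = StateSmoother(size=5)
frame_predictions = ["mobile", "mobile", "open_boom", "mobile", "mobile",
                     "open_boom", "open_boom"]
smoothed = [smoother.update(p) for p in frame_predictions]
print(smoothed)
```

The single stray "open_boom" at the third frame is outvoted, and the smoothed state only switches once "open_boom" makes up most of the buffer. A larger buffer is more robust to noise but reacts more slowly to real state changes.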

Once the crane state was known, I checked whether the crane was stationary by inspecting the Kalman filter velocities of the x, y, width, and height of the tracked crane’s bounding box. If the crane was in either the extended outrigger or open boom state, it should be stationary.
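The movement rule can be sketched as a threshold on the velocity part of the tracker state. This sketch assumes the 8-element layout of the original DeepSORT Kalman filter (four box terms followed by their four velocities); the threshold value and the function names are hypothetical and would need tuning for a real camera and frame rate.

```python
# States in which the crane is required to stay where it is.
STATIONARY_STATES = {"extended_outrigger", "open_boom"}


def is_stationary(kalman_mean, speed_threshold=1.0):
    """True if every velocity component is below the threshold.

    `kalman_mean` is assumed to hold four box terms followed by
    their four velocities, so the velocities are elements 4..7.
    """
    velocities = kalman_mean[4:8]
    return all(abs(v) < speed_threshold for v in velocities)


def is_moving_violation(state, kalman_mean, speed_threshold=1.0):
    """True if the crane moves while in a state that requires it to be parked."""
    return state in STATIONARY_STATES and not is_stationary(kalman_mean, speed_threshold)


parked = [320.0, 240.0, 1.5, 80.0, 0.1, 0.0, 0.0, 0.0]
driving = [320.0, 240.0, 1.5, 80.0, 3.0, 0.5, 0.0, 0.0]

print(is_moving_violation("open_boom", parked))   # False
print(is_moving_violation("open_boom", driving))  # True
print(is_moving_violation("mobile", driving))     # False
```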

Conclusion

The results showed that this system can monitor crane activity 24 hours a day without the need for human intervention. This can save both money and time by preventing accidents that may cause equipment damage and lost production time. Here I have presented an example of how CV can assist humans in performing repetitive tasks such as monitoring CCTV feeds.

Written by Natthasit Wongsirikul