Using MXNet to Detect Fire from Live Video

Jonggon Kim
Published in Apache MXNet · 5 min read · Jan 14, 2020

Since 2018, Alchera Inc. has been working on the AIIR™ fire detection system for KEPCO (Korea Electric Power Corporation), which automatically detects indoor and outdoor fires through standard CCTV cameras. Our solution is being expanded to California, where wildfires are a major concern for citizens and businesses alike. Our visual anomaly detection product, called AIIR, is a cloud solution utilizing AWS, and as such we needed to optimize it for speed and size to ensure high performance and reduce costs.

Why are we using MXNet?

While an embedded solution is possible, the inputs from the camera are processed by a computer with a GPU to show the results, rather than directly on the camera. We use one-stage detector models which can be found at: https://github.com/DeepFocuser/Mxnet-Detector. We train our model using the Python bindings of MXNet. When inference speed is important, we export our model using ONNX and use TensorRT (C++) for inference. Otherwise, we export the model to the standard symbol (*.json) and parameters (*.params) formats and load them back into MXNet using the C++ bindings.

With regard to training speed and performance compared to PyTorch, our engineering team found that MXNet was typically faster.

With regard to memory efficiency, in our experience PyTorch consumes more memory than Gluon (HybridBlock or Symbol), though a strictly controlled comparison is difficult. Take YOLOv3 with a Darknet53 backbone as an example: the memory consumption of MXNet versus PyTorch is shown below.

(MXNet YOLOv3, custom implementation)
(PyTorch — https://github.com/eriklindernoren/PyTorch-YOLOv3, measured on a GPU with 11 GB of memory)

Since MXNet was the faster and lighter framework, we decided to develop our fire and smoke detection model using MXNet as detailed below.

Alchera’s Fire and Smoke Detection Algorithm

Our Data Set

We didn’t use any public data sets for this project; all data was collected by our in-house data team. Images were labeled after either capturing these scenarios directly or composing them virtually. You can see both situations below.

Direct capture (left) and virtual composition (right)

Model

Our development environment was as follows:

  • Ubuntu 16.04 / CUDA 10.1 / RTX 2080 Ti
  • MXNet 1.6.0b20191122 on Python 3.7.0

We implemented a ResNet-based CenterNet, referring to the CenterNet code published in the GluonCV Model Zoo. In this experiment we used ResNet18-CenterNet, and the implementation is publicly available in this CenterNet GitHub repository. The fire and smoke detector code is a modified version of the CenterNet `core/utils/dataprocessing/dataset.py` file, adapted to match the company’s data format, and we modified the hyperparameters in `config/detector.yaml` before we started training.

Why use CenterNet?

CenterNet outputs three results: a heatmap of object centers, sub-pixel center offsets, and box sizes. Combining these three outputs yields the final bounding boxes. This methodology interested us, and we saw plenty of room to extend it.
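To make the combination step concrete, here is a sketch of decoding a single heatmap peak into a box. It assumes the paper's default output stride of 4 and that the predicted sizes are in heatmap units; the function name and values are illustrative, not from our code:

```python
# Decode one CenterNet detection: heatmap peak + offset + size -> box.
STRIDE = 4  # downsampling factor between input image and heatmap (assumed)

def decode_center(peak_xy, offset_xy, size_wh, stride=STRIDE):
    """Turn one heatmap peak into (x1, y1, x2, y2) in input-image pixels.

    peak_xy   -- integer (x, y) location of a local maximum in the heatmap
    offset_xy -- predicted sub-pixel offset at that location
    size_wh   -- predicted box width/height, in heatmap units (assumed)
    """
    cx = (peak_xy[0] + offset_xy[0]) * stride
    cy = (peak_xy[1] + offset_xy[1]) * stride
    w = size_wh[0] * stride
    h = size_wh[1] * stride
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# A peak at heatmap cell (10, 8) with offset (0.5, 0.25) and size (5, 4)
box = decode_center((10, 8), (0.5, 0.25), (5, 4))  # -> (32.0, 25.0, 52.0, 41.0)
```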

In our own performance comparisons, CenterNet matched the accuracy of networks like SSD, YOLO, and RetinaNet while running faster. As the CenterNet paper points out:

First, our CenterNet assigns the “anchor” based solely on location, not box overlap [18]. We have no manual thresholds [18] for foreground and background classification. Second, we only have one positive “anchor” per object, and hence do not need Non-Maximum Suppression (NMS) [2]. We simply extract local peaks in the keypoint heatmap [4, 39]

As such, since CenterNet assigns only one anchor to one object (one location), it does not need the NMS used in box-overlap methods such as SSD, YOLO, and RetinaNet. This speeds up inference.
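The "local peaks" extraction that replaces NMS can be sketched in a few lines of numpy: a cell survives only if it equals the maximum of its 3×3 neighbourhood (the usual max-pool trick) and clears a score threshold. The threshold value here is illustrative:

```python
import numpy as np

def local_peaks(heatmap, threshold=0.3):
    """Return (y, x) coordinates of 3x3-local maxima above `threshold` --
    CenterNet's NMS-free replacement for box suppression."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Max over each cell's 3x3 neighbourhood (a stride-1 max-pool).
    neigh = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    keep = (heatmap == neigh) & (heatmap > threshold)
    return list(zip(*np.nonzero(keep)))

hm = np.zeros((5, 5))
hm[1, 1] = 0.9   # a strong peak
hm[1, 2] = 0.5   # suppressed: its neighbourhood contains a larger value
hm[3, 4] = 0.6   # a second, separate peak
peaks = local_peaks(hm)  # -> [(1, 1), (3, 4)]
```

In the full detector the same idea runs per class channel, and the surviving locations feed the offset/size heads to produce boxes.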

Results & Evaluation

The results of evaluating the network with the VOC2007 mAP (mean average precision) metric are shown below. The score was measured on our test data set.
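For reference, the VOC2007 metric uses 11-point interpolated average precision: at each recall threshold 0.0, 0.1, …, 1.0, take the maximum precision achieved at that recall or higher, and average the eleven values. A small numpy sketch (the precision/recall values below are toy numbers, not our results):

```python
import numpy as np

def voc07_ap(recall, precision):
    """11-point interpolated AP as defined for PASCAL VOC2007."""
    ap = 0.0
    for i in range(11):
        t = i / 10  # recall thresholds 0.0, 0.1, ..., 1.0
        mask = recall >= t
        p = np.max(precision[mask]) if mask.any() else 0.0
        ap += p / 11.0
    return ap

# Toy precision/recall curve for illustration
rec = np.array([0.1, 0.4, 0.7, 1.0])
prec = np.array([1.0, 0.8, 0.6, 0.5])
ap = voc07_ap(rec, prec)  # -> 0.7
```

mAP is then this AP averaged over classes (fire and smoke, in our case).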

VOC2007 mAP measurement results evaluated by our test data set
Inference using our test data set. Left : box detection. Right : heatmap prediction of center points
Fire detection algorithm running on test footage taken from https://www.youtube.com/watch?v=OmycQsToUew&t=373s
Inference time in MXNet C++ (we ran these tests because we are building a C++ inference server for our customers). OS: Ubuntu 16.04 / CPU: i7-8700 / GPU: RTX 2080 Ti

Problems and Future Directions

There is still a lot of work to be done to make the fire and smoke detector robust in real environments. Two problems are common. First, fire and smoke have no fixed shape or form. Second, there is always the issue of false positives.

The simplest solution to the former issue is to add more varied data, so for now we are continuing to add new data and will keep pursuing virtual or synthesized data. The second issue will be addressed with the following experiment: add a background-image data set to the training data and design a loss that accounts for it, then apply rule-based post-processing to the detector output. This should help reduce false positives enough to ensure a useful system.
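One candidate for the rule-based post-processing step (a sketch of the idea, not our shipped logic) is a temporal persistence filter: since we run on live video, an alarm is raised only when fire is detected in enough of the last few frames, which suppresses one-frame false positives. The window and hit-count values are illustrative:

```python
from collections import deque

class PersistenceFilter:
    """Raise an alarm only if fire is detected in at least `min_hits`
    of the last `window` frames."""
    def __init__(self, window=10, min_hits=7):
        self.history = deque(maxlen=window)  # rolling per-frame detections
        self.min_hits = min_hits

    def update(self, detected: bool) -> bool:
        self.history.append(detected)
        return sum(self.history) >= self.min_hits

f = PersistenceFilter(window=5, min_hits=3)
results = [f.update(d) for d in [True, False, True, True, True, False]]
# -> [False, False, False, True, True, True]: the alarm fires only once
# three hits have accumulated in the window, and stays on while they persist.
```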

We are excited to continue developing and expanding this solution for detecting fire with any simple camera. MXNet has allowed us to speed up the process, and when dealing with fire, speed is usually the primary factor. Please let us know your thoughts on the implementation and any issues you find with the code shared above.

We look forward to updating on our progress in another post in the near future.


Alchera deep-learning researcher creating AIIR, a lightweight, cloud-based visual anomaly detection technology.