Enhancing Security Measures through Smoke & Fire Detection

In retrospect, why go back in time to try to stop a fire when you could have just detected it?

Kamal Raydan
Zaka
11 min read · Mar 17, 2021



You are sitting in your office late at night, tired, wanting to leave work. Your unsuspecting colleague decides to punch out before you do, tempting you to follow suit, but you decide to brave it out and finish the remaining work for the night.

Everything is dandy until you notice something odd a couple of hours into the night. You look through your office's glass walls only to see a plume of smoke emanating from within a closed office, perhaps from what started as a harmless electrical malfunction. Your mind immediately gravitates toward the realization that the office from which the smoke is emanating belongs to the colleague who punched out a couple of hours prior. Now that the office is engulfed in flames, there is not much you can do but get yourself out of there and call the fire brigade as fast as possible to halt the spread of the fire. Proprietary assets, information (and sometimes even lives!) are lost.

We, at Zaka, have taken a stance against such events (i.e. fires) breaking out without the right people being informed at the right time, which would consequently reduce the loss of property and even lives. To that end, we built a smoke & fire detector!

Outline:

  1. Where do we start?
  2. Defining the problem
  3. Gathering the data
  4. Building the model
  5. Results (i.e. Inferencing)
  6. Future Improvements

1. Where do we start?


So how do we begin approaching this problem?

Well, we can always begin by trying to strengthen our understanding of the problem.

1.1 Clearly defining the problem

We can do this by asking ourselves a set of questions such as:

  • What are the cases of break-out fires we want to catch?
  • What features of those cases are we interested in?
  • How do we catch those cases and features?
  • What are we expecting out of this model?

This will reveal a lot of the context hidden inside the topic.

1.2 Gathering data

Our next step, after building a good understanding of the problem and the desired outcome, is to gather as much of the right data as possible. As we all know, if we feed the wrong kind of information to the model, the model will output the wrong results. In layman's terms, this is captured by a phrase commonly heard in the data-driven community: “garbage in, garbage out”.

1.3 Building the model

After finally formulating an approach to the problem and gathering the right data, we opted to use an AI-based model to help detect those nasty fires in a matter of milliseconds!

1.4 Performing Inference

Last but not least, we ensure that the model has been trained adequately by testing it, otherwise known as performing inference, on some videos.

2. Defining the problem


We touched a bit on how we must expand our understanding of the problem, but we never really stated how. It is not as trivial as one might think, so let's look at the matter a bit further.

In trying to catch fire along with the smoke, we must first ask ourselves:

In what scenario are we trying to catch it (the fire)?

Is it a wildfire? An indoor fire? An outdoor city fire? “Why not just cover them all?”, you say? We could, but that would present multiple complications, one of which is the under-representation of certain scenarios due to restricted access or a lack of footage similar to the scenarios in which the end model would be deployed.

Okay, say we decided to catch indoor fires; the next question to immediately invade the mind would be:

Are we focusing on large scale fires or small scale fires?

A question that depends on the context of the problem and what you want out of the solution, and one we will have to answer before moving on to the next question:

What about catching the thick plume of smoke directly emanating from the fire versus catching the translucent sheet of smoke that directly follows it?

They are both technically “smoke”, but they have enough distinguishing characteristics to force you to think about whether you want to treat them as separate cases, thus risking confusing the model (unless it is trained on a plethora of examples), or combine the two, thus eliminating the nuance between them. This is an important question to answer since it directly impacts the overall behavior of your model (i.e. what your model will be looking for).

One last question (among many more), before it starts to really hurt the head, could be:

What are we expecting out of this model?

Answering this question will not only inform what data we should collect but also help us focus on what is important rather than getting sidetracked by issues that could, and probably would, arise had we not set a clear vision.

Just to make sure that we're all on the same page in the following sections, the answers to the questions posed above are as follows:

1- In what scenario are we trying to catch it (the fire)?

Answer: We will be trying to capture both indoor & outdoor in-city fires with a focus on indoor fires.

2- Are we focusing on large scale fires or small scale fires?

Answer: We will be attempting to capture the details of both large & small-scale fires.

3- What about catching the thick plume of smoke directly emanating from the fire versus catching the translucent sheet of smoke that directly follows it?

Answer: We will try to differentiate between the different types of smoke, namely the thick plume and the translucent, cloud-like smoke.

4- What are we expecting out of this model?

Answer: What we would like out of this model is to detect some of the smallest and largest fires (with a focus on indoor fires) alongside the nasty plumes of smoke that come along with them. We also hope to distinguish between the two types of smoke emanating from the fires, as discussed above.

3. Gathering the data


Seeing that we've asked the right questions for our cause, and hopefully gotten the right answers, we can begin gathering the appropriate data. Note that since most projects at this level are usually tailored to a specific use case (hence the particular questions), we gravitated towards creating our own custom dataset to fit our needs.

One way we did this was to gather as many images as possible (albeit random ones) of fires and smoke, mainly located in suburbs and cities, keeping forest fires to a minimum.

Note: There is a saying that more data is usually always better in deep learning. The issue with under-representation (and therefore under-performance) only arises when you deploy a model in an environment you knowingly did not train it on extensively. Therefore, even though we are collecting some image data on forest fires, we will not be deploying our model in that scenario.

We also resorted to gathering videos from YouTube that helped accommodate our case and below is one such video:

What we did was take these YouTube videos, parse a specific time frame of interest (although one could take the entire video for substantially more data), and extract 1 frame out of every 5 frames that passed. In effect, we took a frame for every 1/6th of a second of video and assumed that there was enough change within that 1/6th of a second. The reason for this was that, since the video was originally 30 frames per second, it would have been overkill to treat each frame (with only 1/30th of a second between consecutive frames at 30 FPS) as though it were different from the previous one.
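The sampling step above can be sketched in a few lines. This is a minimal, illustrative sketch (the helper names are our own, and OpenCV's `VideoCapture` is assumed to be installed for the extraction part):

```python
def sampled_frame_indices(total_frames, keep_every=5):
    """Indices of the frames to keep: 1 out of every `keep_every` frames."""
    return list(range(0, total_frames, keep_every))


def extract_frames(video_path, out_dir, keep_every=5):
    """Dump every `keep_every`-th frame of a video to `out_dir` as JPEGs."""
    import os
    import cv2  # OpenCV; imported lazily so the sampling helper stays dependency-free

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video (or a read error)
            break
        if index % keep_every == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved


# At 30 FPS, keeping 1 frame out of every 5 yields 6 frames per second of
# footage, i.e. one frame per 1/6th of a second.
```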

Here are some of the outcomes of scraping images (~1500 images) from the internet:

Some of the difficulties faced while labelling for such a task lay in understanding the delimitation of both fire and smoke. Where does one fire begin and another end? The problem is exacerbated when trying to delimit translucent smoke too!

Initially, as planned, we were to have three classes:

  • fire: A tag representing a clear portion of fire
  • smoke: A tag representing a clear visible body of smoke
  • smoke_2: A tag representing a not-so-clear tail of smoke that could easily be confused with anything but smoke (e.g. water droplets hanging in the air, clouds, et cetera).

Seeing that figuring out the boundaries of fires and thick smoke was hard enough, and that we lacked examples exhibiting the third (smoke_2) class, we decided to merge the third label into the second (smoke) label and treat them as one during the training phase.
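Since TLT consumes object-detection labels in KITTI format, where each label line begins with the class name, folding smoke_2 into smoke amounts to a small text rewrite over the label files. A minimal sketch, with our own illustrative helper name:

```python
def merge_class(kitti_label_text, merge_from="smoke_2", merge_into="smoke"):
    """Rewrite the contents of one KITTI label file, folding one class into another.

    Each KITTI line starts with the class name, followed by space-separated
    fields (truncation, occlusion, alpha, the bounding box, and so on).
    """
    merged_lines = []
    for line in kitti_label_text.splitlines():
        fields = line.split()
        if fields and fields[0] == merge_from:
            fields[0] = merge_into
        merged_lines.append(" ".join(fields))
    return "\n".join(merged_lines)
```

Running this over every label file before training leaves two classes, fire and smoke, which is exactly what the model sees.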

4. Building the model


Let’s get down to business! We now need to build a model that will help us churn the annotated data that we’ve prepared for it!

We could use an available architecture and train from scratch, but that would take far too long and require exponentially more data, ranging in the thousands if not tens of thousands of images. So what can we use to accommodate this shortage of data and speed up our training?

Knock, knock… NVIDIA's Transfer Learning Toolkit (TLT)

NVIDIA's Transfer Learning Toolkit offers a platform that contains generic pre-trained, retrainable, and deployment-ready models! Not only that, but they also offer models trained for specific use cases, such as their PeopleNet, that are also retrainable and deployment-ready.

To start with, we opted for the Faster R-CNN (F-RCNN) architecture, which seemed to be the most accurate of the bunch. This crown, unfortunately, came with its own set of drawbacks. F-RCNN has a harder time integrating into NVIDIA's DeepStream SDK due to the additional plugins/add-ons needed to facilitate a smooth integration. This influx of plugins and add-ons adds to the overall weight of the pipeline, which can heavily impact the pipeline's inference performance. Moreover (peering into the future a bit), a pipeline containing F-RCNN is so heavy that it processes a one-minute video about 8.5x slower than if we were to use DetectNet_V2.

Having said this, it is of utmost importance for us to take scalability (to hundreds of cameras) into account, so instead we decided to move to NVIDIA's pre-trained DetectNet_V2 architecture, which offers a much simpler integration with NVIDIA's DeepStream SDK and turns out to be lighter and faster than Faster R-CNN.

Moving on, we used a ResNet18 backbone as a feature extractor coupled with the DetectNet_V2 architecture, and after tuning a specific subset of hyperparameters set by NVIDIA along with a handful of online augmentation* parameters, we were ready to train!

*online augmentation: Augmentation on the go (i.e. while training)
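For a sense of what those online augmentation knobs look like, here is an illustrative fragment of a DetectNet_V2 training spec in the style of NVIDIA's example TLT spec files. The values below are placeholders, not the ones we actually used:

```
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5   # random horizontal flips
    translate_max_x: 8.0     # random shifts, in pixels
    translate_max_y: 8.0
    zoom_min: 1.0
    zoom_max: 1.0
  }
  color_augmentation {
    hue_rotation_max: 25.0   # random hue shifts, in degrees
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
```

Because these transforms are applied on the fly during training, the model sees a slightly different variant of each image every epoch without the dataset itself growing on disk.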

Training our DetectNet_V2 architecture on around 1500 images, we landed at about 76%–77% mAP (mean Average Precision) across both the smoke and fire classes. In the world of object detection, I would consider this good, but only actual inference could debunk or solidify our gut feeling.
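To unpack the metric: mAP is just the mean of the per-class Average Precision values, each of which summarizes a precision–recall curve. A minimal sketch of the Pascal-VOC-style “all-points” AP (the per-class numbers in the usage comment are illustrative, not our real scores):

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve, after making the precision
    envelope monotonically non-increasing (Pascal VOC "all-points" AP)."""
    mrec = [0.0] + list(recalls)
    mpre = [0.0] + list(precisions)
    # Envelope: each precision becomes the max of itself and everything to its right.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the rectangle areas between successive recall points.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_ap(per_class_aps):
    """mAP is simply the mean of the per-class APs (here: fire and smoke)."""
    return sum(per_class_aps) / len(per_class_aps)


# e.g. with illustrative per-class APs of 0.80 (fire) and 0.72 (smoke):
# mean_ap([0.80, 0.72]) -> 0.76
```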

5. Results (i.e. Inferencing)


It is only now that we get to view our hard earned inference results!

Portions of the following two videos were taken from YouTube and run through our pipeline, containing the smoke/fire detector, which yielded the results below (converted into GIFs).

Indoor Fire & Smoke Detection

With only 1500 images under our belt, the initial impressions show very promising results. In addition, using NVIDIA's TLT platform for pre-trained architectures helped us get the most out of our current limitations and dataset.

We can see that in the very first GIF, showing the indoor fire, the fire/smoke detector does remarkably well at detecting the fire as it grows, along with the smoke that intensifies over time. This is what we expected, seeing that we consciously focused on indoor fire/smoke detection.

Outdoor fire & smoke Detection

The outdoor detections, however, outlined in the second GIF, seem a bit more unstable, but overall good, and that is also to be expected. Note that in the outdoor smoke/fire detection, you can make out that the model has a hard time distinguishing the dark plume of smoke from the background, which is inherently dark too, whether due to poor video quality, compression, the time of day, etc.

All that aside, our gut feeling concerning the legitimacy of the mean AP seems to be heading in the right direction and could lead to even better results with more and more data!

6. Future Improvements


That's great and all, but what are we going to do about the current limitations imposed by limited data?

The first clear and obvious path to take would be to:

  1. gather more and more data.

The answer in the face of adversity, especially when the adversity is caused by small datasets, is always to gather more data. Another improvement we could act upon would be to:

2. distinguish between the different types of smoke.

This is subject to having a more or less equal distribution of data among the classes you want the model to explore.

Last but not least, we could also:

3. explore the plethora of architectures provided to us by NVIDIA.

There is no debate surrounding the fact that there are architectures out there that are even lighter than DetectNet_V2. The question, however, of whether they can live up to the same standard of accuracy still remains.

Here’s to pushing the boundaries. 🍻

Don’t forget to support with a clap!

Do you have a cool project that you need to implement? Reach out and let us know.

To discover Zaka, visit www.zaka.ai

Subscribe to our newsletter and follow us on our social media accounts to stay up to date with our news and activities:

LinkedIn · Instagram · Facebook · Twitter · Medium
