Balefire: Navigating the Challenges of Edge AI in Smoking Detection
In the realm of AI, attention has been fixated on Large Language Models (LLMs) and on leveraging powerful APIs like OpenAI’s. My recently completed project, however, lies in an entirely different domain: computer vision at the edge.
Exploring Balefire
Recently, we wrapped up the third iteration of Balefire, an edge AI solution aimed at identifying smoking activities in public spaces. The project is a collaboration with the National Environment Agency (NEA), a partnership that has been flourishing since Balefire’s inception in 2019. The principal aim of this third iteration is to assist NEA in detecting smokers in smoking-prohibited places. The system provides vital information, such as the frequency and patterns of smoking activities, as well as overall foot traffic data. These insights help NEA optimise the allocation of enforcement officers to the identified hotspots.
While the first two iterations served primarily as proof-of-concept demonstrations, the latest iteration of Balefire represents our largest-scale pilot to date, with 20 installations around Singapore. This expanded pilot is a strategic step towards full operationalization, applying the valuable insights gained from these field deployments.
The Intricacies of Smoking Detection
At first glance, detecting smoking activities might seem straightforward. Many AI professionals might initially view Balefire as a mere cigarette detection tool and wonder where the complexity lies. I shared this sentiment until I delved into the project and quickly realized the substantial challenges involved. Cigarettes are notably difficult to detect: they are small and often partly occluded. Moreover, numerous everyday objects resemble them, especially when viewed through cameras in an outdoor setting. Common misidentifications include straws, shiny phone edges, fingers positioned in certain ways, and even certain types of food, all of which can easily be mistaken for a cigarette. Relying on smoke or the cigarette’s glowing tip as detection cues also proved error-prone. Going beyond the cigarette and looking at the entire person, such as through pose estimation, likewise resulted in an unacceptable level of false positives. These insights, gained through extensive experimentation, led to the understanding that an end-to-end detection model isn’t feasible, particularly in an edge AI context, with its inherent compute limitations and relatively small model sizes, coupled with the need for near-instantaneous detection.
Furthermore, the challenge extends beyond mere detection. It is crucial to avoid duplicate detections of the same smoker, as duplicates skew trends and prevent meaningful analysis. This necessitated a reidentification process within the edge pipeline, adding another layer of complexity to the task.
Pre-Existing Solutions
When confronted with the task of smoking activity detection, I assumed, as many would, that existing solutions would be readily available. However, upon exploring the market, it became evident that while there are service providers offering smoking detection capabilities, their offerings are often constrained by specific limitations. For instance, many of these systems require the smoker to be fully visible within the camera’s field of view, leading to a high likelihood of false negatives. Others demand that subjects be positioned quite close to the camera, a requirement stemming from the minimum object size their models can detect.
NEA’s requirements, however, called for a more versatile system: one capable of identifying as many smokers as possible across the entire span of a camera’s field of view, and doing so almost instantaneously. Existing offerings did not meet these needs, which necessitated the development of a custom solution tailored to these operational demands.
Balefire Pipeline
Determining the most effective approach for smoking activity detection required extensive experimentation and iterative improvements. Eventually, we established a robust pipeline specifically tailored for this task (a simplified sketch of the full loop follows the list):
- Head Detection and Processing: The pipeline begins with camera frames being fed into a head detector, which outputs the coordinates of all heads within the frame.
- Heuristic-Based Filtering: The detected heads then pass through a series of heuristic filters designed to eliminate likely false detections. These filters are the product of accumulated learnings and detailed analysis of deployment data.
- Head Tracking: An object tracker then follows the detected heads across successive frames, linking them with previously detected heads wherever possible. This ensures that, for identified smokers, repeated alerts are not triggered each time they are recognized in a new frame.
- Smoke/No-Smoke Classification: Heads not previously classified as belonging to smokers are then processed through a binary head classifier. This classifier determines whether the individual is smoking or not.
- Reidentification Module: If the classifier indicates smoking activity, a reidentification module attempts to match the detected smoker against a watchlist of recent smokers. If there is no reidentification, an alert is triggered. The watchlist is updated with the latest appearance of the smoker and other relevant information.
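To make the flow above concrete, here is a minimal sketch of how these stages could be wired together on the edge device. It is illustrative only: the detector, tracker, classifier, and reidentification components are injected as placeholder callables, and all names, interfaces, and data structures are assumptions rather than Balefire’s actual implementation.

```python
# Minimal per-frame pipeline sketch (illustrative assumptions, not Balefire's code).
# The frame is assumed to be a NumPy image array (H, W, C) and box coordinates integers.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Track:
    track_id: int
    box: tuple                 # (x1, y1, x2, y2) head coordinates
    flagged_smoker: bool = False  # persists across frames for this track


def process_frame(
    frame,
    detect_heads: Callable,        # frame -> list of head boxes
    passes_heuristics: Callable,   # (frame, box) -> bool
    update_tracker: Callable,      # (frame, boxes) -> list[Track], reusing prior tracks
    is_smoking: Callable,          # head crop -> bool (binary smoke/no-smoke classifier)
    match_watchlist: Callable,     # head crop -> bool; also updates the watchlist internally
    send_alert: Callable,          # Track -> None
) -> List[Track]:
    # 1. Head detection
    boxes = detect_heads(frame)
    # 2. Heuristic filtering of likely false detections
    boxes = [b for b in boxes if passes_heuristics(frame, b)]
    # 3. Track heads across successive frames
    tracks = update_tracker(frame, boxes)
    for track in tracks:
        # 4. Skip heads already flagged as smokers on earlier frames
        if track.flagged_smoker:
            continue
        x1, y1, x2, y2 = track.box
        crop = frame[y1:y2, x1:x2]
        if not is_smoking(crop):
            continue
        track.flagged_smoker = True
        # 5. Reidentification: alert only if this smoker is not already
        #    on the watchlist of recently seen smokers
        if not match_watchlist(crop):
            send_alert(track)
    return tracks
```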
Typically, reidentification is integrated with object tracking in computer vision applications. However, to maintain real-time analysis capabilities in a compute-limited edge environment, we chose to apply reidentification solely for detected smokers, rather than for all tracked heads. The addition of reidentification after classification helped to further reduce double/multiple counting of the same person.
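For the watchlist itself, one common pattern, sketched below purely as an assumption rather than Balefire’s actual implementation, is to store an appearance embedding for each recently alerted smoker and match new candidates by cosine similarity, with entries expiring after a fixed window. The similarity threshold and expiry window here are illustrative placeholders.

```python
# Illustrative watchlist matcher for the reidentification step.
# The embedding source, similarity threshold, and expiry window are assumptions.
import time
from typing import List, Tuple

import numpy as np


class SmokerWatchlist:
    def __init__(self, sim_threshold: float = 0.7, expiry_seconds: float = 600.0):
        self.sim_threshold = sim_threshold
        self.expiry_seconds = expiry_seconds
        self.entries: List[Tuple[np.ndarray, float]] = []  # (embedding, last_seen)

    def _prune(self, now: float) -> None:
        # Drop smokers not seen within the expiry window
        self.entries = [(e, t) for e, t in self.entries
                        if now - t < self.expiry_seconds]

    def match_or_add(self, embedding: np.ndarray) -> bool:
        """Return True if the embedding matches a recent smoker;
        otherwise add it as a new entry and return False."""
        now = time.time()
        self._prune(now)
        emb = embedding / (np.linalg.norm(embedding) + 1e-8)
        for i, (stored, _) in enumerate(self.entries):
            if float(np.dot(emb, stored)) >= self.sim_threshold:
                # Refresh the stored appearance and timestamp
                self.entries[i] = (emb, now)
                return True
        self.entries.append((emb, now))
        return False
```

Running this matcher only on heads the classifier has already flagged keeps the embedding model off the critical path for the vast majority of frames, which is what makes the extra step affordable on compute-limited edge hardware.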
In Version 1, we used a face detector instead of a head detector. However, the inconsistency in facial visibility, such as when individuals turn away from the camera, led to tracking issues and to downstream double or multiple counting, where multiple alerts were sent out for the same smoker across different frames.
In Version 2, we attempted to combine head and face detectors for enhanced filtering, but with fine-tuning of the head classifier and the addition of the reidentification module, this became redundant.
While some of these design choices might seem unconventional, they are justified within the context of edge AI deployment. Empirical data from internal benchmarks and on-the-ground operational improvements support these decisions.
From the first to the third version of Balefire, these strategic decisions, coupled with continuous fine-tuning of the models using data collected over the course of the project, have led to a reduction of false positives and double counts by approximately 90%, while maintaining a high true positive rate and minimizing false negatives.
To collect and annotate training data for the constituent models, we used actual footage from the current and past iterations of Balefire, annotated with a semi-supervised approach. Simply put, we used our existing models to pre-annotate the new data and then corrected any errors from that process. We iteratively added specific profiles of images that the existing models were error-prone on, such as persons wearing helmets or persons eating or drinking. This improved the performance of the models significantly over the course of the project.
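As an illustration of this pre-annotation loop, the sketch below runs an existing model over new frames and writes draft annotations for human correction, flagging low-confidence detections for review first. The predict() interface, file layout, and threshold are hypothetical, not the tooling we actually used.

```python
# Hypothetical model-assisted annotation loop: existing models pre-label new
# footage, and annotators only correct the drafts.
import json
from pathlib import Path
from typing import Callable, Dict, List


def prelabel_frames(
    frame_dir: Path,
    predict: Callable[[Path], List[Dict]],  # image path -> [{"box": [...], "label": str, "score": float}]
    out_path: Path,
    min_score: float = 0.5,
) -> None:
    """Write draft annotations for every image in frame_dir.

    Low-confidence predictions are kept but flagged so that human
    annotators review them first."""
    drafts = []
    for image_path in sorted(frame_dir.glob("*.jpg")):
        detections = predict(image_path)
        drafts.append({
            "image": image_path.name,
            "annotations": [
                {**det, "needs_review": det["score"] < min_score}
                for det in detections
            ],
        })
    out_path.write_text(json.dumps(drafts, indent=2))
```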
Operationalization Considerations
Operational considerations played a critical role in the development of Balefire. Our focus extended beyond merely proving the viability of smoking activity detection, a milestone achieved by the previous iteration in 2020. The challenge was to create a system that not only enhanced NEA’s enforcement operations but also incorporated a set of quantitative performance metrics addressing key issues like false positives and double counting.
For performance evaluation, accuracy is a fundamental and commonly used metric. However, for Balefire’s specific operational context, precision and recall are more relevant. A high precision rate is crucial to ensure that NEA is not overwhelmed with false positive alerts, while a high recall rate is essential to identify as many smokers as possible within the operational scope.
However, neither precision nor recall sufficiently addresses the issue of double counting. To tackle this, we introduced a novel metric: strict precision. This metric is an adaptation of the standard precision formula that folds double counts into the calculation. By doing so, strict precision evaluates detections not only for their accuracy but also for their necessity and relevance, ensuring that each alert sent to NEA is both valid and essential.
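Precision and recall follow their standard definitions; strict precision, as described above, additionally penalises duplicate alerts. The snippet below shows one plausible formulation, treating duplicate alerts for an already-counted smoker like false positives in the denominator; the exact formula used in Balefire may differ.

```python
# Standard precision/recall plus one plausible strict-precision variant
# (an assumption based on the description above, not Balefire's exact formula).
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0

def strict_precision(tp: int, fp: int, duplicates: int) -> float:
    # Duplicate alerts for an already-counted smoker are treated as
    # unnecessary detections and counted against precision.
    denom = tp + fp + duplicates
    return tp / denom if denom else 0.0

# Example: 80 unique smokers correctly alerted, 10 false alerts,
# and 15 duplicate alerts for smokers already counted.
print(precision(80, 10))             # ~0.889
print(strict_precision(80, 10, 15))  # ~0.762
```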
To rigorously assess Balefire’s performance, we established two benchmarks using video footage from all deployment locations. These benchmarks utilized three main performance metrics — accuracy, precision, and our newly defined strict precision — to evaluate and compare the efficacy of Balefire across its three iterations and various experimental modifications. This comprehensive approach has been instrumental in refining Balefire’s capabilities and ensuring its operational effectiveness for NEA.
Lessons Learned
Some valuable non-technical lessons that I have learnt from working on Balefire include:
- Revisiting Old Solutions: Previously discarded ideas should be reassessed from time to time, as they might offer unexpected benefits in a different context. For instance, the reidentification model was initially set aside after it did not work well alongside the object tracker for real-time tracking. In subsequent iterations, I reintroduced it from a different angle, applying it only after classification, and this resulted in a significant improvement in Balefire’s performance.
- The Swiss Cheese Approach: For Balefire, we embraced the Swiss Cheese Model of Risk, a concept traditionally used in risk management and safety engineering. This model visualizes a system’s defenses as layers of Swiss cheese, where each layer represents a safeguard and the holes symbolize potential failures or weaknesses. The key is that the weaknesses in one layer are covered by another, reducing errors overall. In Balefire, the successive pipeline stages, from heuristic filtering to classification to reidentification, act as these layers, each catching errors that slip through the others.
- Reality Check on Academic Benchmarks: New model releases, while impressive on research benchmarks, often fall short in real-world applications, as those benchmarks rarely capture the challenges and nuances that arise in practical settings. Real-world, context-specific internal benchmarks need to be developed to properly evaluate experiments.
- Resourcefulness and Pragmatism: Innovation doesn’t always mean inventing something entirely new. Often, the most efficient solutions come from creatively using existing resources and technologies. It’s crucial, however, to be mindful of legal aspects like software licenses when utilizing available tools.
These insights are often reiterated by many AI practitioners but only become profoundly meaningful when experienced firsthand.
Conclusion
Deploying complex AI systems at the edge, as we do at DSAID, presents a unique set of challenges and opportunities distinct from traditional cloud-based solutions. Edge computing, characterized by its limited computational capacity compared to cloud environments, necessitates a fundamentally different approach to AI algorithm optimization. The evolution of the Balefire project underlines the importance of a tailored approach to edge AI deployment. This project has shown how technical expertise must be integrated with a profound understanding of the specific operational needs. In the case of NEA’s requirements, it was evident that off-the-shelf models were insufficient, leading us to develop bespoke solutions that precisely addressed the unique challenges at hand.
Looking ahead, the future of edge AI remains vibrant. With recent developments like the impressive benchmark results achieved by vision models on the Raspberry Pi 5, and the substantially enhanced capabilities of the Jetson Orin platform, edge AI is at the forefront of technological innovation. These advancements open a plethora of exciting possibilities for us at DSAID. From enhancing public service delivery to exploring new realms of AI applications, the field is replete with opportunities for groundbreaking tech for public good.