Revolutionising Air Quality Monitoring: A New Approach Using Images and AI

Om Kathalkar
13 min read · Aug 29, 2023

Air pollution has been a persistent issue for quite some time now, and it’s becoming more severe due to increased industrial activities and the higher number of vehicles on the roads. This emphasizes the critical need to take proactive steps immediately to mitigate its harmful effects. Substantial research consistently underscores that this problem will continue to impact our environment and health without early intervention.

A visible layer of smog caused by air pollution in the Indian city of New Delhi.

The rise of new technologies, including low-cost sensors, Internet-connected smart devices, and machine learning programs that process big data, has deepened our understanding of how dangerous air pollution is. These advances have helped reveal the link between air pollution and various diseases, underscoring the need for immediate action.

Air pollution results from a complex mixture of pollutants emitted by diverse sources. These pollutants include particulate matter (PM), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), and volatile organic compounds (VOCs), among others. These substances can severely impact human health, increasing the prevalence of respiratory diseases like asthma and chronic obstructive pulmonary disease (COPD), cardiovascular diseases, and even lung cancer. Fine particulate matter (PM2.5) and other pollutants are linked to around 7 million premature deaths globally every year, as reported by the World Health Organization (WHO). Recognizing the connection between pollutants and health issues is essential, and it drives the urgency of comprehensive efforts to mitigate air pollution’s health effects.

What is the Air Quality Index and How is it calculated?

The Air Quality Index (AQI) is a standardized numerical scale used to communicate the level of air pollution and its potential health effects to the public. It provides an easy-to-understand way to gauge air quality in a specific area. The AQI is calculated based on the concentrations of key air pollutants, including particulate matter (PM2.5 and PM10), ground-level ozone (O3), nitrogen dioxide (NO2), sulfur dioxide (SO2), and carbon monoxide (CO). Each pollutant’s concentration is converted into a sub-index, and the highest sub-index among these pollutants becomes the AQI value. The AQI is then categorized into ranges, from “Good” to “Severe,” indicating the health concern associated with the air quality. This information is crucial for individuals and authorities to make informed decisions to protect public health. A detailed tutorial on Kaggle with relevant code to calculate the Air Quality Index (AQI) by Vopani is available.
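The sub-index rule described above can be sketched in a few lines of Python. The breakpoint tables here are an assumption based on the Indian CPCB scheme for 24-hour average PM2.5 and PM10 (in µg/m³), the function names are hypothetical, and only two pollutants are covered, so treat this as an illustration rather than a reference implementation.

```python
# Breakpoints as (conc_low, conc_high, index_low, index_high).
# Assumed values following the Indian CPCB scheme; verify before real use.
PM25_BREAKPOINTS = [
    (0, 30, 0, 50),
    (30, 60, 50, 100),
    (60, 90, 100, 200),
    (90, 120, 200, 300),
    (120, 250, 300, 400),
    (250, 380, 400, 500),
]
PM10_BREAKPOINTS = [
    (0, 50, 0, 50),
    (50, 100, 50, 100),
    (100, 250, 100, 200),
    (250, 350, 200, 300),
    (350, 430, 300, 400),
    (430, 510, 400, 500),
]
# Upper AQI bound of each category; anything above 400 is "Severe".
CATEGORIES = [(50, "Good"), (100, "Satisfactory"), (200, "Moderate"),
              (300, "Poor"), (400, "Very Poor")]


def sub_index(conc, breakpoints):
    """Linearly interpolate a concentration onto the index scale."""
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if conc <= c_hi:
            return i_lo + (conc - c_lo) * (i_hi - i_lo) / (c_hi - c_lo)
    # Above the last breakpoint: extend the slope of the last segment.
    c_lo, c_hi, i_lo, i_hi = breakpoints[-1]
    return i_lo + (conc - c_lo) * (i_hi - i_lo) / (c_hi - c_lo)


def aqi_from_pm(pm25, pm10):
    """AQI is the highest sub-index; returns (AQI value, category name)."""
    aqi = round(max(sub_index(pm25, PM25_BREAKPOINTS),
                    sub_index(pm10, PM10_BREAKPOINTS)))
    for upper, name in CATEGORIES:
        if aqi <= upper:
            return aqi, name
    return aqi, "Severe"


print(aqi_from_pm(45, 80))    # (80, 'Satisfactory')
print(aqi_from_pm(100, 120))  # (233, 'Poor')
```

In the first call the PM10 sub-index (80) exceeds the PM2.5 sub-index (75), so PM10 determines the AQI. In the second, PM2.5 dominates.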

Category-wise Air Quality Index and Concentration range of different pollutants.

What are the traditional Air Quality Monitoring Stations?

Traditional air quality assessment heavily relies on sophisticated tools such as Beta Attenuation Monitors (BAM) and Tapered Element Oscillating Microbalances (TEOM), which are widely utilized by regulatory bodies like the Central Pollution Control Board (CPCB) and governmental entities in India.

Numerous countries worldwide have established various air quality monitoring stations across diverse cities. These stations utilize precision detection instruments based on the principles above to acquire pollutant concentrations. They subsequently calculate the air quality index (AQI) and disseminate this information to the public. The strategic placement of these monitoring stations considers factors such as population distribution, built-up area size, representativeness, continuity, safety, and operational feasibility. Consequently, a monitoring station’s coverage is approximately 1–3 km. However, air quality within a city exhibits considerable regional variations and nonlinear changes, rendering it impractical to adequately capture this complexity with a limited number of monitoring stations. Another drawback of this approach is the finite lifespan of sensors, necessitating regular maintenance. Achieving dense deployment to cover every corner of a city is prohibitively costly. Consequently, individuals residing or working far from monitoring stations cannot access precise and real-time air quality information.

Why are Image-based Methods for Air Quality Index (AQI) estimation needed?

Conventional monitoring stations each cover a radius of only about 1–3 km, while urban air quality varies sharply and nonlinearly from one neighbourhood to the next. Sensors also have a finite lifespan and need frequent maintenance, so deploying them densely enough to cover every corner of a city is prohibitively expensive. As a result, people living or working far from a station have no access to precise, real-time air quality information.

This underscores the need for an image-based approach to Air Quality Index (AQI) estimation, driven by the rapid advancements in smartphone technology, video surveillance systems, and the ubiquitous use of artificial intelligence (AI). The improvements in image quality and ease of image capture using mobile devices enable individuals to document their surroundings effortlessly. When combined with AI techniques like image processing and machine learning, this progress allows for air quality detection based on images. This empowers the public to analyze air quality by applying established recognition models, facilitating informed responses to air pollution.

Furthermore, the emergence of deep learning methodologies has magnified the relevance of image-based air quality assessment. This novel technique not only diminishes dependence on specialized hardware but also augments the granularity of air quality monitoring. The intricate link between image attributes and air quality indices is systematically explored using image features and advanced deep-learning models. As a result, this approach addresses the limitations of conventional methods, providing a comprehensive and accessible means to gauge air quality.

IoT-based AQI Estimation using Image Processing and Learning Methods

Recent research introduced an innovative IoT-based approach to estimate real-time Air Quality Index (AQI) levels, categorized into five groups. This method harnesses traffic images and weather parameters for estimation. Notably, this work represents a pioneering effort, being the first to achieve such results on Indian roads. This ground-breaking study was presented as a conference paper at the 2022 IEEE 8th World Forum on the Internet of Things (WF-IoT) in Yokohama, Japan. The authors of this work are Nitin Nilesh, Ishan Patwardhan, Jayati Narang, and Sachin Chaudhari.

To support this methodology, a new traffic dataset is gathered from Indian roads, encompassing 5048 images, pertinent weather data, and co-located ground truth PM values. This dataset spans across different seasons in the Indian city of Hyderabad.

The proposed method demonstrates a remarkable 82% overall accuracy in accounting for PM variation due to different seasons. Moreover, a substantial enhancement in AQI estimation accuracy through image utilization is showcased compared to existing methodologies.

Proposed Methodology

The methodology comprises a structured series of steps, encompassing pivotal phases such as the Data Collection Campaign, establishing a precise Hardware Setup, creating a specialized Vehicle Detection model trained for the unique complexities of the Indian traffic scenario, and various other integral components.

Hardware Setup of the Solution

Hardware Setup and Block Diagram

The apparatus devised for the “IoT-based AQI Estimation using Image Processing and Learning Methods” project consists of a Raspberry Pi 3B+ single-board computer and a Raspberry Pi Camera, which captures the traffic images. The integrated sensors are the BME280 for temperature and humidity measurements, the Neo 8M GPS module for latitude and longitude tracking, and the SDS011 Nova sensor for PM2.5 and PM10 values at specific locations. A new image is captured every 5 seconds. The data obtained from the SDS011 sensor is used to compute the AQI, which also serves as the ground truth for the machine learning algorithm developed for this experiment. The AQI is calculated in accordance with Indian standards. The hardware can also transmit processed data to a remote server, which makes it well suited to edge computing applications.

Data Collection Campaign

The data collection campaign was carried out in Hyderabad, India, across different phases and seasons to build an extensive dataset of traffic images co-located with Air Quality Index (AQI) measurements. The dataset contains 5048 images, each paired with AQI data. The first phase covered September through December 2021. The second phase covered October and November 2022 as well as January and February 2023. The dataset therefore spans several seasons and was collected predominantly during daytime hours. Collection covered over 1,000 kilometres across the city, using multiple cameras.

Data Collection Device which was mounted on the top of the Bodhyan Car.

To facilitate the data collection endeavour, the Bodhyan car from IIIT Hyderabad’s iHub is generously provided, serving as the platform for mounting our hardware setup. The setup is positioned on the car’s roof, featuring the Aeroqual reference sensor with PM extension. An additional Aeroqual reference sensor equipped with CO extension is also mounted to gather more comprehensive data. For seamless data transmission and real-time monitoring, a JioFi device is employed, enabling internet connectivity for the data collection node.

Sensor Data Calibration and Preprocessing

During data collection, we assembled a dataset containing PM2.5, PM10, Relative Humidity (Rh), and Temperature. The data came from our air quality node and from an Aeroqual reference sensor. Our node sampled every 15 seconds, while the reference device reported at 1-minute intervals.

Upon dataset collection, a crucial preprocessing step was initiated. The raw data underwent refinement by applying the IQR outlier removal technique, concurrently excluding PM values surpassing 999 and Rh values exceeding 80. Subsequently, a strategic decision was made to average our node data to a 1-minute interval, aligning it with the reference data. The rationale behind this step was to facilitate the execution of a regression model to establish correlations between our data and the reference dataset.
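The preprocessing steps above could look roughly like the following pandas sketch. The column names (`pm25`, `pm10`, `rh`, `temp`), the function names, the 1.5 × IQR fence, and the order of the two filters are all assumptions made for illustration, and the sample readings are made up.

```python
import pandas as pd


def remove_iqr_outliers(df, columns, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


def preprocess_node_data(df):
    """df: DataFrame indexed by timestamp with pm25, pm10, rh and temp columns."""
    # Drop readings outside the limits quoted in the text (PM > 999, Rh > 80).
    df = df[(df["pm25"] <= 999) & (df["pm10"] <= 999) & (df["rh"] <= 80)]
    df = remove_iqr_outliers(df, ["pm25", "pm10"])
    # Average the 15-second samples into 1-minute bins to match the reference.
    return df.resample("1min").mean().dropna()


# Made-up node readings: two minutes of data sampled every 15 seconds.
idx = pd.date_range("2021-09-01 10:00", periods=8, freq=pd.Timedelta(seconds=15))
raw = pd.DataFrame(
    {
        "pm25": [40, 42, 41, 43, 1200, 44, 45, 46],
        "pm10": [80, 82, 81, 83, 85, 84, 86, 88],
        "rh": [50, 50, 51, 51, 52, 52, 52, 90],
        "temp": [30.0] * 8,
    },
    index=idx,
)
clean = preprocess_node_data(raw)
print(clean)
```

Here the fifth row is dropped for its PM2.5 value of 1200 and the last row for its humidity of 90, leaving two 1-minute averages.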

To relate the two data streams, we computed both Pearson and Spearman correlation coefficients. On detecting a temporal lag in our data, we shifted it to synchronize it with the reference. Several regression models were then tested for calibrating our data against the reference device readings. A simple linear regression model proved the best choice, with the lowest Root Mean Square Error (RMSE) and the highest correlation, as reflected in its R² score.

After successfully executing the linear regression, we derived the slope and intercept values, pivotal in calibrating our device values using the equation y = mx + c. Subsequently, an additional dataset was formulated, housing the calibrated values obtained through this equation. This calibrated dataset was then harnessed to calculate the Air Quality Index (AQI). This was achieved by computing sub-indices for PM2.5 and PM10 values, followed by AQI computation per the Indian AQI formula.
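As a rough sketch of the lag correction and linear calibration described above, the following NumPy example searches for the shift that maximises the Pearson correlation and then fits y = mx + c by least squares. The function names, the lag search range, and the synthetic signals are assumptions for illustration. Only Pearson correlation is shown.

```python
import numpy as np


def align(node, ref, lag):
    """Pair node[t + lag] with ref[t], trimming the unmatched ends."""
    if lag > 0:
        return node[lag:], ref[:-lag]
    if lag < 0:
        return node[:lag], ref[-lag:]
    return node, ref


def best_lag(node, ref, max_lag=5):
    """Return the lag (in samples) that maximises Pearson correlation, and that correlation."""
    best, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        a, b = align(node, ref, lag)
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best, best_r = lag, r
    return best, best_r


def fit_calibration(node, ref):
    """Least-squares fit of ref = m * node + c; returns m, c, RMSE and R^2."""
    m, c = np.polyfit(node, ref, deg=1)
    pred = m * node + c
    ss_res = np.sum((ref - pred) ** 2)
    ss_tot = np.sum((ref - np.mean(ref)) ** 2)
    rmse = float(np.sqrt(ss_res / len(ref)))
    return float(m), float(c), rmse, float(1.0 - ss_res / ss_tot)


# Synthetic example: the node lags the reference by 2 samples and reads on a different scale.
t = np.linspace(0, 6, 60)
ref = 40 + 15 * np.sin(t)
node = np.roll((ref - 5) / 1.2, 2)

lag, r = best_lag(node, ref)
node_aligned, ref_aligned = align(node, ref, lag)
m, c, rmse, r2 = fit_calibration(node_aligned, ref_aligned)
calibrated = m * node_aligned + c  # calibrated node values, y = mx + c
print(lag, round(m, 3), round(c, 3))
```

With real data the fit will not be exact, so RMSE and R² are the quantities to compare across candidate regression models.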

Feature Extraction and Feature Engineering

The central premise of this study revolves around the estimation of air pollution, specifically the Air Quality Index (AQI), through the utilization of traffic images. This entails extracting pertinent features from the acquired images, which are then employed as inputs for calculating AQI. In the process, vehicles emitting pollutants in the images are identified, and their count is designated as a significant image feature. Identifying these vehicles within the images was accomplished by implementing the state-of-the-art object detection algorithm, You Only Look Once version 5 (YOLOv5).

YOLOv5 is an object detection and localization algorithm that uses a Convolutional Neural Network (CNN) as its feature extractor, which lets it detect and localize multiple objects within a single image. For each image, its output has two components: the class of each detected object (classification) and the corresponding bounding box (regression). To improve performance on Indian traffic, YOLOv5 was trained on a customized version of the Indian Driving Dataset (IDD).
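Once a detector has produced labels, confidences, and bounding boxes, turning them into a vehicle-count feature is straightforward. The sketch below does not call YOLOv5 itself. It assumes the detections have already been converted into plain tuples, and the class names, confidence threshold, and function name are hypothetical.

```python
from collections import Counter

# Assumed class names; the actual labels depend on how the detector was trained.
VEHICLE_CLASSES = ("bus", "car", "truck", "motorbike", "autorickshaw")


def count_vehicles(detections, min_confidence=0.5):
    """detections: iterable of (label, confidence, (x1, y1, x2, y2)) tuples."""
    counts = Counter(
        label
        for label, conf, _box in detections
        if label in VEHICLE_CLASSES and conf >= min_confidence
    )
    # Fixed ordering so every image yields a vector of the same length.
    return [counts[name] for name in VEHICLE_CLASSES]


# Made-up detections for a single image.
detections = [
    ("car", 0.91, (10, 20, 110, 90)),
    ("car", 0.42, (200, 30, 260, 80)),   # below threshold, ignored
    ("bus", 0.77, (300, 10, 480, 150)),
    ("person", 0.88, (50, 60, 70, 120)),  # not a vehicle class, ignored
    ("motorbike", 0.66, (120, 70, 160, 110)),
    ("motorbike", 0.58, (170, 75, 205, 112)),
]
print(count_vehicles(detections))  # [1, 1, 0, 2, 0]
```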

Final Dataset Preparation

To finalize the dataset, the images are first passed through the trained YOLOv5 model. The model detects and counts the pollution-emitting vehicles in each image, covering five vehicle types: buses, cars, trucks, motorbikes, and autorickshaws. These counts form the first part of the image’s feature vector. In addition, a visibility score is computed for each image using the BRISQUE algorithm, yielding a single value that is appended to the feature vector.

The feature vector is then extended with the corresponding sensor data, specifically the temperature and humidity readings. The result is a feature vector of size 8 × 1 for every sample image: five vehicle counts, one BRISQUE score, temperature, and humidity. For clarity, a representative example of this feature vector is provided.

Regarding establishing labels for the samples, the PM2.5 and PM10 concentration values obtained from the reference PM sensor are utilized. Through the computation of AQI values based on these sensor measurements, the dataset’s samples are categorized into predefined AQI levels, each corresponding to a specific level of air quality. This categorized AQI level serves as the label for each sample.

Upon executing this process across the entire dataset, a comprehensive data matrix M of dimensions m × 8 is generated, with m signifying the total number of samples in the dataset. Additionally, a corresponding label vector y of size m × 1 is created to encapsulate the AQI category for each sample, completing the preparation of the final dataset.
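To make the shapes concrete, here is a minimal NumPy sketch of how the data matrix M and label vector y could be assembled. The helper name, the sample values, and the integer encoding of the AQI categories are invented for illustration and are not taken from the dataset.

```python
import numpy as np


def build_feature_vector(vehicle_counts, brisque_score, temperature, humidity):
    """Concatenate 5 vehicle counts, 1 image quality score and 2 weather readings."""
    if len(vehicle_counts) != 5:
        raise ValueError("expected one count per vehicle class (5 values)")
    return np.array([*vehicle_counts, brisque_score, temperature, humidity], dtype=float)


# Made-up samples:
# (vehicle counts, BRISQUE score, temperature in C, relative humidity in %, AQI category)
samples = [
    ([1, 4, 0, 6, 2], 31.2, 29.5, 54.0, 1),
    ([0, 2, 1, 3, 0], 24.8, 27.1, 61.0, 0),
    ([3, 7, 2, 9, 4], 42.6, 31.0, 47.5, 2),
]

M = np.vstack([build_feature_vector(c, b, t, h) for c, b, t, h, _ in samples])
y = np.array([label for *_, label in samples]).reshape(-1, 1)
print(M.shape, y.shape)  # (3, 8) (3, 1)
```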

Experiments and Results

This study focused on predicting air quality, and a diverse range of machine learning (ML) models took centre stage. The spotlight was on models like Random Forest, Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP), each playing a distinct role. These models were meticulously trained and put to the test using the Scikit-learn library, known for its powerful ML capabilities.
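A comparison of this kind might be set up in Scikit-learn as sketched below. The data is a synthetic stand-in for the m × 8 feature matrix, and the hyperparameters, the train/test split, and the feature scaling step are assumptions rather than the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the feature matrix and AQI category labels (3 made-up classes).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + X[:, 5] > 0).astype(int) + (X[:, 3] > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # SVM and MLP are sensitive to feature scale, so they are wrapped with a scaler.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: accuracy = {scores[name]:.2f}")
```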

The study’s centrepiece was the YOLOv5 model, which was trained for 25 epochs. This model was then converted to a TensorFlow Lite variant, optimized to run on the Raspberry Pi 3B+ platform.

The success of this endeavour was made evident through a comprehensive evaluation of the models’ performances. This evaluation encompassed key metrics like precision, recall, F1-score, and accuracy, shedding light on the models’ adeptness in predicting air quality levels.
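For readers unfamiliar with these metrics, the following plain-Python sketch computes per-class precision, recall, and F1-score along with overall accuracy. The function name and the example labels are made up for illustration.

```python
def per_class_metrics(y_true, y_pred):
    """Return ({label: {precision, recall, f1}}, overall accuracy)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted
        fp = sum(t != label and p == label for t, p in pairs)  # predicted but wrong
        fn = sum(t == label and p != label for t, p in pairs)  # missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return report, accuracy


# Made-up true and predicted AQI categories for six samples.
y_true = ["Good", "Good", "Moderate", "Poor", "Moderate", "Poor"]
y_pred = ["Good", "Moderate", "Moderate", "Poor", "Moderate", "Good"]

report, accuracy = per_class_metrics(y_true, y_pred)
for label, metrics in report.items():
    print(label, {k: round(v, 2) for k, v in metrics.items()})
print("accuracy:", round(accuracy, 2))
```

Reporting per-class values alongside accuracy matters when some AQI categories are much rarer than others, because a model can score well on accuracy while missing the rare classes.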

Stay tuned for deeper insights into this fascinating study, where these models go head-to-head to predict air quality with precision and finesse. From refining YOLOv5 to TensorFlow Lite to harnessing the prowess of SVM, RF, and MLP, this study takes us on a captivating journey through the world of machine learning for air quality estimation.

Conclusion

In this groundbreaking article, we introduced a user-friendly technique for predicting the Air Quality Index (AQI) using images on IoT devices. The approach, powered by a blend of Machine Learning (ML) and Deep Learning (DL), achieved remarkable accuracy rates of up to 90% in AQI classification.

But we’re not stopping at the findings. We’re excited to collaborate and share our progress with the broader community. If you’re interested in exploring the code and data behind this innovative project, you can check out our GitHub repository: IoT-based-AQI-Estimation-using-Image-Processing-and-Learning-Methods. This repository includes the code, datasets, and resources that fueled our exploration into the world of machine learning for air quality estimation.

As we move forward, our focus is expanding. We plan to extend this innovative method to nighttime predictions, gather summer-specific data, and even predict AQI for different cities. Join us on this captivating journey through machine learning and air quality estimation.
