Building Arya’s Health Vitals Monitoring AI Model with DeepPhys Inspiration!

Kushagra Bhatnagar
Published in Arya AI Tech Blog · 4 min read · Feb 13, 2024

In recent years, the advancement of deep learning techniques has revolutionized various fields, including healthcare and biometrics. One such innovation in AI healthcare is the modeling of Photoplethysmography (PPG) signals from video data.

Arya.ai has embarked on a groundbreaking journey, fusing artificial intelligence with biometric data to forge the future of health monitoring. Imagine this: you glance into your phone’s camera for 30 seconds, and your key health vitals, including heart rate, blood pressure, and respiration rate, appear on the screen. With Arya’s pioneering Health Vitals Monitoring Model, this is the new reality!

In this tech blog, we delve into the design of our Health Vitals Monitoring Model, which draws inspiration from the DeepPhys model.

What is Photoplethysmography (PPG)?

PPG is a simple optical technique used to detect volumetric changes in blood in the peripheral circulation. It is a non-invasive method that takes measurements at the surface of the skin. The basic idea behind camera-based photoplethysmography (PPG) is the same as that of a pulse oximeter: subtle, periodic changes in skin color caused by blood flow carry the pulse signal, which our model extracts for real-time tracking. (The classic green-channel baseline is sketched below.)
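To make the idea concrete, here is a minimal sketch of that classic green-channel baseline (not Arya’s production pipeline): average the green channel over a detected face region in each frame, then read the heart rate off the dominant frequency peak. The OpenCV Haar face detector and the 0.7–4 Hz heart-rate band are illustrative choices.

```python
# Minimal green-channel rPPG sketch. Assumes opencv-python and numpy.
import cv2
import numpy as np

def estimate_heart_rate_bpm(video_path: str, fps: float = 30.0) -> float:
    cap = cv2.VideoCapture(video_path)
    face_detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    signal = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_detector.detectMultiScale(gray, 1.3, 5)
        if len(faces) == 0:
            continue  # skip frames with no detected face
        x, y, w, h = faces[0]
        # Mean green-channel intensity over the face region carries the pulse.
        signal.append(frame[y:y + h, x:x + w, 1].mean())
    cap.release()

    s = np.asarray(signal) - np.mean(signal)      # remove the DC component
    freqs = np.fft.rfftfreq(len(s), d=1.0 / fps)  # frequency axis in Hz
    power = np.abs(np.fft.rfft(s)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)        # 42-240 BPM plausibility band
    return 60.0 * freqs[band][np.argmax(power[band])]
```

This naive baseline degrades quickly under motion and lighting changes, which is exactly the gap deep models like DeepPhys were designed to close.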

Figure: Similarities and differences in how light interacts with the skin in contact and non-contact methods

Delving into the DeepPhys Model-

DeepPhys represents a paradigm shift in biometric analysis by harnessing deep learning, specifically convolutional attention networks, to extract spatial and temporal information from facial video data and accurately predict the underlying PPG signal. An appearance branch learns where to look, producing attention masks that gate a motion branch fed with normalized frame differences.

Figure: DeepPhys model architecture
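For intuition, below is a simplified PyTorch sketch of the DeepPhys idea: the appearance branch computes soft attention masks that gate the motion branch, and the head regresses the per-frame derivative of the PPG signal. Layer widths and the two-stage depth are illustrative, not the paper’s exact configuration.

```python
# Simplified DeepPhys-style two-branch network (illustrative sizes).
import torch
import torch.nn as nn

class AttentionMask(nn.Module):
    """1x1 conv + sigmoid, L1-normalized as in the DeepPhys paper."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        mask = torch.sigmoid(self.conv(x))
        hw = mask.shape[2] * mask.shape[3]
        return hw * mask / (2 * mask.sum(dim=(2, 3), keepdim=True))

class DeepPhysSketch(nn.Module):
    def __init__(self):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1), nn.Tanh(),
                nn.Conv2d(cout, cout, 3, padding=1), nn.Tanh())
        self.motion1, self.motion2 = block(3, 32), block(32, 64)
        self.appear1, self.appear2 = block(3, 32), block(32, 64)
        self.mask1, self.mask2 = AttentionMask(32), AttentionMask(64)
        self.pool = nn.AvgPool2d(2)
        self.head = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(128), nn.Tanh(), nn.Linear(128, 1))

    def forward(self, diff_frames, raw_frames):
        # Appearance features gate the motion features at two depths.
        a1 = self.appear1(raw_frames)
        m1 = self.pool(self.motion1(diff_frames) * self.mask1(a1))
        a2 = self.appear2(self.pool(a1))
        m2 = self.pool(self.motion2(m1) * self.mask2(a2))
        return self.head(m2)  # per-frame d(PPG)/dt estimate
```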

Custom Modifications for Enhanced Performance-

Building upon the DeepPhys framework, we have meticulously tailored our model with several key modifications to elevate its capabilities and adaptability across diverse scenarios:

1. Multi-Modal Fusion

  • Integration of Additional Sensors: Beyond facial video data, our model incorporates insights from complementary sensors such as accelerometers or ambient light sensors to enrich the input modality and enhance signal accuracy.
  • Fusion Mechanisms: Employing attention mechanisms or fusion networks to combine information from multiple modalities, bolstering the robustness and accuracy of PPG signal prediction (a sketch follows this item).
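As a sketch of what such fusion can look like, the snippet below attention-weights per-modality embeddings (e.g., video and accelerometer features) before a regression head. The 128-dimensional embeddings and two-modality setup are assumptions for illustration, not Arya’s actual interface.

```python
# Attention-weighted fusion of per-modality embeddings (illustrative).
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one attention score per modality
        self.head = nn.Linear(dim, 1)   # fused embedding -> PPG estimate

    def forward(self, embeddings):
        # embeddings: (batch, n_modalities, dim)
        weights = torch.softmax(self.score(embeddings), dim=1)  # (B, M, 1)
        fused = (weights * embeddings).sum(dim=1)               # (B, dim)
        return self.head(fused)

# Usage: stack the modality embeddings along dim=1.
# out = ModalityFusion()(torch.stack([video_emb, accel_emb], dim=1))
```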

2. Transfer Learning

  • Leveraging Pretrained Feature Extractors: We capitalize on the knowledge encoded in CNNs pre-trained on extensive image datasets like ImageNet to initialize the feature extraction layers, expediting the learning process.
  • Fine-Tuning Strategies: Through fine-tuning on domain-specific facial video data, we adapt the pre-trained CNNs to PPG signal modeling, refining feature representations and improving overall performance (a sketch follows this item).
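A minimal sketch of this recipe, assuming a recent torchvision and using ResNet-18 purely for illustration: initialize from ImageNet weights, freeze the generic early layers, and swap the classifier for a regression head before fine-tuning on facial video frames.

```python
# Transfer-learning sketch: ImageNet-pretrained backbone, partial freezing.
import torch.nn as nn
from torchvision import models

def build_finetune_backbone(n_outputs: int = 1) -> nn.Module:
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    # Freeze generic low-level features; leave later blocks trainable.
    for name, param in backbone.named_parameters():
        if name.startswith(("conv1", "bn1", "layer1", "layer2")):
            param.requires_grad = False
    # Replace the 1000-class head with a PPG regression head.
    backbone.fc = nn.Linear(backbone.fc.in_features, n_outputs)
    return backbone
```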

3. Self-Supervised Learning

  • Temporal Context Prediction: Introducing self-supervised learning objectives such as predicting future video frames or reconstructing temporal context, which encourages the model to capture long-term temporal dependencies (a sketch follows this item).
  • Auxiliary Tasks: Incorporating auxiliary tasks like heart rate estimation or facial expression recognition as self-supervised learning objectives to facilitate robust feature learning and generalization.
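The snippet below sketches the first of these objectives, temporal context prediction: a small GRU summarizes a window of frame embeddings and is trained to predict the next frame’s embedding. The encoder producing the embeddings and the 128-dimensional size are assumed for illustration.

```python
# Self-supervised temporal context prediction (illustrative shapes).
import torch
import torch.nn as nn

class TemporalContextPredictor(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, frame_embeddings):
        # frame_embeddings: (batch, time, dim); summarize, then project.
        _, hidden = self.gru(frame_embeddings)
        return self.proj(hidden[-1])

def temporal_context_loss(model, embeddings):
    # Context = all frames but the last; target = the last frame's embedding.
    context, target = embeddings[:, :-1], embeddings[:, -1]
    return nn.functional.mse_loss(model(context), target)
```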

4. Adversarial Training

  • Harnessing Generative Adversarial Networks (GANs): Integrating GANs into the training process enables the generation of synthetic video samples mirroring real-world variations in facial appearance, pose, and illumination conditions.
  • Adversarial Regularization: By incorporating adversarial loss terms, we fortify the training process, enhancing the model’s resilience to domain shifts and adversarial attacks (a sketch follows this item).
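As a sketch of the regularization term, the snippet below adds a small discriminator that tries to separate real from synthetic (or domain-shifted) features, and folds its loss into the feature extractor’s objective, gradient-reversal style. The feature dimension and loss weight are illustrative assumptions.

```python
# Adversarial regularization sketch (illustrative sizes and weight).
import torch
import torch.nn as nn

# Small discriminator: feature vector -> real/synthetic logit.
discriminator = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

def feature_extractor_loss(task_loss, feats_real, feats_synth, lam=0.1):
    bce = nn.functional.binary_cross_entropy_with_logits
    logits_real = discriminator(feats_real)
    logits_synth = discriminator(feats_synth)
    adv = (bce(logits_real, torch.ones_like(logits_real)) +
           bce(logits_synth, torch.zeros_like(logits_synth)))
    # The feature extractor is rewarded for *confusing* the discriminator;
    # the discriminator itself minimizes adv with its own optimizer.
    return task_loss - lam * adv
```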

Implementing a Multithreaded Approach for Processing Video Frames-

To further amplify the efficiency and performance of our model, we have implemented a multi-threaded approach for processing video frames. By leveraging multi-core CPUs and exploiting parallelism across frames, this approach accelerates frame-level computations, maximizing computational resources and expediting the overall processing pipeline. A sketch of the idea follows.
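This minimal sketch assumes a per-frame preprocessing step (face crop, resize, normalization): ThreadPoolExecutor.map fans frames out across worker threads while preserving their order, which matters because the PPG estimate is a time series. OpenCV’s native routines release Python’s GIL, so the threads genuinely overlap.

```python
# Multithreaded per-frame preprocessing (illustrative preprocess step).
from concurrent.futures import ThreadPoolExecutor

import cv2
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    # Placeholder per-frame work: resize and scale to [0, 1].
    return cv2.resize(frame, (128, 128)).astype(np.float32) / 255.0

def process_frames(frames: list, workers: int = 8) -> list:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, keeping the PPG time series aligned.
        return list(pool.map(preprocess, frames))
```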

Our customized DeepPhys model represents a significant advancement in video-based PPG signal modeling, offering improved performance and adaptability to diverse scenarios. By incorporating multi-modal fusion, transfer learning, self-supervised learning, and adversarial training techniques, the model can effectively leverage additional information sources, learn robust feature representations, and mitigate common challenges in real-world applications.


Here’s the result: Arya’s health vitals monitoring AI model.

Explore the API now: https://api.arya.ai/health-vitals-monitor

References-

  1. Chen, W., & McDuff, D. (2018). DeepPhys: Video-Based Physiological Measurement Using Convolutional Attention Networks. In Proceedings of the European Conference on Computer Vision (ECCV).
  2. Chen, W., Zhang, Y., Liu, Y., & Zhang, Q. (2019). Photoplethysmograph signal processing and its applications in healthcare monitoring. Artificial Intelligence in Medicine, 94, 101–114.
  3. Elgendi, M. (2012). On the analysis of fingertip photoplethysmogram signals. Current Cardiology Reviews, 8(1), 14–25.
  4. Wang, W., den Brinker, A. C., Stuijk, S., & de Haan, G. (2017). Algorithmic principles of remote PPG. IEEE Transactions on Biomedical Engineering, 64(7), 1479–1491.
