Challenges of Face Tracking and Background Segmentation in Unity in Real Time

Dave Gordon
4 min read · Mar 15, 2023


Unity background subtraction

Unity has become a hugely popular game engine and development platform thanks to its ease of use, versatility, and flexibility. It is a powerful tool that enables developers to create games, AR/VR applications, simulations, and other interactive experiences for a variety of platforms, including desktop, mobile, console, and web. One of the key factors driving Unity's popularity is its accessibility: it offers a low barrier to entry for aspiring developers. However, real-time background segmentation still poses some real challenges. Let's explore!

Background Subtraction in Unity: Case Studies

Before we dive deeper into the topic, let's get acquainted with some renowned brands and companies that have leveraged background subtraction in Unity:

Banuba

Banuba provides an SDK with real-time background subtraction and face-tracking capabilities for Unity. Its technology powers face filters and virtual backgrounds in a range of mobile video and camera apps.

Volkswagen

Volkswagen used Unity to create a virtual reality showroom that lets customers explore its cars digitally. The showroom uses real-time background subtraction and object tracking to place the vehicles in a virtual environment.

Magic Leap

Magic Leap developed a mixed-reality headset that uses real-time background subtraction and object recognition to place virtual objects in the real world. The headset uses Unity as its development platform.

Overall, background subtraction in Unity is used in various industries and projects, including mobile apps, robotics, automotive, medical, and mixed reality. It has enabled developers to create immersive experiences seamlessly blending virtual and real-world elements.

Challenges of Real-time Background Segmentation in Unity

Even though Unity is one of the most popular game engines, developers still run into recurring problems with background and face segmentation. Let's consider the most common ones.

Lighting Conditions

One of the biggest challenges in face tracking is the effect of lighting on the face. Changes in lighting conditions can alter how a face appears to the camera, making it difficult for the system to track it accurately.

Occlusions

Another challenge in face tracking is occlusions, where objects or other parts of the body block the view of the face. This can cause the system to lose track of the face, leading to errors in tracking.

Different Faces

Face tracking systems need to accurately track different faces, regardless of gender, age, ethnicity, or facial hair. This requires a lot of training data and sophisticated algorithms.

Background Noise

Real-time background segmentation can be challenging due to noise in the background, such as moving objects or changes in lighting, which leads to errors in separating the foreground from the background.
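
To make the failure mode concrete, here is a minimal Python/OpenCV sketch (outside Unity, purely illustrative) of the naive approach, frame differencing, where any pixel change (lighting flicker, sensor noise, a moving background object) ends up in the foreground mask. The webcam index and threshold value are arbitrary:

```python
import cv2

cap = cv2.VideoCapture(0)                  # assumes a default webcam at index 0
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Any pixel that changed between frames is marked "foreground",
    # so sensor noise, flicker, and background motion all leak in.
    diff = cv2.absdiff(gray, prev_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)  # 25 is arbitrary
    cv2.imshow("noisy foreground mask", mask)
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == 27:        # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```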

Real-time Processing

Real-time processing adds an extra layer of complexity to face tracking and background segmentation. It requires the system to process the video stream in real time while maintaining accuracy and avoiding lag.
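
To put the constraint in numbers: at 30 fps, the entire pipeline (capture, tracking, segmentation, compositing) has roughly 33 ms per frame. Below is a minimal, hypothetical Python sketch for checking that budget; `process_frame` is a stand-in for whatever pipeline you actually run:

```python
import time

FRAME_BUDGET_MS = 1000 / 30  # ~33.3 ms per frame at 30 fps

def run_within_budget(process_frame, frame):
    """Time one pipeline step and warn when it misses the frame budget.
    `process_frame` is a placeholder for real tracking/segmentation work."""
    start = time.perf_counter()
    result = process_frame(frame)
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > FRAME_BUDGET_MS:
        print(f"Over budget: {elapsed_ms:.1f} ms > {FRAME_BUDGET_MS:.1f} ms")
    return result
```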

System Requirements

Face tracking and background segmentation require significant computational power, which can be a challenge for lower-end devices. This can limit the effectiveness of the system, particularly in real-time applications.

Dynamic Environments

Finally, face tracking and background segmentation in Unity must handle dynamic environments where the camera or the objects in the scene move. This can be particularly challenging for real-time applications, where the system needs to keep up with the changes in the scene.

Face tracking and background segmentation in Unity in real time are complex, challenging tasks that require sophisticated algorithms, significant computational power, and careful tuning to achieve accurate results. Can Unity remove a background with a flick of a finger? Definitely not. However, there are solutions that overcome the pitfalls and simplify the process.

Solutions to Overcome the Challenges

We have reviewed the common challenges, so let's move on to the solutions.

  • Some face-tracking SDKs and libraries, such as Banuba, use advanced computer vision techniques to handle changes in lighting conditions. For example, they may use adaptive thresholding or color-based segmentation to separate the face from the background even under changing lighting (see the adaptive-thresholding sketch after this list).
  • One way to overcome occlusions is to use multiple cameras or depth sensors to capture a complete scene view. Another approach is to use machine learning algorithms that can predict the location of the face even when parts of it are occluded.
  • Face-tracking SDKs and libraries often use deep learning models trained on large datasets of faces from different populations, which helps the system accurately track many types of faces. Developers can also train their own models on custom datasets if needed (a pretrained-model sketch follows this list).
  • To deal with background noise, some background segmentation techniques use temporal filtering or morphological operations to smooth the foreground/background segmentation over time, reducing the impact of noise in the scene (see the temporal-filtering sketch after this list).
  • Optimizing the algorithms and using hardware acceleration where possible is important to ensure real-time processing. For example, Banuba’s face-tracking SDK uses GPU acceleration to achieve real-time performance on mobile devices.
  • To overcome limits in computational power, developers can optimize the algorithms and use lower-resolution inputs where possible (see the downscaling helper after this list). They can also use cloud-based processing to offload some of the computational load to remote servers.
  • To handle dynamic environments, some SDKs and libraries use object-tracking algorithms that can track the camera’s movement or other objects in the scene. They may also use multiple cameras or depth sensors to capture a more complete view of the scene.
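
As referenced in the first bullet, here is roughly what adaptive thresholding looks like in Python/OpenCV. This is a sketch of the general technique, not Banuba's actual implementation; the file names and parameter values are placeholders:

```python
import cv2

gray = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input

# Instead of one global threshold, compute a threshold per local
# neighborhood so regions stay separable even when illumination
# varies across the frame.
mask = cv2.adaptiveThreshold(
    gray, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,  # local Gaussian-weighted mean minus C
    cv2.THRESH_BINARY,
    31,  # blockSize: odd neighborhood size, tune per input
    5,   # C: constant subtracted from the local mean
)
cv2.imwrite("mask.png", mask)
```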
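
The pretrained-model sketch promised above: MediaPipe ships a selfie-segmentation model trained on large person datasets, which stands in here for the kind of deep learning model the bullet describes. This is an assumption on my part (Banuba's models are proprietary), and it uses MediaPipe's legacy Solutions API, which may differ between versions:

```python
import cv2
import mediapipe as mp

seg = mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1)

frame = cv2.imread("person.jpg")              # hypothetical input
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
result = seg.process(rgb)

# segmentation_mask is a float32 map in [0, 1]; threshold it to get
# a binary person/background mask.
mask = result.segmentation_mask > 0.5
frame[~mask] = (0, 255, 0)                    # green-screen the background
cv2.imwrite("composited.png", frame)
```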
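
And the temporal-filtering sketch: OpenCV's MOG2 subtractor maintains a per-pixel Gaussian-mixture background model over time, and morphological opening and closing then clean the resulting mask. Again, this is a generic sketch; the file name and parameters are placeholders:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")  # hypothetical clip
# Temporal filtering: the background model accumulates over `history` frames.
subtractor = cv2.createBackgroundSubtractorMOG2(history=300, detectShadows=False)
kernel = np.ones((5, 5), np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = subtractor.apply(frame)
    # Opening removes isolated noise speckles; closing fills small
    # holes inside the foreground region.
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)
    fg = cv2.morphologyEx(fg, cv2.MORPH_CLOSE, kernel)
    cv2.imshow("cleaned mask", fg)
    if cv2.waitKey(1) & 0xFF == 27:  # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```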
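
Finally, the downscaling helper: running segmentation at half resolution cuts the pixel count by a factor of four, and the mask can be scaled back up afterwards. The `segment_fn` callback is hypothetical; plug in whatever model or algorithm produces a binary mask:

```python
import cv2

def segment_at_low_res(frame, segment_fn, scale=0.5):
    """Run `segment_fn` on a downscaled copy of `frame`, then upscale
    the resulting mask back to the original resolution."""
    h, w = frame.shape[:2]
    small = cv2.resize(frame, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_AREA)
    mask_small = segment_fn(small)
    # Nearest-neighbor interpolation keeps the upscaled mask binary.
    return cv2.resize(mask_small, (w, h), interpolation=cv2.INTER_NEAREST)
```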

Overall, developers can overcome the challenges of face tracking and background segmentation in Unity by using advanced computer vision techniques, deep learning models, and hardware acceleration. They can also optimize the algorithms, use cloud-based processing, and leverage multiple cameras or sensors to handle dynamic environments.
