How I took third place in a Telegram contest and won $2,000

A brief introduction.

Viktor Grushevskii
iOS Dev — Mobile Development
5 min read · Mar 23, 2022


Telegram announced a contest for developers to create a library that could be used to transform faces in real time.

Task: Create a cross-platform C++ module for correcting appearance. Create an app for iOS or Android that demonstrates how this module works by converting video from the front camera in real time.
Similar functionality is implemented in applications such as Instagram and Snapchat, and may involve smoothing facial skin, hiding blemishes and, where necessary, enlarging the eyes or reducing the nose, chin or ears, i.e. bringing facial proportions toward universal standards of attractiveness. The module should work correctly regardless of gender, age and skin color. The result of the transformation should be perceived by the viewer as natural, not conspicuous compared to the original. The goal is to create a barely noticeable filter that would allow users to present a “better version of themselves” to the interlocutor.

Reflections: was it worth participating?

The barrier to entry (judging by my experience) is pretty high, but that’s the road less traveled, right? Let me say right away that before this contest I had little experience in this field. I had only distantly heard of ARKit and Metal, and I had previously participated in a contest in which I used CoreML to find and cut out regions in static pictures.

Where to start?

For this task I studied several articles about ARKit, as well as GitHub examples with more or less workable things. But in that time I managed at most either to create a surface mask working in real time, or to use pre-generated models to overlay the eyes or the nose, for example. For that, I studied these resources:

Apple developer site

Face recognition with front-facing camera, virtual content overlay, and real-time facial expression animation.

Sample by Ray Wenderlich

A clear example of the approach I had originally planned to use, though it has some limitations as well.

An example of how the algorithm works

Github

This repository lets you roughly visualize how the face mesh is divided into sectors, to which further manipulations can be applied.

An example with a list of available “sectors”

The difficulties with this approach were clear in advance: the number of vertices is strictly fixed for now at 1220, but may change later. So even a temporary solution might break in future iOS versions, and it is hard to call it cross-platform, as the task requires.
ARFaceGeometry has 1220 vertices; index 9 is on the nose, for example.

The solution approach I chose for the contest.

So I decided to enter the competition using only native, out-of-the-box techniques, in particular adapting the module to use a bilateral filter, which differs from other ways of smoothing: it takes a weighted sum of the surrounding pixels, relying both on the distance to the selected pixel and on how close the pixels are in color. You can read a bit about it here (article in Russian).

Bilateral filter

Under the contest terms, or at least the way I understood them, I can’t share the source code. But I’m happy to share the resources that got me thinking.

Apple example

This example shows how to apply a pink-tinted filter using Core Image and Metal. It also shows how to render depth, and a smoothed depth effect, over the capture stream using a grayscale filter.

An example of how the algorithm works on the Apple website

Just a pink filter would clearly be insufficient for the contest, so I continued my research on possible options.
Metal renders advanced 3D graphics and performs parallel computing on the GPU. Thanks to Metal, high-performance image processing can be done on iOS/macOS. An image is represented as a matrix, where each element holds the color of one pixel.
Like a Gaussian filter, a bilateral filter is also defined as a weighted average of neighboring pixels. The difference is that the bilateral filter additionally accounts for variation in luminance, as shown below.
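For reference, the standard bilateral filter weight (in my own notation: gid is the position of the target pixel, p a neighboring pixel, and I the luminance) combines a spatial term and a range term:

```latex
W(\mathrm{gid},\, p) \;=\; \exp\!\left(-\frac{\lVert p - \mathrm{gid} \rVert^{2}}{2\sigma_s^{2}}\right)\,\exp\!\left(-\frac{\Delta^{2}}{2\sigma_r^{2}}\right),\qquad \Delta = I(p) - I(\mathrm{gid})
```

Here σs controls the spatial falloff (how far neighbors still matter) and σr controls how sensitive the weight is to luminance differences.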

Here:

The weight W is constant in the Gaussian filter, but in the bilateral filter it depends on the position of the target pixel (gid in the Metal kernel).
The delta parameter measures how much the color or brightness of a neighboring pixel differs from that of the central pixel. If delta is large enough, the weight term returns almost 0, i.e. that neighbor barely contributes and the output value stays almost the same as the original.
The Gaussian filter does not take luminance information into account, so edges can get lost.
The bilateral filter, in contrast, only gives real weight to neighboring pixels whose brightness is close to that of the central pixel, so edges are preserved.
To control the parameters, I created a class that takes the radius and sigma values as inputs, so you can tweak them in real time. You can read more about this method here.
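Since I can’t share the contest sources, here is a minimal, hypothetical sketch of the same idea as a plain cross-platform C++ function (the function name and parameters are mine, not the contest module’s): a single-channel bilateral filter driven by a radius and two sigma values.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// A sketch of a single-channel bilateral filter (not the contest module).
// `image` is a row-major grayscale buffer with values in [0, 1].
std::vector<float> bilateralFilter(const std::vector<float>& image,
                                   int width, int height,
                                   int radius,        // window half-size
                                   float sigmaSpace,  // spatial falloff
                                   float sigmaColor)  // luminance falloff
{
    std::vector<float> out(image.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float center = image[y * width + x];
            float sum = 0.0f, weightSum = 0.0f;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    // Clamp coordinates at the image border.
                    const int nx = std::clamp(x + dx, 0, width - 1);
                    const int ny = std::clamp(y + dy, 0, height - 1);
                    const float neighbor = image[ny * width + nx];
                    // Spatial weight: distance to the central pixel.
                    const float ws = std::exp(-(dx * dx + dy * dy)
                                              / (2.0f * sigmaSpace * sigmaSpace));
                    // Range weight: luminance difference (the "delta").
                    const float d = neighbor - center;
                    const float wr = std::exp(-(d * d)
                                              / (2.0f * sigmaColor * sigmaColor));
                    sum += ws * wr * neighbor;
                    weightSum += ws * wr;
                }
            }
            out[y * width + x] = sum / weightSum;  // center weight is 1, never 0
        }
    }
    return out;
}
```

With a small sigmaColor, pixels on the other side of a strong edge get near-zero weight, so edges survive while flat regions are smoothed — which is exactly why this filter suits skin smoothing better than a plain Gaussian blur.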

The result of the algorithm

I fully agree with the jury that both the algorithm and the application need improvement, but I hope I at least managed to share a direction for approaching this problem.

A version of this post is also available in Russian. If you found the information interesting you can support me by subscribing to my Telegram channel: https://t.me/iOS_Career. Thanks for reading and have a nice day!


iOS Developer. I create apps, design and sometimes travel. You could find some of my thoughts here: https://t.me/iOS_Career