iOS Image Face Centering Using Apple’s Vision Framework

In this tutorial, you’ll learn how to utilize Apple’s Vision Framework in your iOS app to achieve a better user experience by detecting and centering faces in your images.

Kirill Kudaev
Ancestry Product & Technology
6 min read · Oct 6, 2020


One of the latest Ancestry app features: the Top Picks section.

Problem

During the development of multiple Ancestry app features, our team kept running into the same issue: we often had to display images with varying aspect ratios inside an ImageView whose aspect ratio was fixed.

The image below illustrates the problem this creates. On the left, you can see the initial uncropped image we get from our server. In the middle, you can see what happens if we try to display that image within an ImageView that has a fixed aspect ratio. The boundaries of the ImageView are marked in red.

Initial image (left). Trying to set that image to a horizontal ImageView (middle). Desired result (right).

The ImageView uses the aspectFill content mode, so it crops the top and bottom of our vertical image. The user only sees the content inside the red border. As a result, our users don’t see the faces of the people we’re trying to show them, which significantly hurts the experience of our product. Not good.

So how can we fix this? Wouldn’t it be nice to somehow identify the position of the faces in our image and then crop the image accordingly? The desired result is illustrated on the right in green.

Okay, great… but how can we achieve this behavior? The good news is that with iOS 11, Apple released the Vision Framework, which has face detection capabilities built in!

Solution

Let’s utilize Apple’s Vision Framework to detect and center faces in our images. First of all, we should understand what the Vision API has to offer in our case:

VNImageRequestHandler allows you to process image analysis requests by calling its perform(_:) function and passing it an array of requests, each of which is a subclass of VNRequest.

VNDetectFaceRectanglesRequest is one of those requests. This specific request allows us to find faces and their coordinates within an image. Exactly what we need in our case.

VNFaceObservation is a type of observation that results from a VNDetectFaceRectanglesRequest. This object contains the necessary facial-feature information.
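Before building anything, here’s a minimal sketch of how these three pieces fit together. It assumes you already have a CGImage named cgImage to analyze; everything else is standard Vision API usage:

```swift
import Vision

// Assumes `cgImage` is an existing CGImage we want to analyze.
let request = VNDetectFaceRectanglesRequest { request, error in
    // Each result is a VNFaceObservation with a normalized bounding box.
    let faces = request.results as? [VNFaceObservation] ?? []
    print("Detected \(faces.count) face(s)")
}

let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
try? handler.perform([request]) // errors are ignored here for brevity
```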

Step One. Detecting the facial-feature information.

Let’s start by getting a VNFaceObservation for each of the faces in our image, without worrying about the image cropping logic yet. We are going to add the following extension to CGImage:
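Here is a sketch of what such an extension can look like. The FaceCropResult enum and the completion-handler-based signature are illustrative assumptions rather than the exact production code, and the numbered comments match the walkthrough below. The margin parameter isn’t used yet; it comes into play in Step Two.

```swift
import CoreGraphics
import Vision

// Illustrative result type; the name and cases are assumptions for this sketch.
enum FaceCropResult {
    case success(CGImage)
    case notFound
    case failure(Error)
}

extension CGImage {
    @available(iOS 11.0, *)
    func faceCrop(margin: CGFloat = 200, completion: @escaping (FaceCropResult) -> Void) {
        // 1. Create the face-detection request.
        let req = VNDetectFaceRectanglesRequest { request, error in
            // 2. Propagate any detection error.
            if let error = error {
                completion(.failure(error))
                return
            }
            // 3. Bail out early if no faces were found.
            guard let results = request.results, !results.isEmpty else {
                completion(.notFound)
                return
            }
            // 4. Collect every VNFaceObservation; we'll need them in Step Two.
            var faces: [VNFaceObservation] = []
            for result in results {
                if let face = result as? VNFaceObservation {
                    faces.append(face)
                }
            }
            // 5. For now, just report the count and return the original image.
            print("Found \(faces.count) face(s)")
            completion(.success(self))
        }
        do {
            // 6. Run the request against this image.
            try VNImageRequestHandler(cgImage: self, options: [:]).perform([req])
        } catch {
            completion(.failure(error))
        }
    }
}
```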

Note: the Vision API is only available on iOS 11 and later. As you can see in the code snippet, we have to mark our functions with @available(iOS 11.0, *).

Here’s what’s going on in the above code:

  1. First of all, we create a VNDetectFaceRectanglesRequest.
  2. If for some reason our request returns an error, let’s return .failure.
  3. If no faces were found, return a .notFound result.
  4. Now we can finally iterate through all of the results and add them to our faces array. We’ll need this array in Step Two.
  5. For now, let’s just print the total number of faces we found and return the original uncropped image in our .success.
  6. Finally, we call our VNDetectFaceRectanglesRequest using the perform(_:) function of VNImageRequestHandler.

Step Two. Cropping the initial image.

Now let’s add some more logic to crop the initial image based on the facial-feature information we got from Step One:
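Here’s a sketch of the updated extension, building on the FaceCropResult enum and faceCrop signature from the Step One sketch. The getCroppingRect helper name is an assumption of ours; the numbered comments line up with the walkthrough below.

```swift
import CoreGraphics
import Vision

extension CGImage {
    @available(iOS 11.0, *)
    func faceCrop(margin: CGFloat = 200, completion: @escaping (FaceCropResult) -> Void) {
        let req = VNDetectFaceRectanglesRequest { request, error in
            if let error = error {
                completion(.failure(error))
                return
            }
            guard let results = request.results, !results.isEmpty else {
                completion(.notFound)
                return
            }
            var faces: [VNFaceObservation] = []
            for result in results {
                if let face = result as? VNFaceObservation {
                    faces.append(face)
                }
            }
            // 9-10. Compute the cropping rect and crop the original image with it.
            let croppingRect = self.getCroppingRect(for: faces, margin: margin)
            guard let faceImage = self.cropping(to: croppingRect) else {
                // 11. cropping(to:) can return nil; treat that as "not found".
                completion(.notFound)
                return
            }
            // 12. Hand back the face-centered image.
            completion(.success(faceImage))
        }
        do {
            try VNImageRequestHandler(cgImage: self, options: [:]).perform([req])
        } catch {
            completion(.failure(error))
        }
    }

    // 1. Returns the CGRect that tells us how to crop the original image.
    @available(iOS 11.0, *)
    private func getCroppingRect(for faces: [VNFaceObservation], margin: CGFloat) -> CGRect {
        // 2. Running totals, needed when more than one face was detected.
        var totalX = CGFloat(0)
        var totalY = CGFloat(0)
        var totalW = CGFloat(0)
        var totalH = CGFloat(0)
        // 3. Track the left-most / bottom-most face; minX feeds the offset below.
        var minX = CGFloat.greatestFiniteMagnitude
        var minY = CGFloat.greatestFiniteMagnitude

        // 4. Iterate through every detected face.
        for face in faces {
            // 5. boundingBox values are normalized (0...1), so scale them by the
            //    image's pixel dimensions to get absolute values.
            let w = face.boundingBox.width * CGFloat(width)
            let h = face.boundingBox.height * CGFloat(height)
            let x = face.boundingBox.origin.x * CGFloat(width)
            // 6. Vision's origin is in the bottom-left; flip it so y is measured
            //    from the top-left corner, matching CGRect's flipped coordinate space.
            let y = (1 - face.boundingBox.origin.y) * CGFloat(height) - h

            totalX += x
            totalY += y
            totalW += w
            totalH += h
            minX = min(minX, x)
            minY = min(minY, y)
        }

        // 7. Average face size and position across all detected faces.
        let count = CGFloat(faces.count)
        let avgX = totalX / count
        let avgY = totalY / count
        let avgW = totalW / count
        let avgH = totalH / count

        // 8. Extra breathing room around the face(s).
        let offset = margin + avgX - minX

        // 9. The rect we'll crop the original image to.
        return CGRect(x: avgX - offset,
                      y: avgY - offset,
                      width: avgW + offset * 2,
                      height: avgH + offset * 2)
    }
}
```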

Let’s walk through the above code:

  1. The responsibility of this function is to return a CGRect that tells us how to crop our initial image.
  2. Let’s look into how this function works. First of all, we initialize our total... variables. These are important for cases when more than one face was detected in our image.
  3. minX and minY variables are used to keep track of the farthest bottom-left face in our image. We’ll need those coordinates later.
  4. Let’s now iterate through all the faces we were able to identify.
  5. Why do we multiply face.boundingBox.width by CGFloat(width)? face.boundingBox.width returns a number between 0 and 1 that represents the face width as a proportion of the full width of the image. We multiply that number by CGFloat(width) to get the absolute width of the face. The same applies to face.boundingBox.height and face.boundingBox.origin.x.
  6. This coordinate space transformation might also seem a bit confusing. First, we compute 1 - face.boundingBox.origin.y to get the relative y position measured from the top of our image instead of the bottom. We then multiply that number by CGFloat(height) and subtract h to get the absolute y coordinate in the flipped coordinate space for our CGRect initialization later on. In the flipped coordinate space, the origin is in the upper-left corner and the rectangle extends toward the lower-right corner.
  7. Calculate average width, height, x, and y coordinates of faces by dividing our total... variables by the number of faces in the image.
  8. This line might seem a bit tricky. Here we calculate how much extra space we want to add around the face(s) that were found. To do that, we take the distance between avgX (the average x coordinate of all faces in the photo) and minX (the x coordinate of the left-most face), then add the custom margin parameter that we’ll talk about later in this article.
  9. We can now use our offset and averages to create a CGRect that represents how we need to crop our initial image.
  10. Now let’s use that CGRect to crop the initial image.
  11. Return .notFound if cropping(to:) returned nil.
  12. Return .success with our new face-centered image!

Let’s talk more about the mysterious margin parameter in our faceCrop(margin: CGFloat = 200, ...) method. This parameter specifies how much extra space to keep on each side of the face before we return the final cropped image. As you can see in the table below, this extra margin prevents us from returning an image that contains nothing but the person’s face:

Importance of margin param for images with one / multiple faces.
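For illustration, here is roughly how a caller might trade tightness for context using the sketches above (again assuming an existing CGImage named cgImage):

```swift
// Smaller margin: a tighter crop around the face itself.
cgImage.faceCrop(margin: 50) { result in
    // handle result
}

// Larger margin (200 is the default in our sketch): more surrounding context.
cgImage.faceCrop(margin: 200) { result in
    // handle result
}
```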

Implementation

Now we can go to our ViewController and take advantage of this new extension:
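A sketch of what this can look like is below. The view controller, its imageView outlet, and the setImage(_:) method are illustrative assumptions; the numbered comments match the notes that follow.

```swift
import UIKit

class FaceCropViewController: UIViewController {
    @IBOutlet private weak var imageView: UIImageView!

    func setImage(_ image: UIImage) {
        // 1. Fall back to the original image if we can't get a CGImage
        //    (or if we're running on a pre-iOS 11 device).
        guard #available(iOS 11.0, *), let cgImage = image.cgImage else {
            imageView.image = image
            return
        }
        // 2. Do the face detection off the main thread.
        DispatchQueue.global(qos: .userInteractive).async {
            cgImage.faceCrop { [weak self] result in
                DispatchQueue.main.async {
                    switch result {
                    case .success(let croppedImage):
                        // 3. Display the new face-centered crop.
                        self?.imageView.image = UIImage(cgImage: croppedImage)
                    case .notFound, .failure:
                        // 4. No faces found (or an error): show the original image.
                        self?.imageView.image = image
                    }
                }
            }
        }
    }
}
```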

There are a few things to note here:

  1. First of all, we get the CGImage from our UIImage. If that doesn't work, we just display our original uncropped image.
  2. We then switch to a global queue with the highest .userInteractive priority to call our faceCrop() method.
  3. If the face cropping worked successfully, we then switch back to the main queue and display the new cropped image.
  4. If our faceCrop() method didn’t find any faces or crop the image for some other reason, we just display the original uncropped image.

That’s it. We have successfully detected and centered faces in our image!

Conclusion

Our team uses face centering in both the Ancestry and AncestryDNA iOS apps. This approach has helped us significantly improve the quality of the images we display to our users. The best part is, we were able to achieve all of that in fewer than 100 lines of code!

We hope this article will inspire you to utilize this approach and improve the user experience of your app as well. Feel free to try this code in your project and please let us know what you think!

Where to Go From Here?

Consider visiting our FaceCrop repo on GitHub.

You can also check out this WWDC 2019 video for a deeper dive into VNImageRequestHandler and VNFaceObservation functionality.

Finally, you can always take advantage of Apple’s documentation to learn more about other Vision API features.

Big thanks to Anastasios Grigoriou for his contribution to this project.

If you’re interested in joining Ancestry, we’re hiring! Feel free to check out our careers page for more info.
