iOS — Camera Frames Extraction

Written by Boris Ohayon | Published in iOS App Development | January 3, 2017 | 14 min read

Today, we learn how to access each frame of the camera feed by mastering some parts of the AVFoundation Framework 💪

Want to do some image processing? Computer vision? Then accessing every frame is the first step! What we are going to build together here is a simple class that notifies the caller every time a frame is available. The caller is then free to do whatever they wish with the given images, in real time 💋

Without further ado, let’s jump right in!

TL;DR? This is what we create in this tutorial!

Step 1 — 👋 Create a new iOS project

Create a new project, Single View Application, nothing new here. Give it the name of your choosing, FrameExtraction for example. Select Swift as the language and you’re done.

Step 2 — 🎥 Our Foundation: AVFoundation

For this demo project, we’ll stick with the Storyboard and the already created ViewController.swift for the UI. The heart of our program will be in a new Swift file that we will name FrameExtractor.swift.

Create this file

The first and only line generated by Xcode is import Foundation. The framework we need in order to use the camera and extract the frames is AVFoundation.

Replace the present line with the following

Create the hero of the day - FrameExtractor

Let’s recap what our algorithm is supposed to do.

It needs to access the camera
It should be customizable (front/back camera, orientation, quality…)
It should return every frame captured

Step 3 — Present 🤗 AVCaptureSession

Inside the AVFoundation framework, our best friend is the AVCaptureSession. The session coordinates the flow of data from the input to the output.

Create a strong reference to the capture session as an attribute of the class

Some of the things we are going to do with the session must take place asynchronously. Because we don’t want to block the main thread, we need to create a serial queue that will handle the work related to the session. To create a serial queue, let’s use the DispatchQueue initializer and give it the label session queue. We will add a reference to this queue as an attribute at the beginning of the FrameExtractor class, so that we can access it later and suspend or resume it when need be.

Note that the name we give to that queue is just a way to track it later. If you happen to create two queues with the same label, they would still remain two different queues.
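As a minimal sketch, the class might look like this at this point. The property names captureSession and sessionQueue are assumptions chosen for illustration. The last lines only illustrate the remark about labels and are not part of the class.

```swift
import AVFoundation

class FrameExtractor {
    // Strong reference to the session that coordinates input and output.
    private let captureSession = AVCaptureSession()
    // A DispatchQueue created with only a label is serial by default.
    private let sessionQueue = DispatchQueue(label: "session queue")
}

// The label is only a debugging aid: two queues sharing a label stay distinct.
let first = DispatchQueue(label: "session queue")
let second = DispatchQueue(label: "session queue")
print(first === second) // prints "false"
```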

Step 4 — 👮 Permission to film?

In order to access the camera, the app is required to ask permission from the user. Inside AVFoundation, we can find a class named AVCaptureDevice that holds the properties pertaining to the underlying hardware. This class also remembers if the user previously authorized the use of the capture device through the authorizationStatus() function. This function returns a constant of an enum named AVAuthorizationStatus which can hold several values.

The cases that interest us are .authorized and .notDetermined.

If the user has already granted permission for media capture, we don’t need to do anything
If the permission is denied (the user previously refused media capture) or restricted (the user is not allowed to change it), we decide not to ask again
If the user has not yet granted or denied permission, we prompt them

After making a choice, the user can go to the phone’s settings at any time and change the app’s permission.

Declare a class variable to track if the permission is granted

Before checking the authorizationStatus, we need to put our code somewhere! We want to check this as soon as the FrameExtractor object is created.

Create an init for the class, with no argument for now

Fill in the previous block with the cases that interest us

AVCaptureDevice also has a method named requestAccess that prompts the user for permission and takes as an argument a completion handler that is called once the user has chosen to grant or deny permission.

Complete the permission request block
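A sketch of what the initializer could look like at this stage, before the unowned and queue refinements discussed next. It assumes the Swift 4 and later spellings authorizationStatus(for:) and requestAccess(for:completionHandler:), and a property named permissionGranted, which is an illustrative name.

```swift
import AVFoundation

class FrameExtractor {
    private var permissionGranted = false

    init() {
        switch AVCaptureDevice.authorizationStatus(for: .video) {
        case .authorized:
            // The user already granted access to the camera.
            permissionGranted = true
        case .notDetermined:
            // The user has not been asked yet: prompt now.
            // The handler is called later, on an arbitrary queue.
            AVCaptureDevice.requestAccess(for: .video) { granted in
                self.permissionGranted = granted
            }
        default:
            // .denied or .restricted: do not ask again.
            permissionGranted = false
        }
    }
}
```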

Here, we need to watch out for retain cycles as we are referring to self in the completion handler: declare self as unowned inside the block. In this particular code, there’s no actual need for unowned self: even though the closure retains self, self doesn’t retain the closure anywhere, and once the closure has run, it releases its hold on self. But because we might later add code that retains the closure, it is a good habit to keep.

We might also wonder why unowned and not weak? It is always recommended to use weak when it is not implicit that self outlives the closure. Here, it seems like we can be pretty sure that the closure will not outlive self, so we don’t need to declare it weak (and drag along a self that would now be optional). Feel free to read some articles about weak and unowned on the web, there are some quality explanations out there.

Declare self to be unowned

If ever we end up in the .notDetermined case, because the call to requestAccess is asynchronous (on an arbitrary dispatch queue), we need to suspend the session queue and resume it once we get a result from the user.

Suspend and resume the queue
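The suspend and resume mechanism can be tried in isolation, without any camera involved. In this small sketch, the events array and its strings are stand-ins invented for the illustration: work submitted to a suspended queue is held back until resume() is called.

```swift
import Dispatch

let sessionQueue = DispatchQueue(label: "session queue")
var events: [String] = []

sessionQueue.suspend()
// Work submitted while the queue is suspended is held back, not dropped.
sessionQueue.async { events.append("configure session") }

// Stand-in for the user answering the permission prompt.
events.append("user answered")
sessionQueue.resume()

// Wait for the queued block to finish so the order can be inspected.
sessionQueue.sync { }
print(events) // ["user answered", "configure session"]
```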

Once we get out of the permission asking mechanism, we have to let the capture session previously created know that it can continue the configuration process.

When going into the .notDetermined case, the session queue is suspended and it is only resumed once we get an answer from the user. Hence, when the execution flow gets out of the switch, the session queue might still be suspended. To continue configuring the session only when we have a valid decision from the user, we can put the work inside the session queue in an async way. Once the session queue resumes, it will do the work inside this block. Again, as we are using self in the completion handler, we must not forget to use unowned.

Add the async configuration on the session queue

Clean the initializer and create specific methods.
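Put together, the cleaned-up class might look like the sketch below. The method names checkPermission() and requestPermission() are assumptions, and configureSession() is left as a stub that the following steps fill in.

```swift
import AVFoundation

class FrameExtractor {
    private var permissionGranted = false
    private let sessionQueue = DispatchQueue(label: "session queue")
    private let captureSession = AVCaptureSession()

    init() {
        checkPermission()
        // Runs right away, or after resume() if the user had to be prompted.
        sessionQueue.async { [unowned self] in
            self.configureSession()
        }
    }

    private func checkPermission() {
        switch AVCaptureDevice.authorizationStatus(for: .video) {
        case .authorized:
            permissionGranted = true
        case .notDetermined:
            requestPermission()
        default:
            permissionGranted = false
        }
    }

    private func requestPermission() {
        // Hold back session work until the user has answered the prompt.
        sessionQueue.suspend()
        AVCaptureDevice.requestAccess(for: .video) { [unowned self] granted in
            self.permissionGranted = granted
            self.sessionQueue.resume()
        }
    }

    private func configureSession() {
        guard permissionGranted else { return }
        // Input and output are added in the next steps.
    }
}
```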

One last thing to note with permission asking. Starting in iOS 10, your app will crash if you don’t specify in the plist a string describing the usage of the permission you are asking. The error will look like The app’s Info.plist must contain an NSCameraUsageDescription key.

Head over to the Info.plist file and add an NSCameraUsageDescription key, with the value Used to capture frames, or any other text that fits the purpose of your app.

Step 5 — 💁 Customize me!

We are now allowed to record. Let’s choose which camera we want to record from, the desired image quality etc…

For now, let’s say we’re using the front camera of the device and an image quality set to medium. The chosen image quality has to be a compromise between the beauty of the image and the computing power needed after the extraction, once we want to analyze it. Extracting an image is not an end in itself: if you want to run image analysis on each frame, for example, choosing a high image quality is not advised.

Inside the AVFoundation framework, we can find the attributes we are looking for to customize the capture.

Add references to those attributes

Let’s put all the session configuration in a method named configureSession(). After checking that the user granted permission, we’ll set up the session.

Create the configureSession method
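A possible shape for these attributes and for the start of configureSession(). The property names position and quality are assumptions, and the type names AVCaptureDevice.Position and AVCaptureSession.Preset are the Swift 4 and later spellings.

```swift
import AVFoundation

class FrameExtractor {
    // Which camera to use and which session preset to capture with.
    private let position = AVCaptureDevice.Position.front
    private let quality = AVCaptureSession.Preset.medium

    private var permissionGranted = false
    private let captureSession = AVCaptureSession()

    private func configureSession() {
        // Do nothing if the user did not grant access to the camera.
        guard permissionGranted else { return }
        captureSession.sessionPreset = quality
    }
}
```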

Step 6 — 📽 Input

To choose the recording device we want to use, we need to set up an AVCaptureDevice. We start by getting all the capture devices available.

Using the filter function, we can try and select only the targeted capture device. For every device in the array of devices, check:

If it’s a video recording device
If it is the front camera

The first device in the resulting list, if it exists and is available, is the device we are looking for.

Continue the configureSession() function with a helper method
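One way to write such a helper is sketched below as a free function. Since AVCaptureDevice.devices() has been deprecated since iOS 10, this version assumes an AVCaptureDevice.DiscoverySession restricted to the built-in wide-angle cameras to list the devices, then applies the two checks described above. The name selectCaptureDevice(position:) is illustrative.

```swift
import AVFoundation

func selectCaptureDevice(position: AVCaptureDevice.Position) -> AVCaptureDevice? {
    // List the built-in wide-angle cameras, whatever their position.
    let discovery = AVCaptureDevice.DiscoverySession(
        deviceTypes: [.builtInWideAngleCamera],
        mediaType: .video,
        position: .unspecified
    )
    // Keep only video devices at the requested position, then take the first one.
    return discovery.devices.filter { device in
        device.hasMediaType(.video) && device.position == position
    }.first
}
```

Calling selectCaptureDevice(position: .front) returns nil when no matching camera exists, for example in the simulator.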

Now that we have a valid capture device, we can try to create an AVCaptureDeviceInput. This class provides the data captured by the device to the session. The thing to watch out for is that creating an AVCaptureDeviceInput from an AVCaptureDevice can fail if the device can’t be opened: it might no longer be available, or it might already be in use, for example. Because it can fail, we wrap it in a guard and a try?. Feel free to handle the errors as you wish.

Create an AVCaptureDeviceInput

Check if the capture device input can be added to the session, and add it 🤓
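These two steps could be sketched as a small function. The function name and its Bool return value are assumptions made so the snippet stands on its own. Inside configureSession(), the same guards would simply return on failure.

```swift
import AVFoundation

func addInput(from captureDevice: AVCaptureDevice, to captureSession: AVCaptureSession) -> Bool {
    // Creating the input throws if the device cannot be opened.
    guard let captureDeviceInput = try? AVCaptureDeviceInput(device: captureDevice) else {
        return false
    }
    // The session may refuse an input it cannot use.
    guard captureSession.canAddInput(captureDeviceInput) else { return false }
    captureSession.addInput(captureDeviceInput)
    return true
}
```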

Step 7 — 🔢 Output

We now have to intercept each frame. AVCaptureVideoDataOutput is the class we’re going to use: it processes uncompressed frames from the video being captured. We create it just after adding the capture device input to the session.

Create an instance of AVCaptureVideoDataOutput

AVCaptureVideoDataOutput works by sending each frame to a delegate object. Our FrameExtractor class can very well be this delegate and receive those frames.

Modify the declaration of the class and conform to the protocol

Notice here that an error appears:

Type “FrameExtractor” does not conform to protocol “NSObjectProtocol”

If we go up the inheritance chain, we can see that the protocol requires methods that FrameExtractor doesn’t have. To fix this, we can give it those methods by making FrameExtractor a subclass of NSObject.

Make FrameExtractor inherit from NSObject

Because NSObject already has an init(), the init we built needs to override the NSObject’s one. Don’t forget to call the super’s implementation.

Override init

Now that FrameExtractor conforms to the protocol, we just need to specify that the delegate of the video output is FrameExtractor itself.

Set FrameExtractor as the delegate

This protocol has two optional methods, one being called every time a frame is available, the other one being called every time a frame is discarded. When setting the delegate, we need to specify a serial queue that will handle the capture of the frames. The two previous methods are called on this serial queue, and every frame processing must be done on this queue.

Sometimes, frame processing can require a lot of computing power and the next frame can be captured while the current frame has not been completely processed yet. If this happens, the next captured frame has to be dropped!

If we were to send every frame available to another queue and process them all, we could end up in a situation where frames pile up and the pile keeps growing. The frames would come in faster than we can process them, and we would have to handle the resulting memory management ourselves!

The method that interests us is the one being called every time a new frame is available:

For now, let’s just add a simple print statement every time we receive a frame.

Add the delegate method with a simple print statement

Let’s complete the configureSession() method by adding our video output to the session:
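The output part of this step might look like the sketch below. It uses the current spelling of the delegate method, captureOutput(_:didOutput:from:), which replaced the Swift 3 name used in this article. The helper name addVideoOutput() and the queue label sample buffer are assumptions.

```swift
import AVFoundation

class FrameExtractor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let captureSession = AVCaptureSession()

    override init() {
        super.init()
    }

    private func addVideoOutput() {
        let videoOutput = AVCaptureVideoDataOutput()
        // Frames are delivered to the delegate on this dedicated serial queue.
        videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "sample buffer"))
        guard captureSession.canAddOutput(videoOutput) else { return }
        captureSession.addOutput(videoOutput)
    }

    // Called on the "sample buffer" queue every time a new frame is available.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        print("Got a frame!")
    }
}
```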

Step 8 — ✋ Pause and test

Let’s take a break and test our code!

Two last things before running the project. First, we need to start the capture session. Don’t forget that the session must be started on the dedicated serial queue we created before, as starting the session is a blocking call and we don’t want to block the UI.

Remember that we have two queues in play here, one is the session queue, the other one is the queue each frame is sent to. They are different.

Complete the session queue block to start the session

Lastly, inside the initial ViewController.swift that Xcode created, let’s just create a reference to a frame extractor to strongly keep it.

Create a reference to frame extractor

Here, note that the frame extractor is declared outside of the viewDidLoad call. Otherwise, the object created wouldn’t be retained and nothing would happen. Worse, a swift_abortRetainUnowned error would be thrown when the closures of FrameExtractor, where we declared [unowned self], were called, making the app crash.

👌 You can now run the project. If everything went well, Got a frame! should print again and again in the console. Yay!

Step 9 — 🤔 O Image, where art thou?

Now that we can capture every frame, let’s try to convert them to actual images and display them in an image view.

Let’s go back to the captureOutput(_:didOutputSampleBuffer:from:) method. Remember that this method is called for every captured frame. The buffer containing the frame information is passed as an argument of the function, called sampleBuffer.

The first algorithm we could implement to transform a sample buffer into an actual UIImage could be

Transform the sample buffer into a CVImageBuffer
Transform the CVImageBuffer to a CIImage
Finally transform the CIImage to a UIImage

The problem with this method is that on certain iOS versions, the UIImage initializer that takes a CIImage leaks memory.

The memory doesn’t seem to get freed, so it piles up which ends up in the OS killing our application eventually.

The second algorithm we can do starts like the previous one

Transform the sample buffer into a CVImageBuffer
Transform the CVImageBuffer to a CIImage
Transform the CIImage to a CGImage
Transform the CGImage to a UIImage

1.

Below the selectCaptureDevice() method, create a function that takes as an argument a sample buffer and returns, if all goes well, a UIImage.

Add the following method skeleton

Transform the sample buffer to a CVImageBuffer

Because the function can fail we wrap it in a guard let.

2.

Then, because we will end up with a UIImage, don’t forget to import UIKit (it also brings in Core Image, where CIImage lives).

Create a CIImage from the image buffer

3.

This step is crucial: it is what differs from the first proposed implementation.

Create a CIContext and create a CGImage from this context

4.

We can finally create and return the underlying UIImage from the CGImage.

Create and return the UIImage
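The four steps above can be sketched as a single function. The name imageFromSampleBuffer(_:) is an assumption. The CIContext is created inside the function here, as described at this stage of the tutorial.

```swift
import AVFoundation
import UIKit

func imageFromSampleBuffer(_ sampleBuffer: CMSampleBuffer) -> UIImage? {
    // 1. Sample buffer -> CVImageBuffer; nil if the buffer holds no image.
    guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return nil }
    // 2. CVImageBuffer -> CIImage.
    let ciImage = CIImage(cvPixelBuffer: imageBuffer)
    // 3. CIImage -> CGImage, rendered through a CIContext.
    let context = CIContext()
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else { return nil }
    // 4. CGImage -> UIImage.
    return UIImage(cgImage: cgImage)
}
```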

😴 I know this is long, but bear with me, we’re almost done!

Step 10 — 🙌 Here is the image!

We now have the actual UIImage! This is something we know how to work with! The final step is to let the caller know that an image is available. For this, we can create a protocol with only one function, captured(image:), that is called whenever a UIImage is available. The caller is then free to do whatever they want with it. The only thing the calling class has to do is declare itself as the delegate to which FrameExtractor will send each UIImage.

At the top of FrameExtractor.swift, create the protocol

Inside the FrameExtractor class, hold on to the delegate with a weak attribute

Don’t forget that to avoid a retain cycle, the delegate must be declared weak. Because it is declared weak, we add the class keyword to the protocol’s declaration.

Modify the protocol’s declaration
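A minimal sketch of the protocol and the weak delegate. In current Swift, the class constraint is written AnyObject, which plays the same role as the class keyword mentioned above.

```swift
import UIKit

// AnyObject restricts the protocol to classes, which is what allows `weak`.
protocol FrameExtractorDelegate: AnyObject {
    func captured(image: UIImage)
}

class FrameExtractor {
    // Weak, so the extractor never keeps its delegate alive.
    weak var delegate: FrameExtractorDelegate?
}
```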

Going back to the captureOutput(_:didOutputSampleBuffer:from:) method, we can now send the available UIImage to the delegate. Remember that this method is called on a serial queue. We want to spare the caller the hassle of dealing with that queue, so we’ll send the image on the main thread and the caller will be able to update the UI right away.

Dispatch to the main queue the UIImage created

Step 11 — 👮 Memory is leaking!

If we look closely at the previous code, we can see that we didn’t respect what we said before! During step 7, we said that all image processing must be done on the serial queue from which captureOutput is called. Here, we made the mistake of converting the buffer to an image on the main thread!

This is what happens with the memory:

Move the image processing outside of the main thread

Note that because we don’t need the self anymore before the function call, we removed it to respect the best Swift coding guidelines.

This is now the state of the memory:

This is more pleasant: it seems steady over time, but we can see some ups and downs that we can tackle too. They happen because we create a new context every time we convert a buffer to an image. What we want is to create the context only once.

Move the context as a class variable
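With both fixes applied, the relevant part of the class could look like this sketch: the conversion happens on the queue that delivers the frames, only the finished UIImage is dispatched to the main queue, and a single CIContext is reused. The delegate method uses the current spelling captureOutput(_:didOutput:from:), and the helper name is an assumption.

```swift
import AVFoundation
import UIKit

protocol FrameExtractorDelegate: AnyObject {
    func captured(image: UIImage)
}

class FrameExtractor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    weak var delegate: FrameExtractorDelegate?
    // Created once and reused for every frame.
    private let context = CIContext()

    private func imageFromSampleBuffer(sampleBuffer: CMSampleBuffer) -> UIImage? {
        guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return nil }
        let ciImage = CIImage(cvPixelBuffer: imageBuffer)
        guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else { return nil }
        return UIImage(cgImage: cgImage)
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // The conversion runs here, on the sample buffer queue.
        guard let uiImage = imageFromSampleBuffer(sampleBuffer: sampleBuffer) else { return }
        // Only the finished image is handed over to the main queue.
        DispatchQueue.main.async { [unowned self] in
            self.delegate?.captured(image: uiImage)
        }
    }
}
```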

This is now the state of the memory:

It appears to have fixed the memory leak! 🤓

Step 12 — 👓 Show me what you got!

Let’s quickly set up the UI.

Go to Main.storyboard and add an ImageView that fills the whole screen.
Set the content mode to aspect fill
Control-drag a reference to the ImageView inside ViewController.swift

Head now to ViewController.swift.

Conform ViewController to FrameExtractorDelegate

Set ViewController to be the delegate

Finally, add the captured method of the FrameExtractorDelegate protocol that we created and update the image view every time we receive an image. We can update the UI right away because captured is called on the main thread.

Add the captured protocol method
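The view controller might end up looking like the sketch below. The protocol and the FrameExtractor class at the top are bare stand-ins for the types built in the earlier steps, included only so that the snippet compiles alone. The outlet name imageView is an assumption.

```swift
import UIKit

// Stand-ins for the types built in the earlier steps.
protocol FrameExtractorDelegate: AnyObject {
    func captured(image: UIImage)
}

class FrameExtractor {
    weak var delegate: FrameExtractorDelegate?
}

class ViewController: UIViewController, FrameExtractorDelegate {
    @IBOutlet var imageView: UIImageView!
    // Declared as a property so the extractor outlives viewDidLoad().
    var frameExtractor: FrameExtractor!

    override func viewDidLoad() {
        super.viewDidLoad()
        frameExtractor = FrameExtractor()
        frameExtractor.delegate = self
    }

    // Called on the main queue, so the UI can be updated directly.
    func captured(image: UIImage) {
        imageView.image = image
    }
}
```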

You can now press run!

Step 13 — 🙃 Oops, we’re upside down

Exactly, the default image sensor is oriented that way. This is why we need to correctly set the orientation of the video output!

For this, we must get a connection from the video output and check if we can set the orientation of the video. Also, if we’re using the front camera, the feed has to be mirrored.

Complete the configureSession() method
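A sketch of this last piece of configuration, written as a free function whose name and parameters are assumptions. It relies on the videoOrientation property, which matches the era of this tutorial. Apple deprecated it in iOS 17 in favor of videoRotationAngle.

```swift
import AVFoundation

func configureConnection(of videoOutput: AVCaptureVideoDataOutput,
                         position: AVCaptureDevice.Position) {
    // The connection only exists once the output is attached to a session with an input.
    guard let connection = videoOutput.connection(with: .video) else { return }
    guard connection.isVideoOrientationSupported else { return }
    guard connection.isVideoMirroringSupported else { return }
    connection.videoOrientation = .portrait
    // Mirror the feed only for the front camera.
    connection.isVideoMirrored = position == .front
}
```

Inside configureSession(), this would run right after the video output has been added to the session.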

Also, go into your General project settings, and inside Deployment Info, uncheck Landscape Left and Landscape Right to force the app to remain in portrait.

Run the app again, and voilà!

🤘 That’s it!

You are now free to use the camera in the way that pleases you and unleash your creativity in the computer vision world!

Feel free to elaborate on this! For example, add the possibility to start and stop the frame extraction, because in its current state, if we push a view controller on top, the frame extraction still occurs! In the current state of the app, the frame extraction is suspended when an interruption occurs (for example a phone call) and is resumed afterwards.

A lot of stuff can be built on this. For example, let the user change the settings of the captured frames, the camera used, the quality, etc. A lot of great tutorials are available online, but don’t forget that changing the camera settings must be done while holding a lock on the device. A whole new subject!

In order to do image processing on each UIImage captured, you might want to use OpenCV! Some neat functions, for example UIImageToMat gives the underlying pixel matrix behind an image. Once you get the matrix, the real show begins. Take a look at my other tutorials explaining how to use OpenCV with Swift!

Let me know the projects you built on this, I would love to take a look!


Want to support those tutorials? A bitcoin donation is always fun! 19MZUSLWszAdkaKVJNwrcVwgfvHxvqNMVU