Code & Quips
Letter sent on Apr 3, 2016

A Poorly Drawn Bowlex

Stitching Videos and Photos with AVFoundation

We were recently tasked with coming up with a solution that allows the user to inject photos into a previously recorded video. The end result would be the original video, with the user’s photos fading in and out at arbitrary times. AVMutableComposition immediately comes to mind as a good candidate to solve this problem. With AVMutableComposition, we can interleave various tracks (AVMutableCompositionTrack) into a video. AVFoundation gives us control over the timing of when each track fades in and out of the final video, and we can even animate the tracks with CGAffineTransform.

Get the Recorded Video’s Track

The AVURLAssetPreferPreciseDurationAndTimingKey is pretty important here. If you don’t specify this key, it defaults to NO, which can result in imprecise values being returned for the asset’s duration. Since we need fine control over exactly when to show and hide tracks, we need those duration values to be accurate. Set this to YES.
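Roughly, the setup looks something like this. The names recordedVideoURL, mixComposition, and mainTrack are ours for illustration, and we go ahead and build the composition here too, since the later steps need it:

```objc
#import <AVFoundation/AVFoundation.h>

// Load the recorded video with precise duration and timing enabled.
NSDictionary *options = @{AVURLAssetPreferPreciseDurationAndTimingKey: @YES};
AVURLAsset *videoAsset = [AVURLAsset URLAssetWithURL:recordedVideoURL options:options];

// Grab the recorded video's track.
AVAssetTrack *videoTrack = [[videoAsset tracksWithMediaType:AVMediaTypeVideo] firstObject];

// Build the composition and copy the recorded video into a track of its own.
AVMutableComposition *mixComposition = [AVMutableComposition composition];
AVMutableCompositionTrack *mainTrack =
    [mixComposition addMutableTrackWithMediaType:AVMediaTypeVideo
                                preferredTrackID:kCMPersistentTrackID_Invalid];

NSError *error = nil;
[mainTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, videoAsset.duration)
                   ofTrack:videoTrack
                    atTime:kCMTimeZero
                     error:&error];
```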

Create an AVMutableVideoCompositionLayerInstruction

Create an AVMutableVideoCompositionLayerInstruction with the track and add it to an array of instructions. “Instructions” are how we control when and how tracks appear in the finished video.
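In sketch form, using the mainTrack from above:

```objc
// Layer instructions for every track end up in one array that later feeds
// the final AVMutableVideoComposition.
NSMutableArray *layerInstructions = [NSMutableArray array];

// The instruction for the recorded video's track.
AVMutableVideoCompositionLayerInstruction *mainLayerInstruction =
    [AVMutableVideoCompositionLayerInstruction videoCompositionLayerInstructionWithAssetTrack:mainTrack];

[layerInstructions addObject:mainLayerInstruction];
```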

Adding User Photos

In our case we opted to make a model for handling user photos. We’ll call it ADLVideoImage. Here’s what the interface looks like:
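We can’t paste the exact model here, but it’s roughly this shape (the filePath property name is our guess; startTime and endTime are described below):

```objc
#import <Mantle/Mantle.h>

@interface ADLVideoImage : MTLModel

// Where the user's photo (and the short clip rendered from it) lives on disk.
@property (nonatomic, copy) NSString *filePath;

// When the image should appear and disappear, in seconds from the start of the video.
@property (nonatomic, assign) NSTimeInterval startTime;
@property (nonatomic, assign) NSTimeInterval endTime;

@end
```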

Shout out to Mantle ❤️

For the sake of brevity we’re going to assume there are just two ADLVideoImages. ADLVideoImage has a startTime and an endTime. We actually calculate these values based on the contentOffset of a UIScrollView. Users are able to drag images into a presentation which is nested in a UIScrollView.

Below you’ll see the logic we use for handling the fading in and out of images.
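Here’s a simplified sketch of that logic. The half-second fade duration and the videoImages array are assumptions for illustration:

```objc
static NSTimeInterval const ADLFadeDuration = 0.5; // assumed fade length in seconds

NSTimeInterval videoDuration = CMTimeGetSeconds(videoAsset.duration);

for (NSUInteger idx = 0; idx < videoImages.count; idx++) {
    ADLVideoImage *image = videoImages[idx];

    // If there's an image following this one, make sure the two don't overlap.
    if (idx + 1 < videoImages.count) {
        ADLVideoImage *nextImage = videoImages[idx + 1];
        if (nextImage.startTime < image.endTime) {
            nextImage.startTime = image.endTime;
        }
    }

    NSTimeInterval fadeInTime = image.startTime;
    CGFloat startOpacity = 0.0;

    // Images at the very beginning of the video show up immediately at full opacity.
    if (image.startTime <= 1.0) {
        fadeInTime = 0.0;
        startOpacity = 1.0;
    }

    // Never let an image outlive the recorded video.
    NSTimeInterval endTime = MIN(image.endTime, videoDuration);
    NSTimeInterval fadeOutTime = endTime - ADLFadeDuration;

    // fadeInTime, fadeOutTime, startOpacity, and endTime feed the opacity ramps
    // in the next section.
}
```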

If there’s an image following the current image, we grab it from the array. If the startTime of the second image overlaps the endTime of the image that precedes it, we set the startTime of the second image equal to that endTime.

Next, if the startTime of the image is less than or equal to one second, this image is at the beginning of the video, so we set the fadeInTime to zero and the image’s starting opacity to one. We also check to make sure the image’s endTime does not exceed the length of the recorded video.

Turning Photos Into Videos

Images don’t have an AVAssetTrack. You’re shocked, I know. So, we turn the user’s photos into videos. Below we use the fadeIn and fadeOut logic we’ve already calculated.
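For each ADLVideoImage, the insertion looks roughly like this, reusing the fadeInTime, fadeOutTime, startOpacity, and endTime values from the loop above. In this sketch we assume each photo has already been rendered into a short video clip and that the ADLVideoImage’s file path points at that clip (the rendering step itself is omitted):

```objc
// The ADLVideoImage's file path points at the short clip rendered from the photo.
NSURL *clipURL = [NSURL fileURLWithPath:image.filePath];
AVURLAsset *clipAsset =
    [AVURLAsset URLAssetWithURL:clipURL
                        options:@{AVURLAssetPreferPreciseDurationAndTimingKey: @YES}];
AVAssetTrack *clipTrack = [[clipAsset tracksWithMediaType:AVMediaTypeVideo] firstObject];

// Each image gets its own track in the composition, starting at its startTime.
AVMutableCompositionTrack *imageTrack =
    [mixComposition addMutableTrackWithMediaType:AVMediaTypeVideo
                                preferredTrackID:kCMPersistentTrackID_Invalid];

NSError *error = nil;
CMTime start = CMTimeMakeWithSeconds(fadeInTime, 600);
CMTime duration = CMTimeMakeWithSeconds(endTime - fadeInTime, 600);
[imageTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, duration)
                    ofTrack:clipTrack
                     atTime:start
                      error:&error];

// Fade the image in, hold it, then fade it back out.
AVMutableVideoCompositionLayerInstruction *imageInstruction =
    [AVMutableVideoCompositionLayerInstruction videoCompositionLayerInstructionWithAssetTrack:imageTrack];

CMTime fadeLength = CMTimeMakeWithSeconds(ADLFadeDuration, 600);
[imageInstruction setOpacityRampFromStartOpacity:startOpacity
                                    toEndOpacity:1.0
                                       timeRange:CMTimeRangeMake(start, fadeLength)];
[imageInstruction setOpacityRampFromStartOpacity:1.0
                                    toEndOpacity:0.0
                                       timeRange:CMTimeRangeMake(CMTimeMakeWithSeconds(fadeOutTime, 600), fadeLength)];
// Keep the image hidden once its window has passed.
[imageInstruction setOpacity:0.0 atTime:CMTimeMakeWithSeconds(endTime, 600)];

// Layer instructions are composited top-to-bottom in array order, so the image
// instructions go in front of the recorded video's instruction.
[layerInstructions insertObject:imageInstruction atIndex:0];
```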

We generate a clipAsset with the ADLVideoImage’s file path. Then we add an empty track to the mixComposition and insert the clip’s time range into that track. There’s also a bit of logic for handling the images’ opacity. As we mentioned earlier, we are using AVMutableVideoCompositionLayerInstruction.

AVMutableVideoCompositionLayerInstruction is a mutable subclass of AVVideoCompositionLayerInstruction that is used to modify the transform, cropping, and opacity ramps to apply to a given track in a composition.

The ADLVideoEditor has a size. We have users that prefer to make videos that are 16:9 and others that don’t. Depending on the size of the video, you can apply a transform to the AVMutableVideoCompositionLayerInstruction.
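For example, scaling an image’s track down to the editor’s render size might look like this (the videoEditor property is hypothetical; use whatever holds your output dimensions):

```objc
// Scale the image's track to match the editor's render size.
CGSize renderSize = self.videoEditor.size;   // e.g. 16:9, or square
CGSize clipSize = clipTrack.naturalSize;

CGFloat scale = renderSize.width / clipSize.width;
[imageInstruction setTransform:CGAffineTransformMakeScale(scale, scale) atTime:kCMTimeZero];
```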

Exporting The Video

Finally, we’ll use AVAssetExportSession to export the video with all our images spliced in. We set our layerInstructions on a single AVMutableVideoCompositionInstruction that spans the whole composition. We give our assetExport an export path, file type, and the final AVMutableVideoComposition.
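Put together, the export step looks something like this (exportPath is whatever output location you choose):

```objc
// One composition instruction spans the whole video and carries every layer instruction.
AVMutableVideoCompositionInstruction *mainInstruction =
    [AVMutableVideoCompositionInstruction videoCompositionInstruction];
mainInstruction.timeRange = CMTimeRangeMake(kCMTimeZero, mixComposition.duration);
mainInstruction.layerInstructions = layerInstructions;

AVMutableVideoComposition *videoComposition = [AVMutableVideoComposition videoComposition];
videoComposition.instructions = @[mainInstruction];
videoComposition.frameDuration = CMTimeMake(1, 30); // 30 fps
videoComposition.renderSize = renderSize;

// Kick off the export.
AVAssetExportSession *assetExport =
    [[AVAssetExportSession alloc] initWithAsset:mixComposition
                                     presetName:AVAssetExportPresetHighestQuality];
assetExport.outputURL = [NSURL fileURLWithPath:exportPath];
assetExport.outputFileType = AVFileTypeQuickTimeMovie;
assetExport.videoComposition = videoComposition;

[assetExport exportAsynchronouslyWithCompletionHandler:^{
    if (assetExport.status == AVAssetExportSessionStatusCompleted) {
        // Hand the finished video back to the caller.
    } else {
        NSLog(@"Export failed: %@", assetExport.error);
    }
}];
```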

This process can take a bit of time depending on the device a user is running. Older devices can take a few minutes. It’s definitely important to implement some form of progress indicator.
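One option is to poll the export session’s progress property (a float from 0.0 to 1.0) and feed it to whatever UI you like. A minimal sketch, assuming a hypothetical progressView and helper method:

```objc
// Hypothetical helper: poll assetExport.progress until the export stops running.
- (void)pollProgressOfExportSession:(AVAssetExportSession *)assetExport {
    self.progressView.progress = assetExport.progress;

    if (assetExport.status == AVAssetExportSessionStatusExporting ||
        assetExport.status == AVAssetExportSessionStatusWaiting) {
        dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(0.1 * NSEC_PER_SEC)),
                       dispatch_get_main_queue(), ^{
            [self pollProgressOfExportSession:assetExport];
        });
    }
}
```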

AVFoundation is not for the faint of heart, but it is quite powerful.