Building native video Pins

Bin Liu | Pinterest engineer, Product

Billions of videos are viewed across the internet every day, but video on Pinterest is unique. On Pinterest, you’ve always been able to save videos from around the web, and in 2013, we made it possible to play them right on the Pin when you tap. Videos help people visualize how to bring ideas to life, which makes them the perfect medium for Pinterest. In fact, we found that people who watch videos on their smartphones are more than twice as likely to take action when they come across a video on Pinterest than in another app.

Today we’re making watching videos on Pinterest even more seamless with the launch of our first native video player. Starting today, people can watch native videos from experts and tastemakers on Pinterest in the newly launched Explore, as well as in their home feed as the people they follow save videos.

In this post, we’ll dive into how we built video into our iOS app, including best practices we learned while working with AVFoundation and building a native video experience from scratch.

AVFoundation 101

Here we’ll introduce a few key concepts in AVFoundation as context for the rest of the post. For those of you who have been working with AVFoundation or AVKit, feel free to skip ahead.

We built our video experience using AVFoundation, the standard framework provided by Apple that sits on top of Core Media, Core Audio and Core Animation. It provides a powerful interface for creating and playing back time-based audiovisual media.

Terms

  • AVPlayer: The engine of video playing on iOS that provides control interfaces for video playback.
    /* Examples of provided APIs */
    - (AVPlayerStatus)status;
    - (void)play;
    - (void)pause;
    - (void)seekToTime:(CMTime)time;
    - addPeriodicTimeObserverForInterval:queue:usingBlock:
    - addBoundaryTimeObserverForTimes:queue:usingBlock:
  • AVPlayerLayer: A subclass of CALayer to which an AVPlayer directs its visual output. Think of it as a monitor: it renders content for the connected AVPlayer. This is how you create one:
  • + (AVPlayerLayer *)playerLayerWithPlayer:(AVPlayer *)player;
  • One of our favorite features built into AVFoundation enables developers to create multiple AVPlayerLayers against a single AVPlayer so all AVPlayerLayers play the output of that AVPlayer in sync. This made modularizing player container views and transitions between ViewControllers easy to implement.
  • AVPlayerItem: An AVPlayerItem is initialized with an AVAsset and represents a playable resource.
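Putting these pieces together, a minimal playback setup looks roughly like the sketch below (written in Swift for brevity; the URL is a placeholder, not a real endpoint, and this is an illustration rather than production code):

```swift
import AVFoundation

// Sketch only: wire an AVAsset -> AVPlayerItem -> AVPlayer -> AVPlayerLayer.
let url = URL(string: "https://example.com/video.m3u8")!   // placeholder URL
let asset = AVAsset(url: url)
let item = AVPlayerItem(asset: asset)
let player = AVPlayer(playerItem: item)

// The layer is the "monitor": it renders the connected player's output.
// Several layers can share one player and will stay in sync.
let layer = AVPlayerLayer(player: player)
layer.videoGravity = .resizeAspectFill
player.play()
```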

Goals

We had three main goals while architecting the video framework:

  1. Meet current and potential long-term product needs. Since we’re building a client-side infrastructure to accommodate future product needs, we made the video components modular, portable and robust.
  2. Be resource savvy since videos can be extremely resource intensive and can cause OOM crashes or out-of-disk-space shutdowns. Additionally, if we weren’t careful, we could drain a user’s data or their battery quickly.
  3. Prioritize performance and minimize frame drops in scrolling and in transitions. Playing videos is inherently main thread intensive, so we had to balance the video playback experience and its potential UI performance impacts.

A User Scenario

Before we dive into the architecture, let’s consider the following user scenario:

  1. User finds a video Pin autoplaying in the home feed.
  2. User taps on it, and the video zooms up into Pin Closeup and continues to play.
  3. User taps on the full-screen button and the video zooms into full-screen view.

In this scenario, three ViewControllers are involved: Feed, Pin Closeup and Full Screen. Each is complex and modular, yet all three play the same video continuously, connected by two different interactive transitions during which playback must stay smooth and performant. In addition, each has different playback control interfaces.

Overview

From the user scenario described above, there are a few baseline requirements we wanted to build:

  1. Ensure all ViewControllers that play the same video share one AVPlayer-AVPlayerItem pair to avoid each loading separate buffers and maintaining separate playback states.
  2. Because of #1, the ViewControllers shouldn’t own the AVPlayer-AVPlayerItem pair.
  3. Each ViewController should manage its own view hierarchy. Since ViewControllers are heavily used throughout the app, managing hand-off could become challenging if they shared the same view.

We built our first iteration of the architecture with the following key components:

  • PlayerController (a data structure) holds the AVPlayer-AVPlayerItem pair and provides APIs to control playback and buffering.
  • PlayerControllerManager (a singleton manager) holds strong references to PlayerControllers through a dictionary keyed by video identifiers.
  • PlayerLayerContainer (a container view class) is used in all ViewControllers and contains an AVPlayerLayer created off the AVPlayer from the PlayerController.

All this terminology can be confusing, so the below diagram of this base architecture should help.

At the top of the diagram is the singleton manager that holds strong references to the PlayerControllers (only weak references are allowed anywhere else in the app). To access the PlayerController, you just need the video ID. The central manager design enables us to prefetch or clear when needed.

The gray boxes in the ViewControllers are the PlayerLayerContainers. Each creates an AVPlayerLayer through the shared AVPlayer and puts the AVPlayerLayer in its view hierarchy. This helps make these ViewControllers modular; however, there’s a caveat we’ll get to later.
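As a concrete illustration, here’s a minimal Swift sketch of the manager pattern described above. The real PlayerController wraps an AVPlayer-AVPlayerItem pair; it’s stubbed here so the ownership rules stay in focus, and the method names are our assumptions rather than the production API:

```swift
import Foundation

// Stub: the real PlayerController would hold the AVPlayer-AVPlayerItem pair
// and expose playback/buffering APIs.
final class PlayerController {
    let videoID: String
    init(videoID: String) { self.videoID = videoID }
}

final class PlayerControllerManager {
    static let shared = PlayerControllerManager()
    private init() {}

    // Strong references live ONLY here, keyed by video identifier;
    // everywhere else in the app holds weak references.
    private var controllers: [String: PlayerController] = [:]

    // Fetch (or lazily create) the controller for a given video ID.
    func controller(forVideoID id: String) -> PlayerController {
        if let existing = controllers[id] { return existing }
        let created = PlayerController(videoID: id)
        controllers[id] = created
        return created
    }

    // Central ownership makes prefetching and eviction straightforward.
    func clear(videoID: String) { controllers[videoID] = nil }
    func clearAll() { controllers.removeAll() }
}
```

A ViewController would look up the controller by video ID and keep only a weak reference to it, so dropping the entry from the manager frees the player.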

Controls

The control interfaces can be generalized into two categories: pushers and pullers. The pushers are controls Pinners can take action on, such as play/pause, mute/unmute, seek, etc. The pullers are control elements that listen for playback state changes and update themselves accordingly (e.g., progress indicator, buffering indicator, timestamps).

The pusher controls are easy, but the pullers are slightly trickier. In order to reflect the playback state accurately, you have to listen for many different broadcast notifications as well as set up key-value observing (KVO) on various properties. To highlight a couple:

/* Notification names */
AVPlayerItemDidPlayToEndTimeNotification
AVPlayerItemPlaybackStalledNotification
AVPlayerItemNewAccessLogEntryNotification


/* Properties to KVO */
BOOL playbackLikelyToKeepUp;
NSArray<NSValue *> *loadedTimeRanges;

One thing that can often trip up us iOS developers is setting up and removing observers, and observer-related code sprinkled everywhere is particularly troublesome. We decided to wrap all of this logic in an EventObserver object, which removes its listeners and observers based on its own object lifecycle. It only listens for specific events based on its delegate flags, because handling event callbacks can be pretty main thread intensive.
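A minimal Swift sketch of the EventObserver idea, built on Foundation’s NotificationCenter: the class name matches the post, but the internals are our assumptions, and the delegate-flag filtering is reduced to a single flag here:

```swift
import Foundation

// One object registers exactly the listeners it needs and tears them down
// in deinit, so observer bookkeeping isn't sprinkled across the codebase.
final class EventObserver {
    private var tokens: [NSObjectProtocol] = []

    init(listenForStalls: Bool, onEvent: @escaping (Notification) -> Void) {
        let center = NotificationCenter.default
        if listenForStalls {
            // Register only the events this observer actually cares about,
            // since every callback costs main-thread time.
            let name = Notification.Name("AVPlayerItemPlaybackStalledNotification")
            // queue: nil runs the block synchronously on the posting thread.
            tokens.append(center.addObserver(forName: name, object: nil,
                                             queue: nil, using: onEvent))
        }
    }

    deinit {
        // Lifecycle-based teardown: dropping the observer removes its listeners.
        tokens.forEach { NotificationCenter.default.removeObserver($0) }
    }
}
```

A control element owns one EventObserver; when the element is deallocated, the observer goes with it and the notifications stop automatically.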

Caveats

We don’t live in a perfect world and there are always hidden caveats in the SDK that can be tricky.

  • Problem: Creating an AVPlayerLayer against an AVPlayer while a video is playing caused a noticeable skip in playback. Because of this, we couldn’t allow each ViewController to have its own AVPlayerLayer; instead, they’d have to share the same player layer to avoid the skip in playback.
  • Solution: We extended the PlayerController to own a shared instance of AVPlayerLayer along with the AVPlayer-AVPlayerItem pair, and refactored the way PlayerLayerContainer worked. We came up with a solution inspired by the first responder pattern from UIKit: whenever a PlayerLayerContainer becomes visible, it claims itself as the current presenter of the PlayerController, which returns the AVPlayerLayer so the container can add it to its view hierarchy.
  • Problem: AVPlayerLayer holds a strong reference to its AVPlayer, so as long as a layer is still alive, the player and its buffer can’t be cleared out when needed.
  • Solution: The refactor in problem #1 helped resolve this issue. When PlayerLayerContainer becomes invisible, it resigns current presenter and removes its reference to the AVPlayerLayer, allowing the PlayerController to remove all references to the AVPlayer.
  • Problem: The error “Cannot decode” is caused by the limit of up to four “render pipelines” allowed in an app (see Stack Overflow). We noticed an app can create more than four AVPlayer instances, but can’t connect more than four AVPlayers with AVPlayerItems.
  • Solution: We do not set an AVPlayer’s currentItem to the AVPlayerItem until we need to start playback. Then, we remove the currentItem as soon as the playback is no longer needed.
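The caveats and fixes above can be sketched with placeholder types standing in for the AVFoundation objects (all names and internals here are illustrative assumptions, not the production code):

```swift
import Foundation

// Stubs standing in for AVPlayer / AVPlayerItem / AVPlayerLayer.
final class StubPlayer { var currentItem: StubItem? }
final class StubItem {
    let videoID: String
    init(videoID: String) { self.videoID = videoID }
}
final class StubLayer {}

final class SharedLayerController {
    let player = StubPlayer()
    private let layer = StubLayer()            // single shared layer (caveat #1)
    private weak var currentPresenter: AnyObject?

    // A visible container claims presentership and receives the shared layer,
    // so only one view hierarchy holds the layer at a time.
    func claimPresenter(_ presenter: AnyObject, item: StubItem) -> StubLayer {
        currentPresenter = presenter
        player.currentItem = item              // attach lazily, only for playback (caveat #3)
        return layer
    }

    // Resigning detaches the item so the player stops counting against the
    // render-pipeline limit and its buffer can be released (caveat #2).
    func resignPresenter(_ presenter: AnyObject) {
        guard currentPresenter === presenter else { return }
        currentPresenter = nil
        player.currentItem = nil
    }
}
```

The guard matters: a stale container resigning late must not detach the item out from under the container that claimed presentership after it.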

Other tips

  1. The AVPlayerItemNewAccessLogEntryNotification carries the latest entry of AVPlayerItemAccessLogEvent which contains a lot of useful information and has proved to be quite helpful for getting insights on video performance.
  2. The built-in buffering strategy for video playback favors avoiding stalls over starting playback quickly. On poor network connections, such as a 3G network, the player would sometimes buffer more than 10 seconds of data before beginning to play. Apple introduced a new API in iOS 10, preferredForwardBufferDuration, that allows developers to adjust this built-in behavior. Unfortunately, there’s nothing we can do for iOS 9 and below.
  3. Audio is important for a great video experience, yet it can also be unwanted at times, especially for mobile users. iOS wrapped complex sound behaviors into a few standard categories, and we currently use a mix of the two most-used categories:
  • AVAudioSessionCategoryPlayback: Two important traits of this audio category:
  • Your app cuts off any music/video apps playing in the background.
  • Audio streams of your app will play sound regardless of your iPhone’s hardware ringer mute control.
  • AVAudioSessionCategoryAmbient: When your app’s audio session is set to this category, audio streams will respect the hardware ringer mute switch, meaning you won’t be able to play any sound when the user’s hardware mute is ON.
     Note that your app doesn’t need to stick to a single one of these provided categories; we use different categories for different user scenarios.
  4. The playbackLikelyToKeepUp property on AVPlayerItem is your best friend if you need to display loading spinners while videos are buffering.
  5. If you’re using the HLS protocol for video streams, make good use of the preferredPeakBitRate property. We set it to a lower value when the user is on cellular so the first few segments load faster (resulting in shorter wait times) and so we don’t drain a user’s cellular data by loading an unnecessarily high-resolution video stream.
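As an illustration of that last tip, bitrate capping reduces to a tiny policy function; the 1 Mbps threshold below is an assumption for the sketch, not our actual value:

```swift
// Sketch: choose a preferredPeakBitRate cap based on the network type.
enum NetworkType { case wifi, cellular }

func peakBitRateCap(for network: NetworkType) -> Double {
    switch network {
    case .wifi:
        return 0            // 0 means "no cap" for preferredPeakBitRate
    case .cellular:
        return 1_000_000    // illustrative ~1 Mbps cap so early segments load fast
    }
}

// In the app this would be applied to the item, e.g.:
// playerItem.preferredPeakBitRate = peakBitRateCap(for: currentNetwork)
```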

Forward thinking product development

When we build new product features into the codebase, we have to think ahead in terms of product progression so we make the right architectural decisions. For example, we wanted to build a way for native videos to play in Pin close-ups, and we anticipated next building an immersive full-screen view that would play the same video continuously. After that, we could consider adding a custom transition in between, with the video playing continuously throughout the transition. The list goes on.

Thinking ahead benefited us in so many ways. In particular, because we had most of the components built, we were able to build an immersive iOS video experience for Explore in just a few weeks.

Try out the new framework for yourself by visiting the new Pinterest Explore today.

Acknowledgements: Huge shoutout to Ricky Cancro, who worked closely and tirelessly with me on designing and implementing all of the video components from the ground up. Thanks to Steven Ramkumar, Scott Goodson and Max Gu for providing invaluable feedback and suggestions along the way!