Replicating Instagram’s Shared Transition on iOS (UIKit) — Part I.

Kolos Foltányi
Supercharge's Digital Product Guide
16 min read · Nov 10, 2023

Welcome to Part I. of our two-part series on replicating Instagram’s shared frame transition with UIKit. This segment focuses on the transition’s fundamental implementation. Part II. will expand on this by implementing the gesture-driven interactive portion of the transition.

You can find the finished implementation here.

Introduction

Ever since Instagram introduced it, I’ve admired the app’s image detail screen transition. Interactive, gesture-driven shared element transitions have become a key part of iOS, and you’ll encounter them in many popular apps, including native ones. Because they are so widespread, these animations feel familiar to iOS users, particularly when used with image components.

To start off, I have created a profile and detail screen similar to Instagram’s design. The profile screen view controller has a standard collection view populated with square-shaped image cells, each leading to the detail screen. The detail screen view controller shows the image in full size accompanied by some additional information.

Here is what the default push and pop animation looks like:

The Default Push and Pop Transition

In this tutorial, you will learn how to go from the above to the following shared element transition that Instagram uses:

The Final Transition

Before we dive in, let’s dissect this animation in detail and lay out our requirements. The push and pop animation displayed above:

  • Is a shared element transition: Shared element transitions can be used when the source and destination screens share a common component — in our case, the same image. These transitions create the illusion that the source component is transforming into its final destination as the transition progresses. In reality, of course, these are two separate components that we must carefully synchronize during the animation to produce this effect.
  • Is gesture-driven and interactive: The pop animation responds to a drag gesture and can be reversed by counteracting the initiating gesture. Reversing the gesture cancels the pop transition.
  • Handles different source and destination aspect ratios: This detail may initially seem unrelated but plays a critical role in our implementation. Notice how the source image view is square, whereas the destination image view uses an aspect-fit content mode.
  • The transition uses a mask to hide and show the rest of the detail screen. As you tap an image, it transitions to its target position, simultaneously revealing the detail screen above and below the image.
  • The source view gets a dark overlay during the transitions.

To better understand the above, here is a slowed version of the animation we are aiming for:

The Final Transition (Slowed)

With these requirements in mind, let’s start implementing the transition step by step.

Custom View Controller Transitions

Let’s begin by exploring the view controller transitioning APIs and creating a very simple custom zoom transition.

Custom transitions on iOS can be created by providing an implementation of the UIViewControllerAnimatedTransitioning protocol. We are going to refer to these objects as animation controllers. During transitions, UIKit will call our animation controller with the necessary context, allowing us to execute our custom animations in place of the default ones.

To customise view controller presentation and dismissal, you need to provide the animation controller through the UIViewControllerTransitioningDelegate.

However, in our example, we aim to customise push and pop transitions which are handled by the navigation controller. In this case, you should supply your custom animator via the UINavigationControllerDelegate.

Let’s start by implementing our custom animator. The UIViewControllerAnimatedTransitioning protocol has two requirements we need to implement:

  • transitionDuration(using:) must return the duration of the transition
  • animateTransition(using:) will be used to execute our custom animations when needed

Both of these methods have a transitionContext parameter of type UIViewControllerContextTransitioning that plays a critical role in our implementation. Through this parameter, the system (here, the navigation controller) provides everything we need for our custom animation.

Let’s create our first custom transition by conforming to this protocol:

class SharedTransitionAnimator: NSObject {
    enum Transition {
        case push
        case pop
    }
    // 1
    var transition: Transition = .push
}

extension SharedTransitionAnimator: UIViewControllerAnimatedTransitioning {
    func transitionDuration(
        using transitionContext: UIViewControllerContextTransitioning?
    ) -> TimeInterval { 2 }

    func animateTransition(
        using transitionContext: UIViewControllerContextTransitioning
    ) {
        // 2
        guard let toView = transitionContext.view(forKey: .to),
              let fromView = transitionContext.view(forKey: .from) else {
            transitionContext.completeTransition(false)
            return
        }
        if transition == .push {
            // 3
            transitionContext.containerView.addSubview(toView)
            toView.transform = .init(scaleX: 0.001, y: 0.001)
        } else {
            // 4
            transitionContext.containerView.insertSubview(toView, belowSubview: fromView)
        }
        UIView.animate(
            // Reuse the reported duration so UIKit and our animation stay in sync.
            withDuration: transitionDuration(using: transitionContext),
            animations: {
                if self.transition == .push {
                    // 5
                    toView.transform = .identity
                } else {
                    // 6
                    fromView.transform = .init(scaleX: 0.001, y: 0.001)
                }
            }, completion: { _ in
                // 7
                transitionContext.completeTransition(true)
            })
    }
}

Here is what the implementation does:

  1. We categorize the transition type we’re animating with an enum.
  2. We retrieve the origin and destination views from the transition context. These represent the primary views of the source and destination view controllers.
  3. The transition context exposes a containerView property that acts as the superview for all transition participants. UIKit makes the containerView visible automatically, and by default it already contains the source view. For a push transition, we add the destination view to this container and apply an initial scale transformation.
  4. For a pop transition, we position the destination view beneath the source view.
  5. Our push animation involves scaling the destination view back to its original size.
  6. The pop animation scales down the source view, revealing the destination view below (given its prior placement during setup).
  7. Finally, we need to tell UIKit that we are done with our animations allowing the transition to finalize.

Once the animation controller is set up, we must instruct the navigation controller to use it for animating transitions instead of the default one.

First we set our view controller as the navigation controller’s delegate:

class ProfileScreen: UIViewController {
    private let transitionAnimator = SharedTransitionAnimator()

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        navigationController?.delegate = self
    }
}

Then we provide our custom animator using the delegate method navigationController(_:animationControllerFor:from:to:):

extension ProfileScreen: UINavigationControllerDelegate {
    func navigationController(
        _ navigationController: UINavigationController,
        animationControllerFor operation: UINavigationController.Operation,
        from fromVC: UIViewController,
        to toVC: UIViewController
    ) -> UIViewControllerAnimatedTransitioning? {
        if fromVC is Self, toVC is DetailScreen {
            transitionAnimator.transition = .push
            return transitionAnimator
        }
        if toVC is Self, fromVC is DetailScreen {
            transitionAnimator.transition = .pop
            return transitionAnimator
        }
        return nil
    }
}

Before handing our animator over to the navigation controller, we ensure that the current transition type is set within our animator object.

With that, we have implemented our first custom push/pop transition which looks like this:

A Custom Zoom Transition

Not bad given the amount of code we needed to achieve this. Armed with the fundamentals of custom transitions, let’s proceed to construct the next crucial component: the shared frame animation itself.

Math

This section delves deep into frame calculations. If that’s not your cup of tea right now (which I would totally get), feel free to skip ahead.

We’ll be implementing a set of helper functions that will help us calculate the necessary frames and transformations needed for our shared element transition.

I recommend experimenting with the utilities, using the test view controller bundled in the project. Simply set the test property of the app delegate to true, and you’ll be able to interactively explore the described examples within the TestViewController.

For clarity, let’s momentarily forget about view controllers and simplify how we think about the animation with the following rects:

Simplified Example Layout of the Participants in the Transition

In this streamlined representation:

  • Rectangle A (blue) represents the square image cell from the source screen.
  • Rectangle B (red) signifies the destination screen.
  • Rectangle C (yellow) corresponds to the large image view on the destination screen.

Throughout the transition, all these elements (A, B, C) will be present in the view hierarchy (the intricacies of which we’ll delve into later). In both “push” and “pop” transitions, we use the same transformation on rectangle B to make A and C overlap. For the “push” transition, the transformation is removed from the detail screen (the destination). In contrast, for the “pop” transition, we apply this transformation to the detail screen (which is now the source in a pop scenario).

Let’s write a helper function that calculates our transform based on the three participating rects:

extension CGAffineTransform {
    // 1.
    static func transform(parent: CGRect,
                          soChild child: CGRect,
                          matches rect: CGRect) -> Self {
        // 2.
        let scaleX = rect.width / child.width
        let scaleY = rect.height / child.height

        // 3.
        let offsetX = rect.midX - parent.midX
        let offsetY = rect.midY - parent.midY
        let centerOffsetX = (parent.midX - child.midX) * scaleX
        let centerOffsetY = (parent.midY - child.midY) * scaleY

        let translateX = offsetX + centerOffsetX
        let translateY = offsetY + centerOffsetY

        // 4.
        let scale = CGAffineTransform(scaleX: scaleX, y: scaleY)
        let translate = CGAffineTransform(translationX: translateX, y: translateY)

        return scale.concatenating(translate)
    }
}

Here is how it works:

  1. It takes three input rectangles in the same coordinate system:
    - parent: The rectangle to which you’ll apply the transform (B).
    - child: A rectangle within the parent which you aim to overlap with another rectangle post-transformation (C).
    - rect: The target rectangle whose geometry you want the child rectangle to match (A).
  2. We calculate the scaling factor by comparing the child rectangle’s dimensions with the target rectangle.
  3. We determine the translation needed to align the centers of the rectangles. The translation is made up of two components in each dimension:
    - first, we match the center of the parent rectangle to the center of the target
    - then, we adjust the offset so that the child rectangle’s center is aligned with the target (this adjustment is the center difference of the parent and child rects, multiplied by the scale)
  4. The final step is to combine the transformations into one CGAffineTransform, ready to be applied to our views.
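
To make the arithmetic concrete, here is a minimal sketch that runs the same formulas on made-up numbers and checks where the child rectangle ends up. The three rects and the landingPoint(of:) helper are assumptions for illustration only and are not part of the sample project. The helper relies on the fact that a view's transform is applied around the view's center by default.

```swift
import Foundation

// Hypothetical rects in one shared coordinate space (illustrative values only).
let parent = CGRect(x: 0, y: 0, width: 400, height: 800)      // B
let child = CGRect(x: 0, y: 200, width: 400, height: 200)     // C
let target = CGRect(x: 100, y: 500, width: 100, height: 100)  // A

// Same formulas as in transform(parent:soChild:matches:).
let scaleX = target.width / child.width      // 0.25
let scaleY = target.height / child.height    // 0.5
let translateX = (target.midX - parent.midX) + (parent.midX - child.midX) * scaleX
let translateY = (target.midY - parent.midY) + (parent.midY - child.midY) * scaleY

// A view's transform is applied around its center, so a point inside the
// parent is scaled relative to the parent's center and then translated.
func landingPoint(of point: CGPoint) -> CGPoint {
    CGPoint(
        x: parent.midX + (point.x - parent.midX) * scaleX + translateX,
        y: parent.midY + (point.y - parent.midY) * scaleY + translateY
    )
}

let childCenter = landingPoint(of: CGPoint(x: child.midX, y: child.midY))
let childSize = CGSize(width: child.width * scaleX, height: child.height * scaleY)

print(childCenter) // x: 150, y: 550, which is the center of the target
print(childSize)   // 100 x 100, which is the size of the target
```

With these numbers the child lands exactly on the target, but only because it is scaled by 0.25 horizontally and 0.5 vertically, so its 2:1 shape is squashed into a square.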

Now let’s apply the above calculation to our example:

UIView.animate(withDuration: 2) {
    rectBView.transform = .transform(
        parent: rectB,
        soChild: rectC,
        matches: rectA
    )
}

Here is what it looks like:

Applying the Calculated Transform to the Example Layout

We’ve managed to align the yellow frame with the blue one, but there’s a hitch in the implementation: the aspect ratio of the yellow rectangle alters as it conforms to the blue one.

This kind of image resizing, known as ‘stretch,’ can lead to problems when we replace the yellow rectangle with an image view. The image would be distorted because it is reshaped from a rectangular to a square form.

Let’s pause for a moment to revisit the methods by which an image can be adapted to a new size. Understanding these will be crucial as we delve deeper into this section:

Image Placement Content Modes

To address the stretching issue described above, we’ll refine our utility to employ the “aspect fill” approach rather than a simple stretch when fitting the yellow rectangle into the blue one:

extension CGAffineTransform {
    static func transform(parent: CGRect,
                          soChild child: CGRect,
                          aspectFills rect: CGRect) -> Self {
        // 1.
        let childRatio = child.width / child.height
        let rectRatio = rect.width / rect.height

        let scaleX = rect.width / child.width
        let scaleY = rect.height / child.height

        // 2.
        let scaleFactor = rectRatio < childRatio ? scaleY : scaleX

        let offsetX = rect.midX - parent.midX
        let offsetY = rect.midY - parent.midY
        let centerOffsetX = (parent.midX - child.midX) * scaleFactor
        let centerOffsetY = (parent.midY - child.midY) * scaleFactor

        let translateX = offsetX + centerOffsetX
        let translateY = offsetY + centerOffsetY

        let scaleTransform = CGAffineTransform(scaleX: scaleFactor, y: scaleFactor)
        let translateTransform = CGAffineTransform(translationX: translateX, y: translateY)

        return scaleTransform.concatenating(translateTransform)
    }
}

Here is how we do it:

  1. Calculate the aspect ratio (width divided by height) for both the child and the target rectangles.
  2. Compare the two ratios to decide which axis drives the scale. The result is always the larger of scaleX and scaleY, applied uniformly to both axes, so the child keeps its aspect ratio and fully covers the target.
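
As a quick sanity check, the scale selection can be worked through with concrete sizes. This is only a sketch with assumed values (a 2:1 landscape image view and a square cell); none of the numbers come from the sample project.

```swift
import Foundation

// Hypothetical sizes: a 2:1 landscape image view and a square cell.
let child = CGSize(width: 400, height: 200)
let target = CGSize(width: 100, height: 100)

let scaleX = target.width / child.width      // 0.25
let scaleY = target.height / child.height    // 0.5

// The ratio comparison from the utility ends up picking the larger scale.
let childRatio = child.width / child.height      // 2
let targetRatio = target.width / target.height   // 1
let scaleFactor = targetRatio < childRatio ? scaleY : scaleX   // 0.5

// Scaling uniformly keeps the 2:1 shape: the height matches the target,
// while the width is larger than the target's width.
let scaled = CGSize(width: child.width * scaleFactor,
                    height: child.height * scaleFactor)        // 200 x 100
let overhangPerSide = (scaled.width - target.width) / 2        // 50

print(scaleFactor, scaled.width, scaled.height, overhangPerSide)
```

The 50 points that stick out on each side are the part of the child that a mask has to hide.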

With this enhanced solution, the yellow rectangle retains its aspect ratio while being scaled to cover the target rectangle using the “aspect fill” approach:

Using Aspect Fill Instead of Matching the Frames

Using an aspect fill transformation, we’ve indeed solved the issue of image distortion, but we introduced another one: parts of the yellow rectangle now extend beyond the boundaries of the blue rectangle. This is the perfect time to implement the second challenging part of the transition, the mask animation.

If you carefully examine the original transition, you’ll notice that during the animation, parts of the destination screen gradually become visible both above and below the enlarged target image. This gradual reveal can be achieved by adjusting the boundaries of a mask applied to the destination view (rectangle B).

If we apply this new mask to the destination screen correctly, we can elegantly handle the overhang from the yellow rectangle, ensuring that any overlapping sections remain invisible.

“Correctly” however is easier said than done in this context. When calculating the frame of the mask we need to keep in mind that we are already applying an affine transformation that will inevitably alter the mask’s dimensions too.

The mask can be calculated by aspect fitting the blue source rect into the yellow destination rect:

Calculating the Frame of the Mask (green)

This way, the green rectangle (therefore the visible parts) will completely match the blue one after the transformation is applied.

Let’s implement a utility that calculates just that:

extension CGRect {
    // 1.
    func aspectFit(to frame: CGRect) -> CGRect {
        let ratio = width / height
        let frameRatio = frame.width / frame.height
        if frameRatio < ratio {
            return aspectFitWidth(to: frame)
        } else {
            return aspectFitHeight(to: frame)
        }
    }

    // 2.
    func aspectFitWidth(to frame: CGRect) -> CGRect {
        // Height-to-width ratio of the receiver, so the fitted height keeps its shape.
        let ratio = height / width
        let fittedHeight = frame.width * ratio
        let offsetY = (frame.height - fittedHeight) / 2
        let origin = CGPoint(x: frame.origin.x, y: frame.origin.y + offsetY)
        let size = CGSize(width: frame.width, height: fittedHeight)
        return CGRect(origin: origin, size: size)
    }

    // 3.
    func aspectFitHeight(to frame: CGRect) -> CGRect {
        // Width-to-height ratio of the receiver, so the fitted width keeps its shape.
        let ratio = width / height
        let fittedWidth = frame.height * ratio
        let offsetX = (frame.width - fittedWidth) / 2
        let origin = CGPoint(x: frame.origin.x + offsetX, y: frame.origin.y)
        let size = CGSize(width: fittedWidth, height: frame.height)
        return CGRect(origin: origin, size: size)
    }
}

Here is what these functions do:

  1. We start by comparing the aspect ratios, which helps us decide the dimension to use for aspect fitting.
  2. We implement aspect fitting based on width. The outcome is a rectangle that retains the original’s aspect ratio, but its width is adjusted to match that of the given frame.
  3. The same implementation but based on height.
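
To see why an aspect-fitted mask ends up matching the source rectangle, here is a small numeric sketch. The aspectFitted(into:) helper is a hypothetical, compact alternative written only for this example (it is not the utility from the project), and the rects are assumed values.

```swift
import Foundation

extension CGRect {
    // Hypothetical helper: the largest rect with the receiver's aspect ratio
    // that fits inside `frame`, centered within it.
    func aspectFitted(into frame: CGRect) -> CGRect {
        let scale = min(frame.width / width, frame.height / height)
        let size = CGSize(width: width * scale, height: height * scale)
        return CGRect(x: frame.midX - size.width / 2,
                      y: frame.midY - size.height / 2,
                      width: size.width,
                      height: size.height)
    }
}

// Illustrative values: a square cell (A) and a 2:1 image view (C).
let cell = CGRect(x: 100, y: 500, width: 100, height: 100)
let image = CGRect(x: 0, y: 200, width: 400, height: 200)

// The mask: the cell's square shape, fitted into the image view.
let mask = cell.aspectFitted(into: image)   // x: 100, y: 200, 200 x 200

// The aspect-fill transform scales everything uniformly by the larger scale.
let scaleFactor = max(cell.width / image.width, cell.height / image.height)   // 0.5
let maskSizeAfterTransform = CGSize(width: mask.width * scaleFactor,
                                    height: mask.height * scaleFactor)        // 100 x 100

print(mask, maskSizeAfterTransform)
```

The mask shares its center with the image view, and the transform moves that center onto the cell's center, so after scaling the visible region has both the size and the position of the cell.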

Let’s introduce a new green layer to our example animation (serving as a simulated mask) and define its frame using our latest utility. The result looks like the following:

Adding a Mask Layer to the Animation

As you can see, the green rectangle is transformed to align perfectly with the blue source rectangle.

To illustrate the final form of the transition, here is what happens when we initialize the mask to match the red rectangle and animate it to its final form, in tandem with the scale transformation:

Animating the Mask Layer

With this final utility established, we are ready to implement the transition and finally forget about boring rectangles. You can take a deep breath as we are done with the hardest part of the implementation.

Shared Frame Transition

In this section we are going to put our learnings from the previous two sections together and create the final animation controller, responsible for executing the shared frame transition.

Things will get complicated quickly so let’s split our animateTransition function of SharedTransitionAnimator into two distinct methods for the push and pop animations:

func animateTransition(using transitionContext: UIViewControllerContextTransitioning) {
    switch transition {
    case .push:
        pushAnimation(context: transitionContext)
    case .pop:
        popAnimation(context: transitionContext)
    }
}

private func pushAnimation(context: UIViewControllerContextTransitioning) {}
private func popAnimation(context: UIViewControllerContextTransitioning) {}

To calculate and apply the necessary affine transformations, we must gather the following information:

  • fromView: The root view of the source view controller.
  • fromRect: The rect that corresponds to the source image view within fromView.
  • toView: The root view of the destination view controller.
  • toRect: The rectangle that corresponds to the destination image view within toView.

As discussed earlier, we can fetch fromView and toView directly from the transitionContext. However, that’s not the case for fromRect and toRect.

Instead, we do have access to the source and destination view controllers. To retrieve the essential rectangles inside the animator, we’ll define a new protocol named SharedTransitioning, to which both our view controllers will conform. This will enable us to retrieve those rects directly from within the animator.

Before we query the frames, there is one important detail we need to address. Our animator will do all calculations in a common global coordinate system, specifically the window’s. To account for this let’s add an extension to map frames to the global coordinate system:

extension UIView {
    var frameInWindow: CGRect? {
        superview?.convert(frame, to: nil)
    }
}

Now let’s add the SharedTransitioning protocol as well as the two conformances:

protocol SharedTransitioning {
    var sharedFrame: CGRect { get }
}

extension ProfileScreen: SharedTransitioning {
    var sharedFrame: CGRect {
        guard let selectedIndexPath,
              let cell = collectionView.cellForItem(at: selectedIndexPath),
              let frame = cell.frameInWindow else { return .zero }
        return frame
    }
}

extension DetailScreen: SharedTransitioning {
    var sharedFrame: CGRect {
        imageView.frameInWindow ?? .zero
    }
}

To simplify the process of retrieving the relevant frames from the transitionContext, we’ll craft an extension for UIViewControllerContextTransitioning as follows:

extension UIViewControllerContextTransitioning {
    func sharedFrame(forKey key: UITransitionContextViewControllerKey) -> CGRect? {
        let viewController = viewController(forKey: key)
        viewController?.view.layoutIfNeeded()
        return (viewController as? SharedTransitioning)?.sharedFrame
    }
}

Having established these utilities, we’re now equipped to gather all essential components for our animation.

Next, we’ll create a setup function that will return all components as a tuple and also arrange them appropriately within the transitionContext’s container view, ensuring they’re positioned correctly for our animation.

private func setup(
    with context: UIViewControllerContextTransitioning
) -> (UIView, CGRect, UIView, CGRect)? {
    // 1
    guard let toView = context.view(forKey: .to),
          let fromView = context.view(forKey: .from) else {
        return nil
    }
    // 2
    if transition == .push {
        context.containerView.addSubview(toView)
    } else {
        context.containerView.insertSubview(toView, belowSubview: fromView)
    }
    // 3
    guard let toFrame = context.sharedFrame(forKey: .to),
          let fromFrame = context.sharedFrame(forKey: .from) else {
        return nil
    }
    // 4
    return (fromView, fromFrame, toView, toFrame)
}

Here is what goes on here:

  1. We retrieve the toView and fromView from the context. (This step should be familiar from our first animator.)
  2. The toView is added to the containerView, ensuring the correct z positioning. (Also the same as in our first animator.)
  3. With our new extension in play, we fetch the shared frames directly from the context.
  4. Lastly, we group all components into a tuple and return it.

With all prerequisites in place we can now finally implement the long-awaited shared frame animation inside the pushAnimation function of our animator:

private func pushAnimation(context: UIViewControllerContextTransitioning) {
    // 1
    guard let (fromView, fromFrame, toView, toFrame) = setup(with: context) else {
        context.completeTransition(false)
        return
    }

    // 2
    let transform: CGAffineTransform = .transform(
        parent: toView.frame,
        soChild: toFrame,
        aspectFills: fromFrame
    )
    toView.transform = transform

    // 3
    let maskFrame = fromFrame.aspectFit(to: toFrame)
    let mask = UIView(frame: maskFrame).then {
        $0.layer.cornerCurve = .continuous
        $0.backgroundColor = .black
    }
    toView.mask = mask

    // 4
    let placeholder = UIView().then {
        $0.backgroundColor = .white
        $0.frame = fromFrame
    }
    fromView.addSubview(placeholder)

    // 5
    let overlay = UIView().then {
        $0.backgroundColor = .black
        $0.layer.opacity = 0
        $0.frame = fromView.frame
    }
    fromView.addSubview(overlay)

    UIView.animate(withDuration: 0.25) {
        // 6
        toView.transform = .identity
        mask.frame = toView.frame
        mask.layer.cornerRadius = 39
        overlay.layer.opacity = 0.5
    } completion: { _ in
        // 7
        toView.mask = nil
        overlay.removeFromSuperview()
        placeholder.removeFromSuperview()
        context.completeTransition(true)
    }
}

We utilize a syntactic sugar library called ‘Then’ to simplify UIView configurations post-initialization. If you haven’t encountered it before, I recommend exploring its repository. Here’s a breakdown of each step:

  1. We retrieve the necessary components using our previously established setup function. This will also add the destination view to the container view.
  2. We use our utility function from the prior section to calculate the transform for the destination view. It’s worth revisiting the A, B, C rectangle examples from earlier to fully grasp this step.
  3. We compute and set the mask for the destination view, which will expand throughout the animation.
  4. We add a white placeholder view to the fromView that covers the original image cell. This is crucial to create the illusion of the image cell expanding and departing from its original position.
  5. We add a dark overlay to the source screen with 0 opacity. We will gradually increase the opacity to apply this overlay during the transition.
  6. Within the animation block:
    - The destination view is reverted to its original dimensions and position.
    - The mask’s frame is adjusted to unveil the entire destination screen. Additionally, a corner radius is applied to the mask, mimicking the device’s shape.
    - The opacity of the background overlay is gradually increased.
  7. Post-animation cleanup:
    - We dispose of the mask, the overlay and the placeholder.
    - Lastly, we signal UIKit that the animation has concluded by invoking completeTransition on the transitionContext.

Now, let’s implement the pop animation. It mirrors the push animation, with the steps essentially running in reverse:

private func popAnimation(context: UIViewControllerContextTransitioning) {
    // 1
    guard let (fromView, fromFrame, toView, toFrame) = setup(with: context) else {
        context.completeTransition(false)
        return
    }

    // 2
    let transform: CGAffineTransform = .transform(
        parent: fromView.frame,
        soChild: fromFrame,
        aspectFills: toFrame
    )
    // 3
    let mask = UIView(frame: fromView.frame).then {
        $0.layer.cornerCurve = .continuous
        $0.backgroundColor = .black
        $0.layer.cornerRadius = 39
    }
    fromView.mask = mask

    // 4
    let placeholder = UIView().then {
        $0.backgroundColor = .white
        $0.frame = toFrame
    }
    toView.addSubview(placeholder)

    // 5
    let overlay = UIView().then {
        $0.backgroundColor = .black
        $0.layer.opacity = 0.5
        $0.frame = toView.frame
    }
    toView.addSubview(overlay)

    // 6
    let maskFrame = toFrame.aspectFit(to: fromFrame)
    UIView.animate(withDuration: 0.25) {
        // 7
        fromView.transform = transform
        mask.frame = maskFrame
        mask.layer.cornerRadius = 0
        overlay.layer.opacity = 0
    } completion: { _ in
        // 8
        overlay.removeFromSuperview()
        placeholder.removeFromSuperview()
        let isCancelled = context.transitionWasCancelled
        context.completeTransition(!isCancelled)
    }
}

Here’s a breakdown of each step:

  1. Similar to the push animation, we begin by retrieving the necessary components and setting up the container view. In this case, the destination view is the profile screen, and we position it beneath the source detail screen.
  2. We compute the same transform as in the push animation, but the roles are reversed due to the pop scenario. The parameters shift accordingly.
  3. The mask is initially set with a frame that aligns with the detail screen. This doesn’t alter the visible region but sets the stage for the animation, where we’ll shrink the mask’s size.
  4. We add the placeholder to the image cell as before.
  5. An overlay is added to the profile screen; this will be progressively removed during the transition to reveal the profile screen.
  6. The ending frame for the mask applied to the detail screen is determined. This will guide the mask reduction during the animation.
  7. In the animation block:
    - We apply the transform to the detail screen.
    - The mask on the detail screen is resized according to the predetermined end frame and its corner radius is removed to adapt to the destination image cell.
    - The overlay on the profile screen is gradually removed to unveil the underlying content.
  8. Post animation cleanup:
    - The overlay and placeholder views are removed from the view hierarchy.
    - We check if the transition was canceled, and subsequently, the completion function on the transition context is invoked to signal the conclusion of the animation.

With that, we’ve successfully implemented the shared frame transition, which appears as follows:

Using Our New Animation Controller for the Transition

In the final repository, you’ll find additional code for handling minor edge cases, like scrolling the collection view to ensure the selected cell is fully visible before initiating the pop animation. I’ve omitted these details here to keep the article concise, but you can check those out in the repo if you are interested.

Part II. addresses the last piece of the puzzle by making our pop transition interactive and gesture-driven.
