Augmented Reality Video Conference

Published in Agora.io · Mar 28, 2018 · 5 min read

Last year at WWDC 2017, Apple launched ARKit. Using this technology, developers can create mixed reality applications on the iOS platform quickly and use their device’s cameras to help augmented reality come to life.

In this article, we will integrate ARKit into a video conference scenario. The article covers two things:

Integrate ARKit with live video streaming
Render the live video stream to the AR plane using Agora’s Video SDK

We will be using ARKit to detect a plane in the room and then use the Custom Video Source and Renderer function, included in Agora.io Video SDK v2.1.1, to render the live video stream onto the plane. This will end up giving a holographic feel to the video call, just like you see in Star Wars! The source code for this demo is included at the end of the article. Just add your Agora.io App ID to the ViewController.swift file and run the app on your device!

Basic AR Preparation

First, we will use ARKit to create a simple plane-aware application as the basis for development. Create a new project in Xcode using the Augmented Reality App template and select SceneKit as the Content Technology.

Start plane detection:

In ViewController, enable plane detection on the ARKit session configuration before running the session.
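A minimal sketch of this step, assuming the `sceneView` outlet that the Xcode Augmented Reality App template generates. The class name `PlaneDetectionViewController` and the helper `makeConfiguration()` are illustrative names, not taken from the demo project.

```swift
import ARKit
import UIKit

final class PlaneDetectionViewController: UIViewController {
    // In the Xcode AR template this outlet is connected in the storyboard.
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        // Start (or resume) world tracking with plane detection turned on.
        sceneView.session.run(PlaneDetectionViewController.makeConfiguration())
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        sceneView.session.pause()
    }

    // Hypothetical helper: builds a world-tracking configuration
    // that reports horizontal planes as ARPlaneAnchor objects.
    static func makeConfiguration() -> ARWorldTrackingConfiguration {
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = .horizontal
        return configuration
    }
}
```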

Display the identified plane:

To add a red background to the identified plane, implement the ARSCNViewDelegate callback method, renderer:didAddNode:forAnchor:
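The callback could look roughly like the sketch below. `PlaneHighlighter` and `makeHighlightNode(center:extent:)` are hypothetical names, and the one-second fade duration is an assumption; the demo may use different values.

```swift
import ARKit
import SceneKit
import UIKit

final class PlaneHighlighter: NSObject, ARSCNViewDelegate {
    // Called by ARKit when a node has been added for a newly detected anchor.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let planeAnchor = anchor as? ARPlaneAnchor else { return }
        let highlight = PlaneHighlighter.makeHighlightNode(center: planeAnchor.center,
                                                           extent: planeAnchor.extent)
        node.addChildNode(highlight)
    }

    // Hypothetical helper: a red rectangle sized to the detected plane
    // that fades out after it appears.
    static func makeHighlightNode(center: simd_float3, extent: simd_float3) -> SCNNode {
        let plane = SCNPlane(width: CGFloat(extent.x), height: CGFloat(extent.z))
        plane.firstMaterial?.diffuse.contents = UIColor.red

        let planeNode = SCNNode(geometry: plane)
        planeNode.position = SCNVector3(center.x, 0, center.z)
        // SCNPlane is vertical by default; rotate it to lie flat on the surface.
        planeNode.eulerAngles.x = -.pi / 2
        planeNode.runAction(SCNAction.fadeOut(duration: 1.0))
        return planeNode
    }
}
```

To use it, keep a strong reference to the highlighter and assign it to `sceneView.delegate`, or move the callback into the view controller itself if it already acts as the delegate.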

You have now completed a very simple AR application. When a plane in the environment is identified, a red rectangle is added to it and fades out.

Once a plane is identified, a red rectangle appears.

Interactive Broadcasting Preparation

Now, we will use the Agora SDK to add live video calling capabilities to the app. Download the latest SDK package on the official website and add it to the Xcode project. Next, create an instance of AgoraRtcEngineKit in the View Controller and add the following live video related settings.
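A hedged sketch of the engine setup follows. The method names are the Swift bindings of the Agora 2.x SDK as best understood here and may differ in other SDK versions; the function name `makeLiveVideoEngine` is illustrative, and the App ID is whatever you obtained from Agora.

```swift
import AgoraRtcEngineKit

// Hypothetical helper that creates the engine and applies the
// live-broadcast settings described above.
func makeLiveVideoEngine(appId: String, delegate: AgoraRtcEngineDelegate?) -> AgoraRtcEngineKit {
    let engine = AgoraRtcEngineKit.sharedEngine(withAppId: appId, delegate: delegate)
    // Live broadcasting profile with this device acting as a broadcaster,
    // so it both publishes and receives video.
    engine.setChannelProfile(.liveBroadcasting)
    engine.setClientRole(.broadcaster)
    engine.enableVideo()
    return engine
}
```

In the view controller, the result would typically be stored in a property such as `agoraKit` so later steps can reach it.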

Finally, in the viewDidLoad function, set the delegate for agoraKit to the view controller (self) and join an Agora channel.
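As a sketch, the delegate assignment and channel join might look like this. `startCall` is a hypothetical wrapper, the channel name is up to you, and passing `nil` as the token assumes a project that does not require token authentication.

```swift
import AgoraRtcEngineKit

// Hypothetical helper, intended to be called from viewDidLoad().
func startCall(on engine: AgoraRtcEngineKit,
               delegate: AgoraRtcEngineDelegate,
               channel: String) {
    engine.delegate = delegate
    // uid 0 asks the SDK to assign a user ID automatically.
    let result = engine.joinChannel(byToken: nil,
                                    channelId: channel,
                                    info: nil,
                                    uid: 0,
                                    joinSuccess: nil)
    if result != 0 {
        print("joinChannel failed with code \(result)")
    }
}
```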

At this point, all the preparations have been completed. We have an AR application that can recognize planes and can also make audio and video calls. The next step is to combine these two functions.

Broadcast the ARKit screen

Since ARKit already uses the device camera, we cannot start a separate AVCaptureSession for video capture. Fortunately, the capturedImage property of ARFrame provides the image captured by the camera for us to use.

Add custom video source:

In order to transmit video data, we need to create a class (ARVideoSource) and implement the AgoraVideoSourceProtocol, in which bufferType should return AgoraVideoBufferType.

Add a method to transmit the video frames to the ARVideoSource class:
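The class and its send method could be sketched as below. This assumes the 2.x form of AgoraVideoSourceProtocol (later SDK versions add further requirements), and it assumes the pixel-buffer buffer type because ARKit delivers frames as CVPixelBuffer. The timescale of 10,000 is an arbitrary choice.

```swift
import AgoraRtcEngineKit
import CoreMedia
import CoreVideo

final class ARVideoSource: NSObject, AgoraVideoSourceProtocol {
    // Set by the SDK once the source has been registered with the engine.
    var consumer: AgoraVideoFrameConsumer?

    // Lifecycle hooks; nothing to prepare because ARKit owns the camera.
    func shouldInitialize() -> Bool { return true }
    func shouldStart() {}
    func shouldStop() {}
    func shouldDispose() {}

    // Assumption: frames are handed over as CVPixelBuffer.
    func bufferType() -> AgoraVideoBufferType {
        return .pixelBuffer
    }

    // Forwards one frame to the SDK, converting the timestamp
    // from seconds into a CMTime.
    func sendBuffer(_ buffer: CVPixelBuffer, timestamp: TimeInterval) {
        let time = CMTime(seconds: timestamp, preferredTimescale: 10_000)
        consumer?.consumePixelBuffer(buffer, withTimestamp: time, rotation: .rotationNone)
    }
}
```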

Next, instantiate an ARVideoSource in the View Controller and pass the instance variable to the Agora SDK via the setVideoSource interface in viewDidLoad().

This allows us to pass video frames to the Agora SDK as long as we call videoSource’s sendBuffer:timestamp: method.

Send Camera Data:

We can get each ARFrame through the ARSession delegate callback, read the camera data from it, and send it out through videoSource.

In the viewDidLoad method, set the ARSession delegate to the View Controller and add the callback function.
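One way to sketch the callback without tying it to a particular view controller is a small forwarding object. `FrameForwarder` and its `send` closure are hypothetical; in the demo the view controller itself plays this role.

```swift
import ARKit

final class FrameForwarder: NSObject, ARSessionDelegate {
    // Hypothetical hook: receives the camera image and its timestamp.
    private let send: (CVPixelBuffer, TimeInterval) -> Void

    init(send: @escaping (CVPixelBuffer, TimeInterval) -> Void) {
        self.send = send
        super.init()
    }

    // ARKit calls this for every new camera frame.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        send(frame.capturedImage, frame.timestamp)
    }
}
```

A forwarder would be created with a closure that calls the video source's send method and then assigned to `sceneView.session.delegate`. The session holds its delegate weakly, so the forwarder needs a strong reference elsewhere, for example a property on the view controller.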

Send ARSCNView data:

ARFrame’s capturedImage property holds the raw data from the camera. If we want to send a picture with the virtual objects already added, we must obtain the ARSCNView's rendered output instead. Here’s a simple idea: set a timer, take a snapshot of the SCNView as a UIImage, convert it to a CVPixelBuffer, and provide it to videoSource. The sample logic code is provided below:
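The conversion step is the least obvious part, so here is a sketch of it under the assumption that a 32-bit BGRA buffer is acceptable to the receiver. `makePixelBuffer(from:)` is a hypothetical helper and not necessarily how the demo does it.

```swift
import CoreVideo
import UIKit

// Hypothetical helper: draws a UIImage into a newly created BGRA CVPixelBuffer.
func makePixelBuffer(from image: UIImage) -> CVPixelBuffer? {
    guard let cgImage = image.cgImage else { return nil }
    let width = cgImage.width
    let height = cgImage.height

    let attributes: [CFString: Any] = [
        kCVPixelBufferCGImageCompatibilityKey: true,
        kCVPixelBufferCGBitmapContextCompatibilityKey: true
    ]
    var pixelBuffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                                     kCVPixelFormatType_32BGRA,
                                     attributes as CFDictionary, &pixelBuffer)
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return nil }

    // The base address is only valid while the buffer is locked.
    CVPixelBufferLockBaseAddress(buffer, [])
    defer { CVPixelBufferUnlockBaseAddress(buffer, []) }

    // Little-endian, alpha-first layout corresponds to BGRA in memory.
    let bitmapInfo = CGImageAlphaInfo.premultipliedFirst.rawValue
        | CGBitmapInfo.byteOrder32Little.rawValue
    guard let context = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                                  width: width,
                                  height: height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: bitmapInfo) else { return nil }

    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
    return buffer
}
```

A repeating Timer on the main thread could then call `sceneView.snapshot()`, pass the image through this helper, and hand the resulting buffer to the video source. Snapshotting every frame is relatively expensive, so a modest timer interval is a reasonable starting point.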

Rendering the live streaming video to the AR scene

Add virtual display:

First, we need to create a virtual display for rendering remote video and add it to the AR scene when the user taps the screen.

Add a UITapGestureRecognizer to ARSCNView in the Storyboard. When the user taps the screen, get the position on the plane through ARSCNView’s hitTest method and put a virtual display at the tapped position.

Users may add multiple display screens by tapping the screen. Each new screen stays in the unusedScreenNodes array until a remote video stream is assigned to it.
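The tap handling could be sketched as follows. `ScreenPlacer`, `makeScreenNode()`, and the 0.4 m by 0.3 m screen size are assumptions for illustration; only the `unusedScreenNodes` name comes from the article. Note that `hitTest(_:types:)` is the ARKit API of that era and has since been deprecated in favor of raycasting.

```swift
import ARKit
import SceneKit
import UIKit

final class ScreenPlacer {
    // Screens that have been placed but have no video assigned yet.
    private(set) var unusedScreenNodes = [SCNNode]()

    // Call from the tap gesture action with recognizer.location(in: sceneView).
    func placeScreen(at point: CGPoint, in sceneView: ARSCNView) {
        // Only accept taps that land on an already detected plane.
        guard let hit = sceneView.hitTest(point, types: .existingPlaneUsingExtent).first else {
            return
        }
        let screen = ScreenPlacer.makeScreenNode()
        let translation = hit.worldTransform.columns.3
        // Raise the screen by half its height so it stands on the plane.
        screen.position = SCNVector3(translation.x, translation.y + 0.15, translation.z)
        sceneView.scene.rootNode.addChildNode(screen)
        unusedScreenNodes.append(screen)
    }

    // Hands out the oldest screen that is still waiting for a video stream.
    func takeUnusedScreen() -> SCNNode? {
        return unusedScreenNodes.isEmpty ? nil : unusedScreenNodes.removeFirst()
    }

    // Hypothetical screen geometry: a flat, double-sided gray panel.
    static func makeScreenNode() -> SCNNode {
        let plane = SCNPlane(width: 0.4, height: 0.3)
        plane.firstMaterial?.diffuse.contents = UIColor.darkGray
        plane.firstMaterial?.isDoubleSided = true
        return SCNNode(geometry: plane)
    }
}
```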

Add custom video renderer:

In order to obtain remote video data from the Agora SDK, we need to construct an object ARVideoRenderer, which implements the AgoraVideoSinkProtocol.

The remoteRenderData:size:rotation: method receives the remote video data, which is then rendered to the SCNNode using the Metal framework. The full Metal rendering code can be found in the final version of the demo.
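A skeleton of such a renderer is sketched below, with the Metal work left out. It assumes the 2.x form of AgoraVideoSinkProtocol, in which raw frames are understood to arrive through `renderRawData(_:size:rotation:)`; treat that method name, the raw-data buffer type, and the I420 pixel format as assumptions to check against the SDK headers you are using. `lastFrameSize` is a hypothetical property added for illustration.

```swift
import AgoraRtcEngineKit
import CoreGraphics
import SceneKit

final class ARVideoRenderer: NSObject, AgoraVideoSinkProtocol {
    // The virtual screen this renderer draws onto.
    var renderNode: SCNNode?
    // Hypothetical bookkeeping: dimensions of the most recent frame.
    private(set) var lastFrameSize = CGSize.zero

    func shouldInitialize() -> Bool { return true }
    func shouldStart() {}
    func shouldStop() {}
    func shouldDispose() {}

    // Assumption: ask the SDK for raw I420 (planar YUV) frames.
    func bufferType() -> AgoraVideoBufferType { return .rawData }
    func pixelFormat() -> AgoraVideoPixelFormat { return .I420 }

    // Called by the SDK for each decoded remote frame.
    func renderRawData(_ rawData: UnsafeMutableRawPointer,
                       size: CGSize,
                       rotation: AgoraVideoRotation) {
        lastFrameSize = size
        // An I420 frame holds width * height luma bytes followed by two
        // quarter-size chroma planes. A full implementation would upload
        // these planes to Metal textures, convert them to RGB, and assign
        // the result to renderNode's material.
    }
}
```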

Set custom renderer to Agora SDK:

By implementing the rtcEngine:didJoinedOfUid:elapsed: callback of the AgoraRtcEngineDelegate protocol, you can identify when the streamer joins the channel. Create an instance of ARVideoRenderer in the callback, assign it the virtual screen node (created earlier when the user tapped the screen), and register the custom renderer with the Agora SDK via the setRemoteVideoRenderer:forUserId: interface.

This way, when another user joins the channel, their video is displayed on the AR plane, creating the effect of a virtual conference room.
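The wiring could be sketched like this. `RemoteVideoRouter` and its `makeRenderer` closure are hypothetical, introduced so the example does not depend on a particular renderer class; in the demo the view controller is the delegate and creates an ARVideoRenderer directly.

```swift
import AgoraRtcEngineKit
import SceneKit

final class RemoteVideoRouter: NSObject, AgoraRtcEngineDelegate {
    // Screens placed by the user that are still waiting for a stream.
    var unusedScreenNodes = [SCNNode]()

    // Hypothetical factory: builds a video sink that draws onto the given node.
    private let makeRenderer: (SCNNode) -> AgoraVideoSinkProtocol
    // Keeps renderers alive for as long as their user is being displayed.
    private var activeRenderers = [UInt: AgoraVideoSinkProtocol]()

    init(makeRenderer: @escaping (SCNNode) -> AgoraVideoSinkProtocol) {
        self.makeRenderer = makeRenderer
        super.init()
    }

    // Called when a remote user joins the channel.
    func rtcEngine(_ engine: AgoraRtcEngineKit, didJoinedOfUid uid: UInt, elapsed: Int) {
        // Without a free virtual screen there is nowhere to show this user.
        guard !unusedScreenNodes.isEmpty else { return }
        let screenNode = unusedScreenNodes.removeFirst()

        let renderer = makeRenderer(screenNode)
        activeRenderers[uid] = renderer
        engine.setRemoteVideoRenderer(renderer, forUserId: uid)
    }
}
```

One design consequence of this approach is that a user who joins before any screen has been placed is never shown. Handling that case, for example by remembering pending user IDs, is left out of the sketch.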

Using the Agora SDK’s custom video source and custom video renderer features, it’s easy to combine AR and live video scenarios. This demo runs on the Agora SDK using the Agora software defined realtime network and can support 17 simultaneous video streams. It is quite clear that AR technology will bring a whole new experience to real-time video streaming.

Where to take this from here:

Challenge a friend in Pokemon Go
Bring your friends, family, and colleagues closer to you in a video call
Create a mixed reality fitness app to connect trainers to their clients

For the full source code, check out the GitHub repo here.

Please feel free to reach out on our Developer Slack Channel if you have any questions! If you’d like to be a part of our Slack Community, please fill out this form and we’ll send the invite out!