In this post, I am going to talk at a high level about how the ARKit 2.0 features are supported in UE4 4.20. In subsequent posts, I intend to give more detail about each feature and provide ideas on using them in your apps. Until then, this will give you enough information to experiment with the features.
The major features that were added in this release of ARKit are:
- World AR Map Persistence
- Environment Texturing
- Object Detection
Minor features in this release are:
- Image Tracking
- Eye Gaze Target, Individual Eye Transforms, and Tongue detection for FaceAR
Because UE4 hadn’t fully integrated the ARKit 1.5 changes, we also had to add support for:
- Vertical Plane Anchors
- Image Detection
The following sections talk about the features/changes at a high level.
New Session Configuration Options

ARKit 2.0 added some new configuration options, and UE4 now exposes an option that was previously configured for you under the hood. The new options are:
- World Alignment — which is used to specify what kind of transform tracking you want from ARKit (previously hidden)
- Session Type — there are two new session types (Image, Object Scanning)
- Vertical Plane Detection
- Enable Auto Focus — auto focus works well for most apps, so it is on by default. Apps that require macro camera focus on small objects may see those objects swim as the focus changes; since those cases are less common, such apps can disable auto focus as needed.
- Max Num Simultaneous Images Tracked — tracking an image in a scene can be expensive, so this limits how many images are tracked at once.
- Environment Capture Probe Type — this is the new Environment Texturing feature which uses machine learning to build cubemaps from the camera data collected as the camera scans the scene. These cubemaps can be used for image based lighting and/or making more realistic reflective surfaces.
- World Map Data — this is the binary data that was saved in a previous AR session (persistence). When provided, the AR session will use that data as the basis for the AR world, rather than building a new AR world from scratch.
- Candidate Objects — just like image detection/tracking needs to know what to look for in a scene, its size, etc., candidate objects need a similar descriptor before the AR session can search for them.
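These options are normally set on a UARSessionConfig data asset in the editor. As a minimal sketch of how a configured asset gets used from C++, assuming a hypothetical game mode class and an illustrative asset path:

```cpp
// Minimal sketch: starting an AR session from a UARSessionConfig data asset.
// AMyARGameMode and the asset path are illustrative, not from the engine.
#include "ARBlueprintLibrary.h"
#include "ARSessionConfig.h"

void AMyARGameMode::StartAR()
{
    // The data asset holds World Alignment, Session Type, plane detection,
    // auto focus, capture probe type, world map data, candidate objects, etc.
    UARSessionConfig* Config = LoadObject<UARSessionConfig>(
        nullptr, TEXT("/Game/AR/MyARSessionConfig.MyARSessionConfig"));

    if (Config != nullptr)
    {
        UARBlueprintLibrary::StartARSession(Config);
    }
}
```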
Image Detection and Tracking

In my previous post, I walked you through the process for setting up image detection within UE4 4.20. Since ARKit 2.0 introduces image tracking support, and I was visiting Apple in order to provide complete 2.0 feature support in the engine, I first had to add the ARKit 1.5 image detection feature. You can read the previous post for how to set that up. What’s the difference between detection and tracking? Detection only notifies you of an object anchor once, whereas tracking provides updates to that anchor over time. This feature will now update your ARTrackedImage objects as images within the scene move. (Scan your friends’ faces and track them for fun.)
There are two new configuration options for image detection and tracking. The first is a new AR session type that only detects and tracks images in a scene. It skips all of the other work that is done when building an AR world, so it is great for performance if you are only interested in detecting and tracking images.
The second option tells ARKit how many instances of images you want to track in a scene. Note: this can be the same candidate image seen multiple times or different images. It’s there to control how much performance you want to give to tracking, since tracking an unlimited number of instances could be too much for your framerate targets.
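A minimal sketch of consuming tracked-image updates, assuming a session was started with candidate images configured and a non-zero Max Num Simultaneous Images Tracked (the actor class is hypothetical):

```cpp
// Minimal sketch: polling for tracked images each frame.
// AMyARActor is illustrative, not from the engine.
#include "ARBlueprintLibrary.h"
#include "ARTrackable.h"

void AMyARActor::Tick(float DeltaSeconds)
{
    Super::Tick(DeltaSeconds);

    for (UARTrackedGeometry* Geometry : UARBlueprintLibrary::GetAllGeometries())
    {
        if (UARTrackedImage* Image = Cast<UARTrackedImage>(Geometry))
        {
            // Tracking (unlike detection) keeps this transform up to date
            // as the physical image moves through the scene
            const FTransform ImageToWorld = Image->GetLocalToWorldTransform();
            // ... position your content at ImageToWorld ...
        }
    }
}
```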
Environment Texturing

One improvement that has the potential to really make AR objects within a scene seem more realistic is lighting that matches the environment, along with objects that reflect the scene. The environment texturing feature generates a cubemap of the environment based upon the camera data it accumulates as you move around within your AR space.
There are two ways to generate this data. The first is to set the Environment Capture Probe Type to Manual. When you do this, you must tell ARKit where you want an environment probe to be located and the bounding box that you want it to include. After that, an AREnvironmentCaptureProbe is created for you, which will hold the cubemap that is generated. As camera data is applied in subsequent updates, the capture probe will update its cubemap data. The second way is to set the Environment Capture Probe Type to Automatic. In this mode, ARKit will automatically create probes within the scene and update their textures. In my testing, I found that it creates many, often overlapping, probes, which causes two problems: updating all of those cubemaps is expensive and can generate hitches, and you have to manually decide how to render an object with multiple probes around it. Because of those two problems, I recommend that you stick to a Manual probe that covers the potential play area for your game.
UE4 already supports image based lighting and reflections via the SkyLight actor you place in a level. If you set that SkyLight actor to use a specified cubemap, you can then use the cubemap from the AREnvironmentCaptureProbe as the image it lights and reflects with.
SkyLights can also blend between two different cubemaps, which some developers use as the basis of a time of day system. You can use that feature to blend between two capture probes, so that moving objects don’t pop when moving between those probe areas. One drawback to using a SkyLight for image based lighting is that it doesn’t expect the cubemap to be changing underneath it, so it doesn’t automatically know to regenerate its internal data. I am hoping to add an ARSkyLight that knows when the capture probe was updated and triggers a regeneration of the underlying SkyLight data. Longer term, I want to double buffer the cubemap textures, so that updates can be blended between rather than popping when the texture is wholesale replaced.
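Until something like that ARSkyLight exists, you can force the regeneration yourself. This is a sketch under a couple of assumptions: the probe texture can be treated as a cube texture on this platform, and the SkyLight is set to use a specified cubemap:

```cpp
// Sketch: pointing a SkyLight at the cubemap from a capture probe and forcing
// it to rebuild its lighting data. The surrounding function is illustrative.
#include "ARBlueprintLibrary.h"
#include "ARTrackable.h"
#include "Components/SkyLightComponent.h"
#include "Engine/TextureCube.h"

void UpdateSkyLightFromProbe(USkyLightComponent* SkyLight)
{
    for (UARTrackedGeometry* Geometry : UARBlueprintLibrary::GetAllGeometries())
    {
        if (UAREnvironmentCaptureProbe* Probe = Cast<UAREnvironmentCaptureProbe>(Geometry))
        {
            UTexture* ProbeTexture = Probe->GetEnvironmentCaptureTexture();
            // Assumption: the platform exposes the probe texture as a cube texture
            if (UTextureCube* Cubemap = Cast<UTextureCube>(ProbeTexture))
            {
                SkyLight->SetCubemap(Cubemap);
                // The SkyLight doesn't notice the texture changing on its own,
                // so explicitly recapture to regenerate its internal data
                SkyLight->RecaptureSky();
            }
            break;
        }
    }
}
```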
World Map Persistence
Of all the features in this release, I believe this one to be the most important. The ability to share the same AR world opens up the possibility for true multiplayer AR without resorting to various hacks to achieve something akin to relocalization via alignment images or educated guessing based off of parallel inputs. The way this works is that one person scans the participation area until the session indicates that mapping has completed sufficiently for later relocalization. At this point, you can serialize the AR world into a blob, which can be saved to disk, saved to the cloud, or streamed to another device in your networking session. Once the other person has received or loaded the data, they start a new session with the blob stored in the ARSessionConfig’s World Map Data. That data is then deserialized into the object representation that ARKit wants. After that session is started, ARKit begins its relocalization process, giving you status updates along the way. Once the status indicates the world is mapped, you are fully relocalized and can have accurate networked or asynchronous game/AR world interactions. I plan to build some gameplay code on top of the low level layers to make the networking of this data easy to use.
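The gating step in that flow can be sketched as follows. Treat the names here as my best understanding of the 4.20 AR API rather than a definitive reference; the actual serialization of the blob is exposed as a latent "Save AR World" style task rather than this simple call:

```cpp
// Sketch: only serialize the AR world once ARKit reports the scanned area is
// mapped well enough to be relocalized later. Names are illustrative of the
// 4.20 AR API as I understand it.
#include "ARBlueprintLibrary.h"

bool IsWorldReadyToSave()
{
    return UARBlueprintLibrary::GetWorldMappingStatus() == EARWorldMappingState::Mapped;
}
```

Once you have the blob on the receiving device, it goes into the session config's World Map Data before the new session is started, as described above.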
Object Scanning and Detection
This feature is the trickiest to use because it requires a series of steps, and those steps must be performed in an ARKit app running on a device. The first step is to set the ARSessionConfig Session Type to Object Scanning. As with world persistence, you first need to get your session to a mapped state. At this point, you can use some custom UI to allow the user to select which parts of the world point cloud are to be included in the candidate object. With those points selected, you can tell the underlying code to grab those points and convert them to a candidate object. That candidate object is just like a candidate image that you use for image detection and tracking. To detect objects in the scene, you need to build an ARCandidateObject with the blob of data contained within it. That candidate object is added to your session config’s CandidateObjects array. Finally, you start a new AR session, and the session will add an ARTrackedObject when it detects the candidate object in the scene. The blob of data for the candidate object can also be delivered via the cloud or a networked session, and of course you can save it to disk for use later.
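The detection end of that pipeline can be sketched like the image tracking case, assuming the session config's CandidateObjects array already holds the object built from a prior scan (the function wrapper is illustrative):

```cpp
// Minimal sketch: reacting to a previously scanned object being detected.
#include "ARBlueprintLibrary.h"
#include "ARTrackable.h"

void CheckForDetectedObjects()
{
    for (UARTrackedGeometry* Geometry : UARBlueprintLibrary::GetAllGeometries())
    {
        if (UARTrackedObject* TrackedObject = Cast<UARTrackedObject>(Geometry))
        {
            // Where the scanned object was found in the world
            const FTransform ObjectToWorld = TrackedObject->GetLocalToWorldTransform();
            // ... spawn or align content at ObjectToWorld ...
        }
    }
}
```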
Face AR Additions
In terms of new features, there is the addition of tongue tracking, so your Animojis and Face AR apps can display all-important raspberries to the screen. More interesting is the ability to know where a person is focusing. It’s probably the iPhone X’s display, but possibly somewhere outside of the frame. Either way, you’ll get a location in space that the eye gaze converges on. The other interesting bit is that you get a full transform for each eye, which is great for making your on-screen character’s eyes track the user’s movements. Because it is a full transform, if one eye is slightly forward or back from the other, you’ll have that information. Additionally, I am told that tracking quality has improved with this release. I haven’t had a chance to compare side by side with ARKit 1.5, so I don’t have any insight into how this manifests itself.
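Reading the new face data could look roughly like this. The TongueOut blend shape is the new addition; the per-eye transform and gaze-target accessors shown here are illustrative and may differ slightly from the shipping 4.20 API:

```cpp
// Sketch: reading the new FaceAR data from the tracked face geometry.
// Accessor names for eyes/gaze are illustrative, not verified against 4.20.
#include "ARBlueprintLibrary.h"
#include "ARTrackable.h"

void UpdateFromFace()
{
    for (UARTrackedGeometry* Geometry : UARBlueprintLibrary::GetAllGeometries())
    {
        if (UARFaceGeometry* Face = Cast<UARFaceGeometry>(Geometry))
        {
            // 0..1 value for how far the tongue is sticking out
            const float Tongue = Face->GetBlendShapeValue(EARFaceBlendShape::TongueOut);

            // Full per-eye transforms and the gaze convergence point
            const FTransform LeftEye  = Face->GetLocalSpaceEyeTransform(EAREye::LeftEye);
            const FTransform RightEye = Face->GetLocalSpaceEyeTransform(EAREye::RightEye);
            const FVector GazeTarget  = Face->GetLookAtTarget();
            // ... drive your character's face with these values ...
        }
    }
}
```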
The 4.20 release contains a large number of new ARKit features, drawn from both the 1.5 and 2.0 releases. This is just the first pass of support for these features, and as we get more experience using them, it will generate new features for us to build.