Using Async Collision Traces in Unreal Engine 4

10 min read · Feb 20, 2018

Hello all. I’m an engineer at Disruptive Games and I’d like to share some knowledge about Async Collision Traces and how they can be a performance benefit with a little bit of binding help. In this article we’ll take a look specifically at game thread / physics thread stuff. We won’t look at all at the Render or Draw thread.

Disclaimer: I’ve not done a complete deep dive on all of these systems with respect to threading, so while I’m trying to be accurate, it is possible I’m missing some details. And as always: profile, hypothesize, improve, repeat. Optimization without measurement is just folly.

I’ll start by saying our current title, Megalith, is a PSVR title, and as such we’ve got pretty hard performance requirements to maintain 60fps as both a technical requirement and a player experience requirement. Ideally we hit 90fps for the best experience. That’s roughly 16.7ms and 11.1ms (ELEVEN!) per frame, respectively. Of course this means optimizing. When you’re targeting PS4 hardware, you always have to be watching performance in VR. Likewise, the execution profiles of PC and PSVR are vastly different due to the threading setup of the engine and the pure single-core speed you’re generally seeing. Because of this, things that won’t even be a blip on PC can easily be a constant performance cost that adds up.

I don’t want to get too far into the weeds here, just set the stage for why we try to be as efficient as possible wherever we can, as long as it doesn’t impact design / flexibility too greatly.

Low-hanging fruit in this case is converting synchronous traces into the physics scene into asynchronous traces. If you would like some background on collision traces, feel free to visit Epic’s documentation on the topic. Note: This is not the same as tracing into the Async scene, which we’ll discuss briefly in the article.

We make use of a good number of traces in a standard frame of gameplay in Megalith, so taking them out of the performance picture isn’t a huge gain, but it does help remove offenders from the game thread. Something you must watch out for is that it isn’t free. Those cycles go somewhere else, which appears to be TaskGraph threads. TaskGraph threads are UE4’s generalized task threads that it can kick work to. So you’re still potentially competing for enough computation power to complete that work under a specific time threshold. Like anything, your mileage may vary. For us, it should give back a couple of milliseconds on a bad frame and help stabilize framerate.

The basic premise here is you’re taking an operation that will give you some result instantly about the state of the physical nature of your game and you’re asking to get the result at a later time. In this case, that later time is always at the very beginning of the next frame (before any Tick groups have occurred).
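To make the timing concrete, here is a minimal standalone sketch of "ask now, collect at the start of the next frame". It is not engine code: MiniWorld, MiniHit, BeginFrame and the pretend wall at 500 units are all hypothetical stand-ins, and the rule that results expire after one frame is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a trace result.
struct MiniHit
{
    bool bBlockingHit = false;
    float Distance = 0.f;
};

// Hypothetical stand-in for the world's async trace bookkeeping.
class MiniWorld
{
public:
    // Queue a trace of the given length and hand back a ticket.
    // Nothing is computed yet.
    uint32_t RequestTrace(float Length)
    {
        assert(Length >= 0.f);
        Request NewRequest = {NextId, Length};
        Pending.push_back(NewRequest);
        return NextId++;
    }

    // Called once at the very start of a frame, before anything ticks.
    void BeginFrame()
    {
        const float WallDistance = 500.f; // pretend wall (sketch assumption)
        Ready.clear();                    // the previous frame's results expire
        for (const Request& R : Pending)
        {
            MiniHit Hit;
            Hit.bBlockingHit = R.Length >= WallDistance;
            Hit.Distance = Hit.bBlockingHit ? WallDistance : R.Length;
            Ready[R.Id] = Hit;
        }
        Pending.clear();
    }

    // Returns true and fills OutHit only once the result is ready.
    bool QueryTrace(uint32_t Id, MiniHit& OutHit) const
    {
        auto It = Ready.find(Id);
        if (It == Ready.end())
            return false;
        OutHit = It->second;
        return true;
    }

private:
    struct Request { uint32_t Id; float Length; };
    std::vector<Request> Pending;
    std::unordered_map<uint32_t, MiniHit> Ready;
    uint32_t NextId = 1;
};
```

A caller that requests a trace and queries it in the same frame gets false back. The same query succeeds after the next BeginFrame, and fails again one frame later because the result has been discarded.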

So let’s look at a comparison between the familiar World trace functions, many of which are exposed to blueprints for ease of access, and the async trace functions. Async trace functions are still part of the UWorld object, so they’re right there for the picking; you just need to create some glue to make them easier to use.

Trace functions:

bool UWorld::LineTraceSingleByChannel(
   struct FHitResult& OutHit,
   const FVector& Start, const FVector& End,
   ECollisionChannel TraceChannel,
   const FCollisionQueryParams& Params,
   const FCollisionResponseParams& ResponseParam) const

FTraceHandle UWorld::AsyncLineTraceByChannel(
   EAsyncTraceType InTraceType,
   const FVector& Start, const FVector& End,
   ECollisionChannel TraceChannel,
   const FCollisionQueryParams& Params,
   const FCollisionResponseParams& ResponseParam,
   FTraceDelegate* InDelegate, uint32 UserData)

The top function, LineTraceSingleByChannel, is the synchronous version which immediately does a RaycastSingle call into PhysX. The bottom function, AsyncLineTraceByChannel, is the asynchronous version which returns an FTraceHandle which will be used later to get the results.

There is also:

AsyncLineTraceByObjectType
AsyncSweepByChannel
AsyncSweepByObjectType
AsyncOverlapByChannel
AsyncOverlapByObjectType

We’re only going to look at the AsyncLineTraceByChannel here for brevity, but they all function very much the same way. They take a set of arguments similar to the synchronous version and then return FTraceHandles.

Notice two types in the async signature above:

FTraceHandle
FTraceDelegate

FTraceHandle

This is effectively your “ticket” to the work. A simple analogy is a dry-cleaning ticket. You drop off your clothing, you get a ticket and are told to come back at a later time. When you return, you must present your ticket to get your items returned. Without a ticket, your clothing will just be burned at the end of the working day! The handle uniquely tracks (for a given frame of requests) your individual async work request.
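The ticket idea can be sketched in plain C++. The struct below is a hypothetical mirror of the two fields the later examples rely on (a frame number and an index). The real FTraceHandle layout is assumed rather than copied from engine source, and the "redeem only in the following frame" rule is this sketch's reading of the analogy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of a trace ticket; not the engine's definition.
struct MiniTraceHandle
{
    uint32_t FrameNumber = 0; // frame the request was made in; 0 means "no ticket"
    uint32_t Index = 0;       // slot of the request within that frame

    MiniTraceHandle() = default;
    MiniTraceHandle(uint32_t InFrame, uint32_t InIndex)
        : FrameNumber(InFrame), Index(InIndex)
    {
        assert(InFrame != 0); // frame 0 is reserved for invalid tickets
    }

    bool IsValid() const { return FrameNumber != 0; }

    // Same effect as zeroing the frame number by hand.
    void Invalidate() { FrameNumber = 0; Index = 0; }
};

inline bool operator==(const MiniTraceHandle& A, const MiniTraceHandle& B)
{
    return A.FrameNumber == B.FrameNumber && A.Index == B.Index;
}

// A ticket can be redeemed only during the frame right after it was issued
// (sketch assumption, matching the "burned at the end of the day" analogy).
inline bool CanRedeem(const MiniTraceHandle& Handle, uint32_t CurrentFrame)
{
    return Handle.IsValid() && Handle.FrameNumber + 1 == CurrentFrame;
}
```

Under these assumptions, a ticket issued in frame 7 cannot be redeemed in frame 7, can be redeemed in frame 8, and is stale by frame 9. Zeroing the frame number is a cheap way to mark a handle as "nothing outstanding".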

FTraceDelegate

If the FTraceHandle is a ticket then the FTraceDelegate is a delivery service. You drop off your clothing, and at a later date, a delivery service returns your cleaned clothes to your doorstep. It is simply a non-dynamic delegate you can bind to your trace request to be called when your work is done. This delegate takes two arguments and returns nothing:

DECLARE_DELEGATE_TwoParams( FTraceDelegate, const FTraceHandle&, FTraceDatum &);
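For readers less familiar with Unreal delegates, the delivery-service idea is roughly a stored callback with a fixed signature. The sketch below models that with std::function. MiniTraceDelegate, MiniHandle, MiniDatum and Receiver are hypothetical names, and this is a stand-in technique, not how the engine implements its delegates.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the handle and datum types.
struct MiniHandle { unsigned Id = 0; };
struct MiniDatum  { std::vector<float> HitDistances; };

// Single-cast delegate with the same shape as the declaration above:
// two parameters (handle, datum) and no return value.
class MiniTraceDelegate
{
public:
    using Callback = std::function<void(const MiniHandle&, MiniDatum&)>;

    void Bind(Callback InCallback) { Bound = std::move(InCallback); }
    bool IsBound() const { return static_cast<bool>(Bound); }

    // Strict variant: calling it with nothing bound is a programming error.
    void Execute(const MiniHandle& Handle, MiniDatum& Datum) const
    {
        assert(IsBound());
        Bound(Handle, Datum);
    }

    // Lenient variant: quietly does nothing when nobody is listening.
    bool ExecuteIfBound(const MiniHandle& Handle, MiniDatum& Datum) const
    {
        if (!IsBound())
            return false;
        Bound(Handle, Datum);
        return true;
    }

private:
    Callback Bound;
};

// Example receiver: counts how many hits were delivered to it.
struct Receiver
{
    int HitsDelivered = 0;

    void OnTraceCompleted(const MiniHandle&, MiniDatum& Datum)
    {
        HitsDelivered += static_cast<int>(Datum.HitDistances.size());
    }
};
```

Binding a Receiver member through a small lambda plays the role that binding an instance function plays on the real delegate: the system finishing the work only needs to call ExecuteIfBound and never has to know who is on the receiving end.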

FTraceDatum

This is where the results of your work are stored on the World AsyncTraceState object, and it is how they are returned to you, either through a provided FTraceDelegate firing OR by querying with your FTraceHandle. It is important to note that there are a few structures here. FTraceDatum derives from FBaseTraceDatum, which simply holds commonalities such as the World, CollisionParams, and UserData. FTraceDatum has:

Start / End line positions in world space of request
TArray<FHitResult> OutHits array containing the results
EAsyncTraceType to determine single, multi or test for the request

FOverlapDatum

This structure is how asynchronous overlap results are returned to you, either by querying or by delegate firing. I’m mentioning it briefly so that readers are aware there are a few slightly different paths depending on what you’re asking of the physics scene. It has:

Position / Rotation of the request.
TArray<FOverlapResult> Results array of results for tasks.
Shape information for overlaps is in the base class FBaseTraceDatum

Some interesting things to note:

Delegates will always fire before any tick functions on the next frame. That is because UWorld::Tick calls UWorld::ResetAsyncTrace very early in the frame, right before the start of the TG_PrePhysics tick.
On the other end, UWorld::FinishAsyncTrace is called after the last tick group, TG_LastDemotable, is completed. At this point in the frame, all tick groups have executed to completion.
These async tasks can overlap frame boundaries. The time between FinishAsyncTrace (which ensures all work is in flight) and ResetAsyncTrace (which waits on all work to be finished) is effectively the time between the end of the game thread’s tick and the start of the next tick. A couple of built-in tasks occur in that window, which buys time for the work to be fully completed by the next call to ResetAsyncTrace.
Because frame boundaries are crossed, this could potentially introduce stability issues with certain objects (I’m looking at you, destructibles) or anything you play a little too fast and loose with. YMMV!
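The ordering in the notes above can be modelled in a few lines of standalone C++. FrameModel and its members are hypothetical, and the model is deliberately simplified: it holds every request until the end of the frame, whereas the engine only guarantees that all work is in flight by that point.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one game-thread frame; names are illustrative only.
class FrameModel
{
public:
    std::vector<std::string> Log;

    // Called from inside a tick: queue a trace with a completion callback.
    void RequestTrace(std::function<void()> OnDone)
    {
        Queued.push_back(std::move(OnDone));
    }

    void RunFrame(const std::function<void(FrameModel&)>& TickActors)
    {
        // 1) Start of frame: work from last frame is finished, callbacks fire.
        Log.push_back("ResetAsyncTrace");
        std::vector<std::function<void()>> Done;
        Done.swap(InFlight);
        assert(InFlight.empty());
        for (auto& Callback : Done)
            Callback();

        // 2) All tick groups run; new requests are only queued here.
        Log.push_back("TickGroups");
        TickActors(*this);

        // 3) End of frame: make sure every queued request is in flight.
        Log.push_back("FinishAsyncTrace");
        for (auto& Request : Queued)
            InFlight.push_back(std::move(Request));
        Queued.clear();
    }

private:
    std::vector<std::function<void()>> Queued;
    std::vector<std::function<void()>> InFlight;
};
```

Running two frames where the first frame's tick requests a trace shows the callback landing between "ResetAsyncTrace" and "TickGroups" of the second frame, which is the "delegates fire before any tick functions" behavior described above.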

Because of the above, you need to decide whether to consume your traces with delegates or by checking your FTraceHandle. Either works well. Delegates have slight overhead and are executed outside of the object’s Tick group. That can matter when you are trying to balance workloads and prevent stalls, because delegate-driven work runs at the start of the frame rather than in the tick group where it should be done (or not done). The one pain point many will find is that the registered delegate is not dynamic, meaning it cannot be used from Blueprints without a little extra work. More on that later…

Other Useful Bits

Go over the WorldCollisionAsync.cpp file. It isn’t terribly long and it should be easy to follow through how work is handed off for execution and how data is marshalled back to us.

bool UWorld::QueryTraceData(const FTraceHandle&, FTraceDatum&) returns true if data is ready for the given trace handle and puts the results into the second argument.
bool UWorld::QueryOverlapData(const FTraceHandle&, FOverlapDatum&) returns true if data for a given overlap is available and returns the results in the second argument if it is.
bool UWorld::IsTraceHandleValid(const FTraceHandle&, bool) tells you if you have a valid handle. The second argument is set to true if it’s for an overlap task, not a trace task.

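So now we’ve kind of seen the whole picture from the game side of things. Let’s see an example usage inside an actor. Note: I’d try to create something more reusable for actors in production environments; however, if you’ve got an actor doing per-frame traces that you’d like to wrangle in, there isn’t anything wrong with this approach either.

Start by making a new Actor-derived class. I’m calling mine AsyncTraceActor. Let Unreal do its compile, then pop over into your IDE of choice.
You’ll need several things in the header. Going by the implementation below, these are: an FTraceDelegate member (TraceDelegate), an FTraceHandle member (LastTraceHandle), a bWantsTrace flag, a MyTraceType trace-type property, a BlueprintCallable SetWantsTrace function, a BlueprintImplementableEvent named ReceiveOnTraceCompleted, and declarations for OnTraceCompleted, RequestTrace and DoWorkWithTraceResults.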

So let’s dive into the implementation details. We’ll start by looking at BeginPlay. I could write a whole post about BeginPlay and the dangers of using it for important initialization in a networked gameplay setting, but that is another post altogether. For now, let’s assume this is part of JoeBlow’s epic single player RPG title.

BeginPlay

void AAsyncTraceActor::BeginPlay()
{
    Super::BeginPlay();
    TraceDelegate.BindUObject(this, &AAsyncTraceActor::OnTraceCompleted);
}

So this binds our declared FTraceDelegate TraceDelegate to an actual instance function so we can hand it off to AsyncLineTraceByChannel. Note that it is binding a UObject. We could bind it with any of the other binding methods a standard Unreal delegate offers.

The Delegate Handler Method

void AAsyncTraceActor::OnTraceCompleted(const FTraceHandle& Handle, FTraceDatum& Data)
{
    ensure(Handle == LastTraceHandle);
    DoWorkWithTraceResults(Data);
    LastTraceHandle._Data.FrameNumber = 0; // reset it
}

This is pretty self-explanatory. We’re being told by the World AsyncState system that our trace results are ready to use. We ensure that the Handle coming in is actually the LastTraceHandle (it might not be if we’re reusing this delegate in some manner). Then we hand off data to the workhorse function DoWorkWithTraceResults and reset the FrameNumber to invalidate the TraceHandle.

Requesting Async Task to be started

FTraceHandle AAsyncTraceActor::RequestTrace()
{
    UWorld* World = GetWorld();
    if (World == nullptr)
        return FTraceHandle();

    // Tag that identifies this trace; declared here so the snippet stands alone.
    static const FName NAME_AsyncRequestTrace(TEXT("AsyncRequestTrace"));

    auto Channel = UEngineTypes::ConvertToCollisionChannel(MyTraceType);

    bool bTraceComplex = false;
    bool bIgnoreSelf = true;
    TArray<AActor*> ActorsToIgnore;

    // ConfigureCollisionParams is our own engine modification (see below).
    auto Params = UKismetSystemLibrary::ConfigureCollisionParams(NAME_AsyncRequestTrace, bTraceComplex, ActorsToIgnore, bIgnoreSelf, this);

    // Placeholder endpoints; feed in the values of interest to you.
    auto Start = FVector::ZeroVector;
    auto End = FVector(1000.f);

    // Hand the work off; the returned handle is our ticket for the result.
    return World->AsyncLineTraceByChannel(EAsyncTraceType::Single,
        Start, End,
        Channel,
        Params,
        FCollisionResponseParams::DefaultResponseParam,
        &TraceDelegate);
}

So this function really is just doing all the standard setup you need to do for a trace. The main boilerplate is converting our MyTraceType variable into a CollisionChannel via UEngineTypes::ConvertToCollisionChannel and creating our CollisionParams structure. Note that UKismetSystemLibrary::ConfigureCollisionParams is actually a modification we made to the KismetSystemLibrary to expose the existing inline function so it is usable outside of the file. We just stuffed the global namespace function into a static blueprint library call and away we go!

We then set up some trashy Vector values in the example just to prove it works (you’d obviously feed in the values of interest to you) and we start our trace! This hands back an FTraceHandle just as we’d discussed before. We’ll use this to query validity and see if our results are in.

Starting the process & Ticking methodology

void AAsyncTraceActor::SetWantsTrace()
{
    // don't allow overlapping traces here.
    if (!GetWorld()->IsTraceHandleValid(LastTraceHandle,false))
    {
        bWantsTrace = true;
    }
}

Recall that SetWantsTrace is BlueprintCallable, so it can be triggered from blueprint land. It sets the whole chain of events in motion. You could simply call RequestTrace() instead of using bWantsTrace; this is just how I’ve structured it to push more details into the Tick itself.

void AAsyncTraceActor::Tick(float DeltaTime)
{
    Super::Tick(DeltaTime);
    if (LastTraceHandle._Data.FrameNumber != 0)
    {
        FTraceDatum OutData;
        if (GetWorld()->QueryTraceData(LastTraceHandle, OutData))
        {
            // Clear out handle so next tick we don't enter
            LastTraceHandle._Data.FrameNumber = 0;
            // trace is finished, do stuff with results
            DoWorkWithTraceResults(OutData);
        }
    }

    if (bWantsTrace)
    {
        LastTraceHandle = RequestTrace();
        bWantsTrace = false;
    }
}

The first if statement is doing a fast validity check on the FrameNumber. We clear out the _Data.FrameNumber value when we’ve handled trace results. For this reason, using the Tick() logic to query trace results will not work with the delegate approach in this example.

If the Handle is valid, see if QueryTraceData returns a good result. This should be basically 100% of the time it is hit because of how the Tick() is structured. We ensure that in the previous frame we RequestTrace at the end of tick. This means that the next time we enter this function, we should have QueryTraceData returning valid results to us. In the case that it does, we clear the handle and we DoWorkWithTraceResults.

Handling the results and giving them to blueprints

void AAsyncTraceActor::DoWorkWithTraceResults(const FTraceDatum& TraceData)
{
    // do things here
    ReceiveOnTraceCompleted(TraceData.OutHits);
}

So we don’t do anything fun here, but we could! The best part is that ReceiveOnTraceCompleted gives the hit results (it should probably give more information about the task) to Blueprint land as a BlueprintImplementableEvent.

Conclusion

We’ve briefly talked about the Async traces (overlaps / tests) that the UWorld object has built-in support for, which help alleviate game thread pressure. What we’ve come out with is a pretty idealized usage inside a C++ actor. Two methodologies were presented: 1) we can use non-dynamic C++ delegates to get event-driven results, or 2) we can check in our own tick whether our FTraceHandle is valid. It works, but it certainly leaves us wanting more. How can we make use of these things inside of blueprints easily? How do we get more mileage out of this?

What’s Next?

In the next part I will discuss how to move this out of a very heavy C++ domain and into blueprint land. We’ll set up a simple ActorComponent-based system that allows us to bind dynamic delegates that listen for trace results. We want to empower designers but also make sure we can get them to do things performantly without pulling too heavily on the reins. Hopefully I’ll have that out in the next week or two. Ideally in the near future there are some performance metrics we can show. I’m not sure how much profiling we can show from a PS4 kit, but I’ll investigate that possibility to see what kind of gains we’re talkin’ about.

An update! The newer versions of Unreal Engine 4 have removed the concept of the Async Scene in preparation for the Chaos physics engine. We’ll have some more content coming soon; otherwise the rest of this content should still be up to date.