Hand Tracking in The Curious Tale of the Stolen Pets

Kristoffer Benjaminsson
May 23, 2020


When The Curious Tale of the Stolen Pets was released in November 2019, our goal was to deliver the perfect introduction to VR: a game accessible to everyone, hardcore gamers and first-time players alike, by removing much of the complexity found in most VR games. Simplifying the interactions and removing the need for virtual locomotion were some of the things we did to create a game everyone can enjoy without having to think about which buttons to press or which comfort options to enable.

As the CTO of Fast Travel Games, one of the things on my plate is the long-term technical vision and strategy for the studio. Player presence and immersion are key to most of the things I spend my time on across projects, with player interactions being one of the core areas. Those who know me also know I have a passion for mobile standalone devices, so when we learned that hand tracking for the Oculus Quest was about to be released, it was a no-brainer to try to make our already super accessible game even more accessible!

But before diving into the solution we ended up with for Curious Tale, let’s take a step back and look at hand tracking at a higher level.

Hand Tracking

What is hand tracking and what does it do? The short description: sensors capture the position of your hands, together with the position and rotation of each finger joint. This information can be mirrored onto virtual representations of hands in VR, giving users the ability to fully control a virtual hand. Compare this to traditional controllers, where fingers are animated when various buttons are pressed and are limited to poses defined by the developer.

Adopting hand tracking is more than just adding support for hands animating based on a tracked skeleton. It fundamentally changes the way an interaction model works. When are you attempting to grip something? When are you just flexing your fingers or relaxing your hand? How much flex should be considered an attempt to grip something?

Buttons make these decisions really easy, as the user physically presses a button with a clear intent for something to happen, whereas with hand tracking your fingers’ placement and your intent might not be aligned. On the other hand (pun intended), hand tracking enables much richer immersion, as it allows the user to interact with the environment in a far more natural way. There is no need to teach the user how to use their hands!

There are other fundamental differences between using controllers and hand tracking for input, such as the lack of haptic feedback, along with other UX aspects, but the rest of this post will be about interactions in general and the model we ended up with for Curious Tale in particular, including some of the solutions we came up with for handling loss of tracking of the user’s hands.

Early prototypes

Before we added hand tracking to Curious Tale, we did some research and experiments in preparation for a master’s thesis project we have running at the studio. The intent was to give users a natural interaction model where they could pick up items with a natural grip: an object is picked up as soon as the thumb and at least one more finger grab it. Grabbing in this context means that one or more parts of the relevant fingers, such as a fingertip, are touching the object.

The picture above shows early experiments with a completely procedural grip model.

In these experiments we used life-sized objects that more or less required the user to clearly approach an object with an open hand to pick it up. Despite the lack of haptics informing the user when fingers touched an object, this worked really well, as we locked the finger joints upon contact, so the resulting grip followed the shape of the object as it was picked up.
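For illustration, here is a minimal sketch of that grab rule, assuming the physics layer reports which finger parts are touching an object; the names are made up for the example and not taken from our actual code.

```python
# Sketch of the early prototype's grab rule: an object counts as grabbed
# once the thumb and at least one other finger are touching it. In a real
# engine the set of touching fingers would come from the collision layer;
# here it is simply passed in.

THUMB = "thumb"

def is_grabbed(touching_fingers: set) -> bool:
    """touching_fingers holds names of fingers with a part touching the object."""
    others = touching_fingers - {THUMB}
    return THUMB in touching_fingers and len(others) >= 1

print(is_grabbed({"thumb", "index"}))   # True: thumb + one more finger
print(is_grabbed({"index", "middle"}))  # False: no thumb involved
```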

We didn’t go for fully physical hands, as previous tests had taught us that while physics is awesome when it works, it can also make the user clumsy and push things away rather than pick them up. This is especially true as hand tracking currently tends to have a fair bit of latency between the user’s physical hand location and the tracked location.

Poke and Pinch

While the early experiments were successful, the type of interactions used in The Curious Tale of the Stolen Pets is quite different. Since the entire game is centered around miniature worlds with small items that you interact with by either poking at them or lifting them up, it made sense to pick an interaction model tailored for this. We also wanted something that could work with the existing data set and not require a total makeover of every item the player can pick up, as time was of the essence. Finally, we wanted a model that would produce (mostly) good-looking grips.

Enter the poke and pinch model! Most interactions in the game are poke interactions: poke at a bush to make it wiggle, poke at a hidden coin to collect it, poke Spot the dog to make him swing, and so on. But some require the player to pick up an item and either place it somewhere or tilt it, such as the Kettle in the first world. For poke interactions the natural response is using your index finger; you do it without thinking. The same goes for smaller items: extending your thumb and index finger, and perhaps your middle finger, is a natural interaction. For larger items though, such as the Sugar Dispenser in the Winter Vacation world, a more natural model would be gripping with the entire hand.

If playing the game with hand tracking becomes popular, adding support for full-hand grabs is a top candidate for a future patch!

Technical Details

Implementing a pinch grip seemed easy at first. The Oculus SDK exposes “is pinching” APIs that detect when the thumb and another finger, such as the index finger, form a pinch grip. We use that to determine when the player wants to grab something. At first we used the same method to determine whether the player was letting go of an item, and while this worked well when the player’s hand was fully tracked, we soon realised that hand tracking depends on a number of factors to give stable results. Lighting conditions, the surroundings, where the hand is placed relative to the headset and the hand’s orientation are a few of the factors that affect tracking quality. And you can’t really ask the player to understand all the limitations, or to only play in perfect conditions! Instead we implemented a number of mechanics to better decide when to let go of an item.
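To make that baseline concrete, here is a minimal sketch of a pinch-driven grab state machine. It is illustrative Python rather than our actual code; `is_pinching` stands in for the SDK’s pinch query, and the naive release shown here is exactly the part the mechanics below replace.

```python
# Sketch of the first, naive pinch interaction: grab when a pinch forms
# over a grabbable item, release as soon as the pinch opens. The release
# half is what breaks down under unstable tracking.

class GrabState:
    def __init__(self):
        self.held_item = None

def update_grab(state: GrabState, is_pinching: bool, hovered_item) -> None:
    if state.held_item is None:
        # Start a grip when a pinch forms over a grabbable item.
        if is_pinching and hovered_item is not None:
            state.held_item = hovered_item
    elif not is_pinching:
        # Naive release: a single mis-tracked frame that reports
        # "not pinching" is enough to drop the item.
        state.held_item = None
```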

Statistical Analysis of Tracking Confidence (SATC)

A not so widespread term (as I just made it up)! But it has a nice ring to it and makes it sound more advanced and professional than it really is.

As mentioned above, tracking quality can change based on many factors. When tracking is bad, we might wrongly believe that the player is opening their hand to let go of an item, because that is what the tracking data is telling us. Quite often tracking is only lost for a frame or two, but a single frame is enough to trigger a state change.

The Oculus SDK provides values for how confident the tracker is in the tracking quality. The value can be either high or low, and values are provided for the hand as a whole as well as for each individual finger. When observing the values, we saw that confidence typically ping-ponged over a number of frames and looked different depending on which value was observed. Below is an image of such a period. The interesting thing to note is that the hand appeared to be tracked most of the time, but the finger confidence was very jittery. The result for the player is that the fingers would move even though the player held a fixed grip, resulting in items being dropped.

To improve the situation we started using the confidence values in our code. Besides changing the hands’ transparency based on tracking quality to indicate to the player that tracking is non-optimal (solid hands equal good tracking), we also use the values in the interaction code to decide whether we can trust the pinch values. If confidence is low, we ignore any hand input and keep gripping the current item. Even so, we still felt we dropped items when we shouldn’t. The reason was that fingers would drift before the API reported low confidence, leading us to trust the input values and interpret them as a “let go” situation.

Enter SATC! What we do is build a histogram of the confidence values, where different parts have different weights; the overall tracking confidence weighs higher than the thumb confidence, for instance. From an averaged, normalized confidence score over X frames we get a confidence value that is more fine-grained than the binary low/high, and also less prone to giving false information. To get a low confidence score with this approach, the player needs to have low confidence over a longer period of time, and likewise must regain tracking confidence over a longer period before we consider confidence to be high.

The picture above shows what happens to the confidence level with SATC applied, using only 4 frames of history. In the actual game we use a much larger frame history, but you can clearly see how the curve gets smoother even with just a few samples.
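For the curious, here is a rough sketch of the SATC idea; the weights, the four-frame history and the class name are illustrative assumptions, not the shipped values.

```python
# Weighted, smoothed tracking confidence: per-frame binary confidence
# flags are weighted, normalized and averaged over a rolling history,
# turning a jittery low/high signal into a gradual score in [0, 1].

from collections import deque

# Overall hand confidence weighs higher than individual fingers.
WEIGHTS = {"hand": 0.5, "thumb": 0.25, "index": 0.25}

class ConfidenceFilter:
    def __init__(self, history: int = 4):  # the game uses far more frames
        self.samples = deque(maxlen=history)

    def push(self, confidences: dict) -> float:
        """confidences maps each tracked part to True (high) or False (low)."""
        # Weights sum to 1.0, so the per-frame score is already normalized.
        frame_score = sum(WEIGHTS[part] * (1.0 if high else 0.0)
                          for part, high in confidences.items())
        self.samples.append(frame_score)
        return sum(self.samples) / len(self.samples)

f = ConfidenceFilter()
score = 0.0
# A single dropped frame barely dents the smoothed score:
for hand_ok in [True, True, False, True]:
    score = f.push({"hand": hand_ok, "thumb": hand_ok, "index": True})
print(f"smoothed confidence: {score:.2f}")  # 0.81, still comfortably high
```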

Total Rotation Per Finger (TRPF)

On top of SATC (yes, I love making up acronyms), and because we still had the occasional false positive, we switched away from the provided pinch API for determining when to let go and started using our own measurement, dubbed Total Rotation Per Finger, which is just that: the sum of all rotations (around a single axis) in a finger, compared to the relaxed joint rotations of that finger. This gives us a simple way of deciding whether the player is flexing the fingers we are interested in. We don’t really care which joint in the finger is flexed, giving the player more options in how they “pinch” grip and let go of items.

The pictures above show different amounts of flex on various joints of the index finger, starting with a relaxed finger and ending with a heavily flexed one.
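Here is a simplified sketch of the TRPF measurement, assuming we can read each joint’s rotation around the flex axis in degrees; the threshold is a made-up example value.

```python
# Total Rotation Per Finger: sum each joint's rotation relative to the
# relaxed pose. Any joint can contribute, so the player is free to flex
# the base, middle or tip joint; only the total matters.

FLEX_THRESHOLD_DEG = 45.0  # hypothetical: below this the finger counts as open

def total_rotation(current_deg: list, relaxed_deg: list) -> float:
    return sum(abs(c - r) for c, r in zip(current_deg, relaxed_deg))

def is_finger_flexed(current_deg: list, relaxed_deg: list) -> bool:
    return total_rotation(current_deg, relaxed_deg) >= FLEX_THRESHOLD_DEG

relaxed = [5.0, 8.0, 4.0]                             # base, middle, tip at rest
print(is_finger_flexed([40.0, 35.0, 15.0], relaxed))  # True: grip held
print(is_finger_flexed([10.0, 12.0, 6.0], relaxed))   # False: finger opened
```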

Good Looking Grips

When we grip, we lock the rotations of the index finger and thumb joints to preserve the grip the player had when initiating it. The reason, besides looking nice, is that tracking quality can change in an instant and the player is unlikely to hold their fingers perfectly still, so items would otherwise move and rotate while being held. Since the rest of the fingers are free to move, the player seldom notices the lock; the impression is that all fingers are still free to move. Locking the finger pose gave us not only good-looking grips but also a feeling of stability and robustness!
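In sketch form, the locking works something like this; the structure is an illustrative assumption, with per-finger joint rotations coming from the tracker rather than our actual engine code.

```python
# Lock the pinching fingers' joint rotations at grab time: while the grip
# is held, the hand renderer gets the snapshot for thumb and index, while
# the remaining fingers keep following live tracking data.

class LockedGrip:
    def __init__(self, tracked: dict):
        # Freeze only the fingers that form the pinch at grab time.
        self.locked = {finger: list(tracked[finger])
                       for finger in ("thumb", "index")}

    def pose_for_render(self, tracked: dict) -> dict:
        pose = dict(tracked)        # other fingers stay live
        pose.update(self.locked)    # thumb/index keep their grab-time pose
        return pose

grip = LockedGrip({"thumb": [30.0], "index": [50.0], "middle": [10.0]})
later = {"thumb": [12.0], "index": [20.0], "middle": [60.0]}
print(grip.pose_for_render(later))
# thumb/index unchanged from grab time, middle follows live tracking
```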

Learnings and Takeaways

Adding hand tracking to The Curious Tale of the Stolen Pets was a fun adventure! So many technical aspects were uncovered and handled, and with the knowledge we have now, we have some good ideas about what to improve and how to improve the experience in the future.

The most difficult interaction to port was the world rotation. With controllers it’s very easy to press a button to “grab” the world and turn it. With hand tracking it’s easy to make a pinch-like grip by mistake while a hand is “resting”, effectively grabbing the world and preventing it from rotating when the other hand tries to grab it.

So one of the biggest takeaways is that, as is often the case, you get much better results when designing for hand tracking from the start. Even if we can improve the existing world rotation, the best result will come when we design a rotation mechanic that isn’t vulnerable to unintentional hand poses. The existing solution is, after all, just a port of the controller version.

Another big takeaway is that hand tracking can be truly magical. When done right in combination with well designed interactions that fit your game, immersion goes up as you interact with the virtual world in a totally natural way.

Until next time!

Kristoffer Benjaminsson
