A Naruto fighting game using real-time gesture recognition

Khai Nguyen
khaiquangnguyen
12 min read · Mar 18, 2019

Background

Fighting game

A fighting game is a game in which the player controls a character and engages in close-quarters combat with another player or a computer-controlled character. Each character usually has a set of moves, and a match is typically played over two or more rounds on a stage. Iconic fighting games include Mortal Kombat (Figure 1-a), Street Fighter (Figure 1-b), Super Smash Bros (Figure 1-c), and Soul Calibur (Figure 1-d).

Figure 1 — Images of iconic fighting games. (a) Top-left is Mortal Kombat, (b) top-right is Street Fighter, (c) bottom-left is Super Smash Bros, (d) bottom-right is Soul Calibur.

Naruto

Naruto is a Japanese manga series that tells the story of a young ninja named Naruto. The manga is set in a fictional world where ninja fight using jutsu (special techniques). The primary way to perform a jutsu is through hand seals (Figure 2).

Figure 2 — A hand seal performed in the Naruto franchise

Naruto-inspired Fighting Games

There have been many fighting games on a variety of systems inspired by the manga Naruto, the most prominent of which is the series Naruto Ultimate Ninja Heroes (Figure 3).

While these games feature a wide variety of characters, move sets, and battlegrounds, they all rely on traditional control schemes such as keyboards or controllers.

Figure 3 — Footage from the Naruto Ultimate Ninja Heroes franchise

Concept

Keyboards and controllers are familiar control schemes to gamers and have been the standard for many years. However, they fail to provide an immersive experience for fighting games. The main reason is that, although each control maps one-to-one to a character action, pressing a button bears no resemblance to the action the character performs.

In this document, I propose a new control scheme for a Naruto fighting game. The control scheme is based primarily on:

  1. Gesture recognition. All jutsus will be invoked with real hand gestures, captured by a gesture recognition system built on Microsoft Kinect and Leap Motion.
  2. Body inputs. Character movement will be controlled with leg movements on a Microsoft Adaptive Controller (sold as the Xbox Adaptive Controller).

These two sets of inputs map player movements directly to character movements. Players use their own legs to move the character and their own hands to form hand seals and perform jutsus. This direct mapping creates a much more immersive experience than traditional keyboards and controllers.

Theme

Naruto Ultimate Ninja Heroes has been one of the most successful 2D Naruto fighting games, praised for its graphics, combat system, move sets, and characters. Since the game is already well designed, and the main innovation proposed in this document is the control scheme, which is independent of the game itself, I propose to use Naruto Ultimate Ninja Heroes as the base game, with gesture recognition built on top of it as a separate input scheme.

The rest of this document therefore describes gameplay in terms of Naruto Ultimate Ninja Heroes.

Interactions and Control Schemes

There are two main types of interaction with the game: out-of-fight interactions and in-fight interactions. In-fight interactions can be further divided into three types: basic movements, basic attacks, and jutsu invocation.

1. Out-of-fight interactions

Description

There are many types of out-of-fight interactions, most of which are interactions with the UI, such as navigation or character selection.

Control Scheme

All common interactions with the game, such as navigating through the menus or selecting characters, will use traditional inputs from a Microsoft Adaptive Controller (Figure 4).

The Microsoft Adaptive Controller is essentially a standard controller in a different form factor. It exposes all the basic inputs of a regular controller, so menu navigation works the same way it does on any other controller.

2. Basic movements

Description

There are two forms of basic movements in a fighting game: moving and jumping.

  1. Moving. Movement in a 2D game happens along a single horizontal axis, so there are only four possible movements: move left, move right, run left, and run right. Traditionally, directional movements are invoked using arrow keys or an analog stick.
  2. Jumping. Jumping involves only one movement: jump vertically up. Normally, there is a special button for jump movement.

In a fighting game, a jump and a movement are often combined: players can move while jumping and vice versa.

Control Scheme

Basic movements of characters will use players’ legs as input. Players will provide inputs by pressing the buttons on a Microsoft Adaptive Controller (Figure 4) with their legs.

The proposed input scheme is as follows:

  1. Move left: hold button A.
  2. Move right: hold button B.
  3. Run left: press, release, then hold button A (a double press in which the second press is held).
  4. Run right: press, release, then hold button B (a double press in which the second press is held).
  5. Jump: press buttons A and B simultaneously.

Figure 4 — The controls of a Microsoft Adaptive Controller
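
The input scheme above can be sketched as a small decoder that turns press and release events on the two large buttons into movement actions. This is only a sketch, not the game's implementation: the `MovementDecoder` class, its method names, the 0.3-second double-press window, and the choice to treat "both buttons held" as a jump are all assumptions made for illustration.

```python
DOUBLE_PRESS_WINDOW = 0.3  # seconds between release and re-press; assumed value
DIRECTIONS = {"A": "left", "B": "right"}


class MovementDecoder:
    """Turns press/release events on buttons A and B into movement actions."""

    def __init__(self):
        self.held = {"A": False, "B": False}
        self.running = {"A": False, "B": False}
        self.last_release = {"A": None, "B": None}

    def press(self, button, t):
        released = self.last_release[button]
        # Press, release, then press again quickly: the second press is a run.
        self.running[button] = (
            released is not None and t - released <= DOUBLE_PRESS_WINDOW
        )
        self.held[button] = True

    def release(self, button, t):
        self.held[button] = False
        self.running[button] = False
        self.last_release[button] = t

    def action(self):
        # Both buttons down at once is treated as a jump (assumption).
        if self.held["A"] and self.held["B"]:
            return "jump"
        for button, direction in DIRECTIONS.items():
            if self.held[button]:
                verb = "run" if self.running[button] else "move"
                return f"{verb} {direction}"
        return "idle"


decoder = MovementDecoder()
decoder.press("A", 0.0)
print(decoder.action())  # move left
decoder.release("A", 0.1)
decoder.press("A", 0.2)
print(decoder.action())  # run left
```

A real integration would read these events from the controller driver and would also need to decide how a jump combines with a held direction, which this sketch leaves out.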

3. Basic attacks

Description

There is only one form of basic attack in the game, normally invoked with a single button. Other forms of basic attack (usually basic attacks chained together) are invoked by pressing that button repeatedly.

Proposed Control Scheme

The basic attacks will be invoked through the gesture recognition system. The system will attempt to recognize a specific gesture as the attack gesture.

Basic attacks will use a single, static punch gesture as an input (Figure 5-a).

To perform chained basic attacks, users alternate punches between their left and right hands (Figure 5-b).

Figure 5 — (a) Left is a single attack, (b) Right is multiple attacks performed in sequence.
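
As a rough illustration of chained attacks, the sketch below groups a stream of recognized punches into chains of alternating hands. The function name `chain_lengths` and the `"L"`/`"R"` labels are hypothetical, and the rule that repeating the same hand starts a new chain is an assumption; the text above only says that chains come from alternating hands.

```python
def chain_lengths(punches):
    """Group a stream of 'L'/'R' punches into chains of alternating hands.

    Returns the length of each chain. Repeating the same hand ends the
    current chain and starts a new one (an assumed rule).
    """
    chains = []
    current = 0
    previous = None
    for hand in punches:
        if previous is not None and hand != previous:
            current += 1  # hands alternate: the chain continues
        else:
            if current:
                chains.append(current)
            current = 1  # first punch, or same hand twice: new chain
        previous = hand
    if current:
        chains.append(current)
    return chains


print(chain_lengths(["L", "R", "L", "L", "R"]))  # [3, 2]
```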

4. Jutsus

Description

Jutsus in Naruto Ultimate Ninja Heroes depend on the character. Each character has his or her own set of jutsus, invoked through a combination of buttons. A standard character usually has three to five jutsus; some special characters have slightly more or fewer. These jutsus often reflect the jutsus in the Naruto manga, which are invoked by performing a series of hand seals.

While there are many variations of hand seals and special hand seals, there are only twelve basic hand seals (Figure 6). For simplicity, this project assumes that every jutsu in the game is formed from a sequence of these twelve hand seals. Even if a sequence uses only three seals and never repeats one, there are 12 * 11 * 10 = 1320 possible ordered sequences. That is more than enough for each jutsu to have its own hand seal sequence.

Figure 6 — The twelve basic hand seals to perform jutsus
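
The count above can be checked, and extended to longer sequences, with a few lines of Python. This is a worked calculation rather than part of the game: it counts ordered sequences of distinct seals for lengths three to five, using `math.perm` (available from Python 3.8).

```python
from math import perm

BASIC_SEALS = 12

# Ordered sequences of distinct seals, for each sequence length.
counts = {length: perm(BASIC_SEALS, length) for length in range(3, 6)}
total = sum(counts.values())

print(counts)  # {3: 1320, 4: 11880, 5: 95040}
print(total)   # 108240
```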

Proposed Control Scheme

All jutsus will be invoked through the gesture recognition system. The system will attempt to recognize a series of hand seals, identify the jutsu, and perform it.

Each jutsu will be assigned a specific sequence of hand seals. To keep the game simple, each jutsu should use only three to five hand seals, depending on the power of the jutsu. For example, the sequence of hand seals for a Fireball Jutsu could be Boar ⇒ Dog ⇒ Tiger.

The sequence of hand seals of a jutsu is unique, meaning that no jutsu's sequence may contain the full sequence of another jutsu. This prevents accidental activation of unintended jutsus. For example, if one jutsu requires Horse ⇒ Snake ⇒ Tiger, then no other jutsu may have Horse ⇒ Snake ⇒ Tiger anywhere in its sequence.
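
The uniqueness rule can be checked mechanically when the move list is authored. The sketch below assumes that "containing" means containing as a contiguous run of seals, and the function names and the two test jutsus are hypothetical.

```python
def contains(longer, shorter):
    """True if `shorter` appears as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(
        tuple(longer[i:i + n]) == tuple(shorter)
        for i in range(len(longer) - n + 1)
    )


def find_conflicts(jutsus):
    """Return (inner, outer) name pairs where outer's sequence contains inner's."""
    return [
        (inner, outer)
        for inner in jutsus
        for outer in jutsus
        if inner != outer and contains(jutsus[outer], jutsus[inner])
    ]


# Hypothetical move list that violates the rule.
jutsus = {
    "Jutsu X": ["Horse", "Snake", "Tiger"],
    "Jutsu Y": ["Dog", "Horse", "Snake", "Tiger", "Ram"],
}
print(find_conflicts(jutsus))  # [('Jutsu X', 'Jutsu Y')]
```

Two jutsus with identical sequences are reported in both directions, which is the desired outcome for a validation step.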

Furthermore, to prevent false activation, I propose a thirteenth hand seal, called the Zero Seal (Figure 7), to indicate the initiation of a jutsu. Every jutsu will use the Zero Seal as its first hand seal, so the system knows when a jutsu is being started. This guards against both false activations (a jutsu firing when none was intended) and missed activations (an intended jutsu going unrecognized). Ideally, the Zero Seal should be the easiest hand gesture to recognize. For that reason, its gesture is chosen to differ clearly from the twelve basic seals and the attack gesture. However, further testing should be performed to confirm that the system can reliably tell it apart.

Figure 7 — Zero Hand Seal

Given the number of characters and jutsus, listing every potential jutsu here is impractical. In this document, I will use one character, Sasuke, and three of his jutsus as an example of how the system will recognize hand gestures and invoke the jutsus.

Sasuke has the following three jutsus: Great Fireball Jutsu, Fire Dragon Flame Jutsu, and Phoenix Flame Jutsu.

The three jutsus will have the following activation sequences:

  1. Great Fireball Jutsu: Zero Seal ⇒ Ox Seal ⇒ Ram Seal ⇒ Tiger Seal (Figure 8).

Figure 8 — The hand seals to perform the Great Fireball Jutsu. (a) Top-left is the Zero Seal. (b) Top-right is the Ox Seal. (c) Bottom-left is the Ram Seal. (d) Bottom-right is the Tiger Seal.

  2. Fire Dragon Flame Jutsu: Zero Seal ⇒ Snake Seal ⇒ Ox Seal ⇒ Tiger Seal (Figure 9).

Figure 9 — The hand seals to perform the Fire Dragon Flame Jutsu. (a) Top-left is the Zero Seal. (b) Top-right is the Snake Seal. (c) Bottom-left is the Ox Seal. (d) Bottom-right is the Tiger Seal.

  3. Phoenix Flame Jutsu: Zero Seal ⇒ Snake Seal ⇒ Horse Seal ⇒ Tiger Seal (Figure 10).

Figure 10 — The hand seals to perform the Phoenix Flame Jutsu. (a) Top-left is the Zero Seal. (b) Top-right is the Snake Seal. (c) Bottom-left is the Horse Seal. (d) Bottom-right is the Tiger Seal.
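
One way to turn a stream of recognized seals into jutsu activations is a small state machine that stays idle until it sees the Zero Seal and then matches the seals that follow. The sketch below uses Sasuke's three sequences from above; the `JutsuRecognizer` class, the short seal labels, and the decision to abandon a sequence as soon as no jutsu can match are assumptions, not the author's implementation.

```python
ZERO = "Zero"

# Sasuke's sequences, with the leading Zero Seal left implicit.
SASUKE_JUTSUS = {
    ("Ox", "Ram", "Tiger"): "Great Fireball Jutsu",
    ("Snake", "Ox", "Tiger"): "Fire Dragon Flame Jutsu",
    ("Snake", "Horse", "Tiger"): "Phoenix Flame Jutsu",
}


class JutsuRecognizer:
    def __init__(self, jutsus):
        self.jutsus = jutsus
        self.buffer = None  # None means idle: waiting for the Zero Seal

    def feed(self, seal):
        """Consume one recognized seal; return a jutsu name when one completes."""
        if seal == ZERO:
            self.buffer = []  # start (or restart) a sequence
            return None
        if self.buffer is None:
            return None  # seals without a Zero Seal are ignored
        self.buffer.append(seal)
        sequence = tuple(self.buffer)
        if sequence in self.jutsus:
            self.buffer = None
            return self.jutsus[sequence]
        # Abandon the attempt if no jutsu starts with what we have so far.
        if not any(key[:len(sequence)] == sequence for key in self.jutsus):
            self.buffer = None
        return None


recognizer = JutsuRecognizer(SASUKE_JUTSUS)
results = [recognizer.feed(s) for s in ["Zero", "Snake", "Horse", "Tiger"]]
print(results[-1])  # Phoenix Flame Jutsu
```

Because every sequence must begin with the Zero Seal, performing Ox, Ram, Tiger on its own does nothing in this sketch.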

How do users know and remember which hand seals compose a jutsu? Fighting games usually include an in-game move list and a training area where players can practice their characters, moves, and combos, so players can easily look up the hand seals for a character’s jutsus. Knowing the hand seals is one thing, however; remembering them is another. With hundreds of jutsus across all characters, players who want to master every character have a great deal to memorize. This may seem overwhelming from a UX point of view. For gamers, however, remembering the moves is part of what makes a game worth playing. A player’s skill is not just mechanical; it also depends on whether the player knows the characters inside out. The challenge of learning and remembering every character’s moves makes the game worth mastering, and it is one of the elements that separates hardcore players from casual players.

System Implementation

The system will use two input systems: a Microsoft Adaptive Controller for simple button-press inputs and a Microsoft Kinect + Leap Motion combination for gesture recognition.

1. Microsoft Adaptive Controller

The Microsoft Adaptive Controller (Figure 4) features two large buttons, denoted A and B in Figure 4. They are designed to withstand forceful presses and are big enough for players to press them with their legs using a stepping action.

Furthermore, the Adaptive Controller works in exactly the same way as a normal controller, so it should be straightforward to integrate it into the game.

2. Microsoft Kinect+Leap Motion for gesture recognition

The heart of the system is a combination of Microsoft Kinect and Leap Motion for reliable and responsive gesture recognition. The entire recognition system will be the same as the one implemented in the paper “Hand gesture recognition with leap motion and Kinect devices.” by Marin, Giulio, Fabio Dominio, and Pietro Zanuttigh.

System Setup

Figure 11 — Setting up the Microsoft Kinect and Leap Motion.

Gesture recognition architecture

Figure 12 — System Architecture

Figure 12 shows the system architecture as described in the paper by Marin, Dominio, and Zanuttigh, which this project adopts unchanged.

We obtain three features from Leap Motion (the positions of the fingertips, the location of the palm center, and the orientation of the hand) and two features from Kinect (the maximum correlation between the collected data and reference data, and the hand contour). These five features are then fed into a classification algorithm called a multi-class Support Vector Machine, which decides which class the input most likely belongs to. Here, the classes are the predetermined gestures that the system should recognize. The system proposed in the paper is trained on roughly 1400 data samples, which is a small training set. The output of the entire system is the gesture (class) that the classifier assigns to the input.
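
To make the data flow concrete, the sketch below concatenates the five feature groups into one vector and picks the class with the highest score. It is a simplification under stated assumptions: it shows only the prediction step of a linear, one-vs-rest classifier with made-up weights, whereas the paper's actual kernel, multi-class strategy, and feature dimensions are not restated here. The dictionary keys and function names are hypothetical.

```python
def build_feature_vector(leap, kinect):
    """Concatenate the three Leap Motion and two Kinect feature groups."""
    return [
        *leap["fingertips"],
        *leap["palm_center"],
        *leap["orientation"],
        *kinect["correlation"],
        *kinect["contour"],
    ]


def predict(vector, models):
    """Return the gesture whose linear score w . x + b is highest."""
    def score(label):
        weights, bias = models[label]
        return sum(w * x for w, x in zip(weights, vector)) + bias

    return max(models, key=score)


# Toy data: one value per feature group and two gesture classes.
leap = {"fingertips": [0.9], "palm_center": [0.1], "orientation": [0.2]}
kinect = {"correlation": [0.8], "contour": [0.3]}
models = {
    "Tiger Seal": ([1.0, 0.0, 0.0, 1.0, 0.0], -0.5),  # made-up weights
    "Ox Seal": ([0.0, 1.0, 1.0, 0.0, 1.0], -0.5),     # made-up weights
}
gesture = predict(build_feature_vector(leap, kinect), models)
print(gesture)  # Tiger Seal
```

In a real system the weights would come from training on recorded gesture samples, and each feature group would contribute many values rather than one.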

Training Data

The system is designed to recognize a specific set of hand gestures effectively, and the classifier must be trained on these gestures in advance. For our game, the system only needs to recognize fifteen hand gestures:

  1. Left-hand punch
  2. Right-hand punch
  3. Zero Hand Seal
  4. Twelve basic hand seals

The system proposed in the paper is trained on roughly 1400 data points and achieves reasonable accuracy. However, we can do much better than that. Our main method of collecting training data will be crowdsourcing. The manga and anime community is big, and many of its members are fans of the Naruto franchise. For example, the r/Naruto subreddit has 220,000 subscribers, and a single Naruto fan page has 44,000 likes. If even a small fraction of this community contributes to the training set, we will have a reasonable amount of training data to work with.

Furthermore, the hand gestures are not difficult to perform. The twelve hand seals are standard hand seals of the series and should be familiar to most Naruto fans. The three other hand gestures are simple enough that their data can easily be collected together with the twelve hand seals. Each person only needs about 5 minutes to perform and record all 15 gestures, and a person can record a set multiple times. To provide further incentive, we can give participants a chance to win gift cards. We can also use Amazon Mechanical Turk to collect further training data.

Overall, given the size of the Naruto community, there should be no problem for us to collect enough training data to work with.
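
A quick back-of-the-envelope calculation shows what "a small fraction" could mean in practice. The subscriber count, the 15 gestures per set, and the paper's roughly 1400 samples come from the text above; the 1% participation rate is purely a hypothetical figure chosen for illustration, not a measured or expected value.

```python
SUBSCRIBERS = 220_000   # r/Naruto figure quoted above
GESTURES_PER_SET = 15   # one recording per gesture
PAPER_SAMPLES = 1400    # approximate size of the paper's training set


def expected_samples(participation_percent, sets_per_person=1):
    """Samples collected if the given percentage of subscribers contributes."""
    contributors = SUBSCRIBERS * participation_percent // 100
    return contributors * sets_per_person * GESTURES_PER_SET


samples = expected_samples(1)    # hypothetical 1% participation
print(samples)                   # 33000
print(samples // PAPER_SAMPLES)  # 23
```

Under that assumption, a single set from each contributor would already yield more than twenty times the paper's sample count.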

System architecture

The overall system has to take inputs from two sources simultaneously: the Microsoft Adaptive Controller and the Kinect/Leap Motion system. Furthermore, since there is no signal that tells the Kinect/Leap Motion system when to begin recognizing a hand gesture, the system should poll: it runs recognition on the current input several times per second. Because gesture recognition is a costly operation and the input devices add their own delay, the system should attempt recognition only about 10 times per second; it is unlikely that a human can perform more than 10 hand gestures per second. At the same time, the system must stay fast enough to feel responsive, since responsiveness is very important in a fighting game.

Figure 13 — State diagram of the system
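
The polling strategy can be sketched as a generator that runs the costly recognizer at most about 10 times per second and reports a gesture only when it changes, so that a seal held for several polls is counted once. The function name `poll_gestures`, the millisecond timestamps, and the change-only reporting are assumptions; the `recognize` callback stands in for the Kinect/Leap Motion classifier.

```python
def poll_gestures(frames, recognize, hz=10):
    """Yield (timestamp_ms, gesture) whenever the recognized gesture changes.

    `frames` is an iterable of (timestamp_ms, frame) pairs. Frames that arrive
    before the next polling slot are skipped without running `recognize`.
    """
    interval_ms = 1000 // hz
    next_due = None
    last = None
    for timestamp_ms, frame in frames:
        if next_due is not None and timestamp_ms < next_due:
            continue  # too soon since the last recognition attempt
        next_due = timestamp_ms + interval_ms
        gesture = recognize(frame)
        if gesture != last:
            last = gesture
            if gesture is not None:
                yield timestamp_ms, gesture


# Simulated ~30 fps input where each frame is already labeled with its gesture.
frames = [
    (0, "Zero"), (33, "Zero"), (66, "Zero"), (100, "Zero"),
    (133, "Ox"), (166, "Ox"), (200, "Ox"), (233, None), (300, "Ox"),
]
print(list(poll_gestures(frames, recognize=lambda frame: frame)))
# [(0, 'Zero'), (200, 'Ox')]
```

The gestures this yields are the kind of stream a jutsu state machine would consume. Note that the Ox seal performed at 133 ms is not reported until the 200 ms poll, which illustrates the trade-off between recognition cost and responsiveness described above.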

Reference

“The Next Generation 1996 Lexicon A to Z: Fighting Game.” Next Generation, no. 15, Imagine Media, March 1996, p. 33.

Marin, Giulio, Fabio Dominio, and Pietro Zanuttigh. “Hand gesture recognition with leap motion and Kinect devices.” 2014 IEEE International Conference on Image Processing (ICIP), IEEE, 2014.
