Playing GTA V using Hand Localization

Using Python and OpenCV for playing Games by Hand Gestures.

Harsh Malra
6 min readJul 31, 2020

After completing three Udacity Nanodegrees and many machine learning books, I decided I had studied enough and it was time to get started with implementation.

Recently I shared a post on playing GTA V using just hand gestures, and it went quite viral. So now I’m posting this tutorial on navigation by hand localization, using the CSRT algorithm with a single tracker.

This project is great if you are new to OpenCV, as it is not very difficult to implement and is a very cool project to show off.

TL;DR — Code is here.

Concept -

This project uses the concept of Object tracking / localization. The main idea is very simple -

  1. Take the base image.
  2. Draw a bounding box (bbox) across the object which you need to track.
  3. And done, the tracker will track the movement of the object over successive frames.

The best thing about this is that it is much faster than object detection: object localization takes into account the position of the object in previous frames and just tracks its new position, rather than trying to detect/find it in the whole image. Read more about it here.

Main Idea →

  1. We will use two functions, ‘steer’ and ‘accelerate’, for navigation. These functions will simulate the key presses.
  2. Be ready in pose. One image will be captured; you just have to draw a rectangle across your hand, which will act as the steering wheel.
  3. Now you just have to move your hand, and navigation will happen according to the deviation from the center position.

Getting Started →

Full Code is here.

Importing Libraries ->

  1. FileVideoStream — a more efficient way of loading video streams.
  2. directkeys.py — contains the methods to simulate key presses (like PressKey and ReleaseKey).
import numpy as np
import cv2
import time
from imutils.video import FileVideoStream, VideoStream
from directkeys import W, A, S, D, Left, Right, Up, Down
from directkeys import PressKey, ReleaseKey

These helpers get the centroid of the bbox and draw it on the frame —

def get_centroid(bbox): # center point of rectangle
    x, y, w, h = bbox
    centx = int(x + w // 2)
    centy = int(y + h // 2)
    return (centx, centy)

def drawbox(ret, bbox, frame): # draws rectangle from bbox
    global q
    if ret:
        # Tracking success
        p1 = (int(bbox[0]), int(bbox[1]))
        p2 = (int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3]))
        cv2.rectangle(frame, p1, p2, (255, 0, 0), 2, 1)
        if p1[0] < 10: # quits program if hand is taken to the top left corner
            q = True # quit
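As a quick sanity check, the centroid math above can be exercised on its own (pure Python, no camera or tracker needed; the example box dimensions are made up):

```python
def get_centroid(bbox):
    # bbox comes as (x, y, w, h); the center is offset by half the size
    x, y, w, h = bbox
    return (int(x + w // 2), int(y + h // 2))

# A 100x50 box whose top-left corner is at (40, 60):
print(get_centroid((40, 60, 100, 50)))  # -> (90, 85)
```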

Defining the navigation functions ->

The steer and accelerate functions work as follows —

  1. get_centroid gives the center point (x, y) of the bounding box. diff is then the deviation of your hand from the center.
x, _ = get_centroid(bbox)
diff = x - cent[0]

2. We will not perform any action if diff is less than a threshold. THRESH_ACTION defines the no-action zone: only when the center of your hand is outside it (the black circle) will an action be performed.

if abs(diff) < THRESH_ACTION:
    return

3. This code ensures that an action is only recorded if your hand is more extreme (further left or right) than its last position, so that the car doesn’t keep turning. The further you move your hand to the right, the more it turns, and it stops turning when your hand stops. This just handles steering sensitivity. (Note: you are free to SKIP this.)

if abs(diff) < last:
    if abs(diff) < last - 10:
        last = 0
    return
last = abs(diff)
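Putting steps 1 to 3 together, here is a minimal, self-contained sketch of the steering decision. The names THRESH_ACTION, cent, and last follow the snippets above, but the specific values and the 'left'/'right' return values (which stand in for the actual PressKey calls) are illustrative, not the exact gist code:

```python
THRESH_ACTION = 30   # radius of the no-action circle, in pixels (assumed value)
cent = (320, 240)    # frame center, e.g. for a 640x480 capture
last = 0             # most extreme deviation seen so far

def get_centroid(bbox):
    x, y, w, h = bbox
    return (int(x + w // 2), int(y + h // 2))

def steer(bbox):
    """Return 'left', 'right', or None for the given hand bbox."""
    global last
    x, _ = get_centroid(bbox)
    diff = x - cent[0]
    if abs(diff) < THRESH_ACTION:   # step 2: inside the no-action circle
        return None
    if abs(diff) < last:            # step 3: hand moving back toward center
        if abs(diff) < last - 10:
            last = 0
        return None
    last = abs(diff)
    return 'right' if diff > 0 else 'left'

print(steer((400, 200, 80, 60)))  # centroid x=440, diff=+120 -> 'right'
print(steer((300, 200, 40, 40)))  # centroid x=320, diff=0    -> None
```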

4. Since x increases towards the right along the x-axis, if diff is positive we turn right, else left. For acceleration, the vertical deviation is negated so that a positive diff means up and a negative one means down.

#cent[0] is x coordinate of center
diff = x - cent[0]
diff = - (y - cent[1]) # '-' as y decr upward

Note: In images, the origin is at the top left corner, so y increases in the downward direction, as shown in the image below. So if your hand goes up, (y - center) will be negative. That is why I negated the diff (see the accelerate function).

Note: y increases in downward direction.
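To make the sign convention concrete, here is a tiny worked example (cent is the assumed frame center from earlier; the hand coordinates are made up):

```python
cent = (320, 240)   # assumed frame center

# Hand centroid above and to the right of the frame center:
x, y = 400, 150

steer_diff = x - cent[0]       # 400 - 320 = +80    -> positive, so steer right
accel_diff = -(y - cent[1])    # -(150 - 240) = +90 -> positive, so accelerate (up)
print(steer_diff, accel_diff)  # -> 80 90
```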

5. Finally, the pressed_steer key is passed to the PressKey function.

Now the action function —

  1. Initially no keys are pressed, so it sets both keys to False.
  2. It then passes the bbox to accelerate and steer, so they can act accordingly.
  3. Lines 10 - 16 release the keys when they are no longer pressed.
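The full action function lives in the linked gist; its press/release bookkeeping can be sketched as below. PressKey/ReleaseKey are stubbed out here (the real ones come from directkeys.py), the key names are placeholders, and the keys are passed in directly rather than computed from the bbox via accelerate and steer, so treat this as an illustration only:

```python
pressed = set()  # keys currently held down

def PressKey(key):    # stand-in for directkeys.PressKey
    pressed.add(key)

def ReleaseKey(key):  # stand-in for directkeys.ReleaseKey
    pressed.discard(key)

def action(key_acc, key_steer):
    """Press the keys we want held and release those no longer needed."""
    wanted = {k for k in (key_acc, key_steer) if k is not None}
    for k in pressed - wanted:   # release keys that are no longer pressed
        ReleaseKey(k)
    for k in wanted - pressed:
        PressKey(k)

action('Up', 'Right')   # accelerate and steer right
print(sorted(pressed))  # -> ['Right', 'Up']
action('Up', None)      # stop steering, keep accelerating
print(sorted(pressed))  # -> ['Up']
```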

Setup →

So now we can start it. First run the cell and get ready with your hand in position.

Hand in Pose

The code below starts your webcam. Keep your hand in pose while the timer waits for you to get ready; then it captures an image. After that you just have to draw a rectangle (bbox) around your hand and press ‘Enter’.

The cv2.selectROI() function lets you select the region of interest to track (i.e. your hand).

You only have to do this once, for setup. After that you just have to keep your hand within the bbox and let it start.

Using the Tracker ->

It’s great that OpenCV already ships with an implementation of the CSRT tracker, so you just need to call a function to create and initialize it.

# Creating the CSRT Tracker
tracker = cv2.TrackerCSRT_create() # tracks the bbox
# Initialize tracker with first frame and bounding box
tracker.init(FRAME, bbox)

After the tracker has been initialized, it’s time to get started. Running the code below starts the countdown. Put your hand in position, and the code will start simulating the key presses. Have the game ready before the countdown finishes.

Working ->

  1. The code below just creates the tracker, puts the text on the frame, and waits for the timer to end.

2. We then update the tracker with each new frame. It returns us the bounding box.

ret, bbox = tracker.update(frame) # ret - True, if found hand

if not ret: # hand not found
    print("Tracking stopped")
    break

3. We send bbox to action.

action(bbox)

4. The currently pressed keys are displayed on screen, along with the bbox around your hand and the circle outside which actions are performed.

activekeys = ' '.join([KEYS[i] for i in [key_acc, key_steer] if i])
drawbox(ret, bbox, frame)
draw_circle(frame, cent)

5. At last, we release all keys.

ReleaseKey(key_steer)
ReleaseKey(key_acc)

To quit, you can either take your hand to the top left corner, or go to the cv2 window and press Enter (if the tracker has failed).

And done, GTA V is now ready to be played with your hand gestures. Enjoy!

Further Improvements →

  1. Since we are using a tracker, it may lose track of your hand, for example when you move it very fast or in low lighting.
  2. Sometimes the vehicle may steer too much. In that case, you can tune the sensitivity of the steer function.
  3. Add a virtual switch to restart the program when object tracking fails, to regain control. (Adding soon)

Final Note ->

So, this was my first article on Medium. I have tried to explain each portion well, as it can sometimes be difficult to understand parts of other people’s code.

If you have any doubts or suggestions, let me know in the comments, and don’t forget to give a clap.

Feel free to connect with me on LinkedIn and GitHub.
