Training a Neural Network to Autoshoot in FPS Games.

Fletch · Published in The Startup · Dec 13, 2020 · 8 min read

In this article, I will ultimately show you how to make an Autoshoot bot for any FPS game using a simple neural network on Windows or Linux.

It all started in early 2020. I was getting a bit bored with my usual routine: think of a game idea, execute the game idea. I’d been making terrible indie games for years, one of which did ‘make it big’, though in the hands of a big studio; working alone, I always lacked the artistic prowess to make something truly compelling.

My brother mentioned how Artificial Intelligence was playing a major role in automating jobs at the company he was working for, and with nothing else to invigorate me I decided to try my hand at Neural Networks.

I remembered having seen a very interesting documentary on YouTube about Frank Rosenblatt, who is credited with inventing the perceptron, so that is where I started, at the fundamental cornerstone of AI: “The Perceptron”.

I had heard of back-propagation, but at this point I really had no idea how it worked. I read a few articles and watched a few videos online, but when cross-referencing the different explanations I found the algorithm only vaguely described, and I had not come across anyone referring to it as the chain rule. I went with my intuition when it came to back-prop. I was wrong, but I was not far off, and the amazing thing is that it worked, which I feel taught me a very important lesson.

It was not until a little later in my learning that I discovered this YouTube video, a SIGGRAPH 2018 talk by Andrew Glassner, where I first heard back-prop referred to as the chain rule, among other topics. Had I been exposed to this talk a lot earlier on, I would have saved much of the time spent reading through many different documents and papers, and maybe I would have absorbed a better understanding in that reading process.

My first neural network needed a purpose. I had always been a fan of Quake3 Instagib, to the point that even mentioning those two words feels like a severe case of deja vu; I’ve never really enjoyed playing any other FPS quite as much as Quake3, specifically the Excessive Plus Instagib game mode. I did play some Counter-Strike 1.6 and Source in my younger years, but any game that went beyond teamwork and into strategic thinking was not a domain I was interested in when it came to playing video games. My excuse was that I satisfied that desire in my programming work; by the time I came to play a game I was already feeling pretty burned out, I didn’t want to use my brain any longer, and I just wanted to relax and let my automated capacities do the work, such as my basic hand-eye reflexes.

I decided that I would make an Autoshoot bot for Quake3 Instagib game modes. Quake3 allows ‘forcing player models’, meaning it is completely legal in an online game to force every other player’s model to a single model of your choice; notoriously over the years, although less popular today, people would force the aqua blue bones model because it is the most visible in the quite dark arenas/maps that Quake3 provides by default. So there was my simplification: could I train a neural network to detect when the aqua blue bones model was in the reticule? And so it was decided, I would hunt my bright blue friend.

On initial inspection you’d think this looks easy: you could just do a simple colour detection for the aqua blue colour. You’d be partly right, that is sufficient to some degree, but it’s not perfect; those kinds of simple detections often lead to you shooting at light patches of a grey floor or at white objects such as light sources, because the aqua blue is really a gradient from full white down to the aqua colour itself. Why does this matter? Well, in a game of instagib you get one shot every few seconds; if you miss a shot even once, then in the window before you can re-fire your chances of being hit by the opponent you failed to take out dramatically increase. So we need it to be perfect.
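For a sense of what that naive rule looks like, here is a minimal sketch of a colour test in the spirit described above; the threshold values are illustrative guesses and not taken from the original project.

#include <stdlib.h>

// A naive "is this pixel aqua-ish?" test of the kind described above.
// The thresholds are illustrative guesses, not the original project's values.
static int is_aqua_like(unsigned char r, unsigned char g, unsigned char b)
{
    // The bones model shades from near-white down to saturated aqua, so accept
    // bright, roughly balanced green/blue with red lagging behind; pale grey
    // floors and white lights also pass, hence the false positives above.
    return g > 150 && b > 150 && r < g && abs((int)g - (int)b) < 60;
}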

How do we train a model like this? It would be laborious to take thousands of screen captures and manually feed them into a network, and on top of that you would not know whether the training was successful until you came to test it afterwards. So I decided that I would train the network in real-time.

How did I train the model in real-time? It was simple really: I would use binary classification, and I decided that when the reticule was not pointing at a target I would ‘de-train’ the network to signal a 0, and when the reticule was pointing at a target I would train the network to signal a 1. Easy, right? Well, there’s one piece of information missing there: how do I tell the network when the reticule is over a target or not, if it’s not already trained to ‘know so’? I decided to use the simple colour detection rule mentioned previously; using the less accurate simple colour detection, I would train a more accurate neural model. Sure, I could also have held down a button on my keyboard whenever the reticule was over a target, and set up a game mode where bots move on a much slower time scale, but it’s a lot faster to train off existing, less accurate models, like the simple colour detector. Just to clarify, the simple colour detection is being used as a capture on/off toggle; when the capture is on, the network learns from 100 FPS of real-time game renders of the target object.
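In outline, that capture loop boils down to something like the sketch below; sample_reticule(), colour_detect() and train_step() are hypothetical placeholders for the project’s screen grab, simple colour rule and network update, not its actual function names.

// Hypothetical helpers standing in for the screen grab, the simple colour
// rule and the network update; declared only so the sketch is self-contained.
void sample_reticule(float* pixels);
int  colour_detect(const float* pixels);
void train_step(const float* pixels, float target);

// Sketch of the real-time training loop: the crude colour rule provides the
// 0/1 label, and the network is pushed toward that label every frame.
void training_loop(void)
{
    float pixels[3 * 3 * 3]; // 3x3 reticule patch, three colour channels, 0..1
    for (;;)
    {
        sample_reticule(pixels);
        const float target = colour_detect(pixels) ? 1.f : 0.f;
        train_step(pixels, target);
    }
}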

This actually worked out really well, and I was not even using an industry-accepted back-propagation method, as mentioned earlier. In fact, what I was doing was feeding back the error as I made the forward pass; it was incredibly efficient and, I guess you could say, only negligibly less accurate for my specific application.

The network was three layers deep. The first layer took in 3x3 RGB floats in the range 0–1 (3x3x3, three colour channels per pixel) and output a 3x3 matrix of floats. That 3x3 matrix of single floats was then fed into ten perceptrons as three different ways of dividing the 3x3 coordinate system: horizontal rows, vertical columns, and ‘quartered’ 2x2 chunks, outputting a vector of 10 floats. Finally, that vector was put into one last perceptron which output one float, the binary classifier. No bias was used.

The third layer of 3 represents 10 perceptrons split into 3 classification techniques
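To make the shapes concrete, here is how the weights could be laid out in code. This is my reading of the description above, in particular that each first-layer perceptron reads one pixel’s three colour channels; it is not the original repository’s data layout.

// Weight layout matching the description above, under the assumption that
// each first-layer perceptron reads one pixel's three colour channels.
// Illustrative only; not the original project's structures.
typedef struct
{
    float l1_w[9][3];  // first layer: 9 perceptrons over the 3x3 pixels, RGB weights each
    float l2_w[10][4]; // ten space-dividing perceptrons, up to 4 inputs each
    float l3_w[10];    // final perceptron over the ten grouping activations
} AutoShootNet;        // no biases anywhere, as noted above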

And how did that back-prop work? It was like normal back-prop, except that the target at each layer was always 0 or 1, with the layer’s output prediction subtracted from it to get a simple error term without any derivative, just a multiplication by a learning rate. The loss was never really back-propagated: each layer was trained based on the output of the layer before it and the loss at the final neuron. It worked wonderfully, and saved me having to do a second, backward iteration to train the network after a forward pass; it was all done in one single forward pass.
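As a rough illustration of that update, each perceptron’s weights can be nudged toward the shared 0/1 target as it is evaluated, with no derivative term and no separate backward pass; the code below is my own sketch of the idea rather than the repository’s implementation.

// One perceptron evaluated and updated in the same pass, as described above:
// error = (target - output), scaled by a learning rate and each input, with
// no derivative term. Illustrative only.
static float perceptron_train(float* w, const float* in, int n,
                              float target, float lr)
{
    float out = 0.f;
    for (int i = 0; i < n; i++)   // forward: weighted sum...
        out += w[i] * in[i];
    if (out < 0.f)                // ...through ReLU, the activation used here
        out = 0.f;

    const float err = (target - out) * lr;
    for (int i = 0; i < n; i++)   // immediate update toward the 0/1 target
        w[i] += err * in[i];

    return out;                   // the activation passed on to the next layer
}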

This method of back-prop was also improved by training the network’s layers in a series of three modes. The first mode, Colour Aim, trains the first 3x3 layer of the network, which allows the second mode, Neural Aim, to be engaged: a mode that simply checks whether any of the 3x3 perceptrons have been activated beyond a 0.7 activation tolerance (this mode is not trainable, but verifies the accuracy of the trained neurons). The third and final mode, Deep Aim, uses the Neural Aim layer trained by the Colour Aim mode and then trains only the ten space-dividing neurons and the last output neuron. The idea of Deep Aim is that it trains ten different perceptrons on different combinations of outputs from the Neural Aim layer: the first split divides the 3x3 input into four 2x2 chunks (one for each corner, like a 2x2 convolution on a 3x3 input), the second splits it into three horizontal lines, and the third splits it into three vertical lines, creating ten output floats. This is akin to a convolutional neural network, just on a very small scale and executed in a hard-coded manner, with no dynamic looping. It was then up to the final neuron to weigh these ten inputs into one final binary output classification.
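Here is one way to write those ten hard-coded views over the 3x3 layer outputs; the index layout is my own illustration rather than the original source.

// The ten hard-coded "space dividing" views over the 3x3 first-layer outputs
// o[0..8] (row-major): four 2x2 corner quadrants, three horizontal rows and
// three vertical columns. Index layout is illustrative, not from the repo.
static void split_3x3(const float o[9], float groups[10][4], int counts[10])
{
    static const int idx[10][4] = {
        {0,1,3,4}, {1,2,4,5}, {3,4,6,7}, {4,5,7,8}, // 2x2 corner quadrants
        {0,1,2,-1}, {3,4,5,-1}, {6,7,8,-1},         // horizontal rows
        {0,3,6,-1}, {1,4,7,-1}, {2,5,8,-1}          // vertical columns
    };
    for (int g = 0; g < 10; g++)
    {
        int n = 0;
        for (int i = 0; i < 4 && idx[g][i] >= 0; i++)
            groups[g][n++] = o[idx[g][i]];
        counts[g] = n; // 4 inputs for the quadrants, 3 for rows and columns
    }
}

Each of the ten perceptrons then takes its own group as input, and the final neuron weighs their ten activations.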

You can see from my code that I implemented an optional softmax before feeding the ten floats into the final neuron/perceptron; at the time I was not completely aware of how softmax is normally used in a neural network and assumed this to be an appropriate use of the function.

I even trained the network using boolean values rather than floats, and it still had very good detection results, better than the simple colour clicker that was used to train it, although I think a fair middle ground would have been to use uint8s. The network used the ReLU activation function, and once the network is trained, doing detection forward passes using just integer math is pretty straightforward. Alas, I stuck to float32s, but it’s food for thought.
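To illustrate the integer-math point, one could quantise the trained weights to 8-bit integers and run detection with an integer dot product; this is purely a sketch of the idea, since the project itself stayed with float32.

#include <stdint.h>

// Integer-only forward pass sketch: weights quantised to int8, inputs as
// uint8 pixel values, fixed-point accumulate and a ReLU. Illustrative only.
static int32_t int_forward(const int8_t* w, const uint8_t* in, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)w[i] * (int32_t)in[i];
    return acc > 0 ? acc : 0; // ReLU in integer form
}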

The original GitHub repository for this project with all associated source code is located here. A refined release is available here.

If you would like to read pixels from the screen on the Windows operating system, you can use GetPixel() or CreateDIBSection(); I have provided an example of the latter below:

#include <windows.h>
#include <string.h>

// Capture a window's client area into a caller-provided 32-bit BGRA buffer.
void GetWindowPixels(HWND hWnd, BYTE* pixel_data, DWORD* byte_len)
{
    RECT rect = {0};
    GetClientRect(hWnd, &rect);
    *byte_len = rect.right * rect.bottom * 4;

    // Describe a 32-bit, bottom-up DIB the same size as the client area.
    BITMAPINFO bi = {0};
    bi.bmiHeader.biSize = sizeof(bi.bmiHeader);
    bi.bmiHeader.biWidth = rect.right;
    bi.bmiHeader.biHeight = rect.bottom;
    bi.bmiHeader.biPlanes = 1;
    bi.bmiHeader.biBitCount = 32;
    bi.bmiHeader.biCompression = BI_RGB;
    bi.bmiHeader.biSizeImage = *byte_len;
    bi.bmiHeader.biClrUsed = 0;
    bi.bmiHeader.biClrImportant = 0;

    // Create a memory DC and a DIB section whose pixels we can read directly.
    HDC hScreen = GetDC(NULL);
    HDC hCDC = CreateCompatibleDC(hScreen);
    BYTE* bp = NULL;
    HBITMAP hBmp = CreateDIBSection(hCDC, &bi, DIB_RGB_COLORS,
                                    (void**)&bp, NULL, 0);
    SelectObject(hCDC, hBmp);

    // Render the window's client area into the DIB and copy the pixels out.
    PrintWindow(hWnd, hCDC, PW_CLIENTONLY);
    memcpy(pixel_data, bp, bi.bmiHeader.biSizeImage);

    DeleteObject(hBmp);
    DeleteDC(hCDC);
    ReleaseDC(NULL, hScreen);
}
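A minimal way to call this helper might look like the following; the window title and buffer handling are illustrative and not taken from the project.

#include <stdlib.h>

// Illustrative usage: locate the game window, allocate a matching buffer and
// capture its client area. The window title here is just an example.
void capture_example(void)
{
    HWND hWnd = FindWindowA(NULL, "Quake 3: Arena");
    if (hWnd == NULL)
        return;

    RECT rc;
    GetClientRect(hWnd, &rc);
    BYTE* pixels = (BYTE*)malloc((size_t)rc.right * rc.bottom * 4);
    DWORD len = 0;
    GetWindowPixels(hWnd, pixels, &len);
    // pixels now holds 32-bit BGRA rows (bottom-up); sample the reticule area here.
    free(pixels);
}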

And that is it: that’s how, as a complete novice in Neural Network engineering, I created my first functional binary classification network for the purpose of auto-shooting targets in Quake 3, using some basic knowledge of perceptrons and a little bit of intuition, which, to be honest, is what the field of machine learning is founded upon: the intuition of many engineers over the years pondering what the best solutions might be.

What I ended up doing was training six different models of this network and then running them all one after another for the best results, so that’s six of the smallest forward passes in the world, in tandem.

What was the lesson I alluded to learning? Well, that sometimes you don’t have to stick to convention and a little intuition can still get you brilliant results.

Here is a video of the network in action on the Elite’z Instagib server:

If you enjoyed this article and would like to read more, consider reading “Creating a Machine Learning Auto-shoot bot for CS:GO. Part 1.”.
