The Hack Behind Sequential Data: Recurrent Neural Networks

I. Recurrent Neural Networks

If you were to go to Google Docs and type the phrase “This article provides”, it would attempt to autocomplete your sentence with words like “This article provides an overview of…”. Behind this text auto-completion is a deep learning architecture called the Recurrent Neural Network (RNN). Auto-completion falls under the broad umbrella of machine learning, an ever-growing field dedicated to understanding and developing methods that learn from data. Deep learning, in turn, is a subset of machine learning in which a model progressively learns higher-order properties of a dataset. The structures most commonly associated with deep learning are neural networks.

In short, neural networks consist of multiple layers of one or more nodes. Each layer extracts information from its input and passes it to the nodes in the next layer. There are numerous kinds of neural networks, including artificial neural networks (ANNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). The type of network is typically chosen based on the properties of the input, as different architectures are best suited, by design, to certain kinds of data. For example, a CNN is most frequently used with image data, while an RNN, as detailed in this article, is suited to sequential data. Though often different on the surface, most neural networks build on the artificial neural network, the simplest form.

Figure 1: Simple Neural Network Structure

Shortcomings of Artificial Neural Networks

A simple artificial neural network has three underlying parts: an input layer, whose number of nodes is specified in advance; one or more hidden layers, where the model performs mathematical operations to create meaning out of the data; and an output layer, which presents the model’s results and whose number of nodes is also pre-determined. In an ANN, a datapoint moves through these layers in a feed-forward fashion, and the loss (or error) is then propagated back through the network to optimize the model. However, for applications like sentence autocompletion, language translation, and video game cheat detection, this type of neural network is insufficient, for broadly three reasons.
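To make the feed-forward pass concrete, here is a minimal sketch of the three parts just described, using NumPy. The layer sizes (3 input nodes, 4 hidden nodes, 2 output nodes) and random weights are illustrative assumptions, not a prescribed architecture:

```python
import numpy as np

# Illustrative sizes: input layer (3 nodes) -> hidden layer (4 nodes)
# -> output layer (2 nodes).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(2, 4))   # hidden -> output weights
b2 = np.zeros(2)

def forward(x):
    h = np.tanh(W1 @ x + b1)   # hidden layer applies weights + activation
    return W2 @ h + b2         # output layer presents the results

x = np.array([0.5, -1.0, 2.0])  # one datapoint with 3 features
y = forward(x)
print(y.shape)                  # (2,) -- one value per output node
```

Training would then compare `y` against a target, compute a loss, and propagate the error backward to adjust `W1` and `W2`; the key point for what follows is that the input must always have exactly 3 values.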

Reason 1: Input Variability

An ANN’s performance is extremely susceptible to variability in its input, such as differences in the lengths of the sentences used to train a model. Take the case of language translation, where a user would like to convert the sentence “I want a cake tomorrow” to Spanish: “Yo quiero un pastel mañana”. Solving this problem with a traditional neural network would be non-trivial. Say that we design a model where each node in the input layer represents a word. It would be nearly impossible to fix the number of nodes in the input and output layers, because the number of words in a sentence is potentially unbounded. Since we would have to pre-determine the number of nodes, we could receive inputs like “I can” and “I want to go to the beach because it is sunny tomorrow and the water will not be cold” and have to treat them essentially the same way.

A workaround could be to allot an immensely large number of input nodes to accommodate long sentences. However, for a sentence with only a few words, most of those nodes would simply be zero, wasting memory.
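The waste is easy to see with a quick sketch. Assume, for illustration, a fixed input size of 20 tokens and padding with zeros (both numbers are arbitrary choices here):

```python
# Padding variable-length sentences to a fixed input size (illustrative).
MAX_LEN = 20  # every input must have exactly this many slots

def pad(token_ids):
    """Right-pad a token-id list with zeros up to MAX_LEN."""
    return token_ids + [0] * (MAX_LEN - len(token_ids))

short = pad([7, 12])          # a 2-word sentence like "I can"
print(len(short))             # 20
print(short.count(0))         # 18 -- 90% of the input nodes carry nothing
```

Nine out of every ten input nodes here do no useful work, and any sentence longer than 20 words still cannot be represented at all.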

Reason 2: Semantically Identical Inputs

Because an ANN can only perceive input in a strict order, it fails to produce similar output for inputs that are semantically, but not lexicographically, identical. Take the two similar sentences “I want a cake tomorrow” and “Tomorrow I want a cake”. These sentences have the same meaning and thus should have the same translation. In an ANN, however, the difference in the position of “tomorrow” could force the word to take on a different meaning from the model’s perspective. Since the node corresponding to “tomorrow” in the first sentence differs from the node corresponding to it in the second, that input node would activate different nodes in subsequent layers and potentially produce a different output.

Reason 3: The Problem of Order

An ANN cannot preserve the sequence of input data in an efficient and straightforward manner. For the sentence autocompletion example of “This article provides”, the output depends on the order of the words in the phrase. A reordering like “An article provides this” differs in meaning from the original phrase and thus should, in theory, change the autocompletion. For an ANN, any reordering simply has random cascading effects throughout the rest of the network, as an ANN cannot attach any significance to order.

The Fix: RNNs

A recurrent neural network model addresses many of the shortcomings of ANN’s regarding sequential data. A simple RNN works as follows:

  1. Take the first element of the sequential data, x_1.
  2. Pass x_1 through the network performing the necessary mathematical operations and applying activation functions. The final output of this step is y_1.
  3. Take the next element x_2. Pass it through the same network, except this time, you will pass in y_1 as a part of the input. y_1 is the output from the previous element (in this case, first) in the sequence and acts as the model’s “context” as it progresses through the data.
  4. Repeat until all the data has been passed through and the model is fully trained. The general algorithm is to take an element x_i (starting from the first) and produce an output y_i, then take the next element x_{i+1} and pass it, together with y_i (the previous element’s output), through the model. y_i serves as the cumulative context for the sequence.
Figure 2: RNN Structure
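The four steps above can be sketched as a simple loop. The sizes (3-dimensional inputs and outputs) and random weights are illustrative assumptions; a real RNN would also learn these weights via backpropagation through time:

```python
import numpy as np

# Minimal RNN loop: each element x_i is combined with the previous
# output y_{i-1} (the "context") to produce y_i.
rng = np.random.default_rng(1)
Wx = rng.normal(scale=0.5, size=(3, 3))  # input -> output weights
Wy = rng.normal(scale=0.5, size=(3, 3))  # previous-output -> output weights
b = np.zeros(3)

def rnn(sequence):
    y = np.zeros(3)                        # context starts empty (step 1-2)
    for x in sequence:                     # one element at a time (step 3)
        y = np.tanh(Wx @ x + Wy @ y + b)   # new output folds in old context
    return y                               # final y summarizes the sequence

seq = [rng.normal(size=3) for _ in range(5)]  # a 5-element sequence
print(rnn(seq).shape)       # (3,)
print(rnn(seq[:2]).shape)   # (3,) -- any length works, no padding needed
```

Note that the same weights `Wx` and `Wy` are reused at every step, which is why the network’s size does not depend on the sequence length.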

The issue of ambiguous input size for ANNs (described earlier) is addressed because an RNN operates element by element. There is no need to specify an input size: the RNN works iteratively through the whole sequence and simply stops at its end.

The second and third issues, concerning sequence processing and position, are addressed by the context term y_i, which helps sentences like “I want a cake tomorrow” and “Tomorrow I want a cake” be treated the same. For the first sentence, y_4 represents the context of “I want a cake” (the first four words), so when the model reaches “tomorrow”, it adds the time and tense associated with that word to the sentence. For “Tomorrow I want a cake”, y_4 corresponds to “Tomorrow I want a”, and when the model arrives at “cake”, it completes the sentence; the time and tense are already present in the context, and the thought of the phrase just needs to be completed. In either case, y_5 (the overall output of the model) captures the same meaning for both sentences.

Overall, the main selling point of RNNs is their ability to preserve the sequential properties of the input data. By maintaining a context term as an input, the model doesn’t lose important information learned in previous steps. From a broader perspective, this is very similar to how we, as humans, learn: when we read, for example, we base our interpretation of a text on our prior knowledge of plot events.

II. A Practical Application: Video Games

Since many online video games run on users’ local computers, developers must enforce integrity rules through their game’s terms of service. Regardless, people still download cheats and hacks to gain a competitive advantage over fellow players. It then falls to the developers to detect such hacks programmatically and penalize cheaters accordingly. The algorithms for detecting these hacks sit at the heart of data science, and of recurrent neural networks in particular.

Background for Video Games

Virtual shooter games, one of the most popular genres of video games, usually involve a simulated environment where players on an online server try to eliminate one another with the game’s set of weapons. Examples include Counter-Strike: Global Offensive, Valorant, Call of Duty, and Fortnite. The companies that own these games, such as Activision, Epic Games, and EA, have their own cheat detection systems that generally scan for illegal installations. Recently, however, machine learning techniques have been applied to game logs, in a manner similar to fraud detection, to find discrepancies between typical gameplay and illegal gameplay. This was a response to newer cheating software that adapts to evade a game’s screening for hacking programs.

There are generally three types of hacks that can be introduced by installing unauthorized software.

  • Aimbot — In many shooter games, aiming can be one of the toughest aspects. An aimbot automates all the aiming for a user so that they do not miss any shots: the player’s crosshair locks on to the body of another player’s character, with the cheater having to aim only minimally.
Figure 3: AimBot Example
  • WallHacks — The thrill of many shooter-based video games is not knowing where nearby players are. A player’s field of view is often limited by buildings, cars, trees, and a game’s settings/restrictions, so their knowledge of the game’s environment is constrained. WallHacks let a player know where other players are at all times, making it extremely easy for a cheater to position themselves in advantageous spots and outplay other players who thought they were hidden.
Figure 4: WallHack Example
  • Mechanical Assistance — Many games limit the speed and movement of each player. That is, characters can only sprint at a maximum speed and can only move in certain directions. Mechanical assistance allows a player to evade these limits and play the game in a way that is not intended by developers. Cheats would allow a character to, for example, sprint faster or perform maneuvers that were previously impossible due to the regular game mechanics.

(Note that there are other forms of cheating that are non-software-related, like player collusion and illegal communication. These are extremely hard to detect and are not the focus of this application.)

The Sequence: RNNs

To understand the role of RNNs, we must first understand the kind of data a model would be trained on. To simplify the scope, this article focuses on Counter-Strike: Global Offensive (CS:GO) and relays interesting points and methods discussed in this paper.

The data comes in the form of demo files. A demo file (.dem) is a collection of ticks, where a tick is a packet of data describing the events that occur at one moment of a game. The file contains all the ticks for one game in the order in which they occurred. For CS:GO, such files are publicly available for every competitive/tournament game played.
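Conceptually, a demo file can be pictured as an ordered list of ticks. The sketch below is purely hypothetical: the field names are invented for illustration, and real .dem files are a binary format that must be parsed with dedicated tooling.

```python
# Hypothetical, simplified view of a demo file: an ordered list of ticks,
# each tick a dict of event fields (field names invented for illustration).
demo = [
    {"tick": 1, "player": "A", "event": "move",  "pos": (10, 4)},
    {"tick": 2, "player": "A", "event": "aim",   "target": "B"},
    {"tick": 3, "player": "A", "event": "shoot", "hit": True},
]

# Ticks are already in game order, so they can be fed to a sequence
# model directly, one tick at a time.
events = [t["event"] for t in demo]
print(events)   # ['move', 'aim', 'shoot']
```

The important structural property is the ordering: tick 2 (aiming at B) only makes sense in light of tick 1, exactly the kind of dependency an RNN is built to exploit.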

Figure 5: Demo File Example (See paper)

Training a model on this data would occur in a supervised manner: one would prepare data from cheaters and non-cheaters and label it as such. An unsupervised, clustering approach would be especially difficult because it would be tough to determine which feature(s) the model is actually grouping on; without a deeper analysis of the results, it would be hard to tell whether the model is clustering players by skill level, experience, or some other characteristic. A supervised approach makes the purpose of the model clear from the get-go.

The fact that demo files retain the sequence of game events is especially useful, as it allows for the application of recurrent neural networks. The analysis of a demo file is analogous to that of a sentence completion algorithm: just as previous words provide context for the generation of future words, past events provide context for events to come. For example, if a log in the demo file shows that one player may have seen another, based on their angle of vision and crosshair position, and then shot at them, we would not suspect WallHacks. If a log shows that there was no way Player A could have seen Player B, we would expect Player A not to shoot at B; if they did anyway, we could suspect a cheat. Because the order in which these events occur, and how they relate to past events, matters, RNNs fit nicely.
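This idea can be sketched as a recurrent scoring loop. Everything here is an illustrative assumption, not the method from the paper: the per-tick features (angle to the nearest enemy, whether the enemy was visible, whether a shot was fired), the sizes, and the random weights are all invented, and a real model would learn its weights from labeled demo files:

```python
import numpy as np

# Hypothetical cheat-scoring sketch: fold each game event into a running
# context, then map the final context to a score in (0, 1).
rng = np.random.default_rng(2)
Wx = rng.normal(scale=0.5, size=(4, 3))  # event features -> context
Wh = rng.normal(scale=0.5, size=(4, 4))  # previous context -> context
w_out = rng.normal(scale=0.5, size=4)    # context -> single score

def cheat_score(events):
    h = np.zeros(4)                       # context starts empty
    for x in events:                      # events in the order they occurred
        h = np.tanh(Wx @ x + Wh @ h)      # context carries earlier events
    return 1 / (1 + np.exp(-w_out @ h))   # sigmoid: probability-like score

# Invented features per tick: [angle_to_enemy, enemy_visible, shot_fired].
# Repeatedly shooting at an enemy that was never visible is the suspicious
# pattern described above.
game = [np.array([0.9, 0.0, 1.0]) for _ in range(10)]
score = cheat_score(game)
print(0.0 < score < 1.0)   # True
```

With trained weights, the loop would give low scores to the "saw, then shot" pattern and high scores to "shot without ever seeing", precisely because the context `h` remembers whether visibility preceded the shot.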

You can think of the role of RNNs in cheat detection much like watching a soccer game. You watch the game as it progresses, minute to minute, second to second, and what happens on one play directly affects the plays that follow. You wouldn’t say you had watched a soccer game if you only read the summary at the end of the match; yet that is analogous to dumping a whole demo file into an ANN and trying to make sense of it, since the relationships over time that are inherent in the data would not be preserved. Because of their sequential nature, RNNs can analyze the data as it progresses through its time steps, using any amount of previous information to make decisions about future data.

III. Conclusion

The inner workings of a recurrent neural network address many of the issues an artificial neural network has with sequential data. Because an ANN cannot preserve the order of an input, it can only generate an output formed by looking at the data as a whole. An RNN, however, accumulates information about an input in the order in which it unfolds, by updating a term that serves as the context of the pieces of the input the network has already processed. In the video game setting, an RNN is especially useful for detecting cheaters: since game logs are stored sequentially, one can use an RNN to move through a game and detect events caused by illegal maneuvers or installations. Because so much of everyday life can be transcribed as sequential data, the applications of RNNs are nearly endless. Detecting cheats in video games, autocompleting text, and translating human language are all mentioned in this article, but other applications include speech recognition and, essentially, forecasting in a wide range of domains.
