Using AI to Interact with Human Agents

Bryce DiRisio
Bucknell AI & CogSci
10 min read · Dec 22, 2019

by Bryce DiRisio, Zeb Gordon, Eli Mauskopf, and Hung Giang

When our team set out to choose a final project for our Artificial Intelligence course at Bucknell University, one of the main things we wanted to focus on was the use of AI within the video game industry. Ever since the creation of video games, developers have sought ways to have autonomous agents make decisions with simulated intelligence. Companies can create agents capable of making logical decisions and choices, much like a human, and the degree to which an agent follows that logic can be tuned to offer different levels of difficulty to the consumer. In this way, programmers can ‘mimic’ human behavior virtually and create autonomous agents that carry out gameplay through interaction with the player. This can take many forms, from hitting a ball with a paddle in ‘Pong’ to randomly generating enemies of curated strength that pose an appropriate challenge. Artificial intelligence has thus played a historic and integral part in the video game industry.

Our team chose to investigate the role of AI within the entertainment industry and how it interacts with humans in order to ‘play’. When we researched industry applications of AI, we immediately gravitated toward the idea of using AI to optimize a game. Using OpenAI’s Emergent Tool Use article (https://openai.com/blog/emergent-tool-use/) as inspiration, we decided to build our own minigame and have our agent solve a very limited problem space while interacting with a human-controlled agent. The goal: an AI agent that learns from its interactions with a human player and uses those interactions to train a pathfinding policy that tracks down the player agent and ‘tags’ them.

The Unity scene we built for training

We chose Unity because it narrowed the scope of our problem and provides an API that lets Python-based neural networks interact with game objects inside Unity, so we could focus on properly training an agent rather than building a platform to do so. More specifically, we used the ML-Agents package within Unity as the core of our project. This package provides several tools for effectively training an agent in almost any environment. Our environment consisted of a ball (the agent), a target, and randomly generated walls, and our goal was to train the ball to reach the target while avoiding the walls. We accomplished this by rewarding the agent whenever it successfully reached the target. On each frame of the simulation we also collected information to pass into our neural network: the target’s location and the ball’s location and velocity. Finally, using the Python API, we trained the ball with Proximal Policy Optimization (PPO), running the simulation thousands of times to maximize the reward.
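The per-frame observation and reward logic above can be sketched in plain Python. This is an illustrative sketch, not our actual code — the real project implements this in ML-Agents’ C# Agent callbacks, and the function names and the touch radius here are assumptions for the example:

```python
import math

def collect_observations(ball_pos, ball_vel, target_pos):
    """Flatten the per-frame state into the observation vector fed to
    the network: target position, ball position, and ball velocity."""
    return list(target_pos) + list(ball_pos) + list(ball_vel)

def step_reward(ball_pos, target_pos, fell_off, touch_radius=1.0):
    """Sparse reward: +1 when the ball reaches the target.
    Returns (reward, episode_done)."""
    if fell_off:
        return 0.0, True   # no reward; episode ends, target respawns
    if math.dist(ball_pos, target_pos) < touch_radius:
        return 1.0, True   # success: maximum reward of 1
    return 0.0, False      # keep rolling toward the target
```

With a sparse reward like this, the signal only arrives when an episode ends, which is part of why training took thousands of simulations.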

PPO was created by researchers at OpenAI and is described aptly on their site: “PPO strikes a balance between ease of implementation, sample complexity, and ease of tuning, trying to compute an update at each step that minimizes the cost function while ensuring the deviation from the previous policy is relatively small.” Furthermore, ML-Agents is flexible in that it allows for two different types of learning: reinforcement learning and imitation learning. For our project we chose reinforcement learning, in which the neural network optimizes its policy by attempting to maximize expected reward. We chose it because our environment was dynamic, relying on player input, and we wanted our AI to adapt well to new situations.
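The property quoted above — keeping each update’s deviation from the previous policy small — comes from PPO’s clipped surrogate objective. A minimal sketch of that objective for a single sample follows; the epsilon of 0.2 is the default from the PPO paper, not a value tuned in our project:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from the PPO paper:
    L = min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    where `ratio` is pi_new(a|s) / pi_old(a|s) and A is the advantage.
    Clipping removes the incentive to move the policy far from the
    previous one in a single update."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)
```

For example, with a positive advantage, a probability ratio of 1.5 is clipped down to 1.2, capping how much credit a single update can claim.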

The AI (white) and human (black) controlled agents interact with each other.

In the first environment we trained in, the ball simply tried to find the target without any player input. The target would change position whenever the agent either touched it or fell off the map. We started without player input in order to get a working model that could chase the target. Below are the results after thousands of training iterations.
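Structurally, this first training setup is just repeated episodes that end when the agent touches the target or falls off the map, after which the target respawns. A minimal sketch of one such episode, assuming a hypothetical env/policy interface (ML-Agents runs this loop internally; the names here are illustrative):

```python
def run_episode(env, policy, max_steps=1000):
    """Roll out one episode: act on each frame's observation until the
    agent tags the target or falls off the map, then stop so the
    environment can reset with a new target position."""
    obs = env.reset()                      # new random target / layout
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)               # network picks an action
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:                           # touched target or fell off
            break
    return total_reward
```

Running thousands of these episodes and feeding the results back into PPO is what produced the training curves below.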

Tensorboard data for training without player input

Overall, the agent learned extremely well in this environment. The cumulative reward steadily increased and leveled off at 1 (the maximum reward) after about 22 thousand simulations. Furthermore, the episode length was cut in half by the time the training session ended, meaning the ball found the target more quickly as training went on. The next environment we trained the agent in incorporated player input as described above. The results were more of a mixed bag.

Tensorboard data for training with player input

Unfortunately, we were not able to reach the maximum average reward during this training session, which was quite evident when we tested the resulting model. As players moved away from the ball, the agent was noticeably slow to adjust and simply could not keep up. Additionally, the agent would often fall off the map when trying to make quick turns and adjustments. Lastly, the episode length varied significantly more during this session, as there were many instances where the ball would neither fall nor catch the target.

The first concerning aspect of gaming and machine learning is data collection. Gaming is, without question, an important part of entertainment for much of the population: according to the Entertainment Software Association of Canada, 80 percent of Canadians consider gaming “mainstream entertainment”. Gaming companies already collect data from their player bases, ranging from in-game decisions to individual chat messages. Platforms like massively multiplayer online (MMO) games make it easy to collect many forms of information from many sources and players. The same ESAC research shows that 64 percent of Canadians have played an online video game within the past month, and almost three quarters of all parents (71 percent) play video games with their children on a weekly basis.

For the ethical implications behind our project, our group wanted to focus on players’ in-game information rather than information that could be found on a profile (i.e. username, picture, etc.). This information can consist of in-game actions, a player’s actions per minute (APM), in-game chat logs, and more. We found this more relevant because machine learning within a multiplayer game needs to learn from the players, and this is the information that reflects the decisions players make in order to ‘win’. That said, collecting this information is not entirely harmless even at face value. For example, there have been several research studies on players’ psychological behavior based on in-game information; in theory, player profiles provide enough information to expose part of a player’s personality. And with the rising popularity of VR gaming, in-game actions can be more than clicks and keystrokes — revealing even more about the person behind them.

Another ethical issue that can arise from this is cheating at games. The fact that there exists a definitive winner and loser in games encourages players to find every possible way to win, whether for the satisfaction of victory or just to feel superior. In most games, as in sports, a player’s skill must be honed and trained to improve. Cheating tools subvert this, enabling less-skilled players to win against tougher opponents, and they ruin the gaming experience of every player in the game, even that of the cheaters themselves. Machines have already studied and “solved” many games, defeating even the best human players. With the aid of such systems, cheating tools could become even more powerful in the future: developers could look into the decision-making process of an AI, or analyze and design a smoother, more human-like aimbot. Without proper regulation, an abundance of cheaters will suck the fun out of online games and drive people away from them.

Finally, esports viewing has soared in popularity in recent years as a form of sporting competition within video games. It has become a significant selling point for many video games in the industry, with top developers actively designing and balancing their games around the top players, and it provides entertainment not only to the professionals playing but also to millions of viewers throughout the world. If machine learning can improve cheating tools, it can also help develop training tools for professionals: data analysis, in-game AI opponent simulation, or even predictive decision making. It is interesting to consider how much outside training tooling would be allowed in esports, since esports are not entirely comparable to traditional sports.

In the book The Grasshopper: Games, Life and Utopia, Bernard Herbert Suits proposes a definition of games that has since been cited and used widely in the study of games. He defines several concepts:

Prelusory Goal: the state of the game in which the player wins. This goal exists outside the game and must exist for the game to exist.

Lusory Goal: how to win the game. This is the player’s goal and only appears after the game has started.

Lusory Means: the ways of achieving the goal that are permitted by the rules.

Constitutive Rules: The set of rules of the game.

Lusory Attitude: the player’s attitude of accepting the rules of the game in order to engage in the activity the game makes possible.

This leads to his definition of playing a game: “To play a game is to attempt to achieve a specific state of affairs [prelusory goal], using only means permitted by rules [lusory means], where the rules prohibit the use of more efficient in favor of less efficient means [constitutive rules], and where the rules are accepted just because they make possible such activity [lusory attitude]”.

Machine learning embeds the rules of the game (explicitly or not) in its training. An AI agent analyzes the state of the game and tries to ‘play’ so as to achieve the highest possible reward, where the reward is predefined based on the rules and goals of the game. In our project, our autonomous agent tries to achieve the goal in the most efficient way, which yields the highest reward. Under the definition above, however, rules exist precisely to push players toward less efficient means. This sits oddly with the purpose of the agent: because the reward encodes the rules, following the rules is itself the most efficient path for the agent.

Let’s consider a perfect world in the future, where everything has a tool allowing it to be done efficiently, with or without human participation. In such a world, we have an abundance of time on our hands relative to today, and in that void entertainment would exist as something that gives meaning to our actions. Let’s then consider machine learning and AI in gaming generally: sports, online gaming, and more. Given enough time, these AI agents will become the perfect players of their games, besting any human opponent. Two scenarios arise out of this:

The first consequence is that leagues would form for AI and only AI, and humans could be entertained by watching the agents duel it out, much like watching soccer or esports. This would create a new form of entertainment for humans. (We would no doubt run into concerns about the ethics of working the AI without pay, but that is a discussion for another time.) The second consequence is that players will be discouraged from playing the game and improving their skills. For many, the goal of games is the satisfaction of achieving something, of winning; being up against something we cannot defeat in our physical state is demoralizing and soul-crushing. An increasing number of people are building their lives alongside video games, and a rise in AI players would take away from the human player experience.

In trying to find an intersection between our interests and machine learning, our team developed an artificial intelligence that plays a minigame and trains itself on the actions a human user inputs. Many ethical implications could plague an application like this in the future. In the short term, there are noble uses for AI agents implemented and played with in video games: such agents can shed light on the decision making that goes into playing a game, and we can analyze the decisions they make or even use them to find optimal solutions that were not previously discovered. However, there is a darker story in the long-term collection, analysis, and sharing of private data, and even the implementation of artificial agents in video games can be somewhat controversial in itself. Autonomous agents that mine this information during a match can create a very unfair game for the humans they interact with.

Entertainment is on the rise around the world, and with so much capital being invested and listed as prize money, human competitors are going to feel even more pressure to implement these agents. Advancements in AI and its application to video games on the market are steadily increasing, yet ethical discussions and regulations are currently almost nonexistent. If these trends stay on course for the next couple of years, we could easily see a higher prevalence of cheaters utilizing AI agents for personal gain and other ‘unethical’ purposes.
