Our winning approach to a data science challenge in Junction 2018

Pau Batlle
Dec 3, 2018 · 8 min read

Junction 2018, the biggest hackathon in Europe, took place one week ago. It is a huge event that brings together 1300+ people from all over the world, competing over the weekend for different prizes across a huge variety of tracks and over 30 challenges. I participated in the Epic Games Data Science challenge together with fellow Polytechnic University of Catalonia (UPC) students and good friends Ramón García and Guillem Ramírez, and we won first prize in the challenge. In this post I would like to present the problem we faced and our solution, built in just under 36 hours. The code for our solution can be found here.

The problem

The problem centered on the relatively new free-to-play multiplayer action game The Darwin Project, developed by the Canadian studio Scavenger Studio. Specifically, the challenge was to use data from game matches to build a tool that helps players of different skill levels improve their in-game performance (a sort of game coach, if you will). Apart from the data, we were also given the chance to play the actual game and talk to some of the game developers, which turned out to be a great experience and helped us a lot in understanding the data. The data they gave us was really extensive, including player positions, orientations, weapons, and actions at each frame of the game for different matches, totalling around 20 GB.

The map in The Darwin project is formed of seven hexagons

Data exploration

The first few hours were all about diving into the huge amount of data. We combined data processing with playing the game, which gave us a very accurate picture of the different possible strategies for success and of what might distinguish players from one another. Playing the game was a real shortcut to understanding some of the data, starting from simple things such as the shape of the map, which, as shown in the figure on the left, is formed of seven hexagons. That helped us create different visualizations of matches, such as the following one.

Did you notice something odd in the trajectory of the yellow player? There are indeed portals in the game that let you teleport across the map (funny thing: we discovered them in the data before we discovered them in the game). As mentioned, we were also given data about player orientation (via orientation quaternions) and the 3D field of view of each player; in particular, we could know whether a given player was within the field of view of another one, which we used to define one of the key terms of our project. An encounter from player A to player B (we say that 'A started an encounter towards B') is the event that begins whenever player B enters the field of view of player A (and A actively moves towards B afterwards), and ends when B has taken damage from A, when some time has passed, or when B has escaped from A's field of view.

Visualization of a 2D projection of the map and field of view of the different players

The fact that this is not symmetrical is key: not every time that B is looking at A is A looking at B, and as explained later, this asymmetry helped us define some useful metrics. We created a visualization that takes player orientation into account to better explain what we mean by field of view. In the video, the area between the two lines arising from each player is the direction the player is looking, and anything within that cone that also falls inside the red circle is visible to the player.
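To make the idea concrete, here is a minimal sketch of a field-of-view test like the one just described. It is not our actual hackathon code: the forward axis, the half-angle, and the view range are illustrative assumptions, and the quaternion convention (w, x, y, z) is an assumption as well.

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate vector v by a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = np.array([x, y, z])
    return 2 * np.dot(u, v) * u + (w * w - np.dot(u, u)) * v + 2 * w * np.cross(u, v)

def in_field_of_view(pos_a, quat_a, pos_b, half_angle_deg=60.0, max_range=50.0):
    """True if player B lies inside player A's view cone (illustrative parameters)."""
    # Assumed convention: the player's forward direction is the rotated x-axis.
    forward = quat_rotate(quat_a, np.array([1.0, 0.0, 0.0]))
    to_b = pos_b - pos_a
    dist = np.linalg.norm(to_b)
    if dist == 0 or dist > max_range:  # the "red circle": limited view range
        return False
    cos_angle = np.dot(forward, to_b) / (dist * np.linalg.norm(forward))
    return cos_angle >= np.cos(np.radians(half_angle_deg))
```

An encounter detector would then watch this predicate over consecutive frames, opening an encounter from A to B when it flips to true and A moves towards B, and closing it on damage, timeout, or escape.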

After this data exploration and familiarization stage, it was time to decide which kind of application would suit the players the most and what information we should present in it. The difficulty here, in comparison to other data science challenges I have faced, is that the most important thing was not maximizing a particular metric but improving the user experience for players of different skill levels. We concluded that the best approach was to present the player with a dashboard summarizing their in-game performance through different metrics, along with tips for improving. However, the game can be played in different styles, and we wanted to take that into account too. Therefore, we divided the metrics we wanted to show the user into objective metrics and subjective metrics. By objective metrics we mean metrics where, without any doubt, a higher score means playing better, and which can help distinguish between players of different skill levels; by subjective metrics we mean metrics that reflect the player's style of play and usually represent a trade-off between two aspects of the game.

We chose metrics that were both interpretable and correlated well with the labeled players we had (a few of the players we analyzed already had a tag of 'beginner', 'intermediate', or 'pro', and we used those to validate that the metrics made sense). Below you can find the specific definition of each of the metrics used to analyze a player's performance. Feel free to skip this section if you are only interested in the final project submission.

Objective metrics

Offensive Rating: Defined as the ratio of encounters started by the player towards another player that ended with the player damaging the other player, over the total encounters started by the player.

Defensive Rating: Defined as the ratio of encounters started from other players towards the player that ended in the player not taking damage over the total encounters started towards the player.

Backstab Rating: Defined as the ratio of encounters started by the player that were not matched (that is, the other player did not start an encounter back because they were not looking at the attacker, so we can say the attacker "backstabbed" them) over the total encounters started by the player.

Back Defense Rating: Defined as the ratio of encounters started towards the player that were matched (that is, the player was looking at the attacker and was therefore not backstabbed) over the total encounters started towards the player.

Accuracy: Ratio of successful weapon shots over total weapon shots.
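The four encounter-based ratings above are all ratios over a log of encounters, so they can be sketched in a few lines. This is an illustrative reconstruction, not our submitted code; the `Encounter` record and its fields are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Encounter:
    attacker: str        # player who started the encounter
    target: str          # player the encounter was started towards
    dealt_damage: bool   # attacker damaged the target before the encounter ended
    matched: bool        # target was also looking at the attacker (no backstab)

def objective_ratings(encounters, player):
    """Compute the four encounter-based objective ratings for one player."""
    started = [e for e in encounters if e.attacker == player]
    received = [e for e in encounters if e.target == player]

    def ratio(hits, total):
        return len(hits) / len(total) if total else 0.0

    return {
        "offensive": ratio([e for e in started if e.dealt_damage], started),
        "defensive": ratio([e for e in received if not e.dealt_damage], received),
        "backstab": ratio([e for e in started if not e.matched], started),
        "back_defense": ratio([e for e in received if e.matched], received),
    }
```

Accuracy would be computed the same way from a shot log: successful shots over total shots.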

Electronics Control: This one needs a detailed explanation and was one of the most interesting things we did in the challenge. Remember how the map is formed of seven hexagons? There are seven points on the map, roughly at the center of each hexagon, that are important for the game because at any time an electronic device can drop at one of them.

Voronoi diagram evolution in the beginning of the game
Voronoi diagram evolution in the late game

An electronic device is a scarce resource in the game that gives players a huge advantage, because with it they can craft a powerful modifier. Therefore, being able to reach the key points before other players is really important. To check which player would arrive first at each point, we drew a Voronoi diagram of the player positions at each frame (we also animated the diagrams, as seen in the videos on the left). A player's Voronoi region is the set of points in the plane that are closer (in Euclidean distance) to that player than to any other player. Knowing which player's region contained each key point told us roughly which player would be able to reach an electronic device there first, so we awarded points for having key points inside one's Voronoi region: the metric counts the number of frames and the number of key points that the player had inside their region. Note that this approximation takes into account neither the portals mentioned before nor the speed improvements players can obtain during the game. These details can be accounted for (although they make the algorithm for computing the diagram far more difficult, and we did not have time to do so during the event), and in fact I later coded something that did, but since it was not part of our hackathon project I will leave that for a (near) future post.
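For the metric itself there is a useful shortcut: a key point lies inside a player's Voronoi region exactly when that player is its nearest neighbor, so no explicit diagram is needed to score control (the diagram is still what you would draw for the animations). A minimal sketch under that observation, with hypothetical data shapes:

```python
import numpy as np

def electronics_control(frames, key_points):
    """Count, per player, the (frame, key point) pairs they control.

    frames: list of dicts mapping player id -> (x, y) position at that frame.
    key_points: list of (x, y) coordinates of the seven hexagon centers.
    A key point is controlled by its nearest player (Euclidean distance),
    i.e. the player whose Voronoi region contains it.
    """
    scores = {}
    for positions in frames:
        ids = list(positions)
        pts = np.array([positions[p] for p in ids])
        for kp in key_points:
            nearest = ids[int(np.argmin(np.linalg.norm(pts - np.asarray(kp), axis=1)))]
            scores[nearest] = scores.get(nearest, 0) + 1
    return scores
```

Handling portals or per-player speeds would replace the Euclidean distance with a shortest-travel-time computation, which is exactly what makes the generalized diagram harder.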

Subjective metrics

Risk: Takes into account how many encounters the player has willingly created.

Greediness: Takes into account the trade-off between stockpiling resources to craft better equipment later on and using them right away to craft cheaper items.

Altitude Control: Ranks the player's average altitude along the z-axis, which is also part of a trade-off, since higher altitudes mean more visibility but fewer resources.

Opponent proximity: Ranks the average closeness between the analyzed player and the other players.

User interface and final project submission

All of the metrics are presented to the player on a screen similar to the one below. Subjective metrics are ranked from -1 to 1, where 0 is the average of all players analyzed.
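One simple way to map a raw subjective metric onto a [-1, 1] scale with 0 at the population average is to center on the mean and divide by the largest absolute deviation. This is one possible scheme, not necessarily the exact scaling our dashboard used:

```python
import numpy as np

def to_subjective_scale(values):
    """Map raw metric values to [-1, 1], with 0 at the population mean.

    Centers on the mean, then divides by the maximum absolute deviation,
    so the most extreme player lands at -1 or 1 (illustrative scheme).
    """
    v = np.asarray(values, dtype=float)
    centered = v - v.mean()
    peak = np.max(np.abs(centered))
    return centered / peak if peak > 0 else np.zeros_like(centered)
```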

There is also a screen where the player can find specific tips for improving in the game. We identify the player's skill level using the objective metric scores and, based on that, weigh the subjective metrics more or less heavily in the final tips. The idea is, for example, that if a 'pro' player plays riskily but gets kills and performs well, we should not ask them to stop playing that way (we should, however, inform them in a descriptive way); but if we detect a beginner playing too riskily and getting poor results as a consequence, we should definitely ask them to be more careful in the future.
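The skill-dependent tip logic above can be sketched as a toy selector. The thresholds, metric names, and tip texts here are all illustrative placeholders, not the ones from our submission:

```python
def select_tips(objective_score, subjective, pro_cutoff=0.75, beginner_cutoff=0.4):
    """Toy tip selector: infer a skill level from the aggregate objective
    score (hypothetical cutoffs), then phrase risk feedback accordingly."""
    if objective_score >= pro_cutoff:
        level = "pro"
    elif objective_score >= beginner_cutoff:
        level = "intermediate"
    else:
        level = "beginner"

    tips = []
    risky = subjective.get("risk", 0.0) > 0.5
    if risky and level == "beginner":
        # Beginners get prescriptive advice.
        tips.append("You engage in many fights early on; try playing more cautiously.")
    elif risky and level == "pro":
        # Pros get descriptive feedback, not a request to change style.
        tips.append("You play an aggressive style, and your results support it.")
    return level, tips
```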

Different tips for different player levels. Beginners (left) should be given clearer and more general insights, while professionals (right) can be given more specific insights based on the metrics, taking the playstyle less into account

The combination of clear objective metrics, interesting subjective metrics, and final tips gives the player both concrete advice for improving and numbers to check whether they are improving, and this holds for players of all skill levels, which was one of the difficult aspects of this challenge and one of the main reasons the judges awarded us first prize.

Conclusions and takeaways

In conclusion, this challenge was a very enjoyable learning experience. My biggest takeaway is the importance of the context of a given dataset. As we played the game, more and more of the data began making sense to us. We drew conclusions about the importance of certain features and actions that would have been very difficult to reach by just looking at the data. And not only did we learn a lot about the data by playing the game, we also learned some game strategies by looking at the data! Having data and context together in this challenge was really important to avoid getting lost in the gigabytes of numbers. It was good to see a data science challenge that focused on interpretability and user experience rather than on maximizing a score metric. Creating metrics that were both interpretable and useful was not a trivial challenge.

Finally, a huge thanks to my teammates for being so hard-working and amazing to work with, and also to the Epic Games and Scavenger Studio teams for making the challenge possible and for helping us throughout the weekend with the development of our tool.

My team together with Epic Games and Scavenger Studio developers
