Deep Learning in the Deep Side of the Pool

Working with the latest technologies is quite a fulfilling job. You can explore the unexplored, create new things that never existed before, and open up new opportunities for others to do the same. Revolutionary technologies can have an impact on many, if not all, industries. I believe artificial intelligence is one of those technologies today.

One of my learning strategies is to apply my technical expertise to my non-technical interests and hobbies. I try to figure out how a given technology could be used in them. This helps me focus on the details that really matter, and it also motivates me to keep learning in order to make the final solution better.

In this article I will explore the application of deep learning to live sports broadcasting for a lesser-known sport, underwater rugby.

About Underwater Rugby

Underwater rugby is a team sport typically played in the diving well of a swimming pool. The goal is to place a negatively buoyant ball into a basket at the bottom of the pool. A team is usually divided into 3 pairs of players based on the three common positions: forward, defender and goalie. These positions help the players focus and optimize their air consumption, so there is always somebody protecting the goal or ready to receive a pass.

© Daniel Naujoks — New Jersey Hammerheads Underwater Rugby

The sport is most popular in Europe, but the USA, Asia, Australia and Colombia are also catching up. A good example is our team, which didn’t exist 4 years ago; now we have 18 players on average and practice in 2 pools in the San Francisco Bay Area.

Since this is the only sport played in three-dimensional space, it is a unique experience to play, but it’s very hard to capture the same feeling with cameras. This is the problem I’m trying to solve with technology.

About Deep Learning

Deep learning is a branch of machine learning that has attracted a lot of attention in the last few years. It’s based on the old concept of neural networks, but spiced up with many tricks that make it a very powerful learning machine for hard problems like speech recognition, translation or image recognition.

The basic idea, in layman’s terms, is that you give your inputs and the desired outputs to a big neural network, and it learns how to make that inference. This knowledge is represented as thousands or even millions of numbers, which control how information flows through the network. One key difference from traditional machine learning algorithms is that it can work on raw data, like images or sound, so you don’t need to simplify the data first, a step we call feature engineering.
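To make the "inputs, desired outputs, and learned numbers" idea concrete, here is a minimal sketch in plain Python. It trains a single artificial neuron, the smallest building block of a neural network, on a toy problem (logical OR). It is not the network used later in this article; the function names, the toy data and the learning settings are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(examples, epochs=2000, lr=1.0):
    """Fit one neuron to (inputs, target) pairs with batch gradient descent."""
    n_inputs = len(examples[0][0])
    weights = [0.0] * n_inputs  # the "numbers" that store the knowledge
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for inputs, target in examples:
            prediction = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = prediction - target  # gradient of the log loss w.r.t. z
            for i, x in enumerate(inputs):
                grad_w[i] += error * x
            grad_b += error
        # Move every number a small step against the average gradient.
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(examples)
    return weights, bias


def predict(weights, bias, inputs):
    """Answer yes (True) or no (False) for one input."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias) >= 0.5


# Toy data (hypothetical): the desired output is the logical OR of two inputs.
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_neuron(examples)
predictions = [predict(weights, bias, inputs) for inputs, _ in examples]
print(predictions)  # [False, True, True, True]
```

A deep network stacks many such units in layers and learns millions of weights instead of three, but the loop is the same in spirit: predict, compare with the desired output, and nudge the numbers.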

There are many unresolved problems, though, that make this type of algorithm hard to use. It usually requires millions of examples, which means you have to collect that many inputs and provide the matching outputs. Tuning these networks takes specialized expertise, and there are many traps. Finally, you also need powerful machines to speed up the exploration process. But things are changing very quickly, as many scientists are working in this field at the moment, so it’s good to keep an eye on recent results.

Practice highlights

Let’s see how we use this technology in “practice”.

Yes, we record our practices “for quality and educational purposes”. We mount one camera close to the basket and record the whole one-hour game. Although the game can be very intense, usually only 25% of the footage shows real action, which makes the replay pretty boring. So the idea is to create highlights, and this is where deep learning can help.

In this situation, we want to teach the machine to recognize whether there is any action going on in front of the camera. This is called a binary classification problem: you want only a simple yes or no answer for a given input. To teach our machine, we have to show it many examples of both cases, like the following:

Sample image from our practices

The type of neural network we use for this is called a convolutional network. Once we have trained it for hours, we can use the resulting model to recognize the frames with action. As a last step, we use a video editing library to remove the boring parts and keep only the action-packed seconds.
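The step between "frames with action" and "cut the video" can be sketched in a few lines. The example below assumes the trained model has already produced one action score between 0 and 1 per frame, and turns those scores into time ranges that a video editor could keep. The function name, the threshold and the gap and length settings are my assumptions for illustration, not the exact logic we run.

```python
def action_segments(frame_scores, fps, threshold=0.5, max_gap=1.0, min_length=2.0):
    """Turn per-frame action scores into (start, end) highlight times in seconds."""
    # 1. Collect runs of consecutive frames whose score passes the threshold.
    segments = []
    start = None
    for index, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = index
        elif score < threshold and start is not None:
            segments.append([start / fps, index / fps])
            start = None
    if start is not None:
        segments.append([start / fps, len(frame_scores) / fps])

    # 2. Merge runs separated by a short pause so the highlight does not flicker.
    merged = []
    for segment in segments:
        if merged and segment[0] - merged[-1][1] <= max_gap:
            merged[-1][1] = segment[1]
        else:
            merged.append(segment)

    # 3. Drop highlights that are too short to be worth watching.
    return [(s, e) for s, e in merged if e - s >= min_length]


# Hypothetical scores for 13 frames sampled at 2 frames per second.
scores = [0.1, 0.9, 0.8, 0.2, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.7, 0.1]
highlights = action_segments(scores, fps=2)
print(highlights)  # [(0.5, 3.5)]
```

Merging short pauses and dropping very short clips is a design choice: a single misclassified frame should neither split a highlight nor create one.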

One might ask why we don’t use something simpler, like detecting the ball based on color or motion in the video. Those could also be perfectly good solutions, and probably less complex to run, but they would be very specific to the given sport. What we wanted to prove here is that the algorithm can learn from our intention alone, that is, from examples of what we want, without being told how to do it. So, for example, if we have many raw videos and the corresponding highlights for a given sport, a machine could learn the logic of creating highlights automatically.

Multiple cameras

The same approach can also work with multiple cameras, as long as their fields of view don’t overlap too much. Once they do, cutting the videos becomes a more complex and perhaps aesthetic task, but it should still be possible.
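One simple way to think about the multi-camera case is as a camera selection problem. The sketch below assumes each camera already has its own list of action scores (one per time step, from the same kind of classifier) and picks the camera with the most action, while holding each shot for a minimum number of steps so the cut does not jump back and forth. The function name and the hold rule are assumptions for illustration.

```python
def choose_camera(scores_per_camera, min_hold=3):
    """Pick one camera index per time step from per-camera action scores.

    scores_per_camera holds one equal-length list of scores per camera.
    A switch is only allowed after the current camera was shown for
    at least min_hold steps.
    """
    n_steps = len(scores_per_camera[0])
    n_cameras = len(scores_per_camera)
    choices = []
    current = None
    held = 0
    for t in range(n_steps):
        best = max(range(n_cameras), key=lambda c: scores_per_camera[c][t])
        if current is None or (best != current and held >= min_hold):
            current = best
            held = 0
        choices.append(current)
        held += 1
    return choices


# Hypothetical scores: the action moves from camera 0 to camera 1.
camera_0 = [0.9, 0.8, 0.2, 0.1, 0.1, 0.1]
camera_1 = [0.1, 0.2, 0.9, 0.9, 0.9, 0.2]
cut = choose_camera([camera_0, camera_1])
print(cut)  # [0, 0, 0, 1, 1, 1]
```

Note that the switch to camera 1 happens one step late because of the hold rule. This is the kind of trade-off between accuracy and watchability that makes the multi-camera cut partly an aesthetic decision.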

The following video is a good example of this kind of input:

Practice of Newark UnderWater Recreation

Other ideas

Marking the ball

Underwater rugby is a three-dimensional sport, which makes it very hard to follow the ball, because it can be covered from any angle. An algorithm could detect or predict the location of the ball and mark it on the screen even when it’s hidden in a given frame.

Marking the ball
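As a rough sketch of the "predict" half of this idea, the example below assumes some detector (not shown, and hypothetical here) returns the ball's screen position for each frame, or None when the ball is hidden. For hidden frames it simply continues the ball's last observed motion in a straight line. This is a deliberately simple stand-in, not a learned model, and the function name is an assumption.

```python
def mark_ball(detections):
    """Estimate the ball position for every frame.

    detections holds one (x, y) tuple per frame, or None when the ball
    is hidden. Hidden frames are filled by constant-velocity extrapolation
    from the most recent detection.
    """
    estimates = []
    last_seen = None  # (frame_index, x, y) of the most recent detection
    velocity = (0.0, 0.0)  # estimated movement per frame
    for frame, detection in enumerate(detections):
        if detection is not None:
            x, y = detection
            if last_seen is not None:
                gap = frame - last_seen[0]
                velocity = ((x - last_seen[1]) / gap, (y - last_seen[2]) / gap)
            last_seen = (frame, x, y)
            estimates.append((x, y))
        elif last_seen is not None:
            gap = frame - last_seen[0]
            estimates.append(
                (last_seen[1] + velocity[0] * gap, last_seen[2] + velocity[1] * gap)
            )
        else:
            estimates.append(None)  # nothing to extrapolate from yet
    return estimates


# Hypothetical detections: the ball is hidden in frames 2 and 3.
track = mark_ball([(0, 0), (2, 1), None, None, (8, 4)])
print(track)  # [(0, 0), (2, 1), (4.0, 2.0), (6.0, 3.0), (8, 4)]
```

A straight-line guess breaks down quickly when a player changes direction with the ball, which is exactly where a model trained on real game footage could do better.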

Removing bubbles

One of the challenges of broadcasting underwater sports is that the video quality really depends on the water quality. In contrast to soccer, for instance, it’s impossible to use a single camera to cover the whole field, because very often the goal is not visible from the opposite side. Even if we had super clean water, bubbles would still obstruct the view.

There are algorithms that can reconstruct a very realistic picture from blurred or partial pictures. However, at the current stage these methods take a lot of time to run on a single image, so they can’t work on live video.

Down-time for players

Tracking the players would enable us to create statistics or on-screen information about them. This would make the games much more interesting and dramatic to watch. Experienced viewers, who are usually players from other teams, can sense when the goalie will need to go up soon, but for a wider, newer audience this information would make watching more fun.
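Once players are tracked, the statistic itself is easy to compute. The sketch below assumes a tracker (hypothetical, not shown) reports for every frame whether a given player is at the surface, and counts how long the player has been down since the last breath. The function name, the player labels and the data are made up for illustration.

```python
def current_down_time(at_surface, fps):
    """Seconds a player has spent underwater since last being at the surface.

    at_surface holds one boolean per frame, oldest first.
    """
    frames_down = 0
    for surfaced in reversed(at_surface):
        if surfaced:
            break
        frames_down += 1
    return frames_down / fps


# Hypothetical tracking output at 2 frames per second.
tracks = {
    "goalie": [True, True, False, False, False, False],
    "forward": [False, False, False, True, True, False],
}
down_times = {player: current_down_time(t, fps=2) for player, t in tracks.items()}
print(down_times)  # {'goalie': 2.0, 'forward': 0.5}
```

Shown on screen next to each player, a number like this would give new viewers the same hint that experienced players read from the game itself.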

Want more rugby?

If you are interested in playing underwater rugby and are located in the San Francisco Bay Area, feel free to join us.

Discussion

I would like to hear more ideas for underwater rugby or other sports. Please comment or contact me if this article inspired you.