Tennis Shots Identification using YOLOv7-Pose Estimation and LSTM

Published in

Augmented Startups

Jan 9, 2023

Figure 1: Forehand Ground Stroke identification using YOLOv7 Pose Estimation and LSTM

In this article, we go into the details of how the YOLOv7 Pose Estimation model and an LSTM model can be combined to identify different tennis shots.

In this project, I classified 4 different tennis shots:

Forehand-GS (Ground Stroke)
Backhand-GS (Ground Stroke)
Forehand-Volley
Backhand-Volley

As there are 8 basic shots in tennis, you can extend this project to identify the remaining ones as well: the serve, return, slice and overhead shot.

Let’s look at how we can train an LSTM model to identify these 4 tennis shots (Forehand-GS, Backhand-GS, Forehand-Volley, Backhand-Volley).

The project was implemented in the following steps:

The dataset used to train the model was gathered from various YouTube videos, and clips of the 4 tennis shots listed above were cut out of these videos.
Each video frame was then fed into the YOLOv7 Pose Estimation model, and the predicted keypoints (the X-coordinate, Y-coordinate and confidence score of each) were extracted and stacked into sequences of 30 frames.
This process was repeated for each class label, and the results were saved as a NumPy array. The input to the LSTM model therefore has shape (n, 30, 51), where n is the number of training examples. Each of the 30 frames in a sequence contains 51 features: the X-coordinate, Y-coordinate and confidence score of each of the 17 keypoints (17 × 3 = 51).
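
A minimal NumPy sketch of how per-frame keypoints could be stacked into the (n, 30, 51) training array. It assumes each clip has already been reduced to a (num_frames, 51) array of keypoint features, that clips are split into non-overlapping 30-frame windows (leftover frames are dropped), and that labels are one-hot encoded. The names `make_sequences`, `build_dataset` and `CLASSES` are hypothetical, not taken from the original project.

```python
import numpy as np

SEQ_LEN = 30          # frames per sequence, as described in the article
NUM_FEATURES = 51     # 17 keypoints x (x, y, confidence)
# Hypothetical class order; any fixed order works as long as it is reused at inference.
CLASSES = ["Forehand-GS", "Backhand-GS", "Forehand-Volley", "Backhand-Volley"]


def make_sequences(clip_features, seq_len=SEQ_LEN):
    """Split a (num_frames, 51) array into non-overlapping (seq_len, 51) windows.

    Trailing frames that do not fill a complete window are dropped.
    """
    clip_features = np.asarray(clip_features, dtype=np.float32)
    num_windows = clip_features.shape[0] // seq_len
    usable = clip_features[: num_windows * seq_len]
    return usable.reshape(num_windows, seq_len, clip_features.shape[1])


def build_dataset(clips_by_class):
    """Stack windows from every clip of every class into X (n, 30, 51) and one-hot y (n, 4)."""
    sequences, labels = [], []
    for class_index, class_name in enumerate(CLASSES):
        for clip in clips_by_class.get(class_name, []):
            windows = make_sequences(clip)
            sequences.append(windows)
            labels.extend([class_index] * len(windows))
    if sequences:
        X = np.concatenate(sequences, axis=0)
    else:
        X = np.empty((0, SEQ_LEN, NUM_FEATURES), dtype=np.float32)
    # Row k of the identity matrix is the one-hot vector for class k.
    y = np.eye(len(CLASSES), dtype=np.float32)[np.array(labels, dtype=int)]
    return X, y


rng = np.random.default_rng(0)
# One random 65-frame clip per class -> 2 full windows each (5 frames dropped).
clips = {name: [rng.random((65, NUM_FEATURES))] for name in CLASSES}
X, y = build_dataset(clips)
print(X.shape, y.shape)  # (8, 30, 51) (8, 4)
```

The resulting arrays could then be persisted with `np.save` before training; windows could also overlap (a stride smaller than 30) if more training examples are needed, which is a design choice rather than something stated in the article.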

Code Snippet for Keypoint Prediction and Tennis Shot Classification
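
As a minimal sketch of the keypoint-extraction step, the function below turns one frame's pose detections into the 51-value feature vector. It assumes each detection row follows the layout of the YOLOv7 repository's `output_to_keypoint` helper (7 leading columns: batch id, class id, box x, y, w, h, confidence, followed by 17 interleaved x, y, confidence triples); check this against your version of the code. It also assumes the player of interest is the highest-confidence person and that a frame with no detection yields zeros. `frame_features`, `CONF_COL` and `KPT_START` are hypothetical names.

```python
import numpy as np

NUM_KEYPOINTS = 17
NUM_FEATURES = NUM_KEYPOINTS * 3   # (x, y, confidence) per keypoint = 51
CONF_COL = 6                       # assumed column holding the person-box confidence
KPT_START = 7                      # assumed first keypoint column


def frame_features(detections):
    """Return a (51,) vector for the most confident person in one frame.

    `detections` is assumed to have shape (num_people, 7 + 51);
    an empty array (nobody detected) yields all zeros.
    """
    detections = np.asarray(detections, dtype=np.float32)
    if detections.size == 0:
        return np.zeros(NUM_FEATURES, dtype=np.float32)
    best = detections[np.argmax(detections[:, CONF_COL])]
    # Values are interleaved as x1, y1, c1, x2, y2, c2, ...;
    # reshape(17, 3) would recover one row per keypoint.
    return best[KPT_START:KPT_START + NUM_FEATURES]
```

Calling this on every frame of a clip and stacking the results with `np.stack` gives the (num_frames, 51) array that is then cut into 30-frame sequences.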

Model Training

An LSTM model is used for training. I preferred an LSTM because it has feedback connections, which allow it to process an entire sequence of data (here, 30 frames of keypoints) rather than only single data points such as individual images.
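
To make the "feedback connection" concrete, here is a minimal NumPy forward pass of a single LSTM layer over one (30, 51) sequence, using the standard LSTM gate equations. The weights are random and untrained, and the hidden size of 8 is arbitrary; frameworks such as Keras perform the same computation internally with learned weights.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(sequence, W, U, b):
    """Run one LSTM layer over a (timesteps, features) sequence.

    W: (features, 4*hidden) input weights, U: (hidden, 4*hidden) recurrent weights,
    b: (4*hidden,) bias. Gate order in the stacked weights: input, forget, candidate, output.
    Returns the hidden state after every timestep, shape (timesteps, hidden).
    """
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in sequence:
        # The previous hidden state h is fed back in: this is the recurrent connection.
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:hidden])                 # input gate
        f = sigmoid(z[hidden:2 * hidden])       # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])   # candidate cell values
        o = sigmoid(z[3 * hidden:])             # output gate
        c = f * c + i * g                       # cell state carries memory across frames
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


rng = np.random.default_rng(0)
features, hidden = 51, 8
W = rng.normal(scale=0.1, size=(features, 4 * hidden))
U = rng.normal(scale=0.1, size=(hidden, 4 * hidden))
b = np.zeros(4 * hidden)
states = lstm_forward(rng.random((30, features)), W, U, b)
print(states.shape)  # (30, 8)
```

A classifier typically uses only the last hidden state, `states[-1]`, as a summary of the whole 30-frame shot.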

Code Snippet of the LSTM Architecture:
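
A hedged Keras sketch of a stacked-LSTM classifier with the input shape (30, 51) and 4 output classes described above. The number of layers, layer sizes, activations, optimizer and loss are illustrative assumptions, not necessarily the author's exact architecture; `build_lstm_classifier` is a hypothetical name.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 30       # frames per sequence
NUM_FEATURES = 51  # 17 keypoints x (x, y, confidence)
NUM_CLASSES = 4    # Forehand-GS, Backhand-GS, Forehand-Volley, Backhand-Volley


def build_lstm_classifier(seq_len=SEQ_LEN, num_features=NUM_FEATURES, num_classes=NUM_CLASSES):
    """Stacked-LSTM classifier; layer sizes are illustrative assumptions."""
    model = keras.Sequential([
        keras.Input(shape=(seq_len, num_features)),
        # return_sequences=True passes the full 30-step output to the next LSTM layer
        layers.LSTM(64, return_sequences=True),
        layers.LSTM(128, return_sequences=True),
        # The last LSTM returns only its final hidden state, summarising the sequence
        layers.LSTM(64),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["categorical_accuracy"])
    return model


model = build_lstm_classifier()
dummy = np.zeros((1, SEQ_LEN, NUM_FEATURES), dtype="float32")
print(model.predict(dummy, verbose=0).shape)  # (1, 4)
```

With X of shape (n, 30, 51) and one-hot labels y of shape (n, 4), training would then be a call such as `model.fit(X, y, epochs=..., validation_split=0.1)`; the number of epochs is left open here because the article does not state it.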

Model Testing and Prediction

The keypoints predicted by the Pose Estimation model are stacked into a sequence of 30 frames and passed to the trained model to classify the shot. The model returns the class probabilities, and np.argmax is then used to find the index of the class with the highest probability.
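
A minimal sketch of this prediction step, assuming a rolling buffer that always holds the latest 30 frames so that, once it is full, a prediction is made on every new frame. `ShotClassifier` and `predict_fn` are hypothetical names; `predict_fn` stands in for the trained model (for a Keras model it could be `lambda batch: model.predict(batch, verbose=0)`), and the class order is assumed to match the one used during training.

```python
from collections import deque

import numpy as np

SEQ_LEN = 30
# Assumed class order; it must match the order used when building the training labels.
CLASSES = ["Forehand-GS", "Backhand-GS", "Forehand-Volley", "Backhand-Volley"]


class ShotClassifier:
    """Keeps the last 30 per-frame feature vectors and classifies them once the buffer is full.

    `predict_fn` maps a (1, 30, 51) batch to (1, num_classes) class probabilities.
    """

    def __init__(self, predict_fn, classes=CLASSES, seq_len=SEQ_LEN):
        self.predict_fn = predict_fn
        self.classes = classes
        self.buffer = deque(maxlen=seq_len)  # the oldest frame drops out automatically

    def update(self, frame_features):
        """Add one frame's 51 features; return (label, probability), or None until 30 frames are buffered."""
        self.buffer.append(np.asarray(frame_features, dtype=np.float32))
        if len(self.buffer) < self.buffer.maxlen:
            return None
        batch = np.stack(list(self.buffer))[np.newaxis]  # shape (1, 30, 51)
        probs = np.asarray(self.predict_fn(batch))[0]
        best = int(np.argmax(probs))                      # index of the most probable class
        return self.classes[best], float(probs[best])
```

The sliding window gives a label for every frame after the first 30; clearing the buffer after each prediction would instead give one label per non-overlapping 30-frame segment, which is an equally valid design choice.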

Demo Video:



Written by Muhammad Moin