A.I Exercise Posture Assistant using GPT-4 Vision with a FastAPI backend and a Streamlit frontend

Dream AI
7 min read · May 2, 2024


Author: Rafay Farhan at DreamAI Software

As our lifestyles become increasingly sedentary, fueled by prolonged screen time and idle habits, the importance of physical activity for both physical and mental well-being has become undeniable. Maintaining proper posture during exercise matters just as much, since it directly affects both the effectiveness of a workout and the prevention of injuries. Incorrect posture can lead to muscle imbalances, joint strain, and decreased range of motion, hindering progress and increasing the risk of long-term complications. To address these issues, we built the AI Exercise Posture Assistant. Using the OpenAI Python library, the app analyzes a video of your workout and returns personalized feedback and guidance on your form. Whether you’re a seasoned athlete or just beginning your fitness journey, the AI Exercise Posture Assistant is here to help you reach your goals safely and effectively. Let’s dive right in and get moving!

The GitHub Repository for this code


Typically, I prefer to organize tasks by breaking them down into distinct phases. Given that we’re dealing with three primary components, we’ll split the work into three phases accordingly:

1. Leveraging the power of GPT-4 Vision (with some video processing utility functions)

2. FastAPI backend

3. Putting it all together in Streamlit

How to use the OpenAI API

Before getting into coding, it’s crucial to understand how we’ll utilize the OpenAI API. Firstly, it’s important to note that it comes with a cost, and you can find all the pricing details at the provided link. However, rest assured that to accomplish our objectives, we’ll strive to minimize expenses as much as possible.

Upon completing the purchase, you’ll receive a confidential API key, granting access to the capabilities of OpenAI. Here’s how you’ll import OpenAI and integrate your unique key.

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv() # Loads your API key from the .env file
oai = OpenAI()

MODEL = "gpt-4-vision-preview"
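As a minimal sketch of what happens behind `load_dotenv()`: when no key is passed explicitly, the `OpenAI()` client reads it from the `OPENAI_API_KEY` environment variable. The helper `require_api_key` below is a hypothetical name, not part of the OpenAI library. It only illustrates failing early with a clear message when the key is missing.

```python
import os


def require_api_key(env=None) -> str:
    """Return the OpenAI key from the environment or raise a clear error.

    `require_api_key` is a hypothetical helper for illustration only.
    """
    # Allow a plain dict to be passed in so the check is easy to test
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling `require_api_key()` right after `load_dotenv()` would surface a missing key before any paid request is attempted.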

Phase 1: Leveraging the power of GPT-4 Vision (with some video processing utility functions)

The core of this application is the OpenAI library, which gives us access to pre-trained vision models. There’s no need to create a dataset or train anything ourselves; the models are already trained, which is also why using them isn’t free. Our app’s task is to analyze the video provided by the user and generate descriptive feedback on the quality of their form, with suggestions for improvement, much like guidance from a personal trainer.

This is where prompt engineering becomes crucial. The aim is to make the model think and respond like a personal trainer, with specific instructions for guiding the user effectively. It’s akin to giving an actor a script and detailed character traits, so the prompt should be as precise and detailed as possible.

Within the OpenAI API, each message carries a role that guides the model’s responses. The commonly used roles are “system,” “user,” and “assistant.” The “system” message provides high-level instructions, the “user” message presents queries or prompts, and the “assistant” message is the model’s response. Differentiating these roles lets us set the context and direct the conversation efficiently. Here is how we assign the instructions under the “system” role.

import textwrap
from inspect import cleandoc

def deindent(text: str) -> str:
    # Helper that strips the common leading indentation for clean output
    return textwrap.dedent(cleandoc(text))

SYSTEM_MESSAGE = {
    "role": "system",
    "content": deindent(
        """
        As an expert fitness instructor, your job is to give feedback on the form of the person in the video.

        If they are doing the exercise incorrectly:
        - List the mistakes. (At most 5 mistakes.)
        - For each mistake:
            - Teach them how to do it correctly. Just one line.
        - Don't mention the video or the frames.
        - No preamble or conclusion. Just the feedback.
        - Markdown format.

        If they are doing the exercise correctly:
        - Praise them and let them know they are doing a good job. Just one line.

        Address the person in the video as if they were your client. Don't be too verbose. Get to the point.
        Be strict in your analysis but have a positive attitude and be encouraging!
        """
    ),
}
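To illustrate how the three roles fit together in one conversation, here is a small sketch. The function name `build_conversation` and the example texts are made up for illustration. Only the `role`/`content` dictionary shape comes from the Chat Completions message format.

```python
def build_conversation(system_prompt: str, user_prompt: str, history=None) -> list:
    """Assemble a chat message list: system first, prior turns, then the new user turn."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history or [])  # earlier user/assistant turns, if any
    messages.append({"role": "user", "content": user_prompt})
    return messages


conversation = build_conversation(
    "You are an expert fitness instructor.",
    "How should I keep my back during a squat?",
    history=[
        {"role": "user", "content": "Hi!"},
        {"role": "assistant", "content": "Hello! Ready to train?"},
    ],
)
print([m["role"] for m in conversation])  # ['system', 'user', 'assistant', 'user']
```

The system message always comes first, so the instructions frame everything the model sees afterwards.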

Next, we need to consider how our application will process the video provided by the user. First, we need to extract individual frames from the video. Below is a utility function, using MoviePy and OpenCV, designed for this purpose.

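import base64
import cv2
import moviepy.editor as mp  # moviepy 1.x import path

def get_frames(vid_path: str, fps: int = FPS) -> list:
    vid = mp.VideoFileClip(vid_path)
    # Keep every `step`-th frame; max() guards against a zero step when fps > vid.fps
    step = max(1, int(vid.fps / fps))
    return [
        # MoviePy yields RGB frames, while OpenCV encodes assuming BGR
        base64.b64encode(cv2.imencode(".jpg", cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))[1]).decode("utf-8")
        for i, frame in enumerate(vid.iter_frames())
        if i % step == 0
    ]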
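The sampling rule inside `get_frames` (keep one frame out of every `int(vid_fps / fps)`) can be looked at on its own, without any video library. The sketch below uses a hypothetical helper, `sample_indices`, and adds a `max(1, ...)` guard, which is my assumption about how to avoid a division by zero when the requested rate exceeds the video's own frame rate.

```python
def sample_indices(total_frames: int, vid_fps: float, target_fps: float) -> list:
    """Return the indices of the frames that would be kept.

    `sample_indices` is a hypothetical helper for illustration only.
    """
    # Number of source frames between two kept frames, never below 1
    step = max(1, int(vid_fps / target_fps))
    return [i for i in range(total_frames) if i % step == 0]


# A 3-second clip at 30 fps, sampled at 1 frame per second
print(sample_indices(90, 30.0, 1))  # [0, 30, 60]
```

Because the step is truncated to an integer, a 29.97 fps video sampled at 1 fps keeps every 29th frame, so the effective rate is only approximately the requested one.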

Earlier, we introduced the concept of “roles” and wrote detailed instructions for the “system” role. However, our application also needs to analyze the video content. For that, we craft a message from the perspective of the “user” that asks the model to give feedback on the extracted frames.

def create_frames_message(frames: list, frame_size: int = FRAME_SIZE) -> dict:
    return {
        "role": "user",
        "content": [
            "These are frames from the same video. 1 frame per second. Please review them and provide feedback.",
            # Only the first 10 frames are sent
            *map(lambda x: {"image": x, "resize": frame_size}, frames[0:10]),
        ],
    }
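For comparison, the Chat Completions vision documentation describes image inputs as `image_url` content parts, where a base64 image is passed as a data URL. The sketch below shows the same user message in that shape. The names `to_image_part` and `create_frames_message_v2` are hypothetical, and choosing `"low"` detail to reduce token usage is an assumption, not something the original code does.

```python
def to_image_part(b64_jpeg: str, detail: str = "low") -> dict:
    """Wrap a base64-encoded JPEG as an `image_url` content part (hypothetical helper)."""
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{b64_jpeg}", "detail": detail},
    }


def create_frames_message_v2(frames: list, max_frames: int = 10) -> dict:
    """Build a user message with one text part followed by up to `max_frames` images."""
    return {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "These are frames from the same video. 1 frame per second. "
                "Please review them and provide feedback.",
            },
            *(to_image_part(frame) for frame in frames[:max_frames]),
        ],
    }
```

Passing the list returned by `get_frames` to `create_frames_message_v2` yields a message that can sit next to the system message in the same `messages` list.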

Finally, we’ll combine all these functions to produce descriptive feedback, using the chat.completions.create() method from OpenAI.

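def create_messages(
    video_path: str, fps: int = FPS, frame_size: int = FRAME_SIZE
) -> list:
    # Extract the frames, wrap them in a user message, and prepend the system message
    frames = get_frames(video_path, fps=fps)
    frames_message = create_frames_message(frames, frame_size=frame_size)
    return [SYSTEM_MESSAGE, frames_message]

def get_feedback(
    messages: list,
    model: str = MODEL,
    temperature: float = TEMPERATURE,
    max_tokens: int = MAX_TOKENS,
    seed: int = SEED,
) -> str:
    # `oai` is the OpenAI client created earlier
    result = oai.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
        seed=seed,
    )
    # The generated text lives in the first choice of the response
    feedback = result.choices[0].message.content
    return feedback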

Phase 2: FastAPI backend

NOTE: Please refer to this brilliant article. I followed this recipe to integrate FastAPI and Streamlit.

Below, we construct a basic request body. Using Pydantic’s BaseModel for type validation, we define a class named User_input whose video field carries the path passed to the create_messages function. We then pass the resulting list to the get_feedback function. Both functions are imported directly from the feedback.py file created earlier.

from feedback import create_messages, get_feedback
from fastapi import FastAPI
from pydantic import BaseModel
from fastapi.responses import PlainTextResponse

app = FastAPI()

class User_input(BaseModel):
    video: str

# PlainTextResponse will format the generated feedback correctly
@app.post("/messagesFeedback", response_class=PlainTextResponse)
def operate(u_input: User_input):
    messages = create_messages(u_input.video)
    feedback = get_feedback(messages)
    return feedback

Phase 3: Putting it all together in Streamlit

Now we will create a very basic Streamlit web application to showcase our “A.I Exercise Assistant”. It is similar to this app, which also does video processing; feel free to check it out for a detailed look at the Streamlit components we will be using. The main difference is that here, after the user uploads the video, Streamlit sends a request to the API (created through FastAPI) and displays the response it gets back.

For backend integration within the Streamlit script, we use Python’s requests library. In the Streamlit app, we store the inputs in a dictionary, inputs = {"video": str(tfile.name)}, where tfile.name is the path of the uploaded video as a string. JSON is a syntax for storing and exchanging data, and the json.dumps() method converts a Python object into a JSON string. We convert our inputs dictionary to JSON when sending the request to our API.

feedback = requests.post(url = "",
data = json.dumps(inputs))

The feedback variable is the Response object that our API sent back to Streamlit. To display the text, we will st.write(feedback.text) and VOILA the generated feedback will be displayed. Here is the full code of our web app, simple:

import json
import tempfile

import requests
import streamlit as st

st.title("AI Fitness Coach 🤸🤖")

uploaded_file = st.file_uploader("Upload a video 🎥", type=["mp4", "mov", "avi"])
if uploaded_file is not None:
    # Save the upload to disk so the backend can open it by path
    tfile = tempfile.NamedTemporaryFile(delete=False)
    tfile.write(uploaded_file.read())
    tfile.close()

    inputs = {"video": str(tfile.name)}
    # Notice how I request the feedback as soon as the video is uploaded and I don't wait for the user to click a button.
    # This gives a better user experience as the user doesn't have to wait as long for the feedback.
    with st.spinner("👀"):
        feedback = requests.post(url = "",
                                 data = json.dumps(inputs))
    # Display the plain-text feedback returned by the API
    st.write(feedback.text)
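The JSON step used in the request can be sketched in isolation with the standard library. The file path below is a made-up example, not a path from the app.

```python
import json

# Hypothetical path, standing in for tfile.name
inputs = {"video": "/tmp/squat.mp4"}

body = json.dumps(inputs)  # Python dict -> JSON string
print(body)  # {"video": "/tmp/squat.mp4"}

decoded = json.loads(body)  # JSON string -> Python dict, as the backend would see it
print(decoded["video"])  # /tmp/squat.mp4
```

The requests library can also do this serialization itself: passing `json=inputs` instead of `data=json.dumps(inputs)` encodes the dictionary and sets the `Content-Type: application/json` header.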

Launching Our Application

To run our application, we’ll need two terminals. In the first terminal, enter the following command to launch the API:

uvicorn fast_api:app --reload

In the second terminal, enter the following command to start the Streamlit frontend:

streamlit run app.py

Figure: The uploaded video shown in the Streamlit app.