Streaming ChatGPT Responses Using Django DRF and React/Typescript

Marc Fasel
5 min read · May 7, 2023

ChatGPT API answers take a considerable time to appear, especially longer ones. A way to shorten the wait for the first words is to use the streaming option and build up the answer incrementally, similar to the way the ChatGPT UI renders each word as it arrives. To do this we simply set the stream parameter to True:

openai.ChatCompletion.create(
    model=GPT_MODEL_ENGINE,
    messages=messages,
    stream=True,
)

I used Simon Willison's blog post Using the ChatGPT streaming API from Python to get started. This gets the chunks out of the OpenAI ChatCompletion API. Easy enough. The question is: how do I make it work in my Django DRF/React Hooks app? It turns out it's not straightforward: the devil is in the details.
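
For reference, the heart of that approach is just a plain Python loop over the streamed chunks. Here is a minimal sketch using the same pre-1.0 openai library that the rest of this article uses:

import openai

openai.api_key = "<YOUR_KEY>"

# Iterate over the streamed chunks and print each delta as it arrives.
for chunk in openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
):
    delta = chunk["choices"][0].get("delta", {})
    print(delta.get("content", ""), end="", flush=True)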

I asked ChatGPT for advice, but it led me down a rabbit hole of naked streaming without SSE, which resulted in a cumbersome solution. Only once I started writing this article did I reconsider all the options, and ChatGPT and I landed on Server-Sent Events (SSE) to transmit the data to the frontend. This yields a super clean solution. I want to document it here so you don't have to spend hours figuring out how to implement this.

Server-Sent Events (SSE) is a standard for sending real-time updates from a server to a client over a single HTTP connection. It is part of the HTML5 standard and is specified in the W3C Recommendation (https://www.w3.org/TR/eventsource/). It is particularly suitable for unidirectional events (if we wanted a more bidirectional conversation, WebSockets would be the way to go).
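
On the wire, SSE is nothing more than a long-lived HTTP response with Content-Type: text/event-stream whose body is a series of messages; each message is one or more data: lines terminated by a blank line. A hand-written illustration (not output from the code below):

: lines starting with a colon are SSE comments and are ignored by clients
data: {"role": "assistant"}

data: {"content": "Hello"}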

Django DRF Backend

First, pass the streamed data through a Django API endpoint. Here are the steps:

Add the URL pattern to urls.py:

urlpatterns += [
    path('chatgpt/', views.chatgpt),
]
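
This assumes path and the app's views module are already imported at the top of urls.py; if not, something like the following will do (the app name salesgridapp is taken from the view code below):

from django.urls import path

from salesgridapp import views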

Create a view that wraps Simon's code in a generator function event_stream() and returns the data in a StreamingHttpResponse:

import json
import logging

import openai
from django.http import StreamingHttpResponse
from rest_framework.decorators import api_view, permission_classes, renderer_classes
from rest_framework.permissions import AllowAny

from salesgridapp.server_sent_event_renderer import ServerSentEventRenderer

logger = logging.getLogger(__name__)

openai.api_key = "<YOUR_KEY>"
openai.organization = "<YOUR_ORGANISATION>"
GPT_MODEL_ENGINE = "gpt-4"


@api_view(['GET'])
@permission_classes([AllowAny])
# Add the custom renderer to the view
@renderer_classes([ServerSentEventRenderer])
def chatgpt(request):
    def event_stream():
        for chunk in openai.ChatCompletion.create(
            model=GPT_MODEL_ENGINE,
            messages=[{
                "role": "user",
                "content": "Generate a list of 20 great names for sentient cheesecakes that teach SQL",
            }],
            stream=True,
        ):
            chatcompletion_delta = chunk["choices"][0].get("delta", {})
            data = json.dumps(dict(chatcompletion_delta))
            yield f'data: {data}\n\n'

    response = StreamingHttpResponse(event_stream(), content_type="text/event-stream")
    response['X-Accel-Buffering'] = 'no'  # Disable buffering in nginx
    response['Cache-Control'] = 'no-cache'  # Ensure clients don't cache the data
    return response

Note the X-Accel-Buffering and Cache-Control headers we are adding to the response. It is possible that something between Django and your consumer (like nginx in my case) buffers the stream, which turns a streaming response into a non-streaming response. I spent hours debugging this issue until I found that these headers do the trick with nginx. If you are accessing Django directly, you can skip the headers.
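
If you control the nginx configuration, an alternative (or belt-and-braces addition) is to disable proxy buffering for the streaming endpoint itself. A minimal sketch, assuming a standard proxy_pass setup in front of Django; the upstream address and location are assumptions:

location /chatgpt/ {
    proxy_pass       http://127.0.0.1:8000;  # wherever Django/gunicorn listens
    proxy_buffering  off;                    # pass chunks through immediately
    proxy_cache      off;
}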

Each chunk of the OpenAI API output is a full JSON object, which is why we can do json.dumps(dict(chatcompletion_delta)). To verify that your Django endpoint is indeed streaming:

curl --no-buffer http://localhost:8000/chatgpt/

This should show the output slowly being built up over time, something like this:

data: {"role": "assistant"}

data: {"content": "Hey"}

data: {"content": " there"}

data: {"content": "!"}

data: {"content": " I"}

data: {"content": " just"}

data: {"content": " wanted"}

data: {"content": " to"}

data: {"content": " share"}

data: {"content": " this"}

data: {"content": " cool"}

data: {"content": " JSON"}

data: {"content": " object"}

SSE expects the data to be wrapped in this data: {}\n\n notation, which is done in the chatgpt method, as well as the response being served with content_type="text/event-stream", which is handled by the ServerSentEventRenderer:

from rest_framework.renderers import BaseRenderer


class ServerSentEventRenderer(BaseRenderer):
    media_type = 'text/event-stream'
    format = 'txt'

    def render(self, data, accepted_media_type=None, renderer_context=None):
        return data

We are only sending the delta portion of the OpenAI output to simplify things. The actual output has more data:

{
    "id": "chatcmpl-7D4RyJn5NU2YBK5wywT6jQRvmBlQA",
    "object": "chat.completion.chunk",
    "created": 1683349726,
    "model": "gpt-4-0314",
    "choices": [
        {
            "delta": {
                "role": "assistant"
            },
            "index": 0,
            "finish_reason": null
        }
    ]
}
{
    "id": "chatcmpl-7D4RyJn5NU2YBK5wywT6jQRvmBlQA",
    "object": "chat.completion.chunk",
    "created": 1683349726,
    "model": "gpt-4-0314",
    "choices": [
        {
            "delta": {
                "content": "1. Query"
            },
            "index": 0,
            "finish_reason": null
        }
    ]
}
<MORE>
{
    "id": "chatcmpl-7D4VGw6mvFNBCqmaHtlfkuRSY3bLS",
    "object": "chat.completion.chunk",
    "created": 1683349930,
    "model": "gpt-4-0314",
    "choices": [
        {
            "delta": {},
            "index": 0,
            "finish_reason": "stop"
        }
    ]
}

Stitching the delta portions together will yield an object with two keys:

{
    "role": "assistant",
    "content": "1. Query Cake\n2.CheddarCoder"
}
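
For completeness, here is a small Python sketch of what that stitching looks like, using a few deltas like the ones above:

# Stitch streamed deltas back into one message dict,
# mirroring what the React client below does on the frontend.
deltas = [
    {"role": "assistant"},
    {"content": "1. Query"},
    {"content": " Cake"},
]

message = {"role": "", "content": ""}
for delta in deltas:
    message["role"] += delta.get("role", "")
    message["content"] += delta.get("content", "")

print(message)  # {'role': 'assistant', 'content': '1. Query Cake'}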

React Hooks Frontend

Since SSE is part of the HTML5 standard, the implementation on the client side is straightforward:

  • Set up an EventSource
  • Implement the onmessage handler to append each incoming delta to the existing response

This builds up the response over time on screen.

import React, { FC, useEffect, useState } from "react";

interface IResponseObject {
  role: string;
  content: string;
}

const ChatGPT: FC = () => {
  const [response, setResponse] = useState<IResponseObject>({ role: "", content: "" });

  useEffect(() => {
    const eventSource = new EventSource("https://dev.salesgrid.xyz/secure/rest/chatgpt/");

    eventSource.onmessage = (event) => {
      const responseObject = JSON.parse(event.data);

      // Append the incoming delta to what has been accumulated so far.
      setResponse((prev: IResponseObject) => {
        const responseObjectRole = responseObject["role"] || "";
        const responseObjectContent = responseObject["content"] || "";
        const combinedObject = {
          role: prev.role + responseObjectRole,
          content: prev.content + responseObjectContent,
        };
        return combinedObject;
      });
    };

    eventSource.onerror = (error) => {
      console.error("Error with SSE connection:", error);
    };

    // Close the connection when the component unmounts.
    return () => {
      eventSource.close();
    };
  }, []);

  return (
    <div>
      {response.content ? (
        <div><pre>{response.content}</pre></div>
      ) : (
        <p>Loading chatGPT response...</p>
      )}
    </div>
  );
};

export default ChatGPT;

Add this ChatGPT component to your React app. The result is this:

[Result: Sentient Cheesecakes That Teach SQL]

Conclusion

The challenge for me was getting to know the different streaming options (naked streaming, SSE, and WebSockets) and working out that nginx was buffering the stream. Resolving those took hours of hard work. The end result is a super simple solution that I think was worth the effort.
