How to Proxy and Modify OpenAI Stream Responses for Enhanced User Experience

Tech Tim (@TechTim42)
4 min readJan 16, 2024

--

Recently, we encountered a requirement to develop a function that proxies and modifies a stream from OpenAI. Essentially, our goal is to receive the GPT-4 response in a stream, but instead of directly returning the stream to the front-end, we aim to modify the value within the stream and then return the modified value to the front-end application.

Is this possible? The answer is Yes. This article discusses how to pipe a stream in NodeJS/ExpressJS to achieve better FCP (First Contentful Paint). Simultaneously, this article explains the process of modifying the stream content.

Why people want to use stream as Response?

The Reason is simple and straightforward in most of cases, to have better UX

It is similar to the concept we have in front end and UX, FCP (First Contentful Paint), streaming is a way to reduce the FCP time to let users to have a better UX.

Move from Long Spinning to Early Partial Rendering:

This Table and Gifs are From Mike Borozdin.

How to achieve it in NodeJS?

It is very simple and straightforward in Node.

If using Axios, it could be done in a few lines through a pipe

const streamResponse = await axios.get(sourceApiUrl, {responseType: 'stream'});  // this line was used to simulate OpenAI GPT4 Response
res.writeHead(200, {'Content-Type': 'text/plain'});
streamResponse.data.pipe(res);

What are the challenges here then?

If it the requirement is only to proxy the stream, using a pipe is all that is needed. The entire task is simply to build a proxy service for the OpenAI API.. Unluckily, the challenges here are the modification part.

Modification

In NodeJs, Transformer class is use to modify the streaming chunks, so we could update the codes to be like this

const response = await axios.get(sourceApiUrl, {responseType: 'stream'});  
res.setHeader('Content-Type', 'application/json');
const modifyStream = new Transform({
transform(chunk, encoding, callback: any) {
const modifiedChunk = chunk.toString().toUpperCase(); // modify the content to uppercase
callback(null, modifiedChunk);
}
});
response.data.pipe(modifyStream).pipe(res);

Before the pipe to express response, modifyStream transformer was added, to modify the content before return to consumers.

What if the stream response is not a chunk of text

We have to consider about this case as well, now many people are using the function and json response in GPT4 now, which means a simple transformer will not work as expected for modifying the stream content.

For example, to modify a chunk of unfinished JSON response in a stream,

  • We need to make sure the content is a valid JSON
  • At the same time, we need to make change on top of it, and return the modified content as early as possible to consumers.

However, JSON is a strongly structured data type, and it is not considered valid until it is closed with its curly bracket }.

This is a Node package to parse partial JSON.

import { parse } from 'best-effort-json-parser'
let data = parse(`[1, 2, {"a": "apple`)
console.log(data) // [1, 2, { a: 'apple' }]

On top of it, there is even simpler one, ‘http-streaming-request

const stream = makeStreamingJsonRequest({
url: "/some-api",
method: "POST",
});
for await (const data of stream) {
// if the API only returns [{"name": "Joe
// the line below will print `[{ name: "Joe" }]`
console.log(data);
}

In this case, we can update our codes to use it to make the modification for the streaming content.

const stream = makeStreamingJsonRequest<any>({  
url: sourceApiUrl,
method: "GET",
});
res.setHeader('Content-Type', 'application/json');
for await (const data of stream) {
modifiedData.modified = true; // add more modify
res.write(JSON.stringify(modifiedData));
}

Performance will not be as good as the original stream pipe, but it will meet all requirements for this case.

Please note that when using this library, you may need to add some optimisation logic on top of it.

Summary

It depends on your modification/transformation code, unless the modification logic is complex, it will not significantly impact the performance of the entire streaming process. The FCP will be optimised compared to waiting for the entire JSON response to be completed.

The GitHub repository of the sample codes is attached below.

Thanks for reading.

--

--

Tech Tim (@TechTim42)

❤️Learn, Share and Grow => ☘️Passionate about Improving Dev Experience, Software Engineering, Cloud Architect, AWS Community Builder