OpenAI API implementation for 3rd party LLM
OpenAI’s API is fast becoming the de facto standard for interacting with LLMs. From a ‘pragmatic engineering’ perspective, this means not having to implement client libraries for your LLM, which makes it easier for developers to start using it.
It is worth pausing on just how simple the interface is.
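The whole exchange reduces to one JSON POST and one JSON response. As a rough sketch (fields abbreviated, values invented purely for illustration), a chat completion round trip looks like this:

```python
import json

# Minimal request body a client POSTs to /v1/chat/completions
request_body = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Abbreviated response the server returns; most fields are bookkeeping
response_body = {
    "id": "chatcmpl-1",
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hi!"},
        "finish_reason": "stop",
    }],
}

# All a client ultimately needs back out is the assistant message
reply = response_body["choices"][0]["message"]["content"]
print(json.dumps(request_body))
print(reply)  # Hi!
```

Any server that accepts the first shape and returns the second is, for most clients, “OpenAI compatible”.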
My only contribution to the following is the prompt. Rather than reading the OpenAI documentation, I asked ChatGPT to write me a dummy server so I could take a quick look at the implementation. I will update this once I have tested the code and added more metrics and logging. The purpose is to run some comparative metrics on response times.
For a more thoughtful implementation, skip the rest and take a look at this article by Saar Berkovich: https://towardsdatascience.com/how-to-build-an-openai-compatible-api-87c8edea2f06
The Prompt
Write me a python server that implements the OpenAI endpoints as stubs. This is in order to simulate an OpenAI API compatible API interface for a 3rd party LLM and for performance testing.
The Response
Below is a simple Python server using Flask to implement stubs for the OpenAI API endpoints. This will allow you to simulate an OpenAI-compatible API for a third-party language model.
First, you need to install Flask if you haven’t already:
pip install Flask
Then, you can create a Python script, fake_openai_api.py, with the following content:
from flask import Flask, request, jsonify

app = Flask(__name__)

# Simulate the OpenAI API for completion
@app.route('/v1/completions', methods=['POST'])
def completions():
    data = request.json
    prompt = data.get('prompt', '')
    # Simulate a response
    response = {
        'id': 'cmpl-1',
        'object': 'text_completion',
        'created': 1234567890,
        'model': 'text-davinci-002',
        'choices': [{
            'text': 'This is a simulated completion for the prompt: ' + prompt,
            'index': 0,
            'logprobs': None,
            'finish_reason': 'length'
        }],
        'usage': {
            'prompt_tokens': len(prompt.split()),
            'completion_tokens': 10,
            'total_tokens': len(prompt.split()) + 10
        }
    }
    return jsonify(response)

# Simulate the OpenAI API for chat completions
@app.route('/v1/chat/completions', methods=['POST'])
def chat_completions():
    data = request.json
    messages = data.get('messages', [])
    # Simulate a response
    response_text = "This is a simulated response based on the chat messages: " + " ".join(
        [message['content'] for message in messages])
    response = {
        'id': 'chatcmpl-1',
        'object': 'chat.completion',
        'created': 1234567890,
        'model': 'gpt-3.5-turbo',
        'choices': [{
            'index': 0,
            'message': {
                'role': 'assistant',
                'content': response_text
            },
            'finish_reason': 'stop'
        }],
        'usage': {
            'prompt_tokens': sum(len(message['content'].split()) for message in messages),
            'completion_tokens': len(response_text.split()),
            'total_tokens': sum(len(message['content'].split()) for message in messages) + len(response_text.split())
        }
    }
    return jsonify(response)

# Simulate the OpenAI API for embeddings
@app.route('/v1/embeddings', methods=['POST'])
def embeddings():
    data = request.json
    input_text = data.get('input', '')
    # Simulate an embedding response
    response = {
        'object': 'list',
        'data': [{
            'object': 'embedding',
            'index': 0,
            'embedding': [0.0] * 768  # Dummy embedding
        }],
        'model': 'text-embedding-ada-002',
        'usage': {
            'prompt_tokens': len(input_text.split()),
            'total_tokens': len(input_text.split())
        }
    }
    return jsonify(response)

# Simulate the OpenAI API for edits
@app.route('/v1/edits', methods=['POST'])
def edits():
    data = request.json
    input_text = data.get('input', '')
    # Simulate an edit response
    response = {
        'object': 'edit',
        'created': 1234567890,
        'choices': [{
            'text': 'This is a simulated edit response for the input: ' + input_text,
            'index': 0
        }],
        'usage': {
            'prompt_tokens': len(input_text.split()),
            'completion_tokens': 10,
            'total_tokens': len(input_text.split()) + 10
        }
    }
    return jsonify(response)

# Run the Flask app
if __name__ == '__main__':
    app.run(debug=True, port=5000)
This script creates a simple Flask server with endpoints that simulate the behavior of the OpenAI API for completions, chat completions, embeddings, and edits. The responses are dummy responses that mimic the structure of actual OpenAI API responses. You can run this server locally and use it to test integration with your third-party LLM.