OpenAI API implementation for 3rd party LLM
OpenAI’s API is fast becoming the de facto standard for interacting with LLMs. From a ‘pragmatic engineering’ perspective, this means not having to implement client libraries for your LLM, which makes it easier for developers to start using it.
It is worth pausing on just how simple the interface is.
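The whole exchange reduces to one JSON POST and one JSON response. As a rough sketch (fields abbreviated, values invented purely for illustration), a chat completion round trip looks like this:

```python
import json

# Minimal request body a client POSTs to /v1/chat/completions
request_body = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Abbreviated response the server returns; most fields are bookkeeping
response_body = {
    "id": "chatcmpl-1",
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hi!"},
        "finish_reason": "stop",
    }],
}

# All a client ultimately needs back out is the assistant message
reply = response_body["choices"][0]["message"]["content"]
print(json.dumps(request_body))
print(reply)  # Hi!
```

Any server that accepts the first shape and returns the second is, for most clients, “OpenAI compatible”.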
My only contribution to the following is the prompt. Rather than reading the OpenAI documentation, I asked ChatGPT to write me a dummy server so I could take a quick look at the implementation. I will update this once I have tested the code and added more metrics and logging. The purpose is to run some comparative metrics on response times.
For a more thoughtful implementation, skip the rest and take a look at this article by Saar Berkovich: https://towardsdatascience.com/how-to-build-an-openai-compatible-api-87c8edea2f06
The Prompt
Write me a python server that implements the OpenAI endpoints as stubs. This is in order to simulate an OpenAI API compatible API interface for a 3rd party LLM and for performance testing.
The Response
Below is a simple Python server using Flask to implement stubs for the OpenAI API endpoints. This will allow you to simulate an OpenAI-compatible API for a third-party language model.
First, you need to install Flask if you haven’t already:
pip install Flask
Then, you can create a Python script, fake_openai_api.py, with the following content:
from flask import Flask, request, jsonify

app = Flask(__name__)

# Simulate the OpenAI API for completion
@app.route('/v1/completions', methods=['POST'])
def completions():
    data = request.json
    prompt = data.get('prompt', '')
    # Simulate a response
    response = {
        'id': 'cmpl-1',
        'object': 'text_completion',
        'created': 1234567890,
        'model': 'text-davinci-002',
        'choices': [{
            'text': 'This is a simulated completion for the prompt: ' + prompt,
            'index': 0,
            'logprobs': None,
            'finish_reason': 'length'
        }],
        'usage': {
            'prompt_tokens': len(prompt.split()),
            'completion_tokens': 10,
            'total_tokens': len(prompt.split()) + 10
        }
    }
    return jsonify(response)

# Simulate the OpenAI API for chat completions
@app.route('/v1/chat/completions', methods=['POST'])
def chat_completions():
    data = request.json
    messages = data.get('messages', [])
    # Simulate a response
    response_text = "This is a simulated response based on the chat messages: " + " ".join(
        [message['content'] for message in messages])
    response = {
        'id': 'chatcmpl-1',
        'object': 'chat.completion',
        'created': 1234567890,
        'model': 'gpt-3.5-turbo',
        'choices': [{
            'index': 0,
            'message': {
                'role': 'assistant',
                'content': response_text
            },
            'finish_reason': 'stop'
        }],
        'usage': {
            'prompt_tokens': sum(len(message['content'].split()) for message in messages),
            'completion_tokens': len(response_text.split()),
            'total_tokens': sum(len(message['content'].split()) for message in messages) + len(response_text.split())
        }
    }
    return jsonify(response)

# Simulate the OpenAI API for embeddings
@app.route('/v1/embeddings', methods=['POST'])
def embeddings():
    data = request.json
    input_text = data.get('input', '')
    # Simulate an embedding response
    response = {
        'object': 'list',
        'data': [{
            'object': 'embedding',
            'index': 0,
            'embedding': [0.0] * 768  # Dummy embedding
        }],
        'model': 'text-embedding-ada-002',
        'usage': {
            'prompt_tokens': len(input_text.split()),
            'total_tokens': len(input_text.split())
        }
    }
    return jsonify(response)

# Simulate the OpenAI API for edits
@app.route('/v1/edits', methods=['POST'])
def edits():
    data = request.json
    input_text = data.get('input', '')
    # Simulate an edit response
    response = {
        'object': 'edit',
        'created': 1234567890,
        'choices': [{
            'text': 'This is a simulated edit response for the input: ' + input_text,
            'index': 0
        }],
        'usage': {
            'prompt_tokens': len(input_text.split()),
            'completion_tokens': 10,
            'total_tokens': len(input_text.split()) + 10
        }
    }
    return jsonify(response)

# Run the Flask app
if __name__ == '__main__':
    app.run(debug=True, port=5000)
This script creates a simple Flask server with endpoints that simulate the behavior of the OpenAI API for completions, chat completions, embeddings, and edits. The responses are dummy responses that mimic the structure of actual OpenAI API responses. You can run this server locally and use it to test integration with your third-party LLM.