Speed Dating LLMs: Finding the Perfect Match for Real-Time Voice AI

Paulo Taylor
3 min read · Apr 12, 2024


In an era where artificial intelligence (AI) is seamlessly integrating into daily communications, the real-time performance of conversational AI applications is paramount. Voyp, a conversational AI app developed for voice interactions, stands at the forefront of this integration, requiring lightning-fast response times to maintain the fluidity of human conversations. As the founder of Voyp, I constantly explore various AI models to ensure our technology delivers optimal performance. This article dives into our recent tests comparing the inference speeds of notable language models including ChatGPT 3.5 Turbo, ChatGPT 4.0 Turbo, Claude 3, and Gemini.

Why Inference Speed Matters

Voyp operates by making phone calls on behalf of users, so speed is crucial. The AI must respond in real time to keep the conversation natural and engaging. Given that Voyp's responses generally stay under 200 characters, the efficiency of model inference directly impacts user satisfaction. On top of that, there is still the process of generating audio from the text output and streaming that audio over the wire.
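To make that concrete, here is a minimal sketch of how a single conversational turn can be timed stage by stage. The stage functions (generate_reply, synthesize_speech, stream_audio) are hypothetical placeholders standing in for the LLM call, the text-to-speech service, and the telephony stream; they are not Voyp's actual code.

```python
import time

# Hypothetical placeholder stages -- in a real pipeline each of these wraps
# an external service: LLM inference, text-to-speech, and the telephony stream.
def generate_reply(transcript: str) -> str:
    return "Yes, 7:30 pm works for us, thank you."

def synthesize_speech(text: str) -> bytes:
    return text.encode()  # stand-in for TTS audio bytes

def stream_audio(audio: bytes) -> None:
    pass  # stand-in for writing audio frames to the live call

def timed(label: str, fn, *args):
    """Run one pipeline stage and print how much of the turn it consumed."""
    start = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {(time.perf_counter() - start) * 1000:.0f}ms")
    return result

reply = timed("LLM inference", generate_reply, "The restaurant asks which time suits you.")
audio = timed("Text-to-speech", synthesize_speech, reply)
timed("Stream to call", stream_audio, audio)
```

The LLM call is only one slice of the per-turn budget, which is why shaving hundreds of milliseconds off inference matters so much.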

The Contenders

We tested several models, focusing on those commonly adopted in the industry due to their robustness and wide usage:

  • ChatGPT 3.5 Turbo: Known for its speed and reasonable output quality, it is a favorite for applications needing quick responses.
  • ChatGPT 4.0 Turbo: Offers superior output quality and enhanced reasoning capabilities, though at a cost to speed.
  • Claude 3 Haiku: The lightweight member of the Claude 3 family, positioned for speed and cost efficiency, though in our tests it trailed the fastest models.
  • Gemini 1.0 Pro: A newer entry, promising speed and efficiency, potentially suitable for real-time applications.

Performance Evaluation

Our evaluations were based on real-case scenarios from Voyp's operational logs. Here's a snapshot of the average response times (a minimal timing sketch follows the list):

  • ChatGPT 3.5 Turbo: Averaged around 1116ms per response.
  • ChatGPT 4.0 Turbo: Slower, with responses averaging around 2887ms.
  • Claude 3 Haiku: Averaged around 2167ms per response.
  • Gemini 1.0 Pro: Comparable to ChatGPT 3.5, with an average of about 1208ms.
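
For reference, here is a minimal sketch of how such averages can be collected, using the OpenAI Python SDK as one example. The prompt, repetition count, and model names are illustrative and not Voyp's actual test harness; the same wall-clock measurement applies to the Anthropic and Google SDKs.

```python
import time
from statistics import mean

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are a polite phone assistant. Confirm a table for two at 7 pm "
    "in under 200 characters."
)

def average_latency_ms(model: str, runs: int = 10) -> float:
    """Average wall-clock latency of a short completion, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
            max_tokens=60,  # keeps responses well under 200 characters
        )
        samples.append((time.perf_counter() - start) * 1000)
    return mean(samples)

for model in ("gpt-3.5-turbo", "gpt-4-turbo"):
    print(f"{model}: {average_latency_ms(model):.0f}ms average")
```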

Findings and Observations

The results indicated two front-runners in terms of speed: Gemini 1.0 Pro and ChatGPT 3.5 Turbo. While Claude 3 Haiku didn't match the speed of the leading models, it was still faster than ChatGPT 4.0 Turbo, which provided the best output quality among the tested models. This highlights a trade-off between speed and sophistication in model outputs.

Decision Factors Beyond Speed

While speed is crucial for Voyp, other factors such as functionality, tooling integration, and cost efficiency play significant roles in selecting the appropriate model. ChatGPT models offer advanced function calling capabilities, which are vital for enhancing the conversational experience beyond mere speed.
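
As an illustration, here is a hedged sketch of declaring a tool with the OpenAI Chat Completions API. The confirm_booking tool and its schema are hypothetical examples, not Voyp's actual definitions.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical tool: lets the model hand a structured booking confirmation
# back to the calling application instead of replying in free text.
tools = [{
    "type": "function",
    "function": {
        "name": "confirm_booking",
        "description": "Confirm a reservation time proposed by the callee.",
        "parameters": {
            "type": "object",
            "properties": {
                "time": {"type": "string", "description": "Proposed time, e.g. '19:30'"},
                "accepted": {"type": "boolean"},
            },
            "required": ["time", "accepted"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "The restaurant says 7:30 pm works. Confirm it."}],
    tools=tools,
)

# Assumes the model chose to call the tool; otherwise message.content holds plain text.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```

When the model opts for a tool call, the application receives structured arguments it can execute and feed back into the conversation, rather than free text it would have to parse.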

Future Directions

In conclusion, both ChatGPT 3.5 Turbo and Gemini 1.0 Pro emerged as viable choices for their rapid response times. However, considering the comparable pricing and superior tooling functionalities of ChatGPT 3.5 Turbo, Voyp will continue utilizing this model for the time being. This decision underscores the importance of a balanced approach to selecting AI models, where speed, functionality, and economic factors are all weighed carefully.

As AI technology evolves, so too will Voyp’s strategies in model selection. Ongoing research and development will remain a priority to ensure that Voyp continues to offer a real-time conversational experience that feels as natural and engaging as a human conversation.
