Large Language Models and Their Feasibility for Natural Language Processing

Investince
4 min read · Jul 25, 2024


Introduction

This report explores the feasibility of using open-source LLMs for natural language processing (NLP) tasks within a pricing constraint. It evaluates self-hosting options, the potential for fine-tuning models, and compares the costs of various major LLMs, including GPT, Gemini, Mistral, and LLaMA.

Open-source LLMs with Self-Hosting

Mistral

Mistral offers efficient models optimized for performance, which can be self-hosted, thus providing control over costs.

Self-Hosting Costs:

  • Initial Setup: Requires investment in hardware or cloud infrastructure.
  • Cloud Hosting: $0.50 to $3 per hour on AWS, depending on the instance type.
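The hourly range above can be turned into a rough monthly estimate. A minimal sketch, assuming an always-on instance at roughly 730 hours per month (the actual bill depends on instance type, region, and whether the instance is stopped when idle):

```python
HOURS_PER_MONTH = 730  # ~24 hours x 30.4 days, always-on assumption

def monthly_hosting_cost(hourly_rate: float, utilization: float = 1.0) -> float:
    """Estimated monthly USD cost for one cloud instance."""
    return hourly_rate * HOURS_PER_MONTH * utilization

low = monthly_hosting_cost(0.50)   # ~$365/month at the low end
high = monthly_hosting_cost(3.00)  # ~$2,190/month at the high end
```

Running the instance only during business hours (say, 25% utilization) scales the estimate down proportionally, which is often the deciding factor between self-hosting and a pay-per-token API.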

Llama

Meta’s Llama models are also open-source and can be fine-tuned for specialized NLP tasks.

  • Hosting Costs: Similar to Mistral, primarily dependent on the cloud infrastructure.

AWS, Google Cloud, or Azure can be used for hosting, with prices varying based on usage and instance types.

Pricing Comparison of Major LLMs

OpenAI (GPT-3.5 & GPT-4)

  • GPT-3.5 Turbo: $0.002 per 1K tokens (input), $0.004 per 1K tokens (output).
  • GPT-4: $0.03 per 1K tokens (input), $0.06 per 1K tokens (output).
  • GPT-4 Turbo: $0.015 per 1K tokens (input), $0.03 per 1K tokens (output).
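A quick sketch of how these per-1K-token rates translate into a per-request cost, using the prices listed above (the example token counts are illustrative):

```python
# (input price, output price) in USD per 1K tokens, from the list above
PRICES_PER_1K = {
    "gpt-3.5-turbo": (0.002, 0.004),
    "gpt-4":         (0.03, 0.06),
    "gpt-4-turbo":   (0.015, 0.03),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one API call at the per-1K-token rates above."""
    in_price, out_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

# A 1,500-token prompt with a 500-token completion on GPT-4:
cost = request_cost("gpt-4", 1500, 500)  # 1.5 * 0.03 + 0.5 * 0.06 = $0.075
```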

Source: Vantage (2024), vantage.sh.

Google Gemini

  • Gemini 1.0 Pro: $0.0005 per 1K input characters, $0.0015 per 1K output characters.
  • Free Trial: 2-month free trial available for Gemini Advanced.
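Note that Gemini bills per character rather than per token, so a direct comparison with the per-token prices above needs a conversion. A minimal sketch, assuming the common heuristic of roughly 4 characters per token (an assumption, not an official conversion rate):

```python
# Gemini 1.0 Pro rates from the list above, in USD per 1K characters
GEMINI_IN_PER_1K_CHARS = 0.0005
GEMINI_OUT_PER_1K_CHARS = 0.0015
CHARS_PER_TOKEN = 4  # heuristic assumption, not an official figure

def gemini_cost_per_1m_tokens():
    """Approximate (input, output) USD cost per 1M tokens."""
    in_cost = GEMINI_IN_PER_1K_CHARS / 1000 * CHARS_PER_TOKEN * 1_000_000
    out_cost = GEMINI_OUT_PER_1K_CHARS / 1000 * CHARS_PER_TOKEN * 1_000_000
    return in_cost, out_cost
```

Under that assumption, Gemini 1.0 Pro works out to roughly $2 per 1M input tokens and $6 per 1M output tokens, which makes it easier to line up against the per-token prices quoted for the other models.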

Source: Plus (2024), plusdocs.com.

Mistral

  • Self-Hosting Costs: Depends on the cloud provider. AWS prices range from $0.50 to $3 per hour for compute resources.

Llama

  • Self-Hosting Costs: Similar to Mistral, costs are primarily infrastructure-related.

The following models can be self-hosted, with infrastructure as the only cost:

Mistral 7B

  • Input: $0.25 per 1M tokens
  • Output: $0.25 per 1M tokens

Fine-tuning Pricing for Mistral

Mistral AI provides a fine-tuning API through La Plateforme, making it easy to fine-tune open-source and commercial models. There are three costs related to fine-tuning:

  • One-off training: price per token of the data used to fine-tune Mistral's standard models, with a minimum fee of $4 per fine-tuning job
  • Inference: Price per input/output token when using the fine-tuned model(s)
  • Storage: Price per month per model for storage (irrespective of model usage; models can be deleted at any time)
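The cost components above can be sketched as follows. The $4 minimum per job is from the text; the per-1M-token training price in the example is a hypothetical placeholder (set it from Mistral's current price list):

```python
MIN_JOB_FEE = 4.00  # minimum fee per fine-tuning job, per the text

def training_cost(training_tokens: int, price_per_1m_tokens: float) -> float:
    """One-off training cost in USD, subject to the per-job minimum fee."""
    return max(MIN_JOB_FEE, training_tokens / 1_000_000 * price_per_1m_tokens)

# With a hypothetical $2.00/1M training price, a small job hits the minimum:
training_cost(500_000, 2.00)     # → 4.0 (minimum fee applies)
training_cost(10_000_000, 2.00)  # → 20.0
```

The minimum fee means very small datasets cost the same to fine-tune, so batching experiments into fewer, larger jobs can be cheaper.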

https://mistral.ai/technology/

Llama 3.1 8B

Pricing per million tokens of Llama models (https://llama.meta.com/)

Comparison

LLM comparison

👉 GPT-Neo, GPT-J, and GPT-NeoX are other viable GPT-like options that can be both fine-tuned on custom datasets and self-hosted.

Recommendations

  1. Self-Hosting: For greater control and potentially lower long-term costs, consider Mistral or Llama. Ensure you have the necessary infrastructure and expertise to manage these models.
  2. Subscription-Based Models: For ease of use and robust capabilities, consider Google Gemini or GPT-4o mini.

👉 Mistral might be the better option among the models allowing self-hosting since it is a bit cheaper than Llama.

👉 If self-hosting is not a requirement, Gemini might be better than both GPT-3.5 and GPT-4o mini, since it is a lot cheaper.

👉 If one had to choose between the GPT models, GPT-4o mini would be the better option: not only is it a lot cheaper, but OpenAI is also offering 2M free training tokens per day to organizations, with overages charged at $3.00 per 1M training tokens.
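The free-tier arithmetic in that last point can be sketched as follows, using the 2M free daily training tokens and $3.00/1M overage rate quoted above:

```python
FREE_TOKENS_PER_DAY = 2_000_000   # free daily training tokens, per the text
OVERAGE_PER_1M = 3.00             # USD per 1M tokens beyond the free tier

def daily_training_overage(tokens_used: int) -> float:
    """USD billed for one day's training-token usage."""
    billable = max(0, tokens_used - FREE_TOKENS_PER_DAY)
    return billable / 1_000_000 * OVERAGE_PER_1M

daily_training_overage(1_500_000)  # → 0.0, within the free tier
daily_training_overage(5_000_000)  # → 9.0
```

Spreading a fine-tuning dataset across several days of the free allowance can therefore eliminate training cost entirely for smaller workloads.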

References

Reddit User (2024) Anyone managed to setup a Bard as self-hosted? [Online]. Available at: https://www.reddit.com/r/Bard/comments/1bnar3p/anyone_managed_to_setup_a_bard_as_self_hosted/ [Accessed 24 July 2024].

Google Cloud Community (2024) Regarding the cost calculation of Fine-tuning for Gemini 1.0 Pro [Online]. Available at: https://www.googlecloudcommunity.com/gc/AI-ML/Regarding-the-cost-calculation-of-Fine-tuning-for-Gemini-1-0-Pro/m-p/769485 [Accessed 24 July 2024].

Meta (2024) LLaMA [Online]. Available at: https://llama.meta.com/ [Accessed 24 July 2024].

Google Cloud (2024) Vertex AI Generative AI Pricing [Online]. Available at: https://cloud.google.com/vertex-ai/generative-ai/pricing [Accessed 24 July 2024].

Mistral AI (2024) Technology [Online]. Available at: https://mistral.ai/technology/ [Accessed 24 July 2024].

Infosys (2023) Emerging Technology Solutions: Large Language Models: The Rising Demand of Specialized LLMs [Online]. Available at: Infosys Blog.

Shen, J., Tenenholtz, N., Hall, J. B., Alvarez-Melis, D. and Fusi, N. (2024) Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains [Online]. Available at: arXiv.

Reddit User (2024) Self-host LLaMA 3 on your own cloud in Python [Online]. Available at: https://www.reddit.com/r/LocalLLaMA/comments/1dtshho/selfhost_llama_3_on_your_own_cloud_in_python/ [Accessed 24 July 2024].
