LLM: How much easier it’s become over one year

Shichengqi's Digest
5 min read · Feb 12, 2024

Zhou Hongyi of 360 mentioned in his outlook for AI in 2024 that a year ago many considered ChatGPT a Manhattan Project, but now everyone has caught up. He attributed this to the abundance of open-source resources. But just how simple has developing a large model become? And beyond the technical threshold, has the cost threshold been lowered as well?

Out of curiosity, I looked through the resources on GitHub and Hugging Face to try to judge: 1) technically, can an individual enthusiast independently train a large language model at least on par with GPT-3.5 using open-source resources? 2) If it is technically doable, is it cost-effective? The cost here divides into two parts: the training cost and the operating cost.

The result: 1) it is absolutely possible; 2) the cost may be less than 100k USD.

How to implement it solo

Why are big models so big? Because they have that many parameters. Take the current industry leader OpenAI as an example: GPT-3, the foundation of GPT-3.5, has about 175 billion parameters, and GPT-4 is widely believed to be larger still (OpenAI has not disclosed its size); even the smallest useful models run to at least 6 or 7 billion.

However, size alone is of no use; a model needs enough of the right training to become useful. Compared with GPT-3, GPT-3.5 showed a generational leap, and a ChatGPT that could pass the Turing test suddenly became reality. A key engineering detail was that OpenAI recruited, at low cost, a large number of English-speaking workers in Africa to provide high-quality natural-language feedback. What counts as high quality? The text is no longer random bits of info scraped from the web, but data that trained people simulate, provide, and correct so that it is maximally suitable for a large model to learn from.

This has changed dramatically today, two years later: instead of recruiting a crowd of real people, there is a ready-made FLAN v2 dataset on Hugging Face. Everything in it is content curated specifically for LLMs; millions of carefully selected high-quality examples reduce the data-collection cost that OpenAI once faced to essentially zero for any new model.
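Just to show how low the barrier is, here is a minimal sketch of pulling that data down, assuming the community mirror SirNeural/flan_v2 on Hugging Face (several mirrors exist, and the "inputs"/"targets" column names are my assumption):

```python
# A minimal sketch of loading FLAN v2 from Hugging Face.
# Dataset mirror name and column names are assumptions; check the
# dataset card of whichever mirror you use.
from datasets import load_dataset

flan = load_dataset("SirNeural/flan_v2", split="train", streaming=True)

# Peek at a few prompt/response pairs without downloading the whole set.
for i, example in enumerate(flan):
    print(example["inputs"][:100], "->", example["targets"][:100])
    if i == 2:
        break
```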

Of course, you will say: with only millions of examples, how can you train a complete large model? Here comes the technique popularized by [1]. Even dedicated human feedback has a limitation: the logical links are not explicit enough, and learning the fuzzy mapping between input and output is very inefficient. So in that paper, the Microsoft team used GPT-3.5 and GPT-4 to generate more detailed logical steps (reasoning traces purposely designed to be fed into the training of a new model). [1] found that explicit logical links can significantly improve the efficiency of model training.
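To illustrate the idea (a hedged sketch, not the exact pipeline from [1]): take a plain question, ask a strong teacher model to answer it with explicit step-by-step reasoning, and keep the whole trace as training data. The model name and system prompt below are placeholders:

```python
# Orca-style data augmentation, sketched: have a strong teacher model
# produce an explicit reasoning trace that a student model can learn from.
# Model name and system prompt are illustrative, not the exact ones in [1].
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = ("You are a helpful assistant. Explain your reasoning "
                 "step by step before stating the final answer.")

def make_trace(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",  # [1] used GPT-3.5 and GPT-4 as teachers
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

# The full trace, not just the bare answer, becomes the training target.
print(make_trace("A train covers 120 km in 90 minutes. "
                 "What is its average speed in km/h?"))
```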

Therefore, by layering a specialized dataset like FLAN v2 with the logical steps generated by mature models to accelerate learning, solo developers like [2] can already train large models by themselves.
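Put together, the whole recipe is one ordinary training loop. Below is a toy-scale sketch using a small stand-in model (gpt2) and the assumed FLAN v2 mirror from above; a real 70B run uses the same loop, just sharded across many GPUs with a framework like DeepSpeed or FSDP:

```python
# Toy-scale sketch of the solo-developer recipe: fine-tune a causal LM on
# instruction/response pairs. "gpt2" and the dataset name are stand-ins;
# a 70B model needs this same loop sharded across many GPUs.
from datasets import Dataset, load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def tokenize(example):
    # Concatenate prompt and target into one causal-LM training string.
    text = example["inputs"] + "\n" + example["targets"]
    return tok(text, truncation=True, max_length=512)

# Stream a small sample instead of downloading the full dataset.
stream = load_dataset("SirNeural/flan_v2", split="train", streaming=True)
raw = Dataset.from_list(list(stream.take(1000)))
train = raw.map(tokenize, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```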

Training Cost

I really don’t know much about the server costs, chip costs, training time, and so on required to train an LLM, so I asked GPT-4 how much it costs to train a 70B model today. The appendix has the full estimation process; the conclusion is 50,000–200,000 US dollars. The figure I quote here is for training from scratch rather than fine-tuning.

Operating Cost

For the operating cost, I used the estimate in [3], which is basically the cost of a desktop computer of about 10k USD… (That article runs the 8x7B Mixtral architecture, with quality roughly on par with GPT-3.5.)
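To make the "desktop computer" part concrete, here is a hedged sketch of running a quantized dolphin-mixtral build locally with llama-cpp-python; the GGUF file name is my assumption (any dolphin-mixtral quantization downloaded from Hugging Face would do):

```python
# Running a quantized Mixtral-class model on a single desktop machine.
# The GGUF file name is assumed; download a dolphin-mixtral quantization
# from Hugging Face first.
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.5-mixtral-8x7b.Q4_K_M.gguf",  # assumed local file
    n_gpu_layers=-1,  # offload all layers to the GPU if memory allows
    n_ctx=4096,
)

out = llm("Explain mixture-of-experts routing in two sentences.",
          max_tokens=128)
print(out["choices"][0]["text"])
```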

Total Cost Comparison

Therefore, for less than 200k spent on a professional AI cloud server, the training can be completed; the model can then be downloaded to a personal computer for a minimal total cost.

What’s even more interesting is that I ran a similar calculation with ChatGPT in early 2023. The conclusion back then was that training a large model would cost approximately US$100 million to US$1 billion, in line with the predictions of some other bloggers at the time. Since neither estimate included labor costs, I compare the upper limits: last year’s $1 billion versus this year’s $200k. That is about a 5,000-fold cost reduction!

References

[1] Mukherjee, Subhabrata, et al. “Orca: Progressive Learning from Complex Explanation Traces of GPT-4.” arXiv preprint arXiv:2306.02707 (2023).

[2] Hartford, Eric. “Dolphin.” Hugging Face.

[3] Hartford, Eric. “dolphin-mixtral-8x7b.” Hugging Face.

Appendix: GPT-4’s training cost estimate, quoted verbatim:

Given the complexities and variables involved in running a large language model (LLM) with 70 billion parameters, let’s break down the rough cost estimation into two primary scenarios: inference and training. The costs can vary widely based on the specifics of the use case and the pricing models of cloud service providers.

### Inference Costs:

- **Low Usage**: For occasional inference use with efficient resource management, the cost might start from a few dollars per hour. Assuming a minimal usage pattern, the monthly cost could range from **$200 to $1,000**, depending on the efficiency of the model and the management of cloud resources.

- **High Usage**: For more intensive inference operations requiring continuous access to GPUs, the cost could escalate significantly. Assuming 24/7 operation on multiple instances, the monthly cost might range from **$3,000 to $15,000** or more, depending on the number of requests and the specific GPUs used.

### Training Costs:

Training a 70 billion parameter model from scratch is significantly more resource-intensive than inference and typically involves using multiple GPUs for extended periods.

- **Training (Short-term/Fine-tuning)**: For fine-tuning on specific tasks, which may take a few days to weeks, costs could range from **$10,000 to $50,000**, depending on the extent of fine-tuning and the compute resources employed.

- **Training (From Scratch)**: Training a model of this size from scratch, which can take weeks to months, might cost anywhere from **$50,000 to $200,000** or more. This wide range accounts for the continuous use of high-end GPUs, storage, and other resources over an extended period.

### Summary of Cost Range Estimation in USD:

- **Inference**: $200 to $15,000+ per month

- **Training/Fine-tuning**: $10,000 to $200,000+ (total cost, depending on duration and resources)

These estimates are very rough and intended to provide a broad understanding of the potential costs involved. Actual expenses can vary based on specific use cases, cloud provider pricing changes, and the optimization of resources.
