GGML to GGUF: A Leap in Language Model File Formats

Sandeep Sharma
Nov 16, 2023


Introduction:

Due to the high demand for Large Language Models (LLMs), quantization has become crucial for running them efficiently. The technique is a deliberate trade-off: reducing the precision of model weights saves memory and accelerates inference while retaining most of the model’s performance. Recent advancements in quantization, particularly 8-bit and 4-bit techniques, have enabled the deployment of LLMs on consumer-grade hardware. Together with the introduction of Llama models and parameter-efficient fine-tuning methods like LoRA and QLoRA, this has fostered a diverse ecosystem of local LLMs that now rival OpenAI’s GPT-3.5 and GPT-4.
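
To make the trade-off concrete, here is a toy Python sketch of 8-bit quantization: map float32 weights to int8 plus a single scale factor, then map them back. This only illustrates the precision-for-memory exchange; it is not the block-wise scheme (such as q4_0, which stores a scale per small block of weights) that GGML and GGUF files actually use.

```
import numpy as np

# Pretend these are the weights of one layer.
weights = np.random.randn(4096).astype(np.float32)

# Naive symmetric 8-bit quantization: one scale for the whole tensor.
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)  # 4 bytes/weight -> 1 byte
restored = quantized.astype(np.float32) * scale

print(f"max abs error: {np.abs(weights - restored).max():.5f}")
print(f"memory: {weights.nbytes} bytes -> {quantized.nbytes} bytes (+ one float scale)")
```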


Among the four primary quantization approaches — NF4, GPTQ, GGML, and GGUF — this article takes a deep dive into GGML and GGUF.

GGML and GGUF represent crucial steps in making language models accessible. GGML is a machine-learning-focused C library developed by Georgi Gerganov (hence the initials “GG”). Beyond providing fundamental components such as tensors, it introduced a distinctive binary format for distributing LLMs: an early attempt to make models runnable on regular computers, but one with real limitations. GGUF, introduced on August 21, 2023, is its successor and addresses those shortcomings. It is designed to be clear, extensible, and versatile, turning the format from a prototype into a mature, user-friendly solution.

About GGUF:

GGUF, introduced by the llama.cpp team, is a replacement for GGML, which is no longer supported. It offers several advantages over GGML, including improved tokenization, support for special tokens, and extensibility. GGUF stores more information about the model as metadata, making it a future-proof format. It enhances performance, especially with models using new special tokens and custom prompt templates.
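
As an illustration of the metadata-first design, the following Python sketch reads just the fixed-size header at the start of a GGUF file: the magic bytes, the format version, the tensor count, and the metadata key-value count. The layout follows the published GGUF specification (version 2 and later); the file path is a placeholder, and for real work the gguf Python package shipped with llama.cpp is the better tool.

```
import struct

def read_gguf_header(path):
    """Read the fixed-size header at the start of a GGUF file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # Little-endian: uint32 version, then (in GGUF v2+) two uint64 counts.
        (version,) = struct.unpack("<I", f.read(4))
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
    return version, n_tensors, n_kv

# Placeholder path; point this at any GGUF model file.
version, n_tensors, n_kv = read_gguf_header("models/llama-7b.q4_0.gguf")
print(f"GGUF v{version}: {n_tensors} tensors, {n_kv} metadata key-value pairs")
```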

Difference and Preferences:

GGML (Pros and Cons):

Good:

  • An early attempt at a single-file format for GPT-style models.
  • Made it easy to share a model as one file.
  • Let models run on regular computers, which opened them up to more people.

Not So Good:

  • Lacked the flexibility to store additional information about a model.
  • Ran into compatibility issues whenever new features were introduced.
  • Required manual adjustment of settings for each model.

GGUF (Advancements and Considerations):

Improved:

  • It is the next step after GGML and fixes its problems.
  • New information can be added to a file without breaking older models.
  • It is designed to work with a range of models and executors, not just llama.cpp.

Things to Think About:

  • It might take some time for old models to change to the new format.
  • Users and developers need to adapt to the new format.

What’s Better? GGUF is better because it fixes the problems GGML had: it is more flexible, new information can be added without breaking anything, and it works across different models and tools.

Conversion from GGML to GGUF:

Changing from GGML to GGUF is made easy by the llama.cpp GitHub repo, which provides a conversion script, convert-llama-ggml-to-gguf.py, for moving models from GGML to GGUF. Users can run it on their existing models to take advantage of GGUF’s better features and design, as sketched below. This practical path ensures a smooth transition, in line with the llama.cpp team’s commitment to user convenience.
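
As a sketch, the conversion can be driven from Python with subprocess; the model paths below are placeholders, and it is worth confirming the flag names with `python convert-llama-ggml-to-gguf.py --help` in your own llama.cpp checkout, since they have changed between versions.

```
import subprocess

# Run llama.cpp's conversion script on an existing GGML model file.
# Paths are placeholders; adjust them to your own model files.
subprocess.run(
    [
        "python", "convert-llama-ggml-to-gguf.py",
        "--input", "models/llama-7b.ggmlv3.q4_0.bin",  # GGML model to convert
        "--output", "models/llama-7b.q4_0.gguf",       # GGUF file to create
    ],
    check=True,  # raise CalledProcessError if the conversion fails
)
```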

GGUF Support in Clients and Libraries:

As of August 25th, 2023, various clients and libraries already support GGUF:

  • llama.cpp
  • text-generation-webui
  • KoboldCpp
  • LoLLMS Web UI
  • ctransformers
  • llama-cpp-python
  • candle

Additionally, LM Studio was updated on August 25th to include GGUF support.
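
For instance, running a converted model with llama-cpp-python, one of the libraries listed above, takes only a few lines; the model path is a placeholder:

```
from llama_cpp import Llama

# Load a quantized GGUF model; the path is a placeholder.
llm = Llama(
    model_path="models/llama-7b.q4_0.gguf",
    n_ctx=2048,  # context window size
)

# Generate a short completion.
output = llm("Q: What does GGUF improve over GGML?\nA:", max_tokens=48)
print(output["choices"][0]["text"])
```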

Conclusion:

GGUF’s introduction represents a significant leap in language model file formats. Its user-friendly design, enhanced tokenization, and widespread support make it the preferable choice over GGML. The available conversion script and the growing support from clients and libraries highlight the community’s eagerness to adopt GGUF. This evolution should contribute to the continued progress of language models, making them more efficient and accessible to a broader user base.


Sandeep Sharma

Snr. Data Scientist | LLM | RAG | MS in Data Science & Analytics