What are GGUF and GGML?

Phillip Gimmi
2 min read · Sep 8, 2023


GGUF and GGML are file formats used for storing models for inference, especially in the context of language models like GPT (Generative Pre-trained Transformer). Let’s explore the key differences, pros, and cons of each.

GGML (GPT-Generated Model Language): Developed by Georgi Gerganov, GGML is a tensor library designed for machine learning that enables large models to run with high performance on commodity hardware, including Apple Silicon. The name also refers to the binary file format in which those models were distributed.

Pros:

  • Early Innovation: GGML represented an early attempt to create a file format for GPT models.
  • Single File Sharing: It enabled sharing models in a single file, enhancing convenience.
  • CPU Compatibility: GGML models could run on CPUs, broadening accessibility.

Cons:

  • Limited Flexibility: GGML had no standard way to store extra metadata about a model, such as its architecture or tokenizer details.
  • Compatibility Issues: Introduction of new features often led to compatibility problems with older models.
  • Manual Adjustments Required: Users frequently had to modify settings like rope-freq-base, rope-freq-scale, gqa, and rms-norm-eps, which could be complex.
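To illustrate that last pain point, here is the kind of invocation GGML-era llama.cpp builds often required. Treat this as a hedged sketch rather than a copy-paste command: the model filename is a placeholder, and the exact flag names and defaults varied across llama.cpp versions.

```shell
# Illustrative only: with GGML-era llama.cpp, model parameters the file
# format could not store had to be supplied by hand on every run.
./main -m llama-2-70b.ggmlv3.q4_0.bin \
  --rope-freq-base 10000 \      # RoPE base frequency
  --rope-freq-scale 1.0 \       # RoPE frequency scaling factor
  -gqa 8 \                      # grouped-query attention factor (70B models)
  --rms-norm-eps 1e-5 \         # RMSNorm epsilon
  -p "Hello"
```

Getting any of these wrong typically produced garbage output rather than an error, which is part of what motivated a format that carries this metadata inside the file.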

GGUF (GPT-Generated Unified Format), the successor to GGML, was released on August 21, 2023. This format represents a significant step forward in the field of language model file formats, facilitating enhanced storage and processing of large language models like GPT.

Developed by contributors from the AI community, including Georgi Gerganov, the creator of GGML, GGUF was designed around the needs of large-scale AI models. Its use with Meta's LLaMA (Large Language Model Meta AI) models underscores its importance in the AI landscape. For more details on GGUF, you can refer to the format discussion and the llama.cpp project by Georgi Gerganov on GitHub.
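As a rough sketch of what "unified format" means at the byte level: per the GGUF specification, every file begins with a fixed little-endian header (magic bytes, format version, tensor count, metadata entry count). The field layout below follows that spec, but the minimal writer/reader pair is my own illustration, not code from llama.cpp:

```python
import struct

GGUF_MAGIC = b"GGUF"

def write_minimal_gguf_header(version=3, n_tensors=0, n_kv=0):
    # GGUF header layout: 4-byte magic "GGUF", uint32 version,
    # uint64 tensor count, uint64 metadata key/value count
    # (all little-endian; the format has gone through versions 1-3).
    return GGUF_MAGIC + struct.pack("<IQQ", version, n_tensors, n_kv)

def read_gguf_header(data):
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}

header = read_gguf_header(write_minimal_gguf_header())
```

Because the magic bytes and counts sit at fixed offsets, any tool can cheaply identify a GGUF file and know how much metadata follows before reading a single tensor.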

Pros:

  • Addresses GGML Limitations: GGUF is designed to overcome GGML’s shortcomings and enhance user experience.
  • Extensibility: It allows for the addition of new features while maintaining compatibility with older models.
  • Stability: GGUF focuses on eliminating breaking changes, easing the transition to newer versions.
  • Versatility: Supports a variety of model architectures, extending beyond the LLaMA family.
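The extensibility claim rests on GGUF's metadata section: rather than fixed fields, the header is followed by typed key/value pairs, so new keys can be introduced without breaking readers that ignore them. A hedged sketch of how one string-valued entry is laid out per the specification (the helper function names are mine):

```python
import struct

GGUF_TYPE_STRING = 8  # value-type enum for strings in the GGUF spec

def pack_string(raw: bytes) -> bytes:
    # GGUF strings: uint64 byte length followed by UTF-8 bytes, no terminator
    return struct.pack("<Q", len(raw)) + raw

def pack_kv_string(key: str, value: str) -> bytes:
    # One metadata entry: key string, uint32 value-type tag, then the value
    return (pack_string(key.encode())
            + struct.pack("<I", GGUF_TYPE_STRING)
            + pack_string(value.encode()))

# e.g. the key that tells a loader which architecture the tensors belong to
entry = pack_kv_string("general.architecture", "llama")
```

Since every value carries its own type tag and length, an older reader can skip keys it does not recognize, which is what lets new features land without breaking existing models.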

Cons:

  • Transition Time: Converting existing models to GGUF may require significant time.
  • Adaptation Required: Users and developers must become accustomed to this new format.

Summary:

GGUF represents an upgrade to GGML, offering greater flexibility, extensibility, and compatibility. It aims to streamline the user experience and support a wider range of models beyond the LLaMA family. While GGML was a valuable initial effort, GGUF addresses its limitations, signifying progress in the development of file formats for language models. This transition is expected to benefit the AI community by enhancing model sharing and usage efficiency.
