Mastering LLM File Formats with Python

Boqiang & Henry
3 min read · Apr 11, 2024

Large language models, such as GPT-3, BERT, and others, are trained on massive amounts of text data and contain billions of parameters. To efficiently store and distribute these models, various file formats have been developed by different organizations and frameworks. In this article, we’ll explore some of the most common file formats used for large language models and provide examples of how to load and run them using Python.

GGML and GGUF: Compact and Efficient Formats

GGML and GGUF are file formats specifically designed for efficient storage and loading of large machine learning models. GGML takes its name from the initials of its creator, Georgi Gerganov, while GGUF (GGML Universal File) is its successor.

GGML is an optimized binary format that reduces the memory footprint and loading time of large models, making it practical to run them on consumer hardware.
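Much of that saving comes from quantization: storing weights in low-precision integer types instead of 32-bit floats. Here is a minimal sketch of the idea using absmax 8-bit quantization — an illustration only, not GGML's actual kernels, which use block-wise quantization schemes:

```python
import numpy as np

def quantize_int8(weights):
    # Absmax quantization: map the largest-magnitude weight to 127
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate float32 weights from int8 codes
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(w)
# int8 storage is 4x smaller than float32, at the cost of some precision
```

Each dequantized weight differs from the original by at most half the scale, which is why quantized models trade a small accuracy loss for a large memory win.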

GGUF, on the other hand, is the successor to GGML: it stores model metadata as extensible key-value pairs, so new fields and architectures can be added without breaking compatibility with existing files.
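Because the GGUF header has a fixed layout, you can inspect a file's version and metadata counts with a few lines of standard-library Python. This is a sketch based on the header layout described in the GGUF specification in the ggml repository (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key-value count), not a full parser:

```python
import struct

def read_gguf_header(path):
    # Per the GGUF spec: magic b"GGUF", then little-endian
    # uint32 version, uint64 tensor count, uint64 metadata KV count
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

Reading just these few bytes is often enough to sanity-check a downloaded model file before loading it.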

To load and run a GGUF model in Python, you can use the ctransformers library:

from ctransformers import AutoModelForCausalLM


class GGUFModel:
    def __init__(self, local_path, model_file):
        # model_type selects the architecture, e.g. "llama" for Llama-family models
        self.model = AutoModelForCausalLM.from_pretrained(
            local_path, model_file=model_file, model_type="llama"
        )

    def generate(self, prompt, max_new_tokens=128):
        # Calling the underlying model directly runs text generation
        return self.model(prompt, max_new_tokens=max_new_tokens)
