[Rust] Comparing Rustformers and Candle for LLM Inference

Running Meta/Llama Models Locally

Rohan Kotwani
Init Deep Dive

Here we will explore two Rust libraries, Rustformers and Candle, for building a local LLM inference application. I created a few example repositories for anyone interested in getting a jumpstart without having to dig through and edit the libraries’ publicly available examples.

Why use Rust to Run LLMs Locally?

There are a few reasons why you might want to use Rust for running LLMs locally. First, Rust offers better performance, memory management, and safety guarantees than Python’s Transformers library. Also, fine-tuned open-source or proprietary models, potentially running in the cloud, could benefit from reduced compute costs by running efficiently on general-purpose hardware, i.e., CPUs.

Inference with Rustformers

rustformers/llm is an ecosystem of Rust libraries for working with large language models — it’s built on top of the fast, efficient GGML library for machine learning.
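To make this concrete, here is a minimal sketch of loading a GGML Llama model and streaming a completion with the llm crate, closely following the crate’s documented example. The model path is a placeholder, and exact signatures vary between llm versions.

```rust
use std::io::Write;

use llm::Model;

fn main() {
    // Load a GGML-format Llama model from disk (the path is a placeholder).
    let llama = llm::load::<llm::models::Llama>(
        std::path::Path::new("/path/to/ggml-model.bin"),
        Default::default(),                 // llm::ModelParameters
        llm::load_progress_callback_stdout, // report loading progress
    )
    .unwrap_or_else(|err| panic!("Failed to load model: {err}"));

    // Start a session and stream generated tokens to stdout as they arrive.
    let mut session = llama.start_session(Default::default());
    let res = session.infer::<std::convert::Infallible>(
        &llama,
        &mut rand::thread_rng(),
        &llm::InferenceRequest {
            prompt: "Rust is a cool programming language because",
            ..Default::default()
        },
        &mut Default::default(), // llm::OutputRequest
        |t| {
            print!("{t}");
            std::io::stdout().flush().unwrap();
            Ok(())
        },
    );

    match res {
        Ok(stats) => println!("\n\nInference stats:\n{stats}"),
        Err(err) => println!("\n{err}"),
    }
}
```

Because GGML models are quantized, this runs entirely on the CPU with no Python runtime or GPU required.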

I created a barebones example repository using the Rustformers repository’s examples as a template. The example uses the llm and rustyline crates to create a conversational interaction between a user and a Llama model. The model is…
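The conversational loop itself can be as simple as a readline-style REPL. Below is a minimal sketch assuming a recent rustyline version (where DefaultEditor is available); ask_model is a hypothetical stand-in for running the inference session from the previous snippet on each prompt, not the exact code from my repository.

```rust
use rustyline::error::ReadlineError;
use rustyline::DefaultEditor;

// Hypothetical helper: in the real example this would feed the prompt
// into the llm inference session shown earlier and return the reply.
fn ask_model(prompt: &str) -> String {
    format!("(model reply to: {prompt})")
}

fn main() -> rustyline::Result<()> {
    // rustyline provides readline-style input with history and line editing.
    let mut rl = DefaultEditor::new()?;
    println!("Chat with the model; Ctrl-C or Ctrl-D to quit.");
    loop {
        match rl.readline(">> ") {
            Ok(line) => {
                let _ = rl.add_history_entry(line.as_str());
                println!("{}", ask_model(&line));
            }
            // Ctrl-C or Ctrl-D ends the conversation.
            Err(ReadlineError::Interrupted) | Err(ReadlineError::Eof) => break,
            Err(err) => {
                eprintln!("Error: {err}");
                break;
            }
        }
    }
    Ok(())
}
```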
