[Rust] Comparing Rustformers and Candle for LLM Inference
Running Meta/Llama Models Locally
Here we will explore two Rust libraries, Rustformers and Candle, for building a local inference application. I have created a few example repositories for anyone interested in a jumpstart, without having to dig through and edit the libraries’ publicly available examples.
Why use Rust to Run LLMs Locally?
There are a few reasons why you might want to use Rust for running LLMs locally. First, Rust offers better performance, memory management, and safety guarantees than Python’s Transformers library. Second, fine-tuned open-source or proprietary models, potentially running in the cloud, could benefit from reduced compute costs by running efficiently on general-purpose hardware, i.e., CPUs.
Inference with Rustformers
rustformers/llm is an ecosystem of Rust libraries for working with large language models — it’s built on top of the fast, efficient GGML library for machine learning.
I created a barebones example repository using the Rustformers repository examples as a template. The example uses the llm and rustyline libraries to create a conversational interaction between a user and a Llama model. The model is…
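To give a sense of what such a loop looks like, here is a minimal sketch based on the llm crate’s documented API (plus rustyline for line editing and rand for sampling randomness). It is not the repository’s actual code: the model path is a placeholder, all sampling parameters are left at their defaults, and the exact load signature may differ slightly between llm crate versions.

```rust
use std::{convert::Infallible, io::Write, path::Path};

fn main() {
    // Load a GGML-format Llama model from disk.
    // The path below is a placeholder; point it at your own model file.
    let model = llm::load::<llm::models::Llama>(
        Path::new("models/llama-2-7b-chat.ggmlv3.q4_0.bin"),
        llm::TokenizerSource::Embedded,
        Default::default(), // ModelParameters
        llm::load_progress_callback_stdout,
    )
    .unwrap_or_else(|err| panic!("Failed to load model: {err}"));

    // A single session keeps the conversation's context across turns.
    let mut session = model.start_session(Default::default());
    let mut rl = rustyline::DefaultEditor::new().expect("readline init failed");

    loop {
        // Read one user turn; Ctrl-C / Ctrl-D exits the loop.
        let line = match rl.readline(">> ") {
            Ok(line) => line,
            Err(_) => break,
        };

        // Feed the prompt to the model and stream generated tokens to stdout.
        let res = session.infer::<Infallible>(
            &model,
            &mut rand::thread_rng(),
            &llm::InferenceRequest {
                prompt: line.as_str().into(),
                parameters: &llm::InferenceParameters::default(),
                play_back_previous_tokens: false,
                maximum_token_count: None,
            },
            &mut Default::default(), // OutputRequest
            |r| match r {
                llm::InferenceResponse::InferredToken(t) => {
                    print!("{t}");
                    std::io::stdout().flush().unwrap();
                    Ok(llm::InferenceFeedback::Continue)
                }
                _ => Ok(llm::InferenceFeedback::Continue),
            },
        );

        if let Err(err) = res {
            eprintln!("\nInference error: {err}");
        }
        println!();
    }
}
```

Because the session is reused across iterations of the loop, the model sees the whole conversation so far rather than each prompt in isolation, which is what makes the interaction conversational rather than one-shot.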