Submerging with Orca: The Deep Waters of Rust’s Dynamic Dispatch

Santiago Medina
7 min read · Oct 12, 2023


Ever dove deep into a project, only to hit a wall you never saw coming? That’s exactly what happened to me with Orca, my latest open-source brainchild. Picture this: Orca, an innovative Rust-based LLM (Large Language Model) orchestration framework, aiming to revolutionize serverless LLM orchestration by bringing workflows not just to the expansive web but also to compact edge devices. Everything was going swimmingly until I stumbled upon a challenge that left me scratching my head for hours: dynamic dispatch in Rust. Stick around as I share this unexpected journey, the hurdles I faced, and how I tackled them. Believe me, if Rust has been on your radar, this is a tale you won’t want to miss.


Let’s start by setting the context. LLM Chains are a series of interconnected components that work together to process user inputs and generate responses. They not only use models but can also incorporate other tools, such as querying vector databases, loading records (or documents), and so on. Here’s a simple LLM Chain in the latest version of Orca, which takes a prompt template and generates a response:

use serde::Serialize;
// (imports of OpenAIClient and LLMChain from the orca crate omitted for brevity)

#[derive(Serialize)]
pub struct Data {
    country1: String,
    country2: String,
}

#[tokio::main]
async fn main() {
    let client = OpenAIClient::new();
    let prompt = r#"
{{#chat}}
{{#user}}
What is the capital of {{country1}}?
{{/user}}
{{#assistant}}
Paris
{{/assistant}}
{{#user}}
What is the capital of {{country2}}?
{{/user}}
{{/chat}}
"#;
    let mut chain = LLMChain::new(&client, prompt);
    chain.load_context(&Data {
        country1: "France".to_string(),
        country2: "Germany".to_string(),
    });
    let res = chain.execute().await.unwrap().content();

    assert!(res.contains("Berlin") || res.contains("berlin"));
}

(Note: Orca is constantly evolving, so this might not represent the latest way to run a simple chain).

My challenge surfaced when passing a client (or model) into the chain. Those familiar with Rust will recognize its unique approach to ownership: when you pass a variable to a function directly, the function takes over ownership, which means you can’t use that variable in the same context after the function call. However, Rust offers a way around this with borrowing, signified by the & operator. Borrowing doesn’t transfer ownership; it merely lends a reference. In our example, by using the & operator, we’re not giving the chain permanent ownership of client; we’re simply allowing it temporary access.
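For readers newer to Rust, here is a minimal, self-contained sketch of the difference; the functions and values are hypothetical, purely for illustration:

fn consume(s: String) {
    println!("{s}");
} // `s` is dropped here; the caller no longer owns it

fn borrow(s: &str) {
    println!("{s}");
} // only the reference goes out of scope; the caller keeps ownership

fn main() {
    let owned = String::from("hello");
    consume(owned);
    // println!("{owned}"); // error[E0382]: borrow of moved value: `owned`

    let kept = String::from("world");
    borrow(&kept);
    println!("{kept}"); // fine: we only lent a reference
}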

Furthermore, my aim is for our chain to accept any LLM model, not just OpenAI’s. This is where dynamic dispatch comes in. Dynamic dispatch is a form of polymorphism that allows us to interact with data that implements a certain trait without knowing its exact type, enhancing flexibility in function calling. In the code below, our chain member variable is defined as &'llm dyn LLM, which means we can take in any reference to a type that implements the LLM trait, and that persists for at least the 'llm lifetime. Traits and lifetimes aren't the primary focus of this blog, but you can delve deeper into them here: Traits and Lifetimes.

pub struct LLMChain<'llm> {
    // ... more member variables ...

    /// A reference to the LLM that this chain will use to process the prompts.
    llm: &'llm dyn LLM,

    // ... more member variables ...
}

impl<'llm> LLMChain<'llm> {
    pub fn new(llm: &'llm dyn LLM, prompt: &str) -> LLMChain<'llm> {
        // ... implementation ...
    }
}

“Why doesn’t this setup work?” you might ask. It does, as long as Orca’s functionality remains basic. Yet I envisioned Orca to be more; I wanted it to harness the power of concurrency and parallelism. That’s where the waters got murky.

In my quest to implement a non-distributed MapReduce chain for Orca, I aimed for each worker to operate in a separate thread, ensuring concurrent, and hence speedier, inferences (setting aside computational constraints for now). The roadblock arose when trying to move the borrowed client variable into a thread, a definite no-go in Rust. Seasoned Rustaceans would likely have foreseen this, but with my sub-year Rust experience, the borrow checker and I are still on shaky ground. In a nutshell, since client is a borrowed value, Rust's compiler throws a fit if you attempt to move it into a thread. Why? Because the original owner of the value might cease to exist before the thread concludes, and in Rust a reference must never outlive the value it borrows. If a borrowed value could be moved into a thread, and the original owner of that value was then dropped or went out of scope, the thread could end up accessing a dangling reference, leading to undefined behavior.
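Here is a minimal sketch of the kind of code the compiler rejects, using a plain std::thread::spawn and a placeholder Client type rather than Orca's actual worker setup:

use std::thread;

struct Client; // placeholder for something like OpenAIClient

fn spawn_worker(client: &Client) {
    // Does NOT compile: `thread::spawn` requires its closure to be
    // `'static`, but `client` is a borrow that only lives as long as
    // this function call ("borrowed data escapes outside of function").
    thread::spawn(move || {
        let _ = client;
    });
}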

To address this challenge, I turned to Rust’s Arc (Atomically Reference Counted). At its core, an Arc is a kind of smart pointer—a tool that not only points to data but also manages the memory of that data. What makes Arc special is its thread-safe nature, meaning it can be used safely across multiple threads. Unlike a regular pointer, which just points to a memory location, Arc keeps track of how many references or "clones" exist to the data it points to. Whenever you create a clone of an Arc, the reference count increases, signifying that another reference to the data exists. And when this count drops to zero—when no more references exist—the data is automatically cleaned up or deallocated. This feature of Arc, which can be closely compared to C++’s std::shared_ptr, made it perfect for my needs. By defining the llm variable as Arc<dyn LLM>, I could easily—and without much overhead—clone its value and pass it between threads.
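To make that concrete, here is a small self-contained sketch of the pattern; the Greeter trait is just a stand-in for Orca's LLM trait:

use std::sync::Arc;
use std::thread;

trait Greeter: Send + Sync {
    fn greet(&self) -> String;
}

struct English;
impl Greeter for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

fn main() {
    // One allocation, shared by all workers via reference counting.
    let greeter: Arc<dyn Greeter> = Arc::new(English);

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let greeter = Arc::clone(&greeter); // bumps the refcount
            thread::spawn(move || println!("{}", greeter.greet()))
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
} // last Arc dropped here; the English value is deallocated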

Herein lies the crux of the matter that inspired this blog post. My vision for Orca’s interface was one of elegance and user-friendliness. I didn’t want to burden users with the task of wrapping their values in Arcs before feeding them into a chain object. To me, this interface seemed less than ideal:

let client = Arc::new(OpenAIClient::new());
let prompt = "my prompt";
let mut chain = LLMChain::new(client, prompt);

My inaugural solution? Wrap the model in an Arc nested inside another type, like so:

pub struct OpenAI(Arc<OpenAIClient>);

Naively, I assumed that altering the implementations to reference OpenAI instead of OpenAIClient would enable this:

let client = OpenAI::new();
let prompt = "my prompt";
let chain = LLMChain::new(client, prompt);

where LLMChain::new is defined like so:

pub fn new(llm: Arc<dyn LLM>, prompt: &str) -> LLMChain<'llm>

Regrettably, this proved futile. Since LLMChain now demanded Arc<dyn LLM> as a parameter, a type mismatch emerged. Though OpenAI encapsulated Arc<OpenAIClient> and implemented LLM, why did the compiler expect me to re-wrap OpenAI in another Arc? Such redundancy seemed both non-idiomatic and impractical. My frustration stemmed from the fact that new wasn't permitted to accept just dyn LLM. Thus, my predicament became:

Why can’t the new method ingest dyn LLM directly, instead of &dyn LLM or smart pointers like Box, Arc, Rc, and the like?
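Before getting to the answers, here is a stripped-down reproduction of the mismatch itself, with placeholder types rather than Orca's real API:

use std::sync::Arc;

trait LLM {}

struct OpenAIClient;
impl LLM for OpenAIClient {}

struct OpenAI(Arc<OpenAIClient>);
impl LLM for OpenAI {}

fn new(_llm: Arc<dyn LLM>) {}

fn main() {
    // Works: Arc<OpenAIClient> coerces to Arc<dyn LLM> (an unsized coercion).
    new(Arc::new(OpenAIClient));

    // Does NOT compile: `OpenAI` merely *contains* an Arc; it is not itself
    // an Arc, so there is no coercion from OpenAI to Arc<dyn LLM>.
    // new(OpenAI(Arc::new(OpenAIClient))); // error[E0308]: mismatched types

    // The compiler instead demands the redundant double wrapping:
    new(Arc::new(OpenAI(Arc::new(OpenAIClient))));
}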

My subsequent research yielded these insights:

  1. Size at Compile Time: In Rust, every value must possess a predetermined size at compile time unless ensconced behind some form of pointer or indirection. For specific structs or enums, the compiler can ascertain the precise size. But for a trait object (e.g., dyn LLM), the size remains ambiguous since any type implementing the trait could be in play during runtime. This restriction prevents having a standalone variable of type dyn LLM because its size is unpredictable at compile time. Consequently, you resort to fixed-size pointers or smart pointers (like &dyn LLM, Box<dyn LLM>, Arc<dyn LLM>, etc.).
  2. Vtables: In Rust, when using trait objects like dyn LLM, the exact methods that should be called at runtime aren't known at compile time because any type can implement the trait. To address this, Rust uses something called a vtable. A vtable is essentially a list of function pointers. When you have a dyn LLM trait object, it comes with two pieces of information: a reference to the actual data (the struct or instance that implements the trait) and a reference to the vtable for that specific type. When you call a method on a dyn LLM object, Rust looks up the appropriate method in the vtable and then calls it. This allows for flexibility but adds a layer of indirection compared to calling methods on regular, known types (the sketch after this list makes the two-pointer layout concrete).
  3. Simpler Stack: A straightforward stack with predefined sizes facilitates Rust’s commitment to no garbage collection. Heap allocations (evident in Box, Arc, and others) offer greater flexibility regarding sizes and lifetimes.
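Point 2 is easy to verify: a reference to a trait object is a "fat pointer" that carries both the data pointer and the vtable pointer, so it is twice the width of an ordinary reference:

use std::mem::size_of;

trait LLM {}

fn main() {
    // An ordinary reference is one pointer wide.
    assert_eq!(size_of::<&u64>(), size_of::<usize>());

    // A trait-object reference is two pointers wide:
    // one for the data, one for the vtable.
    assert_eq!(size_of::<&dyn LLM>(), 2 * size_of::<usize>());
}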

So, to piece it all together:

Accepting dyn LLM as an argument would mean passing a trait object by value, a tactic Rust prohibits because the trait object's size is indeterminate at compile time. One cannot simply "move" or pass a dyn Trait by value. The solution lies in applying some form of indirection (references or smart pointers) to work around this limitation.
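In code, that boils down to the following: every signature that keeps the trait object behind indirection compiles, while the by-value version does not (a sketch with a placeholder trait, not Orca's actual definitions):

use std::rc::Rc;
use std::sync::Arc;

trait LLM {}

// All of these compile: the trait object always sits behind a pointer.
fn by_ref(_llm: &dyn LLM) {}
fn by_box(_llm: Box<dyn LLM>) {}
fn by_rc(_llm: Rc<dyn LLM>) {}
fn by_arc(_llm: Arc<dyn LLM>) {}

// This does NOT compile: a bare `dyn LLM` has no size known at compile
// time, so it cannot be passed by value.
// fn by_value(_llm: dyn LLM) {} // error[E0277]: the size for values of
//                               // type `dyn LLM` cannot be known at
//                               // compilation time

fn main() {}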

Wrapping Up

Navigating the intricate pathways of Rust’s dynamic dispatch has been quite the adventure. As of now, I’m still exploring various solutions to make Orca as user-friendly as possible. While the deep dive into Rust’s workings provided a clearer understanding, there remains the hurdle that users must create the Arc themselves. It's not the sleekest solution, but it's where we currently stand.

If anyone out there has insights, alternative strategies, or perhaps a more elegant way of addressing this, I’m all ears! Feel free to get in touch, or even better, open a PR on the repository linked here. And if you resonate with Orca and its vision, please do not forget to star the repository as well. Together, we can make Orca swim even more smoothly in the vast ocean of Rust.

Orca: https://github.com/scrippt-tech/orca
LinkedIn: https://www.linkedin.com/in/santiago-med/
X: https://twitter.com/santiagomedr
