Optimizing Solana Programs

Het Dagli
14 min read · 6 days ago


Average Solano Optimizoooor

Actionable insights

  • Use zero-copy deserialization for large data structures and high-frequency operations.
  • Implement custom serialization/deserialization to avoid Borsh overhead.
  • Use solana_nostd_entrypoint instead of solana_program’s bloated entrypoint.
  • Mark critical functions with #[inline(always)] for potential performance gains.
  • Use bit manipulation for efficient instruction parsing.
  • Minimize dynamic allocations, favouring stack-based data structures.
  • Use Solana-specific C syscalls.
  • Measure compute unit usage to guide optimization efforts.

Solana devs have a spectrum of choices for writing programs, each with its own trade-offs between ease of use, performance, and safety. At one end, we have the Anchor framework, providing a high-level abstraction that simplifies development but may introduce some overhead. At the other extreme, we find low-level approaches using unsafe Rust and direct syscalls, offering maximum performance at the cost of increased complexity and potential security risks.

Between these extremes lie options like Anchor with zero-copy deserialization and native Rust programming. Each approach represents a different balance of development speed, runtime efficiency, and safety guarantees. The key question for devs is not just how to optimize, but when and to what degree optimization is necessary.

This blog post explores these options in depth, providing a roadmap for devs to navigate the optimization landscape. We’ll examine:

1. Anchor: The standard high-level framework
2. Anchor with zero-copy: Optimizing for large data structures
3. Native Rust: Balancing control and ease of use
4. Unsafe Rust with direct syscalls: Pushing the limits of performance

The goal is not to prescribe a one-size-fits-all solution but to equip developers with the knowledge to make informed decisions based on their specific use cases.

By the end of this post, you’ll have a clearer picture of how to think about these different levels of abstraction and when to consider moving down the optimization path. Remember, the most optimized code isn’t always the best solution — it’s about finding the right balance for your project’s needs.

For the impatient:

TL;DR

Compute Units

Solana’s high-performance architecture relies on efficient resource management. At the heart of this system are compute units (CUs) — a measure of computational resources consumed by a transaction.

Why Care About Compute Units?

1. Transaction Success: Each transaction has a CU budget. Exceeding it leads to failure.
2. Cost Efficiency: Lower CU usage lets you request a smaller CU limit, which lowers priority fees (the base fee is charged per signature, not per CU).
3. User Experience: Optimized programs execute faster, enhancing overall UX.
4. Scalability: Efficient programs allow more transactions per block, improving network throughput.
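
To put a rough number on the cost point, here is a minimal sketch of the fee arithmetic. It assumes the fee model as I understand it: a base fee of 5,000 lamports per signature, plus a priority fee equal to the requested CU limit times a price in micro-lamports per CU, rounded up. The function and constant names are hypothetical and not part of any Solana crate.

```rust
/// Assumed default base fee per transaction signature, in lamports.
const BASE_FEE_LAMPORTS_PER_SIGNATURE: u64 = 5_000;
/// One lamport is one million micro-lamports.
const MICRO_LAMPORTS_PER_LAMPORT: u128 = 1_000_000;

/// Priority fee in lamports for a requested CU limit and a CU price
/// quoted in micro-lamports per CU. Rounds up to a whole lamport (assumption).
fn priority_fee_lamports(cu_limit: u32, micro_lamports_per_cu: u64) -> u64 {
    let micro = cu_limit as u128 * micro_lamports_per_cu as u128;
    ((micro + MICRO_LAMPORTS_PER_LAMPORT - 1) / MICRO_LAMPORTS_PER_LAMPORT) as u64
}

/// Total fee: base fee per signature plus the priority fee.
fn total_fee_lamports(signatures: u64, cu_limit: u32, micro_lamports_per_cu: u64) -> u64 {
    signatures * BASE_FEE_LAMPORTS_PER_SIGNATURE
        + priority_fee_lamports(cu_limit, micro_lamports_per_cu)
}
```

Under these assumptions the priority fee scales with the CU limit you request rather than with what the program actually burns, so a leaner program only saves money if you also request a tighter limit.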

Measuring Compute Units

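The solana_program::log::sol_log_compute_units() syscall logs the number of compute units remaining at a specific point in a program’s execution. Logging it before and after a block of code and subtracting the two readings gives the CUs that block consumed.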

Here is a simple compute_fn! macro implementation using the syscall.

#[macro_export]
macro_rules! compute_fn {
    ($msg:expr=> $($tt:tt)*) => {
        // Opening marker, then the CUs remaining before the block runs
        ::solana_program::msg!(concat!($msg, " {"));
        ::solana_program::log::sol_log_compute_units();
        let res = { $($tt)* };
        // CUs remaining after the block, then the closing marker
        ::solana_program::log::sol_log_compute_units();
        ::solana_program::msg!(concat!(" } // ", $msg));
        res
    };
}

This is taken from source code where the macro is used to log the CU consumed by various code blocks.
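
Since the macro logs two readings, the number we care about is their difference. Here is a small host-side sketch that pulls the two values out of the transaction logs and subtracts them. It assumes the log line looks like "Program consumption: N units remaining"; that format and both helper names are assumptions for illustration.

```rust
/// Extracts the remaining-CU figure from a log line such as
/// "Program consumption: 199650 units remaining" (format assumed).
fn parse_remaining(line: &str) -> Option<u64> {
    let rest = line.trim().strip_prefix("Program consumption: ")?;
    let digits = rest.strip_suffix(" units remaining")?;
    digits.parse().ok()
}

/// CUs spent between two readings. Returns None if either line fails to
/// parse or if the readings are out of order.
fn consumed_between(before: &str, after: &str) -> Option<u64> {
    let before = parse_remaining(before)?;
    let after = parse_remaining(after)?;
    before.checked_sub(after)
}
```

Keep in mind that the logging calls themselves consume some compute units, so the difference is best read as an upper bound on what the wrapped block costs.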

The benchmark is a counter program with two instructions: initialize and increment.

We will write this same counter program in four different ways (Anchor, Anchor with zero_copy, native Rust, and unsafe Rust) and compare the CU usage of each.

Initializing an account and then making a minor change to it (incrementing) is a decent benchmark for comparing the approaches. We are not using PDAs for now.

For the impatient, here are the CU comparisons for the four approaches:

[Figure: CU comparisons, one chart for the initialize instruction and one for the increment instruction]

Let’s get going…

Zero-Copy Deserialization

Zero-copy deserialization allows us to interpret account data directly, without allocating new memory or copying data. This technique can reduce CPU usage, lower memory consumption, and potentially lead to more efficient instructions.

Let us start with a basic Anchor counter program:

use anchor_lang::prelude::*;

declare_id!("37oUa3WkeqwnFxSCqyMnpC3CfTSwtvyJxnwYQc3u6U7C");

#[program]
pub mod counter {
    use super::*;

    pub fn initialize(ctx: Context<Initialize>) -> Result<()> {
        let counter = &mut ctx.accounts.counter;
        counter.count = 0;
        Ok(())
    }

    pub fn increment(ctx: Context<Update>) -> Result<()> {
        let counter = &mut ctx.accounts.counter;
        // Not doing checked_add, wrapping add or any overflow checks
        // to keep it simple
        counter.count += 1;
        Ok(())
    }
}

#[derive(Accounts)]
pub struct Initialize<'info> {
    #[account(init, payer = user, space = 8 + 8)]
    pub counter: Account<'info, Counter>,
    #[account(mut)]
    pub user: Signer<'info>,
    pub system_program: Program<'info, System>,
}

#[derive(Accounts)]
pub struct Update<'info> {
    #[account(mut)]
    pub counter: Account<'info, Counter>,
    pub user: Signer<'info>,
}

#[account]
pub struct Counter {
    pub count: u64,
}

Nothing fancy above. Now let’s make it fancy with zero_copy:

use anchor_lang::prelude::*;

declare_id!("7YkAh5yHbLK4uZSxjGYPsG14VUuDD6RQbK6k4k3Ji62g");

#[program]
pub mod counter {
    use super::*;

    pub fn initialize(ctx: Context<Initialize>) -> Result<()> {
        let mut counter = ctx.accounts.counter.load_init()?;
        counter.count = 0;
        Ok(())
    }

    pub fn increment(ctx: Context<Update>) -> Result<()> {
        let mut counter = ctx.accounts.counter.load_mut()?;
        counter.count += 1;
        Ok(())
    }
}

#[derive(Accounts)]
pub struct Initialize<'info> {
    #[account(init, payer = user, space = 8 + std::mem::size_of::<CounterData>())]
    pub counter: AccountLoader<'info, CounterData>,
    #[account(mut)]
    pub user: Signer<'info>,
    pub system_program: Program<'info, System>,
}

#[derive(Accounts)]
pub struct Update<'info> {
    #[account(mut)]
    pub counter: AccountLoader<'info, CounterData>,
    pub user: Signer<'info>,
}

#[account(zero_copy)]
pub struct CounterData {
    pub count: u64,
}

Key Changes:

1. AccountLoader Instead of Account:
We now use AccountLoader<'info, CounterData> instead of Account<'info, Counter>. This allows for zero-copy access to the account’s data.

2. Zero-Copy Attribute:
The #[account(zero_copy)] attribute on CounterData indicates that this struct can be directly interpreted from raw bytes in memory.

3. Direct Data Access:
In the initialize and increment functions, we use load_init() and load_mut() respectively to get mutable access to the account data without copying it.

4. Memory Layout Guarantees:
The zero_copy attribute ensures that CounterData has a consistent memory layout, allowing safe reinterpretation from raw bytes.
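
To make the idea concrete outside of Anchor, here is a minimal std-only sketch of what "interpreting account data directly" means. It is not how AccountLoader is implemented; the load_mut function, the AccountBytes wrapper, and the 8-byte discriminator constant are stand-ins assumed for illustration.

```rust
use std::mem::{align_of, size_of};

/// Leading bytes reserved for the account discriminator.
const DISCRIMINATOR_LEN: usize = 8;

/// Fixed C layout, so the bytes in the buffer map onto the fields directly.
#[repr(C)]
#[derive(Debug)]
struct CounterData {
    count: u64,
}

/// 8-byte aligned buffer standing in for the account's data.
#[repr(C, align(8))]
struct AccountBytes([u8; DISCRIMINATOR_LEN + size_of::<CounterData>()]);

/// Returns a mutable view over the bytes after the discriminator,
/// without copying them into a new struct.
fn load_mut(data: &mut [u8]) -> Option<&mut CounterData> {
    let body = data.get_mut(DISCRIMINATOR_LEN..)?;
    if body.len() < size_of::<CounterData>() {
        return None;
    }
    let ptr = body.as_mut_ptr();
    if (ptr as usize) % align_of::<CounterData>() != 0 {
        return None;
    }
    // SAFETY: length and alignment were checked above, CounterData is
    // repr(C) with a single u64 (every bit pattern is valid), and the
    // returned borrow is tied to the lifetime of `data`.
    Some(unsafe { &mut *(ptr as *mut CounterData) })
}
```

Writing through the returned reference mutates the buffer in place, which is why there is no separate serialize step at the end of the instruction.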

This implementation reduced the CU usage of the initialize instruction from 5095 to 5022 (about 1.4%) and of the increment instruction from 1162 to 1124 (about 3.3%).

For our case, zero-copy leads to minimal, largely insignificant improvements.

However, zero-copy deserialization might come in handy for:

1. Large Data Structures: When dealing with accounts storing complex or extensive data, zero-copy can substantially reduce CPU and memory usage.

2. High-Frequency Operations: Programs that frequently read or write to accounts benefit from the reduced overhead of zero-copy deserialization.

Trade-offs and Considerations

Zero-copy isn’t without its challenges:

1. Increased Complexity: The code becomes slightly more complex, requiring careful handling of raw data.

2. Safety Considerations: Direct memory access requires extra attention to ensure data integrity and prevent errors.

3. Compatibility: Not all data structures are suitable for zero-copy deserialization. They must have a predictable memory layout.

In practice, the decision to use zero-copy should be based on your specific use case. For simple programs like our counter, the benefits might be minimal. However, as your programs grow in complexity and handle larger data structures, zero-copy can become a powerful tool for optimization.

While zero-copy optimization didn’t yield significant improvements for our simple counter program, the quest for efficiency doesn’t end here. Let’s explore another avenue: writing native Solana programs in Rust without the Anchor framework. This approach offers more control and potential for optimization, albeit with increased complexity.

Going Native

Native Rust programs on Solana provide a lower-level interface, requiring developers to handle many tasks that Anchor automates. This includes account deserialization, serialization, and various security checks. While this demands more from the developer, it also opens up opportunities for fine-tuned optimizations.

Let’s examine the native Rust implementation of our counter program:

use solana_program::{
    account_info::{next_account_info, AccountInfo},
    entrypoint,
    entrypoint::ProgramResult,
    program_error::ProgramError,
    pubkey::Pubkey,
    rent::Rent,
    system_instruction,
    program::invoke,
    sysvar::Sysvar,
};
use std::mem::size_of;

// Define the state struct
struct Counter {
    count: u64,
}

// Declare and export the program's entrypoint
entrypoint!(process_instruction);

// Program entrypoint's implementation
pub fn process_instruction(
    program_id: &Pubkey,
    accounts: &[AccountInfo],
    instruction_data: &[u8],
) -> ProgramResult {
    // The first byte selects the instruction
    let instruction = instruction_data
        .get(0)
        .ok_or(ProgramError::InvalidInstructionData)?;

    match instruction {
        0 => initialize(program_id, accounts),
        1 => increment(accounts),
        _ => Err(ProgramError::InvalidInstructionData),
    }
}

fn initialize(program_id: &Pubkey, accounts: &[AccountInfo]) -> ProgramResult {
    let account_info_iter = &mut accounts.iter();
    let counter_account = next_account_info(account_info_iter)?;
    let user = next_account_info(account_info_iter)?;
    let system_program = next_account_info(account_info_iter)?;

    if !user.is_signer {
        return Err(ProgramError::MissingRequiredSignature);
    }

    // Create the account via CPI if this program does not own it yet
    if counter_account.owner != program_id {
        let rent = Rent::get()?;
        let space = size_of::<Counter>();
        let rent_lamports = rent.minimum_balance(space);

        invoke(
            &system_instruction::create_account(
                user.key,
                counter_account.key,
                rent_lamports,
                space as u64,
                program_id,
            ),
            &[user.clone(), counter_account.clone(), system_program.clone()],
        )?;
    }

    let counter_data = Counter { count: 0 };
    counter_data.serialize(&mut &mut counter_account.data.borrow_mut()[..])?;

    Ok(())
}

fn increment(accounts: &[AccountInfo]) -> ProgramResult {
    let account_info_iter = &mut accounts.iter();
    let counter_account = next_account_info(account_info_iter)?;
    let user = next_account_info(account_info_iter)?;

    if !user.is_signer {
        return Err(ProgramError::MissingRequiredSignature);
    }

    // Note: there is no explicit owner check here. The runtime rejects
    // writes to account data the program does not own, but an explicit
    // check is still advisable in real programs.
    let mut counter_data = Counter::deserialize(&counter_account.data.borrow())?;

    // Not doing checked_add, wrapping add or any overflow checks to keep it simple
    counter_data.count += 1;
    counter_data.serialize(&mut &mut counter_account.data.borrow_mut()[..])?;

    Ok(())
}

impl Counter {
    fn serialize(&self, data: &mut [u8]) -> ProgramResult {
        if data.len() < size_of::<Self>() {
            return Err(ProgramError::AccountDataTooSmall);
        }

        // First 8 bytes is the count
        data[..8].copy_from_slice(&self.count.to_le_bytes());
        Ok(())
    }

    fn deserialize(data: &[u8]) -> Result<Self, ProgramError> {
        if data.len() < size_of::<Self>() {
            return Err(ProgramError::AccountDataTooSmall);
        }

        // First 8 bytes is the count
        let count = u64::from_le_bytes(data[..8].try_into().unwrap());
        Ok(Self { count })
    }
}

Key Differences and Considerations:

1. Manual Instruction Parsing:
Unlike Anchor, which automatically routes instructions, we manually parse the instruction data and route it to the appropriate function.

let instruction = instruction_data
    .get(0)
    .ok_or(ProgramError::InvalidInstructionData)?;

match instruction {
    0 => initialize(program_id, accounts),
    1 => increment(accounts),
    _ => Err(ProgramError::InvalidInstructionData),
}
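
As programs grow beyond two instructions, a common way to keep this routing readable is to decode the tag byte into an enum first. Here is a minimal std-only sketch of that pattern; CounterInstruction, ParseError, and unpack are hypothetical names and not taken from the counter program above.

```rust
#[derive(Debug, PartialEq)]
enum CounterInstruction {
    Initialize,
    Increment,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    /// No instruction data was supplied at all.
    Empty,
    /// The first byte did not match any known instruction.
    UnknownTag(u8),
}

impl CounterInstruction {
    /// Decodes the first byte of the instruction data as the instruction tag.
    /// Any remaining bytes would carry instruction arguments (unused here).
    fn unpack(data: &[u8]) -> Result<Self, ParseError> {
        let (tag, _rest) = data.split_first().ok_or(ParseError::Empty)?;
        match *tag {
            0 => Ok(Self::Initialize),
            1 => Ok(Self::Increment),
            other => Err(ParseError::UnknownTag(other)),
        }
    }
}
```

The dispatcher can then match on the enum, and unknown tags are rejected in one place instead of being scattered across handlers.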

2. Account Management:
We use next_account_info to iterate through accounts, manually checking for signers and owners. Anchor handles this automatically with its #[derive(Accounts)] macro.

let account_info_iter = &mut accounts.iter();
let counter_account = next_account_info(account_info_iter)?;
let user = next_account_info(account_info_iter)?;

if !user.is_signer {
    return Err(ProgramError::MissingRequiredSignature);
}

3. Custom Serialization:
We implement custom serialize and deserialize methods for our Counter struct. Anchor uses borsh serialization by default, abstracting this away.

impl Counter {
    fn serialize(&self, data: &mut [u8]) -> ProgramResult {
        if data.len() < size_of::<Self>() {
            return Err(ProgramError::AccountDataTooSmall);
        }

        // First 8 bytes is the count
        data[..8].copy_from_slice(&self.count.to_le_bytes());
        Ok(())
    }

    fn deserialize(data: &[u8]) -> Result<Self, ProgramError> {
        if data.len() < size_of::<Self>() {
            return Err(ProgramError::AccountDataTooSmall);
        }

        // First 8 bytes is the count
        let count = u64::from_le_bytes(data[..8].try_into().unwrap());
        Ok(Self { count })
    }
}

4. System Program Interactions:
Creating accounts involves direct interaction with the System Program using invoke and doing a CPI, which Anchor simplifies with its init constraint.

invoke(
    &system_instruction::create_account(
        user.key,
        counter_account.key,
        rent_lamports,
        space as u64,
        program_id,
    ),
    &[user.clone(), counter_account.clone(), system_program.clone()],
)?;
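
The rent_lamports value passed to create_account is the rent-exempt minimum for the account size. As a rough worked example, here is the arithmetic behind it, assuming the default rent parameters as I understand them (128 bytes of per-account overhead, 3,480 lamports per byte-year, and a two-year exemption threshold). On-chain, Rent::get() reads the live values, so treat these constants and the function name as assumptions.

```rust
/// Bytes of bookkeeping charged for every account, on top of its data.
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;
/// Assumed default rent rate.
const LAMPORTS_PER_BYTE_YEAR: u64 = 3_480;
/// Assumed default number of years of rent needed to be rent exempt.
const EXEMPTION_THRESHOLD_YEARS: u64 = 2;

/// Rent-exempt minimum balance, in lamports, for `data_len` bytes of data.
fn minimum_balance(data_len: usize) -> u64 {
    (ACCOUNT_STORAGE_OVERHEAD + data_len as u64)
        * LAMPORTS_PER_BYTE_YEAR
        * EXEMPTION_THRESHOLD_YEARS
}
```

For our 8-byte counter this works out to (128 + 8) * 3,480 * 2 = 946,560 lamports under those assumptions.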

5. Fine-grained Control:
In general, native programs offer more control over data layout and processing, potentially allowing for more optimized code.

How to think about Anchor vs. Native?

1. Explicit vs. Implicit:
Native programs require explicit handling of many aspects that Anchor manages implicitly. This includes account validation, serialization, and instruction routing.

2. Security Considerations:
Without Anchor’s built-in checks, you must be vigilant about implementing proper security measures, such as checking account ownership and signer status.

3. Performance Tuning:
Native programs allow for more fine-grained performance optimizations but require a deeper understanding of Solana’s runtime behaviour.

4. Boilerplate Code:
Expect to write more boilerplate code for common operations that Anchor abstracts away.

5. Learning Curve:
While potentially more efficient, native programming has a steeper learning curve and requires more in-depth knowledge of Solana’s architecture.

TL;DR

The biggest limiting factor in going from Anchor to native is handling serialization and deserialization yourself. In our case it was relatively simple, but it gets complicated as the state grows more complex.

However, the Borsh serialization used by Anchor is computationally costly, so the effort can be worth it.
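
To get a feel for how the manual approach scales, here is a sketch of a slightly larger state struct packed at fixed byte offsets. The CounterState struct, its fields, and the LayoutError type are hypothetical and only meant to show the bookkeeping that every new field adds.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
enum LayoutError {
    TooSmall,
}

#[derive(Debug, PartialEq, Clone)]
struct CounterState {
    count: u64,
    authority: [u8; 32],
    bump: u8,
}

impl CounterState {
    // Byte ranges of each field inside the account data.
    const COUNT: Range<usize> = 0..8;
    const AUTHORITY: Range<usize> = 8..40;
    const BUMP: usize = 40;
    const LEN: usize = 41;

    /// Writes every field at its fixed offset, little-endian for integers.
    fn pack(&self, dst: &mut [u8]) -> Result<(), LayoutError> {
        if dst.len() < Self::LEN {
            return Err(LayoutError::TooSmall);
        }
        dst[Self::COUNT].copy_from_slice(&self.count.to_le_bytes());
        dst[Self::AUTHORITY].copy_from_slice(&self.authority);
        dst[Self::BUMP] = self.bump;
        Ok(())
    }

    /// Reads every field back from its fixed offset.
    fn unpack(src: &[u8]) -> Result<Self, LayoutError> {
        if src.len() < Self::LEN {
            return Err(LayoutError::TooSmall);
        }
        let mut count = [0u8; 8];
        count.copy_from_slice(&src[Self::COUNT]);
        let mut authority = [0u8; 32];
        authority.copy_from_slice(&src[Self::AUTHORITY]);
        Ok(Self {
            count: u64::from_le_bytes(count),
            authority,
            bump: src[Self::BUMP],
        })
    }
}
```

Three fields already need four constants that must stay in sync by hand, which is the kind of maintenance cost Borsh and Anchor hide from you.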

Our optimization journey doesn’t end here. In the next section, we’ll push the boundaries even further by leveraging direct syscalls and avoiding the Rust standard library.

This approach is challenging, but I promise it will provide some interesting insights into the inner workings of Solana’s runtime.

Pushing the Limits with Unsafe Rust and Direct Syscalls

We’ll now examine how to leverage unsafe Rust and direct syscalls to squeeze out even more performance from our counter program. This approach, while more complex and requiring careful handling, can lead to significant CU savings.

Let’s look at a highly optimized version of our counter program:

use solana_nostd_entrypoint::{
    basic_panic_impl, entrypoint_nostd, noalloc_allocator,
    solana_program::{
        entrypoint::ProgramResult, log, program_error::ProgramError, pubkey::Pubkey, system_program,
    },
    InstructionC, NoStdAccountInfo,
};

entrypoint_nostd!(process_instruction, 32);

pub const ID: Pubkey = solana_nostd_entrypoint::solana_program::pubkey!(
    "EgB1zom79Ek4LkvJjafbkUMTwDK9sZQKEzNnrNFHpHHz"
);

noalloc_allocator!();
basic_panic_impl!();

const ACCOUNT_DATA_LEN: usize = 8; // 8 bytes for u64 counter

/*
 * Program Entrypoint
 * ------------------
 * Entrypoint receives:
 * - program_id: The public key of the program's account
 * - accounts: An array of accounts required for the instruction
 * - instruction_data: A byte array containing the instruction data
 *
 * Instruction data format:
 * ------------------------
 * | Bit 0 | Bits 1-7 |
 * |-------|----------|
 * | 0/1   | Unused   |
 *
 * 0: Initialize
 * 1: Increment
 */
#[inline(always)]
pub fn process_instruction(
    _program_id: &Pubkey,
    accounts: &[NoStdAccountInfo],
    instruction_data: &[u8],
) -> ProgramResult {
    if instruction_data.is_empty() {
        return Err(ProgramError::InvalidInstructionData);
    }

    // Use the least significant bit to determine the instruction
    match instruction_data[0] & 1 {
        0 => initialize(accounts),
        1 => increment(accounts),
        _ => unreachable!(),
    }
}

/*
 * Initialize Function
 * -------------------
 * This function initializes a new counter account.
 *
 * Account structure:
 * ------------------
 * 1. Payer account (signer, writable)
 * 2. Counter account (signer, writable)
 * 3. System program
 *
 * Memory layout of the CPI instruction_data sent to the System Program:
 * ----------------------------------------
 * | Bytes | Content                   |
 * |-------|---------------------------|
 * | 0-3   | Instruction discriminator |
 * | 4-11  | Required lamports (u64)   |
 * | 12-19 | Space (u64)               |
 * | 20-51 | Program ID                |
 * | 52-55 | Unused                    |
 */
#[inline(always)]
fn initialize(accounts: &[NoStdAccountInfo]) -> ProgramResult {
    let [payer, counter, system_program] = match accounts {
        [payer, counter, system_program, ..] => [payer, counter, system_program],
        _ => return Err(ProgramError::NotEnoughAccountKeys),
    };

    if counter.key() == &system_program::ID {
        return Err(ProgramError::InvalidAccountData);
    }

    // Uses the default rent parameters instead of reading the Rent sysvar
    let rent = solana_program::rent::Rent::default();
    let required_lamports = rent.minimum_balance(ACCOUNT_DATA_LEN);

    // Bytes 0-3 stay zero: discriminator 0 is CreateAccount
    let mut instruction_data = [0u8; 56];
    instruction_data[4..12].copy_from_slice(&required_lamports.to_le_bytes());
    instruction_data[12..20].copy_from_slice(&(ACCOUNT_DATA_LEN as u64).to_le_bytes());
    instruction_data[20..52].copy_from_slice(ID.as_ref());

    let instruction_accounts = [
        payer.to_meta_c(),
        counter.to_meta_c(),
    ];

    let instruction = InstructionC {
        program_id: &system_program::ID,
        accounts: instruction_accounts.as_ptr(),
        accounts_len: instruction_accounts.len() as u64,
        data: instruction_data.as_ptr(),
        data_len: instruction_data.len() as u64,
    };

    let infos = [payer.to_info_c(), counter.to_info_c()];

    // Invoke system program to create account
    // (the syscall's return code is not inspected here)
    #[cfg(target_os = "solana")]
    unsafe {
        solana_program::syscalls::sol_invoke_signed_c(
            &instruction as *const InstructionC as *const u8,
            infos.as_ptr() as *const u8,
            infos.len() as u64,
            std::ptr::null(),
            0,
        );
    }

    // Initialize counter to 0
    let mut counter_data = counter.try_borrow_mut_data().ok_or(ProgramError::AccountBorrowFailed)?;
    counter_data[..8].copy_from_slice(&0u64.to_le_bytes());

    Ok(())
}

/*
 * Increment Function
 * ------------------
 * This function increments the counter in the counter account.
 *
 * Account structure:
 * ------------------
 * 1. Counter account (writable)
 * 2. Payer account (signer)
 *
 * Counter account data layout:
 * ----------------------------
 * | Bytes | Content       |
 * |-------|---------------|
 * | 0-7   | Counter (u64) |
 */
#[inline(always)]
fn increment(accounts: &[NoStdAccountInfo]) -> ProgramResult {
    let [counter, payer] = match accounts {
        [counter, payer, ..] => [counter, payer],
        _ => return Err(ProgramError::NotEnoughAccountKeys),
    };

    // Report a missing signature and a wrong owner as distinct errors
    if !payer.is_signer() {
        return Err(ProgramError::MissingRequiredSignature);
    }
    if counter.owner() != &ID {
        return Err(ProgramError::IllegalOwner);
    }

    let mut counter_data = counter.try_borrow_mut_data().ok_or(ProgramError::AccountBorrowFailed)?;

    if counter_data.len() != ACCOUNT_DATA_LEN {
        return Err(ProgramError::UninitializedAccount);
    }

    let mut value = u64::from_le_bytes(counter_data[..8].try_into().unwrap());
    value += 1;
    counter_data[..8].copy_from_slice(&value.to_le_bytes());

    Ok(())
}

Key Differences and Optimizations:

1. No-std Environment:

We’re using solana_nostd_entrypoint, which provides a no-std environment. This eliminates the overhead of the Rust standard library, reducing program size and potentially improving performance. Credits to cavemanloverboy, here is the repo to deep dive: https://github.com/cavemanloverboy/solana-nostd-entrypoint

2. Inline Functions:
Critical functions are marked with #[inline(always)]. Inlining removes call overhead and can give performance gains, although the effect should be measured rather than assumed.

3. Bit Manipulation for Instruction Parsing:
We use bit manipulation (instruction_data[0] & 1) to determine the instruction type, which can be more efficient than other parsing methods.

// Use the least significant bit to determine the instruction
match instruction_data[0] & 1 {
    0 => initialize(accounts),
    1 => increment(accounts),
    _ => unreachable!(),
}
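
One consequence of masking is worth spelling out: every possible byte now maps to a valid instruction, so malformed input is never rejected. Here is a small std-only sketch that contrasts the masked parser with a strict one; Ix, parse_masked, and parse_strict are hypothetical names used only for this comparison.

```rust
#[derive(Debug, PartialEq)]
enum Ix {
    Initialize,
    Increment,
}

/// Looks only at the least significant bit, as in the program above.
/// Every byte value maps to one of the two instructions.
fn parse_masked(tag: u8) -> Ix {
    if tag & 1 == 0 {
        Ix::Initialize
    } else {
        Ix::Increment
    }
}

/// Accepts exactly 0 and 1 and rejects everything else.
fn parse_strict(tag: u8) -> Option<Ix> {
    match tag {
        0 => Some(Ix::Initialize),
        1 => Some(Ix::Increment),
        _ => None,
    }
}
```

With the masked version a tag of 2 is treated as initialize and 255 as increment, which is fine for a benchmark but is a trade-off to make deliberately in a real program.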

4. Zero-Cost Memory Management and Minimal Panic Handling:

The noalloc_allocator! and basic_panic_impl! macros implement minimal, zero-overhead memory management and panic handling:

noalloc_allocator!: Defines a custom allocator that panics on any allocation attempt and does nothing on deallocation. By setting this as the global allocator for Solana programs, it effectively prevents any dynamic memory allocation during runtime.

#[macro_export]
macro_rules! noalloc_allocator {
    () => {
        pub mod allocator {
            pub struct NoAlloc;
            extern crate alloc;
            unsafe impl alloc::alloc::GlobalAlloc for NoAlloc {
                #[inline]
                unsafe fn alloc(&self, _: core::alloc::Layout) -> *mut u8 {
                    panic!("no_alloc :)");
                }
                #[inline]
                unsafe fn dealloc(&self, _: *mut u8, _: core::alloc::Layout) {}
            }

            #[cfg(target_os = "solana")]
            #[global_allocator]
            static A: NoAlloc = NoAlloc;
        }
    };
}

This is crucial because:

a) It eliminates the overhead of memory allocation and deallocation operations.

b) It forces developers to use stack-based or static memory, which is generally faster and more predictable in terms of performance.

c) It reduces the program’s memory footprint.
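
With the allocator disabled, anything that would normally live in a Vec has to fit in a fixed-size buffer. Here is a minimal sketch of that style, using a const-generic array plus a length; StackVec is a hypothetical helper written for this post, not something the crate provides.

```rust
/// A fixed-capacity list of u64 values that lives entirely on the stack.
struct StackVec<const N: usize> {
    items: [u64; N],
    len: usize,
}

impl<const N: usize> StackVec<N> {
    fn new() -> Self {
        Self { items: [0; N], len: 0 }
    }

    /// Appends a value, or hands it back if the buffer is already full.
    fn push(&mut self, value: u64) -> Result<(), u64> {
        if self.len == N {
            return Err(value);
        }
        self.items[self.len] = value;
        self.len += 1;
        Ok(())
    }

    /// The initialized prefix of the buffer.
    fn as_slice(&self) -> &[u64] {
        &self.items[..self.len]
    }
}
```

The capacity has to be chosen up front, which is the price of never touching the heap.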

basic_panic_impl!: Provides a minimal panic handler that simply logs a “panicked!” message.

#[macro_export]
macro_rules! basic_panic_impl {
    () => {
        #[cfg(target_os = "solana")]
        #[no_mangle]
        fn custom_panic(_info: &core::panic::PanicInfo<'_>) {
            log::sol_log("panicked!");
        }
    };
}

5. Efficient CPI Preparation

The InstructionC struct and the to_meta_c and to_info_c functions provide a low-level, efficient way to prepare data for CPIs.

let instruction_accounts = [
    payer.to_meta_c(),
    counter.to_meta_c(),
];

let instruction = InstructionC {
    program_id: &system_program::ID,
    accounts: instruction_accounts.as_ptr(),
    accounts_len: instruction_accounts.len() as u64,
    data: instruction_data.as_ptr(),
    data_len: instruction_data.len() as u64,
};

let infos = [payer.to_info_c(), counter.to_info_c()];

These functions create C-compatible structures that can be directly passed to the sol_invoke_signed_c syscall. By avoiding the overhead of Rust’s higher-level abstractions and working directly with raw pointers and C-compatible structures, these functions minimize the computational cost of preparing for CPIs. This approach saves CUs by reducing memory allocations, copies, and conversions that would typically occur when using more abstract Rust types.

For example, the to_info_c method efficiently constructs an AccountInfoC struct using direct pointer arithmetic:

pub fn to_info_c(&self) -> AccountInfoC {
    AccountInfoC {
        key: offset(self.inner, 8),
        lamports: offset(self.inner, 72),
        data_len: self.data_len() as u64,
        data: offset(self.inner, 88),
        owner: offset(self.inner, 40),
        // … other fields …
    }
}

This direct manipulation of memory layouts allows for extremely efficient creation of the necessary structures for CPIs, thereby reducing the CU cost of these operations.
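
To show what those offsets mean, here is a std-only sketch that reads the same fields out of a byte buffer with bounds-checked slices instead of raw pointers. The offsets 8, 40, 72, and 88 come from the snippet above; the data_len offset of 80 and the RawAccount type are my assumptions for illustration, not part of the crate.

```rust
const KEY_OFFSET: usize = 8;
const OWNER_OFFSET: usize = 40;
const LAMPORTS_OFFSET: usize = 72;
/// Assumption: the data length sits between lamports and the data itself.
const DATA_LEN_OFFSET: usize = 80;
const DATA_OFFSET: usize = 88;

/// Reads a little-endian u64 starting at `offset`.
fn read_u64(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[offset..offset + 8]);
    u64::from_le_bytes(buf)
}

/// A read-only view over one serialized account.
struct RawAccount<'a> {
    bytes: &'a [u8],
}

impl<'a> RawAccount<'a> {
    /// Validates that the header and the declared data both fit in `bytes`.
    fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() < DATA_OFFSET {
            return None;
        }
        let account = Self { bytes };
        let end = DATA_OFFSET.checked_add(account.data_len())?;
        if bytes.len() < end {
            return None;
        }
        Some(account)
    }

    fn key(&self) -> &'a [u8] {
        &self.bytes[KEY_OFFSET..KEY_OFFSET + 32]
    }

    fn owner(&self) -> &'a [u8] {
        &self.bytes[OWNER_OFFSET..OWNER_OFFSET + 32]
    }

    fn lamports(&self) -> u64 {
        read_u64(self.bytes, LAMPORTS_OFFSET)
    }

    fn data_len(&self) -> usize {
        read_u64(self.bytes, DATA_LEN_OFFSET) as usize
    }

    fn data(&self) -> &'a [u8] {
        &self.bytes[DATA_OFFSET..DATA_OFFSET + self.data_len()]
    }
}
```

The unsafe version skips these bounds checks and hands out pointers directly, which is where both the CU savings and the risk come from.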

6. Direct Syscalls and Unsafe Rust

This approach bypasses the usual Rust abstractions and interacts directly with Solana’s runtime, offering significant performance benefits. However, it also introduces complexity and requires careful handling of unsafe Rust.

// Invoke system program to create account
#[cfg(target_os = "solana")]
unsafe {
    solana_program::syscalls::sol_invoke_signed_c(
        &instruction as *const InstructionC as *const u8,
        infos.as_ptr() as *const u8,
        infos.len() as u64,
        std::ptr::null(),
        0,
    );
}
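
The data handed to that syscall is just the System Program's CreateAccount instruction laid out by hand, following the byte table in the initialize comment. Here is a std-only sketch of a builder for it; create_account_data is a hypothetical helper, and it returns exactly 52 bytes, whereas the program above uses a 56-byte buffer with the last 4 bytes unused.

```rust
/// Discriminator of the System Program's CreateAccount instruction.
const CREATE_ACCOUNT_TAG: u32 = 0;

/// Packs CreateAccount arguments by hand:
/// bytes 0-3 discriminator, 4-11 lamports, 12-19 space, 20-51 owner.
fn create_account_data(lamports: u64, space: u64, owner: &[u8; 32]) -> [u8; 52] {
    let mut data = [0u8; 52];
    data[0..4].copy_from_slice(&CREATE_ACCOUNT_TAG.to_le_bytes());
    data[4..12].copy_from_slice(&lamports.to_le_bytes());
    data[12..20].copy_from_slice(&space.to_le_bytes());
    data[20..52].copy_from_slice(owner);
    data
}
```

Building the buffer on the stack like this is what lets the program skip the allocation and serialization work that system_instruction::create_account does in the native version.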

7. Conditional Compilation:

The #[cfg(target_os = "solana")] attribute ensures this code only compiles when targeting the Solana runtime, which is necessary because these syscalls are only available in that environment.
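
A common companion to this attribute is a host-side stub behind #[cfg(not(...))], so the same crate still builds and can be unit tested off-chain. Here is a minimal sketch of that pairing; execution_target is a hypothetical function used only to show which branch gets compiled.

```rust
/// Compiled only when building for the Solana runtime.
#[cfg(target_os = "solana")]
fn execution_target() -> &'static str {
    "solana"
}

/// Compiled everywhere else, for example when running tests on the host.
#[cfg(not(target_os = "solana"))]
fn execution_target() -> &'static str {
    "host"
}

/// The cfg! macro gives the same answer as a boolean, with both
/// branches still type checked.
fn is_on_chain() -> bool {
    cfg!(target_os = "solana")
}
```

Exactly one of the two execution_target definitions exists in any given build, so callers never need to know which environment they are in.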

TL;DR

While all this is fascinating, I think using it in a production program that secures real money would be a difficult sell. Building a memory map to decode the inputs correctly also gets complicated as the state grows more complex. Going this far, you are likely to fall into the trap of premature optimization.

However, some things are easily replicable:

  1. Using solana_nostd_entrypoint instead of the bloated entrypoint from solana_program.
  2. Using inline functions wherever possible.
  3. Minimizing dynamic allocations and favouring stack-based data structures.

Conclusion

For a more detailed look at the programs and the tests, go here. I have made the programs as readable and approachable as possible, sometimes at the cost of further optimizations ;)

If you have any queries hit me up at @daglihet. Although opening a PR in the repo above would make me reply faster ;)

Glass chewing should not stop!!! ;)
