bit_seq in Rust: A Procedural Macro for Bit Sequence Generation

Johannes Z.
4 min read · Oct 6, 2023

Recently, I started working on an Arm64 assembler API library for just-in-time compilers, where efficient and accurate instruction generation is required. Traditional methods for creating such instructions rely on bitwise shifts and masks for instruction encoding. While this approach is functional, it can be cumbersome and error-prone, particularly when modifications are required.

Consider a 32-bit encoding bit sequence for logical immediate instructions in an Arm64 assembler:

let mut cr = 0u32;
cr |= sf << 31;
cr |= (opc & 0b11) << 29;
cr |= (0b100100) << 23;
cr |= (nrs_mask & 0x1FFF) << 10;
cr |= (rn & 0x1F) << 5;
cr |= rd & 0x1F;

This bit sequence includes multiple parameters, such as opc and various registers, which specify the instruction and its operands. While functional, this approach is not particularly readable and is susceptible to errors: if the bit length of one parameter changes, the shift of every more significant field has to be updated by hand.

Introducing bit_seq

The Rust crate bit_seq offers an alternative through its procedural macros. The same sequence can be succinctly represented as:

let cr = bseq_32!(sf:1 opc:2 100100 nrs_mask:13 rn:5 rd:5);

The macro bseq_32! generates a u32 expression. Variables to be injected are specified to the left of the colon (:), and their bit lengths or masks are indicated to the right. Constants can be directly included without length specifications.
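
To make the idea concrete, here is a minimal runtime sketch in plain Rust of what a field list like the one above describes. It does not use bit_seq, and pack_fields and encode_logical_imm are hypothetical helpers invented for this illustration; the macro itself does this work at compile time. The point is that shift offsets follow from the field widths, so changing one width needs no other edits.

```rust
/// Packs `(value, width)` pairs into a u32, most significant field first.
/// Hypothetical helper for illustration; not part of bit_seq.
fn pack_fields(fields: &[(u32, u32)]) -> u32 {
    let mut out = 0u32;
    for &(value, width) in fields {
        // Keep only the `width` least significant bits of `value`.
        let mask = if width >= 32 { u32::MAX } else { (1u32 << width) - 1 };
        // Make room for the new field, then append it on the right.
        out = if width >= 32 { 0 } else { out << width };
        out |= value & mask;
    }
    out
}

/// Layout from the article: sf:1 opc:2 100100 nrs_mask:13 rn:5 rd:5.
fn encode_logical_imm(sf: u32, opc: u32, nrs_mask: u32, rn: u32, rd: u32) -> u32 {
    pack_fields(&[
        (sf, 1),
        (opc, 2),
        (0b100100, 6), // constant part of the encoding
        (nrs_mask, 13),
        (rn, 5),
        (rd, 5),
    ])
}
```

With the arbitrary inputs sf = 1, opc = 0, nrs_mask = 0x1000, rn = 1 and rd = 0, both this helper and the manual shift-and-mask version produce 0x92400020.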

Understanding bit_seq: Features and Usage

The bit_seq crate is a Rust library that focuses on simplifying the generation of bit sequences. It employs a procedural macro, bseq, to accomplish this task, thereby enhancing code readability and reducing the likelihood of errors. This is particularly beneficial for projects that involve systems programming or low-level hardware and protocol interfacing, where bit sequence creation is a common requirement.

Key Features

  • Simplified Syntax: The crate allows for the generation of bit sequences using straightforward syntax, making it easier to write and understand the code.
  • Versatile Input Types: It supports various types of input for defining bit sequences, including direct bit sequences, hexadecimal values, and integers with specified lengths.
  • Variable Interpolation: The crate enables the use of variables in length expressions, providing a level of flexibility that may not be present in other bit manipulation libraries.
  • No Runtime Overhead: The bseq macro compiles to common bit manipulation operations, ensuring that there is no additional runtime cost for using this crate.

Raw Sequence

Constant bit sequences can be written directly. Whitespace between them is ignored, so it can be used freely to group bits.

let t = bseq!(0110 01 0 1);
assert_eq!(t, 0b0110_01_0_1);

Hex Values

In hexadecimal values without a length specification, each hex digit is interpreted as a 4-bit sequence.

let t = bseq!(01 0x1f);
assert_eq!(t, 0b01_0001_1111);
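
The four-bits-per-digit rule can be sketched in plain Rust as follows. This is only an illustration of the arithmetic, assuming that leading zero digits also occupy four bits each; append_hex is a hypothetical helper and not part of bit_seq.

```rust
/// Appends the digits of `hex` (without the `0x` prefix) to `acc`,
/// four bits per digit. Hypothetical helper for illustration.
fn append_hex(acc: u32, hex: &str) -> u32 {
    hex.chars().fold(acc, |acc, c| {
        // Each hex digit is a value in 0..=15, i.e. exactly four bits.
        let digit = c.to_digit(16).expect("expected a hex digit");
        (acc << 4) | digit
    })
}
```

Starting from the bits 01 and appending the digits of 1f gives 0b01_0001_1111, the same value as in the assertion above.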

Length Expressions

Length expressions take the form <val>:<len>, where <len> is the number of least significant bits from <val> to be used.

let t = bseq!(3:1 0 0xf:2);
assert_eq!(t, 0b1_0_11);
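
The truncation to the least significant bits is an ordinary mask. The sketch below spells out the example by hand in plain Rust; low_bits and example_sequence are hypothetical names used only here, and the code does not depend on bit_seq.

```rust
/// Returns the `len` least significant bits of `value`.
/// Hypothetical helper for illustration; not part of bit_seq.
fn low_bits(value: u32, len: u32) -> u32 {
    if len >= 32 {
        value
    } else {
        value & ((1u32 << len) - 1)
    }
}

/// Hand-written equivalent of the sequence `3:1 0 0xf:2`.
fn example_sequence() -> u32 {
    // 3:1 keeps one bit of 0b11, 0xf:2 keeps two bits of 0b1111.
    (low_bits(3, 1) << 3) | (0 << 2) | low_bits(0xf, 2)
}
```

Here 3:1 contributes a single 1 bit and 0xf:2 contributes 11, so the four-bit result is 1011.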

Variable Interpolation

Variable interpolation is supported for length expressions.

let var = 0xf;
let t = bseq!(10 var:2);
assert_eq!(t, 0b10_11);

Unary Operations

The bseq syntax supports some unary operations for length expressions. This simplifies bit sequences like 0b111111.

// bit negation
assert_eq!(bseq!(!0:6), 0b111111);

// numerical negation with variable interpolation
let var = 1;
assert_eq!(bseq!(-var:8), 0xff);
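
Both results follow from applying the operator first and truncating afterwards. The plain-Rust sketch below mirrors that order; it assumes two's-complement negation and uses hypothetical helper names (low_mask, not_bits, neg_bits) that are not part of bit_seq.

```rust
/// Mask with the `len` least significant bits set.
fn low_mask(len: u32) -> u32 {
    if len >= 32 { u32::MAX } else { (1u32 << len) - 1 }
}

/// Bitwise NOT of `value`, truncated to `len` bits.
fn not_bits(value: u32, len: u32) -> u32 {
    !value & low_mask(len)
}

/// Two's-complement negation of `value`, truncated to `len` bits.
fn neg_bits(value: u32, len: u32) -> u32 {
    value.wrapping_neg() & low_mask(len)
}
```

For instance, negating 1 gives all ones in two's complement, and keeping the low eight bits leaves 0xff.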

Type-Specific Macros

While bseq! provides all features, it may not support variable interpolations with multiple variable types. For example, the following code would not compile:

let foo: u32 = 4;
let bar: u64 = 2;
let t: u8 = bseq!(foo:5 bar:3);

To overcome this limitation, bit_seq additionally provides the macros bseq_8!, bseq_16!, bseq_32!, and bseq_64!. These macros can process variables with different types.
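
A hand-written version of foo:5 bar:3 shows why casts resolve the type mismatch. This is a sketch of the cast-then-mask pattern, not the literal output of bseq_8!, and pack_u8 is a hypothetical function name.

```rust
/// Hand-written equivalent of `foo:5 bar:3` with a u8 result.
/// Each input is cast to the target type before it is masked and shifted,
/// so the operands of `|` all have the same type.
fn pack_u8(foo: u32, bar: u64) -> u8 {
    (((foo as u8) & 0b1_1111) << 3) | ((bar as u8) & 0b111)
}
```

With foo = 4 and bar = 2 this yields 0b00100_010. Note that an `as` cast to a narrower type silently drops the upper bits, which is harmless here because the mask would discard them anyway.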

Performance

The macro expands to a plain inline expression that uses bitwise AND, SHIFT, and OR operations. For all type-specific macros, numerical type casts are used, which shouldn’t be noticeable at the instruction level. For example, the opening example of this blog post expands to:

(((rd as u32) & 31) << 0
| ((rn as u32) & 31) << 5
| ((nrs_mask as u32) & 8191) << 10
| (36) << 23
| ((opc as u32) & 3) << 29
| ((sf as u32) & 1) << 31) as u32
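
To check the expansion by hand, it can be wrapped in an ordinary function. The parameter types below (u8 and u16) are assumptions chosen to exercise the casts, and logical_imm is a hypothetical name; the expression body is the expansion shown above.

```rust
/// The expanded expression from the article, wrapped in a function.
/// Parameter types are assumptions chosen for this illustration.
#[allow(unused_parens)]
fn logical_imm(sf: u8, opc: u8, nrs_mask: u16, rn: u8, rd: u8) -> u32 {
    (((rd as u32) & 31) << 0
        | ((rn as u32) & 31) << 5
        | ((nrs_mask as u32) & 8191) << 10
        | (36) << 23 // the constant 0b100100
        | ((opc as u32) & 3) << 29
        | ((sf as u32) & 1) << 31) as u32
}
```

Calling it with sf = 1, opc = 0, nrs_mask = 0x1000, rn = 1 and rd = 0 yields 0x92400020. Since the expression consists only of casts, masks, shifts and ORs, constant inputs leave the compiler free to fold it into a single value.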

While there haven’t been any further optimizations so far, I would welcome suggestions regarding the performance of the generated Rust code.

Alternative Approaches

Several options for bit sequence construction and manipulation exist for different purposes. Some of them are bitfield, bit-vec, bit-set, and bitflags.

However, I believe that bit_seq provides a valuable addition to existing solutions, especially when creating one-time bit sequences from multiple parameters.

Summary

The bit_seq crate offers a specialized utility for creating bit sequences in Rust. It provides a more streamlined and readable approach compared to traditional methods involving bitwise shifts and masks. While those methods remain effective for certain use cases, bit_seq fills a specific niche, particularly for projects that require precise bit-level operations.

Documentation and Contributions

For a more comprehensive understanding of the API, you can refer to the official documentation. Contributions to bit_seq are welcome and can be made by submitting a pull request or creating an issue on the GitHub page.
