Encoding MIPS Instructions with C++17
I recently started writing an assembler from scratch using C++17 for a somewhat obscure operating system. The assembler’s design supports multiple target architectures, but for now, I only need to support the MIPS I instruction set, which happens to be extremely convenient, in that it’s trivial to encode.
A single MIPS assembler instruction is comprised of an operation mnemonic along with an optional operand list (e.g. addi $t1,$v0,0xFF
). The proper machine encoding of the instruction (i.e. the layout of its 32 bits) is dictated by the instruction’s format type, which is documented for all MIPS I instructions in the IDT R30xx Family Software Reference Manual.
The three MIPS Instruction Encoding Formats
The figure (A.1) shows how the available 32 bits of an instruction are used by each of the three encoding formats.
Bit-fields rs
, rt
, and rd
encode the index of a CPU register (0 thru 31).
Bit-fields immediate
, target
, and shamt
encode a user-supplied numeric constant value.
Bit-fields op
and funct
encode an assembler-supplied numeric constant value (i.e. flags), used by the CPU to identify and configure itself for the instruction.
Declarative Instruction Encoding [C++17][constexpr if]
To keep all of the encoding rules in one place, the assembler supports the following declarative syntax:
This map provides the encoding function for each and every MIPS I instruction. The expressions supplied for the map’s values generate these encoding functions via template expansion.
For example, from the snippet above, IType
names a function template which accepts 5 template arguments:
- Opcode. Numeric constant expression (constant bits), only.
- RS. Constant bits, or a sentinel value
Arg
, which tells the function template to read the value from the user-supplied operands (5.). - RT. Constant bits, or
Arg
. - Immediate. Constant bits, or
Arg
. - Operand parsing function. Reads the user-provided operands string into a tuple (more on this later).
Each function template (RType, IType, JType
) instantiation results in the generation of a custom function which supplies the constant portions of the instruction’s encoding at compile-time. In other words, the bit-fields of the instruction that are always the same (used by the CPU to identify the operation it should perform) are passed into these function templates as integral constants (non-type integral template parameters). In theory, this should allow the compiler to combine all constant portions of the encoded instruction at compile-time.
The only deviation from this pattern occurs in the presence of the Arg
sentinel, which uses a C++17 feature known as the constexpr if
statement to provide special-casing.
Note that the function
IsArgSentinel
isconstexpr
.
The function template shown above (used for MIPS I-type instructions) uses a constexpr if
statement to generate code to read the value for RS from the user-provided operand string (this is the operands list provided after the mnemonic), if and only if RS
as specified by the template instantiation is the Arg
sentinel.
A constexpr if
statement tests the result of a constant expression (provided as its condition), and *discards* the statements of the branch not taken. This condition is evaluated at compile-time, making it possible for the compiler to exclude the failed branch from the function’s code.
If the value is
true
, then statement-false is discarded (if present), otherwise, statement-true is discarded. — cppreference.
Reading the Operand String [C++14]
The instruction function templates (i.e. IType
, RType
, JType
) make no assumptions about the format of the operand string nor the ordering (or count!) of its contained operands.
Instead, each of these function templates accept a function pointer (5. from the example above) to an operand string parsing function, which splits the operand string into an std::tuple
of the relevant fields for the instruction format.
The function templates can then simply use the type-based overload of std::get
(C++14) to retrieve specific operands:
auto& rs_operand = std::get<assembler::RS>(operands);
This usage is shown in context for the I-type function template in the previous section.
Declarative Operand Parsing
To prepare these tuples, we must split an operand string into its components in accordance with the syntax rules of the corresponding assembler instruction.
The syntax rules for an instruction dictate:
- The formatting of the operand list. Either tuple-style (e.g.
$t0,$v0,0xFF
) or offset-style (e.g.$t0,0xFF($v0)
). - The ordering of the instruction components to which each operand will bind. For example, is
$t0,$v0,0xFF
supposed to beRT,RS,Immediate
orRS,RT,Immediate
?
Both of these rules can be implemented declaratively using function templates as well (see ITypeTuple
and ITypeOffset
):
Notice in the above examples that the order of our template parameters (i.e. RS
, RT
, Immediate
) defines the operand order for the instruction.
The implementation of this syntax relies heavily on the type-based overload of std::get
, which allows us to bind the I-th operand from the operand string to whichever tuple field type was provided for the I-th template argument, in the resulting tuple.
Optional Operands [C++17]
You may have noticed the use of std::optional
(C++17) above. This type is used to represent a value which may or may not be present. All wrapper operand types extend std::optional
:
struct RS : std::optional<std::string> { ... };
This allows them to be specified as std::nullopt
by operand parsing functions for which not all operand types are applicable. For example, many I-type instructions allow only two user-specified operands:
// Example: bgez $t0,0xCF
//
// RS = $t0, Immediate = 0xCF// Field RT does NOT come from the operand list. It must always
// be 1, as required by the bgez reference spec.{ "bgez", IType<0b000001, Arg, 1, Arg, ITypeTuple<RS, Immediate>> }
An ITypeTuple
function template with two template arguments handles this case by filling only two of the three operand fields in the resulting tuple.
The remaining field (not user-specifiable) will be std::nullopt
(RT
in the case of bgez
), though it will never be read as such, since no code will be generated to read it in the corresponding IType
function template instantiation.
It would have been possible to use variadic template arguments to handle both 2 and 3-tuples using a single function template, but this was decided against for simplicity.
Further Reading: `constexpr
` Function Parameters
In this post, I mentioned the use of a sentinel value Arg
used to special-case code generation for reading operands. Perhaps ideally, we could have used something type-safe (perhaps std::optional
) as our non-type template argument’s type, but this is not possible in C++17.
To work around this limitation, a type template parameter can be used in conjunction with the newly introduced constexpr
lambda (C++17).
Though this approach was not chosen due to complexity.
For a detailed explanation of this solution, check out Michael Park’s excellent blog post on the topic: “constexpr function parameters”.
References
- IDT R30xx Family Software Reference Manual
- https://en.cppreference.com/w/cpp/language/if (
if constexpr
) - https://en.cppreference.com/w/cpp/utility/optional
- https://en.cppreference.com/w/cpp/utility/tuple/get
- https://mpark.github.io/programming/2017/05/26/constexpr-function-parameters/
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0732r0.pdf