Rust: Raw string literals
r#"What is this?"#
While working with Rust, you will often come across r#"something like this"#, especially when working with JSON and TOML files. It defines a raw string literal. When would you use a raw string literal, and what makes a valid raw string literal?
When would you use a raw string literal?
First, let’s understand what a string literal is. According to The Rust Reference¹, a string literal is a sequence of any Unicode characters enclosed within two U+0022 (double-quote) characters, with the exception of U+0022 itself². Escape characters in the string literal body are processed. The string body cannot contain an unescaped double-quote; if you need to insert one, you have to escape it like this: \".
Escaping double-quotes can be cumbersome in some cases such as writing regular expressions or defining a JSON object as a string literal. In these situations, raw string literals are helpful since they allow you to write the literal without requiring escapes.
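As a minimal sketch of the difference, the example below writes the same JSON text and the same regex-style pattern twice, once with escapes and once as a raw string literal. The function names and the sample data are made up for illustration; only the standard library is used.

```rust
/// JSON written as an ordinary string literal: every inner quote needs a backslash.
fn escaped_json() -> &'static str {
    "{\"name\": \"Ferris\", \"kind\": \"crab\"}"
}

/// The same JSON as a raw string literal: the quotes represent themselves.
fn raw_json() -> &'static str {
    r#"{"name": "Ferris", "kind": "crab"}"#
}

/// A regex-style pattern in an ordinary literal: each backslash must be doubled.
fn escaped_pattern() -> &'static str {
    "\\d{4}-\\d{2}"
}

/// The same pattern as a raw string literal with zero hashes.
fn raw_pattern() -> &'static str {
    r"\d{4}-\d{2}"
}

fn main() {
    // Both spellings produce exactly the same string contents.
    assert_eq!(escaped_json(), raw_json());
    assert_eq!(escaped_pattern(), raw_pattern());
    println!("{}", raw_json());
    println!("{}", raw_pattern());
}
```

The raw forms are easier to compare against the text they are meant to contain, which is the main reason to reach for them.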
Here is a snippet from the toml³ crate:
Or another from serde-rs⁴:
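The sketch below illustrates this kind of usage. It is not the code from either crate: the documents and function names are hypothetical, and it deliberately uses only the standard library, so nothing is actually parsed.

```rust
/// A hypothetical TOML document embedded as a multi-line raw string.
/// The quotes around the values need no escaping.
fn toml_config() -> &'static str {
    r#"
[package]
name = "demo"
version = "0.1.0"
"#
}

/// A hypothetical JSON payload embedded as a raw string.
fn json_payload() -> &'static str {
    r#"{"name": "demo", "tags": ["a", "b"]}"#
}

fn main() {
    // Print the non-empty lines of the TOML text exactly as written.
    for line in toml_config().lines().filter(|l| !l.trim().is_empty()) {
        println!("{}", line);
    }
    println!("{}", json_payload());
}
```

In a real program, strings like these would typically be handed to a parser such as the one provided by the toml or serde_json crates.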
So, raw string literals are helpful, but what makes a valid one?
What makes a raw string literal?
The Rust Reference defines a raw string literal as starting with the character U+0072 (r), followed by zero or more of the character U+0023 (#) and a U+0022 (double-quote) character. The raw string body can contain any sequence of Unicode characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character⁵.
Escape characters in the raw string body are not processed.
Therefore the following raw string literals are all valid:
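As an illustrative sketch (these literals are made up for this example rather than taken from the Reference), each of the following compiles, and the body of each literal is kept exactly as written:

```rust
/// Four raw string literals with different numbers of hashes.
fn examples() -> [&'static str; 4] {
    [
        // Zero hashes: fine as long as the body has no double-quote.
        r"no hashes, no quotes inside",
        // One hash on each side: the body may contain double-quotes.
        r#"one hash: "quotes" are fine"#,
        // Two hashes on each side: the body may even contain "#.
        r##"two hashes: "# is fine too"##,
        // Escapes are not processed: \n stays a backslash followed by n.
        r"backslashes stay: \n \t \\",
    ]
}

fn main() {
    for s in examples() {
        println!("{}", s);
    }
}
```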
If you need to include a double-quote character in a raw string, you must tag the start and end of the raw string with hash/pound signs (#).
With a single # on each side, the raw string body can contain any sequence of Unicode characters except "#, since that sequence would terminate the literal. If you want to include that particular sequence, you have to increase the number of # characters that precede the opening double-quote and follow the closing one. For instance:
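A minimal sketch of that case, using a made-up string and a hypothetical function name: with two hashes on each side, a double-quote followed by a single hash no longer ends the literal.

```rust
/// The body contains the sequence "# so the delimiters use two hashes.
fn with_quote_hash() -> &'static str {
    r##"the sequence "# appears here"##
}

fn main() {
    // The body is kept verbatim, including the quote and the hash.
    assert_eq!(with_quote_hash(), "the sequence \"# appears here");
    println!("{}", with_quote_hash());
}
```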
Likewise, if "## is to be included, you can add another # to the starting and ending delimiters.
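The pattern generalises: the delimiters need one more hash than the longest run of hashes that directly follows a double-quote in the body. The sketch below works this out for an arbitrary string. The helpers min_hashes and to_raw_literal are hypothetical names introduced here, not part of the standard library, and the sketch ignores any other restrictions the compiler may place on raw strings.

```rust
/// Smallest number of hashes that lets `body` be written as a raw string literal.
/// (Hypothetical helper for illustration only.)
fn min_hashes(body: &str) -> usize {
    let mut needed: usize = 0;
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '"' {
            // Count the hashes that directly follow this double-quote.
            let mut run: usize = 0;
            while chars.peek() == Some(&'#') {
                chars.next();
                run += 1;
            }
            // The closing delimiter must be longer than this run.
            needed = needed.max(run + 1);
        }
    }
    needed
}

/// Render `body` as raw string literal source text with the fewest hashes.
fn to_raw_literal(body: &str) -> String {
    let hashes = "#".repeat(min_hashes(body));
    format!("r{}\"{}\"{}", hashes, body, hashes)
}

fn main() {
    // A body containing "## needs three hashes on each side.
    let literal = r###"this body contains "## and survives"###;
    assert_eq!(min_hashes(literal), 3);
    println!("{}", to_raw_literal(literal));
}
```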
Wrap Up
Raw string literals are helpful when you need to avoid escaping characters within a literal. The characters in a raw string represent themselves. Informally, a raw string literal is an r, followed by N hashes (where N can be zero), a quote, any characters, then a quote followed by N hashes⁶.
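The informal rule above can be sketched as a small checker that takes the source text of a candidate literal and returns its body when the text matches the rule. The function raw_body is a hypothetical helper written for this post, not how the compiler's lexer works, and it checks only the shape described above.

```rust
/// Returns the body if `src` has the shape: r, N hashes, a quote, any characters,
/// then a quote followed by N hashes. Returns None otherwise.
/// (Simplified illustration; not the compiler's implementation.)
fn raw_body(src: &str) -> Option<&str> {
    // The literal must start with the letter r.
    let rest = src.strip_prefix('r')?;
    // Count the N opening hashes (N may be zero).
    let n = rest.bytes().take_while(|&b| b == b'#').count();
    // The hashes must be followed by a double-quote.
    let rest = rest[n..].strip_prefix('"')?;
    // The closing delimiter is a double-quote followed by the same N hashes.
    let closing = format!("\"{}", "#".repeat(n));
    // The body ends at the first occurrence of the closing delimiter.
    let end = rest.find(&closing)?;
    // Nothing may follow the closing delimiter.
    if end + closing.len() == rest.len() {
        Some(&rest[..end])
    } else {
        None
    }
}

fn main() {
    println!("{:?}", raw_body("r#\"a \"quoted\" word\"#"));
    println!("{:?}", raw_body("r#\"missing the closing hash\""));
}
```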
Here’s how visualising⁷ raw string literals works for me:
That’s it for now!
References
1. https://doc.rust-lang.org/stable/reference/
2. https://doc.rust-lang.org/stable/reference/tokens.html#string-literals
3. https://github.com/alexcrichton/toml-rs/blob/master/examples/decode.rs
4. https://github.com/serde-rs/json
5. https://doc.rust-lang.org/stable/reference/tokens.html#raw-string-literals
6. https://github.com/rust-lang/rust/blob/master/src/grammar/raw-string-literal-ambiguity.md
7. http://www.bottlecaps.de/rr/ui
Originally published at rahul-thakoor.github.io.