ownership & slices part 2

George Shuklin
Jul 2, 2017

I got some free time, therefore, more Rust!

Strings

I’m reading the chapter where ‘move’ is described. It discloses one more important detail: how strings are stored. Rust uses ‘Pascal style’ strings, which means they have no trailing \0 at the end. Well, not precisely ‘Pascal’, as a Pascal string uses bytes at the start of the string to describe its length, while Rust uses separate metadata: pointer, size and capacity.

Capacity is a rather unusual part. As I understood from Oleg Eterevsky’s explanation, it is used to reduce overhead on string growth. When something is appended to a string, its length changes, but as long as length ≤ capacity, there is no need for reallocation.

Moreover, I start to see a spark of clever design in such a string construction: the part with metadata is fixed in size, and that size is known beforehand at compile time. When we have access to this part, we can reallocate the string in any function without cooperation from the calling code, as the structure occupies the same memory address. Only one pointer within that structure needs to be changed. And the reallocating code has freedom to tweak numbers: how much to grow ‘capacity’ on each reallocation is up to the reallocator. It can use a ‘double’ strategy (each reallocation doubles the size) or grow by 1.5x. If memory is constrained, it can reallocate more often but waste less memory.
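The length/capacity relationship can be observed directly. The following is a minimal sketch, not a statement about the allocator's growth strategy; the function name append_within_capacity is made up for this illustration. It reserves room up front and checks that appends which fit into the capacity leave the buffer where it was.

```rust
// Hypothetical helper: appends to a pre-sized String and reports
// (length, capacity, whether the heap buffer stayed in place).
fn append_within_capacity() -> (usize, usize, bool) {
    let mut s = String::with_capacity(16); // capacity is at least 16 bytes
    let cap_before = s.capacity();
    let ptr_before = s.as_ptr(); // raw pointer to the heap buffer

    s.push_str("hello");
    s.push_str(" world"); // 11 bytes in total, still within capacity

    // No reallocation was needed, so pointer and capacity are unchanged.
    let same_buffer = ptr_before == s.as_ptr() && cap_before == s.capacity();
    (s.len(), s.capacity(), same_buffer)
}

fn main() {
    let (len, cap, same_buffer) = append_within_capacity();
    println!("len = {}, capacity = {}, same buffer = {}", len, cap, same_buffer);
}
```

Appending more than the remaining capacity would trigger a reallocation; how much the capacity grows at that point is up to the implementation.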

The downside of this approach is overhead: a ‘C-style’ string needs one pointer and one extra byte to store a string. Rust needs a pointer (one or two, I’ll think about this later) and two size_t values. That means that on x86_64 a minimal non-empty string with size 1 would occupy 8+8+8+1 bytes (25).

How many pointers do we need? I saw in some blog entry that Rust can pass structures of known size on the stack, without allocating them on the heap at runtime. In this case we may avoid using a dedicated pointer, but we still need a way to point to the data structure on the stack, therefore one more pointer is needed (it can be a pointer or an offset relative to the stack). Therefore, by my estimate, the actual minimal non-empty string is 33 bytes.

Reasoning is good, but experiment is better. I will generate a program which will have many different minimal variables. I still don’t know how to do vectors in Rust and I’m interested in ‘real variables’, not in elements of array (which would be optimized).

Python generator for Rust source code:

import sys
print("use std::io;")
print("fn main() {")
print("\tlet mut placeholder = String::new();\n")
for i in range(0, int(sys.argv[1])):
    print('\tlet v{} = String::from("1");'.format(i))
print("\n\tio::stdin().read_line(& mut placeholder);")
print("}")

I’ll look at memory consumption with different values (before pressing Enter, of course).

It will produce code like this (but with many more variables):

use std::io;
fn main() {
    let mut placeholder = String::new();

    let v0 = String::from("1");
    let v1 = String::from("1");
    let v2 = String::from("1");

    io::stdin().read_line(& mut placeholder);
}

Unfortunately, my clever attempt was stopped by rustc. It does not like too many variables in a scope. Compilation for 10k variables took rather long, and even a lunch break (~30 minutes) wasn’t enough to complete the 100k run. After about 50 minutes it failed with the message ‘fatal runtime error: out of memory’, despite 10 GB of available free memory.

The gathered numbers were inconclusive: I saw some increase in both RssAnon and VmStk for the processes as the number of variables went up, but it was within a twofold range, therefore I was unable to confirm or reject my hypothesis on the memory footprint of Strings in Rust.
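A more direct measurement than watching process memory, as a minimal sketch: std::mem::size_of reports the size of the fixed part of a String (pointer, length, capacity). The bytes of the text live in a separate heap buffer and are not included in that number. The helper name string_header_words is made up for this illustration.

```rust
use std::mem::size_of;

// Hypothetical helper: size of the fixed String part, in machine words.
fn string_header_words() -> usize {
    size_of::<String>() / size_of::<usize>()
}

fn main() {
    // Fixed part: three machine words (pointer, length, capacity).
    println!("size_of::<String>() = {} bytes", size_of::<String>());
    println!("machine words: {}", string_header_words());

    // The text itself is stored in a heap buffer owned by the String.
    let s = String::from("1");
    println!("len = {}, capacity = {}", s.len(), s.capacity());
}
```

On a 64-bit target this gives three 8-byte words for the variable itself, plus whatever the allocator hands out for the one-byte buffer.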

Returning to the ‘move’ chapter.

I had intuitively understood that ‘move’ means that we no longer possess the value. Now I got a clarification: when a value is moved from one variable into another in Rust, it invalidates the origin. The compiler marks the variable as ‘invalid’ and no longer allows any kind of operation with it, except for assigning a new value to it.
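A minimal sketch of that rule (the function name move_and_reassign is made up for this illustration): after the move the old variable cannot be read, but assigning a new value makes it usable again.

```rust
// Hypothetical helper showing a move followed by a re-assignment.
fn move_and_reassign() -> (String, String) {
    let mut a = String::from("first");
    let b = a; // the value moves into `b`; `a` is now invalid

    // println!("{}", a); // would not compile: error[E0382], `a` was moved

    a = String::from("second"); // assigning a new value revives `a`
    (a, b)
}

fn main() {
    let (a, b) = move_and_reassign();
    println!("a = {}, b = {}", a, b); // prints "a = second, b = first"
}
```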

That brings us to another question: why do they use the equals sign (=) for the move operation? If they had used an arrow of any kind (→, ▶, ➙, ➛, ➜, ➝, ➞, ➟, ➡, ➢) or any special symbol other than ‘=’, I expect the meaning of the operation would have been much clearer. Instead, we have yet another very unusual usage of the ‘=’ symbol. Once we had to deal with the controversy around the difference between ‘=’ and ‘=:’ (‘:=’). When people invented ‘=:’, they wanted to show that this is not a mathematical equality, that this is an operation. Later, convenience turned it into ‘=’ and caused many troubles. Now we’ve got yet another non-trivial operation, and it is again hidden behind the equals sign…

Offtopic: they use the ‘clone’ function to do a deep copy. I suddenly got interested: what is ‘clone’? I mean, the origin of the word. As it turns out, it comes from the Greek klōn (twig, slip; akin to)…

Traits again!

Rust won’t let us annotate a type with the Copy trait if the type, or any of its parts, has implemented the Drop trait.

I see ‘to annotate’ here, which I think means ‘to add’. My picture of traits and methods so far: each type has some methods associated with it. They ‘annotate’ that type, which means they provide functions to work with instances of the given type. It’s very close to the ‘class’ thing, but without classes themselves. How annotation, method declaration, etc. are done I have no idea, but I already have a strong feeling for what it is from the usage side.
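For the quoted rule, here is a small sketch of what ‘annotating’ a type with Copy can look like in practice. The types Point and Guard and the function copy_point are invented for this example; the derive attribute is one common way to attach such traits, not the only one.

```rust
// A plain-data type can be annotated with Copy (Clone is required too).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// A type with custom cleanup code implements Drop.
struct Guard(i32);

impl Drop for Guard {
    fn drop(&mut self) {
        println!("dropping Guard({})", self.0);
    }
}

// impl Copy for Guard {} // rejected by the compiler (error[E0184]):
//                        // a type with a Drop implementation cannot be Copy

// Hypothetical helper: assignment copies a Copy type instead of moving it.
fn copy_point() -> (Point, Point) {
    let p = Point { x: 1, y: 2 };
    let q = p; // bitwise copy; `p` stays valid
    (p, q)
}

fn main() {
    let (p, q) = copy_point();
    println!("{:?} {:?} (x = {}, y = {})", p, q, p.x, p.y);
    let _g = Guard(7); // dropped when main ends
}
```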

References & slices

Borrowing was a very simple part of the tutorial. There can be only one mutable reference or any number of immutable ones. As far as I understand, the hard part will be to incorporate that simple idea into real code.
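The rule in code, as a minimal sketch (borrow_demo is a made-up name): shared borrows can coexist, while a mutable borrow has to be the only one alive.

```rust
// Hypothetical helper walking through both kinds of borrow.
fn borrow_demo() -> String {
    let mut s = String::from("hello");

    {
        // Any number of immutable references at the same time.
        let r1 = &s;
        let r2 = &s;
        println!("{} {}", r1, r2);
    }

    {
        // Exactly one mutable reference.
        let m = &mut s;
        // let r3 = &s; // would not compile while `m` is still used below:
        //              // error[E0502]
        m.push_str(" world");
    }

    s
}

fn main() {
    println!("{}", borrow_demo()); // prints "hello world"
}
```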

Slices, quite the opposite, were hard. I’ll start with one of the examples which failed to compile:

    let mut x = String::from("hello world !");
    let mut z = &x;
    let v : &str = z[1..];
    println!("{}", v);
...
error[E0308]: mismatched types
 --> src/main.rs:7:20
  |
7 |     let v : &str = z[1..];
  |                    ^^^^^^ expected &str, found str

Why? If I write:

    let mut x = String::from("hello world !");
    let v : &str = &x[1..];
    println!("{}", v);

it works! What’s the difference between let v : &str = &x[1..]; and let mut z = &x; let v : &str = z[1..];? I’ve changed code into:

    let mut x = String::from("hello world !");
    let mut z: &str = &x;
    let v : &str = &z[1..];
    println!("{}", v);

and it compiled. I feel that “reference” is not a type, but something different.

Let’s check it without involving slices.

But wait! I made a mistake and it worked! I put &str instead of &String? What’s going on? … and there were examples in the chapter which take &String and return &str… Anyway, let’s return to references first, then come back to slices.
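What happened there is, as far as I can tell, deref coercion: String implements Deref with str as the target, so a &String is converted silently wherever a &str is expected. A minimal sketch (takes_str is a made-up name):

```rust
// Hypothetical function that only accepts string slices.
fn takes_str(s: &str) -> usize {
    s.len()
}

fn main() {
    let x = String::from("hello world !");

    // &x is a &String, but the annotation asks for &str: the compiler coerces.
    let z: &str = &x;

    // The same coercion happens for function arguments.
    let whole = takes_str(&x);
    let tail = takes_str(&z[1..]);

    println!("{} {}", whole, tail); // prints "13 12"
}
```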

Types of references

fn mock(foo: &String) -> &String {
    return foo;
}
fn main() {
    let mut x = String::from("hello world !");
    let mut v = mock(&x);
    println!("{}", v);
}

Examples above and below work as expected.

fn main() {
    let mut x = String::from("hello world !");
    let mut r = &x;
    let mut v = mock(r);
    println!("{}", v);
}

I’ve played with the type of r:

let mut r: &String = &x; works, and

    let mut r: String = x;
    let mut v = mock(&r);

works too.

Unexpectedly, a double reference (a reference to a reference) works like a single reference:

    let mut x = String::from("hello world !");
    let mut r: &String = &x;
    let mut v = mock(&r);
    println!("{}", v);

Proving the point:

fn mock(foo: &String) -> &String {
    return foo;
}
fn main() {
    let mut x = String::from("hello world !");
    let mut v = mock(&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&x);
    println!("{}", v);
}

It looks like a reference to a reference is an order-one reference. And reference¹⁵ is just a reference too.

So, reference is a property of a type. There are two variants: the type itself and a reference to the given type. Simple.
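A caveat to this conclusion, offered as a sketch rather than a full explanation: &&String is a distinct type from &String, and the calls above most likely compile because the compiler dereferences automatically where a &String is expected (deref coercion). The helper distinct_types is made up for this illustration; mock mirrors the function used above.

```rust
use std::any::TypeId;

fn mock(foo: &String) -> &String {
    foo
}

// Hypothetical helper: compares the type identities of &String and &&String.
fn distinct_types() -> bool {
    TypeId::of::<&'static String>() != TypeId::of::<&'static &'static String>()
}

fn main() {
    let x = String::from("hello world !");

    let rr: &&String = &&x; // a genuine reference to a reference
    let r: &String = rr;    // coerced down by one level automatically

    println!("{}", mock(&&&x)); // extra levels are peeled off at the call
    println!("{}", r);
    println!("distinct types: {}", distinct_types());
}
```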

So all my confusion was not due to references, but due to slices…

Slicing

Let’s return to the initial code with slices.

This code works:

    let mut x = String::from("hello world !");
    let v : &str = &x[1..];
    println!("{}", v);

The slice clearly has type &str. Let’s check this explicitly:

fn mock(foo: &str) {
}
fn main() {
    let mut x = String::from("hello world !");
    mock(x);
}

As expected, it does not compile: mismatched types.

Let’s pass some slice into mock.

fn mock(foo: &str) {
}
fn main() {
    let mut x = String::from("hello world !");
    mock(&x[1..2]);
}

It works. When I forgot to put ‘&’ in the mock call, there was an error message:

error[E0308]: mismatched types
  --> src/main.rs:25:10
   |
25 |     mock(x[1..2]);
   |          ^^^^^^^ expected &str, found str
   |

That’s strange… We got str??? Let’s check this.

fn mock(foo: str) {
}
fn main() {
    let mut x = String::from("hello world !");
    mock(x[1..2]);
}

It does not compile:

error[E0277]: the trait bound `str: std::marker::Sized` is not satisfied
  --> src/main.rs:20:9
   |
20 | fn mock(foo: str) {
   |         ^^^ the trait `std::marker::Sized` is not implemented for `str`

And that error is something I do not understand. ‘Trait bound is not satisfied’… Oh, man…

Ok, let’s take another approach:

let mut x = String::from("hello world !");
let mut y: str = x[1..2];

Again, same cryptic error:

error[E0277]: the trait bound `str: std::marker::Sized` is not satisfied
  --> src/main.rs:25:9
   |
25 |     let mut y: str = x[1..2];
   |         ^^^^^ the trait `std::marker::Sized` is not implemented for `str`

I feel like the whole String/str dichotomy will haunt me until resolved.

I’m at a loss. Does a slice return str? Or &str? Why does it return &str instead of &String? And why can I do it with references only, not with the actual type?

Crude local explanation:

  1. Accept that &str is a basic type, not a ‘reference to str’, at least for the sake of importance and sanity of this crude explanation.

A few more iterations of play. Can we have a slice of str instead of a slice of String? Yes, we can. The following code compiles:

fn main() {
    let mut x: &str = "hello world !";
    let mut y: &str = &x[1..2];
}

But here it is again. If I remove & from &x[1..2], I get an error: expected &str, found str. This is not true! I declared x as being a reference to str! (bad code below):

let mut x: &str = "hello world !";
let mut y: &str = x[1..2];

Hypothesis: & before x is not for ‘reference to x’, but for ‘take reference of returned value’. E.g. let mut y: &str = &x[1..2]; is actually let mut y: &str = &(x[1..2]);. Let’s check it. Indeed, it compiled, and even printed a proper value (when I added a println! for y).
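The precedence claim can be checked with a small sketch (both_spellings is a made-up name): indexing binds tighter than the & operator, so both spellings produce the same slice.

```rust
// Hypothetical helper returning the slice written in both ways.
fn both_spellings(s: &str) -> (&str, &str) {
    let implicit = &s[1..2];   // parsed as &(s[1..2])
    let explicit = &(s[1..2]); // the same thing, spelled out
    (implicit, explicit)
}

fn main() {
    let x: &str = "hello world !";
    let (a, b) = both_spellings(x);
    println!("{} {}", a, b); // prints "e e"
}
```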

That means that the return value of the slice operation is a str, not a &str. Unfortunately, I couldn’t test it, as any attempt to assign a str-typed value to a variable or pass it to a function fails with the same cryptic message: the trait bound `str: std::marker::Sized` is not satisfied (the trait `std::marker::Sized` is not implemented for `str`).

My goal is to get ‘str’ out of a slice to prove the hypothesis. I tried String::from, but it needs &str. I tried ‘==’ with another string and got: the trait `std::cmp::PartialEq<&str>` is not implemented for `str`. "foo".cmp() says it needs &str, not str.

I was unable to guess anything, so I looked into std::str. It was a bit disturbing, as I have trouble navigating that page. It uses some kind of unexpectedly passive voice for ‘Required methods’. It sounds complicated, and I feel a lack of understanding of the whole ‘trait/methods’ lingo for now. But I found the function chars, which … View the underlying data as a subslice of the original data. Subslice of a slice? But it worked. Or, at least, compiled.

fn main() {
    let mut x: &str = "hello world !";
    let mut y = x[1..2].chars();
}

Unfortunately I wasn’t able to println! it. I tried to use bytes with count:

fn main() {
    let x: &str = "hello world !";
    let y = x[1..4].bytes().count();
    println!("{}", y);
}

And it printed a proper value. In the process I’ve lost the type. I have no idea what is going on with bytes and its iterator, so I couldn’t prove my theory, but I stand by it: slice returns the str type.

I feel very frustrated by the lack of interactive introspection in Rust. I understand it’s a compiled language, but I really got used to playing with type and dir for any strange object I got in Python…
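A rough substitute for Python’s type, as a sketch: a generic function can report the type that sits behind a reference. The helpers type_name_of and type_id_of are made up for this illustration; std::any::type_name is meant for diagnostics and its exact text is not guaranteed, so the comparison uses TypeId.

```rust
use std::any::{type_name, TypeId};

// Hypothetical helper: name of the type behind a reference (diagnostic only).
fn type_name_of<T: ?Sized>(_: &T) -> &'static str {
    type_name::<T>()
}

// Hypothetical helper: identity of the type behind a reference.
fn type_id_of<T: ?Sized + 'static>(_: &T) -> TypeId {
    TypeId::of::<T>()
}

fn main() {
    let x: &str = "hello world !";

    // &x[1..2] is a reference to whatever x[1..2] is, so T is that type.
    println!("x[1..2] is: {}", type_name_of(&x[1..2]));
    println!("x is: {}", type_name_of(&x));

    // The indexing expression itself has type str, not &str.
    println!("{}", type_id_of(&x[1..2]) == TypeId::of::<str>());
}
```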

Meanwhile, in the program above the compiler warned me that mut is not necessary in both cases. It makes sense to me, as I knew that a slice is just a pointer into the middle of the string and it does not change anything. And it shouldn’t be changed either.

Moreover, as the tutorial said in the next paragraph, if I’ve made a slice out of a string, it’s like I borrowed it. The original string can no longer be mutated while the slice is in use, therefore any slice will be preserved intact.
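A minimal sketch of that borrow in action. The function first_word is written here for illustration, in the spirit of the tutorial’s example: while the slice is still used, mutating the string is rejected.

```rust
// Returns the part of `s` before the first space, or all of `s`.
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(i) => &s[..i],
        None => s,
    }
}

fn main() {
    let mut s = String::from("hello world");
    let word = first_word(&s); // immutable borrow of `s` starts here

    // s.clear(); // would not compile at this point: error[E0502],
    //            // `s` is still borrowed because `word` is used below

    println!("{}", word); // prints "hello"

    s.clear(); // fine now: the slice is no longer used
    println!("{}", s.len()); // prints "0"
}
```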

… I read yet another paragraph, and the tutorial confirmed my hypothesis: slices return str’s. And it completely clarified everything around ‘str’ and constant strings in Rust. Yes, str is very different from String, as it is immutable and useless, except through a reference to it. Slices need to be references (and become &str), and strings hardcoded in the binary have the str type too. The compiler uses some syntax sugar (or is it something deeper?) to infer the type &str for a variable from the statement let var = "some str";, which implies: let var: &str = "some str";.
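Why str only works behind a reference can be sketched with sizes. A bare str has no size known at compile time, which is what the Sized error was about, while &str is a fixed-size pair of pointer and length. The helper str_ref_words is made up for this illustration.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical helper: size of a &str in machine words.
fn str_ref_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}

fn main() {
    let var: &'static str = "some str"; // the literal lives in the binary

    // The reference is two machine words: data pointer plus length.
    println!("&str takes {} words", str_ref_words()); // prints "&str takes 2 words"

    // The size of the str behind it is only known at runtime.
    println!("{} bytes of text", size_of_val(var)); // prints "8 bytes of text"

    // size_of::<str>(); // would not compile: str is not Sized (error[E0277])
}
```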

Further description helped a lot. A slice is a generalization of a reference to some iterable value with added boundaries. It’s like a ‘loop’ statement: you can implement it yourself by other means, but it’s so common (and such a common source of mistakes) that implementing it in the language makes writing much more concise and less error-prone.
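The same idea beyond strings, as a short sketch (the function sum is made up for this illustration): a slice of an array or a vector has the type &[T] and carries its boundaries with it.

```rust
// Hypothetical function that works on any borrowed run of i32 values.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let a = [1, 2, 3, 4, 5];
    let v = vec![10, 20, 30];

    let middle: &[i32] = &a[1..3]; // elements 2 and 3
    println!("{:?} sums to {}", middle, sum(middle)); // prints "[2, 3] sums to 5"

    println!("{}", sum(&v[..2])); // prints "30"
}
```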

Summary

The ownership/borrow/reference part was simple, mostly because I knew a bit about it before I started to learn Rust. Slices came as a big and unexpected surprise, which took me a while to understand. The str/String problem was entangled with slices, and it took me some experimentation to make sense of it. Now I understand how String is constructed and why the language needs a special ‘str’ type: it covers a very specific case of string manipulation. It enforces proper behavior through the type system, which is a great achievement for any language, as that is the primary goal of type systems in the first place. I dare say that recognizing that a read-only reference (with possible additional boundaries) to an iterable is a separate THING which needs its own type is the greatest cool feature of Rust (of the features I’ve learned so far).

journey to rust

Side notes on learning Rust language
