Collections

George Shuklin
Sep 10, 2017 · 11 min read

Data types are the most loved and empowering part of any language. Even some assemblers provide scaffolding over memory bytes to have types imitation. A language uses data types as a way to restrict some unwanted operations for given data. A programmer uses data types to clarify code meaning. Compare to textual ways (variables/functions names, comments), data types are much better at keeping promises, as those promises are enforced by language itself.

Small step back

I started to read the book, chapter Common Collections with big excitement.

… but as soon as I’d started to read, I realized I forgot few things about Rust. How tuples and arrays are defined? I looked back: tuples declared by enumerating their types in braces:

let mytuple: (u8, &str) = (0, "foo");

Same goes for arrays:

let myarray: [u8: 2] = [4, 2];

Step forward

After refreshing my memory I proceed further. As usual, I found slightly annoying to read references to future lessons. Vec<T> uses generics, I have value understanding what’s this, but it’s already chapter 8, and I they are still vague…

What I have learned: Vec is a classic array with automatic resize (and few nuances, see below).

Vec declaration is straightforward, Vec<type>, and, as I understand, its initial size is zero. I want to check this.

let mut v: Vec<i8> = Vec::new();

…My first question here would be: why Vec is a type when Vec::new() implies namespace? It looks like there is impl behind a curtain, like in the enum case. I think all data types in Rust use impl for them.

I continued:

let mut v: Vec<i8> = Vec::new();
v[0] = 0;

thread ‘main’ panicked at ‘index out of bounds: the len is 0 but the index is 0’, src/libcollections/vec.rs:1497

Yep, as I expected. And println!(“{}”, v.len()); confirms that size is 0.

(those naive experiments may look silly, but they are key thing in my learning — to make a simple experiment one need to construct it, therefore, learning how to write and use).

My next experiment was bit more complicated. I suspected it would fail on ownership, but I wasn’t sure.

let mut v: Vec<i8> = vec![1,2];
v.push(v[0]);

--> src/main.rs:3:12
|
3 | v.push(v[0]);
| - ^ - mutable borrow ends here
| | |
| | immutable borrow occurs here
| mutable borrow occurs here

How can I fix this? I couldn’t say ‘borrow’, as it already is (error message confirm this). v[0] automatically uses immutable borrow. I’ll try to use a temporary variable. I can copy i8 as it has fixed size known to the compiler.

Yep, it worked:

let mut v: Vec<i8> = vec![1,2];
let take: i8 = v[0];
v.push(take);

I kinda understand it, but I have a deeper question: Does Rust have some construction to avoid that redundancy? I couldn't invent any yet, so I asked SO… An answer came negative: You cannot until non-lexical lifetimes are implemented. I take this as a sign that Rust is standing a bit too deep into the safer side, so it rejects some reasonably safe operations, as it can’t distinct between ‘reasonably safe’ and ‘unsafe’ in this area yet.

Next example from the book:

let v = vec![1, 2, 3, 4, 5];
let third: &8 = &v[2];

…I’ve tried to ‘put it back’:

v.push(third);

but failed: expected i8, found &i8.

With a bit of thinking I was able to get proper ownership error:

v.push(*third);

But I do not remember if * was discussed or not. I invented it as ‘C-style’ dereference (“take value by pointer”), because & resemble “take a reference”. But that was my wild guess.

I’ve checked chapter on ownership/borrow: Nah, they didn’t talk about it. So it’s my ‘intuitive’ invention which just worked. I give Rust kudos for this, but what is ‘*’ in Rust? It was discussed in older versions of Rust book, but newer one does not talk about ‘dereference’ operation. I’m kinda lost here.

There are two ways to access Vec elements. [] and .get, former cause runtime error in case of asking for an non-existing element, later returns Options enum with status and result. Very boring from Python point of view. Yes, there is an IndexError if you ask for impossible, and there is a .get() method (Actually, Python has no .get() method for lists, it’s implemented only for dicts). But from C point of view, application, capable to detect out of boundaries request is a Big Deal for the sake of CVE reduction. And our standard array has both ways to access: with an error status and with automatic exception … I mean panic. Just great.

I think, those obvious nice features of the modern language are the most lacking in C. You need C to make a real code, but you don’t want to have C-style Darvin prise. Rust looks promising…

Enum Vec combination

Book showed me a combination of Enum and Vec. It looked fun, but I couldn’t believe that they can handle Enum inside Vec without storing type marker somewhere in memory. No way.

Let me try to break it. I’ll base my attempts on their example of ‘int/string/float’ enum inside Vec.

enum E {
Int(i32),
Float(f64),
Text(String),
}
fn messup(input: E){
match input{
E::Int(a) => {println!("{}", a)},
E::Float(_) => {},
E::Text(_) => {}
}
}
fn main() {
let example = vec![
E::Int(3),
E::Text(String::from("blue")),
E::Float(10.12),
];
messup(example[0]);
}

error[E0507]: cannot move out of indexed content
--> src/main.rs:22:10
|
22 | messup(example[0]);
| ^^^^^^^^^^ cannot move out of indexed content
error: aborting due to previous error

I have no idea what that means. And it’s not like I’m jumping ahead of content in the chapter. It stopped discussing Vec and moved to String.

I found this SO question, and it reasonable. The issue is not where I expected it. [0] just moves ownership from example to messup function. And this couldn’t be done, therefore, error. Important lesson learned.

I took in account borrow issues. New code:

enum E {
Int(i32),
Float(f64),
Text(String),
}
fn messup(input: &E) -> i32 {
match *input{
E::Int(_) => {return 32;},
E::Float(_) => {return 42;},
E::Text(_) => {return 0;}
}
}
fn main() {let example = vec![
E::Int(3),
E::Text(String::from("blue")),
E::Float(10.12),
];
println!("{}, {}, {}", messup(&example[0]), messup(&example[1]), messup(&example[2]))
}

It worked!!! cargo run returned 32, 0, 42. Exactly as it should, but there is a huge problem. I have no idea how it could possibly be worked. How Rust knew what type is within enum at any given time. Does it store some ‘type signature’ inside enum (some value to indicate type)? I could search answer, but I’ll try to check it manually. As usual, I’ll use memory consumption as an indicator. Vec should be stored as array, therefore, it should have o(1) memory overhead. I’ll try to store a lot of values and will check how much memory was consumed. I’ll divide additional memory consumption over element count to get the size of each element.

While I played with that code, I was able to cause Rust to make something strange:

fatal runtime error: out of memory
Illegal instruction (core dumped)

Out of memory is fine, but ‘illegal instruction’? Reddit thread kinda gave me an idea that this is a bug in the compiler, not in my code. I reported a bug, but I have some doubts. Illegal instruction was caused by ‘UD2’ assembler instruction, which is designed to call invalid opcode exception. Why they do this? I’ll wait for my bugreport reply. (few days later) And it comes back! You can read the full answer in the bugreport (link above), and here is important part: Vec intentionally cause illegal instruction error on memory errors. That means if allocation fails, the whole application fails. I think this is an important property. Contrary, Python raises MemoryError exception which can be handled. This does not save anyone from OOM killer if it comes, but on rare occasions of hungry allocation, MemoryError may be raised. I’m slightly letdowned by ‘panic on memory errors’ logic… I’ll repeat this finding: If Vec receives memory error, it panics. This is by design.

Meanwhile, I’ve returned to the original problem: how Rust knew type variant in Enum? I’ve lowered loop count and for 10000000 cycles (2 entries per cycle) of Vec of Enum :

enum E {
Signed(i32),
Unsigned(u32)
}

it changed RssAnon from 144 to 156464kb. It’s 8 bytes per entry. But I said it is i/u32, so it should be 4. Wait here! I don’t like this.

I changed code to make it just a Vec for i32’s. Math says that each i32 is precisely 3.9989248 bytes in size (call now to get this amazing discount on your ints!). I’ve changed it to Vec<Enum> of one element. 4 bytes per element. I made Vec<Enum> of three elements. 8 bytes.

Mystery. I need to see it by myself. I changed my script to fill Vec with pairs visually distinctive numbers.

enum E {
Signed(i32),
Signed2(i32),
Signed3(i32)
}
...
example.push(E::Signed(0x60616263));
example.push(E::Signed3(0x90919293));

I dumped process memory. Hexdump for region with anon rw-p memory is self-descriptive:

00000000  00 00 00 00 63 62 61 60  02 00 00 00 93 92 91 90  
00000010 00 00 00 00 63 62 61 60 02 00 00 00 93 92 91 90
00000020 00 00 00 00 63 62 61 60 02 00 00 00 93 92 91 90

63 62 61 60 is the first stored number, 93 92 91 90 is the second. Numbers 00 00 00 00 and 02 00 00 00 (which are actually 0x0 and 0x2) are number of variant in enum. Therefore, Rust stores runtime information about data types in Enum. Enum with a single variant probably was optimized, but Enum with few data types variants simply stores that information inside structure.

It’s the closest analog on C would be:

struct Enum {
int type;
union {
int variant1;
char* variant2;
char variant3;
}

Mystery solved, but solution left a slight aftertaste. I expected to find miracle, match, operating with compile-only information. This is obviously impossible, yet I’m disappointed (I want Rust miracles!).

I have one more small inquiry before moving to String chapter. If I use i64 as one of the elements of enum, how large hidden type variable would be?

With i64 as an element of enum, the whole enum becomes 16 bytes in size. With 8 bytes perf i64 type, it makes ‘type’ whopping 8 bytes in length… I knew it was due to padding, but still… Yet another question I have: what is real size of ‘hidden type’ in Rust? Yet another test gives an answer: with only i8 types in enum, it’s 2 bytes in size. That means one byte ‘hidden type’. …Can I have more than 256 types in one enum? I generated code to have ~300 i8 values in enum, and enum size magically grew to 4 bytes (1 byte for value, 2 bytes for ‘hidden type’ and one byte of padding). That means that in i64 test there was 1 byte for type, 8 bytes for data and 7 bytes of padding. The key finding here is that Rust can handle an arbitrary number of types in Enum and will use reasonably minimal memory for this.

Option size

A bit later I’d heard that Option for pointer takes no additional space, by clever use of null (under the hood). I tested this statement. Unfortunately, I don’t know anything about lifetimes yet, so I can’t return Option<&type> from the function. At the same time, some daring wild attempt to use pointers (which I don’t know either) said that Option<const *str> consume 24 bytes per element. I suspect that I did something incorrectly, or, may be, that statement about ‘Option/Null’ thing is false. On the normal case (Option<i32>) element size was 8 bytes, which stands for expected (one byte for type, three bytes for padding, four bytes for value).

Strings

Strings are complicated. Everyone trying to implement them will be stuck between convenience and non-contradiction. The more convenient strings are, the more bizarre contradictions are. Fewer contradictions mean less convenience.

Upon reading string chapter I noted to myself ‘push’ operator, which allow adding characters. Python has no idea of a character, everything is a string, long or short. This causes occasional ‘oops’es as some operators are ambiguous in this notation (the most famous is in operator, which could find a substring in a string, but couldn’t find a sublist in a list).

Moving on. Plus operator is a disaster. It takes two strings, one normal string (which had it ownership transferred) and one string slice. No commutative property whatsoever.

But those were small nuances. Type coercion appeared suddenly. It is a big, big issue. I clearly remember me been bitten few times by C automatic coercion from int to float. I hate automatic coercion. It’s nice when it’s right, and it’s nasty when it’s wrong. It’s hard to debug and hard to find. Book moved a discussion on automatic coercion to chapter 15 (almost the end of the book), and I couldn’t say I look forward… I had hoped Rust may avoid this. Nope.

Next note on fprintf and vspnprintf, or, precisely, lack of. All that printf-madness went away! I worked on Russian Wiki page on printf few years ago, and I clearly remember how uncanny it was. Just look at vspnprintf: easy to remember, easy to use, easy to write, no more vulnerabilities at this time again. And %n… people were mad in the prehistoric age.

Rust provides println!, format! and it looks great.

But the rest of string operations in Rust (noted in the book’s chapter) are the complete disappointment. Rust strictly stands on a position of ‘better safe than sorry’, so strings lost almost all possible operations. No normal slices, no indexing… It looks more like byte operations with additional random panic on ‘bad combinations’ of index and string content.

Hashes AKA dicts

I find python’s term ‘dictionary’ to be an excellent name for key-value storage. When you call something ‘hash map’ you insist on using hash function. When you call something a dict, you don’t care if it hash map or any other trick. But Rust sticks to hash maps.

And the Rust book becomes confusing at this point. Hashmap chapter gives a lot of code without explanation, and available explanation is not keeping the narrative.

It’s more like a reference now, so I’m forced to play a lot with hashes by myself.

The first interesting observation:

fn main() {
use std::collections::HashMap;
let mut scores = HashMap::new();
}

gives us an error: cannot infer type for `K`. But this code works fine:

fn main() {
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
}

That means that Rust can do type inference based on few lines. It’s may sounds as a small thing, but it’s not. It uses types from other parts of the program to make a guess.

An attempt to use different values as keys fails, as expected:

7 | scores.insert(10, 10);
| ^^ expected struct `std::string::String`, found integral variable

Enum cause a storm of errors:

error[E0277]: the trait bound `Example: std::cmp::Eq` is not satisfied

11 | scores.insert(Example::i(4), 10);
| ^^^^^^ the trait `std::cmp::Eq` is not implemented for `Example`
error[E0277]: the trait bound `Example: std::hash::Hash` is not satisfied
--> src/main.rs:11:12
|
11 | scores.insert(Example::i(4), 10);
| ^^^^^^ the trait `std::hash::Hash` is not implemented for `Example`
error[E0277]: the trait bound `Example: std::cmp::Eq` is not satisfied
--> src/main.rs:10:22
|
10 | let mut scores = HashMap::new();
| ^^^^^^^^^^^^ the trait `std::cmp::Eq` is not implemented for `Example`
|
= note: required by `<std::collections::HashMap<K, V>>::new`
error[E0277]: the trait bound `Example: std::hash::Hash` is not satisfied
--> src/main.rs:10:22
|
10 | let mut scores = HashMap::new();
| ^^^^^^^^^^^^ the trait `std::hash::Hash` is not implemented for `Example`
|
= note: required by `<std::collections::HashMap<K, V>>::new`

Wooh. One small error for human, one big list of errors for compiler…

As I understand, it says that I need to implement my own hash function for an composite (enum) type. Funny enough, Rust can candle ‘single type enum’ at memory allocation step (no memory will be used for type as enum has one variant), but couldn’t do this for hash part. If enum has one type inside, it’s obvious that hash of enum is the same as hash for that single type inside.

Hash has .entry function which looks strange.

scores = HashMap::new();
scores.insert(String::from("Blue"), 10);

scores.entry(String::from("Yellow")).or_insert(50);
scores.entry(String::from("Blue")).or_insert(50);

I found strange that we have something taken from scores which can modify values inside scores. I can guess how it was implemented, but I find this notation awkward.

They noted asterisk operator (*) without explaining it. I played with it by myself, but they gave yet another big piece of language without explanation. I don’t like it.

Conclusion

My excitement on data types was preliminary. Rust datatypes so far are less expressive than python’s. I played with Enum memory footprint and I found that there is no magic — variable algebraic type is stored as runtime variable.

Vec is good, String and Hash are less impressive than I expected.

Maxim ‘memory allocation may not fail’ is worrying me. No exception handling, no options. No memory? Segmentation fault.

Overall, I start to lose patience, as I want to write some code, yet I find actual Rust examples been too cryptic. I need to learn more before I can try something real.

journey to rust

Side notes on learning Rust language

George Shuklin

Written by

I work at Servers.com, most of my stories are about Ansible, Ceph, Python, Openstack and Linux.

journey to rust

Side notes on learning Rust language

George Shuklin

Written by

I work at Servers.com, most of my stories are about Ansible, Ceph, Python, Openstack and Linux.

journey to rust

Side notes on learning Rust language

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store