The Fundamentals of Rust

Make yourself explicit.

As this essay collection progresses beyond the Automunge project, I am trying to avoid making it some kind of vanity project. The goal for some time now has been to contribute to the field of computer science, to add value through whatever channels become available. Part of the risk of dedicated focus in narrow subdomains like tabular preprocessing is that the work can lose relevance to more important advanced practices that lately have extended well beyond the conventions of vanilla supervised learning. In the interest of avoiding tunnel vision, I have thus recently begun a project of new language exploration, which will likely continue going forward on some recurring frequency like yearly or thereabouts. I find that this, supplemented by following the proceedings of a few research conferences, makes keeping up with trends in modern practice feasible, at least at a high level.

It was thus that I recently attempted a deep dive into the Rust programming language. The applicability of Rust to the field of data science is perhaps not an obvious one. Rust is a newer language that is still following an iterative update cycle of its own, with stable releases every 6 weeks as the API conventions are extended into the nooks and crannies of various workflows explored by what I understand to be a welcoming and supportive developer community. The language is perhaps best thought of as solving a few issues associated with lower level languages like C or C++. A strict compile-time type system and compiled execution without a garbage collector make for speed nearly on par with C, while its ownership rules resolve memory-safety issues like dangling pointers, buffer overflows, and data races that have been a recurring headache throughout the decades-long history of C programming environments. Data science libraries in many cases are built in C++ for speed with Python wrappers for interface; someday Rust may become an alternate convention for such infrastructure.

The explicit type requirements are perhaps the most noticeable distinction: types must be spelled out in function signatures, struct fields, and constants, although the compiler can usually infer them for local variables. Some other fundamental conventions of the language include the packaging of code into “crate” containers and the “Cargo” build tool and package manager. One of the benefits of the type conventions is that it becomes easier to consider and iterate on segments of code independent of their context in the surrounding architecture, as each function becomes interpretable as an isolated system. A tradeoff is that the density of idiomatic code loses some of the readability that one might enjoy with a (much slower) language like Python. Fortunately common IDE (integrated development environment) platforms offer autocomplete functionality for many scenarios of type specification that makes the practice of writing at least a little easier. The added benefit of intelligent bug reporting, from both the compiler's detailed error messages and a linter (they call him “Clippy”), makes the experience more manageable than one may expect from simply viewing the resulting dense bracketed lines of code.

This document will mainly focus on fundamental conventions of the language by demonstration, and may be thought of as a kind of “cheat sheet” for API conventions of the Rust programming language. Although my Rust deep dive included a host of online courses on the Udemy platform (hey they were on sale), the coding demonstrations below owe a disproportionate debt of gratitude to the course Rust lang: the complete beginner’s guide from instructor Catalin Stefan, as many of the coding demonstrations below were adapted from my lecture notes (with the instructor’s permission). Please consider this a strong endorsement for the course, which I will link to again at the completion of this essay. Note that if you are looking for a simple solution for a pre-built Rust coding environment, probably the easiest way to get started is the Repl.it online portal, a browser based IDE. Cool so without further ado let’s dive into Rust.

Comments
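As a minimal sketch of Rust's comment styles, the following shows line, block, and doc comments. The function name add_one is purely illustrative.

```rust
// A line comment runs from two slashes to the end of the line.

/* A block comment can
   span multiple lines. */

/// A doc comment (three slashes) documents the item that follows it,
/// and is picked up by `cargo doc` when generating documentation.
fn add_one(x: i32) -> i32 {
    x + 1 // comments can also trail code on the same line
}

fn main() {
    println!("{}", add_one(41)); // prints 42
}
```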

Idiomatic Conventions

Some fundamentals of writing code in Rust are that nearly every line that doesn’t end in a closing bracket } will require a closing semicolon ; which signals the end of a statement (the exception being a final expression used as a function's return value, discussed below). Don’t worry if you forget a semicolon, the compiler's error messages will probably catch it when you try to build the code. Nearly all functions, modules, etc. have scope delineated by opening and closing curly brackets { }. A helpful way to keep track of which scope a line segment belongs to is to follow line indentation conventions similar to what you may be used to in Python. Such indentation, although idiomatic for readability purposes, is not needed for operation; the curly brackets are the governing signal. A binary project created with Cargo will at a minimum include a src/main.rs file, and within that file a main() function. When we run code the governing flow is based on this main() function, which we’ll discuss further below. (Some of our code demonstrations may omit the main() function wrapper for readability.)

Printouts
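Here is a minimal sketch of common print conventions. The println! and format! macros share the same placeholder syntax. The helper greeting and the sample values are illustrative.

```rust
fn greeting(name: &str, age: u32) -> String {
    // format! builds a String using the same placeholder syntax as println!
    format!("{} is {} years old", name, age)
}

fn main() {
    let name = "Ferris";
    let age = 7;
    println!("Hello, world!");          // plain text
    println!("{} is {}", name, age);     // positional {} placeholders
    println!("{name} is {age}");         // variables captured by name (Rust 1.58+)
    println!("{:?}", [1, 2, 3]);         // {:?} uses Debug formatting
    println!("{:.2}", 1.23456);          // two decimal places: 1.23
    eprintln!("this goes to standard error");
    println!("{}", greeting(name, age));
}
```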

Variables
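The following is a minimal sketch of variable declarations. It covers type inference, explicit annotation, mutability with mut, and shadowing. The function shadow_demo is illustrative.

```rust
fn shadow_demo() -> i32 {
    let x = 5;     // immutable by default
    let x = x * 2; // shadowing: a new variable that reuses the name
    x
}

fn main() {
    let a = 10;        // type inferred as i32
    let b: f64 = 2.5;  // explicit type annotation
    let mut count = 0; // `mut` allows reassignment
    count += 1;
    // a = 11;         // would not compile: `a` is not mutable
    println!("a = {}, b = {}, count = {}", a, b, count);
    println!("shadowed x = {}", shadow_demo()); // 10
}
```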

Primitive Data Types

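  • signed integers: i8 / i16 / i32 / i64 / i128 / isize
  • unsigned integers: u8 / u16 / u32 / u64 / u128 / usize
  • floats: f32 / f64
  • Boolean: bool (true / false)
  • Character: char (a single Unicode character, which can be an emoji, written in single quotes)
  • String slice: &str (a borrowed view of string data; the owned, growable counterpart is String)
  • Static string: &'static str (a string slice that lives for the whole program, like a string literal; this relates to advanced memory management stuff)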
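As a minimal sketch, the following declares one value of each primitive type from the list above. The helper type_sizes, which reports a few byte sizes via std::mem::size_of, is illustrative.

```rust
// Byte sizes of a few primitive types.
fn type_sizes() -> (usize, usize, usize, usize) {
    (
        std::mem::size_of::<i8>(),
        std::mem::size_of::<u64>(),
        std::mem::size_of::<f64>(),
        std::mem::size_of::<char>(), // a char always takes 4 bytes
    )
}

fn main() {
    let small: i8 = -128;                      // signed 8-bit: -128..=127
    let big: u64 = 18_446_744_073_709_551_615; // u64::MAX; underscores aid readability
    let index: usize = 3;                      // pointer-sized unsigned, used for indexing
    let ratio: f64 = 0.75;                     // f64 is the default float type
    let flag: bool = true;
    let letter: char = 'R';                    // chars use single quotes
    let crab: char = '🦀';                     // one Unicode scalar value, so an emoji fits
    let text: &str = "hello";                  // string slices use double quotes
    let forever: &'static str = "lives as long as the program";
    println!("{} {} {} {} {} {} {} {} {}", small, big, index, ratio, flag, letter, crab, text, forever);
    println!("{:?}", type_sizes()); // (1, 8, 8, 4)
}
```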

Strings
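Here is a minimal sketch of the two main string types: the borrowed &str slice and the owned, growable String. It also shows converting between them and building strings up. The function build_greeting is illustrative.

```rust
fn build_greeting(name: &str) -> String {
    let mut s = String::from("Hello, "); // owned, growable, heap-allocated
    s.push_str(name);                    // append a &str
    s.push('!');                         // append a single char
    s
}

fn main() {
    let literal: &str = "world";             // borrowed string slice
    let owned: String = literal.to_string(); // convert &str -> String
    let back: &str = &owned;                 // borrow a String as &str
    let greeting = build_greeting(back);
    println!("{}", greeting);                // Hello, world!
    println!("length in bytes: {}", greeting.len());
    println!("uppercase: {}", greeting.to_uppercase());
    let joined = format!("{}-{}", literal, owned); // concatenation via format!
    println!("{}", joined);
}
```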

Constants

Constants are values that cannot be changed, and unlike ordinary variables they can’t be redefined through “shadowing” with a new let binding. The convention is that constants are named in all caps, and they require an explicit data type specification. Constants are visible within the bracketed scope { } where they are defined. For example a constant defined in a function is only accessible within the brackets of that function. We’ll note further conventions for global variables below.
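The following is a minimal sketch of a module-level constant alongside one scoped to a single function. The names MAX_POINTS, BONUS, and points_left are illustrative.

```rust
// Constants: all-caps names and a mandatory type annotation.
const MAX_POINTS: u32 = 100_000;

fn points_left(spent: u32) -> u32 {
    const BONUS: u32 = 50; // a constant scoped to this function only
    MAX_POINTS - spent + BONUS
}

fn main() {
    println!("{}", points_left(1_000)); // 99050
    // println!("{}", BONUS); // would not compile: BONUS is out of scope here
}
```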

Mathematical Operators
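As a minimal sketch of the arithmetic operators, the following also covers compound assignment and a couple of numeric methods. Note that Rust won't implicitly mix integer and float types, so an explicit cast with as is needed. The function arithmetic is illustrative.

```rust
fn arithmetic(a: i32, b: i32) -> (i32, i32, i32, i32, i32) {
    // integer division and remainder truncate toward zero
    (a + b, a - b, a * b, a / b, a % b)
}

fn main() {
    println!("{:?}", arithmetic(17, 5)); // (22, 12, 85, 3, 2)
    let half = 7.0_f64 / 2.0;            // float division: 3.5
    let mut n = 10;
    n += 5;                              // compound assignment: 15
    n *= 2;                              // 30
    let p = 2_i32.pow(10);               // exponentiation via a method: 1024
    let r = 16.0_f64.sqrt();             // 4
    let mixed = 3 as f64 + 0.5;          // mixed types need an explicit cast: 3.5
    println!("{} {} {} {} {}", half, n, p, r, mixed);
}
```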

main.rs

When running any kind of operation in Rust, the master procedure is defined through the main() function located in the main.rs file. The simplest example is the hello world convention: a main() function whose body is the single statement println!("Hello, world!"); which could be populated in a main.rs file and then run with cargo run.

An alternate hello world convention moves the print statement into a separate function, which is then called inside the main() function.

In a more elaborate hello world convention, we could pass a reference to a string (a &str parameter) to the print statement function for integration into the outputted string.
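The three hello world variants described above might look something like the following in a single main.rs. The helper names say_hello, greet, and greeting_text are illustrative choices.

```rust
// Variant 2: a helper function holding the print statement.
fn say_hello() {
    println!("Hello, world!");
}

// Variant 3: the helper borrows a string slice (&str) and prints a message built from it.
fn greet(name: &str) {
    println!("{}", greeting_text(name));
}

fn greeting_text(name: &str) -> String {
    format!("Hello, {}!", name)
}

// main() is the entry point Cargo runs for a binary crate.
fn main() {
    println!("Hello, world!"); // Variant 1: print directly inside main()
    say_hello();
    let name = "Rustacean";
    greet(name);               // prints: Hello, Rustacean!
}
```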

Alternatively, we may want the function to modify the caller’s variable itself rather than just read it. In this case the variable is declared mutable with mut and passed as a mutable reference with &mut, and inside the function we include the * prefix when editing it. This results in editing the state of the variable even external to the scope of the function. More on these concepts will be discussed in the memory management section further below.
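Here is a minimal sketch of editing a caller's variable through a mutable reference. The hello world function increments a counter owned by main(). The names say_hello and count are illustrative.

```rust
// Receives a mutable reference and writes through it with the `*` prefix.
fn say_hello(times_greeted: &mut u32) {
    println!("Hello, world!");
    *times_greeted += 1; // `*` reaches through the reference to the caller's variable
}

fn main() {
    let mut count = 0;     // declared `mut` so it can be lent mutably
    say_hello(&mut count); // pass a mutable reference
    say_hello(&mut count);
    println!("greeted {} times", count); // greeted 2 times
}
```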

Extending our demonstrated hello world functions even further, we could frame it so that the function returns a value. There are two equivalent ways to specify a return value: one is to state return variable; (with included semicolon), the other is to just state the variable as the last line of the function body (without a semicolon). We’ll also need to specify the returned type with -> in the function signature.
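The following is a minimal sketch of the two equivalent return styles. The function names are illustrative.

```rust
// Explicit form: the `return` keyword, ending with a semicolon.
fn greeting_explicit(name: &str) -> String {
    return format!("Hello, {}!", name);
}

// Implicit form: the last expression, with no semicolon, is the return value.
fn greeting_implicit(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    println!("{}", greeting_explicit("Ada"));
    println!("{}", greeting_implicit("Ada"));
}
```

Adding a semicolon after the last line of greeting_implicit would turn it into a statement that returns nothing. The compiler would then report a type mismatch against the declared -> String.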

Modules

The main.rs file where we define our main() function is a kind of module (the root module of the crate). Just like variables have retention limited to the bracketed scope where they are defined, functions and other data structures are private to their module by default. They can only be used from another module when marked pub and then referred to by path or brought in with a use statement.

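To import an entire module saved in a separate file, which we’ll call module_name.rs, into the primary module main.rs, the declaration mod module_name; goes at the top of main.rs. Its public functions can then be called as module_name::function_name().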

It is also possible to define a separate module within the same main.rs file with a mod block. In this scenario we will need to explicitly make the functions internal to the module public via pub so that they are accessible within the main() function.
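Here is a minimal sketch of an inline module in main.rs. It has one public function and one private helper. The module name greetings and its functions are illustrative.

```rust
// A module defined inline in main.rs; items are private unless marked `pub`.
mod greetings {
    pub fn hello() -> String {
        format!("Hello from the {} module", module_label())
    }

    // Private helper: usable inside the module, invisible to main().
    fn module_label() -> &'static str {
        "greetings"
    }
}

fn main() {
    println!("{}", greetings::hello());
    // greetings::module_label(); // would not compile: the function is private
}
```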

Modules can even be nested within other modules (the inner module also needs to be marked pub). When calling the function, just call the path of modules like so: top_tier::internal_tier::function();
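As a minimal sketch, the following nests modules using the top_tier / internal_tier path from the text. The return value 42 is arbitrary. An optional use statement shortens the path.

```rust
mod top_tier {
    pub mod internal_tier {
        pub fn function() -> u32 {
            42
        }
    }
}

// Optional: a `use` statement shortens the path for later calls.
use top_tier::internal_tier;

fn main() {
    let full = top_tier::internal_tier::function(); // full path
    let short = internal_tier::function();          // via the `use` shortcut
    println!("{} {}", full, short);
}
```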

Crates

Ok this is where the Rust terminology gets a little less intuitive, so hope you can forgive my attempt to translate. We referred to the .rs files in the preceding section as modules. A “crate” is the unit that the Rust compiler builds at one time: a tree of modules starting from a root file, typically main.rs or lib.rs. To be slightly more precise, there are actually two kinds of crates: binary crates and library crates. A binary crate has an entry point, the main() function, and compiles to an executable program. A library crate, on the other hand, has no main() and instead provides modules and functions for other crates to import (and can even be downloaded from an external resource). Crates are grouped into packages, which are managed by the Cargo build tool.

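If we want to split code out in this fashion, we could create a new crate_name.rs file, create a pub mod mod_name within the .rs file, and create any pub fn fn_name() within that mod that you want. Then at the start of main.rs we would declare the file with mod crate_name; and bring the inner module into scope with use crate_name::mod_name; after which mod_name::fn_name() can be called. (Strictly speaking, Rust treats a file added this way as a module within the existing crate.)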

It is also possible to import external crates shared by other developers, typically through the crates.io registry. Doing so requires editing the Cargo.toml manifest file located at the root of the Cargo project, which is loosely similar to a Python requirements.txt or pyproject.toml file. Basically this means adding a dependency under the [dependencies] heading with some version number (e.g. version number “0.5.3”).

[dependencies]
external_crate = "0.5.3"

Cargo then downloads and compiles the dependency at the next build, and we could access some module from that external crate in main.rs with a use statement.
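A use statement for an external crate would read something like use external_crate::some_module; (placeholder names). Because an arbitrary external crate can't be assumed here, this sketch uses the standard library crate std as a stand-in, since the same use syntax applies. The function word_counts is illustrative.

```rust
// Bringing an item from another crate into scope with `use`.
// The standard library (`std`) is itself a crate, standing in for an external one.
use std::collections::HashMap;

fn word_counts(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts("to be or not to be");
    println!("{:?}", counts.get("to")); // Some(2)
}
```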

Structured Sets

Structured sets refer to data structures that may encapsulate multiple entries, like a vector or an array. There are some key distinctions in how they are stored in memory or edited, which we’ll go into in more detail shortly.

  1. Arrays
  2. Vectors
  3. Slices
  4. Tuples
  5. Structures
  6. Enums
  7. Generics

An advanced concept not addressed by this writeup is “traits”, which can be used to formalize interfaces describing what can be done with different kinds of structures. Each one of these structures relies on different traits implemented under the hood (for example the Debug trait, which enables {:?} printing).

1. Arrays

The Rust conventions have a few distinctions verses what you may be used to from working with Numpy arrays in python. While they share the convention that the entries to an array must all be of the same type, in Rust once you initialize an array the number of entries is fixed. While the elements can later be modified, their reserved address in the array cannot be deleted.

Note that when trying to print an array (and several other structured types), println! with the plain {} placeholder is not supported for that type. This can be circumvented by using the {:?} debug placeholder within the print statement or by printing the elements with a for loop.

When initializing an array, if we don’t yet know all of the values but know what size we want, we could initialize a mutable array with a default value repeated, e.g. [0; 5]. We can then later update specific entries by index (where the first element has index 0).
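The following is a minimal sketch of the array conventions above. It shows a typed declaration, {:?} printing, iteration, and default initialization followed by updates by index. The names are illustrative.

```rust
fn array_demo() -> [i32; 5] {
    let mut scores = [0; 5]; // five entries, all defaulting to 0
    scores[0] = 90;          // update by index (first element is index 0)
    scores[4] = 75;
    scores
}

fn main() {
    let primes: [i32; 4] = [2, 3, 5, 7]; // type is [element type; length]
    println!("{:?}", primes);            // {:?} Debug formatting prints the whole array
    for p in primes.iter() {
        println!("{}", p);               // or print element by element
    }
    println!("length: {}", primes.len());
    println!("{:?}", array_demo());      // [90, 0, 0, 0, 75]
    // primes.push(11); // would not compile: arrays have no push, size is fixed
}
```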

2. Vectors

A vector is similar to an array but allows for a variable number of entries, as it can grow or shrink at runtime. Common operations include push, pop, insert, remove, and len.
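Here is a minimal sketch of a few common vector operations. The comments trace the contents after each step. The function vector_demo is illustrative.

```rust
fn vector_demo() -> Vec<i32> {
    let mut v: Vec<i32> = Vec::new(); // empty, growable vector
    v.push(10);                       // [10]
    v.push(20);                       // [10, 20]
    v.push(30);                       // [10, 20, 30]
    v.remove(0);                      // remove by index, shifting later ones left: [20, 30]
    v.insert(1, 25);                  // insert at index 1: [20, 25, 30]
    v.pop();                          // remove the last element: [20, 25]
    v
}

fn main() {
    let mut nums = vec![1, 2, 3]; // vec! macro with initial values
    nums.push(4);
    println!("{:?} len={}", nums, nums.len()); // [1, 2, 3, 4] len=4
    println!("{:?}", nums.get(10));            // safe access: None when out of range
    println!("{:?}", vector_demo());           // [20, 25]
}
```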

3. Slices

A slice is a reference to a contiguous block of memory containing some subset of the entries in a structured data type, written with a range like &array[1..4]. It can be used on arrays, vectors, and strings. A slice can optionally be made mutable (&mut) to edit the underlying entries.
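As a minimal sketch, the following shows immutable and mutable slices over an array, a vector, and a String. The functions sum and double_all are illustrative.

```rust
// Sum any slice of i32, whether it borrows from an array or a vector.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

// A mutable slice lets the function modify the borrowed elements in place.
fn double_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

fn main() {
    let arr = [1, 2, 3, 4, 5];
    let middle = &arr[1..4]; // elements at indices 1, 2, 3
    println!("{:?} sums to {}", middle, sum(middle)); // [2, 3, 4] sums to 9

    let mut v = vec![10, 20, 30];
    double_all(&mut v[..2]); // only the first two elements
    println!("{:?}", v);     // [20, 40, 30]

    let s = String::from("hello world");
    let word: &str = &s[0..5]; // string slice (byte indices)
    println!("{}", word);      // hello
}
```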

4. Tuples

Tuples are like arrays in having a fixed size and editable elements (when declared mut), but they allow you to store multiple data types. Tuples of any length can be created. However, the standard library only implements common traits like Debug printing and equality comparison for tuples of up to 12 elements, so in practice larger tuples are awkward to work with. Accessing the elements in a tuple by position is also different than arrays, using a tuple.0 convention instead of array[0]. “Destructuring” a tuple refers to taking its elements and passing them each to a distinct variable.
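Here is a minimal sketch of tuple creation, positional access, destructuring, and returning two values from a function. The sample person data and the function min_max are illustrative.

```rust
// Return two values at once (assumes a non-empty slice).
fn min_max(values: &[i32]) -> (i32, i32) {
    let mut lo = values[0];
    let mut hi = values[0];
    for &v in values {
        if v < lo { lo = v; }
        if v > hi { hi = v; }
    }
    (lo, hi)
}

fn main() {
    let mut person: (&str, u32, f64) = ("Ada", 36, 1.65); // mixed types
    person.1 += 1;                                        // access by position with .index
    println!("{} is {}", person.0, person.1);             // Ada is 37

    let (name, age, height) = person; // destructuring into three variables
    println!("{} {} {}", name, age, height);

    let (lo, hi) = min_max(&[4, -2, 9, 0]);
    println!("min {} max {}", lo, hi); // min -2 max 9
    println!("{:?}", person);          // Debug printing works for tuples of up to 12 elements
}
```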

5. Structures

Structures (“structs” in Rust) remind me of the Python dictionary type as they aggregate a collection of key value pairs. A distinction though is that each key is a named field fixed in the struct definition, instead of e.g. a string or integer as would be in Python, and like most Rust objects each field requires a type specification. Note that fields are separated with a comma instead of a semicolon. The list of fields is a single declaration, and breaking it into multiple lines is just the idiomatic layout for readability. (Also note that to add printing support with println!() and {:?}, the attribute #[derive(Debug)] can be added preceding the struct definition.)

A neat bit of advanced functionality for structs is that you can add methods, like integrated functions that can access the fields of a struct’s instance. They are defined separately from the fields, in an impl block. Methods that take &self are called on an instance with dot notation (instance.method()). Associated functions without a self parameter, such as constructors, are called with double colon notation (Structure::function()).
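The following is a minimal sketch of a struct with #[derive(Debug)], an associated function, and a method. The Rectangle type and its members are illustrative.

```rust
#[derive(Debug)] // enables {:?} printing
struct Rectangle {
    width: u32, // fields are separated by commas
    height: u32,
}

impl Rectangle {
    // Associated function (no `self`): called as Rectangle::square(5).
    fn square(size: u32) -> Rectangle {
        Rectangle { width: size, height: size }
    }

    // Method (takes &self): called on an instance as rect.area().
    fn area(&self) -> u32 {
        self.width * self.height
    }
}

fn main() {
    let rect = Rectangle { width: 3, height: 4 };
    println!("{:?}", rect);                  // Rectangle { width: 3, height: 4 }
    println!("area = {}", rect.area());      // 12
    let sq = Rectangle::square(5);
    println!("square area = {}", sq.area()); // 25
    println!("width = {}", rect.width);      // fields use dot access
}
```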

6. Enums

An enum is kind of an alternative to structs with some different conventions. A struct holds all of its fields at once, whereas an enum defines a type whose value is exactly one of a listed set of named variants.

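One use case is that an enum defined in one module or crate can be imported into main.rs with a use statement. Its variants can be imported too (e.g. use EnumName::*;), so that they can be referred to without the enum’s name as a prefix.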

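It is also possible to attach data of specified types to enum variants, although the convention is a little different than usual. A variant can hold unnamed values in parentheses like a tuple, or named fields in braces like a struct.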
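Here is a minimal sketch covering a plain enum whose variants are imported with use, and an enum whose variants carry data. A match then unpacks that data. The names Direction, Shape, and area are illustrative.

```rust
// An enum value is exactly one of its listed variants.
#[derive(Debug)]
enum Direction {
    North,
    South,
}

// Variants can also carry data of their own types.
#[derive(Debug)]
enum Shape {
    Circle(f64),                      // tuple-like variant holding a radius
    Rect { width: f64, height: f64 }, // struct-like variant with named fields
}

fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Rect { width, height } => width * height,
    }
}

use Direction::*; // bring variants into scope to drop the `Direction::` prefix

fn main() {
    let heading = North;
    println!("{:?} {:?}", heading, South);
    let shapes = [Shape::Circle(1.0), Shape::Rect { width: 2.0, height: 3.0 }];
    for s in shapes.iter() {
        println!("{:?} has area {:.2}", s, area(s));
    }
}
```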

7. Generics

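If we want flexibility on the data type of a variable, generics allow a struct, enum, or function to be written once with a placeholder type parameter (conventionally T). That placeholder is filled in with a concrete type wherever it is used.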
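As a minimal sketch, the following shows a generic struct and a generic function. The function needs a trait bound (PartialOrd + Copy) to say what T must support, which touches on the traits topic mentioned above. The names Point and largest are illustrative.

```rust
// A struct generic over T: the same definition works for any type.
#[derive(Debug)]
struct Point<T> {
    x: T,
    y: T,
}

// A generic function; the bound says T must be comparable and copyable.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut best = items[0]; // assumes a non-empty slice
    for &item in items {
        if item > best {
            best = item;
        }
    }
    best
}

fn main() {
    let ints = Point { x: 1, y: 2 };        // Point<i32>
    let floats = Point { x: 1.5, y: -0.5 }; // Point<f64>
    println!("{:?} {:?}", ints, floats);
    println!("{}", largest(&[3, 9, 4]));            // 9
    println!("{}", largest(&['r', 'u', 's', 't'])); // u
}
```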

Control Structures

Control structures resemble common conventions from other languages and include:

  • If statement
  • Match statement
  • Pattern matching
  • For loop
  • While loop

If statements may either only specify a result associated with a passed logic test, or may have one or more else if statements inspected when preceding tests are false. A final else statement may be used to specify a scenario when all preceding logic tests were false.
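Here is a minimal sketch of an if / else if / else chain. Because if is an expression in Rust, it can also directly initialize a variable. The function classify and its thresholds are illustrative.

```rust
fn classify(temp: i32) -> &'static str {
    if temp < 0 {
        "freezing"
    } else if temp < 20 {
        "cool"
    } else {
        "warm"
    }
}

fn main() {
    for t in [-5, 10, 25] {
        println!("{} -> {}", t, classify(t));
    }
    // `if` is an expression, so it can also initialize a variable.
    let parity = if 7 % 2 == 0 { "even" } else { "odd" };
    println!("7 is {}", parity);
}
```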

Match statements are sort of similar to what some other languages may refer to as “when” or “switch”. For example, a match statement can return a string of a city name based on a received integer area code. Since a match must cover every possible value, a catch-all _ arm handles any codes not listed.
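The area code example might be sketched as follows. The function name city_for and the particular codes listed are illustrative choices.

```rust
fn city_for(area_code: u32) -> &'static str {
    match area_code {
        212 => "New York",
        305 => "Miami",
        415 => "San Francisco",
        _ => "unknown", // `_` catches every other value; match must be exhaustive
    }
}

fn main() {
    println!("{}", city_for(305)); // Miami
    println!("{}", city_for(999)); // unknown
}
```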

Pattern matching uses the same match syntax, but instead of matching only specific values, each arm can match a pattern or pass a logic test. The conventions are very similar to the match statement demonstration; we just replace the value scenario with a pattern or condition scenario.

Examples of patterns:

  • multiple values: 123 | 321 | 234
  • ranges (shown here >= 0 and <= 3): 0..=3
  • conditionals (“match guards”), e.g. x if x > b
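The three kinds of patterns listed above can be combined in a single match. Here is a minimal sketch, where describe and its messages are illustrative. Arms are tried top to bottom, so the first matching arm wins.

```rust
fn describe(n: i32, b: i32) -> &'static str {
    match n {
        123 | 321 | 234 => "one of several listed values", // multiple values
        0..=3 => "between 0 and 3 inclusive",              // inclusive range
        x if x > b => "greater than b",                    // match guard
        _ => "something else",
    }
}

fn main() {
    let b = 10;
    for n in [321, 2, 50, -4] {
        println!("{} is {}", n, describe(n, b));
    }
}
```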

For loops are similar to conventions in other languages. One can loop through ranged integers e.g. 0..=5, or one could loop through the entries of an array using .iter(). One could use a break statement to exit a loop based on a condition. One can also use a continue statement to skip the rest of the current iteration and proceed to the next one.
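The following is a minimal sketch that includes each of these scenarios: looping over a range and over an array, using break and continue, and (as an extra) enumerate for indices. The function filtered_sum and its rules are illustrative.

```rust
// Sum the odd numbers in 0..=limit that aren't multiples of 3, stopping after 11.
fn filtered_sum(limit: u32) -> u32 {
    let mut total = 0;
    for i in 0..=limit {
        if i > 11 {
            break; // leave the loop entirely
        }
        if i % 2 == 0 || i % 3 == 0 {
            continue; // skip the rest of this iteration, go to the next i
        }
        total += i;
    }
    total
}

fn main() {
    let fruits = ["apple", "banana", "cherry"];
    for fruit in fruits.iter() {
        println!("{}", fruit);
    }
    for (i, fruit) in fruits.iter().enumerate() {
        println!("{}: {}", i, fruit);
    }
    println!("{}", filtered_sum(20)); // 1 + 5 + 7 + 11 = 24
}
```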

While loops will run as long as a condition remains true (need to be careful of infinite loops of course). We can use continue or break statements just like in for loops.

while condition {
    operations;
}

A similar convention to while loops is simply a “loop” loop, which is specified without a halting condition and relies on an included break statement to exit (a break can also hand a value back out of the loop).
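Here is a minimal sketch of both forms: a while loop that counts down, and a loop that exits with break while returning a value. The function names are illustrative.

```rust
// while: repeat as long as the condition holds.
fn countdown(start: u32) -> Vec<u32> {
    let mut n = start;
    let mut seen = Vec::new();
    while n > 0 {
        seen.push(n);
        n -= 1;
    }
    seen
}

// loop: runs until a `break`, which here also hands back a value.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut p = 1;
    loop {
        if p > limit {
            break p;
        }
        p *= 2;
    }
}

fn main() {
    println!("{:?}", countdown(3));                // [3, 2, 1]
    println!("{}", first_power_of_two_above(100)); // 128
}
```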

Functions

The tutorial addressed a few topics related to function definitions:

  • functions and scope
  • closures
  • higher order functions
  • macros

The tricky part is understanding where and when parameter variables need & types or &mut types, or when a variable assignment needs a * preceding the variable.

Two kinds of functions illustrate the distinction. In the first case a function edits a mutable string passed in by reference, without a return statement. In the second case a function populates a return value used to initialize a variable in main().
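The two cases might be sketched as follows, with illustrative names append_signature and make_message. The first edits a caller's String through &mut and returns nothing. The second builds and returns a new String.

```rust
// Case 1: edit a caller's String through a mutable reference; no return value.
fn append_signature(message: &mut String) {
    message.push_str(", sent from Rust");
}

// Case 2: build a new String and return it to initialize a variable in main().
fn make_message(recipient: &str) -> String {
    format!("Hi {}", recipient)
}

fn main() {
    let mut note = make_message("team"); // initialized from a return value
    append_signature(&mut note);         // edited in place, nothing returned
    println!("{}", note);                // Hi team, sent from Rust
}
```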

Because Rust automatically frees each value when the variable that owns it goes out of scope, there is no need to manually deallocate variables (and no garbage collector), which is an important differentiator for Rust. In other words, the scope found within a set of curly brackets { } is independent of what lies outside those brackets. If we initialize a variable in one scope it won’t exist when we exit that scope unless it is explicitly passed out in some fashion.
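As a minimal sketch of scope-based cleanup, a String created inside an inner block is freed when the block ends. The one value passed out survives. The function build_label and its strings are illustrative.

```rust
fn build_label() -> String {
    let label = {
        let prefix = String::from("item"); // allocated inside the inner scope
        let full = format!("{}-{}", prefix, 7);
        full // ownership of `full` moves out of the block
    }; // `prefix` is freed automatically here; no manual deallocation needed
    // println!("{}", prefix); // would not compile: `prefix` no longer exists
    label
}

fn main() {
    let s = build_label();
    println!("{}", s); // item-7
} // `s` goes out of scope here and its heap memory is released
```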

While there are options available for global variables (those defined outside of any specific scope), mutable globals are less safe than locally scoped variables. If you want to use a mutable global (static mut) variable in a local scope, you have to encapsulate that access in an unsafe { } block; immutable globals and constants can be read freely.
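Here is a minimal sketch contrasting an immutable static, which is safe to read anywhere, with a static mut counter. The counter may only be touched inside unsafe. The names GREETING, CALL_COUNT, and record_call are illustrative. The value is copied out rather than borrowed, because newer Rust editions reject references to a static mut.

```rust
// An immutable global: safe to read anywhere.
static GREETING: &str = "hello";

// A mutable global: every read or write must be wrapped in `unsafe`,
// because the compiler cannot rule out data races on it.
static mut CALL_COUNT: u32 = 0;

fn record_call() -> u32 {
    unsafe {
        CALL_COUNT += 1;
        CALL_COUNT // copied out by value rather than borrowed
    }
}

fn main() {
    println!("{}", GREETING);
    record_call();
    let n = record_call();
    println!("called {} times", n); // called 2 times
}
```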

Closures are similar to a Python lambda function in case you are familiar. The tutorial referred to them as “a function within a function”. In other languages they may be referred to as anonymous functions or lambda expressions. We define a closure with vertical bars enclosing its parameters, e.g. |x| x + 1. Unlike ordinary functions, a closure can also use variables captured from its surrounding scope. In other words, closures are functions used in place without taking up the naming space. Closures can have explicit variable types, or the types can be left out and inferred by the compiler. This inference happens at compile time, and once a closure has been called with one type it is fixed to that type.
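The following is a minimal sketch of closures with annotated and inferred types, a closure that captures a surrounding variable, and closures passed in place to iterator and vector methods. The names are illustrative.

```rust
fn closure_demo() -> (i32, i32, i32) {
    let add_one = |x: i32| -> i32 { x + 1 }; // fully annotated closure
    let double = |x| x * 2;                  // types inferred by the compiler
    let offset = 10;
    let shift = |x: i32| x + offset;         // captures `offset` from the surrounding scope
    (add_one(1), double(4), shift(5))
}

fn main() {
    println!("{:?}", closure_demo()); // (2, 8, 15)
    // Closures passed in place, without needing a name.
    let mut evens: Vec<i32> = (1..=10).filter(|n| n % 2 == 0).collect();
    evens.retain(|&n| n > 4);
    println!("{:?}", evens); // [6, 8, 10]
}
```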

There are several more advanced concepts supported in function definitions. Higher order functions refer to cases where a function can take another function (or a closure) as a parameter. A macro is a way to take input expressions and generate code from them at compile time. These are the conventions applied in what we have already seen for print statements via println! and string composition via format! (the trailing ! marks a macro call).
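Here is a minimal sketch of a higher order function that accepts either a named function or a closure, followed by a small declarative macro written with macro_rules!. The names apply_twice, add_three, and sum_all are illustrative.

```rust
// Higher order function: takes another function (or closure) as a parameter.
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

fn add_three(x: i32) -> i32 {
    x + 3
}

// A declarative macro: expands at compile time into `0 + a + b + ...`.
macro_rules! sum_all {
    ($($x:expr),*) => {
        0 $(+ $x)*
    };
}

fn main() {
    println!("{}", apply_twice(add_three, 1));  // 7: named function passed in
    println!("{}", apply_twice(|x| x * 10, 2)); // 200: closure passed in
    println!("{}", sum_all!(1, 2, 3, 4));       // 10
}
```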

Memory Management

Memory management is at the foundation of what makes Rust unique. The simple convention is that each value has exactly one owning variable at a time. An important distinction is present in the transfer of ownership between simple fixed-size data types (like integers, floats, booleans, chars, and arrays of these) and types that own heap memory (like String or Vec). The simple types implement the Copy trait and are transferred to a new variable by copying the original. For heap-owning types, assignment moves ownership of the memory instead of copying it, and the original variable can no longer be used.
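As a minimal sketch of copy versus move semantics, the following also shows clone as the explicit way to duplicate heap data. The function ownership_demo is illustrative.

```rust
fn ownership_demo() -> (i32, i32, usize) {
    // Copy types are duplicated on assignment.
    let a = 5;
    let b = a; // `a` is copied; both remain usable
    // Heap-owning types like Vec (and String) are moved instead.
    let v1 = vec![1, 2, 3];
    let v2 = v1; // ownership moves to v2; v1 can no longer be used
    // println!("{:?}", v1); // would not compile: borrow of moved value `v1`
    let v3 = v2.clone(); // an explicit deep copy when both are needed
    (a, b, v2.len() + v3.len())
}

fn main() {
    println!("{:?}", ownership_demo()); // (5, 5, 6)
    let s1 = String::from("hi");
    let s2 = s1; // a String moves just like a Vec
    println!("{}", s2);
}
```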

Even though only one variable can own a piece of memory at a time, the & symbol can be used to borrow access to another variable’s memory without taking ownership. The borrow ends at the conclusion of the associated scope brackets (or sooner, after the reference’s last use).

For a mutable borrow, the original variable must itself be declared mut, and the reference is written &mut instead of &. (Note that you can’t hold a mutable and an immutable borrow of the same variable at the same time, though any number of immutable borrows may coexist.) Importantly, when you assign through a borrowed reference, you need a * prefix. The point of the * character is that a reference holds the address of the first variable, and * “dereferences” it so that the operation applies to the first variable’s value. (Method calls and printing dereference automatically, which is why * is often unneeded when just reading.)
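The following is a minimal sketch of the borrowing rules above. It shows several simultaneous immutable borrows, then a single mutable borrow that writes through * and ends with its scope. The names total_len and borrow_demo are illustrative.

```rust
// Reading through & references: no * needed for method calls.
fn total_len(a: &str, b: &str) -> usize {
    a.len() + b.len()
}

fn borrow_demo() -> (usize, i32) {
    let s = String::from("borrow");
    let r1 = &s; // any number of immutable borrows at once is fine
    let r2 = &s;
    let len = total_len(r1, r2); // &String coerces to &str

    let mut n = 10;
    {
        let m = &mut n; // one mutable borrow; `n` itself must be `mut`
        *m += 5;        // `*` dereferences to change the value behind the reference
    } // the mutable borrow ends here
    // let r3 = &n; let m2 = &mut n; println!("{}", r3); // would not compile
    (len, n)
}

fn main() {
    println!("{:?}", borrow_demo()); // (12, 15)
}
```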

Conclusion

For further resources related to Rust there are several excellent courses available on Udemy. A particular resource for this writeup was the course Rust lang: the complete beginner’s guide from instructor Catalin Stefan; please consider this a strong endorsement.

There is also a canonical text published by the Rust project that may serve as a more formal reference: The Rust Programming Language by Steve Klabnik and Carol Nichols.

Your best bet for quickly getting started is to take a look at the repl.it platform for browser based code editing where you could experiment with deviations on these examples and compose some of your own.

For these open source projects to work requires a community of developers, each looking to help others out with the understanding that they will need reciprocal help down the road. That help can be by answering questions on Stack Overflow, publishing crates of your own custom modules, or heck even an occasional Medium post here and there. What you give is what you get.

Brokedown Palace — Grateful Dead

References

Stefan C. (2022, June). Rust lang: the complete beginner’s guide. Udemy Course. URL: https://www.udemy.com/course/rustaceans/.

For more essays please check out my Table of Contents, Book Recommendations and Music Recommendations.

Nicholas Teague

Writing for fun and because it helps me organize my thoughts. I also write software to prepare data for machine learning at automunge.com. Consistently unique.