After so many Java cups, I’m starting to feel Rusty (part II)

Miguel Rivero
Published in Clarity AI Tech
11 min read · Jan 29, 2024

It only took me 3 years 😅 to find the time and motivation to write the second part of my first article about my current relationship (in constant evolution) with Java and Rust.

The main reason it took so long is, as usual, lack of time and other priorities (relocating with my family to another country and continent didn’t help), but also not knowing how to focus the article or what its main purpose should be.

After spending some time thinking about it, I decided that if I were a Java developer out there with zero previous contact with Rust, what I would probably find most useful is a clear example of how to translate my Java program to Rust code, so I can get an idea of what it would look like.

In my experience, anything new is always easier to understand if I can associate it with a current reference, something that I already know and understand well.

So let’s see how we could write a simple program in Java, step by step, and at the same time create that same program in Rust, trying to find the closest structures, expressions and idioms in both languages to make the code as comparable as possible.

The Problem to Solve

We’re going to use the New York City Open Data site to get a public CSV file containing the data for the most popular names of newborns in NYC during the years 2011–2019. The file has some errors and needs to be sanitized, but that’s not part of the problem to solve, so let’s assume that the data is correct (you can find the sanitized file in the GitHub repo for this article here).

The file has the following columns:

  • Year of birth
  • Gender
  • Ethnicity
  • Child’s First Name
  • Count
  • Rank

And what we want to find is:

  • First, the list of the most common name (Rank = 1) per year and ethnicity
  • Once we have that list, check if there were any names in the last 5 years (2015–2019) that were the most common for more than one year

Create the project and add dependencies

Let’s create the two project skeletons and add our only dependency: a library to read the CSV file. We’ll use OpenCSV for Java and the csv crate for Rust.

As build tools we’ll use Gradle for Java (a common choice, alongside Maven) and Cargo, the standard one, for Rust. For Gradle, we can generate the project in many ways, but in the end the only important thing is that we end up with a build.gradle file similar to:

plugins {
    id 'java'
}

group = 'org.my'
version = '1.0-SNAPSHOT'

repositories {
    mavenCentral()
}

dependencies {
    implementation 'com.opencsv:opencsv:5.8'
}

test {
    useJUnitPlatform()
}

jar {
    duplicatesStrategy = DuplicatesStrategy.EXCLUDE
    manifest {
        attributes "Main-Class": "org.my.Main"
    }

    from {
        configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
    }
}

And the equivalent Cargo.toml for Rust:

[package]
name = "rust-project"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
csv = "1.3.0"
strum = "0.25"
strum_macros = "0.25"

# Remove symbols to reduce binary size. Resulting file won't contain debugging info
[profile.release]
strip = true

There’s an extra dependency for Rust, the Strum crate, that we’ll use to parse enums from strings (something that we can do out of the box in Java). Notice also the “strip” setting in the release profile, which dramatically reduces the size of our binary at the cost of removing all debugging info from it.

Create an enum to store the Ethnicity

The ethnicity field can only have 4 distinct values, so it makes sense for it to be an enum in both languages. We’ll directly parse the strings with the ethnicity values from the CSV file and convert them into our different enum values.

Our Java enum:

private enum Ethnicity {
    ASIAN_AND_PACIFIC_ISLANDER,
    BLACK_NON_HISPANIC,
    HISPANIC,
    WHITE_NON_HISPANIC
}

And the Rust equivalent:

enum Ethnicity {
    AsianAndPacificIslander,
    BlackNonHispanic,
    Hispanic,
    WhiteNonHispanic
}

As we said, we want to create our enum values directly by parsing the strings in the CSV file. This is something we can do out of the box in Java with the built-in method valueOf(), as long as the string matches the name of the enum constant exactly. In Rust, however, the standard library doesn’t generate this for us, so we’ll use an external library to get the same functionality. We already added the strum crate in the Cargo.toml file, so we can use the macros it offers to specify which string corresponds to each enum value:

use strum_macros::EnumString;

// Clone, Copy, PartialEq, Eq and Hash are needed later on,
// when we use the enum as the key of a HashMap
#[derive(Clone, Copy, PartialEq, Eq, Hash, EnumString)]
enum Ethnicity {
    #[strum(serialize = "ASIAN AND PACIFIC ISLANDER")]
    AsianAndPacificIslander,
    #[strum(serialize = "BLACK NON HISPANIC")]
    BlackNonHispanic,
    #[strum(serialize = "HISPANIC")]
    Hispanic,
    #[strum(serialize = "WHITE NON HISPANIC")]
    WhiteNonHispanic
}

A macro is not something we are used to seeing when coding in Java, but macros are quite common in Rust. They work more or less like Java annotations, but in Rust they’re only evaluated at compile time and never at runtime. A macro generates extra code that extends the functionality we write.
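To get a feeling for the kind of code such a macro can save us from writing, here is a hand-written sketch using only the standard library. It is not the code that strum actually generates, just an assumption of what an equivalent manual implementation of the FromStr trait could look like (the String error type and the extra Debug and PartialEq derives are choices made for this example).

```rust
use std::str::FromStr;

// Debug and PartialEq are derived only so the results can be printed and compared here.
#[derive(Debug, PartialEq)]
enum Ethnicity {
    AsianAndPacificIslander,
    BlackNonHispanic,
    Hispanic,
    WhiteNonHispanic,
}

// Hand-written stand-in for the parsing code a derive macro could generate for us.
impl FromStr for Ethnicity {
    // Assumption for this sketch: a plain String describes the error.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ASIAN AND PACIFIC ISLANDER" => Ok(Ethnicity::AsianAndPacificIslander),
            "BLACK NON HISPANIC" => Ok(Ethnicity::BlackNonHispanic),
            "HISPANIC" => Ok(Ethnicity::Hispanic),
            "WHITE NON HISPANIC" => Ok(Ethnicity::WhiteNonHispanic),
            other => Err(format!("Unknown ethnicity: {other}")),
        }
    }
}

fn main() {
    // Both call styles end up in the same from_str implementation.
    println!("{:?}", Ethnicity::from_str("HISPANIC"));
    println!("{:?}", "MARTIAN".parse::<Ethnicity>());
}
```

With the macro we get the same behaviour by annotating each variant, which is much easier to maintain when the list of values grows.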

Create a Most Popular Baby Name data structure

Java is an object-oriented language by design, so it’s all about classes and objects. Rust, on the other hand, shares some of the concepts but is not object-oriented in the traditional sense, as it focuses on composition instead of inheritance (which is actually considered good practice in modern Java too).

So let’s create the data structure we will use to solve our problem, one that makes our logic easier. We mainly want two things:

  • Store the info of the most popular name for a given ethnicity and year
  • Construct that structure directly from a CSV row

That way we can get our Most Popular Baby Names list with a single sequential read of our CSV file.

In Java we are going to use a record instead of a class, with the 3 attributes we need and a static factory method that creates a new instance given a CSV row split into its different columns (an array of Strings). The factory method knows the format of each CSV line, so it knows how to get the data for each attribute:

private record MostPopularBabyName(String name, Ethnicity ethnicity, int year) {
    public static MostPopularBabyName fromCsvLine(String[] line) {
        String name = line[3];
        String ethnicity = line[2].replace(" ", "_");
        String year = line[0];
        return new MostPopularBabyName(name, Ethnicity.valueOf(ethnicity), Integer.parseInt(year));
    }
}

The closest equivalent in Rust would be:

use csv::StringRecord;
use std::str::FromStr; // brings Ethnicity::from_str into scope

struct MostPopularBabyName {
    name: String,
    ethnicity: Ethnicity,
    year: u16
}

impl MostPopularBabyName {
    pub fn from_csv_line(line: &StringRecord) -> Self {
        let name = &line[3];
        let ethnicity_str = &line[2];
        let ethnicity = Ethnicity::from_str(ethnicity_str).expect("Wrong ethnicity");
        let year = line[0].parse::<u16>().expect("Year is not a correct u16 number");

        let name = String::from(name);
        return MostPopularBabyName{name, ethnicity, year};
    }
}

Let’s focus on the things that look different in order to understand how things work in Rust compared to Java:

First, when we work with numbers Rust forces us to think about their size in memory more than Java does. We need to decide and declare their size at compile time. That’s why we use u16, an unsigned integer of 16 bits (2 bytes), to store our year. It’s similar to choosing short instead of int in Java to store small numbers, but Rust makes you more conscious of the size at the moment you create your variables.
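If you want to check those sizes yourself, a tiny sketch like the following can help. The helper parse_year is a hypothetical name invented for this example; it only shows what happens when a value doesn’t fit in the chosen type.

```rust
use std::mem::size_of;

// Hypothetical helper: parses a year column into a u16,
// returning None when the text is not a number that fits in 16 bits.
fn parse_year(text: &str) -> Option<u16> {
    text.parse::<u16>().ok()
}

fn main() {
    println!("u16 takes {} bytes, max value {}", size_of::<u16>(), u16::MAX);
    println!("u32 takes {} bytes", size_of::<u32>());
    println!("{:?}", parse_year("2019"));
    println!("{:?}", parse_year("70000")); // too large for a u16
}
```

A year fits comfortably in a u16, so there is no need to reserve a bigger integer for it.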

Second, in a way that may feel familiar from C++, Rust separates the data structure from the logic: we define a struct that only holds the data and then, outside the struct, an implementation block (impl) with the methods for that structure. In this case the method is a static factory, so it doesn’t access the internal state of an existing MostPopularBabyName object the way instance methods in Java do, but you could write methods that do so.
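As a minimal sketch of that difference, the example below uses a simplified, made-up struct (BabyName, with an invented sample value) that has both a factory without self and a method that reads the fields through &self.

```rust
// Simplified struct invented for this example: only the data lives here.
struct BabyName {
    name: String,
    year: u16,
}

impl BabyName {
    // No `self` parameter: called as BabyName::new(...), like a Java static factory.
    fn new(name: &str, year: u16) -> Self {
        BabyName { name: String::from(name), year }
    }

    // Takes `&self`: it can read the fields of one particular value,
    // like a Java instance method.
    fn describe(&self) -> String {
        format!("{} ({})", self.name, self.year)
    }
}

fn main() {
    // "Olivia" is a made-up sample value, not a result from the dataset.
    let entry = BabyName::new("Olivia", 2019);
    println!("{}", entry.describe());
}
```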

The third detail to mention is the & symbol before some params and vars. Here is where Java and Rust start to diverge: we need to change how we think about memory. In Java almost everything that is not a primitive type is a reference to an object somewhere in memory. In Rust, if you need references you have to ask for them. In this case, with & we are creating a read-only reference to that value in memory. If we omitted the & symbol we would be asking to transfer the ownership of the value instead. This is part of the core of the Rust language, but it can be complex to understand at the beginning, so for the moment let’s just say that if we want references like in Java we need to remember to make them explicit.
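The difference between borrowing with & and handing over ownership is easier to see in a small, self-contained sketch. The function names and the sample name are made up for this example.

```rust
// Borrows the text: the caller keeps ownership and can keep using its variable.
fn name_length(name: &str) -> usize {
    name.len()
}

// Takes ownership: the caller's String is moved into this function.
fn into_shout(name: String) -> String {
    name.to_uppercase()
}

fn main() {
    let name = String::from("Emma"); // made-up sample value
    let length = name_length(&name); // read-only borrow
    println!("{} has {} letters", name, length); // `name` is still usable here

    let shouted = into_shout(name); // ownership moves into the function
    println!("{}", shouted);
    // println!("{}", name); // would not compile: `name` was moved above
}
```

Uncommenting the last line makes the compiler reject the program, which is exactly the kind of mistake the ownership rules are designed to catch.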

As a last interesting detail, take a look at those calls to .expect(). They play a role loosely comparable to a try-catch block in Java. We’re telling Rust to check the result of the previous operation (parsing an Ethnicity enum from a String, or parsing an integer value from another String) and to expect that the result will be OK. If the result of the operation is an error, the program stops (it panics) and prints the message that we provided as a parameter, plus a backtrace when the RUST_BACKTRACE environment variable is set. If the operation was OK (as expected), it returns the parsed value and continues normally.
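When stopping the program is not what you want, the same result can be inspected with match, which is probably closer to what a Java catch block does. The following is only a sketch: year_or_default is a hypothetical helper and the fallback value is an arbitrary choice.

```rust
// Hypothetical helper: returns the parsed year, or a fallback when the text
// is not a valid u16. The `match` handles the error instead of panicking.
fn year_or_default(text: &str, fallback: u16) -> u16 {
    match text.parse::<u16>() {
        Ok(year) => year,
        Err(error) => {
            eprintln!("Could not parse {:?}: {}", text, error);
            fallback
        }
    }
}

fn main() {
    // expect(): take the Ok value or panic with our message.
    let year: u16 = "2019".parse().expect("Year is not a correct u16 number");
    println!("{}", year);

    // match: recover from the error and keep running.
    println!("{}", year_or_default("not-a-year", 0));
}
```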

Read the file and look for the most popular names

Now that we have our data structure ready, let’s use it to iterate over all the CSV lines and look for the entries we need. We are only interested in those rows where:

  • gender is “FEMALE”
  • rank is “1”
  • year is one of the last 5 years (2015–2019)

For each of those rows we will create a new object MostPopularBabyName with the name, Ethnicity and year of the baby.

Then, we will add that object to a map in which we will store the list of most common names per Ethnicity for the years we are interested in.

The code in Java to read the whole file and create that map of names per Ethnicity is:

Map<Ethnicity, List<MostPopularBabyName>> mostPopularNamesMap = new HashMap<>();

Set<String> lastYears = Set.of("2015", "2016", "2017", "2018", "2019");
try (Reader reader = Files.newBufferedReader(Path.of("Popular_Baby_Names_NYC.csv"))) {
    try (CSVReader csvReader = new CSVReader(reader)) {
        String[] line;
        while ((line = csvReader.readNext()) != null) {
            String gender = line[1];
            String rank = line[5];
            String year = line[0];
            if (gender.equals("FEMALE") && rank.equals("1") && lastYears.contains(year)) {
                var mostPopularName = MostPopularBabyName.fromCsvLine(line);
                mostPopularNamesMap
                    .computeIfAbsent(mostPopularName.ethnicity, (k) -> new ArrayList<>())
                    .add(mostPopularName);
            }
        }
    }
    catch (CsvValidationException e) {
        System.out.println("Couldn't parse lines of CSV file");
    }
}
catch (IOException e) {
    System.out.println("Couldn't open CSV file");
}

And the equivalent code in Rust:

use std::collections::{HashMap, HashSet};

let mut most_popular_names_map: HashMap<Ethnicity, Vec<MostPopularBabyName>> = HashMap::new();

let last_years = HashSet::from(["2015", "2016", "2017", "2018", "2019"]);
let reader = csv::Reader::from_path("Popular_Baby_Names_NYC.csv");
for line in reader.expect("Couldn't open CSV file").records() {
    let line = line.expect("Couldn't parse lines of CSV file");
    let gender = &line[1];
    let rank = &line[5];
    let year = &line[0];
    if gender == "FEMALE" && rank == "1" && last_years.contains(year) {
        let most_popular_name = MostPopularBabyName::from_csv_line(&line);
        most_popular_names_map
            .entry(most_popular_name.ethnicity)
            .or_default()
            .push(most_popular_name);
    }
}

The code is really similar this time. The only things worth remarking are that we are again using read-only references with & to avoid duplicating those values in memory, and the way we add a new name to the list of names in the map:

  • In Java, if that Ethnicity doesn’t have a list of names yet, we use computeIfAbsent() to create a new empty list before adding a new name
  • In Rust, the map entry has a method or_default that will automatically create a new structure (a new vector, the equivalent of List) in case there’s no entry for that Ethnicity

In both cases, after the entry on the map is found (because it already existed or because a new list is created on the fly) the value is appended to the end of the list.
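To try the entry and or_default combination in isolation, here is a small standard-library sketch. The function group_by_key and the placeholder keys are made up for this example; only the HashMap calls are the ones used above.

```rust
use std::collections::HashMap;

// Groups (key, value) pairs into a map of lists, creating each list on first use.
fn group_by_key(pairs: &[(&str, u16)]) -> HashMap<String, Vec<u16>> {
    let mut grouped: HashMap<String, Vec<u16>> = HashMap::new();
    for &(key, value) in pairs {
        grouped
            .entry(key.to_string()) // look up the slot for this key
            .or_default() // insert an empty Vec if the key is new
            .push(value); // append to the (possibly new) Vec
    }
    grouped
}

fn main() {
    // "a" and "b" are placeholder keys, not values from the CSV file.
    let grouped = group_by_key(&[("a", 2015), ("b", 2016), ("a", 2017)]);
    println!("{:?}", grouped.get("a"));
}
```

The Java counterpart of those three chained calls is the computeIfAbsent(...).add(...) pair shown earlier.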

Find popular names for more than one year

Finally, the last part of our exercise consists of finding those names that were the most popular for an Ethnicity for more than one year between 2015 and 2019.

To do so we’re going to iterate over each entry (Ethnicity) of our map and check for repeated names for that Ethnicity. In case we find them, we will print the years for each name.

This would be the code in Java to do it:

for (Map.Entry<Ethnicity, List<MostPopularBabyName>> entry : mostPopularNamesMap.entrySet()) {
    System.out.println("\nChecking repeated names for ethnicity: " + entry.getKey());
    var yearsPerNameMap = new HashMap<String, List<Integer>>();
    for (var mostPopularName : entry.getValue()) {
        yearsPerNameMap
            .computeIfAbsent(mostPopularName.name, (k) -> new ArrayList<>())
            .add(mostPopularName.year);
    }
    for (Map.Entry<String, List<Integer>> nameEntry : yearsPerNameMap.entrySet()) {
        var name = nameEntry.getKey();
        var years = nameEntry.getValue();
        if (years.size() > 1) {
            System.out.println("The name " + name + " was the most common in more than one year. Years: " + years);
        }
    }
}

And the equivalent in Rust:

for (ethnicity, most_popular_names_list) in most_popular_names_map.iter() {
    println!("\nChecking repeated names for ethnicity {:?}: ", ethnicity);
    let mut years_per_name_map: HashMap<&String, Vec<&u16>> = HashMap::new();
    for most_popular_name in most_popular_names_list {
        years_per_name_map
            .entry(&most_popular_name.name)
            .or_default()
            .push(&most_popular_name.year);
    }

    for (name, years) in years_per_name_map {
        if years.len() > 1 {
            println!("The name {} was the most common in more than one year. Years: {:?}", name, years);
        }
    }
}

As in the previous section, we create a new map to help us. In this case it has an entry per name with the list of years in which that name was the most popular, so if that list has more than one year we print the name and its years, solving the problem we defined at the beginning of this article.

Besides the different idioms to do the same things in each language (the way to iterate over the key+value entries of a map, for instance), maybe the most interesting points to explain about the previous Rust code are:

  • The mut keyword: In Rust every variable is immutable by default, but in this case we want to add entries to our map, so we need to declare it as mutable
  • The weird {:?} inside the println! macro argument: This is roughly the equivalent of a default toString() method for any given object, so println! can print a debug representation of any data structure. For this to work in practice we need to add the derive(Debug) macro to our enum, so Rust will automatically examine its structure and generate the code that shows its contents as a String.

#[derive(Debug)] // Required to automatically format the value of this enum with println!
enum Ethnicity {
    // ... same variants as before
}
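Both points can be tried out in a few lines. This is a reduced sketch with only two variants and a made-up describe function, so it is not the code from the repo, but it shows mut and the Debug formatting working together.

```rust
// Reduced version of the enum, only for this example.
#[derive(Debug)]
enum Ethnicity {
    Hispanic,
    WhiteNonHispanic,
}

// Hypothetical helper: builds a line of text using Debug formatting for both arguments.
fn describe(ethnicity: &Ethnicity, years: &[u16]) -> String {
    format!("{:?} -> {:?}", ethnicity, years)
}

fn main() {
    // `mut` is required because we push into the vector after creating it.
    let mut years = Vec::new();
    years.push(2015);
    years.push(2016);

    println!("{}", describe(&Ethnicity::Hispanic, &years));
    // {:#?} is the multi-line "pretty" variant of the same Debug output.
    println!("{:#?}", Ethnicity::WhiteNonHispanic);
}
```

Removing the derive(Debug) line makes the compiler complain that Ethnicity cannot be formatted with {:?}, and removing mut makes it reject the calls to push.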

Wrapping Up

And that’s all! Of course learning Rust properly takes way more than reading a couple of articles and tutorials, but I hope that if you consider yourself a Java developer like I do, you will find this article interesting enough to give Rust a try and start reading more about it! If you want to start learning Rust I really recommend this O’Reilly book.

I didn’t include the solution to the problem on purpose (which names are the most common? Are there actually any repeated names?), so if you’re interested in the answer you will have to get the code from this GitHub repo and run it, and also check whether the Java and Rust implementations really give the same results or not ;)

And also, aren’t you curious to know which of the two implementations is faster or consumes fewer resources?

Again, you will have to do some compiling yourself to find out ^^

Bonus Track

The main purpose of this article was to illustrate the basic differences between the two languages, so one of the goals was to use as few features as possible to keep the code simple and easy to follow.

If you still want further comparisons between Java and Rust you can find an alternative functional Java solution to the problem that uses streams and collectors (thanks Jose Ignacio Dominguez for it!) and then find here the corresponding solution that exemplifies how to use a similar approach in Rust (with iterators and mappers).
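To give a taste of that style without leaving this page, here is a small sketch of the "names repeated in more than one year" step written with iterator adapters. It is not the linked solution: the function repeated_names and the sample data are made up for this example, and it only uses the standard library.

```rust
use std::collections::HashMap;

// Returns the names that appear in more than one (name, year) pair,
// together with their years, sorted by name so the result is stable.
fn repeated_names(entries: &[(&str, u16)]) -> Vec<(String, Vec<u16>)> {
    // fold: accumulate the years of each name into a map.
    let years_per_name = entries.iter().fold(
        HashMap::<&str, Vec<u16>>::new(),
        |mut map, &(name, year)| {
            map.entry(name).or_default().push(year);
            map
        },
    );

    // filter + map + collect: keep only the names with more than one year.
    let mut repeated: Vec<(String, Vec<u16>)> = years_per_name
        .into_iter()
        .filter(|(_, years)| years.len() > 1)
        .map(|(name, years)| (name.to_string(), years))
        .collect();
    repeated.sort();
    repeated
}

fn main() {
    // Made-up sample data, not taken from the NYC file.
    let sample = [("Ana", 2015), ("Bea", 2016), ("Ana", 2017)];
    for (name, years) in repeated_names(&sample) {
        println!("{} -> {:?}", name, years);
    }
}
```

The chain of filter, map and collect reads very much like a Java stream pipeline, which makes this style a comfortable next step for a Java developer.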
