Exploring layered configuration and concurrency with Rust

Andrejus Chaliapinas
10 min read · Jan 29, 2023


How did I get here?

Until recently, most of the projects I was involved in had their backend code written in C++, Java, or Go, with a growing emphasis on moving to the microservices/Kubernetes ecosystem and its patterns. But one never stops exploring alternatives, and so about a month ago I started to wonder what could be done in that regard with the Rust programming language and tooling (which had appeared in a few posts within my IT feed of interest). In particular, how easy or hard would it be to produce a final non-fat executable, statically linked to eliminate any dependencies on shared libraries, targeting several 64-bit architectures (x86_64 and aarch64 in my case)? And so my Rust adventure started, along with my return to writing technical articles.

Source code

The source code can be downloaded/cloned from: https://github.com/andrejusc/rust-example-config-concurrency

Setup of my environment

For quite a while, on Intel/AMD based laptops (hoping folks can adapt the content below for Macs with non-Intel CPUs as well), I have preferred to run various virtual machines (VMs) under VirtualBox virtualization. Such VMs can easily be adjusted in terms of virtual memory, CPUs, shared folders, etc., and cloned/updated/destroyed as needed. I took a similar approach when starting to experiment with Rust, which got me to this configuration:

  • VM running Ubuntu 22.04.1 LTS with the Desktop option (Xubuntu distro) under VirtualBox 7.0.2 on a Windows 11 Pro host with an Intel i7 64-bit processor. I assigned it 2 CPUs, 8 GB RAM, and a dynamic 100 GB disk.
  • Inside the VM: Rust toolchain with rustc 1.67.0 (fc594f156 2023-01-24)
  • Inside the VM: Visual Studio Code IDE v1.74.2

Objectives to explore with Rust

From previous deep dives into and development of Go based microservices, along with various aspects of 12 factor applications, I wanted to explore the following objectives here without yet producing a fully workable microservice or a Docker image for it (I may describe that aspect in a follow-up article):

  • How to achieve layered configuration via YAML files, such that there is some base configuration plus another one that complements/overrides some values in the base?
  • How to support the concept of different runtime environment roles, where a role means Dev (for development), QA (for testing), or any other from some defined DevOps infrastructure, so that only a subset of configuration values differs from the base to suit each role's specifics?
  • How to externalize configuration, so it lives outside the executable and can be plugged in at startup?
  • How to have easily adjustable structured logging in JSON format, whose output could feed into log analysis tooling?
  • How to make, for example, up to 10 concurrent asynchronous REST API GET calls to some https:// endpoint(s), with a maximum calling capacity of 5 at any given point in time (so as not to overwhelm either client or server)?

Project code structure

Before going further, below is the project's tree structure. Rust cargo was initialized within the service folder, and the env folder was placed outside the src folder (see more in the Layered configuration part below):

$ tree -a -I 'target|\.git'
.
├── .github
│   └── workflows
│       ├── rust.yml
│       └── slack-notify.yml
├── .gitignore
├── LICENSE
├── README.md
└── service
    ├── build.rs
    ├── Cargo.lock
    ├── Cargo.toml
    ├── env
    │   ├── logging-dev.yml
    │   ├── logging.yml
    │   ├── service-dev.yml
    │   └── service.yml
    └── src
        ├── custom_tracing_layer.rs
        └── main.rs

Final list of dependencies in the Cargo.toml file

Jumping ahead to show the final state of the dependencies used inside the Cargo.toml file, so you can see the big picture first (and perhaps spot entries already familiar from other projects):

[dependencies]
async-mutex = "1.4.0"
chrono = { version = "0.4.23", features = ["serde"] }
config = "0.13.3"
native-tls = { version = "0.2.11", features = ["vendored"] }
reqwest = { version = "0.11.14", features = ["json"] }
serde_json = "1.0.91"
tokio = { version = "1.24.2", features = ["macros", "rt-multi-thread"] }
tracing = "0.1.37"
tracing-subscriber = "0.3.16"

[build-dependencies]
copy_to_output = "2.0.0"
glob = "0.3.1"

Layered configuration

From the above Cargo.toml file you can see that the config dependency is included. It allows multiple sources of configuration to be layered in a way similar to this:

/// Makes merged main config. Will panic if ENVROLE env variable is not set.
fn make_config() -> Config {
    let env_role = env::var("ENVROLE").unwrap();
    Config::builder()
        .add_source(File::new("env/service", FileFormat::Yaml))
        .add_source(File::new(("env/service-".to_string() + &env_role).as_str(), FileFormat::Yaml))
        .build()
        .unwrap()
}

where you can see that the base configuration comes from the file env/service.yml and is then merged with the one corresponding to the runtime environment's role supplied via ENVROLE. As an example, such a file is provided in the form of service-dev.yml.
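For illustration, here is roughly what such a pair of files could contain. The concurrency and timeout keys are the ones read later in main.rs; the values shown are hypothetical:

# env/service.yml (base; values here are hypothetical)
concurrency: 5
timeout: 30

# env/service-dev.yml (dev role override; values here are hypothetical)
timeout: 10

With ENVROLE=dev, the merged config keeps concurrency from the base file and takes the overridden timeout from service-dev.yml, since later sources win during the merge.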

The same approach is applied to the logging configuration as well, with the base configuration coming from the file env/logging.yml and the complementary/override one selected via ENVROLE, here the logging-dev.yml file:

/// Makes merged logging config. Will panic if ENVROLE env variable is not set.
fn make_log_config() -> Config {
    let env_role = env::var("ENVROLE").unwrap();
    Config::builder()
        .add_source(File::new("env/logging", FileFormat::Yaml))
        .add_source(File::new(("env/logging-".to_string() + &env_role).as_str(), FileFormat::Yaml))
        .build()
        .unwrap()
}

Making external configuration work for local runs

To build the project code I use this cargo build command:

RUSTFLAGS='-C target-feature=+crt-static' cargo build --release --target x86_64-unknown-linux-gnu

Because the env folder is outside the src folder, its content will not be copied to the target build folder target/x86_64-unknown-linux-gnu/release/ during cargo build. To overcome that, I added the build-time dependency copy_to_output and made it run on each build inside the build.rs file.
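A minimal sketch of what that build.rs could look like, assuming the copy_to_output crate's documented copy_to_output(path, build_type) helper (the actual file is in the repository):

use copy_to_output::copy_to_output;
use std::env;

fn main() {
    // Re-run this script whenever anything under env/ changes.
    println!("cargo:rerun-if-changed=env/*");
    // Copy the env folder next to the built executable,
    // e.g. into target/x86_64-unknown-linux-gnu/release/.
    copy_to_output("env", &env::var("PROFILE").unwrap())
        .expect("Could not copy env folder to output");
}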

Structured logging in JSON

For structured logging in JSON format I adopted the approach described by Bryan Burgers in his two articles and saved all of that functionality into the custom_tracing_layer.rs file.
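In outline, that approach implements tracing-subscriber's Layer trait and emits one JSON line per event. A minimal skeleton, with the event-handling body elided (the full version, including a JsonVisitor for the event's fields, is in custom_tracing_layer.rs):

use tracing_subscriber::Layer;

pub struct CustomTracingLayer;

impl<S: tracing::Subscriber> Layer<S> for CustomTracingLayer {
    fn on_event(
        &self,
        _event: &tracing::Event<'_>,
        _ctx: tracing_subscriber::layer::Context<'_, S>,
    ) {
        // Collect the event's fields and print them as one JSON line
        // (the actual body is shown further below).
    }
}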

I also added a small function to it to format date/time in the UTC timezone as ISO 8601:

use chrono::{DateTime, Utc};

fn iso8601(st: &std::time::SystemTime) -> String {
    let dt: DateTime<Utc> = (*st).into();
    // formats like "2023-01-27T20:28:50.004726747Z"
    format!("{dt:?}")
}

The resulting string is then added via the timestamp JSON key to the original on_event(<omitted parameters>) function inside the custom_tracing_layer.rs file like this:

// Convert the values into a JSON object
let mut fields = BTreeMap::new();
let mut visitor = JsonVisitor(&mut fields);
event.record(&mut visitor);

// Output the event in JSON
let output = serde_json::json!({
    "timestamp": iso8601(&SystemTime::now()),
    "level": event.metadata().level().as_str(),
    "target": event.metadata().target(),
    "name": event.metadata().name(),
    "fields": fields,
});
// println!("{}", serde_json::to_string_pretty(&output).unwrap());
println!("{}", serde_json::to_string(&output).unwrap());

To control the logging verbosity level at which the provided code runs, I specified this inside the base logging.yml file:

level: info

to correspond to the INFO level and, to see what difference the DEBUG level makes, specified this inside logging-dev.yml:

level: debug

And within the main.rs code, tracing is initialized with the obtained logging level:

let log_config = make_log_config();
// Enables spans and events with levels `INFO` and below:
// let level_filter = LevelFilter::INFO;
let level_str = log_config.get_string("level").unwrap();
let level_filter: LevelFilter = match FromStr::from_str(&level_str) {
    Ok(filter) => filter,
    Err(error) => {
        panic!("Problem parsing log level: {error:?}, supplied level: {level_str}");
    }
};
// Set up how `tracing-subscriber` will deal with tracing data.
tracing_subscriber::registry().with(
    CustomTracingLayer.with_filter(level_filter)
        .with_filter(filter::filter_fn(|metadata| {
            metadata.target().eq("service")
        }))).init();

A few lines of code in the main.rs file exist to demonstrate whether and how the configured logging actually works:

info!("Main entry point...");
warn!("warning here");
debug!("debug here");

Having set ENVROLE in a terminal on the VM like this:

$ export ENVROLE=dev

and then running the built service executable, part of the output should look like this:

{"fields":{"message":"Main entry point…"},"level":"INFO","name":"event src/main.rs:83","target":"service","timestamp":"2023–01–27T21:33:46.036251767Z"}
{"fields":{"message":"warning here"},"level":"WARN","name":"event src/main.rs:84","target":"service","timestamp":"2023–01–27T21:33:46.036261313Z"}
{"fields":{"message":"debug here"},"level":"DEBUG","name":"event src/main.rs:85","target":"service","timestamp":"2023–01–27T21:33:46.036300410Z"}

I find such logging output quite exciting to utilize in future real projects, knowing that more fields can be added easily.

Concurrency and mutexes

My initial reading about and hands-on experience with Rust's asynchronous support made me select the Tokio runtime (you can see the tokio dependency in the Cargo.toml file above). And thus having the main function defined as async:

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
<body omitted>
}

gives the ability to deal with various asynchronous aspects right away. So, to try some concurrent REST calls, I selected the reqwest dependency and the Swagger Petstore API getPetById endpoint with id 1, so a request would hit: https://petstore.swagger.io/v2/pet/1
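In isolation, a single such call with reqwest looks roughly like this (a minimal sketch inside an async fn that returns a Result, separate from the article's final code):

// `?` propagates any reqwest::Error to the caller.
let resp = reqwest::get("https://petstore.swagger.io/v2/pet/1").await?;
// e.g. prints: Status: 200 OK
println!("Status: {}", resp.status());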

Also, as you remember, I stated at the beginning that I need a statically linked executable in the end, so for the additionally involved dependency I enabled the vendored feature for native-tls support in the Cargo.toml file:

native-tls = { version = "0.2.11", features = ["vendored"] }

To mimic a total of 10 requests to make, I placed them into an array:

// Mock possible list of queries to perform
let queries: [&str; 10] = [
    "https://petstore.swagger.io/v2/pet/1",
    "https://petstore.swagger.io/v2/pet/1",
    <other repetitive entries omitted>
];

with the intent to supply such a list in the future via some file/stream with additional parameters/headers/tokens/etc.

For ease of handling tasks spawned with tokio, I selected JoinSet:

let mut set = JoinSet::new();

and decided to keep track of the current concurrency level via a single variable, which then needs to be protected with an asynchronous mutex, since each new spawn increases that variable's value and each completion decreases it:

let concur_max = config.get_int("concurrency").unwrap();
let data = Arc::new(async_mutex::Mutex::new(0));

As for the client, which will make the REST calls, a user_agent was set (as some APIs may otherwise return an unexpected status when it is not present):

// Name your user agent after your app
static APP_USER_AGENT: &str = concat!(
    env!("CARGO_PKG_NAME"),
    "/",
    env!("CARGO_PKG_VERSION"),
);

let client = reqwest::Client::builder()
    .user_agent(APP_USER_AGENT)
    // .connection_verbose(true)
    .build()?;

While initially troubleshooting my code in progress, I found it quite useful to enable the commented-out connection_verbose(true) line above; just don't forget to then set the logging level to trace in either the base or the environment-specific logging-*.yml file.

After that, the main invocation loop looks like this, i.e. it checks whether any previously spawned task has completed within the desired timeout and then decreases the counter (I decided to make it panic on timeout for now):

let mut i: u32 = 0;
let timeout = time::Duration::from_secs(config.get_int("timeout").unwrap().try_into().unwrap());
for query in queries {
    i += 1;
    info!("Request id: {}, Making request to: {}", i, query);
    {
        // Lock to be gone when data will go out of scope
        let mut data = data.try_lock().unwrap();
        if *data >= concur_max {
            match tokio::time::timeout(timeout, set.join_next()).await {
                Err(why) => panic!("! {why:?}"),
                Ok(opt) => {
                    let call_result: CallResult = opt.unwrap().unwrap();
                    let resp = call_result.resp;
                    info!("Request id: {}, Status: {:#?}", call_result.opid, resp.status());
                    *data -= 1;
                    info!("join1 concur_current: {}", *data);
                },
            }
        }
    }
    let fut = background_op(i, client.clone(), query);
    set.spawn(fut);
    {
        let mut data = data.try_lock().unwrap();
        *data += 1;
        info!("concur_current: {}", *data);
    }
}

with a final loop waiting for all spawned tasks to complete, like this:

// Allow every spawned task to complete without timeout constraint
while let Some(res) = set.join_next().await {
    let call_result = res.unwrap();
    let resp = call_result.resp;
    info!("Request id: {}, Status: {:#?}", call_result.opid, resp.status());
    {
        let mut data = data.try_lock().unwrap();
        *data -= 1;
        info!("join2 concur_current: {}", *data);
    }
}

In the above code snippets I use either "join1" or "join2" as part of the logging output, so I can later analyze when each one occurred.

And to complete the picture of my experiment, here is the function background_op(<omitted parameters>) called above, which adds extra delays (20 secs for request #4 and 15 secs for request #6) into 2 invocations to demonstrate the impact:

struct CallResult {
    opid: u32,
    resp: Response
}

/// Wrapper for underlying reqwest client call to have operation id assigned.
async fn background_op(id: u32, client: reqwest::Client, query: &str) -> CallResult {
    let call_result = CallResult {
        opid: id,
        resp: client.get(query).send().await.unwrap(),
    };
    // Those delays below are to prove that concurrency works in expected fashion
    if id == 4 {
        let twenty_secs = time::Duration::from_secs(20);
        info!("Request id: {}, Sleep 20 secs", id);
        tokio::time::sleep(twenty_secs).await;
    }
    if id == 6 {
        let fifteen_secs = time::Duration::from_secs(15);
        info!("Request id: {}, Sleep 15 secs", id);
        tokio::time::sleep(fifteen_secs).await;
    }
    call_result
}

Linting and running code, observing outcome

Before trying to build and run the code, it's always best practice to lint it first. Here that's done with cargo clippy:

$ RUSTFLAGS='-C target-feature=+crt-static' cargo clippy --release --target x86_64-unknown-linux-gnu

It should not produce any errors; once it is clean, cargo run the code in a similar way:

$ RUSTFLAGS='-C target-feature=+crt-static' cargo run --release --target x86_64-unknown-linux-gnu

In the logging output you will initially see up to 6 spawned tasks:

{"fields":{"message":"Request id: 1, Before calling to: https://petstore.swagger.io/v2/pet/1"},"level":"INFO","name":"event src/main.rs:138","target":"service","timestamp":"2023–01–29T15:19:43.391607261Z"}
{"fields":{"message":"concur_current: 1"},"level":"INFO","name":"event src/main.rs:160","target":"service","timestamp":"2023–01–29T15:19:43.391662265Z"}
{"fields":{"message":"Request id: 2, Before calling to: https://petstore.swagger.io/v2/pet/1"},"level":"INFO","name":"event src/main.rs:138","target":"service","timestamp":"2023–01–29T15:19:43.391667526Z"}
{"fields":{"message":"concur_current: 2"},"level":"INFO","name":"event src/main.rs:160","target":"service","timestamp":"2023–01–29T15:19:43.391670661Z"}
{"fields":{"message":"Request id: 3, Before calling to: https://petstore.swagger.io/v2/pet/1"},"level":"INFO","name":"event src/main.rs:138","target":"service","timestamp":"2023–01–29T15:19:43.391672537Z"}
{"fields":{"message":"concur_current: 3"},"level":"INFO","name":"event src/main.rs:160","target":"service","timestamp":"2023–01–29T15:19:43.391676427Z"}
{"fields":{"message":"Request id: 4, Before calling to: https://petstore.swagger.io/v2/pet/1"},"level":"INFO","name":"event src/main.rs:138","target":"service","timestamp":"2023–01–29T15:19:43.391678527Z"}
{"fields":{"message":"concur_current: 4"},"level":"INFO","name":"event src/main.rs:160","target":"service","timestamp":"2023–01–29T15:19:43.391680884Z"}
{"fields":{"message":"Request id: 5, Before calling to: https://petstore.swagger.io/v2/pet/1"},"level":"INFO","name":"event src/main.rs:138","target":"service","timestamp":"2023–01–29T15:19:43.391682592Z"}
{"fields":{"message":"concur_current: 5"},"level":"INFO","name":"event src/main.rs:160","target":"service","timestamp":"2023–01–29T15:19:43.391685031Z"}
{"fields":{"message":"Request id: 6, Before calling to: https://petstore.swagger.io/v2/pet/1"},"level":"INFO","name":"event src/main.rs:138","target":"service","timestamp":"2023–01–29T15:19:43.391686959Z"}

but once concur_current reaches 5, the loop waits until one of the already spawned tasks completes and only then progresses further. Completion can be observed in output containing the "join1" part:

{"fields":{"message":"Request id: 2, Status: 200"},"level":"INFO","name":"event src/main.rs:148","target":"service","timestamp":"2023–01–29T15:19:43.839642699Z"}
{"fields":{"message":"join1 concur_current: 4"},"level":"INFO","name":"event src/main.rs:150","target":"service","timestamp":"2023–01–29T15:19:43.839850424Z"}

And after all tasks have been spawned, the code in the last loop awaits them all to complete. Those completions have the "join2" part:

{"fields":{"message":"Request id: 10, Status: 200"},"level":"INFO","name":"event src/main.rs:168","target":"service","timestamp":"2023–01–29T15:19:44.060471331Z"}
{"fields":{"message":"join2 concur_current: 2"},"level":"INFO","name":"event src/main.rs:172","target":"service","timestamp":"2023–01–29T15:19:44.060856547Z"}
{"fields":{"message":"Request id: 6, Status: 200"},"level":"INFO","name":"event src/main.rs:168","target":"service","timestamp":"2023–01–29T15:19:58.945672861Z"}
{"fields":{"message":"join2 concur_current: 1"},"level":"INFO","name":"event src/main.rs:172","target":"service","timestamp":"2023–01–29T15:19:58.945742575Z"}
{"fields":{"message":"Request id: 4, Status: 200"},"level":"INFO","name":"event src/main.rs:168","target":"service","timestamp":"2023–01–29T15:20:03.849763114Z"}
{"fields":{"message":"join2 concur_current: 0"},"level":"INFO","name":"event src/main.rs:172","target":"service","timestamp":"2023–01–29T15:20:03.849831899Z"}

In the above you can also see that request #6 completed after all other requests but before request #4, which is exactly what the introduced delays were meant to demonstrate.

Executable file size and static linking check

Checking the executable file size:

$ ls -l target/x86_64-unknown-linux-gnu/release/service
-rwxrwxr-x 2 user user 13735312 Jan 29 15:19 target/x86_64-unknown-linux-gnu/release/service

reveals that the size is about 13.7 MB, which is good enough to place inside some Docker image later.

And a check of whether it is statically built:

$ file target/x86_64-unknown-linux-gnu/release/service
target/x86_64-unknown-linux-gnu/release/service: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), static-pie linked, BuildID[sha1]=46b2fc034555d117fb1b42fcfcdd16559190a5b0, for GNU/Linux 3.2.0, with debug_info, not stripped

as well as:

$ ldd target/x86_64-unknown-linux-gnu/release/service
statically linked

which proves it’s built as expected.

Short conclusion and future thinking

Based on this short exploration of Rust, I like the outcome it produces and the available support for real-life asynchronous use cases.

For the config dependency used here, I have already filed an enhancement request for the capability to print which effective config value came from which source.

As the above activity was also a chance for me to explore GitHub Actions in relation to Rust, under the .github/workflows folder I have 2 YAML files that automate caching of compilation/linting/building and build notifications to a Slack channel; a sketch of the build workflow follows below.
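Purely as an illustration, and not the repository's actual workflow (the step names, cache paths, and action versions here are assumptions), such a rust.yml could look roughly like this:

# Hypothetical sketch; see .github/workflows/rust.yml for the real one.
name: Rust
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: service
    steps:
      - uses: actions/checkout@v3
      # Cache the cargo registry and build artifacts between runs
      - uses: actions/cache@v3
        with:
          path: |
            ~/.cargo/registry
            service/target
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
      - name: Lint
        run: RUSTFLAGS='-C target-feature=+crt-static' cargo clippy --release --target x86_64-unknown-linux-gnu
      - name: Build
        run: RUSTFLAGS='-C target-feature=+crt-static' cargo build --release --target x86_64-unknown-linux-gnu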
