Part 7.3: Parallel Processing in Google Workspace File Operations

2 min read · May 4, 2024

Suppose you’ve retrieved a large list of files from Google Workspace as described in “Part 7.2: Read the contents of Google Workspace” and need more specific information about these files. You might initially consider iterating over the list and fetching the required details for each file in turn. This works, but every request waits for the previous one to finish, so the total time grows with the number of files.

For example, from the DataFrame created in Part 7.2:

// last statement from Part 7.2
...
let df: DataFrame = DataFrame::new(
    normalizer
        .indexes
        .iter()
        .map(|(name, &index)| Series::new(name, &normalizer.normalized[index]))
        .collect(),
)?;

Option 1: Sequential Iteration

Using fields such as “name, id, version, driveId, trashed”, you could sequentially process each file (as always, the code examples do not contain clean error handling):

let file_params = &[
    ("fields", "name,id,version,driveId,trashed"),
];

use std::time::Instant;
let start = Instant::now();
for i in 0..df.shape().0 {
    let id = df["id"].get(i)?.get_str().expect("your error message here")
           .to_string();
    let mut a_file = GoggleDrive::file(id, &file_params);
    let _file = a_file
        .load(imp_bearer.token().await)
        .await
        .files();
}
let duration = start.elapsed();
println!("The SERIAL operation took: {:?}", duration);

In a test case with 76 files, this sequential access took approximately 23 seconds.

Option 2: Parallel Execution

To decrease execution time, consider making requests for file information in parallel. Iterate over the DataFrame, launching a separate Tokio task for each file:

let start = Instant::now();
let tbearer = TaskedBearer::from(imp_bearer).await;

let handles: Vec<_> = (0..df.shape().0)
    .map(|i| {
        // Create a task-local handle from the tasked bearer so it can be
        // moved into the Tokio task; all tasks still share a single bearer.
        let local_bearer = tbearer.new();
        let id = df["id"].get(i).unwrap().get_str().unwrap().to_string();
        tokio::spawn(async move {
            let token = local_bearer.token();
            let mut a_file = GoggleDrive::file(id, &file_params);
            let _file = a_file.load(&token).await.files();
        })
    })
    .collect();
for handle in handles {
    let _val = handle.await?;
}
let duration = start.elapsed();
println!("The PARALLEL operation took: {:?}", duration);

In the same scenario, this method took only about 0.485 seconds, a significant improvement.
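
The spawn-then-join pattern used above can be tried without network access or credentials. The following is only a minimal sketch under stated assumptions: it uses OS threads from the standard library as a stand-in for Tokio tasks, and `fetch_version` is a hypothetical placeholder that sleeps instead of calling Google Drive. Unlike the snippets above, it returns each result and collects them in the original order.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one metadata request: sleeps to simulate
/// network latency, then returns a made-up value derived from the ID.
fn fetch_version(id: &str) -> usize {
    thread::sleep(Duration::from_millis(50));
    id.len()
}

/// Fetch one ID after another; total time is roughly the sum of all latencies.
fn fetch_serial(ids: &[String]) -> Vec<usize> {
    ids.iter().map(|id| fetch_version(id)).collect()
}

/// Start one thread per ID, then join the handles in the original order,
/// so the results line up with `ids`.
fn fetch_parallel(ids: &[String]) -> Vec<usize> {
    // Collecting the handles first ensures every thread is started
    // before the first join blocks.
    let handles: Vec<_> = ids
        .iter()
        .cloned() // each thread needs its own owned copy of the ID
        .map(|id| thread::spawn(move || fetch_version(&id)))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let ids: Vec<String> = (0..8).map(|i| format!("file-{}", i)).collect();

    let start = Instant::now();
    let serial = fetch_serial(&ids);
    println!("serial:   {:?}", start.elapsed());

    let start = Instant::now();
    let parallel = fetch_parallel(&ids);
    println!("parallel: {:?}", start.elapsed());

    // Both strategies must produce the same results in the same order.
    assert_eq!(serial, parallel);
}
```

The intermediate `collect()` matters: chaining the spawn and the join in one lazy iterator would start and join each worker one at a time, which is sequential execution again. The same holds for the Tokio version, where all handles are collected before the first one is awaited.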

Conclusion

This example demonstrates that parallelization can drastically reduce execution time: here from about 23 seconds to about 0.485 seconds, a factor of approximately 47. The additional code is minimal, mainly the creation of Tokio tasks.
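
The arithmetic behind these figures can be checked with a few lines. This is just a sketch using the numbers quoted in this article (76 files, about 23 s sequential, about 0.485 s parallel); the helper names `speedup` and `avg_per_file` are made up for illustration.

```rust
/// Speed-up factor: how many times faster the parallel run was.
fn speedup(serial_secs: f64, parallel_secs: f64) -> f64 {
    serial_secs / parallel_secs
}

/// Average wall-clock time per file in the sequential run.
fn avg_per_file(total_secs: f64, files: u32) -> f64 {
    total_secs / f64::from(files)
}

fn main() {
    // Figures quoted in the text: 76 files, ~23 s serial, ~0.485 s parallel.
    println!("speed-up: {:.1}x", speedup(23.0, 0.485)); // prints "speed-up: 47.4x"
    println!("serial average: {:.2} s per file", avg_per_file(23.0, 76));
}
```

The sequential average of roughly 0.3 seconds per file is in the same range as the total parallel time, which suggests that in this test the requests overlapped almost completely.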

Written by Alfred Weirich