I like having a difficult task to solve. One of those is my endeavor to truly learn a low-level programming language. My goal is to write really fast and safe APIs that supply data for visualizations on dashboards and in reports. I want to gain a fundamental understanding of how the language works and of its concepts and philosophy, rather than relying on trial and error with lots of googling. Sometimes it just makes sense to take your time.
As a Data Engineer and trained empirical economist, I have plenty of experience with higher-level languages, primarily Python and the typical frameworks in this line of work (pyspark, numpy and pandas). While the abstractions a higher-level language provides enable fast implementation of business logic, programs can be rather slow at runtime. Abstractions come with overhead, such as imports that are loaded with every program regardless of its purpose, and that overhead slows everything down. Additionally, I hope that really knowing a low-level language will help me better understand how many of the tools I use daily, like interpreters, databases and APIs, function.
While C and C++ have been around for a long time and are widely used, they did not seem appealing to me. I was never able to locate a good reference, and I do not know of something like a package manager for them (I am sure those exist though). One of the awesome things about Python to me is the ability to add functionality with just a pip install command. This seems even more valuable when code is harder to write, which is kind of the case in low-level languages. I first read about Rust on Hacker News a while back and immediately found it interesting. The idea of combining a low-level language, with all its safety features and speed, with a pip-like package manager (called Cargo in Rust) made intuitive sense to me. The fact that a nonprofit organization like Mozilla leads the development of the compiler makes sense to me as well. This should not be in the hands of a single giant corporation, as it is with Go. I quickly found the awesome learning materials like "the book", called "The Rust Programming Language", which is available for free online. It is very well written and I was able to work through all of the examples within a few weeks. "Rust by Example" is great as well.
Since I have gone through the process of learning a programming language before, I know that the best way to approach it is to start with something simple and then slowly increase the complexity of your programs. Thus I decided to start out with a small command line tool. Rust is a great choice for writing command line tools since it works across platforms and is blazingly fast. My idea was to create a file age tracker. In corporate environments you often have legal requirements that make it necessary to delete documents after a certain amount of time. I find it tedious to keep track of the age of all of my documents through file names or annotations within the files, so I decided to automate the process. That's what digitalization is all about after all.
The file age tracker should:
- scan a folder of files and files within subfolders
- list all the files that are above a specifiable age threshold
- offer an option to delete those files
- offer an option to flag the files in the filename for review
In Rust you usually separate your code into multiple files to increase its clarity and reusability. Because mine is a very small application, I just kept everything in one file. First come the imports, structs and methods, then the main function, followed by the helper functions. Let's walk through the code together, shall we?
My program starts with several imports from the Rust standard library (everything with "std") and from external sources, which are called "crates" in Rust. SystemTime and Duration are pre-defined types, as are Path and PathBuf. fs and io are necessary to perform file system operations. I use the external crate walkdir to read all the files in all the subfolders of the folder provided as a parameter. The standard fs function doesn't do that: for a subfolder it just returns another entry that is classified as a folder. timeago lets you display Durations in a readable format in the command line. For example, if you let the application scan your files and one is a year old and the other three days old, timeago will take the corresponding durations and format them correctly for display. I use regex to check and match command line input. clap is used to parse parameters, and prettytable makes it easy to display a table in the command line by providing an HTML-table-style interface.
If you are an experienced Rust developer reading this, please be gentle, because I am not sure if this code is the "Rust way" to do things. Whenever an operation can fail in Rust, the Result type is used. It is an enum that contains either an Ok wrapper for the desired data or an Err wrapper for an error value. All the file system read operations return Result types. This of course makes sense because reads can go wrong, especially when the files sit on a server reached over a network. If you read file-specific Metadata, though, individual pieces of metadata such as the creation time are in turn wrapped in another Result, which I don't quite get if the struct they are contained in is already wrapped. Because the Rust compiler forces you to deal with errors, you end up with chained calls and repeated use of unwrap() (unwrap returns the content of Ok and panics on Err). Maybe it would be more in line with Rust philosophy to propagate the possible error and make the method return a Result as well?
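A minimal sketch of what propagating the error could look like, as opposed to calling unwrap(). The function name `file_age` is hypothetical and not taken from the project. It also hints at why the creation time has its own Result: according to the standard library documentation, `created()` returns an Err on platforms or filesystems that do not record a creation time.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::Duration;

/// Hypothetical helper: how long ago was the file at `path` created?
/// Each `?` returns early with the error instead of panicking.
fn file_age(path: &Path) -> io::Result<Duration> {
    // Fails if the file does not exist or cannot be read.
    let metadata = fs::metadata(path)?;
    // Fails where the platform or filesystem has no creation time.
    let created = metadata.created()?;
    // elapsed() fails if the timestamp lies in the future;
    // convert that error so the function has a single error type.
    created
        .elapsed()
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))
}
```

The caller then decides what to do with a failure, for example skipping the file and printing a warning, instead of the whole program crashing on one unreadable file.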
Structs are similar to class definitions in Python. They enable you to group data together and write specific functions for that data (called methods). Since this application deals with files, my struct is called File and contains the file-specific metadata that I need: the path (which also contains the file name), the file's creation date, and the time since creation, calculated with elapsed(), a standard library method on the SystemTime type.
The first method initializes a new struct. The others are “getter-methods” that return the contents of the struct as a string for display.
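Roughly, such a struct with a constructor and getter methods might look like the sketch below. The field and method names are my assumptions for illustration; the real ones are in the repository.

```rust
use std::path::PathBuf;
use std::time::{Duration, SystemTime};

// Field names are assumed for this sketch.
struct File {
    path: PathBuf,       // full path, including the file name
    created: SystemTime, // creation timestamp
    age: Duration,       // time elapsed since creation
}

impl File {
    /// Builds a File and computes its age from the creation time.
    fn new(path: PathBuf, created: SystemTime) -> File {
        // If the timestamp lies in the future, fall back to an age of zero.
        let age = created.elapsed().unwrap_or(Duration::from_secs(0));
        File { path, created, age }
    }

    /// Getter that returns the path as a String for display.
    fn path_string(&self) -> String {
        self.path.display().to_string()
    }

    /// Getter that returns the age in whole seconds as a String.
    fn age_secs_string(&self) -> String {
        format!("{} s", self.age.as_secs())
    }
}
```

Unlike a Python class, the data (the struct) and the behavior (the impl block) are declared separately.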
Let's move on to the main function. Similar to C, the main function in Rust is special: it is the entry point of every program. In this first part of my main function, command line parameters are parsed using the wonderful crate clap, which automatically generates a --help flag. First the user has to provide a path to the folder containing the files we want to check for their age. The "if let" syntax in line 65 is a shorthand for the awesome match functionality that Rust provides. Some() is part of the Option type, which Rust uses instead of NULL values. Option is an enum that either wraps data in Some() or is None if no data is present. Here, if a path is provided, we extract the data from the Some wrapper, turn it into a Path and then call the function read_folder_content with it. The function returns all the files in the folder and its subfolders in the form of the File struct defined above.
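To illustrate the relationship between match and "if let", here is a small sketch with two hypothetical functions that behave identically. The `Option<&str>` argument stands in for whatever the argument parser hands back; it is not clap's actual API.

```rust
use std::path::Path;

/// Full match: every variant of Option must be handled.
fn describe_with_match(arg: Option<&str>) -> String {
    match arg {
        Some(raw) => format!("scanning {}", Path::new(raw).display()),
        None => String::from("no path given"),
    }
}

/// if let: shorthand when only one variant is really of interest.
fn describe_with_if_let(arg: Option<&str>) -> String {
    if let Some(raw) = arg {
        format!("scanning {}", Path::new(raw).display())
    } else {
        String::from("no path given")
    }
}
```

Both return "scanning docs" for `Some("docs")` and "no path given" for `None`. The "if let" form is mostly a readability choice when the None case needs little or no handling.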
We also pass a cut-off time as a parameter through clap. If I pass one year as a parameter, the entries will be filtered to contain only files that are a year old or older. The time parameter can be given in years, days or hours. The regular expressions are used to check the format of the parameter and then execute the code that calculates the comparison parameter accordingly.
The comparison parameter is then passed to a filter_files function that also takes the entries vector containing all the files. It returns all the files whose age is equal to or greater than the parameter provided. These are then printed to the command line as a table. The user can then choose to delete the files or flag them in the file name to review them later. The functions for deleting and flagging files are defined below my main function.
The function read_folder_content gets called in main to read the metadata of the files in the folder provided and pass it into my File struct. walker holds an iterable WalkDir struct whose entries carry the file metadata and come with methods to access it. This data is passed into my custom File struct. All the file information is returned in a vector.
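To show what walkdir saves you from writing, here is a sketch of the same recursive traversal using only the standard library. This is not the project's code: it swaps walkdir for a hand-written recursion over fs::read_dir, and the name `collect_files` is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects the paths of all files below `dir` into `found`.
fn collect_files(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // DirEntry::file_type does not follow symlinks, so a symlink
        // pointing back up the tree cannot cause endless recursion.
        if entry.file_type()?.is_dir() {
            collect_files(&path, found)?;
        } else {
            found.push(path);
        }
    }
    Ok(())
}
```

read_dir only lists the direct children of a folder, which is why the function has to call itself for every subfolder it meets.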
Then filter_files is called in main to filter the result of read_folder_content with the generated comparison parameter. This is memory efficient because the function takes ownership of the original file list: the list goes out of scope at the end of the function and its memory is freed.
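A minimal sketch of such an ownership-taking filter, assuming a simplified File struct with just a name and an age (the real struct has different fields):

```rust
use std::time::Duration;

// Simplified stand-in for the File struct, for illustration only.
struct File {
    name: String,
    age: Duration,
}

/// Takes ownership of `entries` and returns only the files that are
/// at least as old as `cutoff`.
fn filter_files(entries: Vec<File>, cutoff: Duration) -> Vec<File> {
    entries
        .into_iter() // consumes the vector, so elements are moved, not copied
        .filter(|file| file.age >= cutoff)
        .collect()
}
```

Because `entries` is passed by value, the caller can no longer use the original vector after the call. The compiler enforces this, and the files that do not pass the filter are dropped as the iterator runs.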
The function delete_files may be called if the user decides to erase the filtered files. It relies on the standard library function remove_file.
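As a sketch of how such a function could be built around fs::remove_file: the signature below, including the returned list of failures, is my assumption and not necessarily how the project does it.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Tries to delete every path and returns the ones that failed,
/// together with the error, instead of stopping at the first problem.
fn delete_files(paths: &[PathBuf]) -> Vec<(PathBuf, io::Error)> {
    let mut failures = Vec::new();
    for path in paths {
        if let Err(error) = fs::remove_file(path) {
            failures.push((path.clone(), error));
        }
    }
    failures
}
```

Collecting the failures means one locked or already deleted file does not prevent the rest from being removed.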
flag_files uses another standard library function to add “_flagged” to the file name if the user chooses to do so.
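One way this could be sketched is with fs::rename. Both the choice of fs::rename and the decision to place "_flagged" before the file extension are assumptions on my part, and the function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Builds the new path, e.g. "docs/report.pdf" -> "docs/report_flagged.pdf".
fn flagged_path(path: &Path) -> PathBuf {
    let stem = path
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_default();
    let new_name = match path.extension() {
        Some(ext) => format!("{}_flagged.{}", stem, ext.to_string_lossy()),
        None => format!("{}_flagged", stem),
    };
    path.with_file_name(new_name)
}

/// Renames the file on disk and returns its new path.
fn flag_file(path: &Path) -> io::Result<PathBuf> {
    let target = flagged_path(path);
    fs::rename(path, &target)?;
    Ok(target)
}
```

Keeping the path calculation separate from the rename makes the naming rule easy to check without touching the disk.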
parse_time_param parses the provided time parameter into a u64. It is called in main to calculate the comparison parameter. If the user passes -t 2y, it identifies the number in this string (here "2") and parses it into a u64. Then, in main, it is multiplied by the number of seconds contained in the time unit provided. For example, a day has 86,400 seconds (24 hours of 3,600 seconds each), so if a user provides the parameter "5d" the comparison parameter is calculated as 5 * 86,400 = 432,000.
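The arithmetic can be sketched without regex as follows. The function name `cutoff_seconds`, the unit letter "h" for hours, and treating a year as 365 days are assumptions for this illustration; the project itself uses regular expressions and splits the work between parse_time_param and main.

```rust
const SECS_PER_HOUR: u64 = 3_600;
const SECS_PER_DAY: u64 = 24 * SECS_PER_HOUR; // 86,400
const SECS_PER_YEAR: u64 = 365 * SECS_PER_DAY; // ignores leap years

/// Turns a parameter like "5d" into a number of seconds.
/// Returns None for malformed input or on overflow.
fn cutoff_seconds(param: &str) -> Option<u64> {
    let param = param.trim();
    // The last character is the unit, everything before it the count.
    let unit = param.chars().last()?;
    let number = &param[..param.len() - unit.len_utf8()];
    let count: u64 = number.parse().ok()?;
    let secs_per_unit = match unit {
        'h' => SECS_PER_HOUR,
        'd' => SECS_PER_DAY,
        'y' => SECS_PER_YEAR,
        _ => return None,
    };
    count.checked_mul(secs_per_unit)
}
```

With these definitions, "5d" yields 432,000 seconds and "2y" yields 63,072,000 seconds, while inputs such as "d" or "5w" are rejected with None.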
This was fun. So far I really enjoy Rust. My next learning project will involve interaction with a database so stay tuned.
Please feel free to leave a comment and fork the code on GitHub: https://github.com/marvintherain/file_age_tracker