Learning Rust pt. 4 — Binary Data, DateTimes, and UTF-16

Matthew Seyer
Dec 24, 2016

It’s time to parse some USN records! This continues my journey into learning Rust to create DFIR tools. We are looking at my first project: RustyUsn. In the previous post I created two data structures. The UsnRecordV2 struct holds a USN record, and the UsnConnection struct holds our file handle, our position in the file, and the size of the USN file. We then added a function get_next_record() to the UsnConnection struct. Now it’s time to parse some data. get_next_record() should return the next record in the USN journal until there are no more records left.

let mut cnt = 0;
while let Ok(record) = usn_connection.get_next_record() {
    println!("USN structure {}: {:#?}", cnt, record);
    cnt += 1;
}

To do this, get_next_record() runs a loop. Each pass through the loop looks for a record, parses it, and returns it. self in this function refers to the UsnConnection struct that the function is called on, much like a method in an object-oriented language.

// function for getting a record
pub fn get_next_record(&mut self) -> Result<UsnRecordV2, Error> {
    loop {
        ...
    }
}

First we check that the offset is not at or past the end of the file. If it is, we return an End of File error.

// Check that our offset is not past the end of file
if self._offset >= self._size {
    return Err(Error::new(ErrorKind::Other, "End of File"))
}

We then want to seek to the offset and make sure there is no error:

// Seek to offset
let soffset = match self.filehandle.seek(SeekFrom::Start(self._offset)) {
    Ok(soffset) => soffset,
    Err(error) => return Err(error)
};
if soffset > self._size {
    return Err(Error::new(ErrorKind::Other, "End of File"))
}
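
As a side note, the match that hands the error straight back to the caller is what the `?` operator does for you. Here is a minimal sketch of the same seek-and-check step written that way. The free function `seek_checked` and its `size` parameter are made up for illustration (they stand in for the method and self._size), and an in-memory Cursor stands in for the file handle.

```rust
use std::io::{self, Cursor, Error, ErrorKind, Seek, SeekFrom};

// Hypothetical helper: `size` plays the role of self._size in the post.
fn seek_checked<S: Seek>(handle: &mut S, offset: u64, size: u64) -> io::Result<u64> {
    // `?` returns early with the error, like the explicit match arm does.
    let soffset = handle.seek(SeekFrom::Start(offset))?;
    if soffset > size {
        return Err(Error::new(ErrorKind::Other, "End of File"));
    }
    Ok(soffset)
}

fn main() {
    // A 32-byte in-memory "file" standing in for the USN file.
    let mut handle = Cursor::new(vec![0u8; 32]);
    println!("{:?}", seek_checked(&mut handle, 16, 32));
    // Seeking past the end succeeds on a Cursor, so the size check catches it.
    println!("{:?}", seek_checked(&mut handle, 64, 32));
}
```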

We then initialize the USN struct with:

// init record struct
let mut record: UsnRecordV2 = unsafe {
    mem::zeroed()
};

I use mem::zeroed() to set the structure’s memory to zero so I know all the values start at zero. I don’t know if I actually need this or not. We’ll roll with it for now.
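
One possible alternative, sketched here as an assumption rather than what RustyUsn does: mem::zeroed() is only sound when all-zero bytes are a valid value for every field, and a String field is not guaranteed to meet that. Deriving Default gives the same "everything starts at zero" effect without unsafe. The struct below is a hypothetical, trimmed-down stand-in for UsnRecordV2 that only reuses a few field names from this post.

```rust
// Hypothetical, trimmed-down stand-in for UsnRecordV2 (not the real struct).
#[derive(Debug, Default)]
struct UsnRecordSketch {
    record_length: u32,
    major_version: u16,
    minor_version: u16,
    file_name: String,
}

fn main() {
    // Numeric fields start at 0 and the String starts empty; no unsafe needed.
    let record = UsnRecordSketch::default();
    println!("{:#?}", record);
}
```

This only works when every field type implements Default, so a struct with a date-time field may need a hand-written impl instead.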

Now I need to read the record length and check that it is valid. Right now I am only checking that it is nonzero. If it is zero, I update the offset and start the loop over.

record.record_length = self.filehandle.read_u32::<LittleEndian>().unwrap();
// Do some record checks first
if record.record_length == 0 {
    self._offset += 8;
    continue;
}
// TODO: Add additional checks here
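
To see how the pieces so far fit together (end-of-file check, seek, read the length, skip zeros, loop again), here is a small self-contained sketch. It is not the RustyUsn code: `LengthScanner` and `next_record_span` are hypothetical names, an in-memory Cursor replaces the file, and it only returns where each record starts and how long it is. It also assumes the length field covers the whole record, so the next record starts at offset + length.

```rust
use std::io::{self, Cursor, Error, ErrorKind, Read, Seek, SeekFrom};

// Hypothetical stand-in for UsnConnection, backed by an in-memory buffer.
struct LengthScanner {
    data: Cursor<Vec<u8>>,
    offset: u64,
    size: u64,
}

impl LengthScanner {
    fn new(bytes: Vec<u8>) -> Self {
        let size = bytes.len() as u64;
        LengthScanner { data: Cursor::new(bytes), offset: 0, size }
    }

    // Return the start offset and length of the next record with a nonzero length.
    fn next_record_span(&mut self) -> io::Result<(u64, u32)> {
        loop {
            if self.offset >= self.size {
                return Err(Error::new(ErrorKind::Other, "End of File"));
            }
            self.data.seek(SeekFrom::Start(self.offset))?;
            let mut len_bytes = [0u8; 4];
            self.data.read_exact(&mut len_bytes)?;
            let record_length = u32::from_le_bytes(len_bytes);
            if record_length == 0 {
                // Zero padding: step forward 8 bytes, as the post does, and retry.
                self.offset += 8;
                continue;
            }
            let start = self.offset;
            // Assumption: the length field covers the whole record.
            self.offset += u64::from(record_length);
            return Ok((start, record_length));
        }
    }
}

fn main() {
    // 8 bytes of zero padding, then a 16-byte "record" that starts with its own length.
    let mut bytes = vec![0u8; 8];
    bytes.extend_from_slice(&16u32.to_le_bytes());
    bytes.extend_from_slice(&[0xAA; 12]);

    let mut scanner = LengthScanner::new(bytes);
    while let Ok((start, len)) = scanner.next_record_span() {
        println!("record at offset {} with length {}", start, len);
    }
}
```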

If all is good I want to continue parsing. I will parse the next 28 bytes:

record.major_version = self.filehandle.read_u16::<LittleEndian>().unwrap();
record.minor_version = self.filehandle.read_u16::<LittleEndian>().unwrap();
record.file_reference_number = self.filehandle.read_u64::<LittleEndian>().unwrap();
record.parent_file_reference_number = self.filehandle.read_u64::<LittleEndian>().unwrap();
record.usn = self.filehandle.read_u64::<LittleEndian>().unwrap();
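
The read_u16::<LittleEndian>() style calls above look like the byteorder crate’s API. If you want to see what they boil down to, here is a rough std-only sketch: read a fixed number of bytes, then interpret them as a little-endian integer. The helpers `read_u16_le` and `read_u64_le` are made up for this example, and the field values are invented test data.

```rust
use std::io::{self, Cursor, Read};

// Hypothetical std-only helpers standing in for read_u16/read_u64 with LittleEndian.
fn read_u16_le<R: Read>(reader: &mut R) -> io::Result<u16> {
    let mut buf = [0u8; 2];
    reader.read_exact(&mut buf)?;
    Ok(u16::from_le_bytes(buf))
}

fn read_u64_le<R: Read>(reader: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn main() -> io::Result<()> {
    // Build 28 bytes laid out like the five fields above, with made-up values.
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&2u16.to_le_bytes());
    bytes.extend_from_slice(&0u16.to_le_bytes());
    bytes.extend_from_slice(&0x1111u64.to_le_bytes());
    bytes.extend_from_slice(&0x2222u64.to_le_bytes());
    bytes.extend_from_slice(&0x3333u64.to_le_bytes());

    let mut reader = Cursor::new(bytes);
    let major_version = read_u16_le(&mut reader)?;
    let minor_version = read_u16_le(&mut reader)?;
    let file_reference_number = read_u64_le(&mut reader)?;
    let parent_file_reference_number = read_u64_le(&mut reader)?;
    let usn = read_u64_le(&mut reader)?;

    println!(
        "v{}.{} frn={:#x} parent={:#x} usn={:#x}",
        major_version, minor_version, file_reference_number, parent_file_reference_number, usn
    );
    // 2 + 2 + 8 + 8 + 8 = 28 bytes consumed.
    assert_eq!(reader.position(), 28);
    Ok(())
}
```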

This looks horrible with Medium’s code blocks and I appear to have no way to make it look better… but Medium does allow gists. I guess I can try that.

… Bad news. That doesn’t make it any easier. Guess I will resort to screenshots for the rest of this blog.

So… we parse the next 28 bytes. Very simple.

Now we get to our first date time value. Let me tell you, it’s easy! But I think it took me weeks to figure out. Here’s how I did it:

We create a date-time value (built from a NaiveDate) set to the Windows epoch (1601-01-01). We then read the raw 64-bit timestamp, which counts 100-nanosecond intervals since that epoch. We convert it to microseconds, then add the microseconds as a Duration to the epoch value. That’s it! Very simple.
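
Since the screenshot doesn’t carry over here, this is a rough std-only sketch of the same arithmetic, not the code from RustyUsn. Instead of adding a Duration to a date-time at the 1601 epoch, it converts the raw value to microseconds since the Unix epoch, which most date libraries can take from there. The function name `filetime_to_unix_micros` is made up; the one constant it relies on is the 11,644,473,600 seconds between 1601-01-01 and 1970-01-01.

```rust
/// Microseconds between 1601-01-01 and 1970-01-01 (11,644,473,600 seconds).
const EPOCH_DIFF_MICROS: i64 = 11_644_473_600 * 1_000_000;

/// Hypothetical helper: convert a raw Windows timestamp (100-nanosecond ticks
/// since 1601-01-01 UTC) to microseconds since the Unix epoch.
fn filetime_to_unix_micros(ticks: u64) -> i64 {
    // 10 ticks of 100 ns make one microsecond.
    let micros_since_1601 = (ticks / 10) as i64;
    micros_since_1601 - EPOCH_DIFF_MICROS
}

fn main() {
    // The Unix epoch expressed in Windows ticks maps back to zero.
    let unix_epoch_ticks: u64 = 116_444_736_000_000_000;
    println!("{}", filetime_to_unix_micros(unix_epoch_ticks));
    // A raw value of zero is the Windows epoch itself, so the result is negative.
    println!("{}", filetime_to_unix_micros(0));
}
```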

Now we parse the next set of fields:

Now we get to the next fun part: UTF-16. Many Windows structures have it. How do you parse it in Rust? Here is one way.

UTF16 to String

The first thing I did was look up how I could get UTF-16 into a String. I found this function and thought it would work: https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf16. But this means that first we have to get our wide chars into a vector.

We create a vector of unsigned 8-bit bytes, file_name_length bytes in size and filled with zeros, then we read into the buffer.

// zero-filled byte buffer sized to the file name
let mut buff_name = vec![0u8; record.file_name_length as usize];
// fill the whole buffer, or return the I/O error
self.filehandle.read_exact(&mut buff_name[..])?;

Then we create our wide char buffer by viewing the byte buffer as a slice of two-byte values. Once we have the wide char buffer, we can call from_utf16() on it.

// create a utf-16 buffer from the byte buffer
let wchar_buff: &[u16] = unsafe {
    // slice into 2 byte pieces
    slice::from_raw_parts(
        buff_name.as_ptr() as *const u16,
        buff_name.len() / 2
    )
};
// set record file_name
record.file_name = String::from_utf16(wchar_buff).unwrap();
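
The pointer cast works on my machine, but it quietly assumes the byte buffer is 2-byte aligned and that the host is little-endian, and neither is guaranteed. Here is a sketch of a safe alternative that builds the u16 values explicitly. The helper name `utf16le_to_string` is made up for this example, and a trailing odd byte is simply ignored.

```rust
use std::string::FromUtf16Error;

// Hypothetical helper: decode UTF-16LE bytes without unsafe pointer casts.
fn utf16le_to_string(bytes: &[u8]) -> Result<String, FromUtf16Error> {
    let units: Vec<u16> = bytes
        .chunks_exact(2) // pairs of bytes; a trailing odd byte is dropped
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units)
}

fn main() {
    // "a.txt" encoded as UTF-16LE.
    let bytes: [u8; 10] = [0x61, 0x00, 0x2E, 0x00, 0x74, 0x00, 0x78, 0x00, 0x74, 0x00];
    println!("{}", utf16le_to_string(&bytes).unwrap());
}
```

Returning the Result instead of calling unwrap() lets the caller decide what to do with a file name that is not valid UTF-16.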

That’s the last field to parse. Now all we have to do is set the offset to the end of the record, and return the record we just parsed!

That’s it! We are now iterating over USN journal entries in Rust.

I really feel like there are still better ways of doing some of this. My plan is to keep adding to this project and refining it until I am happy with it, get some feedback from the Rust community, then use this project as a go-to reference to help me frame all the other tools I want to make in Rust.
