Taking Rust for a Ride to Azeroth: What writing an AH Scanner in Rust taught me

Marco Amann
Digital Frontiers — Das Blog
15 min read · Jan 21, 2022

As a side project, I set out to develop a game client for a popular MMORPG. Since the project would require a lot of low-level byte wrangling, I chose Rust as a comparably low-level language. In this article, I describe the problems I faced related to Rust and give you an overview of the project.

This is the short(ish) story of the project. I published a lengthy post with much more technical depth in my private blog, so if you are interested in the problems I faced with the project itself, go ahead and read the extended version of this blog post afterwards.
Although I really love Rust for its safe and fast code, I focus mostly on the negative parts of the language and the ecosystem in this article.

These are the key takeaways of this article:

  • Stay away from low-level libraries if you are not aware of how deep the rabbit hole goes.
  • Evaluate the library ecosystem of Rust thoroughly before using it for a project that requires somewhat exotic functionality.
  • Rust forces you to care about the details, regardless of whether you want to.

What and Why

While playing some MMORPG with a few friends on a private server, we wondered whether we could increase our turnover of the in-game currency if only we had insight into the market structure. Out of this thought, a new side project was born: an auction house scanner written in Rust, built to learn a few crates, practice network debugging, read some C++, and see what problems have to be faced when developing a more serious Rust program.

The conceptual idea

Let’s start by discussing an overview of the plan, shown in the image above. For legal reasons, I did not develop the client against the most famous MMORPG on the planet but used a server emulator as my reference. Within that server, there exists an Auction House where players can post and bid on auctions. The plan was to write an application that would periodically scan these auctions and dump them in a Postgres database. From there, analytics should be run using Apache Superset and Grafana.

The reference implementation

There exists an Open Source project that simulates a game server for the official clients of the MMORPG in question. Given that all network communication is implemented in that project, I could use the C++ sources as a reference implementation to build the client. With a bit of luck, the client would also work with the hosted private server we were playing on. More on that at the end.

The game server project is a massive code base, even if we exclude the accompanying database. At nearly 800,000 lines of C++ code, navigating the code base quickly became challenging; tools like fd, ag, and to some extent VS Code helped a lot. To run the server, it is advised to set up a dedicated virtual machine, since it is quite picky about the versions of some dependencies like ACE, MySQL, and OpenSSL, and it easily pollutes your machine with 40 GB of data.

Getting the code to connect

The only thing I knew about the protocol when starting off was that it was some custom binary protocol on top of TCP. So, to handle the network connections, I decided to learn a new Rust crate: mio. Mio is absolute overkill for this project, but since I love to over-engineer side projects, I thought this might be the perfect opportunity to learn it. Spoiler: in hindsight, this was a mistake.

Complexities regarding mio

Mio is centered around a user-controlled event loop, powered by epoll on Linux. The library serves as the network layer underneath the higher-level Tokio runtime. To provide the best possible performance, mio lets your event loop register interest in certain events, e.g. the availability of new data to be read, and only wakes your loop once new events are available. Unfortunately, this also requires you to handle all available data when you are woken: anything you leave unread will not trigger another wake-up, so it will not be handed to you in the next iteration of your loop. Once again, I realized that it is hard to write low-quality code in Rust, which can be a problem if your goal is to produce a prototype as quickly as possible.
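
To make this concrete, here is a minimal sketch of a mio read loop (mio 0.8 API; the address and the buffer handling are illustrative, not the project's actual code):

use mio::net::TcpStream;
use mio::{Events, Interest, Poll, Token};
use std::io::Read;
use std::net::SocketAddr;

const CLIENT: Token = Token(0);

fn main() -> std::io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);

    // Illustrative address; the real address is delivered at runtime.
    let addr: SocketAddr = "127.0.0.1:3724".parse().unwrap();
    let mut stream = TcpStream::connect(addr)?;
    poll.registry()
        .register(&mut stream, CLIENT, Interest::READABLE)?;

    loop {
        poll.poll(&mut events, None)?;
        for event in events.iter() {
            if event.token() == CLIENT && event.is_readable() {
                // Drain everything that is currently readable: mio will not
                // wake us again for data we leave in the socket buffer.
                let mut buf = [0u8; 4096];
                loop {
                    match stream.read(&mut buf) {
                        Ok(0) => return Ok(()), // connection closed
                        Ok(n) => {
                            // hand &buf[..n] to the packet parser
                            let _ = &buf[..n];
                        }
                        Err(ref e) if e.kind() == std::io::ErrorKind::WouldBlock => break,
                        Err(e) => return Err(e),
                    }
                }
            }
        }
    }
}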

Key takeaway: Stay away from low-level libraries if you are not aware of how deep the rabbit hole goes.

If you are interested in mio, I picked it up again to write about it in this post.

In the end, I managed to get mio to work in more than 200 lines; the same could probably have been done in 10 lines using the std::net::TcpStream already included in Rust's standard library. The many additional hours surely were worth the increased efficiency at 3 packets per second.
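
For comparison, a blocking version using only the standard library really does fit in a handful of lines (again just a sketch with an illustrative address):

use std::io::Read;
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:3724")?;
    let mut buf = [0u8; 4096];
    loop {
        let n = stream.read(&mut buf)?;
        if n == 0 {
            break; // server closed the connection
        }
        // hand &buf[..n] to the packet parser
    }
    Ok(())
}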

The binary protocol

Now that the client can connect to the server, it was time to take a closer look at the actual login procedure. There are actually two servers at play, a login server and a game server, which split the login process between them. This allows a bunch of independent login servers to control access to a dynamic set of game servers. To enable scalability, the address of the game server is delivered by the login server, requiring us to make new TCP connections on the fly. This turned out to be a bit tricky to achieve from within the mio event loop, but that was exclusively my fault.

The login flow

The login procedure is as follows:

  • The client announces its version, OS, locale, and so on, including the account name.
  • The server and client exchange some nonces and start an authentication protocol.
  • Using a modified version of SRP6, a shared session_key is computed.
  • The client disconnects.

That was all for the login server. Using knowledge of the session_key, the client can now authenticate with the Game Server using a challenge-response scheme.

Until this last step, the whole communication happens unencrypted, allowing us to spy on it using Wireshark.

Wireshark even supports decoding the logon packets

Message Format and Decoding in Rust with the Bytes crate

The wire protocol is quite simple, making the parsing logic comparably simple. Each packet consists of a header and a payload. The header contains the size of the payload as well as an opcode that defines the type of the packet.

Parsing such a protocol in a naive way (that’s what I did) is quite easy: you read the first two bytes (the size), then continue reading until you have consumed the opcode plus the number of bytes encoded in size. If your available data runs out, you buffer what you have and wait until the next iteration of the loop has more data available.

The Bytes crate provides useful utility methods to read values and advance the internal read pointer. This makes reading them easy:

let size   = bytes.get_u16_le(); 
let opcode = bytes.get_u16_le();

Of course, you need to pay attention not to advance past the end of your available data. The Bytes crate simply panics in such a case. This is good, since it does not read some more or less random value that happens to be next in memory. Unfortunately, Rust is not a silver bullet here: you still have to take care of this yourself.
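
A minimal sketch of how to avoid that panic, assuming the same two-field header as above (a simplification, not the project's actual parser):

use bytes::{Buf, BytesMut};

// Only consume the header once all four header bytes have arrived; otherwise
// leave the buffer untouched and wait for the next read event.
fn try_read_header(buf: &mut BytesMut) -> Option<(u16, u16)> {
    if buf.remaining() < 4 {
        return None;
    }
    let size = buf.get_u16_le();
    let opcode = buf.get_u16_le();
    Some((size, opcode))
}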

Starting with message 3 in the image above, the protocol has a surprise in store: the headers are encrypted. This is done by the Game Server to deter reverse engineering of the protocol and to prevent cheating by modifying the network traffic. Although I initially thought it was ridiculous to encrypt only the header, it turns out to be quite effective for the use case: you cannot reliably decipher the individual protocol messages if you cannot separate them from one another. In combination with nasty obfuscation within the packets themselves (bit-shifting based on GUIDs), patching packets directly on the network is extremely hard.

So how does encryption work in the protocol? The aforementioned session_key is used to derive a set of sub-keys that in turn key two RC4 instances, one for each communication direction. Since the session_key was created from unique seeds and is authenticated by the login server simply by saving it in a shared bit of storage, no further authentication messages are required. This makes for a much simpler protocol than TLS. Interestingly, newer versions of the game actually use TLS. Unfortunately, we are stuck with the old version and need to implement the protocol on our own.

Having fun with cryptography in Rust

The Rust ecosystem provides several cryptography libraries, some of them unmaintained, some with overlapping functionality. Since the only one that supplied everything I needed in a single crate was last updated in 2016, I opted for a combination of openssl and ring. Having used ring before, I was amused by how clearly it marks that some hash functions are outdated:

pub static SHA1_FOR_LEGACY_USE_ONLY: Algorithm

Unfortunately, RC4 is not included in ring, so I had to use openssl for it, which breaks release builds for me: the release profile enables optimizations that use special CPU instructions not available on my server.

Key takeaway: Evaluate the library ecosystem of Rust thoroughly before using it for a project that requires somewhat antiquated cryptography.

After investing a considerable amount of time in selecting the “correct” cryptography crate, implementing the header encryption turned out to be quite simple.
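
To give an impression, here is a rough sketch of the header decryption using the openssl crate's RC4 support. The derivation of the per-direction keys from the session_key is omitted, so treat this as an outline rather than the actual implementation:

use openssl::symm::{Cipher, Crypter, Mode};

// One RC4 instance per direction; `key` would be derived from the session_key
// (derivation omitted in this sketch).
fn make_rc4(key: &[u8]) -> Crypter {
    Crypter::new(Cipher::rc4(), Mode::Decrypt, key, None).expect("RC4 init failed")
}

// RC4 is a stream cipher, so decrypting the few header bytes is a single update.
fn decrypt_header(rc4: &mut Crypter, encrypted: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; encrypted.len() + Cipher::rc4().block_size()];
    let n = rc4.update(encrypted, &mut out).expect("RC4 update failed");
    out.truncate(n);
    out
}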

Having implemented SRP myself, I now understand why cryptographers get angry when people roll their own crypto: the number of errors I ran into while implementing a simple and well-defined protocol is mind-boggling, even though I had the benefit of being able to test the “correctness” of my implementation against the reference server, and the stakes of failure were low. Here, the strict language made me yell at the computer several times but saved me from doing stupid things even more often. For the implementation of SRP, some of the expressiveness of Rust (but also its generous use of syntax) really showed. An example is the following snippet, which interleaves two message digests:

// `interleave` is provided by the itertools crate
let mut K = ds0
    .iter()
    .interleave(ds1.iter())
    .map(|i| *i)
    .collect::<Vec<_>>();

Yes, the empty type hint is needed but please don’t ask why.

Further, I want to thank the people behind the num-bigint crate, made for working with large numbers: converting big integers to byte slices and back worked flawlessly.
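
A tiny example of that round trip (the value is made up):

use num_bigint::BigUint;

fn main() {
    // SRP values travel over the wire as little-endian byte slices.
    let n = BigUint::from_bytes_le(&[0x1c, 0x8e, 0x07]);
    assert_eq!(n.to_bytes_le(), vec![0x1c, 0x8e, 0x07]);
}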

Traits for Handlers and Readers

Based on the now decrypted size and payload header fields, one can extract the fields of the payload. These differ for each and every packet, so we need to decide which parser to use based on the opcode. Further, every packet might or might not do the following things:

  • Change the local representation of the game state
  • Change the connection state
  • Produce zero or more response packets

For this, I split the packet-related code and the state-mutation code in two: Readable + Serializable on one side and Handler on the other.

The traits concerned with creating Rust structs from byte slices and vice versa are the following:

trait Serializable {
    fn write(&self, buf: &mut BytesMut);
}

trait Readable {
    fn read(opcode: &u16, size: &u16, buf: &mut BytesMut)
        -> Result<Self, ParseError> where Self: std::marker::Sized;
}

So far, nothing special. The Readable trait simply receives a pre-parsed header representation, while Serializable is only concerned with writing the current struct to a buffer. These traits are used in a parser trait that operates on each and every packet.

trait SpecializedParser {
    fn parse(&mut self,
             bytes: &mut BytesMut,
             state: &mut ConnectionState)
        -> Result<Box<dyn Handler<ConnectionState>>, ParseError>;
}

The unwieldy return type is used to have trait objects of handlers, operating on some connection-specific state.

Finally, the Handler trait from above is quite simple, allowing a handler to return zero or more reply packets in the form of trait objects implementing the Serializable trait. The passed state reference is the local representation of the game state that a Handler may modify.

pub trait Handler<T> {
    fn execute(&self, state: &mut T)
        -> Option<Vec<Box<dyn Serializable>>>;
}

Let’s have a look at an example to get a better understanding of the traits in action: the SAuthResponse packet. This packet is received as soon as we have authenticated with the game server; in the graphic above, it is the server’s response message in step 3.

pub struct SAuthResponse {
    pub success: bool,
}

impl Readable for SAuthResponse {
    fn read(_opcode: &u16, size: &u16, buf: &mut BytesMut)
        -> Result<Self, ParseError> {
        ...
        let success = a == 0x80;
        Ok(SAuthResponse { success })
    }
}

Using the result type from above, we can easily signal parsing errors like malformed packets or fatal errors (in the above example, a failed authentication is not considered fatal and is handled elsewhere).

The counterpart is the Handler:

impl Handler<ConnectionState> for SAuthResponse {
    fn execute(&self, state: &mut ConnectionState)
        -> Option<Vec<Box<dyn Serializable>>> {
        if !self.success { ... }
        state.auth_state = AuthState::Authenticated;
        Some(vec![
            Box::new(CReadyForAccountDataTimes{}),
            Box::new(CEnumCharacters{}),
        ])
    }
}

When the handler is executed successfully, two response packets are sent: one requesting information about time played and one listing the available characters on that realm. I really like Rust’s approach here: explicit return types everywhere avoid unexpected exceptions or error handling via booleans (or worse, integers). However, the types are so verbose that a type alias would become necessary if the software were developed further. The verbosity is especially visible in the part that glues all of these components together: the opcode matcher. I will only show you a screenshot of the code, since it is no pleasure to read without syntax highlighting.

The required opcodes and their packets

The job of the opcode matcher is to generate the correct trait object according to the opcode presented. However, we need to “drop” the concrete type information to get a more general trait object. This is ugly. If someone has a better solution, please let me know. I know this could be done with a macro but I tend to use macros only if absolutely necessary. This problem gets way worse if you consider how many lines it would take to implement the whole protocol with more than 400 opcodes.
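
For readers without the screenshot, here is a hedged sketch of what such a matcher boils down to, written as a free function instead of the SpecializedParser trait from above; the opcode value and the UnknownOpcode error variant are placeholders:

fn match_opcode(
    opcode: u16,
    size: u16,
    buf: &mut BytesMut,
) -> Result<Box<dyn Handler<ConnectionState>>, ParseError> {
    match opcode {
        // One arm per opcode: parse the concrete packet type, then erase it
        // into a Handler trait object.
        0x01EE => Ok(Box::new(SAuthResponse::read(&opcode, &size, buf)?)),
        // ... more than 400 further arms for the full protocol
        _ => Err(ParseError::UnknownOpcode(opcode)),
    }
}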

Rewriting this type mapping into a general-purpose crate is one of my follow-up projects, so stay tuned.

Working with bytes (Decoding the payload)

Decoding the network packets could be done in Rust without many problems thanks to the Bytes crate, which features methods like get_u32_le and copy_to_slice that do exactly what you would expect. However, one thing I found missing is operations on individual bits, especially pattern matching on them. Something like bitstrings in Elixir would have made parts of the code much, much more concise. I tried both bit shifting and bitwise and; both worked, but neither was particularly readable.

// Moderately readable bitwise and
bGuildGuid[1] = bitfield[1] & 0b0000_0100 > 1;
// Moderately readable bit shifting
bits_0 += ((self.guid[2] > 1) as u8) << 4;

In any case, working with single bytes was much easier than in other languages that do not have real unsigned bytes.

Working with time

Of course, it is possible to work with time in Rust, but it is … special. Let’s see.
The Game Server requires you to send periodic time-synchronization messages, preventing the clients’ clocks from drifting away from the server clock and causing weird problems. This synchronization mechanism is based on the current tick, computed from the milliseconds elapsed since server startup. The startup time is initially transmitted relative to the Unix epoch. Here it gets dirty. Let’s have a look at other languages first:

  • Java: System.currentTimeMillis()
  • Go: time.Now().Unix()
  • Erlang: os:system_time()

The solution the Rust designers came up with? See for yourself:

match SystemTime::now().duration_since(SystemTime::UNIX_EPOCH) {
    Ok(n) => n,
    Err(_) => panic!("SystemTime before UNIX EPOCH!"),
}

Is this complicated? Yes. Is this too complicated? It depends. Is this better than the other solutions? Yes! There are a lot, like a lot, of nuances regarding time: things like time warp (actually supported by Erlang), the need for monotonic time, and much, much more.
I think this is one of the tradeoffs the language needs you to make:

Rust forces you to care about the details, regardless of whether you need to.

If my scanner crashed because I changed the timezone of the server, then so be it; systemd will start it again the next second. For the firmware of an airplane, such behavior is probably not acceptable. Rust forces you to handle edge cases where other languages deliberately let you run into problems, hoping for mild consequences or rare occurrence. But you pay for this safety with increased complexity.
However, this is only true as long as you do not need the “more correct” approach: if you use a less pedantic language, run into problems that Rust would have prevented, and have to solve them yourself (assuming you even spend the time to identify them), the complexity of the resulting code far exceeds that of the Rust code you would have written.
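
As an aside on the monotonic-time point above: for the tick computation itself, a monotonic clock sidesteps the wall-clock pitfalls entirely. A minimal sketch (not the project's actual sync code):

use std::time::Instant;

fn main() {
    // Instant never jumps backwards when the wall clock or the timezone
    // changes, which is exactly what a periodic time-sync tick needs.
    let started = Instant::now();
    // ... later, when building a time-sync packet:
    let ticks = started.elapsed().as_millis() as u32;
    println!("ticks since startup: {ticks}");
}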

This increased complexity for the sake of correctness and safety shows all over Rust, the time API being just one example. While I am a fan of correctly working programs, I understand that one cannot always justify the increased complexity and the accompanying development costs before actually running into problems. But keep in mind that Rust targets areas typically dominated by C++, so a comparison with Java is not fair.

Data Analysis

The final step remaining is the analysis of the observed auctions saved in the database. For time-related overviews, I set up a small Grafana dashboard, using the Postgres data source. The screenshot below shows the aftermath of someone trying to reset the economy by buying all of the items of a specific type and then reselling them at a much higher price.

Wool prices over time

For all analysis that is not time-related, I used Apache Superset. This allows asking questions like

  • Are the auctioneers selling a specific type of item specialized in only a few categories, or do they sell items from all categories?
  • If there are specialists, do they have a monopoly there?

This is answered below: there are items that are sold by nearly everyone (the column marked with the red A), and there are players that sell only a few items of a certain type but hold a large market share there (the player behind row B).

Top Auctioneers and their market share per item class

Within the first few days, we could secure about 6% of the total market of “glyphs”, selling slightly below average price, as shown below.

Our market share in terms of auction count and value

We could further write database queries that told us, at any moment, which items were currently listed below their average market price.

Misguided people could use this information to write a completely autonomous trading bot. I explicitly made sure I did not implement anything that would allow such interaction before open sourcing the tool.

So what did we get out of this project from a game perspective? We gained quite a lot of insight into the inner workings of the game’s economy. Did we actually use the tool for anything more than learning? Not really: apart from a few experiments, we did not interfere with the economy. After all, who wants to ruin the game for other players?

If you want to read more about how the crypto works, what private-server operators do to keep hackers at bay, and what gas mask bags have to do with all of this, make sure to check out the longer post over at my private blog.

Conclusion

What did we learn about Rust in this project?

Even after spending way more time on this project than is justifiable, I still like Rust. The language simply gives you the confidence of writing better software. However, the constant nagging about types not being cast into their super-types and about borrowed slices is not impossible to overcome, but it really slows you down.

Using mio was overkill and led to unreadable code. The low-level nature of the crate did in fact surprise me: whilst it is clearly stated in the docs, I did not anticipate how much work the stdlib and tokio do for you. So if you plan to go the route of low-level implementations, I can only tell you (in an overly dramatic voice): “You are not prepared”. In the end, I learned a lot about mio and networking, and a follow-up project was born from it.

As exemplified by the time API, Rust forces you to care about all the things that can go wrong, regardless of whether you want to develop airplane control software or a one-off script. This saves you from doing stupid things but might slow down development.

Rust has some areas where the ecosystem is not quite mature yet. So depending on your project, make sure all required libraries are available and working. Especially delicate topics like cryptography require mature implementations, so relying on integrations of tried and trusted libraries like OpenSSL is a good way to prevent reinventing the wheel.

Thanks for reading! If you have any questions, suggestions, or critique regarding the topic, feel free to respond or contact me. You might be interested in the other posts published in the Digital Frontiers blog, announced on our Twitter account.
