Interviewing in Ruby: a Minimal Assembly Interpreter

8 min readMar 19, 2018

Imagine you’re applying for a Ruby Developer position at Company XYZ.

You successfully passed all previous interviews and you’re heading for the final one, which you know will be strictly technical.

You enter XYZ’s office where you’re warmly greeted by their office manager. They show you the way to a room where your interviewer waits for you.

At first, you’re feeling a bit nervous. But after a small chit-chat with the recruiter your stress level drops and you start feeling more comfortable.

“Okay, let’s get down to business,” he says and hands you over a piece of paper with the following content:

MOV A 10 # Store 10 in register A
MOV B A # Copy value from register A to register B
ADD A B # Add register A and B and store the result in register C
JMP 7 # Jump to line #7
JNZ 7 # Jump to line #7 if the value in register C is non-zero
MUL A B # Multiply values stored in registers A and B and store them in register C

“Oh sh*t, it’s Assembly!” you think. Your hands started shaking, but you try to look cool.

“Let’s implement a small Assembly interpreter that can execute instructions similar to those on this piece of paper” the interviewer explains.

This is a real story that happened to a friend of mine.

Since I think this is an interesting problem I decided to take a stab at it myself.

Here’s the deal.

I’ll be writing this blog post as I progress through the solution. In a perfect scenario, I’d like to limit coding to somewhere between 60–90 minutes.

But because at the same time I’ll be writing down my way of thinking, I have to trust my gut feeling how long it’d take coding.

Of course, it won’t be the same experience as an actual interview. I’ll be missing constant feedback from an interviewer. And I’ll be sitting comfortably in my apartment with a low level of stress.

So, my dear reader, imagine we’re in this interview together. Go grab a cup of coffee and let’s get that job!

Preparation

Let’s get started! We’ll use Ruby and Minitest (though normally I prefer RSpec).

Before we write any code though, let’s try to come up with concepts related to this task. You know, naming is the hardest thing in computer science. That’s why I’d like to spend time coming up with decent names first.

Normally, we can speak with clients, product owners, or other domain experts to learn how they call things in the domain of interest.

In our case, we could search for the necessary concepts in the computer science literature (or ask Google). But for the sake of this exercise, let’s come up with something instead.

So we have an Assembly language. It has instructions that we execute. How to call a set of instructions? An instruction set? Instruction sequence? I know! A Program! Sometimes obvious things hide in plain sight.

We’ll parse the code of the program in order to understand its instructions. Then using our interpreter we’ll execute these instructions. We also have 3 registers: A, B, C, that can store numerical values.

We just came up with our dictionary — the nouns and verbs marked in bold.

I think this dictionary is big enough for now. Although we may discover new concepts later, I’m comfortable enough to start coding.

First steps

Let’s come up with a basic test — executing an empty program.

Yeah, I like to give my projects cheesy names.

The code isn’t in any way smart — executing an empty program should result in having 0's in all our registers.

We could later decide to refactor register to a separate class. For the time being we’ll keep it simple and treat each register as a key, value pair in a Hash though.

Implementing the MOV instruction

Okay, let’s try to modify the content of register A by implementing the MOV instruction.

The tests are passing.

We assume our program can only consist of one line of code, but we’ll fix this later.

We can either refactor now or cover some corner cases. I prefer to cover corner cases first. It makes me more comfortable with refactoring later. I also find it easier to think of any edge cases as soon as possible.

So what happens when the instruction looks like this: MOV A , MOV , or MOV A asdasd ?

Or what if someone wants to use a register that doesn’t exist: MOV D 7?

These are all things we should guard against.

So let’s add tests and start throwing some errors!

Yay! I’m feeling much more comfortable now. But the code doesn’t look pretty. Let’s do some refactoring!

First Refactoring

Let us to extract some domain concepts into separate classes.

What should we start with?

Parameter validation in execute method looks suspicious. I think it doesn’t belong there. The whole code block inside the if statement looks like an Instruction. So let’s try to refactor out an instruction.

Also, this made me realize that the command variable has an incorrect name. It really should be called instruction.

That’s much more elegant and all our tests still pass!

We still have a conditional instruction in Assemrby.parse and some hard-coded strings though. Let’s leave them for now and implement more requirements instead.

Handling Multiple Lines of Code

I think it’s high time our library could parse more than one line of Assembly code. The change should be easy. Let’s write a test that uses the MOV instruction multiple times and implement the required changes.

That was easy!

It also made me realise one more thing: we should raise an error when we don’t recognise an instruction. Lets continue with our requirements!

Handling ADD and MUL Instructions

So let’s implement ADD and MUL instruction and handle the case when we don’t recognise the instruction.

Be aware that we can do the following: ADD 10 A , ADD 10 15 , or ADD A B. We have a lot of cases to cover.

Below we implement all these test cases:

This test file really starts looking like a mess. We should do something with it.

Here’s the plan.

First, we’ll extend the previous if-statement to a case-statement. We’ll handle the new MUL and ADD instructions. We’ll raise error when we don’t recognise the instruction.

Then we’ll refactor the new code.

Finally, we’ll refactor our tests.

So let’s continue coding! First, the raw code that passes all tests:

Those new instructions have a lot in common. I copy-pasted code from one to the other, so they’re really similar.

And they’re both really ugly.

I didn’t even bother to change variable names. I hope you don’t commit code that looks like this :).

On the other hand, our tests are passing. We can safely refactor this monstrosity.

The worst part of this code is that we have to treat numerical values and register references differently. Plus we have to do it in both classes.

We’ll try to extract a nice interface that will let us forget about this.

Then we’ll see if it makes sense to remove the remaining duplication.

It looks much better now!

I’m not super proud of the names we came up with though. Assemrby::Value#value doesn’t sound good and might be confusing.

On the other hand, I think that Add and Mul got handled well. They still look quite similar but this time I won’t remove the duplication.

When I was a junior developer I thought that code duplication is wrong and has to be fought immediately. However, with experience I learned it’s not always a bad thing.

Cleaning Up Tests

I guess if I haven’t been writing my thoughts down, I’d already hit either 60- or 90- minute mark. Let’s clean up the tests and finish. After this we’ll have a moment of reflection on the code.

We have a lot of tests that exercise different instructions. I believe we should move them to test files specific to each instruction. So here we go.

And I think we can stop here. We didn’t implement all the features but I think we went quite far for a blog post.

What we’re missing are the JUMP instructions. How would we implement them?

We’d have to store a value of an Instruction Pointer. An Instruction Pointer would store the line number we’re currently executing. We would have to pass the instruction pointer to Instruction#execute along with registers.

The way I’d approach it would be first extracting a separate class that would store both the registers and the Instruction Pointer.

How would we call the new class? ExecutionEnvironment perhaps? Or simply Environment. Then we could safely pass an object of this class around.

Also, the JUMP instruction could potentially jump into a yet unparsed line of code.

Because of this, it’d be simpler to first parse all the code and build the Instruction objects first with Instruction#build.

Then, we could store these objects in an Array. An Instruction Pointer would simply be an index of this array.

Then, we would perform an execution phase, where we’d simply iterate over the Array and call Instruction#execute on every object.

Again, here’s the whole code for reference.

Summary

What could be improved in the current code?

I’d certainly want to spend more time on having clear error messages when raising errors. This helps tremendously in debugging later on.

Also, I don’t like the way Instruction objects are being built. We have a case statement that’d grow with every new instruction added. We could have an Instruction registry and have each Instruction class register in it instead. The registry would then have a factory method that’d build Instruction objects.

I like how Assembry::Value let’s you forget about what kind of thing you’re dealing with, but I believe we should spend some time thinking about proper names there.

I also have mixed feelings about passing registers everywhere but I can’t think of any solution. Maybe that’s just the way it is?

We’re also explicitly calling registers['C'] in Add and Mul. However, I’m not sure this is that bad in this case.

We could also spend some time extracting small private methods to keep the abstraction level consistent.

And we could add several more tests. For example, we didn’t cover any cases in which we should throw errors in neither ADD nor MUL. And we only covered Registry-Numeric case, but we didn’t cover Numeric-Registry case. See the difference: ADD A 2, ADD 2 A.

I also think we failed to come up with clear names a few times. Do you really build an Instruction? Shouldn’t it be called parse instead? We talked about programs before, but this concept isn’t anywhere in our code.

So that’s it! I hope you’ve enjoyed it.

I’m really interested in how you would approach this problem.

What would you do differently?

What do you like/dislike about my solution?

I hope we’ll have an interesting discussion in the comments.