Unit Testing, Redefined

Our journey into creating and open-sourcing our own mocking library.

Tyron Jung
Tech @ Quizlet
12 min read · Sep 23, 2020

Photo by Angelina Kichukova on Unsplash.

Meaningful systems don’t operate in a vacuum. Systems provide value by interacting with other systems, often in a complex, interconnected web of moving parts. When we go about testing a system, its interactions with other systems often make the task considerably more difficult.

In software, we deal with this complexity by simulating interesting boundary conditions for the system under test using mocks. Because mocking is such an essential part of software testing, many programming languages have an accompanying mocking library. For PHP, that mocking library is Mockery.

As Quizlet’s backend is written in PHP (well, not actually — but I’ll get to that in a second), Mockery was the natural choice for writing most of our unit tests. However, we ran into a series of problems that forced us to reckon with our dependence on Mockery.

So with that, let me tell you a story about three things: the changing landscapes of software, believing we can do better than the status quo, and redefining the way we write unit tests. This is the story behind Hammock — our very own mocking library.

1. A Brief History of PHP

When Mark Zuckerberg began writing code for what would later become Facebook, he decided to use PHP. To be frank, PHP is a rather divisive language. When I mention PHP to my fellow developers, most of them cringe and try to stay as far away from it as possible. I can’t blame them — after all, it was created by a guy who hates programming.

But when Mr. Zuckerberg was starting out in 2003, there weren’t too many options to choose from. Stuff like Django and Node.js didn’t even exist back then. If you wanted to quickly develop a website, PHP was the way to go.

As Facebook grew, accumulating millions of lines of PHP code, it became clear that PHP was the bottleneck. With the ever-increasing request load, performance was becoming an issue. PHP’s questionable design choices were catching up to Facebook.

Fortunately, they were able to come up with a solution during their internal hackathon in 2007. They created a tool to transpile PHP code into C++ code. This became known as HipHop for PHP (HPHPc).

Eventually, HPHPc was succeeded by HipHop Virtual Machine (HHVM). Using just-in-time (JIT) compilation, PHP on HHVM achieved a level of performance comparable to C++. Things were looking good — except for PHP’s lack of support for static type analysis. The lack of type-checking made it difficult to identify type errors without actually running the code.

In response to this, Facebook created their own typed version of PHP in 2012. “Hack” was the name of this new dialect, and it would also be able to run on HHVM. Due to the millions of lines of code already written in PHP, one of the key design requirements for Hack was to be fully interoperable with PHP. This allowed for an incremental conversion from PHP to Hack.

This was exciting news for Quizlet, as we had suffered our own fair share of headaches from the lack of type-checking in PHP since the early 2000s. Hack was the aspirin we so desperately needed. We thought it would be smooth sailing after that.

But we were wrong.

2. Goodbye PHP

In 2018, Facebook announced that HHVM v3.30 would be the last release series to support PHP. On one hand, cutting PHP support would allow Hack to become a fully type-checked language in its own right. On the other hand, this put many HHVM-dependent companies in a rather uncomfortable position, including Quizlet.

After this announcement, some organizations that ran PHP on HHVM ditched HHVM altogether, and other organizations that ran a mix of PHP and Hack hesitated to follow Facebook into these uncharted territories. For us, it meant that we would have to migrate the rest of our PHP code and remove all of our PHP dependencies.

Looking ahead, the amount of PHP code we would have to convert was intimidating. Looking behind, the amount of Hack code we had accumulated over the years made it feel like there was no going back. In the end, that’s what we decided — that there was no going back. Thus, we committed to removing one of our biggest PHP dependencies: Mockery.

Alright, enough history. Let’s look at some code.

3. Mockery Shortcomings

Besides the forcing function from Facebook, there were many other reasons to replace Mockery.

Consider the following PHP code:
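A minimal version of such a class might look like this. The string type hints and the small usage line at the end are assumptions for illustration; the text only says that fetch returns its input argument.

```php
<?php
declare(strict_types=1);

// A Dog that hands back whatever it is asked to fetch.
// The type hints are an assumption, not taken from the article.
class Dog
{
    public function fetch(string $item): string
    {
        return $item;
    }
}

$dog = new Dog();
echo $dog->fetch('ball'), PHP_EOL; // prints "ball"
```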

Pretty straightforward, right? When a Dog is asked to fetch something, it simply returns the input argument. Now, what would mocking the behavior of Dog using Mockery look like?
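A sketch of that mock, using Mockery's fluent expectation methods (shouldReceive, with, once, andReturn). It assumes Mockery was installed with Composer and repeats the illustrative Dog class so the snippet stands alone.

```php
<?php
declare(strict_types=1);

// Assumes Mockery is available through Composer's autoloader.
require __DIR__ . '/vendor/autoload.php';

class Dog
{
    public function fetch(string $item): string
    {
        return $item;
    }
}

// Build a mock Dog and declare the single interaction it should accept.
$dog = Mockery::mock(Dog::class);
$dog->shouldReceive('fetch')
    ->with('ball')          // any other argument makes the mock throw
    ->once()                // the call count is verified by Mockery::close()
    ->andReturn('frisbee'); // canned return value

echo $dog->fetch('ball'), PHP_EOL; // prints "frisbee"

// Verify the declared expectations and reset Mockery's container.
Mockery::close();
```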

In the above code, we define a mock Dog object such that when asked to fetch a ball, it returns a frisbee instead. This mock object can then be used in place of a real object, so that we may test our software under the specified conditions.

On the surface, this code looks harmless. You might even say that the fluent syntax is ergonomic. But when we read between the lines, we find the following:

  • Mockery actually does two things: mocking (changing the object’s behavior) and asserting (throwing when the expected conditions are not met). By conflating these two responsibilities, it leaves a bigger footprint on the codebase.
  • The fluent syntax encourages developers to test the implementation details. Methods like once (which will throw if the mocked method is not called exactly once) and with (which will throw if the received argument is anything other than the specified value) put rigid constraints on the test. If the implementation changes even slightly, the test may break.
  • The default behavior of a mock object is to throw if an unexpected method is called. In the above code, if a Dog method other than fetch is called, the mock object will throw. This can be prevented by calling makePartial, but the default behavior once again results in brittle tests.

Imagine having to update a bunch of tests every time you make a slight change to your source code. Doesn’t that sound painful? This is what happens when you focus too much on testing the implementation, rather than the behavior. But more on that later.

Mockery’s biggest shortcoming, however, is not any of the points I listed above. You may be surprised to know how Mockery achieves what it achieves. When it creates a mock object, it first creates a mock class. And in order to create that mock class, it uses code reflection, code generation, and code evaluation. That’s a lot of no-nos in one sentence.

The consequence of combining these three cardinal sins of coding is that the code is not going to be performant. Nor memory-efficient. In fact, we found upon investigation that Mockery was a serious bottleneck in the performance of our unit tests.

With that, here’s a summary of our predicament:

  • HHVM was no longer going to support PHP, so we had to get rid of our PHP dependencies.
  • While PHP had Mockery, Hack didn’t have a go-to mocking library.
  • Mockery had a lot of shortcomings, and we believed we could do better.

Thus, we set out to create a new mocking library for Hack. Keeping Mockery’s shortcomings in mind, we wanted to design a system that:

  • Creates mocks using less memory.
  • Creates mocks more performantly.
  • Cleans up the mocks elegantly.

And that’s exactly what we did.

4. Playing With Sharp Knives

Photo by Richard Iwaki on Unsplash.

When Facebook announced that HHVM was going to discontinue support for PHP, another company that was caught in the middle was Slack. Their office not being too far from ours, a group of Quizlet engineers (myself included) went for a visit shortly after the announcement.

There, we met up with the engineer responsible for upgrading HHVM at Slack. We wanted to consult him about Slack’s plan regarding the transition, wondering whether we should also double down on HHVM.

We learned a lot from that conversation. One of the most interesting things that we learned about was a function called fb_intercept hiding inside HHVM. There's hardly any documentation for this function, but it's nothing short of magical. Let me show you what it does:
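A minimal sketch of such a call, assuming an HHVM 3.x-era release where the handler receives ($name, $obj, $params, $data, &$done) as described in the function's docblock. Later HHVM releases changed the handler signature, so treat the parameter list, the Dog class, and the handler name as illustrative assumptions.

```hack
<?hh

class Dog {
  public function fetch(string $item): string {
    return $item;
  }
}

// Replacement for Dog::fetch. $done is left untouched, so the original
// method is skipped and the handler's return value is used instead.
function handler($name, $obj, $params, $data, &$done) {
  return 'frisbee';
}

// Redirect all future calls to Dog::fetch to handler().
fb_intercept('Dog::fetch', 'handler');

$dog = new Dog(); // a real Dog, not a mock object
echo $dog->fetch('ball'), "\n"; // answered by handler() instead of Dog::fetch
```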

There are several interesting things going on here.

First, it mocks the behavior of an object in a way that’s functionally equivalent to Mockery. Second, no mock objects are actually created. The call to fb_intercept changes the behavior of the real Dog class during runtime. Third, it achieves all this without any of that code reflection, generation, and evaluation nonsense.

How is this possible?

When fb_intercept intercepts a function, it modifies the function table at runtime. In doing so, it redirects all future calls to the intercepted function to a different function - the handler. By operating at such a low level, fb_intercept is able to change the code behavior without any of the inefficiencies imposed by Mockery.

Also, canceling the interception is just as simple:
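A sketch of the cancellation, under the same HHVM 3.x-era assumptions about the handler signature. According to the function's docblock, passing null (or false, or an empty string) as the handler unregisters a previously registered one.

```hack
<?hh

class Dog {
  public function fetch(string $item): string {
    return $item;
  }
}

function handler($name, $obj, $params, $data, &$done) {
  return 'frisbee';
}

fb_intercept('Dog::fetch', 'handler'); // start intercepting
fb_intercept('Dog::fetch', null);      // null unregisters the handler

$dog = new Dog();
echo $dog->fetch('ball'), "\n"; // served by the original Dog::fetch again
```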

When we replaced a few of the slowest Mockery calls in our codebase with fb_intercept, we were able to reduce the total runtime of our unit tests by half. This was amazing for two reasons: First, it proved that the 80/20 rule is real - a small fraction of the mocks were responsible for most of the slowdown. Second, we saw a real opportunity in using fb_intercept to significantly boost developer productivity.

But we can’t quite call it a day yet.

Anyone who has played with figurative sharp knives in software engineering should be a little worried about the contract of fb_intercept. For instance:

  • What happens if the target or handler function name is misspelled?
  • What happens if you intercept an already-intercepted function?
  • What happens if you forget to cancel an interception?
  • The list of worries goes on.

Now, I would have liked to tell you that your worries are misplaced — but they’re not. In fact, fb_intercept doesn't handle any of the above scenarios very well.

For instance, the following code will run without complaining:
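A sketch of the misspelling case, again assuming the HHVM 3.x-era handler signature. The only change from a working interception is the extra "g" in the class name.

```hack
<?hh

class Dog {
  public function fetch(string $item): string {
    return $item;
  }
}

function handler($name, $obj, $params, $data, &$done) {
  return 'frisbee';
}

// Typo: the class is Dog, not Dogg. No error is raised here.
fb_intercept('Dogg::fetch', 'handler');

$dog = new Dog();
echo $dog->fetch('ball'), "\n"; // Dog::fetch was never intercepted
```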

And then proceed to do absolutely nothing (unless there is a class called Dogg with a method called fetch). Let's take a look at another example:
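A sketch of intercepting the same method twice, under the same HHVM 3.x-era assumptions. The handler names match the ones used in the text; the value returned by another_handler is made up for illustration.

```hack
<?hh

class Dog {
  public function fetch(string $item): string {
    return $item;
  }
}

function handler($name, $obj, $params, $data, &$done) {
  return 'frisbee';
}

// Hypothetical second handler with a different return value.
function another_handler($name, $obj, $params, $data, &$done) {
  return 'stick';
}

fb_intercept('Dog::fetch', 'handler');
fb_intercept('Dog::fetch', 'another_handler'); // silently replaces handler

$dog = new Dog();
echo $dog->fetch('ball'), "\n"; // answered by another_handler()
```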

The above code won’t complain either. It’ll simply override the existing interception with another interception, calling another_handler instead of handler. One last example:
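Going by the list of worries above, the remaining case is a forgotten cancellation. A sketch of how that might leak between two tests follows; both test functions are hypothetical, and the handler signature is again the HHVM 3.x-era one.

```hack
<?hh

class Dog {
  public function fetch(string $item): string {
    return $item;
  }
}

function handler($name, $obj, $params, $data, &$done) {
  return 'frisbee';
}

// Hypothetical test that intercepts Dog::fetch and never cancels it.
function test_fetch_is_mocked() {
  fb_intercept('Dog::fetch', 'handler');
  $dog = new Dog();
  echo $dog->fetch('ball'), "\n"; // answered by handler()
  // Missing: fb_intercept('Dog::fetch', null);
}

// Hypothetical later test that expects the real behavior.
function test_fetch_returns_input() {
  $dog = new Dog();
  echo $dog->fetch('ball'), "\n"; // still answered by handler()
}

test_fetch_is_mocked();
test_fetch_returns_input();
```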

Okay, I think I’ve made it clear that fb_intercept's contract can lead to some pretty confusing bugs. So what did we do about it?

5. Enter Hammock

Like many responsible software engineers in possession of a sharp knife, we decided to put a harness around it. In particular, we wanted the harness to do the following things:

  • When the target function name is misspelled, throw an error.
  • When there’s an attempt to intercept an already-intercepted function, throw an error.
  • Automatically cancel interceptions at the end of the block scope, so that there is no chance of forgetting.

But it actually does more than that. We threw in these features as well:

  • Track calls into mocked functions for test assertions.
  • Allow spying on functions without altering their behavior.
  • Allow mocking an individual object without affecting other instances.
  • And much more.

We named the harness “Hammock”. Thanks to having fb_intercept at its core, it can do some pretty neat things. Without further ado, let me show you what it’s capable of.

5.1. Disposable Mocks

Let’s revisit our favorite example — only this time, we’ll write the code using Hammock:
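A sketch of what such a listing might look like, built from the names this section mentions (Hammock\mock_class_method, Dog::class, $fetchMock, $args, getNumCalls). The autoload lines and the entry-point wrapper are assumptions about a typical HHVM 4 project setup, not part of Hammock itself.

```hack
use namespace Hammock;

class Dog {
  public function fetch(string $item): string {
    return $item;
  }
}

<<__EntryPoint>>
function main(): void {
  // Assumes Hammock was installed with Composer and hhvm-autoload.
  require_once(__DIR__.'/vendor/autoload.hack');
  \Facebook\AutoloadMap\initialize();

  $dog = new Dog(); // a real Dog

  // The mock lives only for the duration of this block.
  using ($fetchMock = Hammock\mock_class_method(Dog::class, 'fetch', $args ==> 'frisbee')) {
    echo $dog->fetch('ball')."\n";       // mocked: frisbee
    echo $fetchMock->getNumCalls()."\n"; // one call was recorded inside the block
  }

  echo $dog->fetch('ball')."\n"; // mock disposed: ball
}
```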

Once again, there are several interesting things going on here.

First, we’re instantiating a real Dog, not a fake one. The mock that's being created via Hammock\mock_class_method is of the fetch method, not an instance of a Dog (and hence the variable name $fetchMock).

Second, Hammock\mock_class_method takes three arguments:

  1. The class symbol, to statically prevent potential spelling errors.
  2. The name of the method to mock, as a string.
  3. An anonymous handler function that receives the intercepted arguments as $args, which lets developers build more sophisticated mocks.

Third, by virtue of the using statement, the lifespan of the function mock is limited to its block scope. As you can see, $dog->fetch('ball') returns frisbee within the block, but returns ball outside the block.

The disposal mechanism elegantly cleans up the mock at the end of the block scope, eliminating any chance of human error. Not only that, but limiting the mock behavior to the block scope makes it much easier to write well-segmented unit tests that simulate various boundary conditions.

The last thing that I would like to point out about the above code is $fetchMock->getNumCalls(). Since the fetch method was called exactly once inside the using block, it returns 1.

Earlier, I mentioned that Mockery conflates two separate responsibilities: mocking and asserting. Hammock has one responsibility: mocking. We chose to implement a feature like getNumCalls, but not something like shouldReceive or once, precisely because of this philosophy: a mocking library shouldn't try to do anything other than mocking.

Sure, Hammock allows you to inspect the number of calls into a mocked function, but it doesn’t overstep its boundaries by trying to make assertions about those calls as well. Assertions can be made by a different library — one that specializes in assertions.

After all, code is useful insofar as it can maintain a concise and deliberate contract.

5.2. Spying on Functions

This one’s a favorite among Quizlet engineers. When you want to track calls into a function without altering its behavior, you can use spies.

Spies are particularly useful when you’re trying to validate optimizations. For instance, if you would like to ensure that a request first checks the in-memory cache before making a round trip to the database — and you want to do that without altering the cache behavior — you can spy on the cache methods to make sure that they’re being called.

For simplicity, spies provide the same interface as mocks. This is because they’re just a special case of mocks that pass through to the original behavior. Like mocks, they capture the number of calls and the arguments in each call.
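A sketch of a spy on Dog::fetch. It assumes Hammock exposes a spy_class_method function that mirrors mock_class_method without a handler; that function name is not given in this text, so check it against the library before relying on it. The autoload lines are likewise assumptions about project setup.

```hack
use namespace Hammock;

class Dog {
  public function fetch(string $item): string {
    return $item;
  }
}

<<__EntryPoint>>
function main(): void {
  // Assumes Hammock was installed with Composer and hhvm-autoload.
  require_once(__DIR__.'/vendor/autoload.hack');
  \Facebook\AutoloadMap\initialize();

  $dog = new Dog();

  // spy_class_method is an assumed name; the spy passes calls through
  // to the original method while recording them.
  using ($fetchSpy = Hammock\spy_class_method(Dog::class, 'fetch')) {
    echo $dog->fetch('ball')."\n";      // original behavior is preserved
    echo $fetchSpy->getNumCalls()."\n"; // calls are still counted
  }
}
```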

5.3. Mocking Individual Objects

If you’ve been paying close attention, you might have noticed that the Hammock examples thus far are mocking at the class level, rather than the object level. In other words, the mocks in the above examples affect all instances of the Dog class.

But don't worry - Hammock lets you mock individual objects as well:
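A sketch of object-level mocking with Hammock\mock_object_method and the $alice instance named in this section. The second instance, $bob, is a hypothetical addition for comparison, and the autoload lines are assumptions about project setup.

```hack
use namespace Hammock;

class Dog {
  public function fetch(string $item): string {
    return $item;
  }
}

<<__EntryPoint>>
function main(): void {
  // Assumes Hammock was installed with Composer and hhvm-autoload.
  require_once(__DIR__.'/vendor/autoload.hack');
  \Facebook\AutoloadMap\initialize();

  $alice = new Dog();
  $bob = new Dog(); // hypothetical second instance

  // Only $alice is mocked; the object itself is passed, not Dog::class.
  using (Hammock\mock_object_method($alice, 'fetch', $args ==> 'frisbee')) {
    echo $alice->fetch('ball')."\n"; // mocked instance
    echo $bob->fetch('ball')."\n";   // other instance keeps the original behavior
  }
}
```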

Notice that in this example, we’re calling Hammock\mock_object_method and passing in an object pointer ($alice) instead of a class symbol (Dog::class). This ensures that only the target instance is affected, and the other instances maintain the original behavior.

6. With Great Power…

On top of everything that I’ve shown you, Hammock has many other features that make it easier to mock functions in unit tests. However, that doesn’t mean that you should go and mock all the things. In fact, the fewer mocks you have, the closer to reality your unit tests will be.

“I recommend that you mock sparingly.” — Uncle Bob

When given a hammer, you have the responsibility to restrain yourself from seeing everything as a nail.

And while it is okay to mock at system boundaries (network, disk, etc.), mocking the internals of the system under test can lead you to false assumptions about the system, since you have changed its very behavior. If you’re having a hard time limiting yourself to mocking at the seams, remember this quote:

“If testing seems hard, there’s something wrong with your design.” — Sandi Metz

Follow SOLID design principles, and always make sure that the purpose of each system is unambiguous. The first step in good unit testing is good design.

That brings me to my final point: you should generally aim to test the behavior of your code, rather than the implementation. Sometimes it makes sense to test the implementation, such as when you’re verifying a very specific optimization. But by default, look at the overall behavior of the system.

For instance, ask: Does it achieve the desired effects? Does it produce the desired results? Can it withstand a significant amount of load?

Rather than: What arguments is this called with? Is this called exactly five times? Is it rendered in this exact structure?

When you test the behavior, it frees you from brittle tests and gets you closer to what you actually care about.

7. Profit

Today, all of our new Hack unit tests are written using Hammock. No more waiting on code reflection, generation, and evaluation. No more brittle tests. The developers are much more productive thanks to Hammock’s simple and performant APIs.

Also, Hammock is an open-source project! We thought the small Hack community might benefit from a dedicated mocking library like Hammock, so we open-sourced it. You can contribute here: https://github.com/quizlet/hammock.

Riya (left) and I presenting Hammock at HHHUG.

After open-sourcing Hammock, my co-worker Riya and I presented it at the Hacklang User Group Meetup (HHHUG) hosted by Slack. The audience was full of engineers from Slack and Facebook (and a few brave souls from Quizlet). Fortunately, there was a lot of interest and engagement from the crowd.

Finally, my friend Lexidor and I have been working to make Hammock compatible with as many versions of HHVM as possible. Quizlet is excited to see what HHVM has in store for us, and Hammock will definitely be coming along for the ride.

Acknowledgments

Special thanks to:

  • Lori-Anne Ashwood and Loretta Stevens for giving me feedback on my writing.
  • Fred Emmott for proofreading the history of PHP at Facebook.
  • Scott Sandler for telling us about fb_intercept.
  • Riya Dashoriya and Lexidor for developing and open-sourcing Hammock with me.
  • The engineers at Quizlet for using Hammock, providing feedback, and attending our talk at HHHUG.
