Write better tests by using Behat with PHPUnit

tl;dr: It’s hard to write good Gherkin tests with Behat. But if you write Gherkin features inside your PHPUnit tests using my new package jonathanjfshaw/phpunitbehat, you get the clarity of Gherkin with the power of PHPUnit.

PHPUnit is the standard tool in the PHP world for writing conventional regression tests and unit tests. More recently Behat has become popular, as it lets us write elegant specifications in the ‘Gherkin’ syntax:

Scenario: Buying a single product under £10
Given there is a "Sith Lord Lightsaber", which costs £5
When I add the "Sith Lord Lightsaber" to the basket
Then I should have 1 product in the basket
And the overall basket price should be £9

Behat allows these specifications to be evaluated programmatically, which makes them a kind of automated test too. Generally they are used for higher-level project specification but, like so many good tools, Behat gives you the dangerous freedom to use it however you like.

The pitfalls of Behat

Phena Proxima recently wrote a great article which opened my eyes to the dangers of abusing Behat.

Once you can write these beautiful self-explanatory tests, it’s tempting to use them for everything you want to test, including those complicated little edge cases. Before long you might get features that look more like this (his example):

Scenario: Clearing an image field on a media item
Given I am logged in as a user with the "create media, update media" permission
When I visit "/media/add/video"
And I enter "Foobaz" for "Name"
And I enter "https://www.youtube.com/watch?v=z9qY4VUZzcY" for "Video URL"
And I wait for AJAX to finish
And I attach the file "test.jpg" to "Image"
And I wait for AJAX to finish
And I press "Save"
And I click "Edit"
And I press "field_image_0_remove_button"
And I wait for AJAX to finish
# Ensure that the widget has actually been cleared. This test was written
# because the AJAX operation would fail due to a 500 error at the server,
# which would prevent the widget from being cleared.
Then I should not see a "field_image_0_remove_button" element

The result no longer has much elegance or clarity.

How has this happened? Fundamentally it’s because implementation details have leaked to where they don’t belong. The scenario should be something like:

Scenario: Clearing an image field on a media item
Given I can add videos
When I add a video listing with image
And I edit the video listing
And I press the 'Remove' button for the image
Then the 'Remove' button for the image disappears

But this would require defining quite a few new step definitions unique to this scenario, to explain what we mean by “I add a video listing with image” or “I press the ‘Remove’ button for the image”. It’s far more tempting to press more basic, predefined steps into service, stringing them together in a long, confusing sequence.
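To make that cost concrete, here is a minimal sketch of what two of those scenario-specific step definitions might look like. Everything in it is a hypothetical illustration: the class name, the helper methods, and the `$actions` array (which stands in for a real browser driver so the example runs on its own) are assumptions, not part of Behat or of any real project. In an actual Behat suite the class would implement Behat's `Context` interface and drive a browser session instead of appending to an array.

```php
<?php

// Hypothetical sketch: declarative steps that hide the imperative details.
// The $actions array stands in for a real browser driver so that the
// example runs without Behat or a browser installed.
class VideoListingSteps
{
    /** @var string[] Low-level actions performed so far. */
    public $actions = [];

    /**
     * @When I add a video listing with image
     */
    public function iAddAVideoListingWithImage()
    {
        $this->visit('/media/add/video');
        $this->fill('Name', 'Foobaz');
        $this->fill('Video URL', 'https://www.youtube.com/watch?v=z9qY4VUZzcY');
        $this->attach('Image', 'test.jpg');
        $this->press('Save');
    }

    /**
     * @When I press the 'Remove' button for the image
     */
    public function iPressTheRemoveButtonForTheImage()
    {
        $this->press('field_image_0_remove_button');
    }

    // Hypothetical low-level helpers; a real suite would call its driver here.
    private function visit($path) { $this->actions[] = "visit $path"; }
    private function fill($field, $value) { $this->actions[] = "fill $field=$value"; }
    private function attach($field, $file) { $this->actions[] = "attach $file to $field"; }
    private function press($button) { $this->actions[] = "press $button"; }
}

// Run the two declarative steps and print the low-level actions they expand to.
$steps = new VideoListingSteps();
$steps->iAddAVideoListingWithImage();
$steps->iPressTheRemoveButtonForTheImage();
echo implode(PHP_EOL, $steps->actions), PHP_EOL;
```

The field names and URL move out of the scenario and into the step definition, where implementation details belong. The price is one new method per new phrase, which is exactly the work that is tempting to skip.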

It’s this that is the root of all evil in Behat: the abuse of predefined Gherkin steps. This isn’t a new insight: a pioneer of Cucumber (Behat’s older Ruby sibling) wrote about it years ago. But it’s worth repeating.

Using a library of simple predefined steps makes your scenarios longer, harder to understand, and more brittle. It steers you towards an imperative style rather than a declarative style. You write the scenario to fit the limitations of your predefined steps, not to express your logic with maximum clarity.

In theory, then, this is developer laziness, not Behat’s fault. But sadly, Behat makes it hard to develop good habits here. Behat expects the step definitions to live in separate ‘Context’ files and expects you to have a small library of these context files in your project. This creates a trilemma; you have to either

  1. Have many, many context files, one for each feature
  2. Have a few contexts with a huge number of methods in them
  3. Reuse your methods heavily

None of these choices are pleasing. But the most unavoidable problem is the simple need to write your test in two different files, the feature file and the context file.

Test writing is often squeezed for time, so when Behat increases the cognitive load of writing good tests, that’s a real problem.

The pitfalls of PHPUnit

But let’s not think PHPUnit is intrinsically saved from these problems. There’s a reason we were attracted to Behat in the first place.

I jumped into Drupal’s code base to find an example, and the first test file I opened contained a single test method that was 200 lines long. It had comments, but they were few and far between and sometimes about implementation details.

The biggest danger of these opaque tests is that no one will remember what is being tested and why. It’s very hard to tell what holes there are in the test’s underlying logic, and if a code change breaks the test it’s hard to see why and to assess whether the problem is with the test or with the changed code.

Tests expressed in Gherkin have the potential for a stunning level of clarity that even the best PHP tests will struggle to reach. You can add comments to your PHP test code, but Behat-executed Gherkin will always have the edge here:

  • Gherkin drives the execution, so the natural language statement of the test is not optional, it’s fundamental.
  • The Gherkin is pulled out very cleanly, separated from all implementation details.

The solution: getting the best of PHPUnit and Behat

Phena Proxima’s solution is to write simple things that are of value to non-technical people in Behat, and other more tricky things in PHPUnit.
But what if we could declare our test logic using Gherkin, but from within our PHPUnit test classes? We’d get the best of both worlds:

  • Gherkin forces us to articulate the test logic in plain English, or the test won’t happen at all.
  • Having the Gherkin feature in the test class puts our step definitions right along with the steps that use them, making the process of writing unique steps that require new step definitions much easier.
  • We can use the full power of PHPUnit for handling exceptions etc.
  • We can mix Gherkin tests alongside other tests in whatever way we like.

Introducing PHPUnitBehat

Here’s what it looks like:

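// A base class that pulls the trait into a PHPUnit test case.
namespace MyProject\Tests;

use PHPUnit\Framework\TestCase;
use PHPUnitBehat\TestTraits\BehatTestTrait;

class MyTestBase extends TestCase {
  use BehatTestTrait;
}

// A test class, in its own file, holding the feature and its step definitions.
namespace MyProject\Tests;

class MyTest extends MyTestBase {

  protected $feature = <<<'FEATURE'
Feature: Demo feature
  In order to demonstrate testing a feature in phpUnit
  We define a simple feature in the class

  Scenario: Success
    Given a step that succeeds

  Scenario: Failure
    When a step fails

  Scenario: Undefined
    Then there is a step that is undefined
FEATURE;

  /**
   * @Given a step that succeeds
   */
  public function aStepThatSucceeds() {
    // A passing assertion, so the scenario succeeds.
    $this->assertTrue(true);
  }

  /**
   * @When a step fails
   */
  public function aStepFails() {
    // A failing assertion, which produces the failure shown in the output.
    $this->assertTrue(false);
  }

}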

Which gives you friendly test output:

There were 2 failures:

1) MyProject\Tests\myTest::testBehatScenario with data set #1 ('Failure', Behat\Gherkin\Node\ScenarioNode Object (...), Behat\Gherkin\Node\FeatureNode Object (...))
Scenario 'Failure' had steps:
Failed: When a step fails

Failed asserting that false is true.


2) MyProject\Tests\myTest::testBehatScenario with data set #2 ('Undefined', Behat\Gherkin\Node\ScenarioNode Object (...), Behat\Gherkin\Node\FeatureNode Object (...))
Failed asserting that scenario passed.
Scenario 'Undefined' had steps:
Undefined: Then there is a step that is undefined

You can define these undefined steps in your PHPUnit test class like this:

/**
 * @Then there is a step that is undefined
 */
public function thereIsAStepThatIsUndefined() {
  // Your assertions for this step go here.
}
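If you are curious how a line of Gherkin can find a method in the same class, the following is a deliberately simplified sketch of annotation-based matching using PHP's Reflection API. It is not the actual implementation of PHPUnitBehat or Behat, which also support patterns and step arguments; this version only does exact text matching, and the `DemoSteps` class and `findStepMethod()` function are hypothetical names invented for the illustration.

```php
<?php

// Hypothetical example class: step definitions live in docblock annotations.
class DemoSteps
{
    public $log = [];

    /**
     * @Given a step that succeeds
     */
    public function aStepThatSucceeds()
    {
        $this->log[] = 'succeeded';
    }
}

// Find the public method whose step annotation exactly matches the text of
// a step line. Returns the method name, or null if the step is undefined.
function findStepMethod($object, $stepLine)
{
    // Strip the leading keyword ("Given", "When", "Then", "And", "But").
    $text = preg_replace('/^(Given|When|Then|And|But)\s+/', '', trim($stepLine));

    $class = new ReflectionClass($object);
    foreach ($class->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        $doc = $method->getDocComment();
        if ($doc === false) {
            continue; // No docblock, so no step annotation.
        }
        // Collect the text after every Given/When/Then annotation.
        if (preg_match_all('/@(?:Given|When|Then)\s+(.+)/', $doc, $matches)) {
            foreach ($matches[1] as $pattern) {
                if (trim($pattern) === $text) {
                    return $method->getName();
                }
            }
        }
    }
    return null;
}

$steps = new DemoSteps();
$found = findStepMethod($steps, 'Given a step that succeeds');
$missing = findStepMethod($steps, 'Then there is a step that is undefined');

// Call the matched method, as a test runner would for a defined step.
if ($found !== null) {
    $steps->$found();
}
```

Here `$found` holds the name of the annotated method, while `$missing` is null because no annotation matches that step, which corresponds to the "Undefined" result in the test output.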


You can start using it today

composer require jonathanjfshaw/phpunitbehat


Please give me your feedback! Have I lost the plot or is this an interesting idea?