Database Fixture Setup in PHPUnit

We already discussed when and how to reset a database between tests. But on top of that you often need some basic or more complex data to run the tests against. We discuss different variants and their respective trade-offs in this post…

Dump & Insert Live Data

The simplest approach — we still see this in the wild — for schema and test data management is to dump and replay live data for testing purposes. The benefit here is that you only have to adapt data in one place when the schema changes and it is really easy to set up. But there are a couple of drawbacks to this approach:

  • Large live data sets
  • The live data usually is “large” — at least a couple of hundred MB (while database are usually not considered large before reaching TB). It takes a lot of time to reset a database even with such a dump. You do not want to spend such an amount of time before each test, test suite or even before each test run.
  • Boundary value test cases
  • There is often a need for special test cases, like strange characters or other type of boundary values in your tests. Those data sets might not exist in the live data so you end up creating this data in your tests on top of the data dump anyways.
  • Data privacy
  • Depending on your use case and business your developers should not have access to all live data. There might be sensitive information which should be locked down on your servers. In this case a live SQL dump is no option. Especially in Germany we might be required by law to lock certain data away from certain people.
  • Changing data on the live system
  • It is obvious that data on the live system changes over time. This can make it hard to author tests that are stable and reproducible. In addition, tests might become less meaningful if you need to craft them in a way that they can cope with changing live data.

Modified Live Data Set

To avoid the problems mentioned above a next step usually is a modified SQL file, which is smaller and does contain sensible test data, like boundary values or stable data without any sensitive information for reproducible tests.

The problem which arises now is that you have to adapt two files when you change the data structure. And the schema and properties of the data will divert over time — no matter how careful you are. In the end this approach is always hard to maintain, so what can be done?

First we suggest you implement some kind of sensible schema management like DBDeploy, or even Doctrine Migrations if this works for your use-case. In this post we want to focus on the data / fixture management, though.

Base Data

Most applications require a set of base data, like default or admin users, some groups and permissions or something similar. This will be same in all installations (production, staging, development, testing, …). Depending on your schema management solution you will be able to insert this data during schema initialization. With DBDeploy you can just add some INSERT statements with Doctrine you could use Doctrine data fixtures (or even the Symfony bundle).

Test Data

The more important thing is the test data. Inserting sensible test data is tightly coupled to your test database resetting strategy which we discussed earlier.

Before Each Test

When you reset your database before each test you also want to insert the test data right inside the test. This is very obvious, makes tests easy to read and understand. In theory this is clearly the best solution. But as mentioned in the referenced blog post this will cost a lot of time to execute and likely be to slow for any non-trivial application.

Before Each Test Class

When you reset the database before each test case you can create the data throughout the test case. An simple common CRUD example could be the following tests:

  1. Create a new data set
  2. Load data set
  3. Fail loading a non-existent data set
  4. Load listing (with one item)
  5. Delete data set
  6. Fail deleting non-existent data set

Those tests depend on each other which should be indicated by using @depends annotations like in this example from our demo project.

This approach is still very clean and from reading the test case you can understand the full context. Another developer will still be able to understand what is going on. This is a lot harder when the data is inserted in another place, since you’ll always have to look in multiple places to get the full image. And new developers might not yet know all the places to look at. Tests which provide all the context you need to know in one file are very helpful for everybody.

This strategy will get more complex if you have dependent data — like tests for user permissions, which also require users and groups to exist. You could either use custom helpers or tools like Alice for this.

Before the whole test run

If you decided to only reset the database once before all tests are run you usually need a more complete data fixture. You will want to fill all tables with some basic data you can work with. Especially in such a case tools like Alice are very useful. By relying on Faker you even get sensible looking data without too much custom work.

If you want help finding the correct testing strategy for you and get a kickstart in testing — book us for an on-site workshop.

Conclusion

The way you initialise your database fixtures depends on your test schema management. We suggest to reset schemas at the begin of each test case and creating the data right in the test case. This proved to be a good compromise between speed, test (case) atomicity and test readability. Tools like Alice and Faker can ease the process of creating data a lot.

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.