Making RSpec fast while using seeded data
Here at Opendoor, we have an unusual problem: our Rails application requires a large amount of seeded data to run. We need to create mock addresses, cities, vendors, operators, lenders, and more just to run a single unit test!
When we started, we inserted every piece of seeded data into our database for each spec run. This worked, but it was terribly slow: in our development environment, a single unit test took roughly 30s to run. As our engineering team grew, this became an enormous drain on overall productivity. Not only did we need our specs to run faster, there were a few other requirements as well:
- Have the same seeded data at the start of each spec run
- Roll back all additional data added during the spec run
- Support pre-seeded data with sequences
A search around the web did not turn up any useful guide to solving this problem, so this post shares how we tackled it.
How we made it work
At a high level, to get everything to work we had to make a few tweaks:
- Use Spring to pre-load our Rails app and seed that data just once
- Use DatabaseCleaner to rewind our database state to a previous point
- Modify FactoryBot to update the sequences
1. Using Spring to seed once
Spring is a Rails app pre-loader that allows you to instantiate your Rails app once and keep it loaded in memory as a running process. Each subsequent command runs on a fork of that original Spring process, so it skips the boot time and runs much faster than before.
By default, Spring will initialize your Rails app by loading your config/environment.rb, which means anything you include in there will remain in the parent process. To ensure we only seed the database once, we include code like this:
And inside of spec_helper.rb we would seed the database by calling
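The seed-once idea can be sketched in plain Ruby. This is a minimal sketch under assumptions, not the exact code from our app: `SeedOnce` is a hypothetical helper that would be loaded from config/environment.rb, and the `SEED_LOG` array stands in for the real seeding work (in a Rails app that might be something like `Rails.application.load_seed`).

```ruby
# Hypothetical helper: runs the seeding block at most once per process.
# Because Spring forks from the process that loaded it, forks inherit
# the "already seeded" flag and skip the work.
module SeedOnce
  @seeded = false

  class << self
    def seeded?
      @seeded
    end

    # Yields to the seeding block only on the first call.
    # Returns true if seeding ran, false if it was skipped.
    def run
      return false if @seeded

      yield
      @seeded = true
    end
  end
end

# Stand-in for the real seeding work.
SEED_LOG = []

SeedOnce.run { SEED_LOG << :seeded } # first call: seeds
SeedOnce.run { SEED_LOG << :seeded } # later calls: no-op
```

The guard matters because the same file can be evaluated more than once; with the flag in place, a second call is harmless.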
2. Using DatabaseCleaner to roll back state
DatabaseCleaner is used to roll back any database operations we do inside of a spec run. Each spec should begin in the same state as every other spec to reduce spurious failures. DatabaseCleaner runs in the forked Spring process, and the code lives in spec_helper.rb like this:
This config change ensures that any database operations that occur during a spec run are undone once the spec is completed.
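In a real spec_helper.rb this role is typically played by DatabaseCleaner's `:transaction` strategy, with each example wrapped in `DatabaseCleaner.cleaning` from an RSpec `around` hook. To show the behaviour without a database, here is a toy model: `FakeTable` and its `cleaning` method are illustrative names and are not part of DatabaseCleaner.

```ruby
# Toy stand-in for a database table; not part of DatabaseCleaner.
class FakeTable
  attr_reader :rows

  def initialize(seed_rows)
    @rows = seed_rows.dup
  end

  def insert(row)
    @rows << row
  end

  # Runs the block, then restores the rows that existed beforehand,
  # even if the block raises. This mimics wrapping a spec in a
  # transaction that is always rolled back.
  def cleaning
    snapshot = @rows.dup
    yield
  ensure
    @rows = snapshot
  end
end

VENDORS = FakeTable.new(["seeded vendor"])

# One "spec run": the insert is visible inside the block only.
VENDORS.cleaning do
  VENDORS.insert("vendor created inside a spec")
end
```

After the block finishes, only the seeded row remains, so the next spec starts from the same state.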
3. Modifying FactoryBot to seek Sequences
We use FactoryBot, a Ruby library that provides a flexible way to create mock data for our models. FactoryBot also provides sequences, which generate an incrementing field such as a numbered email address. This ensures that you will always have a unique value for your database records.
However, now that Spring is pre-seeding the database and DatabaseCleaner is rolling back our state for each test run, we have another problem: our FactoryBot sequences are off. For example, check out this code:
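The collision can be reproduced with a toy model. `ToySequence` and `UniqueEmails` below are hypothetical stand-ins for a FactoryBot sequence and a table with a unique index; FactoryBot's real classes differ.

```ruby
require "set"

# Toy model of a factory sequence that always starts counting from 1.
class ToySequence
  def initialize(&formatter)
    @formatter = formatter
    @value = 0
  end

  def next_value
    @value += 1
    @formatter.call(@value)
  end
end

# Toy table with a unique constraint on email.
class UniqueEmails
  class Duplicate < StandardError; end

  def initialize
    @emails = Set.new
  end

  def insert(email)
    # Set#add? returns nil when the value is already present.
    raise Duplicate, "#{email} is already taken" unless @emails.add?(email)

    email
  end
end

EMAILS = UniqueEmails.new

# Parent process: seeding consumes sequence value 1.
seed_sequence = ToySequence.new { |n| "user#{n}@example.com" }
EMAILS.insert(seed_sequence.next_value)

# Spec run: the sequence starts from 1 again, but the seeded row is still there.
fresh_sequence = ToySequence.new { |n| "user#{n}@example.com" }
COLLISION =
  begin
    EMAILS.insert(fresh_sequence.next_value)
    nil
  rescue UniqueEmails::Duplicate => e
    e.message
  end
```

Here `COLLISION` holds the error message for the duplicate row, which is the same kind of failure a unique index would raise in a real database.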
Because we seed the database in the parent process, the seeded rows already use the first sequence values, but our sequences are reset back to 1 for each Spring fork. The first record created in a spec then errors out, as the row for sequence 1 is already taken. To fix this issue, we store FactoryBot’s current sequence positions before the process is forked and restore them at the start of each spec run. See code here:
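The store-and-restore step can be sketched as follows. This is a sketch under assumptions: `SeekableSequence`, its `seek` method, and the `SEQUENCES` registry are illustrative names, and FactoryBot does not expose this exact API, so a real implementation would need to reach into FactoryBot's own sequence objects.

```ruby
# Toy sequence that can report and restore its position.
class SeekableSequence
  attr_reader :position

  def initialize(&formatter)
    @formatter = formatter
    @position = 0
  end

  def next_value
    @position += 1
    @formatter.call(@position)
  end

  def seek(position)
    @position = position
  end
end

# Hypothetical registry standing in for the set of factory sequences.
SEQUENCES = {
  email: SeekableSequence.new { |n| "user#{n}@example.com" }
}

# Parent process: seeding uses values 1 and 2, then the positions are recorded.
SEEDED_EMAILS = Array.new(2) { SEQUENCES[:email].next_value }
SAVED_POSITIONS = SEQUENCES.transform_values(&:position).freeze

# Moves every sequence back to the position recorded after seeding.
def restore_sequences(sequences, saved_positions)
  saved_positions.each { |name, position| sequences.fetch(name).seek(position) }
end

SEQUENCES[:email].seek(0)                     # simulate the reset in a fork
restore_sequences(SEQUENCES, SAVED_POSITIONS) # what a before hook would do
FIRST_SPEC_EMAIL = SEQUENCES[:email].next_value
```

With the position restored, the first value generated inside a spec comes after every value the seed data used.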
Now our specs pass, since our sequences resume from the position they were at before the Spring fork.
Piecing it all together
Finally, with all the above parts put together, we now have fast specs that meet all the original requirements. Our final setup now looks like:
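As a compact illustration of how the three steps interact, here is a toy harness. Every name in it is hypothetical; in the real setup the parts are spread across Spring, DatabaseCleaner, and FactoryBot rather than living in one class.

```ruby
# Toy harness tying the three steps together.
class SpecHarness
  attr_reader :rows

  def initialize
    @rows = []
    @counter = 0
    @seeded = false
  end

  # Step 1: seed once in the long-lived parent process.
  def seed_once
    return if @seeded

    3.times { create_user }
    @saved_counter = @counter # Step 3a: remember the sequence position
    @seeded = true
  end

  def create_user
    @counter += 1
    email = "user#{@counter}@example.com"
    raise "duplicate #{email}" if @rows.include?(email)

    @rows << email
    email
  end

  # One spec run: restore the sequence, run the spec, roll back its writes.
  def run_spec
    @counter = @saved_counter # Step 3b: seek past the seeded values
    snapshot = @rows.dup
    yield self
  ensure
    @rows = snapshot # Step 2: undo everything the spec inserted
  end
end

HARNESS = SpecHarness.new
HARNESS.seed_once
HARNESS.seed_once # no-op: the data is already seeded

# Two independent spec runs, each creating one user.
CREATED = Array.new(2) { HARNESS.run_spec { |h| h.create_user } }
```

Both runs create the same fourth user without a collision, and the table returns to its three seeded rows after each one.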
Although we chose to use Spring, DatabaseCleaner, RSpec, and FactoryBot here at Opendoor, there are many alternative Ruby packages that would work as well. If you’re using different packages for pre-loading or database rewinding, the same concepts apply.
Results from these changes
After making these changes, spec runs dropped from roughly 30s to around 2s each, an average speedup of 15x 🚀. As we write lots of specs here at Opendoor, changes like this are massively impactful.