Your project is not ready for contributors, part III

Photo by Tobias Keller

If you’re just tuning in, allow me to get you up to speed. This article is part three of a five-part series on techniques team leaders and architects can use to make their projects contributor ready.

The target audience is those with new or relatively early-stage projects. As your project grows, you’ll still need to onboard contributors quickly. By then, however, you’ll already have lost hours, months, or even years of potential productivity to your team’s slow early ramp-up.

Contributor readiness is sink or swim. If you’re working with a growing team, every bit of work you put into making things easier for your developers will compound. If you’re working with a remote team, you have no chance unless your project is highly contributor ready.

This week’s topic is data. If you want to catch up:

Your project is not ready for contributors, part I (Access)

Your project is not ready for contributors, part II (Environment)

There are five areas you need to refine. You should also have a standardized system in place, so you can work through them more effectively each time you go through this process:

  1. Access
  2. Environment
  3. Data
  4. Tools
  5. Testing

Data is a wide topic, which is why I hesitate to use such a broad term. In this case, we are looking at the data your developers need to run a productive, seamless environment. As a part of contributor readiness, data boils down to two categories:

  1. Generated test data.
  2. Database backups readily available from multiple environments.

The list is short, but the implications are wide-ranging. Get either one wrong, and your highly paid developers may lose their time to endless cycles of manual testing.

Generated test data

This is one of my personal favorite contributions to a new project. It saves you and your team countless hours that would otherwise be lost filling out forms and other pages with various types of input by hand.

Make no mistake, this is no replacement for a robust and thorough automated testing strategy. Automated tests are something mature projects should move toward. Immature projects, on the other hand, need something quicker in the meantime.

I’ve found it very helpful to add a small testData parameter to the URL. It is meant for developers and should be accessible in your development, staging, and beta environments.

I would even say it’s fine to ship this to production for early-stage products. I’m sure this sounds downright heretical to enterprise developers, but early-stage products have different needs. For an MVP or a product with fewer than 1,000 users, production testing is a necessary evil.

As for customers, they’re not interested in filling their forms with garbage data. Obviously, do not use your test data utilities to store credit cards or other sensitive information. Here’s how I would structure the test data grab bag.

The file itself should be extremely simple. It holds banks of useful words for testing the standard constructs of a generic application. Each bank has a getter, and the getters compose naturally.
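Here is a minimal Python sketch of such a grab bag. The bank contents, the getter names, and the shared seedable generator are all illustrative assumptions, not a prescribed layout.

```python
import random

# Hypothetical word banks, varied on purpose (accents, apostrophes, hyphens,
# multi-word surnames). Extend them with whatever your domain needs.
FIRST_NAMES = ["Ada", "José", "Siobhán", "Li", "Mary-Kate", "D'Andre"]
LAST_NAMES = ["Lovelace", "García", "Nguyen", "van der Berg", "O'Neil"]
STREETS = ["Main St", "Rue de la Paix", "Calle 8", "1st Avenue NE"]
CITIES = ["Springfield", "Zürich", "São Paulo", "Reykjavík"]
NOUNS = ["widget", "invoice", "project", "ticket", "playlist"]

# One shared generator, so a single seed reproduces a whole run of test data.
_rng = random.Random()


def seed(value: int) -> None:
    """Make the getter calls that follow reproducible."""
    _rng.seed(value)


# One getter per bank.
def get_first_name() -> str:
    return _rng.choice(FIRST_NAMES)


def get_last_name() -> str:
    return _rng.choice(LAST_NAMES)


def get_noun() -> str:
    return _rng.choice(NOUNS)


# Composed getters, built only from banks and smaller getters.
def get_full_name() -> str:
    return f"{get_first_name()} {get_last_name()}"


def get_address() -> str:
    return f"{_rng.randint(1, 9999)} {_rng.choice(STREETS)}, {_rng.choice(CITIES)}"


def get_project_title() -> str:
    return f"{get_last_name()}'s {get_noun()}"
```

Because every getter draws from the same generator, calling `seed(1)` before a session lets a developer replay exactly the data that exposed a bug.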

Assume your developers will need to compose different testing situations at a moment’s notice. Wouldn’t it be great to receive a pull request that includes a test module other developers can use to quickly fill out forms or get the application into a particular state?
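One way such a module might look is a registry of named scenarios, each a small function that builds a piece of application state. This is a Python sketch. The `scenario` decorator, the `SCENARIOS` registry, `build`, and the scenario names are hypothetical.

```python
import random
from typing import Callable, Dict, Optional

# Hypothetical registry: scenario name -> function that builds form/app state.
Scenario = Callable[[random.Random], dict]
SCENARIOS: Dict[str, Scenario] = {}


def scenario(name: str) -> Callable[[Scenario], Scenario]:
    """Decorator that registers a function under a scenario name."""
    def register(fn: Scenario) -> Scenario:
        SCENARIOS[name] = fn
        return fn
    return register


@scenario("signup")
def signup(rng: random.Random) -> dict:
    first = rng.choice(["Ada", "Grace", "Linus", "Margaret"])
    return {
        "name": first,
        "email": f"{first.lower()}.{rng.randint(1000, 9999)}@example.com",
        "password": "correct-horse-battery-staple",
    }


@scenario("signup-long-name")
def signup_long_name(rng: random.Random) -> dict:
    # Scenarios compose: start from "signup" and push one field to an extreme.
    state = signup(rng)
    state["name"] = "A" * 255
    return state


def build(name: str, seed: Optional[int] = None) -> dict:
    """Build a named scenario; pass a seed to reproduce the same state again."""
    return SCENARIOS[name](random.Random(seed))
```

A new scenario then arrives in a pull request as one decorated function. A testData value such as `signup-long-name` could look it up by name.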

One of the best things I have found about random data is that it forces a testing strategy on your early work with the application. Choose values with wide variation in your names, addresses, places, or nouns of any kind. Some of these values will pass validation and some will fail it. Continual use of random data by developers will only harden your application against the realities of consumer use.
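To see why variation matters, run a varied name bank through a deliberately naive validator of the kind that often slips into early code. Both the regex and the bank below are illustrative assumptions.

```python
import re

# A deliberately naive validator: ASCII letters and single spaces only.
NAIVE_NAME = re.compile(r"[A-Za-z]+(?: [A-Za-z]+)*")

# Hypothetical bank with real-world variation: accents, apostrophes, hyphens.
NAMES = ["Ada Lovelace", "Zoë Saldaña", "Conan O'Brien", "Mary-Kate Olsen", "Li Na"]


def rejected(names, pattern):
    """Return the names the validator refuses, in their original order."""
    return [n for n in names if not pattern.fullmatch(n)]


print(rejected(NAMES, NAIVE_NAME))
```

Each rejected name is a real customer the form would turn away. Random data surfaces that on a developer's machine instead of in a support ticket.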

This is not a replacement for robust QA. Instead, it’s something that will get you to your QA stage much faster. It will also identify bugs earlier given the wider variance in the data.

Email

An important sub-point about generated test data is the matter of email testing. The problem:

  1. You want your developers to be able to send and receive test emails.
  2. You still want each of these test accounts, and its emails, to remain distinct from the others.

DevOps people taught me this trick years ago:

If you have a Gmail address and send an email to yourusername+RANDOM_INFO@gmail.com, Gmail ignores everything between the + sign and the @, so the message lands in yourusername@gmail.com.

This is one of my favorite tricks with test data. You can create loads of user accounts, each with a unique address, and be sure all of their email arrives in the same inbox. This is also extremely useful for beta testing.
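A small helper makes the trick easy to use from test data code. This is a Python sketch. The function name and the random 8-character tag are assumptions. Plus addressing works with Gmail, but check that your own mail provider supports it before relying on it.

```python
import uuid
from typing import Optional


def plus_address(base: str, tag: Optional[str] = None) -> str:
    """Insert +tag before the @ of base; generate a random tag if none is given."""
    local, sep, domain = base.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {base!r}")
    if tag is None:
        # A short random tag keeps every generated account's address unique.
        tag = uuid.uuid4().hex[:8]
    return f"{local}+{tag}@{domain}"
```

For example, `plus_address("dev.team@gmail.com", "signup-42")` gives `dev.team+signup-42@gmail.com`, and that mail arrives in the `dev.team@gmail.com` inbox.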

Generating at the database level

Particularly in the early stages of a project, it’s extremely useful to have some kind of generated data at the database level.

Much like the utility above, I have used a similar process to create loads of data, filling each table with roughly 10–100 records along with the corresponding foreign-key relationships.
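Here is a minimal sketch of that idea using Python's built-in `sqlite3`. The two-table `users`/`orders` schema is a hypothetical stand-in for your own. The key point is to seed parent tables first, so child rows can pick real foreign keys.

```python
import random
import sqlite3


def seed_database(conn: sqlite3.Connection, rng: random.Random) -> None:
    """Create a toy schema and fill each table with 10-100 related rows."""
    conn.execute("PRAGMA foreign_keys = ON")  # have SQLite enforce REFERENCES
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            total_cents INTEGER NOT NULL
        );
        """
    )
    # Parents first: remember the ids SQLite assigns.
    user_ids = []
    for i in range(rng.randint(10, 100)):
        cur = conn.execute("INSERT INTO users (name) VALUES (?)", (f"user-{i}",))
        user_ids.append(cur.lastrowid)
    # Children second: every order points at a user that really exists.
    for _ in range(rng.randint(10, 100)):
        conn.execute(
            "INSERT INTO orders (user_id, total_cents) VALUES (?, ?)",
            (rng.choice(user_ids), rng.randint(100, 50_000)),
        )
    conn.commit()
```

A real schema needs its tables seeded in dependency order. This is the part that gets harder to maintain as the application matures.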

As your application matures, you will find this becomes less useful, and potentially problematic. Eventually, you should be able to use database backups to quickly get your environment up to speed.

Sometimes you are not allowed access to the production database, although this is unlikely with early-stage and MVP products. In these cases you have two choices:

  1. Continue using generators to create high quality data. This requires some investment on the part of your team.
  2. Have a robust testing apparatus to create scenarios, accounts and data in your staging environment. That environment should then be available for developer import.

Laziness and “let’s just manually create data” are not a great way to ensure your team is productive. You’re a developer; you can definitely figure out a faster way to generate data.

Backups and local import

This is essential. Most teams have a strategy for backing up their databases. Where they fail is that they do not have a good strategy for developers to quickly access and import the data.

Your developers should be able to grab a particular daily (and even hourly, within the past week) snapshot of data from whichever environment they prefer.
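That retention rule (hourly for a week, daily after that) is easy to express as a pure function. A cleanup task can call it before deleting anything. This is a Python sketch. The 30-day limit on daily snapshots is an assumed parameter, not something the flow requires.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def backups_to_keep(
    stamps: Iterable[datetime],
    now: datetime,
    hourly_window: timedelta = timedelta(days=7),
    daily_window: timedelta = timedelta(days=30),  # assumed retention limit
) -> List[datetime]:
    """Return the snapshot timestamps to keep, oldest first."""
    keep = set()
    kept_days = set()
    for stamp in sorted(stamps, reverse=True):  # newest first
        age = now - stamp
        if age <= hourly_window:
            keep.add(stamp)  # every snapshot from the past week
        elif age <= daily_window and stamp.date() not in kept_days:
            # Outside the hourly window: keep only the newest snapshot per day.
            keep.add(stamp)
            kept_days.add(stamp.date())
    return sorted(keep)
```

Everything not returned is a candidate for deletion, which is where the archive cleanup saves money.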

Imagine how useful this would be when QA and other developers report bugs. Bug reports are timestamped, so your developers could grab a snapshot of the database the reporter was testing on, taken within an hour of the report.
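Matching a bug report to a snapshot is a sorted-list lookup. This Python sketch picks the newest snapshot taken at or before the report time. The function name is hypothetical. Choosing "at or before" is one reasonable default, since a later snapshot may already reflect changes made after the bug was seen.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional


def snapshot_for_bug(snapshots: List[datetime], reported_at: datetime) -> Optional[datetime]:
    """Return the newest snapshot taken at or before reported_at, or None."""
    ordered = sorted(snapshots)
    # bisect_right finds the first snapshot strictly after the report time.
    i = bisect_right(ordered, reported_at)
    return ordered[i - 1] if i else None
```

With hourly snapshots, a bug reported at 6:45 maps to the 6:00 snapshot.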

If you’re just getting started, here is a standard flow that works for early stage products:

  1. Have a task server run a cron job that takes backups on whatever schedule you need. A Node/Python/Ruby/INSERT YOUR LANGUAGE script will do the job.
  2. Send the backup files to S3. Have a similar task that knows how to clean up the archive (saves some $).
  3. Make sure your local developers have access to that S3 environment. See Part I: Access.
  4. Developers should be able to run a script:
     scripts/import-database-from BETA 2017-07-04 6:00

The time arguments should be optional; without them, the script should fall back to the default daily backup.
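The argument handling for such a script might look like the Python sketch below. The environment list and the S3 key layout (`env/YYYY-MM-DD/HHMM.sql.gz`, or `daily.sql.gz` when no time is given) are hypothetical. Downloading the object and running your database's restore tool are left out.

```python
import argparse
from datetime import date, datetime
from typing import List, Optional

ENVIRONMENTS = ("DEV", "STAGING", "BETA", "PRODUCTION")  # hypothetical


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        prog="import-database-from", description="Import a database snapshot."
    )
    # type runs before choices, so "beta" is accepted as "BETA".
    parser.add_argument("environment", type=str.upper, choices=ENVIRONMENTS)
    parser.add_argument("day", nargs="?", type=date.fromisoformat,
                        help="YYYY-MM-DD; defaults to today")
    parser.add_argument("time", nargs="?",
                        help="H:MM; omit it to use the daily backup")
    return parser.parse_args(argv)


def backup_key(args: argparse.Namespace, today: Optional[date] = None) -> str:
    """Map the arguments onto an S3 object key under the assumed layout."""
    day = args.day or today or date.today()
    prefix = f"{args.environment.lower()}/{day.isoformat()}"
    if args.time is None:
        return f"{prefix}/daily.sql.gz"  # default: that day's daily backup
    at = datetime.strptime(args.time, "%H:%M")
    return f"{prefix}/{at:%H%M}.sql.gz"
```

Because both time arguments are optional positionals, `import-database-from BETA` alone resolves to today's daily snapshot.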

This kind of flow is easy to set up for both SQL and NoSQL databases in the early stages of a project. Your tools will inevitably grow more sophisticated, but for teams getting started, this will do just fine.

Providing your developers with rapid-fire test data and on-demand backups for import is critical to their workflow. Nothing makes me sadder than watching developers struggle to come up with new usernames and passwords to test their features.

Give your team the tools to work quickly. You want your team to write code. Better yet, you want your team to generate good ideas, not manual test data.