Quality Assurance and Pragmatism

Darius · Sempiler · Apr 10, 2019

QA is in a strange place these days.

In some organisations, manual testers are employed and software engineers are allowed to escape the responsibility of asserting quality for themselves.

Whilst this is an increasingly archaic model in modern software development — given the trend for engineers to own the quality of their work — companies with manual test resources often still try to work in a modern, agile way.

What invariably happens is that the estimate for a ticket suffers, because there is an attempt to fold the test effort performed by another person into the same task that represents the implementation.

The engineers will estimate based on their perception of task complexity, whereas testers will estimate based on their effort to perform the quality checks.

How do you reconcile these two estimates when they are fundamentally based on two orthogonal ideas?

And yet, the organisations who automate testing do not have it worked out either.

Vanity metrics like code coverage are chased and preserved fervently, at the expense of passion invested in asserting the usefulness of the constituent tests.

Unit tests are criminal examples of this. Copy/paste, quick ’n’ dirty, hacked-together unit tests often serve more to test the limits of the input parameters, or to tick a checkbox on the Definition of Done, than to actually check that the code works for the use cases in the wild.

Half the problem in my mind is that testing in this way requires further frameworks to be configured, and the tests live away from the actual source code they test, in separate files (if not directories) — out of sight, out of mind.

Another problem is religions like BDD and TDD, which only have value when behaviour is well understood. What they fail to accommodate is the creative, sketching phase used to feel out a feature, rather than framing an undercooked assumption as one.

QA and Sempiler

So where does the Sempiler project come into this rant?

It’s fairly critical a compiler tool is fit for purpose, but how can we be more pragmatic about asserting this?

One of the main components in the compiler pipeline is a parser. And any mainstream programming syntax in use today is rich in expressiveness and in the variability of the constructs you can compose.

Parsing source text correctly gets us quite far towards a compiler behaving as expected with your code.

But if I personally spend all my time writing tests the project will never see the light of day. There just isn’t the QA resource right now! …or the user base to inform it!

Let’s instead defer to cheaper strategies that help us build confidence in the code, rather than focus on outright analysis of it.

Strategies

Contrary to some misunderstandings, Sempiler is not based on a new language. In fact, we are implementing the TypeScript/JavaScript syntax to start with.

TypeScript is a superset of JavaScript, so asserting the parser works for it should give us confidence that it may work for vanilla JavaScript too.

Luckily, TypeScript is open source. And so is the test code.

However, we cannot just run their tests against our parser, because the outputs differ. What we can do, though, is make use of their fixtures.

Almost 5k files of TypeScript snippets, ranging from the typical to the esoteric. Perfect for exercising the parser.

TypeScript Test Fixtures Directory — Thanks OSS!

In my IDE I have two tasks, Test and Scratch.

Test will take those 5k files and run them through our multithreaded parser. Any errors encountered will be printed to the console.
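In spirit, that task boils down to something like the sketch below, written in TypeScript purely for illustration (the real tooling may live in a different language entirely). The names runFixtures and parseSource are assumptions, and the multithreaded fan-out is elided in favour of a simple loop:

```ts
// Sketch of a "Test"-style task: walk the fixture directory, feed every file to
// the parser, and print any errors to the console. Illustrative only.
import * as fs from "fs";
import * as path from "path";

type ParseFn = (source: string, filePath: string) => string[]; // returns error messages

function runFixtures(fixturesDir: string, parseSource: ParseFn): void {
  const files = fs
    .readdirSync(fixturesDir)
    .filter(f => f.endsWith(".ts"))
    .map(f => path.join(fixturesDir, f));

  let failures = 0;

  for (const file of files) {
    const source = fs.readFileSync(file, "utf8");
    const errors = parseSource(source, file);

    if (errors.length > 0) {
      ++failures;
      console.error(`${file}:`);
      errors.forEach(e => console.error(`  ${e}`));
    }
  }

  console.log(`${failures} of ${files.length} fixtures failed to parse`);
}
```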

It’s worth noting at this stage I am also not a fan of programming by exception. Instead, errors and other diagnostic messages from an operation are propagated through a custom Result type that also contains the result value.
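A minimal sketch of what such a Result type might look like — the field and helper names here are assumptions for illustration, not the actual Sempiler API:

```ts
// A Result carries both a (possibly partial) value and any diagnostics
// accumulated along the way, instead of throwing exceptions.
interface Diagnostic {
  kind: "error" | "warning" | "info";
  message: string;
  filePath?: string;
  line?: number;
  column?: number;
}

interface Result<T> {
  value?: T;                 // a partial result may exist even alongside diagnostics
  diagnostics: Diagnostic[];
}

function ok<T>(value: T): Result<T> {
  return { value, diagnostics: [] };
}

function fail<T>(message: string, filePath?: string): Result<T> {
  return { diagnostics: [{ kind: "error", message, filePath }] };
}

// Decide whether an operation "failed" without ever throwing.
function hasErrors<T>(result: Result<T>): boolean {
  return result.diagnostics.some(d => d.kind === "error");
}
```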

Anyway, now we know which fixtures fail to parse, but the first problem is that the output error messages say things like:

Identifier expected

Without the context (line, column, etc.) that message still requires us to step through just that file with a debugger until it chokes.

That is where Scratch comes in. It’s a task that will take just a single scratch file of input and try to compile it.

We dump the troublesome fixture code in that and then can step through easily with the debugger, without battling through the stacktraces of multiple concurrent threads at the same time.

That is fine but for each file we choke on, we have to manually:

  • Copy the fixture file path from the error output
  • Open the fixture file via the terminal
  • Copy the fixture file contents to our scratch file (we could combine this step with the one above but in practice I didn’t!)
  • Run Scratch with a debugger

It’s as laborious as it sounds. Especially when there were 100s of files the parser was choking on — not bad out of 5,000, but still soul destroying to debug.

To alleviate some of the pain I made a Q’n’D change that wipes the scratch file, and then appends each bad fixture (and some metadata about the file in a comment) to the scratch file as they are encountered — albeit with a clumsy and slow mutex!

Wiping the Scratch File
Appending Fixtures to the Scratch File
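To make that concrete, here is a rough sketch of the wipe-and-append plumbing, assuming a Node-style environment. The scratch path, the fixture marker comment, and the naive promise-chain lock are all illustrative stand-ins for the clumsy mutex mentioned above:

```ts
// Quick 'n' dirty scratch-file plumbing: wipe the file once up front, then
// append each failing fixture behind a crude lock so concurrent workers don't
// interleave their writes. Names and paths are hypothetical.
import * as fs from "fs";

const SCRATCH_PATH = "./scratch.ts"; // hypothetical location

// Wipe the scratch file before a test run.
function wipeScratchFile(): void {
  fs.writeFileSync(SCRATCH_PATH, "");
}

// A crude mutex: each append waits for the previous one to finish.
let writeLock: Promise<void> = Promise.resolve();

function appendFixtureToScratch(fixturePath: string, source: string): Promise<void> {
  writeLock = writeLock.then(() => {
    // Metadata comment so a snippet can be traced back to its original fixture.
    const entry = `\n// ==== fixture: ${fixturePath} ====\n${source}\n`;
    fs.appendFileSync(SCRATCH_PATH, entry);
  });
  return writeLock;
}
```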

Going through some of the problematic fixtures you start to realise something pretty crucial — some of these fixtures are intentionally invalid.

That makes sense of course, you want to assert the parser complains correctly if given bad input. However, in this naive setup we were getting many false negatives.

Extracting these intentionally bad files from our input set was largely manual — searching for filenames containing words like invalid or malformed.

Separating these invalid files out for another day reduced our error count. We will fix these when we do resilience testing based on error recovery — the parser’s ability to synchronise on a particular token and continue (after hitting a malformed syntactic construct).

Other input files contained features the parser did not support yet, such as JSX tags. I moved such fixtures to a different folder, such that they could be reintroduced into the fixture file playlist once the corresponding feature was implemented.
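Expressed as code, that triage might look something like the sketch below. The keyword lists, bucket names and folder policy are assumptions made for illustration, not the actual filters used:

```ts
// Crude filename-based triage: intentionally-invalid fixtures go in one bucket,
// fixtures using unsupported features (e.g. JSX) in another, and the rest stay
// in the active set that gets fed to the parser.
import * as path from "path";

const INVALID_HINTS = ["invalid", "malformed"];
const UNSUPPORTED_HINTS = [".tsx", "jsx"];

type Bucket = "invalid" | "unsupported" | "active";

function triageFixture(filePath: string): Bucket {
  const name = path.basename(filePath).toLowerCase();

  if (INVALID_HINTS.some(hint => name.includes(hint))) return "invalid";
  if (UNSUPPORTED_HINTS.some(hint => name.includes(hint))) return "unsupported";

  return "active";
}

// Example: only "active" fixtures are parsed for now.
// const active = allFixturePaths.filter(p => triageFixture(p) === "active");
```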

OK so now we have a solution that consolidates bad fixtures in our scratch file. The problem with this was alluded to above — error recovery.

The parser currently only has lightweight error recovery, focused on synchronising using the delimiter token in a list (for example, the ‘,’ or ‘)’ when parsing a list of parameters).

When the parser chokes on a construct somewhere in the scratch file and fails to subsequently synchronise quickly, the errors cascade and mount up.

What we probably need to implement is a way to reset the error mode on the parser once it encounters the next fixture in the scratch file. We don’t have this as yet, and so again we are inundated with false negatives — the root cause of which may be an innocuous missing semicolon on some far away fixture in the same file.
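A sketch of both ideas, under the assumption that fixture boundaries in the scratch file are marked with a recognisable comment (as in the append step earlier). Token shapes and function names are illustrative, not the parser’s real internals:

```ts
// (1) Panic-mode style synchronisation inside a delimited list, and
// (2) resetting error mode at a fixture boundary so one bad fixture cannot
// cascade false negatives into the next.
interface Token {
  kind: string;   // e.g. ",", ")", "identifier", "comment"
  lexeme: string;
}

interface ParserState {
  tokens: Token[];
  pos: number;
  inErrorMode: boolean;
}

// Skip ahead to the next delimiter or closing token of the list being parsed
// (e.g. ',' or ')' in a parameter list) and carry on from there.
function synchroniseOnDelimiter(state: ParserState, delimiter: string, closer: string): void {
  state.inErrorMode = true;

  while (state.pos < state.tokens.length) {
    const kind = state.tokens[state.pos].kind;
    if (kind === delimiter || kind === closer) return;
    ++state.pos;
  }
}

// Clear error mode once the next fixture boundary comment is reached.
function maybeResetAtFixtureBoundary(state: ParserState): void {
  const tok = state.tokens[state.pos];
  if (tok && tok.kind === "comment" && tok.lexeme.includes("==== fixture:")) {
    state.inErrorMode = false;
  }
}
```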

This was mitigated to a degree by implementing a PrintSnippet function: one that takes a bad token and prints the offending lexeme (i.e. it may actually be invalid, or just a bug with our parser… or both!) amongst the surrounding lines.

Naive PrintSnippet Implementation
Typical PrintSnippet Output
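A hedged approximation of such a helper is shown below; the signature, context window and formatting are guesses for illustration, not the actual implementation:

```ts
// Print the lines surrounding a problem location, with a caret under the
// offending column, given 1-based line/column coordinates.
function printSnippet(source: string, line: number, column: number, context = 2): string {
  const lines = source.split("\n");
  const start = Math.max(0, line - 1 - context);
  const end = Math.min(lines.length, line + context);

  const out: string[] = [];

  for (let i = start; i < end; ++i) {
    const lineNo = String(i + 1).padStart(4, " ");
    out.push(`${lineNo} | ${lines[i]}`);

    // Point at the offending column on the line that failed to parse.
    if (i === line - 1) {
      out.push(`     | ${" ".repeat(Math.max(0, column - 1))}^`);
    }
  }

  return out.join("\n");
}

// Example:
// console.log(printSnippet(fileContents, 42, 17));
```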

Drawing the Line

There is obviously more to do beyond this point, but already this amount of effort had consumed the best part of a week’s focus. That might not sound like much overall, but it is substantial effort in the context of an MVP.

And so I harken back to an earlier point: if I spent forever satisfying all 5k fixtures then the compiler would never be released.

Yes the parser may unfortunately choke on some common cases, but those can be fixed almost as quickly as they crop up. What I don’t want to do is sink my time into supporting 100s of intentionally quirky combinations that may never get exercised in the wild.

And you can try to predict how people will use your tool to the nth degree, but the moment it’s in their hands is when you’ll really know.

Moreover, we will forever be reacting to the latest, greatest TypeScript (or whatever language) release, and the new constructs it brings, and so we need to draw the line and be more pragmatic about feature completeness at this stage.

A feature is complete when it works for our current understanding of the requirements — ie. the actual data we feed it — not when it works for the entire universe of all possible inputs, as determined by the constraints.

Lastly, it needs to be noted that even when all files parse successfully, we are merely asserting that our parser does not choke on a file, not that the result it produces (the AST) is actually correct!

That is surely a subject for a different day…

braindump exited with status 0
