Canary in Software Factory

Panu Viljamaa
Jul 21, 2018


Coal miners used to keep canaries in cages in the mine. You may have heard of this strange practice. Why did they do that? As long as the canaries were singing, everything was fine. But if a canary died, it was time to run out of the mine. It meant there was poison gas in the mine, because the gas kills canaries before it kills humans.

Now think about your software. In a sense, software is like a mine. It has many tunnels, data paths along which data travels, almost like air inside a mine. And invalid, corrupt data can travel along those pathways too. How do you discover that this is happening before it is too late, before the space probe misses Saturn?

This article is about how to create software “canaries in the coal mine” with the open-source library Cisf.js.

Software Canaries

Software has its own equivalent of canaries. They are called “assertions”. You call a primitive that tests a condition. If the condition fails, the assertion “dies”: it throws an error. Therefore, if there is no error, you know everything is fine.

But isn’t that a tautology? What was the value of canaries again? The value is that you can detect something is wrong as soon as possible, before it becomes a disaster.

Canaries are cheap to acquire and cheap to keep. They eat little because they are little. Their value is in the information they provide. Therefore software canaries must be cheap as well. They must not take many keystrokes to create, nor add many lines of code to maintain.

Assertions plus Unit Tests

Assertions go together with unit tests like horse and carriage. Unit tests execute some code, then assertions state the conditions the results must satisfy. What is perhaps less widely appreciated is that assertions are useful at run time too, for several reasons.

  • Unit-tests: Run only at test-time
  • Assertions: Run both at test-time and at run-time
  • Unit-tests: Tell something about the externally observable behavior of software modules, software units
  • Assertions: Tell something about the internal state of software while it is executing

Canaries in a coal mine don’t tell us about the amount of coal produced. They tell us something about the internal state of our mining operation.

Unit tests are a way to assert something about the inputs and outputs of the software they test. But we also want to assert things about how system components interact with each other, and for that we need assertions. Assertions in turn need unit tests, because assertions can only tell us something if they get executed, and unit tests execute them. Together they make software robust, because they tell us something about both its external and its internal behavior. They make it easier to understand what the software does and also how it does it.

Why Inputs and Outputs are not the only thing that matters

You might think it does not matter what the software does internally as long as it produces the desired results. But that is not true: not only do you want correct results, you also want your software to be easy to modify when requirements change.

When you modify your software and keep its tests unchanged the tests tell you whether you have successfully refactored the software to do the same correct thing in a better way, faster for instance. You modify the software being tested but keep the tests the same, to tell you it still behaves the same.

But requirements change over time; they become better, more adapted to the world around them. Or maybe the world around them changes, in which case the requirements for the software must change too. In either case the software needs to be modified easily, but correctly.

Both unit tests and assertions help you achieve that goal because they both help you understand your software. Unit-tests help you understand the externally observable behavior of software, and detect if it changes. Assertions help you understand the internal behavior of software and whether that is the way you think it is.

Assertions are most useful as the “glue” that holds components together and allows them to interact, by describing the accepted interchanges between them. So while unit tests are needed to describe how a module looks from the outside, assertions are needed to describe its internal parts and what they can expect from each other.
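Here is a sketch of an assertion acting as that glue. parsePort() and describeServer() are two hypothetical components; the check at the top of describeServer() states what it expects from whoever calls it, so a bad hand-over fails loudly at the boundary rather than somewhere downstream.

```javascript
// Component 1 (hypothetical): turns configuration text into a port number.
function parsePort(text) {
  return Number.parseInt(text, 10); // NaN if the text is not numeric
}

// Component 2 (hypothetical): expects a valid TCP port from its caller.
function describeServer(port) {
  // The contract between the two components, checked on every call.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return `listening on port ${port}`;
}

console.log(describeServer(parsePort("8080"))); // listening on port 8080
```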

Let’s install some Canaries in our software

Adding canaries to your code must be cheap in terms of learning how to do it, in terms of the keystrokes needed to add them, and in terms of the effort needed to maintain the assertions once added. The surface area of the assertion API must be small, like the small birds themselves.

The value of canaries is their benefits minus the costs they cause. If they are too expensive, they will not be used and many miners will die. And I believe that is part of the situation today: adding assertions to your code is so expensive that they are not used as much as they should be.

The open-source JavaScript library Cisf.js provides a small, symmetric group of APIs with short names that are easy to remember, easy to learn and fast to type. Rather than just providing “assertions”, it lets you create “Types” and then assert that something is, or is not, an instance of a type. Types are composable: you can create new types out of existing ones.
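Cisf.js’s own implementation is not shown in this article, so the following is only a sketch of what “composable types” can mean, assuming a type is something that can test a value. makeType() and toPredicate() are hypothetical names, not Cisf APIs.

```javascript
// Hypothetical sketch: a "type" is an object with a test(value) predicate.
// This is NOT cisf's implementation, only an illustration of composition.

// Convert a constructor, null, or an existing type into a predicate.
function toPredicate(t) {
  if (t === null) return (v) => v === null;
  if (t && t.isType) return t.test; // already a composed type
  if (t === Number) return (v) => typeof v === "number" || v instanceof Number;
  if (t === String) return (v) => typeof v === "string" || v instanceof String;
  return (v) => v instanceof t; // classes such as Error or Buffer
}

// A value belongs to makeType(A, B, ...) if it matches any alternative.
function makeType(...alternatives) {
  const predicates = alternatives.map(toPredicate);
  return { isType: true, test: (v) => predicates.some((p) => p(v)) };
}

const MaybeError = makeType(Error, null);
const StringOrBuffer = makeType(String, Buffer);
// Composition: a new type built out of two existing types.
const ErrorOrData = makeType(MaybeError, StringOrBuffer);

console.log(ErrorOrData.test(null), ErrorOrData.test(42)); // true false
```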

Example from the Node-World

Let’s take as an example day-to-day use of the Node.js API. Node.js provides many asynchronous APIs for dealing with the underlying operating system. For example, there’s an API for reading a file asynchronously. Because it is asynchronous, you get the contents of the file only at some later point in time, and you also learn only later whether there was an error. Therefore the Node.js APIs follow the pattern of taking a callback function as an argument, which the system calls when the contents are ready, or when it knows it could not find the requested file.

Here’s the function header for fs.readFile() from the Node.js documentation:

fs.readFile (path[, options], callback)

To keep our focus on what we are talking about, let’s assume we have a similar function but without any ‘options’. In that case the function header is simpler:

readFile (path, callback)
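Because options are optional in the real fs.readFile(), the simplified readFile(path, callback) form can be tried directly. This sketch first writes a small temporary file (the name canary-example.txt is made up for the example) and then reads it. Note that readFile() returns before the callback runs.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Set-up for the example only: create a small file to read.
const file = path.join(os.tmpdir(), "canary-example.txt");
fs.writeFileSync(file, "tweet");

// The callback is called later, either with an error or with the
// contents (a Buffer, since no encoding option is given).
fs.readFile(file, (err, data) => {
  if (err) {
    console.error("could not read file:", err.message);
    return;
  }
  console.log("read:", data.toString());
});

// Printed first: readFile() returns before the callback has run.
console.log("readFile() has returned");
```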

The callback function cannot be just any function. It must be a function that expects the kinds of arguments Node.js will call it with. The Node.js documentation uses its own informal notation for expressing what fs.readFile() expects as arguments; for the callback it lists callback <Function>, with the parameters err <Error> and data <string> | <Buffer>.

The problem with this notation is that it is very informal. Unlike languages such as Java and C#, JavaScript has no formal, machine-readable, standard way of declaring the TYPES of the arguments a function expects to be called with.

The problem with informality shows in the fact that the argument ‘err’ does NOT in reality need to be an Error. If there is no error, it will typically be ‘null’, and null is NOT an instance of Error.

If there is an error, it can be a String telling you what the error was, or it might be an instance of the built-in class Error. The problem is that none of this extra information is expressed in the informal Node.js API declaration above. Why? Because there is no standard notation that would let you express it succinctly.

The CISF Way

By including cisf.js in your code you can express APIs like the one above formally, within the source code. You could declare the argument types of the readFile() callback as follows:

const {Type, A}      = require ("cisf");
const MaybeError     = Type (Error, null);
const StringOrBuffer = Type (String, Buffer);

function callback (err, data)
{ A ([MaybeError, StringOrBuffer], err, data);
  // From here on, err and data are known to have the declared types.
}

‘Type()’ and ‘A()’ are two API functions you get by requiring the module “cisf”. There are more such API functions exported by cisf.js, but thanks to ES6 (ECMAScript 2015) destructuring syntax you can conveniently import just the ones you want.

“A” is short for “Arguments” or “Array”. Its first argument is an array of types, followed by N arguments, each of which must be an instance of the corresponding element in the types array. So the code above says that the first argument of callback(), ‘err’, must be an instance of type MaybeError, and the second argument, ‘data’, must be an instance of StringOrBuffer.

Note that this is not just about someone writing documentation and saying “The first argument must be of type MaybeError”.

That alone would be very helpful, because the meaning of MaybeError is now unambiguous: it is determined by source code. But this is not about somebody saying so. It is a fact; it must be true. Why? Because otherwise the function callback() above will not run to completion. It will throw an error, it will die like a canary in a coal mine.
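To show the mechanism rather than the library, here is a plain-JavaScript sketch of a checker in the spirit of A(). It is not Cisf’s code: checkArgs() is a hypothetical name, and types are represented simply as predicate functions.

```javascript
// Hypothetical stand-in for an A()-style check: N types, then N arguments.
// Types are plain predicate functions in this sketch.
const isMaybeError = (v) => v === null || v instanceof Error;
const isStringOrBuffer = (v) => typeof v === "string" || Buffer.isBuffer(v);

function checkArgs(types, ...args) {
  if (types.length !== args.length) {
    throw new TypeError(`expected ${types.length} arguments, got ${args.length}`);
  }
  types.forEach((isType, i) => {
    if (!isType(args[i])) {
      throw new TypeError(`argument ${i} has an unexpected type`);
    }
  });
}

function callback(err, data) {
  checkArgs([isMaybeError, isStringOrBuffer], err, data);
  // Past this point data is known to be a string or a Buffer.
  return data.toString();
}

console.log(callback(null, "tweet")); // tweet
```

One consequence of these particular types: a call such as callback(new Error("boom")) also fails the check, because data is then undefined. Whether Cisf’s own types treat that case differently is not covered here.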

In practice you would put frequently used types like ‘MaybeError’ in a file of their own, which you then import wherever you need those types.

x() and ok()

In addition to ‘A()’, two other commonly used Cisf APIs are:

x (value, ...Types)
ok (truthyValue)

x() causes an error if its first argument is not an instance of one of its remaining arguments.

ok() causes an error if its argument is not ‘truthy’, for example if it is false, undefined, null, or “”.

The point of x() and ok() is that their names are short and mnemonic, so they are easy to write, easy to read and easy to remember. If they were not, you would not use them.

The mnemonic for x() is that it is like writing an ‘x’ into a check-box. It checks that its first argument has one of the types given by the remaining arguments. If called with just one argument, it checks that the argument is not undefined or null.
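The behavior described above can be imitated in a few lines of plain JavaScript. This is a hypothetical sketch written only from the description in this article, not Cisf’s source; in particular, treating null as a “type” and accepting both boxed and unboxed numbers and strings are assumptions.

```javascript
// Hypothetical stand-ins for x() and ok(), based only on the text above.

function isOfType(value, type) {
  if (type === null) return value === null;
  if (type === Number && typeof value === "number") return true; // unboxed
  if (type === String && typeof value === "string") return true; // unboxed
  return typeof type === "function" && value instanceof type;    // boxed/objects
}

// x(value, ...types): value must match at least one of the types.
// x(value) alone: value must not be undefined or null.
function x(value, ...types) {
  if (types.length === 0) {
    if (value === undefined || value === null) {
      throw new TypeError("x(): value is undefined or null");
    }
    return;
  }
  if (!types.some((type) => isOfType(value, type))) {
    throw new TypeError(`x(): value of type ${typeof value} matches no expected type`);
  }
}

// ok(value): value must be truthy.
function ok(value) {
  if (!value) {
    throw new Error("ok(): value is not truthy");
  }
}
```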

There are more API functions in Cisf, but not many. The Cisf API is kept minimal on purpose, so that it is easy to learn. You can read about the other API functions in the README.md file that comes with the installation.

Comparison

In summary, consider these two ways of writing a function that checks that its argument is a Number and throws an error if it is not:

A) With cisf.js:

const {x} = require ("cisf");

function myFunk (n)
{ x(n, Number);
}

B) Without cisf.js:

function myFunk (n)
{ if ( ! (n instanceof Number) &&
       (typeof n !== "number")
     )
  { throw new TypeError ("myFunk(): argument must be a number");
  }
}

Version B) is much longer because it must account for the fact that in JavaScript numbers come in two varieties, “boxed” and “unboxed”. Cisf.js knows about that and treats both the same.
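The two varieties are easy to see in plain JavaScript, and they explain why version B) needs both tests: typeof recognizes only the unboxed number, instanceof only the boxed one. isNumber() is a made-up helper name.

```javascript
const unboxed = 5;
const boxed = new Number(5); // a Number object wrapping 5

console.log(typeof unboxed, unboxed instanceof Number); // number false
console.log(typeof boxed, boxed instanceof Number);     // object true

// Accept both varieties, which is what version B) spells out by hand.
function isNumber(n) {
  return typeof n === "number" || n instanceof Number;
}

console.log(isNumber(unboxed), isNumber(boxed), isNumber("5")); // true true false
```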

Version B) is too long to be used very often in practice, even though it would clearly help reduce bugs in your software and make it more readable and self-documenting.

The reason you are better off asserting the type of number arguments in particular is that in JavaScript you can treat almost anything as a number and perform arithmetic operations on it.

So if an argument was supposed to be a number but is not, it probably will NOT crash your program. It will just make it produce wrong results, which you might not notice until the space probe misses Jupiter.
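A short illustration of such silent wrong results. addTax() and addTaxChecked() are hypothetical functions; the checked version uses a plain typeof/instanceof test where Cisf would let you write x(price, Number).

```javascript
// No type check: JavaScript coerces the operands instead of failing.
function addTax(price, rate) {
  return price + price * rate;
}

console.log(addTax(100, 0.25));   // 125
console.log(addTax("100", 0.25)); // 10025  ("100" + 25 is string concatenation)
console.log(addTax(null, 0.25));  // 0      (null is treated as 0)

// With a canary: a wrong argument type fails immediately.
function addTaxChecked(price, rate) {
  for (const n of [price, rate]) {
    if (typeof n !== "number" && !(n instanceof Number)) {
      throw new TypeError("addTaxChecked(): expected numbers");
    }
  }
  return price + price * rate;
}
```

None of the unchecked calls throws; only the checked version turns the bad input into an immediate, visible failure.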

Getting Cisf.js

Assuming you have Node.js installed, you can get cisf.js by running this in a command window:

npm install cisf

If you don’t have Node.js you can download cisf.js from GitHub, see the link below. Or you can install Node.js.

Next

My next post about Cisf.js is:

https://medium.com/@panuviljamaa/easy-way-to-create-node-js-callbacks-430ef188b347

Links:

Cisf.js Github Repo: https://github.com/panulogic/cisf

Cisf presentation for NYC Node.js meetup July 2018: https://panulogic.github.io/cisf/doc/index.htm
