Errors, Composition & Not Going Crazy (part 2)

Marcus Zetterquist
Floyd Programming Language
Jun 11, 2019


This is part 2. Here is part 1.

NORMALISED ERROR NAMESPACE

The idea is to have one consistent strategy throughout your code for describing what went wrong. An error can be described along three properties:

What went wrong / why? Read, bad-format, file-not-found, out-of-memory

Which type of data was being worked on? JPEG, user-account, preference-file, game-update.

In which domain did things go wrong? Math, HTTP, REST, file-handling.
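One way to picture such a normalised error value is a small record with one field per property. This is only a minimal sketch: `ErrorValue`, `What` and `is_read_error` are hypothetical names made up for illustration, not part of any library.

```python
from dataclasses import dataclass
from enum import Enum


class What(Enum):
    """What went wrong / why (hypothetical, deliberately short list)."""
    READ = "read"
    BAD_FORMAT = "bad-format"
    FILE_NOT_FOUND = "file-not-found"
    OUT_OF_MEMORY = "out-of-memory"


@dataclass(frozen=True)
class ErrorValue:
    what: What      # what went wrong / why
    subject: str    # which type of data was being worked on
    domain: str     # in which domain things went wrong


def is_read_error(err: ErrorValue) -> bool:
    # A generic client can test one property without knowing the other two.
    return err.what is What.READ


err = ErrorValue(What.READ, subject="JPEG", domain="file-handling")
```

Because each property is a separate field, a client can ask "was this a read error?" without caring whether it came from HTTP or the file system.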

SYSTEM WIDE STATIC FLAT NAMESPACE

Some systems simply keep one list of all errors, shared by all modules in the system. Unix (errno codes) and HTTP (status codes) do this.

This gives you a shared error namespace, but it hard-couples every module to the list itself, which makes the list very expensive to update. It also creates a tight implicit coupling between the modules, which breaks their composability: how do you add more errors without extending the shared list?

TREE NAMESPACE

Some systems use a tree of errors. There is a built-in set of base types that modules inherit and make more specific. It’s easy to add more errors without affecting any other modules. This allows modules to detect the basic errors or go for a more specific version of an error.

Examples are Java and C++, which use exception classes.

logic_error
— invalid_argument
— domain_error
— length_error
— out_of_range
runtime_error
— range_error
— overflow_error
— underflow_error
— regex_error(C++11)
— system_error(C++11)
— — ios_base::failure(C++11)
— — filesystem::filesystem_error(C++17)
bad_alloc

Above: Some of the C++ standard exceptions. Notice that the error tree isn’t consistent in what property it represents: bad_alloc is an action-based error while filesystem::filesystem_error is a (sort of) module-based error.

There are some problems with the tree-namespace technique. Since not all possible errors are known when the code is written, it’s hard to implement thunking. You need program-wide introspection or a registry to find all error types.

Only the built-in base errors can be used in a truly composable way since all other errors are tied to the module that introduces that specific error.

Also, the error tree needs to decide if it’s a tree of actions, a tree of subjects or a tree of modules. Errors will combine neatly for the property you base the tree on but become complex for the other properties. Example: It’s simple to detect read-errors if you use a tree of actions. But detecting an HTTP error will be more complex, since those errors will be spread out as subtypes in different branches of the tree.

Example: if the error tree is based on modules, there might be a base class for filesystem-error and another for http-client-error. This makes it impossible for a general function to detect read errors in a general way: it doesn’t know about filesystem-error and http-client-error, nor should it need to.
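The trade-off can be sketched with a small action-based tree. All class and function names here are hypothetical, chosen only to illustrate the point:

```python
class AppError(Exception):
    """Hypothetical root of an action-based error tree."""


class ReadError(AppError):
    pass


class WriteError(AppError):
    pass


class FileReadError(ReadError):      # introduced by a file module
    pass


class HttpReadError(ReadError):      # introduced by an HTTP module
    pass


class HttpWriteError(WriteError):    # also introduced by the HTTP module
    pass


def is_read_error(exc: Exception) -> bool:
    # The tree is based on actions, so this is a single check.
    return isinstance(exc, ReadError)


def is_http_error(exc: Exception) -> bool:
    # The module property cuts across the tree: every HTTP subtype must be
    # listed by hand, and the list goes stale when the module adds a new one.
    return isinstance(exc, (HttpReadError, HttpWriteError))
```

Flipping the tree to be module-based would make `is_http_error` trivial and `is_read_error` the function that needs the hand-maintained list.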

ERRORS: TELL WHAT HAPPENED OR WHAT TO DO?

How does an error value describe what went wrong? A function that reports an error can be used by many different clients, and it cannot make assumptions about them and still be composable. This leads to a few rules:

Error values should be good science and report facts only.

Report what actually happened, using the terminology and semantics of the module detecting the error.

No assumptions, no guesses. No ideas how to handle the problem.

Any assumption about how the error will be handled is speculation about the client, which breaks composition.

ERROR CONTEXT & UNWINDING

You get more context for the error while unwinding the call stack. This is somewhat counterintuitive. Closer to the actual error we know more precisely what happened. But higher up in the call stack we know more about what the application was trying to accomplish, which is also valuable information.

The code that gets an error returned to it knows more about the context of the error than the function that returned it.

Example: unwind sequence vs what the program could theoretically tell the user at that point:

  1. posix::fread() implementation detects out-of-bounds. Message to user: “Process X failed: read outside the bounds of file ‘~/Desktop/hello.wav’.”
  2. file_module::read_file_stream(). Message to user: “Failed reading file stream ‘~/Desktop/hello.wav’.”
  3. wave_module::read_wave_file_header(). Message to user: “The wave file ‘~/Desktop/hello.wav’ has a defective RIFF header.”
  4. game_audio_lib::preload_sfx(). Message to user: “Some files have illegal file formats and were skipped.”
  5. my_game::init(). Message to user: “Your game installation is corrupt and needs to be reinstalled. Open the Steam application, select ‘Library’, right-click the game and choose ‘Reinstall’.”

Refining the error value on the way up gives a much better response than trying to handle the error earlier. I know which error I would rather get when I attempt to replay Portal 2 for the 4th time.
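In a language with exceptions, one possible way to refine an error on the way up is to wrap it at each layer while keeping the original as the cause. The sketch below uses Python's `raise ... from ...`; the function and class names loosely mirror the unwind sequence but are hypothetical, and the lowest layer simply simulates a failed read.

```python
class FileStreamError(Exception):
    pass


class WaveHeaderError(Exception):
    pass


class InstallationCorruptError(Exception):
    pass


def read_file_stream(path):
    # Lowest layer (simulated failure): it only knows that a read failed.
    raise FileStreamError(f"Failed reading file stream '{path}'")


def read_wave_file_header(path):
    try:
        return read_file_stream(path)
    except FileStreamError as e:
        # This layer knows the bytes were supposed to be a RIFF header.
        raise WaveHeaderError(f"The wave file '{path}' has a defective RIFF header") from e


def init_game(path):
    try:
        return read_wave_file_header(path)
    except WaveHeaderError as e:
        # The application knows what the file was needed for.
        raise InstallationCorruptError("Your game installation is corrupt") from e


def cause_chain(exc):
    """Return the type names from the outermost error down to the root cause."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__cause__
    return names
```

The low-level facts are not lost: each wrapper keeps its predecessor in `__cause__`, so a log can still show the full chain while the user sees only the top-level message.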

TYPE-E APPLICATION ENDPOINT

Only communicate errors to the outside world at one specific point in the stack of modules. This is the layer where you do all your application’s TYPE E (application endpoint) handling. This avoids the classic problem where you get many (hundreds!) of alert boxes for the same user input.

One user action / client request should always result in exactly one response.
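A minimal sketch of such an endpoint, assuming a hypothetical `handle_request` function that is the only place allowed to talk to the outside world:

```python
def handle_request(action, *args):
    """Hypothetical application endpoint: run one user action, return one response."""
    try:
        result = action(*args)
    except Exception as e:
        # Every error from the layers below funnels into this single response.
        return {"ok": False, "message": f"Could not complete request: {e}"}
    return {"ok": True, "result": result}
```

Modules below this layer never show alerts or write responses themselves; they only propagate error values upward.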

CHEATING AND ASSUMPTIONS

All errors must be handled in every module. Making assumptions like “if we run out of memory we can’t do anything anyway” is not OK: it breaks composition and limits the use of that module. This kind of compromise can be made at the top level of your application, not embedded in composable modules.

Making local decisions on which errors are meaningful to propagate breaks composition and makes the module less useful.

Make error handling rock solid in all production code.

ERROR GRANULARITY

For most vanilla code you just roll back and propagate errors (code type B). The code doesn’t care at all which error happened. Error or no error is enough.

HERO ERRORS

For most function prototypes there are a few errors that matter more than others to the typical client. It’s a good idea to specify these explicitly.

Example: open_file(string path) may list file-not-found and permission-error. The function can still return any other error too, like out-of-memory or even a new error implemented in a new version of a submodule.
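In Python terms this could look as follows. `read_preferences` and `load_or_default` are hypothetical helpers; `FileNotFoundError` and `PermissionError` are the built-in exceptions that `open()` raises for those two cases.

```python
def read_preferences(path: str) -> str:
    """Return the text of a preference file.

    Hero errors: FileNotFoundError, PermissionError.
    Any other exception may still propagate; the list is not exhaustive.
    """
    with open(path, encoding="utf-8") as f:
        return f.read()


def load_or_default(path: str, default: str = "") -> str:
    try:
        return read_preferences(path)
    except FileNotFoundError:
        # This client cares about one hero error and handles it explicitly.
        return default
    # Everything else, including PermissionError, propagates unchanged.
```

The docstring names the errors a typical client will want to act on, without promising that nothing else can happen.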

FIXED ERROR SET GUARANTEE

If a function (or especially a function in a protocol / interface) declares it only reports a fixed set of errors, then it can never be implemented or updated in a way that can cause other errors, not without breaking clients. Alternatively it needs to bend the truthfulness of the returned errors.

If your function can make a call via an interface / callback / generic, then those called functions are also limited to your function’s fixed set of errors forever.

This is a big problem when you connect things together using interfaces, and it will likely force you to falsify error facts at some point.
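A small sketch of how the falsification creeps in. Assume a hypothetical `parse_numbers` whose contract says it only ever raises `ParseError`, but which calls a client-supplied callback:

```python
class ParseError(Exception):
    """The only error parse_numbers() is declared to raise (hypothetical contract)."""


def parse_numbers(text, convert):
    values = []
    for token in text.split(","):
        try:
            values.append(convert(token))
        except ParseError:
            raise
        except Exception as e:
            # The callback failed in a way the contract never listed.
            # To keep the promise we must relabel the error as ParseError,
            # which bends the facts about what actually happened.
            raise ParseError(f"could not convert {token!r}") from e
    return values
```

Whatever really went wrong inside `convert` (a network failure, out-of-memory, a bug) now reaches the client labelled as a parse problem.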

Notice that type-B code isn’t really affected by this. It does the same thing regardless of the type of error.

NO-ERROR GUARANTEE

This takes the fixed set of errors to the extreme.

Defining a function to never return errors proclaims it has special superpowers forever.

Promising that a function is no-error comes with special responsibilities. It means clients can trust this function to never fail, today or in the future. It means the function can never do anything that can fail, such as allocating heap memory, or call any function that is not also a no-error function.

No-error functions are usually leaf functions that are carefully designed to allow transactional things, like compare-and-swap or destructors or rollback.

Notice that type-B code isn’t really affected by this. It does the same thing regardless.

ERROR SAFETY GUARANTEES

Exception safety guarantees define a set of terms for a function’s robustness: https://en.wikipedia.org/wiki/Exception_safety. Let’s borrow them:

  1. No-error guarantee
  2. Strong error safety, also known as commit or rollback semantics.
  3. Basic error safety, also known as a no-leak guarantee.
  4. No error safety: No guarantees are made.

Most code should aim for guarantee 2. Code with guarantee 3 is only OK at the higher layers of an application.

Code with guarantee 3 is unstable bedrock to build reliable programs on top of.

Avoid 4 for production code.
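The difference between guarantees 2 and 3 can be sketched with two hypothetical functions that add validated items to a list. The strong version works on a copy and commits at the end; the sketch assumes the final commit step itself does not fail.

```python
def add_all_strong(target: list, items, validate) -> None:
    """Strong guarantee: target is either fully updated or left untouched."""
    staged = list(target)        # work on a private copy
    for item in items:
        validate(item)           # may raise; target is still unchanged
        staged.append(item)
    target[:] = staged           # commit (assumed not to fail in this sketch)


def add_all_basic(target: list, items, validate) -> None:
    """Basic guarantee: nothing leaks, but target may be half-updated on error."""
    for item in items:
        validate(item)
        target.append(item)


def must_be_positive(x):
    if x <= 0:
        raise ValueError(f"not positive: {x}")
```

If validation fails halfway through, `add_all_strong` leaves the caller's list exactly as it was, while `add_all_basic` leaves behind the items added before the failure.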

RUNTIME VS PROGRAMMING ERRORS

There are two categories of errors:

runtime errors = reading a file can fail for reasons outside of the application’s control. Maybe the user deleted the file? There is no way to protect your code from runtime errors.

programming errors = a function protests that its inputs are wrong or discovers corrupt state. It is theoretically possible to make programs that never have programming errors, but with a big caveat, see below.

At first it seems like programming errors / defects are something completely different from runtime errors. More like asserts and dropping into the debugger than normal error handling.

But categorising these errors that way breaks the rule that errors are facts without assumptions about the client.

The client needs to decide how to handle these errors, just like with read-errors.

Example: if you write a game, then failing to find a 3D-model is a critical error and the game exits with an error to the player. If you are writing a game level editor using the same stack of modules, not finding a 3D-model happens every day and might only cause a small warning icon. It’s the client that decides the severity.

Example: If your module gets bad-param-error when it calls a posix function, then your code is defective and should break into the debugger so you can fix it. BUT: If you are writing some sort of server, bad-param requests from the outside world will happen all the time when 3rd party developers develop their code and accidentally send illegal requests. You absolutely don’t want your server to shut down or drop into the debugger.

Example: if you are programming your own byte code compiler/interpreter, then all typical programming errors, like compiler errors, are runtime errors from the client’s perspective.
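The 3D-model example can be sketched like this: one shared loader reports the same fact, and two hypothetical clients decide what it means. All names are made up for illustration.

```python
class ModelNotFound(Exception):
    pass


def load_model(name, available):
    # Shared module: reports the fact and nothing more.
    if name not in available:
        raise ModelNotFound(name)
    return available[name]


def game_load(name, available):
    # Game client: a missing model is fatal, so the error simply propagates
    # up to the application endpoint.
    return load_model(name, available)


def editor_load(name, available, warnings):
    # Level-editor client: a missing model is routine, so note it and go on.
    try:
        return load_model(name, available)
    except ModelNotFound as e:
        warnings.append(f"missing model: {e}")
        return None
```

`load_model` never needs to know which kind of client it is serving; the severity is decided entirely by the caller.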

Self promotion: don’t forget to check out Floyd, a new programming language on GitHub: https://github.com/marcusz/Floyd
