Sapir-Whorf and You

There is a tidbit in the Python glossary:

EAFP
Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterised by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.

There is a conjecture called the Sapir-Whorf hypothesis which, in a nutshell, states that the languages you ‘speak’ shape the way you think and reason about the world around you.

If we take Sapir-Whorf as proven fact and look at programming languages, a curious discovery crops up: the languages people code in certainly do shape the way they think about building programs. The above quote from the Python glossary is a shining example; most other imperative languages shun exceptions except for genuinely exceptional cases, but Python makes them normal control flow! An example in Python is converting a string to an int:

s = '7'
i = int(s) # => 7

What if we give int garbage?

s = 'string'
i = int(s) # => Raises a ValueError exception

If we write a program that will be turning strings into integers and there's a chance the string is garbage, the Python way is to assume the string is fine, but wrap the throwing statement in a try-except:

s = 'string'
i = 0
try:
    i = int(s)
except ValueError:
    pass

Now the throwing of the exception behaves much like an if-else, apart from the performance hit when the exception is actually thrown and handled. In contrast, per the same quote above, we have the C-ism: Look Before You Leap. C has no exception system, so return values, stack snapshots, and other error ‘handling’ mechanisms were born (e.g. errno et al.). For a long time C programs used numeric return values to designate particular types of errors, and many C-derived languages adopted the same convention (“Error Code 8275”…). In C, there is a function atoi which is similar to int in its aim:

#include <stdlib.h>

int main(void) {
    char* s = "7";
    int i = atoi(s); // => 7
    return 0;
}

If we pass garbage to atoi, it will simply parse as much as it can, and if it can't make sense of anything, it will simply return 0. Python's int, by contrast, only gives you zero if you invoke it with no arguments whatsoever, e.g. int(), or if you actually pass it a string of one or more zeros.

So how is this ‘looking before you leap’? Since C is only handling values and not throwing exceptions, C has to be careful about what it returns and so needs to define what is valid for various cases. Don’t get me wrong, C is not a type-safe language and has plenty of undefined, dark corners! If we were to look into the implementation of atoi, though, we’d probably see something like:

#include <string.h>
#include <ctype.h>

int atoi(char* buf) {
    int rv = 0;
    if (buf != NULL) {
        size_t len = strlen(buf); // XXX buf may not be nul-terminated!
        for (size_t i = 0; i < len; ++i) {
            char c = buf[i];
            if (isdigit(c)) {
                rv = rv * 10 + (c - '0');
            } else {
                break; // Stop parsing.
            }
        }
    }
    return rv;
}

And we can already see the ‘looking’ that is going on here. One could argue that we could put asserts in, but the point of an error versus an exception is that errors are for irrecoverable situations, and not being able to parse an integer string should be recoverable. This implementation has a glaring, common C problem: strlen will read past the end of the buffer if the string passed does not contain a nul-terminator. In this case we could make a modified atoi which accepts the length as an additional parameter, passing the responsibility of the length calculation onto the client who crafted the string in the first place. The signature would change to something like:

int atoi_new(char* buf, size_t len);

and the reference to strlen removed.

So are these the only two ways to program? Hell no! A lot of this is boilerplate that can be abstracted away with a strong type system. The common go-to in a strongly typed language such as Haskell is read, but any time you're turning a String into some datatype you're effectively writing a parser, and read is no different: it will throw exceptions just like its imperative cousins! That is no better than our Python case, so can we do better and leverage the type system to help us? Hell yes!

import Text.Read (readMaybe)

i :: String -> Maybe Int
i s = readMaybe s

main :: IO ()
main = print (i "12") -- => prints "Just 12"

To me, this is what continually draws me back to types. When you start thinking type-first, you notice how other approaches are definitely shaped by their idioms. The C tradition of returning a lone int and conflating error codes with possibly valid return values can be circumvented. Taking a lesson from the typed approach, our C example could return a struct (a product type) holding both the value and the error, which the caller can then check:

#include <stddef.h>

struct intPair {
    int val;
    int code; // An error code, 0 on success.
};

struct intPair strToInt(char* str, size_t len);

We could also make our own form of Maybe and check the tag of the struct to see whether a value came back or not. Using union for sum types is a bit tricky in C: a union can only hold one member at a time, and once one member is set, reading the others can cause all sorts of nastiness.

The inspiration for this post was Chris Martin mentioning how he felt that a programming language is more like a calculus or an algebra, and I think that's a direct example of Sapir-Whorf. Working in languages that are fuelled by denotational semantics makes you see everything from that same mathematical standpoint. In contrast, a lot of imperative languages lead people into thinking about the machine's architecture first.

I want to stress that no one way is ideal over the other for every situation. As always, there are no silver bullets. Just as it is a positive thing to learn a new language as often as you can, it's also important to learn new paradigms and tools. If you're in the camp that languages are tools, this adheres to the metaphor that the more tools you have in your kit, the better off you are; in this case, though, the tools are the mental concepts we pick up as we continue to learn from as many places as we can.