On Zed Shaw’s notion of String typing (and typing in general)

Thiago Silva
Nov 28, 2016


(or: “On the fact that even famous authors of programming language books create confusion about types”).

Recently, Zed Shaw (of “Learn Python the Hard Way” fame) wrote a post called The Case Against Python 3 (For Now), in which he rants about a number of issues with Python 3 and its community. In this post, I’m concerned with one issue he raised, “Statically Typed Strings”, to which he devotes a section. On that particular subject, his analysis is confused and confusing, which I can only attribute to a lack of basic understanding of types, something that is very common in our field.

To his credit, however, the practical concerns he raises, beneath the stumbling words he uses, are legitimate. In any case, this is not a rebuttal of any point of his thesis (a thesis I have no interest in); I’m just taking the opportunity to write down some clarifications of a technical concept, namely types, in the hope of helping programmers who are familiar with this subject but confused about it.

In what follows, I’ll walk through Shaw’s “Statically Typed Strings” section and comment on the relevant parts.

“Statically Typed Strings”

In the first sentences under this heading, Shaw writes:

Python is a dynamically typed language. That means I do not have to know the type of variable to use it.

Python is a “dynamic language” in the sense that it does not require (and general usage does not involve) a static type checker: a procedure that analyses the structure of the user code (without running it — hence the word static) and reports typing errors prior to executing it.

If I may put on a nit-picking hat for a moment, “dynamic” in “dynamic language” does not mean that the programmer does “not have to know the type of variable to use it”. He or she may not have to declare its type, but that’s different. Every use of a variable carries an expectation about its type. Note that its type is not necessarily its class. A type, in this context, can be seen simply as the set of operations that the values referenced by a variable are expected to support. Such a type may not even have a name. So, a certain type may simply be all values that can be added with “+” to another value without raising a TypeError. Any variable that participates in an expression therefore carries an expectation held by the programmer who wrote it, and that expectation is knowledge of types.
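To make the idea of a nameless type concrete, here is a minimal sketch. The function add and the class Money are hypothetical names of my own, not taken from Shaw’s post; the only thing add expects of its arguments is that “+” works between them.

```python
def add(a, b):
    # The only expectation placed on a and b: they support "+" with each other.
    return a + b


class Money:
    """Hypothetical class, unrelated to int, str or list, that supports "+"."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)


# Values of unrelated classes all satisfy the same unnamed "addable" type.
results = [
    add(1, 2),
    add("x", "y"),
    add([1], [2]),
    add(Money(150), Money(50)).cents,
]

# A pair of values that does not satisfy the expectation fails when add runs.
try:
    add(1, "y")
    mismatch = None
except TypeError as exc:
    mismatch = type(exc).__name__
```

Nothing here declares a type, yet whoever wrote add clearly had one in mind: the call with 1 and "y" violates it, and the interpreter reports that with a TypeError.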

Next, Shaw presents the following code and discusses the difference in behavior between Python 2 and Python 3:

def addstring(a, b):
    return a + b

He notes that calling addstring(u'x', b'y') yields different results in the two versions of Python: version 2 returns the string u'xy', while version 3 raises a TypeError exception.

This change in version 3, he notes, is very inconvenient for programmers. I suppose any programmer who has had to deal with Unicode and byte strings in Python would likely nod in agreement. However, later on, he calls this a “Dynamic and Static Mismatch” and awkwardly writes:

One fatal flaw of this decision to “static type” the strings is Python lacks the type safety gear to deal with it.

Here, he says that strings are “statically typed”, which presumes (and can only make sense if) a static type checker is at work. However, a static type checker is nowhere to be found in the standard Python stack (2 or 3). His use of quotes around “static type” seems to indicate that he is unsure how to refer to this, or just does not know how to present the issue in a better way.

One better way is simply this: the behavior of “+” changed. More precisely, it became stricter in version 3. Of course, that behavior is only observed when executing the user’s code, so it is always dynamic (and never static, in the common meaning of the word in such contexts).
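A small sketch of that dynamic behavior, assuming it is run under Python 3 (the variable names are mine). Defining addstring raises nothing; the error only appears when a mixed call is actually executed.

```python
def addstring(a, b):
    return a + b


# Operands of the same kind concatenate as before.
same_text = addstring(u"x", u"y")    # str + str
same_bytes = addstring(b"x", b"y")   # bytes + bytes

# Mixing text and bytes is rejected, but only when this call runs.
try:
    addstring(u"x", b"y")
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

# Being explicit about the encoding is one way to make the mix work again.
explicit = addstring(u"x", b"y".decode("ascii"))
```

No tool inspected the source beforehand. The interpreter simply evaluated “+”, and “+” refused the operands.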

Another way of illuminating the issue is to sketch the type interface of this function in both versions using overloaded functions.

In Python 2:

+(basestring, basestring): Union(str, unicode)

In Python 3:

+(str, str): str
+(bytes, bytes): bytes
+(str, bytes): TypeError
+(bytes, str): TypeError

[Note: I’m restricting myself to the operation on strings and ignoring “+” on other types such as numbers and lists].

Thus, in Python 2, the overloaded “+” operation on any kind of string will concatenate its operands and return either a str or a unicode depending on the parameters. This is somewhat analogous to the common int/float overloading of “+” (so-called “promotion”). In Python 3, however, applying “+” to arguments of distinct string types raises a TypeError. Again, this is analogous to accepting only sums of ints (e.g. 1 + 2) and sums of floats (e.g. 1.5 + 2.5), but disallowing a sum involving an int and a float (e.g. 1.5 + 2 or 1 + 2.5).
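The numeric analogy can be sketched in a few lines. Python’s built-in “+” does promote an int to a float, so the strict variant below is a hypothetical helper of my own (strict_add), written only to show what a less overloaded “+” would look like for numbers.

```python
def strict_add(a, b):
    # Hypothetical helper: refuse operands of different types, the way
    # Python 3's "+" refuses to mix text strings and byte strings.
    if type(a) is not type(b):
        raise TypeError(
            "cannot add %s and %s" % (type(a).__name__, type(b).__name__)
        )
    return a + b


promoted = 1 + 2.5                   # built-in "+": the int is promoted
same_ints = strict_add(1, 2)         # allowed: both operands are ints
same_floats = strict_add(1.5, 2.5)   # allowed: both operands are floats

try:
    strict_add(1, 2.5)               # mixed operands are rejected
    mixed = "allowed"
except TypeError:
    mixed = "TypeError"
```

The check inside strict_add runs every time the function is called, which is exactly where Python 3’s string check happens too.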

Yes, there is type checking involved here (generally, any raising of TypeError is an instance of type checking), but it is not static in any way; it occurs during the execution of the user’s program.

[As to his remaining remarks about the lack of some “type safety gear” and how it relates to the “static checking” of strings he mentions, I’m at a loss.]

Next, we read:

Python is a dynamic language, and doesn’t support type declarations on function arguments.

Actually, Python 3 does support type declaration syntax, as per PEP 484. While these declarations have no effect on the standard Python stack, they can be used by third-party tools to perform code analysis. One of these tools gaining momentum is mypy, a (now, really) static type checker for Python 2 and 3.
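As a rough illustration, here is Shaw’s function with annotations added, assuming Python 3. The annotations and the variable names are my additions. The interpreter records the declarations but does not enforce them.

```python
def addstring(a: str, b: str) -> str:
    return a + b


# The declarations are stored on the function object...
annotations = addstring.__annotations__

# ...but the interpreter does not check them: this call contradicts the
# annotations and still runs, because bytes + bytes is a valid "+".
unchecked = addstring(b"x", b"y")
```

A static checker that understands PEP 484 annotations, such as mypy, would be expected to flag that last call without running the program, which is what “static” means here.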

It’s also not statically compiled as strongly as it could be, so you can’t find these kinds of type errors until you run the code.

Above, Shaw is conflating different concepts, perhaps in an attempt to just say that vanilla Python lacks a static type checker. First, there is no notion of “static compilation” that has any relation to type-checked code. The closest meaning for static compilation I know of is related to offline compilation, that is, compilation that occurs when the user’s code is not running, in contrast with dynamic compilation as performed by just-in-time (JIT) compilers, which compile code during the execution of the user’s program. Furthermore, there’s no notion of a scale of compilation that goes from weak to strong.

There is, however, a misconception about types, framed in terms of “weak types” and “strong types”, that persists to this day. A rebuttal to Shaw’s post also critiqued his wording on “Statically Typed Strings”, but suggested that the issue is one of “strong typing” rather than static typing. On the one hand, there’s a point to it, as “weak” typing usually means that some operations are overloaded to the point where their use becomes confusing. On the other, this terminology is usually avoided by those who study type systems (“weak” and “strong” types have no clear definitions and are not very useful when working with and reasoning about types). Therefore, it’s better to avoid the “weak” and “strong” qualifiers altogether and stick to a more precise description of the problem. In this case, one can just say that “+” became less overloaded.

Thiago Silva

I used to do programming language research. I still do, when nobody is looking…