Reflections on Trusting Trust

Krisha Mehta
Computers, Papers and Everything
4 min readDec 31, 2018

Reflections on Trusting Trust by Ken Thompson is one of the most influential papers which describes how important it is to trust code written by others. To explain it better, the paper begins with the following lines-

To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.

Ken Thompson is an American pioneer of computer science. Having worked at Bell Labs for most of his career, Thompson designed and implemented the original Unix operating system. He also invented the B programming language, the direct predecessor to the C programming language, and was one of the creators and early developers of the Plan 9 operating systems. Since 2006, Thompson has worked at Google, where he co-invented the Go programming language. The following paper has been derived from the Turing award lecture he gave.

It is very difficult for a programmer to trust the code written by others. To delineate this, Ken talks about one of the programming exercises he participated in. The goal of the exercise was to create the shortest self-producing program i.e to write a source program that, when compiled and executed, will produce an exact copy of the source as its output. For people coming from a non-cs background, compiling a program means to convert the program from a programming language like C, Java into machine language that the computer processors understand. The program was written in three stages.

Stage 1:

The program was written in C language. It was too large to win a prize but demonstrates two important properties
1. The program can be easily written by another program.
2. The program can contain an arbitrary amount of excess baggage that will
be reproduced along with the main algorithm. Even the comments are reproduced.
Here is the code:

Code for Stage 1

Stage 2:

The compiler to compile this code is written in C. However, this leads to many problems as the code and the compiler are in the same language. C allows a string construct to specify an initialised character array. Characters can be escaped to represent unprintable characters. To simplify this, lets take an example:
“ Hello World \n”
Here, \n represents a new line. The code below is an idealisation of the code in the C compiler that interprets the character escape sequence. The compiler knows what character code is compiled for a new line in any character set. The act of knowing then allows it to recompile itself, thus perpetuating the
knowledge.

Now, if we wish to add a new escape sequence like “\v” to represent the vertical tab character. To do this, we add a new case in the code as follows:

However, the compiler does not understand what \v means. We need to train the compiler for it to understand that. In order for the compiler to learn what \v means, we have convert it into decimal so that computer can understand the meaning of the command to be executed. Vertical tab i.e \v is decimal 11. Hence the source code is modified accordingly. Once the compiler understands what \v means in decimal, the self-referencing definition can be used.

Stage 3:

In this stage, a modification is made that will deliberately miscompile the program when a particular pattern is matched. If the modification was not deliberate, it would have been called “a bug.” Since it is deliberate, it should be called a “Trojan horse.” The actual bug planted in the compiler would
match code in the UNIX “login” command. The replacement code would miscompile the login command so that it would accept either the intended encrypted password or a particular known password. Thus if this code were installed in binary and the binary were used to compile the login command, one could log into that system as any user. Such blatant code would not go undetected for long. Even the most casual perusal of the source of the C
compiler would raise suspicions. The next step adds a second Trojan horse. This is a Stage 1 self-reproducing program that adds both the Trojan horses into the compiler. This in turn, requires a Stage 2 learning phase. First we compile the modified source with the normal C compiler to produce a bugged binary. We install this binary as the official C. We can now remove the bugs from the source of the compiler and the new binary will reinsert the bugs whenever it is compiled. Of course, the login command will remain bugged with no trace in the source anywhere.

Conclusion

It is difficult to trust the code that we have not written ourselves. Ken criticizes the press for the way they have handled the issue of “hackers.” Criminal code for such cyber-attacks should be improved. It is essential to trust trust.

--

--