What is translation, Part II: programming languages

This is the right time to revisit Part 1, if only to get a feel for where I am going with this. This is not a technical article (for your own sake and everyone else’s, please don’t copy-paste my code); it is rather my take on the linguistic aspects of computer languages.

In a rather sarcastic take on the INTERCAL language, Charlie Stross remarked:

INTERCAL is a programming language which is to other languages as elephants are to deep-sea fishing — it has nothing whatsoever to do with them. Nothing … except that it’s a programming language. And it serves as a wonderful example of the fact that, despite their theoretical, abstract interchangeability, not all languages are born equal.

Now, if you haven’t heard of INTERCAL, it is a rather low-level language (actually, just assembly language dressed up to feel like a real programming language). It is not, however, the ‘low level’ that is the worst part of INTERCAL (C is sufficiently low level, if you want it to be). It is the way it was designed, as a practical joke: for example, statements must begin with the verb DO, or even PLEASE DO. This is just one of many syntactic derelictions that have been put into INTERCAL. The most infamous of these is probably the COME FROM statement:

Unlike FORTRAN, INTERCAL has no evil, unspeakable GOTO command, and not even an IF statement. However, you would be wrong to ascribe this to INTERCAL being designed for structured programming; it is actually because C-INTERCAL is the only known language that implements the legendary COME FROM … control statement … For many years the GOTO statement — once the primary means of controlling the sequence of execution of a program — has been reviled as contributing to unreadable, unpredictable code that is virtually impossible to follow because it jumps about the place like a kangaroo on amphetamines. The COME FROM statement enables INTERCAL to do away with nasty GOTO’s, while still preserving for the programmer that sense of warm self-esteem and achievement that comes from successfully writing a really nasty piece of self-modifying code involving computed GOTO’s in FORTRAN … Basically, the COME FROM statement specifies a line, or lines, which — when the program executes them — will jump to the COME FROM: effectively the opposite of GOTO.
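To make the quoted description a little more concrete, here is a minimal sketch in Python of how a COME FROM might behave. It is a toy and not real INTERCAL: the tuple format, the PRINT operation and the run function are all assumptions made purely for illustration.

```python
def run(program):
    """Run a toy program given as a list of (label, operation, argument) tuples."""
    # For every COME FROM, remember which label it is lying in wait for.
    come_from = {
        arg: index
        for index, (_, op, arg) in enumerate(program)
        if op == "COME FROM"
    }
    output = []
    pc = 0  # index of the statement being executed
    while pc < len(program):
        label, op, arg = program[pc]
        if op == "PRINT":
            output.append(arg)
        # Once a line has executed, a COME FROM naming it hijacks control.
        if label in come_from:
            pc = come_from[label] + 1
        else:
            pc += 1
    return output


program = [
    (10, "PRINT", "first"),
    (20, "PRINT", "skipped"),
    (30, "COME FROM", 10),
    (40, "PRINT", "second"),
]
print(run(program))  # ['first', 'second']
```

Nothing on line 10 hints that control will leave it; the jump is declared at the destination, which is exactly what makes the construct so unpleasant to read.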

Now this is all very silly. Except that INTERCAL, in spite of its ludicrous syntax, has not been too difficult to translate. In fact, there is a (kind of) translation of INTERCAL to C: the ‘C-INTERCAL’ compiler, which is an INTERCAL compiler written in C, and runs on anything that runs C. Which only stokes the old flame: are all programming languages interchangeable?

Let’s consider a very simple piece of code in Python:

if x >= 0:
    print(x)
else:
    print(-x)

This prints the absolute value of a number, irrespective of its sign, just as the conventional abs function would compute it. In C, this would be something similar:

if (x >= 0)
    printf("%d", x);
else
    printf("%d", -x);

For simple snippets like this, one can get away with a literal, keyword-to-keyword translation. What about more complicated stuff?
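Before moving on: a literal translation of this kind can even be mechanised. The sketch below, in Python, handles only the three statement shapes in the snippet above (if, else, and print of a single integer). The name python_to_c is a hypothetical helper, and anything beyond those shapes is rejected rather than guessed at.

```python
import re


def python_to_c(source):
    """Translate a tiny if/else/print snippet from Python to C, line by line."""
    translated = []
    for line in source.splitlines():
        stripped = line.strip()
        indent = line[: len(line) - len(line.lstrip())]
        if_match = re.fullmatch(r"if (.+):", stripped)
        print_match = re.fullmatch(r"print\((.+)\)", stripped)
        if if_match:
            # Python's colon becomes C's parentheses around the condition.
            translated.append(f"{indent}if ({if_match.group(1)})")
        elif stripped == "else:":
            translated.append(f"{indent}else")
        elif print_match:
            # Assumes the printed value is an integer, hence %d.
            translated.append(f'{indent}printf("%d", {print_match.group(1)});')
        else:
            raise ValueError(f"no literal translation for: {stripped!r}")
    return "\n".join(translated)


snippet = "if x >= 0:\n    print(x)\nelse:\n    print(-x)"
print(python_to_c(snippet))
```

The rules never look at more than one line at a time, which is why the approach stops working as soon as the two languages disagree about anything larger than a keyword.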

Objects

Languages differ in the way they perceive the world. One major watershed is ‘object orientation’, which concerns the way programs represent the real world. For example, this thing

  1. holds something inside it (procedural definition)
  2. is a box (object oriented definition)

The distinction is not as watertight as you would think. All programming languages need data structures; the difference between an object-oriented and a procedural language is simply how they wrap abstractions around them.

This Python code defines a box:

# Python
class Box:
    w = h = d = 0

    def __init__(self, w, h, d):
        self.w = w
        self.h = h
        self.d = d

mybox = Box(3, 4, 5)
print(mybox.w) # 3

This cannot be translated directly into C, simply because C doesn’t support objects. The situation is similar to translating between English and Guugu Yimithirr; the latter is characterised by a lack of words for right and left. In fact, speakers of the language speak only in cardinal directions, e.g. ‘pick up the glass with your north hand and put it southwards’ or ‘I have a headache towards the west of my head’; remarkably, these people have an absolute sense of direction, no matter where they’re placed. One can now choose to skip the ‘Box’ altogether, focus on its three properties (w, h and d) and store them in an array.

/* C */
#include <stdio.h>

int main(void)
{
    int mybox[3];
    mybox[0] = 3;
    mybox[1] = 4;
    mybox[2] = 5;
    printf("%d", mybox[0]); /* 3 */
    return 0;
}

Now, ‘mybox’ in C is no longer a ‘box’, but a list of three numbers which define the box in its entirety. (I cannot but remark here that the Nyay school of Indian philosophers would have a field day translating Python to C.) One might consider this a ‘translation’ (it gets the job done), though one that does not preserve the ‘spirit’ of the original.

If one attempts to translate the C code back into Python, there is a real risk of losing all the object-oriented baggage. This is because Python, like C, supports arrays (‘lists’), and a back-translation from C would simply be

# Python
mybox = [3,4,5]
print(mybox[0])

One journey through low-level-land and our program has lost all its mystical ‘object oriented’ aura, and has been converted into a humdrum list of numbers.
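The loss can be seen directly in a small sketch. The flatten function here is a hypothetical stand-in for the trip through C; it is only an illustration of what information survives, not a real translator.

```python
class Box:
    def __init__(self, w, h, d):
        self.w = w
        self.h = h
        self.d = d


def flatten(box):
    """'Translate' a Box the way the C version does: keep only the numbers."""
    return [box.w, box.h, box.d]


original = Box(3, 4, 5)
flat = flatten(original)

print(flat)                 # [3, 4, 5]
print(type(flat).__name__)  # list
print(hasattr(flat, "w"))   # False: the names w, h and d are gone
```

The numbers make it through intact, but nothing in the result says that the first one was a width, or that the three of them ever belonged to a box.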

Yes, everything that a computer does is represented by a sequence of numbers in memory. But at this point, you must ask: what is the purpose of languages anyway?

Natural languages describe the real world, which is really one whole, continuous entity. Language imparts discreteness to reality, which is an artificial construct to say the least. Language provides us with the notion that this is a ‘box’ and that is a ‘ball’, or another ‘box’. However, the very fact that languages use words to label things is also their great weakness. One famous Zen koan emphasises this point:

Shuzan held out his short staff and said:
“If you call this a short staff, you oppose its reality. If you do not call it a short staff, you ignore the fact.
Now what do you wish to call this?”

The point being, if you call it just a ‘short staff’, you have omitted so many other facts about it (its material, texture, shape, molecular structure, ad infinitum…) that the description ‘short staff’ is very close to completely useless. The label ‘short staff’ provides the boundary of a region in spacetime (i.e. that occupied by the staff); in other words, it ‘cleaves’ spacetime. As Steven Pinker remarked in The Stuff of Thought:

Kant was surely right that our minds “cleave the air” with concepts of substance, space, time and causality… They are digital where the world is analogue, austere and schematic where the world is rich and textured…

Programming languages operate on a much more finite domain, i.e. the RAM of a computer, which is simply a sequence of 0s and 1s. However, they must cleave this sequence into chunks for any rational meaning to appear. Object-oriented languages take larger chunks of this sequence to represent a ‘box’, whereas C would slice off a much smaller chunk and call it a list of three numbers. The chunk produced by OO languages is bigger, simply because apart from the three numbers, it must also accommodate the word ‘Box’.
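A rough sketch of the two chunks, using Python’s struct module to lay out raw bytes. The ‘object’ layout below is a deliberately naive assumption (the type name simply stored in front of the data); real runtimes usually keep a reference to the type rather than its spelled-out name, but the bookkeeping has to live somewhere.

```python
import struct

# The C view: three 32-bit integers, back to back, and nothing else.
c_view = struct.pack("<3i", 3, 4, 5)

# A toy object view (an assumption for illustration): the same numbers,
# with the name of the type stored in front of them.
oo_view = b"Box" + c_view

print(len(c_view))   # 12
print(len(oo_view))  # 15
print(oo_view[:3])   # b'Box'
```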

First class functions

Now, let’s move to a different kind of translation problem. What to do with this Python code?

# Python
def square(num):
    return num**2

x = [3,4,5]
print(list(map(square,x))) # Prints [9, 16, 25]

There’s more happening here than meets the eye. x is a regular list, and square is a regular function which does what you would expect. However, map is a built-in Python function which takes another function as input. Languages that implement this feature are said to be ‘functional’, and this has been the second great watershed of programming languages. Essentially, the map function applies the ‘square’ function to each item in x, and thus produces a new sequence of results.
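There is nothing mysterious inside map itself. A minimal hand-written equivalent might look like the sketch below; my_map is a hypothetical name, and unlike the built-in map in Python 3, which produces its results lazily, this version builds the whole list at once.

```python
def my_map(func, items):
    """Apply func to every item and collect the results in a new list."""
    results = []
    for item in items:
        results.append(func(item))  # func is an ordinary value being called
    return results


def square(num):
    return num ** 2


print(my_map(square, [3, 4, 5]))  # [9, 16, 25]
print(my_map(str, [3, 4, 5]))     # ['3', '4', '5']
```

The interesting part is not the loop but the signature: my_map accepts a function the same way it accepts a list.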

To implement this in C, one can be naive to the utmost, and still get the job done. It seems that functional programming is not that great after all!

/* C */
#include <stdio.h>

int main(void)
{
    int x[3];
    x[0] = 3;
    x[1] = 4;
    x[2] = 5;
    int y[3];
    int i;
    for (i = 0; i < 3; i++)
        y[i] = x[i] * x[i];
    printf("%d %d %d", y[0], y[1], y[2]); /* 9 16 25 */
    return 0;
}

However, functional languages must also have the ability to return a function as the output of another function, and this is where compatibility between imperative languages (e.g. C, Java) and functional languages (e.g. Python, JavaScript) ends. For example, in Python, you can write something like:

def f(a):
    def g(b):
        return a,b
    return g

print(f(3)(4)) # (3, 4)

Which does exactly the same thing as

def f(a,b):
    return a,b
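The step from the two-argument form to the nested form can be done mechanically. A minimal sketch, where make_nested and pair are hypothetical names chosen for illustration and not anything built into Python:

```python
def make_nested(func):
    """Turn a two-argument function into a function that returns a function."""
    def take_first(a):
        def take_second(b):
            return func(a, b)  # a is remembered from the outer call
        return take_second
    return take_first


def pair(a, b):
    return a, b


nested_pair = make_nested(pair)
print(nested_pair(3)(4))  # (3, 4)

# The intermediate function can be kept and reused with its first argument fixed.
starts_with_three = nested_pair(3)
print(starts_with_three(10))  # (3, 10)
```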

Taking the roundabout route (a function returning a function), also called currying, has a few interesting applications. The most striking use of it is in this article by Steve Losh, where he demonstrates a purely functional way of constructing a list (i.e. without using any data structures).

empty_list = None

def prepend(el, l): # Adds el to the beginning of a list l
    def f(command):
        if command == "head":
            return el
        elif command == "tail":
            return l
    return f

def head(l):
    return l("head") # The 'list' l is actually a function

def tail(l):
    return l("tail")

def is_empty(l):
    return l is None

e = empty_list
print(is_empty(e))
# True
names = prepend("Jaimini", prepend("Yaska", prepend("Kapil", empty_list)))
print(is_empty(names))
# False
print(head(names))
# Jaimini
print(tail(names))
# Some function representing the list of ("Yaska", "Kapil")
print(head(tail(names)))
# Yaska
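Since such a ‘list’ is only a function, the one way to see what is in it is to keep asking it questions. The sketch below repeats prepend so that it runs on its own; to_python_list is a hypothetical helper added here for illustration and is not part of Steve Losh’s article.

```python
def prepend(el, l):
    def f(command):
        if command == "head":
            return el
        elif command == "tail":
            return l
    return f


def to_python_list(l):
    """Walk a function-list from head to tail, collecting its elements."""
    items = []
    while l is not None:         # None plays the role of the empty list
        items.append(l("head"))  # ask the function for its first element
        l = l("tail")            # then continue with the rest
    return items


names = prepend("Jaimini", prepend("Yaska", prepend("Kapil", None)))
print(to_python_list(names))  # ['Jaimini', 'Yaska', 'Kapil']
```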

As Steve explains, the magic happens in the prepend function, which returns another function; that function in turn returns el or l depending on what question is asked (‘head’ or ‘tail’). This represents a significant departure from imperative languages, and does not have any direct translation in C. The situation is eerily similar to translating from modern English, which is rich in subordinate clauses, to early English, Hebrew or Russian, which did not have such a rich set of subordination. For example, what in modern English would be

There was smoke because of the fire started by the demon.

would have been said by an Englishman from the Middle Ages as

There was a demon.
And the demon started a fire.
And there was smoke.

This is the kind of English which is found until the fifteenth century, i.e. when the written language was very close to the spoken version. Things change quickly in the digital world, and between C and Python (actually, between FORTRAN and Lisp), functional programming seems to have flowered.

Is C really so primitive?

In writing about translation between programming languages, I have ignored libraries and focused only on core language features. (One could write ‘document.write()’ in C only if a library for web browsers is made available in C). There is, however, one aspect of C which is nearly as powerful as ‘divine’ languages like Lisp.

Everybody who’s nibbled at Lisp knows that Lisp (and its more modern version, XML) is its own parser. What is not so well known is that there are two programs written in C (and part of standard Unix systems), lex and yacc, that can translate a grammar into a parser, i.e. they can look at the ‘description’ of a language and generate a parser for it. For example, you could define a new programming language to consist of ‘lines’ which are made of ‘expressions’, like the following:

%%
line   : line expr '\n'          { printf("%d\n", $2); printf("> "); }
       | /* empty word */
       ;

expr   : expr TOK_PLUS term      { $$ = $1 + $3; }
       | term                    { $$ = $1; }
       ;

term   : factor TOK_TIMES term   { $$ = $1 * $3; }
       | factor                  { $$ = $1; }
       ;

factor : TOK_LP expr TOK_RP      { $$ = $2; }
       | TOK_NUMBER              { $$ = $1; }
       ;
%%

(For this example, and the full tutorial on lex and yacc, visit this site.)

  1. The first rule defines a ‘line’, which is either empty or another line followed by some ‘expression’ and a newline ‘\n’
  2. The second rule defines what an ‘expression’ is, i.e. another expression plus some ‘term’, or just a ‘term’
  3. The third rule defines a ‘term’ as a ‘factor’ multiplied by a ‘term’, or just a ‘factor’
  4. The final rule describes the atomic unit of this language, a ‘factor’, which is a number or an expression in parentheses

When you run yacc, it generates the C source of a parser for this new language, which evaluates each line as it is read. This new language is actually a mini calculator for additions and multiplications, and can calculate expressions like

2 + 4 + 3 * 5
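To see what such a generated parser has to do, here is a hand-written sketch of the same grammar in Python. It is a plain recursive-descent evaluator, which is a different technique from the table-driven parser yacc actually produces; the names tokenize, Parser and evaluate are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Split the input into numbers, '+', '*', '(' and ')'.

    Any other character (including whitespace) is silently skipped.
    """
    return re.findall(r"\d+|[+*()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr : expr '+' term | term   (the left recursion becomes a loop)
        value = self.term()
        while self.peek() == "+":
            self.take()
            value += self.term()
        return value

    def term(self):
        # term : factor '*' term | factor
        value = self.factor()
        if self.peek() == "*":
            self.take()
            value *= self.term()
        return value

    def factor(self):
        # factor : '(' expr ')' | NUMBER
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return int(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 4 + 3 * 5"))  # 21
```

Each method mirrors one rule of the grammar, and the fact that term is only ever called from inside expr is what makes multiplication bind tighter than addition.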

In my view, lex and yacc are as close as one gets to a ‘high level’ translation of the magical self-parsing capabilities of Lisp (and other functional languages). In fact, the functionality of these two programs has been replicated in XSLT, which converts one XML dialect to another (e.g. converting a SQL database directly to a webpage through XML) based only on a specified grammar.

I hope I have made my point about the interconvertibility of programming languages on different levels, and pointed out similar issues faced with natural languages. This goes to show that programming languages have reached a level of complexity where they are beginning to rival natural languages, at least in the matter of difficulty in translation.

Postscript: Javaland

I have nothing against Java (in fact, I learnt most of OOP from Herbert Schildt’s Java2); however, since mid-2003, I have not liked where the language was going.

Consider writing a text file in Python:

f = open("file.txt", "a")
f.write("So it shall be written")
f.close()

Simple enough, without being overly pretentious. Consider the same in Java.

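import java.io.FileWriter;
import java.io.IOException;

public class WriteToFile {
    public static void main(String[] args) throws IOException {
        // true = append, matching the "a" mode of the Python version
        FileWriter w = new FileWriter("file.txt", true);
        w.write("So it shall be written");
        w.close();
    }
}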

Sometime in the early 2000s, object orientation ran amok in Java and suddenly all functions were subjugated to an object (maybe I’m wrong, maybe Java has always been like this). As Steve Yegge points out, nouns like ‘FileWriter’ have taken over Java, creating a most artificial-sounding language. But when you think about it, naming things after their function is deeply ingrained in Western civilisation (‘dishwasher’, ‘lawnmower’), and this also applies to people (you never know if the aforementioned ‘dishwasher’ is an actual person). This tendency might have spilled over into doubly abstract things like a ‘FileWriter’. While Java retains its utility, reading Java code makes you realise that it is a linguistic faux pas. The difference in culture between Java and Python (or C) is almost like that between the West and the East, and no amount of translation can reconcile their differences.

Muggle