Ballerina: Thinking about names .. why restrict to English?

Not just identifiers but also reserved words!

Sanjiva Weerawarana
Apr 15, 2017

[This blog is a continuation of a series of blogs I’m writing about Ballerina, the sequence diagram based programming language we’re creating. See http://ballerinalang.org/ for more information. My last blog was about how we make sequence diagrams work as a programming language; see https://medium.com/ballerinalang/ballerina-making-sequence-diagrams-work-d0d7b3846a80.]

Programming languages have traditionally allowed a fairly limited set of characters for identifiers, the names that programmers must come up with when writing code. Prior to Unicode, C, for example, required that identifiers start with either “_” or an upper or lower case English letter, followed by zero or more of those or digits. So if you were a programmer thinking in Thai, well, tough luck: you had to name your variables and functions and everything in English. That’s awful if you’re a Thai hacker because all good code uses thoughtful, semantically meaningful words for identifiers, but then you need to express that in English!

Java and more modern languages that came after Unicode have changed that drastically to allow a much broader set of code points (aka characters). That means we can now name variables and other identifiers in Java with Sinhala names, for example:

public class hello {
    public static void main (String[] args) {
        String නාමය = args[0];
        System.out.println ("Hello, " + නාමය);
    }
}

So of course Ballerina allows a similar set of characters for identifier names. Easy to do these days thanks to Unicode and UTF-8. However, Ballerina is not just a textual language but also a graphical one.

What about spaces in identifiers?!

WHAT?! Heresy!

Is it though?

I started my undergraduate programming life in 1985 on a VAX-11/782 running VMS. Then of course moved onto BSD Unix. Both accessed with VT100 terminals .. all of 80 columns by 24 rows. With limited space available I got used to naming directories and files usually with one short word and on a good day with two short words using a hyphen or an underscore to connect the words.

That’s a practice I still maintain .. but now I’m much more liberal with the length but still insist on using hyphens to connect the words.

Of course using a space on the shell meant you had to suffer maximum pain to escape that character with a “\”. Just not worth it. I still have several terminal windows open all the time running multiple shells; so files with spaces in their names are just a nuisance.

But oh my — when I look at files named by non-CS type people today they are full of long names and with (DRUMROLL) spaces separating them!!!! Ugh.

So really the difference is that they don’t even know the word “terminal”, let alone “shell”. They just know that you open the file browser and type the name you want.

Ballerina has a complete graphical syntax in addition to its textual syntax and in the graphical editor you just type what you want for identifier names into the box for it.

Ballerina will allow identifier names to contain spaces and use any set of Unicode code points the person wants.

To make this work we will have syntax for identifier literals .. similar to string literals (which use double quote characters as delimiters, of course) but using the vertical bar (|) character to delimit the identifier name.
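To make the idea concrete, here is a minimal sketch in Java of how a scanner might pick out both plain identifiers and vertical-bar-delimited ones. This is not Ballerina's actual lexer; the class and method names (IdentifierLiterals, identifiers) are made up for illustration, and the sketch does not distinguish reserved words from other names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Ballerina lexer.
class IdentifierLiterals {

    /** Returns the names found in src, in order (reserved words included). */
    static List<String> identifiers(String src) {
        List<String> names = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            int cp = src.codePointAt(i);
            if (cp == '|') {
                // Delimited identifier: everything up to the closing bar is the name,
                // spaces included.
                int end = src.indexOf('|', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated identifier literal at " + i);
                }
                names.add(src.substring(i + 1, end));
                i = end + 1;
            } else if (Character.isUnicodeIdentifierStart(cp) || cp == '_') {
                // Plain identifier: a run of Unicode identifier characters.
                int j = i;
                while (j < src.length()
                        && Character.isUnicodeIdentifierPart(src.codePointAt(j))) {
                    j += Character.charCount(src.codePointAt(j));
                }
                names.add(src.substring(i, j));
                i = j;
            } else {
                // Punctuation and whitespace are skipped in this sketch.
                i += Character.charCount(cp);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(identifiers("function |My First Function| (string s) (string r)"));
    }
}
```

Run on the function header used in this post, it prints [function, My First Function, string, s, string, r]: the name with spaces comes through as a single identifier.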

Thus, the following will soon be valid Ballerina syntax:

function |My First Function| (string s) (string r) {
    r = s + s;
    return;
}

Heresy? Maybe to us old programmers .. but to someone just writing a program without having been brainwashed with all the arcane rules we take for granted, is it at all?

What about those pesky reserved words — why English only?!

Ballerina has a host of reserved words: function, service, resource, connector, action, if, else, worker, fork, join, etc. Notice that they’re all in English!

In fact, that’s the case in pretty much every programming language that I’ve ever used: all the keywords and/or reserved words have been in fine English, all carefully designed to use exactly the right word to capture the semantics.

In designing Ballerina we have often spent hours and days arguing about the right English word to use to make the syntax as intuitive as possible. (That discussion can be quite fun as most of us are non-native English speakers but that’s another story!)

Well isn’t that ridiculous?! Let’s go back to that Thai programmer: WHY does she need to learn a set of English words to tell the compiler what she wants to do?

Here’s the Ballerina hello world sample in English:

import ballerina.lang.system;

function main(string[] args) {
    system:println("Hello, World!");
}

Now here it is in Thai (with the help of Google Translate .. apologies in advance for bad language!):

นำเข้า นางระบำ.เท่านั้น.ระบบ;

ฟังก์ชัน หลัก(สตริง[] อาร์กิวเมนต์) {
    ระบบ:พิมพ์("Hello, World!");
}

Um, that doesn’t look like Ballerina at all at first blush! BUT all we’ve done is allow a string of Thai characters to map to the same underlying token in the grammar. That means that when you look at the syntax trees of the two programs, they are identical!
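One way to picture "different spellings, same token" is a keyword table in which several strings map to one token kind. The Java sketch below is only an illustration under that assumption; it is not how the Ballerina parser is built, and the names KeywordSynonyms, Kind and kinds are hypothetical. It splits on whitespace rather than doing real lexing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: several spellings, one token kind.
class KeywordSynonyms {

    enum Kind { IMPORT, FUNCTION, IDENTIFIER }

    // Assumed keyword table; the Thai spellings are the ones used in this post.
    static final Map<String, Kind> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("import", Kind.IMPORT);
        KEYWORDS.put("นำเข้า", Kind.IMPORT);
        KEYWORDS.put("function", Kind.FUNCTION);
        KEYWORDS.put("ฟังก์ชัน", Kind.FUNCTION);
    }

    /** Maps each whitespace-separated word to its token kind. */
    static List<Kind> kinds(String src) {
        List<Kind> out = new ArrayList<>();
        for (String word : src.trim().split("\\s+")) {
            // Anything that is not a known keyword spelling is treated as an identifier.
            out.add(KEYWORDS.getOrDefault(word, Kind.IDENTIFIER));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(kinds("function main"));
        System.out.println(kinds("ฟังก์ชัน หลัก"));
    }
}
```

Both calls in main yield the same sequence of token kinds, which is the sense in which everything downstream of the lexer can stay language-neutral.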

In the above I took some extra liberties to drive home my point: I assumed that the reserved namespaces under ballerina.* would also be given synonyms in Thai. That’s yet another step of internationalization.

The cool thing is that an editor that is aware of the different language bindings can show the “English version” of this code, with respect to reserved words, automatically and with ZERO loss of information.
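As a rough illustration of that lossless switch, here is a Java sketch that swaps keyword spellings using a language pack and then swaps them back. It assumes the pack is one-to-one and that no identifier collides with a keyword spelling; a real editor would work on tokens rather than on space-separated words. KeywordRenderer, THAI, render and invert are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of switching keyword spellings between languages.
class KeywordRenderer {

    // Assumed Thai language pack: canonical English keyword -> Thai spelling.
    static final Map<String, String> THAI = new LinkedHashMap<>();
    static {
        THAI.put("import", "นำเข้า");
        THAI.put("function", "ฟังก์ชัน");
    }

    /** Rewrites space-separated words using the table; other words pass through unchanged. */
    static String render(String src, Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        for (String word : src.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(table.getOrDefault(word, word));
        }
        return out.toString();
    }

    /** Builds the reverse table; only meaningful when the pack is one-to-one. */
    static Map<String, String> invert(Map<String, String> table) {
        Map<String, String> inverse = new HashMap<>();
        for (Map.Entry<String, String> e : table.entrySet()) {
            inverse.put(e.getValue(), e.getKey());
        }
        return inverse;
    }

    public static void main(String[] args) {
        String english = "function main";
        String thai = render(english, THAI);
        String back = render(thai, invert(THAI));
        System.out.println(thai);
        System.out.println(back.equals(english));
    }
}
```

Because the mapping only touches reserved-word spellings and is reversible, going to Thai and back returns the original text in this sketch.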

We’ve done some experiments with making this work in ANTLR4, the tool we use for the Ballerina parser. More work is needed before we can add language packs easily.

Programming need not be an English-only endeavor!

What we’re trying to do with Ballerina is question every standard mantra of what programming is and “the way things work” and see whether we should change it.

This is 2017 and there are more non-English speakers waiting to program than there are English speakers .. why do we need to disenfranchise them by forcing them to use foreign words?

Ballerina will be a fully internationalized language where every bit of the code can be in another (supported) language. Yes you will be able to mix languages too, but children, please, behave.

Thoughts welcome!

(I’d like to thank our co-conspirator James Clark for making me think further out of the box for Ballerina!)
