I’m thinking about designing a new programming language
I recently read a brain dump by veekun about the kind of programming language he would like to see on the market, and I thought, hey, that would be a cool language to have. I might as well try to design it.
Admittedly, I’m quite comfortable with Python and Java — we use Python in Intro to CS 2 and Java in APCS, and Java was my first programming language — but Java’s a bitch to use, especially when the Processing IDE doesn’t support all its latest features. Low-level problems with their type systems (such as having to write complicated reflection code just to create a generic array) can get in the way of higher-level programming (as they did when I wrote a double-ended queue for APCS), and while you should probably just get used to your language’s quirks (or not), I think that designing a compiler that eliminates them is an interesting AI problem.
Let’s start with a bottom-up solution.
I propose a type system where every value is a primitive, but built-in and user-defined traits are automatically assigned to values so that they can act like objects, too.
So far, my type system has eight primitive data types:
- null
- boolean (true or false)
- int (arbitrary-size signed integer)
- float (64-bit IEEE floating point number)
- string (Unicode)
- table
- function
- type*
(Why no arrays? Why no objects, classes, interfaces, etc.? Why aren’t int and float subtypes of number? I’ll get to that soon enough.)
Most of these primitive types should already be familiar, except for table and function. The function data type is already present in JavaScript and Lisp as lambdas; however, I’m going to add functionality to lambdas that lets you create custom syntax for functions.
The table is the only compound primitive data type. Like a table in a relational database, it consists of rows and columns that can be queried, indexed, and constrained. (On second thought, this idea might be overkill… but we’ll see. Perhaps, as veekun himself suggested, our language could have a tabular data structure in addition to the traditional compound types, hidden away in a library. Nah, I like it too much.)
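To make that concrete, here’s a rough sketch in today’s Python of how such a table might behave. Everything here (the Table class, insert, where, the unique keyword) is a hypothetical name of mine, not part of the proposed language.

class Table:
    """A toy table: named columns of equal length, with optional unique columns."""
    def __init__(self, *columns, unique=()):
        self.columns = {name: [] for name in columns}
        self.unique = set(unique)

    def insert(self, **row):
        # Enforce the uniqueness constraint before accepting the row.
        for name in self.unique:
            if row[name] in self.columns[name]:
                raise ValueError(f"duplicate value in unique column {name!r}")
        for name, values in self.columns.items():
            values.append(row[name])

    def where(self, **conditions):
        # A crude query: return matching rows as dicts.
        n = len(next(iter(self.columns.values())))
        rows = [{k: v[i] for k, v in self.columns.items()} for i in range(n)]
        return [r for r in rows if all(r[k] == v for k, v in conditions.items())]

t = Table("key", "value", unique=["key"])
t.insert(key="a", value=1)
t.insert(key="b", value=2)
print(t.where(key="a"))   # [{'key': 'a', 'value': 1}]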
So you ask, what about the other types — list, set, dict, fraction, complex?
Enter the trait.
A trait is a type with a name, a condition that determines whether the trait applies to a particular object, and a set of constructors and methods that can be used by the object. If an object meets the condition, it automatically belongs to the type that the trait represents. This is just duck typing — so far, nothing new.
Sets, lists, dicts, fractions, complex numbers, and numerous other built-in data types are all implemented as traits on top of primitives such as tables. For example, a set is a table with one column whose members must be unique. A dict is a table with two columns, the first of which must have unique members. A list is a dict whose first column can only contain non-negative integers.
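As a rough illustration in today’s Python, those three conditions could be written as plain duck-typing predicates. Here a table is represented simply as a list of equal-length row tuples, and the is_set/is_dict/is_list names are hypothetical.

def is_set(table):
    # One column, and its members are unique.
    values = [row[0] for row in table]
    return all(len(row) == 1 for row in table) and len(values) == len(set(values))

def is_dict(table):
    # Two columns, and the first column's members are unique.
    keys = [row[0] for row in table]
    return all(len(row) == 2 for row in table) and len(keys) == len(set(keys))

def is_list(table):
    # A dict whose first column contains only non-negative integers.
    return is_dict(table) and all(isinstance(row[0], int) and row[0] >= 0 for row in table)

print(is_set([(1,), (2,), (3,)]))       # True
print(is_dict([("a", 1), ("b", 2)]))    # True
print(is_list([(0, "x"), (1, "y")]))    # True
print(is_list([("a", 1)]))              # False: the key isn't a non-negative integer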
I’m not sure yet what the syntax for traits will look like, but I’m thinking about something like this:
def type set(v=any) = table(value:v unique)
def type dict(k=any, v=any) = table(key:k unique, value:v)
def type list(v=any, length=+infty) = dict(int, v):
    0 <= dict.key < length
As for the numerical data types that aren’t primitives:
def type number = real or complex
def type real = int or float or fraction
def type fraction = dict(string, int):
    numerator
    denominator >= 0
def type complex = dict(string, real):
    realpart
    imagpart
The ‘or’ keyword in the first two type definitions means that they are unions — any object that is an int, float, or fraction is also a real. Likewise, ‘and’ can be used to create an intersection. Type unions and intersections are analogous to superclasses and multiple inheritance.
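Loosely speaking, a union trait behaves like what today’s Python does when you hand isinstance a tuple of types. (This is only an analogy, and REAL is my own name, not the proposed syntax.)

from fractions import Fraction

REAL = (int, float, Fraction)              # roughly "real = int or float or fraction"

print(isinstance(Fraction(1, 3), REAL))    # True
print(isinstance(1 + 2j, REAL))            # False: a complex number is not a real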
Also notice that traits can take parameters that further constrain what kind of data they can be applied to. For example, we’ll have a vector(n) trait whose instances are required to have n members:
def type vector(n) = list(v=number, length=n)
As with functions, these parameters can be required or optional.
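In Python terms, a parameterized trait is roughly a factory that returns a condition. The sketch below is mine (hypothetical names), using a plain list of numbers to stand in for a vector:

from numbers import Number

def vector(n):
    """Return a condition that accepts lists of exactly n numbers."""
    def condition(obj):
        return (isinstance(obj, list)
                and len(obj) == n
                and all(isinstance(v, Number) for v in obj))
    return condition

is_vec3 = vector(3)
print(is_vec3([1, 2.5, 3]))    # True
print(is_vec3([1, 2]))         # False: wrong length
print(is_vec3([1, "a", 3]))    # False: not all members are numbers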
If the compiler sees that you instantiate an object as part of a trait and then try to modify it so that it no longer belongs to that trait, it will raise a compile-time error.
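The static analysis itself is beyond a quick sketch, but the behavior I mean is roughly what this runtime check does: re-test the trait’s condition after a mutation and complain when it no longer holds (hypothetical helper names):

def assert_trait(obj, condition, name):
    if not condition(obj):
        raise TypeError(f"object no longer satisfies trait {name}")
    return obj

is_vector3 = lambda obj: isinstance(obj, list) and len(obj) == 3

v = assert_trait([1, 2, 3], is_vector3, "vector(3)")   # fine: still a vector(3)
v.append(4)
assert_trait(v, is_vector3, "vector(3)")               # TypeError: length is now 4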
Lambdas — not your average functions-as-objects scheme
I’ve always wanted a language feature where you can define your own syntax. It might work something like traits, where just creating one object changes the behavior of a whole bunch of other objects… Aha! Lambdas!
Lambdas are just functions that are first-class objects. In JavaScript, they’re also used as classes. I think the symbol will be a tilde (~) because it looks like a Greek λ turned sideways. So here’s the basic syntax for a lambda:
~(template expression - use ~ as placeholder for function name):
implementation
You could use the function name in the template expression if you wanted to, but using ~ instead is safer in case you change the function name.
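For contrast, today’s Python can’t declare new syntax at all; the closest you can get to letting a function sit between its arguments is a wrapper hack like the one below (Infix and dot are hypothetical names here, not standard library features).

class Infix:
    """Wrap a two-argument function so it can be written between its arguments."""
    def __init__(self, fn):
        self.fn = fn
    def __ror__(self, left):                  # handles  left |op
        return Infix(lambda right: self.fn(left, right))
    def __or__(self, right):                  # handles  op| right
        return self.fn(right)

dot = Infix(lambda x, y: sum(a * b for a, b in zip(x, y)))
print([1, 2, 3] |dot| [4, 5, 6])              # 32

Template expressions would make this kind of thing a first-class feature instead of a hack.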
Since there are a whole bunch of mathematical operators already defined for vectors, I’ll use the vector(n) type defined earlier to show how this feature will work.
def + = ~(x:vector(n) ~ y:vector(n)):
    return [x[i] + y[i] for i in range(n)]
def - = ~(~X:vector):
    return [-x for x in X]
def * = ~(x:vector(n) ~ y:vector(n)):
    return sum [x[i] * y[i] for i in range(n)]
def ** = ~(x:vector(3) ~ y:vector(3)):
    return [helper(i) for i in range(3)]
    def helper(k):
        i = (k + 1) % 3
        j = (k - 1) % 3
        return x[i]*y[j] - x[j]*y[i]
Note that there’s a helper function inside the last function (the cross product), and it comes after the return statement. This works because all def statements in a code block are executed before anything else. I added this feature because it’s a pain in the ass that Python forces me to bury my main code under a bunch of defs (Java’s nice about this).
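For comparison, here’s roughly what the same four operators look like in today’s Python, using a Vector class and operator overloading (the class and its use of ** for the cross product are my own choices for illustration):

class Vector(list):
    def __add__(self, other):                 # element-wise addition
        return Vector(a + b for a, b in zip(self, other))
    def __neg__(self):                        # element-wise negation
        return Vector(-a for a in self)
    def __mul__(self, other):                 # dot product
        return sum(a * b for a, b in zip(self, other))
    def __pow__(self, other):                 # cross product (3-D only)
        def helper(k):
            i = (k + 1) % 3
            j = (k - 1) % 3
            return self[i] * other[j] - self[j] * other[i]
        return Vector(helper(k) for k in range(3))

x, y = Vector([1, 0, 0]), Vector([0, 1, 0])
print(x + y)     # [1, 1, 0]
print(x * y)     # 0
print(x ** y)    # [0, 0, 1]

Notice that in Python the helper has to be defined before the return, since defs aren’t hoisted the way I’m proposing.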
Also note that all of these functions contain list comprehensions. I’ve always thought they were a cool feature in Python, and in my new language I want them to run on the GPU. Scientists are looking for ways to compute more and more things on GPUs; GPUs make numerical algorithms lightning fast, but the popular languages don’t have built-in support for them, so they’re clunky to use. I want my language to support GPU computation out of the box. More on GPUs and mathematical features in a future post.