What is this thing called programming language?

Kornelije Petak
Oct 25, 2023


Photo by Fahrul Razi on Unsplash

🔗 This article is part of the series: What do programmers and computers do?
🔗 Previously: Between humans and computers: A tale of two notations

What is this thing called programming language? 🤔

It seems a bit weird to ask this question, but what is it actually?

Maybe you’ve been writing in one (or several) for a while. But have you ever stopped to consider why it’s called that? As I mentioned previously, it’s pretty helpful to pay attention to our mental models.

The best way to understand what the phrase means is to understand what each of the words means.

So, what is programming? And what is a language?

I’ll start with the latter because humans are much more familiar with languages than with programming.

At its core, language is a structured way of conveying information.

Structured. Convey. Information. These three words are essential. Let’s do a mental exercise to make it easier to understand why.

If I were to tell you

The astronaut orbits around the Earth.

you would (most likely) understand what I mean.

You would understand it because the sentence is structured and because you know what each of the words means. Vocabulary and grammar define that structure: vocabulary defines which words are available to me, and grammar defines how to arrange those words to convey the information I want.

If I were to say instead:

The Earth around the astronaut orbits.

you would recognize every word, but you would either get the wrong idea or struggle to understand. Even though I used all the correct words, I failed to arrange them the way my meaning required.

So, all three things are important: a structure to follow (grammar), information to convey (words and their meanings), and the intention to convey something in the first place.

Okay, enough about linguistics; how is this related to programming languages?

The point is that it is no accident that programming languages are called languages. They fit exactly the same definition: a structured way to convey information.

In the case of programming languages, the structure is defined by the language’s grammar (a word we often use interchangeably with syntax).

How about conveying information? The question is, what are we conveying, and to whom? And the answer is straightforward: we are conveying to a computer what we want it to do.

This is also the answer to our initial question: what is programming? At its core, it’s telling a computer what to do.

But how does a computer know all these languages (by some estimates, there are thousands of different programming languages and dialects)? How is it that regardless of which language I’m using, a computer knows what to do?

The following four programs, written in C, C#, Haskell, and Fortran, all do the same thing: print “Hello World”.

// C
#include <stdio.h>
int main(void) {
    printf("Hello World\n");
    return 0;
}

// C# (top-level statements, C# 9 or later)
System.Console.WriteLine("Hello World");

-- Haskell
module Main where

main :: IO ()
main = putStrLn "Hello World"

! Fortran
program helloworld
    print *, "Hello World"
end program helloworld

But how does a computer understand each and every one of them?

It’s really simple. Let me illustrate.

I don’t speak Spanish (although I understand some). So, if I visit Madrid and want to talk to a local who speaks only Spanish, I’m out of luck.

Well, unless I have someone who can translate between my language and Spanish. If that person knows both languages well, we can talk just fine (albeit more slowly, because of the translation overhead).

This is exactly what is happening with computers. Somebody (or something) is translating from all those thousands of languages into a language a computer understands.

Now, it’s important to note that when I say “a language that a computer understands,” what I really mean is “a language that a processor (CPU) understands.” Each processor implements a particular architecture, such as x86, x86-64 (also called x64), or ARM. Each architecture speaks a different language, although there are many similarities between them. The architecture specifies the CPU’s instruction set: the complete list of operations the CPU can perform.

For example, here’s an x86 snippet of code (without any meaningful goal):

B8 04 00 00 00    ; mov eax, 4
B9 03 00 00 00    ; mov ecx, 3
01 C8             ; add eax, ecx
B9 01 00 00 00    ; mov ecx, 1
01 C8             ; add eax, ecx
3D 08 00 00 00    ; cmp eax, 8
74 02             ; je short_label
B8 00 00 00 00    ; mov eax, 0

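The mov, add, cmp, and je after the semicolons are instruction mnemonics (assembly language, a human-readable notation for machine code). The hexadecimal numbers on the left are the bytes that actually encode those instructions: the machine code as it sits in memory, ready for the CPU to execute.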

So, if you want to create a new language, you also have to create something that will translate from that new language of yours into one of these CPU architecture languages (also known as machine code). And this something is a compiler.

Not all programming language designers are versed in CPU architectures, and writing a compiler that targets machine code directly is a lot of work. However, if some other language already has a compiler to machine code for the desired CPU architecture, they can write a compiler that translates their language into that intermediate language and let the existing compiler do the rest.

For example, say I speak English and want to talk to a Spaniard, but I can’t speak Spanish. Let’s pretend there is no one available who speaks both English and Spanish, so no one can translate for us directly. But if I can find someone who speaks English and French, and someone else who speaks French and Spanish, I can talk in English to the first person, who translates my words into French, and the second person then translates that French into Spanish.

With programming languages, translating source code from one high-level language into another like this is called transcompilation, or transpilation for short.

And then there’s another approach, in which a language is compiled into an intermediate representation called bytecode: similar to machine code, but at a higher level of abstraction. A separate program, called an interpreter, then reads through the bytecode and executes each instruction as it goes. A variation of this is just-in-time (JIT) compilation, where the bytecode is compiled into machine code while the application is running, rather than ahead of time as with traditional compilation.

And then there are more complex alternatives, but I think I’ll stop here.

So, this was a mouthful.

And why all of these philosophical thoughts about languages?

💡 I’ve covered all this because I want to create a small programming language as part of this series, and for that language to be useful, we have to understand what would make it useful.

As announced in the previous article in this series, I want to make a small point about notations. In a sense, a programming language, with its syntax and vocabulary, is a notation in and of itself.

It’s a notation in which instructions are given to a computer. As seen in the example of printing “Hello World,” different notations can instruct the computer to do the same thing.

The last point in the previous article stated that we must use the right notation for the right job. This implies that there are good and bad notations (programming languages) for a particular task. This is why I want to think about how to make a good notation — a good programming language.

But what is good in this context?

Let’s go back to our definition of language as a structured way to convey information. We need to consider what kind of structure (syntax) is good for the developer experience.

For example, the programming language Brainfuck has a structure (and is even Turing complete), but it’s not pleasant to work with:

+[-->-[>>+>-----<<]<--<---]>-.>>>+.>>..+++[.>]<<<<.+++.------.<<-.>>>>+.

So, to make a good decision about what syntax (structure) to use, it’s good to think about how we (humans) process language. As you’ll see, it’s pretty useful to pay attention to our mental models.

As far as the computer is concerned, the source language doesn’t matter; it only cares about the machine code. But code is written and read by humans (you as well as other people), so it’s important that it’s easy to comprehend and work with.

And when it comes to conveying information, I’ll have to decide what kind of information will even be available to be conveyed; in other words, I’ll decide what the language can do.

In the next post, I’ll make choices about what the language should look like and what its capabilities will be.
