We should thank Sanskrit for the 21st century

We live in the age of information. Data is generated or gathered all around us, and processed to become information that has value. The processing of this information requires logic, which has to be formulated in structured ways. When we first learn the rules of logic in a high school math class, we learn them through the medium of whatever language we speak. Think “Kids, today we’re learning about the contrapositive. Take the following sentence: ‘If something is a square, it is a quadrilateral.’ The contrapositive is ‘If something is not a quadrilateral, it is not a square.’” This is how we are exposed to the rules of logic: through the language we speak (in this case English).
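The classroom example can also be checked mechanically. The sketch below assumes the usual reading of "if p then q" as material implication, and the function names are made up for illustration; it tries every combination of truth values and confirms that a statement and its contrapositive always agree.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' fails only when p is true and q is false."""
    return (not p) or q


def contrapositive_matches() -> bool:
    """Compare 'if p then q' with 'if not q then not p' for every truth assignment."""
    return all(
        implies(p, q) == implies(not q, not p)
        for p, q in product([False, True], repeat=2)
    )


# p = "it is a square", q = "it is a quadrilateral"
print(contrapositive_matches())  # prints True
```

Four cases are enough here because two true-or-false statements can only be combined in four ways.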

The basis of all higher-level logic is Boolean logic. This is easy enough to understand via the language that we speak. However, to build more complex logic on top of this foundation, human language becomes an inadequate tool. Human languages lack a robust structure, are in constant evolution, and are prone to having their rules broken often and with no shame. Such a medium does not stand up to the requirements of creating abstract logic.

It therefore seems obvious that any programming language capable of expressing high-level abstract logic would have to differ quite a bit from fickle, unstable human languages. This dichotomy is usually framed as natural versus formal languages. Natural languages are those that humans have evolved to communicate with each other and facilitate society and culture. Formal languages are those that are not a product of human evolution but of careful planning and codification.

Incredibly, wonderfully, Sanskrit is both.

Sanskrit has been around for some three to four millennia: the oldest text we have in Sanskrit is the Rig Veda, dating from 1700-1100 BCE. The Rig Veda is a sacred text, a Vedic collection of wisdoms. However, these wisdoms could not be imparted solely through the meaning of the text: the sounds, pitch, and tonality were equally important. Thus both qualities of the text (semantic and vocal) had to be pristinely transmitted from Brahmin to Brahmin, generation to generation, to maintain the integrity of the wisdoms.

To meet this need, a Hindu grammarian by the name of Panini dedicated his life to composing the Ashtadhyayi around 500 BCE. In this monumental text (often credited with creating the fields of descriptive and generative linguistics), Panini set out to create a “complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure”. I hope you’re starting to see some parallels.

In his magnum opus, Panini created and codified 3,976 rules that prescribe the generation of Sanskrit words and sentences from roots. The roots are derived from phonemes and morphemes. A phoneme is any distinct unit of sound (e.g. in English, “p” vs “b” vs “t” vs “d”). A morpheme is the smallest grammatical unit in a language (e.g. “unhappiness” is made up of the morphemes “un-”, a bound morpheme and the prefix, “happ[y]”, a free morpheme and the root, and “-ness”, another bound morpheme and the suffix).

Panini also provided a list of all Sanskrit phonemes. Before we get into the truly delicious parts of his grammar, let’s see what we’ve got so far. We have lexical inputs in the form of phonemes. And we have an algorithm (roughly 4,000 rules) that allows for precise generation of clear and well-defined words. So using Panini’s text, one could conceivably take an input, process it with the rules, and output a legitimate Sanskrit word. In fact, this is exactly the case. One could take a phoneme long out of fashion, combine it with other such phonemes, and as long as one follows the rules, the output will be a legitimate Sanskrit word, even if that word hadn’t been heard in millennia, or ever. That’s the power of Panini’s systematic grammar.
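To make the "input, rules, output" idea concrete, here is a deliberately tiny sketch. It is not how the Ashtadhyayi states its rules, and the table and function names are invented for illustration. It encodes just two well-known Sanskrit sound-joining patterns (a final "a" merging with an initial "i" into "e", and with an initial "u" into "o"), written without diacritics as in the rest of this piece.

```python
# Toy rule table (hypothetical encoding, nowhere near the real ~4,000 rules):
# (last sound of the left part, first sound of the right part) -> merged sound.
JOIN_RULES = {
    ("a", "i"): "e",
    ("a", "u"): "o",
}


def join(left: str, right: str) -> str:
    """Join two parts, applying a merge rule at the boundary when one matches."""
    if not left or not right:
        return left + right
    merged = JOIN_RULES.get((left[-1], right[0]))
    if merged is None:
        # No toy rule applies, so fall back to plain concatenation.
        return left + right
    return left[:-1] + merged + right[1:]


print(join("deva", "indra"))   # prints devendra
print(join("surya", "udaya"))  # prints suryodaya
```

Even at this scale, the rules never mention specific words. They only mention sounds, so they apply equally to combinations nobody has tried before.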

Panini also developed a meta-linguistic system for referring to entire classes of phonological segments with just one syllable (hmm, a class that can be instantiated with a variable…). He also created another metalanguage that allowed him to speak precisely and unambiguously about the language he was analyzing.
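The one-syllable class names work roughly like this. Panini's sound list is arranged in short rows, each closed by a marker sound that is not itself part of the class. Naming a starting sound plus a marker picks out everything from that sound up to the marker. The sketch below covers only the first four rows (the vowels) of that list, traditionally called the Shiva Sutras. The Python names and the capital-letter convention for markers are my own assumptions for illustration.

```python
# First four rows of the sound list (vowels only); each row is (sounds, end marker).
# The data layout is an illustrative assumption, not Panini's own notation.
SUTRAS = [
    (["a", "i", "u"], "Ṇ"),
    (["ṛ", "ḷ"], "K"),
    (["e", "o"], "Ṅ"),
    (["ai", "au"], "C"),
]


def pratyahara(start: str, marker: str) -> list[str]:
    """Return every sound from `start` up to the row closed by `marker`."""
    collected: list[str] = []
    started = False
    for sounds, end_marker in SUTRAS:
        for sound in sounds:
            if sound == start:
                started = True
            if started:
                collected.append(sound)
        if started and end_marker == marker:
            return collected
    raise ValueError(f"no class runs from {start!r} to marker {marker!r}")


print(pratyahara("a", "C"))  # all nine vowels in this list
print(pratyahara("e", "C"))  # prints ['e', 'o', 'ai', 'au']
```

So "a" plus the marker "C" (the syllable "ac") names the whole class of vowels, which is how a grammar rule can talk about "any vowel" in a single syllable.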

There are multiple simultaneous planes of abstraction in Panini’s grammar. And here lies the true innovation: Panini paradoxically constructed a system that can be both extremely precise and open to infinity. By infinity I’m referring to the fact that one can use Panini’s finite set of rules and meta-rules to create an infinite number of grammatically, syntactically, and semantically correct words and sentences. The deterministic precision and interminable magnitude form such a powerful basis that it underlies John Backus’s Backus Normal Form (BNF, now more often called Backus-Naur Form), a notation for formal grammars that Backus, a computer language design pioneer, allegedly discovered independently. This is a system of metalinguistic formulae that Backus introduced in 1959 to specify the syntax of the higher-level languages (initially ALGOL) that were abstracting away the low-level computer languages of the 1950s (machine code and assembly).
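How can a handful of rules be "open to infinity"? The trick is recursion: a rule may refer to itself. The sketch below uses an invented three-rule English toy grammar (neither Panini's nor Backus's), written in BNF style in the comments and as a Python dictionary in the code, and counts how many sentences it yields as deeper nesting is allowed.

```python
# BNF-style toy grammar (made up for illustration):
#   <sentence> ::= <noun> <verb> | <noun> <verb> "and" <sentence>
#   <noun>     ::= "birds" | "rivers"
#   <verb>     ::= "sing" | "flow"
GRAMMAR = {
    "sentence": [["noun", "verb"], ["noun", "verb", "and", "sentence"]],
    "noun": [["birds"], ["rivers"]],
    "verb": [["sing"], ["flow"]],
}


def expand(symbol: str, depth: int) -> list[str]:
    """List every string derivable from `symbol` with at most `depth` nested rule uses."""
    if symbol not in GRAMMAR:
        return [symbol]  # a literal word, nothing left to expand
    if depth == 0:
        return []  # out of nesting budget
    results: list[str] = []
    for alternative in GRAMMAR[symbol]:
        partials = [""]
        for part in alternative:
            expansions = expand(part, depth - 1)
            partials = [(p + " " + e).strip() for p in partials for e in expansions]
        results.extend(partials)
    return results


for depth in (2, 3, 4):
    print(depth, len(expand("sentence", depth)))  # prints 2 4, then 3 20, then 4 84
```

The grammar never grows, yet each extra level of nesting multiplies the number of valid sentences. With no depth limit the set is infinite, while every individual sentence remains exactly checkable against the rules.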

I say allegedly because of the history of Western linguistic theory and the fact that this theory directly informed how Backus constructed his language theory. As the Sanskrit scholar Murray Emeneau writes, “Most of the specific features that are taken… to distinguish an ‘American’ school of linguistics from others are Bloomfieldian, and … many are Paninean”. Leonard Bloomfield was a scholar of structural linguistics whose work greatly influenced the development of linguistic science in the 20th century, particularly in America. He studied Sanskrit as a graduate student at the University of Wisconsin and later in Germany. He summarized the impact of Panini’s work on modern linguistics as follows:

“Around the beginning of the nineteenth century the Sanskrit grammar of the ancient Hindus became known to European scholars. Hindu grammar described the Sanskrit language completely and in scientific terms, without prepossessions or philosophical intrusions. It was from this model that Western scholars learned, in the course of a few decades, to describe a language in terms of its own structure.”

Another scholar, Paul Kiparsky, asserts that “Western grammatical theory has been influenced by [Panini’s work] at every stage of its development for the last two centuries.”

Given that Backus predicated his formal theory on then-modern Western linguistic theory, it is not a stretch to claim that FORTRAN and all its siblings and scions in the realm of high-level programming languages owe their linguistic origins to a Sanskrit grammarian.

In 1967, a programmer named Peter Zilahy Ingerman wrote to the Communications of the ACM (Association for Computing Machinery) to argue that “since it is traditional in professional circles to give credit where credit is due, and since there is clear evidence that Panini was the earlier independent inventor of the notation, may I suggest the name ‘Panini-Backus Form’ as being a more desirable one?”

As a species, we have accomplished an inordinate amount (even by our standards) in the past half-century. Let us not forget that we stand on the shoulders of giants.