An Alien Dictionary

Graphs (Part 1)

Sergey Piterman
Outco
5 min readMar 11, 2020

--

Imagine you stumbled across a list of words, written in an alien language you didn’t understand. You have no idea what the words mean, you have no idea how to pronounce them, and you don’t even know how many characters are in their alphabet.

The only piece of information I can give you about that list of words is that they are in alphabetical order. Your task is to figure out what that order is, based on that list of words given.

Now, you could try to solve this problem for a list of words using made-up characters like in the title image, but for simplicity’s sake, let's just use the English alphabet. So to be clear, we’re going to pretend that the aliens use the same letters as us, but they have a different alphabetical order.

Which shouldn’t be too hard to imagine, considering our alphabetical ordering of letters is arbitrary. Kind of like how all words are made up:

So, let’s get our inputs and outputs. Our function will take in a list of words (strings), and will output an array of characters (single character strings). Here’s an example:

Input:  words[] = {"baa", "abcd", "abca", "cab", "cad"}
Output: Order of characters is 'b', 'd', 'a', 'c'
Note that words are sorted and in the given language "baa"
comes before "abcd", therefore 'b' is before 'a' in output.
Similarly we can find other orders.

Input: words[] = {"caa", "aaa", "aab"}
Output: Order of characters is 'c', 'a', 'b'

This may seem like an odd problem with very few real-world applications, but there is one historical event that immediately came to mind when I first saw it, and that was the discovery of the Rosetta Stone.

It allowed us to decipher ancient Egyptian because it contained one message but written in 3 different languages, one of which was ancient Greek. Historians could then rebuild the lost language, and that gave us a way to peer back into the past. To me, that seems like the closest thing to deciphering an alien text there is.

But as an algorithm problem, I like the Alien Dictionary Problem for two main reasons.

First, the solution makes use of one of the most common data structures asked in interviews: graphs.

And graphs are hard for a lot of people at first. They are one of the later data structures we teach at Outco for a reason. It takes time to build up the necessary knowledge about other, simpler data structures and algorithms first.

There’s also a number of ways to implement graphs, each with their own set of tradeoffs and use-cases. In this series, I’ll explore the different kinds of graphs and their properties, and show how to derive them so you have a better intuition about what’s going on. The goal is to help you connect the dots (which itself is a graph metaphor), and see problems from a lot of different angles.

Now, we don’t have a Rosetta stone for this problem, but we also don’t need one. We’re not looking for the meaning of the words, we just want the order of the alien alphabet.

This alphabetical ordering is just useful information that is hidden within the list of words that we are given. It’s embedded within the input. And we’re going to use our understanding of graphs and their algorithms to massage that input to extract what we really want to know. We’re exploring an idea that lies at the heart of every good software problem: transformation.

But the second reason I like this problem is that it’s an intellectual exercise that dives deep into our understanding of language. If your native language is English, and only English, the idea of learning a language like French or Spanish may seem daunting. There are all these grammar rules, verb conjugations, accents placements and vocabulary differences to worry about, not to mention pronunciation and idioms you’d need to learn if you really want to fit in with the locals.

But fundamentally, they all use the same alphabet.

If you look at something like Russian or Ukrainian, you have a different, larger set of characters to play with: the Cyrillic alphabet. They have a few variations between each other, but they both use 33 letters instead of our 26. You might even be able to find some overlap with the English alphabet since originally they both come from ancient Greek and Latin scripts.

But their common origin means that the biggest commonality they share is that they are phonetic alphabets. Each symbol represents a sound.

This isn’t the case with all languages. Chinese and ancient Egyptian use logograms, where each symbol represents an idea. Ancient Sumerian and Mayan scripts used syllabaries, where symbols represented syllables.

Even the notion of alphabetical order can become complicated:

This all comes down to the fact that oral communication uses sound waves, which are continuous. When humans first started to transcribe these sounds into writing, it turned out there was more than one way to do that encoding.

To me, this is all fascinating, because communication, language, information, and encoding are at the heart of what we do as software engineers.

So give this problem a shot if you think you got it. It took me a while to wrap my head around it the first time I saw it, but once I did, I found it extremely gratifying. Here’s the link to the original source on Geeks for Geeks. They have a solution there, but I want to unpack it a little because even though it only has 3 steps, there’s a lot to know about HOW to arrive at those steps.

And if you have no idea where to start, stay tuned for the next part of the series ;)

It’s been a while since the last post, but I’m back! I have a couple of other posts/series I need to wrap up, so stay tuned for those too. I’ve had a lot going on in my personal life lately, so I haven’t had the extra bandwidth to write or work on as many side projects as I’d have liked. But hopefully, I’ll find a good rhythm now that a few things have been settled (just moved into a new place).

If you’re interested in Outco’s program, consider attending an info session. It helped me get to where I am now at Google, and you’ll be surrounded by a great community of engineers also looking for jobs. I can say knowing how to solve this problem specifically helped me with a similar one that I was given during an onsite interview I had with a big tech company.

--

--

Sergey Piterman
Outco
Editor for

Technical Solutions Consultant @Google. Software Engineer @Outco. Content Creator. Youtube @ bit.ly/sergey-youtube. IG: @sergey.piterman. Linkedin: @spiterman