An Algorithm for Finding Anagrams
Finding anagrams of words does not look like a difficult problem but has an interesting solution.
This article has been updated. Find the updated version here:
An anagram is a word or sentence that can be transformed into another word or sentence. Elvis has all the same letters as Lives, so Elvis is an anagram of Lives.
The way most people would immediately solve this problem would be to take a word, go through every word in the dictionary and see if the combinations of letters match exactly.
The way to do this will use a multiset. Sets are like arrays where the order doesn’t matter and repetitions aren’t allowed. With an array, [a, b] is not the same as [b, a]. But with a set, (a, b) is the same as (b, a).
Set’s don’t allow repetitions. So (a, a, a, a, a, b) is the same as (b, a) — because the first set would be turned into (a, b).
A multiset is a set which allows repetitions but the order doesn’t matter. For this example, let’s start small.
With our example, we have a list called a dictionary where each item in that dictionary is a word. We want to find out which of these words are anagrams of “Elvis”, so what we do is for loop through the dictionary like so:
You’re right. This is really slow. It would work, but we have 2 problems. The first problem is that capitalisation won’t make the multisets equal. The second problem is that more spaces can appear in the outputted sentence than in the original word.
As an example, “roast beef” is an anagram of “eat for BSE”.
Given 2 sentences such as “roast beef” and “eat for BSE”, if we turned these into multisets they wouldn’t be equal due to the differing amount of spaces. There’s another way we can calculate if 2 sentences are anagrams of each other. The fundamental theorem of arithmetic states:
Every integer either is a prime number itself or can be represented as the product of prime numbers and that, moreover, this representation is unique, the order of the factors.
Prime numbers are numbers which only have 2 factors — the number itself and 1.
If we assign each letter in the alphabet to a prime number, like so:
And so on, then compute the product of these numbers this number is unique — because of the fundamental theorem of arithmetic.
That means that for a multiset of letters, the product of prime numbers for each letter in that multiset is unique. If two words or sentences have the same number, these two words or sentences are anagrams of each other. Let’s see a quick example, is “BAC” an anagram of “A Bc”?
Determining which words are anagrams of other words would be as simple as creating a dictionary of {Word: Prime factorisation} and then grouping all the prime factorisation up.
Now, given 2 sentences we can easily tell if they are anagrams of each other.
Did you like this article? Connect with me on Social Media to discuss all things computer science related 😁
Don’t forget to click that 👏clap👏 button to show your appreciation!
I didn’t get paid for writing this article. If you want to support me or enjoyed this article, feel free to buy me a cup of tea or something below 😍✨