Perl 6 small stuff #18: applying permutations to an anagram challenge

The Perl Weekly Challenge has come to its fifth instalment. This time both challenges has to do with anagrams.

Challenge 1: Write a program which prints out all anagrams for a given word. For more information about Anagram, please check this wikipediapage.
Challenge 2: Write a program to find the sequence of characters that has the most anagrams.

Now — the challenge doesn’t mention whether the solutions are supposed to find one-word anagrams or multi-word anagrams. Later, on Twitter, told us to write one-word solutions, but at that point I’d finished my answer that supports both one and two-word solutions.


Challenge 1

Let’s start with the simplest scenario: one-word anagrams. Let’s say you have a word — wolf — and want to find its anagrams (it has only one, “flow”). A good starting point is to split the word into its parts, the characters, so that you can manipulate it to find the anagram. For the splitting you can use the routine .comb on the string. That returns a List with the characters.

Now — the List object has an interesting routine, .permutations. You won’t find an equivalent to this in Perl 5 without resorting to modules. This great function is just built-in to Perl 6, so let’s use it: A naive way of trying to find anagrams would be to apply permutations on the list.

> "wolf".comb.permutations
((f l o w) (f l w o) (f o l w) (f o w l) (f w l o) (f w o l) (l f o w) (l f w o) (l o f w) (l o w f) (l w f o) (l w o f) (o f l w) (o f w l) (o l f w) (o l w f) (o w f l) (o w l f) (w f l o) (w f o l) (w l f o) (w l o f) (w o f l) (w o l f))

And sure enough, “flow” is the first result! However, permutations are just that: Permutations. There is no guarantee that the permutation in question actually is a word. So we have to introduce a dictionary to check the permutations against.

On my Mac (macos 10.14.1), there’s a dictionary of a couple of hundred thousand words in the file /usr/share/dict/words. Many Linux distributions has the same, and for those distributions and operating systems which don’t, dictionaries are easy to find on line. In this example I assume the existence of that dictionary file.

What the code below does: Line 1 reads the dictionary and stores it in memory. I convert it to a hash where each word has the value True. This makes it easy to check a word’s existence: Just try to access the element in the %dict variable. If the returned value is True, then we’re OK.

Line 2 tries to figure out whether you provide words to the script through stdin (using a pipe) or as arguments on the command line. It then loops through the words provided and tries to find anagrams that matches (lines 3 and 4).

File: anagrams.p6
my %dict = "/usr/share/dict/words".IO.lines().map({ $_.lc => True});
for @*ARGS ?? @*ARGS !! ! $*IN.t ?? lines() !! Nil -> $w {
say "$w - $_" if %dict{$_} and $_ ne $w
for unique do for $w.lc.comb.permutations { $_.join };
}

There you have it: The solution to Challenge 1 is four lines of code (in reality three lines, because I added a newline for readability’s sake).

# Output:
$ perl6 anagrams.p6 fowl tornado
fowl - flow
fowl - wolf
tornado - odorant
tornado - donator

A couple of noteworthy things here: For some reason .permutations doesn’t guarantee that a permutation is unique. The returned list can therefore contain duplicates. So I use unique to filter out duplicates.

Note $*IN.t; the special variable $*IN is Perl 6’s name for STDIN. It has a few methods and properties among them the method .t. That returns True or False depending on whether the program can expect standard input to come from the terminal (tty) or through a pipe.

Note: If you prefer, you could rewrite line 3 so that no if’s or for’s are used at all. It would look something like this:

my $dict = "/usr/share/dict/words".IO.lines()>>.lc.Set;
for @*ARGS ?? @*ARGS !! ! $*IN.t ?? lines() !! '' -> $w {
$w.lc.comb.permutations>>.join.grep({ $dict{$_} and $_ ne $w }).map({ "$w\t$_\n" }).unique.join.say;
}

As you can see I’ve used >> (.hyper) here. That thing tries to run the .join in batches in parallell, and therefore potentially much faster. There’s no notable speed difference here, but I thought I’d start implementing it anyway so that it’s there when it can have effect.


Challenge 2

Now — how can we figure what letter combination returns the most anagrams? By starting with a script that begins exactly like the first — line 1 reads the dictionary file and stores it in a hash.

Line 2 declares a new hash. We will use this to create a kind of index with fast lookup, kind of a reverse dictionary. In line 3 I loop trough every word in the dictionary and sort the characters in the word alphabetically. This sorted version becomes the key of our index; every word corresponding to this key will be stored in the same element of the hash in a list of corresponding words.

With what we have now we try to figure out something about max anagrams, as the assignment calls for. Line 4 and 5 takes this reverse dictionary and sorts the keys in descending order of the number of corresponding words. Line 5 just checks the first in the list — i.e. the character combination with the most anagrams — and stores the number of anagrams in the variable $max-anagrams.

File: max-anagrams.p6
my %dict = "/usr/share/dict/words".IO.lines().map({ $_.lc => True});
my %rev-dict;
push %rev-dict{$_.comb.sort.join}, $_ for %dict.keys;
my @variation = reverse
sort { %rev-dict{$_}.elems }, %rev-dict.keys;
my $max-anagrams = %rev-dict{@variation[0]}.elems;
for @variation {
last if %rev-dict{$_}.elems != $max-anagrams;
say "$_ ($max-anagrams): " ~ %rev-dict{$_}.join(',');
}

Lines 6 through 11 iterates through the sorted list and prints some information of every character sequence with the same number of anagrams as the sequence we got the numbers for in line 5.

Line 7 makes sure that as soon as we iterate to a character sequence with less than the max number of anagrams, we break out of the loop and end the program.

# Output:
$ perl6 max-anagrams.p6
aelps (10): lepas,sepal,pales,spale,speal,elaps,slape,salep,lapse,saple
agnor (10): argon,orang,grano,rogan,groan,ronga,goran,angor,nagor,organ

That’s it. I guess somebody with a better understanding of Perl 6 than me could — perhaps — utilize some kind of parallelism to speed up the index generation. My implementation is fairly slow, so don’t be alarmed if the program seems to hang. It hasn’t. It’s just computing.

Added later: Somebody actually read this and tried to implement it with parallelism! See Parallel permutations by gfldex. It so fun to see that somebody reads what I write and also interacts with it.


Thanks again to Mohammad Anwar for running the Perl Weekly Challenge. I find it inspiring.

BTW, I have versions of the above that creates two-word anagrams as well. Those utilize the .combinations routine combined with set arithmetics, two great features that are built-in to Perl 6. Perl 6 is a BIG language, but it’s when you find features like the aforementioned permutations and combinations that you understand why. After a while you get to expect that if you think that something should have been supported by base Perl 6, it surprisingly often is! More stuff is possible without resorting to third-party libraries, and with consistent and fairly elegant design.

What’s not to love?