Perl 6 small stuff #7: Q: How many elements are “AAA”..”ABS”? A: 695 *and* 19!

During the months I’ve been on my Perl 6 journey I’ve occasionally bumped into differences between Perl 5 and 6. Most are understandable, but some are baffling — this one in particular.

If you asked your programming language to list every string from “AAAAAAA” to “ABRAXAS”, how many would you expect there to be?

Well, let’s try.

# Perl 5
my @a = "AAAAAAA".."ABRAXAS";
print $a[10100] . "\n"; # Output: AAAAOYM
print scalar(@a) . " elements.\n"; # Output: 19665535 elements.

# Perl 6
my @a = "AAAAAAA".."ABRAXAS";
say @a[10100]; # Output: ABEADAL
say @a.elems ~ " elements."; # Output: 16416 elements.

The sharp-eyed reader will notice something peculiar:

  • Perl 5 computes 19,665,535 elements, and states that element 10100 is “AAAAOYM”.
  • Perl 6’s answer is absurdly different. Perl 6 says that element 10100 is “ABEADAL”, and that the array only contains 16,416 elements.

Intuitively Perl 5’s answer is closer to what I expected. Perl 6 misses the mark by literally millions. Surely I’d stumbled upon a Perl 6 bug? Well, no. This is an area where Perl 5 and 6 are different on purpose.

After some thinking and testing, I found out what the difference is.

Correction: Several readers have pointed out that I misunderstood what Perl 5 actually did. You can read what I originally wrote in the note below. [1]

Perl 5 treats the range as a base-26 number system, i.e. one where A = 0 and Z = 25. It starts in the rightmost column and counts from A to Z, then moves one column to the left, increments A to B, and counts the rightmost column from A to Z again, and so on. This is easier to grasp when you see it, so here’s an example of “AAA”..“ACT”:

$ perl -MText::Wrap -E 'my @a = "AAA".."ACT"; $Text::Wrap::columns = 34; say wrap(" ", " ", @a);'
AAA AAB AAC AAD AAE AAF AAG AAH
AAI AAJ AAK AAL AAM AAN AAO AAP
AAQ AAR AAS AAT AAU AAV AAW AAX
AAY AAZ ABA ABB ABC ABD ABE ABF
ABG ABH ABI ABJ ABK ABL ABM ABN
ABO ABP ABQ ABR ABS ABT ABU ABV
ABW ABX ABY ABZ ACA ACB ACC ACD
ACE ACF ACG ACH ACI ACJ ACK ACL
ACM ACN ACO ACP ACQ ACR ACS ACT
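To make the base-26 reading concrete, here is a minimal sketch in Perl 6 that reproduces Perl 5’s element counts by plain arithmetic. The helper name base26 is my own invention for illustration, and it assumes both ends of the range are upper-case A-Z strings of the same length. It only mirrors the counting described above; it is not how Perl 5 implements ranges internally.

```perl6
# Hypothetical helper (not from Perl 5 or Perl 6 itself):
# read an upper-case A-Z string as a base-26 number.
sub base26(Str $s) {
    my $n = 0;
    for $s.comb -> $c {
        $n = $n * 26 + ($c.ord - 'A'.ord);   # A = 0, B = 1, ... Z = 25
    }
    return $n;
}

# Elements in the Perl 5 range = last - first + 1
say base26("ACT") - base26("AAA") + 1;          # 72
say base26("ABRAXAS") - base26("AAAAAAA") + 1;  # 19665535

# Index of a string within the range = its value minus the first value
say base26("AAAAOYM") - base26("AAAAAAA");      # 10100
```

The 72 matches the nine rows of eight strings printed above, and the last line shows why element 10100 comes out as AAAAOYM.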

Perl 6 does almost the same, counting from right to left like Perl 5, except that each column stops “counting” when it reaches the letter found in that column of the final word of the range (in my original example, “ABRAXAS”). This means that it first runs through [A-S] in the rightmost column. The next column to the left is fixed at A, so it jumps two to the left and starts counting [A-X], etc. Again, this is easier to understand when you see it. I use the range “AAA”..“ACT”:

$ perl6 -MText::Wrap -e 'say wrap-text(("AAA".."ACT").join(" "), :width(34));'
AAA AAB AAC AAD AAE AAF AAG AAH
AAI AAJ AAK AAL AAM AAN AAO AAP
AAQ AAR AAS AAT ABA ABB ABC ABD
ABE ABF ABG ABH ABI ABJ ABK ABL
ABM ABN ABO ABP ABQ ABR ABS ABT
ACA ACB ACC ACD ACE ACF ACG ACH
ACI ACJ ACK ACL ACM ACN ACO ACP
ACQ ACR ACS ACT

All of this means that Perl 6 generates only 16,416 elements for “AAAAAAA”..“ABRAXAS”, compared to Perl 5’s 19,665,535.
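The Perl 6 count can be checked the same way: if every column runs independently from its first letter to its last, the number of elements is the product of the column widths. This is only a sketch of that reasoning. The helper name column-count is hypothetical, and it assumes equal-length strings where each column’s end letter is not before its start letter.

```perl6
# Hypothetical helper: multiply the number of letters each column runs through.
sub column-count(Str $from, Str $to) {
    my $count = 1;
    for $from.comb Z $to.comb -> ($lo, $hi) {
        $count *= $hi.ord - $lo.ord + 1;   # e.g. A..S is 19 letters
    }
    return $count;
}

say column-count("AAA", "ACT");             # 60    (1 * 3 * 20)
say column-count("AAAAAAA", "ABRAXAS");     # 16416 (1 * 2 * 18 * 1 * 24 * 1 * 19)
```

The 60 agrees with the wrapped output above: seven full rows of eight plus a last row of four.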

Which one is right? I’m not sure. Intuitively, Perl 5 seems to do what I expect. But intellectually I think that Perl 6’s way is the correct one. What do you think?

I would like to hear from core developers if they know the reasoning behind the change in Perl 6.

Added later:

I think what follows first and foremost exposes that I haven’t understood how the smartmatch operator actually works. But the spirit of this blog is to also showcase my misunderstandings, so here we go.

Over in the Perl 6 group on Facebook, Ali Elshishini pointed out a strange behavior that may (or may not?) be a bug. If you use the smartmatch operator ~~ to check whether Perl 5’s element 10100 (“AAAAOYM”) is part of the range Perl 6 computes, the answer is True, even though that string is not among the elements Perl 6 generates. Again, this is easier to understand when you see it:

$ perl6 -e 'say ("AAAAAAA".."ABRAXAS").grep("AAAAOYM"); say "AAAAOYM" ~~ "AAAAAAA".."ABRAXAS";'
()    # result of the grep: AAAAOYM is not in the range
True # ...but the smartmatch operator says it is

I’m not at all sure what all of this means, but there seems to be some kind of inconsistency here. For all I know it’s on purpose, but if it is it’d sure be interesting to know why.

This inconsistency and/or error disappears if you convert the range to an array:

$ perl6 -e 'my @a = "AAAAAAA".."ABRAXAS"; say @a.grep("AAAAOYM");'
()     # result of the grep: AAAAOYM is not in the range

My gut feeling is that the smartmatch operator used on a range should work approximately the same way that .grep does (@loltimo pointed out to me on Twitter that, used on arrays and lists, the smartmatch operator doesn’t look for membership but equivalence).

If so, the problem lies in the Range class itself. I’m not an expert in the inner workings of the Rakudo Perl 6 code. But could it be that the problem lies in lines 378–381 of the Range class source code (version from August 25, 2018)? Here they are:

multi method ACCEPTS(Range:D: Mu \topic) {
    (topic cmp $!min) > -(!$!excludes-min)
      and (topic cmp $!max) < +(!$!excludes-max)
}

To be specific: I think this has to do with using the cmp operator. The cmp operator does, I believe, an alphabetic comparison. Here it’s used to compare the topic with the minimum and maximum values of the range. Let’s say the range we were comparing against was “AAAA”..“AXAS” (just to simplify it a little).

AOYM is NOT a part of that range. But if we use cmp to compare against the min and max value, you’d actually believe it is:

$ perl6 -e 'say "AOYM" cmp "AAAA"; say "AOYM" cmp "AXAS";'
More
Less

Viewed in isolation, the answer is that AOYM is less than AXAS and more than AAAA, i.e. that AOYM is within the range. But as we now know, AOYM is not part of the range the way Perl 6 computes it (in a way, ~~ treats the range as if it were Perl 5’s range). For the smartmatch operator to work as I expected, the method would have to check “AOYM” for equality against every single element in the range.
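For what it’s worth, a membership test that agrees with the elements Perl 6 generates would not necessarily have to walk the whole range. Here is a rough sketch that checks each column separately. The name in-columns is hypothetical, it is not part of Rakudo, and it assumes equal-length strings where every column runs upwards from the start letter to the end letter.

```perl6
# Hypothetical helper: True only if every character of $s lies between
# the characters in the same column of $from and $to.
sub in-columns(Str $s, Str $from, Str $to) {
    return False unless $s.chars == $from.chars == $to.chars;
    for ^$s.chars -> $i {
        my $c = $s.substr($i, 1);
        return False unless $from.substr($i, 1) le $c le $to.substr($i, 1);
    }
    return True;
}

say in-columns("AOYM", "AAAA", "AXAS");           # False: Y is outside the third column (A only)
say in-columns("ABEADAL", "AAAAAAA", "ABRAXAS");  # True
```

Whether the built-in smartmatch should behave like this is exactly the open question; the sketch only shows that the per-column view and the cmp view give different answers for the same string.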

So what to do? A way to avoid the whole thing is to flatten the range or convert it into a list, and then use grep. Hopefully this is interesting for one or two people other than me out there :-)

(This addition was so long that I’ve spun it off as a separate blog post. If you have comments on the addendum specifically, it’d be nice if you left them on that post.)

Notes:
[1]
Regarding the correction, this is the paragraph that was there in an earlier version of this post: “Perl 5 ‘counts’ character lists from left to right while Perl 6 ‘counts’ them from right to left. This sounds more cryptic than it actually is. Perl 5 finds every combination of Axxxxxx (where x is A-Z). Then P5 jumps to the next column and computes every combo for ABxxxxx (x is still A-Z), then AB[A-R]xxxx, then ABRAxxx, then ABRA[A-X]xx, etc.” Thanks to everyone who pointed out my flawed understanding.

About this post. After a few years away from programming, I’m trying to get up to speed again by learning Perl 6. This series is meant to be sort of a progress report, showcasing not only what I’ve learnt but also all of my misunderstandings and errors.