A Real-World(ish) Example of Great Developers Being Infinitely Better
Note: once you’ve read this, see the updated code in There’s Always Room For Improvement (a clearer solution to AoC Day 10)
In The Best Developers are Not Ten Times as Effective as Average Developers I said that the best developers can solve hard problems, while average developers might not, and “Divide ‘can deliver’ by ‘can’t’ and you get infinitely more effective.” Here’s a problem that illustrates the point.
Look-and-Say
The tenth day problem of the very entertaining Advent of Code (hat tip to @ericwastl) involves transforming strings per the Look and Say Sequence.
Briefly, look-and-say takes a string of numbers and transforms them by taking each consecutive string of like digits and “saying” it into the result. For example, “1” becomes “11” because there is one “1” in the original string. “112” contains two “1”s followed by one “2”, so it becomes “2112”. Continuing that pattern:
- 112
- 2112
- 122112
- 11222112
- 21322112
- etc.
Brute Force Solution
The tenth day problem gives an initial string and asks for the length of the string after 40 look-and-say transformations. In the rush of the competition, I wrote a simple function to apply the look-and-say transformation:
function lookAndSay S
put 1 into C
put char 1 of S into lastChar
repeat with i = 2 to length(S)
if char i of S is not lastChar then
put C & lastChar after R
put 0 into C
put char i of S into lastChar
end if
add 1 to C
end repeat
return R & C & lastChar
end lookAndSay
Nothing too complex, and it gets the job done. Give it the input string and apply it forty times, and it gives the solution of about 250,000 characters in about 0.7 seconds on my Mac.
A Slightly Harder Problem
The second part of the challenge is to apply the transformation 50 times. Using the same function fifty times gives the ~3,500,000 character answer in about 10 seconds.
Obviously if the challenge had asked for the length of the string after 60 transformations I would have had to wait for some time; if it had asked for 100 transformations, my solution would have failed completely.
Enter the Genius
This is where the brighter mind comes in: in this case in the form of John Conway, a mathematician and one of the smartest people alive (you thought I was the genius of this story?).
Conway worked out that the look-and-say transformation generally breaks down into 92 discrete sequences, which he named after the elements. Conway’s “elements” have defined transformations into other elements, as shown on this page by Henry Bottomley.
Elements Solution
Using Conway’s elements, it’s possible to transform the string much more efficiently. This code produces the same solution for 50 transformations in about 1 second, a 10x improvement:
local Eon mouseUp
put fld 5 into E -- a list of the transformations
split E using cr and tab
put "Po" into S -- starting string
repeat 50
put lookAndSayElements(S) into S
end repeat
put comma before S
repeat for each line L in fld 6 — a list of the lengths
replace (word 1 of L) with (word 2 of L) in S
end repeat
put sum(S) into fld 3
end mouseUpfunction lookAndSayElements S
repeat for each item i in S
put E[i] after R
end repeat
return R
end lookAndSayElements
Although it’s ten times faster, this uses the same method as the original solution: it generates the string (using the elements) and then determining its length.
Both solutions scale poorly: this new solution still won’t solve for 100 iterations, which produces a string about 2.5 trillion characters long.
Going Backward to Move Forward
If we want to solve for 100 iterations, we’ll need to get creative. The fundamental problem here is that both the above functions calculate the string’s length, which is the actual goal, by computing the actual string, which isn’t required.
It’s not possible to work forward from the initial string and calculate only the lengths: after the first transformation, the length alone won’t tell you what the next transformation is. However, it is possible to work backward and never store the actual strings, but still calculate the lengths.
A Backward Example
Suppose we start with the string 1322113, Astatine, and we want to know how long that string will be after five transformations. Working backward:
- We know that Mercury, 31121123222113, is 14 characters long.
- We know that Praseodymium, 31131112, is 8 characters long.
Now, because Thallium transforms into Mercury, and Neodymium transforms into Praseodymium, we can say that after one transformation, Thallium will be 14 characters and Neodymium will be 8 characters.
And we know that Lead transforms into Thallium, and Promethium transforms into Neodymium, so after two transformations, Lead will be 14 characters and Promethium will be 8 characters.
And we know that Bismuth transforms into Promethium and Lead, so after three transformations, Bismuth will be 14 + 8 = 22 characters.
Further, Polonium transforms into Bismuth, and Astatine transforms into Polonium, so after five transformations Astatine will be the same 22 characters.
Backward Code
This method requires tracking the length after N transformations for all the elements, in order to go back another transformation and calculate the length after N + 1 transformations, but there are only 92 elements/transformations to track per step. Here’s an example of code that does it:
local transformArray
local lengthArrayon mouseUp
initTransformArray
initLengthArray
put lengthArray into E
put the long seconds into S
repeat with i = 1 to iterationCount()
put reverseSeeAndSay(E) into E
end repeat
put startArray() into startArray
put 0 into totalLength
repeat for each key K in startArray
repeat startArray[K]
put bAdd(totalLength,E[K]) into totalLength
end repeat
end repeat
put totalLength
end mouseUpfunction reverseSeeAndSay E
repeat for each key K in E
repeat for each item i in transformArray[K]
put bAdd(E[K],R[i]) into R[i]
end repeat
end repeat
return R
end reverseSeeAndSayon initTransformArray
if the keys of transformArray is not empty
then exit initTransformArray
repeat for each line L in fld 5
repeat for each item i in word 2 of L
put (word 1 of L),”” after transformArray[i]
end repeat
end repeat
end initTransformArrayon initLengthArray
if the keys of lengthArray is not empty
then exit initLengthArray
put fld 7 into lengthArray
split lengthArray using cr and tab
end initLengthArray
Results
- The brute force method calculates 50 iterations in about 10 seconds.
- The elements method does the same 50 iterations in just under 1 second.
- The backward method calculates the same 50 iterations in about 0.075 seconds. So it’s over 13x as fast as the previous method.
Scaling
But that’s far from the end of the story. Both the first two methods generate the actual string at each iteration. The string length grows about 1.3x on average for each iteration, meaning that if the initial length is L, the length after 50 iterations will be about L*1.3⁵⁰. Therefore the overall efficiency of the first two algorithms is O(c^n), where n is the iteration count. Scaling exponentially is horribly inefficient.
The “backward” algorithm scales linearly, meaning that 100 iterations takes about twice as long as 50 iterations, and 200 iterations twice as long as that.
Conclusion
All three algorithms calculate correct results. The Advent of Code only asked for 50 iterations, so the brute force solution or the elements solution are both fine.
But if the Advent of Code had asked for the length after 100 iterations, either of the first two algorithms would have failed. Roughly, the brute force algorithm would take 2 months to complete (if my system were able to handle terabyte-long strings).
With the backward algorithm, 100 iterations takes about 0.15 seconds. Going further, it calculates 1,000 iterations in about 2.5 seconds, even though the length of the string after 1000 iterations is itself 117 digits long.
And that is why, for hard problems, a great developer is infinitely better than an average developer.
PS — The Code in this Article
Everything here was written in LiveCode. It’s a high-level scripting language with a built-in interactive interface toolkit, which makes it super-easy to work on problems like this. LiveCode is reasonably fast, but not a speedster language. Working in C would have made the brute force solution work through a few more iterations in a reasonable time, but even 100 iterations would have been impractical, taking well over an hour and a terabyte of storage.