Scala Short Take: Will it Fold?
A tale of procedural code made functional
Background
For a reason that’s not important, we wanted to generate all versions of a multiword string (we’re Rue Gilt, so it’s a retail brand name).
Words, their initials, followed by “.”, space, or nothing at all. So “Jones New York
” generates that string intact, through “J New.York
” to “J N Y
” and everything in between. We think there are 3³ * 2³ = 216 variations.
We were in a hurry so just wanted to knock something out that worked. We started with loops and quickly got into special cases and off-by-one errors. But we soldiered on regardless. The code looked like
def extractFirstLetter(line: String) = {
var words = line.split(" ")
val fl = words.map(_(0).toString)
words ++ fl for (i <- 0 to fl.size - 1) {
for (j <- i + 1 to (fl.size)) {
if (fl.slice(i, j).size > 1)
words = words :+ fl.slice(i, j).mkString("")
}
} words
}
A few things we’re wondering
- Is it correct?
- Could we explain it?
- Why the
var
? - What does the
words ++ fl
do? - Can we make it for functional?
Let’s clean it up
We think there’s a fold
or reduce
in there somewhere. There is. Here’s how, with the steps to derive it along the way.
First let’s make functions for things, including one that contains the for
loop with a slight change to clean it up.
def allCombos(remaining: Seq[String]): Seq[String] = {
def safePrefix(word: String) =
if (word.size == 0) "" else word.substring(0, 1) def variations(word: String) =
Seq(word, word + ".", word + " ", safePrefix(word)) def expand(word: String, tail: Seq[String]) =
for (w <- variations(word); t <- tail) yield Seq(w + t.mkString)
}
(The lines of code wrap around, but we’re looking for a different kind of fold!) In any case this looks like a nice encapsulation of some of the operations we’ll probably need.
safePrefix
gets the initial letter of a word if it exists, or ""
if it doesn’t. variations
uses this result and returns word
with some suffixes. expand
takes a word
and returns all the word variations (in w
) along with all the remaining combinations. We’re going to arrange to pass the current word (in, well word
, which we totally should have called head
) and the remaining words in the sequence tail
. The for
is just nested loops, and so a bit unsatisfactory, but we’ll get to that later.
Next task is to put these new functions together. We’ll make a function that traverses the list of source words in the input, peeling them off one at a time.
remaining.size match {
case 0 => Seq("")
case _ => {
val tail = allCombos(remaining.tail)
expand(remaining.head, tail).flatten ++ tail
}
}
We made it messier because we have to check for an empty list. We can’t avoid putting somewhere: Either as we have it above or guarding the recursive call. And while the recursive call itself is OK to look at, all the tail
and flatten
tomfoolery is a bit much. And that would ruin our chances at being “provably correct by inspection” a thing we value highly at Rue Gilt Groupe.
In any case, now we call the entry point and print the results
allCombos(words).foreach(println)JonesNewYork
JonesNewYork.
JonesNewYork
JonesNewY
It gives us what we want, along with duplicates, which we don’t quite understand. But we could easily throw them into a Set
to dedup them.
Now let’s get fold-y or reduce-y
We’ll start with Seq("")
, a list of a single empty string. Then we’ll add all the word combinations for each source word as a prefix to all the previously created strings.
In other words, we keep expanding all the different possibilities so far with the variations of each word. And so on, until we’ve worked our way to the first word. Simple enough to say, but how do we implement it?
val words = Seq("Jones", "New", "York")def safePrefix(word: String) =
if (word.size == 0) "" else word.head
def variations(word: String) =
Seq(word, word + ".", word + " ",
safePrefix(word), safePrefix(word) + ".")
def combine(word: String, tail: Seq[String]) =
for (w <- variations(word); t <- tail) yield Seq(w + t.mkString)
def expand =
(word: String, tail: Seq[String]) => combine(word, tail).flattenval everything = words.foldRight(Seq(""))(expand)everything.foreach(println)
That .foldRight
expresses just the right thing: start with Seq("")
, and compose all the variations of each word to the right until we’ve done the last one (on the left).
So why .fold
and not .reduce
? Two reasons. .reduce
does not work on empty lists because it requires at least one element, the head of the list. We don’t have an empty list (this time), but we want our code to be robust.
So we need .fold
for that reason, which works because it takes a starting value. Note in .fold
the function can produce a different type to the elements of the list, whereas .reduce
variants can not: Everything must be the same type. This is another good reason to use .fold
.
One other thing, there’s an ungainly .flatten
in there, which is not only ugly but we feel sure can be done a better way.
But is that enough?
Not really, if we’re honest. There’s still a nested loop in there, even though the Scala syntax makes it expressive and compact.
Also, this blog post is about fold and reduce, so we need to go full beans on making that happen.
Start with some declarations, a bit cleaned up this time and more general.
val words = Seq("Jones", "New", "York")def safePrefix(word: String) = word.headOption.getOrElse(' ')
def suffixes(word: String) = Seq(word, word + ".", word + " ")
def variations(word: String) = suffixes(word) ++
suffixes(safePrefix(word).toString)
safePrefix
now returns Char
we got with .headOption
. We have to convert this to a String
somewhere. Not really sure of the best place. Now we have to figure out how to write a new combine()
function that we can substitute for the old one.
We can do it in a one-liner, but this looks arguably cleaner, and so more likely to be correct and easier to explain.
def combine(word: String, tail: Seq[String]): Seq[String] = {
def concat(t: String) = tail.map(_ + t) variations(word).flatMap(concat)
}
The _ + t
is a bit mysterious until you know Scala. We could have written this as w => w + t
, which you could argue is clearer. As a bonus, we removed the recursive call and made it recurrent. We don’t need the return type, but write Seq[String]
for clarity. Good IDEs will show this if you hover over the function or value. In any case our driver code is still
val everything = words.foldRight(Seq(""))(combine)everything.foreach(println)
Which when we run it delivers the following and we’re done.
YorkNewJones
York.NewJones
York NewJones
YNewJones
Y.NewJones
Y NewJones
...
Wait what? The words and variations are in reverse order. What happened? And how do we fix it?
With the help of some println
debugging code, we find the .foldRight(concat)
is applied over the source words in the wrong order.
We broke it. Can we fix it?
Sure we can. Simply using a .foldLeft
should do the trick.
Grrrr…. What’s going on. We look at the signature of .foldRight
and it’s
def foldRight[B](z: B)(op: (A, B) => B): B = ...
meaning type we start with, B
, a Seq[String]
, which is the second argument to op
, the function combine
in our case. And we return a B
, Seq[String]
.
The function must work with the first argument of the type A
, a String
here. So it makes sense to us now that the words get appended in the wrong order, from the right.
Let’s see whether .foldLeft
is the solution, appending source words in the correct order. Along the way, we can find the source of the error above.
The signature of .foldLeft
is
def foldLeft[B](z: B)(op: (B, A) => B): B = ...
B
is still Seq[String]
, the initial value and the return type of the .foldLeft
. But the order of arguments to op
is different. Why the complexity? Ignoring the specific implementation of op
for a moment, let’s rewrite .foldRight
as
op("York", op("New", op("Jones", Seq(""))))
and .foldLeft
as
op(op(op(Seq(""), "Jones"), "New"), "York")
The types of the arguments to op
are reversed. An easy fix. Just change the signature of combine
to
def combine(tail: Seq[String], word: String): Seq[String] = ...
The upshot
Was it worth it? We think so. We learned how to write iterative functional code to replace explicit loops. (It took considerably less time to do the work than write about it!) It’s easier to explain and “prove correct by inspection”, something we said we shoot for.
Also we removed the var
, which is always really nice. var
is a bad smell in code and we try to avoid it at all costs. Mutability encourages procedural code. We want functional.
Thanks for reading. There’ll be more on Scala in a bit.