Why Loops Don’t Always Have to Be Loops: A Case for Map, Filter, and Fold/Reduce

Whether you learned how to program in school, or on your own, chances are you’ve at least seen loops (for, while). In fact, its more probable that you’ve used them extensively in schoolwork, side projects, work, etc.

One common example (in Java) is as follows:

// Code to find the outputs of the function
// f(n) = 3n^2 + 1, where n = 1,2,3,4,5
// input: arr = [1,2,3,4,5]
for (int i = 0; i < arr.length(); i++) {
arr[i] = 3*arr[i]*arr[i] + 1;
}
// output: arr = [4,13,28,49,76] 

This is an instance of a fairly common pattern in programming: you have some data structure containing some data, and you want to apply some transformation on everything contained in it. Sometimes, if we’re lucky, the transformation is a one liner, like above. Other times, the transformation is a little more complicated, and the code will grow more and more dense rather quickly.

Introducing Map

This pattern of applying some function on every element in some data structure can be encapsulated in a feature that is becoming more and more prominent in many modern programming languages (such as Python, Scala, Haskell, etc.): map.

Let’s take a look at the map function for lists in Haskell, a personal favorite:

map :: (a -> b) -> [a] -> [b]

This is map’s type signature, indicating what the inputs of the function will be, as well as the outputs. The first input is a function, which takes an element of type a and outputs an element of type b, where a and b are basically any types you want (Ints, Floats, Strings, you name it). The second input is a list, with elements strictly of type a. The output is a list with elements strictly of type b.

Just from looking at this signature, it should be clear that map does the obvious, if not only thing it can actually do: apply the function to every element in the list. But wait, that’s exactly what we did earlier in the Java example! Let’s check the code out in Haskell:

f :: Int -> Int
f n = 3*n^2 + 1
g :: [Int] -> [Int]
g lst = map f lst

Well, that’s less characters than I’m used to when I’m coding in Java! Code golf aside, this example seems rather self explanatory: f is a function on ints, which applies a formula to them, and g is a function on a list of ints, which “maps” that function f to every element in the list.

One advantage of the map construct is clear: Once you know what map does, every time you use map in your code, you can almost immediately figure out what the code is doing. We’ll come back to this idea later.


Let’s look at another Java code sample, why not?

// Code to filter out valid email addresses from
// an array of strings, generated by user input.
// Just in case the users dun goofed.
// For simplicity of example, we’ll just check if it has at least has one ‘@’ symbol.
// arr = [“this is an email address.”,
// “thisIsNotAnEmailAddress@email.com”,
// “whyJava@whyDisLanguage.net”]
// result is an arrayList of valid email addresses
for (int i = 0; i < arr.length(); i++) {
String str = arr[i];
Boolean isValid = false;
for (int j = 0; j < str.length; j++) {
char ch = str.charAt(j);
if (ch == ‘@’) {
isValid = true;
}
}
if (isValid) {
result.add(str);
}
}
// result = [“thisIsNotAnEmailAddress@email.com”,
“whyJava@whyDisLanguage.net”]

You do not have to read all of that if you don’t want to, and you probably don’t. All you need to know is that this code looks through an array of strings, and tries and find valid email addresses, albeit through a very loose interpretation of valid emails.

Despite this, the code has gotten a little verbose already, and even if the code is simple in design, it takes more time than it should for the layperson to understand. In other words, spending even a few minutes trying to parse through this is a waste of time, yet you might have to, if the comments weren’t there in the first place.

Now, one could argue that this is why documentation exists in the first place, and this is true. However, I am under the belief that code that is simple should look simple, and that the code itself should give some intuition on what it’s trying to do, for the reader’s convenience (easier said than done, I know, but it is something to strive for!).

Moreover, this is another common pattern in programming: looking through a data structure full of stuff, and taking the stuff we need. That should be a no brainer for people to figure out, ideally.

Enter Filter

What we need is a programming construct called filter, and again, I think it will be informative to look at its type signature in Haskell:

filter :: (a -> Bool) -> [a] -> [a]

So, the first input is a function, which takes some element of type a, does some tests on it, and returns True or False if it passes. The second input is a list of elements of type a. The output is the list of elements from the input that passed the test the input function provided.

Again, it’s crystal clear what the implementation of filter is doing, just from the type signature: it takes a function which tests for some success metric, applies it to every element in the list, and returns the elements that passed the test.

Now that we have this abstraction in our toolbox, let’s see what the above Java code would look like in Haskell:

validEmails :: [String] -> [String]
validEmails lst = filter isValid lst

Again, the actual function finding which emails are valid is super easy to read and understand, once you know how the filter function operates. You may be wondering what the function isValid is. Well, we’re about to get to that right now.


How do you test if an email is valid? In our simplified version, we just checked if there was at least one “@” symbol in it, and if there is, then it’s a valid email address. Let’s revisit the portion of our Java code that took care of this:

// str is a possible email.
// isValid is initially set to false.
// If it has an "@" symbol, it is a valid email
for (int j = 0; j < str.length; j++) {
char ch = str.charAt(j);
if (ch == ‘@’) {
isValid = true;
}
}

This for loop alone isn’t that bad actually, once you understand the context, especially what the inputs str and isValid are. The thing is, it’s nested inside another for loop, which makes the code look more complex and dense, when it actually is not.

Also, this is an example of yet another pattern that is common in programming and can (should) be abstracted away: fold (or as some call it, reduce).

Enter Fold/Reduce

You know the drill by now:

foldl :: (b -> a -> b) -> b -> [a] -> b

This actually looks a little more complicated, but it’s not that bad, I promise. There are three inputs this time.

The first input is a binary function, taking in two inputs of types b and a respectively, and outputs a value of type b.

The second input is a value of type b. For reasons that will be explained soon, this value is called the accumulator.

The third input is a list of elements, each of type a.

Finally, the output is just a value of type b.

Now, what the heck is going on here? Well, the reason the input of type b is called the “accumulator” is that it accumulates information from every element in the input list. Basically, you first apply the input function on the accumulator and the first element of the list, then you get a new accumulator.

With the new accumulator, you apply the input function on that, and the second element of the list, and get a new new accumulator. And so on until you reach the end of the list, and then you get to output the new new new new … new new new accumulator!

In case it’s not clear, this is exactly like iterating through an array and getting information from each element, and storing that information in a variable! Kinda like how we did with isValid in the Java for loop. Time for some Haskell:

-- "_" is a catch all character, saying that specific input is 
-- irrelevant to the output of the function at that case.
-- There are two cases for this function, and each line of
-- code represents a different case.
f :: Bool -> Char -> Bool
f _ "@" = True
f acc _ = acc
isValid :: String -> Bool
isValid str = foldl f False str

So here’s what’s happening: our accumulator is the boolean value False, so we’re assuming the string is guilty of not being an email (until proven innocent). f is a function that, on most characters, will just return the input boolean, no questions asked. On the other hand, finding that one “@” guarantees the function f will always return True.

So, isValid applies f to the accumulator, False, and every character of the String, and if none of the characters are an “@” symbol, is Valid will overall be False. Otherwise, if one “@” is found, it will be true, so isValid works for our purposes.

Let’s do a concrete example: we’ll apply isValid to “st@ring.com”.

isValid "st@ring.com"
= foldl f False "st@ring.com"
Since f False 's' = False, now we evaluate
foldl f False "t@ring.com".
Since f False 't' = False, now we evaluate
foldl f False "@ring.com".
Since f False '@' = True, now we evaluate
foldl f True "ring.com".
Since foldl now has True as its accumulator, and there's nothing in the rest of the string that can make it false, 
isValid "st@ring.com" is true, so "st@ring.com" is a valid email.

We’ve seen three examples of common programming patterns, and we’ve put three powerful abstractions to help simplify the code used for them. To recap:

Map allows us to apply a function on every element in some data structure of uniform type.

Filter allows us to check a data structure for elements that pass some criteria, and return those only, while ignoring the rest.

Fold/Reduce allows us to use some function and some starting value (accumulator) to extract some information from every element in some data structure, and return the new value which has accumulated all of that information.

There are three very general, useful constructs that simplify a lot of iterative code into more readable code, as shown earlier. There might be a bit of a learning curve getting used to these abstractions, as opposed to traditional imperative programming, but after that’s over, there will be several ideal outcomes.

First, your code will look a lot more like what your design initially intended, rather than being obfuscated by the noise/boilerplate that loop constructs inevitably create.

Second, while loops can be very useful, every one looks different, even if two of them are doing very similar things. This leads to wasted time for the person trying to debug your code, and using a very common one liner like map, filter, or fold/reduce (assuming the function is nearby), will make your code that much clearer.

Lastly, you might have noticed that using these constructs tended to make our code more modular, by separating the function that did that one specific thing from the function that passed over some data structure. This is a good thing, because it makes your code a lot easier to write, read, debug, understand, and anything else you want to do with it.

While I did all of my examples for these constructs in Haskell, these ideas can be found in other, more mainstream languages like Python, Scala, and more, so if you ever have a language that supports these features, use them and reap the benefits. It’ll save some headaches from everyone, and help you think about your code in a different way, which is always a good thing.

In short, whenever you see these common patterns, use the best tools available to deal with them.