on Higher Order Functions

‘Higher Order Function’ can be an intimidating term. I know it was for me the first time I heard it. Today, I want to take some time to simplify and understand the power and utility behind this concept.

We know that computers basically follow instructions. It’s kind of like having an assistant who only does exactly what you say, very, very exactly. So say you ask you assistant to add up some numbers for you. What is 3,153 + 5,678, you ask? It’s 8,831 he/she replies.

Now take a second to think about what happened. You gave your assistant some input, the values of which were explicitly available. Thus, the assistant performed a simple function and added the two numbers together.

Now, imagine you wanted your assistant to do a similar task, but get the numbers from two different people in your office, and then add them together to give it to you. So you write down the exact instructions to ask him/her go to another office in the building and obtain some specific numbers and add them together to get the first number. And again to another office to get the second number. Now you obviously don’t want to deal with the two numbers yourself, you only care about their sum. So you ask the assistant to sum together the values obtained through two other functions.

Notice how the the instructions in this case don’t have explicit values, but the values have to be derived through the execution of two other instructions. This is what could be loosely referred to as a higher order instruction, or in the world of code, a higher order function.

This isn’t an exhaustive explanation of the concept, but hopefully it simplifies the concept enough for further analysis.

Next, I want to take a look at some of the most commonly used higher order functions and their usefulness. Let’s start with the map function.

Say you have an array of numbers representing your sales revenues for a week, such as [5000, 6500, 4500, 6400, 2500, 4000, 1280], and you want to calculate the cumulative sales amounts for each day of the week. Let’s quickly assemble the code for this. First you write a function that holds the value of a number and then adds another number to this number:

Step 1: Declare a variable ‘i’ to track the position within the array, and set its initial value to 0.

Step 2: Design a function that holds an empty array variable, and takes two arguments (inputs) — a function called accumulate, and the daily sales array.

Step 3: Construct the accumulate function to have a variable of x=0 and to add to x any value given to it, and return x.

Step 4: Call the original function to process each value of the array with the accumulator function, and add the value to the output array.

During processing, the original function would start by passing to the accumulator the first value, and since the initial value was 0, it’ll get back 0 + 5,000 i.e. 5,000 and set it as the first element of the array. And it’ll repeat this process as many number of times as there are elements in the array. Eventually you will end up with a new array that holds the cumulative values of the original array. Congratulations, you’ve just designed a map function. Simple, wasn’t it?

Next, let’s look at the reduce function. Reduce is similar to Map in everything but how it stores the value. For the example above, if we didn’t need cumulative sales but the total sales of the week, i.e. all elements of our array added together, all we would need to change is a bit of step 2 above. Instead of holding an empty array variable to eventually output, we want to hold an integer variable. Make that change and voíla, you have a reduce function

The value of such functions doesn’t lie in what they can do, its usually in how much of it they can do and how fast they can do it. Map-reduce functions actually form a fundamental basis for big data analysis. Say you have a dataset of all the names of people in a country, and you want to figure out the most common names and get the number of people with each name.

Let’s see how we can get this done with map-reduce. First you’d use a map function to go through each name, and separate the first names from the dataset into unique arrays. You’ll probably end up with an array full of Jim’s and another full of Pam’s. Then you’d call on the reduce function to calculate the size of each of the array’s and store that information. Now putting these two functions together, you could very quickly compute the information in a matter of seconds. Again, it seems like a simple concept, but its power can be felt when it scales over data that would take human beings years of calculation to derive. For those looking for some more detail here’s a link to a paper written at Google by Jeffrey Dean and Sanjay Ghemawat where they talk in detail about how Google implements sap/reduce over large data clusters.

I hope you found this article helpful in understanding the value behind higher order functions. As always, I love receiving feedback on my writing, so if you have any comments, please feel free to leave them below.

Originally published at Code for Thought.