Why functional programming is better suited for big data analysis

Programmers typically think of their logic as a sequence of steps. At any point in that sequence there is a state: the state of the program at time t1 is different from the state at time t2, and the values of variables change as execution progresses. This paradigm is called imperative programming.

This approach has negative consequences when we move to parallel programming. When several parallel tasks try to update a shared mutable variable, the results become non-deterministic. Functional programming is a way to solve this issue: it lets us avoid mutable variables to a large extent.
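To see the problem concretely, here is a minimal sketch (the object name RaceDemo and the iteration counts are illustrative) in which two threads increment a shared mutable counter without synchronization:

object RaceDemo {
  def main(args: Array[String]): Unit = {
    var counter = 0 // shared mutable state
    val threads = (1 to 2).map { _ =>
      new Thread(() => {
        for (_ <- 1 to 100000) counter += 1 // not atomic: read, add, write
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(counter) // non-deterministic; rarely the expected 200000
  }
}

Run it a few times and the printed value will usually differ, because the two threads' read-modify-write cycles interleave.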

Let us look at the two paradigms using a simple program that finds the factorial of a number. The code is written in Scala.

Imperative

def factorial(n: Int): Int = {
  var result = 1                  // mutable state
  for (i <- 1 to n) result *= i   // the state changes on every iteration
  result
}

Functional

def factorial(n: Int): Int = {
  if (n == 0) 1               // base case
  else n * factorial(n - 1)   // no variable is ever reassigned
}
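Because the functional version never reassigns a variable, calling it from many threads at once is safe. A quick sketch using standard-library Futures (it assumes the factorial definition above is in scope):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Each Future calls the pure factorial; with no shared state to guard,
// the parallel runs cannot interfere and the result is deterministic.
val results = Future.traverse((1 to 10).toList)(n => Future(factorial(n)))
println(Await.result(results, 5.seconds)) // List(1, 2, 6, 24, ..., 3628800)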

Why does the functional style make more sense for big data processing?

All the big data processing frameworks (for example, MapReduce and Spark) use parallel hardware to process huge volumes of data. Big data sets are usually terabytes in size and won't fit on one machine, so the data has to be split and distributed across many machines. Once our data sits across different machines, we need a paradigm that lets us think about accessing and processing it in parallel. The functional paradigm is a better fit here than the imperative style: because it expresses computation as side-effect-free transformations such as map and reduce, a framework can run those transformations on each partition of the data independently, which aligns perfectly with the big data situation.
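As a sketch of why this shape parallelizes, here is a word count written in the functional style over plain Scala collections (the sample lines are made up; Spark's RDD API exposes essentially the same map/groupBy/reduce shape over distributed partitions):

object WordCount {
  def main(args: Array[String]): Unit = {
    // In a real cluster these lines would be partitioned across machines.
    val lines = Vector("to be or not to be", "be here now")

    // Pure transformations: each step produces a new immutable value.
    val counts: Map[String, Int] = lines
      .flatMap(_.split("\\s+"))             // map phase: line -> words
      .groupBy(identity)                    // shuffle phase: group equal words
      .map { case (w, ws) => (w, ws.size) } // reduce phase: count per word

    counts.foreach(println) // e.g. (be,3), (to,2), ...
  }
}

Because every step is a pure transformation on immutable data, each line, or each partition of lines sitting on a different machine, can be processed independently and the partial results merged at the end.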

Summary: Big data processing happens in parallel. By avoiding mutable variables, the functional programming paradigm forces us to express computation in a form that naturally runs in parallel. The two align perfectly, which makes functional programming a great fit for big data processing.