My First Haskell Program (Part 1)

While pretty much my whole career has been spent working in dynamically typed programming languages, over the past couple of months I’ve gotten a lot more interested in type systems. In JavaScript, I’ve been using flow.js, and in Ruby, contracts.rb.

These new type systems, like the type systems in many other modern languages, often draw inspiration from Haskell. I’ve gotten pretty tired of reading references to Haskell and having them go completely over my head, so I finally decided to bite the bullet and learn it.

What is Haskell?

Haskell is a general purpose programming language that is distinguished by two main features:

  • It is a “purely functional” language. This means that any Haskell function, when called with the same arguments, will always return the same value, and has no “side effects”: it will not modify a list or map, print anything out, or do any other kind of I/O.
  • It has a very powerful system of static types.

My Approach

I find that I tend to not learn much while going through tutorials, so I decided to solve a simple problem and learn what I needed to learn along the way:

Write a program that takes in a string containing a bunch of words, along with a maximum line length, and prints the words, minimizing the amount of space at the end of each line.

I chose to define “minimizing the amount of space at the end of each line” as minimizing the sum of the squares of the leftover space at the end of each line (squaring penalizes one very ragged line more heavily than several slightly ragged ones, which keeps the spacing balanced).

Here’s a Python solution that I wrote a while back to do this:

Don’t worry too much about reading the details of the program. If you’re curious about the algorithm: basically, it iterates over each word i, and then, for each word j from the beginning of the string to i, chooses the cheapest j such that the words from j to i sit on a line together as the last line of the output string. The cost of placing a line break before j is the leftover space on this last line (squared), plus the cost of just considering the words from 0 to j.

Alright, now, to rewrite it in Haskell!

The first thing I decided to start with was putting together a program that computed the “minimal cost” of the optimal solution, rather than actually printing out the words in the paragraph, like I do in the Python program.

The familiar and easy part

The first step I took in getting this program to work was handling inputs and outputs. The interface is as follows:

# "10" here is the max line length to use, and words.txt is
# a file containing the words to print.
$ ./consistent_printer 10 words.txt

The entrypoint to any Haskell program is a function called main, and the operation for pulling out the command line arguments looks like this:

main = do
  [lineLengthRaw, fileName] <- getArgs

To extend this to print out those arguments would look like this:

main = do
  [lineLengthRaw, fileName] <- getArgs
  putStr lineLengthRaw
  putStr fileName

For those who have mostly worked with imperative languages, this probably looks very familiar: it reads the command line arguments into the variables lineLengthRaw and fileName, and then prints them out using the putStr function. Now, if you remember what I mentioned earlier, Haskell is a language where there are not supposed to be side effects; all functions are pure. There are clearly some side effects going on here: we are pulling values from the environment, and outputting stuff to the console. The reality of any programming is that some amount of imperative processing is going to be required, and Haskell provides the do keyword to let programmers do exactly that. I’m not going to go into the details of how it works here, but if you’re interested, there are some great StackOverflow answers on the subject.
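Getting slightly ahead of ourselves, here is a sketch of how those raw string arguments might be turned into usable values. The use of read, readFile, and words here is my addition, not something the program does yet:

```haskell
import System.Environment (getArgs)

main :: IO ()
main = do
  [lineLengthRaw, fileName] <- getArgs
  -- `read` converts the string "10" into the Int 10; it throws
  -- a runtime error if the argument isn't a number.
  let maxLineLength = read lineLengthRaw :: Int
  contents <- readFile fileName
  -- `words` splits a string on whitespace into a list of words.
  putStrLn ("max line length: " ++ show maxLineLength)
  putStrLn ("word count: " ++ show (length (words contents)))
```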

Inverting Control Flow

The bulk of the Python program I wrote is a nested for loop. Here it is again, without the comments, and without any reference to previous_break_indexes for simplicity:

for i in xrange(len(words)):
    minimum_cost = sys.maxsize
    for j in xrange(i + 1):
        line_length = len(" ".join(words[j:i+1]))
        if not line_length > max_line_length:
            extra_space = max_line_length - line_length
            previous_cost = 0 if j == 0 else cost_array[j - 1]
            total_cost = (previous_cost + (extra_space ** 2))
            if total_cost < minimum_cost:
                minimum_cost = total_cost
    cost_array[i] = minimum_cost

Haskell doesn’t have for loops, so we’re going to have to do things differently here. Why it doesn’t have them is related to the purely functional nature of the language: a for loop doesn’t have any value at the end; it just executes some code multiple times. In a world of pure functions with no side effects, executing a block of code multiple times without returning any value is completely pointless. There are great articles out there that go into more detail on why this is the case.

Therefore, Haskell has other mechanisms that it uses for looping. For instance, it has the map function, which takes a function and a list, applies that function to each element of the list, and returns the resulting list. In addition to map, it has a whole slew of other list functions.
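A few toy examples of these list functions (these are mine, not part of the program):

```haskell
-- `map` applies a function to each element and returns a new list.
squares :: [Int]
squares = map (\x -> x * x) [1, 2, 3, 4]  -- [1,4,9,16]

-- `minimum` and `filter` are two of the many other list functions.
smallest :: Int
smallest = minimum [7, 3, 9]               -- 3

evens :: [Int]
evens = filter even [1, 2, 3, 4]           -- [2,4]
```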

What is the right option here?

Well, let’s break down what this code is doing. What we are computing on each iteration of the outer loop is the value of cost_array[i]. So, instead of computing this in a loop, we could simply write a function that, given i, returns the value of cost_array[i]. This value, however, depends on the result of another for loop, and what that inner loop is doing is building up the value of the variable minimum_cost. Similarly, we can instead write a function that, given an i and a j, returns the cost, use map to call it for all possible values of j, and then take the minimum.

Let’s implement it:

We’ll start with the outer loop, and implement a function called costOf. It’ll take the list itself, a maximum line length, and an index:

-- `costOf` returns the minimum cost if a line break
-- is placed after the ith word.
costOf :: [String] -> Int -> Int -> Int
costOf list max_line_length i
  | i == 0 = 0
  | otherwise = minimum costAtEachJ
  where costAtEachJ = map (costAtJ list max_line_length i) [0..i-1]

The line costOf :: [String] -> Int -> Int -> Int is the type signature, and specifies that this function takes a list of strings and two Ints, and returns another Int. The | symbols introduce what are called “guards” in Haskell, which allow you to specify what the value of the function ought to be based on some conditions. Here, I am saying that if i == 0, the function should return 0, and that otherwise it should return the minimum value of the list costAtEachJ, which contains the minimum cost for each possible value of j.
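A standalone example of this | syntax, separate from the program (the describe function is mine):

```haskell
-- The conditions are tried top to bottom; the first one that
-- evaluates to True determines the function's value.
describe :: Int -> String
describe n
  | n == 0    = "zero"
  | n < 0     = "negative"
  | otherwise = "positive"
```

For example, describe 5 evaluates to "positive", because the first two conditions are False and otherwise always matches.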

The where keyword allows you to define local variables; here I am defining the variable costAtEachJ, which is essentially the value of the inner loop in the Python program (we’ll define the costAtJ function shortly). To get ahead a little bit, the costAtJ function will take four arguments: list, max_line_length, i, and j. What I’ve done here with map is partial application, which works because all Haskell functions are curried: calling a function with fewer arguments than it requires returns a new function that takes the remaining arguments. So costAtJ list max_line_length i returns a new function that takes a single argument, j, and when we pass this new function to map, it gets applied to each element of the list [0..i-1].
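A tiny example of this behavior, using a made-up function of mine rather than the program’s:

```haskell
-- A three-argument function.
add3 :: Int -> Int -> Int -> Int
add3 a b c = a + b + c

-- Supplying only the first two arguments yields a new function
-- that is still waiting for its final argument.
addTen :: Int -> Int
addTen = add3 4 6

-- Passing that one-argument function to `map`:
results :: [Int]
results = map addTen [1, 2, 3]  -- [11,12,13]
```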

Moving on, computing costAtJ is a little trickier:

-- `costAtJ` is the cost incurred by placing a line break
-- before the jth word.
costAtJ :: [String] -> Int -> Int -> Int -> Int
costAtJ list max_line_length i j
  | lineLength > max_line_length = (maxBound :: Int)
  | otherwise = previous_cost + extra_space ^ 2
  where lineLength = length (intercalate " " (drop j (take i list)))
        previous_cost :: Int
        previous_cost
          | j == 0 = 0
          | otherwise = costOf list max_line_length j
        extra_space = max_line_length - lineLength

While computationally this function is trickier, it doesn’t really use any new concepts. The only new things I introduce here are the intercalate, drop, and take functions. What drop j (take i list) does is return the elements of list from index j up to, but not including, index i, and intercalate is essentially what join is in other languages: it joins a list of strings into a single string, using the first argument as the separator.
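Concretely (an example of mine, not from the program):

```haskell
import Data.List (intercalate)

someWords :: [String]
someWords = ["the", "quick", "brown", "fox"]

-- `take 3` keeps indices 0 through 2; `drop 1` then removes
-- index 0, leaving the words at indices 1 and 2.
middle :: [String]
middle = drop 1 (take 3 someWords)  -- ["quick","brown"]

-- `intercalate` joins the pieces with a separator, like `join`.
line :: String
line = intercalate " " middle       -- "quick brown"
```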

So, with these two functions written, we have the building blocks to build up a whole program that computes the minimum cost for a list of words. Here’s the solution:
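Putting the pieces together, here is a minimal sketch of the whole program. The two cost functions are exactly the ones above; the glue in main, which uses words and read to parse the inputs, is my assumption about how the pieces get wired up:

```haskell
import System.Environment (getArgs)
import Data.List (intercalate)

-- `costOf` returns the minimum cost of laying out the first i words.
costOf :: [String] -> Int -> Int -> Int
costOf list max_line_length i
  | i == 0 = 0
  | otherwise = minimum costAtEachJ
  where costAtEachJ = map (costAtJ list max_line_length i) [0..i-1]

-- `costAtJ` is the cost incurred by placing a line break
-- before the jth word.
costAtJ :: [String] -> Int -> Int -> Int -> Int
costAtJ list max_line_length i j
  | lineLength > max_line_length = (maxBound :: Int)
  | otherwise = previous_cost + extra_space ^ 2
  where lineLength = length (intercalate " " (drop j (take i list)))
        previous_cost :: Int
        previous_cost
          | j == 0 = 0
          | otherwise = costOf list max_line_length j
        extra_space = max_line_length - lineLength

main :: IO ()
main = do
  [lineLengthRaw, fileName] <- getArgs
  contents <- readFile fileName
  let wordList = words contents
  -- Print the minimum cost of laying out all of the words.
  print (costOf wordList (read lineLengthRaw) (length wordList))
```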

What’s missing?

There are some very clear differences algorithmically from the Python program I presented. Notably, the cost_array list is completely absent; we aren’t actually storing results anywhere. This is actually quite problematic: even though this version takes up less space, every call to costOf recomputes its result from scratch, so we end up computing the same values over and over again.
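The blow-up is easier to see in a tiny analogy of mine rather than in the program itself. Naive Fibonacci has exactly the same shape: recursion with no table to remember earlier results:

```haskell
-- With no stored results, `fib (n - 1)` and `fib (n - 2)` each
-- redo the same smaller calls over and over.
fib :: Int -> Int
fib n
  | n < 2 = n
  | otherwise = fib (n - 1) + fib (n - 2)
```

Evaluating fib 5 computes fib 3 twice and fib 2 three times; our costOf recomputes its smaller subproblems in just the same way.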

In the next couple of posts, I’ll explore strategies for solving this problem by “memoizing” the costOf function, as well as reincorporating the previous_break_indexes variable and printing out the solution.

What have we learned so far?

In Haskell, the different control structures require thinking about code differently. Instead of thinking in terms of using loops to build up values, you have to think in terms of the values you are computing.

What’s interesting is that it’s possible to write code like this in other languages. In the Python program, we could easily have written a very similar costOf function that recursively calls a costAtJ function. Many modern languages support first-class functions and functions like map, which is what’s necessary to write code in this style.

The main difference, then, is that unlike Python, Haskell requires that code be written in a functional manner and does not support breaking the paradigm; there isn’t an easy way out. Writing Haskell is great training in writing code in a functional way for exactly this reason: you have no choice but to do so.

Also, debugging Haskell programs feels a lot more like debugging SQL queries than like debugging Python programs. In the Python program I included above, it was easy for me to drop in a bunch of print statements while debugging. There is no similar notion of “dropping in print statements” in Haskell; to debug logic issues, you have to use special functions like trace, which print out which function calls are made during the course of a program run. In practice, this is most of what you’d want from print debugging anyway, since variables in Haskell are immutable and cannot change during the invocation of a function.
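For instance, here is a sketch of what that looks like with Debug.Trace (the squareOf function is a made-up stand-in, not part of our program):

```haskell
import Debug.Trace (trace)

-- `trace` prints its message to stderr when the surrounding
-- expression is evaluated, then returns its second argument.
squareOf :: Int -> Int
squareOf n = trace ("squareOf called with n = " ++ show n) (n * n)

main :: IO ()
main = print (squareOf 7)
```

Running this prints the trace message when squareOf 7 is forced, and then prints 49; the function’s value is unchanged by the tracing.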

Another observation is that it’s a little bit harder to reason about the runtime performance of a program written without for loops. In the Python version, it’s easy to tell that the program runs in O(n²) time, because it consists of two nested for loops; in the Haskell version, it requires fully reading and grokking the code to determine that, as written, it runs in exponential time, since every call to costOf recomputes all of the smaller subproblems (in a follow-up, we’ll add back the cost_array and make the Haskell version O(n²) as well).

Conclusion

Getting started with Haskell was a lot of fun. The constraints you have to deal with force you to think about your code in terms of formulas, asking “what should the value of this array be?” rather than “how do I iteratively build up this array?”. While this particular example wasn’t great at demonstrating all of the benefits of functional programming, you can imagine that the guarantee that functions are pure makes adding certain features, such as memoization, easier. In the next post, I’ll be covering that.

I hope that this post demonstrates practically the kind of paradigm shift that programming in Haskell requires. Feel free to reach out to me on Twitter if you have any questions or comments!