Functional Programming And Formal Software Verification For Non Industry Applications
- Or -
Why you should learn Haskell and Coq

Fabian Schneider
17 min read · Oct 10, 2018


Introduction

One and a half years ago, I read an article on Medium titled “Why Object Oriented is a dead walking thing?”[1]. While I don’t fully agree with the article (OOP has a couple of flaws, but I still think it is simply misused often), it stuck in my head for a long time.

Shortly afterwards, I started a fascinating discussion with one of my university professors, in which he told me that I should give Haskell a try; just for the fun of tackling problems in a different way.

At that time, I was often told, especially by OO programmers, that “functional programming wouldn’t add any value to their work”. (I toned down the original, often annoyed or passive-aggressive statements.) As I don’t like being biased, I just gave the language a try; and thereby fell in love with the functional paradigm.

After exploring this new world for a bit, I came across something called formal verification that was apparently used in the hardware industry for developing chips. I knew that chips were being developed using languages like VHDL or Verilog, which are essentially just programming (or description) languages. As chips always perform some specific function that essentially represents an algorithm, why not use this cool sounding technique for verifying that a chip works as expected for ‘normal’ software as well?

I found out that the field of formal software verification along with the functional paradigm was quite old and had had a couple of breakthroughs that actually made it usable. It made me wonder why these powerful things were predominantly used in academia.[2]

Therefore, I went out and tried to learn all these cool techniques I found with the notion of making them more popular in the world of software development. The rest of the article is for doing exactly that:

(Note that I still have a lot to learn, but at this stage, I can write about it in an understandable, not too abstract way.)

Application in the legal tech sector

For most people it is fairly easy to understand why, for example, hardware (which is designed in code with VHDL, Verilog and Co.) might be something that benefits from formal verification:

The designers have to be 100% sure that the chip they designed performs the function it is designed for in 100% of the cases. The chip will probably be incorporated into a larger architecture and therefore needs to work reliably. A simple chip with very few inputs and outputs could of course just be tested for validity, but as the chip grows in complexity, we find that we simply cannot test the astronomically large number of input-output pairs, as we don’t have the time and resources to do that.

However, the same also holds true for most software that is being written.

Nowadays we perform a lot of code review and end-to-end tests (which is not inherently a bad system), but like with hardware, we cannot test software for an infinite amount of time.

Working a lot with people who aren’t programmers, I often get asked how I can be sure that the stuff I wrote really does the things it should do under all circumstances.

Using formal verification and functional programming when writing a library, or even just a short but complex script, in addition to testing and code review, can be of enormous value when one has to explain why the thing is working, or how one could prove that some error was caused by wrong usage on the client side rather than by faulty code. Thanks to functional programming, we can state that the pure functions we wrote will always produce the results intended, because the global state of the computing system cannot alter the calculations. (It also holds some psychological value when we can state that we have a formal/mathematical proof for the correctness of the function or program.)

Especially in the legal tech sector, the developers as well as their clients (be it the company they are working for or an actual client using the software) have to be very sure that the things they do are correct, as the fees for misconduct or failing to do something are pretty high and can cause an enormous amount of not only monetary loss but also loss of trust by clients.

So let’s look into

Functional Programming (FP)

What it is:

FP is all about not writing object-oriented, imperative code. That means your code won’t be executed in the way most people are used to (normally from top to bottom, following the path they are coding). I will explain the concept using Haskell and pseudocode for the examples. (The headings marked with — … — describe some of the differences between the functional and the imperative paradigm.)

In Haskell (like in imperative languages) you have a main function which is the entry point of your program. Programs in imperative languages enter at the main function and then follow the instructions given to them. E.g.:

function main():
    print assemble(3)

function assemble(x):
    var y = 0
    var s = ""
    do until (y > x)
        s += y++
    loop
    return s

// output : 0123

However, in FP languages we don’t have loops. One of the most important concepts is something nearly every imperative programmer hates[3]: recursion.

In Haskell, the example above could look like this:

Note that in Haskell a comment begins with --.

main = putStrLn $ assemble 3
-- this simply prints the result of assemble 3
assemble 0 = "0" -- the base case of the recursion
assemble y = assemble (y-1) ++ show y
-- the recursion itself and the assembling of the string
-- "show y" converts the Int to a String
-- output : 0123

Wait; why are there multiple assemble functions in the Haskell version? Is the function overloaded?

— Pattern Matching —

The short answer is: no. What you see here is called pattern matching: when we call the function (e.g. in main), the parameter is matched against every “version” (every line) of the function.

So in this example, 3 is first matched against 0 [from assemble 0 = ...], which is of course not applicable as 3 is not 0. Then 3 is matched against assemble y. As y is generic (it matches any value), the expression behind
assemble y = is evaluated. [Note that we could just as easily write assemble val instead of assemble y; it is just the name we use to refer to the value later on (e.g. in show y).]

Now what is going on here? First, the main function is evaluated, resulting in the evaluation of the function assemble with the parameter 3; so far so clear. We can see that we have “two” assemble functions: one with 0 and one with a variable y. The 0 and the y are the parameters of the function: so there is really only one function assemble, which decides (by its parameter) whether it should use the result "0" [if the parameter given is 0] or the recursive call in the second line [if the parameter given is something other than 0].

The recursive call itself should be pretty obvious: it counts the value of y down until it reaches 0 and then appends the values 1, 2 and 3 to the respective intermediate results using [result of call] ++ show y.
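To make the evaluation order concrete, here is the assemble function again (with an explicit type signature added by me), with the unfolding of assemble 3 written out as comments:

```haskell
assemble :: Int -> String
assemble 0 = "0"                        -- base case
assemble y = assemble (y - 1) ++ show y -- recursive case

-- The call assemble 3 unfolds like this:
--   assemble 3
--   = assemble 2 ++ "3"
--   = (assemble 1 ++ "2") ++ "3"
--   = (("0" ++ "1") ++ "2") ++ "3"
--   = "0123"
```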

— Where we want to get, instead of how we want to get there —

One other way to explain the difference between the functional and the imperative approach is:

In the imperative paradigm, we specify exactly how an algorithm should run; in the functional paradigm, we specify what the result of the algorithm looks like.

Take a look at the implementation and description of quicksort in Haskell on this website: http://learnyouahaskell.com/recursion. (Just search for quicksort once on the site… ;)
Here we see that instead of saying ‘okay, let’s do this, then that, and then depending on the result do that’, we say ‘what is a sorted list? Well, it is the stuff smaller than the first element of the list, plus that element, plus the stuff bigger than that element. The smaller lists are again the stuff smaller than their first el…’
I think you see where this is going ;)
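For reference, a quicksort along those lines could look like this (a sketch in the spirit of the one on that site, not the exact code from the book):

```haskell
quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []                     -- an empty list is already sorted
quicksort (x:xs) =
  quicksort [a | a <- xs, a <= x]     -- the stuff smaller than (or equal to) the first element
    ++ [x]                            -- plus that element
    ++ quicksort [a | a <- xs, a > x] -- plus the stuff bigger than it
```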

— No Variables —

The second thing you may have noted is that there are no variables in the FP code[4]. Therefore, when we code in a functional way, we cannot modify the global state (aka. we cannot create global variables that influence how our functions work)[5]. Most of the functions in a functional program are pure, which means that they don’t have any side effects. (No user input, no screen output, no global state changes, etc.)

Having no variables has two consequences: (1) whenever we call a function (say, one that just increments every number in a list and returns it) with some list as a parameter, the list we passed stays unchanged; instead, a new one is constructed and returned. And (2) we use a lot of anonymous functions (lambdas), which you may know from languages like Java, JavaScript, C++, etc.
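A small illustration of both consequences (the name incrementAll is my own, not from any library):

```haskell
-- (1) A pure function that builds a NEW list instead of modifying its argument,
-- (2) using a lambda (\n -> n + 1) passed to the higher-order function map.
incrementAll :: [Int] -> [Int]
incrementAll xs = map (\n -> n + 1) xs

numbers :: [Int]
numbers = [1, 2, 3]

-- incrementAll numbers returns a fresh list [2, 3, 4];
-- numbers itself is still [1, 2, 3] afterwards.
```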

It is often argued that the first consequence is very bad for performance. However, the compiler is good enough to keep that in mind and optimizes the code internally (we just don’t see it)[6]. Don’t take my word for it; take a look at some Haskell benchmarks ;)

— Strong Type System —

However, one of the most important features of FP languages is what we call their strong type system. If you have ever programmed, you are undoubtedly familiar with types (int, float, double, string, etc.).

If a language has a strong type system, types may not be implicitly converted into one another (float to int, for example), meaning that if you call a function with an object as a parameter, the object’s type has to be identical to the type of the parameter the function wants. This feature can be used/exploited in such a way that, when we encode our logic, we can “use types as values” and thereby let the compiler catch logical errors in the program. Let me explain that again:

Instead of using values that have some type, we could just interpret a specific type as a specific value and work with the types instead of the values. This way, the compiler, which checks if all the functions get the correct types as parameters, checks if the “values” [the types we regard as values] are correct; leading to an implicit logical check by the compiler itself. An example:

If we want to write a function that subtracts 2 from some value and rely on the function to always return some value greater than 0 (maybe for using it as a divisor), in imperative style we would have to write it like this:

function calculateDiv(x):
    if (x <= 2) return 1
    else return x - 2

However, when not being careful, we could eventually make the mistake of using < 2 or even > 2. Depending on where the function is used, this might only pop up in very special cases of the program, which we must not miss testing. Writing that function in a strong type system could result in this:

Note that this is just pseudocode to prove a point!

calculateDiv :: (Num n) => Successor (Successor (Successor n)) → n
calculateDiv n = n-2

What happens here?

First of all: calculateDiv is the name of the function. So why does it appear multiple times? As explained before, functions in Haskell work primarily with pattern matching. Here, however, there is just one line that actually does something. The first line is just the type of the function. It can normally be omitted, but is often written, because we can deduce a lot about a function when we see its type; especially in more complex cases. (In this case it cannot be omitted, because it is the integral part of how this works. It can be omitted in normal Haskell code.)

In the first line, we define the type of the function: It should be a function, which takes a something (in front of the last arrow →) and returns a something (at the end of the →).

The thing it takes (the parameter) should have the type Successor (Successor (Successor n)) where n is a Num. In this case, Num would be a natural number (so from 0 to +infinity). However, Num is not a value, it is a type. A value of that type is either Z (for zero) or Successor n, where n is again either Z or Successor n.

To clarify this: We can now interpret Successor (Successor (Successor Z)) as the number 3.

So now we have a function that takes a value of the type Successor (Successor (Successor n)), which we can identify as a number that is at least Successor (Successor (Successor Z)) [3], or greater if the n in there is not Z but again the Successor of some other n.

Now, at compile time, the compiler would check whether every call of calculateDiv is done with some parameter of the type Successor (Successor (Successor n)) [≥ 3].

(Of course we would need to use this scheme of writing numbers all over our program and would therefore probably introduce some syntactic sugar, etc.)

This way of writing the function saves us the hassle of testing whether, for any value we give to the program, we always pass a number of at least 3 into that function.
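In real Haskell, this idea can actually be expressed using GHC extensions such as DataKinds and GADTs. The following is a minimal sketch of the pseudocode above; the names Nat, SNat and toInt as well as the encoding are my own choices for illustration, not a standard library:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Peano naturals; DataKinds promotes Z and Succ to the type level
data Nat = Z | Succ Nat

-- A singleton type: the shape of the value mirrors the type-level number
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('Succ n)

toInt :: SNat n -> Int
toInt SZ     = 0
toInt (SS m) = 1 + toInt m

-- Accepts only numbers of the form Succ (Succ (Succ n)), i.e. at least 3,
-- so the result is guaranteed to be greater than 0.
calculateDiv :: SNat ('Succ ('Succ ('Succ n))) -> Int
calculateDiv n = toInt n - 2

-- calculateDiv (SS (SS (SS SZ))) compiles and yields 1;
-- calculateDiv (SS (SS SZ)) is rejected by the type checker.
```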

If you didn’t get it right away, please read the section again carefully. Afterwards, you should be able to see why we can know a lot about a function written in the functional paradigm just by looking at its type.

This is just a simple example, but when done correctly, this can also be applied to complex types in complex functions, leading to much shorter[7] testing times (which is of course far more interesting).

To sum up, some of the differences between the functional and the imperative paradigm: recursion and pattern matching instead of loops, no mutable variables (and therefore mostly pure functions), and a strong type system that lets the compiler catch logical errors.

Why use it?

As mentioned above, functional programming languages force (or at least aid) you to write pure code that will always produce the same results for the same inputs. This makes testing such programs much faster, as we can reason a lot about how the code works, prevent bugs before we run it and, more importantly, let the compiler catch logical errors. Additionally, thanks to the syntax of e.g. Haskell, complicated algorithms become very easy and concise to read, and therefore also to reason about and debug.

The second argument for using it is its good compatibility with formal software verification, which we will hear about in the next section.

Why Haskell?

The short answer for this question is: it is the first functional programming language that I came across. Additionally, there is a big community and lots of libraries and code examples out there that range from simple mathematical algorithms to libraries for building UIs and even web frameworks[8] in Haskell.

Mainstream Applications

If you go to this website: https://wiki.haskell.org/Haskell_in_practice you can find some major use cases for Haskell. You will probably find that most of the cases are either very abstract and applicable in academia and research or that the examples are not really used by end users, but by other software developers. However, the more one gets to know Haskell, the more one sees that it is actually applicable in a lot of cases outside these areas. If explained to an end user in an easy way, using functional programming languages can benefit either party.

Formal Software Verification

What it is:

Formal verification is the process of proving the correctness of systems (hardware/software/algorithms) using mathematical methods. Essentially it is used for ensuring that something (be it a motherboard, a small chip, a library, a whole program or an operating system) works as expected. To do so, one formulates the “thing” mathematically, then formulates theorems that should or shouldn’t hold, and finally proves or disproves these theorems mathematically.

When using the process in Software Development, it is often referred to as Formal Software Verification.

Why use it?

By utilizing the methods of formal verification in software development, the time needed to debug and test a software product decreases dramatically. As we specify the wanted outcomes and program behaviors and formally prove them, we strongly diminish the possibility of having logical errors in our programs.

However, there is a second reason: when we, e.g., write a program that operates on natural numbers (and therefore should work for every natural number), we just don’t have the time to test the correct outcome for an infinite number of natural numbers. This is where formal verification (and especially proving theorems about functions inductively) comes in handy.

Why Coq?

The same thing that applies to Haskell applies here too: Coq was the first system that I came across and I found a lot of resources for it. Additionally, it comes with the neat functionality of being able to extract programs directly to Haskell code (yay!).

Mainstream Applications

As far as I know, mainstream applications of proof assistants alone include hardware verification and proving big mathematical theorems like the Four-Color Theorem[9] (which was completely proven within Coq). On the software verification side, there are a few verified compilers (for C; a verified GHC [Glasgow Haskell Compiler] is apparently in progress) as well as the verified operating system kernel seL4.

How Functional Programming and Formal Verification are related

Now that we know what the techniques are essentially about, it should be easy to see how they relate. The aim of formally verifying software is being able to prove or disprove certain properties and behaviors that the code we wrote should or should not exhibit. So we are going to need some code.

When we write our code in a functional programming language (like Haskell or Gallina [the language used for Coq]) we can make use of their strong type systems that help us to filter out a couple of logical errors in advance. (Obviously, we can’t verify software that does not compile.)

The second advantage of using functional languages is that their code is fairly easy to reason about: We work a lot with guards, pattern matching and recursion which is simpler for an automated system to check than complex imperative code with nested loops, break statements and potentially not fully expressed logic (which is prevented by the proper use of pattern matching and the compilers of functional languages).
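To give a small taste of guards: here is a function (a made-up example of mine) whose cases are easy to enumerate mechanically; with -Wall, GHC even warns when a pattern match or set of guards is non-exhaustive:

```haskell
classify :: Int -> String
classify n
  | n < 0     = "negative" -- guards are checked top to bottom
  | n == 0    = "zero"
  | otherwise = "positive" -- otherwise catches everything else, so all cases are covered
```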

The advantage of being able to reason easily about functional code may be not as obvious as I would like it to be. However, you will be able to see that more and more when you start learning some functional programming language.

Now how would we write a verified library? When we want to write it using Coq and Haskell we could do it like this:

First, we think about the data structures and the types of the functions we are going to have. In functional programming, knowing the types of your expressions tells you a lot about how the program will work when it is implemented. After this, we can implement the structures and functions in the Gallina programming language (e.g. using CoqIDE as our IDE).

Now here comes the fascinating part:

Using Coq’s theorem-proving abilities, we state some theorems about how our functions should or shouldn’t work and prove or disprove them using the Coq tactics system (the core of the proof assistant).
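To give a flavour of what this looks like, here is a small Gallina sketch in the style of the Software Foundations book; the function double and the theorem are illustrative examples of mine, not part of any particular library:

```coq
Fixpoint double (n : nat) : nat :=
  match n with
  | O => O
  | S m => S (S (double m))
  end.

(* A theorem about how double should behave, proven by induction. *)
Theorem double_plus : forall n : nat, double n = n + n.
Proof.
  intros n. induction n as [| m IH].
  - reflexivity.
  - simpl. rewrite IH. rewrite <- plus_n_Sm. reflexivity.
Qed.
```

The bullet points after Proof. are the tactics: reflexivity closes the base case, and the inductive case rewrites with the induction hypothesis IH and the standard lemma plus_n_Sm.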

After we have done all our desired proofs, we can extract the finished module directly into Haskell code, compile it, test it and then use it as a library.

Thanks to the formal verification of the library, a whole lot of testing time can actually be cut, because we have proven that the theorems we stated hold for our program. As you see: the more theorems you can think of and prove, the more testing time you can cut at the end. Additionally, you have the advantage of actually having proofs for the claims you make when using or publishing your library or program.

(You may have noticed by now that most of the time this process doesn’t save time; it just shifts the time you used for testing and writing test suites to thinking properly about what you want before you write the code, and to the part where you prove theorems about your code. The method is not intended to save time, but to make you more confident about not forgetting any important “tests”.)

Conclusion

In this article I have shown that functional programming as well as formal software verification hold a lot of practical value that is not constrained to the hardware industry or academia. We can use their advantages to prevent misunderstandings, protect ourselves as well as the users of our software from legal complications, and aid our software development process by introducing proper formal reasoning into it.

Don’t get me wrong: I do not believe that object-oriented programming is dead or that other paradigms lead nowhere. There is a time, place and project for every one of them, but it surely doesn’t hurt to combine the approaches to get some more security into our development process.

The two techniques have existed for quite a long time (in fact, one of the first programming languages, LISP, is functional). I hope that they gain some popularity over the next years (just as 50-year-old machine learning algorithms recently did).

Finally, the utilization of old machine learning algorithms and of the two discussed techniques is an excellent example of how, even though we live in a world that is rapidly developing ever new things, we might sometimes benefit from looking back in time for inspiration.

So maybe instead of learning that new JavaScript framework, some software developers may try to rediscover some almost forgotten or just unpopular, weird, academic techniques and use them for building something truly new. After all it’s for the best.

Stay open minded ;)

If you plan on working with Haskell, this is the stuff I have installed:

Or if you are more of a Vim person, like me sometimes when I am too lazy to start the GUI on my Linux system:

Stephen Diehl’s Vim setup: http://www.stephendiehl.com/posts/vim_2016.html

For Coq, just download CoqIDE or Emacs with the Proof General plugin, and be sure to check out tutorials and examples, as it is very abstract and not intuitive to use at first glance ;)

Resources

The main Haskell spot: haskell.org
An online book to learn Haskell: learnyouahaskell.com
Another site for learning and reading about Haskell: https://mmhaskell.com/
Hoogle (Haskell Documentation): https://www.haskell.org/hoogle/
LiquidHaskell (reasoning in Haskell): https://ucsd-progsys.github.io/liquidhaskell-blog/
CoqIDE: https://coq.inria.fr/
About the name of Coq: https://en.wikipedia.org/wiki/Coq
Coq tutorial: https://coq.inria.fr/tutorial-nahas | https://coq.inria.fr/tutorial/
Pierce Software Foundations Course: https://softwarefoundations.cis.upenn.edu/
Formally Verified Image Browser Tutorial: http://www.michaelburge.us/2017/08/25/writing-a-formally-verified-porn-browser-in-coq.html
Sel4: https://sel4.systems/
Formally Verified C Compiler: http://compcert.inria.fr/compcert-C.html
Formal Verification in Hardware: https://is.muni.cz/th/vt3yj/thesis.pdf

For you blockchain lovers:
Formal Verification of Solidity: https://github.com/pirapira/ethereum-formal-verification-overview

References

[1] “Why Object Oriented is a dead walking thing?”, Yaron Biton, 20.09.2018 https://medium.com/@vyaron/the-returning-of-the-rich-client-aa545ced5a9a

[2] After starting to learn the techniques though, I realized why they are not common: They are not the simplest to learn and understand. Additionally many people get confused because in the beginning one often doesn’t see how one could eventually apply them.

[3] This is based on me watching my fellow students when they have to solve some problems recursively.

[4] Wait, wait; what is y in the assemble function? This y is not a variable in the classical sense. If it were a conventional variable, we would be able to change its value; however, we are not. We also can’t dereference a pointer to it and change the value in memory. It is a placeholder, yes, but not a classical variable. For the very interested:
There is also a construct called Do-Notation in Haskell:

doNotation = do
  let x = "A"
  let x = "B"
  putStrLn x

Even in this case, the thing that looks like a variable that can be changed (x) is just syntactic sugar for evaluating a function and passing immutable values: the second let simply shadows the first with a new immutable value.

[5] Actually, we can alter global state, but that is quite complicated and violates the principles of functional programming. Nearly all functions in a functional program are pure and therefore independent of the global state.

[6] One of these “optimizations” is Haskell’s lazy evaluation. If, for example, we construct a list with 100 entries and later in the program only use the first three, essentially only the first three values of the list are calculated instead of all 100. This is very hard (though not impossible) to achieve in languages that do not have that feature, as there we are responsible for doing it ourselves.
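A minimal sketch of laziness in action (the name firstThree is mine); the list here is even infinite, yet the program terminates because only three elements are ever demanded:

```haskell
firstThree :: [Int]
firstThree = take 3 (map (* 2) [1 ..]) -- [1 ..] is infinite; only what take 3 demands is evaluated
```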

[7] Of course, we could also encode a class Greater0 and use that class as the type of the parameter in an object-oriented program. These techniques of verification work for object-oriented programming too. However, most of the time they are far more complicated to realize and later harder to read and debug.

[8] https://wiki.haskell.org/Web/Frameworks

[9] 4-Color-Theorem: https://www.ams.org/notices/200811/tx081101382p.pdf | https://www.ucalgary.ca/rzach/blog/2005/04/four-color-theorem-verified-in-coq.html

The whole content of the article along with the title image is published under the License CreativeCommons (CC BY-NC-ND 4.0)


Fabian Schneider

Software Developer & PoC-Engineer TaxTechnology PwC Austria