Prolog: A Brief Introduction

A language made for solving puzzles!

Published in

Dabbler in Destress

12 min readDec 26, 2020

Prolog’s a pretty interesting programming language that will make you think in ways very different from what you’re used to — whether you’re coming from a procedural, functional, and/or object-oriented paradigm.

It’s a declarative programming language from the 1970s and has its roots in first-order logic. What that means is you declare facts and rules as a programmer, run a query, and the computer gives you back valid results from the search space.

All of that’s very vague if you don’t really know what’s going on, so let’s dive in and explain concepts as we go along.

Basics

Atoms, or constants, start with lower-case letters or are integers: cat, john, 1, 2, 3, car. Variables are prefixed with upper-case letters or underscores: X, Y, Z, Result, _var, _1. What we’ve been seeing so far are terms. We can have compound terms too. Let’s use a compound term to declare a fact:

student(saif, nus).
student(saif, uoft).

The first fact states that saif is a student of nus . Notice the full-stop at the end of the “sentence.” We can insert more facts into the program. We call this a clause, with student being the predicate.

We can now run a query on these clauses using a variable. Let’s see how that’s done. [Lines with ?- is what I type, % start comments, rest is what Prolog says. ]

?- student(saif, X).
X = nus ;
X = uoft.

Prolog’s figured out that X must be either nus or uoft!

The astute among you might have already noticed that Prolog is an untyped language. That is absolutely correct.

Relations vs Functions

Before we dive deeper, it’s helpful to take a step back and realize that we’re not defining functions in Prolog. What we’re doing is declaring relations. What’s the difference?

Let’s say we have an add(a, b) function in *insert_favorite_language*. This function takes in 2 integers and gives us an output. So add(1, 2) would return 3.

In Prolog, you’d query add(1, 2, R) and the language would tell you R = 3. “Err, okay…” I hear you say. But that’s just 1 way to run the query; we can run it 3 more ways:

% checking / assertion
?- add(1, 2, 3).
true.?- add(1, 2, 4). 
false. 
% subtracting!
?- add(X, 2, 3). 
X = 1.
% and enumerating all possible answers!
?- add(X, Y, 3). 
... (infinitely many combinations of X and Y make 3!)

Sweet! Though that last step does seem to take an awfully long time :shrugs: We can modify it to take less time by giving Prolog more facts about Vars.

?- add(X, Y, 3), X > -1, Y > -1. X = 0
Y = 3; X = 1
Y = 2; X = 2
Y = 1; X = 3
Y = 0;

Notice, I’ve used the AND operator for the first time here: it’s the comma. We’ll see the OR operator shortly as the semicolon.

Logic, finally!

% example courtesy, CS2104, NUS
father(john, mary).
father(john, tom).
father(kevin, john).
mother(eva, tom).
mother(eva, mary).
mother(cristina, john).
male(john).
male(kevin).
male(tom).
female(eva).
female(cristina).
female(mary).

And let’s query on these facts:

% Who's mary's father?
?- father(X, mary).
X = john.% Who are eva's children?
?- mother(eva, C).
C = tom ;
C = mary.% Who is eva's daughter? 
?- mother(eva, C), female(C).
C = mary.

Let’s teach Prolog some more things about how families work.

daughter(X,Y) :- female(X), parent(Y,X).
grandparent(X,Y) :- parent(X,Z), parent(Z,Y).
brother(X,Y) :- male(X),sibling(X,Y).% first use of OR and HORN CLAUSE
parent(X, Y) :- father(X, Y); mother(X, Y).% prolog has the standard mathematical operators
sibling(X,Y) :- parent(Z,X), parent(Z,Y), X\==Y.

:- denotes a “Horn Clause”. I read it like an implication. For example, I read the parent clause as “If X is the father of Y OR X is the mother of Y, then X is a parent of Y.”

We can do recursions too! The following says that Y’s ancestor is Y’s parent X, or Z (where Z is X’s ancestor).

ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z),ancestor(Z,Y).

We have to be careful; we might get stuck on an infinite loop with left recursion in the following example:

% DO NOT DO THIS 
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- ancestor(Z,Y),parent(X,Z).

BUT HOW? !— Unification and Resolution

Prolog’s uncanny ability to answer questions (even simple ones like these) come from its Resolution and Unification algorithms.

Prolog does this using Unification — essentially trying to substitute constants in the places of variables (from known facts, clauses, and pattern-matching). As long as the end result of the substitution is consistent, it is one of the possible answers.

Prolog then goes back to square 1 and tries it again with all the other possible values for each variable, essentially doing a Depth-First-Search until it exhausts the entire search-space. This process of answering a query is called Resolution. Whatever can be proven is true; everything else is false (closed-world assumption).

Lists

We need to know a few more things before getting our hands dirty with real Prolog programs. First off, lists.

[this, [], is, a, random_term(X), lst].% joining 2 lists together
append([], Y, Y). 
append([X|Xs], Y, [X, Rs]) :- append(Xs, Y, Rs).

If you look closely at the append relation, it’s using Haskell-like pattern matching and accumulator pattern. You’ll find this a natural way of thinking if you’ve come from a functional language. [X|Xs] is used to represent a list in 2 parts: X is the front of the list, and Xs is the rest of the list. The 2 lines basically say:

If the first list is empty, the result is the second.
Otherwise, the result is the head of the first list + the result of joining the tail of the first list with the entirety of the second.

Let’s run a few queries with our new append function:

?- append([1,2,3],[4,5],Z).
Z = [1,2,3,4,5]. % computing the difference of 2 lists
?- append([1,2,3],Y,[1,2,3,4,5]).
Y = [4,5]% splitting a list
?- append(X,Y,[1,2]).
X=[], Y=[1,2];
X=[1], Y=[2];
X=[1,2], Y=[].% finding prefix of a list (_ means I don't care about this). 
?- append(X,_,[1,2,3]).
X=[]; X=[1]; X=[1,2]; X=[1,2,3].

Of course, some of these basic relations have been defined for you. You can look at the implementation of reverse() from SWI-Prolog here. Here’s a little brain teaser for you. What is the result of reverse(X, X). ?

Let’s tie this back to Resolution and Unification. Consider the sel(elem, […]) relation, which takes in a list, and tells us whether elem belongs in that list:

Controlling Backtracking using Cut Operator

Consider a relation that takes in a list and removes the duplicates:

remDupl([], []). 
remDupl([H|T], R) :- sel(H, T), remDupl(T, R).
remDupl([H|T], [H|R]) :- not(sel(H, T)), remDupl(T, R).

Fairly straightforward. The first line is the base case. The second line says, if H is found in T, discard it. The third line says, if H is not found in T then append H to result and continue recursing.

We’re actually controlling the branch the search tree takes using that first sel() clause. This turns out to be a recurring use-case in Prolog — so we use a “cut” operator denoted by !. Read this as: If you take this branch, don’t bother with the other options at this level.

remDupl([], []). 
remDupl([H|T], R) :- sel(H, T),!, remDupl(T, R). 
remDupl([H|T], [H|R]) :- remDupl(T, R).

More concise? More readable? You decide. I’m just saying it exists.

Mathematical Operators —and a tiny problem.

We have all the usual mathematical operators too: <, >, ≥, ≤, =\=, =:=, +, -, *. Let’s try and write the factorial function:

% declaration
fact(0, 1). 
fact(N, R) :- N>0, M is N-1, fact(M, R1), R is R1*N.

fact(15, R) correctly points out that R = 1307674368000. Don’t believe me? I’ll wait.

Unfortunately, `fact(X, 120).` will not give us 5. The problem is that the mathematical operators can only be used 1 way. So when we say M is N-1 , N must be known for this to work :(

We’ll get around this problem using the powerful Constraint Logic Programming module that makes Prolog much more fun to code in.

Constraint Linear Programming

One final thing you should know to make Prolog useful: Constraint Solving. A lot of the puzzles we solve in real life are based on a finite domain. If you know Linear Programming (or have taken Linear Algebra), you know how we shrink the search-space for a set of equations (to the point where brute-force is practical).

The package is quite simple to use: it has operators similar to the mathematical operators you’re familiar with (just with a # prefix) and some pretty nifty relations. The canonical example for the CLP module is the SEND+MORE=MONEY puzzle. Let’s look at the code here and understand what it’s doing. In case you aren’t familiar, “The send more money puzzle is the quintessential example of a constraint problem. It amounts to assigning different digits 0 through 9 to the variables [S,E,N,D,M,O,R,Y] such that SEND+MORE=MONEY”:

use_module(library(clpfd)).sendmoremoney(Vars) :-
    Vars = [S,E,N,D,M,O,R,Y],
    Vars ins 0..9,
    S #\= 0,
    M #\= 0,
    all_different(Vars),
                 1000*S + 100*E + 10*N + D
    +            1000*M + 100*O + 10*R + E
    #= 10000*M + 1000*O + 100*N + 10*E + Y.

We see the ins relation from clp saying all the Variables in the list Vars should take a value from 0 to 9. The next 2 lines say that the first digits of the words can’t be 0 (solution becomes trivial). The next line uses the all_different relation to make sure all the letters get a different assignment. The final line ensures Prolog understands that the position of the letter matters and that the 2 words add up (+) to (#=) MONEY.

Another advantage we get from using clpfd is that we can run our relations both ways if we’re not using the arithmetic operators we saw earlier. Here’s an example of a factorial function that can be run both ways:

% just note that the recursive call comes last. 
cfact(0, 1). 
cfact(N, R) :- N #> 0, M #= N-1, R #= N*R1, cfact(M, R1).

A Sudoku Solver

Still here? Tough cookie, you are. Alright, time to dive into a real problem. Let’s make a Sudoku solver. Oh, wait. SWI-Prolog already has one. No worries, the solution is for a 9*9 Sudoku with 3*3 blocks. We’ll generalize it to any N*N Sudoku, with B_R*B_C blocks. Let’s look at the original solver’s code and see what we need to modify:

sudoku(Rows) :-
        length(Rows, 9), 
        maplist(same_length(Rows), Rows),
        append(Rows, Vs), Vs ins 1..9,
        maplist(all_distinct, Rows),
        transpose(Rows, Columns),
        maplist(all_distinct, Columns),
        Rows = [As,Bs,Cs,Ds,Es,Fs,Gs,Hs,Is],
        blocks(As, Bs, Cs), blocks(Ds, Es, Fs), blocks(Gs, Hs, Is).

The first clause ensures we have 9 rows.
The second clause has this maplist() relation. It takes each of the 9 Rows, and provides it to a curried function same_length(Rows). Since we’re dealing with Square matrices, this line basically says: every row must be equal in size to the number of columns.
The append/2 relation takes the Rows and “unrolls” then to create an 81 element array called Vs. Every element in Vs must be between 1 and 9 (integers only). Remember, if this is to hold true, every element in Rows (i.e. the matrix) must be between 1 to 9 as well.
The next line uses maplist again. It says, “For every Row in Rows, every element in Row must be distinct.”
Transpose the Rows into Columns. We run the same maplist. This ensures that all columns are distinct too! Clever.

At this point, we’ve ensured all the rows and columns have numbers from 1 to 9, and no number is repeated in any single column/row. The last thing we need to do is make sure no number is repeated in the 3*3 mini-grids. This is what the remainder of the relation does:

We then give each row a name: As, Bs, Cs, and so on.
We then say the first 3 rows make a block. The next 3 make another block. And the last 3 make the last block.

Okay, so what’s a block? Glad you asked.

blocks([], [], []).
blocks([N1,N2,N3|Ns1], [N4,N5,N6|Ns2], [N7,N8,N9|Ns3]) :-
        all_distinct([N1,N2,N3,N4,N5,N6,N7,N8,N9]),
        blocks(Ns1, Ns2, Ns3).

A block is made of 3 lists of 9 elements each. Empty lists on all 3 are valid.
For a block to be valid, the first 3 elements of each list must make a list where every element is distinct. And, this condition should hold on the rest of the remaining elements of each list. Genius!

Let’s declare a problem 9x9 grid, sparsely populated, and see what solution Prolog comes up with:

problem(1, [[_,_,_,_,_,_,_,_,_],
            [_,_,_,_,_,3,_,8,5],
            [_,_,1,_,2,_,_,_,_],
            [_,_,_,5,_,7,_,_,_],
            [_,_,4,_,_,_,1,_,_],
            [_,9,_,_,_,_,_,_,_],
            [5,_,_,_,_,_,_,7,3],
            [_,_,2,_,1,_,_,_,_],
            [_,_,_,_,4,_,_,_,9]]).
% Rows represent the grid of problem 1, where _ denotes unknown. 
% The given numbers give additional constraints. 
% Make sure you fill up the unknowns with the rules described in 
% sudoku. 
% After that, print each row. ?- problem(1, Rows), sudoku(Rows), maplist(portray_clause, Rows).
[9, 8, 7, 6, 5, 4, 3, 2, 1].
[2, 4, 6, 1, 7, 3, 9, 8, 5].
[3, 5, 1, 9, 2, 8, 7, 4, 6].
[1, 2, 8, 5, 3, 7, 6, 9, 4].
[6, 3, 4, 8, 9, 2, 1, 5, 7].
[7, 9, 5, 4, 6, 1, 8, 3, 2].
[5, 1, 9, 2, 8, 6, 4, 7, 3].
[4, 7, 2, 3, 1, 9, 5, 6, 8].
[8, 6, 3, 7, 4, 5, 2, 1, 9].

A Better Sudoku Solver!

Warning: The following took me 2 days of thinking. Given that this was the first Prolog program I was writing, I’m quite proud of it! This was a real exercise in thinking about difficult problems, keeping a cool head, and trying again and again until I hit the right decomposition.

To generalize this solution, we have to find non-hardcoded ways to do essentially the same thing for any N, B_R and B_C.

Everything except the last 2 lines is trivial to fix. We just need a new variable N and replace 9 with N. Most of the hardcode resides in the last 2 lines.

We can’t give arbitrary names to the rows, so we need to partition the grid using B_R rows per group.

/* Result = first N elements of Src List, Rest is Suffix. */
take_firstN(0, Src, [], Src) :- !.
take_firstN(N, [H|T], [H|P], Suffix) :- 
    N #> 0, M #= N-1,
    take_firstN(M, T, P, Suffix).
/* Partitions our matrix into groups with N rows per group */
part(_, [], []) :- !.
part(N, Rows, [Group | RestGroups]) :-
    take_firstN(N, Rows, Group, RowsLeft),
    part(N, RowsLeft, RestGroups).

We made a relation that takes breaks a list into first X items and rest. The elements of the list do not matter — only their ordering does. The fact that each element is a row by itself does not matter to do the take_firstN() relation.
We then use this auxiliary relation to write part() — which makes partitions of B_R rows in each partition — given the entire grid.

We now have to feed each of these partitions to another function that ensures the mini-blocks are valid.

/* Take first N elements of each of M lists into M lists. RestOf contains suffix of each of those lists (2D) */
take_firstN_fromEach(_, [], [], []) :- !.
take_firstN_fromEach(N, [H|T], [Result|RestOfResults], [RestOfH|RestOfT]) :-
    take_firstN(N, H, Result, RestOfH), /* Take N from first list */
    take_firstN_fromEach(N, T, RestOfResults, RestOfT). /* Take N from other lists */valid_blocks(_, [X|_]) :- 
    X = [], !.
    valid_blocks(B_C, Partition) :-
    take_firstN_fromEach(B_C, Partition, Grid, OtherGrids),
    flatten(Grid, FlatGrid),
    all_distinct(FlatGrid),
    valid_blocks(B_C, OtherGrids).

take_firstN_fromEach takes a partition, takes B_C elements out of each row, and then ensures these B_C*B_R elements are all distinct. Sweet!

All that is left is to put our Sudoku relation together. Looks quite similar to the original!

/* Generalized Sudoku Solver
N - size of entire block of N x N
B_R - mini-block column size
B_C - mini-block column size
*/gen_sudoku(Rows,N,B_R,B_C):-
    N #>1, B_R #> 1, B_C #> 1, N #= B_R * B_C,
    length(Rows, N), maplist(same_length(Rows), Rows),
    append(Rows, Vs), Vs ins 1..N,
    maplist(all_distinct, Rows),
    transpose(Rows, Columns),
    maplist(all_distinct, Columns),
    part(B_R, Rows, Partitions),
    maplist(valid_blocks(B_C), Partitions).

We can now arrive at the solution for problem 1 with `problem(1, Rows), gen_sudoku(Rows, 9, 3, 3), maplist(portray_clause, Rows).` Best of all, it works with different sizes!

Conclusion

The next part of the assignment was even more challenging. The problem statement was: Given a lower bound X and an upper bound Y, can you populate a N*N grid with block sizes B_R*B_C such that the grid only has a unique solution? Can you do this fast? Can you do this with the minimal amount of spots used?

Needless to say, it took a while….

You can check out my solution here: https://github.com/Psyf/Sudoku/blob/main/sudokuB.pl. The solution runs in a reasonable amount of time, using a heuristic-based algorithm — the probability of finding a unique grid on the first run decreases with complexity. Special thanks to my friend Paras for brainstorming solutions with me.

I hope you’ve found value in this article — and an appreciation for different, powerful ways of thinking as seen in Prolog. I’d like to thank Prof Chin Wei Ngan for instilling in this noob an appreciation of different programming paradigms during CS2104.

I’d love to hear from you! How would you code a generalized Sudoku solver in your favorite programming language?