Solving Sudoku with Neo4j

Published in

Neo4j Developer Blog

7 min readJan 1, 2020

In my last blog post, we looked at a Sudoku graph as a way to explore the recently release K1 Coloring algorithm for Neo4j. Playing with a Sudoku grid could give us a good feel for graph coloring, but the K1 Coloring algorithm doesn’t always converge to the optimum solution for a graph. In this post, we’ll build an algorithm for Sudoku that will find the optimum solution.

I experimented with several ideas for a Neo4j Sudoku solver with mixed results. Then I turned to the wikipedia page for Sudoku solving algorithms. The page introduced me to several ways to think about a Sudoku algorithm. I found that the approach of modeling the puzzle as an exact cover problem worked well for Neo4j.

We’ll create nodes for each possible move in Sudoku. A move represents placing a specific number in a cell. For example, “put a 4 at row 1 column 3” is a move. I will give my moves attributes for row, column, block, number, and status. The status can be “Yes”, “No”, or “Maybe” reflecting whether I have included the move in the solution, ruled it out, or it is undetermined.

//Create moves
UNWIND RANGE(1,9) AS row
UNWIND RANGE(1,9) AS column
UNWIND RANGE(1,9) AS number
CREATE (m:Move {row:row, column:column, number:number, status:"Maybe"})
SET m.block = 3*((m.row-1)/3) + (m.column-1)/3 + 1

The rules of Sudoku set up certain constraints. Each number must appear in a row exactly once. Each number must appear in a column exactly once. Each number must appear in a block exactly once. Each cell can contain only one number. We create nodes representing each of these constraints. Then, we create edges connecting each move to the constraints it could fulfill. You will need to have multi-line statements enabled to run the code below. If you’re not sure how to do that, you can find more information here.

//Create row requires number constraint
UNWIND RANGE(1,9) AS row
UNWIND RANGE(1,9) AS num
CREATE (c:Constraint {row:row, number:num, type:"Row requires number."})
WITH c
MATCH (m:Move)
WHERE m.row = c.row and m.number = c.number
MERGE (m)-[:MATCHES]->(c);

//Create column requires number constraint
UNWIND RANGE(1,9) AS column
UNWIND RANGE(1,9) AS num
CREATE (c:Constraint {column:column, number:num, type:"Column requires number."})
WITH c
MATCH (m:Move)
WHERE m.column = c.column and m.number = c.number
MERGE (m)-[:MATCHES]->(c);

//Create block requires number constraint
UNWIND RANGE(1,9) AS block
UNWIND RANGE(1,9) AS num
CREATE (c:Constraint {block:block, number:num, type:"Block requires number."})
WITH c
MATCH (m:Move)
WHERE m.block = c.block and m.number = c.number
MERGE (m)-[:MATCHES]->(c);

//Create cell requires number constraint
UNWIND RANGE(1,9) AS row
UNWIND RANGE(1,9) AS column
CREATE (c:Constraint {row:row, column:column, type:"Cell requires number."})
WITH c
MATCH (m:Move)
WHERE m.row = c.row and m.column = c.column
MERGE (m)-[:MATCHES]->(c);

When the puzzle is solved, each constraint will be connected to exactly one move with a status of “Yes.” Therefore, whenever we set a move status to “Yes,” we should set the statuses of the move’s neighbors two-hops away to “No.”

At the start of the puzzle, some cells are already filled in. The code below will update the statuses of some moves to reflect the initial clues for a puzzle. As we work through the puzzle, it will be important to know the order in which we changed move statuses. Our search for a solution might go down a dead end, and we would need to backtrack. We’ll use a property called “search” to keep track of the search order.

//Activate clues
MATCH (m:Move {row:2, column:1, number:1}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:4, column:1, number:2}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:6, column:1, number:8}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:7, column:2, number:7}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:9, column:2, number:6}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:3, column:3, number:5}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:5, column:3, number:6}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:9, column:3, number:3}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:1, column:4, number:2}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:2, column:4, number:9}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:5, column:4, number:4}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:8, column:4, number:6}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:3, column:5, number:4}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:9, column:5, number:1}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:1, column:6, number:5}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:2, column:6, number:7}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:4, column:6, number:9}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:2, column:7, number:6}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:5, column:7, number:1}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:2, column:8, number:2}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:6, column:8, number:7}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:7, column:9, number:4}) SET m.status = 'Yes', m.search = 0;
MATCH (m:Move {row:8, column:9, number:9}) SET m.status = 'Yes', m.search = 0;//Update conflicted moves
MATCH (m:Move {status:'Yes'})-[:MATCHES*2]-(m2:Move) 
SET m2.status = 'No', m2.search = m.search

To solve our puzzle, we’ll follow a version of Algorithm X created by Donald Knuth. This is a depth-first search, where we try to follow a branch all the way to a solution. If we hit a dead end, we backtrack to the last fork and follow it. Here are the steps.

Find the constraint node connected to the smallest number of moves with “Maybe” status.
If there are “Maybe” moves connected to the constraint in step 1, choose one of the “Maybe” moves and set the status to “Yes.” Mark the constraint and the maybe nodes with the next search number in case we need to backtrack later. If there are no maybe moves connected to the constraint in step 1, skip to step 4.
Set all the “Maybe” moves two hops away from the move selected in step 2 to “No.” Mark these nodes with the search number you used in step 2. Repeat the algorithm from step 1.
If there are no “Maybe” moves found at step 2, check to see if there are any “Maybe” moves left in the graph. If not, you have solved the puzzle!
If there are no “Maybe” moves found at step 2, but there are “Maybe” moves left in the graph, you have hit a dead end, and we need to backtrack. Find the highest search number and set all the moves with that number back to “Maybe.” Find the constraint with the highest search number and pick a different connected “Maybe” move to try and continue from step 2. If there are alternate “Maybe” moves, remove this search number from all nodes and repeat step 5 to backtrack further.

I will show you how to run the Cypher steps individually. You can check out this jupyter notebook to see how I used Python to put the steps together.

Step 1 of the algorithm looks like this.

//Find move to search
MATCH (c:Constraint)
WHERE NOT (c)<-[:MATCHES]-(:Move {status:"Yes"})
OPTIONAL MATCH (c)<-[:MATCHES]-(m:Move {status:"Maybe"})
WITH c, m
ORDER BY id(c), id(m)
WITH c, collect(m) AS maybes
ORDER BY size(maybes), id(c)
LIMIT 1
RETURN id(maybes[0]) AS move, id(c) AS constraint

If we got a result back, we’ll use the next query to make the updates for this search pass. Use the IDs that you received from step 1 as the parameters $moveId and $constraintId.

//Update moves
MATCH (m:Move)
WITH max(m.search) + 1 AS newSearch
MATCH (m)-[:MATCHES]->(c:Constraint)
WHERE id(m)=$moveId and id(c)=$constraintId
SET m.status = "Yes",
m.search = newSearch,
c.search = newSearch,
c.moveId = $moveId
WITH m, newSearch
MATCH (m)-[:MATCHES]->(:Constraint)<-[:MATCHES]-(m2:Move {status:"Maybe"})
SET m2.status = "No",
m2.search = newSearch

Repeat these two code blocks until the first block returns no results. When you get no results back, check to see if there are any “Maybe” moves left in the graph.

MATCH (m:Move {status:'Maybe'}) RETURN count(*) AS maybeCount

If you have zero “Maybes” left, you have solved the puzzle! Many easier Sudokus will be solved without having to backtrack.

If you do need to backtrack, we need to find the most recent search number, undo the decision, and take a different track.

//Undo and search for alternate branch at last depth
MATCH (m:Move)
WHERE EXISTS(m.search)
WITH m.search AS search, collect(m) AS moves
ORDER BY m.search DESC
LIMIT 1
FOREACH(move in moves | set move.status = "Maybe")
WITH search
MATCH (c:Constraint {search:search})
OPTIONAL MATCH (c)<-[:MATCHES]-(m2 {status:"Maybe"})
WHERE (id(m2) > c.moveId OR m2 is null)
RETURN id(c) AS constraintId, id(m2) AS moveId, search

If this code returned a moveId, you can remove the highest search number from the moves and constraints. Then you can plug the constraintId and the moveId into the “Update moves” code above and continue.

MATCH (n) WHERE n.search = $search SET n.search = null, n.moveId = null

If the “Undo and search for alternate branch at last depth” code doesn’t return a moveId, we have exhausted the search at this depth. We remove the last search number from the graph and run the “Undo and search for alternate branch at last depth” code again until we find a possible branch to explore.

//Remove last search ID
MATCH (n) WHERE n.search = $search 
SET n.search = null//Undo and search for alternate branch at last depth
MATCH (m:Move)
WHERE EXISTS(m.search)
WITH m.search AS search, collect(m) AS moves
ORDER BY m.search DESC
LIMIT 1
FOREACH(move in moves | set move.status = "Maybe")
WITH search
MATCH (c:Constraint {search:search})
OPTIONAL MATCH (c)<-[:MATCHES]-(m2 {status:"Maybe"})
WHERE (id(m2) > c.moveId OR m2 is null)
RETURN id(c) AS constraintId, id(m2) AS moveId, search

Solving Sudoku with Neo4j

Written by Nathan Smith