Data Structures, Algorithms, and Sudoku

This article is inspired by the LeetCode challenge “Valid Sudoku

Saheel Sakharkar
Strategio
6 min readSep 9, 2022

--

Credit: depositphotos.com

Sudoku, the classic number-placement logic puzzle. A 9x9 grid containing nine 3x3 boxes, with some numbers already filled in. The rules sound simple enough: fill in the grid with a digit (1–9), such that each row, column, and the box contains one of each digit. However, once you start filling in numbers, you realize how challenging it can be.

If you haven’t played it, I recommend you try it. It can be head-scratching at times but a fun kind of head-scratching. Admittedly, I play Sudoku online, which reduces some of the headaches. If I make a mistake and enter an invalid number, I get a warning and can undo that entry. Though, it makes me wonder: how does the webpage do that? One possibility is that the answer key is stored elsewhere and can be directly compared to, easily pointing out errors. Another possibility is a check done after I enter a number to see if it violates any of the rules. What might that check look like?

Sudoku Board Validity

Let’s assume we’re given a Sudoku board. It can either be fully filled in with digits in all 81 squares or partially filled in with some non-digit characters in the blank squares. What are the rules we’re enforcing and what are our constraints? For this discussion, I’ll use Java as my programming language of choice.

Constraints:

  • board.length == 9
  • board[i].length == 9
  • board[i][j] is a digit 1–9 or ‘.’, representing a blank square

Rules:

  • Each row must contain the digits 1–9 without repetition
  • Each column must contain the digits 1–9 without repetition
  • Each of the nine 3x3 boxes contains the digits 1–9 without repetition

We can imagine our “checker” looks like this:

Given that our board is a 2D array, we can check the values in the rows and columns with row-wise traversal and column-wise traversal, respectively. But then the question is, how do we enforce the rules of no repetition of digits?

HashSets

In Java, HashSet is a class that implements the Set interface, an unordered collection of objects that doesn’t store duplicate values. HashSet uses a hash table in its implementation, allowing operations such as add, remove, and size to perform in constant time, assuming the hash function spreads the elements out between the buckets appropriately.

This is a fantastic tool for us to use in our problem, for its performance and functionality that doesn’t allow duplicate values.

Checking Rows

Let’s create a HashSet for our row values. We won’t initialize it immediately for reasons that will become apparent soon.

As stated before, we can search the board with a standard row-traversal loop. If we come across a digit, we can add it to the HashSet:

Note that we initialize rowSet at the beginning of our outer loop. This is so that when i increases, i.e. we move on to the next row, we reset our HashSet because our row scope has changed.

This code currently goes through each row and makes a temporary HashSet to store the digits seen in that row. One thing that’s obfuscated here is that the add method ignores duplicates. It does so by checking if the argument passed is present in the Set already. If it is, then it returns false. This means if the call to rowSet.add() ever returns false, we’ve encountered a repeating digit in our row, thus making the board invalid. Let’s account for that:

Checking Columns

We can then apply the same principle here to perform column-based traversal! Make note of the difference between the way rowVal and colVal retrieve their values.

Checking Boxes

Now comes the tricky part. Rows and columns are simple enough to check, but a box is a sub-2D array. There are a few different ways to approach this. I’m going to discuss a solution that leverages the row-based traversal we’ve already performed but feel free to ponder other methods of tackling this kind of check.

If we want to start inspecting boxes from the top left of the board with a row-based traversal, what are all the relevant indices? Let’s look at the board as a whole.

The bolded indices are all the squares that make up the top-left box. The italicized indices are the squares that will be accessed in our row-based traversal's first iteration of the inner loop. We’re only interested in the first three values for the top-left box. However, looking ahead, we can see that the middle three values will be relevant for the top-middle box, and the final three values will be relevant for the top-right box. If we use three separate HashSets, we can grab all of those and put them in their respective buckets!

Now, in addition to grabbing row and column values for our row and column HashSets, the left, middle, and right row values are added to our left, middle, and right box HashSets. But why did we initialize the box HashSets outside of the loop, unlike the ones for rows and columns?

That’s a sneaky bit of forward-thinking. Think about when we want to reset our box HashSets. Let’s look at our board again:

Our boxes begin consideration at rows 0, 3, and 6. Seems straightforward; we initialize our box HashSets if (i % 3 == 0).

The problem is that, at least in Java, the compiler can’t assume the if statement's body is ever executed. Putting our box HashSet initialization in that if-clause will result in a compiler error once it sees code referring to a possibly un-initialized HashSet. Luckily, this is easy to account for. Since we unconditionally initialize our box HashSets, all we have to do is modify the expression in our if statement:

And that’s it! We’ve built a Sudoku board “checker” to determine whether a board is valid. In a live Sudoku application, this function could be called in between player moves to warn against invalid entries. If you’re playing on pencil-and-paper, you could use it to check your progress.

Of course, the approach I’ve outlined here is just one of many. What alternatives did you think of? What are some ways this function could be optimized? Let me know your thoughts!

--

--