Solving Sudoku
... Search for the Weakest Link
First 'Hello Kitty', and now this. The Japanese puzzle Sudoku is sweeping, if not the world, then at least the media. I learned about it from an audio report by Seth Stevenson of Slate. Several British newspapers now publish a daily Sudoku column.
My goal was to build a Sudoku Solver. In the process I came up with a good approach for that, and a reason to build an original database of sample puzzles.
|
Press 'New' to choose a new puzzle from the database. Press 'Clear' if you want to erase the board and enter values from another puzzle. Press 'Solve' to figure out the puzzle and fill in the remaining squares. |
How to Solve It
Working out a solution to a Sudoku board is an iterative process of elimination. For each blank square on the big board, figure out how many numbers are 'eligible' for that space. To do this you'll start with the full set 1, 2, 3, 4, 5, 6, 7, 8, and 9. Then you'll eliminate any of these numerals that are found elsewhere in A) the particular Row, B) particular Column, and C) 'local' 3x3 grid for the Cell you are considering.
When you scan all of the blank cells in a Board this way, you'll usually discover one or two that have a small number of possible values remaining ... this is the Board's Weakest Link. If you find a cell with only one remaining candidate, fill it in and then repeat the process of scanning the remaining blank cells for candidates. If the Weakest Link has more than one possible value, you'll have to try each in turn to find which will lead, ultimately, to a solution or a dead end. Keeping track of this is a little daunting at first (although as numerals are added, the number of 'Weak Link' alternatives quickly drop to 1 or 0). It is an apt application for the sort of depth-first with backtracking approach used in the Scramble Squares puzzle.
Symmetry and Uniqueness
Besides solving puzzles entered by users (erase the Board with the Clear button, enter the values you know in the correct spaces, then press the 'Solve' button), I wanted to provide a library of puzzles to test with. Rather than swipe puzzles from other Sudoku sites (tempting, but wrong), a Perl script was written to generate random boards.
This turns out to be more complicated than just throwing a random set of numerals on a Board and seeing if it Solves. First of all, my exhaustive study of existing Sudoku boards (well, two of them, anyway) suggests that a "well-formed" board is symmetrical in the 81-square grid such that, for example, if a number is placed in row 2, column 7, then a number also needs to be placed in row 8, column 3 (the numbers themselves are chosen at random). Asymmetrical boards are of course possible, but less interesting. One could take a single 'solved' board and remove random sets of cells to create any number of asymmetical boards to start with.
There are also multiple paths to find a solution and (possibly, although I've not actually verified this) multiple solutions to some boards. Of the first 1,000 Starting Boards I've created, only 18 have a single path to solution ... less than 2%. One board has over 100,000 paths to success!
Programming Notes
I wrote the original solver in Perl. The 'Big' board is easily represented as an 81-character string, and the depth-first/backtracking algorithm is recursive, so a simple data structure like this keeps things manageable. I then had to port the Perl script to PHP for the Web page. I'd start with PHP, except that I can't figure out a good development tool for it. I use the Komodo IDE from ActiveState for Perl, which does a marvelous job spotting syntax errors, and ActiveState Perl provides clear error messages when something goes wrong. In PHP, everything from a minor syntax error to a major flow of control flaw resolves the same way ... a blank Web page staring back at you. Debugging is a slow, painful process even though PHP is ultimately a clean, fairly powerful scripting tool.
Since Sudoku puzzles can take a lot of steps to solve, resulting in a Web page slow to respond, I keep track of every successful puzzle/solution in a database, so that subsequent requests only have to retrieve the answer, rather than figure it out again. I haven't tried to figure out if solutions are always unique ... I suspect they aren't but I don't know. I do know that solutions can be found by multiple paths, but I only keep track of the first one I find.
Real Code
I was recently sent a link to this geographically impressive example of a Sudoku board on the UK's Sky One site. I have no idea if this thing is real or a Photoshop miracle (although they claim Guinness Book of World Records authenticity), but I once again tip my hat to British creativity.
The puzzle itself has you guessing within the first several moves. I dislike this sort of Sudoku, preferring those that yield to logical deduction. But the recursion in my algorithm is designed to handle exactly this problem ... otherwise a simple loop would be the far simpler approach to take. The trouble is that PHP is not a great scripting choice for deep recursion. I'm no authority on the subject, but I've followed developer discussions about this and I'm inclined to believe it (it depends on whether you use PHP 4 or 5, but under 4 the available memory space for recursion is very constrained.)
The Sky One puzzle requires no fewer than 10 nested guesses (about a 1 in 1000 chance of getting all the correct guesses right the first time if the solution is unique ... which it isn't, but we'll save that math lesson for another time). A Sudoku board is usually considered very difficult if it requires two guesses, so a requirement of 10 is pretty much over the top. Still, my PHP algorithm, running in its Apache environment, stumbles on it, so here I'd like to provide the Perl script on which it was based. Perl will support very deep recursion, if needed.
Recursion, for those who aren't familiar with the term in this context, occurs when a function in a program calls itself in order to do what it is supposed to do. The old joke is that if you look up "recursion" in a programmer's dictionary, the definition says, "See recursion". (That actually defines an infinite loop, which is an everyday threat when working with recursion, but as a friend of mine says, "If you buy the premise, then you buy the joke.") A practical example often given is for a function that computes the factorial of an integer N. For its code, such a function "factorial(N)" might simply return the value "N * factorial(N - 1)". As long as the same function tests for the special case of N = 1, and just returns 1 in that case, it is an extremely compact way to capture the functionality
First part of the script:
#!/usr/bin/perl -w use strict; our $move_ctr; |
Sure, uninspired, but it must be done. Of course, if your perl is installed elsewhere, update that first line as needed. Next, I'm going to show the main body of the program. In practice, I always enter the subroutine definitions first, and that's how it will be in the actual file. |
# +-----------------------------------------------------------+
# | M A I N program. |
# | |
# +-----------------------------------------------------------+
my $board;
my $solution;
my $guesses;
my $answer;
my $state;
print_time("Start time");
# Gather new board from STDIN
$board = $ARGV[0];
$answer = $board;
$move_ctr = 0;
($solution,$guesses,$state) = solve($answer,0,0);
if ($state == 0) {
die("Could not solve the board!\n");
}
print "Solved after $guesses guesses and $move_ctr moves:\n\n";
print_board("Before",$board);
print_board("After",$solution);
print_time("Stop time");
|
We gather an 81-character string from the command line that represents the board, done this way just to keep it utterly simple. The string starts with the top row, each row has nine characters corresponding to the numbers in that row. If a cell is blank, we put a "0" (zero) in its place. So the command to solve the Sky One puzzle will look something like: perl sudokusolve.pl <81 character string> We get the board string from the @ARGV array, but since we may want to compare the original board to the solution, we leave it as is and make a copy of it into the variable "$answer", which we will update with our solution. The print_time subroutine takes a label for an argument and simply prints out a timestamp. I am always wondering how long things take to complete, so I'm in the habit of printing this before and after big steps. Otherwise, all the main program does is call the "solve" subroutine, which takes as arguments a game board, a 'level' counter (to track recursion depth ... we start at level 0), and a counter for the number of guesses made on the way to a solution. A guess happens when the best alternative on the board still requires you to guess among two or more possible values for that cell. The solve subroutine looks like this: |
# +-------------------------------------------------------------+
# | Subroutine solve |
# | |
# +-------------------------------------------------------------+
sub solve
{
my $board = $_[0];
my $level = $_[1];
my $guess_ctr = $_[2];
my $nextrow;
my $nextcol;
my $score;
my @legal;
my $nlegal;
my $val;
my $i;
my $c;
my $newstr;
my $solution;
my $guesses;
my $state;
($score, $nextrow, $nextcol, @legal) = score_board($board);
if ($score == 0) {
return($board,$guess_ctr,0);
}
# If $score is still 10, then all of the spaces are filled!
# This should be a winning board. "state" returned is 1.
if ($score == 10) {
return($board,$guess_ctr,1);
}
# Otherwise, we take our best alternative and push ahead ...
$nlegal = @legal;
$move_ctr++;
# If we have a cell with only one valid response, fill it in
# and go on ...
if ($nlegal == 1) {
$val = $legal[0];
substr($board,($nextrow * 9) + $nextcol,1) = $val;
return(solve($board,$level + 1,$guess_ctr));
}
# Otherwise, we will try a set of alternatives until we find
# success (or run out of alternatives!)
$guess_ctr++;
# The following is an optional diagnostic for looking at
# recursion depth.
print STDERR "Guess $guess_ctr, checking $nlegal possibilities.\n";
for ($i = 0;$i < $nlegal;$i++) {
$val = $legal[$i];
substr($board,($nextrow * 9) + $nextcol,1) = $val;
($solution,$guesses,$state) = solve($board,$level + 1,$guess_ctr);
# See if that path led to a solution
if ($state == 1) {
return($solution,$guesses,$state);
}
}
# We only get here if all of our alternatives failed. Return
# the last failure.
return($solution,$guesses,$state);
}
|
The solve subroutine invokes the score_puzzle subroutine, and acts on its results. score_puzzle simply looks at each and every square in the game board. If the square is vacant (represented by a zero in the argument string), it analyzes the Sudoku board to determine what numbers could be placed there. It looks at the column the cell is in, the row, and the 'local' 3x3 grid that the cell is in, and eliminates any of the integers from 1 through 9 already found in any of those locations. It then counts how many possible integers remain. The subroutine keeps track of the row and column with the lowest possible alternatives. |
# +-------------------------------------------------------------+
# | Subroutine score_board |
# | |
# +-------------------------------------------------------------+
sub score_board
{
my $str = $_[0];
my @board;
my @scores;
my $i;
my $j;
my $cell;
my @legal;
my $nlegal;
my $lowscore = 10;
my $lowi = 0;
my $lowj = 0;
my @lowlegal = ();
# The @scores array is a 9x9 grid representing the board $str.
# If a cell has an @scores value of -1, then it already has an
# assigned value on @board.
# Otherwise the cell value is the number of unused numbers that
# could be assigned at that row and column. If this value is
# zero (0), then we are at a dead-end. Otherwise, we want to
# return the row and column of the cell with the lowest number
# of alternatives, and the list of those alternatives, to try
# in order.
for ($i = 0;$i < 9;$i++) {
for ($j = 0;$j < 9;$j++) {
$board[$i][$j] = substr($str,($i * 9) + $j,1);
}
}
for ($i = 0;$i < 9;$i++) {
for ($j = 0;$j < 9;$j++) {
$cell = $board[$i][$j];
if ($cell != 0) {
$scores[$i][$j] = -1;
next;
}
@legal = legal_values($str,$i,$j);
$nlegal = @legal;
$scores[$i][$j] = $nlegal;
if ($nlegal < $lowscore) {
$lowscore = $nlegal;
$lowi = $i;
$lowj = $j;
@lowlegal = @legal;
}
}
}
return ($lowscore, $lowi, $lowj, @lowlegal);
} |
|
If the $lowscore returned by score_board is zero, then there is a cell on the board that can not be filled with a legal value ... which means the board is at a dead end and can't be solved, so the 'solve' subroutine will return a state that says "Give Up!". If no blank cells are found, that means that the board IS solved, and "solve" returns a state that says "Found It!" If score_board returns a one, it means it has found a cell where only one possible number fits. The "solve" subroutine places this number in the indicated coordinates, and then calls itself to analyze the new board. If the return value is greater than one, then the subroutine iterates through the list of possible legal values until a solution is found (or all the alternatives fail). Finally, for those interested, the following is the PHP translation of the main functions that I came up with when scripting the page. Other code is involved, including a real Frankenstein's Monster of a function to extract the board values from the form into the array, but the main solve and score_board functions are here ... connecting the pieces will be straightforward. |
$str = "000004050000003700208006000540000000100000003";
$str .= "000000028000200106003700000070500000";
$original = $str;
$nctr = 0; // Counts steps in solution
$solution_ctr = 0; // Counts number of solutions found
$solutions = Array(); // Array of solutions found.
function score_board($str) {
global $nctr;
$lowscore = 10;
$lowi = 0;
$lowj = 0;
$lowalts = "0123456789";
$nctr++;
# The $scores array is the same 9x9 grid as the $board array. If
#a cell has an $scores value of -1, then it already has an assigned
# value on $board. Otherwise, the cell value is the number of
# unused numbers that could be assigned at that row and column.
# If this value is zero (0), then we are at a dead-end. Otherwise,
# we want to return the row and column of the cell with the lowest
# number of alternatives, and the list of those alternatives, to
# try in order.
for ($i = 0;$i < 9;$i++) {
for ($j = 0;$j < 9;$j++) {
$board[$i][$j] = substr($str,($i * 9) + $j,1);
}
}
// Initialize $alts to allocate some space to it.
$alts = "0123456789";
for ($i = 0;$i < 9;$i++) {
for ($j = 0;$j < 9;$j++) {
$cell = $board[$i][$j];
if ($cell != 0) {
$scores[$i][$j] = -1;
continue;
}
# Prep array to keep track of numbers that can and can't
# be played in this particular location. Note that '0'
# denotes an unused cell ... the numbers 'in play' are 1
# through 9.
for ($k = 0;$k <= 9;$k++) {
$ctr[$k] = 0;
}
# Define the row and column of the top left element of
# the 'local' 3x3 grid.
$localr = $i - ($i % 3);
$localc = $j - ($j % 3);
# Mark all numbers already used in the 'local' 3x3 grid.
for ($ii = $localr;$ii < ($localr + 3);$ii++) {
for ($jj = $localc;$jj < ($localc + 3);$jj++) {
$value = $board[$ii][$jj];
if ($value != 0) {
$ctr[$value] = 1;
}
}
}
# Mark all numbers already used in the current row.
for ($jj = 0;$jj < 9;$jj++) {
$value = $board[$i][$jj];
if ($value != 0) {
$ctr[$value] = 1;
}
}
# Mark all numbers already used in the current column.
for ($ii = 0;$ii < 9;$ii++) {
$value = $board[$ii][$j];
if ($value != 0) {
$ctr[$value] = 1;
}
}
# Now count up possible moves and record the score. A
# number is available to play in a current space if there
# is a zero in its place in the @ctr array.
#
# If this is the current Low Score, keep track of it!
$moves = 0;
for ($k = 1;$k < 10;$k++) {
if ($ctr[$k] == 0) {
$alts[$moves] = $k;
$moves++;
}
}
$scores[$i][$j] = $moves;
if ($moves < $lowscore) {
$lowscore = $moves;
$lowi = $i;
$lowj = $j;
$lowalts = substr($alts,0,$moves);
}
}
}
$results[0] = $lowscore;
$results[1] = $lowi;
$results[2] = $lowj;
$results[3] = substr($lowalts,0,$lowscore);
return $results;
}
function solve_sudoku($str,$level) {
global $solution_ctr;
global $solutions;
global $nctr;
$results = score_board($str);
$score = $results[0];
$nextrow = $results[1];
$nextcol = $results[2];
$alts = $results[3];
if ($score == 10) {
$solution_ctr++;
$solutions[] = $str;
return;
}
for ($i = 0;$i < $score;$i++) {
$c = substr($alts,$i,1);
$newstr = $str;
$newstr[($nextrow * 9) + $nextcol] = $c;
solve_sudoku($newstr,$level + 1);
}
}
function PushSolveButton($request_page,$request_ip) {
global $nctr;
global $solution_ctr;
global $solutions;
global $original;
$before = gather_board($_POST["grid"]);
$original = $before;
// Database stuff removed. The final application checks a
// database of 'known' puzzles to see if has already solved
// the board and if so just retrieves and displays the
// solution.
//
// Otherwise, assume $ctr resolves to 0 (board not found
// in the database, so solve). 'gather_board' is just a
// function that gets the data from a grid. I think the
// other functions are pretty clear about what that array
// looks like!
if ($ctr == 0) {
// The board was not found in the database. Run the solver.
$nctr = 0; // Counts steps in solution
$solution_ctr = 0; // Counts number of solutions found
$solutions = Array(); // Array of solutions found.
solve_sudoku($before,0);
if ($solution_ctr > 0) {
$after = $solutions[0];
// Check the database, and add this board/solution if it
// is new.
update_database($original,$after,$nctr);
showboardform($after,$before);
}
else {
showboardform($before,$before);
}
}
else {
showboardform($after,$before);
}
}
|
The full Perl script is provided here (in a ZIP file so it won't attempt to execute when you press the link ... extract the .pl script before submitting it to Perl on your machine!)
Copyright 2006 Mark Van Dine, All Rights Reserved

