Solve the Jumble!
|
In which we tackle the weighty problem of solving the Jumble puzzle in the daily paper. Usually, this is an easy puzzle to solve, although when your mind gets "jumble-blocked", it can tend to stay that way for a while. The "Jumble" puzzles are copyrighted by David L. Hoyt and appear in many daily newspapers, usually in the comics section near the crossword. The challenge is to unscramble a set of words. The individual words are usually 5 to 8 characters long. |
|
The human brain seems to be particularly gifted at solving such problems. Most people have had the experience of having the unscrambled answer for each word pop right into their head after a quick glance. This is followed by a short burst of intellectual smugness and fantasies about accepting the Nobel Prize for Extreme Cleverness.
And sometimes the unscrambling process stumbles and the letters just stare back at you, defiantly scrambled. At that point any number of tricks to jump-start the puzzle-solving process might work. Simple tactics like reversing the scrambled letters and giving your brain another go, or heuristics like looking for common groupings of letters ('th', 'ing', etc.) in order to simplify the number of combinations to consider.
This is the part of the puzzle we'll tackle. The actual Jumble takes selected letters from the correct answers, which the player must then unscramble to answer a bonus clue, sometimes straightforward but usually multi-word AND part of a horrific pun, and who want to encourage that sort of thing?
This is a fun script to write if you've figured out an elegant, simplifying hack (or, well, had it explained to you). It's a fun Comp Sci 101 exercise because of how inventive people can be at finding difficult solutions. It gave me the chance to reread a great essay by Jon Bentley of (at the time) Bell Labs, and learn a little Perl CGI and PHP scripting in the bargain.
How to Solve It
To unscramble the individual words, Jon Bentley proposed an elegant solution* from which we steal the central trick: sort the letters of the scrambled word into alphabetical order. Then compare this sorted arrangement of letters with a dictionary of words processed the same way (an alphabetical list of words each paired with a version of itself with its letters sorted alphabetically).
For example, say that one of our scrambled puzzle words is "lorac". We sort the letters into alphabetical order to obtain "aclor". We then search our dictionary for any five-letter words that also sorted to the pattern "aclor". (I used a list of about 100,000 words and found four solutions: "calor", "carlo", "carol", and "coral")
*'Aha! Algorithms', the second essay in his collection "Programming Pearls".
Lessons Learned
A quick summary of Bentley's suggested approach to the problem:
1. write a short program to 'sign' each word in my master list, meaning to pair each word with the letters from that word sorted in alphabetical order (separated by a space, say). The word 'jumble', for example, would be processed into the line:
bejlmu jumble
2. Run this 'signing' operation on your list of dictionary words, and sort the resulting output file using the system sort command.
3. Then 'squash' the sorted file created in step 2. Since some sets of letters can be arranged to form multiple words (the letters 'abder' for example, can be arranged to form the words 'bared', 'beard', 'bread', and 'debar'), we can create a new file where each unique signed combination appears just once, followed by all of the words it represents on the same line. For example:
abder bared beard bread debar
'Squashing' shrinks the file a little, but more important, each line in the resulting file starts with a unique combination of letters, which will be important when we start searching through it.
The final Perl script was short (less than 100 lines, and I've developed the feeling that a good Perl programmer might do it in 20). Even the 'slow' linear search version was pretty fast, so my longstanding bias against 'slow, interpreted programs' had to be revised.
The almost 'final' script was fairly simple:
- Collect a scrambled word from a Web page
- Lowercase the word, then create and save a 'signed' version of it.
- Load the contents of the 'squashed' file into an array
- Go through the array looking for a match for the 'signed' version of the scrambled word
- Display the answer (if any) and prep the page for another try.
As mentioned, that worked, but it wasn't very fast. The main problem was 'slurping' (technical Perl term, that) the file into an array. Perl never complains, but the dictionary is 100,000 lines long, and that step created a noticeable pause. The secondary problem was the subsequent linear search through the array looking for an answer.
I revised the solution by dividing the Squashed file into a number of smaller files, one for each length of word from 1 character to 30 characters, which makes for a smaller file to 'slurp' when the time comes for any given word. And I replaced the linear search through the array with a binary search, which gives the final performance a little boost.
And then I took the problem to a more contemporary conclusion and stored the dictionary of words, with their corresponding 'signed' versions, into a MySQL relational database for a very fast search indeed.
Final thoughts:
- Perl was impressive as a scripting language. It was efficient (a little code goes a long way), easy to learn, and very fast.
- A little thought and basic research revealed a compact means to prepare and store the dictionary information for rapid access.
- A little computer science goes a long way. The binary search (adapted from 'Algorithms in C' by Sedgewick) required just a few extra lines of code, but limits the amount of looking we have to do to 15 or 16 tests in the worst case (as compared to 20,000 or more in the linear search).
- PHP was also very impressive (see below), in particular allowing clean encapsulation of the working code within the active page design.
Scripts
Right-click (in Windows-land) and copy the link target. This Perl script assumes that it lives in the cgi-bin subdirectory. Edit as necessary:
The Jumble Perl Script (in a ZIP file)
The C program upon which the Perl script was based:
The Perl script assumes the existence of the following series of files (all contained in a single ZIP file). Each file represents words of a specfic length (denoted by the integer in the file name). In each file, each line is a "signed" pattern followed by the words that share that sign.
Copyright 2006 Mark Van Dine, All Rights Reserved

