The Anagram Dictionary
Several months ago I found a little book called ‘Hidden Anagrams’ at Rodger’s Book Barn out in the Berkshires. (Published in 1912 by Sturgis & Walton Company, New York; no author is listed.) It contains a set of 100 rhymes, each constructed around a set of words (usually 3 or 4) that are all anagrams of each other. For example:
A little girl sat on the floor one day
And near her two ____ were busy at play.
But behold! when she ____ her eyes that way,
Their rude ____ filled her with such dismay
That she cried out “____!” and drove them away
The four blanks can be filled with the words ‘cats’, ‘cast’, ‘acts’, and ‘scat’, all anagrams of the letters a, c, s, and t.
Okay, Shakespeare it ain’t. But I know one or two specialists in light verse, and thought it might be fun to attempt an updated version. My sample text has 103 verses (3 of the rhymes have two verses). The majority (80+%) use anagrams of 4, 5, or 6 letters. Five use 3-letter anagrams, 13 use 7-letter anagrams, and there is one each for 8- and 9-letter anagrams.
What makes the ‘Hidden Anagrams’ book surprising is how many anagram sets the author found: there are 102 unique anagram letter sets (one combination is repeated), and an analysis of a ‘common words’ dictionary with 100,000+ entries finds fewer than 500 such sets available. The author covered an impressive share of these without the benefit of a computer.
To support development of a new book, an Anagram Dictionary would be useful. This is basically a recycling of the work done for the Jumble Solver. Instead of handling one entry at a time, however, we’d like to create a single-file reference that lists each unique combination of letters and the words those letters can form.
We use the same database table developed for the Jumble work. In this table, the rows are the words from our list of 114,300 common words, and the fields we need are the ‘word’ itself and its ‘sorted’ equivalent: the sorted equivalent of a word is simply that word with its letters sorted alphabetically, so, for example, the sorted version of ‘cats’ is ‘acst’. Every anagram of the letters ‘a’, ‘c’, ‘s’, and ‘t’ will have the same sorted version of letters.
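To make the ‘sorted’ form concrete, here is a minimal Perl sketch of how it could be computed. The helper name sorted_key is hypothetical (the code that originally filled the table isn't shown here), and lower-casing the word first is an assumption about how the table stores words.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a word's "sorted" form, i.e. its letters
# in alphabetical order. Lower-casing first is an assumption about how
# the dictionary table stores words.
sub sorted_key {
    my ($word) = @_;
    my @letters = split //, lc $word;
    return join '', sort @letters;
}

# Every anagram of the letters a, c, s, and t shares one key.
for my $w (qw(cats cast acts scat)) {
    print "$w\t", sorted_key($w), "\n";    # each line ends in "acst"
}
```

Because every anagram collapses to the same key, grouping the table on that one column is all it takes to find anagram sets.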
Knowing that, the following SQL query will retrieve every unique sorted combination of letters, 3 characters or longer, that is shared by at least 4 words (that is, every set of at least 4 mutual anagrams):
SELECT sorted, COUNT(sorted) FROM dictionary WHERE (LENGTH(sorted)>=3)
GROUP BY sorted HAVING COUNT(sorted) >= 4
ORDER BY COUNT(sorted) DESC, LENGTH(sorted) DESC
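The GROUP BY / HAVING / ORDER BY logic can also be sketched in plain Perl, which is handy for sanity-checking results against a small word list without a database. This is a hypothetical in-memory equivalent, not part of the original script; anagram_bases and the sample word list are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sorted form of a word: its letters in alphabetical order
# (lower-casing first is an assumption).
sub sorted_key {
    my @letters = split //, lc $_[0];
    return join '', sort @letters;
}

# Hypothetical in-memory stand-in for the SQL query: group words by
# sorted key, keep keys of at least $minlen letters shared by at least
# $mincount words, and order by word count, then key length (both
# descending). Returns [key, count] pairs.
sub anagram_bases {
    my ($words, $mincount, $minlen) = @_;
    my %groups;
    push @{ $groups{ sorted_key($_) } }, $_ for @$words;

    my @keys = grep {
        length($_) >= $minlen && @{ $groups{$_} } >= $mincount
    } keys %groups;

    # SQL leaves full ties in no particular order; sorting by the key
    # itself as a last resort keeps this output deterministic.
    @keys = sort {
        @{ $groups{$b} } <=> @{ $groups{$a} }
            || length($b) <=> length($a)
            || $a cmp $b
    } @keys;

    my @result;
    push @result, [ $_, scalar @{ $groups{$_} } ] for @keys;
    return @result;
}

my @words = qw(cats cast acts scat listen silent enlist tinsel inlets dog god);
for my $row (anagram_bases(\@words, 4, 3)) {
    # prints "eilnst", tab, "5", then "acst", tab, "4"
    print "$row->[0]\t$row->[1]\n";
}
```

Here ‘dog’ and ‘god’ share a key but fall below the 4-word threshold, just as the HAVING clause would drop them.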
Once that list of unique ‘sorted’ combos is identified, it is then a simple matter to loop through the list and extract all anagrams for each combo. The following Perl script will do the job, as long as your database table is ready to go.
For the poetically inclined, the output from the following script can be found here.
Now all I have to do is write the new book!
#!/usr/bin/perl -w
use strict;
use DBI;

our ($db, $host, $userid, $pwd, $connectionInfo);

sub get_anagram_bases {
    my ($mincount, $minlen) = @_;
    my ($dbh, $sql, $query);
    my $ctr;
    my @row;
    my @results;
    my $str;
    my ($i, $n);
    my $combos = 0;

    # Connection settings: replace with your own.
    $db = "sitedata";
    $host = "localhost";
    $userid = "youruserid";
    $pwd = "yourpwd";
    $connectionInfo = "dbi:mysql:database=$db;host=$host";

    # Connect to the database. Gather all anagram bases (sorted
    # letter combinations) of $minlen or more letters that are
    # shared by $mincount or more words:
    $dbh = DBI->connect($connectionInfo, $userid, $pwd);
    if (!$dbh) {
        die("Unable to open database ($connectionInfo)\n\n");
    }
    $sql = "SELECT sorted, COUNT(sorted) FROM dictionary ";
    $sql .= "WHERE (LENGTH(sorted)>=$minlen) GROUP BY sorted ";
    $sql .= "HAVING COUNT(sorted) >= $mincount ";
    $sql .= "ORDER BY COUNT(sorted) DESC, LENGTH(sorted) DESC";
    $query = $dbh->prepare($sql);
    if (!defined($query)) {
        die("Unsuccessful preparing this SQL query:\n$sql\n\n");
    }
    $query->execute or die("Query failed: " . $dbh->errstr . "\n");
    $ctr = 0;
    @results = ();
    while (@row = $query->fetchrow_array()) {
        $results[$ctr][0] = $row[0];    # the sorted letters
        $results[$ctr][1] = $row[1];    # how many words share them
        $combos += $results[$ctr][1];
        $ctr++;
    }
    $query->finish();

    # Now loop through the list of bases and print a listing. The
    # word lookup is prepared once and run with a placeholder for
    # each base:
    $sql = "SELECT word FROM dictionary WHERE sorted=? ORDER BY word";
    $query = $dbh->prepare($sql);
    if (!defined($query)) {
        die("Unsuccessful preparing this SQL query:\n$sql\n\n");
    }
    print "Entry\tBase\tLength\tnAnswers\n";
    for ($i = 0; $i < $ctr; $i++) {
        $str = $results[$i][0];
        $n = $results[$i][1];
        print "" . ($i + 1) . "\t$str\t" . length($str) . "\t$n";
        $query->execute($str);
        while (@row = $query->fetchrow_array()) {
            print "\t" . $row[0];
        }
        print "\n";
        $query->finish();
    }
    $dbh->disconnect();

    print STDERR "Note that $combos unique words are ";
    print STDERR "anagrams for other words.\n\n";
}

# Gets just shy of 500 entries:
get_anagram_bases(4, 3);
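The per-base listing loop in the script can likewise be tried without MySQL. The sketch below is hypothetical: listing_lines and the hand-built %groups hash stand in for the two queries, and it builds the same tab-separated lines the script prints. One difference worth noting: Perl’s sort compares strings by code point, while MySQL’s ORDER BY word follows the column’s collation (often case-insensitive), so a mixed-case word list could come out in a different order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the tab-separated listing lines (Entry,
# Base, Length, nAnswers, then the words in alphabetical order) from a
# hash that maps each sorted key to its words. @keys gives the order
# in which entries are numbered, as the first query's ORDER BY would.
sub listing_lines {
    my ($groups, @keys) = @_;
    my @lines = ("Entry\tBase\tLength\tnAnswers");
    my $entry = 0;
    for my $key (@keys) {
        # Plays the role of "ORDER BY word" in the original query.
        my @words = sort @{ $groups->{$key} };
        $entry++;
        push @lines, join("\t", $entry, $key, length($key), scalar(@words), @words);
    }
    return @lines;
}

my %groups = (acst => [qw(scat cats acts cast)]);
print "$_\n" for listing_lines(\%groups, 'acst');
```

The second line printed is the entry for ‘acst’: its number, the base, its length, the count, and then ‘acts’, ‘cast’, ‘cats’, and ‘scat’.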


