October 21, 2008

The Anagram Dictionary

Filed under: Scripts, Books, Puzzles, Poetry — mark @ 7:57 am

Several months ago I found a little book called ‘Hidden Anagrams’ at Rodger’s Book Barn out in the Berkshires (published 1912 by Sturgis & Walton Company, New York; no author is listed). It contains 100 rhymes, each built around a set of words (usually 3 or 4) that are all anagrams of one another. For example:

A little girl sat on the floor one day
And near her two ____ were busy at play.
But behold! when she ____ her eyes that way,
Their rude ____ filled her with such dismay
That she cried out “____!” and drove them away

The four blanks can be filled with the words ‘cats’, ‘cast’, ‘acts’, and ‘scat’, all anagrams of the letters a, c, s, and t.

Okay, Shakespeare it ain’t. But I know one or two specialists in light verse, and thought it might be fun to attempt an updated version. My sample text has 103 verses (3 of the rhymes have two verses). The majority (80+%) use anagrams of 4, 5, or 6 letters. Five use 3-letter anagrams, 13 use 7-letter anagrams, and there is one each for 8- and 9-letter anagrams.

What makes the ‘Hidden Anagrams’ book surprising is how many anagram sets the author found. There are 102 unique anagram letter sets (one letter combination is used twice), and an analysis of a ‘common words’ dictionary with 100,000+ entries finds fewer than 500 such sets available. The author covered an impressive subset of these without the benefit of a computer.

To support development of a new book, an Anagram Dictionary would be useful. This is basically a recycling of the work done for the Jumble Solver. Instead of handling one entry at a time, however, we’d like to create a single-file reference that lists each unique combination of letters along with the words those letters can form.

We use the same database table developed for the Jumble work. In this table, the rows are the words from our list of 114,300 common words, and the fields we need are the ‘word’ itself and its ‘sorted’ equivalent. The sorted equivalent of a word is simply that word with its letters sorted alphabetically; for example, the sorted version of ‘cats’ is ‘acst’. Every anagram of the letters ‘a’, ‘c’, ‘s’, and ‘t’ will have the same sorted version.
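As a small illustration of the ‘sorted’ key, here is a minimal Perl sketch. It is not necessarily the code that built the original table. It splits a word into letters, sorts them, and joins them back together. Lower-casing the word first is my assumption, and the sub name `sorted_key` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compute the 'sorted' key for a word, i.e. its
# letters in alphabetical order.  Lower-casing is an assumption; the
# original table may store words differently.
sub sorted_key {
    my ($word) = @_;
    my @letters = split //, lc $word;
    return join '', sort @letters;
}

# Every anagram of a, c, s, t maps to the same key, 'acst'.
for my $w (qw(cats cast acts scat)) {
    print "$w -> ", sorted_key($w), "\n";
}
```

Because all anagrams collapse to one key, finding anagram sets reduces to grouping rows on that key, which is exactly what the SQL below does.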

Knowing that, the following SQL query will retrieve every unique sorted combination of letters that is 3 or more characters long and is shared by 4 or more words:

SELECT sorted, COUNT(sorted) FROM dictionary WHERE (LENGTH(sorted)>=3)
GROUP BY sorted HAVING COUNT(sorted) >= 4
ORDER BY COUNT(sorted) DESC, LENGTH(sorted) DESC
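For anyone without the Jumble database handy, here is a rough in-memory sketch of what that query does. It assumes the word list has already been read into a Perl array. The hash count plays the part of GROUP BY, the length test plays WHERE, the `grep` plays HAVING, and the sort mirrors the ORDER BY clause. The final alphabetical tie-break is my addition so the output is repeatable. `anagram_bases` and the sample words are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same key as the 'sorted' column: letters in alphabetical order.
sub sorted_key {
    my ($word) = @_;
    my @letters = split //, lc $word;
    return join '', sort @letters;
}

# Returns [key, count] pairs for keys at least $minlen letters long
# that are shared by at least $mincount words, ordered by count
# (descending), then key length (descending), then alphabetically.
sub anagram_bases {
    my ($words, $mincount, $minlen) = @_;
    my %count;
    for my $w (@$words) {
        my $key = sorted_key($w);
        $count{$key}++ if length($key) >= $minlen;            # WHERE
    }
    my @bases = grep { $count{$_} >= $mincount } keys %count;  # HAVING
    @bases = sort {
        $count{$b} <=> $count{$a}                              # ORDER BY count
            or length($b) <=> length($a)                       # then length
            or $a cmp $b                                       # stable tie-break
    } @bases;
    return map { [ $_, $count{$_} ] } @bases;
}

my @words = qw(cats cast acts scat post stop pots tops spot opts dog god);
for my $pair (anagram_bases(\@words, 4, 3)) {
    print "$pair->[0]\t$pair->[1]\n";
}
```

With the sample list this should print `opst` with a count of 6, followed by `acst` with 4. The `dog`/`god` pair falls below the threshold of 4.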

Once the list of unique ‘sorted’ combos is identified, it is a simple matter to loop through it and extract all the anagrams for each combo. The following Perl script will do the job, as long as your database table is ready to go.
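The loop is also easy to picture without a database. The hypothetical `anagram_listing` sub below is an in-memory sketch, not the database script itself. It collects the words for each qualifying base and sorts them as `ORDER BY word` would. It then formats one tab-separated line per base in the same Entry / Base / Length / nAnswers layout the database script prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: group words by sorted key, keep bases that are
# long enough and common enough, and format each as
# Entry, Base, Length, nAnswers, word1, word2, ...
sub anagram_listing {
    my ($words, $mincount, $minlen) = @_;
    my %groups;
    for my $w (@$words) {
        my @letters = split //, lc $w;
        my $key = join '', sort @letters;
        push @{ $groups{$key} }, $w if length($key) >= $minlen;
    }
    # Most anagrams first, then longest base, then alphabetical.
    my @bases = sort {
        @{ $groups{$b} } <=> @{ $groups{$a} }
            or length($b) <=> length($a)
            or $a cmp $b
    } grep { @{ $groups{$_} } >= $mincount } keys %groups;

    my @lines;
    for my $i (0 .. $#bases) {
        my $base  = $bases[$i];
        my @found = sort @{ $groups{$base} };    # like ORDER BY word
        push @lines, join "\t", $i + 1, $base, length($base),
                                scalar(@found), @found;
    }
    return @lines;
}

print "$_\n" for anagram_listing([qw(cats scat acts cast dog god)], 4, 3);
```

Called with the four ‘cats’ anagrams plus ‘dog’ and ‘god’ and the same thresholds (4, 3), it returns a single line: 1, acst, 4, 4, then acts, cast, cats, scat.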

For the poetically inclined, the output from the following script can be found here.

Now all I have to do is write the new book!

#!/usr/bin/perl -w
use strict;
use DBI;

our ($db, $host, $userid, $pwd, $connectionInfo);

sub get_anagram_bases {
    my ($mincount, $minlen) = @_;
    my ($dbh, $sql, $query);
    my $ctr;
    my @row;
    my @results;
    my $str;
    my ($i, $n);
    my $combos = 0;

    $db     = "sitedata";
    $host   = "localhost";
    $userid = "youruserid";
    $pwd    = "yourpwd";

    # DBD::mysql expects key=value pairs after the driver name:
    $connectionInfo = "dbi:mysql:database=$db;host=$host";

    # Connect to the database, then gather all anagram bases that are
    # at least $minlen letters long and shared by $mincount or more words:
    $dbh = DBI -> connect($connectionInfo,$userid,$pwd);
    if (!$dbh) {
        die("Unable to open database ($connectionInfo): $DBI::errstr\n\n");
    }

    $sql  = "SELECT sorted, COUNT(sorted) FROM dictionary ";
    $sql .= "WHERE (LENGTH(sorted)>=$minlen) GROUP BY sorted ";
    $sql .= "HAVING COUNT(sorted) >= $mincount ";
    $sql .= "ORDER BY COUNT(sorted) DESC, LENGTH(sorted) DESC";

    $query = $dbh -> prepare ($sql);
    if (!defined($query)) {
        die("Unsuccessful preparing this SQL query:\n$sql\n\n");
    }

    $query -> execute;

    $ctr = 0;
    @results = ();
    while (@row = $query -> fetchrow_array()) {

        $results[$ctr][0] = $row[0];
        $results[$ctr][1] = $row[1];

        $combos += $results[$ctr][1];

        $ctr++;

    }

    $query -> finish();

    # Now loop through the list of bases and print a listing.  The
    # per-base query is prepared once and reused with a placeholder:
    print "Entry\tBase\tLength\tnAnswers\n";
    $query = $dbh -> prepare ("SELECT word FROM dictionary WHERE sorted=? ORDER BY word");
    if (!defined($query)) {
        die("Unsuccessful preparing the word lookup query\n\n");
    }
    for ($i = 0; $i < $ctr; $i++) {

        $str = $results[$i][0];
        $n   = $results[$i][1];

        print "" . ($i + 1) . "\t$str\t" . length($str) . "\t$n";
        $query -> execute($str);
        while (@row = $query -> fetchrow_array()) {
            print "\t" . $row[0];
        }
        print "\n";

    }
    $query -> finish();

    $dbh -> disconnect();

    print STDERR "Note that $combos unique words are ";
    print STDERR "anagrams for other words.\n\n";
}

# Gets just shy of 500 entries:

get_anagram_bases(4,3);  

October 20, 2008

Web Development on a Mac

Filed under: Computing — mark @ 6:54 am

A little over a year ago Mom gave me an iMac as a belated graduation present, and it’s been a godsend. Besides providing me the escape hatch I needed to avoid ever having to work with Vista, just about anything worth doing on a computer has been easier on the Mac.

But the major exception has been LAMP Web development. Getting MySQL installed and conversing with both PHP and Perl (at the same time) has been a terrible experience. In searching for the magic combination of versions to allow creation of database-aware, scripted prototypes, I’ve managed to delete the default installation of Perl, break (and then lose) existing database schema, and have somehow created a permanent schism over where my localhost expects to find default HTML files.

I finally have the pre-installed Apache working with MySQL 5, pre-installed PHP (at least, I think it was pre-installed … at this point I honestly don’t remember), ActiveState Perl 5.8.8, and the DBD::mysql Perl interface loaded via CPAN. Versions matter: Perl 5.10, for example, is missing a lot of needed database modules, and MySQL 4 was having trouble talking to the on-board PHP. I still don’t have the GD graphics library, which I need for one of my prototypes, but I just can’t face breaking everything again in an attempt to get it installed!

I have no doubt that I caused at least half of my own setup problems, but that is largely because I can find no coherent repository of support information for this particular use of Mac technology. By comparison, the old no-name Intel box running Ubuntu Linux next to the Mac is drop-dead simple to use.

The best find of the whole experience has been Komodo Edit, the free programming editor from ActiveState. This is available on Windows, Linux, and Mac platforms, has syntax support for a wide range of languages, and has an intuitive, tabbed interface that is well-executed and easy to use.

October 19, 2008

Zombie Six-Pack

Filed under: Observation — mark @ 7:58 pm

These things always emerge in groups (just like zombies), and this batch is perfect for the upcoming holiday:

First, I finally saw the Rodriguez/Tarantino zombie flick Grindhouse: Planet Terror, in my losing crusade to see all zombie movies. (Losing, because the flicks seem to be generating zombie-like clones of themselves lately, which makes it tough enough, but the whole idea of the reanimated dead is so disturbing to me that I can’t really handle more than one of these things a month.) Rose McGowan is very good. Rodriguez is one of my favorite directors and is very much on his game here. There is apparently some sort of industry rule now that Tom Savini has to make a cameo in every zombie flick ever.

Then, on Boing Boing, courtesy of Cory Doctorow, came a link to the Thunder Panda site’s offering of the Zombiefie Six paper cut-outs. These guys are cutting edge on 21st century monsters.

Of course, browsing upwards in Boing Boing, one also sees this old item about the nutters at Homeland Security getting all jiggy about a kid who does a creative writing assignment about a zombie outbreak at a high school. The story dates from 2005, but man … I am not a bit surprised.
