February 22, 2006

Wrong Division

Filed under: Scripts, Puzzles — mark @ 4:21 pm

I am fascinated by web statistics … the information that you can pull out of the server logs that gives you some idea of the usage patterns on your site. After several years of working on PC software and its endless marketing wars over features you could never actually convince yourself that anybody cared about, an opportunity to build applications on a platform that actually recorded what got used and when was an intoxicating invitation.

So I often find myself digging down into my web stats to find out how (sometimes if) people use my Puzzle applications. I have learned a few things:
 
 

-   I am helping a Very Large number of people turn the tables on the Jumble. Seriously. Hundreds a day sometimes. This surprises me, since the page was really just intended to implement Jon Bentley’s nifty algorithm for unscrambling words … the Puzzle itself is perfectly suited to the playful side of the human mind … devoting algorithm design time to the problem is only useful if it helps to teach how to think through trickier problems.
-   Scramble Squares puzzles are Very Popular although almost Nobody wants to deal with a puzzle solver that requires them to do any work. Users either get their answer on the first or second try, or else they frantically tweak their initial entries twenty different ways, and then email me to ask if I have the answer to their puzzle. In other words, teach a man to fish, and he’ll sadly shake his head and then ask you for a fish.
-   Cryptograms. My Solver works with what the American Cryptogram Association (ACA) calls ‘aristocrats’, which are simple letter-substitution encodings of messages that otherwise maintain the word divisions and punctuation of the original message. In their bimonthly magazine “The Cryptogram” the ACA will publish 25 Aristocrats, and the Solver (based in PHP) will usually manage to solve about half of the set … the ACA cons are submitted by members, many of whom are well onto what makes for a difficult puzzle, and the Solver will bail on anything that threatens to consume too much processing time. Its Perl-based big brother will handle them all, sometimes with a little hand-holding, so let me know if you’re stumped.
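For the curious, here is a minimal sketch of the Jumble trick from the first item above. It assumes the Bentley algorithm in question is the anagram "signature" idea from Programming Pearls: sort the letters of every dictionary word, and any two words with the same sorted letters are anagrams of each other. The names `signature`, `build_index`, and `unscramble` and the tiny word list are made up for illustration; this is not the code behind the actual page.

```python
from collections import defaultdict

def signature(word):
    """Letters of the word in sorted order; anagrams share a signature."""
    return "".join(sorted(word.lower()))

def build_index(words):
    """Map each signature to the dictionary words that produce it."""
    index = defaultdict(list)
    for word in words:
        index[signature(word)].append(word)
    return index

def unscramble(jumble, index):
    """Return every dictionary word that is an anagram of the jumble."""
    return index.get(signature(jumble), [])

# Tiny stand-in word list (an assumption); a real solver would load a full dictionary.
WORDS = ["stop", "pots", "tops", "spot", "puzzle", "solver"]
INDEX = build_index(WORDS)

print(unscramble("otps", INDEX))  # ['stop', 'pots', 'tops', 'spot']
```

The index is built once, so each lookup costs one sort of the scrambled letters plus a dictionary probe.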

People using the Solver are often seized with the desire to feed it with a stream of hundreds of random characters from their keyboards (without spaces or other word divisions) looking for, I don’t know, messages from the Other Side expressed as a sort of systematically encoded Automatic Writing?

Others are fond of submitting single encoded words with the implicit question of (I guess) ‘What word does it encode?’ The explicit answer is ‘What word doesn’t it encode?’ … after accounting for word length and repeated characters, the answer will usually be any one of thousands of alternatives!
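To see why a single word gives so little away, here is a rough sketch of the "word length and repeated characters" test. It reduces each word to a pattern of first appearances, so any dictionary word with the same pattern is a legal decoding. The function names and the six-word list are assumptions for illustration, not the Solver's internals.

```python
def pattern(word):
    """Number each letter by the order in which it first appears."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in word.lower())

def candidates(cipher_word, words):
    """Dictionary words whose length and repeat structure match the cipher word."""
    target = pattern(cipher_word)
    return [w for w in words if pattern(w) == target]

# Tiny stand-in word list (an assumption); a real dictionary yields far more matches.
WORDS = ["there", "where", "these", "three", "theory", "which"]

print(candidates("gwdcd", WORDS))  # ['there', 'where', 'these']
```

Even with six words to choose from, half of them fit the one encoded word.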

Lately, someone has been sending in lots of ‘Patristocrats’. These puzzles are encoded like Aristos, but all punctuation and natural word divisions have been removed. For example, a quote from Douglas Adams, encoded as an Aristo:

gwdcd rp v gwdecq swriw pgvgdp gwvg rz dydc vxqjelq
lrpieydcp dhvigaq swvg gwd kxrydcpd rp zec vxl swq rg rp
wdcd, rg sraa rxpgvxgaq lrpvuudvc vxl jd cduavidl jq
petdgwrxb dydx tecd jrfvccd vxl rxdhuarivjad. gwdcd
rp vxegwdc gwdecq swriw pgvgdp gwvg gwrp wvp vacdvlq
wvuudxdl

The Solver will work through this almost instantly ("There is a theory which states that if ever anybody discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened."). But a Patristocrat version of the same puzzle lacks any punctuation and word divisions, and breaks the letters into uniform groups of five:

gwdcd rpvgw decqs wriwp gvgdp gwvgr zdydc vxqje
lqlrp ieydc pdhvi gaqsw vggwd kxryd cpdrp zecvx
lswqr grpwd cdrgs raarx pgvxg aqlrp vuudv cvxlj
dcdua vidlj qpetd gwrxb dydxt ecdjr fvccd vxlrx
dhuar ivjad gwdcd rpvxe gwdcg wdecq swriw pgvgd
pgwvg gwrpw vpvac dvlqw vuudx dlabc

The Solver will look at this a long time with the single computer-brain thought, "Man! That’s a lot of 5-letter words!"
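The conversion itself is trivial, which is part of the insult. A minimal sketch, assuming nothing more than "drop everything that is not a letter, then cut into fives" (the function name `to_patristocrat` is made up, and the last group is simply left short rather than padded):

```python
def to_patristocrat(ciphertext, group=5):
    """Strip spaces and punctuation, then regroup the letters in fixed-size blocks."""
    letters = "".join(ch for ch in ciphertext.lower() if ch.isalpha())
    blocks = [letters[i:i + group] for i in range(0, len(letters), group)]
    return " ".join(blocks)

print(to_patristocrat("gwdcd rp v gwdecq swriw pgvgdp gwvg"))
# gwdcd rpvgw decqs wriwp gvgdp gwvg
```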

Although there are crypto-wonks out there who will giggle at my naivete (weirdos), I have thought about this a lot and I just have not come up with a fast way to solve Patristocrats. If the message is long enough, basic letter-frequency analysis will get you most of the way there (figure out the first-, second-, and third-most-used letters and assign these to ‘e’, ‘t’, and ‘a’, and you’re off to the races), but otherwise the words do very little to reveal themselves.
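The frequency step, at least, is easy to sketch. The snippet below counts letters and pairs the three most common cipher letters with ‘e’, ‘t’, and ‘a’ in that order. The function name and the `ranked_plain` parameter are hypothetical, and on a short message the guess can easily be wrong; it is a starting point, not a solution.

```python
from collections import Counter

def frequency_guess(ciphertext, ranked_plain="eta"):
    """Pair the most frequent cipher letters with the most frequent plain letters."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    ranked_cipher = [ch for ch, _ in counts.most_common(len(ranked_plain))]
    return dict(zip(ranked_cipher, ranked_plain))

# Contrived input with unambiguous counts: d appears 4 times, g twice, w once.
print(frequency_guess("dddd gg w"))  # {'d': 'e', 'g': 't', 'w': 'a'}
```

Everything after those first few letters is where the missing word breaks hurt.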

February 20, 2006

Delenda est Carthago

Filed under: Observation — mark @ 10:14 am

I couldn’t help but think that a great opportunity was missed at the opening of the Winter games in Turin, Italy.

Although we were repeatedly reminded of the bizarre cultural atmosphere created by the current Empire of the USA (with teams from all countries entering to the classic musical compositions of “I Will Survive” and “YMCA” among others), I thought the Italians, who still hold the record for Longest Running Empire (winning Gold for the Roman, and having coached the Byzantine) could have taken more credit.

In particular, at the end of the opening parade, I would have favored seeing an elderly gentleman dressed as Cato the Elder, holding a banner that says, “And Carthage Must Be Destroyed.” Of course, it has been pointed out to me that poorly considered attempts at humor do not go over well in some parts of the world, and Tunisia may be one of those places.

February 1, 2006

A Real Search for Fake Data

Filed under: Analysis, Observation — mark @ 9:37 am

Add to your collection of nifty mathematical insights, Benford’s Law, which states that in a surprising diversity of data collections, the integer digits represented in that data will NOT be uniformly distributed. Or more exactly, they won’t be uniformly distributed “positionally” … that a “1” is far more likely to be the initial digit of a number in a data set than will be, say, a “9”.
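To put numbers on "far more likely": the usual statement of the law (not quoted in the article, so take it as my addition) is that a leading digit d turns up with probability log10(1 + 1/d). A quick sketch, with a made-up function name:

```python
import math

def benford_probability(d):
    """Expected share of numbers whose first digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    print(d, round(benford_probability(d), 3))
# 1 0.301
# 2 0.176
# 3 0.125
# 4 0.097
# 5 0.079
# 6 0.067
# 7 0.058
# 8 0.051
# 9 0.046
```

So a leading “1” should show up about 30% of the time, and a leading “9” less than 5%.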

Reg sent a pointer to an NYT article from 1998 that mentions this. (A mathematician friend notes what the article does not … that Benford’s law is generally held to be correct when the sample size is large enough.)

The most interesting practical application of such a law is in constructing tests for Fake Data (let all fillers-out of Expense Reports be forewarned) … that a data set could be quickly analyzed to see if the data used digits well outside a distribution that Benford’s Law might suggest.
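A rough sketch of what such a test might look like, assuming a plain chi-square comparison of observed first digits against the log10(1 + 1/d) distribution. The function names and both sample data sets are invented for illustration; a real audit tool would need far more care about sample size and about which kinds of data the law applies to.

```python
import math
from collections import Counter

def leading_digit(value):
    """First nonzero digit of a nonzero number, read from its decimal string."""
    return int(str(abs(value)).lstrip("0.")[0])

def benford_chi_square(values):
    """Chi-square statistic of observed first digits against Benford's Law."""
    digits = Counter(leading_digit(v) for v in values if v != 0)
    n = sum(digits.values())
    if n == 0:
        raise ValueError("need at least one nonzero value")
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (digits[d] - expected) ** 2 / expected
    return stat

# Powers of two are a classic Benford-like sequence; 100..999 has perfectly
# uniform first digits, much like amounts invented by someone "being random".
plausible = [2 ** k for k in range(1, 200)]
suspicious = list(range(100, 1000))

print(benford_chi_square(plausible))   # small
print(benford_chi_square(suspicious))  # very large
```

With nine digits there are eight degrees of freedom, so a statistic above roughly 15.5 is the conventional 5% signal that the digits do not look like Benford's.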

… or the corollary might be, with an understanding of Benford’s Law, we now have a market opportunity for tools that create accurate fake data that conforms to Benford’s Law plus or minus epsilon.

We’ll be rich.