Making Cryptograms

Part I: Making Cryptograms



Using a computer to solve single-letter substitution codes quickly and reliably is a problem that fascinates me, and I have yet to find a wholly satisfying solution. Never mind that the basic mechanics of doing this well are hundreds of years old or that there is little utility involved (present-day encryption techniques leverage number theory and are effectively unbreakable).

This essay sums up my first assault on the problem and provides the first of a few Web tools I've built for the weekend codebreaker.


Unique letters used in this message: 18

Plaintext: a b c d e f g h i j k l m n o p q r s t u v w x y z
Encoded:   l n g j q _ o u m _ _ _ r f h d _ p e y s _ _ c i _

Encoded Message:
yidq l rqeeloq yh qfghjq mf yume nhc, yuqf dpqee yuq "ghjq my!" nsyyhf.

The Puzzle:

There are all sorts of cryptograms (coded messages), but the sort I'm referring to in this article is the single-letter substitution code. Each letter of the alphabet is paired with another letter (or symbol), chosen at random, which is substituted for the actual letter when the message is encoded. In trying to build an algorithm that can reliably solve such problems, I always assume one-to-one encoding (meaning one, and only one, encoded character represents each plaintext letter), and I encode messages using only letters of the alphabet. Some puzzle makers observe the rule that a letter is never encoded as itself, but I find this rule a little superfluous.
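The key-making step described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the form on this page: the names `make_key` and `encode` and the `allow_self` switch are my own, and lowercasing the message is an assumption made to keep the sketch short.

```python
import random
import string

def make_key(rng=None, allow_self=True):
    """Return a random one-to-one mapping of plaintext letters to code letters."""
    rng = rng or random.Random()
    letters = list(string.ascii_lowercase)
    while True:
        shuffled = letters[:]
        rng.shuffle(shuffled)
        key = dict(zip(letters, shuffled))
        # Optionally reject any key in which a letter encodes to itself.
        if allow_self or all(p != c for p, c in key.items()):
            return key

def encode(message, key):
    """Substitute each letter; spaces, digits, and punctuation pass through."""
    return "".join(key.get(ch, ch) for ch in message.lower())

key = make_key(random.Random(0), allow_self=False)
print(encode("code it!", key))
```

Because the key is a shuffle of the alphabet, it is one-to-one by construction. The "never itself" rule is handled by simply reshuffling until no letter maps to itself, which usually takes only a few tries.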

The American Cryptogram Association calls this sort of puzzle an "aristocrat", in which the word boundaries are the same as in the decrypted message. (In a "patristocrat", the characters are presented in even groups of five, and it is a big challenge to figure out where words start and end.)
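To make the difference concrete, here is a small Python sketch that turns an aristocrat-style ciphertext into patristocrat-style groups of five. The function name `to_patristocrat` is hypothetical, and dropping punctuation and leaving a short final group are my assumptions about the layout.

```python
def to_patristocrat(ciphertext, group=5):
    """Drop everything but letters, then regroup them in blocks of `group`."""
    letters = [ch for ch in ciphertext.lower() if ch.isalpha()]
    blocks = ["".join(letters[i:i + group]) for i in range(0, len(letters), group)]
    return " ".join(blocks)

print(to_patristocrat('yuqf dpqee yuq "ghjq my!" nsyyhf.'))
# prints: yuqfd pqeey uqghj qmyns yyhf
```

The letters are untouched; only the word boundaries are gone, which is exactly what makes the patristocrat harder.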

In the form above, I provide the means to enter a plaintext message and then create a random puzzle key and encrypted version of that message.

If you know the "key", or mapping of the alphabet used to create it, decrypting the message is easy. The popular newspaper cryptogram puzzles want you to deduce the key by figuring it out from patterns in the words or from letter frequency.
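Decrypting with a known key amounts to reading the mapping backwards. The Python sketch below uses the key and encoded message from the example form on this page; the names `EXAMPLE_KEY` and `decode` are mine, and the key only covers the 18 letters that the form shows as used.

```python
# Plaintext -> encoded pairs copied from the example key on this page.
EXAMPLE_KEY = dict(zip("abcdeghimnoprstuxy", "lngjqoumrfhdpeysci"))

def decode(ciphertext, key):
    """Invert the key, then substitute; unknown characters pass through."""
    inverse = {code: plain for plain, code in key.items()}
    return "".join(inverse.get(ch, ch) for ch in ciphertext)

encoded = 'yidq l rqeeloq yh qfghjq mf yume nhc, yuqf dpqee yuq "ghjq my!" nsyyhf.'
print(decode(encoded, EXAMPLE_KEY))
# prints: type a message to encode in this box, then press the "code it!" button.
```

Inverting the dictionary is only safe because the encoding is one-to-one; if two plaintext letters shared a code letter, one of them would be lost.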

One famous account of decrypting such a code occurs in Edgar Allan Poe's short story, "The Gold-Bug", in which the protagonist, William Legrand, must decode a secret message in order to find a hidden treasure. (Legrand's message used a different symbol to mark word divisions, and it employed an array of typesetter's punctuation which I doubt the purported author, Captain Kidd, would have used, but the overall idea of the secret message is the same as a single-letter substitution code.)

Legrand solves this puzzle by analyzing the relative frequency of the symbols in the message and trying to match them to the distribution of letters in common written English. (In the story, Legrand lists the letters by frequency in common usage as e (most frequent), followed by a, o, i, d, h, n, r, s, t, u, y, c, f, g, l, m, w, b, k, p, q, x, and z. Apparently the letters 'j' and 'v' were not popular in the mid-19th century, at least when discussing late 17th-century pirate communiqués.) With a little success at this, he makes a few more educated guesses, such as working out the codes for 't' and 'h' by looking for possible instances of the word 'the' in the message, based on his frequency-based guess for which symbol represents 'e'.
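Legrand's first two moves can be roughed out in Python. This is a sketch of the idea only, not a solver: the function names are hypothetical, the letter order is the one quoted from the story, and I assume the ciphertext uses letters with word breaks intact (as in the example message on this page) instead of the story's symbols.

```python
import re
from collections import Counter

# Letter order given by Legrand in the story, most frequent first.
LEGRAND_ORDER = "eaoidhnrstuycfglmwbkpqxz"

def frequency_guess(ciphertext, order=LEGRAND_ORDER):
    """Pair the most common code letters with the most common English letters."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    ranked = [sym for sym, _ in counts.most_common()]
    return dict(zip(ranked, order))

def guess_the(ciphertext, e_symbol):
    """Return the most common three-letter word ending in the guessed 'e', if any."""
    words = re.findall(r"[a-z]+", ciphertext.lower())
    candidates = Counter(w for w in words if len(w) == 3 and w[2] == e_symbol)
    return candidates.most_common(1)[0][0] if candidates else None

encoded = 'yidq l rqeeloq yh qfghjq mf yume nhc, yuqf dpqee yuq "ghjq my!" nsyyhf.'
guess = frequency_guess(encoded)
e_symbol = next(sym for sym, plain in guess.items() if plain == "e")
print(e_symbol, guess_the(encoded, e_symbol))
# prints: q yuq
```

On this short message the frequency match gets 'e' right and the 'the' test then suggests 'y' for 't' and 'u' for 'h'. The rest of the frequency pairing is unreliable at this length, which is where the educated guessing has to take over.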

The critical element of this approach is, however, a human brain, not only able to conduct this very practical sort of rule-based analysis, but also armed with a well-practiced sense of word usage, common syntax, and idiom. My initial goal was instead to find a way to solve a cryptogram programmatically, without a rule base standing in for the human intuition that aids the solution of such puzzles.

In the next 'Cracking the Code' installment, we'll look at one approach for trying to solve a puzzle like this.