Mining Words

Our local Second Grade class sends back a variety of word puzzles to exercise the kids' verbal skills. A favorite is a (usually seasonal) word, from which a child is asked "How many different words can you make from the letters in this word?"

So, for example, the word "bread" will yield: red, are, ear, dare, dear, read, bead, bar, bard, etc. There are more (39 in all, actually, but we'll get to that). We have fairly verbal kids, who now and then come up with more words than the "official" target. No wonder ...


Word:

aadderdradrederredareear
eradaredearreaddbabbaddabbebed
debabedbadebeadbrbarbrabardbraddrab
rebbredbarebearbraebaredbeardbreaddebar

31 letter combos (25 - 1)


39 dictionary words found.

How to Solve It

How would you get a list of all of the right answers? First, you have to figure out how to ennumerate the complete list of letter subsets found in the "parent" word, and then you have to figure out what words (if any) can be formed by rearranging the letters in any one subset. To generate an initial list of all possible letter subsets from the "parent" word, we can employ a clever hack involving binary numbers.

"There are 10 kinds of people in this world: those who can deal in binary and those who can't."

- Frank Clarke

Any word you are going to find involves using some subset of the letters in the "parent" word. Said another way, in any given subset, a letter from the parent word is used (1) or it isn't (0). That means that any given subset of letters can be represented as a string of 1s and 0s according to whether the letter in the parent word was used or not. Consider the word "east":

# east
binary
subset
1 east
0001
t
2 east
0010
s
3 east
0011
st
4 east
0100
a
5 east
0101
at
6 east
0110
as
7 east
0111
ast
8 east
1000
e
9 east
1001
et
10 east
1010
es
11 east
1011
est
12 east
1100
ea
13 east
1101
eat
14 east
1110
eas
15 east
1111
east

Map the letters you use to a "1" and any that you don't use to a "0", and the letter combo you use suddenly has a unique numeric "signature". For a word N characters long, there must be 2N - 1 possible combinations of letters (you subtract one since one of the combos is all 0s ... no letters used ... so it can't ever help). For a four letter word like "east", that's 24 - 1 = 16 - 1 = 15 combos, and the binary form of each number from 1 to 15 can be used as a map for which letters are included in the subset.

Many "parent" words will have repeating letters, so some of these binary letter subsets will boil down to the same set of letters (just originating from different places in the "parent" word), so we steal a trick from the Jumble Solver, and sort the letters in each subset, and only keep the list of unique combos.

Finally, we recognize each subset may represent more than one word, depending on how the letters are arranged (for example, the subset of "e", "a", and "t" can generate words "eat", "ate", and "tea"). But this is just the Jumble problem we've solved elsewhere, so we use that code to get the set of matches from the dictionary. The dictionary I use has over 100,000 words, so we'll get a lot of suspect matches (abbreviations, names, etc.), but the resulting output is a pretty good master list.

The latest challenge to come home from the 2nd Grade was to find the 39 words in the word "ornaments". Using this script, we can find 399! Time to start talking about extra credit ...

 

 


 
Copyright 2006 Mark Van Dine, All Rights Reserved