Notebook

Peter Norvig
January 2020

Boggle¶

Boggle Board

The word game Boggle (rules here) is a fun game to play with friends, but not so fun to play against a computer. It is all too simple for a computer to find all the valid words in a Boggle board (such as the one shown above) in less than a second. We'll see how to do that as a straightforward programming exercise. Then we'll consider a more challenging problem:

Inverse Boggle¶

The goal of Inverse Boggle is to find an arrangement of letters on a board that scores the most points. It is called Inverse Boggle because, instead of "give me a board and I'll find the words" it is "give me the word list and I'll find a board with lots of words."

It is not feasible to try all possible boards ($26^{5 \times 5} \approx 10^{35}$ possible $5 \times 5$ boards), so we will use hill-climbing, a heuristic search technique that does not guarantee finding an optimal solution.

The Boggle Board¶

I'll represent a board as a dict where the keys are (x, y) coordinates of squares, and the values are individual letters (except that QU occupies a single square). A board will also hold:

board.squares: a list of squares in row-major order.
board.neighbors: a dict of {square: neighbors}.
board.format: a string format method to print the board nicely using __repr__.

In [1]:

import random
import itertools

class Board(dict):
    """A board is a dict of {(x, y): L}."""
    def __init__(self, letters=None):
        letters = [('QU' if L == 'Q' else L) for L in letters if 'A' <= L <= 'Z']
        n = int(len(letters) ** 0.5)
        self.squares   = [(x, y) for y in range(n) for x in range(n)]
        self.neighbors = {s: neighbors(s, n) for s in self.squares}
        self.format    = ('\n'.join(['{:2s}' * n] * n)).format
        self.update(zip(self.squares, letters))
        
    def __repr__(self): 
        return self.format(*(self[s].capitalize() for s in self.squares))
    
def neighbors(square, n) -> list:
    """All the squares that neighbor a square."""
    (x, y) = square
    return [(x+dx, y+dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if 0 <= x+dx < n and 0 <= y+dy < n and not (dx == dy == 0)]

In [2]:

board = Board('PUZZL WORDE BOGGL SEARC FINDH')
board

Out[2]:

P U Z Z L 
W O R D E 
B O G G L 
S E A R C 
F I N D H

The word list and scoring¶

We'll read in a word list, convert to uppercase, and keep all the words that are three letters or more. The variable WORDS holds the set of valid words. The function total_score computes the total score of a set of words. It does not take a board as an argument, and so it does not check if the words can actually be made on the board. It does check that the words are in the WORDS list of valid words.

In [3]:

! [ -e enable1.txt ] || curl -O http://norvig.com/ngrams/enable1.txt
! wc -w enable1.txt

  172820 enable1.txt

In [4]:

def total_score(found, scores=[0, 0, 0, 1, 1, 2, 3, 5] + [11] * 99) -> int:
    """The total score for the words found, according to the rules."""
    return sum([scores[len(w)] for w in found & WORDS])

WORDS = {w for w in open('enable1.txt').read().upper().split() if len(w) >= 3}
len(WORDS)

Out[4]:

Finding all the words in a board¶

The strategy for finding words:

A word could start at any of the squares on the board, so consider each one.
At each square, form a path consisting of just that square (for example, the top left square forms the path [(0, 0)]) and a prefix string consisting of the letters visited in the path; for this one-square path on board (the one with PUZZLE in it), the prefix string would be 'P'. On a 5 by 5 board we would have 25 initial paths of length one.
Now continue each path:
- If the prefix string is a word, record it: add it to the set found.
- If the prefix string is a prefix of some word in the dictionary, continue the path by visiting each neighboring square that has not already been visited in the path. For example, in board, the path [(0, 0)] with prefix 'P' would be continued to three neighbors:
  - [(0, 0), (1, 0)], 'PU': here 'PU' is a prefix of a word so continue this path.
  - [(0, 0), (0, 1)], 'PW': here 'PW' is not a prefix of any word so do not continue this path.
  - [(0, 0), (1, 1)], 'PO': here 'PO' is a prefix of a word so continue this path.
    - One continuation of 'PO' is as follows:
    - [(0, 0), (1, 1), (0, 1)], 'POW': here 'POW' is both a word and a prefix, so record it and continue.
We can precompute the set of all PREFIXES of all words in the word list.

In [5]:

def word_prefixes(words) -> set:
    "The set of all non-empty, non-full-word prefixes of each word in a word list."
    return {word[:i] for word in words for i in range(1, len(word))}

PREFIXES = word_prefixes(WORDS) # Precompute this once.

def find_words(board, words=WORDS, prefixes=PREFIXES):
    """Find all words in a Boggle board by recursively continuing paths that are prefixes of words."""
    found = set() # Accumulate the words found in the set `found`
    def continue_path(path, prefix):
        if prefix in words:
            found.add(prefix)
        if prefix in prefixes:
            for s in board.neighbors[path[-1]]:
                if s not in path:
                    continue_path(path + [s], prefix + board[s])
    # Body of find_words: Start paths from each of the squares on the board
    for s in board.squares:
        continue_path([s], board[s])
    return found

How many words can we find? And how long does it take?

In [6]:

%time len(find_words(board))

CPU times: user 4.39 ms, sys: 43 µs, total: 4.43 ms
Wall time: 4.5 ms

Out[6]:

Can we see the words and summarize them?

In [7]:

import textwrap

def report(board, verbose=True):
    found = find_words(board, WORDS, PREFIXES)
    if verbose:
        print(f'{len(found)} words found:') 
        print(textwrap.fill(' '.join(sorted(found)), width=90))
    print(f'Score of {total_score(found)} for {len(found)} words on board:\n{board}')
    
report(board)

252 words found:
AGE AGED AGES AGGRO AGGROS AGO AIN AIS AND ANE ANES ANI ANIS ANISE ARC ARCH ARGLE ARGLED
BEAD BEAGLE BEAN BEAR BEARD BEG BEGAN BEGGAR BEGGED BEGROAN BEN BEND BOA BOAR BOARD BOG
BOGAN BOGGED BOGGLE BOGGLED BOO BOOR BOOS BOP BORDEL BOS BOURG BOW CRAG CRAGGED CRANE
CRANES DAG DAGGLE DAGGLED DAGO DAGOES DAGOS DAIS DARN DEGAGE DEL DRAG DRAGGED DRAGGLE
DRAGGLED DRAIN DROOP DROP EAGLE EAR EARL EARN EDGE EDGES EFS EGAD EGG EGGAR EGGED EGO EGOS
ELD END ENDARCH ENRAGE ENRAGED EOSIN FEAR FEN FENAGLE FENAGLED FEND FIAR FIE FIEND FIN
FINAGLE FINAGLED FIND FINE FINES GAD GAE GAEN GAES GAG GAGE GAGED GAGES GAIN GAN GANE
GANEF GANEFS GAR GARGLE GARGLED GARNI GEAR GED GEL GELD GEN GLED GOA GOAD GOB GOBO GOBOES
GOBOS GOBS GOES GOO GOOP GOOS GOOSE GOR GORGE GORGED GOURD GOURDE GRAD GRAIN GRAN GRAND
GROAN GROG GROUP GROW IFS INARCH LED LEDGE LEDGES LEG LEZ NAE NAG NAGGED NAIF NAIFS NAOS
NARC NARD NEAR NEB NEBS NEIF NEIFS OAR OBE OBES OBOE OBOES OES ORGAN ORGANISE ORZO OSE OUR
POOR POROSE POUR POW PUR PURGE PURGED PURGES PUZZLE PUZZLED RAD RAG RAGE RAGED RAGES
RAGGED RAGGLE RAIN RAISE RAN RAND RANI RANIS ROAD ROAN ROAR ROB ROBE ROBES ROBS ROE ROES
ROOSE ROSE ROSIN ROUP ROW SEA SEAR SEARCH SEG SEGGAR SEGO SEI SEIF SEN SEND SIN SINE SOAR
SOB SOGGED SORD SORGO SOW UPO URD URGE URGED URGES WOAD WOE WOES WOG WOO WOOS WOP WORD WOS
WURZEL ZED ZOO ZOOS
Score of 449 for 252 words on board:
P U Z Z L 
W O R D E 
B O G G L 
S E A R C 
F I N D H

Let's test that the Q works, and that a 4x4 board works:

In [8]:

report(Board('QEST ANSI XWEO YRSN'))

125 words found:
ANE ANES ANEW ANSWER ANSWERS AWE AWES AWN AWNS AWRY ENOSIS ENS EON EONS ERS ESES ESS ESSES
ION IONS ITS NAE NAW NEIST NENE NEON NEONS NESS NESSES NEST NESTS NEW NEWNESS NEWS NOES
NOESIS NOISE NOISES NOS NOSE NOSES NOSIER OES ONE ONENESS ONERY ONES ONS OSE OSES OSIER
OSIERS QUA QUEAN QUEANS QUEST QUESTION QUESTIONER QUESTIONERS QUESTIONS QUESTS REI REIS
RENEST RENESTS RES RESIST REST RESTS REWAN REWAX SEA SEI SEIS SEISE SEISES SEN SENE SENSE
SENSES SER SERS SESSION SEW SEWAN SEWANS SEWN SEWS SIS SISES SIT SITS SNAW SNAWS SON SONE
SONES SONS SOS STIES SWAN SWANS TIE TIER TIERS TIES TIS WAE WAENESS WAES WAN WANE WANES
WANS WAX WAXY WEN WENS WEST WESTS WREN WRENS WREST WRESTS WRY
Score of 245 for 125 words on board:
QuE S T 
A N S I 
X W E O 
Y R S N

Inverse Boggle¶

I'll tackle the Inverse Boggle problem with a hill-climbing approach:

Start with some board.
Make a random change to some letter on the board.
Evaluate the score of the new board; if it is the best score so far, keep it.
If not, revert the change.
Repeat for a set number of iterations

In [9]:

def boggle_hill_climbing(board, words=WORDS, prefixes=PREFIXES, repeat=500):
    """Solve Inverse Boggle by hill-climbing: find a high-scoring board by
    starting with one and changing it."""
    board = Board(board[s] for s in board.squares) # Copy board, so we don't mutate original
    best_score = total_score(find_words(board, words, prefixes))
    for _ in range(repeat):
        s, old_letter = mutate_boggle(board)
        new_score = total_score(find_words(board, words, prefixes))
        if new_score >= best_score:
            best_score = new_score
        else:
            board[s] = old_letter  # Change back
    return board

# The distribution of letters in the 16-cube version of the game
LETTERS = 3 * 'AABCDEEEGHIILMNOOPRSTUY' + 'AADEFFIJKKLLNNQRSSTTUVVWWXZ'

def mutate_boggle(board, letters=LETTERS):
    """Make a random change to one letter in the board"""
    s = random.choice(board.squares)
    old_letter = board[s]
    board[s] = random.choice(letters)
    return s, old_letter

def random_board(n=5, letters=LETTERS) -> Board:
    """Return a random Boggle board."""
    return Board(random.sample(letters, n * n))

Let's generate a random board and see if we can improve it:

In [10]:

board2 = random_board(5)
report(board2)

177 words found:
ABIOTIC ABO ALE ALOOF ALT ALTO ATT ATTIC AUK BABE BABOO BABOOL BABU BAKE BAL BALE BALK BAT
BATT BAUBEE BEE BEEP BIO BIOTIC BIPOD BOA BOAT BOD BOLA BOLE BOLT BOO BOOT BOP BOT BOTA
BOTT BOTTLE BOY BUB BUBAL BUBALE BUBO BUOY CITOLA CITOLE DIOL DIT DITA DITTO DOPY DOT
DOTTLE EAT EAU ELK FIB FOB FOOL FOOT FOOTBOY FOOTLE FOP HEP HIP HOB HOBO HOE HOOD HOOF
HOOP HOOT HOP HOPE HOY HYP HYPO IODIC IOTA KAB KAE KALE KAT KEA KELOID KLOOF KUE LAB LAKE
LAT LATI LEA LEAK LEK LEKU LOB LOBO LOO LOOF LOOP LOOPY LOOT LOT LOTA LOTI LOTIC LOTTO
LOUP LOUPE OAK OAT OBI OBOE OBOL OBOLE ODIC OFT OLE OLEA OOH OOT OOTID OPE OTIC OTTO OUPH
OUPHE PEE PEH PHI PIBAL POD POH POI POOD POOF POOH POOL POOP POT POTBOY POTTLE POTTO PUB
PUKE TAB TABOO TABU TAE TAEL TAKE TALE TALK TAO TAU TAUPE TIC TIT TITLE TOD TOIT TOLA TOLE
TOO TOOL TOOT TOOTLE TOP TOPI TOT TOTAL TOUPEE UKE UPO YIP YOB YOU
Score of 234 for 177 words on board:
C I T L E 
D T O A K 
F O B U B 
P I O P E 
I Y H E U

In [11]:

report(boggle_hill_climbing(board2), False)

Score of 5782 for 1577 words on board:
V I S E N 
D E T T S 
R A L A R 
S I M E C 
E D N S H

Impressive! We got roughly a ten-fold improvement in score after 500 repetitions. We can do the same with our original board:

In [12]:

report(board)

252 words found:
AGE AGED AGES AGGRO AGGROS AGO AIN AIS AND ANE ANES ANI ANIS ANISE ARC ARCH ARGLE ARGLED
BEAD BEAGLE BEAN BEAR BEARD BEG BEGAN BEGGAR BEGGED BEGROAN BEN BEND BOA BOAR BOARD BOG
BOGAN BOGGED BOGGLE BOGGLED BOO BOOR BOOS BOP BORDEL BOS BOURG BOW CRAG CRAGGED CRANE
CRANES DAG DAGGLE DAGGLED DAGO DAGOES DAGOS DAIS DARN DEGAGE DEL DRAG DRAGGED DRAGGLE
DRAGGLED DRAIN DROOP DROP EAGLE EAR EARL EARN EDGE EDGES EFS EGAD EGG EGGAR EGGED EGO EGOS
ELD END ENDARCH ENRAGE ENRAGED EOSIN FEAR FEN FENAGLE FENAGLED FEND FIAR FIE FIEND FIN
FINAGLE FINAGLED FIND FINE FINES GAD GAE GAEN GAES GAG GAGE GAGED GAGES GAIN GAN GANE
GANEF GANEFS GAR GARGLE GARGLED GARNI GEAR GED GEL GELD GEN GLED GOA GOAD GOB GOBO GOBOES
GOBOS GOBS GOES GOO GOOP GOOS GOOSE GOR GORGE GORGED GOURD GOURDE GRAD GRAIN GRAN GRAND
GROAN GROG GROUP GROW IFS INARCH LED LEDGE LEDGES LEG LEZ NAE NAG NAGGED NAIF NAIFS NAOS
NARC NARD NEAR NEB NEBS NEIF NEIFS OAR OBE OBES OBOE OBOES OES ORGAN ORGANISE ORZO OSE OUR
POOR POROSE POUR POW PUR PURGE PURGED PURGES PUZZLE PUZZLED RAD RAG RAGE RAGED RAGES
RAGGED RAGGLE RAIN RAISE RAN RAND RANI RANIS ROAD ROAN ROAR ROB ROBE ROBES ROBS ROE ROES
ROOSE ROSE ROSIN ROUP ROW SEA SEAR SEARCH SEG SEGGAR SEGO SEI SEIF SEN SEND SIN SINE SOAR
SOB SOGGED SORD SORGO SOW UPO URD URGE URGED URGES WOAD WOE WOES WOG WOO WOOS WOP WORD WOS
WURZEL ZED ZOO ZOOS
Score of 449 for 252 words on board:
P U Z Z L 
W O R D E 
B O G G L 
S E A R C 
F I N D H

In [13]:

report(boggle_hill_climbing(board), False)  

Score of 5690 for 1511 words on board:
L P C A P 
N A R O M 
D E T I L 
S E S E N 
R T N R G

Again, roughly a ten-fold improvement.

Now, let's start from a very high-scoring board, identified by Justin Boyan in his Ph.D. thesis, and see if we can improve it:

In [14]:

boyan = Board('RSTCS DEIAE GNLRP EATES MSSID')
report(boyan, False)

Score of 10112 for 2290 words on board:
R S T C S 
D E I A E 
G N L R P 
E A T E S 
M S S I D

In [15]:

report(boggle_hill_climbing(boyan), False)

Score of 10112 for 2290 words on board:
R S T C S 
D E I A E 
G N L R P 
E A T E S 
M S S I D

Sadly, we were not able to make an improvement in 500 repetitions. But that certainly is no guarantee that boyan is the best possible board. Here are some things to try to find a better board; maybe you can implement some of them, or try some ideas of your own:

Genetic algorithms: We used mutation of a single board, but we could also consider crossover where we keep a pool of boards and take the first half of one board and combine it with the second half of another.
Swaps: We changed one letter at a time. But maybe there is no change of one letter that will improve a board, but there is a change involving two squares, either swapping them or mutating both of them.
Incremental score calculation: We modified just one square and then tried to find all the words from scratch. Would it be faster to keep track of which squares contributed to which words, and only re-do the calculations for the one changed square? This would probably involve infixes of words rather than prefixes. Perhaps by keeping track of what each square contributes, we can make a better choice of which square to mutate.
Random restarts: When is it best to continue searching from the current board, versus starting over from a new board?