Ghost¶

According to [Wikipedia](https://en.wikipedia.org/wiki/Ghost_(game);):

Ghost is a written or spoken word game in which players take turns adding letters to a growing word fragment, trying not to be the one to complete a valid word. Each fragment must be the beginning of an actual word, and usually some minimum is set on the length of a word that counts, such as three or four letters. The player who completes a word loses.

I'd like to create a program to allow any two players (human or computer) to play the game, and I'd like to figure out who wins if both players play optimally. The concepts I will need to define, and my implementation choices, are as follows:

Words: I will read a standard online word list, enable1, and make a set of all the words of sufficient length.
Fragment: a fragment is a str of letters, such as 'gho'.
Beginning: each word has a set of valid beginnings: for ghost it is {'', g, gh, gho, ghos, ghost}. "Prefix" is a synonym of "beginning".
Vocabulary: A Vocabulary object holds a set of all the words in a dictionary, as well as all the valid fragments (beginnings) of the words.
Player: The first player will be called player 0; the second player 1.
Play: A play is a new fragment formed by adding one letter to the end of the existing fragment.
Legal Play: A play that is a valid prefix of some word. enable1.legal_plays('gho') = {'ghos, 'ghou'}.
Strategy: A strategy is a function with signature strategy(vocab, fragment) -> play.
Game: play_game(vocab, *strategies) plays a game between two (or more) player strategies.

Vocabulary: Words, Fragments, Legal Plays, and `enable1`¶

Vocabulary(words) takes a collection of words as input, stores the words as a set, and also stores all the legal fragments of those words. legal_plays(fragments) gives a set of all plays that can be formed by adding a letter to create a legal word fragment. I also define the function words to split any string into component words.

In [1]:

class Vocabulary:
    "Holds a set of legal words and a set of legal prefix fragments of those words."
    def __init__(self, words, minlength=3):
        self.words = {word for word in words if len(word) >= minlength}
        self.fragments = {word[:i] for word in self.words for i in range(len(word) + 1)}
        
    def legal_plays(self, fragment): 
        return {fragment + L for L in alphabet} & self.fragments     
    
alphabet = 'abcdefghijklmnopqrstuvwxyz'
    
words = str.split # Function to split a str into words

Here is a small example with a vocabulary of just three words:

In [2]:

v = Vocabulary(words('game ghost ghoul'))

v.words

Out[2]:

{'game', 'ghost', 'ghoul'}

In [3]:

v.fragments

Out[3]:

{'', 'g', 'ga', 'gam', 'game', 'gh', 'gho', 'ghos', 'ghost', 'ghou', 'ghoul'}

And here is a large vocabulary, from a standard online Scrabble™ word list known as enable1:

In [4]:

! [ -e enable1.txt ] || curl -O http://norvig.com/ngrams/enable1.txt

In [5]:

enable1 = Vocabulary(words(open('enable1.txt').read()))

Let's explore enable1:

In [6]:

len(enable1.words), len(enable1.fragments), max(enable1.words, key=len)

Out[6]:

(172724, 387878, 'ethylenediaminetetraacetates')

In [7]:

enable1.legal_plays('gho')

Out[7]:

{'ghos', 'ghou'}

In [8]:

enable1.legal_plays('ew')

Out[8]:

{'ewe'}

In [9]:

enable1.legal_plays('th')

Out[9]:

{'tha', 'the', 'thi', 'tho', 'thr', 'thu', 'thw', 'thy'}

Players and Winners¶

The first player is 0 and the second player is 1. These names are convenient because:

During the course of the game, the player whose turn it is to play next is always the length of the current fragment mod 2.
When the game ends, the winning player is the length of the current fragment mod 2.

In [10]:

to_play = winner = lambda fragment: len(fragment) % 2

Who Wins?¶

Who wins a game if both players play rationally? To answer that, consider the general situation at some point during the game where a player is presented with a fragment. That player can win if either:

The fragment is a complete word (meaning the other player just formed the word, and thereby lost).
The fragment is not a legal fragment (meaning the other player made a mistake, and thereby lost).
At least one of the legal plays in this position puts the opponent in a position from which they cannot win.

The function win(vocab, fragment) implements this. It returns a winning fragment if there is one, otherwise False. In particular, it returns fragment if the current player has already won (because fragment is a word or illegal fragment), and it returns one of the legal plays if that play leads to a position from which the opponent cannot win.

In [11]:

def win(vocab, fragment=''):
    """Does the current player have a forced win? 
    If so, return a play (or the current fragment) that leads to a win."""
    if fragment in vocab.words or fragment not in vocab.fragments:
        return fragment
    for play in vocab.legal_plays(fragment):
        if not win(vocab, play):
            return play 
    return False

Let's test win to gain some confidence that we got it right:

In [12]:

# No winning play because all words have odd number of letters.
win(Vocabulary(words('cat camel gecko'))) 

Out[12]:

False

In [13]:

# 'g' is a winning play (but 'c' would be a loser)
win(Vocabulary(words('cat camel goat gerbil'))) 

Out[13]:

'g'

In [14]:

# No winning play; doomed to 'camel' or 'gar'
win(Vocabulary(words('cat camel goat gecko gerbil gar'))) 

Out[14]:

False

In [15]:

# 'g' wins because 'ga' can be answered with 'gan' and 'ge' with 'ge'
win(Vocabulary(words('cat camel goat gecko gerbil gar gannet'))) 

Out[15]:

'g'

TL;DR: The Answer¶

Can the first player win with the enable1 vocabulary?

In [16]:

win(enable1)

Out[16]:

False

No. The game is a win for the second player, not the first. This agrees with xkcd's Randall Monroe, who says "I hear if you use the Scrabble wordlist, it’s always a win for the second player."

But ... Wikipedia says that the minimum word length can be "three or four letters." In enable1 the limit was three; let's try again with a limit of four:

In [17]:

enable1_4 = Vocabulary(enable1.words, 4)

win(enable1_4)

Out[17]:

'n'

Yes. The first player can win with this vocabulary, by playing 'n' first (and there might be other first plays that also force a win). It makes sense that it is easier for the first player to win, because we've eliminated a bunch of three-letter words from the vocabulary, all of which are losers for the first player. So here's a good meta-strategy: Say "Hey, let's play a game of Ghost. We can use the enable1 word list. Would you like the limit to be 3 or 4 letters?" Then if your opponent says three (or four) you can say "OK, since you decided that, I'll decide to go second (or first)."

Playing the Game: Strategies¶

We define a strategy as a function that is given a vocabulary and a fragment as arguments, and returns a legal play. Below we define rational, a strategy that wins whenever it is possible to do so and plays randomly otherwise, and ask (a strategy factory that returns a strategy that, when called, will ask the named person to input a fragment).

In [18]:

import random

def rational(vocab, fragment): 
    "Select a play that makes opponent not win (if possible), otherwise a random play."
    return win(vocab, fragment) or random.choice(list(vocab.legal_plays(fragment)))

def ask(name):
    "Return a strategy that asks for the next letter."
    return lambda _, fragment: input('Player ' + name + ' given "' + fragment + '" plays? ')

Here is a function to play a game. You give it a vocabulary, two (or possibly more) strategies, and optionally a verbose keyword to say if you want a line printed for each play or not.

In [19]:

from itertools import cycle

def play(vocab, *strategies, verbose=False):
    "Return (winner, final_fragment) for a game of Ghost between these strategies."
    fragment = ''
    for (p, strategy) in cycle(enumerate(strategies)): # p is the player number
        play = strategy(vocab, fragment)
        if verbose:
            print('Player {}, given "{}", plays "{}".'.format(p, fragment, play))
        if play not in vocab.legal_plays(fragment):
            return (winner(fragment + '?'), play) # Player loses for making an illegal play
        elif play in vocab.words:
            return (winner(play), play)           # Player loses for making a word
        else:
            fragment = play                       # Keep playing

In [20]:

play(enable1, rational, rational) 

Out[20]:

(1, 'odyssey')

In [21]:

# Does player 1 win every time?

[play(enable1, rational, rational, verbose=False) for _ in range(20)]

Out[21]:

[(1, 'truancies'),
 (1, 'xebec'),
 (1, 'ods'),
 (1, 'vying'),
 (1, 'ewe'),
 (1, 'bwana'),
 (1, 'vying'),
 (1, 'knell'),
 (1, 'zloty'),
 (1, 'flirt'),
 (1, 'zlote'),
 (1, 'bwana'),
 (1, 'gjetost'),
 (1, 'pliancies'),
 (1, 'flounce'),
 (1, 'ufologist'),
 (1, 'flour'),
 (1, 'dweeb'),
 (1, 'flirt'),
 (1, 'ignatia')]

In [22]:

play(enable1_4, rational, rational)

Out[22]:

(0, 'nyctalopia')

In [23]:

play(enable1, ask('P'), rational)

Player P given "" plays? d
Player P given "dw" plays? dwa
Player P given "dwar" plays? dwarv
Player P given "dwarve" plays? dwarves

Out[23]:

(1, 'dwarves')

Minimizing Possible Outcomes¶

Now we know how to play perfectly, if we have a computer handy to execute the strategy. But can we summarize the strategy into a form that is small enough that a human can memorize it? I will define the function outcomes(vocab, fragment, player) to return a set of all the words that are possible outcomes of a game, where the opponent can use any strategy whatsoever, but player uses a strategy that is:

Rational: plays towards a forced win whenever there is one.
Exploitive: otherwise tries to give the opponent an opportunity to make a mistake that can be exploited.
Minimizing: within the above constraints, returns the smallest possible set of words.

In [24]:

import textwrap 

def outcomes(vocab, fragment, player):
    "The smallest set of outcome words, if player tries to win, and make the set small."
    if fragment in vocab.words:
        return {fragment}
    else:
        cases = [outcomes(vocab, play, player) for play in vocab.legal_plays(fragment)]
        if to_play(fragment) == player: # Player picks the top priority case
            return min(cases, key=lambda words: priority(words, player))
        else:                           # Oher player could pick anything
            return set.union(*cases)
                       
def priority(words, player):
    """Return (lossiness, number_of_words, total_number_of_letters),
    where lossiness is 0 if no losses, 1 if mixed losses/wins, 2 if all losses.
    The idea is to find the list of outcome words that minimizes this triple."""
    lossiness = (0 if all(winner(word) == player for word in words) else
                 1 if any(winner(word) == player for word in words) else
                 2)    
    return (lossiness, len(words), sum(map(len, words)))

def report_outcomes(vocab, player):
    "Find minimal outcomes for player; print info."
    oc = outcomes(vocab, '', player)
    winners = {w for w in oc if winner(w) == player}
    def fill(label, words): 
        if words:
            text = '{} ({}): {}'.format(label, len(words), ' '.join(sorted(words)))
            print(textwrap.fill(text))
    print('Outcomes ({}) for player {}:'.format(len(oc), player))
    fill('Losers', oc - winners)
    fill('Winners', winners)

Minimizing Outcomes for Player 0¶

Let's see what minimal set of words player 0 can force the game into (with both vocabularies):

In [25]:

report_outcomes(enable1, 0)

Outcomes (6) for player 0:
Losers (1): qursh
Winners (5): qaid qiviut qoph qurush qwerty

Interesting! There are only 6 words; it wouldn't be hard for a human to memorize these. Then, when you are playing as player 0, pick 'q' first, and then try to steer the game to one of the 5 words with an even number of letters. Unfortunately, one word, 'qursh' (a monetary unit of Saudi Arabia), has an odd number of letters, which means that if the opponent replies to 'q' with 'qu' and to 'qur' with 'qurs', then player 0 will lose. But if the opponent makes any other responses, player 0 will win.

In [26]:

report_outcomes(enable1_4, 0)

Outcomes (7) for player 0:
Winners (7): naan nene ngultrum nirvanic nolo null nyctalopia

Neat! Only 7 words, and the first player can always win by forcing the opponent to one of these words.

Minimizing Outcomes for Player 1¶

Since player 0 can pick any letter, the minimal outcomes set for player 1 must be at least 26 words. Let's see how much bigger it turns out to be.

With enable1 we already know that player 1 can force a win, so all the words in the outcomes set will have odd length:

In [27]:

report_outcomes(enable1, 1)

Outcomes (55) for player 1:
Winners (55): aah aal aargh aas bwana cwm drave dreck drink droit
druid dry ewe fjeld fjord gjetost hmm ihram jnana kwashiorkor llano
mho nth oquassa prawn prequel prill pro prurigo pry qua quell quiff
quomodo qursh rhamnus rheum rhizoid rho rhumb rhyolitic squoosh
tchotchke uhlan vying wrath wrest wrist wrong wrung wry xanthic
xanthin yttrium zucchetto

Memorize this list and you will never lose as player 1.

How about with the other vocabulary?

In [28]:

report_outcomes(enable1_4, 1)

Outcomes (85) for player 1:
Losers (4): hybris hyphen hyte ngultrum
Winners (81): aquiculture aquifer aquilegia aquiver bwana cnidarian
drave dreck drink droit druid dryness eschatologies eschatology
escheat eserine esker esophagus esplanade esquire esquiring essay
estuarine estuary esurience fjeld fjord gjetost hyaenic hydatid hyena
hyenine hyenoid hygeist hying hylozoism hylozoist hymen hyoid hypha
hyraces hyrax hyson ihram jnana kwashiorkor llano mbira ngwee oquassa
plaza plethoric plica plonk pluck plyer quaff quell quiff quomodo
qursh rhamnus rheum rhizoid rhomb rhumb rhyolitic squoosh tchotchke
uhlan vying wrath wrest wrist wrong wrung wryly xanthic xanthin
yttrium zucchetto

In this case there are 85 words, four of which are losses for player 1. But the other 81 words are wins, so with this strategy you'd have a good chance against an imperfect opponent.

SuperGhost¶

In the variant SuperGhost, players can add a letter to either the beginning or the end of a fragment, as long as this forms a fragment that is part of some word. As Wikipedia says, given the fragment era, a player might play bera or erad. I was thinking of SuperGhost when I made the design decision to encapsulate legal_plays as a method of Vocabulary, rather than as a separate function. Because I did that, I should be able to use all the existing code if I just make a new class, SuperVocabulary, that finds all fragments (i.e. infixes) rather than just the beginning fragments (i.e. prefixes), and if I change legal_plays to add letters to both ends.

In [29]:

class SuperVocabulary(Vocabulary):
    "Holds a set of legal words and a set of legal infix fragments of those words."
    def __init__(self, words, minlength=3):
        self.words = {word for word in words if len(word) >= minlength}
        self.fragments = {word[i:j] for word in self.words 
                                    for i in range(len(word)) 
                                    for j in range(i, len(word) + 1)}
        
    def legal_plays(self, fragment):
        "All plays (adding a letter to fragment) that form a valid infix."
        return {p for L in alphabet for p in (fragment + L, L + fragment)} & self.fragments

Now I will create SuperVocabulary objects for 3- and 4-letter versions of enable1, and check out how many fragments there are in each variant:

In [30]:

enable1_3super = SuperVocabulary(enable1.words, 3)
enable1_4super = SuperVocabulary(enable1.words, 4)

In [31]:

# Example legal plays
enable1_3super.legal_plays('crop')

Out[31]:

{'acrop',
 'cropa',
 'crope',
 'croph',
 'cropi',
 'cropl',
 'cropo',
 'cropp',
 'cropr',
 'crops',
 'cropt',
 'cropu',
 'cropy',
 'ecrop',
 'icrop',
 'ncrop',
 'rcrop',
 'tcrop'}

In [32]:

# Can the first player win in SuperGhost with 3-letter words?

win(enable1_3super)

Out[32]:

'u'

In [33]:

# How about with a 4-letter limit?

win(enable1_4super)

Out[33]:

'u'

The first player can win with or without three-letter words. And unless the first player is perfect, the rational strategy can do pretty well as seond player as well. Here is a sample game:

In [34]:

play(enable1_3super, ask('P'), rational)

Player P given "" plays? q
Player P given "cq" plays? cqu
Player P given "cque" plays? acque
Player P given "acques" plays? acquest

Out[34]:

(1, 'acquest')

I would like to give a concise summary of the strategy for SuperGhost, but my existing outcomes function won't do it. That's because it is not enough to know that a particular word results in a win; we have to know in what order the letters of the word are added. I'll leave it as an exercise to find a good way to summarize SuperGhost strategies.

SuperDuperGhost¶

In the variant SuperDuperGhost, players have an option to reverse the fragment before adding a letter to the beginning or end. As Wikipedia says, given the fragment era, a player might play bera, erad, nare, or aren. Wikipedia is not clear, but I interpret this as meaning that the fragment played must still be a fragment of a word (not a reversed fragment of a word). Again, all we need is a new subclass:

In [35]:

class SuperDuperVocabulary(SuperVocabulary):
    "Holds a set of legal words and a set of legal infix fragments of those words."
        
    def legal_plays(self, fragment):
        "All plays that form a valid infix; optionally reverse fragment first."
        def all4(f, L): return (f + L, f[::-1] + L, L + f, L + f[::-1])
        return {p for L in alphabet for p in all4(fragment, L)} & self.fragments

In [36]:

enable1_3duper = SuperDuperVocabulary(enable1.words)
enable1_4duper = SuperDuperVocabulary(enable1.words, 4)

In [37]:

# Example legal plays
enable1_3duper.legal_plays('crop')

Out[37]:

{'acrop',
 'cropa',
 'crope',
 'croph',
 'cropi',
 'cropl',
 'cropo',
 'cropp',
 'cropr',
 'crops',
 'cropt',
 'cropu',
 'cropy',
 'ecrop',
 'icrop',
 'iporc',
 'ncrop',
 'nporc',
 'porce',
 'porch',
 'porci',
 'porcu',
 'rcrop',
 'tcrop'}

Now we should check who wins. But I'm impateint: I tried win(enable1_3duper), waited a minute, and it still hadn't returned, so I interrupted that computation and threw in a lru_cache decorator; which stops win from wasting time repeating the same computation over and over. This brings the time down to about 2 seconds.

In [38]:

from functools import lru_cache

@lru_cache(None)
def win(vocab, fragment=''):
    """Does the current player have a forced win? 
    Return fragment if the player has already won, or return a play that forces a win."""
    if fragment in vocab.words or fragment not in vocab.fragments:
        return fragment
    for play in vocab.legal_plays(fragment):
        if not win(vocab, play):
            return play 
    return False

In [39]:

win(enable1_3duper)

Out[39]:

'p'

In [40]:

win(enable1_4duper)

Out[40]:

'p'

The first player can win with either vocabulary. Here's a sample game.

In [41]:

play(enable1_3duper, rational, rational, verbose=True)

Player 0, given "", plays "p".
Player 1, given "p", plays "pr".
Player 0, given "pr", plays "lpr".
Player 1, given "lpr", plays "rple".
Player 0, given "rple", plays "rplex".
Player 1, given "rplex", plays "rplexe".
Player 0, given "rplexe", plays "urplexe".
Player 1, given "urplexe", plays "urplexes".
Player 0, given "urplexes", plays "ourplexes".
Player 1, given "ourplexes", plays "fourplexes".

Out[41]:

(0, 'fourplexes')

Let's see how many fragments each vocabulary takes:

In [42]:

[len(v.fragments) 
 for v in [enable1, enable1_4, enable1_3super, enable1_4super, enable1_3duper, enable1_4duper]]

Out[42]:

[387878, 387844, 1076434, 1076431, 1076434, 1076431]

Summary¶

Here's a summary of what we have learned. (Note: the bold qursh means it is a losing word):

| Variant | Shortest | Winner | First Player Forced Outcomes | 2nd Outcomes | Fragments |:----: |:---: |:---: |:---: |:---: |---: | Ghost | 3 | Second | qaid qiviut qoph qursh qurush qwerty | 55 words | 387,878 | Ghost | 4 | First | naan nene ngultrum nirvanic nolo null nyctalopia | 85 words | 387,844 | Super | 3 | First | ? | ? | 1,076,434 | Super | 4 | First | ? | ? | 1,076,431 | SuperDuper | 3 | First| ? | ? | 1,076,434 | SuperDuper | 4 | First| ? | ? | 1,076,431

Further Work¶

Here are some additional ideas to play with:

Exploitation: What are some good strategies when you are not guaranteed to win, to exploit an imperfect human opponent? Can you steer the game so that you win if the opponent is unfamiliar with some obscure word(s)? You might need a file of word frequencies.
Security: A strategy function could cheat, and modify vocab.words, inserting or deleting some crucial words to ensure victory. Can you harden play (and/or change Vocabulary) to protect against that?
Pruning: Currently Vocabulary saves words and fragments that could never be reached in a game. For example, because 'the' is a word that ends the game, we could never reach 'them' or 'theme' or 'thermoluminescences'. Can you prune away these redundant words/fragments?
Multi-player: play(enable1, ask('A'), ask('B'), ask('C')) will play a three-player game. But rational (along with win and winner) would no longer work, since they assume there are exactly two players. Can you alter them to allow n players?
SuperGhost Summary: Can you summarize a SuperGhost or SuperDuperGhost strategy in a way that a human can memorize?
Xghost: In Xghost, a letter can be added anywhere, so from the fragment 'era' you could play 'erba'.
Spook: In Spook, letters can be rearranged before adding one, so from the fragment 'era' you could play 'bear'.