The 538 Riddler for March 22, 2019 asks us to simulate baseball using probabilities from a 19th-century dice game called Our National Ball Game:
1,1: double         2,2: strike     3,3: out at 1st    4,4: fly out
1,2: single         2,3: strike     3,4: out at 1st    4,5: fly out
1,3: single         2,4: strike     3,5: out at 1st    4,6: fly out
1,4: single         2,5: strike     3,6: out at 1st    5,5: double play
1,5: base on error  2,6: foul out                      5,6: triple
1,6: base on balls                                     6,6: home run
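The rules left some things unspecified; the following are my current choices (in an early version I made different choices that resulted in slightly more runs):
- On a hit, every runner advances as many bases as the batter, except that a runner on second also scores on a single.
- On an out at first or an error, all runners advance one base.
- A double play applies only if there is a runner on first: that runner is out along with the batter, and the other runners advance one base. Otherwise only the batter is out.
- On a fly out, a runner on third scores (unless the fly out is the third out); other runners hold.
- On a base on balls, only forced runners advance.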
I also made some choices about the implementation:
- Each event is represented by a one-character code:
  - K, O, o, f, D: strikeout, foul out, out at first, fly out, double play
  - 1, 2, 3, 4: single, double, triple, home run
  - E, B: error, base on balls
- The die roll 1,1 is a 1/36 event, whereas 1,2 is a 2/36 event, because it also represents (2, 1).
- Runners are kept as a list of occupied bases: runners = [1, 2] means runners on first and second.
- The function inning simulates a half inning and returns the number of runs scored.
- I want to be able to test inning by feeding it specific events, and I also want to generate random innings. So I'll make the interface be that I pass in an iterable of events. The function event_stream generates an endless stream of randomly sampled events.
- Python treats Booleans as integers in arithmetic, so for a runner on second (r = 2) when the event is a single (e = '1'), the expression r + int(e) + (r == 2) evaluates to 2 + 1 + 1 or 4, meaning the runner on second scores.
- I'll simulate a million innings and store the runs scored in each one in the list innings.
- To simulate a game, I sample 9 elements of innings and sum them.

%matplotlib inline
import matplotlib.pyplot as plt
import random

def event_stream(events='2111111EEBBOOooooooofffffD334', strike=7/36):
    "An iterator of random events. Defaults from `Our National Ball Game`."
    while True:
        yield 'K' if (random.random() < strike ** 3) else random.choice(events)

def inning(events=event_stream(), verbose=False) -> int:
    "Simulate a half inning based on events, and return number of runs scored."
    outs = runs = 0  # Inning starts with no outs and no runs,
    runners = []     # ... and with nobody on base
    for e in events:
        if verbose: print(f'{outs} outs, {runs} runs, event: {e}, runners: {runners}')
        # What happens to the batter?
        if   e in 'KOofD':  outs += 1          # Batter is out
        elif e in '1234EB': runners.append(0)  # Batter becomes a runner
        # What happens to the runners?
        if e == 'D' and 1 in runners:  # double play: runner on 1st out, others advance
            outs += 1
            runners = [r + 1 for r in runners if r != 1]
        elif e in 'oE':  # out at first or error: runners advance
            runners = [r + 1 for r in runners]
        elif e == 'f' and 3 in runners and outs < 3:  # fly out: runner on 3rd scores
            runners.remove(3)
            runs += 1
        elif e in '1234':  # single, double, triple, homer
            runners = [r + int(e) + (r == 2) for r in runners]
        elif e == 'B':  # base on balls: forced runners advance
            runners = [r + forced(runners, r) for r in runners]
        # See if inning is over, and if not, whether anyone scored
        if outs >= 3:
            return runs
        runs += sum(r >= 4 for r in runners)
        runners = [r for r in runners if r < 4]

def forced(runners, r) -> bool: return all(b in runners for b in range(r))
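As a sanity check on the defaults in event_stream, here is a minimal sketch that enumerates all 36 ordered rolls of two dice and counts the outcomes from the table. The names TABLE and roll_counts, and the code 'S' for a single strike, are my own for this illustration; everything else is transcribed from the table and from the default events string.

```python
from collections import Counter
from itertools import product

# Outcome of each unordered roll, transcribed from the dice table.
# Codes follow event_stream; 'S' (one strike) is a label used only in this sketch.
TABLE = {
    (1, 1): '2', (1, 2): '1', (1, 3): '1', (1, 4): '1', (1, 5): 'E', (1, 6): 'B',
    (2, 2): 'S', (2, 3): 'S', (2, 4): 'S', (2, 5): 'S', (2, 6): 'O',
    (3, 3): 'o', (3, 4): 'o', (3, 5): 'o', (3, 6): 'o',
    (4, 4): 'f', (4, 5): 'f', (4, 6): 'f',
    (5, 5): 'D', (5, 6): '3', (6, 6): '4',
}

def roll_counts(table=TABLE) -> Counter:
    "Count how many of the 36 ordered rolls of two dice produce each outcome."
    return Counter(table[tuple(sorted(roll))]
                   for roll in product(range(1, 7), repeat=2))

counts = roll_counts()
strikes = counts.pop('S')                          # rolls that are a single strike
default_events = '2111111EEBBOOooooooofffffD334'   # default from event_stream

print(strikes, sum(counts.values()))        # strike rolls, non-strike rolls
print(counts == Counter(default_events))    # do the non-strike rolls match the default?
print(f'{(strikes / 36) ** 3:.5f}')         # chance of three strike rolls in a row
```

The strike rolls account for the default strike=7/36, and the remaining rolls are exactly the characters of the default events string, each repeated once per roll that produces it.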
Let's peek at some random innings:
inning(verbose=True)
0 outs, 0 runs, event: E, runners: []
0 outs, 0 runs, event: 4, runners: [1]
0 outs, 2 runs, event: E, runners: []
0 outs, 2 runs, event: 1, runners: [1]
0 outs, 2 runs, event: f, runners: [2, 1]
1 outs, 2 runs, event: B, runners: [2, 1]
1 outs, 2 runs, event: 1, runners: [3, 2, 1]
1 outs, 4 runs, event: E, runners: [2, 1]
1 outs, 4 runs, event: o, runners: [3, 2, 1]
2 outs, 5 runs, event: o, runners: [3, 2]
5
inning(verbose=True)
0 outs, 0 runs, event: 1, runners: []
0 outs, 0 runs, event: B, runners: [1]
0 outs, 0 runs, event: O, runners: [2, 1]
1 outs, 0 runs, event: 1, runners: [2, 1]
1 outs, 1 runs, event: 3, runners: [2, 1]
1 outs, 3 runs, event: 1, runners: [3]
1 outs, 4 runs, event: f, runners: [1]
2 outs, 4 runs, event: o, runners: [1]
4
And we can feed in any events we want to test the code:
inning('2EBB1DB12f', verbose=True)
0 outs, 0 runs, event: 2, runners: []
0 outs, 0 runs, event: E, runners: [2]
0 outs, 0 runs, event: B, runners: [3, 1]
0 outs, 0 runs, event: B, runners: [3, 2, 1]
0 outs, 1 runs, event: 1, runners: [3, 2, 1]
0 outs, 3 runs, event: D, runners: [2, 1]
2 outs, 3 runs, event: B, runners: [3]
2 outs, 3 runs, event: 1, runners: [3, 1]
2 outs, 4 runs, event: 2, runners: [2, 1]
2 outs, 5 runs, event: f, runners: [3, 2]
5
That looks good.
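The runner movements in these traces can also be checked in isolation. The sketch below pulls the two advancement expressions out of inning; forced is copied from the code above, while advance_on_hit and advance_on_walk are hypothetical helper names introduced only for this illustration. As in inning, the batter is a runner on base 0.

```python
def advance_on_hit(runners, bases):
    "Advance every runner by `bases`; a runner on second gets one extra base."
    # (r == 2) is a bool, which Python adds as 0 or 1.
    return [r + bases + (r == 2) for r in runners]

def forced(runners, r) -> bool:
    "A runner on base r is forced if every base behind r (down to 0, the batter) is occupied."
    return all(b in runners for b in range(r))

def advance_on_walk(runners):
    "Advance only the forced runners, by one base each."
    return [r + forced(runners, r) for r in runners]

print(advance_on_hit([2, 1, 0], 1))   # [4, 2, 1]: the runner on second scores on a single
print(advance_on_walk([3, 1, 0]))     # [3, 2, 1]: the runner on third is not forced
print(advance_on_walk([3, 2, 1, 0]))  # [4, 3, 2, 1]: bases loaded, so everyone is forced
```

Any value of 4 or more in the result is a run, which inning counts and then removes from the list.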
Now, simulate a million innings, and then sample from them to simulate a million nine-inning games (for one team):
N = 1000000
innings = [inning() for _ in range(N)]
games = [sum(random.sample(innings, 9)) for _ in range(N)]
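One detail worth spelling out: random.sample draws without replacement, so a single game never uses the same simulated inning twice, although different games can share innings. With a pool of a million innings that should make very little difference. Here is a small sketch of the same idea on a toy pool; simulate_games, its seed parameter, and the pool values are all made up for illustration.

```python
import random

def simulate_games(innings, n_games, innings_per_game=9, seed=None):
    "Total runs for n_games games, each the sum of innings_per_game pool entries."
    rng = random.Random(seed)  # private generator, so a seed makes the result repeatable
    # rng.sample draws without replacement: no pool entry is used twice within a game.
    return [sum(rng.sample(innings, innings_per_game)) for _ in range(n_games)]

pool = [0, 0, 0, 0, 0, 0, 1, 1, 2, 4]   # toy pool of inning scores, not real results
toy_games = simulate_games(pool, n_games=5, seed=0)
print(toy_games)
```

Because this toy pool has only ten entries, every game total is the pool total minus the one inning left out, which makes the without-replacement behavior easy to see.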
Let's see histograms:
def hist(nums, title):
    "Plot a histogram."
    plt.hist(nums, ec='black', bins=max(nums)-min(nums)+1, align='left')
    plt.title(f'{title} Mean: {sum(nums)/len(nums):.3f}, Min: {min(nums)}, Max: {max(nums)}')
hist(innings, 'Runs per inning:')
hist(games, 'Runs per game:')
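When a plot is not available, the same information can be printed as text. This is a minimal sketch, not part of the original notebook: distribution and show are hypothetical helpers, and the sample list is made up purely to exercise them.

```python
from collections import Counter

def distribution(nums):
    "Map each distinct value to the fraction of nums equal to it, in sorted order."
    counts = Counter(nums)
    return {v: counts[v] / len(nums) for v in sorted(counts)}

def show(nums, title, width=40):
    "Print the same summary as the plot title, then one bar of '#' per value."
    print(f'{title} Mean: {sum(nums)/len(nums):.3f}, Min: {min(nums)}, Max: {max(nums)}')
    for v, p in distribution(nums).items():
        print(f'{v:3d} {p:6.1%} ' + '#' * round(p * width))

sample = [0, 0, 0, 0, 0, 1, 1, 2, 0, 4]   # made-up inning scores, for illustration only
show(sample, 'Runs per inning:')
```

Calling show(innings, 'Runs per inning:') or show(games, 'Runs per game:') would give a text counterpart to the two histograms.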