In 1637, Pierre de Fermat wrote in the margin of a book that he had a proof of his famous "Last Theorem":
If $A^n + B^n = C^n$,
where $A, B, C, n$ are positive integers
then $n \le 2$.
Centuries passed before Andrew Beal, a businessman and amateur mathematician, made his conjecture in 1993:
If $A^x + B^y = C^z$,
where $A, B, C, x, y, z$ are positive integers
and $x, y, z$ are all greater than $2$,
then $A, B$ and $C$ must have a common prime factor.
Andrew Wiles proved Fermat's theorem in 1995, but Beal's conjecture remains unproved, and Beal has offered one million dollars for a proof or disproof. I don't have the mathematical skills of Wiles, so I could never find a proof, but I can write a program to search for counterexamples. I first wrote that program in 2000, and [my name got associated](https://www.google.com/webhp?#q=beal conjecture) with Beal's Conjecture, which means I get a lot of emails with purported proofs or counterexamples (many asking how they can collect their prize money). So far, all the emails have been wrong. This notebook catalogs some of the more common errors, updates my 2000 program, and introduces this tool for verifying counterexamples:
Online Beal Counterexample Checker |
---|
A proof must show that there are no examples that satisfy the conditions. A common error is to show how a certain pattern generates an infinite number of $(A, x, B, y, C, z)$ examples, and that the conjecture holds for this entire infinite collection. But that's not good enough, unless you can also prove that the conjecture holds for every other possible pattern.
It is valid to use proof by contradiction: assume the conjecture is false, and show that this leads to a contradiction. It is not valid to use proof by circular reasoning: assume the conjecture is true, put in some irrelevant steps, and show that it follows that the conjecture is true.
A valid counterexample needs to satisfy all four conditions—don't leave one out.
One correspondent claimed that $27^4 + 162 ^ 3 = 9 ^ 7$ was a solution, because the first three conditions hold, and the common factor is 9, which isn't a prime. But of course, if $A, B, C$ have 9 as a common factor, then they also have 3, and 3 is prime. "No common prime factor" means the same thing as "no common factor greater than 1."
Another claimed that $2^3+2^3=2^4$ was a counterexample, because all the bases are 2, which is prime, and prime numbers have no prime factors. But that's not true; a prime number has itself as a factor.
A creative person offered $ 1359072^4 - 940896^4 = 137998080^3$, which fails both because $ 3^3 2^5 11^2 $ is a common factor, and because it has a subtraction rather than an addition (although, as Julius Jacobsen pointed out, it could be rewritten as $ 137998080^3 + 940896^4 = 1359072^4 $).
Mustafa Pehlivan came up with an example involving 76-million-digit numbers, which took some work to prove wrong (using modular arithmetic).
Another Beal fan started by saying "Let $C = 43$ and $z = 3$. Since $43 = 21 + 22$, we have $43^3 = (21^3 + 22^3)$." But of course $(a + b)^3 \ne (a^3 + b^3)$. This fallacy is called the freshman's dream (although I remember having different dreams as a freshman).
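The conditions discussed above can be checked mechanically. The following is a minimal sketch, not the online checker itself; the function name `is_beal_counterexample` is an assumption made for illustration. It insists on exact Python integers, so floats are rejected outright.

```python
from math import gcd

def is_beal_counterexample(A, x, B, y, C, z):
    """Return True only if all four conditions hold, using exact integer arithmetic.
    (Hypothetical helper, written for illustration.)"""
    values = (A, x, B, y, C, z)
    # Condition: everything is a positive integer (floats such as 3.0 are rejected).
    if not all(isinstance(v, int) and v > 0 for v in values):
        return False
    exponents_ok = min(x, y, z) > 2            # all exponents greater than 2
    equation_ok = A ** x + B ** y == C ** z    # the equation holds exactly
    # No common factor > 1 is the same as no common prime factor.
    coprime = gcd(gcd(A, B), C) == 1
    return exponents_ok and equation_ok and coprime

# The claimed counterexamples from the text all fail at least one condition:
print(is_beal_counterexample(27, 4, 162, 3, 9, 7))    # False: common factor 9, hence 3
print(is_beal_counterexample(2, 3, 2, 3, 2, 4))       # False: common prime factor 2
print(is_beal_counterexample(137998080, 3, 940896, 4, 1359072, 4))  # False: common factor
print(is_beal_counterexample(21, 3, 22, 3, 43, 3))    # False: 21**3 + 22**3 != 43**3
```

The last call is the freshman's dream in action: $21^3 + 22^3 = 19909$, while $43^3 = 79507$.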
Multiple people proposed counterexamples similar to this one:
from math import gcd #### In Python versions < 3.5, use "from fractions import gcd"
A, B, C = 60000000000000000001, 70000000000000000003, 82376613842809255677
x = y = z = 3.
A ** x + B ** y == C ** z and gcd(A, B) == gcd(B, C) == 1
True
WOW! The result is `True`! The two sides of the equation are equal, and the greatest common divisor is 1. Is this a real counterexample to Beal? And also a disproof of Fermat's Last Theorem?
Alas, it is not. The decimal point in "`x = y = z = 3.`" indicates a floating point number, with inexact, limited precision. Change the inexact "`3.`" to an exact "`3`" and the two sides of the equation are no longer equal. Below we see they are the same for the first 19 digits, but differ starting with the 20th:
(A ** 3 + B ** 3,
C ** 3)
(559000000000000000054900000000000000002070000000000000000028, 559000000000000000063037470301555182935702892172500189973733)
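One way to see why the float comparison is fooled: a Python float carries 53 bits of precision (roughly 16 significant decimal digits), while the two exact values differ only around the 20th digit. The sketch below reuses the numbers from the cells above and compares the exact relative gap with `sys.float_info.epsilon`; the variable names are my own.

```python
import sys

A, B, C = 60000000000000000001, 70000000000000000003, 82376613842809255677

lhs = A ** 3 + B ** 3     # exact integer arithmetic
rhs = C ** 3

print(lhs == rhs)                      # False: the exact integers differ

relative_gap = abs(lhs - rhs) / rhs    # true division of ints yields a float
print(relative_gap)                    # on the order of 1e-20
print(sys.float_info.epsilon)          # on the order of 1e-16

# The gap is far below float resolution, so floats cannot tell the sides apart.
print(relative_gap < sys.float_info.epsilon)   # True
```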
They say "close" only counts in horseshoes and hand grenades, and if you stood in your yard and threw a horseshoe at a stake on Kapteyn-b (an exoplanet 12.8 light years from Earth that is deemed habitable and thus possibly horseshoe-playing) and the flight path differed from the perfect path in the 20th digit, then it would end up about a millimeter from the target. That's really, really close, but close doesn't count in number theory.
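The horseshoe estimate can be reproduced with a few lines of arithmetic. This is a rough sketch: the light-year constant is rounded, and the variable names are assumptions for illustration.

```python
LIGHT_YEAR_M = 9.4607e15              # metres in one light year (rounded)
distance_m = 12.8 * LIGHT_YEAR_M      # Earth to Kapteyn-b, per the text

lhs = 60000000000000000001 ** 3 + 70000000000000000003 ** 3
rhs = 82376613842809255677 ** 3
relative_error = abs(lhs - rhs) / rhs  # how far the "equation" is off, relatively

miss_m = distance_m * relative_error   # the same relative error applied to the throw
print('{:.1f} mm'.format(miss_m * 1000))
```

The result lands in the low millimetre range, consistent with the "about a millimeter" estimate.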
Left: Kapteyn-b. Right: Homer Simpson.
In two different episodes of The Simpsons, close counterexamples to Fermat's Last Theorem are shown:
$3987^{12} + 4365^{12} = 4472^{12}$ and $1782^{12} + 1841^{12} = 1922^{12}$. These were designed by Simpsons writer David X. Cohen to be correct up to the precision of a typical handheld calculator; here we see the two sides of the first equation agree on the first ten digits, `6397665634`, and then differ:
3987 ** 12 + 4365 ** 12, 4472 ** 12
(63976656349698612616236230953154487896987106, 63976656348486725806862358322168575784124416)
Cohen must have found the equations with a program something like this (here `bases` is a sequence of integers to consider for the values of `A` and `B`; the variables `An` and `Bn` hold the `A**n` and `B**n` values; `lhs` is the pair `(An, Bn)`, whose sum is the left-hand side of the equation; and the function `Cn` computes the `C**n` that is closest to that sum):
from itertools import combinations

def simpsons(bases, n):
    """Print the (A**n + B**n = C**n) equation that minimizes the relative error,
    for a given n and A, B values from the sequence of integers `bases`."""
    def Cn(lhs):  return iroot(sum(lhs), n) ** n
    def err(lhs): return abs(sum(lhs) - Cn(lhs)) / sum(lhs)
    def show(Xn): return '{} ** {}'.format(iroot(Xn, n), n)
    powers = [b ** n for b in bases]
    (An, Bn) = lhs = min(combinations(powers, 2), key=err)
    print('{} + {} == {} (with error {:.0g})'
          .format(show(An), show(Bn), show(Cn(lhs)), err(lhs)))

def iroot(x, n): "integer nth root"; return int(round(x ** (1 / n)))

simpsons(range(1000, 2000), 12)
simpsons(range(2000, 5000), 12)
1782 ** 12 + 1841 ** 12 == 1922 ** 12 (with error 3e-10)
3987 ** 12 + 4365 ** 12 == 4472 ** 12 (with error 2e-11)
These are the same two equations that David X. Cohen found.
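The `iroot` helper above goes through floating point (`x ** (1 / n)`), which works for numbers of this size but is not guaranteed to be exact for much larger inputs. If exactness matters, an integer-only root is one alternative. The sketch below uses plain binary search; the names `iroot_exact` and `nearest_root` are hypothetical and not part of the original program.

```python
def iroot_exact(x, n):
    """Largest integer r with r ** n <= x, using integer arithmetic only."""
    if x < 0 or n < 1:
        raise ValueError("need x >= 0 and n >= 1")
    lo, hi = 0, 1
    while hi ** n <= x:        # grow hi until hi ** n > x
        hi *= 2
    while hi - lo > 1:         # invariant: lo ** n <= x < hi ** n
        mid = (lo + hi) // 2
        if mid ** n <= x:
            lo = mid
        else:
            hi = mid
    return lo

def nearest_root(x, n):
    """Integer r whose nth power is closest to x (hypothetical stand-in for iroot)."""
    r = iroot_exact(x, n)
    return r if x - r ** n <= (r + 1) ** n - x else r + 1

print(nearest_root(1782 ** 12 + 1841 ** 12, 12))   # 1922
print(iroot_exact(10 ** 300, 3) == 10 ** 100)      # True, even far beyond float range
```

Note that `nearest_root` picks the closest power, whereas `iroot` rounds the root itself; for near-misses like these the two agree.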
Can we find other near-misses? I'll try each single-digit exponent. I want A, B, C to be 4 digits each, so I'll limit A and B to 9500 (not 9999), to try to keep C from overflowing to 5 digits. (This takes around 10 minutes to run.)
for n in range(3, 10):
    simpsons(range(1000, 9500), n)
5856 ** 3 + 9036 ** 3 == 9791 ** 3 (with error 1e-12)
2396 ** 4 + 4551 ** 4 == 4636 ** 4 (with error 4e-11)
3993 ** 5 + 7767 ** 5 == 7822 ** 5 (with error 2e-11)
6107 ** 6 + 8919 ** 6 == 9066 ** 6 (with error 8e-13)
5592 ** 7 + 9079 ** 7 == 9122 ** 7 (with error 2e-11)
4749 ** 8 + 8952 ** 8 == 8959 ** 8 (with error 3e-11)
5433 ** 9 + 6725 ** 9 == 6828 ** 9 (with error 4e-11)
The equation for n=6 has the smallest relative error yet (8e-13, just under one part in a trillion).
In October 2015 I looked back at my original program from 2000.
I ported it from Python 1.5 to 3.5 (`print` is now a function, `long` is `int`). It runs 250 times faster today, a tribute to both computer hardware engineers and the developers of the Python interpreter.
I found that I had misstated the problem in 2000. I thought that, by definition, $A$ and $B$ could not have a common factor, but actually, the conjecture only rules out examples where all three of $A, B, C$ share a common factor. But, as [Mark Tiefenbruck](mailto:mark @tiefenbruck.org) (as well as Edward P. Berlin and Shen Lixing) pointed out, my statement is correct, not by definition, but by derivation: if $A$ and $B$ have a common prime factor $p$, then the sum of $A^x + B^y$ must also have that factor $p$, and hence $C^z$, and $C$, must have the factor $p$.
Mark Tiefenbruck also suggested another optimization: only consider exponents that are odd primes, or 4. The idea is that a number like 512 can be expressed as either $2^9$ or $8^3$, and my program doesn't need to consider both. In general, any time we have a composite exponent, such as $b^{qp}$, where $p$ is prime, we should ignore $A=b, x=qp$, and instead consider only $A=b^q, x=p$. There's one complication to this scheme: 2 is a prime, but 2 is not a valid exponent for a Beal counterexample. So we will allow 4 as an exponent, as well as all odd primes up to `max_x`.
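To make the exponent reduction concrete, here is a small sketch that rewrites any power $b^e$ (with $e > 2$) so that the exponent is an odd prime or 4. The function name `reduce_power` is hypothetical, and choosing the smallest odd prime factor is an arbitrary choice; the main program does not need this function, because it simply never generates the composite exponents.

```python
def reduce_power(b, e):
    """Rewrite b ** e as (base, p) with base ** p == b ** e and p an odd prime or 4.
    Hypothetical helper for illustration; requires e > 2."""
    if e <= 2:
        raise ValueError("exponent must be greater than 2")
    # The smallest odd divisor >= 3 of e, if any, is necessarily prime.
    p = next((d for d in range(3, e + 1, 2) if e % d == 0), None)
    if p is None:
        p = 4      # no odd divisor: e is a power of 2 and at least 4
    return b ** (e // p), p

print(reduce_power(2, 9))   # (8, 3): 2**9 == 8**3 == 512
print(reduce_power(2, 8))   # (4, 4): 2**8 == 4**4 == 256
print(reduce_power(5, 7))   # (5, 7): 7 is already an odd prime
```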
Here is the complete, updated program:
from math import gcd, log
from itertools import combinations, product

def beal(max_A, max_x):
    """See if any A ** x + B ** y equals some C ** z, with gcd(A, B) == 1.
    Consider any 1 <= A,B <= max_A and x,y <= max_x, with x,y prime or 4."""
    Apowers = make_Apowers(max_A, max_x)
    Czroots = make_Czroots(Apowers)
    for (A, B) in combinations(Apowers, 2):
        if gcd(A, B) == 1:
            for (Ax, By) in product(Apowers[A], Apowers[B]):
                Cz = Ax + By
                if Cz in Czroots:
                    C = Czroots[Cz]
                    x, y, z = exponent(Ax, A), exponent(By, B), exponent(Cz, C)
                    print('{} ** {} + {} ** {} == {} ** {} == {}'
                          .format(A, x, B, y, C, z, C ** z))

def make_Apowers(max_A, max_x):
    "A dict of {A: [A**3, A**4, ...], ...}."
    exponents = exponents_upto(max_x)
    return {A: [A ** x for x in (exponents if (A != 1) else [3])]
            for A in range(1, max_A+1)}

def make_Czroots(Apowers): return {Cz: C for C in Apowers for Cz in Apowers[C]}

def exponents_upto(max_x):
    "Return all odd primes up to max_x, as well as 4."
    exponents = [3, 4] if max_x >= 4 else [3] if max_x == 3 else []
    # range end is max_x + 1 so that max_x itself is included when it is an odd prime
    for x in range(5, max_x + 1, 2):
        if not any(x % p == 0 for p in exponents):
            exponents.append(x)
    return exponents

def exponent(Cz, C):
    """Recover z such that C ** z == Cz (or equivalently z = log Cz base C).
    For exponent(1, 1), arbitrarily choose to return 3."""
    return 3 if (Cz == C == 1) else int(round(log(Cz, C)))
It takes less than a second to verify that there are no counterexamples for combinations up to $100^{100}$, a computation that took Andrew Beal thousands of hours on his 1990s-era computers:
%time beal(100, 100)
CPU times: user 353 ms, sys: 4.84 ms, total: 358 ms Wall time: 376 ms
The execution time goes up roughly with the square of `max_A`, so the following should take about 25 times longer:
%time beal(500, 100)
CPU times: user 8.97 s, sys: 56.6 ms, total: 9.03 s Wall time: 9.12 s
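The factor of 25 can be sanity-checked by counting the base pairs that `combinations` generates. This is only a rough model (it ignores the `gcd` filter and the cost of big-integer arithmetic), and `num_pairs` is a helper name made up for this sketch.

```python
def num_pairs(n):
    """Number of unordered pairs of distinct bases drawn from n bases."""
    return n * (n - 1) // 2

print(num_pairs(100))                    # 4950
print(num_pairs(500))                    # 124750
print(num_pairs(500) / num_pairs(100))   # about 25.2
```

The measured ratio of total CPU time, 9.03 s / 358 ms, is also about 25.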
The function `beal` first does some precomputation, creating two data structures:
* `Apowers`: a dict of the form `{A: [A**3, A**4, ...]}` giving the nonredundant powers (prime and 4th powers, with exponents from 3 to `max_x`) of each base `A`.
* `Czroots`: a dict of `{C**z : C}` pairs, giving the zth root of each power in `Apowers`.

Here is a very small example `Apowers` table:
Apowers = make_Apowers(6, 10)
Apowers
{1: [1], 2: [8, 16, 32, 128], 3: [27, 81, 243, 2187], 4: [64, 256, 1024, 16384], 5: [125, 625, 3125, 78125], 6: [216, 1296, 7776, 279936]}
Then we enumerate all combinations of two bases, `A` and `B`, from `Apowers`. Consider the combination where `A` is `3` and `B` is `6`. Of course `gcd(3, 6) == 3`, so the program would not consider them further, but imagine if they did not share a common factor. Then we would look at all possible `Ax + By` sums, for `Ax` in `[27, 81, 243, 2187]` and `By` in `[216, 1296, 7776, 279936]`. One of these would be `27 + 216`, which sums to `243`. We look up `243` in `Czroots`:
Czroots = make_Czroots(Apowers)
Czroots
{1: 1, 8: 2, 16: 2, 27: 3, 32: 2, 64: 4, 81: 3, 125: 5, 128: 2, 216: 6, 243: 3, 256: 4, 625: 5, 1024: 4, 1296: 6, 2187: 3, 3125: 5, 7776: 6, 16384: 4, 78125: 5, 279936: 6}
Czroots[243]
3
We see that `243` is in `Czroots`, with value `3`, so this would be a counterexample (except for the common factor). The program uses the `exponent` function to recover the values of `x, y, z`, and prints the results.
Can we gain confidence in the program? It is difficult to test `beal`, because the expected output is nothing, for all known inputs.
One thing we can do is verify that `beal` finds cases like `3 ** 3 + 6 ** 3 == 3 ** 5 == 243` that would be a counterexample except for the common factor `3`. We can test this by temporarily replacing the `gcd` function with a mock function that always reports no common factors:
def gcd(a, b): return 1
beal(100, 100)
3 ** 3 + 6 ** 3 == 3 ** 5 == 243
7 ** 7 + 49 ** 3 == 98 ** 3 == 941192
8 ** 4 + 16 ** 3 == 2 ** 13 == 8192
8 ** 5 + 32 ** 3 == 16 ** 4 == 65536
9 ** 3 + 18 ** 3 == 9 ** 4 == 6561
16 ** 5 + 32 ** 4 == 8 ** 7 == 2097152
17 ** 4 + 34 ** 4 == 17 ** 5 == 1419857
19 ** 4 + 38 ** 3 == 57 ** 3 == 185193
27 ** 3 + 54 ** 3 == 3 ** 11 == 177147
28 ** 3 + 84 ** 3 == 28 ** 4 == 614656
34 ** 5 + 51 ** 4 == 85 ** 4 == 52200625
Let's make sure all those expressions are true:
{3 ** 3 + 6 ** 3 == 3 ** 5 == 243,
7 ** 7 + 49 ** 3 == 98 ** 3 == 941192,
8 ** 4 + 16 ** 3 == 2 ** 13 == 8192,
8 ** 5 + 32 ** 3 == 16 ** 4 == 65536,
9 ** 3 + 18 ** 3 == 9 ** 4 == 6561,
16 ** 5 + 32 ** 4 == 8 ** 7 == 2097152,
17 ** 4 + 34 ** 4 == 17 ** 5 == 1419857,
19 ** 4 + 38 ** 3 == 57 ** 3 == 185193,
27 ** 3 + 54 ** 3 == 3 ** 11 == 177147,
28 ** 3 + 84 ** 3 == 28 ** 4 == 614656,
34 ** 5 + 51 ** 4 == 85 ** 4 == 52200625}
{True}
I get nervous having an incorrect version of `gcd` around: change it back, quick!
from math import gcd
beal(100, 100)
We can also provide some test cases for the subfunctions of `beal`:
def tests():
    assert make_Apowers(6, 10) == {
        1: [1],
        2: [8, 16, 32, 128],
        3: [27, 81, 243, 2187],
        4: [64, 256, 1024, 16384],
        5: [125, 625, 3125, 78125],
        6: [216, 1296, 7776, 279936]}

    assert make_Czroots(make_Apowers(5, 8)) == {
        1: 1, 8: 2, 16: 2, 27: 3, 32: 2, 64: 4, 81: 3,
        125: 5, 128: 2, 243: 3, 256: 4, 625: 5, 1024: 4,
        2187: 3, 3125: 5, 16384: 4, 78125: 5}

    Czroots = make_Czroots(make_Apowers(100, 100))
    assert 3 ** 3 + 6 ** 3 in Czroots
    assert 99 ** 97 in Czroots
    assert 101 ** 100 not in Czroots
    assert Czroots[99 ** 97] == 99

    assert exponent(10 ** 5, 10) == 5
    assert exponent(7 ** 3, 7) == 3
    assert exponent(1234 ** 999, 1234) == 999
    assert exponent(12345 ** 6789, 12345) == 6789
    assert exponent(3 ** 10000, 3) == 10000
    assert exponent(1, 1) == 3

    assert exponents_upto(2) == []
    assert exponents_upto(3) == [3]
    assert exponents_upto(4) == [3, 4]
    assert exponents_upto(40) == [3, 4, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
    assert exponents_upto(100) == [
        3, 4, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61,
        67, 71, 73, 79, 83, 89, 97]

    assert gcd(3, 6) == 3
    assert gcd(3, 7) == 1
    assert gcd(861591083269373931, 94815872265407) == 97
    assert gcd(2*3*5*(7**10)*(11**12), 3*(7**5)*(11**13)*17) == 3*(7**5)*(11**12)
    return 'tests pass'

tests()
'tests pass'
The program is mostly straightforward, but relies on the correctness of these arguments:
* Are we justified in taking `combinations` without replacement from the `Apowers` table? In other words, are we sure there are no solutions of the form $A^x + A^x = C^z$? Yes, we can be sure, because then $2\;A^x = C^z$, and all the factors of $A$ would also be factors of $C$.
* Are we justified in having a single value for each key in the `Czroots` table? Consider that $81 = 3^4 = 9^2$. We put `{81: 3}` in the table and discard `{81: 9}`, because any number that has 9 as a factor will always have 3 as a factor as well, so 3 is all we need to know. But what if a number could be formed with two bases where neither was a multiple of the other? For example, what if $2^7 = 5^3 = s$; then wouldn't we have to have both 2 and 5 as values for $s$ in the table? Fortunately, that can never happen, because of the [fundamental theorem of arithmetic](https://en.wikipedia.org/wiki/Fundamental_theorem_of_arithmetic).
* Could there be a rounding error involving the `exponent` function that was not caught by the tests? Possibly; but `exponent` is not used to find counterexamples, only to print them, so any such error wouldn't cause us to miss a counterexample.
* Are we justified in only considering exponents that are odd primes, or the number 4? In one sense, yes, because when we consider the two terms $A^{(qp)}$ and $(A^q)^p$, we find they are always equal, and always have the same prime factors (the factors of $A$), so for the purposes of the Beal problem, they are equivalent, and we only need consider one of them. In another sense, there is a difference. With this optimization, when we run `beal(6, 10)`, we are no longer testing $512$ as a value of $A$ or $B$, even though $512 = 2^9$ and both $2$ and $9$ are within range, because the program chooses to express $512$ as $8^3$, and $8$ is not in the specified range. So the program is still correctly searching for counterexamples, but the space that it searches for given `max_A` and `max_x` is different with this optimization.
* Are we really sure that when $A$ and $B$ have a common factor greater than 1, then $C$ also shares that common factor? Yes, because if $p$ is a factor of both $A$ and $B$, then it is a factor of $A^x + B^y$, and since we know this is equal to $C^z$, then $p$ must also be a factor of $C^z$, and thus a factor of $C$.
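The single-value argument for `Czroots` can be spot-checked empirically. The sketch below is an illustration over a small range, not a proof: it collects every value $C^z$ that arises from more than one base and confirms that the colliding bases are all powers of one common root, so they share the same prime factors. The helper names `primitive_root` and `colliding_bases` are made up for this example.

```python
from collections import defaultdict

def primitive_root(n):
    """Smallest r >= 2 such that n is a power of r (r == n if n is not a perfect power)."""
    for r in range(2, n + 1):
        v = r
        while v < n:
            v *= r
        if v == n:
            return r
    return n   # only reached when n < 2

def colliding_bases(max_C, exponents):
    """Map each value C ** z reachable from more than one base to the set of those bases."""
    bases_for = defaultdict(set)
    for C in range(2, max_C + 1):
        for z in exponents:
            bases_for[C ** z].add(C)
    return {value: bases for value, bases in bases_for.items() if len(bases) > 1}

collisions = colliding_bases(100, [3, 4, 5, 7])
print(sorted(collisions[4096]))   # [8, 16], since 8 ** 4 == 16 ** 3 == 4096

# In every collision, all the bases are powers of a single common root.
print(all(len({primitive_root(C) for C in bases}) == 1
          for bases in collisions.values()))   # True
```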
Arithmetic is slow with integers that have thousands of digits. If we want to explore much further, we'll have to make the program more efficient. An obvious improvement would be to do all the arithmetic modulo some number $m$. Then we know:
$$\mbox{if} ~~ A^x + B^y = C^z ~~ \mbox{then} ~~ (A^x (\mbox{mod} ~ m) + B^y (\mbox{mod} ~ m)) (\mbox{mod} ~ m) = C^z \;(\mbox{mod} ~ m)$$

So we can do efficient tests modulo $m$, and then do the full arithmetic only for combinations that work modulo $m$. Unfortunately there will be collisions (two numbers that are distinct, but are equal mod $m$), so the tables will have to have lists of values. Here is a simple, unoptimized implementation:
from math import gcd
from itertools import combinations, product
from collections import defaultdict

def beal_modm(max_A, max_x, m=2**31-1):
    """See if any A ** x + B ** y equals some C ** z (mod m), with gcd(A, B) == 1.
    If so, verify that the equation works without the (mod m).
    Consider any 1 <= A,B <= max_A and x,y <= max_x, with x,y prime or 4."""
    assert m >= max_A
    Apowers = make_Apowers_modm(max_A, max_x, m)
    Czroots = make_Czroots_modm(Apowers)
    for (A, B) in combinations(Apowers, 2):
        if gcd(A, B) == 1:
            for (Axm, x), (Bym, y) in product(Apowers[A], Apowers[B]):
                Czm = (Axm + Bym) % m
                if Czm in Czroots:
                    # Candidate match mod m: confirm with full-precision arithmetic
                    lhs = A ** x + B ** y
                    for (C, z) in Czroots[Czm]:
                        if lhs == C ** z:
                            print('{} ** {} + {} ** {} == {} ** {} == {}'
                                  .format(A, x, B, y, C, z, C ** z))

def make_Apowers_modm(max_A, max_x, m):
    "A dict of {A: [(A**3 (mod m), 3), (A**4 (mod m), 4), ...]}."
    exponents = exponents_upto(max_x)
    return {A: [(pow(A, x, m), x) for x in (exponents if (A != 1) else [3])]
            for A in range(1, max_A+1)}

def make_Czroots_modm(Apowers):
    "A dict of {C**z (mod m): [(C, z),...]}"
    Czroots = defaultdict(list)
    for A in Apowers:
        for (Axm, x) in Apowers[A]:
            Czroots[Axm].append((A, x))
    return Czroots
Here we see that each entry in the `Apowers` table is a list of `(A**x (mod m), x)` pairs.
For example, $6^7 = 279,936$, so in our (mod 1000) table we have the pair `(936, 7)` under `6`.
Apowers = make_Apowers_modm(6, 10, 1000)
Apowers
{1: [(1, 3)], 2: [(8, 3), (16, 4), (32, 5), (128, 7)], 3: [(27, 3), (81, 4), (243, 5), (187, 7)], 4: [(64, 3), (256, 4), (24, 5), (384, 7)], 5: [(125, 3), (625, 4), (125, 5), (125, 7)], 6: [(216, 3), (296, 4), (776, 5), (936, 7)]}
And each item in the `Czroots` table is of the form `{C**z (mod m): [(C, z), ...]}`.
For example, `936: [(6, 7)]`.
make_Czroots_modm(Apowers)
defaultdict(list, {1: [(1, 3)], 8: [(2, 3)], 16: [(2, 4)], 24: [(4, 5)], 27: [(3, 3)], 32: [(2, 5)], 64: [(4, 3)], 81: [(3, 4)], 125: [(5, 3), (5, 5), (5, 7)], 128: [(2, 7)], 187: [(3, 7)], 216: [(6, 3)], 243: [(3, 5)], 256: [(4, 4)], 296: [(6, 4)], 384: [(4, 7)], 625: [(5, 4)], 776: [(6, 5)], 936: [(6, 7)]})
Let's run the program:
%time beal_modm(1000, 100)
CPU times: user 56 s, sys: 436 ms, total: 56.4 s Wall time: 59.2 s
We don't see a speedup here, but the idea is that as we start dealing with much larger integers, this version should be faster. I could improve this version by caching certain computations, managing the memory layout better, moving some computations out of loops, considering using multiple different numbers as the modulus (as in a Bloom filter), finding a way to parallelize the program, and re-coding in a faster compiled language (such as C++ or Go or Julia). Then I could invest thousands (or millions) of CPU hours searching for counterexamples.
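As a sketch of the multiple-moduli idea only (not a tuned implementation), a candidate can be required to satisfy the equation modulo several numbers before any full-precision arithmetic is done. Each extra modulus filters out more false positives, much as extra hash functions do in a Bloom filter. The function name `passes_all_moduli` and the particular moduli are assumptions chosen for illustration.

```python
MODULI = (2**31 - 1, 2**61 - 1)   # two Mersenne primes; an arbitrary choice for this sketch

def passes_all_moduli(A, x, B, y, C, z, moduli=MODULI):
    """Necessary (but not sufficient) test: A**x + B**y == C**z must hold mod every m."""
    return all((pow(A, x, m) + pow(B, y, m)) % m == pow(C, z, m)
               for m in moduli)

# A true equation passes for any choice of moduli.
print(passes_all_moduli(3, 3, 6, 3, 3, 5))                     # True

# 5**3 + 10**3 == 1125 is not 5**5 == 3125, but the two agree mod 1000 ...
print(passes_all_moduli(5, 3, 10, 3, 5, 5, moduli=(1000,)))    # True (a false positive)
# ... and a second modulus exposes the mismatch.
print(passes_all_moduli(5, 3, 10, 3, 5, 5, moduli=(1000, 7)))  # False
```

Any candidate that survives every modulus still has to be confirmed with exact arithmetic, as `beal_modm` does.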
But Witold Jarnicki and David Konerding already did that: they wrote a C++ program that, in parallel across thousands of machines, searched for $A, B$ up to 200,000 and $x, y$ up to 5,000, but found no counterexamples. So I don't think it is worthwhile to continue on that path.
This was fun, but I can't recommend anyone spend a serious amount of computer time looking for counterexamples to the Beal Conjecture—the money you would have to spend in computer time would be more than the expected value of your prize winnings. I suggest you work on a proof rather than a counterexample, or work on some other interesting problem instead!