I understand the certainty argument to mean:
certainty – a measure of the uncertainty that the caller is willing to tolerate: if the call returns true the probability that this BigInteger is prime exceeds (1 – 1/2^certainty)
From my experiments, it seems to exceed it by quite a lot! The code below finds “probable primes” between 2 and 1 million and checks each against a set of definite primes to see whether it was a false positive.
I’m using a certainty argument of 2. I therefore expect that only 75% of “probable primes” will be actual primes. (1 – 1/2^2 = 0.75 = 75%.)
Actually, it gets it right 99.9% of the time.
Is my understanding of the meaning of “certainty” correct? I suspect it might not be, given that the accuracy I’ve seen experimentally exceeds my expectation by so much.
import java.math.BigInteger;
import java.util.BitSet;
import static java.lang.Math.sqrt;

public class PrimesCalculator {
    public final int max;
    private final BitSet sieve; // Set of all non-primes from 2 to max.

    public PrimesCalculator(int max) {
        this.max = max;
        sieve = new BitSet(max + 1);
        // Sieve of Eratosthenes: mark every multiple of n as composite.
        for (int n = 2, sqrtMax = (int) sqrt(max); n <= sqrtMax; n++)
            for (int i = n * 2; i <= max; i += n)
                sieve.set(i);
    }

    public boolean isPrime(int n) {
        return !sieve.get(n);
    }

    public static void main(String[] args) {
        PrimesCalculator calc = new PrimesCalculator(1_000_000);
        int numPrimes = 0;
        int numProbablePrimes = 0;
        for (int i = 2; i < calc.max; i++)
            if (BigInteger.valueOf(i).isProbablePrime(2)) {
                numProbablePrimes++;
                if (calc.isPrime(i))
                    numPrimes++;
            }
        System.out.printf("%s/%s (%s%%)%n",
                numPrimes, numProbablePrimes, numPrimes / (numProbablePrimes / 100.0));
    }
}
Answer
These statements frequently cause confusion. I will attempt to explain them a little better here.
Java’s BigInteger.isProbablePrime() contains no optimizations for small integers at all, except that 0, 1, and 2 are treated as special cases and all even integers other than 2 are immediately declared composite. All other odd integers are checked for primality using the Miller-Rabin (MR) primality test. In addition, if the integer is 100 bits or larger, it is also checked with something called the Lucas-Lehmer test.
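As a quick illustration of those special cases (the particular values below are just examples I chose, nothing the library singles out beyond 0, 1, 2, and even numbers), you can call isProbablePrime directly:

import java.math.BigInteger;

public class SpecialCases {
    public static void main(String[] args) {
        // 0 and 1 are rejected immediately; 2 is accepted immediately.
        System.out.println(BigInteger.ZERO.isProbablePrime(10));               // false
        System.out.println(BigInteger.ONE.isProbablePrime(10));                // false
        System.out.println(BigInteger.valueOf(2).isProbablePrime(10));         // true
        // Any other even number is declared composite without running MR.
        System.out.println(BigInteger.valueOf(1_000_000).isProbablePrime(10)); // false
        // Odd numbers go through the randomized Miller-Rabin rounds.
        System.out.println(BigInteger.valueOf(91).isProbablePrime(10));        // false (with overwhelming probability)
    }
}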
MR is a complicated primality test whose explanation and analysis are beyond the scope of an SO answer. The key point is that, as far as MR is concerned, not all composites are created equal: for a very small fraction of them, it is much harder to reveal their compositeness. A few examples: amongst the small odd composites, 91, 703, and 1891 are difficult. MR overcomes this by making multiple randomized attempts to discover the compositeness of an integer. Rabin’s analysis shows that, even for the worst-behaving composites, a single randomized attempt still has at least a 75% (3/4) probability of revealing compositeness.
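To see this concretely, here is a minimal sketch of a single MR round (my own illustration, not the JDK’s implementation). Running it over every possible base for one of the stubborn composites above, say 91, counts how many bases are “strong liars”, that is, how often one round fails to expose the number as composite:

public class MillerRabinRound {

    // (base^exp) mod m using long arithmetic; fine while m fits in 31 bits.
    static long modPow(long base, long exp, long m) {
        long result = 1;
        base %= m;
        while (exp > 0) {
            if ((exp & 1) == 1)
                result = result * base % m;
            base = base * base % m;
            exp >>= 1;
        }
        return result;
    }

    // Returns true if base a fails to prove odd n composite in one MR round,
    // i.e. a is a "strong liar" when n is composite.
    static boolean passesOneRound(long n, long a) {
        long d = n - 1;
        int s = 0;
        while ((d & 1) == 0) { d >>= 1; s++; }   // n - 1 = 2^s * d, with d odd
        long x = modPow(a, d, n);
        if (x == 1 || x == n - 1)
            return true;
        for (int i = 1; i < s; i++) {
            x = x * x % n;
            if (x == n - 1)
                return true;
        }
        return false;                            // a witnesses that n is composite
    }

    public static void main(String[] args) {
        long n = 91;                             // one of the "difficult" composites
        int liars = 0, bases = 0;
        for (long a = 2; a <= n - 2; a++) {
            bases++;
            if (passesOneRound(n, a))
                liars++;
        }
        System.out.printf("%d of %d bases fail to expose %d as composite (%.1f%%)%n",
                liars, bases, n, 100.0 * liars / bases);
    }
}

Even for these composites, the fraction of liar bases stays below 25%, which is where the “at least 75% per attempt” figure comes from.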
The certainty argument is almost equivalent to specifying the number of randomized attempts that the MR algorithm needs to perform. In reality, the relationship between the certainty argument and the number of randomized attempts is more complicated, and I don’t completely understand it myself.
As an experiment to see how different composites behave, change your program so that it instead repeatedly tries to confirm the compositeness of, say, 1891. You will see a success rate much closer to 75%.
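A rough sketch of that experiment might look like this (the trial count is arbitrary; 1891 = 31 × 61 is composite):

import java.math.BigInteger;

public class StubbornComposite {
    public static void main(String[] args) {
        BigInteger n = BigInteger.valueOf(1891);   // composite: 31 * 61
        int trials = 100_000;
        int detected = 0;
        for (int i = 0; i < trials; i++)
            if (!n.isProbablePrime(2))             // false means "definitely composite"
                detected++;
        System.out.printf("Detected as composite in %d of %d trials (%.1f%%)%n",
                detected, trials, 100.0 * detected / trials);
    }
}

The detection rate reported here should sit much nearer 75% than the 99.9% your original program sees across all integers below one million.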
A list of relatively small MR-stubborn composites is here.