Background
I’ve been reading and trying to wrap my head around various questions/answers that relate to finding the seed from Java.util.Random given its output from nextInt()
.
The implementation of nextInt(int bound)
is:
public int nextInt(int bound) { if (bound <= 0) throw new IllegalArgumentException("bound must be positive"); if ((bound & -bound) == bound) // i.e., bound is a power of 2 return (int)((bound * (long)next(31)) >> 31); int bits, val; do { bits = next(31); val = bits % bound; } while (bits - val + (bound-1) < 0); return val; }
The implementation of next(int bits)
is:
protected int next(int bits) { long oldseed, nextseed; AtomicLong seed = this.seed; do { oldseed = seed.get(); nextseed = (oldseed * multiplier + addend) & mask; } while (!seed.compareAndSet(oldseed, nextseed)); return (int)(nextseed >>> (48 - bits)); }
where the multiplier is 0x5DEECE66DL
, the addend is 0xBL
, and the mask is (1L << 48) - 1
. These are hexadecimal values where the L is Java convention for long
conversion.
By calling nextInt()
without a bound, the full 32-bits are returned from next(32)
instead of dropping bits with bits % bound
.
Questions
Without completely bruteforcing the full 248 possibilities, how would I go about finding the current seed after x amount of calls to
nextInt(n)
(assuming the bound is never a power of 2)? For example, let’s assume I want to find the seed given 10 calls tonextInt(344)
[251, 331, 306, 322, 333, 283, 187, 54, 170, 331].How can I determine the amount of data I’d need to find the correct seed, not just another one that produces the same starting data?
Does this change given bounds that are either odd/even?
Advertisement
Answer
Without completely bruteforcing the full 248 possibilities, how would I go about finding the current seed after x amount of calls to
nextInt(n)
(assuming the bound is never a power of 2)?
Let’s first remove code that is here for multi-threading, error testing, and bound
a power of two. Things boil down to
public int nextInt_equivalent(int bound) { int bits, val; do { seed = (seed * multiplier + addend) & mask; // 48-bit LCG bits = seed >> 17; // keep the top 31 bits val = bits % bound; // build val in [0, bound) } while (bits - val + (bound-1) < 0); return val; }
Next we must understand what the while (bits - val + (bound-1) < 0)
is about. It’s here to use bits
only when it is in an interval of width multiple of bound
, thus insuring a uniform distribution of val
. That interval is [0, (1L<<31)/bound*bound
).
The while condition is equivalent to while (bits >= (1L<<31)/bound*bound)
, but executes faster. This condition occurs for the (1L<<31)%bound
highest values of bits
out of 1L<<31
. When bound
is 344, this occurs for 8 values of bits
out of 231, or about 3.7 per US billion.
This is so rare that one reasonable approach is to assume it does not occur. Another is to hope that it occurs, and test if (and when) it does by seeing if the seeds that cause that rare event lead to a sequence of val
found in the givens. We have only ((1L<<31)%bound)<<17
(here slightly above a million) values of seed
to test, which is quite feasible. Whatever, in the rest I assume this is ruled out, and consider the generator without the while.
When bound
is even, or more generally a multiple of 2S for some S>0, observe that the low-order S bits of the output val
(which we can find) are also the low-order S bits of bits
, and thus the bits of rank [17, 17+S) of seed
. And that the low 17+S bits of seed are fully independent of the 31-S other ones. When bound
is 344=8×43, we have S=3, and thus we can attack the low-order 17+S=20 low-order bits of seed
independently. We directly get S=3 bits of seed
from the first val
.
We get the 17 low-order bits of seed
by elimination: for each of 217 candidates, and given the S=3 bits we know, does the 17+S=20 bits of seed
lead to a sequence of val
which low-order S bits match the given sequence? With enough values we can make a full determination of the 17+S bits. We need ⌈17/S+1⌉ = 7 val
to narrow down to a single value for the 17+S low-order bits of seed
in this way. If we got less, we need to keep several candidates in the next step. In the question we have ample val
to narrow to a single value, and be convinced we get it right.
And then when we have these 17+S=20 bits of seed
, we can find the remaining 31-S=28 with a moderate amount of brute force. We could test 228 values of the yet unknown bits for seed
and check which gives a full match with the known val
. But better: we know seed % (bound<<17)
exactly, and thus need only test 231/bound
values of seed
(here roughly 6 million).
How can I determine the amount of data I’d need to find the correct seed?
A working heuristic for all but pathological LCGs, and many other PRNGs, is that you need as much information as there are bits in the state, thus 48 bits. Each output gives you log2(bound
) bits, thus you need ⌈48/log2(bound
)⌉ values, here 6 (which would require keeping track of a few candidates for the low 20 bits of seed
and thus require correspondingly more work in the second phase) . Extra values give confidence that the actual state was recovered, but AFAIK wrong guesses just will not happen unless the while
comes into play.
Does this change given bounds that are either odd/even?
The above attack strategy does not work well for odd bound
(we can’t separately guess the low-order bits and need to search 248/bound
values of seed
). However there are better attacks with less guesswork, applicable even if we raise the number of state bits considerably, and including for odd bound
. They are more difficult to explain (read: I hardly can get them working with a math package, and can’t explain how, yet; see this question).