# Finding Java.util.Random seed from bounded nextInt(int bound) results

### Background

I’ve been reading and trying to wrap my head around various questions/answers that relate to finding the seed from Java.util.Random given its output from `nextInt()`.

The implementation of `nextInt(int bound)` is:

```public int nextInt(int bound) {
if (bound <= 0)
throw new IllegalArgumentException("bound must be positive");

if ((bound & -bound) == bound)  // i.e., bound is a power of 2
return (int)((bound * (long)next(31)) >> 31);

int bits, val;
do {
bits = next(31);
val = bits % bound;
} while (bits - val + (bound-1) < 0);
return val;
}
```

The implementation of `next(int bits)` is:

```protected int next(int bits) {
long oldseed, nextseed;
AtomicLong seed = this.seed;
do {
oldseed = seed.get();
} while (!seed.compareAndSet(oldseed, nextseed));
return (int)(nextseed >>> (48 - bits));
}
```

where the multiplier is `0x5DEECE66DL`, the addend is `0xBL`, and the mask is `(1L << 48) - 1`. These are hexadecimal values where the L is Java convention for `long` conversion.

By calling `nextInt()` without a bound, the full 32-bits are returned from `next(32)` instead of dropping bits with `bits % bound`.

### Questions

1. Without completely bruteforcing the full 248 possibilities, how would I go about finding the current seed after x amount of calls to `nextInt(n)` (assuming the bound is never a power of 2)? For example, let’s assume I want to find the seed given 10 calls to `nextInt(344)` [251, 331, 306, 322, 333, 283, 187, 54, 170, 331].

2. How can I determine the amount of data I’d need to find the correct seed, not just another one that produces the same starting data?

3. Does this change given bounds that are either odd/even?

Without completely bruteforcing the full 248 possibilities, how would I go about finding the current seed after x amount of calls to `nextInt(n)` (assuming the bound is never a power of 2)?

Let’s first remove code that is here for multi-threading, error testing, and `bound` a power of two. Things boil down to

```public int nextInt_equivalent(int bound) {
int bits, val;
do {
seed = (seed * multiplier + addend) & mask; // 48-bit LCG
bits = seed >> 17;                          // keep the top 31 bits
val = bits % bound;                         // build val in [0, bound)
} while (bits - val + (bound-1) < 0);
return val;
}
```

Next we must understand what the `while (bits - val + (bound-1) < 0)` is about. It’s here to use `bits` only when it is in an interval of width multiple of `bound`, thus insuring a uniform distribution of `val`. That interval is [0, `(1L<<31)/bound*bound`).
The while condition is equivalent to `while (bits >= (1L<<31)/bound*bound)`, but executes faster. This condition occurs for the `(1L<<31)%bound` highest values of `bits` out of `1L<<31`. When `bound` is 344, this occurs for 8 values of `bits` out of 231, or about 3.7 per US billion.

This is so rare that one reasonable approach is to assume it does not occur. Another is to hope that it occurs, and test if (and when) it does by seeing if the seeds that cause that rare event lead to a sequence of `val` found in the givens. We have only `((1L<<31)%bound)<<17` (here slightly above a million) values of `seed` to test, which is quite feasible. Whatever, in the rest I assume this is ruled out, and consider the generator without the while.

When `bound` is even, or more generally a multiple of 2S for some S>0, observe that the low-order S bits of the output `val` (which we can find) are also the low-order S bits of `bits`, and thus the bits of rank [17, 17+S) of `seed`. And that the low 17+S bits of seed are fully independent of the 31-S other ones. When `bound` is 344=8×43, we have S=3, and thus we can attack the low-order 17+S=20 low-order bits of `seed` independently. We directly get S=3 bits of `seed` from the first `val`.

We get the 17 low-order bits of `seed` by elimination: for each of 217 candidates, and given the S=3 bits we know, does the 17+S=20 bits of `seed` lead to a sequence of `val` which low-order S bits match the given sequence? With enough values we can make a full determination of the 17+S bits. We need ⌈17/S+1⌉ = 7 `val` to narrow down to a single value for the 17+S low-order bits of `seed`in this way. If we got less, we need to keep several candidates in the next step. In the question we have ample `val` to narrow to a single value, and be convinced we get it right.

And then when we have these 17+S=20 bits of `seed`, we can find the remaining 31-S=28 with a moderate amount of brute force. We could test 228 values of the yet unknown bits for `seed` and check which gives a full match with the known `val`. But better: we know `seed % (bound<<17)` exactly, and thus need only test 231/`bound` values of `seed` (here roughly 6 million).

How can I determine the amount of data I’d need to find the correct seed?

A working heuristic for all but pathological LCGs, and many other PRNGs, is that you need as much information as there are bits in the state, thus 48 bits. Each output gives you log2(`bound`) bits, thus you need ⌈48/log2(`bound`)⌉ values, here 6 (which would require keeping track of a few candidates for the low 20 bits of `seed` and thus require correspondingly more work in the second phase) . Extra values give confidence that the actual state was recovered, but AFAIK wrong guesses just will not happen unless the `while` comes into play.

Does this change given bounds that are either odd/even?

The above attack strategy does not work well for odd `bound` (we can’t separately guess the low-order bits and need to search 248/`bound` values of `seed`). However there are better attacks with less guesswork, applicable even if we raise the number of state bits considerably, and including for odd `bound`. They are more difficult to explain (read: I hardly can get them working with a math package, and can’t explain how, yet; see this question).