
How to generate the same pseudorandom numbers in Java as in Numpy (for the same seed)?

Is there any way to generate the same random numbers in Java as NumPy's random module produces when using the same seed (e.g. 12345)?

In NumPy, the code below outputs 0.9296160928171479:

from numpy.random import RandomState
rs = RandomState(12345)
rs.random()

In Java, the code below outputs 0.3618031071604718:

import java.util.Random;
Random random = new Random(12345);
System.out.println(random.nextDouble());

I am comparing the outputs of some methods in scikit-learn with those of my own library in Java. To get the same outputs, I need to generate the same random numbers as NumPy does (scikit-learn uses NumPy's random module).

Answer

NumPy's legacy random generator (RandomState) uses the Mersenne Twister (MT) algorithm to generate random bits, which it then transforms or interprets as the requested type, while java.util.Random uses a much simpler algorithm, a linear congruential generator (LCG). To get the same raw bits in Java, you need a Mersenne Twister implementation such as the MersenneTwister class from Apache Commons Math. MT generates unsigned 32-bit integers, which can then be transformed into other ranges or distributions. If a number with 64-bit precision is needed, the implementation implicitly invokes the generator twice and combines the results.

However, it seems that NumPy and Java represent the numbers differently, so getting the same random bits is not enough on its own.
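As background to the LCG mentioned above, here is a minimal sketch of the recurrence that the java.util.Random documentation describes. The class name LcgSketch is made up for illustration; the constants and the 26-bit plus 27-bit recipe for doubles are taken from that documentation as I understand it, so treat this as a sketch and not as the JDK source.

```java
class LcgSketch {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long INCREMENT = 0xBL;
    private static final long MASK = (1L << 48) - 1; // the state is 48 bits wide

    private long state;

    LcgSketch(long seed) {
        // The seed is scrambled with the multiplier before first use.
        state = (seed ^ MULTIPLIER) & MASK;
    }

    // One LCG step; returns the top `bits` bits of the new 48-bit state.
    int next(int bits) {
        state = (state * MULTIPLIER + INCREMENT) & MASK;
        return (int) (state >>> (48 - bits));
    }

    // 26 high bits and 27 low bits form a 53-bit fraction in [0, 1).
    double nextDouble() {
        return (((long) next(26) << 27) + next(27)) * 0x1.0p-53;
    }

    public static void main(String[] args) {
        // Should print the same value as new java.util.Random(12345).nextDouble().
        System.out.println(new LcgSketch(12345L).nextDouble());
    }
}
```

None of this resembles the Mersenne Twister, which is why no choice of seed makes java.util.Random follow NumPy's stream.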

Let’s use MT in both Java and Python, and compare the results. First, take a look at the Java example code:

import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.MersenneTwister;

class QuickStart {
    public static void main(String[] args) {
        RandomGenerator prng = new MersenneTwister(0);
        for (int i = 0; i < 3; ++i) {
            long num = prng.nextLong();
            System.out.println(Long.toString(num) + "\t" + Long.toBinaryString(num));
        }
        System.out.println();
        for (int i = 0; i < 3; ++i) {
            int num = prng.nextInt();
            System.out.println(Integer.toString(num) + "\t" + Integer.toBinaryString(num));
        }
        System.out.println();
        for (int i = 0; i < 3; ++i) {
            double num = prng.nextDouble();
            System.out.println(Double.toString(num) + "\t" + Long.toBinaryString(Double.doubleToRawLongBits(num)));
        }
    }
}

This produces the output:

-8322921849960486353    1000110001111111000010101010110010010111110001001010101000101111
-5253828890213626688    1011011100010110101001100111010111011000001000011100110011000000
-7327722439656189189    1001101001001110101100110100001111011011101000100101001011111011

-1954711869     10001011011111010111011011000011
-656048793      11011000111001010111110101100111
1819583497      1101100011101001010010000001001

0.6235637015982585      11111111100011111101000011101111011101001010101100101010001000
0.38438170310239794     11111111011000100110011011010110110111000000000101101101110000
0.2975346131886989      11111111010011000010101100111010011110010001001011001111000100

Whereas the Python example is:

import numpy
import bitstring


numpy.random.seed(0)
for i in range(3):
    state = numpy.random.get_state()
    num = numpy.random.randint(0, 2**64, dtype=numpy.uint64)
    print(num, bin(num), sep="\t")
print()

for i in range(3):
    state = numpy.random.get_state()
    num = numpy.random.randint(0, 2**32, dtype=numpy.uint32)
    print(num, bin(num), sep="\t")
print()

for i in range(3):
    state = numpy.random.get_state()
    num = numpy.random.random()
    f1 = bitstring.BitArray(float=num, length=64)
    print(num, f1.bin, sep="\t")

which produces the output:

10123822223749065263    0b1000110001111111000010101010110010010111110001001010101000101111
13192915183495924928    0b1011011100010110101001100111010111011000001000011100110011000000
11119021634053362427    0b1001101001001110101100110100001111011011101000100101001011111011

2340255427      0b10001011011111010111011011000011
3638918503      0b11011000111001010111110101100111
1819583497      0b1101100011101001010010000001001

0.6235636967859723      0011111111100011111101000011101111011010100101010110010101000100
0.3843817072926998      0011111111011000100110011011010110111011100000000010110110111000
0.2975346065444723      0011111111010011000010101100111010010111001000100101100111100010

Result comparison

You can see that the raw bits of the integers are the same. Only the printed format differs slightly: Python prefixes the value with 0b when the built-in bin() is used, and bitstring prints the leading zeros, which Java omits. Because Java's integer types are signed and the NumPy types used here are unsigned, the same bits print as completely different decimal values. For doubles, the two bit patterns agree only in their leading bits, so the values are close but not equal. Once you see the pattern, you can write a small interpreter of the raw bits, in Java or in Python, that mimics the behaviour of the other language.
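Such an interpreter might look like the sketch below. The class and method names (NumpyCompat, toUnsigned32, numpyDouble) are made up for illustration. The double recipe, 27 bits from the first draw and 26 bits from the second, is the 53-bit formula from the Mersenne Twister reference code, which I am assuming NumPy's legacy generator follows. The Java doubles printed above look as if Commons Math instead takes 26 bits from each draw, which would explain the small difference, but that is an inference from the output and not something the answer states.

```java
class NumpyCompat {
    // Reinterpret a signed Java int as the unsigned 32-bit value NumPy prints.
    static long toUnsigned32(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    // Combine two consecutive raw 32-bit MT outputs into a double in [0, 1)
    // using the reference 53-bit recipe (assumed to match NumPy's legacy generator).
    static double numpyDouble(int first, int second) {
        long a = Integer.toUnsignedLong(first) >>> 5;  // keep the top 27 bits
        long b = Integer.toUnsignedLong(second) >>> 6; // keep the top 26 bits
        return (a * 67108864.0 + b) / 9007199254740992.0; // (a * 2^26 + b) / 2^53
    }

    public static void main(String[] args) {
        // The signed values are copied from the Java output above.
        System.out.println(toUnsigned32(-1954711869));                    // 2340255427
        System.out.println(Long.toUnsignedString(-8322921849960486353L)); // 10123822223749065263
        // The two halves of the first 64-bit value printed for seed 0.
        System.out.println(numpyDouble(0x8C7F0AAC, 0x97C4AA2F));
    }
}
```

To use it with Commons Math, you would call prng.nextInt() twice and pass both results to numpyDouble instead of calling prng.nextDouble().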

Initial state

If you use another implementation of the Mersenne Twister, pay attention to how the initial state is set up. I don’t know whether NumPy’s legacy seeding is documented, but it appears to use the same initialization that Wikipedia describes (when a 32-bit integer seed is provided), which is a special case of the method suggested by Makoto Matsumoto et al., 2007 (check around eq. 30). Other languages use other initialization schemes, and even within one language different groups may ship different implementations (the C++ std Mersenne Twister implementations in MS VC++, GCC, and Boost differ). Even NumPy’s new interface uses a different technique involving hash functions.
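To make the seeding step concrete, here is a minimal MT19937 sketch that fills the initial state from a 32-bit seed with the recurrence Wikipedia gives. The class name TinyMt19937 is made up for illustration, and the constants are the standard MT19937 parameters as I know them. It is meant for checking another implementation's first few outputs, not as a replacement for a tested library.

```java
class TinyMt19937 {
    private static final int N = 624; // size of the state array
    private static final int M = 397; // offset used when regenerating the state

    private final int[] mt = new int[N];
    private int index = N; // forces a twist before the first output

    TinyMt19937(int seed) {
        // Initialization from a single 32-bit seed; int arithmetic wraps mod 2^32.
        mt[0] = seed;
        for (int i = 1; i < N; i++) {
            mt[i] = 1812433253 * (mt[i - 1] ^ (mt[i - 1] >>> 30)) + i;
        }
    }

    // Regenerate all N state words in place.
    private void twist() {
        for (int i = 0; i < N; i++) {
            int y = (mt[i] & 0x80000000) | (mt[(i + 1) % N] & 0x7fffffff);
            mt[i] = mt[(i + M) % N] ^ (y >>> 1) ^ ((y & 1) != 0 ? 0x9908b0df : 0);
        }
        index = 0;
    }

    // Next raw 32-bit output; read it as unsigned to compare with NumPy.
    int nextInt32() {
        if (index >= N) {
            twist();
        }
        int y = mt[index++];
        y ^= y >>> 11;
        y ^= (y << 7) & 0x9d2c5680;
        y ^= (y << 15) & 0xefc60000;
        y ^= y >>> 18;
        return y;
    }

    public static void main(String[] args) {
        TinyMt19937 prng = new TinyMt19937(0);
        for (int i = 0; i < 2; i++) {
            int raw = prng.nextInt32();
            System.out.println(Integer.toUnsignedString(raw) + "\t" + Integer.toBinaryString(raw));
        }
    }
}
```

With seed 0, the first two outputs should be the high and low halves of the first 64-bit value printed above. If another implementation gives different values for the same seed, its initialization is the first thing to check.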

User contributions licensed under: CC BY-SA