How to generate the same pseudorandom numbers in Java as in Numpy (for the same seed)?

Question

Is there any way to generate, in Java, the same random numbers as NumPy's random module when using the same seed (e.g. 12345)? With the code below, NumPy outputs 0.9296160928171479, while Java outputs 0.3618031071604718. I am comparing outputs of some methods in SciKit learn and my own library i…

Accepted Answer

The legacy NumPy random generator uses the Mersenne Twister (MT) algorithm to generate random bits and then transforms or interprets them as the requested types. Java's `java.util.Random` uses a much simpler algorithm, a linear congruential generator (LCG). To get the same raw bits in Java, you need a Mersenne Twister implementation for Java, such as the one from Apache Commons Math. MT generates unsigned 32-bit integers, which can then be transformed into other ranges or distributions. If a number with 64-bit precision is needed, the implementation implicitly invokes the generator twice and combines the results.

However, NumPy and Java seem to represent the numbers differently, so getting the same random bits is not enough on its own.

Let's use MT in both Java and Python and compare the results. First, the Java example:

```java
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.MersenneTwister;

class QuickStart {
    public static void main(String[] args) {
        RandomGenerator prng = new MersenneTwister(0);
        // 64-bit draws: each combines two 32-bit MT outputs
        for (int i = 0; i < 3; ++i) {
            long num = prng.nextLong();
            System.out.println(Long.toString(num) + "\t" + Long.toBinaryString(num));
        }
        System.out.println();
        // 32-bit draws: one raw MT output each
        for (int i = 0; i < 3; ++i) {
            int num = prng.nextInt();
            System.out.println(Integer.toString(num) + "\t" + Integer.toBinaryString(num));
        }
        System.out.println();
        // Doubles, printed together with their IEEE 754 bit pattern
        for (int i = 0; i < 3; ++i) {
            double num = prng.nextDouble();
            System.out.println(Double.toString(num) + "\t" + Long.toBinaryString(Double.doubleToRawLongBits(num)));
        }
    }
}
```

This produces the output:

```
-8322921849960486353	1000110001111111000010101010110010010111110001001010101000101111
-5253828890213626688	1011011100010110101001100111010111011000001000011100110011000000
-7327722439656189189	1001101001001110101100110100001111011011101000100101001011111011

-1954711869	10001011011111010111011011000011
-656048793	11011000111001010111110101100111
1819583497	1101100011101001010010000001001

0.6235637015982585	11111111100011111101000011101111011101001010101100101010001000
0.38438170310239794	11111111011000100110011011010110110111000000000101101101110000
0.2975346131886989	11111111010011000010101100111010011110010001001011001111000100
```

The Python example is:

```python
import numpy
import bitstring

numpy.random.seed(0)

# 64-bit draws
for i in range(3):
    state = numpy.random.get_state()
    num = numpy.random.randint(0, 2**64, dtype=numpy.uint64)
    print(num, bin(num), sep="\t")
print()

# 32-bit draws
for i in range(3):
    state = numpy.random.get_state()
    num = numpy.random.randint(0, 2**32, dtype=numpy.uint32)
    print(num, bin(num), sep="\t")
print()

# Doubles, printed together with their 64-bit IEEE 754 pattern
for i in range(3):
    state = numpy.random.get_state()
    num = numpy.random.random()
    f1 = bitstring.BitArray(float=num, length=64)
    print(num, f1.bin, sep="\t")
```

which produces the output:

```
10123822223749065263	0b1000110001111111000010101010110010010111110001001010101000101111
13192915183495924928	0b1011011100010110101001100111010111011000001000011100110011000000
11119021634053362427	0b1001101001001110101100110100001111011011101000100101001011111011

2340255427	0b10001011011111010111011011000011
3638918503	0b11011000111001010111110101100111
1819583497	0b1101100011101001010010000001001

0.6235636967859723	0011111111100011111101000011101111011010100101010110010101000100
0.3843817072926998	0011111111011000100110011011010110111011100000000010110110111000
0.2975346065444723	0011111111010011000010101100111010010111001000100101100111100010
```

Result comparison

The raw integer bits are the same. Only the way the binary values are printed differs slightly: Python's built-in `bin()` prefixes them with `0b`, and bitstring prints all 64 bits, including the leading zeros that Java's `Long.toBinaryString()` omits.
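To check that the integer bits really are identical, Java's signed values can be reinterpreted as unsigned with the standard `Long.toUnsignedString` and `Integer.toUnsignedLong` (Java 8+). This is a minimal sketch: the class `UnsignedView` and its helper names are hypothetical, the inputs are the Java values printed above, and `combine` assumes the first 32-bit draw becomes the high word of a 64-bit value.

```java
// Sketch: view Java's signed MT outputs as the unsigned values NumPy prints.
class UnsignedView {
    // A Java long holds the same 64 bits that NumPy prints as a uint64.
    static String asUint64(long bits) {
        return Long.toUnsignedString(bits);
    }

    // A Java int holds the same 32 bits that NumPy prints as a uint32.
    static long asUint32(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    // Assumption: a 64-bit draw is built from two 32-bit outputs, high word first.
    static long combine(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long[] javaLongs = {-8322921849960486353L, -5253828890213626688L, -7327722439656189189L};
        int[] javaInts = {-1954711869, -656048793, 1819583497};
        for (long v : javaLongs) System.out.println(asUint64(v));
        for (int v : javaInts) System.out.println(asUint32(v));
        // High and low 32 bits of the first long, taken from its binary output above
        System.out.println(combine(0x8C7F0AAC, 0x97C4AA2F) == javaLongs[0]);
    }
}
```

Running it prints exactly the uint64 and uint32 values from the NumPy output above, followed by `true`.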
Because Java's `long` and `int` are signed while NumPy's values here are unsigned, the printed integers look completely different even though their bits match. For doubles, the difference is small: the Java and NumPy values agree to roughly seven significant digits, since the two libraries map the same random bits to a double differently. Knowing this pattern, one can write a converter in Java or in Python that interprets the raw bits the way the other language does.

Initial state

If you use another implementation of Mersenne Twister, pay attention to how the initial state is derived from the seed. I don't know whether NumPy's legacy initialization is documented, but when a 32-bit integer seed is given it seems to use the same method that Wikipedia describes, which is a special case of the method suggested by Makoto Matsumoto et al., 2007 (see around eq. 30). Other languages use other initialization schemes, and even within one language different groups may come up with different implementations (the C++ std Mersenne Twister implementations of MS VC++, GCC, and Boost differ). NumPy's new interface even uses a different technique involving hash functions.
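Putting the seeding and the bit interpretation together, here is a minimal, self-contained sketch of the Java side that does not depend on Apache Commons: a direct port of the reference `mt19937ar.c` routines `init_genrand`, `genrand_int32` and `genrand_res53`. It assumes that NumPy's legacy `numpy.random.seed(int)` uses `init_genrand` (the matching integer outputs above are consistent with that) and that `numpy.random.random()` builds each double from two 32-bit outputs with the `genrand_res53` formula (27 + 26 bits). The class name `NumpyLikeMT` is hypothetical, and only integer seeds in [0, 2^32) are handled.

```java
// Sketch: MT19937 ported from the reference mt19937ar.c, aiming to mimic
// NumPy's legacy numpy.random for an integer seed (an assumption, see lead-in).
class NumpyLikeMT {
    private static final int N = 624, M = 397;
    private static final int MATRIX_A = 0x9908b0df;
    private static final int UPPER_MASK = 0x80000000, LOWER_MASK = 0x7fffffff;

    private final int[] mt = new int[N];
    private int mti;

    // init_genrand: seeding from a single 32-bit integer.
    NumpyLikeMT(long seed) {
        if (seed < 0 || seed > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("seed must be in [0, 2^32)");
        }
        mt[0] = (int) seed;
        for (int i = 1; i < N; i++) {
            // int arithmetic wraps modulo 2^32, like the reference's "& 0xffffffff"
            mt[i] = 1812433253 * (mt[i - 1] ^ (mt[i - 1] >>> 30)) + i;
        }
        mti = N; // forces a state refresh on the first draw
    }

    // genrand_int32: one raw 32-bit output (NumPy's uint32 bits, stored in a signed int).
    int nextInt32() {
        if (mti >= N) {
            // Regenerate all N words; the modulo indexing reproduces the
            // reference's three-part loop, including its use of updated words.
            for (int k = 0; k < N; k++) {
                int y = (mt[k] & UPPER_MASK) | (mt[(k + 1) % N] & LOWER_MASK);
                mt[k] = mt[(k + M) % N] ^ (y >>> 1) ^ ((y & 1) != 0 ? MATRIX_A : 0);
            }
            mti = 0;
        }
        int y = mt[mti++];
        // Tempering
        y ^= y >>> 11;
        y ^= (y << 7) & 0x9d2c5680;
        y ^= (y << 15) & 0xefc60000;
        y ^= y >>> 18;
        return y;
    }

    // genrand_res53: a double in [0, 1) from the top 27 bits of one output
    // and the top 26 bits of the next.
    double nextDouble() {
        int a = nextInt32() >>> 5;
        int b = nextInt32() >>> 6;
        return (a * 67108864.0 + b) / 9007199254740992.0;
    }

    public static void main(String[] args) {
        NumpyLikeMT rng = new NumpyLikeMT(0);
        // Intended to mirror: numpy.random.seed(0); numpy.random.random()
        System.out.println(rng.nextDouble());
    }
}
```

With seed 0, nine `nextInt32()` calls (the three 64-bit and three 32-bit draws above) followed by `nextDouble()` should give the first `numpy.random.random()` value from the Python output, 0.6235636967859723, rather than the slightly different value Apache Commons' `nextDouble()` printed. For the question's seed, `new NumpyLikeMT(12345).nextDouble()` is the value to compare with NumPy's 0.9296160928171479, assuming the question's NumPy code draws one uniform double right after seeding. Other NumPy methods (bounded `randint`, `normal`, and so on) apply further transformations that this sketch does not reproduce.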
