Is there any way to generate the same random numbers in Java as NumPy's random module does when using the same seed (e.g. 12345)?
In NumPy, the code below outputs 0.9296160928171479:
```python
from numpy.random import RandomState

rs = RandomState(12345)
rs.random()
```
In Java, the code below outputs 0.3618031071604718:
```java
import java.util.Random;

Random random = new Random(12345);
System.out.println(random.nextDouble());
```
I am comparing the outputs of some methods in scikit-learn with my own library in Java. To get the same outputs, I need to generate the same random numbers as NumPy does (scikit-learn uses NumPy's random module).
Answer
The legacy NumPy random generator uses the Mersenne Twister (MT) algorithm to generate random bits and then transforms or interprets them as the desired types, while java.util.Random uses a much simpler linear congruential generator (LCG). To get the same raw bits in Java, you need a Mersenne Twister implementation, such as the MersenneTwister class from Apache Commons Math. MT generates unsigned 32-bit integers, which can then be transformed into other ranges or distributions. If a number with 64-bit precision is needed, the implementation invokes the generator twice and combines the two results.
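To make this concrete, here is a minimal sketch of MT19937 in plain Java with no Apache dependency. The class and method names are hypothetical. It assumes that legacy NumPy seeds from a 32-bit integer with the init_genrand recurrence, that 64-bit values are built high word first (consistent with the outputs in this answer), and that NumPy's random() builds a double from 27 bits of one output and 26 bits of the next. Under those assumptions, seed 12345 should reproduce the question's 0.9296160928171479.

```java
// Minimal MT19937 sketch meant to mirror legacy NumPy's RandomState.
// Hypothetical class and method names; not part of NumPy, Apache Commons Math, or the JDK.
final class NumpyLikeMt19937 {
    private static final int N = 624;
    private static final int M = 397;
    private static final int MATRIX_A = 0x9908b0df;
    private static final int UPPER_MASK = 0x80000000;
    private static final int LOWER_MASK = 0x7fffffff;

    private final int[] mt = new int[N]; // uint32 words stored in Java ints
    private int index;

    NumpyLikeMt19937(long seed) {
        // Legacy NumPy rejects integer seeds outside [0, 2**32 - 1].
        if (seed < 0 || seed > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("seed must be in [0, 2^32 - 1]");
        }
        // init_genrand (assumed NumPy seeding): int arithmetic wraps modulo 2^32.
        mt[0] = (int) seed;
        for (int i = 1; i < N; i++) {
            mt[i] = 1812433253 * (mt[i - 1] ^ (mt[i - 1] >>> 30)) + i;
        }
        index = N; // the first output call regenerates the whole state
    }

    // Regenerates all 624 words in place.
    private void twist() {
        for (int i = 0; i < N; i++) {
            int y = (mt[i] & UPPER_MASK) | (mt[(i + 1) % N] & LOWER_MASK);
            int next = mt[(i + M) % N] ^ (y >>> 1);
            if ((y & 1) != 0) {
                next ^= MATRIX_A;
            }
            mt[i] = next;
        }
        index = 0;
    }

    /** One raw output as an unsigned 32-bit value held in a long. */
    long nextUint32() {
        if (index >= N) {
            twist();
        }
        int y = mt[index++];
        // Tempering
        y ^= y >>> 11;
        y ^= (y << 7) & 0x9d2c5680;
        y ^= (y << 15) & 0xefc60000;
        y ^= y >>> 18;
        return y & 0xFFFFFFFFL;
    }

    /** Two outputs combined into 64 bits, high word first. Java's long is signed. */
    long nextUint64Bits() {
        long high = nextUint32();
        long low = nextUint32();
        return (high << 32) | low;
    }

    /** Double in [0, 1) built like NumPy's random(): 27 bits of one output, 26 of the next. */
    double nextDouble() {
        long a = nextUint32() >>> 5;
        long b = nextUint32() >>> 6;
        return (a * 67108864.0 + b) / 9007199254740992.0;
    }

    public static void main(String[] args) {
        // Intended to reproduce the question's NumPy value 0.9296160928171479.
        System.out.println(new NumpyLikeMt19937(12345).nextDouble());
        // Unsigned view of the first 64 bits for seed 0, as NumPy prints a uint64.
        System.out.println(Long.toUnsignedString(new NumpyLikeMt19937(0).nextUint64Bits()));
    }
}
```

Java has no unsigned 64-bit type, so Long.toUnsignedString() is used to print the same value NumPy shows for a uint64.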
However, getting the same random bits is not enough on its own, because NumPy and Java interpret those bits differently when turning them into integers or doubles.
Let's use MT in both Java and Python and compare the results. First, take a look at the Java example code:
```java
import org.apache.commons.math3.random.MersenneTwister;
import org.apache.commons.math3.random.RandomGenerator;

class QuickStart {
    public static void main(String[] args) {
        // Seed 0, as in the Python example
        RandomGenerator prng = new MersenneTwister(0);

        // Each nextLong() combines two 32-bit MT outputs
        for (int i = 0; i < 3; ++i) {
            long num = prng.nextLong();
            System.out.println(Long.toString(num) + "\t" + Long.toBinaryString(num));
        }
        System.out.println();

        // Each nextInt() uses one 32-bit MT output
        for (int i = 0; i < 3; ++i) {
            int num = prng.nextInt();
            System.out.println(Integer.toString(num) + "\t" + Integer.toBinaryString(num));
        }
        System.out.println();

        // Print each double together with its IEEE 754 bit pattern
        for (int i = 0; i < 3; ++i) {
            double num = prng.nextDouble();
            System.out.println(Double.toString(num) + "\t" + Long.toBinaryString(Double.doubleToRawLongBits(num)));
        }
    }
}
```
This produces the output:

```
-8322921849960486353 1000110001111111000010101010110010010111110001001010101000101111
-5253828890213626688 1011011100010110101001100111010111011000001000011100110011000000
-7327722439656189189 1001101001001110101100110100001111011011101000100101001011111011

-1954711869 10001011011111010111011011000011
-656048793 11011000111001010111110101100111
1819583497 1101100011101001010010000001001

0.6235637015982585 11111111100011111101000011101111011101001010101100101010001000
0.38438170310239794 11111111011000100110011011010110110111000000000101101101110000
0.2975346131886989 11111111010011000010101100111010011110010001001011001111000100
```
The equivalent Python example is:
```python
import numpy
import bitstring  # third-party package, used here to print the bits of a float

numpy.random.seed(0)

# Each full-range uint64 combines two 32-bit MT outputs
for i in range(3):
    num = numpy.random.randint(0, 2**64, dtype=numpy.uint64)
    print(num, bin(num), sep="\t")
print()

# Each full-range uint32 uses one 32-bit MT output
for i in range(3):
    num = numpy.random.randint(0, 2**32, dtype=numpy.uint32)
    print(num, bin(num), sep="\t")
print()

# Print each double together with its IEEE 754 bit pattern
for i in range(3):
    num = numpy.random.random()
    f1 = bitstring.BitArray(float=num, length=64)
    print(num, f1.bin, sep="\t")
```
which produces the output:
```
10123822223749065263 0b1000110001111111000010101010110010010111110001001010101000101111
13192915183495924928 0b1011011100010110101001100111010111011000001000011100110011000000
11119021634053362427 0b1001101001001110101100110100001111011011101000100101001011111011

2340255427 0b10001011011111010111011011000011
3638918503 0b11011000111001010111110101100111
1819583497 0b1101100011101001010010000001001

0.6235636967859723 0011111111100011111101000011101111011010100101010110010101000100
0.3843817072926998 0011111111011000100110011011010110111011100000000010110110111000
0.2975346065444723 0011111111010011000010101100111010010111001000100101100111100010
```
Result comparison
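You can see that the raw bits are the same. The binary strings are printed slightly differently: Python's built-in bin() adds a 0b prefix, bin() and Java's toBinaryString() both drop leading zeros, and bitstring prints all 64 bits of the double, including the leading zeros. Because Java's long and int are signed while NumPy's uint64 and uint32 are unsigned, the same bit patterns print as completely different integers (for example, -1954711869 in Java and 2340255427 in NumPy are the same 32 bits). For doubles, both languages use the IEEE 754 format, so the small difference comes from how the two 32-bit outputs are combined: NumPy's random() keeps 27 bits of the first output and 26 bits of the second (53 bits in total), while Apache's nextDouble() keeps 26 bits of each (52 bits). Knowing this pattern, one can write a converter in Java or Python that interprets the raw bits the same way the other language does.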
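A converter of that kind could look like the following minimal sketch. The class and method names are hypothetical. numpyDouble() follows NumPy's legacy 53-bit construction; apacheDouble() is an assumption about how Commons Math's nextDouble() builds its 52-bit value, chosen because it is consistent with the outputs above. Both take two raw MT outputs held as unsigned 32-bit values in a long.

```java
// Hypothetical helpers: reinterpret the same raw 32-bit MT outputs the way each library does.
final class BitInterpretation {

    /** NumPy's uint32 view of a word that Java prints as a signed int. */
    static long asUnsigned32(int word) {
        return word & 0xFFFFFFFFL;
    }

    /** NumPy's uint64 view of 64 bits that Java prints as a signed long. */
    static String asUnsigned64(long bits) {
        return Long.toUnsignedString(bits);
    }

    /** Legacy NumPy random(): top 27 bits of a and top 26 bits of b, scaled by 2^-53. */
    static double numpyDouble(long a, long b) {
        return ((a >>> 5) * 67108864.0 + (b >>> 6)) / 9007199254740992.0;
    }

    /** Assumed Apache nextDouble(): top 26 bits of each output, scaled by 2^-52. */
    static double apacheDouble(long a, long b) {
        long bits = ((a >>> 6) << 26) | (b >>> 6);
        return bits * 0x1.0p-52;
    }

    public static void main(String[] args) {
        System.out.println(asUnsigned32(-1954711869));           // prints 2340255427, as NumPy shows
        System.out.println(asUnsigned64(-8322921849960486353L)); // prints 10123822223749065263
        // First two MT19937 outputs for seed 0 (the bits of the first uint64 above)
        long a = 2357136044L;
        long b = 2546248239L;
        System.out.println(numpyDouble(a, b));
        System.out.println(apacheDouble(a, b));
    }
}
```

Feeding the same pair of outputs to both functions shows that the two doubles differ only in the last few dozen bits of precision, which matches the small discrepancy seen in the comparison.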
Initial state
If you use another implementation of Mersenne Twister, pay attention to how its initial state is set up. I don't know whether NumPy's legacy random initialization is documented, but for a 32-bit integer seed it seems to use the same method Wikipedia describes, which is a special case of the method suggested by Makoto Matsumoto et al., 2007 (check around eq. 30). Other languages use other initialization schemes, and even within a given language, different groups may come up with different implementations (the C++ std Mersenne Twister implementations of MS VC++, GCC, and Boost differ). Even NumPy's new interface uses a different technique involving hash functions.
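For reference, the 32-bit-seed initialization described on Wikipedia is the recurrence x_i = 1812433253 · (x_{i-1} XOR (x_{i-1} >> 30)) + i (mod 2^32), starting from x_0 = seed. A minimal standalone sketch in Java follows; the class name is hypothetical, and the assumption that legacy NumPy uses exactly this recurrence for integer seeds is the one stated above.

```java
// Hypothetical helper: the 32-bit-seed initialization (init_genrand in the MT19937 reference code).
final class Mt19937Seeding {
    static final int N = 624;

    /** Returns the 624-word state for a seed in [0, 2^32 - 1]; words are uint32 in C, stored here as int. */
    static int[] initGenrand(long seed) {
        if (seed < 0 || seed > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("seed must be in [0, 2^32 - 1]");
        }
        int[] mt = new int[N];
        mt[0] = (int) seed;
        for (int i = 1; i < N; i++) {
            // x_i = 1812433253 * (x_{i-1} ^ (x_{i-1} >>> 30)) + i; int overflow wraps modulo 2^32
            mt[i] = 1812433253 * (mt[i - 1] ^ (mt[i - 1] >>> 30)) + i;
        }
        return mt;
    }

    public static void main(String[] args) {
        int[] mt = initGenrand(12345);
        for (int i = 0; i < 3; i++) {
            System.out.println(Integer.toUnsignedString(mt[i]));
        }
    }
}
```

If the assumption holds, these words should match the first entries of numpy.random.RandomState(12345).get_state()[1] in Python, which gives a quick check of the seeding before comparing any generated numbers.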