Skip to content
Advertisement

Does the number generated by this function obey uniform distribution ? [closed]

I have implemented a function Generating numbers in [0,x] by given string :

import org.apache.commons.codec.digest.DigestUtils;

public class BaseUtil {

    public static int getIntValue(String stringText, String x) {
        var modNum = new BigInteger(x);
        String sha256hex = DigestUtils.sha256Hex(stringText);
        var result = new BigInteger(sha256hex, 16);
        int intValue = result.mod(modNum).intValue();
        return intValue;
    }
}

Does the return intValue obey uniform distribution in [0,x] for much random string?

Advertisement

Answer

Does the method Generating int number in [0,x] by given string follow uniform distribution?

Almost. It is difficult to entirely eliminate non-uniformity … unless 2256 happens to be evenly divisible by x.

Note that you are actually generating a number in [0,x) … since the result cannot be x.


You also asked about more efficient implementations than the one in your question.

public class BaseUtil {

    public static int getIntValueV1(String stringText, int x) {
        if (x <= 0) {
            throw new InvalidArgumentException("x must be strictly positive");
        }
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest(
            stringText.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(hash).mod(new BigInteger(x)).intValue()
    }

    public static int getIntValueV2(String stringText, int x) {
        if (x <= 0) {
            throw new InvalidArgumentException("x must be strictly positive");
        }
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest(
            stringText.getBytes(StandardCharsets.UTF_8));
        ByteBuffer buff = new ByteBuffer(hash);
        long number = 0;
        for (int i = 0; i < 8; i++) {
            number = number ^ buff.getLong();
        }
        return (int)(number % x);
    }
}

Preconditions:

  • Since the result of your method is an int, that implies that x must also be an int.

  • Since you want numbers in the range [0,x), that implies that x must be greater than zero.

Implementations:

I am using the standard MessageDigest class since it has been in Java SE since Java 5 (if not earlier).

The first version uses BigInteger to minimize the non-uniformity when we reduce the bytes into a number in the range [0, x)

The second version uses long arithmetic to compute the remainder. I think that means that the distribution might be a bit more non-uniform than the first version. (My math skills are too rusty …) It would also be possible to eliminate the use of ByteBuffer to convert the bytes to a sequence of longs.

I have not benchmarked these versions to see which is faster. But both should be faster than producing an intermediate hexadecimal string and parsing it.

Note that you could probably get away with using a less expensive hashing algorithm, depending on the actual use-case for this generator.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement