
Why does the java.lang.CharacterDataLatin1#digit function in the JDK generate the whole digit byte array?

In the source code, it says that:

Analysis has shown that generating the whole array allows the JIT to generate better code compared to a slimmed down array, such as one cutting off after ‘z’

So my questions are:

  1. What is the underlying reason for this comment?
  2. Compared to the slimmed-down array, how much performance can be gained?

Answer

This trick optimizes performance of the standard Character.digit method.
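For reference, `Character.digit` maps a code point to its numeric value in a given radix, returning -1 when the character is not a valid digit:

```java
public class DigitDemo {
    public static void main(String[] args) {
        System.out.println(Character.digit('f', 16)); // 15
        System.out.println(Character.digit('7', 10)); // 7
        System.out.println(Character.digit('z', 10)); // -1: 'z' is not a decimal digit
    }
}
```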

Most often, this method is called for Latin-1 characters, i.e. characters with the Unicode numerical values from 0 to 255. The implementation has the fast path that checks whether the character fits into a single byte and then delegates to CharacterDataLatin1.digit:

static final CharacterData of(int ch) {
    if (ch >>> 8 == 0) {     // fast-path
        return CharacterDataLatin1.instance;
    } else {
        return switch (ch >>> 16) {  //plane 00-16
            case 0 -> CharacterData00.instance;
            case 1 -> CharacterData01.instance;
            case 2 -> CharacterData02.instance;
            case 3 -> CharacterData03.instance;
            case 14 -> CharacterData0E.instance;
            case 15, 16 -> CharacterDataPrivateUse.instance; // Both cases Private Use
            default -> CharacterDataUndefined.instance;
        };
    }
}

In turn, CharacterDataLatin1.digit reads from the mentioned array:

int digit(int ch, int radix) {
    int value = DIGITS[ch];
    return (value >= 0 && value < radix && radix >= Character.MIN_RADIX
            && radix <= Character.MAX_RADIX) ? value : -1;
}

From the fast-path check, the JIT compiler knows that the value of ch is in the 0-255 range. So, when the array size is exactly 256, every possible index is provably in bounds and no array bounds check is needed. Therefore, extending the array to 256 elements lets the JIT compiler eliminate a redundant conditional branch.
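The idea can be illustrated with a minimal, self-contained sketch (this is not the JDK source; the table contents and names are simplified for illustration). Because the table has exactly 256 entries and the guard proves `ch` fits in one byte, the JIT can prove `TABLE[ch]` never goes out of bounds:

```java
// Sketch of a 256-entry digit lookup table indexed by a Latin-1 code point.
// Sizing the table to exactly 256 means that once the guard below has
// established ch >>> 8 == 0, the index is provably in 0..255 and the JIT
// can drop the array bounds check on TABLE[ch].
public class DigitTable {
    static final byte[] TABLE = new byte[256];
    static {
        java.util.Arrays.fill(TABLE, (byte) -1);          // non-digits map to -1
        for (int i = '0'; i <= '9'; i++) TABLE[i] = (byte) (i - '0');
        for (int i = 'a'; i <= 'z'; i++) TABLE[i] = (byte) (i - 'a' + 10);
        for (int i = 'A'; i <= 'Z'; i++) TABLE[i] = (byte) (i - 'A' + 10);
    }

    static int digit(int ch, int radix) {
        if (ch >>> 8 != 0) return -1;  // not Latin-1; real code would take a slow path
        int value = TABLE[ch];         // ch is provably 0..255: no bounds check needed
        return (value >= 0 && value < radix) ? value : -1;
    }
}
```

With a table cut off after 'z' (123 entries), the same load would need either an explicit range check or a JIT-inserted bounds check, i.e. the very branch this optimization removes.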

This optimization was done in the context of JDK-8196331. According to the issue, the performance improvement on a certain microbenchmark was ~1.5x. See this discussion for details.
