The source code says:
Analysis has shown that generating the whole array allows the JIT to generate better code compared to a slimmed down array, such as one cutting off after ‘z’
So my questions are:
- What is the root reason for this comment?
- Compared to the slimmed-down array, how much performance improvement does this give?
Answer
This trick optimizes the performance of the standard Character.digit method.
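For reference, a quick illustration of what Character.digit returns for a few inputs (standard JDK behavior; the class name DigitDemo is just for this example):

```java
public class DigitDemo {
    public static void main(String[] args) {
        System.out.println(Character.digit('7', 10)); // 7
        System.out.println(Character.digit('f', 16)); // 15
        System.out.println(Character.digit('z', 36)); // 35 -- the largest digit in the largest radix
        System.out.println(Character.digit('!', 10)); // -1 -- not a digit in any radix
    }
}
```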
Most often, this method is called for Latin-1 characters, i.e. characters with Unicode code points from 0 to 255. The implementation has a fast path that checks whether the character fits into a single byte and then delegates to CharacterDataLatin1.digit:
static final CharacterData of(int ch) {
    if (ch >>> 8 == 0) {     // fast-path
        return CharacterDataLatin1.instance;
    } else {
        return switch (ch >>> 16) { //plane 00-16
            case 0 -> CharacterData00.instance;
            case 1 -> CharacterData01.instance;
            case 2 -> CharacterData02.instance;
            case 3 -> CharacterData03.instance;
            case 14 -> CharacterData0E.instance;
            case 15, 16 -> CharacterDataPrivateUse.instance; // Both cases Private Use
            default -> CharacterDataUndefined.instance;
        };
    }
}
In turn, CharacterDataLatin1.digit reads from the mentioned array:
int digit(int ch, int radix) {
    int value = DIGITS[ch];
    return (value >= 0 && value < radix
            && radix >= Character.MIN_RADIX
            && radix <= Character.MAX_RADIX)
        ? value : -1;
}
From the previous step, the JVM knows that the value of ch is in the 0-255 range. So, when the array size is exactly 256, the JVM can prove that DIGITS[ch] can never go out of bounds and skips the array bounds check entirely. Extending the array to 256 elements therefore lets the JIT compiler eliminate a redundant conditional branch on the hot path.
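To make the idea concrete, here is a self-contained sketch of the same pattern: a full 256-entry lookup table whose index is already known to fit in a byte. The class and field names here are hypothetical, not the actual JDK source; the table contents mirror what CharacterDataLatin1.DIGITS encodes for '0'-'9', 'a'-'z', and 'A'-'Z'.

```java
import java.util.Arrays;

// Hypothetical sketch of the "full 256-entry table" trick.
// Because the table covers every possible value of a Latin-1 character,
// an index already proven to be 0-255 needs no bounds check after JIT.
public class DigitTable {
    private static final byte[] DIGITS = new byte[256];
    static {
        Arrays.fill(DIGITS, (byte) -1);                 // non-digits map to -1
        for (char c = '0'; c <= '9'; c++) DIGITS[c] = (byte) (c - '0');
        for (char c = 'a'; c <= 'z'; c++) DIGITS[c] = (byte) (c - 'a' + 10);
        for (char c = 'A'; c <= 'Z'; c++) DIGITS[c] = (byte) (c - 'A' + 10);
    }

    static int digit(int ch, int radix) {
        // The caller's fast path guarantees ch is 0-255 here, so this
        // load compiles without a range check against DIGITS.length.
        int value = DIGITS[ch];
        return (value >= 0 && value < radix
                && radix >= Character.MIN_RADIX
                && radix <= Character.MAX_RADIX)
            ? value : -1;
    }
}
```

A slimmed-down table cut off after 'z' would force the JIT to keep a `ch < table.length` check (or the Java code to test it explicitly), which is exactly the branch the full-size table avoids.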
This optimization was done in the context of JDK-8196331. As the issue claims, the performance improvement on a certain microbenchmark was ~1.5x. See this discussion for details.