
Weird Lazy Casting Behaviour for character in Java 11

Can someone kindly explain the casting behaviour I illustrated below?

Code

char charo = (char) -1;
System.out.println(charo);
System.out.println((short) charo);
System.out.println((int) charo);

Expected Output

?
-1
-1

Actual Output

?
-1
65535

As observed, when -1 is cast to char and then to short, it remembers what it was (which is -1), whereas when cast to an int it is 65535. I’d expected charo to be 65535 due to underflow upon casting to char, since char only holds non-negative values.

Is there some sort of lazy casting behaviour I’m missing out? What is happening under the hood?

EDIT 1: Added expected output to illustrate my misconception


Answer

The int value -1 is equivalent to 0xFFFF_FFFF in the Two’s Complement representation. When casting it to a char, you’re cutting off the upper 16 bits, ending up at 0xFFFF, or rather '\uFFFF'.
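To make the truncation visible, here is a small sketch that prints the bit patterns before and after the cast. The class name CharNarrowing and the helper narrowToChar are illustrative names, not part of the original answer; the helper simply applies (char) and lets the result widen back to int.

```java
// Illustrative sketch: a narrowing cast from int to char keeps only the low 16 bits.
final class CharNarrowing {
    // Applies the narrowing cast, then widens the char back to int.
    // Widening a char to int zero-extends, so the result exposes exactly the kept 16 bits.
    static int narrowToChar(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        int minusOne = -1;
        System.out.println(Integer.toHexString(minusOne));               // ffffffff
        System.out.println(Integer.toHexString(narrowToChar(minusOne))); // ffff
        System.out.println((char) minusOne == '\uFFFF');                 // true
    }
}
```

Any bits above the low 16 are simply discarded, so for example 0x1_2345 narrows to 0x2345 as well.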

It’s important to keep in mind that System.out.println(charo); resolves to a different method than the other print statements, namely println(char), because a char not only has a different value range than short or int, but also different semantics: it is printed as a character rather than as a number.
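The overload split can be mimicked with a pair of hypothetical helper methods named describe, one taking char and one taking int. A short argument cannot be passed to the char version (short to char is not a widening conversion), so it lands on the int version, just like println. String.valueOf has the same char/int pair of overloads and is used here as a cross-check.

```java
// Illustrative sketch: the compile-time type of the argument picks the overload.
final class OverloadPick {
    static String describe(char c) {
        return "char:" + (int) c; // show the numeric value of the char
    }

    static String describe(int i) {
        return "int:" + i;
    }

    public static void main(String[] args) {
        char charo = (char) -1;
        System.out.println(describe(charo));         // char:65535
        System.out.println(describe((short) charo)); // int:-1 (short widens to int)
        System.out.println(describe((int) charo));   // int:65535
        // String.valueOf(char) yields a one-character string, not digits.
        System.out.println(String.valueOf(charo).length()); // 1
    }
}
```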

When you cast 0xFFFF to short, the bit pattern doesn’t change, but 0xFFFF is exactly -1 in the 16-bit Two’s Complement representation. On the other hand, when you cast it to int, the value gets zero-extended to 0x0000_FFFF, which equals 65535.
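The two conversions described above can be checked side by side. In this sketch (class and method names are illustrative), the same 16 bits are read once as a signed short and widened twice: once from char, which zero-extends, and once from short, which sign-extends.

```java
// Illustrative sketch: one 16-bit pattern, two ways of widening it to int.
final class SixteenBitViews {
    // Reinterprets the 16 bits of a char as a signed short; no bits change.
    static short asShort(char c) {
        return (short) c;
    }

    // Widening char to int fills the upper 16 bits with zeros.
    static int zeroExtended(char c) {
        return c;
    }

    // Widening short to int copies the sign bit into the upper 16 bits.
    static int signExtended(short s) {
        return s;
    }

    public static void main(String[] args) {
        char c = '\uFFFF';
        short s = asShort(c);
        System.out.println(s);                                    // -1
        System.out.println(Integer.toHexString(zeroExtended(c))); // ffff
        System.out.println(Integer.toHexString(signExtended(s))); // ffffffff
    }
}
```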

That’s the way to explain it in terms of the short, char, and int datatypes, but since you also asked “What is happening under the hood?”, it’s worth pointing out that this is not how Java actually works.

In Java, all arithmetic involving byte, short, char, or int is done using int. Even local variables of any of these types are actually int variables on the bytecode level. In fact, the same applies to boolean variables but the Java language does not allow us to exploit this for arithmetic.
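The bytecode-level detail is hard to show from Java source, but its source-level counterpart, binary numeric promotion, is easy to observe: the result of char or short arithmetic has type int. This sketch (with illustrative names) boxes the results to expose their runtime type and uses 300 * 300 to show that the product is not confined to the short range.

```java
// Illustrative sketch: byte/short/char operands are promoted to int before arithmetic.
final class IntPromotion {
    // char + int literal has type int; returning it as Object boxes it to Integer.
    static Object charPlusOne(char c) {
        return c + 1;
    }

    // Even short * short has type int, so the product may exceed the short range.
    static Object shortTimesShort(short a, short b) {
        return a * b;
    }

    public static void main(String[] args) {
        System.out.println(charPlusOne('A').getClass().getSimpleName()); // Integer
        System.out.println(charPlusOne('A'));                            // 66
        System.out.println(shortTimesShort((short) 300, (short) 300));   // 90000
    }
}
```

This is also why an assignment such as s = s + 1 for a short s needs an explicit cast back to short.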

So the code

char charo = (char)-1;
System.out.println(charo);
System.out.println((short)charo);
System.out.println((int)charo);

actually compiles to the same as

int charo = (char)-1;
System.out.println(charo); // but invoking println(char)
System.out.println((short)charo);
System.out.println(charo);

or

int charo = 0x0000_FFFF;
System.out.println(charo); // but invoking println(char)
System.out.println((short)charo);
System.out.println(charo);
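One way to sanity-check the claimed equivalence at the source level is to compute, in both versions, the text each println would produce, using String.valueOf, which has the same char/int overload split. This is only a source-level analogue under that assumption, not the actual bytecode: in the int version the (char) cast is written out purely to select the char overload. The class and method names are illustrative.

```java
import java.util.List;

// Illustrative sketch: the char-local and int-local versions produce the same output text.
final class IntLocalEquivalence {
    // Mirrors the original snippet.
    static List<String> withCharLocal() {
        char charo = (char) -1;
        return List.of(
                String.valueOf(charo),         // valueOf(char), like println(char)
                String.valueOf((short) charo), // short widens to int: valueOf(int)
                String.valueOf((int) charo));  // valueOf(int)
    }

    // Same computation with an int local holding the already-folded constant.
    static List<String> withIntLocal() {
        int charo = 0x0000_FFFF;
        return List.of(
                String.valueOf((char) charo),  // cast only selects the char overload
                String.valueOf((short) charo), // narrowed to 16 bits, then sign-extended: -1
                String.valueOf(charo));        // 65535
    }

    public static void main(String[] args) {
        System.out.println(withCharLocal().equals(withIntLocal())); // true
        System.out.println(withIntLocal().subList(1, 3));          // [-1, 65535]
    }
}
```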

As said at the beginning, the first println ends up at a different method, which is responsible for the different semantics. The compile-time type of the variable only matters insofar as it made the compiler select that different method.

When all 32 bits of a value are always maintained, a cast to char has the effect of setting the upper 16 bits to zero. So the result of (char)-1 is 0x0000_FFFF, and this operation is already done at compile time. So the first statement assigns the constant 0xFFFF to a variable.
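A simple way to confirm that (char) -1 is a compile-time constant is that Java accepts it as a case label, and case labels must be constant expressions. The sketch below (class, field, and method names are illustrative) uses it both as a static final initializer and as a case label in an int switch.

```java
// Illustrative sketch: (char) -1 is a constant expression folded to 0xFFFF.
final class ConstantCast {
    // Constant initializer; the compiler evaluates the cast.
    static final int FOLDED = (char) -1;

    // This only compiles because (char) -1 is a constant expression.
    static String classify(int value) {
        switch (value) {
            case (char) -1:
                return "0xFFFF";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(FOLDED == 0xFFFF); // true
        System.out.println(classify(65535));  // 0xFFFF
        System.out.println(classify(-1));     // other
    }
}
```

Note that classify(-1) does not match: after the cast the constant is 65535, not -1.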

The next statement invokes the println(char) method. No conversion is involved at the caller’s side.

The other two invocations end up at println(int), and here the cast to short actually modifies the value. It has the effect of sign-extending a short value to an int value, which means that bit 15 (counting from zero, i.e. the sign bit of the 16-bit value) is copied into the upper 16 bits. For 0x...._FFFF, bit 15 is a one, so all upper bits are set to one, ending up at 0xFFFF_FFFF, which is the int value -1 in Two’s Complement.
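The sign extension performed by (short) can be reproduced with plain int operations: shift the low 16 bits to the top, then use the arithmetic right shift, which copies bit 31 (formerly bit 15) back down. This is an illustrative equivalent, not how the JVM is required to implement the cast; the names are hypothetical.

```java
// Illustrative sketch: (short) on an int equals a left shift by 16 followed by an arithmetic right shift by 16.
final class ShortSignExtension {
    // Moves bit 15 into the sign position, then >> replicates it into bits 16..31.
    static int signExtend16(int value) {
        return (value << 16) >> 16;
    }

    public static void main(String[] args) {
        int[] samples = {0x0000_FFFF, 0x0000_7FFF, 0x0000_8000, 0x1234_5678};
        for (int v : samples) {
            int viaCast = (short) v;
            int viaShifts = signExtend16(v);
            System.out.println(Integer.toHexString(v) + " -> " + viaCast
                    + " (shifts agree: " + (viaCast == viaShifts) + ")");
        }
    }
}
```

For 0xFFFF bit 15 is set, giving -1; for 0x7FFF it is clear, giving 32767.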

The final result is in line with the first explanation given above, reasoning about the value ranges of char, short, and int. For a lot of scenarios, explanations on that level are sufficient. But you might notice that there is no println(short) method, so to understand why println(int) is sufficient for printing short (or byte) values, it’s necessary to know what’s really going on.
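The absence of println(short) can be checked via reflection, and capturing the output shows that a short argument is still printed numerically, because overload resolution widens it to int. The helper names below are illustrative; the reflection and stream calls are standard JDK APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Illustrative sketch: PrintStream has no println(short), so a short goes to println(int).
final class NoPrintlnShort {
    // True if PrintStream declares a public println overload for the given parameter type.
    static boolean hasPrintln(Class<?> parameterType) {
        try {
            PrintStream.class.getMethod("println", parameterType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Captures what println writes for a short argument.
    static String printShort(short value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        out.println(value); // resolves to println(int): short widens to int
        out.flush();
        return buffer.toString().trim(); // drop the line separator
    }

    public static void main(String[] args) {
        System.out.println(hasPrintln(short.class)); // false
        System.out.println(hasPrintln(int.class));   // true
        System.out.println(printShort((short) -1));  // -1
    }
}
```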

User contributions licensed under: CC BY-SA