Can someone kindly explain the casting behaviour I illustrated below?
Code

```java
char charo = (char) -1;
System.out.println(charo);
System.out.println((short) charo);
System.out.println((int) charo);
```
Expected Output

```
?
-1
-1
```

Actual Output

```
?
-1
65535
```
As observed, when -1 is cast to char and back to short, it remembers what it was (namely -1), whereas when cast to an int it becomes 65535. I'd expected charo to be 65535 due to underflow upon casting to char, since char only holds positive values.
Is there some sort of lazy casting behaviour I’m missing out? What is happening under the hood?
EDIT 1: Added expected output to illustrate my misconception
Answer
The int value -1 is equivalent to 0xFFFF_FFFF in the Two's Complement representation. When casting it to a char, you're cutting off the upper 16 bits, ending up at 0xFFFF, or rather '\uFFFF'.
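The truncation can be observed directly by printing the bit patterns before and after the cast. A small sketch (class and variable names here are illustrative, not from the question):

```java
public class TruncationDemo {
    public static void main(String[] args) {
        int i = -1;
        char c = (char) i; // keeps only the low 16 bits

        // Integer.toHexString shows the raw bit pattern of the value
        System.out.println(Integer.toHexString(i)); // ffffffff
        System.out.println(Integer.toHexString(c)); // ffff (char widens to int without sign extension)
    }
}
```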
It’s important to keep in mind that when you do System.out.println(charo); you end up at a different method than with the other print statements, as a char not only has a different value range than short or int, but also different semantics.
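A minimal sketch of that overload selection (assumed class name; the glyph printed for U+FFFF depends on the console's font and encoding):

```java
public class OverloadDemo {
    public static void main(String[] args) {
        char c = (char) -1;
        System.out.println(c);       // resolves to println(char): prints the character U+FFFF
        System.out.println((int) c); // resolves to println(int): prints the number 65535
    }
}
```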
When you cast 0xFFFF to short, the value doesn’t change, but 0xFFFF is exactly -1 in the 16-bit Two’s Complement representation. On the other hand, when you cast it to int, the value gets zero-extended to 0x0000_FFFF, which equals 65535.
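Both casts can be verified in a few lines. A sketch (names are illustrative):

```java
public class RangeDemo {
    public static void main(String[] args) {
        char c = (char) -1;  // 0xFFFF
        short s = (short) c; // same 16 bits, now interpreted as signed
        int i = c;           // zero-extended to 0x0000_FFFF

        System.out.println(s); // -1
        System.out.println(i); // 65535
    }
}
```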
That’s the way to explain it in terms of the short, char, and int datatypes, but since you also asked “What is happening under the hood?”, it’s worth pointing out that this is not how Java actually works.
In Java, all arithmetic involving byte, short, char, or int is done using int. Even local variables of any of these types are actually int variables at the bytecode level. In fact, the same applies to boolean variables, but the Java language does not allow us to exploit this for arithmetic.
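This int-based arithmetic is visible in the language rules themselves: adding two chars yields an int, so the result needs an explicit cast to be stored back in a char. A sketch:

```java
public class IntArithmeticDemo {
    public static void main(String[] args) {
        char a = 'A';               // 65
        char b = 'B';               // 66
        int sum = a + b;            // the + is performed on ints; the result type is int
        // char bad = a + b;        // would not compile without an explicit cast
        char next = (char) (a + 1); // cast needed to narrow the int result back to char

        System.out.println(sum);    // 131
        System.out.println(next);   // B
    }
}
```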
So the code

```java
char charo = (char) -1;
System.out.println(charo);
System.out.println((short) charo);
System.out.println((int) charo);
```
actually compiles to the same as

```java
int charo = (char) -1;
System.out.println(charo);         // but invoking println(char)
System.out.println((short) charo);
System.out.println(charo);
```
or

```java
int charo = 0x0000_FFFF;
System.out.println(charo);         // but invoking println(char)
System.out.println((short) charo);
System.out.println(charo);
```
As said at the beginning, the first println ends up at a different method, which is responsible for the different semantics. The compile-time type of the variable only matters insofar as it makes the compiler select that different method.
When all 32 bits of a value are always maintained, a cast to char has the effect of setting the upper 16 bits to zero. So the result of (char)-1 is 0x0000_FFFF, and this operation is even done at compile time already. So the first statement assigns the constant 0xFFFF to a variable.
The next statement invokes the println(char) method. No conversion is involved on the caller’s side.
The other two invocations end up at println(int), and here the cast to short actually modifies the value. It has the effect of sign-extending a short value to an int value, which means bit 15 is copied into the upper 16 bits. For 0x...._FFFF, bit 15 is a one, so all upper bits are set to one, ending up at 0xFFFF_FFFF, which is the int value -1 when using Two’s Complement.
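That sign extension is easy to observe on the bit level with Integer.toHexString. A sketch (assumed names):

```java
public class SignExtendDemo {
    public static void main(String[] args) {
        char c = '\uFFFF';
        int viaChar  = c;         // zero-extended: 0x0000_FFFF
        int viaShort = (short) c; // sign-extended: bit 15 copied into the upper 16 bits

        System.out.println(Integer.toHexString(viaChar));  // ffff
        System.out.println(Integer.toHexString(viaShort)); // ffffffff
        System.out.println(viaShort);                      // -1
    }
}
```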
The final result is in line with the first explanation given above, reasoning about the value ranges of char, short, and int. For a lot of scenarios, explanations on that level are sufficient. But you might notice that there is no println(short) method, so to understand why println(int) is sufficient for printing short (or byte) values, it’s necessary to know what’s really going on.
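As a closing sketch: since there is no println(short) or println(byte) overload, such values are widened to int at the call site, and that widening sign-extends, so the signed value is preserved:

```java
public class WideningDemo {
    public static void main(String[] args) {
        short s = (short) 0xFFFF; // -1
        byte  b = (byte) 0xFF;    // -1
        // Both calls resolve to println(int); widening sign-extends, so the value is preserved
        System.out.println(s);    // -1
        System.out.println(b);    // -1
    }
}
```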