If, for example, I define a method’s return type or parameter as char, but both the caller and the implementation immediately use it as an int, is there any overhead? If I understand correctly, the values on the stack are 32-bit aligned anyway, as are the ‘registers’ (I’m sorry, I’m not well versed in bytecode).
A word of explanation: I am writing low-level code for parsing and formatting binary streams. I need a representation of a single bit, used when indexing the stream to read and update individual bits. This is Scala and I am using a value class, that is, a construct erased at compile time to a chosen Java primitive type. This means that methods defined as:
    class Bit(val toInt :Int) extends AnyVal

    @inline def +=(bit :Bit) = ...
    @inline def +=(int :Int) = ...
clash with each other at compilation, because they are both $plus$eq(int) in the bytecode.
There are obviously ways around it, chief of them naming the methods differently, but I’d prefer to avoid that if it doesn’t matter. An int was a natural choice for a bit representation, as it is the result of any bitwise operation, so the ‘conversion’ from word >> offset & 1 to a Bit is a no-op, and likewise Bits can be dropped inside bitwise expressions without needing any conversion either. As you see, pretty fine-grained stuff.
I won’t use a boolean, as there doesn’t seem to be any way around a conditional expression when converting to and from an int, but I thought about a char, which would otherwise be unused (i.e., there is no need for reading and writing a character, as characters are a much higher abstraction than what I’m dealing with at this level).
So, does throwing chars into bitwise operations all the time affect things at all, or is any cost, for example, two orders of magnitude smaller than that of a method call (as in the overhead of creating and popping an activation record)?
Answer
The problem is that your question is essentially unanswerable.
From the point of view of bytecode, yes, there is overhead: you can use javap -c to ‘disassemble’ class files (show the bytecode), and you’ll observe that type promotions are taken care of with an actual bytecode instruction. For example, this:
    class Test {
        void example() {
            int a = 0;
            long b = 0L;
            foo(a);
            foo(b);
        }

        void foo(long c) {}
    }
Compile it, run javap -c on the class file, and it shows you that an I2L opcode is involved when the int is promoted to a long, whereas if you pass a long directly, that instruction is absent: the code is one bytecode shorter.
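The same experiment can be tried on the char case from the question. The sketch below is illustrative only: BitOps and its method names are made up for this example and are not from the question's code. As far as I know, javac emits an i2c instruction for the explicit narrowing cast to char, while widening a char back to an int needs no instruction at all, because a char is already held as an int on the operand stack. Running javap -c on the compiled class is the way to confirm this for your compiler.

```java
// Hypothetical helper class for illustration; the names are not from the question.
final class BitOps {
    // Narrowing int -> char requires an explicit cast, which javac
    // is expected to compile to an i2c instruction.
    static char bitAsChar(int word, int offset) {
        return (char) ((word >> offset) & 1);
    }

    // No cast needed: the result of the bitwise expression is already an int.
    static int bitAsInt(int word, int offset) {
        return (word >> offset) & 1;
    }

    // Widening char -> int is implicit in the shift below; no conversion
    // instruction should be needed for it.
    static int setBit(int word, int offset, char bit) {
        return (word & ~(1 << offset)) | (bit << offset);
    }

    public static void main(String[] args) {
        int word = 0b1010;
        System.out.println((int) bitAsChar(word, 1));
        System.out.println(bitAsInt(word, 2));
        System.out.println(Integer.toBinaryString(setBit(word, 0, (char) 1)));
    }
}
```

Comparing the javap -c listings of bitAsChar and bitAsInt should show the two methods differing only in the conversion instruction, which is the same kind of one-bytecode difference as the I2L above.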
However, you can’t just extrapolate bytecode into machine code in this fashion. Class files (bytecode) are extremely simple, entirely unoptimized constructs, and a JVM merely has to follow the rules of the JVM Specification (JVMS), which as a rule does not specify timings and other such behaviours.
For example, in practice, JVMs start out executing all code quite slowly, just ‘stupidly’ interpreting the bytecodes, and spending extra time and memory on some basic bookkeeping, like keeping track of which way a branch (an if) tends to go.
Then, if HotSpot notices that some method is invoked rather a lot, it will take some time, and use that bookkeeping, to produce finely tuned machine code. On CPUs where the fallthrough case is faster than the jump case*, it will use its record of which way an if tends to go to arrange the code so that the more common case gets the fallthrough. It will even unroll loops and do all sorts of amazing and far-reaching optimizations. After all, this is the 1% of the code that takes 99% of the time, so it is worth taking a relatively long time to produce optimized machine code.
I don’t even know if the I2L by itself, even without HotSpot getting involved, takes significant time. It’s an instruction that can be done entirely in-register and it’s a single-byte opcode. What with pipelining CPUs working as they do, I bet that in the vast majority of cases it costs literally 0 extra time, because it gets snuck in between other operations. With HotSpot involved, it may well end up optimized entirely out of the equation.
So the question then becomes: on the hardware you target, with the specific version of Java you have (from Oracle’s Java 8 to OpenJ9 14, there are many options here; it’s a combinatorial explosion of CPUs, OSes, and JVM editions), how ‘bad’ is it?
If this is a generalized library and you’re targeting all of that (many versions, many OSes and CPUs), there are no easy answers: use tools like JMH to thoroughly test performance on many platforms, or assume that the overhead might matter on some exotic combination.
But if you can limit the JVM and arch/OS down a lot, then this becomes much easier – just JMH your target deployment and now you know.
For what it’s worth, I bet the promotion won’t end up costing enough to matter here (let alone to show up in JMH at all).
*) On the vast majority of CPUs, the only branch instruction available is ‘GOTO this place in the code IF some flag is set’. So to write an if, you first write GOTO a bunch ahead if condition, then the else code, which ends with GOTO the line after the if block, then the if code.
NB: You can use some of the -XX parameters when starting the java executable to have it print when it hotspots a certain method, and even ask it to print the machine code it produced, which you can then toss through a disassembler to see the code that really matters: what actually ends up running on your CPU. Even there, an extra instruction may not cost anything significant due to CPU pipelining.
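As a rough sketch of how to try this, the small program below gives HotSpot something hot to compile. HotLoop and its methods are hypothetical names invented for this example. Running it as java -XX:+PrintCompilation HotLoop should log methods as they get compiled; adding -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly asks for the generated machine code, which, as far as I know, additionally requires the separate hsdis disassembler library to be installed for your JVM.

```java
// Hypothetical driver for illustration; the names are not from the question.
final class HotLoop {
    // Bit typed as char: the cast is a narrowing conversion in the bytecode.
    static char bitAsChar(int word, int offset) {
        return (char) ((word >>> offset) & 1);
    }

    // Bit typed as int: no conversion at all.
    static int bitAsInt(int word, int offset) {
        return (word >>> offset) & 1;
    }

    // Sums all set bits of every word, going through the char-typed accessor.
    static int countViaChar(int[] words) {
        int sum = 0;
        for (int w : words) {
            for (int i = 0; i < 32; i++) {
                sum += bitAsChar(w, i);
            }
        }
        return sum;
    }

    // Same computation through the int-typed accessor.
    static int countViaInt(int[] words) {
        int sum = 0;
        for (int w : words) {
            for (int i = 0; i < 32; i++) {
                sum += bitAsInt(w, i);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] words = new int[1 << 12];
        for (int i = 0; i < words.length; i++) {
            words[i] = i * 0x9E3779B9; // arbitrary bit patterns
        }
        int total = 0;
        // Repeat enough times for the JIT compiler to consider the methods hot.
        for (int round = 0; round < 500; round++) {
            total += countViaChar(words) + countViaInt(words);
        }
        // Printing the result keeps the work from being discarded as dead code.
        System.out.println(total);
    }
}
```

This is only a way to get compiled code to look at, not a benchmark; for actual timings, a harness such as JMH remains the better tool.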
NB2: On 32-bit architectures, longs in general are more costly than ints by quite a big margin, but 32-bit architectures are few and far between these days, so I doubt that matters here.