short -> int -> long type promotion: is there any overhead?

If, for example, I define a method’s return type or parameter as char, but both the caller and the implementation actually use it immediately as an int, is there any overhead? If I understand correctly, the values on the stack are 32-bit aligned anyway, as are the ‘registers’ (I’m sorry, I’m not well versed in the bytecode).

A word of explanation: I am writing low-level code for parsing and formatting binary streams. I need a representation of a single bit, used when indexing the stream to read and update individual bits. This is Scala, and I am using a value class, that is, a construct erased at compile time to a chosen Java primitive type. This means that methods defined as:

class Bit(val toInt :Int) extends AnyVal

@inline def +=(bit :Bit) = ...
@inline def +=(int :Int) = ...

clash with each other at compilation, because both erase to $plus$eq(int) in the bytecode. There are obviously ways around it, chief among them naming the methods differently, but I’d prefer to avoid that in case it doesn’t matter. An int was a natural choice for a bit representation, as it is the result of any bitwise operation, so the ‘conversion’ from word >> offset & 1 to a Bit is a no-op and, likewise, a Bit can be dropped into bitwise expressions without any conversion either. As you see, pretty fine-grained stuff.

I won’t use a boolean, as there doesn’t seem to be any way around a conditional expression when converting to and from an int, but I thought about a char, which would otherwise go unused (i.e., there is no need for reading and writing a character, as characters are much higher abstractions than what I’m dealing with at this level).
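
To put it in plain Java terms (a Scala value class erases to the same bytecode; the variable names are just for the example), this is the asymmetry I mean:

int word = 0b1010, offset = 1;
boolean flag = ((word >> offset) & 1) != 0;   // int -> boolean: needs a comparison
int fromBool = flag ? 1 : 0;                  // boolean -> int: needs a conditional
char charBit = (char) ((word >> offset) & 1); // int -> char: a single branch-free i2c
int fromChar = charBit;                       // char -> int: implicit widening, no branch at all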

So, does throwing chars into bitwise operations all the time affect things at all, or is it, for example, two orders of magnitude faster than a method call (as in the overhead of creating and popping an activation record)?


Answer

The problem is that your question is essentially unanswerable.

From the point of view of bytecode, yeah, there is overhead: you can use javap -c to ‘disassemble’ class files (show the bytecode), and you’ll observe that type promotions are taken care of with an actual bytecode instruction. For example, this:

class Test {
    void example() {
        int a = 0;
        long b = 0L;
        foo(a); // int argument: promoted to long at the call site
        foo(b); // long argument: passed as-is
    }

    void foo(long c) {}
}

Then javap it, and it shows you that an i2l opcode is involved where the int is promoted to a long, whereas if you pass the long directly, that instruction isn’t there – the call site is one bytecode shorter.
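
For reference, javap -c Test prints roughly the following for example() – the exact constant-pool indices will vary, but note the i2l at offset 6 before the first call, with no counterpart before the second:

void example();
  Code:
     0: iconst_0
     1: istore_1
     2: lconst_0
     3: lstore_2
     4: aload_0
     5: iload_1
     6: i2l
     7: invokevirtual #7    // Method foo:(J)V
    10: aload_0
    11: lload_2
    12: invokevirtual #7    // Method foo:(J)V
    15: return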

However – you can’t just extrapolate from bytecode to machine code in this fashion. Class files (bytecode) are extremely simple, entirely unoptimized constructs, and a JVM merely has to follow the rules of the JVM Specification; the JVMS does not, as a rule, specify timings or other performance behaviour.

For example, in practice, JVMs execute all code quite slowly at first, just ‘stupidly’ interpreting the bytecodes, and spending extra time and memory on some basic bookkeeping, like keeping track of which way a branch (an if) tends to go.

Then, if HotSpot notices that some method is invoked rather a lot, it will take some time, and use that bookkeeping, to produce finely tuned machine code. On CPUs where the fallthrough case is faster than the jump case*, it will use the record of which way an if tends to go to make sure the more common case gets the fallthrough. It will even unroll loops and do all sorts of amazing and far-reaching optimizations. After all, this is the 1% of the code that takes 99% of the time, so it is worth taking a relatively long time to produce optimized machine code.

I don’t even know whether the i2l by itself, even without HotSpot getting involved, takes significant time. It’s an instruction that can be done entirely in-register, it’s a single-byte opcode, and with pipelining CPUs working as they do, I bet that in the vast majority of cases it costs literally zero extra time – it’s snuck in between other operations. With HotSpot involved, it may well end up optimized out of the equation entirely.

So the question then becomes: on the hardware you target, with the specific version of Java you have (from Oracle’s Java 8 to OpenJ9 on Java 14, there are many options here – a combinatorial explosion of CPUs, OSes, and JVM editions), how ‘bad’ is it?

If this is a generalized library and you’re targeting all of that (many versions, many OSes and CPUs), there are no easy answers: use tools like JMH to thoroughly test performance on many platforms – or assume that the overhead might matter on some exotic combination.

But if you can narrow the JVM and arch/OS down a lot, then this becomes much easier – just JMH your target deployment and you’ll know.
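
A minimal JMH sketch of such a measurement (the class and field names are mine; a real benchmark should mirror your actual bit-twiddling code rather than this toy round trip):

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class PromotionBenchmark {
    int word = 0xCAFEBABE; // some word of the stream
    int offset = 3;

    @Benchmark
    public char bitAsChar() {
        // the (char) cast compiles to an extra i2c instruction
        return (char) ((word >> offset) & 1);
    }

    @Benchmark
    public int bitAsInt() {
        // no conversion instruction involved
        return (word >> offset) & 1;
    }
}

JMH consumes the returned values (so the JIT cannot dead-code-eliminate the method bodies); comparing the two scores on each target platform answers the question for that platform.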

For what it’s worth, I bet the promotion won’t end up costing enough to matter here (let alone show up in JMH at all).

*) On the vast majority of CPUs, the only branch instruction available is ‘go to this place in the code if some flag is set’ – so to compile an if/else, you first emit ‘jump a bunch ahead if the condition holds’, then the else code, which ends with a jump to just past the if block, and then the if code.

NB: You can use some of the -XX parameters when starting the java executable to have it print out when it compiles (‘hotspots’) a certain method, and even ask it to print a disassembly of the machine code it produced, so you can see the code that really matters: what actually ends up running on your CPU. Even there, an extra instruction may well cost nothing significant, thanks to CPU pipelining.
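
For instance (these are standard HotSpot flags; -XX:+PrintAssembly additionally requires the hsdis disassembler plugin to be installed, and YourMain stands in for your entry point):

java -XX:+PrintCompilation YourMain
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly YourMain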

NB2: On 32-bit architectures, longs in general are more costly than ints by quite a big margin, but 32-bit architectures are few and far between these days, so I doubt that matters here.

User contributions licensed under: CC BY-SA