
Why doesn’t traditional Java BIO api need direct buffer?

Direct buffers were introduced in JDK 1.4 along with Java NIO. One reason for them is that the Java GC may move objects around in memory, so the buffer data must live off heap.

I’m wondering why the traditional Java blocking IO API (BIO) doesn’t need a direct buffer. Does BIO use something like a direct buffer internally, or are there other mechanisms to avoid the “memory movement” problem?


Answer

The simple answer is: it doesn’t matter. Java has a clear, public spec: the JLS, the JVMS, and the javadoc of the core library. Java implementations do exactly what those three documents state, and you may trust that somehow it ‘works’. This isn’t as trite as it sounds. For example, the JMM (Java Memory Model, chapter 17 of the JLS) lays out all sorts of things a JVM ‘may’ do with regard to re-ordering instructions and caching local writes. That is tricky precisely because it is a ‘may’: your code can be buggy in that it breaks if the JVM does X, yet the JVM may never actually bug out, because on your machine, at this time, with this song playing on your music player, it chose never to do X, so you can’t observe the problem.

Fortunately, the BIO stuff mostly has no ‘may’ in it.

Here is the basic outline of BIO in Java:

  • You call .read(), .read(byte[]), or .read(byte[], off, len).

  • (This is no guarantee; it is an implementation detail, and a JVM is not required to do it this way): The JVM will read ‘as much as is currently available’. Hence, .read(some100SizedByteArr) may read only 50 bytes, even though calling read again would read more: 50 bytes just so happened to be ‘ready’ in the network buffer. Lots of folks get that wrong and think .read(byte[]) will fill the byte array if it can. Nope. That would make it impossible to write code that processes data as it comes in!

  • (Again, no guarantee): Given that byte arrays can be shoved around in memory, you’d think that’s a problem, but it really isn’t. That byte[] is guaranteed not to magically grow new bytes in it: there is no way with the BIO API to say ‘just fill this array as the bytes fly in over the wire’. The only way to fill that array is to call .read() on your inputstream. That is a blocking operation, and the JVM can therefore ‘deal with it’ as it pleases. Perhaps the native layer simply locks out the garbage collector until data is returned (this isn’t as pricey as it sounds: once at least 1 byte can be returned, the .read() method returns quickly and doesn’t wait for more data beyond that first byte; at least, that’s how most JVMs do it). Perhaps it reads the data into a separate buffer that lives off heap and blits it over into your array afterwards (sounds inefficient, perhaps, but a JVM is free to do it this way). Possibly the JVM marks that byte array specifically as off-limits for movement of any sort, and the GC just collects ‘around’ it. It doesn’t matter: a JVM can do whatever it wants, as long as it guarantees that .read(byte[]):

  • Blocks until EOF is reached (in which case it returns -1), or at least 1 byte is available.

  • Fills the byte array with the bytes so returned.

  • Marks the inputstream as having ‘consumed’ all that you just got.

  • Returns a value representing how many bytes have been filled.

That’s sort of the point of java: The how is irrelevant. Had the how not been irrelevant, writing a JVM for a new platform could be either impossible or require full virtualization, making it incredibly slow. The docs give themselves some ‘may’ clauses exactly so that this can be avoided.
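The contract listed above can be exercised with a small sketch. The `TricklingInputStream` class and the `readFully` helper are hypothetical names made up for this illustration; the stream caps each read at 50 bytes to imitate a network buffer that only has part of the data ready. It says nothing about how a real JVM moves bytes around internally, only about what the caller can observe.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShortReadDemo {

    /** Hypothetical stream that returns at most `chunk` bytes per call, imitating a slow network. */
    static final class TricklingInputStream extends InputStream {
        private final InputStream source;
        private final int chunk;

        TricklingInputStream(InputStream source, int chunk) {
            this.source = source;
            this.chunk = chunk;
        }

        @Override
        public int read() throws IOException {
            return source.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Hand out no more than `chunk` bytes, even if the caller asked for more.
            return source.read(b, off, Math.min(len, chunk));
        }
    }

    /** Calls read repeatedly until `buf` is full or EOF is hit; returns the number of bytes stored. */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // EOF before the array was full
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];

        InputStream in = new TricklingInputStream(new ByteArrayInputStream(data), 50);
        byte[] some100SizedByteArr = new byte[100];
        // A single call returns as soon as some bytes are available; here that is 50, not 100.
        System.out.println("single read: " + in.read(some100SizedByteArr));

        InputStream again = new TricklingInputStream(new ByteArrayInputStream(data), 50);
        // Looping on the return value is how you fill the whole array.
        System.out.println("looped read: " + readFully(again, new byte[100]));
    }
}
```

The return value is the only thing that tells you how much of the array was filled, which is why code that ignores it tends to work on local files and then break on sockets.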

One place where ‘may’ does show up in BIO: when you .interrupt() a thread that is currently stuck in a BIO .write() call (the bytes haven’t all been sent yet; let’s say the network is slow and you sent a big array), or in a BIO .read() call (it blocks until at least 1 byte is available; let’s say the other side isn’t sending anything), then what happens? The docs leave it open. It ‘may’ result in an IOException being thrown, thus ending the read/write call, with a message indicating you interrupted it. Or .interrupt() does nothing, and it is in fact impossible to interrupt a thread frozen on a BIO call. Most JVMs do the exception thing (fortunately), but the docs leave room: if for whatever reason the underlying OS/arch doesn’t make that feasible, then a JVM is free to do nothing when you attempt to interrupt(). Conclusion: if you want to write proper ‘write once run anywhere’ code, you can’t rely on the idea that you can .interrupt() BIO freezes.
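A minimal sketch of code that tolerates both outcomes, assuming a pipe stands in for the slow network peer. The `InterruptProbe` class, its `probe` method, and the outcome strings are hypothetical names for this illustration. The sketch interrupts a thread blocked in .read(), records what happened, and then closes the writing end as a fallback so the reader is released even if the interrupt was ignored. Which outcome you see depends on the stream class and the JVM, so the code does not assume either.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.atomic.AtomicReference;

public class InterruptProbe {

    /** Interrupts a reader blocked in read() and reports what the reader observed. */
    static String probe() throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);
        AtomicReference<String> outcome = new AtomicReference<>("still blocked");

        Thread reader = new Thread(() -> {
            try {
                int b = in.read(); // nothing is ever written, so this blocks
                outcome.set(b == -1 ? "EOF" : "data");
            } catch (InterruptedIOException e) {
                outcome.set("interrupted"); // the 'exception thing'
            } catch (IOException e) {
                outcome.set("other IOException");
            }
        });
        reader.start();

        Thread.sleep(200);  // give the reader time to block
        reader.interrupt(); // may or may not end the read
        reader.join(2000);  // bounded wait: never assume the interrupt worked

        String afterInterrupt = outcome.get();

        // Fallback that does not depend on interrupt(): closing the writing end
        // signals EOF, which releases a reader that is still blocked.
        out.close();
        reader.join(2000);
        return afterInterrupt;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("after interrupt: " + probe());
    }
}
```

The design point is the bounded join plus a second way out: the interrupt is treated as a request that might be honored, not as a guarantee.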

User contributions licensed under: CC BY-SA