I’ve just discovered the HDF5 format and I’m considering using it to store 3D data spread over a cluster of Java application servers. I have found out that there are several implementations available for Java, and would like to know the differences between them:
Java HDF5 Interface (JHI5): the Java wrapper from the HDF Group itself.
Nujan: Pure Java NetCDF4 and HDF5 writer (cannot read HDF5)
Most importantly, I would like to know:
How much of the native API is covered, and are there any limitations that do not exist in the native API?
Is there support for “Parallel HDF5”?
Once my 3D data is loaded, do I get a “native call overhead” each time I access one element in a 3D array? That is, does the data actually get turned into Java objects, or does it stay in “native/JNI memory”?
Are there any known stability problems with a particular implementation, since a crash in native code normally takes the whole JVM down?
Answer
HDF Java follows a layered approach:
JHI5 – the low level JNI wrappers: very flexible, but also quite tedious to use.
Java HDF object package – a high-level interface based on JHI5.
HDFView – a Java-based viewer application based on the Java HDF object package.
JHDF5 provides a high-level interface that builds on the JHI5 layer and makes most of the functionality of HDF5 available to Java. The API has a shallow learning curve and hides most of the housekeeping work from the developer. You can run the Java HDF object package (and HDFView) on the JHI5 interface that is part of JHDF5, so the two APIs can coexist within one Java program.
Permafrost and Nujan seem far from complete at this point, and Permafrost hasn’t seen a lot of activity recently, so neither looks like a first choice right now.
I think a good path for you is to have a look at both the Java HDF object package and JHDF5, decide which of the two APIs fits your needs better, and go with that one.
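One thing worth checking while you compare the two is the per-element access cost you asked about. As a rough sketch only, assuming the API you pick reads the whole dataset in one call into a flat, row-major Java float[] (an assumption to verify against its documentation), element access afterwards is plain Java array indexing with no JNI call per element. Volume3D and its methods are hypothetical names for illustration and are not part of any HDF5 library.

```java
// Hypothetical helper: wraps a flat, row-major float[] as a 3D volume.
// The flat array stands in for whatever a bulk read would return
// (assumption: the HDF5 binding copies the dataset into a Java array).
public class Volume3D {
    private final float[] data; // lives on the Java heap, not in native memory
    private final int nx, ny, nz;

    public Volume3D(float[] data, int nx, int ny, int nz) {
        if (data.length != nx * ny * nz) {
            throw new IllegalArgumentException("data length does not match dimensions");
        }
        this.data = data;
        this.nx = nx;
        this.ny = ny;
        this.nz = nz;
    }

    // Row-major index: z varies fastest. Pure Java, no native call.
    public float get(int x, int y, int z) {
        return data[(x * ny + y) * nz + z];
    }

    public static void main(String[] args) {
        int nx = 2, ny = 3, nz = 4;
        float[] flat = new float[nx * ny * nz];
        for (int i = 0; i < flat.length; i++) {
            flat[i] = i; // placeholder values instead of a real dataset
        }
        Volume3D volume = new Volume3D(flat, nx, ny, nz);
        System.out.println(volume.get(1, 2, 3)); // prints 23.0
    }
}
```

If the API instead hands out elements one at a time through native calls, the picture changes, so it is worth confirming how each candidate returns array data before committing to it.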
Disclaimer: I have worked on the JHDF5 interface, so I may be biased.