HDF5 in Java: What are the differences between the available APIs?

I’ve just discovered the HDF5 format and I’m considering using it to store 3D data spread over a cluster of Java application servers. I have found that there are several implementations available for Java, and I would like to know the differences between them.

Most importantly, I would like to know:

  • How much of the native API is covered? Are there any limitations that do not exist in the native API?

  • Is there support for “Parallel HDF5”?

  • Once my 3D data is loaded, do I incur a “native call” overhead each time I access one element of a 3D array? That is, does the data actually get turned into Java objects, or does it stay in “native/JNI memory”?

  • Are there any known stability problems with a particular implementation, given that a crash in native code normally takes the whole JVM down?



HDF Java follows a layered approach:

  • JHI5 – the low-level JNI wrappers: very flexible, but also quite tedious to use.

  • Java HDF object package – a high-level interface based on JHI5.

  • HDFView – a Java-based viewer application based on the Java HDF object package.

JHDF5 provides a high-level interface on top of the JHI5 layer that exposes most of the functionality of HDF5 to Java. The API has a shallow learning curve and hides most of the housekeeping work from the developer. You can run the Java HDF object package (and HDFView) on the JHI5 interface that is part of JHDF5, so the two APIs can co-exist within one Java program.
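On the “native call overhead” question: with these bindings a dataset read is one bulk JNI call that copies the data into an ordinary Java primitive array on the JVM heap, after which element access is plain Java with no further native calls. HDF5 stores multidimensional datasets in row-major (C) order, so a 3D dataset typically comes back as a flat array you index yourself. A minimal pure-Java sketch of that indexing (no HDF5 library involved; the class and dimensions here are made up for illustration):

```java
// Sketch: a flat row-major buffer, as a bulk native read would return it.
// Once the copy into the JVM heap is done, every access below is pure
// Java array indexing -- no JNI round-trips per element.
public class Flat3DAccess {
    final float[] data;   // flat buffer filled by one bulk read
    final int dimY, dimZ; // trailing dimensions, needed for offsets

    Flat3DAccess(float[] data, int dimX, int dimY, int dimZ) {
        if (data.length != dimX * dimY * dimZ)
            throw new IllegalArgumentException("buffer/dims mismatch");
        this.data = data;
        this.dimY = dimY;
        this.dimZ = dimZ;
    }

    // Row-major offset: ((x * dimY) + y) * dimZ + z
    float get(int x, int y, int z) {
        return data[((x * dimY) + y) * dimZ + z];
    }

    public static void main(String[] args) {
        // Pretend this 2x3x4 buffer came back from a bulk native read.
        float[] buf = new float[2 * 3 * 4];
        for (int i = 0; i < buf.length; i++) buf[i] = i;
        Flat3DAccess vol = new Flat3DAccess(buf, 2, 3, 4);
        // Element [1][2][3] sits at flat offset ((1*3)+2)*4 + 3 = 23.
        System.out.println(vol.get(1, 2, 3)); // prints 23.0
    }
}
```

The higher-level packages (the Java HDF object package and JHDF5) hide this arithmetic behind their own array abstractions, but the underlying memory model is the same: one native copy in, then pure-Java access.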

Permafrost and Nujan seem far from complete at this point, and Permafrost hasn’t seen much activity recently, so they do not appear to be the first choice at this point in time.

I think a good path for you is to have a look at both the Java HDF object package and JHDF5, decide which of the two APIs fits your needs better, and go with that one.

Disclaimer: I have worked on the JHDF5 interface, so I may be biased.