Why did Java 9 introduce the JMOD file format?




Java 9 has three ways to package compiled code in files:

  • JAR
  • JMOD
  • JIMAGE

JIMAGE is optimized for speed and space and used by the JVM at runtime so it makes sense why JIMAGE was introduced. JIMAGE files are not supposed to be published to maven repos or used at compile or link time.

The docs claim that JMOD can store native code and other things that can’t be stored by JAR files and that developers can make and distribute their own JMOD files. The JDK ships with a jmods/ directory containing all the modules of the JDK for users to depend on.

Questions:

  • Why did Java 9 introduce the JMOD file format?
  • Should a library author distribute a JMOD file or a JAR file or both?
  • Should jmod files be published to maven repos?

Answer

The purpose of JMODs is not well documented, and the existing documentation is rather sparse. Here is an in-depth explanation of the system, from my understanding.

A warning: Parts of this answer are rather long, verbose, partially redundant, and a tough read. Constructive, structural, or grammatical edits are more than welcome to improve readability for future readers.


Short(er) Answer

Java 9’s new module system, Project Jigsaw, introduces a new optional link time phase, which occurs when using the CLI tool jlink to build a custom space-optimized JRE. jlink bundles the specified root modules and all of their transitive dependencies (JAR modules or JMODs) into a minified JRE; modules that are unreachable in the dependency graph (starting from the specified root modules) are not bundled into the built JRE. As of JDK 9, all of Java’s standard library has been broken up into JMODs, located at <jdk>/jmods.

Whereas JARs can only contain .class and resource files, JMODs (i.e. .jmod files) contain additional files that are consumed specifically in the new optional link time phase to customize the JRE (e.g. executables, native libraries, configurations, legal licenses, etc). These additional files are not available as resources at run time in the classpath, but are instead installed under various locations in the built JRE (e.g. executables are placed under <jre>/bin, and native libraries under <jre>/lib, or <jre>/bin on Windows). From the relevant bundled JAR and JMOD dependencies, classes and file resources are written into a single optimized JIMAGE file, located at <jre>/lib/modules (replacing <jre>/lib/rt.jar in Java 8 and prior versions). JMODs play a role at compile time and link time, and are not designed to be used at run time.

For the average library/application, only JARs should be built and pushed, instead of JMODs; only under certain conditions will JMODs offer critical functionality that is needed during the link time phase. At the time of writing, Maven does not appear to offer strong support for JMODs beyond the alpha release plugin org.apache.maven.plugins:maven-jmod-plugin.


Long Answer

This long-winded answer goes into more detail and sheds some light on how the new module system fundamentally operates. There is a strong emphasis throughout this post on the CLI tool jlink, since JMODs are designed specifically for the new optional link time phase that the tool introduces.

The Introduction of Project Jigsaw

Java 9 introduced Project Jigsaw in ‘JEP 261: Module System‘, a novel module system that can be used to minimize startup times and the size of JREs. As part of this release, the CLI utilities jmod, jimage, and jlink were introduced along with new file formats for JMODs/.jmods (ZIP-based) and JIMAGEs/.jimages.

A significant takeaway of this new module system is that the CLI tool jlink enables developers to build a custom JRE that contains only relevant standard library and external dependencies for their applications. This introduces a new notion of an optional link time phase between the traditional phases in the compile time -> run time pipeline.
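To get a feel for what "only relevant dependencies" means, the module graph can be resolved from Java code. The sketch below uses the java.lang.module API against the modules of the running JDK; the class name ReachableModules and the root module java.sql are only illustrative choices, and jlink performs its own resolution, so treat this as an approximation of which modules would be pulled in rather than a description of jlink's internals.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleFinder;
import java.lang.module.ResolvedModule;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical helper: resolves root modules against the modules of the running JDK. */
final class ReachableModules {

    /** Returns the names of the root modules plus everything they transitively require. */
    static Set<String> resolve(String... roots) {
        Configuration cf = Configuration.empty().resolve(
                ModuleFinder.ofSystem(),   // look modules up in the current runtime image
                ModuleFinder.of(),         // no additional module path in this sketch
                Set.of(roots));
        return cf.modules().stream()
                .map(ResolvedModule::name)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // java.sql is only an example root; any system module name works here.
        System.out.println(resolve("java.sql"));
    }
}
```

Modules that do not show up in the resolved set are the ones a custom runtime built from the same roots could leave out.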

As an example of the advantages of using jlink, a minimalist JRE built from JDK 15 with only the java.base module comes out to roughly 40MB in size, in stark contrast to JDK 15’s ~310MB size. This is especially useful for shipping a minimal custom JRE, such as for lean Docker images. The new module system brings significant benefits to the Java ecosystem that have been discussed at length elsewhere, and are thus not elaborated further here.

The 3 J’s: JARs, JMODs, and JIMAGEs

The high level descriptions of JARs, JMODs, and JIMAGEs do not quickly lend themselves to an explanation that clearly differentiates the roles of the three file formats. Here is a non-exhaustive overview of the purposes of each:

  • JARs: The classical format based on the ZIP file format for bundling classes and resources into the classpath at run time. This has been the de facto mainstream standard since JDK 1.1 in 1997. JARs can be added to the classpath with the java -cp/-classpath flags. Almost every library or dependency has used, is using, and will keep using this format, so it is glossed over in this section.

  • JMODs: A new format based on the ZIP file format for bundling the same contents that a JAR can contain, but with support for additional files (e.g. executables, native libraries, configurations, legal licenses, etc) that are consumed at the optional link time phase when building a custom JRE. JMODs are designed to be used at both compile time and link time, but not at run time. This new format was likely introduced (instead of extending JARs) because there is a special meaning for directories within this new archive-based format that is not backwards compatible with JARs that already use the same directory names.

    • A JMOD can be constructed from a JAR module (i.e. contains a valid module-info.class) with the CLI tool jmod.
    • As of JDK 9 and onward, all Java standard modules are stored under <jdk>/jmods in a JDK installation.
    • JMODs can be published for use by other developers and downstream applications; at the time of writing, I am unsure if JMODs may be pushed to Maven repositories, but various sources seem to indicate likely not for the time being.
    • JMOD classes and resources cannot be used at run time in the classpath with the java -cp/-classpath flags, as the classes and resources inside of the JMOD archive are stored under classes and not in the archive root.

Note: There may be a way to easily add JMODs to the classpath at run time; however, research did not turn up any functionality relating to this. Merely adding a JMOD to the classpath will not be sufficient for using the classes and resources. A custom ClassLoader could be used to resolve class and resource files correctly in the JMOD archive at run time; however, this is generally not recommended and is not the purpose of JMODs.

  • JIMAGEs: A special file format introduced in ‘JEP 220: Modular Run-Time Images’ that is a runtime image containing all necessary classes and resources for a JRE (i.e. the standard library). Prior to JDK 9, a single large non-modular uber JAR was used, located at <jre>/lib/rt.jar; it has since been removed in favor of a single optimized JIMAGE located at <jre>/lib/modules. This format is not based on the ZIP format and uses a custom format that is significantly more time and space efficient than the original legacy JAR format, reducing startup times.
    • When building a custom JRE image with the CLI tool jlink, all relevant (explicit or transitive) module dependencies’ classes and resources (from JAR modules or JMODs) are written into a single optimized JIMAGE file (again, stored under <jre>/lib/modules).
    • The JIMAGE file format is modular, and an existing image can be listed, inspected, or extracted with the CLI tool jimage. E.g. jimage list $JAVA_HOME/lib/modules
    • JIMAGEs should generally not be published, but instead shipped with a specific custom JRE version; the file format may be subject to changes in the future.
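Since the JIMAGE layout is internal and may change, code that wants to look inside the runtime image is probably better off going through the jrt: file system that JEP 220 added for this purpose, rather than parsing <jre>/lib/modules directly. A minimal sketch follows; the class name RuntimeImageModules is made up for illustration, and it assumes it runs on a JDK 9+ runtime.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists the modules stored in the running JVM's runtime image. */
final class RuntimeImageModules {

    /** Returns the sorted names of all modules visible under jrt:/modules. */
    static List<String> moduleNames() throws IOException {
        // The jrt file system exposes the runtime image without revealing its on-disk format.
        FileSystem jrt = FileSystems.getFileSystem(URI.create("jrt:/"));
        try (Stream<Path> dirs = Files.list(jrt.getPath("/modules"))) {
            return dirs.map(p -> p.getFileName().toString())
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        moduleNames().forEach(System.out::println);
    }
}
```

On a full JDK this prints dozens of modules; on a jlink-built runtime it should print only the modules that were linked in.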

The Substance: Detailed Purpose of JMOD

A New, Optional Link Time Phase

As stated a few times previously, the CLI tool jlink introduces a new optional stage in the normal Java pipeline: the link time phase. This link time phase is used to generate a custom built JRE from a set of Java 9 modules (either a JAR with a module-info.class descriptor or a JMOD).

The high level stages are briefly described as follows:

  • compile time (javac): As described on the javac documentation, the compile time phase…

    …reads class and interface definitions, written in the Java programming language, and compiles them into bytecode class files. It can also process annotations in Java source files and classes.

  • link time (jlink): As described on ‘JEP 282: jlink: The Java Linker‘, the link time phase is…

    …an optional phase between the phases of compile time (the javac command) and run-time (the java run-time launcher). Link time requires a linking tool that will assemble and optimize a set of modules and their transitive dependencies to create a run-time image or executable.

    Link time is an opportunity to do whole-world optimizations that are otherwise difficult at compile time or costly at run-time. An example would be to optimize a computation when all its inputs become constant (i.e., not unknown). A follow-up optimization would be to remove code that is no longer reachable.

  • run time (java): As described on the java documentation, the run time phase …

    …starts a Java application. It does this by starting the Java Runtime Environment (JRE), loading the specified class, and calling that class’s main() method.
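The link time phase is normally driven from the command line, but jlink is also exposed through java.util.spi.ToolProvider, which makes it easy to sketch in Java. The example below assembles the three options a basic invocation needs (--module-path, --add-modules, --output). The class name LinkMinimalRuntime and the output directory minimal-jre are illustrative, and the code assumes a JDK that ships its modules under <java.home>/jmods.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.spi.ToolProvider;

/** Hypothetical helper: assembles and runs a jlink invocation from Java code. */
final class LinkMinimalRuntime {

    /** Builds the jlink arguments for a module path, root modules, and output directory. */
    static String[] jlinkArgs(Path modulePath, String rootModules, Path output) {
        return new String[] {
            "--module-path", modulePath.toString(),  // where jlink looks for JMODs and modular JARs
            "--add-modules", rootModules,            // comma-separated root modules
            "--output", output.toString()            // directory for the generated runtime (must not exist yet)
        };
    }

    /** Runs jlink in-process and returns its exit code (0 on success). */
    static int link(String... args) {
        ToolProvider jlink = ToolProvider.findFirst("jlink")
                .orElseThrow(() -> new IllegalStateException("jlink is not available in this runtime"));
        return jlink.run(System.out, System.err, args);
    }

    public static void main(String[] args) {
        // Assumption: the running JDK ships its modules under <java.home>/jmods.
        Path jmods = Paths.get(System.getProperty("java.home"), "jmods");
        Path output = Paths.get(args.length > 0 ? args[0] : "minimal-jre");
        String[] cmd = jlinkArgs(jmods, "java.base", output);
        System.out.println("jlink " + String.join(" ", cmd));
        if (args.length > 0) {
            // Only link when an output directory was passed explicitly.
            System.exit(link(cmd));
        }
    }
}
```

Without arguments the program only prints the equivalent command line; passing an output directory actually performs the link step.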

Introduction of JMODs

During the link time phase, all classes and resources from modules (valid JAR modules or from JMODs’ classes) are written into a single optimized JIMAGE runtime image located at <jre>/lib/modules. Modules not explicitly or transitively required will not be included in this final JIMAGE, saving a significant amount of space. However, when building a custom JRE, some additional files might be necessary inside of the JRE; e.g. executable commands or native libraries. For JAR modules, the story ends here: there is no way for a JAR to add files (beyond the classes included in the JIMAGE) into the built JRE without ambiguities.

Introducing JMODs: JMODs have the ability to add additional files into the custom built JRE; some examples (not necessarily an exhaustive list): executable commands, configuration files, header files, legal notices and licenses, native libraries, and manual pages. This allows a module dependency to shape the built JRE in its own way. How these additional files are inserted into the built JRE by the CLI tool jlink is documented in the next section.

JMODs are destined solely for the compile time and link time phases, as described in ‘JEP 261: Module System’:

JMOD files can be used at compile time and link time, but not at run time. To support them at run time would require, in general, that we be prepared to extract and link native-code libraries on-the-fly. This is feasible on most platforms, though it can be very tricky, and we have not seen many use cases that require this capability, so for simplicity we have chosen to limit the utility of JMOD files in this release.

The New Format – No Backwards Compatibility with JARs

A good question might be “why not enable JARs to add link-time behavior?”. A sneaking suspicion here is that doing so would not be sufficiently backwards compatible with existing JARs and tooling. There is no specification for reserved filenames in the JAR archive file format. If an existing library stored any resources under the directories intended for link time, jlink could not accurately guess whether they are meant to be consumed during link time or needed at run time. A new file format specification with reserved directory names, such as the new JMOD format, resolves this clash. With JMODs, there is no ambiguity about which resources are designated for link time and which for run time. Furthermore, the JMOD format can also be extended with new functionality in later JDK versions, without backwards-compatibility issues.

The JMOD file format is similar to a JAR in that it is based on the ZIP file format. A JMOD file has the following reserved directory names with the following behavior (this is not necessarily an exhaustive list!):

  • bin (--cmds): Executable commands that are copied to <jre>/bin
  • classes (--class-path): Intended for including into the final built JIMAGE, stored at <jre>/lib/modules
  • conf (--config): Additional configurations copied to <jre>/conf; likely used to control configuration for any bundled modules, if required
  • include (--header-files): Additional C header files that are copied to <jre>/include/ for building C libraries for the JVM using JNI; e.g. in java.base, the JNI interfaces are exported
  • legal (--legal-notices): Legal notices and licenses for the module that are copied to <jre>/legal/<module name>/
  • lib (--libs): Native libraries that are copied to <jre>/lib (or <jre>/bin on Windows)
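The flags in parentheses above are options of jmod create, and each one fills the matching section of the archive. As a rough sketch, the invocation can be assembled from Java and run through java.util.spi.ToolProvider. The class name CreateJmod and the build/... paths are hypothetical; the classes directory is assumed to contain a compiled module-info.class.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.spi.ToolProvider;

/** Hypothetical helper: assembles a "jmod create" invocation. */
final class CreateJmod {

    /** Builds the argument list; cmds and libs may be null when the module has none. */
    static List<String> jmodCreateArgs(Path classes, Path cmds, Path libs, Path output) {
        List<String> args = new ArrayList<>();
        args.add("create");
        args.add("--class-path");           // ends up in the classes section
        args.add(classes.toString());
        if (cmds != null) {
            args.add("--cmds");             // ends up in the bin section
            args.add(cmds.toString());
        }
        if (libs != null) {
            args.add("--libs");             // ends up in the lib section
            args.add(libs.toString());
        }
        args.add(output.toString());        // the .jmod file to write
        return args;
    }

    public static void main(String[] ignored) {
        // All paths below are placeholders for a real build layout.
        Path classes = Paths.get("build", "classes");
        List<String> cmd = jmodCreateArgs(classes, Paths.get("build", "bin"),
                Paths.get("build", "lib"), Paths.get("build", "example.jmod"));
        System.out.println("jmod " + String.join(" ", cmd));
        if (Files.isDirectory(classes)) {
            // Only run the tool when the placeholder layout actually exists.
            ToolProvider jmod = ToolProvider.findFirst("jmod")
                    .orElseThrow(() -> new IllegalStateException("jmod is not available in this runtime"));
            int exit = jmod.run(System.out, System.err, cmd.toArray(new String[0]));
            System.out.println("jmod exited with " + exit);
        }
    }
}
```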

For the curiously inclined, standard library JMODs (located under $JAVA_HOME/jmods in a JDK 9+) can be inspected with any application that reads ZIP archives.
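The same inspection can be done with a few lines of Java. The sketch below counts the entries of a JMOD per top-level directory, which makes the reserved sections visible. The class name JmodSections is made up, and the default path assumes a JDK that still ships <java.home>/jmods/java.base.jmod. It relies on java.util.zip.ZipFile, which finds entries through the central directory at the end of the archive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipFile;

/** Hypothetical helper: counts the entries of a JMOD by top-level section. */
final class JmodSections {

    /** Returns section name (e.g. "classes", "bin", "lib") mapped to the number of files in it. */
    static Map<String, Integer> countBySection(Path archive) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            zip.stream()
               .filter(entry -> !entry.isDirectory())
               .forEach(entry -> {
                   String name = entry.getName();
                   int slash = name.indexOf('/');
                   // Everything before the first slash is treated as the section name.
                   String section = slash < 0 ? "(root)" : name.substring(0, slash);
                   counts.merge(section, 1, Integer::sum);
               });
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the JDK ships java.base.jmod; pass another archive as args[0] otherwise.
        Path jmod = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("java.home"), "jmods", "java.base.jmod");
        if (!Files.isRegularFile(jmod)) {
            System.out.println("No JMOD found at " + jmod);
            return;
        }
        countBySection(jmod).forEach((section, n) -> System.out.println(section + ": " + n));
    }
}
```

Because every class sits under the classes prefix rather than in the archive root, this also illustrates why simply putting a JMOD on the classpath does not work.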

Mainstream Support…?

A significant part of the reason that JMODs have not been rapidly adopted and have poor documentation availability is that, quite simply put, they are not necessary for the vast majority of libraries and module dependencies. While they still are useful for specific use cases, modules should use the JAR format that already has mainstream support since it was defined with JDK 1.1 in 1997 (with module-info.java module support added with JDK 9 in 2017).

From the documentation of the CLI tool jmod:

For most development tasks, including deploying modules on the module path or publishing them to a Maven repository, continue to package modules in modular JAR files. The jmod tool is intended for modules that have native libraries or other configuration files or for modules that you intend to link, with the jlink tool, to a runtime image.

An opinion: JMODs will likely not gain any significant adoption by developers for a very long time. Most developers will never hear of a JMOD or know its purpose, nor will they need to. JMODs serve a critical purpose behind the scenes for building JREs (all of the Java standard library modules are JMODs), but do not affect the vast majority of applications and projects due to their niche use case at link time. Java 9 was released in 2017 and dependencies in the Java ecosystem still struggle to reliably have a module-info.class descriptor to make a JAR a valid fully-fledged module…

Takeaways

  • JMODs are a fundamental new feature for creating JREs with the CLI tool jlink: they allow a module to add extra files to the custom built JRE.
  • Deploy JARs instead of JMODs, unless some features from JMODs are specifically needed. JAR modules are also compatible with jlink, so it is not necessary to ship a JMOD that only includes classes and resources. Ecosystem support and tooling are unlikely to adopt JMODs anytime soon and will likely have compatibility issues for years to come.
  • Java documentation for this area of the ecosystem could really use some improvement.

Disclaimer

At the time of writing this answer, there was sparse documentation on the purpose of JMODs for Java 9 and onward. In fact, the Google search phrases “java jmods” and “jmod format” bring this very same StackOverflow question as the second search hit result. Therefore, some aspects may not be accurately explained, but are generally “directionally correct”; furthermore, it may not paint the full picture. If you find any issues or caveats, leave a comment and I will try to reconcile it with this answer.



Source: stackoverflow