What Modules Are About

When the subject of Java modules comes up in online discussions every so often, I get the sense that many people misunderstand what it is that modules are supposed to do. That’s understandable, as the name “modules” does not evoke the precise purpose of the feature. There are many kinds of modules, even in the Java world, each concerned with a different kind of software modularity and providing a different set of features designed for different goals. Java’s module system focuses on specific aspects of modularity, and its features are unique.

In the past I tried explaining the different roles of the module system and build tools using an analogy from construction. Like blueprints, Java modules say, “a window of such-and-such size goes here,” while the build system, like a bill of materials, says “our project uses windows of such-and-such make and model.” The build tool is in charge of obtaining the right parts of the right version, while the module system says how they’re put together. But, while helpful in understanding the philosophy of Java modules and perhaps why there’s a small overlap between the module declaration and the build file (they both mention “window”), this explanation is abstract and doesn’t clarify what it is, exactly, that modules do.

In this post I’ll try to be as concrete as I can.

The Core Principles

A Java module is a set of packages that declares which of them form an API accessible to other modules and which are internal and encapsulated — similar to how a class defines the visibility of its members. A module also declares what other modules it requires for its operation. A library author can choose to place it in a module, an application author can choose to place its components in modules, and, of course, the JDK itself is made of modules.
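
To make that definition concrete, here is a minimal sketch that builds the descriptor of a made-up library module with the standard java.lang.module API. The module name com.acme.leftpad, its internal package and its dependency on java.logging are assumptions chosen purely for illustration; the equivalent module-info.java declaration is shown in the comment.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class DescriptorSketch {

    // Programmatic equivalent of this hypothetical module-info.java:
    //
    //   module com.acme.leftpad {
    //       requires java.logging;
    //       exports com.acme.leftpad;   // the API
    //   }
    //
    // com.acme.leftpad.internal belongs to the module but is not exported,
    // so it stays encapsulated.
    static ModuleDescriptor leftpad() {
        return ModuleDescriptor.newModule("com.acme.leftpad")
                .requires("java.logging")
                .exports("com.acme.leftpad")
                .packages(Set.of("com.acme.leftpad.internal"))
                .build();
    }

    // Packages that other modules may use.
    static Set<String> exportedPackages(ModuleDescriptor d) {
        return d.exports().stream()
                .map(ModuleDescriptor.Exports::source)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Packages that are in the module but hidden from other modules.
    static Set<String> concealedPackages(ModuleDescriptor d) {
        Set<String> concealed = new TreeSet<>(d.packages());
        concealed.removeAll(exportedPackages(d));
        return concealed;
    }

    public static void main(String[] args) {
        ModuleDescriptor d = leftpad();
        System.out.println("module:    " + d.name());
        System.out.println("requires:  " + d.requires());
        System.out.println("exported:  " + exportedPackages(d));
        System.out.println("concealed: " + concealedPackages(d));
    }
}
```

In a real project you would write the module-info.java file rather than use the builder; the builder is used here only so the sketch runs as an ordinary program.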

What do Java modules do? They enforce two sets of guarantees at runtime:

  1. Reliable Configuration: That all the application’s components are in hand, that no more than one instance of each class is present, and, optionally, that they are the same ones as those used at build-time,
  2. Strong Encapsulation: That no code is accessed by other modules, be it directly or through reflection, unless explicitly authorised by the module declaration, its code, or by the application (through command-line flags).

The module system is not concerned with picking and obtaining the right components for your application from some universal catalogue (that’s a job for build tools), nor is architectural cleanliness for its own sake its primary purpose. It exists to make the important guarantees I mentioned, which are unique to the module system, at runtime.

Virtually every element of the module system serves one or both. The module declaration’s requires and uses clauses? Reliable configuration. The exports and opens clauses? Strong encapsulation. The provides clause? Both. Layers serve reliable configuration, while lookups have been retrofitted to serve strong encapsulation. jlink serves reliable configuration, among other things: although it can create runtimes for any Java application, modularised or otherwise, modules let you use it more easily and reliably to produce a compact, faster-starting runtime.

To what extent people care about these guarantees varies considerably. If you’re not in charge of maintenance, or security, or deployment, or maybe even if you are but your software is small enough that these jobs are easy without modules, you might legitimately not care. But, as we’ll see, for some software projects, including the JDK itself but certainly not limited to it, these are very important guarantees, and the bigger and more popular the software, the more vital they become.

I hope that even if you choose not to author and deploy your own modules, this post will help you understand their importance. And while only code that is already structured in a modular way can be put into modules, and so big old codebases are hard to refactor appropriately, new unencumbered code is easily modularised. If you’re a team lead or an architect of a big new project, or just hope that your software will grow big and/or popular someday, you might well reap great dividends from modules for little effort. But whether you author your own modules or not, use the module path or not — if you use Java, you already heavily use the module system all the time, as it is one of the foundations of the platform, and so it might be helpful to understand what it does.

Now, why are those guarantees important?

Reliable Configuration

What’s wrong with the classpath?

The first set of guarantees, reliable configuration, prevents some sneaky bugs that could be introduced through configuration, like those that have earned the name “classpath hell” and result from the brittle classpath.

Every time a Java program encounters a new class, the VM scans the classpath, in order, until it finds the first matching classfile. It does not care about the way classes are packaged into JAR files; that organisation is ignored. For example, suppose you put leftpad2.jar, with version 2 of the leftpad library, on the classpath and, perhaps by accident, leave the old leftpad1.jar on the classpath as well.

If you’ve put leftpad2.jar after leftpad1.jar, when you first touch com.acme.leftpad.Bar (the new exciting addition in version 2) the VM will scan all JARs on the classpath until it finds Bar.class in leftpad2.jar. But when you first use the old com.acme.leftpad.Foo the VM will, again, scan the classpath and find Foo.class in leftpad1.jar, as that is its first occurrence. Your program would end up running a mix of version 1 and version 2 classes. Some surprising behaviour would ensue from the mismatch, and not the kind that’s easy to debug.
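
The lookup rule is simple enough to model in a few lines. What follows is a toy sketch of first-match resolution, not the VM’s actual class-loading code: each “JAR” is reduced to a name and the set of class names it contains, and the contents I’ve given the two leftpad JARs are assumptions that match the scenario above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ClasspathSketch {

    // Toy classpath: insertion order is the classpath order.
    static final Map<String, Set<String>> CLASSPATH = new LinkedHashMap<>();
    static {
        CLASSPATH.put("leftpad1.jar", Set.of("com.acme.leftpad.Foo"));
        CLASSPATH.put("leftpad2.jar", Set.of("com.acme.leftpad.Foo", "com.acme.leftpad.Bar"));
    }

    // Scans the entries in order and returns the first JAR that contains the
    // class, or null if none does. JAR boundaries play no other role.
    static String firstMatch(Map<String, Set<String>> classpath, String className) {
        for (Map.Entry<String, Set<String>> jar : classpath.entrySet()) {
            if (jar.getValue().contains(className)) {
                return jar.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Foo resolves to the old JAR, Bar to the new one: a silent mix of versions.
        System.out.println("Foo comes from " + firstMatch(CLASSPATH, "com.acme.leftpad.Foo"));
        System.out.println("Bar comes from " + firstMatch(CLASSPATH, "com.acme.leftpad.Bar"));
    }
}
```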

We manage to live with the classpath only thanks to the help of tools like Maven that assemble it for us, but the classpath is so fragile that it cannot serve as a respectable foundation. If the leftpad library were modularised and the two JARs accidentally placed on the module path, the VM would check, immediately at startup, that the com.acme.leftpad package exists in no more than a single module. Furthermore, when creating the Java runtime for our application with jlink, it would embed the modules in the image file, ensuring that the version we used at build time is the same one we have at runtime.
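
You can watch the resolver reject such a configuration without building any JARs. This is a sketch under stated assumptions: the module names leftpad.v1, leftpad.v2 and app are hypothetical, the modules are empty descriptors served by a small in-memory ModuleFinder, and the two leftpad modules are given different names so that the package-level check is what fails. It uses the same resolution API (Configuration.resolve) that the runtime uses at startup.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.lang.module.ResolutionException;
import java.net.URI;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class SplitPackageSketch {

    // In-memory finder: serves descriptors only, with no classes behind them.
    static ModuleFinder finderOf(ModuleDescriptor... descriptors) {
        Map<String, ModuleReference> refs = new LinkedHashMap<>();
        for (ModuleDescriptor d : descriptors) {
            refs.put(d.name(), new ModuleReference(d, URI.create("module:/" + d.name())) {
                @Override public ModuleReader open() {
                    throw new UnsupportedOperationException("no content in this sketch");
                }
            });
        }
        return new ModuleFinder() {
            @Override public Optional<ModuleReference> find(String name) {
                return Optional.ofNullable(refs.get(name));
            }
            @Override public Set<ModuleReference> findAll() {
                return new HashSet<>(refs.values());
            }
        };
    }

    // A hypothetical library module that exports com.acme.leftpad.
    static ModuleDescriptor library(String moduleName) {
        return ModuleDescriptor.newModule(moduleName).exports("com.acme.leftpad").build();
    }

    // A hypothetical application module that requires the given modules.
    static ModuleDescriptor app(String... requires) {
        ModuleDescriptor.Builder builder = ModuleDescriptor.newModule("app");
        for (String name : requires) {
            builder.requires(name);
        }
        return builder.build();
    }

    // Resolves "app" on top of the boot layer's configuration.
    // Returns null on success, or the resolver's complaint on failure.
    static String resolve(ModuleDescriptor... descriptors) {
        try {
            ModuleLayer.boot().configuration()
                    .resolve(finderOf(descriptors), ModuleFinder.of(), Set.of("app"));
            return null;
        } catch (ResolutionException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("one version:  " + resolve(app("leftpad.v2"), library("leftpad.v2")));
        System.out.println("two versions: " + resolve(app("leftpad.v1", "leftpad.v2"),
                library("leftpad.v1"), library("leftpad.v2")));
    }
}
```

The first configuration resolves; the second is refused before any application code runs, because app would read the same package from two modules.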

Strong Encapsulation

How do modules help maintainability and security?

Reading the scenario above, I’m certain some readers thought, “this never happened to me,” while others were triggered with flashbacks to the horror they’d experienced when one of their dependencies had moved a package from one artefact to another and they only upgraded one of them. But, with care and discipline and the use of tools like Maven and perhaps even containers like Docker, classpath problems can be largely avoided even without runtime guarantees. So modules’ second guarantee, strong encapsulation, is the more important one. Strictly enforcing, at runtime, a well-defined, explicit API has significant consequences for both maintainability and security.

The Java compiler will not let you compile code that accesses another class’s private field; but even if you compiled your code against an older version where the field was public, or had the bytecode generated by some Java Agent, the JVM would perform access checks at runtime and block the attempt. The integrity of the platform depends on it. Modules work in a similar way. The compiler will check access according to module configuration (though not for reflective access!) but at runtime the VM will enforce that all access, even reflective access, goes through the module’s declared API.
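
Since the JDK is itself made of modules, you can observe this enforcement on any recent Java installation. The sketch below is a probe, not a recipe: it asks java.base which packages it exports, then attempts deep reflection on String’s private value field with trySetAccessible, which reports failure instead of throwing. I’m assuming JDK 17 or later, run from the classpath with no --add-opens flags; on JDK 9 to 15 the default settings still permitted this access with a warning.

```java
import java.lang.reflect.Field;

final class EncapsulationProbe {

    // The module that contains java.lang.Object, i.e. java.base.
    static final Module JAVA_BASE = Object.class.getModule();

    // True when java.base exports the package to every module.
    static boolean isApi(String packageName) {
        return JAVA_BASE.isExported(packageName);
    }

    // Attempts deep reflection on a private field of java.lang.String.
    // Returns false when the module system refuses the request.
    static boolean canDeepReflectIntoString() {
        try {
            Field value = String.class.getDeclaredField("value");
            return value.trySetAccessible();
        } catch (NoSuchFieldException e) {
            // The field name is a JDK implementation detail and could change.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("module of Object:           " + JAVA_BASE.getName());
        System.out.println("java.lang exported:         " + isApi("java.lang"));
        System.out.println("jdk.internal.misc exported: " + isApi("jdk.internal.misc"));
        System.out.println("deep reflection on String:  " + canDeepReflectIntoString());
    }
}
```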

Without an explicit API enforced at runtime, every class, method and field in a library could become part of a de facto API, as clients choose to reach for it and use it as such. Even though library authors would be allowed — per the unwritten contract between them and their users — to change internal classes at will, doing so may end up breaking code, and that, in turn, will mean less uptake of new versions and an overall slowdown of the library’s evolution. That is precisely what happened with Java 9. Even though Java’s specification — its documented API — remained backward compatible with JDK 8 (with the exception of a handful of methods virtually no one had used), many libraries had reached for internal JDK classes. When those internals changed in 9, those libraries broke, which slowed down the adoption of new JDK versions. Strong encapsulation makes libraries more maintainable, as their authors are free to change internals without fear they’ve been used as an ad hoc API by some user.

Of course, backward compatibility goes well beyond interfaces, and changes to logic also frequently cause problems, but strong encapsulation eliminates some, and significantly reduces the challenge. If Java had had the module system from day one, upgrading JDK versions would have been significantly easier. That’s not due to some special nature of the JDK, but just because it’s popular. While not many libraries are as big and as popular as the JDK, some — like, say, Spring or JUnit — are big enough, and are as popular as some fairly popular programming languages. Such libraries would enjoy a significantly smoother evolution when they adopt strong encapsulation.

Over their lifetime, big applications maintained by multiple teams experience a similar symptom to that of the ecosystem at large. Instead of waiting for another team to add an API for some operation they need, one team will carve that API for itself, just because it’s quicker, by reaching into internals and bypassing the documented API for the other team’s component — perhaps directly, which could be detected at build time, or perhaps reflectively, which could only be detected at runtime. Over time, the application’s components become entangled, making evolution painful as any change anywhere can affect pretty much anything else. Looser coupling of components through an uncompromisable API is one of the main motivations of the microservice architecture; Java modules give you that inside a single Java process.

Finally, strong encapsulation improves security. Suppose your application contains code like so:

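public UserData safelyRetrieveUserData() {
    // Only authorised callers may reach the sensitive operation.
    if (currentUserIsAuthorized())
        return getUserData();
    // Refuse everyone else rather than falling off the end of the method.
    throw new SecurityException("current user is not authorised");
}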

This code, which is similar to all authorisation code, assumes that the sensitive operation, getUserData, is only ever called by safelyRetrieveUserData after an appropriate credentials check. But without strong encapsulation, even if getUserData were private, it could be called directly, bypassing safelyRetrieveUserData through so-called “deep reflection” (the use of setAccessible to disable access checks). This does not require any malicious code in your application. Rather, such an exploit would require a well-meaning software component that already performs deep reflection for benign purposes and a vulnerability that would allow a remote attacker to trick that benevolent component into applying its deep reflection to getUserData by cleverly manipulating inputs (for example, a JSON serialisation library might employ deep reflection to access private fields or methods based on the strings that appear in its input). Such vulnerable code might lie in some transitive dependency without you even knowing it’s there.
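
Here is a minimal sketch of that bypass. It is meant to be run from the classpath, where all code shares the unnamed module and nothing stops deep reflection. The UserService class, the boolean standing in for currentUserIsAuthorized, and the String standing in for UserData are all assumptions made for illustration. The bypass method plays the part of the well-meaning component that reflects on whatever name appears in its input.

```java
import java.lang.reflect.Method;

// Hypothetical service with a guarded public entry point and a private operation.
final class UserService {
    private final boolean authorized;

    UserService(boolean authorized) {
        this.authorized = authorized;
    }

    public String safelyRetrieveUserData() {
        if (!authorized) {
            throw new SecurityException("current user is not authorised");
        }
        return getUserData();
    }

    private String getUserData() {
        return "sensitive user data";
    }
}

final class DeepReflectionSketch {

    // Invokes a no-argument method by name, private or not. The name is
    // treated as input, as it might be in a serialisation library.
    static String bypass(UserService service, String methodName) {
        try {
            Method method = UserService.class.getDeclaredMethod(methodName);
            // Succeeds here because caller and target are in the same module.
            method.setAccessible(true);
            return (String) method.invoke(service);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        UserService unauthorised = new UserService(false);
        try {
            unauthorised.safelyRetrieveUserData();
        } catch (SecurityException e) {
            System.out.println("front door: " + e.getMessage());
        }
        System.out.println("back door:  " + bypass(unauthorised, "getUserData"));
    }
}
```

If UserService lived in a named module that did not open its package to the reflecting code, the setAccessible call would throw InaccessibleObjectException instead of succeeding.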

Strong encapsulation has closed vulnerabilities in the JDK, and it will probably do the same for your application. Good security requires (among other things) a well-defined, preferably small, boundary — its “attack surface area” — that is then defended. Without one, every method and every field in the application becomes part of its border, making an effective defence difficult. The module system’s strong encapsulation is a solid foundation for security, and is gradually becoming the core of the platform’s security strategy.

In conclusion,

while we wouldn’t know it from their name alone, the chief purpose of Java modules, that is unique to them, is to make two kinds of strong guarantees at runtime: reliable configuration and strong encapsulation. Reliable configuration prevents certain slippery configuration bugs. The more important strong encapsulation ensures that interaction with code units takes place only via their explicit API, and has significant implications for long term maintainability as well as the security of Java code.

To learn more about modules, you can watch these Java Channel videos by Alex Buckley: Modules in JDK 9, and Modules and Services. There are also two good books on the subject: Java 9 Modularity by Paul Bakker and Sander Mak, and The Java Module System, by Nicolai Parlog.


Addendum: The Multi-Version Coexistence Problem

Do modules let me use two versions of a library at the same time?

Modules exist to do an important job and they do it well, but we might wish they also did other jobs, such as solve a problem I’ll call “multi-version coexistence.”

Suppose your application employs two libraries, libA and libB, both using a logging library called superlogger, but whereas libA uses version 1 of superlogger, libB depends on the incompatible version 2. Some had hoped that a module system could allow the two incompatible versions of the logging library to coexist in the same process without harmful interference. Specifically, the mechanism that can support multiple versions of classes in the same process is class loader isolation. The module system could have given each module its own class loader, arranged in a hierarchy derived from the modules’ dependency graph, but it doesn’t, at least not by default.

For one, doing so would be more disruptive to the Java ecosystem than all the changes in JDK 8 and 9 combined. Even when the Java SE specification remains backward-compatible, changes to JDK behaviour may break things that make assumptions about the workings of the Java runtime that aren’t guaranteed by the spec. The assumption that the runtime’s class loader hierarchy is shallow and known in advance runs so deep and so wide in the ecosystem, that too many popular frameworks would just cease functioning correctly if modules enforced class loader isolation by default.

For another, class loader isolation is insufficient to allow multiple instances of libraries to coexist. Suppose that the logger is configured with a system property, like so: -Dcom.acme.superlogger.logfile=myApp.log. Both versions would then use the same output file, and as they each probably synchronise writes to the file with a lock, two instances would mean two locks and a corrupted log (or version 2 could have even changed the file format). Class loader isolation is only sufficient for multi-version coexistence for some libraries and not others, depending on how they operate.

Given that class loader isolation is both disruptive and doesn’t even solve the problem in the general case but only as a best-effort attempt, the module system does not do it by default. Nevertheless, modules do support class loader isolation, for whatever it’s worth, with module layers. Layers might be appropriate at times (for example, they’re useful for a plugin architecture), and third-party libraries like Layrry can construct a layer hierarchy based on a configuration file.
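
For the curious, here is roughly what that looks like with the ModuleLayer API. Treat it as a structural sketch only: the two superlogger modules are empty descriptors served from memory (there are no classes to load), the module and package names are the hypothetical ones from the example above, and each layer is simply given a fresh class loader of its own. It shows that two versions of a module with the same name and the same package can be defined side by side in one process when they sit in separate layers.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.net.URI;
import java.util.Optional;
import java.util.Set;

final class LayerSketch {

    // In-memory finder holding one empty "superlogger" module of the given version.
    static ModuleFinder superlogger(String version) {
        ModuleDescriptor descriptor = ModuleDescriptor.newModule("superlogger")
                .version(version)
                .exports("com.acme.superlogger")
                .build();
        ModuleReference ref =
                new ModuleReference(descriptor, URI.create("module:/superlogger-" + version)) {
                    @Override public ModuleReader open() {
                        throw new UnsupportedOperationException("no content in this sketch");
                    }
                };
        return new ModuleFinder() {
            @Override public Optional<ModuleReference> find(String name) {
                return name.equals("superlogger") ? Optional.of(ref) : Optional.empty();
            }
            @Override public Set<ModuleReference> findAll() {
                return Set.of(ref);
            }
        };
    }

    // Creates a child layer of the boot layer containing that version of the
    // module, defined to a class loader created just for this layer.
    static Module moduleInOwnLayer(String version) {
        ModuleLayer boot = ModuleLayer.boot();
        Configuration configuration = boot.configuration()
                .resolve(superlogger(version), ModuleFinder.of(), Set.of("superlogger"));
        ClassLoader loader = new ClassLoader() { };
        ModuleLayer layer = boot.defineModules(configuration, moduleName -> loader);
        return layer.findModule("superlogger").orElseThrow();
    }

    public static void main(String[] args) {
        Module v1 = moduleInOwnLayer("1");
        Module v2 = moduleInOwnLayer("2");
        System.out.println(v1.getDescriptor().toNameAndVersion() + " in " + v1.getLayer());
        System.out.println(v2.getDescriptor().toNameAndVersion() + " in " + v2.getLayer());
        System.out.println("separate class loaders: " + (v1.getClassLoader() != v2.getClassLoader()));
    }
}
```

Note that nothing here addresses the shared system property or the shared log file; the layers isolate classes, not the rest of the process.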