Model Data Immutably and Transparently - Data-Oriented Programming v1.1

Nicolai Parlog on May 27, 2024

To model data immutably and transparently is one of the four principles of data-oriented programming. In this article, the second in a series that refines these principles in a version 1.1, we explore why immutability and transparency are important when modeling data and how to use Java’s features, particularly records, to achieve that.

Immutability and Transparency

A common source of software errors is the proliferation of objects that are modified by different subsystems. Again and again, code at one end of the code base changes an instance without code at the other end noticing, even though it needs to react to the change.

A particularly simple and drastic example is storing an object in a HashSet and later changing a value that is used in the hash code calculation. The HashSet does not notice this change and can’t re-enter the object under its new hash code. As a result, the object is suddenly undiscoverable.
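The effect can be reproduced in a few lines. The following is a minimal sketch, not code from the article: `Tag` and its `rename` method are hypothetical names, chosen only to have a class whose hash code depends on a mutable field.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutableKeyDemo {

	// hypothetical mutable class whose hash code depends on a mutable field
	static final class Tag {
		private String name;

		Tag(String name) {
			this.name = name;
		}

		void rename(String newName) {
			this.name = newName;
		}

		@Override
		public boolean equals(Object other) {
			return other instanceof Tag tag && Objects.equals(name, tag.name);
		}

		@Override
		public int hashCode() {
			return Objects.hashCode(name);
		}
	}

	// returns whether the set still finds the instance after it was mutated
	static boolean foundAfterMutation() {
		Set<Tag> tags = new HashSet<>();
		Tag tag = new Tag("java");
		tags.add(tag);
		// the set is not informed and keeps the entry under the old hash code
		tag.rename("jvm");
		return tags.contains(tag);
	}

	public static void main(String[] args) {
		System.out.println(foundAfterMutation()); // false
	}
}
```

The instance is still in the set (iterating over it would find it), but a lookup searches under the new hash code and comes back empty.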

In this example, two subsystems (the HashSet and the code that modifies the object) have access to the same object but have different requirements for modifying it and no way to communicate them; developers simply have to know them. Here they usually do: most Java developers know that computing hash codes from mutable fields is problematic. But that is only because one of the two affected systems is well known and has a simple contract (“don’t do it”). In more complex, self-built systems, such requirements are much harder to keep track of.

The simplest approach that guarantees correctness is immutability: If nothing can change, such errors cannot occur. And if subsystems only communicate with immutable data, then this common source of errors entirely disappears.

But if the data cannot change, the necessary state changes must take place in the systems that process it. And just as a mutable object can take its entire state into account before changing it, these systems now have to take the entire state of the processed objects into account (more on this in the article on operations) and for that the objects must be transparent. An object is transparent if its internal state is accessible and constructable via the API, i.e.:

There must be an access method for each field that returns the same (==) or at least an equal (equals) value.
There must be a constructor that accepts a value for each field and, if the values are valid, stores them directly or at least as copies.

Taken together, this means that given an existing instance you can create a new one that is indistinguishable from the first apart from its identity (==) by querying all fields and calling the appropriate constructor.
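To make the two requirements concrete, here is a minimal sketch of a hand-written class that fulfills them. `Range`, its fields, and the `rebuild` helper are hypothetical and serve only as illustration.

```java
import java.util.Objects;

class TransparencyDemo {

	// hypothetical class, written by hand to satisfy both requirements
	static final class Range {
		private final int low;
		private final int high;

		// the constructor accepts a value for every field and checks validity
		Range(int low, int high) {
			if (low > high)
				throw new IllegalArgumentException("low must not exceed high");
			this.low = low;
			this.high = high;
		}

		// one access method per field
		int low() {
			return low;
		}

		int high() {
			return high;
		}

		@Override
		public boolean equals(Object other) {
			return other instanceof Range range
					&& low == range.low
					&& high == range.high;
		}

		@Override
		public int hashCode() {
			return Objects.hash(low, high);
		}
	}

	// creates a new instance from nothing but the public API of an existing one
	static Range rebuild(Range original) {
		return new Range(original.low(), original.high());
	}

	public static void main(String[] args) {
		Range original = new Range(1, 10);
		Range copy = rebuild(original);
		System.out.println(copy.equals(original)); // true: same data
		System.out.println(copy == original);      // false: different identity
	}
}
```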

Records

So we want to work with transparent carriers of immutable data. And as luck would have it, records were designed just so! Finalized in Java 16, records describe data as part of their type definition by declaring so-called components, each of which specifies a type and a name. If, for example, we want to model the data of a book with title, ISBN, and authors, the natural way to do that is as follows:

record Book(String title, ISBN isbn, List<Author> authors) { }

For records to function as transparent data carriers, a number of requirements must be met:

There must be a field for each component that stores its value.
These fields must be final (“immutable data”).
There must be a canonical constructor that accepts and assigns exactly these values, as well as accessor methods that return them (transparency in construction and access).
The type must be final (otherwise the record’s components would not fully describe the data).
The equals and hashCode methods must be based on this data and not on the identity of the record instance (“carrier of data”).

Instead of leaving it to us to fulfill these requirements, Java takes care of that and generates all these things. (This then is the reduction in boilerplate that we enjoy when using records, but it is important to understand that this is not their purpose but a welcome side effect of their actual purpose: to be transparent carriers of immutable data.)
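A short sketch shows the generated members at work. The article does not define `ISBN` and `Author`, so the single-component records below are simplified stand-ins, and `threeBodyProblem` is a hypothetical helper.

```java
import java.util.List;

class RecordBasicsDemo {

	// simplified stand-ins for the article's types (assumptions of this sketch)
	record ISBN(String value) { }
	record Author(String name) { }

	record Book(String title, ISBN isbn, List<Author> authors) { }

	static Book threeBodyProblem() {
		return new Book(
			"The Three-Body Problem",
			new ISBN("978-0765382030"),
			List.of(new Author("Liu Cixin")));
	}

	public static void main(String[] args) {
		Book first = threeBodyProblem();
		Book second = threeBodyProblem();

		// generated accessors are named after the components
		System.out.println(first.title());
		// generated equals and hashCode are based on the data ...
		System.out.println(first.equals(second));                  // true
		System.out.println(first.hashCode() == second.hashCode()); // true
		// ... and not on the identity of the instances
		System.out.println(first == second);                       // false
	}
}
```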

That’s why you can define simple records in a single line, although we’ll soon see that, in practice, adjustments are very common. And those are entirely possible:

The canonical constructor can be declared explicitly and the accessor methods, equals, and hashCode can be overridden, so all of them can be customized.
It is possible to add more constructors and arbitrary methods (but not instance fields or “private components” as this contradicts transparency).
Records can implement interfaces.
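As a sketch of these options, the record below adds a second constructor and a method and implements an interface. The `Titled` interface and the `hasSingleAuthor` method are hypothetical, and the ISBN component is left out to keep the example short.

```java
import java.util.List;

class RecordCustomizationDemo {

	// hypothetical interface, not part of the article
	interface Titled {
		String title();
	}

	record Author(String name) { }

	// the generated accessor `title()` implements the interface method
	record Book(String title, List<Author> authors) implements Titled {

		// additional constructor; it has to delegate to the canonical one
		Book(String title, Author author) {
			this(title, List.of(author));
		}

		// additional method that is derived from the components
		boolean hasSingleAuthor() {
			return authors.size() == 1;
		}
	}

	public static void main(String[] args) {
		Book book = new Book("The Three-Body Problem", new Author("Liu Cixin"));
		Titled titled = book;
		System.out.println(titled.title());
		System.out.println(book.hasSingleAuthor()); // true
	}
}
```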

Before we continue, I want to point out that records simplify data-oriented programming but are neither required nor enforced by it. For example, if one of their limitations prevents their use for a particular type, you can design it as a normal class as long as you still adhere to the DOP principles. In the context of this principle, this means designing the class so that it is immutable and transparent.

Immutability in Depth

Record fields are final, but that doesn’t magically apply to what they reference:

record Book(String title, ISBN isbn, List<Author> authors) { }

// elsewhere
var threeBP = new Book(
	"The Three-Body Problem",
	new ISBN("978-0765382030"),
	new ArrayList<>());
threeBP
	.authors()
	.add(new Author("Liu Cixin"));

In this example, the list of authors could be changed after construction! To prevent that, records should, if possible, create immutable copies of mutable data structures in their constructors. The copyOf methods of List, Set and Map are suitable for Java collections:

record Book(String title, ISBN isbn, List<Author> authors) {

	Book {
		authors = List.copyOf(authors);
	}

}

Here I used a compact constructor, which does not require an explicit parameter list or assignments to fields. The parameters of a compact constructor are precisely the components of the record and after the code block is executed, the values are automatically assigned to the fields. So the constructor has to contain only what is absolutely necessary - here the copy of the author list by calling List.copyOf. And since the resulting list is immutable, calling authors().add(...) as above would result in an exception.
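The following sketch exercises that constructor from the outside. As before, `ISBN` and `Author` are simplified stand-ins, and `createFrom` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

class ImmutableCopyDemo {

	// simplified stand-ins for the article's types (assumptions of this sketch)
	record ISBN(String value) { }
	record Author(String name) { }

	record Book(String title, ISBN isbn, List<Author> authors) {

		Book {
			authors = List.copyOf(authors);
		}

	}

	static Book createFrom(List<Author> authors) {
		return new Book(
			"The Three-Body Problem",
			new ISBN("978-0765382030"),
			authors);
	}

	public static void main(String[] args) {
		List<Author> mutable = new ArrayList<>();
		mutable.add(new Author("Liu Cixin"));
		Book book = createFrom(mutable);

		// changing the original list no longer affects the record
		mutable.clear();
		System.out.println(book.authors().size()); // 1

		// the record's own list rejects modifications
		try {
			book.authors().add(new Author("Another Author"));
		} catch (UnsupportedOperationException ex) {
			System.out.println("authors are immutable");
		}
	}
}
```

Note that `List.copyOf` rejects `null`, both as the list itself and as an element, with a `NullPointerException`.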

This can be more complicated for other data structures, especially your own. If there is no way to create immutable copies, you can make sure that nobody has a reference to the record’s inner state by creating a copy in the constructor and then another copy in the overridden accessor method:

// assume `ISBN` is a mutable class that has a copy constructor
record Book(String title, ISBN isbn, List<Author> authors) {

	Book {
		authors = List.copyOf(authors);
		// create a copy, so references to
		// the `isbn` argument can't change
		// the record's internal state
		isbn = new ISBN(isbn);
	}

	@Override
	public ISBN isbn() {
		// don't expose mutable inner state
		return new ISBN(isbn);
	}

}

Although receiving a fresh copy on every call can be unexpected and can also lead to bugs, it is typically less problematic than allowing the record state itself to change.
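To see both the protection and the potential surprise, here is a runnable sketch of the defensive copies. The mutable `ISBN` class with its `setValue` method and copy constructor is a hypothetical stand-in for the class the example assumes, and `Author` is again simplified.

```java
import java.util.List;
import java.util.Objects;

class DefensiveCopyDemo {

	// hypothetical mutable class with a copy constructor
	static final class ISBN {
		private String value;

		ISBN(String value) {
			this.value = value;
		}

		ISBN(ISBN other) {
			this(other.value);
		}

		String value() {
			return value;
		}

		void setValue(String value) {
			this.value = value;
		}

		@Override
		public boolean equals(Object other) {
			return other instanceof ISBN isbn && Objects.equals(value, isbn.value);
		}

		@Override
		public int hashCode() {
			return Objects.hashCode(value);
		}
	}

	record Author(String name) { }

	record Book(String title, ISBN isbn, List<Author> authors) {

		Book {
			authors = List.copyOf(authors);
			isbn = new ISBN(isbn);
		}

		@Override
		public ISBN isbn() {
			return new ISBN(isbn);
		}

	}

	public static void main(String[] args) {
		ISBN argument = new ISBN("978-0765382030");
		Book book = new Book("The Three-Body Problem", argument, List.of());

		// neither the argument nor a returned value reaches the record's state
		argument.setValue("changed");
		book.isbn().setValue("also changed");
		System.out.println(book.isbn().value()); // 978-0765382030

		// the possibly unexpected part: each call returns a new, equal instance
		System.out.println(book.isbn() == book.isbn());      // false
		System.out.println(book.isbn().equals(book.isbn())); // true
	}
}
```

Because the accessor returns an equal rather than the same value, the record stays transparent only if the mutable class implements `equals` based on its data, as the stand-in does here.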

If there is no technical solution, perhaps a communicative one will help: A team can agree to treat everything they get from a record as immutable and not to call methods that change the data structure.

Summary

A reliable way to reduce bugs in a code base is to limit the reach of potentially troublesome actions, and at the top of that list is the mutation of state that is shared across multiple subsystems. Data-oriented programming proposes that subsystems communicate via data that is modeled immutably and transparently. Java makes this particularly easy with records, although a little care needs to be taken when records reference mutable data structures.

Learn more about version 1.1 of data-oriented programming in this article series:

Data-Oriented Programming in Java - Version 1.1
Model data immutably and transparently - DOP v1.1 (this article)
Model the data, the whole data, and nothing but the data - DOP v1.1
Make illegal states unrepresentable - DOP v1.1
Separate operations from data - DOP v1.1
Wrapping up DOP v1.1
Bonus: Why Update DOP to Version 1.1?