Separate Operations From Data - Data-Oriented Programming v1.1

Nicolai Parlog on June 5, 2024

Not surprisingly, data-oriented programming (DOP) has a strong focus on data. In fact, three of the four guiding principles of DOP v1.1, which we’re exploring in this series, advise how to best model that. In this article, we’ll examine the fourth principle, which concerns the methods that implement most of the domain logic. It advises separating operations from data.

We’ll continue to use the example of a simple sales platform that sells books, furniture, and electronic devices, each of which is modeled by a simple record. They all implement the sealed interface Item, which declares no methods because there are none that the three implementations share.

Operations

When exploring how to model data, I laid out which methods fit well on records and which less so. I basically excluded all methods that contain non-trivial domain logic or interact with types that don’t represent data - let’s call them operations. Operations turn an extensive but ultimately lifeless data representation into a living system with moving parts.

In data-oriented programming, operations should not be defined on records but on other classes. Adding an item to the shopping cart would neither be Item.addToCart(Cart) nor Cart.add(Item) because Item and Cart are data and therefore immutable. Instead, the ordering system Orders should take over this task, for example with Orders.add(Cart, Item), which returns a new Cart instance that reflects the operation’s outcome.

If other subsystems need the current shopping cart, they should have a reference to Orders instead of a reference to a mutable shopping cart and, if necessary, query the current shopping cart of a user via Orders.getCartFor(User). Communication between subsystems isn’t implemented implicitly by sharing mutable state, but rather explicitly through requests for the current state. State changes are still possible, but there are restrictions on where they should take place - ideally only in the subsystems that are responsible for the respective subdomain.
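To make this more concrete, here is a minimal sketch of what such an ordering system could look like. The record components, the stand-in Book record, the in-memory map, and the helper addToCart are assumptions made for illustration; only Orders.add(Cart, Item) and Orders.getCartFor(User) are named in the text above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Data: immutable records without domain logic (all components are assumptions).
record User(String name) { }
sealed interface Item permits Book { }
record Book(String title) implements Item { }
record Cart(List<Item> items) {
	Cart {
		items = List.copyOf(items); // defensive copy keeps the cart immutable
	}
}

// Operations: the subsystem responsible for shopping carts.
class Orders {

	private final Map<User, Cart> carts = new HashMap<>();

	// Returns a new cart that reflects the outcome; `cart` itself is unchanged.
	static Cart add(Cart cart, Item item) {
		List<Item> items = new ArrayList<>(cart.items());
		items.add(item);
		return new Cart(items);
	}

	// Hypothetical helper: the state change happens here, inside the subsystem.
	void addToCart(User user, Item item) {
		carts.put(user, add(getCartFor(user), item));
	}

	// Other subsystems request the current state instead of sharing a mutable cart.
	Cart getCartFor(User user) {
		return carts.getOrDefault(user, new Cart(List.of()));
	}
}
```

A Cart that was obtained earlier never changes; whoever needs the current one asks Orders again.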

But how are these operations implemented? At first glance it seems quite difficult to do anything useful with an Item if the interface does not define any methods.

Pattern Matching

This is where pattern matching with switch comes into play. The switch statement has recently been improved in quite a few areas:

It can be used as an expression, for example to assign a value to a variable with var foo = switch ....
If a case label is followed by an arrow -> (instead of a colon :), there is no fall-through.
The selector expression (the variable or expression in parentheses that follows the keyword switch; colloquially, what is being “switched over”) can have any type.

It is the last point that is crucial here: If the selector expression does not have any of the originally permitted types (numbers, strings, enums), it is not matched against concrete values but against patterns - hence pattern matching. The value of the selector expression is compared to one pattern after another, top to bottom, until one matches. Then, the branch on the right side of the label is executed. (The actual implementation is optimized and works non-linearly.)

In their simplest form, patterns are type patterns like the one we’ve used when implementing equals. Processing an item, for example, looks as follows:

public ShipmentInfo ship(Item item) {
	return switch (item) {
		case Book book -> // use `book`
		case Furniture furniture -> // use `furniture`
		case ElectronicItem eItem -> // use `eItem`
	};
}

Here, the variable item is compared to the types on the left and if it is, for example, a piece of furniture, the type pattern case Furniture furniture matches. This declares a variable furniture of type Furniture and casts item into it before executing the associated branch, where furniture can then be used. On the right side of the arrow, the logic that matches the operation (here: shipping an item) and the specific data (here: an instance of Book, Furniture or ElectronicItem) can then be executed. And because data is modeled transparently, all information is available to the operation.
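A compilable version of this operation could look as follows. It is only a sketch: the components of Furniture, ElectronicItem, and ShipmentInfo as well as the carrier names are invented for this example.

```java
import java.util.List;

record ISBN(String isbn) { }
record Author(String name) { }

sealed interface Item permits Book, Furniture, ElectronicItem { }
record Book(String title, ISBN isbn, List<Author> authors) implements Item { }
record Furniture(String name, int weightInKg) implements Item { }            // components assumed
record ElectronicItem(String name, boolean containsBattery) implements Item { } // components assumed

record ShipmentInfo(String carrier, String note) { } // shape assumed

class Shipments {

	static ShipmentInfo ship(Item item) {
		return switch (item) {
			// each branch sees the concrete type and all of its data
			case Book book -> new ShipmentInfo("letter post", book.title());
			case Furniture furniture -> new ShipmentInfo("freight", furniture.weightInKg() + " kg");
			case ElectronicItem eItem -> new ShipmentInfo(
					"parcel", eItem.containsBattery() ? "battery inside" : "no battery");
		};
	}
}
```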

This ultimately implements dynamic dispatch: selecting which piece of code should be executed for a given type. If we had defined the method ship on the interface Item and then called item.ship(...), the runtime would decide which of the implementations Book.ship(...), Furniture.ship(...) and ElectronicItem.ship(...) ends up being executed. With switch we do this manually, which allows us not to define the methods on the interface. We have already highlighted some of the reasons why this makes sense:

Records should not implement non-trivial domain logic but remain simple data.
Records should not execute operations but be processed by them.
Many operations are difficult to implement on immutable records.

Another important reason has emerged during the short discussion about object-oriented programming (OOP) in this series’ first article: Types that model central domain concepts tend to attract too much functionality and therefore become difficult to maintain. DOP avoids this by placing the operations in the respective subsystems, i.e. Shipments.ship(Item) instead of Item.ship(Shipments) (where Shipments is the system responsible for deliveries).

The requirement to separate operations from the types they operate on is well-known in OOP, too. The Gang of Four has even documented a design pattern (no relation to pattern matching) called the visitor pattern that meets exactly this requirement. In this respect, DOP is in good company, but thanks to modern language features, it can use pattern matching, which is much simpler and more direct than the visitor pattern.
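For comparison, here is a sketch of the same operation implemented both ways. All names besides Item, Book, Furniture, and ElectronicItem are made up, and the record components are reduced to a minimum.

```java
// Visitor: one method per item type.
interface ItemVisitor<R> {
	R visit(Book book);
	R visit(Furniture furniture);
	R visit(ElectronicItem eItem);
}

sealed interface Item permits Book, Furniture, ElectronicItem {
	// only needed for the visitor; the switch-based variant doesn't use it
	<R> R accept(ItemVisitor<R> visitor);
}

record Book(String title) implements Item {
	public <R> R accept(ItemVisitor<R> visitor) { return visitor.visit(this); }
}
record Furniture(String name) implements Item {
	public <R> R accept(ItemVisitor<R> visitor) { return visitor.visit(this); }
}
record ElectronicItem(String name) implements Item {
	public <R> R accept(ItemVisitor<R> visitor) { return visitor.visit(this); }
}

// With the visitor pattern, each operation is a visitor implementation.
class CarrierVisitor implements ItemVisitor<String> {
	public String visit(Book book) { return "letter post"; }
	public String visit(Furniture furniture) { return "freight"; }
	public String visit(ElectronicItem eItem) { return "parcel"; }
}

class Shipments {

	// visitor-based dispatch: item -> accept -> visit
	static String carrierViaVisitor(Item item) {
		return item.accept(new CarrierVisitor());
	}

	// pattern-matching dispatch: the same decision in one place
	static String carrierViaSwitch(Item item) {
		return switch (item) {
			case Book book -> "letter post";
			case Furniture furniture -> "freight";
			case ElectronicItem eItem -> "parcel";
		};
	}
}
```

Both variants return the same result, but the switch needs neither the accept methods on the data types nor the additional visitor interface.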

More detailed patterns

Type patterns in switch are essential for data-oriented programming. This may not apply to the five other types of patterns Java supports (or is about to), but they are certainly helpful, which is why we will briefly discuss them here. Each section includes a reference to the JDK Enhancement Proposal (JEP) that introduced the feature in detail.

Record Patterns

Record patterns were finalized in Java 21 by JEP 440 and allow a record to be deconstructed directly during matching:

switch(item) {
	case Book(String title, ISBN isbn, List<Author> authors) -> // use `title`, `isbn`, and `authors`
	// more cases...
}

You can alternatively use var, in which case the code in parentheses would be var title, var isbn, var authors, or any mix of var and explicit types if you want to make your colleagues really angry.
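As a small, self-contained illustration, the following sketch uses var for the book's components and an explicit type for a piece of furniture. The class Labels, the method label, and the single component of Furniture are assumptions.

```java
import java.util.List;

record ISBN(String isbn) { }
record Author(String name) { }

sealed interface Item permits Book, Furniture { }
record Book(String title, ISBN isbn, List<Author> authors) implements Item { }
record Furniture(String name) implements Item { } // component assumed

class Labels {

	static String label(Item item) {
		return switch (item) {
			// `var` lets the compiler infer String, ISBN, and List<Author>
			case Book(var title, var isbn, var authors) ->
					title + " (" + isbn.isbn() + "), " + authors.size() + " author(s)";
			// explicit component type
			case Furniture(String name) -> name;
		};
	}
}
```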

Unnamed Patterns

Breaking down records is very convenient, but having to list all components every time is annoying when you only need some of them. This is where unnamed patterns come in, which were standardized by JEP 456 in Java 22. They allow replacing unnecessary patterns with the single underscore _:

switch(item) {
	case Book(_, ISBN isbn, _) -> // use `isbn`
	// more cases...
}

Unnamed patterns can also be used at the top level:

switch(item) {
	case Book book -> // use `book`
	case Furniture _ -> // no additional variable in scope
	// more cases...
}

We will see later, when we get to maintainability, why this is a crucial feature.

Nested Patterns

Since the finalization of patterns in switch in Java 21 by JEP 441, patterns can be nested inside each other. This allows us to dig deeper into a record, for example with two nested record patterns. Assuming that ISBN is also a record, it can look like this:

switch(item) {
	case Book(_, ISBN(String isbn), _) -> // use `isbn`
	// more cases...
}
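Nesting doesn't depend on unnamed patterns. The following sketch compiles on Java 21 without preview features by using var for the components that aren't needed; the class Catalog, the method isbnOf, and the Furniture component are assumptions.

```java
import java.util.List;

record ISBN(String isbn) { }
record Author(String name) { }

sealed interface Item permits Book, Furniture { }
record Book(String title, ISBN isbn, List<Author> authors) implements Item { }
record Furniture(String name) implements Item { } // component assumed

class Catalog {

	static String isbnOf(Item item) {
		return switch (item) {
			// the inner record pattern deconstructs the book's ISBN right away
			case Book(var title, ISBN(String isbn), var authors) -> isbn;
			case Furniture furniture -> "n/a";
		};
	}
}
```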

Guarded Patterns

If the domain logic needs to distinguish not only by type but also by value, it might seem natural to simply use an if on the right side:

switch(item) {
	case Book(String title, _, _) -> {
		if (title.length() > 30)
			// handle long title
		else
			// handle regular title
	}
	// more cases...
}

Guarded patterns were also part of JEP 441 and they allow such conditions to be pushed to the left:

switch(item) {
	case Book(String title, _, _) when title.length() > 30 -> // handle long title
	case Book(String title, _, _) -> // handle regular title
	// more cases...
}

This has a few advantages:

All conditions, i.e. which type and which value is selected, are found on the left, improving the code’s structure and readability.
If different components are required for different branches, the ones that are not required can be conveniently ignored.
Guarded patterns are integrated into the completeness check that we will discuss in the next section.
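Filled in with some made-up logic, a guarded switch could look like this sketch. The class Titles, the method describe, and the Furniture component are assumptions, and var is used instead of _ so that it compiles on Java 21.

```java
import java.util.List;

record ISBN(String isbn) { }
record Author(String name) { }

sealed interface Item permits Book, Furniture { }
record Book(String title, ISBN isbn, List<Author> authors) implements Item { }
record Furniture(String name) implements Item { } // component assumed

class Titles {

	static String describe(Item item) {
		return switch (item) {
			// the guarded branch has to come first: an unguarded `Book` pattern
			// above it would dominate it, which the compiler rejects
			case Book(var title, var isbn, var authors) when title.length() > 30 ->
					"long: " + title.substring(0, 30) + "...";
			case Book(var title, var isbn, var authors) -> "regular: " + title;
			case Furniture furniture -> "no title";
		};
	}
}
```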

Primitive Patterns

Lastly, a quick word about primitive patterns, which were introduced by JEP 455 as a preview feature in Java 23. They allow switch statements over primitive types (i.e. “classic” switches) to be extended with patterns, which makes it easier to capture the value of a selector expression and allows it to be used in a guarded pattern:

switch (Rankings.of(book).currentRank()) {
	case 1 -> firstPlace(book);
	case 2 -> secondPlace(book);
	case 3 -> thirdPlace(book);
	case int n when n <= 10 -> topTenPlace(book, n);
	case int n when n <= 100 -> nthPlace(book, n);
	case int n -> unranked(book, n);
}

Maintainability

A switch that compares by type will certainly give more than a few OOP veterans goosebumps. Should a glorified instanceof check really be the basis for a whole programming paradigm?

This idea is worth pursuing. Why is instanceof frowned upon? (Given the medium, this question is obviously rhetorical, but I still recommend taking a minute to come up with an answer before reading on.) The answer consists of two parts:

Code that works with an interface should work for all its implementations.
When adding a new implementation, a series of instanceof checks is chronically difficult to update because it’s hard to find.

In other words: Dynamic dispatch via instanceof checks is unreliable.
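The following sketch shows how that plays out. It assumes that a Poster type was added to the shop after the operation carrier had been written; the class and method names as well as the record components are made up.

```java
interface Item { }
record Book(String title) implements Item { }
record Furniture(String name) implements Item { }
record Poster(String motif) implements Item { } // added after `carrier` was written

class Shipments {

	static String carrier(Item item) {
		if (item instanceof Book)
			return "letter post";
		if (item instanceof Furniture)
			return "freight";
		// there is no check for `Poster` and nothing told us to add one
		return "unknown";
	}
}
```

The code compiles without complaint and posters silently take the fallback path.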

This is exactly why the visitor pattern has become widespread in object orientation: It also implements dynamic dispatch. (In case you lost count: After interface/implementation, switch with type patterns, and instanceof, this is now the fourth way to implement dynamic dispatch.) The visitor pattern does this in a way that is reliable, although somewhat cumbersome and sometimes difficult to understand because of its indirection. It is reliable because each new implementation of the visited interface generates a series of compile errors that can only be fixed by making every existing visitor (i.e. every operation) take the new type into account.

And here comes the crucial point: The same can apply to a switch with patterns!

Exhaustiveness

Such a switch must be exhaustive, meaning that for every possible instance that has the type of the selector expression, there must be a pattern that matches it or the compiler reports an error. There are three different ways to achieve this:

A default branch that catches all remaining instances at the end:

switch (item) {
	case Book book -> // ...
	case Furniture furniture -> // ...
	default -> // ...
}

A pattern that has the same type as the selector expression and thus has the same effect as default:

switch (item) {
	case Book book -> // ...
	case Furniture furniture -> // ...
	case Item i -> // ...
}

Listing all implementations of a sealed type:

switch (item) {
	case Book book -> // ...
	case Furniture furniture -> // ...
	case ElectronicItem eItem -> // ...
}

Unfortunately, the first two variants do not help us achieve our goal. Such a switch would still be exhaustive when adding a new implementation and would therefore not produce a compile error. So if posters were added to the web shop, they would silently end up in default (first variant) or in case Item i (second variant). In the third variant, however, there would be no branch for posters and so we’d get a compile error, which forces us to update the operation. Excellent.
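Here are the first and third variant side by side as a sketch, with made-up method names and minimal record components. The comments describe what would happen if a hypothetical record Poster were added to Item's permits clause.

```java
sealed interface Item permits Book, Furniture, ElectronicItem { }
record Book(String title) implements Item { }
record Furniture(String name) implements Item { }
record ElectronicItem(String name) implements Item { }

class Shipments {

	// With `default`: keeps compiling after adding `Poster`,
	// which would then silently be shipped as a parcel.
	static String carrierWithDefault(Item item) {
		return switch (item) {
			case Book book -> "letter post";
			case Furniture furniture -> "freight";
			default -> "parcel";
		};
	}

	// All implementations listed: stops compiling after adding `Poster`
	// because the switch would no longer be exhaustive.
	static String carrier(Item item) {
		return switch (item) {
			case Book book -> "letter post";
			case Furniture furniture -> "freight";
			case ElectronicItem eItem -> "parcel";
		};
	}
}
```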

In order for operations to be maintainable (meaning they cause compile errors if they do not explicitly cover all cases), there must be no default or catch-all branch, which is only possible when:

switching over a sealed interface (or sealed abstract class but we’re ignoring those) and
listing all implementations

The last point also explains why sealed interfaces work better than sealed classes (remember that nugget from two articles ago?). If Item were a non-abstract class, a switch with Book, Furniture, and ElectronicItem branches would not be exhaustive because there could be instances of Item itself and there is no branch for them. If you process it with case Item, though, this branch would also handle every new item, such as a poster, and there would be no compile errors.

The last section’s comment on completeness checks of guarded patterns should also make sense now.

switch(item) {
	case Book(String title, _, _) -> {
		if (title.length() > 30)
			// handle long title
	}
	// more cases for other types...
}

In this example, books with short titles would be ignored, which may be an oversight and probably not obvious in longer code. This wouldn’t have happened with guarded patterns:

switch(item) {
	case Book(String title, _, _) when title.length() > 30 -> // handle long title
	case Book _ -> { /* ignore short titles */ }
	// more cases...
}

Here, after case Book ... when ..., there must be a branch for all books, which then either fixes the bug that books with short titles were forgotten, or (as shown) makes it explicit that they are intentionally ignored.

Avoiding Default Branches

Finally, a note on default branches and how to avoid them. It happens every now and then that a switch really only wants to handle some cases and ignore the others or treat them collectively in some other way - a default branch seems to be the obvious solution:

switch(item) {
	case Book book -> createTableOfContents(book);
	default -> { }
}

As discussed, however, this should be avoided at all costs, and the addition of Magazine implements Item (magazines are not books but still require a table of contents) again highlights the problem: They would silently end up in the default branch. Instead, several case labels with unnamed patterns can be combined into one:

switch(item) {
	case Book book -> createTableOfContents(book);
	case Furniture _, ElectronicItem _ -> { }
}

This is a little more code than default ->, but produces the desired compile error when adding magazines and should therefore be preferred.

If you stick with Java 21 for the time being, you can only use unnamed patterns as a preview feature. Since it was finalized without changes in Java 22, this would be conceivable. But be aware that, when activating preview features with --enable-preview, all of them become available and you have to be careful not to use other, more volatile preview features (like string templates, for example 😬).
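If enabling preview features isn't an option, a default branch can still be avoided on Java 21 by giving each ignored type its own branch with a named but unused variable. The following sketch assumes a hypothetical class Catalog that collects tables of contents in a list; the record components are made up as well.

```java
import java.util.ArrayList;
import java.util.List;

sealed interface Item permits Book, Furniture, ElectronicItem { }
record Book(String title) implements Item { }
record Furniture(String name) implements Item { }
record ElectronicItem(String name) implements Item { }

class Catalog {

	final List<String> tablesOfContents = new ArrayList<>();

	void createTableOfContents(Book book) {
		tablesOfContents.add("Contents of " + book.title());
	}

	void process(Item item) {
		switch (item) {
			case Book book -> createTableOfContents(book);
			// explicitly ignored; without `default`, a new `Item`
			// implementation makes this switch non-exhaustive
			case Furniture furniture -> { }
			case ElectronicItem eItem -> { }
		}
	}
}
```

This is one more line than the combined case label, but it needs no unnamed patterns and still produces the desired compile error when magazines are added.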

Summary

To keep data-modeling records free of non-trivial domain logic and prevent bloated APIs, operations should not be implemented on them but rather in dedicated subsystems. Operations will then often process sealed interfaces that usually offer very few methods to interact with. Instead, they will switch over those interfaces and enumerate all implementations, thus implementing their own dynamic dispatch. As long as default and catch-all branches are avoided, this is future-proof because new interface implementations will make these switches non-exhaustive. This causes compile errors that lead developers directly to the operations that need to be updated for the new type.

Learn more about version 1.1 of data-oriented programming in this article series:

Data-Oriented Programming in Java - Version 1.1
Model data immutably and transparently - DOP v1.1
Model the data, the whole data, and nothing but the data - DOP v1.1
Make illegal states unrepresentable - DOP v1.1
Separate operations from data - DOP v1.1 (this article)
Wrapping up DOP v1.1
Bonus: Why Update DOP to Version 1.1?