Data-Oriented Programming in Java - Version 1.1

Nicolai Parlog on May 23, 2024

In recent years, Java has received a number of new language features that can be used independently of one another and that are each useful on their own: type patterns, switch improvements, records and record patterns, sealed types and a few other patterns. But as is occasionally the case, the whole is more than the sum of its parts here, and when correctly combined, these features can significantly impact our day-to-day coding. They invite us to fundamentally expand our repertoire of design patterns - in a well-known direction, but with a new twist. So in this series of six articles (see a detailed list at the end of this one), we will explore this style of programming and provide a minor update to the guidelines that Brian Goetz proposed for it in June 2022.

Object-Oriented Programming

Everything is an object.

Object-oriented programming (OOP for short) can be boiled down to this one sentence. It expresses that everything can or (in OOP) should be modeled as a combination of state and behavior. The most direct way to implement it is to create classes that combine mutable state with the methods that operate on it. These classes usually encapsulate their state and often inherit the contract for their methods from interfaces that represent the common features of different classes in one type.

In Java, this approach is ubiquitous and perhaps nowhere more obvious than in the collection API. From Iterable to Collection and List, from Queue to Set and more recently SequencedCollection and SequencedSet, interfaces define contracts, while concrete classes such as ArrayList or LinkedList, HashSet or TreeSet, PriorityQueue or ArrayDeque implement them in a variety of ways, always ensuring that their mutable state remains hidden so that outsiders cannot corrupt it.

It’s no surprise, then, that we often design our own systems in a similar way. In a web shop, an item might be modeled by the Item interface, which is implemented by concrete classes such as Book (with an ISBN), Furniture (with dimensions) and ElectronicItem (with additional information about connections and battery power). The interface has methods like addToCart, purchase, ship, or reorder and new item types can be easily added to the system by implementing new classes.
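
As a rough illustration, such a design might start out like the following sketch. Only the names Item, Book, Furniture and addToCart come from the description above; the Cart class, the name method and all fields are assumptions made up for this example, and purchase, ship and reorder are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cart type; the text names addToCart but does not define a cart.
class Cart {
    private final List<Item> items = new ArrayList<>();

    void add(Item item) { items.add(item); }

    int size() { return items.size(); }
}

// The contract shared by all item types (purchase, ship, reorder omitted).
interface Item {
    String name();

    void addToCart(Cart cart);
}

class Book implements Item {
    private final String title;
    private final String isbn; // book-specific state, hidden from callers

    Book(String title, String isbn) {
        this.title = title;
        this.isbn = isbn;
    }

    @Override public String name() { return title; }

    @Override public void addToCart(Cart cart) { cart.add(this); }
}

class Furniture implements Item {
    private final String label;
    private final int widthCm, depthCm, heightCm; // furniture-specific state

    Furniture(String label, int widthCm, int depthCm, int heightCm) {
        this.label = label;
        this.widthCm = widthCm;
        this.depthCm = depthCm;
        this.heightCm = heightCm;
    }

    @Override public String name() { return label; }

    @Override public void addToCart(Cart cart) { cart.add(this); }
}
```

A new item type only needs to implement Item; code that works with Cart and Item does not have to change.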

But… it’s often not that simple. While gathering all of these methods on Item seemed reasonable because they all interact with the purchasing process, adding predictLowStock (interacts with the machine learning-based pre-order system), registerForRecommendations (another ML system, this time for item suggestions) and reportPurchase (registration of the purchase of potentially dangerous goods) makes us doubt whether all of these operations really belong to the same interface. It’s also problematic that tables of contents can only be displayed for books while the 3D apartment planner can only deal with furniture. Should Item now get the methods tableOfContent and addToVirtualApartment, each of which has meaningful behavior in only one of the three Item implementations, with the other two throwing exceptions or doing nothing at all? Alternatively, we could introduce flags or do instanceof checks, but that doesn’t solve another problem that arises after some time: all of these subsystems share the item instances and repeatedly step on each other’s toes when changing their state, which causes some unpleasant bugs.
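
The awkwardness of an operation that only makes sense for one implementation can be sketched as follows. The method name tableOfContent is taken from the text above; its return type, the chapters field and the choice of UnsupportedOperationException are assumptions for illustration only.

```java
import java.util.List;

interface Item {
    // Only meaningful for books, yet every implementation has to provide it.
    List<String> tableOfContent();
}

class Book implements Item {
    private final List<String> chapters;

    Book(List<String> chapters) {
        this.chapters = List.copyOf(chapters);
    }

    @Override public List<String> tableOfContent() { return chapters; }
}

class Furniture implements Item {
    @Override public List<String> tableOfContent() {
        // No sensible answer exists, so callers get a runtime failure.
        throw new UnsupportedOperationException("furniture has no table of contents");
    }
}
```

The compiler accepts a call to tableOfContent on any Item, so the mismatch only shows up at run time.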

Somehow it feels like our beautiful design is shattered by ugly reality. A key contributing factor is that OOP is best at modeling evolving processes like shipping time, inventory management, or recommendation systems but less suitable for modeling the things these processes are operating on - like the items above. So, what can we do?

Data-Oriented Programming

Where object-orientation sees the world as a network of interacting objects, each with an internal, usually mutable state (perhaps similar to a natural ecosystem), data-oriented programming (DOP for short) sees it as a chain of systems, each with a potentially changing state, that operate on immutable data (comparable to a production line). Operations on immutable data? That sounds like functional programming (FP for short) and in fact DOP has a lot in common with it. But DOP also contains potentially-mutable systems that can be modeled in an object-oriented fashion. We’ll talk a little about the relation between DOP, FP, and OOP in the last article in this series.

Data-oriented programming is based on a number of principles whose exact formulation isn’t quite finalized. In his seminal article “Data-Oriented Programming in Java”, Brian Goetz, Java Language Architect at Oracle, wrote in June 2022 (slightly reordered):

Data is immutable.
Model the data, the whole data, and nothing but the data.
Make illegal states unrepresentable.
Validate at the boundary.

That was version 1.0, so to speak. After using DOP for about 18 months in various projects (mostly demos and hobby projects, but one is also in production), I propose a first revised version 1.1 here:

Model data immutably and transparently.
Model the data, the whole data, and nothing but the data.
Make illegal states unrepresentable.
Separate operations from data.
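
To give a rough idea of where these principles lead, here is a minimal sketch, assuming Java 21 or later (for record patterns and pattern matching in switch). The item types are borrowed from the web shop example above; the record components, the Shipping class and its describe method are made up for illustration and are not taken from Brian Goetz's article.

```java
// Data: immutable, transparent records. The sealed interface lists every
// kind of item, so nothing outside this list can be an Item.
sealed interface Item permits Book, Furniture, ElectronicItem { }

record Book(String title, String isbn) implements Item { }

record Furniture(String name, int widthCm, int depthCm, int heightCm) implements Item { }

record ElectronicItem(String name, boolean hasBattery) implements Item { }

// Operations: kept outside the data, in the (hypothetical) subsystem that needs them.
class Shipping {
    static String describe(Item item) {
        // Exhaustive over the sealed type, so no default branch is needed.
        return switch (item) {
            case Book(String title, String isbn) ->
                "Book " + title + " (ISBN " + isbn + ")";
            case Furniture f ->
                "Furniture " + f.name()
                    + " (" + f.widthCm() + "x" + f.depthCm() + "x" + f.heightCm() + " cm)";
            case ElectronicItem e ->
                "Electronic item " + e.name()
                    + (e.hasBattery() ? " (with battery)" : " (no battery)");
        };
    }
}
```

Because the switch has no default branch, adding a fourth permitted type to Item makes describe fail to compile until the new case is handled, which is one way the operations can stay in sync with the data.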

Article Series

Over the coming weeks, we will publish an article on each of these four principles and close the series out with a sixth one that puts data-oriented programming in the context of object-oriented as well as functional programming and gives some guidelines as to when and where to use it.

Data-Oriented Programming in Java - Version 1.1 (this article)
Model data immutably and transparently - DOP v1.1
Model the data, the whole data, and nothing but the data - DOP v1.1
Make illegal states unrepresentable - DOP v1.1
Separate operations from data - DOP v1.1
Wrapping up DOP v1.1
Bonus: Why Update DOP to Version 1.1?