Data-Oriented Programming in Java - Version 1.1
Nicolai Parlog on May 23, 2024
In recent years, Java received a number of new language features that can be used independently of one another and that are each useful on their own: type patterns, switch improvements, records and record patterns, sealed types, and a few other patterns. But as is occasionally the case, the whole is significantly more than the sum of its parts here: when correctly combined, these features can have a big impact on our day-to-day coding. They invite us to fundamentally expand our repertoire of design patterns - in a well-known direction, but with a new twist. So in this series of six articles (see a detailed list at the end of this one), we will explore this style of programming and provide a minor update to the guidelines that Brian Goetz proposed for it in June 2022.
Object-Oriented Programming
Everything is an object.
Object-oriented programming (OOP for short) can be boiled down to this one sentence. It expresses that everything can or (in OOP) should be modeled as a combination of state and behavior. The most direct way to implement it is to create classes that combine mutable state with the methods that operate on it. These classes usually encapsulate their state and often inherit the contract for their methods from interfaces that represent the common features of different classes in one type.
In Java, this approach is ubiquitous and perhaps nowhere more obvious than in the collection API.
From Iterable to Collection and List, from Queue to Set and more recently SequencedCollection and SequencedSet, interfaces define contracts, while concrete classes such as ArrayList or LinkedList, HashSet or TreeSet, PriorityQueue or ArrayDeque implement them in a variety of ways, always ensuring that their mutable state remains hidden so that outsiders cannot corrupt it.
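To make the split between contract and implementation concrete, here is a small sketch that programs against List and works unchanged with ArrayList and LinkedList. The class name CollectionContracts, the helper firstAndLast, and the sample strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class CollectionContracts {

    // Depends only on the List contract, not on a concrete implementation
    static String firstAndLast(List<String> items) {
        return items.get(0) + " .. " + items.get(items.size() - 1);
    }

    public static void main(String[] args) {
        // Two implementations with very different internal state:
        // ArrayList keeps a resizable array, LinkedList a chain of nodes
        List<String> arrayList = new ArrayList<>(List.of("Book", "Furniture", "ElectronicItem"));
        List<String> linkedList = new LinkedList<>(arrayList);

        // The same code works with both because it only uses the interface
        System.out.println(firstAndLast(arrayList));
        System.out.println(firstAndLast(linkedList));

        // Callers can change that hidden state only through the contract's methods
        linkedList.add("GiftCard");
        System.out.println(firstAndLast(linkedList));
    }
}
```

Neither caller knows, or needs to know, how the elements are stored, which is exactly what lets the implementations keep their mutable state private.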
It’s no surprise, then, that we often design our own systems in a similar way.
In a web shop, an item might be modeled by the Item interface, which is implemented by concrete classes such as Book (with an ISBN), Furniture (with dimensions) and ElectronicItem (with additional information about connections and battery power). The interface has methods like addToCart, purchase, ship, or reorder, and new item types can be easily added to the system by implementing new classes.
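A minimal sketch of this design might look as follows. Only the type names and the method names addToCart, purchase, ship, and reorder come from the text; the method signatures, the shared StockedItem base class, the stock counter, and the sample values are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class WebShop {
    public static void main(String[] args) {
        List<Item> cart = new ArrayList<>();
        Item book = new Book("978-3-16-148410-0", 2);
        Item sofa = new Furniture(220, 90, 85, 1);
        Item headphones = new ElectronicItem(List.of("USB-C", "Bluetooth"), 30, 5);

        book.addToCart(cart);
        sofa.addToCart(cart);
        headphones.addToCart(cart);

        // Callers treat every item the same way, through the interface
        for (Item item : cart) {
            item.purchase();
            item.ship("Example Street 1");
        }
        sofa.reorder(4);
        System.out.println("sofas in stock: " + sofa.stock());
    }
}

// Hypothetical contract: method names from the text, signatures assumed
interface Item {
    void addToCart(List<Item> cart);
    void purchase();
    void ship(String address);
    void reorder(int quantity);
    int stock();
}

// Common behavior on top of encapsulated, mutable state
abstract class StockedItem implements Item {
    private int stock;

    StockedItem(int stock) { this.stock = stock; }

    @Override public void addToCart(List<Item> cart) { cart.add(this); }

    @Override public void purchase() {
        if (stock == 0) throw new IllegalStateException("out of stock: " + this);
        stock--;
    }

    @Override public void ship(String address) {
        System.out.println("shipping " + this + " to " + address);
    }

    @Override public void reorder(int quantity) { stock += quantity; }

    @Override public int stock() { return stock; }
}

class Book extends StockedItem {
    private final String isbn;

    Book(String isbn, int stock) { super(stock); this.isbn = isbn; }

    @Override public String toString() { return "book " + isbn; }
}

class Furniture extends StockedItem {
    private final int widthCm, depthCm, heightCm;

    Furniture(int widthCm, int depthCm, int heightCm, int stock) {
        super(stock);
        this.widthCm = widthCm;
        this.depthCm = depthCm;
        this.heightCm = heightCm;
    }

    @Override public String toString() {
        return "furniture " + widthCm + "x" + depthCm + "x" + heightCm + " cm";
    }
}

class ElectronicItem extends StockedItem {
    private final List<String> connections;
    private final int batteryHours;

    ElectronicItem(List<String> connections, int batteryHours, int stock) {
        super(stock);
        this.connections = List.copyOf(connections);
        this.batteryHours = batteryHours;
    }

    @Override public String toString() {
        return "electronics " + connections + ", " + batteryHours + " h battery";
    }
}
```

Adding a new item type means adding one more subclass; none of the calling code has to change.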
But… it’s often not that simple.
While gathering all of these methods on Item seemed reasonable because they all interact with the purchasing process, adding predictLowStock (interacts with the machine-learning-based pre-order system), registerForRecommendations (another ML system, this time for item suggestions) and reportPurchase (registration of the purchase of potentially dangerous goods) makes us doubt whether all of these operations really belong to the same interface.
It’s also problematic that tables of contents can only be displayed for books, while the 3D apartment planner can only deal with furniture - should Item now get the methods tableOfContent and addToVirtualApartment, each of which contains meaningful behavior in only one out of three Item implementations, with the other two throwing exceptions or doing nothing at all?
Alternatively, we could introduce flags or do instanceof checks, but that doesn’t solve another problem that arises after some time: all of these subsystems share the item instances and repeatedly step on each other’s toes when changing their state, which causes some unpleasant bugs.
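In a stripped-down, hypothetical version of the model, the two escape hatches from the last two paragraphs could look like this: a tableOfContent method on Item that only Book implements meaningfully, and an instanceof check with a type pattern as the alternative. The class names, signatures, and the choice of UnsupportedOperationException are assumptions for illustration.

```java
import java.util.List;

class Workarounds {
    public static void main(String[] args) {
        List<Item> items = List.of(
                new Book("Example Book", List.of("Records", "Sealed Types")),
                new Furniture("Sofa"));

        for (Item item : items) {
            System.out.println(describe(item));
        }

        // Option 1 leaks into every caller: this call compiles but fails at runtime
        try {
            items.get(1).tableOfContent();
        } catch (UnsupportedOperationException ex) {
            System.out.println("failed: " + ex.getMessage());
        }
    }

    // Option 2: an instanceof check (with a type pattern) in the calling code
    static String describe(Item item) {
        if (item instanceof Book book) {
            return book.name() + ": " + String.join(", ", book.tableOfContent());
        }
        return item.name() + ": no table of contents";
    }
}

interface Item {
    String name();

    // Option 1: a method that only one implementation can fill meaningfully
    default List<String> tableOfContent() {
        throw new UnsupportedOperationException(name() + " has no table of contents");
    }
}

class Book implements Item {
    private final String title;
    private final List<String> chapters;

    Book(String title, List<String> chapters) {
        this.title = title;
        this.chapters = List.copyOf(chapters);
    }

    @Override public String name() { return title; }
    @Override public List<String> tableOfContent() { return chapters; }
}

class Furniture implements Item {
    private final String name;

    Furniture(String name) { this.name = name; }

    @Override public String name() { return name; }
}
```

Neither option addresses the shared, mutable item instances; both only move the "which item is this?" question to a different place.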
Somehow it feels like our beautiful design is shattered by ugly reality. A key contributing factor is that OOP is best at modeling evolving processes like shipping time, inventory management, or recommendation systems, but is less suited to modeling the things these processes operate on - like the items above. So, what can we do?
Data-Oriented Programming
Where object-orientation sees the world as a network of interacting objects, each with an internal, usually mutable state (perhaps similar to a natural ecosystem), data-oriented programming (DOP for short) sees it as a chain of systems, each with a potentially changing state, that operate on immutable data (comparable to a production line). Operations on immutable data? That sounds like functional programming (FP for short), and in fact DOP has a lot in common with it. But DOP also contains potentially mutable systems that can be modeled in an object-oriented fashion. We’ll talk a little about the relation between DOP, FP, and OOP in the last article in this series.
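As a rough illustration of the production-line view, here is a sketch that needs Java 21 or later (for pattern matching in switch with record patterns). The items are immutable records behind a sealed interface, Inventory is a system with its own mutable state, and ShippingCosts is a stateless operation. Apart from Item, Book, and Furniture, all names, as well as the cost formula, are invented for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ProductionLine {
    public static void main(String[] args) {
        Item book = new Book("978-3-16-148410-0");
        Item sofa = new Furniture(220, 90, 85);

        Inventory inventory = new Inventory();
        for (Item item : List.of(book, sofa)) {
            inventory.restock(item, 3);
        }
        inventory.sell(book);

        // Records compare by value, so an equal copy finds the same entry
        System.out.println(inventory.stockOf(new Book("978-3-16-148410-0"))); // 2
        System.out.println(ShippingCosts.of(sofa)); // 26
    }
}

// Immutable, transparent data: final components, no setters
sealed interface Item permits Book, Furniture { }

record Book(String isbn) implements Item { }

record Furniture(int widthCm, int depthCm, int heightCm) implements Item { }

// A system with its own mutable state that consumes the immutable data
class Inventory {
    private final Map<Item, Integer> stock = new HashMap<>();

    void restock(Item item, int quantity) {
        stock.merge(item, quantity, Integer::sum);
    }

    void sell(Item item) {
        int current = stockOf(item);
        if (current == 0) {
            throw new IllegalStateException("out of stock: " + item);
        }
        stock.put(item, current - 1);
    }

    int stockOf(Item item) {
        return stock.getOrDefault(item, 0);
    }
}

// A stateless system: computes a result from the data without changing it
class ShippingCosts {
    static int of(Item item) {
        // Exhaustive without a default branch because Item is sealed
        return switch (item) {
            case Book book -> 3;
            case Furniture(int w, int d, int h) -> 10 + (w * d * h) / 100_000;
        };
    }
}
```

Because the records never change, Inventory and ShippingCosts can share the same item instances without stepping on each other's toes; only Inventory's own map is mutable.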
Data-oriented programming is based on a number of principles whose exact formulation isn’t quite finalized. In his seminal article “Data-Oriented Programming in Java”, Brian Goetz, Java Language Architect at Oracle, wrote in June 2022 (slightly reordered):
- Data is immutable.
- Model the data, the whole data, and nothing but the data.
- Make illegal states unrepresentable.
- Validate at the boundary.
That was version 1.0, so to speak. After using DOP for about 18 months in various projects (mostly demos and hobby projects, but one is also in production), I propose a first revised version 1.1 here:
- Model data immutably and transparently.
- Model the data, the whole data, and nothing but the data.
- Make illegal states unrepresentable.
- Separate operations from data.
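The following Java 21 sketch shows one reading of how the four v1.1 principles might surface in code. It is not the series' definitive treatment, since each principle gets its own article. The Isbn check is deliberately simplified (it only requires 13 digits and ignores the check digit), and the Labels class, the component names, and the sample values are hypothetical.

```java
import java.util.List;
import java.util.Objects;

class DopPrinciples {
    public static void main(String[] args) {
        List<Item> items = List.of(
                new Book(new Isbn("978-3-16-148410-0"), "Example Book"),
                new ElectronicItem("Headphones", List.of("USB-C", "Bluetooth"), true));
        items.forEach(item -> System.out.println(Labels.label(item)));
    }
}

// 1. Model data immutably and transparently: records expose their
//    components through accessors and cannot change after creation
record Isbn(String value) {
    // 3. Make illegal states unrepresentable: reject malformed values up front
    //    (simplified check: 13 digits, hyphens allowed, no checksum)
    Isbn {
        Objects.requireNonNull(value, "value");
        if (!value.replace("-", "").matches("\\d{13}")) {
            throw new IllegalArgumentException("not a 13-digit ISBN: " + value);
        }
    }
}

// 2. Model the data, the whole data, and nothing but the data:
//    the sealed interface lists exactly the item kinds that exist
sealed interface Item permits Book, ElectronicItem { }

record Book(Isbn isbn, String title) implements Item { }

record ElectronicItem(String name, List<String> connections, boolean battery)
        implements Item {
    ElectronicItem {
        // Unmodifiable copy keeps the record immutable
        connections = List.copyOf(connections);
    }
}

// 4. Separate operations from data: behavior lives outside the records
class Labels {
    static String label(Item item) {
        return switch (item) {
            case Book(Isbn isbn, String title) -> title + " (ISBN " + isbn.value() + ")";
            case ElectronicItem(String name, List<String> connections, boolean battery) ->
                    name + " (connections: " + String.join(", ", connections)
                            + (battery ? "; battery-powered)" : ")");
        };
    }
}
```

Adding another operation, say a shipping label, means writing another method like Labels.label without touching the records; adding another item kind makes the compiler flag every switch that no longer covers all cases.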
Article Series
Over the coming weeks, we will publish an article on each of these four principles and close the series out with a sixth one that puts data-oriented programming in the context of object-oriented as well as functional programming and gives some guidelines as to when and where to use it.
- Data-Oriented Programming in Java - Version 1.1 (this article)
- Model data immutably and transparently - DOP v1.1
- Model the data, the whole data, and nothing but the data - DOP v1.1
- Make illegal states unrepresentable - DOP v1.1
- Separate operations from data - DOP v1.1
- Wrapping up DOP v1.1
- Bonus: Why Update DOP to Version 1.1?