When to use Data-Oriented Programming v1.1

by Nicolai Parlog on June 10, 2024

In this sixth and last article in the series about data-oriented programming v1.1, we're wrapping it up with a review of data-oriented (DOP) versus functional (FP) versus object-oriented programming (OOP). First, let's briefly summarize what the four guiding principles left us with:

Use types to represent data:
- Model data transparently and immutably (usually with records).
- Model alternatives with sealed interfaces.
- Model the data as closely as possible and only represent legal states.

Implement operations as methods on other classes:
- Use exhaustive switch statements, predominantly over sealed interfaces and without a default branch.
- Use pattern matching to identify and decompose data.
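
To make the summary concrete, here is a minimal sketch of all of these points working together. The item types, their components, and the `Descriptions` class are hypothetical and chosen only for illustration:

```java
import java.util.List;

// Alternatives are modeled with a sealed interface (hypothetical domain).
sealed interface Item permits Book, Furniture, ElectronicItem { }

// Data is modeled transparently and immutably with records;
// compact constructors reject illegal states.
record Book(String title, List<String> chapters) implements Item {
	Book {
		if (title == null || title.isBlank())
			throw new IllegalArgumentException("A book needs a title");
		// defensive copy, so the record stays immutable
		chapters = List.copyOf(chapters);
	}
}

record Furniture(String name) implements Item { }

record ElectronicItem(String name, int warrantyMonths) implements Item {
	ElectronicItem {
		if (warrantyMonths < 0)
			throw new IllegalArgumentException("Warranty can't be negative");
	}
}

// Operations are implemented as methods on another class.
final class Descriptions {

	private Descriptions() { }

	static String describe(Item item) {
		// exhaustive switch over the sealed interface, no default branch;
		// record patterns identify and decompose the data
		return switch (item) {
			case Book(String title, var chapters) ->
				title + " (" + chapters.size() + " chapters)";
			case Furniture(String name) -> name;
			case ElectronicItem(String name, int months) ->
				name + " (" + months + " months warranty)";
		};
	}
}
```

Calling `Descriptions.describe(new Furniture("Chair"))` returns the plain name, while the other two branches append the details they extracted from the record.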

If you want to read up on the details, see the other articles in this series.

What Java Version Do You Need?

Before we go into DOP and how it compares to FP and OOP, let's briefly examine on which Java versions it works best. While records and sealed types are present in JDK 17, the just-as-essential patterns in switch weren't finalized until JDK 21, which makes it the minimum requirement for data-oriented programming.

The single underscore as unnamed pattern was finalized in JDK 22. While very helpful and elegant, it's not a requirement: its absence can be worked around by repeating "defaulty" branches (remember, avoid outright default branches!):

switch (item) {
	case Book book -> createTableOfContents(book);
	// no default branch: list each remaining type explicitly
	case Furniture unused -> { }
	case ElectronicItem unused -> { }
}
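
For reference, a self-contained variant of that switch might look as follows. The record components and the `Catalog` class are assumptions made for this sketch, and the operation returns a string instead of calling `createTableOfContents`, so the result is easy to inspect. The code compiles on JDK 21; the JDK 22 alternative with the unnamed pattern is shown in a comment:

```java
import java.util.List;

// Hypothetical item types; only the shape of the switch matters here.
sealed interface Item permits Book, Furniture, ElectronicItem { }
record Book(String title, List<String> chapters) implements Item { }
record Furniture(String name) implements Item { }
record ElectronicItem(String name) implements Item { }

final class Catalog {

	private Catalog() { }

	static String tableOfContents(Item item) {
		return switch (item) {
			case Book book -> String.join(", ", book.chapters());
			// JDK 21: one "defaulty" branch per remaining type.
			// On JDK 22+, they can be combined with the unnamed pattern:
			//     case Furniture _, ElectronicItem _ -> "";
			case Furniture unused -> "";
			case ElectronicItem unused -> "";
		};
	}
}
```

If a fourth record is later added to `Item`, this switch stops compiling until it gets a branch for the new type. A `default` branch would silently swallow the new type instead, which is why it's worth avoiding.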

DOP versus FP and OOP

In functional programming, all operations are pure functions, which have data as input and output and don't produce any side effects - appropriately composed, they implement the logic of the system. This works if you concentrate all mutable parts of a system in dedicated subsystems (e.g. the user interface in the client and data storage in a database) and view the stateless remainder of the system as a function that mediates between the other subsystems (e.g. user input and the current state of the database map to instructions for changing the interface and the database). This approach can be particularly effective for web applications and can lead to very maintainable code.
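
As a rough sketch of that idea, the stateless core can be written as a single pure function that maps user input and the current state to a list of instructions, which the mutable subsystems then execute. All types and names below are hypothetical:

```java
import java.util.List;

// Current state, as read from the data storage (hypothetical).
record State(int counter) { }

// What the user can ask for.
sealed interface Input permits Increment, Reset { }
record Increment(int by) implements Input { }
record Reset() implements Input { }

// What the core asks the mutable subsystems to do.
sealed interface Instruction permits StoreCounter, ShowMessage { }
record StoreCounter(int value) implements Instruction { }
record ShowMessage(String text) implements Instruction { }

final class Core {

	private Core() { }

	// Pure function: the result depends only on the arguments
	// and nothing is read from or written to the outside world.
	static List<Instruction> handle(Input input, State state) {
		return switch (input) {
			case Increment(int by) -> {
				int next = state.counter() + by;
				yield List.of(
						new StoreCounter(next),
						new ShowMessage("Counter is " + next));
			}
			case Reset reset -> List.of(
					new StoreCounter(0),
					new ShowMessage("Counter reset"));
		};
	}
}
```

Because `handle` only returns instructions and never executes them, it can be tested without a user interface or a database.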

In many projects, however, it turns out to be difficult to achieve or maintain this absolute statelessness and absence of side effects. From the team's experience in functional programming to the suitability of the language, from functional and performance requirements to the availability of libraries and frameworks that support this approach, challenges abound.

The strength of functional programming isn't that a panacea awaits you if you follow all the rules to the letter, though, but that its approach works very well even on a small scale. Any piece of domain logic represented as a function - be it a simple stream pipeline or a chain of handwritten functions - makes the code base more reliable and usually more maintainable, too.

Data-oriented programming takes advantage of this fact and proposes a structure that favors functional purity wherever possible and isolates necessary deviations as far as possible in the subsystems responsible for the corresponding logic. DOP is therefore between FP and OOP, but overall closer to the former.

But object-oriented programming is not dead (again). The tools of encapsulation and inheritance, the ease with which large problems can be modularized (that is, broken down into small problems that are mostly isolated from each other), and our familiarity with this programming paradigm continue to make it valuable. So I have no intention of recommending a fundamental switch from OOP to DOP (or FP). Instead, we should see DOP as an additional tool that we can apply in appropriate situations.

When to Use DOP

Similar to functional programming, the advantages of data-oriented programming can be felt even on a small scale. The use of records, the prevention of mutation, the avoidance of placing complex operations on data, the clarity of switch over the visitor pattern - any piece of code that uses these techniques in the right environment will be clearer and more maintainable than without them. It is therefore not necessary to develop entire systems in a data-oriented manner.

If you want to start on a small scale, you should look out for two situations:

- data processing (sub)systems
- small (partial) problems that do not require further modularization

Good candidates are, for example, systems that directly ingest and output data (e.g. batch jobs or data analysis tools), systems that process events (where the events would be "the data"), or systems that model an existing structure to allow its manipulation (the structure would be "the data", manipulation would be achieved via functional transformation - see for example the new class file API in JEP 457). This can be a small, stand-alone service or part of a larger system.
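
To illustrate the event-processing case, here is a small sketch in which the events are the data and the current state is computed by functionally transforming an immutable value, one event at a time. The account domain and all names are made up for this example:

```java
import java.util.List;

// The events are "the data" (hypothetical account domain).
sealed interface AccountEvent permits Deposited, Withdrawn { }

record Deposited(long cents) implements AccountEvent {
	Deposited {
		if (cents <= 0)
			throw new IllegalArgumentException("Deposit must be positive");
	}
}

record Withdrawn(long cents) implements AccountEvent {
	Withdrawn {
		if (cents <= 0)
			throw new IllegalArgumentException("Withdrawal must be positive");
	}
}

// Immutable state derived from the events.
record Balance(long cents) { }

final class Accounts {

	private Accounts() { }

	// Functional transformation: returns a new balance, never mutates the old one.
	static Balance apply(Balance balance, AccountEvent event) {
		return switch (event) {
			case Deposited(long cents) -> new Balance(balance.cents() + cents);
			case Withdrawn(long cents) -> new Balance(balance.cents() - cents);
		};
	}

	// Replays all events, starting from an empty account.
	static Balance replay(List<AccountEvent> events) {
		Balance balance = new Balance(0);
		for (AccountEvent event : events)
			balance = apply(balance, event);
		return balance;
	}
}
```

Data (`AccountEvent`, `Balance`) and operations (`Accounts`) are separate, so both can be verified and tested on their own.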

From my own experience I can say: Once you've used data-oriented programming and experienced the concepts in practice, you'll soon start to see small and large use cases everywhere, and so far I've always been very happy with the results. The code is readable thanks to the separation of data and operations, both can be easily verified and tested individually, and the overall architecture is easy to understand.

To everyone who has become curious after this (thorough) introduction and will soon be using DOP: Good luck, have fun! 🍀