Why Update Data-Oriented Programming to Version 1.1?

We just concluded a six-part and seven-thousand-word article series on version 1.1 of data-oriented programming in Java, which went deep into the paradigm as well as the language features that are best used to implement it. It left one question unanswered, though: Why did we need an update to 1.1? In what way does it hope to improve the original proposal? And as a follow-up to that: What are the shortcomings of 1.1 and what could a version 1.2 do better? I’ll share my thoughts on that in this bonus article. 😃

Version 1.0

Brian Goetz laid out these four guiding principles in his seminal article Data-Oriented Programming in Java:

  • Model the data, the whole data, and nothing but the data.
  • Data is immutable.
  • Make illegal states unrepresentable.
  • Validate at the boundary.

After implementing a few (small) projects following these guidelines and communicating them in various talks and videos, I started to see room for improvement. Less with the guidelines themselves and more with the way they’re formulated:

  • Three of these statements are normative, but “data is immutable” is descriptive. This is not only inconsistent; a factual statement is also unsuitable as a guideline.
  • The guidelines don’t mention transparency, which is essential for implementing operations outside of the classes modeling data. It was also a driving force behind records’ built-in transparency, so I thought it should get a mention.
  • Another important aspect I found underrepresented was operations. While Brian explains in detail where to place and how to implement them, that advice is not explicitly captured in a principle.
  • Making illegal states unrepresentable and validating at the boundary are very closely related.
  • The principles don’t have equal weight. When presenting them, the majority of the time is spent on “Model the data, …” (which was also the principle that implicitly included how to handle operations) after which the remaining three guidelines can be ticked off in short order.
  • In general, I didn’t find these principles sufficiently orthogonal. The order in which they are presented greatly influences how much there is to say about each of them, which shows that they’re very dependent on one another.

As you can see, I didn’t (and don’t) have any issues with the building blocks of data-oriented programming. I just felt that their organization into four principles and their headers could be improved.
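To make the point about transparency and operations a bit more tangible, here is a minimal sketch, assuming Java 21 or later. The names `TransparencySketch`, `Point`, `manhattanDistance`, and `describe` are made up for illustration and don't come from Brian's article. It shows that a record exposes its state through accessors and record patterns, which is what allows operations to live outside of the type that models the data.

```java
class TransparencySketch {

    // Hypothetical data type: a record is shallowly immutable and transparent,
    // meaning its components are available via accessors and deconstruction.
    record Point(int x, int y) { }

    // An operation implemented outside of Point that uses the accessors.
    static int manhattanDistance(Point point) {
        return Math.abs(point.x()) + Math.abs(point.y());
    }

    // Another external operation, this time using a record pattern
    // to take the point apart into its components.
    static String describe(Object value) {
        if (value instanceof Point(int x, int y)) {
            return "Point at (" + x + ", " + y + ")";
        }
        return "Unknown value";
    }

    public static void main(String[] args) {
        Point point = new Point(3, -4);
        System.out.println(manhattanDistance(point));
        System.out.println(describe(point));
    }
}
```

Neither operation needs privileged access to `Point`, which is the property I felt deserved its own mention in the guidelines.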

Version 1.1

To remedy the issues described above, I made the following changes:

  • Turn “Data is immutable” into a normative statement and add transparency. This makes it the perfect vehicle to talk about records.
  • Make the resulting guideline the first one, so it can discuss records in isolation before going into other features.
  • Subsume “validate at the boundary” under “make illegal states unrepresentable” by making boundary validation part of the strategy to achieve unrepresentable illegal states.
  • Add a guideline for operations.

This leads to:

  • Model data immutably and transparently.
  • Model the data, the whole data, and nothing but the data.
  • Make illegal states unrepresentable.
  • Separate operations from data.
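
As a rough illustration of how these four guidelines can play together, here is a small sketch, again assuming Java 21 or later. The payment domain and all names in it (`GuidelinesSketch`, `Payment`, `CardPayment`, `BankTransfer`, `describe`) are hypothetical and only serve the example. It is not meant as the canonical way to apply the guidelines, just one possible reading of them.

```java
class GuidelinesSketch {

    // Guideline 2: a sealed interface models the alternatives and nothing else.
    sealed interface Payment permits CardPayment, BankTransfer { }

    // Guideline 1: records model data immutably and transparently.
    // Guideline 3: the compact constructor rejects illegal values at the boundary,
    // so no instance with an illegal state can exist.
    record CardPayment(String cardNumber, long amountInCents) implements Payment {
        CardPayment {
            if (cardNumber == null || cardNumber.isBlank()) {
                throw new IllegalArgumentException("Card number must not be blank");
            }
            if (amountInCents <= 0) {
                throw new IllegalArgumentException("Amount must be positive");
            }
        }
    }

    record BankTransfer(String iban, long amountInCents) implements Payment {
        BankTransfer {
            if (iban == null || iban.isBlank()) {
                throw new IllegalArgumentException("IBAN must not be blank");
            }
            if (amountInCents <= 0) {
                throw new IllegalArgumentException("Amount must be positive");
            }
        }
    }

    // Guideline 4: the operation lives outside of the data types.
    // Because Payment is sealed, the switch is exhaustive without a default branch.
    static String describe(Payment payment) {
        return switch (payment) {
            case CardPayment(String cardNumber, long amount) ->
                    "Card payment of " + amount + " cents";
            case BankTransfer(String iban, long amount) ->
                    "Bank transfer of " + amount + " cents";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new CardPayment("4111", 1999)));
        System.out.println(describe(new BankTransfer("DE00", 500)));
    }
}
```

Note how the validation in the compact constructors is not a separate concern here but one of the means by which illegal states become unrepresentable.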

Version 1.2?

But I’m still not entirely happy with the result:

  • Orthogonality barely improved. If anything, the fact that I swapped the order of the first two principles shows how much they still depend on one another.
  • The weight distribution improved but is still a little out of balance. Using word count as a proxy, we can see that operations are roughly three times heavier than preventing illegal states (2.6k words vs 0.8k words), although that comes in part from the lack of orthogonality: The article on illegal states is only so short because I already covered in earlier articles how to prevent many illegal combinations with sealed types and records.
  • “Model the data, the whole data, and nothing but the data” sounds very cool, but it’s neither particularly evocative nor precise. In fact, you can talk about pretty much anything under that header. I wonder whether this is one of those darlings that writers have to kill occasionally.

There may well be more issues with the new version, in which case I’m sure that further experimentation with or exposure to data-oriented programming will flush them out. If there’s anything you think could be improved, please reach out to me - I’m nipafx everywhere.

Article Series