Make Illegal States Unrepresentable - Data-Oriented Programming v1.1

A system focused on data should ensure that only legal combinations of the data can be represented, so a guiding principle of data-oriented programming is to make illegal states unrepresentable. We’ll examine that in this article, the fourth in a series that refines the four DOP principles in a version 1.1.

The world is chaotic and every rule seems to have an exception. “Every user has an email address” quickly becomes “every registered user has an email address, but it may be missing during the registration process.” When modeling that, you might get stuck with a User who has a String email field that can be null (or otherwise absent, e.g. with Optional) at any time, and the fact that registered users must have an email address is implicit at best but no longer enforced.

With such a design, you’re not doing yourself any favors! In any system, but especially in one with a data-focused design, you’ll benefit from only making legal states representable.

If a User needs to have an email address, the constructor should ensure that this is the case. If no product can have both an ISBN and battery life, this must be prevented - ideally by modeling the data so precisely that there is no type that has both fields (see the previous article for details on that). Precise types like that not only spare their creator from writing constructors and tests that verify that illegal combinations don’t occur, they also help the developers using them. When they see an Item, they don’t have to ask themselves whether they can call isbn() or dimensions() because Item has neither of these methods - Book has one and Furniture has the other.

So the plan is:

  • Use precisely modeled types (usually records) to describe the data.
  • In either/or situations, avoid multiple fields with mutually exclusive or conditional requirements and instead create a sealed interface to model the alternatives and use it as the type for a mandatory field.
  • Only if these design techniques, both of which are supported by the compiler, are not sufficient, resort to run-time checks in the constructor.
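As a minimal sketch of the second and third point, consider an order that is either shipped or picked up. All names here (`Delivery`, `Shipping`, `Pickup`, `Order`) are hypothetical and not part of this series' running example. A design with two nullable fields would rely on the implicit rule that exactly one of them is set, whereas the sealed interface leaves only the run-time checks the compiler can't express:

```java
import java.util.Objects;

// Instead of `record Order(String id, String shippingAddress, String pickupCode)`
// with the implicit rule "exactly one of the two is non-null",
// the alternatives are modeled as their own types.
sealed interface Delivery permits Shipping, Pickup { }

record Shipping(String address) implements Delivery {
	Shipping {
		Objects.requireNonNull(address);
		if (address.isBlank())
			throw new IllegalArgumentException("Address must not be blank");
	}
}

record Pickup(String code) implements Delivery {
	Pickup {
		Objects.requireNonNull(code);
	}
}

// `delivery` is a mandatory field, so every order has exactly one variant
record Order(String id, Delivery delivery) {
	Order {
		Objects.requireNonNull(id);
		Objects.requireNonNull(delivery);
	}
}
```

With this shape, an order with both an address and a pickup code (or with neither) can't be constructed, and the only remaining run-time checks are the ones for null and blank values.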

Validate at the Boundary

When a property of the data can’t be expressed so that the compiler enforces it, it must be validated at run time. But not at just any time: it should generally happen as early as possible, ideally right at the boundary between the external world and your system - whether that means when the file is read from disk, when the database replies to a query, or when another app sends some JSON.

Validating the data this early ensures that no broken data enters the system but it is also important to make sure that the system doesn’t generate broken data. That means the instances it creates that may later be mapped back to CSV, JSON, an SQL query, etc. must also be validated. That makes the constructors of these types the ideal place for validation logic. In more complicated cases, factory methods or classes may be involved, in which case they need to apply these checks of course.

Here are a few examples of such validation logic, placed in a compact constructor for brevity:

record Book(String title, ISBN isbn, List<Author> authors) {

	Book {
		Objects.requireNonNull(title);
		if (title.isBlank())
			throw new IllegalArgumentException("Title must not be blank");
		Objects.requireNonNull(isbn);
		Objects.requireNonNull(authors);
		if (authors.isEmpty())
			throw new IllegalArgumentException("There must be at least one author");

		// plus immutable copies as in the previous article
	}

}
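To show how such a constructor guards the boundary, here is a sketch under a few assumptions: the `ISBN` and `Author` records are hypothetical minimal stand-ins for the real types, the `BookImport` class and its semicolon-separated line format are invented for illustration, and `List.copyOf` is one possible way to create the immutable copy mentioned in the comment above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical minimal stand-ins for the types that `Book` refers to
record ISBN(String value) {
	ISBN {
		Objects.requireNonNull(value);
		if (value.isBlank())
			throw new IllegalArgumentException("ISBN must not be blank");
	}
}

record Author(String name) {
	Author {
		Objects.requireNonNull(name);
	}
}

record Book(String title, ISBN isbn, List<Author> authors) {
	Book {
		Objects.requireNonNull(title);
		if (title.isBlank())
			throw new IllegalArgumentException("Title must not be blank");
		Objects.requireNonNull(isbn);
		Objects.requireNonNull(authors);
		if (authors.isEmpty())
			throw new IllegalArgumentException("There must be at least one author");
		// unmodifiable copy; also rejects null elements
		authors = List.copyOf(authors);
	}
}

// Hypothetical boundary that turns one line of external input into a `Book`
class BookImport {

	// assumed line format: title;isbn;author1,author2,...
	static Book parseLine(String line) {
		String[] fields = line.split(";", -1);
		if (fields.length != 3)
			throw new IllegalArgumentException(
				"Expected 3 fields but got " + fields.length);
		List<Author> authors = Arrays.stream(fields[2].split(","))
			.map(String::strip)
			.filter(name -> !name.isEmpty())
			.map(Author::new)
			.toList();
		// the constructors reject blank titles, blank ISBNs, and empty author lists
		return new Book(fields[0].strip(), new ISBN(fields[1].strip()), authors);
	}
}
```

Because the parser has no way to create a `Book` other than through its constructor, a malformed line fails right at the boundary with an exception instead of producing an instance that breaks something later. The same holds for any other code in the system that creates books.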

Modeling Variants

So, how do you deal with users who don’t have an email address until they suddenly do?

sealed interface User permits UnregisteredUser, RegisteredUser { }
record UnregisteredUser(/*...*/) implements User { }
record RegisteredUser(/*...*/, Email email) implements User {
	// constructor enforces presence of `email`
}

Then the email verification system takes an UnregisteredUser and an Email, the overall registration process accepts an UnregisteredUser and returns a RegisteredUser, the newsletter dispatch only accepts RegisteredUser, and any API that can handle both uses User for its parameters. This not only keeps the user types precise, it also allows the respective subsystems to clearly express which users they can handle.
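The signatures of such subsystems could look roughly like the following sketch. It makes several assumptions: the `name` component stands in for the elided ones, `Email` is a hypothetical minimal record, the classes `Registration`, `Newsletter`, and `Greeter` are invented for illustration, and verification and registration are collapsed into a single method for brevity.

```java
import java.util.Objects;

// Hypothetical minimal `Email`; a real one would likely validate the format
record Email(String address) {
	Email {
		Objects.requireNonNull(address);
		if (address.isBlank())
			throw new IllegalArgumentException("Email address must not be blank");
	}
}

sealed interface User permits UnregisteredUser, RegisteredUser { }

// `name` is a placeholder for the components elided in the article
record UnregisteredUser(String name) implements User {
	UnregisteredUser {
		Objects.requireNonNull(name);
	}
}

record RegisteredUser(String name, Email email) implements User {
	RegisteredUser {
		Objects.requireNonNull(name);
		// enforces the presence of `email`
		Objects.requireNonNull(email);
	}
}

class Registration {
	// accepts an unregistered user and returns a registered one
	static RegisteredUser register(UnregisteredUser user, Email verifiedEmail) {
		return new RegisteredUser(user.name(), verifiedEmail);
	}
}

class Newsletter {
	// only accepts users that are guaranteed to have an email address
	static String recipientAddress(RegisteredUser user) {
		return user.email().address();
	}
}

class Greeter {
	// can handle both kinds of users, so it takes the sealed interface
	static String greet(User user) {
		if (user instanceof RegisteredUser registered)
			return "Welcome back, " + registered.name();
		return "Welcome, guest";
	}
}
```

Passing an `UnregisteredUser` to `Newsletter.recipientAddress` is a compile error, so the newsletter code never needs to check whether an email address is present.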

And with that we can finally get to exactly these subsystems and how they process data - in the next article.

Summary

Most systems, especially ones with a data-focused design, will benefit from only making legal states representable. To achieve that in data-oriented programming, start by modeling data closely and don’t shy away from creating several types for different variations of “the same data” (can’t quite be the same if it has variations). In those situations, or any other where different data is related, use sealed interfaces to model such alternatives. Every property of the data that can’t be captured by types should be validated during construction.

Learn more about version 1.1 of data-oriented programming in this article series: