Record Serialization

A record is a nominal tuple - a transparent, shallowly immutable carrier for a specific ordered sequence of elements. Record classes have many interesting aspects, as described in Brian Goetz’s spotlight article, but here we will focus on one of the lesser-known ones: record serialization, and how it differs from (and, I would argue, is better than) serialization of normal classes.

While the concept of serialization is quite simple, it often gets complicated very quickly given the various customizations that can be applied. For records we wanted to keep things as simple and straightforward as possible, so:

  1. Serialization of a record object is based only on its state components.

  2. Deserialization of a record object uses only the canonical constructor.

A consequence of the first point is that customization of the serialized form of a record object is not possible - the serialized form is based on the state components and the state components only. This restriction simplifies the model, since the serialized form is readily and easily understood - it is the state components of the record.

The second point relates to the mechanics of the deserialization process. Suppose deserialization is reading the bytes of an object for a normal class (not a record class). Deserialization would create a new object by invoking the no-args constructor of the first non-serializable superclass, then use reflection to set the object’s fields to values deserialized from the stream. This is insecure because the normal class has no opportunity to validate the values coming from the stream. The result may be an “impossible” object which could never be created by an ordinary Java program.

With records, deserialization works differently. Deserialization creates a new record object by invoking the record class’s canonical constructor, passing values deserialized from the stream as arguments. This is secure because the record class can validate the values before assigning them to fields, just as it does when an ordinary Java program creates a record object via new. “Impossible” objects are impossible. This is achievable because the record components, the canonical constructor, and the serialized form are all known and consistent.
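To make the second point concrete, here is a minimal sketch that counts how often a record's canonical constructor runs during an in-memory round trip. The Point record, the CALLS counter, and the roundTrip helper are hypothetical names invented for this illustration, and the sketch assumes a JDK where records are a final feature (16 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

public class ConstructorCallDemo {
    // Hypothetical record, used only for this illustration.
    record Point(int x, int y) implements Serializable {
        // Counts how many times the canonical constructor has run.
        static final AtomicInteger CALLS = new AtomicInteger();

        Point {
            CALLS.incrementAndGet();
        }
    }

    // Serializes the object to a byte array, then reads it back.
    static Object roundTrip(Object obj) throws Exception {
        var bytes = new ByteArrayOutputStream();
        try (var oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        }
        try (var ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return ois.readObject();
        }
    }

    public static void main(String... args) throws Exception {
        var original = new Point(1, 2);          // first constructor call
        var copy = (Point) roundTrip(original);  // second call, made by deserialization
        System.out.println(copy.equals(original));
        System.out.println(Point.CALLS.get());
    }
}
```

The counter goes up once for the explicit new and once more for the deserialized copy, which is the behavior described above: the stream values pass through the same constructor as any other caller's arguments.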

Serializable records leverage the guarantees provided by record classes to offer a simpler and more secure serialization model.

How Serialization Works for Records

We’ll build upon an example outlined in Brian’s article: a simple class that models a range of integers, from a low end to a high end. Here though, for the purpose of comparison, we’ll write one implementation of a Range using a normal class, and an equivalent implementation using a record class.

The first implementation of a Range is a normal class that implements Serializable, since we want to be able to serialize instances of it. A Range has a low end, lo, and a high end, hi.

import java.io.Serializable;
import java.util.Objects;

public class RangeClass implements Serializable {
    private static final long serialVersionUID = -3305276997530613807L;
    private final int lo;
    private final int hi;
    public RangeClass(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }
    public int lo() { return lo; }
    public int hi() { return hi; }
    @Override public boolean equals(Object other) {
        if (other instanceof RangeClass that
                && this.lo == that.lo && this.hi == that.hi) {
            return true;
        }
        return false;
    }
    @Override public int hashCode() {
        return Objects.hash(lo, hi);
    }
    @Override public String toString() {
        return String.format("%s[lo=%d, hi=%d]", getClass().getName(), lo, hi);
    }
}

Notice the verbose boilerplate code for equals, hashCode, and toString!

A record class is made serializable in the same way as a normal class, by implementing Serializable. The equivalent record counterpart of RangeClass looks like this:

import java.io.Serializable;

public record RangeRecord(int lo, int hi) implements Serializable { }

Notice that there is no need to add any additional boilerplate to RangeRecord in order to make it serializable. Specifically, there is no need to add a serialVersionUID field, since the serialVersionUID of a record class is 0L unless explicitly declared, and the requirement for matching the serialVersionUID value is waived for record classes. In rare cases, a serialVersionUID may be declared for migration compatibility between normal classes and record classes. See Section 5.6.2, Compatible Changes, of the Java Object Serialization Specification for more details.
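The default value can be checked directly through java.io.ObjectStreamClass. The following is a small sketch rather than part of the original example: the SerialVersionDemo class and its suidOf helper are hypothetical names, and the record is nested only to keep the listing self-contained (assuming JDK 16 or later).

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionDemo {
    // Same shape as the article's RangeRecord, with no serialVersionUID declared.
    record RangeRecord(int lo, int hi) implements Serializable { }

    // Looks up the serialVersionUID that serialization would use for the type.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String... args) {
        System.out.println(suidOf(RangeRecord.class)); // prints 0
    }
}
```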

OK, let’s serialize a RangeClass object and a RangeRecord object, both with the same low end and high end values.

import java.io.*;

public class Serialize {
  public static void main(String... args) throws Exception {
    try (var fos = new FileOutputStream("serial.data");
         var oos = new ObjectOutputStream(fos)) {
      oos.writeObject(new RangeClass(100, 1));
      oos.writeObject(new RangeRecord(100, 1));
    }
  }
}

import java.io.*;

public class Deserialize {
  public static void main(String... args) throws Exception {
    try (var fis = new FileInputStream("serial.data");
         var ois = new ObjectInputStream(fis)) {
      System.out.println(ois.readObject());
      System.out.println(ois.readObject());
    }
  }
}

Running the Serialize and Deserialize programs, we get:

java --enable-preview Serialize
java --enable-preview Deserialize
RangeClass[lo=100, hi=1]
RangeRecord[lo=100, hi=1]

Oops! Did you manage to spot the mistake? The low end value is actually higher than the high end value. This should not be allowed.

The invariant we want is: the low end of the range can be no higher than the high end. Let’s encode that invariant into the constructor of the concrete class.

public class RangeClass implements ... {
    // ...
    public RangeClass(int lo, int hi) {
        if (lo > hi)
            throw new IllegalArgumentException(String.format("%d, %d", lo, hi));
        this.lo = lo;
        this.hi = hi;
    }
    // ...
}

And the record equivalent.

public record RangeRecord (int lo, int hi) implements ... {
    public RangeRecord {
        if (lo > hi)
            throw new IllegalArgumentException(String.format("%d, %d", lo, hi));
    }
}

Notice that the above code is using the compact version of the canonical constructor declaration, allowing the boilerplate assignments to be omitted.
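For comparison, here is a sketch of the same invariant written both ways. The names CompactRange, ExplicitRange, and rejects are hypothetical and exist only for this illustration; the explicit form spells out the field assignments that the compact form performs implicitly after its body runs (assuming JDK 16 or later).

```java
public class CanonicalFormsDemo {
    // Compact form: parameters are implicit, and the fields are assigned
    // automatically from them once the body completes.
    record CompactRange(int lo, int hi) {
        CompactRange {
            if (lo > hi)
                throw new IllegalArgumentException(String.format("%d, %d", lo, hi));
        }
    }

    // Explicit form: the same check, with the assignments written out by hand.
    record ExplicitRange(int lo, int hi) {
        ExplicitRange(int lo, int hi) {
            if (lo > hi)
                throw new IllegalArgumentException(String.format("%d, %d", lo, hi));
            this.lo = lo;
            this.hi = hi;
        }
    }

    // Returns true if running the action throws IllegalArgumentException.
    static boolean rejects(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String... args) {
        System.out.println(rejects(() -> new CompactRange(100, 1)));
        System.out.println(rejects(() -> new ExplicitRange(100, 1)));
        System.out.println(new CompactRange(1, 100).hi() == new ExplicitRange(1, 100).hi());
    }
}
```

Both constructors are canonical, so either one would be the constructor that deserialization invokes.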

Now, let’s run the Deserialize program again, since it will attempt to deserialize the stream objects in the serial.data file, and we know that these stream objects have a low end value of 100 and a high end value of 1.

java --enable-preview Deserialize
RangeClass[lo=100, hi=1]
Exception in thread "main" java.io.InvalidObjectException: 100, 1
	at java.base/java.io.ObjectInputStream.readRecord(ObjectInputStream.java:2296)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2183)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1685)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:499)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:457)
	at Deserialize.main(Deserialize.java:9)
Caused by: java.lang.IllegalArgumentException: 100, 1
	at RangeRecord.<init>(RangeRecord.java:6)
	at java.base/java.io.ObjectInputStream.readRecord(ObjectInputStream.java:2294)
	... 5 more

First, we can see that a RangeClass object was deserialized even though the newly created object violates the constructor invariant. This may seem counterintuitive at first, but as described earlier, deserializing an object of a normal class (not a record class) creates the object by invoking the no-args constructor of the first non-serializable superclass, which in this case is java.lang.Object. The RangeClass constructor, and with it the invariant check, never runs. Of course, the Serialize program could no longer generate such a byte stream for a RangeClass object, since it must use the two-arg constructor with its invariant checking. However, remember that deserialization just operates on a stream of bytes, and these bytes can, in some cases, come from almost anywhere.

Second, the RangeRecord stream object failed to deserialize, as its stream field values for the low end and high end violate the invariant check in the constructor. This is nice, and actually what we want - deserialization proceeds through the canonical constructor.

The fact that a serializable class can have a new object created without one of its constructors being invoked is often overlooked, even by experienced developers. An object created by invoking a distant no-args constructor can lead to unexpected behavior at run time, since invariant checks in the deserialized class’s constructor are not performed. However, deserialization of a record object cannot be exploited to create an “impossible” object.
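The same contrast can be reproduced in a single program by tampering with the bytes directly. The sketch below is an illustration under stated assumptions, not the article's own code: TamperDemo and its helpers are hypothetical names, the two range types are nested copies of the ones above, and the byte swap assumes that the two int field values are the final eight bytes of the stream (which is how default serialization lays out a class like this, but is worth verifying on your JDK).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TamperDemo {
    // Normal class with an invariant check in its only constructor.
    static class RangeClass implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int lo;
        private final int hi;
        RangeClass(int lo, int hi) {
            if (lo > hi)
                throw new IllegalArgumentException(String.format("%d, %d", lo, hi));
            this.lo = lo;
            this.hi = hi;
        }
        int lo() { return lo; }
        int hi() { return hi; }
    }

    // Record with the same invariant in its canonical constructor.
    record RangeRecord(int lo, int hi) implements Serializable {
        RangeRecord {
            if (lo > hi)
                throw new IllegalArgumentException(String.format("%d, %d", lo, hi));
        }
    }

    static byte[] serialize(Object obj) throws IOException {
        var bytes = new ByteArrayOutputStream();
        try (var oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] stream) throws IOException, ClassNotFoundException {
        try (var ois = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return ois.readObject();
        }
    }

    // Assumption: the last 8 bytes hold the two int field values.
    // Swapping them exchanges lo and hi, whichever comes first.
    static byte[] swapLastTwoInts(byte[] stream) {
        byte[] copy = stream.clone();
        int n = copy.length;
        for (int i = 0; i < 4; i++) {
            byte tmp = copy[n - 8 + i];
            copy[n - 8 + i] = copy[n - 4 + i];
            copy[n - 4 + i] = tmp;
        }
        return copy;
    }

    public static void main(String... args) throws Exception {
        // Both objects are valid when serialized; only the bytes are altered.
        var forged = (RangeClass) deserialize(swapLastTwoInts(serialize(new RangeClass(1, 100))));
        System.out.println("class: lo=" + forged.lo() + ", hi=" + forged.hi());
        try {
            deserialize(swapLastTwoInts(serialize(new RangeRecord(1, 100))));
            System.out.println("record: accepted");
        } catch (InvalidObjectException e) {
            System.out.println("record: rejected");
        }
    }
}
```

Under those assumptions, the normal class comes back with its low end above its high end, while the tampered record stream is rejected by the canonical constructor before any object is produced.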

For more details, see the preview-related specification change document for the Java Object Serialization Specification: https://docs.oracle.com/en/java/javase/14/docs/specs/records-serialization.html

The full source code as outlined above can be found here.
