Simpler Serialization with Records

TL;DR: Learn how the design of Java’s Records can be leveraged to improve Java Serialization.


Close-up of 2 colorful vinyl records
Photo by Manuel Sardo

Record Classes

Record classes enhance Java’s ability to model “plain data” aggregates with less ceremony. A record class declares some immutable state, and commits to an API that matches that state. This means that record classes give up a freedom that classes usually enjoy – the ability to decouple their API from their internal representation – but in return, record classes become significantly more concise. Record classes were a preview feature in Java 14 and 15, and are now final in Java 16.

Here is a record class declared in the JDK’s jshell tool:

jshell> record Point (int x, int y) { }
|  created record Point

The state of Point consists of two components, x and y. These components are immutable and can only be accessed via accessor methods x() and y() which are automatically added to the Point class during compilation. Also added during compilation is a canonical constructor for initializing the components. For the Point record class, it is equivalent to the following:

public Point(int x, int y) {
  this.x = x;
  this.y = y;
}

Unlike the no-arg default constructor added to normal classes, the canonical constructor of a record class has the same signature as the state. (If an object needs mutable state, or state which is unknown when the object is created, then a record class is not the right choice; a normal class should be declared instead.)

Here is Point being instantiated and used (we say that p, the instance of Point, is “a record”):

jshell> Point p = new Point(5, 10)
p ==> Point[x=5, y=10]

jshell> System.out.println("value of x: " + p.x())
value of x: 5

Taken together, the elements of a record class form a succinct protocol for developers to rely on: a concise description of state, a canonical constructor to initialize the state, and controlled access to the state. This design has many benefits, amongst others for serialization.

Serialization

Serialization is the process of converting an object into a format that can be stored on disk or transmitted over the network (“serialized”, “marshalled”), and from which the object can later be reconstituted (“deserialized”, “unmarshalled”). It provides the mechanics for extracting an object’s state and translating it to a persistent format, as well as the means for reconstructing an object with equivalent state from that format. Given their nature as plain data carriers, records are well suited for this use case.

Serialization is a powerful idea and many frameworks have implemented it, one of them being Java Object Serialization in the JDK (hereafter referred to as “Java Serialization”). In Java Serialization, any class that implements the java.io.Serializable interface is serializable: suspiciously simple! The interface has no members and serves only to mark a class as serializable. When serializing, the state of all non-transient fields is scraped (even private fields) and written to the serial byte stream. When deserializing, the no-arg constructor of the first non-serializable superclass is called to create an object before its fields are populated with the state read from the serial byte stream. The format of the serial byte stream (the “serialized form”) is chosen by Java Serialization unless the special methods writeObject and readObject are implemented to specify a custom format.

It’s not news that Java Serialization has flaws, and Brian Goetz’s Towards Better Serialization provides a summary of the problem space. The core of the problem is that Java Serialization was not designed as part of Java’s object model. This means that Java Serialization works with objects using backdoor techniques such as reflection, rather than relying on the API provided by an object’s class. For example, it is possible to create a new deserialized object without invoking one of its constructors, and data read from the serial byte stream is not validated against constructor invariants.
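The constructor bypass described above can be observed directly. The following is a minimal sketch, not taken from the article: the class Range, its constructorCalls counter, and the roundTrip helper are hypothetical names introduced purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ConstructorBypassDemo {

    // Hypothetical normal (non-record) class with a constructor invariant.
    static class Range implements Serializable {
        private static final long serialVersionUID = 1L;

        // Counts how often the constructor body runs (static, so not serialized).
        static int constructorCalls = 0;

        private final int lo;
        private final int hi;

        Range(int lo, int hi) {
            if (lo > hi) {
                throw new IllegalArgumentException("lo must not exceed hi");
            }
            constructorCalls++;
            this.lo = lo;
            this.hi = hi;
        }

        int lo() { return lo; }
        int hi() { return hi; }
    }

    // Serializes obj into an in-memory byte array and deserializes it again.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Range original = new Range(1, 5);
        Range copy = (Range) roundTrip(original);

        System.out.println("distinct object: " + (copy != original));
        // The constructor ran only for 'original'; the copy was created without it.
        System.out.println("constructor calls: " + Range.constructorCalls);
    }
}
```

Running main should report a single constructor call even though two Range objects exist, which is why the invariant check in the constructor offers no protection against a manipulated byte stream.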

Record Serialization

In Java Serialization, a record class is made serializable just like a normal class, by implementing java.io.Serializable:

jshell> record Point (int x, int y) implements Serializable { }
|  created record Point

However, under the hood, Java Serialization treats a record (that is, an instance of a record class) very differently than an instance of a normal class (this article provides a good comparison). The design aims to keep things as simple as possible and is based on two properties:

  1. the serialization of a record is based only on its state components, and
  2. the deserialization of a record uses only the canonical constructor.

No customization of the serialization process is allowed for records. The simplicity of this approach is enabled by, and a logical continuation of, the semantic constraints placed on records. Being an immutable data carrier, a record can only ever have one state (the value of its components), so there is no need to allow customization of the serialized form. Similarly, on the deserialization side, the only way to create a record is through the canonical constructor of its record class, whose parameters are known because they are identical to the state description.
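The second property can be made visible with a small experiment. This is a sketch under stated assumptions: the record Range, its constructorCalls counter, and the roundTrip helper are hypothetical and exist only to show that deserializing a record goes through its canonical constructor (here written in compact form). It requires Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RecordConstructorDemo {

    // Hypothetical record with an invariant checked in the compact canonical constructor.
    record Range(int lo, int hi) implements Serializable {
        // Counts canonical constructor invocations (static, so not part of the record state).
        static int constructorCalls = 0;

        Range {
            if (lo > hi) {
                throw new IllegalArgumentException("lo must not exceed hi");
            }
            constructorCalls++;
        }
    }

    // Serializes obj into an in-memory byte array and deserializes it again.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Range original = new Range(1, 5);
        Range copy = (Range) roundTrip(original);

        System.out.println("equal state: " + copy.equals(original));
        // Once for 'original' and once more during deserialization.
        System.out.println("constructor calls: " + Range.constructorCalls);
    }
}
```

Because the canonical constructor runs on deserialization, the invariant check applies to data read from the serial byte stream just as it does to data passed in by ordinary code.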

Going back to our sample record class Point, the serialization of a Point object using Java Serialization looks as follows:

jshell> var out = new ObjectOutputStream(new FileOutputStream("serial.data"));
out ==> java.io.ObjectOutputStream@5f184fc6

jshell> out.writeObject(new Point(5, 10));

jshell> var in = new ObjectInputStream(new FileInputStream("serial.data"));
in ==> java.io.ObjectInputStream@504bae78

jshell> in.readObject();
$5 ==> Point[x=5, y=10]

Under the hood, a serialization framework can use the x() and y() accessors of Point during serialization to extract the state of the record’s components, which are then written to the serial byte stream. During deserialization, the bytes are read from serial.data and the state is passed to the canonical constructor of Point to obtain a new record.
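To illustrate how any framework could rely on this protocol, here is a rough sketch that uses the standard reflection API for records (Class.getRecordComponents and RecordComponent.getAccessor). It is not how Java Serialization itself is implemented; the class RecordMapperDemo and its extract and construct methods are hypothetical names, and a map stands in for the serial byte stream.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordMapperDemo {

    record Point(int x, int y) { }

    // "Serialize": read each component through its public accessor method.
    static Map<String, Object> extract(Record rec) {
        try {
            Map<String, Object> state = new LinkedHashMap<>();
            for (RecordComponent c : rec.getClass().getRecordComponents()) {
                state.put(c.getName(), c.getAccessor().invoke(rec));
            }
            return state;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // "Deserialize": locate the canonical constructor from the component types
    // and pass it the stored values in component order.
    static <T extends Record> T construct(Class<T> type, Map<String, Object> state) {
        try {
            RecordComponent[] components = type.getRecordComponents();
            Class<?>[] paramTypes = new Class<?>[components.length];
            Object[] args = new Object[components.length];
            for (int i = 0; i < components.length; i++) {
                paramTypes[i] = components[i].getType();
                args[i] = state.get(components[i].getName());
            }
            Constructor<T> canonical = type.getDeclaredConstructor(paramTypes);
            return canonical.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(5, 10);
        Map<String, Object> state = extract(p);
        System.out.println(state);
        System.out.println(construct(Point.class, state));
    }
}
```

Neither method touches a private field: the state description, the accessors, and the canonical constructor are enough to take a record apart and put it back together.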

Overall, the design of records naturally fits the demands of serialization. The tight coupling of state and API facilitates an implementation that is more secure and easier to maintain. Furthermore, it allows for some interesting efficiencies in the deserialization of records.

Optimizing Record Deserialization

For normal classes, Java Serialization relies heavily on reflection to set the private state of a newly deserialized object. However, record classes expose their state and means of reconstruction through a well-specified public API, which Java Serialization leverages.

The constrained nature of record classes allows us to re-evaluate Java Serialization’s strategy of reflection. If, as outlined above, the API of a record class describes the state of a record, and since this state is immutable, the serial byte stream no longer has to be the single source of truth and the serialization framework the single interpreter of that truth. Instead, the record class can take control of its serialized form, which can be derived from the components. Once the serialized form is derived, we can generate a matching “instantiator” based on that form ahead-of-time and store it in the class file of the record class. In this way, control is inverted from Java Serialization (or any other serialization framework) to the record class. The record class now determines its own serialized form, which it can optimize, store, and make available as required.

This can enhance record deserialization in several ways, with two interesting areas being class evolution and throughput.

More Freedom to Evolve Records

The potential for this arises from an existing well-specified feature of record deserialization: default value injection for absent stream fields. When no value is present in the serial byte stream for a particular record component, its default value is passed to the canonical constructor. The following example demonstrates this with an evolved version of the record class Point:

jshell> record Point (int x, int y, int z) implements Serializable { }
|  created record Point

After we serialized a Point record in the previous example, the serial.data file contained a representation of a Point with values for x and y only, not for z. For reasons of compatibility, we would like to be able to deserialize that original serialized object in the context of the new Point declaration. Thanks to the default value injection for absent field values, this is possible and deserialization completes successfully:

jshell> var in = new ObjectInputStream(new FileInputStream("serial.data"));
in ==> java.io.ObjectInputStream@421faab1

jshell> in.readObject();
$3 ==> Point[x=5, y=10, z=0]

Record serialization can take advantage of this feature. If default values are injected during deserialization, do they need to be represented in the serialized form at all? If not, a more compact serialized form could still fully capture the state of the record object.

More generally, this feature also helps support record class versioning, and makes serialization and deserialization overall more resilient to changes in record state across versions. Compared with normal classes, record classes are therefore even more suitable candidates for storing data.

More Throughput When Processing Records

The other interesting area for enhancement is throughput during deserialization. Object creation during deserialization usually requires reflective API calls, which are expensive and hard to get right. These two problems can be addressed by making the reflective calls more efficient and by encapsulating the instantiation mechanics in the record class itself.

For this, we can leverage the power of method handles combined with dynamically-computed constants. The method handle API in java.lang.invoke was introduced in Java 7 and offers a set of low-level operations for finding, adapting, combining, and invoking methods, and for getting and setting fields. A method handle is a typed reference that allows transformations of arguments and return types and, if used wisely, can be faster than the traditional reflection API that dates back to Java 1.1. In our case, several method handles can be chained together to tailor the creation of records based on the serialized form of their record class.

This method handle chain can be stored as a dynamically-computed constant in the class file of the record class, where it is lazily computed on first use. Dynamically-computed constants are amenable to optimizations by the JVM dynamic compiler, so the instantiation code adds only a small overhead to the footprint of the record class. With this, the record class is now in charge of both its serialized form and its instantiation code and no longer relies on other intermediaries or frameworks. This strategy further improves performance and code reuse. It also reduces the burden on the serialization framework, which can now simply use the deserialization strategy provided by the record class, without writing complex and potentially unsafe mapping mechanisms.
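As a rough illustration of such a chain, the sketch below builds an “instantiator” for the three-component Point from a hypothetical older serialized form that carries only y and x, in that order. The class InstantiatorDemo, the INSTANTIATOR field, and the assumed stream order are inventions for this example. A static final field stands in for the dynamically-computed constant, which this sketch does not attempt to reproduce.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class InstantiatorDemo {

    record Point(int x, int y, int z) { }

    // Stand-in for a lazily computed constant: a handle of type (int y, int x) -> Point.
    static final MethodHandle INSTANTIATOR = buildInstantiator();

    static MethodHandle buildInstantiator() {
        try {
            // Step 1: the canonical constructor, typed (int x, int y, int z) -> Point.
            MethodHandle canonical = MethodHandles.lookup().findConstructor(
                    Point.class,
                    MethodType.methodType(void.class, int.class, int.class, int.class));

            // Step 2: bind the absent component z to its default value 0,
            // giving (int x, int y) -> Point.
            MethodHandle withDefaultZ = MethodHandles.insertArguments(canonical, 2, 0);

            // Step 3: reorder arguments to match the assumed stream order (y, x).
            // Target parameter 0 (x) takes new argument 1; parameter 1 (y) takes new argument 0.
            return MethodHandles.permuteArguments(
                    withDefaultZ,
                    MethodType.methodType(Point.class, int.class, int.class),
                    1, 0);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates a Point from values in the assumed stream order (y first, then x).
    static Point create(int y, int x) {
        try {
            return (Point) INSTANTIATOR.invokeExact(y, x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(10, 5));
    }
}
```

The adaptation steps (default injection, reordering) are resolved once when the handle is built, so each later creation is a single call through an already tailored handle rather than a series of reflective lookups.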

In Conclusion

We have seen how serialization can capitalize on the semantic constraints placed on records by the design of the Java language. Many further potential optimizations can be explored from here. It is evident that putting a record class in charge of its own serialized form allows us to go further with record serialization.