Record Serialization in Practice

TL;DR Learn how serialization frameworks can support record classes.



Record Classes and Serialization

Serialization is the process of extracting an object’s state and translating it to a persistent format from which an equivalent object can be constructed. Record classes - now final in Java 16 - are semantically constrained classes whose design naturally fits the demands of serialization.

For normal Java classes, serialization can get very complicated very quickly, due to their freedom to model extensible behaviour and mutable state. Records in contrast keep things simple: They are plain data carriers that declare immutable state and provide an API to initialize and access that state. As an example, here’s the declaration of a Point record class:

record Point (int x, int y) { }

As you can see, the language offers a concise syntax for declaring record classes, whereby the record components are declared in the record header. The list of record components declared in the record header forms the record descriptor. It is the record descriptor that describes the state and drives the concise API. A record class has a canonical constructor whose parameter list matches that of the record descriptor - this is used to initialize the state. The state of a record (an instance of a record class) is retrievable through its component values, and each component can be accessed via an accessor of the same name.
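To make the generated API concrete, here is a minimal sketch that exercises the canonical constructor and the accessors of Point. The wrapper class PointApiDemo is a hypothetical name used only for illustration, and the code assumes Java 16 or later.

```java
class PointApiDemo {
    // The record from the text, nested here so the example is self-contained.
    record Point(int x, int y) { }

    public static void main(String[] args) {
        // The canonical constructor takes the components in header order.
        Point p = new Point(3, 4);

        // Each component has an accessor method of the same name.
        System.out.println(p.x() + ", " + p.y());

        // equals, hashCode, and toString are derived from the components.
        System.out.println(p.equals(new Point(3, 4)));
        System.out.println(p);   // Point[x=3, y=4]
    }
}
```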

From this design flows the simple record serialization protocol that Java Object Serialization uses, which is based on two properties:

  1. the serialization of a record is based only on its state components, and
  2. the deserialization of a record uses only the canonical constructor.

A record consists solely of its state, from which its serialized form is modeled without any customization. During serialization, the accessors are used to read that state before it is translated into the serialized form. During deserialization, the canonical constructor is called; its parameters are known because they are identical to the state description, and calling it is the only way a record can be created.
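The second property can be observed with plain Java Object Serialization. The following sketch round-trips a record through ObjectOutputStream and ObjectInputStream and counts how often the canonical constructor runs. The record Range, the counter, and the helper methods are hypothetical names chosen for this illustration; Java 16 or later is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

class RecordJavaSerializationDemo {
    // Counts invocations of the canonical constructor of Range.
    static final AtomicInteger constructorCalls = new AtomicInteger();

    // Hypothetical record; the compact constructor validates the state.
    record Range(int lo, int hi) implements Serializable {
        Range {
            if (lo > hi) throw new IllegalArgumentException("lo > hi");
            constructorCalls.incrementAndGet();
        }
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Range original = new Range(1, 5);               // first constructor call
        Object copy = deserialize(serialize(original)); // second call, made by deserialization
        System.out.println(original.equals(copy));
        System.out.println(constructorCalls.get());
    }
}
```

Because deserialization goes through the canonical constructor, any validation in that constructor also guards objects created from a stream.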

Matters are more complicated for normal Java classes, and serialization frameworks like to use certain backdoor techniques to handle this complexity. For example, a common practice during serialization is to scrape private fields; this only works if Java language access control checks are suppressed. During deserialization, Core Reflection is typically used to set the private state of newly deserialized objects. Since JDK 15, this is no longer possible for record objects, not even with adequately privileged code (see this JIRA issue for more details). Instead, the only way to instantiate a record class is to invoke its canonical constructor.
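The restriction is easy to observe. In the following sketch (class and method names are hypothetical, Java 16 or later is assumed), a reflective write to the private field of a record is refused with an IllegalAccessException even though setAccessible(true) succeeded.

```java
import java.lang.reflect.Field;

class RecordFieldMutationDemo {
    record Point(int x, int y) { }

    // Returns true if a reflective write to a record field is rejected.
    static boolean fieldWriteRejected() throws NoSuchFieldException {
        Point p = new Point(1, 2);
        // The private final field backing a component has the component's name.
        Field x = Point.class.getDeclaredField("x");
        x.setAccessible(true);      // suppresses language access checks
        try {
            x.setInt(p, 42);        // writing is still refused for record classes
            return false;
        } catch (IllegalAccessException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fieldWriteRejected());
    }
}
```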

Irrespective of this novel enforcement, there is really no need for intrusive reflective access and mutation. Record classes offer all the API points required for serialization by exposing their state and a means of reconstruction through well-specified methods - it would be a shame to not use them. Adopting a record-specific serialization protocol is an opportunity to make serialization frameworks more secure, more maintainable, and easier to use. We therefore embarked on an effort to help existing frameworks support records adequately.


Serialization Frameworks Supporting Records

We engaged with three popular Java-based serialization frameworks in an effort to promote record serialization: Jackson, Kryo, and XStream.

Framework             Jackson  Kryo    XStream
Serialization format  JSON     binary  XML
Build JDK             8        8, 11   8

Jackson was the first project we engaged with. In fact, when we reached out in June 2020, a contributor was already working on record support. We helped with some review comments and suggestions before the code was integrated two months later. Around the same time, an issue was opened by a user for Kryo, which was the starting point for our engagement that resulted in a pull request (PR). For XStream, we created an issue and a subsequent PR shortly after that. Both PRs were successfully integrated in March 2021.

Jackson
https://github.com/FasterXML/jackson
https://github.com/FasterXML/jackson-future-ideas/issues/46
https://github.com/FasterXML/jackson-databind/pull/2714

Kryo
https://github.com/EsotericSoftware/kryo
https://github.com/EsotericSoftware/kryo/issues/735
https://github.com/EsotericSoftware/kryo/pull/766

XStream
https://github.com/x-stream/xstream
https://github.com/x-stream/xstream/issues/210
https://github.com/x-stream/xstream/pull/220

While each project comes with its own architecture and conventions, there are some common ingredients for supporting records that we would like to share.


The Common Recipe

The basic idea for supporting records is the same across the three frameworks: Implement a record-specific Serializer/Deserializer and integrate it into the existing project. With the help of a few utility methods that you can find here, the implementation can be completed fairly easily.

Looking at the implementation in more detail, an important aspect is JDK version compatibility. The frameworks in question typically compile with the oldest JDK they support, so that at runtime any JDK that is equal to or higher than the compile version can be used. Records were first introduced as a preview feature in Java 14, so in order to avoid a static dependency on Java 14+, the presence of records has to be determined at runtime. For this, Java 14 added a specific method Class::isRecord, which returns true if the class is a record class and false otherwise. If the method is not present, the Java runtime does not support records. Additionally, another method and a new type were introduced: Class::getRecordComponents, which returns an array of java.lang.reflect.RecordComponent objects that represent the record components of this record class. A RecordComponent provides information about, and dynamic access to, a record component, in particular its name and type (RecordComponent::getName, RecordComponent::getType).
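One way to perform this runtime check without a compile-time dependency on Java 14+ is to look up the new methods reflectively. The sketch below is an illustration under that assumption and is not taken from any of the three frameworks; the class RecordSupport and its methods are hypothetical names. It only refers to the new API by name, so it compiles on Java 8.

```java
import java.lang.reflect.Method;

class RecordSupport {
    // Class::isRecord, or null when the running JDK does not support records.
    private static final Method IS_RECORD = findIsRecord();

    private static Method findIsRecord() {
        try {
            return Class.class.getMethod("isRecord");
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    static boolean isRecord(Class<?> cls) {
        if (IS_RECORD == null) {
            return false;
        }
        try {
            return (Boolean) IS_RECORD.invoke(cls);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the component names in declaration order.
    static String[] componentNames(Class<?> cls) {
        if (!isRecord(cls)) {
            throw new IllegalArgumentException(cls + " is not a record class");
        }
        try {
            Object[] components =
                    (Object[]) Class.class.getMethod("getRecordComponents").invoke(cls);
            Method getName =
                    Class.forName("java.lang.reflect.RecordComponent").getMethod("getName");
            String[] names = new String[components.length];
            for (int i = 0; i < components.length; i++) {
                names[i] = (String) getName.invoke(components[i]);
            }
            return names;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isRecord(String.class));   // false on every JDK
    }
}
```

A framework would typically cache the looked-up Method objects, as done here for isRecord, rather than resolving them on every call.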

These few primitives are the key ingredients to implement record serialization. For example, the instantiation of a record class during deserialization can look like this:

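// Assumes: cls is the record class, args holds the component values in declaration order,
// LOOKUP is a MethodHandles.Lookup with access to cls, and methodType is statically
// imported from java.lang.invoke.MethodType.
// The canonical constructor's parameter types are the component types, in order.
Class<?>[] paramTypes = Arrays.stream(cls.getRecordComponents())
                                .map(RecordComponent::getType)
                                .toArray(Class<?>[]::new);
// Find the canonical constructor and adapt its return type to Object.
MethodHandle MH_canonicalConstructor =
        LOOKUP.findConstructor(cls, methodType(void.class, paramTypes))
                .asType(methodType(Object.class, paramTypes));
// Invoke it with the deserialized values to create the record.
Object obj = MH_canonicalConstructor.invokeWithArguments(args);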

This code sample uses MethodHandles for the reflective calls (see the invoke package in the sample code). This is an implementation detail; the same can be achieved with Core Reflection (see the reflect package in the sample code). That said, the method handle API in java.lang.invoke offers an interesting set of low-level operations for finding, adapting, combining, and invoking methods or setting fields, and can bring efficiency gains if used wisely.

With these tools at hand, the actual mechanics of serialization are straightforward. During serialization, the record components are obtained and translated to the serialized form. During deserialization, the values are read before they are passed to the canonical constructor of the record class for object creation. The serialized form depends on the serialization format and the conventions of the framework. In the case of Kryo, the consumer of the serialized data assumes the shape of the data, which means the serialized form can be boiled down to only the component values. This results in a very compact serialized form. In the case of XStream, the serialized form contains both the names and values of the components as well as the respective class names. The shape of the data is less streamlined, and it does not have to reflect the shape of the class since the values can be matched by name. In general, the more the shape of the class is captured in the serialized form, the more flexible the deserialization process becomes. A more compact serialized form, on the other hand, relies on specific assumptions during deserialization that can bring about storage and memory efficiencies.
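The difference between a compact and a self-describing serialized form can be sketched with in-memory collections standing in for the actual stream. This is a simplified illustration and not the wire format of Kryo or XStream; the class SerializedFormsDemo and its methods are hypothetical names, and Java 16 or later is assumed.

```java
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SerializedFormsDemo {
    record Point(int x, int y) { }

    // Compact form: only the component values, in declaration order.
    // The reader must already know the shape of the record class.
    static List<Object> toValues(Record r) throws ReflectiveOperationException {
        List<Object> values = new ArrayList<>();
        for (RecordComponent c : r.getClass().getRecordComponents()) {
            values.add(c.getAccessor().invoke(r));   // read state via the accessor
        }
        return values;
    }

    // Self-describing form: component names mapped to their values.
    // The reader can match values by name.
    static Map<String, Object> toNamedValues(Record r) throws ReflectiveOperationException {
        Map<String, Object> named = new LinkedHashMap<>();
        for (RecordComponent c : r.getClass().getRecordComponents()) {
            named.put(c.getName(), c.getAccessor().invoke(r));
        }
        return named;
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point(3, 4);
        System.out.println(toValues(p));        // [3, 4]
        System.out.println(toNamedValues(p));   // {x=3, y=4}
    }
}
```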

Another interesting aspect is the ordering of the record component representations in the serialized form. One approach is to apply a specific order, an obvious choice being the order of components in the record declaration. The array returned by Class::getRecordComponents adheres to this order, as does the parameter list of the canonical constructor. Following this order, the values can be sequentially written to and read from the stream, and be passed directly to the canonical constructor.
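With declaration order, deserialization needs no matching step. The following sketch uses Core Reflection and assumes that the values have already been read from the stream in declaration order; the List stands in for the stream, and the class DeclarationOrderDemo is a hypothetical name. Java 16 or later is assumed.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.util.Arrays;
import java.util.List;

class DeclarationOrderDemo {
    record Point(int x, int y) { }

    // Creates a record from values that are given in declaration order.
    static <T extends Record> T fromValues(Class<T> cls, List<?> values)
            throws ReflectiveOperationException {
        // The component types, in order, identify the canonical constructor.
        Class<?>[] paramTypes = Arrays.stream(cls.getRecordComponents())
                .map(RecordComponent::getType)
                .toArray(Class<?>[]::new);
        Constructor<T> canonical = cls.getDeclaredConstructor(paramTypes);
        // The values line up with the parameters, so they are passed through directly.
        return canonical.newInstance(values.toArray());
    }

    public static void main(String[] args) throws Exception {
        Point p = fromValues(Point.class, List.of(3, 4));
        System.out.println(p);   // Point[x=3, y=4]
    }
}
```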

However, what if a record class evolves over time, for example if we change the order of its components? During deserialization, we can then no longer be certain that the component order in the stream matches the order in the current declaration, and the rather static approach described above would not be satisfactory. To support this type of record class evolution, we need to allow more flexibility of the serialized form. More precisely, the implementation must provide some kind of matching or sorting algorithm to correctly map the stream values to the parameter list of the canonical constructor. One option here is to sort the record components in the serialized form lexicographically by name (as done in Kryo’s solution). Ordering record components in the serialized form does not just support record type versioning, but is also in line with normal classes. This has the benefit that future transitions from normal classes to record classes are facilitated.
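The sorting option can be sketched as follows. This is our own simplified illustration of the idea and not Kryo's actual code: values are written in lexicographic order of the component names and mapped back to the declaration order before the canonical constructor is called. The records PersonV1 and PersonV2 are hypothetical and stand in for two versions of the same class whose components were reordered; Java 16 or later is assumed.

```java
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortedOrderDemo {
    record PersonV1(String name, int age) { }   // original declaration
    record PersonV2(int age, String name) { }   // same components, reordered

    static RecordComponent[] sortedByName(Class<?> cls) {
        RecordComponent[] sorted = cls.getRecordComponents();
        Arrays.sort(sorted, Comparator.comparing(RecordComponent::getName));
        return sorted;
    }

    // Writes the component values in lexicographic order of their names.
    static List<Object> write(Record r) throws ReflectiveOperationException {
        List<Object> out = new ArrayList<>();
        for (RecordComponent c : sortedByName(r.getClass())) {
            out.add(c.getAccessor().invoke(r));
        }
        return out;
    }

    // Reads values written in name order and maps them to the canonical constructor.
    static <T extends Record> T read(Class<T> cls, List<?> in)
            throws ReflectiveOperationException {
        RecordComponent[] sorted = sortedByName(cls);
        Map<String, Object> byName = new HashMap<>();
        for (int i = 0; i < sorted.length; i++) {
            byName.put(sorted[i].getName(), in.get(i));
        }
        RecordComponent[] declared = cls.getRecordComponents();
        Class<?>[] paramTypes = new Class<?>[declared.length];
        Object[] args = new Object[declared.length];
        for (int i = 0; i < declared.length; i++) {
            paramTypes[i] = declared[i].getType();
            args[i] = byName.get(declared[i].getName());
        }
        return cls.getDeclaredConstructor(paramTypes).newInstance(args);
    }

    public static void main(String[] args) throws Exception {
        List<Object> stream = write(new PersonV1("Ada", 30));
        System.out.println(stream);                         // [30, Ada]
        System.out.println(read(PersonV2.class, stream));   // PersonV2[age=30, name=Ada]
    }
}
```

Since both sides agree on name order, the stream written for one declaration order can be read with the other, and the serialized form stays as compact as in the value-only approach.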

Moving from the implementation to the testing, the dependency on Java 14+ needs to be handled here, too. Test code that uses records has to be stored separately from tests that run on older JDK versions, and it can only be compiled and executed if Java 14+ is present. Maven is used as the build and dependency management tool by all three frameworks, and its build profiles can be used to compile tests conditionally based on the detected JDK version. Depending on the existing build process, this configuration can be a little tricky, but certain plugins can help set up the compilation and execution of source and test code. When working with Java 14 and 15, certain flags are needed to enable preview features, specifically --enable-preview. With Java 16, records are a final feature, so this flag is no longer required.


In Conclusion

Java records can add value to serialization frameworks. We showcased three frameworks that have successfully added record support and outlined their common recipe. With this, supporting records is far from difficult and - whether you are a framework developer or not - we hope you try it out.