Finalizing the Foreign APIs

Maurizio Cimadamore on September 16, 2021

Now that the Foreign Memory Access API and the Foreign Linker API have been around for some time, it is time to take a more holistic look at how these APIs are structured and used, and see if there are some final opportunities for simplification, before we take more steps towards finalizing these APIs. In this document, we will focus on the outstanding issues in the current iteration of the APIs and pave a path forward. I’d like to thank Paul Sandoz, John Rose and Brian Goetz, who have provided many useful insights into the matters discussed throughout this document.

Memory dereference

When looking at how clients interact with the Foreign Memory Access API (especially when it comes to jextract-generated code), we noted an asymmetry between how memory is allocated and how memory is dereferenced. The code snippet below summarizes the issue:

MemorySegment c_sizet_array = allocator.allocateArray(SIZE_T, new long[] { 1L, 2L, 3L });
// print contents
for (int i = 0; i < 3; i++) {
   System.out.println(MemoryAccess.getLongAtIndex(c_sizet_array, i));
}

Above, we can see that the API for allocating a segment (SegmentAllocator::allocateArray) takes both a layout (namely, SIZE_T) and a long[] array. This idiom provides dynamic safety: if there is a mismatch between the size of the array component type and the size of the provided layout, an exception will be thrown. Perhaps surprisingly, the same doesn’t happen for the dereference API (MemoryAccess::getLongAtIndex), which only takes a segment and an offset; there is no layout argument here which the runtime can use to enforce additional validation.

This inconsistency is not merely cosmetic; it reflects the way in which the Foreign Memory Access API has evolved over time. In the first iterations of the API, the only way to dereference a memory segment was through a memory access var handle. While var handles still play a central role in our dereference story, especially when it comes to structured access (think of C structs and tensors), in subsequent releases of the API we have made some usability concessions, and ended up adding a full set of dereference methods in a side class (MemoryAccess) and, more recently, another set of methods to copy from Java arrays to memory segments and back (MemoryCopy). But there are problems with this approach:

These static methods are not consistent with the rest of the API; as seen above, they do not accept a layout parameter, and instead only accept an optional ByteOrder parameter. This is not very general, as endianness is merely one dimension which can affect how memory dereference should behave (what about e.g. alignment?)
Adding methods on side classes keeps the MemorySegment API simple, but creates a discoverability problem: when using an IDE it might not be obvious that the way to dereference a memory segment is to call a static method on a separate class.

In other words, it is time we look at these ancillary classes again, and see if a better solution is possible.
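
As a rough analogy that uses only standard Java classes (not the Foreign Memory Access API itself), the sketch below shows var-handle-style dereference on a ByteBuffer. The byte order is baked into each var handle when it is created, which is the same single dimension that the ByteOrder overloads expose. The class name VarHandleDeref and its method are made up for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; a ByteBuffer stands in for a memory segment.
final class VarHandleDeref {
    // View var handles that access an int at a byte offset, each with a fixed byte order.
    static final VarHandle INT_LE =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    static final VarHandle INT_BE =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    // Writes the value in little-endian order and reads the same bytes back as big-endian.
    static int writeLittleReadBig(int value) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
        INT_LE.set(buffer, 0, value);        // coordinates: (ByteBuffer, byte offset)
        return (int) INT_BE.get(buffer, 0);
    }
}
```

Nothing in these var handles says anything about alignment constraints or the expected size of the accessed value; that extra information is what a layout argument could carry.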

Attaching carriers to value layouts

A promising move, which we will discuss in the remainder of this document, is that of attaching carrier types to value layouts. That is, if we could express types such as ValueLayout<int> and ValueLayout<double>, then our dereference API would look something like this:

interface MemorySegment {
   ...
   <Z> Z get(ValueLayout<Z> layout, long offset)
   <Z> void set(ValueLayout<Z> layout, long offset, Z value)
}

Note how this is nicely symmetric: assuming that we had constants like JAVA_INT (whose type would be ValueLayout<int>), we could now read an int value from a segment in a more straightforward way, as follows:

MemorySegment segment = ...
int i = segment.get(JAVA_INT, 0);

Here, the layout information (alignment, endianness) flows naturally into the dereference operation, thus making it unnecessary to support ByteOrder-based overloads. The dereference API shown here is also much more discoverable (only one code completion away, when using an IDE) [¹].

This seems like a win; not only is the API more usable and succinct, but it is also more extensible: should we add another carrier (Float16 or Long128), we would only need to define its layout, and no extra API would be required. Finally, attaching carriers to value layouts allows us to significantly simplify the Foreign Linker API (more on that later).

Since we do not have specialized generics yet, how do we approximate the above API with the language we have today? One trick that is available to us is to introduce additional value layout leaf classes, one for each carrier (e.g. ValueLayout.OfInt, ValueLayout.OfFloat, etc.), and then define many dereference overloads, one per layout carrier:

byte get(ValueLayout.OfByte layout, long offset)
short get(ValueLayout.OfShort layout, long offset)
int get(ValueLayout.OfInt layout, long offset)
...

This works remarkably well in practice: it gives us type safety (it is no longer possible for users to use the wrong carrier with the wrong layout) - and, when Valhalla is ready, we can rewire these classes to be parameterized subclasses of ValueLayout, and, eventually, deprecate them (as ValueLayout<Z> would be enough). With this API in place, the problematic code snippet with which we started this section would become something like this:

MemorySegment c_sizet_array = allocator.allocateArray(SIZE_T, new long[] { 1L, 2L, 3L });
// print contents
for (int i = 0; i < 3; i++) {
   System.out.println(c_sizet_array.get(SIZE_T, i * SIZE_T.byteSize()));
}

If SIZE_T has type ValueLayout.OfLong, then clients will be forced (by the static compiler) to use a long[] array when initializing the memory segment. Moreover, the dereference operation now allows clients to specify a layout, whose static type will influence which dereference overload will be selected - meaning that passing SIZE_T to MemorySegment::get will be guaranteed to return a long.
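
To make the overload trick concrete, here is a minimal, self-contained sketch that models it on top of a ByteBuffer. SketchLayout and SketchSegment are hypothetical stand-ins invented for this example; they are not the proposed ValueLayout and MemorySegment classes, and only two carriers are modelled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a value layout: a size plus a byte order.
abstract class SketchLayout {
    final int byteSize;
    final ByteOrder order;

    SketchLayout(int byteSize, ByteOrder order) {
        this.byteSize = byteSize;
        this.order = order;
    }

    // One leaf class per carrier, in the spirit of ValueLayout.OfInt, ValueLayout.OfLong, ...
    static final class OfInt extends SketchLayout {
        OfInt(ByteOrder order) { super(Integer.BYTES, order); }
    }

    static final class OfLong extends SketchLayout {
        OfLong(ByteOrder order) { super(Long.BYTES, order); }
    }
}

// Hypothetical stand-in for a memory segment, backed by a heap ByteBuffer.
final class SketchSegment {
    private final ByteBuffer bytes;

    SketchSegment(int byteSize) { this.bytes = ByteBuffer.allocate(byteSize); }

    // The static type of the layout selects the overload, and hence the return type.
    int get(SketchLayout.OfInt layout, long offset) {
        return bytes.order(layout.order).getInt(Math.toIntExact(offset));
    }

    long get(SketchLayout.OfLong layout, long offset) {
        return bytes.order(layout.order).getLong(Math.toIntExact(offset));
    }

    void set(SketchLayout.OfInt layout, long offset, int value) {
        bytes.order(layout.order).putInt(Math.toIntExact(offset), value);
    }

    void set(SketchLayout.OfLong layout, long offset, long value) {
        bytes.order(layout.order).putLong(Math.toIntExact(offset), value);
    }
}
```

In this model, a call such as segment.set(someOfIntLayout, 0, 42L) does not compile, because no overload pairs an OfInt layout with a long value. The byte order also travels with the layout, so no separate ByteOrder overloads are needed.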

Unsafe dereference

In some cases it would be nice to have dereference helpers for unsafe access too - consider the following case:

MemoryAddress addr = ...
int v = MemorySegment.globalNativeSegment().get(JAVA_INT, addr.toRawLongValue());

While this code works well, it is also very verbose. In a way, this is by design: clients should dereference memory segments, not plain addresses, as the former are safer (e.g. memory segments feature both spatial and temporal bounds). So, a safer alternative would be to do this:

MemoryAddress addr = ...
int v = addr.asSegment(100).get(JAVA_INT, 0);

But, for casual native off-heap access (especially for one-time upcall stubs), it would be nice for clients to have convenience unsafe dereference routines which work directly on MemoryAddress instances:

MemoryAddress addr = ...
int v = addr.get(JAVA_INT, 0);

Unlike their counterparts in MemorySegment, the dereference methods in MemoryAddress would be restricted methods, and using them would require clients to provide the --enable-native-access flag on the command line.

Linker classification

If carriers are pushed down to value layouts, we can simplify other areas of the foreign API as well. CLinker provides two main abstractions, to create downcall method handles (method handles targeting native functions) and upcall stubs (native function pointers targeting Java method handles). When linking, users have to provide both a Java MethodType and a FunctionDescriptor; the first describes the Java signature that callsites will be dealing with, while the latter describes the classification information that is required by the linker runtime to make it all work:

MethodHandle strlen = CLinker.getInstance().downcallHandle(
    strLenAddr, // obtained with SymbolLookup
    MethodType.methodType(long.class, MemoryAddress.class),
    FunctionDescriptor.of(C_LONG, C_POINTER)
);

If carriers are attached to value layouts, it is fairly easy to see how the linking process would only require one set of information, namely the function descriptor: in fact we could always derive a Java MethodType from the set of layouts associated with the function descriptor, using the following simple rules:

if the layout is a value layout with carrier C, then C will be the carrier associated with that layout
if the layout is a group layout, then MemorySegment.class will be used as a carrier

In other words, the additional carrier information attached to value layouts would allow the linker runtime to distinguish between similarly-sized layouts (e.g. a 32-bit value layout which can be either a C int or a C float). Moreover, we can always add new carriers to add as much classification as required by the linker runtime. This means that the above linkage request can be expressed more succinctly as follows:

MethodHandle strlen = CLinker.getInstance().downcallHandle(
    strLenAddr, // obtained with SymbolLookup
    FunctionDescriptor.of(C_LONG, C_POINTER)
);

That is, only a function descriptor parameter is required and the Java type of the downcall method handle will be derived accordingly.
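
The derivation rules can be sketched in a few lines. The types below (SketchLayoutKind, SketchValue, SketchGroup, SegmentPlaceholder, DescriptorSketch) are hypothetical and exist only for this example; SegmentPlaceholder stands in for MemorySegment so that the sketch does not depend on the incubator module.

```java
import java.lang.invoke.MethodType;
import java.util.List;

// Hypothetical stand-in for a layout; only the carrier rule is modelled.
interface SketchLayoutKind {
    Class<?> carrier();
}

// A value layout carries its own Java carrier type.
final class SketchValue implements SketchLayoutKind {
    private final Class<?> carrier;

    SketchValue(Class<?> carrier) { this.carrier = carrier; }

    @Override
    public Class<?> carrier() { return carrier; }
}

// Placeholder for MemorySegment, which is the carrier of every group layout.
final class SegmentPlaceholder { }

// A group layout (struct or union) is always modelled as a segment.
final class SketchGroup implements SketchLayoutKind {
    @Override
    public Class<?> carrier() { return SegmentPlaceholder.class; }
}

final class DescriptorSketch {
    // Derives the Java method type from the result layout and the argument layouts.
    static MethodType deriveType(SketchLayoutKind result, List<SketchLayoutKind> arguments) {
        Class<?>[] parameters = arguments.stream()
                .map(SketchLayoutKind::carrier)
                .toArray(Class<?>[]::new);
        return MethodType.methodType(result.carrier(), parameters);
    }
}
```

Because the mapping from layouts to carriers is a pure function, a separately supplied MethodType can only repeat it or contradict it.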

Layout attributes and constants

One immediate consequence of doing ABI classification this way is that the linker runtime is no longer reliant on the layout attribute mechanism to distinguish between similarly-sized value layouts; in fact, we propose to completely drop support for layout attributes from the layout API. While we do not expect this functionality to be widely used, we could always decide, at a later point, to allow users to attach custom Map instances to layouts. Our implementation would not use this metadata, but would merely pass it along (e.g. when altering a ValueLayout with one of the wither methods provided by the API).

Another important thing to note: since value layouts are sharply typed, typing of certain C layout constants, such as C_LONG, becomes ambiguous (a C long is 32 bits wide on Windows/x64 and 64 bits wide on Linux/x64, so the constant would be ValueLayout.OfInt on the former and ValueLayout.OfLong on the latter). Instead of defining these constants with a less sharp type, we will opt to completely remove platform-dependent C layout constants from CLinker: after all, it is the job of extraction tools, not the linker, to come up with a set of layout constants which work for a given extraction unit. Clients not using jextract can either define custom C layouts as static constants, or they can simply use JAVA_INT, JAVA_LONG, etc., which is not too different from using types such as jint and jdouble in JNI code. This observation allows us to remove most of the clutter from the CLinker API, and to return a much simpler interface.
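
To illustrate why a platform-dependent constant cannot have a single sharp type, the sketch below picks the Java carrier for a C long from the platform data model. CLongSketch, its DataModel enum and the os.name heuristic are assumptions made for this example and are not part of any proposed API.

```java
import java.util.Locale;

// Hypothetical helper: chooses a Java carrier for the C 'long' type.
final class CLongSketch {
    // LP64: 'long' is 64 bits (e.g. Linux/x64). LLP64: 'long' is 32 bits (Windows/x64).
    enum DataModel { LP64, LLP64 }

    static Class<?> carrierForCLong(DataModel model) {
        return model == DataModel.LLP64 ? int.class : long.class;
    }

    // Rough guess from the os.name system property; assumes a 64-bit platform.
    static DataModel currentModel() {
        String os = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
        return os.startsWith("windows") ? DataModel.LLP64 : DataModel.LP64;
    }
}
```

A tool such as jextract makes this choice once, at extraction time, and can then emit a constant with a sharp type for the platform it targets.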

Linker safety

Another issue that we wanted to address more explicitly in the Foreign Linker API is the safety of foreign calls: in other words, when passing structs by-reference to native calls, what happens if the scope associated with the struct is closed before the native call has completed? This can happen both in the confined and in the shared case, although reproducing the issue with a confined scope requires at least the use of upcalls (e.g. closing the scope from a Java upcall).

The issue here is that the linker API forces clients of downcall method handles to erase by-reference parameters down to MemoryAddress instances, and then pass those instances instead. This creates some tension in the API: either we also make MemoryAddress a scoped abstraction (so that they keep track of the scope from which they originated), or we lose safety. But making MemoryAddress a scoped abstraction (as we did in 17) has drawbacks: often MemoryAddress is used when interacting with native code, to model native pointers coming out of downcall method handles; as such, it is attractive to think of MemoryAddress as a simple wrapper around a long value (a machine address), which can be converted, at the user request, to a fuller segment (by providing custom size and scope). But if MemoryAddress already has a scope, things get murkier, and we have to define what happens when clients happen to (maybe accidentally) override the existing scope.

We propose to address this issue with the following moves:

CLinker no longer erases by-reference parameters to MemoryAddress - the Addressable carrier is used instead;
The Addressable interface also gets a resource scope accessor; this scope will be used by the linker runtime to keep the by-reference parameter alive throughout the call;
MemoryAddress is an Addressable implementation whose scope is always the global scope.

With these changes, when we link strlen as above, the type of the resulting downcall method handle won’t be (MemoryAddress)long but (Addressable)long. This means that clients can pass memory segments directly, and have the linker runtime pass them by-reference, as follows:

MemorySegment str = ...
long length = (long) strlen.invokeExact((Addressable)str);

Or, w/o invokeExact:

MemorySegment str = ...
long length = (long) strlen.invoke(str);

The presence of the additional cast with the invokeExact semantics is unfortunate, but, after evaluating many alternatives, it also seems the lesser evil. In most cases, tools will just be happy with the Addressable type - in fact that’s exactly what jextract needs to generate its wrappers:

long strlen(Addressable x1) {
   try {
       return (long) strlen_handle.invokeExact(x1);
   } ...
}

Note that no cast is required in the above code, as the jextract wrapper is already generic. When not using jextract, the user has a choice: either to add a cast, like above (which is not much more verbose than to add a trailing .address() call), or to convert the method handle type, as follows:

MethodHandle strlen_exact = CLinker.getInstance().downcallHandle(
    strLenAddr, // obtained with SymbolLookup
    FunctionDescriptor.of(C_LONG, C_POINTER)
).asType(MethodType.methodType(long.class, MemorySegment.class));

...

MemorySegment str = ...
long length = (long) strlen_exact.invokeExact(str);

Since we can tweak the method type associated with the downcall method handle with MethodHandle::asType it is easy to inject sharper types into the downcall method handle, and drop the cast at the callsite, even when using invokeExact.
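
The same technique can be tried out with standard classes only. In the sketch below, CharSequence plays the role of Addressable, String the role of MemorySegment, and CharSequence::length the role of strlen; the class AsTypeSketch and this mapping are an analogy chosen for illustration, not part of the foreign API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class showing how asType removes the callsite cast.
final class AsTypeSketch {
    static int lengthViaSharpenedHandle(String s) {
        try {
            // The generic handle has type (CharSequence)int.
            MethodHandle generic = MethodHandles.lookup().findVirtual(
                    CharSequence.class, "length", MethodType.methodType(int.class));
            // invokeExact requires the callsite types to match exactly, hence the cast.
            int viaCast = (int) generic.invokeExact((CharSequence) s);

            // asType injects the sharper parameter type: the new handle has type (String)int.
            MethodHandle sharp = generic.asType(MethodType.methodType(int.class, String.class));
            int viaSharp = (int) sharp.invokeExact(s);

            if (viaCast != viaSharp) {
                throw new AssertionError("the two handles disagree");
            }
            return viaSharp;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Dropping the (CharSequence) cast from the first invokeExact call would make it fail at run time with a WrongMethodTypeException, which is the behaviour the cast on Addressable guards against.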

Resource scopes

There are currently different kinds of resource scopes, partially overlapping with each other. Looking at the ResourceScope class we find three main factories, to create confined, shared and implicit scopes. The first two are said to be explicit scopes - that is, clients can (deterministically) close such scopes using the close() method. Implicit scopes, on the other hand, cannot be closed - attempting to do so will result in an exception. As such, the only way to dispose of resources associated with an implicit scope is to let the scope become unreachable.

In reality the picture is a bit more convoluted, since the API also allows creating explicit scopes that are associated with cleaner objects; such scopes can be closed via the close() method (as any other explicit scope), but they also allow the scope to be cleaned up when it becomes unreachable. In some way, these scopes are both implicit and explicit.

While the resource scope API itself is relatively simple, the number of different, subtly overlapping factories it provides can be jarring. We propose to address this issue by always registering a resource scope against a cleaner; after all, scopes are long-lived entities, and the overhead for registering scopes with an internal cleaner is minimal. Since all scopes now feature both explicit and implicit deallocation, the API can provide only two kinds of scopes, namely confined and shared, and drop implicit scopes. The resulting API is safer, because it is no longer possible for a client to forget to call close() (the cleaner will kick in, and perform the associated cleanup). The API is also more uniform, since now all scopes (but the global scope, which is a singleton) can be closed, and used in a try-with-resources [²].
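
The combination of explicit and implicit deallocation can be sketched with java.lang.ref.Cleaner. CleanerScopeSketch is a hypothetical class written for this example; it shows the general pattern rather than how ResourceScope is implemented.

```java
import java.lang.ref.Cleaner;

// Hypothetical scope: explicit close(), with a Cleaner as a safety net.
final class CleanerScopeSketch implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();
    private final Cleaner.Cleanable cleanable;

    CleanerScopeSketch(Runnable cleanup) {
        // The cleanup action must not capture 'this': otherwise the scope would
        // stay reachable and the cleaner could never run on its behalf.
        this.cleanable = CLEANER.register(this, cleanup);
    }

    @Override
    public void close() {
        // Runs the cleanup action at most once and unregisters it from the cleaner.
        cleanable.clean();
    }
}
```

If close() is never called, the cleaner runs the same action some time after the scope becomes unreachable, so the cleanup happens once on either path.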

A last simplification we propose was first discussed here; it replaces the resource scope handle mechanism with a more direct way to express dependencies between scopes. With this mechanism in place, the following code:

void accept(MemorySegment segment1, MemorySegment segment2) {
   // acquire before the try block, so that the handles are visible in the finally block
   var handle1 = segment1.scope().acquire();
   var handle2 = segment2.scope().acquire();
   try {
       <critical section>
   } finally {
       segment1.scope().release(handle1);
       segment2.scope().release(handle2);
   }
}

Can be expressed, more succinctly, as follows:

void accept(MemorySegment segment1, MemorySegment segment2) {
   try (ResourceScope scope = ResourceScope.newConfinedScope()) {
       scope.keepAlive(segment1.scope());
       scope.keepAlive(segment2.scope());
       <critical section>
   }
}
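
One way to picture the keepAlive semantics is with a small single-threaded model. KeepAliveSketch below is hypothetical and greatly simplified (no thread confinement, no cleaner); it only captures the rule that a scope cannot be closed while another open scope depends on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of scope dependencies; not the ResourceScope implementation.
final class KeepAliveSketch implements AutoCloseable {
    private int dependents;      // number of open scopes keeping this scope alive
    private boolean closed;
    private final List<KeepAliveSketch> keptAlive = new ArrayList<>();

    // Ensures that 'target' cannot be closed before this scope is closed.
    void keepAlive(KeepAliveSketch target) {
        if (closed || target.closed) {
            throw new IllegalStateException("already closed");
        }
        target.dependents++;
        keptAlive.add(target);
    }

    boolean isAlive() { return !closed; }

    @Override
    public void close() {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        if (dependents > 0) {
            throw new IllegalStateException("kept alive by another scope");
        }
        closed = true;
        // Closing this scope releases every dependency it established.
        for (KeepAliveSketch target : keptAlive) {
            target.dependents--;
        }
    }
}
```

With this model, the try-with-resources block releases both dependencies when the local scope is closed, so the explicit release calls of the handle-based version are no longer needed.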

Finally, we would like to make ResourceScope implement the SegmentAllocator interface. It is not uncommon to have to call a method which requires a segment allocator from a context in which only a scope is available. The implementation of the ResourceScope interface already implements SegmentAllocator, but this implementation is not exposed in the public API, which instead allows clients to convert from scopes to allocators using the SegmentAllocator::ofScope method. We believe that making the relationship between resource scope and allocators public would help to reduce the number of conversions required between the different abstractions provided by the foreign API.

Preview reshuffling

In preparation for the API to become a preview API, we plan to move all the classes in the jdk.incubator.foreign package under the java.lang.foreign package [³] in the java.base module. Additionally, we plan to make the following changes (this work might take place on a separate branch, to avoid conflicts):

The MemoryHandles class will be dropped and all its contents will be moved under MethodHandles; this makes sense since this class contains a general factory for memory access var handles, plus a set of general var handle combinators.
Remove the SymbolLookup abstraction; to look up loader symbols, we plan to add a lookup method in the ClassLoader class. Removing SymbolLookup now does not prevent us from adding a more powerful lookup mechanism in the future; neither does it prevent clients from defining custom chained lookups, e.g. using a Function<String, MemoryAddress>.
Rename ResourceScope. It has been noted that the ResourceScope name is slightly misleading, as the word scope is sometimes interpreted in the context of lexical scopes. While it is true that ResourceScope can provide, via the try-with-resources construct, a lexical scope within which allocation occurs, some uses of the ResourceScope abstraction have nothing to do with lexical scopes (e.g. shared segments stored in fields). For this reason, a more specific name might be chosen.

Since the changes described in the previous sections already lead to the removal of many of the ancillary classes such as MemoryAccess, MemoryCopy and MemoryLayouts, no further adjustment will be necessary.

Summing up

Overall, the changes described here make the Foreign APIs much tighter, simpler and safer too. Attaching carriers to value layouts allows dereference operations to be more general, uniform and statically safe; it also allows us to simplify the linker classification story, as there’s no need to redundantly provide the same information using a separate MethodType argument when constructing downcall method handles. And, since downcall method handles no longer require clients to erase by-reference parameters to MemoryAddress, clients can just pass any subtype of Addressable (most notably memory segments), and the linker API will keep the scope of by-reference parameters alive for the duration of the call. The role of MemoryAddress becomes much simpler, as MemoryAddress now becomes a simple wrapper around a long, which is used to model native pointers (in other words, obtaining a MemoryAddress from an on-heap segment is no longer allowed). Finally, associating scopes with cleaners by default allows us to greatly simplify the API and to make it safer when it comes to preventing accidental memory leaks.

A javadoc which summarizes the proposed API changes can be found here; the corresponding code changes can be found in this experimental branch which also contains the required adjustments for the jextract tool to work with the new API.

[¹] A similar idiom can also be used to enhance the usability and static safety of bulk memory operations (not shown here).
[²] We will likely provide overloaded scope factories which allow clients to opt out of cleaners, in case scope allocation performance is critical. That said, this should be an advanced option, and we do expect most clients to be happy with the defaults provided by the simpler factories.
[³] We might decide to split functionalities in different packages, e.g. java.lang.foreign for the memory access API, and java.lang.foreign.invoke for the foreign linker API.