Loom Q&A

The publication of the State of Loom has prompted some common questions. Here’s my attempt at answering some of them.

Loom merely miniaturizes threads; doesn't it mean we still have all the same problems we have with threads today, only more of them?

Some of the biggest problems with threads arise because they are costly, limited resources that must be shared and carefully managed. That we can “right-size” threads to get a thread-per-task association in itself solves many of threads’ biggest problems, such as pool management, thread-local leaks, and complex interruption. So while virtual threads are “only” miniaturized threads, miniaturization dramatically changes how we think about and use a construct. A smartphone can be thought of as a miniaturized mainframe, yet it doesn’t carry with it the administration problems of a multi-user, always-on machine. It is true that the ability to have millions of threads might introduce new kinds of problems, but that threads can now be fully associated with logical operations offers simple ways of managing them that are oriented around the business logic. That is the goal of structured concurrency.
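
To make the thread-per-task idea concrete, here is a minimal sketch, assuming Java 21 or later. The class `ThreadPerTask` and the `handle` method are hypothetical stand-ins for real request handling. Each task gets its own virtual thread, and no pool is sized or tuned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadPerTask {
    // Hypothetical task: blocks briefly, as a request handler doing I/O would.
    static int handle(int requestId) throws InterruptedException {
        Thread.sleep(10);
        return requestId * 2;
    }

    // Starts one virtual thread per task and sums the results.
    static long handleAll(int count) {
        List<Future<Integer>> results = new ArrayList<>();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                int id = i;
                results.add(executor.submit(() -> handle(id)));
            }
        } // close() waits for every submitted task to finish
        long sum = 0;
        for (Future<Integer> result : results) {
            sum += result.resultNow(); // safe: all tasks have completed
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("sum of results: " + handleAll(10_000));
    }
}
```

The task count is the only number in the sketch; there is no pool size to choose because the threads are not a shared resource.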

One of the benefits Loom brings is that virtual threads are recognized by the various serviceability and observability tools like debuggers and profilers (JFR) and are used as their "context units". Couldn't we teach the platform about other kinds of task contexts like, say, reactive streams?

That would be a bad use of abstraction. A thread is already recognized as a basic unit of context. Virtual threads are lightweight enough to lend context to even small, short-lived tasks, so there is no need to add yet another kind of context. The thread abstraction serves us well; the problem so far has been the implementation.

My application is bounded by CPU/database capacity, and Loom doesn't help with that, so what's the point?

Loom can’t conjure hardware resources, and it has no effect on software components where it is not used at all. It facilitates the better use of available resources in a natural, transparent, and fine-grained way. For example, instead of managing multiple thread pools with different sizes, a semaphore could be acquired where a limited resource, like a database, is used; threads that don’t use the database won’t be affected, and those that do will coordinate through a centralized, easy-to-observe mechanism.
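
The semaphore approach might look something like the following sketch, assuming Java 21 or later. The class `DbLimiter`, the limit of 3 permits, and the `queryDatabase` method (which only sleeps) are hypothetical; a real application would place the acquire/release around its actual database call.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class DbLimiter {
    // Hypothetical limit: at most 3 tasks may use the database at once.
    static final int MAX_DB_CONNECTIONS = 3;
    static final Semaphore dbPermits = new Semaphore(MAX_DB_CONNECTIONS);

    // Counters used only to observe the limit in this demo.
    static final AtomicInteger inUse = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Stand-in for a real query: the sleep simulates waiting on the database.
    static void queryDatabase() throws InterruptedException {
        dbPermits.acquire(); // blocks only the threads that need the database
        try {
            int now = inUse.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(10);
        } finally {
            inUse.decrementAndGet();
            dbPermits.release();
        }
    }

    // Runs taskCount tasks, one virtual thread each, and returns the
    // highest number of tasks seen inside queryDatabase() at the same time.
    static int runTasks(int taskCount) {
        inUse.set(0);
        peak.set(0);
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    queryDatabase();
                    return null;
                });
            }
        } // close() waits for every submitted task to finish
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent database users: " + runTasks(300));
    }
}
```

However many tasks are started, the observed peak never exceeds `MAX_DB_CONNECTIONS`, and the limit lives in one place next to the resource it protects rather than in the size of a thread pool.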

Cooperative scheduling, as offered by async/await in other languages, allows me to know exactly where the concurrent scheduling points are. Isn't that a better way to write concurrent code than relying on the thread abstraction?

No. For one, the Java platform already has preemptive threads, so cooperative scheduling would only add another kind of concurrency in addition to the thread-based one, and an incompatible one at that. Instead of one problem, we’d have two.

For another, even on platforms that don’t already offer preemptive scheduling, like JavaScript, cooperative scheduling is an inferior solution, especially where tasks might interact with one another in multiple ways, as opposed to, say, just sharing the UI. One can think of the difference between preemptive scheduling and cooperative scheduling as a choice of different defaults. With cooperative scheduling, by default every operation takes place in a critical section where no operations of other tasks can be interleaved, and we explicitly mark the points where interleaving is allowed to occur, while with preemptive scheduling, interleaving can occur anywhere except where we explicitly forbid it with a critical section guarded by a lock. The latter default is preferable in the majority of server-side uses because most operations are confined and insensitive to scheduling points and because locks allow for critical sections that are specific to a particular shared resource. But most importantly, when we explicitly mark our critical sections, our code is more robust in the face of changes. In the cooperative scheduling case, adding a scheduling point inside a subroutine (which is often necessary to allow it to do I/O) can break important but implicit assumptions in all of its transitive callers.
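
As an illustration of the explicit critical section, consider this sketch, assuming Java 21 or later. The class `GuardedAccount` and its `audit` helper are hypothetical. The check-then-update on the balance is guarded by a lock that belongs to this one account, so a blocking call added later inside `audit` does not invalidate the check made before it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

class GuardedAccount {
    private final ReentrantLock lock = new ReentrantLock(); // guards balance only
    private long balance;

    GuardedAccount(long initial) {
        balance = initial;
    }

    // Hypothetical subroutine that was later changed to block (for example,
    // to do I/O). The sleep stands in for that new scheduling point.
    private static void audit() throws InterruptedException {
        Thread.sleep(1);
    }

    // The check and the update form one explicitly marked critical section.
    boolean withdraw(long amount) throws InterruptedException {
        lock.lock();
        try {
            if (balance < amount) {
                return false;
            }
            audit(); // may block; the lock keeps the check above valid
            balance -= amount;
            return true;
        } finally {
            lock.unlock();
        }
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    // Attempts `withdrawals` concurrent withdrawals of 1 and returns the final balance.
    static long demo(long initial, int withdrawals) {
        GuardedAccount account = new GuardedAccount(initial);
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < withdrawals; i++) {
                executor.submit(() -> account.withdraw(1));
            }
        } // close() waits for every submitted task to finish
        return account.balance();
    }

    public static void main(String[] args) {
        System.out.println("final balance: " + demo(100, 500));
    }
}
```

With more withdrawal attempts than funds, the balance ends at zero and never goes negative. In a cooperatively scheduled version without the lock, the same change to `audit` would introduce a suspension point between the check and the update, and every caller relying on their atomicity would need to be revisited.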
