On Parallelism and Concurrency

Ron Pressler on November 30, 2021
A little over a week ago, we asked the Java user community for feedback on a new construct for structured concurrency that we unveiled as part of Project Loom. It is intended both to make correct and efficient concurrent code easier to write and to make it easier for tools, like the new hierarchical thread dump mechanism, to observe a concurrent program.
When we design a new construct, we try to start with a well-defined problem, which we describe in the motivation section of the JEP. Pinpointing the problem is the most important, and often the hardest, part of the process. Accordingly, useful feedback takes the form of identifying problems with the proposed solution: either the solution doesn’t quite address the original problem, or perhaps it introduces another.
But there is another kind of feedback that is helpful, albeit implicitly: people’s responses can indicate that the problem we want to solve is itself not clear. In this case, there might be a fundamental difficulty in understanding the problem, and it’s not a new one: the confusion between concurrency and parallelism.
Parallelism vs. Concurrency
The confusion is understandable and certainly isn’t unique to Java. For one, both concurrency and parallelism involve doing multiple things at the same time. For another, threads, which are often employed for both concurrency and parallelism, serve two different roles that are difficult to disentangle: one is more pertinent to parallelism, the other to concurrency.
I will attempt to first define the two concepts and then try to bring them together into a single unifying framework that will, I hope, help show why they require different solutions. The definitions we’ll use might not be universal, but they conform to the ACM’s curriculum guidelines.
Parallelism is the problem of doing a single job — say, inverting a matrix or sorting a list — faster by employing multiple processing units. This is done by breaking up the job into multiple cooperating subtasks, different subsets of which run on different cores. Concurrency, in contrast, is the problem of scheduling some computational resources to a set of largely independent tasks that compete over those resources. The main performance measure we care about when we speak of concurrency isn’t the duration (latency) of any single task, but throughput — the number of tasks we can process per unit of time.
Unlike in the case of parallelism, where the cooperating sub-tasks are created by the parallel algorithm itself, the competing tasks in concurrency originate externally, and are part of the problem, not the solution. In other words, the reason we have multiple tasks in parallelism is because we want to do some job faster; the reason we have multiple tasks in concurrency is because handling multiple tasks is the job. The canonical example of a concurrent program is a server that serves requests arriving over the wire.
Once we learn these definitions, and that Loom’s virtual threads are intended for concurrency while parallel streams are for parallelism, we’re done: we use parallel streams when we need to sort a very large list, and we use virtual threads when we write a server.
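To make that rule of thumb concrete, here is a minimal sketch of the parallelism side: a parallel stream sorts a large array by splitting the job across the machine’s cores, without us ever choosing a thread count. (The class and method names here are mine, for illustration.)

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

public class ParallelSort {
    // Parallelism in one line: the parallel stream splits the sort across
    // the machine's cores; we never pick a thread count ourselves.
    static int[] sortInParallel(int[] data) {
        return Arrays.stream(data).parallel().sorted().toArray();
    }

    public static void main(String[] args) {
        int[] big = ThreadLocalRandom.current().ints(1_000_000).toArray();
        int[] sorted = sortInParallel(big);
        System.out.println("sorted " + sorted.length + " elements");
    }
}
```

The concurrency side, a server handing each request to its own virtual thread, is what Loom’s APIs address and what the rest of this post builds toward.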
But in this post I’d like to go deeper. These two problem statements are so different — I once compared parallelism to a school of piranhas devouring large prey and concurrency to cars traveling the streets of a city — that it’s almost a wonder anyone could confuse them. That we do confuse them indicates some similarity that, rather than dismiss, we’ll try to embrace.
A Common Framework
Let’s ignore the origin of the tasks or their purpose, and whether they’re cooperating or competing. If we focus only on the mechanism, we find that in both situations, for whatever reason, we want to work on some number of tasks at the same time. At this point we can allow ourselves some confusion and ask, why would we need different mechanisms for the two problems? To answer that, we will now re-introduce the requirements in both situations, and the means we have at our disposal to meet them.
Let’s begin with parallelism: we want to do multiple things at once to finish a job faster. How is that possible? If the task we need to perform could be carried out by a device that makes progress on it at some constant rate, and we had multiple such devices, we could split up our work into multiple subtasks and assign different subsets of those subtasks to different devices. Problems of the kind mentioned before — sorting a large list or inverting a large matrix — can be performed by just such devices that we have at our disposal: processing cores. The parallel algorithm will split up the work into subtasks (that coordinate with each other in some efficient way) and divide them among our processing cores.
While the OS kernel has direct control over CPU cores, it does not expose CPU allocation directly to userspace programs. Instead, it offers a construct to indirectly — and only approximately — control CPU allocation: threads. Assuming no other processes are running on the same machine, if the number of threads is less than or equal to the number of cores, we expect the OS to assign each to a different core. Because this kind of parallel problem uses threads as a proxy for processing units, the number of threads we need to do the work faster is independent of the size of the job. If we have ten cores, a parallel computation would optimally use ten threads whether the list it needs to sort has a million elements or a billion.
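A minimal sketch of that point: a CPU-bound sum over a range, split across a fixed pool sized to the core count. Whether n is ten or ten million, the number of threads stays the same. (The class and method names are illustrative, not from any JEP.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoreBoundSum {
    // Sums 1..n by splitting the range into chunks across a pool with
    // exactly one thread per core. The thread count depends on the
    // machine, not on the size of the input.
    static long parallelSum(int n) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = Math.max(1, n / cores);
            for (int lo = 1; lo <= n; lo += chunk) {
                int from = lo, to = Math.min(n, lo + chunk - 1);
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (long k = from; k <= to; k++) s += k;
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(10_000_000)); // 50000005000000
    }
}
```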
Now, with concurrency, we have a stream of largely independent tasks arriving from an external source and competing for our resources. If the only kind of resource they require is processing then, as before, all we’d need to do — all we’d be able to do — is schedule them onto processing cores by assigning them to a relatively small number of threads. But it is often the case that those tasks cannot be completed by the CPU alone and require some other resource accessed with I/O, like, say, a database or some HTTP service. Indeed, most of the time required to process such a task is often spent in I/O. Just as more threads couldn’t buy us more CPU to accelerate a parallel job, they certainly don’t buy us more databases or services, either. How would doing multiple things concurrently, presumably using threads, help us at all in this case?
Recall that our goal with concurrency isn’t to handle each task more quickly — i.e. with lower latency — but to complete as many of them as possible per unit of time, achieving high throughput. Even if each task takes a constant time to process, a constant latency of, say, one second, we could still process anywhere from one of them per second to a million.
Little’s law, which relates throughput and latency in a concurrent system, tells us that we can increase our throughput by increasing our level of concurrency — the number of tasks we can start and make progress on concurrently. A relatively pleasant way to do that is to employ threads, which allow multiple sequential tasks to make progress independently, meaning all could make progress even if none has finished yet. But how many threads would we need? The answer is quite a lot.
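Little’s law states L = λW: the average number of tasks in flight equals throughput times latency. Rearranged, throughput is concurrency divided by latency, which is why raising the number of tasks making progress concurrently raises throughput even when latency is fixed. A back-of-the-envelope sketch (the class and method names are mine):

```java
public class LittlesLaw {
    // Little's law: L = lambda * W, i.e. concurrency = throughput * latency.
    // Rearranged: throughput = concurrency / latency.
    static double throughput(double concurrency, double latencySeconds) {
        return concurrency / latencySeconds;
    }

    public static void main(String[] args) {
        // With a fixed 1-second latency per task, 10 threads cap us at
        // 10 tasks/s, while 10,000 threads raise the ceiling to 10,000 tasks/s.
        System.out.println(throughput(10, 1.0));
        System.out.println(throughput(10_000, 1.0));
    }
}
```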
If each request comes through some networking server socket and perhaps requires some client sockets to communicate with other services, then the number of sockets limits the number of requests we can process concurrently, but that limit is high. Whereas in the case of parallelism we needed a relatively small number of threads (equal to the number of CPU cores) regardless of how big the job, in the case of concurrency we want a very large number of threads regardless of how many cores we have (for a more detailed look at how virtual threads help increase the throughput of a concurrent system, see this post).
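A minimal sketch of what that looks like with Loom (this assumes JDK 21, where the virtual-thread API became final; the class and method names are mine): each I/O-style task, simulated here with a sleep, gets its own cheap virtual thread, so the thread count tracks the workload rather than the core count.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyTasks {
    // Submits many blocking-I/O-style tasks (simulated with a sleep) to a
    // virtual-thread-per-task executor. Because virtual threads are cheap,
    // thousands of them can sleep concurrently; the number of threads is
    // bounded by the workload, not by a pool sized to the core count.
    static int runAll(int tasks) throws Exception {
        AtomicInteger done = new AtomicInteger();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                exec.submit(() -> {
                    try {
                        Thread.sleep(Duration.ofMillis(100)); // stand-in for blocking I/O
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return done.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(10_000) + " tasks completed");
    }
}
```

With OS threads, 10,000 concurrent sleepers would be prohibitively expensive; with virtual threads the sleeps overlap and the whole batch takes roughly the latency of one.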
True, each of those threads will consume some limited resource on our machine — some RAM, some networking bandwidth, some CPU — that will at some point become saturated and put a limit on the number of threads we can use effectively. If each thread consumes a significant amount of CPU, that would become a limiting factor on their number, and so saying that virtual threads help with I/O-bound workloads but not with CPU-bound ones is indeed correct, although what matters here is simply the number of threads, not their implementation, virtual or OS. But the key insight is that we’re using threads very differently from how we use them for parallelism. We do not use them as an interface for assigning cores, but for a different function: the ability to make independent progress on multiple operations.
Why do threads perform these two different functions? We can think of them as scheduling processing resources over both time and space. Allocating different CPU cores schedules processing over space, making progress on different cores simultaneously, while making independent progress on multiple tasks schedules processing over time, not necessarily simultaneously. We could, therefore, think of parallelism as the problem of scheduling resources over space and of concurrency as the problem of scheduling resources over time. I don’t know if that’s helpful, but it sure sounds cool.
Concurrency and parallelism are two very different problems. But even if we focus only on what they have in common — they perform multiple actions by employing multiple threads — they use threads in different ways.
Project Loom’s virtual threads allow us to create many more threads than we could with OS threads. Having more threads helps with concurrency, where the ideal number of threads is a function of the workload, but not with parallelism, where the ideal number is a function of the machine. The difference that requires distinct mechanisms for tackling the two problems is that problems exploiting threads’ function as proxies for CPU cores don’t benefit from having more of them, while problems exploiting threads’ function of juggling many sequential operations that make progress independently do benefit from more. Loom’s APIs are intended to address this latter problem of juggling many largely independent tasks.
To know whether virtual threads will help your workload, ask yourself: would it benefit from having lots more threads or does it actually need more cores?
And besides, processing is special. Like the joke subroutine for determining whether the computer is on that I saw mentioned on Hacker News the other day, processing is, after all, the only resource you can ask for only if you already have it. It is required to ask for any other resource, too, and so it is, understandably, easier to use when more implicit, sometimes to the point that we don’t notice it at all. It is the water in which computer programs swim.