Pre-Allocated Methods for HashMap and HashSet - Sip of Java

The classes HashMap and HashSet have been part of the JDK since JDK 1.2; however, for much of this time, these classes, and child classes of them, have had a less-than-optimal usage pattern. In this article, we will explore what common mistake developers have been making when using these classes, the impact of this mistake, and how a new factory method added to these classes in JDK 19 aims to address this. Let’s take a look.

Setting Initial Capacity

The classes HashMap and HashSet, and child classes, LinkedHashMap, LinkedHashSet, and WeakHashMap, provide constructors where a “capacity” value can be set. In fact, this value is the internal table size for the Map or Set, which needs to be larger than the number of entries the Map or Set is expected to contain. Specifying the wrong value can lead to costly reallocations while the Map or Set is being populated.

Setting Pre-Allocated Maps and Sets

To address this issue in JDK 19 the static factory method new<TypeName>(int numMappings) (e.g. newHashMap(int numMappings)) was added to HashMap, HashSet and child classes. The new factory method will return a Map or Set with its capacity set based on the default load factory of 0.75. The result of this change is a small but noticeable improvement in performance, as seen in the below table:

Method used:     Elements:    Execution time (avg):
Constructor             10           380.079  ns/op
Factory                 10           378.929  ns/op
Constructor            100          4135.819  ns/op
Factory                100          3847.426  ns/op
Constructor           1000         39970.521  ns/op
Factory               1000         37568.595  ns/op

Note: See additional reading for a link to the repo of the benchmarking tests the above results were taken. Performance testing is difficult and time-consuming, and results will vary based on usage and system architecture.

Using Pre-Allocated Maps and Sets

When developers use a Map or Set, the number of elements might not be known. In this case, the benefit of using a pre-allocated factory method is minimal or might result in excessive memory usage. The pre-allocated factory method should only be used when you have a very good understanding of the expected collection capacity.

Additional Reading

Happy coding!