Using OptimizedTypedSet to Improve Map and Array Functions
Function evaluation is a big part of projection CPU cost. Recently we optimized a set of functions that use TypedSet
, e.g. map_concat
, array_union
, array_intersect
, and array_except
. By introducing a new OptimizedTypeSet
, the above functions saw improvements in several dimensions:
- Up to 80% reduction in wall time and CPU time in JMH benchmarks
- Reserved memory reduced by 5%
- Allocation rate reduced by 80%
Furthermore, OptimizedTypeSet resolves the long standing issue of throwing EXCEEDED_FUNCTION_MEMORY_LIMIT
for large incoming blocks: "The input to function_name is too large. More than 4MB of memory is needed to hold the intermediate hash set.”
The OptimizedTypeSet
and improvements to the above mentioned functions are merged to master, and will be available from Presto 0.244.
In this post we will look at the methods used in this improvement.
Avoid Using Internal BlockBuilder
TypedSet
has an internal BlockBuilder
and appends each Block
position being added to it, as well as an external BlockBuilder
to construct the results. Block building using BlockBuilder
is very costly and inefficient, especially when memory growth is not handled properly. The BlockBuilder
was needed to keep track of the elements being added, so that it can:
- Resolve hash table probing collision
- Rehash
However, BlockBuilder
is not necessary for either of these issues. In all use cases, whole blocks (instead of several positions of a block) are added to the TypedSet
, and the problem of resolving collisions can be resolved by keeping track of the blocks being added. Rehashing is not needed in most cases because we can calculate the maximum number of entries before creating the TypedSet
.
In the new design, we provided methods that add a whole Block
with different set operations:
- union
- intersect
- except
These methods take in a Block
and remember the selected positions. The operations can be applied on the same set multiple times, i.e. you can union two Block A and B, then intersect with Block C, then minus(except) Block D. This way the internal operations on the Block elements can be streamlined to more efficient loops. The memory consumption and allocation were also reduced because now we only need to remember the selected positions instead of building a whole new Block internally.
Avoid Recomputing Hash Positions
Calculating hash positions took over 60% of original function costs. For some operations like set intersection, the previous implementation requires creating multiple TypedSet
s and calculating the hash position multiple times. For example, the array_intersect
function calculates the distinct common elements from two incoming Block
s, one from the right side and the other one from the left side. It builds a TypedSet
R which contains a hash table R
for the right side Block
first. When inserting a new element X
from the left side Block
, it first checks if hash table R
contains the element. If not, the new element will be added to the other TypedSet
L, which has another hash table L
and represents the final results.
As shown in the above picture, the contains
and add
operation would both calculate the hash positions for hash table R
and L
respectively. This hash position indicates the starting position of the linear probing for both hash tables. But actually the hashPosition calculated in hash table R
can be reused by hash table L
if the hash table sizes and hash functions are the same.
In the new OptimizedTypedSet
, the first step is the same as the original implementation, in which the right side Block
is added to the set first, and hash table R
is constructed. Intersecting the set with the left side Block
was done by creating a new hash table L
of the same size (size precalculated) internally: it checks if the new elements are contained in hash table R
, and if not, add them to hash table L
. Unlike the original implementation, the contains
and add
operation share the same hash position calculation. When the new hash table L
is constructed, the hash table R
will be discarded.
Next Steps
The goal of the above PR was to demonstrate the benefit using the OptimizedTypedSet on 4 functions. There are a few other usages, e.g. MultimapAggregationFunction
, MapFromEntriesFunction
, etc. After we change them to use the new OptimizedTypedSet
, the legacy TypedSet
implementation can be removed.
Further reading
For more information please refer to the PR#15362.