Alluxio SDK Cache

Overview

Presto supports caching input data with a built-in Alluxio SDK cache to reduce query latency when using the Hive Connector. This built-in cache uses local storage (such as SSD) on each worker, with configurable capacity and location. To understand its internals and the benchmark results of latency improvement, please read this article. Note that this is a read cache that is completely transparent to users and fully managed by individual Presto workers. For an independent, distributed cache service supporting read/write workloads with customized data caching policies, please refer to Alluxio Cache Service.

Setup

To enable the Alluxio SDK cache, include the following configuration in etc/catalog/hive.properties and restart the Presto coordinator and workers:

hive.node-selection-strategy=SOFT_AFFINITY
cache.enabled=true
cache.type=ALLUXIO
cache.alluxio.max-cache-size=500GB
cache.base-directory=/tmp/alluxio-cache

In the above example configuration,

  • hive.node-selection-strategy=SOFT_AFFINITY instructs the Presto scheduler to take data affinity into consideration when assigning splits to workers, which is what makes data caching effective. This property defaults to NO_PREFERENCE; the SDK cache is only enabled when it is set to SOFT_AFFINITY. Other coordinator configuration properties that can affect data affinity include node-scheduler.max-pending-splits-per-task (the maximum number of pending splits per task) and node-scheduler.max-splits-per-node (the maximum number of splits per node); see the sketch after this list.

  • cache.enabled=true turns on the SDK cache, and cache.type=ALLUXIO selects Alluxio as the cache implementation.

  • cache.alluxio.max-cache-size=500GB caps the cache storage space on each worker at 500GB.

  • cache.base-directory=/tmp/alluxio-cache specifies /tmp/alluxio-cache as the local cache directory. Note that the Presto server process must have both read and write permission on this directory.
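
The coordinator scheduling properties mentioned above are set in the coordinator's etc/config.properties rather than the catalog file. A minimal sketch, using values that match the usual Presto defaults and are shown for illustration only:

node-scheduler.max-splits-per-node=100
node-scheduler.max-pending-splits-per-task=10

Tuning these limits changes how full a preferred worker's split queue can get before the scheduler assigns splits elsewhere, which in turn affects how often splits land on the workers that already cache their data.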

When affinity scheduling is enabled, each section of a file is assigned a set of preferred nodes. The default file section size is 256MB. For example, if the file size is 512MB, two different affinity preferences will be assigned:

  • [0MB..256MB] -> NodeA, NodeB

  • [256MB+1B..512MB] -> NodeC, NodeD

The section is selected based on the split's start offset: a split whose first byte falls in the first section is preferentially scheduled on NodeA or NodeB.

Change the size of the section by setting the hive.affinity-scheduling-file-section-size configuration property or the affinity_scheduling_file_section_size session property.
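
For example, to double the section size to 512MB, the property can be set in etc/catalog/hive.properties:

hive.affinity-scheduling-file-section-size=512MB

or overridden for a single session; a sketch, assuming the catalog is named hive as in the setup above:

SET SESSION hive.affinity_scheduling_file_section_size = '512MB';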

Monitoring

This Alluxio SDK cache is completely transparent to users. To verify that the cache is working, check the directory set by cache.base-directory and see whether temporary cache files are created there. Additionally, Alluxio exports various JMX metrics while performing caching-related operations. System administrators can monitor cache usage across the cluster by checking the following metrics:

  • Client.CacheBytesEvicted: Total number of bytes evicted from the client cache.

  • Client.CacheBytesReadCache: Total number of bytes read from the client cache (i.e., cache hits).

  • Client.CacheBytesRequestedExternal: Total number of bytes the user requested to read that resulted in a cache miss. This number may be smaller than Client.CacheBytesReadExternal due to chunk reads.

  • Client.CacheHitRate: The hit rate measured by (# bytes read from cache) / (# bytes requested).

  • Client.CacheSpaceAvailable: Amount of bytes available in the client cache.

  • Client.CacheSpaceUsed: Amount of bytes used by the client cache.
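
If the jmx connector is enabled, these metrics can also be inspected with SQL from Presto itself. A minimal sketch; the exact MBean object names under which the Alluxio client registers its metrics vary by Alluxio version, so discover them first and substitute the real name for the placeholder:

-- find the MBeans exported by the Alluxio client
SHOW TABLES FROM jmx.current LIKE '%alluxio%';

-- then query the MBean of interest, for example the one carrying Client.CacheHitRate
SELECT * FROM jmx.current."<alluxio-client-metrics-mbean>";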

Please refer to Alluxio client metrics for a full list of available metrics.