Alluxio SDK Cache
Overview
Presto supports caching input data with a built-in Alluxio SDK cache to reduce query latency when using the Hive Connector. This built-in cache utilizes local storage (such as SSD) on each worker with configurable capacity and locations. To understand its internals and the benchmark results of latency improvement, please read this article. Note that this is a read cache, which is completely transparent to users and fully managed by individual Presto workers. To provide Presto with an independent, distributed cache service for read/write workloads with customized data caching policies, please refer to Alluxio Cache Service.
Setup
Enabling the Alluxio SDK cache is quite simple. Include the following configuration in etc/catalog/hive.properties
and restart the Presto coordinator and workers:
hive.node-selection-strategy=SOFT_AFFINITY
cache.enabled=true
cache.type=ALLUXIO
cache.alluxio.max-cache-size=500GB
cache.base-directory=/tmp/alluxio-cache
In the above example configuration, hive.node-selection-strategy=SOFT_AFFINITY instructs the Presto scheduler to take data affinity into consideration when scheduling tasks to workers, which is what makes caching effective. This property defaults to NO_PREFERENCE, and the SDK cache is only enabled when it is set to SOFT_AFFINITY. Other coordinator configuration that can impact data affinity includes node-scheduler.max-pending-splits-per-task (the maximum number of pending splits per task) and node-scheduler.max-splits-per-node (the maximum number of splits per node). cache.enabled=true turns on the SDK cache and cache.type=ALLUXIO selects Alluxio as the cache implementation. cache.alluxio.max-cache-size=500GB caps the cache storage space at 500GB, and cache.base-directory=/tmp/alluxio-cache specifies /tmp/alluxio-cache as the local cache directory. Note that the Presto server must have both read and write permission on this directory.
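For reference, the two scheduler properties mentioned above are set in the coordinator's etc/config.properties; the values below are purely illustrative, not tuning recommendations:

node-scheduler.max-splits-per-node=100
node-scheduler.max-pending-splits-per-task=10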
When affinity scheduling is enabled, a set of preferred nodes is assigned to each section of a file. The default file section size is 256MB. For example, if the file size is 512MB, two different affinity preferences will be assigned:
[0MB..256MB] -> NodeA, NodeB
[256MB+1B..512MB] -> NodeC, NodeD
The section is selected based on the split's start offset. A split whose first byte falls in the first section is preferred to be scheduled on NodeA or NodeB.
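As a rough sketch of the idea only (this is not Presto's actual scheduler code, and the helper name and hashing scheme below are assumptions), the preferred worker for a split can be derived deterministically from the file path and the section index of the split's start offset; in practice a small set of preferred nodes is chosen rather than a single one:

import java.util.List;

public class AffinitySketch
{
    // Illustrative only: a split is bucketed by the file section containing its
    // start offset, and the same (file, section) pair always maps to the same
    // preferred worker, so repeated reads of that section land where the data
    // is already cached.
    static String preferredNode(String filePath, long splitStart, long sectionSize, List<String> workers)
    {
        long section = splitStart / sectionSize;
        int bucket = Math.floorMod((filePath + "#" + section).hashCode(), workers.size());
        return workers.get(bucket);
    }

    public static void main(String[] args)
    {
        List<String> workers = List.of("NodeA", "NodeB", "NodeC", "NodeD");
        long sectionSize = 256L << 20; // default section size: 256MB
        // A split starting at offset 300MB falls in the second section of the file.
        System.out.println(preferredNode("s3://bucket/data/file.orc", 300L << 20, sectionSize, workers));
    }
}

The important property is determinism: as long as the worker set is stable, splits from the same file section keep being scheduled onto the same workers, so their local SDK caches stay warm.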
Change the size of the section by setting the hive.affinity-scheduling-file-section-size configuration property or the affinity_scheduling_file_section_size session property.
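For example, either of the following sets the section size to 512MB (the value is only an example, and the session property form assumes the Hive catalog is named hive):

In etc/catalog/hive.properties:
hive.affinity-scheduling-file-section-size=512MB

Or for a single session:
SET SESSION hive.affinity_scheduling_file_section_size = '512MB';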
Monitoring
This Alluxio SDK cache is completely transparent to users. To verify that the cache is working, check the directory set by cache.base-directory and see whether temporary files are created there. Additionally, Alluxio exports various JMX metrics while performing caching-related operations. System administrators can monitor cache usage across the cluster by checking the following metrics:
Client.CacheBytesEvicted: Total number of bytes evicted from the client cache.
Client.CacheBytesReadCache: Total number of bytes read from the client cache (i.e., cache hits).
Client.CacheBytesRequestedExternal: Total number of bytes the user requested to read which resulted in a cache miss. This number may be smaller than Client.CacheBytesReadExternal due to chunk reads.
Client.CacheHitRate: The cache hit rate, measured by (# bytes read from cache) / (# bytes requested).
Client.CacheSpaceAvailable: Amount of bytes available in the client cache.
Client.CacheSpaceUsed: Amount of bytes used by the client cache.
Please refer to Alluxio client metrics for a full list of available metrics.
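If the JMX connector is configured (for example via etc/catalog/jmx.properties, which is outside the scope of this section), one way to find these metrics from Presto itself is to list the Alluxio-related MBeans exposed by the worker JVMs and then query the matching jmx tables; the exact MBean names vary by Alluxio version, so it is safer to discover them than to hard-code them:

-- list Alluxio-related MBeans visible through the JMX connector
SHOW TABLES FROM jmx.current LIKE '%alluxio%';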