Presto C++ Runtime Metrics¶
Overview¶
Presto C++ workers expose various runtime metrics that can be collected and monitored when
runtime-metrics-collection-enabled is set to true. These metrics are available through the
GET /v1/info/metrics endpoint in Prometheus data format.
For information on enabling metrics collection, see Presto C++ Features.
Executor Metrics¶
These metrics track the performance and queue sizes of various executors in the Presto C++ worker.
presto_cpp.driver_cpu_executor_queue_size¶
Type: gauge
Description: Number of tasks currently queued in the driver CPU executor waiting to be processed.
presto_cpp.driver_cpu_executor_latency_ms¶
Type: histogram
Unit: milliseconds
Description: Latency distribution of tasks in the driver CPU executor, measuring the time from task submission to execution start.
presto_cpp.spiller_executor_queue_size¶
Type: gauge
Description: Number of spilling tasks currently queued in the spiller executor.
presto_cpp.spiller_executor_latency_ms¶
Type: histogram
Unit: milliseconds
Description: Latency distribution of spilling tasks in the spiller executor.
presto_cpp.http_executor_latency_ms¶
Type: histogram
Unit: milliseconds
Description: Latency distribution of HTTP request processing tasks in the HTTP executor.
HTTP Metrics¶
These metrics track HTTP requests and responses in the Presto C++ worker.
presto_cpp.num_http_request¶
Type: counter
Description: Total number of HTTP requests received by the worker since startup.
presto_cpp.num_http_request_error¶
Type: counter
Description: Total number of HTTP request errors encountered by the worker since startup.
presto_cpp.http_request_latency_ms¶
Type: histogram
Unit: milliseconds
Description: Latency distribution of HTTP request processing, from receipt to response.
presto_cpp.http_request_size_bytes¶
Type: histogram
Unit: bytes
Description: Size distribution of HTTP request payloads.
HTTP Client Metrics¶
These metrics track HTTP client connection behavior for outbound requests.
presto_cpp.http.client.num_connections_created¶
Type: counter
Description: Total number of HTTP client connections created by the worker.
presto_cpp.http.client.connection_first_use¶
Type: counter
Description: Number of HTTP requests that are the first request on a new connection (sequence number == 0).
presto_cpp.http.client.connection_reuse¶
Type: counter
Description: Number of HTTP requests sent on reused connections (sequence number > 0).
presto_cpp.http.client.transaction_create_delay_ms¶
Type: histogram
Unit: milliseconds
Description: Delay in creating HTTP client transactions.
Exchange Metrics¶
These metrics track data exchange operations between workers.
presto_cpp.exchange_source_peak_queued_bytes¶
Type: gauge
Unit: bytes
Description: Peak number of bytes queued in PrestoExchangeSource waiting to be consumed.
presto_cpp.exchange.request.duration¶
Type: histogram
Unit: milliseconds
Description: Duration distribution of exchange data fetch requests.
presto_cpp.exchange.request.num_tries¶
Type: histogram
Description: Number of retry attempts for exchange data fetch requests.
presto_cpp.exchange.request.page_size¶
Type: histogram
Unit: bytes
Description: Size distribution of data pages fetched through exchange requests.
presto_cpp.exchange.get_data_size.duration¶
Type: histogram
Unit: milliseconds
Description: Duration distribution of operations to get the size of exchange data.
presto_cpp.exchange.get_data_size.num_tries¶
Type: histogram
Description: Number of retry attempts for getting exchange data size.
Query Context and Memory Metrics¶
These metrics track query execution contexts and memory usage.
presto_cpp.num_query_contexts¶
Type: gauge
Description: Current number of active query contexts in the worker.
presto_cpp.memory_manager_total_bytes¶
Type: gauge
Unit: bytes
Description: Total bytes currently used by the memory manager across all queries’ memory pools.
Task Metrics¶
These metrics track task lifecycle and execution states.
Task Counts¶
presto_cpp.num_tasks¶
Type: counter
Description: Total number of tasks created on this worker since startup.
presto_cpp.num_tasks_bytes_processed¶
Type: counter
Unit: bytes
Description: Total bytes processed by all tasks on this worker.
presto_cpp.num_tasks_running¶
Type: gauge
Description: Current number of tasks in running state.
presto_cpp.num_tasks_finished¶
Type: counter
Description: Total number of tasks that completed successfully.
presto_cpp.num_tasks_cancelled¶
Type: counter
Description: Total number of tasks that were cancelled.
presto_cpp.num_tasks_aborted¶
Type: counter
Description: Total number of tasks that were aborted.
presto_cpp.num_tasks_failed¶
Type: counter
Description: Total number of tasks that failed with an error.
presto_cpp.num_tasks_planned¶
Type: gauge
Description: Number of tasks that have been created but not yet started, including queued tasks.
presto_cpp.num_tasks_queued¶
Type: gauge
Description: Number of tasks currently waiting in the task queue.
Task Health Metrics¶
presto_cpp.num_zombie_velox_tasks¶
Type: gauge
Description: Number of zombie Velox tasks (tasks that are no longer active but not cleaned up).
presto_cpp.num_zombie_presto_tasks¶
Type: gauge
Description: Number of zombie Presto tasks (tasks that are no longer active but not cleaned up).
presto_cpp.num_tasks_with_stuck_operator¶
Type: gauge
Description: Number of tasks that have at least one stuck operator.
presto_cpp.num_cancelled_tasks_by_stuck_driver¶
Type: counter
Description: Total number of tasks cancelled due to stuck drivers.
presto_cpp.num_tasks_deadlock¶
Type: counter
Description: Total number of tasks that encountered deadlock conditions.
presto_cpp.num_tasks_manager_lock_timeout¶
Type: counter
Description: Number of times the task manager lock acquisition timed out.
Driver Metrics¶
These metrics track the state and execution of drivers within tasks.
Driver States¶
presto_cpp.num_queued_drivers¶
Type: gauge
Description: Number of drivers currently queued and waiting to execute.
presto_cpp.num_on_thread_drivers¶
Type: gauge
Description: Number of drivers currently executing on threads.
presto_cpp.num_suspended_drivers¶
Type: gauge
Description: Number of drivers that are suspended.
Driver Blocking Reasons¶
presto_cpp.num_blocked_wait_for_consumer_drivers¶
Type: gauge
Description: Number of drivers blocked waiting for downstream consumers to consume data.
presto_cpp.num_blocked_wait_for_split_drivers¶
Type: gauge
Description: Number of drivers blocked waiting for new splits to be assigned.
presto_cpp.num_blocked_wait_for_producer_drivers¶
Type: gauge
Description: Number of drivers blocked waiting for upstream producers to provide data.
presto_cpp.num_blocked_wait_for_join_build_drivers¶
Type: gauge
Description: Number of drivers blocked waiting for join build side to complete.
presto_cpp.num_blocked_wait_for_join_probe_drivers¶
Type: gauge
Description: Number of drivers blocked during join probe operations.
presto_cpp.num_blocked_wait_for_merge_join_right_side_drivers¶
Type: gauge
Description: Number of drivers blocked waiting for merge join right side data.
presto_cpp.num_blocked_wait_for_memory_drivers¶
Type: gauge
Description: Number of drivers blocked waiting for memory to become available.
presto_cpp.num_blocked_wait_for_connector_drivers¶
Type: gauge
Description: Number of drivers blocked waiting for connector operations to complete.
presto_cpp.num_blocked_yield_drivers¶
Type: gauge
Description: Number of drivers that have yielded execution.
presto_cpp.num_stuck_drivers¶
Type: gauge
Description: Number of drivers that appear to be stuck and not making progress.
Worker Overload Metrics¶
These metrics indicate when the worker is overloaded and may reject new work.
presto_cpp.overloaded_mem¶
Type: gauge
Description: Exports 100 if the worker is overloaded in terms of memory usage, 0 otherwise.
presto_cpp.overloaded_cpu¶
Type: gauge
Description: Exports 100 if the worker is overloaded in terms of CPU usage, 0 otherwise.
presto_cpp.overloaded¶
Type: gauge
Description: Exports 100 if the worker is overloaded in terms of either memory or CPU, 0 otherwise.
presto_cpp.task_planned_time_ms¶
Type: gauge
Unit: milliseconds
Description: Average time tasks spend in the planned state (queued) before starting execution.
presto_cpp.overloaded_duration_sec¶
Type: gauge
Unit: seconds
Description: Duration in seconds that the worker has been continuously overloaded, or 0 if not currently overloaded.
Output Buffer Metrics¶
These metrics track the partitioned output buffers used for shuffling data.
presto_cpp.num_partitioned_output_buffer¶
Type: gauge
Description: Total number of output buffers currently managed by all OutputBufferManagers.
presto_cpp.partitioned_output_buffer_get_data_latency_ms¶
Type: histogram
Unit: milliseconds
Description: Latency distribution of getData() calls on OutputBufferManager instances.
Worker Runtime Metrics¶
presto_cpp.worker_runtime_uptime_secs¶
Type: counter
Unit: seconds
Description: Worker runtime uptime in seconds after the worker process started. This metric tracks how long the worker has been running.
Operating System Metrics¶
These metrics provide insight into OS-level resource usage by the worker process.
CPU Time Metrics¶
presto_cpp.os_user_cpu_time_micros¶
Type: counter
Unit: microseconds
Description: User CPU time consumed by the presto_server process since the process started.
presto_cpp.os_system_cpu_time_micros¶
Type: counter
Unit: microseconds
Description: System CPU time consumed by the presto_server process since the process started.
Page Fault Metrics¶
presto_cpp.os_num_soft_page_faults¶
Type: counter
Description: Total number of soft page faults (page faults that can be resolved without disk I/O) encountered by the presto_server process since startup.
presto_cpp.os_num_hard_page_faults¶
Type: counter
Description: Total number of hard page faults (page faults requiring disk I/O) encountered by the presto_server process since startup.
Context Switch Metrics¶
presto_cpp.os_num_voluntary_context_switches¶
Type: counter
Description: Total number of voluntary context switches in the presto_server process (when the process yields the CPU voluntarily).
presto_cpp.os_num_forced_context_switches¶
Type: counter
Description: Total number of involuntary context switches in the presto_server process (when the process is preempted by the OS).
Hive Connector Metrics¶
These metrics track the performance of the Hive connector’s file handle cache. The metrics include
a placeholder {} in their name which is replaced with the connector name at runtime.
File Handle Cache Metrics¶
presto_cpp.{connector}.hive_file_handle_cache_num_elements¶
Type: gauge
Description: Number of elements currently in the Hive file handle cache.
presto_cpp.{connector}.hive_file_handle_cache_pinned_size¶
Type: gauge
Unit: bytes
Description: Total size of pinned (in-use) entries in the Hive file handle cache.
presto_cpp.{connector}.hive_file_handle_cache_cur_size¶
Type: gauge
Unit: bytes
Description: Current total size of the Hive file handle cache.
presto_cpp.{connector}.hive_file_handle_cache_num_accumulative_hits¶
Type: counter
Description: Cumulative number of cache hits in the Hive file handle cache since startup.
presto_cpp.{connector}.hive_file_handle_cache_num_accumulative_lookups¶
Type: counter
Description: Cumulative number of cache lookups in the Hive file handle cache since startup.
presto_cpp.{connector}.hive_file_handle_cache_num_hits¶
Type: counter
Description: Number of cache hits in the Hive file handle cache (recent window).
presto_cpp.{connector}.hive_file_handle_cache_num_lookups¶
Type: counter
Description: Number of cache lookups in the Hive file handle cache (recent window).
Thread Pool Metrics¶
These metrics track the state of various thread pools. The metrics include a placeholder {}
in their name which is replaced with the thread pool name at runtime.
presto_cpp.{pool}.num_threads¶
Type: gauge
Description: Current number of threads in the thread pool.
presto_cpp.{pool}.num_active_threads¶
Type: gauge
Description: Number of threads currently executing tasks in the thread pool.
presto_cpp.{pool}.num_pending_tasks¶
Type: gauge
Description: Number of tasks waiting to be executed in the thread pool.
presto_cpp.{pool}.num_total_tasks¶
Type: counter
Description: Total number of tasks that have been submitted to the thread pool since startup.
presto_cpp.{pool}.max_idle_time_ns¶
Type: gauge
Unit: nanoseconds
Description: Maximum idle time for threads in the pool before they are terminated.
EventBase Violation Metrics¶
These metrics track violations of the EventBase (event loop) threading model.
presto_cpp.exchange_io_evb_violation_count¶
Type: counter
Description: Number of times the exchange I/O EventBase threading model was violated (operations performed on wrong thread).
presto_cpp.http_server_io_evb_violation_count¶
Type: counter
Description: Number of times the HTTP server I/O EventBase threading model was violated.
Memory Pushback Metrics¶
These metrics track the memory pushback mechanism that helps prevent out-of-memory conditions.
presto_cpp.memory_pushback_count¶
Type: counter
Description: Number of times the memory pushback mechanism has been triggered.
presto_cpp.memory_pushback_latency_ms¶
Type: histogram
Unit: milliseconds
Range: 0-100,000 ms (0-100 seconds)
Percentiles: P50, P90, P99, P100
Description: Latency distribution of memory pushback operations, measuring how long each pushback attempt takes.
presto_cpp.memory_pushback_reduction_bytes¶
Type: histogram
Unit: bytes
Range: 0-15 GB (150 buckets)
Percentiles: P50, P90, P99, P100
Description: Distribution of actual memory usage reduction achieved by each memory pushback attempt. This metric helps gauge the effectiveness of the memory pushback mechanism.
presto_cpp.memory_pushback_expected_reduction_bytes¶
Type: histogram
Unit: bytes
Range: 0-15 GB (150 buckets)
Percentiles: P50, P90, P99, P100
Description: Distribution of expected memory usage reduction for each memory pushback attempt. The expected reduction may differ from actual reduction as other threads might allocate memory during the pushback operation.
Additional Runtime Metrics¶
For additional runtime metrics related to specific subsystems:
S3 FileSystem Metrics: When Presto C++ workers interact with S3, additional runtime metrics are collected. See the Velox S3 FileSystem documentation.
Velox Metrics: Metrics from the underlying Velox execution engine are also available. These are prefixed with
velox.instead ofpresto_cpp.. See the Velox metrics documentation.
Accessing Metrics¶
To access these metrics:
Enable metrics collection by setting
runtime-metrics-collection-enabled=truein your worker configuration.Query the metrics endpoint:
curl http://worker-host:7777/v1/info/metricsThe response will be in Prometheus text format, suitable for scraping by Prometheus or other monitoring systems.
Example Output¶
# TYPE presto_cpp_worker_runtime_uptime_secs counter
presto_cpp_worker_runtime_uptime_secs{cluster="production",worker="worker-01"} 3600
# TYPE presto_cpp_num_tasks_running gauge
presto_cpp_num_tasks_running{cluster="production",worker="worker-01"} 42
# TYPE presto_cpp_memory_manager_total_bytes gauge
presto_cpp_memory_manager_total_bytes{cluster="production",worker="worker-01"} 8589934592
See Also¶
Presto C++ Features - For information on enabling metrics collection
Presto C++ Configuration Properties - For worker configuration properties
Velox Metrics Documentation - For metrics from the Velox execution engine