Presto Session Properties

This section describes session properties that may be used to tune Presto or alter its behavior when required.

The following is not a complete list of all session properties available in Presto, and does not include any connector-specific catalog properties.

For information on catalog properties, see the connector documentation.

For information on configuration properties, see Presto Configuration Properties.

General Properties

join_distribution_type

  • Type: string

  • Allowed values: AUTOMATIC, PARTITIONED, BROADCAST

  • Default value: AUTOMATIC

The type of distributed join to use. When set to PARTITIONED, Presto uses hash-distributed joins. When set to BROADCAST, it broadcasts the right table to all nodes in the cluster that have data from the left table. Partitioned joins require redistributing both tables using a hash of the join key. This can be slower (sometimes substantially) than broadcast joins, but allows much larger joins. In particular, broadcast joins are faster when the right table is much smaller than the left. However, broadcast joins require that the right table, after filtering, fit in memory on each node, whereas partitioned joins only need it to fit in distributed memory across all nodes. When set to AUTOMATIC, Presto makes a cost-based decision about which distribution type is optimal, and also considers switching the left and right inputs to the join. In AUTOMATIC mode, Presto defaults to hash-distributed joins if no cost can be computed, such as when the tables do not have statistics.

The corresponding configuration property is join-distribution-type.
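
For example, to force broadcast joins for the current session:

SET SESSION join_distribution_type = 'BROADCAST';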

redistribute_writes

  • Type: boolean

  • Default value: true

This property enables redistribution of data before writing. Hashing the data across nodes in the cluster can eliminate the performance impact of data skew when writing. It can be disabled when the output data set is known not to be skewed, in order to avoid the overhead of hashing and redistributing all the data across the network.

When both scale_writers and redistribute_writes are set to true, scale_writers takes precedence.

The corresponding configuration property is redistribute-writes.
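
For example, to avoid the redistribution overhead when the output data set is known not to be skewed:

SET SESSION redistribute_writes = false;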

scale_writers

  • Type: boolean

  • Default value: true

This property enables dynamic scaling of writer tasks based on throughput. When enabled, Presto automatically adjusts the number of writer tasks, scaling out writers only as data throughput requires, which can improve resource utilization.

When both scale_writers and redistribute_writes are set to true, scale_writers takes precedence.

The corresponding configuration property is scale-writers.

task_writer_count

  • Type: integer

  • Default value: 1

Default number of local parallel table writer threads per worker. The value is required to be a power of two for the Java query engine.

The corresponding configuration property is task.writer-count.
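
For example, to use four writer threads per worker for the current session (the value must be a power of two):

SET SESSION task_writer_count = 4;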

task_partitioned_writer_count

  • Type: integer

  • Default value: task_writer_count

Number of local parallel table writer threads per worker for partitioned writes. If not set, the value of task_writer_count is used. It is required to be a power of two for the Java query engine.

single_node_execution_enabled

  • Type: boolean

  • Default value: false

This property restricts queries scheduled on this cluster to execution on a single node, which may improve performance for small queries that can be executed within a single node.

The corresponding configuration property is single-node-execution-enabled.

offset_clause_enabled

  • Type: boolean

  • Default value: false

To enable the OFFSET clause in SQL query expressions, set this property to true.

The corresponding configuration property is offset-clause-enabled.
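
Once enabled, OFFSET can be combined with ORDER BY and LIMIT as in this sketch, where orders is a hypothetical table:

SET SESSION offset_clause_enabled = true;
SELECT * FROM orders ORDER BY orderkey OFFSET 10 LIMIT 5;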

check_access_control_on_utilized_columns_only

  • Type: boolean

  • Default value: true

Apply access control rules only to those columns that are required to produce the query output.

Note: When this property is set to true, the following kinds of queries:

  • queries that have USING in a join condition

  • queries that have duplicate named common table expressions (CTEs)

are evaluated as if the property were set to false, with access control checked for all columns.

To avoid these problems:

  • replace all USING join conditions in a query with ON join conditions

  • set unique names for all CTEs in a query

The corresponding configuration property is check-access-control-on-utilized-columns-only.

max_serializable_object_size

  • Type: long

  • Default value: 1000

Maximum object size in bytes that can be considered serializable in a function call by the coordinator.

The corresponding configuration property is max-serializable-object-size.

max_prefixes_count

  • Type: integer

  • Minimum value: 1

  • Default value: 100

Maximum number of prefixes (catalog/schema/table scopes used to narrow metadata lookups) that Presto generates when querying information_schema. If the number of computed prefixes exceeds this limit, Presto falls back to a single broader prefix (catalog only). If it’s below the limit, the generated prefixes are used.

The corresponding configuration property is max-prefixes-count.

try_function_catchable_errors

  • Type: string

  • Default value: "" (empty string)

A comma-separated list of error code names that the TRY() function should catch and return NULL for, in addition to the default catchable errors (such as DIVISION_BY_ZERO, INVALID_CAST_ARGUMENT, INVALID_FUNCTION_ARGUMENT, and NUMERIC_VALUE_OUT_OF_RANGE).

This allows users to specify exactly which additional errors TRY() should suppress. Error codes are matched by their name (such as GENERIC_INTERNAL_ERROR, INVALID_ARGUMENTS).

Example usage:

SET SESSION try_function_catchable_errors = 'GENERIC_INTERNAL_ERROR,INVALID_ARGUMENTS';
SELECT TRY(my_function(x)) FROM table;

The corresponding configuration property is try-function-catchable-errors.

Spilling Properties

spill_enabled

  • Type: boolean

  • Default value: false

Try spilling memory to disk to avoid exceeding memory limits for the query.

Spilling works by offloading memory to disk. This process can allow a query with a large memory footprint to pass at the cost of slower execution times. See Supported Operations for a list of operations that support spilling.

Be aware that this is an experimental feature and should be used with care.

The corresponding configuration property is experimental.spill-enabled.
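
For example, to allow a memory-intensive query to spill in the current session:

SET SESSION spill_enabled = true;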

join_spill_enabled

  • Type: boolean

  • Default value: true

When spill_enabled is true, this determines whether Presto will try spilling memory to disk for joins to avoid exceeding memory limits for the query.

The corresponding configuration property is experimental.join-spill-enabled.

aggregation_spill_enabled

  • Type: boolean

  • Default value: true

When spill_enabled is true, this determines whether Presto will try spilling memory to disk for aggregations to avoid exceeding memory limits for the query.

The corresponding configuration property is experimental.aggregation-spill-enabled.

distinct_aggregation_spill_enabled

  • Type: boolean

  • Default value: true

When aggregation_spill_enabled is true, this determines whether Presto will try spilling memory to disk for distinct aggregations to avoid exceeding memory limits for the query.

The corresponding configuration property is experimental.distinct-aggregation-spill-enabled.

order_by_aggregation_spill_enabled

  • Type: boolean

  • Default value: true

When aggregation_spill_enabled is true, this determines whether Presto will try spilling memory to disk for order by aggregations to avoid exceeding memory limits for the query.

The corresponding configuration property is experimental.order-by-aggregation-spill-enabled.

window_spill_enabled

  • Type: boolean

  • Default value: true

When spill_enabled is true, this determines whether Presto will try spilling memory to disk for window functions to avoid exceeding memory limits for the query.

The corresponding configuration property is experimental.window-spill-enabled.

order_by_spill_enabled

  • Type: boolean

  • Default value: true

When spill_enabled is true, this determines whether Presto will try spilling memory to disk for order by to avoid exceeding memory limits for the query.

The corresponding configuration property is experimental.order-by-spill-enabled.

aggregation_operator_unspill_memory_limit

  • Type: data size

  • Default value: 4 MB

Limit for memory used for unspilling a single aggregation operator instance.

The corresponding configuration property is experimental.aggregation-operator-unspill-memory-limit.

Task Properties

task_concurrency

  • Type: integer

  • Restrictions: must be a power of two

  • Default value: 16

Default local concurrency for parallel operators such as joins and aggregations. This value should be adjusted up or down based on the query concurrency and worker resource utilization. Lower values are better for clusters that run many queries concurrently, because the cluster is already utilized by all the running queries, so adding more concurrency results in slowdowns due to context switching and other overhead. Higher values are better for clusters that run only one or a few queries at a time.

The corresponding configuration property is task.concurrency.
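
For example, to lower the concurrency for a session on a heavily loaded cluster (the value must be a power of two):

SET SESSION task_concurrency = 8;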

task_writer_count

  • Type: integer

  • Restrictions: must be a power of two

  • Default value: 1

The number of concurrent writer threads per worker per query. Increasing this value may increase write speed, especially when a query is not I/O bound and can take advantage of additional CPU for parallel writes (some connectors can be bottlenecked on CPU when writing due to compression or other factors). Setting this too high may cause the cluster to become overloaded due to excessive resource utilization.

The corresponding configuration property is task.writer-count.

Optimizer Properties

dictionary_aggregation

  • Type: boolean

  • Default value: false

Enables optimization for aggregations on dictionaries.

The corresponding configuration property is optimizer.dictionary-aggregation.

optimize_hash_generation

  • Type: boolean

  • Default value: true

Compute hash codes for distribution, joins, and aggregations early during execution, allowing the results to be shared between operations later in the query. This can reduce CPU usage by avoiding computing the same hash multiple times, at the cost of additional network transfer for the hashes. In most cases it decreases overall query processing time.

It is often helpful to disable this property when using EXPLAIN in order to make the query plan easier to read.

The corresponding configuration property is optimizer.optimize-hash-generation.

pre_aggregate_before_grouping_sets

  • Type: boolean

  • Default value: false

When enabled, inserts a partial aggregation below the GroupId node in grouping sets queries to reduce the number of rows that GroupId multiplies across grouping sets. The partial aggregation groups by the union of all grouping set columns (the finest granularity needed), which can drastically reduce the input to GroupId. This is most effective when the data has high cardinality on the grouping columns, as the pre-aggregation can significantly reduce the row count before multiplication.

Only applies to decomposable aggregation functions such as SUM, COUNT, MIN, or MAX that support partial/intermediate/final splitting.

The corresponding configuration property is optimizer.pre-aggregate-before-grouping-sets.
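
For example, this grouping sets query uses only the decomposable aggregation SUM, so its input can be pre-aggregated (orders is a hypothetical table):

SET SESSION pre_aggregate_before_grouping_sets = true;
SELECT orderstatus, orderpriority, SUM(totalprice)
FROM orders
GROUP BY GROUPING SETS ((orderstatus), (orderpriority), (orderstatus, orderpriority));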

push_aggregation_through_join

  • Type: boolean

  • Default value: true

When an aggregation is above an outer join and all columns from the outer side of the join are in the grouping clause, the aggregation is pushed below the outer join. This optimization is particularly useful for correlated scalar subqueries, which get rewritten to an aggregation over an outer join. For example:

SELECT * FROM item i
    WHERE i.i_current_price > (
        SELECT AVG(j.i_current_price) FROM item j
            WHERE i.i_category = j.i_category);

Enabling this optimization can substantially speed up queries by reducing the amount of data that needs to be processed by the join. However, it may slow down some queries that have very selective joins.

The corresponding configuration property is optimizer.push-aggregation-through-join.

push_partial_aggregation_through_join

  • Type: boolean

  • Default value: false

When a partial aggregation is above an inner join and all aggregation inputs come from only one side of the join, the partial aggregation is pushed below the join to that side. This reduces the amount of data flowing into the join operator, which can improve performance by allowing the aggregation to pre-reduce data before the join is performed.

The corresponding configuration property is optimizer.push-partial-aggregation-through-join.

push_projection_through_cross_join

  • Type: boolean

  • Default value: false

When enabled, pushes projection expressions through cross join nodes so that each expression is evaluated only on the side of the cross join that provides its input variables. This reduces the number of columns flowing through the cross join and avoids recomputing expressions on the multiplied output rows.

Only deterministic expressions are pushed. Expressions that reference variables from both sides of the cross join, or constant expressions, remain above the join.

The corresponding configuration property is optimizer.push-projection-through-cross-join.

push_table_write_through_union

  • Type: boolean

  • Default value: true

Parallelize writes when using UNION ALL in queries that write data. This improves the speed of writing output tables in UNION ALL queries because these writes do not require additional synchronization when collecting results. Enabling this optimization can improve UNION ALL speed when write speed is not yet saturated. However, it may slow down queries in an already heavily loaded system.

The corresponding configuration property is optimizer.push-table-write-through-union.

join_reordering_strategy

  • Type: string

  • Allowed values: AUTOMATIC, ELIMINATE_CROSS_JOINS, NONE

  • Default value: AUTOMATIC

The join reordering strategy to use. NONE maintains the order in which the tables are listed in the query. ELIMINATE_CROSS_JOINS reorders joins to eliminate cross joins where possible and otherwise maintains the original query order; when reordering joins, it also strives to maintain the original table order as much as possible. AUTOMATIC enumerates possible orders and uses statistics-based cost estimation to determine the least-cost order. If statistics are not available, or if for any reason a cost cannot be computed, the ELIMINATE_CROSS_JOINS strategy is used.

The corresponding configuration property is optimizer.join-reordering-strategy.
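
For example, to preserve the join order exactly as written in the query:

SET SESSION join_reordering_strategy = 'NONE';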

confidence_based_broadcast

  • Type: boolean

  • Default value: false

Enable broadcasting based on the confidence of the statistics in use, by broadcasting the side of a join that has the higher-confidence (HIGH or FACT) statistics. If both sides have statistics of the same confidence, the original behavior is followed.

The corresponding configuration property is optimizer.confidence-based-broadcast.

treat_low_confidence_zero_estimation_as_unknown

  • Type: boolean

  • Default value: false

Enable treating LOW confidence, zero estimations as UNKNOWN during joins.

The corresponding configuration property is optimizer.treat-low-confidence-zero-estimation-as-unknown.

retry_query_with_history_based_optimization

  • Type: boolean

  • Default value: false

Enable retry for failed queries that can potentially be helped by history-based optimization (HBO).

The corresponding configuration property is optimizer.retry-query-with-history-based-optimization.

optimizer_inner_join_pushdown_enabled

  • Type: boolean

  • Default value: false

Enable pushdown of inner join predicates to the database. Only equality join predicates are pushed down. Use optimizer_inequality_join_pushdown_enabled along with this property to also push down inequality join predicates.

The corresponding configuration property is optimizer.inner-join-pushdown-enabled.

optimizer_inequality_join_pushdown_enabled

  • Type: boolean

  • Default value: false

Enable pushdown of inner join inequality predicates to the database. For this property to take effect, optimizer_inner_join_pushdown_enabled must be set to true.

The corresponding configuration property is optimizer.inequality-join-pushdown-enabled.

verbose_optimizer_info_enabled

  • Type: boolean

  • Default value: false

Use this and optimizers_to_enable_verbose_runtime_stats in development to collect valuable debugging information about the optimizer.

Set it to true as shown in this example:

SET SESSION verbose_optimizer_info_enabled=true;

optimizers_to_enable_verbose_runtime_stats

  • Type: string

  • Allowed values: ALL, a single optimizer rule name, or multiple comma-separated optimizer rule names

  • Default value: none

Use this and verbose_optimizer_info_enabled in development to collect valuable debugging information about the optimizer.

Run the following command to use optimizers_to_enable_verbose_runtime_stats:

SET SESSION optimizers_to_enable_verbose_runtime_stats=ALL;

pushdown_subfields_for_map_functions

  • Type: boolean

  • Default value: true

Use this property to optimize the map_filter() and map_subset() functions.

It controls whether subfield access is executed at the data source or not.

pushdown_subfields_for_cardinality

  • Type: boolean

  • Default value: false

Enable subfield pruning for the cardinality() function to skip reading keys and values.

When enabled, the query optimizer can push down subfield pruning for cardinality operations, allowing the data source to skip reading the actual keys and values when only the cardinality (count of elements) is needed.

schedule_splits_based_on_task_load

  • Type: boolean

  • Default value: false

If set to true, splits are scheduled to tasks based on task load rather than node load. This is particularly useful for the native worker, which runs splits for tasks differently from the Java worker.

The corresponding configuration property is node-scheduler.max-splits-per-task.

Set it to true as shown in this example:

SET SESSION schedule_splits_based_on_task_load=true;

table_scan_shuffle_parallelism_threshold

  • Type: double

  • Default value: 0.1

Parallelism threshold for adding a shuffle above table scan. When the table’s parallelism factor is below this threshold (0.0-1.0) and table_scan_shuffle_strategy is COST_BASED, a round-robin shuffle exchange is added above the table scan to redistribute data.

The corresponding configuration property is optimizer.table-scan-shuffle-parallelism-threshold.

table_scan_shuffle_strategy

  • Type: string

  • Allowed values: DISABLED, ALWAYS_ENABLED, COST_BASED

  • Default value: DISABLED

Strategy for adding shuffle above table scan to redistribute data. When set to DISABLED, no shuffle is added. When set to ALWAYS_ENABLED, a round-robin shuffle exchange is always added above table scans. When set to COST_BASED, a shuffle is added only when the table’s parallelism factor is below the table_scan_shuffle_parallelism_threshold.

The corresponding configuration property is optimizer.table-scan-shuffle-strategy.
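
For example, to enable cost-based shuffling together with a custom threshold:

SET SESSION table_scan_shuffle_strategy = 'COST_BASED';
SET SESSION table_scan_shuffle_parallelism_threshold = 0.2;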

remote_function_names_for_fixed_parallelism

  • Type: string

  • Default value: "" (empty string, disabled)

A regular expression pattern to match fully qualified remote function names, such as catalog.schema.function_name, that should use fixed parallelism. When a remote function matches this pattern, the optimizer inserts round-robin shuffle exchanges before and after the projection containing the remote function call. This ensures that the remote function executes with a fixed degree of parallelism, which can be useful for controlling resource usage when calling external services.

This property only applies to external/remote functions (functions where isExternalExecution() returns true, such as functions using THRIFT, GRPC, or REST implementation types).

Example patterns:

  • myschema.myfunction - matches an exact function name

  • catalog.schema.remote_.* - matches all functions starting with remote_ in the specified catalog and schema

  • .*remote.* - matches any function containing remote in its fully qualified name

The corresponding configuration property is optimizer.remote-function-names-for-fixed-parallelism.
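
For example, to apply fixed parallelism to all functions with a hypothetical remote_ prefix in a hypothetical catalog and schema:

SET SESSION remote_function_names_for_fixed_parallelism = 'mycatalog.myschema.remote_.*';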

remote_function_fixed_parallelism_task_count

  • Type: integer

  • Default value: null (uses the default hash partition count)

The number of tasks to use for remote functions matching the remote_function_names_for_fixed_parallelism pattern. When set, this value determines the degree of parallelism for the round-robin shuffle exchanges inserted around matching remote function projections. If not set, the default hash partition count will be used.

This property is only effective when remote_function_names_for_fixed_parallelism is set to a non-empty pattern.

The corresponding configuration property is optimizer.remote-function-fixed-parallelism-task-count.

local_exchange_parent_preference_strategy

  • Type: string

  • Allowed values: ALWAYS, NEVER, AUTOMATIC

  • Default value: ALWAYS

Strategy to consider parent preferences when adding local exchange partitioning for aggregations. When set to ALWAYS, the optimizer always uses parent preferences for local exchange partitioning. When set to NEVER, it never uses parent preferences and instead uses the aggregation’s own grouping keys. When set to AUTOMATIC, the optimizer makes a cost-based decision, using parent preferences only when the estimated partition cardinality is greater than or equal to the task concurrency.

The corresponding configuration property is optimizer.local-exchange-parent-preference-strategy.

JDBC Properties

useJdbcMetadataCache

  • Type: boolean

  • Default value: false

Cache the results of JDBC queries that fetch metadata about tables and columns.

allowDropTable

  • Type: boolean

  • Default value: false

Allow connector to drop tables.

metadataCacheTtl

  • Type: Duration

  • Default value: 0

Duration for which fetched metadata is cached. The default of 0 disables caching.

metadataCacheRefreshInterval

  • Type: Duration

  • Default value: 0

Interval after which cached metadata is refreshed in the background.

metadataCacheMaximumSize

  • Type: long

  • Default value: 1

Maximum number of objects stored in the metadata cache.

metadataCacheThreadPoolSize

  • Type: int

  • Default value: 1

The value is the maximum number of background threads used to fetch and refresh metadata.

Query Manager Properties

query_client_timeout

  • Type: Duration

  • Default value: 5m

This property configures how long a query can run without contact from the client application, such as the CLI, before it is abandoned.

The corresponding configuration property is query.client.timeout.
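
For example, to allow up to ten minutes without contact from the client:

SET SESSION query_client_timeout = '10m';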

query_priority

  • Type: int

  • Default value: 1

This property defines the priority of queries for execution and plays an important role in query admission. Queries with higher priority are scheduled before those with lower priority. A higher number indicates a higher priority.

query_max_queued_time

  • Type: Duration

  • Default value: 100d

Use this property to configure how long a query can be queued before it is terminated.

The corresponding configuration property is query.max-queued-time.

View and Materialized View Properties

default_view_security_mode

  • Type: string

  • Allowed values: DEFINER, INVOKER

  • Default value: DEFINER

Sets the default security mode for views and materialized views when the SECURITY clause is not explicitly specified in CREATE VIEW or CREATE MATERIALIZED VIEW statements.

  • DEFINER: Views execute with the permissions of the user who created them

  • INVOKER: Views execute with the permissions of the user querying them

The corresponding configuration property is default-view-security-mode.
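
The mode can also be specified explicitly per view, overriding this default; customer_names and customers are hypothetical names:

CREATE VIEW customer_names SECURITY INVOKER AS
SELECT name FROM customers;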

legacy_materialized_views

  • Type: boolean

  • Default value: true

Use legacy materialized views implementation. Set to false to enable the new materialized views implementation with security modes (DEFINER and INVOKER), automatic query rewriting, and freshness tracking.

By default, this session property is locked to the server configuration value and cannot be changed. To allow runtime toggling of this property (for testing/migration purposes only), set experimental.allow-legacy-materialized-views-toggle = true in the server configuration.

The corresponding configuration property is experimental.legacy-materialized-views.

materialized_view_query_rewrite_cost_based_selection_enabled

  • Type: boolean

  • Default value: false

Enable cost-based selection when multiple materialized views are available for query rewriting. When enabled, the optimizer evaluates all compatible materialized view rewrites and selects the plan with the lowest estimated cost, instead of using the first compatible view.

The corresponding configuration property is materialized-view-query-rewrite-cost-based-selection-enabled.

materialized_view_stale_read_behavior

  • Type: string

  • Default value: USE_VIEW_QUERY

Controls behavior when a materialized view is stale and no per-view staleness config is set. Valid values are FAIL (throw an error) or USE_VIEW_QUERY (query base tables instead).

The corresponding configuration property is materialized-view-stale-read-behavior.

Warning

Materialized views are experimental. The SPI and behavior may change in future releases.

optimize_multiple_approx_distinct_on_same_type

  • Type: boolean

  • Default value: false

Enable optimization to combine multiple approx_distinct() function calls on expressions of the same type into a single aggregation using set_agg with array operations (array_constructor, array_transpose).

merge_max_by_and_min_by_aggregations

  • Type: boolean

  • Default value: false

Enable optimization to merge multiple max_by or min_by aggregations that share the same comparison key into a single aggregation with a ROW argument. This reduces computational overhead by performing only one comparison operation per row instead of N comparisons, and improves memory efficiency by maintaining a single aggregation state instead of N separate states.

For example, when enabled, the following query:

SELECT max_by(v1, k), max_by(v2, k), max_by(v3, k) FROM table

is internally optimized to use a single max_by(ROW(v1, v2, v3), k) call with field extraction, reducing both CPU and memory usage.