Release 0.298

Breaking Changes

  • Fix query statistics so that planningTime and finishingTime are no longer added to executionTime. executionTime now reports only the time the query spent running compute, so it can be used to measure worker efficiency without planning time or the time spent on final steps such as partition registration. #27691

  • Remove configuration property use-new-nan-definition. #27829

  • Remove warn-on-common-nan-patterns server config and warn_on_common_nan_patterns session property. The NaN definition migration is complete and these warnings are no longer needed. #27830

  • Update the default value of field_names_in_json_cast_enabled from false to true. When field_names_in_json_cast_enabled = true, JSON fields are assigned to ROW fields by matching field names regardless of their order in the JSON object. Queries that rely on JSON field order when casting to ROW may return different results after upgrading. If your workload depends on the previous positional behavior, restore it with SET SESSION field_names_in_json_cast_enabled = false. #26833
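The following sketch shows the kind of query affected by the field_names_in_json_cast_enabled change. The JSON literal and the ROW field names are hypothetical. The JSON object deliberately lists its keys in a different order than the ROW type, so name-based and positional assignment would pair values with fields differently.

```sql
-- Hypothetical example: the JSON literal and ROW field names are illustrative only.
-- The JSON object lists "b" before "a", the reverse of the ROW definition.
SELECT CAST(JSON '{"b": 2, "a": 1}' AS ROW(a BIGINT, b BIGINT));

-- Restore the previous positional behavior for the current session
-- (statement taken from the release note above).
SET SESSION field_names_in_json_cast_enabled = false;
```

Running the SELECT before and after the SET SESSION statement is one way to check whether a workload depends on JSON field order.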

Highlights

  • Improve materialized view query rewriting to support HAVING clauses. #27677

  • Improve coordinator-to-worker communication efficiency with 20-40% smaller payload sizes and 2-3x faster serialization compared to JSON. #27486

  • Improve query planning performance for wide-column projections by adding fast paths that skip unnecessary processing for variable references, constants, and identity assignments across multiple optimizer rules. #27547

  • Add incremental refresh for materialized views in the Iceberg connector, enabling efficient partial refreshes instead of full recomputation. #26959

  • Add support for Azure Blob Storage (wasb[s]://) and Azure Data Lake Storage Gen2 (abfs[s]://) in the Hive connector, with shared key and OAuth2 authentication. #25107

  • Add ALTER MATERIALIZED VIEW <name> SET PROPERTIES (...) SQL statement to update materialized view properties after creation. #27806

  • Add TopN late materialization optimization for ORDER BY ... LIMIT over wide tables with a unique $row_id column, sorting only sort keys first and fetching full rows via SemiJoin. #27641

  • Add TLS/SSL configuration, i18n character set, and configurable JDBC fetch size support for the Oracle connector. #27671 #27670 #27669

  • Add read support for Iceberg V3 row lineage hidden columns _row_id and _last_updated_sequence_number. #27240

  • Add support for min/max/count aggregation push down based on file stats. This can be toggled with the aggregate_push_down_enabled session property or the iceberg.aggregate-push-down-enabled configuration property. See Session Properties and Configuration Properties. #27085

  • Add view querying capabilities and upgrade to mongodb-driver-sync in the MongoDB connector. #26995 #27685

  • Add support for reading Delta Lake tables with column mapping enabled. #27483

Details

General Changes

  • Fix query statistics so that planningTime and finishingTime are no longer added to executionTime. executionTime now reports only the time the query spent running compute, so it can be used to measure worker efficiency without planning time or the time spent on final steps such as partition registration. #27691

  • Fix RPC options argument parsing to use the last argument instead of hardcoding the third argument. #27700

  • Fix UnsupportedOperationException when using remote_function_names_for_fixed_parallelism with queries containing UNION ALL below the remote function projection. #27714

  • Fix a bug in the PushProjectionThroughCrossJoin optimizer rule where cascading projections above a cross join could cause validation errors by dropping pushed variables from intermediate residual projects. #27568

  • Fix a gap in query commit for DELETE queries when running on Spark. #26195

  • Fix data correctness bugs in MaterializedViewQueryOptimizer where queries without GROUP BY could be incorrectly rewritten to use materialized views with GROUP BY, producing fewer rows than expected. Previously, alias mismatches and scalar expression bypasses allowed invalid rewrites that silently collapsed duplicate rows. #27778

  • Fix materialized view query rewriting for CUBE, ROLLUP, and GROUPING SETS clauses. Column references inside these grouping elements are now correctly rewritten to materialized view columns. #27538

  • Fix a race condition in pruneFinishedQueryInfo that caused a task memory leak. #27597

  • Fix runtime type mismatch crashes in Velox native execution caused by non-deterministic HashMap iteration order in PreAggregateBeforeGroupId, PushPartialAggregationThroughExchange, and MultipleDistinctAggregationToMarkDistinct optimizer rules. #27493

  • Fix IllegalStateException during planning of ORDER BY ... LIMIT (TopN) queries over tables with a unique column. #27664

  • Fix coordinator memory leak caused by orphaned listener objects accumulating during scheduling cycles in HttpRemoteTaskWithEventLoop.whenSplitQueueHasSpace(). #27673

  • Improve convergence speed of GROUP BY + LIMIT queries on partitioned tables by excluding partition keys from the PrefilterForLimitingAggregation prefilter. #27678

  • Improve PrefilterForLimitingAggregation optimizer to use scan limiting instead of timeouts for more predictable performance. The optimization now limits the source scan to 1000 * LIMIT rows before applying DISTINCT LIMIT. #27819

  • Improve plan efficiency for UNION ALL queries with empty branches (for example, branches pruned by partition or snapshot filtering) by removing those branches from the plan. #27765

  • Improve efficiency of coordinator-to-worker communication with 20-40% smaller payload sizes and 2-3x faster serialization compared to JSON. #27486

  • Improve map_from_entries(ARRAY[ROW(...), ...]) by rewriting to MAP(ARRAY[keys], ARRAY[values]) at plan time, avoiding intermediate ROW construction. #27491

  • Improve logical planner performance for wide-column queries by indexing RelationType.resolveFields() for O(1) field lookup instead of O(N) linear scan. #27553

  • Improve query planning performance for wide-column projections by adding fast paths that skip unnecessary processing for variable references, constants, and identity assignments across multiple optimizer rules. #27547

  • Improve materialized view query rewriting to support HAVING clauses. #27677

  • Improve the ROW IN to disjunction rewrite to fire for all columns, not just partition keys, enabling better predicate pushdown and domain extraction. Gated behind the rewrite_row_constructor_in_to_disjunction session property. #27680

  • Add ALTER MATERIALIZED VIEW <name> SET PROPERTIES (...) SQL statement to update materialized view properties after creation. #27806

  • Add push_aggregation_through_disjoint_union session property (default off) that pushes a GROUP BY aggregation completely below UNION ALL when at least one grouping key has constant values that are pairwise distinct across the union branches, eliminating the final aggregation. #27764

  • Add rpc_dispatch_batch_size session property to control batch size for RPC dispatch in BATCH mode. Default: 128. A value of 0 collects all rows before dispatching. #27700

  • Add rpc_streaming_mode session property to control RPC function execution mode (PER_ROW or BATCH). Default: PER_ROW. #27700

  • Add partition_aware_grouped_execution session property to schedule each (bucket, partition) as a separate lifespan in grouped execution, reducing per-lifespan data volumes for bucketed tables. Disabled by default. #27663

  • Add incremental refresh for materialized views. #26959

  • Add session property join_prefilter_build_side_with_complex_probe_side (default false) to extend join prefilter optimization to support complex probe-side patterns including UNION ALL, cross join, unnest, and aggregation. #27598

  • Add session property rewrite_bucketed_semi_join_to_join (default disabled) that rewrites bucketed semi-joins into joins to avoid a data shuffle. #27510

  • Add session property rewrite_row_constructor_in_to_disjunction (default disabled) that rewrites ROW IN ROW predicates into OR of AND equality chains when all ROW fields are partition keys, enabling per-column TupleDomain extraction for partition pruning. #27500

  • Add session property always_analyze_create_table_query_enabled to enable analyzing inner queries on CREATE TABLE AS SELECT IF NOT EXISTS statements when the target table already exists. #27504

  • Add support for ALTER TABLE ... ALTER COLUMN ... SET DEFAULT syntax to update Iceberg column write-default values. #27810

  • Add support for GROUP BY and ORDER BY ordinal references in materialized view query rewriting. Previously, queries like SELECT a, SUM(b) FROM t GROUP BY 1 would silently skip materialized view optimization. #27422

  • Add support for scalar functions in materialized view query rewriting. Queries using functions like CONCAT, ABS, JSON_EXTRACT, CAST, IF, COALESCE, and CASE expressions now correctly rewrite to scan the materialized view. #27549

  • Add cluster-overload.bypass-resource-groups configuration property to allow named resource groups to bypass cluster-overload throttling while continuing to honor per-group concurrency, memory, and CPU limits. #27642

  • Add optimize_row_in_predicate session property (default off) that rewrites multi-column ROW IN / ROW NOT IN predicates to expose per-column IN / NOT IN predicates, enabling partition pruning and other domain-based optimizations. #27708

  • Add push_filter_through_selecting_aggregation session property and optimizer.push-filter-through-selecting-aggregation configuration property (default false) to push HAVING predicates beneath single-value aggregates (MAX/MIN/ARBITRARY) for earlier row reduction. #27712

  • Add TopN late materialization optimization for ORDER BY ... LIMIT over wide tables with a unique $row_id column. Sorts only sort keys plus $row_id first, then fetches full rows via SemiJoin. #27641

  • Add split_part_reverse as a global Presto SQL function, replacing the Velox C++ UDF with a SQL-invoked scalar function available in all queries. #27480

  • Remove configuration property use-new-nan-definition. #27829

  • Remove warn-on-common-nan-patterns server config and warn_on_common_nan_patterns session property. The NaN definition migration is complete and these warnings are no longer needed. #27830

  • Update the default value of field_names_in_json_cast_enabled from false to true. When field_names_in_json_cast_enabled = true, JSON fields are assigned to ROW fields by matching field names regardless of their order in the JSON object. Queries that rely on JSON field order when casting to ROW may return different results after upgrading. If your workload depends on the previous positional behavior, restore it with SET SESSION field_names_in_json_cast_enabled = false. #26833
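Several of the optimizations listed above are disabled by default and must be enabled per session. The sketch below shows how that might look. The property names come from the notes above; the value types (boolean, string, integer) are assumptions, and the sales table and its columns are hypothetical.

```sql
-- Opt in to optimizations that this release ships disabled by default.
-- Assumption: these properties accept boolean values.
SET SESSION push_aggregation_through_disjoint_union = true;
SET SESSION rewrite_bucketed_semi_join_to_join = true;
SET SESSION rewrite_row_constructor_in_to_disjunction = true;

-- Switch RPC function execution to BATCH mode and raise the batch size
-- above the default of 128. Assumption: the mode is given as a string.
SET SESSION rpc_streaming_mode = 'BATCH';
SET SESSION rpc_dispatch_batch_size = 256;

-- Hypothetical table and columns. A multi-column ROW IN predicate:
SELECT count(*)
FROM sales
WHERE ROW(ds, region) IN (ROW('2025-01-01', 'us'), ROW('2025-01-02', 'eu'));

-- A logically equivalent OR of AND equality chains (for non-NULL values),
-- the shape the rewrite_row_constructor_in_to_disjunction notes describe:
SELECT count(*)
FROM sales
WHERE (ds = '2025-01-01' AND region = 'us')
   OR (ds = '2025-01-02' AND region = 'eu');
```

The second query form exposes one equality per column, which is what lets per-column domain extraction and partition pruning apply.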

Prestissimo (Native Execution) Changes

  • Fix MaterializedOutput operator lifecycle bugs: silent data loss on noMoreData() exceptions, Velox contract violation crashes during OOM teardown, and missing MemoryReclaimer causing memory arbitration failures. #27833

  • Add support for Iceberg V3 initialDefaultValue. #27767

  • Add support for plugin-registered custom types (such as those from the MongoDB and ML plugins) in native clusters. #27748

  • Add native_min_shuffle_compression_page_size_bytes session property to tune the small-page shuffle-compression skip threshold. #27683

JDBC Driver Changes

  • Add connection validation feature to enhance connection reliability. This can be enabled with the validateConnection session property to execute a validation query immediately after establishing the connection. #27002

  • Add support for execute procedure in JDBC connectors. #27282

Delta Lake Connector Changes

  • Fix a bug that left the metastore inconsistent if a Delta Lake table was created at an inaccessible location. #27129

  • Add support for reading Delta Lake tables with column mapping enabled. #27483

Hive Connector Changes

  • Fix a race condition where concurrent REFRESH MATERIALIZED VIEW statements on the same Hive-backed Iceberg materialized view could lose a watermark update. #27835

  • Fix integer overflow when converting exclusive bounds to inclusive bounds in BigintRange, HugeintRange, and TimestampRange filters in the Hive connector. #27600

  • Add support for partition-aware grouped execution in the Hive connector, creating per-(bucket, partition) split queues and compound partition handles. #27663

  • Add support for Azure Blob Storage (wasb[s]://) and Azure Data Lake Storage Gen2 (abfs[s]://) in the Hive connector, with shared key and OAuth2 authentication. #25107

Iceberg Connector Changes

  • Fix access control for materialized view storage tables when legacy_materialized_views=false: storage-table access control is bypassed during MV expansion, while direct queries by name still go through access control. #27728

  • Fix failure during INSERT into Iceberg tables partitioned by day() when using timestamp with time zone columns. #27645

  • Improve updating of stale_read_behavior, staleness_window, and refresh_type on existing materialized views with ALTER MATERIALIZED VIEW ... SET PROPERTIES (requires legacy_materialized_views=false). #27806

  • Add iceberg.materialized-view-default-max-snapshots-per-refresh configuration property and matching session property to set the default max_snapshots_per_refresh bound for materialized views. See Catalog Configuration. #27774

  • Add iceberg.materialized-view-default-storage-schema configuration property to route storage tables into a single schema. Defaults to the materialized view’s own schema; per-MV storage_schema overrides. See Catalog Configuration. #27728

  • Add max_snapshots_per_refresh materialized view property to bound how far each base table advances per REFRESH MATERIALIZED VIEW. Defaults to 0 (unbounded). Requires Iceberg V3 row lineage; V2 tables fall back to unbounded refresh. See Materialized View Properties. #27774

  • Add materialized_view_stitching_strategy and materialized_view_incremental_refresh_strategy session properties (values: ALWAYS, NEVER, AUTOMATIC; default: ALWAYS). Under AUTOMATIC, the optimizer selects between the rewrite and the full alternative based on cost; when stats are unavailable it falls back to row-count comparison. See Session Properties. #27820

  • Add read support for Iceberg V3 column initial-default values. #27659

  • Add incremental refresh for materialized views in the Iceberg connector, enabling efficient partial refreshes instead of full recomputation. #26959

  • Add min/max statistics for VARCHAR / CHAR columns in Iceberg tables. #27357

  • Add metastore cache invalidation procedure for the Iceberg connector. #27200

  • Add predicate push down on _last_updated_sequence_number for file-level pruning. #27766

  • Add read support for Iceberg V3 row lineage hidden columns _row_id and _last_updated_sequence_number. #27240

  • Add support for min/max/count aggregation push down based on file stats. This can be toggled with the aggregate_push_down_enabled session property or the iceberg.aggregate-push-down-enabled configuration property. See Session Properties and Configuration Properties. #27085

  • Add support for updating column write-default values using ALTER TABLE ... SET DEFAULT (requires Iceberg format version 3+). #27810

  • Add support for ALTER COLUMN SET DATA TYPE DDL statements in the Iceberg connector. #25418

  • Add warning when predicate stitching or incremental refresh falls back to full recompute. #27816

  • Update write-default operations to preserve existing initial-default values as metadata-only changes. #27810
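As a rough illustration of how some of the Iceberg additions above fit together, the sketch below assumes a catalog named iceberg, a schema sales, a table orders, and a materialized view daily_orders. All of these names are hypothetical, as are the column names and the literal values.

```sql
-- Assumptions: catalog "iceberg", schema "sales", table "orders", and
-- materialized view "daily_orders" are hypothetical names.

-- Enable min/max/count aggregation push down for this session.
SET SESSION iceberg.aggregate_push_down_enabled = true;

-- A query of the shape that may be answered from file statistics.
SELECT min(order_ts), max(order_ts), count(*)
FROM iceberg.sales.orders;

-- Update a materialized view property after creation
-- (requires legacy_materialized_views=false; the value format '1h' is an assumption).
ALTER MATERIALIZED VIEW iceberg.sales.daily_orders
SET PROPERTIES (staleness_window = '1h');

-- Update a column write-default value (requires Iceberg format version 3+).
ALTER TABLE iceberg.sales.orders ALTER COLUMN status SET DEFAULT 'new';
```

Check the accepted values for staleness_window and the other materialized view properties against Materialized View Properties before relying on this form.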

Lance Connector Changes

  • Add SQL filter pushdown to reduce data read from disk for selective queries. Supports equality, comparisons, IN lists, IS NULL, and range predicates on Boolean, Integer, Bigint, Real, Double, Varchar, Date, and Timestamp types. See Predicate Pushdown. #27430

  • Add configurable index and metadata cache sizes via lance.index-cache-size and lance.metadata-cache-size. #27325

  • Add version-aware dataset caching with snapshot isolation for consistent query reads. #27325
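A query like the following sketch combines several of the predicate kinds listed above as eligible for filter pushdown. The catalog, schema, table, and column names are hypothetical, and whether a given predicate is actually pushed down depends on the column types described in Predicate Pushdown.

```sql
-- Hypothetical catalog, schema, table, and columns.
SELECT event_id, payload
FROM lance.main.events
WHERE event_type IN ('click', 'view')      -- IN list on a Varchar column
  AND user_id = 42                         -- equality on a Bigint column
  AND event_date >= DATE '2025-01-01'      -- range predicate on a Date column
  AND deleted_at IS NULL;                  -- IS NULL check
```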

MongoDB Connector Changes

  • Add view querying capabilities in the MongoDB connector. #26995

  • Upgrade mongo-java-driver to mongodb-driver-sync. #27685

Oracle Connector Changes

  • Add TLS/SSL configuration support for the Oracle connector with oracle.tls.enabled, oracle.tls.truststore-path, and oracle.tls.truststore-password properties. #27671

  • Add Oracle i18n character set support. #27670

  • Add jdbc-fetch-size configuration property to control the number of rows fetched per database round-trip for the Oracle connector. #27669

Prometheus Connector Changes

  • Add mixed case-sensitive identifier support for the Prometheus connector. #26260

SingleStore Connector Changes

  • Fix TINYINT type mapping to preserve TINYINT semantics instead of incorrectly mapping to BOOLEAN after a JDBC driver upgrade. #27790

  • Fix varchar type mapping for TEXT types to use byte-based thresholds matching the JDBC driver’s COLUMN_SIZE reporting. #27790

Verifier Changes

  • Add query-rewriter-factory configuration property to allow extending the verifier QueryRewriter with custom implementations. #27703

Credits

Aditi Pandit, Allen Shen, Amit Dutta, Apurva Kumar, Arjun Gupta, Asish Kumar, Auden Woolfson, Ben Hu, Bryan Cutler, Chandrakant Vankayalapati, Christian Zentgraf, Daniel Bauer, Deepak Majeti, Deepak Mehra, Dilli Babu Godari, Dong Wang, Gary Helmling, Glerin Pinhero, Han Yan, Henry Dikeman, Jalpreet Singh Nanda, Jamille Shao-Ni, Jianjian Xie, Joe Abraham, Ke Wang, Kevin Tang, Konjac Huang, Li, Maria Basmanova, Miguel Blanco Godón, Nandakumar Balagopal, Natasha Sehgal, Naveen Mahadevuni, Nivin C S, Pramod Satya, Prashant Sharma, Pratik Joseph Dabre, Pratyaksh Sharma, Rebecca Schlussel, Reetika Agrawal, Rui Mo, Saurabh Mahawar, Sayari Mukherjee, Sergey Pershin, Shahim Sharafudeen, Shakyan Kushwaha, Shrinidhi Joshi, Sreeni Viswanadha, Steve Burnett, Swapnil, Timothy Meehan, Tirumala Saiteja Goruganthu, XiaoDu, Xiaoxuan, Yabin Ma, Yihong Wang, Zac, Zac Blanco, abhinavmuk04, bibith4, dependabot[bot], feilong-liu, jkhaliqi, join-theory-de, mohsaka, nishithakbhaskaran, peterenescu, shelton408, sumi-mathew, vhsu14, zhichenxu-meta