Release 0.290

Highlights

  • Fix long drop times for Iceberg tables with deleted metadata in S3 storage. #23510

  • Fix data corruption in uncompressed ORC/DWRF files with large values in string/binary columns. #23760

  • Improve JoinPrefilter optimizer for wide join keys and multiple join keys. #23858

  • Add UUID type support to the Parquet reader and writer. #23627

  • Add a cache for Iceberg table Puffin files to improve query planning time. The cache size is controlled by the iceberg.max-statistics-file-cache-size configuration property. #23177

  • Add support for UUID-typed columns in Iceberg. #23627

  • Add support for querying Iceberg tables by branch or tag name. #23539

  • Add support for the fast_forward procedure for Iceberg. #23589

  • Add support for using named arguments in the Iceberg procedures register_table and unregister_table. #23592

Details

General Changes

  • Fix array_intersect() for single parameter array<array<T>> to be deterministic regardless of the order of null input. #23890

  • Fix bug in local property calculation when spill is enabled. #23922

  • Fix a bug so that LIKE patterns are unescaped and escape strings are validated when there is no unresolved value. #23456

  • Fix querying and filtering on the Iceberg metadata columns $path and $data_sequence_number. #23472

  • Fix nullability of columns in information schema. #23577

  • Fix distinct operator for UUID type. #23732

  • Improve element_at() for arrays by avoiding pushdown of negative positions. #23479

  • Improve GET /v1/info/state to return INACTIVE state until the resource group configuration manager is fully initialized. #23585

  • Improve JoinPrefilter optimizer for wide join keys and multiple join keys. #23858

  • Improve writer scaling in skewed conditions by setting optimized_scale_writer_producer_buffer to on by default. #23774

  • Add UUID type support to the Parquet reader and writer. #23627

  • Add a cache for Iceberg table Puffin files to improve query planning time. The cache size is controlled by the iceberg.max-statistics-file-cache-size configuration property. #23177

  • Add a flag to the Presto CLI which allows skipping SSL certificate verification. #23780

  • Add a session property native_max_extended_partial_aggregation_memory which specifies Presto native max partial aggregation memory when data reduction is optimal. #23527

  • Add a session property native_max_partial_aggregation_memory which specifies Presto native max partial aggregation memory when data reduction is not optimal. #23527

  • Add a session property native_max_spill_bytes which specifies Presto native max allowed spill bytes. #23527

  • Add function is_private_ip() that returns true when the input IP address is private or a reserved IP address. #23520

  • Add function ip_prefix_subnets() that splits the input prefix into subnets the size of the new prefix length. #23656

  • Add new configuration property eager-plan-validation-enabled for eager building and validation of a logical plan before queuing. #23649

  • Add session property inline_projections_on_values and configuration property optimizer.inline-projections-on-values to evaluate a project node on a values node. #23245

  • Add support in QueuedStatement protocol to accept pre-minted query id and slug. #23407

  • Add support to proxy AuthorizedIdentity using JWT. #23546

  • Add support for casting char datatype to various numeric datatypes. #23792

  • Replace configuration property async-cache-full-persistence-interval with async-cache-persistence-interval. #23626

  • Remove the aliases array_dupes and array_has_dupes for the functions array_duplicates() and array_has_duplicates(). #23762
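
The behavior described for is_private_ip() and ip_prefix_subnets() can be approximated outside Presto with Python's standard ipaddress module. The sketch below is only an illustration under stated assumptions: the helper names is_private_or_reserved and prefix_subnets are hypothetical, and Python's classification of private and reserved ranges may not match Presto's exactly.

```python
import ipaddress


def is_private_or_reserved(address):
    """Approximate is_private_ip(): True for private or reserved addresses.

    Assumption: Python's is_private / is_reserved ranges stand in for
    Presto's definition, which may differ in edge cases.
    """
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_reserved


def prefix_subnets(prefix, new_prefix_length):
    """Approximate ip_prefix_subnets(): split a prefix into equal subnets.

    Raises ValueError if new_prefix_length is shorter than the
    prefix length of the input network.
    """
    network = ipaddress.ip_network(prefix)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix_length)]


if __name__ == "__main__":
    print(is_private_or_reserved("10.0.0.1"))
    print(is_private_or_reserved("8.8.8.8"))
    print(prefix_subnets("192.168.0.0/24", 26))
```

Splitting a /24 into /26 prefixes yields four subnets, since each extra prefix bit doubles the subnet count.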

Presto C++ Changes

  • Fix task.writer-count and task.partitioned-writer-count configuration properties in Presto C++ for consistency with Presto. #23902

  • Fix a bug where users weren’t able to set the native_expression.max_array_size_in_reduce session property. #23856

  • Fix plan validation failures for some join queries running with spill enabled when using Presto C++. #23595

  • Fix a bug so that proper logical type parameters are now read from and written to Parquet files. #23388

  • Fix data corruption in uncompressed ORC/DWRF files with large values in string/binary columns. #23760

  • Improve arbitrator configs to use the new string-based format. #23496

  • Add $path and $bucket to split info, and fix the split counts in the coordinator UI. #23755

  • Add a metric presto_cpp.memory_pushback_expected_reduction_bytes to track expected reduction in memory after a pushback attempt. #23872

  • Add a new counter, presto_cpp.memory_pushback_reduction_bytes, to monitor the actual memory reduction achieved with each memory pushback attempt. #23813

  • Add session property native_max_local_exchange_partition_count, which maps to the max_local_exchange_partition_count Velox query property, to limit the number of partitions created by a local exchange. #23910

  • Add session property native_writer_flush_threshold_bytes, which specifies the minimum memory footprint size required to reclaim memory from a file writer by flushing its buffered data to disk. #23891

  • Add session property native_max_page_partitioning_buffer_size, which specifies the maximum bytes to buffer per PartitionedOutput operator to avoid creating tiny SerializedPages. #23853

  • Add session property native_max_output_buffer_size, which specifies the maximum size in bytes for the task’s buffered output. The buffer is shared among all drivers. #23853

  • Add incremental periodic cache persistence for Presto C++ worker. #23626

  • Add native system session property provider. #23045

  • Remove session property native_join_spiller_partition_bits. #23906

  • Revert merging of FilterNode into TableScanNode done in #23755. #23855

Security Changes

Hive Connector Changes

  • Fix interpretation of ambiguous timestamps inside array, map, or row types for tables using TEXTFILE format to interpret the timestamps as the earliest possible unixtime for consistency with the rest of Presto. #23593

  • Fix timestamps inside array, map, or row types for tables using TEXTFILE format to respect the hive.time-zone property. #23593

Iceberg Connector Changes

  • Fix time-type columns to return properly when iceberg.parquet-batch-read-optimization-enabled is set to TRUE. #23542

  • Fix long drop times for Iceberg tables with deleted metadata in S3 storage. #23510

  • Fix a bug so that proper logical type parameters are now read from and written to Parquet files. #23388

  • Fix data corruption in uncompressed ORC/DWRF files with large values in string/binary columns. #23760

  • Add Iceberg metadata table $ref. #23503

  • Add configuration property iceberg.rest.auth.oauth2.scope for OAuth2 authentication in Iceberg’s REST catalog. #23884

  • Add configuration property iceberg.rest.auth.oauth2.uri. #23739

  • Add procedure rollback_to_timestamp to roll back an Iceberg table to a given point in time. #23559

  • Add support for UUID-typed columns. #23627

  • Add support for querying Iceberg tables by branch or tag name. #23539

  • Add table property metrics_max_inferred_column to configure the maximum number of columns for which metrics are collected, and support metrics_max_inferred_column for Iceberg tables with PARQUET format. #23468

  • Add support for the fast_forward procedure for Iceberg. #23589

  • Add support for using named arguments in the register_table and unregister_table procedures. #23592

  • Add procedure set_current_snapshot for Iceberg. #23567

  • Add support for timestamp without time zone in time travel expressions. #23714

MongoDB Connector Changes

  • Add support for varbinary data type in MongoDB. #23386

  • Add support for the ALTER TABLE statement in MongoDB. #23266

Cassandra Connector Changes

  • Upgrade cassandra-driver-core to 3.11.5 for SSL support. #23493

Elasticsearch Connector Changes

  • Improve handling of exceptions for empty tables in Elasticsearch. #23850

SPI Changes

  • Add Partitioning, PartitioningScheme, PartitioningHandle, PlanFragmentId, StageExecutionDescriptor and SimplePlanFragment to the SPI. #23601

Credits

Abhisek Saikia, Amit Dutta, Anant Aneja, Ananthu-Nair, Andrii Rosa, Bikramjeet Vig, Bryan Cutler, Chen Yang, Christian Zentgraf, David Tolnay, Deepa-George, Deepak Majeti, Denodo Research Labs, Elbin Pallimalil, Elliotte Rusty Harold, Feilong Liu, Ge Gao, Hazmi, Jalpreet Singh Nanda (:imjalpreet), Jayaprakash Sivaprasad, Jialiang Tan, Jimmy Lu, Joe Abraham, Karnati-Naga-Vivek, Ke, Konjac Huang, Krishna Pai, Linsong Wang, Mahadevuni Naveen Kumar, Matt Calder, Naveen Nitturu, Nikhil Collooru, Pramod, Pratik Joseph Dabre, Rebecca Schlussel, Reetika Agrawal, Richard Barnes, Rohan Pal Sidhu, Sam Partington, Serge Druzkin, Sergey Pershin, Steve Burnett, SthuthiGhosh9400, Swapnil Tailor, Timothy Meehan, Xiaoxuan Meng, Yihong Wang, Ying, Zac Blanco, Zac Wen, Zuyu ZHANG, abhibongale, aditi-pandit, ajay-kharat, auden-woolfson, exxiang, jackychen718, jaystarshot, kiersten-stokes, lingbin, lithinpurushothaman, lukmanulhakkeem, misterjpapa, mohsaka, namya28, oyeliseiev-ua, pratyakshsharma, prithvip, wangd