Release 0.292

Highlights

  • Improve error handling of INTERVAL DAY, INTERVAL HOUR, and INTERVAL SECOND operators when experiencing overflows. #24353

  • Improve the Presto router UI. #24411

  • Upgrade bootstrap to version 5. #24167

  • Add Java and Native Arrow Flight connector. #24427

  • Add a MySQL-compatible function bit_length that returns the count of bits for the given string. #24531

  • Add support to build Presto with JDK 17. #24677

  • Add the ability to canonicalize JSON output through session property canonicalized_json_extract. #24614

  • Add support for native ORC reader. #23037

  • Improve task.max-drivers-per-task by setting its default value to the thread concurrency of the host. #24642

  • Fix a security bug when check_access_control_for_utlized_columns is true for queries that use a WITH clause. Previously, permissions were sometimes not checked for certain columns used in the query. Permissions are now always checked for all columns used in the query. In some corner cases involving CTEs with the same name, more columns than are used may be checked, or the check may fall back to all columns referenced in the query. #24647

  • Fix Parquet read failing for nested Decimal types. #24440

  • Add manifest file caching for deployments which use the Hive metastore. #24481

  • Add table property write.data.path to specify independent data write paths for Iceberg tables. #24397

  • Add support for Iceberg table sort orders. Tables can be created with a list of sorted_by columns, which is used to order the files written to the table. #21977

  • Add support for UPDATE SQL statements. #24281

  • Add configuration property tpcds.use-varchar-type to allow toggling of char columns to varchar columns. #24406
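
As a rough illustration of the bit_length highlight above, the following Python sketch mirrors the semantics of MySQL's BIT_LENGTH, where the result is eight times the byte length of the string. It assumes UTF-8 encoding; the helper name and the encoding parameter are hypothetical, and this is not Presto's implementation.

```python
def bit_length(value: str, encoding: str = "utf-8") -> int:
    """Return the number of bits in the encoded form of `value`.

    Hypothetical helper: assumes the count is 8 bits per encoded byte,
    as in MySQL's BIT_LENGTH. The encoding default is an assumption.
    """
    return len(value.encode(encoding)) * 8
```

Under these assumptions, a three-character ASCII string yields 24, and a character that needs two bytes in UTF-8 yields 16.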

Details

General Changes

  • Fix Hive UUID type parsing. #24538

  • Fix addition, subtraction, multiplication and division of INTERVAL YEAR MONTH values. #24617

  • Fix an index error when a map column is passed into an unnest function. The column analyzer is now used to map the key and value output fields back to the correct input expression. #24789

  • Fix silently returning incorrect results when constructing a TimestampWithTimeZone from a value whose Unix timestamp is too large or too small. #24674

  • Fix a potential block by making the number of task event loops configurable through a configuration file. #24565

  • Improve analysis of utilized columns in a query by exploring view definitions and checking the utilized columns of the underlying tables. #24638

  • Improve error handling of INTERVAL DAY, INTERVAL HOUR, and INTERVAL SECOND operators when experiencing overflows. #24353

  • Improve scheduling by using long instead of DataSize for critical path. #24582

  • Improve scheduling by using long instead of DateTime for critical path. #24673

  • Improve the Presto router UI. #24411

  • Improve how multiple operator stats are merged together. #24414

  • Improve metrics creation by refactoring local variables to a dedicated class. #24414

  • Improve efficiency of coordinator when running a large number of tasks, controlled by task.enable-event-loop. #24668

  • Add Troubleshooting topic to the Presto documentation. #24601

  • Add Arrow Flight connector. #24427

  • Add a MySQL-compatible function bit_length that returns the count of bits for the given string. #24531

  • Add configuration property exclude-invalid-worker-session-properties. #23968

  • Add documentation for file-based Hive metastore to Deploying Presto. #24620

  • Add documentation for the Arrow Flight Connector. #24427

  • Add pagesink for DELETES to support future use. #24528

  • Add serialization for new types. #24528

  • Add support to build Presto with JDK 17. #24677

  • Add a new optimizer rule to add exchanges below a combination of partial aggregation + GroupId. Enabled with the boolean session property enable_forced_exchange_below_group_id. #24047

  • Add module presto-native-tests to run end-to-end tests with Presto native workers. #24234

  • Add map of node ID to plan node to QueryCompletedEvent in the event listener interface. #24590

  • Add support for multiple query event listeners. #24456

  • Add spark.dynamic-presto-memory-pool-tuning-enabled configuration property to dynamically configure available Spark executor memory based on available container memory. #24714

  • Add the ability to canonicalize JSON output through session property canonicalized_json_extract. #24614

  • Add the ability for a file-based Hive metastore to use an HDFS or S3 location as the warehouse directory. #24660

  • Remove org.apache.logging.log4j:log4j-api from root POM. #24605

  • Remove org.apache.logging.log4j:log4j-core from root POM. #24605

  • Upgrade bootstrap to version 5. #24167

  • Upgrade jQuery to version 3.7.1. #24167
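
The overflow handling for INTERVAL DAY, INTERVAL HOUR, and INTERVAL SECOND operators listed above can be pictured with a small Python sketch. It assumes that an interval is held as a signed 64-bit count of milliseconds and that an out-of-range result should raise an error rather than wrap around. The function names are hypothetical and do not come from the Presto code base.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def checked_int64(value: int) -> int:
    """Return `value` unchanged, or raise if it does not fit in 64 bits."""
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError(f"interval value out of 64-bit range: {value}")
    return value


def interval_days_to_millis(days: int) -> int:
    """Convert a day count to milliseconds with an explicit range check."""
    return checked_int64(days * MILLIS_PER_DAY)


def multiply_interval(millis: int, factor: int) -> int:
    """Scale an interval, failing loudly instead of silently wrapping."""
    return checked_int64(millis * factor)
```

Python integers do not overflow, so the range check here stands in for the checked arithmetic a 64-bit implementation would need.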

Prestissimo (Native Execution) Changes

  • Add a native type manager. #24179

  • Add support for Apache Arrow Flight connectors. #24504

  • Add the following Presto native shared arbitrator configuration properties. #24720
    • shared-arbitrator.global-arbitration-abort-time-ratio

    • shared-arbitrator.global-arbitration-memory-reclaim-pct

    • shared-arbitrator.global-arbitration-without-spill

    • shared-arbitrator.memory-pool-abort-capacity-limit

    • shared-arbitrator.memory-pool-min-reclaim-bytes

    • shared-arbitrator.memory-reclaim-threads-hw-multiplier

  • Add a type parameter for ConnectorDeleteTableHandle implementations to ConnectorProtocolTemplate, along with support for (de)serialization of connector-specific types. Existing native connector implementations defining ConnectorProtocolTemplate specializations must update their definitions to supply their specific type or use NotImplemented. #24721

  • Add exchange.http-client.request-data-sizes-max-wait-sec to native system configs. #24774

  • Add spill-enabled, join-spill-enabled, aggregation-spill-enabled, and order-by-spill-enabled to native system configs. #24726

  • Add new error code name MEMORY_ARBITRATION_FAILURE under error code INSUFFICIENT_RESOURCE. #24773

  • Add a native function namespace manager. #23358

  • Add support for ORC reader. #23037

  • Add node pool type specification when reporting to the coordinator from a C++ worker. #24569

  • Improve task.max-drivers-per-task by setting its default value to the thread concurrency of the host. #24642
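
The new default for task.max-drivers-per-task can be read as "use the configured value if there is one, otherwise use the host's thread concurrency". The Python sketch below shows that idea only. The native worker is written in C++, the function name is hypothetical, and os.cpu_count() is used here as a stand-in for whatever concurrency query the worker actually performs.

```python
import os
from typing import Optional


def default_max_drivers_per_task(configured: Optional[int] = None) -> int:
    """Pick the drivers-per-task limit (hypothetical helper).

    An explicitly configured value wins; otherwise fall back to the
    number of logical CPUs reported by the host, or 1 if unknown.
    """
    if configured is not None:
        return configured
    return os.cpu_count() or 1
```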

Security Changes

  • Fix a security bug when check_access_control_for_utlized_columns is true for queries that use a WITH clause. Previously, permissions were sometimes not checked for certain columns used in the query. Permissions are now always checked for all columns used in the query. In some corner cases involving CTEs with the same name, more columns than are used may be checked, or the check may fall back to all columns referenced in the query. #24647

  • Remove reload4j dependency in response to WS-2022-0467. #24606

  • Replace deprecated dagre-d3 with dagre-d3-es in response to a high severity vulnerability WS-2022-0322. #24167

  • Upgrade libthrift to 0.14.1 in response to CVE-2020-13949. #24462

  • Upgrade netty dependencies to version 4.1.115.Final in response to CVE-2024-47535. #24586

  • Upgrade prismJs to 1.30.0 in response to CVE-2024-53382. #24765

  • Upgrade the errorprone dependency from version 2.28.0 to 2.36.0. #24475

  • Upgrade the io.grpc library from version 1.68.0 to 1.70.0 in response to CVE-2024-7254, CVE-2020-8908. #24475

  • Upgrade org.apache.logging.log4j:log4j-api from 2.17.1 to 2.24.3 in response to CVE-2024-47554. #24507

  • Upgrade org.apache.logging.log4j:log4j-core from 2.17.1 to 2.24.3 in response to CVE-2024-47554. #24507

  • Upgrade commons-text to 1.13.0 in response to CVE-2024-47554. #24467

  • Upgrade okhttp to 4.12.0 in response to CVE-2023-3635. #24473

  • Upgrade okio to 3.6.0 in response to CVE-2023-3635. #24473

  • Upgrade org.apache.calcite to 1.38.0 in response to CVE-2023-2976. #24706

  • Upgrade org.apache.ratis to 3.1.3 in response to CVE-2020-15250. #24496

  • Upgrade aws-java-sdk version to 1.12.782 in response to CVE-2024-21634. #24606

  • Upgrade json-smart version to 2.5.2 in response to CVE-2024-57699. #24631

  • Upgrade the accumulo version to 1.10.1 in response to CVE-2020-17533. #24438

  • Upgrade the hive-dwrf version to 0.8.7, which involved upgrading the snappy version to 0.5, in response to CVE-2024-36124. #24461

Elasticsearch Connector Changes

Hive Connector Changes

  • Fix Parquet read failing for nested Decimal types. #24440

  • Fix getting views for Hive metastore 2.3+. #24466

  • Add session property hive.stats_based_filter_reorder_disabled for disabling reader stats based filter reordering. #24630

  • Replace return type of beginDelete. #24528

  • Rename session property hive.stats_based_filter_reorder_disabled to hive.native_stats_based_filter_reorder_disabled. #24637

  • Update native HiveConnectorProtocol to supply NotImplemented for ConnectorDeleteTableHandle type. #24721

Iceberg Connector Changes

  • Fix IcebergTableHandle implementation to work with new types used in begin/finishDelete. #24528

  • Fix bug with missing statistics when the statistics file cache has a partial miss. #24480

  • Fix Iceberg date column filtering. #24583

  • Add read.split.target-size table property. #24417

  • Add target_split_size_bytes session property. #24417

  • Add a dedicated subclass of FileHiveMetastore for the Iceberg connector to capture and isolate the differences in behavior. #24573

  • Add connector configuration property iceberg.catalog.hadoop.warehouse.datadir for the Hadoop catalog to specify the root data write path for its newly created tables. #24397

  • Add logic to the Iceberg type converter for timestamp with time zone. #23534

  • Add manifest file caching for deployments which use the Hive metastore. #24481

  • Add support for the hive.affinity-scheduling-file-section-size configuration property and affinity_scheduling_file_section_size session property. #24598

  • Add support for renaming tables in the Iceberg connector when it is configured with the HIVE file catalog. #24312

  • Add table property write.data.path to specify independent data write paths for Iceberg tables. #24397

  • Add support for Iceberg table sort orders. Tables can be created with a list of sorted_by columns, which is used to order the files written to the table. #21977

  • Add support for UPDATE SQL statements. #24281

  • Deprecate some table property names in favor of property names from the Iceberg library. See Iceberg Connector. #24581

  • Improve Iceberg queries by enabling manifest file caching by default. #24481

  • Update native IcebergConnectorProtocol to supply NotImplemented for ConnectorDeleteTableHandle type. #24721
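
To give a feel for what a target split size such as read.split.target-size or target_split_size_bytes controls, the Python sketch below cuts a file of a given length into consecutive byte ranges of at most the target size. This is a simplified model with a hypothetical function name. Real Iceberg split planning takes more into account than the target size alone, so treat this as an illustration rather than the connector's algorithm.

```python
from typing import List, Tuple


def plan_splits(file_length: int, target_split_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering the file (hypothetical helper).

    Every split except possibly the last has exactly `target_split_size`
    bytes; the last split carries the remainder.
    """
    if target_split_size <= 0:
        raise ValueError("target_split_size must be positive")
    splits = []
    offset = 0
    while offset < file_length:
        length = min(target_split_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits
```

In this model, a smaller target size produces more splits and therefore more parallelism, at the cost of more scheduling overhead per query.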

Kudu Connector Changes

  • Replace return type of beginDelete. #24528

TPC-DS Connector Changes

  • Add configuration property tpcds.use-varchar-type to allow toggling of char columns to varchar columns. #24406

SPI Changes

  • Fix query failures by setting REMOTE_BUFFER_CLOSE_FAILED as a retriable error. #24808

  • Add ConnectorSession as an argument to PlanChecker.validate and PlanChecker.validateFragment. #24557

  • Add DeleteTableHandle support for the ConnectorTableHandles changes in Metadata. #24528

  • Add CoordinatorPlugin#getExpressionOptimizerFactories to customize expression evaluation in the Presto coordinator. #24144

  • Add a separate ConnectorDeleteTableHandle interface for ConnectorMetadata.beginDelete and ConnectorMetadata.finishDelete, replacing the previous usage of ConnectorTableHandle. #24528

  • Add IndexSourceNode to the SPI. #24678

  • Update beginDelete to return new types, and finishDelete to accept new types in ConnectorMetadata. #24528

Credits

Abe Varghese, Amit Dutta, Anant Aneja, Andrii Rosa, Arjun Gupta, Artem Selishchev, Bryan Cutler, Chandrashekhar Kumar Singh, Christian Zentgraf, Deepak Majeti, Denodo Research Labs, Dilli-Babu-Godari, Elbin Pallimalil, Eric Liu, Gary Helmling, Ge Gao, HeidiHan0000, Jalpreet Singh Nanda, Jialiang Tan, Jiaqi Zhang, Joe Giardino, Ke, Kevin Tang, Kevin Wilfong, Krishna Pai, Li Zhou, Mahadevuni Naveen Kumar, Mariam Almesfer, Matt Karrmann, Minhan Cao, Natasha Sehgal, Nicholas Ormrod, Nidhin Varghese, Nikhil Collooru, Nivin C S, Patrick Sullivan, Pradeep Vaka, Pramod Satya, Prashant Sharma, Pratik Joseph Dabre, Rebecca Schlussel, Reetika Agrawal, Richard Barnes, Sagar Sumit, Sayari Mukherjee, Sergey Pershin, Shahad, Shahim Sharafudeen, Shakyan Kushwaha, Shang Ma, Shelton Cai, Steve Burnett, Swapnil, Timothy Meehan, Xiao Du, Xiaoxuan Meng, Yihong Wang, Ying, Yuanda (Yenda) Li, Zac Blanco, Zac Wen, aditi-pandit, ajay-kharat, auden-woolfson, dnskr, inf, jay.narale, librian415, namya28, shenh062326, sumi, vhsu14, wangd, wypb