Materialized Views¶
Introduction¶
A materialized view stores the results of a query physically, unlike regular views which are virtual. Queries can read pre-computed results instead of re-executing the underlying query against base tables.
Materialized views improve performance for expensive queries by calculating results once during refresh rather than on every query. Common use cases include aggregations, joins, and filtered subsets of large tables.
Warning
Materialized views are experimental. The SPI and behavior may change in future releases. Use at your own risk in production environments.
To enable materialized views, set experimental.legacy-materialized-views = false
in your configuration properties.
Alternatively, you can use SET SESSION legacy_materialized_views = false to enable them for a session,
but only if experimental.allow-legacy-materialized-views-toggle = true
is set in the server configuration. The toggle option should only be used in non-production environments
for testing and migration purposes.
Security Modes¶
When legacy_materialized_views=false, materialized views support SQL standard security modes
that control whose permissions are used when querying the view:
- SECURITY DEFINER
The materialized view executes with the permissions of the user who created it. This is the default mode and matches the behavior of most SQL systems. When using DEFINER mode, column masks and row filters on base tables are permitted.
- SECURITY INVOKER
The materialized view executes with the permissions of the user querying it. Each user must have appropriate permissions on the underlying base tables. When using INVOKER mode and there are column masks or row filters on the base tables, the materialized view is treated as stale, since the data would vary by user.
The default security mode can be configured using the default_view_security_mode session
property. When the SECURITY clause is not specified in CREATE MATERIALIZED VIEW, this
default is used.
Note
The REFRESH operation always uses DEFINER rights regardless of the view’s security mode.
Stale Data Handling¶
Connectors report the freshness state of materialized views to the engine. When a materialized view is stale (base tables have been modified since the data was last known to be fresh), the engine determines how to handle the query based on configuration.
Connectors can configure staleness handling per materialized view, including a behavior setting and staleness tolerance window. See connector-specific documentation for details (for example, Iceberg).
When no per-view configuration is specified, the default behavior is USE_VIEW_QUERY (Presto
falls back to executing the underlying view query against the base tables). This can be changed
using the materialized_view_stale_read_behavior session property or the
materialized-view-stale-read-behavior configuration property. Setting it to FAIL causes
the query to fail with an error when the materialized view is stale.
Required Permissions¶
The following permissions are required for materialized view operations when
legacy_materialized_views=false:
- CREATE MATERIALIZED VIEW
CREATE TABLEpermissionCREATE VIEWpermission
- REFRESH MATERIALIZED VIEW
INSERTpermission (to write new data)DELETEpermission (to remove old data)
- DROP MATERIALIZED VIEW
DROP TABLEpermissionDROP VIEWpermission
- Querying a materialized view
For DEFINER mode: User needs
SELECTpermission on the view itself. Additionally, the view owner must haveCREATE_VIEW_WITH_SELECT_COLUMNSpermission on base tables when non-owners query the view to prevent privilege escalation.For INVOKER mode: User needs
SELECTpermission on all underlying base tables
Data Consistency Modes¶
Materialized views support three data consistency modes that control how queries are optimized when the view’s data may be stale:
- USE_STITCHING (default)
Reads fresh data from storage, recomputes stale data from base tables, and combines results via UNION.
- FAIL
Fails the query if the materialized view is stale.
- USE_VIEW_QUERY
Executes the view query against base tables. Always fresh but highest cost.
Set via session property:
SET SESSION materialized_view_skip_storage = 'USE_STITCHING';
Predicate Stitching (USE_STITCHING Mode)¶
Overview¶
Predicate stitching recomputes only stale data rather than the entire view. When base tables change, Presto identifies which data is affected and generates a UNION query that combines:
Storage scan: Reads unchanged (fresh) data from the materialized view’s storage
Recompute branch: Recomputes changed (stale) data from base tables using the view’s defining query
This avoids full recomputation when only a subset of data is stale, though there is overhead from the UNION operation and predicate-based filtering.
How It Works¶
Staleness Detection
For each base table referenced in the materialized view, a connector may track which data has changed since the last refresh and return predicates identifying the stale data. The specific mechanism depends on the connector:
At refresh time, metadata is recorded (implementation varies by connector)
When the view is queried, the current state is compared with the recorded state
Predicates are built that identify exactly which data is stale
See the connector-specific documentation for details on how staleness is tracked. For Iceberg tables, see Materialized Views.
Query Rewriting
When a query uses a materialized view with stale data, the optimizer rewrites the query to use UNION:
-- Original query
SELECT * FROM my_materialized_view WHERE order_date >= '2024-01-01'
-- Rewritten with predicate stitching (example using partition predicates)
SELECT * FROM (
-- Fresh partitions from storage
SELECT * FROM my_materialized_view_storage
WHERE order_date >= '2024-01-01'
AND order_date NOT IN ('2024-01-15', '2024-01-16') -- Exclude stale
UNION ALL
-- Stale partitions recomputed
SELECT o.order_id, c.customer_name, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
AND o.order_date = c.reg_date
WHERE o.order_date IN ('2024-01-15', '2024-01-16') -- Stale partition filter
AND c.reg_date IN ('2024-01-15', '2024-01-16') -- Propagated via equivalence
AND o.order_date >= '2024-01-01' -- Original filter preserved
)
The partition predicate is propagated to equivalent columns in joined tables (in this case,
c.reg_date), allowing partition pruning on the customers table as well.
Requirements¶
For predicate stitching to work effectively, the following requirements must be met:
Predicate Mapping Requirement
The connector must be able to express staleness as predicates that can be mapped to the materialized view’s columns. The specific requirements depend on the connector implementation. For partition-based connectors (like Iceberg), this typically means:
Base table partition columns must appear in the SELECT list or be equivalent to columns that do
The materialized view should be partitioned on the same or equivalent columns
Partition columns must use compatible data types
See connector-specific documentation for details on staleness tracking requirements.
Unsupported Query Patterns
Predicate stitching does not work with:
Outer joins: LEFT, RIGHT, and FULL OUTER joins
Non-deterministic functions:
RANDOM(),NOW(),UUID(), etc.
Security Constraints
For SECURITY INVOKER materialized views, predicate stitching requires that:
No column masks are defined on base tables (or the view is treated as fully stale)
No row filters are defined on base tables (or the view is treated as fully stale)
This is because column masks and row filters can vary by user, making it impossible to determine staleness in a user-independent way.
Column Equivalences and Passthrough Columns¶
Predicate stitching supports passthrough columns through column equivalences, which allows tracking staleness even when predicate columns from base tables are not directly in the materialized view’s output.
Column Equivalence
When tables are joined with equality predicates, those columns become equivalent for predicate propagation purposes. This applies to any type of staleness predicate (partition-based, snapshot-based, etc.). For example with partition predicates:
CREATE TABLE orders (order_id BIGINT, customer_id BIGINT, order_date VARCHAR)
WITH (partitioning = ARRAY['order_date']);
CREATE TABLE customers (customer_id BIGINT, name VARCHAR, reg_date VARCHAR)
WITH (partitioning = ARRAY['reg_date']);
-- Materialized view with equivalence: order_date = reg_date
CREATE MATERIALIZED VIEW order_summary
WITH (partitioning = ARRAY['order_date'])
AS
SELECT o.order_id, c.name, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
AND o.order_date = c.reg_date; -- Creates equivalence
In this example:
orders.order_dateandcustomers.reg_dateare equivalent due to the equality join conditionEven though
reg_dateis not in the SELECT list, staleness can be tracked through the equivalence toorder_dateWhen
customerstable changes in partitionreg_date='2024-01-15', this maps toorder_date='2024-01-15'for recomputation
How Passthrough Mapping Works
Equivalence Extraction: During materialized view creation, Presto analyzes JOIN conditions to identify column equivalences
Staleness Detection: When a base table changes:
The connector detects which data changed in the base table and returns predicates
For passthrough columns, predicates are mapped through equivalences
Example:
customers.reg_date='2024-01-15'→orders.order_date='2024-01-15'
Predicate Application: The mapped predicates are used in:
Storage scan: Exclude data where equivalent columns match stale values
Recompute branch: Filter the stale table using the staleness predicate
Joined tables: Propagate the predicate to equivalent columns in joined tables, enabling pruning on those tables as well
Requirements for Passthrough Columns
Join must be an INNER JOIN (not LEFT, RIGHT, or FULL OUTER)
Equality must be direct (
col1 = col2), not through expressions likecol1 = col2 + 1At least one column in the equivalence class must be in the materialized view’s output
Data types must be compatible
Transitive Equivalences
Multiple equivalences can be chained together. If A.x = B.y and B.y = C.z, then
A.x, B.y, and C.z are all equivalent for predicate propagation.
Unsupported Patterns¶
Predicate stitching is not applied in the following cases:
No staleness predicates available: If the connector cannot provide staleness predicates
Predicate columns not preserved: If predicate columns are transformed or not mappable to the materialized view’s output
Outer joins with passthrough: LEFT, RIGHT, and FULL OUTER joins invalidate passthrough equivalences due to null handling
Expression-based equivalences:
CAST(col1 AS DATE) = col2orcol1 = col2 + 1
When predicate stitching cannot be applied, the behavior falls back to the configured consistency mode:
If
USE_STITCHINGis set but stitching is not possible, the query falls back to full recompute (equivalent toUSE_VIEW_QUERY)A warning may be logged indicating why stitching was not possible
Performance Considerations¶
When Stitching is Most Effective
Large materialized views: More benefit from avoiding full recomputation
Localized changes: When only a small fraction of data is stale
Frequently refreshed: When most data remains fresh between queries
Well-structured data: When staleness predicates align with data modification patterns
Cost Trade-offs
Predicate stitching introduces a UNION operation, which has overhead:
Storage scan overhead: Reading from storage + filtering fresh data
Recompute overhead: Querying base tables + filtering stale data
Union overhead: Combining results from both branches
However, this is typically much cheaper than:
Full recompute: Reading all base table data
Stale data: Returning incorrect results
Optimization Tips
Predicate granularity: For partition-based connectors, choose partition columns that align with data modification patterns
Too coarse (e.g., partitioning by year): Recomputes too much data
Too fine (e.g., partitioning by second): Too many partitions to manage
Refresh frequency: Balance freshness needs with refresh costs
More frequent refreshes: Less recomputation per query, but higher refresh costs
Less frequent refreshes: More recomputation per query, but lower refresh costs
Query filters: Include predicate columns in query filters when possible:
-- Good: Limits scan to relevant data SELECT * FROM mv WHERE order_date >= '2024-01-01' -- Less optimal: Scans all data SELECT * FROM mv WHERE customer_id = 12345Monitor metrics: Track the ratio of storage scan vs recompute:
High recompute ratio: Consider more frequent refreshes or better staleness granularity
High storage scan ratio: Stitching is working efficiently
See Also¶
CREATE MATERIALIZED VIEW, DROP MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, SHOW CREATE MATERIALIZED VIEW