CREATE-ASYNC-MATERIALIZED-VIEW

Name

CREATE ASYNC MATERIALIZED VIEW

Description

This statement is used to create an asynchronous materialized view.

syntax

  1. CREATE MATERIALIZED VIEW (IF NOT EXISTS)? mvName=multipartIdentifier
  2. (LEFT_PAREN cols=simpleColumnDefs RIGHT_PAREN)? buildMode?
  3. (REFRESH refreshMethod? refreshTrigger?)?
  4. (KEY keys=identifierList)?
  5. (COMMENT STRING_LITERAL)?
  6. (PARTITION BY LEFT_PAREN partitionKey = identifier RIGHT_PAREN)?
  7. (DISTRIBUTED BY (HASH hashKeys=identifierList | RANDOM) (BUCKETS (INTEGER_VALUE | AUTO))?)?
  8. propertyClause?
  9. AS query

illustrate

simpleColumnDefs

Used to define the materialized view column information, if not defined, it will be automatically derived

  1. simpleColumnDefs
  2. : cols+=simpleColumnDef (COMMA cols+=simpleColumnDef)*
  3. ;
  4. simpleColumnDef
  5. : colName=identifier (COMMENT comment=STRING_LITERAL)?
  6. ;

For example, define two columns aa and bb, where the annotation for aa is “name”

  1. CREATE MATERIALIZED VIEW mv1
  2. (aa comment "name",bb)
buildMode

Used to define whether the materialized view is refreshed immediately after creation, default to IMMEDIATE

IMMEDIATE:Refresh Now

DEFERRED:Delay refresh

  1. buildMode
  2. : BUILD (IMMEDIATE | DEFERRED)
  3. ;

For example, specifying the materialized view to refresh immediately

  1. CREATE MATERIALIZED VIEW mv1
  2. BUILD IMMEDIATE
refreshMethod

Used to define the refresh method for materialized views, default to AUTO

COMPLETE:Full refresh

AUTO:Try to refresh incrementally as much as possible. If incremental refresh is not possible, refresh in full

  1. refreshMethod
  2. : COMPLETE | AUTO
  3. ;

For example, specifying full refresh of materialized views

  1. CREATE MATERIALIZED VIEW mv1
  2. REFRESH COMPLETE
refreshTrigger

Trigger method for refreshing data in materialized views, default to MANUAL

MANUAL:Manual refresh

SCHEDULE:Timed refresh

  1. refreshTrigger
  2. : ON MANUAL
  3. | ON SCHEDULE refreshSchedule
  4. ;
  5. refreshSchedule
  6. : EVERY INTEGER_VALUE mvRefreshUnit (STARTS STRING_LITERAL)?
  7. ;
  8. mvRefreshUnit
  9. : MINUTE | HOUR | DAY | WEEK
  10. ;

For example: executed every 2 hours, starting from 21:07:09 on December 13, 2023

  1. CREATE MATERIALIZED VIEW mv1
  2. REFRESH ON SCHEDULE EVERY 2 HOUR STARTS "2023-12-13 21:07:09"
key

The materialized view is the DUPLICATE KEY model, therefore the specified columns are arranged in sequence

  1. identifierList
  2. : LEFT_PAREN identifierSeq RIGHT_PAREN
  3. ;
  4. identifierSeq
  5. : ident+=errorCapturingIdentifier (COMMA ident+=errorCapturingIdentifier)*
  6. ;

For example, specifying k1 and k2 as sorting sequences

  1. CREATE MATERIALIZED VIEW mv1
  2. KEY(k1,k2)
partition

There are two types of partitioning methods for materialized views. If no partitioning is specified, there will be a default single partition. If a partitioning field is specified, the system will automatically deduce the source base table of that field and synchronize all partitions of the base table (currently supporting OlapTable and hive). (Limitation: If the base table is an OlapTable, it can only have one partition field)

For example, if the base table is a range partition with a partition field of create_time and partitioning by day, and partition by(ct) as select create_time as ct from t1 is specified when creating a materialized view, then the materialized view will also be a range partition with a partition field of ‘ct’ and partitioning by day

The selection of partition fields and the definition of materialized views must meet the following constraints to be successfully created; otherwise, an error “Unable to find a suitable base table for partitioning” will occur:

  • At least one of the base tables used by the materialized view must be a partitioned table.
  • Partitioned tables used by the materialized view must employ list or range partitioning strategies.
  • The top-level partition column in the materialized view can only have one partition field.
  • The SQL of the materialized view needs to use partition columns from the base table.
  • If GROUP BY is used, the partition column fields must be after the GROUP BY.
  • If window functions are used, the partition column fields must be after the PARTITION BY.
  • Data changes should occur on partitioned tables. If they occur on non-partitioned tables, the materialized view needs to be fully rebuilt.
  • Using the fields that generate nulls in the JOIN as partition fields in the materialized view prohibits partition incremental updates.

property

The materialized view can specify both the properties of the table and the properties unique to the materialized view.

The properties unique to materialized views include:

grace_period: When performing query rewrites, there is a maximum allowed delay time (measured in seconds) for the data of the materialized view. If there is a discrepancy between the data of partition A and the base table, and the last refresh time of partition A of the materialized view was 1, while the current system time is 2, then this partition will not undergo transparent rewriting. However, if the grace_period is greater than or equal to 1, this partition will be used for transparent rewriting.

excluded_trigger_tables: Table names ignored during data refresh, separated by commas. For example, table1, table2

refresh_partition_num: The number of partitions refreshed by a single insert statement is set to 1 by default. When refreshing a materialized view, the system first calculates the list of partitions to be refreshed and then splits it into multiple insert statements that are executed in sequence according to this configuration. If any insert statement fails, the entire task will stop executing. The materialized view ensures the transactionality of individual insert statements, meaning that failed insert statements will not affect partitions that have already been successfully refreshed.

workload_group: The name of the workload_group used by the materialized view when performing refresh tasks. This is used to limit the resources used for refreshing data in the materialized view, in order to avoid affecting the operation of other business processes. For details on how to create and use workload_group, refer to WORKLOAD-GROUP

query

Create a query statement for the materialized view, and the result is the data in the materialized view

Random functions are not supported, for example:

  1. SELECT random() as dd,k3 FROM user

Example

  1. Create a materialized view mv1 that refreshes immediately and then once a week, with the data source being the hive catalog

    1. CREATE MATERIALIZED VIEW mv1 BUILD IMMEDIATE REFRESH COMPLETE ON SCHEDULE EVERY 1 WEEK
    2. DISTRIBUTED BY RANDOM BUCKETS 2
    3. PROPERTIES (
    4. "replication_num" = "1"
    5. )
    6. AS SELECT * FROM hive_catalog.db1.user;
  2. Create a materialized view with multiple table joins

    1. CREATE MATERIALIZED VIEW mv1 BUILD IMMEDIATE REFRESH COMPLETE ON SCHEDULE EVERY 1 WEEK
    2. DISTRIBUTED BY RANDOM BUCKETS 2
    3. PROPERTIES (
    4. "replication_num" = "1"
    5. )
    6. AS select user.k1,user.k3,com.k4 from user join com on user.k1=com.k1;

Keywords

  1. CREATE, ASYNC, MATERIALIZED, VIEW