Materialize
Materialize the results of a query into a table
Description
Materialize is a lower level vreplication command that allows for generalized materialization of tables. The target tables can be copies, aggregations, or views. The target tables are kept in sync in near-realtime.
You can specify multiple tables to materialize using the create sub-command’s --table-settings
flag.
Be careful to avoid using the INSTANT ADD COLUMN
feature in MySQL 8.0+ with materialization source tables as this can cause the vreplication based materialization workflow to break.
The Basic Materialize Workflow Lifecycle
- Initiate the migration using
Materialize
Monitor the workflow using
show
orstatus
Materialize --target-keyspace <target-keyspace> show --workflow <workflow>
Materialize --target-keyspace <target-keyspace> status --workflow <workflow>
Start accessing your views once the workflow has started Replicating
Command
Please see the Materialize command reference for a full list of sub-commands and their flags.
Example
vtctldclient --server localhost:15999 Materialize --workflow product_sales --target-keyspace commerce create --source-keyspace commerce --table-settings '[{"target_table": "sales_by_sku", "create_ddl": "create table sales_by_sku (sku varbinary(128) not null primary key, orders bigint, revenue bigint)", "source_expression": "select sku, count(*) as orders, sum(price) as revenue from corder group by sku"}]' --cells zone1 --cells zone2 --tablet-types replica
Parameters
Action
Materialize is an “umbrella” command. The action or sub-command defines the operation on the workflow.
Options
Each action or sub-command has additional options/parameters that can be used to modify its behavior. Please see the command’s reference docs for the full list of command options or flags. Below we will add additional information for a subset of key options.
--cells
optional
default local cell
A comma-separated list of cell names or cell aliases. This list is used by VReplication to determine which cells should be used to pick a tablet for selecting data from the source keyspace.
Uses
- Improve performance by using picking a tablet in cells in network proximity with the target
- To reduce bandwidth costs by skipping cells that are in different availability zones
- Select cells where replica lags are lower
--tablet-types
optional
default --vreplication_tablet_type
parameter value for the tablet. --vreplication_tablet_type
has the default value of “in_order:REPLICA,PRIMARY”.
string
Source tablet types to replicate from (e.g. PRIMARY, REPLICA, RDONLY). The value specified impacts tablet selection for the workflow.
Uses
- To reduce the load on PRIMARY tablets by using REPLICAs or RDONLYs
- Reducing lag by pointing to PRIMARY
--table-settings
required
JSON
This is a JSON array where each value must contain two key/value pairs. The first required key is ‘target_table’ and it is the name of the table in the target-keyspace to store the results in. The second required key is ‘source_expression’ and its value is the select query to run against the source table. An optional key/value pair can also be specified for ‘create_ddl’ which provides the DDL to create the target table if it does not exist – you can alternatively specify a value of ‘copy’ if the target table schema should be copied as-is from the source keyspace. Here’s an example value for table-settings:
[
{
"target_table": "customer_one_email",
"source_expression": "select email from customer where customer_id = 1"
},
{
"target_table": "states",
"source_expression": "select * from states",
"create_ddl": "copy"
},
{
"target_table": "sales_by_sku",
"source_expression": "select sku, count(*) as orders, sum(price) as revenue from corder group by sku",
"create_ddl": "create table sales_by_sku (sku varbinary(128) not null primary key, orders bigint, revenue bigint)"
}
]
Notes
There are special commands to perform common materialization tasks and you should prefer them to using Materialize
directly.
- If you just want to copy tables to a different keyspace use MoveTables
- If you want to change sharding strategies use Reshard instead