CreateLookupVindex
This guide follows on from the Get Started guides. Please make sure that you have an Operator or local installation ready. Make sure you are at the point where you have the sharded keyspace called customer
setup.
CreateLookupVindex is a VReplication workflow used to create and backfill a lookup Vindex automatically for a table that already exists, and may have a significant amount of data in it already.
Internally, the CreateLookupVindex process uses VReplication for the backfill process, until the lookup Vindex is “in sync”. Then the normal process for adding/deleting/updating rows in the lookup Vindex via the usual transactional flow when updating the “owner” table for the Vindex takes over.
In this guide, we will walk through the process of using the CreateLookupVindex workflow, and give some insight into what happens underneath the covers.
The CreateLookupVindex
vtctl
client command has the following syntax:
CreateLookupVindex -- [--cells=<source_cells>] [--continue_after_copy_with_owner=false] [--tablet_types=<source_tablet_types>] <keyspace> <json_spec>
<json_spec>
: Use the lookup Vindex specified in<json_spec>
along with VReplication to populate/backfill the lookup Vindex from the source table.<keyspace>
: The Vitess keyspace we are creating the lookup Vindex in. The source table is expected to also be in this keyspace.--tablet-types
: Provided to specify the tablet types (e.g.PRIMARY
,REPLICA
,RDONLY
) that are acceptable as source tablets for the VReplication stream(s) that this command will create. If not specified, the tablet type used will default to the value of the vttablet —vreplication_tablet_type flag value, which defaults toin_order:REPLICA,PRIMARY
.--cells
: By default VReplication streams, such as used byCreateLookupVindex
, will not cross cell boundaries. If you want the VReplication streams to source their data from tablets in cells other than the local cell, you can use the--cells
option to specify a comma-separated list of cells (see VReplication tablet selection).--continue_after_copy_with_owner
: By default, when an owner is provided in the<json_spec>
, the VReplication streams will stop after the backfill completes. Specify this flag if you don’t want this to happen. This is useful if, for example, the owner table is being migrated from an unsharded keyspace to a sharded keyspace using MoveTables.
The <json_spec>
describes the lookup Vindex to be created, and details about the table it is to be created against (on which column, etc.). However, you do not have to specify details about the actual lookup table, Vitess will create that automatically based on the type of the column you are creating the Vindex column on, etc.
In the context of the customer
database that is part of the Vitess examples we started earlier, let’s add some rows into the customer.corder
table, and then look at an example <json_spec>
:
$ mysql -P 15306 -h 127.0.0.1 -u root --binary-as-hex=false -A
Welcome to the MySQL monitor. Commands end with ; or \g.
...
mysql> use customer;
Database changed
mysql> show tables;
+-----------------------+
| Tables_in_vt_customer |
+-----------------------+
| corder |
| customer |
+-----------------------+
2 rows in set (0.00 sec)
mysql> desc corder;
+-------------+----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------------+------+-----+---------+-------+
| order_id | bigint | NO | PRI | NULL | |
| customer_id | bigint | YES | | NULL | |
| sku | varbinary(128) | YES | | NULL | |
| price | bigint | YES | | NULL | |
+-------------+----------------+------+-----+---------+-------+
4 rows in set (0.01 sec)
mysql> insert into corder (order_id, customer_id, sku, price) values (1, 1, "Product_1", 100);
Query OK, 1 row affected (0.01 sec)
mysql> insert into corder (order_id, customer_id, sku, price) values (2, 1, "Product_2", 101);
Query OK, 1 row affected (0.01 sec)
mysql> insert into corder (order_id, customer_id, sku, price) values (3, 2, "Product_3", 102);
Query OK, 1 row affected (0.01 sec)
mysql> insert into corder (order_id, customer_id, sku, price) values (4, 3, "Product_4", 103);
Query OK, 1 row affected (0.01 sec)
mysql> insert into corder (order_id, customer_id, sku, price) values (5, 4, "Product_5", 104);
Query OK, 1 row affected (0.03 sec)
mysql> select * from corder;
+----------+-------------+-----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+-----------+-------+
| 1 | 1 | Product_1 | 100 |
| 2 | 1 | Product_2 | 101 |
| 3 | 2 | Product_3 | 102 |
| 4 | 3 | Product_4 | 103 |
| 5 | 4 | Product_5 | 104 |
+----------+-------------+-----------+-------+
5 rows in set (0.01 sec)
If we look at the VSchema for the customer.corder
table, we will see there is a hash
index on the customer_id
column:
$ vtctldclient GetVSchema customer
{
"sharded": true,
"vindexes": {
"hash": {
"type": "hash",
"params": {},
"owner": ""
}
},
"tables": {
"corder": {
"type": "",
"column_vindexes": [
{
"column": "customer_id",
"name": "hash",
"columns": []
}
],
"auto_increment": {
"column": "order_id",
"sequence": "order_seq"
},
"columns": [],
"pinned": "",
"column_list_authoritative": false,
"source": ""
},
"customer": {
"type": "",
"column_vindexes": [
{
"column": "customer_id",
"name": "hash",
"columns": []
}
],
"auto_increment": {
"column": "customer_id",
"sequence": "customer_seq"
},
"columns": [],
"pinned": "",
"column_list_authoritative": false,
"source": ""
}
},
"require_explicit_routing": false
}
We can now see that 4 of our 5 rows have ended up on the -80
shard with the 5th row on the 80-
shard:
mysql> use customer/-80
Database changed
mysql> select * from corder;
+----------+-------------+-----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+-----------+-------+
| 1 | 1 | Product_1 | 100 |
| 2 | 1 | Product_2 | 101 |
| 3 | 2 | Product_3 | 102 |
| 4 | 3 | Product_4 | 103 |
+----------+-------------+-----------+-------+
4 rows in set (0.00 sec)
mysql> use customer/80-
Database changed
mysql> select * from corder;
+----------+-------------+-----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+-----------+-------+
| 5 | 4 | Product_5 | 104 |
+----------+-------------+-----------+-------+
1 row in set (0.01 sec)
Note that this skewed distribution is completely coincidental — for larger numbers of rows we would expect the distribution to be approximately even for a hash
index.
Now let’s say we want to add a lookup Vindex on the sku
column. We can use a consistent_lookup or consistent_lookup_unique Vindex type. In our example we will use consistent_lookup_unique
.
Here is our example <json_spec>
:
$ cat lookup_vindex.json
{
"sharded": true,
"vindexes": {
"corder_lookup": {
"type": "consistent_lookup_unique",
"params": {
"table": "customer.corder_lookup",
"from": "sku",
"to": "keyspace_id"
},
"owner": "corder"
}
},
"tables": {
"corder": {
"column_vindexes": [
{
"column": "sku",
"name": "corder_lookup"
}
]
}
}
}
Note that as mentioned above, we do not have to tell Vitess about how to shard the actual backing table for the lookup Vindex or any schema to create as it will do it automatically. Now, let us actually execute the CreateLookupVindex
command:
$ vtctlclient --server localhost:15999 CreateLookupVindex -- --tablet_types=RDONLY customer "$(cat lookup_vindex.json)"
Note:
- We are specifying a tablet_type of
RDONLY
; meaning it is going to run the VReplication streams from tablets of theRDONLY
type only. If tablets of this type cannot be found, in a shard, the lookup Vindex population will fail.
Now, in our case, the table is tiny, so the copy will be instant, but in a real-world case this might take hours. To monitor the process, we can use the usual VReplication commands. However, the VReplication status commands needs to operate on individual tablets. Let’s check which tablets we have in our environment, so we know which tablets to issue commands against:
$ vtctldclient --server localhost:15999 GetTablets --keyspace customer
zone1-0000000300 customer -80 primary localhost:15300 localhost:17300 [] 2020-08-13T01:23:15Z
zone1-0000000301 customer -80 replica localhost:15301 localhost:17301 [] <null>
zone1-0000000302 customer -80 rdonly localhost:15302 localhost:17302 [] <null>
zone1-0000000400 customer 80- primary localhost:15400 localhost:17400 [] 2020-08-13T01:23:15Z
zone1-0000000401 customer 80- replica localhost:15401 localhost:17401 [] <null>
zone1-0000000402 customer 80- rdonly localhost:15402 localhost:17402 [] <null>
Now we can look what happened in greater detail:
- VReplication streams were setup from the primary tablets
zone1-0000000300
andzone1-0000000400
; pulling data from theRDONLY
source tabletszone1-0000000302
andzone1-0000000402
. - Note that each primary tablet will start streams from each source tablet, for a total of 4 streams in this case.
Lets observe the VReplication streams that got created using the vtctlclient Workflow show
command.
The created vreplication workflow will have a generated name of <target_table_name>_vdx
. So in our example here: corder_lookup_vdx
.
$ vtctlclient --server localhost:15999 Workflow customer.corder_lookup_vdx show
{
"Workflow": "corder_lookup_vdx",
"SourceLocation": {
"Keyspace": "customer",
"Shards": [
"-80",
"80-"
]
},
"TargetLocation": {
"Keyspace": "customer",
"Shards": [
"-80",
"80-"
]
},
"MaxVReplicationLag": 78,
"MaxVReplicationTransactionLag": 1674479901,
"Frozen": false,
"ShardStatuses": {
"-80/zone1-0000000300": {
"PrimaryReplicationStatuses": [
{
"Shard": "-80",
"Tablet": "zone1-0000000300",
"ID": 1,
"Bls": {
"keyspace": "customer",
"shard": "-80",
"filter": {
"rules": [
{
"match": "corder_lookup",
"filter": "select sku as sku, keyspace_id() as keyspace_id from corder where in_keyrange(sku, 'customer.binary_md5', '-80') group by sku, keyspace_id"
}
]
},
"stop_after_copy": true
},
"Pos": "cb8ae288-9b1f-11ed-84ff-04ed332e05c2:1-117",
"StopPos": "",
"State": "Stopped",
"DBName": "vt_customer",
"TransactionTimestamp": 0,
"TimeUpdated": 1674479823,
"TimeHeartbeat": 0,
"TimeThrottled": 0,
"ComponentThrottled": "",
"Message": "Stopped after copy.",
"Tags": "",
"WorkflowType": "CreateLookupIndex",
"WorkflowSubType": "None",
"CopyState": null
},
{
"Shard": "-80",
"Tablet": "zone1-0000000300",
"ID": 2,
"Bls": {
"keyspace": "customer",
"shard": "80-",
"filter": {
"rules": [
{
"match": "corder_lookup",
"filter": "select sku as sku, keyspace_id() as keyspace_id from corder where in_keyrange(sku, 'customer.binary_md5', '-80') group by sku, keyspace_id"
}
]
},
"stop_after_copy": true
},
"Pos": "de051c70-9b1f-11ed-832d-04ed332e05c2:1-121",
"StopPos": "",
"State": "Stopped",
"DBName": "vt_customer",
"TransactionTimestamp": 0,
"TimeUpdated": 1674479823,
"TimeHeartbeat": 0,
"TimeThrottled": 0,
"ComponentThrottled": "",
"Message": "Stopped after copy.",
"Tags": "",
"WorkflowType": "CreateLookupIndex",
"WorkflowSubType": "None",
"CopyState": null
}
],
"TabletControls": null,
"PrimaryIsServing": true
},
"80-/zone1-0000000401": {
"PrimaryReplicationStatuses": [
{
"Shard": "80-",
"Tablet": "zone1-0000000401",
"ID": 1,
"Bls": {
"keyspace": "customer",
"shard": "-80",
"filter": {
"rules": [
{
"match": "corder_lookup",
"filter": "select sku as sku, keyspace_id() as keyspace_id from corder where in_keyrange(sku, 'customer.binary_md5', '80-') group by sku, keyspace_id"
}
]
},
"stop_after_copy": true
},
"Pos": "cb8ae288-9b1f-11ed-84ff-04ed332e05c2:1-117",
"StopPos": "",
"State": "Stopped",
"DBName": "vt_customer",
"TransactionTimestamp": 0,
"TimeUpdated": 1674479823,
"TimeHeartbeat": 0,
"TimeThrottled": 0,
"ComponentThrottled": "",
"Message": "Stopped after copy.",
"Tags": "",
"WorkflowType": "CreateLookupIndex",
"WorkflowSubType": "None",
"CopyState": null
},
{
"Shard": "80-",
"Tablet": "zone1-0000000401",
"ID": 2,
"Bls": {
"keyspace": "customer",
"shard": "80-",
"filter": {
"rules": [
{
"match": "corder_lookup",
"filter": "select sku as sku, keyspace_id() as keyspace_id from corder where in_keyrange(sku, 'customer.binary_md5', '80-') group by sku, keyspace_id"
}
]
},
"stop_after_copy": true
},
"Pos": "de051c70-9b1f-11ed-832d-04ed332e05c2:1-123",
"StopPos": "",
"State": "Stopped",
"DBName": "vt_customer",
"TransactionTimestamp": 0,
"TimeUpdated": 1674479823,
"TimeHeartbeat": 0,
"TimeThrottled": 0,
"ComponentThrottled": "",
"Message": "Stopped after copy.",
"Tags": "",
"WorkflowType": "CreateLookupIndex",
"WorkflowSubType": "None",
"CopyState": null
}
],
"TabletControls": null,
"PrimaryIsServing": true
}
},
"SourceTimeZone": "",
"TargetTimeZone": ""
}
There is a lot going on in this output, but the most important parts are the state
and message
fields which say Stopped
and Stopped after copy.
for all four of the streams. This means that the VReplication streams finished their copying/backfill of the lookup table.
Note that if the tables were large and the copy was still in progress, the state
field would say Copying
— you can see the state/progress as part of Workflow show
json output.
We can verify the result of the backfill by looking at the customer
keyspace again in the MySQL client:
mysql> show tables;
+-----------------------+
| Tables_in_vt_customer |
+-----------------------+
| corder |
| corder_lookup |
| customer |
+-----------------------+
3 rows in set (0.01 sec)
Note there is now a new table, corder_lookup
; which was created as the backing table for the lookup Vindex. Lets look at this table:
mysql> desc corder_lookup;
+-------------+----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------------+------+-----+---------+-------+
| sku | varbinary(128) | NO | PRI | NULL | |
| keyspace_id | varbinary(128) | YES | | NULL | |
+-------------+----------------+------+-----+---------+-------+
2 rows in set (0.01 sec)
mysql> select sku, hex(keyspace_id) from corder_lookup;
+-----------+------------------+
| sku | hex(keyspace_id) |
+-----------+------------------+
| Product_2 | 166B40B44ABA4BD6 |
| Product_3 | 06E7EA22CE92708F |
| Product_1 | 166B40B44ABA4BD6 |
| Product_4 | 4EB190C9A2FA169C |
| Product_5 | D2FD8867D50D2DFE |
+-----------+------------------+
Basically, this shows exactly what we expected. Now, we have to clean-up the artifacts of the backfill. The ExternalizeVindex
command will delete the VReplication streams and also clear the write_only
flag from the Vindex indicating that it is not backfilling anymore.
$ vtctlclient --server localhost:15999 ExternalizeVindex customer.corder_lookup
Next, to confirm the lookup Vindex is doing what we think it should, we can use the vexplain plan SQL statement:
mysql> vexplain plan select * from corder where customer_id = 1;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| JSON |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
"OperatorType": "Route",
"Variant": "EqualUnique",
"Keyspace": {
"Name": "customer",
"Sharded": true
},
"FieldQuery": "select order_id, customer_id, sku, price from corder where 1 != 1",
"Query": "select order_id, customer_id, sku, price from corder where customer_id = 1",
"Table": "corder",
"Values": [
"INT64(1)"
],
"Vindex": "hash"
} |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Since the above select
statement is doing a lookup using the primary Vindex on the corder
table, this query does not Scatter (variant is SelectEqualUnique
), as expected. Let’s try a scatter query to see what that looks like:
mysql> vexplain select * from corder;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| JSON |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
"OperatorType": "Route",
"Variant": "Scatter",
"Keyspace": {
"Name": "customer",
"Sharded": true
},
"FieldQuery": "select order_id, customer_id, sku, price from corder where 1 != 1",
"Query": "select order_id, customer_id, sku, price from corder",
"Table": "corder"
} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
So now we see the expectied variant of SelectScatter
for a scatter query. Let’s try a lookup on a column that does not have a primary or secondary (lookup) Vindex, e.g. the price
column:
mysql> vexplain select * from corder where price = 103\G
*************************** 1. row ***************************
JSON: {
"OperatorType": "Route",
"Variant": "Scatter",
"Keyspace": {
"Name": "customer",
"Sharded": true
},
"FieldQuery": "select order_id, customer_id, sku, price from corder where 1 != 1",
"Query": "select order_id, customer_id, sku, price from corder where price = 103",
"Table": "corder"
}
1 row in set (0.00 sec)
That also scatters, as expected, because there’s no Vindex on the column.
Now, let’s try a lookup on the sku
column, which we have created our lookup Vindex on:
mysql> vexplain select * from corder where sku = "Product_1";
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| JSON |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
"OperatorType": "VindexLookup",
"Variant": "EqualUnique",
"Keyspace": {
"Name": "customer",
"Sharded": true
},
"Values": [
"VARCHAR(\"Product_1\")"
],
"Vindex": "corder_lookup",
"Inputs": [
{
"OperatorType": "Route",
"Variant": "IN",
"Keyspace": {
"Name": "customer",
"Sharded": true
},
"FieldQuery": "select sku, keyspace_id from corder_lookup where 1 != 1",
"Query": "select sku, keyspace_id from corder_lookup where sku in ::__vals",
"Table": "corder_lookup",
"Values": [
":sku"
],
"Vindex": "binary_md5"
},
{
"OperatorType": "Route",
"Variant": "ByDestination",
"Keyspace": {
"Name": "customer",
"Sharded": true
},
"FieldQuery": "select order_id, customer_id, sku, price from corder where 1 != 1",
"Query": "select order_id, customer_id, sku, price from corder where sku = 'Product_1'",
"Table": "corder"
}
]
} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
As expected, we can see it is not scattering anymore, which it would have before we executed the CreateLookupVindex
command.
Lastly, let’s ensure that the lookup Vindex is being updated appropriately when we insert and delete rows:
mysql> select * from corder;
+----------+-------------+-----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+-----------+-------+
| 5 | 4 | Product_5 | 104 |
| 1 | 1 | Product_1 | 100 |
| 2 | 1 | Product_2 | 101 |
| 3 | 2 | Product_3 | 102 |
| 4 | 3 | Product_4 | 103 |
+----------+-------------+-----------+-------+
5 rows in set (0.00 sec)
mysql> delete from corder where customer_id = 1 and sku = "Product_1";
Query OK, 1 row affected (0.03 sec)
mysql> select * from corder;
+----------+-------------+-----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+-----------+-------+
| 2 | 1 | Product_2 | 101 |
| 3 | 2 | Product_3 | 102 |
| 4 | 3 | Product_4 | 103 |
| 5 | 4 | Product_5 | 104 |
+----------+-------------+-----------+-------+
4 rows in set (0.01 sec)
mysql> select sku, hex(keyspace_id) from corder_lookup;
+-----------+------------------+
| sku | hex(keyspace_id) |
+-----------+------------------+
| Product_4 | 4EB190C9A2FA169C |
| Product_5 | D2FD8867D50D2DFE |
| Product_2 | 166B40B44ABA4BD6 |
| Product_3 | 06E7EA22CE92708F |
+-----------+------------------+
4 rows in set (0.01 sec)
We deleted a row from the corder
table, and the matching lookup Vindex row is gone. Now we can try adding a row:
mysql> insert into corder (order_id, customer_id, sku, price) values (6, 1, "Product_6", 105);
Query OK, 1 row affected (0.02 sec)
mysql> select * from corder;
+----------+-------------+-----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+-----------+-------+
| 2 | 1 | Product_2 | 101 |
| 3 | 2 | Product_3 | 102 |
| 4 | 3 | Product_4 | 103 |
| 6 | 1 | Product_6 | 105 |
| 5 | 4 | Product_5 | 104 |
+----------+-------------+-----------+-------+
5 rows in set (0.00 sec)
mysql> select sku, hex(keyspace_id) from corder_lookup;
+-----------+------------------+
| sku | hex(keyspace_id) |
+-----------+------------------+
| Product_4 | 4EB190C9A2FA169C |
| Product_5 | D2FD8867D50D2DFE |
| Product_6 | 166B40B44ABA4BD6 |
| Product_2 | 166B40B44ABA4BD6 |
| Product_3 | 06E7EA22CE92708F |
+-----------+------------------+
5 rows in set (0.00 sec)
We added a new row to the corder
table, and now we have a new row in the lookup table!