Altering tables

Altering tables

Note

ALTER COLUMN and DROP COLUMN actions are not currently supported. See SQL compatibility.

Updating parameters

The parameters of a table can be modified using the ALTER TABLE clause:

cr> alter table my_table set (number_of_replicas = '0-all');
ALTER OK, -1 rows affected (... sec)

In order to set a parameter to its default value use reset:

cr> alter table my_table reset (number_of_replicas);
ALTER OK, -1 rows affected (... sec)

Changing the number of shards

Changing the number of shards in general works in the following steps.

A new target table is created but with more/less number of primary shards.
The segments from the source table (the underling Lucene index to be precise) are hard-linked into the target table at file system level.
The source table is dropped while the new table is renamed into the source and then recovered in the cluster.

Note

Segment hard-linking makes this operation relevantly cheap as it involves no data copying. If the file system, however, does not support hard-linking, then all segments will be copied into the new table, resulting in much more time and resource consuming operation.

To change the number of primary shards of a table, it is necessary to first satisfy certain conditions.

Decreasing the number of shards

To decrease the number of shards, it is necessary to ensure the following two conditions:

First, a (primary or replica) copy of every shard of the table must be present on the same node. The user can choose the most suitable node for this operation and then restrict table shard allocation on that node using the shard allocation filtering.

The second condition for decreasing a table’s number of shards is to block write operations to the table:

cr> alter table my_table set ("blocks.write" = true);
ALTER OK, -1 rows affected (... sec)

Afterwards the number of shards can be decreased:

cr> alter table my_table set (number_of_shards = 1);
ALTER OK, 0 rows affected (... sec)

The user should then revert the restrictions applied on the table, for instance

cr> alter table my_table reset ("routing.allocation.require._name", "blocks.write");
ALTER OK, -1 rows affected (... sec)

It is necessary to use a factor of the current number of primary shards as the target number of shards. For example, a table with 8 shards can be shrunk into 4, 2 or 1 primary shards.

Increase the number of shards

Increasing the number of shards is limited to tables which have been created with a number_of_routing_shards setting. For such tables the shards can be increased by a factor that depends on this setting. For example, a table with 5 shards, with number_of_routing_shards set to 20 can be changed to have either 10 or 20 shards. (5 x 2 (x 2)) = 20 or (5 x 4) = 20.

The only condition required for increasing the number of shards is to block operations to the table:

cr> alter table my_table set ("blocks.write" = true);
ALTER OK, -1 rows affected (... sec)

Afterwards, the table shards can be increased:

cr> alter table my_table set (number_of_shards = 2);
ALTER OK, 0 rows affected (... sec)

Similarly, the user should revert the restrictions applied on the table, for instance:

cr> alter table my_table set ("blocks.write" = false);
ALTER OK, -1 rows affected (... sec)

Read Alter Partitioned Tables to see how to alter parameters of partitioned tables.

Adding columns

In order to add a column to an existing table use ALTER TABLE with the ADD COLUMN clause:

cr> alter table my_table add column new_column_name text;
ALTER OK, -1 rows affected (... sec)

The inner schema of object columns can also be extended, as shown in the following example.

First a column of type object is added:

cr> alter table my_table add column obj_column object as (age int);
ALTER OK, -1 rows affected (... sec)

And now a nested column named name is added to the obj_column:

cr> alter table my_table add column obj_column['name'] text;
ALTER OK, -1 rows affected (... sec)

cr> select column_name, data_type from information_schema.columns
... where table_name = 'my_table' and column_name like 'obj_%';
+--------------------+-----------+
| column_name        | data_type |
+--------------------+-----------+
| obj_column         | object    |
| obj_column['age']  | integer   |
| obj_column['name'] | text      |
+--------------------+-----------+
SELECT 3 rows in set (... sec)

Closing and opening tables

A table can be closed by using ALTER TABLE with the CLOSE clause:

cr> alter table my_table close;
ALTER OK, -1 rows affected (... sec)

Closing a table will cause all operations beside ALTER TABLE ... OPEN to fail.

A table can be reopened again by using ALTER TABLE with the OPEN clause:

cr> alter table my_table open;
ALTER OK, -1 rows affected (... sec)

Note

This setting is not the same as blocks.read_only. Closing and opening a table will preserve these settings if they are already set.

Renaming tables

A table can be renamed by using ALTER TABLE with the RENAME TO clause:

cr> alter table my_table rename to my_new_table;
ALTER OK, -1 rows affected (... sec)

During the rename operation the shards of the table become temporarily unavailable.

Reroute shards

With the REROUTE command it is possible to control the allocations of shards. This gives you the ability to re-balance the cluster state manually. The supported reroute options are listed in the reference documentation of ALTER TABLE REROUTE.

Shard rerouting can help solve several problems:

Unassigned shards: Due to cause of lack of space, shard awareness or any other failure that happens during the automatic shard allocation it is possible to gain unassigned shards in the cluster.

“Hot Shards”: Most of your queries affect certain shards only. These shards lie on a node that has insufficient resources.

This command takes these Routing Allocation Settings into account. Once an allocation occurs CrateDB tries (by default) to re-balance shards to an even state. CrateDB can be set to disable shard re-balancing with the setting cluster.routing.rebalance.enable=None to perform only the explicit triggered allocations. .

Note

The command only triggers the allocation and reports back if the process has been acknowledged or rejected. Moving or allocating large shards takes more time to complete.

In those two cases it may be necessary to move shards manually to another node or force the retry of the allocation process.