Managing replication nodes
ALTER CLUSTER <cluster_name> UPDATE nodes
statement updates node lists on each node of the cluster to include every active node in the cluster. See Joining a cluster for more info on node lists.
- SQL
- JSON
- PHP
- Python
- javascript
- Java
SQL JSON PHP Python javascript Java
ALTER CLUSTER posts UPDATE nodes
POST /cli -d "
ALTER CLUSTER posts UPDATE nodes
"
$params = [
'cluster' => 'posts',
'body' => [
'operation' => 'update',
]
];
$response = $client->cluster()->alter($params);
utilsApi.sql('ALTER CLUSTER posts UPDATE nodes')
res = await utilsApi.sql('ALTER CLUSTER posts UPDATE nodes');
utilsApi.sql("ALTER CLUSTER posts UPDATE nodes");
Response
{u'error': u'', u'total': 0, u'warning': u''}
{"total":0,"error":"","warning":""}
For example, when the cluster was initially created, the list of nodes used for rejoining the cluster was 10.10.0.1:9312,10.10.1.1:9312
. Since then other nodes joined the cluster and now we have the following active nodes: 10.10.0.1:9312,10.10.1.1:9312,10.15.0.1:9312,10.15.0.3:9312
.
But the list of nodes used for rejoining the cluster is still the same. Running the ALTER CLUSTER ... UPDATE nodes
copies the list of active nodes to the list of nodes used to rejoin on restart. After this, the list of nodes used on restart includes all the active nodes in the cluster.
Both lists of nodes can be viewed using Cluster status statement (cluster_post_nodes_set
and cluster_post_nodes_view
).
Removing node from cluster
To remove a node from a replication cluster you need to:
- stop the node
- remove info about the cluster in
<data_dir>/manticore.json
(/var/lib/manticore/manticore.json
in most cases) on the node you’ve stopped - run
ALTER CLUSTER cluster_name UPDATE nodes
on any other node
After this the other nodes will forget about the detached node and the node will forget about the cluster. It won’t impact tables neither in the cluster nor on the detached node.
Replication cluster status
Node status outputs, among other information, cluster status variables.
The output format is cluster_name_variable_name
variable_value
. Most of them are described in Galera Documentation Status Variables. Additionally we display:
- cluster_name - name of the cluster
- node_state - current state of the node:
closed
,destroyed
,joining
,donor
,synced
- indexes_count - number of tables managed by the cluster
- indexes - list of table names managed by the cluster
- nodes_set - list of nodes in the cluster defined with cluster
CREATE
,JOIN
orALTER UPDATE
commands - nodes_view - actual list of nodes in cluster which this node sees
- SQL
- JSON
- PHP
- Python
- javascript
- Java
SQL JSON PHP Python javascript Java
SHOW STATUS
POST /cli -d "
SHOW STATUS
"
$params = [
'body' => []
];
$response = $client->nodes()->status($params);
utilsApi.sql('SHOW STATUS')
res = await utilsApi.sql('SHOW STATUS');
utilsApi.sql("SHOW STATUS");
Response
+----------------------------+-------------------------------------------------------------------------------------+
| Counter | Value |
+----------------------------+-------------------------------------------------------------------------------------+
| cluster_name | post |
| cluster_post_state_uuid | fba97c45-36df-11e9-a84e-eb09d14b8ea7 |
| cluster_post_conf_id | 1 |
| cluster_post_status | primary |
| cluster_post_size | 5 |
| cluster_post_local_index | 0 |
| cluster_post_node_state | synced |
| cluster_post_indexes_count | 2 |
| cluster_post_indexes | pq1,pq_posts |
| cluster_post_nodes_set | 10.10.0.1:9312 |
| cluster_post_nodes_view | 10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication |
"
{"columns":[{"Counter":{"type":"string"}},{"Value":{"type":"string"}}],
"data":[
{"Counter":"cluster_name", "Value":"post"},
{"Counter":"cluster_post_state_uuid", "Value":"fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
{"Counter":"cluster_post_conf_id", "Value":"1"},
{"Counter":"cluster_post_status", "Value":"primary"},
{"Counter":"cluster_post_size", "Value":"5"},
{"Counter":"cluster_post_local_index", "Value":"0"},
{"Counter":"cluster_post_node_state", "Value":"synced"},
{"Counter":"cluster_post_indexes_count", "Value":"2"},
{"Counter":"cluster_post_indexes", "Value":"pq1,pq_posts"},
{"Counter":"cluster_post_nodes_set", "Value":"10.10.0.1:9312"},
{"Counter":"cluster_post_nodes_view", "Value":"10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"}
],
"total":0,
"error":"",
"warning":""
}
(
"cluster_name" => "post",
"cluster_post_state_uuid" => "fba97c45-36df-11e9-a84e-eb09d14b8ea7",
"cluster_post_conf_id" => 1,
"cluster_post_status" => "primary",
"cluster_post_size" => 5,
"cluster_post_local_index" => 0,
"cluster_post_node_state" => "synced",
"cluster_post_indexes_count" => 2,
"cluster_post_indexes" => "pq1,pq_posts",
"cluster_post_nodes_set" => "10.10.0.1:9312",
"cluster_post_nodes_view" => "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"
)
{u'columns': [{u'Key': {u'type': u'string'}},
{u'Value': {u'type': u'string'}}],
u'data': [
{u'Key': u'cluster_name', u'Value': u'post'},
{u'Key': u'cluster_post_state_uuid', u'Value': u'fba97c45-36df-11e9-a84e-eb09d14b8ea7'},
{u'Key': u'cluster_post_conf_id', u'Value': u'1'},
{u'Key': u'cluster_post_status', u'Value': u'primary'},
{u'Key': u'cluster_post_size', u'Value': u'5'},
{u'Key': u'cluster_post_local_index', u'Value': u'0'},
{u'Key': u'cluster_post_node_state', u'Value': u'synced'},
{u'Key': u'cluster_post_indexes_count', u'Value': u'2'},
{u'Key': u'cluster_post_indexes', u'Value': u'pq1,pq_posts'},
{u'Key': u'cluster_post_nodes_set', u'Value': u'10.10.0.1:9312'},
{u'Key': u'cluster_post_nodes_view', u'Value': u'10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication'}],
u'error': u'',
u'total': 0,
u'warning': u''}
{"columns": [{"Key": {"type": "string"}},
{"Value": {"type": "string"}}],
"data": [
{"Key": "cluster_name", "Value": "post"},
{"Key": "cluster_post_state_uuid", "Value": "fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
{"Key": "cluster_post_conf_id", "Value": "1"},
{"Key": "cluster_post_status", "Value": "primary"},
{"Key": "cluster_post_size", "Value": "5"},
{"Key": "cluster_post_local_index", "Value": "0"},
{"Key": "cluster_post_node_state", "Value": "synced"},
{"Key": "cluster_post_indexes_count", "Value": "2"},
{"Key": "cluster_post_indexes", "Value": "pq1,pq_posts"},
{"Key": "cluster_post_nodes_set", "Value": "10.10.0.1:9312"},
{"Key": "cluster_post_nodes_view", "Value": "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"}],
"error": "",
"total": 0,
"warning": ""}
{columns=[{ Key : { type=string }},
{ Value : { type=string }}],
data : [
{ Key=cluster_name, Value=post},
{ Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
{ Key=cluster_post_conf_id, Value=1},
{ Key=cluster_post_status, Value=primary},
{ Key=cluster_post_size, Value=5},
{ Key=cluster_post_local_index, Value=0},
{ Key=cluster_post_node_state, Value=synced},
{ Key=cluster_post_indexes_count, Value=2},
{ Key=cluster_post_indexes, Value=pq1,pq_posts},
{ Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
{ Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication}],
error= ,
total=0,
warning= }
Restarting a cluster
A multi-master replication cluster requires its single node to be started as a reference point before all the other nodes join it and form a cluster. This is called cluster bootstrapping which introduces a primary component
before others see that as a reference point to sync up the data from. The restart of a single node or reconnecting from a node after a shutdown can be done as usual.
After the whole cluster shutdown the server that was stopped last should be started first with --new-cluster
command line option (or run manticore_new_cluster
to start it via systemd). To make sure that the server is able to start as a reference point the grastate.dat
file located at the cluster path should be updated with the value of 1
for safe_to_bootstrap
option. I.e., both conditions, --new-cluster
and safe_to_bootstrap=1
, must be satisfied. An attempt to start any other node without these options set will trigger an error. To override this protection and start cluster from another server forcibly --new-cluster-force
command line option may be used. You can alternatively run manticore_new_cluster --force
to use systemd for that.
In case of a hard crash or an unclean shutdown of all the servers in the cluster you need to identify the most advanced node that has the largest seqno
in the grastate.dat
file located at the cluster path and start that server with the command line key --new-cluster-force
.