Manually Upgrading a Cluster Deployment

This page guides you through a manual upgrade of a cluster deployment. The nodes of a cluster can be upgraded one at a time, so the cluster as a whole stays available and each individual node is down only briefly.

The manual upgrade procedure described on this page can be used to upgrade to a new hotfix release or to a new minor version of ArangoDB. Please refer to the Upgrade Paths section for details.

Because of a technical problem (see Technical Alert #6), it is highly recommended to upgrade 3.6.x deployments to at least 3.6.15 and 3.7.x deployments to at least 3.7.13.
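To find out whether a deployment falls under this recommendation, the version comparison can be scripted. The following is a minimal sketch, assuming GNU `sort -V` is available; `version_at_least` and `needs_alert6_upgrade` are hypothetical helper names, not part of ArangoDB.

```shell
#!/bin/sh
# version_at_least CURRENT MINIMUM
# Succeeds (exit 0) if CURRENT >= MINIMUM. Uses GNU `sort -V` to order
# plain dotted versions such as 3.7.13.
version_at_least() {
  current=$1
  minimum=$2
  # If MINIMUM sorts first (or both are equal), CURRENT is at least MINIMUM.
  lowest=$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)
  [ "$lowest" = "$minimum" ]
}

# needs_alert6_upgrade VERSION
# Succeeds if VERSION is a 3.6.x below 3.6.15 or a 3.7.x below 3.7.13.
needs_alert6_upgrade() {
  case $1 in
    3.6.*) ! version_at_least "$1" 3.6.15 ;;
    3.7.*) ! version_at_least "$1" 3.7.13 ;;
    *) return 1 ;;
  esac
}
```

For example, `needs_alert6_upgrade 3.7.10 && echo "upgrade to 3.7.13 or later"` prints the hint, while 3.7.13 itself does not trigger it.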

Preparations

The ArangoDB installation packages (e.g. for Debian or Ubuntu) set up a convenient standalone instance of arangod. During installation, this instance’s database will be upgraded (see --database.auto-upgrade) and the service will be (re)started.

You have to make sure that your cluster deployment is independent of this standalone instance. Specifically, make sure that the database directory and the socket used by the standalone instance provided by the package are separate from the ones in your cluster configuration. Also make sure that you have not modified the init script or systemd unit file of the standalone instance in a way that would make it start or stop your cluster instances instead.
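A quick sanity check for the directory part of this requirement is sketched below. It assumes GNU `realpath` and that the packaged standalone instance uses `/var/lib/arangodb3` as its database directory; that path is an assumption here, so check the database directory actually configured for your standalone instance. `dirs_are_separate` is a hypothetical helper.

```shell
#!/bin/sh
# dirs_are_separate DIR_A DIR_B
# Succeeds only if neither directory is the same as, or nested inside, the
# other after resolving symlinks and '..'. GNU `realpath -m` also accepts
# paths that do not exist yet.
dirs_are_separate() {
  a=$(realpath -m "$1") || return 1
  b=$(realpath -m "$2") || return 1
  # Appending '/' avoids false matches such as /data/db vs /data/db2.
  case "$a/" in "$b"/*) return 1 ;; esac
  case "$b/" in "$a"/*) return 1 ;; esac
  return 0
}
```

Call it once per cluster instance, for example `dirs_are_separate /var/lib/arangodb3 /path/to/agent1 || echo conflict`, where the second (placeholder) path stands for the `--database.directory` of that instance.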

You can read details on how to deploy your cluster independently of the standalone instance in the cluster deployment preliminary.

In the following, we assume that you do not use the standalone instance from the package, only manually started cluster processes. Where necessary, the standalone instance is moved out of the way so that you have to make as few changes as possible to the running cluster.

Install the new ArangoDB version binary

The first step is to install the new ArangoDB package.

Note: You do not have to stop the cluster (arangod) processes before installing the new package.

For example, to upgrade to 3.7.13 on Debian or Ubuntu when you have added the ArangoDB repository, call

$ apt install arangodb3=3.7.13-1

(apt-get on older versions). Alternatively, download the corresponding package file from download.arangodb.com and install it using

$ dpkg -i arangodb3-3.7.13-1_amd64.deb
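After installing, you can confirm which package version dpkg has recorded before touching the running cluster. This sketch uses `dpkg-query`; the helper names are hypothetical, and it assumes the package version carries no epoch prefix.

```shell
#!/bin/sh
# strip_debian_revision VERSION
# Drops the Debian revision, i.e. everything from the last '-' on
# (3.7.13-1 becomes 3.7.13). A version without '-' is printed unchanged.
strip_debian_revision() {
  printf '%s\n' "${1%-*}"
}

# installed_arangodb_version
# Prints the upstream version of the installed arangodb3 package.
installed_arangodb_version() {
  full=$(dpkg-query -W -f '${Version}' arangodb3 2>/dev/null) || return 1
  strip_debian_revision "$full"
}
```

For example: `[ "$(installed_arangodb_version)" = 3.7.13 ] || echo "unexpected package version"`.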

Stop the Standalone Instance

Because the package automatically starts the standalone instance, you may want to stop it now; otherwise it keeps running on your machine and can cause confusion later. Since you start the cluster processes manually, you do not need the standalone instance and can stop it:

$ service arangodb3 stop

You may also want to remove the standalone instance from the default runlevels to prevent it from starting on the next reboot of your machine. How this is done depends on your distribution and init system. For example, on older Debian and Ubuntu systems using a SystemV-compatible init, you can use:

$ update-rc.d -f arangodb3 remove
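On systems that use systemd instead of SysV init, the equivalent commands are `systemctl stop arangodb3` and `systemctl disable arangodb3`, assuming the unit carries the same name as the service above. The sketch below wraps both variants and offers a dry-run switch; `disable_standalone`, `maybe_run` and `DRY_RUN` are hypothetical names.

```shell
#!/bin/sh
# maybe_run CMD [ARG...]: prints the command if DRY_RUN=1, otherwise runs it.
maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# disable_standalone systemd|sysv
# Stops the packaged standalone instance and keeps it from starting at boot.
# Assumes the systemd unit is named arangodb3, like the SysV service.
disable_standalone() {
  case $1 in
    systemd)
      maybe_run systemctl stop arangodb3
      maybe_run systemctl disable arangodb3 ;;
    sysv)
      maybe_run service arangodb3 stop
      maybe_run update-rc.d -f arangodb3 remove ;;
    *)
      echo "usage: disable_standalone systemd|sysv" >&2
      return 2 ;;
  esac
}
```

Run `DRY_RUN=1 disable_standalone systemd` first to review the commands, then run it again as root without `DRY_RUN`.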

Set supervision in maintenance mode

Cluster supervision must be disabled while you upgrade the cluster. The following API calls activate and deactivate the maintenance mode of the supervision.

You can use curl to send the API calls.

Activate Maintenance mode

curl -u username:password <coordinator>/_admin/cluster/maintenance -XPUT -d'"on"'

For example:

curl http://localhost:7002/_admin/cluster/maintenance -XPUT -d'"on"'
{"error":false,"warning":"Cluster supervision deactivated. It will be reactivated automatically in 60 minutes unless this call is repeated until then."}
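Activating and deactivating maintenance mode differ only in the request body ("on" or "off"), so both can be wrapped in one helper that also checks the reply for `"error":false`. This is a sketch, not an official client: `set_maintenance`, `maintenance_reply_ok`, `ARANGO_USER` and `ARANGO_PASSWORD` are hypothetical names, and the check is a plain text match on the JSON reply rather than a real JSON parser.

```shell
#!/bin/sh
# maintenance_reply_ok: succeeds if the JSON reply on stdin has "error":false.
maintenance_reply_ok() {
  grep -q '"error":[[:space:]]*false'
}

# set_maintenance COORDINATOR_URL on|off
# Credentials come from ARANGO_USER (default root) and ARANGO_PASSWORD
# (default empty); both variable names are hypothetical.
set_maintenance() {
  coordinator=$1
  mode=$2
  case $mode in
    on|off) ;;
    *) echo "usage: set_maintenance COORDINATOR_URL on|off" >&2; return 2 ;;
  esac
  # The request body is the JSON string "on" or "off", as in the curl call above.
  reply=$(curl -sS -u "${ARANGO_USER:-root}:${ARANGO_PASSWORD:-}" \
    -X PUT -d "\"$mode\"" "$coordinator/_admin/cluster/maintenance") || return 1
  printf '%s\n' "$reply"
  printf '%s\n' "$reply" | maintenance_reply_ok
}
```

For example, `set_maintenance http://localhost:7002 on` prints the reply and returns a non-zero status if the reply does not contain `"error":false`.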

Note: If the manual upgrade takes longer than 60 minutes, the API call has to be sent again before the 60 minutes have elapsed.
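One way to avoid the timeout during a long upgrade is to repeat the activation call periodically from a background loop. This is a minimal sketch, assuming a 45-minute interval is an acceptable safety margin; the function names and the `ARANGO_USER`/`ARANGO_PASSWORD` variables are hypothetical.

```shell
#!/bin/sh
# send_maintenance_on COORDINATOR_URL
# Sends the same request as the "on" curl call above.
send_maintenance_on() {
  curl -sS -u "${ARANGO_USER:-root}:${ARANGO_PASSWORD:-}" \
    -X PUT -d '"on"' "$1/_admin/cluster/maintenance"
}

# keep_maintenance_on COORDINATOR_URL [INTERVAL_SECONDS] [COUNT]
# Sends the request COUNT times (0 = until the loop is killed), waiting
# INTERVAL_SECONDS between requests. The default of 2700 s (45 minutes)
# leaves a margin below the 60-minute timeout.
keep_maintenance_on() {
  url=$1
  interval=${2:-2700}
  count=${3:-0}
  i=0
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    send_maintenance_on "$url" >/dev/null || echo "warning: maintenance refresh failed" >&2
    i=$((i + 1))
    # No sleep after the last of COUNT requests.
    if [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

Start it with `keep_maintenance_on http://localhost:7002 &`, note the PID from `$!`, and kill that background job once all nodes are upgraded and before you deactivate maintenance mode; otherwise the loop would switch maintenance mode back on.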

Deactivate Maintenance mode

Cluster supervision is reactivated automatically 60 minutes after it was disabled. It can be reactivated manually with the following API call:

curl -u username:password <coordinator>/_admin/cluster/maintenance -XPUT -d'"off"'

For example:

curl http://localhost:7002/_admin/cluster/maintenance -XPUT -d'"off"'
{"error":false,"warning":"Cluster supervision reactivated."}

Upgrade the cluster processes

Now all cluster processes (the arangod instances acting as Agents, DB-Servers and Coordinators) have to be upgraded on each node.

Note: Maintenance mode must be active during this step.

To stop an arangod process, send it the SIGTERM signal with a command like kill -15:

kill -15 <pid-of-arangod-process>
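`kill -15` only requests a shutdown, and arangod may take some time to finish. Before starting the upgrade run, it is safer to wait until the old process has actually exited. A sketch using only `kill`; `stop_and_wait` and its 300-second default timeout are assumptions, not values from ArangoDB.

```shell
#!/bin/sh
# stop_and_wait PID [TIMEOUT_SECONDS]
# Sends SIGTERM (same as kill -15) and polls until the process is gone.
# Returns 1 if the signal cannot be sent or the process is still alive
# after TIMEOUT_SECONDS (default 300).
stop_and_wait() {
  pid=$1
  timeout=${2:-300}
  if ! kill -15 "$pid"; then
    echo "could not send SIGTERM to $pid" >&2
    return 1
  fi
  waited=0
  # kill -0 sends no signal; it only checks that the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "process $pid has exited"
}
```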

You can find the PIDs of your cluster processes with a command like ps:

ps -C arangod -fww

The output of this command shows not only the PIDs of all arangod processes but also their full command lines, which you need later to restart the processes.

The output below is from a test machine where three Agents, two DB-Servers and two Coordinators are running locally. In a more production-like setup, you would typically find at most one instance of each type running on a machine:

ps -C arangod -fww
UID        PID  PPID  C STIME TTY          TIME CMD
max      29075  8072  0 13:50 pts/2    00:00:42 arangod --server.endpoint tcp://0.0.0.0:5001 --agency.my-address=tcp://127.0.0.1:5001 --server.authentication false --agency.activate true --agency.size 3 --agency.endpoint tcp://127.0.0.1:5001 --agency.supervision true --log.file a1 --javascript.app-path /tmp --database.directory agent1
max      29208  8072  2 13:51 pts/2    00:02:08 arangod --server.endpoint tcp://0.0.0.0:5002 --agency.my-address=tcp://127.0.0.1:5002 --server.authentication false --agency.activate true --agency.size 3 --agency.endpoint tcp://127.0.0.1:5001 --agency.supervision true --log.file a2 --javascript.app-path /tmp --database.directory agent2
max      29329 16224  0 13:51 pts/3    00:00:42 arangod --server.endpoint tcp://0.0.0.0:5003 --agency.my-address=tcp://127.0.0.1:5003 --server.authentication false --agency.activate true --agency.size 3 --agency.endpoint tcp://127.0.0.1:5001 --agency.supervision true --log.file a3 --javascript.app-path /tmp --database.directory agent3
max      29461 16224  1 13:53 pts/3    00:01:11 arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:6001 --cluster.my-address tcp://127.0.0.1:6001 --cluster.my-role PRIMARY --cluster.agency-endpoint tcp://127.0.0.1:5001 --cluster.agency-endpoint tcp://127.0.0.1:5002 --cluster.agency-endpoint tcp://127.0.0.1:5003 --log.file db1 --javascript.app-path /tmp --database.directory dbserver1
max      29596  8072  0 13:54 pts/2    00:00:56 arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:6002 --cluster.my-address tcp://127.0.0.1:6002 --cluster.my-role PRIMARY --cluster.agency-endpoint tcp://127.0.0.1:5001 --cluster.agency-endpoint tcp://127.0.0.1:5002 --cluster.agency-endpoint tcp://127.0.0.1:5003 --log.file db2 --javascript.app-path /tmp --database.directory dbserver2
max      29824 16224  1 13:55 pts/3    00:01:53 arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:7001 --cluster.my-address tcp://127.0.0.1:7001 --cluster.my-role COORDINATOR --cluster.agency-endpoint tcp://127.0.0.1:5001 --cluster.agency-endpoint tcp://127.0.0.1:5002 --cluster.agency-endpoint tcp://127.0.0.1:5003 --log.file c1 --javascript.app-path /tmp --database.directory coordinator1
max      29938 16224  2 13:56 pts/3    00:02:13 arangod --server.authentication=false --server.endpoint tcp://0.0.0.0:7002 --cluster.my-address tcp://127.0.0.1:7002 --cluster.my-role COORDINATOR --cluster.agency-endpoint tcp://127.0.0.1:5001 --cluster.agency-endpoint tcp://127.0.0.1:5002 --cluster.agency-endpoint tcp://127.0.0.1:5003 --log.file c2 --javascript.app-path /tmp --database.directory coordinator2
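When several instances run on one machine, as in the listing above, it helps to pick PIDs by role instead of by eye. The sketch below reads `ps -C arangod -o pid=,args=` output (procps `ps`) and classifies each process by the options visible in the listing: `--agency.activate true` for Agents and `--cluster.my-role` PRIMARY or COORDINATOR for the others. Also accepting DBSERVER as a role value is an assumption, and `arangod_pids` is a hypothetical name.

```shell
#!/bin/sh
# arangod_pids ROLE
# Reads "PID ARGS..." lines on stdin and prints the PIDs whose command line
# matches ROLE (agent, dbserver or coordinator). Options may be written as
# "--opt value" or "--opt=value".
arangod_pids() {
  awk -v role="$1" '
    {
      r = ""
      if ($0 ~ /--agency\.activate[= ]true/) r = "agent"
      else if ($0 ~ /--cluster\.my-role[= ](PRIMARY|DBSERVER)/) r = "dbserver"
      else if ($0 ~ /--cluster\.my-role[= ]COORDINATOR/) r = "coordinator"
      if (r == role) print $1
    }'
}
```

For example, `ps -C arangod -o pid=,args= | arangod_pids coordinator` prints the PIDs of all local Coordinators.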

Upgrade a cluster node

The following procedure upgrades the Agent, DB-Server and Coordinator on one node.

Note: The original start commands of the Agent, DB-Server and Coordinator have to be reused.
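The `ps` listing shows each command line, but copying it by hand is error-prone. On Linux, the exact arguments of a running process can be read from `/proc/PID/cmdline` before stopping it and replayed afterwards. This is a minimal sketch with hypothetical helper names; it assumes no argument contains a newline, and that you replay the command from the same working directory, because relative paths such as `--database.directory agent1` in the listing above are resolved against it.

```shell
#!/bin/sh
# save_cmdline PID FILE
# /proc/PID/cmdline holds the arguments separated by NUL bytes (Linux only);
# write them to FILE, one argument per line.
save_cmdline() {
  if [ ! -r "/proc/$1/cmdline" ]; then
    echo "no readable /proc entry for PID $1" >&2
    return 1
  fi
  tr '\0' '\n' < "/proc/$1/cmdline" > "$2"
}

# run_saved_cmdline FILE [EXTRA_ARG...]
# Runs the saved command with any EXTRA_ARGs appended at the end.
run_saved_cmdline() {
  file=$1
  shift
  n_extra=$#
  # Append the saved arguments after the extra ones ...
  while IFS= read -r arg; do
    set -- "$@" "$arg"
  done < "$file"
  # ... then rotate the extra arguments to the end of the list.
  while [ "$n_extra" -gt 0 ]; do
    first=$1
    shift
    set -- "$@" "$first"
    n_extra=$((n_extra - 1))
  done
  "$@"
}
```

For example, `save_cmdline 29075 agent1.cmd` before stopping the first Agent from the listing, and later `run_saved_cmdline agent1.cmd --database.auto-upgrade=true` for its upgrade run (the file name is a placeholder).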

Stop the Agent

kill -15 <pid-of-agent>

Upgrade the Agent

Upgrade the Agent by starting its arangod process with the same command as before, plus the additional option:

--database.auto-upgrade=true

The Agent will stop automatically after the upgrade.
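The upgrade run is the original command with one extra option, executed in the foreground until arangod exits on its own. This sketch makes the exit status visible; `upgrade_run` is a hypothetical helper, and you pass the instance's full original command line as its arguments.

```shell
#!/bin/sh
# upgrade_run CMD [ARG...]
# Runs the original start command once more, in the foreground, with
# --database.auto-upgrade=true appended. arangod performs the upgrade and
# then exits, so a non-zero exit status signals a failed upgrade.
upgrade_run() {
  if "$@" --database.auto-upgrade=true; then
    echo "upgrade run finished: $1"
  else
    status=$?
    echo "upgrade run FAILED with exit status $status: $*" >&2
    return "$status"
  fi
}
```

For the first Agent in the listing above, you would call `upgrade_run` followed by that Agent's complete `arangod` command line. The same helper applies unchanged to the DB-Server and Coordinator.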

Restart the Agent

Restart the Agent's arangod process with the same command as before (without the additional option).
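The restart uses the same command without the upgrade option. Because the processes in the listing above run attached to a terminal, a detached restart with `nohup` can be convenient. The sketch below also checks that the new process is still alive after a few seconds, which catches immediate start-up failures but is no substitute for checking the logs. `restart_detached` and `RESTART_CHECK_SECONDS` are hypothetical names.

```shell
#!/bin/sh
# restart_detached LOGFILE CMD [ARG...]
# Starts CMD in the background under nohup, appending its console output to
# LOGFILE, waits RESTART_CHECK_SECONDS (default 5) and prints the new PID if
# the process is still running.
restart_detached() {
  log=$1
  shift
  nohup "$@" >>"$log" 2>&1 &
  pid=$!
  sleep "${RESTART_CHECK_SECONDS:-5}"
  if kill -0 "$pid" 2>/dev/null; then
    echo "$pid"
  else
    echo "process exited within ${RESTART_CHECK_SECONDS:-5}s, see $log" >&2
    return 1
  fi
}
```

The printed PID is what you will need for the next upgrade of this instance, for example with `kill -15`.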

Stop the DB-Server

kill -15 <pid-of-dbserver>

Upgrade the DB-Server

Upgrade the DB-Server by starting its arangod process with the same command as before, plus the additional option:

--database.auto-upgrade=true

The DB-Server will stop automatically after the upgrade.

Restart the DB-Server

Restart the DB-Server's arangod process with the same command as before (without the additional option).

Stop the Coordinator

kill -15 <pid-of-coordinator>

Upgrade the Coordinator

Upgrade the Coordinator by starting its arangod process with the same command as before, plus the additional option:

--database.auto-upgrade=true

The Coordinator will stop automatically after the upgrade.

Restart the Coordinator

Restart the Coordinator's arangod process with the same command as before (without the additional option).

After you have repeated this process on every node, all Agents, DB-Servers and Coordinators are upgraded and the manual upgrade is complete.

Finally, reactivate cluster supervision with the API call:

curl -u username:password <coordinator>/_admin/cluster/maintenance -XPUT -d'"off"'