Orchestrate CockroachDB with Mesosphere DC/OS (Insecure)
This page shows you how to orchestrate the deployment and management of an insecure 3-node CockroachDB cluster with Mesosphere DC/OS.
Warning:
Deploying an insecure cluster is not recommended for data in production. We'll update this page once it's possible to orchestrate a secure cluster with Mesosphere DC/OS.
Before you begin
Before getting started, it's important to review some current requirements and limitations.
Requirements
- Your cluster must have at least 3 private nodes.
- If you are using Enterprise DC/OS, you may need to provision a service account before installing CockroachDB. Only someone with
superuser
permission can create the service account.
Security ModeService Accountstrict
Requiredpermissive
Optionaldisabled
Not Required
Limitations
CockroachDB in DC/OS works the same as in other environments with the exception of the following limitations:
- The
cockroachdb
DC/OS service has been tested only on DC/OS versions 1.9 and 1.10. - Running in secure mode is not supported at this time.
- Running a multi-datacenter cluster is not supported at this time.
- Removing a node is not supported at this time.
- Neither volume type nor volume size requirements may be changed after initial deployment.
- Rack placement and awareness are not supported at this time.
Step 1. Install and Launch DC/OS
The fastest way to get up and running is to use the open source DC/OS template on AWS CloudFormation. However, you can find details about other open source or enterprise DC/OS installation methods in the official documentation:
- Open Source DC/OS
- Enterprise DC/OS
When using AWS CloudFormation, the launch process generally takes 10 to 15 minutes. Once you see theCREATE_COMPLETE
status in the CloudFormation UI, be sure to launch DC/OS and install the DC/OS CLI.
Step 2. Start CockroachDB
- Review the default CockroachDB configuration:
$ dcos package describe --config cockroachdb
The default CockroachDB configuration creates a 3-node CockroachDB cluster with reasonable defaults, but you may require different settings depending on the context of your deployment. To customize the settings for your deployment, create a cockroach.json
file based on the output of the command above. Be sure to consider our Recommended Production Settings.
Start the CockroachDB cluster as a DC/OS service.
- If you are using the default configuration, run:
$ dcos package install cockroachdb
- If you created a custom
cockroach.json
configuration, run:
$ dcos package install cockroachdb --options=cockroach.json
- Monitor the cluster's deployment from the Services tab of the DC/OS UI.
Note:
You can install CockroachDB from the DC/OS UI as well.
Step 3. Test the cluster
- Discover the endpoints for your CockroachDB cluster:
$ dcos cockroachdb endpoints pg
{
"address": [
"10.0.0.212:26257",
"10.0.2.57:26257",
"10.0.3.81:26257"
],
"dns": [
"cockroachdb-0-node-init.cockroachdb.autoip.dcos.thisdcos.directory:26257",
"cockroachdb-1-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257",
"cockroachdb-2-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257"
],
"vip": "pg.cockroachdb.l4lb.thisdcos.directory:26257"
}
The endpoints returned will include:
.mesos
hostnames for each instance that will follow the instances if they're moved within the DC/OS cluster.- A direct IP address for each instance, if
.mesos
hostnames are not resolvable. - A
vip
address, which is an HA-enabled hostname for accessing any of the instances. You'll use thevip
address in some of the next steps.
In general, the.mesos
endpoints will only work from within the same DC/OS cluster. From outside the cluster, you can either use the direct IPs or set up a proxy service that acts as a frontend to your CockroachDB instance. For development and testing purposes, you can use a DC/OS tunnel to access services from outside the cluster, but this option is not suitable for production use. See monitor the cluster below for more details.
$ dcos node ssh --master-proxy --leader
- Start a temporary container and open the built-in SQL shell inside it, using the
vip
endpoint as the—host
:
$ docker run -it cockroachdb/cockroach:v19.1.0 sql --insecure --host=pg.cockroachdb.l4lb.thisdcos.directory
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
root@pg.cockroachdb.l4lb.thisdcos.directory:26257/>
- Run some basic CockroachDB SQL statements:
> CREATE DATABASE bank;
> CREATE TABLE bank.accounts (id INT PRIMARY KEY, balance DECIMAL);
> INSERT INTO bank.accounts VALUES (1, 1000.50);
> SELECT * FROM bank.accounts;
+----+---------+
| id | balance |
+----+---------+
| 1 | 1000.5 |
+----+---------+
(1 row)
- Exit the SQL shell and delete the temporary pod:
> \q
Step 4. Monitor the cluster
To access the cluster's Admin UI, you can use a DC/OS tunnel to run an HTTP proxy:
- Install the DC/OS tunnel package:
$ dcos package install tunnel-cli --cli
- Start a DC/OS tunnel:
$ sudo dcos tunnel http
- In a browser, go to http://http.cockroachdb.l4lb.thisdcos.directory.mydcos.directory.
Step 5. Scale the cluster
The default cockroachdb
service creates a 3-node CockroachDB cluster. You can add nodes to the cluster after the service has been launched by updating the Scheduler process:
- In the DC/OS UI, go to the Services table.
- Select the cockroachdb service.
- In the upper right, select Edit.
- Select Environment.
- Update the
NODE_COUNT
variable to match the number of CockroachDB nodes you want. - Click Review & Run and then Run Service.
The Scheduler process will restart with the new configuration and will validate any detected changes. To check that nodes were successfully added to the cluster, go back to the Admin UI, view Node List, and check for the new nodes.
Alternately, you can SSH to the DC/OS master node and then run the cockroach node status
command in a temporary container, again using the vip
endpoint as the —host
:
$ dcos node ssh --master-proxy --leader
$ docker run -it cockroachdb/cockroach:v19.1.0 node status --all --insecure --host=pg.cockroachdb.l4lb.thisdcos.directory
+----+--------------------------------------------------------------------------+----------------------+---------------------+---------------------+------------+-----------+-------------+--------------+--------------+------------------+-----------------------+--------+--------------------+-----------------------+
| id | address | build | updated_at | started_at | live_bytes | key_bytes | value_bytes | intent_bytes | system_bytes | replicas_leaders | replicas_leaseholders | ranges | ranges_unavailable | ranges_underreplicated |
+----+--------------------------------------------------------------------------+----------------------+---------------------+---------------------+------------+-----------+-------------+--------------+--------------+------------------+-----------------------+--------+--------------------+-----------------------+
| 1 | cockroachdb-0-node-init.cockroachdb.autoip.dcos.thisdcos.directory:26257 | v19.1.0 | 2017-12-11 20:59:12 | 2017-12-11 19:14:42 | 41183973 | 1769 | 41187432 | 0 | 6018 | 2 | 2 | 2 | 0 | 0 |
| 2 | cockroachdb-1-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257 | v19.1.0 | 2017-12-11 20:59:12 | 2017-12-11 19:14:52 | 115448 | 71037 | 209282 | 0 | 6218 | 4 | 4 | 4 | 0 | 0 |
| 3 | cockroachdb-2-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257 | v19.1.0 | 2017-12-11 20:59:03 | 2017-12-11 19:14:53 | 120325 | 72652 | 217422 | 0 | 6732 | 4 | 3 | 4 | 0 | 0 |
| 4 | cockroachdb-3-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257 | v19.1.0 | 2017-12-11 20:59:03 | 2017-12-11 20:21:43 | 41248030 | 79147 | 41338632 | 0 | 6569 | 1 | 1 | 1 | 0 | 0 |
| 5 | cockroachdb-4-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257 | v19.1.0 | 2017-12-11 20:59:04 | 2017-12-11 20:56:54 | 41211967 | 30550 | 41181417 | 0 | 6854 | 1 | 1 | 1 | 0 | 0 |
+----+--------------------------------------------------------------------------+--------+---------------------+---------------------+------------+-----------+-------------+--------------+--------------+------------------+-----------------------+--------+--------------------+-------------+-----------------------+
(5 rows)
Step 6. Maintain the cluster
Choose the relevant maintenance task:
Update configurations
In addition to adding nodes, you can change the CPU and Memory requirements for nodes, update placement constraints, and make changes to other service settings. Just follow the instructions in the previous step, but change the environment variable for the update you want to make.
- After making a change, the scheduler will restart and automatically deploy any detected changes to the service, one node at a time. For example, a given change will first be applied to
cockroachdb-0
, thencockroachdb-1
, and so on. - Nodes are configured with a "Readiness check" to ensure that the underlying service appears to be in a healthy state before applying a given change to the next node in the sequence. However, this basic check is not foolproof and reasonable care should be taken to ensure that a given configuration change will not negatively affect the behavior of the service.
Restart a Node
You can restart a node while keeping it at its current location with its current persistent volume data. This may be thought of as similar to restarting a system process, but it also deletes any data that is not on a persistent volume.
- Get the pod name for the node you want to restart:
$ dcos cockroachdb pod list
[
"cockroachdb-0",
"cockroachdb-1",
"cockroachdb-2",
"cockroachdb-3",
"cockroachdb-4",
"metrics-0"
]
- Restart the relevant pod:
$ dcos cockroachdb pods restart cockroachdb-<NUM>
Replace a Node
You can move a node to a new system and discard the persistent volumes at the prior system to be rebuilt at the new system. Nodes are not moved automatically, so this step must be performed, for example, before a system is offlined or when a system has already been offlined.
- Get the pod name for the node you want to restart:
$ dcos cockroachdb pod list
[
"cockroachdb-0",
"cockroachdb-1",
"cockroachdb-2",
"cockroachdb-3",
"cockroachdb-4",
"metrics-0"
]
- Stop and restart the pod at a new location in the DC/OS cluster:
$ dcos cockroachdb pods replace cockroachdb-<NUM>
Troubleshoot (access logs)
Logs for the Scheduler and service (i.e., CockroachDB) can be viewed from the DC/OS web interface.
- Scheduler logs are useful for determining why a node isn't being launched (this is under the purview of the Scheduler).
- Node logs are useful for examining problems in the service itself, i.e., CockroachDB.
In all cases, logs are generally piped to files named stdout and/or stderr.
To view logs for a given node:
- In the DC/OS UI, go to the Services table.
- Select the cockroachdb service.
- In the list of tasks for the service, select the task to be examined. The Scheduler is named after the service, and nodes are
cockroachdb-0-node-init
orcockroachdb-#-node-join
. - In the task details, go to the Logs tab.
Backup and restore
The cockroachdb
DC/OS service provides an easy way use CockroachDB's open source cockroach dump
command to back up data on a per-database basis to an S3 bucket and to restore data from such a backup. Note that using datastores other than S3 is not yet supported.
Tip:
If you need to back up to/restore from datasources other than S3, or you have a very large database and need faster backups, incremental backups, or a faster, distributed restore process, consider contacting Cockroach Labs about an enterprise license.
Backup
To backup the tables in a database, run the following command:
$ dcos cockroachdb backup [<flags>] <database> <s3-bucket>
You can configure the communication with S3 using the following optional flags:
Flag | Description |
---|---|
—aws-access-key | AWS Access Key |
—aws-secret-key | AWS Secret Key |
—s3-dir | AWS S3 target path |
—s3-backup-dir | Target path within s3-dir |
—region | AWS region |
By default, the AWS access and secret keys will be pulled from your environment via the AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY
environment variables, respectively. You must either have these environment variables defined or specify the flags for the backup to work.
Make sure that you provision your nodes with enough disk space to perform a backup. The backups are stored on disk before being uploaded to S3, and will take up as much space as the data currently in the tables, so you'll need half of your total available space to be free to backup every keyspace at once.
Restore
To restore cluster data, run the following command:
$ dcos cockroachdb restore [<flags>] <database> <s3-bucket> <s3-backup-dir>
You can configure the communication with S3 using the following optional flags to the CLI command:
Flag | Description |
---|---|
—aws-access-key | AWS Access Key |
—aws-secret-key | AWS Secret Key |
—s3-dir | AWS S3 target path |
—s3-backup-dir | Target path within s3-dir |
—region | AWS region |
By default, the AWS access and secret keys will be pulled from your environment via the AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY
environment variables, respectively. You must either have these environment variables defined or specify the flags for the backup to work.
Step 7. Stop the cluster
To shut down the CockroachDB cluster:
- Uninstall the
cockroachdb
service:
$ MY_SERVICE_NAME=cockroachdb
$ dcos package uninstall --app-id=$MY_SERVICE_NAME $MY_SERVICE_NAME
- If you're using DC/OS version 1.9, use the following command to clean up remaining reserved resources with the framework cleaner script,
janitor.py
. Note that this step is not needed for DC/OS 1.10.
$ dcos node ssh --master-proxy --leader "docker run mesosphere/janitor /janitor.py \
-r $MY_SERVICE_NAME-role \
-p $MY_SERVICE_NAME-principal \
-z dcos-service-$MY_SERVICE_NAME"
- Uninstall DC/OS. If you used AWS CloudFormation, see Uninstalling DC/OS on AWS EC2.