System Information
CrateDB provides the sys schema, which contains virtual tables. These tables are read-only and can be queried to get real-time statistical information about the cluster, its nodes, and their shards:
Table of Contents
- Cluster
- Nodes
- Node Checks
- Shards
- Jobs, Operations, and Logs
- Cluster Checks
- Health
- Repositories
- Snapshots
- Summits
- Users
- Allocations
- Shard Table Permissions
Cluster
Basic information about the CrateDB cluster can be retrieved from the sys.cluster table:
Name | Description | Return Type |
---|---|---|
id | A unique ID generated by the system. | STRING |
license | The current CrateDB license information. | OBJECT |
name | The cluster name. | STRING |
master_node | The ID of the node that currently operates as master. | STRING |
settings | The cluster settings. | OBJECT |
The result has at most 1 row:
cr> select name from sys.cluster;
+--------------+
| name |
+--------------+
| Testing... |
+--------------+
SELECT 1 row in set (... sec)
Cluster License
The sys.cluster.license expression returns information about the currently registered license.
license
Column Name | Description | Return Type |
---|---|---|
license | The current CrateDB license information or NULL on CrateDB CE. | OBJECT |
license[‘expiry_date’] | The timestamp on which the license expires. | TIMESTAMP |
license[‘issued_to’] | The organisation for which the license is issued. | STRING |
license[‘max_nodes’] | The maximum number of nodes the license is valid for. | INTEGER |
Cluster Settings
The sys.cluster.settings expression returns information about the currently applied cluster settings.
cr> select settings from sys.cluster;
+-----------------------------------------------------------------------------------------------------------------------------------------------------...-+
| settings |
+-----------------------------------------------------------------------------------------------------------------------------------------------------...-+
| {"bulk": {...}, "cluster": {...}, "discovery": {...}, "gateway": {...}, "indices": {...}, "license": {...}, "logger": [], "stats": {...}, "udc": {...}} |
+-----------------------------------------------------------------------------------------------------------------------------------------------------...-+
SELECT 1 row in set (... sec)
cr> select column_name, data_type from information_schema.columns
... where column_name like 'settings%'
... and table_name = 'cluster';
+-----------------------------------------------------------------------------------+--------------+
| column_name | data_type |
+-----------------------------------------------------------------------------------+--------------+
| settings | object |
| settings['bulk'] | object |
| settings['bulk']['request_timeout'] | string |
| settings['cluster'] | object |
| settings['cluster']['graceful_stop'] | object |
| settings['cluster']['graceful_stop']['force'] | boolean |
| settings['cluster']['graceful_stop']['min_availability'] | string |
| settings['cluster']['graceful_stop']['reallocate'] | boolean |
| settings['cluster']['graceful_stop']['timeout'] | string |
| settings['cluster']['info'] | object |
| settings['cluster']['info']['update'] | object |
| settings['cluster']['info']['update']['interval'] | string |
| settings['cluster']['routing'] | object |
| settings['cluster']['routing']['allocation'] | object |
| settings['cluster']['routing']['allocation']['allow_rebalance'] | string |
| settings['cluster']['routing']['allocation']['balance'] | object |
| settings['cluster']['routing']['allocation']['balance']['index'] | float |
| settings['cluster']['routing']['allocation']['balance']['shard'] | float |
| settings['cluster']['routing']['allocation']['balance']['threshold'] | float |
| settings['cluster']['routing']['allocation']['cluster_concurrent_rebalance'] | integer |
| settings['cluster']['routing']['allocation']['disk'] | object |
| settings['cluster']['routing']['allocation']['disk']['threshold_enabled'] | boolean |
| settings['cluster']['routing']['allocation']['disk']['watermark'] | object |
| settings['cluster']['routing']['allocation']['disk']['watermark']['flood_stage'] | string |
| settings['cluster']['routing']['allocation']['disk']['watermark']['high'] | string |
| settings['cluster']['routing']['allocation']['disk']['watermark']['low'] | string |
| settings['cluster']['routing']['allocation']['enable'] | string |
| settings['cluster']['routing']['allocation']['exclude'] | object |
| settings['cluster']['routing']['allocation']['exclude']['_host'] | string |
| settings['cluster']['routing']['allocation']['exclude']['_id'] | string |
| settings['cluster']['routing']['allocation']['exclude']['_ip'] | string |
| settings['cluster']['routing']['allocation']['exclude']['_name'] | string |
| settings['cluster']['routing']['allocation']['include'] | object |
| settings['cluster']['routing']['allocation']['include']['_host'] | string |
| settings['cluster']['routing']['allocation']['include']['_id'] | string |
| settings['cluster']['routing']['allocation']['include']['_ip'] | string |
| settings['cluster']['routing']['allocation']['include']['_name'] | string |
| settings['cluster']['routing']['allocation']['node_concurrent_recoveries'] | integer |
| settings['cluster']['routing']['allocation']['node_initial_primaries_recoveries'] | integer |
| settings['cluster']['routing']['allocation']['require'] | object |
| settings['cluster']['routing']['allocation']['require']['_host'] | string |
| settings['cluster']['routing']['allocation']['require']['_id'] | string |
| settings['cluster']['routing']['allocation']['require']['_ip'] | string |
| settings['cluster']['routing']['allocation']['require']['_name'] | string |
| settings['cluster']['routing']['rebalance'] | object |
| settings['cluster']['routing']['rebalance']['enable'] | string |
| settings['discovery'] | object |
| settings['discovery']['zen'] | object |
| settings['discovery']['zen']['minimum_master_nodes'] | integer |
| settings['discovery']['zen']['ping_timeout'] | string |
| settings['discovery']['zen']['publish_timeout'] | string |
| settings['gateway'] | object |
| settings['gateway']['expected_nodes'] | integer |
| settings['gateway']['recover_after_nodes'] | integer |
| settings['gateway']['recover_after_time'] | string |
| settings['indices'] | object |
| settings['indices']['breaker'] | object |
| settings['indices']['breaker']['fielddata'] | object |
| settings['indices']['breaker']['fielddata']['limit'] | string |
| settings['indices']['breaker']['fielddata']['overhead'] | double |
| settings['indices']['breaker']['query'] | object |
| settings['indices']['breaker']['query']['limit'] | string |
| settings['indices']['breaker']['query']['overhead'] | double |
| settings['indices']['breaker']['request'] | object |
| settings['indices']['breaker']['request']['limit'] | string |
| settings['indices']['breaker']['request']['overhead'] | double |
| settings['indices']['recovery'] | object |
| settings['indices']['recovery']['internal_action_long_timeout'] | string |
| settings['indices']['recovery']['internal_action_timeout'] | string |
| settings['indices']['recovery']['max_bytes_per_sec'] | string |
| settings['indices']['recovery']['recovery_activity_timeout'] | string |
| settings['indices']['recovery']['retry_delay_network'] | string |
| settings['indices']['recovery']['retry_delay_state_sync'] | string |
| settings['license'] | object |
| settings['license']['enterprise'] | boolean |
| settings['license']['ident'] | string |
| settings['logger'] | object_array |
| settings['logger']['level'] | string |
| settings['logger']['name'] | string |
| settings['stats'] | object |
| settings['stats']['breaker'] | object |
| settings['stats']['breaker']['log'] | object |
| settings['stats']['breaker']['log']['jobs'] | object |
| settings['stats']['breaker']['log']['jobs']['limit'] | string |
| settings['stats']['breaker']['log']['jobs']['overhead'] | double |
| settings['stats']['breaker']['log']['operations'] | object |
| settings['stats']['breaker']['log']['operations']['limit'] | string |
| settings['stats']['breaker']['log']['operations']['overhead'] | double |
| settings['stats']['enabled'] | boolean |
| settings['stats']['jobs_log_expiration'] | string |
| settings['stats']['jobs_log_filter'] | string |
| settings['stats']['jobs_log_persistent_filter'] | string |
| settings['stats']['jobs_log_size'] | integer |
| settings['stats']['operations_log_expiration'] | string |
| settings['stats']['operations_log_size'] | integer |
| settings['stats']['service'] | object |
| settings['stats']['service']['interval'] | string |
| settings['udc'] | object |
| settings['udc']['enabled'] | boolean |
| settings['udc']['initial_delay'] | string |
| settings['udc']['interval'] | string |
| settings['udc']['url'] | string |
+-----------------------------------------------------------------------------------+--------------+
SELECT ... rows in set (... sec)
For further details, see the Cluster Settings configuration section.
Nodes
To get information about the nodes, query the sys.nodes table.
This table can be queried for one, multiple or all nodes within a cluster.
The table schema is as follows:
id
Column Name | Description | Return Type |
---|---|---|
id | A unique ID within the cluster generated by the system. | STRING |
name
Column Name | Description | Return Type |
---|---|---|
name | The node name within a cluster. The system will choose a random name. You can specify the node name via your own custom configuration. | STRING |
hostname
Column Name | Description | Return Type |
---|---|---|
hostname | The specified host name of the machine the node is running on. | STRING |
rest_url
Column Name | Description | Return Type |
---|---|---|
rest_url | Full http(s) address where the REST API of the node is exposed, including scheme, hostname (or IP) and port. | STRING |
port
Column Name | Description | Return Type |
---|---|---|
port | The specified ports for both HTTP and binary transport interfaces. You can specify the ports via your own custom configuration. | OBJECT |
port[‘http’] | CrateDB’s HTTP port. | INTEGER |
port[‘transport’] | CrateDB’s binary transport port. | INTEGER |
port[‘psql’] | The PostgreSQL wire protocol port. | INTEGER |
load
Column Name | Description | Return Type |
---|---|---|
load | System load statistics | OBJECT |
load[‘1’] | Average load over the last 1 minute. | DOUBLE |
load[‘5’] | Average load over the last 5 minutes. | DOUBLE |
load[‘15’] | Average load over the last 15 minutes. | DOUBLE |
load[‘probe_timestamp’] | Unix timestamp at the time of collection of the load probe. | LONG |
mem
Column Name | Description | Return Type |
---|---|---|
mem | Memory utilization statistics of the host. | OBJECT |
mem[‘used’] | Currently used memory in bytes. | LONG |
mem[‘used_percent’] | Currently used memory in percent of total. | SHORT |
mem[‘free’] | Currently available memory in bytes. | LONG |
mem[‘free_percent’] | Currently available memory in percent of total. | SHORT |
mem[‘probe_timestamp’] | Unix timestamp at the time of collection of the memory probe. | LONG |
heap
Column Name | Description | Return Type |
---|---|---|
heap | Heap memory utilization statistics. | OBJECT |
heap[‘used’] | Currently used heap memory in bytes. | LONG |
heap[‘max’] | Maximum available heap memory. You can specify the max heap memory CrateDB should use in the configuration. | LONG |
heap[‘free’] | Currently available heap memory in bytes. | LONG |
heap[‘probe_timestamp’] | Unix timestamp at the time of collection of the heap probe. | LONG |
version
Column Name | Description | Return Type |
---|---|---|
version | CrateDB version information. | OBJECT |
version[‘number’] | Version string in format “major.minor.hotfix” | STRING |
version[‘build_hash’] | SHA hash of the GitHub commit from which this build was built. | STRING |
version[‘build_snapshot’] | Indicates whether this build is a snapshot build. | BOOLEAN |
cluster_state_version
Column Name | Description | Return Type |
---|---|---|
cluster_state_version | The current version of the cluster state. The cluster state is an immutable structure that is recreated when a change is published. | LONG |
fs
Column Name | Description | Return Type |
---|---|---|
fs | Utilization statistics about the file system. | OBJECT |
fs[‘total’] | Aggregated usage statistic of all disks on the host. | OBJECT |
fs[‘total’][‘size’] | Total size of all disks in bytes. | LONG |
fs[‘total’][‘used’] | Total used space of all disks in bytes. | LONG |
fs[‘total’][‘available’] | Total available space of all disks in bytes. | LONG |
fs[‘total’][‘reads’] | Total number of reads on all disks. | LONG |
fs[‘total’][‘bytes_read’] | Total size of reads on all disks in bytes. | LONG |
fs[‘total’][‘writes’] | Total number of writes on all disks. | LONG |
fs[‘total’][‘bytes_written’] | Total size of writes on all disks in bytes. | LONG |
fs[‘disks’] | Usage statistics of individual disks on the host. | ARRAY |
fs[‘disks’][‘dev’] | Device name | STRING |
fs[‘disks’][‘size’] | Total size of the disk in bytes. | LONG |
fs[‘disks’][‘used’] | Used space of the disk in bytes. | LONG |
fs[‘disks’][‘available’] | Available space of the disk in bytes. | LONG |
fs[‘disks’][‘reads’] | Number of reads on the disk. DEPRECATED: always returns -1 | LONG |
fs[‘disks’][‘bytes_read’] | Total size of reads on the disk in bytes. DEPRECATED: always returns -1 | LONG |
fs[‘disks’][‘writes’] | Number of writes on the disk. DEPRECATED: always returns -1 | LONG |
fs[‘disks’][‘bytes_written’] | Total size of writes on the disk in bytes. DEPRECATED: always returns -1 | LONG |
fs[‘data’] | Information about data paths used by the node. | ARRAY |
fs[‘data’][‘dev’] | Device name | STRING |
fs[‘data’][‘path’] | File path where the data of the node resides. | STRING |
thread_pools
Column Name | Description | Return Type |
---|---|---|
thread_pools | Usage statistics of Java thread pools. | ARRAY |
thread_pools[‘name’] | Name of the pool. | STRING |
thread_pools[‘active’] | Number of currently running threads in the thread pool. | INTEGER |
thread_pools[‘rejected’] | Total number of rejected threads in the thread pool. | LONG |
thread_pools[‘largest’] | Largest number of threads that have ever simultaneously been in the pool. | INTEGER |
thread_pools[‘completed’] | Total number of completed threads in the thread pool. | LONG |
thread_pools[‘threads’] | Size of the thread pool. | INTEGER |
thread_pools[‘queue’] | Number of threads currently in the queue. | INTEGER |
os
Column Name | Description | Return Type |
---|---|---|
os | Operating system stats | OBJECT |
os[‘uptime’] | System uptime in milliseconds. Requires allowing system calls on Windows and macOS. See notes in Uptime Limitations. | LONG |
os[‘timestamp’] | UNIX timestamp in millisecond resolution | LONG |
os[‘cpu’] | Information about CPU utilization | OBJECT |
os[‘cpu’][‘used’] | System CPU usage as percentage | SHORT |
os[‘cpu’][‘system’] | CPU time used by the system. DEPRECATED: always returns -1 | SHORT |
os[‘cpu’][‘user’] | CPU time used by applications. DEPRECATED: always returns -1 | SHORT |
os[‘cpu’][‘idle’] | Idle CPU time. DEPRECATED: always returns -1 | SHORT |
os[‘cpu’][‘stolen’] | The amount of CPU ‘stolen’ from this virtual machine by the hypervisor for other tasks. DEPRECATED: always returns -1 | SHORT |
os[‘probe_timestamp’] | Unix timestamp at the time of collection of the OS probe. | LONG |
os[‘cgroup’] | Information about Cgroups (Linux only) | OBJECT |
os[‘cgroup’][‘cpuacct’] | Information about CPU accounting | OBJECT |
os[‘cgroup’][‘cpuacct’][‘control_group’] | The path to the cpu accounting cgroup | STRING |
os[‘cgroup’][‘cpuacct’][‘usage_nanos’] | The total CPU time (in nanoseconds) consumed by all tasks in this cgroup. | LONG |
os[‘cgroup’][‘cpu’] | Information about the CPU subsystem | OBJECT |
os[‘cgroup’][‘cpu’][‘control_group’] | The path to the cpu cgroup | STRING |
os[‘cgroup’][‘cpu’][‘cfs_period_micros’] | The period (in microseconds) at which the cgroup’s access to the CPU is reallocated. | LONG |
os[‘cgroup’][‘cpu’][‘cfs_quota_micros’] | The total amount of time (in microseconds) for which all tasks in the cgroup can run during one period (cfs_period_micros). | LONG |
os[‘cgroup’][‘cpu’][‘num_elapsed_periods’] | The number of period intervals (cfs_period_micros) that have elapsed. | LONG |
os[‘cgroup’][‘cpu’][‘num_times_throttled’] | The number of times tasks in the cgroup have been throttled. | LONG |
os[‘cgroup’][‘cpu’][‘time_throttled_nanos’] | The total time (in nanoseconds) for which tasks in the cgroup have been throttled. | LONG |
os[‘cgroup’][‘mem’] | Information about memory resources used by tasks in a cgroup. | OBJECT |
os[‘cgroup’][‘mem’][‘control_group’] | The path to the memory cgroup | STRING |
os[‘cgroup’][‘mem’][‘usage_bytes’] | The total current memory usage by processes in the cgroup. | STRING |
os[‘cgroup’][‘mem’][‘limit_bytes’] | The maximum amount of user memory in the cgroup. | STRING |
The CPU information values are cached for 1s. They might differ from the actual values at query time. Use the probe timestamp to get the time of collection. When analyzing CPU usage over time, always use os['probe_timestamp'] to calculate the time difference between two probes.
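As a rough illustration of working with probe timestamps, the following Python sketch derives an average CPU utilization from two samples of sys.nodes. It is only a sketch under assumptions: each sample is a hypothetical dict holding os['probe_timestamp'] (assumed here to be in milliseconds) and os['cgroup']['cpuacct']['usage_nanos'], and the function name is made up for this example.

```python
def average_cores_used(first, second):
    """Return the average number of CPU cores busy between two probes.

    Each sample is a hypothetical dict with:
      probe_timestamp -- os['probe_timestamp'], assumed to be in milliseconds
      usage_nanos     -- os['cgroup']['cpuacct']['usage_nanos']
    """
    elapsed_ms = second["probe_timestamp"] - first["probe_timestamp"]
    if elapsed_ms <= 0:
        # Both rows came from the same cached probe (or are out of order),
        # so there is no interval to average over.
        return None
    elapsed_nanos = elapsed_ms * 1_000_000
    used_nanos = second["usage_nanos"] - first["usage_nanos"]
    return used_nanos / elapsed_nanos


earlier = {"probe_timestamp": 1_000, "usage_nanos": 0}
later = {"probe_timestamp": 3_000, "usage_nanos": 1_000_000_000}
print(average_cores_used(earlier, later))  # 0.5
```

Dividing by the difference of the probe timestamps, rather than by the time between the two queries, keeps the result correct even when a query is answered from the cache.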
Cgroup Limitations
Note
Cgroup metrics only work if the stats are available from /sys/fs/cgroup/cpu and /sys/fs/cgroup/cpuacct.
Uptime Limitations
Note
os[‘uptime’] requires a system call when running CrateDB on Windows or macOS. However, system calls are not permitted by default. If you require this metric, you need to allow system calls by setting bootstrap.seccomp to false. This setting must be set in the crate.yml or via a command-line argument and cannot be changed at runtime.
os_info
Column Name | Description | Return Type |
---|---|---|
os_info | Operating system information | OBJECT |
os_info[‘available_processors’] | Number of processors that are available in the JVM. This is usually equal to the number of cores of the CPU. | INTEGER |
os_info[‘name’] | Name of the operating system (e.g. Linux, Windows, macOS) | STRING |
os_info[‘arch’] | Name of the JVM architecture (e.g. amd64, x86) | STRING |
os_info[‘version’] | Version of the operating system | STRING |
os_info[‘jvm’] | Information about the JVM (Java Virtual Machine) | OBJECT |
os_info[‘jvm’][‘version’] | The JVM version | STRING |
os_info[‘jvm’][‘vm_name’] | The name of the JVM (e.g. OpenJDK, Java Hotspot(TM)) | STRING |
os_info[‘jvm’][‘vm_vendor’] | The vendor name of the JVM | STRING |
os_info[‘jvm’][‘vm_version’] | The version of the JVM | STRING |
network
Network statistics are deprecated in CrateDB 2.3 and may be removed completely in subsequent versions. All LONG columns always return 0.
Column Name | Description | Return Type |
---|---|---|
network | Statistics about network activity on the host. | OBJECT |
network[‘probe_timestamp’] | Unix timestamp at the time of collection of the network probe. | LONG |
network[‘tcp’] | TCP network activity on the host. | OBJECT |
network[‘tcp’][‘connections’] | Information about TCP network connections. | OBJECT |
network[‘tcp’][‘connections’][‘initiated’] | Total number of initiated TCP connections. | LONG |
network[‘tcp’][‘connections’][‘accepted’] | Total number of accepted TCP connections. | LONG |
network[‘tcp’][‘connections’][‘curr_established’] | Total number of currently established TCP connections. | LONG |
network[‘tcp’][‘connections’][‘dropped’] | Total number of dropped TCP connections. | LONG |
network[‘tcp’][‘connections’][‘embryonic_dropped’] | Total number of TCP connections that have been dropped before they were accepted. | LONG |
network[‘tcp’][‘packets’] | Information about TCP packets. | OBJECT |
network[‘tcp’][‘packets’][‘sent’] | Total number of TCP packets sent. | LONG |
network[‘tcp’][‘packets’][‘received’] | Total number of TCP packets received. | LONG |
network[‘tcp’][‘packets’][‘retransmitted’] | Total number of TCP packets retransmitted due to an error. | LONG |
network[‘tcp’][‘packets’][‘errors_received’] | Total number of TCP packets that contained checksum errors, had a bad offset, were dropped because of a lack of memory or were too short. | LONG |
network[‘tcp’][‘packets’][‘rst_sent’] | Total number of RST packets sent because unread data was left in the queue when the socket was closed. See tools.ietf.org. | LONG |
connections
Column Name | Description | Return Type |
---|---|---|
http | Number of connections established via HTTP | OBJECT |
http[‘open’] | The currently open connections established via HTTP | LONG |
http[‘total’] | The total number of connections that have been established via HTTP over the life time of a CrateDB node | LONG |
psql | Number of connections established via Postgres protocol | OBJECT |
psql[‘open’] | The currently open connections established via Postgres protocol | LONG |
psql[‘total’] | The total number of connections that have been established via Postgres protocol over the life time of a CrateDB node | LONG |
transport | Number of connections established via Transport protocol | OBJECT |
transport[‘open’] | The currently open connections established via Transport protocol | LONG |
process
Column Name | Description | Return Type |
---|---|---|
process | Statistics about the CrateDB process. | OBJECT |
process[‘open_file_descriptors’] | Number of currently open file descriptors used by the CrateDB process. | LONG |
process[‘max_open_file_descriptors’] | The maximum number of open file descriptors CrateDB can use. | LONG |
process[‘probe_timestamp’] | The system UNIX timestamp at the moment of the probe collection. | LONG |
process[‘cpu’] | Information about the CPU usage of the CrateDB process. | OBJECT |
process[‘cpu’][‘percent’] | The CPU usage of the CrateDB JVM process given in percent. | SHORT |
process[‘cpu’][‘user’] | The process CPU user time in milliseconds. DEPRECATED: always returns -1 | LONG |
process[‘cpu’][‘system’] | The process CPU kernel time in milliseconds. DEPRECATED: always returns -1 | LONG |
The CPU information values are cached for 1s. They might differ from the actual values at query time. Use the probe timestamp to get the time of collection. When analyzing CPU usage over time, always use process['probe_timestamp'] to calculate the time difference between two probes.
Note
If one of the queried nodes does not respond within three seconds, it returns null for every column except id and name. This behaviour can be used to detect hanging nodes.
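The detection idea in the note above could be sketched in Python as follows. This is a minimal illustration, not a CrateDB API: it assumes the rows of a sys.nodes query have already been fetched as dicts keyed by column name, and the function name and sample rows are hypothetical.

```python
def hanging_nodes(rows):
    """Return the names of nodes whose columns other than id/name are all None."""
    suspects = []
    for row in rows:
        others = [value for key, value in row.items() if key not in ("id", "name")]
        # A row with only id and name selected tells us nothing, so skip it.
        if others and all(value is None for value in others):
            suspects.append(row["name"])
    return suspects


# Hypothetical result rows of: select id, name, hostname, load from sys.nodes
rows = [
    {"id": "a1", "name": "node-1", "hostname": "host-1", "load": {"1": 0.4}},
    {"id": "b2", "name": "node-2", "hostname": None, "load": None},
]
print(hanging_nodes(rows))  # ['node-2']
```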
Node Checks
The table sys.node_checks exposes a list of internal node checks and the results of their validation.
The table schema is the following:
Column Name | Description | Return Type |
---|---|---|
id | The unique check ID. | INTEGER |
node_id | The unique node ID. | STRING |
severity | The level of severity. The higher the value of the field, the higher the severity. | INTEGER |
description | The description message for the setting check. | STRING |
passed | The flag determines whether the check for the setting has passed. | BOOLEAN |
acknowledged | The flag determines whether the check for this setting has been acknowledged by the user in order to ignore the value of the passed column. This column can be updated. | BOOLEAN |
Example query:
cr> select id, node_id, description from sys.node_checks order by id, node_id;
+----+---------...-+--------------------------------------------------------------...-+
| id | node_id | description |
+----+---------...-+--------------------------------------------------------------...-+
| 1 | ... | The value of the cluster setting 'gateway.expected_nodes' mus... |
| 2 | ... | The value of the cluster setting 'gateway.recover_after_nodes... |
| 3 | ... | If any of the "expected nodes" recovery settings are set, the... |
| 5 | ... | The high disk watermark is exceeded on the node. The cluster ... |
| 6 | ... | The low disk watermark is exceeded on the node. The cluster w... |
| 7 | ... | The flood stage disk watermark is exceeded on the node. Table... |
| 8 | ... | The JVM version with which CrateDB is running should be >= 11... |
+----+---------...-+--------------------------------------------------------------...-+
SELECT 7 rows in set (... sec)
Acknowledge Failed Checks
It is possible to acknowledge every check by updating the acknowledged column. Once a check is acknowledged, CrateDB’s built-in Admin-UI in particular no longer complains about it failing.
Imagine we’ve added a new node to our cluster. Because the gateway.expected_nodes setting can only be set via the configuration file or a command-line argument, the check for this setting will not pass on the already running nodes until the setting is updated on these nodes and they are restarted (which is not what we want on a healthy, well-running cluster).
In order to make the Admin-UI accept a failing check (so the checks label goes green again), we must acknowledge this check by updating its acknowledged flag:
cr> update sys.node_checks set acknowledged = true where id = 1;
UPDATE OK, 1 row affected (... sec)
Caution
Updates on this column are transient, so changed values are lost after the affected node is restarted.
Description of Checked Node Settings
Recovery Expected Nodes
The check for the gateway.expected_nodes setting verifies that the number of nodes to wait for before the immediate cluster state recovery is equal to the maximum number of data and master nodes in the cluster.
Recovery After Nodes
The check for the gateway.recover_after_nodes setting verifies that the number of nodes that must be started before the cluster recovery starts is greater than half of the expected number of nodes and less than or equal to the number of nodes in the cluster.
(E / 2) < R <= E
where R is the number of recovery nodes and E is the number of expected nodes.
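To make the inequality concrete, here is a small Python sketch that evaluates it for a few values. The function name is hypothetical; only the formula itself comes from the text above.

```python
def recover_after_nodes_ok(recover_after_nodes, expected_nodes):
    """Check (E / 2) < R <= E for R = recover_after_nodes, E = expected_nodes."""
    return expected_nodes / 2 < recover_after_nodes <= expected_nodes


# With E = 3, only R = 2 and R = 3 satisfy the inequality.
for r in (1, 2, 3, 4):
    print(r, recover_after_nodes_ok(r, 3))
```

For a three-node cluster this means gateway.recover_after_nodes would have to be 2 or 3 for the check to pass.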
Recovery After Time
If gateway.recover_after_nodes is set, then gateway.recover_after_time must not be set to 0s, otherwise the gateway.recover_after_nodes setting wouldn’t have any effect.
Routing Allocation Disk Watermark High
The check for the cluster.routing.allocation.disk.watermark.high setting verifies that the high watermark is not exceeded on the current node. The usage of each disk for configured CrateDB data paths is verified against the threshold setting. If one or more verifications fail, the check is marked as not passed.
Routing Allocation Disk Watermark Low
The check for the cluster.routing.allocation.disk.watermark.low setting, which controls the low watermark for the node disk usage, verifies that the low watermark is not exceeded on the current node. The verification is done against each disk for configured CrateDB data paths. The check is not passed if the verification for one or more disks fails.
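The per-disk logic described for the two watermark checks can be illustrated with the following Python sketch. It is not CrateDB’s implementation: the function name, the dict layout (modelled on the size and used fields of fs[‘disks’]) and the "90%" value are assumptions for this example, and only percentage watermarks are handled.

```python
def watermark_check_passed(disks, watermark):
    """Return True if no disk's usage exceeds a percentage watermark.

    disks     -- hypothetical list of dicts with 'size' and 'used' in bytes
    watermark -- percentage string such as "90%" (example value only)
    """
    if not watermark.endswith("%"):
        raise ValueError("this sketch only handles percentage watermarks")
    limit = float(watermark[:-1])
    for disk in disks:
        used_percent = 100.0 * disk["used"] / disk["size"]
        if used_percent > limit:
            return False  # one failing disk is enough to fail the check
    return True


disks = [
    {"dev": "/dev/sda1", "size": 1000, "used": 500},
    {"dev": "/dev/sdb1", "size": 1000, "used": 950},
]
print(watermark_check_passed(disks, "90%"))      # False: second disk is at 95%
print(watermark_check_passed(disks[:1], "90%"))  # True: first disk is at 50%
```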
JVM Version
The check for the JVM version verifies that CrateDB is running under Java 11 or later. If not, the check fails, because support for earlier versions will be dropped in a future release. This is a low-severity check that doesn’t require immediate action. However, the JVM should be upgraded eventually to be able to upgrade to future versions.
Shards
The table sys.shards contains real-time statistics for all shards of all (non-system) tables.
Table Schema
Column Name | Description | Return Type |
---|---|---|
_node | Information about the node the shard is located at. Contains the same information as the | OBJECT |
blob_path | Path to the directory which contains the blob files of the shard, or null if the shard is not a blob shard. | STRING |
id | The shard ID. The shard ID is managed by the system and ranges from 0 up to the specified number of shards of a table (by default, the number of shards is 5). | INTEGER |
min_lucene_version | Shows the oldest Lucene segment version used in this shard. | STRING |
num_docs | The total amount of docs within a shard. | LONG |
orphan_partition | True if the partition has no table associated with it. In rare situations the table is missing. False on non-partitioned tables. | BOOLEAN |
partition_ident | The partition ident of a partitioned table. Empty string on non-partitioned tables. | STRING |
path | Path to the shard directory on the filesystem. This directory contains state and index files. | STRING |
primary | Describes if the shard is the primary shard. | BOOLEAN |
recovery | Represents recovery statistic of the particular shard. Recovery is the process of moving a table shard to a different node or loading it from disk, e.g. during node startup (local gateway recovery), replication, shard rebalancing or snapshot recovery. | OBJECT |
recovery[‘files’] | Shards recovery statistic in files. | OBJECT |
recovery[‘files’][‘percent’] | Percentage of files already recovered. | FLOAT |
recovery[‘files’][‘recovered’] | Number of actual files recovered in the shard. Includes both existing and reused files. | INTEGER |
recovery[‘files’][‘reused’] | Total number of files reused from a local copy while recovering the shard. | INTEGER |
recovery[‘files’][‘used’] | Total number of files in the shard. | INTEGER |
recovery[‘size’] | Shards recovery statistic in bytes. | OBJECT |
recovery[‘size’][‘percent’] | Percentage of bytes already recovered. | FLOAT |
recovery[‘size’][‘recovered’] | Number of actual bytes recovered in the shard. Includes both existing and reused bytes. | LONG |
recovery[‘size’][‘reused’] | Number of bytes reused from a local copy while recovering the shard. | LONG |
recovery[‘size’][‘used’] | Total number of bytes in the shard. | LONG |
recovery[‘stage’] | Recovery stage. | STRING |
recovery[‘total_time’] | Returns elapsed time from the start of the shard recovery. | LONG |
recovery[‘type’] | Recovery type. | STRING |
relocating_node | The node ID which the shard is getting relocated to at the time. | STRING |
routing_state | The current state of a shard as defined by the routing. | STRING |
schema_name | The schema name. This will be “blob” for shards of blob tables and “doc” for shards of common tables without a defined schema. | STRING |
size | Current size in bytes. This value is cached for max. 10 seconds to reduce file system access. | LONG |
state | The current state of the shard. | STRING |
table_name | The table name. | STRING |
Note
The sys.shards table is subject to Shard Table Permissions.
Example
For example, you can query shards like this:
cr> select schema_name as schema,
... table_name as t,
... id,
... partition_ident as p_i,
... num_docs as docs,
... primary,
... relocating_node as r_n,
... routing_state as r_state,
... state,
... orphan_partition as o_p
... from sys.shards where table_name = 'locations' and id = 1;
+--------+-----------+----+-----+------+---------+------+---------+---------+-------+
| schema | t | id | p_i | docs | primary | r_n | r_state | state | o_p |
+--------+-----------+----+-----+------+---------+------+---------+---------+-------+
| doc | locations | 1 | | 8 | TRUE | NULL | STARTED | STARTED | FALSE |
+--------+-----------+----+-----+------+---------+------+---------+---------+-------+
SELECT 1 row in set (... sec)
Jobs, Operations, and Logs
To let you inspect the activities currently taking place in a cluster, CrateDB provides system tables that let you track current cluster jobs and operations. See Jobs Table and Operations Table.
Jobs and operations that finished executing are additionally recorded in memory. There are two retention policies available to control how many records should be kept.
One option is to configure the maximum number of records which should be kept. Once the configured table size is reached, the older log records are deleted as newer records are added. This is configurable using stats.jobs_log_size and stats.operations_log_size.
Another option is to configure an expiration time for the records. In this case, the records in the logs tables are periodically cleared if they are older than the expiry time. This behaviour is configurable using stats.jobs_log_expiration and stats.operations_log_expiration.
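The two retention policies can be pictured with a toy Python model. This is purely illustrative and says nothing about how CrateDB stores its logs internally: the class, its method names and the sample values are hypothetical, and both policies are combined in one class only for brevity.

```python
from collections import deque


class LogTable:
    """Toy in-memory log with a size limit and an expiration time."""

    def __init__(self, max_size, expiration_seconds):
        self.max_size = max_size
        self.expiration_seconds = expiration_seconds
        self.records = deque()  # (finished_at, stmt) tuples, oldest first

    def add(self, finished_at, stmt):
        self.records.append((finished_at, stmt))
        # Size-based retention: evict the oldest records once the limit is hit.
        while len(self.records) > self.max_size:
            self.records.popleft()

    def expire(self, now):
        # Time-based retention: drop records older than the expiration time.
        while self.records and now - self.records[0][0] > self.expiration_seconds:
            self.records.popleft()


log = LogTable(max_size=2, expiration_seconds=60)
log.add(0, "select 1")
log.add(10, "select 2")
log.add(20, "select 3")
print([stmt for _, stmt in log.records])  # ['select 2', 'select 3']
log.expire(now=75)
print([stmt for _, stmt in log.records])  # ['select 3']
```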
In addition to these retention policies, there is a memory limit in place preventing these tables from taking up too much memory. The amount of memory that can be used to store the jobs log and the operations log can be configured using stats.breaker.log.jobs.limit and stats.breaker.log.operations.limit, respectively. If the memory limit is reached, an error message will be logged and the log table will be cleared completely.
It is also possible to define a filter which must match for jobs to be recorded after they finish executing. This can be useful to only record slow queries or queries that failed due to an error. This filter can be configured using the stats.jobs_log_filter setting.
Furthermore, there is a second filter setting which also results in a log entry in the regular CrateDB log file for all finished jobs that match this filter. This can be configured using stats.jobs_log_persistent_filter. This could be used to create a persistent slow query log.
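The two retention policies described above can be modelled with a small sketch. This is not CrateDB's implementation: the JobsLog class, the LogEntry type, and the purge_expired method are hypothetical names used only to illustrate how a size bound and an expiration time interact.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LogEntry:
    job_id: str
    ended: float  # seconds since an arbitrary epoch (illustrative only)


class JobsLog:
    """Toy in-memory log with a size bound and an optional expiration time."""

    def __init__(self, max_size, expiration=None):
        # deque(maxlen=...) discards the oldest entry once the bound is hit,
        # which mirrors the size-based policy (cf. stats.jobs_log_size).
        self.entries = deque(maxlen=max_size)
        # Mirrors the time-based policy (cf. stats.jobs_log_expiration).
        self.expiration = expiration

    def add(self, entry):
        self.entries.append(entry)

    def purge_expired(self, now):
        """Periodic cleanup: drop entries older than the expiry time."""
        if self.expiration is None:
            return
        while self.entries and now - self.entries[0].ended > self.expiration:
            self.entries.popleft()


log = JobsLog(max_size=3, expiration=60.0)
for i in range(5):
    log.add(LogEntry(job_id=f"job-{i}", ended=float(i * 10)))

# Only the three newest entries survive the size bound.
ids_after_size_bound = [e.job_id for e in log.entries]

# At t=95, entries that ended more than 60 seconds ago are dropped.
log.purge_expired(now=95.0)
ids_after_expiry = [e.job_id for e in log.entries]

print(ids_after_size_bound, ids_after_expiry)
```

The sketch combines both policies in one object for brevity. Whether a real cluster applies one or both depends on which of the settings named above are configured.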
Jobs
The sys.jobs table is a constantly updated view of all jobs that are currently being executed in the cluster.
Table Schema
Column Name | Description | Return Type |
---|---|---|
id | The job UUID. This job ID is generated by the system. | STRING |
node | Information about the node that created the job. | OBJECT |
node['id'] | The id of the node. | STRING |
node['name'] | The name of the node. | STRING |
started | The point in time when the job started. | TIMESTAMP |
stmt | Shows the data query or manipulation statement represented by this job. | STRING |
username | The user who is executing the statement. | STRING |
The field username corresponds to the SESSION_USER that is performing the query:
cr> select stmt, username, started from sys.jobs where stmt like 'sel% from %jobs%';
+---------------------------------------------------------------------------------+----------+-...-----+
| stmt | username | started |
+---------------------------------------------------------------------------------+----------+-...-----+
| select stmt, username, started from sys.jobs where stmt like 'sel% from %jobs%' | crate | ... |
+---------------------------------------------------------------------------------+----------+-...-----+
SELECT 1 row in set (... sec)
Note
If the enterprise edition is disabled or the user management module is not available, the username is represented as crate.
Every request that queries data or manipulates data is considered a “job” if it is a valid query. Requests that are not valid queries (for example, a request that tries to query a non-existent table) will not show up as jobs.
Jobs Metrics
The sys.jobs_metrics table provides an overview of the query latency in the cluster. Jobs metrics are not persisted across node restarts.
The metrics are aggregated for each node and each unique classification of the statements.
Note
In order to reduce the memory requirements for these metrics, the times are statistically sampled and therefore may have slight inaccuracies. In addition, durations are only tracked up to 10 minutes. Statements taking longer than that are capped to 10 minutes.
sys.jobs_metrics Table Schema
Column Name | Description | Return Type |
---|---|---|
node | An object containing the id and name of the node on which the metrics have been sampled. | OBJECT |
classification | An object containing the statement classification. | OBJECT |
classification['type'] | The general type of the statement. Types are: INSERT, SELECT, UPDATE, DELETE, COPY, DDL, and MANAGEMENT. | STRING |
classification['labels'] | Labels are only available for certain statement types that can be classified more accurately than just by their type. | STRING_ARRAY |
total_count | Total number of queries executed. | LONG |
failed_count | Total number of queries that failed to complete successfully. | LONG |
sum_of_durations | Sum of durations in ms of all executed queries per statement type. | LONG |
stdev | The standard deviation of the query latencies. | DOUBLE |
mean | The mean query latency in ms. | DOUBLE |
max | The maximum query latency in ms. | LONG |
min | The minimum query latency in ms. | LONG |
percentiles | An object containing different percentiles. | OBJECT |
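To make the relationship between these columns concrete, here is a minimal sketch that aggregates a list of query durations into similar fields. It is an illustration, not how CrateDB computes the metrics (which are statistically sampled). The summarize function is a hypothetical name, and using the population standard deviation and applying the 10 minute cap to every field are assumptions.

```python
import statistics

# Durations are only tracked up to 10 minutes (value in milliseconds).
MAX_TRACKED_MS = 10 * 60 * 1000


def summarize(durations_ms, failed_count=0):
    """Aggregate query durations (ms) into sys.jobs_metrics-like fields."""
    # Assumption: longer statements are capped before any aggregation.
    capped = [min(d, MAX_TRACKED_MS) for d in durations_ms]
    return {
        "total_count": len(capped),
        "failed_count": failed_count,
        "sum_of_durations": sum(capped),
        "mean": statistics.fmean(capped),
        # Assumption: population standard deviation.
        "stdev": statistics.pstdev(capped),
        "min": min(capped),
        "max": max(capped),
    }


# The last statement ran for 15 minutes and is capped at 10 minutes.
metrics = summarize([100, 200, 300, 900_000], failed_count=1)
print(metrics)
```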
Classification
Certain statement types (such as SELECT statements) have additional labels in their classification. These labels are the names of the logical plan operators that are involved in the query.
For example, the following UNION statement:
SELECT name FROM t1 where id = 1
UNION ALL
SELECT name FROM t2 where id < 2
would result in the following labels:
- Union for the UNION ALL
- Get for the left SELECT
- Collect for the right SELECT
Note
Labels may be subject to change as they only represent internal properties of the statement!
Operations
The sys.operations table is a constantly updated view of all operations that are currently being executed in the cluster:
cr> select node['name'], job_id, name, used_bytes from sys.operations
... order by name limit 1;
+--------------+--------...-+-----...-+------------+
| node['name'] | job_id | name | used_bytes |
+--------------+--------...-+-----...-+------------+
| crate | ... | ... | ... |
+--------------+--------...-+-----...-+------------+
SELECT 1 row in set (... sec)
An operation is a node-specific sub-component of a job (for when a job involves multi-node processing). Jobs that do not require multi-node processing will not produce any operations.
Table Schema
Column Name | Description | Return Type |
---|---|---|
id | The operation UUID. This operation ID is generated by the system. | STRING |
job_id | The job id this operation belongs to. | STRING |
name | The name of the operation. | STRING |
node | Information about the node that created the operation. | OBJECT |
node['id'] | The id of the node. | STRING |
node['name'] | The name of the node. | STRING |
started | The point in time when the operation started. | TIMESTAMP |
used_bytes | Currently loaded amount of data by the operation. | LONG |
Note
In some cases, operations are generated for internal CrateDB work that does not directly correspond to a user request. These entries do not have corresponding entries in sys.jobs.
Logs
The sys.jobs and sys.operations tables have corresponding log tables: sys.jobs_log and sys.operations_log.
sys.jobs_log Table Schema
Column Name | Description | Return Type |
---|---|---|
id | The job ID. | STRING |
ended | The point in time when the job finished. | TIMESTAMP |
error | If the job encountered an error, this will hold the error message. | STRING |
started | The point in time when the job started. | TIMESTAMP |
stmt | Shows the data query or manipulation statement executed by the job. | STRING |
username | The user who executed the statement. | STRING |
classification | An object containing the statement classification. | OBJECT |
classification['type'] | The general type of the statement. Types are: INSERT, SELECT, UPDATE, DELETE, COPY, DDL, and MANAGEMENT. | STRING |
classification['labels'] | Labels are only available for certain statement types that can be classified more accurately than just by their type. | STRING_ARRAY |
Note
You can control which jobs are recorded using the stats.jobs_log_filter setting.
sys.operations_log Table Schema
Column Name | Description | Return Type |
---|---|---|
id | The operation ID. | STRING |
job_id | The job id. | STRING |
ended | The point in time when the operation finished. | TIMESTAMP |
error | If the operation encountered an error, this will hold the error message. | STRING |
name | The name of the operation. | STRING |
started | The point in time when the operation started. | TIMESTAMP |
used_bytes | The amount of data loaded by the operation. | LONG |
After a job or operation finishes, the corresponding entry will be moved into the corresponding log table:
cr> select id, stmt, username, started, ended, error
... from sys.jobs_log order by ended desc limit 2;
+-...+----------------------------------------------...-+----------+-...-----+-...---+-------+
| id | stmt | username | started | ended | error |
+-...+----------------------------------------------...-+----------+-...-----+-...---+-------+
| ...| select node['name'], ... | crate | ... | ... | NULL |
| ...| select stmt, username, started from sys.jobs ... | crate | ... | ... | NULL |
+-...+----------------------------------------------...-+----------+-...-----+-...---+-------+
SELECT 2 rows in set (... sec)
Invalid queries are also logged in the sys.jobs_log table, i.e. queries that never make it to the sys.jobs table because they could not be executed.
The log tables are bound by a fixed size (stats.jobs_log_size) or by an expiration time (stats.jobs_log_expiration).
See Collecting Stats for information on how to configure logs.
Caution
If you deactivate statistics tracking, the logs tables will be truncated.
Cluster Checks
The table sys.checks exposes a list of internal cluster checks and results of their validation.
The sys.checks table looks like this:
Column Name | Description | Return Type |
---|---|---|
id | The unique check id. | INTEGER |
severity | The level of severity. The higher the value of the field, the higher the severity. | INTEGER |
description | The description message for the setting check. | STRING |
passed | The flag determines whether the check for the setting has passed. | BOOLEAN |
Here’s an example query:
cr> select id, description from sys.checks order by id;
+----+--------------------------------------------------------------...-+
| id | description |
+----+--------------------------------------------------------------...-+
| 1 | The setting 'discovery.zen.minimum_master_nodes' must not be ... |
| 2 | The total number of partitions of one or more partitioned tab... |
| 3 | The following tables need to be recreated for compatibility w... |
| 6 | Your CrateDB license is valid. Enjoy CrateDB! |
+----+--------------------------------------------------------------...-+
SELECT 4 rows in set (... sec)
Cluster checks are also indicated in the CrateDB admin console. When all cluster checks (and all Node Checks) pass, the Checks icon will be green. Here’s what it looks like when some checks are failing at the CRITICAL severity level:
Current Checks
Minimum Master Nodes
The check for the discovery.zen.minimum_master_nodes setting verifies that the value of the setting is at least a quorum of the nodes in the cluster, that is, more than half of them:
(N / 2) + 1 <= M
where N is the number of nodes in the cluster, and M is the value of the setting discovery.zen.minimum_master_nodes.
You can change the value (via SET and RESET) permanently by issuing the following SQL statement:
SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = M;
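As a worked example of the inequality above, the following sketch computes the smallest value of M that satisfies the check for a given cluster size. The function names are hypothetical, and the sketch assumes that N / 2 denotes integer division.

```python
def minimum_master_nodes(n):
    """Smallest M with (N / 2) + 1 <= M, assuming integer division."""
    return n // 2 + 1


def check_passes(n, m):
    """True if the configured value m satisfies the check for n nodes."""
    return n // 2 + 1 <= m


for nodes in (1, 3, 4, 5):
    print(nodes, "nodes ->", minimum_master_nodes(nodes))
```

Under that assumption, a three-node cluster needs a value of at least 2, and a four- or five-node cluster needs at least 3.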
Number of Partitions
This check warns if any partitioned table has more than 1000 partitions to detect the usage of a high cardinality field for partitioning.
Tables need to be recreated
Warning
Do not attempt to upgrade your cluster to a newer major version if this cluster check is failing. Follow the instructions below to get this cluster check passing.
This check warns you if there are tables that need to be recreated for compatibility with future major versions of CrateDB.
If you try to upgrade to the next major version of CrateDB with tables that have not been recreated, CrateDB will refuse to start.
To recreate a table, you have to create a new table, copy over the data, and rename or remove the old table.
1) Use SHOW CREATE TABLE to get the schema required to create an empty copy of the table to recreate:
SHOW CREATE TABLE your_table;
2) Create a new temporary table, using the schema returned from SHOW CREATE TABLE:
CREATE TABLE tmp_your_table (...);
3) Prevent inserts to the original table:
ALTER TABLE your_table SET ("blocks.read_only" = true);
4) Copy the data:
INSERT INTO tmp_your_table (...) (SELECT ... FROM your_table);
5) Swap the tables:
ALTER CLUSTER SWAP TABLE tmp_your_table TO your_table;
6) Confirm the new your_table contains all data and has the new version:
SELECT count(*) FROM your_table;
SELECT version FROM information_schema.tables where table_name = 'your_table';
7) Drop the now obsolete old table:
ALTER TABLE tmp_your_table SET ("blocks.read_only" = false);
DROP TABLE tmp_your_table;
When all tables have been recreated, this cluster check will pass.
Note
Snapshots of your tables created prior to them being upgraded will not work with future versions of CrateDB. For this reason, you should create a new snapshot for each of your tables. (See Snapshots.)
License expiry check
This check warns you when your license is close to expiration. It will yield a MEDIUM alert when your license is valid for less than 15 days and a HIGH alert when your license is valid for less than a day. It’s highly recommended you request a new license when this check triggers in order to avoid the situation where operations are rejected due to an invalid license.
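The thresholds described above can be sketched as a small function. The name license_alert is hypothetical, and treating an already expired license as HIGH is an assumption; the text only specifies the 15 day and 1 day thresholds.

```python
from datetime import datetime, timedelta, timezone


def license_alert(expiry, now):
    """Return 'HIGH', 'MEDIUM', or None based on the remaining validity."""
    remaining = expiry - now
    # Less than a day left (assumption: this also covers expired licenses).
    if remaining < timedelta(days=1):
        return "HIGH"
    # Less than 15 days left.
    if remaining < timedelta(days=15):
        return "MEDIUM"
    return None


now = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(license_alert(now + timedelta(days=30), now))
print(license_alert(now + timedelta(days=14), now))
print(license_alert(now + timedelta(hours=12), now))
```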
Health
The sys.health table lists the health of each table and table partition. The health is computed by checking the states of the shards of each table/partition.
Column Name | Description | Return Type |
---|---|---|
table_name | The table name. | STRING |
table_schema | The schema of the table. | STRING |
partition_ident | The ident of the partition. NULL for non-partitioned tables. | STRING |
health | The health label. Can be RED, YELLOW or GREEN. | STRING |
severity | The health as a short value. Useful when ordering on health. | SHORT |
missing_shards | The number of shards that are not assigned or not started. | INTEGER |
underreplicated_shards | The number of shards which are not fully replicated. | INTEGER |
Both missing_shards and underreplicated_shards might return -1 if the cluster is in an unhealthy state that prevents the exact number from being calculated. This could be the case when the cluster can’t elect a master, because there are not enough eligible nodes available.
cr> select * from sys.health order by severity desc, table_name;
+--------+----------------+-----------------+----------+------------+--------------+------------------------+
| health | missing_shards | partition_ident | severity | table_name | table_schema | underreplicated_shards |
+--------+----------------+-----------------+----------+------------+--------------+------------------------+
| GREEN | 0 | | 1 | locations | doc | 0 |
| GREEN | 0 | | 1 | quotes | doc | 0 |
+--------+----------------+-----------------+----------+------------+--------------+------------------------+
SELECT 2 rows in set (... sec)
The health with the highest severity will always define the health of the query scope.
Example of getting the cluster health (the health of all tables):
cr> select health from sys.health order by severity desc limit 1;
+--------+
| health |
+--------+
| GREEN |
+--------+
SELECT 1 row in set (... sec)
Health Definition
Health | Description |
---|---|
RED | At least one primary shard is missing (primary shard not started or unassigned). |
YELLOW | At least one shard is underreplicated (replica shard not started or unassigned). |
GREEN | All primary and replica shards have been started. |
Note
The sys.health table is subject to Shard Table Permissions as it will expose a summary of table shard states.
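The health definition and the "highest severity wins" rule can be sketched as follows. This is an illustrative model rather than CrateDB's code: the Shard type and function names are hypothetical, and only the severity value 1 for GREEN appears in the sample output above, so the values used for YELLOW and RED are assumptions.

```python
from dataclasses import dataclass

# GREEN = 1 matches the sample output; YELLOW and RED values are assumed.
SEVERITY = {"GREEN": 1, "YELLOW": 2, "RED": 3}


@dataclass
class Shard:
    primary: bool
    started: bool


def table_health(shards):
    """Derive the health label of one table or partition from its shards."""
    # RED: at least one primary shard is not started or unassigned.
    if any(s.primary and not s.started for s in shards):
        return "RED"
    # YELLOW: at least one replica shard is not started or unassigned.
    if any(not s.primary and not s.started for s in shards):
        return "YELLOW"
    return "GREEN"


def cluster_health(table_healths):
    """The health with the highest severity defines the overall health."""
    return max(table_healths, key=SEVERITY.__getitem__)


locations = [Shard(primary=True, started=True), Shard(primary=False, started=False)]
quotes = [Shard(primary=True, started=True), Shard(primary=False, started=True)]

healths = [table_health(locations), table_health(quotes)]
print(healths, cluster_health(healths))
```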
Repositories
The table sys.repositories lists all configured repositories that can be used to create, manage and restore snapshots (see Snapshots).
Column Name | Description | Return Type |
---|---|---|
name | The repository name. | STRING |
type | The type of the repository determining how and where the repository stores its snapshots. | STRING |
settings | The configuration settings the repository has been created with. The specific settings depend on the repository type, see CREATE REPOSITORY. | OBJECT |
cr> SELECT name, type, settings FROM sys.repositories
... ORDER BY name;
+---------+------+---------------------------------------------------...--+
| name | type | settings |
+---------+------+---------------------------------------------------...--+
| my_repo | fs | {"compress": "true", "location": "repo_location", ...} |
+---------+------+---------------------------------------------------...--+
SELECT 1 row in set (... sec)
Snapshots
The table sys.snapshots lists all existing snapshots in all configured repositories (see Snapshots).
Column Name | Description | Return Type |
---|---|---|
name | The name of the snapshot. | STRING |
repository | The name of the repository that contains this snapshot. | STRING |
concrete_indices | The names of all tables and partitions contained in this snapshot, as they are represented as ES index names. | ARRAY |
started | The point in time when the creation of the snapshot started. Changes made after that are not stored in this snapshot. | TIMESTAMP |
finished | The point in time when the snapshot creation finished. | TIMESTAMP |
state | The current state of the snapshot. One of: IN_PROGRESS, SUCCESS, PARTIAL, or FAILED. | STRING |
version | An internal version this snapshot was created with. | STRING |
Snapshot/Restore operates on a per-shard basis. Hence, the state column indicates whether all (SUCCESS), some (PARTIAL), or no shards (FAILED) have been backed up. PARTIAL snapshots are the result of some primaries becoming unavailable while taking the snapshot when there are no replicas at hand (cluster state is RED). If there are replicas of the (now unavailable) primaries (cluster state is YELLOW) the snapshot succeeds and all shards are included (state SUCCESS). Building on a PARTIAL snapshot will include all primaries again.
Warning
In case of a PARTIAL state another snapshot should be created in order to guarantee a full backup! Only SUCCESS includes all shards.
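Because the state is derived per shard, the mapping from shard results to a final snapshot state can be sketched like this. The function name and its two counters are hypothetical, and the IN_PROGRESS state of a running snapshot is not modelled.

```python
def snapshot_state(shards_total, shards_backed_up):
    """Map per-shard backup results to a final snapshot state."""
    if shards_total <= 0 or not 0 <= shards_backed_up <= shards_total:
        raise ValueError("invalid shard counts")
    if shards_backed_up == shards_total:
        return "SUCCESS"  # all shards were backed up
    if shards_backed_up == 0:
        return "FAILED"  # no shard was backed up
    return "PARTIAL"  # only some shards were backed up


def is_full_backup(state):
    """Only SUCCESS guarantees that every shard is included."""
    return state == "SUCCESS"


for backed_up in (6, 4, 0):
    state = snapshot_state(6, backed_up)
    print(backed_up, state, is_full_backup(state))
```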
The concrete_indices column contains the names of all Elasticsearch indices that were stored in the snapshot. A normal CrateDB table maps to one Elasticsearch index, a partitioned table maps to one Elasticsearch index per partition. The mapping follows this pattern:
CrateDB table / partition name | concrete_indices entry |
---|---|
doc.my_table | my_table |
my_schema.my_table | my_schema.my_table |
doc.parted_table (value=null) | .partitioned.my_table.0400 |
my_schema.parted_table (value=null) | my_schema..partitioned.my_table.0400 |
cr> SELECT "repository", name, state, concrete_indices
... FROM sys.snapshots order by "repository", name;
+------------+-------------+---------+-----------------...-+
| repository | name | state | concrete_indices |
+------------+-------------+---------+-----------------...-+
| my_repo | my_snapshot | SUCCESS | [...] |
+------------+-------------+---------+-----------------...-+
SELECT 1 row in set (... sec)
Summits
The sys.summits table contains information about the mountains in the Alps higher than 2000m. The mountain names from the table are also used to generate random node names.
Users
The sys.users table contains all existing database users in the cluster. The table is only available in the CrateDB Enterprise Edition.
Column Name | Description | Return Type |
---|---|---|
name | The name of the database user. | STRING |
superuser | BOOLEAN flag to indicate whether the user is a superuser. | BOOLEAN |
Allocations
The sys.allocations table contains information about shards and their allocation state. The table contains:
- shards that are unassigned and why they are unassigned
- shards that are assigned but cannot be moved or rebalanced and why they remain on their current node
It can help to identify problems if shard allocations behave differently than expected, e.g. when a shard stays unassigned or a shard does not move off a node.
Column Name | Description | Return Type |
---|---|---|
table_schema | Schema name of the table of the shard. | STRING |
table_name | Table name of the shard. | STRING |
partition_ident | Identifier of the partition of the shard. NULL if the table is not partitioned. | STRING |
shard_id | ID of the affected shard. | INTEGER |
node_id | ID of the node on which the shard resides. NULL if the shard is unassigned. | STRING |
primary | Whether the shard is a primary shard. | BOOLEAN |
current_state | Current state of the shard. Possible states are: UNASSIGNED, INITIALIZING, STARTED, RELOCATING. | STRING |
explanation | Explanation why the shard cannot be allocated, moved or rebalanced. | STRING |
decisions | A list of decisions that describe in detail why the shard is in the current state. | ARRAY |
decisions['node_id'] | ID of the node of the decision. | STRING |
decisions['node_name'] | Name of the node of the decision. | STRING |
decisions['explanations'] | Detailed list of human readable explanations why the node decided whether to allocate or rebalance the shard. Returns NULL if there is no need to rebalance the shard. | ARRAY |
Note
The sys.allocations table is subject to Shard Table Permissions.
Shard Table Permissions
Accessing tables that return shards (sys.shards, sys.allocations) is subject to the same privilege constraints as the other tables. Namely, in order to query them, the connected user needs to have the DQL privilege on that particular table, either directly or inherited from the SCHEMA or CLUSTER (for more information on privilege inheritance, see Hierarchical Inheritance of Privileges).
However, being able to query shard-returning system tables will not allow the user to retrieve all the rows in the table, as they may contain information related to tables for which the connected user does not have any privileges. The only rows that will be returned will be the ones the user is allowed to access.
For example, if the user john has any privilege on the doc.books table but no privilege at all on doc.locations, when john issues a SELECT * FROM sys.shards statement, the shards information related to the doc.locations table will not be returned.
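The row filtering in this example can be modelled with a short sketch. It is only an illustration of the visible effect, not CrateDB's privilege machinery: the visible_shard_rows function and the sample rows are hypothetical, and privileges are simplified to a set of (schema, table) pairs on which the user holds any privilege.

```python
def visible_shard_rows(rows, privileged_tables):
    """Keep only shard rows for tables the user has some privilege on."""
    return [
        row
        for row in rows
        if (row["schema_name"], row["table_name"]) in privileged_tables
    ]


# Hypothetical shard rows, using the sys.shards column names.
all_rows = [
    {"schema_name": "doc", "table_name": "books", "id": 0},
    {"schema_name": "doc", "table_name": "books", "id": 1},
    {"schema_name": "doc", "table_name": "locations", "id": 0},
]

# john has a privilege on doc.books but none on doc.locations.
john_tables = {("doc", "books")}

john_rows = visible_shard_rows(all_rows, john_tables)
print(john_rows)
```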