Cluster Settings
Cluster settings apply to all nodes of a CockroachDB cluster and control, for example, whether or not to share diagnostic details with Cockroach Labs as well as advanced options for debugging and cluster tuning.
They can be updated anytime after a cluster has been started, but only by a member of the admin role, to which the root user belongs by default.
Note:
In contrast to cluster-wide settings, node-level settings apply to a single node. They are defined by flags passed to the cockroach start command when starting a node and cannot be changed without stopping and restarting the node. For more details, see Start a Node.
Settings
Warning:
Many cluster settings are intended for tuning CockroachDB internals. Before changing these settings, we strongly encourage you to discuss your goals with Cockroach Labs; otherwise, you use them at your own risk.
Setting | Type | Default | Description |
---|---|---|---|
changefeed.experimental_poll_interval | duration | 1s | polling interval for the prototype changefeed implementation (WARNING: may compromise cluster stability or correctness; do not edit without supervision) |
changefeed.push.enabled | boolean | true | if set, changes are pushed instead of pulled. This requires the kv.rangefeed.enabled setting. See https://www.cockroachlabs.com/docs/v19.1/change-data-capture.html#enable-rangefeeds-to-reduce-latency |
cloudstorage.gs.default.key | string | | if set, JSON key to use during Google Cloud Storage operations |
cloudstorage.http.custom_ca | string | | custom root CA (appended to system's default CAs) for verifying certificates when interacting with HTTPS storage |
cloudstorage.timeout | duration | 10m0s | the timeout for import/export storage operations |
cluster.organization | string | | organization name |
cluster.preserve_downgrade_option | string | | disable (automatic or manual) cluster version upgrade from the specified version until reset |
compactor.enabled | boolean | true | when false, the system will reclaim space occupied by deleted data less aggressively |
compactor.max_record_age | duration | 24h0m0s | discard suggestions not processed within this duration (WARNING: may compromise cluster stability or correctness; do not edit without supervision) |
compactor.min_interval | duration | 15s | minimum time interval to wait before compacting (WARNING: may compromise cluster stability or correctness; do not edit without supervision) |
compactor.threshold_available_fraction | float | 0.1 | consider suggestions for at least the given percentage of the available logical space (zero to disable) (WARNING: may compromise cluster stability or correctness; do not edit without supervision) |
compactor.threshold_bytes | byte size | 256 MiB | minimum expected logical space reclamation required before considering an aggregated suggestion (WARNING: may compromise cluster stability or correctness; do not edit without supervision) |
compactor.threshold_used_fraction | float | 0.1 | consider suggestions for at least the given percentage of the used logical space (zero to disable) (WARNING: may compromise cluster stability or correctness; do not edit without supervision) |
debug.panic_on_failed_assertions | boolean | false | panic when an assertion fails rather than reporting |
diagnostics.forced_stat_reset.interval | duration | 2h0m0s | interval after which pending diagnostics statistics should be discarded even if not reported |
diagnostics.reporting.enabled | boolean | true | enable reporting diagnostic metrics to Cockroach Labs |
diagnostics.reporting.interval | duration | 1h0m0s | interval at which diagnostics data should be reported (should be shorter than diagnostics.forced_stat_reset.interval) |
diagnostics.reporting.send_crash_reports | boolean | true | send crash and panic reports |
external.graphite.endpoint | string | | if nonempty, push server metrics to the Graphite or Carbon server at the specified host:port |
external.graphite.interval | duration | 10s | the interval at which metrics are pushed to Graphite (if enabled) |
jobs.registry.leniency | duration | 1m0s | the amount of time to defer any attempts to reschedule a job |
jobs.retention_time | duration | 336h0m0s | the amount of time to retain records for completed jobs before |
kv.allocator.lease_rebalancing_aggressiveness | float | 1 | set greater than 1.0 to rebalance leases toward load more aggressively, or between 0 and 1.0 to be more conservative about rebalancing leases |
kv.allocator.load_based_lease_rebalancing.enabled | boolean | true | set to enable rebalancing of range leases based on load and latency |
kv.allocator.load_based_rebalancing | enumeration | 2 | whether to rebalance based on the distribution of QPS across stores [off = 0, leases = 1, leases and replicas = 2] |
kv.allocator.qps_rebalance_threshold | float | 0.25 | minimum fraction away from the mean a store's QPS (queries per second) can be before it is considered overfull or underfull |
kv.allocator.range_rebalance_threshold | float | 0.05 | minimum fraction away from the mean a store's range count can be before it is considered overfull or underfull |
kv.bulk_io_write.concurrent_addsstable_requests | integer | 1 | number of AddSSTable requests a store will handle concurrently before queuing |
kv.bulk_io_write.concurrent_export_requests | integer | 3 | number of export requests a store will handle concurrently before queuing |
kv.bulk_io_write.concurrent_import_requests | integer | 1 | number of import requests a store will handle concurrently before queuing |
kv.bulk_io_write.max_rate | byte size | 8.0 EiB | the rate limit (bytes/sec) to use for writes to disk on behalf of bulk io ops |
kv.bulk_sst.sync_size | byte size | 2.0 MiB | threshold after which non-Rocks SST writes must fsync (0 disables) |
kv.closed_timestamp.close_fraction | float | 0.2 | fraction of closed timestamp target duration specifying how frequently the closed timestamp is advanced |
kv.closed_timestamp.follower_reads_enabled | boolean | true | allow (all) replicas to serve consistent historical reads based on closed timestamp information |
kv.closed_timestamp.target_duration | duration | 30s | if nonzero, attempt to provide closed timestamp notifications for timestamps trailing cluster time by approximately this duration |
kv.follower_read.target_multiple | float | 3 | if above 1, encourages the distsender to perform a read against the closest replica if a request is older than kv.closed_timestamp.target_duration * (1 + kv.closed_timestamp.close_fraction * this) less a clock uncertainty interval. This value also is used to create follower_timestamp(). (WARNING: may compromise cluster stability or correctness; do not edit without supervision) |
kv.import.batch_size | byte size | 32 MiB | the maximum size of the payload in an AddSSTable request (WARNING: may compromise cluster stability or correctness; do not edit without supervision) |
kv.raft.command.max_size | byte size | 64 MiB | maximum size of a raft command |
kv.raft_log.disable_synchronization_unsafe | boolean | false | set to true to disable synchronization on Raft log writes to persistent storage. Setting to true risks data loss or data corruption on server crashes. The setting is meant for internal testing only and SHOULD NOT be used in production. |
kv.range.backpressure_range_size_multiplier | float | 2 | multiple of range_max_bytes that a range is allowed to grow to without splitting before writes to that range are blocked, or 0 to disable |
kv.range_descriptor_cache.size | integer | 1000000 | maximum number of entries in the range descriptor and leaseholder caches |
kv.range_merge.queue_enabled | boolean | true | whether the automatic merge queue is enabled |
kv.range_merge.queue_interval | duration | 1s | how long the merge queue waits between processing replicas (WARNING: may compromise cluster stability or correctness; do not edit without supervision) |
kv.range_split.by_load_enabled | boolean | true | allow automatic splits of ranges based on where load is concentrated |
kv.range_split.load_qps_threshold | integer | 250 | the QPS over which the range becomes a candidate for load-based splitting |
kv.rangefeed.concurrent_catchup_iterators | integer | 64 | number of rangefeeds catchup iterators a store will allow concurrently before queueing |
kv.rangefeed.enabled | boolean | false | if set, rangefeed registration is enabled |
kv.snapshot_rebalance.max_rate | byte size | 8.0 MiB | the rate limit (bytes/sec) to use for rebalance and upreplication snapshots |
kv.snapshot_recovery.max_rate | byte size | 8.0 MiB | the rate limit (bytes/sec) to use for recovery snapshots |
kv.transaction.max_intents_bytes | integer | 262144 | maximum number of bytes used to track write intents in transactions |
kv.transaction.max_refresh_spans_bytes | integer | 256000 | maximum number of bytes used to track refresh spans in serializable transactions |
kv.transaction.write_pipelining_enabled | boolean | true | if enabled, transactional writes are pipelined through Raft consensus |
kv.transaction.write_pipelining_max_batch_size | integer | 128 | if non-zero, defines the maximum size of a batch that will be pipelined through Raft consensus |
kv.transaction.write_pipelining_max_outstanding_size | byte size | 256 KiB | maximum number of bytes used to track in-flight pipelined writes before disabling pipelining |
rocksdb.min_wal_sync_interval | duration | 0s | minimum duration between syncs of the RocksDB WAL |
schemachanger.backfiller.buffer_size | byte size | 196 MiB | amount to buffer in memory during backfills |
schemachanger.backfiller.max_sst_size | byte size | 16 MiB | target size for ingested files during backfills |
schemachanger.bulk_index_backfill.batch_size | integer | 5000 | number of rows to process at a time during bulk index backfill |
schemachanger.bulk_index_backfill.enabled | boolean | true | backfill indexes in bulk via addsstable |
schemachanger.lease.duration | duration | 5m0s | the duration of a schema change lease |
schemachanger.lease.renew_fraction | float | 0.5 | the fraction of schemachanger.lease.duration remaining to trigger a renew of the lease |
server.clock.forward_jump_check_enabled | boolean | false | if enabled, forward clock jumps > max_offset/2 will cause a panic |
server.clock.persist_upper_bound_interval | duration | 0s | the interval between persisting the wall time upper bound of the clock. The clock does not generate a wall time greater than the persisted timestamp and will panic if it sees a wall time greater than this value. When cockroach starts, it waits for the wall time to catch-up till this persisted timestamp. This guarantees monotonic wall time across server restarts. Not setting this or setting a value of 0 disables this feature. |
server.consistency_check.interval | duration | 24h0m0s | the time between range consistency checks; set to 0 to disable consistency checking |
server.declined_reservation_timeout | duration | 1s | the amount of time to consider the store throttled for up-replication after a reservation was declined |
server.eventlog.ttl | duration | 2160h0m0s | if nonzero, event log entries older than this duration are deleted every 10m0s. Should not be lowered below 24 hours. |
server.failed_reservation_timeout | duration | 5s | the amount of time to consider the store throttled for up-replication after a failed reservation call |
server.goroutine_dump.num_goroutines_threshold | integer | 1000 | a threshold beyond which if number of goroutines increases, then goroutine dump can be triggered |
server.goroutine_dump.total_dump_size_limit | byte size | 500 MiB | total size of goroutine dumps to be kept. Dumps are GC'ed in the order of creation time. The latest dump is always kept even if its size exceeds the limit. |
server.heap_profile.max_profiles | integer | 5 | maximum number of profiles to be kept. Profiles with lower score are GC'ed, but latest profile is always kept. |
server.heap_profile.system_memory_threshold_fraction | float | 0.85 | fraction of system memory beyond which if Rss increases, then heap profile is triggered |
server.host_based_authentication.configuration | string | | host-based authentication configuration to use during connection authentication |
server.rangelog.ttl | duration | 720h0m0s | if nonzero, range log entries older than this duration are deleted every 10m0s. Should not be lowered below 24 hours. |
server.remote_debugging.mode | string | local | set to enable remote debugging, localhost-only or disable (any, local, off) |
server.shutdown.drain_wait | duration | 0s | the amount of time a server waits in an unready state before proceeding with the rest of the shutdown process |
server.shutdown.query_wait | duration | 10s | the server will wait for at least this amount of time for active queries to finish |
server.time_until_store_dead | duration | 5m0s | the time after which if there is no new gossiped information about a store, it is considered dead |
server.web_session_timeout | duration | 168h0m0s | the duration that a newly created web session will be valid |
sql.defaults.default_int_size | integer | 8 | the size, in bytes, of an INT type |
sql.defaults.distsql | enumeration | 1 | default distributed SQL execution mode [off = 0, auto = 1, on = 2] |
sql.defaults.experimental_vectorize | enumeration | 0 | default experimental_vectorize mode [off = 0, on = 1, always = 2] |
sql.defaults.optimizer | enumeration | 1 | default cost-based optimizer mode [off = 0, on = 1, local = 2] |
sql.defaults.reorder_joins_limit | integer | 4 | default number of joins to reorder |
sql.defaults.results_buffer.size | byte size | 16 KiB | default size of the buffer that accumulates results for a statement or a batch of statements before they are sent to the client. This can be overridden on an individual connection with the 'results_buffer_size' parameter. Note that auto-retries generally only happen while no results have been delivered to the client, so reducing this size can increase the number of retriable errors a client receives. On the other hand, increasing the buffer size can increase the delay until the client receives the first result row. Updating the setting only affects new connections. Setting to 0 disables any buffering. |
sql.defaults.serial_normalization | enumeration | 0 | default handling of SERIAL in table definitions [rowid = 0, virtual_sequence = 1, sql_sequence = 2] |
sql.distsql.distribute_index_joins | boolean | true | if set, for index joins we instantiate a join reader on every node that has a stream; if not set, we use a single join reader |
sql.distsql.flow_stream_timeout | duration | 10s | amount of time incoming streams wait for a flow to be set up before erroring out |
sql.distsql.interleaved_joins.enabled | boolean | true | if set we plan interleaved table joins instead of merge joins when possible |
sql.distsql.max_running_flows | integer | 500 | maximum number of concurrent flows that can be run on a node |
sql.distsql.merge_joins.enabled | boolean | true | if set, we plan merge joins when possible |
sql.distsql.temp_storage.joins | boolean | true | set to true to enable use of disk for distributed sql joins |
sql.distsql.temp_storage.sorts | boolean | true | set to true to enable use of disk for distributed sql sorts |
sql.distsql.temp_storage.workmem | byte size | 64 MiB | maximum amount of memory in bytes a processor can use before falling back to temp storage |
sql.metrics.statement_details.dump_to_logs | boolean | false | dump collected statement statistics to node logs when periodically cleared |
sql.metrics.statement_details.enabled | boolean | true | collect per-statement query statistics |
sql.metrics.statement_details.plan_collection.enabled | boolean | true | periodically save a logical plan for each fingerprint |
sql.metrics.statement_details.plan_collection.period | duration | 5m0s | the time until a new logical plan is collected |
sql.metrics.statement_details.threshold | duration | 0s | minimum execution time to cause statistics to be collected |
sql.parallel_scans.enabled | boolean | true | parallelizes scanning different ranges when the maximum result size can be deduced |
sql.query_cache.enabled | boolean | true | enable the query cache |
sql.stats.automatic_collection.enabled | boolean | true | automatic statistics collection mode |
sql.stats.automatic_collection.fraction_stale_rows | float | 0.2 | target fraction of stale rows per table that will trigger a statistics refresh |
sql.stats.automatic_collection.max_fraction_idle | float | 0.9 | maximum fraction of time that automatic statistics sampler processors are idle |
sql.stats.automatic_collection.min_stale_rows | integer | 500 | target minimum number of stale rows per table that will trigger a statistics refresh |
sql.stats.post_events.enabled | boolean | false | if set, an event is shown for every CREATE STATISTICS job |
sql.tablecache.lease.refresh_limit | integer | 50 | maximum number of tables to periodically refresh leases for |
sql.trace.log_statement_execute | boolean | false | set to true to enable logging of executed statements |
sql.trace.session_eventlog.enabled | boolean | false | set to true to enable session tracing |
sql.trace.txn.enable_threshold | duration | 0s | duration beyond which all transactions are traced (set to 0 to disable) |
timeseries.storage.enabled | boolean | true | if set, periodic timeseries data is stored within the cluster; disabling is not recommended unless you are storing the data elsewhere |
timeseries.storage.resolution_10s.ttl | duration | 240h0m0s | the maximum age of time series data stored at the 10 second resolution. Data older than this is subject to rollup and deletion. |
timeseries.storage.resolution_30m.ttl | duration | 2160h0m0s | the maximum age of time series data stored at the 30 minute resolution. Data older than this is subject to deletion. |
trace.debug.enable | boolean | false | if set, traces for recent requests can be seen in the /debug page |
trace.lightstep.token | string | | if set, traces go to Lightstep using this token |
trace.zipkin.collector | string | | if set, traces go to the given Zipkin instance (example: '127.0.0.1:9411'); ignored if trace.lightstep.token is set |
version | custom validation | 19.1 | set the active cluster version in the format ' |
View current cluster settings
Use the SHOW CLUSTER SETTING statement.
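As a minimal sketch, the statements below could be run from a SQL shell connected to the cluster by a member of the admin role. The setting name is taken from the table above; the output format is not shown because it may vary by version.

```sql
-- Show the current value of a single setting from the table above.
SHOW CLUSTER SETTING diagnostics.reporting.enabled;

-- List every cluster setting along with its current value.
SHOW ALL CLUSTER SETTINGS;
```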
Change a cluster setting
Use the SET CLUSTER SETTING statement.
Before changing a cluster setting, please note the following:
Changing a cluster setting is not instantaneous, as the change must be propagated to other nodes in the cluster.
It's not recommended to change cluster settings while upgrading to a new version of CockroachDB; wait until all nodes have been upgraded and then make the change.
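A short sketch of changing a setting and then confirming the new value. The setting names and types come from the table above; the chosen values are illustrative assumptions only, not recommendations. Because the change has to propagate to the other nodes, a value read back immediately on a different node may briefly lag.

```sql
-- Boolean setting: opt out of sharing diagnostic details.
SET CLUSTER SETTING diagnostics.reporting.enabled = false;

-- Duration setting: values are given as quoted strings such as '10s'.
-- (Illustrative value only.)
SET CLUSTER SETTING server.shutdown.drain_wait = '10s';

-- Confirm the values now in effect on the node you are connected to.
SHOW CLUSTER SETTING diagnostics.reporting.enabled;
SHOW CLUSTER SETTING server.shutdown.drain_wait;
```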