- Common Errors
- connection refused
- node is running secure mode, SSL connection required
- restart transaction
- node belongs to cluster <cluster ID> but is attempting to connect to a gossip network for cluster <another cluster ID>
- open file descriptor limit of <number> is under the minimum required <number>
- replicas failing with "0 of 1 store with an attribute matching []; likely not enough nodes in cluster"
- clock synchronization error: this node is more than 500ms away from at least half of the known nodes
- context deadline exceeded
- result is ambiguous
- invalid value for parameter "TimeZone"
- Something else?
Common Errors
This page helps you understand and resolve error messages written to stderr
or your logs.
Topic | Message |
---|---|
Client connection | connection refused |
Client connection | node is running secure mode, SSL connection required |
Transaction retries | restart transaction |
Node startup | node belongs to cluster <cluster ID> but is attempting to connect to a gossip network for cluster <another cluster ID> |
Node configuration | clock synchronization error: this node is more than 500ms away from at least half of the known nodes |
Node configuration | open file descriptor limit of <number> is under the minimum required <number> |
Replication | replicas failing with "0 of 1 store with an attribute matching []; likely not enough nodes in cluster" |
Deadline exceeded | context deadline exceeded |
Ambiguous results | result is ambiguous |
Time zone data | invalid value for parameter "TimeZone" |
connection refused
This message indicates a client is trying to connect to a node that is either not running or is not listening on the specified interfaces (i.e., hostname or port).
To resolve this issue, do one of the following:
- If the node hasn't yet been started, start the node.
- If you specified a `--listen-addr` and/or a `--advertise-addr` flag when starting the node, you must include the specified IP address/hostname and port with all other `cockroach` commands or change the `COCKROACH_HOST` environment variable.
If you're not sure what the IP address/hostname and port values might have been, you can look in the node's logs. If necessary, you can also kill the `cockroach` process, and then restart the node:
$ pkill cockroach
$ cockroach start [flags]
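For example, if the node was started with a custom address, later client commands need the same address (the address below is a placeholder):

```shell
# Pass the address and port explicitly with each client command:
cockroach sql --host=<address> --port=26257

# Or set it once for the shell session:
export COCKROACH_HOST=<address>
```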
node is running secure mode, SSL connection required
This message indicates that the cluster is using TLS encryption to protect network communication, and the client is trying to open a connection without using the required TLS certificates.
To resolve this issue, use the `cockroach cert create-client` command to generate a client certificate and key for the user trying to connect. For a secure deployment walkthrough, including generating security certificates and connecting clients, see Manual Deployment.
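For example, generating and using a client certificate might look like the following (the username `maxroach` and the directory paths are placeholders):

```shell
# Generate a client certificate and key for the user:
cockroach cert create-client maxroach --certs-dir=certs --ca-key=my-safe-directory/ca.key

# Connect as that user over TLS:
cockroach sql --certs-dir=certs --user=maxroach
```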
restart transaction
Messages with the error code `40001` and the string `restart transaction` indicate that a transaction failed because it conflicted with another concurrent or recent transaction accessing the same data. The transaction needs to be retried by the client. See client-side transaction retries for more details.
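As a sketch, the client-side retry protocol uses CockroachDB's special `cockroach_restart` savepoint: on a `40001` error, the client rolls back to the savepoint and re-runs the statements (the `accounts` table here is a hypothetical example):

```shell
cockroach sql --execute="
BEGIN;
SAVEPOINT cockroach_restart;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
RELEASE SAVEPOINT cockroach_restart;
COMMIT;"
# On a 40001 error, issue: ROLLBACK TO SAVEPOINT cockroach_restart;
# then re-run the statements, RELEASE the savepoint, and COMMIT again.
```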
Several different types of transaction retry errors are described below:
Note:
Your application's retry logic does not need to distinguish between these types of errors. They are listed here for reference.
Tip:
To understand how transactions work in CockroachDB, and why transaction retries are necessary to maintain serializable isolation in a distributed database, see:
read within uncertainty interval
Uncertainty errors can occur when two transactions which start on different gateway nodes attempt to operate on the same data at close to the same time. The uncertainty comes from the fact that we can't tell which one started first - the clocks on the two gateway nodes may not be perfectly in sync.
For example, if the clock on node A is ahead of the clock on node B, a transaction started on node A may be able to commit a write with a timestamp that is still in the "future" from the perspective of node B. A later transaction that starts on node B should be able to see the earlier write from node A, even if B's clock has not caught up to A. The "read within uncertainty interval" occurs if we discover this situation in the middle of a transaction, when it is too late for the database to handle it automatically. When node B's transaction retries, it will unambiguously occur after the transaction from node A.
Note that as long as the client-side retry protocol is followed, a transaction that has restarted once is much less likely to hit another uncertainty error, and the `--max-offset` option provides an upper limit on how long a transaction can continue to restart due to uncertainty.
When errors like this occur, the application has the following options:
- Prefer consistent historical reads using AS OF SYSTEM TIME to reduce contention.
- Design the schema and queries to reduce contention. For information on how to avoid contention, see Understanding and Avoiding Transaction Contention.
- Be prepared to retry on uncertainty (and other) errors. For more information, see Transaction retries.
Note:
Uncertainty errors are a form of transaction conflict. For more information about transaction conflicts, see Transaction conflicts.
transaction deadline exceeded
New in v19.1: Errors which were previously reported to the client as opaque `TransactionStatusError`s are now transaction retry errors with the error message "transaction deadline exceeded" and error code `40001`.
This error can occur for long-running transactions (with execution time on the order of minutes) that also experience conflicts with other transactions and thus attempt to commit at a timestamp different than their original timestamp. If the timestamp at which the transaction attempts to commit is above a "deadline" imposed by the various schema elements that the transaction has used (i.e. table structures), then this error might get returned to the client.
When this error occurs, the application must retry the transaction. For more information about how to retry transactions, see Transaction retries.
Note:
For more information about the mechanics of the transaction conflict resolution process described above, see Life of a Distributed Transaction.
node belongs to cluster <cluster ID> but is attempting to connect to a gossip network for cluster <another cluster ID>
This message usually indicates that a node tried to connect to a cluster, but the node is already a member of a different cluster. This is determined by metadata in the node's data directory. To resolve this issue, do one of the following:
- Choose a different directory to store the CockroachDB data:
$ cockroach start [flags] --store=[new directory] --join=[cluster host]:26257
- Remove the existing directory and start a node joining the cluster again:
$ rm -r cockroach-data/
$ cockroach start [flags] --join=[cluster host]:26257
This message can also occur in the following scenario:
- The first node of a cluster is started without the `--join` flag.
- Subsequent nodes are started with the `--join` flag pointing to the first node.
- The first node is stopped and restarted after the node's data directory is deleted or using a new directory. This causes the first node to initialize a new cluster.
The other nodes, still communicating with the first node, notice that their cluster ID and the first node's cluster ID do not match.
To avoid this scenario, update your scripts to use the new, recommended approach to initializing a cluster:
- Start each initial node of the cluster with the `--join` flag set to addresses of 3 to 5 of the initial nodes.
- Run the `cockroach init` command against any node to perform a one-time cluster initialization.
- When adding more nodes, start them with the same `--join` flag as used for the initial nodes.
For more guidance, see this example.
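For example, with three initial nodes, the recommended startup sequence looks like this (the hostnames are placeholders):

```shell
# On each of the three initial nodes:
cockroach start --join=node1:26257,node2:26257,node3:26257 [other flags]

# Then, once, against any node:
cockroach init --host=node1:26257
```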
open file descriptor limit of <number> is under the minimum required <number>
CockroachDB can use a large number of open file descriptors, often more than is available by default. This message indicates that the machine on which a CockroachDB node is running is under CockroachDB's recommended limits.
For more details on CockroachDB's file descriptor limits and instructions on increasing the limit on various platforms, see File Descriptors Limit.
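You can check the limit currently in effect for your shell before starting the node (making a higher limit permanent is platform-specific; see the linked page):

```shell
# Show the current soft limit on open file descriptors:
ulimit -n

# Raise it for this session before starting the node (example value;
# requires the hard limit to be at least this high):
# ulimit -n 65536
```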
replicas failing with "0 of 1 store with an attribute matching []; likely not enough nodes in cluster"
When running a single-node cluster
When running a single-node CockroachDB cluster, an error about replicas failing will eventually show up in the node's log files, for example:
E160407 09:53:50.337328 storage/queue.go:511 [replicate] 7 replicas failing with "0 of 1 store with an attribute matching []; likely not enough nodes in cluster"
This happens because CockroachDB expects three nodes by default. If you do not intend to add additional nodes, you can stop this error by using `ALTER RANGE … CONFIGURE ZONE` to update your default zone configuration to expect only one node:
# Insecure cluster:
$ cockroach sql --execute="ALTER RANGE default CONFIGURE ZONE USING num_replicas=1;" --insecure
# Secure cluster:
$ cockroach sql --execute="ALTER RANGE default CONFIGURE ZONE USING num_replicas=1;" --certs-dir=[path to certs directory]
The zone's replica count is reduced to 1. For more information, see `ALTER RANGE … CONFIGURE ZONE` and Configure Replication Zones.
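To confirm the change took effect, you can inspect the zone configuration (shown here for an insecure cluster):

```shell
cockroach sql --execute="SHOW ZONE CONFIGURATION FOR RANGE default;" --insecure
```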
When running a multi-node cluster
When running a multi-node CockroachDB cluster, if you see an error like the one above about replicas failing, some nodes might not be able to talk to each other. For recommended actions, see Cluster Setup Troubleshooting.
clock synchronization error: this node is more than 500ms away from at least half of the known nodes
This error indicates that a node has spontaneously shut down because it detected that its clock is out of sync with at least half of the other nodes in the cluster by 80% of the maximum offset allowed (500ms by default). CockroachDB requires moderate levels of clock synchronization to preserve data consistency, so a node that drifts this far shuts itself down to avoid the risk of consistency anomalies.
To prevent this from happening, you should run clock synchronization software on each node. For guidance on synchronizing clocks, see the tutorial for your deployment environment:
Environment | Recommended Approach |
---|---|
Manual | Use NTP with Google's external NTP service. |
AWS | Use the Amazon Time Sync Service. |
Azure | Disable Hyper-V time synchronization and use NTP with Google's external NTP service. |
Digital Ocean | Use NTP with Google's external NTP service. |
GCE | Use NTP with Google's internal NTP service. |
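To check whether a node's clock is actually being synchronized, you can query the host's time service (commands vary by platform; these are common Linux examples):

```shell
# systemd-based systems:
timedatectl status

# chrony:
chronyc tracking

# ntpd:
ntpq -p
```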
context deadline exceeded
This message occurs when a component of CockroachDB gives up because it was relying on another component that has not behaved as expected, for example, another node dropped a network connection. To investigate further, look in the node's logs for the primary failure that is the root cause.
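For example, assuming the default store directory, you can scan the node's logs for the underlying error:

```shell
# The default log directory lives under the store directory:
grep -i "error" cockroach-data/logs/cockroach.log | tail -n 20
```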
result is ambiguous
In a distributed system, some errors can have ambiguous results. For example, if you receive a `connection closed` error while processing a `COMMIT` statement, you cannot tell whether the transaction successfully committed or not. These errors are possible in any database, but CockroachDB is somewhat more likely to produce them than other databases because ambiguous results can be caused by failures between the nodes of a cluster. These errors are reported with the PostgreSQL error code `40003` (`statement_completion_unknown`) and the message `result is ambiguous`.
Ambiguous errors can be caused by nodes crashing, network failures, or timeouts. If you experience a lot of these errors when things are otherwise stable, look for performance issues. Note that ambiguity is only possible for the last statement of a transaction (`COMMIT` or `RELEASE SAVEPOINT`) or for statements outside a transaction. If a connection drops during a transaction that has not yet tried to commit, the transaction will definitely be aborted.
In general, you should handle ambiguous errors the same way as `connection closed` errors. If your transaction is idempotent, it is safe to retry it on ambiguous errors. `UPSERT` operations are typically idempotent, and other transactions can be written to be idempotent by verifying the expected state before performing any writes. Increment operations such as `UPDATE my_table SET x=x+1 WHERE id=$1` are typical examples of operations that cannot easily be made idempotent. If your transaction is not idempotent, then you should decide whether to retry or not based on whether it would be better for your application to apply the transaction twice or return an error to the user.
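To make the distinction concrete, using the hypothetical `my_table` from above:

```shell
# Idempotent: safe to retry after an ambiguous COMMIT error.
cockroach sql --execute="UPSERT INTO my_table (id, x) VALUES (1, 42);"

# Not idempotent: a blind retry may apply the increment twice.
cockroach sql --execute="UPDATE my_table SET x = x + 1 WHERE id = 1;"
```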
invalid value for parameter "TimeZone"
This error indicates that the machine running the CockroachDB node is missing the `tzdata` library (sometimes called `tz` or `zoneinfo`), which is required by certain features of CockroachDB that use time zone data, for example, to support using location-based names as time zone identifiers.
To resolve this issue, install the `tzdata` library and keep it up-to-date. It's important for all nodes to have the same version, so when updating the library, do so as quickly as possible across all nodes.
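For example, on common Linux distributions (package names may vary):

```shell
# Debian/Ubuntu:
apt-get install tzdata

# RHEL/CentOS:
yum install tzdata

# Verify the time zone data is present:
ls /usr/share/zoneinfo | head
```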
For details about other libraries the CockroachDB binary for Linux depends on, see Dependencies.
Something else?
If we do not have a solution here, you can try using our other support resources, including:
- Other troubleshooting pages
- StackOverflow
- CockroachDB Community Forum
- Chatting with our developers on Gitter (To open Gitter without leaving these docs, click Help in the lower-right corner of any page.)