Handling errors

When you use the YDB SDK, your application must be prepared to handle errors.

Errors can be divided into three categories:

  • Temporary failures (retryable, hereinafter R): These include a short-term loss of network connectivity, temporary unavailability or overload of a YDB subsystem, and a failure of YDB to respond to a query within the set timeout. If one of these errors occurs, retrying the failed query after a short delay is likely to succeed.

  • Errors that can’t be fixed with a retry (non-retryable, hereinafter N): These include incorrectly written queries, YDB internal errors, and queries that don’t match the data schema. Retrying such a query is pointless; the developer has to intervene.

  • Errors that may be fixed with a retry after the client app responds (conditionally retryable, hereinafter C): These include no response within the set timeout and a request for authentication.

The YDB SDK provides a built-in mechanism for handling temporary failures. By default, the SDK uses the recommended retry policy, which you can adjust to meet the requirements of your client app. YDB returns termination codes that let you determine whether a retry is appropriate and what interval to choose.

Retry an operation only if the error indicates a temporary failure. Don’t retry invalid operations, such as inserting a row whose primary key value already exists in the table or inserting data that doesn’t match the table schema.

It’s extremely important to tune the number of retries and the interval between them. Too many retries, or too short an interval between them, create excessive load. Too few retries may prevent the operation from completing.
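To make this trade-off concrete, here is a minimal sketch of a bounded exponential-backoff policy that computes the worst-case time spent waiting. The class name `RetryPolicy` and its parameters are hypothetical and are not the YDB SDK’s retry settings; the values in the example are illustrative, not recommended defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical policy: a finite number of retries with capped exponential backoff."""

    max_retries: int   # must be finite; endless retries overload the cluster
    base_delay: float  # seconds to wait before the first retry
    max_delay: float   # upper bound on any single delay, in seconds

    def __post_init__(self) -> None:
        if self.max_retries < 1:
            raise ValueError("max_retries must be at least 1")
        if self.base_delay <= 0 or self.max_delay < self.base_delay:
            raise ValueError("delays must be positive and max_delay >= base_delay")

    def delay(self, retry: int) -> float:
        """Delay before retry number `retry` (0-based): doubles each time, then capped."""
        return min(self.base_delay * 2 ** retry, self.max_delay)

    def worst_case_wait(self) -> float:
        """Total time spent sleeping if every retry is used."""
        return sum(self.delay(i) for i in range(self.max_retries))
```

For example, `RetryPolicy(max_retries=5, base_delay=0.1, max_delay=1.0)` sleeps at most 0.1 + 0.2 + 0.4 + 0.8 + 1.0 = 2.5 s. Raising `max_retries` to 10 adds five more capped 1 s delays (7.5 s in total), which shows how quickly extra retries stretch an operation once the cap is reached.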

The following strategies are commonly used to select an interval:

  • Exponential backoff. For each subsequent attempt, the interval increases exponentially.
  • Intervals in increments. For each subsequent attempt, the interval increases by a fixed increment.
  • Constant intervals. Retries are made at the same intervals.
  • Instant retry. Retries are made immediately.
  • Random selection. Retries are made after a randomly selected time interval.
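Each strategy in the list above can be expressed as a function from the 0-based attempt number to a delay in seconds. This is a minimal sketch; the function names and default values are hypothetical and chosen only for illustration.

```python
import random


def exponential(attempt: int, base: float = 0.05) -> float:
    """Exponential backoff: base, 2*base, 4*base, ..."""
    return base * 2 ** attempt


def incremental(attempt: int, base: float = 0.05, step: float = 0.05) -> float:
    """Intervals in increments: the delay grows by a fixed step each attempt."""
    return base + step * attempt


def constant(attempt: int, interval: float = 0.1) -> float:
    """Constant intervals: the same delay for every attempt."""
    return interval


def instant(attempt: int) -> float:
    """Instant retry: no delay at all."""
    return 0.0


def randomized(attempt: int, low: float = 0.0, high: float = 0.2) -> float:
    """Random selection: a uniformly random delay in [low, high]."""
    return random.uniform(low, high)
```

The random strategy is often combined with exponential backoff (so-called jitter) so that many clients that failed at the same moment don’t all retry in lockstep.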

When you select an interval and the number of retries, consider the YDB termination statuses.

Don’t retry endlessly: unbounded retries can cause excessive load.

Make no more than one instant retry.
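The rules above can be combined in a single retry loop: a bounded number of attempts, at most one instant retry, and exponential backoff with random jitter afterwards. This is a minimal sketch, assuming the operation raises a hypothetical `RetryableError` for statuses of type R; every other exception is treated as non-retryable. The default values are illustrative, and in real code the SDK’s built-in retry mechanism described above should normally be preferred.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error raised for statuses of type R (temporary failures)."""


def retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying RetryableError at most max_attempts - 1 times.

    The first retry is instant; later retries sleep for a random time up to an
    exponentially growing bound ("full jitter"), capped at `max_delay`.
    Any other exception propagates immediately as non-retryable.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last error
            if attempt == 0:
                continue  # a single instant retry, no delay
            bound = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(random.uniform(0, bound))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter exists so the delay function can be replaced in tests; production code can rely on the default `time.sleep`.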

Logging errors

When using the SDK, we recommend logging all errors and exceptions:

  • Log the number of retries made. A growing number of retries during normal operation often indicates an underlying issue.
  • Log all errors, including their types, termination codes, and causes.
  • Log the total operation execution time, including operations that terminate after retries.
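A minimal sketch of how these recommendations might look with Python’s standard `logging` module. It assumes a hypothetical `StatusError` exception that carries the termination status name and whether it is retryable; the logger name and default delays are also illustrative. The default delays make one instant retry followed by longer waits, in line with the retry rules above.

```python
import logging
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
log = logging.getLogger("ydb_app")  # hypothetical logger name


class StatusError(Exception):
    """Hypothetical error carrying a YDB termination status and its type."""

    def __init__(self, status: str, retryable: bool, cause: str = "") -> None:
        super().__init__(cause or status)
        self.status = status
        self.retryable = retryable


def run_logged(operation: Callable[[], T], name: str,
               delays: Sequence[float] = (0.0, 0.1, 0.4)) -> T:
    """Run `operation`, retrying retryable StatusErrors after each delay in
    `delays`, and log every error, the number of retries, and the total time."""
    start = time.monotonic()
    retries = 0
    while True:
        try:
            result = operation()
        except StatusError as err:
            # Error type, termination status, cause, and retries so far.
            log.warning("%s: %s status=%s cause=%s retries=%d",
                        name, type(err).__name__, err.status, err, retries)
            if not err.retryable or retries >= len(delays):
                log.error("%s: failed after %d retries in %.3f s",
                          name, retries, time.monotonic() - start)
                raise
            time.sleep(delays[retries])
            retries += 1
        except Exception:
            # Errors without a status are logged with a traceback and not retried.
            log.exception("%s: unexpected error after %d retries", name, retries)
            raise
        else:
            log.info("%s: succeeded after %d retries in %.3f s",
                     name, retries, time.monotonic() - start)
            return result
```

To see the INFO messages as well as warnings, configure logging first, for example with `logging.basicConfig(level=logging.INFO)`.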

Termination statuses

Below are the termination statuses that the SDK can return.

Error types:

  • R (retryable): Temporary failures
  • N (non-retryable): Errors that can’t be fixed by a retry
  • C (conditionally retryable): Errors that may be fixed by a retry once the client app has responded

Status | Description | Response | Type
--- | --- | --- | ---
SUCCESS | The query was processed successfully. | Continue execution. | n/a
BAD_REQUEST | Error in query syntax, or required fields are missing. | Check the query. | N
INTERNAL_ERROR | Unknown internal error. | Contact the developers. | N
ABORTED | The operation was aborted (for example, due to lock invalidation, shown as KIKIMR_LOCKS_INVALIDATED in detailed error messages). | Retry the entire transaction. | R
UNAUTHENTICATED | Authentication is required. | Check the token in use. If the token is valid, retry the query. | N
UNAUTHORIZED | Access to the requested object (table, directory) is denied. | Request access from the DB administrator. | N
UNAVAILABLE | Part of the system is not available. | Retry the last action (query). | R
UNDETERMINED | Unknown transaction status. The query ended with a failure that made it impossible to determine the status of the transaction. Queries that terminate with this status are still subject to transaction integrity and atomicity guarantees: either all changes are registered or the entire transaction is canceled. | For idempotent transactions, you can retry the entire transaction after a short delay. Otherwise, the response depends on the application logic. | C
OVERLOADED | Part of the system is overloaded. | Retry the last action (query) and reduce the rate of queries. | R
SCHEME_ERROR | The query doesn’t match the schema. | Fix the query or schema. | N
GENERIC_ERROR | An unclassified error, possibly related to the query. | See the detailed error message and contact the developers. | N
TIMEOUT | The query timeout expired. | Can be retried for idempotent queries. | C
BAD_SESSION | This session is no longer available. | Re-create the session. | N
PRECONDITION_FAILED | The query cannot be executed in the current state (for example, inserting data into a table with an existing key). | Fix the state or query and retry. | C
TRANSPORT_UNAVAILABLE | A transport error: the endpoint is unavailable, or the connection was interrupted and can’t be reestablished. | Check the endpoint or other network settings. | C
CLIENT_RESOURCE_EXHAUSTED | There are not enough resources available to fulfill the query. | Reduce the rate of queries and check client balancing. | R
CLIENT_DEADLINE_EXCEEDED | The query wasn’t processed within the specified client timeout, or another network issue occurred. | Check that the specified timeout is correct, and check network access, the endpoint, and other network settings; reduce the rate of queries and optimize them. | C
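As a minimal sketch, the Type column of the table above can be encoded as a lookup that an application consults before retrying. The names `STATUS_TYPE` and `may_retry` are hypothetical, and the rule "retry C statuses only for idempotent operations" is an assumption that follows the TIMEOUT and UNDETERMINED guidance; other C statuses, such as PRECONDITION_FAILED, first need the app to fix the state or query. A real handler must also follow the Response column (for example, ABORTED requires retrying the entire transaction, and BAD_SESSION requires re-creating the session).

```python
# Error type of each termination status, taken from the table above.
# None means the status is not an error (SUCCESS).
STATUS_TYPE = {
    "SUCCESS": None,
    "BAD_REQUEST": "N",
    "INTERNAL_ERROR": "N",
    "ABORTED": "R",
    "UNAUTHENTICATED": "N",
    "UNAUTHORIZED": "N",
    "UNAVAILABLE": "R",
    "UNDETERMINED": "C",
    "OVERLOADED": "R",
    "SCHEME_ERROR": "N",
    "GENERIC_ERROR": "N",
    "TIMEOUT": "C",
    "BAD_SESSION": "N",
    "PRECONDITION_FAILED": "C",
    "TRANSPORT_UNAVAILABLE": "C",
    "CLIENT_RESOURCE_EXHAUSTED": "R",
    "CLIENT_DEADLINE_EXCEEDED": "C",
}


def may_retry(status: str, idempotent: bool) -> bool:
    """Return True for R statuses, and for C statuses only if the operation is idempotent."""
    kind = STATUS_TYPE.get(status, "N")  # treat unknown statuses as non-retryable
    return kind == "R" or (kind == "C" and idempotent)
```

Treating unknown statuses as non-retryable is a conservative choice: it avoids adding load when the app can’t tell whether a retry is safe.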