Error code

Learn about the error codes of the bRPC client.

brpc uses brpc::Controller to set and get the parameters of a single RPC. Controller::ErrorCode() and Controller::ErrorText() return the error code and the error description of the RPC respectively. Both are only meaningful after the RPC has completed; before that, the result is undefined. ErrorText() is defined by the base class of Controller, google::protobuf::RpcController, while ErrorCode() is defined by brpc::Controller. Controller also has a method Failed() that tells whether the RPC failed. The three methods relate as follows:

  • When Failed() is true, ErrorCode() is non-zero and ErrorText() is non-empty.
  • When Failed() is false, ErrorCode() is 0 and ErrorText() is undefined (it is currently empty in brpc, but do not rely on this).
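
The relationship above suggests checking Failed() first and reading ErrorText() only on failure. A minimal sketch follows. DescribeResult and FakeController are hypothetical names introduced for illustration. The template assumes nothing beyond the three accessors named above, so it is expected to work with a real brpc::Controller as well, though that is not verified here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in exposing only the three accessors discussed above,
// so that the sketch compiles without brpc headers.
struct FakeController {
    bool failed = false;
    int error_code = 0;
    std::string error_text;

    bool Failed() const { return failed; }
    int ErrorCode() const { return error_code; }
    std::string ErrorText() const { return error_text; }
};

// Summarizes a finished RPC. Call it only after the RPC has completed.
// ErrorText() is read only when Failed() is true, because its value is
// undefined for a successful RPC.
template <typename Cntl>
std::string DescribeResult(const Cntl& cntl) {
    if (!cntl.Failed()) {
        assert(cntl.ErrorCode() == 0);  // success implies a zero code
        return "OK";
    }
    assert(cntl.ErrorCode() != 0);      // failure implies a non-zero code
    return "[E" + std::to_string(cntl.ErrorCode()) + "] " + cntl.ErrorText();
}
```

Keeping the check in one helper makes it harder to read ErrorText() by accident on the success path.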

Mark RPC as failed

Both the client and the server in brpc have a Controller, on which SetFailed() can be called to modify ErrorCode and ErrorText. When Controller::SetFailed() is called multiple times, the last ErrorCode is kept, while the ErrorTexts are concatenated instead of only the last one being kept. The framework enriches the ErrorText with extra prefixes: the number of retries at client-side and the address of the server at server-side.
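
The accumulation rule (last code wins, texts are concatenated) can be illustrated with a toy model. This is not brpc's implementation: ToyController is a hypothetical class, the "; " separator is an assumption, and the framework-added prefixes are left out.

```cpp
#include <cassert>
#include <string>

// Toy model (not brpc code) of how repeated SetFailed calls combine.
class ToyController {
public:
    void SetFailed(int error_code, const std::string& reason) {
        assert(error_code != 0);   // a failed RPC carries a non-zero code
        error_code_ = error_code;  // overwrite: only the last code is kept
        if (!error_text_.empty()) {
            error_text_ += "; ";   // assumed separator, for illustration only
        }
        error_text_ += reason;     // append: earlier texts survive
    }

    bool Failed() const { return error_code_ != 0; }
    int ErrorCode() const { return error_code_; }
    const std::string& ErrorText() const { return error_text_; }

private:
    int error_code_ = 0;
    std::string error_text_;
};
```

In this model, calling SetFailed(1003, "Bad request") and then SetFailed(2001, "cleanup failed") leaves ErrorCode() at 2001 while ErrorText() still contains both reasons.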

Controller::SetFailed() at client-side is usually called by the framework, for example when sending fails or the response is incomplete. Users may also set an error at client-side in some situations. For example, you may set an error on the RPC if an additional check before sending the request fails.

Controller::SetFailed() at server-side is often called by the user in the service callback. Generally speaking, when an error occurs, users call SetFailed(), release all resources, and return from the callback. The framework fills the error code and message into the response according to the communication protocol. When the response is received, the error inside it is set into the client-side Controller so that users can fetch it after the RPC ends.

Note that the server does not log errors sent to clients by default, as frequent logging may significantly impact server performance due to heavy disk IO. A client that produces errors at a high rate could slow the entire server down and affect all other clients, which could even be used as an attack against the server. If you do want to see error messages on the server, turn on the gflag -log_error_text (modifiable at run-time), and the server will log the ErrorText of the corresponding Controller of each failed RPC.
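
The "call SetFailed(), release resources, return" pattern for a service callback can be sketched as below. FakeServerController, HandleEcho and kMaxMessageBytes are hypothetical names. The stand-in only mimics a printf-style SetFailed(code, format, ...) so that the sketch compiles without brpc headers. A real callback also has to run its done closure, which is omitted here.

```cpp
#include <cerrno>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the server-side Controller.
class FakeServerController {
public:
    void SetFailed(int error_code, const char* fmt, ...) {
        char buf[256];
        va_list ap;
        va_start(ap, fmt);
        std::vsnprintf(buf, sizeof(buf), fmt, ap);  // format the reason text
        va_end(ap);
        error_code_ = error_code;
        error_text_ = buf;
    }
    bool Failed() const { return error_code_ != 0; }
    int ErrorCode() const { return error_code_; }
    const std::string& ErrorText() const { return error_text_; }

private:
    int error_code_ = 0;
    std::string error_text_;
};

const std::size_t kMaxMessageBytes = 1024;  // hypothetical limit

// Sketch of a callback body: validate the request, mark the RPC as failed
// on error, and return early without touching the reply.
template <typename Cntl>
void HandleEcho(Cntl* cntl, const std::string& message, std::string* reply) {
    if (message.empty()) {
        cntl->SetFailed(EINVAL, "message must not be empty");
        return;  // nothing to release in this sketch
    }
    if (message.size() > kMaxMessageBytes) {
        cntl->SetFailed(EINVAL, "message too long: %zu bytes (limit %zu)",
                        message.size(), kMaxMessageBytes);
        return;
    }
    *reply = message;  // success path: Failed() stays false
}
```

The sketch uses the system code EINVAL rather than a new custom value, so that client and server are less likely to disagree about its meaning.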

Error Code in brpc

All error codes in brpc are defined in errno.proto. Those beginning with SYS_ are defined by the Linux system and are exactly the same as the ones defined in /usr/include/errno.h. They are put in a .proto file so that they can be shared across languages. The rest of the error codes are defined by brpc.

berror(error_code) gets the description of an error code, and berror() gets the description of the current system errno. Note that ErrorText() != berror(ErrorCode()), since ErrorText() contains more specific information. brpc includes berror by default so that you can use it in your project directly.

The following table shows common error codes and their descriptions:

| Error Code | Value | Retry | Description | Logging message |
| --- | --- | --- | --- | --- |
| EAGAIN | 11 | Yes | Too many requests at the same time, hardly happening as it’s a soft limit. | Resource temporarily unavailable |
| ENODATA | 61 | | 1. The server list returned by Naming Service is empty. 2. When Naming Service changes with all instances modified, Naming Service updates LB by first Remove all and then Add all, the LB instance list may become empty within a short period of time. | Fail to select server from xxx |
| ETIMEDOUT | 110 | Yes | Connection timeout. | Connection timed out |
| EHOSTDOWN | 112 | Yes | Possible reasons: A. The list returned by Naming Server is not empty, but LB cannot select an available server, and LB returns an EHOSTDOWN error. Specific possible reasons: a. Server is exiting (returned ELOGOFF). b. Server was blocked because of some previous failure. The specific logic of the block: 1. For single connection type, the only connection socket is blocked by SetFail, and there are many occurrences of SetFailed in the code to trigger this block. 2. For pooled/short connection type, only when the error number meets does_error_affect_main_socket (ECONNREFUSED, ENETUNREACH, EHOSTUNREACH or EINVAL) will it be blocked. 3. After blocking, there is a CheckHealth thread to do health check, just try to connect, the check interval is controlled by the health_check_interval_s of SocketOptions, and the Socket will be unblocked if it is connected successfully. B. Use the SingleServer method to initialize the Channel (without LB), and the only connection is LOGOFF or blocked (same as above). | “Fail to select server from …” “Not connected to … yet” |
| ENOSERVICE | 1001 | No | Can’t locate the service, hardly happening and usually being ENOMETHOD instead. | |
| ENOMETHOD | 1002 | No | Can’t locate the method. | Misc forms, common ones are “Fail to find method=…” |
| EREQUEST | 1003 | No | Fail to serialize the request, may be set on either client-side or server-side. | Misc forms: “Missing required fields in request: …” “Fail to parse request message, …” “Bad request” |
| EAUTH | 1004 | No | Authentication failed. | “Authentication failed” |
| ETOOMANYFAILS | 1005 | No | Too many sub-channel failures inside a ParallelChannel. | “%d/%d channels failed, fail_limit=%d” |
| EBACKUPREQUEST | 1007 | Yes | Set when backup requests are triggered. Not returned by ErrorCode() directly, viewable from spans in /rpcz. | “reached backup timeout=%dms” |
| ERPCTIMEDOUT | 1008 | No | RPC timeout. | “reached timeout=%dms” |
| EFAILEDSOCKET | 1009 | Yes | The connection is broken during RPC. | “The socket was SetFailed” |
| EHTTP | 1010 | No | HTTP responses with non 2xx status code are treated as failure and set with this code. No retry by default, changeable by customizing RetryPolicy. | Bad http call |
| EOVERCROWDED | 1011 | Yes | Too many messages to buffer at the sender side. Usually caused by lots of concurrent asynchronous requests. Modifiable by -socket_max_unwritten_bytes, 8MB by default. | The server is overcrowded |
| EINTERNAL | 2001 | No | The default error for Controller::SetFailed without specifying a one. | Internal Server Error |
| ERESPONSE | 2002 | No | Fail to serialize the response, may be set on either client-side or server-side. | Misc forms: “Missing required fields in response: …” “Fail to parse response message, ” “Bad response” |
| ELOGOFF | 2003 | Yes | Server has been stopped. | “Server is going to quit” |
| ELIMIT | 2004 | Yes | Number of requests being processed concurrently exceeds ServerOptions.max_concurrency. | “Reached server’s limit=%d on concurrent requests” |
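
When application code needs to reason about whether the framework retries an error by default, the Retry column of the table above can be captured in a small lookup. The sketch below is a caller-side helper, not a brpc API: RetryHint and DefaultRetryHint are hypothetical names, and the numeric values are copied from the table only to keep the example self-contained. Codes whose Retry value is not given above (such as ENODATA) and codes outside the table map to kUnknown.

```cpp
// Hypothetical three-state answer: the table does not cover every code.
enum class RetryHint { kYes, kNo, kUnknown };

// Mirrors the Retry column of the table above. The values are retyped here
// for illustration; a real project would use the named constants instead.
RetryHint DefaultRetryHint(int error_code) {
    switch (error_code) {
        case 11:    // EAGAIN
        case 110:   // ETIMEDOUT
        case 112:   // EHOSTDOWN
        case 1007:  // EBACKUPREQUEST
        case 1009:  // EFAILEDSOCKET
        case 1011:  // EOVERCROWDED
        case 2003:  // ELOGOFF
        case 2004:  // ELIMIT
            return RetryHint::kYes;
        case 1001:  // ENOSERVICE
        case 1002:  // ENOMETHOD
        case 1003:  // EREQUEST
        case 1004:  // EAUTH
        case 1005:  // ETOOMANYFAILS
        case 1008:  // ERPCTIMEDOUT
        case 1010:  // EHTTP (default only; a custom RetryPolicy can change it)
        case 2001:  // EINTERNAL
        case 2002:  // ERESPONSE
            return RetryHint::kNo;
        default:
            return RetryHint::kUnknown;  // not listed, or Retry not stated
    }
}
```

Returning kUnknown instead of guessing keeps the helper honest about codes the table does not classify.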

User-defined Error Code

In C/C++, error codes can be defined as macros, constants or enums:

    #define ESTOP -114                  // C/C++
    static const int EMYERROR = 30;     // C/C++
    const int EMYERROR2 = -31;          // C++ only

If you need to get the error description through berror, register it in the global scope of your c/cpp file by BAIDU_REGISTER_ERRNO(error_code, description), for example:

    BAIDU_REGISTER_ERRNO(ESTOP, "the thread is stopping")
    BAIDU_REGISTER_ERRNO(EMYERROR, "my error")

Note that strerror and strerror_r do not recognize error codes defined by BAIDU_REGISTER_ERRNO. Neither does the %m used in printf. You must use %s paired with berror:

    errno = ESTOP;
    printf("Describe errno: %m\n");                              // [Wrong] Describe errno: Unknown error -114
    printf("Describe errno: %s\n", strerror_r(errno, NULL, 0));  // [Wrong] Describe errno: Unknown error -114
    printf("Describe errno: %s\n", berror());                    // [Correct] Describe errno: the thread is stopping
    printf("Describe errno: %s\n", berror(errno));               // [Correct] Describe errno: the thread is stopping

When an error code is registered more than once, the build fails with an error like the following, provided the code is C++:

    redefinition of `class BaiduErrnoHelper<30>'

Or the program aborts during startup:

    Fail to define EMYERROR(30) which is already defined as `Read-only file system', abort

You must make sure that different modules share the same understanding of the same ErrorCode. Otherwise, interactions between two modules that interpret an error code differently may be undefined. To prevent this, follow these guidelines:

  • Prefer system error codes, which generally have fixed values and meanings.
  • Share the code that defines errors between modules to prevent inconsistencies after modifications.
  • Use BAIDU_REGISTER_ERRNO to describe new error codes, which ensures that the same error code is defined only once inside a process.
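
The "defined only once inside a process" guarantee can be pictured with a toy registry. This is a hypothetical model for illustration, not how BAIDU_REGISTER_ERRNO or berror are implemented: ErrorRegistry is an invented class, it reports a duplicate by returning false instead of aborting, and it does not check for collisions with system error codes.

```cpp
#include <cstring>
#include <map>
#include <string>

// Toy registry (not brpc code): one description per error code.
class ErrorRegistry {
public:
    // Returns false and keeps the existing description when the code has
    // already been registered.
    bool Register(int code, const std::string& description) {
        return table_.emplace(code, description).second;
    }

    // Looks up registered codes first, then falls back to std::strerror,
    // which knows nothing about the registered codes.
    std::string Describe(int code) const {
        std::map<int, std::string>::const_iterator it = table_.find(code);
        if (it != table_.end()) {
            return it->second;
        }
        return std::strerror(code);
    }

private:
    std::map<int, std::string> table_;
};
```

With a single shared registry, two modules that try to attach different meanings to the same value are detected at registration time rather than when the error is later misread.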

Last modified October 7, 2024: Oncall report (1b7065e)