VTGate Buffering
VTGate in Vitess supports the buffering of queries in certain situations. The original intention of this feature was to reduce, not necessarily eliminate, downtime during planned fail overs PlannedReparentShard (PRS). VTGate has been extended to provide buffering in some additional fail over situations, e.g. during resharding.
Note that buffering is not intended for, nor active during, unplanned failovers or other unplanned issues with a PRIMARY
tablet during normal operations. There are some new heuristics built in the keyspace_events
implementation for scenarios where the PRIMARY
is offline. However, you should not rely on these at this time.
Buffering can be somewhat involved, and there are a number of tricky edge cases. We will discuss these in the context of an application’s experience, starting with the simplest case: that of buffering during a PRS (PlannedReparentShard) operation. Examples of various edge cases can be found in Buffering Scenarios.
VTGate flags to enable buffering
First, let us cover the flags that need to be set in VTGate to enable buffering:
--enable_buffer
: Enables buffering. Not enabled by default--enable_buffer_dry_run
: Enable logging of if/when buffering would trigger, without actually buffering anything. Useful for testing. Default:false
--buffer_implementation
: Default:keyspace_events
. More consistent results have been seen withkeyspace_events
. However, if the legacy behavior is needed you may usehealthcheck
.--buffer_size
: Default:1000
This should be sized to the appropriate number of expected request during a buffering event. Typically, if each connection has one request then, each connection will consume one buffer slot of thebuffer_size
and be put in a “pause” state as it is buffered on the vtgate. The resource consideration for setting this flag are memory resources.--buffer_drain_concurrency
: Default:1
. If the buffer is of any significant size, you probably want to increase this proportionally.--buffer_keyspace_shards
: Can be used to limit buffering to only certain keyspaces. Should not be necessary in most cases, defaults to watching all keyspaces.--buffer_max_failover_duration
: Default:20s
. If buffering is active longer than this set duration, stop buffering and return errors to the client.--buffer_window
: Default:10s
. The maximum time any individual request should be buffered for. Should probably be less than the value for--buffer_max_failover_duration
. Adjust according to your application requirements. Be aware, if your MySQL client haswrite_timeout
orread_timeout
settings those values should be greater than thebuffer_max_failover_duration
.--buffer_min_time_between_failovers
: Default1m
. If consecutive fail overs for a shard happens within less than this duration, do not buffer again. The purpose of this setting is to avoid consecutive fail over events where vtgate may be buffering, but never purging the buffer.
Types of queries that can be buffered
- Only requests to tablets of type
PRIMARY
are buffered. In-flight requests to aREPLICA
in the process of transitioning toPRIMARY
because of a PRS should be unaffected, and do not require buffering.
What happens during a PlannedReparentShard with Buffering
Fundamentally Vitess will:
- Hold up and buffer any queries sent to the
PRIMARY
tablet for a shard. - Wait for replication on a primary candidate
REPLICA
to catch up to the currentPRIMARY
. - Perform the actions which demote the
PRIMARY
to aREPLICA
and promote a primary candidateREPLICA
toPRIMARY
. - Drain the buffered queries to the new
PRIMARY
tablet. - Begin the countdown timer for
buffer_max_failover_duration
.
This process is not guaranteed to eliminate errors to the application, but rather reduce them or make them less frequent. The application should still endeavor to handle errors appropriately if/when they occur (e.g. unplanned outages/fail overs, etc.)
How does it work?
The following steps are considerably simplified, but to give a high level overview of how buffering works:
- All buffering is done in
vtgate
- When a shard begins a fail over or resharding event, and a query is sent from
vtgate
tovttablet
,vttablet
will return a certain type of error tovtgate
(vtrpcpb.Code_CLUSTER_EVENT
). - This error indicates to
vtgate
that it is appropriate to buffer this request. - Separately the various timers associated with the flags above are being maintained to timeout and return errors to the application when appropriate, e.g. if an individual request was buffered for too long; or if buffering start “too long” ago.
- When the failover is complete, and the tablet starts accepting queries again, we start draining the buffered queries, with a concurrency as indicated by the
buffer_drain_concurrency
value. - When the buffer is drained, the buffering is complete. We maintain a timer based on
buffer_min_time_between_failovers
to make sure we do not buffer again if another fail over starts within that period.
What the application sees
When buffering executes as expected the application will see a pause in their query processing. Each query will consume a slot from the configured buffer_size
and will be paused for the duration of buffer_window
before errors are returned. Once the failover event completes, the buffer will begin to drain at a concurrency set by the buffer_drain_concurrency
value. Next a countdown timer will start set by buffer_min_time_between_failovers
. During this period any future buffers will be disabled. Once the buffer_min_time_between_failovers
timer expires, buffering will be enabled once again.
Potential Errors
Buffering was implemented to minimize downtime, but there is still potential for errors to occur, and your application should be configured to handle them appropriately. Below are a few errors which may occur:
Lost connection
Error Number: 2013
Error Message: Lost connection to MySQL server during query (timed out)
Due to the nature of buffering and pausing your queries the MySQL client will see delays in their query request. If your client has a read_timeout
or write_timeout
set, the value should be greater than the buffer_window
value set in vtgate.
Primary not serving
Error Number: 1105
Error Message: target: ${KEYSPACE}.0.primary: primary is not serving, there is a reparent operation in progress
This is the most common error you will see in regards to buffering. You may get this result in the application for a variety of reasons:
enable_buffer
is not configured; by default buffers are disabled.enable_buffer_dry_run
is configured to be true; no buffering actions are taken when this setting is enabled.buffer_keyspace_shards
is not configured for the keyspace on which the PRS event is being executed.buffer_size
is set to be lower than the number of incoming queries; any incoming request over thebuffer_size
will see this error.- A new buffering event occurs before the
buffer_min_time_between_fail overs
has expired. buffer_max_failover_duration
has been exceeded; buffering is discontinued and this error is returned.
Next Steps
You may want to review the scenarios in Buffering Scenarios.