Outlier detection

This documentation is for the Envoy v3 API.

As of Envoy v1.18 the v2 API has been removed and is no longer supported.

If you are upgrading from v2 API config you may wish to view the v2 API documentation:

api/v2/cluster/outlier_detection.proto

config.cluster.v3.OutlierDetection

[config.cluster.v3.OutlierDetection proto]

See the architecture overview for more information on outlier detection.

{
  "consecutive_5xx": "{...}",
  "interval": "{...}",
  "base_ejection_time": "{...}",
  "max_ejection_percent": "{...}",
  "enforcing_consecutive_5xx": "{...}",
  "enforcing_success_rate": "{...}",
  "success_rate_minimum_hosts": "{...}",
  "success_rate_request_volume": "{...}",
  "success_rate_stdev_factor": "{...}",
  "consecutive_gateway_failure": "{...}",
  "enforcing_consecutive_gateway_failure": "{...}",
  "split_external_local_origin_errors": "...",
  "consecutive_local_origin_failure": "{...}",
  "enforcing_consecutive_local_origin_failure": "{...}",
  "enforcing_local_origin_success_rate": "{...}",
  "failure_percentage_threshold": "{...}",
  "enforcing_failure_percentage": "{...}",
  "enforcing_failure_percentage_local_origin": "{...}",
  "failure_percentage_minimum_hosts": "{...}",
  "failure_percentage_request_volume": "{...}",
  "max_ejection_time": "{...}",
  "max_ejection_time_jitter": "{...}"
}
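
As an illustration only, the sketch below builds a partial configuration in Python and serializes it to JSON. It spells out a few of the documented defaults explicitly and assumes the standard proto3 JSON mapping, in which Duration fields are written as strings such as "10s" and UInt32Value fields as plain numbers. The variable name outlier_detection is hypothetical.

```python
import json

# A partial OutlierDetection message with a few documented defaults
# written out explicitly. Values are illustrative, not recommendations.
outlier_detection = {
    "consecutive_5xx": 5,            # UInt32Value -> plain JSON number
    "interval": "10s",               # Duration -> string with "s" suffix
    "base_ejection_time": "30s",
    "max_ejection_percent": 10,
    "split_external_local_origin_errors": False,  # bool
}

print(json.dumps(outlier_detection, indent=2))
```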

consecutive_5xx

(UInt32Value) The number of consecutive 5xx responses or local origin errors that are mapped to 5xx error codes before a consecutive 5xx ejection occurs. Defaults to 5.

interval

(Duration) The time interval between ejection analysis sweeps. This can result in both new ejections as well as hosts being returned to service. Defaults to 10000ms or 10s.

base_ejection_time

(Duration) The base time that a host is ejected for. The real time is equal to the base time multiplied by the number of times the host has been ejected and is capped by max_ejection_time. Defaults to 30000ms or 30s.
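
The calculation described above can be sketched as follows. This is a minimal illustration of the documented formula, not Envoy's implementation; the function name ejection_time is hypothetical, and the fallback used when no maximum is given follows the default described under max_ejection_time.

```python
from datetime import timedelta
from typing import Optional

# Documented default for max_ejection_time (300s).
DEFAULT_MAX_EJECTION_TIME = timedelta(seconds=300)


def ejection_time(
    base: timedelta,
    times_ejected: int,
    max_ejection_time: Optional[timedelta] = None,
) -> timedelta:
    """Return base * times_ejected, capped by max_ejection_time."""
    if max_ejection_time is None:
        # When unset, the larger of the 300s default and the base time applies.
        max_ejection_time = max(DEFAULT_MAX_EJECTION_TIME, base)
    return min(base * times_ejected, max_ejection_time)


# Third ejection with the default 30s base time.
print(ejection_time(timedelta(seconds=30), 3))
```

With the defaults, the ejection time grows by 30s per ejection until it reaches the 300s cap.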

max_ejection_percent

(UInt32Value) The maximum % of an upstream cluster that can be ejected due to outlier detection. Defaults to 10% but will eject at least one host regardless of the value.

enforcing_consecutive_5xx

(UInt32Value) The % chance that a host will actually be ejected when an outlier status is detected through consecutive 5xx. This setting can be used to disable ejection or to ramp it up slowly. Defaults to 100.
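
The enforcing_* settings all describe a percentage chance of enforcement. A minimal sketch of that idea, assuming a uniform random draw per detection (the function name should_enforce and the use of Python's random module are illustrative assumptions, not Envoy's implementation):

```python
import random


def should_enforce(enforcing_percent: int, rng: random.Random) -> bool:
    """Return True with a probability of enforcing_percent / 100."""
    # rng.random() is in [0.0, 1.0), so 0 never enforces and 100 always does.
    return rng.random() * 100 < enforcing_percent


rng = random.Random()
# With 50, roughly half of the detected outliers would actually be ejected.
print(should_enforce(50, rng))
```

Setting the value to 0 leaves detection running without ejecting anything, and raising it in steps ramps enforcement up gradually.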

enforcing_success_rate

(UInt32Value) The % chance that a host will actually be ejected when an outlier status is detected through success rate statistics. This setting can be used to disable ejection or to ramp it up slowly. Defaults to 100.

success_rate_minimum_hosts

(UInt32Value) The number of hosts in a cluster that must have enough request volume to detect success rate outliers. If the number of hosts is less than this setting, outlier detection via success rate statistics is not performed for any host in the cluster. Defaults to 5.

success_rate_request_volume

(UInt32Value) The minimum number of total requests that must be collected in one interval (as defined by the interval duration above) to include this host in success rate based outlier detection. If the volume is lower than this setting, outlier detection via success rate statistics is not performed for that host. Defaults to 100.

success_rate_stdev_factor

(UInt32Value) This factor is used to determine the ejection threshold for success rate outlier ejection. The ejection threshold is the mean success rate minus the product of this factor and the standard deviation of the success rates: mean - (stdev * success_rate_stdev_factor). This factor is divided by a thousand to get a double. That is, if the desired factor is 1.9, the runtime value should be 1900. Defaults to 1900.
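
The three success rate settings can be combined into a rough sketch like the one below. It is an illustration under stated assumptions rather than Envoy's implementation: the function name and input shape are hypothetical, the population standard deviation is assumed, and a host is treated as an outlier when its success rate is strictly below the threshold.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def success_rate_outliers(
    hosts: Dict[str, Tuple[int, int]],
    minimum_hosts: int = 5,
    request_volume: int = 100,
    stdev_factor: int = 1900,
) -> List[str]:
    """hosts maps host name -> (successful_requests, total_requests) for one interval."""
    # Only hosts with enough request volume take part in the calculation.
    rates = {
        name: 100.0 * ok / total
        for name, (ok, total) in hosts.items()
        if total > 0 and total >= request_volume
    }
    # Too few hosts with enough volume: no success rate detection at all.
    if len(rates) < minimum_hosts:
        return []
    values = list(rates.values())
    # The configured factor is divided by 1000 to get a double (1900 -> 1.9).
    threshold = mean(values) - pstdev(values) * (stdev_factor / 1000.0)
    return sorted(name for name, rate in rates.items() if rate < threshold)


sample = {"a": (100, 100), "b": (100, 100), "c": (100, 100),
          "d": (100, 100), "e": (0, 100)}
print(success_rate_outliers(sample))
```

In this sample the mean success rate is 80 and the standard deviation is 40, so the threshold is 80 - (40 * 1.9) = 4, and only host e falls below it.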

consecutive_gateway_failure

(UInt32Value) The number of consecutive gateway failures (502, 503, 504 status codes) before a consecutive gateway failure ejection occurs. Defaults to 5.
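
A consecutive counter of this kind can be sketched as follows. The class name is hypothetical, and the sketch assumes that any response other than 502, 503, or 504 resets the count; it does not reproduce Envoy's internal bookkeeping.

```python
GATEWAY_FAILURE_CODES = {502, 503, 504}


class ConsecutiveGatewayFailureCounter:
    """Tracks consecutive gateway failures for a single host."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.count = 0

    def record(self, status_code: int) -> bool:
        """Record one response; return True when the threshold is reached."""
        if status_code not in GATEWAY_FAILURE_CODES:
            # Assumption: any non-gateway response breaks the streak.
            self.count = 0
            return False
        self.count += 1
        if self.count >= self.threshold:
            self.count = 0  # start counting afresh after signalling
            return True
        return False


counter = ConsecutiveGatewayFailureCounter()
print([counter.record(code) for code in (502, 503, 504, 502, 503)])
```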

enforcing_consecutive_gateway_failure

(UInt32Value) The % chance that a host will actually be ejected when an outlier status is detected through consecutive gateway failures. This setting can be used to disable ejection or to ramp it up slowly. Defaults to 0.

split_external_local_origin_errors

(bool) Determines whether to distinguish local origin failures from external errors. If set to true the following configuration parameters are taken into account: consecutive_local_origin_failure, enforcing_consecutive_local_origin_failure and enforcing_local_origin_success_rate. Defaults to false.

consecutive_local_origin_failure

(UInt32Value) The number of consecutive locally originated failures before ejection occurs. Defaults to 5. This parameter takes effect only when split_external_local_origin_errors is set to true.

enforcing_consecutive_local_origin_failure

(UInt32Value) The % chance that a host will actually be ejected when an outlier status is detected through consecutive locally originated failures. This setting can be used to disable ejection or to ramp it up slowly. Defaults to 100. This parameter takes effect only when split_external_local_origin_errors is set to true.

enforcing_local_origin_success_rate

(UInt32Value) The % chance that a host will actually be ejected when an outlier status is detected through success rate statistics for locally originated errors. This setting can be used to disable ejection or to ramp it up slowly. Defaults to 100. This parameter takes effect only when split_external_local_origin_errors is set to true.

failure_percentage_threshold

(UInt32Value) The failure percentage threshold used for failure percentage-based outlier detection. If the failure percentage of a given host is greater than or equal to this value, it will be ejected. Defaults to 85.

enforcing_failure_percentage

(UInt32Value) The % chance that a host will actually be ejected when an outlier status is detected through failure percentage statistics. This setting can be used to disable ejection or to ramp it up slowly. Defaults to 0.

enforcing_failure_percentage_local_origin

(UInt32Value) The % chance that a host will actually be ejected when an outlier status is detected through local-origin failure percentage statistics. This setting can be used to disable ejection or to ramp it up slowly. Defaults to 0.

failure_percentage_minimum_hosts

(UInt32Value) The minimum number of hosts in a cluster in order to perform failure percentage-based ejection. If the total number of hosts in the cluster is less than this value, failure percentage-based ejection will not be performed. Defaults to 5.

failure_percentage_request_volume

(UInt32Value) The minimum number of total requests that must be collected in one interval (as defined by the interval duration above) to perform failure percentage-based ejection for this host. If the volume is lower than this setting, failure percentage-based ejection will not be performed for this host. Defaults to 50.
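
Taken literally, the failure percentage settings above combine roughly as in this sketch. The function name and input shape are hypothetical, and the sketch follows the descriptions on this page rather than Envoy's source, so details such as how the host count is determined are assumptions.

```python
from typing import Dict, List, Tuple


def failure_percentage_outliers(
    hosts: Dict[str, Tuple[int, int]],
    threshold: int = 85,
    minimum_hosts: int = 5,
    request_volume: int = 50,
) -> List[str]:
    """hosts maps host name -> (failed_requests, total_requests) for one interval."""
    # Too few hosts in the cluster: no failure percentage-based ejection.
    if len(hosts) < minimum_hosts:
        return []
    outliers = []
    for name, (failed, total) in hosts.items():
        # Skip hosts that did not collect enough requests in this interval.
        if total == 0 or total < request_volume:
            continue
        # Greater than or equal to the threshold counts as an outlier.
        if 100.0 * failed / total >= threshold:
            outliers.append(name)
    return sorted(outliers)


cluster = {"a": (0, 100), "b": (0, 100), "c": (0, 100),
           "d": (85, 100), "e": (90, 100), "f": (49, 49)}
print(failure_percentage_outliers(cluster))
```

Host f fails every request but is skipped because its 49 requests are below the default volume of 50. Whether a detected host is actually ejected still depends on enforcing_failure_percentage, which defaults to 0.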

max_ejection_time

(Duration) The maximum time that a host is ejected for. See base_ejection_time for more information. If not specified, the default value (300000ms or 300s) or the base_ejection_time value is applied, whichever is larger.

max_ejection_time_jitter

(Duration) The maximum amount of jitter to add to the ejection time, in order to prevent a ‘thundering herd’ effect where all proxies try to reconnect to the host at the same time. Defaults to 0s.
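
A minimal sketch of adding jitter to an ejection time is shown below. It assumes the jitter is drawn uniformly between zero and max_ejection_time_jitter; the function name and the distribution are illustrative assumptions, not a description of Envoy's implementation.

```python
import random
from datetime import timedelta


def jittered_ejection_time(
    ejection_time: timedelta,
    max_jitter: timedelta,
    rng: random.Random,
) -> timedelta:
    """Add a random delay in [0, max_jitter] to the ejection time."""
    jitter_seconds = rng.uniform(0, max_jitter.total_seconds())
    return ejection_time + timedelta(seconds=jitter_seconds)


rng = random.Random()
# Each proxy draws its own jitter, so their retries are spread out in time.
print(jittered_ejection_time(timedelta(seconds=30), timedelta(seconds=5), rng))
```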