Networking Problems
Requests are rejected by Envoy
Requests may be rejected for various reasons. The best way to understand why requests are being rejected is by inspecting Envoy’s access logs. By default, access logs are output to the standard output of the container. Run the following command to see the log:
$ kubectl logs PODNAME -c istio-proxy -n NAMESPACE
In the default access log format, Envoy response flags and Mixer policy status are located after the response code. If you are using a custom log format, make sure to include %RESPONSE_FLAGS% and %DYNAMIC_METADATA(istio.mixer:status)%.
Refer to the Envoy response flags documentation for details of response flags.
Common response flags are:
- NR: No route configured, check your DestinationRule or VirtualService.
- UO: Upstream overflow with circuit breaking, check your circuit breaker configuration in DestinationRule.
- UF: Failed to connect to upstream, if you’re using Istio authentication, check for a mutual TLS configuration conflict.
A request is rejected by Mixer if the response flag is UAEX and the Mixer policy status is not -.
Common Mixer policy statuses are:
- UNAVAILABLE: Envoy cannot connect to Mixer and the policy is configured to fail close.
- UNAUTHENTICATED: The request is rejected by Mixer authentication.
- PERMISSION_DENIED: The request is rejected by Mixer authorization.
- RESOURCE_EXHAUSTED: The request is rejected by Mixer quota.
- INTERNAL: The request is rejected due to Mixer internal error.
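To pull only the rejected requests out of a busy log, a small filter can help. The following is a minimal sketch, not part of Istio: the function name rejected_requests is hypothetical, and it assumes the default access log format, in which the quoted request line is immediately followed by the response code and the response flags. Log lines that are not access log entries may produce spurious output.

```shell
# Hypothetical helper: reads Envoy access log lines on stdin and prints
# "<response code> <response flags>" for every line whose flags field is not "-".
# Assumes the default log format: ... "METHOD PATH PROTOCOL" CODE FLAGS ...
rejected_requests() {
  awk -F'"' '{
    # $3 is the text between the quoted request line and the next quoted field
    n = split($3, field, " ")
    if (n >= 2 && field[2] != "-") print field[1], field[2]
  }'
}
```

One possible use is to pipe the log command shown earlier into it and count the results: kubectl logs PODNAME -c istio-proxy -n NAMESPACE | rejected_requests | sort | uniq -c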
Route rules don’t seem to affect traffic flow
With the current Envoy sidecar implementation, up to 100 requests may be required for weighted version distribution to be observed.
If route rules are working perfectly for the Bookinfo sample, but similar version routing rules have no effect on your own application, it may be that your Kubernetes services need to be changed slightly. Kubernetes services must adhere to certain restrictions in order to take advantage of Istio’s L7 routing features. Refer to the Requirements for Pods and Services (/docs/ops/prep/requirements/) for details.
Another potential issue is that the route rules may simply be slow to take effect. The Istio implementation on Kubernetes utilizes an eventually consistent algorithm to ensure all Envoy sidecars have the correct configuration including all route rules. A configuration change will take some time to propagate to all the sidecars. With large deployments the propagation will take longer and there may be a lag time on the order of seconds.
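To judge whether weights are taking effect, it helps to tally a batch of responses rather than inspect a handful. The sketch below assumes you can collect one version label per response (for example, by extracting it from each response body). The function name version_share and that collection step are illustrative assumptions, not part of Istio.

```shell
# Hypothetical helper: reads one version label per line on stdin and prints
# each label with its share of the total, rounded down to a whole percent.
version_share() {
  awk '{ count[$0]++ }
       END { for (v in count) printf "%s %d%%\n", v, count[v] * 100 / NR }' | sort
}
```

Feeding it the labels from at least 100 requests gives the weighted distribution a fair chance to show up in the percentages.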
Destination rule policy not activated
Although destination rules are associated with a particular destination host, the activation of subset-specific policies depends on route rule evaluation.
When routing a request, Envoy first evaluates route rules in virtual services to determine if a particular subset is being routed to. If so, only then will it activate any destination rule policies corresponding to the subset. Consequently, Istio only applies the policies you define for specific subsets if you explicitly routed traffic to the corresponding subset.
For example, consider the following destination rule as the one and only configuration defined for the reviews service, that is, there are no route rules in a corresponding VirtualService definition:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
    trafficPolicy:
      connectionPool:
        tcp:
          maxConnections: 100
Even if Istio’s default round-robin routing calls “v1” instances on occasion, maybe even always if “v1” is the only running version, the above traffic policy will never be invoked.
You can fix the above example in one of two ways:
- Move the traffic policy in the destination rule up a level to make the policy apply to any subset, for example:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
  subsets:
  - name: v1
    labels:
      version: v1
- Define proper route rules for the service using a VirtualService. For example, add a simple route rule for the v1 subset of the reviews service:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
The default Istio behavior conveniently sends traffic from any source to all versions of the destination service without you setting any rules. As soon as you need to differentiate between the versions of a service, you need to define routing rules. For this reason, we consider it a best practice to set a default routing rule for every service from the start.
503 errors after setting destination rule
If requests to a service immediately start generating HTTP 503 errors after you applied a DestinationRule and the errors continue until you remove or revert the DestinationRule, then the DestinationRule is probably causing a TLS conflict for the service.
For example, if you configure mutual TLS in the cluster globally, the DestinationRule must include the following trafficPolicy:
trafficPolicy:
  tls:
    mode: ISTIO_MUTUAL
Otherwise, the mode defaults to DISABLE, causing client proxy sidecars to make plain HTTP requests instead of TLS encrypted requests. Thus, the requests conflict with the server proxy because the server proxy expects encrypted requests.
To confirm there is a conflict, check whether the STATUS field in the output of the istioctl authn tls-check command is set to CONFLICT for your service. For example, a command similar to the following could be used to check for a conflict with the httpbin service:
$ istioctl authn tls-check istio-ingressgateway-db454d49b-lmtg8.istio-system httpbin.default.svc.cluster.local
HOST:PORT                                  STATUS       SERVER     CLIENT     AUTHN POLICY     DESTINATION RULE
httpbin.default.svc.cluster.local:8000     CONFLICT     mTLS       HTTP       default/         httpbin/default
Whenever you apply a DestinationRule, ensure the trafficPolicy TLS mode matches the global server configuration.
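When the tls-check output covers more than one service, scanning the STATUS column by eye is error-prone. The following is a minimal sketch, assuming the column layout shown in the sample output above (HOST:PORT first, STATUS second). The function name tls_conflicts is hypothetical and not an istioctl feature.

```shell
# Hypothetical helper: reads `istioctl authn tls-check` output on stdin and
# prints the HOST:PORT of every row whose STATUS column is CONFLICT.
# The header row is skipped.
tls_conflicts() {
  awk 'NR > 1 && $2 == "CONFLICT" { print $1 }'
}
```

For example: istioctl authn tls-check istio-ingressgateway-db454d49b-lmtg8.istio-system httpbin.default.svc.cluster.local | tls_conflicts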
Route rules have no effect on ingress gateway requests
Let’s assume you are using an ingress Gateway and corresponding VirtualService to access an internal service. For example, your VirtualService looks something like this:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "myapp.com" # or maybe "*" if you are testing without DNS using the ingress-gateway IP (e.g., http://1.2.3.4/hello)
  gateways:
  - myapp-gateway
  http:
  - match:
    - uri:
        prefix: /hello
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
  - match:
    ...
You also have a VirtualService which routes traffic for the helloworld service to a particular subset:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: helloworld
spec:
  hosts:
  - helloworld.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
In this situation you will notice that requests to the helloworld service via the ingress gateway will not be directed to subset v1 but instead will continue to use default round-robin routing.
The ingress requests are using the gateway host (e.g., myapp.com) which will activate the rules in the myapp VirtualService that routes to any endpoint of the helloworld service. Only internal requests with the host helloworld.default.svc.cluster.local will use the helloworld VirtualService which directs traffic exclusively to subset v1.
To control the traffic from the gateway, you need to also include the subset rule in the myapp VirtualService:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "myapp.com" # or maybe "*" if you are testing without DNS using the ingress-gateway IP (e.g., http://1.2.3.4/hello)
  gateways:
  - myapp-gateway
  http:
  - match:
    - uri:
        prefix: /hello
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
  - match:
    ...
Alternatively, you can combine both VirtualServices into one unit if possible:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp.com # cannot use "*" here since this is being combined with the mesh services
  - helloworld.default.svc.cluster.local
  gateways:
  - mesh # applies internally as well as externally
  - myapp-gateway
  http:
  - match:
    - uri:
        prefix: /hello
      gateways:
      - myapp-gateway # restricts this rule to apply only to ingress gateway
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
  - match:
    - gateways:
      - mesh # applies to all services inside the mesh
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
Headless TCP services losing connection
If istio-citadel is deployed, Envoy is restarted every 45 days to refresh certificates. This causes the disconnection of TCP streams or long-running connections between services.
You should build resilience into your application for this type of disconnect, but if you still want to prevent the disconnects from happening, you will need to disable mutual TLS and the istio-citadel deployment.
First, edit your istio configuration to disable mutual TLS:
$ kubectl edit configmap -n istio-system istio
$ kubectl delete pods -n istio-system -l istio=pilot
Next, scale down the istio-citadel deployment to disable Envoy restarts:
$ kubectl scale --replicas=0 deploy/istio-citadel -n istio-system
This should stop Istio from restarting Envoy and disconnecting TCP connections.
Envoy is crashing under load
Check your ulimit -a. Many systems have a 1024 open file descriptor limit by default, which will cause Envoy to assert and crash with:
[2017-05-17 03:00:52.735][14236][critical][assert] assert failure: fd_ != -1: external/envoy/source/common/network/connection_impl.cc:58
Make sure to raise your ulimit. Example: ulimit -n 16384
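A quick guard in a startup script can catch a low limit before Envoy hits it. This is a minimal sketch under stated assumptions: the function name fd_limit_ok is hypothetical, the default threshold of 16384 is simply the value from the example above, and the check only reflects the shell it runs in, so it should run in the same environment that launches Envoy.

```shell
# Hypothetical helper: succeeds when the open file descriptor limit is at
# least MIN (first argument, default 16384).
# The second argument defaults to the current shell's soft limit, and is
# mainly useful for checking a value obtained elsewhere.
fd_limit_ok() {
  min="${1:-16384}"
  current="${2:-$(ulimit -n)}"
  [ "$current" = "unlimited" ] || [ "$current" -ge "$min" ]
}
```

For example: fd_limit_ok || echo "open file limit too low, raise it with: ulimit -n 16384"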
Envoy won’t connect to my HTTP/1.0 service
Envoy requires HTTP/1.1 or HTTP/2 traffic for upstream services. For example, when using NGINX for serving traffic behind Envoy, you will need to set the proxy_http_version directive in your NGINX configuration to be “1.1”, since the NGINX default is 1.0.
Example configuration:
upstream http_backend {
    server 127.0.0.1:8080;
    keepalive 16;
}
server {
    ...
    location /http/ {
        proxy_pass http://http_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        ...
    }
}
404 errors occur when multiple gateways configured with same TLS certificate
Configuring more than one gateway using the same TLS certificate will cause browsers that leverage HTTP/2 connection reuse (i.e., most browsers) to produce 404 errors when accessing a second host after a connection to another host has already been established.
For example, let’s say you have 2 hosts that share the same TLS certificate like this:
- Wildcard certificate *.test.com installed in istio-ingressgateway
- Gateway configuration gw1 with host service1.test.com, selector istio: ingressgateway, and TLS using gateway’s mounted (wildcard) certificate
- Gateway configuration gw2 with host service2.test.com, selector istio: ingressgateway, and TLS using gateway’s mounted (wildcard) certificate
- VirtualService configuration vs1 with host service1.test.com and gateway gw1
- VirtualService configuration vs2 with host service2.test.com and gateway gw2
Since both gateways are served by the same workload (i.e., selector istio: ingressgateway), requests to both services (service1.test.com and service2.test.com) will resolve to the same IP. If service1.test.com is accessed first, it will return the wildcard certificate (*.test.com) indicating that connections to service2.test.com can use the same certificate. Browsers like Chrome and Firefox will consequently reuse the existing connection for requests to service2.test.com. Since the gateway (gw1) has no route for service2.test.com, it will then return a 404 (Not Found) response.
You can avoid this problem by configuring a single wildcard Gateway, instead of two (gw1 and gw2). Then, simply bind both VirtualServices to it like this:
- Gateway configuration gw with host *.test.com, selector istio: ingressgateway, and TLS using gateway’s mounted (wildcard) certificate
- VirtualService configuration vs1 with host service1.test.com and gateway gw
- VirtualService configuration vs2 with host service2.test.com and gateway gw
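The single wildcard Gateway from the list above might look like the manifest printed by the following sketch. The name gw, the host, and the selector come from the list. The HTTPS port 443, the SIMPLE TLS mode, and the certificate paths under /etc/istio/ingressgateway-certs are assumptions about a typical installation and may differ in yours. The function name wildcard_gateway_manifest is hypothetical.

```shell
# Hypothetical helper: prints a single wildcard Gateway manifest.
# Assumptions: HTTPS on port 443 and the wildcard certificate mounted at
# /etc/istio/ingressgateway-certs. Adjust both to match your installation.
wildcard_gateway_manifest() {
  cat <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: gw
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
      privateKey: /etc/istio/ingressgateway-certs/tls.key
    hosts:
    - "*.test.com"
EOF
}
```

After reviewing the output, it could be applied with: wildcard_gateway_manifest | kubectl apply -f -
Both vs1 and vs2 would then list gw under their gateways field.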