Fault Injection
This task shows you how to inject faults to test the resiliency of your application.
Before you begin
Set up Istio by following the instructions in the Installation guide.
Deploy the Bookinfo sample application including the default destination rules.
Review the fault injection discussion in the Traffic Management concepts doc.
Apply application version routing by either performing the request routing task or by running the following commands:
$ kubectl apply -f @samples/bookinfo/networking/virtual-service-all-v1.yaml@
$ kubectl apply -f @samples/bookinfo/networking/virtual-service-reviews-test-v2.yaml@
With the above configuration, this is how requests flow:
productpage → reviews:v2 → ratings (only for user jason)
productpage → reviews:v1 (for everyone else)
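The flow above can be pictured with a minimal Python sketch. It is only a simplified model of the two routing rules, not Istio code: the function names reviews_subset and call_chain are hypothetical, and the model assumes that the rule for user jason is checked first and that the first matching rule wins.

```python
from typing import List, Optional


def reviews_subset(end_user: Optional[str]) -> str:
    """Pick the reviews version for a request (simplified model)."""
    # Assumption: the rule matching user jason is evaluated first.
    if end_user == "jason":
        return "v2"
    # Everyone else, including anonymous users, falls through to v1.
    return "v1"


def call_chain(end_user: Optional[str]) -> List[str]:
    """List the services a /productpage request passes through."""
    subset = reviews_subset(end_user)
    chain = ["productpage", f"reviews:{subset}"]
    # Only reviews:v2 calls ratings in this flow; reviews:v1 does not.
    if subset == "v2":
        chain.append("ratings")
    return chain


print(call_chain("jason"))  # ['productpage', 'reviews:v2', 'ratings']
print(call_chain(None))     # ['productpage', 'reviews:v1']
```

Because only jason reaches ratings, a fault placed on ratings for that user stays invisible to everyone else.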
Injecting an HTTP delay fault
To test the Bookinfo application microservices for resiliency, inject a 7s delay between the reviews:v2 and ratings microservices for user jason. This test will uncover a bug that was intentionally introduced into the Bookinfo app.
Note that the reviews:v2 service has a 10s hard-coded connection timeout for calls to the ratings service. Even with the 7s delay that you introduced, you still expect the end-to-end flow to continue without any errors.
- Create a fault injection rule to delay traffic coming from the test user jason.
$ kubectl apply -f @samples/bookinfo/networking/virtual-service-ratings-test-delay.yaml@
- Confirm the rule was created:
$ kubectl get virtualservice ratings -o yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
  ...
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: 7s
        percentage:
          value: 100
    match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
Allow several seconds for the new rule to propagate to all pods.
Testing the delay configuration
Open the Bookinfo web application in your browser.
On the /productpage web page, log in as user jason.
You expect the Bookinfo home page to load without errors in approximately 7 seconds. However, there is a problem: the Reviews section displays an error message:
Error fetching product reviews!
Sorry, product reviews are currently unavailable for this book.
View the web page response times:
- Open the Developer Tools menu in your web browser.
- Open the Network tab.
- Reload the /productpage web page. You will see that the page actually loads in about 6 seconds.
Understanding what happened
You’ve found a bug. There are hard-coded timeouts in the microservices that have caused the reviews service to fail.
As expected, the 7s delay you introduced doesn’t affect the reviews service because the timeout between the reviews and ratings services is hard-coded at 10s. However, there is also a hard-coded timeout between the productpage and reviews services, coded as 3s + 1 retry for 6s total. As a result, the productpage call to reviews times out prematurely and throws an error after 6s.
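The arithmetic behind the 6s failure can be checked with a small Python sketch. It is a rough model under stated assumptions, not the Bookinfo code: the function name productpage_outcome and its parameters are hypothetical, processing time is ignored, and every retry is assumed to hit the same delay.

```python
from typing import Tuple


def productpage_outcome(
    ratings_delay_s: float,
    reviews_to_ratings_timeout_s: float = 10.0,
    productpage_to_reviews_timeout_s: float = 3.0,
    retries: int = 1,
) -> Tuple[str, float]:
    """Return (status, seconds until productpage gets an answer or gives up)."""
    # reviews waits for ratings until the delay elapses or its own timeout fires.
    reviews_latency_s = min(ratings_delay_s, reviews_to_ratings_timeout_s)

    # reviews answers before productpage loses patience: the page loads.
    if reviews_latency_s < productpage_to_reviews_timeout_s:
        return ("ok", reviews_latency_s)

    # Otherwise every attempt times out; total wait is attempts x timeout.
    attempts = 1 + retries
    return ("error", attempts * productpage_to_reviews_timeout_s)


print(productpage_outcome(7.0))  # ('error', 6.0)
print(productpage_outcome(2.0))  # ('ok', 2.0)
```

With the 7s delay, reviews would happily wait (7s is under its 10s timeout), but productpage gives up after two 3s attempts, which matches the roughly 6 second load time seen in the browser.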
Bugs like this can occur in typical enterprise applications where different teams develop different microservices independently. Istio’s fault injection rules help you identify such anomalies without impacting end users.
Notice that the fault injection test is restricted to when the logged in user is jason. If you log in as any other user, you will not experience any delays.
Fixing the bug
You would normally fix the problem by:
- Either increasing the productpage to reviews service timeout or decreasing the reviews to ratings timeout
- Stopping and restarting the fixed microservice
- Confirming that the /productpage web page returns its response without any errors.
However, you already have a fix running in v3 of the reviews service. The reviews:v3 service reduces the reviews to ratings timeout from 10s to 2.5s so that it is compatible with (less than) the timeout of the downstream productpage requests.
If you migrate all traffic to reviews:v3 as described in the traffic shifting task, you can then try to change the delay rule to any amount less than 2.5s, for example 2s, and confirm that the end-to-end flow continues without any errors.
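As a quick sanity check before editing the delay rule, the following Python sketch tests whether a candidate delay fits under both timeouts. It is an illustrative model only: the function name ratings_arrive_in_time is hypothetical, and processing overhead and retries are ignored.

```python
def ratings_arrive_in_time(
    delay_s: float,
    reviews_to_ratings_timeout_s: float,
    productpage_to_reviews_timeout_s: float,
) -> bool:
    """True if the delayed ratings reply lands before either caller gives up."""
    # The reply must beat the shorter of the two timeouts on the call path.
    tightest_timeout_s = min(
        reviews_to_ratings_timeout_s, productpage_to_reviews_timeout_s
    )
    return delay_s < tightest_timeout_s


# reviews:v3 values from the text: 2.5s inner timeout, 3s outer timeout.
for delay_s in (1.0, 2.0, 2.5, 7.0):
    print(delay_s, ratings_arrive_in_time(delay_s, 2.5, 3.0))
```

Under this model a 2s delay passes with the reviews:v3 timeouts, while the original 7s delay fails with both the v2 and the v3 values.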
Injecting an HTTP abort fault
Another way to test microservice resiliency is to introduce an HTTP abort fault. In this task, you will introduce an HTTP abort to the ratings microservice for the test user jason.
In this case, you expect the page to load immediately and display the "Ratings service is currently unavailable" message.
- Create a fault injection rule to send an HTTP abort for user jason:
$ kubectl apply -f @samples/bookinfo/networking/virtual-service-ratings-test-abort.yaml@
- Confirm the rule was created:
$ kubectl get virtualservice ratings -o yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
  ...
spec:
  hosts:
  - ratings
  http:
  - fault:
      abort:
        httpStatus: 500
        percentage:
          value: 100
    match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
Testing the abort configuration
Open the Bookinfo web application in your browser.
On the /productpage web page, log in as user jason.
If the rule propagated successfully to all pods, the page loads immediately and the "Ratings service is currently unavailable" message appears.
- If you log out from user jason or open the Bookinfo application in an anonymous window (or in another browser), you will see that /productpage still calls reviews:v1 (which does not call ratings at all) for everybody but jason. Therefore you will not see any error message.
Cleanup
- Remove the application routing rules:
$ kubectl delete -f @samples/bookinfo/networking/virtual-service-all-v1.yaml@
- If you are not planning to explore any follow-on tasks, refer to the Bookinfo cleanup instructions to shut down the application.