Debugging Your Service
The demo application emojivoto has some issues. Let's use it, together with Linkerd, to diagnose an application that fails in ways which are a little more subtle than the entire service crashing. This guide assumes that you've followed the steps in the Getting Started guide and have Linkerd and the demo application running in a Kubernetes cluster. If you've not done that yet, go get started and come back when you're done!
If you glance at the Linkerd dashboard (by running the linkerd dashboard command), you should see all the resources in the emojivoto namespace, including the deployments. Each deployment running Linkerd shows success rate, requests per second, and latency percentiles.
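To make the three dashboard columns concrete, here is a minimal Python sketch of how success rate, requests per second, and latency percentiles could be derived from a window of observed requests. It is an illustration only, not Linkerd's implementation: the Observation type, the summarize function, and the nearest-rank percentile method are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    """One observed request (hypothetical type, for illustration only)."""
    success: bool
    latency_ms: float


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending, non-empty list (p in 1..100)."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(observations, window_seconds):
    """Return success rate, requests per second, and P50/P95/P99 latency."""
    if not observations:
        return None  # no traffic in the window, so there is nothing to report
    latencies = sorted(o.latency_ms for o in observations)
    successes = sum(1 for o in observations if o.success)
    return {
        "success_rate": successes / len(observations),
        "rps": len(observations) / window_seconds,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }
```

Under these assumptions, a deployment that answered 8 of 10 requests successfully in a 10 second window would report a success rate of 0.8 and 1 request per second.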
That's pretty neat, but the first thing you might notice is that the success rate is well below 100%! Click on web and let's dig in.
You should now be looking at the Deployment page for the web deployment. The first thing you'll see here is that the web deployment is taking traffic from vote-bot (a deployment included with emojivoto to continually generate a low level of live traffic). The web deployment also has two outgoing dependencies, emoji and voting.
While the emoji deployment is handling every request from web successfully, it looks like the voting deployment is failing some requests! A failure in a dependent deployment may be exactly what is causing the errors that web is returning.
If we scroll a little further down the page, we'll see a live list of all traffic that is incoming to and outgoing from web. This is interesting:
There are two calls whose success rate is not at 100%. The first is vote-bot's call to the /api/vote endpoint. The second is the VoteDoughnut call from the web deployment to its dependent deployment, voting. Very interesting! Since /api/vote is an incoming call and VoteDoughnut is an outgoing call, this is a good clue that the VoteDoughnut endpoint is what's causing the problem!
Finally, to dig a little deeper, we can click on the tap icon in the far right column. This will take us to the live list of requests that match only this endpoint. You'll see Unknown under the GRPC status column. This is because the requests are failing with a gRPC status code 2, which is a common error response as you can see from the code. Linkerd is aware of gRPC's response classification without any other configuration!
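For reference, the numeric code maps to a name defined by gRPC itself. The Python sketch below lists the standard gRPC status codes and shows that code 2 is UNKNOWN, which is what the dashboard displays as Unknown. The status_name and is_failure helpers are hypothetical names introduced for this example, and treating every non-OK code as a failure is a simplifying assumption, not a description of how Linkerd classifies responses internally.

```python
from enum import IntEnum


class GrpcStatus(IntEnum):
    """Standard gRPC status codes as defined by the gRPC specification."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def status_name(raw_code):
    """Translate a grpc-status value (string or int) into its name."""
    try:
        return GrpcStatus(int(raw_code)).name
    except ValueError:
        # Not a number, or a number outside the standard range.
        return "UNRECOGNIZED"


def is_failure(raw_code):
    """Simplified rule assumed for this sketch: anything but OK is a failure."""
    return int(raw_code) != GrpcStatus.OK


print(status_name("2"), is_failure("2"))
```

Many gRPC servers report UNKNOWN when a handler returns an error without an explicit status code, so seeing it on a single endpoint usually points at that endpoint's handler rather than at the network.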
At this point, we have everything required to get the endpoint fixed and restore the overall health of our applications.