Logstash

Logstash is a powerful, open source data processing tool that accepts unstructured text in many different formats from many different sources (directly over TCP or UDP, via Unix sockets, or by reading files from disk, for example) and transforms those inputs into structured, searchable documents.

One of the most common use cases is to take text-based system and application logs, extract individual fields (e.g. host names, error codes, and timing data), and make the data available in Elasticsearch for searching and reporting.

In this guide, we will cover the basics of getting your Traffic Server log data into Logstash. Going the next step and building fancy Kibana dashboards on top of that is currently left as an exercise for the reader.

Traffic Server Log Formats

Traffic Server provides a very flexible set of logging outputs. Almost any format can be constructed. The full range of options is covered in the Logging chapter.

This guide will walk you through using the appropriate filters in Logstash for the common logging formats in Traffic Server. If you have constructed your own custom log formats, you will need to build upon these examples and refer to the Logstash documentation to produce custom filters capable of parsing your own formats.

Logstash Input

For the on-disk logs produced by Traffic Server, you will want to use Logstash’s file input plugin. Note that your logs must be in ASCII format, not binary, for the plugin to work.

Assuming that your Traffic Server event logs are named access-<rotationtimestamp>.log and stored at /var/log/trafficserver/, the following Logstash input configuration should work:

  input {
    file {
      path => "/var/log/trafficserver/access-*.log"
    }
  }

Logstash provides additional tuning options, which are explained in the file plugin documentation, but the above is the bare minimum required to have Logstash read log data from local disk.
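As an illustration of those tuning options, the sketch below adds two settings from the file input plugin. By default the plugin only reads lines appended after Logstash starts, so start_position is set to pick up existing log content as well. The sincedb_path value is a hypothetical location chosen for this example, not a Traffic Server or Logstash default.

```logstash
input {
  file {
    path => "/var/log/trafficserver/access-*.log"
    # Read files from the top the first time they are seen. The default,
    # "end", only picks up lines written after Logstash starts.
    start_position => "beginning"
    # Hypothetical path where the plugin records how far it has read.
    sincedb_path => "/var/lib/logstash/sincedb-trafficserver"
  }
}
```

Note that start_position only affects files Logstash has not seen before; once a read position has been recorded, reading resumes from there.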

Logstash Filters

The grok filter in Logstash allows you to completely tailor the parsing of your source data and extract as many or as few fields as you like.

Logstash ships with a number of prebuilt patterns that can be referenced by name. If you have built custom log formats for Traffic Server, however, you may need to write your own patterns.
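One way to define your own pattern is the grok filter's pattern_definitions option, which registers named patterns for that filter only. The following is a minimal sketch for a made-up custom log format in which each line holds a client address followed by a cache result code. Both the log layout and the CACHERESULT pattern name are assumptions for illustration.

```logstash
filter {
  grok {
    # CACHERESULT is a hypothetical pattern name, defined here for this
    # filter only: one or more upper-case letters or underscores.
    pattern_definitions => { "CACHERESULT" => "[A-Z_]+" }
    # Assumed custom log line layout: "<client address> <cache result code>"
    match => { "message" => "^%{IPORHOST:clientip} %{CACHERESULT:cachecode}$" }
  }
}
```

For patterns shared by several filters, the grok filter's patterns_dir option can point at a directory of pattern files instead.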

Squid Compatible

The Squid log format includes, unsurprisingly, a few useful fields for proxy servers. Using the following grok pattern will extract this information from your Traffic Server logs if you employ the Squid compatible log format:

  filter {
    grok {
      # GREEDYDATA is used for the final field: a trailing DATA is non-greedy and would match nothing.
      match => { "message" => "%{NUMBER:timestamp} %{NUMBER:timetoserve} %{IPORHOST:clientip} %{WORD:cachecode}/%{NUMBER:response} %{NUMBER:bytes} %{WORD:verb} %{NOTSPACE:request} %{USER:auth} %{NOTSPACE:route} %{GREEDYDATA:contenttype}" }
    }
    date {
      match => [ "timestamp", "UNIX" ]
    }
  }

The resulting structured document will contain the following fields:

Field        Description
timestamp    Date and time of the client request.
timetoserve  Time, in milliseconds, from initial client connection to Traffic Server until the last byte has been sent back to client from Traffic Server.
clientip     Client IP address or hostname.
cachecode    Cache Result Codes.
response     HTTP response status code sent by Traffic Server to the client.
bytes        Length, in bytes, of the Traffic Server response to the client, including headers.
verb         HTTP method (e.g. GET, POST, etc.) of the client request.
request      URL specified by the client request.
auth         Authentication username supplied by the client, if present.
route        Proxy hierarchy route; the route used by Traffic Server to retrieve the cache object.
contenttype  Content type of the response.
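Grok stores every captured field as a string unless told otherwise, which limits numeric range queries and aggregations once the documents are indexed. One possible remedy, sketched here, is a mutate filter placed after the grok filter. Which fields to convert is a choice made for this example.

```logstash
filter {
  mutate {
    # Convert the numeric captures from strings to integers.
    convert => {
      "timetoserve" => "integer"
      "response"    => "integer"
      "bytes"       => "integer"
    }
  }
}
```

Filters run in the order they appear in the configuration, so this block needs to come after the grok filter that creates the fields.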

Netscape Common

If your Traffic Server instance is already outputting Netscape Common format logs, then Logstash’s COMMONAPACHELOG pattern will handle your logs out of the box. Add the following filter block to your Logstash configuration:

  filter {
    grok {
      match => { "message" => "%{COMMONAPACHELOG}" }
    }
  }

This will produce a structured document for each log entry with the following fields:

Field        Description
clientip     Client IP address or hostname.
ident        Always a literal - character for Traffic Server logs.
auth         The authentication username for the client request. A - means no authentication was required (or supplied).
timestamp    The date and time of the client request.
verb         HTTP method used for the request (e.g. GET, POST, etc.).
request      URL specified by the client request.
httpversion  HTTP version (e.g. 1.1) used by the client.
rawrequest   See note below.
response     HTTP status code used for Traffic Server response (not the origin’s response code).
bytes        Length of Traffic Server response to client, in bytes.
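The field names above are those of the pattern's original, pre-ECS definition. When the grok filter's ECS compatibility mode is enabled, the same pattern emits Elastic Common Schema style field names instead. If your documents lack the fields listed here, the following sketch may restore them. It assumes a grok plugin version that supports the ecs_compatibility option.

```logstash
filter {
  grok {
    # Assumes the installed grok plugin supports ecs_compatibility.
    # "disabled" selects the legacy field names used in this guide.
    ecs_compatibility => "disabled"
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}
```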

Note

rawrequest is populated when the usual "<verb> <request> HTTP/<httpversion>" pattern is not matched. In that event, the verb, request, and httpversion fields will be missing from the document, and rawrequest will contain the original request string instead.
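Unlike the Squid example, the Netscape Common filter above leaves timestamp as a plain string, so the event's @timestamp reflects when Logstash read the line rather than when the request was made. A date filter can correct that. The sketch below assumes the logged timestamps look like 10/Oct/2000:13:55:36 -0700, the layout matched by the HTTPDATE pattern inside COMMONAPACHELOG.

```logstash
filter {
  date {
    # Parse the request time captured by COMMONAPACHELOG and use it as
    # the event's @timestamp. The format string assumes day/month/year,
    # a 24-hour time, and a numeric time zone offset.
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```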

Netscape Extended

The following pattern extends COMMONAPACHELOG to capture the additional fields found in Netscape Extended:

  filter {
    grok {
      match => { "message" => "%{COMMONAPACHELOG} %{NUMBER:originstatus} %{NUMBER:originrespbytes} %{NUMBER:clientreqbytes} %{NUMBER:proxyreqbytes} %{NUMBER:clienthdrbytes} %{NUMBER:proxyresphdrbytes} %{NUMBER:proxyreqhdrbytes} %{NUMBER:originhdrbytes} %{NUMBER:timetoserve}" }
    }
  }

Because this starts out with the COMMONAPACHELOG pattern, you will get all of the fields mentioned in Netscape Common above, as well as the following:

Field              Description
originstatus       HTTP status code returned by origin server.
originrespbytes    Body length, in bytes, of origin’s response to Traffic Server.
clientreqbytes     Body length, in bytes, of client request to Traffic Server.
proxyreqbytes      Body length, in bytes, of Traffic Server request to origin.
clienthdrbytes     Header length, in bytes, of client request to Traffic Server.
proxyresphdrbytes  Header length, in bytes, of Traffic Server response to client.
proxyreqhdrbytes   Header length, in bytes, of Traffic Server request to origin.
originhdrbytes     Header length, in bytes, of origin’s response to Traffic Server.
timetoserve        Time, in seconds, from initial client connection to Traffic Server until the last byte has been sent back to client from Traffic Server.
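Because all of these additional fields are numeric, it can be convenient to have grok emit them as numbers rather than strings by adding its optional type suffix (:int or :float) to each capture. The variant below is a sketch of that approach. Treating timetoserve as a float is an assumption about how your logs record it.

```logstash
filter {
  grok {
    # Same pattern as above, with a type suffix on each added capture.
    # Fields that come from COMMONAPACHELOG itself remain strings.
    match => { "message" => "%{COMMONAPACHELOG} %{NUMBER:originstatus:int} %{NUMBER:originrespbytes:int} %{NUMBER:clientreqbytes:int} %{NUMBER:proxyreqbytes:int} %{NUMBER:clienthdrbytes:int} %{NUMBER:proxyresphdrbytes:int} %{NUMBER:proxyreqhdrbytes:int} %{NUMBER:originhdrbytes:int} %{NUMBER:timetoserve:float}" }
  }
}
```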

Further Reading