Logstash

Logstash is a real-time event processing engine. It’s part of the OpenSearch stack, which includes OpenSearch, Beats, and OpenSearch Dashboards.

You can send events to Logstash from many different sources. Logstash processes the events and sends them to one or more destinations. For example, you can send access logs from a web server to Logstash. Logstash extracts useful information from each log and sends it to a destination like OpenSearch.

Sending events to Logstash lets you decouple event processing from your app. Your app only needs to send events to Logstash and doesn’t need to know anything about what happens to the events afterwards.

The open-source community originally built Logstash for processing log data, but now you can process any type of event, including events in XML or JSON format.

Structure of a pipeline

To use Logstash, you configure a pipeline that has three phases: inputs, filters, and outputs.

Each phase uses one or more plugins. Logstash has over 200 built-in plugins so chances are that you’ll find what you need. Apart from the built-in plugins, you can use plugins from the community or even write your own.

The structure of a pipeline is as follows:

  input {
    input_plugin { }
  }
  filter {
    filter_plugin { }
  }
  output {
    output_plugin { }
  }

where:

  • input receives events like logs from multiple sources simultaneously. Logstash supports a number of input plugins for TCP/UDP, files, syslog, Microsoft Windows EventLogs, stdin, HTTP, and so on. You can also use an open source collection of input tools called Beats to gather events. The input plugin sends the events to a filter.
  • filter parses and enriches the events. Logstash has a large collection of filter plugins that modify events and pass them on to an output. For example, a grok filter parses unstructured events into fields and a mutate filter changes fields. Filters run sequentially, in the order in which they appear in the configuration.
  • output ships the filtered events to one or more destinations. Logstash supports a wide range of output plugins for destinations like OpenSearch, TCP/UDP, emails, files, stdout, HTTP, Nagios, and so on.
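
The three phases described above can be sketched as one small pipeline. This is only an illustration, not a recommended configuration: it assumes the input lines are Apache-style access logs, it relies on the COMBINEDAPACHELOG pattern that ships with the grok filter, and the pipeline_stage field name is a hypothetical label chosen for this example.

```conf
# Input phase: read one event per line from the terminal.
input {
  stdin { }
}

# Filter phase: filters run in the order listed here.
filter {
  # grok parses the unstructured "message" field into named fields.
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # mutate changes the event; here it adds a hypothetical marker field.
  mutate {
    add_field => { "pipeline_stage" => "parsed" }
  }
}

# Output phase: print each processed event back to the terminal.
output {
  stdout { }
}
```

To send the same events to more than one destination, list several output plugins inside the output block.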

Both the input and output phases support codecs to process events as they enter or exit the pipeline. Some of the popular codecs are json and multiline. The json codec processes data that’s in JSON format and the multiline codec merges multiple lines into a single event.
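
As a rough sketch of how codecs attach to plugins, the following pipeline uses the multiline codec on the input and the json codec on the output. The pattern is an assumption made for illustration: it treats any line that starts with whitespace (for example, the continuation lines of a stack trace) as part of the previous line.

```conf
input {
  stdin {
    # Lines that begin with whitespace are appended to the previous line,
    # so several physical lines become one event.
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}

output {
  stdout {
    # Encode each outgoing event as JSON.
    codec => json
  }
}
```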

You can also write conditional statements within pipeline configurations to perform certain actions when certain criteria are met.
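
A conditional might look like the following sketch. The has_error tag and the errors.txt file name are hypothetical names used only for this example; the condition itself assumes that each event carries its raw text in the message field.

```conf
filter {
  # Tag events whose message contains the text "error".
  if [message] =~ /error/ {
    mutate { add_tag => ["has_error"] }
  }
}

output {
  # Route tagged events to a file and print everything else.
  if "has_error" in [tags] {
    file { path => "errors.txt" }
  } else {
    stdout { }
  }
}
```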

Install Logstash

The OpenSearch Logstash plugin has two installation options at this time: Linux (ARM64/X64) and Docker (ARM64/X64).

Make sure you have Java Development Kit (JDK) version 8 or 11 installed.

If you’re migrating from an existing Logstash installation, you can install the OpenSearch output plugin manually and update pipeline.conf. We include this plugin by default in our tarball and Docker downloads.

Tarball

  1. Download the Logstash tarball from OpenSearch downloads.

  2. Navigate to the downloaded folder in the terminal and extract the files:

    tar -zxvf logstash-oss-with-opensearch-output-plugin-7.16.2-linux-x64.tar.gz
  3. Navigate to the logstash-7.16.2 directory.

    • You can add your pipeline configurations to the config directory. Logstash saves any data from the plugins in the data directory. The bin directory contains the binaries for starting Logstash and managing plugins.

Docker

  1. Pull the Logstash OSS image that includes the OpenSearch output plugin:

    docker pull opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.2
  2. Create a Docker network:

    docker network create test
  3. Start OpenSearch with this network:

    docker run -p 9200:9200 -p 9600:9600 --name opensearch --net test -e "discovery.type=single-node" opensearchproject/opensearch:1.2.0
  4. Start Logstash:

    docker run -it --rm --name logstash --net test opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.2 -e 'input { stdin { } } output {
      opensearch {
        hosts => ["https://opensearch:9200"]
        index => "opensearch-logstash-docker-%{+YYYY.MM.dd}"
        user => "admin"
        password => "admin"
        ssl => true
        ssl_certificate_verification => false
      }
    }'

Process text from the terminal

You can define a pipeline that listens for events on stdin and outputs events on stdout. stdin and stdout refer to the terminal in which you’re running Logstash.

To enter some text in the terminal and see the event data in the output:

  1. Use the -e argument to pass a pipeline configuration directly to the Logstash binary. In this case, stdin is the input plugin and stdout is the output plugin:

    bin/logstash -e "input { stdin { } } output { stdout { } }"

    Add the --debug flag to see more detailed output.

  2. Enter “hello world” in your terminal. Logstash processes the text and outputs it back to the terminal:

    {
        "message" => "hello world",
        "host" => "a483e711a548.ant.amazon.com",
        "@timestamp" => 2021-05-30T05:15:56.816Z,
        "@version" => "1"
    }

    The message field contains your raw input. The host field is an IP address when you don’t run Logstash locally. @timestamp shows the date and time for when the event is processed. Logstash uses the @version field for internal processing.

  3. Press Ctrl + C to shut down Logstash.

Troubleshooting

If you already have a Logstash process running, you’ll get an error. To fix this issue:

  1. Delete the .lock file from the data directory:

    cd data
    rm -rf .lock
  2. Restart Logstash.

Process JSON or HTTP input and output it to a file

To define a pipeline that handles JSON requests:

  1. Open the config/pipeline.conf file in any text editor. You can create a pipeline configuration file with any extension; the .conf extension is a Logstash convention. Add the json codec to accept JSON as the input and the file plugin to output the processed events to a .txt file:

    input {
      stdin {
        codec => json
      }
    }
    output {
      file {
        path => "output.txt"
      }
    }

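    To process inputs from a file, add an input file to the events-data directory and then pass its path to the file plugin at the input. The file input plugin expects an absolute path, so replace /path/to in the following example with the location of your Logstash directory:

    input {
      file {
        path => "/path/to/events-data/input_data.log"
      }
    }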
  2. Start Logstash:

    $ bin/logstash -f config/pipeline.conf

    config/pipeline.conf is a relative path to the pipeline.conf file. You can use an absolute path as well.

  3. Add a JSON object in the terminal:

    { "amount": 10, "quantity": 2}

    The pipeline only handles a single line of input. If you paste some JSON that spans multiple lines, you’ll get an error.

  4. Check that the fields from the JSON object are added to the output.txt file:

    $ cat output.txt
    {
      "@version": "1",
      "@timestamp": "2021-05-30T05:52:52.421Z",
      "host": "a483e711a548.ant.amazon.com",
      "amount": 10,
      "quantity": 2
    }

    If you type in some invalid JSON as the input, you’ll see a JSON parsing error. Logstash doesn’t discard the invalid JSON because you still might want to do something with it. For example, you can trigger an email or send a notification to a Slack channel.
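
One possible way to act on invalid JSON is sketched below. It assumes that the json codec marks events it could not parse with the _jsonparsefailure tag; the invalid_json.txt file name is a hypothetical choice for this example, and the file output could be swapped for an email or notification plugin.

```conf
input {
  stdin {
    codec => json
  }
}

output {
  # Events that failed JSON parsing go to a separate file for follow-up.
  if "_jsonparsefailure" in [tags] {
    file {
      path => "invalid_json.txt"
    }
  } else {
    file {
      path => "output.txt"
    }
  }
}
```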

To define a pipeline that handles HTTP requests:

  1. Use the http plugin to send events to Logstash through HTTP:

    input {
      http {
        host => "127.0.0.1"
        port => 8080
      }
    }
    output {
      file {
        path => "output.txt"
      }
    }

    If you don’t specify any options, the http plugin listens on all network interfaces (0.0.0.0) on port 8080.

  2. Start Logstash:

    $ bin/logstash -f config/pipeline.conf
  3. Use Postman to send an HTTP request. Set the Content-Type HTTP header to application/json:

    PUT 127.0.0.1:8080
    {
      "amount": 10,
      "quantity": 2
    }

    Or, you can use the curl command:

    curl -XPUT -H "Content-Type: application/json" -d ' {"amount": 7, "quantity": 3 }' http://localhost:8080

    Even though we haven’t added the json codec to the input, the pipeline configuration still works because the http plugin automatically applies the appropriate codec based on the Content-Type header. If you specify a value of application/json, Logstash parses the request body as JSON.

    The headers field contains the HTTP headers that Logstash receives:

    {
      "host": "127.0.0.1",
      "quantity": "3",
      "amount": 10,
      "@timestamp": "2021-05-30T06:05:48.135Z",
      "headers": {
        "http_version": "HTTP/1.1",
        "request_method": "PUT",
        "http_user_agent": "PostmanRuntime/7.26.8",
        "connection": "keep-alive",
        "postman_token": "c6cd29cf-1b37-4420-8db3-9faec66b9e7e",
        "http_host": "127.0.0.1:8080",
        "cache_control": "no-cache",
        "request_path": "/",
        "content_type": "application/json",
        "http_accept": "*/*",
        "content_length": "41",
        "accept_encoding": "gzip, deflate, br"
      },
      "@version": "1"
    }
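
If you don’t need the request headers in the stored event, a filter can drop them before the output phase. The following is a minimal sketch that assumes the headers arrive in a field named headers, as in the sample output above; it uses the mutate filter’s remove_field option.

```conf
input {
  http {
    host => "127.0.0.1"
    port => 8080
  }
}

filter {
  # Remove the nested HTTP header data from every event.
  mutate {
    remove_field => ["headers"]
  }
}

output {
  file {
    path => "output.txt"
  }
}
```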

Automatically reload the pipeline configuration

You can configure Logstash to detect any changes to the pipeline configuration file or the input log file and automatically reload the configuration.

The stdin plugin doesn’t support automatic reloading.

  1. Add an option named start_position with a value of beginning to the input plugin:

    input {
      file {
        path => "/Users/<user>/Desktop/logstash7-12.1/events-data/input_file.log"
        start_position => "beginning"
      }
    }

    Logstash processes only new events added to the input file and ignores the ones that it has already processed, so that the same event isn’t processed more than once on restart.

    Logstash records its progress in a file that’s referred to as a sinceDB file. Logstash creates a sinceDB file for each file that it watches for changes.

  2. Open the sinceDB file to check how much of the input file has been processed:

    $ cd data/plugins/inputs/file/
    $ ls -al
    -rw-r--r-- 1 user staff 0 Jun 13 10:50 .sincedb_9e484f2a9e6c0d1bdfe6f23ac107ffc5
    $ cat .sincedb_9e484f2a9e6c0d1bdfe6f23ac107ffc5
    51575938 1 4 7727

    The last number in the sinceDB file (7727) is the byte offset of the last known event processed.

  3. To process the input file from the beginning, delete the sinceDB file:

    rm .sincedb_*
  4. Start Logstash with the --config.reload.automatic argument:

    bin/logstash -f config/pipeline.conf --config.reload.automatic

    The reload option only reloads if you add a new line at the end of the pipeline configuration file.

    Sample output:

    {
        "message" => "216.243.171.38 - - [20/Sep/2017:19:11:52 +0200] \"GET /products/view/123 HTTP/1.1\" 200 12798 \"https://codingexplained.com/products\" \"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)\"",
        "@version" => "1",
        "host" => "a483e711a548.ant.amazon.com",
        "path" => "/Users/kumarjao/Desktop/odfe1/logstash-7.12.1/events-data/input_file.log",
        "@timestamp" => 2021-06-13T18:03:30.423Z
    }
    {
        "message" => "91.59.108.75 - - [20/Sep/2017:20:11:43 +0200] \"GET /js/main.js HTTP/1.1\" 200 588 \"https://codingexplained.com/products/view/863\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0\"",
        "@version" => "1",
        "host" => "a483e711a548.ant.amazon.com",
        "path" => "/Users/kumarjao/Desktop/odfe1/logstash-7.12.1/events-data/input_file.log",
        "@timestamp" => 2021-06-13T18:03:30.424Z
    }
  5. Add a new line to the input file.

    • Logstash immediately detects the change and processes the new line as an event.
  6. Make a change to the pipeline.conf file.
    • Logstash immediately detects the change and reloads the modified pipeline.
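
While you experiment, deleting the sinceDB file before every run can get tedious. One common alternative on Linux and macOS, sketched here as an assumption rather than as part of the steps above, is to point the file plugin’s sincedb_path option at /dev/null so that Logstash doesn’t keep its progress between runs. The /path/to prefix is a placeholder for your own absolute path.

```conf
input {
  file {
    # Placeholder path; the file plugin expects an absolute path.
    path => "/path/to/events-data/input_file.log"
    start_position => "beginning"
    # Discard the recorded read position so every restart rereads the file.
    sincedb_path => "/dev/null"
  }
}

output {
  stdout { }
}
```

Use this only for testing. In a long-running pipeline, it causes Logstash to process the same events again after each restart.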