Logstash
Logstash is a real-time event processing engine. It’s part of the OpenSearch stack, which includes OpenSearch, Beats, and OpenSearch Dashboards.
You can send events to Logstash from many different sources. Logstash processes the events and sends them to one or more destinations. For example, you can send access logs from a web server to Logstash. Logstash extracts useful information from each log and sends it to a destination like OpenSearch.
Sending events to Logstash lets you decouple event processing from your app. Your app only needs to send events to Logstash and doesn’t need to know anything about what happens to the events afterwards.
The open-source community originally built Logstash for processing log data, but now you can process any type of event, including events in XML or JSON format.
Structure of a pipeline
To use Logstash, you configure a pipeline that has three phases: inputs, filters, and outputs.
Each phase uses one or more plugins. Logstash has over 200 built-in plugins so chances are that you’ll find what you need. Apart from the built-in plugins, you can use plugins from the community or even write your own.
The structure of a pipeline is as follows:
input {
  input_plugin {}
}
filter {
  filter_plugin {}
}
output {
  output_plugin {}
}
where:
- input receives events like logs from multiple sources simultaneously. Logstash supports a number of input plugins for TCP/UDP, files, syslog, Microsoft Windows EventLogs, stdin, HTTP, and so on. You can also use an open source collection of input tools called Beats to gather events. The input plugin sends the events to a filter.
- filter parses and enriches the events in one way or another. Logstash has a large collection of filter plugins that modify events and pass them on to an output. For example, a grok filter parses unstructured events into fields and a mutate filter changes fields. Filters are executed sequentially.
- output ships the filtered events to one or more destinations. Logstash supports a wide range of output plugins for destinations like OpenSearch, TCP/UDP, emails, files, stdout, HTTP, Nagios, and so on.
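As a rough illustration of the filter phase, the following sketch chains a grok filter and a mutate filter. It assumes input lines shaped like "55.3.244.1 GET /index.html 15824"; the field names (client, method, request, bytes) are chosen here for illustration and are not part of the original guide.

```conf
input {
  stdin { }
}
filter {
  # Split the raw line into named fields using built-in grok patterns.
  # The field names after each colon are illustrative assumptions.
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes}" }
  }
  # Filters run in order, so mutate sees the fields that grok created.
  # grok captures values as strings; convert the byte count to an integer.
  mutate {
    convert => { "bytes" => "integer" }
  }
}
output {
  stdout { }
}
```

Because filters run sequentially, placing mutate before grok in this sketch would leave it with no bytes field to convert.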
Both the input and output phases support codecs to process events as they enter or exit the pipeline. Some of the popular codecs are json and multiline. The json codec processes data that’s in JSON format and the multiline codec merges events that span multiple lines into a single event.
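A minimal sketch of how codecs attach to plugins, assuming you want to treat any line that starts with whitespace (for example, the continuation lines of a stack trace) as part of the previous line. The pattern is an illustrative choice, not a requirement:

```conf
input {
  stdin {
    # Merge any line that begins with whitespace into the previous event.
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}
output {
  stdout {
    # Serialize each event as JSON as it leaves the pipeline.
    codec => json
  }
}
```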
You can also write conditional statements within pipeline configurations to perform certain actions if certain criteria are met.
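For example, a conditional might tag events and then route them to different outputs. This is only a sketch: the "error" tag and the errors.txt file name are hypothetical choices made for illustration.

```conf
filter {
  # Tag any event whose message contains the word "error".
  if [message] =~ /error/ {
    mutate { add_tag => ["error"] }
  }
}
output {
  # Send tagged events to a file and print everything else to the terminal.
  if "error" in [tags] {
    file { path => "errors.txt" }
  } else {
    stdout { }
  }
}
```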
Install Logstash
The OpenSearch Logstash plugin has two installation options at this time: Linux (ARM64/X64) and Docker (ARM64/X64).
Make sure you have Java Development Kit (JDK) version 8 or 11 installed.
If you’re migrating from an existing Logstash installation, you can install the OpenSearch output plugin manually and update pipeline.conf. We include this plugin by default in our tarball and Docker downloads.
Tarball
Download the Logstash tarball from OpenSearch downloads.
Navigate to the downloaded folder in the terminal and extract the files:
tar -zxvf logstash-oss-with-opensearch-output-plugin-7.16.2-linux-x64.tar.gz
Navigate to the logstash-7.16.2 directory.
- You can add your pipeline configurations to the config directory. Logstash saves any data from the plugins in the data directory. The bin directory contains the binaries for starting Logstash and managing plugins.
Docker
Pull the Logstash OSS image that includes the OpenSearch output plugin:
docker pull opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.2
Create a Docker network:
docker network create test
Start OpenSearch with this network:
docker run -p 9200:9200 -p 9600:9600 --name opensearch --net test -e "discovery.type=single-node" opensearchproject/opensearch:1.2.0
Start Logstash:
docker run -it --rm --name logstash --net test opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.2 -e 'input { stdin { } } output {
opensearch {
hosts => ["https://opensearch:9200"]
index => "opensearch-logstash-docker-%{+YYYY.MM.dd}"
user => "admin"
password => "admin"
ssl => true
ssl_certificate_verification => false
}
}'
Process text from the terminal
You can define a pipeline that listens for events on stdin and outputs events on stdout. stdin and stdout refer to the terminal in which you’re running Logstash.
To enter some text in the terminal and see the event data in the output:
Use the -e argument to pass a pipeline configuration directly to the Logstash binary. In this case, stdin is the input plugin and stdout is the output plugin:
bin/logstash -e "input { stdin { } } output { stdout { } }"
Add the --debug flag to see more detailed output.
Enter “hello world” in your terminal. Logstash processes the text and outputs it back to the terminal:
{
"message" => "hello world",
"host" => "a483e711a548.ant.amazon.com",
"@timestamp" => 2021-05-30T05:15:56.816Z,
"@version" => "1"
}
The message field contains your raw input. The host field is an IP address when you don’t run Logstash locally. @timestamp shows the date and time for when the event is processed. Logstash uses the @version field for internal processing.
Press Ctrl + C to shut down Logstash.
Troubleshooting
If you already have a Logstash process running, you’ll get an error. To fix this issue:
Delete the .lock file from the data directory:
cd data
rm -rf .lock
Restart Logstash.
Process JSON or HTTP input and output it to a file
To define a pipeline that handles JSON requests:
Open the config/pipeline.conf file in any text editor you like. You can create a pipeline configuration file with any extension; the .conf extension is a Logstash convention. Add the json codec to accept JSON as the input and the file plugin to output the processed events to a .txt file:
input {
  stdin {
    codec => json
  }
}
output {
  file {
    path => "output.txt"
  }
}
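To process inputs from a file, add an input file to the events-data directory and then pass its path to the file plugin at the input. The file input plugin requires an absolute path, so replace /path/to with the location of your Logstash directory:
input {
  file {
    path => "/path/to/events-data/input_data.log"
  }
}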
Start Logstash:
$ bin/logstash -f config/pipeline.conf
config/pipeline.conf is a relative path to the pipeline.conf file. You can use an absolute path as well.
Add a JSON object in the terminal:
{ "amount": 10, "quantity": 2}
The pipeline only handles a single line of input. If you paste some JSON that spans multiple lines, you’ll get an error.
Check that the fields from the JSON object are added to the output.txt file:
$ cat output.txt
{
"@version": "1",
"@timestamp": "2021-05-30T05:52:52.421Z",
"host": "a483e711a548.ant.amazon.com",
"amount": 10,
"quantity": 2
}
If you type in some invalid JSON as the input, you’ll see a JSON parsing error. Logstash doesn’t discard the invalid JSON because you still might want to do something with it. For example, you can trigger an email or send a notification to a Slack channel.
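One way to act on invalid input is sketched below. When the json codec can't parse a payload, it falls back to plain text and adds a _jsonparsefailure tag to the event, which a conditional can test. The invalid.txt file name is a hypothetical stand-in; an email or Slack notification would use a different output plugin in its place.

```conf
input {
  stdin {
    codec => json
  }
}
output {
  # Events that failed JSON parsing carry the _jsonparsefailure tag.
  if "_jsonparsefailure" in [tags] {
    file { path => "invalid.txt" }
  } else {
    file { path => "output.txt" }
  }
}
```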
To define a pipeline that handles HTTP requests:
Use the http plugin to send events to Logstash through HTTP:
input {
  http {
    host => "127.0.0.1"
    port => 8080
  }
}
output {
  file {
    path => "output.txt"
  }
}
If you don’t specify any options, the http plugin listens on all network interfaces (host 0.0.0.0) on port 8080.
Start Logstash:
$ bin/logstash -f config/pipeline.conf
Use Postman to send an HTTP request. Set the Content-Type HTTP header to application/json:
PUT 127.0.0.1:8080
{
"amount": 10,
"quantity": 2
}
Or, you can use the curl command:
curl -XPUT -H "Content-Type: application/json" -d ' {"amount": 7, "quantity": 3 }' http://localhost:8080
Even though we haven’t added the json codec to the input, the pipeline configuration still works because the HTTP plugin automatically applies the appropriate codec based on the Content-Type header. If you specify a value of application/json, Logstash parses the request body as JSON.
The headers field contains the HTTP headers that Logstash receives:
{
"host": "127.0.0.1",
"quantity": "3",
"amount": 10,
"@timestamp": "2021-05-30T06:05:48.135Z",
"headers": {
"http_version": "HTTP/1.1",
"request_method": "PUT",
"http_user_agent": "PostmanRuntime/7.26.8",
"connection": "keep-alive",
"postman_token": "c6cd29cf-1b37-4420-8db3-9faec66b9e7e",
"http_host": "127.0.0.1:8080",
"cache_control": "no-cache",
"request_path": "/",
"content_type": "application/json",
"http_accept": "*/*",
"content_length": "41",
"accept_encoding": "gzip, deflate, br"
},
"@version": "1"
}
Automatically reload the pipeline configuration
You can configure Logstash to detect any changes to the pipeline configuration file or the input log file and automatically reload the configuration.
The stdin plugin doesn’t support automatic reloading.
Add an option named start_position with a value of beginning to the input plugin:
input {
  file {
    path => "/Users/<user>/Desktop/logstash7-12.1/events-data/input_file.log"
    start_position => "beginning"
  }
}
Logstash only processes any new events added to the input file and ignores the ones that it has already processed to avoid processing the same event more than once on restart.
Logstash records its progress in a file that’s referred to as a sinceDB file. Logstash creates a sinceDB file for each file that it watches for changes.
Open the sinceDB file to check how much of the input file has been processed:
cd data/plugins/inputs/file/
ls -al
-rw-r--r-- 1 user staff 0 Jun 13 10:50 .sincedb_9e484f2a9e6c0d1bdfe6f23ac107ffc5
cat .sincedb_9e484f2a9e6c0d1bdfe6f23ac107ffc5
51575938 1 4 7727
The last number in the sinceDB file (7727) is the byte offset of the last known event processed.
To process the input file from the beginning, delete the sinceDB file:
rm .sincedb_*
Start Logstash with the --config.reload.automatic argument:
bin/logstash -f config/pipeline.conf --config.reload.automatic
The reload option only reloads if you add a new line at the end of the pipeline configuration file.
Sample output:
{
"message" => "216.243.171.38 - - [20/Sep/2017:19:11:52 +0200] \"GET /products/view/123 HTTP/1.1\" 200 12798 \"https://codingexplained.com/products\" \"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)\"",
"@version" => "1",
"host" => "a483e711a548.ant.amazon.com",
"path" => "/Users/kumarjao/Desktop/odfe1/logstash-7.12.1/events-data/input_file.log",
"@timestamp" => 2021-06-13T18:03:30.423Z
}
{
"message" => "91.59.108.75 - - [20/Sep/2017:20:11:43 +0200] \"GET /js/main.js HTTP/1.1\" 200 588 \"https://codingexplained.com/products/view/863\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0\"",
"@version" => "1",
"host" => "a483e711a548.ant.amazon.com",
"path" => "/Users/kumarjao/Desktop/odfe1/logstash-7.12.1/events-data/input_file.log",
"@timestamp" => 2021-06-13T18:03:30.424Z
}
Add a new line to the input file.
- Logstash immediately detects the change and processes the new line as an event.
Make a change to the pipeline.conf file.
- Logstash immediately detects the change and reloads the modified pipeline.
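While experimenting, deleting the sinceDB file before every run can get tedious. A possible alternative, sketched here under the assumption that you're on Linux or macOS and don't need Logstash to remember its position between restarts, is to point the file plugin's sincedb_path option at /dev/null. The /path/to prefix is a placeholder for your own directory.

```conf
input {
  file {
    # Placeholder path; the file plugin expects an absolute path.
    path => "/path/to/events-data/input_file.log"
    start_position => "beginning"
    # Discard progress tracking so the file is read from the start on every restart.
    sincedb_path => "/dev/null"
  }
}
output {
  stdout { }
}
```

This setting suits testing only. In normal operation, the sinceDB file is what prevents events from being processed twice after a restart.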