# Troubleshooting Promtail

This document describes known failure modes of `promtail` on edge cases and the adopted trade-offs.
## A tailed file is truncated while `promtail` is not running

Given the following order of events:

1. `promtail` is tailing `/app.log`
2. `promtail`'s current position for `/app.log` is `100` (byte offset)
3. `promtail` is stopped
4. `/app.log` is truncated and new logs are appended to it
5. `promtail` is restarted
When `promtail` is restarted, it reads the previous position (`100`) from the positions file. Two scenarios are then possible:

- `/app.log` size is less than the position before truncating
- `/app.log` size is greater than or equal to the position before truncating

If the `/app.log` file size is less than the previous position, then the file is detected as truncated and logs will be tailed starting from position `0`. Otherwise, if the `/app.log` file size is greater than or equal to the previous position, `promtail` can't detect it was truncated while not running and will continue tailing the file from position `100`.
Generally speaking, `promtail` uses only the path to the file as the key in the positions file. Whenever `promtail` is started, for each file path referenced in the positions file, `promtail` will read the file from the beginning if the file size is less than the offset stored in the positions file; otherwise it will continue from the offset, regardless of whether the file has been truncated or rolled multiple times while `promtail` was not running.
## Loki is unavailable

For each tailed file, `promtail` reads a line, processes it through the configured `pipeline_stages`, and pushes the log entry to Loki. Log entries are batched together before getting pushed to Loki, based on the max batch duration `client.batch-wait` and the max batch size `client.batch-size-bytes`, whichever is reached first.
In case of any error while sending a batch of log entries, `promtail` adopts a "retry then discard" strategy:

- `promtail` retries sending the log entries to the ingester up to `maxretries` times
- If all retries fail, `promtail` discards the batch of log entries (which will be lost) and proceeds with the next one
You can configure `maxretries` and the delay between two retries via the `backoff_config` in the `promtail` config file:

```yaml
clients:
  - url: INGESTER-URL
    backoff_config:
      minbackoff: 100ms
      maxbackoff: 10s
      maxretries: 10
```
The following table shows an example of the total delay applied by the backoff algorithm with `minbackoff: 100ms` and `maxbackoff: 10s`:
| Retry | Min delay | Max delay | Total min delay | Total max delay |
| ----- | --------- | --------- | --------------- | --------------- |
| 1     | 100ms     | 200ms     | 100ms           | 200ms           |
| 2     | 200ms     | 400ms     | 300ms           | 600ms           |
| 3     | 400ms     | 800ms     | 700ms           | 1.4s            |
| 4     | 800ms     | 1.6s      | 1.5s            | 3s              |
| 5     | 1.6s      | 3.2s      | 3.1s            | 6.2s            |
| 6     | 3.2s      | 6.4s      | 6.3s            | 12.6s           |
| 7     | 6.4s      | 10s       | 12.7s           | 22.6s           |
| 8     | 6.4s      | 10s       | 19.1s           | 32.6s           |
| 9     | 6.4s      | 10s       | 25.5s           | 42.6s           |
| 10    | 6.4s      | 10s       | 31.9s           | 52.6s           |
| 11    | 6.4s      | 10s       | 38.3s           | 62.6s           |
| 12    | 6.4s      | 10s       | 44.7s           | 72.6s           |
| 13    | 6.4s      | 10s       | 51.1s           | 82.6s           |
| 14    | 6.4s      | 10s       | 57.5s           | 92.6s           |
| 15    | 6.4s      | 10s       | 63.9s           | 102.6s          |
| 16    | 6.4s      | 10s       | 70.3s           | 112.6s          |
| 17    | 6.4s      | 10s       | 76.7s           | 122.6s          |
| 18    | 6.4s      | 10s       | 83.1s           | 132.6s          |
| 19    | 6.4s      | 10s       | 89.5s           | 142.6s          |
| 20    | 6.4s      | 10s       | 95.9s           | 152.6s          |
## Log entries pushed after a `promtail` crash / panic / abrupt termination

When `promtail` shuts down gracefully, it saves the last read offsets in the positions file, so that on a subsequent restart it will continue tailing logs without duplicates or losses.
In the event of a crash or abrupt termination, `promtail` can't save the last read offsets in the positions file. When restarted, `promtail` will read the positions file saved at the last sync period and will continue tailing the files from there. This means that if new log entries have been read and pushed to the ingester between the last sync period and the crash, these log entries will be sent again to the ingester on `promtail` restart.
However, for each log stream (set of unique labels), the Loki ingester skips all log entries received out of timestamp order. For this reason, even if duplicate logs may be sent from `promtail` to the ingester, entries whose timestamp is older than the latest received will be discarded, avoiding duplicated logs. To leverage this, it's important that your `pipeline_stages` include the `timestamp` stage, parsing the log entry timestamp from the log line instead of relying on the default behaviour of setting the timestamp to the point in time when the line is read by `promtail`.
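As an illustration, assuming log lines begin with an RFC3339 timestamp, such a pipeline could look like the sketch below. The job name, path, and regex are hypothetical examples, not values from this document:

```yaml
scrape_configs:
  - job_name: app  # hypothetical job name
    static_configs:
      - targets: [localhost]
        labels:
          job: app
          __path__: /app.log
    pipeline_stages:
      # Extract the leading timestamp from the log line (assumed format).
      - regex:
          expression: '^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2}))'
      # Use the extracted value as the entry timestamp instead of the time
      # the line was read.
      - timestamp:
          source: ts
          format: RFC3339
```

With this in place, re-sent entries after a crash carry their original timestamps and are rejected by the ingester as out of order, rather than being stored twice.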