file
file source is used for log collection.
Example
sources:
- type: file
name: accesslog
Tips
If you use logconfig/clusterlogconfig to collect container logs, additional fields are added to the file source, please refer to here.
paths
field | type | required | default | description |
---|
paths | string array | true | none | The collected paths are matched using glob expressions. Support glob expansion expressions Brace Expansion and Glob Star |
Example
Object files to be collected:
/tmp/loggie/service/order/access.log
/tmp/loggie/service/order/access.log.2022-04-11
/tmp/loggie/service/pay/access.log
/tmp/loggie/service/pay/access.log.2022-04-11
Corresponding configuration:
sources:
- type: file
paths:
- /tmp/loggie/**/access.log{,.[2-9][0-9][0-9][0-9]-[01][0-9]-[0123][0-9]}
excludeFiles
field | type | required | default | description |
---|
excludeFiles | string array | false | none | Exclude collected files regular expression |
Example
sources:
- type: file
paths:
- /tmp/*.log
excludeFiles:
- \.gz$
ignoreOlder
field | type | required | default | description |
---|
ignoreOlder | time.Duration | false | none | for example, 48h, which means to ignore files whose update time is 2 days ago |
ignoreSymlink
field | type | required | default | description |
---|
ignoreSymlink | bool | false | false | whether to ignore symbolic links (soft links) files |
field | type | required | default | description |
---|
addonMeta | bool | false | false | whether to add the default log collection state meta information |
event example
{
"body": "this is test",
"state": {
"pipeline": "local",
"source": "demo",
"filename": "/var/log/a.log",
"timestamp": "2006-01-02T15:04:05.000Z",
"offset": 1024,
"bytes": 4096,
"hostname": "node-1"
}
}
state explanation:
- pipeline: the name of the pipeline where it is located
- source: the name of the source where it is located
- filename: the name of the collected file
- timestamp: the timestamp of the collection time
- offset: the offset of the collected data in the file
- bytes: the number of bytes of data collected
- hostname: the name of the node where it is located
workerCount
field | type | required | default | description |
---|
workerCount | int | false | 1 | The number of worker threads (goroutines) that read the contents of the file. Consider increasing it when there are more than 100 files on a single node |
readBufferSize
field | type | required | default | description |
---|
readBufferSize | int | false | 65536 | The amount of data to read from the file at a time. Default 64K=65536 |
maxContinueRead
field | type | required | default | description |
---|
maxContinueRead | int | false | 16 | The number of times the content of the same file is read continuously. Reaching this number of times cause forced switch to the next file to read. The main function is to prevent active files from occupying reading resources all the time, in which case inactive files cannot be read and collected for a long time. |
maxContinueReadTimeout
field | type | required | default | description |
---|
maxContinueReadTimeout | time.Duration | false | 3s | The maximum reading time of the same file. If this time is exceeded, the next file will be forced to be read. Similar to maxContinueRead |
inactiveTimeout
field | type | required | default | description |
---|
inactiveTimeout | time.Duration | false | 3s | If the file has exceeded inactiveTimeout from the last collection, it is considered that the file has entered an inactive state (that is, the last log has been written), and that the last line of log can be collected safely. |
firstNBytesForIdentifier
field | type | required | default | description |
---|
firstNBytesForIdentifier | int | false | 128 | Use the first n characters of the collected target file to generate the file unique code. If the size of the file is less than n, the file will not be collected temporarily. The main purpose is to accurately identify a file in combination with file inode information and to determine whether the file is deleted or renamed. |
charset
Encoding conversion, used to convert different encodings to utf8.
Example
sources:
- type: file
name: demo
paths:
- /tmp/log/*.log
fields:
topic: "loggie"
charset: "gbk"
field | type | required | default | description |
---|
charset | string | false | utf-8 | Matching model for extracted fields |
The currently supported encoding formats for converting to utf-8 are:
nop
plain
utf-8
gbk
big5
euc-jp
iso2022-jp
shift-jis
euc-kr
iso8859-6e
iso8859-6i
iso8859-8e
iso8859-8i
iso8859-1
iso8859-2
iso8859-3
iso8859-4
iso8859-5
iso8859-6
iso8859-7
iso8859-8
iso8859-9
iso8859-10
iso8859-13
iso8859-14
iso8859-15
iso8859-16
cp437
cp850
cp852
cp855
cp858
cp860
cp862
cp863
cp865
cp866
ebcdic-037
ebcdic-1040
ebcdic-1047
koi8r
koi8u
macintosh
macintosh-cyrillic
windows1250
windows1251
windows1252
windows1253
windows1254
windows1255
windows1256
windows1257
windows1258
windows874
utf-16be-bom
utf-16le-bom
lineDelimiter
Newline symbol configuration
Example
sources:
- type: file
name: demo
lineDelimiter:
type: carriage_return_line_feed
value: "\r\n"
charset: gbk
type
field | type | required | default | description |
---|
type | bool | false | auto | value is only valid when type is custom |
Currently supported types are:
auto
line_feed
vertical_tab
form_feed
carriage_return
carriage_return_line_feed
next_line
line_separator
paragraph_separator
null_terminator
The corresponding newline symbols are:
auto: {'\u000A'},
line_feed: {'\u000A'},
vertical_tab: {'\u000B'},
form_feed: {'\u000C'},
carriage_return: {'\u000D'},
carriage_return_line_feed: []byte("\u000D\u000A"),
next_line: {'\u0085'},
line_separator: []byte("\u2028"),
paragraph_separator: []byte("\u2029"),
null_terminator: {'\u0000'},
```
### value
<table><thead><tr><th><code>field</code></th><th><code>type</code></th><th><code>required</code></th><th><code>default</code></th><th><code>description</code></th></tr></thead><tbody><tr><td>value</td><td>string</td><td>false</td><td>\n</td><td>newline symbol</td></tr></tbody></table>
### charset
<table><thead><tr><th><code>field</code></th><th><code>type</code></th><th><code>required</code></th><th><code>default</code></th><th><code>description</code></th></tr></thead><tbody><tr><td>charset</td><td>string</td><td>false</td><td>utf-8</td><td>newline symbol encoding</td></tr></tbody></table>
## multi
Multi-line collection configuration
Example
sources:
- type: file
name: accesslog
multi:
active: true
```
active
field | type | required | default | description |
---|
active | bool | false | false | whether to enable multi-line |
pattern
field | type | required | default | description |
---|
pattern | string | required when multi.active=true | false | A regular expression that is used to judge whether a line is a brand new log. For example, if it is configured as ‘^[‘, it is considered that a line beginning with [ is a new log, otherwise the content of this line is merged into the previous log as part of the previous log. |
maxLines
field | type | required | default | description |
---|
maxLines | int | false | 500 | Number of lines a log can contains at most. The default is 500 lines. If the upper limit is exceeded, the current log will be forced to be sent, and the excess will be used as a new log. |
maxBytes
field | type | required | default | description |
---|
maxBytes | int64 | false | 131072 | Number of bytes a log can contains at most. The default is 128K. If the upper limit is exceeded, the current log will be forced to be sent, and the excess will be used as a new log. |
timeout
field | type | required | default | description |
---|
timeout | time.Duration | false | 5s | How long to wait for a log to be collected as a complete log. The default is 5s. If the upper limit is exceeded, the current log will be sent, and the excess will be used as a new log. |
ack
Configuration related to the confirmation of the source. If you need to make sure at least once
, you need to turn on the ack mechanism, but there will be a certain performance loss.
Caution
This configuration can only be configured in defaults
Example
defaults:
sources:
- type: file
ack:
enable: true
enable
field | type | required | default | description |
---|
enable | bool | false | true | Whether to enable confirmation |
maintenanceInterval
field | type | required | default | description |
---|
maintenanceInterval | time.Duration | false | 20h | maintenance cycle. Used to regularly clean up expired confirmation data (such as the ack information of files that are no longer collected) |
db
Use sqlite3
as database. Save the file name, file inode, offset of file collection and other information during the collection process. Used to restore the last collection progress after logie reload or restart.
Caution
This configuration can only be configured in defaults.
Example
defaults:
sources:
- type: file
db:
file: "./data/loggie.db"
file
field | type | required | default | description |
---|
file | string | false | ./data/loggie.db | database file path |
tableName
field | type | required | default | description |
---|
tableName | string | false | registry | database table name |
flushTimeout
field | type | required | default | description |
---|
flushTimeout | time.Duration | false | 2s | write the collected information to the database regularly |
bufferSize
field | type | required | default | description |
---|
bufferSize | int | false | 2048 | The buffer size of the collection information written into the database |
cleanInactiveTimeout
field | type | required | default | description |
---|
cleanInactiveTimeout | time.Duration | false | 504h | Clean up outdated data in the database. If the update time of the data exceeds the configured value, the data will be deleted. 21 days by default. |
cleanScanInterval
field | type | required | default | description |
---|
cleanScanInterval | time.Duration | false | 1h | Periodically check the database for outdated data. Check every 1 hour by default |
watcher
Configuration for monitoring file changes
Caution
This configuration can only be configured in defaults
Example
defaults:
sources:
- type: file
watcher:
enableOsWatch: true
enableOsWatch
field | type | required | default | description |
---|
enableOsWatch | bool | false | true | Whether to enable the monitoring notification mechanism of the OS. For example, inotify of linux |
scanTimeInterval
field | type | required | default | description |
---|
scanTimeInterval | time.Duration | false | 10s | Periodically check file status changes (such as file creation, deletion, etc.). Check every 10s by default |
maintenanceInterval
field | type | required | default | description |
---|
maintenanceInterval | time.Duration | false | 5m | Periodic maintenance work (such as reporting and collecting statistics, cleaning files, etc.) |
fdHoldTimeoutWhenInactive
field | type | required | default | description |
---|
fdHoldTimeoutWhenInactive | time.Duration | false | 5m | When the time from the last collection of the file to the present exceeds the limit (the file has not been written for a long time, it is considered that there is a high probability that the content will not be written again), the handle of the file will be released to release system resources |
fdHoldTimeoutWhenRemove
field | type | required | default | description |
---|
fdHoldTimeoutWhenRemove | time.Duration | false | 5m | When the file is deleted and the collection is not completed, it will wait for the maximum time to complete the collection. If the limit is exceeded, no matter whether the file is finally collected or not, the handle will be released directly and no longer collected. |
maxOpenFds
field | type | required | default | description |
---|
maxOpenFds | int | false | 512 | The maximum number of open file handles. If the limit is exceeded, the files will not be collected temporarily |
maxEofCount
field | type | required | default | description |
---|
maxEofCount | int | false | 3 | The maximum number of times EoF is encountered in consecutive reads of a file. If the limit is exceeded, it is considered that the file is temporarily inactive and will enter the “zombie” queue to wait for the update event to be activated. |
cleanWhenRemoved
field | type | required | default | description |
---|
cleanWhenRemoved | bool | false | true | When the file is deleted, whether to delete the collection-related information in the db synchronously. |
readFromTail
field | type | required | default | description |
---|
readFromTail | bool | false | false | Whether to start collecting from the latest line of the file, regardless of writing history. It is suitable for scenarios such as migration of collection systems. |
taskStopTimeout
field | type | required | default | description |
---|
taskStopTimeout | time.Duration | false | 30s | The timeout period for the collection task to exit. It is a bottom-up solution when Loggie cannot be reloaded. |
cleanFiles
File clearing related configuration. Expired and collected files will be deleted directly from the disk to free up disk space.
maxHistoryDays
field | type | required | default | description |
---|
maxHistoryDays | int | false | none | Maximum number of days to keep files (after collection). If the limit is exceeded, the file will be deleted directly from the disk. If not configured, the file will never be deleted |