Build and Deployment

1. Configuration

  cd inlong-agent

The agent supports two modes of operation: local operation and online operation.

Agent configuration

Online operation pulls its configuration from inlong-manager. Set the following in conf/agent.properties:

  # class name used to fetch tasks (default: ManagerFetcher)
  agent.fetcher.classname=org.apache.inlong.agent.plugin.fetcher.ManagerFetcher
  # local IP address of the agent host
  agent.local.ip=<local ip>
  # inlong-manager web host
  agent.manager.vip.http.host=<manager web host>
  # inlong-manager web port
  agent.manager.vip.http.port=<manager web port>
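
Before starting the agent in online mode, it can help to confirm that the four keys listed above are actually set. The following is a minimal sketch, not part of InLong: the helper names `parse_properties` and `missing_manager_keys` are hypothetical, and the parser only handles plain `key=value` lines and `#` comments.

```python
# Keys the online mode is described as using in conf/agent.properties.
REQUIRED_KEYS = (
    "agent.fetcher.classname",
    "agent.local.ip",
    "agent.manager.vip.http.host",
    "agent.manager.vip.http.port",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_manager_keys(props):
    """Return the listed keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


# Sample content with the manager port deliberately left out.
sample = """
# hypothetical values for illustration only
agent.fetcher.classname=org.apache.inlong.agent.plugin.fetcher.ManagerFetcher
agent.local.ip=127.0.0.1
agent.manager.vip.http.host=127.0.0.1
"""

print(missing_manager_keys(parse_properties(sample)))
# prints ['agent.manager.vip.http.port']
```

Because agent.fetcher.classname has a default, a stricter or looser check may suit your setup better; adjust REQUIRED_KEYS accordingly.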

2. Run

After decompression, run the following command:

  sh agent.sh start

3. Add job configuration in real time

3.1 In agent.properties, modify the following two settings:

  # whether to enable the http service
  agent.http.enable=true
  # http port (the curl examples in this document use 8008)
  agent.http.port=<an available port>
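
Since agent.http.port must be a port that is not already in use, a quick check before editing the file can save a failed start. This is a minimal sketch under the assumption that the agent listens on the local host; `is_port_free` is a hypothetical helper, not an InLong API.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Binding failed, so something else already holds the port.
            return False
        return True


# 8008 is the port used by the curl examples in this document.
print(is_port_free(8008))
```

The result only reflects the moment of the check; another process can still take the port before the agent starts.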

3.2 Execute the following command:

  curl --location --request POST 'http://localhost:8008/config/job' \
  --header 'Content-Type: application/json' \
  --data '{
    "job": {
      "dir": {
        "path": "",
        "pattern": "/data/inlong-agent/test.log"
      },
      "trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
      "id": 1,
      "thread": {
        "running": {
          "core": "4"
        },
        "onejob": true
      },
      "name": "fileAgentTest",
      "source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
      "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
      "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
    },
    "proxy": {
      "groupId": "groupId10",
      "streamId": "groupId10"
    },
    "op": "add"
  }'

The meaning of each parameter is as follows:

  - job.dir.pattern: The path of the files to read; it may contain regular expressions
  - job.trigger: The trigger class; the default is DirectoryTrigger, which monitors the files under a folder and generates events
  - job.source: The type of data source; the default is TextFileSource, which reads text files
  - job.sink: The type of writer; the default is ProxySink, which sends messages to the proxy
  - proxy.groupId: The groupId used when writing to the proxy. It is the group id shown on the data access page in inlong-manager, not the topic name
  - proxy.streamId: The streamId used when writing to the proxy. It is the data flow id shown in the data flow window in inlong-manager
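
When jobs are submitted from a script rather than by hand, the same request can be assembled programmatically. The sketch below mirrors the curl command using only the Python standard library; `build_job_payload` and `build_request` are hypothetical helper names, and the field values are copied from the example rather than taken from any InLong client API.

```python
import json
import urllib.request


def build_job_payload(pattern, group_id, stream_id, job_id=1, name="fileAgentTest"):
    """Build the same JSON body that the curl example sends."""
    return {
        "job": {
            "dir": {"path": "", "pattern": pattern},
            "trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
            "id": job_id,
            "thread": {"running": {"core": "4"}, "onejob": True},
            "name": name,
            "source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
            "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
            "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel",
        },
        "proxy": {"groupId": group_id, "streamId": stream_id},
        "op": "add",
    }


def build_request(payload, url="http://localhost:8008/config/job"):
    """Wrap the payload in a POST request with a JSON content type."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = build_job_payload("/data/inlong-agent/test.log", "groupId10", "groupId10")
request = build_request(payload)
# To actually submit the job (requires a running agent with the http service enabled):
#     urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

Keeping the payload construction separate from sending makes it easy to inspect or log the JSON before it reaches the agent.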

4. Examples of directory configuration

  For example:
  /data/inlong-agent/test.log // read the new file test.log in the inlong-agent folder
  /data/inlong-agent/test[0-9]{1} // read new files in the inlong-agent folder named test followed by a single digit
  /data/inlong-agent/test // if test is a directory, read all new files under test
  /data/inlong-agent/^\\d+(\\.\\d+)? // read new files whose names start with one or more digits, optionally followed by "." and one or more digits (? marks the group as optional); matches, for example, "5", "1.5" and "2.21"
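
To see which file names the regular-expression examples above would select, the patterns can be tried out in isolation. This is a rough sketch with two assumptions: the pattern is applied to the file name only, and the whole name must match. The agent itself runs on Java, whose regex engine behaves the same way for these simple patterns. The doubled backslashes in the path above are string escaping; in a Python raw string they are written once. `file_name_matches` is a hypothetical helper.

```python
import re


def file_name_matches(pattern, file_name):
    """Return True if the whole file name matches the pattern."""
    return re.fullmatch(pattern, file_name) is not None


# File-name part of /data/inlong-agent/test[0-9]{1}
digit_suffix = r"test[0-9]{1}"
# File-name part of /data/inlong-agent/^\\d+(\\.\\d+)?
decimal_name = r"^\d+(\.\d+)?"

for name in ["test1", "test12", "test"]:
    print(name, file_name_matches(digit_suffix, name))

for name in ["5", "1.5", "2.21", "a5"]:
    print(name, file_name_matches(decimal_name, name))
```

Under these assumptions only test1 matches the first pattern, and "5", "1.5" and "2.21" match the second.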

5. Getting the data time from the file name

The agent can take the time embedded in a file name as the production time of the data. Configure the path as follows:

  /data/inlong-agent/***YYYYMMDDHH***

Here YYYYMMDDHH is the data time: YYYY is the year, MM the month, DD the day, and HH the hour. *** stands for any characters.

You also need to add the data cycle to the job configuration by setting the property job.cycleUnit when adding a task. Day and hour cycles are currently supported, so job.cycleUnit takes one of two values:

  D: the data time has day granularity
  H: the data time has hour granularity

For example, suppose the configured data source is

  /data/inlong-agent/YYYYMMDDHH.log

and data is written to 2021020211.log.

If job.cycleUnit is D, the agent tries to read 2021020211.log at time 2021020211 and writes all data in the file to the backend proxy with the data time 20210202.

If job.cycleUnit is H, all data collected from 2021020211.log is written to the backend proxy with the data time 2021020211.

Example of job submission:

  curl --location --request POST 'http://localhost:8008/config/job' \
  --header 'Content-Type: application/json' \
  --data '{
    "job": {
      "dir": {
        "path": "",
        "pattern": "/data/inlong-agent/test.log"
      },
      "trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
      "id": 1,
      "thread": {
        "running": {
          "core": "4"
        }
      },
      "name": "fileAgentTest",
      "cycleUnit": "D",
      "source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
      "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
      "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
    },
    "proxy": {
      "groupId": "groupId10",
      "streamId": "streamId10"
    },
    "op": "add"
  }'
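
The effect of job.cycleUnit on the data time can be illustrated with a small calculation. The sketch below only reproduces the truncation described in this section (day granularity drops the hour); it is not the agent's implementation, and `data_time_from_file_name` is a hypothetical helper that assumes the file name contains exactly one YYYYMMDDHH stamp.

```python
import re
from datetime import datetime


def data_time_from_file_name(file_name, cycle_unit):
    """Extract a YYYYMMDDHH stamp and format it at the cycle's granularity."""
    match = re.search(r"\d{10}", file_name)
    if match is None:
        raise ValueError(f"no YYYYMMDDHH stamp in {file_name!r}")
    # strptime also rejects impossible dates such as month 13.
    stamp = datetime.strptime(match.group(), "%Y%m%d%H")
    if cycle_unit == "D":
        return stamp.strftime("%Y%m%d")
    if cycle_unit == "H":
        return stamp.strftime("%Y%m%d%H")
    raise ValueError(f"unsupported cycle unit: {cycle_unit!r}")


print(data_time_from_file_name("2021020211.log", "D"))  # prints 20210202
print(data_time_from_file_name("2021020211.log", "H"))  # prints 2021020211
```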

6. Reading with a time offset

Once reading by time is configured, you can read data for a time other than the current time by configuring a time offset.
Set the job property job.timeOffset to a number followed by a time unit; the supported units are day (d) and hour (h).
For example, the following settings are supported:

  1d: read the data one day after the current time
  -1h: read the data one hour before the current time

Example of job submission:

  curl --location --request POST 'http://localhost:8008/config/job' \
  --header 'Content-Type: application/json' \
  --data '{
    "job": {
      "dir": {
        "path": "",
        "pattern": "/data/inlong-agent/test.log"
      },
      "trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
      "id": 1,
      "thread": {
        "running": {
          "core": "4"
        }
      },
      "name": "fileAgentTest",
      "cycleUnit": "D",
      "timeOffset": "-1d",
      "source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
      "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
      "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
    },
    "proxy": {
      "groupId": "groupId10",
      "streamId": "streamId10"
    },
    "op": "add"
  }'
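
To make the job.timeOffset format concrete, the sketch below parses a value such as -1d or 1h and applies it to a reference time. It assumes the format is an optionally negative integer followed by d or h, as described in this section; `parse_time_offset` is a hypothetical helper and does not reflect how the agent parses the value internally.

```python
import re
from datetime import datetime, timedelta


def parse_time_offset(offset):
    """Convert a value like '-1d' or '1h' into a timedelta."""
    match = re.fullmatch(r"(-?\d+)([dh])", offset)
    if match is None:
        raise ValueError(f"unsupported time offset: {offset!r}")
    amount = int(match.group(1))
    if match.group(2) == "d":
        return timedelta(days=amount)
    return timedelta(hours=amount)


# Fixed reference time so the output is reproducible.
current = datetime(2021, 2, 2, 11)

print((current + parse_time_offset("-1d")).strftime("%Y%m%d%H"))  # prints 2021020111
print((current + parse_time_offset("-1h")).strftime("%Y%m%d%H"))  # prints 2021020210
print((current + parse_time_offset("1d")).strftime("%Y%m%d%H"))   # prints 2021020311
```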