Hive Example

Hive Example

Here we use a simple example to help you experience InLong by Docker.

Install Hive

Hive is the necessary component. If you don’t have Hive in your machine, we recommand using Docker to install it. Details can be found here.

Note that if you use Docker, you need to add a port mapping 8020:8020, because it’s the port of HDFS DefaultFS, and we need to use it later.

Install InLong

Before we begin, we need to install InLong. Here we provide two ways:

Install InLong with Docker by according to the instructions here.(Recommanded)
Install InLong binary according to the instructions here.

Create a data access

After deployment, we first enter the “Data Access” interface, click “Create an Access” in the upper right corner to create a new date access, and fill in the data streams group information as shown in the figure below.

Then we click the next button, and fill in the stream information as shown in the figure below.

Note that the message source is “File”, and we don’t need to create a message source manually.

Then we fill in the following information in the “data information” column below.

Then we select Hive in the data flow and click “Add” to add Hive configuration

Note that the target table does not need to be created in advance, as InLong Manager will automatically create the table for us after the access is approved. Also, please use connection test to ensure that InLong Manager can connect to your Hive.

Then we click the “Submit for Approval” button, the connection will be created successfully and enter the approval state.

Approve the data access

Then we enter the “Approval Management” interface and click “My Approval” to approve the data access that we just applied for.

At this point, the data access has been created successfully. We can see that the corresponding table has been created in Hive, and we can see that the corresponding topic has been created successfully in the management GUI of TubeMQ.

Configure the agent

Create a collect job by using curl to make a request.

curl --location --request POST 'http://localhost:8008/config/job' \
--header 'Content-Type: application/json' \
--data '{
"job": {
"dir": {
"path": "",
"pattern": "/data/collect-data/test.log"
},
"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
"id": 1,
"thread": {
"running": {
"core": "4"
}
},
"name": "fileAgentTest",
"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
},
"proxy": {
"inlongGroupId": "b_test_group",
"inlongStreamId": "test_stream"
},
"op": "add"
}'

At this point, the agent is configured successfully. Then we need to create a new file ./collect-data/test.log and add content to it to trigger the agent to send data to the dataproxy.

mkdir collect-data
END=100000
for ((i=1;i<=END;i++)); do
    sleep 3
    echo "name_$i | $i" >> ./collect-data/test.log
done

Then we can observe the logs of agent and dataproxy, and we can see that the relevant data has been sent successfully.

$ docker logs agent
$ docker logs dataproxy