RocketMQ Connect in Action 5

Elasticsearch Source -> RocketMQ Connect -> Elasticsearch Sink

Preparatory Work

Start RocketMQ

  1. Linux, Unix, or macOS;
  2. 64-bit JDK 1.8+;
  3. Maven 3.2.x+;
  4. Start RocketMQ. Either RocketMQ 4.x or RocketMQ 5.x can be used;
  5. Test RocketMQ message sending and receiving using the bundled tools.sh script.

Set the environment variable NAMESRV_ADDR so that the tool client knows the RocketMQ NameServer address, here localhost:9876.

    # For RocketMQ 4.x, use instead: cd distribution/target/rocketmq-4.9.7/rocketmq-4.9.7
    $ cd distribution/target/rocketmq-5.1.4/rocketmq-5.1.4
    $ export NAMESRV_ADDR=localhost:9876
    $ sh bin/tools.sh org.apache.rocketmq.example.quickstart.Producer
    SendResult [sendStatus=SEND_OK, msgId= ...
    $ sh bin/tools.sh org.apache.rocketmq.example.quickstart.Consumer
    ConsumeMessageThread_%d Receive New Messages: [MessageExt...

Note: RocketMQ has the feature of automatically creating Topic and Group. When sending or subscribing to messages, if the corresponding Topic or Group does not exist, RocketMQ will automatically create them. Therefore, there is no need to create Topic and Group in advance.
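If automatic topic creation has been switched off on the broker (it is governed by the broker setting autoCreateTopicEnable), the topic used later in this walkthrough can be created up front with the mqadmin tool. The following is a minimal sketch, assuming the cluster is named DefaultCluster and the command is run from the RocketMQ home directory; the helper name create_topic and the DRY_RUN switch are illustrative, not part of RocketMQ.

```shell
# Hypothetical helper: pre-create a topic with the mqadmin tool shipped with RocketMQ.
# Run it from the RocketMQ home directory. With DRY_RUN=1 the command is only printed.
create_topic() {
  topic="$1"
  namesrv="${NAMESRV_ADDR:-localhost:9876}"
  cluster="${CLUSTER_NAME:-DefaultCluster}"   # assumed cluster name
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sh bin/mqadmin updateTopic -n $namesrv -c $cluster -t $topic"
  else
    sh bin/mqadmin updateTopic -n "$namesrv" -c "$cluster" -t "$topic"
  fi
}

# Example:
#   DRY_RUN=1 create_topic ConnectEsTopic   # print the command first
#   create_topic ConnectEsTopic             # then run it
```

Printing the command before running it makes it easy to confirm the NameServer address and cluster name match your environment.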

Building the Connector Runtime

Clone the repository and build the RocketMQ Connect project:

    git clone https://github.com/apache/rocketmq-connect.git
    cd rocketmq-connect
    export RMQ_CONNECT_HOME=`pwd`
    mvn -Prelease-connect -Dmaven.test.skip=true clean install -U

Build Elasticsearch Connector Plugin

Build the Elasticsearch RocketMQ Connector plugin:

    cd $RMQ_CONNECT_HOME/connectors/rocketmq-connect-elasticsearch/
    mvn clean package -Dmaven.test.skip=true

Copy the compiled Elasticsearch RocketMQ Connector plugin JAR file into the plugin directory used by the runtime:

    mkdir -p /Users/YourUsername/rocketmqconnect/connector-plugins
    cp target/rocketmq-connect-elasticsearch-1.0.0-jar-with-dependencies.jar /Users/YourUsername/rocketmqconnect/connector-plugins

Run Connector Worker in Standalone Mode

Modify the connect-standalone.conf file to configure the RocketMQ connection address and other information.

    cd $RMQ_CONNECT_HOME/distribution/target/rocketmq-connect-0.0.1-SNAPSHOT/rocketmq-connect-0.0.1-SNAPSHOT
    vim conf/connect-standalone.conf

Example configuration information is as follows:

    workerId=standalone-worker
    storePathRootDir=/Users/YourUsername/rocketmqconnect/storeRoot
    ## Http port for user to access REST API
    httpPort=8082
    # Rocketmq namesrvAddr
    namesrvAddr=localhost:9876
    # RocketMQ acl
    aclEnable=false
    #accessKey=rocketmq
    #secretKey=12345678
    clusterName="DefaultCluster"
    # Plugin path for loading Source/Sink Connectors
    pluginPaths=/Users/YourUsername/rocketmqconnect/connector-plugins

In standalone mode, RocketMQ Connect persistently stores the synchronization checkpoint information in the local file directory specified by storePathRootDir.

storePathRootDir=/Users/YourUsername/rocketmqconnect/storeRoot

If you want to reset the synchronization checkpoint, delete the persistence files:

    rm -rf /Users/YourUsername/rocketmqconnect/storeRoot/*

To start Connector Worker in standalone mode:

    sh bin/connect-standalone.sh -c conf/connect-standalone.conf &

Set Up Elasticsearch Services

Elasticsearch is an open-source search and analytics engine.

We’ll use two separate Docker instances of Elasticsearch to serve as our source and destination databases:

    docker pull docker.elastic.co/elasticsearch/elasticsearch:7.15.1
    docker run --name es1 -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
      -v /Users/YourUsername/rocketmqconnect/es/es1_data:/usr/share/elasticsearch/data \
      -d docker.elastic.co/elasticsearch/elasticsearch:7.15.1
    docker run --name es2 -p 9201:9200 -p 9301:9300 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
      -v /Users/YourUsername/rocketmqconnect/es/es2_data:/usr/share/elasticsearch/data \
      -d docker.elastic.co/elasticsearch/elasticsearch:7.15.1

Explanation of Docker commands:

  • --name es2: Specifies a name for the container, e.g., es2.
  • -p 9201:9200 -p 9301:9300: Maps ports 9200 and 9300 on the Elasticsearch container to host ports 9201 and 9301 so that the Elasticsearch service can be accessed via the host.
  • -e discovery.type=single-node: Configures Elasticsearch to run as a single node without discovering other nodes in a cluster, which suits a single-server deployment.
  • -v /Users/YourUsername/rocketmqconnect/es/es2_data:/usr/share/elasticsearch/data: Mounts a directory on the host to /usr/share/elasticsearch/data within the container for persistent storage of Elasticsearch data.

These commands run two single-node Elasticsearch instances with persistent data storage: es1 is reachable on host port 9200 and es2 on host port 9201. This setup is intended for development or testing on a local machine.

View the Elasticsearch logs:

    docker logs -f es1
    docker logs -f es2

Verify that Elasticsearch has started successfully:

    # Check Elasticsearch instance 1
    curl -XGET http://localhost:9200
    # Check Elasticsearch instance 2
    curl -XGET http://localhost:9201

A successful connection and correct operation will result in JSON responses containing information about Elasticsearch and its version number.
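Elasticsearch can take some time to become ready after the container starts, so a single curl call may fail if it is issued too early. The following sketch polls a URL until it answers or a retry budget runs out. The function name wait_for_http and its default of 30 attempts with a 2 second delay are illustrative choices, not something the original setup prescribes.

```shell
# Poll a URL until it answers with a successful HTTP status or the attempts run out.
# Returns 0 on success and 1 on timeout.
wait_for_http() {
  url="$1"
  attempts="${2:-30}"   # how many times to try
  delay="${3:-2}"       # seconds to sleep between tries
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o /dev/null: discard the body
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out: $url" >&2
  return 1
}

# Example:
#   wait_for_http http://localhost:9200 && wait_for_http http://localhost:9201
```

The same helper can be pointed at the Kibana ports once those containers are started.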

Set Up Kibana Services

Kibana is an open-source data visualization tool that allows users to interactively explore and understand data stored within Elasticsearch clusters. It offers rich features such as charts, graphs, and dashboards.

For convenience, we’ll set up two separate instances of Kibana in Docker and link them to our previously established Elasticsearch containers using the following commands:

    docker pull docker.elastic.co/kibana/kibana:7.15.1
    docker run --name kibana1 --link es1:elasticsearch -p 5601:5601 -d docker.elastic.co/kibana/kibana:7.15.1
    docker run --name kibana2 --link es2:elasticsearch -p 5602:5601 -d docker.elastic.co/kibana/kibana:7.15.1

Explanation of Docker Commands:

  • --name kibana2: Assigns a name to the new container, e.g., kibana2
  • --link es2:elasticsearch: Links the container to another named Elasticsearch instance (in this case, ‘es2’). This enables communication between Kibana and Elasticsearch.
  • -p 5602:5601: Maps Kibana’s default port (5601) in the container to port 5602 on the host machine, so that this instance is accessible through the browser without clashing with kibana1.
  • -d: runs the Docker container in detached mode.

Once the containers have launched, you can monitor their log output:

    docker logs -f kibana1
    docker logs -f kibana2

To access the Kibana console pages, visit the following addresses in your browser:

  • Kibana1: http://localhost:5601
  • Kibana2: http://localhost:5602

If they load correctly, it indicates successful startup of the respective Kibana instances.

Write Test Data to the Source Elasticsearch

Kibana’s Dev Tools let you interact with Elasticsearch directly from Kibana: you can run queries and other operations and inspect the returned data. See the Kibana Console documentation for details.

Bulk Write Test Data

Access the Kibana1 console through the browser, find Dev Tools from the left menu, and enter the following commands on the page to write test data:

    POST /_bulk
    { "index" : { "_index" : "connect_es" } }
    { "id": "1", "field1": "value1", "field2": "value2" }
    { "index" : { "_index" : "connect_es" } }
    { "id": "2", "field1": "value3", "field2": "value4" }

Note:

  • connect_es: The index name for the data.
  • id/field1/field2: These are field names, and 1, value1, value2 represent the values for the fields.

Note: rocketmq-connect-elasticsearch has a limitation: the data must contain a field that supports >= comparison (a string or a number). The connector uses this field to record the synchronization checkpoint. In the example above, id plays that role: it is globally unique and increases with each document.
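To exercise incremental synchronization with more than two documents, the bulk body can also be generated from the command line instead of being typed into Dev Tools. The sketch below prints a _bulk body with increasing id values in the same shape as the sample data; the function name gen_bulk and the generated field values are made up for illustration.

```shell
# Print an Elasticsearch _bulk body (newline-delimited JSON) for ids $2 through $3.
# Documents follow the shape of the sample data; the field values are placeholders.
gen_bulk() {
  index="$1"
  id="$2"
  last="$3"
  while [ "$id" -le "$last" ]; do
    printf '{ "index" : { "_index" : "%s" } }\n' "$index"
    printf '{ "id": "%s", "field1": "value%s-a", "field2": "value%s-b" }\n' "$id" "$id" "$id"
    id=$((id + 1))
  done
}

# Example: send documents 3..10 to the source instance.
# The _bulk body must end with a newline, which printf provides.
#   gen_bulk connect_es 3 10 | curl -s -H 'Content-Type: application/x-ndjson' \
#     -XPOST http://localhost:9200/_bulk --data-binary @-
```

Using --data-binary rather than -d matters here, because -d strips the newlines that separate the bulk actions.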

Query Data

To query data within an index, use the following command:

    GET /connect_es/_search
    {
      "size": 100
    }

If the index does not exist yet, for example because no data has been written, the response will be:

    {
      "error" : {
        ...
        "type" : "index_not_found_exception",
        "reason" : "no such index [connect_es]",
        "resource.type" : "index_or_alias",
        "resource.id" : "connect_es",
        "index_uuid" : "_na_",
        "index" : "connect_es"
      },
      "status" : 404
    }

If there is data available, the response will be:

    {
      ...
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "connect_es",
            "_type" : "_doc",
            "_id" : "_dx49osBb46Z9cN4hYCg",
            "_score" : 1.0,
            "_source" : {
              "id" : "1",
              "field1" : "value1",
              "field2" : "value2"
            }
          },
          {
            "_index" : "connect_es",
            "_type" : "_doc",
            "_id" : "_tx49osBb46Z9cN4hYCg",
            "_score" : 1.0,
            "_source" : {
              "id" : "2",
              "field1" : "value3",
              "field2" : "value4"
            }
          }
        ]
      }
    }

Delete Data

If you need to delete data within an index due to repeated testing or other reasons, you can use the following command:

    DELETE /connect_es

Start Connector

Start Elasticsearch Source Connector

Run the following command to start the ES source connector. The connector will connect to Elasticsearch and read document data from the connect_es index. It will parse the Elasticsearch document data and package it into a generic ConnectRecord object, which will be sent to a RocketMQ topic for consumption by the Sink Connector.

    curl -X POST -H "Content-Type: application/json" http://127.0.0.1:8082/connectors/elasticsearchSourceConnector -d '{
      "connector.class":"org.apache.rocketmq.connect.elasticsearch.connector.ElasticsearchSourceConnector",
      "elasticsearchHost":"localhost",
      "elasticsearchPort":9200,
      "index":{
        "connect_es": {
          "primaryShards":1,
          "id":1
        }
      },
      "max.tasks":2,
      "connect.topicname":"ConnectEsTopic",
      "value.converter":"org.apache.rocketmq.connect.runtime.converter.record.json.JsonConverter",
      "key.converter":"org.apache.rocketmq.connect.runtime.converter.record.json.JsonConverter"
    }'

Note: The startup command specifies that the source ES should synchronize the connect_es index, and the incrementing field in the index is id. Data will be fetched starting from id=1.
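When repeating the test with a different index or starting checkpoint, it can be easier to generate the request body than to edit the JSON by hand. The sketch below only reuses keys that appear in the command above. It assumes, based on the note, that the key next to primaryShards names the incrementing field and that its value is the starting checkpoint; the helper name source_config is hypothetical.

```shell
# Hypothetical helper: print a source connector config for one index.
# Assumption: inside "index", the key after "primaryShards" names the incrementing
# field and its value is the first checkpoint value to fetch.
source_config() {
  index="$1"   # index to read, e.g. connect_es
  field="$2"   # incrementing field, e.g. id
  start="$3"   # starting value, e.g. 1
  cat <<EOF
{
  "connector.class":"org.apache.rocketmq.connect.elasticsearch.connector.ElasticsearchSourceConnector",
  "elasticsearchHost":"localhost",
  "elasticsearchPort":9200,
  "index":{
    "$index": {
      "primaryShards":1,
      "$field":$start
    }
  },
  "max.tasks":2,
  "connect.topicname":"ConnectEsTopic",
  "value.converter":"org.apache.rocketmq.connect.runtime.converter.record.json.JsonConverter",
  "key.converter":"org.apache.rocketmq.connect.runtime.converter.record.json.JsonConverter"
}
EOF
}

# Example: reproduce the request above.
#   source_config connect_es id 1 | curl -X POST -H "Content-Type: application/json" \
#     http://127.0.0.1:8082/connectors/elasticsearchSourceConnector -d @-
```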

If the curl request returns status:200, it indicates a successful creation, and the sample response will be:

{"status":200,"body":{"connector.class":"…

If the following line appears in the log, the Elasticsearch source connector has started successfully:

    tail -100f ~/logs/rocketmqconnect/connect_runtime.log

Start connector elasticsearchSourceConnector and set target state STARTED successed!!
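Besides tailing the log, the worker can be queried over the same REST port that was used to create the connector. The sketch below only builds the URLs. The status, config, and stop actions are assumptions about the RocketMQ Connect runtime's REST interface and are not confirmed by this walkthrough, so check them against the runtime version you built. The helper name connector_url and the CONNECT_URL variable are illustrative.

```shell
# Build a worker REST URL for a connector action.
# The default base URL matches httpPort=8082 from connect-standalone.conf.
connector_url() {
  name="$1"
  action="$2"   # assumed actions: status, config, stop
  base="${CONNECT_URL:-http://127.0.0.1:8082}"
  echo "$base/connectors/$name/$action"
}

# Example:
#   curl -s "$(connector_url elasticsearchSourceConnector status)"
#   curl -s "$(connector_url elasticsearchSourceConnector stop)"
```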

Start Elasticsearch Sink Connector

Run the following command to start the ES sink connector. The connector will subscribe to data from the RocketMQ topic and consume it. It will convert each message into document data and write it to the destination ES.

    curl -X POST -H "Content-Type: application/json" http://127.0.0.1:8082/connectors/elasticsearchSinkConnector -d '{
      "connector.class":"org.apache.rocketmq.connect.elasticsearch.connector.ElasticsearchSinkConnector",
      "elasticsearchHost":"localhost",
      "elasticsearchPort":9201,
      "max.tasks":2,
      "connect.topicnames":"ConnectEsTopic",
      "value.converter":"org.apache.rocketmq.connect.runtime.converter.record.json.JsonConverter",
      "key.converter":"org.apache.rocketmq.connect.runtime.converter.record.json.JsonConverter"
    }'

Note: The startup command specifies the address and port of the destination ES, which corresponds to the previously started ES2 in Docker.

If the curl request returns status:200, it indicates a successful creation, and the sample response will be:

{"status":200,"body":{"connector.class":"…

If the following line appears in the log, the Elasticsearch sink connector has started successfully:

    tail -100f ~/logs/rocketmqconnect/connect_runtime.log

Start connector elasticsearchSinkConnector and set target state STARTED successed!!

To check if the sink connector has written data to the destination ES index:

  1. Access the Kibana2 console address in the browser: http://localhost:5602
  2. In the Kibana2 Dev Tools page, query the data within the index. If it matches the data in the source ES1, it means the connector is running properly.

    GET /connect_es/_search
    {
      "size": 100
    }
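
For a quick check from the command line, the document counts of the two instances can be compared with the Elasticsearch _count API. This is a rough sketch: it assumes the sink writes to an index with the same name as the source, as the query above does, and the function names are illustrative. Equal counts do not prove the contents are identical, and the sink count can lag briefly while the connector catches up.

```shell
# Extract the "count" value from an Elasticsearch _count response read on stdin.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Compare the document counts of one index on the source (9200) and sink (9201).
# Returns 0 only when both counts were read and are equal.
compare_counts() {
  index="${1:-connect_es}"
  src=$(curl -s "http://localhost:9200/$index/_count" | extract_count)
  dst=$(curl -s "http://localhost:9201/$index/_count" | extract_count)
  echo "source=$src sink=$dst"
  [ -n "$src" ] && [ "$src" = "$dst" ]
}

# Example:
#   compare_counts connect_es && echo "counts match"
```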