Synchronize Data Using Insert Method

Users can use INSERT statement to import data through MySQL protocol.

The INSERT statement is used in a similar way to the INSERT statement used in databases such as MySQL. The INSERT statement supports the following two syntaxes:

  1. * INSERT INTO table SELECT ...
  2. * INSERT INTO table VALUES(...)

Here we only introduce the second way. For a detailed description of the INSERT command, see the INSERT command documentation.

Single write

Single write means that the user directly executes an INSERT command. An example is as follows:

  1. INSERT INTO example_tbl (col1, col2, col3) VALUES (1000, "test", 3.25);

For Doris, an INSERT command is a complete import transaction.

Therefore, whether it is importing one piece of data or multiple pieces of data, we do not recommend using this method for data import in the production environment. The INSERT operation of high-frequency words will result in a large number of small files in the storage layer, which will seriously affect the system performance.

This method is only used for simple offline tests or low-frequency operations.

Or you can use the following methods for batch insert operations:

  1. INSERT INTO example_tbl VALUES
  2. (1000, "baidu1", 3.25)
  3. (2000, "baidu2", 4.25)
  4. (3000, "baidu3", 5.25);

We recommend that the number of inserts in a batch be as large as possible, such as thousands or even 10,000 at a time. Or you can use PreparedStatement to perform batch inserts through the following procedure.

JDBC example

Here we give a simple JDBC batch INSERT code example:

  1. package demo.doris;
  2. import java.sql.Connection;
  3. import java.sql.DriverManager;
  4. import java.sql.PreparedStatement;
  5. import java.sql.SQLException;
  6. public class DorisJDBCDemo {
  7. private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
  8. private static final String DB_URL_PATTERN = "jdbc:mysql://%s:%d/%s?rewriteBatchedStatements=true";
  9. private static final String HOST = "127.0.0.1"; // Leader Node host
  10. private static final int PORT = 9030; // query port of Leader Node
  11. private static final String DB = "demo";
  12. private static final String TBL = "test_1";
  13. private static final String USER = "admin";
  14. private static final String PASSWD = "my_pass";
  15. private static final int INSERT_BATCH_SIZE = 10000;
  16. public static void main(String[] args) {
  17. insert();
  18. }
  19. private static void insert() {
  20. // Be careful not to add a semicolon ";" at the end
  21. String query = "insert into " + TBL + " values(?, ?)";
  22. // Set Label to be idempotent.
  23. // String query = "insert into " + TBL + " WITH LABEL my_label values(?, ?)";
  24. Connection conn = null;
  25. PreparedStatement stmt = null;
  26. String dbUrl = String.format(DB_URL_PATTERN, HOST, PORT, DB);
  27. try {
  28. Class.forName(JDBC_DRIVER);
  29. conn = DriverManager.getConnection(dbUrl, USER, PASSWD);
  30. stmt = conn.prepareStatement(query);
  31. for (int i =0; i < INSERT_BATCH_SIZE; i++) {
  32. stmt.setInt(1, i);
  33. stmt.setInt(2, i * 100);
  34. stmt.addBatch();
  35. }
  36. int[] res = stmt.executeBatch();
  37. System.out.println(res);
  38. } catch (Exception e) {
  39. e.printStackTrace();
  40. } finally {
  41. try {
  42. if (stmt != null) {
  43. stmt.close();
  44. }
  45. } catch (SQLException se2) {
  46. se2.printStackTrace();
  47. }
  48. try {
  49. if (conn != null) conn.close();
  50. } catch (SQLException se) {
  51. se.printStackTrace();
  52. }
  53. }
  54. }
  55. }

Please note the following:

  1. The JDBC connection string needs to add the rewriteBatchedStatements=true parameter and use the PreparedStatement method.

    Currently, Doris does not support PrepareStatemnt on the server side, so the JDBC Driver will perform batch Prepare on the client side.

    rewriteBatchedStatements=true will ensure that the Driver executes batches. And finally form an INSERT statement of the following form and send it to Doris:

    1. INSERT INTO example_tbl VALUES
    2. (1000, "baidu1", 3.25)
    3. (2000, "baidu2", 4.25)
    4. (3000, "baidu3", 5.25);
  2. Batch size

    Because batch processing is performed on the client, if a batch is too large, it will occupy the memory resources of the client, so you need to pay attention.

    Doris will support PrepareStatemnt on the server in the future, so stay tuned.

  3. Import atomicity

    Like other import methods, the INSERT operation itself supports atomicity. Each INSERT operation is an import transaction, which guarantees atomic writing of all data in an INSERT.

    As mentioned earlier, we recommend that when using INSERT to import data, use the “batch” method to import, rather than a single insert.

    At the same time, we can set a Label for each INSERT operation. Through the Label mechanism, the idempotency and atomicity of operations can be guaranteed, and the data will not be lost or heavy in the end. For the specific usage of Label in INSERT, you can refer to the INSERT document.