Tablet Local Debug

While Doris runs in production, bugs can surface for many reasons, for example inconsistent replicas or data that differs between versions.

When this happens, copy the tablet's replica data from the online cluster to a local environment, reproduce the problem there, and then locate its cause.

1. Get information about the tablet

The tablet id can be found in the BE log. With it, the remaining information can be obtained using the following commands (assuming the tablet id is 10020).

Get information such as DbId/TableId/PartitionId where the tablet is located.

  mysql> show tablet 10020\G
  *************************** 1. row ***************************
  DbName: default_cluster:db1
  TableName: tbl1
  PartitionName: tbl1
  IndexName: tbl1
  DbId: 10004
  TableId: 10016
  PartitionId: 10015
  IndexId: 10017
  IsSync: true
  Order: 1
  DetailCmd: SHOW PROC '/dbs/10004/10016/partitions/10015/10017/10020';

Execute the DetailCmd from the previous step to obtain information such as BackendId/SchemaHash.

  mysql> SHOW PROC '/dbs/10004/10016/partitions/10015/10017/10020'\G
  *************************** 1. row ***************************
  ReplicaId: 10021
  BackendId: 10003
  Version: 3
  LstSuccessVersion: 3
  LstFailedVersion: -1
  LstFailedTime: NULL
  SchemaHash: 785778507
  LocalDataSize: 780
  RemoteDataSize: 0
  RowCount: 2
  State: NORMAL
  IsBad: false
  VersionCount: 3
  PathHash: 7390150550643804973
  MetaUrl: http://192.168.10.1:8040/api/meta/header/10020
  CompactionStatus: http://192.168.10.1:8040/api/compaction/show?tablet_id=10020
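The fields needed later (BackendId, SchemaHash, and so on) can also be collected programmatically. The following is a minimal Python sketch, not part of Doris, which assumes the vertical output shown above has been saved as text. The helper name parse_vertical_row is hypothetical.

```python
def parse_vertical_row(text):
    # Parse one row of vertical (backslash-G) mysql output into a dict.
    row = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the "*** 1. row ***" separator, and the echoed command.
        if not line or line.startswith("*") or line.startswith("mysql>"):
            continue
        # Split on the first ": " only, because values such as URLs contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            row[key] = value
    return row


sample = """\
*************************** 1. row ***************************
ReplicaId: 10021
BackendId: 10003
SchemaHash: 785778507
MetaUrl: http://192.168.10.1:8040/api/meta/header/10020
"""

info = parse_vertical_row(sample)
print(info["BackendId"], info["SchemaHash"])
```

All values are kept as strings, so convert them with int() where a number is needed.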

Create tablet snapshot and get table creation statement

  mysql> admin copy tablet 10020 properties("backend_id" = "10003", "version" = "2")\G
  *************************** 1. row ***************************
  TabletId: 10020
  BackendId: 10003
  Ip: 192.168.10.1
  Path: /path/to/be/storage/snapshot/20220830101353.2.3600
  ExpirationMinutes: 60
  CreateTableStmt: CREATE TABLE `tbl1` (
  `k1` int(11) NULL,
  `k2` int(11) NULL
  ) ENGINE=OLAP
  DUPLICATE KEY(`k1`, `k2`)
  DISTRIBUTED BY HASH(k1) BUCKETS 1
  PROPERTIES (
  "replication_num" = "1",
  "version_info" = "2"
  );

The admin copy tablet command generates a snapshot of the specified replica and version of the tablet. The snapshot files are stored in the Path directory on the BE node indicated by the Ip field.

Under this directory there is a subdirectory named after the tablet id. Package this subdirectory as a whole for later use. (Note that the snapshot directory is kept for at most 60 minutes, after which it is deleted automatically.)

  cd /path/to/be/storage/snapshot/20220830101353.2.3600
  tar czf 10020.tar.gz 10020/
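If the packaging step is scripted, the same archive can be produced with Python's standard tarfile module. This is a minimal sketch under the directory layout shown above. The function name package_tablet is hypothetical and not part of Doris.

```python
import tarfile
from pathlib import Path


def package_tablet(snapshot_dir, tablet_id):
    # Equivalent of: cd <snapshot_dir> && tar czf <tablet_id>.tar.gz <tablet_id>/
    snapshot_dir = Path(snapshot_dir)
    archive = snapshot_dir / f"{tablet_id}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative, e.g. "10020/...".
        tar.add(snapshot_dir / str(tablet_id), arcname=str(tablet_id))
    return archive


# Example call, using the snapshot path from the command output above:
# package_tablet("/path/to/be/storage/snapshot/20220830101353.2.3600", 10020)
```

Run it before the snapshot directory expires, since the directory is removed automatically afterwards.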

The command also generates the table creation statement for the tablet. Note that this is not the original table creation statement: its bucket number and replica number are both 1, and the version_info property is set. This statement is used later when loading the tablet locally.

At this point we have all the necessary information:

  1. The packaged tablet data, such as 10020.tar.gz.
  2. The table creation statement.

2. Load Tablet locally

  1. Build a local debugging environment

    Deploy a single-node Doris cluster (1 FE, 1 BE) locally, using the same version as the online cluster. For example, if the online cluster runs DORIS-1.1.1, deploy DORIS-1.1.1 locally as well.

  2. Create a table

    Create a table in the local environment using the create table statement from the previous step.

  3. Get the tablet information of the newly created table

    Because the number of buckets and replicas of the newly created table is 1, there will only be one tablet with one replica:

    mysql> show tablets from tbl1\G
    *************************** 1. row ***************************
    TabletId: 10017
    ReplicaId: 10018
    BackendId: 10003
    SchemaHash: 44622287
    Version: 1
    LstSuccessVersion: 1
    LstFailedVersion: -1
    LstFailedTime: NULL
    LocalDataSize: 0
    RemoteDataSize: 0
    RowCount: 0
    State: NORMAL
    LstConsistencyCheckTime: NULL
    CheckVersion: -1
    VersionCount: -1
    PathHash: 7390150550643804973
    MetaUrl: http://192.168.10.1:8040/api/meta/header/10017
    CompactionStatus: http://192.168.10.1:8040/api/compaction/show?tablet_id=10017

    mysql> show tablet 10017\G
    *************************** 1. row ***************************
    DbName: default_cluster:db1
    TableName: tbl1
    PartitionName: tbl1
    IndexName: tbl1
    DbId: 10004
    TableId: 10015
    PartitionId: 10014
    IndexId: 10016
    IsSync: true
    Order: 0
    DetailCmd: SHOW PROC '/dbs/10004/10015/partitions/10014/10016/10017';

    Here we will record the following information:

    • TableId
    • PartitionId
    • TabletId
    • SchemaHash

    At the same time, we also need to go to the data directory of the BE node in the debugging environment to confirm the shard id where the new tablet is located:

    cd /path/to/storage/data/*/10017 && pwd

    This command will enter the directory where the tablet 10017 is located and display the path. Here we will see a path similar to the following:

    /path/to/storage/data/0/10017

    where 0 is the shard id.
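The shard id lookup can also be scripted. The sketch below assumes the layout implied by the path above, that is, data_root/shard_id/tablet_id. The function name find_shard_id is hypothetical.

```python
from pathlib import Path


def find_shard_id(data_root, tablet_id):
    # Assumed layout (taken from the example path): <data_root>/<shard_id>/<tablet_id>
    matches = [p for p in Path(data_root).glob(f"*/{tablet_id}") if p.is_dir()]
    if len(matches) != 1:
        raise RuntimeError(
            f"expected exactly one directory for tablet {tablet_id}, found {len(matches)}"
        )
    # The parent directory name is the shard id.
    return int(matches[0].parent.name)


# Example call for the debugging environment described above:
# find_shard_id("/path/to/storage/data", 10017)
```

Raising an error when zero or several directories match avoids silently picking the wrong shard.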

  4. Modify Tablet Data

    Unpack the tablet data package obtained in the first step. Open the 10017.hdr.json file in an editor and change the following fields to the values recorded in the previous step:

    "table_id":10015
    "partition_id":10014
    "tablet_id":10017
    "schema_hash":44622287
    "shard_id":0
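Instead of editing the file by hand, the five fields can be rewritten with a short script. This is a sketch only: it assumes the fields are top-level keys of the JSON document, and it does not touch any nested occurrences of the same names, which would need separate handling if present. The function name patch_tablet_header is hypothetical.

```python
import json


def patch_tablet_header(header, table_id, partition_id, tablet_id, schema_hash, shard_id):
    # Return a copy of the header dict with the five identifying fields replaced.
    # Assumption: these are top-level keys; all other content is left unchanged.
    patched = dict(header)
    patched.update(
        {
            "table_id": table_id,
            "partition_id": partition_id,
            "tablet_id": tablet_id,
            "schema_hash": schema_hash,
            "shard_id": shard_id,
        }
    )
    return patched


def patch_header_file(path, **fields):
    # Read the JSON header, patch it, and write it back in place.
    with open(path) as f:
        header = json.load(f)
    with open(path, "w") as f:
        json.dump(patch_tablet_header(header, **fields), f, indent=2)


# Example call with the values recorded in step 3:
# patch_header_file("/path/to/10017.hdr.json", table_id=10015, partition_id=10014,
#                   tablet_id=10017, schema_hash=44622287, shard_id=0)
```

Keeping a backup of the original header file before patching makes it easy to retry if a value was recorded incorrectly.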

  5. Load the tablet

    First, stop the debug environment’s BE process (./bin/stop_be.sh). Then copy all the .dat files located in the same directory as the 10017.hdr.json file to the /path/to/storage/data/0/10017/44622287 directory. This is the directory of the debugging environment's tablet that we found in step 3, where 10017 is the tablet id and 44622287 is the schema hash.
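The copy step can be scripted as follows. This is a minimal sketch that assumes the BE process is already stopped and the destination directory exists. The function name copy_dat_files is hypothetical.

```python
import shutil
from pathlib import Path


def copy_dat_files(src_dir, dest_dir):
    # Copy every .dat file from src_dir (non-recursive) into dest_dir.
    dest = Path(dest_dir)
    copied = []
    for dat in sorted(Path(src_dir).glob("*.dat")):
        shutil.copy2(dat, dest / dat.name)
        copied.append(dat.name)
    return copied


# Example call, where the source is the directory holding 10017.hdr.json:
# copy_dat_files("/path/to/unpacked/tablet", "/path/to/storage/data/0/10017/44622287")
```

The returned list of file names gives a quick check that the expected data files were found.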

    Delete the original tablet meta with meta_tool, which is located in the be/lib directory.

    ./lib/meta_tool --root_path=/path/to/storage --operation=delete_meta --tablet_id=10017 --schema_hash=44622287

    Here /path/to/storage is the BE data root directory. If the deletion succeeds, a "delete successfully" log message appears.

    Then load the new tablet meta with meta_tool.

    ./lib/meta_tool --root_path=/path/to/storage --operation=load_meta --json_meta_path=/path/to/10017.hdr.json

    If the load succeeds, a "load successfully" log message appears.
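When the delete and load steps are repeated for several tablets, it can help to generate the two command lines from the recorded values. The sketch below only builds the argument lists, using exactly the flags shown above. The function name meta_tool_commands is hypothetical, and running the commands is left to the caller.

```python
import shlex


def meta_tool_commands(root_path, tablet_id, schema_hash, json_meta_path,
                       tool="./lib/meta_tool"):
    # Build the delete_meta and load_meta invocations shown in this step.
    delete_cmd = [
        tool,
        f"--root_path={root_path}",
        "--operation=delete_meta",
        f"--tablet_id={tablet_id}",
        f"--schema_hash={schema_hash}",
    ]
    load_cmd = [
        tool,
        f"--root_path={root_path}",
        "--operation=load_meta",
        f"--json_meta_path={json_meta_path}",
    ]
    return delete_cmd, load_cmd


delete_cmd, load_cmd = meta_tool_commands(
    "/path/to/storage", 10017, 44622287, "/path/to/10017.hdr.json"
)
# Print the commands for review before running them.
print(shlex.join(delete_cmd))
print(shlex.join(load_cmd))
```

To execute them from the BE installation directory while the BE process is stopped, each list can be passed to subprocess.run(cmd, check=True), so that a failing delete stops the script before the load is attempted.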

  6. Verification

    Restart the debug environment’s BE process (./bin/start_be.sh), then query the table. If every step was done correctly, the query returns the data of the loaded tablet, or reproduces the online problem.