Memory Tracker

SinceVersion 1.2.0

The Memory Tracker records the memory usage of the Doris BE process, including the memory used in the life cycle of tasks such as query, import, Compaction, and Schema Change, as well as various caches for memory control and analysis.

principle

Each query, import and other tasks in the system will create its own Memory Tracker when it is initialized, and put the Memory Tracker into TLS (Thread Local Storage) during execution, and each memory application and release of the BE process will be in the Mem Hook Consume the Memory Tracker in the middle, and display it after the final summary.

For detailed design and implementation, please refer to: https://cwiki.apache.org/confluence/display/DORIS/DSIP-002%3A+Refactor+memory+tracker+on+BE https://shimo.im/docs/DT6JXDRkdTvdyV3G

View statistics

The real-time memory statistics results can be viewed through Doris BE’s Web page http://ip:http\_port/mem\_tracker. For the memory statistics results of historical queries, you can view the peakMemoryBytes of each query in fe/log/fe.audit.log, or search Deregister query/load memory tracker, queryId in be/log/be.INFO `View memory peaks per query on a single BE.

Home /mem_tracker

image

  1. Type: Divide the memory used by Doris BE into the following categories
  • process: The total memory of the process, the sum of all other types.
  • global: Global Memory Tracker with the same life cycle and process, such as each Cache, Tablet Manager, Storage Engine, etc.
  • query: the in-memory sum of all queries.
  • load: Sum of all imported memory.
  • tc/jemalloc_cache: The general memory allocator TCMalloc or Jemalloc cache, you can view the original profile of the memory allocator in real time at http://ip:http\_port/memz.
  • compaction, schema_change, consistency, batch_load, clone: ​​corresponding to the memory sum of all Compaction, Schema Change, Consistency, Batch Load, and Clone tasks respectively.
  1. Current Consumption(Bytes): current memory value, unit B.
  2. Current Consumption(Normalize): .G.M.K formatted output of the current memory value.
  3. Peak Consumption (Bytes): The peak value of the memory after the BE process is started, the unit is B, and it will be reset after the BE restarts.
  4. Peak Consumption(Normalize): The .G.M.K formatted output of the memory peak value after the BE process starts, and resets after the BE restarts.

Global Type /mem_tracker?type=global

image

  1. Label: Memory Tracker name
  2. Parent Label: It is used to indicate the parent-child relationship between two Memory Trackers. The memory recorded by the Child Tracker is a subset of the Parent Tracker. There may be intersections between the memories recorded by different Trackers with the same Parent.
  • Orphan: Tracker consumed by default. Memory that does not specify a tracker will be recorded in Orphan by default. In addition to the Child Tracker subdivided below, Orphan also includes some memory that is inconvenient to accurately subdivide and count, including BRPC.

    • LoadChannelMgr: The sum of the memory of all imported Load Channel stages, used to write the scanned data to the Segment file on disk, a subset of Orphan.
    • StorageEngine:, the memory consumed by the storage engine during loading the data directory, a subset of Orphan.
    • SegCompaction: The memory sum of all SegCompaction tasks, a subset of Orphan.
    • SegmentMeta: memory use by segment meta data such as footer or index page, a subset of Orphan.
    • TabletManager: The memory consumed by the storage engine get, add, and delte Tablet, a subset of Orphan.
    • BufferAllocator: Only used for memory multiplexing in the non-vectorized Partitioned Agg process, a subset of Orphan.
  • DataPageCache: Used to cache data Pages to speed up Scan.

  • IndexPageCache: The index used to cache the data Page, used to speed up Scan.

  • SegmentCache: Used to cache opened Segments, such as index information.

  • DiskIO: Used to cache Disk IO data, only used in non-vectorization.

  • ChunkAllocator: Used to cache power-of-2 memory blocks, and reuse memory at the application layer.

  • LastestSuccessChannelCache: Used to cache the LoadChannel of the import receiver.

  • DeleteBitmap AggCache: Gets aggregated delete_bitmap on rowset_id and version.

Query Type /mem_tracker?type=query

image

  1. Limit: The upper limit of memory used by a single query, show session variables to view and modify exec_mem_limit.
  2. Label: The label naming rule of the Tracker for a single query is Query#Id=xxx.
  3. Parent Label: Parent is the Tracker record of Query#Id=xxx to query the memory used by different operators during execution.

Load Type /mem_tracker?type=load

image

  1. Limit: The import is divided into two stages: Fragment Scan and Load Channel to write Segment to disk. The upper memory limit of the Scan phase can be viewed and modified through show session variables; the segment write disk phase does not have a separate memory upper limit for each import, but the total upper limit of all imports, corresponding to load_process_max_memory_limit_percent.
  2. Label: The label naming rule of a single import Scan stage Tracker is Load#Id=xxx; the Label naming rule of a single import Segment write disk stage Tracker is LoadChannel#senderIp=xxx#loadID=xxx.
  3. Parent Label: Parent is the Tracker of Load#Id=xxx, which records the memory used by different operators during the import Scan stage; Parent is the Tracker of LoadChannelMgrTrackerSet, which records the Insert and The memory used by the Flush disk process is associated with the last loadID of the Label to write to the disk stage Tracker of the Segment.