内存超限错误分析

SinceVersion 1.2.0

当查询或导入报错Memory limit exceeded时,可能的原因:进程内存超限、系统剩余可用内存不足、超过单次查询执行的内存上限。

  1. ERROR 1105 (HY000): errCode = 2, detailMessage = Memory limit exceeded:<consuming tracker:<xxx>, xxx. backend 172.1.1.1 process memory used xxx GB, limit xxx GB. If query tracker exceed, `set exec_mem_limit=8G` to change limit, details mem usage see be.INFO.

进程内存超限 OR 系统剩余可用内存不足

当返回如下报错时,说明进程内存超限,或者系统剩余可用内存不足,具体原因看内存统计值。

  1. ERROR 1105 (HY000): errCode = 2, detailMessage = Memory limit exceeded:<consuming tracker:<Query#Id=3c88608cf35c461d-95fe88969aa6fc30>, process memory used 2.68 GB exceed limit 2.47 GB or sys mem available 50.95 GB less than low water mark 3.20 GB, failed alloc size 2.00 MB>, executing msg:<execute:<ExecNode:VAGGREGATION_NODE (id=7)>>. backend 172.1.1.1 process memory used 2.68 GB, limit 2.47 GB. If query tracker exceed, `set exec_mem_limit=8G` to change limit, details mem usage see be.INFO

错误信息分析

错误信息分为三部分: 1、Memory limit exceeded:<consuming tracker:<Query#Id=3c88608cf35c461d-95fe88969aa6fc30>:当前正在执行query 3c88608cf35c461d-95fe88969aa6fc30的内存申请过程中发现内存超限。 2、process memory used 2.68 GB exceed limit 2.47 GB or sys mem available 50.95 GB less than low water mark 3.20 GB, failed alloc size 2.00 MB:超限的原因是 BE 进程使用的内存 2.68GB 超过了 2.47GB 的limit,limit的值来自 be.conf 中的 mem_limit * system MemTotal,默认等于操作系统总内存的80%,当前操作系统剩余可用内存 50.95 GB 仍高于最低水位 3.2GB,本次尝试申请 2MB 的内存。 3、executing msg:<execute:<ExecNode:VAGGREGATION_NODE (id=7)>>, backend 172.24.47.117 process memory used 2.68 GB, limit 2.47 GB:本次内存申请的位置是ExecNode:VAGGREGATION_NODE (id=7)>,当前BE节点的IP是 172.1.1.1,以及再次打印BE节点的内存统计。

日志分析

同时可以在 log/be.INFO 中找到如下日志,确认当前进程内存使用是否符合预期,日志同样分为三部分: 1、Process Memory Summary:进程内存统计。 2、Alloc Stacktrace:触发内存超限检测的栈,这不一定是大内存申请的位置。 3、Memory Tracker Summary:进程 memory tracker 统计,参考 Memory Tracker 分析使用内存的位置。 注意: 1、进程内存超限日志的打印间隔是1s,进程内存超限后,BE大多数位置的内存申请都会感知,并尝试做出预定的回调方法,并打印进程内存超限日志,所以如果日志中 Try Alloc 的值很小,则无须关注Alloc Stacktrace,直接分析Memory Tracker Summary即可。 2、当进程内存超限后,BE会触发内存GC。

  1. W1127 17:23:16.372572 19896 mem_tracker_limiter.cpp:214] System Mem Exceed Limit Check Failed, Try Alloc: 1062688
  2. Process Memory Summary:
  3. process memory used 2.68 GB limit 2.47 GB, sys mem available 50.95 GB min reserve 3.20 GB, tc/jemalloc allocator cache 51.97 MB
  4. Alloc Stacktrace:
  5. @ 0x50028e8 doris::MemTrackerLimiter::try_consume()
  6. @ 0x50027c1 doris::ThreadMemTrackerMgr::flush_untracked_mem<>()
  7. @ 0x595f234 malloc
  8. @ 0xb888c18 operator new()
  9. @ 0x8f316a2 google::LogMessage::Init()
  10. @ 0x5813fef doris::FragmentExecState::coordinator_callback()
  11. @ 0x58383dc doris::PlanFragmentExecutor::send_report()
  12. @ 0x5837ea8 doris::PlanFragmentExecutor::update_status()
  13. @ 0x58355b0 doris::PlanFragmentExecutor::open()
  14. @ 0x5815244 doris::FragmentExecState::execute()
  15. @ 0x5817965 doris::FragmentMgr::_exec_actual()
  16. @ 0x581fffb std::_Function_handler<>::_M_invoke()
  17. @ 0x5a6f2c1 doris::ThreadPool::dispatch_thread()
  18. @ 0x5a6843f doris::Thread::supervise_thread()
  19. @ 0x7feb54f931ca start_thread
  20. @ 0x7feb5576add3 __GI___clone
  21. @ (nil) (unknown)
  22. Memory Tracker Summary:
  23. Type=consistency, Used=0(0 B), Peak=0(0 B)
  24. Type=batch_load, Used=0(0 B), Peak=0(0 B)
  25. Type=clone, Used=0(0 B), Peak=0(0 B)
  26. Type=schema_change, Used=0(0 B), Peak=0(0 B)
  27. Type=compaction, Used=0(0 B), Peak=0(0 B)
  28. Type=load, Used=0(0 B), Peak=0(0 B)
  29. Type=query, Used=206.67 MB(216708729 B), Peak=565.26 MB(592723181 B)
  30. Type=global, Used=930.42 MB(975614571 B), Peak=1017.42 MB(1066840223 B)
  31. Type=tc/jemalloc_cache, Used=51.97 MB(54494616 B), Peak=-1.00 B(-1 B)
  32. Type=process, Used=1.16 GB(1246817916 B), Peak=-1.00 B(-1 B)
  33. MemTrackerLimiter Label=Orphan, Type=global, Limit=-1.00 B(-1 B), Used=474.20 MB(497233597 B), Peak=649.18 MB(680718208 B)
  34. MemTracker Label=BufferAllocator, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
  35. MemTracker Label=LoadChannelMgr, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
  36. MemTracker Label=StorageEngine, Parent Label=Orphan, Used=320.56 MB(336132488 B), Peak=322.56 MB(338229824 B)
  37. MemTracker Label=SegCompaction, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
  38. MemTracker Label=SegmentMeta, Parent Label=Orphan, Used=948.64 KB(971404 B), Peak=943.64 KB(966285 B)
  39. MemTracker Label=TabletManager, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
  40. MemTrackerLimiter Label=DataPageCache, Type=global, Limit=-1.00 B(-1 B), Used=455.22 MB(477329882 B), Peak=454.18 MB(476244180 B)
  41. MemTrackerLimiter Label=IndexPageCache, Type=global, Limit=-1.00 B(-1 B), Used=1.00 MB(1051092 B), Peak=0(0 B)
  42. MemTrackerLimiter Label=SegmentCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
  43. MemTrackerLimiter Label=DiskIO, Type=global, Limit=2.47 GB(2655423201 B), Used=0(0 B), Peak=0(0 B)
  44. MemTrackerLimiter Label=ChunkAllocator, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
  45. MemTrackerLimiter Label=LastSuccessChannelCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
  46. MemTrackerLimiter Label=DeleteBitmap AggCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)

系统剩余可用内存计算

当错误信息中系统可用内存小于低水位线时,同样当作进程内存超限处理,其中系统可用内存的值来自于/proc/meminfo中的MemAvailable,当MemAvailable不足时继续内存申请可能返回 std::bad_alloc 或者导致BE进程OOM,因为刷新进程内存统计和BE内存GC都具有一定的滞后性,所以预留小部分内存buffer作为低水位线,尽可能避免OOM。

其中MemAvailable是操作系统综合考虑当前空闲的内存、buffer、cache、内存碎片等因素给出的一个在尽可能不触发swap的情况下可以提供给用户进程使用的内存总量,一个简单的计算公式: MemAvailable = MemFree - LowWaterMark + (PageCache - min(PageCache / 2, LowWaterMark)),和 cmd free看到的available值相同,具体可参考: https://serverfault.com/questions/940196/why-is-memavailable-a-lot-less-than-memfreebufferscached https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34e431b0ae398fc54ea69ff85ec700722c9da773

低水位线默认最大1.6G,根据MemTotalvm/min_free_kbytesconfg::mem_limitconfig::max_sys_mem_available_low_water_mark_bytes共同算出,并避免浪费过多内存。其中MemTotal是系统总内存,取值同样来自/proc/meminfovm/min_free_kbytes是操作系统给内存GC过程预留的buffer,取值通常在 0.4% 到 5% 之间,某些云服务器上vm/min_free_kbytes可能为5%,这会导致直观上系统可用内存比真实值少;调大config::max_sys_mem_available_low_water_mark_bytes将在大于16G内存的机器上,为Full GC预留更多的内存buffer,反之调小将尽可能充分使用内存。

查询或导入单次执行内存超限

当返回如下报错时,说明超过单次执行内存限制。

  1. ERROR 1105 (HY000): errCode = 2, detailMessage = Memory limit exceeded:<consuming tracker:<Query#Id=f78208b15e064527-a84c5c0b04c04fcf>, failed alloc size 1.03 MB, exceeded tracker:<Query#Id=f78208b15e064527-a84c5c0b04c04fcf>, limit 100.00 MB, peak used 99.29 MB, current used 99.25 MB>, executing msg:<execute:<ExecNode:VHASH_JOIN_NODE (id=4)>>. backend 172.24.47.117 process memory used 1.13 GB, limit 98.92 GB. If query tracker exceed, `set exec_mem_limit=8G` to change limit, details mem usage see log/be.INFO.

错误信息分析

错误信息分为三部分: 1、Memory limit exceeded:<consuming tracker:<Query#Id=f78208b15e064527-a84c5c0b04c04fcf>:当前正在执行query f78208b15e064527-a84c5c0b04c04fcf的内存申请过程中发现内存超限。 2、failed alloc size 1.03 MB, exceeded tracker:<Query#Id=f78208b15e064527-a84c5c0b04c04fcf>, limit 100.00 MB, peak used 99.29 MB, current used 99.25 MB:本次尝试申请 1.03MB 的内存,但此时query f78208b15e064527-a84c5c0b04c04fcf memory tracker 的当前 consumption 为 99.28MB 加上 1.03MB 后超过了 100MB 的limit,limit的值来自 session variables 中的 exec_mem_limit,默认4G。 3、executing msg:<execute:<ExecNode:VHASH_JOIN_NODE (id=4)>>. backend 172.24.47.117 process memory used 1.13 GB, limit 98.92 GB. If query tracker exceed, set exec_mem_limit=8G to change limit, details mem usage see be.INFO.:本次内存申请的位置是VHASH_JOIN_NODE (id=4),并提示可通过 set exec_mem_limit 来调高单次查询的内存上限。

日志分析

set global enable_profile=true后,可以在单次查询内存超限时,在 log/be.INFO 中打印日志,用于确认当前查询内存使用是否符合预期。 同时可以在 log/be.INFO 中找到如下日志,确认当前查询内存使用是否符合预期,日志同样分为三部分: 1、Process Memory Summary:进程内存统计。 2、Alloc Stacktrace:触发内存超限检测的栈,这不一定是大内存申请的位置。 3、Memory Tracker Summary:当前查询的 memory tracker 统计,可以看到查询每个算子当前使用的内存和峰值,具体可参考 Memory Tracker。 注意:一个查询在内存超限后只会打印一次日志,此时查询的多个线程都会感知,并尝试等待内存释放,或者cancel当前查询,如果日志中 Try Alloc 的值很小,则无须关注Alloc Stacktrace,直接分析Memory Tracker Summary即可。

  1. W1128 01:34:11.016165 357796 mem_tracker_limiter.cpp:191] Memory limit exceeded:<consuming tracker:<Query#Id=78208b15e064527-a84c5c0b04c04fcf>, failed alloc size 4.00 MB, exceeded tracker:<Query#Id=78208b15e064527-a84c5c0b04c04fcf>, limit 100.00 MB, peak used 98.59 MB,
  2. current used 96.88 MB>, executing msg:<execute:<ExecNode:VHASH_JOIN_NODE (id=2)>>. backend 172.24.47.117 process memory used 1.13 GB, limit 98.92 GB. If query tracker exceed, `set exec_mem_limit=8G` to change limit, details mem usage see be.INFO.
  3. Process Memory Summary:
  4. process memory used 1.13 GB limit 98.92 GB, sys mem available 45.15 GB min reserve 3.20 GB, tc/jemalloc allocator cache 27.62 MB
  5. Alloc Stacktrace:
  6. @ 0x66cf73a doris::vectorized::HashJoinNode::_materialize_build_side()
  7. @ 0x69cb1ee doris::vectorized::VJoinNodeBase::open()
  8. @ 0x66ce27a doris::vectorized::HashJoinNode::open()
  9. @ 0x5835dad doris::PlanFragmentExecutor::open_vectorized_internal()
  10. @ 0x58351d2 doris::PlanFragmentExecutor::open()
  11. @ 0x5815244 doris::FragmentExecState::execute()
  12. @ 0x5817965 doris::FragmentMgr::_exec_actual()
  13. @ 0x581fffb std::_Function_handler<>::_M_invoke()
  14. @ 0x5a6f2c1 doris::ThreadPool::dispatch_thread()
  15. @ 0x5a6843f doris::Thread::supervise_thread()
  16. @ 0x7f6faa94a1ca start_thread
  17. @ 0x7f6fab121dd3 __GI___clone
  18. @ (nil) (unknown)
  19. Memory Tracker Summary:
  20. MemTrackerLimiter Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Type=query, Limit=100.00 MB(104857600 B), Used=64.75 MB(67891182 B), Peak=104.70 MB(109786406 B)
  21. MemTracker Label=Scanner#QueryId=78208b15e064527-a84c5c0b04c04fcf, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
  22. MemTracker Label=RuntimeFilterMgr, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=2.09 KB(2144 B), Peak=0(0 B)
  23. MemTracker Label=BufferedBlockMgr2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
  24. MemTracker Label=ExecNode:VHASH_JOIN_NODE (id=2), Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=-61.44 MB(-64426656 B), Peak=290.33 KB(297296 B)
  25. MemTracker Label=ExecNode:VEXCHANGE_NODE (id=9), Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=6.12 KB(6264 B), Peak=5.84 KB(5976 B)
  26. MemTracker Label=VDataStreamRecvr:78208b15e064527-a84c5c0b04c04fd2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=6.12 KB(6264 B), Peak=5.84 KB(5976 B)
  27. MemTracker Label=ExecNode:VEXCHANGE_NODE (id=10), Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=-41.20 MB(-43198024 B), Peak=1.46 MB(1535656 B)
  28. MemTracker Label=VDataStreamRecvr:78208b15e064527-a84c5c0b04c04fd2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=-41.20 MB(-43198024 B), Peak=1.46 MB(1535656 B)
  29. MemTracker Label=VDataStreamSender:78208b15e064527-a84c5c0b04c04fd2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=2.34 KB(2400 B), Peak=0(0 B)
  30. MemTracker Label=Scanner#QueryId=78208b15e064527-a84c5c0b04c04fcf, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=58.12 MB(60942224 B), Peak=57.41 MB(60202848 B)
  31. MemTracker Label=RuntimeFilterMgr, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
  32. MemTracker Label=BufferedBlockMgr2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
  33. MemTracker Label=ExecNode:VNewOlapScanNode(customer) (id=1), Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=9.55 MB(10013424 B), Peak=10.20 MB(10697136 B)
  34. MemTracker Label=VDataStreamSender:78208b15e064527-a84c5c0b04c04fd1, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=59.80 MB(62701880 B), Peak=59.16 MB(62033048 B)
  35. MemTracker Label=Scanner#QueryId=78208b15e064527-a84c5c0b04c04fcf, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
  36. MemTracker Label=RuntimeFilterMgr, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=13.62 KB(13952 B), Peak=0(0 B)
  37. MemTracker Label=BufferedBlockMgr2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
  38. MemTracker Label=ExecNode:VNewOlapScanNode(lineorder) (id=0), Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=6.03 MB(6318064 B), Peak=4.02 MB(4217664 B)
  39. MemTracker Label=VDataStreamSender:78208b15e064527-a84c5c0b04c04fd0, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=2.34 KB(2400 B), Peak=0(0 B)