20. What problems remain in Go's GC today?
Although the Go team states that STW pause times have been optimized down to the order of 100 microseconds, this is essentially a trade-off. In a sense, the original STW work has been shifted into several places that can still stall user code; beyond that, the way the runtime scheduler is implemented also affects the GC to some degree.
The following problems still exist in Go's GC today:
1. Mark Assist pauses are too long
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/trace"
	"time"
)

const (
	windowSize = 200000
	msgCount   = 1000000
)

var (
	best    time.Duration = time.Second
	bestAt  time.Time
	worst   time.Duration
	worstAt time.Time
	start   = time.Now()
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := trace.Start(f); err != nil {
		panic(err)
	}
	defer trace.Stop()

	// Run five measurement rounds, resetting the statistics and
	// forcing a GC between rounds.
	for i := 0; i < 5; i++ {
		measure()
		worst = 0
		best = time.Second
		runtime.GC()
	}
}

func measure() {
	var c channel
	for i := 0; i < msgCount; i++ {
		c.sendMsg(i)
	}
	fmt.Printf("Best send delay %v at %v, worst send delay: %v at %v. Wall clock: %v \n",
		best, bestAt.Sub(start), worst, worstAt.Sub(start), time.Since(start))
}

// channel simulates a send window that keeps the most recent
// windowSize messages alive, so the GC always has a large heap to mark.
type channel [windowSize][]byte

func (c *channel) sendMsg(id int) {
	start := time.Now()
	// Simulate a send and record how long it takes.
	(*c)[id%windowSize] = newMsg(id)
	end := time.Now()

	elapsed := end.Sub(start)
	if elapsed > worst {
		worst = elapsed
		worstAt = end
	}
	if elapsed < best {
		best = elapsed
		bestAt = end
	}
}

func newMsg(n int) []byte {
	m := make([]byte, 1024)
	for i := range m {
		m[i] = byte(n)
	}
	return m
}
Running this program produces results similar to the following:
$ go run main.go
Best send delay 330ns at 773.037956ms, worst send delay: 7.127915ms at 579.835487ms. Wall clock: 831.066632ms
Best send delay 331ns at 873.672966ms, worst send delay: 6.731947ms at 1.023969626s. Wall clock: 1.515295559s
Best send delay 330ns at 1.812141567s, worst send delay: 5.34028ms at 2.193858359s. Wall clock: 2.199921749s
Best send delay 338ns at 2.722161771s, worst send delay: 7.479482ms at 2.665355216s. Wall clock: 2.920174197s
Best send delay 337ns at 3.173649445s, worst send delay: 6.989577ms at 3.361716121s. Wall clock: 3.615079348s
In these results, the worst send delay of the first run is as high as 7.12 ms, occurring around 578 ms into the program's execution. Inspecting that window with go tool trace shows that Mark Assist ran for 7112312 ns, roughly the observed 7.127915 ms; in this worst case, mark assist slowed down the user code and was the cause of the ~7 ms delay.
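To see this yourself, open the trace file written by the program above (trace.out, per the os.Create call) in the standard trace viewer:
$ go tool trace trace.out
In the goroutine timeline, the slow sendMsg call should overlap a mark-assist span within a GC cycle.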
2. Sweep pauses are too long
Staying with the same example: if we look closely at the Sweep phase that follows the Mark Assist, it impacts user code for as long as about 30 ms, and the call stack shows that this sweep work happens during memory allocation.
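Go amortizes sweep work onto allocation: an allocating goroutine may first have to sweep spans left over from the previous GC cycle before it can allocate. The following is a minimal illustrative sketch of that mechanism (not the program traced above; the sizes and retention ratio are arbitrary assumptions): timing each allocation exposes outliers whose latency includes sweep (and mark-assist) work rather than pure allocation.

package main

import (
	"fmt"
	"time"
)

func main() {
	var worst time.Duration
	retained := make([][]byte, 0, 1<<16)

	for i := 0; i < 1<<21; i++ {
		start := time.Now()
		buf := make([]byte, 1024) // may be charged sweep or mark-assist work
		if d := time.Since(start); d > worst {
			worst = d
		}
		// Retain a fraction and drop the rest, so GC cycles keep firing
		// and keep leaving unswept spans behind for later allocations.
		if i%32 == 0 {
			retained = append(retained, buf)
		}
	}
	fmt.Println("worst allocation latency:", worst, "buffers retained:", len(retained))
}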
3. GC cycles forced to re-run due to incorrectness in the GC algorithm
This problem is difficult to reproduce, but it is a known issue. According to the Go team, it occurs roughly once in every 1334 builds [15], which works out to a trigger probability of about 0.075% (1/1334 ≈ 0.00075). Although the probability is low, whenever it does occur the GC cycle must be executed again, which is unfortunate.
4. Creating large numbers of goroutines causes GC to consume more CPU
This problem can be verified with the following program:
package main

import (
	"fmt"
	"runtime"
	"sync"
	"testing"
	"time"
)

func BenchmarkGCLargeGs(b *testing.B) {
	wg := sync.WaitGroup{}
	for ng := 100; ng <= 1000000; ng *= 10 {
		b.Run(fmt.Sprintf("#g-%d", ng), func(b *testing.B) {
			// Create many goroutines. Because each goroutine sleeps,
			// the runtime cannot reuse a sleeping goroutine and must
			// keep creating new g's.
			wg.Add(ng)
			for i := 0; i < ng; i++ {
				go func() {
					time.Sleep(100 * time.Millisecond)
					wg.Done()
				}()
			}
			wg.Wait()

			// Run GC once now to provide a consistent memory environment.
			runtime.GC()

			// Time b.N runs of the GC.
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				runtime.GC()
			}
		})
	}
}
The results can be obtained with the following commands:
$ go test -bench=BenchmarkGCLargeGs -run=^$ -count=5 -v . | tee 4.txt
$ benchstat 4.txt
name time/op
GCLargeGs/#g-100-12 192µs ± 5%
GCLargeGs/#g-1000-12 331µs ± 1%
GCLargeGs/#g-10000-12 1.22ms ± 1%
GCLargeGs/#g-100000-12 10.9ms ± 3%
GCLargeGs/#g-1000000-12 32.5ms ± 4%
This situation typically occurs after a traffic peak: large numbers of goroutines are put to sleep waiting for work, so the runtime keeps creating new goroutines, while the old ones, asleep but neither destroyed nor reusable, leave the GC with ever more execution stacks to scan, making each GC cycle take longer and longer to complete. One solution is to use a goroutine pool to cap the number of goroutines created, as sketched below.
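A minimal worker-pool sketch (illustrative only; the worker count, channel-based task type, and function names here are assumptions, not from the original text): a fixed set of long-lived goroutines drains tasks from a channel, so the number of stacks the GC must scan stays bounded no matter how many tasks arrive.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool starts a fixed number of long-lived worker goroutines that drain
// tasks from a channel, keeping the goroutine count (and thus the number
// of stacks the GC must scan) bounded.
func pool(workers int, tasks <-chan func()) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for task := range tasks {
				task()
			}
		}()
	}
	return &wg
}

func main() {
	var done int64
	tasks := make(chan func())
	wg := pool(8, tasks) // 8 goroutines in total, instead of one per task

	for i := 0; i < 100000; i++ {
		tasks <- func() { atomic.AddInt64(&done, 1) }
	}
	close(tasks)
	wg.Wait()
	fmt.Println("tasks completed:", done)
}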
Summary
GC is a complex piece of systems engineering. Although the twenty questions discussed in this article sketch a fairly complete picture of Go's GC, they still cover only a small, relatively important part of the much larger topic of garbage collection; many finer details and research developments cannot be fully discussed within limited space.
Since Go's inception, the Go team has continuously experimented with and optimized the GC's behavior, but many open problems remain unsolved, and we can look forward to the GC's future improvements.
Primary references for further reading
- [1] Ian Lance Taylor. Why golang garbage-collector not implement Generational and Compact gc? May 2017. https://groups.google.com/forum/#!msg/golang-nuts/KJiyv2mV2pU/wdBUH1mHCAAJ
- [2] Go Team. debug.GCStats. Last access: Jan, 2020. https://golang.org/pkg/runtime/debug/#GCStats
- [3] Go Team. runtime.MemStats. Last access: Jan, 2020. https://golang.org/pkg/runtime/#MemStats
- [4] Austin Clements, Rick Hudson. Proposal: Eliminate STW stack re-scanning. Oct, 2016. https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md
- [5] Austin Clements. Go 1.5 concurrent garbage collector pacing. Mar, 2015. https://docs.google.com/document/d/1wmjrocXIWTr1JxU-3EQBI6BK6KgtiFArkG47XK73xIQ/edit#
- [6] Austin Clements. Proposal: Separate soft and hard heap size goal. Oct, 2017. https://github.com/golang/proposal/blob/master/design/14951-soft-heap-limit.md
- [7] Go Team. HTTP pprof. Last access: Jan, 2020. https://golang.org/pkg/net/http/pprof/
- [8] Go Team. Runtime pprof. Last access: Jan, 2020. https://golang.org/pkg/runtime/pprof/
- [9] Go Team. Package trace. Last access: Jan, 2020. https://golang.org/pkg/runtime/trace/
- [10] Caleb Spare. proposal: runtime: add a mechanism for specifying a minimum target heap size. Last access: Jan, 2020. https://github.com/golang/go/issues/23044
- [11] Austin Clements, Rick Hudson. Proposal: Concurrent stack re-scanning. Oct, 2016. https://github.com/golang/proposal/blob/master/design/17505-concurrent-rescan.md
- [12] Rick Hudson, Austin Clements. Request Oriented Collector (ROC) Algorithm. Jun, 2016. https://docs.google.com/document/d/1gCsFxXamW8RRvOe5hECz98Ftk-tcRRJcDFANj2VwCB0/edit
- [13] Rick Hudson. runtime: constants and data structures for generational GC. Mar, 2019. https://go-review.googlesource.com/c/go/+/137476/12
- [14] Austin Clements. Sub-millisecond GC pauses. Oct, 2016. https://groups.google.com/d/msg/golang-dev/Ab1sFeoZg_8/_DaL0E8fAwAJ
- [15] Austin Clements. runtime: error message: P has cached GC work at end of mark termination. Nov, 2018. https://github.com/golang/go/issues/27993#issuecomment-441719687
Other references
- [16] Dmitry Soshnikov. Writing a Memory Allocator. Feb. 2019. http://dmitrysoshnikov.com/compilers/writing-a-memory-allocator/#more-3590
- [17] William Kennedy. Garbage Collection In Go : Part II - GC Traces. May 2019. https://www.ardanlabs.com/blog/2019/05/garbage-collection-in-go-part2-gctraces.html
- [18] Rhys Hiltner. An Introduction to go tool trace. Last access: Jan, 2020. https://about.sourcegraph.com/go/an-introduction-to-go-tool-trace-rhys-hiltner
- [19] 煎鱼. 用 GODEBUG 看 GC. Sep, 2019. https://segmentfault.com/a/1190000020255157
- [20] 煎鱼. Go 大杀器之跟踪剖析 trace. Last access: Jan, 2020. https://eddycjy.gitbook.io/golang/di-9-ke-gong-ju/go-tool-trace