    runtime: eliminate mark 2 and fix mark termination race · 9108ae77
    Austin Clements authored
    The mark 2 phase was originally introduced as a way to reduce the
    chance of entering STW mark termination while there was still marking
    work to do. It works by flushing and disabling all local work caches
    so that all enqueued work becomes immediately globally visible.
    However, mark 2 is not only slow (disabling caches makes marking and
    the write barrier both much more expensive) but also imperfect. There
    is still a rare but possible race (~once per all.bash) that can cause
    GC to enter mark termination while there is still marking work. This
    race is detailed at
    https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md#appendix-mark-completion-race
    The effect of this is that mark termination must still cope with the
    possibility that there may be work remaining after a concurrent mark
    phase. Dealing with this increases STW pause time and increases the
    complexity of mark termination.
    
    Furthermore, a similar but far more likely race can cause early
    transition from mark 1 to mark 2. This is unfortunate because the
    cost of mark 2 then makes performance unstable.
    
    This CL fixes this by replacing mark 2 with a distributed termination
    detection algorithm. This algorithm is correct, so it eliminates the
    mark termination race, and doesn't require disabling local caches. It
    ensures that there are no grey objects upon entering mark termination.
    With this change, we're one step closer to eliminating marking from
    mark termination entirely (it's still used by STW GC and checkmarks
    mode).
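
    As a rough illustration of the idea (not the runtime's actual code),
    the sketch below models termination detection as repeated flush
    rounds: marking is declared complete only when a full round of
    flushing every worker's local cache turns up no new work. It is a
    single-threaded toy model, and the collector and worker types and
    the markToCompletion function are hypothetical names invented for
    this example.

```go
package main

import "fmt"

// worker models a processor with a private work cache (hypothetical type).
type worker struct {
	local []int // grey objects not yet visible to other workers
}

// collector holds the shared queue and a toy object graph.
type collector struct {
	global  []int         // globally visible grey objects
	edges   map[int][]int // object -> objects it points to
	marked  map[int]bool
	workers []*worker
}

func newCollector(nWorkers int, edges map[int][]int) *collector {
	c := &collector{edges: edges, marked: make(map[int]bool)}
	for i := 0; i < nWorkers; i++ {
		c.workers = append(c.workers, &worker{})
	}
	return c
}

// shade marks an unmarked object and greys it into w's local cache.
func (c *collector) shade(w *worker, obj int) {
	if !c.marked[obj] {
		c.marked[obj] = true
		w.local = append(w.local, obj)
	}
}

// drainGlobal lets w scan every object on the global queue. Newly
// discovered objects land in w's local cache, not on the global queue.
func (c *collector) drainGlobal(w *worker) {
	for len(c.global) > 0 {
		obj := c.global[len(c.global)-1]
		c.global = c.global[:len(c.global)-1]
		for _, child := range c.edges[obj] {
			c.shade(w, child)
		}
	}
}

// flushAll publishes every local cache to the global queue and reports
// whether any worker had something to flush.
func (c *collector) flushAll() bool {
	flushed := false
	for _, w := range c.workers {
		if len(w.local) > 0 {
			c.global = append(c.global, w.local...)
			w.local = nil
			flushed = true
		}
	}
	return flushed
}

// markToCompletion marks everything reachable from root. It stops only
// after a flush round in which no worker had hidden work, and returns
// the number of rounds it took.
func (c *collector) markToCompletion(root int) int {
	c.shade(c.workers[0], root)
	rounds := 0
	for {
		rounds++
		for _, w := range c.workers {
			c.drainGlobal(w)
		}
		if !c.flushAll() {
			return rounds // no grey objects anywhere: safe to terminate
		}
	}
}

func main() {
	// Objects 1 -> 2 -> 3 are reachable from the root; 4 is garbage.
	c := newCollector(2, map[int][]int{1: {2}, 2: {3}, 4: {1}})
	rounds := c.markToCompletion(1)
	fmt.Println("rounds:", rounds, "marked:", len(c.marked))
}
```

    The point of the model is that local caches stay enabled during
    marking: they are only flushed when checking for completion, and a
    flush that uncovers work simply sends the collector back to marking.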
    
    This CL does not eliminate the gcBlackenPromptly global flag, though
    it is always set to false now. It will be removed in a cleanup CL.
    
    This change led to only minor variations in the go1 benchmarks
    (https://perf.golang.org/search?q=upload:20180909.1) and compilebench
    benchmarks (https://perf.golang.org/search?q=upload:20180910.2).
    
    This significantly improves performance of the garbage benchmark, with
    no impact on STW times:
    
    name                        old time/op    new time/op   delta
    Garbage/benchmem-MB=64-12    2.21ms ± 1%   2.05ms ± 1%   -7.38% (p=0.000 n=18+19)
    Garbage/benchmem-MB=1024-12  2.30ms ±16%   2.20ms ± 7%   -4.51% (p=0.001 n=20+20)
    
    name                        old STW-ns/GC  new STW-ns/GC  delta
    Garbage/benchmem-MB=64-12      138k ±44%     141k ±23%     ~    (p=0.309 n=19+20)
    Garbage/benchmem-MB=1024-12    159k ±25%     178k ±98%     ~    (p=0.798 n=16+18)
    
    name                        old STW-ns/op  new STW-ns/op  delta
    Garbage/benchmem-MB=64-12     4.42k ±44%    4.24k ±23%     ~    (p=0.531 n=19+20)
    Garbage/benchmem-MB=1024-12     591 ±24%      636 ±111%    ~    (p=0.309 n=16+18)
    
    (https://perf.golang.org/search?q=upload:20180910.1)
    
    Updates #26903.
    Updates #17503.
    
    Change-Id: Icbd1e12b7a12a76f423c9bf033b13cb363e4cd19
    Reviewed-on: https://go-review.googlesource.com/c/134318
    Run-TryBot: Austin Clements <austin@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Rick Hudson <rlh@golang.org>
gc_test.go 13.5 KB