    runtime: do not collect GC roots explicitly · cb133c66
    Dmitriy Vyukov authored
    Currently we collect (add) all roots into a global array in a single-threaded GC phase.
    This hinders parallelism.
    With this change we just kick off a parallel for with number_of_goroutines+5 iterations.
    The parallel for callback then decides whether it needs to scan the stack of a goroutine,
    scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.
    This requires storing all goroutines in an array instead of a linked list
    (to allow direct indexing).
    This CL also removes the DebugScan functionality. It is broken because it uses
    an unbounded amount of stack, so it cannot run on g0. Even when it was working, I found
    it unhelpful for debugging issues because the two algorithms are too different now.
    This change would require updating DebugScan, so it's simpler to just delete it.
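
    The index-based root dispatch described at the top of this message can be
    sketched roughly as follows. This is an illustrative Go model, not the runtime
    code from this CL: the names parallelFor, markRoot, scanRoots and the five
    root* constants are assumptions made for the example (the message itself only
    names the data segment and finalizers among the fixed roots), and "scanning"
    is reduced to returning a label.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hypothetical fixed root kinds. The commit message only says there are
// 5 extra iterations; these particular names are placeholders.
const (
	rootData = iota
	rootBss
	rootFinalizers
	rootSpans
	rootFlushCaches
	rootCount // number of fixed roots (5)
)

// goroutine stands in for the runtime's per-goroutine structure.
type goroutine struct{ id int }

// parallelFor runs body(i) for every i in [0, n) using the given number
// of workers. Workers claim indices from a shared atomic counter, so no
// per-root work list has to be built up front.
func parallelFor(n, workers int, body func(i int)) {
	var next int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				i := int(atomic.AddInt64(&next, 1)) - 1
				if i >= n {
					return
				}
				body(i)
			}
		}()
	}
	wg.Wait()
}

// markRoot decides from the index alone what to scan. Indices below
// rootCount are fixed roots; the rest index directly into allg, which is
// why the goroutines must live in an array rather than a linked list.
func markRoot(allg []goroutine, i int) string {
	switch i {
	case rootData:
		return "data segment"
	case rootBss:
		return "bss segment"
	case rootFinalizers:
		return "finalizers"
	case rootSpans:
		return "spans"
	case rootFlushCaches:
		return "flush caches"
	default:
		return fmt.Sprintf("stack of goroutine %d", allg[i-rootCount].id)
	}
}

// scanRoots runs one parallel-for iteration per root and records what
// each iteration scanned. Each slot is written by exactly one worker.
func scanRoots(allg []goroutine, workers int) []string {
	n := rootCount + len(allg)
	out := make([]string, n)
	parallelFor(n, workers, func(i int) { out[i] = markRoot(allg, i) })
	return out
}

func main() {
	allg := []goroutine{{id: 10}, {id: 11}, {id: 12}}
	for i, what := range scanRoots(allg, 4) {
		fmt.Println(i, what)
	}
}
```

    Because each iteration derives its work from its index, there is no
    single-threaded phase that has to append roots to a shared array first.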
    
    With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.
    
    garbage-8
    allocated                 2987886      2989221      +0.04%
    allocs                      62885        62887      +0.00%
    cputime                  21286000     21272000      -0.07%
    gc-pause-one             26633247     24885421      -6.56%
    gc-pause-total             873570       811264      -7.13%
    rss                     242089984    242515968      +0.18%
    sys-gc                   13934336     13869056      -0.47%
    sys-heap                205062144    205062144      +0.00%
    sys-other                12628288     12628288      +0.00%
    sys-stack                11534336     11927552      +3.41%
    sys-total               243159104    243487040      +0.13%
    time                      2809477      2740795      -2.44%
    
    R=golang-codereviews, rsc
    CC=cshapiro, golang-codereviews, khr
    https://golang.org/cl/46860043