Dmitriy Vyukov authored
Currently we collect (add) all roots into a global array in a single-threaded GC phase. This hinders parallelism. With this change we just kick off a parallel for over number_of_goroutines+5 iterations. The parallel for callback then decides whether it needs to scan the stack of a goroutine, scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.

This requires storing all goroutines in an array instead of a linked list (to allow direct indexing).

This CL also removes the DebugScan functionality. It is broken because it uses unbounded stack, so it cannot run on g0. Even when it was working, I found it unhelpful for debugging issues because the two algorithms are too different now. This change would require updating DebugScan, so it's simpler to just delete it.

With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.

garbage-8
allocated           2987886      2989221      +0.04%
allocs                62885        62887      +0.00%
cputime            21286000     21272000      -0.07%
gc-pause-one       26633247     24885421      -6.56%
gc-pause-total       873570       811264      -7.13%
rss               242089984    242515968      +0.18%
sys-gc             13934336     13869056      -0.47%
sys-heap          205062144    205062144      +0.00%
sys-other          12628288     12628288      +0.00%
sys-stack          11534336     11927552      +3.41%
sys-total         243159104    243487040      +0.13%
time                2809477      2740795      -2.44%

R=golang-codereviews, rsc
CC=cshapiro, golang-codereviews, khr
https://golang.org/cl/46860043
cb133c66
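
The index-based scheme described in the commit message can be sketched roughly as follows. This is an illustration only, not the runtime's code: the root-kind names, `parallelFor`, `rootJob`, and `scanRoots` are all hypothetical, goroutines are stood in for by strings, and the parallel for here is a plain shared atomic counter rather than the runtime's own parallel-for implementation. The point it shows is that no root list is built up front; each worker derives its job from the iteration index alone, which is why goroutines need to live in an indexable array.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hypothetical fixed root kinds occupying the first 5 indices
// (the "+5" in number_of_goroutines+5). Names are illustrative.
const (
	rootData = iota
	rootBSS
	rootFinalizers
	rootSpans
	rootFlushCaches
	rootCount // indices >= rootCount map to goroutines
)

// parallelFor runs body(i) for every i in [0, n) on the given number of
// workers. Each worker claims the next index from a shared atomic counter.
func parallelFor(n, workers int, body func(i int)) {
	var next int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				i := int(atomic.AddInt64(&next, 1)) - 1
				if i >= n {
					return
				}
				body(i)
			}
		}()
	}
	wg.Wait()
}

// rootJob decides, from the index alone, which root the iteration handles.
// allg must be a slice (not a linked list) so it can be indexed directly.
func rootJob(i int, allg []string) string {
	switch i {
	case rootData:
		return "scan data segment"
	case rootBSS:
		return "scan bss segment"
	case rootFinalizers:
		return "scan finalizers"
	case rootSpans:
		return "scan span metadata"
	case rootFlushCaches:
		return "flush caches"
	default:
		return "scan stack of " + allg[i-rootCount]
	}
}

// scanRoots runs one iteration per root and records what each one did.
// Every index is written by exactly one worker, so no locking is needed.
func scanRoots(allg []string, workers int) []string {
	done := make([]string, len(allg)+rootCount)
	parallelFor(len(done), workers, func(i int) {
		done[i] = rootJob(i, allg)
	})
	return done
}

func main() {
	for i, job := range scanRoots([]string{"g1", "g2", "g3"}, 8) {
		fmt.Println(i, job)
	}
}
```

Because every iteration is independent, there is no single-threaded setup step left to serialize the workers, which is the property the commit message attributes the pause reduction to.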