Commit 17db6e04 authored by Austin Clements's avatar Austin Clements

runtime: use heap scan size as estimate of GC scan work

Currently, the GC uses a moving average of recent scan work ratios to
estimate the total scan work required by this cycle. This is in turn
used to compute how much scan work should be done by mutators when
they allocate in order to perform all expected scan work by the time
the allocated heap reaches the heap goal.

However, our current scan work estimate can be arbitrarily wrong if
the heap topography changes significantly from one cycle to the
next. For example, in the go1 benchmarks, at the beginning of each
benchmark, the heap is dominated by a 256MB no-scan object, so the GC
learns that the scan density of the heap is very low. In benchmarks
that then rapidly allocate pointer-dense objects, by the time of the
next GC cycle, our estimate of the scan work can be too low by a large
factor. This in turn lets the mutator allocate faster than the GC can
collect, allowing it to get arbitrarily far ahead of the scan work
estimate, which leads to very long GC cycles with very little mutator
assist that can overshoot the heap goal by large margins. This is
particularly easy to demonstrate with BinaryTree17:

$ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17
gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P
gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P
testing: warning: no tests to run
PASS
BenchmarkBinaryTree17	gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced)
gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P
       1       9150447353 ns/op

Change this estimate to instead use the *current* scannable heap
size. This has the advantage of being based solely on the current
state of the heap, not on past densities or reachable heap sizes, so
it isn't susceptible to falling behind during these sorts of phase
changes. This is strictly an over-estimate, but it's better to
over-estimate and get more assist than necessary than it is to
under-estimate and potentially spiral out of control. Experiments with
scaling this estimate back showed no obvious benefit for mutator
utilization, heap size, or assist time.

This new estimate has little effect for most benchmarks, including
most go1 benchmarks, x/benchmarks, and the 6g benchmark. It has a huge
effect for benchmarks that triggered the bad pacer behavior:

name                   old mean              new mean              delta
BinaryTree17            10.0s × (1.00,1.00)    3.5s × (0.98,1.01)  -64.93% (p=0.000)
Fannkuch11              2.74s × (1.00,1.01)   2.65s × (1.00,1.00)   -3.52% (p=0.000)
FmtFprintfEmpty        56.4ns × (0.99,1.00)  57.8ns × (1.00,1.01)   +2.43% (p=0.000)
FmtFprintfString        187ns × (0.99,1.00)   185ns × (0.99,1.01)   -1.19% (p=0.010)
FmtFprintfInt           184ns × (1.00,1.00)   183ns × (1.00,1.00)  (no variance)
FmtFprintfIntInt        321ns × (1.00,1.00)   315ns × (1.00,1.00)   -1.80% (p=0.000)
FmtFprintfPrefixedInt   266ns × (1.00,1.00)   263ns × (1.00,1.00)   -1.22% (p=0.000)
FmtFprintfFloat         353ns × (1.00,1.00)   353ns × (1.00,1.00)   -0.13% (p=0.035)
FmtManyArgs            1.21µs × (1.00,1.00)  1.19µs × (1.00,1.00)   -1.33% (p=0.000)
GobDecode              9.69ms × (1.00,1.00)  9.59ms × (1.00,1.00)   -1.07% (p=0.000)
GobEncode              7.89ms × (0.99,1.01)  7.74ms × (1.00,1.00)   -1.92% (p=0.000)
Gzip                    391ms × (1.00,1.00)   392ms × (1.00,1.00)     ~    (p=0.522)
Gunzip                 97.1ms × (1.00,1.00)  97.0ms × (1.00,1.00)   -0.10% (p=0.000)
HTTPClientServer       55.7µs × (0.99,1.01)  56.7µs × (0.99,1.01)   +1.81% (p=0.001)
JSONEncode             19.1ms × (1.00,1.00)  19.0ms × (1.00,1.00)   -0.85% (p=0.000)
JSONDecode             66.8ms × (1.00,1.00)  66.9ms × (1.00,1.00)     ~    (p=0.288)
Mandelbrot200          4.13ms × (1.00,1.00)  4.12ms × (1.00,1.00)   -0.08% (p=0.000)
GoParse                3.97ms × (1.00,1.01)  4.01ms × (1.00,1.00)   +0.99% (p=0.000)
RegexpMatchEasy0_32     114ns × (1.00,1.00)   115ns × (0.99,1.00)     ~    (p=0.070)
RegexpMatchEasy0_1K     376ns × (1.00,1.00)   376ns × (1.00,1.00)     ~    (p=0.900)
RegexpMatchEasy1_32    94.9ns × (1.00,1.00)  96.3ns × (1.00,1.01)   +1.53% (p=0.001)
RegexpMatchEasy1_1K     568ns × (1.00,1.00)   567ns × (1.00,1.00)   -0.22% (p=0.001)
RegexpMatchMedium_32    159ns × (1.00,1.00)   159ns × (1.00,1.00)     ~    (p=0.178)
RegexpMatchMedium_1K   46.4µs × (1.00,1.00)  46.6µs × (1.00,1.00)   +0.29% (p=0.000)
RegexpMatchHard_32     2.37µs × (1.00,1.00)  2.37µs × (1.00,1.00)     ~    (p=0.722)
RegexpMatchHard_1K     71.1µs × (1.00,1.00)  71.2µs × (1.00,1.00)     ~    (p=0.229)
Revcomp                 565ms × (1.00,1.00)   562ms × (1.00,1.00)   -0.52% (p=0.000)
Template               81.0ms × (1.00,1.00)  80.2ms × (1.00,1.00)   -0.97% (p=0.000)
TimeParse               380ns × (1.00,1.00)   380ns × (1.00,1.00)     ~    (p=0.148)
TimeFormat              405ns × (0.99,1.00)   385ns × (0.99,1.00)   -5.00% (p=0.000)

Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4
Reviewed-on: https://go-review.googlesource.com/9696Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
parent 3be3cbd5
......@@ -238,13 +238,6 @@ const (
// GOMAXPROCS. The high-level design of this algorithm is documented
// at http://golang.org/s/go15gcpacing.
var gcController = gcControllerState{
// Initial work ratio guess.
//
// TODO(austin): This is based on the work ratio of the
// compiler on ./all.bash. Run a wider variety of programs and
// see what their work ratios are.
workRatioAvg: 0.5 / float64(ptrSize),
// Initial trigger ratio guess.
triggerRatio: 7 / 8.0,
}
......@@ -254,6 +247,10 @@ type gcControllerState struct {
// is updated atomically during the cycle. Updates may be
// batched arbitrarily, since the value is only read at the
// end of the cycle.
//
// Currently this is the bytes of heap scanned. For most uses,
// this is an opaque unit of work, but for estimation the
// definition is important.
scanWork int64
// bgScanCredit is the scan work credit accumulated by the
......@@ -299,10 +296,6 @@ type gcControllerState struct {
// dedicated mark workers get started.
dedicatedMarkWorkersNeeded int64
// workRatioAvg is a moving average of the scan work ratio
// (scan work per byte marked).
workRatioAvg float64
// assistRatio is the ratio of allocated bytes to scan work
// that should be performed by mutator assists. This is
// computed at the beginning of each cycle.
......@@ -399,21 +392,16 @@ func (c *gcControllerState) startCycle() {
// improved estimates. This should be called periodically during
// concurrent mark.
func (c *gcControllerState) revise() {
// Estimate the size of the marked heap. We don't have much to
// go on, so at the beginning of the cycle this uses the
// marked heap size from last cycle. If the reachable heap has
// grown since last cycle, we'll eventually mark more than
// this and we can revise our estimate. This way, if we
// overshoot our initial estimate, the assist ratio will climb
// smoothly and put more pressure on mutator assists to finish
// the cycle.
heapMarkedEstimate := memstats.heap_marked
if heapMarkedEstimate < work.bytesMarked {
heapMarkedEstimate = work.bytesMarked
}
// Compute the expected work based on this estimate.
scanWorkExpected := uint64(float64(heapMarkedEstimate) * c.workRatioAvg)
// Compute the expected scan work. This is a strict upper
// bound on the possible scan work in the current heap.
//
// You might consider dividing this by 2 (or by
// (100+GOGC)/100) to counter this over-estimation, but
// benchmarks show that this has almost no effect on mean
// mutator utilization, heap size, or assist time and it
// introduces the danger of under-estimating and letting the
// mutator outpace the garbage collector.
scanWorkExpected := memstats.heap_scan
// Compute the mutator assist ratio so by the time the mutator
// allocates the remaining heap bytes up to next_gc, it will
......@@ -443,9 +431,6 @@ func (c *gcControllerState) endCycle() {
// transient changes. Values near 1 may be unstable.
const triggerGain = 0.5
// EWMA weight given to this cycle's scan work ratio.
const workRatioWeight = 0.75
// Stop the revise timer
deltimer(&c.reviseTimer)
......@@ -484,12 +469,6 @@ func (c *gcControllerState) endCycle() {
c.triggerRatio = goalGrowthRatio * 0.95
}
// Compute the scan work ratio for this cycle.
workRatio := float64(c.scanWork) / float64(work.bytesMarked)
// Update EWMA of recent scan work ratios.
c.workRatioAvg = workRatioWeight*workRatio + (1-workRatioWeight)*c.workRatioAvg
if debug.gcpacertrace > 0 {
// Print controller state in terms of the design
// document.
......@@ -502,14 +481,12 @@ func (c *gcControllerState) endCycle() {
u_a := utilization
u_g := gcGoalUtilization
W_a := c.scanWork
w_a := workRatio
w_ewma := c.workRatioAvg
print("pacer: H_m_prev=", H_m_prev,
" h_t=", h_t, " H_T=", H_T,
" h_a=", h_a, " H_a=", H_a,
" h_g=", h_g, " H_g=", H_g,
" u_a=", u_a, " u_g=", u_g,
" W_a=", W_a, " w_a=", w_a, " w_ewma=", w_ewma,
" W_a=", W_a,
" goalΔ=", goalGrowthRatio-h_t,
" actualΔ=", h_a-h_t,
" u_a/u_g=", u_a/u_g,
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment