Commit 17db6e04 authored by Austin Clements

runtime: use heap scan size as estimate of GC scan work

Currently, the GC uses a moving average of recent scan work ratios to
estimate the total scan work required by this cycle. This is in turn
used to compute how much scan work should be done by mutators when
they allocate in order to perform all expected scan work by the time
the allocated heap reaches the heap goal.
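
To make the pacing relationship concrete, here is a minimal Go sketch of how a scan work estimate could translate into assist work per allocated byte. The helper name assistRatio, its parameters, and the 1 MB floor on the heap distance are assumptions for illustration only; this is not the runtime's code.

```go
package main

import "fmt"

// assistRatio is a hypothetical helper (not a runtime function). It
// returns the units of scan work a mutator should perform per byte it
// allocates so that scanWorkExpected is finished by the time the heap
// grows from heapLive to heapGoal.
func assistRatio(scanWorkExpected, heapLive, heapGoal uint64) float64 {
	// Assumed floor so that a tiny or negative distance does not
	// produce an enormous or negative ratio.
	const minDistance = 1 << 20
	distance := int64(heapGoal) - int64(heapLive)
	if distance < minDistance {
		distance = minDistance
	}
	return float64(scanWorkExpected) / float64(distance)
}

func main() {
	const mb = 1 << 20
	// Illustrative numbers only: 64 MB of expected scan work with
	// 144 MB of allocation headroom left before the goal.
	fmt.Println(assistRatio(64*mb, 430*mb, 574*mb))
}
```

If the scan work estimate is too low, this ratio is too low as well, and mutators are asked to do less work than the cycle actually needs.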

However, our current scan work estimate can be arbitrarily wrong if
the heap topography changes significantly from one cycle to the
next. For example, in the go1 benchmarks, at the beginning of each
benchmark, the heap is dominated by a 256MB no-scan object, so the GC
learns that the scan density of the heap is very low. In benchmarks
that then rapidly allocate pointer-dense objects, by the time of the
next GC cycle, our estimate of the scan work can be too low by a large
factor. This in turn lets the mutator allocate faster than the GC can
collect, allowing it to get arbitrarily far ahead of the scan work
estimate, which leads to very long GC cycles with very little mutator
assist that can overshoot the heap goal by large margins. This is
particularly easy to demonstrate with BinaryTree17:

$ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17
gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P
gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P
testing: warning: no tests to run
PASS
BenchmarkBinaryTree17	gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced)
gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P
       1       9150447353 ns/op
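
For reading the trace above: the heap field of each gctrace line gives the heap size at GC start, at GC end, and the live heap, followed by the goal. The following Go sketch (the type and function names are made up for illustration) extracts those four numbers and reports how far the heap at the end of the cycle exceeded the goal. For gc #4 that is 2092 MB against a 574 MB goal, roughly 3.6 times the goal.

```go
package main

import "fmt"

// heapSizes holds the heap fields of one gctrace line, in MB.
// The type is hypothetical and exists only for this example.
type heapSizes struct {
	start, end, live, goal int
}

// parseHeapSizes parses a fragment such as
// "430->2092->1950 MB, 574 MB goal".
func parseHeapSizes(s string) (heapSizes, error) {
	var h heapSizes
	_, err := fmt.Sscanf(s, "%d->%d->%d MB, %d MB goal",
		&h.start, &h.end, &h.live, &h.goal)
	return h, err
}

// overshoot returns the heap size at the end of GC as a multiple of
// the heap goal.
func (h heapSizes) overshoot() float64 {
	return float64(h.end) / float64(h.goal)
}

func main() {
	h, err := parseHeapSizes("430->2092->1950 MB, 574 MB goal")
	if err != nil {
		panic(err)
	}
	fmt.Printf("heap at end of GC is %.1fx the goal\n", h.overshoot())
}
```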

Change this estimate to instead use the *current* scannable heap
size. This has the advantage of being based solely on the current
state of the heap, not on past densities or reachable heap sizes, so
it isn't susceptible to falling behind during these sorts of phase
changes. This is strictly an over-estimate, but it's better to
over-estimate and get more assist than necessary than it is to
under-estimate and potentially spiral out of control. Experiments with
scaling this estimate back showed no obvious benefit for mutator
utilization, heap size, or assist time.
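
The difference between the two estimators can be sketched in Go as follows. The function names and all numbers are assumptions chosen to resemble the phase change described above; they are not measurements and not the runtime's code.

```go
package main

import "fmt"

const mb = 1 << 20

// oldEstimate mirrors the removed approach: scale the marked heap size
// by a moving average of past scan work per marked byte.
func oldEstimate(heapMarked uint64, workRatioAvg float64) uint64 {
	return uint64(float64(heapMarked) * workRatioAvg)
}

// newEstimate mirrors the new approach: the current scannable heap
// size is itself the estimate, independent of past cycles.
func newEstimate(heapScan uint64) uint64 {
	return heapScan
}

func main() {
	// Made-up scenario: the last cycle marked a heap dominated by a
	// no-scan object, so the learned ratio is tiny, but the heap now
	// holds many pointer-dense objects.
	workRatioAvg := 2.0 / 262.0
	heapMarked := uint64(262 * mb)
	heapScan := uint64(160 * mb)

	fmt.Printf("old estimate: %.1f MB\n", float64(oldEstimate(heapMarked, workRatioAvg))/mb)
	fmt.Printf("new estimate: %.1f MB\n", float64(newEstimate(heapScan))/mb)
}
```

In this scenario the old estimate stays near the stale density learned from the previous cycle, while the new one follows the scannable heap as it grows.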

This new estimate has little effect for most benchmarks, including
most go1 benchmarks, x/benchmarks, and the 6g benchmark. It has a huge
effect for benchmarks that triggered the bad pacer behavior:

name                   old mean              new mean              delta
BinaryTree17            10.0s × (1.00,1.00)    3.5s × (0.98,1.01)  -64.93% (p=0.000)
Fannkuch11              2.74s × (1.00,1.01)   2.65s × (1.00,1.00)   -3.52% (p=0.000)
FmtFprintfEmpty        56.4ns × (0.99,1.00)  57.8ns × (1.00,1.01)   +2.43% (p=0.000)
FmtFprintfString        187ns × (0.99,1.00)   185ns × (0.99,1.01)   -1.19% (p=0.010)
FmtFprintfInt           184ns × (1.00,1.00)   183ns × (1.00,1.00)  (no variance)
FmtFprintfIntInt        321ns × (1.00,1.00)   315ns × (1.00,1.00)   -1.80% (p=0.000)
FmtFprintfPrefixedInt   266ns × (1.00,1.00)   263ns × (1.00,1.00)   -1.22% (p=0.000)
FmtFprintfFloat         353ns × (1.00,1.00)   353ns × (1.00,1.00)   -0.13% (p=0.035)
FmtManyArgs            1.21µs × (1.00,1.00)  1.19µs × (1.00,1.00)   -1.33% (p=0.000)
GobDecode              9.69ms × (1.00,1.00)  9.59ms × (1.00,1.00)   -1.07% (p=0.000)
GobEncode              7.89ms × (0.99,1.01)  7.74ms × (1.00,1.00)   -1.92% (p=0.000)
Gzip                    391ms × (1.00,1.00)   392ms × (1.00,1.00)     ~    (p=0.522)
Gunzip                 97.1ms × (1.00,1.00)  97.0ms × (1.00,1.00)   -0.10% (p=0.000)
HTTPClientServer       55.7µs × (0.99,1.01)  56.7µs × (0.99,1.01)   +1.81% (p=0.001)
JSONEncode             19.1ms × (1.00,1.00)  19.0ms × (1.00,1.00)   -0.85% (p=0.000)
JSONDecode             66.8ms × (1.00,1.00)  66.9ms × (1.00,1.00)     ~    (p=0.288)
Mandelbrot200          4.13ms × (1.00,1.00)  4.12ms × (1.00,1.00)   -0.08% (p=0.000)
GoParse                3.97ms × (1.00,1.01)  4.01ms × (1.00,1.00)   +0.99% (p=0.000)
RegexpMatchEasy0_32     114ns × (1.00,1.00)   115ns × (0.99,1.00)     ~    (p=0.070)
RegexpMatchEasy0_1K     376ns × (1.00,1.00)   376ns × (1.00,1.00)     ~    (p=0.900)
RegexpMatchEasy1_32    94.9ns × (1.00,1.00)  96.3ns × (1.00,1.01)   +1.53% (p=0.001)
RegexpMatchEasy1_1K     568ns × (1.00,1.00)   567ns × (1.00,1.00)   -0.22% (p=0.001)
RegexpMatchMedium_32    159ns × (1.00,1.00)   159ns × (1.00,1.00)     ~    (p=0.178)
RegexpMatchMedium_1K   46.4µs × (1.00,1.00)  46.6µs × (1.00,1.00)   +0.29% (p=0.000)
RegexpMatchHard_32     2.37µs × (1.00,1.00)  2.37µs × (1.00,1.00)     ~    (p=0.722)
RegexpMatchHard_1K     71.1µs × (1.00,1.00)  71.2µs × (1.00,1.00)     ~    (p=0.229)
Revcomp                 565ms × (1.00,1.00)   562ms × (1.00,1.00)   -0.52% (p=0.000)
Template               81.0ms × (1.00,1.00)  80.2ms × (1.00,1.00)   -0.97% (p=0.000)
TimeParse               380ns × (1.00,1.00)   380ns × (1.00,1.00)     ~    (p=0.148)
TimeFormat              405ns × (0.99,1.00)   385ns × (0.99,1.00)   -5.00% (p=0.000)

Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4
Reviewed-on: https://go-review.googlesource.com/9696
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
parent 3be3cbd5
@@ -238,13 +238,6 @@ const (
 // GOMAXPROCS. The high-level design of this algorithm is documented
 // at http://golang.org/s/go15gcpacing.
 var gcController = gcControllerState{
-	// Initial work ratio guess.
-	//
-	// TODO(austin): This is based on the work ratio of the
-	// compiler on ./all.bash. Run a wider variety of programs and
-	// see what their work ratios are.
-	workRatioAvg: 0.5 / float64(ptrSize),
-
 	// Initial trigger ratio guess.
 	triggerRatio: 7 / 8.0,
 }
@@ -254,6 +247,10 @@ type gcControllerState struct {
 	// is updated atomically during the cycle. Updates may be
 	// batched arbitrarily, since the value is only read at the
 	// end of the cycle.
+	//
+	// Currently this is the bytes of heap scanned. For most uses,
+	// this is an opaque unit of work, but for estimation the
+	// definition is important.
 	scanWork int64
 
 	// bgScanCredit is the scan work credit accumulated by the
@@ -299,10 +296,6 @@ type gcControllerState struct {
 	// dedicated mark workers get started.
 	dedicatedMarkWorkersNeeded int64
 
-	// workRatioAvg is a moving average of the scan work ratio
-	// (scan work per byte marked).
-	workRatioAvg float64
-
 	// assistRatio is the ratio of allocated bytes to scan work
 	// that should be performed by mutator assists. This is
 	// computed at the beginning of each cycle.
@@ -399,21 +392,16 @@ func (c *gcControllerState) startCycle() {
 // improved estimates. This should be called periodically during
 // concurrent mark.
 func (c *gcControllerState) revise() {
-	// Estimate the size of the marked heap. We don't have much to
-	// go on, so at the beginning of the cycle this uses the
-	// marked heap size from last cycle. If the reachable heap has
-	// grown since last cycle, we'll eventually mark more than
-	// this and we can revise our estimate. This way, if we
-	// overshoot our initial estimate, the assist ratio will climb
-	// smoothly and put more pressure on mutator assists to finish
-	// the cycle.
-	heapMarkedEstimate := memstats.heap_marked
-	if heapMarkedEstimate < work.bytesMarked {
-		heapMarkedEstimate = work.bytesMarked
-	}
-
-	// Compute the expected work based on this estimate.
-	scanWorkExpected := uint64(float64(heapMarkedEstimate) * c.workRatioAvg)
+	// Compute the expected scan work. This is a strict upper
+	// bound on the possible scan work in the current heap.
+	//
+	// You might consider dividing this by 2 (or by
+	// (100+GOGC)/100) to counter this over-estimation, but
+	// benchmarks show that this has almost no effect on mean
+	// mutator utilization, heap size, or assist time and it
+	// introduces the danger of under-estimating and letting the
+	// mutator outpace the garbage collector.
+	scanWorkExpected := memstats.heap_scan
 
 	// Compute the mutator assist ratio so by the time the mutator
 	// allocates the remaining heap bytes up to next_gc, it will
@@ -443,9 +431,6 @@ func (c *gcControllerState) endCycle() {
 	// transient changes. Values near 1 may be unstable.
 	const triggerGain = 0.5
 
-	// EWMA weight given to this cycle's scan work ratio.
-	const workRatioWeight = 0.75
-
 	// Stop the revise timer
 	deltimer(&c.reviseTimer)
 
@@ -484,12 +469,6 @@ func (c *gcControllerState) endCycle() {
 		c.triggerRatio = goalGrowthRatio * 0.95
 	}
 
-	// Compute the scan work ratio for this cycle.
-	workRatio := float64(c.scanWork) / float64(work.bytesMarked)
-
-	// Update EWMA of recent scan work ratios.
-	c.workRatioAvg = workRatioWeight*workRatio + (1-workRatioWeight)*c.workRatioAvg
-
 	if debug.gcpacertrace > 0 {
 		// Print controller state in terms of the design
 		// document.
@@ -502,14 +481,12 @@ func (c *gcControllerState) endCycle() {
 		u_a := utilization
 		u_g := gcGoalUtilization
 		W_a := c.scanWork
-		w_a := workRatio
-		w_ewma := c.workRatioAvg
 		print("pacer: H_m_prev=", H_m_prev,
 			" h_t=", h_t, " H_T=", H_T,
 			" h_a=", h_a, " H_a=", H_a,
 			" h_g=", h_g, " H_g=", H_g,
 			" u_a=", u_a, " u_g=", u_g,
-			" W_a=", W_a, " w_a=", w_a, " w_ewma=", w_ewma,
+			" W_a=", W_a,
 			" goalΔ=", goalGrowthRatio-h_t,
 			" actualΔ=", h_a-h_t,
 			" u_a/u_g=", u_a/u_g,