Commit 6383fb61 authored by Austin Clements's avatar Austin Clements

runtime: deduct correct sweep credit

deductSweepCredit expects the size in bytes of the span being
allocated, but mCentral_CacheSpan passes the size of a single object
in the span. As a result, we don't sweep enough on that call and when
mCentral_CacheSpan later calls reimburseSweepCredit, it's very likely
to underflow mheap_.spanBytesAlloc, which causes the next call to
deductSweepCredit to think it owes a huge number of pages and finish
off the whole sweep.

In addition to causing the occasional allocation that triggers the
full sweep to be potentially extremely expensive relative to other
allocations, this can indirectly slow down many other allocations.
deductSweepCredit uses sweepone to sweep spans, which returns
fully-unused spans to the heap, where these spans are freed and
coalesced with neighboring free spans. On the other hand, when
mCentral_CacheSpan sweeps a span, it does so with the intent to
immediately reuse that span and, as a result, will not return the span
to the heap even if it is fully unused. This saves on the cost of
locking the heap, finding a span, and initializing that span. For
example, before this change, with GOMAXPROCS=1 (or the background
sweeper disabled) BinaryTree17 returned roughly 220K spans to the heap
and allocated new spans from the heap roughly 232K times. After this
change, it returns 1.3K spans to the heap and allocates new spans from
the heap 39K times. (With background sweeping these numbers are
effectively unchanged because the background sweeper sweeps almost all
of the spans with sweepone; however, parallel sweeping saves more than
the cost of allocating spans from the heap.)

Fixes #13535.
Fixes #13589.

name                      old time/op    new time/op    delta
BinaryTree17-12              3.03s ± 1%     2.86s ± 4%  -5.61%  (p=0.000 n=18+20)
Fannkuch11-12                2.48s ± 1%     2.49s ± 1%    ~     (p=0.060 n=17+20)
FmtFprintfEmpty-12          50.7ns ± 1%    50.9ns ± 1%  +0.43%  (p=0.025 n=15+16)
FmtFprintfString-12          174ns ± 2%     174ns ± 2%    ~     (p=0.539 n=19+20)
FmtFprintfInt-12             158ns ± 1%     158ns ± 1%    ~     (p=0.300 n=18+20)
FmtFprintfIntInt-12          269ns ± 2%     269ns ± 2%    ~     (p=0.784 n=20+18)
FmtFprintfPrefixedInt-12     233ns ± 1%     234ns ± 1%    ~     (p=0.389 n=18+18)
FmtFprintfFloat-12           309ns ± 1%     310ns ± 1%  +0.25%  (p=0.048 n=18+18)
FmtManyArgs-12              1.10µs ± 1%    1.10µs ± 1%    ~     (p=0.259 n=18+19)
GobDecode-12                7.81ms ± 1%    7.72ms ± 1%  -1.17%  (p=0.000 n=19+19)
GobEncode-12                6.56ms ± 0%    6.55ms ± 1%    ~     (p=0.433 n=17+19)
Gzip-12                      318ms ± 2%     317ms ± 1%    ~     (p=0.578 n=19+18)
Gunzip-12                   42.1ms ± 2%    42.0ms ± 0%  -0.45%  (p=0.007 n=18+16)
HTTPClientServer-12         63.9µs ± 1%    64.0µs ± 1%    ~     (p=0.146 n=17+19)
JSONEncode-12               16.4ms ± 1%    16.4ms ± 1%    ~     (p=0.271 n=19+19)
JSONDecode-12               58.1ms ± 1%    58.0ms ± 1%    ~     (p=0.152 n=18+18)
Mandelbrot200-12            3.85ms ± 0%    3.85ms ± 0%    ~     (p=0.126 n=19+18)
GoParse-12                  3.71ms ± 1%    3.64ms ± 1%  -1.86%  (p=0.000 n=20+18)
RegexpMatchEasy0_32-12       100ns ± 2%     100ns ± 1%    ~     (p=0.588 n=20+20)
RegexpMatchEasy0_1K-12       346ns ± 1%     347ns ± 1%  +0.27%  (p=0.014 n=17+20)
RegexpMatchEasy1_32-12      82.9ns ± 3%    83.5ns ± 3%    ~     (p=0.096 n=19+20)
RegexpMatchEasy1_1K-12       506ns ± 1%     506ns ± 1%    ~     (p=0.530 n=19+19)
RegexpMatchMedium_32-12      129ns ± 2%     129ns ± 1%    ~     (p=0.566 n=20+19)
RegexpMatchMedium_1K-12     39.4µs ± 1%    39.4µs ± 1%    ~     (p=0.713 n=19+20)
RegexpMatchHard_32-12       2.05µs ± 1%    2.06µs ± 1%  +0.36%  (p=0.008 n=18+20)
RegexpMatchHard_1K-12       61.6µs ± 1%    61.7µs ± 1%    ~     (p=0.286 n=19+20)
Revcomp-12                   538ms ± 1%     541ms ± 2%    ~     (p=0.081 n=18+19)
Template-12                 71.5ms ± 2%    71.6ms ± 1%    ~     (p=0.513 n=20+19)
TimeParse-12                 357ns ± 1%     357ns ± 1%    ~     (p=0.935 n=19+18)
TimeFormat-12                352ns ± 1%     352ns ± 1%    ~     (p=0.293 n=19+20)
[Geo mean]                  62.0µs         61.9µs       -0.21%

name              old time/op  new time/op  delta
XBenchGarbage-12  5.83ms ± 2%  5.86ms ± 3%    ~     (p=0.247 n=19+20)

Change-Id: I790bb530adace27ccf25d372f24a11954b88443c
Reviewed-on: https://go-review.googlesource.com/17745
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: 's avatarRick Hudson <rlh@golang.org>
Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
parent d26a0952
......@@ -32,7 +32,8 @@ func (c *mcentral) init(sizeclass int32) {
// Allocate a span to use in an MCache.
func (c *mcentral) cacheSpan() *mspan {
// Deduct credit for this span allocation and sweep if necessary.
deductSweepCredit(uintptr(class_to_size[c.sizeclass]), 0)
spanBytes := uintptr(class_to_allocnpages[c.sizeclass]) * _PageSize
deductSweepCredit(spanBytes, 0)
lock(&c.lock)
sg := mheap_.sweepgen
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment