    runtime: bound scanobject to ~100 µs · cf4f1d07
    Austin Clements authored
    Currently the time spent in scanobject is proportional to the size of
    the object being scanned. Since scanobject is non-preemptible, large
    objects can cause significant goroutine (and even whole application)
    delays through several means:
    
    1. If a GC assist picks up a large object, the allocating goroutine is
       blocked for the whole scan, even if that scan well exceeds that
       goroutine's debt.
    
    2. Since the scheduler does not run on the P performing a large object
       scan, goroutines in that P's run queue do not run unless they are
       stolen by another P (which can take some time). If there are a few
       large objects, all of the Ps may get tied up so the scheduler
       doesn't run anywhere.
    
    3. Even if a large object is scanned by a background worker and other
       Ps are still running the scheduler, the large object scan doesn't
       flush background credit until the whole scan is done. This can
       easily cause all allocations to block in assists, waiting for
       credit, causing an effective STW.
    
    Fix this by splitting large objects into 128 KB "oblets" and scanning
    at most one oblet at a time. Since we can scan 1–2 MB/ms, this equates
    to bounding scanobject at roughly 100 µs. This improves assist
    behavior both because assists can no longer get "unlucky" and be stuck
    scanning a large object, and because it causes the background worker
    to flush credit and unblock assists more frequently when scanning
    large objects. This also improves GC parallelism if the heap consists
    primarily of a small number of very large objects by letting multiple
    workers scan a large object in parallel.
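
The splitting arithmetic above can be sketched as follows. This is a minimal standalone illustration, not the runtime's actual code: the `span` type and the `splitOblets` and `scanBoundMicros` helpers are hypothetical names introduced here, and 1 MB is assumed to mean 2^20 bytes. Only the 128 KB oblet size and the 1–2 MB/ms scan rate come from the message itself.

```go
package main

import "fmt"

// maxObletBytes is the 128 KB oblet size named in the commit message.
const maxObletBytes = 128 << 10

// span is a hypothetical work item: the byte range [off, off+n) of one object.
type span struct {
	off, n uintptr
}

// splitOblets divides an object of the given size into consecutive spans of
// at most maxObletBytes each, so that each span can be scanned as a
// separate, bounded unit of work.
func splitOblets(size uintptr) []span {
	var out []span
	for off := uintptr(0); off < size; off += maxObletBytes {
		n := size - off
		if n > maxObletBytes {
			n = maxObletBytes
		}
		out = append(out, span{off, n})
	}
	return out
}

// scanBoundMicros estimates the time in microseconds to scan n bytes at a
// rate of mbPerMs MB/ms (assumption: 1 MB = 1<<20 bytes).
func scanBoundMicros(n uintptr, mbPerMs float64) float64 {
	bytesPerMicro := mbPerMs * (1 << 20) / 1000
	return float64(n) / bytesPerMicro
}

func main() {
	// An object one byte larger than 1 MB needs a short trailing oblet.
	oblets := splitOblets(1<<20 + 1)
	fmt.Println("oblets:", len(oblets), "last size:", oblets[len(oblets)-1].n)

	// Per-oblet scan time at the slow and fast ends of the quoted rate.
	fmt.Printf("bound: %.1f µs at 1 MB/ms, %.1f µs at 2 MB/ms\n",
		scanBoundMicros(maxObletBytes, 1), scanBoundMicros(maxObletBytes, 2))
}
```

Under these assumptions a single oblet takes somewhere between about 60 and 125 µs to scan, which is where the "roughly 100 µs" figure comes from. Because each span is an independent work item, different workers could also pick up different spans of the same object.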
    
    Fixes #10345. Fixes #16293.
    
    This substantially improves goroutine latency in the benchmark from
    issue #16293, which exercises several forms of very large objects:
    
    name                 old max-latency    new max-latency    delta
    SliceNoPointer-12           154µs ± 1%        155µs ±  2%     ~     (p=0.087 n=13+12)
    SlicePointer-12             314ms ± 1%       5.94ms ±138%  -98.11%  (p=0.000 n=19+20)
    SliceLivePointer-12        1148ms ± 0%       4.72ms ±167%  -99.59%  (p=0.000 n=19+20)
    MapNoPointer-12           72509µs ± 1%        408µs ±325%  -99.44%  (p=0.000 n=19+18)
    ChanPointer-12              313ms ± 0%       4.74ms ±140%  -98.49%  (p=0.000 n=18+20)
    ChanLivePointer-12         1147ms ± 0%       3.30ms ±149%  -99.71%  (p=0.000 n=19+20)
    
    name                 old P99.9-latency  new P99.9-latency  delta
    SliceNoPointer-12           113µs ±25%         107µs ±12%     ~     (p=0.153 n=20+18)
    SlicePointer-12          309450µs ± 0%         133µs ±23%  -99.96%  (p=0.000 n=20+20)
    SliceLivePointer-12         961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)
    MapNoPointer-12            448µs ±288%         119µs ±18%  -73.34%  (p=0.000 n=18+20)
    ChanPointer-12           309450µs ± 0%         134µs ±23%  -99.96%  (p=0.000 n=20+19)
    ChanLivePointer-12          961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)
    
    This has negligible effect on all metrics from the garbage, JSON, and
    HTTP x/benchmarks.
    
    It shows slight improvement on some of the go1 benchmarks,
    particularly Revcomp, which uses some multi-megabyte buffers:
    
    name                      old time/op    new time/op    delta
    BinaryTree17-12              2.46s ± 1%     2.47s ± 1%  +0.32%  (p=0.012 n=20+20)
    Fannkuch11-12                2.82s ± 0%     2.81s ± 0%  -0.61%  (p=0.000 n=17+20)
    FmtFprintfEmpty-12          50.8ns ± 5%    50.5ns ± 2%    ~     (p=0.197 n=17+19)
    FmtFprintfString-12          131ns ± 1%     132ns ± 0%  +0.57%  (p=0.000 n=20+16)
    FmtFprintfInt-12             117ns ± 0%     116ns ± 0%  -0.47%  (p=0.000 n=15+20)
    FmtFprintfIntInt-12          180ns ± 0%     179ns ± 1%  -0.78%  (p=0.000 n=16+20)
    FmtFprintfPrefixedInt-12     186ns ± 1%     185ns ± 1%  -0.55%  (p=0.000 n=19+20)
    FmtFprintfFloat-12           263ns ± 1%     271ns ± 0%  +2.84%  (p=0.000 n=18+20)
    FmtManyArgs-12               741ns ± 1%     742ns ± 1%    ~     (p=0.190 n=19+19)
    GobDecode-12                7.44ms ± 0%    7.35ms ± 1%  -1.21%  (p=0.000 n=20+20)
    GobEncode-12                6.22ms ± 1%    6.21ms ± 1%    ~     (p=0.336 n=20+19)
    Gzip-12                      220ms ± 1%     219ms ± 1%    ~     (p=0.130 n=19+19)
    Gunzip-12                   37.9ms ± 0%    37.9ms ± 1%    ~     (p=1.000 n=20+19)
    HTTPClientServer-12         82.5µs ± 3%    82.6µs ± 3%    ~     (p=0.776 n=20+19)
    JSONEncode-12               16.4ms ± 1%    16.5ms ± 2%  +0.49%  (p=0.003 n=18+19)
    JSONDecode-12               53.7ms ± 1%    54.1ms ± 1%  +0.71%  (p=0.000 n=19+18)
    Mandelbrot200-12            4.19ms ± 1%    4.20ms ± 1%    ~     (p=0.452 n=19+19)
    GoParse-12                  3.38ms ± 1%    3.37ms ± 1%    ~     (p=0.123 n=19+19)
    RegexpMatchEasy0_32-12      72.1ns ± 1%    71.8ns ± 1%    ~     (p=0.397 n=19+17)
    RegexpMatchEasy0_1K-12       242ns ± 0%     242ns ± 0%    ~     (p=0.168 n=17+20)
    RegexpMatchEasy1_32-12      72.1ns ± 1%    72.1ns ± 1%    ~     (p=0.538 n=18+19)
    RegexpMatchEasy1_1K-12       385ns ± 1%     384ns ± 1%    ~     (p=0.388 n=20+20)
    RegexpMatchMedium_32-12      112ns ± 1%     112ns ± 3%    ~     (p=0.539 n=20+20)
    RegexpMatchMedium_1K-12     34.4µs ± 2%    34.4µs ± 2%    ~     (p=0.628 n=18+18)
    RegexpMatchHard_32-12       1.80µs ± 1%    1.80µs ± 1%    ~     (p=0.522 n=18+19)
    RegexpMatchHard_1K-12       54.0µs ± 1%    54.1µs ± 1%    ~     (p=0.647 n=20+19)
    Revcomp-12                   387ms ± 1%     369ms ± 5%  -4.89%  (p=0.000 n=17+19)
    Template-12                 62.3ms ± 1%    62.0ms ± 0%  -0.48%  (p=0.002 n=20+17)
    TimeParse-12                 314ns ± 1%     314ns ± 0%    ~     (p=1.011 n=20+13)
    TimeFormat-12                358ns ± 0%     354ns ± 0%  -1.12%  (p=0.000 n=17+20)
    [Geo mean]                  53.5µs         53.3µs       -0.23%
    
    Change-Id: I2a0a179d1d6bf7875dd054b7693dd12d2a340132
    Reviewed-on: https://go-review.googlesource.com/23540
    Run-TryBot: Austin Clements <austin@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Rick Hudson <rlh@golang.org>