• Austin Clements's avatar
    cmd/compile: don't produce a past-the-end pointer in range loops · 837ed98d
    Austin Clements authored
    Currently, range loops over slices and arrays are compiled roughly
    like:
    
    for i, x := range s { b }
      ⇓
    for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b }
      ⇓
    i, _n, _p := 0, len(s), &s[0]
    goto cond
    body:
    { b }
    i, _p = i+1, _p + unsafe.Sizeof(s[0])
    cond:
    if i < _n { goto body } else { goto end }
    end:
    
    The problem with this lowering is that _p may temporarily point past
    the end of the allocation the moment before the loop terminates. Right
    now this isn't a problem because there's never a safe-point during
    this brief moment.
    
    We're about to introduce safe-points everywhere, so this bad pointer
    is going to be a problem. We could mark the increment as an unsafe
    block, but this inhibits reordering opportunities and could result in
    infrequent safe-points if the body is short.
    
    Instead, this CL fixes this by changing how we compile range loops to
    never produce this past-the-end pointer. It changes the lowering to
    roughly:
    
    i, _n, _p := 0, len(s), &s[0]
    if i < _n { goto body } else { goto end }
    top:
    _p += unsafe.Sizeof(s[0])
    body:
    { b }
    i++
    if i < _n { goto top } else { goto end }
    end:
    
    Notably, the increment is split into two parts: we increment the index
    before checking the condition, but increment the pointer only *after*
    the condition check has succeeded.
    
    The implementation builds on the OFORUNTIL construct that was
    introduced during the loop preemption experiments, since OFORUNTIL
    places the increment and condition after the loop body. To support the
    extra "late increment" step, we further define OFORUNTIL's "List"
    field to contain the late increment statements. This makes all of this
    a relatively small change.
    
    This depends on the improvements to the prove pass in CL 102603. With
    the current lowering, bounds-check elimination knows that i < _n in
    the body because the body block is dominated by the cond block. In the
    new lowering, deriving this fact requires detecting that i < _n on
    *both* paths into body and hence is true in body. CL 102603 made prove
    able to detect this.
    
    The code size effect of this is minimal. The cmd/go binary on
    linux/amd64 increases by 0.17%. Performance-wise, this actually
    appears to be a net win, though it's mostly noise:
    
    name                      old time/op    new time/op    delta
    BinaryTree17-12              2.80s ± 0%     2.61s ± 1%  -6.88%  (p=0.000 n=20+18)
    Fannkuch11-12                2.41s ± 0%     2.42s ± 0%  +0.05%  (p=0.005 n=20+20)
    FmtFprintfEmpty-12          41.6ns ± 5%    41.4ns ± 6%    ~     (p=0.765 n=20+19)
    FmtFprintfString-12         69.4ns ± 3%    69.3ns ± 1%    ~     (p=0.084 n=19+17)
    FmtFprintfInt-12            76.1ns ± 1%    77.3ns ± 1%  +1.57%  (p=0.000 n=19+19)
    FmtFprintfIntInt-12          122ns ± 2%     123ns ± 3%  +0.95%  (p=0.015 n=20+20)
    FmtFprintfPrefixedInt-12     153ns ± 2%     151ns ± 3%  -1.27%  (p=0.013 n=20+20)
    FmtFprintfFloat-12           215ns ± 0%     216ns ± 0%  +0.47%  (p=0.000 n=20+16)
    FmtManyArgs-12               486ns ± 1%     498ns ± 0%  +2.40%  (p=0.000 n=20+17)
    GobDecode-12                6.43ms ± 0%    6.50ms ± 0%  +1.08%  (p=0.000 n=18+19)
    GobEncode-12                5.43ms ± 1%    5.47ms ± 0%  +0.76%  (p=0.000 n=20+20)
    Gzip-12                      218ms ± 1%     218ms ± 1%    ~     (p=0.883 n=20+20)
    Gunzip-12                   38.8ms ± 0%    38.9ms ± 0%    ~     (p=0.644 n=19+19)
    HTTPClientServer-12         76.2µs ± 1%    76.4µs ± 2%    ~     (p=0.218 n=20+20)
    JSONEncode-12               12.2ms ± 0%    12.3ms ± 1%  +0.45%  (p=0.000 n=19+19)
    JSONDecode-12               54.2ms ± 1%    53.3ms ± 0%  -1.67%  (p=0.000 n=20+20)
    Mandelbrot200-12            3.71ms ± 0%    3.71ms ± 0%    ~     (p=0.143 n=19+20)
    GoParse-12                  3.22ms ± 0%    3.19ms ± 1%  -0.72%  (p=0.000 n=20+20)
    RegexpMatchEasy0_32-12      76.7ns ± 1%    75.8ns ± 1%  -1.19%  (p=0.000 n=20+17)
    RegexpMatchEasy0_1K-12       245ns ± 1%     243ns ± 0%  -0.72%  (p=0.000 n=18+17)
    RegexpMatchEasy1_32-12      71.9ns ± 0%    71.7ns ± 1%  -0.39%  (p=0.006 n=12+18)
    RegexpMatchEasy1_1K-12       358ns ± 1%     354ns ± 1%  -1.13%  (p=0.000 n=20+19)
    RegexpMatchMedium_32-12      105ns ± 2%     105ns ± 1%  -0.63%  (p=0.007 n=19+20)
    RegexpMatchMedium_1K-12     31.9µs ± 1%    31.9µs ± 1%    ~     (p=1.000 n=17+17)
    RegexpMatchHard_32-12       1.51µs ± 1%    1.52µs ± 2%  +0.46%  (p=0.042 n=18+18)
    RegexpMatchHard_1K-12       45.3µs ± 1%    45.5µs ± 2%  +0.44%  (p=0.029 n=18+19)
    Revcomp-12                   388ms ± 1%     385ms ± 0%  -0.57%  (p=0.000 n=19+18)
    Template-12                 63.0ms ± 1%    63.3ms ± 0%  +0.50%  (p=0.000 n=19+20)
    TimeParse-12                 309ns ± 1%     307ns ± 0%  -0.62%  (p=0.000 n=20+20)
    TimeFormat-12                328ns ± 0%     333ns ± 0%  +1.35%  (p=0.000 n=19+19)
    [Geo mean]                  47.0µs         46.9µs       -0.20%
    
    (https://perf.golang.org/search?q=upload:20180326.1)
    
    For #10958.
    For #24543.
    
    Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a
    Reviewed-on: https://go-review.googlesource.com/102604
    Run-TryBot: Austin Clements <austin@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: 's avatarKeith Randall <khr@golang.org>
    Reviewed-by: 's avatarDavid Chase <drchase@google.com>
    837ed98d
prove.go 12 KB