    cmd/internal/obj/arm: improve static branch prediction for wrapper prologue · a143f5d6
    Philip Hofer authored
    This is a follow-up to CL 36893.
    
    Move the unlikely branch in the wrapper prologue to the end
    of the function, where it has minimal impact on the instruction
    cache. Static branch prediction also tends to assume that a
    forward branch is not taken, which matches this rarely taken path.
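
As a rough source-level analogy of the layout described above, the sketch below models a prologue check whose common case falls straight through while the rare case is handled out of line. It assumes the wrapper prologue check is the toolchain's usual "is a panic in progress, and does its argument pointer refer to this frame" test; the types `goroutine` and `panicRecord` and the functions `wrapperPrologue` and `adjustArgp` are hypothetical stand-ins, not the runtime's or the assembler's actual definitions. The real change is made in the instructions emitted for ARM, not in Go source, and the Go compiler makes no promise about how it lays out this example.

```go
package main

import "fmt"

// panicRecord and goroutine are hypothetical stand-ins for the runtime's
// per-panic and per-goroutine records; the real types have more fields.
type panicRecord struct {
	argp uintptr // argument pointer recorded for the running deferred call
}

type goroutine struct {
	panicking *panicRecord // nil unless a panic is in progress
}

// wrapperPrologue models the check performed on entry to a wrapper.
// The hot path is a single nil test that falls straight through.
func wrapperPrologue(g *goroutine, fp, bottomOfFrame uintptr) {
	if g.panicking != nil {
		// Unlikely path, kept out of line. In the emitted code this
		// plays the role of the forward branch to the end of the function.
		adjustArgp(g.panicking, fp, bottomOfFrame)
	}
}

// adjustArgp is the cold path: it rewrites argp only when it refers to
// the wrapper's own frame.
//
//go:noinline
func adjustArgp(p *panicRecord, fp, bottomOfFrame uintptr) {
	if p.argp == fp {
		p.argp = bottomOfFrame
	}
}

func main() {
	// Common case: no panic in progress, nothing to adjust.
	g := &goroutine{}
	wrapperPrologue(g, 100, 200)

	// Rare case: a panic is in progress and argp matches this frame.
	p := &panicRecord{argp: 100}
	g.panicking = p
	wrapperPrologue(g, 100, 200)
	fmt.Println("argp after adjustment:", p.argp)
}
```

Keeping the cold path after the body means the instructions executed on nearly every call stay contiguous, which is the instruction cache benefit the message refers to.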
    
    Updates #19042
    
    sort benchmarks:
    name                  old time/op  new time/op  delta
    SearchWrappers-4      1.44µs ± 0%  1.45µs ± 0%  +1.15%  (p=0.000 n=9+10)
    SortString1K-4        1.02ms ± 0%  1.04ms ± 0%  +2.39%  (p=0.000 n=10+10)
    SortString1K_Slice-4   960µs ± 0%   989µs ± 0%  +2.95%  (p=0.000 n=9+10)
    StableString1K-4       218µs ± 0%   213µs ± 0%  -2.13%  (p=0.000 n=10+10)
    SortInt1K-4            541µs ± 0%   543µs ± 0%  +0.30%  (p=0.003 n=9+10)
    StableInt1K-4          760µs ± 1%   763µs ± 1%  +0.38%  (p=0.011 n=10+10)
    StableInt1K_Slice-4    840µs ± 1%   779µs ± 0%  -7.31%  (p=0.000 n=9+10)
    SortInt64K-4          55.2ms ± 0%  55.4ms ± 1%  +0.34%  (p=0.012 n=10+8)
    SortInt64K_Slice-4    56.2ms ± 0%  55.6ms ± 1%  -1.16%  (p=0.000 n=10+10)
    StableInt64K-4        70.9ms ± 1%  71.0ms ± 0%    ~     (p=0.315 n=10+7)
    Sort1e2-4              250µs ± 0%   249µs ± 1%    ~     (p=0.315 n=9+10)
    Stable1e2-4            600µs ± 0%   594µs ± 0%  -1.09%  (p=0.000 n=9+10)
    Sort1e4-4             51.2ms ± 0%  51.4ms ± 1%  +0.40%  (p=0.001 n=9+10)
    Stable1e4-4            204ms ± 1%   199ms ± 1%  -2.27%  (p=0.000 n=10+10)
    Sort1e6-4              8.42s ± 0%   8.44s ± 0%  +0.28%  (p=0.000 n=8+9)
    Stable1e6-4            43.3s ± 0%   42.5s ± 1%  -1.89%  (p=0.000 n=9+9)
    
    Change-Id: I827559aa557fdba211a38ce3f77137b471c5c67e
    Reviewed-on: https://go-review.googlesource.com/37611
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
obj5.go 20.7 KB