    cmd/compile/internal/ssa: optimize arm64 with FNMULS/FNMULD · ebb77aa8
    Ben Shi authored
    FNMULS and FNMULD are efficient arm64 instructions that can be used
    to improve FP performance. This CL uses them to optimize pairs of neg-mul
    operations.
    
    Here are benchmark test results on Raspberry Pi 3 with ArchLinux.
    
    1. A special test case gets about 15% improvement.
    (https://github.com/benshi001/ugo1/blob/master/fpmul_test.go)
    FPMul-4                     485µs ± 0%     410µs ± 0%  -15.49%  (p=0.000 n=26+23)
    
    2. There is little regression in the go1 benchmark (excluding noise).
    name                     old time/op    new time/op    delta
    BinaryTree17-4              42.0s ± 3%     42.1s ± 2%    ~     (p=0.542 n=39+40)
    Fannkuch11-4                33.3s ± 3%     32.9s ± 1%    ~     (p=0.200 n=40+32)
    FmtFprintfEmpty-4           534ns ± 0%     534ns ± 0%    ~     (all equal)
    FmtFprintfString-4         1.09µs ± 1%    1.09µs ± 0%    ~     (p=0.950 n=32+32)
    FmtFprintfInt-4            1.14µs ± 0%    1.14µs ± 1%    ~     (p=0.571 n=32+31)
    FmtFprintfIntInt-4         1.79µs ± 3%    1.76µs ± 0%  -1.42%  (p=0.004 n=40+34)
    FmtFprintfPrefixedInt-4    2.17µs ± 0%    2.17µs ± 0%    ~     (p=0.073 n=31+34)
    FmtFprintfFloat-4          3.33µs ± 3%    3.28µs ± 0%  -1.46%  (p=0.001 n=40+34)
    FmtManyArgs-4              7.28µs ± 6%    7.19µs ± 0%    ~     (p=0.641 n=40+33)
    GobDecode-4                96.5ms ± 4%    96.5ms ± 9%    ~     (p=0.214 n=40+40)
    GobEncode-4                79.5ms ± 0%    80.7ms ± 4%  +1.51%  (p=0.000 n=34+40)
    Gzip-4                      4.53s ± 4%     4.56s ± 4%  +0.60%  (p=0.000 n=40+40)
    Gunzip-4                    451ms ± 3%     442ms ± 0%  -1.93%  (p=0.000 n=40+32)
    HTTPClientServer-4          530µs ± 1%     535µs ± 1%  +0.88%  (p=0.000 n=39+39)
    JSONEncode-4                214ms ± 4%     211ms ± 0%    ~     (p=0.059 n=40+31)
    JSONDecode-4                865ms ± 5%     864ms ± 4%  -0.06%  (p=0.003 n=40+40)
    Mandelbrot200-4            52.0ms ± 3%    52.1ms ± 3%    ~     (p=0.556 n=40+40)
    GoParse-4                  43.1ms ± 8%    42.1ms ± 0%    ~     (p=0.083 n=40+33)
    RegexpMatchEasy0_32-4      1.02µs ± 3%    1.02µs ± 4%  +0.06%  (p=0.020 n=40+40)
    RegexpMatchEasy0_1K-4      3.90µs ± 0%    3.96µs ± 3%  +1.58%  (p=0.000 n=31+40)
    RegexpMatchEasy1_32-4       967ns ± 4%     981ns ± 3%  +1.40%  (p=0.000 n=40+40)
    RegexpMatchEasy1_1K-4      6.41µs ± 4%    6.43µs ± 3%    ~     (p=0.386 n=40+40)
    RegexpMatchMedium_32-4     1.76µs ± 3%    1.78µs ± 3%  +1.08%  (p=0.000 n=40+40)
    RegexpMatchMedium_1K-4      561µs ± 0%     562µs ± 0%  +0.09%  (p=0.003 n=34+31)
    RegexpMatchHard_32-4       31.5µs ± 2%    31.1µs ± 4%  -1.17%  (p=0.000 n=30+40)
    RegexpMatchHard_1K-4        960µs ± 3%     950µs ± 4%  -1.02%  (p=0.016 n=40+40)
    Revcomp-4                   7.79s ± 7%     7.79s ± 4%    ~     (p=0.859 n=40+40)
    Template-4                  889ms ± 6%     872ms ± 3%  -1.86%  (p=0.025 n=40+31)
    TimeParse-4                4.80µs ± 0%    4.89µs ± 3%  +1.71%  (p=0.001 n=31+40)
    TimeFormat-4               4.70µs ± 1%    4.78µs ± 3%  +1.57%  (p=0.000 n=33+40)
    [Geo mean]                  710µs          709µs       -0.13%
    
    name                     old speed      new speed      delta
    GobDecode-4              7.96MB/s ± 4%  7.96MB/s ± 9%    ~     (p=0.174 n=40+40)
    GobEncode-4              9.65MB/s ± 0%  9.51MB/s ± 4%  -1.45%  (p=0.000 n=34+40)
    Gzip-4                   4.29MB/s ± 4%  4.26MB/s ± 4%  -0.59%  (p=0.000 n=40+40)
    Gunzip-4                 43.0MB/s ± 3%  43.9MB/s ± 0%  +1.90%  (p=0.000 n=40+32)
    JSONEncode-4             9.09MB/s ± 4%  9.22MB/s ± 0%    ~     (p=0.429 n=40+31)
    JSONDecode-4             2.25MB/s ± 5%  2.25MB/s ± 4%    ~     (p=0.278 n=40+40)
    GoParse-4                1.35MB/s ± 7%  1.37MB/s ± 0%    ~     (p=0.071 n=40+25)
    RegexpMatchEasy0_32-4    31.5MB/s ± 3%  31.5MB/s ± 4%  -0.08%  (p=0.018 n=40+40)
    RegexpMatchEasy0_1K-4     263MB/s ± 0%   259MB/s ± 3%  -1.51%  (p=0.000 n=31+40)
    RegexpMatchEasy1_32-4    33.1MB/s ± 4%  32.6MB/s ± 3%  -1.38%  (p=0.000 n=40+40)
    RegexpMatchEasy1_1K-4     160MB/s ± 4%   159MB/s ± 3%    ~     (p=0.364 n=40+40)
    RegexpMatchMedium_32-4    565kB/s ± 3%   562kB/s ± 2%    ~     (p=0.208 n=40+40)
    RegexpMatchMedium_1K-4   1.82MB/s ± 0%  1.82MB/s ± 0%  -0.27%  (p=0.000 n=34+31)
    RegexpMatchHard_32-4     1.02MB/s ± 3%  1.03MB/s ± 4%  +1.04%  (p=0.000 n=32+40)
    RegexpMatchHard_1K-4     1.07MB/s ± 4%  1.08MB/s ± 4%  +0.94%  (p=0.003 n=40+40)
    Revcomp-4                32.6MB/s ± 7%  32.6MB/s ± 4%    ~     (p=0.965 n=40+40)
    Template-4               2.18MB/s ± 6%  2.22MB/s ± 3%  +1.83%  (p=0.020 n=40+31)
    [Geo mean]               7.77MB/s       7.78MB/s       +0.16%
    
    3. There is little change in the compilecmp benchmark (excluding noise).
    name        old time/op       new time/op       delta
    Template          2.37s ± 3%        2.35s ± 4%    ~     (p=0.529 n=10+10)
    Unicode           1.38s ± 8%        1.36s ± 5%    ~     (p=0.247 n=10+10)
    GoTypes           8.10s ± 2%        8.10s ± 2%    ~     (p=0.971 n=10+10)
    Compiler          40.5s ± 4%        40.8s ± 1%    ~     (p=0.529 n=10+10)
    SSA                115s ± 2%         115s ± 3%    ~     (p=0.684 n=10+10)
    Flate             1.45s ± 5%        1.46s ± 3%    ~     (p=0.796 n=10+10)
    GoParser          1.86s ± 4%        1.84s ± 2%    ~     (p=0.095 n=9+10)
    Reflect           5.11s ± 2%        5.13s ± 2%    ~     (p=0.315 n=10+10)
    Tar               2.22s ± 3%        2.23s ± 1%    ~     (p=0.299 n=9+7)
    XML               2.72s ± 3%        2.72s ± 3%    ~     (p=0.912 n=10+10)
    [Geo mean]        5.03s             5.02s       -0.21%
    
    name        old user-time/op  new user-time/op  delta
    Template          2.92s ± 2%        2.89s ± 1%    ~     (p=0.247 n=10+10)
    Unicode           1.71s ± 5%        1.69s ± 4%    ~     (p=0.393 n=10+10)
    GoTypes           9.78s ± 2%        9.76s ± 2%    ~     (p=0.631 n=10+10)
    Compiler          49.1s ± 2%        49.1s ± 1%    ~     (p=0.796 n=10+10)
    SSA                144s ± 1%         144s ± 2%    ~     (p=0.796 n=10+10)
    Flate             1.74s ± 2%        1.73s ± 3%    ~     (p=0.842 n=10+9)
    GoParser          2.23s ± 3%        2.25s ± 2%    ~     (p=0.143 n=10+10)
    Reflect           5.93s ± 3%        5.98s ± 2%    ~     (p=0.211 n=10+9)
    Tar               2.65s ± 2%        2.69s ± 3%  +1.51%  (p=0.010 n=9+10)
    XML               3.25s ± 2%        3.21s ± 1%  -1.24%  (p=0.035 n=10+9)
    [Geo mean]        6.07s             6.07s       -0.08%
    
    name        old text-bytes    new text-bytes    delta
    HelloSize         641kB ± 0%        641kB ± 0%    ~     (all equal)
    
    name        old data-bytes    new data-bytes    delta
    HelloSize        9.46kB ± 0%       9.46kB ± 0%    ~     (all equal)
    
    name        old bss-bytes     new bss-bytes     delta
    HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
    
    name        old exe-bytes     new exe-bytes     delta
    HelloSize        1.24MB ± 0%       1.24MB ± 0%    ~     (all equal)
    
    Change-Id: Id095d998c380eef929755124084df02446a6b7c1
    Reviewed-on: https://go-review.googlesource.com/92555
    Run-TryBot: Cherry Zhang <cherryyz@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Cherry Zhang <cherryyz@google.com>
Name
archive
bufio
builtin
bytes
cmd
compress
container
context
crypto
database/sql
debug
encoding
errors
expvar
flag
fmt
go
hash
html
image
index/suffixarray
internal
io
log
math
mime
net
os
path
plugin
reflect
regexp
runtime
sort
strconv
strings
sync
syscall
testing
text
time
unicode
unsafe
vendor/golang_org/x
Make.dist
all.bash
all.bat
all.rc
androidtest.bash
bootstrap.bash
buildall.bash
clean.bash
clean.bat
clean.rc
cmp.bash
iostest.bash
make.bash
make.bat
make.rc
naclmake.bash
nacltest.bash
race.bash
race.bat
run.bash
run.bat
run.rc