    cmd/compile/internal/ssa: optimize arm64 with FNMULS/FNMULD · ebb77aa8
    Ben Shi authored
    FNMULS and FNMULD are efficient arm64 instructions that can be used
    to improve FP performance. This CL uses them to optimize pairs of neg-mul
    operations.
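
As a rough illustration of the kind of source code this targets, the sketch below negates the result of a floating-point multiply in both single and double precision. The function names are hypothetical and not taken from the CL, and whether a given expression is actually lowered to a single FNMULS/FNMULD depends on the compiler version and its rewrite rules, so treat this as an assumption to check against the generated assembly (for example with `GOARCH=arm64 go build -gcflags=-S`).

```go
package main

import "fmt"

// negMul32 negates a single-precision product. On arm64 this neg-mul pair
// is the sort of pattern that could be fused into one FNMULS instruction.
// (Hypothetical helper, for illustration only.)
func negMul32(x, y float32) float32 {
	return -(x * y)
}

// negMul64 is the double-precision counterpart, a candidate for FNMULD.
// (Hypothetical helper, for illustration only.)
func negMul64(x, y float64) float64 {
	return -(x * y)
}

func main() {
	fmt.Println(negMul32(1.5, 2.0)) // prints -3
	fmt.Println(negMul64(1.5, 2.0)) // prints -3
}
```

The result is the same with or without the optimization; only the number of instructions emitted for the expression changes.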
    
    Here are benchmark test results on Raspberry Pi 3 with ArchLinux.
    
    1. A special test case gets about 15% improvement.
    (https://github.com/benshi001/ugo1/blob/master/fpmul_test.go)
    FPMul-4                     485µs ± 0%     410µs ± 0%  -15.49%  (p=0.000 n=26+23)
    
    2. There is little regression in the go1 benchmark (excluding noise).
    name                     old time/op    new time/op    delta
    BinaryTree17-4              42.0s ± 3%     42.1s ± 2%    ~     (p=0.542 n=39+40)
    Fannkuch11-4                33.3s ± 3%     32.9s ± 1%    ~     (p=0.200 n=40+32)
    FmtFprintfEmpty-4           534ns ± 0%     534ns ± 0%    ~     (all equal)
    FmtFprintfString-4         1.09µs ± 1%    1.09µs ± 0%    ~     (p=0.950 n=32+32)
    FmtFprintfInt-4            1.14µs ± 0%    1.14µs ± 1%    ~     (p=0.571 n=32+31)
    FmtFprintfIntInt-4         1.79µs ± 3%    1.76µs ± 0%  -1.42%  (p=0.004 n=40+34)
    FmtFprintfPrefixedInt-4    2.17µs ± 0%    2.17µs ± 0%    ~     (p=0.073 n=31+34)
    FmtFprintfFloat-4          3.33µs ± 3%    3.28µs ± 0%  -1.46%  (p=0.001 n=40+34)
    FmtManyArgs-4              7.28µs ± 6%    7.19µs ± 0%    ~     (p=0.641 n=40+33)
    GobDecode-4                96.5ms ± 4%    96.5ms ± 9%    ~     (p=0.214 n=40+40)
    GobEncode-4                79.5ms ± 0%    80.7ms ± 4%  +1.51%  (p=0.000 n=34+40)
    Gzip-4                      4.53s ± 4%     4.56s ± 4%  +0.60%  (p=0.000 n=40+40)
    Gunzip-4                    451ms ± 3%     442ms ± 0%  -1.93%  (p=0.000 n=40+32)
    HTTPClientServer-4          530µs ± 1%     535µs ± 1%  +0.88%  (p=0.000 n=39+39)
    JSONEncode-4                214ms ± 4%     211ms ± 0%    ~     (p=0.059 n=40+31)
    JSONDecode-4                865ms ± 5%     864ms ± 4%  -0.06%  (p=0.003 n=40+40)
    Mandelbrot200-4            52.0ms ± 3%    52.1ms ± 3%    ~     (p=0.556 n=40+40)
    GoParse-4                  43.1ms ± 8%    42.1ms ± 0%    ~     (p=0.083 n=40+33)
    RegexpMatchEasy0_32-4      1.02µs ± 3%    1.02µs ± 4%  +0.06%  (p=0.020 n=40+40)
    RegexpMatchEasy0_1K-4      3.90µs ± 0%    3.96µs ± 3%  +1.58%  (p=0.000 n=31+40)
    RegexpMatchEasy1_32-4       967ns ± 4%     981ns ± 3%  +1.40%  (p=0.000 n=40+40)
    RegexpMatchEasy1_1K-4      6.41µs ± 4%    6.43µs ± 3%    ~     (p=0.386 n=40+40)
    RegexpMatchMedium_32-4     1.76µs ± 3%    1.78µs ± 3%  +1.08%  (p=0.000 n=40+40)
    RegexpMatchMedium_1K-4      561µs ± 0%     562µs ± 0%  +0.09%  (p=0.003 n=34+31)
    RegexpMatchHard_32-4       31.5µs ± 2%    31.1µs ± 4%  -1.17%  (p=0.000 n=30+40)
    RegexpMatchHard_1K-4        960µs ± 3%     950µs ± 4%  -1.02%  (p=0.016 n=40+40)
    Revcomp-4                   7.79s ± 7%     7.79s ± 4%    ~     (p=0.859 n=40+40)
    Template-4                  889ms ± 6%     872ms ± 3%  -1.86%  (p=0.025 n=40+31)
    TimeParse-4                4.80µs ± 0%    4.89µs ± 3%  +1.71%  (p=0.001 n=31+40)
    TimeFormat-4               4.70µs ± 1%    4.78µs ± 3%  +1.57%  (p=0.000 n=33+40)
    [Geo mean]                  710µs          709µs       -0.13%
    
    name                     old speed      new speed      delta
    GobDecode-4              7.96MB/s ± 4%  7.96MB/s ± 9%    ~     (p=0.174 n=40+40)
    GobEncode-4              9.65MB/s ± 0%  9.51MB/s ± 4%  -1.45%  (p=0.000 n=34+40)
    Gzip-4                   4.29MB/s ± 4%  4.26MB/s ± 4%  -0.59%  (p=0.000 n=40+40)
    Gunzip-4                 43.0MB/s ± 3%  43.9MB/s ± 0%  +1.90%  (p=0.000 n=40+32)
    JSONEncode-4             9.09MB/s ± 4%  9.22MB/s ± 0%    ~     (p=0.429 n=40+31)
    JSONDecode-4             2.25MB/s ± 5%  2.25MB/s ± 4%    ~     (p=0.278 n=40+40)
    GoParse-4                1.35MB/s ± 7%  1.37MB/s ± 0%    ~     (p=0.071 n=40+25)
    RegexpMatchEasy0_32-4    31.5MB/s ± 3%  31.5MB/s ± 4%  -0.08%  (p=0.018 n=40+40)
    RegexpMatchEasy0_1K-4     263MB/s ± 0%   259MB/s ± 3%  -1.51%  (p=0.000 n=31+40)
    RegexpMatchEasy1_32-4    33.1MB/s ± 4%  32.6MB/s ± 3%  -1.38%  (p=0.000 n=40+40)
    RegexpMatchEasy1_1K-4     160MB/s ± 4%   159MB/s ± 3%    ~     (p=0.364 n=40+40)
    RegexpMatchMedium_32-4    565kB/s ± 3%   562kB/s ± 2%    ~     (p=0.208 n=40+40)
    RegexpMatchMedium_1K-4   1.82MB/s ± 0%  1.82MB/s ± 0%  -0.27%  (p=0.000 n=34+31)
    RegexpMatchHard_32-4     1.02MB/s ± 3%  1.03MB/s ± 4%  +1.04%  (p=0.000 n=32+40)
    RegexpMatchHard_1K-4     1.07MB/s ± 4%  1.08MB/s ± 4%  +0.94%  (p=0.003 n=40+40)
    Revcomp-4                32.6MB/s ± 7%  32.6MB/s ± 4%    ~     (p=0.965 n=40+40)
    Template-4               2.18MB/s ± 6%  2.22MB/s ± 3%  +1.83%  (p=0.020 n=40+31)
    [Geo mean]               7.77MB/s       7.78MB/s       +0.16%
    
    3. There is little change in the compilecmp benchmark (excluding noise).
    name        old time/op       new time/op       delta
    Template          2.37s ± 3%        2.35s ± 4%    ~     (p=0.529 n=10+10)
    Unicode           1.38s ± 8%        1.36s ± 5%    ~     (p=0.247 n=10+10)
    GoTypes           8.10s ± 2%        8.10s ± 2%    ~     (p=0.971 n=10+10)
    Compiler          40.5s ± 4%        40.8s ± 1%    ~     (p=0.529 n=10+10)
    SSA                115s ± 2%         115s ± 3%    ~     (p=0.684 n=10+10)
    Flate             1.45s ± 5%        1.46s ± 3%    ~     (p=0.796 n=10+10)
    GoParser          1.86s ± 4%        1.84s ± 2%    ~     (p=0.095 n=9+10)
    Reflect           5.11s ± 2%        5.13s ± 2%    ~     (p=0.315 n=10+10)
    Tar               2.22s ± 3%        2.23s ± 1%    ~     (p=0.299 n=9+7)
    XML               2.72s ± 3%        2.72s ± 3%    ~     (p=0.912 n=10+10)
    [Geo mean]        5.03s             5.02s       -0.21%
    
    name        old user-time/op  new user-time/op  delta
    Template          2.92s ± 2%        2.89s ± 1%    ~     (p=0.247 n=10+10)
    Unicode           1.71s ± 5%        1.69s ± 4%    ~     (p=0.393 n=10+10)
    GoTypes           9.78s ± 2%        9.76s ± 2%    ~     (p=0.631 n=10+10)
    Compiler          49.1s ± 2%        49.1s ± 1%    ~     (p=0.796 n=10+10)
    SSA                144s ± 1%         144s ± 2%    ~     (p=0.796 n=10+10)
    Flate             1.74s ± 2%        1.73s ± 3%    ~     (p=0.842 n=10+9)
    GoParser          2.23s ± 3%        2.25s ± 2%    ~     (p=0.143 n=10+10)
    Reflect           5.93s ± 3%        5.98s ± 2%    ~     (p=0.211 n=10+9)
    Tar               2.65s ± 2%        2.69s ± 3%  +1.51%  (p=0.010 n=9+10)
    XML               3.25s ± 2%        3.21s ± 1%  -1.24%  (p=0.035 n=10+9)
    [Geo mean]        6.07s             6.07s       -0.08%
    
    name        old text-bytes    new text-bytes    delta
    HelloSize         641kB ± 0%        641kB ± 0%    ~     (all equal)
    
    name        old data-bytes    new data-bytes    delta
    HelloSize        9.46kB ± 0%       9.46kB ± 0%    ~     (all equal)
    
    name        old bss-bytes     new bss-bytes     delta
    HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
    
    name        old exe-bytes     new exe-bytes     delta
    HelloSize        1.24MB ± 0%       1.24MB ± 0%    ~     (all equal)
    
    Change-Id: Id095d998c380eef929755124084df02446a6b7c1
    Reviewed-on: https://go-review.googlesource.com/92555
    Run-TryBot: Cherry Zhang <cherryyz@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Cherry Zhang <cherryyz@google.com>
Name
.github
api
doc
lib/time
misc
src
test
.gitattributes
.gitignore
AUTHORS
CONTRIBUTING.md
CONTRIBUTORS
LICENSE
PATENTS
README.md
favicon.ico
robots.txt