    cmd/compile: optimize ARM code with NMULF/NMULD · 2899c3e8
    Ben Shi authored
    NMULF and NMULD are efficient FP instructions, and the Go compiler can
    use them to generate better code.
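
As an illustration, NMULF and NMULD are the Go assembler names for the ARM VFP negated-multiply instructions, which compute -(x*y) in single and double precision. The sketch below shows the kind of Go source that is a natural candidate for them. The function names are hypothetical, and whether a given build actually emits NMULF/NMULD depends on the target and compiler version, so treat this as an assumption rather than a statement of the exact rewrite rules in this change.

```go
package main

import "fmt"

// negMul32 returns -(x*y) in single precision.
// Hypothetical example: a multiply followed by a negation, which an
// ARM backend may be able to lower to one NMULF instead of MULF+NEGF.
func negMul32(x, y float32) float32 {
	return -(x * y)
}

// negMul64 is the double-precision counterpart (candidate for NMULD).
func negMul64(x, y float64) float64 {
	return -(x * y)
}

func main() {
	fmt.Println(negMul32(1.5, 2), negMul64(-3, 0.5))
}
```

One way to check what the compiler generates is to build for ARM with `GOARCH=arm go build -gcflags=-S` and look for NMULF or NMULD in the assembly listing.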
    
    Benchmarks of this patch show no significant change in general, but a
    large improvement in special cases.
    
    1. A special test case improved by 12.6%.
    https://github.com/benshi001/ugo1/blob/master/fpmul_test.go
    name                     old time/op    new time/op    delta
    FPMul-4                     398µs ± 1%     348µs ± 1%  -12.64%  (p=0.000 n=40+40)
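
The linked fpmul_test.go is not reproduced here. As a rough idea of what a micro-benchmark for this special case might look like, the sketch below times a loop dominated by negated multiplications. The kernel `negMulAll`, the slice size, and the input values are all assumptions made for illustration and are not taken from the linked file.

```go
package main

import (
	"fmt"
	"testing"
)

// negMulAll stores -(a[i]*b[i]) into dst[i] for every index of dst.
// Hypothetical kernel; a and b must be at least as long as dst.
func negMulAll(dst, a, b []float32) {
	for i := range dst {
		dst[i] = -(a[i] * b[i])
	}
}

// benchmarkNegMul times repeated calls of the kernel on fixed inputs.
func benchmarkNegMul(b *testing.B) {
	const n = 1024 // arbitrary size chosen for this sketch
	x := make([]float32, n)
	y := make([]float32, n)
	dst := make([]float32, n)
	for i := range x {
		x[i] = float32(i) + 0.5
		y[i] = float32(n - i)
	}
	b.ResetTimer() // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		negMulAll(dst, x, y)
	}
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	fmt.Println(testing.Benchmark(benchmarkNegMul))
}
```

Running such a benchmark many times before and after the change and feeding both result sets to benchstat yields a table in the old/new/delta format shown above.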
    
    2. The compilecmp test showed little change.
    name        old time/op       new time/op       delta
    Template          2.30s ± 1%        2.31s ± 1%    ~     (p=0.754 n=17+19)
    Unicode           1.31s ± 3%        1.32s ± 5%    ~     (p=0.265 n=20+20)
    GoTypes           7.73s ± 2%        7.73s ± 1%    ~     (p=0.925 n=20+20)
    Compiler          37.0s ± 1%        37.3s ± 2%  +0.79%  (p=0.002 n=19+20)
    SSA               83.8s ± 4%        83.5s ± 2%    ~     (p=0.964 n=20+17)
    Flate             1.43s ± 2%        1.44s ± 1%    ~     (p=0.602 n=20+20)
    GoParser          1.82s ± 2%        1.81s ± 2%    ~     (p=0.141 n=19+20)
    Reflect           5.08s ± 2%        5.08s ± 3%    ~     (p=0.835 n=20+19)
    Tar               2.36s ± 1%        2.35s ± 1%    ~     (p=0.195 n=18+17)
    XML               2.57s ± 2%        2.56s ± 1%    ~     (p=0.283 n=20+17)
    [Geo mean]        4.74s             4.75s       +0.05%
    
    name        old user-time/op  new user-time/op  delta
    Template          2.75s ± 2%        2.75s ± 0%    ~     (p=0.620 n=20+15)
    Unicode           1.59s ± 4%        1.60s ± 4%    ~     (p=0.479 n=20+19)
    GoTypes           9.48s ± 1%        9.47s ± 1%    ~     (p=0.743 n=20+20)
    Compiler          45.7s ± 1%        45.7s ± 1%    ~     (p=0.482 n=19+20)
    SSA                109s ± 1%         109s ± 2%    ~     (p=0.800 n=18+20)
    Flate             1.67s ± 3%        1.67s ± 3%    ~     (p=0.598 n=19+18)
    GoParser          2.15s ± 4%        2.13s ± 3%    ~     (p=0.153 n=20+20)
    Reflect           5.95s ± 2%        5.95s ± 2%    ~     (p=0.961 n=19+20)
    Tar               2.93s ± 2%        2.92s ± 3%    ~     (p=0.242 n=20+19)
    XML               3.02s ± 3%        3.04s ± 3%    ~     (p=0.233 n=19+18)
    [Geo mean]        5.74s             5.74s       -0.04%
    
    name        old text-bytes    new text-bytes    delta
    HelloSize         588kB ± 0%        588kB ± 0%    ~     (all equal)
    
    name        old data-bytes    new data-bytes    delta
    HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)
    
    name        old bss-bytes     new bss-bytes     delta
    HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)
    
    name        old exe-bytes     new exe-bytes     delta
    HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)
    
    3. The go1 benchmark showed little change overall.
    name                     old time/op    new time/op    delta
    BinaryTree17-4              41.8s ± 1%     41.8s ± 1%    ~     (p=0.388 n=40+39)
    Fannkuch11-4                24.1s ± 1%     24.1s ± 1%    ~     (p=0.077 n=40+40)
    FmtFprintfEmpty-4           834ns ± 1%     831ns ± 1%  -0.31%  (p=0.002 n=40+37)
    FmtFprintfString-4         1.34µs ± 1%    1.34µs ± 0%    ~     (p=0.387 n=40+40)
    FmtFprintfInt-4            1.44µs ± 1%    1.44µs ± 1%    ~     (p=0.421 n=40+40)
    FmtFprintfIntInt-4         2.09µs ± 0%    2.09µs ± 1%    ~     (p=0.589 n=40+39)
    FmtFprintfPrefixedInt-4    2.32µs ± 1%    2.33µs ± 1%  +0.15%  (p=0.001 n=40+40)
    FmtFprintfFloat-4          4.51µs ± 0%    4.44µs ± 1%  -1.50%  (p=0.000 n=40+40)
    FmtManyArgs-4              7.94µs ± 0%    7.97µs ± 0%  +0.36%  (p=0.001 n=32+40)
    GobDecode-4                 104ms ± 1%     102ms ± 2%  -1.27%  (p=0.000 n=39+37)
    GobEncode-4                90.5ms ± 1%    90.9ms ± 2%  +0.40%  (p=0.006 n=37+40)
    Gzip-4                      4.10s ± 2%     4.08s ± 1%  -0.30%  (p=0.004 n=40+40)
    Gunzip-4                    603ms ± 0%     602ms ± 1%    ~     (p=0.303 n=37+40)
    HTTPClientServer-4          672µs ± 3%     658µs ± 2%  -2.08%  (p=0.000 n=39+37)
    JSONEncode-4                238ms ± 1%     239ms ± 0%  +0.26%  (p=0.001 n=40+25)
    JSONDecode-4                884ms ± 1%     885ms ± 1%  +0.16%  (p=0.012 n=40+40)
    Mandelbrot200-4            49.3ms ± 0%    49.3ms ± 0%    ~     (p=0.588 n=40+38)
    GoParse-4                  46.3ms ± 1%    46.4ms ± 2%    ~     (p=0.487 n=40+40)
    RegexpMatchEasy0_32-4      1.28µs ± 1%    1.28µs ± 0%  +0.12%  (p=0.003 n=40+40)
    RegexpMatchEasy0_1K-4      7.78µs ± 5%    7.78µs ± 4%    ~     (p=0.825 n=40+40)
    RegexpMatchEasy1_32-4      1.29µs ± 1%    1.29µs ± 0%    ~     (p=0.659 n=40+40)
    RegexpMatchEasy1_1K-4      10.3µs ± 3%    10.4µs ± 2%    ~     (p=0.266 n=40+40)
    RegexpMatchMedium_32-4     2.05µs ± 1%    2.05µs ± 0%  -0.18%  (p=0.002 n=40+28)
    RegexpMatchMedium_1K-4      533µs ± 1%     534µs ± 1%    ~     (p=0.397 n=37+40)
    RegexpMatchHard_32-4       28.9µs ± 1%    28.9µs ± 1%  -0.22%  (p=0.002 n=40+40)
    RegexpMatchHard_1K-4        868µs ± 1%     870µs ± 1%  +0.21%  (p=0.015 n=40+40)
    Revcomp-4                  67.3ms ± 1%    67.2ms ± 2%    ~     (p=0.262 n=38+39)
    Template-4                  1.07s ± 1%     1.07s ± 1%    ~     (p=0.276 n=40+40)
    TimeParse-4                7.16µs ± 1%    7.16µs ± 1%    ~     (p=0.610 n=39+40)
    TimeFormat-4               13.3µs ± 1%    13.3µs ± 1%    ~     (p=0.617 n=38+40)
    [Geo mean]                  720µs          719µs       -0.13%
    
    name                     old speed      new speed      delta
    GobDecode-4              7.39MB/s ± 1%  7.49MB/s ± 2%  +1.25%  (p=0.000 n=39+38)
    GobEncode-4              8.48MB/s ± 1%  8.45MB/s ± 2%  -0.40%  (p=0.005 n=37+40)
    Gzip-4                   4.74MB/s ± 2%  4.75MB/s ± 1%  +0.30%  (p=0.018 n=40+40)
    Gunzip-4                 32.2MB/s ± 0%  32.2MB/s ± 1%    ~     (p=0.272 n=36+40)
    JSONEncode-4             8.15MB/s ± 1%  8.13MB/s ± 0%  -0.26%  (p=0.003 n=40+25)
    JSONDecode-4             2.19MB/s ± 1%  2.19MB/s ± 1%    ~     (p=0.676 n=40+40)
    GoParse-4                1.25MB/s ± 2%  1.25MB/s ± 2%    ~     (p=0.823 n=40+40)
    RegexpMatchEasy0_32-4    25.1MB/s ± 1%  25.1MB/s ± 0%  -0.12%  (p=0.006 n=40+40)
    RegexpMatchEasy0_1K-4     132MB/s ± 5%   132MB/s ± 5%    ~     (p=0.821 n=40+40)
    RegexpMatchEasy1_32-4    24.7MB/s ± 1%  24.7MB/s ± 0%    ~     (p=0.630 n=40+40)
    RegexpMatchEasy1_1K-4    99.1MB/s ± 3%  98.8MB/s ± 2%    ~     (p=0.268 n=40+40)
    RegexpMatchMedium_32-4    487kB/s ± 2%   490kB/s ± 0%  +0.51%  (p=0.001 n=40+40)
    RegexpMatchMedium_1K-4   1.92MB/s ± 1%  1.92MB/s ± 1%    ~     (p=0.208 n=39+40)
    RegexpMatchHard_32-4     1.11MB/s ± 1%  1.11MB/s ± 0%  +0.36%  (p=0.000 n=40+33)
    RegexpMatchHard_1K-4     1.18MB/s ± 1%  1.18MB/s ± 1%    ~     (p=0.207 n=40+37)
    Revcomp-4                37.8MB/s ± 1%  37.8MB/s ± 2%    ~     (p=0.276 n=38+39)
    Template-4               1.82MB/s ± 1%  1.81MB/s ± 1%    ~     (p=0.122 n=38+40)
    [Geo mean]               6.81MB/s       6.81MB/s       +0.06%
    
    Fixes #19843
    
    Change-Id: Ief3a0c2b15f59d40c7b40f2784eeb71196685b59
    Reviewed-on: https://go-review.googlesource.com/61150
    Reviewed-by: Cherry Zhang <cherryyz@google.com>
    Run-TryBot: Cherry Zhang <cherryyz@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
Name
.github
api
doc
lib/time
misc
src
test
.gitattributes
.gitignore
AUTHORS
CONTRIBUTING.md
CONTRIBUTORS
LICENSE
PATENTS
README.md
favicon.ico
robots.txt