• Ben Shi's avatar
    cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHU · a2f22a68
    Ben Shi authored
    Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) used in current
    ARM backend, the indexed MOVB/MOVBU/MOVH/MOVHU can also be used to
    generate further optimized ARM code.
    
    My patch implements this optimization. Here are some contrast test
    results against the original go compiler.
    
    1. The total size of all .a files in pkg/ shrinks by 0.03%.
    
    2. The compilecmp benchmark shows a little decline.
    name        old time/op       new time/op       delta
    Template          2.35s ± 1%        2.37s ± 3%  +0.94%  (p=0.006 n=19+19)
    Unicode           1.33s ± 3%        1.33s ± 2%    ~     (p=0.158 n=20+18)
    GoTypes           7.86s ± 2%        7.84s ± 1%    ~     (p=0.284 n=19+18)
    Compiler          37.5s ± 1%        37.7s ± 2%    ~     (p=0.101 n=20+19)
    SSA               83.4s ± 2%        83.6s ± 2%    ~     (p=0.231 n=20+20)
    Flate             1.46s ± 2%        1.45s ± 1%    ~     (p=0.097 n=20+17)
    GoParser          1.86s ± 2%        1.86s ± 4%    ~     (p=0.738 n=20+20)
    Reflect           5.10s ± 1%        5.11s ± 1%    ~     (p=0.290 n=20+18)
    Tar               1.78s ± 2%        1.77s ± 2%    ~     (p=0.166 n=19+20)
    XML               2.61s ± 2%        2.61s ± 2%    ~     (p=0.665 n=19+19)
    [Geo mean]        4.67s             4.68s       +0.16%
    
    name        old user-time/op  new user-time/op  delta
    Template          2.79s ± 3%        2.80s ± 2%    ~     (p=0.662 n=20+20)
    Unicode           1.62s ± 3%        1.64s ± 4%    ~     (p=0.252 n=20+20)
    GoTypes           9.58s ± 2%        9.62s ± 2%    ~     (p=0.250 n=20+20)
    Compiler          46.2s ± 1%        46.2s ± 1%    ~     (p=0.602 n=20+19)
    SSA                108s ± 1%         108s ± 2%    ~     (p=0.242 n=18+20)
    Flate             1.69s ± 3%        1.69s ± 4%    ~     (p=0.470 n=20+20)
    GoParser          2.16s ± 3%        2.20s ± 4%  +1.70%  (p=0.005 n=19+20)
    Reflect           6.02s ± 2%        6.02s ± 2%    ~     (p=0.700 n=20+17)
    Tar               2.11s ± 2%        2.11s ± 3%    ~     (p=0.480 n=18+20)
    XML               3.07s ± 2%        3.11s ± 4%  +1.50%  (p=0.043 n=20+20)
    [Geo mean]        5.61s             5.64s       +0.55%
    
    name        old text-bytes    new text-bytes    delta
    HelloSize         586kB ± 0%        586kB ± 0%    ~     (all equal)
    
    name        old data-bytes    new data-bytes    delta
    HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)
    
    name        old bss-bytes     new bss-bytes     delta
    HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)
    
    name        old exe-bytes     new exe-bytes     delta
    HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)
    
    3. The go1 benchmark shows improvement totally, and even more than 10%
    improvement in the test case Revcomp. 
    name                     old time/op    new time/op    delta
    BinaryTree17-4              42.0s ± 1%     41.5s ± 1%   -1.32%  (p=0.000 n=39+40)
    Fannkuch11-4                24.1s ± 1%     23.6s ± 0%   -2.38%  (p=0.000 n=40+40)
    FmtFprintfEmpty-4           843ns ± 0%     839ns ± 1%   -0.46%  (p=0.000 n=33+40)
    FmtFprintfString-4         1.44µs ± 1%    1.37µs ± 1%   -5.48%  (p=0.000 n=40+35)
    FmtFprintfInt-4            1.44µs ± 1%    1.41µs ± 2%   -1.50%  (p=0.000 n=40+40)
    FmtFprintfIntInt-4         2.07µs ± 1%    2.06µs ± 0%   -0.78%  (p=0.000 n=40+40)
    FmtFprintfPrefixedInt-4    2.50µs ± 1%    2.33µs ± 1%   -6.85%  (p=0.000 n=40+40)
    FmtFprintfFloat-4          4.36µs ± 1%    4.34µs ± 0%   -0.39%  (p=0.017 n=40+40)
    FmtManyArgs-4              8.11µs ± 0%    8.00µs ± 0%   -1.37%  (p=0.000 n=40+40)
    GobDecode-4                 105ms ± 2%     103ms ± 2%   -2.17%  (p=0.000 n=39+39)
    GobEncode-4                90.1ms ± 2%    88.6ms ± 1%   -1.67%  (p=0.000 n=40+39)
    Gzip-4                      4.18s ± 1%     4.09s ± 1%   -2.03%  (p=0.000 n=40+40)
    Gunzip-4                    608ms ± 1%     603ms ± 1%   -0.86%  (p=0.000 n=40+34)
    HTTPClientServer-4          674µs ± 3%     661µs ± 2%   -1.82%  (p=0.000 n=40+39)
    JSONEncode-4                256ms ± 1%     243ms ± 0%   -5.11%  (p=0.000 n=39+31)
    JSONDecode-4                915ms ± 1%     904ms ± 1%   -1.18%  (p=0.000 n=40+36)
    Mandelbrot200-4            49.2ms ± 0%    49.3ms ± 0%     ~     (p=0.254 n=34+40)
    GoParse-4                  46.9ms ± 2%    46.9ms ± 1%     ~     (p=0.737 n=40+39)
    RegexpMatchEasy0_32-4      1.28µs ± 1%    1.27µs ± 1%   -0.71%  (p=0.000 n=40+40)
    RegexpMatchEasy0_1K-4      7.86µs ± 4%    7.67µs ± 4%   -2.46%  (p=0.000 n=38+40)
    RegexpMatchEasy1_32-4      1.28µs ± 1%    1.28µs ± 1%   -0.54%  (p=0.000 n=40+40)
    RegexpMatchEasy1_1K-4      10.4µs ± 2%    10.3µs ± 2%   -0.88%  (p=0.003 n=40+39)
    RegexpMatchMedium_32-4     2.05µs ± 0%    2.04µs ± 0%   -0.34%  (p=0.000 n=40+33)
    RegexpMatchMedium_1K-4      541µs ± 1%     535µs ± 1%   -1.02%  (p=0.000 n=40+38)
    RegexpMatchHard_32-4       29.3µs ± 1%    29.1µs ± 1%   -0.51%  (p=0.000 n=40+40)
    RegexpMatchHard_1K-4        881µs ± 1%     871µs ± 1%   -1.15%  (p=0.000 n=40+40)
    Revcomp-4                  81.7ms ± 2%    67.5ms ± 2%  -17.37%  (p=0.000 n=39+39)
    Template-4                  1.05s ± 1%     1.08s ± 2%   +3.67%  (p=0.000 n=40+40)
    TimeParse-4                7.24µs ± 1%    7.09µs ± 1%   -2.13%  (p=0.000 n=40+40)
    TimeFormat-4               13.2µs ± 1%    13.1µs ± 0%   -0.31%  (p=0.007 n=40+31)
    [Geo mean]                  733µs          718µs        -2.03%
    
    name                     old speed      new speed      delta
    GobDecode-4              7.28MB/s ± 2%  7.44MB/s ± 2%   +2.23%  (p=0.000 n=39+39)
    GobEncode-4              8.52MB/s ± 2%  8.67MB/s ± 1%   +1.70%  (p=0.000 n=40+39)
    Gzip-4                   4.65MB/s ± 1%  4.74MB/s ± 1%   +1.94%  (p=0.000 n=37+40)
    Gunzip-4                 31.9MB/s ± 1%  32.2MB/s ± 1%   +0.90%  (p=0.000 n=40+36)
    JSONEncode-4             7.57MB/s ± 1%  7.98MB/s ± 0%   +5.41%  (p=0.000 n=40+31)
    JSONDecode-4             2.12MB/s ± 1%  2.15MB/s ± 1%   +1.23%  (p=0.000 n=40+40)
    GoParse-4                1.23MB/s ± 1%  1.23MB/s ± 1%     ~     (p=0.769 n=39+40)
    RegexpMatchEasy0_32-4    25.0MB/s ± 1%  25.2MB/s ± 1%   +0.71%  (p=0.000 n=40+40)
    RegexpMatchEasy0_1K-4     130MB/s ± 5%   134MB/s ± 4%   +2.53%  (p=0.000 n=38+40)
    RegexpMatchEasy1_32-4    24.9MB/s ± 1%  25.1MB/s ± 1%   +0.55%  (p=0.000 n=40+40)
    RegexpMatchEasy1_1K-4    98.5MB/s ± 2%  99.4MB/s ± 2%   +0.88%  (p=0.003 n=40+39)
    RegexpMatchMedium_32-4    490kB/s ± 0%   490kB/s ± 0%     ~     (all equal)
    RegexpMatchMedium_1K-4   1.89MB/s ± 1%  1.91MB/s ± 1%   +1.02%  (p=0.000 n=40+38)
    RegexpMatchHard_32-4     1.10MB/s ± 1%  1.10MB/s ± 0%   +0.41%  (p=0.000 n=40+33)
    RegexpMatchHard_1K-4     1.16MB/s ± 1%  1.17MB/s ± 1%   +1.21%  (p=0.000 n=40+40)
    Revcomp-4                31.1MB/s ± 2%  37.6MB/s ± 2%  +21.03%  (p=0.000 n=39+39)
    Template-4               1.86MB/s ± 1%  1.79MB/s ± 1%   -3.51%  (p=0.000 n=40+38)
    [Geo mean]               6.66MB/s       6.80MB/s        +2.13%
    
    fixes #21492
    
    Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d
    Reviewed-on: https://go-review.googlesource.com/58450Reviewed-by: 's avatarCherry Zhang <cherryyz@google.com>
    Run-TryBot: Cherry Zhang <cherryyz@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    a2f22a68
Name
Last commit
Last update
..
archive Loading commit data...
bufio Loading commit data...
builtin Loading commit data...
bytes Loading commit data...
cmd Loading commit data...
compress Loading commit data...
container Loading commit data...
context Loading commit data...
crypto Loading commit data...
database/sql Loading commit data...
debug Loading commit data...
encoding Loading commit data...
errors Loading commit data...
expvar Loading commit data...
flag Loading commit data...
fmt Loading commit data...
go Loading commit data...
hash Loading commit data...
html Loading commit data...
image Loading commit data...
index/suffixarray Loading commit data...
internal Loading commit data...
io Loading commit data...
log Loading commit data...
math Loading commit data...
mime Loading commit data...
net Loading commit data...
os Loading commit data...
path Loading commit data...
plugin Loading commit data...
reflect Loading commit data...
regexp Loading commit data...
runtime Loading commit data...
sort Loading commit data...
strconv Loading commit data...
strings Loading commit data...
sync Loading commit data...
syscall Loading commit data...
testing Loading commit data...
text Loading commit data...
time Loading commit data...
unicode Loading commit data...
unsafe Loading commit data...
vendor/golang_org/x Loading commit data...
Make.dist Loading commit data...
all.bash Loading commit data...
all.bat Loading commit data...
all.rc Loading commit data...
androidtest.bash Loading commit data...
bootstrap.bash Loading commit data...
buildall.bash Loading commit data...
clean.bash Loading commit data...
clean.bat Loading commit data...
clean.rc Loading commit data...
cmp.bash Loading commit data...
iostest.bash Loading commit data...
make.bash Loading commit data...
make.bat Loading commit data...
make.rc Loading commit data...
naclmake.bash Loading commit data...
nacltest.bash Loading commit data...
race.bash Loading commit data...
race.bat Loading commit data...
run.bash Loading commit data...
run.bat Loading commit data...
run.rc Loading commit data...