• Chad Rosier's avatar
    cmd/compile/internal/ssa: combine consecutive BigEndian stores on arm64 · 39fefa07
    Chad Rosier authored
    This optimization mirrors that which is already implemented for AMD64.  The
    optimization specifically targets the binary.BigEndian.PutUint* functions.
    
    encoding-binary results on Amberwing:
    name                   old time/op    new time/op    delta
    ReadSlice1000Int32s      9.83µs ± 2%    9.78µs ± 1%     ~     (p=0.362 n=9+10)
    ReadStruct               5.24µs ± 3%    5.19µs ± 2%     ~     (p=0.285 n=10+10)
    ReadInts                 8.35µs ± 8%    8.44µs ± 3%     ~     (p=0.323 n=10+10)
    WriteInts                3.38µs ± 3%    3.44µs ±15%     ~     (p=0.921 n=9+10)
    WriteSlice1000Int32s     11.4µs ± 6%    10.2µs ± 4%   -9.94%  (p=0.000 n=10+10)
    PutUint16                 510ns ±12%     500ns ± 0%     ~     (p=0.586 n=10+7)
    PutUint32                 530ns ±15%     490ns ±12%     ~     (p=0.086 n=10+10)
    PutUint64                 550ns ± 0%     470ns ± 6%  -14.52%  (p=0.000 n=7+10)
    LittleEndianPutUint16     500ns ± 0%     475ns ±16%     ~     (p=0.120 n=7+10)
    LittleEndianPutUint32     450ns ± 0%     517ns ±16%  +14.81%  (p=0.004 n=8+9)
    LittleEndianPutUint64     550ns ± 0%     485ns ±13%  -11.82%  (p=0.000 n=8+10)
    PutUvarint32              685ns ±12%     622ns ± 4%   -9.17%  (p=0.005 n=10+9)
    PutUvarint64              735ns ± 9%     711ns ± 9%     ~     (p=0.272 n=10+9)
    [Geo mean]               1.47µs         1.42µs        -3.87%
    
    name                   old speed      new speed      delta
    ReadSlice1000Int32s     407MB/s ± 2%   409MB/s ± 1%     ~     (p=0.362 n=9+10)
    ReadStruct             14.3MB/s ± 3%  14.4MB/s ± 2%     ~     (p=0.250 n=10+10)
    ReadInts               3.59MB/s ± 7%  3.56MB/s ± 4%     ~     (p=0.340 n=10+10)
    WriteInts              8.87MB/s ± 3%  8.74MB/s ±13%     ~     (p=0.890 n=9+10)
    WriteSlice1000Int32s    352MB/s ± 6%   391MB/s ± 4%  +11.03%  (p=0.000 n=10+10)
    PutUint16              3.95MB/s ±13%  4.00MB/s ± 0%     ~     (p=0.312 n=10+7)
    PutUint32              7.62MB/s ±17%  8.21MB/s ±11%     ~     (p=0.086 n=10+10)
    PutUint64              14.6MB/s ± 0%  17.1MB/s ± 6%  +17.28%  (p=0.000 n=7+10)
    LittleEndianPutUint16  4.00MB/s ± 0%  4.23MB/s ±18%     ~     (p=0.176 n=7+10)
    LittleEndianPutUint32  8.89MB/s ± 0%  7.64MB/s ±20%  -14.05%  (p=0.001 n=8+10)
    LittleEndianPutUint64  14.6MB/s ± 0%  16.6MB/s ±12%  +13.86%  (p=0.000 n=8+10)
    PutUvarint32           5.86MB/s ±14%  6.44MB/s ± 5%   +9.84%  (p=0.006 n=10+9)
    PutUvarint64           10.9MB/s ± 8%  11.3MB/s ± 9%     ~     (p=0.373 n=10+9)
    [Geo mean]             14.2MB/s       14.8MB/s        +3.93%
    
    go1 results on Amberwing:
    RegexpMatchEasy0_32       254ns ± 0%     254ns ± 0%    ~     (all equal)
    RegexpMatchEasy0_1K       547ns ± 0%     547ns ± 0%    ~     (all equal)
    RegexpMatchEasy1_32       252ns ± 0%     253ns ± 1%    ~     (p=0.294 n=8+10)
    RegexpMatchEasy1_1K       782ns ± 0%     783ns ± 1%    ~     (p=0.529 n=8+9)
    RegexpMatchMedium_32      316ns ± 0%     316ns ± 0%    ~     (all equal)
    RegexpMatchMedium_1K     51.5µs ± 0%    51.5µs ± 0%    ~     (p=0.645 n=10+9)
    RegexpMatchHard_32       2.75µs ± 0%    2.75µs ± 0%    ~     (all equal)
    RegexpMatchHard_1K       78.7µs ± 0%    78.7µs ± 0%    ~     (p=0.754 n=10+10)
    FmtFprintfEmpty          57.0ns ± 0%    57.0ns ± 0%    ~     (all equal)
    FmtFprintfString          111ns ± 0%     111ns ± 0%    ~     (all equal)
    FmtFprintfInt             114ns ± 0%     114ns ± 1%    ~     (p=0.065 n=9+10)
    FmtFprintfIntInt          182ns ± 0%     178ns ± 0%  -2.20%  (p=0.000 n=10+10)
    FmtFprintfPrefixedInt     225ns ± 0%     227ns ± 0%  +0.89%  (p=0.000 n=10+10)
    FmtFprintfFloat           307ns ± 0%     307ns ± 0%    ~     (p=1.000 n=9+9)
    FmtManyArgs               697ns ± 0%     701ns ± 2%    ~     (p=0.108 n=9+10)
    Gzip                      436ms ± 0%     437ms ± 0%  +0.23%  (p=0.000 n=10+8)
    HTTPClientServer         88.8µs ± 2%    89.6µs ± 1%  +0.98%  (p=0.019 n=10+10)
    JSONEncode               20.1ms ± 1%    20.2ms ± 1%  +0.48%  (p=0.007 n=10+10)
    JSONDecode               94.7ms ± 1%    94.1ms ± 0%  -0.62%  (p=0.000 n=10+9)
    GobDecode                12.6ms ± 2%    12.6ms ± 1%    ~     (p=0.360 n=10+8)
    GobEncode                12.0ms ± 1%    11.9ms ± 1%  -1.34%  (p=0.000 n=10+10)
    Mandelbrot200            5.05ms ± 0%    5.05ms ± 0%  +0.12%  (p=0.000 n=10+10)
    TimeParse                 448ns ± 0%     448ns ± 0%    ~     (p=0.529 n=8+9)
    TimeFormat                501ns ± 1%     501ns ± 1%    ~     (p=1.000 n=10+9)
    Template                 90.6ms ± 0%    89.1ms ± 0%  -1.67%  (p=0.000 n=9+9)
    GoParse                  6.01ms ± 0%    5.96ms ± 0%  -0.83%  (p=0.000 n=10+9)
    BinaryTree17              11.7s ± 0%     11.7s ± 0%    ~     (p=0.481 n=10+10)
    Revcomp                   675ms ± 0%     675ms ± 0%    ~     (p=0.436 n=9+9)
    Fannkuch11                3.26s ± 0%     3.27s ± 1%  +0.57%  (p=0.000 n=10+10)
    [Geo mean]               67.4µs         67.3µs       -0.10%
    
    name                   old speed      new speed      delta
    RegexpMatchEasy0_32     126MB/s ± 0%   126MB/s ± 0%    ~     (p=0.353 n=10+7)
    RegexpMatchEasy0_1K    1.87GB/s ± 0%  1.87GB/s ± 0%    ~     (p=0.275 n=8+10)
    RegexpMatchEasy1_32     127MB/s ± 0%   126MB/s ± 1%    ~     (p=0.110 n=8+10)
    RegexpMatchEasy1_1K    1.31GB/s ± 0%  1.31GB/s ± 1%    ~     (p=0.079 n=8+10)
    RegexpMatchMedium_32   3.16MB/s ± 0%  3.16MB/s ± 0%    ~     (all equal)
    RegexpMatchMedium_1K   19.9MB/s ± 0%  19.9MB/s ± 0%    ~     (p=0.889 n=10+9)
    RegexpMatchHard_32     11.7MB/s ± 0%  11.7MB/s ± 0%    ~     (all equal)
    RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=1.000 n=10+10)
    Gzip                   44.5MB/s ± 0%  44.4MB/s ± 0%  -0.22%  (p=0.000 n=10+8)
    JSONEncode             96.6MB/s ± 1%  96.1MB/s ± 1%  -0.48%  (p=0.007 n=10+10)
    JSONDecode             20.5MB/s ± 1%  20.6MB/s ± 0%  +0.63%  (p=0.000 n=10+9)
    GobDecode              61.0MB/s ± 2%  61.1MB/s ± 1%    ~     (p=0.372 n=10+8)
    GobEncode              63.8MB/s ± 1%  64.7MB/s ± 1%  +1.36%  (p=0.000 n=10+10)
    Template               21.4MB/s ± 0%  21.8MB/s ± 0%  +1.69%  (p=0.000 n=9+9)
    GoParse                9.63MB/s ± 0%  9.71MB/s ± 0%  +0.84%  (p=0.000 n=9+8)
    Revcomp                 377MB/s ± 0%   376MB/s ± 0%    ~     (p=0.399 n=9+9)
    [Geo mean]             56.2MB/s       56.3MB/s       +0.20%
    
    Change-Id: Ic915373f5ef512f9fbc45745860e5db7f6de6286
    Reviewed-on: https://go-review.googlesource.com/97755
    Run-TryBot: Cherry Zhang <cherryyz@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: 's avatarCherry Zhang <cherryyz@google.com>
    39fefa07
Name
Last commit
Last update
.github Loading commit data...
api Loading commit data...
doc Loading commit data...
lib/time Loading commit data...
misc Loading commit data...
src Loading commit data...
test Loading commit data...
.gitattributes Loading commit data...
.gitignore Loading commit data...
AUTHORS Loading commit data...
CONTRIBUTING.md Loading commit data...
CONTRIBUTORS Loading commit data...
LICENSE Loading commit data...
PATENTS Loading commit data...
README.md Loading commit data...
favicon.ico Loading commit data...
robots.txt Loading commit data...