• Wei Xiao's avatar
    bytes: add asm version of Index for short strings on arm64 · 562346b7
    Wei Xiao authored
    Currently we have special case for 1-byte strings,
    this extends it to strings shorter than 9 bytes on arm64.
    
    Benchmark results:
    name                              old time/op    new time/op    delta
    IndexByte/10-32                     18.6ns ± 0%    18.1ns ± 0%    -2.69%  (p=0.008 n=5+5)
    IndexByte/32-32                     16.8ns ± 1%    16.9ns ± 1%      ~     (p=0.762 n=5+5)
    IndexByte/4K-32                      464ns ± 0%     464ns ± 0%      ~     (all equal)
    IndexByte/4M-32                      528µs ± 1%     506µs ± 1%    -4.17%  (p=0.008 n=5+5)
    IndexByte/64M-32                    18.7ms ± 0%    18.7ms ± 1%      ~     (p=0.730 n=4+5)
    IndexBytePortable/10-32             33.8ns ± 0%    34.9ns ± 3%      ~     (p=0.167 n=5+5)
    IndexBytePortable/32-32             65.3ns ± 0%    66.1ns ± 2%      ~     (p=0.444 n=5+5)
    IndexBytePortable/4K-32             5.88µs ± 0%    5.88µs ± 0%      ~     (p=0.325 n=5+5)
    IndexBytePortable/4M-32             6.03ms ± 0%    6.03ms ± 0%      ~     (p=1.000 n=5+5)
    IndexBytePortable/64M-32            98.8ms ± 0%    98.9ms ± 0%    +0.10%  (p=0.008 n=5+5)
    IndexRune/10-32                     57.7ns ± 0%    49.2ns ± 0%   -14.73%  (p=0.000 n=5+4)
    IndexRune/32-32                     57.7ns ± 0%    58.6ns ± 0%    +1.56%  (p=0.008 n=5+5)
    IndexRune/4K-32                      511ns ± 0%     513ns ± 0%    +0.39%  (p=0.008 n=5+5)
    IndexRune/4M-32                      527µs ± 1%     527µs ± 1%      ~     (p=0.690 n=5+5)
    IndexRune/64M-32                    18.7ms ± 0%    18.7ms ± 1%      ~     (p=0.190 n=4+5)
    IndexRuneASCII/10-32                23.8ns ± 0%    23.8ns ± 0%      ~     (all equal)
    IndexRuneASCII/32-32                24.3ns ± 0%    24.3ns ± 0%      ~     (all equal)
    IndexRuneASCII/4K-32                 468ns ± 0%     468ns ± 0%      ~     (all equal)
    IndexRuneASCII/4M-32                 521µs ± 1%     531µs ± 2%    +1.91%  (p=0.016 n=5+5)
    IndexRuneASCII/64M-32               18.6ms ± 1%    18.5ms ± 0%      ~     (p=0.730 n=5+4)
    Index/10-32                         89.1ns ±13%    25.2ns ± 0%   -71.72%  (p=0.008 n=5+5)
    Index/32-32                          225ns ± 2%     226ns ± 3%      ~     (p=0.683 n=5+5)
    Index/4K-32                         11.9µs ± 0%    11.8µs ± 0%    -0.22%  (p=0.008 n=5+5)
    Index/4M-32                         12.1ms ± 0%    12.1ms ± 0%      ~     (p=0.548 n=5+5)
    Index/64M-32                         197ms ± 0%     197ms ± 0%      ~     (p=0.690 n=5+5)
    IndexEasy/10-32                     46.2ns ± 0%    22.1ns ± 8%   -52.16%  (p=0.008 n=5+5)
    IndexEasy/32-32                     46.2ns ± 0%    47.2ns ± 0%    +2.16%  (p=0.008 n=5+5)
    IndexEasy/4K-32                      499ns ± 0%     502ns ± 0%    +0.44%  (p=0.008 n=5+5)
    IndexEasy/4M-32                      529µs ± 2%     529µs ± 1%      ~     (p=0.841 n=5+5)
    IndexEasy/64M-32                    18.6ms ± 1%    18.7ms ± 1%      ~     (p=0.222 n=5+5)
    IndexAnyASCII/1:1-32                15.7ns ± 0%    15.7ns ± 0%      ~     (all equal)
    IndexAnyASCII/1:2-32                17.2ns ± 0%    17.2ns ± 0%      ~     (all equal)
    IndexAnyASCII/1:4-32                20.0ns ± 0%    20.0ns ± 0%      ~     (all equal)
    IndexAnyASCII/1:8-32                34.8ns ± 0%    34.8ns ± 0%      ~     (all equal)
    IndexAnyASCII/1:16-32               48.1ns ± 0%    48.1ns ± 0%      ~     (all equal)
    IndexAnyASCII/16:1-32               97.9ns ± 1%    97.7ns ± 0%      ~     (p=0.857 n=5+5)
    IndexAnyASCII/16:2-32                102ns ± 0%     102ns ± 0%      ~     (all equal)
    IndexAnyASCII/16:4-32                116ns ± 1%     116ns ± 1%      ~     (p=1.000 n=5+5)
    IndexAnyASCII/16:8-32                141ns ± 1%     141ns ± 0%      ~     (p=0.571 n=5+4)
    IndexAnyASCII/16:16-32               178ns ± 0%     178ns ± 0%      ~     (all equal)
    IndexAnyASCII/256:1-32              1.09µs ± 0%    1.09µs ± 0%      ~     (all equal)
    IndexAnyASCII/256:2-32              1.09µs ± 0%    1.10µs ± 0%    +0.27%  (p=0.008 n=5+5)
    IndexAnyASCII/256:4-32              1.11µs ± 0%    1.11µs ± 0%      ~     (p=0.397 n=5+5)
    IndexAnyASCII/256:8-32              1.10µs ± 0%    1.10µs ± 0%      ~     (p=0.444 n=5+5)
    IndexAnyASCII/256:16-32             1.14µs ± 0%    1.14µs ± 0%      ~     (all equal)
    IndexAnyASCII/4096:1-32             16.5µs ± 0%    16.5µs ± 0%      ~     (p=1.000 n=5+5)
    IndexAnyASCII/4096:2-32             17.0µs ± 0%    17.0µs ± 0%      ~     (p=0.159 n=5+4)
    IndexAnyASCII/4096:4-32             17.1µs ± 0%    17.1µs ± 0%      ~     (p=0.921 n=4+5)
    IndexAnyASCII/4096:8-32             16.5µs ± 0%    16.5µs ± 0%      ~     (p=0.460 n=5+5)
    IndexAnyASCII/4096:16-32            16.5µs ± 0%    16.5µs ± 0%      ~     (p=0.794 n=5+4)
    IndexPeriodic/IndexPeriodic2-32      189µs ± 0%     189µs ± 0%      ~     (p=0.841 n=5+5)
    IndexPeriodic/IndexPeriodic4-32      189µs ± 0%     189µs ± 0%    -0.03%  (p=0.016 n=5+4)
    IndexPeriodic/IndexPeriodic8-32      189µs ± 0%     189µs ± 0%      ~     (p=0.651 n=5+5)
    IndexPeriodic/IndexPeriodic16-32     175µs ± 9%     174µs ± 7%      ~     (p=1.000 n=5+5)
    IndexPeriodic/IndexPeriodic32-32    75.1µs ± 0%    75.1µs ± 0%      ~     (p=0.690 n=5+5)
    IndexPeriodic/IndexPeriodic64-32    42.6µs ± 0%    44.7µs ± 0%    +4.98%  (p=0.008 n=5+5)
    
    name                              old speed      new speed      delta
    IndexByte/10-32                    538MB/s ± 0%   552MB/s ± 0%    +2.65%  (p=0.008 n=5+5)
    IndexByte/32-32                   1.90GB/s ± 1%  1.90GB/s ± 1%      ~     (p=0.548 n=5+5)
    IndexByte/4K-32                   8.82GB/s ± 0%  8.81GB/s ± 0%      ~     (p=0.548 n=5+5)
    IndexByte/4M-32                   7.95GB/s ± 1%  8.29GB/s ± 1%    +4.35%  (p=0.008 n=5+5)
    IndexByte/64M-32                  3.58GB/s ± 0%  3.60GB/s ± 1%      ~     (p=0.730 n=4+5)
    IndexBytePortable/10-32            296MB/s ± 0%   286MB/s ± 3%      ~     (p=0.381 n=4+5)
    IndexBytePortable/32-32            490MB/s ± 0%   485MB/s ± 2%      ~     (p=0.286 n=5+5)
    IndexBytePortable/4K-32            697MB/s ± 0%   697MB/s ± 0%      ~     (p=0.413 n=5+5)
    IndexBytePortable/4M-32            696MB/s ± 0%   695MB/s ± 0%      ~     (p=0.897 n=5+5)
    IndexBytePortable/64M-32           679MB/s ± 0%   678MB/s ± 0%    -0.10%  (p=0.008 n=5+5)
    IndexRune/10-32                    173MB/s ± 0%   203MB/s ± 0%   +17.24%  (p=0.016 n=5+4)
    IndexRune/32-32                    555MB/s ± 0%   546MB/s ± 0%    -1.62%  (p=0.008 n=5+5)
    IndexRune/4K-32                   8.01GB/s ± 0%  7.98GB/s ± 0%    -0.38%  (p=0.008 n=5+5)
    IndexRune/4M-32                   7.97GB/s ± 1%  7.95GB/s ± 1%      ~     (p=0.690 n=5+5)
    IndexRune/64M-32                  3.59GB/s ± 0%  3.58GB/s ± 1%      ~     (p=0.190 n=4+5)
    IndexRuneASCII/10-32               420MB/s ± 0%   420MB/s ± 0%      ~     (p=0.190 n=5+4)
    IndexRuneASCII/32-32              1.32GB/s ± 0%  1.32GB/s ± 0%      ~     (p=0.333 n=5+5)
    IndexRuneASCII/4K-32              8.75GB/s ± 0%  8.75GB/s ± 0%      ~     (p=0.690 n=5+5)
    IndexRuneASCII/4M-32              8.04GB/s ± 1%  7.89GB/s ± 2%    -1.87%  (p=0.016 n=5+5)
    IndexRuneASCII/64M-32             3.61GB/s ± 1%  3.62GB/s ± 0%      ~     (p=0.730 n=5+4)
    Index/10-32                        113MB/s ±14%   397MB/s ± 0%  +249.76%  (p=0.008 n=5+5)
    Index/32-32                        142MB/s ± 2%   141MB/s ± 3%      ~     (p=0.794 n=5+5)
    Index/4K-32                        345MB/s ± 0%   346MB/s ± 0%    +0.22%  (p=0.008 n=5+5)
    Index/4M-32                        345MB/s ± 0%   345MB/s ± 0%      ~     (p=0.619 n=5+5)
    Index/64M-32                       341MB/s ± 0%   341MB/s ± 0%      ~     (p=0.595 n=5+5)
    IndexEasy/10-32                    216MB/s ± 0%   453MB/s ± 8%  +109.60%  (p=0.008 n=5+5)
    IndexEasy/32-32                    692MB/s ± 0%   678MB/s ± 0%    -2.01%  (p=0.008 n=5+5)
    IndexEasy/4K-32                   8.19GB/s ± 0%  8.16GB/s ± 0%    -0.45%  (p=0.008 n=5+5)
    IndexEasy/4M-32                   7.93GB/s ± 2%  7.93GB/s ± 1%      ~     (p=0.841 n=5+5)
    IndexEasy/64M-32                  3.60GB/s ± 1%  3.59GB/s ± 1%      ~     (p=0.222 n=5+5)
    
    Change-Id: I4ca69378a2df6f9ba748c6a2706953ee1bd07343
    Reviewed-on: https://go-review.googlesource.com/96555
    Run-TryBot: Cherry Zhang <cherryyz@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: 's avatarCherry Zhang <cherryyz@google.com>
    562346b7
bytes_arm64.go 1.9 KB