• Cherry Zhang's avatar
    cmd/internal/obj: remove Follow pass · bed8129e
    Cherry Zhang authored
    The Follow pass in the assembler backend reorders and copies
    instructions. This even applies to hand-written assembly code,
    which in many cases don't want to be reordered. Now that the
    SSA compiler does a good job for laying out instructions, the
    benefit of this pass is very little:
    
    AMD64: (old = with Follow, new = without Follow)
    name                      old time/op    new time/op    delta
    BinaryTree17-12              2.78s ± 1%     2.79s ± 1%  +0.44%  (p=0.000 n=20+19)
    Fannkuch11-12                3.11s ± 0%     3.31s ± 1%  +6.16%  (p=0.000 n=19+19)
    FmtFprintfEmpty-12          50.9ns ± 1%    51.6ns ± 3%  +1.40%  (p=0.000 n=17+20)
    FmtFprintfString-12          127ns ± 0%     128ns ± 1%  +0.88%  (p=0.000 n=17+17)
    FmtFprintfInt-12             122ns ± 0%     123ns ± 1%  +0.76%  (p=0.000 n=20+19)
    FmtFprintfIntInt-12          185ns ± 1%     186ns ± 1%  +0.65%  (p=0.000 n=20+19)
    FmtFprintfPrefixedInt-12     192ns ± 1%     202ns ± 1%  +4.99%  (p=0.000 n=20+19)
    FmtFprintfFloat-12           284ns ± 0%     288ns ± 0%  +1.33%  (p=0.000 n=15+19)
    FmtManyArgs-12               807ns ± 0%     804ns ± 0%  -0.44%  (p=0.000 n=16+18)
    GobDecode-12                7.23ms ± 1%    7.21ms ± 1%    ~     (p=0.052 n=20+20)
    GobEncode-12                6.09ms ± 1%    6.12ms ± 1%  +0.41%  (p=0.002 n=19+19)
    Gzip-12                      253ms ± 1%     255ms ± 1%  +0.95%  (p=0.000 n=18+20)
    Gunzip-12                   38.4ms ± 0%    38.5ms ± 0%  +0.34%  (p=0.000 n=17+17)
    HTTPClientServer-12         95.4µs ± 2%    96.1µs ± 1%  +0.78%  (p=0.002 n=19+19)
    JSONEncode-12               16.5ms ± 1%    16.6ms ± 1%  +1.17%  (p=0.000 n=19+19)
    JSONDecode-12               54.6ms ± 1%    55.3ms ± 1%  +1.23%  (p=0.000 n=18+18)
    Mandelbrot200-12            4.47ms ± 0%    4.47ms ± 0%  +0.06%  (p=0.000 n=18+18)
    GoParse-12                  3.47ms ± 1%    3.47ms ± 1%    ~     (p=0.583 n=20+20)
    RegexpMatchEasy0_32-12      84.8ns ± 1%    85.2ns ± 2%  +0.51%  (p=0.022 n=20+20)
    RegexpMatchEasy0_1K-12       206ns ± 1%     206ns ± 1%    ~     (p=0.770 n=20+20)
    RegexpMatchEasy1_32-12      82.8ns ± 1%    83.4ns ± 1%  +0.64%  (p=0.000 n=20+19)
    RegexpMatchEasy1_1K-12       363ns ± 1%     361ns ± 1%  -0.48%  (p=0.007 n=20+20)
    RegexpMatchMedium_32-12      126ns ± 1%     126ns ± 0%  +0.72%  (p=0.000 n=20+20)
    RegexpMatchMedium_1K-12     39.1µs ± 1%    39.8µs ± 0%  +1.73%  (p=0.000 n=19+19)
    RegexpMatchHard_32-12       1.97µs ± 0%    1.98µs ± 1%  +0.29%  (p=0.005 n=18+20)
    RegexpMatchHard_1K-12       59.5µs ± 1%    59.8µs ± 1%  +0.36%  (p=0.000 n=18+20)
    Revcomp-12                   442ms ± 1%     445ms ± 2%  +0.67%  (p=0.000 n=19+20)
    Template-12                 58.0ms ± 1%    57.5ms ± 1%  -0.85%  (p=0.000 n=19+19)
    TimeParse-12                 311ns ± 0%     314ns ± 0%  +0.94%  (p=0.000 n=20+18)
    TimeFormat-12                350ns ± 3%     346ns ± 0%    ~     (p=0.076 n=20+19)
    [Geo mean]                  55.9µs         56.4µs       +0.80%
    
    ARM32:
    name                     old time/op    new time/op    delta
    BinaryTree17-4              30.4s ± 0%     30.1s ± 0%  -1.14%  (p=0.000 n=10+8)
    Fannkuch11-4                13.7s ± 0%     13.6s ± 0%  -0.75%  (p=0.000 n=10+10)
    FmtFprintfEmpty-4           664ns ± 1%     651ns ± 1%  -1.96%  (p=0.000 n=7+8)
    FmtFprintfString-4         1.83µs ± 2%    1.77µs ± 2%  -3.21%  (p=0.000 n=10+10)
    FmtFprintfInt-4            1.57µs ± 2%    1.54µs ± 2%  -2.25%  (p=0.007 n=10+10)
    FmtFprintfIntInt-4         2.37µs ± 2%    2.31µs ± 1%  -2.68%  (p=0.000 n=10+10)
    FmtFprintfPrefixedInt-4    2.14µs ± 2%    2.10µs ± 1%  -1.83%  (p=0.006 n=10+10)
    FmtFprintfFloat-4          3.69µs ± 2%    3.74µs ± 1%  +1.60%  (p=0.000 n=10+10)
    FmtManyArgs-4              9.43µs ± 1%    9.17µs ± 1%  -2.70%  (p=0.000 n=10+10)
    GobDecode-4                76.3ms ± 1%    75.5ms ± 1%  -1.14%  (p=0.003 n=10+10)
    GobEncode-4                70.7ms ± 2%    69.0ms ± 1%  -2.36%  (p=0.000 n=10+10)
    Gzip-4                      2.64s ± 1%     2.65s ± 0%  +0.59%  (p=0.002 n=10+10)
    Gunzip-4                    402ms ± 0%     398ms ± 0%  -1.11%  (p=0.000 n=10+9)
    HTTPClientServer-4          458µs ± 0%     457µs ± 0%    ~     (p=0.247 n=10+10)
    JSONEncode-4                171ms ± 0%     172ms ± 0%  +0.56%  (p=0.000 n=10+10)
    JSONDecode-4                672ms ± 1%     668ms ± 1%    ~     (p=0.105 n=10+10)
    Mandelbrot200-4            33.5ms ± 0%    33.5ms ± 0%    ~     (p=0.156 n=9+10)
    GoParse-4                  33.9ms ± 0%    34.0ms ± 0%  +0.36%  (p=0.031 n=9+9)
    RegexpMatchEasy0_32-4       823ns ± 1%     835ns ± 1%  +1.49%  (p=0.000 n=8+8)
    RegexpMatchEasy0_1K-4      3.99µs ± 0%    4.02µs ± 1%  +0.92%  (p=0.000 n=8+10)
    RegexpMatchEasy1_32-4       877ns ± 3%     904ns ± 2%  +3.07%  (p=0.012 n=10+10)
    RegexpMatchEasy1_1K-4      5.99µs ± 0%    5.97µs ± 1%  -0.38%  (p=0.023 n=8+8)
    RegexpMatchMedium_32-4     1.40µs ± 2%    1.40µs ± 2%    ~     (p=0.590 n=10+9)
    RegexpMatchMedium_1K-4      357µs ± 0%     355µs ± 1%  -0.72%  (p=0.000 n=7+8)
    RegexpMatchHard_32-4       22.3µs ± 0%    22.1µs ± 0%  -0.49%  (p=0.000 n=8+7)
    RegexpMatchHard_1K-4        661µs ± 0%     658µs ± 0%  -0.42%  (p=0.000 n=8+7)
    Revcomp-4                  46.3ms ± 0%    46.3ms ± 0%    ~     (p=0.393 n=10+10)
    Template-4                  753ms ± 1%     750ms ± 0%    ~     (p=0.211 n=10+9)
    TimeParse-4                4.28µs ± 1%    4.22µs ± 1%  -1.34%  (p=0.000 n=8+10)
    TimeFormat-4               9.00µs ± 0%    9.05µs ± 0%  +0.59%  (p=0.000 n=10+10)
    [Geo mean]                  538µs          535µs       -0.55%
    
    ARM64:
    name                     old time/op    new time/op    delta
    BinaryTree17-8              8.39s ± 0%     8.39s ± 0%    ~     (p=0.684 n=10+10)
    Fannkuch11-8                5.95s ± 0%     5.99s ± 0%  +0.63%  (p=0.000 n=10+10)
    FmtFprintfEmpty-8           116ns ± 0%     116ns ± 0%    ~     (all equal)
    FmtFprintfString-8          361ns ± 0%     360ns ± 0%  -0.31%  (p=0.003 n=8+6)
    FmtFprintfInt-8             290ns ± 0%     290ns ± 0%    ~     (p=0.620 n=9+9)
    FmtFprintfIntInt-8          476ns ± 1%     469ns ± 0%  -1.47%  (p=0.000 n=10+6)
    FmtFprintfPrefixedInt-8     412ns ± 2%     417ns ± 2%  +1.39%  (p=0.006 n=9+10)
    FmtFprintfFloat-8           652ns ± 1%     652ns ± 0%    ~     (p=0.161 n=10+8)
    FmtManyArgs-8              1.94µs ± 0%    1.94µs ± 2%    ~     (p=0.781 n=10+10)
    GobDecode-8                17.7ms ± 1%    17.7ms ± 0%    ~     (p=0.962 n=10+7)
    GobEncode-8                15.6ms ± 0%    15.6ms ± 1%    ~     (p=0.063 n=10+10)
    Gzip-8                      786ms ± 0%     787ms ± 0%    ~     (p=0.356 n=10+9)
    Gunzip-8                    127ms ± 0%     127ms ± 0%  +0.08%  (p=0.028 n=10+9)
    HTTPClientServer-8          198µs ± 6%     198µs ± 7%    ~     (p=0.796 n=10+10)
    JSONEncode-8               42.5ms ± 0%    42.2ms ± 0%  -0.73%  (p=0.000 n=9+8)
    JSONDecode-8                158ms ± 1%     162ms ± 0%  +2.28%  (p=0.000 n=10+9)
    Mandelbrot200-8            10.1ms ± 0%    10.1ms ± 0%  -0.01%  (p=0.000 n=10+9)
    GoParse-8                  8.54ms ± 1%    8.63ms ± 1%  +1.06%  (p=0.000 n=10+9)
    RegexpMatchEasy0_32-8       231ns ± 1%     225ns ± 0%  -2.52%  (p=0.000 n=9+10)
    RegexpMatchEasy0_1K-8      1.63µs ± 0%    1.63µs ± 0%    ~     (p=0.170 n=10+10)
    RegexpMatchEasy1_32-8       253ns ± 0%     249ns ± 0%  -1.41%  (p=0.000 n=9+10)
    RegexpMatchEasy1_1K-8      2.08µs ± 0%    2.08µs ± 0%  -0.32%  (p=0.000 n=9+10)
    RegexpMatchMedium_32-8      355ns ± 1%     351ns ± 0%  -1.04%  (p=0.007 n=10+7)
    RegexpMatchMedium_1K-8      104µs ± 0%     104µs ± 0%    ~     (p=0.148 n=10+10)
    RegexpMatchHard_32-8       5.79µs ± 0%    5.79µs ± 0%    ~     (p=0.578 n=10+10)
    RegexpMatchHard_1K-8        176µs ± 0%     176µs ± 0%    ~     (p=0.137 n=10+10)
    Revcomp-8                   1.37s ± 1%     1.36s ± 1%  -0.26%  (p=0.023 n=10+10)
    Template-8                  151ms ± 1%     154ms ± 1%  +2.14%  (p=0.000 n=9+10)
    TimeParse-8                 723ns ± 2%     721ns ± 1%    ~     (p=0.592 n=10+10)
    TimeFormat-8                804ns ± 2%     798ns ± 3%    ~     (p=0.344 n=10+10)
    [Geo mean]                  154µs          154µs       -0.02%
    
    Therefore remove this pass. Also reduce text size by 0.5~2%.
    
    Comment out some dead code in runtime/sys_nacl_amd64p32.s
    which contains undefined symbols.
    
    Change-Id: I1473986fe5b18b3d2554ce96cdc6f0999b8d955d
    Reviewed-on: https://go-review.googlesource.com/36205
    Run-TryBot: Cherry Zhang <cherryyz@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
    bed8129e
sys_nacl_amd64p32.s 9.56 KB