    cmd/compile: add patterns for bit set/clear/complement on amd64 · 79112707
    Giovanni Bajo authored
    This patch completes the implementation of BT(Q|L) and adds support
    for BT(S|R|C)(Q|L).
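
For orientation, the four operations named above correspond to source
shapes like the following. This is a minimal sketch: the helper names are
hypothetical and not part of the patch, and whether a given compiler
version actually selects BT/BTS/BTR/BTC for each shape is an assumption
to check in the generated assembly.

```go
package main

import "fmt"

// Hypothetical helpers, named for illustration only.
// Masking the index with 63 keeps every shift in range for a uint64.

// testBit reports whether bit n of x is set (bit test).
func testBit(x uint64, n uint) bool { return x&(1<<(n&63)) != 0 }

// setBit returns x with bit n set (bit set).
func setBit(x uint64, n uint) uint64 { return x | 1<<(n&63) }

// clearBit returns x with bit n cleared (bit reset).
func clearBit(x uint64, n uint) uint64 { return x &^ (1 << (n & 63)) }

// flipBit returns x with bit n inverted (bit complement).
func flipBit(x uint64, n uint) uint64 { return x ^ 1<<(n&63) }

func main() {
	x := setBit(0, 63)
	fmt.Printf("%#x set=%v\n", x, testBit(x, 63))
	fmt.Printf("%#x\n", clearBit(x, 63))
	fmt.Printf("%#x\n", flipBit(1, 0))
}
```

Building with `go build -gcflags=-S` prints the assembly, which is one way
to see which instructions were chosen for each helper.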
    
    Example of code changes from time.(*Time).addSec (before, then after):
    
            if t.wall&hasMonotonic != 0 {
      0x1073465               488b08                  MOVQ 0(AX), CX
      0x1073468               4889ca                  MOVQ CX, DX
      0x107346b               48c1e93f                SHRQ $0x3f, CX
      0x107346f               48c1e13f                SHLQ $0x3f, CX
      0x1073473               48f7c1ffffffff          TESTQ $-0x1, CX
      0x107347a               746b                    JE 0x10734e7
    
            if t.wall&hasMonotonic != 0 {
      0x1073435               488b08                  MOVQ 0(AX), CX
      0x1073438               480fbae13f              BTQ $0x3f, CX
      0x107343d               7363                    JAE 0x10734a2
    
    Another example (before, then after):
    
                            t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
      0x10734c8               4881e1ffffff3f          ANDQ $0x3fffffff, CX
      0x10734cf               48c1e61e                SHLQ $0x1e, SI
      0x10734d3               4809ce                  ORQ CX, SI
      0x10734d6               48b90000000000000080    MOVQ $0x8000000000000000, CX
      0x10734e0               4809f1                  ORQ SI, CX
      0x10734e3               488908                  MOVQ CX, 0(AX)
    
                            t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
      0x107348b               4881e2ffffff3f          ANDQ $0x3fffffff, DX
      0x1073492               48c1e61e                SHLQ $0x1e, SI
      0x1073496               4809f2                  ORQ SI, DX
      0x1073499               480fbaea3f              BTSQ $0x3f, DX
      0x107349e               488910                  MOVQ DX, 0(AX)
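
The two listings above can be reproduced outside the time package with a
small stand-alone program. This is a sketch under stated assumptions: the
constant values are taken from the immediates in the disassembly (mask
0x3fffffff, shift 0x1e, bit 0x3f), while `packWall` and `isMonotonic` are
hypothetical names, not functions from the standard library.

```go
package main

import "fmt"

// Values match the immediates shown in the disassembly.
const (
	hasMonotonic = 1 << 63   // top bit, tested or set via bit index 0x3f
	nsecMask     = 1<<30 - 1 // 0x3fffffff
	nsecShift    = 30        // 0x1e
)

// packWall is a hypothetical stand-in for the assignment in addSec:
// keep the low 30 bits, place dsec above them, and set the top bit.
func packWall(wall uint64, dsec int64) uint64 {
	return wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
}

// isMonotonic is a hypothetical stand-in for the single-bit test
// `t.wall&hasMonotonic != 0`.
func isMonotonic(wall uint64) bool {
	return wall&hasMonotonic != 0
}

func main() {
	w := packWall(5, 1)
	fmt.Printf("%#x monotonic=%v\n", w, isMonotonic(w))
}
```

Setting the top bit needs a 64-bit constant that does not fit in a 32-bit
immediate, which is why the old code loads it with a 10-byte MOVQ before
the ORQ, and why a single BTSQ with an 8-bit index is shorter.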
    
    Go1 benchmarks seem unaffected, and I would be surprised
    otherwise:
    
    name                     old time/op    new time/op     delta
    BinaryTree17-4              2.64s ± 4%      2.56s ± 9%  -2.92%  (p=0.008 n=9+9)
    Fannkuch11-4                2.90s ± 1%      2.95s ± 3%  +1.76%  (p=0.010 n=10+9)
    FmtFprintfEmpty-4          35.3ns ± 1%     34.5ns ± 2%  -2.34%  (p=0.004 n=9+8)
    FmtFprintfString-4         57.0ns ± 1%     58.4ns ± 5%  +2.52%  (p=0.029 n=9+10)
    FmtFprintfInt-4            59.8ns ± 3%     59.8ns ± 6%    ~     (p=0.565 n=10+10)
    FmtFprintfIntInt-4         93.9ns ± 3%     91.2ns ± 5%  -2.94%  (p=0.014 n=10+9)
    FmtFprintfPrefixedInt-4     107ns ± 6%      104ns ± 6%    ~     (p=0.099 n=10+10)
    FmtFprintfFloat-4           187ns ± 3%      188ns ± 3%    ~     (p=0.505 n=10+9)
    FmtManyArgs-4               410ns ± 1%      415ns ± 6%    ~     (p=0.649 n=8+10)
    GobDecode-4                5.30ms ± 3%     5.27ms ± 3%    ~     (p=0.436 n=10+10)
    GobEncode-4                4.62ms ± 5%     4.47ms ± 2%  -3.24%  (p=0.001 n=9+10)
    Gzip-4                      197ms ± 4%      193ms ± 3%    ~     (p=0.123 n=10+10)
    Gunzip-4                   30.4ms ± 3%     30.1ms ± 3%    ~     (p=0.481 n=10+10)
    HTTPClientServer-4         76.3µs ± 1%     76.0µs ± 1%    ~     (p=0.236 n=8+9)
    JSONEncode-4               10.5ms ± 9%     10.3ms ± 3%    ~     (p=0.280 n=10+10)
    JSONDecode-4               42.3ms ±10%     41.3ms ± 2%    ~     (p=0.053 n=9+10)
    Mandelbrot200-4            3.80ms ± 2%     3.72ms ± 2%  -2.15%  (p=0.001 n=9+10)
    GoParse-4                  2.88ms ±10%     2.81ms ± 2%    ~     (p=0.247 n=10+10)
    RegexpMatchEasy0_32-4      69.5ns ± 4%     68.6ns ± 2%    ~     (p=0.171 n=10+10)
    RegexpMatchEasy0_1K-4       165ns ± 3%      162ns ± 3%    ~     (p=0.137 n=10+10)
    RegexpMatchEasy1_32-4      65.7ns ± 6%     64.4ns ± 2%  -2.02%  (p=0.037 n=10+10)
    RegexpMatchEasy1_1K-4       278ns ± 2%      279ns ± 3%    ~     (p=0.991 n=8+9)
    RegexpMatchMedium_32-4     99.3ns ± 3%     98.5ns ± 4%    ~     (p=0.457 n=10+9)
    RegexpMatchMedium_1K-4     30.1µs ± 1%     30.4µs ± 2%    ~     (p=0.173 n=8+10)
    RegexpMatchHard_32-4       1.40µs ± 2%     1.41µs ± 4%    ~     (p=0.565 n=10+10)
    RegexpMatchHard_1K-4       42.5µs ± 1%     41.5µs ± 3%  -2.13%  (p=0.002 n=8+9)
    Revcomp-4                   332ms ± 4%      328ms ± 5%    ~     (p=0.720 n=9+10)
    Template-4                 48.3ms ± 2%     49.6ms ± 3%  +2.56%  (p=0.002 n=8+10)
    TimeParse-4                 252ns ± 2%      249ns ± 3%    ~     (p=0.116 n=9+10)
    TimeFormat-4                262ns ± 4%      252ns ± 3%  -4.01%  (p=0.000 n=9+10)
    
    name                     old speed      new speed       delta
    GobDecode-4               145MB/s ± 3%    146MB/s ± 3%    ~     (p=0.436 n=10+10)
    GobEncode-4               166MB/s ± 5%    172MB/s ± 2%  +3.28%  (p=0.001 n=9+10)
    Gzip-4                   98.6MB/s ± 4%  100.4MB/s ± 3%    ~     (p=0.123 n=10+10)
    Gunzip-4                  639MB/s ± 3%    645MB/s ± 3%    ~     (p=0.481 n=10+10)
    JSONEncode-4              185MB/s ± 8%    189MB/s ± 3%    ~     (p=0.280 n=10+10)
    JSONDecode-4             46.0MB/s ± 9%   47.0MB/s ± 2%  +2.21%  (p=0.046 n=9+10)
    GoParse-4                20.1MB/s ± 9%   20.6MB/s ± 2%    ~     (p=0.239 n=10+10)
    RegexpMatchEasy0_32-4     460MB/s ± 4%    467MB/s ± 2%    ~     (p=0.165 n=10+10)
    RegexpMatchEasy0_1K-4    6.19GB/s ± 3%   6.28GB/s ± 3%    ~     (p=0.165 n=10+10)
    RegexpMatchEasy1_32-4     487MB/s ± 5%    497MB/s ± 2%  +2.00%  (p=0.043 n=10+10)
    RegexpMatchEasy1_1K-4    3.67GB/s ± 2%   3.67GB/s ± 3%    ~     (p=0.963 n=8+9)
    RegexpMatchMedium_32-4   10.1MB/s ± 3%   10.1MB/s ± 4%    ~     (p=0.435 n=10+9)
    RegexpMatchMedium_1K-4   34.0MB/s ± 1%   33.7MB/s ± 2%    ~     (p=0.173 n=8+10)
    RegexpMatchHard_32-4     22.9MB/s ± 2%   22.7MB/s ± 4%    ~     (p=0.565 n=10+10)
    RegexpMatchHard_1K-4     24.0MB/s ± 3%   24.7MB/s ± 3%  +2.64%  (p=0.001 n=9+9)
    Revcomp-4                 766MB/s ± 4%    775MB/s ± 5%    ~     (p=0.720 n=9+10)
    Template-4               40.2MB/s ± 2%   39.2MB/s ± 3%  -2.47%  (p=0.002 n=8+10)
    
    The rules match ~1800 times during all.bash.
    
    Fixes #18943
    
    Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
    Reviewed-on: https://go-review.googlesource.com/94766
    Run-TryBot: Giovanni Bajo <rasky@develer.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Keith Randall <khr@golang.org>
mathbits.go 4.12 KB