    math/big: faster assembly kernels for AddVx/SubVx for amd64. · 80b3ff9f
    Robert Griesemer authored
    Replaced use of rotate instructions (RCRQ, RCLQ) with ADDQ/SBBQ
    for restoring/saving the carry flag per suggestion from Torbjörn
    Granlund (author of GMP bignum libs for C).
    The rotate instructions tend to be slower on today's machines.
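
    The save/restore idea can be illustrated in plain Go rather than in
    assembly. This is only a sketch: it assumes the common idiom of
    "SBBQ CX, CX" to save the carry flag as a mask (0 or all ones) and
    "ADDQ CX, CX" to restore it, and the helper names saveCarry and
    restoreCarry are hypothetical, not taken from arith_amd64.s.

```go
package main

import (
	"fmt"
	"math/bits"
)

// saveCarry models "SBBQ CX, CX" (assumed idiom): x - x - cf is 0 when
// cf is 0 and all ones when cf is 1.
func saveCarry(cf uint64) uint64 {
	mask, _ := bits.Sub64(0, 0, cf)
	return mask
}

// restoreCarry models "ADDQ CX, CX" (assumed idiom): the carry out of
// mask+mask is 1 exactly when mask is all ones, and 0 when mask is 0.
func restoreCarry(mask uint64) uint64 {
	_, cf := bits.Add64(mask, mask, 0)
	return cf
}

// addVV is an illustrative word-wise vector add: z = x + y.
// It returns the final carry. All slices must have the same length.
func addVV(z, x, y []uint64) uint64 {
	var mask uint64 // saved carry; starts as "no carry"
	for i := range z {
		cf := restoreCarry(mask)              // bring the carry back
		z[i], cf = bits.Add64(x[i], y[i], cf) // add with carry
		mask = saveCarry(cf)                  // stash it for the next round
	}
	return restoreCarry(mask)
}

func main() {
	x := []uint64{^uint64(0), ^uint64(0), 1}
	y := []uint64{1, 0, 2}
	z := make([]uint64, len(x))
	c := addVV(z, x, y)
	fmt.Println(z, c) // prints "[0 0 4] 0"
}
```

    In the real kernels the mask would live in a register across loop
    bookkeeping that clobbers the flags; the sketch only shows why the
    two instructions round-trip a single carry bit.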
    
    benchmark              old ns/op     new ns/op     delta
    BenchmarkAddVV_1       5.69          5.51          -3.16%
    BenchmarkAddVV_2       7.15          6.87          -3.92%
    BenchmarkAddVV_3       8.69          8.06          -7.25%
    BenchmarkAddVV_4       8.10          8.13          +0.37%
    BenchmarkAddVV_5       8.37          8.47          +1.19%
    BenchmarkAddVV_1e1     13.1          12.0          -8.40%
    BenchmarkAddVV_1e2     78.1          69.4          -11.14%
    BenchmarkAddVV_1e3     815           656           -19.51%
    BenchmarkAddVV_1e4     8137          7345          -9.73%
    BenchmarkAddVV_1e5     100127        93909         -6.21%
    BenchmarkAddVW_1       4.86          4.71          -3.09%
    BenchmarkAddVW_2       5.67          5.50          -3.00%
    BenchmarkAddVW_3       6.51          6.34          -2.61%
    BenchmarkAddVW_4       6.69          6.66          -0.45%
    BenchmarkAddVW_5       7.20          7.21          +0.14%
    BenchmarkAddVW_1e1     10.0          9.34          -6.60%
    BenchmarkAddVW_1e2     45.4          52.3          +15.20%
    BenchmarkAddVW_1e3     417           491           +17.75%
    BenchmarkAddVW_1e4     4760          4852          +1.93%
    BenchmarkAddVW_1e5     69107         67717         -2.01%
    
    benchmark              old MB/s      new MB/s      speedup
    BenchmarkAddVV_1       11241.82      11610.28      1.03x
    BenchmarkAddVV_2       17902.68      18631.82      1.04x
    BenchmarkAddVV_3       22082.43      23835.64      1.08x
    BenchmarkAddVV_4       31588.18      31492.06      1.00x
    BenchmarkAddVV_5       38229.90      37783.17      0.99x
    BenchmarkAddVV_1e1     48891.67      53340.91      1.09x
    BenchmarkAddVV_1e2     81940.61      92191.86      1.13x
    BenchmarkAddVV_1e3     78443.09      97480.44      1.24x
    BenchmarkAddVV_1e4     78644.18      87129.50      1.11x
    BenchmarkAddVV_1e5     63918.48      68150.84      1.07x
    BenchmarkAddVW_1       13165.09      13581.00      1.03x
    BenchmarkAddVW_2       22588.04      23275.41      1.03x
    BenchmarkAddVW_3       29483.82      30303.96      1.03x
    BenchmarkAddVW_4       38286.54      38453.21      1.00x
    BenchmarkAddVW_5       44414.57      44370.59      1.00x
    BenchmarkAddVW_1e1     63816.84      68494.08      1.07x
    BenchmarkAddVW_1e2     140885.41     122427.16     0.87x
    BenchmarkAddVW_1e3     153258.31     130325.28     0.85x
    BenchmarkAddVW_1e4     134447.63     131904.02     0.98x
    BenchmarkAddVW_1e5     92609.41      94509.88      1.02x
    
    Change-Id: Ia473e9ab9c63a955c252426684176bca566645ae
    Reviewed-on: https://go-review.googlesource.com/2503
    Reviewed-by: Keith Randall <khr@golang.org>
arith_amd64.s 7.64 KB