    math/big: recognize z.Mul(x, x) as squaring of x · 25b040c2
    Brian Kessler authored
    updates #13745
    
    Multiprecision squaring can be done in a straightforward manner
    with about half the multiplications of a basic multiplication
    due to the symmetry of the operands.  This change implements
    basic squaring for nat types and uses it for Int multiplication
    when the same variable is supplied to both arguments of
    z.Mul(x, x). This has some overhead to allocate a temporary
    variable to hold the cross products, shift them to double and
    add them to the diagonal terms.  There is a speed benefit in
    the intermediate range when the overhead is negligible and the
    asymptotic performance of Karatsuba multiplication has not been
    reached.
    
    basicSqrThreshold = 20
    karatsubaSqrThreshold = 400
    
    These thresholds were set by running calibrate_test.go to measure
    timing differences between the algorithms.  Benchmarks for squaring:
    
    name           old time/op  new time/op  delta
    IntSqr/1-4     51.5ns ±25%  25.1ns ± 7%  -51.38%  (p=0.008 n=5+5)
    IntSqr/2-4     79.1ns ± 4%  72.4ns ± 2%   -8.47%  (p=0.008 n=5+5)
    IntSqr/3-4      102ns ± 4%    97ns ± 5%     ~     (p=0.056 n=5+5)
    IntSqr/5-4      161ns ± 4%   163ns ± 7%     ~     (p=0.952 n=5+5)
    IntSqr/8-4      277ns ± 5%   267ns ± 6%     ~     (p=0.087 n=5+5)
    IntSqr/10-4     358ns ± 3%   360ns ± 4%     ~     (p=0.730 n=5+5)
    IntSqr/20-4    1.07µs ± 3%  1.01µs ± 6%     ~     (p=0.056 n=5+5)
    IntSqr/30-4    2.36µs ± 4%  1.72µs ± 2%  -27.03%  (p=0.008 n=5+5)
    IntSqr/50-4    5.19µs ± 3%  3.88µs ± 4%  -25.37%  (p=0.008 n=5+5)
    IntSqr/80-4    11.3µs ± 4%   8.6µs ± 3%  -23.78%  (p=0.008 n=5+5)
    IntSqr/100-4   16.2µs ± 4%  12.8µs ± 3%  -21.49%  (p=0.008 n=5+5)
    IntSqr/200-4   50.1µs ± 5%  44.7µs ± 3%  -10.65%  (p=0.008 n=5+5)
    IntSqr/300-4    105µs ±11%    95µs ± 3%   -9.50%  (p=0.008 n=5+5)
    IntSqr/500-4    231µs ± 5%   227µs ± 2%     ~     (p=0.310 n=5+5)
    IntSqr/800-4    496µs ± 9%   459µs ± 3%   -7.40%  (p=0.016 n=5+5)
    IntSqr/1000-4   700µs ± 3%   710µs ± 5%     ~     (p=0.841 n=5+5)
    
    The benchmarks show a speedup of about 10-25% in the range where
    basicSqr is optimal, improved single-word squaring, and no
    significant difference when the fallback to standard multiplication
    is used.
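
    One way to picture how the two thresholds partition operand sizes is
    the sketch below. The constant values are the ones quoted in this
    message; the function sqrStrategy, its return strings, and the exact
    order of the checks are assumptions made for illustration and may
    not match the code in this change.

```go
package main

import "fmt"

// Threshold values as quoted in the commit message.
const (
	basicSqrThreshold     = 20
	karatsubaSqrThreshold = 400
)

// sqrStrategy reports which algorithm a squaring of an n-word operand
// would use under the assumed dispatch. Hypothetical helper.
func sqrStrategy(n int) string {
	switch {
	case n == 0:
		return "zero"
	case n == 1:
		return "single-word multiply"
	case n < basicSqrThreshold:
		return "basic multiplication" // squaring overhead not yet repaid
	case n < karatsubaSqrThreshold:
		return "basic squaring"
	default:
		return "karatsuba multiplication" // standard multiplication fallback
	}
}

func main() {
	for _, n := range []int{1, 10, 30, 300, 1000} {
		fmt.Printf("%4d words: %s\n", n, sqrStrategy(n))
	}
}
```

    Under this reading, the IntSqr sizes 30 through 300 fall in the
    basic squaring band, which is where the table shows the statistically
    significant improvements.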
    
    Change-Id: Iae2c82ca91cf890823f91e5c83bbe9a2c534b72b
    Reviewed-on: https://go-review.googlesource.com/53638
    Reviewed-by: Robert Griesemer <gri@golang.org>
    Run-TryBot: Robert Griesemer <gri@golang.org>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
nat_test.go 15.7 KB