    math: use SIMD to accelerate some scalar math functions on s390x · b6a15683
    Bill O'Farrell authored
    Most math functions are structured to use stubs so that they can
    be accelerated with assembly on any platform.
    Sinh, cosh, and tanh were not structured with stubs, so this CL adds
    them. This set of routines was chosen as likely to produce good
    speedups with assembly on any platform.
    
    The technique used is minimax polynomial approximation, driven by
    tables of polynomial coefficients, with argument range reduction.
    A table of scaling factors is also used for cosh and log10.
    
                         before       after      speedup
    BenchmarkCos         22.1 ns/op   6.79 ns/op  3.25x
    BenchmarkCosh       125   ns/op  11.7  ns/op 10.68x
    BenchmarkLog10       48.4 ns/op  12.5  ns/op  3.87x
    BenchmarkSin         22.2 ns/op   6.55 ns/op  3.39x
    BenchmarkSinh       125   ns/op  14.2  ns/op  8.80x
    BenchmarkTanh        65.0 ns/op  15.1  ns/op  4.30x
    
    Accuracy was tested against a high precision
    reference function to determine maximum error.
    Approximately 4,000,000 points were tested for each function,
    producing the following result.
    Note: ulperr is the error in "units in the last place".
    
           max
          ulperr
    sin    1.43 (returns NaN beyond +-2^50)
    cos    1.79 (returns NaN beyond +-2^50)
    cosh   1.05
    sinh   3.02
    tanh   3.69
    log10  1.75
    
    Also includes a set of tests that exercise the non-vector functions
    even when SIMD is enabled.
    
    Change-Id: Icb45f14d00864ee19ed973d209c3af21e4df4edc
    Reviewed-on: https://go-review.googlesource.com/32352
    Run-TryBot: Michael Munday <munday@ca.ibm.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Michael Munday <munday@ca.ibm.com>
stubs_arm64.s 1.15 KB