    cmd/compile: emit fused multiply-{add,subtract} instructions on s390x · bd8a39b6
    Michael Munday authored
    Explicitly block fused multiply-add pattern matching when a cast is used
    after the multiplication, for example:
    
        - (a * b) + c        // can emit fused multiply-add
        - float64(a * b) + c // cannot emit fused multiply-add
    
    float{32,64} and complex{64,128} casts of matching types are now kept
    as OCONV operations rather than being replaced with OCONVNOP operations
    because they now imply a rounding operation (and therefore aren't a
    no-op anymore).
    
    Operations (for example, multiplication) on complex types may utilize
    fused multiply-add and -subtract instructions internally. There is no
    way to disable this behavior at the moment.
    
    Improves the performance of the floating point implementation of
    poly1305:
    
    name         old speed     new speed     delta
    64           246MB/s ± 0%  275MB/s ± 0%  +11.48%   (p=0.000 n=10+8)
    1K           312MB/s ± 0%  357MB/s ± 0%  +14.41%  (p=0.000 n=10+10)
    64Unaligned  246MB/s ± 0%  274MB/s ± 0%  +11.43%  (p=0.000 n=10+10)
    1KUnaligned  312MB/s ± 0%  357MB/s ± 0%  +14.39%   (p=0.000 n=10+8)
    
    Updates #17895.
    
    Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
    Reviewed-on: https://go-review.googlesource.com/36963
    Run-TryBot: Michael Munday <munday@ca.ibm.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Keith Randall <khr@golang.org>
S390XOps.go 38.4 KB