    runtime: amd64, use 4-byte ops for memmove of 4 bytes · a96e117a
    Keith Randall authored
    memmove used to use two 2-byte load/store pairs to move 4 bytes.
    When the result was then read with a single 4-byte load, it caused
    a store-to-load forwarding stall.  To avoid the stall,
    special-case memmove to use 4-byte ops for the 4-byte copy case.
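
    The access pattern described above can be sketched in Go as follows.
    This is only an illustration, not code from this change or from the
    linked issue: copyThenLoad is a hypothetical helper, and whether the
    compiler actually emits a memmove call for this copy is an assumption
    that depends on the compiler version and its optimizations.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// copyThenLoad is a hypothetical helper, not part of the runtime.
// It copies n bytes into a small buffer (a copy that may be lowered
// to memmove) and then reads the buffer back with one 4-byte load.
// When the 4 bytes were written as two 2-byte stores, that wider load
// cannot be satisfied by forwarding from a single earlier store.
func copyThenLoad(src []byte, n int) uint32 {
	var buf [4]byte
	copy(buf[:], src[:n]) // n is only known at run time
	return binary.LittleEndian.Uint32(buf[:])
}

func main() {
	src := []byte{0x01, 0x02, 0x03, 0x04}
	fmt.Printf("%#x\n", copyThenLoad(src, 4))
}
```

    With a single 4-byte store in the copy, the following 4-byte load
    matches the store in size and address, which is the case this change
    targets.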
    
    We already have a special case for 8-byte copies.
    386 already specializes 4-byte copies.
    I'll do 2-byte copies also, but not for 1.8.
    
    benchmark                 old ns/op     new ns/op     delta
    BenchmarkIssue18740-8     7567          4799          -36.58%
    
    3-byte copies get a bit slower.  Other copies are unchanged.
    name         old time/op   new time/op   delta
    Memmove/3-8   4.76ns ± 5%   5.26ns ± 3%  +10.50%  (p=0.000 n=10+10)
    
    Fixes #18740
    
    Change-Id: Iec82cbac0ecfee80fa3c8fc83828f9a1819c3c74
    Reviewed-on: https://go-review.googlesource.com/35567
    Run-TryBot: Keith Randall <khr@golang.org>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: David Chase <drchase@google.com>
memmove_amd64.s 12.2 KB