    runtime: get rid of most uses of REP for copying/zeroing. · 6c7cbf08
    Keith Randall authored
    REP MOVSQ and REP STOSQ have a really high startup overhead.
    Use a Duff's device to do the repetition instead.
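To make the technique concrete, here is a minimal C sketch of a Duff's device for zeroing and forward-copying 64-bit words. It is an illustration only: the commit does this in assembly inside the runtime, and the names `duff_zero` and `duff_copy`, the unroll factor of 8, and the word-count interface are all assumptions made for this example. The idea is that a `switch` jumps into the middle of an unrolled run of stores to handle the leftover words, so no per-word loop test and no REP startup cost is paid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero n 64-bit words starting at dst. */
void duff_zero(uint64_t *dst, size_t n)
{
    assert(dst != NULL || n == 0);
    if (n == 0)
        return;
    size_t passes = (n + 7) / 8;  /* trips through the unrolled body */
    switch (n % 8) {              /* enter mid-body for the leftover words */
    case 0: do { *dst++ = 0;
    case 7:      *dst++ = 0;
    case 6:      *dst++ = 0;
    case 5:      *dst++ = 0;
    case 4:      *dst++ = 0;
    case 3:      *dst++ = 0;
    case 2:      *dst++ = 0;
    case 1:      *dst++ = 0;
            } while (--passes > 0);
    }
}

/* Hypothetical helper: copy n 64-bit words from src to dst, front to back.
 * Assumes the regions do not overlap in a way that needs a reverse copy. */
void duff_copy(uint64_t *dst, const uint64_t *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    if (n == 0)
        return;
    size_t passes = (n + 7) / 8;
    switch (n % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

Calling `duff_zero(buf, 9)` enters at `case 1`, performs one store, then runs the full eight-store body once, for nine stores in total. Because the copy runs front to back, it covers only the forward direction, which matches the TODO about "reverse" copying still using REP prefixes.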
    
    benchmark                 old ns/op     new ns/op     delta
    BenchmarkClearFat32       7.20          1.60          -77.78%
    BenchmarkCopyFat32        6.88          2.38          -65.41%
    BenchmarkClearFat64       7.15          3.20          -55.24%
    BenchmarkCopyFat64        6.88          3.44          -50.00%
    BenchmarkClearFat128      9.53          5.34          -43.97%
    BenchmarkCopyFat128       9.27          5.56          -40.02%
    BenchmarkClearFat256      13.8          9.53          -30.94%
    BenchmarkCopyFat256       13.5          10.3          -23.70%
    BenchmarkClearFat512      22.3          18.0          -19.28%
    BenchmarkCopyFat512       22.0          19.7          -10.45%
    BenchmarkCopyFat1024      36.5          38.4          +5.21%
    BenchmarkClearFat1024     35.1          35.0          -0.28%
    
    TODO: use for stack frame zeroing
    TODO: REP prefixes are still used for "reverse" copying when src/dst
    regions overlap.  Might be worth fixing.
    
    LGTM=rsc
    R=golang-codereviews, rsc
    CC=golang-codereviews, r
    https://golang.org/cl/81370046