    runtime: faster x86 memmove (a.k.a. built-in copy()) · 60214492
    Keith Randall authored
    REP instructions have a high startup cost, so small sizes are
    handled with straight-line code.  The REP MOVSx instructions
    are very fast for large sizes.  The cutover is approximately
    1K.  The straight-line code covers sizes up to 128 bytes (386)
    or 256 bytes (amd64) because that is the most data the SSE
    registers can hold at once; loading all data into registers
    before any stores lets us ignore copy direction.
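
The load-everything-then-store idea can be sketched in Go. This is only an illustration, not the assembly from this change: the helper name move9through16 and its slice-based signature are hypothetical, and two 64-bit values stand in for the registers. Because both words are read before either is written, the copy is correct for overlapping buffers in either direction.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// move9through16 is a hypothetical helper (not part of the runtime).
// It copies n bytes (9 <= n <= 16) from src to dst using two 8-byte
// words that may overlap each other in the middle.
// Both loads happen before either store, so the result is correct
// even when dst and src overlap, whichever one comes first in memory.
func move9through16(dst, src []byte, n int) {
	lo := binary.LittleEndian.Uint64(src[0:8])     // first 8 bytes
	hi := binary.LittleEndian.Uint64(src[n-8 : n]) // last 8 bytes
	binary.LittleEndian.PutUint64(dst[0:8], lo)
	binary.LittleEndian.PutUint64(dst[n-8:n], hi)
}

func main() {
	buf := []byte("abcdefghijklmnopqrstuvwx")
	// Overlapping move: copy buf[0:12] onto buf[4:16].
	move9through16(buf[4:], buf[0:], 12)
	fmt.Println(string(buf)) // prints "abcdabcdefghijklqrstuvwx"
}
```

A byte-at-a-time forward loop would corrupt this particular move, since it would overwrite source bytes before reading them; the sketch avoids that without ever comparing the two addresses.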
    
    (on a Sandy Bridge E5-1650 @ 3.20GHz)
    benchmark               old ns/op    new ns/op    delta
    BenchmarkMemmove0               3            3   +0.86%
    BenchmarkMemmove1               5            5   +5.40%
    BenchmarkMemmove2              18            8  -56.84%
    BenchmarkMemmove3              18            7  -58.45%
    BenchmarkMemmove4              36            7  -78.63%
    BenchmarkMemmove5              36            8  -77.91%
    BenchmarkMemmove6              36            8  -77.76%
    BenchmarkMemmove7              36            8  -77.82%
    BenchmarkMemmove8              18            8  -56.33%
    BenchmarkMemmove9              18            7  -58.34%
    BenchmarkMemmove10             18            7  -58.34%
    BenchmarkMemmove11             18            7  -58.45%
    BenchmarkMemmove12             36            7  -78.51%
    BenchmarkMemmove13             36            7  -78.48%
    BenchmarkMemmove14             36            7  -78.56%
    BenchmarkMemmove15             36            7  -78.56%
    BenchmarkMemmove16             18            7  -58.24%
    BenchmarkMemmove32             18            8  -54.33%
    BenchmarkMemmove64             18            8  -53.37%
    BenchmarkMemmove128            20            9  -55.93%
    BenchmarkMemmove256            25           11  -55.16%
    BenchmarkMemmove512            33           33   -1.19%
    BenchmarkMemmove1024           43           44   +2.06%
    BenchmarkMemmove2048           61           61   +0.16%
    BenchmarkMemmove4096           95           95   +0.00%
    
    R=golang-dev, bradfitz, remyoudompheng, khr, iant, dominik.honnef
    CC=golang-dev
    https://golang.org/cl/9038048
memmove_amd64.s 4.52 KB