runtime: speed up amd64 memmove
MOV with SSE registers seems faster than REP MOVSQ if the
size being copied is less than about 2K. Previously we
didn't use MOV if the memory region was larger than 256
bytes. This patch improves the performance of 257 ~ 2048
byte non-overlapping copies by using MOV.

Here is the benchmark result on Intel Xeon 3.5GHz (Nehalem).

benchmark               old ns/op    new ns/op    delta
BenchmarkMemmove16              4            4   +0.42%
BenchmarkMemmove32              5            5   -0.20%
BenchmarkMemmove64              6            6   -0.81%
BenchmarkMemmove128             7            7   -0.82%
BenchmarkMemmove256            10           10   +1.92%
BenchmarkMemmove512            29           16  -44.90%
BenchmarkMemmove1024           37           25  -31.55%
BenchmarkMemmove2048           55           44  -19.46%
BenchmarkMemmove4096           92           91   -0.76%

benchmark                old MB/s     new MB/s  speedup
BenchmarkMemmove16        3370.61      3356.88    1.00x
BenchmarkMemmove32        6368.68      6386.99    1.00x
BenchmarkMemmove64       10367.37     10462.62    1.01x
BenchmarkMemmove128      17551.16     17713.48    1.01x
BenchmarkMemmove256      24692.81     24142.99    0.98x
BenchmarkMemmove512      17428.70     31687.72    1.82x
BenchmarkMemmove1024     27401.82     40009.45    1.46x
BenchmarkMemmove2048     36884.86     45766.98    1.24x
BenchmarkMemmove4096     44295.91     44627.86    1.01x

LGTM=khr
R=golang-codereviews, gobot, khr
CC=golang-codereviews
https://golang.org/cl/90500043
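
For readers who want to reproduce numbers of this kind outside the runtime
tree, the following is a minimal sketch, not the BenchmarkMemmove code from
the runtime package. It assumes that the built-in copy on byte slices is
lowered to the runtime's memmove, and it times non-overlapping copies with
testing.Benchmark. The helper names benchCopy and throughputMBps are
hypothetical and do not appear in this change.

```go
package main

import (
	"fmt"
	"testing"
)

// benchCopy (hypothetical helper) times copy() between two separate,
// non-overlapping n-byte buffers using the standard benchmark driver.
func benchCopy(n int) testing.BenchmarkResult {
	src := make([]byte, n)
	dst := make([]byte, n)
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(n)) // lets the result carry bytes per iteration
		for i := 0; i < b.N; i++ {
			copy(dst, src)
		}
	})
}

// throughputMBps (hypothetical helper) converts a result to MB/s,
// using 1 MB = 1e6 bytes.
func throughputMBps(r testing.BenchmarkResult) float64 {
	return float64(r.Bytes) * float64(r.N) / 1e6 / r.T.Seconds()
}

func main() {
	// Sizes around the 256-byte and 2K thresholds discussed above.
	for _, n := range []int{256, 512, 1024, 2048, 4096} {
		r := benchCopy(n)
		fmt.Printf("copy %4d bytes: %6d ns/op %10.2f MB/s\n",
			n, r.NsPerOp(), throughputMBps(r))
	}
}
```

Absolute figures depend on the CPU and Go version, so only the relative
change between two builds of the runtime is meaningful.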