    runtime, cmd/internal/obj/arm: improve arm function prologue · 5353cde0
    Josh Bleecher Snyder authored
    When stack growth is not needed, as it usually is not,
    execute only a single conditional branch
    rather than three conditional instructions.
    This adds 4 bytes to every function,
    but might speed up execution in the common case.
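The comparison that guards the stack-growth path can be modeled in a few lines of Go. This is a minimal sketch based only on a reading of the sample disassembly in this commit message; the function name `needsMoreStack` and its parameters are hypothetical and are not part of the runtime or the assembler.

```go
package main

import "fmt"

// needsMoreStack models the check in the sample prologue (hypothetical helper):
//
//	R1 = stack guard, loaded from 0x8(R10)
//	R2 = SP + frameOffset, where frameOffset is negative
//	CMP R1, R2, followed by an instruction conditioned on LS
//
// LS means unsigned "lower or same", so the slow path is taken when R2 <= R1.
func needsMoreStack(sp, stackGuard uint32, frameOffset int32) bool {
	limit := sp + uint32(frameOffset) // wraps like the 32-bit ADD does
	return limit <= stackGuard
}

func main() {
	const frameOffset = -0x84 // matches the 0x84-byte frame in the sample

	// Plenty of room: the single conditional branch is not taken.
	fmt.Println(needsMoreStack(0x10000, 0x8000, frameOffset)) // false

	// 0x8050 - 0x84 = 0x7fcc, which is below the guard: grow the stack.
	fmt.Println(needsMoreStack(0x8050, 0x8000, frameOffset)) // true
}
```

In the old layout the three instructions conditioned on LS sit inline after the CMP; in the new layout only one B.LS sits inline and the remaining work is moved to the end of the function.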
    
    Sample disassembly for
    
    func f() {
    	_ = [128]byte{}
    }
    
    Before:
    
    TEXT main.f(SB) x.go
    	x.go:3	0x2000	e59a1008	MOVW 0x8(R10), R1
    	x.go:3	0x2004	e59fb028	MOVW 0x28(R15), R11
    	x.go:3	0x2008	e08d200b	ADD R11, R13, R2
    	x.go:3	0x200c	e1520001	CMP R1, R2
    	x.go:3	0x2010	91a0300e	MOVW.LS R14, R3
    	x.go:3	0x2014	9b0118a9	BL.LS runtime.morestack_noctxt(SB)
    	x.go:3	0x2018	9afffff8	B.LS main.f(SB)
    	x.go:3	0x201c	e52de084	MOVW.W R14, -0x84(R13)
    	x.go:4	0x2020	e28d1004	ADD $4, R13, R1
    	x.go:4	0x2024	e3a00000	MOVW $0, R0
    	x.go:4	0x2028	eb012255	BL 0x4a984
    	x.go:5	0x202c	e49df084	RET #132
    	x.go:5	0x2030	eafffffe	B 0x2030
    	x.go:5	0x2034	ffffff7c	?
    
    After:
    
    TEXT main.f(SB) x.go
    	x.go:3	0x2000	e59a1008	MOVW 0x8(R10), R1
    	x.go:3	0x2004	e59fb02c	MOVW 0x2c(R15), R11
    	x.go:3	0x2008	e08d200b	ADD R11, R13, R2
    	x.go:3	0x200c	e1520001	CMP R1, R2
    	x.go:3	0x2010	9a000004	B.LS 0x2028
    	x.go:3	0x2014	e52de084	MOVW.W R14, -0x84(R13)
    	x.go:4	0x2018	e28d1004	ADD $4, R13, R1
    	x.go:4	0x201c	e3a00000	MOVW $0, R0
    	x.go:4	0x2020	eb0124dc	BL 0x4b398
    	x.go:5	0x2024	e49df084	RET #132
    	x.go:5	0x2028	e1a0300e	MOVW R14, R3
    	x.go:5	0x202c	eb011b0d	BL runtime.morestack_noctxt(SB)
    	x.go:5	0x2030	eafffff2	B main.f(SB)
    	x.go:5	0x2034	eafffffe	B 0x2034
    	x.go:5	0x2038	ffffff7c	?
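The branch targets and literal-pool addresses in the two listings can be checked by decoding the raw instruction words. The sketch below assumes the standard A32 encoding, in which B/BL carry a signed 24-bit word offset and the PC reads as the instruction address plus 8; the helper names `branchTarget` and `literalAddr` are illustrative only.

```go
package main

import "fmt"

// branchTarget decodes the destination of an A32 B/BL instruction.
// The low 24 bits hold a signed word offset relative to addr+8.
func branchTarget(addr, insn uint32) uint32 {
	imm24 := insn & 0x00ffffff
	// Move the field to the top of the word, then shift back arithmetically
	// by 6: this sign-extends it and multiplies by 4 in one step.
	offset := int32(imm24<<8) >> 6
	return addr + 8 + uint32(offset)
}

// literalAddr returns the address read by a PC-relative load such as
// MOVW disp(R15), R11 located at addr.
func literalAddr(addr, disp uint32) uint32 {
	return addr + 8 + disp
}

func main() {
	// After: B.LS at 0x2010 jumps to the out-of-line morestack sequence.
	fmt.Printf("%#x\n", branchTarget(0x2010, 0x9a000004)) // 0x2028
	// After: B at 0x2030 restarts main.f once the stack has grown.
	fmt.Printf("%#x\n", branchTarget(0x2030, 0xeafffff2)) // 0x2000
	// Before: B.LS at 0x2018 also restarts main.f.
	fmt.Printf("%#x\n", branchTarget(0x2018, 0x9afffff8)) // 0x2000

	// The literal moves from 0x2034 to 0x2038, the 4 bytes added per function.
	fmt.Printf("%#x\n", literalAddr(0x2004, 0x28)) // 0x2034
	fmt.Printf("%#x\n", literalAddr(0x2004, 0x2c)) // 0x2038

	// The trailing word shown as "?" is the negated frame size.
	lit := uint32(0xffffff7c)
	fmt.Println(int32(lit)) // -132
}
```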
    
    Updates #10587.
    
    package sort benchmarks on an iPhone 6:
    
    name            old time/op  new time/op  delta
    SortString1K     569µs ± 0%   565µs ± 1%  -0.75%  (p=0.000 n=23+24)
    StableString1K   872µs ± 1%   870µs ± 1%  -0.16%  (p=0.009 n=23+24)
    SortInt1K        317µs ± 2%   316µs ± 2%    ~     (p=0.410 n=26+26)
    StableInt1K      343µs ± 1%   339µs ± 1%  -1.07%  (p=0.000 n=22+23)
    SortInt64K      30.0ms ± 1%  30.0ms ± 1%    ~     (p=0.091 n=25+24)
    StableInt64K    30.2ms ± 0%  30.0ms ± 0%  -0.69%  (p=0.000 n=22+22)
    Sort1e2          147µs ± 1%   146µs ± 0%  -0.48%  (p=0.000 n=25+24)
    Stable1e2        290µs ± 1%   286µs ± 1%  -1.30%  (p=0.000 n=23+24)
    Sort1e4         29.5ms ± 2%  29.7ms ± 1%  +0.71%  (p=0.000 n=23+23)
    Stable1e4       88.7ms ± 4%  88.6ms ± 8%  -0.07%  (p=0.022 n=26+26)
    Sort1e6          4.81s ± 7%   4.83s ± 7%    ~     (p=0.192 n=26+26)
    Stable1e6        18.3s ± 1%   18.1s ± 1%  -0.76%  (p=0.000 n=25+23)
    SearchWrappers   318ns ± 1%   344ns ± 1%  +8.14%  (p=0.000 n=23+26)
    
    package sort benchmarks on a first generation rpi:
    
    name            old time/op  new time/op  delta
    SearchWrappers  4.13µs ± 0%  3.95µs ± 0%   -4.42%  (p=0.000 n=15+13)
    SortString1K    5.81ms ± 1%  5.82ms ± 2%     ~     (p=0.400 n=14+15)
    StableString1K  9.69ms ± 1%  9.73ms ± 0%     ~     (p=0.121 n=15+11)
    SortInt1K       3.30ms ± 2%  3.66ms ±19%  +10.82%  (p=0.000 n=15+14)
    StableInt1K     5.97ms ±15%  4.17ms ± 8%  -30.05%  (p=0.000 n=15+15)
    SortInt64K       319ms ± 1%   295ms ± 1%   -7.65%  (p=0.000 n=15+15)
    StableInt64K     343ms ± 0%   332ms ± 0%   -3.26%  (p=0.000 n=12+13)
    Sort1e2         3.36ms ± 2%  3.22ms ± 4%   -4.10%  (p=0.000 n=15+15)
    Stable1e2       6.74ms ± 1%  6.43ms ± 2%   -4.67%  (p=0.000 n=15+15)
    Sort1e4          247ms ± 1%   247ms ± 1%     ~     (p=0.331 n=15+14)
    Stable1e4        864ms ± 0%   820ms ± 0%   -5.15%  (p=0.000 n=14+15)
    Sort1e6          41.2s ± 0%   41.2s ± 0%   +0.15%  (p=0.000 n=13+14)
    Stable1e6         192s ± 0%    182s ± 0%   -5.07%  (p=0.000 n=14+14)
    
    Change-Id: I8a9db77e1d4ea1956575895893bc9d04bd81204b
    Reviewed-on: https://go-review.googlesource.com/10497
    Reviewed-by: Russ Cox <rsc@golang.org>
Name
5.out.go
anames.go
anames5.go
asm5.go
list5.go
obj5.go