- 24 Jul, 2015 4 commits
-
-
Josh Bleecher Snyder authored
Change-Id: I4e496c7c7239111133631f76ca25e14be64800c6
Reviewed-on: https://go-review.googlesource.com/12656
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
This prevents panics while attempting to generate code for the runtime package. Now:

    <unknown line number>: internal compiler error: localOffset of non-LocalSlot value: v10 = ADDQconst <*m> [256] v22

Change-Id: I20ed6ec6aae2c91183b8c826b8ebcc98e8ceebff
Reviewed-on: https://go-review.googlesource.com/12655
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
This generates more efficient code. Before:

    0x003a 00058 (rr.go:7) LEAQ go.string.hdr."="(SB), BX
    0x0041 00065 (rr.go:7) LEAQ 16(BX), BP
    0x0045 00069 (rr.go:7) MOVQ BP, 16(SP)

After:

    0x003a 00058 (rr.go:7) LEAQ go.string."="(SB), BX
    0x0041 00065 (rr.go:7) MOVQ BX, 16(SP)

It also matches the existing backend and is more robust to other changes, such as CL 11698, which I believe broke the current code. This CL fixes the encoding/base64 tests, as run with:

    GOGC=off GOSSAPKG=base64 go test -a encoding/base64

Change-Id: I3c475bed1dd3335cc14e13309e11d23f0ed32c17
Reviewed-on: https://go-review.googlesource.com/12654
Reviewed-by: Keith Randall <khr@golang.org>
-
Todd Neal authored
Change-Id: I8da76b9a4c5c80e8515e69e105d6349fe3ad9281
Reviewed-on: https://go-review.googlesource.com/12611
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
-
- 23 Jul, 2015 11 commits
-
-
Josh Bleecher Snyder authored
This reduces the time to compile test/slice3.go on my laptop from ~12s to ~3.8s. It reduces the max memory use from ~4.8gb to ~450mb. This is still considerably worse than tip, at 1s and 300mb respectively, but it's getting closer. Hopefully this will fix the build at long last.

Change-Id: Iac26b52023f408438cba3ea1b81dcd82ca402b90
Reviewed-on: https://go-review.googlesource.com/12566
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
Experimentally, the Ops of v.Args do a good job of differentiating values that will end up in different partitions. Most values have at most two args, so use them. This reduces the wall time to run test/slice3.go on my laptop from ~20s to ~12s. Credit to Todd Neal for the idea.

Change-Id: I55d08f09eb678bbe8366924ca2fabcd32526bf41
Reviewed-on: https://go-review.googlesource.com/12565
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
Change-Id: Id89ea3b458597dd93d269b9fe5475e9cccc6d992
Reviewed-on: https://go-review.googlesource.com/12562
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
These temporary environment variables make it possible to enable using SSA-generated code for a particular function or package without having to rebuild the compiler. This makes it possible to start bulk testing SSA-generated code.

First, bump up the default stack size (_StackMin in runtime/stack2.go) to something large like 32768, because without stackmaps we can't grow stacks. Then run something like:

    for pkg in `go list std`
    do
      GOGC=off GOSSAPKG=`basename $pkg` go test -a $pkg
    done

When a test fails, you can re-run those tests, selectively enabling one function after another, until you find the one that is causing trouble.

Doing this right now yields some interesting results:

* There are several packages for which we generate some code and whose tests pass. Yay!
* We can generate code for encoding/base64, but tests there fail, so there's a bug to fix.
* Attempting to build the runtime yields a panic during codegen: panic: interface conversion: ssa.Location is nil, not *ssa.LocalSlot
* The top unimplemented codegen items are (simplified):

     59 genValue not implemented: REPMOVSB
     18 genValue not implemented: REPSTOSQ
     14 genValue not implemented: SUBQ
      9 branch not implemented: If v -> b b. Control: XORQconst <bool> [1]
      8 genValue not implemented: MOVQstoreidx8
      4 branch not implemented: If v -> b b. Control: SETG <bool>
      3 branch not implemented: If v -> b b. Control: SETLE <bool>
      2 load flags not implemented: LoadReg8 <flags>
      2 genValue not implemented: InvertFlags <flags>
      1 store flags not implemented: StoreReg8 <flags>
      1 branch not implemented: If v -> b b. Control: SETGE <bool>

Change-Id: Ib64809ac0c917e25bcae27829ae634c70d290c7f
Reviewed-on: https://go-review.googlesource.com/12547
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
By walking only the current set of partitions at any given point, the cse pass ended up doing lots of extraneous, effectively O(n^2) work. Using a regular for loop allows each cse pass to make as much progress as possible by processing each new class as it is introduced. This can and should be optimized further, but it already reduces cse time on test/slice3.go by 75%.

The overall time to compile test/slice3.go is still dominated by the O(n^2) work in the liveness pass. However, Keith is rewriting regalloc anyway.

Change-Id: I8be020b2f69352234587eeadeba923481bf43fcc
Reviewed-on: https://go-review.googlesource.com/12244
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
Here is a concrete case in which this goes wrong.

    func f_ssa() int {
        var n int
    Next:
        for j := 0; j < 3; j++ {
            for i := 0; i < 10; i++ {
                if i == 6 {
                    continue Next
                }
                n = i
            }
            n += j + j + j + j + j + j + j + j + j + j // j * 10
        }
        return n
    }

What follows is the function printout before and after CSE. Note blocks b8 and b10 in the before case. b8 is the inner loop's condition: i < 10. b10 is the inner loop's increment: i++. v82 is i. On entry to b8, it is either 0 (v19) the first time, or the result of incrementing v82, by way of v29. The CSE pass considered v82 and v49 to be common subexpressions, and eliminated v82 in favor of v49. In the after case, v82 is now dead and will shortly be eliminated. As a result, v29 is also dead, and we have lost the increment. The loop runs forever.

BEFORE CSE

    f_ssa <nil>
      b1:
        v1 = Arg <mem>
        v2 = SP <uint64>
        v4 = Addr <*int> {~r0} v2
        v13 = Zero <mem> [8] v4 v1
        v14 = Const <int>
        v15 = Const <int>
        v17 = Const <int> [3]
        v19 = Const <int>
        v21 = Const <int> [10]
        v24 = Const <int> [6]
        v28 = Const <int> [1]
        v43 = Const <int> [1]
        Plain -> b3
      b2: <- b7
        Exit v47
      b3: <- b1
        Plain -> b4
      b4: <- b3 b6
        v49 = Phi <int> v15 v44
        v68 = Phi <int> v14 v67
        v81 = Phi <mem> v13 v81
        v18 = Less <bool> v49 v17
        If v18 -> b5 b7
      b5: <- b4
        Plain -> b8
      b6: <- b12 b11
        v67 = Phi <int> v66 v41
        v44 = Add <int> v49 v43
        Plain -> b4
      b7: <- b4
        v47 = Store <mem> v4 v68 v81
        Plain -> b2
      b8: <- b5 b10
        v66 = Phi <int> v68 v82
        v82 = Phi <int> v19 v29
        v22 = Less <bool> v82 v21
        If v22 -> b9 b11
      b9: <- b8
        v25 = Eq <bool> v82 v24
        If v25 -> b12 b13
      b10: <- b13
        v29 = Add <int> v82 v28
        Plain -> b8
      b11: <- b8
        v32 = Add <int> v49 v49
        v33 = Add <int> v32 v49
        v34 = Add <int> v33 v49
        v35 = Add <int> v34 v49
        v36 = Add <int> v35 v49
        v37 = Add <int> v36 v49
        v38 = Add <int> v37 v49
        v39 = Add <int> v38 v49
        v40 = Add <int> v39 v49
        v41 = Add <int> v66 v40
        Plain -> b6
      b12: <- b9
        Plain -> b6
      b13: <- b9
        Plain -> b10

AFTER CSE

    f_ssa <nil>
      b1:
        v1 = Arg <mem>
        v2 = SP <uint64>
        v4 = Addr <*int> {~r0} v2
        v13 = Zero <mem> [8] v4 v1
        v14 = Const <int>
        v15 = Const <int>
        v17 = Const <int> [3]
        v19 = Const <int>
        v21 = Const <int> [10]
        v24 = Const <int> [6]
        v28 = Const <int> [1]
        v43 = Const <int> [1]
        Plain -> b3
      b2: <- b7
        Exit v47
      b3: <- b1
        Plain -> b4
      b4: <- b3 b6
        v49 = Phi <int> v19 v44
        v68 = Phi <int> v19 v67
        v81 = Phi <mem> v13 v81
        v18 = Less <bool> v49 v17
        If v18 -> b5 b7
      b5: <- b4
        Plain -> b8
      b6: <- b12 b11
        v67 = Phi <int> v66 v41
        v44 = Add <int> v49 v43
        Plain -> b4
      b7: <- b4
        v47 = Store <mem> v4 v68 v81
        Plain -> b2
      b8: <- b5 b10
        v66 = Phi <int> v68 v49
        v82 = Phi <int> v19 v29
        v22 = Less <bool> v49 v21
        If v22 -> b9 b11
      b9: <- b8
        v25 = Eq <bool> v49 v24
        If v25 -> b12 b13
      b10: <- b13
        v29 = Add <int> v49 v43
        Plain -> b8
      b11: <- b8
        v32 = Add <int> v49 v49
        v33 = Add <int> v32 v49
        v34 = Add <int> v33 v49
        v35 = Add <int> v34 v49
        v36 = Add <int> v35 v49
        v37 = Add <int> v36 v49
        v38 = Add <int> v37 v49
        v39 = Add <int> v38 v49
        v40 = Add <int> v39 v49
        v41 = Add <int> v66 v40
        Plain -> b6
      b12: <- b9
        Plain -> b6
      b13: <- b9
        Plain -> b10

Change-Id: I16fc4ec527ec63f24f7d0d79d1a4a59bf37269de
Reviewed-on: https://go-review.googlesource.com/12444
Reviewed-by: Keith Randall <khr@golang.org>
-
Keith Randall authored
Use width-and-signed-specific multiply opcodes. Implement OMUL. A few other cleanups.

Fixes #11467

Change-Id: Ib0fe80a1a9b7208dbb8a2b6b652a478847f5d244
Reviewed-on: https://go-review.googlesource.com/12540
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
-
Josh Bleecher Snyder authored
This reduces the wall time to run test/slice3.go on my laptop from >10m to ~20s. This could perhaps be further reduced by using a worklist of blocks and/or implementing the suggestion in the comment in this CL, but at this point, it's fast enough that there is no need.

Change-Id: I741119e0c8310051d7185459f78be8b89237b85b
Reviewed-on: https://go-review.googlesource.com/12564
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
Change-Id: I1af486a69960b9b66d5c2c9bbfcf7db6ef075d8c
Reviewed-on: https://go-review.googlesource.com/12563
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
Change-Id: Ib33f3b1cfa09f410675d275e214d8ddc246c53c3
Reviewed-on: https://go-review.googlesource.com/12548
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
Add label and goto checks and improve test coverage.
Implement OSWITCH and OSELECT.
Implement OBREAK and OCONTINUE.
Allow generation of code in dead blocks.

Change-Id: Ibebb7c98b4b2344f46d38db7c9dce058c56beaac
Reviewed-on: https://go-review.googlesource.com/12445
Reviewed-by: Keith Randall <khr@golang.org>
-
- 22 Jul, 2015 2 commits
-
-
Alexandru Moșoi authored
Handle multiplication with -1, 0, 3, 5, 9 and all powers of two.

Change-Id: I8e87e7670dae389aebf6f446d7a56950cacb59e0
Reviewed-on: https://go-review.googlesource.com/12350
Reviewed-by: Keith Randall <khr@golang.org>
-
Alexandru Moșoi authored
Change-Id: Ibc645d6cf229ecc18af3549dd3750be9d7451abe
Reviewed-on: https://go-review.googlesource.com/12472
Reviewed-by: Keith Randall <khr@golang.org>
-
- 21 Jul, 2015 11 commits
-
-
Josh Bleecher Snyder authored
Shorter code, easier to read, no pointless empty slices.

Change-Id: Id410364b4f6924b5665188af3373a5e914117c38
Reviewed-on: https://go-review.googlesource.com/12480
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
Bad rebase in CL 12439.

Change-Id: I7ad359519c6274be37456b655f19bf0ca6ac6692
Reviewed-on: https://go-review.googlesource.com/12449
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
-
Josh Bleecher Snyder authored
Change-Id: I814fd0c2f1a622cca7dfd1b771f81de309a1904c
Reviewed-on: https://go-review.googlesource.com/12441
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
Change-Id: I8625eff33f5a49dbaaec060c3fa067d7531193c4
Reviewed-on: https://go-review.googlesource.com/12313
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
Prior to this fix, a zero-aligned variable such as a flags variable would reset n to 0. While we're here, log the stack layout so that debugging and reading the generated assembly is easier.

Change-Id: I18ef83ea95b6ea877c83f2e595e14c48c9ad7d84
Reviewed-on: https://go-review.googlesource.com/12439
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
It is not clear to me what the right implementation is. LoadReg8 and StoreReg8 are introduced during regalloc, so after the amd64 rewrites. But implementing them in genValue seems silly.

Change-Id: Ia708209c4604867bddcc0e5d75ecd17cf32f52c3
Reviewed-on: https://go-review.googlesource.com/12437
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
Change-Id: I591f2c0465263dcdeef46920aabf1bbb8e7ac5c0
Reviewed-on: https://go-review.googlesource.com/12436
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
This reverts commit 766bcc92.

Change-Id: I55413c1aa80d82c856a3ea89b4ffccf80fb58013
Reviewed-on: https://go-review.googlesource.com/12361
Reviewed-by: Keith Randall <khr@golang.org>
-
Keith Randall authored
Bake the bit width and signedness into opcodes.

Pro: Rewrite rules become easier. Less chance for confusion.
Con: Lots more opcodes.

Let me know what you think. I'm leaning towards this, but I could be convinced otherwise if people think this is too ugly.

Update #11467

Change-Id: Icf1b894268cdf73515877bb123839800d97b9df9
Reviewed-on: https://go-review.googlesource.com/12362
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
-
Josh Bleecher Snyder authored
The verb doesn't do anything, but if/when we move these to the test directory, having it be right will be one fewer thing to remember.

Change-Id: Ibf0280d7cc14bf48927e25215de6b91c111983d9
Reviewed-on: https://go-review.googlesource.com/12438
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
This will be used in a subsequent commit.

Change-Id: I43eca21f4692d99e164c9f6be0760597c46e6a26
Reviewed-on: https://go-review.googlesource.com/12440
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
- 20 Jul, 2015 2 commits
-
-
Josh Bleecher Snyder authored
Change-Id: I971d0c93632e39aad4e2ba1862f085df820baf8b
Reviewed-on: https://go-review.googlesource.com/12431
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
Change-Id: Icbaad6e5cbfc5430a651538fe90c0a9ee664faf4
Reviewed-on: https://go-review.googlesource.com/12360
Reviewed-by: Keith Randall <khr@golang.org>
-
- 17 Jul, 2015 1 commit
-
-
Keith Randall authored
*64 is <<6, not <<5.

Change-Id: I2eb7e113d5003b2c77fbd3abc3defc4d98976a5e
Reviewed-on: https://go-review.googlesource.com/12323
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
- 16 Jul, 2015 4 commits
-
-
Keith Randall authored
Keep track of the outargs size needed at each call. Compute the size of the outargs section of the stack frame. It's just the max of the outargs size at all the callsites in the function.

Change-Id: I3d0640f654f01307633b1a5f75bab16e211ea6c0
Reviewed-on: https://go-review.googlesource.com/12178
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
-
Josh Bleecher Snyder authored
Change-Id: Ia56ee9798eefe123d4da04138a6a559d2c25ddf3
Reviewed-on: https://go-review.googlesource.com/12312
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
If we've already hit an Unimplemented, there may be important SSA invariants that do not hold and which could cause ssa.Compile to hang or spin. While we're here, make detected dependency cycles stop execution.

Change-Id: Ic7d4eea659e1fe3f2c9b3e8a4eee5567494f46ad
Reviewed-on: https://go-review.googlesource.com/12310
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Keith Randall authored
Implement ODOT. Similar to ArrayIndex, StructSelect selects a field out of a larger Value. We may need more ways to rewrite StructSelect, but StructSelect/Load is the typical way it is used.

Change-Id: Ida7b8aab3298f4754eaf9fee733974cf8736e45d
Reviewed-on: https://go-review.googlesource.com/12265
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
- 15 Jul, 2015 2 commits
-
-
Todd Neal authored
Implements the simple Lengauer-Tarjan algorithm for dominator and post-dominator calculation.

    benchmark                          old ns/op     new ns/op     delta
    BenchmarkDominatorsLinear-8          1403862       1292741     -7.92%
    BenchmarkDominatorsFwdBack-8         1270633       1428285    +12.41%
    BenchmarkDominatorsManyPred-8      225932354       1530886    -99.32%
    BenchmarkDominatorsMaxPred-8       445994225       1393612    -99.69%
    BenchmarkDominatorsMaxPredVal-8    447235248       1246899    -99.72%
    BenchmarkNilCheckDeep1-8                 829          1259    +51.87%
    BenchmarkNilCheckDeep10-8               2199          2397     +9.00%
    BenchmarkNilCheckDeep100-8             57325         29405    -48.70%
    BenchmarkNilCheckDeep1000-8          6625837       2933151    -55.73%
    BenchmarkNilCheckDeep10000-8       763559787     319105541    -58.21%

    benchmark                          old MB/s     new MB/s     speedup
    BenchmarkDominatorsLinear-8        7.12         7.74         1.09x
    BenchmarkDominatorsFwdBack-8       7.87         7.00         0.89x
    BenchmarkDominatorsManyPred-8      0.04         6.53         163.25x
    BenchmarkDominatorsMaxPred-8       0.02         7.18         359.00x
    BenchmarkDominatorsMaxPredVal-8    0.02         8.02         401.00x
    BenchmarkNilCheckDeep1-8           1.21         0.79         0.65x
    BenchmarkNilCheckDeep10-8          4.55         4.17         0.92x
    BenchmarkNilCheckDeep100-8         1.74         3.40         1.95x
    BenchmarkNilCheckDeep1000-8        0.15         0.34         2.27x
    BenchmarkNilCheckDeep10000-8       0.01         0.03         3.00x

Change-Id: Icec3d774422a9bc64914779804c8c0ab73aa72bf
Reviewed-on: https://go-review.googlesource.com/11971
Reviewed-by: Keith Randall <khr@golang.org>
-
Todd Neal authored
Change-Id: I15aee8095e6388822e2222f1995fe2278ac956ca
Reviewed-on: https://go-review.googlesource.com/12129
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
-
- 14 Jul, 2015 2 commits
-
-
Keith Randall authored
Phi ops should always be scheduled first. They have the semantics of all happening simultaneously at the start of the block. The regalloc phase assumes all the phis will appear first.

Change-Id: I30291e1fa384a0819205218f1d1ec3aef6d538dd
Reviewed-on: https://go-review.googlesource.com/12154
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
-
Brad Fitzpatrick authored
Change-Id: I02b8fb277b486eaf0916ddcd8f28c062d4022d4b
Reviewed-on: https://go-review.googlesource.com/12150
Reviewed-by: Keith Randall <khr@golang.org>
-
- 13 Jul, 2015 1 commit
-
-
Keith Randall authored
Change-Id: If8a9d5901fa2141d16b1c8d001761ea62bc23207
Reviewed-on: https://go-review.googlesource.com/12141
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-