1. 24 Jul, 2015 4 commits
  2. 23 Jul, 2015 11 commits
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: don't alloc new CSE classes · 851ceebc
      Josh Bleecher Snyder authored
      This reduces the time to compile
      test/slice3.go on my laptop from ~12s to ~3.8s.
      It reduces the max memory use from ~4.8gb to
      ~450mb.
      
      This is still considerably worse than tip,
      at 1s and 300mb respectively, but it's
      getting closer.
      
      Hopefully this will fix the build at long last.
      
      Change-Id: Iac26b52023f408438cba3ea1b81dcd82ca402b90
      Reviewed-on: https://go-review.googlesource.com/12566Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      851ceebc
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: use v.Args[x].Op in CSE key · 317226e6
      Josh Bleecher Snyder authored
      Experimentally, the Ops of v.Args do a good job
      of differentiating values that will end up in
      different partitions.
      
      Most values have at most two args, so use them.
      
      This reduces the wall time to run test/slice3.go
      on my laptop from ~20s to ~12s.
      
      Credit to Todd Neal for the idea.
      
      Change-Id: I55d08f09eb678bbe8366924ca2fabcd32526bf41
      Reviewed-on: https://go-review.googlesource.com/12565Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      317226e6
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: make etypes readable · 5254be3a
      Josh Bleecher Snyder authored
      Change-Id: Id89ea3b458597dd93d269b9fe5475e9cccc6d992
      Reviewed-on: https://go-review.googlesource.com/12562Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      5254be3a
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: add GOSSAFUNC and GOSSAPKG · d298209b
      Josh Bleecher Snyder authored
      These temporary environment variables make it
      possible to enable using SSA-generated code
      for a particular function or package without
      having to rebuild the compiler.
      
      This makes it possible to start bulk testing
      SSA generated code.
      
      First, bump up the default stack size
      (_StackMin in runtime/stack2.go) to something
      large like 32768, because without stackmaps
      we can't grow stacks.
      
      Then run something like:
      
      for pkg in `go list std`
      do
        GOGC=off GOSSAPKG=`basename $pkg` go test -a $pkg
      done
      
      When a test fails, you can re-run those tests,
      selectively enabling one function after another,
      until you find the one that is causing trouble.
      
      Doing this right now yields some interesting results:
      
      * There are several packages for which we generate
        some code and whose tests pass. Yay!
      
      * We can generate code for encoding/base64, but
        tests there fail, so there's a bug to fix.
      
      * Attempting to build the runtime yields a panic during codegen:
        panic: interface conversion: ssa.Location is nil, not *ssa.LocalSlot
      
      * The top unimplemented codegen items are (simplified):
        59 genValue not implemented: REPMOVSB
        18 genValue not implemented: REPSTOSQ
        14 genValue not implemented: SUBQ
         9 branch not implemented: If v -> b b. Control: XORQconst <bool> [1]
         8 genValue not implemented: MOVQstoreidx8
         4 branch not implemented: If v -> b b. Control: SETG <bool>
         3 branch not implemented: If v -> b b. Control: SETLE <bool>
         2 load flags not implemented: LoadReg8 <flags>
         2 genValue not implemented: InvertFlags <flags>
         1 store flags not implemented: StoreReg8 <flags>
         1 branch not implemented: If v -> b b. Control: SETGE <bool>
      
      Change-Id: Ib64809ac0c917e25bcae27829ae634c70d290c7f
      Reviewed-on: https://go-review.googlesource.com/12547Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      d298209b
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: speed up cse · 8c954d57
      Josh Bleecher Snyder authored
      By walking only the current set of partitions
      at any given point, the cse pass ended up doing
      lots of extraneous, effectively O(n^2) work.
      
      Using a regular for loop allows each cse pass to
      make as much progress as possible by processing
      each new class as it is introduced.
      
      This can and should be optimized further,
      but it already reduces by 75% cse time on test/slice3.go.
      
      The overall time to compile test/slice3.go is still
      dominated by the O(n^2) work in the liveness pass.
      However, Keith is rewriting regalloc anyway.
      
      Change-Id: I8be020b2f69352234587eeadeba923481bf43fcc
      Reviewed-on: https://go-review.googlesource.com/12244Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      8c954d57
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: don't combine phi vars from different blocks in CSE · 00437ebe
      Josh Bleecher Snyder authored
      Here is a concrete case in which this goes wrong.
      
      func f_ssa() int {
      	var n int
      Next:
      	for j := 0; j < 3; j++ {
      		for i := 0; i < 10; i++ {
      			if i == 6 {
      				continue Next
      			}
      			n = i
      		}
      		n += j + j + j + j + j + j + j + j + j + j // j * 10
      	}
      	return n
      }
      
      What follows is the function printout before and after CSE.
      
      Note blocks b8 and b10 in the before case.
      
      b8 is the inner loop's condition: i < 10.
      b10 is the inner loop's increment: i++.
      v82 is i. On entry to b8, it is either 0 (v19) the first time,
      or the result of incrementing v82, by way of v29.
      
      The CSE pass considered v82 and v49 to be common subexpressions,
      and eliminated v82 in favor of v49.
      
      In the after case, v82 is now dead and will shortly be eliminated.
      As a result, v29 is also dead, and we have lost the increment.
      The loop runs forever.
      
      BEFORE CSE
      
      f_ssa <nil>
        b1:
          v1 = Arg <mem>
          v2 = SP <uint64>
          v4 = Addr <*int> {~r0} v2
          v13 = Zero <mem> [8] v4 v1
          v14 = Const <int>
          v15 = Const <int>
          v17 = Const <int> [3]
          v19 = Const <int>
          v21 = Const <int> [10]
          v24 = Const <int> [6]
          v28 = Const <int> [1]
          v43 = Const <int> [1]
          Plain -> b3
        b2: <- b7
          Exit v47
        b3: <- b1
          Plain -> b4
        b4: <- b3 b6
          v49 = Phi <int> v15 v44
          v68 = Phi <int> v14 v67
          v81 = Phi <mem> v13 v81
          v18 = Less <bool> v49 v17
          If v18 -> b5 b7
        b5: <- b4
          Plain -> b8
        b6: <- b12 b11
          v67 = Phi <int> v66 v41
          v44 = Add <int> v49 v43
          Plain -> b4
        b7: <- b4
          v47 = Store <mem> v4 v68 v81
          Plain -> b2
        b8: <- b5 b10
          v66 = Phi <int> v68 v82
          v82 = Phi <int> v19 v29
          v22 = Less <bool> v82 v21
          If v22 -> b9 b11
        b9: <- b8
          v25 = Eq <bool> v82 v24
          If v25 -> b12 b13
        b10: <- b13
          v29 = Add <int> v82 v28
          Plain -> b8
        b11: <- b8
          v32 = Add <int> v49 v49
          v33 = Add <int> v32 v49
          v34 = Add <int> v33 v49
          v35 = Add <int> v34 v49
          v36 = Add <int> v35 v49
          v37 = Add <int> v36 v49
          v38 = Add <int> v37 v49
          v39 = Add <int> v38 v49
          v40 = Add <int> v39 v49
          v41 = Add <int> v66 v40
          Plain -> b6
        b12: <- b9
          Plain -> b6
        b13: <- b9
          Plain -> b10
      
      AFTER CSE
      
      f_ssa <nil>
        b1:
          v1 = Arg <mem>
          v2 = SP <uint64>
          v4 = Addr <*int> {~r0} v2
          v13 = Zero <mem> [8] v4 v1
          v14 = Const <int>
          v15 = Const <int>
          v17 = Const <int> [3]
          v19 = Const <int>
          v21 = Const <int> [10]
          v24 = Const <int> [6]
          v28 = Const <int> [1]
          v43 = Const <int> [1]
          Plain -> b3
        b2: <- b7
          Exit v47
        b3: <- b1
          Plain -> b4
        b4: <- b3 b6
          v49 = Phi <int> v19 v44
          v68 = Phi <int> v19 v67
          v81 = Phi <mem> v13 v81
          v18 = Less <bool> v49 v17
          If v18 -> b5 b7
        b5: <- b4
          Plain -> b8
        b6: <- b12 b11
          v67 = Phi <int> v66 v41
          v44 = Add <int> v49 v43
          Plain -> b4
        b7: <- b4
          v47 = Store <mem> v4 v68 v81
          Plain -> b2
        b8: <- b5 b10
          v66 = Phi <int> v68 v49
          v82 = Phi <int> v19 v29
          v22 = Less <bool> v49 v21
          If v22 -> b9 b11
        b9: <- b8
          v25 = Eq <bool> v49 v24
          If v25 -> b12 b13
        b10: <- b13
          v29 = Add <int> v49 v43
          Plain -> b8
        b11: <- b8
          v32 = Add <int> v49 v49
          v33 = Add <int> v32 v49
          v34 = Add <int> v33 v49
          v35 = Add <int> v34 v49
          v36 = Add <int> v35 v49
          v37 = Add <int> v36 v49
          v38 = Add <int> v37 v49
          v39 = Add <int> v38 v49
          v40 = Add <int> v39 v49
          v41 = Add <int> v66 v40
          Plain -> b6
        b12: <- b9
          Plain -> b6
        b13: <- b9
          Plain -> b10
      
      Change-Id: I16fc4ec527ec63f24f7d0d79d1a4a59bf37269de
      Reviewed-on: https://go-review.googlesource.com/12444Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      00437ebe
    • Keith Randall's avatar
      [dev.ssa] cmd/compile/internal/ssa: implement multiplies · be1eb57a
      Keith Randall authored
      Use width-and-signed-specific multiply opcodes.
      Implement OMUL.
      A few other cleanups.
      
      Fixes #11467
      
      Change-Id: Ib0fe80a1a9b7208dbb8a2b6b652a478847f5d244
      Reviewed-on: https://go-review.googlesource.com/12540Reviewed-by: 's avatarJosh Bleecher Snyder <josharian@gmail.com>
      be1eb57a
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: speed up liveness analysis · d5297f72
      Josh Bleecher Snyder authored
      This reduces the wall time to run test/slice3.go
      on my laptop from >10m to ~20s.
      
      This could perhaps be further reduced by using
      a worklist of blocks and/or implementing the
      suggestion in the comment in this CL, but at this
      point, it's fast enough that there is no need.
      
      Change-Id: I741119e0c8310051d7185459f78be8b89237b85b
      Reviewed-on: https://go-review.googlesource.com/12564Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      d5297f72
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: add some common binary ops · e61e7c96
      Josh Bleecher Snyder authored
      Change-Id: I1af486a69960b9b66d5c2c9bbfcf7db6ef075d8c
      Reviewed-on: https://go-review.googlesource.com/12563Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      e61e7c96
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: minor cleanup · e0ac5c53
      Josh Bleecher Snyder authored
      Change-Id: Ib33f3b1cfa09f410675d275e214d8ddc246c53c3
      Reviewed-on: https://go-review.googlesource.com/12548Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      e0ac5c53
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: implement control flow handling · 61aa0953
      Josh Bleecher Snyder authored
      Add label and goto checks and improve test coverage.
      
      Implement OSWITCH and OSELECT.
      
      Implement OBREAK and OCONTINUE.
      
      Allow generation of code in dead blocks.
      
      Change-Id: Ibebb7c98b4b2344f46d38db7c9dce058c56beaac
      Reviewed-on: https://go-review.googlesource.com/12445Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      61aa0953
  3. 22 Jul, 2015 2 commits
  4. 21 Jul, 2015 11 commits
  5. 20 Jul, 2015 2 commits
  6. 17 Jul, 2015 1 commit
  7. 16 Jul, 2015 4 commits
  8. 15 Jul, 2015 2 commits
    • Todd Neal's avatar
      [dev.ssa] cmd/compile/internal : Implement Lengauer-Tarjan for dominators · 078ba138
      Todd Neal authored
      Implements the simple Lengauer-Tarjan algorithm for dominator
      and post-dominator calculation.
      
      benchmark                           old ns/op     new ns/op     delta
      BenchmarkDominatorsLinear-8         1403862       1292741       -7.92%
      BenchmarkDominatorsFwdBack-8        1270633       1428285       +12.41%
      BenchmarkDominatorsManyPred-8       225932354     1530886       -99.32%
      BenchmarkDominatorsMaxPred-8        445994225     1393612       -99.69%
      BenchmarkDominatorsMaxPredVal-8     447235248     1246899       -99.72%
      BenchmarkNilCheckDeep1-8            829           1259          +51.87%
      BenchmarkNilCheckDeep10-8           2199          2397          +9.00%
      BenchmarkNilCheckDeep100-8          57325         29405         -48.70%
      BenchmarkNilCheckDeep1000-8         6625837       2933151       -55.73%
      BenchmarkNilCheckDeep10000-8        763559787     319105541     -58.21%
      
      benchmark                           old MB/s     new MB/s     speedup
      BenchmarkDominatorsLinear-8         7.12         7.74         1.09x
      BenchmarkDominatorsFwdBack-8        7.87         7.00         0.89x
      BenchmarkDominatorsManyPred-8       0.04         6.53         163.25x
      BenchmarkDominatorsMaxPred-8        0.02         7.18         359.00x
      BenchmarkDominatorsMaxPredVal-8     0.02         8.02         401.00x
      BenchmarkNilCheckDeep1-8            1.21         0.79         0.65x
      BenchmarkNilCheckDeep10-8           4.55         4.17         0.92x
      BenchmarkNilCheckDeep100-8          1.74         3.40         1.95x
      BenchmarkNilCheckDeep1000-8         0.15         0.34         2.27x
      BenchmarkNilCheckDeep10000-8        0.01         0.03         3.00x
      
      Change-Id: Icec3d774422a9bc64914779804c8c0ab73aa72bf
      Reviewed-on: https://go-review.googlesource.com/11971Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      078ba138
    • Todd Neal's avatar
      [dev.ssa] cmd/compile: implement OIND · b383de2e
      Todd Neal authored
      Change-Id: I15aee8095e6388822e2222f1995fe2278ac956ca
      Reviewed-on: https://go-review.googlesource.com/12129Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      Reviewed-by: 's avatarJosh Bleecher Snyder <josharian@gmail.com>
      b383de2e
  9. 14 Jul, 2015 2 commits
  10. 13 Jul, 2015 1 commit