1. 23 Jul, 2015 7 commits
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: speed up cse · 8c954d57
      Josh Bleecher Snyder authored
      By walking only the current set of partitions
      at any given point, the cse pass ended up doing
      lots of extraneous, effectively O(n^2) work.
      
      Using a regular for loop allows each cse pass to
      make as much progress as possible by processing
      each new class as it is introduced.
      
      This can and should be optimized further,
      but it already reduces by 75% cse time on test/slice3.go.
      
      The overall time to compile test/slice3.go is still
      dominated by the O(n^2) work in the liveness pass.
      However, Keith is rewriting regalloc anyway.
      
      Change-Id: I8be020b2f69352234587eeadeba923481bf43fcc
      Reviewed-on: https://go-review.googlesource.com/12244Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      8c954d57
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: don't combine phi vars from different blocks in CSE · 00437ebe
      Josh Bleecher Snyder authored
      Here is a concrete case in which this goes wrong.
      
      func f_ssa() int {
      	var n int
      Next:
      	for j := 0; j < 3; j++ {
      		for i := 0; i < 10; i++ {
      			if i == 6 {
      				continue Next
      			}
      			n = i
      		}
      		n += j + j + j + j + j + j + j + j + j + j // j * 10
      	}
      	return n
      }
      
      What follows is the function printout before and after CSE.
      
      Note blocks b8 and b10 in the before case.
      
      b8 is the inner loop's condition: i < 10.
      b10 is the inner loop's increment: i++.
      v82 is i. On entry to b8, it is either 0 (v19) the first time,
      or the result of incrementing v82, by way of v29.
      
      The CSE pass considered v82 and v49 to be common subexpressions,
      and eliminated v82 in favor of v49.
      
      In the after case, v82 is now dead and will shortly be eliminated.
      As a result, v29 is also dead, and we have lost the increment.
      The loop runs forever.
      
      BEFORE CSE
      
      f_ssa <nil>
        b1:
          v1 = Arg <mem>
          v2 = SP <uint64>
          v4 = Addr <*int> {~r0} v2
          v13 = Zero <mem> [8] v4 v1
          v14 = Const <int>
          v15 = Const <int>
          v17 = Const <int> [3]
          v19 = Const <int>
          v21 = Const <int> [10]
          v24 = Const <int> [6]
          v28 = Const <int> [1]
          v43 = Const <int> [1]
          Plain -> b3
        b2: <- b7
          Exit v47
        b3: <- b1
          Plain -> b4
        b4: <- b3 b6
          v49 = Phi <int> v15 v44
          v68 = Phi <int> v14 v67
          v81 = Phi <mem> v13 v81
          v18 = Less <bool> v49 v17
          If v18 -> b5 b7
        b5: <- b4
          Plain -> b8
        b6: <- b12 b11
          v67 = Phi <int> v66 v41
          v44 = Add <int> v49 v43
          Plain -> b4
        b7: <- b4
          v47 = Store <mem> v4 v68 v81
          Plain -> b2
        b8: <- b5 b10
          v66 = Phi <int> v68 v82
          v82 = Phi <int> v19 v29
          v22 = Less <bool> v82 v21
          If v22 -> b9 b11
        b9: <- b8
          v25 = Eq <bool> v82 v24
          If v25 -> b12 b13
        b10: <- b13
          v29 = Add <int> v82 v28
          Plain -> b8
        b11: <- b8
          v32 = Add <int> v49 v49
          v33 = Add <int> v32 v49
          v34 = Add <int> v33 v49
          v35 = Add <int> v34 v49
          v36 = Add <int> v35 v49
          v37 = Add <int> v36 v49
          v38 = Add <int> v37 v49
          v39 = Add <int> v38 v49
          v40 = Add <int> v39 v49
          v41 = Add <int> v66 v40
          Plain -> b6
        b12: <- b9
          Plain -> b6
        b13: <- b9
          Plain -> b10
      
      AFTER CSE
      
      f_ssa <nil>
        b1:
          v1 = Arg <mem>
          v2 = SP <uint64>
          v4 = Addr <*int> {~r0} v2
          v13 = Zero <mem> [8] v4 v1
          v14 = Const <int>
          v15 = Const <int>
          v17 = Const <int> [3]
          v19 = Const <int>
          v21 = Const <int> [10]
          v24 = Const <int> [6]
          v28 = Const <int> [1]
          v43 = Const <int> [1]
          Plain -> b3
        b2: <- b7
          Exit v47
        b3: <- b1
          Plain -> b4
        b4: <- b3 b6
          v49 = Phi <int> v19 v44
          v68 = Phi <int> v19 v67
          v81 = Phi <mem> v13 v81
          v18 = Less <bool> v49 v17
          If v18 -> b5 b7
        b5: <- b4
          Plain -> b8
        b6: <- b12 b11
          v67 = Phi <int> v66 v41
          v44 = Add <int> v49 v43
          Plain -> b4
        b7: <- b4
          v47 = Store <mem> v4 v68 v81
          Plain -> b2
        b8: <- b5 b10
          v66 = Phi <int> v68 v49
          v82 = Phi <int> v19 v29
          v22 = Less <bool> v49 v21
          If v22 -> b9 b11
        b9: <- b8
          v25 = Eq <bool> v49 v24
          If v25 -> b12 b13
        b10: <- b13
          v29 = Add <int> v49 v43
          Plain -> b8
        b11: <- b8
          v32 = Add <int> v49 v49
          v33 = Add <int> v32 v49
          v34 = Add <int> v33 v49
          v35 = Add <int> v34 v49
          v36 = Add <int> v35 v49
          v37 = Add <int> v36 v49
          v38 = Add <int> v37 v49
          v39 = Add <int> v38 v49
          v40 = Add <int> v39 v49
          v41 = Add <int> v66 v40
          Plain -> b6
        b12: <- b9
          Plain -> b6
        b13: <- b9
          Plain -> b10
      
      Change-Id: I16fc4ec527ec63f24f7d0d79d1a4a59bf37269de
      Reviewed-on: https://go-review.googlesource.com/12444Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      00437ebe
    • Keith Randall's avatar
      [dev.ssa] cmd/compile/internal/ssa: implement multiplies · be1eb57a
      Keith Randall authored
      Use width-and-signed-specific multiply opcodes.
      Implement OMUL.
      A few other cleanups.
      
      Fixes #11467
      
      Change-Id: Ib0fe80a1a9b7208dbb8a2b6b652a478847f5d244
      Reviewed-on: https://go-review.googlesource.com/12540Reviewed-by: 's avatarJosh Bleecher Snyder <josharian@gmail.com>
      be1eb57a
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: speed up liveness analysis · d5297f72
      Josh Bleecher Snyder authored
      This reduces the wall time to run test/slice3.go
      on my laptop from >10m to ~20s.
      
      This could perhaps be further reduced by using
      a worklist of blocks and/or implementing the
      suggestion in the comment in this CL, but at this
      point, it's fast enough that there is no need.
      
      Change-Id: I741119e0c8310051d7185459f78be8b89237b85b
      Reviewed-on: https://go-review.googlesource.com/12564Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      d5297f72
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: add some common binary ops · e61e7c96
      Josh Bleecher Snyder authored
      Change-Id: I1af486a69960b9b66d5c2c9bbfcf7db6ef075d8c
      Reviewed-on: https://go-review.googlesource.com/12563Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      e61e7c96
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: minor cleanup · e0ac5c53
      Josh Bleecher Snyder authored
      Change-Id: Ib33f3b1cfa09f410675d275e214d8ddc246c53c3
      Reviewed-on: https://go-review.googlesource.com/12548Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      e0ac5c53
    • Josh Bleecher Snyder's avatar
      [dev.ssa] cmd/compile: implement control flow handling · 61aa0953
      Josh Bleecher Snyder authored
      Add label and goto checks and improve test coverage.
      
      Implement OSWITCH and OSELECT.
      
      Implement OBREAK and OCONTINUE.
      
      Allow generation of code in dead blocks.
      
      Change-Id: Ibebb7c98b4b2344f46d38db7c9dce058c56beaac
      Reviewed-on: https://go-review.googlesource.com/12445Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      61aa0953
  2. 22 Jul, 2015 2 commits
  3. 21 Jul, 2015 11 commits
  4. 20 Jul, 2015 2 commits
  5. 17 Jul, 2015 1 commit
  6. 16 Jul, 2015 4 commits
  7. 15 Jul, 2015 2 commits
    • Todd Neal's avatar
      [dev.ssa] cmd/compile/internal : Implement Lengauer-Tarjan for dominators · 078ba138
      Todd Neal authored
      Implements the simple Lengauer-Tarjan algorithm for dominator
      and post-dominator calculation.
      
      benchmark                           old ns/op     new ns/op     delta
      BenchmarkDominatorsLinear-8         1403862       1292741       -7.92%
      BenchmarkDominatorsFwdBack-8        1270633       1428285       +12.41%
      BenchmarkDominatorsManyPred-8       225932354     1530886       -99.32%
      BenchmarkDominatorsMaxPred-8        445994225     1393612       -99.69%
      BenchmarkDominatorsMaxPredVal-8     447235248     1246899       -99.72%
      BenchmarkNilCheckDeep1-8            829           1259          +51.87%
      BenchmarkNilCheckDeep10-8           2199          2397          +9.00%
      BenchmarkNilCheckDeep100-8          57325         29405         -48.70%
      BenchmarkNilCheckDeep1000-8         6625837       2933151       -55.73%
      BenchmarkNilCheckDeep10000-8        763559787     319105541     -58.21%
      
      benchmark                           old MB/s     new MB/s     speedup
      BenchmarkDominatorsLinear-8         7.12         7.74         1.09x
      BenchmarkDominatorsFwdBack-8        7.87         7.00         0.89x
      BenchmarkDominatorsManyPred-8       0.04         6.53         163.25x
      BenchmarkDominatorsMaxPred-8        0.02         7.18         359.00x
      BenchmarkDominatorsMaxPredVal-8     0.02         8.02         401.00x
      BenchmarkNilCheckDeep1-8            1.21         0.79         0.65x
      BenchmarkNilCheckDeep10-8           4.55         4.17         0.92x
      BenchmarkNilCheckDeep100-8          1.74         3.40         1.95x
      BenchmarkNilCheckDeep1000-8         0.15         0.34         2.27x
      BenchmarkNilCheckDeep10000-8        0.01         0.03         3.00x
      
      Change-Id: Icec3d774422a9bc64914779804c8c0ab73aa72bf
      Reviewed-on: https://go-review.googlesource.com/11971Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      078ba138
    • Todd Neal's avatar
      [dev.ssa] cmd/compile: implement OIND · b383de2e
      Todd Neal authored
      Change-Id: I15aee8095e6388822e2222f1995fe2278ac956ca
      Reviewed-on: https://go-review.googlesource.com/12129Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      Reviewed-by: 's avatarJosh Bleecher Snyder <josharian@gmail.com>
      b383de2e
  8. 14 Jul, 2015 2 commits
  9. 13 Jul, 2015 5 commits
  10. 12 Jul, 2015 4 commits