1. 28 Jul, 2015 3 commits
    • [dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled · 2a5e6c47
      Keith Randall authored
      For integer types less than a machine register, we have to decide
      what the invariants are for the high bits of the register.  We used
      to set the high bits to the correct extension (sign or zero, as
      determined by the type) of the low bits.
      
      This CL makes the compiler ignore the high bits of the register
      altogether (they are junk).
      
      On the plus side, this means ops that generate subword results don't
      have to worry about correctly extending them.  On the minus side,
      ops that consume subword arguments have to deal with the input
      registers not being correctly extended.
      
      For x86, this tradeoff is probably worth it.  Almost all opcodes
      have versions that use only the correct subword piece of their
      inputs.  (The one big exception is array indexing.)  Not many opcodes
      can correctly sign extend on output.
      
      For other architectures, the tradeoff is probably not so clear, as
      they don't have many subword-safe opcodes (e.g. 16-bit compare,
      ignoring the high 16/48 bits).  Fortunately we can decide whether
      we do this per-architecture.
      
      For the machine-independent opcodes, we pretend that the "register"
      size is equal to the type width, so sign extension is immaterial.
      Opcodes that care about the signedness of the input (e.g. compare,
      right shift) have two different variants.
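
      A minimal Go-level illustration of the tradeoff, not from this CL:
      with junk in a register's upper bits, a correct 16-bit operation must
      look only at the low 16 bits, and the signed and unsigned views of
      those bits differ, which is why compares and right shifts come in
      signedness-specific variants.

      package main

      import "fmt"

      func main() {
        // Pretend this is a 64-bit register holding an int16: the upper
        // 48 bits are junk; the meaningful low 16 bits are 0xfff0.
        reg := uint64(0xdeadbeef0000fff0)
        asSigned := int16(reg)    // -16: signed view of the low 16 bits
        asUnsigned := uint16(reg) // 65520: unsigned view of the same bits
        fmt.Println(asSigned < 0, asUnsigned < 100) // prints: true false
      }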
      
      Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d
      Reviewed-on: https://go-review.googlesource.com/12600
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
    • [dev.ssa] cmd/compile: implement non-numeric comparisons · 9ca24fcd
      Josh Bleecher Snyder authored
      The only slice/interface comparisons that reach
      the backend are comparisons to nil.
      
      Funcs, maps, and channels are reference types,
      so pointer equality is enough.
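
      An illustrative Go snippet, not from the CL, of the comparisons in
      question: each one is a comparison against nil and reduces to a
      pointer check in the backend.

      package main

      import "fmt"

      func main() {
        var s []int
        var m map[string]int
        var c chan int
        var f func()
        var i interface{}
        // Each comparison lowers to a nil (pointer) check.
        fmt.Println(s == nil, m == nil, c == nil, f == nil, i == nil) // all true
      }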
      
      Change-Id: I60a71da46a36202e9bd62ed370ab7d7f2e2800e7
      Reviewed-on: https://go-review.googlesource.com/12715
      Reviewed-by: Keith Randall <khr@golang.org>
    • [dev.ssa] cmd/compile/internal/ssa/gen: implement OAND. · edff881c
      Alexandru Moșoi authored
      Before this patch there was only partial support for ANDQconst,
      which was not lowered. This patch adds support for AND operations
      for all bit sizes, both signed and unsigned.
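
      A small illustrative example, not from the CL, showing AND at several
      widths and on both signed and unsigned types, all of which the
      backend now lowers.

      package main

      import "fmt"

      func main() {
        var a8, b8 int8 = -4, 0x0f
        var a16, b16 uint16 = 0xff00, 0x0ff0
        var a64, b64 int64 = 0x7fffffffffffffff, 0x1234
        fmt.Println(a8&b8, a16&b16, a64&b64) // 12 3840 4660
      }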
      
      Change-Id: I3a6b2cddfac5361b27e85fcd97f7f3537ebfbcb6
      Reviewed-on: https://go-review.googlesource.com/12761
      Reviewed-by: Keith Randall <khr@golang.org>
  2. 27 Jul, 2015 7 commits
  3. 26 Jul, 2015 1 commit
  4. 24 Jul, 2015 5 commits
  5. 23 Jul, 2015 11 commits
    • [dev.ssa] cmd/compile: don't alloc new CSE classes · 851ceebc
      Josh Bleecher Snyder authored
      This reduces the time to compile
      test/slice3.go on my laptop from ~12s to ~3.8s.
      It reduces the max memory use from ~4.8gb to
      ~450mb.
      
      This is still considerably worse than tip,
      at 1s and 300mb respectively, but it's
      getting closer.
      
      Hopefully this will fix the build at long last.
      
      Change-Id: Iac26b52023f408438cba3ea1b81dcd82ca402b90
      Reviewed-on: https://go-review.googlesource.com/12566
      Reviewed-by: Keith Randall <khr@golang.org>
    • [dev.ssa] cmd/compile: use v.Args[x].Op in CSE key · 317226e6
      Josh Bleecher Snyder authored
      Experimentally, the Ops of v.Args do a good job
      of differentiating values that will end up in
      different partitions.
      
      Most values have at most two args, so use them.
      
      This reduces the wall time to run test/slice3.go
      on my laptop from ~20s to ~12s.
      
      Credit to Todd Neal for the idea.
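
      A hedged sketch of the idea; the types and names below are simplified
      stand-ins, not the compiler's actual code.

      package main

      import "fmt"

      // Op and Value are simplified stand-ins for the ssa package's types.
      type Op int

      type Value struct {
        Op   Op
        Args []*Value
      }

      // cseKey is an illustrative initial partition key: the value's own
      // opcode plus the opcodes of up to two of its arguments.
      type cseKey struct {
        op, arg0, arg1 Op
      }

      func keyFor(v *Value) cseKey {
        k := cseKey{op: v.Op}
        if len(v.Args) > 0 {
          k.arg0 = v.Args[0].Op
        }
        if len(v.Args) > 1 {
          k.arg1 = v.Args[1].Op
        }
        return k
      }

      func main() {
        x := &Value{Op: 1}
        y := &Value{Op: 2}
        add := &Value{Op: 3, Args: []*Value{x, y}}
        fmt.Println(keyFor(add)) // {3 1 2}
      }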
      
      Change-Id: I55d08f09eb678bbe8366924ca2fabcd32526bf41
      Reviewed-on: https://go-review.googlesource.com/12565
      Reviewed-by: Keith Randall <khr@golang.org>
    • [dev.ssa] cmd/compile: make etypes readable · 5254be3a
      Josh Bleecher Snyder authored
      Change-Id: Id89ea3b458597dd93d269b9fe5475e9cccc6d992
      Reviewed-on: https://go-review.googlesource.com/12562
      Reviewed-by: Keith Randall <khr@golang.org>
    • [dev.ssa] cmd/compile: add GOSSAFUNC and GOSSAPKG · d298209b
      Josh Bleecher Snyder authored
      These temporary environment variables make it
      possible to enable using SSA-generated code
      for a particular function or package without
      having to rebuild the compiler.
      
      This makes it possible to start bulk testing
      SSA generated code.
      
      First, bump up the default stack size
      (_StackMin in runtime/stack2.go) to something
      large like 32768, because without stackmaps
      we can't grow stacks.
      
      Then run something like:
      
      for pkg in `go list std`
      do
        GOGC=off GOSSAPKG=`basename $pkg` go test -a $pkg
      done
      
      When a test fails, you can re-run those tests,
      selectively enabling one function after another,
      until you find the one that is causing trouble.
      
      Doing this right now yields some interesting results:
      
      * There are several packages for which we generate
        some code and whose tests pass. Yay!
      
      * We can generate code for encoding/base64, but
        tests there fail, so there's a bug to fix.
      
      * Attempting to build the runtime yields a panic during codegen:
        panic: interface conversion: ssa.Location is nil, not *ssa.LocalSlot
      
      * The top unimplemented codegen items are (simplified):
        59 genValue not implemented: REPMOVSB
        18 genValue not implemented: REPSTOSQ
        14 genValue not implemented: SUBQ
         9 branch not implemented: If v -> b b. Control: XORQconst <bool> [1]
         8 genValue not implemented: MOVQstoreidx8
         4 branch not implemented: If v -> b b. Control: SETG <bool>
         3 branch not implemented: If v -> b b. Control: SETLE <bool>
         2 load flags not implemented: LoadReg8 <flags>
         2 genValue not implemented: InvertFlags <flags>
         1 store flags not implemented: StoreReg8 <flags>
         1 branch not implemented: If v -> b b. Control: SETGE <bool>
      
      Change-Id: Ib64809ac0c917e25bcae27829ae634c70d290c7f
      Reviewed-on: https://go-review.googlesource.com/12547
      Reviewed-by: Keith Randall <khr@golang.org>
    • [dev.ssa] cmd/compile: speed up cse · 8c954d57
      Josh Bleecher Snyder authored
      By walking only the current set of partitions
      at any given point, the cse pass ended up doing
      lots of extraneous, effectively O(n^2) work.
      
      Using a regular for loop allows each cse pass to
      make as much progress as possible by processing
      each new class as it is introduced.
      
      This can and should be optimized further,
      but it already reduces cse time on test/slice3.go by 75%.
      
      The overall time to compile test/slice3.go is still
      dominated by the O(n^2) work in the liveness pass.
      However, Keith is rewriting regalloc anyway.
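
      A hedged sketch, not the compiler's code, of the loop shape described
      above: indexing over a growing slice means classes appended while
      splitting are still processed in the same pass.

      package main

      import "fmt"

      func main() {
        partitions := [][]int{{1, 2, 3, 4}}
        // len(partitions) is re-evaluated every iteration, so classes
        // appended below are visited before the loop ends.
        for i := 0; i < len(partitions); i++ {
          class := partitions[i]
          if len(class) > 2 { // stand-in for "this class needs splitting"
            mid := len(class) / 2
            partitions[i] = class[:mid]
            partitions = append(partitions, class[mid:])
          }
        }
        fmt.Println(partitions) // [[1 2] [3 4]]
      }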
      
      Change-Id: I8be020b2f69352234587eeadeba923481bf43fcc
      Reviewed-on: https://go-review.googlesource.com/12244
      Reviewed-by: Keith Randall <khr@golang.org>
    • [dev.ssa] cmd/compile: don't combine phi vars from different blocks in CSE · 00437ebe
      Josh Bleecher Snyder authored
      Here is a concrete case in which this goes wrong.
      
      func f_ssa() int {
      	var n int
      Next:
      	for j := 0; j < 3; j++ {
      		for i := 0; i < 10; i++ {
      			if i == 6 {
      				continue Next
      			}
      			n = i
      		}
      		n += j + j + j + j + j + j + j + j + j + j // j * 10
      	}
      	return n
      }
      
      What follows is the function printout before and after CSE.
      
      Note blocks b8 and b10 in the before case.
      
      b8 is the inner loop's condition: i < 10.
      b10 is the inner loop's increment: i++.
      v82 is i. On entry to b8, it is either 0 (v19) the first time,
      or the result of incrementing v82, by way of v29.
      
      The CSE pass considered v82 and v49 to be common subexpressions,
      and eliminated v82 in favor of v49.
      
      In the after case, v82 is now dead and will shortly be eliminated.
      As a result, v29 is also dead, and we have lost the increment.
      The loop runs forever.
      
      BEFORE CSE
      
      f_ssa <nil>
        b1:
          v1 = Arg <mem>
          v2 = SP <uint64>
          v4 = Addr <*int> {~r0} v2
          v13 = Zero <mem> [8] v4 v1
          v14 = Const <int>
          v15 = Const <int>
          v17 = Const <int> [3]
          v19 = Const <int>
          v21 = Const <int> [10]
          v24 = Const <int> [6]
          v28 = Const <int> [1]
          v43 = Const <int> [1]
          Plain -> b3
        b2: <- b7
          Exit v47
        b3: <- b1
          Plain -> b4
        b4: <- b3 b6
          v49 = Phi <int> v15 v44
          v68 = Phi <int> v14 v67
          v81 = Phi <mem> v13 v81
          v18 = Less <bool> v49 v17
          If v18 -> b5 b7
        b5: <- b4
          Plain -> b8
        b6: <- b12 b11
          v67 = Phi <int> v66 v41
          v44 = Add <int> v49 v43
          Plain -> b4
        b7: <- b4
          v47 = Store <mem> v4 v68 v81
          Plain -> b2
        b8: <- b5 b10
          v66 = Phi <int> v68 v82
          v82 = Phi <int> v19 v29
          v22 = Less <bool> v82 v21
          If v22 -> b9 b11
        b9: <- b8
          v25 = Eq <bool> v82 v24
          If v25 -> b12 b13
        b10: <- b13
          v29 = Add <int> v82 v28
          Plain -> b8
        b11: <- b8
          v32 = Add <int> v49 v49
          v33 = Add <int> v32 v49
          v34 = Add <int> v33 v49
          v35 = Add <int> v34 v49
          v36 = Add <int> v35 v49
          v37 = Add <int> v36 v49
          v38 = Add <int> v37 v49
          v39 = Add <int> v38 v49
          v40 = Add <int> v39 v49
          v41 = Add <int> v66 v40
          Plain -> b6
        b12: <- b9
          Plain -> b6
        b13: <- b9
          Plain -> b10
      
      AFTER CSE
      
      f_ssa <nil>
        b1:
          v1 = Arg <mem>
          v2 = SP <uint64>
          v4 = Addr <*int> {~r0} v2
          v13 = Zero <mem> [8] v4 v1
          v14 = Const <int>
          v15 = Const <int>
          v17 = Const <int> [3]
          v19 = Const <int>
          v21 = Const <int> [10]
          v24 = Const <int> [6]
          v28 = Const <int> [1]
          v43 = Const <int> [1]
          Plain -> b3
        b2: <- b7
          Exit v47
        b3: <- b1
          Plain -> b4
        b4: <- b3 b6
          v49 = Phi <int> v19 v44
          v68 = Phi <int> v19 v67
          v81 = Phi <mem> v13 v81
          v18 = Less <bool> v49 v17
          If v18 -> b5 b7
        b5: <- b4
          Plain -> b8
        b6: <- b12 b11
          v67 = Phi <int> v66 v41
          v44 = Add <int> v49 v43
          Plain -> b4
        b7: <- b4
          v47 = Store <mem> v4 v68 v81
          Plain -> b2
        b8: <- b5 b10
          v66 = Phi <int> v68 v49
          v82 = Phi <int> v19 v29
          v22 = Less <bool> v49 v21
          If v22 -> b9 b11
        b9: <- b8
          v25 = Eq <bool> v49 v24
          If v25 -> b12 b13
        b10: <- b13
          v29 = Add <int> v49 v43
          Plain -> b8
        b11: <- b8
          v32 = Add <int> v49 v49
          v33 = Add <int> v32 v49
          v34 = Add <int> v33 v49
          v35 = Add <int> v34 v49
          v36 = Add <int> v35 v49
          v37 = Add <int> v36 v49
          v38 = Add <int> v37 v49
          v39 = Add <int> v38 v49
          v40 = Add <int> v39 v49
          v41 = Add <int> v66 v40
          Plain -> b6
        b12: <- b9
          Plain -> b6
        b13: <- b9
          Plain -> b10
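
      A hedged sketch of the fix's shape suggested by the title and the
      dumps above; the types are simplified stand-ins, not the actual
      compiler code.

      package main

      import "fmt"

      // Simplified stand-ins for the ssa package's types.
      type Op int

      const OpPhi Op = 1

      type Block struct{ ID int }

      type Value struct {
        Op    Op
        Block *Block
      }

      // phiCSEOK reports whether two values may be considered for CSE as
      // far as phis are concerned: a phi is only a candidate against a phi
      // in the same block, because its meaning depends on that block's
      // predecessors.
      func phiCSEOK(v, w *Value) bool {
        if v.Op == OpPhi || w.Op == OpPhi {
          return v.Op == w.Op && v.Block == w.Block
        }
        return true
      }

      func main() {
        b4, b8 := &Block{4}, &Block{8}
        v49 := &Value{Op: OpPhi, Block: b4}
        v82 := &Value{Op: OpPhi, Block: b8}
        fmt.Println(phiCSEOK(v82, v49))                          // false: different blocks
        fmt.Println(phiCSEOK(v82, &Value{Op: OpPhi, Block: b8})) // true: same block
      }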
      
      Change-Id: I16fc4ec527ec63f24f7d0d79d1a4a59bf37269de
      Reviewed-on: https://go-review.googlesource.com/12444
      Reviewed-by: Keith Randall <khr@golang.org>
    • [dev.ssa] cmd/compile/internal/ssa: implement multiplies · be1eb57a
      Keith Randall authored
      Use width-and-signed-specific multiply opcodes.
      Implement OMUL.
      A few other cleanups.
      
      Fixes #11467
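
      An illustrative example, not from the CL, of why multiplies must be
      width-specific: the same operands give different results at different
      widths, so the backend cannot use a single opcode for all of them.

      package main

      import "fmt"

      func main() {
        a, b := int8(100), int8(3)
        fmt.Println(a * b)               // 44: the 8-bit product wraps (300 mod 256)
        fmt.Println(int64(a) * int64(b)) // 300: a 64-bit multiply keeps all the bits
      }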
      
      Change-Id: Ib0fe80a1a9b7208dbb8a2b6b652a478847f5d244
      Reviewed-on: https://go-review.googlesource.com/12540
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
    • [dev.ssa] cmd/compile: speed up liveness analysis · d5297f72
      Josh Bleecher Snyder authored
      This reduces the wall time to run test/slice3.go
      on my laptop from >10m to ~20s.
      
      This could perhaps be further reduced by using
      a worklist of blocks and/or implementing the
      suggestion in the comment in this CL, but at this
      point, it's fast enough that there is no need.
      
      Change-Id: I741119e0c8310051d7185459f78be8b89237b85b
      Reviewed-on: https://go-review.googlesource.com/12564
      Reviewed-by: Keith Randall <khr@golang.org>
    • [dev.ssa] cmd/compile: add some common binary ops · e61e7c96
      Josh Bleecher Snyder authored
      Change-Id: I1af486a69960b9b66d5c2c9bbfcf7db6ef075d8c
      Reviewed-on: https://go-review.googlesource.com/12563
      Reviewed-by: Keith Randall <khr@golang.org>
    • [dev.ssa] cmd/compile: minor cleanup · e0ac5c53
      Josh Bleecher Snyder authored
      Change-Id: Ib33f3b1cfa09f410675d275e214d8ddc246c53c3
      Reviewed-on: https://go-review.googlesource.com/12548
      Reviewed-by: Keith Randall <khr@golang.org>
    • [dev.ssa] cmd/compile: implement control flow handling · 61aa0953
      Josh Bleecher Snyder authored
      Add label and goto checks and improve test coverage.
      
      Implement OSWITCH and OSELECT.
      
      Implement OBREAK and OCONTINUE.
      
      Allow generation of code in dead blocks.
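
      An illustrative Go function, not a test from the CL, exercising the
      constructs this change handles: a labeled loop, labeled continue and
      break, and a switch.

      package main

      import "fmt"

      func classify(xs []int) (neg, zero, pos int) {
      Loop:
        for _, x := range xs {
          switch {
          case x < 0:
            neg++
          case x == 0:
            zero++
            continue Loop
          default:
            pos++
            if pos > 100 {
              break Loop
            }
          }
        }
        return
      }

      func main() {
        fmt.Println(classify([]int{-1, 0, 2, 3})) // 1 1 2
      }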
      
      Change-Id: Ibebb7c98b4b2344f46d38db7c9dce058c56beaac
      Reviewed-on: https://go-review.googlesource.com/12445
      Reviewed-by: Keith Randall <khr@golang.org>
  6. 22 Jul, 2015 2 commits
  7. 21 Jul, 2015 11 commits