  1. 22 May, 2018 11 commits
    • cmd/asm: enable AVX512 · 5437cde9
      isharipo authored
      - Uncomment tests for AVX512 encoder
      - Permit instruction suffixes for x86
      - Permit limited reg list [reg-reg] syntax for x86 for multi-source ops
      - EVEX encoding support in obj/x86 (Z-cases, asmevex, etc.)
      - optabs and ytabs generated by x86avxgen (https://golang.org/cl/107216)
      
      Note: suffix formatting implemented with updated CConv function.
      Now arch asm backend should register formatting function by
      calling RegisterOpSuffix.
      
      Updates #22779
      
      Change-Id: I076a167ee49582700e058c56ad74e6696710c8c8
      Reviewed-on: https://go-review.googlesource.com/113315
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • test/codegen: improve test cases for arm64 · 8a85bce2
      Ben Shi authored
      1. Some incorrect test cases are disabled.
      2. Some wrong test cases are corrected.
      3. Some new test cases are added.
      
      Change-Id: Ib5d0473d55159f233ddab79f96967eaec7b08597
      Reviewed-on: https://go-review.googlesource.com/113736
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: output stack map index everywhere it changes · ed7a0682
      Austin Clements authored
      Currently, the code generator only considers outputting stack map
      indexes at CALL instructions. Raise this into the code generator loop
      itself so that changes in the stack map index at any instruction emit
      a PCDATA Prog before the actual instruction.
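The idea in the paragraph above can be sketched as a toy emitter. This is only an illustration, not the compiler's code: the `inst` type, the `emit` function, and the textual `PCDATA stackmap=N` format are all hypothetical stand-ins.

```go
package main

import "fmt"

// inst is a hypothetical stand-in for one instruction: a mnemonic plus the
// stack map index that liveness analysis assigned to it.
type inst struct {
	op       string
	stackMap int
}

// emit walks the instruction stream and returns the output listing. It places
// a PCDATA line before every instruction whose stack map index differs from
// the index currently in effect, not only before CALLs.
func emit(insts []inst, entry int) []string {
	var out []string
	cur := entry // index in effect at function entry (assumed parameter)
	for _, in := range insts {
		if in.stackMap != cur {
			out = append(out, fmt.Sprintf("PCDATA stackmap=%d", in.stackMap))
			cur = in.stackMap
		}
		out = append(out, in.op)
	}
	return out
}

func main() {
	prog := []inst{{"MOVQ", -1}, {"ADDQ", 0}, {"CALL", 0}, {"MOVQ", 1}, {"RET", 1}}
	for _, line := range emit(prog, -1) {
		fmt.Println(line)
	}
}
```

In this model a run of instructions sharing one index costs a single PCDATA entry, which is consistent with the small size increase reported in the commit message.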
      
      We'll optimize this in later CLs:
      
      name        old time/op       new time/op       delta
      Template          190ms ± 2%        191ms ± 2%    ~     (p=0.529 n=10+10)
      Unicode          96.4ms ± 1%       98.5ms ± 3%  +2.18%  (p=0.001 n=9+10)
      GoTypes           669ms ± 1%        673ms ± 1%  +0.62%  (p=0.004 n=9+9)
      Compiler          3.18s ± 1%        3.22s ± 1%  +1.06%  (p=0.000 n=10+9)
      SSA               7.59s ± 1%        7.64s ± 1%  +0.66%  (p=0.023 n=10+10)
      Flate             128ms ± 1%        130ms ± 2%  +1.07%  (p=0.043 n=10+10)
      GoParser          157ms ± 2%        158ms ± 3%    ~     (p=0.123 n=10+10)
      Reflect           442ms ± 1%        445ms ± 1%  +0.73%  (p=0.017 n=10+9)
      Tar               179ms ± 1%        180ms ± 1%  +0.58%  (p=0.019 n=9+9)
      XML               229ms ± 1%        232ms ± 2%  +1.27%  (p=0.009 n=10+10)
      [Geo mean]        401ms             405ms       +0.94%
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize         1.46M ± 0%        1.47M ± 0%  +0.84%  (p=0.000 n=10+10)
      [Geo mean]        1.46M             1.47M       +0.84%
      
      For #24543.
      
      Change-Id: I4bfe45b767c9d9db47308a27763b303fa75bfa54
      Reviewed-on: https://go-review.googlesource.com/109350
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: enable stack maps everywhere except unsafe points · a367f44c
      Austin Clements authored
      This modifies issafepoint in liveness analysis to report almost every
      operation as a safe point. There are four things we don't mark as
      safe-points:
      
      1. Runtime code (other than at calls).
      
      2. go:nosplit functions (other than at calls).
      
      3. Instructions between the load of the write barrier-enabled flag and
         the write.
      
      4. Instructions leading up to a uintptr -> unsafe.Pointer conversion.
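The four exclusions listed above can be read as a single predicate. The sketch below is a hypothetical model: the `op` struct and its boolean fields are invented for illustration, and the order in which the rules are checked is an assumption, since the commit message does not say how they interact.

```go
package main

import "fmt"

// op is a hypothetical, flattened description of one operation. The real
// compiler derives these properties from SSA values and function pragmas.
type op struct {
	isCall            bool
	inRuntime         bool // compiled as part of the runtime
	inNosplit         bool // inside a go:nosplit function
	inWriteBarrierSeq bool // between the write-barrier flag load and the write
	inUintptrConv     bool // leads up to a uintptr -> unsafe.Pointer conversion
}

// isSafePoint reports whether o may carry a stack map under the rules above.
func isSafePoint(o op) bool {
	// Rules 3 and 4: these sequences are never safe points in this model.
	if o.inWriteBarrierSeq || o.inUintptrConv {
		return false
	}
	// Rules 1 and 2: runtime and nosplit code is safe only at calls.
	if o.inRuntime || o.inNosplit {
		return o.isCall
	}
	// Everything else is a safe point.
	return true
}

func main() {
	fmt.Println(isSafePoint(op{}))                              // ordinary instruction
	fmt.Println(isSafePoint(op{inRuntime: true}))               // runtime, not a call
	fmt.Println(isSafePoint(op{inRuntime: true, isCall: true})) // runtime call
}
```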
      
      We'll optimize this in later CLs:
      
      name        old time/op       new time/op       delta
      Template          185ms ± 2%        190ms ± 2%   +2.95%  (p=0.000 n=10+10)
      Unicode          96.3ms ± 3%       96.4ms ± 1%     ~     (p=0.905 n=10+9)
      GoTypes           658ms ± 0%        669ms ± 1%   +1.72%  (p=0.000 n=10+9)
      Compiler          3.14s ± 1%        3.18s ± 1%   +1.56%  (p=0.000 n=9+10)
      SSA               7.41s ± 2%        7.59s ± 1%   +2.48%  (p=0.000 n=9+10)
      Flate             126ms ± 1%        128ms ± 1%   +2.08%  (p=0.000 n=10+10)
      GoParser          153ms ± 1%        157ms ± 2%   +2.38%  (p=0.000 n=10+10)
      Reflect           437ms ± 1%        442ms ± 1%   +0.98%  (p=0.001 n=10+10)
      Tar               178ms ± 1%        179ms ± 1%   +0.67%  (p=0.035 n=10+9)
      XML               223ms ± 1%        229ms ± 1%   +2.58%  (p=0.000 n=10+10)
      [Geo mean]        394ms             401ms        +1.75%
      
      No effect on binary size because we're not yet emitting these extra
      safe points.
      
      For #24543.
      
      Change-Id: I16a1eebb9183cad7cef9d53c0fd21a973cad6859
      Reviewed-on: https://go-review.googlesource.com/109348
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: introduce LivenessMap and LivenessIndex · d7d9df8a
      Austin Clements authored
      Currently liveness only produces a stack map index at each safe point,
      so the information is summarized in a map[*ssa.Value]int. We're about
      to have both a stack map index and a register map index, so replace
      the int with a LivenessIndex type we can extend, and replace the map
      with a LivenessMap that we can also change more easily in the future.
      
      This also gives us an easy hook for defining the value that means "not
      a safe point".
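A minimal sketch of the shape this describes follows. The type names `LivenessMap` and `LivenessIndex` come from the commit message; the field names, the `Get` and `Valid` methods, the sentinel value, and the `value` stand-in for `*ssa.Value` are assumptions made for illustration.

```go
package main

import "fmt"

// value stands in for *ssa.Value in this sketch.
type value struct{ id int }

// LivenessIndex holds the map indexes for one safe point. Wrapping the int in
// a struct leaves room to add a register map index later.
type LivenessIndex struct {
	stackMapIndex int
}

// LivenessInvalid is an assumed sentinel meaning "not a safe point".
var LivenessInvalid = LivenessIndex{stackMapIndex: -2}

// Valid reports whether idx refers to a real safe point.
func (idx LivenessIndex) Valid() bool { return idx != LivenessInvalid }

// LivenessMap hides the underlying map so its representation can change.
type LivenessMap struct {
	m map[*value]LivenessIndex
}

// Get returns the index for v, or LivenessInvalid if v is not a safe point.
func (lm LivenessMap) Get(v *value) LivenessIndex {
	if idx, ok := lm.m[v]; ok {
		return idx
	}
	return LivenessInvalid
}

func main() {
	call, add := &value{1}, &value{2}
	lm := LivenessMap{m: map[*value]LivenessIndex{call: {stackMapIndex: 3}}}
	fmt.Println(lm.Get(call).Valid(), lm.Get(add).Valid())
}
```

Callers only go through `Get`, so swapping the map for a slice or adding a second index would not touch them.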
      
      Passes toolstash -cmp.
      
      For #24543.
      
      Change-Id: Ic4c069839635efed4fd0f603899b80f8be3b56ec
      Reviewed-on: https://go-review.googlesource.com/109347
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/internal/obj: consolidate emitting entry stack map · 02495da6
      Austin Clements authored
      The obj package needs to emit the PCDATA to select the entry stack map
      before calling morestack. Currently this is copied for every
      architecture. Since we're about to change how this works, consolidate
      all of these copies into a single helper function.
      
      For #24543.
      
      Change-Id: Ia92d94de78f8e23fd06dba747c43e03e5989f67b
      Reviewed-on: https://go-review.googlesource.com/109346
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: don't produce a past-the-end pointer in range loops · 837ed98d
      Austin Clements authored
      Currently, range loops over slices and arrays are compiled roughly
      like:
      
      for i, x := range s { b }
        ⇓
      for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b }
        ⇓
      i, _n, _p := 0, len(s), &s[0]
      goto cond
      body:
      { b }
      i, _p = i+1, _p + unsafe.Sizeof(s[0])
      cond:
      if i < _n { goto body } else { goto end }
      end:
      
      The problem with this lowering is that _p may temporarily point past
      the end of the allocation the moment before the loop terminates. Right
      now this isn't a problem because there's never a safe-point during
      this brief moment.
      
      We're about to introduce safe-points everywhere, so this bad pointer
      is going to be a problem. We could mark the increment as an unsafe
      block, but this inhibits reordering opportunities and could result in
      infrequent safe-points if the body is short.
      
      Instead, this CL fixes this by changing how we compile range loops to
      never produce this past-the-end pointer. It changes the lowering to
      roughly:
      
      i, _n, _p := 0, len(s), &s[0]
      if i < _n { goto body } else { goto end }
      top:
      _p += unsafe.Sizeof(s[0])
      body:
      { b }
      i++
      if i < _n { goto top } else { goto end }
      end:
      
      Notably, the increment is split into two parts: we increment the index
      before checking the condition, but increment the pointer only *after*
      the condition check has succeeded.
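The difference between the two lowerings can be checked with a small simulation. This is a sketch, not compiler output: it uses an integer offset `p` in place of the element pointer `_p` so that it stays within safe Go, and the function names are hypothetical. Each function also records the largest value `p` ever takes.

```go
package main

import "fmt"

// oldLowering mimics the original shape: index and "pointer" are incremented
// together before the condition is checked.
func oldLowering(s []int) (sum, maxP int) {
	i, n, p := 0, len(s), 0
	for i < n {
		sum += s[p]
		i, p = i+1, p+1 // p may now equal len(s): one past the end
		if p > maxP {
			maxP = p
		}
	}
	return sum, maxP
}

// newLowering mimics the revised shape: the index is incremented and checked
// first, and the "pointer" advances only after the check succeeds.
func newLowering(s []int) (sum, maxP int) {
	i, n, p := 0, len(s), 0
	if i >= n {
		return sum, maxP
	}
	for {
		sum += s[p]
		i++
		if i >= n {
			break
		}
		p++ // late increment: only reached when another element exists
		if p > maxP {
			maxP = p
		}
	}
	return sum, maxP
}

func main() {
	s := []int{1, 2, 3}
	fmt.Println(oldLowering(s))
	fmt.Println(newLowering(s))
}
```

Both functions compute the same sum, but in the old shape `p` reaches `len(s)` on the final iteration, while in the new shape it never exceeds `len(s)-1`.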
      
      The implementation builds on the OFORUNTIL construct that was
      introduced during the loop preemption experiments, since OFORUNTIL
      places the increment and condition after the loop body. To support the
      extra "late increment" step, we further define OFORUNTIL's "List"
      field to contain the late increment statements. This makes all of this
      a relatively small change.
      
      This depends on the improvements to the prove pass in CL 102603. With
      the current lowering, bounds-check elimination knows that i < _n in
      the body because the body block is dominated by the cond block. In the
      new lowering, deriving this fact requires detecting that i < _n on
      *both* paths into body and hence is true in body. CL 102603 made prove
      able to detect this.
      
      The code size effect of this is minimal. The cmd/go binary on
      linux/amd64 increases by 0.17%. Performance-wise, this actually
      appears to be a net win, though it's mostly noise:
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.80s ± 0%     2.61s ± 1%  -6.88%  (p=0.000 n=20+18)
      Fannkuch11-12                2.41s ± 0%     2.42s ± 0%  +0.05%  (p=0.005 n=20+20)
      FmtFprintfEmpty-12          41.6ns ± 5%    41.4ns ± 6%    ~     (p=0.765 n=20+19)
      FmtFprintfString-12         69.4ns ± 3%    69.3ns ± 1%    ~     (p=0.084 n=19+17)
      FmtFprintfInt-12            76.1ns ± 1%    77.3ns ± 1%  +1.57%  (p=0.000 n=19+19)
      FmtFprintfIntInt-12          122ns ± 2%     123ns ± 3%  +0.95%  (p=0.015 n=20+20)
      FmtFprintfPrefixedInt-12     153ns ± 2%     151ns ± 3%  -1.27%  (p=0.013 n=20+20)
      FmtFprintfFloat-12           215ns ± 0%     216ns ± 0%  +0.47%  (p=0.000 n=20+16)
      FmtManyArgs-12               486ns ± 1%     498ns ± 0%  +2.40%  (p=0.000 n=20+17)
      GobDecode-12                6.43ms ± 0%    6.50ms ± 0%  +1.08%  (p=0.000 n=18+19)
      GobEncode-12                5.43ms ± 1%    5.47ms ± 0%  +0.76%  (p=0.000 n=20+20)
      Gzip-12                      218ms ± 1%     218ms ± 1%    ~     (p=0.883 n=20+20)
      Gunzip-12                   38.8ms ± 0%    38.9ms ± 0%    ~     (p=0.644 n=19+19)
      HTTPClientServer-12         76.2µs ± 1%    76.4µs ± 2%    ~     (p=0.218 n=20+20)
      JSONEncode-12               12.2ms ± 0%    12.3ms ± 1%  +0.45%  (p=0.000 n=19+19)
      JSONDecode-12               54.2ms ± 1%    53.3ms ± 0%  -1.67%  (p=0.000 n=20+20)
      Mandelbrot200-12            3.71ms ± 0%    3.71ms ± 0%    ~     (p=0.143 n=19+20)
      GoParse-12                  3.22ms ± 0%    3.19ms ± 1%  -0.72%  (p=0.000 n=20+20)
      RegexpMatchEasy0_32-12      76.7ns ± 1%    75.8ns ± 1%  -1.19%  (p=0.000 n=20+17)
      RegexpMatchEasy0_1K-12       245ns ± 1%     243ns ± 0%  -0.72%  (p=0.000 n=18+17)
      RegexpMatchEasy1_32-12      71.9ns ± 0%    71.7ns ± 1%  -0.39%  (p=0.006 n=12+18)
      RegexpMatchEasy1_1K-12       358ns ± 1%     354ns ± 1%  -1.13%  (p=0.000 n=20+19)
      RegexpMatchMedium_32-12      105ns ± 2%     105ns ± 1%  -0.63%  (p=0.007 n=19+20)
      RegexpMatchMedium_1K-12     31.9µs ± 1%    31.9µs ± 1%    ~     (p=1.000 n=17+17)
      RegexpMatchHard_32-12       1.51µs ± 1%    1.52µs ± 2%  +0.46%  (p=0.042 n=18+18)
      RegexpMatchHard_1K-12       45.3µs ± 1%    45.5µs ± 2%  +0.44%  (p=0.029 n=18+19)
      Revcomp-12                   388ms ± 1%     385ms ± 0%  -0.57%  (p=0.000 n=19+18)
      Template-12                 63.0ms ± 1%    63.3ms ± 0%  +0.50%  (p=0.000 n=19+20)
      TimeParse-12                 309ns ± 1%     307ns ± 0%  -0.62%  (p=0.000 n=20+20)
      TimeFormat-12                328ns ± 0%     333ns ± 0%  +1.35%  (p=0.000 n=19+19)
      [Geo mean]                  47.0µs         46.9µs       -0.20%
      
      (https://perf.golang.org/search?q=upload:20180326.1)
      
      For #10958.
      For #24543.
      
      Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a
      Reviewed-on: https://go-review.googlesource.com/102604
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: detect OFORUNTIL inductive facts in prove · b812eec9
      Austin Clements authored
      Currently, we compile range loops into for loops with the obvious
      initialization and update of the index variable. In this form, the
      prove pass can see that the body is dominated by an i < len condition,
      and findIndVar can detect that i is an induction variable and that
      0 <= i < len.
      
      GOEXPERIMENT=preemptibleloops compiles range loops to OFORUNTIL and
      we're preparing to unconditionally switch to a variation of this for
      #24543. OFORUNTIL moves the increment and condition *after* the body,
      which makes the bounds on the index variable much less obvious. With
      OFORUNTIL, proving anything about the index variable requires
      understanding the phi that joins the index values at the top of the
      loop body block.
      
      This interferes with both prove's ability to see that i < len (this is
      true on both paths that enter the body, but from two different
      conditional checks) and with findIndVar's ability to detect the
      induction pattern.
      
      Fix this by teaching prove to detect that the index in the pattern
      constructed by OFORUNTIL is an induction variable and add both bounds
      to the facts table. Currently this is done separately from findIndVar
      because it depends on prove's factsTable, while findIndVar runs before
      visiting blocks and building the factsTable.
      
      Without any GOEXPERIMENT, this has no effect on std or cmd. However,
      with GOEXPERIMENT=preemptibleloops, this change becomes necessary to
      prove 90 conditions in std and cmd.
      
      Change-Id: Ic025d669f81b53426309da5a6e8010e5ccaf4f49
      Reviewed-on: https://go-review.googlesource.com/102603
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: derive len/cap relations in factsTable.update · 4816efac
      Austin Clements authored
      Currently, the prove pass derives implicit relations between len and
      cap in the code that adds branch conditions. This is fine right now
      because that's the only place we can encounter len and cap, but we're
      about to add a second way to add assertions to the facts table that
      can also produce facts involving len and cap.
      
      Prepare for this by moving the fact derivation from updateRestrictions
      (where it only applies on branches) to factsTable.update, which can
      derive these facts no matter where the root facts come from.
      
      Passes toolstash -cmp.
      
      Change-Id: If09692d9eb98ffaa93f4cfa58ed2d8ba0887c111
      Reviewed-on: https://go-review.googlesource.com/102602
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: teach prove about relations between constants · 8d8f620f
      Austin Clements authored
      Currently, we never add a relation between two constants to prove's
      fact table because these are eliminated before prove runs, so it
      currently doesn't handle facts like this very well even though they're
      easy to prove.
      
      We're about to start asserting some conditions that don't appear in
      the SSA, but are constructed from existing SSA values that may both be
      constants.
      
      Hence, improve the fact table to understand relations between
      constants by initializing the constant bounds of constant values to
      the value itself, rather than noLimit.
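A simplified model shows why this initialization matters. The name `noLimit` comes from the commit message; the `limit` and `val` types and the `initialLimit` and `provablyLess` helpers are hypothetical and far smaller than prove's actual fact table.

```go
package main

import (
	"fmt"
	"math"
)

// limit is a simplified model of the signed bounds tracked for a value.
type limit struct{ min, max int64 }

// noLimit means nothing is known about the value.
var noLimit = limit{math.MinInt64, math.MaxInt64}

// val is a hypothetical stand-in for an SSA value.
type val struct {
	isConst bool
	c       int64
}

// initialLimit returns the starting bounds for v. A constant is bounded by
// itself; anything else starts with no known bounds.
func initialLimit(v val) limit {
	if v.isConst {
		return limit{v.c, v.c}
	}
	return noLimit
}

// provablyLess reports whether a < b follows from the bounds alone.
func provablyLess(a, b limit) bool { return a.max < b.min }

func main() {
	three, five := val{isConst: true, c: 3}, val{isConst: true, c: 5}
	fmt.Println(provablyLess(initialLimit(three), initialLimit(five)))
	fmt.Println(provablyLess(noLimit, noLimit))
}
```

With both constants starting at `noLimit`, the relation 3 < 5 cannot be derived from bounds; with each constant bounded by its own value, it follows immediately.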
      
      Passes toolstash -cmp.
      
      Change-Id: I71f8dc294e59f19433feab1c10b6d3c99b7f1e26
      Reviewed-on: https://go-review.googlesource.com/102601
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: refactor inlining parameters; inline panic · 87a18c61
      David Chase authored
      Inlining was refactored to perform tuning experiments,
      with the "knobs" now set to also inline functions/methods
      that include panic(), and -l=4 (inline calls) now expressed
      as a change to costs, rather than scattered if-thens.
      
      The -l=4 inline-calls penalty is chosen to be the best
      found during experiments; it makes some programs much
      larger and slower (notably, the compiler itself) and is
      believed to be risky for machine-generated code in general,
      which is why it is not the default.  It is also not
      well-tested with the debugger and DWARF output.
      
      This change includes an explicit go:noinline applied to the
      method that is the largest cause of compiler binary growth
      and slowdown for midstack inlining. There are others; ideally,
      whatever heuristic eventually appears will make
      this unnecessary.
      
      Change-Id: Idf7056ed2f961472cf49d2fd154ee98bef9421e2
      Reviewed-on: https://go-review.googlesource.com/109918
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
  2. 21 May, 2018 11 commits
  3. 20 May, 2018 1 commit
    • runtime: use libc for nanotime on Darwin · cc09212f
      Keith Randall authored
      Use mach_absolute_time and mach_timebase_info to get nanosecond-level
      timing information from libc on Darwin.
      
      The conversion code from Apple's arbitrary time unit to nanoseconds is
      really annoying.  It would be nice if we could replace the internal
      runtime "time" with arbitrary units and put the conversion to nanoseconds
      only in the places that really need it (so it isn't in every nanotime call).
      
      It's especially annoying because numer==denom==1 for all the machines
      I tried.  Makes it hard to test the conversion code :(
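The conversion being discussed scales a tick count by the `numer`/`denom` ratio reported by `mach_timebase_info`. The sketch below shows only that arithmetic; it does not call libc, the `timebase` type and `toNanos` function are hypothetical names, and overflow of the multiplication is deliberately ignored.

```go
package main

import "fmt"

// timebase mirrors the two fields of the mach_timebase_info result:
// nanoseconds = ticks * numer / denom.
type timebase struct{ numer, denom uint32 }

// toNanos converts raw ticks to nanoseconds. It assumes denom is nonzero and
// does not guard against overflow of ticks*numer.
func toNanos(ticks int64, tb timebase) int64 {
	if tb.numer == tb.denom {
		return ticks // the case the commit message reports seeing in practice
	}
	return ticks * int64(tb.numer) / int64(tb.denom)
}

func main() {
	fmt.Println(toNanos(1000, timebase{1, 1}))
	fmt.Println(toNanos(24, timebase{125, 3}))
}
```

Passing a ratio other than 1/1, as in the second call, is one way to exercise the scaling path on a machine where the real timebase is 1/1.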
      
      Update #17490
      
      Change-Id: I6c5d602a802f5c24e35184e33d5e8194aa7afa86
      Reviewed-on: https://go-review.googlesource.com/110655
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
  4. 19 May, 2018 2 commits
    • runtime: fix darwin 386/amd64 stack switches · e86c2678
      Keith Randall authored
      A few libc_ calls were missing stack switches.
      
      Unfortunately, adding the stack switches revealed a deeper problem.
      systemstack() is fundamentally flawed because when you do
      
          systemstack(func() { ... })
      
      There's no way to mark the anonymous function as nosplit.  At first I
      thought it didn't matter, as that function runs on the g0 stack.  But
      nosplit is still required, because some syscalls are done when stack
      bounds are not set up correctly (e.g. in a signal handler, which runs
      on the g0 stack, but g is still pointing at the g stack).  Instead use
      asmcgocall and funcPC, so we can be nosplit all the way down.
      
      Mid-stack inlining now pushes darwin over the nosplit limit also.
      Leaving that as a TODO.
      Update #23168
      
      This might fix the cause of occasional darwin hangs.
      Update #25181
      
      Update #17490
      
      Change-Id: If9c3ef052822c7679f5a1dd192443f714483327e
      Reviewed-on: https://go-review.googlesource.com/111258
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • debug/pe: parse the import directory correctly · e9137299
      Ali Rizvi-Santiago authored
      This parses the import table properly which allows for debug/pe
      to extract import symbols from pecoffs linked with an import
      table in a section named something other than ".idata".
      
      The section names in a pecoff object aren't guaranteed to actually
      mean anything, so hardcoding a search for the ".idata" section
      is not guaranteed to find the import table in all shared libraries.
      This resulted in debug/pe being unable to read import symbols
      from some libraries.
      
      The proper way to locate the import table is to validate the
      number of data directory entries, locate the import entry, and
      then use the va to identify the section containing the import
      table. This patch does exactly this.
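The lookup described above can be sketched over plain structs. This is an illustration rather than the debug/pe code: the `dataDirectory` and `section` types and the `findImportSection` function are hypothetical, though the import entry does sit at index 1 of the PE data directory.

```go
package main

import "fmt"

// dataDirectory and section are simplified stand-ins for the PE structures.
type dataDirectory struct{ virtualAddress, size uint32 }

type section struct {
	name           string
	virtualAddress uint32
	virtualSize    uint32
}

// importDirIndex is the position of the import entry in the data directory.
const importDirIndex = 1

// findImportSection validates the directory count, reads the import entry,
// and returns the section whose address range contains the entry's address.
func findImportSection(numDirs uint32, dirs []dataDirectory, secs []section) (*section, bool) {
	if numDirs <= importDirIndex || len(dirs) <= importDirIndex {
		return nil, false // too few entries to include an import directory
	}
	va := dirs[importDirIndex].virtualAddress
	for i := range secs {
		s := &secs[i]
		// The subtraction form avoids overflow in virtualAddress+virtualSize.
		if va >= s.virtualAddress && va-s.virtualAddress < s.virtualSize {
			return s, true
		}
	}
	return nil, false
}

func main() {
	secs := []section{{".text", 0x1000, 0x1000}, {".rdata", 0x2000, 0x800}}
	dirs := make([]dataDirectory, 16)
	dirs[importDirIndex] = dataDirectory{virtualAddress: 0x2100, size: 40}
	if s, ok := findImportSection(16, dirs, secs); ok {
		fmt.Println("import table is in", s.name)
	}
}
```

In the example the import table lives in `.rdata`, a case that a search for a section literally named `.idata` would miss.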
      
      Fixes #16103.
      
      Change-Id: I3ab6de7f896a0c56bb86c3863e504e8dd4c8faf3
      GitHub-Last-Rev: ce8077cb154f18ada7a86e152ab03de813937816
      GitHub-Pull-Request: golang/go#25193
      Reviewed-on: https://go-review.googlesource.com/110555
      Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
  5. 18 May, 2018 5 commits
  6. 17 May, 2018 7 commits
  7. 16 May, 2018 3 commits