1. 20 Mar, 2018 9 commits
    • cmd/compile/internal/ssa: update regalloc in loops · 983dcf70
      Ilya Tocar authored
      Currently we don't lift a spill out of a loop if the loop contains a call.
      However, we often have code like this:
      
      for .. {
          if hard_case {
              call()
          }
          // simple case, without call
      }
      
      So instead of checking for any call, check for an unavoidable call.
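
A minimal Go sketch of the loop shape described above. The helper `slowPath` and the function `sumAbs` are hypothetical names chosen for illustration. Only the rare branch contains a call, so the common iteration is call-free. Whether the register allocator actually keeps `total` in a register across the loop depends on the compiler version and target.

```go
package main

import "fmt"

// slowPath is a hypothetical helper standing in for the rarely taken call.
// The noinline directive keeps it a real call in the compiled loop.
//
//go:noinline
func slowPath(x int) int {
	return -x
}

// sumAbs adds the absolute values of xs. Only negative inputs take the
// "hard case" branch that contains a call; the common path has no call.
func sumAbs(xs []int) int {
	total := 0
	for _, x := range xs {
		if x < 0 { // hard case: the call is avoidable
			x = slowPath(x)
		}
		total += x // simple case, without call
	}
	return total
}

func main() {
	fmt.Println(sumAbs([]int{1, 2, -3, 4}))
}
```
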
      For #22698 cases I see:
      mime/quotedprintable/Writer-6                   10.9µs ± 4%      9.2µs ± 3%   -15.02%  (p=0.000 n=8+8)
      And:
      compress/flate/Encode/Twain/Huffman/1e4-6       99.4µs ± 6%     90.9µs ± 0%    -8.57%  (p=0.000 n=8+8)
      compress/flate/Encode/Twain/Huffman/1e5-6       760µs ± 1%      725µs ± 1%     -4.56%  (p=0.000 n=8+8)
      compress/flate/Encode/Twain/Huffman/1e6-6       7.55ms ± 0%      7.24ms ± 0%     -4.07%  (p=0.000 n=8+7)
      
      There are no significant changes on go1 benchmarks.
      But for cases with runtime arch checks, where we call the generic version on old hardware,
      there are respectable performance gains:
      math/RoundToEven-6                             1.43ns ± 0%     1.25ns ± 0%   -12.59%  (p=0.001 n=7+7)
      math/bits/OnesCount64-6                        1.60ns ± 1%     1.42ns ± 1%   -11.32%  (p=0.000 n=8+8)
      
      Also, in some runtime benchmarks, loops have fewer loads and higher performance:
      runtime/RuneIterate/range1/ASCII-6             15.6ns ± 1%     13.9ns ± 1%   -10.74%  (p=0.000 n=7+8)
      runtime/ArrayEqual-6                           3.22ns ± 0%     2.86ns ± 2%   -11.06%  (p=0.000 n=7+8)
      
      Fixes #22698
      Updates #22234
      
      Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a
      Reviewed-on: https://go-review.googlesource.com/84055
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      983dcf70
    • test/codegen: port comparisons tests to codegen · be371edd
      Alberto Donizetti authored
      And delete them from asm_test.
      
      Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab
      Reviewed-on: https://go-review.googlesource.com/101595
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
      be371edd
    • cmd/compile: fix regression in DWARF inlined routine variable tracking · f45c07e8
      Than McIntosh authored
      Fix a bug in the code that generates the pre-inlined variable
      declaration table used as raw material for emitting DWARF inline
      routine records. The fix for issue 23704 altered the recipe for
      assigning file/line/col to variables in one part of the compiler, but
      didn't update a similar recipe in the code for variable tracking.
      Added a new test that should catch problems of a similar nature.
      
      Fixes #24460.
      
      Change-Id: I255c036637f4151aa579c0e21d123fd413724d61
      Reviewed-on: https://go-review.googlesource.com/101676
      Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      f45c07e8
    • cmd/compile: mark LAA and LAAG as clobbering flags on s390x · ae10914e
      Michael Munday authored
      The atomic add instructions modify the condition code and so need to
      be marked as clobbering flags.
      
      Fixes #24449.
      
      Change-Id: Ic69c8d775fbdbfb2a56c5e0cfca7a49c0d7f6897
      Reviewed-on: https://go-review.googlesource.com/101455
      Run-TryBot: Michael Munday <mike.munday@ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      ae10914e
    • cmd/asm: fix bug about VMOV instruction (move a vector element to another) on ARM64 · 9c312245
      Fangming.Fang authored
      This change fixes an index error when encoding VMOV instructions of the form
      VMOV Vn.<T>[index], Vd.<T>[index].
      
      Change-Id: I949166e6dfd63fb0a9365f183b6c50d452614f9d
      Reviewed-on: https://go-review.googlesource.com/101335
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      9c312245
    • cmd/asm: fix bug about VMOV instruction (move register to vector element) on ARM64 · 7673e305
      Fangming.Fang authored
      This change fixes an index error when encoding VMOV instructions of the form
      VMOV Rn, V.<T>[index]. For example, VMOV R1, V1.S[1] was assembled as VMOV R1, V1.S[0].
      
      Fixes #24400
      Change-Id: I82b5edc8af4e06862bc4692b119697c6bb7dc3fb
      Reviewed-on: https://go-review.googlesource.com/101297
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      7673e305
    • cmd/compile: avoid mapaccess at m[k]=append(m[k].. · c12b185a
      Vladimir Kuzmin authored
      Currently rvalue m[k] is transformed during walk into:
      
              tmp1 := *mapaccess(m, k)
              tmp2 := append(tmp1, ...)
              *mapassign(m, k) = tmp2
      
      However, this is suboptimal, as we could instead produce just:
              tmp := mapassign(m, k)
              *tmp = append(*tmp, ...)
      
      The optimization is possible only if, during order, the compiler can tell that
      m[k] is exactly the same on the left and right sides of the assignment. It does
      not apply to:
      1) m[f(k)] = append(m[f(k)], ...)
      2) sink, m[k] = sink, append(m[k]...)
      3) m[k] = append(..., m[k],...)
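
A short Go sketch of the source pattern this commit targets. The function name `appendAll` is hypothetical. The program behaves the same with or without the optimization; only the number of map operations generated for the marked statement differs.

```go
package main

import "fmt"

// appendAll groups values by key using the m[k] = append(m[k], v) form,
// where the map expression is identical on both sides of the assignment.
func appendAll(keys []string, vals []int) map[string][]int {
	m := make(map[string][]int)
	for i, k := range keys {
		// Same m and same k on both sides: this is the shape the commit
		// message describes as eligible for a single mapassign.
		m[k] = append(m[k], vals[i])
	}
	return m
}

func main() {
	m := appendAll([]string{"a", "b", "a"}, []int{1, 2, 3})
	fmt.Println(m["a"], m["b"])
}
```
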
      
      Benchmark:
      name                           old time/op    new time/op    delta
      MapAppendAssign/Int32/256-8      33.5ns ± 3%    22.4ns ±10%  -33.24%  (p=0.000 n=16+18)
      MapAppendAssign/Int32/65536-8    68.2ns ± 6%    48.5ns ±29%  -28.90%  (p=0.000 n=20+20)
      MapAppendAssign/Int64/256-8      34.3ns ± 4%    23.3ns ± 5%  -32.23%  (p=0.000 n=17+18)
      MapAppendAssign/Int64/65536-8    65.9ns ± 7%    61.2ns ±19%   -7.06%  (p=0.002 n=18+20)
      MapAppendAssign/Str/256-8         116ns ±12%      79ns ±16%  -31.70%  (p=0.000 n=20+19)
      MapAppendAssign/Str/65536-8       134ns ±15%     111ns ±45%  -16.95%  (p=0.000 n=19+20)
      
      name                           old alloc/op   new alloc/op   delta
      MapAppendAssign/Int32/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=19+18)
      MapAppendAssign/Int32/65536-8     27.0B ± 0%     20.7B ±30%  -23.33%  (p=0.000 n=20+20)
      MapAppendAssign/Int64/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=20+17)
      MapAppendAssign/Int64/65536-8     27.0B ± 0%     27.0B ± 0%     ~     (all equal)
      MapAppendAssign/Str/256-8         94.0B ± 0%     78.0B ± 0%  -17.02%  (p=0.000 n=20+16)
      MapAppendAssign/Str/65536-8       54.0B ± 0%     54.0B ± 0%     ~     (all equal)
      
      Fixes #24364
      Updates #5147
      
      Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409
      Reviewed-on: https://go-review.googlesource.com/100838
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      c12b185a
    • Revert "bytes: add optimized Compare for arm64" · e22d2413
      Cherry Zhang authored
      This reverts commit bfa8b6f8.
      
      Reason for revert: This depends on another CL which is not yet submitted.
      
      Change-Id: I50e7594f1473c911a2079fe910849a6694ac6c07
      Reviewed-on: https://go-review.googlesource.com/101496
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      e22d2413
    • bytes: add optimized Compare for arm64 · bfa8b6f8
      fanzha02 authored
      Use LDP instructions to load 16 bytes per loop iteration when the source is long.
      Handle the 8-byte, 4-byte, and 2-byte lengths specially for better performance.
      
      Benchmark result:
      name                           old time/op   new time/op    delta
      BytesCompare/1-8                21.0ns ± 0%    10.5ns ± 0%      ~     (p=0.079 n=4+5)
      BytesCompare/2-8                11.5ns ± 0%    10.5ns ± 0%    -8.70%  (p=0.008 n=5+5)
      BytesCompare/4-8                13.5ns ± 0%    10.0ns ± 0%   -25.93%  (p=0.008 n=5+5)
      BytesCompare/8-8                28.8ns ± 0%     9.5ns ± 0%      ~     (p=0.079 n=4+5)
      BytesCompare/16-8               40.5ns ± 0%    10.5ns ± 0%   -74.07%  (p=0.008 n=5+5)
      BytesCompare/32-8               64.6ns ± 0%    12.5ns ± 0%   -80.65%  (p=0.008 n=5+5)
      BytesCompare/64-8                112ns ± 0%      16ns ± 0%   -85.27%  (p=0.008 n=5+5)
      BytesCompare/128-8               208ns ± 0%      24ns ± 0%   -88.22%  (p=0.008 n=5+5)
      BytesCompare/256-8               400ns ± 0%      50ns ± 0%   -87.62%  (p=0.008 n=5+5)
      BytesCompare/512-8               785ns ± 0%      82ns ± 0%   -89.61%  (p=0.008 n=5+5)
      BytesCompare/1024-8             1.55µs ± 0%    0.14µs ± 0%      ~     (p=0.079 n=4+5)
      BytesCompare/2048-8             3.09µs ± 0%    0.27µs ± 0%      ~     (p=0.079 n=4+5)
      CompareBytesEqual-8             39.0ns ± 0%    12.0ns ± 0%   -69.23%  (p=0.008 n=5+5)
      CompareBytesToNil-8             8.57ns ± 5%    8.23ns ± 2%    -3.99%  (p=0.016 n=5+5)
      CompareBytesEmpty-8             7.37ns ± 0%    7.36ns ± 4%      ~     (p=0.690 n=5+5)
      CompareBytesIdentical-8         7.39ns ± 0%    7.46ns ± 2%      ~     (p=0.667 n=5+5)
      CompareBytesSameLength-8        17.0ns ± 0%    10.5ns ± 0%   -38.24%  (p=0.008 n=5+5)
      CompareBytesDifferentLength-8   17.0ns ± 0%    10.5ns ± 0%   -38.24%  (p=0.008 n=5+5)
      CompareBytesBigUnaligned-8      1.58ms ± 0%    0.19ms ± 0%   -88.31%  (p=0.016 n=4+5)
      CompareBytesBig-8               1.59ms ± 0%    0.19ms ± 0%   -88.27%  (p=0.016 n=5+4)
      CompareBytesBigIdentical-8      7.01ns ± 0%    6.60ns ± 3%    -5.91%  (p=0.008 n=5+5)
      
      name                           old speed     new speed      delta
      CompareBytesBigUnaligned-8     662MB/s ± 0%  5660MB/s ± 0%  +755.15%  (p=0.016 n=4+5)
      CompareBytesBig-8              661MB/s ± 0%  5636MB/s ± 0%  +752.57%  (p=0.016 n=5+4)
      CompareBytesBigIdentical-8     150TB/s ± 0%   159TB/s ± 3%    +6.27%  (p=0.008 n=5+5)
      
      Change-Id: I70332de06f873df3bc12c4a5af1028307b670046
      Reviewed-on: https://go-review.googlesource.com/90175
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      bfa8b6f8
  2. 19 Mar, 2018 11 commits
  3. 18 Mar, 2018 2 commits
  4. 17 Mar, 2018 1 commit
  5. 16 Mar, 2018 4 commits
    • cmd/go: remove some unused parameters · 2767c4e2
      Daniel Martí authored
      Change-Id: I441b3045e76afc1c561914926c14efc8a116c8a7
      Reviewed-on: https://go-review.googlesource.com/101195
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      2767c4e2
    • cmd/compile: enable scopes unconditionally · b30bf958
      David Chase authored
      This revives Alessandro Arzilli's CL to enable scopes
      whenever any DWARF is emitted (with optimization or not),
      and adds a test that detects this change and shows that it
      creates more truthful debugging output.
      
      Reverted change to ssa/debug_test tests made when
      scopes were disabled during dwarflocationlist development.
      
      Also included are updates to the Delve test output (it
      had fallen out of sync; creating test output for one
      updates it for all) and minor naming changes in
      ssa/debug_test.
      
      Compile-time/space changes (relative to tip including dwarflocationlists):
      
      benchstat -geomean after.log scopes.log
      name        old time/op     new time/op     delta
      Template        182ms ± 1%      182ms ± 1%    ~     (p=0.666 n=9+9)
      Unicode        82.8ms ± 1%     86.6ms ±14%    ~     (p=0.211 n=9+10)
      GoTypes         611ms ± 1%      616ms ± 2%  +0.97%  (p=0.001 n=10+9)
      Compiler        2.95s ± 1%      2.95s ± 0%    ~     (p=0.573 n=10+8)
      SSA             6.70s ± 1%      6.81s ± 1%  +1.68%  (p=0.000 n=9+10)
      Flate           117ms ± 1%      118ms ± 1%  +0.60%  (p=0.036 n=9+8)
      GoParser        145ms ± 1%      145ms ± 1%    ~     (p=1.000 n=9+9)
      Reflect         398ms ± 1%      396ms ± 1%    ~     (p=0.053 n=9+10)
      Tar             171ms ± 1%      171ms ± 1%    ~     (p=0.356 n=9+10)
      XML             214ms ± 1%      214ms ± 1%    ~     (p=0.605 n=9+9)
      StdCmd          12.4s ± 2%      12.4s ± 1%    ~     (p=1.000 n=9+9)
      [Geo mean]      506ms           509ms       +0.71%
      
      name        old user-ns/op  new user-ns/op  delta
      Template         254M ± 4%       249M ± 6%    ~     (p=0.155 n=10+10)
      Unicode          121M ±11%       124M ± 6%    ~     (p=0.516 n=10+10)
      GoTypes          824M ± 2%       869M ± 5%  +5.49%  (p=0.001 n=8+10)
      Compiler        4.01G ± 2%      4.02G ± 1%    ~     (p=0.561 n=9+9)
      SSA             10.0G ± 2%      10.2G ± 2%  +2.29%  (p=0.000 n=9+10)
      Flate            154M ± 7%       154M ± 7%    ~     (p=0.960 n=10+9)
      GoParser         190M ± 7%       196M ± 6%    ~     (p=0.064 n=9+10)
      Reflect          528M ± 2%       517M ± 3%  -1.97%  (p=0.025 n=10+10)
      Tar              227M ± 5%       232M ± 3%    ~     (p=0.061 n=9+10)
      XML              286M ± 4%       283M ± 4%    ~     (p=0.343 n=9+9)
      [Geo mean]       502M            508M       +1.09%
      
      name        old text-bytes  new text-bytes  delta
      HelloSize        672k ± 0%       672k ± 0%  +0.01%  (p=0.000 n=10+10)
      CmdGoSize       7.21M ± 0%      7.21M ± 0%  -0.00%  (p=0.000 n=10+10)
      [Geo mean]      2.20M           2.20M       +0.00%
      
      name        old data-bytes  new data-bytes  delta
      HelloSize       9.88k ± 0%      9.88k ± 0%    ~     (all equal)
      CmdGoSize        248k ± 0%       248k ± 0%    ~     (all equal)
      [Geo mean]      49.5k           49.5k       +0.00%
      
      name        old bss-bytes   new bss-bytes   delta
      HelloSize        125k ± 0%       125k ± 0%    ~     (all equal)
      CmdGoSize        144k ± 0%       144k ± 0%  -0.04%  (p=0.000 n=10+10)
      [Geo mean]       135k            135k       -0.02%
      
      name        old exe-bytes   new exe-bytes   delta
      HelloSize       1.30M ± 0%      1.34M ± 0%  +3.15%  (p=0.000 n=10+10)
      CmdGoSize       13.5M ± 0%      13.9M ± 0%  +2.70%  (p=0.000 n=10+10)
      [Geo mean]      4.19M           4.31M       +2.92%
      
      Change-Id: Id53b8d57bd00440142ccbd39b95710e14e083fb5
      Reviewed-on: https://go-review.googlesource.com/101217
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      b30bf958
    • net: don't let cancelation of a DNS lookup affect another lookup · bd859439
      Ian Lance Taylor authored
      Updates #8602
      Updates #20703
      Fixes #22724
      
      Change-Id: I27b72311b2c66148c59977361bd3f5101e47b51d
      Reviewed-on: https://go-review.googlesource.com/100840
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      bd859439
    • net: make Resolver.PreferGo work more as documented · 0b20aece
      Brad Fitzpatrick authored
      Fixes #24393
      
      Change-Id: I8bcee34cdf30472663d866ed6056301d8445215c
      Reviewed-on: https://go-review.googlesource.com/100875
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      0b20aece
  6. 15 Mar, 2018 13 commits
    • reflect: sort exported methods first · 86a33896
      Matthew Dempsky authored
      By moving exported methods to the front of method lists, filtering
      down to only the exported methods just needs a count of how many
      exported methods exist, which the compiler can statically
      provide. This allows getting rid of the exported method cache.
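
A minimal sketch of the idea, not the reflect package's actual data layout. The type `methodList`, its fields, and `newMethodList` are hypothetical. With exported names sorted to the front and a stored count, filtering to exported methods becomes a slice expression with no extra cache.

```go
package main

import (
	"fmt"
	"go/token"
	"sort"
)

// methodList is a hypothetical stand-in for a type's method table.
// Exported names come first; xcount records how many there are.
type methodList struct {
	names  []string
	xcount int
}

// newMethodList sorts exported names ahead of unexported ones (each group
// in name order) and counts the exported ones once, up front.
func newMethodList(names []string) methodList {
	sorted := append([]string(nil), names...)
	sort.Slice(sorted, func(i, j int) bool {
		ei, ej := token.IsExported(sorted[i]), token.IsExported(sorted[j])
		if ei != ej {
			return ei // exported sorts before unexported
		}
		return sorted[i] < sorted[j]
	})
	xcount := 0
	for _, n := range sorted {
		if token.IsExported(n) {
			xcount++
		}
	}
	return methodList{names: sorted, xcount: xcount}
}

// exported returns only the exported methods without scanning or caching.
func (l methodList) exported() []string {
	return l.names[:l.xcount]
}

func main() {
	l := newMethodList([]string{"write", "Close", "read", "Read"})
	fmt.Println(l.exported())
}
```
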
      
      For #22075.
      
      Change-Id: I8eeb274563a2940e1347c34d673f843ae2569064
      Reviewed-on: https://go-review.googlesource.com/100846
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      86a33896
    • cmd/compile: sort method sets earlier · 91bbe538
      Matthew Dempsky authored
      By sorting method sets earlier, we can change the interface
      satisfaction problem from taking O(NM) time to O(N+M). This is the
      same algorithm already used by runtime and reflect for dynamic
      interface satisfaction testing.
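
The O(N+M) check can be sketched as a single merge-style pass over two sorted lists. This simplified illustration compares method names only, whereas a real check also compares signatures. The function name `implements` is hypothetical.

```go
package main

import "fmt"

// implements reports whether every name in ifaceMethods also appears in
// typeMethods. Both slices must be sorted in ascending order, which lets
// one index advance through typeMethods without ever backing up.
func implements(typeMethods, ifaceMethods []string) bool {
	i := 0
	for _, want := range ifaceMethods {
		// Skip type methods that sort before the one we need.
		for i < len(typeMethods) && typeMethods[i] < want {
			i++
		}
		if i == len(typeMethods) || typeMethods[i] != want {
			return false
		}
		i++
	}
	return true
}

func main() {
	file := []string{"Close", "Read", "Seek", "Write"}
	fmt.Println(implements(file, []string{"Close", "Read"}))
	fmt.Println(implements(file, []string{"Flush", "Write"}))
}
```
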
      
      For #22075.
      
      Change-Id: I3d889f0227f37704535739bbde11f5107b4eea17
      Reviewed-on: https://go-review.googlesource.com/100845
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Robert Griesemer <gri@golang.org>
      91bbe538
    • crypto/x509: clarify accepted keys for MarshalPKCS8PrivateKey · dfaed7ff
      Adam Shannon authored
      Fixes #24413.
      
      Change-Id: I265088c9ddc624cb3b3132087cc3d4baf95d2777
      Reviewed-on: https://go-review.googlesource.com/100839
      Reviewed-by: Filippo Valsorda <filippo@golang.org>
      dfaed7ff
    • cmd/compile: turn on DWARF locations lists for ssa vars · 1c24ffbf
      David Chase authored
      This changes the default setting for -dwarflocationlists
      from false to true, removes the flag from ssa/debug_test.go,
      and updates runtime/runtime-gdb_test.go to match a change
      in debugging output for composite variables.
      
      Current benchmarks (perflock, -count 10)
      
      benchstat -geomean before.log after.log
      name        old time/op     new time/op     delta
      Template        175ms ± 0%      182ms ± 1%   +3.68%  (p=0.000 n=8+9)
      Unicode        82.0ms ± 2%     82.8ms ± 1%   +0.96%  (p=0.019 n=9+9)
      GoTypes         590ms ± 1%      611ms ± 1%   +3.42%  (p=0.000 n=9+10)
      Compiler        2.85s ± 0%      2.95s ± 1%   +3.60%  (p=0.000 n=9+10)
      SSA             6.42s ± 1%      6.70s ± 1%   +4.31%  (p=0.000 n=10+9)
      Flate           113ms ± 2%      117ms ± 1%   +3.11%  (p=0.000 n=10+9)
      GoParser        140ms ± 1%      145ms ± 1%   +3.47%  (p=0.000 n=10+9)
      Reflect         384ms ± 0%      398ms ± 1%   +3.56%  (p=0.000 n=8+9)
      Tar             165ms ± 1%      171ms ± 1%   +3.33%  (p=0.000 n=9+9)
      XML             207ms ± 2%      214ms ± 1%   +3.41%  (p=0.000 n=9+9)
      StdCmd          11.8s ± 2%      12.4s ± 2%   +4.41%  (p=0.000 n=10+9)
      [Geo mean]      489ms           506ms        +3.38%
      
      name        old user-ns/op  new user-ns/op  delta
      Template         247M ± 4%       254M ± 4%   +2.76%  (p=0.040 n=10+10)
      Unicode          118M ±16%       121M ±11%     ~     (p=0.364 n=10+10)
      GoTypes          805M ± 2%       824M ± 2%   +2.37%  (p=0.003 n=9+8)
      Compiler        3.92G ± 2%      4.01G ± 2%   +2.20%  (p=0.001 n=9+9)
      SSA             9.63G ± 4%     10.00G ± 2%   +3.81%  (p=0.000 n=10+9)
      Flate            155M ±10%       154M ± 7%     ~     (p=0.718 n=9+10)
      GoParser         184M ±11%       190M ± 7%     ~     (p=0.220 n=10+9)
      Reflect          506M ± 4%       528M ± 2%   +4.27%  (p=0.000 n=10+10)
      Tar              224M ± 4%       227M ± 5%     ~     (p=0.207 n=10+9)
      XML              272M ± 7%       286M ± 4%   +5.23%  (p=0.010 n=10+9)
      [Geo mean]       489M            502M        +2.76%
      
      name        old text-bytes  new text-bytes  delta
      HelloSize        672k ± 0%       672k ± 0%     ~     (all equal)
      CmdGoSize       7.21M ± 0%      7.21M ± 0%     ~     (all equal)
      [Geo mean]      2.20M           2.20M        +0.00%
      
      name        old data-bytes  new data-bytes  delta
      HelloSize       9.88k ± 0%      9.88k ± 0%     ~     (all equal)
      CmdGoSize        248k ± 0%       248k ± 0%     ~     (all equal)
      [Geo mean]      49.5k           49.5k        +0.00%
      
      name        old bss-bytes   new bss-bytes   delta
      HelloSize        125k ± 0%       125k ± 0%     ~     (all equal)
      CmdGoSize        144k ± 0%       144k ± 0%     ~     (all equal)
      [Geo mean]       135k            135k        +0.00%
      
      name        old exe-bytes   new exe-bytes   delta
      HelloSize       1.10M ± 0%      1.30M ± 0%  +17.82%  (p=0.000 n=10+10)
      CmdGoSize       11.6M ± 0%      13.5M ± 0%  +16.90%  (p=0.000 n=10+10)
      [Geo mean]      3.57M           4.19M       +17.36%
      
      Change-Id: I250055813cadd25cebee8da1f9a7f995a6eae432
      Reviewed-on: https://go-review.googlesource.com/100738
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      1c24ffbf
    • cmd/trace: filter tasks by log text · 1814a059
      Heschi Kreinick authored
      Add a search box to the top of the user task views that only displays
      tasks containing a particular log message.
      
      Change-Id: I92f4aa113f930954e8811416901e37824f0eb884
      Reviewed-on: https://go-review.googlesource.com/100843
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
      1814a059
    • internal/trace: fix GC time computation of short goroutines · 61f92ee5
      Hana Kim authored
      Goroutine analysis reports the sum of all overlapping GC intervals as
      the GCTime of a goroutine. The computation is done by adding the length
      of a completed GC interval to 'active' goroutines when processing the
      corresponding EvGCDone event. This change fixes the two corner cases
      the current implementation ignores:
      
      1) Goroutine that ends during GC. Previously, this goroutine was ignored
      and GC time was undercounted. We handle this case by setting the
      gcStartTime only when GC is active and handling non-zero gcStartTime
      when processing EvGoStop and EvGoStart.
      
      2) Goroutine that starts during GC. Previously, the entire GC interval
      length was added to the goroutine's GCTime, which overcounted GC
      time. We handle this case by computing the length of the overlapping
      period precisely.
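
Both corner cases come down to measuring how much of a GC interval falls within a goroutine's lifetime. A minimal sketch of that arithmetic follows. The `interval` type and `overlap` function are hypothetical and are not taken from internal/trace. Timestamps are plain integers in arbitrary units.

```go
package main

import "fmt"

// interval is a half-open time range [start, end) in arbitrary units.
type interval struct{ start, end int64 }

// overlap returns the length of the intersection of a and b, or 0 when
// the two intervals do not intersect.
func overlap(a, b interval) int64 {
	lo, hi := a.start, a.end
	if b.start > lo {
		lo = b.start
	}
	if b.end < hi {
		hi = b.end
	}
	if hi <= lo {
		return 0
	}
	return hi - lo
}

func main() {
	gc := interval{100, 200}
	// Goroutine that ends during GC: only 100..150 counts.
	fmt.Println(overlap(interval{50, 150}, gc))
	// Goroutine that starts during GC: only 120..200 counts, not all 100 units.
	fmt.Println(overlap(interval{120, 300}, gc))
}
```
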
      
      Change-Id: Ifa8e82672ec341b5ff87837209f4311fa7262b7f
      Reviewed-on: https://go-review.googlesource.com/100842
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      61f92ee5
    • test/codegen: port floats tests to codegen · cceee685
      Alberto Donizetti authored
      And delete them from asm_test.
      
      Change-Id: Ibdaca3496eefc73c731b511ddb9636a1f3dff68c
      Reviewed-on: https://go-review.googlesource.com/100915
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      cceee685
    • go/scanner: report errors for incorrect line directives · 546bab8c
      Robert Griesemer authored
      Based on decision for #24183. This makes the go/scanner behavior
      match cmd/compile behavior. Adjusted a go/printer test that assumed
      silent behavior for invalid line directive, and added more scanner
      tests verifying the correct error position and message for invalid
      line directives.
      
      The filenames in line directives now remain untouched by the scanner;
      there is no cleanup or conversion of relative into absolute paths
      anymore, in sync with what the compiler's scanner/parser are doing.
      Any kind of filename transformation has to be done by a client. This
      makes the scanner code simpler and also more predictable.
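
A small sketch of how a client might observe the behavior described above, using the public go/scanner API. The helper name `scanErrors`, the file name `example.go`, and the sample inputs are assumptions for illustration. The exact error text is not asserted here because it may differ between Go versions.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// scanErrors scans src to EOF and returns every error the scanner reports
// through its error handler, formatted as "position: message".
func scanErrors(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var errs []string
	var s scanner.Scanner
	s.Init(file, []byte(src), func(pos token.Position, msg string) {
		errs = append(errs, fmt.Sprintf("%s: %s", pos, msg))
	}, scanner.ScanComments)

	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			break
		}
	}
	return errs
}

func main() {
	// The text after the last colon is not a number, so this directive is invalid.
	for _, e := range scanErrors("//line foo.go:abc\npackage p\n") {
		fmt.Println(e)
	}
	// A well-formed directive should scan without errors.
	fmt.Println(len(scanErrors("//line foo.go:10\npackage p\n")))
}
```
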
      
      For #24183.
      
      Change-Id: Ia091548e1d3d89dfdf6e7d82dab50bea05742ce3
      Reviewed-on: https://go-review.googlesource.com/100235
      Run-TryBot: Robert Griesemer <gri@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      546bab8c
    • runtime: identify special functions by flag instead of address · 9d421531
      Keith Randall authored
      When there are plugins, there may not be a unique copy of runtime
      functions like goexit, mcall, etc.  So identifying them by entry
      address is problematic.  Instead, keep track of each special function
      using a field in the symbol table.  That way, multiple copies of
      the same runtime function will be treated identically.
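
An illustrative sketch of the difference between the two identification schemes. The type and constant names here (`funcID`, `funcInfo`, `funcIDGoexit`, and so on) and the addresses are hypothetical and do not reproduce the runtime's symbol table layout.

```go
package main

import "fmt"

// funcID is a hypothetical per-function tag stored alongside symbol data.
type funcID uint8

const (
	funcIDNormal funcID = iota // ordinary function
	funcIDGoexit               // a copy of goexit
	funcIDMcall                // a copy of mcall
)

// funcInfo is a hypothetical symbol table entry.
type funcInfo struct {
	name  string
	entry uintptr // entry address; differs between copies of the same function
	id    funcID  // identical for every copy of the same special function
}

// isGoexit identifies goexit by its tag rather than by comparing entry
// addresses, so every copy is recognized.
func isGoexit(f funcInfo) bool {
	return f.id == funcIDGoexit
}

func main() {
	hostCopy := funcInfo{name: "runtime.goexit", entry: 0x1000, id: funcIDGoexit}
	pluginCopy := funcInfo{name: "runtime.goexit", entry: 0x9000, id: funcIDGoexit}

	// An address comparison would treat these as different functions.
	fmt.Println(hostCopy.entry == pluginCopy.entry)
	// The tag comparison treats both copies identically.
	fmt.Println(isGoexit(hostCopy), isGoexit(pluginCopy))
}
```
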
      
      Fixes #24351
      Fixes #23133
      
      Change-Id: Iea3232df8a6af68509769d9ca618f530cc0f84fd
      Reviewed-on: https://go-review.googlesource.com/100739
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      9d421531
    • cmd/compile: cache sparse maps across ssa passes · cd2cb6e3
      Daniel Martí authored
      This is done for sparse sets already, but it was missing for sparse
      maps. Only affects deadstore and regalloc, as they're the only ones that
      use sparse maps.
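
A rough sketch of the caching pattern, assuming a simple free list of cleared maps that later passes can reuse. The types `sparseMap` and `cache` and their methods are written for this illustration and are not the compiler's implementation. Keys must lie in the range [0, n) given at allocation.

```go
package main

import "fmt"

type sparseEntry struct{ key, val int32 }

// sparseMap maps small non-negative integer keys to values. Clearing it
// costs O(1) because only the dense slice is truncated.
type sparseMap struct {
	dense  []sparseEntry
	sparse []int32 // sparse[k] is the index of key k in dense, if present
}

func (s *sparseMap) contains(k int32) bool {
	i := s.sparse[k]
	return i < int32(len(s.dense)) && s.dense[i].key == k
}

func (s *sparseMap) get(k int32) int32 {
	if s.contains(k) {
		return s.dense[s.sparse[k]].val
	}
	return -1
}

func (s *sparseMap) set(k, v int32) {
	if s.contains(k) {
		s.dense[s.sparse[k]].val = v
		return
	}
	s.dense = append(s.dense, sparseEntry{k, v})
	s.sparse[k] = int32(len(s.dense)) - 1
}

func (s *sparseMap) clear() { s.dense = s.dense[:0] }

// cache holds sparse maps released by earlier passes so later passes can
// reuse their backing arrays instead of allocating new ones.
type cache struct{ maps []*sparseMap }

// newSparseMap returns a cached map large enough for keys in [0, n), or
// allocates a fresh one when none fits.
func (c *cache) newSparseMap(n int) *sparseMap {
	for i, m := range c.maps {
		if m != nil && len(m.sparse) >= n {
			c.maps[i] = nil
			return m
		}
	}
	return &sparseMap{sparse: make([]int32, n)}
}

// retSparseMap clears m and returns it to the cache.
func (c *cache) retSparseMap(m *sparseMap) {
	m.clear()
	for i := range c.maps {
		if c.maps[i] == nil {
			c.maps[i] = m
			return
		}
	}
	c.maps = append(c.maps, m)
}

func main() {
	var c cache
	m := c.newSparseMap(16)
	m.set(3, 30)
	fmt.Println(m.get(3))
	c.retSparseMap(m)

	reused := c.newSparseMap(8)
	fmt.Println(reused == m, reused.contains(3))
}
```
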
      
      name                 old time/op    new time/op    delta
      DSEPass-4               247µs ± 0%     216µs ± 0%  -12.75%  (p=0.008 n=5+5)
      DSEPassBlock-4         3.05ms ± 1%    2.87ms ± 1%   -6.02%  (p=0.002 n=6+6)
      CSEPass-4              2.30ms ± 0%    2.32ms ± 0%   +0.53%  (p=0.026 n=6+6)
      CSEPassBlock-4         23.8ms ± 0%    23.8ms ± 0%     ~     (p=0.931 n=6+5)
      DeadcodePass-4         51.7µs ± 1%    51.5µs ± 2%     ~     (p=0.429 n=5+6)
      DeadcodePassBlock-4     734µs ± 1%     742µs ± 3%     ~     (p=0.394 n=6+6)
      MultiPass-4             152µs ± 0%     149µs ± 2%     ~     (p=0.082 n=5+6)
      MultiPassBlock-4       2.67ms ± 1%    2.41ms ± 2%   -9.77%  (p=0.008 n=5+5)
      
      name                 old alloc/op   new alloc/op   delta
      DSEPass-4              41.2kB ± 0%     0.1kB ± 0%  -99.68%  (p=0.002 n=6+6)
      DSEPassBlock-4          560kB ± 0%       4kB ± 0%  -99.34%  (p=0.026 n=5+6)
      CSEPass-4               189kB ± 0%     189kB ± 0%     ~     (all equal)
      CSEPassBlock-4         3.10MB ± 0%    3.10MB ± 0%     ~     (p=0.444 n=5+5)
      DeadcodePass-4         10.5kB ± 0%    10.5kB ± 0%     ~     (all equal)
      DeadcodePassBlock-4     164kB ± 0%     164kB ± 0%     ~     (all equal)
      MultiPass-4             240kB ± 0%     199kB ± 0%  -17.06%  (p=0.002 n=6+6)
      MultiPassBlock-4       3.60MB ± 0%    2.99MB ± 0%  -17.06%  (p=0.002 n=6+6)
      
      name                 old allocs/op  new allocs/op  delta
      DSEPass-4                8.00 ± 0%      4.00 ± 0%  -50.00%  (p=0.002 n=6+6)
      DSEPassBlock-4            240 ± 0%       120 ± 0%  -50.00%  (p=0.002 n=6+6)
      CSEPass-4                9.00 ± 0%      9.00 ± 0%     ~     (all equal)
      CSEPassBlock-4          1.35k ± 0%     1.35k ± 0%     ~     (all equal)
      DeadcodePass-4           3.00 ± 0%      3.00 ± 0%     ~     (all equal)
      DeadcodePassBlock-4      9.00 ± 0%      9.00 ± 0%     ~     (all equal)
      MultiPass-4              11.0 ± 0%      10.0 ± 0%   -9.09%  (p=0.002 n=6+6)
      MultiPassBlock-4          165 ± 0%       150 ± 0%   -9.09%  (p=0.002 n=6+6)
      
      Change-Id: I43860687c88f33605eb1415f36473c5cfe8fde4a
      Reviewed-on: https://go-review.googlesource.com/98449
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
      cd2cb6e3
    • cmd/compile: implement CMOV on amd64 · a35ec9a5
      Giovanni Bajo authored
      This builds upon the branchelim pass, activating it for amd64 and
      lowering CondSelect. Special care is taken with FPU instructions for
      NaN handling.
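
A small Go function of the kind that branch elimination can turn into a conditional move. The name `minInt` is hypothetical. Whether the compiler actually emits a CMOV for it depends on the Go version, the target, and its heuristics. One way to check is to inspect the assembly printed by `go build -gcflags=-S`.

```go
package main

import "fmt"

// minInt returns the smaller of a and b. The body is a simple
// "assign, then conditionally overwrite" pattern with no side effects in
// the branch, which makes it a candidate for a conditional select.
func minInt(a, b int) int {
	x := b
	if a < b {
		x = a
	}
	return x
}

func main() {
	fmt.Println(minInt(3, 7), minInt(7, 3))
}
```
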
      
      Benchmark results on Xeon E5630 (Westmere EP):
      
      name                      old time/op    new time/op    delta
      BinaryTree17-16              4.99s ± 9%     4.66s ± 2%     ~     (p=0.095 n=5+5)
      Fannkuch11-16                4.93s ± 3%     5.04s ± 2%     ~     (p=0.548 n=5+5)
      FmtFprintfEmpty-16          58.8ns ± 7%    61.4ns ±14%     ~     (p=0.579 n=5+5)
      FmtFprintfString-16          114ns ± 2%     114ns ± 4%     ~     (p=0.603 n=5+5)
      FmtFprintfInt-16             181ns ± 4%     125ns ± 3%  -30.90%  (p=0.008 n=5+5)
      FmtFprintfIntInt-16          263ns ± 2%     217ns ± 2%  -17.34%  (p=0.008 n=5+5)
      FmtFprintfPrefixedInt-16     230ns ± 1%     212ns ± 1%   -7.99%  (p=0.008 n=5+5)
      FmtFprintfFloat-16           411ns ± 3%     344ns ± 5%  -16.43%  (p=0.008 n=5+5)
      FmtManyArgs-16               828ns ± 4%     790ns ± 2%   -4.59%  (p=0.032 n=5+5)
      GobDecode-16                10.9ms ± 4%    10.8ms ± 5%     ~     (p=0.548 n=5+5)
      GobEncode-16                9.52ms ± 5%    9.46ms ± 2%     ~     (p=1.000 n=5+5)
      Gzip-16                      334ms ± 2%     337ms ± 2%     ~     (p=0.548 n=5+5)
      Gunzip-16                   64.4ms ± 1%    65.0ms ± 1%   +1.00%  (p=0.008 n=5+5)
      HTTPClientServer-16          156µs ± 3%     155µs ± 3%     ~     (p=0.690 n=5+5)
      JSONEncode-16               21.0ms ± 1%    21.8ms ± 0%   +3.76%  (p=0.016 n=5+4)
      JSONDecode-16               95.1ms ± 0%    95.7ms ± 1%     ~     (p=0.151 n=5+5)
      Mandelbrot200-16            6.38ms ± 1%    6.42ms ± 1%     ~     (p=0.095 n=5+5)
      GoParse-16                  5.47ms ± 2%    5.36ms ± 1%   -1.95%  (p=0.016 n=5+5)
      RegexpMatchEasy0_32-16       111ns ± 1%     111ns ± 1%     ~     (p=0.635 n=5+4)
      RegexpMatchEasy0_1K-16       408ns ± 1%     411ns ± 2%     ~     (p=0.087 n=5+5)
      RegexpMatchEasy1_32-16       103ns ± 1%     104ns ± 1%     ~     (p=0.484 n=5+5)
      RegexpMatchEasy1_1K-16       659ns ± 2%     652ns ± 1%     ~     (p=0.571 n=5+5)
      RegexpMatchMedium_32-16      176ns ± 2%     174ns ± 1%     ~     (p=0.476 n=5+5)
      RegexpMatchMedium_1K-16     58.6µs ± 4%    57.7µs ± 4%     ~     (p=0.548 n=5+5)
      RegexpMatchHard_32-16       3.07µs ± 3%    3.04µs ± 4%     ~     (p=0.421 n=5+5)
      RegexpMatchHard_1K-16       89.2µs ± 1%    87.9µs ± 2%   -1.52%  (p=0.032 n=5+5)
      Revcomp-16                   575ms ± 0%     587ms ± 2%   +2.12%  (p=0.032 n=4+5)
      Template-16                  110ms ± 1%     107ms ± 3%   -3.00%  (p=0.032 n=5+5)
      TimeParse-16                 463ns ± 0%     462ns ± 0%     ~     (p=0.810 n=5+4)
      TimeFormat-16                538ns ± 0%     535ns ± 0%   -0.63%  (p=0.024 n=5+5)
      
      name                      old speed      new speed      delta
      GobDecode-16              70.7MB/s ± 4%  71.4MB/s ± 5%     ~     (p=0.452 n=5+5)
      GobEncode-16              80.7MB/s ± 5%  81.2MB/s ± 2%     ~     (p=1.000 n=5+5)
      Gzip-16                   58.2MB/s ± 2%  57.7MB/s ± 2%     ~     (p=0.452 n=5+5)
      Gunzip-16                  302MB/s ± 1%   299MB/s ± 1%   -0.99%  (p=0.008 n=5+5)
      JSONEncode-16             92.4MB/s ± 1%  89.1MB/s ± 0%   -3.63%  (p=0.016 n=5+4)
      JSONDecode-16             20.4MB/s ± 0%  20.3MB/s ± 1%     ~     (p=0.135 n=5+5)
      GoParse-16                10.6MB/s ± 2%  10.8MB/s ± 1%   +2.00%  (p=0.016 n=5+5)
      RegexpMatchEasy0_32-16     286MB/s ± 1%   285MB/s ± 3%     ~     (p=1.000 n=5+5)
      RegexpMatchEasy0_1K-16    2.51GB/s ± 1%  2.49GB/s ± 2%     ~     (p=0.095 n=5+5)
      RegexpMatchEasy1_32-16     309MB/s ± 1%   307MB/s ± 1%     ~     (p=0.548 n=5+5)
      RegexpMatchEasy1_1K-16    1.55GB/s ± 2%  1.57GB/s ± 1%     ~     (p=0.690 n=5+5)
      RegexpMatchMedium_32-16   5.68MB/s ± 2%  5.73MB/s ± 1%     ~     (p=0.579 n=5+5)
      RegexpMatchMedium_1K-16   17.5MB/s ± 4%  17.8MB/s ± 4%     ~     (p=0.500 n=5+5)
      RegexpMatchHard_32-16     10.4MB/s ± 3%  10.5MB/s ± 4%     ~     (p=0.460 n=5+5)
      RegexpMatchHard_1K-16     11.5MB/s ± 1%  11.7MB/s ± 2%   +1.57%  (p=0.032 n=5+5)
      Revcomp-16                 442MB/s ± 0%   433MB/s ± 2%   -2.05%  (p=0.032 n=4+5)
      Template-16               17.7MB/s ± 1%  18.2MB/s ± 3%   +3.12%  (p=0.032 n=5+5)
      
      Change-Id: I6972e8f35f2b31f9a42ac473a6bf419a18022558
      Reviewed-on: https://go-review.googlesource.com/100935
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      a35ec9a5
    • James Cowgill's avatar
      cmd/internal/obj/mips: load/store even float registers first · 42311108
      James Cowgill authored
      There is a bug in Octeon III processors where storing an odd floating
      point register after it has recently been written to by a double
      floating point operation will store the old value from before the double
      operation (there are some extra details - the operation and store
      must be a certain number of cycles apart). However, this bug does not
      occur if the even register is stored first. Currently the bug only
      happens on big endian because Go always loads the even register first on
      little endian.
      
      Work around the bug by always loading / storing the even floating point
      register first. Since this is just an instruction reordering, it should
      have no performance penalty. This follows other compilers like GCC which
      will always store the even register first (although you do have to set
      the ISA level to MIPS I to prevent it from using SDC1).
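
      The reordering described above can be sketched as follows. This is only
      an illustration, not the code from this change: the wordOp type and the
      splitDouble function are hypothetical names, and the sketch assumes that
      a float64 occupies an even/odd register pair whose even register holds
      the low-order word, so that on big endian the even register maps to the
      higher address.

```go
package main

import "fmt"

// wordOp is a hypothetical description of one 32-bit FP load or store.
type wordOp struct {
	Reg    int   // floating point register number
	Offset int64 // byte offset from the base address
}

// splitDouble splits a 64-bit FP access at offset off into two 32-bit
// accesses. The even register is always emitted first; only the offsets
// depend on the byte order.
func splitDouble(reg int, off int64, bigEndian bool) [2]wordOp {
	even := reg &^ 1 // round down to the even register of the pair
	var addrOff int64
	if bigEndian {
		// Assumption: the even register holds the low-order word,
		// which sits at the higher address on big endian.
		addrOff = 4
	}
	return [2]wordOp{
		{Reg: even, Offset: off + addrOff},
		{Reg: even + 1, Offset: off + 4 - addrOff},
	}
}

func main() {
	fmt.Println("little endian:", splitDouble(2, 8, false))
	fmt.Println("big endian:   ", splitDouble(2, 8, true))
}
```

      In both byte orders the first emitted access uses the even register, so
      the odd register is never stored before its even partner.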
      
      Change-Id: I5e73daa4d724ca1df7bf5228aab19f53f26a4976
      Reviewed-on: https://go-review.googlesource.com/97735
      Reviewed-by: Keith Randall <khr@golang.org>
      42311108
    • Geoff Berry's avatar
      cmd/compile/internal/ssa: add patterns for arm64 bitfield opcodes · e244a7a7
      Geoff Berry authored
      Add patterns to match common idioms for EXTR, BFI, BFXIL, SBFIZ, SBFX,
      UBFIZ and UBFX opcodes.
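
      As a rough illustration of the kind of shift-and-mask idioms involved,
      the Go helpers below model what four of these instructions compute. The
      helper names (ubfx, ubfiz, sbfx, bfi) are hypothetical and only mirror
      the instruction semantics; the actual rewrite rules added by this change
      are not reproduced here.

```go
package main

import "fmt"

// mask returns a value with the low width bits set.
func mask(width uint) uint64 { return uint64(1)<<width - 1 }

// ubfx models UBFX: extract width bits starting at bit lsb, zero-extended.
func ubfx(x uint64, lsb, width uint) uint64 {
	return (x >> lsb) & mask(width)
}

// ubfiz models UBFIZ: take the low width bits, shift them left by lsb,
// and zero all other bits.
func ubfiz(x uint64, lsb, width uint) uint64 {
	return (x & mask(width)) << lsb
}

// sbfx models SBFX: extract width bits starting at bit lsb, sign-extended.
func sbfx(x uint64, lsb, width uint) int64 {
	return int64(x<<(64-lsb-width)) >> (64 - width)
}

// bfi models BFI: insert the low width bits of src into dst at bit lsb,
// leaving the remaining bits of dst unchanged.
func bfi(dst, src uint64, lsb, width uint) uint64 {
	m := mask(width) << lsb
	return dst&^m | (src<<lsb)&m
}

func main() {
	x := uint64(0xABCD)
	// Source-level idioms and the single-instruction form they correspond to.
	fmt.Println((x>>4)&0xff == ubfx(x, 4, 8))  // shift right, then mask
	fmt.Println((x&0xff)<<3 == ubfiz(x, 3, 8)) // mask, then shift left
	fmt.Println(int64(int8(x>>4)) == sbfx(x, 4, 8))
	fmt.Printf("%#x\n", bfi(0xFFFF, 0, 4, 8))
}
```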
      
      go1 benchmarks results on Amberwing:
      name                   old time/op    new time/op    delta
      FmtManyArgs               786ns ± 2%     714ns ± 1%  -9.20%  (p=0.000 n=10+10)
      Gzip                      437ms ± 0%     402ms ± 0%  -7.99%  (p=0.000 n=10+10)
      FmtFprintfIntInt          196ns ± 0%     182ns ± 0%  -7.28%  (p=0.000 n=10+9)
      FmtFprintfPrefixedInt     207ns ± 0%     199ns ± 0%  -3.86%  (p=0.000 n=10+10)
      FmtFprintfFloat           324ns ± 0%     316ns ± 0%  -2.47%  (p=0.000 n=10+8)
      FmtFprintfInt             119ns ± 0%     117ns ± 0%  -1.68%  (p=0.000 n=10+9)
      GobDecode                12.8ms ± 2%    12.6ms ± 1%  -1.62%  (p=0.002 n=10+10)
      JSONDecode               94.4ms ± 1%    93.4ms ± 0%  -1.10%  (p=0.000 n=10+10)
      RegexpMatchEasy0_32       247ns ± 0%     245ns ± 0%  -0.65%  (p=0.000 n=10+10)
      RegexpMatchMedium_32      314ns ± 0%     312ns ± 0%  -0.64%  (p=0.000 n=10+10)
      RegexpMatchEasy0_1K       541ns ± 0%     538ns ± 0%  -0.55%  (p=0.000 n=10+9)
      TimeParse                 450ns ± 1%     448ns ± 1%  -0.42%  (p=0.035 n=9+9)
      RegexpMatchEasy1_32       244ns ± 0%     243ns ± 0%  -0.41%  (p=0.000 n=10+10)
      GoParse                  6.03ms ± 0%    6.00ms ± 0%  -0.40%  (p=0.002 n=10+10)
      RegexpMatchEasy1_1K       779ns ± 0%     777ns ± 0%  -0.26%  (p=0.000 n=10+10)
      RegexpMatchHard_32       2.75µs ± 0%    2.74µs ± 1%  -0.06%  (p=0.026 n=9+9)
      BinaryTree17              11.7s ± 0%     11.6s ± 0%    ~     (p=0.089 n=10+10)
      HTTPClientServer         89.1µs ± 1%    89.5µs ± 2%    ~     (p=0.436 n=10+10)
      RegexpMatchHard_1K       78.9µs ± 0%    79.5µs ± 2%    ~     (p=0.469 n=10+10)
      FmtFprintfEmpty          58.5ns ± 0%    58.5ns ± 0%    ~     (all equal)
      GobEncode                12.0ms ± 1%    12.1ms ± 0%    ~     (p=0.075 n=10+10)
      Revcomp                   669ms ± 0%     668ms ± 0%    ~     (p=0.091 n=7+9)
      Mandelbrot200            5.35ms ± 0%    5.36ms ± 0%  +0.07%  (p=0.000 n=9+9)
      RegexpMatchMedium_1K     52.1µs ± 0%    52.1µs ± 0%  +0.10%  (p=0.000 n=9+9)
      Fannkuch11                3.25s ± 0%     3.26s ± 0%  +0.36%  (p=0.000 n=9+10)
      FmtFprintfString          114ns ± 1%     115ns ± 0%  +0.52%  (p=0.011 n=10+10)
      JSONEncode               20.2ms ± 0%    20.3ms ± 0%  +0.65%  (p=0.000 n=10+10)
      Template                 91.3ms ± 0%    92.3ms ± 0%  +1.08%  (p=0.000 n=10+10)
      TimeFormat                484ns ± 0%     495ns ± 1%  +2.30%  (p=0.000 n=9+10)
      
      There are some opportunities to improve this change further by adding
      patterns to match the "extended register" versions of ADD/SUB/CMP, but I
      think that should be evaluated on its own.  The regressions in Template
      and TimeFormat would likely be recovered by this, as they seem to be due
      to generating:
      
          ubfiz x0, x0, #3, #8
          add x1, x2, x0
      
      instead of
      
          add x1, x2, x0, lsl #3
      
      Change-Id: I5644a8d70ac7a98e784a377a2b76ab47a3415a4b
      Reviewed-on: https://go-review.googlesource.com/88355
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      e244a7a7