1. 14 Feb, 2018 35 commits
    • cmd/compile/internal/amd64: update popcnt code generation · de4edf3d
      Ilya Tocar authored
      POPCNT has a false dependency on its output register, so the compiler
      emits MOVQ $0, reg to break it. But we recently switched the encoding
      of MOVQ $0, reg from xor reg, reg to an actual mov $0, reg. This CL
      updates code generation for popcnt to use an actual XOR.
      
      Change-Id: I4c1fc11e85758b53ba2679165fa55614ec54b27d
      Reviewed-on: https://go-review.googlesource.com/82516
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile/internal: pass LocalSlot values, not pointers · 9c4fd462
      Heschi Kreinick authored
      Because getStackOffset is a function pointer, the compiler assumes that
      its arguments escape. Pass a value instead to avoid heap allocations.
      
      Change-Id: Ib94e5941847f134cd00e873040a4d7fcf15ced26
      Reviewed-on: https://go-review.googlesource.com/92397
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
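Illustrative sketch (not code from the commit): the pointer-versus-value difference described above can be reproduced in a small program. The type `localSlot` and the helper names are hypothetical stand-ins. Building with `go build -gcflags=-m` prints the compiler's escape decisions, which may vary by Go version and with inlining.

```go
package main

import "fmt"

// localSlot is a hypothetical stand-in for the compiler's LocalSlot type.
type localSlot struct {
	name   string
	offset int64
}

// viaPointer calls get through a function value with a pointer argument.
// Because the callee is unknown at compile time, the compiler generally
// has to assume that &s escapes, which can force s onto the heap.
func viaPointer(get func(*localSlot) int64, s localSlot) int64 {
	return get(&s)
}

// viaValue passes a copy instead, so there is no pointer that could escape.
func viaValue(get func(localSlot) int64, s localSlot) int64 {
	return get(s)
}

func offsetOfPointer(s *localSlot) int64 { return s.offset }
func offsetOfValue(s localSlot) int64    { return s.offset }

func main() {
	s := localSlot{name: "x", offset: 16}
	// Both calls compute the same result; only the allocation behavior
	// may differ.
	fmt.Println(viaPointer(offsetOfPointer, s), viaValue(offsetOfValue, s))
}
```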
    • cmd/compile/internal: reuse memory for valueToProgAfter · b8644e32
      Heschi Kreinick authored
      Not a big improvement, but does help edge cases like the SSA package.
      Change-Id: I40e531110b97efd5f45955be477fd0f4faa8d545
      Reviewed-on: https://go-review.googlesource.com/92396
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile/internal/ssa: use math/bits for register sets · 7ac756f7
      Heschi Kreinick authored
      Using bits.TrailingZeros instead of iterating over each bit is a small
      but easy win for the common case of only one or two registers being set.
      
      I copied in the implementation for use with pre-1.9 bootstraps.
      
      Change-Id: Ieaa768554d7d5239a5617fbf34f1ee0b32ce1de5
      Reviewed-on: https://go-review.googlesource.com/92395
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
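Illustrative sketch (not code from the commit): walking a register bitmask with `bits.TrailingZeros64` visits only the set bits instead of testing all 64 positions. The type `regMask` and the function `registers` are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// regMask is a hypothetical register set: bit i set means register i is present.
type regMask uint64

// registers returns the register numbers contained in m, lowest first.
// Each iteration jumps straight to the lowest set bit and clears it, so the
// loop runs once per register rather than once per bit position.
func registers(m regMask) []int {
	var regs []int
	for m != 0 {
		r := bits.TrailingZeros64(uint64(m))
		regs = append(regs, r)
		m &^= 1 << uint(r) // clear the bit just visited
	}
	return regs
}

func main() {
	fmt.Println(registers(0x24)) // 0x24 has bits 2 and 5 set: prints [2 5]
}
```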
    • cmd/compile/internal/ssa: reduce location list memory use · 39eea623
      Heschi Kreinick authored
      Put everything that showed up in the allocation profile into the cache,
      and reuse it across functions.
      
      After this CL, the overhead of enabling location lists is getting
      pretty close to the desired 5%:
      
      compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4
      compilecmp  -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4  -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4
      benchstat -geomean  /tmp/869550129 /tmp/143495132
      completed   30 of   30, estimated time remaining 0s (eta 3:24PM)
      name        old time/op       new time/op       delta
      Template          199ms ± 4%        209ms ± 6%   +5.17%  (p=0.000 n=29+30)
      Unicode          99.2ms ± 8%      100.5ms ± 6%     ~     (p=0.112 n=30+30)
      GoTypes           642ms ± 3%        684ms ± 3%   +6.54%  (p=0.000 n=29+30)
      SSA               8.00s ± 1%        8.71s ± 1%   +8.78%  (p=0.000 n=29+29)
      Flate             129ms ± 7%        134ms ± 5%   +3.77%  (p=0.000 n=30+30)
      GoParser          157ms ± 4%        164ms ± 5%   +4.35%  (p=0.000 n=29+30)
      Reflect           428ms ± 3%        450ms ± 4%   +5.09%  (p=0.000 n=30+30)
      Tar               195ms ± 5%        204ms ± 8%   +4.78%  (p=0.000 n=30+30)
      XML               228ms ± 4%        241ms ± 4%   +5.62%  (p=0.000 n=30+29)
      StdCmd            15.4s ± 1%        16.7s ± 1%   +8.29%  (p=0.000 n=29+29)
      [Geo mean]        476ms             502ms        +5.35%
      
      name        old user-time/op  new user-time/op  delta
      Template          294ms ±18%        304ms ±15%     ~     (p=0.242 n=29+29)
      Unicode           182ms ±27%        172ms ±28%     ~     (p=0.104 n=30+30)
      GoTypes           957ms ±15%       1016ms ±12%   +6.16%  (p=0.000 n=30+30)
      SSA               13.3s ± 5%        14.3s ± 3%   +7.32%  (p=0.000 n=30+28)
      Flate             188ms ±17%        193ms ±17%     ~     (p=0.288 n=28+29)
      GoParser          232ms ±16%        238ms ±13%     ~     (p=0.065 n=30+29)
      Reflect           585ms ±13%        620ms ±10%   +5.88%  (p=0.000 n=30+30)
      Tar               298ms ±21%        332ms ±23%  +11.32%  (p=0.000 n=30+30)
      XML               329ms ±17%        343ms ±12%   +4.18%  (p=0.032 n=30+30)
      [Geo mean]        492ms             513ms        +4.13%
      
      name        old alloc/op      new alloc/op      delta
      Template         38.3MB ± 0%       40.3MB ± 0%   +5.29%  (p=0.000 n=30+30)
      Unicode          29.3MB ± 0%       29.6MB ± 0%   +1.28%  (p=0.000 n=30+29)
      GoTypes           110MB ± 0%        118MB ± 0%   +6.97%  (p=0.000 n=29+30)
      SSA              1.48GB ± 0%       1.61GB ± 0%   +9.06%  (p=0.000 n=30+30)
      Flate            24.8MB ± 0%       26.0MB ± 0%   +4.99%  (p=0.000 n=29+30)
      GoParser         30.9MB ± 0%       32.2MB ± 0%   +4.20%  (p=0.000 n=30+30)
      Reflect          76.8MB ± 0%       80.6MB ± 0%   +4.97%  (p=0.000 n=30+30)
      Tar              39.6MB ± 0%       41.7MB ± 0%   +5.22%  (p=0.000 n=29+30)
      XML              42.0MB ± 0%       45.4MB ± 0%   +8.22%  (p=0.000 n=29+30)
      [Geo mean]       63.9MB            67.5MB        +5.56%
      
      name        old allocs/op     new allocs/op     delta
      Template           383k ± 0%         405k ± 0%   +5.69%  (p=0.000 n=30+30)
      Unicode            343k ± 0%         346k ± 0%   +0.98%  (p=0.000 n=30+27)
      GoTypes           1.15M ± 0%        1.22M ± 0%   +6.17%  (p=0.000 n=29+29)
      SSA               12.2M ± 0%        13.2M ± 0%   +8.15%  (p=0.000 n=30+30)
      Flate              234k ± 0%         249k ± 0%   +6.44%  (p=0.000 n=30+30)
      GoParser           315k ± 0%         332k ± 0%   +5.31%  (p=0.000 n=30+28)
      Reflect            972k ± 0%        1010k ± 0%   +3.89%  (p=0.000 n=30+30)
      Tar                394k ± 0%         415k ± 0%   +5.35%  (p=0.000 n=28+30)
      XML                404k ± 0%         429k ± 0%   +6.31%  (p=0.000 n=29+29)
      [Geo mean]         651k              686k        +5.35%
      
      Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94
      Reviewed-on: https://go-review.googlesource.com/89357
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
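Illustrative sketch (not code from the commit): the general "put it in a cache and reuse it across functions" pattern can look like the following. The `scratch` type and its `idBuffer` method are hypothetical; the compiler's real cache holds different data.

```go
package main

import "fmt"

// scratch is a hypothetical cache holding a buffer that is reused across
// per-function work instead of being reallocated each time.
type scratch struct {
	ids []int32
}

// idBuffer returns a zeroed slice of length n, reusing the cached backing
// array whenever it is already large enough.
func (c *scratch) idBuffer(n int) []int32 {
	if cap(c.ids) < n {
		c.ids = make([]int32, n)
	}
	c.ids = c.ids[:n]
	for i := range c.ids {
		c.ids[i] = 0 // clear stale data left by the previous user
	}
	return c.ids
}

func main() {
	var c scratch
	a := c.idBuffer(8)
	a[0] = 42
	b := c.idBuffer(4) // a later request reuses the same backing array, cleared
	fmt.Println(len(b), b[0], &a[0] == &b[0]) // prints: 4 0 true
}
```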
    • cmd/compile: reimplement location list generation · 2075a932
      Heschi Kreinick authored
      Completely redesign and reimplement location list generation to be more
      efficient, and hopefully not too hard to understand.
      
      RegKills are gone. Instead of using the regalloc's liveness
      calculations, redo them using the Ops' clobber information. Besides
      saving a lot of Values, this avoids adding RegKills to blocks that would
      be empty otherwise, which was messing up optimizations. This does mean
      that it's much harder to tell whether the generation process is buggy
      (there's nothing to cross-check it with), and there may be disagreements
      with GC liveness. But the performance gain is significant, and it's nice
      not to be messing with earlier compiler phases.
      
      The intermediate representations are gone. Instead of producing
      ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
      location lists, go directly from the SSA to a (mostly) real location
      list. Because the SSA analysis happens before assembly, it stores
      encoded block/value IDs where PCs would normally go. It would be easier
      to do the SSA analysis after assembly, but I didn't want to retain the
      SSA just for that.
      
      Generation proceeds in two phases: first, it traverses the function in
      CFG order, storing the state of the block at the beginning and end. End
      states are used to produce the start states of the successor blocks. In
      the second phase, it traverses in program text order and produces the
      location lists. The processing in the second phase is redundant, but
      much cheaper than storing the intermediate representation. It might be
      possible to combine the two phases somewhat to take advantage of cases
      where the CFG matches the block layout, but I haven't tried.
      
      Location lists are finalized by adding a base address selection entry,
      translating each encoded block/value ID to a real PC, and adding the
      terminating zero entry. This probably won't work on OSX, where dsymutil
      will choke on the base address selection. I tried emitting CU-relative
      relocations for each address, and it was *very* bad for performance --
      it uses more memory storing all the relocations than it does for the
      actual location list bytes. I think I'm going to end up synthesizing the
      relocations in the linker only on OSX, but TBD.
      
      TestNexting needs updating: with more optimizations working, the
      debugger doesn't stop on the continue (line 88) any more, and the test's
      duplicate suppression kicks in. Also, dx and dy live a little longer
      now, but they have the correct values.
      
      Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
      Reviewed-on: https://go-review.googlesource.com/89356
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
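Illustrative sketch (not code from the commit): the finalization step described above can be pictured as a DWARF v4 style `.debug_loc` list. It assumes 8-byte little-endian addresses and skips the translation of encoded block/value IDs to PCs. The `entry` type and `finalize` function are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// entry is a hypothetical, simplified location list entry: a PC range
// relative to the base address plus an already-encoded DWARF expression.
type entry struct {
	start, end uint64
	expr       []byte
}

// finalize serializes entries as a location list.
func finalize(base uint64, entries []entry) []byte {
	var buf bytes.Buffer
	put64 := func(v uint64) { binary.Write(&buf, binary.LittleEndian, v) }

	// Base address selection entry: the largest representable address,
	// followed by the base that later offsets are relative to.
	put64(^uint64(0))
	put64(base)

	for _, e := range entries {
		put64(e.start)
		put64(e.end)
		// 2-byte expression length, then the expression bytes.
		binary.Write(&buf, binary.LittleEndian, uint16(len(e.expr)))
		buf.Write(e.expr)
	}

	// Terminating entry: two zero addresses.
	put64(0)
	put64(0)
	return buf.Bytes()
}

func main() {
	// 0x50 is DW_OP_reg0: "the variable lives in register 0".
	list := finalize(0x401000, []entry{{start: 0, end: 0x20, expr: []byte{0x50}}})
	fmt.Println(len(list)) // 16 (base) + 19 (one entry) + 16 (terminator) = 51
}
```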
    • cmd/compile/internal: decouple scope tracking from location lists · 7d7af610
      Heschi Kreinick authored
      We're trying to enable location lists by default, and it's easier to do
      that if we don't have to worry about scope tracking at the same time.
      We can evaluate their performance impact separately.
      
      However, that does mean that "err" is ambiguous in the test case, so
      rename it to err2 for now.
      
      Change-Id: I24f119016185c52b7d9affc74207f6a5b450fb6f
      Reviewed-on: https://go-review.googlesource.com/89355
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • runtime: use private futexes on Linux · 07751f4b
      Ian Lance Taylor authored
      By default futexes are permitted in shared memory regions, which
      requires the kernel to translate the memory address. Since our futexes
      are never in shared memory, set FUTEX_PRIVATE_FLAG, which makes futex
      operations slightly more efficient.
      
      Change-Id: I2a82365ed27d5cd8d53c5382ebaca1a720a80952
      Reviewed-on: https://go-review.googlesource.com/80144
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
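Illustrative sketch (not code from the commit): setting FUTEX_PRIVATE_FLAG amounts to OR-ing a flag bit into the futex operation code. The numeric values below are the ones defined in Linux's `linux/futex.h`. The Go constant names and the `privateOp` helper are hypothetical.

```go
package main

import "fmt"

// Operation codes and flag from the Linux futex ABI (linux/futex.h).
const (
	futexWait        = 0
	futexWake        = 1
	futexPrivateFlag = 128
)

// privateOp marks a futex operation as process-private, telling the kernel
// that the futex word is not in memory shared with another process.
func privateOp(op int) int {
	return op | futexPrivateFlag
}

func main() {
	fmt.Println(privateOp(futexWait), privateOp(futexWake)) // prints: 128 129
}
```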
    • cmd/asm: add PRFM instruction on ARM64 · ebd4950e
      fanzha02 authored
      The current assembler cannot handle the PRFM (immediate) instruction.
      The fix creates a prfopfield struct that contains the eight
      prefetch operations and the value to use in the instruction, and adds
      test cases.
      
      Fixes #22932
      
      Change-Id: I621d611bd930ef3c42306a4372447c46d53b2ccf
      Reviewed-on: https://go-review.googlesource.com/81675
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/internal/obj/mips: support NEG, avoid crash with illegal instruction · 0938e4cf
      Cherry Zhang authored
      Add support for NEG{V,W} pseudo-instructions, which are translated
      to a SUB instruction from R0 with proper width.
      
      Also turn illegal instructions into UNDEF, to avoid crashing in
      asmout when it tries to read the operands.
      
      Fixes #23548.
      
      Change-Id: I047b27559ccd9594c3dcf62ab039b636098f30a3
      Reviewed-on: https://go-review.googlesource.com/89896
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • nacl*.bash: pass flags to make.bash · 1fccbfe9
      Cherry Zhang authored
      Just like all.bash passes flags to make.bash, I think it makes
      sense that naclmake.bash and nacltest.bash do so as well. For
      example, on a slow machine I can do "./nacltest.bash -v" to see
      the build progress.
      
      Change-Id: Id766dd590e6b83e8b5345822580dc1b05eac8ea3
      Reviewed-on: https://go-review.googlesource.com/93117
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • cmd/compile: CALLudiv on nacl/arm doesn't clobber R12 · 5a43a271
      Cherry Zhang authored
      On nacl/arm, R12 is clobbered by the RET instruction in a function
      that has a frame. runtime.udiv doesn't have a frame, so it does
      not clobber R12.
      
      Change-Id: I0de448749f615908f6659e92d201ba3eb2f8266d
      Reviewed-on: https://go-review.googlesource.com/93116
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime/internal/atomic: add early nil check on ARM · 633b38c5
      Cherry Zhang authored
      If nil, fault before taking the lock or calling into the kernel.
      
      Change-Id: I013d78a5f9233c2a9197660025f679940655d384
      Reviewed-on: https://go-review.googlesource.com/93636
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime/internal/atomic: unify sys_*_arm.s on non-linux · 97124af9
      Cherry Zhang authored
      Updates #23778.
      
      Change-Id: I80e57a15b6e3bbc2e25ea186399ff0e360fc5c21
      Reviewed-on: https://go-review.googlesource.com/93635
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • cmd/compile: replace range loop over list of nodes with orderexprlistinplace · a90fc6d2
      Martin Möhrmann authored
      Replace explicit range loop that applies orderexprinplace on a
      list of nodes with existing helper function orderexprlistinplace.
      
      Passes toolstash -cmp.
      
      Change-Id: Ic8098ed08cf67f319de3faa83b00a5b73bbde95d
      Reviewed-on: https://go-review.googlesource.com/88815
      Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • github: add a Pull Request template · d009679d
      Andrew Bonventre authored
      Change-Id: I02938b2435e3a98efea7ee5545a6f8f5f6f794b4
      Reviewed-on: https://go-review.googlesource.com/93915
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: generate tbz/tbnz when comparing against zero on arm64 · cdd96163
      Chad Rosier authored
      The tbz/tbnz instructions check the sign bit to determine whether the value is >= 0 or < 0.
      
      go1 benchmark results:
      name                   old speed      new speed      delta
      JSONEncode             94.4MB/s ± 1%  95.7MB/s ± 0%  +1.36%  (p=0.000 n=10+9)
      JSONDecode             19.7MB/s ± 1%  19.9MB/s ± 1%  +1.08%  (p=0.000 n=9+10)
      Gzip                   45.5MB/s ± 0%  46.0MB/s ± 0%  +1.06%  (p=0.000 n=10+10)
      Revcomp                 376MB/s ± 0%   379MB/s ± 0%  +0.69%  (p=0.000 n=10+10)
      RegexpMatchHard_1K     12.6MB/s ± 0%  12.7MB/s ± 0%  +0.57%  (p=0.000 n=10+8)
      RegexpMatchMedium_32   3.21MB/s ± 0%  3.22MB/s ± 0%  +0.31%  (p=0.000 n=9+10)
      RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%  +0.23%  (p=0.000 n=9+9)
      RegexpMatchHard_32     11.4MB/s ± 0%  11.4MB/s ± 1%  +0.19%  (p=0.036 n=10+8)
      RegexpMatchEasy0_1K    1.77GB/s ± 0%  1.77GB/s ± 0%  +0.13%  (p=0.000 n=9+10)
      RegexpMatchMedium_1K   19.3MB/s ± 0%  19.3MB/s ± 0%  +0.04%  (p=0.008 n=10+8)
      RegexpMatchEasy0_32     131MB/s ± 0%   131MB/s ± 0%    ~     (p=0.211 n=10+10)
      GobDecode              57.5MB/s ± 1%  57.6MB/s ± 2%    ~     (p=0.469 n=10+10)
      GobEncode              58.6MB/s ± 1%  58.5MB/s ± 2%    ~     (p=0.781 n=10+10)
      GoParse                9.40MB/s ± 0%  9.39MB/s ± 0%  -0.19%  (p=0.005 n=10+9)
      RegexpMatchEasy1_32     133MB/s ± 0%   133MB/s ± 0%  -0.48%  (p=0.000 n=10+10)
      Template               20.9MB/s ± 0%  20.6MB/s ± 0%  -1.54%  (p=0.000 n=8+10)
      
      Change-Id: I411efe44db35c3962445618d5a47c12e31b3925b
      Reviewed-on: https://go-review.googlesource.com/92715
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
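Illustrative sketch (not code from the commit): a signed comparison against zero depends only on the sign bit, which is why a single-bit test-and-branch is sufficient. The function `isNegative` is a hypothetical name, and the code says nothing about what the compiler actually emits for it.

```go
package main

import "fmt"

// isNegative reports whether x < 0 by testing only the sign bit (bit 63),
// the same single-bit test that tbz/tbnz perform.
func isNegative(x int64) bool {
	return uint64(x)>>63 == 1
}

func main() {
	for _, x := range []int64{-5, 0, 7} {
		// The bit test and the ordinary comparison always agree.
		fmt.Println(x, isNegative(x), x < 0)
	}
}
```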
    • syscall, os: use pipe2 syscall on NetBSD instead of pipe · eab06e65
      Tobias Klauser authored
      The pipe2 syscall has been part of NetBSD since version 6.0 and thus exists in
      all officially supported versions (6.0 through 6.1 and 7.0+).
      
      Follows CL 38426
      
      Change-Id: I7b62b507300c3dfbcc6ae56408a7d7088ddccc77
      Reviewed-on: https://go-review.googlesource.com/94035
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Benny Siegert <bsiegert@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/go: put "go help" list in the right order, take 2 · 9dba56ba
      Nate Wilkinson authored
      The previous fix had "bug" and "build" in the wrong order.
      
      Fixes #23791
      
      Change-Id: I4897428516b159966c13c1054574c4f6fbf0fbac
      Reviewed-on: https://go-review.googlesource.com/94017
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • sync: enable profiling of RWMutex · 88ba6458
      Lorenz Bauer authored
      Include reader / writer interactions of RWMutex in the mutex profile.
      Writer contention is already included in the profile, since a plain Mutex
      is used to control exclusion.
      
      Fixes #18496
      
      Change-Id: Ib0dc1ffa0fd5e6d964a6f7764d7f09556eb63f00
      Reviewed-on: https://go-review.googlesource.com/87095
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Peter Weinberger <pjw@google.com>
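Illustrative sketch (not code from the commit): the mutex profile is switched on with `runtime.SetMutexProfileFraction` and read through `pprof.Lookup("mutex")`. The helper `mutexProfileSamples` is a hypothetical name. The number of samples depends on scheduling and on the Go version, so no particular output is claimed.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// mutexProfileSamples makes n readers wait behind a writer on an RWMutex
// and then returns the number of samples in the mutex profile.
func mutexProfileSamples(n int) int {
	var mu sync.RWMutex
	var wg sync.WaitGroup

	mu.Lock() // the writer holds the lock, so readers have to block
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.RLock()
			mu.RUnlock()
		}()
	}
	time.Sleep(10 * time.Millisecond) // give the readers time to start waiting
	mu.Unlock()
	wg.Wait()

	return pprof.Lookup("mutex").Count()
}

func main() {
	// Record every contention event; a fraction of 1 is only sensible for demos.
	runtime.SetMutexProfileFraction(1)
	fmt.Println("mutex profile samples:", mutexProfileSamples(4))
}
```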
    • crypto/cipher: add NewGCMWithNonceAndTagSize for custom tag sizes. · 8cb4327e
      Conrado Gouvea authored
      GCM allows using tag sizes smaller than the block size. This adds a
      NewGCMWithNonceAndTagSize function which allows specifying the tag
      size.
      
      Fixes #19594
      
      Change-Id: Ib2008c6f13ad6d916638b1523c0ded8a80eaf42d
      Reviewed-on: https://go-review.googlesource.com/48510
      Reviewed-by: Filippo Valsorda <hi@filippo.io>
      Run-TryBot: Filippo Valsorda <hi@filippo.io>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
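Illustrative sketch (not code from the commit): a shorter GCM tag simply shortens the sealed output. The example uses `cipher.NewGCMWithTagSize`, the function available in released Go versions (1.11 and later), rather than the name in this commit's title. The helper `sealWithTagSize` and the all-zero key and nonce are assumptions for demonstration only.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// sealWithTagSize encrypts plaintext with AES-GCM using a tag of tagSize bytes.
func sealWithTagSize(key, nonce, plaintext []byte, tagSize int) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCMWithTagSize(block, tagSize)
	if err != nil {
		return nil, err
	}
	return aead.Seal(nil, nonce, plaintext, nil), nil
}

func main() {
	key := make([]byte, 16)   // all-zero demo key; never do this in real code
	nonce := make([]byte, 12) // standard GCM nonce size; must be unique per message
	ct, err := sealWithTagSize(key, nonce, []byte("hello"), 12)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(ct)) // 5 plaintext bytes + 12 tag bytes = 17
}
```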
    • crypto/rsa: improve error message for keys too short for PSS · c0094338
      Filippo Valsorda authored
      Fixes #23736
      
      Change-Id: I850d91a512394c4292927d51c475064bfa4e3053
      Reviewed-on: https://go-review.googlesource.com/92815
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • reflect: add embedded field test · 9558ba29
      Ian Lance Taylor authored
      Gccgo failed this test.
      
      Updates #23620
      
      Change-Id: I3979a6d3b87d2d014850accf9cb7f356349e6195
      Reviewed-on: https://go-review.googlesource.com/91138
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Than McIntosh <thanm@google.com>
    • runtime: remove extraneous stackPreempt setting · b03f1d1a
      David Crawshaw authored
      The stackguard is set to stackPreempt earlier in reentersyscall, and
      as it comes with throwsplit = true there's no way for the stackguard
      to be set to anything else by the end of reentersyscall.
      
      Change-Id: I4e942005b22ac784c52398c74093ac887fc8ec24
      Reviewed-on: https://go-review.googlesource.com/65673
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • cmd/compile/internal/ssa: optimize arm64 with FNMULS/FNMULD · ebb77aa8
      Ben Shi authored
      FNMULS and FNMULD are efficient arm64 instructions, which can be used
      to improve FP performance. This CL uses them to optimize pairs of neg-mul
      operations.
      
      Here are benchmark test results on Raspberry Pi 3 with ArchLinux.
      
      1. A special test case gets about 15% improvement.
      (https://github.com/benshi001/ugo1/blob/master/fpmul_test.go)
      FPMul-4                     485µs ± 0%     410µs ± 0%  -15.49%  (p=0.000 n=26+23)
      
      2. There is little regression in the go1 benchmark (excluding noise).
      name                     old time/op    new time/op    delta
      BinaryTree17-4              42.0s ± 3%     42.1s ± 2%    ~     (p=0.542 n=39+40)
      Fannkuch11-4                33.3s ± 3%     32.9s ± 1%    ~     (p=0.200 n=40+32)
      FmtFprintfEmpty-4           534ns ± 0%     534ns ± 0%    ~     (all equal)
      FmtFprintfString-4         1.09µs ± 1%    1.09µs ± 0%    ~     (p=0.950 n=32+32)
      FmtFprintfInt-4            1.14µs ± 0%    1.14µs ± 1%    ~     (p=0.571 n=32+31)
      FmtFprintfIntInt-4         1.79µs ± 3%    1.76µs ± 0%  -1.42%  (p=0.004 n=40+34)
      FmtFprintfPrefixedInt-4    2.17µs ± 0%    2.17µs ± 0%    ~     (p=0.073 n=31+34)
      FmtFprintfFloat-4          3.33µs ± 3%    3.28µs ± 0%  -1.46%  (p=0.001 n=40+34)
      FmtManyArgs-4              7.28µs ± 6%    7.19µs ± 0%    ~     (p=0.641 n=40+33)
      GobDecode-4                96.5ms ± 4%    96.5ms ± 9%    ~     (p=0.214 n=40+40)
      GobEncode-4                79.5ms ± 0%    80.7ms ± 4%  +1.51%  (p=0.000 n=34+40)
      Gzip-4                      4.53s ± 4%     4.56s ± 4%  +0.60%  (p=0.000 n=40+40)
      Gunzip-4                    451ms ± 3%     442ms ± 0%  -1.93%  (p=0.000 n=40+32)
      HTTPClientServer-4          530µs ± 1%     535µs ± 1%  +0.88%  (p=0.000 n=39+39)
      JSONEncode-4                214ms ± 4%     211ms ± 0%    ~     (p=0.059 n=40+31)
      JSONDecode-4                865ms ± 5%     864ms ± 4%  -0.06%  (p=0.003 n=40+40)
      Mandelbrot200-4            52.0ms ± 3%    52.1ms ± 3%    ~     (p=0.556 n=40+40)
      GoParse-4                  43.1ms ± 8%    42.1ms ± 0%    ~     (p=0.083 n=40+33)
      RegexpMatchEasy0_32-4      1.02µs ± 3%    1.02µs ± 4%  +0.06%  (p=0.020 n=40+40)
      RegexpMatchEasy0_1K-4      3.90µs ± 0%    3.96µs ± 3%  +1.58%  (p=0.000 n=31+40)
      RegexpMatchEasy1_32-4       967ns ± 4%     981ns ± 3%  +1.40%  (p=0.000 n=40+40)
      RegexpMatchEasy1_1K-4      6.41µs ± 4%    6.43µs ± 3%    ~     (p=0.386 n=40+40)
      RegexpMatchMedium_32-4     1.76µs ± 3%    1.78µs ± 3%  +1.08%  (p=0.000 n=40+40)
      RegexpMatchMedium_1K-4      561µs ± 0%     562µs ± 0%  +0.09%  (p=0.003 n=34+31)
      RegexpMatchHard_32-4       31.5µs ± 2%    31.1µs ± 4%  -1.17%  (p=0.000 n=30+40)
      RegexpMatchHard_1K-4        960µs ± 3%     950µs ± 4%  -1.02%  (p=0.016 n=40+40)
      Revcomp-4                   7.79s ± 7%     7.79s ± 4%    ~     (p=0.859 n=40+40)
      Template-4                  889ms ± 6%     872ms ± 3%  -1.86%  (p=0.025 n=40+31)
      TimeParse-4                4.80µs ± 0%    4.89µs ± 3%  +1.71%  (p=0.001 n=31+40)
      TimeFormat-4               4.70µs ± 1%    4.78µs ± 3%  +1.57%  (p=0.000 n=33+40)
      [Geo mean]                  710µs          709µs       -0.13%
      
      name                     old speed      new speed      delta
      GobDecode-4              7.96MB/s ± 4%  7.96MB/s ± 9%    ~     (p=0.174 n=40+40)
      GobEncode-4              9.65MB/s ± 0%  9.51MB/s ± 4%  -1.45%  (p=0.000 n=34+40)
      Gzip-4                   4.29MB/s ± 4%  4.26MB/s ± 4%  -0.59%  (p=0.000 n=40+40)
      Gunzip-4                 43.0MB/s ± 3%  43.9MB/s ± 0%  +1.90%  (p=0.000 n=40+32)
      JSONEncode-4             9.09MB/s ± 4%  9.22MB/s ± 0%    ~     (p=0.429 n=40+31)
      JSONDecode-4             2.25MB/s ± 5%  2.25MB/s ± 4%    ~     (p=0.278 n=40+40)
      GoParse-4                1.35MB/s ± 7%  1.37MB/s ± 0%    ~     (p=0.071 n=40+25)
      RegexpMatchEasy0_32-4    31.5MB/s ± 3%  31.5MB/s ± 4%  -0.08%  (p=0.018 n=40+40)
      RegexpMatchEasy0_1K-4     263MB/s ± 0%   259MB/s ± 3%  -1.51%  (p=0.000 n=31+40)
      RegexpMatchEasy1_32-4    33.1MB/s ± 4%  32.6MB/s ± 3%  -1.38%  (p=0.000 n=40+40)
      RegexpMatchEasy1_1K-4     160MB/s ± 4%   159MB/s ± 3%    ~     (p=0.364 n=40+40)
      RegexpMatchMedium_32-4    565kB/s ± 3%   562kB/s ± 2%    ~     (p=0.208 n=40+40)
      RegexpMatchMedium_1K-4   1.82MB/s ± 0%  1.82MB/s ± 0%  -0.27%  (p=0.000 n=34+31)
      RegexpMatchHard_32-4     1.02MB/s ± 3%  1.03MB/s ± 4%  +1.04%  (p=0.000 n=32+40)
      RegexpMatchHard_1K-4     1.07MB/s ± 4%  1.08MB/s ± 4%  +0.94%  (p=0.003 n=40+40)
      Revcomp-4                32.6MB/s ± 7%  32.6MB/s ± 4%    ~     (p=0.965 n=40+40)
      Template-4               2.18MB/s ± 6%  2.22MB/s ± 3%  +1.83%  (p=0.020 n=40+31)
      [Geo mean]               7.77MB/s       7.78MB/s       +0.16%
      
      3. There is little change in the compilecmp benchmark (excluding noise).
      name        old time/op       new time/op       delta
      Template          2.37s ± 3%        2.35s ± 4%    ~     (p=0.529 n=10+10)
      Unicode           1.38s ± 8%        1.36s ± 5%    ~     (p=0.247 n=10+10)
      GoTypes           8.10s ± 2%        8.10s ± 2%    ~     (p=0.971 n=10+10)
      Compiler          40.5s ± 4%        40.8s ± 1%    ~     (p=0.529 n=10+10)
      SSA                115s ± 2%         115s ± 3%    ~     (p=0.684 n=10+10)
      Flate             1.45s ± 5%        1.46s ± 3%    ~     (p=0.796 n=10+10)
      GoParser          1.86s ± 4%        1.84s ± 2%    ~     (p=0.095 n=9+10)
      Reflect           5.11s ± 2%        5.13s ± 2%    ~     (p=0.315 n=10+10)
      Tar               2.22s ± 3%        2.23s ± 1%    ~     (p=0.299 n=9+7)
      XML               2.72s ± 3%        2.72s ± 3%    ~     (p=0.912 n=10+10)
      [Geo mean]        5.03s             5.02s       -0.21%
      
      name        old user-time/op  new user-time/op  delta
      Template          2.92s ± 2%        2.89s ± 1%    ~     (p=0.247 n=10+10)
      Unicode           1.71s ± 5%        1.69s ± 4%    ~     (p=0.393 n=10+10)
      GoTypes           9.78s ± 2%        9.76s ± 2%    ~     (p=0.631 n=10+10)
      Compiler          49.1s ± 2%        49.1s ± 1%    ~     (p=0.796 n=10+10)
      SSA                144s ± 1%         144s ± 2%    ~     (p=0.796 n=10+10)
      Flate             1.74s ± 2%        1.73s ± 3%    ~     (p=0.842 n=10+9)
      GoParser          2.23s ± 3%        2.25s ± 2%    ~     (p=0.143 n=10+10)
      Reflect           5.93s ± 3%        5.98s ± 2%    ~     (p=0.211 n=10+9)
      Tar               2.65s ± 2%        2.69s ± 3%  +1.51%  (p=0.010 n=9+10)
      XML               3.25s ± 2%        3.21s ± 1%  -1.24%  (p=0.035 n=10+9)
      [Geo mean]        6.07s             6.07s       -0.08%
      
      name        old text-bytes    new text-bytes    delta
      HelloSize         641kB ± 0%        641kB ± 0%    ~     (all equal)
      
      name        old data-bytes    new data-bytes    delta
      HelloSize        9.46kB ± 0%       9.46kB ± 0%    ~     (all equal)
      
      name        old bss-bytes     new bss-bytes     delta
      HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize        1.24MB ± 0%       1.24MB ± 0%    ~     (all equal)
      
      Change-Id: Id095d998c380eef929755124084df02446a6b7c1
      Reviewed-on: https://go-review.googlesource.com/92555
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • misc/cgo/testcshared: increase sleep in TestUnexportedSymbols · 3773cbba
      Ian Lance Taylor authored
      Increase the sleep and wait for up to 2 seconds for the dup2.
      Apparently it can sometimes take a long time.
      
      Fixes #23784
      
      Change-Id: I929530b057bbcd842b28a7640c39dd68d719ff7d
      Reviewed-on: https://go-review.googlesource.com/93895
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • path/filepath: fix escaped chars in Glob on non-Windows · 03f27d5f
      Daniel Martí authored
      Backslashes are ignored in Match and Glob on Windows, since those
      collide with the separator character. However, they should still work in
      both functions on other operating systems.
      
      hasMeta did not reflect this logic - it always treated a backslash as a
      non-special character. Do that only on Windows.
      
      Assuming this is what the TODO was referring to, remove it. There are no
      other characters that scanChunk treats specially.
      
      Fixes #23418.
      
      Change-Id: Ie0bd795812e0ed9d8c8c1bbc3137f29d960cba84
      Reviewed-on: https://go-review.googlesource.com/87455
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
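Illustrative sketch (not code from the commit): on non-Windows systems a backslash in a `filepath.Match` pattern escapes the next character, so `\*` matches a literal asterisk. On Windows the backslash is a path separator and is not treated as an escape. The helpers `matchEscaped` and `escapesSupported` are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// matchEscaped is a thin wrapper so the behavior can be checked per platform.
func matchEscaped(pattern, name string) (bool, error) {
	return filepath.Match(pattern, name)
}

// escapesSupported reports whether backslash escapes apply on this platform.
func escapesSupported() bool {
	return runtime.GOOS != "windows"
}

func main() {
	// The pattern is: 'a', an escaped literal '*', then 'b'.
	matched, err := matchEscaped(`a\*b`, "a*b")
	fmt.Println(runtime.GOOS, escapesSupported(), matched, err)
}
```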
    • path: remove filename mentions from pattern godocs · 821b04da
      Daniel Martí authored
      path.Match works purely with strings, not file paths. That's what sets
      it apart from filepath.Match. For example, only filepath.Match will
      change its behavior towards backslashes on Windows, to accommodate
      the file path separator on that system.
      
      As such, path.Match should make no mention of file names. Nor should
      path.ErrBadPattern mention globbing at all - the package has no notion
      of globbing, and the error concerns only patterns.
      
      For a similar reason, remove the mention of globbing from
      filepath.ErrBadPattern. The error isn't reserved to just globbing, as it
      can be returned from filepath.Match. And, as before, it only concerns
      the patterns themselves.
      
      Change-Id: I58a83ffa3e2549625d8e546ef916652525504bd1
      Reviewed-on: https://go-review.googlesource.com/87857
      Reviewed-by: Rob Pike <r@golang.org>
      821b04da
    • Alberto Donizetti's avatar
      math/big: fix %s verbs in Float tests error messages · 331092c5
      Alberto Donizetti authored
      Fatalf calls in two Float tests use the %s verb with Float values,
      which is not allowed and results in failure messages that look like
      this:
      
          float_test.go:1385: i = 0, prec = 1, ToZero:
                           %!s(*big.Float=1) [0]
                      /    %!s(*big.Float=1) [0]
                      =    %!s(*big.Float=0.0625)
                      want %!s(*big.Float=1)
      
      Switch to %v.
      
      Change-Id: Ifdc80bf19c91ca1b190f6551a6d0a51b42ed5919
      Reviewed-on: https://go-review.googlesource.com/87199
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      331092c5
    • Martin Möhrmann's avatar
      cmd/compile: change type of clear argument of ordercopyexpr to bool · 3d4c9cec
      Martin Möhrmann authored
      ordercopyexpr is only called with 0 or 1 as value for the clear
      argument. The clear variable in ordercopyexpr is only used in the
      call to ordertemp which has a clear argument of type bool.
      
      Change the clear argument of ordercopyexpr from int to bool and change
      calls to ordercopyexpr to use false instead of 0 and true instead of 1.
      
      Passes toolstash -cmp.
      
      Change-Id: Ic264aafd3b0c8b99f6ef028ffaa2e30f23f9125a
      Reviewed-on: https://go-review.googlesource.com/88115
      Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
      3d4c9cec
    • Martin Möhrmann's avatar
      internal/cpu: make arm64 capability bits naming less verbose · 57020705
      Martin Möhrmann authored
      This makes the constant names less verbose and aligns them more
      with the Linux kernel which uses HWCAP_XXX for the constant names.
      
      Change-Id: Ia7d079b59b57978adc045945951eaa1d99b41fac
      Reviewed-on: https://go-review.googlesource.com/91738
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      57020705
    • Tobias Klauser's avatar
      runtime: add symbol for AT_FDCWD on Linux amd64 and mips64x · 0e1bcfc6
      Tobias Klauser authored
      Also order the syscall number list numerically for mips64x.
      
      Follow-up for CL 92895.
      
      Change-Id: I5f01f8c626132a06160997fce8a2aef0c486bb1c
      Reviewed-on: https://go-review.googlesource.com/93616
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      0e1bcfc6
    • Agniva De Sarker's avatar
      doc/articles/wiki: highlight the use of _ warning · 32a0a1d3
      Agniva De Sarker authored
      This moves the paragraph mentioning the use of _ higher up
      to emphasize the warning and thereby reduce the chances of getting
      stuck.
      
      Fixes #22617
      
      Change-Id: I64352a3e966a22d86fc9d381332bade49d74714a
      Reviewed-on: https://go-review.googlesource.com/87375
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      32a0a1d3
    • Tim Cooper's avatar
      encoding/hex: fix potential incorrect Dumper output when Close is called multiple times · 0519126a
      Tim Cooper authored
      Fixes #23574
      
      Change-Id: I69573de47daa6fd53cc99a78c0c4b867460242e3
      Reviewed-on: https://go-review.googlesource.com/90275
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      0519126a
    • Keith Randall's avatar
      cmd/compile: fix constant folding of right shifts · 755b36aa
      Keith Randall authored
      The sub-word shifts need to sign-extend before shifting, to avoid
      bringing in data from higher in the argument.
      
      Fixes #23812
      
      Change-Id: I0a95a0b49c48f3b40b85765bb4a9bb492be0cd73
      Reviewed-on: https://go-review.googlesource.com/93716
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      755b36aa
  2. 13 Feb, 2018 5 commits