1. 22 May, 2018 32 commits
    • cmd/compile: make LivenessMap dense · 75dadbec
      Austin Clements authored
      Currently liveness information is kept in a map keyed by *ssa.Value.
      This made sense when liveness information was sparse, but now we have
      liveness for nearly every ssa.Value. There's a fair amount of memory
      and CPU overhead to this map now.
      
      This CL replaces this map with a slice indexed by value ID.
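
      A minimal sketch of the idea, not the compiler's actual code: the
      Value and ID types below are simplified stand-ins for ssa.Value and
      ssa.ID, and denseMap, newDenseMap, set, and get are hypothetical
      names. The point is only that a slice indexed by a small integer ID
      avoids hashing and per-entry map overhead when nearly every ID has
      an entry.

```go
package main

import "fmt"

// ID and Value are simplified placeholders for ssa.ID and ssa.Value.
type ID int32

type Value struct{ ID ID }

// denseMap holds one liveness index per value ID.
// A negative entry means "nothing recorded for this value".
type denseMap []int

// newDenseMap allocates one slot per value and marks every slot unset.
func newDenseMap(numValues int) denseMap {
	m := make(denseMap, numValues)
	for i := range m {
		m[i] = -1
	}
	return m
}

// set records idx for v; v.ID must be in range.
func (m denseMap) set(v *Value, idx int) { m[v.ID] = idx }

// get returns the recorded index for v, or -1 if none was recorded.
func (m denseMap) get(v *Value) int {
	if int(v.ID) >= len(m) {
		return -1
	}
	return m[v.ID]
}

func main() {
	vals := []*Value{{ID: 0}, {ID: 1}, {ID: 2}}
	m := newDenseMap(len(vals))
	m.set(vals[1], 7)
	fmt.Println(m.get(vals[0]), m.get(vals[1]))
}
```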
      
      Passes toolstash -cmp.
      
      name        old time/op       new time/op       delta
      Template          197ms ± 1%        194ms ± 1%  -1.60%  (p=0.000 n=9+10)
      Unicode           100ms ± 2%         99ms ± 1%  -1.31%  (p=0.012 n=8+10)
      GoTypes           695ms ± 1%        689ms ± 0%  -0.94%  (p=0.000 n=10+10)
      Compiler          3.34s ± 2%        3.29s ± 1%  -1.26%  (p=0.000 n=10+9)
      SSA               8.08s ± 0%        8.02s ± 2%  -0.70%  (p=0.034 n=8+10)
      Flate             133ms ± 1%        131ms ± 1%  -1.04%  (p=0.006 n=10+9)
      GoParser          163ms ± 1%        162ms ± 1%  -0.79%  (p=0.034 n=8+10)
      Reflect           459ms ± 1%        454ms ± 0%  -1.06%  (p=0.000 n=10+8)
      Tar               186ms ± 1%        185ms ± 1%  -0.87%  (p=0.003 n=9+9)
      XML               238ms ± 1%        235ms ± 1%  -1.01%  (p=0.004 n=8+9)
      [Geo mean]        418ms             414ms       -1.06%
      
      name        old alloc/op      new alloc/op      delta
      Template         36.4MB ± 0%       35.6MB ± 0%  -2.29%  (p=0.000 n=9+10)
      Unicode          29.7MB ± 0%       29.5MB ± 0%  -0.68%  (p=0.000 n=10+10)
      GoTypes           119MB ± 0%        117MB ± 0%  -2.30%  (p=0.000 n=9+9)
      Compiler          546MB ± 0%        532MB ± 0%  -2.47%  (p=0.000 n=10+10)
      SSA              1.59GB ± 0%       1.55GB ± 0%  -2.41%  (p=0.000 n=10+10)
      Flate            24.9MB ± 0%       24.5MB ± 0%  -1.77%  (p=0.000 n=8+10)
      GoParser         29.5MB ± 0%       28.7MB ± 0%  -2.60%  (p=0.000 n=9+10)
      Reflect          81.7MB ± 0%       80.5MB ± 0%  -1.49%  (p=0.000 n=10+10)
      Tar              35.7MB ± 0%       35.1MB ± 0%  -1.64%  (p=0.000 n=10+10)
      XML              45.0MB ± 0%       43.7MB ± 0%  -2.76%  (p=0.000 n=9+10)
      [Geo mean]       80.1MB            78.4MB       -2.04%
      
      name        old allocs/op     new allocs/op     delta
      Template           336k ± 0%         335k ± 0%  -0.31%  (p=0.000 n=9+10)
      Unicode            339k ± 0%         339k ± 0%  -0.05%  (p=0.000 n=10+10)
      GoTypes           1.18M ± 0%        1.18M ± 0%  -0.26%  (p=0.000 n=10+10)
      Compiler          4.96M ± 0%        4.94M ± 0%  -0.24%  (p=0.000 n=10+10)
      SSA               12.6M ± 0%        12.5M ± 0%  -0.30%  (p=0.000 n=10+10)
      Flate              224k ± 0%         223k ± 0%  -0.30%  (p=0.000 n=10+10)
      GoParser           282k ± 0%         281k ± 0%  -0.32%  (p=0.000 n=10+10)
      Reflect            965k ± 0%         963k ± 0%  -0.27%  (p=0.000 n=9+10)
      Tar                331k ± 0%         330k ± 0%  -0.27%  (p=0.000 n=10+10)
      XML                393k ± 0%         392k ± 0%  -0.26%  (p=0.000 n=10+10)
      [Geo mean]         763k              761k       -0.26%
      
      Updates #24543.
      
      Change-Id: I4cfd2461510d3c026a262760bca225dc37482341
      Reviewed-on: https://go-review.googlesource.com/110178
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      75dadbec
    • cmd/compile: incrementally compact liveness maps · 3c36b8be
      Austin Clements authored
      The per-Value slice of liveness maps is currently one of the largest
      sources of allocation in the compiler. On cmd/compile/internal/ssa,
      it's 5% of overall allocation, or 75MB in total. Enabling liveness
      maps everywhere significantly increased this allocation footprint,
      which in turn slowed down the compiler.
      
      Improve this by compacting the liveness maps after every block is
      processed. There are typically very few distinct liveness maps, so
      compacting the maps after every block, rather than at the end of the
      function, can significantly reduce these allocations.
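
      The effect of per-block compaction can be sketched as follows. This
      is an illustration under assumptions, not the compiler's
      implementation: bitmap, compactor, and compactBlock are hypothetical
      names, and a Go map keyed by a printed form of the bitmap stands in
      for the real hash table. Once a block's bitmaps have been replaced
      by small indexes, the per-value bitmaps for that block can be
      dropped instead of being held until the end of the function.

```go
package main

import "fmt"

// bitmap is a stand-in for the compiler's bit vector type.
type bitmap []uint32

// key turns a bitmap into a map key for this sketch.
func (b bitmap) key() string { return fmt.Sprint([]uint32(b)) }

// compactor interns bitmaps and hands out small integer indexes.
type compactor struct {
	index  map[string]int
	unique []bitmap
}

func newCompactor() *compactor {
	return &compactor{index: map[string]int{}}
}

// add returns the index of b, storing a copy the first time b is seen.
func (c *compactor) add(b bitmap) int {
	k := b.key()
	if i, ok := c.index[k]; ok {
		return i
	}
	i := len(c.unique)
	c.index[k] = i
	c.unique = append(c.unique, append(bitmap(nil), b...))
	return i
}

// compactBlock replaces one block's per-value bitmaps with indexes.
// The caller can release maps as soon as this returns.
func (c *compactor) compactBlock(maps []bitmap) []int {
	out := make([]int, len(maps))
	for i, b := range maps {
		out[i] = c.add(b)
	}
	return out
}

func main() {
	c := newCompactor()
	block1 := []bitmap{{1}, {1}, {3}}
	block2 := []bitmap{{3}, {1}}
	fmt.Println(c.compactBlock(block1), c.compactBlock(block2), len(c.unique))
}
```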
      
      Passes toolstash -cmp.
      
      name        old time/op       new time/op       delta
      Template          198ms ± 2%        196ms ± 1%  -1.11%  (p=0.008 n=9+10)
      Unicode           100ms ± 1%         99ms ± 1%  -0.94%  (p=0.015 n=8+9)
      GoTypes           703ms ± 2%        695ms ± 1%  -1.15%  (p=0.000 n=10+10)
      Compiler          3.38s ± 3%        3.33s ± 0%  -1.66%  (p=0.000 n=10+9)
      SSA               7.96s ± 1%        7.93s ± 1%    ~     (p=0.113 n=9+10)
      Flate             134ms ± 1%        132ms ± 1%  -1.30%  (p=0.000 n=8+10)
      GoParser          165ms ± 2%        163ms ± 1%  -1.32%  (p=0.013 n=9+10)
      Reflect           462ms ± 2%        459ms ± 0%  -0.65%  (p=0.036 n=9+8)
      Tar               188ms ± 2%        186ms ± 1%    ~     (p=0.173 n=8+10)
      XML               243ms ± 7%        239ms ± 1%    ~     (p=0.684 n=10+10)
      [Geo mean]        421ms             416ms       -1.10%
      
      name        old alloc/op      new alloc/op      delta
      Template         38.0MB ± 0%       36.5MB ± 0%  -3.98%  (p=0.000 n=10+10)
      Unicode          30.3MB ± 0%       29.6MB ± 0%  -2.21%  (p=0.000 n=10+10)
      GoTypes           125MB ± 0%        120MB ± 0%  -4.51%  (p=0.000 n=10+9)
      Compiler          575MB ± 0%        546MB ± 0%  -5.06%  (p=0.000 n=10+10)
      SSA              1.64GB ± 0%       1.55GB ± 0%  -4.97%  (p=0.000 n=10+10)
      Flate            25.9MB ± 0%       25.0MB ± 0%  -3.41%  (p=0.000 n=10+10)
      GoParser         30.7MB ± 0%       29.5MB ± 0%  -3.97%  (p=0.000 n=10+10)
      Reflect          84.1MB ± 0%       81.9MB ± 0%  -2.64%  (p=0.000 n=10+10)
      Tar              37.0MB ± 0%       35.8MB ± 0%  -3.27%  (p=0.000 n=10+9)
      XML              47.2MB ± 0%       45.0MB ± 0%  -4.57%  (p=0.000 n=10+10)
      [Geo mean]       83.2MB            79.9MB       -3.86%
      
      name        old allocs/op     new allocs/op     delta
      Template           337k ± 0%         337k ± 0%  -0.06%  (p=0.000 n=10+10)
      Unicode            340k ± 0%         340k ± 0%  -0.01%  (p=0.014 n=10+10)
      GoTypes           1.18M ± 0%        1.18M ± 0%  -0.04%  (p=0.000 n=10+10)
      Compiler          4.97M ± 0%        4.97M ± 0%  -0.03%  (p=0.000 n=10+10)
      SSA               12.3M ± 0%        12.3M ± 0%  -0.01%  (p=0.000 n=10+10)
      Flate              226k ± 0%         225k ± 0%  -0.09%  (p=0.000 n=10+10)
      GoParser           283k ± 0%         283k ± 0%  -0.06%  (p=0.000 n=10+9)
      Reflect            972k ± 0%         971k ± 0%  -0.04%  (p=0.000 n=10+8)
      Tar                333k ± 0%         332k ± 0%  -0.05%  (p=0.000 n=10+9)
      XML                395k ± 0%         395k ± 0%  -0.04%  (p=0.000 n=10+10)
      [Geo mean]         764k              764k       -0.04%
      
      Updates #24543.
      
      Change-Id: I6fdc46e4ddb6a8eea95d38242345205eb8397f0b
      Reviewed-on: https://go-review.googlesource.com/110177
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      3c36b8be
    • cmd/compile: abstract bvec sets · 3c4aaf8a
      Austin Clements authored
      This moves the bvec hash table logic out of Liveness.compact and into
      a bvecSet type. Furthermore, the bvecSet type has the ability to grow
      dynamically, which the current implementation doesn't. In addition to
      making the code cleaner, this will make it possible to incrementally
      compact liveness bitmaps.
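
      For readers unfamiliar with the pattern, here is a rough sketch of
      a growable hash set of bit vectors. It assumes open addressing with
      linear probing and an FNV-1a style hash; the method names, the
      initial table size, and the load factor are illustrative choices
      and are not taken from the CL.

```go
package main

import "fmt"

// bvec is a stand-in for the compiler's bit vector type.
type bvec []uint32

func (b bvec) equal(o bvec) bool {
	if len(b) != len(o) {
		return false
	}
	for i := range b {
		if b[i] != o[i] {
			return false
		}
	}
	return true
}

// hash is FNV-1a over the bytes of each word (an assumption of this sketch).
func (b bvec) hash() uint32 {
	h := uint32(2166136261)
	for _, w := range b {
		for i := 0; i < 4; i++ {
			h ^= w & 0xff
			h *= 16777619
			w >>= 8
		}
	}
	return h
}

// bvecSet maps each distinct bvec to a dense index.
type bvecSet struct {
	index []int  // hash slots: -1 is empty, otherwise an index into uniq
	uniq  []bvec // distinct vectors in insertion order
}

func newSlots(n int) []int {
	s := make([]int, n)
	for i := range s {
		s[i] = -1
	}
	return s
}

func newBvecSet() *bvecSet { return &bvecSet{index: newSlots(8)} }

// grow doubles the table and reinserts every known vector.
func (s *bvecSet) grow() {
	n := newSlots(2 * len(s.index))
	for id, b := range s.uniq {
		h := int(b.hash() % uint32(len(n)))
		for n[h] >= 0 {
			h = (h + 1) % len(n)
		}
		n[h] = id
	}
	s.index = n
}

// add returns the index of b, inserting it if it is new.
func (s *bvecSet) add(b bvec) int {
	if len(s.uniq) >= len(s.index)/2 {
		s.grow() // keep the table at most half full so probing terminates
	}
	h := int(b.hash() % uint32(len(s.index)))
	for {
		id := s.index[h]
		if id < 0 {
			s.index[h] = len(s.uniq)
			s.uniq = append(s.uniq, b)
			return len(s.uniq) - 1
		}
		if s.uniq[id].equal(b) {
			return id
		}
		h = (h + 1) % len(s.index)
	}
}

func main() {
	s := newBvecSet()
	fmt.Println(s.add(bvec{1}), s.add(bvec{2}), s.add(bvec{1}))
}
```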
      
      Passes toolstash -cmp
      
      Updates #24543.
      
      Change-Id: I46c53e504494206061a1f790ae4a02d768a65681
      Reviewed-on: https://go-review.googlesource.com/110176
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      3c4aaf8a
    • cmd/compile: single pass over Blocks in Liveness.epilogue · 577c05ca
      Austin Clements authored
      Currently Liveness.epilogue makes three passes over the Blocks, but
      there's no need to do this. Combine them into a single pass. This
      eliminates the need for blockEffects.lastbitmapindex, but, more
      importantly, will let us incrementally compact the liveness bitmaps
      and significantly reduce allocations in Liveness.epilogue.
      
      Passes toolstash -cmp.
      
      Updates #24543.
      
      Change-Id: I27802bcd00d23aa122a7ec16cdfd739ae12dd7aa
      Reviewed-on: https://go-review.googlesource.com/110175
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      577c05ca
    • cmd/vendor/README: temporary instruction for update · 51be90a2
      Hana Kim authored
      Until vgo sorts out and cleans up the vendoring process.
      
      Ran govendor to update the packages that cmd/pprof depends on,
      which resulted in the deletion of some unnecessary files.
      
      Change-Id: Idfba53e94414e90a5e280222750a6df77e979a16
      Reviewed-on: https://go-review.googlesource.com/114079
      Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
      Reviewed-by: Daniel Theophanes <kardianos@gmail.com>
      51be90a2
    • cmd/link: revert DWARF version to 2 for .debug_lines · 402dd10a
      David Chase authored
      On OSX 10.12 and earlier, paired with Xcode 9.0,
      specifying DWARF version 3 causes dsymutil to misbehave.
      Version 2 appears to be good enough to allow processing
      of the prologue_end opcode on (at least one version of)
      Linux and OSX 10.13.
      
      Fixes #25451.
      
      Change-Id: Ic760e34248393a5386be96351c8e492da1d3413b
      Reviewed-on: https://go-review.googlesource.com/114015
      Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      402dd10a
    • internal/cpu: fix test build on ppc64 · 685ecc7f
      Martin Möhrmann authored
      The runtime import is unused.
      
      Change-Id: I37fe210256ddafa579d9e6d64f3f0db78581974e
      Reviewed-on: https://go-review.googlesource.com/114175
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      Reviewed-by: Austin Clements <austin@google.com>
      685ecc7f
    • runtime: fix defer matching of leaf functions on LR machines · e391fade
      Austin Clements authored
      Traceback matches the defer stack with the function call stack using
      the SP recorded in defer frames when the defer frame is created.
      However, on LR machines this is ambiguous: if function A pushes a
      defer and then calls function B, where B is a leaf function with a
      zero-sized frame, then both A and B have the same SP and will *both*
      match the defer on the defer stack. Since traceback unwinds through B
      first, it will incorrectly match up the defer with B's frame instead
      of A's frame.
      
      Where this goes particularly wrong is if function B causes a signal
      that turns into a panic (e.g., a nil pointer dereference). In order to
      handle the fact that we may not have a liveness map at the location
      that caused the signal and injected a sigpanic call, traceback has
      logic to unwind the panicking frame's continuation PC to the PC where
      the most recent defer was pushed (this is safe because the frame is
      dead other than any defers it pushed). However, if traceback
      mis-matches the defer stack, it winds up reporting that B's
      continuation PC is in A. If the runtime then uses this continuation PC
      to look up PCDATA in B, it will panic because the PC is out of range
      for B. This failure mode can be seen in
      sync/atomic/atomic_test.go:TestNilDeref. An example failure is:
      https://build.golang.org/log/8e07a762487839252af902355f6b1379dbd463c5
      
      This CL fixes all of this by recognizing that a function that pushes a
      defer must also have a non-zero-sized frame and using this fact to
      refine the defer matching logic.
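
      One way to read this refinement is sketched below. It is a toy
      model, not the runtime's traceback code: frame, ownsDefer, and
      findDeferFrame are hypothetical names, and the rule shown (match on
      SP only if the frame size is non-zero) is an assumption drawn from
      the description above. With it, a zero-sized leaf frame that shares
      its caller's SP no longer claims the caller's defer.

```go
package main

import "fmt"

// frame is a simplified stack frame: sp is the stack pointer while the
// frame is active and size is the frame size in bytes.
type frame struct {
	name string
	sp   uintptr
	size uintptr
}

// ownsDefer reports whether f could have pushed a defer recorded at deferSP.
// A frame that pushes a defer cannot be zero-sized, so zero-sized frames
// never match even when their SP is equal.
func ownsDefer(f frame, deferSP uintptr) bool {
	return f.sp == deferSP && f.size != 0
}

// findDeferFrame walks frames from innermost to outermost and returns the
// name of the first frame that owns the defer, or "" if none does.
func findDeferFrame(frames []frame, deferSP uintptr) string {
	for _, f := range frames {
		if ownsDefer(f, deferSP) {
			return f.name
		}
	}
	return ""
}

func main() {
	// B is a leaf with a zero-sized frame, so it shares A's SP.
	frames := []frame{
		{name: "B", sp: 0x1000, size: 0},
		{name: "A", sp: 0x1000, size: 32},
	}
	fmt.Println(findDeferFrame(frames, 0x1000))
}
```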
      
      Fixes the build for arm64, mips, mipsle, ppc64, ppc64le, and s390x.
      
      Fixes #25499.
      
      Change-Id: Iff7c01d08ad42f3de22b3a73658cc2f674900101
      Reviewed-on: https://go-review.googlesource.com/114078
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      e391fade
    • internal/cpu: add experiment to disable CPU features with GODEBUGCPU · f045ddc6
      Martin Möhrmann authored
      Needs the Go compiler to be built with GOEXPERIMENT=debugcpu to be active.
      
      The GODEBUGCPU environment variable can be used to disable usage of
      specific processor features in the Go standard library.
      This is useful for testing and benchmarking different code paths that
      are guarded by internal/cpu variable checks.
      
      Use of processor features cannot be enabled through GODEBUGCPU.
      
      To disable usage of AVX and SSE41 cpu features on GOARCH amd64 use:
      GODEBUGCPU=avx=0,sse41=0
      
      The special "all" option can be used to disable all options:
      GODEBUGCPU=all=0
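
      The option syntax above could be interpreted roughly as in the
      following sketch. It is not the internal/cpu implementation (which
      cannot import the strings package): applyDebugCPU and the features
      map are hypothetical, and the handling of malformed or unknown
      options is an assumption. Only "=0" has any effect, matching the
      rule that features cannot be enabled this way.

```go
package main

import (
	"fmt"
	"strings"
)

// applyDebugCPU disables entries of features according to a
// GODEBUGCPU-style string such as "avx=0,sse41=0" or "all=0".
func applyDebugCPU(env string, features map[string]bool) {
	if env == "" {
		return
	}
	for _, field := range strings.Split(env, ",") {
		i := strings.IndexByte(field, '=')
		if i < 0 {
			continue // malformed option: ignored in this sketch
		}
		key, value := field[:i], field[i+1:]
		if value != "0" {
			continue // options can only disable features
		}
		if key == "all" {
			for name := range features {
				features[name] = false
			}
			continue
		}
		if _, known := features[key]; known {
			features[key] = false
		}
	}
}

func main() {
	features := map[string]bool{"avx": true, "sse41": true, "sse2": true}
	applyDebugCPU("avx=0,sse41=0", features)
	fmt.Println(features["avx"], features["sse41"], features["sse2"])
}
```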
      
      Updates #12805
      Updates #15403
      
      Change-Id: I699c2e6f74d98472b6fb4b1e5ffbf29b15697aab
      Reviewed-on: https://go-review.googlesource.com/91737
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      f045ddc6
    • go/build: call ctxt.match for checking file name constraints · 3d15f768
      Zhongpeng Lin authored
      This makes the checking of build tags in file names consistent with that of the build tags in `// +build` lines.
      
      Fixes #25461
      
      Change-Id: Iba14d1050f8aba44e7539ab3b8711af1980ccfe4
      GitHub-Last-Rev: 11b14e239dd85e11e669919aab45494aee7c59a3
      GitHub-Pull-Request: golang/go#25480
      Reviewed-on: https://go-review.googlesource.com/113818
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      3d15f768
    • crypto/x509: document fields used in CreateCertificate · a10d3906
      Martin Sucha authored
      The added fields are used in buildExtensions so
      should be documented too.
      
      Fixes #21363
      
      Change-Id: Ifcc11da5b690327946c2488bcf4c79c60175a339
      Reviewed-on: https://go-review.googlesource.com/113916
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      a10d3906
    • crypto/x509: reformat template members in docs · 5c8f65b9
      Martin Sucha authored
      It's easier to skim a list of items visually when the
      items are each on a separate line. Separate lines also
      help reduce diff size when items are added/removed.
      
      The list is indented so that it's displayed preformatted
      in HTML output as godoc doesn't support formatting lists
      natively yet (see #7873).
      
      Change-Id: Ibf9e92437e4b464ba58ea3ccef579e8df4745d75
      Reviewed-on: https://go-review.googlesource.com/113915
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      5c8f65b9
    • log/syslog: skip tests that depend on daemon on builders · 37c11dc0
      Alberto Donizetti authored
      Some functions in log/syslog depend on syslogd running. Instead of
      treating errors caused by the daemon not running as test failures,
      ignore them and skip the test.
      
      Fixes the longtest builder.
      
      Change-Id: I628fe4aab5f1a505edfc0748861bb976ed5917ea
      Reviewed-on: https://go-review.googlesource.com/113838
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      37c11dc0
    • encoding/base32: remove redundant conditional · faa69b90
      dchenk authored
      Immediately following the conditional block removed here is a loop
      which checks exactly what the conditional already checked, so the
      entire conditional is redundant.
      
      Change-Id: I892fd9f2364d87e2c1cacb0407531daec6643183
      Reviewed-on: https://go-review.googlesource.com/114000
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      faa69b90
    • cmd/compile: add rulegen diagnostic · 31ef3846
      Keith Randall authored
      When rulegen complains about a missing type, report the line number
      in the rules file.
      
      Change-Id: Ic7c19e1d5f29547911909df5788945848a6080ff
      Reviewed-on: https://go-review.googlesource.com/114004
      Reviewed-by: David Chase <drchase@google.com>
      31ef3846
    • runtime: support for debugger function calls · c5ed10f3
      Austin Clements authored
      This adds a mechanism for debuggers to safely inject calls to Go
      functions on amd64. Debuggers must participate in a protocol with the
      runtime, and need to know how to lay out a call frame, but the runtime
      support takes care of the details of handling live pointers in
      registers, stack growth, and detecting the trickier conditions when it
      is unsafe to inject a user function call.
      
      Fixes #21678.
      Updates derekparker/delve#119.
      
      Change-Id: I56d8ca67700f1f77e19d89e7fc92ab337b228834
      Reviewed-on: https://go-review.googlesource.com/109699
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      c5ed10f3
    • cmd/compile, cmd/internal/obj: record register maps in binary · 9f95c9db
      Austin Clements authored
      This adds FUNCDATA and PCDATA that record the register maps much like
      the existing live arguments maps and live locals maps. The register
      map is indexed independently from the argument and locals maps since
      changes in register liveness tend not to correlate with changes to
      argument and local liveness.
      
      This is the final CL toward adding safe-points everywhere. The
      following CLs will optimize liveness analysis to bring down the cost.
      The effect of this CL is:
      
      name        old time/op       new time/op       delta
      Template          195ms ± 2%        197ms ± 1%    ~     (p=0.136 n=9+9)
      Unicode          98.4ms ± 2%       99.7ms ± 1%  +1.39%  (p=0.004 n=10+10)
      GoTypes           685ms ± 1%        700ms ± 1%  +2.06%  (p=0.000 n=9+9)
      Compiler          3.28s ± 1%        3.34s ± 0%  +1.71%  (p=0.000 n=9+8)
      SSA               7.79s ± 1%        7.91s ± 1%  +1.55%  (p=0.000 n=10+9)
      Flate             133ms ± 2%        133ms ± 2%    ~     (p=0.190 n=10+10)
      GoParser          161ms ± 2%        164ms ± 3%  +1.83%  (p=0.015 n=10+10)
      Reflect           450ms ± 1%        457ms ± 1%  +1.62%  (p=0.000 n=10+10)
      Tar               183ms ± 2%        185ms ± 1%  +0.91%  (p=0.008 n=9+10)
      XML               234ms ± 1%        238ms ± 1%  +1.60%  (p=0.000 n=9+9)
      [Geo mean]        411ms             417ms       +1.40%
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize         1.47M ± 0%        1.51M ± 0%  +2.79%  (p=0.000 n=10+10)
      
      Compared to just before "cmd/internal/obj: consolidate emitting entry
      stack map", the cumulative effect of adding stack maps everywhere and
      register maps is:
      
      name        old time/op       new time/op       delta
      Template          185ms ± 2%        197ms ± 1%   +6.42%  (p=0.000 n=10+9)
      Unicode          96.3ms ± 3%       99.7ms ± 1%   +3.60%  (p=0.000 n=10+10)
      GoTypes           658ms ± 0%        700ms ± 1%   +6.37%  (p=0.000 n=10+9)
      Compiler          3.14s ± 1%        3.34s ± 0%   +6.53%  (p=0.000 n=9+8)
      SSA               7.41s ± 2%        7.91s ± 1%   +6.71%  (p=0.000 n=9+9)
      Flate             126ms ± 1%        133ms ± 2%   +6.15%  (p=0.000 n=10+10)
      GoParser          153ms ± 1%        164ms ± 3%   +6.89%  (p=0.000 n=10+10)
      Reflect           437ms ± 1%        457ms ± 1%   +4.59%  (p=0.000 n=10+10)
      Tar               178ms ± 1%        185ms ± 1%   +4.18%  (p=0.000 n=10+10)
      XML               223ms ± 1%        238ms ± 1%   +6.39%  (p=0.000 n=10+9)
      [Geo mean]        394ms             417ms        +5.78%
      
      name        old alloc/op      new alloc/op      delta
      Template         34.5MB ± 0%       38.0MB ± 0%  +10.19%  (p=0.000 n=10+10)
      Unicode          29.3MB ± 0%       30.3MB ± 0%   +3.56%  (p=0.000 n=8+9)
      GoTypes           113MB ± 0%        125MB ± 0%  +10.89%  (p=0.000 n=10+10)
      Compiler          510MB ± 0%        575MB ± 0%  +12.79%  (p=0.000 n=10+10)
      SSA              1.46GB ± 0%       1.64GB ± 0%  +12.40%  (p=0.000 n=10+10)
      Flate            23.9MB ± 0%       25.9MB ± 0%   +8.56%  (p=0.000 n=10+10)
      GoParser         28.0MB ± 0%       30.8MB ± 0%  +10.08%  (p=0.000 n=10+10)
      Reflect          77.6MB ± 0%       84.3MB ± 0%   +8.63%  (p=0.000 n=10+10)
      Tar              34.1MB ± 0%       37.0MB ± 0%   +8.44%  (p=0.000 n=10+10)
      XML              42.7MB ± 0%       47.2MB ± 0%  +10.75%  (p=0.000 n=10+10)
      [Geo mean]       76.0MB            83.3MB        +9.60%
      
      name        old allocs/op     new allocs/op     delta
      Template           321k ± 0%         337k ± 0%   +4.98%  (p=0.000 n=10+10)
      Unicode            337k ± 0%         340k ± 0%   +1.04%  (p=0.000 n=10+9)
      GoTypes           1.13M ± 0%        1.18M ± 0%   +4.85%  (p=0.000 n=10+10)
      Compiler          4.67M ± 0%        4.96M ± 0%   +6.25%  (p=0.000 n=10+10)
      SSA               11.7M ± 0%        12.3M ± 0%   +5.69%  (p=0.000 n=10+10)
      Flate              216k ± 0%         226k ± 0%   +4.52%  (p=0.000 n=10+9)
      GoParser           271k ± 0%         283k ± 0%   +4.52%  (p=0.000 n=10+10)
      Reflect            927k ± 0%         972k ± 0%   +4.78%  (p=0.000 n=10+10)
      Tar                318k ± 0%         333k ± 0%   +4.56%  (p=0.000 n=10+10)
      XML                376k ± 0%         395k ± 0%   +5.04%  (p=0.000 n=10+10)
      [Geo mean]         730k              764k        +4.61%
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize         1.46M ± 0%        1.51M ± 0%   +3.66%  (p=0.000 n=10+10)
      
      For #24543.
      
      Change-Id: I91e003dc64151916b384274884bf02a2d6862547
      Reviewed-on: https://go-review.googlesource.com/109353
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      9f95c9db
    • cmd/compile: compute register liveness maps · 61158c16
      Austin Clements authored
      This extends the liveness analysis to track registers containing live
      pointers. We do this by tracking bitmaps for live pointer registers
      in parallel with bitmaps for stack variables.
      
      This does not yet do anything with these liveness maps, though they do
      appear in the debug output for -live=2.
      
      We'll optimize this in later CLs:
      
      name        old time/op       new time/op       delta
      Template          193ms ± 5%        195ms ± 2%    ~     (p=0.050 n=9+9)
      Unicode          97.7ms ± 2%       98.4ms ± 2%    ~     (p=0.315 n=9+10)
      GoTypes           674ms ± 2%        685ms ± 1%  +1.72%  (p=0.001 n=9+9)
      Compiler          3.21s ± 1%        3.28s ± 1%  +2.28%  (p=0.000 n=10+9)
      SSA               7.70s ± 1%        7.79s ± 1%  +1.07%  (p=0.015 n=10+10)
      Flate             130ms ± 3%        133ms ± 2%  +2.19%  (p=0.003 n=10+10)
      GoParser          159ms ± 3%        161ms ± 2%  +1.51%  (p=0.019 n=10+10)
      Reflect           444ms ± 1%        450ms ± 1%  +1.43%  (p=0.000 n=9+10)
      Tar               181ms ± 2%        183ms ± 2%  +1.45%  (p=0.010 n=10+9)
      XML               230ms ± 1%        234ms ± 1%  +1.56%  (p=0.000 n=8+9)
      [Geo mean]        405ms             411ms       +1.48%
      
      No effect on binary size because we're not yet emitting the register
      maps.
      
      For #24543.
      
      Change-Id: Ieb022f0aea89c0ea9a6f035195bce2f0e67dbae4
      Reviewed-on: https://go-review.googlesource.com/109352
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      61158c16
    • cmd/compile: dense numbering for GP registers · 840f25be
      Austin Clements authored
      For register maps, we need a dense numbering of registers that may
      contain pointers of interest to the garbage collector. Add this to
      Register and compute it from the GP register set.
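
      As an illustration of what a dense numbering means here, the sketch
      below assigns consecutive indexes to the registers selected by a
      bit mask and marks all others with -1. The register type, its
      gcNum field, and assignGCNums are hypothetical simplifications, and
      the sketch assumes at most 64 registers so the mask fits in a
      uint64.

```go
package main

import "fmt"

// register is a simplified register description. gcNum is its index in
// a register bitmap, or -1 if it is never recorded there.
type register struct {
	name  string
	gcNum int
}

// assignGCNums numbers the registers whose bit is set in gpMask
// consecutively from 0 and returns how many were numbered.
// Bit i of gpMask corresponds to regs[i].
func assignGCNums(regs []register, gpMask uint64) int {
	next := 0
	for i := range regs {
		if gpMask&(1<<uint(i)) != 0 {
			regs[i].gcNum = next
			next++
		} else {
			regs[i].gcNum = -1
		}
	}
	return next
}

func main() {
	regs := []register{{name: "AX"}, {name: "SP"}, {name: "BX"}, {name: "X0"}}
	// Treat only AX and BX (bits 0 and 2) as pointer-holding GP registers.
	n := assignGCNums(regs, 0x5)
	fmt.Println(n, regs)
}
```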
      
      For #24543.
      
      Change-Id: If6f0521effca5eca4d17895468b1fc52d67e0f32
      Reviewed-on: https://go-review.googlesource.com/109351
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      840f25be
    • cmd/compile: fix ARM64 build · 4f765b18
      Austin Clements authored
      Write barrier unsafe-point analysis needs to flow through
      OpARM64MOVWUload in c-shared mode.
      
      Change-Id: I4f06f54d9e74a739a1b4fcb9ab0a1ae9b7b88a95
      Reviewed-on: https://go-review.googlesource.com/114077
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: David Chase <drchase@google.com>
      4f765b18
    • cmd/compile: fix unsafe-point analysis with -N · 97bea970
      Austin Clements authored
      Compiling without optimizations (-N) can result in write barrier
      blocks that have been optimized away but not actually pruned from the
      block set. Fix unsafe-point analysis to recognize and ignore these.
      
      For #24543.
      
      Change-Id: I2ca86fb1a0346214ec71d7d6c17b6a121857b01d
      Reviewed-on: https://go-review.googlesource.com/114076
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      97bea970
    • cmd/asm: enable AVX512 · 5437cde9
      isharipo authored
      - Uncomment tests for AVX512 encoder
      - Permit instruction suffixes for x86
      - Permit limited reg list [reg-reg] syntax for x86 for multi-source ops
      - EVEX encoding support in obj/x86 (Z-cases, asmevex, etc.)
      - optabs and ytabs generated by x86avxgen (https://golang.org/cl/107216)
      
      Note: suffix formatting implemented with updated CConv function.
      Now arch asm backend should register formatting function by
      calling RegisterOpSuffix.
      
      Updates #22779
      
      Change-Id: I076a167ee49582700e058c56ad74e6696710c8c8
      Reviewed-on: https://go-review.googlesource.com/113315
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      5437cde9
    • test/codegen: improve test cases for arm64 · 8a85bce2
      Ben Shi authored
      1. Some incorrect test cases are disabled.
      2. Some wrong test cases are corrected.
      3. Some new test cases are added.
      
      Change-Id: Ib5d0473d55159f233ddab79f96967eaec7b08597
      Reviewed-on: https://go-review.googlesource.com/113736
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      8a85bce2
    • cmd/compile: output stack map index everywhere it changes · ed7a0682
      Austin Clements authored
      Currently, the code generator only considers outputting stack map
      indexes at CALL instructions. Raise this into the code generator loop
      itself so that changes in the stack map index at any instruction emit
      a PCDATA Prog before the actual instruction.
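
      The shape of that loop might look like the sketch below. It is a
      toy model rather than the code generator itself: inst and emit are
      hypothetical, and the "PCDATA stackmap=N" strings are placeholders
      for real PCDATA Progs. The point is that the marker is emitted
      whenever the index differs from the previous instruction's, not
      only at calls.

```go
package main

import (
	"fmt"
	"strconv"
)

// inst is a simplified instruction tagged with its stack map index.
type inst struct {
	op            string
	stackMapIndex int
}

// emit returns the instruction stream with a PCDATA marker inserted
// before every instruction whose stack map index differs from the
// previous instruction's index. The first instruction always gets one.
func emit(insts []inst) []string {
	var out []string
	have := false
	last := 0
	for _, in := range insts {
		if !have || in.stackMapIndex != last {
			out = append(out, "PCDATA stackmap="+strconv.Itoa(in.stackMapIndex))
			have, last = true, in.stackMapIndex
		}
		out = append(out, in.op)
	}
	return out
}

func main() {
	prog := []inst{{"MOVQ", 0}, {"ADDQ", 0}, {"CALL", 1}, {"RET", 1}}
	for _, line := range emit(prog) {
		fmt.Println(line)
	}
}
```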
      
      We'll optimize this in later CLs:
      
      name        old time/op       new time/op       delta
      Template          190ms ± 2%        191ms ± 2%    ~     (p=0.529 n=10+10)
      Unicode          96.4ms ± 1%       98.5ms ± 3%  +2.18%  (p=0.001 n=9+10)
      GoTypes           669ms ± 1%        673ms ± 1%  +0.62%  (p=0.004 n=9+9)
      Compiler          3.18s ± 1%        3.22s ± 1%  +1.06%  (p=0.000 n=10+9)
      SSA               7.59s ± 1%        7.64s ± 1%  +0.66%  (p=0.023 n=10+10)
      Flate             128ms ± 1%        130ms ± 2%  +1.07%  (p=0.043 n=10+10)
      GoParser          157ms ± 2%        158ms ± 3%    ~     (p=0.123 n=10+10)
      Reflect           442ms ± 1%        445ms ± 1%  +0.73%  (p=0.017 n=10+9)
      Tar               179ms ± 1%        180ms ± 1%  +0.58%  (p=0.019 n=9+9)
      XML               229ms ± 1%        232ms ± 2%  +1.27%  (p=0.009 n=10+10)
      [Geo mean]        401ms             405ms       +0.94%
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize         1.46M ± 0%        1.47M ± 0%  +0.84%  (p=0.000 n=10+10)
      [Geo mean]        1.46M             1.47M       +0.84%
      
      For #24543.
      
      Change-Id: I4bfe45b767c9d9db47308a27763b303fa75bfa54
      Reviewed-on: https://go-review.googlesource.com/109350
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      ed7a0682
    • cmd/compile: enable stack maps everywhere except unsafe points · a367f44c
      Austin Clements authored
      This modifies issafepoint in liveness analysis to report almost every
      operation as a safe point. There are four things we don't mark as
      safe-points:
      
      1. Runtime code (other than at calls).
      
      2. go:nosplit functions (other than at calls).
      
      3. Instructions between the load of the write barrier-enabled flag and
         the write.
      
      4. Instructions leading up to a uintptr -> unsafe.Pointer conversion.
      
      We'll optimize this in later CLs:
      
      name        old time/op       new time/op       delta
      Template          185ms ± 2%        190ms ± 2%   +2.95%  (p=0.000 n=10+10)
      Unicode          96.3ms ± 3%       96.4ms ± 1%     ~     (p=0.905 n=10+9)
      GoTypes           658ms ± 0%        669ms ± 1%   +1.72%  (p=0.000 n=10+9)
      Compiler          3.14s ± 1%        3.18s ± 1%   +1.56%  (p=0.000 n=9+10)
      SSA               7.41s ± 2%        7.59s ± 1%   +2.48%  (p=0.000 n=9+10)
      Flate             126ms ± 1%        128ms ± 1%   +2.08%  (p=0.000 n=10+10)
      GoParser          153ms ± 1%        157ms ± 2%   +2.38%  (p=0.000 n=10+10)
      Reflect           437ms ± 1%        442ms ± 1%   +0.98%  (p=0.001 n=10+10)
      Tar               178ms ± 1%        179ms ± 1%   +0.67%  (p=0.035 n=10+9)
      XML               223ms ± 1%        229ms ± 1%   +2.58%  (p=0.000 n=10+10)
      [Geo mean]        394ms             401ms        +1.75%
      
      No effect on binary size because we're not yet emitting these extra
      safe points.
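
      The four exceptions listed above can be read as a predicate over
      instructions. The following is only a minimal sketch of that rule,
      not the compiler's code: the instr type and its fields are
      hypothetical stand-ins for information the real liveness analysis
      derives from SSA values, and treating exceptions 3 and 4 as
      applying even to calls is an assumption taken from the wording of
      the list.

```go
package main

import "fmt"

// instr is a hypothetical, simplified stand-in for an SSA value.
// The compiler derives this information differently; the fields
// exist only to illustrate the four exceptions in the message.
type instr struct {
	isCall            bool
	inRuntime         bool // part of runtime code
	inNosplit         bool // inside a go:nosplit function
	inWriteBarrierSeq bool // between the write barrier flag load and the write
	beforeUintptrConv bool // leads up to a uintptr -> unsafe.Pointer conversion
}

// isSafePoint reports whether v would be treated as a safe point
// under the rule described in the commit message.
func isSafePoint(v instr) bool {
	// Exceptions 3 and 4: never a safe point (assumed to hold
	// regardless of whether v is a call).
	if v.inWriteBarrierSeq || v.beforeUintptrConv {
		return false
	}
	// Exceptions 1 and 2: only calls are safe points.
	if v.inRuntime || v.inNosplit {
		return v.isCall
	}
	// Everything else is a safe point.
	return true
}

func main() {
	fmt.Println(isSafePoint(instr{}))                              // ordinary instruction
	fmt.Println(isSafePoint(instr{inRuntime: true}))               // runtime, not a call
	fmt.Println(isSafePoint(instr{inRuntime: true, isCall: true})) // runtime call
	fmt.Println(isSafePoint(instr{inWriteBarrierSeq: true}))       // write barrier sequence
}
```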
      
      For #24543.
      
      Change-Id: I16a1eebb9183cad7cef9d53c0fd21a973cad6859
      Reviewed-on: https://go-review.googlesource.com/109348
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      a367f44c
    • Austin Clements's avatar
      cmd/compile: introduce LivenessMap and LivenessIndex · d7d9df8a
      Austin Clements authored
      Currently liveness only produces a stack map index at each safe point,
      so the information is summarized in a map[*ssa.Value]int. We're about
      to have both a stack map index and a register map index, so replace
      the int with a LivenessIndex type we can extend, and replace the map
      with a LivenessMap that we can also change more easily in the future.
      
      This also gives us an easy hook for defining the value that means "not
      a safe point".
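
      As a rough illustration of the shape this describes, here is a
      minimal sketch. The value type stands in for *ssa.Value, and the
      sentinel -1, the Valid method, and the Set and Get methods are
      assumptions made for the example rather than the CL's actual
      definitions.

```go
package main

import "fmt"

// value is a hypothetical stand-in for *ssa.Value.
type value struct{ id int }

// LivenessIndex wraps the per-safe-point indexes. It holds only a
// stack map index here; a register map index could be added later
// without changing the code that passes LivenessIndex around.
type LivenessIndex struct {
	stackMapIndex int
}

// livenessInvalid means "not a safe point". The value -1 is an
// arbitrary choice for this sketch.
var livenessInvalid = LivenessIndex{stackMapIndex: -1}

// Valid reports whether idx refers to a real safe point.
func (idx LivenessIndex) Valid() bool { return idx != livenessInvalid }

// LivenessMap hides the underlying representation, so it can be
// swapped (for example, for a dense slice) without touching callers.
type LivenessMap struct {
	m map[*value]LivenessIndex
}

// Set records the liveness index for v.
func (lm *LivenessMap) Set(v *value, idx LivenessIndex) {
	if lm.m == nil {
		lm.m = make(map[*value]LivenessIndex)
	}
	lm.m[v] = idx
}

// Get returns the index for v, or livenessInvalid if v is not a
// safe point.
func (lm *LivenessMap) Get(v *value) LivenessIndex {
	if idx, ok := lm.m[v]; ok {
		return idx
	}
	return livenessInvalid
}

func main() {
	var lm LivenessMap
	call, add := &value{id: 1}, &value{id: 2}
	lm.Set(call, LivenessIndex{stackMapIndex: 0})
	fmt.Println(lm.Get(call).Valid(), lm.Get(add).Valid())
}
```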
      
      Passes toolstash -cmp.
      
      For #24543.
      
      Change-Id: Ic4c069839635efed4fd0f603899b80f8be3b56ec
      Reviewed-on: https://go-review.googlesource.com/109347
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      d7d9df8a
    • Austin Clements's avatar
      cmd/internal/obj: consolidate emitting entry stack map · 02495da6
      Austin Clements authored
      The obj package needs to emit the PCDATA to select the entry stack map
      before calling morestack. Currently this is copied for every
      architecture. Since we're about to change how this works, consolidate
      all of these copies into a single helper function.
      
      For #24543.
      
      Change-Id: Ia92d94de78f8e23fd06dba747c43e03e5989f67b
      Reviewed-on: https://go-review.googlesource.com/109346
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      02495da6
    • Austin Clements's avatar
      cmd/compile: don't produce a past-the-end pointer in range loops · 837ed98d
      Austin Clements authored
      Currently, range loops over slices and arrays are compiled roughly
      like:
      
      for i, x := range s { b }
        ⇓
      for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b }
        ⇓
      i, _n, _p := 0, len(s), &s[0]
      goto cond
      body:
      { b }
      i, _p = i+1, _p + unsafe.Sizeof(s[0])
      cond:
      if i < _n { goto body } else { goto end }
      end:
      
      The problem with this lowering is that _p may temporarily point past
      the end of the allocation the moment before the loop terminates. Right
      now this isn't a problem because there's never a safe-point during
      this brief moment.
      
      We're about to introduce safe-points everywhere, so this bad pointer
      is going to be a problem. We could mark the increment as an unsafe
      block, but this inhibits reordering opportunities and could result in
      infrequent safe-points if the body is short.
      
      Instead, this CL fixes this by changing how we compile range loops to
      never produce this past-the-end pointer. It changes the lowering to
      roughly:
      
      i, _n, _p := 0, len(s), &s[0]
      if i < _n { goto body } else { goto end }
      top:
      _p += unsafe.Sizeof(s[0])
      body:
      { b }
      i++
      if i < _n { goto top } else { goto end }
      end:
      
      Notably, the increment is split into two parts: we increment the index
      before checking the condition, but increment the pointer only *after*
      the condition check has succeeded.
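
      The difference between the two lowerings can be simulated in
      ordinary Go. This is only a sketch under simplifying assumptions:
      an integer element offset p stands in for the pointer _p, and the
      helper names and the visit callback are hypothetical. Each
      function returns the largest offset p ever held, so an offset
      equal to len(s) corresponds to a past-the-end pointer.

```go
package main

import "fmt"

// oldLowering mirrors the original lowering. The combined increment
// runs before the condition check, so p reaches len(s) just before
// the loop exits.
func oldLowering(s []int, visit func(int)) int {
	i, n, p, maxP := 0, len(s), 0, 0
	goto cond
body:
	visit(s[p])
	i, p = i+1, p+1
	if p > maxP {
		maxP = p
	}
cond:
	if i < n {
		goto body
	} else {
		goto end
	}
end:
	return maxP
}

// newLowering mirrors the revised lowering. The index is incremented
// before the check, but p is incremented only after the check has
// succeeded, so p never exceeds len(s)-1.
func newLowering(s []int, visit func(int)) int {
	i, n, p, maxP := 0, len(s), 0, 0
	if i < n {
		goto body
	} else {
		goto end
	}
top:
	p++ // the "late increment"
	if p > maxP {
		maxP = p
	}
body:
	visit(s[p])
	i++
	if i < n {
		goto top
	} else {
		goto end
	}
end:
	return maxP
}

func main() {
	s := []int{10, 20, 30}
	visit := func(int) {}
	// With len(s) == 3, the old lowering reaches offset 3 (past the
	// end), while the new lowering stops at offset 2.
	fmt.Println(oldLowering(s, visit), newLowering(s, visit))
}
```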
      
      The implementation builds on the OFORUNTIL construct that was
      introduced during the loop preemption experiments, since OFORUNTIL
      places the increment and condition after the loop body. To support the
      extra "late increment" step, we further define OFORUNTIL's "List"
      field to contain the late increment statements. This makes all of this
      a relatively small change.
      
      This depends on the improvements to the prove pass in CL 102603. With
      the current lowering, bounds-check elimination knows that i < _n in
      the body because the body block is dominated by the cond block. In the
      new lowering, deriving this fact requires detecting that i < _n on
      *both* paths into body and hence is true in body. CL 102603 made prove
      able to detect this.
      
      The code size effect of this is minimal. The cmd/go binary on
      linux/amd64 increases by 0.17%. Performance-wise, this actually
      appears to be a net win, though it's mostly noise:
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.80s ± 0%     2.61s ± 1%  -6.88%  (p=0.000 n=20+18)
      Fannkuch11-12                2.41s ± 0%     2.42s ± 0%  +0.05%  (p=0.005 n=20+20)
      FmtFprintfEmpty-12          41.6ns ± 5%    41.4ns ± 6%    ~     (p=0.765 n=20+19)
      FmtFprintfString-12         69.4ns ± 3%    69.3ns ± 1%    ~     (p=0.084 n=19+17)
      FmtFprintfInt-12            76.1ns ± 1%    77.3ns ± 1%  +1.57%  (p=0.000 n=19+19)
      FmtFprintfIntInt-12          122ns ± 2%     123ns ± 3%  +0.95%  (p=0.015 n=20+20)
      FmtFprintfPrefixedInt-12     153ns ± 2%     151ns ± 3%  -1.27%  (p=0.013 n=20+20)
      FmtFprintfFloat-12           215ns ± 0%     216ns ± 0%  +0.47%  (p=0.000 n=20+16)
      FmtManyArgs-12               486ns ± 1%     498ns ± 0%  +2.40%  (p=0.000 n=20+17)
      GobDecode-12                6.43ms ± 0%    6.50ms ± 0%  +1.08%  (p=0.000 n=18+19)
      GobEncode-12                5.43ms ± 1%    5.47ms ± 0%  +0.76%  (p=0.000 n=20+20)
      Gzip-12                      218ms ± 1%     218ms ± 1%    ~     (p=0.883 n=20+20)
      Gunzip-12                   38.8ms ± 0%    38.9ms ± 0%    ~     (p=0.644 n=19+19)
      HTTPClientServer-12         76.2µs ± 1%    76.4µs ± 2%    ~     (p=0.218 n=20+20)
      JSONEncode-12               12.2ms ± 0%    12.3ms ± 1%  +0.45%  (p=0.000 n=19+19)
      JSONDecode-12               54.2ms ± 1%    53.3ms ± 0%  -1.67%  (p=0.000 n=20+20)
      Mandelbrot200-12            3.71ms ± 0%    3.71ms ± 0%    ~     (p=0.143 n=19+20)
      GoParse-12                  3.22ms ± 0%    3.19ms ± 1%  -0.72%  (p=0.000 n=20+20)
      RegexpMatchEasy0_32-12      76.7ns ± 1%    75.8ns ± 1%  -1.19%  (p=0.000 n=20+17)
      RegexpMatchEasy0_1K-12       245ns ± 1%     243ns ± 0%  -0.72%  (p=0.000 n=18+17)
      RegexpMatchEasy1_32-12      71.9ns ± 0%    71.7ns ± 1%  -0.39%  (p=0.006 n=12+18)
      RegexpMatchEasy1_1K-12       358ns ± 1%     354ns ± 1%  -1.13%  (p=0.000 n=20+19)
      RegexpMatchMedium_32-12      105ns ± 2%     105ns ± 1%  -0.63%  (p=0.007 n=19+20)
      RegexpMatchMedium_1K-12     31.9µs ± 1%    31.9µs ± 1%    ~     (p=1.000 n=17+17)
      RegexpMatchHard_32-12       1.51µs ± 1%    1.52µs ± 2%  +0.46%  (p=0.042 n=18+18)
      RegexpMatchHard_1K-12       45.3µs ± 1%    45.5µs ± 2%  +0.44%  (p=0.029 n=18+19)
      Revcomp-12                   388ms ± 1%     385ms ± 0%  -0.57%  (p=0.000 n=19+18)
      Template-12                 63.0ms ± 1%    63.3ms ± 0%  +0.50%  (p=0.000 n=19+20)
      TimeParse-12                 309ns ± 1%     307ns ± 0%  -0.62%  (p=0.000 n=20+20)
      TimeFormat-12                328ns ± 0%     333ns ± 0%  +1.35%  (p=0.000 n=19+19)
      [Geo mean]                  47.0µs         46.9µs       -0.20%
      
      (https://perf.golang.org/search?q=upload:20180326.1)
      
      For #10958.
      For #24543.
      
      Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a
      Reviewed-on: https://go-review.googlesource.com/102604
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      837ed98d
    • Austin Clements's avatar
      cmd/compile: detect OFORUNTIL inductive facts in prove · b812eec9
      Austin Clements authored
      Currently, we compile range loops into for loops with the obvious
      initialization and update of the index variable. In this form, the
      prove pass can see that the body is dominated by an i < len condition,
      and findIndVar can detect that i is an induction variable and that
      0 <= i < len.
      
      GOEXPERIMENT=preemptibleloops compiles range loops to OFORUNTIL and
      we're preparing to unconditionally switch to a variation of this for
      #24543. OFORUNTIL moves the increment and condition *after* the body,
      which makes the bounds on the index variable much less obvious. With
      OFORUNTIL, proving anything about the index variable requires
      understanding the phi that joins the index values at the top of the
      loop body block.
      
      This interferes with both prove's ability to see that i < len (this is
      true on both paths that enter the body, but from two different
      conditional checks) and with findIndVar's ability to detect the
      induction pattern.
      
      Fix this by teaching prove to detect that the index in the pattern
      constructed by OFORUNTIL is an induction variable and add both bounds
      to the facts table. Currently this is done separately from findIndVar
      because it depends on prove's factsTable, while findIndVar runs before
      visiting blocks and building the factsTable.
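
      To make the two paths into the body concrete, the following
      sketch executes a loop in the OFORUNTIL shape and checks at
      run time the fact that prove now derives statically, namely
      0 <= i < n on entry to the body. The function name is
      hypothetical, and this illustrates the invariant only; it says
      nothing about how prove is implemented.

```go
package main

import "fmt"

// bodyBoundsHold runs a loop shaped like OFORUNTIL's output and
// reports whether 0 <= i < n held every time the body was entered.
func bodyBoundsHold(n int) bool {
	i := 0
	// First path into the body: the initial check before the loop.
	if !(i < n) {
		return true // body never runs
	}
	for {
		// Body entry, reached from the initial check or from the
		// check at the bottom of the loop.
		if !(0 <= i && i < n) {
			return false
		}
		i++
		// Second path into the body: the check after the increment.
		if !(i < n) {
			break
		}
	}
	return true
}

func main() {
	for n := 0; n <= 3; n++ {
		fmt.Println(n, bodyBoundsHold(n))
	}
}
```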
      
      Without any GOEXPERIMENT, this has no effect on std or cmd. However,
      with GOEXPERIMENT=preemptibleloops, this change becomes necessary to
      prove 90 conditions in std and cmd.
      
      Change-Id: Ic025d669f81b53426309da5a6e8010e5ccaf4f49
      Reviewed-on: https://go-review.googlesource.com/102603
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      b812eec9
    • Austin Clements's avatar
      cmd/compile: derive len/cap relations in factsTable.update · 4816efac
      Austin Clements authored
      Currently, the prove pass derives implicit relations between len and
      cap in the code that adds branch conditions. This is fine right now
      because that's the only place we can encounter len and cap, but we're
      about to add a second way to add assertions to the facts table that
      can also produce facts involving len and cap.
      
      Prepare for this by moving the fact derivation from updateRestrictions
      (where it only applies on branches) to factsTable.update, which can
      derive these facts no matter where the root facts come from.
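
      The idea can be sketched with a toy facts table. Everything here
      is an assumption made for illustration: the val and fact types,
      the restriction to strict "less than" facts, and the two derived
      rules, which follow from len(s) <= cap(s). The point is only that
      deriving inside update covers every caller, whichever code path
      supplies the root fact.

```go
package main

import "fmt"

type kind int

const (
	plain kind = iota
	sliceLen
	sliceCap
)

// val is a hypothetical stand-in for an SSA value. For sliceLen and
// sliceCap values, slice names the slice operand.
type val struct {
	name  string
	kind  kind
	slice string
}

// fact records "a < b" between two named values.
type fact struct{ a, b string }

type factsTable struct {
	less map[fact]bool
	lens map[string]val // slice name -> its len value
	caps map[string]val // slice name -> its cap value
}

func newFactsTable(vals ...val) *factsTable {
	ft := &factsTable{
		less: map[fact]bool{},
		lens: map[string]val{},
		caps: map[string]val{},
	}
	for _, v := range vals {
		switch v.kind {
		case sliceLen:
			ft.lens[v.slice] = v
		case sliceCap:
			ft.caps[v.slice] = v
		}
	}
	return ft
}

// update records a < b and, because len(s) <= cap(s), also derives:
//
//	a < len(s)  implies  a < cap(s)
//	cap(s) < b  implies  len(s) < b
func (ft *factsTable) update(a, b val) {
	ft.less[fact{a.name, b.name}] = true
	if b.kind == sliceLen {
		if c, ok := ft.caps[b.slice]; ok {
			ft.less[fact{a.name, c.name}] = true
		}
	}
	if a.kind == sliceCap {
		if l, ok := ft.lens[a.slice]; ok {
			ft.less[fact{l.name, b.name}] = true
		}
	}
}

// knownLess reports whether a < b has been recorded or derived.
func (ft *factsTable) knownLess(a, b val) bool {
	return ft.less[fact{a.name, b.name}]
}

func main() {
	i := val{name: "i"}
	lenS := val{name: "len(s)", kind: sliceLen, slice: "s"}
	capS := val{name: "cap(s)", kind: sliceCap, slice: "s"}
	ft := newFactsTable(i, lenS, capS)
	ft.update(i, lenS)
	fmt.Println(ft.knownLess(i, capS))
}
```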
      
      Passes toolstash -cmp.
      
      Change-Id: If09692d9eb98ffaa93f4cfa58ed2d8ba0887c111
      Reviewed-on: https://go-review.googlesource.com/102602
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      4816efac
    • Austin Clements's avatar
      cmd/compile: teach prove about relations between constants · 8d8f620f
      Austin Clements authored
      Currently, we never add a relation between two constants to prove's
      fact table because these are eliminated before prove runs. As a
      result, it doesn't handle facts like this very well, even though
      they're easy to prove.
      
      We're about to start asserting some conditions that don't appear in
      the SSA, but are constructed from existing SSA values that may both be
      constants.
      
      Hence, improve the fact table to understand relations between
      constants by initializing the constant bounds of constant values to
      the value itself, rather than noLimit.
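
      A minimal sketch of why this helps, assuming a simplified limit
      type with signed bounds only. The value type and the initialLimit
      and provablyLess helpers are hypothetical names for this example.
      Once a constant's bounds are [c, c], an ordering between two
      constants follows directly from comparing their limits.

```go
package main

import (
	"fmt"
	"math"
)

// limit is a simplified signed range [min, max] for a value.
type limit struct{ min, max int64 }

// noLimit places no constraint on a value.
var noLimit = limit{math.MinInt64, math.MaxInt64}

// value is a hypothetical stand-in for an SSA value.
type value struct {
	isConst  bool
	constVal int64
}

// initialLimit returns the starting bounds for v: the constant itself
// for constants, and noLimit for everything else.
func initialLimit(v value) limit {
	if v.isConst {
		return limit{v.constVal, v.constVal}
	}
	return noLimit
}

// provablyLess reports whether a < b follows from the limits alone.
func provablyLess(a, b limit) bool { return a.max < b.min }

func main() {
	three := value{isConst: true, constVal: 3}
	seven := value{isConst: true, constVal: 7}
	unknown := value{}

	fmt.Println(provablyLess(initialLimit(three), initialLimit(seven)))   // constant bounds suffice
	fmt.Println(provablyLess(noLimit, noLimit))                           // what noLimit bounds would give
	fmt.Println(provablyLess(initialLimit(unknown), initialLimit(seven))) // non-constant: unknown
}
```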
      
      Passes toolstash -cmp.
      
      Change-Id: I71f8dc294e59f19433feab1c10b6d3c99b7f1e26
      Reviewed-on: https://go-review.googlesource.com/102601
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      8d8f620f
    • David Chase's avatar
      cmd/compile: refactor inlining parameters; inline panic · 87a18c61
      David Chase authored
      Inlining was refactored to perform tuning experiments,
      with the "knobs" now set to also inline functions/methods
      that include panic(), and -l=4 (inline calls) now expressed
      as a change to costs, rather than scattered if-thens.
      
      The -l=4 inline-calls penalty is chosen to be the best
      found during experiments; it makes some programs much
      larger and slower (notably, the compiler itself) and is
      believed to be risky for machine-generated code in general,
      which is why it is not the default.  It is also not
      well-tested with the debugger and DWARF output.
      
      This change includes an explicit go:noinline applied to the
      method that is the largest cause of compiler binary growth
      and slowdown for midstack inlining. There are others; ideally,
      whatever heuristic eventually appears will make this
      unnecessary.
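
      For readers unfamiliar with the directive, here is a small
      illustration. The function names are hypothetical and unrelated
      to the compiler method mentioned above. mustPositive contains a
      panic call, the kind of function this change allows the inliner
      to consider, while bigHelper opts out with go:noinline. Building
      with go build -gcflags=-m prints the compiler's inlining
      decisions, which may differ between Go versions.

```go
package main

import "fmt"

// mustPositive contains a call to panic. Whether it is actually
// inlined depends on the compiler version and its cost model.
func mustPositive(n int) int {
	if n <= 0 {
		panic("n must be positive")
	}
	return n
}

// bigHelper is kept out of line regardless of the inliner's
// heuristics by the directive below.
//
//go:noinline
func bigHelper(n int) int {
	return mustPositive(n) * 2
}

func main() {
	fmt.Println(bigHelper(21)) // prints 42
}
```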
      
      Change-Id: Idf7056ed2f961472cf49d2fd154ee98bef9421e2
      Reviewed-on: https://go-review.googlesource.com/109918
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
      87a18c61
  2. 21 May, 2018 8 commits