- 22 May, 2018 32 commits
-
-
Austin Clements authored
Currently liveness information is kept in a map keyed by *ssa.Value. This made sense when liveness information was sparse, but now we have liveness for nearly every ssa.Value. There's a fair amount of memory and CPU overhead to this map now.

This CL replaces this map with a slice indexed by value ID.

Passes toolstash -cmp.

name        old time/op    new time/op    delta
Template    197ms ± 1%     194ms ± 1%     -1.60%  (p=0.000 n=9+10)
Unicode     100ms ± 2%     99ms ± 1%      -1.31%  (p=0.012 n=8+10)
GoTypes     695ms ± 1%     689ms ± 0%     -0.94%  (p=0.000 n=10+10)
Compiler    3.34s ± 2%     3.29s ± 1%     -1.26%  (p=0.000 n=10+9)
SSA         8.08s ± 0%     8.02s ± 2%     -0.70%  (p=0.034 n=8+10)
Flate       133ms ± 1%     131ms ± 1%     -1.04%  (p=0.006 n=10+9)
GoParser    163ms ± 1%     162ms ± 1%     -0.79%  (p=0.034 n=8+10)
Reflect     459ms ± 1%     454ms ± 0%     -1.06%  (p=0.000 n=10+8)
Tar         186ms ± 1%     185ms ± 1%     -0.87%  (p=0.003 n=9+9)
XML         238ms ± 1%     235ms ± 1%     -1.01%  (p=0.004 n=8+9)
[Geo mean]  418ms          414ms          -1.06%

name        old alloc/op   new alloc/op   delta
Template    36.4MB ± 0%    35.6MB ± 0%    -2.29%  (p=0.000 n=9+10)
Unicode     29.7MB ± 0%    29.5MB ± 0%    -0.68%  (p=0.000 n=10+10)
GoTypes     119MB ± 0%     117MB ± 0%     -2.30%  (p=0.000 n=9+9)
Compiler    546MB ± 0%     532MB ± 0%     -2.47%  (p=0.000 n=10+10)
SSA         1.59GB ± 0%    1.55GB ± 0%    -2.41%  (p=0.000 n=10+10)
Flate       24.9MB ± 0%    24.5MB ± 0%    -1.77%  (p=0.000 n=8+10)
GoParser    29.5MB ± 0%    28.7MB ± 0%    -2.60%  (p=0.000 n=9+10)
Reflect     81.7MB ± 0%    80.5MB ± 0%    -1.49%  (p=0.000 n=10+10)
Tar         35.7MB ± 0%    35.1MB ± 0%    -1.64%  (p=0.000 n=10+10)
XML         45.0MB ± 0%    43.7MB ± 0%    -2.76%  (p=0.000 n=9+10)
[Geo mean]  80.1MB         78.4MB         -2.04%

name        old allocs/op  new allocs/op  delta
Template    336k ± 0%      335k ± 0%      -0.31%  (p=0.000 n=9+10)
Unicode     339k ± 0%      339k ± 0%      -0.05%  (p=0.000 n=10+10)
GoTypes     1.18M ± 0%     1.18M ± 0%     -0.26%  (p=0.000 n=10+10)
Compiler    4.96M ± 0%     4.94M ± 0%     -0.24%  (p=0.000 n=10+10)
SSA         12.6M ± 0%     12.5M ± 0%     -0.30%  (p=0.000 n=10+10)
Flate       224k ± 0%      223k ± 0%      -0.30%  (p=0.000 n=10+10)
GoParser    282k ± 0%      281k ± 0%      -0.32%  (p=0.000 n=10+10)
Reflect     965k ± 0%      963k ± 0%      -0.27%  (p=0.000 n=9+10)
Tar         331k ± 0%      330k ± 0%      -0.27%  (p=0.000 n=10+10)
XML         393k ± 0%      392k ± 0%      -0.26%  (p=0.000 n=10+10)
[Geo mean]  763k           761k           -0.26%

Updates #24543.

Change-Id: I4cfd2461510d3c026a262760bca225dc37482341
Reviewed-on: https://go-review.googlesource.com/110178
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
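The map-to-slice change described in this commit can be illustrated with a minimal sketch. This is not the compiler's code: `Value`, `mapLiveness`, `sliceLiveness`, and `noIndex` are hypothetical names used only for illustration, and the sketch assumes value IDs are small, dense integers.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for ssa.Value; only the ID matters here.
type Value struct {
	ID int32
}

// mapLiveness models the old scheme: one map entry per value.
type mapLiveness map[*Value]int

// sliceLiveness models the new scheme: a slice indexed by Value.ID.
type sliceLiveness []int

// noIndex marks values that have no liveness information (an assumption
// of this sketch, not a name taken from the compiler).
const noIndex = -1

// newSliceLiveness allocates one slot per value ID, all initially empty.
func newSliceLiveness(numValues int) sliceLiveness {
	s := make(sliceLiveness, numValues)
	for i := range s {
		s[i] = noIndex
	}
	return s
}

func (s sliceLiveness) set(v *Value, idx int) { s[v.ID] = idx }
func (s sliceLiveness) get(v *Value) int      { return s[v.ID] }

func main() {
	vals := []*Value{{ID: 0}, {ID: 1}, {ID: 2}}

	m := mapLiveness{}
	m[vals[1]] = 7

	s := newSliceLiveness(len(vals))
	s.set(vals[1], 7)

	// Both schemes answer the same query; the slice avoids hashing and
	// per-entry map overhead once nearly every value has an entry.
	fmt.Println(m[vals[1]], s.get(vals[1]), s.get(vals[2]))
}
```

The slice form only pays off when most IDs are populated, which is the situation the commit message describes.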
-
Austin Clements authored
The per-Value slice of liveness maps is currently one of the largest sources of allocation in the compiler. On cmd/compile/internal/ssa, it's 5% of overall allocation, or 75MB in total. Enabling liveness maps everywhere significantly increased this allocation footprint, which in turn slowed down the compiler.

Improve this by compacting the liveness maps after every block is processed. There are typically very few distinct liveness maps, so compacting the maps after every block, rather than at the end of the function, can significantly reduce these allocations.

Passes toolstash -cmp.

name        old time/op    new time/op    delta
Template    198ms ± 2%     196ms ± 1%     -1.11%  (p=0.008 n=9+10)
Unicode     100ms ± 1%     99ms ± 1%      -0.94%  (p=0.015 n=8+9)
GoTypes     703ms ± 2%     695ms ± 1%     -1.15%  (p=0.000 n=10+10)
Compiler    3.38s ± 3%     3.33s ± 0%     -1.66%  (p=0.000 n=10+9)
SSA         7.96s ± 1%     7.93s ± 1%     ~       (p=0.113 n=9+10)
Flate       134ms ± 1%     132ms ± 1%     -1.30%  (p=0.000 n=8+10)
GoParser    165ms ± 2%     163ms ± 1%     -1.32%  (p=0.013 n=9+10)
Reflect     462ms ± 2%     459ms ± 0%     -0.65%  (p=0.036 n=9+8)
Tar         188ms ± 2%     186ms ± 1%     ~       (p=0.173 n=8+10)
XML         243ms ± 7%     239ms ± 1%     ~       (p=0.684 n=10+10)
[Geo mean]  421ms          416ms          -1.10%

name        old alloc/op   new alloc/op   delta
Template    38.0MB ± 0%    36.5MB ± 0%    -3.98%  (p=0.000 n=10+10)
Unicode     30.3MB ± 0%    29.6MB ± 0%    -2.21%  (p=0.000 n=10+10)
GoTypes     125MB ± 0%     120MB ± 0%     -4.51%  (p=0.000 n=10+9)
Compiler    575MB ± 0%     546MB ± 0%     -5.06%  (p=0.000 n=10+10)
SSA         1.64GB ± 0%    1.55GB ± 0%    -4.97%  (p=0.000 n=10+10)
Flate       25.9MB ± 0%    25.0MB ± 0%    -3.41%  (p=0.000 n=10+10)
GoParser    30.7MB ± 0%    29.5MB ± 0%    -3.97%  (p=0.000 n=10+10)
Reflect     84.1MB ± 0%    81.9MB ± 0%    -2.64%  (p=0.000 n=10+10)
Tar         37.0MB ± 0%    35.8MB ± 0%    -3.27%  (p=0.000 n=10+9)
XML         47.2MB ± 0%    45.0MB ± 0%    -4.57%  (p=0.000 n=10+10)
[Geo mean]  83.2MB         79.9MB         -3.86%

name        old allocs/op  new allocs/op  delta
Template    337k ± 0%      337k ± 0%      -0.06%  (p=0.000 n=10+10)
Unicode     340k ± 0%      340k ± 0%      -0.01%  (p=0.014 n=10+10)
GoTypes     1.18M ± 0%     1.18M ± 0%     -0.04%  (p=0.000 n=10+10)
Compiler    4.97M ± 0%     4.97M ± 0%     -0.03%  (p=0.000 n=10+10)
SSA         12.3M ± 0%     12.3M ± 0%     -0.01%  (p=0.000 n=10+10)
Flate       226k ± 0%      225k ± 0%      -0.09%  (p=0.000 n=10+10)
GoParser    283k ± 0%      283k ± 0%      -0.06%  (p=0.000 n=10+9)
Reflect     972k ± 0%      971k ± 0%      -0.04%  (p=0.000 n=10+8)
Tar         333k ± 0%      332k ± 0%      -0.05%  (p=0.000 n=10+9)
XML         395k ± 0%      395k ± 0%      -0.04%  (p=0.000 n=10+10)
[Geo mean]  764k           764k           -0.04%

Updates #24543.

Change-Id: I6fdc46e4ddb6a8eea95d38242345205eb8397f0b
Reviewed-on: https://go-review.googlesource.com/110177
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
-
Austin Clements authored
This moves the bvec hash table logic out of Liveness.compact and into a bvecSet type. Furthermore, the bvecSet type has the ability to grow dynamically, which the current implementation doesn't. In addition to making the code cleaner, this will make it possible to incrementally compact liveness bitmaps. Passes toolstash -cmp Updates #24543. Change-Id: I46c53e504494206061a1f790ae4a02d768a65681 Reviewed-on: https://go-review.googlesource.com/110176 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
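A growable set of bit vectors of the kind this commit describes might look roughly like the sketch below. It is an illustration under stated assumptions, not the compiler's bvecSet: the names `bitvec`, `bitvecSet`, and `add`, the byte-slice representation, the FNV hash, and the load-factor threshold are all choices made for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/fnv"
)

// bitvec is a hypothetical stand-in for the compiler's bvec type.
type bitvec []byte

// bitvecSet deduplicates bit vectors and gives each distinct vector a
// dense index. index is an open-addressed hash table holding positions
// in uniq, with -1 marking an empty slot.
type bitvecSet struct {
	index []int
	uniq  []bitvec
}

func newTable(n int) []int {
	t := make([]int, n)
	for i := range t {
		t[i] = -1
	}
	return t
}

func newBitvecSet() *bitvecSet {
	return &bitvecSet{index: newTable(8)}
}

func hashBits(b bitvec) uint32 {
	h := fnv.New32a()
	h.Write(b)
	return h.Sum32()
}

// grow doubles the table and re-inserts every known vector.
func (s *bitvecSet) grow() {
	s.index = newTable(2 * len(s.index))
	n := uint32(len(s.index))
	for i, b := range s.uniq {
		p := hashBits(b) % n
		for s.index[p] >= 0 {
			p = (p + 1) % n
		}
		s.index[p] = i
	}
}

// add returns the dense index of b, inserting a copy if b is new.
func (s *bitvecSet) add(b bitvec) int {
	// Keep the table at most half full so probing always finds a free slot.
	if len(s.uniq)*2 >= len(s.index) {
		s.grow()
	}
	n := uint32(len(s.index))
	for p := hashBits(b) % n; ; p = (p + 1) % n {
		i := s.index[p]
		if i < 0 {
			s.index[p] = len(s.uniq)
			s.uniq = append(s.uniq, append(bitvec(nil), b...))
			return len(s.uniq) - 1
		}
		if bytes.Equal(s.uniq[i], b) {
			return i
		}
	}
}

func main() {
	s := newBitvecSet()
	fmt.Println(s.add(bitvec{1, 0}), s.add(bitvec{0, 1}), s.add(bitvec{1, 0}))
}
```

Because `add` can be called at any time and the table grows on demand, a caller can deduplicate vectors block by block instead of in one pass at the end, which is the incremental compaction the commit message anticipates.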
-
Austin Clements authored
Currently Liveness.epilogue makes three passes over the Blocks, but there's no need to do this. Combine them into a single pass. This eliminates the need for blockEffects.lastbitmapindex, but, more importantly, will let us incrementally compact the liveness bitmaps and significantly reduce allocations in Liveness.epilogue. Passes toolstash -cmp. Updates #24543. Change-Id: I27802bcd00d23aa122a7ec16cdfd739ae12dd7aa Reviewed-on: https://go-review.googlesource.com/110175 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
-
Hana Kim authored
Until vgo sorts out and cleans up the vendoring process. Ran govendor to update the packages that cmd/pprof depends on, which resulted in the deletion of some unnecessary files. Change-Id: Idfba53e94414e90a5e280222750a6df77e979a16 Reviewed-on: https://go-review.googlesource.com/114079 Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com> Reviewed-by: Daniel Theophanes <kardianos@gmail.com>
-
David Chase authored
On OSX 10.12 and earlier, paired with XCode 9.0, specifying DWARF version 3 causes dsymutil to misbehave. Version 2 appears to be good enough to allow processing of the prologue_end opcode on (at least one version of) Linux and OSX 10.13. Fixes #25451. Change-Id: Ic760e34248393a5386be96351c8e492da1d3413b Reviewed-on: https://go-review.googlesource.com/114015 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Martin Möhrmann authored
The runtime import is unused. Change-Id: I37fe210256ddafa579d9e6d64f3f0db78581974e Reviewed-on: https://go-review.googlesource.com/114175 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Austin Clements <austin@google.com>
-
Austin Clements authored
Traceback matches the defer stack with the function call stack using the SP recorded in defer frames when the defer frame is created. However, on LR machines this is ambiguous: if function A pushes a defer and then calls function B, where B is a leaf function with a zero-sized frame, then both A and B have the same SP and will *both* match the defer on the defer stack. Since traceback unwinds through B first, it will incorrectly match up the defer with B's frame instead of A's frame.

Where this goes particularly wrong is if function B causes a signal that turns into a panic (e.g., a nil pointer dereference). In order to handle the fact that we may not have a liveness map at the location that caused the signal and injected a sigpanic call, traceback has logic to unwind the panicking frame's continuation PC to the PC where the most recent defer was pushed (this is safe because the frame is dead other than any defers it pushed). However, if traceback mis-matches the defer stack, it winds up reporting that B's continuation PC is in A. If the runtime then uses this continuation PC to look up PCDATA in B, it will panic because the PC is out of range for B. This failure mode can be seen in sync/atomic/atomic_test.go:TestNilDeref. An example failure is: https://build.golang.org/log/8e07a762487839252af902355f6b1379dbd463c5

This CL fixes all of this by recognizing that a function that pushes a defer must also have a non-zero-sized frame and using this fact to refine the defer matching logic.

Fixes the build for arm64, mips, mipsle, ppc64, ppc64le, and s390x.

Fixes #25499.

Change-Id: Iff7c01d08ad42f3de22b3a73658cc2f674900101
Reviewed-on: https://go-review.googlesource.com/114078
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
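The ambiguity and the refinement described in this commit can be modeled with a toy example. This is only a sketch of the idea, not the runtime's traceback code: the `frame` type, the `matchDefer` function, and the `requireFrame` switch are hypothetical, and the stack values are made up.

```go
package main

import "fmt"

// frame is a hypothetical, simplified view of a stack frame during unwinding.
type frame struct {
	name string
	sp   uintptr // stack pointer on entry to the frame
	size uintptr // frame size; 0 for a frameless leaf on an LR machine
}

// matchDefer walks frames from innermost to outermost and returns the name
// of the first frame considered to own the defer recorded with deferSP.
// With requireFrame set, frames with a zero-sized frame are skipped, on the
// grounds that a function which pushed a defer cannot be frameless.
func matchDefer(frames []frame, deferSP uintptr, requireFrame bool) string {
	for _, f := range frames {
		if f.sp != deferSP {
			continue
		}
		if requireFrame && f.size == 0 {
			continue
		}
		return f.name
	}
	return ""
}

func main() {
	// A pushed a defer and called the frameless leaf B, so both share an SP.
	stack := []frame{
		{name: "B", sp: 0x1000, size: 0},
		{name: "A", sp: 0x1000, size: 32},
	}
	fmt.Println(matchDefer(stack, 0x1000, false)) // SP-only matching picks B
	fmt.Println(matchDefer(stack, 0x1000, true))  // refined matching picks A
}
```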
-
Martin Möhrmann authored
Needs the go compiler to be built with GOEXPERIMENT=debugcpu to be active.

The GODEBUGCPU environment variable can be used to disable usage of specific processor features in the Go standard library. This is useful for testing and benchmarking different code paths that are guarded by internal/cpu variable checks. Use of processor features cannot be enabled through GODEBUGCPU.

To disable usage of AVX and SSE41 cpu features on GOARCH amd64 use:

    GODEBUGCPU=avx=0,sse41=0

The special "all" option can be used to disable all options:

    GODEBUGCPU=all=0

Updates #12805
Updates #15403

Change-Id: I699c2e6f74d98472b6fb4b1e5ffbf29b15697aab
Reviewed-on: https://go-review.googlesource.com/91737
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
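To make the option syntax concrete, here is a small sketch of how a string such as `avx=0,sse41=0` could be interpreted. It is not the internal/cpu implementation: `disabledFeatures` and the `known` feature list are hypothetical, and the only behavior taken from the commit message is that features can be disabled (never enabled) and that `all=0` disables everything.

```go
package main

import (
	"fmt"
	"strings"
)

// disabledFeatures parses a GODEBUGCPU-style string and reports which of
// the known feature names should be treated as disabled. Settings other
// than "=0" are ignored, since features can only be turned off.
func disabledFeatures(env string, known []string) map[string]bool {
	off := make(map[string]bool)
	for _, field := range strings.Split(env, ",") {
		key, value, ok := strings.Cut(field, "=")
		if !ok || value != "0" {
			continue
		}
		for _, k := range known {
			if key == "all" || key == k {
				off[k] = true
			}
		}
	}
	return off
}

func main() {
	known := []string{"avx", "avx2", "sse41"} // illustrative subset only
	fmt.Println(disabledFeatures("avx=0,sse41=0", known))
	fmt.Println(disabledFeatures("all=0", known))
	fmt.Println(disabledFeatures("avx=1", known))
}
```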
-
Zhongpeng Lin authored
This makes the checking of build tags in file names consistent with that of the build tags in the `// +build` line. Fixed #25461 Change-Id: Iba14d1050f8aba44e7539ab3b8711af1980ccfe4 GitHub-Last-Rev: 11b14e239dd85e11e669919aab45494aee7c59a3 GitHub-Pull-Request: golang/go#25480 Reviewed-on: https://go-review.googlesource.com/113818 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Martin Sucha authored
The added fields are used in buildExtensions so should be documented too. Fixes #21363 Change-Id: Ifcc11da5b690327946c2488bcf4c79c60175a339 Reviewed-on: https://go-review.googlesource.com/113916 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Martin Sucha authored
It's easier to skim a list of items visually when the items are each on a separate line. Separate lines also help reduce diff size when items are added/removed. The list is indented so that it's displayed preformatted in HTML output as godoc doesn't support formatting lists natively yet (see #7873). Change-Id: Ibf9e92437e4b464ba58ea3ccef579e8df4745d75 Reviewed-on: https://go-review.googlesource.com/113915 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Alberto Donizetti authored
Some functions in log/syslog depend on syslogd running. Instead of treating errors caused by the daemon not running as test failures, ignore them and skip the test. Fixes the longtest builder. Change-Id: I628fe4aab5f1a505edfc0748861bb976ed5917ea Reviewed-on: https://go-review.googlesource.com/113838 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
dchenk authored
Immediately following the conditional block removed here is a loop which checks exactly what the conditional already checked, so the entire conditional is redundant. Change-Id: I892fd9f2364d87e2c1cacb0407531daec6643183 Reviewed-on: https://go-review.googlesource.com/114000 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Keith Randall authored
When rulegen complains about a missing type, report the line number in the rules file. Change-Id: Ic7c19e1d5f29547911909df5788945848a6080ff Reviewed-on: https://go-review.googlesource.com/114004 Reviewed-by: David Chase <drchase@google.com>
-
Austin Clements authored
This adds a mechanism for debuggers to safely inject calls to Go functions on amd64. Debuggers must participate in a protocol with the runtime, and need to know how to lay out a call frame, but the runtime support takes care of the details of handling live pointers in registers, stack growth, and detecting the trickier conditions when it is unsafe to inject a user function call. Fixes #21678. Updates derekparker/delve#119. Change-Id: I56d8ca67700f1f77e19d89e7fc92ab337b228834 Reviewed-on: https://go-review.googlesource.com/109699 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Austin Clements authored
This adds FUNCDATA and PCDATA that records the register maps much like the existing live arguments maps and live locals maps. The register map is indexed independently from the argument and locals maps since changes in register liveness tend not to correlate with changes to argument and local liveness.

This is the final CL toward adding safe-points everywhere. The following CLs will optimize liveness analysis to bring down the cost. The effect of this CL is:

name        old time/op    new time/op    delta
Template    195ms ± 2%     197ms ± 1%     ~       (p=0.136 n=9+9)
Unicode     98.4ms ± 2%    99.7ms ± 1%    +1.39%  (p=0.004 n=10+10)
GoTypes     685ms ± 1%     700ms ± 1%     +2.06%  (p=0.000 n=9+9)
Compiler    3.28s ± 1%     3.34s ± 0%     +1.71%  (p=0.000 n=9+8)
SSA         7.79s ± 1%     7.91s ± 1%     +1.55%  (p=0.000 n=10+9)
Flate       133ms ± 2%     133ms ± 2%     ~       (p=0.190 n=10+10)
GoParser    161ms ± 2%     164ms ± 3%     +1.83%  (p=0.015 n=10+10)
Reflect     450ms ± 1%     457ms ± 1%     +1.62%  (p=0.000 n=10+10)
Tar         183ms ± 2%     185ms ± 1%     +0.91%  (p=0.008 n=9+10)
XML         234ms ± 1%     238ms ± 1%     +1.60%  (p=0.000 n=9+9)
[Geo mean]  411ms          417ms          +1.40%

name        old exe-bytes  new exe-bytes  delta
HelloSize   1.47M ± 0%     1.51M ± 0%     +2.79%  (p=0.000 n=10+10)

Compared to just before "cmd/internal/obj: consolidate emitting entry stack map", the cumulative effect of adding stack maps everywhere and register maps is:

name        old time/op    new time/op    delta
Template    185ms ± 2%     197ms ± 1%     +6.42%  (p=0.000 n=10+9)
Unicode     96.3ms ± 3%    99.7ms ± 1%    +3.60%  (p=0.000 n=10+10)
GoTypes     658ms ± 0%     700ms ± 1%     +6.37%  (p=0.000 n=10+9)
Compiler    3.14s ± 1%     3.34s ± 0%     +6.53%  (p=0.000 n=9+8)
SSA         7.41s ± 2%     7.91s ± 1%     +6.71%  (p=0.000 n=9+9)
Flate       126ms ± 1%     133ms ± 2%     +6.15%  (p=0.000 n=10+10)
GoParser    153ms ± 1%     164ms ± 3%     +6.89%  (p=0.000 n=10+10)
Reflect     437ms ± 1%     457ms ± 1%     +4.59%  (p=0.000 n=10+10)
Tar         178ms ± 1%     185ms ± 1%     +4.18%  (p=0.000 n=10+10)
XML         223ms ± 1%     238ms ± 1%     +6.39%  (p=0.000 n=10+9)
[Geo mean]  394ms          417ms          +5.78%

name        old alloc/op   new alloc/op   delta
Template    34.5MB ± 0%    38.0MB ± 0%    +10.19%  (p=0.000 n=10+10)
Unicode     29.3MB ± 0%    30.3MB ± 0%    +3.56%   (p=0.000 n=8+9)
GoTypes     113MB ± 0%     125MB ± 0%     +10.89%  (p=0.000 n=10+10)
Compiler    510MB ± 0%     575MB ± 0%     +12.79%  (p=0.000 n=10+10)
SSA         1.46GB ± 0%    1.64GB ± 0%    +12.40%  (p=0.000 n=10+10)
Flate       23.9MB ± 0%    25.9MB ± 0%    +8.56%   (p=0.000 n=10+10)
GoParser    28.0MB ± 0%    30.8MB ± 0%    +10.08%  (p=0.000 n=10+10)
Reflect     77.6MB ± 0%    84.3MB ± 0%    +8.63%   (p=0.000 n=10+10)
Tar         34.1MB ± 0%    37.0MB ± 0%    +8.44%   (p=0.000 n=10+10)
XML         42.7MB ± 0%    47.2MB ± 0%    +10.75%  (p=0.000 n=10+10)
[Geo mean]  76.0MB         83.3MB         +9.60%

name        old allocs/op  new allocs/op  delta
Template    321k ± 0%      337k ± 0%      +4.98%  (p=0.000 n=10+10)
Unicode     337k ± 0%      340k ± 0%      +1.04%  (p=0.000 n=10+9)
GoTypes     1.13M ± 0%     1.18M ± 0%     +4.85%  (p=0.000 n=10+10)
Compiler    4.67M ± 0%     4.96M ± 0%     +6.25%  (p=0.000 n=10+10)
SSA         11.7M ± 0%     12.3M ± 0%     +5.69%  (p=0.000 n=10+10)
Flate       216k ± 0%      226k ± 0%      +4.52%  (p=0.000 n=10+9)
GoParser    271k ± 0%      283k ± 0%      +4.52%  (p=0.000 n=10+10)
Reflect     927k ± 0%      972k ± 0%      +4.78%  (p=0.000 n=10+10)
Tar         318k ± 0%      333k ± 0%      +4.56%  (p=0.000 n=10+10)
XML         376k ± 0%      395k ± 0%      +5.04%  (p=0.000 n=10+10)
[Geo mean]  730k           764k           +4.61%

name        old exe-bytes  new exe-bytes  delta
HelloSize   1.46M ± 0%     1.51M ± 0%     +3.66%  (p=0.000 n=10+10)

For #24543.

Change-Id: I91e003dc64151916b384274884bf02a2d6862547
Reviewed-on: https://go-review.googlesource.com/109353
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
-
Austin Clements authored
This extends the liveness analysis to track registers containing live pointers. We do this by tracking bitmaps for live pointer registers in parallel with bitmaps for stack variables.

This does not yet do anything with these liveness maps, though they do appear in the debug output for -live=2.

We'll optimize this in later CLs:

name        old time/op    new time/op    delta
Template    193ms ± 5%     195ms ± 2%     ~       (p=0.050 n=9+9)
Unicode     97.7ms ± 2%    98.4ms ± 2%    ~       (p=0.315 n=9+10)
GoTypes     674ms ± 2%     685ms ± 1%     +1.72%  (p=0.001 n=9+9)
Compiler    3.21s ± 1%     3.28s ± 1%     +2.28%  (p=0.000 n=10+9)
SSA         7.70s ± 1%     7.79s ± 1%     +1.07%  (p=0.015 n=10+10)
Flate       130ms ± 3%     133ms ± 2%     +2.19%  (p=0.003 n=10+10)
GoParser    159ms ± 3%     161ms ± 2%     +1.51%  (p=0.019 n=10+10)
Reflect     444ms ± 1%     450ms ± 1%     +1.43%  (p=0.000 n=9+10)
Tar         181ms ± 2%     183ms ± 2%     +1.45%  (p=0.010 n=10+9)
XML         230ms ± 1%     234ms ± 1%     +1.56%  (p=0.000 n=8+9)
[Geo mean]  405ms          411ms          +1.48%

No effect on binary size because we're not yet emitting the register maps.

For #24543.

Change-Id: Ieb022f0aea89c0ea9a6f035195bce2f0e67dbae4
Reviewed-on: https://go-review.googlesource.com/109352
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
-
Austin Clements authored
For register maps, we need a dense numbering of registers that may contain pointers of interest to the garbage collector. Add this to Register and compute it from the GP register set. For #24543. Change-Id: If6f0521effca5eca4d17895468b1fc52d67e0f32 Reviewed-on: https://go-review.googlesource.com/109351 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
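One way to picture a dense numbering derived from a register set is the sketch below. It assumes the general-purpose register set is given as a bitmask over register numbers; `gcRegNums` and the use of -1 for "not numbered" are hypothetical choices for this example, not the compiler's representation.

```go
package main

import "fmt"

// gcRegNums assigns consecutive indexes (0, 1, 2, ...) to the registers
// whose bit is set in gpMask, in increasing register order. Registers
// outside the mask get -1, meaning they never appear in a register map.
func gcRegNums(numRegs int, gpMask uint64) []int {
	nums := make([]int, numRegs)
	next := 0
	for r := range nums {
		if gpMask&(1<<uint(r)) != 0 {
			nums[r] = next
			next++
		} else {
			nums[r] = -1
		}
	}
	return nums
}

func main() {
	// Registers 0, 2, 3, and 5 are general-purpose in this made-up machine.
	fmt.Println(gcRegNums(8, 0b00101101))
}
```

A dense index keeps each register bitmap as small as the number of pointer-capable registers, rather than the total register count.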
-
Austin Clements authored
Write barrier unsafe-point analysis needs to flow through OpARM64MOVWUload in c-shared mode. Change-Id: I4f06f54d9e74a739a1b4fcb9ab0a1ae9b7b88a95 Reviewed-on: https://go-review.googlesource.com/114077 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com>
-
Austin Clements authored
Compiling without optimizations (-N) can result in write barrier blocks that have been optimized away but not actually pruned from the block set. Fix unsafe-point analysis to recognize and ignore these. For #24543. Change-Id: I2ca86fb1a0346214ec71d7d6c17b6a121857b01d Reviewed-on: https://go-review.googlesource.com/114076 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
isharipo authored
- Uncomment tests for AVX512 encoder
- Permit instruction suffixes for x86
- Permit limited reg list [reg-reg] syntax for x86 for multi-source ops
- EVEX encoding support in obj/x86 (Z-cases, asmevex, etc.)
- optabs and ytabs generated by x86avxgen (https://golang.org/cl/107216)

Note: suffix formatting is implemented with an updated CConv function. Each arch asm backend should now register its formatting function by calling RegisterOpSuffix.

Updates #22779

Change-Id: I076a167ee49582700e058c56ad74e6696710c8c8
Reviewed-on: https://go-review.googlesource.com/113315
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
Ben Shi authored
1. Some incorrect test cases are disabled.
2. Some wrong test cases are corrected.
3. Some new test cases are added.

Change-Id: Ib5d0473d55159f233ddab79f96967eaec7b08597
Reviewed-on: https://go-review.googlesource.com/113736
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Austin Clements authored
Currently, the code generator only considers outputting stack map indexes at CALL instructions. Raise this into the code generator loop itself so that changes in the stack map index at any instruction emit a PCDATA Prog before the actual instruction.

We'll optimize this in later CLs:

name        old time/op    new time/op    delta
Template    190ms ± 2%     191ms ± 2%     ~       (p=0.529 n=10+10)
Unicode     96.4ms ± 1%    98.5ms ± 3%    +2.18%  (p=0.001 n=9+10)
GoTypes     669ms ± 1%     673ms ± 1%     +0.62%  (p=0.004 n=9+9)
Compiler    3.18s ± 1%     3.22s ± 1%     +1.06%  (p=0.000 n=10+9)
SSA         7.59s ± 1%     7.64s ± 1%     +0.66%  (p=0.023 n=10+10)
Flate       128ms ± 1%     130ms ± 2%     +1.07%  (p=0.043 n=10+10)
GoParser    157ms ± 2%     158ms ± 3%     ~       (p=0.123 n=10+10)
Reflect     442ms ± 1%     445ms ± 1%     +0.73%  (p=0.017 n=10+9)
Tar         179ms ± 1%     180ms ± 1%     +0.58%  (p=0.019 n=9+9)
XML         229ms ± 1%     232ms ± 2%     +1.27%  (p=0.009 n=10+10)
[Geo mean]  401ms          405ms          +0.94%

name        old exe-bytes  new exe-bytes  delta
HelloSize   1.46M ± 0%     1.47M ± 0%     +0.84%  (p=0.000 n=10+10)
[Geo mean]  1.46M          1.47M          +0.84%

For #24543.

Change-Id: I4bfe45b767c9d9db47308a27763b303fa75bfa54
Reviewed-on: https://go-review.googlesource.com/109350
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
-
Austin Clements authored
This modifies issafepoint in liveness analysis to report almost every operation as a safe point. There are four things we don't mark as safe-points:

1. Runtime code (other than at calls).
2. go:nosplit functions (other than at calls).
3. Instructions between the load of the write barrier-enabled flag and the write.
4. Instructions leading up to a uintptr -> unsafe.Pointer conversion.

We'll optimize this in later CLs:

name        old time/op    new time/op    delta
Template    185ms ± 2%     190ms ± 2%     +2.95%  (p=0.000 n=10+10)
Unicode     96.3ms ± 3%    96.4ms ± 1%    ~       (p=0.905 n=10+9)
GoTypes     658ms ± 0%     669ms ± 1%     +1.72%  (p=0.000 n=10+9)
Compiler    3.14s ± 1%     3.18s ± 1%     +1.56%  (p=0.000 n=9+10)
SSA         7.41s ± 2%     7.59s ± 1%     +2.48%  (p=0.000 n=9+10)
Flate       126ms ± 1%     128ms ± 1%     +2.08%  (p=0.000 n=10+10)
GoParser    153ms ± 1%     157ms ± 2%     +2.38%  (p=0.000 n=10+10)
Reflect     437ms ± 1%     442ms ± 1%     +0.98%  (p=0.001 n=10+10)
Tar         178ms ± 1%     179ms ± 1%     +0.67%  (p=0.035 n=10+9)
XML         223ms ± 1%     229ms ± 1%     +2.58%  (p=0.000 n=10+10)
[Geo mean]  394ms          401ms          +1.75%

No effect on binary size because we're not yet emitting these extra safe points.

For #24543.

Change-Id: I16a1eebb9183cad7cef9d53c0fd21a973cad6859
Reviewed-on: https://go-review.googlesource.com/109348
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
-
Austin Clements authored
Currently liveness only produces a stack map index at each safe point, so the information is summarized in a map[*ssa.Value]int. We're about to have both a stack map index and a register map index, so replace the int with a LivenessIndex type we can extend, and replace the map with a LivenessMap that we can also change more easily in the future. This also gives us an easy hook for defining the value that means "not a safe point". Passes toolstash -cmp. For #24543. Change-Id: Ic4c069839635efed4fd0f603899b80f8be3b56ec Reviewed-on: https://go-review.googlesource.com/109347 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
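The shape of this refactoring can be sketched as follows. The type names LivenessIndex and LivenessMap come from the commit message; everything else (the `Value` stand-in, the field name, the `-2` sentinel, and the method names) is an assumption made for illustration and may differ from the actual compiler source.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for ssa.Value.
type Value struct{ ID int32 }

// LivenessIndex wraps the stack map index in a struct so that further
// indexes (such as a register map index) can be added later without
// touching every user.
type LivenessIndex struct {
	stackMapIndex int
}

// LivenessInvalid is the value meaning "not a safe point". The sentinel
// -2 is an assumption of this sketch.
var LivenessInvalid = LivenessIndex{stackMapIndex: -2}

// Valid reports whether idx refers to a real stack map.
func (idx LivenessIndex) Valid() bool { return idx.stackMapIndex >= 0 }

// LivenessMap hides the underlying container, so the representation can
// change later without affecting callers.
type LivenessMap struct {
	m map[*Value]LivenessIndex
}

func (lm *LivenessMap) set(v *Value, idx LivenessIndex) {
	if lm.m == nil {
		lm.m = make(map[*Value]LivenessIndex)
	}
	lm.m[v] = idx
}

// Get returns the index for v, or LivenessInvalid if v is not a safe point.
func (lm LivenessMap) Get(v *Value) LivenessIndex {
	if idx, ok := lm.m[v]; ok {
		return idx
	}
	return LivenessInvalid
}

func main() {
	call, add := &Value{ID: 1}, &Value{ID: 2}
	var lm LivenessMap
	lm.set(call, LivenessIndex{stackMapIndex: 0})
	fmt.Println(lm.Get(call).Valid(), lm.Get(add).Valid())
}
```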
-
Austin Clements authored
The obj package needs to emit the PCDATA to select the entry stack map before calling morestack. Currently this is copied for every architecture. Since we're about to change how this works, consolidate all of these copies into a single helper function. For #24543. Change-Id: Ia92d94de78f8e23fd06dba747c43e03e5989f67b Reviewed-on: https://go-review.googlesource.com/109346 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Austin Clements authored
Currently, range loops over slices and arrays are compiled roughly like:

    for i, x := range s { b }
      ⇓
    for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b }
      ⇓
    i, _n, _p := 0, len(s), &s[0]
    goto cond
    body:
      { b }
      i, _p = i+1, _p + unsafe.Sizeof(s[0])
    cond:
      if i < _n { goto body } else { goto end }
    end:

The problem with this lowering is that _p may temporarily point past the end of the allocation the moment before the loop terminates. Right now this isn't a problem because there's never a safe-point during this brief moment.

We're about to introduce safe-points everywhere, so this bad pointer is going to be a problem. We could mark the increment as an unsafe block, but this inhibits reordering opportunities and could result in infrequent safe-points if the body is short.

Instead, this CL fixes this by changing how we compile range loops to never produce this past-the-end pointer. It changes the lowering to roughly:

    i, _n, _p := 0, len(s), &s[0]
    if i < _n { goto body } else { goto end }
    top:
      _p += unsafe.Sizeof(s[0])
    body:
      { b }
      i++
      if i < _n { goto top } else { goto end }
    end:

Notably, the increment is split into two parts: we increment the index before checking the condition, but increment the pointer only *after* the condition check has succeeded.

The implementation builds on the OFORUNTIL construct that was introduced during the loop preemption experiments, since OFORUNTIL places the increment and condition after the loop body. To support the extra "late increment" step, we further define OFORUNTIL's "List" field to contain the late increment statements. This makes all of this a relatively small change.

This depends on the improvements to the prove pass in CL 102603. With the current lowering, bounds-check elimination knows that i < _n in the body because the body block is dominated by the cond block. In the new lowering, deriving this fact requires detecting that i < _n on *both* paths into body and hence is true in body. CL 102603 made prove able to detect this.

The code size effect of this is minimal. The cmd/go binary on linux/amd64 increases by 0.17%. Performance-wise, this actually appears to be a net win, though it's mostly noise:

name                      old time/op    new time/op    delta
BinaryTree17-12           2.80s ± 0%     2.61s ± 1%     -6.88%  (p=0.000 n=20+18)
Fannkuch11-12             2.41s ± 0%     2.42s ± 0%     +0.05%  (p=0.005 n=20+20)
FmtFprintfEmpty-12        41.6ns ± 5%    41.4ns ± 6%    ~       (p=0.765 n=20+19)
FmtFprintfString-12       69.4ns ± 3%    69.3ns ± 1%    ~       (p=0.084 n=19+17)
FmtFprintfInt-12          76.1ns ± 1%    77.3ns ± 1%    +1.57%  (p=0.000 n=19+19)
FmtFprintfIntInt-12       122ns ± 2%     123ns ± 3%     +0.95%  (p=0.015 n=20+20)
FmtFprintfPrefixedInt-12  153ns ± 2%     151ns ± 3%     -1.27%  (p=0.013 n=20+20)
FmtFprintfFloat-12        215ns ± 0%     216ns ± 0%     +0.47%  (p=0.000 n=20+16)
FmtManyArgs-12            486ns ± 1%     498ns ± 0%     +2.40%  (p=0.000 n=20+17)
GobDecode-12              6.43ms ± 0%    6.50ms ± 0%    +1.08%  (p=0.000 n=18+19)
GobEncode-12              5.43ms ± 1%    5.47ms ± 0%    +0.76%  (p=0.000 n=20+20)
Gzip-12                   218ms ± 1%     218ms ± 1%     ~       (p=0.883 n=20+20)
Gunzip-12                 38.8ms ± 0%    38.9ms ± 0%    ~       (p=0.644 n=19+19)
HTTPClientServer-12       76.2µs ± 1%    76.4µs ± 2%    ~       (p=0.218 n=20+20)
JSONEncode-12             12.2ms ± 0%    12.3ms ± 1%    +0.45%  (p=0.000 n=19+19)
JSONDecode-12             54.2ms ± 1%    53.3ms ± 0%    -1.67%  (p=0.000 n=20+20)
Mandelbrot200-12          3.71ms ± 0%    3.71ms ± 0%    ~       (p=0.143 n=19+20)
GoParse-12                3.22ms ± 0%    3.19ms ± 1%    -0.72%  (p=0.000 n=20+20)
RegexpMatchEasy0_32-12    76.7ns ± 1%    75.8ns ± 1%    -1.19%  (p=0.000 n=20+17)
RegexpMatchEasy0_1K-12    245ns ± 1%     243ns ± 0%     -0.72%  (p=0.000 n=18+17)
RegexpMatchEasy1_32-12    71.9ns ± 0%    71.7ns ± 1%    -0.39%  (p=0.006 n=12+18)
RegexpMatchEasy1_1K-12    358ns ± 1%     354ns ± 1%     -1.13%  (p=0.000 n=20+19)
RegexpMatchMedium_32-12   105ns ± 2%     105ns ± 1%     -0.63%  (p=0.007 n=19+20)
RegexpMatchMedium_1K-12   31.9µs ± 1%    31.9µs ± 1%    ~       (p=1.000 n=17+17)
RegexpMatchHard_32-12     1.51µs ± 1%    1.52µs ± 2%    +0.46%  (p=0.042 n=18+18)
RegexpMatchHard_1K-12     45.3µs ± 1%    45.5µs ± 2%    +0.44%  (p=0.029 n=18+19)
Revcomp-12                388ms ± 1%     385ms ± 0%     -0.57%  (p=0.000 n=19+18)
Template-12               63.0ms ± 1%    63.3ms ± 0%    +0.50%  (p=0.000 n=19+20)
TimeParse-12              309ns ± 1%     307ns ± 0%     -0.62%  (p=0.000 n=20+20)
TimeFormat-12             328ns ± 0%     333ns ± 0%     +1.35%  (p=0.000 n=19+19)
[Geo mean]                47.0µs         46.9µs         -0.20%

(https://perf.golang.org/search?q=upload:20180326.1)

For #10958.
For #24543.

Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a
Reviewed-on: https://go-review.googlesource.com/102604
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
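The difference between the two lowerings can be observed in ordinary Go by replaying them with explicit gotos. The sketch below is an emulation, not compiler output: it uses an integer position `p` into the slice in place of the raw pointer `_p`, and the function names `sumOld` and `sumNew` are made up for this example. Each function returns the final value of `p` so the past-the-end state is visible.

```go
package main

import "fmt"

// sumOld follows the old lowering: p is advanced together with i, before
// the condition is checked, so after the last iteration p == len(s),
// which corresponds to a pointer one element past the end.
func sumOld(s []int) (sum, lastP int) {
	i, n, p := 0, len(s), 0
	goto cond
body:
	sum += s[p]
	i, p = i+1, p+1
cond:
	if i < n {
		goto body
	}
	return sum, p
}

// sumNew follows the new lowering: i is incremented before the check,
// but p only advances at "top", after the check has succeeded, so p
// never moves past the last element.
func sumNew(s []int) (sum, lastP int) {
	i, n, p := 0, len(s), 0
	if i < n {
		goto body
	} else {
		goto end
	}
top:
	p++
body:
	sum += s[p]
	i++
	if i < n {
		goto top
	} else {
		goto end
	}
end:
	return sum, p
}

func main() {
	s := []int{1, 2, 3}
	fmt.Println(sumOld(s)) // 6 3: p ends one past the last index
	fmt.Println(sumNew(s)) // 6 2: p ends on the last index
}
```

Both functions compute the same sum; only the final position differs, which is the property that matters once a safe-point can occur between the increment and the loop exit.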
-
Austin Clements authored
Currently, we compile range loops into for loops with the obvious initialization and update of the index variable. In this form, the prove pass can see that the body is dominated by an i < len condition, and findIndVar can detect that i is an induction variable and that 0 <= i < len. GOEXPERIMENT=preemptibleloops compiles range loops to OFORUNTIL and we're preparing to unconditionally switch to a variation of this for #24543. OFORUNTIL moves the increment and condition *after* the body, which makes the bounds on the index variable much less obvious. With OFORUNTIL, proving anything about the index variable requires understanding the phi that joins the index values at the top of the loop body block. This interferes with both prove's ability to see that i < len (this is true on both paths that enter the body, but from two different conditional checks) and with findIndVar's ability to detect the induction pattern. Fix this by teaching prove to detect that the index in the pattern constructed by OFORUNTIL is an induction variable and add both bounds to the facts table. Currently this is done separately from findIndVar because it depends on prove's factsTable, while findIndVar runs before visiting blocks and building the factsTable. Without any GOEXPERIMENT, this has no effect on std or cmd. However, with GOEXPERIMENT=preemptibleloops, this change becomes necessary to prove 90 conditions in std and cmd. Change-Id: Ic025d669f81b53426309da5a6e8010e5ccaf4f49 Reviewed-on: https://go-review.googlesource.com/102603 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Austin Clements authored
Currently, the prove pass derives implicit relations between len and cap in the code that adds branch conditions. This is fine right now because that's the only place we can encounter len and cap, but we're about to add a second way to add assertions to the facts table that can also produce facts involving len and cap. Prepare for this by moving the fact derivation from updateRestrictions (where it only applies on branches) to factsTable.update, which can derive these facts no matter where the root facts come from. Passes toolstash -cmp. Change-Id: If09692d9eb98ffaa93f4cfa58ed2d8ba0887c111 Reviewed-on: https://go-review.googlesource.com/102602 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
-
Austin Clements authored
Currently, we never add a relation between two constants to prove's fact table because these are eliminated before prove runs, so it currently doesn't handle facts like this very well even though they're easy to prove. We're about to start asserting some conditions that don't appear in the SSA, but are constructed from existing SSA values that may both be constants. Hence, improve the fact table to understand relations between constants by initializing the constant bounds of constant values to the value itself, rather than noLimit. Passes toolstash -cmp. Change-Id: I71f8dc294e59f19433feab1c10b6d3c99b7f1e26 Reviewed-on: https://go-review.googlesource.com/102601 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
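The effect of seeding a constant's bounds with its own value can be shown with a reduced model. This is a sketch only: the `limit` struct here tracks just signed bounds, and `constLimit` and `provablyLess` are hypothetical helpers; the name noLimit is taken from the commit message, but its definition below is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// limit is a simplified range [min, max] known for a value.
type limit struct {
	min, max int64
}

// noLimit means nothing is known about the value.
var noLimit = limit{min: math.MinInt64, max: math.MaxInt64}

// constLimit is the tightest possible range for a constant: the value itself.
func constLimit(c int64) limit { return limit{min: c, max: c} }

// provablyLess reports whether a < b follows from the ranges alone.
func provablyLess(a, b limit) bool { return a.max < b.min }

func main() {
	// With noLimit, nothing can be concluded about two constants.
	fmt.Println(provablyLess(noLimit, noLimit))
	// With constant bounds, 3 < 5 follows immediately.
	fmt.Println(provablyLess(constLimit(3), constLimit(5)))
}
```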
-
David Chase authored
Inlining was refactored to perform tuning experiments, with the "knobs" now set to also inline functions/methods that include panic(), and -l=4 (inline calls) now expressed as a change to costs, rather than scattered if-thens. The -l=4 inline-calls penalty is chosen to be the best found during experiments; it makes some programs much larger and slower (notably, the compiler itself) and is believed to be risky for machine-generated code in general, which is why it is not the default. It is also not well-tested with the debugger and DWARF output. This change includes an explicit go:noinline applied to the method that is the largest cause of compiler binary growth and slowdown for midstack inlining. There are others; ideally, whatever heuristic eventually appears will make this unnecessary. Change-Id: Idf7056ed2f961472cf49d2fd154ee98bef9421e2 Reviewed-on: https://go-review.googlesource.com/109918 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
-
- 21 May, 2018 8 commits
-
-
Adam Langley authored
This change brings back the EKU checking from 1.9. In 1.10, we checked EKU nesting independent of the requested EKUs so that, after verifying a certificate, one could inspect the EKUs in the leaf and trust them. That, however, was too optimistic. I had misunderstood that the PKI was /currently/ clean enough to require that, rather than it being desirable. Go generally does not push the envelope on these sorts of things and lets the browsers clear the path first. Fixes #24590 Change-Id: I18c070478e3bbb6468800ae461c207af9e954949 Reviewed-on: https://go-review.googlesource.com/113475 Reviewed-by: Filippo Valsorda <filippo@golang.org>
-
Elias Naur authored
Now that raise on darwin targets the current thread, we can remove the workaround in dieFromSignal. Change-Id: I1e468dc05e49403ee0bbe0a3a85e764c81fec4f2 Reviewed-on: https://go-review.googlesource.com/110476 Run-TryBot: Elias Naur <elias.naur@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Elias Naur authored
This CL is the darwin/arm and darwin/arm64 equivalent to CL 108679, 110215, 110437, 110438, 111258, 110655. Updates #17490 Change-Id: Ia95b27b38f9c3535012c566f17a44b4ed26b9db6 Reviewed-on: https://go-review.googlesource.com/111015 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Elias Naur authored
pthread_self and pthread_kill are not safe to call from a signal handler. In particular, pthread_self fails in iOS when called from a signal handler context. Use raise instead; it is signal handler safe and simpler. Change-Id: I0cbfe25151aed245f55d7b76719ce06dc78c6a75 Reviewed-on: https://go-review.googlesource.com/113877 Run-TryBot: Elias Naur <elias.naur@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Austin Clements authored
When an object spans heap arenas, its bitmap is discontiguous, so heapBitsSetType unrolls the bitmap into the object itself and then copies it out to the real heap bitmap. Unfortunately, since this code path is rare, it had two unnoticed bugs related to the head and tail of the bitmap: 1. At the head of the object, we were using hbitp as the destination bitmap pointer rather than h.bitp, but hbitp points into the *temporary* bitmap space (that is, the object itself), so we were failing to copy the partial bitmap byte at the head of an object. 2. The core copying loop copied all of the full bitmap bytes, but always drove the remaining word count down to 0, even if there was a partial bitmap byte for the tail of the object. As a result, we never wrote partial bitmap bytes at the tail of an object. I found these by enabling out-of-place unrolling all the time. To improve our chances of detecting these sorts of bugs in the future, this CL mimics this by enabling out-of-place mode 50% of the time when doubleCheck is enabled so that we test both in-place and out-of-place mode. Change-Id: I69e5d829fb3444be4cf11f4c6d8462c26dc467e8 Reviewed-on: https://go-review.googlesource.com/110995 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
-
Alberto Donizetti authored
We have a workaround in place in the runtime (see CL 16853 and CL 111176) to keep arm and arm64 Go binaries working under QEMU in user-emulation mode (Issue #13024). This change adds a regression test about arm/arm64 QEMU emulation to cmd/go. Change-Id: Ic67f476e7c30a7d7852d9b01834f1dcabfac2ff7 Reviewed-on: https://go-review.googlesource.com/111477 Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
Brad Fitzpatrick authored
Fixes #25476 Change-Id: I5a81cdf7d0ef9a22b0267732f27bcc2ef76eaa29 Reviewed-on: https://go-review.googlesource.com/113817 Reviewed-by: Bryan C. Mills <bcmills@google.com>
-
Daniel Martí authored
First, the regions sort was buggy, as its last comparison was ineffective. Second, the insyscall and insyscallRuntime fields were unsigned, so the check for them being negative was pointless. Make them signed instead, to also prevent the possibility of underflows when decreasing numbers that might realistically be 0. Third, the color constants were all untyped strings except the first one. Be consistent with their typing. Change-Id: I4eb8d08028ed92589493c2a4b9cc5a88d83f769b Reviewed-on: https://go-review.googlesource.com/113895 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
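The second point can be shown with a short, self-contained sketch. The helper names decUnsigned and decSigned are made up for illustration and do not appear in the change; they only demonstrate why a negative check on an unsigned counter can never fire, and how decrementing at zero wraps around instead.

```go
package main

import "fmt"

// decUnsigned decrements an unsigned counter. At zero it wraps around to
// the maximum uint64 value, so a later "is it negative?" check can never
// succeed.
func decUnsigned(n uint64) uint64 {
	return n - 1
}

// decSigned decrements a signed counter. At zero it becomes -1, which a
// negative check can detect.
func decSigned(n int64) int64 {
	return n - 1
}

func main() {
	u := decUnsigned(0)
	s := decSigned(0)
	fmt.Println(u == ^uint64(0)) // wrapped to the largest uint64
	fmt.Println(s < 0)           // the signed version is detectably negative
}
```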
-