- 27 Apr, 2017 10 commits
-
Keith Randall authored
When a constant doesn't fit in a single instruction, use two paired instructions instead of the constant pool. For example:

    ADD $0xaa00bb, R0, R1

used to rewrite to:

    MOV ?(IP), R11
    ADD R11, R0, R1

Instead, do:

    ADD $0xaa0000, R0, R1
    ADD $0xbb, R1, R1

Same number of instructions.
Good: 4 fewer bytes (no constant pool entry), one fewer load.
Bad: the critical path is one instruction longer.
It's probably worth it to avoid the loads; they are expensive.

Dave Cheney got us some performance numbers: https://perf.golang.org/search?q=upload:20170426.1
TL;DR: mean 1.37% improvement.

Change-Id: Ib206836161fdc94a3962db6f9caa635c87d57cf1
Reviewed-on: https://go-review.googlesource.com/41612
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
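A rough illustration of the decomposition in Go (a sketch only; the real rewrite rule works on ARM's rotated 8-bit immediates and checks that both halves are encodable before choosing this form):

    package main

    import "fmt"

    // splitConst splits a constant into its low byte and the remaining high
    // bits, mirroring how $0xaa00bb above becomes $0xaa0000 + $0xbb. This is
    // an illustrative helper, not the compiler's actual logic.
    func splitConst(c uint32) (hi, lo uint32) {
        lo = c & 0xff
        hi = c &^ 0xff
        return hi, lo
    }

    func main() {
        hi, lo := splitConst(0xaa00bb)
        fmt.Printf("hi=%#x lo=%#x\n", hi, lo) // hi=0xaa0000 lo=0xbb
    }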
-
Bryan C. Mills authored
This provides a significant speedup for encoding and decoding when using many CPU cores. name old time/op new time/op delta EndToEndPipe 5.26µs ± 2% 5.38µs ± 7% ~ (p=0.121 n=8+7) EndToEndPipe-6 1.86µs ± 5% 1.80µs ±11% ~ (p=0.442 n=8+8) EndToEndPipe-48 1.39µs ± 2% 1.41µs ± 4% ~ (p=0.645 n=8+8) EndToEndByteBuffer 1.54µs ± 5% 1.57µs ± 5% ~ (p=0.130 n=8+8) EndToEndByteBuffer-6 620ns ± 6% 310ns ± 8% -50.04% (p=0.000 n=8+8) EndToEndByteBuffer-48 506ns ± 4% 110ns ± 3% -78.22% (p=0.000 n=8+8) EndToEndSliceByteBuffer 149µs ± 3% 153µs ± 5% +2.80% (p=0.021 n=8+8) EndToEndSliceByteBuffer-6 103µs ±17% 31µs ±12% -70.06% (p=0.000 n=8+8) EndToEndSliceByteBuffer-48 93.2µs ± 2% 18.0µs ± 5% -80.66% (p=0.000 n=7+8) EncodeComplex128Slice 20.6µs ± 5% 20.9µs ± 8% ~ (p=0.959 n=8+8) EncodeComplex128Slice-6 4.10µs ±10% 3.75µs ± 8% -8.58% (p=0.004 n=8+7) EncodeComplex128Slice-48 1.14µs ± 2% 0.81µs ± 2% -28.98% (p=0.000 n=8+8) EncodeFloat64Slice 10.2µs ± 7% 10.1µs ± 6% ~ (p=0.694 n=7+8) EncodeFloat64Slice-6 2.01µs ± 6% 1.80µs ±11% -10.30% (p=0.004 n=8+8) EncodeFloat64Slice-48 701ns ± 3% 408ns ± 2% -41.72% (p=0.000 n=8+8) EncodeInt32Slice 11.8µs ± 7% 11.7µs ± 6% ~ (p=0.463 n=8+7) EncodeInt32Slice-6 2.32µs ± 4% 2.06µs ± 5% -10.89% (p=0.000 n=8+8) EncodeInt32Slice-48 731ns ± 2% 445ns ± 2% -39.10% (p=0.000 n=7+8) EncodeStringSlice 9.13µs ± 9% 9.18µs ± 8% ~ (p=0.798 n=8+8) EncodeStringSlice-6 1.91µs ± 5% 1.70µs ± 5% -11.07% (p=0.000 n=8+8) EncodeStringSlice-48 679ns ± 3% 397ns ± 3% -41.50% (p=0.000 n=8+8) EncodeInterfaceSlice 449µs ±11% 461µs ± 9% ~ (p=0.328 n=8+8) EncodeInterfaceSlice-6 503µs ± 7% 88µs ± 7% -82.51% (p=0.000 n=7+8) EncodeInterfaceSlice-48 335µs ± 8% 22µs ± 1% -93.55% (p=0.000 n=8+7) DecodeComplex128Slice 67.2µs ± 4% 67.0µs ± 6% ~ (p=0.721 n=8+8) DecodeComplex128Slice-6 22.0µs ± 8% 18.9µs ± 5% -14.44% (p=0.000 n=8+8) DecodeComplex128Slice-48 46.8µs ± 3% 34.9µs ± 3% -25.48% (p=0.000 n=8+8) DecodeFloat64Slice 39.4µs ± 4% 40.3µs ± 3% ~ (p=0.105 n=8+8) DecodeFloat64Slice-6 16.1µs ± 2% 11.2µs ± 7% -30.64% (p=0.001 n=6+7) DecodeFloat64Slice-48 38.1µs ± 3% 24.0µs ± 7% -37.10% (p=0.000 n=8+8) DecodeInt32Slice 39.1µs ± 4% 40.1µs ± 5% ~ (p=0.083 n=8+8) DecodeInt32Slice-6 16.3µs ±21% 10.6µs ± 1% -35.17% (p=0.000 n=8+7) DecodeInt32Slice-48 36.5µs ± 6% 21.9µs ± 9% -39.89% (p=0.000 n=8+8) DecodeStringSlice 82.9µs ± 6% 85.5µs ± 5% ~ (p=0.121 n=8+7) DecodeStringSlice-6 32.4µs ±11% 26.8µs ±16% -17.37% (p=0.000 n=8+8) DecodeStringSlice-48 76.0µs ± 2% 57.0µs ± 5% -25.02% (p=0.000 n=8+8) DecodeInterfaceSlice 718µs ± 4% 752µs ± 5% +4.83% (p=0.038 n=8+8) DecodeInterfaceSlice-6 500µs ± 6% 165µs ± 7% -66.95% (p=0.000 n=7+8) DecodeInterfaceSlice-48 470µs ± 5% 120µs ± 6% -74.55% (p=0.000 n=8+7) DecodeMap 3.29ms ± 5% 3.34ms ± 5% ~ (p=0.279 n=8+8) DecodeMap-6 7.73ms ± 8% 7.53ms ±18% ~ (p=0.779 n=7+8) DecodeMap-48 7.46ms ± 6% 7.71ms ± 3% ~ (p=0.161 n=8+8) https://perf.golang.org/search?q=upload:20170426.4 Change-Id: I335874028ef8d7c991051004f8caadd16c92d5cc Reviewed-on: https://go-review.googlesource.com/41872 Run-TryBot: Bryan Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
Bryan C. Mills authored
This provides a significant speedup when using reflection-heavy code on many CPU cores, such as when marshaling or unmarshaling protocol buffers. updates #17973 updates #18177 name old time/op new time/op delta Call 239ns ±10% 245ns ± 7% ~ (p=0.562 n=10+9) Call-6 201ns ±38% 48ns ±29% -76.39% (p=0.000 n=10+9) Call-48 133ns ± 8% 12ns ± 2% -90.92% (p=0.000 n=10+8) CallArgCopy/size=128 169ns ±12% 197ns ± 2% +16.35% (p=0.000 n=10+7) CallArgCopy/size=128-6 142ns ± 9% 34ns ± 7% -76.10% (p=0.000 n=10+9) CallArgCopy/size=128-48 125ns ± 3% 9ns ± 7% -93.01% (p=0.000 n=8+8) CallArgCopy/size=256 177ns ± 8% 197ns ± 5% +11.24% (p=0.000 n=10+9) CallArgCopy/size=256-6 148ns ±11% 35ns ± 6% -76.23% (p=0.000 n=10+9) CallArgCopy/size=256-48 127ns ± 4% 9ns ± 9% -92.66% (p=0.000 n=10+9) CallArgCopy/size=1024 196ns ± 6% 228ns ± 7% +16.09% (p=0.000 n=10+9) CallArgCopy/size=1024-6 143ns ± 6% 42ns ± 5% -70.39% (p=0.000 n=8+8) CallArgCopy/size=1024-48 130ns ± 7% 10ns ± 1% -91.99% (p=0.000 n=10+8) CallArgCopy/size=4096 330ns ± 9% 351ns ± 5% +6.20% (p=0.004 n=10+9) CallArgCopy/size=4096-6 173ns ±14% 62ns ± 6% -63.83% (p=0.000 n=10+8) CallArgCopy/size=4096-48 141ns ± 6% 15ns ± 6% -89.59% (p=0.000 n=10+8) CallArgCopy/size=65536 7.71µs ±10% 7.74µs ±10% ~ (p=0.859 n=10+9) CallArgCopy/size=65536-6 1.33µs ± 4% 1.34µs ± 6% ~ (p=0.720 n=10+9) CallArgCopy/size=65536-48 347ns ± 2% 344ns ± 2% ~ (p=0.202 n=10+9) PtrTo 30.2ns ±10% 41.3ns ±11% +36.97% (p=0.000 n=10+9) PtrTo-6 126ns ± 6% 7ns ±10% -94.47% (p=0.000 n=9+9) PtrTo-48 86.9ns ± 9% 1.7ns ± 9% -98.08% (p=0.000 n=10+9) FieldByName1 86.6ns ± 5% 87.3ns ± 7% ~ (p=0.737 n=10+9) FieldByName1-6 19.8ns ±10% 18.7ns ±10% ~ (p=0.073 n=9+9) FieldByName1-48 7.54ns ± 4% 7.74ns ± 5% +2.55% (p=0.023 n=9+9) FieldByName2 1.63µs ± 8% 1.70µs ± 4% +4.13% (p=0.020 n=9+9) FieldByName2-6 481ns ± 6% 490ns ±10% ~ (p=0.474 n=9+9) FieldByName2-48 723ns ± 3% 736ns ± 2% +1.76% (p=0.045 n=8+8) FieldByName3 10.5µs ± 7% 10.8µs ± 7% ~ (p=0.234 n=8+8) FieldByName3-6 2.78µs ± 3% 2.94µs ±10% +5.87% (p=0.031 n=9+9) FieldByName3-48 3.72µs ± 2% 3.91µs ± 5% +4.91% (p=0.003 n=9+9) InterfaceBig 10.8ns ± 5% 10.7ns ± 5% ~ (p=0.849 n=9+9) InterfaceBig-6 9.62ns ±81% 1.79ns ± 4% -81.38% (p=0.003 n=9+9) InterfaceBig-48 0.48ns ±34% 0.50ns ± 7% ~ (p=0.071 n=8+9) InterfaceSmall 10.7ns ± 5% 10.9ns ± 4% ~ (p=0.243 n=9+9) InterfaceSmall-6 1.85ns ± 5% 1.79ns ± 1% -2.97% (p=0.006 n=7+8) InterfaceSmall-48 0.49ns ±20% 0.48ns ± 5% ~ (p=0.740 n=7+9) New 28.2ns ±20% 26.6ns ± 3% ~ (p=0.617 n=9+9) New-6 4.69ns ± 4% 4.44ns ± 3% -5.33% (p=0.001 n=9+9) New-48 1.10ns ± 9% 1.08ns ± 6% ~ (p=0.285 n=9+8) name old alloc/op new alloc/op delta Call 0.00B 0.00B ~ (all equal) Call-6 0.00B 0.00B ~ (all equal) Call-48 0.00B 0.00B ~ (all equal) name old allocs/op new allocs/op delta Call 0.00 0.00 ~ (all equal) Call-6 0.00 0.00 ~ (all equal) Call-48 0.00 0.00 ~ (all equal) name old speed new speed delta CallArgCopy/size=128 757MB/s ±11% 649MB/s ± 1% -14.33% (p=0.000 n=10+7) CallArgCopy/size=128-6 901MB/s ± 9% 3781MB/s ± 7% +319.69% (p=0.000 n=10+9) CallArgCopy/size=128-48 1.02GB/s ± 2% 14.63GB/s ± 6% +1337.98% (p=0.000 n=8+8) CallArgCopy/size=256 1.45GB/s ± 9% 1.30GB/s ± 5% -10.17% (p=0.000 n=10+9) CallArgCopy/size=256-6 1.73GB/s ±11% 7.28GB/s ± 7% +320.76% (p=0.000 n=10+9) CallArgCopy/size=256-48 2.00GB/s ± 4% 27.46GB/s ± 9% +1270.85% (p=0.000 n=10+9) CallArgCopy/size=1024 5.21GB/s ± 6% 4.49GB/s ± 8% -13.74% (p=0.000 n=10+9) CallArgCopy/size=1024-6 7.18GB/s ± 7% 24.17GB/s ± 5% +236.64% (p=0.000 n=9+8) CallArgCopy/size=1024-48 7.87GB/s ± 7% 98.43GB/s ± 
1% +1150.99% (p=0.000 n=10+8) CallArgCopy/size=4096 12.3GB/s ± 6% 11.7GB/s ± 5% -5.00% (p=0.008 n=9+9) CallArgCopy/size=4096-6 23.8GB/s ±16% 65.6GB/s ± 5% +175.02% (p=0.000 n=10+8) CallArgCopy/size=4096-48 29.0GB/s ± 7% 279.6GB/s ± 6% +862.87% (p=0.000 n=10+8) CallArgCopy/size=65536 8.52GB/s ±11% 8.49GB/s ± 9% ~ (p=0.842 n=10+9) CallArgCopy/size=65536-6 49.3GB/s ± 4% 49.0GB/s ± 6% ~ (p=0.720 n=10+9) CallArgCopy/size=65536-48 189GB/s ± 2% 190GB/s ± 2% ~ (p=0.211 n=10+9) https://perf.golang.org/search?q=upload:20170426.3 Change-Id: Iff68f18ef69defb7f30962e21736ac7685a48a27 Reviewed-on: https://go-review.googlesource.com/41871 Run-TryBot: Bryan Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Elias Naur authored
The "lldb start" phase often times out on the iOS builder. Increase the timeout and see if that helps. Change-Id: I92fd67cbfa90659600e713198d6b2c5c78dde20f Reviewed-on: https://go-review.googlesource.com/41863Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Elias Naur <elias.naur@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Weichao Tang authored
Fixes #19976 Change-Id: I48486467066784a9dcc24357ec94a1be85265a6f Reviewed-on: https://go-review.googlesource.com/40940 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Wei Xiao authored
The current code treats the condition as a special register and writes its raw data directly into the instruction. The fix converts the raw data into the correct condition encoding. Also fix the operand category of FCCMP. Add tests to cover all cases.

Change-Id: Ib194041bd9017dd0edbc241564fe983082ac616b
Reviewed-on: https://go-review.googlesource.com/41511
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
Ian Lance Taylor authored
Fixes #20134 Change-Id: I92699d118c713179961c037a6bbbcbec4efa63ba Reviewed-on: https://go-review.googlesource.com/41823 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Nigel Tao authored
Previously, the package did not distinguish between baseline and extended sequential images. Both are non-progressive images, but the Th range differs between the two, as per Annex B of https://www.w3.org/Graphics/JPEG/itu-t81.pdf

Extended sequential images are often emitted by the Guetzli encoder.

Fixes #19913

Change-Id: I3d0f9e16d5d374ee1c65e3a8fb87519de61cff94
Reviewed-on: https://go-review.googlesource.com/41831
Reviewed-by: David Symonds <dsymonds@golang.org>
-
Josh Bleecher Snyder authored
When using a concurrent backend, the overall compilation time is bounded in part by the slowest function to compile. The number of top-level statements in a function is an easily calculated and fairly reliable proxy for compilation time. Here's a standard compilecmp output for -c=8 with this CL: name old time/op new time/op delta Template 127ms ± 4% 125ms ± 6% -1.33% (p=0.000 n=47+50) Unicode 84.8ms ± 4% 84.5ms ± 4% ~ (p=0.217 n=49+49) GoTypes 289ms ± 3% 287ms ± 3% -0.78% (p=0.002 n=48+50) Compiler 1.36s ± 3% 1.34s ± 2% -1.29% (p=0.000 n=49+47) SSA 2.95s ± 3% 2.77s ± 4% -6.23% (p=0.000 n=50+49) Flate 70.7ms ± 3% 70.9ms ± 2% ~ (p=0.112 n=50+49) GoParser 85.0ms ± 3% 83.0ms ± 4% -2.31% (p=0.000 n=48+49) Reflect 229ms ± 3% 225ms ± 4% -1.83% (p=0.000 n=49+49) Tar 70.2ms ± 3% 69.4ms ± 3% -1.17% (p=0.000 n=49+49) XML 115ms ± 7% 114ms ± 6% ~ (p=0.158 n=49+47) name old user-time/op new user-time/op delta Template 352ms ± 5% 342ms ± 8% -2.74% (p=0.000 n=49+50) Unicode 117ms ± 5% 118ms ± 4% +0.88% (p=0.005 n=46+48) GoTypes 986ms ± 3% 980ms ± 4% ~ (p=0.110 n=46+48) Compiler 4.39s ± 2% 4.43s ± 4% +0.97% (p=0.002 n=50+50) SSA 12.0s ± 2% 13.3s ± 3% +11.33% (p=0.000 n=49+49) Flate 222ms ± 5% 219ms ± 6% -1.56% (p=0.002 n=50+50) GoParser 271ms ± 5% 268ms ± 4% -0.83% (p=0.036 n=49+48) Reflect 560ms ± 4% 571ms ± 3% +1.90% (p=0.000 n=50+49) Tar 183ms ± 3% 183ms ± 3% ~ (p=0.903 n=45+50) XML 364ms ±13% 391ms ± 4% +7.16% (p=0.000 n=50+40) A more interesting way of viewing the data is by looking at the ratio of the time taken to compile the slowest-to-compile function to the overall time spent compiling functions. If this ratio is small (near 0), then increased concurrency might help. If this ratio is big (near 1), then we're bounded by that single function. I instrumented the compiler to emit this ratio per-package, ran 'go build -a -gcflags=-c=C -p=P std cmd' three times, for varying values of C and P, and collected the ratios encountered into an ASCII histogram. Here's c=1 p=1, which is a non-concurrent backend, single process at a time: 90%| 80%| 70%| 60%| 50%| 40%| 30%| 20%|** 10%|*** 0%|********* ----+---------- |0123456789 The x-axis is floor(10*ratio), so the first column indicates the percent of ratios that fell in the 0% to 9.9999% range. We can see in this histogram that more concurrency will help; in most cases, the ratio is small. Here's c=8 p=1, before this CL: 90%| 80%| 70%| 60%| 50%| 40%| 30%| * 20%| * 10%|* * * 0%|********** ----+---------- |0123456789 In 30-40% of cases, we're mostly bound by the compilation time of a single function. Here's c=8 p=1, after this CL: 90%| 80%| 70%| 60%| 50%| * 40%| * 30%| * 20%| * 10%| * 0%|********** ----+---------- |0123456789 The sorting pays off; we are bound by the compilation time of a single function in over half of packages. The single * in the histogram indicates 0-10%. The actual values for this chart are: 0: 5%, 1: 1%, 2: 1%, 3: 4%, 4: 5%, 5: 7%, 6: 7%, 7: 7%, 8: 9%, 9: 55% This indicates that efforts to increase or enable more concurrency, e.g. by optimizing mutexes or increasing the value of c, will probably not yield fruit. That matches what compilecmp tells us. Further optimization efforts should thus focus instead on one of: (1) making more functions compile concurrently (2) improving the compilation time of the slowest functions (3) speeding up the remaining serial parts of the compiler (4) automatically splitting up some large autogenerated functions into small ones, as discussed in #19751 I hope to spend more time on (1) before the freeze. 
Adding process parallelism doesn't change the story much. For example, here's c=8 p=8, after this CL: 90%| 80%| 70%| 60%| 50%| 40%| * 30%| * 20%| * 10%| *** 0%|********** ----+---------- |0123456789 Since we don't need to worry much about p, these histograms can help us select a good general value of c to use as a default, assuming we're not bounded by GOMAXPROCS. Here are some charts after this CL, for c from 1 to 8: c=1 p=1 90%| 80%| 70%| 60%| 50%| 40%| 30%| 20%|** 10%|*** 0%|********* ----+---------- |0123456789 c=2 p=1 90%| 80%| 70%| 60%| 50%| 40%| 30%| 20%| 10%| **** * 0%|********** ----+---------- |0123456789 c=3 p=1 90%| 80%| 70%| 60%| 50%| 40%| 30%| 20%| * 10%| ** * * 0%|********** ----+---------- |0123456789 c=4 p=1 90%| 80%| 70%| 60%| 50%| 40%| 30%| * 20%| * 10%| * * 0%|********** ----+---------- |0123456789 c=5 p=1 90%| 80%| 70%| 60%| 50%| 40%| 30%| * 20%| * 10%| * * 0%|********** ----+---------- |0123456789 c=6 p=1 90%| 80%| 70%| 60%| 50%| 40%| * 30%| * 20%| * 10%| * 0%|********** ----+---------- |0123456789 c=7 p=1 90%| 80%| 70%| 60%| 50%| * 40%| * 30%| * 20%| * 10%| ** 0%|********** ----+---------- |0123456789 c=8 p=1 90%| 80%| 70%| 60%| 50%| * 40%| * 30%| * 20%| * 10%| * 0%|********** ----+---------- |0123456789 Given the increased user-CPU costs as c increases, it looks like c=4 is probably the sweet spot, at least for now. Pleasingly, this matches (and explains) the results of the standard benchmarking that I have done. Updates #15756 Change-Id: I82b606c06efd34a5dbd1afdbcf66a605905b2aeb Reviewed-on: https://go-review.googlesource.com/41192 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
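A minimal sketch of the scheduling idea (hypothetical types and names, not the compiler's actual code): sort the to-be-compiled functions so the ones with the most top-level statements go to workers first, so a single large function is less likely to be the last thing still compiling:

    package main

    import (
        "fmt"
        "sort"
    )

    // fn is a stand-in for a function awaiting backend compilation; stmts is
    // its number of top-level statements, the proxy for compile time.
    type fn struct {
        name  string
        stmts int
    }

    func main() {
        fns := []fn{{"small", 3}, {"huge", 900}, {"medium", 40}}

        // Largest first: with several workers, the big function starts early
        // instead of dominating the tail of the build.
        sort.Slice(fns, func(i, j int) bool { return fns[i].stmts > fns[j].stmts })

        for _, f := range fns {
            fmt.Println("enqueue", f.name)
        }
    }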
-
Josh Bleecher Snyder authored
This CL adds initial support for concurrent backend compilation.

BACKGROUND

The compiler currently consists (very roughly) of the following phases:

1. Initialization.
2. Lexing and parsing into the cmd/compile/internal/syntax AST.
3. Translation into the cmd/compile/internal/gc AST.
4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go).
5. Translation into the cmd/compile/internal/ssa SSA form.
6. Optimization and lowering of SSA form.
7. Translation from SSA form to assembler instructions.
8. Translation from assembler instructions to machine code.
9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data.

Phase 2 was already concurrent as of Go 1.8.

Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA.

Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups.

Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions.

Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and hash routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile.

USAGE

Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c.

Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior.

MUTEXES

Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order:

* gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds.

* gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting.

* obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex.

* gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal.

* gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention.

* types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex.

TESTING

I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this.

REPRODUCIBLE BUILDS

This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961.

Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem.

NEXT STEPS

* Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some.
* Improve testing.
* Improve performance, for many values of c.
* Integrate with cmd/go and fine tune.
* Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work.
* Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL.

PERFORMANCE

Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop.

First, going from tip to this CL with c=1 has almost no impact.
name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. 
name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% 
+4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
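The shape of the bounded concurrency described above can be sketched with a buffered-channel semaphore (hypothetical names; the real implementation also special-cases c=1 so that no goroutines are spawned at all):

    package main

    import (
        "fmt"
        "sync"
    )

    // compileAll backend-compiles at most c functions at a time. compileFunc
    // is a hypothetical stand-in for phases 5-8.
    func compileAll(fns []string, c int) {
        sem := make(chan struct{}, c) // at most c tokens outstanding
        var wg sync.WaitGroup
        for _, name := range fns {
            wg.Add(1)
            sem <- struct{}{} // acquire a slot
            go func(name string) {
                defer wg.Done()
                defer func() { <-sem }() // release the slot
                compileFunc(name)
            }(name)
        }
        wg.Wait()
    }

    func compileFunc(name string) { fmt.Println("compiled", name) }

    func main() { compileAll([]string{"a", "b", "c", "d"}, 2) }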
-
- 26 Apr, 2017 30 commits
-
Alex Brainman authored
When Lstat is used on a symlink that points to a directory, it returns a FileInfo with both ModeDir and ModeSymlink set. Change that to never set ModeDir if ModeSymlink is set.

Fixes #10424
Fixes #17540
Fixes #17541

Change-Id: Iba280888aad108360b8c1f18180a24493fe7ad2b
Reviewed-on: https://go-review.googlesource.com/41830
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
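With this change, callers can rely on Lstat reporting a symlink as a symlink only, and must use Stat to learn about its target. A small illustration (the name "link" is assumed to be a symlink to a directory):

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        fi, err := os.Lstat("link") // "link" is assumed to be a symlink to a directory
        if err != nil {
            fmt.Println(err)
            return
        }
        if fi.Mode()&os.ModeSymlink != 0 {
            // After this fix, ModeDir is never set on fi here.
            // Follow the link with os.Stat to inspect the target.
            if ti, err := os.Stat("link"); err == nil {
                fmt.Println("target is a directory:", ti.IsDir())
            }
        }
    }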
-
Mostyn Bramley-Moore authored
Fixes golang/go#20008.

Change-Id: I7a429490320595fc558a8c5e260ec41bc3a788e2
Reviewed-on: https://go-review.googlesource.com/41858
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
-
Damien Lespiau authored
Taken from the Intel Software Development Manual (of course, in the line below it's ADC DST, SRC; the opposite of the commit subject):

    12 /r    ADC r8, r/m8

We need 0x12 for the corresponding ytab line, not 0x10:

    {Ymb, Ynone, Yrb, Zm_r, 1},

Updates #14069

Change-Id: Id37cbd0c581c9988c2de355efa908956278e2189
Reviewed-on: https://go-review.googlesource.com/41857
Reviewed-by: Keith Randall <khr@golang.org>
-
Josh Bleecher Snyder authored
This is preparatory cleanup to make future changes clearer. Change-Id: I20fb9c78257de61b8bd096fce6b1e751995c01f2 Reviewed-on: https://go-review.googlesource.com/41818 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Russ Cox authored
Change-Id: I72bea1450386100482b4681b20eb9a9af12c7522
Reviewed-on: https://go-review.googlesource.com/41816
Reviewed-by: Michael Matloob <matloob@golang.org>
-
Russ Cox authored
Delete old TestRuntimeFunctionTrimming, which is testing a dead API and is now handled in end-to-end tests. Change-Id: I64fc2991ed4a7690456356b5f6b546f36935bb67 Reviewed-on: https://go-review.googlesource.com/41815 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Matloob <matloob@golang.org>
-
Bryan C. Mills authored
Don't bother with BenchmarkDecoderStream — it's doing something subtle with the input buffer that isn't easy to replicate in a parallel test. Results remain comparable with the non-parallel version with -cpu=1: benchmark old ns/op new ns/op delta BenchmarkCodeEncoder 22815832 21058729 -7.70% BenchmarkCodeEncoder-6 22190561 3579757 -83.87% BenchmarkCodeMarshal 25356621 25396429 +0.16% BenchmarkCodeMarshal-6 25359813 4944908 -80.50% BenchmarkCodeDecoder 94794556 88016360 -7.15% BenchmarkCodeDecoder-6 93795028 16726283 -82.17% BenchmarkDecoderStream 532 583 +9.59% BenchmarkDecoderStream-6 598 550 -8.03% BenchmarkCodeUnmarshal 97644168 89162504 -8.69% BenchmarkCodeUnmarshal-6 96615302 17036419 -82.37% BenchmarkCodeUnmarshalReuse 91747073 90298479 -1.58% BenchmarkCodeUnmarshalReuse-6 89397165 15518005 -82.64% BenchmarkUnmarshalString 808 843 +4.33% BenchmarkUnmarshalString-6 912 220 -75.88% BenchmarkUnmarshalFloat64 695 732 +5.32% BenchmarkUnmarshalFloat64-6 710 191 -73.10% BenchmarkUnmarshalInt64 635 640 +0.79% BenchmarkUnmarshalInt64-6 618 185 -70.06% BenchmarkIssue10335 916 947 +3.38% BenchmarkIssue10335-6 879 216 -75.43% BenchmarkNumberIsValid 34.7 34.3 -1.15% BenchmarkNumberIsValid-6 34.9 36.7 +5.16% BenchmarkNumberIsValidRegexp 1174 1121 -4.51% BenchmarkNumberIsValidRegexp-6 1134 1119 -1.32% BenchmarkSkipValue 20506938 20708060 +0.98% BenchmarkSkipValue-6 21627665 22375630 +3.46% BenchmarkEncoderEncode 690 726 +5.22% BenchmarkEncoderEncode-6 649 157 -75.81% benchmark old MB/s new MB/s speedup BenchmarkCodeEncoder 85.05 92.15 1.08x BenchmarkCodeEncoder-6 87.45 542.07 6.20x BenchmarkCodeMarshal 76.53 76.41 1.00x BenchmarkCodeMarshal-6 76.52 392.42 5.13x BenchmarkCodeDecoder 20.47 22.05 1.08x BenchmarkCodeDecoder-6 20.69 116.01 5.61x BenchmarkCodeUnmarshal 19.87 21.76 1.10x BenchmarkCodeUnmarshal-6 20.08 113.90 5.67x BenchmarkSkipValue 90.55 89.67 0.99x BenchmarkSkipValue-6 90.83 87.80 0.97x benchmark old allocs new allocs delta BenchmarkIssue10335 4 4 +0.00% BenchmarkIssue10335-6 4 4 +0.00% BenchmarkEncoderEncode 1 1 +0.00% BenchmarkEncoderEncode-6 1 1 +0.00% benchmark old bytes new bytes delta BenchmarkIssue10335 320 320 +0.00% BenchmarkIssue10335-6 320 320 +0.00% BenchmarkEncoderEncode 8 8 +0.00% BenchmarkEncoderEncode-6 8 8 +0.00% updates #18177 Change-Id: Ia4f5bf5ac0afbadb1705ed9f9e1b39dabba67b40 Reviewed-on: https://go-review.googlesource.com/36724Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
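The usual way to parallelize such benchmarks is testing.B.RunParallel; a minimal sketch of that shape (not the package's actual benchmark code):

    package demo

    import (
        "encoding/json"
        "testing"
    )

    // Sketch of the RunParallel pattern used for these conversions; the real
    // benchmarks in encoding/json exercise the package's own test inputs.
    func BenchmarkUnmarshalStringParallel(b *testing.B) {
        data := []byte(`"hello, world"`)
        b.RunParallel(func(pb *testing.PB) {
            var s string
            for pb.Next() {
                if err := json.Unmarshal(data, &s); err != nil {
                    b.Error(err)
                }
            }
        })
    }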
-
Bryan C. Mills authored
Add a benchmark for PtrTo: it's the motivation for #17973, which is the motivation for #18177. Results remain comparable with the non-parallel version with -cpu=1: benchmark old ns/op new ns/op delta BenchmarkCall 357 360 +0.84% BenchmarkCall-6 90.3 90.7 +0.44% BenchmarkCallArgCopy/size=128 319 323 +1.25% BenchmarkCallArgCopy/size=128-6 329 82.2 -75.02% BenchmarkCallArgCopy/size=256 354 335 -5.37% BenchmarkCallArgCopy/size=256-6 340 85.2 -74.94% BenchmarkCallArgCopy/size=1024 374 703 +87.97% BenchmarkCallArgCopy/size=1024-6 378 95.8 -74.66% BenchmarkCallArgCopy/size=4096 627 631 +0.64% BenchmarkCallArgCopy/size=4096-6 643 120 -81.34% BenchmarkCallArgCopy/size=65536 10502 10169 -3.17% BenchmarkCallArgCopy/size=65536-6 10298 2240 -78.25% BenchmarkFieldByName1 139 132 -5.04% BenchmarkFieldByName1-6 144 24.9 -82.71% BenchmarkFieldByName2 2721 2778 +2.09% BenchmarkFieldByName2-6 3953 578 -85.38% BenchmarkFieldByName3 19136 18357 -4.07% BenchmarkFieldByName3-6 23072 3850 -83.31% BenchmarkInterfaceBig 12.7 15.5 +22.05% BenchmarkInterfaceBig-6 14.2 2.48 -82.54% BenchmarkInterfaceSmall 13.1 15.1 +15.27% BenchmarkInterfaceSmall-6 13.0 2.54 -80.46% BenchmarkNew 43.8 43.0 -1.83% BenchmarkNew-6 40.5 6.67 -83.53% benchmark old MB/s new MB/s speedup BenchmarkCallArgCopy/size=128 400.24 395.15 0.99x BenchmarkCallArgCopy/size=128-6 388.74 1557.76 4.01x BenchmarkCallArgCopy/size=256 722.44 762.44 1.06x BenchmarkCallArgCopy/size=256-6 751.98 3003.83 3.99x BenchmarkCallArgCopy/size=1024 2733.22 1455.50 0.53x BenchmarkCallArgCopy/size=1024-6 2706.40 10687.53 3.95x BenchmarkCallArgCopy/size=4096 6523.32 6488.25 0.99x BenchmarkCallArgCopy/size=4096-6 6363.85 34003.09 5.34x BenchmarkCallArgCopy/size=65536 6239.88 6444.46 1.03x BenchmarkCallArgCopy/size=65536-6 6363.83 29255.26 4.60x benchmark old allocs new allocs delta BenchmarkCall 0 0 +0.00% BenchmarkCall-6 0 0 +0.00% benchmark old bytes new bytes delta BenchmarkCall 0 0 +0.00% BenchmarkCall-6 0 0 +0.00% updates #17973 updates #18177 Change-Id: If70c5c742e8d1b138347f4963ad7cff38fffc018 Reviewed-on: https://go-review.googlesource.com/36831 Run-TryBot: Bryan Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Bryan C. Mills authored
Results remain comparable with the non-parallel version with -cpu=1: benchmark old ns/op new ns/op delta BenchmarkEndToEndPipe 6200 6171 -0.47% BenchmarkEndToEndPipe-6 1073 1024 -4.57% BenchmarkEndToEndByteBuffer 2925 2664 -8.92% BenchmarkEndToEndByteBuffer-6 516 560 +8.53% BenchmarkEndToEndSliceByteBuffer 231683 237450 +2.49% BenchmarkEndToEndSliceByteBuffer-6 59080 59452 +0.63% BenchmarkEncodeComplex128Slice 67541 66003 -2.28% BenchmarkEncodeComplex128Slice-6 72740 11316 -84.44% BenchmarkEncodeFloat64Slice 25769 27899 +8.27% BenchmarkEncodeFloat64Slice-6 26655 4557 -82.90% BenchmarkEncodeInt32Slice 18685 18845 +0.86% BenchmarkEncodeInt32Slice-6 18389 3462 -81.17% BenchmarkEncodeStringSlice 19089 19354 +1.39% BenchmarkEncodeStringSlice-6 20155 3237 -83.94% BenchmarkEncodeInterfaceSlice 659601 677129 +2.66% BenchmarkEncodeInterfaceSlice-6 640974 251621 -60.74% BenchmarkDecodeComplex128Slice 117130 129955 +10.95% BenchmarkDecodeComplex128Slice-6 155447 24924 -83.97% BenchmarkDecodeFloat64Slice 67695 68776 +1.60% BenchmarkDecodeFloat64Slice-6 82966 15225 -81.65% BenchmarkDecodeInt32Slice 63102 62733 -0.58% BenchmarkDecodeInt32Slice-6 77857 13003 -83.30% BenchmarkDecodeStringSlice 130240 129562 -0.52% BenchmarkDecodeStringSlice-6 165500 31507 -80.96% BenchmarkDecodeInterfaceSlice 937637 1060835 +13.14% BenchmarkDecodeInterfaceSlice-6 973495 270613 -72.20% updates #18177 Change-Id: Ib3579010faa70827d5cbd02a826dbbb66ca13eb7 Reviewed-on: https://go-review.googlesource.com/36722 Run-TryBot: Bryan Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Bryan C. Mills authored
Results remain comparable with the non-parallel version with -cpu=1: benchmark old ns/op new ns/op delta BenchmarkMarshal 31220 28618 -8.33% BenchmarkMarshal-6 37181 7658 -79.40% BenchmarkUnmarshal 81837 83522 +2.06% BenchmarkUnmarshal-6 96339 18244 -81.06% benchmark old allocs new allocs delta BenchmarkMarshal 23 23 +0.00% BenchmarkMarshal-6 23 23 +0.00% BenchmarkUnmarshal 189 189 +0.00% BenchmarkUnmarshal-6 189 189 +0.00% benchmark old bytes new bytes delta BenchmarkMarshal 5776 5776 +0.00% BenchmarkMarshal-6 5776 5776 +0.00% BenchmarkUnmarshal 8576 8576 +0.00% BenchmarkUnmarshal-6 8576 8576 +0.00% updates #18177 Change-Id: I7e7055a11d18896bd54d7d773f2ec64767cdb4c8 Reviewed-on: https://go-review.googlesource.com/36810 Run-TryBot: Bryan Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Bryan C. Mills authored
This is a direct port of the version from commit a60ad46e0ed33d02e09bda439efaf9c9727dbc6c (https://go-review.googlesource.com/c/37342/).

updates #17973
updates #18177

Change-Id: I63fa5ef6951b1edd39f84927d1181a4df9b15385
Reviewed-on: https://go-review.googlesource.com/36617
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Josh Bleecher Snyder authored
Follow-up to review comments on CL 41797. Mask the input to set2 and set3, so that at the very least, we won't corrupt the rest of the flags in case of a bad input. It also seems more semantically appropriate. Do minor cleanup in addrescapes. I started on larger cleanup, but it wasn't clear that it was an improvement. Add warning comments and sanity checks to Initorder and Class constants, to attempt to prevent them from overflowing their allotted flag bits. Passes toolstash-check. Change-Id: I57b9661ba36f56406aa7a1d8da9b7c70338f9119 Reviewed-on: https://go-review.googlesource.com/41817 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Lynn Boger authored
When the stack register is decremented to acquire stack space at the beginning of a function, a MOVDU should be used so it is done atomically, unless the size of the stack frame is too large for that instruction. The code to determine whether to use MOVDU or MOVD was checking if the function was a leaf and always generating MOVD when it was. The choice of MOVD vs. MOVDU should only depend on the stack frame size. This fixes that problem.

Change-Id: I0e49c79036f1e8f7584179e1442b938fc6da085f
Reviewed-on: https://go-review.googlesource.com/41813
Reviewed-by: Michael Munday <munday@ca.ibm.com>
-
Josh Bleecher Snyder authored
Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
-
Justin Nuß authored
In many cases the records returned by Reader.Read will only be used between calls to Read and become garbage once a new record is read. In this case, instead of allocating a new slice on each call to Read, we can reuse the last allocated slice for successive calls to avoid unnecessary allocations.

This change adds a new field ReuseRecord to the Reader struct to enable this reuse. ReuseRecord is false by default to avoid breaking existing code which depends on the current behaviour.

I also added 4 new benchmarks, corresponding to the existing Read benchmarks, which set ReuseRecord to true.

Benchstat on my local machine (old is ReuseRecord = false, new is ReuseRecord = true):

name  old time/op  new time/op  delta
Read-8  2.75µs ± 2%  1.88µs ± 1%  -31.52%  (p=0.000 n=14+15)
ReadWithFieldsPerRecord-8  2.75µs ± 0%  1.89µs ± 1%  -31.43%  (p=0.000 n=13+13)
ReadWithoutFieldsPerRecord-8  2.77µs ± 1%  1.88µs ± 1%  -32.06%  (p=0.000 n=15+15)
ReadLargeFields-8  55.4µs ± 1%  54.2µs ± 0%  -2.07%  (p=0.000 n=15+14)

name  old alloc/op  new alloc/op  delta
Read-8  664B ± 0%  24B ± 0%  -96.39%  (p=0.000 n=15+15)
ReadWithFieldsPerRecord-8  664B ± 0%  24B ± 0%  -96.39%  (p=0.000 n=15+15)
ReadWithoutFieldsPerRecord-8  664B ± 0%  24B ± 0%  -96.39%  (p=0.000 n=15+15)
ReadLargeFields-8  3.94kB ± 0%  2.98kB ± 0%  -24.39%  (p=0.000 n=15+15)

name  old allocs/op  new allocs/op  delta
Read-8  18.0 ± 0%  8.0 ± 0%  -55.56%  (p=0.000 n=15+15)
ReadWithFieldsPerRecord-8  18.0 ± 0%  8.0 ± 0%  -55.56%  (p=0.000 n=15+15)
ReadWithoutFieldsPerRecord-8  18.0 ± 0%  8.0 ± 0%  -55.56%  (p=0.000 n=15+15)
ReadLargeFields-8  24.0 ± 0%  12.0 ± 0%  -50.00%  (p=0.000 n=15+15)

Fixes #19721

Change-Id: I79b14128bb9bb3465f53f40f93b1b528a9da6f58
Reviewed-on: https://go-review.googlesource.com/41730
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
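Typical use of the new field; the returned record is only valid until the following Read, so callers must copy anything they want to keep:

    package main

    import (
        "encoding/csv"
        "fmt"
        "io"
        "strings"
    )

    func main() {
        r := csv.NewReader(strings.NewReader("a,b,c\n1,2,3\n"))
        r.ReuseRecord = true // Read may hand back the same backing slice each call
        for {
            record, err := r.Read()
            if err == io.EOF {
                break
            }
            if err != nil {
                fmt.Println(err)
                return
            }
            // record is only valid until the next Read; copy it to retain it.
            fmt.Println(record)
        }
    }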
-
Brandon Bennett authored
Some large testing/build systems require some form of test discovery before running tests. This usually allows for analytics, history, and stats on a per-test basis. Typically these systems are meant to be used in multi-language environments and the original source code is not known or available.

This adds a -test.list option which takes a regular expression as an argument. Any tests, benchmarks, or examples that match that regular expression will be printed, one per line, to stdout and then the program will exit.

Since subtests are named/discovered at run time, this will only show top-level test names; this is a known limitation.

Fixes #17209

Change-Id: I7e607f5f4f084d623a1cae88a1f70e7d92b7f13e
Reviewed-on: https://go-review.googlesource.com/41195
Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
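For example, given a (hypothetical) test file like the one below, running go test -list Foo prints the matching top-level test, benchmark, and example names, one per line, and exits without running them:

    package listdemo

    import "testing"

    func TestFooUnit(t *testing.T)       {}
    func BenchmarkFooTight(b *testing.B) {}

    func ExampleFoo() {
        // Output:
    }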
-
Russ Cox authored
The Context definition to date has not defined what Err returns before the Done channel is closed. Define that it returns nil, as most implementations do.

All the standard context implementations (those in package context and in golang.org/x/net/context) return Err() == nil when Done is not yet closed. However, some non-standard implementations may exist that return Err() != nil in this case, as permitted by the Context definition before this date. Call these "errorful implementations".

Because all the standard context implementations ensure that Err() == nil when Done is not yet closed, clients now exist that assume Err() != nil implies Done is closed and use calling Err as a quick short-circuit check instead of first doing a non-blocking receive from Done and then, if that succeeds, needing to call Err. This assumption holds for all the standard Context implementations, so these clients work fine in practice, even though they are making unwarranted assumptions about the Context implementations. Call these "technically incorrect clients".

If a technically incorrect client encounters an errorful implementation, the client misbehaves. Because there are few errorful implementations, over time we expect that many clients will end up being technically incorrect without realizing it, leading to latent, subtle bugs. If we want to eliminate these latent, subtle bugs, there are two ways to do this: either make errorful implementations more common (exposing the client bugs more often) or redefine the Context interface so that the clients are not buggy after all.

If we make errorful implementations more common, such as by changing the standard context implementations to return ErrNotDone instead of nil when Err is called before Done is closed, this will shake out essentially all of the technically incorrect clients, forcing people to find and fix those clients during the transition to Go 1.9. Technically this is allowed by the compatibility policy, but we expect there are many pieces of code assuming that Err() != nil means done, so updating will cause real pain.

If instead we disallow errorful implementations, then they will need to be fixed as they are discovered, but the fault will officially lie in the errorful Context implementation, not in the clients. Technically this is disallowed by the compatibility policy, because these errorful implementations were "correct" in earlier versions of Go, except that they didn't work with common client code. We expect there are hardly any errorful implementations, so that disallowing them will be less disruptive and more in the spirit of the compatibility policy.

This CL takes the path of expected least disruption, narrowing the Context interface semantics and potentially invalidating existing implementations. A survey of the go-corpus v0.01 turned up only five Context implementations, all trivial and none errorful (details in #19856). We are aware of one early Context implementation inside Google, from before even golang.org/x/net/context existed, that is errorful. The misbehavior of an open-source library when passed such a context is what prompted #19856. That context implementation would be disallowed after this CL and would need to be corrected. We are aware of no other affected context implementations. On the other hand, a survey of the go-corpus v0.01 turned up many instances of client code assuming that Err() == nil implies not done yet (details also in #19856).

On balance, narrowing Context and thereby allowing Err() == nil checks should invalidate significantly less code than a push to flush out all the currently technically incorrect Err() == nil checks. If release feedback shows that we're wrong about this balance, we can roll back this CL and try again in Go 1.10.

Fixes #19856.

Change-Id: Id45d126fac70e1fcc42d73e5a87ca1b66935b831
Reviewed-on: https://go-review.googlesource.com/40291
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Sameer Ajmani <sameer@golang.org>
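A sketch of the short-circuit pattern that this clarified contract blesses: checking Err directly instead of first doing a non-blocking receive from Done:

    package main

    import (
        "context"
        "fmt"
        "time"
    )

    // doWork relies on the narrowed contract: Err returns nil until Done is
    // closed, and non-nil afterwards, so a bare Err check is equivalent to a
    // non-blocking receive from Done followed by Err.
    func doWork(ctx context.Context) error {
        for i := 0; i < 5; i++ {
            if err := ctx.Err(); err != nil {
                return err // canceled or deadline exceeded
            }
            time.Sleep(10 * time.Millisecond) // stand-in for one unit of work
        }
        return nil
    }

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), 25*time.Millisecond)
        defer cancel()
        fmt.Println(doWork(ctx)) // context deadline exceeded
    }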
-
David du Colombier authored
Since close errors were cleaned up in CL 39997, TestCloseError has been failing on Plan 9, because TCPListener.Close didn't check whether the listener had already been closed before writing the "hangup" string to the listener control file. This change fixes TCPListener.Close on Plan 9 by closing poll.FD before writing the "hangup" string.

Fixes #20128.

Change-Id: I13862b23a9055dd1be658acef7066707d98c591f
Reviewed-on: https://go-review.googlesource.com/41850
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Fangming.Fang authored
Change-Id: I251ae497b0ab237d4b3fe98e397052394142d437
Reviewed-on: https://go-review.googlesource.com/41653
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Mike Strosaker authored
This updates sha256.block and sha512.block to use vector instructions. While each round must still be performed independently, this allows for the use of the vshasigma{w,d} crypto acceleration instructions. For crypto/sha256: benchmark old ns/op new ns/op delta BenchmarkHash8Bytes 570 300 -47.37% BenchmarkHash1K 7529 3018 -59.91% BenchmarkHash8K 55308 21938 -60.33% benchmark old MB/s new MB/s speedup BenchmarkHash8Bytes 14.01 26.58 1.90x BenchmarkHash1K 136.00 339.23 2.49x BenchmarkHash8K 148.11 373.40 2.52x For crypto/sha512: benchmark old ns/op new ns/op delta BenchmarkHash8Bytes 725 394 -45.66% BenchmarkHash1K 5062 2107 -58.38% BenchmarkHash8K 34711 13918 -59.90% benchmark old MB/s new MB/s speedup BenchmarkHash8Bytes 11.03 20.29 1.84x BenchmarkHash1K 202.28 485.84 2.40x BenchmarkHash8K 236.00 588.56 2.49x Fixes #20069 Change-Id: I28bffe6e9eb484a83a004116fce84acb4942abca Reviewed-on: https://go-review.googlesource.com/41391 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
-
Aliaksandr Valialkin authored
This may improve performance during concurrent access to the mheap.central array from multiple CPU cores.

Change-Id: I8f48dd2e72aa62e9c32de07ae60fe552d8642782
Reviewed-on: https://go-review.googlesource.com/41550
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
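The general technique (an illustrative sketch, not the runtime's actual declaration) is to pad each element of a per-CPU array out to a cache line so that updates from different cores do not contend on the same line:

    package main

    import (
        "fmt"
        "unsafe"
    )

    // counter is padded to a typical 64-byte cache line so that adjacent
    // array elements written by different CPUs do not share a line.
    type counter struct {
        n   uint64
        pad [64 - unsafe.Sizeof(uint64(0))]byte
    }

    func main() {
        var perCPU [8]counter
        fmt.Println(unsafe.Sizeof(perCPU[0]), unsafe.Sizeof(perCPU)) // 64 512
    }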
-
Emmanuel Odeke authored
Moved the relevant file.Close() calls close to the corresponding file opens and put them in defer statements, so that readers don't have to think too much about where the file is being closed.

Change-Id: Ic4190b02ea2f5ac281b9ba104e0023e9f87ca8c7
Reviewed-on: https://go-review.googlesource.com/41796
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
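The pattern applied by this cleanup, sketched on a generic helper:

    package main

    import (
        "fmt"
        "os"
    )

    // printSize opens a file and defers the Close immediately after the
    // successful open, so the reader sees the whole lifetime at a glance.
    func printSize(name string) error {
        f, err := os.Open(name)
        if err != nil {
            return err
        }
        defer f.Close()

        fi, err := f.Stat()
        if err != nil {
            return err
        }
        fmt.Println(name, fi.Size(), "bytes")
        return nil
    }

    func main() {
        if err := printSize("example.txt"); err != nil {
            fmt.Println(err)
        }
    }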
-
Ian Lance Taylor authored
Catch all the cases where a file operation might return ErrFileClosing, and convert to ErrClosed. Use a new method for the conversion, which permits us to remove some KeepAlive calls. Change-Id: I584178f297efe6cb86f3090b2341091b412f1041 Reviewed-on: https://go-review.googlesource.com/41793 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
Change-Id: Id5aa4a1499068bf2d3497b21d794f970b7e47fdf Reviewed-on: https://go-review.googlesource.com/41795 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
Grand savings: 6 bits. Change-Id: I364be54cc41534689e01672ed0fe2c10a560d3d4 Reviewed-on: https://go-review.googlesource.com/41794 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Josh Bleecher Snyder authored
Change-Id: I30c59ba84dcacc3de39c42f94484b47bb7c36eba Reviewed-on: https://go-review.googlesource.com/41792 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Todd Neal authored
Resolve import paths to get plugin symbol prefixes. Fixes #19534 Change-Id: Ic25d83e72465ba8f6be0337218a1627b5dc702dc Reviewed-on: https://go-review.googlesource.com/40994 Run-TryBot: Todd Neal <todd@tneal.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org>
-
Michael Fraenkel authored
The LocalAddrContext should have the network address of the actual interface. Fixes #18686 Change-Id: I9c401eda312f3a0e7e65b013af827aeeef3b4d3d Reviewed-on: https://go-review.googlesource.com/35490 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
Node.Walkdef is 0, 1, or 2, so it only requires two bits. Add support for 2-bit values to bitset, and use it for Node.Walkdef. Class, Embedded, Typecheck, and Initorder will follow suit in subsequent CLs. The multi-bit flags will go at the beginning, since that generates (marginally) more efficient code. Change-Id: Id6e2e66e437f10aaa05b8a6e1652efb327d06128 Reviewed-on: https://go-review.googlesource.com/41791 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
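A sketch of the 2-bit flag encoding (hypothetical helpers, not the compiler's actual bitset implementation): the value 0, 1, or 2 lives in two dedicated bits of a flag byte, leaving the other bits free for single-bit flags:

    package main

    import "fmt"

    const (
        walkdefShift = 0 // 2-bit field at the bottom of the byte
        walkdefMask  = 3 << walkdefShift
    )

    // get2 and set2 are illustrative helpers for a 2-bit value packed into a
    // flag byte; the input is masked so a bad value cannot touch other flags.
    func get2(flags uint8) uint8 { return (flags & walkdefMask) >> walkdefShift }

    func set2(flags *uint8, v uint8) {
        *flags = *flags&^walkdefMask | (v&3)<<walkdefShift
    }

    func main() {
        var flags uint8 = 0xf0 // unrelated flags already set in the high bits
        set2(&flags, 2)
        fmt.Printf("%#x %d\n", flags, get2(flags)) // 0xf2 2
    }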
-
Josh Bleecher Snyder authored
It is no longer used. Change-Id: Id64f387867a0503d13eaecda12e6606682c24595 Reviewed-on: https://go-review.googlesource.com/41790 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-