- 01 Mar, 2017 1 commit
-
-
Robert Griesemer authored
No measurable impact on performance (specifically, no degradation). Reverse is used in Huffman en/de-coding. For completeness, here are all the speed-related benchmark results: name old time/op new time/op delta Decode/Digits/Huffman/1e4-8 181µs ± 0% 178µs ± 1% ~ (p=0.100 n=3+3) Decode/Digits/Huffman/1e5-8 1.60ms ± 3% 1.56ms ± 3% ~ (p=0.400 n=3+3) Decode/Digits/Huffman/1e6-8 15.7ms ± 1% 15.3ms ± 3% ~ (p=0.700 n=3+3) Decode/Digits/Speed/1e4-8 179µs ± 0% 180µs ± 0% ~ (p=0.200 n=3+3) Decode/Digits/Speed/1e5-8 1.68ms ± 0% 1.66ms ± 3% ~ (p=0.700 n=3+3) Decode/Digits/Speed/1e6-8 16.6ms ± 2% 16.6ms ± 5% ~ (p=0.700 n=3+3) Decode/Digits/Default/1e4-8 179µs ± 1% 178µs ± 1% ~ (p=0.700 n=3+3) Decode/Digits/Default/1e5-8 1.62ms ± 3% 1.62ms ± 4% ~ (p=1.000 n=3+3) Decode/Digits/Default/1e6-8 16.0ms ± 2% 16.0ms ± 3% ~ (p=1.000 n=3+3) Decode/Digits/Compression/1e4-8 179µs ± 1% 179µs ± 0% ~ (p=0.200 n=3+3) Decode/Digits/Compression/1e5-8 1.62ms ± 2% 1.62ms ± 3% ~ (p=1.000 n=3+3) Decode/Digits/Compression/1e6-8 16.1ms ± 3% 16.0ms ± 3% ~ (p=1.000 n=3+3) Decode/Twain/Huffman/1e4-8 205µs ± 2% 207µs ± 1% ~ (p=1.000 n=3+3) Decode/Twain/Huffman/1e5-8 1.77ms ± 2% 1.77ms ± 4% ~ (p=0.700 n=3+3) Decode/Twain/Huffman/1e6-8 17.4ms ± 2% 17.4ms ± 3% ~ (p=1.000 n=3+3) Decode/Twain/Speed/1e4-8 186µs ± 1% 186µs ± 1% ~ (p=0.400 n=3+3) Decode/Twain/Speed/1e5-8 1.53ms ± 2% 1.52ms ± 0% ~ (p=0.700 n=3+3) Decode/Twain/Speed/1e6-8 14.9ms ± 1% 14.8ms ± 1% ~ (p=1.000 n=3+3) Decode/Twain/Default/1e4-8 176µs ± 1% 174µs ± 0% ~ (p=0.200 n=3+3) Decode/Twain/Default/1e5-8 1.30ms ± 2% 1.31ms ± 1% ~ (p=0.700 n=3+3) Decode/Twain/Default/1e6-8 12.6ms ± 3% 12.5ms ± 0% ~ (p=0.700 n=3+3) Decode/Twain/Compression/1e4-8 177µs ± 0% 174µs ± 1% ~ (p=0.100 n=3+3) Decode/Twain/Compression/1e5-8 1.30ms ± 1% 1.31ms ± 0% ~ (p=0.700 n=3+3) Decode/Twain/Compression/1e6-8 12.5ms ± 1% 12.5ms ± 1% ~ (p=1.000 n=3+3) Encode/Digits/Huffman/1e4-8 47.4µs ± 1% 46.5µs ± 0% ~ (p=0.100 n=3+3) Encode/Digits/Huffman/1e5-8 453µs ± 2% 446µs ± 1% ~ (p=0.700 n=3+3) Encode/Digits/Huffman/1e6-8 4.44ms ± 3% 4.39ms ± 0% ~ (p=1.000 n=3+3) Encode/Digits/Speed/1e4-8 190µs ± 4% 185µs ± 0% ~ (p=0.100 n=3+3) Encode/Digits/Speed/1e5-8 1.78ms ± 5% 1.75ms ± 1% ~ (p=1.000 n=3+3) Encode/Digits/Speed/1e6-8 17.9ms ± 7% 17.3ms ± 1% ~ (p=0.400 n=3+3) Encode/Digits/Default/1e4-8 366µs ± 1% 361µs ± 0% ~ (p=0.200 n=3+3) Encode/Digits/Default/1e5-8 5.58ms ± 5% 5.44ms ± 1% ~ (p=0.400 n=3+3) Encode/Digits/Default/1e6-8 59.0ms ± 3% 58.2ms ± 1% ~ (p=0.700 n=3+3) Encode/Digits/Compression/1e4-8 369µs ± 3% 362µs ± 0% ~ (p=0.100 n=3+3) Encode/Digits/Compression/1e5-8 5.50ms ± 2% 5.47ms ± 1% ~ (p=1.000 n=3+3) Encode/Digits/Compression/1e6-8 59.4ms ± 2% 58.5ms ± 1% ~ (p=0.400 n=3+3) Encode/Twain/Huffman/1e4-8 64.4µs ± 3% 64.7µs ± 1% ~ (p=0.700 n=3+3) Encode/Twain/Huffman/1e5-8 526µs ± 1% 526µs ± 2% ~ (p=1.000 n=3+3) Encode/Twain/Huffman/1e6-8 5.18ms ± 2% 5.17ms ± 1% ~ (p=0.700 n=3+3) Encode/Twain/Speed/1e4-8 206µs ± 1% 204µs ± 0% ~ (p=0.100 n=3+3) Encode/Twain/Speed/1e5-8 1.73ms ± 2% 1.70ms ± 0% ~ (p=0.100 n=3+3) Encode/Twain/Speed/1e6-8 16.7ms ± 0% 16.7ms ± 1% ~ (p=1.000 n=3+3) Encode/Twain/Default/1e4-8 423µs ± 3% 418µs ± 1% ~ (p=1.000 n=3+3) Encode/Twain/Default/1e5-8 6.34ms ± 4% 6.23ms ± 0% ~ (p=1.000 n=3+3) Encode/Twain/Default/1e6-8 68.0ms ± 3% 67.5ms ± 0% ~ (p=0.700 n=3+3) Encode/Twain/Compression/1e4-8 435µs ± 3% 424µs ± 0% ~ (p=0.700 n=3+3) Encode/Twain/Compression/1e5-8 7.01ms ± 1% 6.92ms ± 0% ~ (p=0.100 n=3+3) Encode/Twain/Compression/1e6-8 77.1ms ± 4% 75.5ms ± 1% ~ (p=0.400 n=3+3) name old speed new speed delta Decode/Digits/Huffman/1e4-8 55.2MB/s ± 0% 56.2MB/s ± 1% ~ (p=0.100 n=3+3) Decode/Digits/Huffman/1e5-8 62.4MB/s ± 3% 64.1MB/s ± 3% ~ (p=0.400 n=3+3) Decode/Digits/Huffman/1e6-8 63.8MB/s ± 1% 65.3MB/s ± 3% ~ (p=0.700 n=3+3) Decode/Digits/Speed/1e4-8 55.8MB/s ± 0% 55.4MB/s ± 0% ~ (p=0.200 n=3+3) Decode/Digits/Speed/1e5-8 59.6MB/s ± 0% 60.3MB/s ± 3% ~ (p=0.700 n=3+3) Decode/Digits/Speed/1e6-8 60.1MB/s ± 2% 60.3MB/s ± 4% ~ (p=0.700 n=3+3) Decode/Digits/Default/1e4-8 55.8MB/s ± 1% 56.1MB/s ± 1% ~ (p=0.700 n=3+3) Decode/Digits/Default/1e5-8 61.8MB/s ± 3% 61.7MB/s ± 4% ~ (p=1.000 n=3+3) Decode/Digits/Default/1e6-8 62.4MB/s ± 2% 62.4MB/s ± 3% ~ (p=1.000 n=3+3) Decode/Digits/Compression/1e4-8 55.7MB/s ± 1% 56.0MB/s ± 0% ~ (p=0.300 n=3+3) Decode/Digits/Compression/1e5-8 61.7MB/s ± 2% 61.9MB/s ± 3% ~ (p=1.000 n=3+3) Decode/Digits/Compression/1e6-8 62.2MB/s ± 3% 62.6MB/s ± 3% ~ (p=1.000 n=3+3) Decode/Twain/Huffman/1e4-8 48.8MB/s ± 2% 48.4MB/s ± 1% ~ (p=1.000 n=3+3) Decode/Twain/Huffman/1e5-8 56.4MB/s ± 2% 56.6MB/s ± 4% ~ (p=0.700 n=3+3) Decode/Twain/Huffman/1e6-8 57.6MB/s ± 2% 57.5MB/s ± 3% ~ (p=1.000 n=3+3) Decode/Twain/Speed/1e4-8 53.7MB/s ± 1% 53.9MB/s ± 1% ~ (p=0.400 n=3+3) Decode/Twain/Speed/1e5-8 65.5MB/s ± 2% 65.6MB/s ± 0% ~ (p=0.700 n=3+3) Decode/Twain/Speed/1e6-8 66.9MB/s ± 1% 67.4MB/s ± 1% ~ (p=1.000 n=3+3) Decode/Twain/Default/1e4-8 56.9MB/s ± 1% 57.3MB/s ± 0% ~ (p=0.200 n=3+3) Decode/Twain/Default/1e5-8 77.2MB/s ± 2% 76.6MB/s ± 1% ~ (p=0.700 n=3+3) Decode/Twain/Default/1e6-8 79.3MB/s ± 3% 80.0MB/s ± 0% ~ (p=0.700 n=3+3) Decode/Twain/Compression/1e4-8 56.4MB/s ± 0% 57.5MB/s ± 1% ~ (p=0.100 n=3+3) Decode/Twain/Compression/1e5-8 76.8MB/s ± 1% 76.5MB/s ± 0% ~ (p=0.700 n=3+3) Decode/Twain/Compression/1e6-8 80.1MB/s ± 1% 79.8MB/s ± 1% ~ (p=1.000 n=3+3) Encode/Digits/Huffman/1e4-8 211MB/s ± 1% 215MB/s ± 0% ~ (p=0.100 n=3+3) Encode/Digits/Huffman/1e5-8 221MB/s ± 2% 224MB/s ± 1% ~ (p=0.700 n=3+3) Encode/Digits/Huffman/1e6-8 225MB/s ± 3% 228MB/s ± 0% ~ (p=1.000 n=3+3) Encode/Digits/Speed/1e4-8 52.8MB/s ± 4% 54.1MB/s ± 0% ~ (p=0.100 n=3+3) Encode/Digits/Speed/1e5-8 56.2MB/s ± 5% 57.0MB/s ± 1% ~ (p=1.000 n=3+3) Encode/Digits/Speed/1e6-8 56.0MB/s ± 6% 57.7MB/s ± 1% ~ (p=0.400 n=3+3) Encode/Digits/Default/1e4-8 27.3MB/s ± 1% 27.7MB/s ± 0% ~ (p=0.200 n=3+3) Encode/Digits/Default/1e5-8 17.9MB/s ± 4% 18.4MB/s ± 1% ~ (p=0.400 n=3+3) Encode/Digits/Default/1e6-8 17.0MB/s ± 3% 17.2MB/s ± 1% ~ (p=0.500 n=3+3) Encode/Digits/Compression/1e4-8 27.1MB/s ± 3% 27.6MB/s ± 0% ~ (p=0.100 n=3+3) Encode/Digits/Compression/1e5-8 18.2MB/s ± 2% 18.3MB/s ± 1% ~ (p=1.000 n=3+3) Encode/Digits/Compression/1e6-8 16.9MB/s ± 2% 17.1MB/s ± 1% ~ (p=0.400 n=3+3) Encode/Twain/Huffman/1e4-8 155MB/s ± 3% 155MB/s ± 1% ~ (p=0.700 n=3+3) Encode/Twain/Huffman/1e5-8 190MB/s ± 1% 190MB/s ± 2% ~ (p=1.000 n=3+3) Encode/Twain/Huffman/1e6-8 193MB/s ± 2% 193MB/s ± 1% ~ (p=0.700 n=3+3) Encode/Twain/Speed/1e4-8 48.5MB/s ± 1% 49.1MB/s ± 0% ~ (p=0.100 n=3+3) Encode/Twain/Speed/1e5-8 57.7MB/s ± 2% 59.0MB/s ± 0% ~ (p=0.100 n=3+3) Encode/Twain/Speed/1e6-8 59.7MB/s ± 0% 59.7MB/s ± 1% ~ (p=1.000 n=3+3) Encode/Twain/Default/1e4-8 23.6MB/s ± 3% 23.9MB/s ± 1% ~ (p=1.000 n=3+3) Encode/Twain/Default/1e5-8 15.8MB/s ± 4% 16.1MB/s ± 0% ~ (p=1.000 n=3+3) Encode/Twain/Default/1e6-8 14.7MB/s ± 3% 14.8MB/s ± 0% ~ (p=0.700 n=3+3) Encode/Twain/Compression/1e4-8 23.0MB/s ± 3% 23.6MB/s ± 0% ~ (p=0.700 n=3+3) Encode/Twain/Compression/1e5-8 14.3MB/s ± 1% 14.5MB/s ± 0% ~ (p=0.100 n=3+3) Encode/Twain/Compression/1e6-8 13.0MB/s ± 4% 13.2MB/s ± 1% ~ (p=0.400 n=3+3) Measured on a "quiet" (no browser running) 2.3 GHz Intel Core i7, running macOS 10.12.3. See also #19279. Change-Id: Ice759eb34eb37442b543957447c264e0aadc1fa9 Reviewed-on: https://go-review.googlesource.com/37460 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
- 28 Feb, 2017 26 commits
-
-
Robert Griesemer authored
Removes an extra function call for TrailingZeroes and thus may increase chances for inlining. Change-Id: Iefd8d4402dc89b64baf4e5c865eb3dadade623af Reviewed-on: https://go-review.googlesource.com/37613 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
Change-Id: Iab1e84624c0288ebdd33fbe83bd60948b5d91fc4 Reviewed-on: https://go-review.googlesource.com/37612Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Brad Fitzpatrick authored
Nobody intends to have duplicates anyway because it's so undefined and everything handles it so poorly. Removing duplicates automatically simplifies code and makes existing code do what people already expect. Fixes #12868 Change-Id: I95eeba8c59ff94d0f018012a6f4e031aaabfd5d9 Reviewed-on: https://go-review.googlesource.com/37586 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
Martin Möhrmann authored
The new Map implementation introduced in golang.org/cl/33201 did not differentiate if an invalid UTF-8 sequence was decoded or the RuneError rune. It would therefore always advance by 3 bytes (which is the length of the RuneError rune) instead of 1 for an invalid sequences. This cl adds a check to correctly determine the length of bytes needed to advance to the next rune. Fixes #19330. Change-Id: I1e7f9333f3ef6068ffc64015bb0a9f32b0b7111d Reviewed-on: https://go-review.googlesource.com/37597 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Keith Randall authored
There's no need to use @block rules, as canMergeLoad makes sure that the load and op are already in the same block. With no @block needed, we also don't need to set the type explicitly. It can just be inherited from the op being rewritten. Noticed while working on #19284. Change-Id: Ied8bcc8058260118ff7e166093112e29107bcb7e Reviewed-on: https://go-review.googlesource.com/37585 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Robert Griesemer authored
SizesFor returns a Sizes implementation for a supported architecture. Use functionality in srcimporter. Change-Id: I197e641b419c678030dfaab5c5b8c569fd0410f3 Reviewed-on: https://go-review.googlesource.com/37583 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>
-
Robert Griesemer authored
benchmark old ns/op new ns/op delta BenchmarkLeadingZeros-8 8.43 3.10 -63.23% BenchmarkLeadingZeros8-8 8.13 1.33 -83.64% BenchmarkLeadingZeros16-8 7.34 2.07 -71.80% BenchmarkLeadingZeros32-8 7.99 2.87 -64.08% BenchmarkLeadingZeros64-8 8.13 2.96 -63.59% Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3. Change-Id: Id343531b408d42ac45f10c76f60e85bdb977f91e Reviewed-on: https://go-review.googlesource.com/37582Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Robert Griesemer authored
For sizes > 8, the existing code is faster. benchmark old ns/op new ns/op delta BenchmarkTrailingZeros8-8 1.95 1.29 -33.85% Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3. Change-Id: I6f3a33ec633a2c544ec29693c141f2f99335c745 Reviewed-on: https://go-review.googlesource.com/37581Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Robert Griesemer authored
For uint64, the existing algorithm is faster. benchmark old ns/op new ns/op delta BenchmarkOnesCount8-8 1.95 0.97 -50.26% BenchmarkOnesCount16-8 2.54 1.39 -45.28% BenchmarkOnesCount32-8 2.61 1.96 -24.90% Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3. Change-Id: I6cc42882fef3d24694720464039161e339a9ae99 Reviewed-on: https://go-review.googlesource.com/37580Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
During map growth, buckets are evacuated in two ways. When a value is altered, its containing bucket is evacuated. Also, an evacuation mark is maintained and advanced every time. Prior to this CL, the evacuation mark was always incremented, even if the next bucket to be evacuated had already been evacuated. This CL changes evacuation mark advancement to skip previously evacuated buckets. This has the effect of making map evacuation both more aggressive and more consistent. Aggressive map evacuation is good. While the map is growing, map accesses must check two buckets, which may be far apart in memory. Map growth also delays garbage collection. And if map evacuation is not aggressive enough, there is a risk that a populate-once read-many map may be stuck permanently in map growth. This CL does not eliminate that possibility, but it shrinks the window. There is minimal impact on map benchmarks: name old time/op new time/op delta MapPop100-8 12.4µs ±11% 12.4µs ± 7% ~ (p=0.798 n=15+15) MapPop1000-8 240µs ± 8% 235µs ± 8% ~ (p=0.217 n=15+14) MapPop10000-8 4.49ms ±10% 4.51ms ±15% ~ (p=1.000 n=15+13) MegMap-8 11.9ns ± 2% 11.8ns ± 0% -1.01% (p=0.000 n=15+11) MegOneMap-8 9.30ns ± 1% 9.29ns ± 1% ~ (p=0.955 n=14+14) MegEqMap-8 31.9µs ± 5% 31.9µs ± 3% ~ (p=0.935 n=15+15) MegEmptyMap-8 2.41ns ± 2% 2.41ns ± 0% ~ (p=0.594 n=12+14) SmallStrMap-8 12.8ns ± 1% 12.7ns ± 1% ~ (p=0.569 n=14+13) MapStringKeysEight_16-8 13.6ns ± 1% 13.7ns ± 2% ~ (p=0.100 n=13+15) MapStringKeysEight_32-8 12.1ns ± 1% 12.1ns ± 2% ~ (p=0.340 n=15+15) MapStringKeysEight_64-8 12.1ns ± 1% 12.1ns ± 2% ~ (p=0.582 n=15+14) MapStringKeysEight_1M-8 12.0ns ± 1% 12.1ns ± 1% ~ (p=0.267 n=15+14) IntMap-8 7.96ns ± 1% 7.97ns ± 2% ~ (p=0.991 n=15+13) RepeatedLookupStrMapKey32-8 15.8ns ± 2% 15.8ns ± 1% ~ (p=0.393 n=15+14) RepeatedLookupStrMapKey1M-8 35.3µs ± 2% 35.3µs ± 1% ~ (p=0.815 n=15+15) NewEmptyMap-8 36.0ns ± 4% 36.4ns ± 7% ~ (p=0.270 n=15+15) NewSmallMap-8 85.5ns ± 1% 85.6ns ± 1% ~ (p=0.674 n=14+15) MapIter-8 89.9ns ± 6% 90.8ns ± 6% ~ (p=0.467 n=15+15) MapIterEmpty-8 10.0ns ±22% 10.0ns ±25% ~ (p=0.846 n=15+15) SameLengthMap-8 4.18ns ± 1% 4.17ns ± 1% ~ (p=0.653 n=15+14) BigKeyMap-8 20.2ns ± 1% 20.1ns ± 1% -0.82% (p=0.002 n=15+15) BigValMap-8 22.5ns ± 8% 22.3ns ± 6% ~ (p=0.615 n=15+15) SmallKeyMap-8 15.3ns ± 1% 15.3ns ± 1% ~ (p=0.754 n=15+14) ComplexAlgMap-8 58.4ns ± 1% 58.7ns ± 1% +0.52% (p=0.000 n=14+15) There is a tiny but detectable difference in the compiler: name old time/op new time/op delta Template 218ms ± 5% 219ms ± 4% ~ (p=0.094 n=98+98) Unicode 93.6ms ± 5% 93.6ms ± 4% ~ (p=0.910 n=94+95) GoTypes 596ms ± 5% 598ms ± 6% ~ (p=0.533 n=98+100) Compiler 2.72s ± 3% 2.72s ± 4% ~ (p=0.238 n=100+99) SSA 4.11s ± 3% 4.11s ± 3% ~ (p=0.864 n=99+98) Flate 129ms ± 6% 129ms ± 4% ~ (p=0.522 n=98+96) GoParser 151ms ± 4% 151ms ± 4% -0.48% (p=0.017 n=96+96) Reflect 379ms ± 3% 376ms ± 4% -0.57% (p=0.011 n=99+99) Tar 112ms ± 5% 112ms ± 6% ~ (p=0.688 n=93+95) XML 214ms ± 4% 214ms ± 5% ~ (p=0.968 n=100+99) StdCmd 16.2s ± 2% 16.2s ± 2% -0.26% (p=0.048 n=99+99) name old user-ns/op new user-ns/op delta Template 252user-ms ± 4% 250user-ms ± 4% -0.63% (p=0.020 n=98+97) Unicode 113user-ms ± 7% 114user-ms ± 5% ~ (p=0.057 n=97+94) GoTypes 776user-ms ± 5% 777user-ms ± 5% ~ (p=0.375 n=97+96) Compiler 3.61user-s ± 3% 3.60user-s ± 3% ~ (p=0.445 n=98+93) SSA 5.84user-s ± 6% 5.85user-s ± 5% ~ (p=0.542 n=100+95) Flate 154user-ms ± 5% 154user-ms ± 5% ~ (p=0.699 n=99+99) GoParser 184user-ms ± 6% 183user-ms ± 4% ~ (p=0.557 n=98+95) Reflect 461user-ms ± 5% 462user-ms ± 4% ~ (p=0.853 n=97+99) Tar 130user-ms ± 5% 129user-ms ± 6% ~ (p=0.567 n=93+100) XML 257user-ms ± 6% 258user-ms ± 6% ~ (p=0.205 n=99+100) Change-Id: Id92dd54a152904069aac415e6aaaab5c67f5f476 Reviewed-on: https://go-review.googlesource.com/37011Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Matthew Dempsky authored
Change-Id: I0c7b677657326f318e906e109cbda0cfa78c4973 Reviewed-on: https://go-review.googlesource.com/37537 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
-
philhofer authored
Add rewrite rules that canonicalize the location of constants in expressions, and fold conststants that appear in operations that can be trivially reassociated. After this change, the compiler constant-folds expressions like "4 + x - 1" and "4 & x & 1" Benchmarks affected on darwin/amd64: name old time/op new time/op delta FmtFprintfInt-8 82.1ns ± 1% 81.7ns ± 1% -0.46% (p=0.023 n=8+9) FmtFprintfIntInt-8 122ns ± 2% 120ns ± 2% -1.48% (p=0.047 n=10+10) FmtManyArgs-8 493ns ± 0% 486ns ± 1% -1.37% (p=0.000 n=8+10) Gzip-8 230ms ± 0% 229ms ± 1% -0.46% (p=0.001 n=10+9) HTTPClientServer-8 74.5µs ± 1% 73.7µs ± 1% -1.11% (p=0.000 n=10+10) JSONDecode-8 51.7ms ± 0% 51.9ms ± 1% +0.42% (p=0.017 n=10+9) RegexpMatchEasy0_32-8 82.6ns ± 1% 81.7ns ± 0% -1.02% (p=0.000 n=9+8) RegexpMatchMedium_32-8 121ns ± 1% 120ns ± 1% -1.48% (p=0.001 n=10+10) Revcomp-8 426ms ± 1% 400ms ± 1% -6.16% (p=0.000 n=10+10) TimeFormat-8 330ns ± 1% 327ns ± 0% -0.82% (p=0.000 n=10+10) name old speed new speed delta Gzip-8 84.4MB/s ± 0% 84.8MB/s ± 1% +0.47% (p=0.001 n=10+9) JSONDecode-8 37.6MB/s ± 0% 37.4MB/s ± 1% -0.42% (p=0.016 n=10+9) RegexpMatchEasy0_32-8 387MB/s ± 1% 392MB/s ± 0% +1.06% (p=0.000 n=9+8) RegexpMatchMedium_32-8 8.21MB/s ± 1% 8.34MB/s ± 1% +1.58% (p=0.000 n=10+9) Revcomp-8 597MB/s ± 1% 636MB/s ± 1% +6.57% (p=0.000 n=10+10) Change-Id: Ie37ff91605b76a984a8400dfd1e34f50bf61c864 Reviewed-on: https://go-review.googlesource.com/37290Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Josh Bleecher Snyder authored
Prior to this CL, all runtime conversions from a concrete value to an interface went through one of two runtime calls: convT2E or convT2I. However, in practice, basic types are very common. Specializing convT2x for those basic types allows for a more efficient implementation for those types. For basic scalars and strings, allocation and copying can use the same methods as normal code. For pointer-free types, allocation can occur without zeroing, and copying can take place without GC calls. For slices, copying is cheaper and simpler. This CL adds twelve runtime routines: convT2E16, convT2I16 convT2E32, convT2I32 convT2E64, convT2I64 convT2Estring, convT2Istring convT2Eslice, convT2Islice convT2Enoptr, convT2Inoptr While compiling make.bash, 93% of all convT2x calls are now to one of these specialized convT2x call. Within specialized convT2x routines, it is cheap to check for a zero value, in a way that it is not in general. When we detect a zero value there, we return a pointer to zeroVal, rather than allocating. name old time/op new time/op delta ConvT2Ezero/zero/16-8 17.9ns ± 2% 3.0ns ± 3% -83.20% (p=0.000 n=56+56) ConvT2Ezero/zero/32-8 17.8ns ± 2% 3.0ns ± 3% -83.15% (p=0.000 n=59+60) ConvT2Ezero/zero/64-8 20.1ns ± 1% 3.0ns ± 2% -84.98% (p=0.000 n=57+57) ConvT2Ezero/zero/str-8 32.6ns ± 1% 3.0ns ± 4% -90.70% (p=0.000 n=59+60) ConvT2Ezero/zero/slice-8 36.7ns ± 2% 3.0ns ± 2% -91.78% (p=0.000 n=59+59) ConvT2Ezero/zero/big-8 91.9ns ± 2% 85.9ns ± 2% -6.52% (p=0.000 n=57+57) ConvT2Ezero/nonzero/16-8 17.7ns ± 2% 12.7ns ± 3% -28.38% (p=0.000 n=55+60) ConvT2Ezero/nonzero/32-8 17.8ns ± 1% 12.7ns ± 1% -28.44% (p=0.000 n=54+57) ConvT2Ezero/nonzero/64-8 20.0ns ± 1% 15.0ns ± 1% -24.90% (p=0.000 n=56+58) ConvT2Ezero/nonzero/str-8 32.6ns ± 1% 25.7ns ± 1% -21.17% (p=0.000 n=58+55) ConvT2Ezero/nonzero/slice-8 36.8ns ± 2% 30.4ns ± 1% -17.32% (p=0.000 n=60+52) ConvT2Ezero/nonzero/big-8 92.1ns ± 2% 85.9ns ± 2% -6.70% (p=0.000 n=57+59) Benchmarks on a real program (the compiler): name old time/op new time/op delta Template 227ms ± 5% 221ms ± 2% -2.48% (p=0.000 n=30+26) Unicode 102ms ± 5% 100ms ± 3% -1.30% (p=0.009 n=30+26) GoTypes 656ms ± 5% 659ms ± 4% ~ (p=0.208 n=30+30) Compiler 2.82s ± 2% 2.82s ± 1% ~ (p=0.614 n=29+27) Flate 128ms ± 2% 128ms ± 5% ~ (p=0.783 n=27+28) GoParser 158ms ± 3% 158ms ± 3% ~ (p=0.261 n=28+30) Reflect 408ms ± 7% 401ms ± 3% ~ (p=0.075 n=30+30) Tar 123ms ± 6% 121ms ± 8% ~ (p=0.287 n=29+30) XML 220ms ± 2% 220ms ± 4% ~ (p=0.805 n=29+29) name old user-ns/op new user-ns/op delta Template 281user-ms ± 4% 279user-ms ± 3% -0.87% (p=0.044 n=28+28) Unicode 142user-ms ± 4% 141user-ms ± 3% -1.04% (p=0.015 n=30+27) GoTypes 884user-ms ± 3% 886user-ms ± 2% ~ (p=0.532 n=30+30) Compiler 3.94user-s ± 3% 3.92user-s ± 1% ~ (p=0.185 n=30+28) Flate 165user-ms ± 2% 165user-ms ± 4% ~ (p=0.780 n=27+29) GoParser 209user-ms ± 2% 208user-ms ± 3% ~ (p=0.453 n=28+30) Reflect 533user-ms ± 6% 526user-ms ± 3% ~ (p=0.057 n=30+30) Tar 156user-ms ± 6% 154user-ms ± 6% ~ (p=0.133 n=29+30) XML 288user-ms ± 4% 288user-ms ± 4% ~ (p=0.633 n=30+30) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 40.9MB ± 0% -0.11% (p=0.000 n=29+29) Unicode 32.6MB ± 0% 32.6MB ± 0% ~ (p=0.572 n=29+30) GoTypes 122MB ± 0% 122MB ± 0% -0.10% (p=0.000 n=30+30) Compiler 482MB ± 0% 481MB ± 0% -0.07% (p=0.000 n=30+29) Flate 26.6MB ± 0% 26.6MB ± 0% ~ (p=0.096 n=30+30) GoParser 32.7MB ± 0% 32.6MB ± 0% -0.06% (p=0.011 n=28+28) Reflect 84.2MB ± 0% 84.1MB ± 0% -0.17% (p=0.000 n=29+30) Tar 27.7MB ± 0% 27.7MB ± 0% -0.05% (p=0.032 n=27+28) XML 44.7MB ± 0% 44.7MB ± 0% ~ (p=0.131 n=28+30) name old allocs/op new allocs/op delta Template 373k ± 1% 370k ± 1% -0.76% (p=0.000 n=30+30) Unicode 325k ± 1% 325k ± 1% ~ (p=0.383 n=29+30) GoTypes 1.16M ± 0% 1.15M ± 0% -0.75% (p=0.000 n=29+30) Compiler 4.15M ± 0% 4.13M ± 0% -0.59% (p=0.000 n=30+29) Flate 238k ± 1% 237k ± 1% -0.62% (p=0.000 n=30+30) GoParser 304k ± 1% 302k ± 1% -0.64% (p=0.000 n=30+28) Reflect 1.00M ± 0% 0.99M ± 0% -1.10% (p=0.000 n=29+30) Tar 245k ± 1% 244k ± 1% -0.59% (p=0.000 n=27+29) XML 391k ± 1% 389k ± 1% -0.59% (p=0.000 n=29+30) Change-Id: Id7f456d690567c2b0a96b0d6d64de8784b6e305f Reviewed-on: https://go-review.googlesource.com/36476 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Cherry Zhang authored
runtime.memclr* functions have signatures func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr) func memclrHasPointers(ptr unsafe.Pointer, n uintptr) Update compiler's copy. Also teach gc/mkbuiltin.go to handle unsafe.Pointer. The import statement and its support is not really necessary, but just to make it look like real Go code. Fixes #19185. Change-Id: I251d02571fde2716d4727e31e04d56ec04b6f22a Reviewed-on: https://go-review.googlesource.com/37257 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
-
Josh Bleecher Snyder authored
Change-Id: I3d96b9803dbbd7184f96240bd7944af919ca1376 Reviewed-on: https://go-review.googlesource.com/37579 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
The real world code that inspired this fix, from runtime/pprof/map.go: // Compute hash of (stk, tag). h := uintptr(0) for _, x := range stk { h = h<<8 | (h >> (8 * (unsafe.Sizeof(h) - 1))) h += uintptr(x) * 41 } h = h<<8 | (h >> (8 * (unsafe.Sizeof(h) - 1))) h += uintptr(tag) * 41 Change-Id: I99a95b97cba73811faedb0b9a1b9b54e9a1784a3 Reviewed-on: https://go-review.googlesource.com/37574 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
This is an inconsequential consequence of updating math/big to use math/bits. Better would be to teach the vet shift test to size int/uint/uintptr to the platform in use, eliminating the whole category of "might be too small". Filed #19321 for that. Change-Id: I7e0b837bd329132d7a564468c18502dd2e724fc6 Reviewed-on: https://go-review.googlesource.com/37576 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Brad Fitzpatrick authored
This makes the vetall builder friendly to auto-sharding by the build coordinator. Change-Id: I0893f5051ec90e7a6adcb89904ba08cd2d590549 Reviewed-on: https://go-review.googlesource.com/37572Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
-
Josh Bleecher Snyder authored
Change-Id: I68e60b155c583fa47aa5ca13d591851009a4e571 Reviewed-on: https://go-review.googlesource.com/37571Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Michael Munday authored
Explcitly block fused multiply-add pattern matching when a cast is used after the multiplication, for example: - (a * b) + c // can emit fused multiply-add - float64(a * b) + c // cannot emit fused multiply-add float{32,64} and complex{64,128} casts of matching types are now kept as OCONV operations rather than being replaced with OCONVNOP operations because they now imply a rounding operation (and therefore aren't a no-op anymore). Operations (for example, multiplication) on complex types may utilize fused multiply-add and -subtract instructions internally. There is no way to disable this behavior at the moment. Improves the performance of the floating point implementation of poly1305: name old speed new speed delta 64 246MB/s ± 0% 275MB/s ± 0% +11.48% (p=0.000 n=10+8) 1K 312MB/s ± 0% 357MB/s ± 0% +14.41% (p=0.000 n=10+10) 64Unaligned 246MB/s ± 0% 274MB/s ± 0% +11.43% (p=0.000 n=10+10) 1KUnaligned 312MB/s ± 0% 357MB/s ± 0% +14.39% (p=0.000 n=10+8) Updates #17895. Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16 Reviewed-on: https://go-review.googlesource.com/36963 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
David du Colombier authored
The checkAVX2 test doesn't appear to be correct, because it always returns the value of support_bmi2, even if the value of support_avx2 is false. Consequently, checkAVX2 always returns true, as long as BMI2 is supported, even if AVX2 is not supported. We change checkAVX2 to return false when support_avx2 is false. Fixes #19316. Change-Id: I2ec9dfaa09f4b54c4a03d60efef891b955d60578 Reviewed-on: https://go-review.googlesource.com/37590 Run-TryBot: David du Colombier <0intro@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
-
Martin Möhrmann authored
Fixes #18376. Change-Id: I4fe24f479311cd4cd1bdad9a966b681e50e3d500 Reviewed-on: https://go-review.googlesource.com/35955Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Carlo Alberto Ferraris authored
During benchmark of an internal tool we found out that (*Buffer).Reset() was surprisingly showing up in CPU profiles. This CL contains two related changes aimed at speeding up Reset(): 1. Create a fast path for Truncate(0) by moving the logic to Reset() (this makes Reset() a simple leaf func that gets inlined since it gets compiled to 3 MOVx instructions). Accordingly change calls in the rest of the Buffer methods to call Reset() instead of Truncate(0). 2. Reorder the fields in the Buffer struct so that frequently accessed fields are packed together (buf, off, lastRead). This also make them likely to be in the same cacheline. Ideally it would be advisable to have Buffer{} cacheline-aligned, but I couldn't find a way to do this without changing the size of the bootstrap array (but this will cause some regressions, because it will make duffcopy show up in CPU profiles where it wasn't showing up before). go1 benchmarks are not really affected, but some other benchmarks that exercise Buffer more show improvements: name old time/op new time/op delta BinaryTree17-4 2.46s ± 9% 2.43s ± 3% ~ (p=0.982 n=14+14) Fannkuch11-4 2.98s ± 1% 2.90s ± 1% -2.58% (p=0.000 n=15+14) FmtFprintfEmpty-4 45.2ns ± 1% 45.2ns ± 1% ~ (p=0.494 n=14+15) FmtFprintfString-4 76.8ns ± 1% 83.1ns ± 2% +8.23% (p=0.000 n=10+15) FmtFprintfInt-4 78.0ns ± 2% 74.6ns ± 1% -4.46% (p=0.000 n=15+15) FmtFprintfIntInt-4 113ns ± 1% 109ns ± 2% -2.91% (p=0.000 n=13+15) FmtFprintfPrefixedInt-4 152ns ± 2% 143ns ± 2% -6.04% (p=0.000 n=15+14) FmtFprintfFloat-4 224ns ± 1% 222ns ± 2% -1.08% (p=0.001 n=15+14) FmtManyArgs-4 464ns ± 2% 463ns ± 2% ~ (p=0.303 n=14+15) GobDecode-4 6.25ms ± 2% 6.32ms ± 3% +1.20% (p=0.002 n=14+14) GobEncode-4 5.41ms ± 2% 5.41ms ± 2% ~ (p=0.967 n=15+15) Gzip-4 215ms ± 2% 218ms ± 2% +1.35% (p=0.002 n=15+15) Gunzip-4 34.3ms ± 2% 34.2ms ± 2% ~ (p=0.539 n=15+15) HTTPClientServer-4 76.4µs ± 2% 75.4µs ± 1% -1.31% (p=0.000 n=15+15) JSONEncode-4 14.7ms ± 2% 14.6ms ± 3% ~ (p=0.094 n=14+14) JSONDecode-4 48.0ms ± 1% 48.5ms ± 1% +0.92% (p=0.001 n=14+12) Mandelbrot200-4 4.04ms ± 2% 4.06ms ± 1% ~ (p=0.108 n=15+13) GoParse-4 2.99ms ± 2% 3.00ms ± 1% ~ (p=0.130 n=15+13) RegexpMatchEasy0_32-4 78.3ns ± 1% 79.5ns ± 1% +1.51% (p=0.000 n=15+14) RegexpMatchEasy0_1K-4 185ns ± 1% 186ns ± 1% +0.76% (p=0.005 n=15+15) RegexpMatchEasy1_32-4 79.0ns ± 2% 76.7ns ± 1% -2.87% (p=0.000 n=14+15) name old speed new speed delta GobDecode-4 123MB/s ± 2% 121MB/s ± 3% -1.18% (p=0.002 n=14+14) GobEncode-4 142MB/s ± 2% 142MB/s ± 1% ~ (p=0.959 n=15+15) Gzip-4 90.3MB/s ± 2% 89.1MB/s ± 2% -1.34% (p=0.002 n=15+15) Gunzip-4 565MB/s ± 2% 567MB/s ± 2% ~ (p=0.539 n=15+15) JSONEncode-4 132MB/s ± 2% 133MB/s ± 3% ~ (p=0.091 n=14+14) JSONDecode-4 40.4MB/s ± 1% 40.0MB/s ± 1% -0.92% (p=0.001 n=14+12) GoParse-4 19.4MB/s ± 2% 19.3MB/s ± 1% ~ (p=0.121 n=15+13) RegexpMatchEasy0_32-4 409MB/s ± 1% 403MB/s ± 1% -1.47% (p=0.000 n=15+14) RegexpMatchEasy0_1K-4 5.53GB/s ± 1% 5.49GB/s ± 1% -0.86% (p=0.002 n=15+15) RegexpMatchEasy1_32-4 405MB/s ± 2% 417MB/s ± 1% +2.94% (p=0.000 n=14+15) name old time/op new time/op delta PoolsSingle1K-4 34.9ns ± 2% 30.4ns ± 4% -12.80% (p=0.000 n=15+15) PoolsSingle64K-4 36.9ns ± 1% 34.4ns ± 4% -6.72% (p=0.000 n=14+15) PoolsRandomSmall-4 34.8ns ± 3% 29.5ns ± 1% -15.19% (p=0.000 n=15+14) PoolsRandomLarge-4 38.6ns ± 1% 34.3ns ± 3% -11.17% (p=0.000 n=14+15) PoolSingle1K-4 26.1ns ± 1% 21.2ns ± 2% -18.59% (p=0.000 n=15+14) PoolSingle64K-4 26.7ns ± 2% 21.5ns ± 2% -19.72% (p=0.000 n=15+15) MakeSingle1K-4 24.2ns ± 2% 24.3ns ± 3% ~ (p=0.132 n=13+15) MakeSingle64K-4 6.76µs ± 1% 6.96µs ± 5% +2.94% (p=0.002 n=13+13) MakeRandomSmall-4 531ns ± 4% 538ns ± 5% ~ (p=0.066 n=14+15) MakeRandomLarge-4 152µs ± 0% 152µs ± 1% -0.31% (p=0.001 n=14+13) Change-Id: I86d7d9d2cac65335baf62214fbb35ba0fd8f9528 Reviewed-on: https://go-review.googlesource.com/37416 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
Josh Bleecher Snyder authored
Fix up and enable a few rules. They trigger a handful of times in std, despite the frontend handling. Change-Id: I83378c057cbbc95a4f2b58cd8c36aec0e9dc547f Reviewed-on: https://go-review.googlesource.com/37227 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Michael Munday authored
A type conversion inserted between MOVD{LT,LE,GT,GE,EQ,NE} and CMPWconst by CL 36256 broke the rewrite rule designed to merge the two. This results in simple for loops (e.g. for i := 0; i < N; i++ {}) emitting two comparisons instead of one, plus a conditional move. This CL explicitly types the input to CMPWconst so that the type conversion can be omitted. It also adds a test to check that conditional moves aren't emitted for loops with 'less than' conditions (i.e. i < N) on s390x. Fixes #19227. Change-Id: Ia39e806ed723791c3c755951aef23f957828ea3e Reviewed-on: https://go-review.googlesource.com/37334Reviewed-by: Keith Randall <khr@golang.org>
-
Joe Tsai authored
Changes made: * Adjust the documented form for a URL to make it more obvious what happens when the scheme is missing. * Remove references to Go1.5. We are sufficiently far along enough that this distinction no longer matters. * Remove the "Opaque" example which provides a hacky and misleading use of the Opaque field. This workaround is no longer necessary since RawPath was added in Go1.5 and the obvious approach just works: // The raw string "/%2f/" will be sent as expected. req, _ := http.NewRequest("GET", "https://example.com/%2f/") Fixes #18824 Change-Id: Ie33d27222e06025ce8025f8a0f04b601aaee1513 Reviewed-on: https://go-review.googlesource.com/36127 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
- 27 Feb, 2017 13 commits
-
-
Robert Griesemer authored
For #11415. Change-Id: I5da39dad059113cfc4276152390aa4925bd18862 Reviewed-on: https://go-review.googlesource.com/37405 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>
-
Austin Clements authored
The comments in cmd/internal/obj/funcdata.go are identical to the comments in runtime/funcdata.h, but the majority of the definitions they refer to don't apply to Go sources and have been stripped out of funcdata.go. Remove these stale comments from funcdata.go and clean up the references to other copies of the PCDATA and FUNCDATA indexes. Change-Id: I5d6e49a6e586cc9aecd7c3ce1567679f2a605884 Reviewed-on: https://go-review.googlesource.com/37330Reviewed-by: Keith Randall <khr@golang.org>
-
Kevin Burke authored
If cgo is not available, parse /etc/group in Go to find the name/gid we need. This does not consult the Network Information System (NIS), /etc/nsswitch.conf or any other libc extensions to /etc/group. Fixes #18102. Change-Id: I6ae4fe0e2c899396c45cdf243d5483113932657c Reviewed-on: https://go-review.googlesource.com/33713Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Robert Griesemer authored
For #11415. Change-Id: I87a8f534ab9dfd5022422457ea637b342c057d77 Reviewed-on: https://go-review.googlesource.com/37393Reviewed-by: Alan Donovan <adonovan@google.com>
-
Josh Bleecher Snyder authored
This is the escape analysis analog of CL 37499. Fixes #12397 Fixes #16871 The only "moved to heap" decisions eliminated by this CL in std+cmd are: cmd/compile/internal/gc/const.go:1514: moved to heap: ac cmd/compile/internal/gc/const.go:1515: moved to heap: bd cmd/compile/internal/gc/const.go:1516: moved to heap: bc cmd/compile/internal/gc/const.go:1517: moved to heap: ad cmd/compile/internal/gc/const.go:1546: moved to heap: ac cmd/compile/internal/gc/const.go:1547: moved to heap: bd cmd/compile/internal/gc/const.go:1548: moved to heap: bc cmd/compile/internal/gc/const.go:1549: moved to heap: ad cmd/compile/internal/gc/const.go:1550: moved to heap: cc_plus cmd/compile/internal/gc/export.go:162: moved to heap: copy cmd/compile/internal/gc/mpfloat.go:66: moved to heap: b cmd/compile/internal/gc/mpfloat.go:97: moved to heap: b Change-Id: I0d420b69c84a41ba9968c394e8957910bab5edea Reviewed-on: https://go-review.googlesource.com/37508Reviewed-by: David Chase <drchase@google.com>
-
Matthew Dempsky authored
Keep liveness bit vectors as simple live-variable vectors during liveness analysis. We can defer expanding them into runtime heap bitmaps until we're actually writing out the symbol data, and then we only need temporary memory to expand one bitmap at a time. This is logically cleaner (e.g., we no longer depend on stack frame layout during analysis) and saves a little bit on allocations. name old alloc/op new alloc/op delta Template 41.4MB ± 0% 41.3MB ± 0% -0.28% (p=0.000 n=60+60) Unicode 32.6MB ± 0% 32.6MB ± 0% -0.11% (p=0.000 n=59+60) GoTypes 119MB ± 0% 119MB ± 0% -0.35% (p=0.000 n=60+59) Compiler 483MB ± 0% 481MB ± 0% -0.47% (p=0.000 n=59+60) name old allocs/op new allocs/op delta Template 381k ± 1% 380k ± 1% -0.32% (p=0.000 n=60+60) Unicode 325k ± 1% 325k ± 1% ~ (p=0.867 n=60+60) GoTypes 1.16M ± 0% 1.15M ± 0% -0.40% (p=0.000 n=60+59) Compiler 4.22M ± 0% 4.19M ± 0% -0.61% (p=0.000 n=59+60) Passes toolstash -cmp. Change-Id: I8175efe55201ffb5017f79ae6cb90df03f1b7e99 Reviewed-on: https://go-review.googlesource.com/37458 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
-
Josh Bleecher Snyder authored
Static branch prediction assumes that forward branches are not taken. The existing wrapper prologue almost always takes the first forward branch. Move the rare case to the end of the function. This CL is amd64 only. Other architectures will be done in separate CLs. Updates #19042. Package sort benchmarks: SearchWrappers-8 104ns ± 2% 104ns ± 0% -0.41% (p=0.006 n=30+41) SortString1K-8 128µs ± 1% 128µs ± 1% -0.25% (p=0.045 n=30+56) SortString1K_Slice-8 117µs ± 1% 117µs ± 1% ~ (p=0.855 n=30+59) StableString1K-8 18.6µs ± 1% 18.6µs ± 1% ~ (p=0.599 n=29+60) SortInt1K-8 61.0µs ± 1% 56.5µs ± 1% -7.36% (p=0.000 n=29+58) StableInt1K-8 74.6µs ± 1% 70.4µs ± 3% -5.54% (p=0.000 n=28+60) StableInt1K_Slice-8 59.9µs ± 1% 58.3µs ± 4% -2.64% (p=0.000 n=29+60) SortInt64K-8 6.02ms ± 2% 5.98ms ± 2% -0.60% (p=0.000 n=29+59) SortInt64K_Slice-8 5.07ms ± 2% 5.05ms ± 2% -0.38% (p=0.006 n=30+58) StableInt64K-8 6.41ms ± 1% 6.22ms ± 1% -3.00% (p=0.000 n=27+58) Sort1e2-8 37.4µs ± 1% 37.1µs ± 1% -0.91% (p=0.000 n=30+57) Stable1e2-8 74.8µs ± 1% 75.2µs ± 1% +0.52% (p=0.000 n=30+57) Sort1e4-8 8.11ms ± 1% 8.01ms ± 1% -1.20% (p=0.000 n=30+59) Stable1e4-8 24.3ms ± 1% 24.3ms ± 1% ~ (p=0.157 n=30+60) Sort1e6-8 1.25s ± 1% 1.23s ± 1% -1.43% (p=0.000 n=29+58) Stable1e6-8 4.93s ± 1% 4.90s ± 1% -0.56% (p=0.000 n=29+59) [Geo mean] 720µs 709µs -1.52% Assembly for sort.(*intPairs).Swap: Before: "".(*intPairs).Swap t=1 size=147 args=0x18 locals=0x8 0x0000 00000 (<autogenerated>:1) TEXT "".(*intPairs).Swap(SB), $8-24 0x0000 00000 (<autogenerated>:1) MOVQ (TLS), CX 0x0009 00009 (<autogenerated>:1) SUBQ $8, SP 0x000d 00013 (<autogenerated>:1) MOVQ BP, (SP) 0x0011 00017 (<autogenerated>:1) LEAQ (SP), BP 0x0015 00021 (<autogenerated>:1) MOVQ 32(CX), BX 0x0019 00025 (<autogenerated>:1) TESTQ BX, BX 0x001c 00028 (<autogenerated>:1) JEQ 43 0x001e 00030 (<autogenerated>:1) LEAQ 16(SP), DI 0x0023 00035 (<autogenerated>:1) CMPQ (BX), DI 0x0026 00038 (<autogenerated>:1) JNE 43 0x0028 00040 (<autogenerated>:1) MOVQ SP, (BX) 0x002b 00043 (<autogenerated>:1) NOP 0x002b 00043 (<autogenerated>:1) FUNCDATA $0, gclocals·e6397a44f8e1b6e77d0f200b4fba5269(SB) 0x002b 00043 (<autogenerated>:1) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB) 0x002b 00043 (<autogenerated>:1) MOVQ ""..this+16(FP), AX 0x0030 00048 (<autogenerated>:1) TESTQ AX, AX 0x0033 00051 (<autogenerated>:1) JEQ $0, 140 0x0035 00053 (<autogenerated>:1) MOVQ (AX), CX 0x0038 00056 (<autogenerated>:1) MOVQ 8(AX), AX 0x003c 00060 (<autogenerated>:1) MOVQ "".i+24(FP), DX 0x0041 00065 (<autogenerated>:1) CMPQ DX, AX 0x0044 00068 (<autogenerated>:1) JCC $0, 133 0x0046 00070 (<autogenerated>:1) SHLQ $4, DX 0x004a 00074 (<autogenerated>:1) MOVQ 8(CX)(DX*1), BX 0x004f 00079 (<autogenerated>:1) MOVQ (CX)(DX*1), SI 0x0053 00083 (<autogenerated>:1) MOVQ "".j+32(FP), DI 0x0058 00088 (<autogenerated>:1) CMPQ DI, AX 0x005b 00091 (<autogenerated>:1) JCC $0, 133 0x005d 00093 (<autogenerated>:1) SHLQ $4, DI 0x0061 00097 (<autogenerated>:1) MOVQ 8(CX)(DI*1), AX 0x0066 00102 (<autogenerated>:1) MOVQ (CX)(DI*1), R8 0x006a 00106 (<autogenerated>:1) MOVQ R8, (CX)(DX*1) 0x006e 00110 (<autogenerated>:1) MOVQ AX, 8(CX)(DX*1) 0x0073 00115 (<autogenerated>:1) MOVQ SI, (CX)(DI*1) 0x0077 00119 (<autogenerated>:1) MOVQ BX, 8(CX)(DI*1) 0x007c 00124 (<autogenerated>:1) MOVQ (SP), BP 0x0080 00128 (<autogenerated>:1) ADDQ $8, SP 0x0084 00132 (<autogenerated>:1) RET 0x0085 00133 (<autogenerated>:1) PCDATA $0, $1 0x0085 00133 (<autogenerated>:1) CALL runtime.panicindex(SB) 0x008a 00138 (<autogenerated>:1) UNDEF 0x008c 00140 (<autogenerated>:1) PCDATA $0, $1 0x008c 00140 (<autogenerated>:1) CALL runtime.panicwrap(SB) 0x0091 00145 (<autogenerated>:1) UNDEF After: "".(*intPairs).Swap t=1 size=149 args=0x18 locals=0x8 0x0000 00000 (<autogenerated>:1) TEXT "".(*intPairs).Swap(SB), $8-24 0x0000 00000 (<autogenerated>:1) MOVQ (TLS), CX 0x0009 00009 (<autogenerated>:1) SUBQ $8, SP 0x000d 00013 (<autogenerated>:1) MOVQ BP, (SP) 0x0011 00017 (<autogenerated>:1) LEAQ (SP), BP 0x0015 00021 (<autogenerated>:1) MOVQ 32(CX), BX 0x0019 00025 (<autogenerated>:1) TESTQ BX, BX 0x001c 00028 (<autogenerated>:1) JNE 134 0x001e 00030 (<autogenerated>:1) NOP 0x001e 00030 (<autogenerated>:1) FUNCDATA $0, gclocals·e6397a44f8e1b6e77d0f200b4fba5269(SB) 0x001e 00030 (<autogenerated>:1) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB) 0x001e 00030 (<autogenerated>:1) MOVQ ""..this+16(FP), AX 0x0023 00035 (<autogenerated>:1) TESTQ AX, AX 0x0026 00038 (<autogenerated>:1) JEQ $0, 127 0x0028 00040 (<autogenerated>:1) MOVQ (AX), CX 0x002b 00043 (<autogenerated>:1) MOVQ 8(AX), AX 0x002f 00047 (<autogenerated>:1) MOVQ "".i+24(FP), DX 0x0034 00052 (<autogenerated>:1) CMPQ DX, AX 0x0037 00055 (<autogenerated>:1) JCC $0, 120 0x0039 00057 (<autogenerated>:1) SHLQ $4, DX 0x003d 00061 (<autogenerated>:1) MOVQ 8(CX)(DX*1), BX 0x0042 00066 (<autogenerated>:1) MOVQ (CX)(DX*1), SI 0x0046 00070 (<autogenerated>:1) MOVQ "".j+32(FP), DI 0x004b 00075 (<autogenerated>:1) CMPQ DI, AX 0x004e 00078 (<autogenerated>:1) JCC $0, 120 0x0050 00080 (<autogenerated>:1) SHLQ $4, DI 0x0054 00084 (<autogenerated>:1) MOVQ 8(CX)(DI*1), AX 0x0059 00089 (<autogenerated>:1) MOVQ (CX)(DI*1), R8 0x005d 00093 (<autogenerated>:1) MOVQ R8, (CX)(DX*1) 0x0061 00097 (<autogenerated>:1) MOVQ AX, 8(CX)(DX*1) 0x0066 00102 (<autogenerated>:1) MOVQ SI, (CX)(DI*1) 0x006a 00106 (<autogenerated>:1) MOVQ BX, 8(CX)(DI*1) 0x006f 00111 (<autogenerated>:1) MOVQ (SP), BP 0x0073 00115 (<autogenerated>:1) ADDQ $8, SP 0x0077 00119 (<autogenerated>:1) RET 0x0078 00120 (<autogenerated>:1) PCDATA $0, $1 0x0078 00120 (<autogenerated>:1) CALL runtime.panicindex(SB) 0x007d 00125 (<autogenerated>:1) UNDEF 0x007f 00127 (<autogenerated>:1) PCDATA $0, $1 0x007f 00127 (<autogenerated>:1) CALL runtime.panicwrap(SB) 0x0084 00132 (<autogenerated>:1) UNDEF 0x0086 00134 (<autogenerated>:1) LEAQ 16(SP), DI 0x008b 00139 (<autogenerated>:1) CMPQ (BX), DI 0x008e 00142 (<autogenerated>:1) JNE 30 0x0090 00144 (<autogenerated>:1) MOVQ SP, (BX) 0x0093 00147 (<autogenerated>:1) JMP 30 Change-Id: Ie8c37f384bba10fbacaa754bb0a6b0a7e520ef01 Reviewed-on: https://go-review.googlesource.com/36893Reviewed-by: Keith Randall <khr@golang.org>
-
Matthew Dempsky authored
Passes toolstash -cmp. Change-Id: Ibb51ccaf29ee97c3463543175c9ac7b85ea10a7f Reviewed-on: https://go-review.googlesource.com/37339Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
-
Dmitry Vyukov authored
These functions are not defined and are not used. Fixes #19290 Change-Id: I2978147220af83cf319f7439f076c131870fb9ee Reviewed-on: https://go-review.googlesource.com/37448Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Robert Griesemer authored
Per suggestion from rsc. Change-Id: I4b61ec6f35ffaaa792b75e011fbba1bdfbabc1f6 Reviewed-on: https://go-review.googlesource.com/37501 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Tom Bergan authored
Updates http2 to x/net/http2 git rev 906cda9 for: http2: add configurable knobs for the server's receive window https://golang.org/cl/37226 http2/hpack: speedup Encoder.searchTable https://golang.org/cl/37406 http2: Add opt-in option to Framer to allow DataFrame struct reuse https://golang.org/cl/34812 http2: replace fixedBuffer with dataBuffer https://golang.org/cl/37400 http2/hpack: remove hpack's constant time string comparison https://golang.org/cl/37394 Updates golang/go#16512 Updates golang/go#18404 Change-Id: I1ad7c95c404ead4ced7f85af061cf811b299a288 Reviewed-on: https://go-review.googlesource.com/37500Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Josh Bleecher Snyder authored
Change-Id: I0d14d68b57e8605cdae8a45d6fa97255a42297d8 Reviewed-on: https://go-review.googlesource.com/37521 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Josh Bleecher Snyder authored
Constant evaluation provides some rudimentary knowledge of dead code at inlining decision time. Use it. This CL addresses only dead code inside if statements. For statements are never inlined anyway, and dead code inside for statements is rare. Analyzing switch statements is worth doing, but it is more complicated, since we would have to evaluate each case; leave it for later. Fixes #9274 After this CL, the following functions in std+cmd can be newly inlined: cmd/internal/obj/x86/asm6.go:3122: can inline subreg cmd/vendor/golang.org/x/arch/x86/x86asm/decode.go:172: can inline instPrefix cmd/vendor/golang.org/x/arch/x86/x86asm/decode.go:202: can inline truncated go/constant/value.go:234: can inline makeFloat go/types/labels.go:52: can inline (*block).insert math/big/float.go:231: can inline (*Float).Sign math/bits/bits.go:57: can inline OnesCount net/http/server.go:597: can inline (*Server).newConn runtime/hashmap.go:1165: can inline reflect_maplen runtime/proc.go:207: can inline os_beforeExit runtime/signal_unix.go:55: can inline init.5 runtime/stack.go:1081: can inline gostartcallfn Change-Id: I4c92fb96aa0c3d33df7b3f2da548612e79b56b5b Reviewed-on: https://go-review.googlesource.com/37499Reviewed-by: Matthew Dempsky <mdempsky@google.com>
-