- 03 Mar, 2018 6 commits
-
-
Giovanni Bajo authored
Currently, the top-level testsuite always uses whatever version of Go is found in the PATH to execute all the tests. This forces the developers to tweak the PATH to run the testsuite. Change it to use the same version of Go used to run run.go. This allows developers to run the testsuite using the tip compiler by simply saying "../bin/go run run.go". I think this is a better solution compared to always forcing "../bin/go", because it allows developers to run the testsuite using different Go versions, for instance to check if a new test is fixed in tip compared to the installed compiler. Fixes #24217 Change-Id: I41b299c753b6e77c41e28be9091b2b630efea9d2 Reviewed-on: https://go-review.googlesource.com/98439 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
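The idea described above, preferring the `go` command that belongs to the toolchain running `run.go` over whatever is first in the PATH, can be sketched roughly as follows. This is a minimal illustration, not the code from the change itself: the helper names `pickTool`, `fileExists`, and `goTool` are assumptions chosen for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// pickTool returns the first candidate path for which exists reports true,
// or fallback when none of them exists. (Hypothetical helper.)
func pickTool(candidates []string, exists func(string) bool, fallback string) string {
	for _, c := range candidates {
		if exists(c) {
			return c
		}
	}
	return fallback
}

// fileExists reports whether path names an existing non-directory file.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

// goTool prefers the go command that belongs to the toolchain running this
// program and falls back to plain "go", which is resolved through the PATH.
func goTool() string {
	name := "go"
	if runtime.GOOS == "windows" {
		name += ".exe"
	}
	candidate := filepath.Join(runtime.GOROOT(), "bin", name)
	return pickTool([]string{candidate}, fileExists, "go")
}

func main() {
	fmt.Println("tests would be run with:", goTool())
}
```

With this shape, running the program through a different Go installation changes which `go` binary the tests use, without touching the PATH.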
-
Pascal S. de Kloe authored
name                            old time/op    new time/op    delta
CodeEncoder-12                  1.89ms ± 1%    1.91ms ± 0%    +1.16%   (p=0.000 n=20+19)
CodeMarshal-12                  2.09ms ± 1%    2.12ms ± 0%    +1.63%   (p=0.000 n=17+18)
CodeDecoder-12                  8.43ms ± 1%    8.32ms ± 1%    -1.35%   (p=0.000 n=18+20)
UnicodeDecoder-12               399ns ± 0%     339ns ± 0%     -15.00%  (p=0.000 n=20+19)
DecoderStream-12                281ns ± 1%     231ns ± 0%     -17.91%  (p=0.000 n=20+16)
CodeUnmarshal-12                9.35ms ± 2%    9.15ms ± 2%    -2.11%   (p=0.000 n=20+20)
CodeUnmarshalReuse-12           8.41ms ± 2%    8.29ms ± 2%    -1.34%   (p=0.000 n=20+20)
UnmarshalString-12              81.2ns ± 2%    74.0ns ± 4%    -8.89%   (p=0.000 n=20+20)
UnmarshalFloat64-12             71.1ns ± 2%    64.3ns ± 1%    -9.60%   (p=0.000 n=20+19)
UnmarshalInt64-12               60.6ns ± 2%    53.2ns ± 0%    -12.28%  (p=0.000 n=18+18)
Issue10335-12                   96.9ns ± 0%    87.7ns ± 1%    -9.52%   (p=0.000 n=17+20)
Unmapped-12                     247ns ± 4%     231ns ± 3%     -6.34%   (p=0.000 n=20+20)
TypeFieldsCache/MissTypes1-12   11.1µs ± 0%    11.1µs ± 0%    ~        (p=0.376 n=19+20)
TypeFieldsCache/MissTypes10-12  33.9µs ± 0%    33.8µs ± 0%    -0.32%   (p=0.000 n=18+9)

name                            old speed      new speed      delta
CodeEncoder-12                  1.03GB/s ± 1%  1.01GB/s ± 0%  -1.15%   (p=0.000 n=20+19)
CodeMarshal-12                  930MB/s ± 1%   915MB/s ± 0%   -1.60%   (p=0.000 n=17+18)
CodeDecoder-12                  230MB/s ± 1%   233MB/s ± 1%   +1.37%   (p=0.000 n=18+20)
UnicodeDecoder-12               35.0MB/s ± 0%  41.2MB/s ± 0%  +17.60%  (p=0.000 n=20+19)
CodeUnmarshal-12                208MB/s ± 2%   212MB/s ± 2%   +2.16%   (p=0.000 n=20+20)

name                            old alloc/op   new alloc/op   delta
Issue10335-12                   184B ± 0%      184B ± 0%      ~        (all equal)
Unmapped-12                     216B ± 0%      216B ± 0%      ~        (all equal)

name                            old allocs/op  new allocs/op  delta
Issue10335-12                   3.00 ± 0%      3.00 ± 0%      ~        (all equal)
Unmapped-12                     4.00 ± 0%      4.00 ± 0%      ~        (all equal)

Change-Id: I4b1a87a205da2ef9a572f86f85bc833653c61570 Reviewed-on: https://go-review.googlesource.com/98440 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Tobias Klauser authored
Use the __vdso_clock_gettime fast path via the vDSO on linux/arm to speed up nanotime and walltime. This results in the following performance improvement for time.Now on a RaspberryPi 3 (running 32bit Raspbian, i.e. GOOS=linux/GOARCH=arm):

name     old time/op  new time/op  delta
TimeNow  0.99µs ± 0%  0.39µs ± 1%  -60.74%  (p=0.000 n=12+20)

Change-Id: I3598278a6c88d7f6a6ce66c56b9d25f9dd2f4c9a Reviewed-on: https://go-review.googlesource.com/98095 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Tobias Klauser authored
It's unused since https://golang.org/cl/99320043 Change-Id: I74d69ff894aa2fb556f1c2083406c118c559d91b Reviewed-on: https://go-review.googlesource.com/98195 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
Keith Randall authored
Move bytes.Equal, runtime.memequal, and runtime.memequal_varlen to the bytealg package. Update #19792 Change-Id: Ic4175e952936016ea0bda6c7c3dbb33afdc8e4ac Reviewed-on: https://go-review.googlesource.com/98355 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Joe Tsai authored
The previous type cache is quadratic in time in the situation where new types are continually encountered. Now that it is possible to dynamically create new types with the reflect package, this can cause json to perform very poorly. Switch to sync.Map which does well when the cache has hit steady state, but also handles occasional updates in better than quadratic time.

benchmark                                  old ns/op   new ns/op  delta
BenchmarkTypeFieldsCache/MissTypes1-8      14817       16202      +9.35%
BenchmarkTypeFieldsCache/MissTypes10-8     70926       69144      -2.51%
BenchmarkTypeFieldsCache/MissTypes100-8    976467      208973     -78.60%
BenchmarkTypeFieldsCache/MissTypes1000-8   79520162    1750371    -97.80%
BenchmarkTypeFieldsCache/MissTypes10000-8  6873625837  16847806   -99.75%
BenchmarkTypeFieldsCache/HitTypes1000-8    7.51        8.80       +17.18%
BenchmarkTypeFieldsCache/HitTypes10000-8   7.58        8.68       +14.51%

The old implementation takes 12 minutes just to build a cache of size 1e5 due to the quadratic behavior. I did not bother benchmarking sizes above that. Change-Id: I5e6facc1eb8e1b80e5ca285e4dd2cc8815618dad Reviewed-on: https://go-review.googlesource.com/76850 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 02 Mar, 2018 16 commits
-
-
Shamil Garatuev authored
Make ReadSubKeyNames work even if the key is opened with only the ENUMERATE_SUB_KEYS access rights mask. Fixes #23869 Change-Id: I138bd51715fdbc3bda05607c64bde1150f4fe6b2 Reviewed-on: https://go-review.googlesource.com/97435 Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
-
Keith Randall authored
Move the IndexByte function from the runtime to a new bytealg package. The new package will eventually hold all the optimized assembly for groveling through byte slices and strings. It seems a better home for this code than randomly keeping it in runtime. Once this is in, the next step is to move the other functions (Compare, Equal, ...). Update #19792 This change seems complicated enough that we might just declare "not worth it" and abandon. Opinions welcome. The core assembly is all unchanged, except minor modifications where the code reads cpu feature bits. The wrapper functions have been cleaned up as they are now actually checked by vet. Change-Id: I9fa75bee5d85db3a65b3fd3b7997e60367523796 Reviewed-on: https://go-review.googlesource.com/98016 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Brad Fitzpatrick authored
Flaky tests failing trybots help nobody. Updates #22857 Change-Id: I87bc018651ab4fe02560a6d24c08a1d7ccd8ba37 Reviewed-on: https://go-review.googlesource.com/97416 Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
Damien Mathieu authored
Since that method uses 'mux.m', we need to lock the mutex to avoid data races. Change-Id: I998448a6e482b5d6a1b24f3354bb824906e23172 GitHub-Last-Rev: 163a7d4942e793b328e05a7eb91f6d3fdc4ba12b GitHub-Pull-Request: golang/go#23994 Reviewed-on: https://go-review.googlesource.com/96575 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
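The general pattern behind this fix, taking the lock in every method that touches a shared map, can be illustrated with a small stand-in type. The `registry` type and its methods below are hypothetical and are not the net/http ServeMux code; they only show where the lock has to be held.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a toy stand-in for a mux: a map guarded by a mutex.
type registry struct {
	mu sync.RWMutex
	m  map[string]string // pattern -> handler name
}

// Handle registers a handler name; writers take the exclusive lock.
func (r *registry) Handle(pattern, name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.m == nil {
		r.m = make(map[string]string)
	}
	r.m[pattern] = name
}

// Lookup reads the map, so it must hold at least the read lock. Skipping
// the lock here is the kind of data race the commit message describes.
func (r *registry) Lookup(pattern string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	name, ok := r.m[pattern]
	return name, ok
}

func main() {
	var r registry
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.Handle(fmt.Sprintf("/p%d", i), "handler")
			r.Lookup("/p0")
		}(i)
	}
	wg.Wait()
	_, ok := r.Lookup("/p3")
	fmt.Println("registered /p3:", ok)
}
```

Running a program like this with `go run -race` is one way to confirm that no unlocked access remains.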
-
David du Colombier authored
TestEmptyDwarfRanges has been added in CL 94816. This test is failing on Plan 9 because executables don't have a DWARF symbol table. Fixes #24226. Change-Id: Iff7e34b8c2703a2f19ee8087a4d64d0bb98496cd Reviewed-on: https://go-review.googlesource.com/98275 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Hana Kim authored
This reverts commit 16398894. This broke the TestUserTaskSpan test. Change-Id: If5ff8bdfe84e8cb30787b03ead87205ece3d5601 Reviewed-on: https://go-review.googlesource.com/98235 Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Hana Kim authored
Even though it is undocumented, the assumption is that the Event's link field points to a following event in the future. The new span/task event processing breaks the assumption. Change-Id: I4ce2f30c67c4f525ec0a121a7e43d8bdd2ec3f77 Reviewed-on: https://go-review.googlesource.com/96395 Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Alberto Donizetti authored
Change-Id: I9fe6572d1043ef9ee09c0925059ded554ad24c6b Reviewed-on: https://go-review.googlesource.com/98215 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Michael Fraenkel authored
When recursively calling walkexpr, r.Type is still the untyped value. It then sometimes recursively calls finishcompare, which complains that you can't compare the resulting expression to that untyped value. Updates #23834. Change-Id: I6b7acd3970ceaff8da9216bfa0ae24aca5dee828 Reviewed-on: https://go-review.googlesource.com/97856 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
-
Than McIntosh authored
Add DWARF register mappings for ARM64, so that that arch will become usable with "-dwarflocationlists". [NB: I've plugged in a set of numbers from the doc, but this will require additional manual testing.] Change-Id: Id9aa63857bc8b4f5c825f49274101cf372e9e856 Reviewed-on: https://go-review.googlesource.com/82515 Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Alessandro Arzilli authored
Dsymutil, a utility used on macOS when externally linking executables, does not support base address selector entries in debug_ranges. CL 73271 worked around this problem by removing base address selectors and emitting CU-relative relocations for each list entry. This commit, as an optimization, reintroduces the base address selectors and changes the linker to remove them again, but only when it knows that it will have to invoke the external linker on macOS. Compilecmp comparing master with a branch that has scope tracking always enabled:

completed 15 of 15, estimated time remaining 0s (eta 2:43PM)

name        old time/op       new time/op       delta
Template    272ms ± 8%        257ms ± 5%        -5.33%  (p=0.000 n=15+14)
Unicode     124ms ± 7%        122ms ± 5%        ~       (p=0.210 n=14+14)
GoTypes     873ms ± 3%        870ms ± 5%        ~       (p=0.856 n=15+13)
Compiler    4.49s ± 2%        4.49s ± 5%        ~       (p=0.982 n=14+14)
SSA         11.8s ± 4%        11.8s ± 3%        ~       (p=0.653 n=15+15)
Flate       163ms ± 6%        164ms ± 9%        ~       (p=0.914 n=14+15)
GoParser    203ms ± 6%        202ms ±10%        ~       (p=0.571 n=14+14)
Reflect     547ms ± 7%        542ms ± 4%        ~       (p=0.914 n=15+14)
Tar         244ms ± 7%        237ms ± 3%        -2.80%  (p=0.002 n=14+13)
XML         289ms ± 6%        289ms ± 5%        ~       (p=0.839 n=14+14)
[Geo mean]  537ms             531ms             -1.10%

name        old user-time/op  new user-time/op  delta
Template    360ms ± 4%        341ms ± 7%        -5.16%  (p=0.000 n=14+14)
Unicode     189ms ±11%        190ms ± 8%        ~       (p=0.844 n=15+15)
GoTypes     1.13s ± 4%        1.14s ± 7%        ~       (p=0.582 n=15+14)
Compiler    5.34s ± 2%        5.40s ± 4%        +1.19%  (p=0.036 n=11+13)
SSA         14.7s ± 2%        14.7s ± 3%        ~       (p=0.602 n=15+15)
Flate       211ms ± 7%        214ms ± 8%        ~       (p=0.252 n=14+14)
GoParser    267ms ±12%        266ms ± 2%        ~       (p=0.837 n=15+11)
Reflect     706ms ± 4%        701ms ± 3%        ~       (p=0.213 n=14+12)
Tar         331ms ± 9%        320ms ± 5%        -3.30%  (p=0.025 n=15+14)
XML         378ms ± 4%        373ms ± 6%        ~       (p=0.253 n=14+15)
[Geo mean]  704ms             700ms             -0.58%

name        old alloc/op      new alloc/op      delta
Template    38.0MB ± 0%       38.4MB ± 0%       +1.12%  (p=0.000 n=15+15)
Unicode     28.8MB ± 0%       28.8MB ± 0%       +0.17%  (p=0.000 n=15+15)
GoTypes     112MB ± 0%        114MB ± 0%        +1.47%  (p=0.000 n=15+15)
Compiler    465MB ± 0%        473MB ± 0%        +1.71%  (p=0.000 n=15+15)
SSA         1.48GB ± 0%       1.53GB ± 0%       +3.07%  (p=0.000 n=15+15)
Flate       24.3MB ± 0%       24.7MB ± 0%       +1.67%  (p=0.000 n=15+15)
GoParser    30.7MB ± 0%       31.0MB ± 0%       +1.15%  (p=0.000 n=12+15)
Reflect     76.3MB ± 0%       77.1MB ± 0%       +0.97%  (p=0.000 n=15+15)
Tar         39.2MB ± 0%       39.6MB ± 0%       +0.91%  (p=0.000 n=15+15)
XML         41.5MB ± 0%       42.0MB ± 0%       +1.29%  (p=0.000 n=15+15)
[Geo mean]  77.5MB            78.6MB            +1.35%

name        old allocs/op     new allocs/op     delta
Template    385k ± 0%         387k ± 0%         +0.51%  (p=0.000 n=15+15)
Unicode     342k ± 0%         343k ± 0%         +0.10%  (p=0.000 n=14+15)
GoTypes     1.19M ± 0%        1.19M ± 0%        +0.62%  (p=0.000 n=15+15)
Compiler    4.51M ± 0%        4.54M ± 0%        +0.50%  (p=0.000 n=14+15)
SSA         12.2M ± 0%        12.4M ± 0%        +1.12%  (p=0.000 n=14+15)
Flate       234k ± 0%         236k ± 0%         +0.60%  (p=0.000 n=15+15)
GoParser    318k ± 0%         320k ± 0%         +0.60%  (p=0.000 n=15+15)
Reflect     974k ± 0%         977k ± 0%         +0.27%  (p=0.000 n=15+15)
Tar         395k ± 0%         397k ± 0%         +0.37%  (p=0.000 n=14+15)
XML         404k ± 0%         407k ± 0%         +0.53%  (p=0.000 n=15+15)
[Geo mean]  794k              798k              +0.52%

name        old text-bytes    new text-bytes    delta
HelloSize   680kB ± 0%        680kB ± 0%        ~       (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize   9.62kB ± 0%       9.62kB ± 0%       ~       (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize   125kB ± 0%        125kB ± 0%        ~       (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize   1.11MB ± 0%       1.13MB ± 0%       +1.85%  (p=0.000 n=15+15)

Change-Id: I61c98ba0340cb798034b2bb55e3ab3a58ac1cf23 Reviewed-on: https://go-review.googlesource.com/98075 Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Heschi Kreinick authored
When generating location lists, batch up changes for all zero-width instructions, not just phis. This prevents the creation of location list entries that don't actually cover any instructions. This isn't perfect because of the caveats in the prior CL (Copy is zero-width sometimes) but in practice this seems to fix all of the empty lists in std. Change-Id: Ice4a9ade36b6b24ca111d1494c414eec96e5af25 Reviewed-on: https://go-review.googlesource.com/97958 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
-
Heschi Kreinick authored
Add a bool to opInfo to indicate if an Op never results in any instructions. This is a conservative approximation: some operations, like Copy, may or may not generate code depending on their arguments. I built the list by reading each arch's ssaGenValue function. Hopefully I got them all. Change-Id: I130b251b65f18208294e129bb7ddc3f91d57d31d Reviewed-on: https://go-review.googlesource.com/97957 Reviewed-by: Keith Randall <khr@golang.org>
-
Zhou Peng authored
Change-Id: I289af4884583537639800e37928c22814d38cba9 Reviewed-on: https://go-review.googlesource.com/98115 Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
-
Alessandro Arzilli authored
1. Detect and remove the markers of lexical scopes that don't contain any variables early in noder, instead of waiting until the end of DWARF generation. This saves memory by never allocating some of the markers and optimizes some of the algorithms that depend on the number of scopes.
2. Assign scopes to Progs by doing, for each Prog, a binary search over the markers array. This is faster, compared to sorting the Prog list because there are fewer markers than there are Progs.

completed 15 of 15, estimated time remaining 0s (eta 2:30PM)

name        old time/op       new time/op       delta
Template    274ms ± 5%        260ms ± 6%        -4.91%  (p=0.000 n=15+15)
Unicode     126ms ± 5%        127ms ± 9%        ~       (p=0.856 n=13+15)
GoTypes     861ms ± 5%        857ms ± 4%        ~       (p=0.595 n=15+15)
Compiler    4.11s ± 4%        4.12s ± 5%        ~       (p=1.000 n=15+15)
SSA         10.7s ± 2%        10.9s ± 4%        +2.01%  (p=0.002 n=14+14)
Flate       163ms ± 4%        166ms ± 9%        ~       (p=0.134 n=14+15)
GoParser    203ms ± 4%        205ms ± 6%        ~       (p=0.461 n=15+15)
Reflect     544ms ± 5%        549ms ± 4%        ~       (p=0.174 n=15+15)
Tar         249ms ± 9%        245ms ± 6%        ~       (p=0.285 n=15+15)
XML         286ms ± 4%        291ms ± 5%        ~       (p=0.081 n=15+15)
[Geo mean]  528ms             529ms             +0.14%

name        old user-time/op  new user-time/op  delta
Template    358ms ± 7%        354ms ± 5%        ~       (p=0.242 n=14+15)
Unicode     189ms ±11%        191ms ±10%        ~       (p=0.438 n=15+15)
GoTypes     1.15s ± 4%        1.14s ± 3%        ~       (p=0.405 n=15+15)
Compiler    5.36s ± 6%        5.35s ± 5%        ~       (p=0.588 n=15+15)
SSA         14.6s ± 3%        15.0s ± 4%        +2.58%  (p=0.000 n=15+15)
Flate       214ms ±12%        216ms ± 8%        ~       (p=0.539 n=15+15)
GoParser    267ms ± 6%        270ms ± 5%        ~       (p=0.569 n=15+15)
Reflect     712ms ± 5%        709ms ± 4%        ~       (p=0.894 n=15+15)
Tar         329ms ± 8%        330ms ± 5%        ~       (p=0.974 n=14+15)
XML         371ms ± 3%        381ms ± 5%        +2.85%  (p=0.002 n=13+15)
[Geo mean]  705ms             709ms             +0.62%

name        old alloc/op      new alloc/op      delta
Template    38.0MB ± 0%       38.4MB ± 0%       +1.27%  (p=0.000 n=15+14)
Unicode     28.8MB ± 0%       28.8MB ± 0%       +0.16%  (p=0.000 n=15+14)
GoTypes     112MB ± 0%        114MB ± 0%        +1.64%  (p=0.000 n=15+15)
Compiler    465MB ± 0%        474MB ± 0%        +1.91%  (p=0.000 n=15+15)
SSA         1.48GB ± 0%       1.53GB ± 0%       +3.32%  (p=0.000 n=15+15)
Flate       24.3MB ± 0%       24.8MB ± 0%       +1.77%  (p=0.000 n=14+15)
GoParser    30.7MB ± 0%       31.1MB ± 0%       +1.27%  (p=0.000 n=15+15)
Reflect     76.3MB ± 0%       77.1MB ± 0%       +1.03%  (p=0.000 n=15+15)
Tar         39.2MB ± 0%       39.6MB ± 0%       +1.02%  (p=0.000 n=13+15)
XML         41.5MB ± 0%       42.1MB ± 0%       +1.45%  (p=0.000 n=15+15)
[Geo mean]  77.5MB            78.7MB            +1.48%

name        old allocs/op     new allocs/op     delta
Template    385k ± 0%         387k ± 0%         +0.54%  (p=0.000 n=15+15)
Unicode     342k ± 0%         343k ± 0%         +0.10%  (p=0.000 n=15+15)
GoTypes     1.19M ± 0%        1.19M ± 0%        +0.64%  (p=0.000 n=14+15)
Compiler    4.51M ± 0%        4.54M ± 0%        +0.53%  (p=0.000 n=15+15)
SSA         12.2M ± 0%        12.4M ± 0%        +1.16%  (p=0.000 n=15+15)
Flate       234k ± 0%         236k ± 0%         +0.63%  (p=0.000 n=14+15)
GoParser    318k ± 0%         320k ± 0%         +0.63%  (p=0.000 n=15+15)
Reflect     974k ± 0%         977k ± 0%         +0.28%  (p=0.000 n=15+15)
Tar         395k ± 0%         397k ± 0%         +0.38%  (p=0.000 n=15+13)
XML         404k ± 0%         407k ± 0%         +0.55%  (p=0.000 n=15+15)
[Geo mean]  794k              799k              +0.55%

name        old text-bytes    new text-bytes    delta
HelloSize   680kB ± 0%        680kB ± 0%        ~       (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize   9.62kB ± 0%       9.62kB ± 0%       ~       (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize   125kB ± 0%        125kB ± 0%        ~       (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize   1.11MB ± 0%       1.12MB ± 0%       +1.11%  (p=0.000 n=15+15)

Change-Id: I95a0173ee28c52be1a4851d2a6e389529e74bf28 Reviewed-on: https://go-review.googlesource.com/95396 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com>
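The second point of this commit, looking up the scope of each Prog by binary search over a sorted markers array, can be sketched as below. The `marker` type, its fields, and `scopeAt` are simplified assumptions for illustration and do not match the compiler's actual data structures.

```go
package main

import (
	"fmt"
	"sort"
)

// marker says that scope becomes the active lexical scope at position pos.
// Markers are assumed to be sorted by pos. (Hypothetical simplified type.)
type marker struct {
	pos   int
	scope int
}

// scopeAt returns the scope that is active at pos: the scope of the last
// marker whose position is <= pos, or 0 (the outermost scope) if there is
// no such marker. Each lookup costs O(log len(markers)).
func scopeAt(markers []marker, pos int) int {
	// sort.Search finds the first marker that starts strictly after pos.
	i := sort.Search(len(markers), func(i int) bool { return markers[i].pos > pos })
	if i == 0 {
		return 0
	}
	return markers[i-1].scope
}

func main() {
	markers := []marker{{pos: 10, scope: 1}, {pos: 20, scope: 2}, {pos: 30, scope: 1}}
	for _, p := range []int{5, 10, 25, 40} {
		fmt.Printf("pos %d -> scope %d\n", p, scopeAt(markers, p))
	}
}
```

With P Progs and M markers this costs O(P log M), which is the reason given above for preferring it to sorting the Prog list when M is much smaller than P.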
-
Tobias Klauser authored
The timeout parameter might be nil, don't dereference it unconditionally. Fixes #24189 Change-Id: I03e6a1ab74fe30322ce6bcfd3d6c42130b6d61be Reviewed-on: https://go-review.googlesource.com/97819 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
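The shape of such a fix is a nil check before the pointer is used. The sketch below uses a hypothetical `timeval` type and a hypothetical `timeoutMillis` helper, with -1 standing for "no timeout"; it is not the code from the change.

```go
package main

import "fmt"

// timeval is a stand-in for a seconds/microseconds timeout structure.
type timeval struct {
	Sec  int64
	Usec int64
}

// timeoutMillis converts an optional timeout to milliseconds. A nil pointer
// means "block without a timeout" and is reported as -1 instead of being
// dereferenced.
func timeoutMillis(tv *timeval) int64 {
	if tv == nil {
		return -1
	}
	return tv.Sec*1000 + tv.Usec/1000
}

func main() {
	fmt.Println(timeoutMillis(nil))
	fmt.Println(timeoutMillis(&timeval{Sec: 1, Usec: 500000}))
}
```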
-
- 01 Mar, 2018 18 commits
-
-
Brad Fitzpatrick authored
This reverts commit 7365fac2. Reason for revert: breaks the build on some architectures, reading unmapped pages? Change-Id: I3a8c02dc0b649269faacea79ecd8213defa97c54 Reviewed-on: https://go-review.googlesource.com/97995 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Heschi Kreinick authored
LLVM tools, particularly lldb and dsymutil, don't support base address selection entries in location lists. When targeting GOOS=darwin, have the linker translate location lists to CU-relative form instead. Technically, this isn't necessary when linking internally, as long as nobody plans to use anything other than Delve to look at the DWARF. But someone might want to use lldb, and it's really confusing when dwarfdump shows gibberish for the location entries. The performance cost isn't noticeable, so enable it even for internal linking. Doing this in the linker is a little weird, but it was more expensive in the compiler, probably because the compiler is much more stressful to the GC. Also, if we decide to only do it for external linking, the compiler can't see the link mode. Benchmark before and after this commit on Mac with -dwarflocationlists=1:

name    old time/op  new time/op  delta
StdCmd  21.3s ± 1%   21.3s ± 1%   ~  (p=0.310 n=27+27)

Only StdCmd is relevant, because only StdCmd runs the linker. Whatever the cost is here, it's not very large. Change-Id: Ic8ef780d0e263230ce6aa3ca3a32fc9abd750b1e Reviewed-on: https://go-review.googlesource.com/97956 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
-
Heschi Kreinick authored
Some SSA values don't translate into any instructions. If a function began with two of them, and both modified the storage of the same variable, we'd end up with a location list entry that started and ended at 0. That looks like an end-of-list entry, which would then confuse downstream tools, particularly the fixup in the linker. "Fix" this by changing the end of such entries to 1. Should be harmless, since AFAIK we don't generate any 1-byte instructions. Later CLs will reduce the frequency of these entries anyway. Change-Id: I9b7e5e69f914244cc826fb9f4a6acfe2dc695f81 Reviewed-on: https://go-review.googlesource.com/97955 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
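The workaround described here can be pictured with a toy location list. The `locEntry` type and `fixZeroWidthAtStart` below are hypothetical names for illustration only; the real compiler code operates on encoded DWARF data rather than a slice of structs.

```go
package main

import "fmt"

// locEntry is a simplified location list entry: a [start, end) pair of
// offsets relative to the start of the function.
type locEntry struct {
	start, end uint64
}

// fixZeroWidthAtStart returns a copy of list in which any entry with
// start == end == 0 has its end bumped to 1, so that a consumer cannot
// mistake it for the (0, 0) end-of-list terminator.
func fixZeroWidthAtStart(list []locEntry) []locEntry {
	out := make([]locEntry, len(list))
	copy(out, list)
	for i := range out {
		if out[i].start == 0 && out[i].end == 0 {
			out[i].end = 1
		}
	}
	return out
}

func main() {
	list := []locEntry{{0, 0}, {0, 16}, {16, 32}}
	fmt.Println(fixZeroWidthAtStart(list))
}
```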
-
Alessandro Arzilli authored
DWARF ranges are half-open. Fixes #23928 Change-Id: I71b3384d1bc2c65bd37ca8a02a0b7ff48fec3688 Reviewed-on: https://go-review.googlesource.com/94816 Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
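"Half-open" means a range covers its low address but not its high address, so a range whose two ends are equal covers nothing. A small sketch, with a hypothetical `pcRange` type that is not taken from the compiler or linker:

```go
package main

import "fmt"

// pcRange is a half-open address range [low, high).
type pcRange struct {
	low, high uint64
}

// contains reports whether pc lies in [low, high): low is included,
// high is excluded.
func (r pcRange) contains(pc uint64) bool {
	return r.low <= pc && pc < r.high
}

// empty reports whether the range covers no addresses at all.
func (r pcRange) empty() bool {
	return r.low >= r.high
}

func main() {
	r := pcRange{low: 0x10, high: 0x20}
	fmt.Println(r.contains(0x10), r.contains(0x1f), r.contains(0x20))
	fmt.Println(pcRange{low: 0x10, high: 0x10}.empty())
}
```

Treating the end as inclusive by mistake leads to an off-by-one in the emitted bounds, which is how empty or overlapping ranges can appear.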
-
Cherry Zhang authored
In RET instruction, the operand is the return jump's target, which should be put in Prog.To. Add an action "buildrundir" to the test driver, which builds (compile+assemble+link) the code in a directory and runs the resulting binary. Fixes #23838. Change-Id: I7ebe7eda49024b40a69a24857322c5ca9c67babb Reviewed-on: https://go-review.googlesource.com/94175 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
-
Balaram Makam authored
Improve runtime memmove_arm64.s specializing for small copies and processing 32 bytes per iteration for 32 bytes or more. Benchmark results of runtime/Memmove on Amberwing:

name                      old time/op    new time/op    delta
Memmove/0                 7.61ns ± 0%    7.20ns ± 0%    ~        (p=0.053 n=5+7)
Memmove/1                 9.28ns ± 0%    8.80ns ± 0%    -5.17%   (p=0.000 n=4+8)
Memmove/2                 9.65ns ± 0%    9.20ns ± 0%    -4.68%   (p=0.000 n=5+8)
Memmove/3                 10.0ns ± 0%    9.2ns ± 0%     -7.83%   (p=0.000 n=5+8)
Memmove/4                 10.6ns ± 0%    9.2ns ± 0%     -13.21%  (p=0.000 n=5+8)
Memmove/5                 11.0ns ± 0%    9.2ns ± 0%     -16.36%  (p=0.000 n=5+8)
Memmove/6                 12.4ns ± 0%    9.2ns ± 0%     -25.81%  (p=0.000 n=5+8)
Memmove/7                 13.1ns ± 0%    9.2ns ± 0%     -29.56%  (p=0.000 n=5+8)
Memmove/8                 9.10ns ± 1%    9.20ns ± 0%    +1.08%   (p=0.002 n=5+8)
Memmove/9                 9.67ns ± 0%    9.20ns ± 0%    -4.88%   (p=0.000 n=5+8)
Memmove/10                10.4ns ± 0%    9.2ns ± 0%     -11.54%  (p=0.000 n=5+8)
Memmove/11                10.9ns ± 0%    9.2ns ± 0%     -15.60%  (p=0.000 n=5+8)
Memmove/12                11.5ns ± 0%    9.2ns ± 0%     -20.00%  (p=0.000 n=5+8)
Memmove/13                12.4ns ± 0%    9.2ns ± 0%     -25.81%  (p=0.000 n=5+8)
Memmove/14                13.1ns ± 0%    9.2ns ± 0%     -29.77%  (p=0.000 n=5+8)
Memmove/15                13.8ns ± 0%    9.2ns ± 0%     -33.33%  (p=0.000 n=5+8)
Memmove/16                9.70ns ± 0%    9.20ns ± 0%    -5.19%   (p=0.000 n=5+8)
Memmove/32                10.6ns ± 0%    9.2ns ± 0%     -13.21%  (p=0.000 n=4+8)
Memmove/64                13.4ns ± 0%    10.2ns ± 0%    -23.88%  (p=0.000 n=4+8)
Memmove/128               18.1ns ± 1%    13.2ns ± 0%    -26.99%  (p=0.000 n=5+8)
Memmove/256               25.2ns ± 0%    16.4ns ± 0%    -34.92%  (p=0.000 n=5+8)
Memmove/512               36.4ns ± 0%    22.8ns ± 0%    -37.36%  (p=0.000 n=5+8)
Memmove/1024              70.1ns ± 0%    36.8ns ±11%    -47.49%  (p=0.002 n=5+8)
Memmove/2048              121ns ± 0%     61ns ± 0%      ~        (p=0.053 n=5+7)
Memmove/4096              224ns ± 0%     120ns ± 0%     -46.43%  (p=0.000 n=5+8)
MemmoveUnalignedDst/0     8.40ns ± 0%    8.00ns ± 0%    -4.76%   (p=0.000 n=5+8)
MemmoveUnalignedDst/1     9.87ns ± 1%    10.00ns ± 0%   ~        (p=0.070 n=5+8)
MemmoveUnalignedDst/2     10.6ns ± 0%    10.4ns ± 0%    -1.89%   (p=0.000 n=5+8)
MemmoveUnalignedDst/3     10.8ns ± 0%    10.4ns ± 0%    -3.70%   (p=0.000 n=5+8)
MemmoveUnalignedDst/4     10.9ns ± 0%    10.3ns ± 0%    ~        (p=0.053 n=5+7)
MemmoveUnalignedDst/5     11.5ns ± 0%    10.3ns ± 1%    -10.22%  (p=0.000 n=4+8)
MemmoveUnalignedDst/6     13.2ns ± 0%    10.4ns ± 1%    -21.50%  (p=0.000 n=5+8)
MemmoveUnalignedDst/7     13.7ns ± 0%    10.3ns ± 1%    -24.64%  (p=0.000 n=4+8)
MemmoveUnalignedDst/8     10.1ns ± 0%    10.4ns ± 0%    +2.97%   (p=0.002 n=5+8)
MemmoveUnalignedDst/9     10.7ns ± 0%    10.4ns ± 0%    -2.80%   (p=0.000 n=5+8)
MemmoveUnalignedDst/10    11.2ns ± 1%    10.4ns ± 0%    -6.81%   (p=0.000 n=5+8)
MemmoveUnalignedDst/11    11.6ns ± 0%    10.4ns ± 0%    -10.34%  (p=0.000 n=5+8)
MemmoveUnalignedDst/12    12.5ns ± 2%    10.4ns ± 0%    -16.53%  (p=0.000 n=5+8)
MemmoveUnalignedDst/13    13.7ns ± 0%    10.4ns ± 0%    -24.09%  (p=0.000 n=5+8)
MemmoveUnalignedDst/14    14.0ns ± 0%    10.4ns ± 0%    -25.71%  (p=0.000 n=5+8)
MemmoveUnalignedDst/15    14.6ns ± 0%    10.4ns ± 0%    -28.77%  (p=0.000 n=5+8)
MemmoveUnalignedDst/16    10.5ns ± 0%    10.4ns ± 0%    -0.95%   (p=0.000 n=5+8)
MemmoveUnalignedDst/32    12.4ns ± 0%    11.6ns ± 0%    -6.05%   (p=0.000 n=5+8)
MemmoveUnalignedDst/64    15.2ns ± 0%    12.3ns ± 0%    -19.08%  (p=0.000 n=5+8)
MemmoveUnalignedDst/128   18.7ns ± 0%    15.2ns ± 0%    -18.72%  (p=0.000 n=5+8)
MemmoveUnalignedDst/256   25.1ns ± 0%    18.6ns ± 0%    -25.90%  (p=0.000 n=5+8)
MemmoveUnalignedDst/512   37.8ns ± 0%    24.4ns ± 0%    -35.45%  (p=0.000 n=5+8)
MemmoveUnalignedDst/1024  74.6ns ± 0%    40.4ns ± 0%    ~        (p=0.053 n=5+7)
MemmoveUnalignedDst/2048  133ns ± 0%     75ns ± 0%      -43.91%  (p=0.000 n=5+8)
MemmoveUnalignedDst/4096  247ns ± 0%     141ns ± 0%     -42.91%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/0     8.40ns ± 0%    8.00ns ± 0%    -4.76%   (p=0.000 n=5+8)
MemmoveUnalignedSrc/1     9.81ns ± 0%    10.00ns ± 0%   +1.98%   (p=0.002 n=5+8)
MemmoveUnalignedSrc/2     10.5ns ± 0%    10.0ns ± 0%    -4.76%   (p=0.000 n=5+8)
MemmoveUnalignedSrc/3     10.7ns ± 1%    10.0ns ± 0%    -6.89%   (p=0.000 n=5+8)
MemmoveUnalignedSrc/4     11.3ns ± 0%    10.0ns ± 0%    -11.50%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/5     11.6ns ± 0%    10.0ns ± 0%    -13.79%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/6     13.6ns ± 0%    10.0ns ± 0%    -26.47%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/7     14.4ns ± 0%    10.0ns ± 0%    -30.75%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/8     9.87ns ± 1%    10.00ns ± 0%   ~        (p=0.070 n=5+8)
MemmoveUnalignedSrc/9     10.4ns ± 0%    10.0ns ± 0%    -3.85%   (p=0.000 n=5+8)
MemmoveUnalignedSrc/10    11.2ns ± 0%    10.0ns ± 0%    -10.71%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/11    11.8ns ± 0%    10.0ns ± 0%    -15.25%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/12    12.1ns ± 0%    10.0ns ± 0%    -17.36%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/13    13.6ns ± 0%    10.0ns ± 0%    -26.47%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/14    14.7ns ± 0%    10.0ns ± 0%    -31.79%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/15    14.4ns ± 0%    10.0ns ± 0%    -30.56%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/16    11.0ns ± 0%    10.0ns ± 0%    -9.09%   (p=0.000 n=5+8)
MemmoveUnalignedSrc/32    11.5ns ± 0%    10.0ns ± 0%    -13.04%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/64    14.9ns ± 0%    11.2ns ± 0%    -24.83%  (p=0.000 n=4+8)
MemmoveUnalignedSrc/128   19.5ns ± 0%    15.2ns ± 0%    -22.05%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/256   27.3ns ± 2%    19.2ns ± 0%    -29.62%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/512   40.4ns ± 0%    27.2ns ± 0%    -32.67%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/1024  75.4ns ± 0%    44.4ns ± 0%    -41.15%  (p=0.000 n=5+8)
MemmoveUnalignedSrc/2048  131ns ± 0%     77ns ± 3%      -41.56%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/4096  248ns ± 0%     145ns ± 0%     -41.53%  (p=0.000 n=5+8)

name                      old speed       new speed       delta
Memmove/1                 108MB/s ± 0%    114MB/s ± 0%    +5.37%   (p=0.004 n=4+8)
Memmove/2                 207MB/s ± 0%    217MB/s ± 0%    +4.85%   (p=0.002 n=5+8)
Memmove/3                 301MB/s ± 0%    326MB/s ± 0%    +8.45%   (p=0.002 n=5+8)
Memmove/4                 377MB/s ± 0%    435MB/s ± 0%    +15.31%  (p=0.004 n=4+8)
Memmove/5                 455MB/s ± 0%    543MB/s ± 0%    +19.46%  (p=0.002 n=5+8)
Memmove/6                 483MB/s ± 0%    652MB/s ± 0%    +34.88%  (p=0.003 n=5+7)
Memmove/7                 537MB/s ± 0%    761MB/s ± 0%    +41.71%  (p=0.002 n=5+8)
Memmove/8                 879MB/s ± 1%    869MB/s ± 0%    -1.15%   (p=0.000 n=5+7)
Memmove/9                 931MB/s ± 0%    978MB/s ± 0%    +5.05%   (p=0.002 n=5+8)
Memmove/10                960MB/s ± 0%    1086MB/s ± 0%   +13.13%  (p=0.002 n=5+8)
Memmove/11                1.00GB/s ± 0%   1.20GB/s ± 0%   +18.92%  (p=0.003 n=5+7)
Memmove/12                1.04GB/s ± 0%   1.30GB/s ± 0%   +25.40%  (p=0.002 n=5+8)
Memmove/13                1.05GB/s ± 0%   1.41GB/s ± 0%   +34.87%  (p=0.002 n=5+8)
Memmove/14                1.07GB/s ± 0%   1.52GB/s ± 0%   +42.14%  (p=0.002 n=5+8)
Memmove/15                1.09GB/s ± 0%   1.63GB/s ± 0%   +49.91%  (p=0.002 n=5+8)
Memmove/16                1.65GB/s ± 0%   1.74GB/s ± 0%   +5.40%   (p=0.003 n=5+7)
Memmove/32                3.01GB/s ± 0%   3.48GB/s ± 0%   +15.58%  (p=0.003 n=5+7)
Memmove/64                4.76GB/s ± 0%   6.27GB/s ± 0%   +31.75%  (p=0.003 n=5+7)
Memmove/128               7.08GB/s ± 1%   9.69GB/s ± 0%   +36.96%  (p=0.002 n=5+8)
Memmove/256               10.2GB/s ± 0%   15.6GB/s ± 0%   +53.58%  (p=0.002 n=5+8)
Memmove/512               14.1GB/s ± 0%   22.4GB/s ± 0%   +59.57%  (p=0.003 n=5+7)
Memmove/1024              14.6GB/s ± 0%   27.9GB/s ±10%   +91.00%  (p=0.002 n=5+8)
Memmove/2048              16.9GB/s ± 0%   33.4GB/s ± 0%   +98.32%  (p=0.003 n=5+7)
Memmove/4096              18.3GB/s ± 0%   33.9GB/s ± 0%   +85.80%  (p=0.002 n=5+8)
MemmoveUnalignedDst/1     101MB/s ± 1%    100MB/s ± 0%    ~        (p=0.586 n=5+8)
MemmoveUnalignedDst/2     189MB/s ± 0%    192MB/s ± 0%    +1.82%   (p=0.002 n=5+8)
MemmoveUnalignedDst/3     278MB/s ± 0%    288MB/s ± 0%    +3.88%   (p=0.003 n=5+7)
MemmoveUnalignedDst/4     368MB/s ± 0%    387MB/s ± 0%    +5.41%   (p=0.003 n=5+7)
MemmoveUnalignedDst/5     434MB/s ± 0%    484MB/s ± 0%    +11.52%  (p=0.002 n=5+8)
MemmoveUnalignedDst/6     454MB/s ± 0%    580MB/s ± 0%    +27.62%  (p=0.002 n=5+8)
MemmoveUnalignedDst/7     509MB/s ± 0%    677MB/s ± 0%    +33.01%  (p=0.002 n=5+8)
MemmoveUnalignedDst/8     792MB/s ± 0%    770MB/s ± 0%    -2.77%   (p=0.002 n=5+8)
MemmoveUnalignedDst/9     841MB/s ± 0%    866MB/s ± 0%    +2.92%   (p=0.002 n=5+8)
MemmoveUnalignedDst/10    896MB/s ± 0%    962MB/s ± 0%    +7.35%   (p=0.003 n=5+7)
MemmoveUnalignedDst/11    947MB/s ± 0%    1058MB/s ± 0%   +11.80%  (p=0.002 n=5+8)
MemmoveUnalignedDst/12    962MB/s ± 2%    1154MB/s ± 0%   +19.97%  (p=0.002 n=5+8)
MemmoveUnalignedDst/13    947MB/s ± 0%    1251MB/s ± 0%   +32.08%  (p=0.002 n=5+8)
MemmoveUnalignedDst/14    1.00GB/s ± 0%   1.35GB/s ± 0%   +34.55%  (p=0.002 n=5+8)
MemmoveUnalignedDst/15    1.03GB/s ± 0%   1.44GB/s ± 0%   +40.50%  (p=0.002 n=5+8)
MemmoveUnalignedDst/16    1.53GB/s ± 0%   1.54GB/s ± 0%   +0.77%   (p=0.002 n=5+8)
MemmoveUnalignedDst/32    2.58GB/s ± 0%   2.75GB/s ± 0%   +6.52%   (p=0.003 n=5+7)
MemmoveUnalignedDst/64    4.21GB/s ± 0%   5.19GB/s ± 0%   +23.40%  (p=0.004 n=5+6)
MemmoveUnalignedDst/128   6.86GB/s ± 0%   8.42GB/s ± 0%   +22.78%  (p=0.003 n=5+7)
MemmoveUnalignedDst/256   10.2GB/s ± 0%   13.8GB/s ± 0%   +35.15%  (p=0.002 n=5+8)
MemmoveUnalignedDst/512   13.5GB/s ± 0%   21.0GB/s ± 0%   +54.90%  (p=0.002 n=5+8)
MemmoveUnalignedDst/1024  13.7GB/s ± 0%   25.3GB/s ± 0%   +84.61%  (p=0.003 n=5+7)
MemmoveUnalignedDst/2048  15.3GB/s ± 0%   27.5GB/s ± 0%   +79.52%  (p=0.002 n=5+8)
MemmoveUnalignedDst/4096  16.5GB/s ± 0%   28.9GB/s ± 0%   +74.74%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/1     102MB/s ± 0%    100MB/s ± 0%    -2.02%   (p=0.000 n=5+7)
MemmoveUnalignedSrc/2     191MB/s ± 0%    200MB/s ± 0%    +4.78%   (p=0.002 n=5+8)
MemmoveUnalignedSrc/3     279MB/s ± 0%    300MB/s ± 0%    +7.45%   (p=0.002 n=5+8)
MemmoveUnalignedSrc/4     354MB/s ± 0%    400MB/s ± 0%    +13.10%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/5     431MB/s ± 0%    500MB/s ± 0%    +16.02%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/6     441MB/s ± 0%    600MB/s ± 0%    +36.03%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/7     485MB/s ± 0%    700MB/s ± 0%    +44.29%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/8     811MB/s ± 1%    800MB/s ± 0%    -1.36%   (p=0.016 n=5+8)
MemmoveUnalignedSrc/9     864MB/s ± 0%    900MB/s ± 0%    +4.07%   (p=0.002 n=5+8)
MemmoveUnalignedSrc/10    893MB/s ± 0%    999MB/s ± 0%    +11.97%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/11    932MB/s ± 0%    1099MB/s ± 0%   +18.01%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/12    988MB/s ± 0%    1199MB/s ± 0%   +21.35%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/13    955MB/s ± 0%    1299MB/s ± 0%   +36.02%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/14    955MB/s ± 0%    1399MB/s ± 0%   +46.52%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/15    1.04GB/s ± 0%   1.50GB/s ± 0%   +44.18%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/16    1.45GB/s ± 0%   1.60GB/s ± 0%   +10.14%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/32    2.78GB/s ± 0%   3.20GB/s ± 0%   +15.16%  (p=0.003 n=5+7)
MemmoveUnalignedSrc/64    4.30GB/s ± 0%   5.72GB/s ± 0%   +32.90%  (p=0.003 n=5+7)
MemmoveUnalignedSrc/128   6.57GB/s ± 0%   8.42GB/s ± 0%   +28.06%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/256   9.39GB/s ± 1%   13.33GB/s ± 0%  +41.96%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/512   12.7GB/s ± 0%   18.8GB/s ± 0%   +48.53%  (p=0.003 n=5+7)
MemmoveUnalignedSrc/1024  13.6GB/s ± 0%   23.0GB/s ± 0%   +69.82%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/2048  15.6GB/s ± 0%   26.8GB/s ± 3%   +71.37%  (p=0.002 n=5+8)
MemmoveUnalignedSrc/4096  16.5GB/s ± 0%   28.2GB/s ± 0%   +71.40%  (p=0.002 n=5+8)

Fixes #22925 Change-Id: I38c1a9ad5c6e3f4f95fc521c4b7e3140b58b4737 Reviewed-on: https://go-review.googlesource.com/83799 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
Josh Bleecher Snyder authored
bytes.IndexByte is heavily optimized. Use it in findnull.

name        old time/op  new time/op  delta
GoString-8  65.5ns ± 1%  40.2ns ± 1%  -38.62%  (p=0.000 n=19+19)

findnull is also used in gostringnocopy, which is used in many hot spots in the runtime. Fixes #23830 Change-Id: I2e6cb279c7d8078f8844065de684cc3567fe89d7 Reviewed-on: https://go-review.googlesource.com/97523 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
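In user-level Go, the same idea of finding the end of a NUL-terminated string with bytes.IndexByte instead of a byte-by-byte loop looks roughly like this. The runtime's findnull works on a raw *byte rather than a slice, so `cStringLen` below is only an analogous sketch with an assumed name.

```go
package main

import (
	"bytes"
	"fmt"
)

// cStringLen returns the number of bytes before the first NUL in buf,
// or len(buf) if buf contains no NUL byte.
func cStringLen(buf []byte) int {
	if i := bytes.IndexByte(buf, 0); i >= 0 {
		return i
	}
	return len(buf)
}

func main() {
	buf := []byte("hello\x00world")
	n := cStringLen(buf)
	fmt.Println(n, string(buf[:n]))
}
```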
-
Chad Rosier authored
This optimization mirrors that which is already implemented for AMD64. The optimization specifically targets the binary.BigEndian.PutUint* functions.

encoding/binary results on Amberwing:

name                   old time/op    new time/op    delta
ReadSlice1000Int32s    9.83µs ± 2%    9.78µs ± 1%     ~      (p=0.362 n=9+10)
ReadStruct             5.24µs ± 3%    5.19µs ± 2%     ~      (p=0.285 n=10+10)
ReadInts               8.35µs ± 8%    8.44µs ± 3%     ~      (p=0.323 n=10+10)
WriteInts              3.38µs ± 3%    3.44µs ±15%     ~      (p=0.921 n=9+10)
WriteSlice1000Int32s   11.4µs ± 6%    10.2µs ± 4%   -9.94%   (p=0.000 n=10+10)
PutUint16               510ns ±12%     500ns ± 0%     ~      (p=0.586 n=10+7)
PutUint32               530ns ±15%     490ns ±12%     ~      (p=0.086 n=10+10)
PutUint64               550ns ± 0%     470ns ± 6%  -14.52%   (p=0.000 n=7+10)
LittleEndianPutUint16   500ns ± 0%     475ns ±16%     ~      (p=0.120 n=7+10)
LittleEndianPutUint32   450ns ± 0%     517ns ±16%  +14.81%   (p=0.004 n=8+9)
LittleEndianPutUint64   550ns ± 0%     485ns ±13%  -11.82%   (p=0.000 n=8+10)
PutUvarint32            685ns ±12%     622ns ± 4%   -9.17%   (p=0.005 n=10+9)
PutUvarint64            735ns ± 9%     711ns ± 9%     ~      (p=0.272 n=10+9)
[Geo mean]             1.47µs         1.42µs        -3.87%

name                   old speed      new speed      delta
ReadSlice1000Int32s     407MB/s ± 2%   409MB/s ± 1%     ~      (p=0.362 n=9+10)
ReadStruct             14.3MB/s ± 3%  14.4MB/s ± 2%     ~      (p=0.250 n=10+10)
ReadInts               3.59MB/s ± 7%  3.56MB/s ± 4%     ~      (p=0.340 n=10+10)
WriteInts              8.87MB/s ± 3%  8.74MB/s ±13%     ~      (p=0.890 n=9+10)
WriteSlice1000Int32s    352MB/s ± 6%   391MB/s ± 4%  +11.03%   (p=0.000 n=10+10)
PutUint16              3.95MB/s ±13%  4.00MB/s ± 0%     ~      (p=0.312 n=10+7)
PutUint32              7.62MB/s ±17%  8.21MB/s ±11%     ~      (p=0.086 n=10+10)
PutUint64              14.6MB/s ± 0%  17.1MB/s ± 6%  +17.28%   (p=0.000 n=7+10)
LittleEndianPutUint16  4.00MB/s ± 0%  4.23MB/s ±18%     ~      (p=0.176 n=7+10)
LittleEndianPutUint32  8.89MB/s ± 0%  7.64MB/s ±20%  -14.05%   (p=0.001 n=8+10)
LittleEndianPutUint64  14.6MB/s ± 0%  16.6MB/s ±12%  +13.86%   (p=0.000 n=8+10)
PutUvarint32           5.86MB/s ±14%  6.44MB/s ± 5%   +9.84%   (p=0.006 n=10+9)
PutUvarint64           10.9MB/s ± 8%  11.3MB/s ± 9%     ~      (p=0.373 n=10+9)
[Geo mean]             14.2MB/s       14.8MB/s        +3.93%

go1 results on Amberwing:

name                   old time/op    new time/op    delta
RegexpMatchEasy0_32     254ns ± 0%     254ns ± 0%     ~      (all equal)
RegexpMatchEasy0_1K     547ns ± 0%     547ns ± 0%     ~      (all equal)
RegexpMatchEasy1_32     252ns ± 0%     253ns ± 1%     ~      (p=0.294 n=8+10)
RegexpMatchEasy1_1K     782ns ± 0%     783ns ± 1%     ~      (p=0.529 n=8+9)
RegexpMatchMedium_32    316ns ± 0%     316ns ± 0%     ~      (all equal)
RegexpMatchMedium_1K   51.5µs ± 0%    51.5µs ± 0%     ~      (p=0.645 n=10+9)
RegexpMatchHard_32     2.75µs ± 0%    2.75µs ± 0%     ~      (all equal)
RegexpMatchHard_1K     78.7µs ± 0%    78.7µs ± 0%     ~      (p=0.754 n=10+10)
FmtFprintfEmpty        57.0ns ± 0%    57.0ns ± 0%     ~      (all equal)
FmtFprintfString        111ns ± 0%     111ns ± 0%     ~      (all equal)
FmtFprintfInt           114ns ± 0%     114ns ± 1%     ~      (p=0.065 n=9+10)
FmtFprintfIntInt        182ns ± 0%     178ns ± 0%   -2.20%   (p=0.000 n=10+10)
FmtFprintfPrefixedInt   225ns ± 0%     227ns ± 0%   +0.89%   (p=0.000 n=10+10)
FmtFprintfFloat         307ns ± 0%     307ns ± 0%     ~      (p=1.000 n=9+9)
FmtManyArgs             697ns ± 0%     701ns ± 2%     ~      (p=0.108 n=9+10)
Gzip                    436ms ± 0%     437ms ± 0%   +0.23%   (p=0.000 n=10+8)
HTTPClientServer       88.8µs ± 2%    89.6µs ± 1%   +0.98%   (p=0.019 n=10+10)
JSONEncode             20.1ms ± 1%    20.2ms ± 1%   +0.48%   (p=0.007 n=10+10)
JSONDecode             94.7ms ± 1%    94.1ms ± 0%   -0.62%   (p=0.000 n=10+9)
GobDecode              12.6ms ± 2%    12.6ms ± 1%     ~      (p=0.360 n=10+8)
GobEncode              12.0ms ± 1%    11.9ms ± 1%   -1.34%   (p=0.000 n=10+10)
Mandelbrot200          5.05ms ± 0%    5.05ms ± 0%   +0.12%   (p=0.000 n=10+10)
TimeParse               448ns ± 0%     448ns ± 0%     ~      (p=0.529 n=8+9)
TimeFormat              501ns ± 1%     501ns ± 1%     ~      (p=1.000 n=10+9)
Template               90.6ms ± 0%    89.1ms ± 0%   -1.67%   (p=0.000 n=9+9)
GoParse                6.01ms ± 0%    5.96ms ± 0%   -0.83%   (p=0.000 n=10+9)
BinaryTree17            11.7s ± 0%     11.7s ± 0%     ~      (p=0.481 n=10+10)
Revcomp                 675ms ± 0%     675ms ± 0%     ~      (p=0.436 n=9+9)
Fannkuch11              3.26s ± 0%     3.27s ± 1%   +0.57%   (p=0.000 n=10+10)
[Geo mean]             67.4µs         67.3µs        -0.10%

name                   old speed      new speed      delta
RegexpMatchEasy0_32     126MB/s ± 0%   126MB/s ± 0%     ~      (p=0.353 n=10+7)
RegexpMatchEasy0_1K    1.87GB/s ± 0%  1.87GB/s ± 0%     ~      (p=0.275 n=8+10)
RegexpMatchEasy1_32     127MB/s ± 0%   126MB/s ± 1%     ~      (p=0.110 n=8+10)
RegexpMatchEasy1_1K    1.31GB/s ± 0%  1.31GB/s ± 1%     ~      (p=0.079 n=8+10)
RegexpMatchMedium_32   3.16MB/s ± 0%  3.16MB/s ± 0%     ~      (all equal)
RegexpMatchMedium_1K   19.9MB/s ± 0%  19.9MB/s ± 0%     ~      (p=0.889 n=10+9)
RegexpMatchHard_32     11.7MB/s ± 0%  11.7MB/s ± 0%     ~      (all equal)
RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%     ~      (p=1.000 n=10+10)
Gzip                   44.5MB/s ± 0%  44.4MB/s ± 0%   -0.22%   (p=0.000 n=10+8)
JSONEncode             96.6MB/s ± 1%  96.1MB/s ± 1%   -0.48%   (p=0.007 n=10+10)
JSONDecode             20.5MB/s ± 1%  20.6MB/s ± 0%   +0.63%   (p=0.000 n=10+9)
GobDecode              61.0MB/s ± 2%  61.1MB/s ± 1%     ~      (p=0.372 n=10+8)
GobEncode              63.8MB/s ± 1%  64.7MB/s ± 1%   +1.36%   (p=0.000 n=10+10)
Template               21.4MB/s ± 0%  21.8MB/s ± 0%   +1.69%   (p=0.000 n=9+9)
GoParse                9.63MB/s ± 0%  9.71MB/s ± 0%   +0.84%   (p=0.000 n=9+8)
Revcomp                 377MB/s ± 0%   376MB/s ± 0%     ~      (p=0.399 n=9+9)
[Geo mean]             56.2MB/s       56.3MB/s        +0.20%

Change-Id: Ic915373f5ef512f9fbc45745860e5db7f6de6286
Reviewed-on: https://go-review.googlesource.com/97755
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
Ilya Tocar authored
Replace BYTE.. encodings with asm. This is possible because asm now implements more instructions and no longer transforms MOV $0, reg into XOR reg, reg.

Change-Id: I011749ab6b3f64403ab6e746f3760c5841548b57
Reviewed-on: https://go-review.googlesource.com/97936
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Pascal S. de Kloe authored
Eliminates the need for an extra scanner, read undo and some other tricks.

name                    old time/op    new time/op    delta
CodeEncoder-12          1.92ms ± 0%    1.91ms ± 1%   -0.65%  (p=0.000 n=17+20)
CodeMarshal-12          2.13ms ± 2%    2.12ms ± 1%   -0.49%  (p=0.038 n=18+17)
CodeDecoder-12          8.55ms ± 2%    8.49ms ± 1%     ~     (p=0.119 n=20+18)
UnicodeDecoder-12        411ns ± 0%     422ns ± 0%   +2.77%  (p=0.000 n=19+15)
DecoderStream-12         320ns ± 1%     307ns ± 1%   -3.80%  (p=0.000 n=18+20)
CodeUnmarshal-12        9.65ms ± 3%    9.58ms ± 3%     ~     (p=0.157 n=20+20)
CodeUnmarshalReuse-12   8.54ms ± 3%    8.56ms ± 2%     ~     (p=0.602 n=20+20)
UnmarshalString-12       110ns ± 1%      87ns ± 2%  -21.53%  (p=0.000 n=16+20)
UnmarshalFloat64-12      101ns ± 1%      77ns ± 2%  -23.08%  (p=0.000 n=19+20)
UnmarshalInt64-12       94.5ns ± 2%    68.4ns ± 1%  -27.60%  (p=0.000 n=20+20)
Issue10335-12            128ns ± 1%     100ns ± 1%  -21.89%  (p=0.000 n=19+18)
Unmapped-12              427ns ± 3%     247ns ± 4%  -42.17%  (p=0.000 n=20+20)
NumberIsValid-12        23.0ns ± 0%    21.7ns ± 0%   -5.73%  (p=0.000 n=20+20)
NumberIsValidRegexp-12   641ns ± 0%     642ns ± 0%   +0.15%  (p=0.003 n=19+19)
EncoderEncode-12        56.9ns ± 0%    55.0ns ± 1%   -3.32%  (p=0.012 n=2+17)

name                    old speed      new speed      delta
CodeEncoder-12          1.01GB/s ± 1%  1.02GB/s ± 1%   +0.71%  (p=0.000 n=18+20)
CodeMarshal-12           913MB/s ± 2%   917MB/s ± 1%   +0.49%  (p=0.038 n=18+17)
CodeDecoder-12           227MB/s ± 2%   229MB/s ± 1%     ~     (p=0.110 n=20+18)
UnicodeDecoder-12       34.1MB/s ± 0%  33.1MB/s ± 0%   -2.73%  (p=0.000 n=19+19)
CodeUnmarshal-12         201MB/s ± 3%   203MB/s ± 3%     ~     (p=0.151 n=20+20)

name                    old alloc/op   new alloc/op   delta
Issue10335-12             320B ± 0%      184B ± 0%  -42.50%  (p=0.000 n=20+20)
Unmapped-12               568B ± 0%      216B ± 0%  -61.97%  (p=0.000 n=20+20)
EncoderEncode-12         0.00B          0.00B          ~     (all equal)

name                    old allocs/op  new allocs/op  delta
Issue10335-12             4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.000 n=20+20)
Unmapped-12               18.0 ± 0%       4.0 ± 0%  -77.78%  (p=0.000 n=20+20)
EncoderEncode-12          0.00           0.00          ~     (all equal)

Fixes #17914
Updates #20693
Updates #10335

Change-Id: I0459a52febb8b79c9a2991e69ed2614cf8740429
Reviewed-on: https://go-review.googlesource.com/47152
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Ilya Tocar authored
useSSE41 was used inside the asm implementation of floor to select between the base and SSE4 code paths. We intrinsified floor and left the asm functions as a backup for non-SSE4 systems. This made the variable unused, so remove it.

Change-Id: Ia2633de7c7cb1ef1d5b15a2366b523e481b722d9
Reviewed-on: https://go-review.googlesource.com/97935
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Hana Kim authored
Change-Id: I030baaa0a0abf1e43449faaf676d389a28a868a3
Reviewed-on: https://go-review.googlesource.com/97857
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Peter Weinberger <pjw@google.com>
-
Giovanni Bajo authored
Change-Id: I2b507e35cc314100eaf2ec2d1e5107cc2fc9e7cf
Reviewed-on: https://go-review.googlesource.com/97818
Reviewed-by: Keith Randall <khr@golang.org>
-
Giovanni Bajo authored
This avoids simple bugs like "ADD" matching "FADD". Obviously "ADD" will still match "ADDQ", so some care is still required in this regard, but at least a first class of possible errors is taken care of.

Change-Id: I7deb04c31de30bedac9c026d9889ace4a1d2adcb
Reviewed-on: https://go-review.googlesource.com/97817
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
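One way to get the behavior described is to anchor each pattern at the start of the opcode. Whether the test runner does exactly this is an assumption; matchesOpcode below is a hypothetical helper that only illustrates the difference between anchored and unanchored matching.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesOpcode reports whether instr starts with the given pattern.
// Anchoring with ^ is an assumption about how the check could be made
// stricter; the real test runner may do it differently.
func matchesOpcode(pattern, instr string) bool {
	re := regexp.MustCompile("^" + pattern)
	return re.MatchString(instr)
}

func main() {
	unanchored := regexp.MustCompile("ADD")
	for _, instr := range []string{"ADD\tR1, R2", "FADD\tF0, F1", "ADDQ\t$1, AX"} {
		fmt.Printf("%q unanchored=%v anchored=%v\n",
			instr, unanchored.MatchString(instr), matchesOpcode("ADD", instr))
	}
}
```

With the anchor, "ADD" no longer matches "FADD" but still matches "ADDQ", which is the remaining caveat the message mentions.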
-
Giovanni Bajo authored
asmcheck comments now support a compact form for specifying multiple checks per platform, using the following syntax:

    amd64:"SHL\t[$]4","SHR\t[$]4"

Negative checks are also parsed, using the following syntax:

    amd64:-"ROR"

though they do not work yet.

Moreover, out-of-line comments have been implemented. This allows specifying asmchecks on comment-only lines; they are matched against the first subsequent non-comment, non-empty line.

    // amd64:"XOR"
    // arm:"EOR"
    x ^= 1

Change-Id: I110c7462fc6a5c70fd4af0d42f516016ae7f2760
Reviewed-on: https://go-review.googlesource.com/97816
Reviewed-by: Keith Randall <khr@golang.org>
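To show how the two comment forms might sit in a source file, here is a small sketch. The function names are hypothetical, and the patterns are copied from the message above rather than verified against real compiler output for these functions.

```go
package main

import "fmt"

// toggle flips the low bit of x. The two comment-only lines use the
// out-of-line form: they apply to the next non-comment, non-empty line.
func toggle(x int) int {
	// amd64:"XOR"
	// arm:"EOR"
	x ^= 1
	return x
}

// mixNibble uses the compact form: several patterns for one platform in
// a single trailing comment. The patterns are illustrative only.
func mixNibble(x uint64) uint64 {
	return x<<4 ^ x>>4 // amd64:"SHL\t[$]4","SHR\t[$]4"
}

func main() {
	fmt.Println(toggle(2), mixNibble(0x10))
}
```

To the Go compiler these are ordinary comments; only the test runner gives them meaning, so the file builds and runs normally.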
-
Josh Bleecher Snyder authored
Use staticbytes instead. Instrumenting make.bash shows approx 0.5% of all slicebytetostrings have a buffer of length 1.

name                     old time/op  new time/op  delta
SliceByteToString/1-8    14.1ns ± 1%   4.1ns ± 1%  -71.13%  (p=0.000 n=17+20)
SliceByteToString/2-8    15.5ns ± 2%  15.5ns ± 1%     ~     (p=0.061 n=20+18)
SliceByteToString/4-8    14.9ns ± 1%  15.0ns ± 2%   +1.25%  (p=0.000 n=20+20)
SliceByteToString/8-8    17.1ns ± 1%  17.5ns ± 1%   +2.16%  (p=0.000 n=19+19)
SliceByteToString/16-8   23.6ns ± 1%  23.9ns ± 1%   +1.41%  (p=0.000 n=20+18)
SliceByteToString/32-8   26.0ns ± 1%  25.8ns ± 0%   -1.05%  (p=0.000 n=19+16)
SliceByteToString/64-8   30.0ns ± 0%  30.2ns ± 0%   +0.56%  (p=0.000 n=16+18)
SliceByteToString/128-8  38.9ns ± 0%  39.0ns ± 0%   +0.23%  (p=0.019 n=19+15)

Fixes #24172

Change-Id: I3dfa14eefbf9fb4387114e20c9cb40e186abe962
Reviewed-on: https://go-review.googlesource.com/97717
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
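The same trick can be approximated outside the runtime with a preallocated table of one-byte strings. This is a sketch under that assumption; oneByteStrings and bytesToString are hypothetical names, and the runtime's staticbytes table is laid out differently.

```go
package main

import "fmt"

// oneByteStrings holds a preallocated one-character string for every
// byte value. It plays the role the commit assigns to the runtime's
// staticbytes table; the name and layout here are hypothetical.
var oneByteStrings [256]string

func init() {
	for i := range oneByteStrings {
		oneByteStrings[i] = string([]byte{byte(i)})
	}
}

// bytesToString converts b to a string, reusing a preallocated string
// when b holds exactly one byte and copying otherwise.
func bytesToString(b []byte) string {
	if len(b) == 1 {
		return oneByteStrings[b[0]]
	}
	return string(b)
}

func main() {
	fmt.Println(bytesToString([]byte("x")), bytesToString([]byte("xyz"))) // prints "x xyz"
}
```

Only the length-1 case takes the fast path, which matches the table above: the /1 benchmark improves sharply and the other sizes stay within noise.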
-
Josh Bleecher Snyder authored
When the slice/string length is very large, probably artificially large as in CL 97523, adding BX (length) to R11 (pointer) overflows. As a result, checking DI < R11 yields the wrong result. Since they will be equal when the loop is done, just check DI != R11 instead. Yes, the pointer itself could overflow, but if that happens, something else has gone pretty wrong; not our concern here.

Fixes #24187

Change-Id: I2f60fc6ccae739345d01bc80528560726ad4f8c6
Reviewed-on: https://go-review.googlesource.com/97802
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
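The wraparound can be reproduced with plain uintptr arithmetic. The sketch below uses made-up addresses and a hypothetical helper, loopConditions, to compare the two loop tests when base + length overflows.

```go
package main

import "fmt"

// loopConditions computes end = base + length with wrapping arithmetic
// and reports what each loop test says at the very first iteration,
// when the cursor still equals base.
func loopConditions(base, length uintptr) (lessThan, notEqual bool) {
	end := base + length // wraps around when length is huge
	cur := base
	return cur < end, cur != end
}

func main() {
	base := uintptr(0x1000)     // illustrative address
	huge := ^uintptr(0) - 0x800 // artificially large length
	lt, ne := loopConditions(base, huge)
	fmt.Println("cur < end:", lt, "cur != end:", ne)
}
```

With these values the sum wraps to 0x7ff, so the less-than test is false and the loop would exit before scanning anything, while the inequality test is true and lets the scan proceed.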
-
Chad Rosier authored
This optimization mirrors that which is already implemented for AMD64. The optimization specifically targets the binary.LittleEndian.PutUint* functions.

encoding/binary results on Amberwing:

name                   old time/op    new time/op    delta
ReadSlice1000Int32s    9.67µs ± 1%    9.64µs ± 1%     ~      (p=0.185 n=9+9)
ReadStruct             5.24µs ± 2%    5.36µs ± 2%   +2.24%   (p=0.002 n=10+8)
ReadInts               8.69µs ± 5%    8.88µs ± 5%     ~      (p=0.083 n=10+10)
WriteInts              3.90µs ±10%    3.71µs ± 9%     ~      (p=0.077 n=10+10)
WriteSlice1000Int32s   10.9µs ± 1%    10.9µs ± 1%     ~      (p=0.701 n=9+9)
PutUint16               572ns ±14%     505ns ±11%  -11.75%   (p=0.006 n=9+10)
PutUint32               550ns ±18%     540ns ±11%     ~      (p=0.692 n=10+10)
PutUint64               565ns ±15%     540ns ±17%     ~      (p=0.248 n=10+10)
LittleEndianPutUint16   540ns ±11%     500ns ±10%     ~      (p=0.094 n=10+10)
LittleEndianPutUint32   520ns ±15%     480ns ±15%     ~      (p=0.087 n=10+10)
LittleEndianPutUint64   505ns ±29%     470ns ±17%     ~      (p=0.208 n=10+10)
PutUvarint32            700ns ±21%     635ns ±10%   -9.29%   (p=0.028 n=10+10)
PutUvarint64            740ns ± 8%     740ns ± 8%     ~      (p=0.713 n=10+10)
[Geo mean]             1.53µs         1.47µs        -3.93%

name                   old speed      new speed      delta
ReadSlice1000Int32s     414MB/s ± 1%   415MB/s ± 1%     ~      (p=0.185 n=9+9)
ReadStruct             14.3MB/s ± 2%  14.0MB/s ± 2%   -2.21%   (p=0.000 n=10+8)
ReadInts               3.45MB/s ± 4%  3.38MB/s ± 6%     ~      (p=0.085 n=10+10)
WriteInts              7.71MB/s ± 9%  8.09MB/s ± 8%   +4.93%   (p=0.048 n=10+10)
WriteSlice1000Int32s    367MB/s ± 1%   366MB/s ± 1%     ~      (p=0.701 n=9+9)
PutUint16              3.51MB/s ±14%  3.99MB/s ±11%  +13.47%   (p=0.009 n=9+10)
PutUint32              7.35MB/s ±21%  7.44MB/s ±10%     ~      (p=0.692 n=10+10)
PutUint64              14.3MB/s ±14%  15.0MB/s ±19%     ~      (p=0.248 n=10+10)
LittleEndianPutUint16  3.72MB/s ±11%  4.03MB/s ±10%     ~      (p=0.094 n=10+10)
LittleEndianPutUint32  7.75MB/s ±15%  8.39MB/s ±13%     ~      (p=0.087 n=10+10)
LittleEndianPutUint64  16.1MB/s ±23%  17.2MB/s ±16%     ~      (p=0.208 n=10+10)
PutUvarint32           5.76MB/s ±18%  6.32MB/s ±10%   +9.72%   (p=0.028 n=10+10)
PutUvarint64           10.8MB/s ± 8%  10.8MB/s ± 8%     ~      (p=0.713 n=10+10)
[Geo mean]             13.7MB/s       14.3MB/s        +4.02%

go1 results on Amberwing:

name                   old time/op    new time/op    delta
RegexpMatchEasy0_32     249ns ± 0%     249ns ± 0%     ~      (p=0.087 n=10+10)
RegexpMatchEasy0_1K     584ns ± 0%     584ns ± 0%     ~      (all equal)
RegexpMatchEasy1_32     246ns ± 0%     246ns ± 0%     ~      (p=1.000 n=10+10)
RegexpMatchEasy1_1K     806ns ± 0%     806ns ± 0%     ~      (p=0.706 n=10+9)
RegexpMatchMedium_32    314ns ± 0%     314ns ± 0%     ~      (all equal)
RegexpMatchMedium_1K   52.1µs ± 0%    52.1µs ± 0%     ~      (p=0.245 n=10+8)
RegexpMatchHard_32     2.75µs ± 1%    2.75µs ± 1%     ~      (p=0.690 n=10+10)
RegexpMatchHard_1K     78.9µs ± 0%    78.9µs ± 1%     ~      (p=0.295 n=9+9)
FmtFprintfEmpty        58.5ns ± 0%    58.5ns ± 0%     ~      (all equal)
FmtFprintfString        112ns ± 0%     112ns ± 0%     ~      (all equal)
FmtFprintfInt           117ns ± 0%     116ns ± 0%   -0.85%   (p=0.000 n=10+10)
FmtFprintfIntInt        181ns ± 0%     181ns ± 0%     ~      (all equal)
FmtFprintfPrefixedInt   222ns ± 0%     224ns ± 0%   +0.90%   (p=0.000 n=9+10)
FmtFprintfFloat         318ns ± 1%     322ns ± 0%     ~      (p=0.059 n=10+8)
FmtManyArgs             736ns ± 1%     735ns ± 0%     ~      (p=0.206 n=9+9)
Gzip                    437ms ± 0%     436ms ± 0%   -0.25%   (p=0.000 n=10+10)
HTTPClientServer       89.8µs ± 1%    90.2µs ± 2%     ~      (p=0.393 n=10+10)
JSONEncode             20.1ms ± 1%    20.2ms ± 1%     ~      (p=0.065 n=9+10)
JSONDecode             94.2ms ± 1%    93.9ms ± 1%   -0.42%   (p=0.043 n=10+10)
GobDecode              12.7ms ± 1%    12.8ms ± 2%   +0.94%   (p=0.019 n=10+10)
GobEncode              12.1ms ± 0%    12.1ms ± 0%     ~      (p=0.052 n=10+10)
Mandelbrot200          5.06ms ± 0%    5.05ms ± 0%   -0.04%   (p=0.000 n=9+10)
TimeParse               450ns ± 3%     446ns ± 0%     ~      (p=0.238 n=10+9)
TimeFormat              485ns ± 1%     483ns ± 1%     ~      (p=0.073 n=10+10)
Template               90.4ms ± 0%    90.7ms ± 0%   +0.29%   (p=0.000 n=8+10)
GoParse                6.01ms ± 0%    6.03ms ± 0%   +0.35%   (p=0.000 n=10+10)
BinaryTree17            11.7s ± 0%     11.7s ± 0%     ~      (p=0.481 n=10+10)
Revcomp                 669ms ± 0%     669ms ± 0%     ~      (p=0.315 n=10+10)
Fannkuch11              3.40s ± 0%     3.37s ± 0%   -0.92%   (p=0.000 n=10+10)
[Geo mean]             67.9µs         67.9µs        +0.02%

name                   old speed      new speed      delta
RegexpMatchEasy0_32     128MB/s ± 0%   128MB/s ± 0%   -0.08%   (p=0.003 n=8+10)
RegexpMatchEasy0_1K    1.75GB/s ± 0%  1.75GB/s ± 0%     ~      (p=0.642 n=8+10)
RegexpMatchEasy1_32     130MB/s ± 0%   130MB/s ± 0%     ~      (p=0.690 n=10+9)
RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%     ~      (p=0.661 n=10+9)
RegexpMatchMedium_32   3.18MB/s ± 0%  3.18MB/s ± 0%     ~      (all equal)
RegexpMatchMedium_1K   19.7MB/s ± 0%  19.6MB/s ± 0%     ~      (p=0.190 n=10+9)
RegexpMatchHard_32     11.6MB/s ± 0%  11.6MB/s ± 1%     ~      (p=0.669 n=10+10)
RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%     ~      (p=0.718 n=9+9)
Gzip                   44.4MB/s ± 0%  44.5MB/s ± 0%   +0.24%   (p=0.000 n=10+10)
JSONEncode             96.5MB/s ± 1%  96.1MB/s ± 1%     ~      (p=0.065 n=9+10)
JSONDecode             20.6MB/s ± 1%  20.7MB/s ± 1%   +0.42%   (p=0.041 n=10+10)
GobDecode              60.6MB/s ± 1%  60.0MB/s ± 2%   -0.92%   (p=0.016 n=10+10)
GobEncode              63.4MB/s ± 0%  63.6MB/s ± 0%     ~      (p=0.055 n=10+10)
Template               21.5MB/s ± 0%  21.4MB/s ± 0%   -0.30%   (p=0.000 n=9+10)
GoParse                9.64MB/s ± 0%  9.61MB/s ± 0%   -0.36%   (p=0.000 n=10+10)
Revcomp                 380MB/s ± 0%   380MB/s ± 0%     ~      (p=0.323 n=10+10)
[Geo mean]             56.0MB/s       55.9MB/s        -0.07%

Change-Id: I79a4978d42d01a5f72ed5ceec07f5e78ac6b3859
Reviewed-on: https://go-review.googlesource.com/97175
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
-