- 02 Jun, 2012 5 commits
-
Charles L. Dorian authored
Ceil to 4.81 ns/op from 20.6 ns/op.
Floor to 4.37 ns/op from 13.5 ns/op.
Trunc to 3.97 ns/op from 14.3 ns/op.
Also changed three MOVSDs to MOVAPDs in log_amd64.s.

R=rsc, golang-dev CC=golang-dev https://golang.org/cl/6262048
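The ns/op figures above come from the standard Go benchmark harness. A minimal sketch of the kind of benchmark that produces such numbers — the sink variable and the 1.5 argument are illustrative, not taken from the CL:

    package math_test

    import (
        "math"
        "testing"
    )

    var result float64 // global sink keeps the calls from being optimized away

    func BenchmarkCeil(b *testing.B) {
        for i := 0; i < b.N; i++ {
            result = math.Ceil(1.5)
        }
    }

    func BenchmarkFloor(b *testing.B) {
        for i := 0; i < b.N; i++ {
            result = math.Floor(1.5)
        }
    }

    func BenchmarkTrunc(b *testing.B) {
        for i := 0; i < b.N; i++ {
            result = math.Trunc(1.5)
        }
    }

Run with something like go test -bench 'Ceil|Floor|Trunc' math to reproduce numbers of this kind.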
-
Jan Mercl authored
Currently walk() doesn't check for err == SkipDir when iterating a directory list, but such a promise is made in the docs for WalkFunc. Fixes #3486. R=rsc, r CC=golang-dev https://golang.org/cl/6257059
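A minimal illustration of the documented contract being enforced here — the "testdata" directory name is hypothetical:

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    func main() {
        // Walk the tree but skip the contents of any directory named
        // "testdata"; per the WalkFunc docs, Walk must honor SkipDir.
        err := filepath.Walk(".", func(path string, info os.FileInfo, err error) error {
            if err != nil {
                return err
            }
            if info.IsDir() && info.Name() == "testdata" {
                return filepath.SkipDir
            }
            fmt.Println(path)
            return nil
        })
        if err != nil {
            fmt.Fprintln(os.Stderr, "walk:", err)
        }
    }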
-
Shenghou Ma authored
R=dave, rsc CC=golang-dev https://golang.org/cl/6248070
-
Brad Fitzpatrick authored
Now that gri has made go/parser 15% faster, I offer this change to slow back down cmd/api ~proportionately, adding FreeBSD to the go1-checked set of platforms. Really we should have done this earlier. This will prevent us from breaking FreeBSD compatibility accidentally in the future. R=golang-dev, r CC=golang-dev https://golang.org/cl/6279044
-
Rob Pike authored
To avoid goroutines during init, the nextItem function was a clever workaround. Now that init goroutines are permitted, restore the original, simpler design. R=golang-dev, bradfitz CC=golang-dev https://golang.org/cl/6282043
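The restored design starts the lexing goroutine when the lexer is created; a simplified sketch of that shape (types and names abbreviated, not the actual text/template source):

    package lexer

    // item represents a scanned token.
    type item struct {
        typ int    // token type
        val string // token text
    }

    // lexer holds the scanner state.
    type lexer struct {
        input string
        items chan item // channel of scanned items
    }

    // lex creates a new scanner and fires off the state machine in a
    // goroutine — legal even from init now that the runtime permits
    // goroutines at init time.
    func lex(input string) *lexer {
        l := &lexer{input: input, items: make(chan item)}
        go l.run()
        return l
    }

    // run drives the state machine; the real states are elided here.
    func (l *lexer) run() {
        // ... emit items on l.items ...
        close(l.items)
    }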
-
- 01 Jun, 2012 5 commits
-
Robert Griesemer authored
- only compute current line position if needed (i.e., if a comment is present)
- added benchmark

benchmark         old ns/op    new ns/op    delta
BenchmarkParse    10902990     9313330      -14.58%

benchmark         old MB/s     new MB/s     speedup
BenchmarkParse    5.31         6.22         1.17x

R=golang-dev, r CC=golang-dev https://golang.org/cl/6270043
-
Ryan Barrett authored
R=golang-dev, sameer, bradfitz CC=golang-dev, jba https://golang.org/cl/6213056
-
Russ Cox authored
Saving the code in case we improve things enough that it matters later, but at least right now it is not worth doing. R=ken2 CC=golang-dev https://golang.org/cl/6248071
-
Russ Cox authored
Dreg from https://golang.org/cl/4629042 R=ken2 CC=golang-dev https://golang.org/cl/6259057
-
David Symonds authored
R=golang-dev, r CC=golang-dev https://golang.org/cl/6257082
-
- 31 May, 2012 8 commits
-
Nigel Tao authored
exp/html/atom benchmark:
benchmark          old ns/op    new ns/op    delta
BenchmarkLookup    199226       80770        -59.46%

exp/html benchmark:
benchmark                      old ns/op    new ns/op    delta
BenchmarkParser                4864890      4510834      -7.28%
BenchmarkHighLevelTokenizer    2209192      1969684      -10.84%

benchmark                      old MB/s     new MB/s     speedup
BenchmarkParser                16.07        17.33        1.08x
BenchmarkHighLevelTokenizer    35.38        39.68        1.12x

R=r CC=golang-dev https://golang.org/cl/6261054
-
Rémy Oudompheng authored
The previous code was preparing arrays of entries that would be filled if there was one entry every 128 bytes. Moving to a 4096-byte interval reduces the overhead per megabyte of address space to 2kB from 64kB (on 64-bit systems). The performance impact will be negative for very small MemProfileRate.

test/bench/garbage/tree2 -heapsize 800000000 (default memprofilerate)
Before: mprof 65993056 bytes (1664 bucketmem + 65991392 addrmem)
After:  mprof 1989984 bytes (1680 bucketmem + 1988304 addrmem)

R=golang-dev, rsc CC=golang-dev, remy https://golang.org/cl/6257069
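The 64kB-to-2kB claim checks out arithmetically, assuming one 8-byte entry per interval on a 64-bit system:

    package main

    import "fmt"

    func main() {
        const mb = 1 << 20
        const entrySize = 8 // assumed: one pointer-sized entry per interval
        fmt.Println(mb/128*entrySize, "bytes/MB")  // old: 65536 (64kB)
        fmt.Println(mb/4096*entrySize, "bytes/MB") // new: 2048 (2kB)
    }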
-
Sameer Ajmani authored
R=golang-dev, r CC=golang-dev https://golang.org/cl/6244071
-
Rémy Oudompheng authored
The previous heap profile format did not include buckets with zero used bytes. Also add several missing MemStats fields in debug mode. R=golang-dev, rsc CC=golang-dev, remy https://golang.org/cl/6249068
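For reference, the debug-mode text format mentioned here is what pprof's heap profile emits when its debug parameter is nonzero; a small sketch of dumping it:

    package main

    import (
        "os"
        "runtime/pprof"
    )

    func main() {
        // debug=1 writes the heap profile as legible text and appends
        // runtime.MemStats fields; debug=0 writes the compact format.
        pprof.Lookup("heap").WriteTo(os.Stdout, 1)
    }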
-
Nigel Tao authored
50% fewer mallocs in HTML tokenization, resulting in 25% fewer mallocs in parsing go1.html. Making the parser use integer comparisons instead of string comparisons will be a follow-up CL, to be co-ordinated with Andy Balholm's work.

exp/html benchmarks before:
BenchmarkParser                 500    4754294 ns/op    16.44 MB/s
parse_test.go:409: 500 iterations, 14651 mallocs per iteration
BenchmarkRawLevelTokenizer     2000     903481 ns/op    86.51 MB/s
token_test.go:678: 2000 iterations, 28 mallocs per iteration
BenchmarkLowLevelTokenizer     2000    1260485 ns/op    62.01 MB/s
token_test.go:678: 2000 iterations, 41 mallocs per iteration
BenchmarkHighLevelTokenizer    1000    2165964 ns/op    36.09 MB/s
token_test.go:678: 1000 iterations, 6616 mallocs per iteration

exp/html benchmarks after:
BenchmarkParser                 500    4664912 ns/op    16.76 MB/s
parse_test.go:409: 500 iterations, 11266 mallocs per iteration
BenchmarkRawLevelTokenizer     2000     903065 ns/op    86.55 MB/s
token_test.go:678: 2000 iterations, 28 mallocs per iteration
BenchmarkLowLevelTokenizer     2000    1260032 ns/op    62.03 MB/s
token_test.go:678: 2000 iterations, 41 mallocs per iteration
BenchmarkHighLevelTokenizer    1000    2143356 ns/op    36.47 MB/s
token_test.go:678: 1000 iterations, 3231 mallocs per iteration

R=r, rsc, rogpeppe CC=andybalholm, golang-dev https://golang.org/cl/6255062
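The mallocs-per-iteration figures come from sampling runtime malloc counters around the benchmark loop; a simplified sketch of that measurement technique (the helper name is illustrative, not the actual parse_test.go code):

    package main

    import (
        "fmt"
        "runtime"
    )

    // mallocsPerIteration runs f the given number of times and reports
    // the average number of mallocs each call performed.
    func mallocsPerIteration(iterations int, f func()) uint64 {
        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        before := ms.Mallocs
        for i := 0; i < iterations; i++ {
            f()
        }
        runtime.ReadMemStats(&ms)
        return (ms.Mallocs - before) / uint64(iterations)
    }

    func main() {
        n := mallocsPerIteration(100, func() { _ = make([]byte, 64) })
        fmt.Println(n, "mallocs per iteration")
    }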
-
Rob Pike authored
Byte slices are not strings. Fixes #3687. R=golang-dev, dsymonds CC=golang-dev https://golang.org/cl/6257074
-
Andrew Gerrand authored
R=golang-dev, dsymonds CC=golang-dev https://golang.org/cl/6258064
-
Andrew Gerrand authored
R=golang-dev, dsymonds CC=golang-dev https://golang.org/cl/6244069
-
- 30 May, 2012 22 commits
-
Dave Cheney authored
Add -ccflags to pass arguments to {5,6,8}c similar to -gcflags for {5,6,8}g. R=golang-dev, rsc CC=golang-dev https://golang.org/cl/6260047
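A hypothetical invocation, for illustration only: go build -ccflags '-DDEBUG' mypkg would pass -DDEBUG to the C compiler when building mypkg's C sources, just as go build -gcflags '-S' passes -S to the Go compiler.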
-
Russ Cox authored
Drop expecttaken function in favor of extra argument to gbranch and bgen. Mark loop condition as likely to be true, so that loops are generated inline.

The main benefit here is contiguous code when trying to read the generated assembly. It has only minor effects on the timing, and they mostly cancel the minor effects that aligning function entry points had. One exception: both changes made Fannkuch faster.

Compared to before CL 6244066 (before aligned functions):
benchmark                 old ns/op     new ns/op     delta
BenchmarkBinaryTree17     4222117400    4201958800    -0.48%
BenchmarkFannkuch11       3462631800    3215908600    -7.13%
BenchmarkGobDecode        20887622      20899164      +0.06%
BenchmarkGobEncode        9548772       9439083       -1.15%
BenchmarkGzip             151687        152060        +0.25%
BenchmarkGunzip           8742          8711          -0.35%
BenchmarkJSONEncode       62730560      62686700      -0.07%
BenchmarkJSONDecode       252569180     252368960     -0.08%
BenchmarkMandelbrot200    5267599       5252531       -0.29%
BenchmarkRevcomp25M       980813500     985248400     +0.45%
BenchmarkTemplate         361259100     357414680     -1.06%

Compared to tip (aligned functions):
benchmark                 old ns/op     new ns/op     delta
BenchmarkBinaryTree17     4140739800    4201958800    +1.48%
BenchmarkFannkuch11       3259914400    3215908600    -1.35%
BenchmarkGobDecode        20620222      20899164      +1.35%
BenchmarkGobEncode        9384886       9439083       +0.58%
BenchmarkGzip             150333        152060        +1.15%
BenchmarkGunzip           8741          8711          -0.34%
BenchmarkJSONEncode       65210990      62686700      -3.87%
BenchmarkJSONDecode       249394860     252368960     +1.19%
BenchmarkMandelbrot200    5273394       5252531       -0.40%
BenchmarkRevcomp25M       996013800     985248400     -1.08%
BenchmarkTemplate         360620840     357414680     -0.89%

R=ken2 CC=golang-dev https://golang.org/cl/6245069
-
Mikio Hara authored
R=golang-dev, dave, rsc CC=golang-dev https://golang.org/cl/6248065
-
Russ Cox authored
Was missing break. R=ken2 CC=golang-dev https://golang.org/cl/6250078
-
Russ Cox authored
On 6l and 8l, this is a real instruction, guaranteed to cause an 'undefined instruction' exception. On 5l, we simulate it as BL to address 0. The plan is to use it as a signal to the linker that this point in the instruction stream cannot be reached (hence the changes to nofollow). This will help the compiler explain that panicindex and friends do not return without having to put a list of these functions in the linker. R=ken2 CC=golang-dev https://golang.org/cl/6255064
-
Russ Cox authored
16 seems pretty standard on x86 for function entry. I don't know if ARM would benefit, so I used just 4 (single instruction alignment).

This has a minor absolute effect on the current timings. The main hope is that it will make them more consistent from run to run.

benchmark                 old ns/op     new ns/op     delta
BenchmarkBinaryTree17     4222117400    4140739800    -1.93%
BenchmarkFannkuch11       3462631800    3259914400    -5.85%
BenchmarkGobDecode        20887622      20620222      -1.28%
BenchmarkGobEncode        9548772       9384886       -1.72%
BenchmarkGzip             151687        150333        -0.89%
BenchmarkGunzip           8742          8741          -0.01%
BenchmarkJSONEncode       62730560      65210990      +3.95%
BenchmarkJSONDecode       252569180     249394860     -1.26%
BenchmarkMandelbrot200    5267599       5273394       +0.11%
BenchmarkRevcomp25M       980813500     996013800     +1.55%
BenchmarkTemplate         361259100     360620840     -0.18%

R=ken2 CC=golang-dev https://golang.org/cl/6244066
-
Russ Cox authored
The code was inconsistent about when it used brchain(x) and when it used x directly, with the result that you could end up emitting code for brchain(x) but leave the jump pointing at an unemitted x. R=ken2 CC=golang-dev https://golang.org/cl/6250077
-
Ivan Krasin authored
This bug was introduced in the following revision:

changeset: 11404:26dceba5c610
user:      Ivan Krasin <krasin@golang.org>
date:      Mon Jan 23 09:19:39 2012 -0500
summary:   compress/flate: reduce memory pressure at cost of additional arithmetic operation.

This is the review page for that CL: https://golang.org/cl/5555070/

R=rsc, imkrasin CC=golang-dev https://golang.org/cl/6249067
-
Mats Lidell authored
Fixes some portability issues between the Emacsen. R=golang-dev, sameer, bradfitz, ryanb CC=golang-dev https://golang.org/cl/6206043
-
Rob Pike authored
Most significant in mandelbrot, from avoiding MOVSD between registers, but there are others. R=golang-dev, rsc CC=golang-dev https://golang.org/cl/6258063
-
Russ Cox authored
MOVSD only copies the low half of the packed register pair, while MOVAPD copies both halves. I assume the internal register renaming works better with the latter, since it makes our code run 25% faster.

Before:
mandelbrot 16000
  gcc -O2 mandelbrot.c    28.44u 0.00s 28.45r
  gc mandelbrot           44.12u 0.00s 44.13r
  gc_B mandelbrot         44.17u 0.01s 44.19r

After:
mandelbrot 16000
  gcc -O2 mandelbrot.c    28.22u 0.00s 28.23r
  gc mandelbrot           32.81u 0.00s 32.82r
  gc_B mandelbrot         32.82u 0.00s 32.83r

R=ken2 CC=golang-dev https://golang.org/cl/6248068
-
Russ Cox authored
Surprise! The C code is using floating point values for its counters. It's off the critical path, but the Go code and C code are supposed to be as similar as possible to make comparisons meaningful. It doesn't have a significant effect. R=golang-dev, r CC=golang-dev https://golang.org/cl/6260058
-
Sameer Ajmani authored
address, but his changelist is under the Gmail address. R=golang-dev, rsc CC=golang-dev https://golang.org/cl/6248069
-
Jean-Marc Eurin authored
This uses the patch output of gofmt (-d option) and applies each chunk to the buffer, instead of replacing the whole buffer. The main advantage is that the undo history is kept across gofmt'ings, so it can really be used as a before-save-hook. R=sameer, sameer CC=golang-dev https://golang.org/cl/6198047
-
Rob Pike authored
R=golang-dev, bradfitz, rsc CC=golang-dev https://golang.org/cl/6259054
-
Joel Sing authored
The correct procid is needed for unparking LWPs on NetBSD - always initialise procid in minit() so that cgo works correctly. The non-cgo case already works correctly since procid is initialised via lwp_create(). R=golang-dev, rsc CC=golang-dev https://golang.org/cl/6257071
-
Jan Ziak authored
R=rsc, remyoudompheng, minux.ma, ality CC=golang-dev https://golang.org/cl/6242061
-
Joel Sing authored
On NetBSD a cgo enabled binary has more than 32 sections - bump NSECTS so that we can actually link them successfully. R=golang-dev, rsc CC=golang-dev https://golang.org/cl/6261052
-
Jan Ziak authored
R=rsc CC=golang-dev https://golang.org/cl/6243059
-
Marcel van Lohuizen authored
R=r CC=golang-dev https://golang.org/cl/6202063
-
Russ Cox authored
R=golang-dev, bradfitz CC=golang-dev https://golang.org/cl/6244063
-
Russ Cox authored
I added the nl->op == OLITERAL case during the recent performance round, and while it helps for small integer constants, it hurts for floating point constants. In the Mandelbrot benchmark it causes 2*Zr*Zi to compile like Zr*2*Zi:

0x000000000042663d <+249>:  movsd  %xmm6,%xmm0
0x0000000000426641 <+253>:  movsd  $2,%xmm1
0x000000000042664a <+262>:  mulsd  %xmm1,%xmm0
0x000000000042664e <+266>:  mulsd  %xmm5,%xmm0

instead of:

0x0000000000426835 <+276>:  movsd  $2,%xmm0
0x000000000042683e <+285>:  mulsd  %xmm6,%xmm0
0x0000000000426842 <+289>:  mulsd  %xmm5,%xmm0

It is unclear why that has such a dramatic performance effect in a tight loop, but it's obviously slightly better code, so go with it.

benchmark                 old ns/op     new ns/op     delta
BenchmarkBinaryTree17     5957470000    5973924000    +0.28%
BenchmarkFannkuch11       3811295000    3869128000    +1.52%
BenchmarkGobDecode        26001900      25670500      -1.27%
BenchmarkGobEncode        12051430      11948590      -0.85%
BenchmarkGzip             177432        174821        -1.47%
BenchmarkGunzip           10967         10756         -1.92%
BenchmarkJSONEncode       78924750      79746900      +1.04%
BenchmarkJSONDecode       313606400     307081600     -2.08%
BenchmarkMandelbrot200    13670860      8200725       -40.01% !!!
BenchmarkRevcomp25M       1179194000    1206539000    +2.32%
BenchmarkTemplate         447931200     443948200     -0.89%
BenchmarkMD5Hash1K        2856          2873          +0.60%
BenchmarkMD5Hash8K        22083         22029         -0.24%

benchmark               old MB/s    new MB/s    speedup
BenchmarkGobDecode      29.52       29.90       1.01x
BenchmarkGobEncode      63.69       64.24       1.01x
BenchmarkJSONEncode     24.59       24.33       0.99x
BenchmarkJSONDecode     6.19        6.32        1.02x
BenchmarkRevcomp25M     215.54      210.66      0.98x
BenchmarkTemplate       4.33        4.37        1.01x
BenchmarkMD5Hash1K      358.54      356.31      0.99x
BenchmarkMD5Hash8K      370.95      371.86      1.00x

R=ken2 CC=golang-dev https://golang.org/cl/6261051
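For context, 2*Zr*Zi is the imaginary part of the complex-square step in the Mandelbrot inner loop; a sketch of its shape, with names following the usual shootout benchmark rather than quoted from the CL:

    package main

    import "fmt"

    // mandelStep performs one iteration of z = z*z + c, where
    // z = zr + zi*i and c = cr + ci*i. The product 2*zr*zi is the
    // expression whose code generation is discussed above.
    func mandelStep(zr, zi, cr, ci float64) (float64, float64) {
        tr, ti := zr*zr, zi*zi
        return tr - ti + cr, 2*zr*zi + ci
    }

    func main() {
        zr, zi := 0.0, 0.0
        for i := 0; i < 3; i++ {
            zr, zi = mandelStep(zr, zi, -0.5, 0.5) // c chosen arbitrarily
        }
        fmt.Println(zr, zi)
    }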
-