- 01 Jun, 2012 1 commit
-
David Symonds authored
R=golang-dev, r CC=golang-dev https://golang.org/cl/6257082
-
- 31 May, 2012 8 commits
-
Nigel Tao authored
exp/html/atom benchmark:

benchmark         old ns/op    new ns/op    delta
BenchmarkLookup   199226       80770        -59.46%

exp/html benchmark:

benchmark                     old ns/op    new ns/op    delta
BenchmarkParser               4864890      4510834      -7.28%
BenchmarkHighLevelTokenizer   2209192      1969684      -10.84%

benchmark                     old MB/s     new MB/s     speedup
BenchmarkParser               16.07        17.33        1.08x
BenchmarkHighLevelTokenizer   35.38        39.68        1.12x

R=r CC=golang-dev https://golang.org/cl/6261054
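The point of the atom package is to replace string comparisons with integer comparisons. A minimal usage sketch, assuming the exp/html/atom API matches its modern successor golang.org/x/net/html/atom:

    package main

    import (
        "fmt"

        "golang.org/x/net/html/atom" // successor to exp/html/atom
    )

    func main() {
        // Lookup maps a tag name to a small integer code (0 if unknown),
        // so later tag-name checks are integer equality, not string equality.
        a := atom.Lookup([]byte("div"))
        fmt.Println(a == atom.Div, a.String()) // true div
    }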
-
Rémy Oudompheng authored
The previous code was preparing arrays of entries that would be filled if there was one entry every 128 bytes. Moving to a 4096-byte interval reduces the overhead per megabyte of address space to 2kB from 64kB (on 64-bit systems). The performance impact will be negative for very small MemProfileRate.

test/bench/garbage/tree2 -heapsize 800000000 (default memprofilerate)
Before: mprof 65993056 bytes (1664 bucketmem + 65991392 addrmem)
After:  mprof  1989984 bytes (1680 bucketmem +  1988304 addrmem)

R=golang-dev, rsc CC=golang-dev, remy https://golang.org/cl/6257069
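A quick check of the quoted overhead numbers, assuming one 8-byte table entry per interval on a 64-bit system:

    package main

    import "fmt"

    func main() {
        const mb = 1 << 20 // one megabyte of address space
        const entry = 8    // assumed bytes per table entry (64-bit pointer)
        fmt.Println(mb/128*entry, "bytes per MB at a 128-byte interval")   // 65536 = 64kB
        fmt.Println(mb/4096*entry, "bytes per MB at a 4096-byte interval") // 2048 = 2kB
    }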
-
Sameer Ajmani authored
R=golang-dev, r CC=golang-dev https://golang.org/cl/6244071
-
Rémy Oudompheng authored
The previous heap profile format did not include buckets with zero used bytes. Also add several missing MemStats fields in debug mode. R=golang-dev, rsc CC=golang-dev, remy https://golang.org/cl/6249068
-
Nigel Tao authored
50% fewer mallocs in HTML tokenization, resulting in 25% fewer mallocs in parsing go1.html. Making the parser use integer comparisons instead of string comparisons will be a follow-up CL, to be co-ordinated with Andy Balholm's work.

exp/html benchmarks before:
BenchmarkParser                500   4754294 ns/op   16.44 MB/s
    parse_test.go:409: 500 iterations, 14651 mallocs per iteration
BenchmarkRawLevelTokenizer    2000    903481 ns/op   86.51 MB/s
    token_test.go:678: 2000 iterations, 28 mallocs per iteration
BenchmarkLowLevelTokenizer    2000   1260485 ns/op   62.01 MB/s
    token_test.go:678: 2000 iterations, 41 mallocs per iteration
BenchmarkHighLevelTokenizer   1000   2165964 ns/op   36.09 MB/s
    token_test.go:678: 1000 iterations, 6616 mallocs per iteration

exp/html benchmarks after:
BenchmarkParser                500   4664912 ns/op   16.76 MB/s
    parse_test.go:409: 500 iterations, 11266 mallocs per iteration
BenchmarkRawLevelTokenizer    2000    903065 ns/op   86.55 MB/s
    token_test.go:678: 2000 iterations, 28 mallocs per iteration
BenchmarkLowLevelTokenizer    2000   1260032 ns/op   62.03 MB/s
    token_test.go:678: 2000 iterations, 41 mallocs per iteration
BenchmarkHighLevelTokenizer   1000   2143356 ns/op   36.47 MB/s
    token_test.go:678: 1000 iterations, 3231 mallocs per iteration

R=r, rsc, rogpeppe CC=andybalholm, golang-dev https://golang.org/cl/6255062
-
Rob Pike authored
Byte slices are not strings. Fixes #3687. R=golang-dev, dsymonds CC=golang-dev https://golang.org/cl/6257074
-
Andrew Gerrand authored
R=golang-dev, dsymonds CC=golang-dev https://golang.org/cl/6258064
-
Andrew Gerrand authored
R=golang-dev, dsymonds CC=golang-dev https://golang.org/cl/6244069
-
- 30 May, 2012 28 commits
-
Dave Cheney authored
Add -ccflags to pass arguments to {5,6,8}c, analogous to -gcflags for {5,6,8}g. R=golang-dev, rsc CC=golang-dev https://golang.org/cl/6260047
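A hedged usage sketch of the new flag alongside -gcflags; '-w' is only an example argument here, not a flag this log documents for the C compilers:

    go build -gcflags '-S' -ccflags '-w' runtime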
-
Russ Cox authored
Drop expecttaken function in favor of extra argument to gbranch and bgen. Mark loop condition as likely to be true, so that loops are generated inline. The main benefit here is contiguous code when trying to read the generated assembly. It has only minor effects on the timing, and they mostly cancel the minor effects that aligning function entry points had. One exception: both changes made Fannkuch faster.

Compared to before CL 6244066 (before aligned functions):

benchmark                old ns/op    new ns/op    delta
BenchmarkBinaryTree17    4222117400   4201958800   -0.48%
BenchmarkFannkuch11      3462631800   3215908600   -7.13%
BenchmarkGobDecode       20887622     20899164     +0.06%
BenchmarkGobEncode       9548772      9439083      -1.15%
BenchmarkGzip            151687       152060       +0.25%
BenchmarkGunzip          8742         8711         -0.35%
BenchmarkJSONEncode      62730560     62686700     -0.07%
BenchmarkJSONDecode      252569180    252368960    -0.08%
BenchmarkMandelbrot200   5267599      5252531      -0.29%
BenchmarkRevcomp25M      980813500    985248400    +0.45%
BenchmarkTemplate        361259100    357414680    -1.06%

Compared to tip (aligned functions):

benchmark                old ns/op    new ns/op    delta
BenchmarkBinaryTree17    4140739800   4201958800   +1.48%
BenchmarkFannkuch11      3259914400   3215908600   -1.35%
BenchmarkGobDecode       20620222     20899164     +1.35%
BenchmarkGobEncode       9384886      9439083      +0.58%
BenchmarkGzip            150333       152060       +1.15%
BenchmarkGunzip          8741         8711         -0.34%
BenchmarkJSONEncode      65210990     62686700     -3.87%
BenchmarkJSONDecode      249394860    252368960    +1.19%
BenchmarkMandelbrot200   5273394      5252531      -0.40%
BenchmarkRevcomp25M      996013800    985248400    -1.08%
BenchmarkTemplate        360620840    357414680    -0.89%

R=ken2 CC=golang-dev https://golang.org/cl/6245069
-
Mikio Hara authored
R=golang-dev, dave, rsc CC=golang-dev https://golang.org/cl/6248065
-
Russ Cox authored
Was missing break. R=ken2 CC=golang-dev https://golang.org/cl/6250078
-
Russ Cox authored
On 6l and 8l, this is a real instruction, guaranteed to cause an 'undefined instruction' exception. On 5l, we simulate it as BL to address 0. The plan is to use it as a signal to the linker that this point in the instruction stream cannot be reached (hence the changes to nofollow). This will help the compiler explain that panicindex and friends do not return without having to put a list of these functions in the linker. R=ken2 CC=golang-dev https://golang.org/cl/6255064
-
Russ Cox authored
16 seems pretty standard on x86 for function entry. I don't know if ARM would benefit, so I used just 4 (single instruction alignment). This has a minor absolute effect on the current timings. The main hope is that it will make them more consistent from run to run.

benchmark                old ns/op    new ns/op    delta
BenchmarkBinaryTree17    4222117400   4140739800   -1.93%
BenchmarkFannkuch11      3462631800   3259914400   -5.85%
BenchmarkGobDecode       20887622     20620222     -1.28%
BenchmarkGobEncode       9548772      9384886      -1.72%
BenchmarkGzip            151687       150333       -0.89%
BenchmarkGunzip          8742         8741         -0.01%
BenchmarkJSONEncode      62730560     65210990     +3.95%
BenchmarkJSONDecode      252569180    249394860    -1.26%
BenchmarkMandelbrot200   5267599      5273394      +0.11%
BenchmarkRevcomp25M      980813500    996013800    +1.55%
BenchmarkTemplate        361259100    360620840    -0.18%

R=ken2 CC=golang-dev https://golang.org/cl/6244066
-
Russ Cox authored
The code was inconsistent about when it used brchain(x) and when it used x directly, with the result that you could end up emitting code for brchain(x) but leave the jump pointing at an unemitted x. R=ken2 CC=golang-dev https://golang.org/cl/6250077
-
Ivan Krasin authored
This bug was introduced in the following revision:

changeset: 11404:26dceba5c610
user:      Ivan Krasin <krasin@golang.org>
date:      Mon Jan 23 09:19:39 2012 -0500
summary:   compress/flate: reduce memory pressure at cost of additional arithmetic operation.

This is the review page for that CL: https://golang.org/cl/5555070/

R=rsc, imkrasin CC=golang-dev https://golang.org/cl/6249067
-
Mats Lidell authored
Fixes some portability issues between the Emacsen. R=golang-dev, sameer, bradfitz, ryanb CC=golang-dev https://golang.org/cl/6206043
-
Rob Pike authored
Most significant in mandelbrot, from avoiding MOVSD between registers, but there are others. R=golang-dev, rsc CC=golang-dev https://golang.org/cl/6258063
-
Russ Cox authored
MOVSD only copies the low half of the packed register pair, while MOVAPD copies both halves. I assume the internal register renaming works better with the latter, since it makes our code run 25% faster.

Before:
mandelbrot 16000
    gcc -O2 mandelbrot.c   28.44u 0.00s 28.45r
    gc mandelbrot          44.12u 0.00s 44.13r
    gc_B mandelbrot        44.17u 0.01s 44.19r

After:
mandelbrot 16000
    gcc -O2 mandelbrot.c   28.22u 0.00s 28.23r
    gc mandelbrot          32.81u 0.00s 32.82r
    gc_B mandelbrot        32.82u 0.00s 32.83r

R=ken2 CC=golang-dev https://golang.org/cl/6248068
-
Russ Cox authored
Surprise! The C code is using floating point values for its counters. It's off the critical path, but the Go code and C code are supposed to be as similar as possible to make comparisons meaningful. It doesn't have a significant effect. R=golang-dev, r CC=golang-dev https://golang.org/cl/6260058
-
Sameer Ajmani authored
address, but his changelist is under the Gmail address. R=golang-dev, rsc CC=golang-dev https://golang.org/cl/6248069
-
Jean-Marc Eurin authored
This uses the patch output of gofmt (-d option) and applies each chunk to the buffer, instead of replacing the whole buffer. The main advantage is that the undo history is kept across gofmt'ings, so it can really be used as a before-save-hook. R=sameer, sameer CC=golang-dev https://golang.org/cl/6198047
-
Rob Pike authored
R=golang-dev, bradfitz, rsc CC=golang-dev https://golang.org/cl/6259054
-
Joel Sing authored
The correct procid is needed for unparking LWPs on NetBSD - always initialise procid in minit() so that cgo works correctly. The non-cgo case already works correctly since procid is initialised via lwp_create(). R=golang-dev, rsc CC=golang-dev https://golang.org/cl/6257071
-
Jan Ziak authored
R=rsc, remyoudompheng, minux.ma, ality CC=golang-dev https://golang.org/cl/6242061
-
Joel Sing authored
On NetBSD a cgo enabled binary has more than 32 sections - bump NSECTS so that we can actually link them successfully. R=golang-dev, rsc CC=golang-dev https://golang.org/cl/6261052
-
Jan Ziak authored
R=rsc CC=golang-dev https://golang.org/cl/6243059
-
Marcel van Lohuizen authored
R=r CC=golang-dev https://golang.org/cl/6202063
-
Russ Cox authored
R=golang-dev, bradfitz CC=golang-dev https://golang.org/cl/6244063
-
Russ Cox authored
I added the nl->op == OLITERAL case during the recent performance round, and while it helps for small integer constants, it hurts for floating point constants. In the Mandelbrot benchmark it causes 2*Zr*Zi to compile like Zr*2*Zi:

    0x000000000042663d <+249>: movsd %xmm6,%xmm0
    0x0000000000426641 <+253>: movsd $2,%xmm1
    0x000000000042664a <+262>: mulsd %xmm1,%xmm0
    0x000000000042664e <+266>: mulsd %xmm5,%xmm0

instead of:

    0x0000000000426835 <+276>: movsd $2,%xmm0
    0x000000000042683e <+285>: mulsd %xmm6,%xmm0
    0x0000000000426842 <+289>: mulsd %xmm5,%xmm0

It is unclear why that has such a dramatic performance effect in a tight loop, but it's obviously slightly better code, so go with it.

benchmark                old ns/op    new ns/op    delta
BenchmarkBinaryTree17    5957470000   5973924000   +0.28%
BenchmarkFannkuch11      3811295000   3869128000   +1.52%
BenchmarkGobDecode       26001900     25670500     -1.27%
BenchmarkGobEncode       12051430     11948590     -0.85%
BenchmarkGzip            177432       174821       -1.47%
BenchmarkGunzip          10967        10756        -1.92%
BenchmarkJSONEncode      78924750     79746900     +1.04%
BenchmarkJSONDecode      313606400    307081600    -2.08%
BenchmarkMandelbrot200   13670860     8200725      -40.01% !!!
BenchmarkRevcomp25M      1179194000   1206539000   +2.32%
BenchmarkTemplate        447931200    443948200    -0.89%
BenchmarkMD5Hash1K       2856         2873         +0.60%
BenchmarkMD5Hash8K       22083        22029        -0.24%

benchmark                old MB/s     new MB/s     speedup
BenchmarkGobDecode       29.52        29.90        1.01x
BenchmarkGobEncode       63.69        64.24        1.01x
BenchmarkJSONEncode      24.59        24.33        0.99x
BenchmarkJSONDecode      6.19         6.32         1.02x
BenchmarkRevcomp25M      215.54       210.66       0.98x
BenchmarkTemplate        4.33         4.37         1.01x
BenchmarkMD5Hash1K       358.54       356.31       0.99x
BenchmarkMD5Hash8K       370.95       371.86       1.00x

R=ken2 CC=golang-dev https://golang.org/cl/6261051
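For context, a sketch of the kind of Mandelbrot inner loop containing the 2*Zr*Zi expression whose generated code is shown above (an illustration with made-up constants, not the benchmark's exact source):

    package main

    import "fmt"

    func main() {
        Zr, Zi, Cr, Ci := 0.0, 0.0, -0.7435, 0.1314
        for i := 0; i < 50; i++ {
            // With this CL, 2*Zr*Zi compiles with the constant 2 loaded
            // first and multiplied through (the second listing above),
            // rather than being reordered as Zr*2*Zi.
            Zr, Zi = Zr*Zr-Zi*Zi+Cr, 2*Zr*Zi+Ci
        }
        fmt.Println(Zr, Zi)
    }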
-
Nigel Tao authored
filterPaeth takes []byte arguments instead of byte arguments, which avoids some redundant computation of the previous pixel in the inner loop. Also eliminate a bounds check in decoding the up filter.

benchmark                      old ns/op   new ns/op   delta
BenchmarkDecodeGray            3139636     2812531     -10.42%
BenchmarkDecodeNRGBAGradient   12341520    10971680    -11.10%
BenchmarkDecodeNRGBAOpaque     10740780    9612455     -10.51%
BenchmarkDecodePaletted        1819535     1818913     -0.03%
BenchmarkDecodeRGB             8974695     8178070     -8.88%

R=rsc CC=golang-dev https://golang.org/cl/6243061
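For reference, the scalar Paeth predictor from the PNG specification, which a row-at-a-time filterPaeth applies to every byte (a sketch of the algorithm, not the CL's code):

    package main

    import "fmt"

    // paeth predicts a byte from its left (a), above (b), and
    // upper-left (c) neighbors, per the PNG specification.
    func paeth(a, b, c uint8) uint8 {
        p := int(a) + int(b) - int(c)
        pa, pb, pc := abs(p-int(a)), abs(p-int(b)), abs(p-int(c))
        switch {
        case pa <= pb && pa <= pc:
            return a
        case pb <= pc:
            return b
        }
        return c
    }

    func abs(x int) int {
        if x < 0 {
            return -x
        }
        return x
    }

    func main() {
        fmt.Println(paeth(10, 20, 15)) // 15: c is closest to p = 10+20-15
    }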
-
Alex Brainman authored
R=golang-dev CC=golang-dev https://golang.org/cl/6256069
-
Rémy Oudompheng authored
A block with a finalizer might also be profiled. The special bit is needed to unregister the block from the profile. It will be unset only when the block is freed. Fixes #3668. R=golang-dev, rsc CC=golang-dev, remy https://golang.org/cl/6249066
-
Andrew Balholm authored
Also escape "\r" as "&#13;" when rendering HTML. Pass 2 additional tests. R=nigeltao CC=golang-dev https://golang.org/cl/6260046
-
Alex Brainman authored
Fixes #3543. R=golang-dev, kardianos, rsc CC=golang-dev, hectorchu, vcc.163 https://golang.org/cl/6245063
-
Nigel Tao authored
$GOROOT/src/pkg/exp/html/testdata/go1.html is an execution of the $GOROOT/doc/go1.html template by godoc. Sample numbers on my linux,amd64 desktop:

BenchmarkParser 500 4699198 ns/op 16.63 MB/s
--- BENCH: BenchmarkParser
    parse_test.go:409: 1 iterations, 14653 mallocs per iteration
    parse_test.go:409: 100 iterations, 14651 mallocs per iteration
    parse_test.go:409: 500 iterations, 14651 mallocs per iteration
BenchmarkRawLevelTokenizer 2000 904957 ns/op 86.37 MB/s
--- BENCH: BenchmarkRawLevelTokenizer
    token_test.go:657: 1 iterations, 28 mallocs per iteration
    token_test.go:657: 100 iterations, 28 mallocs per iteration
    token_test.go:657: 2000 iterations, 28 mallocs per iteration
BenchmarkLowLevelTokenizer 2000 1134300 ns/op 68.91 MB/s
--- BENCH: BenchmarkLowLevelTokenizer
    token_test.go:657: 1 iterations, 41 mallocs per iteration
    token_test.go:657: 100 iterations, 41 mallocs per iteration
    token_test.go:657: 2000 iterations, 41 mallocs per iteration
BenchmarkHighLevelTokenizer 1000 2096179 ns/op 37.29 MB/s
--- BENCH: BenchmarkHighLevelTokenizer
    token_test.go:657: 1 iterations, 6616 mallocs per iteration
    token_test.go:657: 100 iterations, 6616 mallocs per iteration
    token_test.go:657: 1000 iterations, 6616 mallocs per iteration

R=rsc CC=andybalholm, golang-dev, r https://golang.org/cl/6257067
-
- 29 May, 2012 3 commits
-
Brad Fitzpatrick authored
R=golang-dev, r CC=golang-dev https://golang.org/cl/6259052
-
Rémy Oudompheng authored
Fixes #3345. R=golang-dev, r, rsc, dave CC=golang-dev, remy https://golang.org/cl/6214061
-
Rob Pike authored
The check for Stringer etc. can only fire if the test is not a builtin, so avoid the expensive check if we know there's no chance. Also put in a fast path for pad, which saves a more modest amount.

benchmark                     old ns/op   new ns/op   delta
BenchmarkSprintfEmpty         148         152         +2.70%
BenchmarkSprintfString        585         497         -15.04%
BenchmarkSprintfInt           441         396         -10.20%
BenchmarkSprintfIntInt        718         603         -16.02%
BenchmarkSprintfPrefixedInt   676         621         -8.14%
BenchmarkSprintfFloat         1003        953         -4.99%
BenchmarkManyArgs             2945        2312        -21.49%
BenchmarkScanInts             1704152     1734441     +1.78%
BenchmarkScanRecursiveInt     1837397     1828920     -0.46%

R=golang-dev, bradfitz CC=golang-dev https://golang.org/cl/6245068
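The optimization pattern, sketched outside fmt's internals: handle common builtin types with a cheap concrete-type switch first, and only fall back to the expensive Stringer interface check when that fails. A minimal sketch of the idea, not fmt's actual code:

    package main

    import (
        "fmt"
        "strconv"
    )

    // format tries cheap concrete-type cases before the costly
    // interface type assertion, mirroring the fast path described above.
    func format(arg interface{}) string {
        switch v := arg.(type) {
        case int:
            return strconv.Itoa(v)
        case string:
            return v
        }
        if s, ok := arg.(fmt.Stringer); ok { // expensive check, now rare
            return s.String()
        }
        return "?"
    }

    func main() {
        fmt.Println(format(42), format("hi"))
    }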
-