Commits · b3a130e81a0c3c2508f483af15e57d181c5cdc1e · go / golang

01 May, 2016 13 commits

net/http: document some errors more, mark ErrWriteAfterFlush as unused · b3a130e8

Brad Fitzpatrick authored May 01, 2016

Fixes #15150

Change-Id: I1a892d5b0516a37dac050d3bb448e0a2571db16e
Reviewed-on: https://go-review.googlesource.com/22658Reviewed-by: Andrew Gerrand <adg@golang.org>

b3a130e8

archive/zip: improve BenchmarkCompressedZipGarbage · d713e8e8

Josh Bleecher Snyder authored Apr 28, 2016

Before this CL:

$ go test -bench=CompressedZipGarbage -count=5 -run=NONE archive/zip
BenchmarkCompressedZipGarbage-8        50  20677087 ns/op   42973 B/op      47 allocs/op
BenchmarkCompressedZipGarbage-8       100  20584764 ns/op   24294 B/op      47 allocs/op
BenchmarkCompressedZipGarbage-8        50  20859221 ns/op   42973 B/op      47 allocs/op
BenchmarkCompressedZipGarbage-8       100  20901176 ns/op   24294 B/op      47 allocs/op
BenchmarkCompressedZipGarbage-8        50  21282409 ns/op   42973 B/op      47 allocs/op

The B/op number is effectively meaningless. There
is a surprisingly large one-time cost that gets
divided by the number of iterations that your
machine can get through in a second.

This CL discards the first run, which helps.
It is not a panacea. Running with -benchtime=10s
will allow the sync.Pool to be emptied,
which brings the problem back.
However, since there are more iterations to divide
the cost through, it’s not quite as bad,
and running with a high benchtime is rare.

This CL changes the meaning of the B/op number,
which is unfortunate, since it won’t have the
same order of magnitude as previous Go versions.
But it wasn’t really comparable before anyway,
since it didn’t have any reliable meaning at all.

After this CL:

$ go test -bench=CompressedZipGarbage -count=5 -run=NONE archive/zip
BenchmarkCompressedZipGarbage-8   	     100	  20881890 ns/op	    5616 B/op	      47 allocs/op
BenchmarkCompressedZipGarbage-8   	      50	  20622757 ns/op	    5616 B/op	      47 allocs/op
BenchmarkCompressedZipGarbage-8   	      50	  20628193 ns/op	    5616 B/op	      47 allocs/op
BenchmarkCompressedZipGarbage-8   	     100	  20756612 ns/op	    5616 B/op	      47 allocs/op
BenchmarkCompressedZipGarbage-8   	     100	  20639774 ns/op	    5616 B/op	      47 allocs/op

Change-Id: Iedee04f39328974c7fa272a6113d423e7ffce50f
Reviewed-on: https://go-review.googlesource.com/22585Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

d713e8e8

doc: update go1.7.txt · 3836354f

Brad Fitzpatrick authored Apr 28, 2016

Change-Id: I53dd5affc3a1e1f741fe44c7ce691bb2cd432764
Reviewed-on: https://go-review.googlesource.com/22657Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

3836354f

cmd/internal/obj/mips, cmd/link: add support TLS relocation for mips64x · 3b0b3072

Cherry Zhang authored Apr 28, 2016

a new relocation R_ADDRMIPSTLS is added, which resolves to 16-bit offset
of a TLS address on mips64x.

Change-Id: Ic60d0e1ba49ff1c433cead242f5884677ab227a5
Reviewed-on: https://go-review.googlesource.com/19804Reviewed-by: Minux Ma <minux@golang.org>

3b0b3072

runtime: update some comments · 77c7f124

Austin Clements authored May 01, 2016

This updates some comments that became out of date when we moved the
mark bit out of the heap bitmap and started using the high bit for the
first word as a scan/dead bit.

Change-Id: I4a572d16db6114cadff006825466c1f18359f2db
Reviewed-on: https://go-review.googlesource.com/22662Reviewed-by: Rick Hudson <rlh@golang.org>

77c7f124

runtime/cgo: add linux/mips64x cgo support · 5d002dbc

Cherry Zhang authored Apr 28, 2016

MIPS N64 ABI passes arguments in registers R4-R11, return value in R2.
R16-R23, R28, R30 and F24-F31 are callee-save. gcc PIC code expects
to be called with indirect call through R25.

Change-Id: I24f582b4b58e1891ba9fd606509990f95cca8051
Reviewed-on: https://go-review.googlesource.com/19805Reviewed-by: Minux Ma <minux@golang.org>

5d002dbc

cmd/link, runtime: add external linking support for linux/mips64x · 073d292c

Cherry Zhang authored Apr 28, 2016

Fixes #12560

Change-Id: Ic2004fc7b09f2dbbf83c41f8c6307757c0e1676d
Reviewed-on: https://go-review.googlesource.com/19803Reviewed-by: Minux Ma <minux@golang.org>

073d292c

cmd/compile: Improve readability of HTML produced by GOSSAFUNC · b13b249f

Frits van Bommel authored Apr 30, 2016

Factor out the Aux/AuxInt handling in (*Value).LongString() and
use it in (*Value).LongHTML() as well.
This especially improves readability of auxFloat32, auxFloat64,
and auxSymValAndOff values which would otherwise be printed as
opaque integers.
This change also makes LongString() slightly less verbose by
eliding offsets that are zero (as is very often the case).

Additionally, ensure the HTML is interpreted as UTF-8 so that
non-ASCII characters (especially the "middle dots" in some symbols)
show up correctly.

Change-Id: Ie26221df876faa056d322b3e423af63f33cd109d
Reviewed-on: https://go-review.googlesource.com/22641Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Frits van Bommel <fvbommel@gmail.com>

b13b249f

cmd/internal/obj/mips et al.: introduce SB register on mips64x · 98139510

Cherry Zhang authored Apr 28, 2016

SB register (R28) is introduced for access external addresses with shorter
instruction sequences. It is loaded at entry points. External data within
2G of SB can be accessed this way.

cmd/internal/obj: relocaltion R_ADDRMIPS is split into two relocations
R_ADDRMIPS and R_ADDRMIPSU, handling the low 16 bits and the "upper" 16
bits of external addresses, respectively, since the instructios may not
be adjacent. It might be better if relocation Variant could be used.

cmd/link/internal/mips64: support new relocations.

cmd/compile/internal/mips64: reserve SB register.

runtime: initialize SB register at entry points.

Change-Id: I5f34868f88c5a9698c042a8a1f12f76806c187b9
Reviewed-on: https://go-review.googlesource.com/19802Reviewed-by: Minux Ma <minux@golang.org>

98139510

cmd/asm, cmd/internal/obj/mips: add an alias of RSB on mips64x · 8dc0444a

Cherry Zhang authored Apr 28, 2016

Change-Id: I724ce0a48c1aeed14267c049fa415a6fa2fffbcf
Reviewed-on: https://go-review.googlesource.com/19864Reviewed-by: Minux Ma <minux@golang.org>

8dc0444a

cmd/internal/obj/mips, runtime: change REGTMP to R23 · a409fb80

Cherry Zhang authored Apr 28, 2016

Leave R28 to SB register, which will be introduced in CL 19802.

Change-Id: I1cf7a789695c5de664267ec8086bfb0b043ebc14
Reviewed-on: https://go-review.googlesource.com/19863Reviewed-by: Minux Ma <minux@golang.org>

a409fb80

cmd/asm/internal/asm/testdata: remove WORD $foo(SB) from mips64.s · 9bc1e206

Cherry Zhang authored Apr 28, 2016

on mips64, address is 64 bit, not a WORD. also it is never used anywhere.

Change-Id: Ic6bf6d6a21c8d2f1eb7bfe9efc5a29186ec2a8ef
Reviewed-on: https://go-review.googlesource.com/19801Reviewed-by: Minux Ma <minux@golang.org>

9bc1e206

net/http: add Transport.MaxIdleConns limit · 81b2ea4d

Brad Fitzpatrick authored Apr 30, 2016

The HTTP client had a limit for the maximum number of idle connections
per-host, but not a global limit.

This CLs adds a global idle connection limit too,
Transport.MaxIdleConns.

All idle conns are now also stored in a doubly-linked list. When there
are too many, the oldest one is closed.

Fixes #15461

Change-Id: I72abbc28d140c73cf50f278fa70088b45ae0deef
Reviewed-on: https://go-review.googlesource.com/22655Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

81b2ea4d

30 Apr, 2016 7 commits

net/http: expand documentation of Server.MaxHeaderBytes · 38cfaa5f

Brad Fitzpatrick authored Apr 30, 2016

Clarify that it includes the RFC 7230 "request-line".

Fixes #15494

Change-Id: I9cc5dd5f2d85ebf903229539208cec4da5c38d04
Reviewed-on: https://go-review.googlesource.com/22656Reviewed-by: Andrew Gerrand <adg@golang.org>

38cfaa5f

database/sql: clone data for named []byte types · 4e0cd1ee

Kevin Burke authored Apr 23, 2016

Previously named byte types like json.RawMessage could get dirty
database memory from a call to Scan. These types would activate a
code path that didn't clone the byte data coming from the database
before assigning it. Another thread could then overwrite the byte
array in src, which has unexpected consequences.

Originally reported by Jason Moiron; the patch and test are his
suggestions. Fixes #13905.

Change-Id: Iacfef61cbc9dd51c8fccef9b2b9d9544c77dd0e0
Reviewed-on: https://go-review.googlesource.com/22393Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

4e0cd1ee

runtime: reclaim scan/dead bit in first word · a20fd1f6

Austin Clements authored Apr 29, 2016

With the switch to separate mark bitmaps, the scan/dead bit for the
first word of each object is now unused. Reclaim this bit and use it
as a scan/dead bit, just like words three and on. The second word is
still used for checkmark.

This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
since they no longer need different cases for 1, 2, and 3+ word
objects. They can instead just manipulate the heap bitmap for the
first word and be done with it.

In order to enable this, we change heapBitsSetType and runGCProg to
always set the scan/dead bit to scan for the first word on every code
path. Since these functions only apply to types that have pointers,
there's no need to do this conditionally: it's *always* necessary to
set the scan bit in the first word.

We also change every place that scans an object and checks if there
are more pointers. Rather than only checking morePointers if the word
is >= 2, we now check morePointers if word != 1 (since that's the
checkmark word).

Looking forward, we should probably reclaim the checkmark bit, too,
but that's going to be quite a bit more work.

Tested by setting doubleCheck in heapBitsSetType and running all.bash
on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.

This particularly improves the FmtFprintf* go1 benchmarks, since they
do a large amount of noscan allocation.

name old time/op new time/op delta
BinaryTree17-12 2.34s ± 1% 2.38s ± 1% +1.70% (p=0.000 n=17+19)
Fannkuch11-12 2.09s ± 0% 2.09s ± 1% ~ (p=0.276 n=17+16)
FmtFprintfEmpty-12 44.9ns ± 2% 44.8ns ± 2% ~ (p=0.340 n=19+18)
FmtFprintfString-12 127ns ± 0% 125ns ± 0% -1.57% (p=0.000 n=16+15)
FmtFprintfInt-12 128ns ± 0% 122ns ± 1% -4.45% (p=0.000 n=15+20)
FmtFprintfIntInt-12 207ns ± 1% 193ns ± 0% -6.55% (p=0.000 n=19+14)
FmtFprintfPrefixedInt-12 197ns ± 1% 191ns ± 0% -2.93% (p=0.000 n=17+18)
FmtFprintfFloat-12 263ns ± 0% 248ns ± 1% -5.88% (p=0.000 n=15+19)
FmtManyArgs-12 794ns ± 0% 779ns ± 1% -1.90% (p=0.000 n=18+18)
GobDecode-12 7.14ms ± 2% 7.11ms ± 1% ~ (p=0.072 n=20+20)
GobEncode-12 5.85ms ± 1% 5.82ms ± 1% -0.49% (p=0.000 n=20+20)
Gzip-12 218ms ± 1% 215ms ± 1% -1.22% (p=0.000 n=19+19)
Gunzip-12 36.8ms ± 0% 36.7ms ± 0% -0.18% (p=0.006 n=18+20)
HTTPClientServer-12 77.1µs ± 4% 77.1µs ± 3% ~ (p=0.945 n=19+20)
JSONEncode-12 15.6ms ± 1% 15.9ms ± 1% +1.68% (p=0.000 n=18+20)
JSONDecode-12 55.2ms ± 1% 53.6ms ± 1% -2.93% (p=0.000 n=17+19)
Mandelbrot200-12 4.05ms ± 1% 4.05ms ± 0% ~ (p=0.306 n=17+17)
GoParse-12 3.14ms ± 1% 3.10ms ± 1% -1.31% (p=0.000 n=19+18)
RegexpMatchEasy0_32-12 69.3ns ± 1% 70.0ns ± 0% +0.89% (p=0.000 n=19+17)
RegexpMatchEasy0_1K-12 237ns ± 1% 236ns ± 0% -0.62% (p=0.000 n=19+16)
RegexpMatchEasy1_32-12 69.5ns ± 1% 70.3ns ± 1% +1.14% (p=0.000 n=18+17)
RegexpMatchEasy1_1K-12 377ns ± 1% 366ns ± 1% -3.03% (p=0.000 n=15+19)
RegexpMatchMedium_32-12 107ns ± 1% 107ns ± 2% ~ (p=0.318 n=20+19)
RegexpMatchMedium_1K-12 33.8µs ± 3% 33.5µs ± 1% -1.04% (p=0.001 n=20+19)
RegexpMatchHard_32-12 1.68µs ± 1% 1.73µs ± 0% +2.50% (p=0.000 n=20+18)
RegexpMatchHard_1K-12 50.8µs ± 1% 52.0µs ± 1% +2.50% (p=0.000 n=19+18)
Revcomp-12 381ms ± 1% 385ms ± 1% +1.00% (p=0.000 n=17+18)
Template-12 64.9ms ± 3% 62.6ms ± 1% -3.55% (p=0.000 n=19+18)
TimeParse-12 324ns ± 0% 328ns ± 1% +1.25% (p=0.000 n=18+18)
TimeFormat-12 345ns ± 0% 334ns ± 0% -3.31% (p=0.000 n=15+17)
[Geo mean] 52.1µs 51.5µs -1.00%

Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
Reviewed-on: https://go-review.googlesource.com/22632Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

a20fd1f6

runtime: use morePointers and isPointer in more places · d5e3d08b

Austin Clements authored Apr 29, 2016

This makes this code better self-documenting and makes it easier to
find these places in the future.

Change-Id: I31dc5598ae67f937fb9ef26df92fd41d01e983c3
Reviewed-on: https://go-review.googlesource.com/22631Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

d5e3d08b

runtime: avoid conditional execution in morePointers and isPointer · a5d3f7ec

Austin Clements authored Apr 29, 2016

heapBits.bits is carefully written to produce good machine code. Use
it in heapBits.morePointers and heapBits.isPointer to get good machine
code there, too.

Change-Id: I208c7d0d38697e7a22cad67f692162589b75f1e2
Reviewed-on: https://go-review.googlesource.com/22630Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

a5d3f7ec

cmd/compile: ecx is reserved for PIC, don't let peep work on it · 7a60a962

Keith Randall authored Apr 30, 2016

Fixes #15496

Change-Id: Ieb5be1caa4b1c23e23b20d56c1a0a619032a9f5d
Reviewed-on: https://go-review.googlesource.com/22652Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>

7a60a962

runtime: fix cgocallback_gofunc on ppc64x · 58f52cbb

Michael Munday authored Apr 30, 2016

Fix issues introduced in 5f9a870b.

Change-Id: Ia75945ef563956613bf88bbe57800a96455c265d
Reviewed-on: https://go-review.googlesource.com/22661Reviewed-by: Ian Lance Taylor <iant@golang.org>

58f52cbb

29 Apr, 2016 20 commits

runtime: fix cgocallback_gofunc argument passing on arm64 · 9fe572e5

Ian Lance Taylor authored Apr 29, 2016

Change-Id: I4b34bcd5cde71ecfbb352b39c4231de6168cc7f3
Reviewed-on: https://go-review.googlesource.com/22651
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>

9fe572e5

root: remove dev.garbage file · 36b6c038

Matthew Dempsky authored Apr 29, 2016

Change-Id: I99b2ca52824341d986090f5c78ab4f396594bcdf
Reviewed-on: https://go-review.googlesource.com/22660Reviewed-by: Ian Lance Taylor <iant@golang.org>

36b6c038

cmd/cgo, runtime, runtime/cgo: use cgo context function · 5f9a870b

Ian Lance Taylor authored Apr 27, 2016

Add support for the context function set by runtime.SetCgoTraceback.
The context function was added in CL 17761, without support.
This CL is the support.

This CL has not been tested for real C code, as a working context
function for C code requires unwind support that does not seem to exist.
I wanted to get the CL out before the freeze.

I apologize for the length of this CL.  It's mostly plumbing, but
unfortunately the plumbing is processor-specific.

Change-Id: I8ce11a0de9b3dafcc29efd2649d776e93bff0e90
Reviewed-on: https://go-review.googlesource.com/22508Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

5f9a870b

crypto/cipher, crypto/aes: add s390x implementation of AES-CTR · c717675c

Michael Munday authored Apr 18, 2016

This commit adds the new 'ctrAble' interface to the crypto/cipher
package. The role of ctrAble is the same as gcmAble but for CTR
instead of GCM. It allows block ciphers to provide optimized CTR
implementations.

The primary benefit of adding CTR support to the s390x AES
implementation is that it allows us to encrypt the counter values
in bulk, giving the cipher message instruction a larger chunk of
data to work on per invocation.

The xorBytes assembly is necessary because xorBytes becomes a
bottleneck when CTR is done in this way. Hopefully it will be
possible to remove this once s390x has migrated to the ssa
backend.

name      old speed     new speed     delta
AESCTR1K  160MB/s ± 6%  867MB/s ± 0%  +442.42%  (p=0.000 n=9+10)

Change-Id: I1ae16b0ce0e2641d2bdc7d7eabc94dd35f6e9318
Reviewed-on: https://go-review.googlesource.com/22195Reviewed-by: Adam Langley <agl@golang.org>

c717675c

crypto/cipher, crypto/aes: add s390x implementation of AES-CBC · 2f847564

Michael Munday authored Apr 26, 2016

This commit adds the cbcEncAble and cbcDecAble interfaces that
can be implemented by block ciphers that support an optimized
implementation of CBC. This is similar to what is done for GCM
with the gcmAble interface.

The cbcEncAble, cbcDecAble and gcmAble interfaces all now have
tests to ensure they are detected correctly in the cipher
package.

name             old speed     new speed      delta
AESCBCEncrypt1K  152MB/s ± 1%  1362MB/s ± 0%  +795.59%   (p=0.000 n=10+9)
AESCBCDecrypt1K  143MB/s ± 1%  1362MB/s ± 0%  +853.00%   (p=0.000 n=10+9)

Change-Id: I715f686ab3686b189a3dac02f86001178fa60580
Reviewed-on: https://go-review.googlesource.com/22523
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Adam Langley <agl@golang.org>

2f847564

cmd/compile: make vet happy with ssa code · cd956576

Keith Randall authored Apr 29, 2016

Fixes #15488

Change-Id: I054eb1e1c859de315e3cdbdef5428682bce693fd
Reviewed-on: https://go-review.googlesource.com/22609
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>

cd956576

Merge remote-tracking branch 'origin/dev.garbage' · 56b54912

Rick Hudson authored Apr 29, 2016

This commit moves the GC from free list allocation to
bit mark allocation. Instead of using the bitmaps
generated during the mark phases to generate free
list and then using the free lists for allocation we
allocate directly from the bitmaps.

The change in the garbage benchmark

name              old time/op  new time/op  delta
XBenchGarbage-12  2.22ms ± 1%  2.13ms ± 1%  -3.90%  (p=0.000 n=18+18)

Change-Id: I17f57233336f0ca5ef5404c3be4ecb443ab622aa

56b54912

[dev.garbage] runtime: simplify nextFreeFast so it is inlined · e9eaa181

Rick Hudson authored Apr 29, 2016

nextFreeFast is currently not inlined by the compiler due
to its size and complexity. This CL simplifies
nextFreeFast by letting the slow path handle (nextFree)
handle a corner cases.

Change-Id: Ia9c5d1a7912bcb4bec072f5fd240f0e0bafb20e4
Reviewed-on: https://go-review.googlesource.com/22598Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>

e9eaa181

cmd/compile: Move divconst_test out of test/bench/go1 · d8d33514

David Chase authored Apr 29, 2016

This is necessary to avoid disrupting the go1 suite and gives
us a place to put other tests of basic compiler function and
correctness.

Change-Id: I36933819ff2bfe6a2121fff2be9a98efd2123d9a
Reviewed-on: https://go-review.googlesource.com/22597
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

d8d33514

cmd/compile: clean up rewrite rules · fa9435cd

Keith Randall authored Apr 26, 2016

Break really long lines.
Add spacing to line up columns.

In AMD64, put all the optimization rules after all the
lowering rules.

Change-Id: I45cc7368bf278416e67f89e74358db1bd4326a93
Reviewed-on: https://go-review.googlesource.com/22470Reviewed-by: David Chase <drchase@google.com>

fa9435cd

[dev.garbage] runtime: revive sweep fast path · b3579c09

Austin Clements authored Apr 29, 2016

sweep used to skip mcental.freeSpan (and its locking) if it didn't
find any new free objects. We lost that optimization when the
freed-object counting changed in dad83f7 to count total free objects
instead of newly freed objects.

The previous commit brings back counting of newly freed objects, so we
can easily revive this optimization by checking that count (like we
used to) instead of the total free objects count.

Change-Id: I43658707a1c61674d0366124d5976b00d98741a9
Reviewed-on: https://go-review.googlesource.com/22596
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>

b3579c09

[dev.garbage] runtime: fix nfree accounting · d97625ae

Austin Clements authored Apr 29, 2016

Commit 8dda1c4c changed the meaning of "nfree" in sweep from the number
of newly freed objects to the total number of free objects in the
span, but didn't update where sweep added nfree to c.local_nsmallfree.
Hence, we're over-accounting the number of frees. This is causing
TestArrayHash to fail with "too many allocs NNN - hash not balanced".

Fix this by computing the number of newly freed objects and adding
that to c.local_nsmallfree, so it behaves like it used to. Computing
this requires a small tweak to mallocgc: apparently we've never set
s.allocCount when allocating a large object; fix this by setting it to
1 so sweep doesn't get confused.

Change-Id: I31902ffd310110da4ffd807c5c06f1117b872dc8
Reviewed-on: https://go-review.googlesource.com/22595Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>

d97625ae

[dev.garbage] runtime: fix allocfreetrace · 6d114905

Austin Clements authored Apr 28, 2016

We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
when we removed the sweep over the mark bitmap. Fix it by
re-introducing the sweep over the bitmap specifically if we're in
allocfreetrace mode. This doesn't have to be even remotely efficient,
since the overhead of allocfreetrace is huge anyway, so we can keep
the code for this down to just a few lines.

Change-Id: I9e176b3b04c73608a0ea3068d5d0cd30760ebd40
Reviewed-on: https://go-review.googlesource.com/22592
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

6d114905

[dev.garbage] runtime: reintroduce no-zeroing optimization · 38f67468

Austin Clements authored Apr 28, 2016

Currently we always zero objects when we allocate them. We used to
have an optimization that would not zero objects that had not been
allocated since the whole span was last zeroed (either by getting it
from the system or by getting it from the heap, which does a bulk
zero), but this depended on the sweeper clobbering the first two words
of each object. Hence, we lost this optimization when the bitmap
sweeper went away.

Re-introduce this optimization using a different mechanism. Each span
already keeps a flag indicating that it just came from the OS or was
just bulk zeroed by the mheap. We can simply use this flag to know
when we don't need to zero an object. This is slightly less efficient
than the old optimization: if a span gets allocated and partially
used, then GC happens and the span gets returned to the mcentral, then
the span gets re-acquired, the old optimization knew that it only had
to re-zero the objects that had been reclaimed, whereas this
optimization will re-zero everything. However, in this case, you're
already paying for the garbage collection, and you've only wasted one
zeroing of the span, so in practice there seems to be little
difference. (If we did want to revive the full optimization, each span
could keep track of a frontier beyond which all free slots are zeroed.
I prototyped this and it didn't obvious do any better than the much
simpler approach in this commit.)

This significantly improves BinaryTree17, which is allocation-heavy
(and runs first, so most pages are already zeroed), and slightly
improves everything else.

name old time/op new time/op delta
XBenchGarbage-12 2.15ms ± 1% 2.14ms ± 1% -0.80% (p=0.000 n=17+17)

name old time/op new time/op delta
BinaryTree17-12 2.71s ± 1% 2.56s ± 1% -5.73% (p=0.000 n=18+19)
DivconstI64-12 1.70ns ± 1% 1.70ns ± 1% ~ (p=0.562 n=18+18)
DivconstU64-12 1.74ns ± 2% 1.74ns ± 1% ~ (p=0.394 n=20+20)
DivconstI32-12 1.74ns ± 0% 1.74ns ± 0% ~ (all samples are equal)
DivconstU32-12 1.66ns ± 1% 1.66ns ± 0% ~ (p=0.516 n=15+16)
DivconstI16-12 1.84ns ± 0% 1.84ns ± 0% ~ (all samples are equal)
DivconstU16-12 1.82ns ± 0% 1.82ns ± 0% ~ (all samples are equal)
DivconstI8-12 1.79ns ± 0% 1.79ns ± 0% ~ (all samples are equal)
DivconstU8-12 1.60ns ± 0% 1.60ns ± 1% ~ (p=0.603 n=17+19)
Fannkuch11-12 2.11s ± 1% 2.11s ± 0% ~ (p=0.333 n=16+19)
FmtFprintfEmpty-12 45.1ns ± 4% 45.4ns ± 5% ~ (p=0.111 n=20+20)
FmtFprintfString-12 134ns ± 0% 129ns ± 0% -3.45% (p=0.000 n=18+16)
FmtFprintfInt-12 131ns ± 1% 129ns ± 1% -1.54% (p=0.000 n=16+18)
FmtFprintfIntInt-12 205ns ± 2% 203ns ± 0% -0.56% (p=0.014 n=20+18)
FmtFprintfPrefixedInt-12 200ns ± 2% 197ns ± 1% -1.48% (p=0.000 n=20+18)
FmtFprintfFloat-12 256ns ± 1% 256ns ± 0% -0.21% (p=0.008 n=18+20)
FmtManyArgs-12 805ns ± 0% 804ns ± 0% -0.19% (p=0.001 n=18+18)
GobDecode-12 7.21ms ± 1% 7.14ms ± 1% -0.92% (p=0.000 n=19+20)
GobEncode-12 5.88ms ± 1% 5.88ms ± 1% ~ (p=0.641 n=18+19)
Gzip-12 218ms ± 1% 218ms ± 1% ~ (p=0.271 n=19+18)
Gunzip-12 37.1ms ± 0% 36.9ms ± 0% -0.29% (p=0.000 n=18+17)
HTTPClientServer-12 78.1µs ± 2% 77.4µs ± 2% ~ (p=0.070 n=19+19)
JSONEncode-12 15.5ms ± 1% 15.5ms ± 0% ~ (p=0.063 n=20+18)
JSONDecode-12 56.1ms ± 0% 55.4ms ± 1% -1.18% (p=0.000 n=19+18)
Mandelbrot200-12 4.05ms ± 0% 4.06ms ± 0% +0.29% (p=0.001 n=18+18)
GoParse-12 3.28ms ± 1% 3.21ms ± 1% -2.30% (p=0.000 n=20+20)
RegexpMatchEasy0_32-12 69.4ns ± 2% 69.3ns ± 1% ~ (p=0.205 n=18+16)
RegexpMatchEasy0_1K-12 239ns ± 0% 239ns ± 0% ~ (all samples are equal)
RegexpMatchEasy1_32-12 69.4ns ± 1% 69.4ns ± 1% ~ (p=0.620 n=15+18)
RegexpMatchEasy1_1K-12 370ns ± 1% 369ns ± 2% ~ (p=0.088 n=20+20)
RegexpMatchMedium_32-12 108ns ± 0% 108ns ± 0% ~ (all samples are equal)
RegexpMatchMedium_1K-12 33.6µs ± 3% 33.5µs ± 3% ~ (p=0.718 n=20+20)
RegexpMatchHard_32-12 1.68µs ± 1% 1.67µs ± 2% ~ (p=0.316 n=20+20)
RegexpMatchHard_1K-12 50.5µs ± 3% 50.4µs ± 3% ~ (p=0.659 n=20+20)
Revcomp-12 381ms ± 1% 381ms ± 1% ~ (p=0.916 n=19+18)
Template-12 66.5ms ± 1% 65.8ms ± 2% -1.08% (p=0.000 n=20+20)
TimeParse-12 317ns ± 0% 319ns ± 0% +0.48% (p=0.000 n=19+12)
TimeFormat-12 338ns ± 0% 338ns ± 0% ~ (p=0.124 n=19+18)
[Geo mean] 5.99µs 5.96µs -0.54%

Change-Id: I638ffd9d9f178835bbfa499bac20bd7224f1a907
Reviewed-on: https://go-review.googlesource.com/22591Reviewed-by: Rick Hudson <rlh@golang.org>

38f67468

compress/flate: use a constant hash table size for Best Speed. · 1fb4e4de

Nigel Tao authored Apr 29, 2016

This makes compress/flate's version of Snappy diverge from the upstream
golang/snappy version, but the latter has a goal of matching C++ snappy
output byte-for-byte. Both C++ and the asm version of golang/snappy can
use a smaller N for the O(N) zero-initialization of the hash table when
the input is small, even if the pure Go golang/snappy algorithm cannot:
"var table [tableSize]uint16" zeroes all tableSize elements.

For this package, we don't have the match-C++-snappy goal, so we can use
a different (constant) hash table size.

This is a small win, in terms of throughput and output size, but it also
enables us to re-use the (constant size) hash table between
encodeBestSpeed calls, avoiding the cost of zero-initializing the hash
table altogether. This will be implemented in follow-up commits.

This package's benchmarks:
name old speed new speed delta
EncodeDigitsSpeed1e4-8 72.8MB/s ± 1% 73.5MB/s ± 1% +0.86% (p=0.000 n=10+10)
EncodeDigitsSpeed1e5-8 77.5MB/s ± 1% 78.0MB/s ± 0% +0.69% (p=0.000 n=10+10)
EncodeDigitsSpeed1e6-8 82.0MB/s ± 1% 82.7MB/s ± 1% +0.85% (p=0.000 n=10+9)
EncodeTwainSpeed1e4-8 65.1MB/s ± 1% 65.6MB/s ± 0% +0.78% (p=0.000 n=10+9)
EncodeTwainSpeed1e5-8 80.0MB/s ± 0% 80.6MB/s ± 1% +0.66% (p=0.000 n=9+10)
EncodeTwainSpeed1e6-8 81.6MB/s ± 1% 82.1MB/s ± 1% +0.55% (p=0.017 n=10+10)

Input size in bytes, output size (and time taken) before and after on
some larger files:
1073741824 57269781 ( 3183ms) 57269781 ( 3177ms) adresser.001
1000000000 391052000 ( 11071ms) 391051996 ( 11067ms) enwik9
1911399616 378679516 ( 13450ms) 378679514 ( 13079ms) gob-stream
8558382592 3972329193 ( 99962ms) 3972329193 ( 91290ms) rawstudio-mint14.tar
200000000 200015265 ( 776ms) 200015265 ( 774ms) sharnd.out

Thanks to Klaus Post for the original suggestion on cl/21021.

Change-Id: Ia4c63a8d1b92c67e1765ec5c3c8c69d289d9a6ce
Reviewed-on: https://go-review.googlesource.com/22604Reviewed-by: Russ Cox <rsc@golang.org>

1fb4e4de

cmd/compile/internal/gc: bv.go cleanup · 5edcff01

Dave Cheney authored Apr 29, 2016

Drive by gardening of bv.go.

- Unexport the Bvec type, it is not used outside internal/gc.
  (machine translated with gofmt -r)
- Removed unused constants and functions.
  (driven by cmd/unused)

Change-Id: I3433758ad4e62439f802f4b0ed306e67336d9aba
Reviewed-on: https://go-review.googlesource.com/22602
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

5edcff01

misc/cgo/testcarchive: fix C include path for darwin/arm · 94e523cb

Cherry Zhang authored Apr 27, 2016

After CL 22461, c-archive build on darwin/arm is by default compiled
with -shared and installed in pkg/darwin_arm_shared.

Fix build (2nd time...)

Change-Id: Ia2bb09bb6e1ebc9bc74f7570dd80c81d05eaf744
Reviewed-on: https://go-review.googlesource.com/22534Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

94e523cb

compress/flate: replace "Best Speed" with specialized version · d8b7bd6a

Nigel Tao authored Apr 26, 2016

This encoding algorithm, which prioritizes speed over output size, is
based on Snappy's LZ77-style encoder: github.com/golang/snappy

This commit keeps the diff between this package's encodeBestSpeed
function and and Snappy's encodeBlock function as small as possible (see
the diff below). Follow-up commits will improve this package's
performance and output size.

This package's speed benchmarks:

name                    old speed      new speed      delta
EncodeDigitsSpeed1e4-8  40.7MB/s ± 0%  73.0MB/s ± 0%   +79.18%  (p=0.008 n=5+5)
EncodeDigitsSpeed1e5-8  33.0MB/s ± 0%  77.3MB/s ± 1%  +134.04%  (p=0.008 n=5+5)
EncodeDigitsSpeed1e6-8  32.1MB/s ± 0%  82.1MB/s ± 0%  +156.18%  (p=0.008 n=5+5)
EncodeTwainSpeed1e4-8   42.1MB/s ± 0%  65.0MB/s ± 0%   +54.61%  (p=0.008 n=5+5)
EncodeTwainSpeed1e5-8   46.3MB/s ± 0%  80.0MB/s ± 0%   +72.81%  (p=0.008 n=5+5)
EncodeTwainSpeed1e6-8   47.3MB/s ± 0%  81.7MB/s ± 0%   +72.86%  (p=0.008 n=5+5)

Here's the milliseconds taken, before and after this commit, to compress
a number of test files:

Go's src/compress/testdata files:

     4          1 e.txt
     8          4 Mark.Twain-Tom.Sawyer.txt

github.com/golang/snappy's benchmark files:

     3          1 alice29.txt
    12          3 asyoulik.txt
     6          1 fireworks.jpeg
     1          1 geo.protodata
     1          0 html
     2          2 html_x_4
     6          3 kppkn.gtb
    11          4 lcet10.txt
     5          1 paper-100k.pdf
    14          6 plrabn12.txt
    17          6 urls.10K

Larger files linked to from
https://docs.google.com/spreadsheets/d/1VLxi-ac0BAtf735HyH3c1xRulbkYYUkFecKdLPH7NIQ/edit#gid=166102500

  2409       3182 adresser.001
 16757      11027 enwik9
 13764      12946 gob-stream
153978      74317 rawstudio-mint14.tar
  4371        770 sharnd.out

Output size is larger. In the table below, the first column is the input
size, the second column is the output size prior to this commit, the
third column is the output size after this commit.

    100003      47707      50006 e.txt
    387851     172707     182930 Mark.Twain-Tom.Sawyer.txt
    152089      62457      66705 alice29.txt
    125179      54503      57274 asyoulik.txt
    123093     122827     123108 fireworks.jpeg
    118588      18574      20558 geo.protodata
    102400      16601      17305 html
    409600      65506      70313 html_x_4
    184320      49007      50944 kppkn.gtb
    426754     166957     179355 lcet10.txt
    102400      82126      84937 paper-100k.pdf
    481861     218617     231988 plrabn12.txt
    702087     241774     258020 urls.10K
1073741824   43074110   57269781 adresser.001
1000000000  365772256  391052000 enwik9
1911399616  340364558  378679516 gob-stream
8558382592 3807229562 3972329193 rawstudio-mint14.tar
 200000000  200061040  200015265 sharnd.out

The diff between github.com/golang/snappy's encodeBlock function and
this commit's encodeBestSpeed function:

1c1,7
< func encodeBlock(dst, src []byte) (d int) {
---
> func encodeBestSpeed(dst []token, src []byte) []token {
> 	// This check isn't in the Snappy implementation, but there, the caller
> 	// instead of the callee handles this case.
> 	if len(src) < minNonLiteralBlockSize {
> 		return emitLiteral(dst, src)
> 	}
>
4c10
< 	// and len(src) <= maxBlockSize and maxBlockSize == 65536.
---
> 	// and len(src) <= maxStoreBlockSize and maxStoreBlockSize == 65535.
65c71
< 			if load32(src, s) == load32(src, candidate) {
---
> 			if s-candidate < maxOffset && load32(src, s) == load32(src, candidate) {
73c79
< 		d += emitLiteral(dst[d:], src[nextEmit:s])
---
> 		dst = emitLiteral(dst, src[nextEmit:s])
90c96
< 			// This is an inlined version of:
---
> 			// This is an inlined version of Snappy's:
93c99,103
< 			for i := candidate + 4; s < len(src) && src[i] == src[s]; i, s = i+1, s+1 {
---
> 			s1 := base + maxMatchLength
> 			if s1 > len(src) {
> 				s1 = len(src)
> 			}
> 			for i := candidate + 4; s < s1 && src[i] == src[s]; i, s = i+1, s+1 {
96c106,107
< 			d += emitCopy(dst[d:], base-candidate, s-base)
---
> 			// matchToken is flate's equivalent of Snappy's emitCopy.
> 			dst = append(dst, matchToken(uint32(s-base-3), uint32(base-candidate-minOffsetSize)))
114c125
< 			if uint32(x>>8) != load32(src, candidate) {
---
> 			if s-candidate >= maxOffset || uint32(x>>8) != load32(src, candidate) {
124c135
< 		d += emitLiteral(dst[d:], src[nextEmit:])
---
> 		dst = emitLiteral(dst, src[nextEmit:])
126c137
< 	return d
---
> 	return dst

This change is based on https://go-review.googlesource.com/#/c/21021/ by
Klaus Post, but it is a separate changelist as cl/21021 seems to have
stalled in code review, and the Go 1.7 feature freeze approaches.

Golang-dev discussion:
https://groups.google.com/d/topic/golang-dev/XYgHX9p8IOk/discussion and
of course cl/21021.

Change-Id: Ib662439417b3bd0b61c2977c12c658db3e44d164
Reviewed-on: https://go-review.googlesource.com/22370Reviewed-by: Russ Cox <rsc@golang.org>

d8b7bd6a

[dev.garbage] runtime: eliminate mspan.start · 3e246238

Austin Clements authored Apr 28, 2016

This converts all remaining uses of mspan.start to instead use
mspan.base(). In many cases, this actually reduces the complexity of
the code.

Change-Id: If113840e00d3345a6cf979637f6a152e6344aee7
Reviewed-on: https://go-review.googlesource.com/22590Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>

3e246238

[dev.garbage] runtime: use s.base() everywhere it makes sense · b7adc41f

Austin Clements authored Apr 28, 2016

Currently we have lots of (s.start << _PageShift) and variants. We now
have an s.base() function that returns this. It's faster and more
readable, so use it.

Change-Id: I888060a9dae15ea75ca8cc1c2b31c905e71b452b
Reviewed-on: https://go-review.googlesource.com/22559Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>

b7adc41f