1. 30 Apr, 2016 5 commits
    • runtime: reclaim scan/dead bit in first word · a20fd1f6
      Austin Clements authored
      With the switch to separate mark bitmaps, the scan/dead bit for the
      first word of each object is now unused. Reclaim this bit and use it
      as a scan/dead bit, just like words three and on. The second word is
      still used for checkmark.
      
      This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
      since they no longer need different cases for 1, 2, and 3+ word
      objects. They can instead just manipulate the heap bitmap for the
      first word and be done with it.
      
      In order to enable this, we change heapBitsSetType and runGCProg to
      always set the scan/dead bit to scan for the first word on every code
      path. Since these functions only apply to types that have pointers,
      there's no need to do this conditionally: it's *always* necessary to
      set the scan bit in the first word.
      
      We also change every place that scans an object and checks if there
      are more pointers. Rather than only checking morePointers if the word
      is >= 2, we now check morePointers if word != 1 (since that's the
      checkmark word).
      
      Looking forward, we should probably reclaim the checkmark bit, too,
      but that's going to be quite a bit more work.
      
      Tested by setting doubleCheck in heapBitsSetType and running all.bash
      on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.
      
      This particularly improves the FmtFprintf* go1 benchmarks, since they
      do a large amount of noscan allocation.
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.34s ± 1%     2.38s ± 1%  +1.70%  (p=0.000 n=17+19)
      Fannkuch11-12                2.09s ± 0%     2.09s ± 1%    ~     (p=0.276 n=17+16)
      FmtFprintfEmpty-12          44.9ns ± 2%    44.8ns ± 2%    ~     (p=0.340 n=19+18)
      FmtFprintfString-12          127ns ± 0%     125ns ± 0%  -1.57%  (p=0.000 n=16+15)
      FmtFprintfInt-12             128ns ± 0%     122ns ± 1%  -4.45%  (p=0.000 n=15+20)
      FmtFprintfIntInt-12          207ns ± 1%     193ns ± 0%  -6.55%  (p=0.000 n=19+14)
      FmtFprintfPrefixedInt-12     197ns ± 1%     191ns ± 0%  -2.93%  (p=0.000 n=17+18)
      FmtFprintfFloat-12           263ns ± 0%     248ns ± 1%  -5.88%  (p=0.000 n=15+19)
      FmtManyArgs-12               794ns ± 0%     779ns ± 1%  -1.90%  (p=0.000 n=18+18)
      GobDecode-12                7.14ms ± 2%    7.11ms ± 1%    ~     (p=0.072 n=20+20)
      GobEncode-12                5.85ms ± 1%    5.82ms ± 1%  -0.49%  (p=0.000 n=20+20)
      Gzip-12                      218ms ± 1%     215ms ± 1%  -1.22%  (p=0.000 n=19+19)
      Gunzip-12                   36.8ms ± 0%    36.7ms ± 0%  -0.18%  (p=0.006 n=18+20)
      HTTPClientServer-12         77.1µs ± 4%    77.1µs ± 3%    ~     (p=0.945 n=19+20)
      JSONEncode-12               15.6ms ± 1%    15.9ms ± 1%  +1.68%  (p=0.000 n=18+20)
      JSONDecode-12               55.2ms ± 1%    53.6ms ± 1%  -2.93%  (p=0.000 n=17+19)
      Mandelbrot200-12            4.05ms ± 1%    4.05ms ± 0%    ~     (p=0.306 n=17+17)
      GoParse-12                  3.14ms ± 1%    3.10ms ± 1%  -1.31%  (p=0.000 n=19+18)
      RegexpMatchEasy0_32-12      69.3ns ± 1%    70.0ns ± 0%  +0.89%  (p=0.000 n=19+17)
      RegexpMatchEasy0_1K-12       237ns ± 1%     236ns ± 0%  -0.62%  (p=0.000 n=19+16)
      RegexpMatchEasy1_32-12      69.5ns ± 1%    70.3ns ± 1%  +1.14%  (p=0.000 n=18+17)
      RegexpMatchEasy1_1K-12       377ns ± 1%     366ns ± 1%  -3.03%  (p=0.000 n=15+19)
      RegexpMatchMedium_32-12      107ns ± 1%     107ns ± 2%    ~     (p=0.318 n=20+19)
      RegexpMatchMedium_1K-12     33.8µs ± 3%    33.5µs ± 1%  -1.04%  (p=0.001 n=20+19)
      RegexpMatchHard_32-12       1.68µs ± 1%    1.73µs ± 0%  +2.50%  (p=0.000 n=20+18)
      RegexpMatchHard_1K-12       50.8µs ± 1%    52.0µs ± 1%  +2.50%  (p=0.000 n=19+18)
      Revcomp-12                   381ms ± 1%     385ms ± 1%  +1.00%  (p=0.000 n=17+18)
      Template-12                 64.9ms ± 3%    62.6ms ± 1%  -3.55%  (p=0.000 n=19+18)
      TimeParse-12                 324ns ± 0%     328ns ± 1%  +1.25%  (p=0.000 n=18+18)
      TimeFormat-12                345ns ± 0%     334ns ± 0%  -3.31%  (p=0.000 n=15+17)
      [Geo mean]                  52.1µs         51.5µs       -1.00%
      
      Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
      Reviewed-on: https://go-review.googlesource.com/22632
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
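The scan rule described in this commit (stop at the first dead word, but never let word 1 stop the walk because its second bit is the checkmark) can be illustrated with a toy model. This is only a sketch: the `wordBits` type and `countPointers` function are hypothetical names invented here, and the real runtime packs these bits into a bitmap rather than a slice of structs.

```go
package main

import "fmt"

// wordBits is a toy stand-in for the two heap bitmap bits kept per word.
// It is not the runtime's real layout.
type wordBits struct {
	pointer bool // this word holds a pointer
	scan    bool // scan/dead bit: false means no pointers from here on
}

// countPointers walks an object's per-word bits and stops at the first
// dead word. Word 1 never stops the walk, because in this model its
// second bit belongs to the checkmark rather than to scan/dead.
func countPointers(obj []wordBits) (pointers, visited int) {
	for i, w := range obj {
		if i != 1 && !w.scan {
			break
		}
		visited++
		if w.pointer {
			pointers++
		}
	}
	return pointers, visited
}

func main() {
	obj := []wordBits{
		{pointer: true, scan: true},
		{pointer: false, scan: false}, // word 1: second bit is the checkmark
		{pointer: true, scan: true},
		{pointer: false, scan: false}, // dead: only scalars follow
		{pointer: false, scan: false},
	}
	p, v := countPointers(obj)
	fmt.Println(p, v) // 2 3

	// A noscan object has the first word marked dead, so nothing is visited.
	p, v = countPointers([]wordBits{{}, {}})
	fmt.Println(p, v) // 0 0
}
```

In this model a noscan check only needs to look at the first word, which matches the simplification the commit message describes for hasPointers.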
    • runtime: use morePointers and isPointer in more places · d5e3d08b
      Austin Clements authored
      This makes this code better self-documenting and makes it easier to
      find these places in the future.
      
      Change-Id: I31dc5598ae67f937fb9ef26df92fd41d01e983c3
      Reviewed-on: https://go-review.googlesource.com/22631
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: avoid conditional execution in morePointers and isPointer · a5d3f7ec
      Austin Clements authored
      heapBits.bits is carefully written to produce good machine code. Use
      it in heapBits.morePointers and heapBits.isPointer to get good machine
      code there, too.
      
      Change-Id: I208c7d0d38697e7a22cad67f692162589b75f1e2
      Reviewed-on: https://go-review.googlesource.com/22630
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: ecx is reserved for PIC, don't let peep work on it · 7a60a962
      Keith Randall authored
      Fixes #15496
      
      Change-Id: Ieb5be1caa4b1c23e23b20d56c1a0a619032a9f5d
      Reviewed-on: https://go-review.googlesource.com/22652
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
    • runtime: fix cgocallback_gofunc on ppc64x · 58f52cbb
      Michael Munday authored
      Fix issues introduced in 5f9a870b.
      
      Change-Id: Ia75945ef563956613bf88bbe57800a96455c265d
      Reviewed-on: https://go-review.googlesource.com/22661
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
  2. 29 Apr, 2016 29 commits
    • runtime: fix cgocallback_gofunc argument passing on arm64 · 9fe572e5
      Ian Lance Taylor authored
      Change-Id: I4b34bcd5cde71ecfbb352b39c4231de6168cc7f3
      Reviewed-on: https://go-review.googlesource.com/22651
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Michael Munday <munday@ca.ibm.com>
    • root: remove dev.garbage file · 36b6c038
      Matthew Dempsky authored
      Change-Id: I99b2ca52824341d986090f5c78ab4f396594bcdf
      Reviewed-on: https://go-review.googlesource.com/22660
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • cmd/cgo, runtime, runtime/cgo: use cgo context function · 5f9a870b
      Ian Lance Taylor authored
      Add support for the context function set by runtime.SetCgoTraceback.
      The context function was added in CL 17761, without support.
      This CL is the support.
      
      This CL has not been tested for real C code, as a working context
      function for C code requires unwind support that does not seem to exist.
      I wanted to get the CL out before the freeze.
      
      I apologize for the length of this CL.  It's mostly plumbing, but
      unfortunately the plumbing is processor-specific.
      
      Change-Id: I8ce11a0de9b3dafcc29efd2649d776e93bff0e90
      Reviewed-on: https://go-review.googlesource.com/22508
      Reviewed-by: Austin Clements <austin@google.com>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • crypto/cipher, crypto/aes: add s390x implementation of AES-CTR · c717675c
      Michael Munday authored
      This commit adds the new 'ctrAble' interface to the crypto/cipher
      package. The role of ctrAble is the same as gcmAble but for CTR
      instead of GCM. It allows block ciphers to provide optimized CTR
      implementations.
      
      The primary benefit of adding CTR support to the s390x AES
      implementation is that it allows us to encrypt the counter values
      in bulk, giving the cipher message instruction a larger chunk of
      data to work on per invocation.
      
      The xorBytes assembly is necessary because xorBytes becomes a
      bottleneck when CTR is done in this way. Hopefully it will be
      possible to remove this once s390x has migrated to the ssa
      backend.
      
      name      old speed     new speed     delta
      AESCTR1K  160MB/s ± 6%  867MB/s ± 0%  +442.42%  (p=0.000 n=9+10)
      
      Change-Id: I1ae16b0ce0e2641d2bdc7d7eabc94dd35f6e9318
      Reviewed-on: https://go-review.googlesource.com/22195
      Reviewed-by: Adam Langley <agl@golang.org>
    • crypto/cipher, crypto/aes: add s390x implementation of AES-CBC · 2f847564
      Michael Munday authored
      This commit adds the cbcEncAble and cbcDecAble interfaces that
      can be implemented by block ciphers that support an optimized
      implementation of CBC. This is similar to what is done for GCM
      with the gcmAble interface.
      
      The cbcEncAble, cbcDecAble and gcmAble interfaces all now have
      tests to ensure they are detected correctly in the cipher
      package.
      
      name             old speed     new speed      delta
      AESCBCEncrypt1K  152MB/s ± 1%  1362MB/s ± 0%  +795.59%   (p=0.000 n=10+9)
      AESCBCDecrypt1K  143MB/s ± 1%  1362MB/s ± 0%  +853.00%   (p=0.000 n=10+9)
      
      Change-Id: I715f686ab3686b189a3dac02f86001178fa60580
      Reviewed-on: https://go-review.googlesource.com/22523
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Adam Langley <agl@golang.org>
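The optional-interface pattern this commit relies on can be sketched with stand-in types. Everything below is illustrative: `Block`, `BlockMode`, `plainBlock`, `fastBlock` and `newCBCEncrypter` are hypothetical, and the method set given to `cbcEncAble` is an assumption rather than a quotation of the crypto/cipher source.

```go
package main

import "fmt"

// Block and BlockMode are minimal stand-ins for the crypto/cipher types.
type Block interface{ BlockSize() int }

type BlockMode interface{ Name() string }

// cbcEncAble is the optional interface a block cipher may implement to
// supply its own CBC encrypter (signature assumed for this sketch).
type cbcEncAble interface {
	NewCBCEncrypter(iv []byte) BlockMode
}

type mode string

func (m mode) Name() string { return string(m) }

// plainBlock has no optimized CBC implementation.
type plainBlock struct{}

func (plainBlock) BlockSize() int { return 16 }

// fastBlock additionally implements cbcEncAble.
type fastBlock struct{ plainBlock }

func (fastBlock) NewCBCEncrypter(iv []byte) BlockMode { return mode("accelerated") }

// newCBCEncrypter prefers the cipher's own implementation when the type
// assertion succeeds and falls back to a generic one otherwise.
func newCBCEncrypter(b Block, iv []byte) BlockMode {
	if fast, ok := b.(cbcEncAble); ok {
		return fast.NewCBCEncrypter(iv)
	}
	return mode("generic")
}

func main() {
	iv := make([]byte, 16)
	fmt.Println(newCBCEncrypter(plainBlock{}, iv).Name()) // generic
	fmt.Println(newCBCEncrypter(fastBlock{}, iv).Name())  // accelerated
}
```

The tests mentioned in the commit message guard exactly this kind of detection: if the assertion silently stopped matching, callers would still get correct output but lose the fast path.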
    • cmd/compile: make vet happy with ssa code · cd956576
      Keith Randall authored
      Fixes #15488
      
      Change-Id: I054eb1e1c859de315e3cdbdef5428682bce693fd
      Reviewed-on: https://go-review.googlesource.com/22609
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • Merge remote-tracking branch 'origin/dev.garbage' · 56b54912
      Rick Hudson authored
      This commit moves the GC from free list allocation to
      bit mark allocation. Instead of using the bitmaps
      generated during the mark phases to generate free
      lists and then using the free lists for allocation, we
      allocate directly from the bitmaps.
      
      The change in the garbage benchmark:
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  2.22ms ± 1%  2.13ms ± 1%  -3.90%  (p=0.000 n=18+18)
      
      Change-Id: I17f57233336f0ca5ef5404c3be4ecb443ab622aa
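Allocating directly from a bitmap, rather than from a free list built out of it, can be sketched as follows. This is a toy under stated assumptions: the `span` type here holds at most 64 slots in a single word, and its field and method names are invented for the example rather than taken from the runtime.

```go
package main

import (
	"fmt"
	"math/bits"
)

// span is a toy span with at most 64 object slots.
type span struct {
	nelems    int    // number of slots
	allocBits uint64 // bit i set: slot i is in use
	markBits  uint64 // bit i set: slot i was reached by the last mark phase
}

// alloc finds the lowest clear bit in allocBits and claims it.
// No free list is consulted or maintained.
func (s *span) alloc() (int, bool) {
	idx := bits.TrailingZeros64(^s.allocBits)
	if idx >= s.nelems {
		return 0, false // span is full
	}
	s.allocBits |= 1 << uint(idx)
	return idx, true
}

// mark records that slot i is still reachable.
func (s *span) mark(i int) { s.markBits |= 1 << uint(i) }

// sweep makes the mark bitmap the new allocation bitmap, so every
// unmarked slot becomes allocatable without being threaded onto a list.
func (s *span) sweep() {
	s.allocBits = s.markBits
	s.markBits = 0
}

func main() {
	s := &span{nelems: 4}
	a, _ := s.alloc()
	b, _ := s.alloc()
	fmt.Println(a, b) // 0 1

	s.mark(1) // only slot 1 survives the collection
	s.sweep()

	c, _ := s.alloc()
	d, _ := s.alloc()
	fmt.Println(c, d) // 0 2
}
```

In this model sweeping a span is a constant-time swap of bitmaps, whereas a free-list design has to visit each dead object to link it in.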
    • [dev.garbage] runtime: simplify nextFreeFast so it is inlined · e9eaa181
      Rick Hudson authored
      nextFreeFast is currently not inlined by the compiler due
      to its size and complexity. This CL simplifies
      nextFreeFast by letting the slow path (nextFree)
      handle a corner case.
      
      Change-Id: Ia9c5d1a7912bcb4bec072f5fd240f0e0bafb20e4
      Reviewed-on: https://go-review.googlesource.com/22598
      Reviewed-by: Austin Clements <austin@google.com>
      Run-TryBot: Austin Clements <austin@google.com>
    • cmd/compile: Move divconst_test out of test/bench/go1 · d8d33514
      David Chase authored
      This is necessary to avoid disrupting the go1 suite and gives
      us a place to put other tests of basic compiler function and
      correctness.
      
      Change-Id: I36933819ff2bfe6a2121fff2be9a98efd2123d9a
      Reviewed-on: https://go-review.googlesource.com/22597
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: clean up rewrite rules · fa9435cd
      Keith Randall authored
      Break really long lines.
      Add spacing to line up columns.
      
      In AMD64, put all the optimization rules after all the
      lowering rules.
      
      Change-Id: I45cc7368bf278416e67f89e74358db1bd4326a93
      Reviewed-on: https://go-review.googlesource.com/22470
      Reviewed-by: David Chase <drchase@google.com>
    • [dev.garbage] runtime: revive sweep fast path · b3579c09
      Austin Clements authored
      sweep used to skip mcentral.freeSpan (and its locking) if it didn't
      find any new free objects. We lost that optimization when the
      freed-object counting changed in dad83f7 to count total free objects
      instead of newly freed objects.
      
      The previous commit brings back counting of newly freed objects, so we
      can easily revive this optimization by checking that count (like we
      used to) instead of the total free objects count.
      
      Change-Id: I43658707a1c61674d0366124d5976b00d98741a9
      Reviewed-on: https://go-review.googlesource.com/22596
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • [dev.garbage] runtime: fix nfree accounting · d97625ae
      Austin Clements authored
      Commit 8dda1c4c changed the meaning of "nfree" in sweep from the number
      of newly freed objects to the total number of free objects in the
      span, but didn't update where sweep added nfree to c.local_nsmallfree.
      Hence, we're over-accounting the number of frees. This is causing
      TestArrayHash to fail with "too many allocs NNN - hash not balanced".
      
      Fix this by computing the number of newly freed objects and adding
      that to c.local_nsmallfree, so it behaves like it used to. Computing
      this requires a small tweak to mallocgc: apparently we've never set
      s.allocCount when allocating a large object; fix this by setting it to
      1 so sweep doesn't get confused.
      
      Change-Id: I31902ffd310110da4ffd807c5c06f1117b872dc8
      Reviewed-on: https://go-review.googlesource.com/22595
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
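The difference between the two counts in this fix can be shown with a little arithmetic. The sketch below is hypothetical: `sweepCounts` is a name made up for the example, and it models a span as a slot count, an allocation count from before the collection, and a 64-bit mark bitmap.

```go
package main

import (
	"fmt"
	"math/bits"
)

// sweepCounts returns two different quantities for one span:
//   nfreed:    objects that were allocated before this GC and are now dead
//   totalFree: every free slot in the span, including never-used ones
func sweepCounts(nelems, allocCount int, markBits uint64) (nfreed, totalFree int) {
	live := bits.OnesCount64(markBits)
	return allocCount - live, nelems - live
}

func main() {
	// 8 slots, 5 were allocated, 3 of those survived marking.
	nfreed, totalFree := sweepCounts(8, 5, 0x13)
	fmt.Println(nfreed, totalFree) // 2 5

	// Adding totalFree (5) to a running free counter instead of nfreed (2)
	// would over-count by the 3 slots that were never allocated.

	// A span where nothing died yields nfreed == 0, the condition under
	// which extra work for newly freed objects can be skipped.
	nfreed, _ = sweepCounts(8, 3, 0x07)
	fmt.Println(nfreed == 0) // true
}
```

A large object occupies a span with a single slot, so in this model its allocation count has to be 1 for the subtraction to come out right, which matches the mallocgc tweak the message mentions.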
    • [dev.garbage] runtime: fix allocfreetrace · 6d114905
      Austin Clements authored
      We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
      when we removed the sweep over the mark bitmap. Fix it by
      re-introducing the sweep over the bitmap specifically if we're in
      allocfreetrace mode. This doesn't have to be even remotely efficient,
      since the overhead of allocfreetrace is huge anyway, so we can keep
      the code for this down to just a few lines.
      
      Change-Id: I9e176b3b04c73608a0ea3068d5d0cd30760ebd40
      Reviewed-on: https://go-review.googlesource.com/22592
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • [dev.garbage] runtime: reintroduce no-zeroing optimization · 38f67468
      Austin Clements authored
      Currently we always zero objects when we allocate them. We used to
      have an optimization that would not zero objects that had not been
      allocated since the whole span was last zeroed (either by getting it
      from the system or by getting it from the heap, which does a bulk
      zero), but this depended on the sweeper clobbering the first two words
      of each object. Hence, we lost this optimization when the bitmap
      sweeper went away.
      
      Re-introduce this optimization using a different mechanism. Each span
      already keeps a flag indicating that it just came from the OS or was
      just bulk zeroed by the mheap. We can simply use this flag to know
      when we don't need to zero an object. This is slightly less efficient
      than the old optimization: if a span gets allocated and partially
      used, then GC happens and the span gets returned to the mcentral, then
      the span gets re-acquired, the old optimization knew that it only had
      to re-zero the objects that had been reclaimed, whereas this
      optimization will re-zero everything. However, in this case, you're
      already paying for the garbage collection, and you've only wasted one
      zeroing of the span, so in practice there seems to be little
      difference. (If we did want to revive the full optimization, each span
      could keep track of a frontier beyond which all free slots are zeroed.
      I prototyped this and it didn't obviously do any better than the much
      simpler approach in this commit.)
      
      This significantly improves BinaryTree17, which is allocation-heavy
      (and runs first, so most pages are already zeroed), and slightly
      improves everything else.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  2.15ms ± 1%  2.14ms ± 1%  -0.80%  (p=0.000 n=17+17)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.71s ± 1%     2.56s ± 1%  -5.73%        (p=0.000 n=18+19)
      DivconstI64-12              1.70ns ± 1%    1.70ns ± 1%    ~           (p=0.562 n=18+18)
      DivconstU64-12              1.74ns ± 2%    1.74ns ± 1%    ~           (p=0.394 n=20+20)
      DivconstI32-12              1.74ns ± 0%    1.74ns ± 0%    ~     (all samples are equal)
      DivconstU32-12              1.66ns ± 1%    1.66ns ± 0%    ~           (p=0.516 n=15+16)
      DivconstI16-12              1.84ns ± 0%    1.84ns ± 0%    ~     (all samples are equal)
      DivconstU16-12              1.82ns ± 0%    1.82ns ± 0%    ~     (all samples are equal)
      DivconstI8-12               1.79ns ± 0%    1.79ns ± 0%    ~     (all samples are equal)
      DivconstU8-12               1.60ns ± 0%    1.60ns ± 1%    ~           (p=0.603 n=17+19)
      Fannkuch11-12                2.11s ± 1%     2.11s ± 0%    ~           (p=0.333 n=16+19)
      FmtFprintfEmpty-12          45.1ns ± 4%    45.4ns ± 5%    ~           (p=0.111 n=20+20)
      FmtFprintfString-12          134ns ± 0%     129ns ± 0%  -3.45%        (p=0.000 n=18+16)
      FmtFprintfInt-12             131ns ± 1%     129ns ± 1%  -1.54%        (p=0.000 n=16+18)
      FmtFprintfIntInt-12          205ns ± 2%     203ns ± 0%  -0.56%        (p=0.014 n=20+18)
      FmtFprintfPrefixedInt-12     200ns ± 2%     197ns ± 1%  -1.48%        (p=0.000 n=20+18)
      FmtFprintfFloat-12           256ns ± 1%     256ns ± 0%  -0.21%        (p=0.008 n=18+20)
      FmtManyArgs-12               805ns ± 0%     804ns ± 0%  -0.19%        (p=0.001 n=18+18)
      GobDecode-12                7.21ms ± 1%    7.14ms ± 1%  -0.92%        (p=0.000 n=19+20)
      GobEncode-12                5.88ms ± 1%    5.88ms ± 1%    ~           (p=0.641 n=18+19)
      Gzip-12                      218ms ± 1%     218ms ± 1%    ~           (p=0.271 n=19+18)
      Gunzip-12                   37.1ms ± 0%    36.9ms ± 0%  -0.29%        (p=0.000 n=18+17)
      HTTPClientServer-12         78.1µs ± 2%    77.4µs ± 2%    ~           (p=0.070 n=19+19)
      JSONEncode-12               15.5ms ± 1%    15.5ms ± 0%    ~           (p=0.063 n=20+18)
      JSONDecode-12               56.1ms ± 0%    55.4ms ± 1%  -1.18%        (p=0.000 n=19+18)
      Mandelbrot200-12            4.05ms ± 0%    4.06ms ± 0%  +0.29%        (p=0.001 n=18+18)
      GoParse-12                  3.28ms ± 1%    3.21ms ± 1%  -2.30%        (p=0.000 n=20+20)
      RegexpMatchEasy0_32-12      69.4ns ± 2%    69.3ns ± 1%    ~           (p=0.205 n=18+16)
      RegexpMatchEasy0_1K-12       239ns ± 0%     239ns ± 0%    ~     (all samples are equal)
      RegexpMatchEasy1_32-12      69.4ns ± 1%    69.4ns ± 1%    ~           (p=0.620 n=15+18)
      RegexpMatchEasy1_1K-12       370ns ± 1%     369ns ± 2%    ~           (p=0.088 n=20+20)
      RegexpMatchMedium_32-12      108ns ± 0%     108ns ± 0%    ~     (all samples are equal)
      RegexpMatchMedium_1K-12     33.6µs ± 3%    33.5µs ± 3%    ~           (p=0.718 n=20+20)
      RegexpMatchHard_32-12       1.68µs ± 1%    1.67µs ± 2%    ~           (p=0.316 n=20+20)
      RegexpMatchHard_1K-12       50.5µs ± 3%    50.4µs ± 3%    ~           (p=0.659 n=20+20)
      Revcomp-12                   381ms ± 1%     381ms ± 1%    ~           (p=0.916 n=19+18)
      Template-12                 66.5ms ± 1%    65.8ms ± 2%  -1.08%        (p=0.000 n=20+20)
      TimeParse-12                 317ns ± 0%     319ns ± 0%  +0.48%        (p=0.000 n=19+12)
      TimeFormat-12                338ns ± 0%     338ns ± 0%    ~           (p=0.124 n=19+18)
      [Geo mean]                  5.99µs         5.96µs       -0.54%
      
      Change-Id: I638ffd9d9f178835bbfa499bac20bd7224f1a907
      Reviewed-on: https://go-review.googlesource.com/22591
      Reviewed-by: Rick Hudson <rlh@golang.org>
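The per-span flag described above can be modelled in a few lines. This is a sketch only: the `span` type, its `needzero` and `zeroings` fields, and `allocSlot` are stand-ins invented for illustration, with `zeroings` added purely so the effect is observable.

```go
package main

import "fmt"

// span is a toy span: a byte slice carved into fixed-size slots.
type span struct {
	mem      []byte
	elemSize int
	needzero bool // false: memory is known to be zero already
	zeroings int  // how many slots had to be cleared (for the demo)
}

// allocSlot hands out slot i, clearing it only when the span's memory
// is not already known to be zero.
func (s *span) allocSlot(i int) []byte {
	slot := s.mem[i*s.elemSize : (i+1)*s.elemSize]
	if s.needzero {
		for j := range slot {
			slot[j] = 0
		}
		s.zeroings++
	}
	return slot
}

func main() {
	// A span fresh from the OS (or just bulk-zeroed): no per-object work.
	fresh := &span{mem: make([]byte, 64), elemSize: 16}
	fresh.allocSlot(0)
	fresh.allocSlot(1)
	fmt.Println(fresh.zeroings) // 0

	// A span that was used before: every slot is cleared on allocation,
	// even ones that were never handed out. This is the coarse part of
	// the trade-off discussed in the commit message.
	dirty := make([]byte, 64)
	for i := range dirty {
		dirty[i] = 0xAA
	}
	reused := &span{mem: dirty, elemSize: 16, needzero: true}
	slot := reused.allocSlot(0)
	fmt.Println(reused.zeroings, slot[0]) // 1 0
}
```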
    • compress/flate: use a constant hash table size for Best Speed. · 1fb4e4de
      Nigel Tao authored
      This makes compress/flate's version of Snappy diverge from the upstream
      golang/snappy version, but the latter has a goal of matching C++ snappy
      output byte-for-byte. Both C++ and the asm version of golang/snappy can
      use a smaller N for the O(N) zero-initialization of the hash table when
      the input is small, even if the pure Go golang/snappy algorithm cannot:
      "var table [tableSize]uint16" zeroes all tableSize elements.
      
      For this package, we don't have the match-C++-snappy goal, so we can use
      a different (constant) hash table size.
      
      This is a small win, in terms of throughput and output size, but it also
      enables us to re-use the (constant size) hash table between
      encodeBestSpeed calls, avoiding the cost of zero-initializing the hash
      table altogether. This will be implemented in follow-up commits.
      
      This package's benchmarks:
      name                    old speed      new speed      delta
      EncodeDigitsSpeed1e4-8  72.8MB/s ± 1%  73.5MB/s ± 1%  +0.86%  (p=0.000 n=10+10)
      EncodeDigitsSpeed1e5-8  77.5MB/s ± 1%  78.0MB/s ± 0%  +0.69%  (p=0.000 n=10+10)
      EncodeDigitsSpeed1e6-8  82.0MB/s ± 1%  82.7MB/s ± 1%  +0.85%   (p=0.000 n=10+9)
      EncodeTwainSpeed1e4-8   65.1MB/s ± 1%  65.6MB/s ± 0%  +0.78%   (p=0.000 n=10+9)
      EncodeTwainSpeed1e5-8   80.0MB/s ± 0%  80.6MB/s ± 1%  +0.66%   (p=0.000 n=9+10)
      EncodeTwainSpeed1e6-8   81.6MB/s ± 1%  82.1MB/s ± 1%  +0.55%  (p=0.017 n=10+10)
      
      Input size in bytes, output size (and time taken) before and after on
      some larger files:
      1073741824   57269781 (  3183ms)   57269781 (  3177ms) adresser.001
      1000000000  391052000 ( 11071ms)  391051996 ( 11067ms) enwik9
      1911399616  378679516 ( 13450ms)  378679514 ( 13079ms) gob-stream
      8558382592 3972329193 ( 99962ms) 3972329193 ( 91290ms) rawstudio-mint14.tar
       200000000  200015265 (   776ms)  200015265 (   774ms) sharnd.out
      
      Thanks to Klaus Post for the original suggestion on cl/21021.
      
      Change-Id: Ia4c63a8d1b92c67e1765ec5c3c8c69d289d9a6ce
      Reviewed-on: https://go-review.googlesource.com/22604
      Reviewed-by: Russ Cox <rsc@golang.org>
    • cmd/compile/internal/gc: bv.go cleanup · 5edcff01
      Dave Cheney authored
      Drive-by gardening of bv.go.
      
      - Unexport the Bvec type, it is not used outside internal/gc.
        (machine translated with gofmt -r)
      - Removed unused constants and functions.
        (driven by cmd/unused)
      
      Change-Id: I3433758ad4e62439f802f4b0ed306e67336d9aba
      Reviewed-on: https://go-review.googlesource.com/22602
      Run-TryBot: Dave Cheney <dave@cheney.net>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • misc/cgo/testcarchive: fix C include path for darwin/arm · 94e523cb
      Cherry Zhang authored
      After CL 22461, c-archive build on darwin/arm is by default compiled
      with -shared and installed in pkg/darwin_arm_shared.
      
      Fix build (2nd time...)
      
      Change-Id: Ia2bb09bb6e1ebc9bc74f7570dd80c81d05eaf744
      Reviewed-on: https://go-review.googlesource.com/22534
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • compress/flate: replace "Best Speed" with specialized version · d8b7bd6a
      Nigel Tao authored
      This encoding algorithm, which prioritizes speed over output size, is
      based on Snappy's LZ77-style encoder: github.com/golang/snappy
      
      This commit keeps the diff between this package's encodeBestSpeed
      function and Snappy's encodeBlock function as small as possible (see
      the diff below). Follow-up commits will improve this package's
      performance and output size.
      
      This package's speed benchmarks:
      
      name                    old speed      new speed      delta
      EncodeDigitsSpeed1e4-8  40.7MB/s ± 0%  73.0MB/s ± 0%   +79.18%  (p=0.008 n=5+5)
      EncodeDigitsSpeed1e5-8  33.0MB/s ± 0%  77.3MB/s ± 1%  +134.04%  (p=0.008 n=5+5)
      EncodeDigitsSpeed1e6-8  32.1MB/s ± 0%  82.1MB/s ± 0%  +156.18%  (p=0.008 n=5+5)
      EncodeTwainSpeed1e4-8   42.1MB/s ± 0%  65.0MB/s ± 0%   +54.61%  (p=0.008 n=5+5)
      EncodeTwainSpeed1e5-8   46.3MB/s ± 0%  80.0MB/s ± 0%   +72.81%  (p=0.008 n=5+5)
      EncodeTwainSpeed1e6-8   47.3MB/s ± 0%  81.7MB/s ± 0%   +72.86%  (p=0.008 n=5+5)
      
      Here's the milliseconds taken, before and after this commit, to compress
      a number of test files:
      
      Go's src/compress/testdata files:
      
           4          1 e.txt
           8          4 Mark.Twain-Tom.Sawyer.txt
      
      github.com/golang/snappy's benchmark files:
      
           3          1 alice29.txt
          12          3 asyoulik.txt
           6          1 fireworks.jpeg
           1          1 geo.protodata
           1          0 html
           2          2 html_x_4
           6          3 kppkn.gtb
          11          4 lcet10.txt
           5          1 paper-100k.pdf
          14          6 plrabn12.txt
          17          6 urls.10K
      
      Larger files linked to from
      https://docs.google.com/spreadsheets/d/1VLxi-ac0BAtf735HyH3c1xRulbkYYUkFecKdLPH7NIQ/edit#gid=166102500
      
        2409       3182 adresser.001
       16757      11027 enwik9
       13764      12946 gob-stream
      153978      74317 rawstudio-mint14.tar
        4371        770 sharnd.out
      
      Output size is larger. In the table below, the first column is the input
      size, the second column is the output size prior to this commit, the
      third column is the output size after this commit.
      
          100003      47707      50006 e.txt
          387851     172707     182930 Mark.Twain-Tom.Sawyer.txt
          152089      62457      66705 alice29.txt
          125179      54503      57274 asyoulik.txt
          123093     122827     123108 fireworks.jpeg
          118588      18574      20558 geo.protodata
          102400      16601      17305 html
          409600      65506      70313 html_x_4
          184320      49007      50944 kppkn.gtb
          426754     166957     179355 lcet10.txt
          102400      82126      84937 paper-100k.pdf
          481861     218617     231988 plrabn12.txt
          702087     241774     258020 urls.10K
      1073741824   43074110   57269781 adresser.001
      1000000000  365772256  391052000 enwik9
      1911399616  340364558  378679516 gob-stream
      8558382592 3807229562 3972329193 rawstudio-mint14.tar
       200000000  200061040  200015265 sharnd.out
      
      The diff between github.com/golang/snappy's encodeBlock function and
      this commit's encodeBestSpeed function:
      
      1c1,7
      < func encodeBlock(dst, src []byte) (d int) {
      ---
      > func encodeBestSpeed(dst []token, src []byte) []token {
      > 	// This check isn't in the Snappy implementation, but there, the caller
      > 	// instead of the callee handles this case.
      > 	if len(src) < minNonLiteralBlockSize {
      > 		return emitLiteral(dst, src)
      > 	}
      >
      4c10
      < 	// and len(src) <= maxBlockSize and maxBlockSize == 65536.
      ---
      > 	// and len(src) <= maxStoreBlockSize and maxStoreBlockSize == 65535.
      65c71
      < 			if load32(src, s) == load32(src, candidate) {
      ---
      > 			if s-candidate < maxOffset && load32(src, s) == load32(src, candidate) {
      73c79
      < 		d += emitLiteral(dst[d:], src[nextEmit:s])
      ---
      > 		dst = emitLiteral(dst, src[nextEmit:s])
      90c96
      < 			// This is an inlined version of:
      ---
      > 			// This is an inlined version of Snappy's:
      93c99,103
      < 			for i := candidate + 4; s < len(src) && src[i] == src[s]; i, s = i+1, s+1 {
      ---
      > 			s1 := base + maxMatchLength
      > 			if s1 > len(src) {
      > 				s1 = len(src)
      > 			}
      > 			for i := candidate + 4; s < s1 && src[i] == src[s]; i, s = i+1, s+1 {
      96c106,107
      < 			d += emitCopy(dst[d:], base-candidate, s-base)
      ---
      > 			// matchToken is flate's equivalent of Snappy's emitCopy.
      > 			dst = append(dst, matchToken(uint32(s-base-3), uint32(base-candidate-minOffsetSize)))
      114c125
      < 			if uint32(x>>8) != load32(src, candidate) {
      ---
      > 			if s-candidate >= maxOffset || uint32(x>>8) != load32(src, candidate) {
      124c135
      < 		d += emitLiteral(dst[d:], src[nextEmit:])
      ---
      > 		dst = emitLiteral(dst, src[nextEmit:])
      126c137
      < 	return d
      ---
      > 	return dst
      
      This change is based on https://go-review.googlesource.com/#/c/21021/ by
      Klaus Post, but it is a separate changelist as cl/21021 seems to have
      stalled in code review, and the Go 1.7 feature freeze approaches.
      
      Golang-dev discussion:
      https://groups.google.com/d/topic/golang-dev/XYgHX9p8IOk/discussion and
      of course cl/21021.
      
      Change-Id: Ib662439417b3bd0b61c2977c12c658db3e44d164
      Reviewed-on: https://go-review.googlesource.com/22370
      Reviewed-by: Russ Cox <rsc@golang.org>
    • [dev.garbage] runtime: eliminate mspan.start · 3e246238
      Austin Clements authored
      This converts all remaining uses of mspan.start to instead use
      mspan.base(). In many cases, this actually reduces the complexity of
      the code.
      
      Change-Id: If113840e00d3345a6cf979637f6a152e6344aee7
      Reviewed-on: https://go-review.googlesource.com/22590
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      3e246238
    • Austin Clements's avatar
      [dev.garbage] runtime: use s.base() everywhere it makes sense · b7adc41f
      Austin Clements authored
      Currently we have lots of (s.start << _PageShift) and variants. We now
      have an s.base() function that returns this. It's faster and more
      readable, so use it.
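
      As a rough illustration of the refactoring, the sketch below hides the
      shift behind a method. The span type and the pageShift value are
      assumptions made for the example; the runtime's mspan and its base()
      implementation may differ.

```go
package main

import "fmt"

// pageShift is an assumed value (8 KiB pages), used only for illustration.
const pageShift = 13

// span is a hypothetical stand-in for the runtime's mspan.
type span struct {
	start uintptr // page number of the first page in the span
}

// base returns the address of the first byte of the span, so callers no
// longer spell out (s.start << pageShift) themselves.
func (s *span) base() uintptr {
	return s.start << pageShift
}

func main() {
	s := &span{start: 3}
	fmt.Printf("base address: %#x\n", s.base())
}
```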
      
      Change-Id: I888060a9dae15ea75ca8cc1c2b31c905e71b452b
      Reviewed-on: https://go-review.googlesource.com/22559
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      b7adc41f
    • Austin Clements's avatar
      [dev.garbage] runtime: document sysAlloc · 2e8b74b6
      Austin Clements authored
      In particular, it always returns an aligned pointer.
      
      Change-Id: I763789a539a4bfd8b0efb36a39a80be1a479d3e2
      Reviewed-on: https://go-review.googlesource.com/22558
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      2e8b74b6
    • Austin Clements's avatar
      [dev.garbage] runtime: remove unused head/end arguments from freeSpan · 15744c92
      Austin Clements authored
      These used to be used for the list of newly freed objects, but that's
      no longer a thing.
      
      Change-Id: I5a4503137b74ec0eae5372ca271b1aa0b32df074
      Reviewed-on: https://go-review.googlesource.com/22557
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      15744c92
    • Brad Fitzpatrick's avatar
      context: produce a nicer panic message for a nil WithValue key · c884f659
      Brad Fitzpatrick authored
      Change-Id: I2e8ae403622ba7131cadaba506100d79613183f1
      Reviewed-on: https://go-review.googlesource.com/22601
      Reviewed-by: Russ Cox <rsc@golang.org>
      Reviewed-by: Andrew Gerrand <adg@golang.org>
      c884f659
    • Alex Brainman's avatar
      debug/pe: .bss section must contain only zeros · 694846a5
      Alex Brainman authored
      The .bss section has no data stored in the PE file, but the linker
      assumes that every byte of .bss section data is zero. (*Section).Data
      currently returns garbage for such a section. Change (*Section).Data
      so that it returns a slice filled with zeros.
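
      The intended behavior can be sketched as follows. The section type and
      its fields are hypothetical and only model the idea; this is not the
      debug/pe implementation.

```go
package main

import "fmt"

// section is a hypothetical stand-in for a PE section; it is not the
// debug/pe type.
type section struct {
	virtualSize uint32 // size of the section once loaded
	raw         []byte // bytes stored in the file; empty for .bss
}

// data returns the section contents. A section with no stored bytes yields
// a zero-filled slice of its virtual size rather than unrelated file bytes.
func (s *section) data() []byte {
	if len(s.raw) == 0 {
		return make([]byte, s.virtualSize)
	}
	out := make([]byte, len(s.raw))
	copy(out, s.raw)
	return out
}

func main() {
	bss := &section{virtualSize: 8}
	fmt.Println(bss.data())
}
```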
      
      Updates #15345
      
      Change-Id: I1fa5138244a9447e1d59dec24178b1dd0fd4c5d7
      Reviewed-on: https://go-review.googlesource.com/22544
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      694846a5
    • Robert Griesemer's avatar
      test: added test case for (fixed) issue 15470 · d954f9c4
      Robert Griesemer authored
      Follow-up to https://golang.org/cl/22543.
      
      Change-Id: I873b4fa6616ac2aea8faada2fccd126233bbc07f
      Reviewed-on: https://go-review.googlesource.com/22583
      Run-TryBot: Robert Griesemer <gri@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      d954f9c4
    • Russ Cox's avatar
      cmd/go, go/build: add support for binary-only packages · af6aa0fd
      Russ Cox authored
      See https://golang.org/design/2775-binary-only-packages for design.
      
      Fixes #2775.
      
      Change-Id: I33e74eebffadc14d3340bba96083af0dec5172d5
      Reviewed-on: https://go-review.googlesource.com/22433
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      af6aa0fd
    • Nigel Tao's avatar
      image/gif: accept an out-of-bounds transparent color index. · 4618dd87
      Nigel Tao authored
      This is an error according to the spec, but Firefox and Google Chrome
      seem OK with this.
      
      Fixes #15059.
      
      Change-Id: I841cf44e96655e91a2481555f38fbd7055a32202
      Reviewed-on: https://go-review.googlesource.com/22546
      Reviewed-by: Rob Pike <r@golang.org>
      4618dd87
    • Rick Hudson's avatar
      [dev.garbage] runtime: use sys.Ctz64 intrinsic · 2fb75ea6
      Rick Hudson authored
      Our compilers now provide intrinsics, including
      sys.Ctz64, that support CTZ (count trailing zeros)
      instructions. This CL replaces the Go versions
      of CTZ with the compiler intrinsic.
      
      Count trailing zeros (CTZ) finds the least
      significant 1 in a word and returns the number
      of 0 bits below it.
      
      Allocation uses the bitmap created by the garbage
      collector to locate an unmarked object. The logic
      takes a word of the bitmap, complements it, and then
      caches it. It then uses CTZ to locate an available
      unmarked object. It then shifts marked bits out of
      the bitmap word, preparing it for the next search.
      Once all the unmarked objects in the cached word
      are used, the logic takes another word of the
      bitmap and repeats the process.
      
      Change-Id: Id2fc42d1d4b9893efaa2e1bd01896985b7e42f82
      Reviewed-on: https://go-review.googlesource.com/21366
      Reviewed-by: Austin Clements <austin@google.com>
      2fb75ea6
    • Rick Hudson's avatar
      [dev.garbage] runtime: restructure alloc and mark bits · 2063d5d9
      Rick Hudson authored
      Two changes are included here that depend on each other.
      The first is that allocBits and gcmarkBits are changed to
      *uint8 pointers to the first byte of that span's
      mark and alloc bits. Several places were altered to
      perform pointer arithmetic to locate the byte corresponding
      to an object in the span. The actual bit corresponding
      to an object is indexed in the byte by using the lower three
      bits of the object's index.
      
      The second change avoids the redundant calculation of an
      object's index. The index is returned from heapBitsForObject
      and then used by the functions indexing allocBits
      and gcmarkBits.
      
      Finally, we no longer allocate the GC bits in the span
      structures. Instead we use an arena-based allocation scheme
      that allows for a more compact bitmap as well as recycling
      and bulk clearing of the mark bits.
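
      The byte-and-bit indexing from the first change can be sketched as
      follows. The sketch uses a byte slice in place of the runtime's *uint8
      plus pointer arithmetic, and the type and method names are
      hypothetical.

```go
package main

import "fmt"

// bitmap is a hypothetical stand-in for a span's mark or alloc bits.
// The runtime uses a *uint8 and pointer arithmetic; a slice keeps this
// sketch safe and self-contained.
type bitmap []uint8

// locate returns the byte holding the bit for object objIndex and the mask
// selecting that bit. The lower three bits of the index pick the bit within
// the byte; the remaining bits pick the byte.
func locate(objIndex uintptr) (byteIndex uintptr, mask uint8) {
	return objIndex / 8, uint8(1) << (objIndex & 7)
}

// set records the bit for object objIndex.
func (b bitmap) set(objIndex uintptr) {
	i, mask := locate(objIndex)
	b[i] |= mask
}

// isSet reports whether the bit for object objIndex is set.
func (b bitmap) isSet(objIndex uintptr) bool {
	i, mask := locate(objIndex)
	return b[i]&mask != 0
}

func main() {
	marks := make(bitmap, 2) // room for 16 objects
	marks.set(10)
	fmt.Println(marks.isSet(10), marks.isSet(9))
}
```

      Computing the index once and passing it to both lookups mirrors the
      second change, which avoids recalculating an object's index.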
      
      Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
      Reviewed-on: https://go-review.googlesource.com/20705
      Reviewed-by: Austin Clements <austin@google.com>
      2063d5d9
  3. 28 Apr, 2016 6 commits