  1. 24 Jan, 2014 6 commits
    • runtime: combine small NoScan allocations · 1fa70294
      Dmitriy Vyukov authored
      Combine NoScan allocations < 16 bytes into a single memory block.
      Reduces number of allocations on json/garbage benchmarks by 10+%.
      
      json-1
      allocated                 8039872      7949194      -1.13%
      allocs                     105774        93776     -11.34%
      cputime                 156200000    100700000     -35.53%
      gc-pause-one              4908873      3814853     -22.29%
      gc-pause-total            2748969      2899288      +5.47%
      rss                      52674560     43560960     -17.30%
      sys-gc                    3796976      3256304     -14.24%
      sys-heap                 43843584     35192832     -19.73%
      sys-other                 5589312      5310784      -4.98%
      sys-stack                  393216       393216      +0.00%
      sys-total                53623088     44153136     -17.66%
      time                    156193436    100886714     -35.41%
      virtual-mem             256548864    256540672      -0.00%
      
      garbage-1
      allocated                 2996885      2932982      -2.13%
      allocs                      62904        55200     -12.25%
      cputime                  17470000     17400000      -0.40%
      gc-pause-one            932757485    925806143      -0.75%
      gc-pause-total            4663787      4629030      -0.75%
      rss                    1151074304   1133670400      -1.51%
      sys-gc                   66068352     65085312      -1.49%
      sys-heap               1039728640   1024065536      -1.51%
      sys-other                38038208     37485248      -1.45%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1152485952   1135417920      -1.48%
      time                     17478088     17418005      -0.34%
      virtual-mem            1343709184   1324204032      -1.45%
      
      LGTM=iant, bradfitz
      R=golang-codereviews, dave, iant, rsc, bradfitz
      CC=golang-codereviews, khr
      https://golang.org/cl/38750047
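The idea of combining small pointer-free ("NoScan") allocations can be sketched roughly as follows. This is a simplified illustration, not the runtime's code: `tinyAllocator`, `tinyBlockSize`, and `alloc` are hypothetical names, alignment is ignored, and the block-level allocation is simulated with `make`.

```go
package main

import "fmt"

// tinyBlockSize is the size of one combined block, taken from the
// "< 16 bytes" threshold in the commit message above.
const tinyBlockSize = 16

// tinyAllocator is a hypothetical stand-in for an allocator that carves
// small pointer-free objects out of a shared block.
type tinyAllocator struct {
	block  []byte // block currently being carved up
	offset int    // first free byte in block
	blocks int    // number of block-level allocations performed
}

// alloc returns size bytes. Requests smaller than tinyBlockSize share a
// block; anything larger gets its own allocation.
func (a *tinyAllocator) alloc(size int) []byte {
	if size >= tinyBlockSize {
		a.blocks++
		return make([]byte, size)
	}
	if a.block == nil || a.offset+size > tinyBlockSize {
		// The current block cannot hold the request: start a new one.
		a.block = make([]byte, tinyBlockSize)
		a.offset = 0
		a.blocks++
	}
	// Cap the slice so callers cannot grow into a neighbour's bytes.
	p := a.block[a.offset : a.offset+size : a.offset+size]
	a.offset += size
	return p
}

func main() {
	a := &tinyAllocator{}
	sizes := []int{3, 5, 8, 4}
	for _, n := range sizes {
		a.alloc(n)
	}
	// 3+5+8 fill the first block exactly; the 4-byte request opens a second.
	fmt.Printf("%d requests, %d blocks\n", len(sizes), a.blocks) // prints "4 requests, 2 blocks"
}
```

Restricting the scheme to NoScan objects matters because the garbage collector never has to look for pointers inside the shared block. The trade-off in a scheme like this is that a block can only be reclaimed once every object carved from it is unreachable.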
    • sync: scalable Pool · f8e0057b
      Dmitriy Vyukov authored
      Introduce fixed-size P-local caches.
      When local caches overflow/underflow a batch of items
      is transferred to/from global mutex-protected cache.
      
      benchmark                    old ns/op    new ns/op    delta
      BenchmarkPool                    50554        22423  -55.65%
      BenchmarkPool-4                 400359         5904  -98.53%
      BenchmarkPool-16                403311         1598  -99.60%
      BenchmarkPool-32                367310         1526  -99.58%
      
      BenchmarkPoolOverlflow            5214         3633  -30.32%
      BenchmarkPoolOverlflow-4         42663         9539  -77.64%
      BenchmarkPoolOverlflow-8         46919        11385  -75.73%
      BenchmarkPoolOverlflow-16        39454        13048  -66.93%
      
      BenchmarkSprintfEmpty                    84           63  -25.68%
      BenchmarkSprintfEmpty-2                 371           32  -91.13%
      BenchmarkSprintfEmpty-4                 465           22  -95.25%
      BenchmarkSprintfEmpty-8                 565           12  -97.77%
      BenchmarkSprintfEmpty-16                498            5  -98.87%
      BenchmarkSprintfEmpty-32                492            4  -99.04%
      
      BenchmarkSprintfString                  259          229  -11.58%
      BenchmarkSprintfString-2                574          144  -74.91%
      BenchmarkSprintfString-4                651           77  -88.05%
      BenchmarkSprintfString-8                868           47  -94.48%
      BenchmarkSprintfString-16               825           33  -95.96%
      BenchmarkSprintfString-32               825           30  -96.28%
      
      BenchmarkSprintfInt                     213          188  -11.74%
      BenchmarkSprintfInt-2                   448          138  -69.20%
      BenchmarkSprintfInt-4                   624           52  -91.63%
      BenchmarkSprintfInt-8                   691           31  -95.43%
      BenchmarkSprintfInt-16                  724           18  -97.46%
      BenchmarkSprintfInt-32                  718           16  -97.70%
      
      BenchmarkSprintfIntInt                  311          282   -9.32%
      BenchmarkSprintfIntInt-2                333          145  -56.46%
      BenchmarkSprintfIntInt-4                642          110  -82.87%
      BenchmarkSprintfIntInt-8                832           42  -94.90%
      BenchmarkSprintfIntInt-16               817           24  -97.00%
      BenchmarkSprintfIntInt-32               805           22  -97.17%
      
      BenchmarkSprintfPrefixedInt             309          269  -12.94%
      BenchmarkSprintfPrefixedInt-2           245          168  -31.43%
      BenchmarkSprintfPrefixedInt-4           598           99  -83.36%
      BenchmarkSprintfPrefixedInt-8           770           67  -91.23%
      BenchmarkSprintfPrefixedInt-16          829           54  -93.49%
      BenchmarkSprintfPrefixedInt-32          824           50  -93.83%
      
      BenchmarkSprintfFloat                   418          398   -4.78%
      BenchmarkSprintfFloat-2                 295          203  -31.19%
      BenchmarkSprintfFloat-4                 585          128  -78.12%
      BenchmarkSprintfFloat-8                 873           60  -93.13%
      BenchmarkSprintfFloat-16                884           33  -96.24%
      BenchmarkSprintfFloat-32                881           29  -96.62%
      
      BenchmarkManyArgs                      1097         1069   -2.55%
      BenchmarkManyArgs-2                     705          567  -19.57%
      BenchmarkManyArgs-4                     792          319  -59.72%
      BenchmarkManyArgs-8                     963          172  -82.14%
      BenchmarkManyArgs-16                   1115          103  -90.76%
      BenchmarkManyArgs-32                   1133           90  -92.03%
      
      LGTM=rsc
      R=golang-codereviews, bradfitz, minux.ma, gobot, rsc
      CC=golang-codereviews
      https://golang.org/cl/46010043
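The local-cache-plus-global-overflow design described in this commit can be sketched as below. This is an illustration under stated assumptions, not the `sync.Pool` implementation: ordinary Go code cannot pin itself to a P, so the sketch takes an explicit shard index and assumes each shard is used by one goroutine at a time. The names `shardedPool`, `localSize`, and `batchSize` and their values are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	localSize = 8             // capacity of each local cache (assumed value)
	batchSize = localSize / 2 // items moved on overflow/underflow (assumed value)
)

// shardedPool keeps one fixed-size cache per shard and a shared,
// mutex-protected cache that absorbs overflow and serves underflow.
type shardedPool struct {
	locals [][]interface{} // locals[id] is touched only by shard id's user

	mu     sync.Mutex
	global []interface{}
}

func newShardedPool(shards int) *shardedPool {
	return &shardedPool{locals: make([][]interface{}, shards)}
}

// Put adds x to shard id's local cache, spilling a batch to the global
// cache first if the local cache is full.
func (p *shardedPool) Put(id int, x interface{}) {
	l := p.locals[id]
	if len(l) == localSize {
		keep := len(l) - batchSize
		p.mu.Lock()
		p.global = append(p.global, l[keep:]...)
		p.mu.Unlock()
		l = l[:keep]
	}
	p.locals[id] = append(l, x)
}

// Get takes an item from shard id's local cache, refilling a batch from
// the global cache if the local cache is empty. It returns nil if both
// caches are empty.
func (p *shardedPool) Get(id int) interface{} {
	l := p.locals[id]
	if len(l) == 0 {
		p.mu.Lock()
		n := len(p.global)
		if n > batchSize {
			n = batchSize
		}
		l = append(l, p.global[len(p.global)-n:]...)
		p.global = p.global[:len(p.global)-n]
		p.mu.Unlock()
	}
	if len(l) == 0 {
		p.locals[id] = l
		return nil
	}
	x := l[len(l)-1]
	p.locals[id] = l[:len(l)-1]
	return x
}

func main() {
	p := newShardedPool(2)
	for i := 0; i < 9; i++ {
		p.Put(0, i) // the ninth Put spills items 4..7 to the global cache
	}
	fmt.Println(p.Get(1)) // shard 1 refills from the global cache; prints 7
}
```

The point of batching is that the mutex is taken once per `batchSize` operations in the worst case rather than once per `Get` or `Put`, which is consistent with the large gains at higher GOMAXPROCS in the table above.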
    • runtime: do not zero terminate strings · 9fa9613e
      Dmitriy Vyukov authored
      On top of "tiny allocator" (cl/38750047), reduces number of allocs by 1% on json.
      No code should rely on zero termination, so this also makes debugging simpler
      by uncovering issues earlier.
      
      json-1
      allocated                 7949686      7915766      -0.43%
      allocs                      93778        92790      -1.05%
      time                    100957795     97250949      -3.67%
      rest of the metrics are too noisy.
      
      LGTM=r
      R=golang-codereviews, r, bradfitz, iant
      CC=golang-codereviews
      https://golang.org/cl/40370061
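As a small illustration of why nothing should depend on a trailing NUL: a Go string carries its own length, may contain zero bytes, and substrings share the original bytes with no terminator after them. Code that genuinely needs a C-style string has to add the terminator itself. The helper `cBytes` below is a hypothetical name used only for this sketch.

```go
package main

import "fmt"

// cBytes returns a copy of s followed by an explicit NUL byte, for callers
// that need a C-style terminated buffer. The terminator comes from the
// extra zeroed byte allocated here, not from the string itself.
func cBytes(s string) []byte {
	b := make([]byte, len(s)+1)
	copy(b, s)
	return b
}

func main() {
	s := "ab\x00cd"
	fmt.Println(len(s)) // prints 5: the embedded zero byte is ordinary data

	sub := s[:2] // shares s's bytes; the byte after "ab" is not a terminator
	fmt.Println(sub, len(sub))

	fmt.Println(cBytes("go")) // prints [103 111 0]
}
```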
    • cmd/gc: add zeroing to enable precise stack accounting · a81692e2
      Russ Cox authored
      There is more zeroing than I would like right now -
      temporaries used for the new map and channel runtime
      calls need to be eliminated - but it will do for now.
      
      This CL only has an effect if you are building with
      
              GOEXPERIMENT=precisestack ./all.bash
      
      (or make.bash). It costs about 5% in the overall time
      spent in all.bash. That number will come down before
      we make it on by default, but this should be enough for
      Keith to try using the precise maps for copying stacks.
      
      amd64 only (and it's not really great generated code).
      
      TBR=khr, iant
      CC=golang-codereviews
      https://golang.org/cl/56430043
    • liblink, runtime: fix cgo on arm · b377c9c6
      Russ Cox authored
      The addition of TLS to ARM rewrote the MRC instruction
      differently depending on whether we were using internal
      or external linking mode. That's clearly not okay, since we
      don't know that during compilation, which is when we now
      generate the code. Also, because the change did not introduce
      a real MRC instruction but instead just macro-expanded it
      in the assembler, liblink is rewriting a WORD instruction that
      may actually be looking for that specific constant, which would
      lead to very unexpected results. It was also using one value
      that happened to be 8 where a different value that also
      happened to be 8 belonged. So the code was correct for those
      values but not correct in general, and very confusing.
      
      Throw it all away.
      
      Replace with the following. There is a linker-provided symbol
      runtime.tlsgm with a value (address) set to the offset from the
      hardware-provided TLS base register to the g and m storage.
      Any reference to that name emits an appropriate TLS relocation
      to be resolved by either the internal linker or the external linker,
      depending on the link mode. The relocation has exactly the
      semantics of the R_ARM_TLS_LE32 relocation, which is what
      the external linker provides.
      
      This symbol is only used in two routines, runtime.load_gm and
      runtime.save_gm. In both cases it is now used like this:
      
              MRC		15, 0, R0, C13, C0, 3 // fetch TLS base pointer
              MOVW	$runtime·tlsgm(SB), R2
              ADD	R2, R0 // now R0 points at thread-local g+m storage
      
      It is likely that this change breaks the generation of shared libraries
      on ARM, because the MOVW needs to be rewritten to use the global
      offset table and a different relocation type. But let's get the supported
      functionality working again before we worry about unsupported
      functionality.
      
      LGTM=dave, iant
      R=iant, dave
      CC=golang-codereviews
      https://golang.org/cl/56120043
    • effective_go: move 'Type switch' section into 'Control structures' section. · 592415d6
      Rob Pike authored
      Needs to be an h3, not an h2.
      Thanks to Mingjie Xing for pointing it out.
      
      LGTM=dsymonds
      R=golang-codereviews, dsymonds
      CC=golang-codereviews
      https://golang.org/cl/55980046
  2. 23 Jan, 2014 5 commits
    • runtime: Print elision message if we skipped frames on traceback. · be5d2d44
      Keith Randall authored
      Fixes bug 7180
      
      R=golang-codereviews, dvyukov
      CC=golang-codereviews, gri
      https://golang.org/cl/55810044
    • bufio: fix benchmarks behavior · 0ad2cd00
      Dmitriy Vyukov authored
      Currently the benchmarks lie to the testing package by doing O(N)
      work under StopTimer. That hidden O(N) work actually constitutes
      the bulk of the benchmark work (e.g. it includes a GC per iteration).
      This behavior accounts for windows-amd64-race builder hangs.
      
      Before:
      BenchmarkReaderCopyOptimal-4	 1000000	      1861 ns/op
      BenchmarkReaderCopyUnoptimal-4	  500000	      3327 ns/op
      BenchmarkReaderCopyNoWriteTo-4	   50000	     34549 ns/op
      BenchmarkWriterCopyOptimal-4	  100000	     16849 ns/op
      BenchmarkWriterCopyUnoptimal-4	  500000	      3126 ns/op
      BenchmarkWriterCopyNoReadFrom-4	   50000	     34609 ns/op
      ok  	bufio	65.273s
      
      After:
      BenchmarkReaderCopyOptimal-4	10000000	       172 ns/op
      BenchmarkReaderCopyUnoptimal-4	10000000	       267 ns/op
      BenchmarkReaderCopyNoWriteTo-4	  100000	     22905 ns/op
      BenchmarkWriterCopyOptimal-4	10000000	       170 ns/op
      BenchmarkWriterCopyUnoptimal-4	10000000	       226 ns/op
      BenchmarkWriterCopyNoReadFrom-4	  100000	     20575 ns/op
      ok  	bufio	14.074s
      
      Note the change in total time.
      
      LGTM=alex.brainman, rsc
      R=golang-codereviews, alex.brainman, rsc
      CC=golang-codereviews
      https://golang.org/cl/51360046
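One way to avoid hiding work from the timer is to build the fixtures once and only reset them inside the loop, so nothing expensive runs while the timer is stopped. The sketch below illustrates that pattern; it is not the code from this CL, and `fixture`, `newFixture`, `copyOnce`, and `benchmarkCopyReuse` are hypothetical names.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"testing"
)

// fixture bundles the objects a copy benchmark needs so they can be
// allocated once and reused across iterations.
type fixture struct {
	data []byte
	src  *bytes.Reader
	r    *bufio.Reader
	dst  *bytes.Buffer
}

func newFixture(size int) *fixture {
	f := &fixture{data: make([]byte, size), dst: new(bytes.Buffer)}
	f.src = bytes.NewReader(f.data)
	f.r = bufio.NewReader(f.src)
	return f
}

// copyOnce rewinds the fixture and copies the whole input once. The resets
// are cheap and stay inside the timed region.
func (f *fixture) copyOnce() (int64, error) {
	f.src.Reset(f.data)
	f.r.Reset(f.src)
	f.dst.Reset()
	return io.Copy(f.dst, f.r)
}

// benchmarkCopyReuse does its allocation before ResetTimer and never
// stops the timer inside the loop.
func benchmarkCopyReuse(b *testing.B) {
	f := newFixture(8 << 10)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := f.copyOnce(); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	res := testing.Benchmark(benchmarkCopyReuse)
	fmt.Println("iterations:", res.N, "ns/op:", res.NsPerOp())
}
```

Because the testing package picks `b.N` from the measured time only, work done under `StopTimer` makes each run last far longer than the framework expects, which is consistent with the drop in total time reported above.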
    • lib/codereview: add LGTM= line to commit messages · 672ab629
      Russ Cox authored
      The R= is populated by Rietveld, so it's basically
      anyone who replied to the CL. The LGTM= is meant
      to record who actually signed off on the CL.
      
      LGTM=r
      R=r
      CC=golang-codereviews
      https://golang.org/cl/55390043
    • undo CL 45770044 / d795425bfa18 · 8371b014
      Dmitriy Vyukov authored
      Breaks darwin and freebsd.
      
      ««« original CL description
      runtime: increase page size to 8K
      Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
      Only Chromium uses 4K pages today (in "slow but small" configuration).
      The general tendency is to increase page size, because it reduces
      metadata size and DTLB pressure.
      This change reduces GC pause by ~10% and slightly improves other metrics.
      
      json-1
      allocated                 8037492      8038689      +0.01%
      allocs                     105762       105573      -0.18%
      cputime                 158400000    155800000      -1.64%
      gc-pause-one              4412234      4135702      -6.27%
      gc-pause-total            2647340      2398707      -9.39%
      rss                      54923264     54525952      -0.72%
      sys-gc                    3952624      3928048      -0.62%
      sys-heap                 46399488     46006272      -0.85%
      sys-other                 5597504      5290304      -5.49%
      sys-stack                  393216       393216      +0.00%
      sys-total                56342832     55617840      -1.29%
      time                    158478890    156046916      -1.53%
      virtual-mem             256548864    256593920      +0.02%
      
      garbage-1
      allocated                 2991113      2986259      -0.16%
      allocs                      62844        62652      -0.31%
      cputime                  16330000     15860000      -2.88%
      gc-pause-one            789108229    725555211      -8.05%
      gc-pause-total            3945541      3627776      -8.05%
      rss                    1143660544   1132253184      -1.00%
      sys-gc                   65609600     65806208      +0.30%
      sys-heap               1032388608   1035599872      +0.31%
      sys-other                37501632     22777664     -39.26%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1144150592   1132965568      -0.98%
      time                     16364602     15891994      -2.89%
      virtual-mem            1327296512   1313746944      -1.02%
      
      R=golang-codereviews, dave, khr, rsc, khr
      CC=golang-codereviews
      https://golang.org/cl/45770044
      »»»
      
      R=golang-codereviews
      CC=golang-codereviews
      https://golang.org/cl/56060043
    • runtime: increase page size to 8K · 6d603af6
      Dmitriy Vyukov authored
      Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
      Only Chromium uses 4K pages today (in "slow but small" configuration).
      The general tendency is to increase page size, because it reduces
      metadata size and DTLB pressure.
      This change reduces GC pause by ~10% and slightly improves other metrics.
      
      json-1
      allocated                 8037492      8038689      +0.01%
      allocs                     105762       105573      -0.18%
      cputime                 158400000    155800000      -1.64%
      gc-pause-one              4412234      4135702      -6.27%
      gc-pause-total            2647340      2398707      -9.39%
      rss                      54923264     54525952      -0.72%
      sys-gc                    3952624      3928048      -0.62%
      sys-heap                 46399488     46006272      -0.85%
      sys-other                 5597504      5290304      -5.49%
      sys-stack                  393216       393216      +0.00%
      sys-total                56342832     55617840      -1.29%
      time                    158478890    156046916      -1.53%
      virtual-mem             256548864    256593920      +0.02%
      
      garbage-1
      allocated                 2991113      2986259      -0.16%
      allocs                      62844        62652      -0.31%
      cputime                  16330000     15860000      -2.88%
      gc-pause-one            789108229    725555211      -8.05%
      gc-pause-total            3945541      3627776      -8.05%
      rss                    1143660544   1132253184      -1.00%
      sys-gc                   65609600     65806208      +0.30%
      sys-heap               1032388608   1035599872      +0.31%
      sys-other                37501632     22777664     -39.26%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1144150592   1132965568      -0.98%
      time                     16364602     15891994      -2.89%
      virtual-mem            1327296512   1313746944      -1.02%
      
      R=golang-codereviews, dave, khr, rsc, khr
      CC=golang-codereviews
      https://golang.org/cl/45770044
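The metadata argument can be made concrete with a little arithmetic. Assuming, purely for illustration, that the allocator keeps one bookkeeping entry per page, doubling the page size halves the number of entries for the same heap. `pagesFor` is a hypothetical helper, and the 1 GiB heap is only an example roughly the size of the garbage-1 heap above.

```go
package main

import "fmt"

// pagesFor returns how many pages of pageSize bytes are needed to cover
// heapBytes, rounding up to a whole page.
func pagesFor(heapBytes, pageSize uint64) uint64 {
	return (heapBytes + pageSize - 1) / pageSize
}

func main() {
	const heap = 1 << 30 // 1 GiB, an example heap size
	fmt.Println(pagesFor(heap, 4<<10)) // prints 262144
	fmt.Println(pagesFor(heap, 8<<10)) // prints 131072
}
```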
  3. 22 Jan, 2014 26 commits
  4. 21 Jan, 2014 3 commits