1. 06 Sep, 2016 16 commits
    • runtime: don't hard-code physical page size · 6dda7b2f
      Austin Clements authored
      Now that the runtime fetches the true physical page size from the OS,
      make the physical page size used by heap growth a variable instead of
      a constant. This isn't used in any performance-critical paths, so it
      shouldn't be an issue.
      
      sys.PhysPageSize is also renamed to sys.DefaultPhysPageSize to make it
      clear that it's not necessarily the true page size. There are no uses
      of this constant any more, but we'll keep it around for now.
      
      Updates #12480 and #10180.
      
      Change-Id: I6c23b9df860db309c38c8287a703c53817754f03
      Reviewed-on: https://go-review.googlesource.com/25022
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: fetch physical page size from the OS · 276a52de
      Austin Clements authored
      Currently the physical page size assumed by the runtime is hard-coded.
      On Linux the runtime at least fetches the OS page size during init and
      sanity checks against the hard-coded value, but they may still differ.
      On other OSes we wouldn't even notice.
      
      Add support on all OSes to fetch the actual OS physical page size
      during runtime init and lift the sanity check of PhysPageSize from the
      Linux init code to general malloc init. Currently this is the only use
      of the retrieved page size, but we'll add more shortly.
      
      Updates #12480 and #10180.
      
      Change-Id: I065f2834bc97c71d3208edc17fd990ec9058b6da
      Reviewed-on: https://go-review.googlesource.com/25050
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
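The check described in this commit can be sketched in ordinary Go. This is only an illustration, not the runtime's code: `assumedPhysPageSize` and `checkPageSize` are hypothetical names, the 64 KiB value is picked for the example, and the exact condition the runtime tests is an assumption based on the "multiple of the true page size" requirement stated in the ARM commit in this list. `os.Getpagesize` is the standard library call that reports the OS page size.

```go
package main

import (
	"fmt"
	"os"
)

// assumedPhysPageSize is a hypothetical stand-in for a compile-time
// default page size; 64 KiB is chosen only for illustration.
const assumedPhysPageSize = 64 << 10

// checkPageSize returns an error unless actual is nonzero and assumed is
// a multiple of it. The real runtime check may differ in detail.
func checkPageSize(assumed, actual uintptr) error {
	if actual == 0 {
		return fmt.Errorf("failed to get system page size")
	}
	if assumed%actual != 0 {
		return fmt.Errorf("assumed page size %d is not a multiple of actual page size %d", assumed, actual)
	}
	return nil
}

func main() {
	// Ask the OS for the page size at startup instead of trusting a constant.
	actual := uintptr(os.Getpagesize())
	if err := checkPageSize(assumedPhysPageSize, actual); err != nil {
		fmt.Println("sanity check failed:", err)
		return
	}
	fmt.Printf("actual page size: %d bytes\n", actual)
}
```

Doing the comparison once at init keeps the cost out of any hot path, which matches the commit's note that the retrieved value is not yet used anywhere else.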
    • runtime: assume 64kB physical pages on ARM · d7de8b6d
      Austin Clements authored
      Currently we assume the physical page size on ARM is 4kB. While this
      is usually true, the architecture also supports 16kB and 64kB physical
      pages, and Linux (and possibly other OSes) can be configured to use
      these larger page sizes.
      
      With Go 1.6, such a configuration could potentially run, but generally
      resulted in memory corruption or random panics. With current master,
      this configuration will cause the runtime to panic during init on
      Linux when it checks the true physical page size (and will still cause
      corruption or panics on other OSes).
      
      However, the assumed physical page size only has to be a multiple of
      the true physical page size, the scavenger can now deal with large
      physical page sizes, and the rest of the runtime can deal with a
      larger assumed physical page size than the true size. Hence, there's
      little disadvantage to conservatively setting the assumed physical
      page size to 64kB on ARM.
      
      This may result in some extra memory use, since we can only return
      memory at multiples of the assumed physical page size. However, it is
      a simple change that should make Go run on systems configured for
      larger page sizes. The following commits will make the runtime query
      the actual physical page size from the OS, but this is a simple first
      step in that direction.
      
      Updates #12480.
      
      Change-Id: I851829595bc9e0c76235c847a7b5f62ad82b5302
      Reviewed-on: https://go-review.googlesource.com/25021
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Minux Ma <minux@golang.org>
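The "extra memory use" trade-off mentioned above comes from alignment: only whole pages of the assumed size can be handed back to the OS. A minimal sketch of that arithmetic follows; `releasable` and the example address range are hypothetical and do not come from the runtime.

```go
package main

import "fmt"

// releasable returns the largest sub-range of [start, end) whose bounds
// are both multiples of pageSize, or (0, 0) if no whole page fits.
// pageSize must be a power of two.
func releasable(start, end, pageSize uintptr) (uintptr, uintptr) {
	lo := (start + pageSize - 1) &^ (pageSize - 1) // round start up
	hi := end &^ (pageSize - 1)                    // round end down
	if lo >= hi {
		return 0, 0
	}
	return lo, hi
}

func main() {
	// A hypothetical 192 KiB free range that starts 4 KiB into the heap.
	const start, end = 0x1000, 0x31000
	for _, ps := range []uintptr{4 << 10, 64 << 10} {
		lo, hi := releasable(start, end, ps)
		fmt.Printf("page size %2d KiB: can release %d KiB\n", ps>>10, (hi-lo)>>10)
	}
}
```

With a 4 KiB page size the whole 192 KiB range can be released; with a 64 KiB assumed size only the aligned 128 KiB in the middle can, so the unaligned ends stay resident.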
    • runtime: bound scanobject to ~100 µs · cf4f1d07
      Austin Clements authored
      Currently the time spent in scanobject is proportional to the size of
      the object being scanned. Since scanobject is non-preemptible, large
      objects can cause significant goroutine (and even whole application)
      delays through several means:
      
      1. If a GC assist picks up a large object, the allocating goroutine is
         blocked for the whole scan, even if that scan well exceeds that
         goroutine's debt.
      
      2. Since the scheduler does not run on the P performing a large object
         scan, goroutines in that P's run queue do not run unless they are
         stolen by another P (which can take some time). If there are a few
         large objects, all of the Ps may get tied up so the scheduler
         doesn't run anywhere.
      
      3. Even if a large object is scanned by a background worker and other
         Ps are still running the scheduler, the large object scan doesn't
         flush background credit until the whole scan is done. This can
         easily cause all allocations to block in assists, waiting for
         credit, causing an effective STW.
      
      Fix this by splitting large objects into 128 KB "oblets" and scanning
      at most one oblet at a time. Since we can scan 1–2 MB/ms, this equates
      to bounding scanobject at roughly 100 µs. This improves assist
      behavior both because assists can no longer get "unlucky" and be stuck
      scanning a large object, and because it causes the background worker
      to flush credit and unblock assists more frequently when scanning
      large objects. This also improves GC parallelism if the heap consists
      primarily of a small number of very large objects by letting multiple
      workers scan a large object in parallel.
      
      Fixes #10345. Fixes #16293.
      
      This substantially improves goroutine latency in the benchmark from
      issue #16293, which exercises several forms of very large objects:
      
      name                 old max-latency    new max-latency    delta
      SliceNoPointer-12           154µs ± 1%        155µs ±  2%     ~     (p=0.087 n=13+12)
      SlicePointer-12             314ms ± 1%       5.94ms ±138%  -98.11%  (p=0.000 n=19+20)
      SliceLivePointer-12        1148ms ± 0%       4.72ms ±167%  -99.59%  (p=0.000 n=19+20)
      MapNoPointer-12           72509µs ± 1%        408µs ±325%  -99.44%  (p=0.000 n=19+18)
      ChanPointer-12              313ms ± 0%       4.74ms ±140%  -98.49%  (p=0.000 n=18+20)
      ChanLivePointer-12         1147ms ± 0%       3.30ms ±149%  -99.71%  (p=0.000 n=19+20)
      
      name                 old P99.9-latency  new P99.9-latency  delta
      SliceNoPointer-12           113µs ±25%         107µs ±12%     ~     (p=0.153 n=20+18)
      SlicePointer-12          309450µs ± 0%         133µs ±23%  -99.96%  (p=0.000 n=20+20)
      SliceLivePointer-12         961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)
      MapNoPointer-12            448µs ±288%         119µs ±18%  -73.34%  (p=0.000 n=18+20)
      ChanPointer-12           309450µs ± 0%         134µs ±23%  -99.96%  (p=0.000 n=20+19)
      ChanLivePointer-12          961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)
      
      This has negligible effect on all metrics from the garbage, JSON, and
      HTTP x/benchmarks.
      
      It shows slight improvement on some of the go1 benchmarks,
      particularly Revcomp, which uses some multi-megabyte buffers:
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.46s ± 1%     2.47s ± 1%  +0.32%  (p=0.012 n=20+20)
      Fannkuch11-12                2.82s ± 0%     2.81s ± 0%  -0.61%  (p=0.000 n=17+20)
      FmtFprintfEmpty-12          50.8ns ± 5%    50.5ns ± 2%    ~     (p=0.197 n=17+19)
      FmtFprintfString-12          131ns ± 1%     132ns ± 0%  +0.57%  (p=0.000 n=20+16)
      FmtFprintfInt-12             117ns ± 0%     116ns ± 0%  -0.47%  (p=0.000 n=15+20)
      FmtFprintfIntInt-12          180ns ± 0%     179ns ± 1%  -0.78%  (p=0.000 n=16+20)
      FmtFprintfPrefixedInt-12     186ns ± 1%     185ns ± 1%  -0.55%  (p=0.000 n=19+20)
      FmtFprintfFloat-12           263ns ± 1%     271ns ± 0%  +2.84%  (p=0.000 n=18+20)
      FmtManyArgs-12               741ns ± 1%     742ns ± 1%    ~     (p=0.190 n=19+19)
      GobDecode-12                7.44ms ± 0%    7.35ms ± 1%  -1.21%  (p=0.000 n=20+20)
      GobEncode-12                6.22ms ± 1%    6.21ms ± 1%    ~     (p=0.336 n=20+19)
      Gzip-12                      220ms ± 1%     219ms ± 1%    ~     (p=0.130 n=19+19)
      Gunzip-12                   37.9ms ± 0%    37.9ms ± 1%    ~     (p=1.000 n=20+19)
      HTTPClientServer-12         82.5µs ± 3%    82.6µs ± 3%    ~     (p=0.776 n=20+19)
      JSONEncode-12               16.4ms ± 1%    16.5ms ± 2%  +0.49%  (p=0.003 n=18+19)
      JSONDecode-12               53.7ms ± 1%    54.1ms ± 1%  +0.71%  (p=0.000 n=19+18)
      Mandelbrot200-12            4.19ms ± 1%    4.20ms ± 1%    ~     (p=0.452 n=19+19)
      GoParse-12                  3.38ms ± 1%    3.37ms ± 1%    ~     (p=0.123 n=19+19)
      RegexpMatchEasy0_32-12      72.1ns ± 1%    71.8ns ± 1%    ~     (p=0.397 n=19+17)
      RegexpMatchEasy0_1K-12       242ns ± 0%     242ns ± 0%    ~     (p=0.168 n=17+20)
      RegexpMatchEasy1_32-12      72.1ns ± 1%    72.1ns ± 1%    ~     (p=0.538 n=18+19)
      RegexpMatchEasy1_1K-12       385ns ± 1%     384ns ± 1%    ~     (p=0.388 n=20+20)
      RegexpMatchMedium_32-12      112ns ± 1%     112ns ± 3%    ~     (p=0.539 n=20+20)
      RegexpMatchMedium_1K-12     34.4µs ± 2%    34.4µs ± 2%    ~     (p=0.628 n=18+18)
      RegexpMatchHard_32-12       1.80µs ± 1%    1.80µs ± 1%    ~     (p=0.522 n=18+19)
      RegexpMatchHard_1K-12       54.0µs ± 1%    54.1µs ± 1%    ~     (p=0.647 n=20+19)
      Revcomp-12                   387ms ± 1%     369ms ± 5%  -4.89%  (p=0.000 n=17+19)
      Template-12                 62.3ms ± 1%    62.0ms ± 0%  -0.48%  (p=0.002 n=20+17)
      TimeParse-12                 314ns ± 1%     314ns ± 0%    ~     (p=1.011 n=20+13)
      TimeFormat-12                358ns ± 0%     354ns ± 0%  -1.12%  (p=0.000 n=17+20)
      [Geo mean]                  53.5µs         53.3µs       -0.23%
      
      Change-Id: I2a0a179d1d6bf7875dd054b7693dd12d2a340132
      Reviewed-on: https://go-review.googlesource.com/23540
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
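The oblet idea and the ~100 µs figure can be checked with a few lines of arithmetic. The sketch below is not the collector's implementation (the runtime queues pieces of work rather than building a slice); `span` and `oblets` are hypothetical names, and only the 128 KB oblet size and the 1–2 MB/ms scan rate come from the commit message.

```go
package main

import "fmt"

// maxObletBytes is the oblet size named in the commit message.
const maxObletBytes = 128 << 10

// span is a hypothetical [start, end) byte range within one object.
type span struct{ start, end uintptr }

// oblets splits an object of the given size into consecutive pieces of
// at most maxObletBytes each, so each piece can be scanned separately.
func oblets(size uintptr) []span {
	var out []span
	for off := uintptr(0); off < size; off += maxObletBytes {
		end := off + maxObletBytes
		if end > size {
			end = size
		}
		out = append(out, span{off, end})
	}
	return out
}

func main() {
	parts := oblets(1<<20 + 1)
	last := parts[len(parts)-1]
	fmt.Println(len(parts), "oblets; the last covers", last.end-last.start, "byte(s)")

	// Time to scan one full oblet at the scan rates quoted in the commit.
	for _, mbPerMs := range []float64{1, 2} {
		us := float64(maxObletBytes) / (mbPerMs * (1 << 20)) * 1000
		fmt.Printf("at %.0f MB/ms one oblet takes about %.1f µs\n", mbPerMs, us)
	}
}
```

At 1 MB/ms a 128 KB oblet takes about 125 µs and at 2 MB/ms about 62.5 µs, which brackets the "roughly 100 µs" bound.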
    • runtime: clean up more traces of the old mark bit · b275e55d
      Austin Clements authored
      Commit 59877bfa renamed bitMarked to bitScan, since the bitmap is no
      longer used for marking. However, there were several other references
      to this strewn about comments and in some other constant names. Fix
      these up, too.
      
      Change-Id: I4183d28c6b01977f1d75a99ad55b150f2211772d
      Reviewed-on: https://go-review.googlesource.com/28450
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • cmd/compile: remove nil check if followed by storezero on ARM64, MIPS64 · 4d5bb762
      Cherry Zhang authored
      Change-Id: Ib90c92056fa70b27feb734837794ef53e842c41a
      Reviewed-on: https://go-review.googlesource.com/28513
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: remove ld/st-followed nil checks for PPC64 · 0e0ab203
      David Chase authored
      Enabled checks (except for DUFF-ops which aren't implemented yet).
      Added ppc64le to relevant test.
      
      Also updated the register list to reflect its
      no-longer-reserved-for-constants status (the file was missed in that
      change).
      
      Updates #16010.
      
      Change-Id: I31b1aac19e14994f760f2ecd02edbeb1f78362e7
      Reviewed-on: https://go-review.googlesource.com/28548
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/link: remove outdated cast and comment · b926bf83
      David Crawshaw authored
      This program is written in Go now.
      
      Change-Id: Ieec21a1bcac7c7a59e88cd1e1359977659de1757
      Reviewed-on: https://go-review.googlesource.com/28549
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • regexp: reduce mallocs in Regexp.Find* and Regexp.ReplaceAll*. · bea39e63
      Aliaksandr Valialkin authored
      This improves Regexp.Find* and Regexp.ReplaceAll* speed:
      
      name                  old time/op    new time/op    delta
      Find-4                   345ns ± 1%     314ns ± 1%    -8.94%    (p=0.000 n=9+8)
      FindString-4             341ns ± 1%     308ns ± 0%    -9.85%   (p=0.000 n=10+9)
      FindSubmatch-4           440ns ± 1%     404ns ± 0%    -8.27%   (p=0.000 n=10+8)
      FindStringSubmatch-4     426ns ± 0%     387ns ± 0%    -9.07%   (p=0.000 n=10+9)
      ReplaceAll-4            1.75µs ± 1%    1.67µs ± 0%    -4.45%   (p=0.000 n=9+10)
      
      name                  old alloc/op   new alloc/op   delta
      Find-4                   16.0B ± 0%     0.0B ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindString-4             16.0B ± 0%     0.0B ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindSubmatch-4           80.0B ± 0%     48.0B ± 0%   -40.00%  (p=0.000 n=10+10)
      FindStringSubmatch-4     64.0B ± 0%     32.0B ± 0%   -50.00%  (p=0.000 n=10+10)
      ReplaceAll-4              152B ± 0%      104B ± 0%   -31.58%  (p=0.000 n=10+10)
      
      name                  old allocs/op  new allocs/op  delta
      Find-4                    1.00 ± 0%     0.00 ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindString-4              1.00 ± 0%     0.00 ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindSubmatch-4            2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
      FindStringSubmatch-4      2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
      ReplaceAll-4              8.00 ± 0%      5.00 ± 0%   -37.50%  (p=0.000 n=10+10)
      
      Fixes #15643
      
      Change-Id: I594fe51172373e2adb98d1d25c76ca2cde54ff48
      Reviewed-on: https://go-review.googlesource.com/23030
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
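One way to observe per-call allocation counts like the allocs/op column above is `testing.AllocsPerRun`. The following is a rough sketch rather than the benchmark used in the commit: the pattern, the input string, and the helper name `findAllocs` are made up for illustration, and the number printed depends on the Go version in use.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// re is an arbitrary example pattern, compiled once up front.
var re = regexp.MustCompile(`a+b`)

// findAllocs returns the first match of re in s together with the
// average number of heap allocations per FindString call.
func findAllocs(s string) (string, float64) {
	var match string
	allocs := testing.AllocsPerRun(1000, func() {
		match = re.FindString(s)
	})
	return match, allocs
}

func main() {
	match, allocs := findAllocs("xxaaabyy")
	fmt.Printf("match=%q allocs/op=%v\n", match, allocs)
}
```

For the full time/op and alloc/op tables, the usual route is `go test -bench . -benchmem` on the package's own benchmarks.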
    • cmd/compile: generate table of main symbol types · 5923df1a
      David Crawshaw authored
      For each exported symbol in package main, add its name and type to
      go.plugin.tabs symbol. This is used by the runtime when loading a
      plugin to return a typed interface{} value.
      
      Change-Id: I23c39583e57180acb8f7a74d218dae4368614f46
      Reviewed-on: https://go-review.googlesource.com/27818
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • math: fix sqrt regression on AMD64 · 6e703ae7
      Ilya Tocar authored
      1.7 introduced a significant regression compared to 1.6:
      
      SqrtIndirect-4  2.32ns ± 0%  7.86ns ± 0%  +238.79%        (p=0.000 n=20+18)
      
      This is caused by sqrtsd preserving the upper part of the destination
      register, which introduces a dependency on the previous value of X0.
      In 1.6 the benchmark loop didn't use X0 immediately after the call:
      
      callq  *%rbx
      movsd  0x8(%rsp),%xmm2
      movsd  0x20(%rsp),%xmm1
      addsd  %xmm2,%xmm1
      mov    0x18(%rsp),%rax
      inc    %rax
      jmp    loop
      
      In 1.7, however, xmm0 is used right after the call:
      
      callq  *%rbx
      mov    0x10(%rsp),%rcx
      lea    0x1(%rcx),%rax
      movsd  0x8(%rsp),%xmm0
      movsd  0x18(%rsp),%xmm1
      
      I've verified that this is caused by the dependency by inserting
      XORPS X0,X0 at the beginning of math.Sqrt, which puts performance back
      at the 1.6 level.
      
      Splitting SQRTSD mem,reg into:
      MOVSD mem,reg
      SQRTSD reg,reg
      
      removes the dependency, because MOVSD (load version)
      doesn't need to preserve the upper part of a register,
      and the reg,reg operation is handled by the register renamer in the CPU.
      
      As a result of this change, the regression is gone:
      SqrtIndirect-4  7.86ns ± 0%  2.33ns ± 0%  -70.36%  (p=0.000 n=18+17)
      
      This also removes the old Sqrt benchmarks in favor of benchmarks that
      measure latency. Only SqrtIndirect is kept, to show the impact of this patch.
      
      Change-Id: Ic7eebe8866445adff5bc38192fa8d64c9a6b8872
      Reviewed-on: https://go-review.googlesource.com/28392
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      Reviewed-by: Keith Randall <khr@golang.org>
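A latency-style benchmark in the spirit of SqrtIndirect can be approximated outside the math package as follows. This is a sketch under assumptions, not the benchmark from the commit: `sqrtFn`, `sumSqrtIndirect`, and `sink` are hypothetical names, and the timings it prints depend entirely on the machine and Go version.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// sqrtFn holds math.Sqrt as a package-level function value so that the
// loop below calls it indirectly rather than through a direct call.
var sqrtFn = math.Sqrt

// sink keeps the result alive so the loop is not optimized away.
var sink float64

// sumSqrtIndirect calls sqrtFn n times and accumulates the results, so
// each iteration's add depends on the value returned by the call.
func sumSqrtIndirect(n int, y float64) float64 {
	x := 0.0
	for i := 0; i < n; i++ {
		x += sqrtFn(y)
	}
	return x
}

func main() {
	testing.Init() // register testing flags when running outside "go test"
	res := testing.Benchmark(func(b *testing.B) {
		sink = sumSqrtIndirect(b.N, 10.0)
	})
	fmt.Println("SqrtIndirect:", res)
}
```

Because the accumulator is updated from every call's result, the loop is sensitive to the latency of the square root path, which is the property the commit's benchmark relies on.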
    • cmd/go: run mkalldocs.sh · 6bcca5e9
      Josh Bleecher Snyder authored
      This should have happened as part of CL 28485.
      
      Change-Id: I63cd31303e542ceaec3f4002c5573f186a1e9a52
      Reviewed-on: https://go-review.googlesource.com/28547
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: fix intrinsifying sync/atomic.Swap* on AMD64 · 644c16c7
      Cherry Zhang authored
      It should alias to Xchg instead of Swap. Found when testing #16985.
      
      Change-Id: If9fd734a1f89b8b2656f421eb31b9d1b0d95a49f
      Reviewed-on: https://go-review.googlesource.com/28512
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
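For readers unfamiliar with the operation being intrinsified, the user-visible contract of sync/atomic.Swap* is an atomic exchange: store the new value and return the old one. A small sketch of that contract follows; the wrapper name `exchange` is hypothetical, and nothing here shows the compiler-internal aliasing itself.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// exchange stores newVal into *addr and returns the value that was there
// before, as a single atomic operation.
func exchange(addr *int64, newVal int64) int64 {
	return atomic.SwapInt64(addr, newVal)
}

func main() {
	var state int64 = 1
	old := exchange(&state, 2)
	fmt.Println("old:", old, "new:", atomic.LoadInt64(&state)) // prints "old: 1 new: 2"
}
```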
    • cmd/compile: mark some AMD64 atomic ops as clobberFlags · f1ef5a06
      Cherry Zhang authored
      Fixes #16985.
      
      Change-Id: I5954db28f7b70dd3ac7768e471d5df871a5b20f9
      Reviewed-on: https://go-review.googlesource.com/28510
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • syscall: add yet more TestGetfsstat debugging · 6db13e07
      Brad Fitzpatrick authored
      Updates #16937
      
      Change-Id: I98aa203176f8f2ca2fcca6e334a65bc60d6f824d
      Reviewed-on: https://go-review.googlesource.com/28535
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • runtime: remove redundant expression from SetFinalizer · 66121ce8
      Erik Staab authored
      The previous if condition already checks the same expression and doesn't
      have side effects.
      
      Change-Id: Ieaf30a786572b608d0a883052b45fd3f04bc6147
      Reviewed-on: https://go-review.googlesource.com/28475
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  2. 05 Sep, 2016 7 commits
  3. 04 Sep, 2016 11 commits
  4. 03 Sep, 2016 2 commits
  5. 02 Sep, 2016 4 commits