1. 29 Jul, 2015 10 commits
    • cmd/link: fix nosplit stack overflow checks · 767e0658
      Russ Cox authored
      The nosplit stack overflow checks were confused about morestack.
      The comment about not having correct SP information at the call
      to morestack was true, but that was a real bug, not something to
      work around. I fixed that problem in CL 12144. With that fixed,
      no need to special-case morestack in the way done here.
      
      This cleanup and simplification of the code was the first step
      to fixing a bug that happened when I started working on the
      arm64 frame size adjustments, but the cleanup was sufficient
      to make the bug go away.
      
      For #9880.
      
      Change-Id: I16b69a5c16b6b8cb4090295d3029c42d606e3b9b
      Reviewed-on: https://go-review.googlesource.com/12846
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime, reflect: use correctly aligned stack frame sizes on arm64 · 42122057
      Russ Cox authored
      arm64 requires either no stack frame or a frame with a size that is 8 mod 16
      (adding the saved LR will make it 16-aligned).
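      
      For illustration, here is a minimal sketch of the alignment rule;
      the helper name is hypothetical, not the compiler's actual code:
      
          // align8mod16 rounds a nonzero arm64 frame size up to the
          // next value that is 8 mod 16, so that pushing the 8-byte
          // saved LR yields a 16-byte-aligned frame.
          func align8mod16(framesize int64) int64 {
              if framesize == 0 {
                  return 0 // a frameless function is also allowed
              }
              return framesize + (8-framesize%16+16)%16
          }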
      
      cmd/internal/obj/arm64 has been silently aligning frames, but that
      led to a terrible bug when the compiler and obj disagreed on the
      frame size, and it's just generally confusing, so we're going to
      make misaligned frames an error instead of something that is
      silently changed.
      
      This CL prepares by updating assembly files.
      Note that the changes in this CL are already being done silently by
      cmd/internal/obj/arm64, so there is no semantic effect here,
      just a clarity effect.
      
      For #9880.
      
      Change-Id: Ibd6928dc5fdcd896c2bacd0291bf26b364591e28
      Reviewed-on: https://go-review.googlesource.com/12845
      Reviewed-by: Austin Clements <austin@google.com>
    • cmd/compile: align arm64 stack frames correctly · 3952057c
      Russ Cox authored
      If the compiler doesn't do it, cmd/internal/obj/arm64 will,
      and that will break the zeroing of ambiguously live values
      done in zerorange, which in turn produces uninitialized
      pointer cells that the GC trips over.
      
      For #9880.
      
      Change-Id: Ice97c30bc8b36d06b7b88d778d87fab8e1827fdc
      Reviewed-on: https://go-review.googlesource.com/12847
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: report GC CPU utilization in MemStats · 23e4744c
      Austin Clements authored
      This adds a GCCPUFraction field to MemStats that reports the
      cumulative fraction of the program's execution time spent in the
      garbage collector. This is equivalent to the utilization percent shown
      in the gctrace output and makes this available programmatically.
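      
      A minimal usage sketch of the new field:
      
          package main
      
          import (
              "fmt"
              "runtime"
          )
      
          func main() {
              var ms runtime.MemStats
              runtime.ReadMemStats(&ms)
              // GCCPUFraction is the cumulative fraction of this
              // program's available CPU time used by the GC since
              // the program started.
              fmt.Printf("GC CPU: %.2f%%\n", ms.GCCPUFraction*100)
          }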
      
      This does have one small effect on the gctrace output: we now report
      the duration of mark termination up to just before the final
      start-the-world, rather than up to just after. However, unlike
      stop-the-world, I don't believe there's any way that start-the-world
      can block, so it should take negligible time.
      
      While there are many statistics one might want to expose via MemStats,
      this is one of the few that will undoubtedly remain meaningful
      regardless of future changes to the memory system.
      
      The diff for this change is larger than the actual change. Mostly it
      lifts the code for computing the GC CPU utilization out of the
      debug.gctrace path.
      
      Updates #10323.
      
      Change-Id: I0f7dc3fdcafe95e8d1233ceb79de606b48acd989
      Reviewed-on: https://go-review.googlesource.com/12844
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: always capture GC phase transition times · 4b71660c
      Austin Clements authored
      Currently we capture GC phase transition times only if
      debug.gctrace>0, but we're about to compute GC CPU utilization
      regardless of whether debug.gctrace is set, so we need these
      times unconditionally.
      
      Change-Id: If3acf16505a43d416e9a99753206f03287180660
      Reviewed-on: https://go-review.googlesource.com/12843
      Reviewed-by: Russ Cox <rsc@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: avoid race between SIGPROF traceback and stack barriers · 87f97c73
      Austin Clements authored
      The following sequence of events can lead to the runtime attempting an
      out-of-bounds access on a stack barrier slice:
      
      1. A SIGPROF comes in on a thread while the G on that thread is in
         _Gsyscall. The sigprof handler calls gentraceback, which saves a
         local copy of the G's stkbar slice. Currently the G has no stack
         barriers, so this slice is empty.
      
      2. On another thread, the GC concurrently scans the stack of the
         goroutine being profiled (it considers it stopped because it's in
         _Gsyscall) and installs stack barriers.
      
      3. Back on the sigprof thread, gentraceback comes across a stack
         barrier in the stack and attempts to look it up in its (zero
         length) copy of G's old stkbar slice, which causes an out-of-bounds
         access.
      
      This commit fixes this by adding a simple CAS spin to synchronize the
      SIGPROF handler with stack barrier insertion.
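      
      A minimal sketch of the synchronization; the flag here is a
      hypothetical stand-in for the runtime's per-G state:
      
          package main
      
          import "sync/atomic"
      
          // stackLock: 0 = unlocked, 1 = locked. Both the SIGPROF
          // handler and the stack barrier installer acquire it before
          // touching stkbar, so neither can observe a half-updated
          // slice.
          var stackLock uint32
      
          func lockStackBarriers() {
              for !atomic.CompareAndSwapUint32(&stackLock, 0, 1) {
                  // spin; both critical sections are tiny
              }
          }
      
          func unlockStackBarriers() {
              atomic.StoreUint32(&stackLock, 0)
          }
      
          func main() {
              lockStackBarriers()
              unlockStackBarriers()
          }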
      
      In general I would prefer that this synchronization be done through
      the G status, since that's how stack scans are otherwise synchronized,
      but adding a new lock is a much smaller change and G statuses are full
      of subtlety.
      
      Fixes #11863.
      
      Change-Id: Ie89614a6238bb9c6a5b1190499b0b48ec759eaf7
      Reviewed-on: https://go-review.googlesource.com/12748
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: force mutator to give work buffer to GC · e95bc5fe
      Rick Hudson authored
      The scheduler, work buffer's dispose, and write barriers
      can conspire to hide a pointer from the GC's concurrent
      mark phase. If this pointer is the only path to a large
      amount of marking, the STW mark termination phase may take
      a lot of time.
      
      Consider the following:
      1) dispose places a work buffer on the partial queue
      2) the GC is busy so it does not immediately remove and
         process the work buffer
      3) the scheduler runs a mutator whose write barrier dequeues the
         work buffer from the partial queue so the GC won't see it
      This repeats until the GC reaches the mark termination
      phase where the GC finally discovers the pointer along
      with a lot of work to do.
      
      This CL fixes the problem by having the mutator
      dispose of the buffer to the full queue instead of
      the partial queue. Since the write barrier never asks for full
      buffers, the conspiracy described above is not possible.
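      
      A sketch of the changed dispose path; the type and queue helpers
      are hypothetical stand-ins for the runtime's work buffer lists:
      
          package main
      
          type workbuf struct {
              nobj int // number of grey objects queued in this buffer
          }
      
          func putempty(b *workbuf) { /* push onto the empty list */ }
          func putfull(b *workbuf)  { /* push onto the full list */ }
      
          // dispose hands a mutator's work buffer back to the GC.
          // Non-empty buffers go on the full list: the write barrier
          // refills only from the partial list, so it can never steal
          // the buffer back before the GC drains it.
          func dispose(b *workbuf) {
              if b.nobj == 0 {
                  putempty(b)
                  return
              }
              putfull(b) // previously putpartial(b), enabling the race
          }
      
          func main() { dispose(&workbuf{nobj: 1}) }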
      
      Updates #11694.
      
      Change-Id: I2ce832f9657a7570f800e8ce4459cd9e304ef43b
      Reviewed-on: https://go-review.googlesource.com/12840
      Reviewed-by: Austin Clements <austin@google.com>
    • doc: add json tokenizer to go1.5.html · ff991940
      Rob Pike authored
      Change-Id: I45d92fed757fa1866d5b80e53ed1af6712fa6741
      Reviewed-on: https://go-review.googlesource.com/12782
      Reviewed-by: Russ Cox <rsc@golang.org>
    • cmd/internal/asm: delete · 861af804
      Mikio Hara authored
      Updates #10510.
      
      Change-Id: Ib4d39943969d18517b373292b83d87650d5df12a
      Reviewed-on: https://go-review.googlesource.com/12787
      Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd: delete old[5689]a · ba0c142d
      Rob Pike authored
      These are the old assemblers written in C, and now they are
      not needed.
      
      Fixes #10510.
      
      Change-Id: Id9337ffc8eccfd93c84b2e23f427fb1a576b543d
      Reviewed-on: https://go-review.googlesource.com/12784
      Reviewed-by: Russ Cox <rsc@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
  2. 28 Jul, 2015 23 commits
  3. 27 Jul, 2015 7 commits
    • net/http: pause briefly after closing Server connection when body remains · a3ffd836
      Brad Fitzpatrick authored
      From https://github.com/golang/go/issues/11745#issuecomment-123555313
      this implements option (b), having the server pause slightly after
      sending the final response on a TCP connection that we're about to
      close while a request body is still outstanding. This biases the
      client (which might not be Go) to prefer our response header over
      the request body write error.
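      
      A sketch of the idea; the helper and the exact delay are
      illustrative, not the actual net/http internals:
      
          package main
      
          import (
              "bufio"
              "net"
              "time"
          )
      
          // closeWithLinger flushes the final response, then pauses
          // briefly before closing the connection, making it more
          // likely the peer reads our response headers before it
          // notices that its request body writes are failing.
          func closeWithLinger(c net.Conn, bw *bufio.Writer) {
              bw.Flush()
              time.Sleep(500 * time.Millisecond) // illustrative value
              c.Close()
          }
      
          func main() {}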
      
      Updates #11745.
      
      Change-Id: I07cb0b74519d266c8049d9e0eb23a61304eedbf8
      Reviewed-on: https://go-review.googlesource.com/12658
      Reviewed-by: Russ Cox <rsc@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime/cgo: remove TMPDIR logic for iOS · 249894ab
      David Crawshaw authored
      Seems like the simplest solution for 1.5. All the parts of the test
      suite I can run on my current device (for which my exception handler
      fix no longer works, apparently) pass without this code. I'll move it
      into x/mobile/app.
      
      Fixes #11884.
      
      Change-Id: I2da40c8c7b48a4c6970c4d709dd7c148a22e8727
      Reviewed-on: https://go-review.googlesource.com/12721
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • runtime: close window that hides GC work from concurrent mark · c1f7a56f
      Austin Clements authored
      Currently we enter mark 2 by first flushing all existing gcWork caches
      and then setting gcBlackenPromptly, which disables further gcWork
      caching. However, if a worker or assist pulls a work buffer in to its
      gcWork cache after that cache has been flushed but before caching is
      disabled, that work may remain in that cache until mark termination.
      If that work represents a heap bottleneck (e.g., a single pointer that
      is the only way to reach a large amount of the heap), this can force
      mark termination to do a large amount of work, resulting in a long
      STW.
      
      Fix this by reversing the order of these steps: first disable caching,
      then flush all existing caches.
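      
      The shape of the fix, with simplified stand-ins for the
      runtime's state (the real code synchronizes these steps across
      Ps rather than using a plain flag):
      
          package main
      
          var gcBlackenPromptly bool // when set, gcWork caching is off
      
          func flushAllCaches() { /* flush every cached gcWork */ }
      
          // enterMark2 disables caching first, so any buffer pulled in
          // after this point is disposed promptly, and only then
          // flushes the caches that already exist. Flushing first left
          // a window for a freshly cached buffer to hide until mark
          // termination.
          func enterMark2() {
              gcBlackenPromptly = true
              flushAllCaches()
          }
      
          func main() { enterMark2() }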
      
      Rick Hudson <rlh> did the hard work of tracking this down. This CL
      combined with CL 12672 and CL 12646 distills the critical parts of his
      fix from CL 12539.
      
      Fixes #11694.
      
      Change-Id: Ib10d0a21e3f6170a80727d0286f9990df049fed2
      Reviewed-on: https://go-review.googlesource.com/12688
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: enable GC assists ASAP · 510fd135
      Austin Clements authored
      Currently the GC coordinator enables GC assists at the same time it
      enables background mark workers, after the concurrent scan phase is
      done. However, this means a rapidly allocating mutator has the entire
      scan phase during which to allocate beyond the heap trigger and
      potentially beyond the heap goal with no back-pressure from assists.
      This prevents the feedback system that's supposed to keep the heap
      size under the heap goal from doing its job.
      
      Fix this by enabling mutator assists during the scan phase. This is
      safe because the write barrier is already enabled and globally
      acknowledged at this point.
      
      There's still a very small window between when the heap size reaches
      the heap trigger and when the GC coordinator is able to stop the world
      during which the mutator can allocate unabated. This allows *very*
      rapidly allocating mutators like TestTraceStress to still occasionally
      exceed the heap goal by a small amount (~20 MB at most for
      TestTraceStress). However, this seems like a corner case.
      
      Fixes #11677.
      
      Change-Id: I0f80d949ec82341cd31ca1604a626efb7295a819
      Reviewed-on: https://go-review.googlesource.com/12674
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: allow GC drain whenever write barrier is enabled · f5e67e53
      Austin Clements authored
      Currently we hand-code a set of phases when draining is allowed.
      However, this set of phases is conservative. The critical invariant is
      simply that the write barrier must be enabled if we're draining.
      
      Shortly we're going to enable mutator assists during the scan phase,
      which means we may drain during the scan phase. In preparation, this
      commit generalizes these assertions to check the fundamental condition
      that the write barrier is enabled, rather than checking that we're in
      any particular phase.
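      
      The generalized assertion, sketched with a stand-in for the
      runtime's state:
      
          package main
      
          var writeBarrierEnabled bool
      
          // checkDrainInvariant replaces a hand-coded list of legal
          // phases with the one condition that actually matters: the
          // write barrier must be on while we drain grey objects.
          func checkDrainInvariant() {
              if !writeBarrierEnabled {
                  panic("gcDrain while write barrier is disabled")
              }
          }
      
          func main() {
              writeBarrierEnabled = true
              checkDrainInvariant()
          }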
      
      Change-Id: I0e1bec1ca823d4a697a0831ec4c50f5dd3f2a893
      Reviewed-on: https://go-review.googlesource.com/12673
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: don't start workers between mark 1 & 2 · 64a32ffe
      Austin Clements authored
      Currently we clear both the mark 1 and mark 2 signals at the beginning
      of concurrent mark. If either of these is clear, it acts as a signal
      to the scheduler that it should start background workers. However,
      this means that in the interim *between* mark 1 and mark 2, the
      scheduler basically loops starting up new workers only to have them
      return with nothing to do. In addition to harming performance and
      delaying mutator work, this approach has a race where workers started
      for mark 1 can mistakenly signal mark 2, causing it to complete
      prematurely. This approach also interferes with starting assists
      earlier to fix #11677.
      
      Fix this by initially setting both mark 1 and mark 2 to "signaled".
      The scheduler will not start background mark workers, though assists
      can still run. When we're ready to enter mark 1, we clear the mark 1
      signal and wait for it. Then, when we're ready to enter mark 2, we
      clear the mark 2 signal and wait for it.
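      
      A simplified, channel-based sketch of the protocol (the real
      bgMark1/bgMark2 signals use runtime notes and atomics, not
      channels):
      
          package main
      
          type markSignal struct{ done chan struct{} }
      
          func newSignaled() *markSignal {
              s := &markSignal{done: make(chan struct{})}
              close(s.done) // start "signaled": no workers started
              return s
          }
      
          func (s *markSignal) clear()    { s.done = make(chan struct{}) }
          func (s *markSignal) wait()     { <-s.done }
          func (s *markSignal) complete() { close(s.done) }
      
          func main() {
              bgMark1, bgMark2 := newSignaled(), newSignaled()
              bgMark1.clear()       // ready for mark 1: workers may start
              go bgMark1.complete() // scheduler: no work, no workers left
              bgMark1.wait()        // coordinator blocks until completion
              _ = bgMark2           // mark 2 follows the same steps
          }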
      
      This structure also lets us deal cleanly with the situation where all
      work is drained *prior* to the mark 2 wait, meaning that there may be
      no workers to signal completion. Currently we deal with this using a
      racy (and possibly incorrect) check for work in the coordinator itself
      to skip the mark 2 wait if there's no work. This change makes the
      coordinator unconditionally wait for mark completion and makes the
      scheduler itself signal completion by slightly extending the logic it
      already has to determine that there's no work and hence no use in
      starting a new worker.
      
      This is a prerequisite to fixing the remaining component of #11677,
      which will require enabling assists during the scan phase. However, we
      don't want to enable background workers until the mark phase because
      they will compete with the scan. This change lets us use bgMark1 and
      bgMark2 to indicate when it's okay to start background workers
      independent of assists.
      
      This is also a prerequisite to fixing #11694. It significantly reduces
      the occurrence of long mark termination pauses in #11694 (from 64 out
      of 1000 to 2 out of 1000 in one experiment).
      
      Coincidentally, this also reduces the final heap size (and hence run
      time) of TestTraceStress from ~100 MB and ~1.9 seconds to ~14 MB and
      ~0.4 seconds because it significantly shortens concurrent mark
      duration.
      
      Rick Hudson <rlh> did the hard work of tracking this down.
      
      Change-Id: I12ea9ee2db9a0ae9d3a90dde4944a75fcf408f4c
      Reviewed-on: https://go-review.googlesource.com/12672
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: retry GC assist until debt is paid off · 8f34b253
      Austin Clements authored
      Currently, there are three ways to satisfy a GC assist: 1) the mutator
      steals credit from background GC, 2) the mutator actually does GC
      work, and 3) there is no more work available. 3 was never really
      intended as a way to satisfy an assist, and it causes problems: there
      are periods when it's expected that the GC won't have any work, such
      as when transitioning from mark 1 to mark 2 and from mark 2 to mark
      termination. During these periods, there's no back-pressure on rapidly
      allocating mutators, which lets them race ahead of the heap goal.
      
      For example, test/init1.go and the runtime/trace test both have small
      reachable heaps and contain loops that rapidly allocate large garbage
      byte slices. This bug lets these tests exceed the heap goal by several
      orders of magnitude.
      
      Fix this by forcing the assist (and hence the allocation) to block
      until it can satisfy its debt via either 1 or 2, or the GC cycle
      terminates.
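      
      The shape of the retry loop, with hypothetical helpers standing
      in for the runtime's internals:
      
          package main
      
          func stealBackgroundCredit(debt int64) int64 { return debt } // way 1
          func doMarkWork(debt int64) int64            { return 0 }    // way 2
          func gcCycleEnded() bool                     { return true }
          func parkUntilMoreWork()                     {}
      
          // gcAssist blocks the allocating goroutine until its debt is
          // paid via stolen credit or real mark work, or the GC cycle
          // ends. Running out of work no longer satisfies the assist;
          // the goroutine parks and retries instead.
          func gcAssist(debt int64) {
              for debt > 0 {
                  debt -= stealBackgroundCredit(debt)
                  if debt <= 0 {
                      return
                  }
                  debt -= doMarkWork(debt)
                  if debt <= 0 || gcCycleEnded() {
                      return
                  }
                  parkUntilMoreWork() // block; retry when work appears
              }
          }
      
          func main() { gcAssist(1 << 20) }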
      
      This fixes one of the causes of #11677. It's still possible to overshoot
      the GC heap goal, but with this change the overshoot is almost exactly
      by the amount of allocation that happens during the concurrent scan
      phase, between when the heap passes the GC trigger and when the GC
      enables assists.
      
      Change-Id: I5ef4edcb0d2e13a1e432e66e8245f2bd9f8995be
      Reviewed-on: https://go-review.googlesource.com/12671
      Reviewed-by: Russ Cox <rsc@golang.org>