  4. 26 Sep, 2016 1 commit
      runtime: optimize defer code · f8b2314c
      Austin Clements authored
      This optimizes deferproc and deferreturn in various ways.
      
      The most important optimization is that it more carefully arranges to
      prevent preemption or stack growth. Currently we do this by switching
      to the system stack on every deferproc and every deferreturn. While we
      need to be on the system stack for the slow path of allocating and
      freeing defers, in the common case we can fit in the nosplit stack.
      Hence, this change pushes the system stack switch down into the slow
      paths and makes everything now exposed to the user stack nosplit. This
      also eliminates the need for various acquirem/releasem pairs, since we
      are now preventing preemption by preventing stack split checks.
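The commit message does not include the runtime code, and `//go:nosplit` and the system-stack switch are not available to ordinary Go programs. The sketch below is therefore only a user-level analogy of the same fast-path/slow-path split, not the runtime's implementation. A lock-protected global pool stands in for the expensive system-stack switch, and the common case (a hit in a local cache) never pays for it. The names `record`, `localCache`, `globalPool`, and `localMax` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// record is a hypothetical stand-in for a defer record.
type record struct {
	fn func()
}

// localMax is a hypothetical cap on the local cache size.
const localMax = 8

// globalPool is a shared, lock-protected pool. Taking its lock stands in
// for the expensive operation that only the slow path should pay for.
var globalPool struct {
	mu   sync.Mutex
	free []*record
}

// localCache is a per-owner cache (loosely analogous to a per-P pool).
// It is not safe for concurrent use: exactly one goroutine owns it.
type localCache struct {
	free []*record
}

// get is the fast path: pop from the local cache without locking.
func (c *localCache) get() *record {
	if n := len(c.free); n > 0 {
		r := c.free[n-1]
		c.free = c.free[:n-1]
		return r
	}
	return c.getSlow()
}

// getSlow refills from the global pool under its lock and, if that is
// also empty, allocates a fresh record.
func (c *localCache) getSlow() *record {
	globalPool.mu.Lock()
	for len(c.free) < localMax/2 && len(globalPool.free) > 0 {
		n := len(globalPool.free)
		c.free = append(c.free, globalPool.free[n-1])
		globalPool.free = globalPool.free[:n-1]
	}
	globalPool.mu.Unlock()
	if n := len(c.free); n > 0 {
		r := c.free[n-1]
		c.free = c.free[:n-1]
		return r
	}
	return new(record)
}

// put is the fast path for freeing: push onto the local cache, spilling
// to the global pool (slow path) only when the cache is full.
func (c *localCache) put(r *record) {
	*r = record{} // clear so the pool does not retain fn
	if len(c.free) >= localMax {
		c.putSlow()
	}
	c.free = append(c.free, r)
}

// putSlow moves the upper half of the local cache to the global pool.
func (c *localCache) putSlow() {
	half := len(c.free) / 2
	globalPool.mu.Lock()
	globalPool.free = append(globalPool.free, c.free[half:]...)
	globalPool.mu.Unlock()
	for i := half; i < len(c.free); i++ {
		c.free[i] = nil // drop stale references in the backing array
	}
	c.free = c.free[:half]
}

func main() {
	var c localCache
	r := c.get()         // slow path: both pools are empty, so allocate
	c.put(r)             // fast path: goes into the local cache
	r2 := c.get()        // fast path: reuses the cached record
	fmt.Println(r == r2) // true
}
```

The design point mirrored here is that the costly step lives in separate `getSlow`/`putSlow` functions, so the common path stays short.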
      
      As another, smaller optimization, we special-case the common
      zero-sized and pointer-sized defer frames: for zero-sized frames we
      skip the copy entirely, and for pointer-sized frames we perform the
      copy inline instead of calling memmove.
      
      This cuts the time per operation of the runtime Defer benchmark by 42%:
      
      name           old time/op  new time/op  delta
      Defer-4        75.1ns ± 1%  43.3ns ± 1%  -42.31%   (p=0.000 n=8+10)
      
      Isolating the cost of defer itself, this speeds it up by about 2.2X.
      The two benchmarks below compare a Lock/defer Unlock pair (DeferLock)
      with a Lock/Unlock pair (NoDeferLock). NoDeferLock establishes a
      baseline cost, so subtracting it from DeferLock shows that this
      change reduces the overhead of defer from 61.4ns to 27.9ns.
      
      name           old time/op  new time/op  delta
      DeferLock-4    77.4ns ± 1%  43.9ns ± 1%  -43.31%  (p=0.000 n=10+10)
      NoDeferLock-4  16.0ns ± 0%  15.9ns ± 0%   -0.39%    (p=0.000 n=9+8)
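The benchmark source is not part of this message, so the following is only a sketch of what one iteration of each benchmark might do, plus the baseline subtraction described above. `withDefer`, `withoutDefer`, `deferOverhead`, `mu`, and `counter` are hypothetical names. With the rounded table values the new overhead comes out near 28.0ns rather than the quoted 27.9ns, presumably because the quoted figure was computed from unrounded means.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu      sync.Mutex
	counter int
)

// withDefer mirrors a DeferLock-style iteration: Lock, then release the
// mutex via defer when the function returns.
func withDefer() {
	mu.Lock()
	defer mu.Unlock()
	counter++
}

// withoutDefer mirrors a NoDeferLock-style iteration: the same work with
// an explicit Unlock, which gives the baseline cost.
func withoutDefer() {
	mu.Lock()
	counter++
	mu.Unlock()
}

// deferOverhead estimates the time attributable to defer alone by
// subtracting the baseline ns/op from the defer ns/op.
func deferOverhead(deferNs, baselineNs float64) float64 {
	return deferNs - baselineNs
}

func main() {
	withDefer()
	withoutDefer()
	fmt.Println(counter) // 2

	// Rounded means from the DeferLock/NoDeferLock table.
	oldOverhead := deferOverhead(77.4, 16.0) // about 61.4ns
	newOverhead := deferOverhead(43.9, 15.9) // about 28.0ns
	fmt.Printf("old %.1fns, new %.1fns, speedup %.1fx\n",
		oldOverhead, newOverhead, oldOverhead/newOverhead)
}
```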
      
      This also shaves 34ns off cgo calls:
      
      name       old time/op  new time/op  delta
      CgoNoop-4   122ns ± 1%  88.3ns ± 1%  -27.72%  (p=0.000 n=8+9)
      
      Updates #14939, #16051.
      
      Change-Id: I2baa0dea378b7e4efebbee8fca919a97d5e15f38
      Reviewed-on: https://go-review.googlesource.com/29656
      Reviewed-by: Keith Randall <khr@golang.org>