1. 19 May, 2015 9 commits
    • Ryan Brown's avatar
      cmd/internal/ld: output dwarf in external link mode on darwin · 8b83306c
      Ryan Brown authored
      Fixes #8973
      
      Change-Id: Idd53fc6d9e6971ae31ed72a3df3cfdce0bfbc1fd
      Reviewed-on: https://go-review.googlesource.com/8661Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      Run-TryBot: Russ Cox <rsc@golang.org>
      8b83306c
    • Russ Cox's avatar
      cmd/internal/gc: sync nowritebarrier checks and write barrier insertion · 9c9e36b3
      Russ Cox authored
      Change-Id: I348223d0336e28d95b8e68d7653aa547acc7c9c3
      Reviewed-on: https://go-review.googlesource.com/10262Reviewed-by: 's avatarAustin Clements <austin@google.com>
      9c9e36b3
    • Russ Cox's avatar
      cmd/internal/gc: type of str[i] is byte, not uint8 · c735064c
      Russ Cox authored
      Fixes #8745.
      
      Change-Id: Id0641e3c0f259812b41ed871e83c68740feb2b19
      Reviewed-on: https://go-review.googlesource.com/10261Reviewed-by: 's avatarAustin Clements <austin@google.com>
      c735064c
    • Rick Hudson's avatar
      runtime: run background mark helpers only if work is available · 913db768
      Rick Hudson authored
      Prior to this CL whenever the GC marking was enabled and
      a P was looking for work we supplied a G to help
      the GC do its marking tasks. Once this G finished all
      the marking available it would release the P to find another
      available G. In the case where there was no work the P would drop
      into findrunnable which would execute the mark helper G which would
      immediately return and the P would drop into findrunnable again repeating
      the process. Since the P was always given a G to run it never blocks.
      This CL first checks if the GC mark helper G has available work and if
      not the P immediately falls through to its blocking logic.
      
      Fixes #10901
      
      Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5
      Reviewed-on: https://go-review.googlesource.com/10189Reviewed-by: 's avatarAustin Clements <austin@google.com>
      913db768
    • Austin Clements's avatar
      runtime: minor clean up to heapminimum · f4d51eb2
      Austin Clements authored
      Currently setGCPercent sets heapminimum to heapminimum*GOGC/100. The
      real intent is to set heapminimum to a scaled multiple of a fixed
      default heap minimum, not to scale heapminimum based on its current
      value. This turns out to be okay because setGCPercent is only called
      once and heapminimum is initially set to this default heap minimum.
      However, the code as written is confusing, especially since
      setGCPercent is otherwise written so it could be called again to
      change GOGC. Fix this by introducing a defaultHeapMinimum constant and
      using this instead of the current value of heapminimum to compute the
      scaled heap minimum.
      
      As part of this, this commit improves the documentation on
      heapminimum.
      
      Change-Id: I4eb82c73dc2eb44a6e5a17c780a747a2e73d7493
      Reviewed-on: https://go-review.googlesource.com/10181Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      f4d51eb2
    • Russ Cox's avatar
      runtime: add fast check for self-loop pointer in scanobject · 8903b3db
      Russ Cox authored
      Addresses a problem reported on the mailing list.
      
      This will come up mainly in programs custom allocators that batch allocations,
      but it still helps in our programs, which mainly do not have such allocations.
      
      name                   old mean              new mean              delta
      BinaryTree17            5.95s × (0.97,1.03)   5.93s × (0.97,1.04)    ~    (p=0.613)
      Fannkuch11              4.46s × (0.98,1.04)   4.33s × (0.99,1.01)  -2.93% (p=0.000)
      FmtFprintfEmpty        86.6ns × (0.98,1.03)  86.8ns × (0.98,1.02)    ~    (p=0.523)
      FmtFprintfString        290ns × (0.98,1.05)   287ns × (0.98,1.03)    ~    (p=0.061)
      FmtFprintfInt           271ns × (0.98,1.04)   286ns × (0.99,1.01)  +5.54% (p=0.000)
      FmtFprintfIntInt        495ns × (0.98,1.04)   489ns × (0.99,1.01)  -1.24% (p=0.015)
      FmtFprintfPrefixedInt   391ns × (0.99,1.02)   407ns × (0.99,1.01)  +4.00% (p=0.000)
      FmtFprintfFloat         578ns × (0.99,1.01)   559ns × (0.99,1.01)  -3.35% (p=0.000)
      FmtManyArgs            1.96µs × (0.98,1.05)  1.94µs × (0.99,1.01)  -1.33% (p=0.030)
      GobDecode              15.9ms × (0.97,1.05)  15.7ms × (0.99,1.01)  -1.35% (p=0.044)
      GobEncode              11.4ms × (0.97,1.05)  11.3ms × (0.98,1.03)    ~    (p=0.141)
      Gzip                    658ms × (0.98,1.05)   648ms × (0.99,1.01)  -1.59% (p=0.009)
      Gunzip                  144ms × (0.99,1.03)   144ms × (0.99,1.01)    ~    (p=0.867)
      HTTPClientServer       92.1µs × (0.97,1.05)  90.3µs × (0.99,1.01)  -1.89% (p=0.005)
      JSONEncode             31.0ms × (0.96,1.07)  30.2ms × (0.98,1.03)  -2.66% (p=0.001)
      JSONDecode              110ms × (0.97,1.04)   107ms × (0.99,1.01)  -2.59% (p=0.000)
      Mandelbrot200          6.15ms × (0.98,1.04)  6.07ms × (0.99,1.02)  -1.32% (p=0.045)
      GoParse                6.79ms × (0.97,1.04)  6.74ms × (0.97,1.04)    ~    (p=0.242)
      RegexpMatchEasy0_32     158ns × (0.98,1.05)   155ns × (0.99,1.01)  -1.64% (p=0.010)
      RegexpMatchEasy0_1K     548ns × (0.97,1.04)   540ns × (0.99,1.01)  -1.34% (p=0.042)
      RegexpMatchEasy1_32     133ns × (0.97,1.04)   132ns × (0.97,1.05)    ~    (p=0.466)
      RegexpMatchEasy1_1K     899ns × (0.96,1.05)   878ns × (0.99,1.01)  -2.32% (p=0.002)
      RegexpMatchMedium_32    250ns × (0.96,1.03)   243ns × (0.99,1.01)  -2.90% (p=0.000)
      RegexpMatchMedium_1K   73.4µs × (0.98,1.04)  73.0µs × (0.98,1.04)    ~    (p=0.411)
      RegexpMatchHard_32     3.87µs × (0.97,1.07)  3.84µs × (0.98,1.04)    ~    (p=0.273)
      RegexpMatchHard_1K      120µs × (0.97,1.08)   117µs × (0.99,1.01)  -2.06% (p=0.010)
      Revcomp                 940ms × (0.96,1.07)   924ms × (0.97,1.07)    ~    (p=0.071)
      Template                128ms × (0.96,1.05)   128ms × (0.99,1.01)    ~    (p=0.502)
      TimeParse               632ns × (0.96,1.07)   616ns × (0.99,1.01)  -2.58% (p=0.001)
      TimeFormat              671ns × (0.97,1.06)   657ns × (0.99,1.02)  -2.10% (p=0.002)
      
      In contrast to the one in test/bench/go1 (above), the binarytree program on the
      shootout site uses more goroutines, batches allocations, and sets GOMAXPROCS
      to runtime.NumCPU()*2.
      
      Using that version, before vs after:
      
      name          old mean             new mean             delta
      BinaryTree20  18.6s × (0.96,1.05)  11.3s × (0.98,1.02)  -39.46% (p=0.000)
      
      And Go 1.4 vs after:
      
      name          old mean             new mean             delta
      BinaryTree20  13.0s × (0.97,1.02)  11.3s × (0.98,1.02)  -13.21% (p=0.000)
      
      There is still a scheduling problem - the raw run times are hiding the fact that
      this chews up 2x the CPU - but we'll take care of that separately.
      
      Change-Id: I3f5da879b24ae73a0d06745381ffb88c3744948b
      Reviewed-on: https://go-review.googlesource.com/10220Reviewed-by: 's avatarAustin Clements <austin@google.com>
      8903b3db
    • Russ Cox's avatar
      cmd/internal/gc: add missing write barrier in append(x, BigStructWithPointers) · 366ba526
      Russ Cox authored
      Fixes #10897.
      
      Change-Id: I5c2d1f9d26333e2b2a0613ebf496daa465e07c24
      Reviewed-on: https://go-review.googlesource.com/10221Reviewed-by: 's avatarAustin Clements <austin@google.com>
      366ba526
    • Shenghou Ma's avatar
      time: document that not all Unix time can be represented · f3fc8b02
      Shenghou Ma authored
      Fixes #10906.
      
      Change-Id: I7ae25a500df493c1e78183d69d89b3e2a64a0d1a
      Reviewed-on: https://go-review.googlesource.com/10223Reviewed-by: 's avatarAndrew Gerrand <adg@golang.org>
      f3fc8b02
    • Aaron Jacobs's avatar
      flag: Fix up a package comment a bit. · b21ff396
      Aaron Jacobs authored
      I think "the flag" was a typo, and the word "after" was repetitive.
      
      Change-Id: I81c034ca11a3a778ff1eb4b3af5b96bc525ab985
      Reviewed-on: https://go-review.googlesource.com/10195Reviewed-by: 's avatarRob Pike <r@golang.org>
      Reviewed-by: 's avatarAndrew Gerrand <adg@golang.org>
      b21ff396
  2. 18 May, 2015 16 commits
    • Josh Bleecher Snyder's avatar
      cmd/internal/gc: rearrange Node fields · 82833b31
      Josh Bleecher Snyder authored
      Rearrange Node fields to enable better struct packing.
      This reduces readability in favor of shrinking
      the size of Nodes.
      
      This reduces the size of Node from 328 to 312.
      This reduces the memory usage to compile the
      rotate tests by about 4.4%.
      
      No functional changes. Passes toolstash -cmp.
      
      Updates #9933.
      
      Change-Id: I2764c5847fb1635ddc898e2ee385d007d67f03c5
      Reviewed-on: https://go-review.googlesource.com/10141Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      82833b31
    • Josh Bleecher Snyder's avatar
      cmd/internal/gc: separate Node param fields · f4ab8203
      Josh Bleecher Snyder authored
      Param will be converted from an anonymous to a
      named field in a subsequent, automated CL.
      
      Reduces Node size from 368 to 328.
      Reduces inuse_space on the rotate tests by about 3%.
      
      No functional changes. Passes toolstash -cmp.
      
      Updates #9933.
      
      Change-Id: I5867b00328abf17ee24aea6ca58876bae9d8bfed
      Reviewed-on: https://go-review.googlesource.com/10210Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      f4ab8203
    • Josh Bleecher Snyder's avatar
      cmd/6g, cmd/internal/gc: use Etype instead of Ostk · ddc93398
      Josh Bleecher Snyder authored
      Change-Id: Ifda5d84b28717986c93b63767298180a6d6236c0
      Reviewed-on: https://go-review.googlesource.com/10140Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      ddc93398
    • Josh Bleecher Snyder's avatar
      cmd/internal/gc: make all Node depths int32 · 2b063bdf
      Josh Bleecher Snyder authored
      Funcdepth was already int32. Make Escloopdepth
      and Decldepth also int32 instead of int.
      
      No functional changes for non-absurd code. Passes toolstash -cmp.
      
      Change-Id: I47e145dd732b6a73cfcc6d45956df0dbccdcd999
      Reviewed-on: https://go-review.googlesource.com/10129Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      2b063bdf
    • Josh Bleecher Snyder's avatar
      runtime/pprof: write heap statistics to heap profile always · 79986e24
      Josh Bleecher Snyder authored
      This is a duplicate of CL 9491.
      That CL broke the build due to pprof shortcomings
      and was reverted in CL 9565.
      
      CL 9623 fixed pprof, so this can go in again.
      
      Fixes #10659.
      
      Change-Id: If470fc90b3db2ade1d161b4417abd2f5c6c330b8
      Reviewed-on: https://go-review.googlesource.com/10212Reviewed-by: 's avatarMatthew Dempsky <mdempsky@google.com>
      79986e24
    • Daniel Morsing's avatar
      cmd/pprof/internal/profile: ignore comments when parsing heap profiles · 19354b9d
      Daniel Morsing authored
      Fixes #10659.
      
      Change-Id: I22dc306ce6f398dd40010ac430928a718d67d466
      Reviewed-on: https://go-review.googlesource.com/9623Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      19354b9d
    • Rob Pike's avatar
      cmd/doc: put blank lines around comment for types, etc. · 6f7b4e89
      Rob Pike authored
      Better layout.
      
      Fixes #10859.
      
      The issue suggests rearranging so the comment comes out
      after the methods. I tried this and it looks good but it is less
      useful, since the stuff you're probably looking for - the methods
      - are scrolled away by the comment. The most important
      information should be last because that leaves it on your
      screen after the print if the output is long.
      
      Change-Id: I560f992601ccbe2293c347fa1b1018a3f5346c82
      Reviewed-on: https://go-review.googlesource.com/10160Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      6f7b4e89
    • Michael Hudson-Doyle's avatar
      misc/cgo/testshared: rewrite in Go · 362a40e3
      Michael Hudson-Doyle authored
      And fix to work on filesystems with only 1s resolution.
      
      Fixes #10724
      
      Change-Id: Ia07463f090b4290fc27f5953fa94186463d7afc7
      Reviewed-on: https://go-review.googlesource.com/9768Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      362a40e3
    • Robert Griesemer's avatar
      spec: fix typo · f9ec929a
      Robert Griesemer authored
      Fixes #10893.
      
      Change-Id: I8afeb55acda1e1c8e181379dbaf443716d63ded1
      Reviewed-on: https://go-review.googlesource.com/10201Reviewed-by: 's avatarRob Pike <r@golang.org>
      f9ec929a
    • David Chase's avatar
      cmd/internal/gc: extend escape analysis to pointers in slices · a21cf5b6
      David Chase authored
      Modified esc.go to allow slice literals (before append)
      to be non-escaping.  Modified tests to account for changes
      in escape behavior and to also test the two cases that
      were previously not tested.
      
      Also minor cleanups to debug-printing within esc.go
      
      Allocation stats for running compiler
      ( cd src/html/template;
        for i in {1..5} ; do
           go tool 6g -memprofile=testzz.${i}.prof  -memprofilerate=1 *.go ;
           go tool pprof -alloc_objects -text  testzz.${i}.prof ;
           done ; )
      before about 86k allocations
      after  about 83k allocations
      
      Fixes #8972
      
      Change-Id: Ib61dd70dc74adb40d6f6fdda6eaa4bf7d83481de
      Reviewed-on: https://go-review.googlesource.com/10118Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      a21cf5b6
    • Austin Clements's avatar
      runtime: use separate count and note for forEachP · f0dd0028
      Austin Clements authored
      Currently, forEachP reuses the stopwait and stopnote fields from
      stopTheWorld to track how many Ps have not responded to the safe-point
      request and to sleep until all Ps have responded.
      
      It was assumed this was safe because both stopTheWorld and forEachP
      must occur under the worlsema and hence stopwait and stopnote cannot
      be used for both purposes simultaneously and callers could always
      determine the appropriate use based on sched.gcwaiting (which is only
      set by stopTheWorld). However, this is not the case, since it's
      possible for there to be a window between when an M observes that
      gcwaiting is set and when it checks stopwait during which stopwait
      could have changed meanings. When this happens, the M decrements
      stopwait and may wakeup stopnote, but does not otherwise participate
      in the forEachP protocol. As a result, stopwait is decremented too
      many times, so it may reach zero before all Ps have run the safe-point
      function, causing forEachP to wake up early. It will then either
      observe that some P has not run the safe-point function and panic with
      "P did not run fn", or the remaining P (or Ps) will run the safe-point
      function before it wakes up and it will observe that stopwait is
      negative and panic with "not stopped".
      
      Fix this problem by giving forEachP its own safePointWait and
      safePointNote fields.
      
      One known sequence of events that can cause this race is as
      follows. It involves three actors:
      
      G1 is running on M1 on P1. P1 has an empty run queue.
      
      G2/M2 is in a blocked syscall and has lost its P. (The details of this
      don't matter, it just needs to be in a position where it needs to grab
      an idle P.)
      
      GC just started on G3/M3/P3. (These aren't very involved, they just
      have to be separate from the other G's, M's, and P's.)
      
      1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.
      
      Now G1/M1 begins to enter a syscall:
      
      2. G1/M1 invokes reentersyscall, which sets the P1's status to
         _Psyscall.
      
      3. G1/M1's reentersyscall observes gcwaiting != 0 and calls
         entersyscall_gcwait.
      
      4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock.
      
      Back on GC:
      
      5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and
         returns.
      
      6. GC does stuff and then calls startTheWorld().
      
      7. startTheWorld() calls procresize(), which sets P1's status to
         _Pidle and puts P1 on the idle list.
      
      Now G2/M2 returns from its syscall and takes over P1:
      
      8. G2/M2 returns from its blocked syscall and gets P1 from the idle
         list.
      
      9. G2/M2 acquires P1, which sets P1's status to _Prunning.
      
      10. G2/M2 starts a new syscall and invokes reentersyscall, which sets
          P1's status to _Psyscall.
      
      Back on G1/M1:
      
      11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.
      
      At this point, G1/M1 still thinks it's running on P1. P1's status is
      _Psyscall, which is consistent with what G1/M1 is doing, but it's
      _Psyscall because *G2/M2* put it in to _Psyscall, not G1/M1. This is
      basically an ABA race on P1's status.
      
      Because forEachP currently shares stopwait with stopTheWorld. G1/M1's
      entersyscall_gcwait observes the non-zero stopwait set by forEachP,
      but mistakes it for a stopTheWorld. It cas's P1's status from
      _Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement
      stopwait one more time than forEachP was expecting.
      
      Fixes #10618. (See the issue for details on why the above race is safe
      when forEachP is not involved.)
      
      Prior to this commit, the command
        stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile
      would reliably fail after a few hundred runs. With this commit, it
      ran for over 2 million runs and never crashed.
      
      Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30
      Reviewed-on: https://go-review.googlesource.com/10157Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      f0dd0028
    • Austin Clements's avatar
      runtime: hold worldsema while starting the world · 277acca2
      Austin Clements authored
      Currently, startTheWorld releases worldsema before starting the
      world. Since startTheWorld can change gomaxprocs after allowing Ps to
      run, this means that gomaxprocs can change while another P holds
      worldsema.
      
      Unfortunately, the garbage collector and forEachP assume that holding
      worldsema protects against changes in gomaxprocs (which it *almost*
      does). In particular, this is causing somewhat frequent "P did not run
      fn" crashes in forEachP in the runtime tests because gomaxprocs is
      changing between the several loops that forEachP does over all the Ps.
      
      Fix this by only releasing worldsema after the world is started.
      
      This relates to issue #10618. forEachP still fails under stress
      testing, but much less frequently.
      
      Change-Id: I085d627b70cca9ebe9af28fe73b9872f1bb224ff
      Reviewed-on: https://go-review.googlesource.com/10156Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      277acca2
    • Austin Clements's avatar
      runtime: disallow preemption during startTheWorld · 9c44a41d
      Austin Clements authored
      Currently, startTheWorld clears preemptoff for the current M before
      starting the world. A few callers increment m.locks around
      startTheWorld, presumably to prevent preemption any time during
      starting the world. This is almost certainly pointless (none of the
      other callers do this), but there's no harm in making startTheWorld
      keep preemption disabled until it's all done, which definitely lets us
      drop these m.locks manipulations.
      
      Change-Id: I8a93658abd0c72276c9bafa3d2c7848a65b4691a
      Reviewed-on: https://go-review.googlesource.com/10155Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      9c44a41d
    • Austin Clements's avatar
      runtime: factor stoptheworld/starttheworld pattern · a1da255a
      Austin Clements authored
      There are several steps to stopping and starting the world and
      currently they're open-coded in several places. The garbage collector
      is the only thing that needs to stop and start the world in a
      non-trivial pattern. Replace all other uses with calls to higher-level
      functions that implement the entire pattern necessary to stop and
      start the world.
      
      This is a pure refectoring and should not change any code semantics.
      In the following commits, we'll make changes that are easier to do
      with this abstraction in place.
      
      This commit renames the old starttheworld to startTheWorldWithSema.
      This is a slight misnomer right now because the callers release
      worldsema just before calling this. However, a later commit will swap
      these and I don't want to think of another name in the mean time.
      
      Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911
      Reviewed-on: https://go-review.googlesource.com/10154Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      a1da255a
    • Austin Clements's avatar
      runtime: don't start GC if preemptoff is set · 5f7060af
      Austin Clements authored
      In order to avoid deadlocks, startGC avoids kicking off GC if locks
      are held by the calling M. However, it currently fails to check
      preemptoff, which is the other way to disable preemption.
      
      Fix this by adding a check for preemptoff.
      
      Change-Id: Ie1083166e5ba4af5c9d6c5a42efdfaaef41ca997
      Reviewed-on: https://go-review.googlesource.com/10153Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      5f7060af
    • Alex Brainman's avatar
      runtime: correct exception stack trace output · e544bee1
      Alex Brainman authored
      It is misleading when stack trace say:
      
      signal arrived during cgo execution
      
      but we are not in cgo call.
      
      Change-Id: I627e2f2bdc7755074677f77f21befc070a101914
      Reviewed-on: https://go-review.googlesource.com/9190Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      e544bee1
  3. 17 May, 2015 3 commits
  4. 16 May, 2015 4 commits
    • Russ Cox's avatar
      cmd/internal/gc: refine ginscmp comment · 6e8bcbbe
      Russ Cox authored
      Change-Id: I2ebb36c6c5de9d34e52ed523e9c888452591924a
      Reviewed-on: https://go-review.googlesource.com/10152Reviewed-by: 's avatarMinux Ma <minux@golang.org>
      6e8bcbbe
    • Russ Cox's avatar
      reflect: make PtrTo(FuncOf(...)) not crash · d36cc027
      Russ Cox authored
      Change-Id: Ie67e295bf327126dfdc75b73979fe33fbcb79ad9
      Reviewed-on: https://go-review.googlesource.com/10150Reviewed-by: 's avatarAustin Clements <austin@google.com>
      d36cc027
    • Russ Cox's avatar
      runtime: replace GC programs with simpler encoding, faster decoder · 512f75e8
      Russ Cox authored
      Small types record the location of pointers in their memory layout
      by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries,
      and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using
      a bitmap for a large type containing arrays does not make sense:
      if someone refers to the type [1<<28]*byte in a program in such
      a way that the type information makes it into the binary, it would be
      a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB
      (for 1-bit entries) bitmap full of 1s into the binary or even to keep
      one in memory during the execution of the program.
      
      For large types containing arrays, it is much more compact to describe
      the locations of pointers using a notation that can express repetition
      than to lay out a bitmap of pointers. Go 1.4 included such a notation,
      called ``GC programs'' but it was complex, required recursion during
      decoding, and was generally slow. Dmitriy measured the execution of
      these programs writing directly to the heap bitmap as being 7x slower
      than copying from a preunrolled 4-bit mask (and frankly that code was
      not terribly fast either). For some tests, unrollgcprog1 was seen costing
      as much as 3x more than the rest of malloc combined.
      
      This CL introduces a different form for the GC programs. They use a
      simple Lempel-Ziv-style encoding of the 1-bit pointer information,
      in which the only operations are (1) emit the following n bits
      and (2) repeat the last n bits c more times. This encoding can be
      generated directly from the Go type information (using repetition
      only for arrays or large runs of non-pointer data) and it can be decoded
      very efficiently. In particular the decoding requires little state and
      no recursion, so that the entire decoding can run without any memory
      accesses other than the reads of the encoding and the writes of the
      decoded form to the heap bitmap. For recursive types like arrays of
      arrays of arrays, the inner instructions are only executed once, not
      n times, so that large repetitions run at full speed. (In contrast, large
      repetitions in the old programs repeated the individual bit-level layout
      of the inner data over and over.) The result is as much as 25x faster
      decoding compared to the old form.
      
      Because the old decoder was so slow, Go 1.4 had three (or so) cases
      for how to set the heap bitmap bits for an allocation of a given type:
      
      (1) If the type had an even number of words up to 32 words, then
      the 4-bit pointer mask for the type fit in no more than 16 bytes;
      store the 4-bit pointer mask directly in the binary and copy from it.
      
      (1b) If the type had an odd number of words up to 15 words, then
      the 4-bit pointer mask for the type, doubled to end on a byte boundary,
      fit in no more than 16 bytes; store that doubled mask directly in the
      binary and copy from it.
      
      (2) If the type had an even number of words up to 128 words,
      or an odd number of words up to 63 words (again due to doubling),
      then the 4-bit pointer mask would fit in a 64-byte unrolled mask.
      Store a GC program in the binary, but leave space in the BSS for
      the unrolled mask. Execute the GC program to construct the mask the
      first time it is needed, and thereafter copy from the mask.
      
      (3) Otherwise, store a GC program and execute it to write directly to
      the heap bitmap each time an object of that type is allocated.
      (This is the case that was 7x slower than the other two.)
      
      Because the new pointer masks store 1-bit entries instead of 4-bit
      entries and because using the decoder no longer carries a significant
      overhead, after this CL (that is, for Go 1.5) there are only two cases:
      
      (1) If the type is 128 words or less (no condition about odd or even),
      store the 1-bit pointer mask directly in the binary and use it to
      initialize the heap bitmap during malloc. (Implemented in CL 9702.)
      
      (2) There is no case 2 anymore.
      
      (3) Otherwise, store a GC program and execute it to write directly to
      the heap bitmap each time an object of that type is allocated.
      
      Executing the GC program directly into the heap bitmap (case (3) above)
      was disabled for the Go 1.5 dev cycle, both to avoid needing to use
      GC programs for typedmemmove and to avoid updating that code as
      the heap bitmap format changed. Typedmemmove no longer uses this
      type information; as of CL 9886 it uses the heap bitmap directly.
      Now that the heap bitmap format is stable, we reintroduce GC programs
      and their space savings.
      
      Benchmarks for heapBitsSetType, before this CL vs this CL:
      
      name                    old mean               new mean              delta
      SetTypePtr              7.59ns × (0.99,1.02)   5.16ns × (1.00,1.00)  -32.05% (p=0.000)
      SetTypePtr8             21.0ns × (0.98,1.05)   21.4ns × (1.00,1.00)     ~    (p=0.179)
      SetTypePtr16            24.1ns × (0.99,1.01)   24.6ns × (1.00,1.00)   +2.41% (p=0.001)
      SetTypePtr32            31.2ns × (0.99,1.01)   32.4ns × (0.99,1.02)   +3.72% (p=0.001)
      SetTypePtr64            45.2ns × (1.00,1.00)   47.2ns × (1.00,1.00)   +4.42% (p=0.000)
      SetTypePtr126           75.8ns × (0.99,1.01)   79.1ns × (1.00,1.00)   +4.25% (p=0.000)
      SetTypePtr128           74.3ns × (0.99,1.01)   77.6ns × (1.00,1.01)   +4.55% (p=0.000)
      SetTypePtrSlice          726ns × (1.00,1.01)    712ns × (1.00,1.00)   -1.95% (p=0.001)
      SetTypeNode1            20.0ns × (0.99,1.01)   20.7ns × (1.00,1.00)   +3.71% (p=0.000)
      SetTypeNode1Slice        112ns × (1.00,1.00)    113ns × (0.99,1.00)     ~    (p=0.070)
      SetTypeNode8            23.9ns × (1.00,1.00)   24.7ns × (1.00,1.01)   +3.18% (p=0.000)
      SetTypeNode8Slice        294ns × (0.99,1.02)    287ns × (0.99,1.01)   -2.38% (p=0.015)
      SetTypeNode64           52.8ns × (0.99,1.03)   51.8ns × (0.99,1.01)     ~    (p=0.069)
      SetTypeNode64Slice      1.13µs × (0.99,1.05)   1.14µs × (0.99,1.00)     ~    (p=0.767)
      SetTypeNode64Dead       36.0ns × (1.00,1.01)   32.5ns × (0.99,1.00)   -9.67% (p=0.000)
      SetTypeNode64DeadSlice  1.43µs × (0.99,1.01)   1.40µs × (1.00,1.00)   -2.39% (p=0.001)
      SetTypeNode124          75.7ns × (1.00,1.01)   79.0ns × (1.00,1.00)   +4.44% (p=0.000)
      SetTypeNode124Slice     1.94µs × (1.00,1.01)   2.04µs × (0.99,1.01)   +4.98% (p=0.000)
      SetTypeNode126          75.4ns × (1.00,1.01)   77.7ns × (0.99,1.01)   +3.11% (p=0.000)
      SetTypeNode126Slice     1.95µs × (0.99,1.01)   2.03µs × (1.00,1.00)   +3.74% (p=0.000)
      SetTypeNode128          85.4ns × (0.99,1.01)  122.0ns × (1.00,1.00)  +42.89% (p=0.000)
      SetTypeNode128Slice     2.20µs × (1.00,1.01)   2.36µs × (0.98,1.02)   +7.48% (p=0.001)
      SetTypeNode130          83.3ns × (1.00,1.00)  123.0ns × (1.00,1.00)  +47.61% (p=0.000)
      SetTypeNode130Slice     2.30µs × (0.99,1.01)   2.40µs × (0.98,1.01)   +4.37% (p=0.000)
      SetTypeNode1024          498ns × (1.00,1.00)    537ns × (1.00,1.00)   +7.96% (p=0.000)
      SetTypeNode1024Slice    15.5µs × (0.99,1.01)   17.8µs × (1.00,1.00)  +15.27% (p=0.000)
      
      The above compares always using a cached pointer mask (and the
      corresponding waste of memory) against using the programs directly.
      Some slowdown is expected, in exchange for having a better general algorithm.
      The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024,
      along with the slice variants of those.
      It is possible that the cutoff of 128 words (bits) should be raised
      in a followup CL, but even with this low cutoff the GC programs are
      faster than Go 1.4's "fast path" non-GC program case.
      
      Benchmarks for heapBitsSetType, Go 1.4 vs this CL:
      
      name                    old mean              new mean              delta
      SetTypePtr              6.89ns × (1.00,1.00)  5.17ns × (1.00,1.00)  -25.02% (p=0.000)
      SetTypePtr8             25.8ns × (0.97,1.05)  21.5ns × (1.00,1.00)  -16.70% (p=0.000)
      SetTypePtr16            39.8ns × (0.97,1.02)  24.7ns × (0.99,1.01)  -37.81% (p=0.000)
      SetTypePtr32            68.8ns × (0.98,1.01)  32.2ns × (1.00,1.01)  -53.18% (p=0.000)
      SetTypePtr64             130ns × (1.00,1.00)    47ns × (1.00,1.00)  -63.67% (p=0.000)
      SetTypePtr126            241ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.25% (p=0.000)
      SetTypePtr128           2.07µs × (1.00,1.00)  0.08µs × (1.00,1.00)  -96.27% (p=0.000)
      SetTypePtrSlice         1.05µs × (0.99,1.01)  0.72µs × (0.99,1.02)  -31.70% (p=0.000)
      SetTypeNode1            16.0ns × (0.99,1.01)  20.8ns × (0.99,1.03)  +29.91% (p=0.000)
      SetTypeNode1Slice        184ns × (0.99,1.01)   112ns × (0.99,1.01)  -39.26% (p=0.000)
      SetTypeNode8            29.5ns × (0.97,1.02)  24.6ns × (1.00,1.00)  -16.50% (p=0.000)
      SetTypeNode8Slice        624ns × (0.98,1.02)   285ns × (1.00,1.00)  -54.31% (p=0.000)
      SetTypeNode64            135ns × (0.96,1.08)    52ns × (0.99,1.02)  -61.32% (p=0.000)
      SetTypeNode64Slice      3.83µs × (1.00,1.00)  1.14µs × (0.99,1.01)  -70.16% (p=0.000)
      SetTypeNode64Dead        134ns × (0.99,1.01)    32ns × (1.00,1.01)  -75.74% (p=0.000)
      SetTypeNode64DeadSlice  3.83µs × (0.99,1.00)  1.40µs × (1.00,1.01)  -63.42% (p=0.000)
      SetTypeNode124           240ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.05% (p=0.000)
      SetTypeNode124Slice     7.27µs × (1.00,1.00)  2.04µs × (1.00,1.00)  -71.95% (p=0.000)
      SetTypeNode126          2.06µs × (0.99,1.01)  0.08µs × (0.99,1.01)  -96.23% (p=0.000)
      SetTypeNode126Slice     64.4µs × (1.00,1.00)   2.0µs × (1.00,1.00)  -96.85% (p=0.000)
      SetTypeNode128          2.09µs × (1.00,1.01)  0.12µs × (1.00,1.00)  -94.15% (p=0.000)
      SetTypeNode128Slice     65.4µs × (1.00,1.00)   2.4µs × (0.99,1.03)  -96.39% (p=0.000)
      SetTypeNode130          2.11µs × (1.00,1.00)  0.12µs × (1.00,1.00)  -94.18% (p=0.000)
      SetTypeNode130Slice     66.3µs × (1.00,1.00)   2.4µs × (0.97,1.08)  -96.34% (p=0.000)
      SetTypeNode1024         16.0µs × (1.00,1.01)   0.5µs × (1.00,1.00)  -96.65% (p=0.000)
      SetTypeNode1024Slice     512µs × (1.00,1.00)    18µs × (0.98,1.04)  -96.45% (p=0.000)
      
      SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation.
      Both Go 1.4 and this CL are using pointer bitmaps for this case,
      so that's an overall 3x speedup for using pointer bitmaps.
      
      SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation.
      Both Go 1.4 and this CL are running the GC program for this case,
      so that's an overall 17x speedup when using GC programs (and
      I've seen >20x on other systems).
      
      Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against
      this CL's SetTypeNode128 (GC program), the slow path in the
      code in this CL is 2x faster than the fast path in Go 1.4.
      
      The Go 1 benchmarks are basically unaffected compared to just before this CL.
      
      Go 1 benchmarks, before this CL vs this CL:
      
      name                   old mean              new mean              delta
      BinaryTree17            5.87s × (0.97,1.04)   5.91s × (0.96,1.04)    ~    (p=0.306)
      Fannkuch11              4.38s × (1.00,1.00)   4.37s × (1.00,1.01)  -0.22% (p=0.006)
      FmtFprintfEmpty        90.7ns × (0.97,1.10)  89.3ns × (0.96,1.09)    ~    (p=0.280)
      FmtFprintfString        282ns × (0.98,1.04)   287ns × (0.98,1.07)  +1.72% (p=0.039)
      FmtFprintfInt           269ns × (0.99,1.03)   282ns × (0.97,1.04)  +4.87% (p=0.000)
      FmtFprintfIntInt        478ns × (0.99,1.02)   481ns × (0.99,1.02)  +0.61% (p=0.048)
      FmtFprintfPrefixedInt   399ns × (0.98,1.03)   400ns × (0.98,1.05)    ~    (p=0.533)
      FmtFprintfFloat         563ns × (0.99,1.01)   570ns × (1.00,1.01)  +1.37% (p=0.000)
      FmtManyArgs            1.89µs × (0.99,1.01)  1.92µs × (0.99,1.02)  +1.88% (p=0.000)
      GobDecode              15.2ms × (0.99,1.01)  15.2ms × (0.98,1.05)    ~    (p=0.609)
      GobEncode              11.6ms × (0.98,1.03)  11.9ms × (0.98,1.04)  +2.17% (p=0.000)
      Gzip                    648ms × (0.99,1.01)   648ms × (1.00,1.01)    ~    (p=0.835)
      Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.169)
      HTTPClientServer       90.5µs × (0.98,1.03)  91.5µs × (0.98,1.04)  +1.04% (p=0.045)
      JSONEncode             31.5ms × (0.98,1.03)  31.4ms × (0.98,1.03)    ~    (p=0.549)
      JSONDecode              111ms × (0.99,1.01)   107ms × (0.99,1.01)  -3.21% (p=0.000)
      Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.878)
      GoParse                6.54ms × (0.99,1.02)  6.61ms × (0.99,1.03)  +1.08% (p=0.004)
      RegexpMatchEasy0_32     160ns × (1.00,1.01)   161ns × (1.00,1.00)  +0.40% (p=0.000)
      RegexpMatchEasy0_1K     560ns × (0.99,1.01)   559ns × (0.99,1.01)    ~    (p=0.088)
      RegexpMatchEasy1_32     138ns × (0.99,1.01)   138ns × (1.00,1.00)    ~    (p=0.380)
      RegexpMatchEasy1_1K     877ns × (1.00,1.00)   878ns × (1.00,1.00)    ~    (p=0.157)
      RegexpMatchMedium_32    251ns × (0.99,1.00)   251ns × (1.00,1.01)  +0.28% (p=0.021)
      RegexpMatchMedium_1K   72.6µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.539)
      RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.378)
      RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.067)
      Revcomp                 904ms × (0.99,1.02)   904ms × (0.99,1.01)    ~    (p=0.943)
      Template                125ms × (0.99,1.02)   127ms × (0.99,1.01)  +1.79% (p=0.000)
      TimeParse               627ns × (0.99,1.01)   622ns × (0.99,1.01)  -0.88% (p=0.000)
      TimeFormat              655ns × (0.99,1.02)   655ns × (0.99,1.02)    ~    (p=0.976)
      
      For the record, Go 1 benchmarks, Go 1.4 vs this CL:
      
      name                   old mean              new mean              delta
      BinaryTree17            4.61s × (0.97,1.05)   5.91s × (0.98,1.03)  +28.35% (p=0.000)
      Fannkuch11              4.40s × (0.99,1.03)   4.41s × (0.99,1.01)     ~    (p=0.212)
      FmtFprintfEmpty         102ns × (0.99,1.01)    84ns × (0.99,1.02)  -18.38% (p=0.000)
      FmtFprintfString        302ns × (0.98,1.01)   303ns × (0.99,1.02)     ~    (p=0.203)
      FmtFprintfInt           313ns × (0.97,1.05)   270ns × (0.99,1.01)  -13.69% (p=0.000)
      FmtFprintfIntInt        524ns × (0.98,1.02)   477ns × (0.99,1.00)   -8.87% (p=0.000)
      FmtFprintfPrefixedInt   424ns × (0.98,1.02)   386ns × (0.99,1.01)   -8.96% (p=0.000)
      FmtFprintfFloat         652ns × (0.98,1.02)   594ns × (0.97,1.05)   -8.97% (p=0.000)
      FmtManyArgs            2.13µs × (0.99,1.02)  1.94µs × (0.99,1.01)   -8.92% (p=0.000)
      GobDecode              17.1ms × (0.99,1.02)  14.9ms × (0.98,1.03)  -13.07% (p=0.000)
      GobEncode              13.5ms × (0.98,1.03)  11.5ms × (0.98,1.03)  -15.25% (p=0.000)
      Gzip                    656ms × (0.99,1.02)   647ms × (0.99,1.01)   -1.29% (p=0.000)
      Gunzip                  143ms × (0.99,1.02)   144ms × (0.99,1.01)     ~    (p=0.204)
      HTTPClientServer       88.2µs × (0.98,1.02)  90.8µs × (0.98,1.01)   +2.93% (p=0.000)
      JSONEncode             32.2ms × (0.98,1.02)  30.9ms × (0.97,1.04)   -4.06% (p=0.001)
      JSONDecode              121ms × (0.98,1.02)   110ms × (0.98,1.05)   -8.95% (p=0.000)
      Mandelbrot200          6.06ms × (0.99,1.01)  6.11ms × (0.98,1.04)     ~    (p=0.184)
      GoParse                6.76ms × (0.97,1.04)  6.58ms × (0.98,1.05)   -2.63% (p=0.003)
      RegexpMatchEasy0_32     195ns × (1.00,1.01)   155ns × (0.99,1.01)  -20.43% (p=0.000)
      RegexpMatchEasy0_1K     479ns × (0.98,1.03)   535ns × (0.99,1.02)  +11.59% (p=0.000)
      RegexpMatchEasy1_32     169ns × (0.99,1.02)   131ns × (0.99,1.03)  -22.44% (p=0.000)
      RegexpMatchEasy1_1K    1.53µs × (0.99,1.01)  0.87µs × (0.99,1.02)  -43.07% (p=0.000)
      RegexpMatchMedium_32    334ns × (0.99,1.01)   242ns × (0.99,1.01)  -27.53% (p=0.000)
      RegexpMatchMedium_1K    125µs × (1.00,1.01)    72µs × (0.99,1.03)  -42.53% (p=0.000)
      RegexpMatchHard_32     6.03µs × (0.99,1.01)  3.79µs × (0.99,1.01)  -37.12% (p=0.000)
      RegexpMatchHard_1K      189µs × (0.99,1.02)   115µs × (0.99,1.01)  -39.20% (p=0.000)
      Revcomp                 935ms × (0.96,1.03)   926ms × (0.98,1.02)     ~    (p=0.083)
      Template                146ms × (0.97,1.05)   119ms × (0.99,1.01)  -18.37% (p=0.000)
      TimeParse               660ns × (0.99,1.01)   624ns × (0.99,1.02)   -5.43% (p=0.000)
      TimeFormat              670ns × (0.98,1.02)   710ns × (1.00,1.01)   +5.97% (p=0.000)
      
      This CL is a bit larger than I would like, but the compiler, linker, runtime,
      and package reflect all need to be in sync about the format of these programs,
      so there is no easy way to split this into independent changes (at least
      while keeping the build working at each change).
      
      Fixes #9625.
      Fixes #10524.
      
      Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a
      Reviewed-on: https://go-review.googlesource.com/9888Reviewed-by: 's avatarAustin Clements <austin@google.com>
      Run-TryBot: Russ Cox <rsc@golang.org>
      512f75e8
    • Didier Spezia's avatar
      text/template: fix race condition on function maps · ebe733cb
      Didier Spezia authored
      The Template objects are supposed to be goroutine-safe once they
      have been parsed. This includes the text and html ones.
      
      For html/template, the escape mechanism is triggered at execution
      time. It may alter the internal structures of the template, so
      a mutex protects them against concurrent accesses.
      
      The text/template package is free of any synchronization primitive.
      
      A race condition may occur when nested templates are escaped:
      the escape algorithm alters the function maps of the associated
      text templates, while a concurrent template execution may access
      the function maps in read mode.
      
      The less invasive fix I have found is to introduce a RWMutex in
      text/template to protect the function maps. This is unfortunate
      but it should be effective.
      
      Fixes #9945
      
      Change-Id: I1edb73c0ed0f1fcddd2f1516230b548b92ab1269
      Reviewed-on: https://go-review.googlesource.com/10101Reviewed-by: 's avatarRob Pike <r@golang.org>
      ebe733cb
  5. 15 May, 2015 8 commits