1. 28 Apr, 2017 16 commits
  2. 27 Apr, 2017 19 commits
    • cmd/compile: minor writebarrier cleanup · 12c286c1
      Josh Bleecher Snyder authored
      This CL mainly moves some work to the switch on w.Op,
      to make a follow-up change simpler and clearer.
      
      Updates #19838
      
      Change-Id: I86f3181c380dd60960afcc24224f655276b8956c
      Reviewed-on: https://go-review.googlesource.com/42010
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile: move Used from gc.Node to gc.Name · fc08a19c
      Josh Bleecher Snyder authored
      Node.Used was written to from the backend
      concurrently with reads of Node.Class
      for the same ONAME Nodes.
      I do not know why it was not failing consistently
      under the race detector, but it is a race.
      
      This is likely also a problem with Node.HasVal and Node.HasOpt.
      They will be handled in a separate CL.
      
      Fix Used by moving it to gc.Name and making it a separate bool.
      There was one non-Name use of Used, marking OLABELs as used.
      That is no longer needed, now that goto and label checking
      happens early in the front end.
      
      Leave the getters and setters in place,
      to ease changing the representation in the future
      (or changing to an interface!).
      
      Updates #20144
      
      Change-Id: I9bec7c6d33dcb129a4cfa9d338462ea33087f9f7
      Reviewed-on: https://go-review.googlesource.com/42015
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • cmd/compile: add Type.MustSize and Type.MustAlignment · 94d540a4
      Josh Bleecher Snyder authored
      Type.Size and Type.Alignment are for the front end:
      They calculate size and alignment if needed.
      
      Type.MustSize and Type.MustAlignment are for the back end:
      They call Fatal if size and alignment are not already calculated.
      
      Most uses are of MustSize and MustAlignment,
      but that's because the back end is newer,
      and this API was added to support it.
      
      This CL was mostly generated with sed and selective reversion.
      The only mildly interesting bit is the change of the ssa.Type interface
      and the supporting ssa dummy types.
      
      Follow-up to review feedback on CL 41970.
      
      Passes toolstash-check.
      
      Change-Id: I0d9b9505e57453dae8fb6a236a07a7a02abd459e
      Reviewed-on: https://go-review.googlesource.com/42016
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: dowidth more in the front end · 0b6a10ef
      Josh Bleecher Snyder authored
      dowidth is fundamentally unsafe to call from the back end;
      it will cause data races.
      
      Replace all calls to dowidth in the backend with
      assertions that the width has been calculated.
      
      Then fix all the cases in which that was not so,
      including the cases from #20145.
      
      Fixes #20145.
      
      Change-Id: Idba3d19d75638851a30ec2ebcdb703c19da3e92b
      Reviewed-on: https://go-review.googlesource.com/41970
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • cmd/internal/objabi, cmd/link: move linker-only symkind values into linker · be2ee2a4
      Michael Hudson-Doyle authored
      Many (most!) of the values of objabi.SymKind are used only in the linker, so
      this creates a separate cmd/link/internal/ld.SymKind type, removes most values
      from SymKind and maps one to the other when reading object files in the linker.
      
      Two of the remaining objabi.SymKind values are only checked for, never set, and
      so will never actually be found, but I wanted to keep this to the most
      mechanical change possible.
      
      Change-Id: I4bbc5aed6713cab3e8de732e6e288eb77be0474c
      Reviewed-on: https://go-review.googlesource.com/40985
      Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • dwarf: add marker for embedded fields in dwarf · b1868cf1
      Hana Kim authored
      Currently, the following two programs generate identical DWARF info
      for type Foo.
      
      prog 1)
      type Foo struct {
         Bar
      }
      
      prog 2)
      type Foo struct {
         Bar Bar
      }
      
      This change adds a go-specific attribute DW_AT_go_embedded_field
      to annotate each member entry. Its absence or false value indicates
      the corresponding member is not an embedded field.
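
To make the distinction concrete at the language level, the sketch below uses the reflect package (not DWARF) to show that Go itself tells the two declarations apart. The type names `Embedded` and `Named` and the helper `isEmbedded` are made up for this example.

```go
package main

import (
	"fmt"
	"reflect"
)

type Bar struct{ X int }

// Embedded corresponds to prog 1: Bar is an embedded field.
type Embedded struct {
	Bar
}

// Named corresponds to prog 2: Bar is an ordinary field that happens
// to share its name with its type.
type Named struct {
	Bar Bar
}

// isEmbedded reports whether the first field of struct value v is embedded.
func isEmbedded(v interface{}) bool {
	return reflect.TypeOf(v).Field(0).Anonymous
}

func main() {
	fmt.Println(isEmbedded(Embedded{}), isEmbedded(Named{})) // true false
}
```

The new DWARF attribute is meant to carry the same bit of information to debuggers that `StructField.Anonymous` carries to reflection users.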
      
      Update #20037
      
      Change-Id: Ibcbd2714f3e4d97c7b523d7398f29ab2301cc897
      Reviewed-on: https://go-review.googlesource.com/41873
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: randomize compilation order when race-enabled · f5c878e0
      Josh Bleecher Snyder authored
      There's been one failure on the race builder so far,
      before we started sorting functions by length.
      
      The race detector can only detect actual races,
      and ordering functions by length might reduce the odds
      of catching some kinds of races. Give it more to chew on.
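
A minimal sketch of the idea, not the compiler's actual code: the function `orderForCompile`, its `raceEnabled` parameter, the explicit seed, and the `sameElements` helper are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// orderForCompile shuffles fns in place when raceEnabled is set, so that
// repeated race-detector runs exercise different interleavings.
// Otherwise it leaves the order untouched.
func orderForCompile(fns []string, raceEnabled bool, seed int64) {
	if !raceEnabled {
		return
	}
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(fns), func(i, j int) { fns[i], fns[j] = fns[j], fns[i] })
}

// sameElements reports whether a and b hold the same strings, ignoring order.
func sameElements(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[string]int)
	for _, s := range a {
		counts[s]++
	}
	for _, s := range b {
		counts[s]--
		if counts[s] < 0 {
			return false
		}
	}
	return true
}

func main() {
	fns := []string{"main", "init", "parse", "emit", "walk"}
	orderForCompile(fns, true, 1)
	fmt.Println(fns) // some permutation of the five names
}
```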
      
      Updates #20144
      
      Change-Id: I0206ac182cb98b70a729dea9703ecb0fef54d2d0
      Reviewed-on: https://go-review.googlesource.com/41973
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: move nodarg to walk.go · 26e126d6
      Josh Bleecher Snyder authored
      Its sole use is in walk.go. 100% code movement.
      
      gsubr.go increasingly contains backend-y things.
      With a few more relocations, it could probably be
      fruitfully renamed progs.go.
      
      Change-Id: I61ec5c2bc1f8cfdda64c6d6f580952c154ff60e0
      Reviewed-on: https://go-review.googlesource.com/41972
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: move addrescapes and moveToHeap to esc.go · fcee3777
      Josh Bleecher Snyder authored
      They were used only in esc.go. 100% code movement.
      
      Also, remove the rather outdated comment at the top of gen.go.
      It's not really clear what gen.go is for any more.
      
      Change-Id: Iaedfe7015ef6f5c11c49f3e6721b15d779a00faa
      Reviewed-on: https://go-review.googlesource.com/41971
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/internal/obj: ARM, use immediates instead of constant pool entries · 14f3ca56
      Keith Randall authored
      When a constant doesn't fit in a single instruction, use two
      paired instructions instead of the constant pool.  For example
      
        ADD $0xaa00bb, R0, R1
      
      Used to rewrite to:
      
        MOV ?(IP), R11
        ADD R11, R0, R1
      
      Instead, do:
      
        ADD $0xaa0000, R0, R1
        ADD $0xbb, R1, R1
      
      Same number of instructions.
      Good:
        4 fewer bytes (no constant pool entry)
        One fewer load.
      Bad:
        Critical path is one instruction longer.
      
      It's probably worth it to avoid the loads; they are expensive.
      
      Dave Cheney got us some performance numbers: https://perf.golang.org/search?q=upload:20170426.1
      TL;DR mean 1.37% improvement.
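
The constant split in the example above can be sketched as follows. This is not the assembler's algorithm; it only relies on the ARM rule that a data-processing immediate is an 8-bit value rotated right by an even number of bits. The function names `encodable` and `split` are invented for this illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// encodable reports whether v is an 8-bit value rotated right by an
// even amount, i.e. whether it fits in a single ARM immediate operand.
func encodable(v uint32) bool {
	for r := 0; r < 32; r += 2 {
		if bits.RotateLeft32(v, r) <= 0xff {
			return true
		}
	}
	return false
}

// split tries to write v as first|second where both halves are encodable
// and share no bits, so that two ADDs (or ORs) can rebuild the constant.
func split(v uint32) (first, second uint32, ok bool) {
	for r := 0; r < 32; r += 2 {
		mask := bits.RotateLeft32(0xff, -r) // 8-bit window at rotation r
		second = v & mask                   // encodable by construction
		first = v &^ mask                   // the remaining bits
		if first != 0 && second != 0 && encodable(first) {
			return first, second, true
		}
	}
	return 0, 0, false
}

func main() {
	first, second, ok := split(0xaa00bb)
	fmt.Printf("%#x %#x %v\n", first, second, ok) // 0xaa0000 0xbb true
}
```

For 0xaa00bb the sketch yields the same pair of immediates as the two ADD instructions shown in the commit message.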
      
      Change-Id: Ib206836161fdc94a3962db6f9caa635c87d57cf1
      Reviewed-on: https://go-review.googlesource.com/41612
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • encoding/gob: replace RWMutex usage with sync.Map · c120e449
      Bryan C. Mills authored
      This provides a significant speedup for encoding and decoding when
      using many CPU cores.
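
The general pattern can be sketched as below. It is a simplified illustration, not the encoding/gob code: the `info` struct, the `cache` variable, and the `lookup` and `lookupValue` functions are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// info is a hypothetical per-type record that is expensive to build.
type info struct {
	name string
}

// cache maps reflect.Type to *info. With sync.Map, readers that hit an
// existing entry do not contend on a shared lock.
var cache sync.Map

// lookup returns the cached info for t, building it on first use.
func lookup(t reflect.Type) *info {
	if v, ok := cache.Load(t); ok {
		return v.(*info) // fast path: no lock shared with other readers
	}
	// Slow path: several goroutines may build concurrently, but
	// LoadOrStore guarantees that they all end up with the same entry.
	v, _ := cache.LoadOrStore(t, &info{name: t.String()})
	return v.(*info)
}

// lookupValue is a convenience wrapper that takes a value instead of a type.
func lookupValue(v interface{}) *info {
	return lookup(reflect.TypeOf(v))
}

func main() {
	a := lookupValue(42)
	b := lookupValue(7)
	fmt.Println(a.name, a == b) // int true
}
```

With an RWMutex, every reader still writes to the lock's shared state, which is the likely source of the multi-core contention that the benchmarks below measure.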
      
      name                        old time/op  new time/op  delta
      EndToEndPipe                5.26µs ± 2%  5.38µs ± 7%     ~     (p=0.121 n=8+7)
      EndToEndPipe-6              1.86µs ± 5%  1.80µs ±11%     ~     (p=0.442 n=8+8)
      EndToEndPipe-48             1.39µs ± 2%  1.41µs ± 4%     ~     (p=0.645 n=8+8)
      EndToEndByteBuffer          1.54µs ± 5%  1.57µs ± 5%     ~     (p=0.130 n=8+8)
      EndToEndByteBuffer-6         620ns ± 6%   310ns ± 8%  -50.04%  (p=0.000 n=8+8)
      EndToEndByteBuffer-48        506ns ± 4%   110ns ± 3%  -78.22%  (p=0.000 n=8+8)
      EndToEndSliceByteBuffer      149µs ± 3%   153µs ± 5%   +2.80%  (p=0.021 n=8+8)
      EndToEndSliceByteBuffer-6    103µs ±17%    31µs ±12%  -70.06%  (p=0.000 n=8+8)
      EndToEndSliceByteBuffer-48  93.2µs ± 2%  18.0µs ± 5%  -80.66%  (p=0.000 n=7+8)
      EncodeComplex128Slice       20.6µs ± 5%  20.9µs ± 8%     ~     (p=0.959 n=8+8)
      EncodeComplex128Slice-6     4.10µs ±10%  3.75µs ± 8%   -8.58%  (p=0.004 n=8+7)
      EncodeComplex128Slice-48    1.14µs ± 2%  0.81µs ± 2%  -28.98%  (p=0.000 n=8+8)
      EncodeFloat64Slice          10.2µs ± 7%  10.1µs ± 6%     ~     (p=0.694 n=7+8)
      EncodeFloat64Slice-6        2.01µs ± 6%  1.80µs ±11%  -10.30%  (p=0.004 n=8+8)
      EncodeFloat64Slice-48        701ns ± 3%   408ns ± 2%  -41.72%  (p=0.000 n=8+8)
      EncodeInt32Slice            11.8µs ± 7%  11.7µs ± 6%     ~     (p=0.463 n=8+7)
      EncodeInt32Slice-6          2.32µs ± 4%  2.06µs ± 5%  -10.89%  (p=0.000 n=8+8)
      EncodeInt32Slice-48          731ns ± 2%   445ns ± 2%  -39.10%  (p=0.000 n=7+8)
      EncodeStringSlice           9.13µs ± 9%  9.18µs ± 8%     ~     (p=0.798 n=8+8)
      EncodeStringSlice-6         1.91µs ± 5%  1.70µs ± 5%  -11.07%  (p=0.000 n=8+8)
      EncodeStringSlice-48         679ns ± 3%   397ns ± 3%  -41.50%  (p=0.000 n=8+8)
      EncodeInterfaceSlice         449µs ±11%   461µs ± 9%     ~     (p=0.328 n=8+8)
      EncodeInterfaceSlice-6       503µs ± 7%    88µs ± 7%  -82.51%  (p=0.000 n=7+8)
      EncodeInterfaceSlice-48      335µs ± 8%    22µs ± 1%  -93.55%  (p=0.000 n=8+7)
      DecodeComplex128Slice       67.2µs ± 4%  67.0µs ± 6%     ~     (p=0.721 n=8+8)
      DecodeComplex128Slice-6     22.0µs ± 8%  18.9µs ± 5%  -14.44%  (p=0.000 n=8+8)
      DecodeComplex128Slice-48    46.8µs ± 3%  34.9µs ± 3%  -25.48%  (p=0.000 n=8+8)
      DecodeFloat64Slice          39.4µs ± 4%  40.3µs ± 3%     ~     (p=0.105 n=8+8)
      DecodeFloat64Slice-6        16.1µs ± 2%  11.2µs ± 7%  -30.64%  (p=0.001 n=6+7)
      DecodeFloat64Slice-48       38.1µs ± 3%  24.0µs ± 7%  -37.10%  (p=0.000 n=8+8)
      DecodeInt32Slice            39.1µs ± 4%  40.1µs ± 5%     ~     (p=0.083 n=8+8)
      DecodeInt32Slice-6          16.3µs ±21%  10.6µs ± 1%  -35.17%  (p=0.000 n=8+7)
      DecodeInt32Slice-48         36.5µs ± 6%  21.9µs ± 9%  -39.89%  (p=0.000 n=8+8)
      DecodeStringSlice           82.9µs ± 6%  85.5µs ± 5%     ~     (p=0.121 n=8+7)
      DecodeStringSlice-6         32.4µs ±11%  26.8µs ±16%  -17.37%  (p=0.000 n=8+8)
      DecodeStringSlice-48        76.0µs ± 2%  57.0µs ± 5%  -25.02%  (p=0.000 n=8+8)
      DecodeInterfaceSlice         718µs ± 4%   752µs ± 5%   +4.83%  (p=0.038 n=8+8)
      DecodeInterfaceSlice-6       500µs ± 6%   165µs ± 7%  -66.95%  (p=0.000 n=7+8)
      DecodeInterfaceSlice-48      470µs ± 5%   120µs ± 6%  -74.55%  (p=0.000 n=8+7)
      DecodeMap                   3.29ms ± 5%  3.34ms ± 5%     ~     (p=0.279 n=8+8)
      DecodeMap-6                 7.73ms ± 8%  7.53ms ±18%     ~     (p=0.779 n=7+8)
      DecodeMap-48                7.46ms ± 6%  7.71ms ± 3%     ~     (p=0.161 n=8+8)
      
      https://perf.golang.org/search?q=upload:20170426.4
      
      Change-Id: I335874028ef8d7c991051004f8caadd16c92d5cc
      Reviewed-on: https://go-review.googlesource.com/41872
      Run-TryBot: Bryan Mills <bcmills@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • reflect: use sync.Map instead of RWMutex for type caches · 33b92cd6
      Bryan C. Mills authored
      This provides a significant speedup when using reflection-heavy code
      on many CPU cores, such as when marshaling or unmarshaling protocol
      buffers.
      
      updates #17973
      updates #18177
      
      name                       old time/op    new time/op     delta
      Call                          239ns ±10%      245ns ± 7%       ~     (p=0.562 n=10+9)
      Call-6                        201ns ±38%       48ns ±29%    -76.39%  (p=0.000 n=10+9)
      Call-48                       133ns ± 8%       12ns ± 2%    -90.92%  (p=0.000 n=10+8)
      CallArgCopy/size=128          169ns ±12%      197ns ± 2%    +16.35%  (p=0.000 n=10+7)
      CallArgCopy/size=128-6        142ns ± 9%       34ns ± 7%    -76.10%  (p=0.000 n=10+9)
      CallArgCopy/size=128-48       125ns ± 3%        9ns ± 7%    -93.01%  (p=0.000 n=8+8)
      CallArgCopy/size=256          177ns ± 8%      197ns ± 5%    +11.24%  (p=0.000 n=10+9)
      CallArgCopy/size=256-6        148ns ±11%       35ns ± 6%    -76.23%  (p=0.000 n=10+9)
      CallArgCopy/size=256-48       127ns ± 4%        9ns ± 9%    -92.66%  (p=0.000 n=10+9)
      CallArgCopy/size=1024         196ns ± 6%      228ns ± 7%    +16.09%  (p=0.000 n=10+9)
      CallArgCopy/size=1024-6       143ns ± 6%       42ns ± 5%    -70.39%  (p=0.000 n=8+8)
      CallArgCopy/size=1024-48      130ns ± 7%       10ns ± 1%    -91.99%  (p=0.000 n=10+8)
      CallArgCopy/size=4096         330ns ± 9%      351ns ± 5%     +6.20%  (p=0.004 n=10+9)
      CallArgCopy/size=4096-6       173ns ±14%       62ns ± 6%    -63.83%  (p=0.000 n=10+8)
      CallArgCopy/size=4096-48      141ns ± 6%       15ns ± 6%    -89.59%  (p=0.000 n=10+8)
      CallArgCopy/size=65536       7.71µs ±10%     7.74µs ±10%       ~     (p=0.859 n=10+9)
      CallArgCopy/size=65536-6     1.33µs ± 4%     1.34µs ± 6%       ~     (p=0.720 n=10+9)
      CallArgCopy/size=65536-48     347ns ± 2%      344ns ± 2%       ~     (p=0.202 n=10+9)
      PtrTo                        30.2ns ±10%     41.3ns ±11%    +36.97%  (p=0.000 n=10+9)
      PtrTo-6                       126ns ± 6%        7ns ±10%    -94.47%  (p=0.000 n=9+9)
      PtrTo-48                     86.9ns ± 9%      1.7ns ± 9%    -98.08%  (p=0.000 n=10+9)
      FieldByName1                 86.6ns ± 5%     87.3ns ± 7%       ~     (p=0.737 n=10+9)
      FieldByName1-6               19.8ns ±10%     18.7ns ±10%       ~     (p=0.073 n=9+9)
      FieldByName1-48              7.54ns ± 4%     7.74ns ± 5%     +2.55%  (p=0.023 n=9+9)
      FieldByName2                 1.63µs ± 8%     1.70µs ± 4%     +4.13%  (p=0.020 n=9+9)
      FieldByName2-6                481ns ± 6%      490ns ±10%       ~     (p=0.474 n=9+9)
      FieldByName2-48               723ns ± 3%      736ns ± 2%     +1.76%  (p=0.045 n=8+8)
      FieldByName3                 10.5µs ± 7%     10.8µs ± 7%       ~     (p=0.234 n=8+8)
      FieldByName3-6               2.78µs ± 3%     2.94µs ±10%     +5.87%  (p=0.031 n=9+9)
      FieldByName3-48              3.72µs ± 2%     3.91µs ± 5%     +4.91%  (p=0.003 n=9+9)
      InterfaceBig                 10.8ns ± 5%     10.7ns ± 5%       ~     (p=0.849 n=9+9)
      InterfaceBig-6               9.62ns ±81%     1.79ns ± 4%    -81.38%  (p=0.003 n=9+9)
      InterfaceBig-48              0.48ns ±34%     0.50ns ± 7%       ~     (p=0.071 n=8+9)
      InterfaceSmall               10.7ns ± 5%     10.9ns ± 4%       ~     (p=0.243 n=9+9)
      InterfaceSmall-6             1.85ns ± 5%     1.79ns ± 1%     -2.97%  (p=0.006 n=7+8)
      InterfaceSmall-48            0.49ns ±20%     0.48ns ± 5%       ~     (p=0.740 n=7+9)
      New                          28.2ns ±20%     26.6ns ± 3%       ~     (p=0.617 n=9+9)
      New-6                        4.69ns ± 4%     4.44ns ± 3%     -5.33%  (p=0.001 n=9+9)
      New-48                       1.10ns ± 9%     1.08ns ± 6%       ~     (p=0.285 n=9+8)
      
      name                       old alloc/op   new alloc/op    delta
      Call                          0.00B           0.00B            ~     (all equal)
      Call-6                        0.00B           0.00B            ~     (all equal)
      Call-48                       0.00B           0.00B            ~     (all equal)
      
      name                       old allocs/op  new allocs/op   delta
      Call                           0.00            0.00            ~     (all equal)
      Call-6                         0.00            0.00            ~     (all equal)
      Call-48                        0.00            0.00            ~     (all equal)
      
      name                       old speed      new speed       delta
      CallArgCopy/size=128        757MB/s ±11%    649MB/s ± 1%    -14.33%  (p=0.000 n=10+7)
      CallArgCopy/size=128-6      901MB/s ± 9%   3781MB/s ± 7%   +319.69%  (p=0.000 n=10+9)
      CallArgCopy/size=128-48    1.02GB/s ± 2%  14.63GB/s ± 6%  +1337.98%  (p=0.000 n=8+8)
      CallArgCopy/size=256       1.45GB/s ± 9%   1.30GB/s ± 5%    -10.17%  (p=0.000 n=10+9)
      CallArgCopy/size=256-6     1.73GB/s ±11%   7.28GB/s ± 7%   +320.76%  (p=0.000 n=10+9)
      CallArgCopy/size=256-48    2.00GB/s ± 4%  27.46GB/s ± 9%  +1270.85%  (p=0.000 n=10+9)
      CallArgCopy/size=1024      5.21GB/s ± 6%   4.49GB/s ± 8%    -13.74%  (p=0.000 n=10+9)
      CallArgCopy/size=1024-6    7.18GB/s ± 7%  24.17GB/s ± 5%   +236.64%  (p=0.000 n=9+8)
      CallArgCopy/size=1024-48   7.87GB/s ± 7%  98.43GB/s ± 1%  +1150.99%  (p=0.000 n=10+8)
      CallArgCopy/size=4096      12.3GB/s ± 6%   11.7GB/s ± 5%     -5.00%  (p=0.008 n=9+9)
      CallArgCopy/size=4096-6    23.8GB/s ±16%   65.6GB/s ± 5%   +175.02%  (p=0.000 n=10+8)
      CallArgCopy/size=4096-48   29.0GB/s ± 7%  279.6GB/s ± 6%   +862.87%  (p=0.000 n=10+8)
      CallArgCopy/size=65536     8.52GB/s ±11%   8.49GB/s ± 9%       ~     (p=0.842 n=10+9)
      CallArgCopy/size=65536-6   49.3GB/s ± 4%   49.0GB/s ± 6%       ~     (p=0.720 n=10+9)
      CallArgCopy/size=65536-48   189GB/s ± 2%    190GB/s ± 2%       ~     (p=0.211 n=10+9)
      
      https://perf.golang.org/search?q=upload:20170426.3
      
      Change-Id: Iff68f18ef69defb7f30962e21736ac7685a48a27
      Reviewed-on: https://go-review.googlesource.com/41871
      Run-TryBot: Bryan Mills <bcmills@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • misc/ios: increase iOS test harness timeout · 6e54fe47
      Elias Naur authored
      The "lldb start" phase often times out on the iOS builder. Increase
      the timeout and see if that helps.
      
      Change-Id: I92fd67cbfa90659600e713198d6b2c5c78dde20f
      Reviewed-on: https://go-review.googlesource.com/41863
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Elias Naur <elias.naur@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • net/http: close resp.Body when error occurred during redirection · e51e0f9c
      Weichao Tang authored
      Fixes #19976
      
      Change-Id: I48486467066784a9dcc24357ec94a1be85265a6f
      Reviewed-on: https://go-review.googlesource.com/40940
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/internal/obj/arm64: fix encoding of condition · 2b6c58f6
      Wei Xiao authored
      The current code treats the condition as a special register and writes
      its raw data directly into the instruction.
      
      The fix converts the raw data into the correct condition encoding.
      Also fix the operand category of FCCMP.
      
      Add tests to cover all cases.
      
      Change-Id: Ib194041bd9017dd0edbc241564fe983082ac616b
      Reviewed-on: https://go-review.googlesource.com/41511
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • os: use kernel limit on pipe size if possible · 220e0e0f
      Ian Lance Taylor authored
      Fixes #20134
      
      Change-Id: I92699d118c713179961c037a6bbbcbec4efa63ba
      Reviewed-on: https://go-review.googlesource.com/41823
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • image/jpeg: fix extended sequential Huffman table selector (Th). · 35cbc3b5
      Nigel Tao authored
      Previously, the package did not distinguish between baseline and
      extended sequential images. Both are non-progressive images, but the Th
      range differs between the two, as per Annex B of
      https://www.w3.org/Graphics/JPEG/itu-t81.pdf
      
      Extended sequential images are often emitted by the Guetzli encoder.
      
      Fixes #19913
      
      Change-Id: I3d0f9e16d5d374ee1c65e3a8fb87519de61cff94
      Reviewed-on: https://go-review.googlesource.com/41831
      Reviewed-by: David Symonds <dsymonds@golang.org>
    • cmd/compile: compile more complex functions first · 6664ccb4
      Josh Bleecher Snyder authored
      When using a concurrent backend,
      the overall compilation time is bounded
      in part by the slowest function to compile.
      The number of top-level statements in a function
      is an easily calculated and fairly reliable
      proxy for compilation time.
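
The ordering heuristic can be sketched as below. The `fn` struct, its `stmts` field, and `sortForCompile` are names chosen for this example and are not the compiler's own.

```go
package main

import (
	"fmt"
	"sort"
)

// fn is a hypothetical record for a function awaiting compilation.
type fn struct {
	name  string
	stmts int // number of top-level statements, the proxy for compile time
}

// sortForCompile puts the functions with the most top-level statements
// first, so that the slowest ones start as early as possible.
func sortForCompile(fns []fn) {
	sort.SliceStable(fns, func(i, j int) bool {
		return fns[i].stmts > fns[j].stmts
	})
}

func main() {
	fns := []fn{{"small", 2}, {"huge", 400}, {"medium", 30}}
	sortForCompile(fns)
	fmt.Println(fns) // [{huge 400} {medium 30} {small 2}]
}
```

Starting the longest jobs first is a common scheduling heuristic: it keeps a single slow function from being picked up last and extending the tail of the build.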
      
      Here's a standard compilecmp output for -c=8 with this CL:
      
      name       old time/op       new time/op       delta
      Template         127ms ± 4%        125ms ± 6%   -1.33%  (p=0.000 n=47+50)
      Unicode         84.8ms ± 4%       84.5ms ± 4%     ~     (p=0.217 n=49+49)
      GoTypes          289ms ± 3%        287ms ± 3%   -0.78%  (p=0.002 n=48+50)
      Compiler         1.36s ± 3%        1.34s ± 2%   -1.29%  (p=0.000 n=49+47)
      SSA              2.95s ± 3%        2.77s ± 4%   -6.23%  (p=0.000 n=50+49)
      Flate           70.7ms ± 3%       70.9ms ± 2%     ~     (p=0.112 n=50+49)
      GoParser        85.0ms ± 3%       83.0ms ± 4%   -2.31%  (p=0.000 n=48+49)
      Reflect          229ms ± 3%        225ms ± 4%   -1.83%  (p=0.000 n=49+49)
      Tar             70.2ms ± 3%       69.4ms ± 3%   -1.17%  (p=0.000 n=49+49)
      XML              115ms ± 7%        114ms ± 6%     ~     (p=0.158 n=49+47)
      
      name       old user-time/op  new user-time/op  delta
      Template         352ms ± 5%        342ms ± 8%   -2.74%  (p=0.000 n=49+50)
      Unicode          117ms ± 5%        118ms ± 4%   +0.88%  (p=0.005 n=46+48)
      GoTypes          986ms ± 3%        980ms ± 4%     ~     (p=0.110 n=46+48)
      Compiler         4.39s ± 2%        4.43s ± 4%   +0.97%  (p=0.002 n=50+50)
      SSA              12.0s ± 2%        13.3s ± 3%  +11.33%  (p=0.000 n=49+49)
      Flate            222ms ± 5%        219ms ± 6%   -1.56%  (p=0.002 n=50+50)
      GoParser         271ms ± 5%        268ms ± 4%   -0.83%  (p=0.036 n=49+48)
      Reflect          560ms ± 4%        571ms ± 3%   +1.90%  (p=0.000 n=50+49)
      Tar              183ms ± 3%        183ms ± 3%     ~     (p=0.903 n=45+50)
      XML              364ms ±13%        391ms ± 4%   +7.16%  (p=0.000 n=50+40)
      
      A more interesting way of viewing the data is by
      looking at the ratio of the time taken to compile
      the slowest-to-compile function to the overall
      time spent compiling functions.
      
      If this ratio is small (near 0), then increased concurrency might help.
      If this ratio is big (near 1), then we're bounded by that single function.
      
      I instrumented the compiler to emit this ratio per-package,
      ran 'go build -a -gcflags=-c=C -p=P std cmd' three times,
      for varying values of C and P,
      and collected the ratios encountered into an ASCII histogram.
      
      Here's c=1 p=1, which is a non-concurrent backend, single process at a time:
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|
       20%|**
       10%|***
        0%|*********
      ----+----------
          |0123456789
      
      The x-axis is floor(10*ratio), so the first column indicates the percent of
      ratios that fell in the 0% to 9.9999% range.
      We can see in this histogram that more concurrency will help;
      in most cases, the ratio is small.
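
For reference, the ratio and its histogram bucket can be computed as in the sketch below. The function names are invented here, times are plain float64 seconds for simplicity, and clamping a ratio of exactly 1 into the last column is an assumption about how the original instrumentation handled that case.

```go
package main

import (
	"fmt"
	"math"
)

// ratio returns the compile time of the slowest function divided by the
// total time spent compiling all functions in a package.
func ratio(times []float64) float64 {
	var slowest, total float64
	for _, t := range times {
		total += t
		if t > slowest {
			slowest = t
		}
	}
	if total == 0 {
		return 0
	}
	return slowest / total
}

// bucket maps a ratio to its histogram column, floor(10*ratio).
// A ratio of exactly 1 is clamped into column 9 (an assumption).
func bucket(r float64) int {
	b := int(math.Floor(10 * r))
	if b > 9 {
		b = 9
	}
	return b
}

func main() {
	r := ratio([]float64{1, 1, 2})
	fmt.Println(r, bucket(r)) // 0.5 5
}
```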
      
      Here's c=8 p=1, before this CL:
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|         *
       20%|         *
       10%|*   *    *
        0%|**********
      ----+----------
          |0123456789
      
      In 30-40% of cases, we're mostly bound by the compilation time
      of a single function.
      
      Here's c=8 p=1, after this CL:
      
       90%|
       80%|
       70%|
       60%|
       50%|         *
       40%|         *
       30%|         *
       20%|         *
       10%|         *
        0%|**********
      ----+----------
          |0123456789
      
      The sorting pays off; we are bound by the
      compilation time of a single function in over half of packages.
      The single * in the histogram indicates 0-10%.
      The actual values for this chart are:
      0: 5%, 1: 1%, 2: 1%, 3: 4%, 4: 5%, 5: 7%, 6: 7%, 7: 7%, 8: 9%, 9: 55%
      
      This indicates that efforts to increase or enable more concurrency,
      e.g. by optimizing mutexes or increasing the value of c,
      will probably not yield fruit.
      That matches what compilecmp tells us.
      
      Further optimization efforts should thus focus instead on one of:
      
      (1) making more functions compile concurrently
      (2) improving the compilation time of the slowest functions
      (3) speeding up the remaining serial parts of the compiler
      (4) automatically splitting up some large autogenerated functions
          into small ones, as discussed in #19751
      
      I hope to spend more time on (1) before the freeze.
      
      Adding process parallelism doesn't change the story much.
      For example, here's c=8 p=8, after this CL:
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|         *
       30%|         *
       20%|         *
       10%|       ***
        0%|**********
      ----+----------
          |0123456789
      
      Since we don't need to worry much about p,
      these histograms can help us select a good
      general value of c to use as a default,
      assuming we're not bounded by GOMAXPROCS.
      
      Here are some charts after this CL, for c from 1 to 8:
      
      c=1 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|
       20%|**
       10%|***
        0%|*********
      ----+----------
          |0123456789
      
      c=2 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|
       20%|
       10%| ****    *
        0%|**********
      ----+----------
          |0123456789
      
      c=3 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|
       20%|         *
       10%|  ** *   *
        0%|**********
      ----+----------
          |0123456789
      
      c=4 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|         *
       20%|         *
       10%|     *   *
        0%|**********
      ----+----------
          |0123456789
      
      c=5 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|         *
       20%|         *
       10%|     *   *
        0%|**********
      ----+----------
          |0123456789
      
      c=6 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|         *
       30%|         *
       20%|         *
       10%|         *
        0%|**********
      ----+----------
          |0123456789
      
      c=7 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|         *
       40%|         *
       30%|         *
       20%|         *
       10%|        **
        0%|**********
      ----+----------
          |0123456789
      
      c=8 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|         *
       40%|         *
       30%|         *
       20%|         *
       10%|         *
        0%|**********
      ----+----------
          |0123456789
      
      Given the increased user-CPU costs as
      c increases, it looks like c=4 is probably
      the sweet spot, at least for now.
      
      Pleasingly, this matches (and explains)
      the results of the standard benchmarking
      that I have done.
      
      Updates #15756
      
      Change-Id: I82b606c06efd34a5dbd1afdbcf66a605905b2aeb
      Reviewed-on: https://go-review.googlesource.com/41192
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Robert Griesemer <gri@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: add initial backend concurrency support · 756b9ce3
      Josh Bleecher Snyder authored
      This CL adds initial support for concurrent backend compilation.
      
      BACKGROUND
      
      The compiler currently consists (very roughly) of the following phases:
      
      1. Initialization.
      2. Lexing and parsing into the cmd/compile/internal/syntax AST.
      3. Translation into the cmd/compile/internal/gc AST.
      4. Some gc AST passes: typechecking, escape analysis, inlining,
         closure handling, expression evaluation ordering (order.go),
         and some lowering and optimization (walk.go).
      5. Translation into the cmd/compile/internal/ssa SSA form.
      6. Optimization and lowering of SSA form.
      7. Translation from SSA form to assembler instructions.
      8. Translation from assembler instructions to machine code.
      9. Writing lots of output: machine code, DWARF symbols,
         type and reflection info, export data.
      
      Phase 2 was already concurrent as of Go 1.8.
      
      Phase 3 is planned for eventual removal;
      we hope to go straight from syntax AST to SSA.
      
      Phases 5–8 are per-function; this CL adds support for
      processing multiple functions concurrently.
      The slowest phases in the compiler are 5 and 6,
      so this offers the opportunity for some good speed-ups.
      
      Unfortunately, it's not quite that straightforward.
      In the current compiler, the latter parts of phase 4
      (order, walk) are done function-at-a-time as needed.
      Making order and walk concurrency-safe proved hard,
      and they're not particularly slow, so there wasn't much reward.
      To enable phases 5–8 to be done concurrently,
      when concurrent backend compilation is requested,
      we complete phase 4 for all functions
      before starting later phases for any functions.
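
      A minimal sketch of this scheduling, not the compiler's actual code.
      The names fn, prepare, backend, and compileAll are hypothetical;
      prepare stands in for order/walk and backend for phases 5–8.

```go
package main

import (
	"fmt"
	"sync"
)

// fn is a hypothetical stand-in for a function being compiled.
type fn struct {
	name     string
	prepared bool // set by the serial front-end pass
	compiled bool // set by the backend
}

// prepare stands in for the passes that are not concurrency-safe.
func prepare(f *fn) { f.prepared = true }

// backend stands in for the per-function backend phases.
// It touches only its own function, so it needs no locking here.
func backend(f *fn) {
	if !f.prepared {
		panic("backend started before front end finished: " + f.name)
	}
	f.compiled = true
}

// compileAll finishes the serial pass for every function first,
// then runs the backend with at most c functions in flight.
func compileAll(fns []*fn, c int) {
	for _, f := range fns {
		prepare(f)
	}
	if c < 1 {
		c = 1
	}
	sem := make(chan struct{}, c) // counting semaphore
	var wg sync.WaitGroup
	for _, f := range fns {
		wg.Add(1)
		sem <- struct{}{} // blocks while c backends are running
		go func(f *fn) {
			defer wg.Done()
			defer func() { <-sem }()
			backend(f)
		}(f)
	}
	wg.Wait()
}

func main() {
	fns := []*fn{{name: "a"}, {name: "b"}, {name: "c"}}
	compileAll(fns, 2)
	for _, f := range fns {
		fmt.Println(f.name, f.compiled)
	}
}
```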
      
      Also, in reality, we automatically generate new
      functions in phase 9, such as method wrappers
      and equality and hash routines.
      Those new functions then go through phases 4–8.
      This CL disables concurrent backend compilation
      after the first, big, user-provided batch of
      functions has been compiled.
      This is done to keep things simple,
      and because the autogenerated functions
      tend to be small, few, simple, and fast to compile.
      
      USAGE
      
      Concurrent backend compilation still defaults to off.
      To set the number of functions that may be backend-compiled
      concurrently, use the compiler flag -c.
      In future work, cmd/go will automatically set -c.
      
      Furthermore, this CL has been intentionally written
      so that the c=1 path has no backend concurrency whatsoever,
      not even spawning any goroutines.
      This helps ensure that, should problems arise
      late in the development cycle,
      we can simply have cmd/go set c=1 always,
      and revert to the original compiler behavior.
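
      One way to structure such a guarantee, as an illustrative sketch only.
      compileFuncs and its parameters are hypothetical names,
      not the functions this CL adds.

```go
package main

import (
	"fmt"
	"sync"
)

// compileFuncs calls compile on each index in [0, n).
// With c <= 1 it runs everything on the calling goroutine and
// spawns no goroutines at all; otherwise it starts c workers.
func compileFuncs(n, c int, compile func(i int)) {
	if c <= 1 {
		for i := 0; i < n; i++ {
			compile(i)
		}
		return
	}
	work := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < c; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range work {
				compile(i)
			}
		}()
	}
	for i := 0; i < n; i++ {
		work <- i
	}
	close(work)
	wg.Wait()
}

func main() {
	// Serial path: nothing runs concurrently, so appending is safe.
	var order []int
	compileFuncs(4, 1, func(i int) { order = append(order, i) })
	fmt.Println(order)

	// Concurrent path: each call writes a distinct slot, so no lock is needed.
	done := make([]bool, 4)
	compileFuncs(4, 4, func(i int) { done[i] = true })
	fmt.Println(done)
}
```

      Keeping the serial branch free of channels and goroutines means
      the c=1 behavior can be reasoned about without considering scheduling.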
      
      MUTEXES
      
      Most of the work required to make concurrent backend
      compilation safe has occurred over the past month.
      This CL adds a handful of mutexes to get the rest of the way there;
      they are the mutexes that I didn't see a clean way to avoid.
      Some of them may still be eliminable in future work.
      
      In no particular order:
      
      * gc.funcsymsmu. The global funcsyms slice is populated
        lazily when we need function symbols for closures.
        This occurs during gc AST to SSA translation.
        The function funcsym also does a package lookup,
        which is a source of races on types.Pkg.Syms;
        funcsymsmu also covers that package lookup.
        This mutex is low priority for removal: it adds a single global,
        it is in an infrequently used code path, and it is low contention.
        Since funcsyms may now be added in any order,
        we must sort them to preserve reproducible builds.
      
      * gc.largeStackFramesMu. We don't discover until after SSA compilation
        that a function's stack frame is gigantic.
        Recording that error happens basically never,
        but it does happen concurrently.
        Fix with a mutex (low priority for removal) and sorting.
      
      * obj.Link.hashmu. ctxt.hash stores the mapping from
        types.Syms (compiler symbols) to obj.LSyms (linker symbols).
        It is accessed fairly heavily through all the phases.
        This is the only heavily contended mutex.
      
      * gc.signatlistmu. The global signatlist map is
        populated with types through several of the concurrent phases,
        including notably via ngotype during DWARF generation.
        It is low priority for removal.
      
      * gc.typepkgmu. Looking up symbols in the types package
        happens a fair amount during backend compilation
        and DWARF generation, particularly via ngotype.
        This mutex helps us to avoid a broader mutex on types.Pkg.Syms.
        It has low-to-moderate contention.
      
      * types.internedStringsmu. gc AST to SSA conversion and
        some SSA work introduce new autotmps.
        Those autotmps have their names interned to reduce allocations.
        That interning requires protecting types.internedStrings.
        The autotmp names are heavily re-used, and the mutex
        overhead and contention here are low, so it is probably
        a worthwhile performance optimization to keep this mutex.
      
      TESTING
      
      I have been testing this code locally by running
      'go install -race cmd/compile'
      and then doing
      'go build -a -gcflags=-c=128 std cmd'
      for all architectures and a variety of compiler flags.
      This obviously needs to be made part of the builders,
      but it is too expensive to make part of all.bash.
      I have filed #19962 for this.
      
      REPRODUCIBLE BUILDS
      
      This version of the compiler generates reproducible builds.
      Testing reproducible builds also needs automation, however,
      and is also too expensive for all.bash.
      This is #19961.
      
      Also of note is that some of the compiler flags used by 'toolstash -cmp'
      are currently incompatible with concurrent backend compilation.
      They still work fine with c=1.
      Time will tell whether this is a problem.
      
      NEXT STEPS
      
      * Continue to find and fix races and bugs,
        using a combination of code inspection, fuzzing,
        and hopefully some community experimentation.
        I do not know of any outstanding races,
        but there probably are some.
      * Improve testing.
      * Improve performance, for many values of c.
      * Integrate with cmd/go and fine tune.
      * Support concurrent compilation with the -race flag.
        It is a sad irony that it does not yet work.
      * Minor code cleanup that has been deferred during
        the last month due to uncertainty about the
        ultimate shape of this CL.
      
      PERFORMANCE
      
      Here's the buried lede, at last. :)
      
      All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop.
      
      First, going from tip to this CL with c=1 has almost no impact.
      
      name        old time/op       new time/op       delta
      Template          195ms ± 3%        194ms ± 5%    ~     (p=0.370 n=30+29)
      Unicode          86.6ms ± 3%       87.0ms ± 7%    ~     (p=0.958 n=29+30)
      GoTypes           548ms ± 3%        555ms ± 4%  +1.35%  (p=0.001 n=30+28)
      Compiler          2.51s ± 2%        2.54s ± 2%  +1.17%  (p=0.000 n=28+30)
      SSA               5.16s ± 3%        5.16s ± 2%    ~     (p=0.910 n=30+29)
      Flate             124ms ± 5%        124ms ± 4%    ~     (p=0.947 n=30+30)
      GoParser          146ms ± 3%        146ms ± 3%    ~     (p=0.150 n=29+28)
      Reflect           354ms ± 3%        352ms ± 4%    ~     (p=0.096 n=29+29)
      Tar               107ms ± 5%        106ms ± 3%    ~     (p=0.370 n=30+29)
      XML               200ms ± 4%        201ms ± 4%    ~     (p=0.313 n=29+28)
      [Geo mean]        332ms             333ms       +0.10%
      
      name        old user-time/op  new user-time/op  delta
      Template          227ms ± 5%        225ms ± 5%    ~     (p=0.457 n=28+27)
      Unicode           109ms ± 4%        109ms ± 5%    ~     (p=0.758 n=29+29)
      GoTypes           713ms ± 4%        721ms ± 5%    ~     (p=0.051 n=30+29)
      Compiler          3.36s ± 2%        3.38s ± 3%    ~     (p=0.146 n=30+30)
      SSA               7.46s ± 3%        7.47s ± 3%    ~     (p=0.804 n=30+29)
      Flate             146ms ± 7%        147ms ± 3%    ~     (p=0.833 n=29+27)
      GoParser          179ms ± 5%        179ms ± 5%    ~     (p=0.866 n=30+30)
      Reflect           431ms ± 4%        429ms ± 4%    ~     (p=0.593 n=29+30)
      Tar               124ms ± 5%        123ms ± 5%    ~     (p=0.140 n=29+29)
      XML               243ms ± 4%        242ms ± 7%    ~     (p=0.404 n=29+29)
      [Geo mean]        415ms             415ms       +0.02%
      
      name        old obj-bytes     new obj-bytes     delta
      Template           382k ± 0%         382k ± 0%    ~     (all equal)
      Unicode            203k ± 0%         203k ± 0%    ~     (all equal)
      GoTypes           1.18M ± 0%        1.18M ± 0%    ~     (all equal)
      Compiler          3.98M ± 0%        3.98M ± 0%    ~     (all equal)
      SSA               8.28M ± 0%        8.28M ± 0%    ~     (all equal)
      Flate              230k ± 0%         230k ± 0%    ~     (all equal)
      GoParser           287k ± 0%         287k ± 0%    ~     (all equal)
      Reflect           1.00M ± 0%        1.00M ± 0%    ~     (all equal)
      Tar                190k ± 0%         190k ± 0%    ~     (all equal)
      XML                416k ± 0%         416k ± 0%    ~     (all equal)
      [Geo mean]         660k              660k       +0.00%
      
      Comparing this CL to itself, from c=1 to c=2
      improves real times 20-30%, costs 5-10% more CPU time,
      and adds about 2% alloc.
      The allocation increase comes from allocating more ssa.Caches.
      
      name       old time/op       new time/op       delta
      Template         202ms ± 3%        149ms ± 3%  -26.15%  (p=0.000 n=49+49)
      Unicode         87.4ms ± 4%       84.2ms ± 3%   -3.68%  (p=0.000 n=48+48)
      GoTypes          560ms ± 2%        398ms ± 2%  -28.96%  (p=0.000 n=49+49)
      Compiler         2.46s ± 3%        1.76s ± 2%  -28.61%  (p=0.000 n=48+46)
      SSA              6.17s ± 2%        4.04s ± 1%  -34.52%  (p=0.000 n=49+49)
      Flate            126ms ± 3%         92ms ± 2%  -26.81%  (p=0.000 n=49+48)
      GoParser         148ms ± 4%        107ms ± 2%  -27.78%  (p=0.000 n=49+48)
      Reflect          361ms ± 3%        281ms ± 3%  -22.10%  (p=0.000 n=49+49)
      Tar              109ms ± 4%         86ms ± 3%  -20.81%  (p=0.000 n=49+47)
      XML              204ms ± 3%        144ms ± 2%  -29.53%  (p=0.000 n=48+45)
      
      name       old user-time/op  new user-time/op  delta
      Template         246ms ± 9%        246ms ± 4%     ~     (p=0.401 n=50+48)
      Unicode          109ms ± 4%        111ms ± 4%   +1.47%  (p=0.000 n=44+50)
      GoTypes          728ms ± 3%        765ms ± 3%   +5.04%  (p=0.000 n=46+50)
      Compiler         3.33s ± 3%        3.41s ± 2%   +2.31%  (p=0.000 n=49+48)
      SSA              8.52s ± 2%        9.11s ± 2%   +6.93%  (p=0.000 n=49+47)
      Flate            149ms ± 4%        161ms ± 3%   +8.13%  (p=0.000 n=50+47)
      GoParser         181ms ± 5%        192ms ± 2%   +6.40%  (p=0.000 n=49+46)
      Reflect          452ms ± 9%        474ms ± 2%   +4.99%  (p=0.000 n=50+48)
      Tar              126ms ± 6%        136ms ± 4%   +7.95%  (p=0.000 n=50+49)
      XML              247ms ± 5%        264ms ± 3%   +6.94%  (p=0.000 n=48+50)
      
      name       old alloc/op      new alloc/op      delta
      Template        38.8MB ± 0%       39.3MB ± 0%   +1.48%  (p=0.008 n=5+5)
      Unicode         29.8MB ± 0%       30.2MB ± 0%   +1.19%  (p=0.008 n=5+5)
      GoTypes          113MB ± 0%        114MB ± 0%   +0.69%  (p=0.008 n=5+5)
      Compiler         443MB ± 0%        447MB ± 0%   +0.95%  (p=0.008 n=5+5)
      SSA             1.25GB ± 0%       1.26GB ± 0%   +0.89%  (p=0.008 n=5+5)
      Flate           25.3MB ± 0%       25.9MB ± 1%   +2.35%  (p=0.008 n=5+5)
      GoParser        31.7MB ± 0%       32.2MB ± 0%   +1.59%  (p=0.008 n=5+5)
      Reflect         78.2MB ± 0%       78.9MB ± 0%   +0.91%  (p=0.008 n=5+5)
      Tar             26.6MB ± 0%       27.0MB ± 0%   +1.80%  (p=0.008 n=5+5)
      XML             42.4MB ± 0%       43.4MB ± 0%   +2.35%  (p=0.008 n=5+5)
      
      name       old allocs/op     new allocs/op     delta
      Template          379k ± 0%         378k ± 0%     ~     (p=0.421 n=5+5)
      Unicode           322k ± 0%         321k ± 0%     ~     (p=0.222 n=5+5)
      GoTypes          1.14M ± 0%        1.14M ± 0%     ~     (p=0.548 n=5+5)
      Compiler         4.12M ± 0%        4.11M ± 0%   -0.14%  (p=0.032 n=5+5)
      SSA              9.72M ± 0%        9.72M ± 0%     ~     (p=0.421 n=5+5)
      Flate             234k ± 1%         234k ± 0%     ~     (p=0.421 n=5+5)
      GoParser          316k ± 1%         315k ± 0%     ~     (p=0.222 n=5+5)
      Reflect           980k ± 0%         979k ± 0%     ~     (p=0.095 n=5+5)
      Tar               249k ± 1%         249k ± 1%     ~     (p=0.841 n=5+5)
      XML               392k ± 0%         391k ± 0%     ~     (p=0.095 n=5+5)
      
      From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%:
      
      name       old time/op       new time/op       delta
      Template         203ms ± 3%        131ms ± 5%  -35.45%  (p=0.000 n=50+50)
      Unicode         87.2ms ± 4%       84.1ms ± 2%   -3.61%  (p=0.000 n=48+47)
      GoTypes          560ms ± 4%        310ms ± 2%  -44.65%  (p=0.000 n=50+49)
      Compiler         2.47s ± 3%        1.41s ± 2%  -43.10%  (p=0.000 n=50+46)
      SSA              6.17s ± 2%        3.20s ± 2%  -48.06%  (p=0.000 n=49+49)
      Flate            126ms ± 4%         74ms ± 2%  -41.06%  (p=0.000 n=49+48)
      GoParser         148ms ± 4%         89ms ± 3%  -39.97%  (p=0.000 n=49+50)
      Reflect          360ms ± 3%        242ms ± 3%  -32.81%  (p=0.000 n=49+49)
      Tar              108ms ± 4%         73ms ± 4%  -32.48%  (p=0.000 n=50+49)
      XML              203ms ± 3%        119ms ± 3%  -41.56%  (p=0.000 n=49+48)
      
      name       old user-time/op  new user-time/op  delta
      Template         246ms ± 9%        287ms ± 9%  +16.98%  (p=0.000 n=50+50)
      Unicode          109ms ± 4%        118ms ± 5%   +7.56%  (p=0.000 n=46+50)
      GoTypes          735ms ± 4%        806ms ± 2%   +9.62%  (p=0.000 n=50+50)
      Compiler         3.34s ± 4%        3.56s ± 2%   +6.78%  (p=0.000 n=49+49)
      SSA              8.54s ± 3%       10.04s ± 3%  +17.55%  (p=0.000 n=50+50)
      Flate            149ms ± 6%        176ms ± 3%  +17.82%  (p=0.000 n=50+48)
      GoParser         181ms ± 5%        213ms ± 3%  +17.47%  (p=0.000 n=50+50)
      Reflect          453ms ± 6%        499ms ± 2%  +10.11%  (p=0.000 n=50+48)
      Tar              126ms ± 5%        149ms ±11%  +18.76%  (p=0.000 n=50+50)
      XML              246ms ± 5%        287ms ± 4%  +16.53%  (p=0.000 n=49+50)
      
      name       old alloc/op      new alloc/op      delta
      Template        38.8MB ± 0%       40.4MB ± 0%   +4.21%  (p=0.008 n=5+5)
      Unicode         29.8MB ± 0%       30.9MB ± 0%   +3.68%  (p=0.008 n=5+5)
      GoTypes          113MB ± 0%        116MB ± 0%   +2.71%  (p=0.008 n=5+5)
      Compiler         443MB ± 0%        455MB ± 0%   +2.75%  (p=0.008 n=5+5)
      SSA             1.25GB ± 0%       1.27GB ± 0%   +1.84%  (p=0.008 n=5+5)
      Flate           25.3MB ± 0%       26.9MB ± 1%   +6.31%  (p=0.008 n=5+5)
      GoParser        31.7MB ± 0%       33.2MB ± 0%   +4.61%  (p=0.008 n=5+5)
      Reflect         78.2MB ± 0%       80.2MB ± 0%   +2.53%  (p=0.008 n=5+5)
      Tar             26.6MB ± 0%       27.9MB ± 0%   +5.19%  (p=0.008 n=5+5)
      XML             42.4MB ± 0%       44.6MB ± 0%   +5.20%  (p=0.008 n=5+5)
      
      name       old allocs/op     new allocs/op     delta
      Template          380k ± 0%         379k ± 0%   -0.39%  (p=0.032 n=5+5)
      Unicode           321k ± 0%         321k ± 0%     ~     (p=0.841 n=5+5)
      GoTypes          1.14M ± 0%        1.14M ± 0%     ~     (p=0.421 n=5+5)
      Compiler         4.12M ± 0%        4.14M ± 0%   +0.52%  (p=0.008 n=5+5)
      SSA              9.72M ± 0%        9.76M ± 0%   +0.37%  (p=0.008 n=5+5)
      Flate             234k ± 1%         234k ± 1%     ~     (p=0.690 n=5+5)
      GoParser          316k ± 0%         317k ± 1%     ~     (p=0.841 n=5+5)
      Reflect           981k ± 0%         981k ± 0%     ~     (p=1.000 n=5+5)
      Tar               250k ± 0%         249k ± 1%     ~     (p=0.151 n=5+5)
      XML               393k ± 0%         392k ± 0%     ~     (p=0.056 n=5+5)
      
      Going beyond c=4 on my machine tends to increase CPU time and allocs
      without impacting real time.
      
      The CPU time numbers matter, because when there are many concurrent
      compilation processes, that will impact the overall throughput.
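
      A rough worked example using the SSA rows of the c=1 to c=4 tables
      above. It assumes a machine whose cores are all kept busy by many
      compiler processes, so that compiles per second scale inversely with
      CPU time per compile; that is a simplifying assumption, and the helper
      names are hypothetical.

```go
package main

import "fmt"

// relThroughput returns new throughput relative to old on a saturated
// machine, where compiles per second scale as 1 / (CPU time per compile).
func relThroughput(oldCPU, newCPU float64) float64 {
	return oldCPU / newCPU
}

// speedup returns how many times faster a single compile finishes.
func speedup(oldReal, newReal float64) float64 {
	return oldReal / newReal
}

func main() {
	// SSA benchmark, c=1 vs c=4, times in seconds.
	fmt.Printf("single-compile speedup: %.2fx\n", speedup(6.17, 3.20))
	fmt.Printf("saturated throughput:   %.0f%% of c=1\n",
		100*relThroughput(8.54, 10.04))
}
```

      Under that assumption, a large wall-clock win for one compile
      coexists with a throughput loss once every core is already occupied.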
      
      The numbers above are in many ways the best case scenario;
      we can take full advantage of all cores.
      Fortunately, the most common compilation scenario is incremental
      re-compilation of a single package during a build/test cycle.
      
      Updates #15756
      
      Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea
      Reviewed-on: https://go-review.googlesource.com/40693
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      Reviewed-by: Robert Griesemer <gri@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      756b9ce3
  3. 26 Apr, 2017 5 commits