1. 27 Apr, 2017 10 commits
    • Keith Randall's avatar
      cmd/internal/obj: ARM, use immediates instead of constant pool entries · 14f3ca56
      Keith Randall authored
      When a constant doesn't fit in a single instruction, use two
      paired instructions instead of the constant pool.  For example
      
        ADD $0xaa00bb, R0, R1
      
      Used to rewrite to:
      
        MOV ?(IP), R11
        ADD R11, R0, R1
      
      Instead, do:
      
        ADD $0xaa0000, R0, R1
        ADD $0xbb, R1, R1
      
      Same number of instructions.
      Good:
        4 less bytes (no constant pool entry)
        One less load.
      Bad:
        Critical path is one instruction longer.
      
      It's probably worth it to avoid the loads, they are expensive.
      
      Dave Cheney got us some performance numbers: https://perf.golang.org/search?q=upload:20170426.1
      TL;DR mean 1.37% improvement.
      
      Change-Id: Ib206836161fdc94a3962db6f9caa635c87d57cf1
      Reviewed-on: https://go-review.googlesource.com/41612
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarCherry Zhang <cherryyz@google.com>
      14f3ca56
    • Bryan C. Mills's avatar
      encoding/gob: replace RWMutex usage with sync.Map · c120e449
      Bryan C. Mills authored
      This provides a significant speedup for encoding and decoding when
      using many CPU cores.
      
      name                        old time/op  new time/op  delta
      EndToEndPipe                5.26µs ± 2%  5.38µs ± 7%     ~     (p=0.121 n=8+7)
      EndToEndPipe-6              1.86µs ± 5%  1.80µs ±11%     ~     (p=0.442 n=8+8)
      EndToEndPipe-48             1.39µs ± 2%  1.41µs ± 4%     ~     (p=0.645 n=8+8)
      EndToEndByteBuffer          1.54µs ± 5%  1.57µs ± 5%     ~     (p=0.130 n=8+8)
      EndToEndByteBuffer-6         620ns ± 6%   310ns ± 8%  -50.04%  (p=0.000 n=8+8)
      EndToEndByteBuffer-48        506ns ± 4%   110ns ± 3%  -78.22%  (p=0.000 n=8+8)
      EndToEndSliceByteBuffer      149µs ± 3%   153µs ± 5%   +2.80%  (p=0.021 n=8+8)
      EndToEndSliceByteBuffer-6    103µs ±17%    31µs ±12%  -70.06%  (p=0.000 n=8+8)
      EndToEndSliceByteBuffer-48  93.2µs ± 2%  18.0µs ± 5%  -80.66%  (p=0.000 n=7+8)
      EncodeComplex128Slice       20.6µs ± 5%  20.9µs ± 8%     ~     (p=0.959 n=8+8)
      EncodeComplex128Slice-6     4.10µs ±10%  3.75µs ± 8%   -8.58%  (p=0.004 n=8+7)
      EncodeComplex128Slice-48    1.14µs ± 2%  0.81µs ± 2%  -28.98%  (p=0.000 n=8+8)
      EncodeFloat64Slice          10.2µs ± 7%  10.1µs ± 6%     ~     (p=0.694 n=7+8)
      EncodeFloat64Slice-6        2.01µs ± 6%  1.80µs ±11%  -10.30%  (p=0.004 n=8+8)
      EncodeFloat64Slice-48        701ns ± 3%   408ns ± 2%  -41.72%  (p=0.000 n=8+8)
      EncodeInt32Slice            11.8µs ± 7%  11.7µs ± 6%     ~     (p=0.463 n=8+7)
      EncodeInt32Slice-6          2.32µs ± 4%  2.06µs ± 5%  -10.89%  (p=0.000 n=8+8)
      EncodeInt32Slice-48          731ns ± 2%   445ns ± 2%  -39.10%  (p=0.000 n=7+8)
      EncodeStringSlice           9.13µs ± 9%  9.18µs ± 8%     ~     (p=0.798 n=8+8)
      EncodeStringSlice-6         1.91µs ± 5%  1.70µs ± 5%  -11.07%  (p=0.000 n=8+8)
      EncodeStringSlice-48         679ns ± 3%   397ns ± 3%  -41.50%  (p=0.000 n=8+8)
      EncodeInterfaceSlice         449µs ±11%   461µs ± 9%     ~     (p=0.328 n=8+8)
      EncodeInterfaceSlice-6       503µs ± 7%    88µs ± 7%  -82.51%  (p=0.000 n=7+8)
      EncodeInterfaceSlice-48      335µs ± 8%    22µs ± 1%  -93.55%  (p=0.000 n=8+7)
      DecodeComplex128Slice       67.2µs ± 4%  67.0µs ± 6%     ~     (p=0.721 n=8+8)
      DecodeComplex128Slice-6     22.0µs ± 8%  18.9µs ± 5%  -14.44%  (p=0.000 n=8+8)
      DecodeComplex128Slice-48    46.8µs ± 3%  34.9µs ± 3%  -25.48%  (p=0.000 n=8+8)
      DecodeFloat64Slice          39.4µs ± 4%  40.3µs ± 3%     ~     (p=0.105 n=8+8)
      DecodeFloat64Slice-6        16.1µs ± 2%  11.2µs ± 7%  -30.64%  (p=0.001 n=6+7)
      DecodeFloat64Slice-48       38.1µs ± 3%  24.0µs ± 7%  -37.10%  (p=0.000 n=8+8)
      DecodeInt32Slice            39.1µs ± 4%  40.1µs ± 5%     ~     (p=0.083 n=8+8)
      DecodeInt32Slice-6          16.3µs ±21%  10.6µs ± 1%  -35.17%  (p=0.000 n=8+7)
      DecodeInt32Slice-48         36.5µs ± 6%  21.9µs ± 9%  -39.89%  (p=0.000 n=8+8)
      DecodeStringSlice           82.9µs ± 6%  85.5µs ± 5%     ~     (p=0.121 n=8+7)
      DecodeStringSlice-6         32.4µs ±11%  26.8µs ±16%  -17.37%  (p=0.000 n=8+8)
      DecodeStringSlice-48        76.0µs ± 2%  57.0µs ± 5%  -25.02%  (p=0.000 n=8+8)
      DecodeInterfaceSlice         718µs ± 4%   752µs ± 5%   +4.83%  (p=0.038 n=8+8)
      DecodeInterfaceSlice-6       500µs ± 6%   165µs ± 7%  -66.95%  (p=0.000 n=7+8)
      DecodeInterfaceSlice-48      470µs ± 5%   120µs ± 6%  -74.55%  (p=0.000 n=8+7)
      DecodeMap                   3.29ms ± 5%  3.34ms ± 5%     ~     (p=0.279 n=8+8)
      DecodeMap-6                 7.73ms ± 8%  7.53ms ±18%     ~     (p=0.779 n=7+8)
      DecodeMap-48                7.46ms ± 6%  7.71ms ± 3%     ~     (p=0.161 n=8+8)
      
      https://perf.golang.org/search?q=upload:20170426.4
      
      Change-Id: I335874028ef8d7c991051004f8caadd16c92d5cc
      Reviewed-on: https://go-review.googlesource.com/41872
      Run-TryBot: Bryan Mills <bcmills@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarIan Lance Taylor <iant@golang.org>
      c120e449
    • Bryan C. Mills's avatar
      reflect: use sync.Map instead of RWMutex for type caches · 33b92cd6
      Bryan C. Mills authored
      This provides a significant speedup when using reflection-heavy code
      on many CPU cores, such as when marshaling or unmarshaling protocol
      buffers.
      
      updates #17973
      updates #18177
      
      name                       old time/op    new time/op     delta
      Call                          239ns ±10%      245ns ± 7%       ~     (p=0.562 n=10+9)
      Call-6                        201ns ±38%       48ns ±29%    -76.39%  (p=0.000 n=10+9)
      Call-48                       133ns ± 8%       12ns ± 2%    -90.92%  (p=0.000 n=10+8)
      CallArgCopy/size=128          169ns ±12%      197ns ± 2%    +16.35%  (p=0.000 n=10+7)
      CallArgCopy/size=128-6        142ns ± 9%       34ns ± 7%    -76.10%  (p=0.000 n=10+9)
      CallArgCopy/size=128-48       125ns ± 3%        9ns ± 7%    -93.01%  (p=0.000 n=8+8)
      CallArgCopy/size=256          177ns ± 8%      197ns ± 5%    +11.24%  (p=0.000 n=10+9)
      CallArgCopy/size=256-6        148ns ±11%       35ns ± 6%    -76.23%  (p=0.000 n=10+9)
      CallArgCopy/size=256-48       127ns ± 4%        9ns ± 9%    -92.66%  (p=0.000 n=10+9)
      CallArgCopy/size=1024         196ns ± 6%      228ns ± 7%    +16.09%  (p=0.000 n=10+9)
      CallArgCopy/size=1024-6       143ns ± 6%       42ns ± 5%    -70.39%  (p=0.000 n=8+8)
      CallArgCopy/size=1024-48      130ns ± 7%       10ns ± 1%    -91.99%  (p=0.000 n=10+8)
      CallArgCopy/size=4096         330ns ± 9%      351ns ± 5%     +6.20%  (p=0.004 n=10+9)
      CallArgCopy/size=4096-6       173ns ±14%       62ns ± 6%    -63.83%  (p=0.000 n=10+8)
      CallArgCopy/size=4096-48      141ns ± 6%       15ns ± 6%    -89.59%  (p=0.000 n=10+8)
      CallArgCopy/size=65536       7.71µs ±10%     7.74µs ±10%       ~     (p=0.859 n=10+9)
      CallArgCopy/size=65536-6     1.33µs ± 4%     1.34µs ± 6%       ~     (p=0.720 n=10+9)
      CallArgCopy/size=65536-48     347ns ± 2%      344ns ± 2%       ~     (p=0.202 n=10+9)
      PtrTo                        30.2ns ±10%     41.3ns ±11%    +36.97%  (p=0.000 n=10+9)
      PtrTo-6                       126ns ± 6%        7ns ±10%    -94.47%  (p=0.000 n=9+9)
      PtrTo-48                     86.9ns ± 9%      1.7ns ± 9%    -98.08%  (p=0.000 n=10+9)
      FieldByName1                 86.6ns ± 5%     87.3ns ± 7%       ~     (p=0.737 n=10+9)
      FieldByName1-6               19.8ns ±10%     18.7ns ±10%       ~     (p=0.073 n=9+9)
      FieldByName1-48              7.54ns ± 4%     7.74ns ± 5%     +2.55%  (p=0.023 n=9+9)
      FieldByName2                 1.63µs ± 8%     1.70µs ± 4%     +4.13%  (p=0.020 n=9+9)
      FieldByName2-6                481ns ± 6%      490ns ±10%       ~     (p=0.474 n=9+9)
      FieldByName2-48               723ns ± 3%      736ns ± 2%     +1.76%  (p=0.045 n=8+8)
      FieldByName3                 10.5µs ± 7%     10.8µs ± 7%       ~     (p=0.234 n=8+8)
      FieldByName3-6               2.78µs ± 3%     2.94µs ±10%     +5.87%  (p=0.031 n=9+9)
      FieldByName3-48              3.72µs ± 2%     3.91µs ± 5%     +4.91%  (p=0.003 n=9+9)
      InterfaceBig                 10.8ns ± 5%     10.7ns ± 5%       ~     (p=0.849 n=9+9)
      InterfaceBig-6               9.62ns ±81%     1.79ns ± 4%    -81.38%  (p=0.003 n=9+9)
      InterfaceBig-48              0.48ns ±34%     0.50ns ± 7%       ~     (p=0.071 n=8+9)
      InterfaceSmall               10.7ns ± 5%     10.9ns ± 4%       ~     (p=0.243 n=9+9)
      InterfaceSmall-6             1.85ns ± 5%     1.79ns ± 1%     -2.97%  (p=0.006 n=7+8)
      InterfaceSmall-48            0.49ns ±20%     0.48ns ± 5%       ~     (p=0.740 n=7+9)
      New                          28.2ns ±20%     26.6ns ± 3%       ~     (p=0.617 n=9+9)
      New-6                        4.69ns ± 4%     4.44ns ± 3%     -5.33%  (p=0.001 n=9+9)
      New-48                       1.10ns ± 9%     1.08ns ± 6%       ~     (p=0.285 n=9+8)
      
      name                       old alloc/op   new alloc/op    delta
      Call                          0.00B           0.00B            ~     (all equal)
      Call-6                        0.00B           0.00B            ~     (all equal)
      Call-48                       0.00B           0.00B            ~     (all equal)
      
      name                       old allocs/op  new allocs/op   delta
      Call                           0.00            0.00            ~     (all equal)
      Call-6                         0.00            0.00            ~     (all equal)
      Call-48                        0.00            0.00            ~     (all equal)
      
      name                       old speed      new speed       delta
      CallArgCopy/size=128        757MB/s ±11%    649MB/s ± 1%    -14.33%  (p=0.000 n=10+7)
      CallArgCopy/size=128-6      901MB/s ± 9%   3781MB/s ± 7%   +319.69%  (p=0.000 n=10+9)
      CallArgCopy/size=128-48    1.02GB/s ± 2%  14.63GB/s ± 6%  +1337.98%  (p=0.000 n=8+8)
      CallArgCopy/size=256       1.45GB/s ± 9%   1.30GB/s ± 5%    -10.17%  (p=0.000 n=10+9)
      CallArgCopy/size=256-6     1.73GB/s ±11%   7.28GB/s ± 7%   +320.76%  (p=0.000 n=10+9)
      CallArgCopy/size=256-48    2.00GB/s ± 4%  27.46GB/s ± 9%  +1270.85%  (p=0.000 n=10+9)
      CallArgCopy/size=1024      5.21GB/s ± 6%   4.49GB/s ± 8%    -13.74%  (p=0.000 n=10+9)
      CallArgCopy/size=1024-6    7.18GB/s ± 7%  24.17GB/s ± 5%   +236.64%  (p=0.000 n=9+8)
      CallArgCopy/size=1024-48   7.87GB/s ± 7%  98.43GB/s ± 1%  +1150.99%  (p=0.000 n=10+8)
      CallArgCopy/size=4096      12.3GB/s ± 6%   11.7GB/s ± 5%     -5.00%  (p=0.008 n=9+9)
      CallArgCopy/size=4096-6    23.8GB/s ±16%   65.6GB/s ± 5%   +175.02%  (p=0.000 n=10+8)
      CallArgCopy/size=4096-48   29.0GB/s ± 7%  279.6GB/s ± 6%   +862.87%  (p=0.000 n=10+8)
      CallArgCopy/size=65536     8.52GB/s ±11%   8.49GB/s ± 9%       ~     (p=0.842 n=10+9)
      CallArgCopy/size=65536-6   49.3GB/s ± 4%   49.0GB/s ± 6%       ~     (p=0.720 n=10+9)
      CallArgCopy/size=65536-48   189GB/s ± 2%    190GB/s ± 2%       ~     (p=0.211 n=10+9)
      
      https://perf.golang.org/search?q=upload:20170426.3
      
      Change-Id: Iff68f18ef69defb7f30962e21736ac7685a48a27
      Reviewed-on: https://go-review.googlesource.com/41871
      Run-TryBot: Bryan Mills <bcmills@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      33b92cd6
    • Elias Naur's avatar
      misc/ios: increase iOS test harness timeout · 6e54fe47
      Elias Naur authored
      The "lldb start" phase often times out on the iOS builder. Increase
      the timeout and see if that helps.
      
      Change-Id: I92fd67cbfa90659600e713198d6b2c5c78dde20f
      Reviewed-on: https://go-review.googlesource.com/41863Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Elias Naur <elias.naur@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      6e54fe47
    • Weichao Tang's avatar
      net/http: close resp.Body when error occurred during redirection · e51e0f9c
      Weichao Tang authored
      Fixes #19976
      
      Change-Id: I48486467066784a9dcc24357ec94a1be85265a6f
      Reviewed-on: https://go-review.googlesource.com/40940
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      e51e0f9c
    • Wei Xiao's avatar
      cmd/internal/obj/arm64: fix encoding of condition · 2b6c58f6
      Wei Xiao authored
      The current code treats condition as special register and write
      its raw data directly into instruction.
      
      The fix converts the raw data into correct condition encoding.
      Also fix the operand catogery of FCCMP.
      
      Add tests to cover all cases.
      
      Change-Id: Ib194041bd9017dd0edbc241564fe983082ac616b
      Reviewed-on: https://go-review.googlesource.com/41511
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarCherry Zhang <cherryyz@google.com>
      2b6c58f6
    • Ian Lance Taylor's avatar
      os: use kernel limit on pipe size if possible · 220e0e0f
      Ian Lance Taylor authored
      Fixes #20134
      
      Change-Id: I92699d118c713179961c037a6bbbcbec4efa63ba
      Reviewed-on: https://go-review.googlesource.com/41823
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      220e0e0f
    • Nigel Tao's avatar
      image/jpeg: fix extended sequential Huffman table selector (Th). · 35cbc3b5
      Nigel Tao authored
      Previously, the package did not distinguish between baseline and
      extended sequential images. Both are non-progressive images, but the Th
      range differs between the two, as per Annex B of
      https://www.w3.org/Graphics/JPEG/itu-t81.pdf
      
      Extended sequential images are often emitted by the Guetzli encoder.
      
      Fixes #19913
      
      Change-Id: I3d0f9e16d5d374ee1c65e3a8fb87519de61cff94
      Reviewed-on: https://go-review.googlesource.com/41831Reviewed-by: 's avatarDavid Symonds <dsymonds@golang.org>
      35cbc3b5
    • Josh Bleecher Snyder's avatar
      cmd/compile: compile more complex functions first · 6664ccb4
      Josh Bleecher Snyder authored
      When using a concurrent backend,
      the overall compilation time is bounded
      in part by the slowest function to compile.
      The number of top-level statements in a function
      is an easily calculated and fairly reliable
      proxy for compilation time.
      
      Here's a standard compilecmp output for -c=8 with this CL:
      
      name       old time/op       new time/op       delta
      Template         127ms ± 4%        125ms ± 6%   -1.33%  (p=0.000 n=47+50)
      Unicode         84.8ms ± 4%       84.5ms ± 4%     ~     (p=0.217 n=49+49)
      GoTypes          289ms ± 3%        287ms ± 3%   -0.78%  (p=0.002 n=48+50)
      Compiler         1.36s ± 3%        1.34s ± 2%   -1.29%  (p=0.000 n=49+47)
      SSA              2.95s ± 3%        2.77s ± 4%   -6.23%  (p=0.000 n=50+49)
      Flate           70.7ms ± 3%       70.9ms ± 2%     ~     (p=0.112 n=50+49)
      GoParser        85.0ms ± 3%       83.0ms ± 4%   -2.31%  (p=0.000 n=48+49)
      Reflect          229ms ± 3%        225ms ± 4%   -1.83%  (p=0.000 n=49+49)
      Tar             70.2ms ± 3%       69.4ms ± 3%   -1.17%  (p=0.000 n=49+49)
      XML              115ms ± 7%        114ms ± 6%     ~     (p=0.158 n=49+47)
      
      name       old user-time/op  new user-time/op  delta
      Template         352ms ± 5%        342ms ± 8%   -2.74%  (p=0.000 n=49+50)
      Unicode          117ms ± 5%        118ms ± 4%   +0.88%  (p=0.005 n=46+48)
      GoTypes          986ms ± 3%        980ms ± 4%     ~     (p=0.110 n=46+48)
      Compiler         4.39s ± 2%        4.43s ± 4%   +0.97%  (p=0.002 n=50+50)
      SSA              12.0s ± 2%        13.3s ± 3%  +11.33%  (p=0.000 n=49+49)
      Flate            222ms ± 5%        219ms ± 6%   -1.56%  (p=0.002 n=50+50)
      GoParser         271ms ± 5%        268ms ± 4%   -0.83%  (p=0.036 n=49+48)
      Reflect          560ms ± 4%        571ms ± 3%   +1.90%  (p=0.000 n=50+49)
      Tar              183ms ± 3%        183ms ± 3%     ~     (p=0.903 n=45+50)
      XML              364ms ±13%        391ms ± 4%   +7.16%  (p=0.000 n=50+40)
      
      A more interesting way of viewing the data is by
      looking at the ratio of the time taken to compile
      the slowest-to-compile function to the overall
      time spent compiling functions.
      
      If this ratio is small (near 0), then increased concurrency might help.
      If this ratio is big (near 1), then we're bounded by that single function.
      
      I instrumented the compiler to emit this ratio per-package,
      ran 'go build -a -gcflags=-c=C -p=P std cmd' three times,
      for varying values of C and P,
      and collected the ratios encountered into an ASCII histogram.
      
      Here's c=1 p=1, which is a non-concurrent backend, single process at a time:
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|
       20%|**
       10%|***
        0%|*********
      ----+----------
          |0123456789
      
      The x-axis is floor(10*ratio), so the first column indicates the percent of
      ratios that fell in the 0% to 9.9999% range.
      We can see in this histogram that more concurrency will help;
      in most cases, the ratio is small.
      
      Here's c=8 p=1, before this CL:
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|         *
       20%|         *
       10%|*   *    *
        0%|**********
      ----+----------
          |0123456789
      
      In 30-40% of cases, we're mostly bound by the compilation time
      of a single function.
      
      Here's c=8 p=1, after this CL:
      
       90%|
       80%|
       70%|
       60%|
       50%|         *
       40%|         *
       30%|         *
       20%|         *
       10%|         *
        0%|**********
      ----+----------
          |0123456789
      
      The sorting pays off; we are bound by the
      compilation time of a single function in over half of packages.
      The single * in the histogram indicates 0-10%.
      The actual values for this chart are:
      0: 5%, 1: 1%, 2: 1%, 3: 4%, 4: 5%, 5: 7%, 6: 7%, 7: 7%, 8: 9%, 9: 55%
      
      This indicates that efforts to increase or enable more concurrency,
      e.g. by optimizing mutexes or increasing the value of c,
      will probably not yield fruit.
      That matches what compilecmp tells us.
      
      Further optimization efforts should thus focus instead on one of:
      
      (1) making more functions compile concurrently
      (2) improving the compilation time of the slowest functions
      (3) speeding up the remaining serial parts of the compiler
      (4) automatically splitting up some large autogenerated functions
          into small ones, as discussed in #19751
      
      I hope to spend more time on (1) before the freeze.
      
      Adding process parallelism doesn't change the story much.
      For example, here's c=8 p=8, after this CL:
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|         *
       30%|         *
       20%|         *
       10%|       ***
        0%|**********
      ----+----------
          |0123456789
      
      Since we don't need to worry much about p,
      these histograms can help us select a good
      general value of c to use as a default,
      assuming we're not bounded by GOMAXPROCS.
      
      Here are some charts after this CL, for c from 1 to 8:
      
      c=1 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|
       20%|**
       10%|***
        0%|*********
      ----+----------
          |0123456789
      
      c=2 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|
       20%|
       10%| ****    *
        0%|**********
      ----+----------
          |0123456789
      
      c=3 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|
       20%|         *
       10%|  ** *   *
        0%|**********
      ----+----------
          |0123456789
      
      c=4 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|         *
       20%|         *
       10%|     *   *
        0%|**********
      ----+----------
          |0123456789
      
      c=5 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|
       30%|         *
       20%|         *
       10%|     *   *
        0%|**********
      ----+----------
          |0123456789
      
      c=6 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|
       40%|         *
       30%|         *
       20%|         *
       10%|         *
        0%|**********
      ----+----------
          |0123456789
      
      c=7 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|         *
       40%|         *
       30%|         *
       20%|         *
       10%|        **
        0%|**********
      ----+----------
          |0123456789
      
      c=8 p=1
      
       90%|
       80%|
       70%|
       60%|
       50%|         *
       40%|         *
       30%|         *
       20%|         *
       10%|         *
        0%|**********
      ----+----------
          |0123456789
      
      Given the increased user-CPU costs as
      c increases, it looks like c=4 is probably
      the sweet spot, at least for now.
      
      Pleasingly, this matches (and explains)
      the results of the standard benchmarking
      that I have done.
      
      Updates #15756
      
      Change-Id: I82b606c06efd34a5dbd1afdbcf66a605905b2aeb
      Reviewed-on: https://go-review.googlesource.com/41192
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarRobert Griesemer <gri@golang.org>
      Reviewed-by: 's avatarMatthew Dempsky <mdempsky@google.com>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      6664ccb4
    • Josh Bleecher Snyder's avatar
      cmd/compile: add initial backend concurrency support · 756b9ce3
      Josh Bleecher Snyder authored
      This CL adds initial support for concurrent backend compilation.
      
      BACKGROUND
      
      The compiler currently consists (very roughly) of the following phases:
      
      1. Initialization.
      2. Lexing and parsing into the cmd/compile/internal/syntax AST.
      3. Translation into the cmd/compile/internal/gc AST.
      4. Some gc AST passes: typechecking, escape analysis, inlining,
         closure handling, expression evaluation ordering (order.go),
         and some lowering and optimization (walk.go).
      5. Translation into the cmd/compile/internal/ssa SSA form.
      6. Optimization and lowering of SSA form.
      7. Translation from SSA form to assembler instructions.
      8. Translation from assembler instructions to machine code.
      9. Writing lots of output: machine code, DWARF symbols,
         type and reflection info, export data.
      
      Phase 2 was already concurrent as of Go 1.8.
      
      Phase 3 is planned for eventual removal;
      we hope to go straight from syntax AST to SSA.
      
      Phases 5–8 are per-function; this CL adds support for
      processing multiple functions concurrently.
      The slowest phases in the compiler are 5 and 6,
      so this offers the opportunity for some good speed-ups.
      
      Unfortunately, it's not quite that straightforward.
      In the current compiler, the latter parts of phase 4
      (order, walk) are done function-at-a-time as needed.
      Making order and walk concurrency-safe proved hard,
      and they're not particularly slow, so there wasn't much reward.
      To enable phases 5–8 to be done concurrently,
      when concurrent backend compilation is requested,
      we complete phase 4 for all functions
      before starting later phases for any functions.
      
      Also, in reality, we automatically generate new
      functions in phase 9, such as method wrappers
      and equality and has routines.
      Those new functions then go through phases 4–8.
      This CL disables concurrent backend compilation
      after the first, big, user-provided batch of
      functions has been compiled.
      This is done to keep things simple,
      and because the autogenerated functions
      tend to be small, few, simple, and fast to compile.
      
      USAGE
      
      Concurrent backend compilation still defaults to off.
      To set the number of functions that may be backend-compiled
      concurrently, use the compiler flag -c.
      In future work, cmd/go will automatically set -c.
      
      Furthermore, this CL has been intentionally written
      so that the c=1 path has no backend concurrency whatsoever,
      not even spawning any goroutines.
      This helps ensure that, should problems arise
      late in the development cycle,
      we can simply have cmd/go set c=1 always,
      and revert to the original compiler behavior.
      
      MUTEXES
      
      Most of the work required to make concurrent backend
      compilation safe has occurred over the past month.
      This CL adds a handful of mutexes to get the rest of the way there;
      they are the mutexes that I didn't see a clean way to avoid.
      Some of them may still be eliminable in future work.
      
      In no particular order:
      
      * gc.funcsymsmu. The global funcsyms slice is populated
        lazily when we need function symbols for closures.
        This occurs during gc AST to SSA translation.
        The function funcsym also does a package lookup,
        which is a source of races on types.Pkg.Syms;
        funcsymsmu also covers that package lookup.
        This mutex is low priority: it adds a single global,
        it is in an infrequently used code path, and it is low contention.
        Since funcsyms may now be added in any order,
        we must sort them to preserve reproducible builds.
      
      * gc.largeStackFramesMu. We don't discover until after SSA compilation
        that a function's stack frame is gigantic.
        Recording that error happens basically never,
        but it does happen concurrently.
        Fix with a low priority mutex and sorting.
      
      * obj.Link.hashmu. ctxt.hash stores the mapping from
        types.Syms (compiler symbols) to obj.LSyms (linker symbols).
        It is accessed fairly heavily through all the phases.
        This is the only heavily contended mutex.
      
      * gc.signatlistmu. The global signatlist map is
        populated with types through several of the concurrent phases,
        including notably via ngotype during DWARF generation.
        It is low priority for removal.
      
      * gc.typepkgmu. Looking up symbols in the types package
        happens a fair amount during backend compilation
        and DWARF generation, particularly via ngotype.
        This mutex helps us to avoid a broader mutex on types.Pkg.Syms.
        It has low-to-moderate contention.
      
      * types.internedStringsmu. gc AST to SSA conversion and
        some SSA work introduce new autotmps.
        Those autotmps have their names interned to reduce allocations.
        That interning requires protecting types.internedStrings.
        The autotmp names are heavily re-used, and the mutex
        overhead and contention here are low, so it is probably
        a worthwhile performance optimization to keep this mutex.
      
      TESTING
      
      I have been testing this code locally by running
      'go install -race cmd/compile'
      and then doing
      'go build -a -gcflags=-c=128 std cmd'
      for all architectures and a variety of compiler flags.
      This obviously needs to be made part of the builders,
      but it is too expensive to make part of all.bash.
      I have filed #19962 for this.
      
      REPRODUCIBLE BUILDS
      
      This version of the compiler generates reproducible builds.
      Testing reproducible builds also needs automation, however,
      and is also too expensive for all.bash.
      This is #19961.
      
      Also of note is that some of the compiler flags used by 'toolstash -cmp'
      are currently incompatible with concurrent backend compilation.
      They still work fine with c=1.
      Time will tell whether this is a problem.
      
      NEXT STEPS
      
      * Continue to find and fix races and bugs,
        using a combination of code inspection, fuzzing,
        and hopefully some community experimentation.
        I do not know of any outstanding races,
        but there probably are some.
      * Improve testing.
      * Improve performance, for many values of c.
      * Integrate with cmd/go and fine tune.
      * Support concurrent compilation with the -race flag.
        It is a sad irony that it does not yet work.
      * Minor code cleanup that has been deferred during
        the last month due to uncertainty about the
        ultimate shape of this CL.
      
      PERFORMANCE
      
      Here's the buried lede, at last. :)
      
      All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop.
      
      First, going from tip to this CL with c=1 has almost no impact.
      
      name        old time/op       new time/op       delta
      Template          195ms ± 3%        194ms ± 5%    ~     (p=0.370 n=30+29)
      Unicode          86.6ms ± 3%       87.0ms ± 7%    ~     (p=0.958 n=29+30)
      GoTypes           548ms ± 3%        555ms ± 4%  +1.35%  (p=0.001 n=30+28)
      Compiler          2.51s ± 2%        2.54s ± 2%  +1.17%  (p=0.000 n=28+30)
      SSA               5.16s ± 3%        5.16s ± 2%    ~     (p=0.910 n=30+29)
      Flate             124ms ± 5%        124ms ± 4%    ~     (p=0.947 n=30+30)
      GoParser          146ms ± 3%        146ms ± 3%    ~     (p=0.150 n=29+28)
      Reflect           354ms ± 3%        352ms ± 4%    ~     (p=0.096 n=29+29)
      Tar               107ms ± 5%        106ms ± 3%    ~     (p=0.370 n=30+29)
      XML               200ms ± 4%        201ms ± 4%    ~     (p=0.313 n=29+28)
      [Geo mean]        332ms             333ms       +0.10%
      
      name        old user-time/op  new user-time/op  delta
      Template          227ms ± 5%        225ms ± 5%    ~     (p=0.457 n=28+27)
      Unicode           109ms ± 4%        109ms ± 5%    ~     (p=0.758 n=29+29)
      GoTypes           713ms ± 4%        721ms ± 5%    ~     (p=0.051 n=30+29)
      Compiler          3.36s ± 2%        3.38s ± 3%    ~     (p=0.146 n=30+30)
      SSA               7.46s ± 3%        7.47s ± 3%    ~     (p=0.804 n=30+29)
      Flate             146ms ± 7%        147ms ± 3%    ~     (p=0.833 n=29+27)
      GoParser          179ms ± 5%        179ms ± 5%    ~     (p=0.866 n=30+30)
      Reflect           431ms ± 4%        429ms ± 4%    ~     (p=0.593 n=29+30)
      Tar               124ms ± 5%        123ms ± 5%    ~     (p=0.140 n=29+29)
      XML               243ms ± 4%        242ms ± 7%    ~     (p=0.404 n=29+29)
      [Geo mean]        415ms             415ms       +0.02%
      
      name        old obj-bytes     new obj-bytes     delta
      Template           382k ± 0%         382k ± 0%    ~     (all equal)
      Unicode            203k ± 0%         203k ± 0%    ~     (all equal)
      GoTypes           1.18M ± 0%        1.18M ± 0%    ~     (all equal)
      Compiler          3.98M ± 0%        3.98M ± 0%    ~     (all equal)
      SSA               8.28M ± 0%        8.28M ± 0%    ~     (all equal)
      Flate              230k ± 0%         230k ± 0%    ~     (all equal)
      GoParser           287k ± 0%         287k ± 0%    ~     (all equal)
      Reflect           1.00M ± 0%        1.00M ± 0%    ~     (all equal)
      Tar                190k ± 0%         190k ± 0%    ~     (all equal)
      XML                416k ± 0%         416k ± 0%    ~     (all equal)
      [Geo mean]         660k              660k       +0.00%
      
      Comparing this CL to itself, from c=1 to c=2
      improves real times 20-30%, costs 5-10% more CPU time,
      and adds about 2% alloc.
      The allocation increase comes from allocating more ssa.Caches.
      
      name       old time/op       new time/op       delta
      Template         202ms ± 3%        149ms ± 3%  -26.15%  (p=0.000 n=49+49)
      Unicode         87.4ms ± 4%       84.2ms ± 3%   -3.68%  (p=0.000 n=48+48)
      GoTypes          560ms ± 2%        398ms ± 2%  -28.96%  (p=0.000 n=49+49)
      Compiler         2.46s ± 3%        1.76s ± 2%  -28.61%  (p=0.000 n=48+46)
      SSA              6.17s ± 2%        4.04s ± 1%  -34.52%  (p=0.000 n=49+49)
      Flate            126ms ± 3%         92ms ± 2%  -26.81%  (p=0.000 n=49+48)
      GoParser         148ms ± 4%        107ms ± 2%  -27.78%  (p=0.000 n=49+48)
      Reflect          361ms ± 3%        281ms ± 3%  -22.10%  (p=0.000 n=49+49)
      Tar              109ms ± 4%         86ms ± 3%  -20.81%  (p=0.000 n=49+47)
      XML              204ms ± 3%        144ms ± 2%  -29.53%  (p=0.000 n=48+45)
      
      name       old user-time/op  new user-time/op  delta
      Template         246ms ± 9%        246ms ± 4%     ~     (p=0.401 n=50+48)
      Unicode          109ms ± 4%        111ms ± 4%   +1.47%  (p=0.000 n=44+50)
      GoTypes          728ms ± 3%        765ms ± 3%   +5.04%  (p=0.000 n=46+50)
      Compiler         3.33s ± 3%        3.41s ± 2%   +2.31%  (p=0.000 n=49+48)
      SSA              8.52s ± 2%        9.11s ± 2%   +6.93%  (p=0.000 n=49+47)
      Flate            149ms ± 4%        161ms ± 3%   +8.13%  (p=0.000 n=50+47)
      GoParser         181ms ± 5%        192ms ± 2%   +6.40%  (p=0.000 n=49+46)
      Reflect          452ms ± 9%        474ms ± 2%   +4.99%  (p=0.000 n=50+48)
      Tar              126ms ± 6%        136ms ± 4%   +7.95%  (p=0.000 n=50+49)
      XML              247ms ± 5%        264ms ± 3%   +6.94%  (p=0.000 n=48+50)
      
      name       old alloc/op      new alloc/op      delta
      Template        38.8MB ± 0%       39.3MB ± 0%   +1.48%  (p=0.008 n=5+5)
      Unicode         29.8MB ± 0%       30.2MB ± 0%   +1.19%  (p=0.008 n=5+5)
      GoTypes          113MB ± 0%        114MB ± 0%   +0.69%  (p=0.008 n=5+5)
      Compiler         443MB ± 0%        447MB ± 0%   +0.95%  (p=0.008 n=5+5)
      SSA             1.25GB ± 0%       1.26GB ± 0%   +0.89%  (p=0.008 n=5+5)
      Flate           25.3MB ± 0%       25.9MB ± 1%   +2.35%  (p=0.008 n=5+5)
      GoParser        31.7MB ± 0%       32.2MB ± 0%   +1.59%  (p=0.008 n=5+5)
      Reflect         78.2MB ± 0%       78.9MB ± 0%   +0.91%  (p=0.008 n=5+5)
      Tar             26.6MB ± 0%       27.0MB ± 0%   +1.80%  (p=0.008 n=5+5)
      XML             42.4MB ± 0%       43.4MB ± 0%   +2.35%  (p=0.008 n=5+5)
      
      name       old allocs/op     new allocs/op     delta
      Template          379k ± 0%         378k ± 0%     ~     (p=0.421 n=5+5)
      Unicode           322k ± 0%         321k ± 0%     ~     (p=0.222 n=5+5)
      GoTypes          1.14M ± 0%        1.14M ± 0%     ~     (p=0.548 n=5+5)
      Compiler         4.12M ± 0%        4.11M ± 0%   -0.14%  (p=0.032 n=5+5)
      SSA              9.72M ± 0%        9.72M ± 0%     ~     (p=0.421 n=5+5)
      Flate             234k ± 1%         234k ± 0%     ~     (p=0.421 n=5+5)
      GoParser          316k ± 1%         315k ± 0%     ~     (p=0.222 n=5+5)
      Reflect           980k ± 0%         979k ± 0%     ~     (p=0.095 n=5+5)
      Tar               249k ± 1%         249k ± 1%     ~     (p=0.841 n=5+5)
      XML               392k ± 0%         391k ± 0%     ~     (p=0.095 n=5+5)
      
      From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%:
      
      name       old time/op       new time/op       delta
      Template         203ms ± 3%        131ms ± 5%  -35.45%  (p=0.000 n=50+50)
      Unicode         87.2ms ± 4%       84.1ms ± 2%   -3.61%  (p=0.000 n=48+47)
      GoTypes          560ms ± 4%        310ms ± 2%  -44.65%  (p=0.000 n=50+49)
      Compiler         2.47s ± 3%        1.41s ± 2%  -43.10%  (p=0.000 n=50+46)
      SSA              6.17s ± 2%        3.20s ± 2%  -48.06%  (p=0.000 n=49+49)
      Flate            126ms ± 4%         74ms ± 2%  -41.06%  (p=0.000 n=49+48)
      GoParser         148ms ± 4%         89ms ± 3%  -39.97%  (p=0.000 n=49+50)
      Reflect          360ms ± 3%        242ms ± 3%  -32.81%  (p=0.000 n=49+49)
      Tar              108ms ± 4%         73ms ± 4%  -32.48%  (p=0.000 n=50+49)
      XML              203ms ± 3%        119ms ± 3%  -41.56%  (p=0.000 n=49+48)
      
      name       old user-time/op  new user-time/op  delta
      Template         246ms ± 9%        287ms ± 9%  +16.98%  (p=0.000 n=50+50)
      Unicode          109ms ± 4%        118ms ± 5%   +7.56%  (p=0.000 n=46+50)
      GoTypes          735ms ± 4%        806ms ± 2%   +9.62%  (p=0.000 n=50+50)
      Compiler         3.34s ± 4%        3.56s ± 2%   +6.78%  (p=0.000 n=49+49)
      SSA              8.54s ± 3%       10.04s ± 3%  +17.55%  (p=0.000 n=50+50)
      Flate            149ms ± 6%        176ms ± 3%  +17.82%  (p=0.000 n=50+48)
      GoParser         181ms ± 5%        213ms ± 3%  +17.47%  (p=0.000 n=50+50)
      Reflect          453ms ± 6%        499ms ± 2%  +10.11%  (p=0.000 n=50+48)
      Tar              126ms ± 5%        149ms ±11%  +18.76%  (p=0.000 n=50+50)
      XML              246ms ± 5%        287ms ± 4%  +16.53%  (p=0.000 n=49+50)
      
      name       old alloc/op      new alloc/op      delta
      Template        38.8MB ± 0%       40.4MB ± 0%   +4.21%  (p=0.008 n=5+5)
      Unicode         29.8MB ± 0%       30.9MB ± 0%   +3.68%  (p=0.008 n=5+5)
      GoTypes          113MB ± 0%        116MB ± 0%   +2.71%  (p=0.008 n=5+5)
      Compiler         443MB ± 0%        455MB ± 0%   +2.75%  (p=0.008 n=5+5)
      SSA             1.25GB ± 0%       1.27GB ± 0%   +1.84%  (p=0.008 n=5+5)
      Flate           25.3MB ± 0%       26.9MB ± 1%   +6.31%  (p=0.008 n=5+5)
      GoParser        31.7MB ± 0%       33.2MB ± 0%   +4.61%  (p=0.008 n=5+5)
      Reflect         78.2MB ± 0%       80.2MB ± 0%   +2.53%  (p=0.008 n=5+5)
      Tar             26.6MB ± 0%       27.9MB ± 0%   +5.19%  (p=0.008 n=5+5)
      XML             42.4MB ± 0%       44.6MB ± 0%   +5.20%  (p=0.008 n=5+5)
      
      name       old allocs/op     new allocs/op     delta
      Template          380k ± 0%         379k ± 0%   -0.39%  (p=0.032 n=5+5)
      Unicode           321k ± 0%         321k ± 0%     ~     (p=0.841 n=5+5)
      GoTypes          1.14M ± 0%        1.14M ± 0%     ~     (p=0.421 n=5+5)
      Compiler         4.12M ± 0%        4.14M ± 0%   +0.52%  (p=0.008 n=5+5)
      SSA              9.72M ± 0%        9.76M ± 0%   +0.37%  (p=0.008 n=5+5)
      Flate             234k ± 1%         234k ± 1%     ~     (p=0.690 n=5+5)
      GoParser          316k ± 0%         317k ± 1%     ~     (p=0.841 n=5+5)
      Reflect           981k ± 0%         981k ± 0%     ~     (p=1.000 n=5+5)
      Tar               250k ± 0%         249k ± 1%     ~     (p=0.151 n=5+5)
      XML               393k ± 0%         392k ± 0%     ~     (p=0.056 n=5+5)
      
      Going beyond c=4 on my machine tends to increase CPU time and allocs
      without impacting real time.
      
      The CPU time numbers matter, because when there are many concurrent
      compilation processes, that will impact the overall throughput.
      
      The numbers above are in many ways the best case scenario;
      we can take full advantage of all cores.
      Fortunately, the most common compilation scenario is incremental
      re-compilation of a single package during a build/test cycle.
      
      Updates #15756
      
      Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea
      Reviewed-on: https://go-review.googlesource.com/40693Reviewed-by: 's avatarMatthew Dempsky <mdempsky@google.com>
      Reviewed-by: 's avatarRobert Griesemer <gri@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      756b9ce3
  2. 26 Apr, 2017 30 commits
    • Alex Brainman's avatar
      os: do not report ModeDir for symlinks on windows · 1989921a
      Alex Brainman authored
      When using Lstat against symlinks that point to a directory,
      the function returns FileInfo with both ModeDir and ModeSymlink set.
      Change that to never set ModeDir if ModeSymlink is set.
      
      Fixes #10424
      Fixes #17540
      Fixes #17541
      
      Change-Id: Iba280888aad108360b8c1f18180a24493fe7ad2b
      Reviewed-on: https://go-review.googlesource.com/41830Reviewed-by: 's avatarDaniel Martí <mvdan@mvdan.cc>
      Reviewed-by: 's avatarIan Lance Taylor <iant@golang.org>
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      1989921a
    • Mostyn Bramley-Moore's avatar
      build: fail nicely if somebody runs all.bash from a binary tarball package · 3d86d45d
      Mostyn Bramley-Moore authored
      Fixes golang/go#20008.
      
      Change-Id: I7a429490320595fc558a8c5e260ec41bc3a788e2
      Reviewed-on: https://go-review.googlesource.com/41858Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      3d86d45d
    • Damien Lespiau's avatar
      cmd/internal/obj/x86: fix adcb r/mem8,reg8 encoding · 92d918da
      Damien Lespiau authored
      Taken from the Intel Software Development Manual (of course, in the line
      below it's ADC DST, SRC; The opposite of the commit subject).
      
        12 /r		ADC r8, r/m8
      
      We need 0x12 for the corresponding ytab line, not 0x10.
      
        {Ymb, Ynone, Yrb, Zm_r, 1},
      
      Updates #14069
      
      Change-Id: Id37cbd0c581c9988c2de355efa908956278e2189
      Reviewed-on: https://go-review.googlesource.com/41857Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      92d918da
    • Josh Bleecher Snyder's avatar
      cmd/compile: split dumptypestructs further · 92607fdd
      Josh Bleecher Snyder authored
      This is preparatory cleanup to make future changes clearer.
      
      Change-Id: I20fb9c78257de61b8bd096fce6b1e751995c01f2
      Reviewed-on: https://go-review.googlesource.com/41818
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      92607fdd
    • Russ Cox's avatar
      runtime/pprof: ignore dummy huge page mapping in /proc/self/maps · 3ddf6501
      Russ Cox authored
      Change-Id: I72bea1450386100482b4681b20eb9a9af12c7522
      Reviewed-on: https://go-review.googlesource.com/41816Reviewed-by: 's avatarMichael Matloob <matloob@golang.org>
      3ddf6501
    • Russ Cox's avatar
      runtime/pprof: add /proc/self/maps parsing test · d1ac5927
      Russ Cox authored
      Delete old TestRuntimeFunctionTrimming, which is testing a dead API
      and is now handled in end-to-end tests.
      
      Change-Id: I64fc2991ed4a7690456356b5f6b546f36935bb67
      Reviewed-on: https://go-review.googlesource.com/41815
      Run-TryBot: Russ Cox <rsc@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarMichael Matloob <matloob@golang.org>
      d1ac5927
    • Bryan C. Mills's avatar
      encoding/json: parallelize most benchmarks · c5b6c2ab
      Bryan C. Mills authored
      Don't bother with BenchmarkDecoderStream — it's doing something subtle
      with the input buffer that isn't easy to replicate in a parallel test.
      
      Results remain comparable with the non-parallel version with -cpu=1:
      
      benchmark                          old ns/op     new ns/op     delta
      BenchmarkCodeEncoder               22815832      21058729      -7.70%
      BenchmarkCodeEncoder-6             22190561      3579757       -83.87%
      BenchmarkCodeMarshal               25356621      25396429      +0.16%
      BenchmarkCodeMarshal-6             25359813      4944908       -80.50%
      BenchmarkCodeDecoder               94794556      88016360      -7.15%
      BenchmarkCodeDecoder-6             93795028      16726283      -82.17%
      BenchmarkDecoderStream             532           583           +9.59%
      BenchmarkDecoderStream-6           598           550           -8.03%
      BenchmarkCodeUnmarshal             97644168      89162504      -8.69%
      BenchmarkCodeUnmarshal-6           96615302      17036419      -82.37%
      BenchmarkCodeUnmarshalReuse        91747073      90298479      -1.58%
      BenchmarkCodeUnmarshalReuse-6      89397165      15518005      -82.64%
      BenchmarkUnmarshalString           808           843           +4.33%
      BenchmarkUnmarshalString-6         912           220           -75.88%
      BenchmarkUnmarshalFloat64          695           732           +5.32%
      BenchmarkUnmarshalFloat64-6        710           191           -73.10%
      BenchmarkUnmarshalInt64            635           640           +0.79%
      BenchmarkUnmarshalInt64-6          618           185           -70.06%
      BenchmarkIssue10335                916           947           +3.38%
      BenchmarkIssue10335-6              879           216           -75.43%
      BenchmarkNumberIsValid             34.7          34.3          -1.15%
      BenchmarkNumberIsValid-6           34.9          36.7          +5.16%
      BenchmarkNumberIsValidRegexp       1174          1121          -4.51%
      BenchmarkNumberIsValidRegexp-6     1134          1119          -1.32%
      BenchmarkSkipValue                 20506938      20708060      +0.98%
      BenchmarkSkipValue-6               21627665      22375630      +3.46%
      BenchmarkEncoderEncode             690           726           +5.22%
      BenchmarkEncoderEncode-6           649           157           -75.81%
      
      benchmark                    old MB/s     new MB/s     speedup
      BenchmarkCodeEncoder         85.05        92.15        1.08x
      BenchmarkCodeEncoder-6       87.45        542.07       6.20x
      BenchmarkCodeMarshal         76.53        76.41        1.00x
      BenchmarkCodeMarshal-6       76.52        392.42       5.13x
      BenchmarkCodeDecoder         20.47        22.05        1.08x
      BenchmarkCodeDecoder-6       20.69        116.01       5.61x
      BenchmarkCodeUnmarshal       19.87        21.76        1.10x
      BenchmarkCodeUnmarshal-6     20.08        113.90       5.67x
      BenchmarkSkipValue           90.55        89.67        0.99x
      BenchmarkSkipValue-6         90.83        87.80        0.97x
      
      benchmark                    old allocs     new allocs     delta
      BenchmarkIssue10335          4              4              +0.00%
      BenchmarkIssue10335-6        4              4              +0.00%
      BenchmarkEncoderEncode       1              1              +0.00%
      BenchmarkEncoderEncode-6     1              1              +0.00%
      
      benchmark                    old bytes     new bytes     delta
      BenchmarkIssue10335          320           320           +0.00%
      BenchmarkIssue10335-6        320           320           +0.00%
      BenchmarkEncoderEncode       8             8             +0.00%
      BenchmarkEncoderEncode-6     8             8             +0.00%
      
      updates #18177
      
      Change-Id: Ia4f5bf5ac0afbadb1705ed9f9e1b39dabba67b40
      Reviewed-on: https://go-review.googlesource.com/36724Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      c5b6c2ab
    • Bryan C. Mills's avatar
      reflect: parallelize benchmarks · f5f5a00b
      Bryan C. Mills authored
      Add a benchmark for PtrTo: it's the motivation for #17973, which is
      the motivation for #18177.
      
      Results remain comparable with the non-parallel version with -cpu=1:
      
      benchmark                             old ns/op     new ns/op     delta
      BenchmarkCall                         357           360           +0.84%
      BenchmarkCall-6                       90.3          90.7          +0.44%
      BenchmarkCallArgCopy/size=128         319           323           +1.25%
      BenchmarkCallArgCopy/size=128-6       329           82.2          -75.02%
      BenchmarkCallArgCopy/size=256         354           335           -5.37%
      BenchmarkCallArgCopy/size=256-6       340           85.2          -74.94%
      BenchmarkCallArgCopy/size=1024        374           703           +87.97%
      BenchmarkCallArgCopy/size=1024-6      378           95.8          -74.66%
      BenchmarkCallArgCopy/size=4096        627           631           +0.64%
      BenchmarkCallArgCopy/size=4096-6      643           120           -81.34%
      BenchmarkCallArgCopy/size=65536       10502         10169         -3.17%
      BenchmarkCallArgCopy/size=65536-6     10298         2240          -78.25%
      BenchmarkFieldByName1                 139           132           -5.04%
      BenchmarkFieldByName1-6               144           24.9          -82.71%
      BenchmarkFieldByName2                 2721          2778          +2.09%
      BenchmarkFieldByName2-6               3953          578           -85.38%
      BenchmarkFieldByName3                 19136         18357         -4.07%
      BenchmarkFieldByName3-6               23072         3850          -83.31%
      BenchmarkInterfaceBig                 12.7          15.5          +22.05%
      BenchmarkInterfaceBig-6               14.2          2.48          -82.54%
      BenchmarkInterfaceSmall               13.1          15.1          +15.27%
      BenchmarkInterfaceSmall-6             13.0          2.54          -80.46%
      BenchmarkNew                          43.8          43.0          -1.83%
      BenchmarkNew-6                        40.5          6.67          -83.53%
      
      benchmark                             old MB/s     new MB/s     speedup
      BenchmarkCallArgCopy/size=128         400.24       395.15       0.99x
      BenchmarkCallArgCopy/size=128-6       388.74       1557.76      4.01x
      BenchmarkCallArgCopy/size=256         722.44       762.44       1.06x
      BenchmarkCallArgCopy/size=256-6       751.98       3003.83      3.99x
      BenchmarkCallArgCopy/size=1024        2733.22      1455.50      0.53x
      BenchmarkCallArgCopy/size=1024-6      2706.40      10687.53     3.95x
      BenchmarkCallArgCopy/size=4096        6523.32      6488.25      0.99x
      BenchmarkCallArgCopy/size=4096-6      6363.85      34003.09     5.34x
      BenchmarkCallArgCopy/size=65536       6239.88      6444.46      1.03x
      BenchmarkCallArgCopy/size=65536-6     6363.83      29255.26     4.60x
      
      benchmark           old allocs     new allocs     delta
      BenchmarkCall       0              0              +0.00%
      BenchmarkCall-6     0              0              +0.00%
      
      benchmark           old bytes     new bytes     delta
      BenchmarkCall       0             0             +0.00%
      BenchmarkCall-6     0             0             +0.00%
      
      updates #17973
      updates #18177
      
      Change-Id: If70c5c742e8d1b138347f4963ad7cff38fffc018
      Reviewed-on: https://go-review.googlesource.com/36831
      Run-TryBot: Bryan Mills <bcmills@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      f5f5a00b
    • Bryan C. Mills's avatar
      encoding/gob: parallelize Encode/Decode benchmarks · 3058b1f5
      Bryan C. Mills authored
      Results remain comparable with the non-parallel version with -cpu=1:
      
      benchmark                              old ns/op     new ns/op     delta
      BenchmarkEndToEndPipe                  6200          6171          -0.47%
      BenchmarkEndToEndPipe-6                1073          1024          -4.57%
      BenchmarkEndToEndByteBuffer            2925          2664          -8.92%
      BenchmarkEndToEndByteBuffer-6          516           560           +8.53%
      BenchmarkEndToEndSliceByteBuffer       231683        237450        +2.49%
      BenchmarkEndToEndSliceByteBuffer-6     59080         59452         +0.63%
      BenchmarkEncodeComplex128Slice         67541         66003         -2.28%
      BenchmarkEncodeComplex128Slice-6       72740         11316         -84.44%
      BenchmarkEncodeFloat64Slice            25769         27899         +8.27%
      BenchmarkEncodeFloat64Slice-6          26655         4557          -82.90%
      BenchmarkEncodeInt32Slice              18685         18845         +0.86%
      BenchmarkEncodeInt32Slice-6            18389         3462          -81.17%
      BenchmarkEncodeStringSlice             19089         19354         +1.39%
      BenchmarkEncodeStringSlice-6           20155         3237          -83.94%
      BenchmarkEncodeInterfaceSlice          659601        677129        +2.66%
      BenchmarkEncodeInterfaceSlice-6        640974        251621        -60.74%
      BenchmarkDecodeComplex128Slice         117130        129955        +10.95%
      BenchmarkDecodeComplex128Slice-6       155447        24924         -83.97%
      BenchmarkDecodeFloat64Slice            67695         68776         +1.60%
      BenchmarkDecodeFloat64Slice-6          82966         15225         -81.65%
      BenchmarkDecodeInt32Slice              63102         62733         -0.58%
      BenchmarkDecodeInt32Slice-6            77857         13003         -83.30%
      BenchmarkDecodeStringSlice             130240        129562        -0.52%
      BenchmarkDecodeStringSlice-6           165500        31507         -80.96%
      BenchmarkDecodeInterfaceSlice          937637        1060835       +13.14%
      BenchmarkDecodeInterfaceSlice-6        973495        270613        -72.20%
      
      updates #18177
      
      Change-Id: Ib3579010faa70827d5cbd02a826dbbb66ca13eb7
      Reviewed-on: https://go-review.googlesource.com/36722
      Run-TryBot: Bryan Mills <bcmills@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      3058b1f5
    • Bryan C. Mills's avatar
      encoding/xml: parallelize benchmarks · 9d37d4c8
      Bryan C. Mills authored
      Results remain comparable with the non-parallel version with -cpu=1:
      benchmark                old ns/op     new ns/op     delta
      BenchmarkMarshal         31220         28618         -8.33%
      BenchmarkMarshal-6       37181         7658          -79.40%
      BenchmarkUnmarshal       81837         83522         +2.06%
      BenchmarkUnmarshal-6     96339         18244         -81.06%
      
      benchmark                old allocs     new allocs     delta
      BenchmarkMarshal         23             23             +0.00%
      BenchmarkMarshal-6       23             23             +0.00%
      BenchmarkUnmarshal       189            189            +0.00%
      BenchmarkUnmarshal-6     189            189            +0.00%
      
      benchmark                old bytes     new bytes     delta
      BenchmarkMarshal         5776          5776          +0.00%
      BenchmarkMarshal-6       5776          5776          +0.00%
      BenchmarkUnmarshal       8576          8576          +0.00%
      BenchmarkUnmarshal-6     8576          8576          +0.00%
      
      updates #18177
      
      Change-Id: I7e7055a11d18896bd54d7d773f2ec64767cdb4c8
      Reviewed-on: https://go-review.googlesource.com/36810
      Run-TryBot: Bryan Mills <bcmills@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      9d37d4c8
    • Bryan C. Mills's avatar
      sync: import Map from x/sync/syncmap · 959025c0
      Bryan C. Mills authored
      This is a direct port of the version from
      commit a60ad46e0ed33d02e09bda439efaf9c9727dbc6c
      (https://go-review.googlesource.com/c/37342/).
      
      updates #17973
      updates #18177
      
      Change-Id: I63fa5ef6951b1edd39f84927d1181a4df9b15385
      Reviewed-on: https://go-review.googlesource.com/36617Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      959025c0
    • Josh Bleecher Snyder's avatar
      cmd/compile: minor cleanup · e1a7db7f
      Josh Bleecher Snyder authored
      Follow-up to review comments on CL 41797.
      
      Mask the input to set2 and set3, so that at the very least,
      we won't corrupt the rest of the flags in case of a bad input.
      It also seems more semantically appropriate.
      
      Do minor cleanup in addrescapes. I started on larger cleanup,
      but it wasn't clear that it was an improvement.
      
      Add warning comments and sanity checks to Initorder and Class constants,
      to attempt to prevent them from overflowing their allotted flag bits.
      
      Passes toolstash-check.
      
      Change-Id: I57b9661ba36f56406aa7a1d8da9b7c70338f9119
      Reviewed-on: https://go-review.googlesource.com/41817
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      e1a7db7f
    • Lynn Boger's avatar
      cmd/internal/obj/ppc64: use MOVDU to update stack reg for leaf functions where possible · 6910e108
      Lynn Boger authored
      When the stack register is decremented to acquire stack space at
      the beginning of a function, a MOVDU should be used so it is done
      atomically, unless the size of the stack frame is too large for
      that instruction.  The code to determine whether to use MOVDU
      or MOVD was checking if the function was a leaf and always generating MOVD
      when it was.  The choice of MOVD vs. MOVDU should only depend on the stack
      frame size.  This fixes that problem.
      
      Change-Id: I0e49c79036f1e8f7584179e1442b938fc6da085f
      Reviewed-on: https://go-review.googlesource.com/41813Reviewed-by: 's avatarMichael Munday <munday@ca.ibm.com>
      6910e108
    • Josh Bleecher Snyder's avatar
      cmd/compile: move Node.Class to flags · 386765af
      Josh Bleecher Snyder authored
      Put it at position zero, since it is fairly hot.
      
      This shrinks gc.Node into a smaller size class on 64 bit systems.
      
      name        old time/op       new time/op       delta
      Template          193ms ± 5%        192ms ± 3%    ~     (p=0.353 n=94+93)
      Unicode          86.1ms ± 5%       85.0ms ± 4%  -1.23%  (p=0.000 n=95+98)
      GoTypes           546ms ± 3%        544ms ± 4%  -0.40%  (p=0.007 n=94+97)
      Compiler          2.56s ± 3%        2.54s ± 3%  -0.67%  (p=0.000 n=99+97)
      SSA               5.13s ± 2%        5.10s ± 3%  -0.55%  (p=0.000 n=94+98)
      Flate             122ms ± 6%        121ms ± 4%  -0.75%  (p=0.002 n=97+95)
      GoParser          144ms ± 5%        144ms ± 4%    ~     (p=0.298 n=98+97)
      Reflect           348ms ± 4%        349ms ± 4%    ~     (p=0.350 n=98+97)
      Tar               105ms ± 5%        104ms ± 5%    ~     (p=0.154 n=96+98)
      XML               200ms ± 5%        198ms ± 4%  -0.71%  (p=0.015 n=97+98)
      [Geo mean]        330ms             328ms       -0.52%
      
      name        old user-time/op  new user-time/op  delta
      Template          229ms ±11%        224ms ± 7%  -2.16%  (p=0.001 n=100+87)
      Unicode           109ms ± 5%        109ms ± 6%    ~     (p=0.897 n=96+91)
      GoTypes           712ms ± 4%        709ms ± 4%    ~     (p=0.085 n=96+98)
      Compiler          3.41s ± 3%        3.36s ± 3%  -1.43%  (p=0.000 n=98+98)
      SSA               7.46s ± 3%        7.31s ± 3%  -2.02%  (p=0.000 n=100+99)
      Flate             145ms ± 6%        143ms ± 6%  -1.11%  (p=0.001 n=99+97)
      GoParser          177ms ± 5%        176ms ± 5%  -0.78%  (p=0.018 n=95+95)
      Reflect           432ms ± 7%        435ms ± 9%    ~     (p=0.296 n=100+100)
      Tar               121ms ± 7%        121ms ± 5%    ~     (p=0.072 n=100+95)
      XML               241ms ± 4%        239ms ± 5%    ~     (p=0.085 n=97+99)
      [Geo mean]        413ms             410ms       -0.73%
      
      name        old alloc/op      new alloc/op      delta
      Template         38.4MB ± 0%       37.7MB ± 0%  -1.85%  (p=0.008 n=5+5)
      Unicode          30.1MB ± 0%       28.8MB ± 0%  -4.09%  (p=0.008 n=5+5)
      GoTypes           112MB ± 0%        110MB ± 0%  -1.69%  (p=0.008 n=5+5)
      Compiler          470MB ± 0%        461MB ± 0%  -1.91%  (p=0.008 n=5+5)
      SSA              1.13GB ± 0%       1.11GB ± 0%  -1.70%  (p=0.008 n=5+5)
      Flate            25.0MB ± 0%       24.6MB ± 0%  -1.67%  (p=0.008 n=5+5)
      GoParser         31.6MB ± 0%       31.1MB ± 0%  -1.66%  (p=0.008 n=5+5)
      Reflect          77.1MB ± 0%       75.8MB ± 0%  -1.69%  (p=0.008 n=5+5)
      Tar              26.3MB ± 0%       25.7MB ± 0%  -2.06%  (p=0.008 n=5+5)
      XML              41.9MB ± 0%       41.1MB ± 0%  -1.93%  (p=0.008 n=5+5)
      [Geo mean]       73.5MB            72.0MB       -2.03%
      
      name        old allocs/op     new allocs/op     delta
      Template           383k ± 0%         383k ± 0%    ~     (p=0.690 n=5+5)
      Unicode            343k ± 0%         343k ± 0%    ~     (p=0.841 n=5+5)
      GoTypes           1.16M ± 0%        1.16M ± 0%    ~     (p=0.310 n=5+5)
      Compiler          4.43M ± 0%        4.42M ± 0%  -0.17%  (p=0.008 n=5+5)
      SSA               9.85M ± 0%        9.85M ± 0%    ~     (p=0.310 n=5+5)
      Flate              236k ± 0%         236k ± 1%    ~     (p=0.841 n=5+5)
      GoParser           320k ± 0%         320k ± 0%    ~     (p=0.421 n=5+5)
      Reflect            988k ± 0%         987k ± 0%    ~     (p=0.690 n=5+5)
      Tar                252k ± 0%         251k ± 0%    ~     (p=0.095 n=5+5)
      XML                399k ± 0%         399k ± 0%    ~     (p=1.000 n=5+5)
      [Geo mean]         741k              740k       -0.07%
      
      Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c
      Reviewed-on: https://go-review.googlesource.com/41797
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: 's avatarMarvin Stenger <marvin.stenger94@gmail.com>
      386765af
    • Justin Nuß's avatar
      encoding/csv: add option to reuse slices returned by Read · 2181653b
      Justin Nuß authored
      In many cases the records returned by Reader.Read will only be used between calls
      to Read and become garbage once a new record is read. In this case, instead of
      allocating a new slice on each call to Read, we can reuse the last allocated slice
      for successive calls to avoid unnecessary allocations.
      
      This change adds a new field ReuseRecord to the Reader struct to enable this reuse.
      
      ReuseRecord is false by default to avoid breaking existing code which dependss on
      the current behaviour.
      
      I also added 4 new benchmarks, corresponding to the existing Read benchmarks, which
      set ReuseRecord to true.
      
      Benchstat on my local machine (old is ReuseRecord = false, new is ReuseRecord = true)
      
      name                          old time/op    new time/op    delta
      Read-8                          2.75µs ± 2%    1.88µs ± 1%  -31.52%  (p=0.000 n=14+15)
      ReadWithFieldsPerRecord-8       2.75µs ± 0%    1.89µs ± 1%  -31.43%  (p=0.000 n=13+13)
      ReadWithoutFieldsPerRecord-8    2.77µs ± 1%    1.88µs ± 1%  -32.06%  (p=0.000 n=15+15)
      ReadLargeFields-8               55.4µs ± 1%    54.2µs ± 0%   -2.07%  (p=0.000 n=15+14)
      
      name                          old alloc/op   new alloc/op   delta
      Read-8                            664B ± 0%       24B ± 0%  -96.39%  (p=0.000 n=15+15)
      ReadWithFieldsPerRecord-8         664B ± 0%       24B ± 0%  -96.39%  (p=0.000 n=15+15)
      ReadWithoutFieldsPerRecord-8      664B ± 0%       24B ± 0%  -96.39%  (p=0.000 n=15+15)
      ReadLargeFields-8               3.94kB ± 0%    2.98kB ± 0%  -24.39%  (p=0.000 n=15+15)
      
      name                          old allocs/op  new allocs/op  delta
      Read-8                            18.0 ± 0%       8.0 ± 0%  -55.56%  (p=0.000 n=15+15)
      ReadWithFieldsPerRecord-8         18.0 ± 0%       8.0 ± 0%  -55.56%  (p=0.000 n=15+15)
      ReadWithoutFieldsPerRecord-8      18.0 ± 0%       8.0 ± 0%  -55.56%  (p=0.000 n=15+15)
      ReadLargeFields-8                 24.0 ± 0%      12.0 ± 0%  -50.00%  (p=0.000 n=15+15)
      
      Fixes #19721
      
      Change-Id: I79b14128bb9bb3465f53f40f93b1b528a9da6f58
      Reviewed-on: https://go-review.googlesource.com/41730Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      2181653b
    • Brandon Bennett's avatar
      testing: add argument to list tests, benchmarks, and examples · ba8ff87d
      Brandon Bennett authored
      Some large testing/build systems require some form of test discovery before
      running tests.  This usually allows for analytics, history, and stats on a per
      tests basis.  Typically these systems are meant used in multi-language
      environments and the original source code is not known or available.
      
      This adds a -test.list option which takes a regular expression as an
      argument. Any tests, benchmarks, or examples that match that regular
      expression will be printed, one per line, to stdout and then the program
      will exit.
      
      Since subtests are named/discovered at run time this will only show
      top-level tests names and is a known limitation.
      
      Fixes #17209
      
      Change-Id: I7e607f5f4f084d623a1cae88a1f70e7d92b7f13e
      Reviewed-on: https://go-review.googlesource.com/41195Reviewed-by: 's avatarMarcel van Lohuizen <mpvl@golang.org>
      Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      ba8ff87d
    • Russ Cox's avatar
      context: define behavior for Err before Done is closed · 6e2c4bc0
      Russ Cox authored
      The Context definition to date has not defined what Err returns
      before the Done channel is closed. Define that it returns nil,
      as most implementations do.
      
      All the standard context implementations (those in package
      context and in golang.org/x/net/context) return Err() == nil
      when Done is not yet closed. However, some non-standard
      implementations may exist that return Err() != nil in this case,
      as permitted by the Context definition before this date.
      Call these "errorful implementations".
      
      Because all the standard context implementations ensure that
      Err() == nil when Done is not yet closed, clients now exist that
      assume Err() != nil implies Done is closed and use calling Err
      as a quick short-circuit check instead of first doing a non-blocking
      receive from Done and then, if that succeeds, needing to call Err.
      This assumption holds for all the standard Context implementations,
      so these clients work fine in practice, even though they are making
      unwarranted assumptions about the Context implementations.
      Call these "technically incorrect clients".
      
      If a technically incorrect client encounters an errorful
      implementation, the client misbehaves. Because there are few
      errorful implementations, over time we expect that many clients
      will end up being technically incorrect without realizing it,
      leading to latent, subtle bugs. If we want to eliminate these
      latent, subtle bugs, there are two ways to do this:
      either make errorful implementations more common
      (exposing the client bugs more often) or redefine the Context
      interface so that the clients are not buggy after all.
      
      If we make errorful implementations more common, such
      as by changing the standard context implementations to
      return ErrNotDone instead of nil when Err is called before
      Done is closed, this will shake out essentially all of the
      technically incorrect clients, forcing people to find and fix
      those clients during the transition to Go 1.9.
      Technically this is allowed by the compatibility policy,
      but we expect there are many pieces of code assuming
      that Err() != nil means done, so updating will cause real pain.
      
      If instead we disallow errorful implementations, then they
      will need to be fixed as they are discovered, but the fault
      will officially lie in the errorful Context implementation,
      not in the clients. Technically this is disallowed by the compatibility
      policy, because these errorful implementations were "correct"
      in earlier versions of Go, except that they didn't work with
      common client code. We expect there are hardly any errorful
      implementations, so that disallowing them will be less disruptive
      and more in the spirit of the compatibility policy.
      
      This CL takes the path of expected least disruption,
      narrowing the Context interface semantics and potentially
      invalidating existing implementations. A survey of the
      go-corpus v0.01 turned up only five Context implementations,
      all trivial and none errorful (details in #19856).
      We are aware of one early Context implementation inside Google,
      from before even golang.org/x/net/context existed,
      that is errorful. The misbehavior of an open-source library
      when passed such a context is what prompted #19856.
      That context implementation would be disallowed after this CL
      and would need to be corrected. We are aware of no other
      affected context implementations. On the other hand, a survey
      of the go-corpus v0.01 turned up many instances of client
      code assuming that Err() == nil implies not done yet
      (details also in #19856). On balance, narrowing Context and
      thereby allowing Err() == nil checks should invalidate significantly
      less code than a push to flush out all the currently technically
      incorrect Err() == nil checks.
      
      If release feedback shows that we're wrong about this balance,
      we can roll back this CL and try again in Go 1.10.
      
      Fixes #19856.
      
      Change-Id: Id45d126fac70e1fcc42d73e5a87ca1b66935b831
      Reviewed-on: https://go-review.googlesource.com/40291
      Run-TryBot: Russ Cox <rsc@golang.org>
      Reviewed-by: 's avatarSameer Ajmani <sameer@golang.org>
      6e2c4bc0
    • David du Colombier's avatar
      net: fix close on closed listener on Plan 9 · 8a4087ae
      David du Colombier authored
      Since close errors have been cleaned up in CL 39997,
      TestCloseError is failing on Plan 9, because
      TCPListener.Close didn't check that the listener
      has already been closed before writing the "hangup"
      string to the listener control file.
      
      This change fixes TCPListener.Close on Plan 9,
      by closing poll.FD before writing the "hangup"
      string.
      
      Fixes #20128.
      
      Change-Id: I13862b23a9055dd1be658acef7066707d98c591f
      Reviewed-on: https://go-review.googlesource.com/41850
      Run-TryBot: David du Colombier <0intro@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      8a4087ae
    • Fangming.Fang's avatar
      cmd/internal: fix bug getting wrong indicator in DRconv() · aecf73fc
      Fangming.Fang authored
      Change-Id: I251ae497b0ab237d4b3fe98e397052394142d437
      Reviewed-on: https://go-review.googlesource.com/41653Reviewed-by: 's avatarCherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      aecf73fc
    • Mike Strosaker's avatar
      crypto/sha256,crypto/sha512: improve performance for sha{256,512}.block on ppc64le · 48582e15
      Mike Strosaker authored
      This updates sha256.block and sha512.block to use vector instructions.  While
      each round must still be performed independently, this allows for the use of
      the vshasigma{w,d} crypto acceleration instructions.
      
      For crypto/sha256:
      
      benchmark               old ns/op     new ns/op     delta
      BenchmarkHash8Bytes     570           300           -47.37%
      BenchmarkHash1K         7529          3018          -59.91%
      BenchmarkHash8K         55308         21938         -60.33%
      
      benchmark               old MB/s     new MB/s     speedup
      BenchmarkHash8Bytes     14.01        26.58        1.90x
      BenchmarkHash1K         136.00       339.23       2.49x
      BenchmarkHash8K         148.11       373.40       2.52x
      
      For crypto/sha512:
      
      benchmark               old ns/op     new ns/op     delta
      BenchmarkHash8Bytes     725           394           -45.66%
      BenchmarkHash1K         5062          2107          -58.38%
      BenchmarkHash8K         34711         13918         -59.90%
      
      benchmark               old MB/s     new MB/s     speedup
      BenchmarkHash8Bytes     11.03        20.29        1.84x
      BenchmarkHash1K         202.28       485.84       2.40x
      BenchmarkHash8K         236.00       588.56       2.49x
      
      Fixes #20069
      
      Change-Id: I28bffe6e9eb484a83a004116fce84acb4942abca
      Reviewed-on: https://go-review.googlesource.com/41391
      Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarCarlos Eduardo Seo <cseo@linux.vnet.ibm.com>
      Reviewed-by: 's avatarDavid Chase <drchase@google.com>
      Reviewed-by: 's avatarLynn Boger <laboger@linux.vnet.ibm.com>
      48582e15
    • Aliaksandr Valialkin's avatar
      runtime: align mcentral by cache line size · 259d6099
      Aliaksandr Valialkin authored
      This may improve perormance during concurrent access
      to mheap.central array from multiple CPU cores.
      
      Change-Id: I8f48dd2e72aa62e9c32de07ae60fe552d8642782
      Reviewed-on: https://go-review.googlesource.com/41550Reviewed-by: 's avatarAustin Clements <austin@google.com>
      Reviewed-by: 's avatarIan Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      259d6099
    • Emmanuel Odeke's avatar
      net: defer file.close() + minor style cleanup · c433c374
      Emmanuel Odeke authored
      Moved the relevant file.close() usages close to after the
      file opens and put them in defer statements, so that readers
      don't have to think too much as to where the file is
      being closed.
      
      Change-Id: Ic4190b02ea2f5ac281b9ba104e0023e9f87ca8c7
      Reviewed-on: https://go-review.googlesource.com/41796Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      c433c374
    • Ian Lance Taylor's avatar
      os: consistently return ErrClosed for closed file · e3d7ec00
      Ian Lance Taylor authored
      Catch all the cases where a file operation might return ErrFileClosing,
      and convert to ErrClosed. Use a new method for the conversion, which
      permits us to remove some KeepAlive calls.
      
      Change-Id: I584178f297efe6cb86f3090b2341091b412f1041
      Reviewed-on: https://go-review.googlesource.com/41793
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      e3d7ec00
    • Josh Bleecher Snyder's avatar
      cmd/compile: move Node.Typecheck to flags · 502a03ff
      Josh Bleecher Snyder authored
      Change-Id: Id5aa4a1499068bf2d3497b21d794f970b7e47fdf
      Reviewed-on: https://go-review.googlesource.com/41795
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      502a03ff
    • Josh Bleecher Snyder's avatar
      cmd/compile: move Node.Initorder to flags · e2560ace
      Josh Bleecher Snyder authored
      Grand savings: 6 bits.
      
      Change-Id: I364be54cc41534689e01672ed0fe2c10a560d3d4
      Reviewed-on: https://go-review.googlesource.com/41794
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      e2560ace
    • Josh Bleecher Snyder's avatar
      cmd/compile: convert Node.Embedded into a flag · af7da9a5
      Josh Bleecher Snyder authored
      Change-Id: I30c59ba84dcacc3de39c42f94484b47bb7c36eba
      Reviewed-on: https://go-review.googlesource.com/41792
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      af7da9a5
    • Todd Neal's avatar
      plugin: resolve plugin import path issue · 7a92395d
      Todd Neal authored
      Resolve import paths to get plugin symbol prefixes.
      
      Fixes #19534
      
      Change-Id: Ic25d83e72465ba8f6be0337218a1627b5dc702dc
      Reviewed-on: https://go-review.googlesource.com/40994
      Run-TryBot: Todd Neal <todd@tneal.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarDavid Crawshaw <crawshaw@golang.org>
      7a92395d
    • Michael Fraenkel's avatar
      net/http: make LocalAddrContext handle wildcard interface · 819d1cce
      Michael Fraenkel authored
      The LocalAddrContext should have the network address of the actual
      interface.
      
      Fixes #18686
      
      Change-Id: I9c401eda312f3a0e7e65b013af827aeeef3b4d3d
      Reviewed-on: https://go-review.googlesource.com/35490
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      819d1cce
    • Josh Bleecher Snyder's avatar
      cmd/compile: move Node.Walkdef into flags · d2863996
      Josh Bleecher Snyder authored
      Node.Walkdef is 0, 1, or 2, so it only requires two bits.
      Add support for 2-bit values to bitset,
      and use it for Node.Walkdef.
      
      Class, Embedded, Typecheck, and Initorder will follow suit
      in subsequent CLs.
      
      The multi-bit flags will go at the beginning,
      since that generates (marginally) more efficient code.
      
      Change-Id: Id6e2e66e437f10aaa05b8a6e1652efb327d06128
      Reviewed-on: https://go-review.googlesource.com/41791
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      d2863996
    • Josh Bleecher Snyder's avatar
      cmd/compile: delete bitset16 · 804784c8
      Josh Bleecher Snyder authored
      It is no longer used.
      
      Change-Id: Id64f387867a0503d13eaecda12e6606682c24595
      Reviewed-on: https://go-review.googlesource.com/41790
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      804784c8