1. 12 May, 2017 6 commits
  2. 11 May, 2017 7 commits
    • Keith Randall's avatar
      cmd/compile: fix store chain in schedule pass · 978af9c2
      Keith Randall authored
      Tuple ops are weird. They are essentially a pair of ops,
      one which consumes a mem and one which generates a mem (the Select1).
      The schedule pass didn't handle these quite right.
      
      Fix the scheduler to include both parts of the paired op in
      the store chain. That makes sure that loads are correctly ordered
      with respect to the first of the pair.
      
      Add a check for the ssacheck builder, that there is only one
      live store at a time. I thought we already had such a check, but
      apparently not...
      
      Fixes #20335
      
      Change-Id: I59eb3446a329100af38d22820b1ca2190ca46a78
      Reviewed-on: https://go-review.googlesource.com/43294
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarCherry Zhang <cherryyz@google.com>
      978af9c2
    • Josh Bleecher Snyder's avatar
      cmd/compile: restore panic deduplication · e5bb5e39
      Josh Bleecher Snyder authored
      The switch to detailed position information broke
      the removal of duplicate panics on the same line.
      Restore it.
      
      Neutral compiler performance impact:
      
      name        old alloc/op      new alloc/op      delta
      Template         38.8MB ± 0%       38.8MB ± 0%    ~     (p=0.690 n=5+5)
      Unicode          28.7MB ± 0%       28.7MB ± 0%  +0.13%  (p=0.032 n=5+5)
      GoTypes           109MB ± 0%        109MB ± 0%    ~     (p=1.000 n=5+5)
      Compiler          457MB ± 0%        457MB ± 0%    ~     (p=0.151 n=5+5)
      SSA              1.09GB ± 0%       1.10GB ± 0%  +0.17%  (p=0.008 n=5+5)
      Flate            24.6MB ± 0%       24.5MB ± 0%  -0.35%  (p=0.008 n=5+5)
      GoParser         30.9MB ± 0%       31.0MB ± 0%    ~     (p=0.421 n=5+5)
      Reflect          73.4MB ± 0%       73.4MB ± 0%    ~     (p=0.056 n=5+5)
      Tar              25.6MB ± 0%       25.5MB ± 0%  -0.61%  (p=0.008 n=5+5)
      XML              40.9MB ± 0%       40.9MB ± 0%    ~     (p=0.841 n=5+5)
      [Geo mean]       71.6MB            71.6MB       -0.07%
      
      name        old allocs/op     new allocs/op     delta
      Template           394k ± 0%         395k ± 1%    ~     (p=0.151 n=5+5)
      Unicode            343k ± 0%         344k ± 0%  +0.38%  (p=0.032 n=5+5)
      GoTypes           1.16M ± 0%        1.16M ± 0%    ~     (p=1.000 n=5+5)
      Compiler          4.41M ± 0%        4.42M ± 0%    ~     (p=0.151 n=5+5)
      SSA               9.79M ± 0%        9.79M ± 0%    ~     (p=0.690 n=5+5)
      Flate              238k ± 1%         238k ± 0%    ~     (p=0.151 n=5+5)
      GoParser           321k ± 0%         321k ± 1%    ~     (p=0.548 n=5+5)
      Reflect            958k ± 0%         957k ± 0%    ~     (p=0.841 n=5+5)
      Tar                252k ± 0%         252k ± 1%    ~     (p=0.151 n=5+5)
      XML                401k ± 0%         400k ± 0%    ~     (p=1.000 n=5+5)
      [Geo mean]         741k              742k       +0.08%
      
      
      Reduces object files a little bit:
      
      name        old object-bytes  new object-bytes  delta
      Template           386k ± 0%         386k ± 0%  -0.04%  (p=0.008 n=5+5)
      Unicode            202k ± 0%         202k ± 0%    ~     (all equal)
      GoTypes           1.16M ± 0%        1.16M ± 0%  -0.04%  (p=0.008 n=5+5)
      Compiler          3.91M ± 0%        3.91M ± 0%  -0.08%  (p=0.008 n=5+5)
      SSA               7.91M ± 0%        7.91M ± 0%  -0.04%  (p=0.008 n=5+5)
      Flate              228k ± 0%         227k ± 0%  -0.28%  (p=0.008 n=5+5)
      GoParser           283k ± 0%         283k ± 0%  -0.01%  (p=0.008 n=5+5)
      Reflect            952k ± 0%         951k ± 0%  -0.03%  (p=0.008 n=5+5)
      Tar                188k ± 0%         187k ± 0%  -0.09%  (p=0.008 n=5+5)
      XML                406k ± 0%         406k ± 0%  -0.04%  (p=0.008 n=5+5)
      [Geo mean]         648k              648k       -0.06%
      
      
      This was discovered in the context for the Fannkuch benchmark.
      It shrinks the number of panicindex calls in that function
      from 13 back to 9, their 1.8.1 level.
      
      It shrinks the function text a bit, from 829 to 801 bytes.
      It slows down execution a little, presumably due to alignment (?).
      
      name          old time/op  new time/op  delta
      Fannkuch11-8   2.68s ± 2%   2.74s ± 1%  +2.09%  (p=0.000 n=19+20)
      
      After this CL, 1.8.1 and tip are identical:
      
      name          old time/op  new time/op  delta
      Fannkuch11-8   2.74s ± 2%   2.74s ± 1%   ~     (p=0.301 n=20+20)
      
      Fixes #20332
      
      Change-Id: I2aeacc3e8cf2ac1ff10f36c572a27856f4f8f7c9
      Reviewed-on: https://go-review.googlesource.com/43291
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      e5bb5e39
    • Josh Bleecher Snyder's avatar
      cmd/compile: don't use statictmps for SSA-able composite literals · ee69c217
      Josh Bleecher Snyder authored
      The writebarrier test has to change.
      Now that T23 composite literals are passed to the backend,
      they get SSA'd, so writes to their fields are treated separately,
      so the relevant part of the first write to t23 is now a dead store.
      Preserve the intent of the test by splitting it up into two functions.
      
      Reduces code size a bit:
      
      name        old object-bytes  new object-bytes  delta
      Template           386k ± 0%         386k ± 0%    ~     (all equal)
      Unicode            202k ± 0%         202k ± 0%    ~     (all equal)
      GoTypes           1.16M ± 0%        1.16M ± 0%    ~     (all equal)
      Compiler          3.92M ± 0%        3.91M ± 0%  -0.19%  (p=0.008 n=5+5)
      SSA               7.91M ± 0%        7.91M ± 0%    ~     (all equal)
      Flate              228k ± 0%         228k ± 0%  -0.05%  (p=0.008 n=5+5)
      GoParser           283k ± 0%         283k ± 0%    ~     (all equal)
      Reflect            952k ± 0%         952k ± 0%  -0.06%  (p=0.008 n=5+5)
      Tar                188k ± 0%         188k ± 0%  -0.09%  (p=0.008 n=5+5)
      XML                406k ± 0%         406k ± 0%  -0.02%  (p=0.008 n=5+5)
      [Geo mean]         649k              648k       -0.04%
      
      Fixes #18872
      
      Change-Id: Ifeed0f71f13849732999aa731cc2bf40c0f0e32a
      Reviewed-on: https://go-review.googlesource.com/43154
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarDavid Chase <drchase@google.com>
      Reviewed-by: 's avatarCherry Zhang <cherryyz@google.com>
      ee69c217
    • Josh Bleecher Snyder's avatar
      cmd/compile: avoid checkwidth of [...] arrays · dccc653a
      Josh Bleecher Snyder authored
      Fixes #20333
      
      Change-Id: I0653cc859076f146d8ea8f5bd55cb22b0b8d987f
      Reviewed-on: https://go-review.googlesource.com/43290
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: 's avatarJoe Tsai <thebrokentoaster@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      dccc653a
    • Tom Bergan's avatar
      net/http: for http2, use the priority write scheduler by default · 8f366681
      Tom Bergan authored
      Updates #18318
      
      Change-Id: Ibd4ebc7708abf87eded8da9661378b5777b8a400
      Reviewed-on: https://go-review.googlesource.com/43231
      Run-TryBot: Tom Bergan <tombergan@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      8f366681
    • Ben Shi's avatar
      cmd/internal/obj: continue to optimize ARM's constant pool · 6897030f
      Ben Shi authored
      Both Keith's https://go-review.googlesource.com/c/41612/ and
      and Ben's https://go-review.googlesource.com/c/41679/ optimized ARM's
      constant pool. But neither was complete.
      
      First, BIC was forgotten.
      1. "BIC $0xff00ff00, Reg" can be optimized to
         "BIC $0xff000000, Reg
          BIC $0x0000ff00, Reg"
      2. "BIC $0xffff00ff, Reg" can be optimized to
         "AND $0x0000ff00, Reg"
      3. "AND $0xffff00ff, Reg" can be optimized to
         "BIC $0x0000ff00, Reg"
      
      Second, break a non-ARMImmRot to the subtraction of two ARMImmRots was
      left as TODO.
      1. "ADD $0x00fffff0, Reg" can be optimized to
         "ADD $0x01000000, Reg
          SUB $0x00000010, Reg"
      2. "SUB $0x00fffff0, Reg" can be optimized to
         "SUB $0x01000000, Reg
          ADD $0x00000010, Reg"
      
      This patch fixes them and issue #19844.
      
      The go1 benchmark shows improvements.
      
      name                     old time/op    new time/op    delta
      BinaryTree17-4              41.4s ± 1%     41.7s ± 1%  +0.54%  (p=0.000 n=50+49)
      Fannkuch11-4                24.7s ± 1%     25.1s ± 0%  +1.70%  (p=0.000 n=50+49)
      FmtFprintfEmpty-4           853ns ± 1%     852ns ± 1%    ~     (p=0.833 n=50+50)
      FmtFprintfString-4         1.33µs ± 1%    1.33µs ± 1%    ~     (p=0.163 n=50+50)
      FmtFprintfInt-4            1.40µs ± 1%    1.40µs ± 0%    ~     (p=0.293 n=50+35)
      FmtFprintfIntInt-4         2.09µs ± 1%    2.08µs ± 1%  -0.39%  (p=0.000 n=50+49)
      FmtFprintfPrefixedInt-4    2.43µs ± 1%    2.43µs ± 1%    ~     (p=0.552 n=50+50)
      FmtFprintfFloat-4          4.57µs ± 1%    4.42µs ± 1%  -3.18%  (p=0.000 n=50+50)
      FmtManyArgs-4              8.62µs ± 1%    8.52µs ± 0%  -1.08%  (p=0.000 n=50+50)
      GobDecode-4                 101ms ± 1%     101ms ± 2%  +0.45%  (p=0.001 n=49+49)
      GobEncode-4                90.7ms ± 1%    91.1ms ± 2%  +0.51%  (p=0.001 n=50+50)
      Gzip-4                      4.23s ± 1%     4.21s ± 1%  -0.62%  (p=0.000 n=50+50)
      Gunzip-4                    623ms ± 1%     619ms ± 0%  -0.63%  (p=0.000 n=50+42)
      HTTPClientServer-4          721µs ± 5%     683µs ± 3%  -5.25%  (p=0.000 n=50+47)
      JSONEncode-4                251ms ± 1%     253ms ± 1%  +0.54%  (p=0.000 n=49+50)
      JSONDecode-4                941ms ± 1%     944ms ± 1%  +0.30%  (p=0.001 n=49+50)
      Mandelbrot200-4            49.3ms ± 1%    49.3ms ± 0%    ~     (p=0.918 n=50+48)
      GoParse-4                  47.1ms ± 1%    47.2ms ± 1%  +0.18%  (p=0.025 n=50+50)
      RegexpMatchEasy0_32-4      1.23µs ± 1%    1.24µs ± 1%  +0.30%  (p=0.000 n=49+50)
      RegexpMatchEasy0_1K-4      7.74µs ± 7%    7.76µs ± 5%    ~     (p=0.888 n=50+50)
      RegexpMatchEasy1_32-4      1.32µs ± 1%    1.32µs ± 1%  +0.23%  (p=0.003 n=50+50)
      RegexpMatchEasy1_1K-4      10.6µs ± 2%    10.5µs ± 3%  -1.29%  (p=0.000 n=49+50)
      RegexpMatchMedium_32-4     2.19µs ± 1%    2.10µs ± 1%  -3.79%  (p=0.000 n=49+49)
      RegexpMatchMedium_1K-4      544µs ± 0%     545µs ± 0%    ~     (p=0.123 n=41+50)
      RegexpMatchHard_32-4       28.8µs ± 0%    28.8µs ± 1%    ~     (p=0.580 n=46+50)
      RegexpMatchHard_1K-4        863µs ± 1%     865µs ± 1%  +0.31%  (p=0.027 n=47+50)
      Revcomp-4                  82.2ms ± 2%    82.3ms ± 2%    ~     (p=0.894 n=48+49)
      Template-4                  1.06s ± 1%     1.04s ± 1%  -1.18%  (p=0.000 n=50+49)
      TimeParse-4                7.25µs ± 1%    7.35µs ± 0%  +1.48%  (p=0.000 n=50+50)
      TimeFormat-4               13.3µs ± 1%    13.2µs ± 1%  -0.13%  (p=0.007 n=50+50)
      [Geo mean]                  736µs          733µs       -0.37%
      
      name                     old speed      new speed      delta
      GobDecode-4              7.60MB/s ± 1%  7.56MB/s ± 2%  -0.46%  (p=0.001 n=49+49)
      GobEncode-4              8.47MB/s ± 1%  8.42MB/s ± 2%  -0.50%  (p=0.001 n=50+50)
      Gzip-4                   4.58MB/s ± 1%  4.61MB/s ± 1%  +0.59%  (p=0.000 n=50+50)
      Gunzip-4                 31.2MB/s ± 1%  31.4MB/s ± 0%  +0.63%  (p=0.000 n=50+42)
      JSONEncode-4             7.73MB/s ± 1%  7.69MB/s ± 1%  -0.53%  (p=0.000 n=49+50)
      JSONDecode-4             2.06MB/s ± 1%  2.06MB/s ± 1%    ~     (p=0.052 n=44+50)
      GoParse-4                1.23MB/s ± 0%  1.23MB/s ± 2%    ~     (p=0.526 n=26+50)
      RegexpMatchEasy0_32-4    25.9MB/s ± 1%  25.9MB/s ± 1%  -0.30%  (p=0.000 n=49+50)
      RegexpMatchEasy0_1K-4     132MB/s ± 7%   132MB/s ± 6%    ~     (p=0.885 n=50+50)
      RegexpMatchEasy1_32-4    24.2MB/s ± 1%  24.1MB/s ± 1%  -0.22%  (p=0.003 n=50+50)
      RegexpMatchEasy1_1K-4    96.4MB/s ± 2%  97.8MB/s ± 3%  +1.36%  (p=0.000 n=50+50)
      RegexpMatchMedium_32-4    460kB/s ± 0%   476kB/s ± 1%  +3.43%  (p=0.000 n=49+50)
      RegexpMatchMedium_1K-4   1.88MB/s ± 0%  1.88MB/s ± 0%    ~     (all equal)
      RegexpMatchHard_32-4     1.11MB/s ± 0%  1.11MB/s ± 1%  +0.34%  (p=0.000 n=45+50)
      RegexpMatchHard_1K-4     1.19MB/s ± 1%  1.18MB/s ± 1%  -0.34%  (p=0.033 n=50+50)
      Revcomp-4                30.9MB/s ± 2%  30.9MB/s ± 2%    ~     (p=0.894 n=48+49)
      Template-4               1.84MB/s ± 1%  1.86MB/s ± 2%  +1.19%  (p=0.000 n=48+50)
      [Geo mean]               6.63MB/s       6.65MB/s       +0.26%
      
      
      Fixes #19844.
      
      Change-Id: I5ad16cc0b29267bb4579aca3dcc10a0b8ade1aa4
      Reviewed-on: https://go-review.googlesource.com/42430
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarCherry Zhang <cherryyz@google.com>
      6897030f
    • Daniel Martí's avatar
      reflect: remove dead v.typ assignment · 19b05acd
      Daniel Martí authored
      v is not a pointer receiver, and v.typ isn't used in the lines below.
      The assignment is dead. Remove it.
      
      Keep the comment, as it refers to the whole case block and not just the
      removed line.
      
      Change-Id: Icb2d20c287d9a41bf620ebe5cdec764cd84178a7
      Reviewed-on: https://go-review.googlesource.com/43134
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      19b05acd
  3. 10 May, 2017 15 commits
  4. 09 May, 2017 12 commits
    • Josh Bleecher Snyder's avatar
      cmd/compile: allow OpVarXXX calls to be duplicated in writebarrier blocks · 94a017f3
      Josh Bleecher Snyder authored
      OpVarXXX Values don't generate instructions,
      so there's no reason not to duplicate them,
      and duplicating them generates better code
      (fewer branches).
      
      This requires changing the start/end accounting
      to correctly handle the case in which we have run
      of Values beginning with an OpVarXXX, e.g.
      OpVarDef, OpZeroWB, OpMoveWB.
      In that case, the sequence of values should begin
      at the OpZeroWB, not the OpVarDef.
      
      This also lays the groundwork for experimenting
      with allowing duplication of some scalar stores.
      
      Shrinks function text sizes a tiny amount:
      
      name        old object-bytes  new object-bytes  delta
      Template           381k ± 0%         381k ± 0%  -0.01%  (p=0.008 n=5+5)
      Unicode            203k ± 0%         203k ± 0%  -0.04%  (p=0.008 n=5+5)
      GoTypes           1.17M ± 0%        1.17M ± 0%  -0.01%  (p=0.008 n=5+5)
      SSA               8.24M ± 0%        8.24M ± 0%  -0.00%  (p=0.008 n=5+5)
      Flate              230k ± 0%         230k ± 0%    ~     (all equal)
      GoParser           286k ± 0%         286k ± 0%    ~     (all equal)
      Reflect           1.00M ± 0%        1.00M ± 0%    ~     (all equal)
      Tar                189k ± 0%         189k ± 0%    ~     (all equal)
      XML                415k ± 0%         415k ± 0%  -0.01%  (p=0.008 n=5+5)
      
      Updates #19838
      
      Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27
      Reviewed-on: https://go-review.googlesource.com/42011
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarCherry Zhang <cherryyz@google.com>
      94a017f3
    • Ian Lance Taylor's avatar
      cmd/internal/obj, cmd/link: fix st_other field on PPC64 · 5331e7e9
      Ian Lance Taylor authored
      In PPC64 ELF files, the st_other field indicates the number of
      prologue instructions between the global and local entry points.
      We add the instructions in the compiler and assembler if -shared is used.
      We were assuming that the instructions were present when building a
      c-archive or PIE or doing dynamic linking, on the assumption that those
      are the cases where the go tool would be building with -shared.
      That assumption fails when using some other tool, such as Bazel,
      that does not necessarily use -shared in exactly the same way.
      
      This CL records in the object file whether a symbol was compiled
      with -shared (this will be the same for all symbols in a given compilation)
      and uses that information when setting the st_other field.
      
      Fixes #20290.
      
      Change-Id: Ib2b77e16aef38824871102e3c244fcf04a86c6ea
      Reviewed-on: https://go-review.googlesource.com/43051
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarMichael Hudson-Doyle <michael.hudson@canonical.com>
      Reviewed-by: 's avatarMatthew Dempsky <mdempsky@google.com>
      5331e7e9
    • Todd Neal's avatar
      cmd/compile: ignore types when considering tuple select for CSE · 08dca4c6
      Todd Neal authored
      Fixes #20097
      
      Change-Id: I3c9626ccc8cd0c46a7081ea8650b2ff07a5d4fcd
      Reviewed-on: https://go-review.googlesource.com/41505
      Run-TryBot: Todd Neal <todd@tneal.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      08dca4c6
    • Josh Bleecher Snyder's avatar
      cmd/compile: change ssa.Type into *types.Type · 46b88c9f
      Josh Bleecher Snyder authored
      When package ssa was created, Type was in package gc.
      To avoid circular dependencies, we used an interface (ssa.Type)
      to represent type information in SSA.
      
      In the Go 1.9 cycle, gri extricated the Type type from package gc.
      As a result, we can now use it in package ssa.
      Now, instead of package types depending on package ssa,
      it is the other way.
      This is a more sensible dependency tree,
      and helps compiler performance a bit.
      
      Though this is a big CL, most of the changes are
      mechanical and uninteresting.
      
      Interesting bits:
      
      * Add new singleton globals to package types for the special
        SSA types Memory, Void, Invalid, Flags, and Int128.
      * Add two new Types, TSSA for the special types,
        and TTUPLE, for SSA tuple types.
        ssa.MakeTuple is now types.NewTuple.
      * Move type comparison result constants CMPlt, CMPeq, and CMPgt
        to package types.
      * We had picked the name "types" in our rules for the handy
        list of types provided by ssa.Config. That conflicted with
        the types package name, so change it to "typ".
      * Update the type comparison routine to handle tuples and special
        types inline.
      * Teach gc/fmt.go how to print special types.
      * We can now eliminate ElemTypes in favor of just Elem,
        and probably also some other duplicated Type methods
        designed to return ssa.Type instead of *types.Type.
      * The ssa tests were using their own dummy types,
        and they were not particularly careful about types in general.
        Of necessity, this CL switches them to use *types.Type;
        it does not make them more type-accurate.
        Unfortunately, using types.Type means initializing a bit
        of the types universe.
        This is prime for refactoring and improvement.
      
      This shrinks ssa.Value; it now fits in a smaller size class
      on 64 bit systems. This doesn't have a giant impact,
      though, since most Values are preallocated in a chunk.
      
      name        old alloc/op      new alloc/op      delta
      Template         37.9MB ± 0%       37.7MB ± 0%  -0.57%  (p=0.000 n=10+8)
      Unicode          28.9MB ± 0%       28.7MB ± 0%  -0.52%  (p=0.000 n=10+10)
      GoTypes           110MB ± 0%        109MB ± 0%  -0.88%  (p=0.000 n=10+10)
      Flate            24.7MB ± 0%       24.6MB ± 0%  -0.66%  (p=0.000 n=10+10)
      GoParser         31.1MB ± 0%       30.9MB ± 0%  -0.61%  (p=0.000 n=10+9)
      Reflect          73.9MB ± 0%       73.4MB ± 0%  -0.62%  (p=0.000 n=10+8)
      Tar              25.8MB ± 0%       25.6MB ± 0%  -0.77%  (p=0.000 n=9+10)
      XML              41.2MB ± 0%       40.9MB ± 0%  -0.80%  (p=0.000 n=10+10)
      [Geo mean]       40.5MB            40.3MB       -0.68%
      
      name        old allocs/op     new allocs/op     delta
      Template           385k ± 0%         386k ± 0%    ~     (p=0.356 n=10+9)
      Unicode            343k ± 1%         344k ± 0%    ~     (p=0.481 n=10+10)
      GoTypes           1.16M ± 0%        1.16M ± 0%  -0.16%  (p=0.004 n=10+10)
      Flate              238k ± 1%         238k ± 1%    ~     (p=0.853 n=10+10)
      GoParser           320k ± 0%         320k ± 0%    ~     (p=0.720 n=10+9)
      Reflect            957k ± 0%         957k ± 0%    ~     (p=0.460 n=10+8)
      Tar                252k ± 0%         252k ± 0%    ~     (p=0.133 n=9+10)
      XML                400k ± 0%         400k ± 0%    ~     (p=0.796 n=10+10)
      [Geo mean]         428k              428k       -0.01%
      
      
      Removing all the interface calls helps non-trivially with CPU, though.
      
      name        old time/op       new time/op       delta
      Template          178ms ± 4%        173ms ± 3%  -2.90%  (p=0.000 n=94+96)
      Unicode          85.0ms ± 4%       83.9ms ± 4%  -1.23%  (p=0.000 n=96+96)
      GoTypes           543ms ± 3%        528ms ± 3%  -2.73%  (p=0.000 n=98+96)
      Flate             116ms ± 3%        113ms ± 4%  -2.34%  (p=0.000 n=96+99)
      GoParser          144ms ± 3%        140ms ± 4%  -2.80%  (p=0.000 n=99+97)
      Reflect           344ms ± 3%        334ms ± 4%  -3.02%  (p=0.000 n=100+99)
      Tar               106ms ± 5%        103ms ± 4%  -3.30%  (p=0.000 n=98+94)
      XML               198ms ± 5%        192ms ± 4%  -2.88%  (p=0.000 n=92+95)
      [Geo mean]        178ms             173ms       -2.65%
      
      name        old user-time/op  new user-time/op  delta
      Template          229ms ± 5%        224ms ± 5%  -2.36%  (p=0.000 n=95+99)
      Unicode           107ms ± 6%        106ms ± 5%  -1.13%  (p=0.001 n=93+95)
      GoTypes           696ms ± 4%        679ms ± 4%  -2.45%  (p=0.000 n=97+99)
      Flate             137ms ± 4%        134ms ± 5%  -2.66%  (p=0.000 n=99+96)
      GoParser          176ms ± 5%        172ms ± 8%  -2.27%  (p=0.000 n=98+100)
      Reflect           430ms ± 6%        411ms ± 5%  -4.46%  (p=0.000 n=100+92)
      Tar               128ms ±13%        123ms ±13%  -4.21%  (p=0.000 n=100+100)
      XML               239ms ± 6%        233ms ± 6%  -2.50%  (p=0.000 n=95+97)
      [Geo mean]        220ms             213ms       -2.76%
      
      
      Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
      Reviewed-on: https://go-review.googlesource.com/42145
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      46b88c9f
    • Josh Bleecher Snyder's avatar
      cmd/compile: add boolean simplification rules · 6a24b2d0
      Josh Bleecher Snyder authored
      These collectively fire a few hundred times during make.bash,
      mostly rewriting XOR SETNE -> SETEQ.
      
      Fixes #17905.
      
      Change-Id: Ic5eb241ee93ed67099da3de11f59e4df9fab64a3
      Reviewed-on: https://go-review.googlesource.com/42491
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      6a24b2d0
    • Marvin Stenger's avatar
      cmd/compile/internal/ssa: mark boolean instructions commutative · 9aeced65
      Marvin Stenger authored
      Mark AndB, OrB, EqB, and NeqB as commutative.
      
      Change-Id: Ife7cfcb9780cc5dd669617cb52339ab336667da4
      Reviewed-on: https://go-review.googlesource.com/42515Reviewed-by: 's avatarGiovanni Bajo <rasky@develer.com>
      Reviewed-by: 's avatarJosh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      9aeced65
    • Josh Bleecher Snyder's avatar
      cmd/compile: make builds reproducible in presence of **byte and **int8 · 6f2ee0f3
      Josh Bleecher Snyder authored
      CL 39915 introduced sorting of signats by ShortString
      for reproducible builds. But ShortString treats types
      byte and uint8 identically; same for rune and uint32.
      CL 39915 attempted to compensate for this by only
      adding the underlying type (uint8) to signats in addsignat.
      
      This only works for byte and uint8. For e.g. *byte and *uint,
      both get added, and their sort order is random,
      leading to non-reproducible builds.
      
      One fix would be to add yet another type printing mode
      that doesn't eliminate byte and rune, and use it
      for sorting signats. But the formatting routines
      are complicated enough as it is.
      
      Instead, just sort first by ShortString and then by String.
      We can't just use String, because ShortString makes distinctions
      that String doesn't. ShortString is really preferred here;
      String is serving only as a backstop for handling of bytes and runes.
      
      The long series of types in the test helps increase the odds of
      failure, allowing a smaller number of iterations in the test.
      On my machine, a full test takes 700ms.
      
      Passes toolstash-check.
      
      Updates #19961
      Fixes #20272
      
      name        old alloc/op      new alloc/op      delta
      Template         37.9MB ± 0%       37.9MB ± 0%  +0.12%  (p=0.032 n=5+5)
      Unicode          28.9MB ± 0%       28.9MB ± 0%    ~     (p=0.841 n=5+5)
      GoTypes           110MB ± 0%        110MB ± 0%    ~     (p=0.841 n=5+5)
      Compiler          463MB ± 0%        463MB ± 0%    ~     (p=0.056 n=5+5)
      SSA              1.11GB ± 0%       1.11GB ± 0%  +0.02%  (p=0.016 n=5+5)
      Flate            24.7MB ± 0%       24.8MB ± 0%  +0.14%  (p=0.032 n=5+5)
      GoParser         31.1MB ± 0%       31.1MB ± 0%    ~     (p=0.421 n=5+5)
      Reflect          73.9MB ± 0%       73.9MB ± 0%    ~     (p=1.000 n=5+5)
      Tar              25.8MB ± 0%       25.8MB ± 0%  +0.15%  (p=0.016 n=5+5)
      XML              41.2MB ± 0%       41.2MB ± 0%    ~     (p=0.310 n=5+5)
      [Geo mean]       72.0MB            72.0MB       +0.07%
      
      name        old allocs/op     new allocs/op     delta
      Template           384k ± 0%         385k ± 1%    ~     (p=0.056 n=5+5)
      Unicode            343k ± 0%         344k ± 0%    ~     (p=0.548 n=5+5)
      GoTypes           1.16M ± 0%        1.16M ± 0%    ~     (p=0.421 n=5+5)
      Compiler          4.43M ± 0%        4.44M ± 0%  +0.26%  (p=0.032 n=5+5)
      SSA               9.86M ± 0%        9.87M ± 0%  +0.10%  (p=0.032 n=5+5)
      Flate              237k ± 1%         238k ± 0%  +0.49%  (p=0.032 n=5+5)
      GoParser           319k ± 1%         320k ± 1%    ~     (p=0.151 n=5+5)
      Reflect            957k ± 0%         957k ± 0%    ~     (p=1.000 n=5+5)
      Tar                251k ± 0%         252k ± 1%  +0.49%  (p=0.016 n=5+5)
      XML                399k ± 0%         401k ± 1%    ~     (p=0.310 n=5+5)
      [Geo mean]         739k              741k       +0.26%
      
      Change-Id: Ic27995a8d374d012b8aca14546b1df9d28d30df7
      Reviewed-on: https://go-review.googlesource.com/42955
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarRobert Griesemer <gri@golang.org>
      6f2ee0f3
    • Josh Bleecher Snyder's avatar
      cmd/compile: make "imported and not used" errors deterministic · 9fda4df9
      Josh Bleecher Snyder authored
      If there were more unused imports than
      the maximum default number of errors to report,
      the set of reported imports was non-deterministic.
      
      Fix by accumulating and sorting them prior to output.
      
      Fixes #20298
      
      Change-Id: Ib3d5a15fd7dc40009523fcdc1b93ddc62a1b05f2
      Reviewed-on: https://go-review.googlesource.com/42954
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarRobert Griesemer <gri@golang.org>
      9fda4df9
    • Cherry Zhang's avatar
      cmd/internal/obj/arm64, cmd/compile: improve offset folding on ARM64 · fb0ccc5d
      Cherry Zhang authored
      ARM64 assembler backend only accepts loads and stores with small
      or aligned offset. The compiler therefore can only fold small or
      aligned offsets into loads and stores. For locals and args, their
      offsets to SP are not known until very late, and the compiler
      makes conservative decision not folding some of them. However,
      in most cases, the offset is indeed small or aligned, and can
      be folded into load and store (but actually not).
      
      This CL adds support of loads and stores with large and unaligned
      offsets. When the offset doesn't fit into the instruction, it
      uses two instructions and (for very large offset) the constant
      pool. This way, the compiler doesn't need to be conservative,
      and can simply fold the offset.
      
      To make it work, the assembler's optab matching rules need to be
      changed. Before, MOVD accepts C_UAUTO32K which matches multiple
      of 8 between 0 and 32K, and also C_UAUTO16K, which may not be
      multiple of 8 and does not fit into MOVD instruction. The
      assembler errors in the latter case. This change makes it only
      matches multiple of 8 (or offsets within ±256, which also fits
      in instruction), and uses the large-or-unaligned-offset rule
      for things doesn't fit (without error). Other sized move rules
      are changed similarly.
      
      Class C_UAUTO64K and C_UOREG64K are removed, as they are never
      used.
      
      In shared library, load/store of global is rewritten to using
      GOT and temp register, which conflicts with the use of temp
      register for assembling large offset. So the folding is disabled
      for globals in shared library mode.
      
      Reduce cmd/go binary size by 2%.
      
      name                     old time/op    new time/op    delta
      BinaryTree17-8              8.67s ± 0%     8.61s ± 0%   -0.60%  (p=0.000 n=9+10)
      Fannkuch11-8                6.24s ± 0%     6.19s ± 0%   -0.83%  (p=0.000 n=10+9)
      FmtFprintfEmpty-8           116ns ± 0%     116ns ± 0%     ~     (all equal)
      FmtFprintfString-8          196ns ± 0%     192ns ± 0%   -1.89%  (p=0.000 n=10+10)
      FmtFprintfInt-8             199ns ± 0%     198ns ± 0%   -0.35%  (p=0.001 n=9+10)
      FmtFprintfIntInt-8          294ns ± 0%     293ns ± 0%   -0.34%  (p=0.000 n=8+8)
      FmtFprintfPrefixedInt-8     318ns ± 1%     318ns ± 1%     ~     (p=1.000 n=10+10)
      FmtFprintfFloat-8           537ns ± 0%     531ns ± 0%   -1.17%  (p=0.000 n=9+10)
      FmtManyArgs-8              1.19µs ± 1%    1.18µs ± 1%   -1.41%  (p=0.001 n=10+10)
      GobDecode-8                17.2ms ± 1%    17.3ms ± 2%     ~     (p=0.165 n=10+10)
      GobEncode-8                14.7ms ± 1%    14.7ms ± 2%     ~     (p=0.631 n=10+10)
      Gzip-8                      837ms ± 0%     836ms ± 0%   -0.14%  (p=0.006 n=9+10)
      Gunzip-8                    141ms ± 0%     139ms ± 0%   -1.24%  (p=0.000 n=9+10)
      HTTPClientServer-8          256µs ± 1%     253µs ± 1%   -1.35%  (p=0.000 n=10+10)
      JSONEncode-8               40.1ms ± 1%    41.3ms ± 1%   +3.06%  (p=0.000 n=10+9)
      JSONDecode-8                157ms ± 1%     156ms ± 1%   -0.83%  (p=0.001 n=9+8)
      Mandelbrot200-8            8.94ms ± 0%    8.94ms ± 0%   +0.02%  (p=0.000 n=9+9)
      GoParse-8                  8.69ms ± 0%    8.54ms ± 1%   -1.69%  (p=0.000 n=8+10)
      RegexpMatchEasy0_32-8       227ns ± 1%     228ns ± 1%   +0.48%  (p=0.016 n=10+9)
      RegexpMatchEasy0_1K-8      1.92µs ± 0%    1.63µs ± 0%  -15.08%  (p=0.000 n=10+9)
      RegexpMatchEasy1_32-8       256ns ± 0%     251ns ± 0%   -2.19%  (p=0.000 n=10+9)
      RegexpMatchEasy1_1K-8      2.38µs ± 0%    2.09µs ± 0%  -12.49%  (p=0.000 n=10+9)
      RegexpMatchMedium_32-8      352ns ± 0%     354ns ± 0%   +0.39%  (p=0.002 n=10+9)
      RegexpMatchMedium_1K-8      106µs ± 0%     106µs ± 0%   -0.05%  (p=0.005 n=10+9)
      RegexpMatchHard_32-8       5.92µs ± 0%    5.89µs ± 0%   -0.40%  (p=0.000 n=9+8)
      RegexpMatchHard_1K-8        180µs ± 0%     179µs ± 0%   -0.14%  (p=0.000 n=10+9)
      Revcomp-8                   1.20s ± 0%     1.13s ± 0%   -6.29%  (p=0.000 n=9+8)
      Template-8                  159ms ± 1%     154ms ± 1%   -3.14%  (p=0.000 n=9+10)
      TimeParse-8                 800ns ± 3%     769ns ± 1%   -3.91%  (p=0.000 n=10+10)
      TimeFormat-8                826ns ± 2%     817ns ± 2%   -1.04%  (p=0.050 n=10+10)
      [Geo mean]                  145µs          143µs        -1.79%
      
      Change-Id: I5fc42087cee9b54ea414f8ef6d6d020b80eb5985
      Reviewed-on: https://go-review.googlesource.com/42172
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      Reviewed-by: 's avatarDavid Chase <drchase@google.com>
      fb0ccc5d
    • Josh Bleecher Snyder's avatar
      cmd/go: enable concurrent backend compilation by default · 5e0bcb38
      Josh Bleecher Snyder authored
      It can be disabled by setting the environment variable
      GO19CONCURRENTCOMPILATION=0, or with -gcflags=-c=1.
      
      Fixes #15756.
      
      Change-Id: I7acbf16330512b62ee14ecbab1f46b53ec5a67b6
      Reviewed-on: https://go-review.googlesource.com/41820
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarKeith Randall <khr@golang.org>
      5e0bcb38
    • Josh Bleecher Snyder's avatar
      cmd/go: add support for concurrent backend compilation · f4e5bd48
      Josh Bleecher Snyder authored
      It is disabled by default.
      It can be enabled by setting the environment variable
      GO19CONCURRENTCOMPILATION=1.
      
      Benchmarking results are presented in a grid.
      Columns are different values of c (compiler backend concurrency);
      rows are different values of p (process concurrency).
      
      'go build -a std cmd', a 4 core raspberry pi 3:
      
                  c=1        c=2        c=4
      StdCmd/p=1  504s ± 2%  413s ± 4%  367s ± 3%
      StdCmd/p=2  314s ± 3%  266s ± 4%  267s ± 4%
      StdCmd/p=4  254s ± 5%  241s ± 5%  238s ± 6%
      
      'go build -a std cmd', an 8 core darwin/amd64 laptop:
      
                  c=1         c=2         c=4         c=6         c=8
      StdCmd/p=1  40.4s ± 7%  31.0s ± 1%  27.3s ± 1%  27.8s ± 0%  27.7s ± 0%
      StdCmd/p=2  21.9s ± 1%  17.9s ± 1%  16.9s ± 1%  17.0s ± 1%  17.2s ± 0%
      StdCmd/p=4  17.4s ± 2%  14.5s ± 2%  13.3s ± 2%  13.5s ± 2%  13.6s ± 2%
      StdCmd/p=6  16.9s ± 1%  14.2s ± 2%  13.1s ± 2%  13.2s ± 2%  13.3s ± 2%
      StdCmd/p=8  16.7s ± 2%  14.2s ± 2%  13.2s ± 3%  13.2s ± 2%  13.4s ± 2%
      
      'go build -a std cmd', a 96 core arm64 server:
      
                   c=1         c=2         c=4         c=6         c=8         c=16        c=32        c=64        c=96
      StdCmd/p=1    173s ± 1%   133s ± 1%   114s ± 1%   109s ± 1%   106s ± 0%   106s ± 1%   107s ± 1%   110s ± 1%   113s ± 1%
      StdCmd/p=2   94.2s ± 2%  71.5s ± 1%  61.7s ± 1%  58.7s ± 1%  57.5s ± 2%  56.9s ± 1%  58.0s ± 1%  59.6s ± 1%  61.0s ± 1%
      StdCmd/p=4   74.1s ± 2%  53.5s ± 1%  43.7s ± 2%  40.5s ± 1%  39.2s ± 2%  38.9s ± 2%  39.5s ± 3%  40.3s ± 2%  40.8s ± 1%
      StdCmd/p=6   69.3s ± 1%  50.2s ± 2%  40.3s ± 2%  37.3s ± 3%  36.0s ± 3%  35.3s ± 2%  36.0s ± 2%  36.8s ± 2%  37.5s ± 2%
      StdCmd/p=8   66.1s ± 2%  47.7s ± 2%  38.6s ± 2%  35.7s ± 2%  34.4s ± 1%  33.6s ± 2%  34.2s ± 2%  34.6s ± 1%  35.0s ± 1%
      StdCmd/p=16  63.4s ± 2%  45.3s ± 2%  36.3s ± 2%  33.3s ± 2%  32.0s ± 3%  31.6s ± 2%  32.1s ± 2%  32.5s ± 2%  32.7s ± 2%
      StdCmd/p=32  62.2s ± 1%  44.2s ± 2%  35.3s ± 2%  32.4s ± 2%  31.2s ± 2%  30.9s ± 2%  31.1s ± 2%  31.7s ± 2%  32.0s ± 2%
      StdCmd/p=64  62.2s ± 1%  44.3s ± 2%  35.4s ± 2%  32.4s ± 2%  31.2s ± 2%  30.9s ± 2%  31.2s ± 2%  31.8s ± 3%  32.2s ± 3%
      StdCmd/p=96  62.2s ± 2%  44.4s ± 2%  35.3s ± 2%  32.3s ± 2%  31.1s ± 2%  30.9s ± 3%  31.3s ± 2%  31.7s ± 1%  32.1s ± 2%
      
      benchjuju, an 8 core darwin/amd64 laptop:
      
                     c=1         c=2         c=4         c=6         c=8
      BuildJuju/p=1  55.3s ± 0%  46.3s ± 0%  41.9s ± 0%  41.4s ± 1%  41.3s ± 0%
      BuildJuju/p=2  33.7s ± 1%  28.4s ± 1%  26.7s ± 1%  26.6s ± 1%  26.8s ± 1%
      BuildJuju/p=4  24.7s ± 1%  22.3s ± 1%  21.4s ± 1%  21.7s ± 1%  21.8s ± 1%
      BuildJuju/p=6  20.6s ± 1%  19.3s ± 2%  19.4s ± 1%  19.7s ± 1%  19.9s ± 1%
      BuildJuju/p=8  20.6s ± 2%  19.5s ± 2%  19.3s ± 2%  19.6s ± 1%  19.8s ± 2%
      
      Updates #15756
      
      Change-Id: I8a56e88953071a05eee764002024c54cd888a56c
      Reviewed-on: https://go-review.googlesource.com/41819
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
      f4e5bd48
    • Robert Griesemer's avatar
      spec: clarify unsafe.Pointer conversions · 86f5f7fd
      Robert Griesemer authored
      A pointer type of underlying type unsafe.Pointer can be used in
      unsafe conversions. Document unfortunate status quo.
      
      Fixes #19306.
      
      Change-Id: I28172508a200561f8df366bbf2c2807ef3b48c97
      Reviewed-on: https://go-review.googlesource.com/42132Reviewed-by: 's avatarMatthew Dempsky <mdempsky@google.com>
      Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
      Reviewed-by: 's avatarIan Lance Taylor <iant@golang.org>
      86f5f7fd