1. 03 Mar, 2018 2 commits
    • internal/bytealg: move equal functions to bytealg · 1dfa380e
      Keith Randall authored
      Move bytes.Equal, runtime.memequal, and runtime.memequal_varlen
      to the bytealg package.
      
      Update #19792
      
      Change-Id: Ic4175e952936016ea0bda6c7c3dbb33afdc8e4ac
      Reviewed-on: https://go-review.googlesource.com/98355
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      1dfa380e
    • encoding/json: use sync.Map for field cache · f0756ca2
      Joe Tsai authored
      The previous type cache is quadratic in time in the situation where
      new types are continually encountered. Now that it is possible to dynamically
      create new types with the reflect package, this can cause json to
      perform very poorly.
      
      Switch to sync.Map which does well when the cache has hit steady state,
      but also handles occasional updates in better than quadratic time.
      
      benchmark                                     old ns/op      new ns/op     delta
      BenchmarkTypeFieldsCache/MissTypes1-8         14817          16202         +9.35%
      BenchmarkTypeFieldsCache/MissTypes10-8        70926          69144         -2.51%
      BenchmarkTypeFieldsCache/MissTypes100-8       976467         208973        -78.60%
      BenchmarkTypeFieldsCache/MissTypes1000-8      79520162       1750371       -97.80%
      BenchmarkTypeFieldsCache/MissTypes10000-8     6873625837     16847806      -99.75%
      BenchmarkTypeFieldsCache/HitTypes1000-8       7.51           8.80          +17.18%
      BenchmarkTypeFieldsCache/HitTypes10000-8      7.58           8.68          +14.51%
      
      The old implementation takes 12 minutes just to build a cache of size 1e5
      due to the quadratic behavior. I did not bother benchmarking sizes above that.
      
      Change-Id: I5e6facc1eb8e1b80e5ca285e4dd2cc8815618dad
      Reviewed-on: https://go-review.googlesource.com/76850
      Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
      Reviewed-by: Bryan Mills <bcmills@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      f0756ca2
  2. 02 Mar, 2018 16 commits
    • internal/syscall/windows/registry: improve ReadSubKeyNames permissions · e658b85f
      Shamil Garatuev authored
      Make ReadSubKeyNames work even if the key is opened with only
      the ENUMERATE_SUB_KEYS access rights mask.
      
      Fixes #23869
      
      Change-Id: I138bd51715fdbc3bda05607c64bde1150f4fe6b2
      Reviewed-on: https://go-review.googlesource.com/97435
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
      e658b85f
    • internal/bytealg: move IndexByte assembly to the new bytealg package · 403ab0f2
      Keith Randall authored
      Move the IndexByte function from the runtime to a new bytealg package.
      The new package will eventually hold all the optimized assembly for
      groveling through byte slices and strings. It seems a better home for
      this code than randomly keeping it in runtime.
      
      Once this is in, the next step is to move the other functions
      (Compare, Equal, ...).
      
      Update #19792
      
      This change seems complicated enough that we might just declare
      "not worth it" and abandon.  Opinions welcome.
      
      The core assembly is all unchanged, except minor modifications where
      the code reads cpu feature bits.
      
      The wrapper functions have been cleaned up as they are now actually
      checked by vet.
      
      Change-Id: I9fa75bee5d85db3a65b3fd3b7997e60367523796
      Reviewed-on: https://go-review.googlesource.com/98016
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      403ab0f2
    • net: skip flaky TestLookupLongTXT for now · dcedcaa5
      Brad Fitzpatrick authored
      Flaky tests failing trybots help nobody.
      
      Updates #22857
      
      Change-Id: I87bc018651ab4fe02560a6d24c08a1d7ccd8ba37
      Reviewed-on: https://go-review.googlesource.com/97416
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      dcedcaa5
    • net/http: lock the read-only mutex in shouldRedirect · 2fd1b523
      Damien Mathieu authored
      Since that method uses 'mux.m', we need to lock the mutex to avoid data races.
      
      Change-Id: I998448a6e482b5d6a1b24f3354bb824906e23172
      GitHub-Last-Rev: 163a7d4942e793b328e05a7eb91f6d3fdc4ba12b
      GitHub-Pull-Request: golang/go#23994
      Reviewed-on: https://go-review.googlesource.com/96575
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      2fd1b523
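The kind of fix described in the commit above can be illustrated with a small sketch. It is not the net/http code: `registry`, `add`, and the simplified redirect rule are hypothetical stand-ins for a mux whose map is guarded by a sync.RWMutex.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a mux whose map is
// protected by a read-write mutex.
type registry struct {
	mu sync.RWMutex
	m  map[string]bool
}

// add registers a path under the write lock.
func (r *registry) add(path string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.m[path] = true
}

// shouldRedirect reads r.m, so it holds the read lock while doing so.
// Without the lock, a concurrent add would be a data race.
func (r *registry) shouldRedirect(path string) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if r.m[path] {
		return false // exact match registered, no redirect needed
	}
	return r.m[path+"/"]
}

func main() {
	r := &registry{m: make(map[string]bool)}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		r.add("/tree/")
	}()
	wg.Wait()
	fmt.Println(r.shouldRedirect("/tree"))
}
```

Running a program like this under `go run -race` is one way to confirm that readers and writers are properly synchronized.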
    • cmd/compile: skip TestEmptyDwarfRanges on Plan 9 · 1c9297c3
      David du Colombier authored
      TestEmptyDwarfRanges has been added in CL 94816.
      This test is failing on Plan 9 because executables
      don't have a DWARF symbol table.
      
      Fixes #24226.
      
      Change-Id: Iff7e34b8c2703a2f19ee8087a4d64d0bb98496cd
      Reviewed-on: https://go-review.googlesource.com/98275
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      1c9297c3
    • internal/trace: Revert "remove backlinks from span/task end to start" · d3562c9d
      Hana Kim authored
      This reverts commit 16398894.
      This broke the TestUserTaskSpan test.
      
      Change-Id: If5ff8bdfe84e8cb30787b03ead87205ece3d5601
      Reviewed-on: https://go-review.googlesource.com/98235
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      d3562c9d
    • internal/trace: remove backlinks from span/task end to start · 16398894
      Hana Kim authored
      Even though undocumented, the assumption is the Event's link field
      points to the following event in the future. The new span/task event
      processing breaks the assumption.
      
      Change-Id: I4ce2f30c67c4f525ec0a121a7e43d8bdd2ec3f77
      Reviewed-on: https://go-review.googlesource.com/96395
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      16398894
    • test/codegen: add copyright headers to new codegen files · 644b2daf
      Alberto Donizetti authored
      Change-Id: I9fe6572d1043ef9ee09c0925059ded554ad24c6b
      Reviewed-on: https://go-review.googlesource.com/98215
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      644b2daf
    • cmd/compile: convert type during finishcompare · 5b071bfa
      Michael Fraenkel authored
      When recursively calling walkexpr, r.Type is still the untyped value.
      It then sometimes recursively calls finishcompare, which complains that
      you can't compare the resulting expression to that untyped value.
      
      Updates #23834.
      
      Change-Id: I6b7acd3970ceaff8da9216bfa0ae24aca5dee828
      Reviewed-on: https://go-review.googlesource.com/97856
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      5b071bfa
    • cmd/compile: add DWARF register mappings for ARM64. · 9b95611e
      Than McIntosh authored
      Add DWARF register mappings for ARM64, so that that arch will become
      usable with "-dwarflocationlists". [NB: I've plugged in a set of
      numbers from the doc, but this will require additional manual testing.]
      
      Change-Id: Id9aa63857bc8b4f5c825f49274101cf372e9e856
      Reviewed-on: https://go-review.googlesource.com/82515
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      9b95611e
    • cmd/link: fix up debug_range for dsymutil (revert CL 72371) · eca41af0
      Alessandro Arzilli authored
      Dsymutil, a utility used on macOS when externally linking executables,
      does not support base address selector entries in debug_ranges.
      
      CL 73271 worked around this problem by removing base address selectors
      and emitting CU-relative relocations for each list entry.
      
      This commit, as an optimization, reintroduces the base address
      selectors and changes the linker to remove them again, but only when it
      knows that it will have to invoke the external linker on macOS.
      
      Compilecmp comparing master with a branch that has scope tracking
      always enabled:
      
      completed   15 of   15, estimated time remaining 0s (eta 2:43PM)
      name        old time/op       new time/op       delta
      Template          272ms ± 8%        257ms ± 5%  -5.33%  (p=0.000 n=15+14)
      Unicode           124ms ± 7%        122ms ± 5%    ~     (p=0.210 n=14+14)
      GoTypes           873ms ± 3%        870ms ± 5%    ~     (p=0.856 n=15+13)
      Compiler          4.49s ± 2%        4.49s ± 5%    ~     (p=0.982 n=14+14)
      SSA               11.8s ± 4%        11.8s ± 3%    ~     (p=0.653 n=15+15)
      Flate             163ms ± 6%        164ms ± 9%    ~     (p=0.914 n=14+15)
      GoParser          203ms ± 6%        202ms ±10%    ~     (p=0.571 n=14+14)
      Reflect           547ms ± 7%        542ms ± 4%    ~     (p=0.914 n=15+14)
      Tar               244ms ± 7%        237ms ± 3%  -2.80%  (p=0.002 n=14+13)
      XML               289ms ± 6%        289ms ± 5%    ~     (p=0.839 n=14+14)
      [Geo mean]        537ms             531ms       -1.10%
      
      name        old user-time/op  new user-time/op  delta
      Template          360ms ± 4%        341ms ± 7%  -5.16%  (p=0.000 n=14+14)
      Unicode           189ms ±11%        190ms ± 8%    ~     (p=0.844 n=15+15)
      GoTypes           1.13s ± 4%        1.14s ± 7%    ~     (p=0.582 n=15+14)
      Compiler          5.34s ± 2%        5.40s ± 4%  +1.19%  (p=0.036 n=11+13)
      SSA               14.7s ± 2%        14.7s ± 3%    ~     (p=0.602 n=15+15)
      Flate             211ms ± 7%        214ms ± 8%    ~     (p=0.252 n=14+14)
      GoParser          267ms ±12%        266ms ± 2%    ~     (p=0.837 n=15+11)
      Reflect           706ms ± 4%        701ms ± 3%    ~     (p=0.213 n=14+12)
      Tar               331ms ± 9%        320ms ± 5%  -3.30%  (p=0.025 n=15+14)
      XML               378ms ± 4%        373ms ± 6%    ~     (p=0.253 n=14+15)
      [Geo mean]        704ms             700ms       -0.58%
      
      name        old alloc/op      new alloc/op      delta
      Template         38.0MB ± 0%       38.4MB ± 0%  +1.12%  (p=0.000 n=15+15)
      Unicode          28.8MB ± 0%       28.8MB ± 0%  +0.17%  (p=0.000 n=15+15)
      GoTypes           112MB ± 0%        114MB ± 0%  +1.47%  (p=0.000 n=15+15)
      Compiler          465MB ± 0%        473MB ± 0%  +1.71%  (p=0.000 n=15+15)
      SSA              1.48GB ± 0%       1.53GB ± 0%  +3.07%  (p=0.000 n=15+15)
      Flate            24.3MB ± 0%       24.7MB ± 0%  +1.67%  (p=0.000 n=15+15)
      GoParser         30.7MB ± 0%       31.0MB ± 0%  +1.15%  (p=0.000 n=12+15)
      Reflect          76.3MB ± 0%       77.1MB ± 0%  +0.97%  (p=0.000 n=15+15)
      Tar              39.2MB ± 0%       39.6MB ± 0%  +0.91%  (p=0.000 n=15+15)
      XML              41.5MB ± 0%       42.0MB ± 0%  +1.29%  (p=0.000 n=15+15)
      [Geo mean]       77.5MB            78.6MB       +1.35%
      
      name        old allocs/op     new allocs/op     delta
      Template           385k ± 0%         387k ± 0%  +0.51%  (p=0.000 n=15+15)
      Unicode            342k ± 0%         343k ± 0%  +0.10%  (p=0.000 n=14+15)
      GoTypes           1.19M ± 0%        1.19M ± 0%  +0.62%  (p=0.000 n=15+15)
      Compiler          4.51M ± 0%        4.54M ± 0%  +0.50%  (p=0.000 n=14+15)
      SSA               12.2M ± 0%        12.4M ± 0%  +1.12%  (p=0.000 n=14+15)
      Flate              234k ± 0%         236k ± 0%  +0.60%  (p=0.000 n=15+15)
      GoParser           318k ± 0%         320k ± 0%  +0.60%  (p=0.000 n=15+15)
      Reflect            974k ± 0%         977k ± 0%  +0.27%  (p=0.000 n=15+15)
      Tar                395k ± 0%         397k ± 0%  +0.37%  (p=0.000 n=14+15)
      XML                404k ± 0%         407k ± 0%  +0.53%  (p=0.000 n=15+15)
      [Geo mean]         794k              798k       +0.52%
      
      name        old text-bytes    new text-bytes    delta
      HelloSize         680kB ± 0%        680kB ± 0%    ~     (all equal)
      
      name        old data-bytes    new data-bytes    delta
      HelloSize        9.62kB ± 0%       9.62kB ± 0%    ~     (all equal)
      
      name        old bss-bytes     new bss-bytes     delta
      HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize        1.11MB ± 0%       1.13MB ± 0%  +1.85%  (p=0.000 n=15+15)
      
      Change-Id: I61c98ba0340cb798034b2bb55e3ab3a58ac1cf23
      Reviewed-on: https://go-review.googlesource.com/98075
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      eca41af0
    • cmd/compile/internal/ssa: batch up all zero-width instructions · 9dc351be
      Heschi Kreinick authored
      When generating location lists, batch up changes for all zero-width
      instructions, not just phis. This prevents the creation of location list
      entries that don't actually cover any instructions.
      
      This isn't perfect because of the caveats in the prior CL (Copy is
      zero-width sometimes) but in practice this seems to fix all of the empty
      lists in std.
      
      Change-Id: Ice4a9ade36b6b24ca111d1494c414eec96e5af25
      Reviewed-on: https://go-review.googlesource.com/97958
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      9dc351be
    • cmd/compile/internal/ssa: note zero-width Ops · caa1b4af
      Heschi Kreinick authored
      Add a bool to opInfo to indicate if an Op never results in any
      instructions. This is a conservative approximation: some operations,
      like Copy, may or may not generate code depending on their arguments.
      
      I built the list by reading each arch's ssaGenValue function. Hopefully
      I got them all.
      
      Change-Id: I130b251b65f18208294e129bb7ddc3f91d57d31d
      Reviewed-on: https://go-review.googlesource.com/97957
      Reviewed-by: Keith Randall <khr@golang.org>
      caa1b4af
    • runtime: fix typo, func comments should start with function name · b77aad08
      Zhou Peng authored
      Change-Id: I289af4884583537639800e37928c22814d38cba9
      Reviewed-on: https://go-review.googlesource.com/98115
      Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
      b77aad08
    • cmd/compile: optimize scope tracking · 3fca7306
      Alessandro Arzilli authored
      1. Detect and remove the markers of lexical scopes that don't contain
      any variables early in noder, instead of waiting until the end of DWARF
      generation.
      This saves memory by never allocating some of the markers and optimizes
      some of the algorithms that depend on the number of scopes.
      
      2. Assign scopes to Progs by doing, for each Prog, a binary search over
      the markers array. This is faster than sorting the Prog list
      because there are fewer markers than there are Progs.
      
      completed   15 of   15, estimated time remaining 0s (eta 2:30PM)
      name        old time/op       new time/op       delta
      Template          274ms ± 5%        260ms ± 6%  -4.91%  (p=0.000 n=15+15)
      Unicode           126ms ± 5%        127ms ± 9%    ~     (p=0.856 n=13+15)
      GoTypes           861ms ± 5%        857ms ± 4%    ~     (p=0.595 n=15+15)
      Compiler          4.11s ± 4%        4.12s ± 5%    ~     (p=1.000 n=15+15)
      SSA               10.7s ± 2%        10.9s ± 4%  +2.01%  (p=0.002 n=14+14)
      Flate             163ms ± 4%        166ms ± 9%    ~     (p=0.134 n=14+15)
      GoParser          203ms ± 4%        205ms ± 6%    ~     (p=0.461 n=15+15)
      Reflect           544ms ± 5%        549ms ± 4%    ~     (p=0.174 n=15+15)
      Tar               249ms ± 9%        245ms ± 6%    ~     (p=0.285 n=15+15)
      XML               286ms ± 4%        291ms ± 5%    ~     (p=0.081 n=15+15)
      [Geo mean]        528ms             529ms       +0.14%
      
      name        old user-time/op  new user-time/op  delta
      Template          358ms ± 7%        354ms ± 5%    ~     (p=0.242 n=14+15)
      Unicode           189ms ±11%        191ms ±10%    ~     (p=0.438 n=15+15)
      GoTypes           1.15s ± 4%        1.14s ± 3%    ~     (p=0.405 n=15+15)
      Compiler          5.36s ± 6%        5.35s ± 5%    ~     (p=0.588 n=15+15)
      SSA               14.6s ± 3%        15.0s ± 4%  +2.58%  (p=0.000 n=15+15)
      Flate             214ms ±12%        216ms ± 8%    ~     (p=0.539 n=15+15)
      GoParser          267ms ± 6%        270ms ± 5%    ~     (p=0.569 n=15+15)
      Reflect           712ms ± 5%        709ms ± 4%    ~     (p=0.894 n=15+15)
      Tar               329ms ± 8%        330ms ± 5%    ~     (p=0.974 n=14+15)
      XML               371ms ± 3%        381ms ± 5%  +2.85%  (p=0.002 n=13+15)
      [Geo mean]        705ms             709ms       +0.62%
      
      name        old alloc/op      new alloc/op      delta
      Template         38.0MB ± 0%       38.4MB ± 0%  +1.27%  (p=0.000 n=15+14)
      Unicode          28.8MB ± 0%       28.8MB ± 0%  +0.16%  (p=0.000 n=15+14)
      GoTypes           112MB ± 0%        114MB ± 0%  +1.64%  (p=0.000 n=15+15)
      Compiler          465MB ± 0%        474MB ± 0%  +1.91%  (p=0.000 n=15+15)
      SSA              1.48GB ± 0%       1.53GB ± 0%  +3.32%  (p=0.000 n=15+15)
      Flate            24.3MB ± 0%       24.8MB ± 0%  +1.77%  (p=0.000 n=14+15)
      GoParser         30.7MB ± 0%       31.1MB ± 0%  +1.27%  (p=0.000 n=15+15)
      Reflect          76.3MB ± 0%       77.1MB ± 0%  +1.03%  (p=0.000 n=15+15)
      Tar              39.2MB ± 0%       39.6MB ± 0%  +1.02%  (p=0.000 n=13+15)
      XML              41.5MB ± 0%       42.1MB ± 0%  +1.45%  (p=0.000 n=15+15)
      [Geo mean]       77.5MB            78.7MB       +1.48%
      
      name        old allocs/op     new allocs/op     delta
      Template           385k ± 0%         387k ± 0%  +0.54%  (p=0.000 n=15+15)
      Unicode            342k ± 0%         343k ± 0%  +0.10%  (p=0.000 n=15+15)
      GoTypes           1.19M ± 0%        1.19M ± 0%  +0.64%  (p=0.000 n=14+15)
      Compiler          4.51M ± 0%        4.54M ± 0%  +0.53%  (p=0.000 n=15+15)
      SSA               12.2M ± 0%        12.4M ± 0%  +1.16%  (p=0.000 n=15+15)
      Flate              234k ± 0%         236k ± 0%  +0.63%  (p=0.000 n=14+15)
      GoParser           318k ± 0%         320k ± 0%  +0.63%  (p=0.000 n=15+15)
      Reflect            974k ± 0%         977k ± 0%  +0.28%  (p=0.000 n=15+15)
      Tar                395k ± 0%         397k ± 0%  +0.38%  (p=0.000 n=15+13)
      XML                404k ± 0%         407k ± 0%  +0.55%  (p=0.000 n=15+15)
      [Geo mean]         794k              799k       +0.55%
      
      name        old text-bytes    new text-bytes    delta
      HelloSize         680kB ± 0%        680kB ± 0%    ~     (all equal)
      
      name        old data-bytes    new data-bytes    delta
      HelloSize        9.62kB ± 0%       9.62kB ± 0%    ~     (all equal)
      
      name        old bss-bytes     new bss-bytes     delta
      HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize        1.11MB ± 0%       1.12MB ± 0%  +1.11%  (p=0.000 n=15+15)
      
      Change-Id: I95a0173ee28c52be1a4851d2a6e389529e74bf28
      Reviewed-on: https://go-review.googlesource.com/95396
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      3fca7306
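The second point of the commit above, a binary search over scope markers for each Prog, can be sketched as follows. The `marker` type and `scopeAt` function are hypothetical simplifications; the compiler's real data structures differ, and positions are reduced to plain integers here.

```go
package main

import (
	"fmt"
	"sort"
)

// marker says that scope Scope applies from position Pos onward,
// until the next marker. The slice of markers is sorted by Pos.
// (Hypothetical type for illustration.)
type marker struct {
	Pos   int
	Scope int
}

// scopeAt returns the scope of the last marker whose Pos is <= pos,
// or 0 (taken here to mean the outermost scope) if there is none.
func scopeAt(markers []marker, pos int) int {
	// Index of the first marker strictly after pos.
	i := sort.Search(len(markers), func(i int) bool {
		return markers[i].Pos > pos
	})
	if i == 0 {
		return 0
	}
	return markers[i-1].Scope
}

func main() {
	markers := []marker{{10, 1}, {20, 2}, {30, 1}}
	for _, pos := range []int{5, 10, 25, 40} {
		fmt.Println(pos, scopeAt(markers, pos))
	}
}
```

Each lookup costs O(log m) for m markers, so assigning scopes to n instructions costs O(n log m) without reordering the instruction list.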
    • syscall: fix nil pointer dereference in Select on linux/{arm64,mips64x} · 1023b016
      Tobias Klauser authored
      The timeout parameter might be nil; don't dereference it
      unconditionally.
      
      Fixes #24189
      
      Change-Id: I03e6a1ab74fe30322ce6bcfd3d6c42130b6d61be
      Reviewed-on: https://go-review.googlesource.com/97819
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      1023b016
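The nil check described above can be illustrated with a small sketch. It assumes the wrapper converts a microsecond-based timeout into a nanosecond-based one before making the system call. The `timeval`, `timespec`, and `toTimespec` names are hypothetical simplifications, not the syscall package's definitions.

```go
package main

import "fmt"

// timeval and timespec are simplified, hypothetical stand-ins for the
// corresponding syscall structures.
type timeval struct{ Sec, Usec int64 }
type timespec struct{ Sec, Nsec int64 }

// toTimespec converts an optional timeout. A nil timeout means
// "block indefinitely" and must stay nil rather than be dereferenced.
func toTimespec(tv *timeval) *timespec {
	if tv == nil {
		return nil
	}
	return &timespec{Sec: tv.Sec, Nsec: tv.Usec * 1000}
}

func main() {
	fmt.Println(toTimespec(nil) == nil)
	fmt.Println(*toTimespec(&timeval{Sec: 1, Usec: 500}))
}
```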
  3. 01 Mar, 2018 22 commits
    • Revert "runtime: use bytes.IndexByte in findnull" · 1fadbc1a
      Brad Fitzpatrick authored
      This reverts commit 7365fac2.
      
      Reason for revert: breaks the build on some architectures, reading unmapped pages?
      
      Change-Id: I3a8c02dc0b649269faacea79ecd8213defa97c54
      Reviewed-on: https://go-review.googlesource.com/97995
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      1fadbc1a
    • cmd/link: fix up location lists for dsymutil · f1fc9da3
      Heschi Kreinick authored
      LLVM tools, particularly lldb and dsymutil, don't support base address
      selection entries in location lists. When targeting GOOS=darwin,
      mode, have the linker translate location lists to CU-relative form
      instead.
      
      Technically, this isn't necessary when linking internally, as long as
      nobody plans to use anything other than Delve to look at the DWARF. But
      someone might want to use lldb, and it's really confusing when dwarfdump
      shows gibberish for the location entries. The performance cost isn't
      noticeable, so enable it even for internal linking.
      
      Doing this in the linker is a little weird, but it was more expensive in
      the compiler, probably because the compiler is much more stressful to
      the GC. Also, if we decide to only do it for external linking, the
      compiler can't see the link mode.
      
      Benchmark before and after this commit on Mac with -dwarflocationlists=1:
      
      name        old time/op       new time/op       delta
      StdCmd            21.3s ± 1%        21.3s ± 1%    ~     (p=0.310 n=27+27)
      
      Only StdCmd is relevant, because only StdCmd runs the linker. Whatever
      the cost is here, it's not very large.
      
      Change-Id: Ic8ef780d0e263230ce6aa3ca3a32fc9abd750b1e
      Reviewed-on: https://go-review.googlesource.com/97956
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      f1fc9da3
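As a rough illustration of what translating a list to CU-relative form involves, the sketch below rewrites a DWARF 4 style list in memory. It assumes 64-bit addresses, where a base address selection entry has the largest representable address as its first value and the new base as its second. The `entry` type and `toCURelative` function are hypothetical; the real linker operates on encoded section data and relocations, which this sketch ignores.

```go
package main

import "fmt"

// baseSelector marks a base address selection entry (64-bit addresses assumed).
const baseSelector = ^uint64(0)

// entry is one decoded list entry. For a normal entry, Start and End are
// offsets from the current base address. (Hypothetical type.)
type entry struct{ Start, End uint64 }

// toCURelative drops base address selection entries and rewrites every
// normal entry as an offset from the compilation unit base, cuBase.
func toCURelative(list []entry, cuBase uint64) []entry {
	base := cuBase // until a selector appears, the CU base applies
	var out []entry
	for _, e := range list {
		switch {
		case e.Start == baseSelector:
			base = e.End // selector: switch base, emit nothing
		case e.Start == 0 && e.End == 0:
			return append(out, e) // end-of-list entry: keep and stop
		default:
			delta := base - cuBase
			out = append(out, entry{e.Start + delta, e.End + delta})
		}
	}
	return out
}

func main() {
	list := []entry{{baseSelector, 0x2000}, {0x10, 0x20}, {0, 0}}
	for _, e := range toCURelative(list, 0x1000) {
		fmt.Printf("%#x %#x\n", e.Start, e.End)
	}
}
```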
    • cmd/compile/internal/ssa: avoid accidental list ends · bff29f2d
      Heschi Kreinick authored
      Some SSA values don't translate into any instructions. If a function
      began with two of them, and both modified the storage of the same
      variable, we'd end up with a location list entry that started and ended
      at 0. That looks like an end-of-list entry, which would then confuse
      downstream tools, particularly the fixup in the linker.
      
      "Fix" this by changing the end of such entries to 1. Should be harmless,
      since AFAIK we don't generate any 1-byte instructions. Later CLs will
      reduce the frequency of these entries anyway.
      
      Change-Id: I9b7e5e69f914244cc826fb9f4a6acfe2dc695f81
      Reviewed-on: https://go-review.googlesource.com/97955
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      bff29f2d
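The workaround described above can be sketched in a few lines. The `entry` type and the `isEndOfList` and `fixup` helpers are hypothetical names used only for illustration.

```go
package main

import "fmt"

// entry is a simplified location list entry covering [Start, End).
// (Hypothetical type.)
type entry struct{ Start, End uint64 }

// isEndOfList reports whether e has the same shape as the list
// terminator, which is a pair of zeros.
func isEndOfList(e entry) bool {
	return e.Start == 0 && e.End == 0
}

// fixup nudges an accidental (0, 0) entry to (0, 1) so that readers
// do not mistake it for the end of the list.
func fixup(e entry) entry {
	if isEndOfList(e) {
		e.End = 1
	}
	return e
}

func main() {
	fmt.Println(fixup(entry{0, 0}))
	fmt.Println(fixup(entry{4, 8}))
}
```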
    • cmd/compile: fix dwarf ranges of inlined subroutine entries · 87736fc4
      Alessandro Arzilli authored
      DWARF ranges are half-open.
      
      Fixes #23928
      
      Change-Id: I71b3384d1bc2c65bd37ca8a02a0b7ff48fec3688
      Reviewed-on: https://go-review.googlesource.com/94816
      Reviewed-by: Than McIntosh <thanm@google.com>
      Run-TryBot: Than McIntosh <thanm@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      87736fc4
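"Half-open" here means a range includes its low address but excludes its high address. A minimal sketch, using a hypothetical `pcRange` type:

```go
package main

import "fmt"

// pcRange covers the addresses in [Low, High): Low is included,
// High is not. (Hypothetical type for illustration.)
type pcRange struct{ Low, High uint64 }

// contains reports whether pc falls inside the half-open range.
func (r pcRange) contains(pc uint64) bool {
	return r.Low <= pc && pc < r.High
}

func main() {
	r := pcRange{Low: 0x100, High: 0x110}
	fmt.Println(r.contains(0x100)) // first address is inside
	fmt.Println(r.contains(0x10f)) // last covered address
	fmt.Println(r.contains(0x110)) // High itself is outside
}
```

With this convention, a range ending at the address of the last covered instruction would leave that instruction out, so the end must be the address just past it.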
    • cmd/asm: fix assembling return jump · 2baed385
      Cherry Zhang authored
      In RET instruction, the operand is the return jump's target,
      which should be put in Prog.To.
      
      Add an action "buildrundir" to the test driver, which builds
      (compile+assemble+link) the code in a directory and runs the
      resulting binary.
      
      Fixes #23838.
      
      Change-Id: I7ebe7eda49024b40a69a24857322c5ca9c67babb
      Reviewed-on: https://go-review.googlesource.com/94175
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
      2baed385
    • runtime: improve arm64 memmove implementation · 213a7517
      Balaram Makam authored
      Improve runtime memmove_arm64.s by specializing for small copies and
      processing 32 bytes per iteration for copies of 32 bytes or more.
      
      Benchmark results of runtime/Memmove on Amberwing:
      name                      old time/op    new time/op     delta
      Memmove/0                   7.61ns ± 0%     7.20ns ± 0%     ~     (p=0.053 n=5+7)
      Memmove/1                   9.28ns ± 0%     8.80ns ± 0%   -5.17%  (p=0.000 n=4+8)
      Memmove/2                   9.65ns ± 0%     9.20ns ± 0%   -4.68%  (p=0.000 n=5+8)
      Memmove/3                   10.0ns ± 0%      9.2ns ± 0%   -7.83%  (p=0.000 n=5+8)
      Memmove/4                   10.6ns ± 0%      9.2ns ± 0%  -13.21%  (p=0.000 n=5+8)
      Memmove/5                   11.0ns ± 0%      9.2ns ± 0%  -16.36%  (p=0.000 n=5+8)
      Memmove/6                   12.4ns ± 0%      9.2ns ± 0%  -25.81%  (p=0.000 n=5+8)
      Memmove/7                   13.1ns ± 0%      9.2ns ± 0%  -29.56%  (p=0.000 n=5+8)
      Memmove/8                   9.10ns ± 1%     9.20ns ± 0%   +1.08%  (p=0.002 n=5+8)
      Memmove/9                   9.67ns ± 0%     9.20ns ± 0%   -4.88%  (p=0.000 n=5+8)
      Memmove/10                  10.4ns ± 0%      9.2ns ± 0%  -11.54%  (p=0.000 n=5+8)
      Memmove/11                  10.9ns ± 0%      9.2ns ± 0%  -15.60%  (p=0.000 n=5+8)
      Memmove/12                  11.5ns ± 0%      9.2ns ± 0%  -20.00%  (p=0.000 n=5+8)
      Memmove/13                  12.4ns ± 0%      9.2ns ± 0%  -25.81%  (p=0.000 n=5+8)
      Memmove/14                  13.1ns ± 0%      9.2ns ± 0%  -29.77%  (p=0.000 n=5+8)
      Memmove/15                  13.8ns ± 0%      9.2ns ± 0%  -33.33%  (p=0.000 n=5+8)
      Memmove/16                  9.70ns ± 0%     9.20ns ± 0%   -5.19%  (p=0.000 n=5+8)
      Memmove/32                  10.6ns ± 0%      9.2ns ± 0%  -13.21%  (p=0.000 n=4+8)
      Memmove/64                  13.4ns ± 0%     10.2ns ± 0%  -23.88%  (p=0.000 n=4+8)
      Memmove/128                 18.1ns ± 1%     13.2ns ± 0%  -26.99%  (p=0.000 n=5+8)
      Memmove/256                 25.2ns ± 0%     16.4ns ± 0%  -34.92%  (p=0.000 n=5+8)
      Memmove/512                 36.4ns ± 0%     22.8ns ± 0%  -37.36%  (p=0.000 n=5+8)
      Memmove/1024                70.1ns ± 0%     36.8ns ±11%  -47.49%  (p=0.002 n=5+8)
      Memmove/2048                 121ns ± 0%       61ns ± 0%     ~     (p=0.053 n=5+7)
      Memmove/4096                 224ns ± 0%      120ns ± 0%  -46.43%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/0       8.40ns ± 0%     8.00ns ± 0%   -4.76%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/1       9.87ns ± 1%    10.00ns ± 0%     ~     (p=0.070 n=5+8)
      MemmoveUnalignedDst/2       10.6ns ± 0%     10.4ns ± 0%   -1.89%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/3       10.8ns ± 0%     10.4ns ± 0%   -3.70%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/4       10.9ns ± 0%     10.3ns ± 0%     ~     (p=0.053 n=5+7)
      MemmoveUnalignedDst/5       11.5ns ± 0%     10.3ns ± 1%  -10.22%  (p=0.000 n=4+8)
      MemmoveUnalignedDst/6       13.2ns ± 0%     10.4ns ± 1%  -21.50%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/7       13.7ns ± 0%     10.3ns ± 1%  -24.64%  (p=0.000 n=4+8)
      MemmoveUnalignedDst/8       10.1ns ± 0%     10.4ns ± 0%   +2.97%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/9       10.7ns ± 0%     10.4ns ± 0%   -2.80%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/10      11.2ns ± 1%     10.4ns ± 0%   -6.81%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/11      11.6ns ± 0%     10.4ns ± 0%  -10.34%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/12      12.5ns ± 2%     10.4ns ± 0%  -16.53%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/13      13.7ns ± 0%     10.4ns ± 0%  -24.09%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/14      14.0ns ± 0%     10.4ns ± 0%  -25.71%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/15      14.6ns ± 0%     10.4ns ± 0%  -28.77%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/16      10.5ns ± 0%     10.4ns ± 0%   -0.95%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/32      12.4ns ± 0%     11.6ns ± 0%   -6.05%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/64      15.2ns ± 0%     12.3ns ± 0%  -19.08%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/128     18.7ns ± 0%     15.2ns ± 0%  -18.72%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/256     25.1ns ± 0%     18.6ns ± 0%  -25.90%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/512     37.8ns ± 0%     24.4ns ± 0%  -35.45%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/1024    74.6ns ± 0%     40.4ns ± 0%     ~     (p=0.053 n=5+7)
      MemmoveUnalignedDst/2048     133ns ± 0%       75ns ± 0%  -43.91%  (p=0.000 n=5+8)
      MemmoveUnalignedDst/4096     247ns ± 0%      141ns ± 0%  -42.91%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/0       8.40ns ± 0%     8.00ns ± 0%   -4.76%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/1       9.81ns ± 0%    10.00ns ± 0%   +1.98%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/2       10.5ns ± 0%     10.0ns ± 0%   -4.76%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/3       10.7ns ± 1%     10.0ns ± 0%   -6.89%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/4       11.3ns ± 0%     10.0ns ± 0%  -11.50%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/5       11.6ns ± 0%     10.0ns ± 0%  -13.79%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/6       13.6ns ± 0%     10.0ns ± 0%  -26.47%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/7       14.4ns ± 0%     10.0ns ± 0%  -30.75%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/8       9.87ns ± 1%    10.00ns ± 0%     ~     (p=0.070 n=5+8)
      MemmoveUnalignedSrc/9       10.4ns ± 0%     10.0ns ± 0%   -3.85%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/10      11.2ns ± 0%     10.0ns ± 0%  -10.71%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/11      11.8ns ± 0%     10.0ns ± 0%  -15.25%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/12      12.1ns ± 0%     10.0ns ± 0%  -17.36%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/13      13.6ns ± 0%     10.0ns ± 0%  -26.47%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/14      14.7ns ± 0%     10.0ns ± 0%  -31.79%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/15      14.4ns ± 0%     10.0ns ± 0%  -30.56%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/16      11.0ns ± 0%     10.0ns ± 0%   -9.09%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/32      11.5ns ± 0%     10.0ns ± 0%  -13.04%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/64      14.9ns ± 0%     11.2ns ± 0%  -24.83%  (p=0.000 n=4+8)
      MemmoveUnalignedSrc/128     19.5ns ± 0%     15.2ns ± 0%  -22.05%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/256     27.3ns ± 2%     19.2ns ± 0%  -29.62%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/512     40.4ns ± 0%     27.2ns ± 0%  -32.67%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/1024    75.4ns ± 0%     44.4ns ± 0%  -41.15%  (p=0.000 n=5+8)
      MemmoveUnalignedSrc/2048     131ns ± 0%       77ns ± 3%  -41.56%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/4096     248ns ± 0%      145ns ± 0%  -41.53%  (p=0.000 n=5+8)
      
      name                      old speed      new speed       delta
      Memmove/1                  108MB/s ± 0%    114MB/s ± 0%   +5.37%  (p=0.004 n=4+8)
      Memmove/2                  207MB/s ± 0%    217MB/s ± 0%   +4.85%  (p=0.002 n=5+8)
      Memmove/3                  301MB/s ± 0%    326MB/s ± 0%   +8.45%  (p=0.002 n=5+8)
      Memmove/4                  377MB/s ± 0%    435MB/s ± 0%  +15.31%  (p=0.004 n=4+8)
      Memmove/5                  455MB/s ± 0%    543MB/s ± 0%  +19.46%  (p=0.002 n=5+8)
      Memmove/6                  483MB/s ± 0%    652MB/s ± 0%  +34.88%  (p=0.003 n=5+7)
      Memmove/7                  537MB/s ± 0%    761MB/s ± 0%  +41.71%  (p=0.002 n=5+8)
      Memmove/8                  879MB/s ± 1%    869MB/s ± 0%   -1.15%  (p=0.000 n=5+7)
      Memmove/9                  931MB/s ± 0%    978MB/s ± 0%   +5.05%  (p=0.002 n=5+8)
      Memmove/10                 960MB/s ± 0%   1086MB/s ± 0%  +13.13%  (p=0.002 n=5+8)
      Memmove/11                1.00GB/s ± 0%   1.20GB/s ± 0%  +18.92%  (p=0.003 n=5+7)
      Memmove/12                1.04GB/s ± 0%   1.30GB/s ± 0%  +25.40%  (p=0.002 n=5+8)
      Memmove/13                1.05GB/s ± 0%   1.41GB/s ± 0%  +34.87%  (p=0.002 n=5+8)
      Memmove/14                1.07GB/s ± 0%   1.52GB/s ± 0%  +42.14%  (p=0.002 n=5+8)
      Memmove/15                1.09GB/s ± 0%   1.63GB/s ± 0%  +49.91%  (p=0.002 n=5+8)
      Memmove/16                1.65GB/s ± 0%   1.74GB/s ± 0%   +5.40%  (p=0.003 n=5+7)
      Memmove/32                3.01GB/s ± 0%   3.48GB/s ± 0%  +15.58%  (p=0.003 n=5+7)
      Memmove/64                4.76GB/s ± 0%   6.27GB/s ± 0%  +31.75%  (p=0.003 n=5+7)
      Memmove/128               7.08GB/s ± 1%   9.69GB/s ± 0%  +36.96%  (p=0.002 n=5+8)
      Memmove/256               10.2GB/s ± 0%   15.6GB/s ± 0%  +53.58%  (p=0.002 n=5+8)
      Memmove/512               14.1GB/s ± 0%   22.4GB/s ± 0%  +59.57%  (p=0.003 n=5+7)
      Memmove/1024              14.6GB/s ± 0%   27.9GB/s ±10%  +91.00%  (p=0.002 n=5+8)
      Memmove/2048              16.9GB/s ± 0%   33.4GB/s ± 0%  +98.32%  (p=0.003 n=5+7)
      Memmove/4096              18.3GB/s ± 0%   33.9GB/s ± 0%  +85.80%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/1      101MB/s ± 1%    100MB/s ± 0%     ~     (p=0.586 n=5+8)
      MemmoveUnalignedDst/2      189MB/s ± 0%    192MB/s ± 0%   +1.82%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/3      278MB/s ± 0%    288MB/s ± 0%   +3.88%  (p=0.003 n=5+7)
      MemmoveUnalignedDst/4      368MB/s ± 0%    387MB/s ± 0%   +5.41%  (p=0.003 n=5+7)
      MemmoveUnalignedDst/5      434MB/s ± 0%    484MB/s ± 0%  +11.52%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/6      454MB/s ± 0%    580MB/s ± 0%  +27.62%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/7      509MB/s ± 0%    677MB/s ± 0%  +33.01%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/8      792MB/s ± 0%    770MB/s ± 0%   -2.77%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/9      841MB/s ± 0%    866MB/s ± 0%   +2.92%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/10     896MB/s ± 0%    962MB/s ± 0%   +7.35%  (p=0.003 n=5+7)
      MemmoveUnalignedDst/11     947MB/s ± 0%   1058MB/s ± 0%  +11.80%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/12     962MB/s ± 2%   1154MB/s ± 0%  +19.97%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/13     947MB/s ± 0%   1251MB/s ± 0%  +32.08%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/14    1.00GB/s ± 0%   1.35GB/s ± 0%  +34.55%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/15    1.03GB/s ± 0%   1.44GB/s ± 0%  +40.50%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/16    1.53GB/s ± 0%   1.54GB/s ± 0%   +0.77%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/32    2.58GB/s ± 0%   2.75GB/s ± 0%   +6.52%  (p=0.003 n=5+7)
      MemmoveUnalignedDst/64    4.21GB/s ± 0%   5.19GB/s ± 0%  +23.40%  (p=0.004 n=5+6)
      MemmoveUnalignedDst/128   6.86GB/s ± 0%   8.42GB/s ± 0%  +22.78%  (p=0.003 n=5+7)
      MemmoveUnalignedDst/256   10.2GB/s ± 0%   13.8GB/s ± 0%  +35.15%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/512   13.5GB/s ± 0%   21.0GB/s ± 0%  +54.90%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/1024  13.7GB/s ± 0%   25.3GB/s ± 0%  +84.61%  (p=0.003 n=5+7)
      MemmoveUnalignedDst/2048  15.3GB/s ± 0%   27.5GB/s ± 0%  +79.52%  (p=0.002 n=5+8)
      MemmoveUnalignedDst/4096  16.5GB/s ± 0%   28.9GB/s ± 0%  +74.74%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/1      102MB/s ± 0%    100MB/s ± 0%   -2.02%  (p=0.000 n=5+7)
      MemmoveUnalignedSrc/2      191MB/s ± 0%    200MB/s ± 0%   +4.78%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/3      279MB/s ± 0%    300MB/s ± 0%   +7.45%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/4      354MB/s ± 0%    400MB/s ± 0%  +13.10%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/5      431MB/s ± 0%    500MB/s ± 0%  +16.02%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/6      441MB/s ± 0%    600MB/s ± 0%  +36.03%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/7      485MB/s ± 0%    700MB/s ± 0%  +44.29%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/8      811MB/s ± 1%    800MB/s ± 0%   -1.36%  (p=0.016 n=5+8)
      MemmoveUnalignedSrc/9      864MB/s ± 0%    900MB/s ± 0%   +4.07%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/10     893MB/s ± 0%    999MB/s ± 0%  +11.97%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/11     932MB/s ± 0%   1099MB/s ± 0%  +18.01%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/12     988MB/s ± 0%   1199MB/s ± 0%  +21.35%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/13     955MB/s ± 0%   1299MB/s ± 0%  +36.02%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/14     955MB/s ± 0%   1399MB/s ± 0%  +46.52%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/15    1.04GB/s ± 0%   1.50GB/s ± 0%  +44.18%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/16    1.45GB/s ± 0%   1.60GB/s ± 0%  +10.14%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/32    2.78GB/s ± 0%   3.20GB/s ± 0%  +15.16%  (p=0.003 n=5+7)
      MemmoveUnalignedSrc/64    4.30GB/s ± 0%   5.72GB/s ± 0%  +32.90%  (p=0.003 n=5+7)
      MemmoveUnalignedSrc/128   6.57GB/s ± 0%   8.42GB/s ± 0%  +28.06%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/256   9.39GB/s ± 1%  13.33GB/s ± 0%  +41.96%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/512   12.7GB/s ± 0%   18.8GB/s ± 0%  +48.53%  (p=0.003 n=5+7)
      MemmoveUnalignedSrc/1024  13.6GB/s ± 0%   23.0GB/s ± 0%  +69.82%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/2048  15.6GB/s ± 0%   26.8GB/s ± 3%  +71.37%  (p=0.002 n=5+8)
      MemmoveUnalignedSrc/4096  16.5GB/s ± 0%   28.2GB/s ± 0%  +71.40%  (p=0.002 n=5+8)
      
      Fixes #22925
      
      Change-Id: I38c1a9ad5c6e3f4f95fc521c4b7e3140b58b4737
      Reviewed-on: https://go-review.googlesource.com/83799
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      213a7517
    • runtime: use bytes.IndexByte in findnull · 7365fac2
      Josh Bleecher Snyder authored
      bytes.IndexByte is heavily optimized.
      Use it in findnull.
      
      name        old time/op  new time/op  delta
      GoString-8  65.5ns ± 1%  40.2ns ± 1%  -38.62%  (p=0.000 n=19+19)
      
      findnull is also used in gostringnocopy,
      which is used in many hot spots in the runtime.
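
      A minimal sketch of the idea, assuming a bounded byte slice rather
      than the raw pointer the runtime works with. findNull and goString
      are hypothetical names used for illustration, not the runtime's code:

```go
package main

import (
	"bytes"
	"fmt"
)

// findNull returns the number of bytes before the first NUL in buf,
// or len(buf) if buf contains no NUL. Hypothetical helper for this
// sketch; the runtime's findnull takes a *byte instead of a slice.
func findNull(buf []byte) int {
	if i := bytes.IndexByte(buf, 0); i >= 0 {
		return i
	}
	return len(buf)
}

// goString copies the NUL-terminated prefix of buf into a Go string.
func goString(buf []byte) string {
	return string(buf[:findNull(buf)])
}

func main() {
	fmt.Println(goString([]byte("hello\x00ignored"))) // hello
}
```

      The scan itself is delegated to bytes.IndexByte, so any
      platform-specific optimization of that function carries over.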
      
      Fixes #23830
      
      Change-Id: I2e6cb279c7d8078f8844065de684cc3567fe89d7
      Reviewed-on: https://go-review.googlesource.com/97523
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      7365fac2
    • cmd/compile/internal/ssa: combine consecutive BigEndian stores on arm64 · 39fefa07
      Chad Rosier authored
      This optimization mirrors that which is already implemented for AMD64.  The
      optimization specifically targets the binary.BigEndian.PutUint* functions.
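
      For illustration, the sketch below shows the kind of byte-by-byte
      store sequence that a PutUint32-style function performs. putUint32BE
      is a hypothetical stand-in; whether a particular compiler version
      merges these four stores into one wider store is not demonstrated
      here:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putUint32BE writes v into b[0:4], most significant byte first,
// using four consecutive single-byte stores.
func putUint32BE(b []byte, v uint32) {
	_ = b[3] // one bounds check up front covers the four stores below
	b[0] = byte(v >> 24)
	b[1] = byte(v >> 16)
	b[2] = byte(v >> 8)
	b[3] = byte(v)
}

func main() {
	manual := make([]byte, 4)
	stdlib := make([]byte, 4)
	putUint32BE(manual, 0x01020304)
	binary.BigEndian.PutUint32(stdlib, 0x01020304)
	fmt.Printf("% x\n% x\n", manual, stdlib)
}
```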
      
      encoding/binary results on Amberwing:
      name                   old time/op    new time/op    delta
      ReadSlice1000Int32s      9.83µs ± 2%    9.78µs ± 1%     ~     (p=0.362 n=9+10)
      ReadStruct               5.24µs ± 3%    5.19µs ± 2%     ~     (p=0.285 n=10+10)
      ReadInts                 8.35µs ± 8%    8.44µs ± 3%     ~     (p=0.323 n=10+10)
      WriteInts                3.38µs ± 3%    3.44µs ±15%     ~     (p=0.921 n=9+10)
      WriteSlice1000Int32s     11.4µs ± 6%    10.2µs ± 4%   -9.94%  (p=0.000 n=10+10)
      PutUint16                 510ns ±12%     500ns ± 0%     ~     (p=0.586 n=10+7)
      PutUint32                 530ns ±15%     490ns ±12%     ~     (p=0.086 n=10+10)
      PutUint64                 550ns ± 0%     470ns ± 6%  -14.52%  (p=0.000 n=7+10)
      LittleEndianPutUint16     500ns ± 0%     475ns ±16%     ~     (p=0.120 n=7+10)
      LittleEndianPutUint32     450ns ± 0%     517ns ±16%  +14.81%  (p=0.004 n=8+9)
      LittleEndianPutUint64     550ns ± 0%     485ns ±13%  -11.82%  (p=0.000 n=8+10)
      PutUvarint32              685ns ±12%     622ns ± 4%   -9.17%  (p=0.005 n=10+9)
      PutUvarint64              735ns ± 9%     711ns ± 9%     ~     (p=0.272 n=10+9)
      [Geo mean]               1.47µs         1.42µs        -3.87%
      
      name                   old speed      new speed      delta
      ReadSlice1000Int32s     407MB/s ± 2%   409MB/s ± 1%     ~     (p=0.362 n=9+10)
      ReadStruct             14.3MB/s ± 3%  14.4MB/s ± 2%     ~     (p=0.250 n=10+10)
      ReadInts               3.59MB/s ± 7%  3.56MB/s ± 4%     ~     (p=0.340 n=10+10)
      WriteInts              8.87MB/s ± 3%  8.74MB/s ±13%     ~     (p=0.890 n=9+10)
      WriteSlice1000Int32s    352MB/s ± 6%   391MB/s ± 4%  +11.03%  (p=0.000 n=10+10)
      PutUint16              3.95MB/s ±13%  4.00MB/s ± 0%     ~     (p=0.312 n=10+7)
      PutUint32              7.62MB/s ±17%  8.21MB/s ±11%     ~     (p=0.086 n=10+10)
      PutUint64              14.6MB/s ± 0%  17.1MB/s ± 6%  +17.28%  (p=0.000 n=7+10)
      LittleEndianPutUint16  4.00MB/s ± 0%  4.23MB/s ±18%     ~     (p=0.176 n=7+10)
      LittleEndianPutUint32  8.89MB/s ± 0%  7.64MB/s ±20%  -14.05%  (p=0.001 n=8+10)
      LittleEndianPutUint64  14.6MB/s ± 0%  16.6MB/s ±12%  +13.86%  (p=0.000 n=8+10)
      PutUvarint32           5.86MB/s ±14%  6.44MB/s ± 5%   +9.84%  (p=0.006 n=10+9)
      PutUvarint64           10.9MB/s ± 8%  11.3MB/s ± 9%     ~     (p=0.373 n=10+9)
      [Geo mean]             14.2MB/s       14.8MB/s        +3.93%
      
      go1 results on Amberwing:
      name                   old time/op    new time/op    delta
      RegexpMatchEasy0_32       254ns ± 0%     254ns ± 0%    ~     (all equal)
      RegexpMatchEasy0_1K       547ns ± 0%     547ns ± 0%    ~     (all equal)
      RegexpMatchEasy1_32       252ns ± 0%     253ns ± 1%    ~     (p=0.294 n=8+10)
      RegexpMatchEasy1_1K       782ns ± 0%     783ns ± 1%    ~     (p=0.529 n=8+9)
      RegexpMatchMedium_32      316ns ± 0%     316ns ± 0%    ~     (all equal)
      RegexpMatchMedium_1K     51.5µs ± 0%    51.5µs ± 0%    ~     (p=0.645 n=10+9)
      RegexpMatchHard_32       2.75µs ± 0%    2.75µs ± 0%    ~     (all equal)
      RegexpMatchHard_1K       78.7µs ± 0%    78.7µs ± 0%    ~     (p=0.754 n=10+10)
      FmtFprintfEmpty          57.0ns ± 0%    57.0ns ± 0%    ~     (all equal)
      FmtFprintfString          111ns ± 0%     111ns ± 0%    ~     (all equal)
      FmtFprintfInt             114ns ± 0%     114ns ± 1%    ~     (p=0.065 n=9+10)
      FmtFprintfIntInt          182ns ± 0%     178ns ± 0%  -2.20%  (p=0.000 n=10+10)
      FmtFprintfPrefixedInt     225ns ± 0%     227ns ± 0%  +0.89%  (p=0.000 n=10+10)
      FmtFprintfFloat           307ns ± 0%     307ns ± 0%    ~     (p=1.000 n=9+9)
      FmtManyArgs               697ns ± 0%     701ns ± 2%    ~     (p=0.108 n=9+10)
      Gzip                      436ms ± 0%     437ms ± 0%  +0.23%  (p=0.000 n=10+8)
      HTTPClientServer         88.8µs ± 2%    89.6µs ± 1%  +0.98%  (p=0.019 n=10+10)
      JSONEncode               20.1ms ± 1%    20.2ms ± 1%  +0.48%  (p=0.007 n=10+10)
      JSONDecode               94.7ms ± 1%    94.1ms ± 0%  -0.62%  (p=0.000 n=10+9)
      GobDecode                12.6ms ± 2%    12.6ms ± 1%    ~     (p=0.360 n=10+8)
      GobEncode                12.0ms ± 1%    11.9ms ± 1%  -1.34%  (p=0.000 n=10+10)
      Mandelbrot200            5.05ms ± 0%    5.05ms ± 0%  +0.12%  (p=0.000 n=10+10)
      TimeParse                 448ns ± 0%     448ns ± 0%    ~     (p=0.529 n=8+9)
      TimeFormat                501ns ± 1%     501ns ± 1%    ~     (p=1.000 n=10+9)
      Template                 90.6ms ± 0%    89.1ms ± 0%  -1.67%  (p=0.000 n=9+9)
      GoParse                  6.01ms ± 0%    5.96ms ± 0%  -0.83%  (p=0.000 n=10+9)
      BinaryTree17              11.7s ± 0%     11.7s ± 0%    ~     (p=0.481 n=10+10)
      Revcomp                   675ms ± 0%     675ms ± 0%    ~     (p=0.436 n=9+9)
      Fannkuch11                3.26s ± 0%     3.27s ± 1%  +0.57%  (p=0.000 n=10+10)
      [Geo mean]               67.4µs         67.3µs       -0.10%
      
      name                   old speed      new speed      delta
      RegexpMatchEasy0_32     126MB/s ± 0%   126MB/s ± 0%    ~     (p=0.353 n=10+7)
      RegexpMatchEasy0_1K    1.87GB/s ± 0%  1.87GB/s ± 0%    ~     (p=0.275 n=8+10)
      RegexpMatchEasy1_32     127MB/s ± 0%   126MB/s ± 1%    ~     (p=0.110 n=8+10)
      RegexpMatchEasy1_1K    1.31GB/s ± 0%  1.31GB/s ± 1%    ~     (p=0.079 n=8+10)
      RegexpMatchMedium_32   3.16MB/s ± 0%  3.16MB/s ± 0%    ~     (all equal)
      RegexpMatchMedium_1K   19.9MB/s ± 0%  19.9MB/s ± 0%    ~     (p=0.889 n=10+9)
      RegexpMatchHard_32     11.7MB/s ± 0%  11.7MB/s ± 0%    ~     (all equal)
      RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=1.000 n=10+10)
      Gzip                   44.5MB/s ± 0%  44.4MB/s ± 0%  -0.22%  (p=0.000 n=10+8)
      JSONEncode             96.6MB/s ± 1%  96.1MB/s ± 1%  -0.48%  (p=0.007 n=10+10)
      JSONDecode             20.5MB/s ± 1%  20.6MB/s ± 0%  +0.63%  (p=0.000 n=10+9)
      GobDecode              61.0MB/s ± 2%  61.1MB/s ± 1%    ~     (p=0.372 n=10+8)
      GobEncode              63.8MB/s ± 1%  64.7MB/s ± 1%  +1.36%  (p=0.000 n=10+10)
      Template               21.4MB/s ± 0%  21.8MB/s ± 0%  +1.69%  (p=0.000 n=9+9)
      GoParse                9.63MB/s ± 0%  9.71MB/s ± 0%  +0.84%  (p=0.000 n=9+8)
      Revcomp                 377MB/s ± 0%   376MB/s ± 0%    ~     (p=0.399 n=9+9)
      [Geo mean]             56.2MB/s       56.3MB/s       +0.20%
      
      Change-Id: Ic915373f5ef512f9fbc45745860e5db7f6de6286
      Reviewed-on: https://go-review.googlesource.com/97755
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      39fefa07
    • crypto: remove hand encoded amd64 instructions · 93665c0d
      Ilya Tocar authored
      Replace BYTE... encodings with assembler mnemonics. This is possible
      because the assembler now implements more instructions and no longer
      applies the MOV $0, reg -> XOR reg, reg transformation.
      
      Change-Id: I011749ab6b3f64403ab6e746f3760c5841548b57
      Reviewed-on: https://go-review.googlesource.com/97936
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      93665c0d
    • encoding/json: read ahead after value consumption · 5d118386
      Pascal S. de Kloe authored
      Eliminates the need for an extra scanner, read undo and some other tricks.
      
      name                    old time/op    new time/op    delta
      CodeEncoder-12            1.92ms ± 0%    1.91ms ± 1%   -0.65%  (p=0.000 n=17+20)
      CodeMarshal-12            2.13ms ± 2%    2.12ms ± 1%   -0.49%  (p=0.038 n=18+17)
      CodeDecoder-12            8.55ms ± 2%    8.49ms ± 1%     ~     (p=0.119 n=20+18)
      UnicodeDecoder-12          411ns ± 0%     422ns ± 0%   +2.77%  (p=0.000 n=19+15)
      DecoderStream-12           320ns ± 1%     307ns ± 1%   -3.80%  (p=0.000 n=18+20)
      CodeUnmarshal-12          9.65ms ± 3%    9.58ms ± 3%     ~     (p=0.157 n=20+20)
      CodeUnmarshalReuse-12     8.54ms ± 3%    8.56ms ± 2%     ~     (p=0.602 n=20+20)
      UnmarshalString-12         110ns ± 1%      87ns ± 2%  -21.53%  (p=0.000 n=16+20)
      UnmarshalFloat64-12        101ns ± 1%      77ns ± 2%  -23.08%  (p=0.000 n=19+20)
      UnmarshalInt64-12         94.5ns ± 2%    68.4ns ± 1%  -27.60%  (p=0.000 n=20+20)
      Issue10335-12              128ns ± 1%     100ns ± 1%  -21.89%  (p=0.000 n=19+18)
      Unmapped-12                427ns ± 3%     247ns ± 4%  -42.17%  (p=0.000 n=20+20)
      NumberIsValid-12          23.0ns ± 0%    21.7ns ± 0%   -5.73%  (p=0.000 n=20+20)
      NumberIsValidRegexp-12     641ns ± 0%     642ns ± 0%   +0.15%  (p=0.003 n=19+19)
      EncoderEncode-12          56.9ns ± 0%    55.0ns ± 1%   -3.32%  (p=0.012 n=2+17)
      
      name                    old speed      new speed      delta
      CodeEncoder-12          1.01GB/s ± 1%  1.02GB/s ± 1%   +0.71%  (p=0.000 n=18+20)
      CodeMarshal-12           913MB/s ± 2%   917MB/s ± 1%   +0.49%  (p=0.038 n=18+17)
      CodeDecoder-12           227MB/s ± 2%   229MB/s ± 1%     ~     (p=0.110 n=20+18)
      UnicodeDecoder-12       34.1MB/s ± 0%  33.1MB/s ± 0%   -2.73%  (p=0.000 n=19+19)
      CodeUnmarshal-12         201MB/s ± 3%   203MB/s ± 3%     ~     (p=0.151 n=20+20)
      
      name                    old alloc/op   new alloc/op   delta
      Issue10335-12               320B ± 0%      184B ± 0%  -42.50%  (p=0.000 n=20+20)
      Unmapped-12                 568B ± 0%      216B ± 0%  -61.97%  (p=0.000 n=20+20)
      EncoderEncode-12           0.00B          0.00B          ~     (all equal)
      
      name                    old allocs/op  new allocs/op  delta
      Issue10335-12               4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.000 n=20+20)
      Unmapped-12                 18.0 ± 0%       4.0 ± 0%  -77.78%  (p=0.000 n=20+20)
      EncoderEncode-12            0.00           0.00          ~     (all equal)
      
      Fixes #17914
      Updates #20693
      Updates #10335
      
      Change-Id: I0459a52febb8b79c9a2991e69ed2614cf8740429
      Reviewed-on: https://go-review.googlesource.com/47152
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      5d118386
    • math: remove unused variable · c15984c6
      Ilya Tocar authored
      useSSE41 was used inside the asm implementation of floor to select
      between the base and sse4 code paths.
      We intrinsified floor and left the asm functions as a backup for non-sse4 systems.
      This made the variable unused, so remove it.
      
      Change-Id: Ia2633de7c7cb1ef1d5b15a2366b523e481b722d9
      Reviewed-on: https://go-review.googlesource.com/97935
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      c15984c6
    • runtime/trace: skip TestUserTaskSpan upon timestamp error · e75f805e
      Hana Kim authored
      Change-Id: I030baaa0a0abf1e43449faaf676d389a28a868a3
      Reviewed-on: https://go-review.googlesource.com/97857
      Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
      Reviewed-by: Peter Weinberger <pjw@google.com>
      e75f805e
    • test: implement negative rules in asmcheck · f16cc298
      Giovanni Bajo authored
      Change-Id: I2b507e35cc314100eaf2ec2d1e5107cc2fc9e7cf
      Reviewed-on: https://go-review.googlesource.com/97818
      Reviewed-by: Keith Randall <khr@golang.org>
      f16cc298
    • test: in asmcheck, regexp must match from beginning of line · 0bcf8bcd
      Giovanni Bajo authored
      This avoids simple bugs like "ADD" matching "FADD". Obviously
      "ADD" will still match "ADDQ" so some care is still required
      in this regard, but at least a first class of possible errors
      is taken care of.
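
      The effect can be sketched with the regexp package. matchesOpcode is
      a hypothetical helper, and prefixing the pattern with ^ is this
      sketch's assumption about how the anchoring is achieved; asmcheck's
      actual implementation may differ:

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesOpcode reports whether line begins with a match for pattern.
// The pattern is wrapped in a non-capturing group so that the ^ anchor
// applies to every alternative, not just the first one.
func matchesOpcode(pattern, line string) bool {
	return regexp.MustCompile("^(?:" + pattern + ")").MatchString(line)
}

func main() {
	unanchored := regexp.MustCompile("ADD")
	fmt.Println(unanchored.MatchString("FADD F0, F1")) // true: matches inside FADD
	fmt.Println(matchesOpcode("ADD", "FADD F0, F1"))   // false: FADD does not start with ADD
	fmt.Println(matchesOpcode("ADD", "ADDQ AX, BX"))   // true: ADD is still a prefix of ADDQ
}
```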
      
      Change-Id: I7deb04c31de30bedac9c026d9889ace4a1d2adcb
      Reviewed-on: https://go-review.googlesource.com/97817
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
      Reviewed-by: Keith Randall <khr@golang.org>
      0bcf8bcd
    • test: improve asmcheck syntax · 879a1ff1
      Giovanni Bajo authored
      asmcheck comments now support a compact form of specifying
      multiple checks for each platform, using the following syntax:
      
         amd64:"SHL\t[$]4","SHR\t[$]4"
      
      Negative checks are also parsed using the following syntax:
      
         amd64:-"ROR"
      
      though they are still not working.
      
      Moreover, out-of-line comments have been implemented. This
      allows specifying asmchecks on comment-only lines, which are
      matched against the first subsequent non-comment, non-empty line.
      
          // amd64:"XOR"
          // arm:"EOR"
      
          x ^= 1
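
      A rough sketch of how such a comment could be split into an
      architecture and a list of positive or negative patterns. The check
      type, checkRx and parseChecks are hypothetical names; this is not
      the asmcheck parser, and it does not handle patterns that contain
      escaped double quotes:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// check is one parsed pattern (hypothetical type for this sketch).
type check struct {
	pattern  string
	negative bool // true when the pattern was written as -"..."
}

// checkRx matches an optional leading '-' followed by a quoted pattern.
var checkRx = regexp.MustCompile(`(-?)"([^"]*)"`)

// parseChecks splits text such as amd64:"A","B" into the architecture
// name and its checks. ok is false if no arch or no pattern is found.
func parseChecks(text string) (arch string, checks []check, ok bool) {
	i := strings.IndexByte(text, ':')
	if i <= 0 {
		return "", nil, false
	}
	arch = text[:i]
	for _, m := range checkRx.FindAllStringSubmatch(text[i+1:], -1) {
		checks = append(checks, check{pattern: m[2], negative: m[1] == "-"})
	}
	return arch, checks, len(checks) > 0
}

func main() {
	arch, checks, ok := parseChecks(`amd64:"SHL\t[$]4","SHR\t[$]4"`)
	fmt.Println(arch, len(checks), ok)
	arch, checks, ok = parseChecks(`amd64:-"ROR"`)
	fmt.Println(arch, checks[0].negative, ok)
}
```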
      
      Change-Id: I110c7462fc6a5c70fd4af0d42f516016ae7f2760
      Reviewed-on: https://go-review.googlesource.com/97816
      Reviewed-by: Keith Randall <khr@golang.org>
      879a1ff1
    • runtime: don't allocate to build strings of length 1 · 9372e3f5
      Josh Bleecher Snyder authored
      Use staticbytes instead.
      Instrumenting make.bash shows approx 0.5%
      of all slicebytetostrings have a buffer of length 1.
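
      The technique can be sketched in user code as a lookup table of
      preallocated one-byte strings. oneByte and bytesToString are
      hypothetical names standing in for the runtime's staticbytes table
      and conversion routine; the runtime's actual layout may differ:

```go
package main

import "fmt"

// oneByte holds a preallocated string for every possible byte value.
// It is built once at startup, so later lookups allocate nothing.
var oneByte [256]string

func init() {
	for i := range oneByte {
		oneByte[i] = string([]byte{byte(i)})
	}
}

// bytesToString converts b to a string, returning a shared
// preallocated string when b has exactly one byte.
func bytesToString(b []byte) string {
	if len(b) == 1 {
		return oneByte[b[0]]
	}
	return string(b)
}

func main() {
	fmt.Println(bytesToString([]byte("x")), bytesToString([]byte("xyz"))) // x xyz
}
```

      Sharing is safe because Go strings are immutable: every caller that
      converts the same single byte can receive the same backing data.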
      
      name                     old time/op  new time/op  delta
      SliceByteToString/1-8    14.1ns ± 1%   4.1ns ± 1%  -71.13%  (p=0.000 n=17+20)
      SliceByteToString/2-8    15.5ns ± 2%  15.5ns ± 1%     ~     (p=0.061 n=20+18)
      SliceByteToString/4-8    14.9ns ± 1%  15.0ns ± 2%   +1.25%  (p=0.000 n=20+20)
      SliceByteToString/8-8    17.1ns ± 1%  17.5ns ± 1%   +2.16%  (p=0.000 n=19+19)
      SliceByteToString/16-8   23.6ns ± 1%  23.9ns ± 1%   +1.41%  (p=0.000 n=20+18)
      SliceByteToString/32-8   26.0ns ± 1%  25.8ns ± 0%   -1.05%  (p=0.000 n=19+16)
      SliceByteToString/64-8   30.0ns ± 0%  30.2ns ± 0%   +0.56%  (p=0.000 n=16+18)
      SliceByteToString/128-8  38.9ns ± 0%  39.0ns ± 0%   +0.23%  (p=0.019 n=19+15)
      
      Fixes #24172
      
      Change-Id: I3dfa14eefbf9fb4387114e20c9cb40e186abe962
      Reviewed-on: https://go-review.googlesource.com/97717
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      9372e3f5
    • runtime: fix amd64p32 indexbytes in presence of overflow · aa9c1a8f
      Josh Bleecher Snyder authored
      When the slice/string length is very large,
      probably artificially large as in CL 97523,
      adding BX (length) to R11 (pointer) overflows.
      As a result, checking DI < R11 yields the wrong result.
      Since they will be equal when the loop is done,
      just check DI != R11 instead.
      Yes, the pointer itself could overflow, but if that happens,
      something else has gone pretty wrong; not our concern here.
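
      The failure mode can be modelled with 32-bit unsigned arithmetic. In
      this sketch p plays the role of the scan pointer and end the role of
      pointer+length; the function names and the found parameter (the step
      at which the wanted byte would be hit) are hypothetical, and no
      memory is actually read:

```go
package main

import "fmt"

// stepsLess counts iterations of a scan whose loop condition is p < end.
func stepsLess(start, length, found uint32) uint32 {
	end := start + length // wraps around when length is huge
	var steps uint32
	for p := start; p < end && steps < found; p++ {
		steps++
	}
	return steps
}

// stepsNotEqual counts iterations of the same scan with p != end.
func stepsNotEqual(start, length, found uint32) uint32 {
	end := start + length // same wraparound, but only equality is tested
	var steps uint32
	for p := start; p != end && steps < found; p++ {
		steps++
	}
	return steps
}

func main() {
	// Artificially large length: start+length wraps to a value below start.
	fmt.Println(stepsLess(0xFFFF0000, 0xFFFFFFFF, 10))     // 0: the scan never runs
	fmt.Println(stepsNotEqual(0xFFFF0000, 0xFFFFFFFF, 10)) // 10: the scan reaches the byte
	// Ordinary length: both conditions agree.
	fmt.Println(stepsLess(0x1000, 4, 10), stepsNotEqual(0x1000, 4, 10)) // 4 4
}
```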
      
      Fixes #24187
      
      Change-Id: I2f60fc6ccae739345d01bc80528560726ad4f8c6
      Reviewed-on: https://go-review.googlesource.com/97802
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      aa9c1a8f
    • cmd/compile/internal/ssa: combine consecutive LittleEndian stores on arm64 · 77ba071e
      Chad Rosier authored
      This optimization mirrors that which is already implemented for AMD64.  The
      optimization specifically targets the binary.LittleEndian.PutUint* functions.
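
      As an illustration, a PutUint64-style function issues eight adjacent
      single-byte stores in increasing address order, which on a
      little-endian target describe the same memory contents as one 8-byte
      store. putUint64LE is a hypothetical stand-in, and the sketch does
      not show what any particular compiler version emits:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putUint64LE writes v into b[0:8], least significant byte first.
func putUint64LE(b []byte, v uint64) {
	_ = b[7] // one bounds check up front covers the eight stores below
	b[0] = byte(v)
	b[1] = byte(v >> 8)
	b[2] = byte(v >> 16)
	b[3] = byte(v >> 24)
	b[4] = byte(v >> 32)
	b[5] = byte(v >> 40)
	b[6] = byte(v >> 48)
	b[7] = byte(v >> 56)
}

func main() {
	buf := make([]byte, 8)
	putUint64LE(buf, 0x0102030405060708)
	// Reading the bytes back with the standard library round-trips the value.
	fmt.Printf("%#x\n", binary.LittleEndian.Uint64(buf))
}
```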
      
      encoding/binary results on Amberwing:
      name                   old time/op    new time/op    delta
      ReadSlice1000Int32s      9.67µs ± 1%    9.64µs ± 1%     ~     (p=0.185 n=9+9)
      ReadStruct               5.24µs ± 2%    5.36µs ± 2%   +2.24%  (p=0.002 n=10+8)
      ReadInts                 8.69µs ± 5%    8.88µs ± 5%     ~     (p=0.083 n=10+10)
      WriteInts                3.90µs ±10%    3.71µs ± 9%     ~     (p=0.077 n=10+10)
      WriteSlice1000Int32s     10.9µs ± 1%    10.9µs ± 1%     ~     (p=0.701 n=9+9)
      PutUint16                 572ns ±14%     505ns ±11%  -11.75%  (p=0.006 n=9+10)
      PutUint32                 550ns ±18%     540ns ±11%     ~     (p=0.692 n=10+10)
      PutUint64                 565ns ±15%     540ns ±17%     ~     (p=0.248 n=10+10)
      LittleEndianPutUint16     540ns ±11%     500ns ±10%     ~     (p=0.094 n=10+10)
      LittleEndianPutUint32     520ns ±15%     480ns ±15%     ~     (p=0.087 n=10+10)
      LittleEndianPutUint64     505ns ±29%     470ns ±17%     ~     (p=0.208 n=10+10)
      PutUvarint32              700ns ±21%     635ns ±10%   -9.29%  (p=0.028 n=10+10)
      PutUvarint64              740ns ± 8%     740ns ± 8%     ~     (p=0.713 n=10+10)
      [Geo mean]               1.53µs         1.47µs        -3.93%
      
      name                   old speed      new speed      delta
      ReadSlice1000Int32s     414MB/s ± 1%   415MB/s ± 1%     ~     (p=0.185 n=9+9)
      ReadStruct             14.3MB/s ± 2%  14.0MB/s ± 2%   -2.21%  (p=0.000 n=10+8)
      ReadInts               3.45MB/s ± 4%  3.38MB/s ± 6%     ~     (p=0.085 n=10+10)
      WriteInts              7.71MB/s ± 9%  8.09MB/s ± 8%   +4.93%  (p=0.048 n=10+10)
      WriteSlice1000Int32s    367MB/s ± 1%   366MB/s ± 1%     ~     (p=0.701 n=9+9)
      PutUint16              3.51MB/s ±14%  3.99MB/s ±11%  +13.47%  (p=0.009 n=9+10)
      PutUint32              7.35MB/s ±21%  7.44MB/s ±10%     ~     (p=0.692 n=10+10)
      PutUint64              14.3MB/s ±14%  15.0MB/s ±19%     ~     (p=0.248 n=10+10)
      LittleEndianPutUint16  3.72MB/s ±11%  4.03MB/s ±10%     ~     (p=0.094 n=10+10)
      LittleEndianPutUint32  7.75MB/s ±15%  8.39MB/s ±13%     ~     (p=0.087 n=10+10)
      LittleEndianPutUint64  16.1MB/s ±23%  17.2MB/s ±16%     ~     (p=0.208 n=10+10)
      PutUvarint32           5.76MB/s ±18%  6.32MB/s ±10%   +9.72%  (p=0.028 n=10+10)
      PutUvarint64           10.8MB/s ± 8%  10.8MB/s ± 8%     ~     (p=0.713 n=10+10)
      [Geo mean]             13.7MB/s       14.3MB/s        +4.02%
      
      go1 results on Amberwing:
      name                   old time/op    new time/op    delta
      RegexpMatchEasy0_32       249ns ± 0%     249ns ± 0%    ~     (p=0.087 n=10+10)
      RegexpMatchEasy0_1K       584ns ± 0%     584ns ± 0%    ~     (all equal)
      RegexpMatchEasy1_32       246ns ± 0%     246ns ± 0%    ~     (p=1.000 n=10+10)
      RegexpMatchEasy1_1K       806ns ± 0%     806ns ± 0%    ~     (p=0.706 n=10+9)
      RegexpMatchMedium_32      314ns ± 0%     314ns ± 0%    ~     (all equal)
      RegexpMatchMedium_1K     52.1µs ± 0%    52.1µs ± 0%    ~     (p=0.245 n=10+8)
      RegexpMatchHard_32       2.75µs ± 1%    2.75µs ± 1%    ~     (p=0.690 n=10+10)
      RegexpMatchHard_1K       78.9µs ± 0%    78.9µs ± 1%    ~     (p=0.295 n=9+9)
      FmtFprintfEmpty          58.5ns ± 0%    58.5ns ± 0%    ~     (all equal)
      FmtFprintfString          112ns ± 0%     112ns ± 0%    ~     (all equal)
      FmtFprintfInt             117ns ± 0%     116ns ± 0%  -0.85%  (p=0.000 n=10+10)
      FmtFprintfIntInt          181ns ± 0%     181ns ± 0%    ~     (all equal)
      FmtFprintfPrefixedInt     222ns ± 0%     224ns ± 0%  +0.90%  (p=0.000 n=9+10)
      FmtFprintfFloat           318ns ± 1%     322ns ± 0%    ~     (p=0.059 n=10+8)
      FmtManyArgs               736ns ± 1%     735ns ± 0%    ~     (p=0.206 n=9+9)
      Gzip                      437ms ± 0%     436ms ± 0%  -0.25%  (p=0.000 n=10+10)
      HTTPClientServer         89.8µs ± 1%    90.2µs ± 2%    ~     (p=0.393 n=10+10)
      JSONEncode               20.1ms ± 1%    20.2ms ± 1%    ~     (p=0.065 n=9+10)
      JSONDecode               94.2ms ± 1%    93.9ms ± 1%  -0.42%  (p=0.043 n=10+10)
      GobDecode                12.7ms ± 1%    12.8ms ± 2%  +0.94%  (p=0.019 n=10+10)
      GobEncode                12.1ms ± 0%    12.1ms ± 0%    ~     (p=0.052 n=10+10)
      Mandelbrot200            5.06ms ± 0%    5.05ms ± 0%  -0.04%  (p=0.000 n=9+10)
      TimeParse                 450ns ± 3%     446ns ± 0%    ~     (p=0.238 n=10+9)
      TimeFormat                485ns ± 1%     483ns ± 1%    ~     (p=0.073 n=10+10)
      Template                 90.4ms ± 0%    90.7ms ± 0%  +0.29%  (p=0.000 n=8+10)
      GoParse                  6.01ms ± 0%    6.03ms ± 0%  +0.35%  (p=0.000 n=10+10)
      BinaryTree17              11.7s ± 0%     11.7s ± 0%    ~     (p=0.481 n=10+10)
      Revcomp                   669ms ± 0%     669ms ± 0%    ~     (p=0.315 n=10+10)
      Fannkuch11                3.40s ± 0%     3.37s ± 0%  -0.92%  (p=0.000 n=10+10)
      [Geo mean]               67.9µs         67.9µs       +0.02%
      
      name                   old speed      new speed      delta
      RegexpMatchEasy0_32     128MB/s ± 0%   128MB/s ± 0%  -0.08%  (p=0.003 n=8+10)
      RegexpMatchEasy0_1K    1.75GB/s ± 0%  1.75GB/s ± 0%    ~     (p=0.642 n=8+10)
      RegexpMatchEasy1_32     130MB/s ± 0%   130MB/s ± 0%    ~     (p=0.690 n=10+9)
      RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%    ~     (p=0.661 n=10+9)
      RegexpMatchMedium_32   3.18MB/s ± 0%  3.18MB/s ± 0%    ~     (all equal)
      RegexpMatchMedium_1K   19.7MB/s ± 0%  19.6MB/s ± 0%    ~     (p=0.190 n=10+9)
      RegexpMatchHard_32     11.6MB/s ± 0%  11.6MB/s ± 1%    ~     (p=0.669 n=10+10)
      RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=0.718 n=9+9)
      Gzip                   44.4MB/s ± 0%  44.5MB/s ± 0%  +0.24%  (p=0.000 n=10+10)
      JSONEncode             96.5MB/s ± 1%  96.1MB/s ± 1%    ~     (p=0.065 n=9+10)
      JSONDecode             20.6MB/s ± 1%  20.7MB/s ± 1%  +0.42%  (p=0.041 n=10+10)
      GobDecode              60.6MB/s ± 1%  60.0MB/s ± 2%  -0.92%  (p=0.016 n=10+10)
      GobEncode              63.4MB/s ± 0%  63.6MB/s ± 0%    ~     (p=0.055 n=10+10)
      Template               21.5MB/s ± 0%  21.4MB/s ± 0%  -0.30%  (p=0.000 n=9+10)
      GoParse                9.64MB/s ± 0%  9.61MB/s ± 0%  -0.36%  (p=0.000 n=10+10)
      Revcomp                 380MB/s ± 0%   380MB/s ± 0%    ~     (p=0.323 n=10+10)
      [Geo mean]             56.0MB/s       55.9MB/s       -0.07%
      
      Change-Id: I79a4978d42d01a5f72ed5ceec07f5e78ac6b3859
      Reviewed-on: https://go-review.googlesource.com/97175
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      77ba071e
    • bytes: add asm version of Index for short strings on arm64 · 562346b7
      Wei Xiao authored
      Currently we have a special case for 1-byte strings;
      this extends it to strings shorter than 9 bytes on arm64.
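
      One reason patterns of at most 8 bytes are attractive is that the
      whole pattern fits in a single 64-bit register. The portable sketch
      below uses that property with a sliding 64-bit window. indexShort is
      a hypothetical helper and is not the arm64 assembly, whose strategy
      is not described in this message:

```go
package main

import (
	"bytes"
	"fmt"
)

// indexShort returns the index of the first occurrence of sep in s,
// or -1. Patterns of 1 to 8 bytes are packed into a uint64 and compared
// against a sliding window; other cases fall back to bytes.Index.
func indexShort(s, sep []byte) int {
	n := len(sep)
	if n == 0 || n > 8 || n > len(s) {
		return bytes.Index(s, sep)
	}
	// mask keeps only the low n bytes of the window.
	mask := ^uint64(0) >> (64 - 8*uint(n))
	var want, have uint64
	for i := 0; i < n; i++ {
		want = want<<8 | uint64(sep[i])
		have = have<<8 | uint64(s[i])
	}
	if have == want {
		return 0
	}
	for i := n; i < len(s); i++ {
		// Shift in the next byte and drop the oldest one.
		have = (have<<8 | uint64(s[i])) & mask
		if have == want {
			return i - n + 1
		}
	}
	return -1
}

func main() {
	fmt.Println(indexShort([]byte("hello world"), []byte("wor"))) // 6
}
```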
      
      Benchmark results:
      name                              old time/op    new time/op    delta
      IndexByte/10-32                     18.6ns ± 0%    18.1ns ± 0%    -2.69%  (p=0.008 n=5+5)
      IndexByte/32-32                     16.8ns ± 1%    16.9ns ± 1%      ~     (p=0.762 n=5+5)
      IndexByte/4K-32                      464ns ± 0%     464ns ± 0%      ~     (all equal)
      IndexByte/4M-32                      528µs ± 1%     506µs ± 1%    -4.17%  (p=0.008 n=5+5)
      IndexByte/64M-32                    18.7ms ± 0%    18.7ms ± 1%      ~     (p=0.730 n=4+5)
      IndexBytePortable/10-32             33.8ns ± 0%    34.9ns ± 3%      ~     (p=0.167 n=5+5)
      IndexBytePortable/32-32             65.3ns ± 0%    66.1ns ± 2%      ~     (p=0.444 n=5+5)
      IndexBytePortable/4K-32             5.88µs ± 0%    5.88µs ± 0%      ~     (p=0.325 n=5+5)
      IndexBytePortable/4M-32             6.03ms ± 0%    6.03ms ± 0%      ~     (p=1.000 n=5+5)
      IndexBytePortable/64M-32            98.8ms ± 0%    98.9ms ± 0%    +0.10%  (p=0.008 n=5+5)
      IndexRune/10-32                     57.7ns ± 0%    49.2ns ± 0%   -14.73%  (p=0.000 n=5+4)
      IndexRune/32-32                     57.7ns ± 0%    58.6ns ± 0%    +1.56%  (p=0.008 n=5+5)
      IndexRune/4K-32                      511ns ± 0%     513ns ± 0%    +0.39%  (p=0.008 n=5+5)
      IndexRune/4M-32                      527µs ± 1%     527µs ± 1%      ~     (p=0.690 n=5+5)
      IndexRune/64M-32                    18.7ms ± 0%    18.7ms ± 1%      ~     (p=0.190 n=4+5)
      IndexRuneASCII/10-32                23.8ns ± 0%    23.8ns ± 0%      ~     (all equal)
      IndexRuneASCII/32-32                24.3ns ± 0%    24.3ns ± 0%      ~     (all equal)
      IndexRuneASCII/4K-32                 468ns ± 0%     468ns ± 0%      ~     (all equal)
      IndexRuneASCII/4M-32                 521µs ± 1%     531µs ± 2%    +1.91%  (p=0.016 n=5+5)
      IndexRuneASCII/64M-32               18.6ms ± 1%    18.5ms ± 0%      ~     (p=0.730 n=5+4)
      Index/10-32                         89.1ns ±13%    25.2ns ± 0%   -71.72%  (p=0.008 n=5+5)
      Index/32-32                          225ns ± 2%     226ns ± 3%      ~     (p=0.683 n=5+5)
      Index/4K-32                         11.9µs ± 0%    11.8µs ± 0%    -0.22%  (p=0.008 n=5+5)
      Index/4M-32                         12.1ms ± 0%    12.1ms ± 0%      ~     (p=0.548 n=5+5)
      Index/64M-32                         197ms ± 0%     197ms ± 0%      ~     (p=0.690 n=5+5)
      IndexEasy/10-32                     46.2ns ± 0%    22.1ns ± 8%   -52.16%  (p=0.008 n=5+5)
      IndexEasy/32-32                     46.2ns ± 0%    47.2ns ± 0%    +2.16%  (p=0.008 n=5+5)
      IndexEasy/4K-32                      499ns ± 0%     502ns ± 0%    +0.44%  (p=0.008 n=5+5)
      IndexEasy/4M-32                      529µs ± 2%     529µs ± 1%      ~     (p=0.841 n=5+5)
      IndexEasy/64M-32                    18.6ms ± 1%    18.7ms ± 1%      ~     (p=0.222 n=5+5)
      IndexAnyASCII/1:1-32                15.7ns ± 0%    15.7ns ± 0%      ~     (all equal)
      IndexAnyASCII/1:2-32                17.2ns ± 0%    17.2ns ± 0%      ~     (all equal)
      IndexAnyASCII/1:4-32                20.0ns ± 0%    20.0ns ± 0%      ~     (all equal)
      IndexAnyASCII/1:8-32                34.8ns ± 0%    34.8ns ± 0%      ~     (all equal)
      IndexAnyASCII/1:16-32               48.1ns ± 0%    48.1ns ± 0%      ~     (all equal)
      IndexAnyASCII/16:1-32               97.9ns ± 1%    97.7ns ± 0%      ~     (p=0.857 n=5+5)
      IndexAnyASCII/16:2-32                102ns ± 0%     102ns ± 0%      ~     (all equal)
      IndexAnyASCII/16:4-32                116ns ± 1%     116ns ± 1%      ~     (p=1.000 n=5+5)
      IndexAnyASCII/16:8-32                141ns ± 1%     141ns ± 0%      ~     (p=0.571 n=5+4)
      IndexAnyASCII/16:16-32               178ns ± 0%     178ns ± 0%      ~     (all equal)
      IndexAnyASCII/256:1-32              1.09µs ± 0%    1.09µs ± 0%      ~     (all equal)
      IndexAnyASCII/256:2-32              1.09µs ± 0%    1.10µs ± 0%    +0.27%  (p=0.008 n=5+5)
      IndexAnyASCII/256:4-32              1.11µs ± 0%    1.11µs ± 0%      ~     (p=0.397 n=5+5)
      IndexAnyASCII/256:8-32              1.10µs ± 0%    1.10µs ± 0%      ~     (p=0.444 n=5+5)
      IndexAnyASCII/256:16-32             1.14µs ± 0%    1.14µs ± 0%      ~     (all equal)
      IndexAnyASCII/4096:1-32             16.5µs ± 0%    16.5µs ± 0%      ~     (p=1.000 n=5+5)
      IndexAnyASCII/4096:2-32             17.0µs ± 0%    17.0µs ± 0%      ~     (p=0.159 n=5+4)
      IndexAnyASCII/4096:4-32             17.1µs ± 0%    17.1µs ± 0%      ~     (p=0.921 n=4+5)
      IndexAnyASCII/4096:8-32             16.5µs ± 0%    16.5µs ± 0%      ~     (p=0.460 n=5+5)
      IndexAnyASCII/4096:16-32            16.5µs ± 0%    16.5µs ± 0%      ~     (p=0.794 n=5+4)
      IndexPeriodic/IndexPeriodic2-32      189µs ± 0%     189µs ± 0%      ~     (p=0.841 n=5+5)
      IndexPeriodic/IndexPeriodic4-32      189µs ± 0%     189µs ± 0%    -0.03%  (p=0.016 n=5+4)
      IndexPeriodic/IndexPeriodic8-32      189µs ± 0%     189µs ± 0%      ~     (p=0.651 n=5+5)
      IndexPeriodic/IndexPeriodic16-32     175µs ± 9%     174µs ± 7%      ~     (p=1.000 n=5+5)
      IndexPeriodic/IndexPeriodic32-32    75.1µs ± 0%    75.1µs ± 0%      ~     (p=0.690 n=5+5)
      IndexPeriodic/IndexPeriodic64-32    42.6µs ± 0%    44.7µs ± 0%    +4.98%  (p=0.008 n=5+5)
      
      name                              old speed      new speed      delta
      IndexByte/10-32                    538MB/s ± 0%   552MB/s ± 0%    +2.65%  (p=0.008 n=5+5)
      IndexByte/32-32                   1.90GB/s ± 1%  1.90GB/s ± 1%      ~     (p=0.548 n=5+5)
      IndexByte/4K-32                   8.82GB/s ± 0%  8.81GB/s ± 0%      ~     (p=0.548 n=5+5)
      IndexByte/4M-32                   7.95GB/s ± 1%  8.29GB/s ± 1%    +4.35%  (p=0.008 n=5+5)
      IndexByte/64M-32                  3.58GB/s ± 0%  3.60GB/s ± 1%      ~     (p=0.730 n=4+5)
      IndexBytePortable/10-32            296MB/s ± 0%   286MB/s ± 3%      ~     (p=0.381 n=4+5)
      IndexBytePortable/32-32            490MB/s ± 0%   485MB/s ± 2%      ~     (p=0.286 n=5+5)
      IndexBytePortable/4K-32            697MB/s ± 0%   697MB/s ± 0%      ~     (p=0.413 n=5+5)
      IndexBytePortable/4M-32            696MB/s ± 0%   695MB/s ± 0%      ~     (p=0.897 n=5+5)
      IndexBytePortable/64M-32           679MB/s ± 0%   678MB/s ± 0%    -0.10%  (p=0.008 n=5+5)
      IndexRune/10-32                    173MB/s ± 0%   203MB/s ± 0%   +17.24%  (p=0.016 n=5+4)
      IndexRune/32-32                    555MB/s ± 0%   546MB/s ± 0%    -1.62%  (p=0.008 n=5+5)
      IndexRune/4K-32                   8.01GB/s ± 0%  7.98GB/s ± 0%    -0.38%  (p=0.008 n=5+5)
      IndexRune/4M-32                   7.97GB/s ± 1%  7.95GB/s ± 1%      ~     (p=0.690 n=5+5)
      IndexRune/64M-32                  3.59GB/s ± 0%  3.58GB/s ± 1%      ~     (p=0.190 n=4+5)
      IndexRuneASCII/10-32               420MB/s ± 0%   420MB/s ± 0%      ~     (p=0.190 n=5+4)
      IndexRuneASCII/32-32              1.32GB/s ± 0%  1.32GB/s ± 0%      ~     (p=0.333 n=5+5)
      IndexRuneASCII/4K-32              8.75GB/s ± 0%  8.75GB/s ± 0%      ~     (p=0.690 n=5+5)
      IndexRuneASCII/4M-32              8.04GB/s ± 1%  7.89GB/s ± 2%    -1.87%  (p=0.016 n=5+5)
      IndexRuneASCII/64M-32             3.61GB/s ± 1%  3.62GB/s ± 0%      ~     (p=0.730 n=5+4)
      Index/10-32                        113MB/s ±14%   397MB/s ± 0%  +249.76%  (p=0.008 n=5+5)
      Index/32-32                        142MB/s ± 2%   141MB/s ± 3%      ~     (p=0.794 n=5+5)
      Index/4K-32                        345MB/s ± 0%   346MB/s ± 0%    +0.22%  (p=0.008 n=5+5)
      Index/4M-32                        345MB/s ± 0%   345MB/s ± 0%      ~     (p=0.619 n=5+5)
      Index/64M-32                       341MB/s ± 0%   341MB/s ± 0%      ~     (p=0.595 n=5+5)
      IndexEasy/10-32                    216MB/s ± 0%   453MB/s ± 8%  +109.60%  (p=0.008 n=5+5)
      IndexEasy/32-32                    692MB/s ± 0%   678MB/s ± 0%    -2.01%  (p=0.008 n=5+5)
      IndexEasy/4K-32                   8.19GB/s ± 0%  8.16GB/s ± 0%    -0.45%  (p=0.008 n=5+5)
      IndexEasy/4M-32                   7.93GB/s ± 2%  7.93GB/s ± 1%      ~     (p=0.841 n=5+5)
      IndexEasy/64M-32                  3.60GB/s ± 1%  3.59GB/s ± 1%      ~     (p=0.222 n=5+5)
      
      Change-Id: I4ca69378a2df6f9ba748c6a2706953ee1bd07343
      Reviewed-on: https://go-review.googlesource.com/96555
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      562346b7
    • Marcel van Lohuizen's avatar
      testing: gracefully handle subtest failing parent’s T · 4c1aff87
      Marcel van Lohuizen authored
      Don’t panic if a subtest inadvertently calls FailNow
      on a parent’s T.  Instead, report the offending subtest
      while still reporting the error with the ancestor test and
      keep exiting goroutines.
      
      Note that this implementation has a race if parallel
      subtests are failing the parent concurrently.
      This is fine:
      Calling FailNow on a parent is considered an error
      in principle, at the moment, and is reported if it is
      detected. Having the race allows the race detector
      to detect the error as well.
      
      Fixes #22882
      
      Change-Id: Ifa6d5e55bb88f6bcbb562fc8c99f1f77e320015a
      Reviewed-on: https://go-review.googlesource.com/97635
      Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
      Reviewed-by: Kunpei Sakai <namusyaka@gmail.com>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      4c1aff87
    • Giovanni Bajo's avatar
      test: add support for code generation tests (asmcheck) · c9438cb1
      Giovanni Bajo authored
      The top-level test harness is modified to support a new kind
      of test: "asmcheck". This is meant to replace asm_test.go
      as an easier and more readable way to test code generation.
      
      I've added a couple of codegen tests to get initial feedback
      on the syntax. I've created them under a common "codegen"
      subdirectory, so that it's easier to run them all with
      "go run run.go -v codegen".
      
      The asmcheck syntax allows inserting line comments that
      specify a regular expression to match in the assembly code,
      for multiple architectures (the test suite automatically builds
      each test file multiple times, once per mentioned architecture).
      
      Negative matches are unsupported for now, so this cannot fully
      replace asm_test yet.
      
      Change-Id: Ifdbba389f01d55e63e73c99e5f5449e642101d55
      Reviewed-on: https://go-review.googlesource.com/97355
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
      c9438cb1
    • Tobias Klauser's avatar
      runtime: clean up libc_* definitions on Solaris · c7c01efd
      Tobias Klauser authored
      All functions defined in syscall2_solaris.go have the respective libc_*
      var in syscall_solaris.go, except for libc_close. Move it from
      os3_solaris.go.
      
      Remove unused libc_fstat.
      
      Order go:cgo_import_dynamic and go:linkname lists in
      syscall2_solaris.go alphabetically.
      
      Change-Id: I9f12fa473cf1ae351448ac45597c82a67d799c31
      Reviewed-on: https://go-review.googlesource.com/97736
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      c7c01efd