1. 27 Feb, 2018 8 commits
    • runtime: improve 386/amd64 systemstack · c5d6c42d
      Josh Bleecher Snyder authored
      Minor improvements, noticed while investigating other things.
      
      Shorten the prologue.
      
      Make branch direction better for static branch prediction;
      the most common case by far is switching stacks (g==curg).
      
      Change-Id: Ib2211d3efecb60446355cda56194221ccb78057d
      Reviewed-on: https://go-review.googlesource.com/97377
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • go/doc: replace unexported values with underscore if necessary · f399af31
      Joe Tsai authored
      When a var or const declaration contains a mixture of exported and unexported
      identifiers, replace the unexported identifiers with underscore.
      Otherwise, the number of names on the LHS may not match the number of
      values on the RHS, or the declaration may no longer line up with an
      iota from an earlier line.
      
      Fixes #22426
      
      Change-Id: Icd5fb81b4ece647232a9f7d05cb140227091e9cb
      Reviewed-on: https://go-review.googlesource.com/94877
      Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • math: optimize sinh and cosh · ed6c6c9c
      erifan01 authored
      Improve performance by removing unnecessary function calls.
      
      Benchmarks:
      
      name   old time/op  new time/op  delta
      Cosh-8   229ns ± 0%   138ns ± 0%  -39.74%  (p=0.008 n=5+5)
      Sinh-8   231ns ± 0%   139ns ± 0%  -39.83%  (p=0.008 n=5+5)
      
      Change-Id: Icab5485849bbfaafca8429d06b67c558101f4f3c
      Reviewed-on: https://go-review.googlesource.com/85477
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • runtime: short-circuit typedmemmove when dst==src · 486caa26
      Josh Bleecher Snyder authored
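
The change amounts to an early return when source and destination are the same address, since copying a value onto itself cannot change it. A generic Go sketch of the idea (requires Go 1.18 generics); `typedMove` and the `moves` counter are hypothetical stand-ins for the runtime function, which also handles GC write barriers for pointer-containing types that this sketch does not model.

```go
package main

import "fmt"

// moves counts copies that were actually performed (for illustration only).
var moves int

// typedMove is a hypothetical stand-in for a typed memory move.
func typedMove[T any](dst, src *T) {
	if dst == src {
		return // same address: the copy would be a no-op
	}
	moves++
	*dst = *src
}

func main() {
	a := [4]int{1, 2, 3, 4}
	var b [4]int
	typedMove(&a, &a)     // same address: returns immediately
	typedMove(&b, &a)     // different addresses: copies
	fmt.Println(b, moves) // [1 2 3 4] 1
}
```
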
      Change-Id: I855268a4c0d07ad602ec90f5da66422d3d87c5f2
      Reviewed-on: https://go-review.googlesource.com/94595
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: fix bit-test rules for highest bit · 68def820
      Giovanni Bajo authored
      Bit-test rules failed to match the highest bit of a word
      because operands in SSA are signed int64. Fix
      them by treating them as unsigned (and correctly handling
      32-bit operands as well).
      
      Tests will be added in the next CL.
      
      Change-Id: I491c4e88e7e2f87e9bb72bd0d9fa5d4025b90736
      Reviewed-on: https://go-review.googlesource.com/94765
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: fold bit masking on bits that have been shifted away · 098208a0
      Giovanni Bajo authored
      Spotted while working on #18943; the rule triggers once during bootstrap.
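
As an illustration (the exact rule shapes in the CL are not shown here and are assumed), a mask that only clears bits a following shift discards anyway is redundant, so `(x &^ 0xFF) >> 8` folds to `x >> 8` and `(x & (1<<56 - 1)) << 8` folds to `x << 8`. The sketch checks those identities on a few sample values.

```go
package main

import "fmt"

// The mask clears the low 8 bits, which the right shift discards anyway.
func rightShiftMasked(x uint64) uint64 { return (x &^ 0xFF) >> 8 }
func rightShiftPlain(x uint64) uint64  { return x >> 8 }

// The mask clears the top 8 bits, which the left shift discards anyway.
func leftShiftMasked(x uint64) uint64 { return (x & (1<<56 - 1)) << 8 }
func leftShiftPlain(x uint64) uint64  { return x << 8 }

func main() {
	for _, x := range []uint64{0, 0xFF, 0xDEADBEEFCAFEF00D, ^uint64(0)} {
		// Prints "true true" on every line: the masks change nothing.
		fmt.Println(rightShiftMasked(x) == rightShiftPlain(x), leftShiftMasked(x) == leftShiftPlain(x))
	}
}
```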
      
      Change-Id: Ia4330ccc6395627c233a8eb4dcc0e3e2a770bea7
      Reviewed-on: https://go-review.googlesource.com/94764
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile/internal/ssa: combine zero stores into larger stores on arm64 · ecd9e8a2
      Chad Rosier authored
      This reduces the go tool binary on arm64 by 12k.
      
      go1 results on Amberwing:
      name                   old time/op    new time/op    delta
      RegexpMatchEasy0_32       249ns ± 0%     249ns ± 0%    ~     (p=0.087 n=10+10)
      RegexpMatchEasy0_1K       584ns ± 0%     584ns ± 0%    ~     (all equal)
      RegexpMatchEasy1_32       246ns ± 0%     246ns ± 0%    ~     (p=1.000 n=10+10)
      RegexpMatchEasy1_1K       806ns ± 0%     806ns ± 0%    ~     (p=0.706 n=10+9)
      RegexpMatchMedium_32      314ns ± 0%     314ns ± 0%    ~     (all equal)
      RegexpMatchMedium_1K     52.1µs ± 0%    52.1µs ± 0%    ~     (p=0.245 n=10+8)
      RegexpMatchHard_32       2.75µs ± 1%    2.75µs ± 1%    ~     (p=0.690 n=10+10)
      RegexpMatchHard_1K       78.9µs ± 0%    78.9µs ± 1%    ~     (p=0.295 n=9+9)
      FmtFprintfEmpty          58.5ns ± 0%    58.5ns ± 0%    ~     (all equal)
      FmtFprintfString          112ns ± 0%     112ns ± 0%    ~     (all equal)
      FmtFprintfInt             117ns ± 0%     116ns ± 0%  -0.85%  (p=0.000 n=10+10)
      FmtFprintfIntInt          181ns ± 0%     181ns ± 0%    ~     (all equal)
      FmtFprintfPrefixedInt     222ns ± 0%     224ns ± 0%  +0.90%  (p=0.000 n=9+10)
      FmtFprintfFloat           318ns ± 1%     322ns ± 0%    ~     (p=0.059 n=10+8)
      FmtManyArgs               736ns ± 1%     735ns ± 0%    ~     (p=0.206 n=9+9)
      Gzip                      437ms ± 0%     436ms ± 0%  -0.25%  (p=0.000 n=10+10)
      HTTPClientServer         89.8µs ± 1%    90.2µs ± 2%    ~     (p=0.393 n=10+10)
      JSONEncode               20.1ms ± 1%    20.2ms ± 1%    ~     (p=0.065 n=9+10)
      JSONDecode               94.2ms ± 1%    93.9ms ± 1%  -0.42%  (p=0.043 n=10+10)
      GobDecode                12.7ms ± 1%    12.8ms ± 2%  +0.94%  (p=0.019 n=10+10)
      GobEncode                12.1ms ± 0%    12.1ms ± 0%    ~     (p=0.052 n=10+10)
      Mandelbrot200            5.06ms ± 0%    5.05ms ± 0%  -0.04%  (p=0.000 n=9+10)
      TimeParse                 450ns ± 3%     446ns ± 0%    ~     (p=0.238 n=10+9)
      TimeFormat                485ns ± 1%     483ns ± 1%    ~     (p=0.073 n=10+10)
      Template                 90.4ms ± 0%    90.7ms ± 0%  +0.29%  (p=0.000 n=8+10)
      GoParse                  6.01ms ± 0%    6.03ms ± 0%  +0.35%  (p=0.000 n=10+10)
      BinaryTree17              11.7s ± 0%     11.7s ± 0%    ~     (p=0.481 n=10+10)
      Revcomp                   669ms ± 0%     669ms ± 0%    ~     (p=0.315 n=10+10)
      Fannkuch11                3.40s ± 0%     3.37s ± 0%  -0.92%  (p=0.000 n=10+10)
      [Geo mean]               67.9µs         67.9µs       +0.02%
      
      name                   old speed      new speed      delta
      RegexpMatchEasy0_32     128MB/s ± 0%   128MB/s ± 0%  -0.08%  (p=0.003 n=8+10)
      RegexpMatchEasy0_1K    1.75GB/s ± 0%  1.75GB/s ± 0%    ~     (p=0.642 n=8+10)
      RegexpMatchEasy1_32     130MB/s ± 0%   130MB/s ± 0%    ~     (p=0.690 n=10+9)
      RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%    ~     (p=0.661 n=10+9)
      RegexpMatchMedium_32   3.18MB/s ± 0%  3.18MB/s ± 0%    ~     (all equal)
      RegexpMatchMedium_1K   19.7MB/s ± 0%  19.6MB/s ± 0%    ~     (p=0.190 n=10+9)
      RegexpMatchHard_32     11.6MB/s ± 0%  11.6MB/s ± 1%    ~     (p=0.669 n=10+10)
      RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=0.718 n=9+9)
      Gzip                   44.4MB/s ± 0%  44.5MB/s ± 0%  +0.24%  (p=0.000 n=10+10)
      JSONEncode             96.5MB/s ± 1%  96.1MB/s ± 1%    ~     (p=0.065 n=9+10)
      JSONDecode             20.6MB/s ± 1%  20.7MB/s ± 1%  +0.42%  (p=0.041 n=10+10)
      GobDecode              60.6MB/s ± 1%  60.0MB/s ± 2%  -0.92%  (p=0.016 n=10+10)
      GobEncode              63.4MB/s ± 0%  63.6MB/s ± 0%    ~     (p=0.055 n=10+10)
      Template               21.5MB/s ± 0%  21.4MB/s ± 0%  -0.30%  (p=0.000 n=9+10)
      GoParse                9.64MB/s ± 0%  9.61MB/s ± 0%  -0.36%  (p=0.000 n=10+10)
      Revcomp                 380MB/s ± 0%   380MB/s ± 0%    ~     (p=0.323 n=10+10)
      [Geo mean]             56.0MB/s       55.9MB/s       -0.07%
      
      Change-Id: Ia732fa57fbcf4767d72382516d9f16705d177736
      Reviewed-on: https://go-review.googlesource.com/96435
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile: tighten after lowering · 3a9e4440
      Josh Bleecher Snyder authored
      Running tighten after lowering lets it benefit from the values removed
      by lowering and by lowered CSE, so it can make better decisions about
      which values are rematerializable and which generate flags.
      Empirically, this lowers stack usage (by avoiding spills)
      and generates slightly smaller and faster binaries.
      
      Fixes #19853
      Fixes #21041
      
      name        old time/op       new time/op       delta
      Template          195ms ± 4%        193ms ± 4%  -1.33%  (p=0.000 n=92+97)
      Unicode          94.1ms ± 9%       92.5ms ± 8%  -1.66%  (p=0.002 n=97+95)
      GoTypes           572ms ± 5%        566ms ± 7%  -0.92%  (p=0.001 n=95+98)
      Compiler          2.56s ± 4%        2.52s ± 3%  -1.41%  (p=0.000 n=94+97)
      SSA               6.52s ± 2%        6.47s ± 3%  -0.82%  (p=0.000 n=96+94)
      Flate             117ms ± 5%        116ms ± 7%  -0.72%  (p=0.018 n=97+97)
      GoParser          148ms ± 6%        146ms ± 4%  -0.97%  (p=0.002 n=98+95)
      Reflect           370ms ± 7%        363ms ± 6%  -1.79%  (p=0.000 n=99+98)
      Tar               175ms ± 6%        173ms ± 6%  -1.11%  (p=0.001 n=94+95)
      XML               204ms ± 6%        201ms ± 5%  -1.49%  (p=0.000 n=97+96)
      [Geo mean]        363ms             359ms       -1.22%
      
      name        old user-time/op  new user-time/op  delta
      Template          251ms ± 5%        245ms ± 5%  -2.40%  (p=0.000 n=97+93)
      Unicode           131ms ±10%        128ms ± 9%  -1.93%  (p=0.001 n=100+99)
      GoTypes           760ms ± 4%        752ms ± 4%  -0.96%  (p=0.000 n=97+95)
      Compiler          3.51s ± 3%        3.48s ± 2%  -1.04%  (p=0.000 n=96+95)
      SSA               9.57s ± 4%        9.52s ± 2%  -0.50%  (p=0.004 n=97+96)
      Flate             149ms ± 6%        147ms ± 6%  -1.46%  (p=0.000 n=98+96)
      GoParser          184ms ± 5%        181ms ± 7%  -1.84%  (p=0.000 n=98+97)
      Reflect           469ms ± 6%        461ms ± 6%  -1.69%  (p=0.000 n=100+98)
      Tar               219ms ± 8%        217ms ± 7%  -0.90%  (p=0.035 n=96+96)
      XML               255ms ± 5%        251ms ± 6%  -1.48%  (p=0.000 n=98+98)
      [Geo mean]        476ms             469ms       -1.42%
      
      name        old alloc/op      new alloc/op      delta
      Template         37.8MB ± 0%       37.8MB ± 0%  -0.17%  (p=0.000 n=100+100)
      Unicode          28.8MB ± 0%       28.8MB ± 0%  -0.02%  (p=0.000 n=100+95)
      GoTypes           112MB ± 0%        112MB ± 0%  -0.20%  (p=0.000 n=100+97)
      Compiler          466MB ± 0%        464MB ± 0%  -0.27%  (p=0.000 n=100+100)
      SSA              1.49GB ± 0%       1.49GB ± 0%  -0.08%  (p=0.000 n=100+99)
      Flate            24.4MB ± 0%       24.3MB ± 0%  -0.25%  (p=0.000 n=98+99)
      GoParser         30.7MB ± 0%       30.6MB ± 0%  -0.26%  (p=0.000 n=99+100)
      Reflect          76.4MB ± 0%       76.4MB ± 0%    ~     (p=0.253 n=100+100)
      Tar              38.9MB ± 0%       38.8MB ± 0%  -0.20%  (p=0.000 n=100+97)
      XML              41.5MB ± 0%       41.4MB ± 0%  -0.19%  (p=0.000 n=100+98)
      [Geo mean]       77.5MB            77.4MB       -0.16%
      
      name        old allocs/op     new allocs/op     delta
      Template           381k ± 0%         381k ± 0%  -0.15%  (p=0.000 n=100+100)
      Unicode            342k ± 0%         342k ± 0%  -0.01%  (p=0.000 n=100+98)
      GoTypes           1.19M ± 0%        1.18M ± 0%  -0.24%  (p=0.000 n=100+100)
      Compiler          4.52M ± 0%        4.50M ± 0%  -0.29%  (p=0.000 n=100+100)
      SSA               12.3M ± 0%        12.3M ± 0%  -0.11%  (p=0.000 n=100+100)
      Flate              234k ± 0%         234k ± 0%  -0.26%  (p=0.000 n=99+96)
      GoParser           318k ± 0%         317k ± 0%  -0.21%  (p=0.000 n=99+100)
      Reflect            974k ± 0%         974k ± 0%  -0.03%  (p=0.000 n=100+100)
      Tar                392k ± 0%         391k ± 0%  -0.17%  (p=0.000 n=100+99)
      XML                404k ± 0%         403k ± 0%  -0.24%  (p=0.000 n=99+99)
      [Geo mean]         794k              792k       -0.17%
      
      name        old object-bytes  new object-bytes  delta
      Template          393kB ± 0%        392kB ± 0%  -0.19%  (p=0.008 n=5+5)
      Unicode           207kB ± 0%        207kB ± 0%    ~     (all equal)
      GoTypes          1.23MB ± 0%       1.22MB ± 0%  -0.11%  (p=0.008 n=5+5)
      Compiler         4.34MB ± 0%       4.33MB ± 0%  -0.15%  (p=0.008 n=5+5)
      SSA              9.85MB ± 0%       9.85MB ± 0%  -0.07%  (p=0.008 n=5+5)
      Flate             235kB ± 0%        234kB ± 0%  -0.59%  (p=0.008 n=5+5)
      GoParser          297kB ± 0%        296kB ± 0%  -0.22%  (p=0.008 n=5+5)
      Reflect          1.03MB ± 0%       1.03MB ± 0%  -0.00%  (p=0.008 n=5+5)
      Tar               332kB ± 0%        331kB ± 0%  -0.15%  (p=0.008 n=5+5)
      XML               413kB ± 0%        412kB ± 0%  -0.19%  (p=0.008 n=5+5)
      [Geo mean]        728kB             727kB       -0.17%
      
      Change-Id: I9b5cdb668ed102a001897a05e833105acba220a2
      Reviewed-on: https://go-review.googlesource.com/95995
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  2. 26 Feb, 2018 22 commits
  3. 25 Feb, 2018 1 commit
  4. 24 Feb, 2018 3 commits
    • os/user: obtain a user home path on Windows · 7a218942
      Lubomir I. Ivanov (VMware) authored
      newUserFromSid() is extended so that it can retrieve the user's home
      path from the user's SID.
      
      (1) The primary method is to look up the following key in the Windows
      registry:
        HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList\[SID]
      
      If the key does not exist, the user might not have logged in yet.
      If (1) fails, it falls back to (2).
      
      (2) The second method is to take the default home path for users
      (as returned by WinAPI's GetProfilesDirectory()) and append the
      username to it. The procedure is along the lines of:
        c:\Users + \ + <username>
      
      The function newUser() now requires the following arguments:
        uid, gid, dir, username, domain
      This is done to avoid multiple calls to usid.String() and
      usid.LookupAccount("") in the case of a newUserFromSid()
      call stack.
      
      The functions current() and newUserFromSid() both call newUser()
      supplying the arguments in question. The helpers
      lookupUsernameAndDomain() and findHomeDirInRegistry() are
      added.
      
      This commit also updates:
      - go/build/deps_test.go, so that the test now includes the
      "internal/syscall/windows/registry" import.
      - os/user/user_test.go, so that User.HomeDir is tested on Windows.
      
      GitHub-Last-Rev: 25423e2a3820121f4c42321e7a77a3977f409724
      GitHub-Pull-Request: golang/go#23822
      Change-Id: I6c3ad1c4ce3e7bc0d1add024951711f615b84ee5
      Reviewed-on: https://go-review.googlesource.com/93935
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
      Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile/internal/syntax: use stringer for operators and tokens · c8791538
      Daniel Martí authored
      With its new -linecomment flag, it is now possible to use stringer on
      values whose strings aren't valid identifiers. This is the case with
      tokens and operators in Go.
      
      Operator already had inline comments with each operator's string
      representation; only minor modifications were needed. The inline
      comments were added to each of the token names, using the same strategy.
      
      Comments that were previously inline or part of the string arrays were
      moved to the line immediately before the name they correspond to.
      
      Finally, declare tokStrFast as a function that uses the generated arrays
      directly. Avoiding the branch and strconv call means that we avoid a
      performance regression in the scanner, perhaps due to the lack of
      mid-stack inlining.
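
A minimal sketch of the pattern: with `stringer -type=token -linecomment`, each constant's trailing comment becomes its String() text, and the generated code stores all strings in one concatenated constant plus an index table. The tables below are hand-written to approximate that output (real generated code may differ in detail), and the token set is illustrative, not the syntax package's. `tokStrFast` slices the tables directly, skipping the range branch and strconv fallback that String() performs.

```go
package main

import (
	"fmt"
	"strconv"
)

//go:generate stringer -type=token -linecomment

// token is an illustrative token type; each trailing comment is its string.
type token uint

const (
	_EOF     token = iota // EOF
	_Name                 // name
	_Literal              // literal
	_Lparen               // (
	_Rparen               // )
)

// Hand-written approximation of stringer's output tables.
const _token_name = "EOFnameliteral()"

var _token_index = [...]uint8{0, 3, 7, 14, 15, 16}

func (i token) String() string {
	if i >= token(len(_token_index)-1) {
		return "token(" + strconv.FormatInt(int64(i), 10) + ")"
	}
	return _token_name[_token_index[i]:_token_index[i+1]]
}

// tokStrFast indexes the tables directly; valid only for known tokens.
func tokStrFast(t token) string {
	return _token_name[_token_index[t]:_token_index[t+1]]
}

func main() {
	fmt.Println(_Lparen.String(), tokStrFast(_Name)) // ( name
	fmt.Println(token(99))                           // token(99)
}
```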
      
      Performance is not affected. Measured with 'go test -run StdLib -fast'
      on an X1 Carbon Gen2 (i5-4300U @ 1.90GHz, 8GB RAM, SSD), the best of 5
      runs before and after the changes are:
      
      	parsed 1709399 lines (3763 files) in 1.707402159s (1001169 lines/s)
      	allocated 449.282Mb (263.137Mb/s)
      
      	parsed 1709329 lines (3765 files) in 1.706663154s (1001562 lines/s)
      	allocated 449.290Mb (263.256Mb/s)
      
      Change-Id: Idcc4f83393fcadd6579700e3602c09496ea2625b
      Reviewed-on: https://go-review.googlesource.com/95357
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • math/big: speed-up addMulVVW on amd64 · c3935c08
      Ilya Tocar authored
      Use MULX/ADOX/ADCX instructions to speed up addMulVVW when they are
      available; addMulVVW is a hotspot in RSA. This is faster than the
      ADD/ADC/IMUL version because ADCX modifies only the carry flag and ADOX
      only the overflow flag, so the two carry chains can be interleaved with
      each other and with MULX, which does not modify flags at all.
      Increasing the unroll factor (e.g. to 16) makes RSA 1% faster, but
      3PrimeRSA2048Decrypt performance falls back to baseline.
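
For reference, the operation being accelerated adds x[i]*y plus an incoming carry into z[i] and returns the final carry word. This is a portable sketch using math/bits (Go 1.12+), assuming 64-bit words and len(x) >= len(z); it is not math/big's code. The two Add64 calls correspond loosely to the two independent carry chains that ADCX and ADOX keep in separate flags, which Go cannot express directly.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVW computes z[i] += x[i]*y with carry propagation and returns the
// final carry. Assumes len(x) >= len(z).
func addMulVVW(z, x []uint64, y uint64) (c uint64) {
	for i := range z {
		hi, lo := bits.Mul64(x[i], y)
		lo, cc1 := bits.Add64(lo, z[i], 0) // chain 1: add the existing word
		lo, cc2 := bits.Add64(lo, c, 0)    // chain 2: add the incoming carry
		z[i] = lo
		// x*y + z[i] + c < 2^128, so the high word cannot overflow.
		c = hi + cc1 + cc2
	}
	return c
}

func main() {
	z := []uint64{1, 2}
	c := addMulVVW(z, []uint64{3, 4}, 5)
	fmt.Println(z, c) // [16 22] 0
}
```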
      
      Updates #20058
      
      name                              old time/op    new time/op     delta
      AddMulVVW/1-8                       3.28ns ± 2%     3.26ns ± 3%     ~     (p=0.107 n=10+10)
      AddMulVVW/2-8                       4.26ns ± 2%     4.24ns ± 3%     ~     (p=0.327 n=9+9)
      AddMulVVW/3-8                       5.07ns ± 2%     5.26ns ± 2%   +3.73%  (p=0.000 n=10+10)
      AddMulVVW/4-8                       6.40ns ± 2%     6.50ns ± 2%   +1.61%  (p=0.000 n=10+10)
      AddMulVVW/5-8                       6.77ns ± 2%     6.86ns ± 1%   +1.38%  (p=0.001 n=9+9)
      AddMulVVW/10-8                      12.2ns ± 2%     10.6ns ± 3%  -13.65%  (p=0.000 n=10+10)
      AddMulVVW/100-8                     79.7ns ± 2%     52.4ns ± 1%  -34.17%  (p=0.000 n=10+10)
      AddMulVVW/1000-8                     695ns ± 1%      491ns ± 2%  -29.39%  (p=0.000 n=9+10)
      AddMulVVW/10000-8                   7.26µs ± 2%     5.92µs ± 6%  -18.42%  (p=0.000 n=10+10)
      AddMulVVW/100000-8                  72.6µs ± 2%     62.2µs ± 2%  -14.31%  (p=0.000 n=10+10)
      
      The crypto/rsa speed-up is smaller, but still noticeable:
      
      name                    old time/op  new time/op  delta
      RSA2048Decrypt-8        1.61ms ± 1%  1.38ms ± 1%  -14.13%  (p=0.000 n=10+10)
      RSA2048Sign-8           1.93ms ± 1%  1.70ms ± 1%  -11.86%  (p=0.000 n=10+10)
      3PrimeRSA2048Decrypt-8   932µs ± 0%   828µs ± 0%  -11.15%  (p=0.000 n=10+10)
      
      Results on crypto/tls:
      
      name                                        old time/op   new time/op  delta
      HandshakeServer/RSA-8                        901µs ± 1%    777µs ± 0%  -13.70%  (p=0.000 n=10+8)
      HandshakeServer/ECDHE-P256-RSA-8            1.01ms ± 1%   0.90ms ± 0%  -11.53%  (p=0.000 n=10+9)
      
      Full math/big benchmarks:
      
      name                              old time/op    new time/op     delta
      AddVV/1-8                           3.74ns ± 6%     3.55ns ± 2%     ~     (p=0.082 n=10+8)
      AddVV/2-8                           3.96ns ± 2%     3.98ns ± 5%     ~     (p=0.794 n=10+9)
      AddVV/3-8                           4.97ns ± 2%     4.94ns ± 1%     ~     (p=0.081 n=10+9)
      AddVV/4-8                           5.59ns ± 2%     5.59ns ± 2%     ~     (p=0.809 n=10+10)
      AddVV/5-8                           6.63ns ± 1%     6.62ns ± 1%     ~     (p=0.560 n=9+10)
      AddVV/10-8                          8.11ns ± 1%     8.11ns ± 2%     ~     (p=0.402 n=10+10)
      AddVV/100-8                         46.9ns ± 2%     46.8ns ± 1%     ~     (p=0.809 n=10+10)
      AddVV/1000-8                         389ns ± 1%      391ns ± 4%     ~     (p=0.809 n=10+10)
      AddVV/10000-8                       5.05µs ± 5%     4.98µs ± 2%     ~     (p=0.113 n=9+10)
      AddVV/100000-8                      55.3µs ± 3%     55.2µs ± 3%     ~     (p=0.796 n=10+10)
      AddVW/1-8                           3.04ns ± 3%     3.02ns ± 3%     ~     (p=0.538 n=10+10)
      AddVW/2-8                           3.57ns ± 2%     3.61ns ± 2%   +1.12%  (p=0.032 n=9+9)
      AddVW/3-8                           3.77ns ± 1%     3.79ns ± 2%     ~     (p=0.719 n=10+10)
      AddVW/4-8                           4.69ns ± 1%     4.69ns ± 2%     ~     (p=0.920 n=10+9)
      AddVW/5-8                           4.58ns ± 1%     4.58ns ± 1%     ~     (p=0.812 n=10+10)
      AddVW/10-8                          7.62ns ± 2%     7.63ns ± 1%     ~     (p=0.926 n=10+10)
      AddVW/100-8                         41.1ns ± 2%     42.4ns ± 3%   +3.34%  (p=0.000 n=10+10)
      AddVW/1000-8                         386ns ± 2%      389ns ± 4%     ~     (p=0.514 n=10+10)
      AddVW/10000-8                       3.88µs ± 3%     3.87µs ± 3%     ~     (p=0.448 n=10+10)
      AddVW/100000-8                      41.2µs ± 3%     41.7µs ± 3%     ~     (p=0.148 n=10+10)
      AddMulVVW/1-8                       3.28ns ± 2%     3.26ns ± 3%     ~     (p=0.107 n=10+10)
      AddMulVVW/2-8                       4.26ns ± 2%     4.24ns ± 3%     ~     (p=0.327 n=9+9)
      AddMulVVW/3-8                       5.07ns ± 2%     5.26ns ± 2%   +3.73%  (p=0.000 n=10+10)
      AddMulVVW/4-8                       6.40ns ± 2%     6.50ns ± 2%   +1.61%  (p=0.000 n=10+10)
      AddMulVVW/5-8                       6.77ns ± 2%     6.86ns ± 1%   +1.38%  (p=0.001 n=9+9)
      AddMulVVW/10-8                      12.2ns ± 2%     10.6ns ± 3%  -13.65%  (p=0.000 n=10+10)
      AddMulVVW/100-8                     79.7ns ± 2%     52.4ns ± 1%  -34.17%  (p=0.000 n=10+10)
      AddMulVVW/1000-8                     695ns ± 1%      491ns ± 2%  -29.39%  (p=0.000 n=9+10)
      AddMulVVW/10000-8                   7.26µs ± 2%     5.92µs ± 6%  -18.42%  (p=0.000 n=10+10)
      AddMulVVW/100000-8                  72.6µs ± 2%     62.2µs ± 2%  -14.31%  (p=0.000 n=10+10)
      DecimalConversion-8                  108µs ±19%      104µs ± 4%     ~     (p=0.460 n=10+8)
      FloatString/100-8                    926ns ±14%      908ns ± 5%     ~     (p=0.398 n=9+9)
      FloatString/1000-8                  25.7µs ± 1%     25.7µs ± 1%     ~     (p=0.739 n=10+10)
      FloatString/10000-8                 2.13ms ± 1%     2.12ms ± 1%     ~     (p=0.353 n=10+10)
      FloatString/100000-8                 207ms ± 1%      206ms ± 2%     ~     (p=0.912 n=10+10)
      FloatAdd/10-8                       61.3ns ± 3%     61.9ns ± 3%     ~     (p=0.183 n=10+10)
      FloatAdd/100-8                      62.0ns ± 2%     62.9ns ± 4%     ~     (p=0.118 n=10+10)
      FloatAdd/1000-8                     84.7ns ± 2%     84.4ns ± 1%     ~     (p=0.591 n=10+10)
      FloatAdd/10000-8                     305ns ± 2%      306ns ± 1%     ~     (p=0.443 n=10+10)
      FloatAdd/100000-8                   2.45µs ± 1%     2.46µs ± 1%     ~     (p=0.782 n=10+10)
      FloatSub/10-8                       56.8ns ± 4%     56.5ns ± 5%     ~     (p=0.423 n=10+10)
      FloatSub/100-8                      57.3ns ± 4%     57.1ns ± 5%     ~     (p=0.540 n=10+10)
      FloatSub/1000-8                     66.8ns ± 4%     66.6ns ± 1%     ~     (p=0.868 n=10+10)
      FloatSub/10000-8                     199ns ± 1%      198ns ± 1%     ~     (p=0.287 n=10+9)
      FloatSub/100000-8                   1.47µs ± 2%     1.47µs ± 2%     ~     (p=0.920 n=10+9)
      ParseFloatSmallExp-8                8.74µs ±10%     9.48µs ±10%   +8.51%  (p=0.010 n=9+10)
      ParseFloatLargeExp-8                39.2µs ±25%     39.6µs ±12%     ~     (p=0.529 n=10+10)
      GCD10x10/WithoutXY-8                 173ns ±23%      177ns ±20%     ~     (p=0.698 n=10+10)
      GCD10x10/WithXY-8                    736ns ±12%      728ns ±16%     ~     (p=0.838 n=10+10)
      GCD10x100/WithoutXY-8                325ns ±16%      326ns ±14%     ~     (p=0.912 n=10+10)
      GCD10x100/WithXY-8                  1.14µs ±13%     1.16µs ± 6%     ~     (p=0.287 n=10+9)
      GCD10x1000/WithoutXY-8               851ns ±25%      820ns ±12%     ~     (p=0.592 n=10+10)
      GCD10x1000/WithXY-8                 2.89µs ±17%     2.85µs ± 5%     ~     (p=1.000 n=10+9)
      GCD10x10000/WithoutXY-8             6.66µs ±12%     6.82µs ±19%     ~     (p=0.529 n=10+10)
      GCD10x10000/WithXY-8                18.0µs ± 5%     17.2µs ±19%     ~     (p=0.315 n=7+10)
      GCD10x100000/WithoutXY-8            77.8µs ±18%     73.3µs ±11%     ~     (p=0.315 n=10+9)
      GCD10x100000/WithXY-8                186µs ±14%      204µs ±29%     ~     (p=0.218 n=10+10)
      GCD100x100/WithoutXY-8              1.09µs ± 1%     1.09µs ± 2%     ~     (p=0.117 n=9+10)
      GCD100x100/WithXY-8                 7.93µs ± 1%     7.97µs ± 1%   +0.52%  (p=0.006 n=10+10)
      GCD100x1000/WithoutXY-8             2.00µs ± 3%     2.04µs ± 6%     ~     (p=0.053 n=9+10)
      GCD100x1000/WithXY-8                9.23µs ± 1%     9.29µs ± 1%   +0.63%  (p=0.009 n=10+10)
      GCD100x10000/WithoutXY-8            10.2µs ±11%      9.7µs ± 6%     ~     (p=0.278 n=10+9)
      GCD100x10000/WithXY-8               33.3µs ± 4%     33.6µs ± 4%     ~     (p=0.481 n=10+10)
      GCD100x100000/WithoutXY-8            106µs ±17%      105µs ±13%     ~     (p=0.853 n=10+10)
      GCD100x100000/WithXY-8               289µs ±17%      276µs ± 8%     ~     (p=0.353 n=10+10)
      GCD1000x1000/WithoutXY-8            12.2µs ± 1%     12.1µs ± 1%   -0.45%  (p=0.007 n=10+10)
      GCD1000x1000/WithXY-8                131µs ± 1%      132µs ± 0%   +0.93%  (p=0.000 n=9+7)
      GCD1000x10000/WithoutXY-8           20.6µs ± 2%     20.6µs ± 1%     ~     (p=0.326 n=10+9)
      GCD1000x10000/WithXY-8               238µs ± 1%      237µs ± 1%     ~     (p=0.356 n=9+10)
      GCD1000x100000/WithoutXY-8           117µs ± 8%      114µs ±11%     ~     (p=0.190 n=10+10)
      GCD1000x100000/WithXY-8             1.51ms ± 1%     1.50ms ± 1%     ~     (p=0.053 n=9+10)
      GCD10000x10000/WithoutXY-8           220µs ± 1%      218µs ± 1%   -0.86%  (p=0.000 n=10+10)
      GCD10000x10000/WithXY-8             3.04ms ± 0%     3.05ms ± 0%   +0.33%  (p=0.001 n=9+10)
      GCD10000x100000/WithoutXY-8          513µs ± 0%      511µs ± 0%   -0.38%  (p=0.000 n=10+10)
      GCD10000x100000/WithXY-8            15.1ms ± 0%     15.0ms ± 0%     ~     (p=0.053 n=10+9)
      GCD100000x100000/WithoutXY-8        10.4ms ± 1%     10.4ms ± 2%     ~     (p=0.258 n=9+9)
      GCD100000x100000/WithXY-8            205ms ± 1%      205ms ± 1%     ~     (p=0.481 n=10+10)
      Hilbert-8                           1.25ms ±15%     1.24ms ±17%     ~     (p=0.853 n=10+10)
      Binomial-8                          3.03µs ±24%     2.90µs ±16%     ~     (p=0.481 n=10+10)
      QuoRem-8                            1.95µs ± 1%     1.95µs ± 2%     ~     (p=0.117 n=9+10)
      Exp-8                               5.12ms ± 2%     3.99ms ± 1%  -22.02%  (p=0.000 n=10+9)
      Exp2-8                              5.14ms ± 2%     3.98ms ± 0%  -22.55%  (p=0.000 n=10+9)
      Bitset-8                            16.4ns ± 2%     16.5ns ± 2%     ~     (p=0.311 n=9+10)
      BitsetNeg-8                         46.3ns ± 4%     45.8ns ± 4%     ~     (p=0.272 n=10+10)
      BitsetOrig-8                         250ns ±19%      247ns ±14%     ~     (p=0.671 n=10+10)
      BitsetNegOrig-8                      416ns ±14%      429ns ±14%     ~     (p=0.353 n=10+10)
      ModSqrt225_Tonelli-8                 400µs ± 0%      320µs ± 0%  -19.88%  (p=0.000 n=9+7)
      ModSqrt224_3Mod4-8                   123µs ± 1%       97µs ± 0%  -21.21%  (p=0.000 n=9+10)
      ModSqrt5430_Tonelli-8                1.87s ± 0%      1.39s ± 1%  -25.70%  (p=0.000 n=9+10)
      ModSqrt5430_3Mod4-8                  630ms ± 2%      465ms ± 1%  -26.12%  (p=0.000 n=10+10)
      Sqrt-8                              25.8µs ± 1%     25.9µs ± 0%   +0.66%  (p=0.002 n=10+8)
      IntSqr/1-8                          11.3ns ± 1%     11.3ns ± 2%     ~     (p=0.360 n=9+10)
      IntSqr/2-8                          26.6ns ± 1%     27.4ns ± 2%   +2.87%  (p=0.000 n=8+9)
      IntSqr/3-8                          36.5ns ± 6%     36.6ns ± 5%     ~     (p=0.589 n=10+10)
      IntSqr/5-8                          57.2ns ± 2%     57.8ns ± 1%   +0.92%  (p=0.045 n=10+9)
      IntSqr/8-8                           112ns ± 1%       93ns ± 1%  -16.60%  (p=0.000 n=10+10)
      IntSqr/10-8                          148ns ± 1%      129ns ± 5%  -12.85%  (p=0.000 n=10+10)
      IntSqr/20-8                          642ns ±28%      692ns ±21%     ~     (p=0.105 n=10+10)
      IntSqr/30-8                         1.03µs ±18%     1.06µs ±15%     ~     (p=0.422 n=10+8)
      IntSqr/50-8                         2.33µs ±14%     2.14µs ±20%     ~     (p=0.063 n=10+10)
      IntSqr/80-8                         4.06µs ±13%     3.72µs ±14%   -8.31%  (p=0.029 n=10+10)
      IntSqr/100-8                        5.79µs ±10%     5.20µs ±18%  -10.15%  (p=0.004 n=10+10)
      IntSqr/200-8                        17.1µs ± 1%     12.9µs ± 3%  -24.44%  (p=0.000 n=10+10)
      IntSqr/300-8                        35.9µs ± 0%     26.6µs ± 1%  -25.75%  (p=0.000 n=10+10)
      IntSqr/500-8                        84.9µs ± 0%     71.7µs ± 1%  -15.49%  (p=0.000 n=10+10)
      IntSqr/800-8                         170µs ± 1%      142µs ± 2%  -16.73%  (p=0.000 n=10+10)
      IntSqr/1000-8                        258µs ± 1%      218µs ± 1%  -15.65%  (p=0.000 n=10+10)
      Mul-8                               10.4ms ± 1%      8.3ms ± 0%  -20.05%  (p=0.000 n=10+9)
      Exp3Power/0x10-8                     311ns ±15%      321ns ±24%     ~     (p=0.447 n=10+10)
      Exp3Power/0x40-8                     358ns ±21%      346ns ±37%     ~     (p=0.591 n=10+10)
      Exp3Power/0x100-8                    611ns ±19%      570ns ±27%     ~     (p=0.393 n=10+10)
      Exp3Power/0x400-8                   1.31µs ±26%     1.34µs ±19%     ~     (p=0.853 n=10+10)
      Exp3Power/0x1000-8                  6.76µs ±23%     6.22µs ±16%     ~     (p=0.095 n=10+9)
      Exp3Power/0x4000-8                  37.6µs ±14%     36.4µs ±21%     ~     (p=0.247 n=10+10)
      Exp3Power/0x10000-8                  345µs ±14%      310µs ±11%   -9.99%  (p=0.005 n=10+10)
      Exp3Power/0x40000-8                 2.77ms ± 1%     2.34ms ± 1%  -15.47%  (p=0.000 n=10+10)
      Exp3Power/0x100000-8                25.1ms ± 1%     21.3ms ± 1%  -15.26%  (p=0.000 n=10+10)
      Exp3Power/0x400000-8                 225ms ± 1%      190ms ± 1%  -15.61%  (p=0.000 n=10+10)
      Fibo-8                              23.4ms ± 1%     23.3ms ± 0%     ~     (p=0.052 n=10+10)
      NatSqr/1-8                          58.4ns ±24%     59.8ns ±38%     ~     (p=0.739 n=10+10)
      NatSqr/2-8                           122ns ±21%      122ns ±16%     ~     (p=0.896 n=10+10)
      NatSqr/3-8                           140ns ±28%      148ns ±30%     ~     (p=0.288 n=10+10)
      NatSqr/5-8                           193ns ±29%      210ns ±34%     ~     (p=0.469 n=10+10)
      NatSqr/8-8                           317ns ±21%      296ns ±25%     ~     (p=0.393 n=10+10)
      NatSqr/10-8                          362ns ± 8%      373ns ±30%     ~     (p=0.617 n=9+10)
      NatSqr/20-8                         1.24µs ±16%     1.06µs ±29%  -14.57%  (p=0.019 n=10+10)
      NatSqr/30-8                         1.90µs ±32%     1.71µs ±10%     ~     (p=0.176 n=10+9)
      NatSqr/50-8                         4.22µs ±19%     3.67µs ± 7%  -13.03%  (p=0.017 n=10+9)
      NatSqr/80-8                         7.33µs ±20%     6.50µs ±15%  -11.26%  (p=0.009 n=10+10)
      NatSqr/100-8                        9.84µs ±18%     9.33µs ± 8%     ~     (p=0.280 n=10+10)
      NatSqr/200-8                        21.4µs ± 7%     20.0µs ±14%     ~     (p=0.075 n=10+10)
      NatSqr/300-8                        38.0µs ± 2%     31.3µs ±10%  -17.63%  (p=0.000 n=10+10)
      NatSqr/500-8                         102µs ± 5%      101µs ± 4%     ~     (p=0.780 n=9+10)
      NatSqr/800-8                         190µs ± 3%      166µs ± 6%  -12.29%  (p=0.000 n=10+10)
      NatSqr/1000-8                        277µs ± 2%      245µs ± 6%  -11.64%  (p=0.000 n=10+10)
      ScanPi-8                             144µs ±23%      149µs ±24%     ~     (p=0.579 n=10+10)
      StringPiParallel-8                  25.6µs ± 0%     25.8µs ± 0%   +0.69%  (p=0.000 n=9+10)
      Scan/10/Base2-8                      305ns ± 1%      309ns ± 1%   +1.32%  (p=0.000 n=10+9)
      Scan/100/Base2-8                    1.95µs ± 1%     1.98µs ± 1%   +1.10%  (p=0.000 n=10+10)
      Scan/1000/Base2-8                   19.5µs ± 1%     19.7µs ± 1%   +1.39%  (p=0.000 n=10+10)
      Scan/10000/Base2-8                   270µs ± 1%      272µs ± 1%   +0.58%  (p=0.024 n=9+9)
      Scan/100000/Base2-8                 10.3ms ± 0%     10.3ms ± 0%   +0.16%  (p=0.022 n=9+10)
      Scan/10/Base8-8                      146ns ± 4%      154ns ± 4%   +5.57%  (p=0.000 n=9+9)
      Scan/100/Base8-8                     748ns ± 1%      759ns ± 1%   +1.51%  (p=0.000 n=9+10)
      Scan/1000/Base8-8                   7.88µs ± 1%     8.00µs ± 1%   +1.64%  (p=0.000 n=10+10)
      Scan/10000/Base8-8                   155µs ± 1%      155µs ± 1%     ~     (p=0.968 n=10+9)
      Scan/100000/Base8-8                 9.11ms ± 0%     9.11ms ± 0%     ~     (p=0.604 n=9+10)
      Scan/10/Base10-8                     140ns ± 5%      149ns ± 5%   +6.39%  (p=0.000 n=9+10)
      Scan/100/Base10-8                    680ns ± 0%      688ns ± 1%   +1.08%  (p=0.000 n=9+10)
      Scan/1000/Base10-8                  7.09µs ± 1%     7.16µs ± 1%   +0.98%  (p=0.019 n=10+10)
      Scan/10000/Base10-8                  149µs ± 3%      150µs ± 3%     ~     (p=0.143 n=10+10)
      Scan/100000/Base10-8                9.16ms ± 0%     9.16ms ± 0%     ~     (p=0.661 n=10+9)
      Scan/10/Base16-8                     134ns ± 5%      135ns ± 3%     ~     (p=0.505 n=9+9)
      Scan/100/Base16-8                    560ns ± 1%      563ns ± 0%   +0.67%  (p=0.000 n=10+8)
      Scan/1000/Base16-8                  6.28µs ± 1%     6.26µs ± 1%     ~     (p=0.448 n=10+10)
      Scan/10000/Base16-8                  161µs ± 1%      162µs ± 1%   +0.74%  (p=0.008 n=9+9)
      Scan/100000/Base16-8                9.64ms ± 0%     9.64ms ± 0%     ~     (p=0.436 n=10+10)
      String/10/Base2-8                    116ns ±12%      118ns ±13%     ~     (p=0.645 n=10+10)
      String/100/Base2-8                   871ns ±23%      860ns ±22%     ~     (p=0.699 n=10+10)
      String/1000/Base2-8                 10.0µs ±20%     10.0µs ±23%     ~     (p=0.853 n=10+10)
      String/10000/Base2-8                 110µs ±21%      120µs ±25%     ~     (p=0.436 n=10+10)
      String/100000/Base2-8                768µs ±11%      733µs ±16%     ~     (p=0.393 n=10+10)
      String/10/Base8-8                   51.3ns ± 1%     51.0ns ± 3%     ~     (p=0.286 n=9+9)
      String/100/Base8-8                   284ns ± 9%      272ns ±12%     ~     (p=0.267 n=9+10)
      String/1000/Base8-8                 3.06µs ± 9%     3.04µs ±10%     ~     (p=0.739 n=10+10)
      String/10000/Base8-8                36.1µs ±14%     35.1µs ± 9%     ~     (p=0.447 n=10+9)
      String/100000/Base8-8                371µs ±12%      373µs ±16%     ~     (p=0.739 n=10+10)
      String/10/Base10-8                   167ns ±11%      165ns ± 9%     ~     (p=0.781 n=10+10)
      String/100/Base10-8                  727ns ± 1%      740ns ± 2%   +1.70%  (p=0.001 n=10+10)
      String/1000/Base10-8                5.30µs ±18%     5.37µs ±14%     ~     (p=0.631 n=10+10)
      String/10000/Base10-8               45.0µs ±14%     44.6µs ±10%     ~     (p=0.720 n=9+10)
      String/100000/Base10-8              5.10ms ± 1%     5.05ms ± 3%     ~     (p=0.211 n=9+10)
      String/10/Base16-8                  47.7ns ± 6%     47.7ns ± 6%     ~     (p=0.985 n=10+10)
      String/100/Base16-8                  221ns ±10%      234ns ±27%     ~     (p=0.541 n=10+10)
      String/1000/Base16-8                2.23µs ±11%     2.12µs ± 8%   -4.81%  (p=0.029 n=9+8)
      String/10000/Base16-8               28.3µs ±21%     28.5µs ±14%     ~     (p=0.796 n=10+10)
      String/100000/Base16-8               291µs ±16%      293µs ±15%     ~     (p=0.931 n=9+9)
      LeafSize/0-8                        2.43ms ± 1%     2.49ms ± 1%   +2.56%  (p=0.000 n=10+10)
      LeafSize/1-8                        49.7µs ± 9%     46.3µs ±16%   -6.78%  (p=0.017 n=10+9)
      LeafSize/2-8                        48.4µs ±18%     46.3µs ±19%     ~     (p=0.436 n=10+10)
      LeafSize/3-8                        81.7µs ± 3%     80.9µs ± 3%     ~     (p=0.278 n=10+9)
      LeafSize/4-8                        47.0µs ± 7%     47.9µs ±13%     ~     (p=0.905 n=9+10)
      LeafSize/5-8                        96.8µs ± 1%     97.3µs ± 2%     ~     (p=0.515 n=8+10)
      LeafSize/6-8                        82.5µs ± 4%     80.9µs ± 2%   -1.92%  (p=0.019 n=10+10)
      LeafSize/7-8                        67.2µs ±13%     66.6µs ± 9%     ~     (p=0.842 n=10+9)
      LeafSize/8-8                        46.0µs ±28%     45.1µs ±12%     ~     (p=0.739 n=10+10)
      LeafSize/9-8                         111µs ± 1%      111µs ± 1%     ~     (p=0.739 n=10+10)
      LeafSize/10-8                       98.8µs ± 4%     97.9µs ± 3%     ~     (p=0.278 n=10+9)
      LeafSize/11-8                       96.8µs ± 1%     96.4µs ± 1%     ~     (p=0.211 n=9+10)
      LeafSize/12-8                       81.0µs ± 4%     81.3µs ± 3%     ~     (p=0.579 n=10+10)
      LeafSize/13-8                       79.7µs ± 5%     79.2µs ± 3%     ~     (p=0.661 n=10+9)
      LeafSize/14-8                       67.6µs ±12%     65.8µs ± 7%     ~     (p=0.447 n=10+9)
      LeafSize/15-8                       63.9µs ±17%     66.3µs ±14%     ~     (p=0.481 n=10+10)
      LeafSize/16-8                       44.0µs ±28%     46.0µs ±27%     ~     (p=0.481 n=10+10)
      LeafSize/32-8                       46.2µs ±13%     43.5µs ±18%     ~     (p=0.156 n=9+10)
      LeafSize/64-8                       53.3µs ±10%     53.0µs ±19%     ~     (p=0.730 n=9+9)
      ProbablyPrime/n=0-8                 3.60ms ± 1%     3.39ms ± 1%   -5.87%  (p=0.000 n=10+9)
      ProbablyPrime/n=1-8                 4.42ms ± 1%     4.08ms ± 1%   -7.69%  (p=0.000 n=10+10)
      ProbablyPrime/n=5-8                 7.57ms ± 2%     6.79ms ± 1%  -10.24%  (p=0.000 n=10+10)
      ProbablyPrime/n=10-8                11.6ms ± 2%     10.2ms ± 1%  -11.69%  (p=0.000 n=10+10)
      ProbablyPrime/n=20-8                19.4ms ± 2%     16.9ms ± 2%  -12.89%  (p=0.000 n=10+10)
      ProbablyPrime/Lucas-8               2.81ms ± 2%     2.72ms ± 1%   -3.22%  (p=0.000 n=10+9)
      ProbablyPrime/MillerRabinBase2-8     797µs ± 1%      680µs ± 1%  -14.64%  (p=0.000 n=10+10)
      
      name                              old speed      new speed       delta
      AddVV/1-8                         17.1GB/s ± 6%   18.0GB/s ± 2%     ~     (p=0.122 n=10+8)
      AddVV/2-8                         32.4GB/s ± 2%   32.2GB/s ± 4%     ~     (p=0.661 n=10+9)
      AddVV/3-8                         38.6GB/s ± 2%   38.9GB/s ± 1%     ~     (p=0.113 n=10+9)
      AddVV/4-8                         45.8GB/s ± 2%   45.8GB/s ± 2%     ~     (p=0.796 n=10+10)
      AddVV/5-8                         48.1GB/s ± 2%   48.3GB/s ± 1%     ~     (p=0.315 n=10+10)
      AddVV/10-8                        78.9GB/s ± 1%   78.9GB/s ± 2%     ~     (p=0.353 n=10+10)
      AddVV/100-8                        136GB/s ± 2%    137GB/s ± 1%     ~     (p=0.971 n=10+10)
      AddVV/1000-8                       164GB/s ± 1%    164GB/s ± 4%     ~     (p=0.853 n=10+10)
      AddVV/10000-8                      126GB/s ± 6%    129GB/s ± 2%     ~     (p=0.063 n=10+10)
      AddVV/100000-8                     116GB/s ± 3%    116GB/s ± 3%     ~     (p=0.796 n=10+10)
      AddVW/1-8                         2.64GB/s ± 3%   2.64GB/s ± 3%     ~     (p=0.579 n=10+10)
      AddVW/2-8                         4.49GB/s ± 2%   4.44GB/s ± 2%   -1.09%  (p=0.040 n=9+9)
      AddVW/3-8                         6.36GB/s ± 1%   6.34GB/s ± 2%     ~     (p=0.684 n=10+10)
      AddVW/4-8                         6.83GB/s ± 1%   6.82GB/s ± 2%     ~     (p=0.905 n=10+9)
      AddVW/5-8                         8.75GB/s ± 1%   8.73GB/s ± 1%     ~     (p=0.796 n=10+10)
      AddVW/10-8                        10.5GB/s ± 2%   10.5GB/s ± 1%     ~     (p=0.971 n=10+10)
      AddVW/100-8                       19.5GB/s ± 2%   18.9GB/s ± 2%   -3.22%  (p=0.000 n=10+10)
      AddVW/1000-8                      20.7GB/s ± 2%   20.6GB/s ± 4%     ~     (p=0.631 n=10+10)
      AddVW/10000-8                     20.6GB/s ± 3%   20.7GB/s ± 3%     ~     (p=0.481 n=10+10)
      AddVW/100000-8                    19.4GB/s ± 2%   19.2GB/s ± 3%     ~     (p=0.165 n=10+10)
      AddMulVVW/1-8                     19.5GB/s ± 2%   19.7GB/s ± 3%     ~     (p=0.123 n=10+10)
      AddMulVVW/2-8                     30.1GB/s ± 2%   30.2GB/s ± 3%     ~     (p=0.297 n=9+9)
      AddMulVVW/3-8                     37.9GB/s ± 2%   36.5GB/s ± 2%   -3.63%  (p=0.000 n=10+10)
      AddMulVVW/4-8                     40.0GB/s ± 2%   39.4GB/s ± 2%   -1.58%  (p=0.001 n=10+10)
      AddMulVVW/5-8                     47.3GB/s ± 2%   46.6GB/s ± 1%   -1.35%  (p=0.001 n=9+9)
      AddMulVVW/10-8                    52.3GB/s ± 2%   60.6GB/s ± 3%  +15.76%  (p=0.000 n=10+10)
      AddMulVVW/100-8                   80.3GB/s ± 2%  122.1GB/s ± 1%  +51.92%  (p=0.000 n=10+10)
      AddMulVVW/1000-8                  92.0GB/s ± 1%  130.3GB/s ± 2%  +41.61%  (p=0.000 n=9+10)
      AddMulVVW/10000-8                 88.2GB/s ± 2%  108.2GB/s ± 5%  +22.66%  (p=0.000 n=10+10)
      AddMulVVW/100000-8                88.2GB/s ± 2%  102.9GB/s ± 2%  +16.69%  (p=0.000 n=10+10)
      
      Change-Id: Ic98e30c91d437d845fed03e07e976c3fdbf02b36
      Reviewed-on: https://go-review.googlesource.com/74851
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Adam Langley <agl@golang.org>
  5. 23 Feb, 2018 6 commits
    • archive/zip: fix handling of Info-ZIP Unix extended timestamps · 9697a119
      Joe Tsai authored
      The Info-ZIP Unix1 extra field is specified as follows:
      >>>
      Value    Size   Description
      -----    ----   -----------
      0x5855   Short  tag for this extra block type ("UX")
      TSize    Short  total data size for this block
      AcTime   Long   time of last access (GMT/UTC)
      ModTime  Long   time of last modification (GMT/UTC)
      <<<
      
      The previous handling was incorrect in that it read the AcTime field
      instead of the ModTime field.
      
      The test-osx.zip test unfortunately locked in the wrong behavior.
      Manually parsing that ZIP file shows that the encoded MS-DOS
      date and time are 0x4b5f and 0xa97d, which decode to
      2017-10-31 21:11:58. This matches the correct mod time to within
      the 2-second resolution of MS-DOS timestamps (the two differ by 1 second).
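      The MS-DOS decoding above can be checked by hand. The sketch below unpacks the standard MS-DOS bit fields (date: 7 bits of year since 1980, 4 bits of month, 5 bits of day; time: 5 bits of hour, 6 bits of minute, 5 bits of seconds/2). The name msdosToTime is hypothetical, and returning the result in UTC is an assumption, since MS-DOS timestamps carry no time zone.

```go
package main

import (
	"fmt"
	"time"
)

// msdosToTime is a hypothetical helper that decodes an MS-DOS date/time pair.
// Date bits: 15-9 year-1980, 8-5 month, 4-0 day.
// Time bits: 15-11 hour, 10-5 minute, 4-0 seconds/2 (2-second resolution).
func msdosToTime(dosDate, dosTime uint16) time.Time {
	return time.Date(
		int(dosDate>>9)+1980,
		time.Month(dosDate>>5&0xf),
		int(dosDate&0x1f),
		int(dosTime>>11),
		int(dosTime>>5&0x3f),
		int(dosTime&0x1f)*2,
		0, time.UTC)
}

func main() {
	t := msdosToTime(0x4b5f, 0xa97d)
	fmt.Println(t.Format("2006-01-02 15:04:05")) // 2017-10-31 21:11:58
}
```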
      
      Fixes #23901
      
      Change-Id: I567824c66e8316b9acd103dbecde366874a4b7ef
      Reviewed-on: https://go-review.googlesource.com/96895
      Run-TryBot: Joe Tsai <joetsai@google.com>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • runtime: don't check for String/Error methods in printany · 804e3e56
      Ian Lance Taylor authored
      They have either already been called by preprintpanics, or they cannot
      be called safely because of the various conditions checked at the
      start of gopanic.
      
      Fixes #24059
      
      Change-Id: I4a6233d12c9f7aaaee72f343257ea108bae79241
      Reviewed-on: https://go-review.googlesource.com/96755
      Reviewed-by: Austin Clements <austin@google.com>
    • os: respect umask in Mkdir and OpenFile on BSD systems when perm has ModeSticky set · a5e8e2d9
      Yuval Pavel Zholkover authored
      Instead of calling Chmod directly with perm, stat the created file or
      directory to obtain its actual permission bits, which can differ from
      perm because of the umask.
      
      Fixes #23120.
      
      Change-Id: I3e70032451fc254bf48ce9627e98988f84af8d91
      Reviewed-on: https://go-review.googlesource.com/84477
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime: reduce arena size to 4MB on 64-bit Windows · 78846472
      Austin Clements authored
      Currently, we use 64MB heap arenas on 64-bit platforms. This works
      well on UNIX-like OSes because they treat untouched pages as
      essentially free. However, on Windows, committed memory is charged
      against a process whether or not it has demand-faulted physical pages
      in. Hence, on Windows, even a process with a tiny heap will commit
      64MB for one heap arena, plus another 32MB for the arena map. Things
      are much worse under the race detector, which increases the heap
      commitment by a factor of 5.5, leading to 384MB of committed memory
      at runtime init.
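      The figures in the paragraph above can be reproduced with a little arithmetic. This sketch assumes a 48-bit address space, one 8-byte (pointer-sized) map entry per arena frame, and that the 5.5x race-detector factor applies to the arena but not to the arena map; those assumptions are what make the numbers come out to 32MB, 96MB and 384MB.

```go
package main

import "fmt"

const (
	mb       = 1 << 20
	addrBits = 48 // address-space bits covered by the arena map (assumed)
	ptrSize  = 8  // bytes per map entry; assumed pointer-sized on 64-bit
)

// arenaMapBytes is the size of a single-level map with one entry per arena
// frame in a 2^addrBits-byte address space.
func arenaMapBytes(arenaBytes uint64) uint64 {
	return (uint64(1) << addrBits) / arenaBytes * ptrSize
}

// initialCommit is one arena plus the arena map. raceFactorX2 is twice the
// race detector's multiplier (11 for 5.5x), assumed to apply to the arena only.
func initialCommit(arenaBytes, raceFactorX2 uint64) uint64 {
	return arenaBytes*raceFactorX2/2 + arenaMapBytes(arenaBytes)
}

func main() {
	arena := uint64(64 * mb)
	fmt.Println(arenaMapBytes(arena) / mb)     // 32
	fmt.Println(initialCommit(arena, 2) / mb)  // 96 (no race detector)
	fmt.Println(initialCommit(arena, 11) / mb) // 384 (race detector, 5.5x)
}
```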
      
      Fix this by reducing the heap arena size to 4MB on Windows.
      
      To counterbalance the effect of increasing the arena map size by a
      factor of 16, and to further reduce the impact of the commitment for
      the arena map, we switch from a single entry L1 arena map to a 64
      entry L1 arena map.
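      The trade-off above can be quantified under the same assumptions (48-bit address space, 8-byte map entries). Shrinking arenas from 64MB to 4MB multiplies the number of arena frames by 16, and splitting the map across a 64-entry L1 divides each L2 array by 64. If L2 arrays are allocated only for the parts of the address space in use (an assumption about the design, though it is the usual point of a two-level map), a small heap commits one 8MB L2 array rather than a 512MB single-level map.

```go
package main

import "fmt"

const (
	addrBits = 48 // assumed address-space size
	ptrSize  = 8  // bytes per map entry, assumed pointer-sized
)

// l2ArrayBytes returns the size of one L2 array when arenas are
// 2^logArenaBytes bytes and the L1 map has 2^l1Bits entries.
func l2ArrayBytes(logArenaBytes, l1Bits uint) uint64 {
	frames := uint64(1) << (addrBits - logArenaBytes) // arena frames in total
	return (frames >> l1Bits) * ptrSize               // frames per L2 array
}

func main() {
	const mb = 1 << 20
	fmt.Println(l2ArrayBytes(26, 0) / mb) // 32: 64MB arenas, single-level map
	fmt.Println(l2ArrayBytes(22, 0) / mb) // 512: 4MB arenas, single-level map (16x)
	fmt.Println(l2ArrayBytes(22, 6) / mb) // 8: 4MB arenas, 64-entry L1 map
}
```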
      
      Compared to the original arena design, this slows down the
      x/benchmarks garbage benchmark by 0.49% (the slowdown from this commit
      alone is 1.59%, but the previous commit bought us a 1% speed-up):
      
      name                       old time/op  new time/op  delta
      Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.29ms ± 1%  +0.49%  (p=0.000 n=17+18)
      
      (https://perf.golang.org/search?q=upload:20180223.1)
      
      (This was measured on linux/amd64 by modifying its arena configuration
      as above.)
      
      Fixes #23900.
      
      Change-Id: I6b7fa5ecebee2947bf20cfeb78c248809469c6b1
      Reviewed-on: https://go-review.googlesource.com/96780
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: support a two-level arena map · ec252105
      Austin Clements authored
      Currently, the heap arena map is a single, large array that covers
      every possible arena frame in the entire address space. This is
      practical up to about 48 bits of address space with 64 MB arenas.
      
      However, there are two problems with this:
      
      1. mips64, ppc64, and s390x support full 64-bit address spaces (though
         on Linux only s390x has kernel support for 64-bit address spaces).
         On these platforms, it would be good to support these larger
         address spaces.
      
      2. On Windows, processes are charged for untouched memory, so for
         processes with small heaps, the mostly-untouched 32 MB arena map
         plus a 64 MB arena are significant overhead. Hence, it would be
         good to reduce both the arena map size and the arena size, but with
         a single-level arena map, these are inversely proportional.
      
      This CL adds support for a two-level arena map. Arena frame numbers
      are now divided into arenaL1Bits of L1 index and arenaL2Bits of L2
      index.
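      The split described above can be sketched as follows. The parameter values (48-bit addresses, 4MB arenas, a 6-bit L1 index) are illustrative assumptions, and arenaIndexes is a hypothetical helper, not the runtime's code: the high arenaL1Bits bits of the arena frame number select the L1 entry and the low arenaL2Bits bits index into the L2 array.

```go
package main

import "fmt"

// Illustrative parameters, assumed for this sketch: 48-bit addresses,
// 4MB (2^22-byte) arenas, and a 64-entry (6-bit) L1 map.
const (
	heapAddrBits  = 48
	logArenaBytes = 22
	arenaBits     = heapAddrBits - logArenaBytes // bits in an arena frame number
	arenaL1Bits   = 6
	arenaL2Bits   = arenaBits - arenaL1Bits
)

// arenaIndexes is a hypothetical helper that splits the arena frame number
// of addr into its L1 index (high arenaL1Bits bits) and L2 index (low
// arenaL2Bits bits). With arenaL1Bits == 0, l1 is always 0 and the map
// degenerates to a single level.
func arenaIndexes(addr uint64) (l1, l2 uint64) {
	frame := addr >> logArenaBytes
	return frame >> arenaL2Bits, frame & (1<<arenaL2Bits - 1)
}

func main() {
	l1, l2 := arenaIndexes(0x7f0000000000)
	fmt.Println(l1, l2) // 31 786432
}
```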
      
      At the moment, arenaL1Bits is always 0, so we effectively have a
      single-level map. We do a few things so that this has no cost beyond
      the current single-level map:
      
      1. We embed the L2 array directly in mheap, so if there's a single
         entry in the L2 array, the representation is identical to the
         current representation and there's no extra level of indirection.
      
      2. Hot code that accesses the arena map is structured so that it
         optimizes to nearly the same machine code as it does currently.
      
      3. We make some small tweaks to hot code paths and to the inliner
         itself to keep some important functions inlined despite their
         now-larger ASTs. In particular, this is necessary for
         heapBitsForAddr and heapBits.next.
      
      Possibly as a result of some of the tweaks, this actually slightly
      improves the performance of the x/benchmarks garbage benchmark:
      
      name                       old time/op  new time/op  delta
      Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.26ms ± 1%  -1.07%  (p=0.000 n=17+19)
      
      (https://perf.golang.org/search?q=upload:20180223.2)
      
      For #23900.
      
      Change-Id: If5164e0961754f97eb9eca58f837f36d759505ff
      Reviewed-on: https://go-review.googlesource.com/96779
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • cmd/compile: teach front-end deadcode about && and || · 2dbf15e8
      Austin Clements authored
      The front-end dead code elimination is very simple. Currently, it just
      looks for if statements with constant boolean conditions. Its main
      purpose is to reduce load on the compiler and shrink code before
      inlining computes hairiness.
      
      This CL teaches front-end dead code elimination about short-circuiting
      boolean expressions && and ||, since they're essentially the same as
      if statements.
      
      This also teaches the inliner that the constant 'if' form left behind
      by deadcode is free.
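      For a sense of the source shape this targets: in the hypothetical program below, debugChecks is a constant false, so the condition debugChecks && expensiveCheck() can never be true, exactly like if false. The program only demonstrates the Go semantics (the right operand of && is never evaluated); whether the front end actually removes the branch is not observable from its output.

```go
package main

import "fmt"

// debugChecks is a hypothetical compile-time flag.
const debugChecks = false

// calls counts how often expensiveCheck runs.
var calls int

func expensiveCheck() bool {
	calls++
	return true
}

func process(x int) int {
	// With debugChecks == false, this condition is constant-false just like
	// "if false", so a dead-code pass that understands && can drop the
	// whole branch before inlining cost is computed.
	if debugChecks && expensiveCheck() {
		fmt.Println("checking", x)
	}
	return x * 2
}

func main() {
	fmt.Println(process(21), calls) // 42 0
}
```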
      
      These changes will help with runtime modifications in the next CL that
      would otherwise inhibit inlining in some hot code paths. Currently,
      however, they have no significant impact on benchmarks.
      
      Change-Id: I886203b3c4acdbfef08148fddd7f3a7af5afc7c1
      Reviewed-on: https://go-review.googlesource.com/96778
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>