1. 26 Feb, 2018 12 commits
    • cmd/compile: track line directives w/ column information · 515fa58a
      Robert Griesemer authored
      Extend cmd/internal/src.PosBase to track column information,
      and adjust the meaning of the PosBase position to mean the
      position at which the PosBase's relative (line, col) position
      starts (rather than indicating the position of the //line
      directive). Because this semantic change is made in the
      compiler's noder, it doesn't affect the logic of src.PosBase,
      only its test setup (where PosBases are constructed with
      corrected incoming positions). In short, src.PosBase now
      matches syntax.PosBase with respect to the semantics of
      src.PosBase.pos.
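The changed meaning of the PosBase position can be illustrated with a minimal sketch. The types and names below (Pos, PosBase, Start, Rel) are hypothetical simplifications for illustration and are not the actual cmd/internal/src API; the sketch only assumes that the base records the position at which the relative (line, col) numbering starts.

```go
package main

import "fmt"

// Pos is an absolute (line, col) position in a source file.
type Pos struct{ Line, Col uint }

// PosBase is a hypothetical, simplified position base.
// Start is the absolute position at which the relative numbering
// begins (not the position of the //line directive itself).
type PosBase struct {
	Filename  string
	Start     Pos  // absolute position where relative numbering starts
	Line, Col uint // relative line and column assigned to Start
}

// Rel translates the absolute position p into a position relative to b.
func (b PosBase) Rel(p Pos) (file string, line, col uint) {
	line = b.Line + (p.Line - b.Start.Line)
	col = p.Col
	if p.Line == b.Start.Line {
		// Only positions on the starting line have their column shifted.
		col = b.Col + (p.Col - b.Start.Col)
	}
	return b.Filename, line, col
}

func main() {
	// Relative numbering starts at absolute 11:1 and maps it to gen.y:100:5.
	b := PosBase{Filename: "gen.y", Start: Pos{11, 1}, Line: 100, Col: 5}
	fmt.Println(b.Rel(Pos{11, 4}))
	fmt.Println(b.Rel(Pos{13, 2}))
}
```

Because Start already marks where the relative numbering begins, the translation needs no adjustment for the length or location of the directive itself.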
      
      For #22662.
      
      Change-Id: I5b1451cb88fff3f149920c2eec08b6167955ce27
      Reviewed-on: https://go-review.googlesource.com/96535
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • cmd/compile/internal/syntax: implement //line :line:col handling · 6fa6bde9
      Robert Griesemer authored
      For line directives which have a line and a column number,
      an omitted filename means that the filename has not changed
      (per the issue below).
      
      For line directives w/o a column number, an omitted filename
      means the empty filename (to preserve the existing behavior).
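The two filename rules above can be sketched as follows. This is an illustrative parser written for this note, not the syntax package's implementation; the function name parseLineDirective, the curFile parameter, and the use of col == 0 to mean "no column given" are all assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLineDirective interprets the text that follows "//line ".
// curFile is the filename currently in effect.
// A returned col of 0 means the directive carried no column.
func parseLineDirective(text, curFile string) (file string, line, col uint, ok bool) {
	i := strings.LastIndexByte(text, ':')
	if i < 0 {
		return "", 0, 0, false
	}
	n, err := strconv.ParseUint(text[i+1:], 10, 32)
	if err != nil || n == 0 {
		return "", 0, 0, false
	}
	rest := text[:i]
	// If a second trailing number is present, n was the column.
	if j := strings.LastIndexByte(rest, ':'); j >= 0 {
		if m, err := strconv.ParseUint(rest[j+1:], 10, 32); err == nil && m > 0 {
			file = rest[:j]
			if file == "" {
				file = curFile // line:col form: omitted filename is unchanged
			}
			return file, uint(m), uint(n), true
		}
	}
	// Line-only form: an omitted filename means the empty filename.
	return rest, uint(n), 0, true
}

func main() {
	for _, text := range []string{":10:5", ":10", "b.go:7", "b.go:7:3"} {
		file, line, col, ok := parseLineDirective(text, "a.go")
		fmt.Printf("%-10q file=%q line=%d col=%d ok=%v\n", text, file, line, col, ok)
	}
}
```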
      
      For #22662.
      
      Change-Id: I32cd9037550485da5445a34bb104706eccce1df1
      Reviewed-on: https://go-review.googlesource.com/96476
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • cmd/compile/internal/syntax: remove dependency on cmd/internal/src · 5c08b9e8
      Robert Griesemer authored
      For dependency reasons, the data structure implementing source
      positions in the compiler is in cmd/internal/src. It contains
      highly compiler specific details (e.g. inlining index).
      
      This change introduces a parallel but simpler position
      representation, defined in the syntax package, which removes
      that package's dependency on cmd/internal/src, and also removes
      the need to deal with certain filename-specific operations
      (defined by the needs of the compiler) in the syntax package.
      As a result, the syntax package becomes again a compiler-
      independent, stand-alone package that at some point might
      replace (or augment) the existing top-level go/* syntax-related
      packages.
      
      Additionally, line directives that update column numbers
      are now correctly tracked through the syntax package, with
      additional tests added. (The respective changes also need to
      be made in cmd/internal/src; i.e., the compiler accepts but
      still ignores column numbers in line directives.)
      
      This change comes at the cost of a new position translation
      step, but that step is cheap because it only needs to do real
      work if the position base changed (i.e., if there is a new file,
      or new line directive).
      
      There is no noticeable impact on overall compiler performance
      measured with `compilebench -count 5 -alloc`:
      
      name       old time/op       new time/op       delta
      Template         220ms ± 8%        228ms ±18%    ~     (p=0.548 n=5+5)
      Unicode          119ms ±11%        113ms ± 5%    ~     (p=0.056 n=5+5)
      GoTypes          684ms ± 6%        677ms ± 3%    ~     (p=0.841 n=5+5)
      Compiler         3.19s ± 7%        3.01s ± 1%    ~     (p=0.095 n=5+5)
      SSA              7.92s ± 8%        7.79s ± 1%    ~     (p=0.690 n=5+5)
      Flate            141ms ± 7%        139ms ± 4%    ~     (p=0.548 n=5+5)
      GoParser         173ms ±12%        171ms ± 4%    ~     (p=1.000 n=5+5)
      Reflect          417ms ± 5%        411ms ± 3%    ~     (p=0.548 n=5+5)
      Tar              205ms ± 5%        198ms ± 2%    ~     (p=0.690 n=5+5)
      XML              232ms ± 4%        229ms ± 4%    ~     (p=0.690 n=5+5)
      StdCmd           28.7s ± 5%        28.2s ± 2%    ~     (p=0.421 n=5+5)
      
      name       old user-time/op  new user-time/op  delta
      Template         269ms ± 4%        265ms ± 5%    ~     (p=0.421 n=5+5)
      Unicode          153ms ± 7%        149ms ± 3%    ~     (p=0.841 n=5+5)
      GoTypes          850ms ± 7%        862ms ± 4%    ~     (p=0.841 n=5+5)
      Compiler         4.01s ± 5%        3.86s ± 0%    ~     (p=0.190 n=5+4)
      SSA              10.9s ± 4%        10.8s ± 2%    ~     (p=0.548 n=5+5)
      Flate            166ms ± 7%        167ms ± 6%    ~     (p=1.000 n=5+5)
      GoParser         204ms ± 8%        206ms ± 7%    ~     (p=0.841 n=5+5)
      Reflect          514ms ± 5%        508ms ± 4%    ~     (p=0.548 n=5+5)
      Tar              245ms ± 6%        244ms ± 3%    ~     (p=0.690 n=5+5)
      XML              280ms ± 4%        278ms ± 4%    ~     (p=0.841 n=5+5)
      
      name       old alloc/op      new alloc/op      delta
      Template        37.9MB ± 0%       37.9MB ± 0%    ~     (p=0.841 n=5+5)
      Unicode         28.8MB ± 0%       28.8MB ± 0%    ~     (p=0.841 n=5+5)
      GoTypes          113MB ± 0%        113MB ± 0%    ~     (p=0.151 n=5+5)
      Compiler         468MB ± 0%        468MB ± 0%  -0.01%  (p=0.032 n=5+5)
      SSA             1.50GB ± 0%       1.50GB ± 0%    ~     (p=0.548 n=5+5)
      Flate           24.4MB ± 0%       24.4MB ± 0%    ~     (p=1.000 n=5+5)
      GoParser        30.7MB ± 0%       30.7MB ± 0%    ~     (p=1.000 n=5+5)
      Reflect         76.5MB ± 0%       76.5MB ± 0%    ~     (p=0.548 n=5+5)
      Tar             38.9MB ± 0%       38.9MB ± 0%    ~     (p=0.222 n=5+5)
      XML             41.6MB ± 0%       41.6MB ± 0%    ~     (p=0.548 n=5+5)
      
      name       old allocs/op     new allocs/op     delta
      Template          382k ± 0%         382k ± 0%  +0.01%  (p=0.008 n=5+5)
      Unicode           343k ± 0%         343k ± 0%    ~     (p=0.841 n=5+5)
      GoTypes          1.19M ± 0%        1.19M ± 0%  +0.01%  (p=0.008 n=5+5)
      Compiler         4.53M ± 0%        4.53M ± 0%  +0.03%  (p=0.008 n=5+5)
      SSA              12.4M ± 0%        12.4M ± 0%  +0.00%  (p=0.008 n=5+5)
      Flate             235k ± 0%         235k ± 0%    ~     (p=0.079 n=5+5)
      GoParser          318k ± 0%         318k ± 0%    ~     (p=0.730 n=5+5)
      Reflect           978k ± 0%         978k ± 0%    ~     (p=1.000 n=5+5)
      Tar               393k ± 0%         393k ± 0%    ~     (p=0.056 n=5+5)
      XML               405k ± 0%         405k ± 0%    ~     (p=0.548 n=5+5)
      
      name       old text-bytes    new text-bytes    delta
      HelloSize        672kB ± 0%        672kB ± 0%    ~     (all equal)
      CmdGoSize       7.12MB ± 0%       7.12MB ± 0%    ~     (all equal)
      
      name       old data-bytes    new data-bytes    delta
      HelloSize        133kB ± 0%        133kB ± 0%    ~     (all equal)
      CmdGoSize        390kB ± 0%        390kB ± 0%    ~     (all equal)
      
      name       old exe-bytes     new exe-bytes     delta
      HelloSize       1.07MB ± 0%       1.07MB ± 0%    ~     (all equal)
      CmdGoSize       11.2MB ± 0%       11.2MB ± 0%    ~     (all equal)
      
      Passes toolstash compare.
      
      For #22662.
      
      Change-Id: I19edb53dd9675af57f7122cb7dba2a6d8bdcc3da
      Reviewed-on: https://go-review.googlesource.com/94515
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • strings: add Builder benchmarks comparing bytes.Buffer and strings.Builder · b1accced
      Brad Fitzpatrick authored
      Despite the existing test that locks in the allocation behavior, people
      really want a benchmark. So:
      
      BenchmarkBuildString_Builder/1Write_NoGrow-4    20000000  60.4 ns/op   48 B/op  1 allocs/op
      BenchmarkBuildString_Builder/3Write_NoGrow-4    10000000   230 ns/op  336 B/op  3 allocs/op
      BenchmarkBuildString_Builder/3Write_Grow-4      20000000   102 ns/op  112 B/op  1 allocs/op
      BenchmarkBuildString_ByteBuffer/1Write_NoGrow-4 10000000   125 ns/op  160 B/op  2 allocs/op
      BenchmarkBuildString_ByteBuffer/3Write_NoGrow-4  5000000   339 ns/op  400 B/op  3 allocs/op
      BenchmarkBuildString_ByteBuffer/3Write_Grow-4    5000000   316 ns/op  336 B/op  3 allocs/op
      
      I don't think these allocate-as-fast-as-you-can benchmarks are very
      interesting because they're effectively just GC benchmarks, but sure.
      If one wants to see that there's 1 fewer allocation, there it is. The
      ns/op and B/op numbers will change as the built string size changes.
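For readers unfamiliar with the two types being compared, here is a minimal sketch of the pattern the benchmark names suggest (three writes, with or without a preceding Grow). The helper names buildWithBuilder and buildWithBuffer are made up for this illustration and are not the benchmark code from the commit.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// buildWithBuilder concatenates parts using strings.Builder.
// With grow set, the full capacity is reserved up front so the
// writes do not need to reallocate.
func buildWithBuilder(parts []string, grow bool) string {
	var b strings.Builder
	if grow {
		n := 0
		for _, p := range parts {
			n += len(p)
		}
		b.Grow(n)
	}
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String() // returns the accumulated bytes without copying
}

// buildWithBuffer concatenates parts using bytes.Buffer.
func buildWithBuffer(parts []string) string {
	var b bytes.Buffer
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String() // copies the buffer contents into a new string
}

func main() {
	parts := []string{"alpha,", "beta,", "gamma"}
	fmt.Println(buildWithBuilder(parts, true))
	fmt.Println(buildWithBuffer(parts))
}
```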
      
      Updates #18990
      
      Change-Id: Ifccf535bd396217434a0e6989e195105f90132ae
      Reviewed-on: https://go-review.googlesource.com/96980
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Alan Donovan <adonovan@google.com>
    • syscall: remove/update outdated TODO comments · 495eb3f9
      Tobias Klauser authored
      Error returns for linux/arm syscalls have been handled for a long time.
      
      Remove another list of unimplemented syscalls, following CL 96315.
      
      The root-only check in TestSyscallNoError was shown to be sufficient as
      part of CL 84485 already.
      
      NetBSD and OpenBSD do not implement the sendfile syscall (yet), so add a
      link to golang.org/issue/5847
      
      Change-Id: I07efc3c3203537a4142707385f31b59dc0ecca42
      Reviewed-on: https://go-review.googlesource.com/97115
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • os: unify supportsCloseOnExec definition · ad9814de
      Tobias Klauser authored
      On Darwin and FreeBSD, supportsCloseOnExec is defined in its own file,
      even though it is set to true as on other Unices. Drop the separate
      definitions but keep the accompanying comments.
      
      Change-Id: Iab1d20e1b2590800f141d54b55a099c9cd7ae57e
      Reviewed-on: https://go-review.googlesource.com/97155
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • os: do not forget to set ModeDevice when using ModeCharDevice · 9cae3aaf
      Alex Brainman authored
      Fixes #23123
      
      Change-Id: Ia4ac947cc49ef3d150ef60a095b86552dcef397d
      Reviewed-on: https://go-review.googlesource.com/84435
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
    • net, internal/poll, net/internal/socktest: use SOCK_{CLOEXEC,NONBLOCK} accept4/socket flags on OpenBSD · 144bf04a
      Tobias Klauser authored
      
      The SOCK_CLOEXEC and SOCK_NONBLOCK flags to the socket syscall and the
      accept4 syscall have been supported since OpenBSD 5.7.
      
      Follows CL 40895 and CL 94295
      
      Change-Id: Icaf35ace2ef5e73279a70d4f1a9fbf3be9371e6c
      Reviewed-on: https://go-review.googlesource.com/97196
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • os/user: clean up grammar in comments · db7af2e6
      Kevin Burke authored
      Change-Id: If9fe04894851d60a682346415c2e5523b2f04929
      Reviewed-on: https://go-review.googlesource.com/96981
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
    • time: avoid unnecessary type conversions · 7e9a8546
      Kunpei Sakai authored
      Change-Id: Ic318c25b21298ec123eb27c814c79f637887713c
      Reviewed-on: https://go-review.googlesource.com/97135
      Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • build: small cleanup in error message in make.bat · dd3b4714
      Giovanni Bajo authored
      Contrary to bash, double quotes cannot be used to group
      arguments in Windows shell, so they were being printed as
      literals by the echo command.
      
      Since a literal '>' is present in the string, it is sufficient
      to escape it correctly through '^'.
      
      Change-Id: Icc8c92b3dc8d813825adadbe3d921a38d44a1a94
      Reviewed-on: https://go-review.googlesource.com/97056
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
    • net/http,doc: use HTTP status code constants where applicable · e9c57bea
      unknown authored
      There are a few places where the integer value is used.
      Use the equivalent constants to aid with readability.
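As a small illustration of the readability point, the handler below uses the named constants from net/http rather than bare integers such as 405 or 200. The handler and the statusFor helper are hypothetical examples written for this note and are not taken from the commit.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// handler accepts only GET requests.
func handler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		// http.StatusMethodNotAllowed reads better than a literal 405.
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// statusFor runs handler against an in-memory recorder and
// returns the status code it wrote.
func statusFor(method string) int {
	rec := httptest.NewRecorder()
	handler(rec, httptest.NewRequest(method, "/", nil))
	return rec.Code
}

func main() {
	for _, m := range []string{http.MethodGet, http.MethodPost} {
		code := statusFor(m)
		fmt.Println(m, code, http.StatusText(code))
	}
}
```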
      
      Change-Id: I023b1dbe605340544c056d0e0d9d6d5a7d7d0edc
      GitHub-Last-Rev: c1c90bcd251901f9f2a305ce5ddd0d85009a3d49
      GitHub-Pull-Request: golang/go#24123
      Reviewed-on: https://go-review.googlesource.com/96984
      Reviewed-by: Andrew Bonventre <andybons@golang.org>
  2. 25 Feb, 2018 1 commit
  3. 24 Feb, 2018 3 commits
    • os/user: obtain a user home path on Windows · 7a218942
      Lubomir I. Ivanov (VMware) authored
      newUserFromSid() is extended so that the retrieval of the user home
      path based on a user SID becomes possible.
      
      (1) The primary method it uses is to look up the Windows registry for
      the following key:
        HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList\[SID]
      
      If the key does not exist the user might not have logged in yet.
      If (1) fails it falls back to (2)
      
      (2) The second method the function uses is to look at the default home
      path for users (e.g. WINAPI's GetProfilesDirectory()) and append
      the username to that. The procedure is along the lines of:
        c:\Users + \ + <username>
      
      The function newUser() now requires the following arguments:
        uid, gid, dir, username, domain
      This is done to avoid multiple calls to usid.String() and
      usid.LookupAccount("") in the case of a newUserFromSid()
      call stack.
      
      The functions current() and newUserFromSid() both call newUser()
      supplying the arguments in question. The helpers
      lookupUsernameAndDomain() and findHomeDirInRegistry() are
      added.
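The two-step lookup described above can be sketched in a platform-independent way. Everything here is a hypothetical stand-in: the real code uses the Windows registry and profile directory APIs, whereas this sketch takes both lookups as function parameters, and the SIDs and paths are made-up placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// homeDir tries the registry-style lookup first (method 1) and falls
// back to the profiles directory plus the username (method 2).
func homeDir(sid, username string,
	fromRegistry func(sid string) (string, error),
	profilesDir func() (string, error)) (string, error) {

	if dir, err := fromRegistry(sid); err == nil {
		return dir, nil
	}
	// The key may be missing if the user has not logged in yet.
	base, err := profilesDir()
	if err != nil {
		return "", err
	}
	return base + `\` + username, nil
}

// lookupIn returns a fake registry lookup backed by a map.
func lookupIn(known map[string]string) func(string) (string, error) {
	return func(sid string) (string, error) {
		if dir, ok := known[sid]; ok {
			return dir, nil
		}
		return "", errors.New("profile key not found")
	}
}

// fixedDir returns a fake profiles-directory lookup.
func fixedDir(dir string) func() (string, error) {
	return func() (string, error) { return dir, nil }
}

func main() {
	reg := lookupIn(map[string]string{"S-1-5-21-1": `C:\Users\alice`})
	fmt.Println(homeDir("S-1-5-21-1", "alice", reg, fixedDir(`C:\Users`)))
	fmt.Println(homeDir("S-1-5-21-2", "bob", reg, fixedDir(`C:\Users`)))
}
```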
      
      This commit also updates:
      - go/build/deps_test.go, so that the test now includes the
      "internal/syscall/windows/registry" import.
      - os/user/user_test.go, so that User.HomeDir is tested on Windows.
      
      GitHub-Last-Rev: 25423e2a3820121f4c42321e7a77a3977f409724
      GitHub-Pull-Request: golang/go#23822
      Change-Id: I6c3ad1c4ce3e7bc0d1add024951711f615b84ee5
      Reviewed-on: https://go-review.googlesource.com/93935
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
      Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile/internal/syntax: use stringer for operators and tokens · c8791538
      Daniel Martí authored
      With its new -linecomment flag, it is now possible to use stringer on
      values whose strings aren't valid identifiers. This is the case with
      tokens and operators in Go.
      
      Operator already had inline comments with each operator's string
      representation; only minor modifications were needed. The inline
      comments were added to each of the token names, using the same strategy.
      
      Comments that were previously inline or part of the string arrays were
      moved to the line immediately before the name they correspond to.
      
      Finally, declare tokStrFast as a function that uses the generated arrays
      directly. Avoiding the branch and strconv call means that we avoid a
      performance regression in the scanner, perhaps due to the lack of
      mid-stack inlining.
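To make the approach concrete, here is a small sketch with a made-up subset of tokens. The name string and index array are written by hand to approximate what stringer emits with -linecomment; the identifiers tokenName and tokenIndex, and the exact shape of the generated String method, are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

type token uint

//go:generate stringer -type token -linecomment

const (
	_ token = iota

	_EOF  // EOF
	_Name // name
	_Add  // +
	_Semi // ;
)

// Hand-written approximation of the generated data: all names are
// concatenated, and tokenIndex holds the start offset of each name.
const tokenName = "EOFname+;"

var tokenIndex = [...]uint8{0, 3, 7, 8, 9}

// String mirrors the general shape of a generated method: a range
// check with a strconv fallback for unknown values.
func (t token) String() string {
	t -= 1
	if t >= token(len(tokenIndex)-1) {
		return "token(" + strconv.FormatInt(int64(t+1), 10) + ")"
	}
	return tokenName[tokenIndex[t]:tokenIndex[t+1]]
}

// tokStrFast indexes the arrays directly, skipping the branch and
// the strconv call. It must only be called with valid tokens.
func tokStrFast(t token) string {
	return tokenName[tokenIndex[t-1]:tokenIndex[t]]
}

func main() {
	fmt.Println(_Add, tokStrFast(_Name), token(9))
}
```

The trade-off is that tokStrFast has no fallback for out-of-range values, so it is only suitable where the token is known to be valid.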
      
      Performance is not affected. Measured with 'go test -run StdLib -fast'
      on an X1 Carbon Gen2 (i5-4300U @ 1.90GHz, 8GB RAM, SSD), the best of 5
      runs before and after the changes are:
      
      	parsed 1709399 lines (3763 files) in 1.707402159s (1001169 lines/s)
      	allocated 449.282Mb (263.137Mb/s)
      
      	parsed 1709329 lines (3765 files) in 1.706663154s (1001562 lines/s)
      	allocated 449.290Mb (263.256Mb/s)
      
      Change-Id: Idcc4f83393fcadd6579700e3602c09496ea2625b
      Reviewed-on: https://go-review.googlesource.com/95357
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • math/big: speed-up addMulVVW on amd64 · c3935c08
      Ilya Tocar authored
      Use MULX/ADOX/ADCX instructions to speed up addMulVVW,
      when they are available. addMulVVW is a hotspot in rsa.
      This is faster than the ADD/ADC/IMUL version, because ADOX/ADCX only
      modify the carry/overflow flags, so they can be interleaved with each
      other and with MULX, which doesn't modify flags at all.
      Increasing the unroll factor to e.g. 16 makes rsa 1% faster, but
      3PrimeRSA2048Decrypt performance falls back to baseline.
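For context, the operation being accelerated is a multiply-accumulate over word vectors. The portable sketch below shows that per-word computation using math/bits; it is not the assembly from this commit, and it uses uint64 directly where math/big uses its Word type, which is a simplification made here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVW computes z += x*y word by word and returns the final carry.
// It assumes len(x) >= len(z).
func addMulVVW(z, x []uint64, y uint64) (c uint64) {
	for i := range z {
		hi, lo := bits.Mul64(x[i], y) // 128-bit product of one word pair
		lo, cc := bits.Add64(lo, z[i], 0)
		hi += cc // cannot overflow: hi is at most 2^64-2
		lo, cc = bits.Add64(lo, c, 0)
		z[i] = lo
		c = hi + cc // carry into the next word
	}
	return c
}

func main() {
	z := []uint64{1, 0}
	x := []uint64{^uint64(0), 0}
	c := addMulVVW(z, x, 2)
	fmt.Printf("z = [%#x %#x], carry = %d\n", z[0], z[1], c)
}
```

Each iteration has two independent carry chains (one from adding z[i], one from adding the running carry), which is the kind of structure that separate carry and overflow flags allow the hardware to overlap.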
      
      Updates #20058
      
      AddMulVVW/1-8                       3.28ns ± 2%     3.26ns ± 3%     ~     (p=0.107 n=10+10)
      AddMulVVW/2-8                       4.26ns ± 2%     4.24ns ± 3%     ~     (p=0.327 n=9+9)
      AddMulVVW/3-8                       5.07ns ± 2%     5.26ns ± 2%   +3.73%  (p=0.000 n=10+10)
      AddMulVVW/4-8                       6.40ns ± 2%     6.50ns ± 2%   +1.61%  (p=0.000 n=10+10)
      AddMulVVW/5-8                       6.77ns ± 2%     6.86ns ± 1%   +1.38%  (p=0.001 n=9+9)
      AddMulVVW/10-8                      12.2ns ± 2%     10.6ns ± 3%  -13.65%  (p=0.000 n=10+10)
      AddMulVVW/100-8                     79.7ns ± 2%     52.4ns ± 1%  -34.17%  (p=0.000 n=10+10)
      AddMulVVW/1000-8                     695ns ± 1%      491ns ± 2%  -29.39%  (p=0.000 n=9+10)
      AddMulVVW/10000-8                   7.26µs ± 2%     5.92µs ± 6%  -18.42%  (p=0.000 n=10+10)
      AddMulVVW/100000-8                  72.6µs ± 2%     62.2µs ± 2%  -14.31%  (p=0.000 n=10+10)
      
      crypto/rsa speed-up is smaller, but still noticeable:
      
      RSA2048Decrypt-8        1.61ms ± 1%  1.38ms ± 1%  -14.13%  (p=0.000 n=10+10)
      RSA2048Sign-8           1.93ms ± 1%  1.70ms ± 1%  -11.86%  (p=0.000 n=10+10)
      3PrimeRSA2048Decrypt-8   932µs ± 0%   828µs ± 0%  -11.15%  (p=0.000 n=10+10)
      
      Results on crypto/tls:
      
      HandshakeServer/RSA-8                        901µs ± 1%    777µs ± 0%  -13.70%  (p=0.000 n=10+8)
      HandshakeServer/ECDHE-P256-RSA-8            1.01ms ± 1%   0.90ms ± 0%  -11.53%  (p=0.000 n=10+9)
      
      Full math/big benchmarks:
      
      name                              old time/op    new time/op     delta
      AddVV/1-8                           3.74ns ± 6%     3.55ns ± 2%     ~     (p=0.082 n=10+8)
      AddVV/2-8                           3.96ns ± 2%     3.98ns ± 5%     ~     (p=0.794 n=10+9)
      AddVV/3-8                           4.97ns ± 2%     4.94ns ± 1%     ~     (p=0.081 n=10+9)
      AddVV/4-8                           5.59ns ± 2%     5.59ns ± 2%     ~     (p=0.809 n=10+10)
      AddVV/5-8                           6.63ns ± 1%     6.62ns ± 1%     ~     (p=0.560 n=9+10)
      AddVV/10-8                          8.11ns ± 1%     8.11ns ± 2%     ~     (p=0.402 n=10+10)
      AddVV/100-8                         46.9ns ± 2%     46.8ns ± 1%     ~     (p=0.809 n=10+10)
      AddVV/1000-8                         389ns ± 1%      391ns ± 4%     ~     (p=0.809 n=10+10)
      AddVV/10000-8                       5.05µs ± 5%     4.98µs ± 2%     ~     (p=0.113 n=9+10)
      AddVV/100000-8                      55.3µs ± 3%     55.2µs ± 3%     ~     (p=0.796 n=10+10)
      AddVW/1-8                           3.04ns ± 3%     3.02ns ± 3%     ~     (p=0.538 n=10+10)
      AddVW/2-8                           3.57ns ± 2%     3.61ns ± 2%   +1.12%  (p=0.032 n=9+9)
      AddVW/3-8                           3.77ns ± 1%     3.79ns ± 2%     ~     (p=0.719 n=10+10)
      AddVW/4-8                           4.69ns ± 1%     4.69ns ± 2%     ~     (p=0.920 n=10+9)
      AddVW/5-8                           4.58ns ± 1%     4.58ns ± 1%     ~     (p=0.812 n=10+10)
      AddVW/10-8                          7.62ns ± 2%     7.63ns ± 1%     ~     (p=0.926 n=10+10)
      AddVW/100-8                         41.1ns ± 2%     42.4ns ± 3%   +3.34%  (p=0.000 n=10+10)
      AddVW/1000-8                         386ns ± 2%      389ns ± 4%     ~     (p=0.514 n=10+10)
      AddVW/10000-8                       3.88µs ± 3%     3.87µs ± 3%     ~     (p=0.448 n=10+10)
      AddVW/100000-8                      41.2µs ± 3%     41.7µs ± 3%     ~     (p=0.148 n=10+10)
      AddMulVVW/1-8                       3.28ns ± 2%     3.26ns ± 3%     ~     (p=0.107 n=10+10)
      AddMulVVW/2-8                       4.26ns ± 2%     4.24ns ± 3%     ~     (p=0.327 n=9+9)
      AddMulVVW/3-8                       5.07ns ± 2%     5.26ns ± 2%   +3.73%  (p=0.000 n=10+10)
      AddMulVVW/4-8                       6.40ns ± 2%     6.50ns ± 2%   +1.61%  (p=0.000 n=10+10)
      AddMulVVW/5-8                       6.77ns ± 2%     6.86ns ± 1%   +1.38%  (p=0.001 n=9+9)
      AddMulVVW/10-8                      12.2ns ± 2%     10.6ns ± 3%  -13.65%  (p=0.000 n=10+10)
      AddMulVVW/100-8                     79.7ns ± 2%     52.4ns ± 1%  -34.17%  (p=0.000 n=10+10)
      AddMulVVW/1000-8                     695ns ± 1%      491ns ± 2%  -29.39%  (p=0.000 n=9+10)
      AddMulVVW/10000-8                   7.26µs ± 2%     5.92µs ± 6%  -18.42%  (p=0.000 n=10+10)
      AddMulVVW/100000-8                  72.6µs ± 2%     62.2µs ± 2%  -14.31%  (p=0.000 n=10+10)
      DecimalConversion-8                  108µs ±19%      104µs ± 4%     ~     (p=0.460 n=10+8)
      FloatString/100-8                    926ns ±14%      908ns ± 5%     ~     (p=0.398 n=9+9)
      FloatString/1000-8                  25.7µs ± 1%     25.7µs ± 1%     ~     (p=0.739 n=10+10)
      FloatString/10000-8                 2.13ms ± 1%     2.12ms ± 1%     ~     (p=0.353 n=10+10)
      FloatString/100000-8                 207ms ± 1%      206ms ± 2%     ~     (p=0.912 n=10+10)
      FloatAdd/10-8                       61.3ns ± 3%     61.9ns ± 3%     ~     (p=0.183 n=10+10)
      FloatAdd/100-8                      62.0ns ± 2%     62.9ns ± 4%     ~     (p=0.118 n=10+10)
      FloatAdd/1000-8                     84.7ns ± 2%     84.4ns ± 1%     ~     (p=0.591 n=10+10)
      FloatAdd/10000-8                     305ns ± 2%      306ns ± 1%     ~     (p=0.443 n=10+10)
      FloatAdd/100000-8                   2.45µs ± 1%     2.46µs ± 1%     ~     (p=0.782 n=10+10)
      FloatSub/10-8                       56.8ns ± 4%     56.5ns ± 5%     ~     (p=0.423 n=10+10)
      FloatSub/100-8                      57.3ns ± 4%     57.1ns ± 5%     ~     (p=0.540 n=10+10)
      FloatSub/1000-8                     66.8ns ± 4%     66.6ns ± 1%     ~     (p=0.868 n=10+10)
      FloatSub/10000-8                     199ns ± 1%      198ns ± 1%     ~     (p=0.287 n=10+9)
      FloatSub/100000-8                   1.47µs ± 2%     1.47µs ± 2%     ~     (p=0.920 n=10+9)
      ParseFloatSmallExp-8                8.74µs ±10%     9.48µs ±10%   +8.51%  (p=0.010 n=9+10)
      ParseFloatLargeExp-8                39.2µs ±25%     39.6µs ±12%     ~     (p=0.529 n=10+10)
      GCD10x10/WithoutXY-8                 173ns ±23%      177ns ±20%     ~     (p=0.698 n=10+10)
      GCD10x10/WithXY-8                    736ns ±12%      728ns ±16%     ~     (p=0.838 n=10+10)
      GCD10x100/WithoutXY-8                325ns ±16%      326ns ±14%     ~     (p=0.912 n=10+10)
      GCD10x100/WithXY-8                  1.14µs ±13%     1.16µs ± 6%     ~     (p=0.287 n=10+9)
      GCD10x1000/WithoutXY-8               851ns ±25%      820ns ±12%     ~     (p=0.592 n=10+10)
      GCD10x1000/WithXY-8                 2.89µs ±17%     2.85µs ± 5%     ~     (p=1.000 n=10+9)
      GCD10x10000/WithoutXY-8             6.66µs ±12%     6.82µs ±19%     ~     (p=0.529 n=10+10)
      GCD10x10000/WithXY-8                18.0µs ± 5%     17.2µs ±19%     ~     (p=0.315 n=7+10)
      GCD10x100000/WithoutXY-8            77.8µs ±18%     73.3µs ±11%     ~     (p=0.315 n=10+9)
      GCD10x100000/WithXY-8                186µs ±14%      204µs ±29%     ~     (p=0.218 n=10+10)
      GCD100x100/WithoutXY-8              1.09µs ± 1%     1.09µs ± 2%     ~     (p=0.117 n=9+10)
      GCD100x100/WithXY-8                 7.93µs ± 1%     7.97µs ± 1%   +0.52%  (p=0.006 n=10+10)
      GCD100x1000/WithoutXY-8             2.00µs ± 3%     2.04µs ± 6%     ~     (p=0.053 n=9+10)
      GCD100x1000/WithXY-8                9.23µs ± 1%     9.29µs ± 1%   +0.63%  (p=0.009 n=10+10)
      GCD100x10000/WithoutXY-8            10.2µs ±11%      9.7µs ± 6%     ~     (p=0.278 n=10+9)
      GCD100x10000/WithXY-8               33.3µs ± 4%     33.6µs ± 4%     ~     (p=0.481 n=10+10)
      GCD100x100000/WithoutXY-8            106µs ±17%      105µs ±13%     ~     (p=0.853 n=10+10)
      GCD100x100000/WithXY-8               289µs ±17%      276µs ± 8%     ~     (p=0.353 n=10+10)
      GCD1000x1000/WithoutXY-8            12.2µs ± 1%     12.1µs ± 1%   -0.45%  (p=0.007 n=10+10)
      GCD1000x1000/WithXY-8                131µs ± 1%      132µs ± 0%   +0.93%  (p=0.000 n=9+7)
      GCD1000x10000/WithoutXY-8           20.6µs ± 2%     20.6µs ± 1%     ~     (p=0.326 n=10+9)
      GCD1000x10000/WithXY-8               238µs ± 1%      237µs ± 1%     ~     (p=0.356 n=9+10)
      GCD1000x100000/WithoutXY-8           117µs ± 8%      114µs ±11%     ~     (p=0.190 n=10+10)
      GCD1000x100000/WithXY-8             1.51ms ± 1%     1.50ms ± 1%     ~     (p=0.053 n=9+10)
      GCD10000x10000/WithoutXY-8           220µs ± 1%      218µs ± 1%   -0.86%  (p=0.000 n=10+10)
      GCD10000x10000/WithXY-8             3.04ms ± 0%     3.05ms ± 0%   +0.33%  (p=0.001 n=9+10)
      GCD10000x100000/WithoutXY-8          513µs ± 0%      511µs ± 0%   -0.38%  (p=0.000 n=10+10)
      GCD10000x100000/WithXY-8            15.1ms ± 0%     15.0ms ± 0%     ~     (p=0.053 n=10+9)
      GCD100000x100000/WithoutXY-8        10.4ms ± 1%     10.4ms ± 2%     ~     (p=0.258 n=9+9)
      GCD100000x100000/WithXY-8            205ms ± 1%      205ms ± 1%     ~     (p=0.481 n=10+10)
      Hilbert-8                           1.25ms ±15%     1.24ms ±17%     ~     (p=0.853 n=10+10)
      Binomial-8                          3.03µs ±24%     2.90µs ±16%     ~     (p=0.481 n=10+10)
      QuoRem-8                            1.95µs ± 1%     1.95µs ± 2%     ~     (p=0.117 n=9+10)
      Exp-8                               5.12ms ± 2%     3.99ms ± 1%  -22.02%  (p=0.000 n=10+9)
      Exp2-8                              5.14ms ± 2%     3.98ms ± 0%  -22.55%  (p=0.000 n=10+9)
      Bitset-8                            16.4ns ± 2%     16.5ns ± 2%     ~     (p=0.311 n=9+10)
      BitsetNeg-8                         46.3ns ± 4%     45.8ns ± 4%     ~     (p=0.272 n=10+10)
      BitsetOrig-8                         250ns ±19%      247ns ±14%     ~     (p=0.671 n=10+10)
      BitsetNegOrig-8                      416ns ±14%      429ns ±14%     ~     (p=0.353 n=10+10)
      ModSqrt225_Tonelli-8                 400µs ± 0%      320µs ± 0%  -19.88%  (p=0.000 n=9+7)
      ModSqrt224_3Mod4-8                   123µs ± 1%       97µs ± 0%  -21.21%  (p=0.000 n=9+10)
      ModSqrt5430_Tonelli-8                1.87s ± 0%      1.39s ± 1%  -25.70%  (p=0.000 n=9+10)
      ModSqrt5430_3Mod4-8                  630ms ± 2%      465ms ± 1%  -26.12%  (p=0.000 n=10+10)
      Sqrt-8                              25.8µs ± 1%     25.9µs ± 0%   +0.66%  (p=0.002 n=10+8)
      IntSqr/1-8                          11.3ns ± 1%     11.3ns ± 2%     ~     (p=0.360 n=9+10)
      IntSqr/2-8                          26.6ns ± 1%     27.4ns ± 2%   +2.87%  (p=0.000 n=8+9)
      IntSqr/3-8                          36.5ns ± 6%     36.6ns ± 5%     ~     (p=0.589 n=10+10)
      IntSqr/5-8                          57.2ns ± 2%     57.8ns ± 1%   +0.92%  (p=0.045 n=10+9)
      IntSqr/8-8                           112ns ± 1%       93ns ± 1%  -16.60%  (p=0.000 n=10+10)
      IntSqr/10-8                          148ns ± 1%      129ns ± 5%  -12.85%  (p=0.000 n=10+10)
      IntSqr/20-8                          642ns ±28%      692ns ±21%     ~     (p=0.105 n=10+10)
      IntSqr/30-8                         1.03µs ±18%     1.06µs ±15%     ~     (p=0.422 n=10+8)
      IntSqr/50-8                         2.33µs ±14%     2.14µs ±20%     ~     (p=0.063 n=10+10)
      IntSqr/80-8                         4.06µs ±13%     3.72µs ±14%   -8.31%  (p=0.029 n=10+10)
      IntSqr/100-8                        5.79µs ±10%     5.20µs ±18%  -10.15%  (p=0.004 n=10+10)
      IntSqr/200-8                        17.1µs ± 1%     12.9µs ± 3%  -24.44%  (p=0.000 n=10+10)
      IntSqr/300-8                        35.9µs ± 0%     26.6µs ± 1%  -25.75%  (p=0.000 n=10+10)
      IntSqr/500-8                        84.9µs ± 0%     71.7µs ± 1%  -15.49%  (p=0.000 n=10+10)
      IntSqr/800-8                         170µs ± 1%      142µs ± 2%  -16.73%  (p=0.000 n=10+10)
      IntSqr/1000-8                        258µs ± 1%      218µs ± 1%  -15.65%  (p=0.000 n=10+10)
      Mul-8                               10.4ms ± 1%      8.3ms ± 0%  -20.05%  (p=0.000 n=10+9)
      Exp3Power/0x10-8                     311ns ±15%      321ns ±24%     ~     (p=0.447 n=10+10)
      Exp3Power/0x40-8                     358ns ±21%      346ns ±37%     ~     (p=0.591 n=10+10)
      Exp3Power/0x100-8                    611ns ±19%      570ns ±27%     ~     (p=0.393 n=10+10)
      Exp3Power/0x400-8                   1.31µs ±26%     1.34µs ±19%     ~     (p=0.853 n=10+10)
      Exp3Power/0x1000-8                  6.76µs ±23%     6.22µs ±16%     ~     (p=0.095 n=10+9)
      Exp3Power/0x4000-8                  37.6µs ±14%     36.4µs ±21%     ~     (p=0.247 n=10+10)
      Exp3Power/0x10000-8                  345µs ±14%      310µs ±11%   -9.99%  (p=0.005 n=10+10)
      Exp3Power/0x40000-8                 2.77ms ± 1%     2.34ms ± 1%  -15.47%  (p=0.000 n=10+10)
      Exp3Power/0x100000-8                25.1ms ± 1%     21.3ms ± 1%  -15.26%  (p=0.000 n=10+10)
      Exp3Power/0x400000-8                 225ms ± 1%      190ms ± 1%  -15.61%  (p=0.000 n=10+10)
      Fibo-8                              23.4ms ± 1%     23.3ms ± 0%     ~     (p=0.052 n=10+10)
      NatSqr/1-8                          58.4ns ±24%     59.8ns ±38%     ~     (p=0.739 n=10+10)
      NatSqr/2-8                           122ns ±21%      122ns ±16%     ~     (p=0.896 n=10+10)
      NatSqr/3-8                           140ns ±28%      148ns ±30%     ~     (p=0.288 n=10+10)
      NatSqr/5-8                           193ns ±29%      210ns ±34%     ~     (p=0.469 n=10+10)
      NatSqr/8-8                           317ns ±21%      296ns ±25%     ~     (p=0.393 n=10+10)
      NatSqr/10-8                          362ns ± 8%      373ns ±30%     ~     (p=0.617 n=9+10)
      NatSqr/20-8                         1.24µs ±16%     1.06µs ±29%  -14.57%  (p=0.019 n=10+10)
      NatSqr/30-8                         1.90µs ±32%     1.71µs ±10%     ~     (p=0.176 n=10+9)
      NatSqr/50-8                         4.22µs ±19%     3.67µs ± 7%  -13.03%  (p=0.017 n=10+9)
      NatSqr/80-8                         7.33µs ±20%     6.50µs ±15%  -11.26%  (p=0.009 n=10+10)
      NatSqr/100-8                        9.84µs ±18%     9.33µs ± 8%     ~     (p=0.280 n=10+10)
      NatSqr/200-8                        21.4µs ± 7%     20.0µs ±14%     ~     (p=0.075 n=10+10)
      NatSqr/300-8                        38.0µs ± 2%     31.3µs ±10%  -17.63%  (p=0.000 n=10+10)
      NatSqr/500-8                         102µs ± 5%      101µs ± 4%     ~     (p=0.780 n=9+10)
      NatSqr/800-8                         190µs ± 3%      166µs ± 6%  -12.29%  (p=0.000 n=10+10)
      NatSqr/1000-8                        277µs ± 2%      245µs ± 6%  -11.64%  (p=0.000 n=10+10)
      ScanPi-8                             144µs ±23%      149µs ±24%     ~     (p=0.579 n=10+10)
      StringPiParallel-8                  25.6µs ± 0%     25.8µs ± 0%   +0.69%  (p=0.000 n=9+10)
      Scan/10/Base2-8                      305ns ± 1%      309ns ± 1%   +1.32%  (p=0.000 n=10+9)
      Scan/100/Base2-8                    1.95µs ± 1%     1.98µs ± 1%   +1.10%  (p=0.000 n=10+10)
      Scan/1000/Base2-8                   19.5µs ± 1%     19.7µs ± 1%   +1.39%  (p=0.000 n=10+10)
      Scan/10000/Base2-8                   270µs ± 1%      272µs ± 1%   +0.58%  (p=0.024 n=9+9)
      Scan/100000/Base2-8                 10.3ms ± 0%     10.3ms ± 0%   +0.16%  (p=0.022 n=9+10)
      Scan/10/Base8-8                      146ns ± 4%      154ns ± 4%   +5.57%  (p=0.000 n=9+9)
      Scan/100/Base8-8                     748ns ± 1%      759ns ± 1%   +1.51%  (p=0.000 n=9+10)
      Scan/1000/Base8-8                   7.88µs ± 1%     8.00µs ± 1%   +1.64%  (p=0.000 n=10+10)
      Scan/10000/Base8-8                   155µs ± 1%      155µs ± 1%     ~     (p=0.968 n=10+9)
      Scan/100000/Base8-8                 9.11ms ± 0%     9.11ms ± 0%     ~     (p=0.604 n=9+10)
      Scan/10/Base10-8                     140ns ± 5%      149ns ± 5%   +6.39%  (p=0.000 n=9+10)
      Scan/100/Base10-8                    680ns ± 0%      688ns ± 1%   +1.08%  (p=0.000 n=9+10)
      Scan/1000/Base10-8                  7.09µs ± 1%     7.16µs ± 1%   +0.98%  (p=0.019 n=10+10)
      Scan/10000/Base10-8                  149µs ± 3%      150µs ± 3%     ~     (p=0.143 n=10+10)
      Scan/100000/Base10-8                9.16ms ± 0%     9.16ms ± 0%     ~     (p=0.661 n=10+9)
      Scan/10/Base16-8                     134ns ± 5%      135ns ± 3%     ~     (p=0.505 n=9+9)
      Scan/100/Base16-8                    560ns ± 1%      563ns ± 0%   +0.67%  (p=0.000 n=10+8)
      Scan/1000/Base16-8                  6.28µs ± 1%     6.26µs ± 1%     ~     (p=0.448 n=10+10)
      Scan/10000/Base16-8                  161µs ± 1%      162µs ± 1%   +0.74%  (p=0.008 n=9+9)
      Scan/100000/Base16-8                9.64ms ± 0%     9.64ms ± 0%     ~     (p=0.436 n=10+10)
      String/10/Base2-8                    116ns ±12%      118ns ±13%     ~     (p=0.645 n=10+10)
      String/100/Base2-8                   871ns ±23%      860ns ±22%     ~     (p=0.699 n=10+10)
      String/1000/Base2-8                 10.0µs ±20%     10.0µs ±23%     ~     (p=0.853 n=10+10)
      String/10000/Base2-8                 110µs ±21%      120µs ±25%     ~     (p=0.436 n=10+10)
      String/100000/Base2-8                768µs ±11%      733µs ±16%     ~     (p=0.393 n=10+10)
      String/10/Base8-8                   51.3ns ± 1%     51.0ns ± 3%     ~     (p=0.286 n=9+9)
      String/100/Base8-8                   284ns ± 9%      272ns ±12%     ~     (p=0.267 n=9+10)
      String/1000/Base8-8                 3.06µs ± 9%     3.04µs ±10%     ~     (p=0.739 n=10+10)
      String/10000/Base8-8                36.1µs ±14%     35.1µs ± 9%     ~     (p=0.447 n=10+9)
      String/100000/Base8-8                371µs ±12%      373µs ±16%     ~     (p=0.739 n=10+10)
      String/10/Base10-8                   167ns ±11%      165ns ± 9%     ~     (p=0.781 n=10+10)
      String/100/Base10-8                  727ns ± 1%      740ns ± 2%   +1.70%  (p=0.001 n=10+10)
      String/1000/Base10-8                5.30µs ±18%     5.37µs ±14%     ~     (p=0.631 n=10+10)
      String/10000/Base10-8               45.0µs ±14%     44.6µs ±10%     ~     (p=0.720 n=9+10)
      String/100000/Base10-8              5.10ms ± 1%     5.05ms ± 3%     ~     (p=0.211 n=9+10)
      String/10/Base16-8                  47.7ns ± 6%     47.7ns ± 6%     ~     (p=0.985 n=10+10)
      String/100/Base16-8                  221ns ±10%      234ns ±27%     ~     (p=0.541 n=10+10)
      String/1000/Base16-8                2.23µs ±11%     2.12µs ± 8%   -4.81%  (p=0.029 n=9+8)
      String/10000/Base16-8               28.3µs ±21%     28.5µs ±14%     ~     (p=0.796 n=10+10)
      String/100000/Base16-8               291µs ±16%      293µs ±15%     ~     (p=0.931 n=9+9)
      LeafSize/0-8                        2.43ms ± 1%     2.49ms ± 1%   +2.56%  (p=0.000 n=10+10)
      LeafSize/1-8                        49.7µs ± 9%     46.3µs ±16%   -6.78%  (p=0.017 n=10+9)
      LeafSize/2-8                        48.4µs ±18%     46.3µs ±19%     ~     (p=0.436 n=10+10)
      LeafSize/3-8                        81.7µs ± 3%     80.9µs ± 3%     ~     (p=0.278 n=10+9)
      LeafSize/4-8                        47.0µs ± 7%     47.9µs ±13%     ~     (p=0.905 n=9+10)
      LeafSize/5-8                        96.8µs ± 1%     97.3µs ± 2%     ~     (p=0.515 n=8+10)
      LeafSize/6-8                        82.5µs ± 4%     80.9µs ± 2%   -1.92%  (p=0.019 n=10+10)
      LeafSize/7-8                        67.2µs ±13%     66.6µs ± 9%     ~     (p=0.842 n=10+9)
      LeafSize/8-8                        46.0µs ±28%     45.1µs ±12%     ~     (p=0.739 n=10+10)
      LeafSize/9-8                         111µs ± 1%      111µs ± 1%     ~     (p=0.739 n=10+10)
      LeafSize/10-8                       98.8µs ± 4%     97.9µs ± 3%     ~     (p=0.278 n=10+9)
      LeafSize/11-8                       96.8µs ± 1%     96.4µs ± 1%     ~     (p=0.211 n=9+10)
      LeafSize/12-8                       81.0µs ± 4%     81.3µs ± 3%     ~     (p=0.579 n=10+10)
      LeafSize/13-8                       79.7µs ± 5%     79.2µs ± 3%     ~     (p=0.661 n=10+9)
      LeafSize/14-8                       67.6µs ±12%     65.8µs ± 7%     ~     (p=0.447 n=10+9)
      LeafSize/15-8                       63.9µs ±17%     66.3µs ±14%     ~     (p=0.481 n=10+10)
      LeafSize/16-8                       44.0µs ±28%     46.0µs ±27%     ~     (p=0.481 n=10+10)
      LeafSize/32-8                       46.2µs ±13%     43.5µs ±18%     ~     (p=0.156 n=9+10)
      LeafSize/64-8                       53.3µs ±10%     53.0µs ±19%     ~     (p=0.730 n=9+9)
      ProbablyPrime/n=0-8                 3.60ms ± 1%     3.39ms ± 1%   -5.87%  (p=0.000 n=10+9)
      ProbablyPrime/n=1-8                 4.42ms ± 1%     4.08ms ± 1%   -7.69%  (p=0.000 n=10+10)
      ProbablyPrime/n=5-8                 7.57ms ± 2%     6.79ms ± 1%  -10.24%  (p=0.000 n=10+10)
      ProbablyPrime/n=10-8                11.6ms ± 2%     10.2ms ± 1%  -11.69%  (p=0.000 n=10+10)
      ProbablyPrime/n=20-8                19.4ms ± 2%     16.9ms ± 2%  -12.89%  (p=0.000 n=10+10)
      ProbablyPrime/Lucas-8               2.81ms ± 2%     2.72ms ± 1%   -3.22%  (p=0.000 n=10+9)
      ProbablyPrime/MillerRabinBase2-8     797µs ± 1%      680µs ± 1%  -14.64%  (p=0.000 n=10+10)
      
      name                              old speed      new speed       delta
      AddVV/1-8                         17.1GB/s ± 6%   18.0GB/s ± 2%     ~     (p=0.122 n=10+8)
      AddVV/2-8                         32.4GB/s ± 2%   32.2GB/s ± 4%     ~     (p=0.661 n=10+9)
      AddVV/3-8                         38.6GB/s ± 2%   38.9GB/s ± 1%     ~     (p=0.113 n=10+9)
      AddVV/4-8                         45.8GB/s ± 2%   45.8GB/s ± 2%     ~     (p=0.796 n=10+10)
      AddVV/5-8                         48.1GB/s ± 2%   48.3GB/s ± 1%     ~     (p=0.315 n=10+10)
      AddVV/10-8                        78.9GB/s ± 1%   78.9GB/s ± 2%     ~     (p=0.353 n=10+10)
      AddVV/100-8                        136GB/s ± 2%    137GB/s ± 1%     ~     (p=0.971 n=10+10)
      AddVV/1000-8                       164GB/s ± 1%    164GB/s ± 4%     ~     (p=0.853 n=10+10)
      AddVV/10000-8                      126GB/s ± 6%    129GB/s ± 2%     ~     (p=0.063 n=10+10)
      AddVV/100000-8                     116GB/s ± 3%    116GB/s ± 3%     ~     (p=0.796 n=10+10)
      AddVW/1-8                         2.64GB/s ± 3%   2.64GB/s ± 3%     ~     (p=0.579 n=10+10)
      AddVW/2-8                         4.49GB/s ± 2%   4.44GB/s ± 2%   -1.09%  (p=0.040 n=9+9)
      AddVW/3-8                         6.36GB/s ± 1%   6.34GB/s ± 2%     ~     (p=0.684 n=10+10)
      AddVW/4-8                         6.83GB/s ± 1%   6.82GB/s ± 2%     ~     (p=0.905 n=10+9)
      AddVW/5-8                         8.75GB/s ± 1%   8.73GB/s ± 1%     ~     (p=0.796 n=10+10)
      AddVW/10-8                        10.5GB/s ± 2%   10.5GB/s ± 1%     ~     (p=0.971 n=10+10)
      AddVW/100-8                       19.5GB/s ± 2%   18.9GB/s ± 2%   -3.22%  (p=0.000 n=10+10)
      AddVW/1000-8                      20.7GB/s ± 2%   20.6GB/s ± 4%     ~     (p=0.631 n=10+10)
      AddVW/10000-8                     20.6GB/s ± 3%   20.7GB/s ± 3%     ~     (p=0.481 n=10+10)
      AddVW/100000-8                    19.4GB/s ± 2%   19.2GB/s ± 3%     ~     (p=0.165 n=10+10)
      AddMulVVW/1-8                     19.5GB/s ± 2%   19.7GB/s ± 3%     ~     (p=0.123 n=10+10)
      AddMulVVW/2-8                     30.1GB/s ± 2%   30.2GB/s ± 3%     ~     (p=0.297 n=9+9)
      AddMulVVW/3-8                     37.9GB/s ± 2%   36.5GB/s ± 2%   -3.63%  (p=0.000 n=10+10)
      AddMulVVW/4-8                     40.0GB/s ± 2%   39.4GB/s ± 2%   -1.58%  (p=0.001 n=10+10)
      AddMulVVW/5-8                     47.3GB/s ± 2%   46.6GB/s ± 1%   -1.35%  (p=0.001 n=9+9)
      AddMulVVW/10-8                    52.3GB/s ± 2%   60.6GB/s ± 3%  +15.76%  (p=0.000 n=10+10)
      AddMulVVW/100-8                   80.3GB/s ± 2%  122.1GB/s ± 1%  +51.92%  (p=0.000 n=10+10)
      AddMulVVW/1000-8                  92.0GB/s ± 1%  130.3GB/s ± 2%  +41.61%  (p=0.000 n=9+10)
      AddMulVVW/10000-8                 88.2GB/s ± 2%  108.2GB/s ± 5%  +22.66%  (p=0.000 n=10+10)
      AddMulVVW/100000-8                88.2GB/s ± 2%  102.9GB/s ± 2%  +16.69%  (p=0.000 n=10+10)
      
      Change-Id: Ic98e30c91d437d845fed03e07e976c3fdbf02b36
      Reviewed-on: https://go-review.googlesource.com/74851
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Adam Langley <agl@golang.org>
      c3935c08
  4. 23 Feb, 2018 17 commits
    • Joe Tsai's avatar
      archive/zip: fix handling of Info-ZIP Unix extended timestamps · 9697a119
      Joe Tsai authored
      The Info-ZIP Unix1 extra field is specified as such:
      >>>
      Value    Size   Description
      -----    ----   -----------
      0x5855   Short  tag for this extra block type ("UX")
      TSize    Short  total data size for this block
      AcTime   Long   time of last access (GMT/UTC)
      ModTime  Long   time of last modification (GMT/UTC)
      <<<
      
      The previous handling was incorrect in that it read the AcTime field
      instead of the ModTime field.
      
      The test-osx.zip test unfortunately locked in the wrong behavior.
      Manually parsing that ZIP file shows that the encoded MS-DOS
      date and time are 0x4b5f and 0xa97d, which correspond to a
      date of 2017-10-31 21:11:58. This matches the correct mod time
      (off by 1 second due to MS-DOS timestamp resolution).
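The manual decoding of the two MS-DOS words can be reproduced with a short sketch. The function name dosDateTime is hypothetical; the bit layout used is the standard MS-DOS packing (7 bits of year since 1980, 4 of month, 5 of day; 5 bits of hour, 6 of minute, 5 of two-second units).

```go
package main

import "fmt"

// dosDateTime (hypothetical helper) unpacks MS-DOS date and time words.
func dosDateTime(date, tm uint16) (year, month, day, hour, minute, sec int) {
	year = int(date>>9) + 1980
	month = int(date >> 5 & 0xf)
	day = int(date & 0x1f)
	hour = int(tm >> 11)
	minute = int(tm >> 5 & 0x3f)
	sec = int(tm&0x1f) * 2 // stored in 2-second units
	return
}

func main() {
	y, mo, d, h, mi, s := dosDateTime(0x4b5f, 0xa97d)
	fmt.Printf("%04d-%02d-%02d %02d:%02d:%02d\n", y, mo, d, h, mi, s)
	// prints "2017-10-31 21:11:58"
}
```

The two-second granularity of the seconds field is the reason for the 1 second difference from the Unix timestamp.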
      
      Fixes #23901
      
      Change-Id: I567824c66e8316b9acd103dbecde366874a4b7ef
      Reviewed-on: https://go-review.googlesource.com/96895
      Run-TryBot: Joe Tsai <joetsai@google.com>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      9697a119
    • Ian Lance Taylor's avatar
      runtime: don't check for String/Error methods in printany · 804e3e56
      Ian Lance Taylor authored
      They have either already been called by preprintpanics, or they
      cannot be called safely because of the various conditions checked at
      the start of gopanic.
      
      Fixes #24059
      
      Change-Id: I4a6233d12c9f7aaaee72f343257ea108bae79241
      Reviewed-on: https://go-review.googlesource.com/96755
      Reviewed-by: Austin Clements <austin@google.com>
      804e3e56
    • Yuval Pavel Zholkover's avatar
      os: respect umask in Mkdir and OpenFile on BSD systems when perm has ModeSticky set · a5e8e2d9
      Yuval Pavel Zholkover authored
      Instead of calling Chmod directly on perm, stat the created file/dir to extract the
      actual permission bits which can be different from perm due to umask.
      
      Fixes #23120.
      
      Change-Id: I3e70032451fc254bf48ce9627e98988f84af8d91
      Reviewed-on: https://go-review.googlesource.com/84477
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      a5e8e2d9
    • Austin Clements's avatar
      runtime: reduce arena size to 4MB on 64-bit Windows · 78846472
      Austin Clements authored
      Currently, we use 64MB heap arenas on 64-bit platforms. This works
      well on UNIX-like OSes because they treat untouched pages as
      essentially free. However, on Windows, committed memory is charged
      against a process whether or not it has demand-faulted physical pages
      in. Hence, on Windows, even a process with a tiny heap will commit
      64MB for one heap arena, plus another 32MB for the arena map. Things
      are much worse under the race detector, which increases the heap
      commitment by a factor of 5.5X, leading to 384MB of committed memory
      at runtime init.
      
      Fix this by reducing the heap arena size to 4MB on Windows.
      
      To counterbalance the effect of increasing the arena map size by a
      factor of 16, and to further reduce the impact of the commitment for
      the arena map, we switch from a single entry L1 arena map to a 64
      entry L1 arena map.
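The size arithmetic behind the factor of 16 can be checked with a rough sketch. The constants are assumptions rather than values quoted from the runtime: a 48-bit address space and 8-byte map entries, which together reproduce the 32MB map quoted above for 64MB arenas.

```go
package main

import "fmt"

const (
	addrBits  = 48 // assumed usable address bits
	entrySize = 8  // assumed bytes per arena map entry (one pointer)
	mb        = 1 << 20
)

// flatMapBytes returns the size of a single-level arena map that covers
// the whole address space for a given arena size.
func flatMapBytes(arenaBytes uint64) uint64 {
	return (1 << addrBits) / arenaBytes * entrySize
}

func main() {
	flat64 := flatMapBytes(64 * mb) // 64MB arenas
	flat4 := flatMapBytes(4 * mb)   // 4MB arenas: 16x as many entries
	perL2 := flat4 / 64             // one L2 block under a 64-entry L1
	fmt.Println(flat64/mb, flat4/mb, perL2/mb)
	// prints "32 512 8"
}
```

Under these assumptions a flat map for 4MB arenas would be 512MB, while each of the 64 second-level blocks covers 8MB.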
      
      Compared to the original arena design, this slows down the
      x/benchmarks garbage benchmark by 0.49% (the slowdown of this commit
      alone is 1.59%, but the previous commit bought us a 1% speed-up):
      
      name                       old time/op  new time/op  delta
      Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.29ms ± 1%  +0.49%  (p=0.000 n=17+18)
      
      (https://perf.golang.org/search?q=upload:20180223.1)
      
      (This was measured on linux/amd64 by modifying its arena configuration
      as above.)
      
      Fixes #23900.
      
      Change-Id: I6b7fa5ecebee2947bf20cfeb78c248809469c6b1
      Reviewed-on: https://go-review.googlesource.com/96780
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      78846472
    • Austin Clements's avatar
      runtime: support a two-level arena map · ec252105
      Austin Clements authored
      Currently, the heap arena map is a single, large array that covers
      every possible arena frame in the entire address space. This is
      practical up to about 48 bits of address space with 64 MB arenas.
      
      However, there are two problems with this:
      
      1. mips64, ppc64, and s390x support full 64-bit address spaces (though
         on Linux only s390x has kernel support for 64-bit address spaces).
         On these platforms, it would be good to support these larger
         address spaces.
      
      2. On Windows, processes are charged for untouched memory, so for
         processes with small heaps, the mostly-untouched 32 MB arena map
         plus a 64 MB arena are significant overhead. Hence, it would be
         good to reduce both the arena map size and the arena size, but with
         a single-level arena, these are inversely proportional.
      
      This CL adds support for a two-level arena map. Arena frame numbers
      are now divided into arenaL1Bits of L1 index and arenaL2Bits of L2
      index.
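The split of an arena frame number into L1 and L2 indexes can be sketched as follows. The names and constant values here are illustrative assumptions (4MB arenas, 6 bits of L1, 20 bits of L2), not the runtime's definitions, and the sketch ignores any platform-specific address offset.

```go
package main

import "fmt"

const (
	logArenaBytes = 22 // assumed: 4MB arenas
	l1Bits        = 6  // assumed: 64-entry L1
	l2Bits        = 20 // assumed: remaining bits of a 48-bit address
)

// arenaIdx is the arena frame number of an address.
type arenaIdx uint64

// arenaIndex returns the frame number of the arena containing addr.
func arenaIndex(addr uint64) arenaIdx {
	return arenaIdx(addr >> logArenaBytes)
}

// l1 returns the index into the first-level map (the high bits).
func (i arenaIdx) l1() uint64 {
	return uint64(i) >> l2Bits
}

// l2 returns the index into the second-level map (the low bits).
func (i arenaIdx) l2() uint64 {
	return uint64(i) & (1<<l2Bits - 1)
}

func main() {
	// Address in the upper half of a 48-bit space, five arenas in.
	addr := uint64(1)<<47 + 5<<logArenaBytes
	idx := arenaIndex(addr)
	fmt.Println(idx.l1(), idx.l2())
	// prints "32 5"
}
```

With zero L1 bits, l1 would always return 0 and the lookup reduces to the single-level case.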
      
      At the moment, arenaL1Bits is always 0, so we effectively have a
      single level map. We do a few things so that this has no cost beyond
      the current single-level map:
      
      1. We embed the L2 array directly in mheap, so if there's a single
         entry in the L2 array, the representation is identical to the
         current representation and there's no extra level of indirection.
      
      2. Hot code that accesses the arena map is structured so that it
         optimizes to nearly the same machine code as it does currently.
      
      3. We make some small tweaks to hot code paths and to the inliner
         itself to keep some important functions inlined despite their
         now-larger ASTs. In particular, this is necessary for
         heapBitsForAddr and heapBits.next.
      
      Possibly as a result of some of the tweaks, this actually slightly
      improves the performance of the x/benchmarks garbage benchmark:
      
      name                       old time/op  new time/op  delta
      Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.26ms ± 1%  -1.07%  (p=0.000 n=17+19)
      
      (https://perf.golang.org/search?q=upload:20180223.2)
      
      For #23900.
      
      Change-Id: If5164e0961754f97eb9eca58f837f36d759505ff
      Reviewed-on: https://go-review.googlesource.com/96779
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      ec252105
    • Austin Clements's avatar
      cmd/compile: teach front-end deadcode about && and || · 2dbf15e8
      Austin Clements authored
      The front-end dead code elimination is very simple. Currently, it just
      looks for if statements with constant boolean conditions. Its main
      purpose is to reduce load on the compiler and shrink code before
      inlining computes hairiness.
      
      This CL teaches front-end dead code elimination about short-circuiting
      boolean expressions && and ||, since they're essentially the same as
      if statements.
      
      This also teaches the inliner that the constant 'if' form left behind
      by deadcode is free.
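The kind of source pattern this targets can be illustrated with a small program. The names debug, check, and process are hypothetical, and whether the compiler actually removes the branch is not observable from the program's output; the sketch only shows why a constant-false left operand of && behaves like a constant-false if.

```go
package main

import "fmt"

// debug is a compile-time constant, so conditions built from it with
// && or || have a statically known outcome.
const debug = false

// check has a visible side effect so that calls to it can be noticed.
func check(x int) bool {
	fmt.Println("checking", x)
	return x > 0
}

func process(x int) int {
	// Because debug is false, short-circuit evaluation never calls
	// check, and the body of this if can never run.
	if debug && check(x) {
		return -1
	}
	return x * 2
}

func main() {
	fmt.Println(process(3))
	// prints "6"
}
```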
      
      These changes will help with runtime modifications in the next CL that
      would otherwise inhibit inlining in some hot code paths. Currently,
      however, they have no significant impact on benchmarks.
      
      Change-Id: I886203b3c4acdbfef08148fddd7f3a7af5afc7c1
      Reviewed-on: https://go-review.googlesource.com/96778
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      2dbf15e8
    • Austin Clements's avatar
      runtime: rename "arena index" to "arena map" · 33b76920
      Austin Clements authored
      There are too many places where I want to talk about "indexing into
      the arena index". Make this less awkward and ambiguous by calling it
      the "arena map" instead.
      
      Change-Id: I726b0667bb2139dbc006175a0ec09a871cdf73f9
      Reviewed-on: https://go-review.googlesource.com/96777
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      33b76920
    • Austin Clements's avatar
      runtime: don't assume arena is in address order · 9680980e
      Austin Clements authored
      On amd64, the arena is no longer in address space order, but currently
      the heap dumper assumes that it is. Fix this assumption.
      
      Change-Id: Iab1953cd36b359d0fb78ed49e5eb813116a18855
      Reviewed-on: https://go-review.googlesource.com/96776
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      9680980e
    • Ian Lance Taylor's avatar
      path: use OS-specific function in MkdirAll, don't always keep trailing slash · b86e7668
      Ian Lance Taylor authored
      CL 86295 changed MkdirAll to always pass a trailing path separator to
      support extended-length paths on Windows.
      
      However, when Stat is called on an existing file followed by a
      trailing slash, it returns a "not a directory" error, skipping the
      fast path at the beginning of MkdirAll.
      
      This change fixes MkdirAll to only pass the trailing path separator
      where required on Windows, by using an OS-specific function fixRootDirectory.
      
      Updates #23918
      
      Change-Id: I23f84a20e65ccce556efa743d026d352b4812c34
      Reviewed-on: https://go-review.googlesource.com/95255
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David du Colombier <0intro@gmail.com>
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
      b86e7668
    • Daniel Martí's avatar
      cmd/vet: use type info to detect the atomic funcs · bae3fd66
      Daniel Martí authored
      Simply checking if a name is "atomic" isn't enough, as that might be a
      var or another imported package. Now that vet requires type information,
      we can do better. And add a simple regression test.
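A sketch of the kind of code that a name-only check could confuse. The type counter and both functions are hypothetical and are not taken from the regression test; the point is that a local variable named atomic makes the call look like sync/atomic.AddUint64 when only the identifier is compared.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counter is a local type whose method shares a name with
// sync/atomic.AddUint64 but is an ordinary, non-atomic function.
type counter struct{}

func (counter) AddUint64(p *uint64, delta uint64) uint64 {
	return *p + delta
}

// realAtomic uses the sync/atomic package. Writing
// "n = atomic.AddUint64(&n, 1)" here would be the mistake that
// vet's atomic check is meant to report.
func realAtomic() uint64 {
	var n uint64
	atomic.AddUint64(&n, 1)
	return n
}

// shadowed declares a variable named atomic, hiding the package.
// The assignment below is a plain method call and is correct.
func shadowed() uint64 {
	atomic := counter{}
	var m uint64
	m = atomic.AddUint64(&m, 1)
	return m
}

func main() {
	fmt.Println(realAtomic(), shadowed())
	// prints "1 1"
}
```

Resolving the identifier through type information distinguishes the package from the local variable, which a string comparison on the name cannot do.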
      
      Change-Id: Ibd2004428374e3628cd3cd0ffb5f37cedaf448ea
      Reviewed-on: https://go-review.googlesource.com/91795
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      Reviewed-by: Robert Griesemer <gri@golang.org>
      bae3fd66
    • Adam Langley's avatar
      crypto/x509: tighten EKU checking for requested EKUs. · 0681c7c3
      Adam Langley authored
      There are, sadly, many exceptions to EKU checking to reflect mistakes
      that CAs have made in practice. However, the requirements for checking
      requested EKUs against the leaf should be tighter than for checking leaf
      EKUs against a CA.
      
      Fixes #23884
      
      Change-Id: I05ea874c4ada0696d8bb18cac4377c0b398fcb5e
      Reviewed-on: https://go-review.googlesource.com/96379
      Reviewed-by: Jonathan Rudenberg <jonathan@titanous.com>
      Reviewed-by: Filippo Valsorda <hi@filippo.io>
      Run-TryBot: Filippo Valsorda <hi@filippo.io>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      0681c7c3
    • Oleg Bulatov's avatar
      regexp: Regexp shouldn't keep references to inputs · 72635401
      Oleg Bulatov authored
      If you try to find something in a slice of bytes using a Regexp object,
      the byte array will not be released by the GC until you use the Regexp
      object on another slice of bytes. This happens because the Regexp object
      keeps references to the input data in its cache.
      
      Change-Id: I873107f15c1900aa53ccae5d29dbc885b9562808
      Reviewed-on: https://go-review.googlesource.com/96715
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      72635401
    • Alberto Donizetti's avatar
      cmd/compile: add code generation tests for sqrt intrinsics · 37a038a3
      Alberto Donizetti authored
      Add "sqrt-intrinsified" code generation tests for mips64 and 386, where
      we weren't intrinsifying math.Sqrt (see CL 96615 and CL 95916), and for
      mips and amd64, which lacked sqrt intrinsics tests.
      
      Change-Id: I0cfc08aec6eefd47f3cd7a5995a89393e8b7ed9e
      Reviewed-on: https://go-review.googlesource.com/96716
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      37a038a3
    • mingrammer's avatar
      runtime: rename the TestGcHashmapIndirection to TestGcMapIndirection · fceaa2e2
      mingrammer authored
      There was still the word 'Hashmap' in gc_test.go, so I renamed it to just 'Map'
      
      Previous renaming commit: https://golang.org/cl/90336
      
      Change-Id: I5b0e5c2229d1c30937c7216247f4533effb81ce7
      Reviewed-on: https://go-review.googlesource.com/96675
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      fceaa2e2
    • Alberto Donizetti's avatar
      cmd/compile: intrinsify math.Sqrt on 386 · 9ee78af8
      Alberto Donizetti authored
      It seems like all the pieces were already there, it only needed the
      final plumbing.
      
      Before:
      
      	0x001b 00027 (test.go:9)	MOVSD	X0, (SP)
      	0x0020 00032 (test.go:9)	CALL	math.Sqrt(SB)
      	0x0025 00037 (test.go:9)	MOVSD	8(SP), X0
      
      After:
      
      	0x0018 00024 (test.go:9)	SQRTSD	X0, X0
      
      name    old time/op  new time/op  delta
      Sqrt-4  4.60ns ± 2%  0.45ns ± 1%  -90.33%  (p=0.000 n=10+10)
      
      Change-Id: I0f623958e19e726840140bf9b495d3f3a9184b9d
      Reviewed-on: https://go-review.googlesource.com/96615
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      9ee78af8
    • Alberto Donizetti's avatar
      cmd/compile: use | in the last repetitive generic rules · f6c67813
      Alberto Donizetti authored
      This change or-ifies the last low-hanging rules in generic. Again,
      this is limited to short and repetitive rules, where the use of ors
      does not impact readability.
      
      Ran rulegen, no change in the actual compiler code.
      
      Change-Id: I972b523bc08532f173a3645b47d6936b6e1218c8
      Reviewed-on: https://go-review.googlesource.com/96335
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
      f6c67813
    • Jerrin Shaji George's avatar
      runtime: fix a few typos in comments · 5b3cd560
      Jerrin Shaji George authored
      Change-Id: I07a1eb02ffc621c5696b49491181300bf411f822
      Reviewed-on: https://go-review.googlesource.com/96475
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      5b3cd560
  5. 22 Feb, 2018 7 commits