1. 26 Feb, 2018 8 commits
  2. 25 Feb, 2018 1 commit
  3. 24 Feb, 2018 3 commits
    • os/user: obtain a user home path on Windows · 7a218942
      Lubomir I. Ivanov (VMware) authored
      newUserFromSid() is extended so that it can retrieve the user home
      path from a user SID.
      
      (1) The primary method is to look up the following key in the
      Windows registry:
        HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList\[SID]
      
      If the key does not exist, the user might not have logged in yet.
      If (1) fails, the function falls back to (2).
      
      (2) The second method is to take the default home path for users
      (e.g. WINAPI's GetProfilesDirectory()) and append the username to
      it. The procedure is along the lines of:
        c:\Users + \ + <username>
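The two-step fallback described above can be sketched as follows. This is a minimal illustration, not the os/user implementation: `lookupProfileDir` and `fakeRegistry` are hypothetical stand-ins for the registry query on the ProfileList key, and `profilesDir` stands in for whatever GetProfilesDirectory() would return.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoProfile is a hypothetical error for a SID with no ProfileList entry.
var errNoProfile = errors.New("no profile entry for SID")

// homeDir tries the registry-style lookup first (method 1) and, if that
// fails, falls back to profilesDir + `\` + username (method 2).
func homeDir(sid, username, profilesDir string, lookupProfileDir func(sid string) (string, error)) string {
	if dir, err := lookupProfileDir(sid); err == nil && dir != "" {
		return dir
	}
	return profilesDir + `\` + username
}

// fakeRegistry simulates the registry lookup with an in-memory map.
func fakeRegistry(sid string) (string, error) {
	known := map[string]string{"S-1-5-21-1000": `D:\Profiles\alice`}
	if dir, ok := known[sid]; ok {
		return dir, nil
	}
	return "", errNoProfile
}

func main() {
	// The SID is known, so the registry-style lookup wins.
	fmt.Println(homeDir("S-1-5-21-1000", "alice", `C:\Users`, fakeRegistry))
	// The SID is unknown (user never logged in), so the fallback path is built.
	fmt.Println(homeDir("S-1-5-21-2000", "bob", `C:\Users`, fakeRegistry))
}
```

Passing the lookup as a function keeps the sketch runnable on any platform; real code would query the Windows registry instead.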
      
      The function newUser() now requires the following arguments:
        uid, gid, dir, username, domain
      This is done to avoid multiple calls to usid.String() and
      usid.LookupAccount("") in the case of a newUserFromSid()
      call stack.
      
      The functions current() and newUserFromSid() both call newUser()
      supplying the arguments in question. The helpers
      lookupUsernameAndDomain() and findHomeDirInRegistry() are
      added.
      
      This commit also updates:
      - go/build/deps_test.go, so that the test now includes the
      "internal/syscall/windows/registry" import.
      - os/user/user_test.go, so that User.HomeDir is tested on Windows.
      
      GitHub-Last-Rev: 25423e2a3820121f4c42321e7a77a3977f409724
      GitHub-Pull-Request: golang/go#23822
      Change-Id: I6c3ad1c4ce3e7bc0d1add024951711f615b84ee5
      Reviewed-on: https://go-review.googlesource.com/93935
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
      Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile/internal/syntax: use stringer for operators and tokens · c8791538
      Daniel Martí authored
      With its new -linecomment flag, it is now possible to use stringer on
      values whose strings aren't valid identifiers. This is the case with
      tokens and operators in Go.
      
      Operator already had inline comments with each operator's string
      representation; only minor modifications were needed. Inline
      comments were added to each of the token names, using the same strategy.
      
      Comments that were previously inline or part of the string arrays were
      moved to the line immediately before the name they correspond to.
      
      Finally, declare tokStrFast as a function that uses the generated arrays
      directly. Avoiding the branch and strconv call means that we avoid a
      performance regression in the scanner, perhaps due to the lack of
      mid-stack inlining.
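To make the approach concrete, here is a small sketch of the pattern. The constant set is a made-up subset, and `tokenNames`, `tokenIndex`, and the `String` method are hand-written approximations of what `stringer -linecomment` would generate, not the actual generated file. `tokStrFast` is named in the commit message, but its body here is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

type token uint

// With -linecomment, stringer takes each constant's string from its
// trailing comment, so "=" and "(" are usable even though they are
// not valid identifiers.
//
//go:generate stringer -type=token -linecomment
const (
	_ token = iota

	_EOF    // EOF
	_Name   // name
	_Assign // =
	_Lparen // (
)

// Hand-written stand-ins for the generated tables (assumption).
const tokenNames = "EOFname=("

var tokenIndex = [...]uint8{0, 3, 7, 8, 9}

// String mirrors the generated method: a bounds check plus a strconv
// fallback for values outside the table.
func (t token) String() string {
	t -= 1
	if t >= token(len(tokenIndex)-1) {
		return "token(" + strconv.FormatInt(int64(t+1), 10) + ")"
	}
	return tokenNames[tokenIndex[t]:tokenIndex[t+1]]
}

// tokStrFast indexes the tables directly, skipping the branch and the
// strconv call. It must only be called with valid tokens.
func tokStrFast(t token) string {
	return tokenNames[tokenIndex[t-1]:tokenIndex[t]]
}

func main() {
	fmt.Println(_Assign, _Lparen, token(99))
	fmt.Println(tokStrFast(_Name))
}
```

The trade-off is that `tokStrFast` panics on an out-of-range value, whereas `String` degrades to a numeric form.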
      
      Performance is not affected. Measured with 'go test -run StdLib -fast'
      on an X1 Carbon Gen2 (i5-4300U @ 1.90GHz, 8GB RAM, SSD), the best of 5
      runs before and after the changes are:
      
      	parsed 1709399 lines (3763 files) in 1.707402159s (1001169 lines/s)
      	allocated 449.282Mb (263.137Mb/s)
      
      	parsed 1709329 lines (3765 files) in 1.706663154s (1001562 lines/s)
      	allocated 449.290Mb (263.256Mb/s)
      
      Change-Id: Idcc4f83393fcadd6579700e3602c09496ea2625b
      Reviewed-on: https://go-review.googlesource.com/95357
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • math/big: speed-up addMulVVW on amd64 · c3935c08
      Ilya Tocar authored
      Use MULX/ADOX/ADCX instructions to speed up addMulVVW
      when they are available. addMulVVW is a hotspot in rsa.
      This is faster than the ADD/ADC/IMUL version, because ADOX/ADCX only
      modify the carry/overflow flags, so they can be interleaved with each
      other and with MULX, which doesn't modify flags at all.
      Increasing the unroll factor to e.g. 16 makes rsa 1% faster, but
      3PrimeRSA2048Decrypt performance falls back to baseline.
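For readers unfamiliar with the routine, the following is a portable sketch of what addMulVVW computes: z += x*y over multi-word numbers, returning the final carry. It is written with math/bits on uint64 for illustration and is neither the math/big source nor the assembly this commit adds; the fixed 64-bit word type is an assumption.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVW computes z[i] += x[i]*y for every word, propagating the carry
// between words, and returns the carry out of the top word.
// It assumes len(x) >= len(z).
func addMulVVW(z, x []uint64, y uint64) (c uint64) {
	for i := range z {
		// 64x64 -> 128-bit product, the job MULX does in the assembly.
		hi, lo := bits.Mul64(x[i], y)

		// Add the existing word of z, then the incoming carry. Each add
		// can carry into hi; hi cannot overflow because
		// x*y + z + c < 2^128.
		var cc uint64
		lo, cc = bits.Add64(lo, z[i], 0)
		hi += cc
		lo, cc = bits.Add64(lo, c, 0)
		hi += cc

		z[i] = lo
		c = hi
	}
	return c
}

func main() {
	z := []uint64{1, 2}
	x := []uint64{^uint64(0), ^uint64(0)}
	c := addMulVVW(z, x, 2)
	fmt.Println(z, c)
}
```

Each iteration has two independent carry chains (the product's high word and the running sum), which is the dependency the ADCX/ADOX pair lets the hardware track in separate flags.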
      
      Updates #20058
      
      AddMulVVW/1-8                       3.28ns ± 2%     3.26ns ± 3%     ~     (p=0.107 n=10+10)
      AddMulVVW/2-8                       4.26ns ± 2%     4.24ns ± 3%     ~     (p=0.327 n=9+9)
      AddMulVVW/3-8                       5.07ns ± 2%     5.26ns ± 2%   +3.73%  (p=0.000 n=10+10)
      AddMulVVW/4-8                       6.40ns ± 2%     6.50ns ± 2%   +1.61%  (p=0.000 n=10+10)
      AddMulVVW/5-8                       6.77ns ± 2%     6.86ns ± 1%   +1.38%  (p=0.001 n=9+9)
      AddMulVVW/10-8                      12.2ns ± 2%     10.6ns ± 3%  -13.65%  (p=0.000 n=10+10)
      AddMulVVW/100-8                     79.7ns ± 2%     52.4ns ± 1%  -34.17%  (p=0.000 n=10+10)
      AddMulVVW/1000-8                     695ns ± 1%      491ns ± 2%  -29.39%  (p=0.000 n=9+10)
      AddMulVVW/10000-8                   7.26µs ± 2%     5.92µs ± 6%  -18.42%  (p=0.000 n=10+10)
      AddMulVVW/100000-8                  72.6µs ± 2%     62.2µs ± 2%  -14.31%  (p=0.000 n=10+10)
      
      The crypto/rsa speed-up is smaller, but still noticeable:
      
      RSA2048Decrypt-8        1.61ms ± 1%  1.38ms ± 1%  -14.13%  (p=0.000 n=10+10)
      RSA2048Sign-8           1.93ms ± 1%  1.70ms ± 1%  -11.86%  (p=0.000 n=10+10)
      3PrimeRSA2048Decrypt-8   932µs ± 0%   828µs ± 0%  -11.15%  (p=0.000 n=10+10)
      
      Results on crypto/tls:
      
      HandshakeServer/RSA-8                        901µs ± 1%    777µs ± 0%  -13.70%  (p=0.000 n=10+8)
      HandshakeServer/ECDHE-P256-RSA-8            1.01ms ± 1%   0.90ms ± 0%  -11.53%  (p=0.000 n=10+9)
      
      Full math/big benchmarks:
      
      name                              old time/op    new time/op     delta
      AddVV/1-8                           3.74ns ± 6%     3.55ns ± 2%     ~     (p=0.082 n=10+8)
      AddVV/2-8                           3.96ns ± 2%     3.98ns ± 5%     ~     (p=0.794 n=10+9)
      AddVV/3-8                           4.97ns ± 2%     4.94ns ± 1%     ~     (p=0.081 n=10+9)
      AddVV/4-8                           5.59ns ± 2%     5.59ns ± 2%     ~     (p=0.809 n=10+10)
      AddVV/5-8                           6.63ns ± 1%     6.62ns ± 1%     ~     (p=0.560 n=9+10)
      AddVV/10-8                          8.11ns ± 1%     8.11ns ± 2%     ~     (p=0.402 n=10+10)
      AddVV/100-8                         46.9ns ± 2%     46.8ns ± 1%     ~     (p=0.809 n=10+10)
      AddVV/1000-8                         389ns ± 1%      391ns ± 4%     ~     (p=0.809 n=10+10)
      AddVV/10000-8                       5.05µs ± 5%     4.98µs ± 2%     ~     (p=0.113 n=9+10)
      AddVV/100000-8                      55.3µs ± 3%     55.2µs ± 3%     ~     (p=0.796 n=10+10)
      AddVW/1-8                           3.04ns ± 3%     3.02ns ± 3%     ~     (p=0.538 n=10+10)
      AddVW/2-8                           3.57ns ± 2%     3.61ns ± 2%   +1.12%  (p=0.032 n=9+9)
      AddVW/3-8                           3.77ns ± 1%     3.79ns ± 2%     ~     (p=0.719 n=10+10)
      AddVW/4-8                           4.69ns ± 1%     4.69ns ± 2%     ~     (p=0.920 n=10+9)
      AddVW/5-8                           4.58ns ± 1%     4.58ns ± 1%     ~     (p=0.812 n=10+10)
      AddVW/10-8                          7.62ns ± 2%     7.63ns ± 1%     ~     (p=0.926 n=10+10)
      AddVW/100-8                         41.1ns ± 2%     42.4ns ± 3%   +3.34%  (p=0.000 n=10+10)
      AddVW/1000-8                         386ns ± 2%      389ns ± 4%     ~     (p=0.514 n=10+10)
      AddVW/10000-8                       3.88µs ± 3%     3.87µs ± 3%     ~     (p=0.448 n=10+10)
      AddVW/100000-8                      41.2µs ± 3%     41.7µs ± 3%     ~     (p=0.148 n=10+10)
      AddMulVVW/1-8                       3.28ns ± 2%     3.26ns ± 3%     ~     (p=0.107 n=10+10)
      AddMulVVW/2-8                       4.26ns ± 2%     4.24ns ± 3%     ~     (p=0.327 n=9+9)
      AddMulVVW/3-8                       5.07ns ± 2%     5.26ns ± 2%   +3.73%  (p=0.000 n=10+10)
      AddMulVVW/4-8                       6.40ns ± 2%     6.50ns ± 2%   +1.61%  (p=0.000 n=10+10)
      AddMulVVW/5-8                       6.77ns ± 2%     6.86ns ± 1%   +1.38%  (p=0.001 n=9+9)
      AddMulVVW/10-8                      12.2ns ± 2%     10.6ns ± 3%  -13.65%  (p=0.000 n=10+10)
      AddMulVVW/100-8                     79.7ns ± 2%     52.4ns ± 1%  -34.17%  (p=0.000 n=10+10)
      AddMulVVW/1000-8                     695ns ± 1%      491ns ± 2%  -29.39%  (p=0.000 n=9+10)
      AddMulVVW/10000-8                   7.26µs ± 2%     5.92µs ± 6%  -18.42%  (p=0.000 n=10+10)
      AddMulVVW/100000-8                  72.6µs ± 2%     62.2µs ± 2%  -14.31%  (p=0.000 n=10+10)
      DecimalConversion-8                  108µs ±19%      104µs ± 4%     ~     (p=0.460 n=10+8)
      FloatString/100-8                    926ns ±14%      908ns ± 5%     ~     (p=0.398 n=9+9)
      FloatString/1000-8                  25.7µs ± 1%     25.7µs ± 1%     ~     (p=0.739 n=10+10)
      FloatString/10000-8                 2.13ms ± 1%     2.12ms ± 1%     ~     (p=0.353 n=10+10)
      FloatString/100000-8                 207ms ± 1%      206ms ± 2%     ~     (p=0.912 n=10+10)
      FloatAdd/10-8                       61.3ns ± 3%     61.9ns ± 3%     ~     (p=0.183 n=10+10)
      FloatAdd/100-8                      62.0ns ± 2%     62.9ns ± 4%     ~     (p=0.118 n=10+10)
      FloatAdd/1000-8                     84.7ns ± 2%     84.4ns ± 1%     ~     (p=0.591 n=10+10)
      FloatAdd/10000-8                     305ns ± 2%      306ns ± 1%     ~     (p=0.443 n=10+10)
      FloatAdd/100000-8                   2.45µs ± 1%     2.46µs ± 1%     ~     (p=0.782 n=10+10)
      FloatSub/10-8                       56.8ns ± 4%     56.5ns ± 5%     ~     (p=0.423 n=10+10)
      FloatSub/100-8                      57.3ns ± 4%     57.1ns ± 5%     ~     (p=0.540 n=10+10)
      FloatSub/1000-8                     66.8ns ± 4%     66.6ns ± 1%     ~     (p=0.868 n=10+10)
      FloatSub/10000-8                     199ns ± 1%      198ns ± 1%     ~     (p=0.287 n=10+9)
      FloatSub/100000-8                   1.47µs ± 2%     1.47µs ± 2%     ~     (p=0.920 n=10+9)
      ParseFloatSmallExp-8                8.74µs ±10%     9.48µs ±10%   +8.51%  (p=0.010 n=9+10)
      ParseFloatLargeExp-8                39.2µs ±25%     39.6µs ±12%     ~     (p=0.529 n=10+10)
      GCD10x10/WithoutXY-8                 173ns ±23%      177ns ±20%     ~     (p=0.698 n=10+10)
      GCD10x10/WithXY-8                    736ns ±12%      728ns ±16%     ~     (p=0.838 n=10+10)
      GCD10x100/WithoutXY-8                325ns ±16%      326ns ±14%     ~     (p=0.912 n=10+10)
      GCD10x100/WithXY-8                  1.14µs ±13%     1.16µs ± 6%     ~     (p=0.287 n=10+9)
      GCD10x1000/WithoutXY-8               851ns ±25%      820ns ±12%     ~     (p=0.592 n=10+10)
      GCD10x1000/WithXY-8                 2.89µs ±17%     2.85µs ± 5%     ~     (p=1.000 n=10+9)
      GCD10x10000/WithoutXY-8             6.66µs ±12%     6.82µs ±19%     ~     (p=0.529 n=10+10)
      GCD10x10000/WithXY-8                18.0µs ± 5%     17.2µs ±19%     ~     (p=0.315 n=7+10)
      GCD10x100000/WithoutXY-8            77.8µs ±18%     73.3µs ±11%     ~     (p=0.315 n=10+9)
      GCD10x100000/WithXY-8                186µs ±14%      204µs ±29%     ~     (p=0.218 n=10+10)
      GCD100x100/WithoutXY-8              1.09µs ± 1%     1.09µs ± 2%     ~     (p=0.117 n=9+10)
      GCD100x100/WithXY-8                 7.93µs ± 1%     7.97µs ± 1%   +0.52%  (p=0.006 n=10+10)
      GCD100x1000/WithoutXY-8             2.00µs ± 3%     2.04µs ± 6%     ~     (p=0.053 n=9+10)
      GCD100x1000/WithXY-8                9.23µs ± 1%     9.29µs ± 1%   +0.63%  (p=0.009 n=10+10)
      GCD100x10000/WithoutXY-8            10.2µs ±11%      9.7µs ± 6%     ~     (p=0.278 n=10+9)
      GCD100x10000/WithXY-8               33.3µs ± 4%     33.6µs ± 4%     ~     (p=0.481 n=10+10)
      GCD100x100000/WithoutXY-8            106µs ±17%      105µs ±13%     ~     (p=0.853 n=10+10)
      GCD100x100000/WithXY-8               289µs ±17%      276µs ± 8%     ~     (p=0.353 n=10+10)
      GCD1000x1000/WithoutXY-8            12.2µs ± 1%     12.1µs ± 1%   -0.45%  (p=0.007 n=10+10)
      GCD1000x1000/WithXY-8                131µs ± 1%      132µs ± 0%   +0.93%  (p=0.000 n=9+7)
      GCD1000x10000/WithoutXY-8           20.6µs ± 2%     20.6µs ± 1%     ~     (p=0.326 n=10+9)
      GCD1000x10000/WithXY-8               238µs ± 1%      237µs ± 1%     ~     (p=0.356 n=9+10)
      GCD1000x100000/WithoutXY-8           117µs ± 8%      114µs ±11%     ~     (p=0.190 n=10+10)
      GCD1000x100000/WithXY-8             1.51ms ± 1%     1.50ms ± 1%     ~     (p=0.053 n=9+10)
      GCD10000x10000/WithoutXY-8           220µs ± 1%      218µs ± 1%   -0.86%  (p=0.000 n=10+10)
      GCD10000x10000/WithXY-8             3.04ms ± 0%     3.05ms ± 0%   +0.33%  (p=0.001 n=9+10)
      GCD10000x100000/WithoutXY-8          513µs ± 0%      511µs ± 0%   -0.38%  (p=0.000 n=10+10)
      GCD10000x100000/WithXY-8            15.1ms ± 0%     15.0ms ± 0%     ~     (p=0.053 n=10+9)
      GCD100000x100000/WithoutXY-8        10.4ms ± 1%     10.4ms ± 2%     ~     (p=0.258 n=9+9)
      GCD100000x100000/WithXY-8            205ms ± 1%      205ms ± 1%     ~     (p=0.481 n=10+10)
      Hilbert-8                           1.25ms ±15%     1.24ms ±17%     ~     (p=0.853 n=10+10)
      Binomial-8                          3.03µs ±24%     2.90µs ±16%     ~     (p=0.481 n=10+10)
      QuoRem-8                            1.95µs ± 1%     1.95µs ± 2%     ~     (p=0.117 n=9+10)
      Exp-8                               5.12ms ± 2%     3.99ms ± 1%  -22.02%  (p=0.000 n=10+9)
      Exp2-8                              5.14ms ± 2%     3.98ms ± 0%  -22.55%  (p=0.000 n=10+9)
      Bitset-8                            16.4ns ± 2%     16.5ns ± 2%     ~     (p=0.311 n=9+10)
      BitsetNeg-8                         46.3ns ± 4%     45.8ns ± 4%     ~     (p=0.272 n=10+10)
      BitsetOrig-8                         250ns ±19%      247ns ±14%     ~     (p=0.671 n=10+10)
      BitsetNegOrig-8                      416ns ±14%      429ns ±14%     ~     (p=0.353 n=10+10)
      ModSqrt225_Tonelli-8                 400µs ± 0%      320µs ± 0%  -19.88%  (p=0.000 n=9+7)
      ModSqrt224_3Mod4-8                   123µs ± 1%       97µs ± 0%  -21.21%  (p=0.000 n=9+10)
      ModSqrt5430_Tonelli-8                1.87s ± 0%      1.39s ± 1%  -25.70%  (p=0.000 n=9+10)
      ModSqrt5430_3Mod4-8                  630ms ± 2%      465ms ± 1%  -26.12%  (p=0.000 n=10+10)
      Sqrt-8                              25.8µs ± 1%     25.9µs ± 0%   +0.66%  (p=0.002 n=10+8)
      IntSqr/1-8                          11.3ns ± 1%     11.3ns ± 2%     ~     (p=0.360 n=9+10)
      IntSqr/2-8                          26.6ns ± 1%     27.4ns ± 2%   +2.87%  (p=0.000 n=8+9)
      IntSqr/3-8                          36.5ns ± 6%     36.6ns ± 5%     ~     (p=0.589 n=10+10)
      IntSqr/5-8                          57.2ns ± 2%     57.8ns ± 1%   +0.92%  (p=0.045 n=10+9)
      IntSqr/8-8                           112ns ± 1%       93ns ± 1%  -16.60%  (p=0.000 n=10+10)
      IntSqr/10-8                          148ns ± 1%      129ns ± 5%  -12.85%  (p=0.000 n=10+10)
      IntSqr/20-8                          642ns ±28%      692ns ±21%     ~     (p=0.105 n=10+10)
      IntSqr/30-8                         1.03µs ±18%     1.06µs ±15%     ~     (p=0.422 n=10+8)
      IntSqr/50-8                         2.33µs ±14%     2.14µs ±20%     ~     (p=0.063 n=10+10)
      IntSqr/80-8                         4.06µs ±13%     3.72µs ±14%   -8.31%  (p=0.029 n=10+10)
      IntSqr/100-8                        5.79µs ±10%     5.20µs ±18%  -10.15%  (p=0.004 n=10+10)
      IntSqr/200-8                        17.1µs ± 1%     12.9µs ± 3%  -24.44%  (p=0.000 n=10+10)
      IntSqr/300-8                        35.9µs ± 0%     26.6µs ± 1%  -25.75%  (p=0.000 n=10+10)
      IntSqr/500-8                        84.9µs ± 0%     71.7µs ± 1%  -15.49%  (p=0.000 n=10+10)
      IntSqr/800-8                         170µs ± 1%      142µs ± 2%  -16.73%  (p=0.000 n=10+10)
      IntSqr/1000-8                        258µs ± 1%      218µs ± 1%  -15.65%  (p=0.000 n=10+10)
      Mul-8                               10.4ms ± 1%      8.3ms ± 0%  -20.05%  (p=0.000 n=10+9)
      Exp3Power/0x10-8                     311ns ±15%      321ns ±24%     ~     (p=0.447 n=10+10)
      Exp3Power/0x40-8                     358ns ±21%      346ns ±37%     ~     (p=0.591 n=10+10)
      Exp3Power/0x100-8                    611ns ±19%      570ns ±27%     ~     (p=0.393 n=10+10)
      Exp3Power/0x400-8                   1.31µs ±26%     1.34µs ±19%     ~     (p=0.853 n=10+10)
      Exp3Power/0x1000-8                  6.76µs ±23%     6.22µs ±16%     ~     (p=0.095 n=10+9)
      Exp3Power/0x4000-8                  37.6µs ±14%     36.4µs ±21%     ~     (p=0.247 n=10+10)
      Exp3Power/0x10000-8                  345µs ±14%      310µs ±11%   -9.99%  (p=0.005 n=10+10)
      Exp3Power/0x40000-8                 2.77ms ± 1%     2.34ms ± 1%  -15.47%  (p=0.000 n=10+10)
      Exp3Power/0x100000-8                25.1ms ± 1%     21.3ms ± 1%  -15.26%  (p=0.000 n=10+10)
      Exp3Power/0x400000-8                 225ms ± 1%      190ms ± 1%  -15.61%  (p=0.000 n=10+10)
      Fibo-8                              23.4ms ± 1%     23.3ms ± 0%     ~     (p=0.052 n=10+10)
      NatSqr/1-8                          58.4ns ±24%     59.8ns ±38%     ~     (p=0.739 n=10+10)
      NatSqr/2-8                           122ns ±21%      122ns ±16%     ~     (p=0.896 n=10+10)
      NatSqr/3-8                           140ns ±28%      148ns ±30%     ~     (p=0.288 n=10+10)
      NatSqr/5-8                           193ns ±29%      210ns ±34%     ~     (p=0.469 n=10+10)
      NatSqr/8-8                           317ns ±21%      296ns ±25%     ~     (p=0.393 n=10+10)
      NatSqr/10-8                          362ns ± 8%      373ns ±30%     ~     (p=0.617 n=9+10)
      NatSqr/20-8                         1.24µs ±16%     1.06µs ±29%  -14.57%  (p=0.019 n=10+10)
      NatSqr/30-8                         1.90µs ±32%     1.71µs ±10%     ~     (p=0.176 n=10+9)
      NatSqr/50-8                         4.22µs ±19%     3.67µs ± 7%  -13.03%  (p=0.017 n=10+9)
      NatSqr/80-8                         7.33µs ±20%     6.50µs ±15%  -11.26%  (p=0.009 n=10+10)
      NatSqr/100-8                        9.84µs ±18%     9.33µs ± 8%     ~     (p=0.280 n=10+10)
      NatSqr/200-8                        21.4µs ± 7%     20.0µs ±14%     ~     (p=0.075 n=10+10)
      NatSqr/300-8                        38.0µs ± 2%     31.3µs ±10%  -17.63%  (p=0.000 n=10+10)
      NatSqr/500-8                         102µs ± 5%      101µs ± 4%     ~     (p=0.780 n=9+10)
      NatSqr/800-8                         190µs ± 3%      166µs ± 6%  -12.29%  (p=0.000 n=10+10)
      NatSqr/1000-8                        277µs ± 2%      245µs ± 6%  -11.64%  (p=0.000 n=10+10)
      ScanPi-8                             144µs ±23%      149µs ±24%     ~     (p=0.579 n=10+10)
      StringPiParallel-8                  25.6µs ± 0%     25.8µs ± 0%   +0.69%  (p=0.000 n=9+10)
      Scan/10/Base2-8                      305ns ± 1%      309ns ± 1%   +1.32%  (p=0.000 n=10+9)
      Scan/100/Base2-8                    1.95µs ± 1%     1.98µs ± 1%   +1.10%  (p=0.000 n=10+10)
      Scan/1000/Base2-8                   19.5µs ± 1%     19.7µs ± 1%   +1.39%  (p=0.000 n=10+10)
      Scan/10000/Base2-8                   270µs ± 1%      272µs ± 1%   +0.58%  (p=0.024 n=9+9)
      Scan/100000/Base2-8                 10.3ms ± 0%     10.3ms ± 0%   +0.16%  (p=0.022 n=9+10)
      Scan/10/Base8-8                      146ns ± 4%      154ns ± 4%   +5.57%  (p=0.000 n=9+9)
      Scan/100/Base8-8                     748ns ± 1%      759ns ± 1%   +1.51%  (p=0.000 n=9+10)
      Scan/1000/Base8-8                   7.88µs ± 1%     8.00µs ± 1%   +1.64%  (p=0.000 n=10+10)
      Scan/10000/Base8-8                   155µs ± 1%      155µs ± 1%     ~     (p=0.968 n=10+9)
      Scan/100000/Base8-8                 9.11ms ± 0%     9.11ms ± 0%     ~     (p=0.604 n=9+10)
      Scan/10/Base10-8                     140ns ± 5%      149ns ± 5%   +6.39%  (p=0.000 n=9+10)
      Scan/100/Base10-8                    680ns ± 0%      688ns ± 1%   +1.08%  (p=0.000 n=9+10)
      Scan/1000/Base10-8                  7.09µs ± 1%     7.16µs ± 1%   +0.98%  (p=0.019 n=10+10)
      Scan/10000/Base10-8                  149µs ± 3%      150µs ± 3%     ~     (p=0.143 n=10+10)
      Scan/100000/Base10-8                9.16ms ± 0%     9.16ms ± 0%     ~     (p=0.661 n=10+9)
      Scan/10/Base16-8                     134ns ± 5%      135ns ± 3%     ~     (p=0.505 n=9+9)
      Scan/100/Base16-8                    560ns ± 1%      563ns ± 0%   +0.67%  (p=0.000 n=10+8)
      Scan/1000/Base16-8                  6.28µs ± 1%     6.26µs ± 1%     ~     (p=0.448 n=10+10)
      Scan/10000/Base16-8                  161µs ± 1%      162µs ± 1%   +0.74%  (p=0.008 n=9+9)
      Scan/100000/Base16-8                9.64ms ± 0%     9.64ms ± 0%     ~     (p=0.436 n=10+10)
      String/10/Base2-8                    116ns ±12%      118ns ±13%     ~     (p=0.645 n=10+10)
      String/100/Base2-8                   871ns ±23%      860ns ±22%     ~     (p=0.699 n=10+10)
      String/1000/Base2-8                 10.0µs ±20%     10.0µs ±23%     ~     (p=0.853 n=10+10)
      String/10000/Base2-8                 110µs ±21%      120µs ±25%     ~     (p=0.436 n=10+10)
      String/100000/Base2-8                768µs ±11%      733µs ±16%     ~     (p=0.393 n=10+10)
      String/10/Base8-8                   51.3ns ± 1%     51.0ns ± 3%     ~     (p=0.286 n=9+9)
      String/100/Base8-8                   284ns ± 9%      272ns ±12%     ~     (p=0.267 n=9+10)
      String/1000/Base8-8                 3.06µs ± 9%     3.04µs ±10%     ~     (p=0.739 n=10+10)
      String/10000/Base8-8                36.1µs ±14%     35.1µs ± 9%     ~     (p=0.447 n=10+9)
      String/100000/Base8-8                371µs ±12%      373µs ±16%     ~     (p=0.739 n=10+10)
      String/10/Base10-8                   167ns ±11%      165ns ± 9%     ~     (p=0.781 n=10+10)
      String/100/Base10-8                  727ns ± 1%      740ns ± 2%   +1.70%  (p=0.001 n=10+10)
      String/1000/Base10-8                5.30µs ±18%     5.37µs ±14%     ~     (p=0.631 n=10+10)
      String/10000/Base10-8               45.0µs ±14%     44.6µs ±10%     ~     (p=0.720 n=9+10)
      String/100000/Base10-8              5.10ms ± 1%     5.05ms ± 3%     ~     (p=0.211 n=9+10)
      String/10/Base16-8                  47.7ns ± 6%     47.7ns ± 6%     ~     (p=0.985 n=10+10)
      String/100/Base16-8                  221ns ±10%      234ns ±27%     ~     (p=0.541 n=10+10)
      String/1000/Base16-8                2.23µs ±11%     2.12µs ± 8%   -4.81%  (p=0.029 n=9+8)
      String/10000/Base16-8               28.3µs ±21%     28.5µs ±14%     ~     (p=0.796 n=10+10)
      String/100000/Base16-8               291µs ±16%      293µs ±15%     ~     (p=0.931 n=9+9)
      LeafSize/0-8                        2.43ms ± 1%     2.49ms ± 1%   +2.56%  (p=0.000 n=10+10)
      LeafSize/1-8                        49.7µs ± 9%     46.3µs ±16%   -6.78%  (p=0.017 n=10+9)
      LeafSize/2-8                        48.4µs ±18%     46.3µs ±19%     ~     (p=0.436 n=10+10)
      LeafSize/3-8                        81.7µs ± 3%     80.9µs ± 3%     ~     (p=0.278 n=10+9)
      LeafSize/4-8                        47.0µs ± 7%     47.9µs ±13%     ~     (p=0.905 n=9+10)
      LeafSize/5-8                        96.8µs ± 1%     97.3µs ± 2%     ~     (p=0.515 n=8+10)
      LeafSize/6-8                        82.5µs ± 4%     80.9µs ± 2%   -1.92%  (p=0.019 n=10+10)
      LeafSize/7-8                        67.2µs ±13%     66.6µs ± 9%     ~     (p=0.842 n=10+9)
      LeafSize/8-8                        46.0µs ±28%     45.1µs ±12%     ~     (p=0.739 n=10+10)
      LeafSize/9-8                         111µs ± 1%      111µs ± 1%     ~     (p=0.739 n=10+10)
      LeafSize/10-8                       98.8µs ± 4%     97.9µs ± 3%     ~     (p=0.278 n=10+9)
      LeafSize/11-8                       96.8µs ± 1%     96.4µs ± 1%     ~     (p=0.211 n=9+10)
      LeafSize/12-8                       81.0µs ± 4%     81.3µs ± 3%     ~     (p=0.579 n=10+10)
      LeafSize/13-8                       79.7µs ± 5%     79.2µs ± 3%     ~     (p=0.661 n=10+9)
      LeafSize/14-8                       67.6µs ±12%     65.8µs ± 7%     ~     (p=0.447 n=10+9)
      LeafSize/15-8                       63.9µs ±17%     66.3µs ±14%     ~     (p=0.481 n=10+10)
      LeafSize/16-8                       44.0µs ±28%     46.0µs ±27%     ~     (p=0.481 n=10+10)
      LeafSize/32-8                       46.2µs ±13%     43.5µs ±18%     ~     (p=0.156 n=9+10)
      LeafSize/64-8                       53.3µs ±10%     53.0µs ±19%     ~     (p=0.730 n=9+9)
      ProbablyPrime/n=0-8                 3.60ms ± 1%     3.39ms ± 1%   -5.87%  (p=0.000 n=10+9)
      ProbablyPrime/n=1-8                 4.42ms ± 1%     4.08ms ± 1%   -7.69%  (p=0.000 n=10+10)
      ProbablyPrime/n=5-8                 7.57ms ± 2%     6.79ms ± 1%  -10.24%  (p=0.000 n=10+10)
      ProbablyPrime/n=10-8                11.6ms ± 2%     10.2ms ± 1%  -11.69%  (p=0.000 n=10+10)
      ProbablyPrime/n=20-8                19.4ms ± 2%     16.9ms ± 2%  -12.89%  (p=0.000 n=10+10)
      ProbablyPrime/Lucas-8               2.81ms ± 2%     2.72ms ± 1%   -3.22%  (p=0.000 n=10+9)
      ProbablyPrime/MillerRabinBase2-8     797µs ± 1%      680µs ± 1%  -14.64%  (p=0.000 n=10+10)
      
      name                              old speed      new speed       delta
      AddVV/1-8                         17.1GB/s ± 6%   18.0GB/s ± 2%     ~     (p=0.122 n=10+8)
      AddVV/2-8                         32.4GB/s ± 2%   32.2GB/s ± 4%     ~     (p=0.661 n=10+9)
      AddVV/3-8                         38.6GB/s ± 2%   38.9GB/s ± 1%     ~     (p=0.113 n=10+9)
      AddVV/4-8                         45.8GB/s ± 2%   45.8GB/s ± 2%     ~     (p=0.796 n=10+10)
      AddVV/5-8                         48.1GB/s ± 2%   48.3GB/s ± 1%     ~     (p=0.315 n=10+10)
      AddVV/10-8                        78.9GB/s ± 1%   78.9GB/s ± 2%     ~     (p=0.353 n=10+10)
      AddVV/100-8                        136GB/s ± 2%    137GB/s ± 1%     ~     (p=0.971 n=10+10)
      AddVV/1000-8                       164GB/s ± 1%    164GB/s ± 4%     ~     (p=0.853 n=10+10)
      AddVV/10000-8                      126GB/s ± 6%    129GB/s ± 2%     ~     (p=0.063 n=10+10)
      AddVV/100000-8                     116GB/s ± 3%    116GB/s ± 3%     ~     (p=0.796 n=10+10)
      AddVW/1-8                         2.64GB/s ± 3%   2.64GB/s ± 3%     ~     (p=0.579 n=10+10)
      AddVW/2-8                         4.49GB/s ± 2%   4.44GB/s ± 2%   -1.09%  (p=0.040 n=9+9)
      AddVW/3-8                         6.36GB/s ± 1%   6.34GB/s ± 2%     ~     (p=0.684 n=10+10)
      AddVW/4-8                         6.83GB/s ± 1%   6.82GB/s ± 2%     ~     (p=0.905 n=10+9)
      AddVW/5-8                         8.75GB/s ± 1%   8.73GB/s ± 1%     ~     (p=0.796 n=10+10)
      AddVW/10-8                        10.5GB/s ± 2%   10.5GB/s ± 1%     ~     (p=0.971 n=10+10)
      AddVW/100-8                       19.5GB/s ± 2%   18.9GB/s ± 2%   -3.22%  (p=0.000 n=10+10)
      AddVW/1000-8                      20.7GB/s ± 2%   20.6GB/s ± 4%     ~     (p=0.631 n=10+10)
      AddVW/10000-8                     20.6GB/s ± 3%   20.7GB/s ± 3%     ~     (p=0.481 n=10+10)
      AddVW/100000-8                    19.4GB/s ± 2%   19.2GB/s ± 3%     ~     (p=0.165 n=10+10)
      AddMulVVW/1-8                     19.5GB/s ± 2%   19.7GB/s ± 3%     ~     (p=0.123 n=10+10)
      AddMulVVW/2-8                     30.1GB/s ± 2%   30.2GB/s ± 3%     ~     (p=0.297 n=9+9)
      AddMulVVW/3-8                     37.9GB/s ± 2%   36.5GB/s ± 2%   -3.63%  (p=0.000 n=10+10)
      AddMulVVW/4-8                     40.0GB/s ± 2%   39.4GB/s ± 2%   -1.58%  (p=0.001 n=10+10)
      AddMulVVW/5-8                     47.3GB/s ± 2%   46.6GB/s ± 1%   -1.35%  (p=0.001 n=9+9)
      AddMulVVW/10-8                    52.3GB/s ± 2%   60.6GB/s ± 3%  +15.76%  (p=0.000 n=10+10)
      AddMulVVW/100-8                   80.3GB/s ± 2%  122.1GB/s ± 1%  +51.92%  (p=0.000 n=10+10)
      AddMulVVW/1000-8                  92.0GB/s ± 1%  130.3GB/s ± 2%  +41.61%  (p=0.000 n=9+10)
      AddMulVVW/10000-8                 88.2GB/s ± 2%  108.2GB/s ± 5%  +22.66%  (p=0.000 n=10+10)
      AddMulVVW/100000-8                88.2GB/s ± 2%  102.9GB/s ± 2%  +16.69%  (p=0.000 n=10+10)
      
      Change-Id: Ic98e30c91d437d845fed03e07e976c3fdbf02b36
      Reviewed-on: https://go-review.googlesource.com/74851
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Adam Langley <agl@golang.org>
  4. 23 Feb, 2018 17 commits
    • archive/zip: fix handling of Info-ZIP Unix extended timestamps · 9697a119
      Joe Tsai authored
      The Info-ZIP Unix1 extra field is specified as such:
      >>>
      Value    Size   Description
      -----    ----   -----------
      0x5855   Short  tag for this extra block type ("UX")
      TSize    Short  total data size for this block
      AcTime   Long   time of last access (GMT/UTC)
      ModTime  Long   time of last modification (GMT/UTC)
      <<<
      
      The previous handling was incorrect in that it read the AcTime field
      instead of the ModTime field.
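A minimal sketch of reading the correct field from the layout quoted above. It walks a ZIP extra-field blob, finds tag 0x5855, and decodes ModTime at offset 4 of the block data (AcTime sits at offset 0). The function name `unix1ModTime` is hypothetical and this is not the archive/zip code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

const unix1ExtraID = 0x5855 // Info-ZIP Unix1 tag ("UX")

// unix1ModTime scans extra for an Info-ZIP Unix1 block and returns its
// modification time. All integers are little-endian.
func unix1ModTime(extra []byte) (time.Time, bool) {
	for len(extra) >= 4 {
		tag := binary.LittleEndian.Uint16(extra[0:2])
		size := int(binary.LittleEndian.Uint16(extra[2:4]))
		extra = extra[4:]
		if size > len(extra) {
			return time.Time{}, false // truncated block
		}
		field := extra[:size]
		extra = extra[size:]

		if tag == unix1ExtraID && len(field) >= 8 {
			// field[0:4] is AcTime; field[4:8] is ModTime.
			mod := binary.LittleEndian.Uint32(field[4:8])
			return time.Unix(int64(mod), 0).UTC(), true
		}
	}
	return time.Time{}, false
}

// modUnix is a small helper returning the ModTime as Unix seconds, or -1.
func modUnix(extra []byte) int64 {
	t, ok := unix1ModTime(extra)
	if !ok {
		return -1
	}
	return t.Unix()
}

func main() {
	extra := []byte{
		0x55, 0x58, // tag 0x5855
		0x08, 0x00, // TSize = 8
		0x01, 0x00, 0x00, 0x00, // AcTime  = 1
		0x02, 0x00, 0x00, 0x00, // ModTime = 2
	}
	fmt.Println(modUnix(extra))
}
```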
      
      The test-osx.zip test unfortunately locked in the wrong behavior.
      Manually parsing that ZIP file shows that the encoded MS-DOS
      date and time are 0x4b5f and 0xa97d, which correspond to a
      date of 2017-10-31 21:11:58. This matches the correct mod time
      (off by 1 second due to MS-DOS timestamp resolution).
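The manual decoding can be reproduced with a few shifts and masks. The sketch below uses the standard MS-DOS packing (year since 1980, month, and day in the date word; hour, minute, and seconds/2 in the time word). The helper names are hypothetical, and UTC is used only to make the printed value deterministic, since MS-DOS timestamps carry no zone.

```go
package main

import (
	"fmt"
	"time"
)

// msDosToTime unpacks an MS-DOS date/time pair.
//
//	date: bits 15-9 year-1980, bits 8-5 month, bits 4-0 day
//	time: bits 15-11 hour, bits 10-5 minute, bits 4-0 seconds/2
func msDosToTime(dosDate, dosTime uint16) time.Time {
	return time.Date(
		int(dosDate>>9)+1980,
		time.Month(dosDate>>5&0xf),
		int(dosDate&0x1f),
		int(dosTime>>11),
		int(dosTime>>5&0x3f),
		int(dosTime&0x1f)*2, // 2-second resolution
		0, time.UTC,
	)
}

// formatDOS renders the decoded timestamp as "YYYY-MM-DD hh:mm:ss".
func formatDOS(dosDate, dosTime uint16) string {
	return msDosToTime(dosDate, dosTime).Format("2006-01-02 15:04:05")
}

func main() {
	fmt.Println(formatDOS(0x4b5f, 0xa97d)) // 2017-10-31 21:11:58
}
```

Because seconds are stored divided by two, any odd-second timestamp is rounded, which accounts for the 1-second discrepancy noted above.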
      
      Fixes #23901
      
      Change-Id: I567824c66e8316b9acd103dbecde366874a4b7ef
      Reviewed-on: https://go-review.googlesource.com/96895
      Run-TryBot: Joe Tsai <joetsai@google.com>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • runtime: don't check for String/Error methods in printany · 804e3e56
      Ian Lance Taylor authored
      They have either already been called by preprintpanics, or they
      cannot be called safely because of the various conditions checked at
      the start of gopanic.
      
      Fixes #24059
      
      Change-Id: I4a6233d12c9f7aaaee72f343257ea108bae79241
      Reviewed-on: https://go-review.googlesource.com/96755
      Reviewed-by: Austin Clements <austin@google.com>
    • os: respect umask in Mkdir and OpenFile on BSD systems when perm has ModeSticky set · a5e8e2d9
      Yuval Pavel Zholkover authored
      Instead of calling Chmod directly on perm, stat the created file/dir to extract the
      actual permission bits which can be different from perm due to umask.
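The idea can be illustrated from user code as follows. This is a rough sketch of the stat-then-chmod pattern, assuming a Unix-like system; `mkdirSticky` and `demo` are hypothetical helpers, and the real fix lives inside the os package rather than in caller code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// mkdirSticky creates a directory and then sets the sticky bit on top of
// the permission bits the kernel actually applied. Calling Chmod with the
// requested perm instead would silently undo the umask.
func mkdirSticky(name string, perm os.FileMode) error {
	if err := os.Mkdir(name, perm.Perm()); err != nil {
		return err
	}
	fi, err := os.Stat(name)
	if err != nil {
		return err
	}
	// fi.Mode().Perm() already reflects the umask.
	return os.Chmod(name, fi.Mode().Perm()|os.ModeSticky)
}

// demo creates a sticky directory under a temporary root and reports
// whether the sticky bit is set and which permission bits resulted.
func demo() (sticky bool, perm uint32, err error) {
	tmp, err := os.MkdirTemp("", "sticky")
	if err != nil {
		return false, 0, err
	}
	defer os.RemoveAll(tmp)

	dir := filepath.Join(tmp, "d")
	if err := mkdirSticky(dir, 0o777); err != nil {
		return false, 0, err
	}
	fi, err := os.Stat(dir)
	if err != nil {
		return false, 0, err
	}
	return fi.Mode()&os.ModeSticky != 0, uint32(fi.Mode().Perm()), nil
}

func main() {
	sticky, perm, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("sticky=%v perm=%o\n", sticky, perm)
}
```

The printed permission bits depend on the process umask; with a typical umask of 022 they would be 755 rather than the requested 777.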
      
      Fixes #23120.
      
      Change-Id: I3e70032451fc254bf48ce9627e98988f84af8d91
      Reviewed-on: https://go-review.googlesource.com/84477
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime: reduce arena size to 4MB on 64-bit Windows · 78846472
      Austin Clements authored
      Currently, we use 64MB heap arenas on 64-bit platforms. This works
      well on UNIX-like OSes because they treat untouched pages as
      essentially free. However, on Windows, committed memory is charged
      against a process whether or not it has demand-faulted physical pages
      in. Hence, on Windows, even a process with a tiny heap will commit
      64MB for one heap arena, plus another 32MB for the arena map. Things
      are much worse under the race detector, which increases the heap
      commitment by a factor of 5.5, leading to 384MB of committed memory
      at runtime init.
      
      Fix this by reducing the heap arena size to 4MB on Windows.
      
      To counterbalance the effect of increasing the arena map size by a
      factor of 16, and to further reduce the impact of the commitment for
      the arena map, we switch from a single entry L1 arena map to a 64
      entry L1 arena map.
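
The sizes mentioned above can be reproduced with a little arithmetic. The sketch below assumes a 48-bit address space and 8-byte arena map entries; both are assumptions for illustration (they happen to yield the 32MB figure quoted above), and the function names are hypothetical.

```go
package main

import "fmt"

const (
	addrBits = 48 // assumed number of usable address bits
	ptrSize  = 8  // assumed bytes per arena map entry
	mb       = 1 << 20
)

// flatMapBytes returns the size of a single-level arena map that covers
// the whole address space with arenas of the given size.
func flatMapBytes(arenaBytes uint64) uint64 {
	return (uint64(1) << addrBits) / arenaBytes * ptrSize
}

// l2BlockBytes returns the size of one second-level block when the same
// entries are spread evenly over l1Entries first-level slots.
func l2BlockBytes(arenaBytes, l1Entries uint64) uint64 {
	return flatMapBytes(arenaBytes) / l1Entries
}

func main() {
	fmt.Println(flatMapBytes(64*mb) / mb)     // 32: flat map with 64MB arenas
	fmt.Println(flatMapBytes(4*mb) / mb)      // 512: flat map with 4MB arenas (16x larger)
	fmt.Println(l2BlockBytes(4*mb, 64) / mb)  // 8: one L2 block with a 64-entry L1
}
```

Under these assumptions, shrinking the arena by a factor of 16 grows a flat map from 32MB to 512MB, while a 64-entry first level brings each second-level block down to 8MB.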
      
      Compared to the original arena design, this slows down the
      x/benchmarks garbage benchmark by 0.49% (the slow down of this commit
      alone is 1.59%, but the previous commit bought us a 1% speed-up):
      
      name                       old time/op  new time/op  delta
      Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.29ms ± 1%  +0.49%  (p=0.000 n=17+18)
      
      (https://perf.golang.org/search?q=upload:20180223.1)
      
      (This was measured on linux/amd64 by modifying its arena configuration
      as above.)
      
      Fixes #23900.
      
      Change-Id: I6b7fa5ecebee2947bf20cfeb78c248809469c6b1
      Reviewed-on: https://go-review.googlesource.com/96780
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      78846472
    • Austin Clements's avatar
      runtime: support a two-level arena map · ec252105
      Austin Clements authored
      Currently, the heap arena map is a single, large array that covers
      every possible arena frame in the entire address space. This is
      practical up to about 48 bits of address space with 64 MB arenas.
      
      However, there are two problems with this:
      
      1. mips64, ppc64, and s390x support full 64-bit address spaces (though
         on Linux only s390x has kernel support for 64-bit address spaces).
         On these platforms, it would be good to support these larger
         address spaces.
      
      2. On Windows, processes are charged for untouched memory, so for
         processes with small heaps, the mostly-untouched 32 MB arena map
         plus a 64 MB arena are significant overhead. Hence, it would be
         good to reduce both the arena map size and the arena size, but with
         a single-level arena, these are inversely proportional.
      
      This CL adds support for a two-level arena map. Arena frame numbers
      are now divided into arenaL1Bits of L1 index and arenaL2Bits of L2
      index.
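
Splitting a frame number into two indexes could look roughly like the sketch below. The type `arenaIdx`, its methods, and the constants `l1Bits = 6` and `l2Bits = 20` are hypothetical values chosen for illustration; they are not taken from this change.

```go
package main

import "fmt"

const (
	l1Bits = 6  // hypothetical number of L1 index bits
	l2Bits = 20 // hypothetical number of L2 index bits
)

// arenaIdx is a hypothetical stand-in for an arena frame number.
type arenaIdx uint

// l1 returns the first-level index: the bits above the L2 index.
func (i arenaIdx) l1() uint {
	if l1Bits == 0 {
		// With no L1 bits there is a single L2 array.
		return 0
	}
	return uint(i) >> l2Bits
}

// l2 returns the second-level index: the low l2Bits bits.
func (i arenaIdx) l2() uint {
	if l1Bits == 0 {
		return uint(i)
	}
	return uint(i) & (1<<l2Bits - 1)
}

func main() {
	i := arenaIdx(3<<l2Bits | 5)
	fmt.Println(i.l1(), i.l2()) // 3 5
}
```

Because the bit counts are constants, the `l1Bits == 0` branches are decided at compile time, so a configuration with no L1 bits reduces to a plain single-level lookup.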
      
      At the moment, arenaL1Bits is always 0, so we effectively have a
      single level map. We do a few things so that this has no cost beyond
      the current single-level map:
      
      1. We embed the L2 array directly in mheap, so if there's a single
         entry in the L2 array, the representation is identical to the
         current representation and there's no extra level of indirection.
      
      2. Hot code that accesses the arena map is structured so that it
         optimizes to nearly the same machine code as it does currently.
      
      3. We make some small tweaks to hot code paths and to the inliner
         itself to keep some important functions inlined despite their
         now-larger ASTs. In particular, this is necessary for
         heapBitsForAddr and heapBits.next.
      
      Possibly as a result of some of the tweaks, this actually slightly
      improves the performance of the x/benchmarks garbage benchmark:
      
      name                       old time/op  new time/op  delta
      Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.26ms ± 1%  -1.07%  (p=0.000 n=17+19)
      
      (https://perf.golang.org/search?q=upload:20180223.2)
      
      For #23900.
      
      Change-Id: If5164e0961754f97eb9eca58f837f36d759505ff
      Reviewed-on: https://go-review.googlesource.com/96779
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      ec252105
    • Austin Clements's avatar
      cmd/compile: teach front-end deadcode about && and || · 2dbf15e8
      Austin Clements authored
      The front-end dead code elimination is very simple. Currently, it just
      looks for if statements with constant boolean conditions. Its main
      purpose is to reduce load on the compiler and shrink code before
      inlining computes hairiness.
      
      This CL teaches front-end dead code elimination about short-circuiting
      boolean expressions && and ||, since they're essentially the same as
      if statements.
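
As an illustration of the source shape this targets, consider the sketch below. The names `debug`, `expensiveCheck`, and `process` are hypothetical. The program behaves the same with or without the optimization; the point is only that the guarded branch can never run, so it can be discarded early.

```go
package main

import "fmt"

// debug is a compile-time constant, so any condition of the form
// "debug && ..." is known to be false before the right-hand side is
// ever evaluated.
const debug = false

var calls int

func expensiveCheck() bool {
	calls++
	return true
}

func process(n int) int {
	// The left operand is constant false, so the && short-circuits and
	// the body is unreachable, just like "if false { ... }".
	if debug && expensiveCheck() {
		return -1
	}
	return n * 2
}

func main() {
	fmt.Println(process(21), calls) // 42 0
}
```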
      
      This also teaches the inliner that the constant 'if' form left behind
      by deadcode is free.
      
      These changes will help with runtime modifications in the next CL that
      would otherwise inhibit inlining in some hot code paths. Currently,
      however, they have no significant impact on benchmarks.
      
      Change-Id: I886203b3c4acdbfef08148fddd7f3a7af5afc7c1
      Reviewed-on: https://go-review.googlesource.com/96778
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      2dbf15e8
    • Austin Clements's avatar
      runtime: rename "arena index" to "arena map" · 33b76920
      Austin Clements authored
      There are too many places where I want to talk about "indexing into
      the arena index". Make this less awkward and ambiguous by calling it
      the "arena map" instead.
      
      Change-Id: I726b0667bb2139dbc006175a0ec09a871cdf73f9
      Reviewed-on: https://go-review.googlesource.com/96777
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      33b76920
    • Austin Clements's avatar
      runtime: don't assume arena is in address order · 9680980e
      Austin Clements authored
      On amd64, the arena is no longer in address space order, but currently
      the heap dumper assumes that it is. Fix this assumption.
      
      Change-Id: Iab1953cd36b359d0fb78ed49e5eb813116a18855
      Reviewed-on: https://go-review.googlesource.com/96776
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      9680980e
    • Ian Lance Taylor's avatar
      path: use OS-specific function in MkdirAll, don't always keep trailing slash · b86e7668
      Ian Lance Taylor authored
      CL 86295 changed MkdirAll to always pass a trailing path separator to
      support extended-length paths on Windows.
      
      However, when Stat is called on an existing file followed by trailing
      slash, it will return a "not a directory" error, skipping the fast
      path at the beginning of MkdirAll.
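
The Stat behavior described above can be observed with a small program. This is a sketch for Unix-like systems only; `statWithSlash` is a hypothetical helper, and the exact error returned may differ between platforms.

```go
package main

import (
	"fmt"
	"os"
)

// statWithSlash creates a temporary regular file and stats it twice:
// once by its plain name and once with a trailing path separator.
func statWithSlash() (plainErr, slashErr error) {
	f, err := os.CreateTemp("", "file")
	if err != nil {
		return err, err
	}
	name := f.Name()
	f.Close()
	defer os.Remove(name)

	_, plainErr = os.Stat(name)
	// With the trailing separator the name must resolve to a directory,
	// so Stat on a regular file is expected to fail.
	_, slashErr = os.Stat(name + string(os.PathSeparator))
	return plainErr, slashErr
}

func main() {
	plainErr, slashErr := statWithSlash()
	fmt.Println("plain:", plainErr)
	fmt.Println("with slash:", slashErr)
}
```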
      
      This change fixes MkdirAll to only pass the trailing path separator
      where required on Windows, by using an OS-specific function fixRootDirectory.
      
      Updates #23918
      
      Change-Id: I23f84a20e65ccce556efa743d026d352b4812c34
      Reviewed-on: https://go-review.googlesource.com/95255
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David du Colombier <0intro@gmail.com>
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
      b86e7668
    • Daniel Martí's avatar
      cmd/vet: use type info to detect the atomic funcs · bae3fd66
      Daniel Martí authored
      Simply checking if a name is "atomic" isn't enough, as that might be a
      var or another imported package. Now that vet requires type information,
      we can do better. And add a simple regression test.
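
A sketch of the kind of code that a name-only check would confuse is shown below. The `counter` type and the functions `shadowed` and `real` are hypothetical and are not the regression test added by this change.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counter is a hypothetical type whose method happens to share its name
// with sync/atomic.AddUint64.
type counter struct{}

// AddUint64 is an ordinary, non-atomic method.
func (counter) AddUint64(p *uint64, d uint64) uint64 { return *p + d }

func shadowed() uint64 {
	var x uint64
	atomic := counter{} // local variable shadowing the sync/atomic package
	// This looks like the "x = atomic.AddUint64(&x, 1)" mistake, but the
	// selector resolves to the method on counter, so it is fine.
	x = atomic.AddUint64(&x, 1)
	return x
}

func real() uint64 {
	var x uint64
	// Correct use of the real package: do not assign the result back to x.
	atomic.AddUint64(&x, 1)
	return atomic.LoadUint64(&x)
}

func main() {
	fmt.Println(shadowed(), real()) // 1 1
}
```

Only type information can tell that `atomic` in `shadowed` is a variable rather than the imported package.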
      
      Change-Id: Ibd2004428374e3628cd3cd0ffb5f37cedaf448ea
      Reviewed-on: https://go-review.googlesource.com/91795
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      Reviewed-by: Robert Griesemer <gri@golang.org>
      bae3fd66
    • Adam Langley's avatar
      crypto/x509: tighten EKU checking for requested EKUs. · 0681c7c3
      Adam Langley authored
      There are, sadly, many exceptions to EKU checking to reflect mistakes
      that CAs have made in practice. However, the requirements for checking
      requested EKUs against the leaf should be tighter than for checking leaf
      EKUs against a CA.
      
      Fixes #23884
      
      Change-Id: I05ea874c4ada0696d8bb18cac4377c0b398fcb5e
      Reviewed-on: https://go-review.googlesource.com/96379
      Reviewed-by: Jonathan Rudenberg <jonathan@titanous.com>
      Reviewed-by: Filippo Valsorda <hi@filippo.io>
      Run-TryBot: Filippo Valsorda <hi@filippo.io>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      0681c7c3
    • Oleg Bulatov's avatar
      regexp: Regexp shouldn't keep references to inputs · 72635401
      Oleg Bulatov authored
      If you search a slice of bytes using a Regexp object, the byte array
      will not be released by the GC until you use the Regexp object on
      another slice of bytes. This happens because the Regexp object keeps
      references to the input data in its cache.
      
      Change-Id: I873107f15c1900aa53ccae5d29dbc885b9562808
      Reviewed-on: https://go-review.googlesource.com/96715
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      72635401
    • Alberto Donizetti's avatar
      cmd/compile: add code generation tests for sqrt intrinsics · 37a038a3
      Alberto Donizetti authored
      Add "sqrt-intrinsified" code generation tests for mips64 and 386, where
      we weren't intrinsifying math.Sqrt (see CL 96615 and CL 95916), and for
      mips and amd64, which lacked sqrt intrinsics tests.
      
      Change-Id: I0cfc08aec6eefd47f3cd7a5995a89393e8b7ed9e
      Reviewed-on: https://go-review.googlesource.com/96716
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      37a038a3
    • mingrammer's avatar
      runtime: rename the TestGcHashmapIndirection to TestGcMapIndirection · fceaa2e2
      mingrammer authored
      There was still the word 'Hashmap' in gc_test.go, so I renamed it to just 'Map'
      
      Previous renaming commit: https://golang.org/cl/90336
      
      Change-Id: I5b0e5c2229d1c30937c7216247f4533effb81ce7
      Reviewed-on: https://go-review.googlesource.com/96675
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      fceaa2e2
    • Alberto Donizetti's avatar
      cmd/compile: intrinsify math.Sqrt on 386 · 9ee78af8
      Alberto Donizetti authored
      It seems like all the pieces were already there, it only needed the
      final plumbing.
      
      Before:
      
      	0x001b 00027 (test.go:9)	MOVSD	X0, (SP)
      	0x0020 00032 (test.go:9)	CALL	math.Sqrt(SB)
      	0x0025 00037 (test.go:9)	MOVSD	8(SP), X0
      
      After:
      
      	0x0018 00024 (test.go:9)	SQRTSD	X0, X0
      
      name    old time/op  new time/op  delta
      Sqrt-4  4.60ns ± 2%  0.45ns ± 1%  -90.33%  (p=0.000 n=10+10)
      
      Change-Id: I0f623958e19e726840140bf9b495d3f3a9184b9d
      Reviewed-on: https://go-review.googlesource.com/96615
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      9ee78af8
    • Alberto Donizetti's avatar
      cmd/compile: use | in the last repetitive generic rules · f6c67813
      Alberto Donizetti authored
      This change or-ifies the last low-hanging rules in generic. Again,
      this is limited to short and repetitive rules, where the use of ors
      does not impact readability.
      
      Ran rulegen, no change in the actual compiler code.
      
      Change-Id: I972b523bc08532f173a3645b47d6936b6e1218c8
      Reviewed-on: https://go-review.googlesource.com/96335
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
      f6c67813
    • Jerrin Shaji George's avatar
      runtime: fix a few typos in comments · 5b3cd560
      Jerrin Shaji George authored
      Change-Id: I07a1eb02ffc621c5696b49491181300bf411f822
      Reviewed-on: https://go-review.googlesource.com/96475
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      5b3cd560
  5. 22 Feb, 2018 10 commits
    • Robert Griesemer's avatar
      go/types: add -panic flag to gotype command for debugging · 70b09c72
      Robert Griesemer authored
      Setting -panic will cause gotype to panic with the first reported
      error, producing a stack trace for debugging.
      
      For #23914.
      
      Change-Id: I40c41cf10aa13d1dd9a099f727ef4201802de13a
      Reviewed-on: https://go-review.googlesource.com/96375
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      70b09c72
    • Tobias Klauser's avatar
      syscall: remove list of unimplemented syscalls · 6450c591
      Tobias Klauser authored
      The syscall package is frozen and we don't want to encourage anyone to
      implement these syscalls.
      
      Change-Id: I6b6e33e32a4b097da6012226aa15300735e50e9f
      Reviewed-on: https://go-review.googlesource.com/96315
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      6450c591
    • Robert Griesemer's avatar
      go/types: fix regression with short variable declarations · 2465ae64
      Robert Griesemer authored
      The variables on the lhs of a short variable declaration are
      only in scope after the variable declaration. Specifically,
      function literals on the rhs of a short variable declaration
      must not see newly declared variables on the lhs.
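
The scoping rule can be illustrated with a short program. The function `scopes` and its variables are hypothetical and only serve to show which `x` a function literal on the right-hand side refers to.

```go
package main

import "fmt"

// scopes returns the value of the inner x and the value of x as seen by
// a function literal on the rhs of the declaration that introduces it.
func scopes() (inner, seen int) {
	x := 1
	{
		// The new x is only in scope after this declaration, so the
		// function literal on the rhs captures the outer x.
		x, get := 2, func() int { return x }
		inner, seen = x, get()
	}
	return inner, seen
}

func main() {
	fmt.Println(scopes()) // 2 1
}
```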
      
      This used to work and this bug was likely introduced with
      https://go-review.googlesource.com/c/go/+/83397 for go1.11.
      Luckily this is just an oversight and the fix is trivial:
      Simply use the mechanism for delayed type-checking of function
      literals introduced in the aforementioned change here as well.
      
      Fixes #24026.
      
      Change-Id: I74ce3a0d05c5a2a42ce4b27601645964f906e82d
      Reviewed-on: https://go-review.googlesource.com/96177
      Reviewed-by: Alan Donovan <adonovan@google.com>
      2465ae64
    • Ben Shi's avatar
      cmd/compile: fix FP accuracy issue introduced by FMA optimization on ARM64 · 7113d3a5
      Ben Shi authored
      Two ARM64 rules are added to avoid FP accuracy issue, which causes
      build failure.
      https://build.golang.org/log/1360f5c9ef3f37968216350283c1013e9681725d
      
      fixes #24033
      
      Change-Id: I9b74b584ab5cc53fa49476de275dc549adf97610
      Reviewed-on: https://go-review.googlesource.com/96355
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      7113d3a5
    • Alexey Palazhchenko's avatar
      database/sql: add String method to IsolationLevel · ef3ab3f5
      Alexey Palazhchenko authored
      Fixes #23632
      
      Change-Id: I7197e13df6cf28400a6dd86c110f41129550abb6
      Reviewed-on: https://go-review.googlesource.com/92235
      Reviewed-by: Daniel Theophanes <kardianos@gmail.com>
      ef3ab3f5
    • Alberto Donizetti's avatar
      cmd/compile: use | in the most repetitive s390x rules · 1e05924c
      Alberto Donizetti authored
      For now, limited to the most repetitive rules that are also short and
      simple, so that we can have a substantial conciseness win without
      compromising rules readability.
      
      Ran rulegen, no changes in the rewrite files.
      
      Change-Id: I8447784895a218c5c1b4dfa1cdb355bd73dabfd1
      Reviewed-on: https://go-review.googlesource.com/95955
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
      1e05924c
    • Martin Möhrmann's avatar
      reflect: avoid calling common if type is known to be *rtype · 1dbe4c50
      Martin Möhrmann authored
      If the type of Type is known to be *rtype then the common
      function is a no-op and does not need to be called.
      
      name  old time/op  new time/op  delta
      New   31.0ns ± 5%  30.2ns ± 4%  -2.74%  (p=0.008 n=20+20)
      
      Change-Id: I5d00346dbc782e34c530166d1ee0499b24068b51
      Reviewed-on: https://go-review.googlesource.com/96115
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      1dbe4c50
    • Ben Shi's avatar
      cmd/compile: improve FP performance on ARM64 · f4c3072c
      Ben Shi authored
      FMADD/FMSUB/FNMADD/FNMSUB are efficient FP instructions, which can
      be used by the compiler to improve FP performance. This CL implements
      this optimization.
      
      1. The compilecmp benchmark shows little change.
      name        old time/op       new time/op       delta
      Template          2.35s ± 4%        2.38s ± 4%    ~     (p=0.161 n=15+15)
      Unicode           1.36s ± 5%        1.36s ± 4%    ~     (p=0.685 n=14+13)
      GoTypes           8.11s ± 3%        8.13s ± 2%    ~     (p=0.624 n=15+15)
      Compiler          40.5s ± 2%        40.7s ± 2%    ~     (p=0.137 n=15+15)
      SSA                115s ± 3%         116s ± 1%    ~     (p=0.270 n=15+14)
      Flate             1.46s ± 4%        1.45s ± 5%    ~     (p=0.870 n=15+15)
      GoParser          1.85s ± 2%        1.87s ± 3%    ~     (p=0.477 n=14+15)
      Reflect           5.11s ± 4%        5.10s ± 2%    ~     (p=0.624 n=15+15)
      Tar               2.23s ± 3%        2.23s ± 5%    ~     (p=0.624 n=15+15)
      XML               2.72s ± 5%        2.74s ± 3%    ~     (p=0.290 n=15+14)
      [Geo mean]        5.02s             5.03s       +0.29%
      
      name        old user-time/op  new user-time/op  delta
      Template          2.90s ± 2%        2.90s ± 3%    ~     (p=0.780 n=14+15)
      Unicode           1.71s ± 5%        1.70s ± 3%    ~     (p=0.458 n=14+13)
      GoTypes           9.77s ± 2%        9.76s ± 2%    ~     (p=0.838 n=15+15)
      Compiler          49.1s ± 2%        49.1s ± 2%    ~     (p=0.902 n=15+15)
      SSA                144s ± 1%         144s ± 2%    ~     (p=0.567 n=15+15)
      Flate             1.75s ± 5%        1.74s ± 3%    ~     (p=0.461 n=15+15)
      GoParser          2.22s ± 2%        2.21s ± 3%    ~     (p=0.233 n=15+15)
      Reflect           5.99s ± 2%        5.95s ± 1%    ~     (p=0.093 n=14+15)
      Tar               2.68s ± 2%        2.67s ± 3%    ~     (p=0.310 n=14+15)
      XML               3.22s ± 2%        3.24s ± 3%    ~     (p=0.512 n=15+15)
      [Geo mean]        6.08s             6.07s       -0.19%
      
      name        old text-bytes    new text-bytes    delta
      HelloSize         641kB ± 0%        641kB ± 0%    ~     (all equal)
      
      name        old data-bytes    new data-bytes    delta
      HelloSize        9.46kB ± 0%       9.46kB ± 0%    ~     (all equal)
      
      name        old bss-bytes     new bss-bytes     delta
      HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize        1.24MB ± 0%       1.24MB ± 0%    ~     (all equal)
      
      2. The go1 benchmark shows little improvement in total (excluding noise),
      but some improvement in test case Mandelbrot200 and FmtFprintfFloat.
      name                     old time/op    new time/op    delta
      BinaryTree17-4              42.1s ± 2%     42.0s ± 2%    ~     (p=0.453 n=30+28)
      Fannkuch11-4                33.5s ± 3%     33.3s ± 3%  -0.38%  (p=0.045 n=30+30)
      FmtFprintfEmpty-4           534ns ± 0%     534ns ± 0%    ~     (all equal)
      FmtFprintfString-4         1.09µs ± 0%    1.09µs ± 0%  -0.27%  (p=0.000 n=23+17)
      FmtFprintfInt-4            1.16µs ± 3%    1.16µs ± 3%    ~     (p=0.714 n=30+30)
      FmtFprintfIntInt-4         1.76µs ± 1%    1.77µs ± 0%  +0.15%  (p=0.002 n=23+23)
      FmtFprintfPrefixedInt-4    2.21µs ± 3%    2.20µs ± 3%    ~     (p=0.390 n=30+30)
      FmtFprintfFloat-4          3.28µs ± 0%    3.11µs ± 0%  -5.01%  (p=0.000 n=25+26)
      FmtManyArgs-4              7.18µs ± 0%    7.19µs ± 0%  +0.13%  (p=0.000 n=24+25)
      GobDecode-4                94.9ms ± 0%    95.6ms ± 5%  +0.83%  (p=0.002 n=23+29)
      GobEncode-4                80.7ms ± 4%    79.8ms ± 0%  -1.11%  (p=0.003 n=30+24)
      Gzip-4                      4.58s ± 4%     4.59s ± 3%  +0.26%  (p=0.002 n=30+26)
      Gunzip-4                    449ms ± 4%     443ms ± 0%    ~     (p=0.096 n=30+26)
      HTTPClientServer-4          553µs ± 1%     548µs ± 1%  -0.96%  (p=0.000 n=30+30)
      JSONEncode-4                215ms ± 4%     214ms ± 4%  -0.29%  (p=0.000 n=30+30)
      JSONDecode-4                868ms ± 4%     875ms ± 5%  +0.79%  (p=0.008 n=30+30)
      Mandelbrot200-4            51.4ms ± 0%    46.7ms ± 3%  -9.09%  (p=0.000 n=25+26)
      GoParse-4                  42.1ms ± 0%    41.8ms ± 0%  -0.61%  (p=0.000 n=25+24)
      RegexpMatchEasy0_32-4      1.02µs ± 4%    1.02µs ± 4%  -0.17%  (p=0.000 n=30+30)
      RegexpMatchEasy0_1K-4      3.90µs ± 0%    3.95µs ± 4%    ~     (p=0.516 n=23+30)
      RegexpMatchEasy1_32-4       970ns ± 3%     973ns ± 3%    ~     (p=0.951 n=30+30)
      RegexpMatchEasy1_1K-4      6.43µs ± 3%    6.33µs ± 0%  -1.62%  (p=0.000 n=30+25)
      RegexpMatchMedium_32-4     1.75µs ± 0%    1.75µs ± 0%    ~     (p=0.422 n=25+24)
      RegexpMatchMedium_1K-4      568µs ± 3%     562µs ± 0%    ~     (p=0.079 n=30+24)
      RegexpMatchHard_32-4       30.8µs ± 0%    31.2µs ± 4%  +1.46%  (p=0.018 n=23+30)
      RegexpMatchHard_1K-4        932µs ± 0%     946µs ± 3%  +1.49%  (p=0.000 n=24+30)
      Revcomp-4                   7.69s ± 3%     7.69s ± 2%  +0.04%  (p=0.032 n=24+25)
      Template-4                  893ms ± 5%     880ms ± 6%  -1.53%  (p=0.000 n=30+30)
      TimeParse-4                4.90µs ± 3%    4.84µs ± 0%    ~     (p=0.080 n=30+25)
      TimeFormat-4               4.70µs ± 1%    4.76µs ± 0%  +1.21%  (p=0.000 n=23+26)
      [Geo mean]                  710µs          706µs       -0.63%
      
      name                     old speed      new speed      delta
      GobDecode-4              8.09MB/s ± 0%  8.03MB/s ± 5%  -0.77%  (p=0.002 n=23+29)
      GobEncode-4              9.52MB/s ± 4%  9.62MB/s ± 0%  +1.07%  (p=0.003 n=30+24)
      Gzip-4                   4.24MB/s ± 4%  4.23MB/s ± 3%  -0.35%  (p=0.002 n=30+26)
      Gunzip-4                 43.2MB/s ± 4%  43.8MB/s ± 0%    ~     (p=0.123 n=30+26)
      JSONEncode-4             9.03MB/s ± 4%  9.06MB/s ± 4%  +0.28%  (p=0.000 n=30+30)
      JSONDecode-4             2.24MB/s ± 4%  2.22MB/s ± 5%  -0.79%  (p=0.008 n=30+30)
      GoParse-4                1.38MB/s ± 1%  1.38MB/s ± 0%    ~     (p=0.401 n=25+17)
      RegexpMatchEasy0_32-4    31.4MB/s ± 4%  31.5MB/s ± 3%  +0.16%  (p=0.000 n=30+30)
      RegexpMatchEasy0_1K-4     262MB/s ± 0%   259MB/s ± 4%    ~     (p=0.693 n=23+30)
      RegexpMatchEasy1_32-4    33.0MB/s ± 3%  32.9MB/s ± 3%    ~     (p=0.139 n=30+30)
      RegexpMatchEasy1_1K-4     159MB/s ± 3%   162MB/s ± 0%  +1.60%  (p=0.000 n=30+25)
      RegexpMatchMedium_32-4    570kB/s ± 0%   570kB/s ± 0%    ~     (all equal)
      RegexpMatchMedium_1K-4   1.80MB/s ± 3%  1.82MB/s ± 0%  +1.09%  (p=0.007 n=30+24)
      RegexpMatchHard_32-4     1.04MB/s ± 0%  1.03MB/s ± 3%  -1.38%  (p=0.003 n=23+30)
      RegexpMatchHard_1K-4     1.10MB/s ± 0%  1.08MB/s ± 3%  -1.52%  (p=0.000 n=24+30)
      Revcomp-4                33.0MB/s ± 3%  33.0MB/s ± 2%    ~     (p=0.128 n=24+25)
      Template-4               2.17MB/s ± 5%  2.21MB/s ± 6%  +1.61%  (p=0.000 n=30+30)
      [Geo mean]               7.79MB/s       7.79MB/s       +0.05%
      
      Change-Id: Ied3dbdb5ba8e386168629cba06fcd4263bbb83e1
      Reviewed-on: https://go-review.googlesource.com/94901
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      f4c3072c
    • erifan01's avatar
      cmd/asm: add arm64 instructions for math optimization · f5de4200
      erifan01 authored
      Add arm64 HW instructions FMADDD, FMADDS, FMSUBD, FMSUBS, FNMADDD, FNMADDS,
      FNMSUBD, FNMSUBS, VFMLA, VFMLS, VMOV (element) for math optimization.
      
      Add check on register element index and test cases.
      
      Change-Id: Ice07c50b1a02d488ad2cde2a4e8aea93f3e3afff
      Reviewed-on: https://go-review.googlesource.com/90876
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      f5de4200
    • David Chase's avatar
      cmd/compile: decouple emitted block order from regalloc block order · c18ff184
      David Chase authored
      While tinkering with different block orders for the preemptible
      loop experiment, crashed the register allocator with a "bad"
      one (these exist).  Realized that one knob was controlling
      two things (register allocation and branch patterns) and
      decided that life would be simpler if the two orders were
      independent.
      
      Ran some experiments and determined that we have probably,
      mostly, been optimizing for register allocation effects, not
      branch effects.  Bad block orders for register allocation are
      somewhat costly.
      
      This will also allow separate experimentation with perhaps-
      better block orders for register allocation.
      
      Change-Id: I6ecf2f24cca178b6f8acc0d3c4caaef043c11ed9
      Reviewed-on: https://go-review.googlesource.com/47314
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      c18ff184
  6. 21 Feb, 2018 1 commit
    • Hana Kim's avatar
      cmd/trace: add memory usage reporting · a66af728
      Hana Kim authored
      Enabled when the tool runs with DEBUG_MEMORY_USAGE=1 env var.
      After reporting the usage, it waits until user enters input
      (helpful when checking top or other memory monitor)
      
      Also adds net/http/pprof to export debug endpoints.
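
A report of this kind could be produced along the lines of the sketch below. The helpers `memUsage` and `report` are hypothetical and are not the cmd/trace code; only the environment variable name and the reported fields follow the description and sample output in this message.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
)

// memUsage formats a snapshot of the runtime memory statistics, labeled
// with the phase that has just finished.
func memUsage(phase string) string {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return fmt.Sprintf("%s\n"+
		" Alloc:\t%d Bytes\n"+
		" Sys:\t%d Bytes\n"+
		" HeapReleased:\t%d Bytes\n"+
		" HeapSys:\t%d Bytes\n"+
		" HeapInUse:\t%d Bytes\n"+
		" HeapAlloc:\t%d Bytes\n",
		phase, m.Alloc, m.Sys, m.HeapReleased, m.HeapSys, m.HeapInuse, m.HeapAlloc)
}

// report prints the usage and waits for Enter, but only when the
// DEBUG_MEMORY_USAGE environment variable is set to 1.
func report(phase string) {
	if os.Getenv("DEBUG_MEMORY_USAGE") != "1" {
		return
	}
	fmt.Print(memUsage(phase))
	fmt.Println("Enter to continue...")
	// Block until a line (or EOF) arrives so the process can be inspected
	// with an external memory monitor.
	bufio.NewReader(os.Stdin).ReadString('\n')
}

func main() {
	report("after parsing trace")
}
```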
      
      From the trace included in #21870
      
      $ DEBUG_MEMORY_USAGE=1 go tool trace trace.out
      2018/02/21 16:04:49 Parsing trace...
      after parsing trace
       Alloc:	3385747848 Bytes
       Sys:	3661654648 Bytes
       HeapReleased:	0 Bytes
       HeapSys:	3488907264 Bytes
       HeapInUse:	3426377728 Bytes
       HeapAlloc:	3385747848 Bytes
      Enter to continue...
      2018/02/21 16:05:09 Serializing trace...
      after generating trace
       Alloc:	4908929616 Bytes
       Sys:	5319063640 Bytes
       HeapReleased:	0 Bytes
       HeapSys:	5032411136 Bytes
       HeapInUse:	4982865920 Bytes
       HeapAlloc:	4908929616 Bytes
      Enter to continue...
      2018/02/21 16:05:18 Splitting trace...
      after spliting trace
       Alloc:	4909026200 Bytes
       Sys:	5319063640 Bytes
       HeapReleased:	0 Bytes
       HeapSys:	5032411136 Bytes
       HeapInUse:	4983046144 Bytes
       HeapAlloc:	4909026200 Bytes
      Enter to continue...
      2018/02/21 16:05:39 Opening browser. Trace viewer is listening on http://127.0.0.1:33661
      after httpJsonTrace
       Alloc:	5288336048 Bytes
       Sys:	7790245896 Bytes
       HeapReleased:	0 Bytes
       HeapSys:	7381123072 Bytes
       HeapInUse:	5324120064 Bytes
       HeapAlloc:	5288336048 Bytes
      Enter to continue...
      
      Change-Id: I88bb3cb1af3cb62e4643a8cbafd5823672b2e464
      Reviewed-on: https://go-review.googlesource.com/92355
      Reviewed-by: Peter Weinberger <pjw@google.com>
      a66af728