1. 06 Mar, 2018 1 commit
    • math/big: optimize addVV and subVV on arm64 · c4f3fe95
      erifan01 authored
      The main hot spot in the existing implementation is its load operations, which limit performance.
      By unrolling the loop 4x and 2x and using the LDP and STP instructions, this CL reduces the load cost and improves performance.
      
      Benchmarks:
      
      name                              old time/op    new time/op     delta
      AddVV/1-8                           21.5ns ± 0%     11.5ns ± 0%   -46.51%  (p=0.008 n=5+5)
      AddVV/2-8                           13.5ns ± 0%     12.0ns ± 0%   -11.11%  (p=0.008 n=5+5)
      AddVV/3-8                           15.5ns ± 0%     13.0ns ± 0%   -16.13%  (p=0.008 n=5+5)
      AddVV/4-8                           17.5ns ± 0%     13.5ns ± 0%   -22.86%  (p=0.008 n=5+5)
      AddVV/5-8                           19.5ns ± 0%     14.5ns ± 0%   -25.64%  (p=0.008 n=5+5)
      AddVV/10-8                          29.5ns ± 0%     18.0ns ± 0%   -38.98%  (p=0.008 n=5+5)
      AddVV/100-8                          217ns ± 0%       94ns ± 0%   -56.64%  (p=0.008 n=5+5)
      AddVV/1000-8                        2.02µs ± 0%     1.03µs ± 0%   -48.85%  (p=0.008 n=5+5)
      AddVV/10000-8                       20.5µs ± 0%     11.3µs ± 0%   -44.70%  (p=0.008 n=5+5)
      AddVV/100000-8                       247µs ± 3%      154µs ± 0%   -37.52%  (p=0.008 n=5+5)
      SubVV/1-8                           21.5ns ± 0%     11.5ns ± 0%      ~     (p=0.079 n=4+5)
      SubVV/2-8                           13.5ns ± 0%     12.0ns ± 0%   -11.11%  (p=0.008 n=5+5)
      SubVV/3-8                           15.5ns ± 0%     13.0ns ± 0%   -16.13%  (p=0.008 n=5+5)
      SubVV/4-8                           17.5ns ± 0%     13.5ns ± 0%   -22.86%  (p=0.008 n=5+5)
      SubVV/5-8                           19.5ns ± 0%     14.5ns ± 0%   -25.64%  (p=0.008 n=5+5)
      SubVV/10-8                          29.5ns ± 0%     18.0ns ± 0%   -38.98%  (p=0.008 n=5+5)
      SubVV/100-8                          217ns ± 0%       94ns ± 0%   -56.64%  (p=0.008 n=5+5)
      SubVV/1000-8                        2.02µs ± 0%     0.80µs ± 0%   -60.50%  (p=0.008 n=5+5)
      SubVV/10000-8                       20.5µs ± 0%     11.3µs ± 0%   -44.99%  (p=0.008 n=5+5)
      SubVV/100000-8                       221µs ±11%      223µs ±16%      ~     (p=0.690 n=5+5)
      AddVW/1-8                           9.32ns ± 0%     9.32ns ± 0%      ~     (all equal)
      AddVW/2-8                           19.7ns ± 1%     19.7ns ± 0%      ~     (p=0.381 n=5+4)
      AddVW/3-8                           11.5ns ± 0%     11.5ns ± 0%      ~     (all equal)
      AddVW/4-8                           13.0ns ± 0%     13.0ns ± 0%      ~     (all equal)
      AddVW/5-8                           14.5ns ± 0%     14.5ns ± 0%      ~     (all equal)
      AddVW/10-8                          22.0ns ± 0%     22.0ns ± 0%      ~     (all equal)
      AddVW/100-8                          167ns ± 0%      167ns ± 0%      ~     (all equal)
      AddVW/1000-8                        1.52µs ± 0%     1.52µs ± 0%    +0.40%  (p=0.008 n=5+5)
      AddVW/10000-8                       15.1µs ± 0%     15.1µs ± 0%      ~     (p=0.556 n=5+4)
      AddVW/100000-8                       152µs ± 1%      152µs ± 1%      ~     (p=0.690 n=5+5)
      AddMulVVW/1-8                       33.3ns ± 0%     32.7ns ± 1%    -1.86%  (p=0.008 n=5+5)
      AddMulVVW/2-8                       59.3ns ± 1%     56.9ns ± 1%    -4.15%  (p=0.008 n=5+5)
      AddMulVVW/3-8                       80.5ns ± 1%     85.4ns ± 3%    +6.19%  (p=0.008 n=5+5)
      AddMulVVW/4-8                        127ns ± 0%      111ns ± 1%   -13.19%  (p=0.008 n=5+5)
      AddMulVVW/5-8                        144ns ± 0%      149ns ± 0%    +3.47%  (p=0.016 n=4+5)
      AddMulVVW/10-8                       298ns ± 1%      283ns ± 0%    -4.77%  (p=0.008 n=5+5)
      AddMulVVW/100-8                     3.06µs ± 0%     2.99µs ± 0%    -2.21%  (p=0.008 n=5+5)
      AddMulVVW/1000-8                    31.3µs ± 0%     26.9µs ± 0%   -14.17%  (p=0.008 n=5+5)
      AddMulVVW/10000-8                    316µs ± 0%      305µs ± 0%    -3.51%  (p=0.008 n=5+5)
      AddMulVVW/100000-8                  3.17ms ± 0%     3.17ms ± 1%      ~     (p=0.690 n=5+5)
      DecimalConversion-8                  316µs ± 1%      313µs ± 2%      ~     (p=0.095 n=5+5)
      FloatString/100-8                   2.53µs ± 1%     2.56µs ± 2%      ~     (p=0.222 n=5+5)
      FloatString/1000-8                  58.4µs ± 0%     58.5µs ± 0%      ~     (p=0.206 n=5+5)
      FloatString/10000-8                 4.59ms ± 0%     4.58ms ± 0%    -0.31%  (p=0.008 n=5+5)
      FloatString/100000-8                 446ms ± 0%      444ms ± 0%    -0.31%  (p=0.008 n=5+5)
      FloatAdd/10-8                        184ns ± 0%      172ns ± 0%    -6.30%  (p=0.008 n=5+5)
      FloatAdd/100-8                       189ns ± 2%      191ns ± 4%      ~     (p=0.381 n=5+5)
      FloatAdd/1000-8                      371ns ± 0%      347ns ± 1%    -6.42%  (p=0.008 n=5+5)
      FloatAdd/10000-8                    1.87µs ± 0%     1.68µs ± 0%   -10.16%  (p=0.008 n=5+5)
      FloatAdd/100000-8                   17.1µs ± 0%     15.6µs ± 0%    -8.74%  (p=0.016 n=5+4)
      FloatSub/10-8                        152ns ± 0%      138ns ± 0%    -9.47%  (p=0.000 n=4+5)
      FloatSub/100-8                       148ns ± 0%      142ns ± 0%    -4.05%  (p=0.000 n=5+4)
      FloatSub/1000-8                      245ns ± 1%      217ns ± 0%   -11.28%  (p=0.000 n=5+4)
      FloatSub/10000-8                    1.07µs ± 0%     0.88µs ± 1%   -18.14%  (p=0.008 n=5+5)
      FloatSub/100000-8                   9.58µs ± 0%     7.96µs ± 0%   -16.84%  (p=0.008 n=5+5)
      ParseFloatSmallExp-8                28.8µs ± 1%     29.0µs ± 1%      ~     (p=0.095 n=5+5)
      ParseFloatLargeExp-8                 126µs ± 1%      126µs ± 1%      ~     (p=0.841 n=5+5)
      GCD10x10/WithoutXY-8                 277ns ± 2%      281ns ± 4%      ~     (p=0.746 n=5+5)
      GCD10x10/WithXY-8                   2.10µs ± 1%     2.12µs ± 3%      ~     (p=0.548 n=5+5)
      GCD10x100/WithoutXY-8                615ns ± 3%      607ns ± 2%      ~     (p=0.135 n=5+5)
      GCD10x100/WithXY-8                  3.50µs ± 2%     3.62µs ± 5%      ~     (p=0.151 n=5+5)
      GCD10x1000/WithoutXY-8              1.39µs ± 2%     1.39µs ± 3%      ~     (p=0.690 n=5+5)
      GCD10x1000/WithXY-8                 7.39µs ± 1%     7.34µs ± 2%      ~     (p=0.135 n=5+5)
      GCD10x10000/WithoutXY-8             8.66µs ± 1%     8.68µs ± 1%      ~     (p=0.421 n=5+5)
      GCD10x10000/WithXY-8                28.1µs ± 2%     27.0µs ± 2%    -3.81%  (p=0.008 n=5+5)
      GCD10x100000/WithoutXY-8            79.3µs ± 1%     79.3µs ± 1%      ~     (p=0.841 n=5+5)
      GCD10x100000/WithXY-8                238µs ± 0%      227µs ± 1%    -4.74%  (p=0.008 n=5+5)
      GCD100x100/WithoutXY-8              1.89µs ± 1%     1.88µs ± 2%      ~     (p=0.968 n=5+5)
      GCD100x100/WithXY-8                 26.7µs ± 1%     27.0µs ± 1%    +1.44%  (p=0.032 n=5+5)
      GCD100x1000/WithoutXY-8             4.48µs ± 1%     4.45µs ± 2%      ~     (p=0.341 n=5+5)
      GCD100x1000/WithXY-8                36.3µs ± 1%     35.1µs ± 1%    -3.27%  (p=0.008 n=5+5)
      GCD100x10000/WithoutXY-8            22.8µs ± 0%     22.7µs ± 1%      ~     (p=0.056 n=5+5)
      GCD100x10000/WithXY-8                145µs ± 1%      133µs ± 1%    -8.33%  (p=0.008 n=5+5)
      GCD100x100000/WithoutXY-8            198µs ± 0%      195µs ± 0%    -1.56%  (p=0.008 n=5+5)
      GCD100x100000/WithXY-8              1.11ms ± 0%     1.00ms ± 0%   -10.04%  (p=0.008 n=5+5)
      GCD1000x1000/WithoutXY-8            25.2µs ± 1%     24.8µs ± 1%    -1.63%  (p=0.016 n=5+5)
      GCD1000x1000/WithXY-8                513µs ± 0%      517µs ± 2%      ~     (p=0.421 n=5+5)
      GCD1000x10000/WithoutXY-8           57.0µs ± 0%     52.7µs ± 1%    -7.56%  (p=0.008 n=5+5)
      GCD1000x10000/WithXY-8              1.20ms ± 0%     1.10ms ± 0%    -8.70%  (p=0.008 n=5+5)
      GCD1000x100000/WithoutXY-8           358µs ± 0%      318µs ± 1%   -11.03%  (p=0.008 n=5+5)
      GCD1000x100000/WithXY-8             8.71ms ± 0%     7.65ms ± 0%   -12.19%  (p=0.008 n=5+5)
      GCD10000x10000/WithoutXY-8           690µs ± 0%      630µs ± 0%    -8.71%  (p=0.008 n=5+5)
      GCD10000x10000/WithXY-8             16.0ms ± 1%     14.9ms ± 0%    -6.85%  (p=0.008 n=5+5)
      GCD10000x100000/WithoutXY-8         2.09ms ± 0%     1.75ms ± 0%   -16.09%  (p=0.016 n=5+4)
      GCD10000x100000/WithXY-8            86.8ms ± 0%     76.3ms ± 0%   -12.09%  (p=0.008 n=5+5)
      GCD100000x100000/WithoutXY-8        51.1ms ± 0%     46.0ms ± 0%    -9.97%  (p=0.008 n=5+5)
      GCD100000x100000/WithXY-8            1.25s ± 0%      1.15s ± 0%    -7.92%  (p=0.008 n=5+5)
      Hilbert-8                           2.45ms ± 1%     2.49ms ± 1%    +1.99%  (p=0.008 n=5+5)
      Binomial-8                          4.98µs ± 3%     4.90µs ± 2%      ~     (p=0.421 n=5+5)
      QuoRem-8                            7.10µs ± 0%     6.21µs ± 0%   -12.55%  (p=0.016 n=5+4)
      Exp-8                                161ms ± 0%      161ms ± 0%      ~     (p=0.421 n=5+5)
      Exp2-8                               161ms ± 0%      161ms ± 0%      ~     (p=0.151 n=5+5)
      Bitset-8                            40.4ns ± 0%     40.3ns ± 0%      ~     (p=0.190 n=5+5)
      BitsetNeg-8                          163ns ± 3%      137ns ± 2%   -15.91%  (p=0.008 n=5+5)
      BitsetOrig-8                         377ns ± 1%      372ns ± 1%    -1.22%  (p=0.024 n=5+5)
      BitsetNegOrig-8                      631ns ± 1%      605ns ± 1%    -4.09%  (p=0.008 n=5+5)
      ModSqrt225_Tonelli-8                7.26ms ± 0%     7.26ms ± 0%      ~     (p=0.548 n=5+5)
      ModSqrt224_3Mod4-8                  2.24ms ± 0%     2.24ms ± 0%      ~     (p=1.000 n=5+5)
      ModSqrt5430_Tonelli-8                62.4s ± 0%      62.4s ± 0%      ~     (p=0.841 n=5+5)
      ModSqrt5430_3Mod4-8                  20.8s ± 0%      20.7s ± 0%      ~     (p=0.056 n=5+5)
      Sqrt-8                               101µs ± 0%       89µs ± 0%   -12.17%  (p=0.008 n=5+5)
      IntSqr/1-8                          32.5ns ± 1%     32.7ns ± 1%      ~     (p=0.056 n=5+5)
      IntSqr/2-8                           160ns ± 5%      158ns ± 0%      ~     (p=0.397 n=5+4)
      IntSqr/3-8                           298ns ± 4%      296ns ± 4%      ~     (p=0.667 n=5+5)
      IntSqr/5-8                           737ns ± 5%      761ns ± 3%    +3.34%  (p=0.016 n=5+5)
      IntSqr/8-8                          1.87µs ± 4%     1.90µs ± 3%      ~     (p=0.222 n=5+5)
      IntSqr/10-8                         2.96µs ± 4%     2.92µs ± 6%      ~     (p=0.310 n=5+5)
      IntSqr/20-8                         6.28µs ± 3%     6.21µs ± 2%      ~     (p=0.310 n=5+5)
      IntSqr/30-8                         14.0µs ± 2%     13.9µs ± 2%      ~     (p=0.548 n=5+5)
      IntSqr/50-8                         37.7µs ± 3%     38.3µs ± 2%      ~     (p=0.095 n=5+5)
      IntSqr/80-8                         95.9µs ± 2%     95.1µs ± 1%      ~     (p=0.310 n=5+5)
      IntSqr/100-8                         148µs ± 1%      148µs ± 1%      ~     (p=0.841 n=5+5)
      IntSqr/200-8                         586µs ± 1%      587µs ± 1%      ~     (p=1.000 n=5+5)
      IntSqr/300-8                        1.32ms ± 0%     1.31ms ± 1%    -0.73%  (p=0.032 n=5+5)
      IntSqr/500-8                        2.48ms ± 0%     2.45ms ± 0%    -1.15%  (p=0.008 n=5+5)
      IntSqr/800-8                        4.68ms ± 0%     4.62ms ± 0%    -1.23%  (p=0.008 n=5+5)
      IntSqr/1000-8                       7.57ms ± 0%     7.50ms ± 0%    -0.84%  (p=0.008 n=5+5)
      Mul-8                                311ms ± 0%      308ms ± 0%    -0.81%  (p=0.008 n=5+5)
      Exp3Power/0x10-8                     574ns ± 1%      578ns ± 2%      ~     (p=0.500 n=5+5)
      Exp3Power/0x40-8                     640ns ± 1%      646ns ± 0%      ~     (p=0.056 n=5+5)
      Exp3Power/0x100-8                   1.42µs ± 1%     1.42µs ± 1%      ~     (p=0.246 n=5+5)
      Exp3Power/0x400-8                   8.30µs ± 1%     8.29µs ± 1%      ~     (p=0.802 n=5+5)
      Exp3Power/0x1000-8                  60.0µs ± 0%     59.9µs ± 0%    -0.24%  (p=0.016 n=5+5)
      Exp3Power/0x4000-8                   817µs ± 0%      816µs ± 0%    -0.17%  (p=0.008 n=5+5)
      Exp3Power/0x10000-8                 7.80ms ± 1%     7.70ms ± 0%    -1.23%  (p=0.008 n=5+5)
      Exp3Power/0x40000-8                 73.4ms ± 0%     72.5ms ± 0%    -1.28%  (p=0.008 n=5+5)
      Exp3Power/0x100000-8                 665ms ± 0%      656ms ± 0%    -1.34%  (p=0.008 n=5+5)
      Exp3Power/0x400000-8                 5.99s ± 0%      5.90s ± 0%    -1.40%  (p=0.008 n=5+5)
      Fibo-8                               116ms ± 0%       50ms ± 0%   -57.09%  (p=0.008 n=5+5)
      NatSqr/1-8                           112ns ± 4%      112ns ± 2%      ~     (p=0.968 n=5+5)
      NatSqr/2-8                           251ns ± 2%      250ns ± 1%      ~     (p=0.571 n=5+5)
      NatSqr/3-8                           378ns ± 2%      379ns ± 2%      ~     (p=0.794 n=5+5)
      NatSqr/5-8                           829ns ± 3%      827ns ± 2%      ~     (p=1.000 n=5+5)
      NatSqr/8-8                          1.97µs ± 2%     1.95µs ± 2%      ~     (p=0.310 n=5+5)
      NatSqr/10-8                         3.02µs ± 2%     2.99µs ± 2%      ~     (p=0.421 n=5+5)
      NatSqr/20-8                         6.51µs ± 2%     6.49µs ± 1%      ~     (p=0.841 n=5+5)
      NatSqr/30-8                         14.1µs ± 2%     14.0µs ± 2%      ~     (p=0.841 n=5+5)
      NatSqr/50-8                         38.1µs ± 2%     38.3µs ± 3%      ~     (p=0.690 n=5+5)
      NatSqr/80-8                         95.5µs ± 2%     96.0µs ± 1%      ~     (p=0.421 n=5+5)
      NatSqr/100-8                         150µs ± 1%      148µs ± 2%      ~     (p=0.095 n=5+5)
      NatSqr/200-8                         588µs ± 1%      590µs ± 1%      ~     (p=0.421 n=5+5)
      NatSqr/300-8                        1.32ms ± 1%     1.31ms ± 1%      ~     (p=0.841 n=5+5)
      NatSqr/500-8                        2.50ms ± 0%     2.47ms ± 0%    -1.03%  (p=0.008 n=5+5)
      NatSqr/800-8                        4.70ms ± 0%     4.64ms ± 0%    -1.31%  (p=0.008 n=5+5)
      NatSqr/1000-8                       7.60ms ± 0%     7.52ms ± 0%    -1.01%  (p=0.008 n=5+5)
      ScanPi-8                             326µs ± 0%      326µs ± 0%      ~     (p=0.841 n=5+5)
      StringPiParallel-8                  70.3µs ± 5%     63.8µs ±10%      ~     (p=0.056 n=5+5)
      Scan/10/Base2-8                     1.09µs ± 0%     1.09µs ± 0%      ~     (p=0.317 n=5+5)
      Scan/100/Base2-8                    7.79µs ± 0%     7.78µs ± 0%      ~     (p=0.063 n=5+5)
      Scan/1000/Base2-8                   79.0µs ± 0%     78.9µs ± 0%    -0.18%  (p=0.008 n=5+5)
      Scan/10000/Base2-8                  1.22ms ± 0%     1.22ms ± 0%    -0.15%  (p=0.008 n=5+5)
      Scan/100000/Base2-8                 55.1ms ± 0%     55.2ms ± 0%    +0.20%  (p=0.008 n=5+5)
      Scan/10/Base8-8                      512ns ± 0%      512ns ± 1%      ~     (p=0.810 n=5+5)
      Scan/100/Base8-8                    2.89µs ± 0%     2.89µs ± 0%      ~     (p=0.810 n=5+5)
      Scan/1000/Base8-8                   31.0µs ± 0%     31.0µs ± 0%      ~     (p=0.151 n=5+5)
      Scan/10000/Base8-8                   740µs ± 0%      741µs ± 0%    +0.10%  (p=0.008 n=5+5)
      Scan/100000/Base8-8                 50.6ms ± 0%     50.6ms ± 0%    +0.08%  (p=0.008 n=5+5)
      Scan/10/Base10-8                     487ns ± 0%      487ns ± 0%      ~     (p=0.571 n=5+5)
      Scan/100/Base10-8                   2.67µs ± 0%     2.67µs ± 0%      ~     (p=0.810 n=5+5)
      Scan/1000/Base10-8                  28.7µs ± 0%     28.7µs ± 0%    +0.06%  (p=0.008 n=5+5)
      Scan/10000/Base10-8                  716µs ± 0%      717µs ± 0%      ~     (p=0.222 n=5+5)
      Scan/100000/Base10-8                50.3ms ± 0%     50.3ms ± 0%    +0.10%  (p=0.008 n=5+5)
      Scan/10/Base16-8                     438ns ± 0%      437ns ± 1%      ~     (p=0.786 n=5+5)
      Scan/100/Base16-8                   2.47µs ± 0%     2.47µs ± 0%    -0.19%  (p=0.048 n=5+5)
      Scan/1000/Base16-8                  27.2µs ± 0%     27.3µs ± 0%      ~     (p=0.087 n=5+5)
      Scan/10000/Base16-8                  722µs ± 0%      722µs ± 0%    +0.11%  (p=0.008 n=5+5)
      Scan/100000/Base16-8                52.6ms ± 0%     52.7ms ± 0%    +0.15%  (p=0.008 n=5+5)
      String/10/Base2-8                    247ns ± 2%      248ns ± 1%      ~     (p=0.437 n=5+5)
      String/100/Base2-8                  1.51µs ± 0%     1.51µs ± 0%    -0.37%  (p=0.024 n=5+5)
      String/1000/Base2-8                 13.6µs ± 1%     13.5µs ± 0%      ~     (p=0.095 n=5+5)
      String/10000/Base2-8                 135µs ± 0%      135µs ± 1%      ~     (p=0.841 n=5+5)
      String/100000/Base2-8               1.32ms ± 1%     1.32ms ± 1%      ~     (p=0.690 n=5+5)
      String/10/Base8-8                    169ns ± 1%      169ns ± 1%      ~     (p=1.000 n=5+5)
      String/100/Base8-8                   636ns ± 0%      634ns ± 1%      ~     (p=0.413 n=5+5)
      String/1000/Base8-8                 5.33µs ± 1%     5.32µs ± 0%      ~     (p=0.222 n=5+5)
      String/10000/Base8-8                50.9µs ± 1%     50.7µs ± 0%      ~     (p=0.151 n=5+5)
      String/100000/Base8-8                500µs ± 1%      497µs ± 0%      ~     (p=0.421 n=5+5)
      String/10/Base10-8                   516ns ± 1%      513ns ± 0%    -0.62%  (p=0.016 n=5+4)
      String/100/Base10-8                 1.97µs ± 0%     1.96µs ± 0%      ~     (p=0.667 n=4+5)
      String/1000/Base10-8                12.5µs ± 0%     11.5µs ± 0%    -7.92%  (p=0.008 n=5+5)
      String/10000/Base10-8               57.7µs ± 0%     52.5µs ± 0%    -8.93%  (p=0.008 n=5+5)
      String/100000/Base10-8              25.6ms ± 0%     21.6ms ± 0%   -15.94%  (p=0.008 n=5+5)
      String/10/Base16-8                   150ns ± 1%      149ns ± 0%      ~     (p=0.413 n=5+4)
      String/100/Base16-8                  514ns ± 1%      514ns ± 1%      ~     (p=0.849 n=5+5)
      String/1000/Base16-8                4.01µs ± 0%     4.01µs ± 0%      ~     (p=0.421 n=5+5)
      String/10000/Base16-8               37.8µs ± 1%     37.8µs ± 1%      ~     (p=0.841 n=5+5)
      String/100000/Base16-8               373µs ± 2%      373µs ± 0%      ~     (p=0.421 n=5+5)
      LeafSize/0-8                        6.63ms ± 0%     6.63ms ± 0%      ~     (p=0.730 n=4+5)
      LeafSize/1-8                        74.0µs ± 0%     67.7µs ± 1%    -8.53%  (p=0.008 n=5+5)
      LeafSize/2-8                        74.2µs ± 0%     68.3µs ± 1%    -7.99%  (p=0.008 n=5+5)
      LeafSize/3-8                         379µs ± 0%      309µs ± 0%   -18.52%  (p=0.008 n=5+5)
      LeafSize/4-8                        72.7µs ± 1%     66.7µs ± 0%    -8.37%  (p=0.008 n=5+5)
      LeafSize/5-8                         471µs ± 0%      384µs ± 0%   -18.55%  (p=0.008 n=5+5)
      LeafSize/6-8                         378µs ± 0%      308µs ± 0%   -18.59%  (p=0.008 n=5+5)
      LeafSize/7-8                         245µs ± 0%      204µs ± 1%   -16.75%  (p=0.008 n=5+5)
      LeafSize/8-8                        73.4µs ± 0%     66.9µs ± 1%    -8.79%  (p=0.008 n=5+5)
      LeafSize/9-8                         538µs ± 0%      437µs ± 0%   -18.75%  (p=0.008 n=5+5)
      LeafSize/10-8                        472µs ± 0%      396µs ± 1%   -16.01%  (p=0.008 n=5+5)
      LeafSize/11-8                        460µs ± 0%      374µs ± 0%   -18.58%  (p=0.008 n=5+5)
      LeafSize/12-8                        378µs ± 0%      308µs ± 0%   -18.38%  (p=0.008 n=5+5)
      LeafSize/13-8                        343µs ± 0%      284µs ± 0%   -17.30%  (p=0.008 n=5+5)
      LeafSize/14-8                        248µs ± 0%      206µs ± 0%   -16.94%  (p=0.008 n=5+5)
      LeafSize/15-8                        169µs ± 0%      144µs ± 0%   -14.69%  (p=0.008 n=5+5)
      LeafSize/16-8                       72.9µs ± 0%     66.8µs ± 1%    -8.27%  (p=0.008 n=5+5)
      LeafSize/32-8                       82.5µs ± 0%     76.7µs ± 0%    -7.04%  (p=0.008 n=5+5)
      LeafSize/64-8                        134µs ± 0%      129µs ± 0%    -3.80%  (p=0.008 n=5+5)
      ProbablyPrime/n=0-8                 44.2ms ± 0%     43.4ms ± 0%    -1.95%  (p=0.008 n=5+5)
      ProbablyPrime/n=1-8                 64.9ms ± 0%     64.0ms ± 0%    -1.27%  (p=0.008 n=5+5)
      ProbablyPrime/n=5-8                  147ms ± 0%      146ms ± 0%    -0.58%  (p=0.008 n=5+5)
      ProbablyPrime/n=10-8                 250ms ± 0%      249ms ± 0%    -0.35%  (p=0.008 n=5+5)
      ProbablyPrime/n=20-8                 456ms ± 0%      455ms ± 0%    -0.18%  (p=0.008 n=5+5)
      ProbablyPrime/Lucas-8               23.6ms ± 0%     22.7ms ± 0%    -3.74%  (p=0.008 n=5+5)
      ProbablyPrime/MillerRabinBase2-8    20.7ms ± 0%     20.6ms ± 0%      ~     (p=0.421 n=5+5)
      FloatSqrt/64-8                      2.25µs ± 1%     2.29µs ± 0%    +1.48%  (p=0.008 n=5+5)
      FloatSqrt/128-8                     4.86µs ± 1%     4.92µs ± 1%    +1.21%  (p=0.032 n=5+5)
      FloatSqrt/256-8                     13.6µs ± 0%     13.7µs ± 1%    +1.31%  (p=0.032 n=5+5)
      FloatSqrt/1000-8                    70.0µs ± 1%     70.1µs ± 0%      ~     (p=0.690 n=5+5)
      FloatSqrt/10000-8                   1.92ms ± 0%     1.90ms ± 0%    -0.59%  (p=0.008 n=5+5)
      FloatSqrt/100000-8                  55.3ms ± 0%     54.8ms ± 0%    -1.01%  (p=0.008 n=5+5)
      FloatSqrt/1000000-8                  4.56s ± 0%      4.50s ± 0%    -1.28%  (p=0.008 n=5+5)
      
      name                              old speed      new speed       delta
      AddVV/1-8                         2.97GB/s ± 0%   5.56GB/s ± 0%   +86.85%  (p=0.008 n=5+5)
      AddVV/2-8                         9.47GB/s ± 0%  10.66GB/s ± 0%   +12.50%  (p=0.008 n=5+5)
      AddVV/3-8                         12.4GB/s ± 0%   14.7GB/s ± 0%   +19.10%  (p=0.008 n=5+5)
      AddVV/4-8                         14.6GB/s ± 0%   18.9GB/s ± 0%   +29.63%  (p=0.016 n=4+5)
      AddVV/5-8                         16.4GB/s ± 0%   22.0GB/s ± 0%   +34.47%  (p=0.016 n=5+4)
      AddVV/10-8                        21.7GB/s ± 0%   35.5GB/s ± 0%   +63.89%  (p=0.008 n=5+5)
      AddVV/100-8                       29.4GB/s ± 0%   68.0GB/s ± 0%  +131.38%  (p=0.008 n=5+5)
      AddVV/1000-8                      31.7GB/s ± 0%   61.9GB/s ± 0%   +95.43%  (p=0.008 n=5+5)
      AddVV/10000-8                     31.2GB/s ± 0%   56.4GB/s ± 0%   +80.83%  (p=0.008 n=5+5)
      AddVV/100000-8                    25.9GB/s ± 3%   41.4GB/s ± 0%   +59.98%  (p=0.008 n=5+5)
      SubVV/1-8                         2.97GB/s ± 0%   5.56GB/s ± 0%   +86.97%  (p=0.016 n=4+5)
      SubVV/2-8                         9.47GB/s ± 0%  10.66GB/s ± 0%   +12.51%  (p=0.008 n=5+5)
      SubVV/3-8                         12.4GB/s ± 0%   14.8GB/s ± 0%   +19.23%  (p=0.016 n=4+5)
      SubVV/4-8                         14.6GB/s ± 0%   18.9GB/s ± 0%   +29.56%  (p=0.008 n=5+5)
      SubVV/5-8                         16.4GB/s ± 0%   22.0GB/s ± 0%   +34.47%  (p=0.016 n=4+5)
      SubVV/10-8                        21.7GB/s ± 0%   35.5GB/s ± 0%   +63.89%  (p=0.008 n=5+5)
      SubVV/100-8                       29.4GB/s ± 0%   68.0GB/s ± 0%  +131.38%  (p=0.008 n=5+5)
      SubVV/1000-8                      31.6GB/s ± 0%   80.1GB/s ± 0%  +153.08%  (p=0.008 n=5+5)
      SubVV/10000-8                     31.2GB/s ± 0%   56.7GB/s ± 0%   +81.79%  (p=0.008 n=5+5)
      SubVV/100000-8                    29.1GB/s ±10%   29.0GB/s ±18%      ~     (p=0.690 n=5+5)
      AddVW/1-8                          859MB/s ± 0%    859MB/s ± 0%    -0.01%  (p=0.008 n=5+5)
      AddVW/2-8                          811MB/s ± 1%    814MB/s ± 0%      ~     (p=0.413 n=5+4)
      AddVW/3-8                         2.08GB/s ± 0%   2.08GB/s ± 0%      ~     (p=0.206 n=5+5)
      AddVW/4-8                         2.46GB/s ± 0%   2.46GB/s ± 0%      ~     (p=0.056 n=5+5)
      AddVW/5-8                         2.75GB/s ± 0%   2.75GB/s ± 0%      ~     (p=0.508 n=5+5)
      AddVW/10-8                        3.63GB/s ± 0%   3.63GB/s ± 0%      ~     (p=0.214 n=5+5)
      AddVW/100-8                       4.79GB/s ± 0%   4.79GB/s ± 0%      ~     (p=0.500 n=5+5)
      AddVW/1000-8                      5.27GB/s ± 0%   5.25GB/s ± 0%    -0.43%  (p=0.008 n=5+5)
      AddVW/10000-8                     5.30GB/s ± 0%   5.30GB/s ± 0%      ~     (p=0.397 n=5+5)
      AddVW/100000-8                    5.27GB/s ± 1%   5.25GB/s ± 1%      ~     (p=0.690 n=5+5)
      AddMulVVW/1-8                     1.92GB/s ± 0%   1.96GB/s ± 1%    +1.95%  (p=0.008 n=5+5)
      AddMulVVW/2-8                     2.16GB/s ± 1%   2.25GB/s ± 1%    +4.32%  (p=0.008 n=5+5)
      AddMulVVW/3-8                     2.39GB/s ± 1%   2.25GB/s ± 3%    -5.79%  (p=0.008 n=5+5)
      AddMulVVW/4-8                     2.00GB/s ± 0%   2.31GB/s ± 1%   +15.31%  (p=0.008 n=5+5)
      AddMulVVW/5-8                     2.22GB/s ± 0%   2.14GB/s ± 0%    -3.86%  (p=0.008 n=5+5)
      AddMulVVW/10-8                    2.15GB/s ± 1%   2.25GB/s ± 0%    +5.03%  (p=0.008 n=5+5)
      AddMulVVW/100-8                   2.09GB/s ± 0%   2.14GB/s ± 0%    +2.25%  (p=0.008 n=5+5)
      AddMulVVW/1000-8                  2.04GB/s ± 0%   2.38GB/s ± 0%   +16.52%  (p=0.008 n=5+5)
      AddMulVVW/10000-8                 2.03GB/s ± 0%   2.10GB/s ± 0%    +3.64%  (p=0.008 n=5+5)
      AddMulVVW/100000-8                2.02GB/s ± 0%   2.02GB/s ± 1%      ~     (p=0.690 n=5+5)
      
      Change-Id: Ie482d67a7dbb5af6f5d81af2b3d9d14bd66336db
      Reviewed-on: https://go-review.googlesource.com/77831
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      c4f3fe95
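The unrolling idea in the math/big commit above can be sketched in portable Go. This is not the arm64 assembly from the CL: it is a minimal illustration that assumes 64-bit words and equal-length slices, uses the hypothetical name `addVVUnrolled`, and shows only a 4x main loop with a scalar tail (the CL also describes a 2x step). In assembly, each pair of adjacent word loads or stores in the loop body could be served by a single LDP or STP.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVVUnrolled computes z = x + y word by word and returns the final carry.
// It assumes len(x) == len(y) == len(z). Hypothetical helper for illustration,
// not the math/big implementation.
func addVVUnrolled(z, x, y []uint64) (c uint64) {
	n := len(z)
	i := 0
	// Main loop: four words per iteration, carry threaded through each add.
	for ; i+4 <= n; i += 4 {
		z[i], c = bits.Add64(x[i], y[i], c)
		z[i+1], c = bits.Add64(x[i+1], y[i+1], c)
		z[i+2], c = bits.Add64(x[i+2], y[i+2], c)
		z[i+3], c = bits.Add64(x[i+3], y[i+3], c)
	}
	// Tail: remaining zero to three words, one at a time.
	for ; i < n; i++ {
		z[i], c = bits.Add64(x[i], y[i], c)
	}
	return c
}

func main() {
	x := []uint64{^uint64(0), ^uint64(0), 1, 2, 3}
	y := []uint64{1, 0, 0, 0, 0}
	z := make([]uint64, len(x))
	c := addVVUnrolled(z, x, y)
	fmt.Println(z, c)
}
```

Unrolling pays off here because the loop overhead and the per-word memory accesses are amortized over several additions, which matches the larger gains the benchmark table shows for longer vectors.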
  2. 05 Mar, 2018 10 commits
  3. 04 Mar, 2018 7 commits
  4. 03 Mar, 2018 7 commits
    • test: port a nil-check interface test from asm_test · 8ce74b7d
      Giovanni Bajo authored
      Change-Id: I69c1688506d1aeca655047acf35d1bff966fc01e
      Reviewed-on: https://go-review.googlesource.com/98442
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      8ce74b7d
    • test: use the version of Go used to run run.go · ec0b8c05
      Giovanni Bajo authored
      Currently, the top-level testsuite always uses whatever version
      of Go is found in the PATH to execute all the tests. This
      forces the developers to tweak the PATH to run the testsuite.
      
      Change it to use the same version of Go used to run run.go.
      This allows developers to run the testsuite using the tip
      compiler by simply saying "../bin/go run run.go".
      
      I think this is a better solution compared to always forcing
      "../bin/go", because it allows developers to run the testsuite
      using different Go versions, for instance to check if a new
      test is fixed in tip compared to the installed compiler.
      
      Fixes #24217
      
      Change-Id: I41b299c753b6e77c41e28be9091b2b630efea9d2
      Reviewed-on: https://go-review.googlesource.com/98439
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      ec0b8c05
    • encoding/json: apply conventional error handling in decoder · 74a92b8e
      Pascal S. de Kloe authored
      name                            old time/op    new time/op    delta
      CodeEncoder-12                    1.89ms ± 1%    1.91ms ± 0%   +1.16%  (p=0.000 n=20+19)
      CodeMarshal-12                    2.09ms ± 1%    2.12ms ± 0%   +1.63%  (p=0.000 n=17+18)
      CodeDecoder-12                    8.43ms ± 1%    8.32ms ± 1%   -1.35%  (p=0.000 n=18+20)
      UnicodeDecoder-12                  399ns ± 0%     339ns ± 0%  -15.00%  (p=0.000 n=20+19)
      DecoderStream-12                   281ns ± 1%     231ns ± 0%  -17.91%  (p=0.000 n=20+16)
      CodeUnmarshal-12                  9.35ms ± 2%    9.15ms ± 2%   -2.11%  (p=0.000 n=20+20)
      CodeUnmarshalReuse-12             8.41ms ± 2%    8.29ms ± 2%   -1.34%  (p=0.000 n=20+20)
      UnmarshalString-12                81.2ns ± 2%    74.0ns ± 4%   -8.89%  (p=0.000 n=20+20)
      UnmarshalFloat64-12               71.1ns ± 2%    64.3ns ± 1%   -9.60%  (p=0.000 n=20+19)
      UnmarshalInt64-12                 60.6ns ± 2%    53.2ns ± 0%  -12.28%  (p=0.000 n=18+18)
      Issue10335-12                     96.9ns ± 0%    87.7ns ± 1%   -9.52%  (p=0.000 n=17+20)
      Unmapped-12                        247ns ± 4%     231ns ± 3%   -6.34%  (p=0.000 n=20+20)
      TypeFieldsCache/MissTypes1-12     11.1µs ± 0%    11.1µs ± 0%     ~     (p=0.376 n=19+20)
      TypeFieldsCache/MissTypes10-12    33.9µs ± 0%    33.8µs ± 0%   -0.32%  (p=0.000 n=18+9)
      
      name                            old speed      new speed      delta
      CodeEncoder-12                  1.03GB/s ± 1%  1.01GB/s ± 0%   -1.15%  (p=0.000 n=20+19)
      CodeMarshal-12                   930MB/s ± 1%   915MB/s ± 0%   -1.60%  (p=0.000 n=17+18)
      CodeDecoder-12                   230MB/s ± 1%   233MB/s ± 1%   +1.37%  (p=0.000 n=18+20)
      UnicodeDecoder-12               35.0MB/s ± 0%  41.2MB/s ± 0%  +17.60%  (p=0.000 n=20+19)
      CodeUnmarshal-12                 208MB/s ± 2%   212MB/s ± 2%   +2.16%  (p=0.000 n=20+20)
      
      name                            old alloc/op   new alloc/op   delta
      Issue10335-12                       184B ± 0%      184B ± 0%     ~     (all equal)
      Unmapped-12                         216B ± 0%      216B ± 0%     ~     (all equal)
      
      name                            old allocs/op  new allocs/op  delta
      Issue10335-12                       3.00 ± 0%      3.00 ± 0%     ~     (all equal)
      Unmapped-12                         4.00 ± 0%      4.00 ± 0%     ~     (all equal)
      
      Change-Id: I4b1a87a205da2ef9a572f86f85bc833653c61570
      Reviewed-on: https://go-review.googlesource.com/98440
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      74a92b8e
    • runtime: use vDSO for clock_gettime on linux/arm · 51b02711
      Tobias Klauser authored
      Use the __vdso_clock_gettime fast path via the vDSO on linux/arm to
      speed up nanotime and walltime. This results in the following
      performance improvement for time.Now on a Raspberry Pi 3 (running
      32-bit Raspbian, i.e. GOOS=linux, GOARCH=arm):
      
      name     old time/op  new time/op  delta
      TimeNow  0.99µs ± 0%  0.39µs ± 1%  -60.74%  (p=0.000 n=12+20)
      
      Change-Id: I3598278a6c88d7f6a6ce66c56b9d25f9dd2f4c9a
      Reviewed-on: https://go-review.googlesource.com/98095
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      51b02711
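A measurement like the TimeNow figure in the vDSO commit above can be approximated with a small standalone program. This is a sketch, not the benchmark the author ran: the function name `benchmarkTimeNow` is hypothetical, and the numbers it prints depend entirely on the machine and Go version.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// benchmarkTimeNow calls time.Now in a loop so the per-call cost can be
// measured. Hypothetical name; the standard library has its own benchmark.
func benchmarkTimeNow(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = time.Now()
	}
}

func main() {
	// testing.Benchmark runs the function outside of "go test" and picks
	// b.N automatically.
	r := testing.Benchmark(benchmarkTimeNow)
	fmt.Println("TimeNow", r)
}
```

Running it with an old and a new toolchain on the same linux/arm board and comparing the reported ns/op would show whether the vDSO fast path is in effect.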
    • runtime: remove unused __vdso_time_sym · c69f60d0
      Tobias Klauser authored
      It's unused since https://golang.org/cl/99320043
      
      Change-Id: I74d69ff894aa2fb556f1c2083406c118c559d91b
      Reviewed-on: https://go-review.googlesource.com/98195
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      c69f60d0
    • internal/bytealg: move equal functions to bytealg · 1dfa380e
      Keith Randall authored
      Move bytes.Equal, runtime.memequal, and runtime.memequal_varlen
      to the bytealg package.
      
      Update #19792
      
      Change-Id: Ic4175e952936016ea0bda6c7c3dbb33afdc8e4ac
      Reviewed-on: https://go-review.googlesource.com/98355
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      1dfa380e
    • Joe Tsai's avatar
      encoding/json: use sync.Map for field cache · f0756ca2
      Joe Tsai authored
      The previous type cache is quadratic in time in the situation where
      new types are continually encountered. Now that it is possible to dynamically
      create new types with the reflect package, this can cause json to
      perform very poorly.
      
      Switch to sync.Map which does well when the cache has hit steady state,
      but also handles occasional updates in better than quadratic time.
      
      benchmark                                     old ns/op      new ns/op     delta
      BenchmarkTypeFieldsCache/MissTypes1-8         14817          16202         +9.35%
      BenchmarkTypeFieldsCache/MissTypes10-8        70926          69144         -2.51%
      BenchmarkTypeFieldsCache/MissTypes100-8       976467         208973        -78.60%
      BenchmarkTypeFieldsCache/MissTypes1000-8      79520162       1750371       -97.80%
      BenchmarkTypeFieldsCache/MissTypes10000-8     6873625837     16847806      -99.75%
      BenchmarkTypeFieldsCache/HitTypes1000-8       7.51           8.80          +17.18%
      BenchmarkTypeFieldsCache/HitTypes10000-8      7.58           8.68          +14.51%
      
      The old implementation takes 12 minutes just to build a cache of size 1e5
      due to the quadratic behavior. I did not bother benchmarking sizes above that.
      
      Change-Id: I5e6facc1eb8e1b80e5ca285e4dd2cc8815618dad
      Reviewed-on: https://go-review.googlesource.com/76850
      Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
      Reviewed-by: Bryan Mills <bcmills@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      f0756ca2
  5. 02 Mar, 2018 15 commits
    • Shamil Garatuev's avatar
      internal/syscall/windows/registry: improve ReadSubKeyNames permissions · e658b85f
      Shamil Garatuev authored
      Make ReadSubKeyNames work even if key is opened with only
      ENUMERATE_SUB_KEYS access rights mask.
      
      Fixes #23869
      
      Change-Id: I138bd51715fdbc3bda05607c64bde1150f4fe6b2
      Reviewed-on: https://go-review.googlesource.com/97435
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
      e658b85f
    • Keith Randall's avatar
      internal/bytealg: move IndexByte assembly to the new bytealg package · 403ab0f2
      Keith Randall authored
      Move the IndexByte function from the runtime to a new bytealg package.
      The new package will eventually hold all the optimized assembly for
      groveling through byte slices and strings. It seems a better home for
      this code than randomly keeping it in runtime.
      
      Once this is in, the next step is to move the other functions
      (Compare, Equal, ...).
      
      Update #19792
      
      This change seems complicated enough that we might just declare
      "not worth it" and abandon.  Opinions welcome.
      
      The core assembly is all unchanged, except minor modifications where
      the code reads cpu feature bits.
      
      The wrapper functions have been cleaned up as they are now actually
      checked by vet.
      
      Change-Id: I9fa75bee5d85db3a65b3fd3b7997e60367523796
      Reviewed-on: https://go-review.googlesource.com/98016
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      403ab0f2
    • Brad Fitzpatrick's avatar
      net: skip flaky TestLookupLongTXT for now · dcedcaa5
      Brad Fitzpatrick authored
      Flaky tests failing trybots help nobody.
      
      Updates #22857
      
      Change-Id: I87bc018651ab4fe02560a6d24c08a1d7ccd8ba37
      Reviewed-on: https://go-review.googlesource.com/97416
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      dcedcaa5
    • Damien Mathieu's avatar
      net/http: lock the read-only mutex in shouldRedirect · 2fd1b523
      Damien Mathieu authored
      Since that method uses 'mux.m', we need to lock the mutex to avoid data races.
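
The locking pattern in question can be sketched with a `sync.RWMutex` guarding a map. The `registry` type and its methods below are hypothetical stand-ins written for this illustration; they are not the net/http code.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a mux-like type whose map
// is read and written from several goroutines.
type registry struct {
	mu sync.RWMutex
	m  map[string]string
}

// add takes the write lock because it mutates the map.
func (r *registry) add(pattern, handler string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.m == nil {
		r.m = make(map[string]string)
	}
	r.m[pattern] = handler
}

// has only reads the map, so the read lock is enough; without it,
// a concurrent add would be a data race.
func (r *registry) has(pattern string) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	_, ok := r.m[pattern]
	return ok
}

func main() {
	r := &registry{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.add(fmt.Sprintf("/p%d", i), "h")
			_ = r.has("/p0")
		}(i)
	}
	wg.Wait()
	fmt.Println(r.has("/p3")) // prints true
}
```

Running a program like this with `go run -race` is a quick way to check that every access to the map is covered by the mutex.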
      
      Change-Id: I998448a6e482b5d6a1b24f3354bb824906e23172
      GitHub-Last-Rev: 163a7d4942e793b328e05a7eb91f6d3fdc4ba12b
      GitHub-Pull-Request: golang/go#23994
      Reviewed-on: https://go-review.googlesource.com/96575
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      2fd1b523
    • David du Colombier's avatar
      cmd/compile: skip TestEmptyDwarfRanges on Plan 9 · 1c9297c3
      David du Colombier authored
      TestEmptyDwarfRanges was added in CL 94816.
      This test is failing on Plan 9 because executables
      don't have a DWARF symbol table.
      
      Fixes #24226.
      
      Change-Id: Iff7e34b8c2703a2f19ee8087a4d64d0bb98496cd
      Reviewed-on: https://go-review.googlesource.com/98275
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      1c9297c3
    • Hana Kim's avatar
      internal/trace: Revert "remove backlinks from span/task end to start" · d3562c9d
      Hana Kim authored
      This reverts commit 16398894.
      This broke TestUserTaskSpan test.
      
      Change-Id: If5ff8bdfe84e8cb30787b03ead87205ece3d5601
      Reviewed-on: https://go-review.googlesource.com/98235
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      d3562c9d
    • Hana Kim's avatar
      internal/trace: remove backlinks from span/task end to start · 16398894
      Hana Kim authored
      Although it is undocumented, the assumption is that an Event's link field
      points to a later event. The new span/task event
      processing breaks that assumption.
      
      Change-Id: I4ce2f30c67c4f525ec0a121a7e43d8bdd2ec3f77
      Reviewed-on: https://go-review.googlesource.com/96395
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      16398894
    • Alberto Donizetti's avatar
      test/codegen: add copyright headers to new codegen files · 644b2daf
      Alberto Donizetti authored
      Change-Id: I9fe6572d1043ef9ee09c0925059ded554ad24c6b
      Reviewed-on: https://go-review.googlesource.com/98215
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      644b2daf
    • Michael Fraenkel's avatar
      cmd/compile: convert type during finishcompare · 5b071bfa
      Michael Fraenkel authored
      When recursively calling walkexpr, r.Type is still the untyped value.
      It then sometimes recursively calls finishcompare, which complains that
      you can't compare the resulting expression to that untyped value.
      
      Updates #23834.
      
      Change-Id: I6b7acd3970ceaff8da9216bfa0ae24aca5dee828
      Reviewed-on: https://go-review.googlesource.com/97856
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      5b071bfa
    • Than McIntosh's avatar
      cmd/compile: add DWARF register mappings for ARM64. · 9b95611e
      Than McIntosh authored
      Add DWARF register mappings for ARM64, so that that arch will become
      usable with "-dwarflocationlists". [NB: I've plugged in a set of
      numbers from the doc, but this will require additional manual testing.]
      
      Change-Id: Id9aa63857bc8b4f5c825f49274101cf372e9e856
      Reviewed-on: https://go-review.googlesource.com/82515
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      9b95611e
    • Alessandro Arzilli's avatar
      cmd/link: fix up debug_range for dsymutil (revert CL 72371) · eca41af0
      Alessandro Arzilli authored
      Dsymutil, a utility used on macOS when externally linking executables,
      does not support base address selector entries in debug_ranges.
      
      CL 73271 worked around this problem by removing base address selectors
      and emitting CU-relative relocations for each list entry.
      
      This commit, as an optimization, reintroduces the base address
      selectors and changes the linker to remove them again, but only when it
      knows that it will have to invoke the external linker on macOS.
      
      Compilecmp comparing master with a branch that has scope tracking
      always enabled:
      
      completed   15 of   15, estimated time remaining 0s (eta 2:43PM)
      name        old time/op       new time/op       delta
      Template          272ms ± 8%        257ms ± 5%  -5.33%  (p=0.000 n=15+14)
      Unicode           124ms ± 7%        122ms ± 5%    ~     (p=0.210 n=14+14)
      GoTypes           873ms ± 3%        870ms ± 5%    ~     (p=0.856 n=15+13)
      Compiler          4.49s ± 2%        4.49s ± 5%    ~     (p=0.982 n=14+14)
      SSA               11.8s ± 4%        11.8s ± 3%    ~     (p=0.653 n=15+15)
      Flate             163ms ± 6%        164ms ± 9%    ~     (p=0.914 n=14+15)
      GoParser          203ms ± 6%        202ms ±10%    ~     (p=0.571 n=14+14)
      Reflect           547ms ± 7%        542ms ± 4%    ~     (p=0.914 n=15+14)
      Tar               244ms ± 7%        237ms ± 3%  -2.80%  (p=0.002 n=14+13)
      XML               289ms ± 6%        289ms ± 5%    ~     (p=0.839 n=14+14)
      [Geo mean]        537ms             531ms       -1.10%
      
      name        old user-time/op  new user-time/op  delta
      Template          360ms ± 4%        341ms ± 7%  -5.16%  (p=0.000 n=14+14)
      Unicode           189ms ±11%        190ms ± 8%    ~     (p=0.844 n=15+15)
      GoTypes           1.13s ± 4%        1.14s ± 7%    ~     (p=0.582 n=15+14)
      Compiler          5.34s ± 2%        5.40s ± 4%  +1.19%  (p=0.036 n=11+13)
      SSA               14.7s ± 2%        14.7s ± 3%    ~     (p=0.602 n=15+15)
      Flate             211ms ± 7%        214ms ± 8%    ~     (p=0.252 n=14+14)
      GoParser          267ms ±12%        266ms ± 2%    ~     (p=0.837 n=15+11)
      Reflect           706ms ± 4%        701ms ± 3%    ~     (p=0.213 n=14+12)
      Tar               331ms ± 9%        320ms ± 5%  -3.30%  (p=0.025 n=15+14)
      XML               378ms ± 4%        373ms ± 6%    ~     (p=0.253 n=14+15)
      [Geo mean]        704ms             700ms       -0.58%
      
      name        old alloc/op      new alloc/op      delta
      Template         38.0MB ± 0%       38.4MB ± 0%  +1.12%  (p=0.000 n=15+15)
      Unicode          28.8MB ± 0%       28.8MB ± 0%  +0.17%  (p=0.000 n=15+15)
      GoTypes           112MB ± 0%        114MB ± 0%  +1.47%  (p=0.000 n=15+15)
      Compiler          465MB ± 0%        473MB ± 0%  +1.71%  (p=0.000 n=15+15)
      SSA              1.48GB ± 0%       1.53GB ± 0%  +3.07%  (p=0.000 n=15+15)
      Flate            24.3MB ± 0%       24.7MB ± 0%  +1.67%  (p=0.000 n=15+15)
      GoParser         30.7MB ± 0%       31.0MB ± 0%  +1.15%  (p=0.000 n=12+15)
      Reflect          76.3MB ± 0%       77.1MB ± 0%  +0.97%  (p=0.000 n=15+15)
      Tar              39.2MB ± 0%       39.6MB ± 0%  +0.91%  (p=0.000 n=15+15)
      XML              41.5MB ± 0%       42.0MB ± 0%  +1.29%  (p=0.000 n=15+15)
      [Geo mean]       77.5MB            78.6MB       +1.35%
      
      name        old allocs/op     new allocs/op     delta
      Template           385k ± 0%         387k ± 0%  +0.51%  (p=0.000 n=15+15)
      Unicode            342k ± 0%         343k ± 0%  +0.10%  (p=0.000 n=14+15)
      GoTypes           1.19M ± 0%        1.19M ± 0%  +0.62%  (p=0.000 n=15+15)
      Compiler          4.51M ± 0%        4.54M ± 0%  +0.50%  (p=0.000 n=14+15)
      SSA               12.2M ± 0%        12.4M ± 0%  +1.12%  (p=0.000 n=14+15)
      Flate              234k ± 0%         236k ± 0%  +0.60%  (p=0.000 n=15+15)
      GoParser           318k ± 0%         320k ± 0%  +0.60%  (p=0.000 n=15+15)
      Reflect            974k ± 0%         977k ± 0%  +0.27%  (p=0.000 n=15+15)
      Tar                395k ± 0%         397k ± 0%  +0.37%  (p=0.000 n=14+15)
      XML                404k ± 0%         407k ± 0%  +0.53%  (p=0.000 n=15+15)
      [Geo mean]         794k              798k       +0.52%
      
      name        old text-bytes    new text-bytes    delta
      HelloSize         680kB ± 0%        680kB ± 0%    ~     (all equal)
      
      name        old data-bytes    new data-bytes    delta
      HelloSize        9.62kB ± 0%       9.62kB ± 0%    ~     (all equal)
      
      name        old bss-bytes     new bss-bytes     delta
      HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize        1.11MB ± 0%       1.13MB ± 0%  +1.85%  (p=0.000 n=15+15)
      
      Change-Id: I61c98ba0340cb798034b2bb55e3ab3a58ac1cf23
      Reviewed-on: https://go-review.googlesource.com/98075
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      eca41af0
    • Heschi Kreinick's avatar
      cmd/compile/internal/ssa: batch up all zero-width instructions · 9dc351be
      Heschi Kreinick authored
      When generating location lists, batch up changes for all zero-width
      instructions, not just phis. This prevents the creation of location list
      entries that don't actually cover any instructions.
      
      This isn't perfect because of the caveats in the prior CL (Copy is
      zero-width sometimes) but in practice this seems to fix all of the empty
      lists in std.
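
The batching idea can be sketched in isolation. Everything below is a simplified, hypothetical model: `value`, `entry`, `buildEntries` and the op names are invented for this example and do not mirror the real cmd/compile/internal/ssa data structures.

```go
package main

import "fmt"

// value is a hypothetical, simplified stand-in for an SSA value.
type value struct {
	op        string
	zeroWidth bool   // true if the op never emits machine code
	change    string // variable-location change caused by this value, if any
}

// entry is one location-list entry: the changes that take effect
// starting at the instruction generated for startOp.
type entry struct {
	startOp string
	changes []string
}

// buildEntries holds back changes made by zero-width values until the
// next value that emits code, so that no entry covers an empty range.
// Changes left pending at the end of the input are dropped in this sketch.
func buildEntries(vals []value) []entry {
	var out []entry
	var pending []string
	for _, v := range vals {
		if v.change != "" {
			pending = append(pending, v.change)
		}
		if v.zeroWidth || len(pending) == 0 {
			continue // nothing to flush, or no instruction to attach it to
		}
		out = append(out, entry{startOp: v.op, changes: pending})
		pending = nil
	}
	return out
}

func main() {
	vals := []value{
		{op: "Phi", zeroWidth: true, change: "x in R1"},
		{op: "VarDef", zeroWidth: true, change: "y on stack"},
		{op: "ADDQ"},
		{op: "MOVQ", change: "x in R2"},
	}
	for _, e := range buildEntries(vals) {
		fmt.Println(e.startOp, e.changes)
	}
}
```

In this model the two zero-width values produce a single entry starting at the first real instruction, rather than two entries that cover no code.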
      
      Change-Id: Ice4a9ade36b6b24ca111d1494c414eec96e5af25
      Reviewed-on: https://go-review.googlesource.com/97958
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      9dc351be
    • Heschi Kreinick's avatar
      cmd/compile/internal/ssa: note zero-width Ops · caa1b4af
      Heschi Kreinick authored
      Add a bool to opInfo to indicate if an Op never results in any
      instructions. This is a conservative approximation: some operations,
      like Copy, may or may not generate code depending on their arguments.
      
      I built the list by reading each arch's ssaGenValue function. Hopefully
      I got them all.
      
      Change-Id: I130b251b65f18208294e129bb7ddc3f91d57d31d
      Reviewed-on: https://go-review.googlesource.com/97957
      Reviewed-by: Keith Randall <khr@golang.org>
      caa1b4af
    • Zhou Peng's avatar
      runtime: fix typo, func comments should start with function name · b77aad08
      Zhou Peng authored
      Change-Id: I289af4884583537639800e37928c22814d38cba9
      Reviewed-on: https://go-review.googlesource.com/98115
      Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
      b77aad08
    • Alessandro Arzilli's avatar
      cmd/compile: optimize scope tracking · 3fca7306
      Alessandro Arzilli authored
      1. Detect and remove the markers of lexical scopes that don't contain
      any variables early in noder, instead of waiting until the end of DWARF
      generation.
      This saves memory by never allocating some of the markers and optimizes
      some of the algorithms that depend on the number of scopes.
      
      2. Assign scopes to Progs by doing, for each Prog, a binary search over
      the markers array. This is faster than sorting the Prog list
      because there are fewer markers than there are Progs.
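
The lookup in point 2 can be sketched with `sort.Search`. The `marker` type, the integer positions and the `scopeAt` helper are assumptions made for this illustration; the compiler's actual types differ.

```go
package main

import (
	"fmt"
	"sort"
)

// marker says that code at position pos and later belongs to scope,
// until the next marker. Hypothetical type; markers must be sorted by pos.
type marker struct {
	pos   int
	scope int
}

// scopeAt returns the scope of the last marker at or before pos,
// or 0 (taken here to mean the outermost scope) if no marker precedes pos.
func scopeAt(markers []marker, pos int) int {
	// Binary search for the first marker strictly after pos.
	i := sort.Search(len(markers), func(i int) bool {
		return markers[i].pos > pos
	})
	if i == 0 {
		return 0
	}
	return markers[i-1].scope
}

func main() {
	markers := []marker{{pos: 10, scope: 1}, {pos: 20, scope: 2}, {pos: 30, scope: 1}}
	for _, p := range []int{5, 10, 25, 40} {
		fmt.Println(p, scopeAt(markers, p))
	}
}
```

Each lookup costs O(log m) for m markers, so assigning scopes to n instructions costs O(n log m) without reordering the instruction list.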
      
      completed   15 of   15, estimated time remaining 0s (eta 2:30PM)
      name        old time/op       new time/op       delta
      Template          274ms ± 5%        260ms ± 6%  -4.91%  (p=0.000 n=15+15)
      Unicode           126ms ± 5%        127ms ± 9%    ~     (p=0.856 n=13+15)
      GoTypes           861ms ± 5%        857ms ± 4%    ~     (p=0.595 n=15+15)
      Compiler          4.11s ± 4%        4.12s ± 5%    ~     (p=1.000 n=15+15)
      SSA               10.7s ± 2%        10.9s ± 4%  +2.01%  (p=0.002 n=14+14)
      Flate             163ms ± 4%        166ms ± 9%    ~     (p=0.134 n=14+15)
      GoParser          203ms ± 4%        205ms ± 6%    ~     (p=0.461 n=15+15)
      Reflect           544ms ± 5%        549ms ± 4%    ~     (p=0.174 n=15+15)
      Tar               249ms ± 9%        245ms ± 6%    ~     (p=0.285 n=15+15)
      XML               286ms ± 4%        291ms ± 5%    ~     (p=0.081 n=15+15)
      [Geo mean]        528ms             529ms       +0.14%
      
      name        old user-time/op  new user-time/op  delta
      Template          358ms ± 7%        354ms ± 5%    ~     (p=0.242 n=14+15)
      Unicode           189ms ±11%        191ms ±10%    ~     (p=0.438 n=15+15)
      GoTypes           1.15s ± 4%        1.14s ± 3%    ~     (p=0.405 n=15+15)
      Compiler          5.36s ± 6%        5.35s ± 5%    ~     (p=0.588 n=15+15)
      SSA               14.6s ± 3%        15.0s ± 4%  +2.58%  (p=0.000 n=15+15)
      Flate             214ms ±12%        216ms ± 8%    ~     (p=0.539 n=15+15)
      GoParser          267ms ± 6%        270ms ± 5%    ~     (p=0.569 n=15+15)
      Reflect           712ms ± 5%        709ms ± 4%    ~     (p=0.894 n=15+15)
      Tar               329ms ± 8%        330ms ± 5%    ~     (p=0.974 n=14+15)
      XML               371ms ± 3%        381ms ± 5%  +2.85%  (p=0.002 n=13+15)
      [Geo mean]        705ms             709ms       +0.62%
      
      name        old alloc/op      new alloc/op      delta
      Template         38.0MB ± 0%       38.4MB ± 0%  +1.27%  (p=0.000 n=15+14)
      Unicode          28.8MB ± 0%       28.8MB ± 0%  +0.16%  (p=0.000 n=15+14)
      GoTypes           112MB ± 0%        114MB ± 0%  +1.64%  (p=0.000 n=15+15)
      Compiler          465MB ± 0%        474MB ± 0%  +1.91%  (p=0.000 n=15+15)
      SSA              1.48GB ± 0%       1.53GB ± 0%  +3.32%  (p=0.000 n=15+15)
      Flate            24.3MB ± 0%       24.8MB ± 0%  +1.77%  (p=0.000 n=14+15)
      GoParser         30.7MB ± 0%       31.1MB ± 0%  +1.27%  (p=0.000 n=15+15)
      Reflect          76.3MB ± 0%       77.1MB ± 0%  +1.03%  (p=0.000 n=15+15)
      Tar              39.2MB ± 0%       39.6MB ± 0%  +1.02%  (p=0.000 n=13+15)
      XML              41.5MB ± 0%       42.1MB ± 0%  +1.45%  (p=0.000 n=15+15)
      [Geo mean]       77.5MB            78.7MB       +1.48%
      
      name        old allocs/op     new allocs/op     delta
      Template           385k ± 0%         387k ± 0%  +0.54%  (p=0.000 n=15+15)
      Unicode            342k ± 0%         343k ± 0%  +0.10%  (p=0.000 n=15+15)
      GoTypes           1.19M ± 0%        1.19M ± 0%  +0.64%  (p=0.000 n=14+15)
      Compiler          4.51M ± 0%        4.54M ± 0%  +0.53%  (p=0.000 n=15+15)
      SSA               12.2M ± 0%        12.4M ± 0%  +1.16%  (p=0.000 n=15+15)
      Flate              234k ± 0%         236k ± 0%  +0.63%  (p=0.000 n=14+15)
      GoParser           318k ± 0%         320k ± 0%  +0.63%  (p=0.000 n=15+15)
      Reflect            974k ± 0%         977k ± 0%  +0.28%  (p=0.000 n=15+15)
      Tar                395k ± 0%         397k ± 0%  +0.38%  (p=0.000 n=15+13)
      XML                404k ± 0%         407k ± 0%  +0.55%  (p=0.000 n=15+15)
      [Geo mean]         794k              799k       +0.55%
      
      name        old text-bytes    new text-bytes    delta
      HelloSize         680kB ± 0%        680kB ± 0%    ~     (all equal)
      
      name        old data-bytes    new data-bytes    delta
      HelloSize        9.62kB ± 0%       9.62kB ± 0%    ~     (all equal)
      
      name        old bss-bytes     new bss-bytes     delta
      HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize        1.11MB ± 0%       1.12MB ± 0%  +1.11%  (p=0.000 n=15+15)
      
      Change-Id: I95a0173ee28c52be1a4851d2a6e389529e74bf28
      Reviewed-on: https://go-review.googlesource.com/95396
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      3fca7306