- 06 Mar, 2018 2 commits
-
-
Fangming.Fang authored
This patch makes use of arm64 AES instructions to accelerate AES computation and only supports optimization on Linux for arm64 name old time/op new time/op delta Encrypt-32 255ns ± 0% 26ns ± 0% -89.73% Decrypt-32 256ns ± 0% 26ns ± 0% -89.77% Expand-32 990ns ± 5% 901ns ± 0% -9.05% name old speed new speed delta Encrypt-32 62.5MB/s ± 0% 610.4MB/s ± 0% +876.39% Decrypt-32 62.3MB/s ± 0% 610.2MB/s ± 0% +879.6% Fixes #18498 Change-Id: If416e5a151785325527b32ff72f6da3812493ed0 Reviewed-on: https://go-review.googlesource.com/64490 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
erifan01 authored
The biggest hot spot of the existing implementation is "load" operations, which lead to poor performance. By unrolling the cycle 4x and 2x, and using "LDP", "STP" instructions, this CL can reduce the "load" cost and improve performance. Benchmarks: name old time/op new time/op delta AddVV/1-8 21.5ns ± 0% 11.5ns ± 0% -46.51% (p=0.008 n=5+5) AddVV/2-8 13.5ns ± 0% 12.0ns ± 0% -11.11% (p=0.008 n=5+5) AddVV/3-8 15.5ns ± 0% 13.0ns ± 0% -16.13% (p=0.008 n=5+5) AddVV/4-8 17.5ns ± 0% 13.5ns ± 0% -22.86% (p=0.008 n=5+5) AddVV/5-8 19.5ns ± 0% 14.5ns ± 0% -25.64% (p=0.008 n=5+5) AddVV/10-8 29.5ns ± 0% 18.0ns ± 0% -38.98% (p=0.008 n=5+5) AddVV/100-8 217ns ± 0% 94ns ± 0% -56.64% (p=0.008 n=5+5) AddVV/1000-8 2.02µs ± 0% 1.03µs ± 0% -48.85% (p=0.008 n=5+5) AddVV/10000-8 20.5µs ± 0% 11.3µs ± 0% -44.70% (p=0.008 n=5+5) AddVV/100000-8 247µs ± 3% 154µs ± 0% -37.52% (p=0.008 n=5+5) SubVV/1-8 21.5ns ± 0% 11.5ns ± 0% ~ (p=0.079 n=4+5) SubVV/2-8 13.5ns ± 0% 12.0ns ± 0% -11.11% (p=0.008 n=5+5) SubVV/3-8 15.5ns ± 0% 13.0ns ± 0% -16.13% (p=0.008 n=5+5) SubVV/4-8 17.5ns ± 0% 13.5ns ± 0% -22.86% (p=0.008 n=5+5) SubVV/5-8 19.5ns ± 0% 14.5ns ± 0% -25.64% (p=0.008 n=5+5) SubVV/10-8 29.5ns ± 0% 18.0ns ± 0% -38.98% (p=0.008 n=5+5) SubVV/100-8 217ns ± 0% 94ns ± 0% -56.64% (p=0.008 n=5+5) SubVV/1000-8 2.02µs ± 0% 0.80µs ± 0% -60.50% (p=0.008 n=5+5) SubVV/10000-8 20.5µs ± 0% 11.3µs ± 0% -44.99% (p=0.008 n=5+5) SubVV/100000-8 221µs ±11% 223µs ±16% ~ (p=0.690 n=5+5) AddVW/1-8 9.32ns ± 0% 9.32ns ± 0% ~ (all equal) AddVW/2-8 19.7ns ± 1% 19.7ns ± 0% ~ (p=0.381 n=5+4) AddVW/3-8 11.5ns ± 0% 11.5ns ± 0% ~ (all equal) AddVW/4-8 13.0ns ± 0% 13.0ns ± 0% ~ (all equal) AddVW/5-8 14.5ns ± 0% 14.5ns ± 0% ~ (all equal) AddVW/10-8 22.0ns ± 0% 22.0ns ± 0% ~ (all equal) AddVW/100-8 167ns ± 0% 167ns ± 0% ~ (all equal) AddVW/1000-8 1.52µs ± 0% 1.52µs ± 0% +0.40% (p=0.008 n=5+5) AddVW/10000-8 15.1µs ± 0% 15.1µs ± 0% ~ (p=0.556 n=5+4) AddVW/100000-8 152µs ± 1% 152µs ± 1% ~ (p=0.690 n=5+5) AddMulVVW/1-8 33.3ns ± 0% 32.7ns ± 1% -1.86% (p=0.008 n=5+5) AddMulVVW/2-8 59.3ns ± 1% 56.9ns ± 1% -4.15% (p=0.008 n=5+5) AddMulVVW/3-8 80.5ns ± 1% 85.4ns ± 3% +6.19% (p=0.008 n=5+5) AddMulVVW/4-8 127ns ± 0% 111ns ± 1% -13.19% (p=0.008 n=5+5) AddMulVVW/5-8 144ns ± 0% 149ns ± 0% +3.47% (p=0.016 n=4+5) AddMulVVW/10-8 298ns ± 1% 283ns ± 0% -4.77% (p=0.008 n=5+5) AddMulVVW/100-8 3.06µs ± 0% 2.99µs ± 0% -2.21% (p=0.008 n=5+5) AddMulVVW/1000-8 31.3µs ± 0% 26.9µs ± 0% -14.17% (p=0.008 n=5+5) AddMulVVW/10000-8 316µs ± 0% 305µs ± 0% -3.51% (p=0.008 n=5+5) AddMulVVW/100000-8 3.17ms ± 0% 3.17ms ± 1% ~ (p=0.690 n=5+5) DecimalConversion-8 316µs ± 1% 313µs ± 2% ~ (p=0.095 n=5+5) FloatString/100-8 2.53µs ± 1% 2.56µs ± 2% ~ (p=0.222 n=5+5) FloatString/1000-8 58.4µs ± 0% 58.5µs ± 0% ~ (p=0.206 n=5+5) FloatString/10000-8 4.59ms ± 0% 4.58ms ± 0% -0.31% (p=0.008 n=5+5) FloatString/100000-8 446ms ± 0% 444ms ± 0% -0.31% (p=0.008 n=5+5) FloatAdd/10-8 184ns ± 0% 172ns ± 0% -6.30% (p=0.008 n=5+5) FloatAdd/100-8 189ns ± 2% 191ns ± 4% ~ (p=0.381 n=5+5) FloatAdd/1000-8 371ns ± 0% 347ns ± 1% -6.42% (p=0.008 n=5+5) FloatAdd/10000-8 1.87µs ± 0% 1.68µs ± 0% -10.16% (p=0.008 n=5+5) FloatAdd/100000-8 17.1µs ± 0% 15.6µs ± 0% -8.74% (p=0.016 n=5+4) FloatSub/10-8 152ns ± 0% 138ns ± 0% -9.47% (p=0.000 n=4+5) FloatSub/100-8 148ns ± 0% 142ns ± 0% -4.05% (p=0.000 n=5+4) FloatSub/1000-8 245ns ± 1% 217ns ± 0% -11.28% (p=0.000 n=5+4) FloatSub/10000-8 1.07µs ± 0% 0.88µs ± 1% -18.14% (p=0.008 n=5+5) FloatSub/100000-8 9.58µs ± 0% 7.96µs ± 0% -16.84% (p=0.008 n=5+5) ParseFloatSmallExp-8 28.8µs ± 1% 29.0µs ± 1% ~ (p=0.095 n=5+5) ParseFloatLargeExp-8 126µs ± 1% 126µs ± 1% ~ (p=0.841 n=5+5) GCD10x10/WithoutXY-8 277ns ± 2% 281ns ± 4% ~ (p=0.746 n=5+5) GCD10x10/WithXY-8 2.10µs ± 1% 2.12µs ± 3% ~ (p=0.548 n=5+5) GCD10x100/WithoutXY-8 615ns ± 3% 607ns ± 2% ~ (p=0.135 n=5+5) GCD10x100/WithXY-8 3.50µs ± 2% 3.62µs ± 5% ~ (p=0.151 n=5+5) GCD10x1000/WithoutXY-8 1.39µs ± 2% 1.39µs ± 3% ~ (p=0.690 n=5+5) GCD10x1000/WithXY-8 7.39µs ± 1% 7.34µs ± 2% ~ (p=0.135 n=5+5) GCD10x10000/WithoutXY-8 8.66µs ± 1% 8.68µs ± 1% ~ (p=0.421 n=5+5) GCD10x10000/WithXY-8 28.1µs ± 2% 27.0µs ± 2% -3.81% (p=0.008 n=5+5) GCD10x100000/WithoutXY-8 79.3µs ± 1% 79.3µs ± 1% ~ (p=0.841 n=5+5) GCD10x100000/WithXY-8 238µs ± 0% 227µs ± 1% -4.74% (p=0.008 n=5+5) GCD100x100/WithoutXY-8 1.89µs ± 1% 1.88µs ± 2% ~ (p=0.968 n=5+5) GCD100x100/WithXY-8 26.7µs ± 1% 27.0µs ± 1% +1.44% (p=0.032 n=5+5) GCD100x1000/WithoutXY-8 4.48µs ± 1% 4.45µs ± 2% ~ (p=0.341 n=5+5) GCD100x1000/WithXY-8 36.3µs ± 1% 35.1µs ± 1% -3.27% (p=0.008 n=5+5) GCD100x10000/WithoutXY-8 22.8µs ± 0% 22.7µs ± 1% ~ (p=0.056 n=5+5) GCD100x10000/WithXY-8 145µs ± 1% 133µs ± 1% -8.33% (p=0.008 n=5+5) GCD100x100000/WithoutXY-8 198µs ± 0% 195µs ± 0% -1.56% (p=0.008 n=5+5) GCD100x100000/WithXY-8 1.11ms ± 0% 1.00ms ± 0% -10.04% (p=0.008 n=5+5) GCD1000x1000/WithoutXY-8 25.2µs ± 1% 24.8µs ± 1% -1.63% (p=0.016 n=5+5) GCD1000x1000/WithXY-8 513µs ± 0% 517µs ± 2% ~ (p=0.421 n=5+5) GCD1000x10000/WithoutXY-8 57.0µs ± 0% 52.7µs ± 1% -7.56% (p=0.008 n=5+5) GCD1000x10000/WithXY-8 1.20ms ± 0% 1.10ms ± 0% -8.70% (p=0.008 n=5+5) GCD1000x100000/WithoutXY-8 358µs ± 0% 318µs ± 1% -11.03% (p=0.008 n=5+5) GCD1000x100000/WithXY-8 8.71ms ± 0% 7.65ms ± 0% -12.19% (p=0.008 n=5+5) GCD10000x10000/WithoutXY-8 690µs ± 0% 630µs ± 0% -8.71% (p=0.008 n=5+5) GCD10000x10000/WithXY-8 16.0ms ± 1% 14.9ms ± 0% -6.85% (p=0.008 n=5+5) GCD10000x100000/WithoutXY-8 2.09ms ± 0% 1.75ms ± 0% -16.09% (p=0.016 n=5+4) GCD10000x100000/WithXY-8 86.8ms ± 0% 76.3ms ± 0% -12.09% (p=0.008 n=5+5) GCD100000x100000/WithoutXY-8 51.1ms ± 0% 46.0ms ± 0% -9.97% (p=0.008 n=5+5) GCD100000x100000/WithXY-8 1.25s ± 0% 1.15s ± 0% -7.92% (p=0.008 n=5+5) Hilbert-8 2.45ms ± 1% 2.49ms ± 1% +1.99% (p=0.008 n=5+5) Binomial-8 4.98µs ± 3% 4.90µs ± 2% ~ (p=0.421 n=5+5) QuoRem-8 7.10µs ± 0% 6.21µs ± 0% -12.55% (p=0.016 n=5+4) Exp-8 161ms ± 0% 161ms ± 0% ~ (p=0.421 n=5+5) Exp2-8 161ms ± 0% 161ms ± 0% ~ (p=0.151 n=5+5) Bitset-8 40.4ns ± 0% 40.3ns ± 0% ~ (p=0.190 n=5+5) BitsetNeg-8 163ns ± 3% 137ns ± 2% -15.91% (p=0.008 n=5+5) BitsetOrig-8 377ns ± 1% 372ns ± 1% -1.22% (p=0.024 n=5+5) BitsetNegOrig-8 631ns ± 1% 605ns ± 1% -4.09% (p=0.008 n=5+5) ModSqrt225_Tonelli-8 7.26ms ± 0% 7.26ms ± 0% ~ (p=0.548 n=5+5) ModSqrt224_3Mod4-8 2.24ms ± 0% 2.24ms ± 0% ~ (p=1.000 n=5+5) ModSqrt5430_Tonelli-8 62.4s ± 0% 62.4s ± 0% ~ (p=0.841 n=5+5) ModSqrt5430_3Mod4-8 20.8s ± 0% 20.7s ± 0% ~ (p=0.056 n=5+5) Sqrt-8 101µs ± 0% 89µs ± 0% -12.17% (p=0.008 n=5+5) IntSqr/1-8 32.5ns ± 1% 32.7ns ± 1% ~ (p=0.056 n=5+5) IntSqr/2-8 160ns ± 5% 158ns ± 0% ~ (p=0.397 n=5+4) IntSqr/3-8 298ns ± 4% 296ns ± 4% ~ (p=0.667 n=5+5) IntSqr/5-8 737ns ± 5% 761ns ± 3% +3.34% (p=0.016 n=5+5) IntSqr/8-8 1.87µs ± 4% 1.90µs ± 3% ~ (p=0.222 n=5+5) IntSqr/10-8 2.96µs ± 4% 2.92µs ± 6% ~ (p=0.310 n=5+5) IntSqr/20-8 6.28µs ± 3% 6.21µs ± 2% ~ (p=0.310 n=5+5) IntSqr/30-8 14.0µs ± 2% 13.9µs ± 2% ~ (p=0.548 n=5+5) IntSqr/50-8 37.7µs ± 3% 38.3µs ± 2% ~ (p=0.095 n=5+5) IntSqr/80-8 95.9µs ± 2% 95.1µs ± 1% ~ (p=0.310 n=5+5) IntSqr/100-8 148µs ± 1% 148µs ± 1% ~ (p=0.841 n=5+5) IntSqr/200-8 586µs ± 1% 587µs ± 1% ~ (p=1.000 n=5+5) IntSqr/300-8 1.32ms ± 0% 1.31ms ± 1% -0.73% (p=0.032 n=5+5) IntSqr/500-8 2.48ms ± 0% 2.45ms ± 0% -1.15% (p=0.008 n=5+5) IntSqr/800-8 4.68ms ± 0% 4.62ms ± 0% -1.23% (p=0.008 n=5+5) IntSqr/1000-8 7.57ms ± 0% 7.50ms ± 0% -0.84% (p=0.008 n=5+5) Mul-8 311ms ± 0% 308ms ± 0% -0.81% (p=0.008 n=5+5) Exp3Power/0x10-8 574ns ± 1% 578ns ± 2% ~ (p=0.500 n=5+5) Exp3Power/0x40-8 640ns ± 1% 646ns ± 0% ~ (p=0.056 n=5+5) Exp3Power/0x100-8 1.42µs ± 1% 1.42µs ± 1% ~ (p=0.246 n=5+5) Exp3Power/0x400-8 8.30µs ± 1% 8.29µs ± 1% ~ (p=0.802 n=5+5) Exp3Power/0x1000-8 60.0µs ± 0% 59.9µs ± 0% -0.24% (p=0.016 n=5+5) Exp3Power/0x4000-8 817µs ± 0% 816µs ± 0% -0.17% (p=0.008 n=5+5) Exp3Power/0x10000-8 7.80ms ± 1% 7.70ms ± 0% -1.23% (p=0.008 n=5+5) Exp3Power/0x40000-8 73.4ms ± 0% 72.5ms ± 0% -1.28% (p=0.008 n=5+5) Exp3Power/0x100000-8 665ms ± 0% 656ms ± 0% -1.34% (p=0.008 n=5+5) Exp3Power/0x400000-8 5.99s ± 0% 5.90s ± 0% -1.40% (p=0.008 n=5+5) Fibo-8 116ms ± 0% 50ms ± 0% -57.09% (p=0.008 n=5+5) NatSqr/1-8 112ns ± 4% 112ns ± 2% ~ (p=0.968 n=5+5) NatSqr/2-8 251ns ± 2% 250ns ± 1% ~ (p=0.571 n=5+5) NatSqr/3-8 378ns ± 2% 379ns ± 2% ~ (p=0.794 n=5+5) NatSqr/5-8 829ns ± 3% 827ns ± 2% ~ (p=1.000 n=5+5) NatSqr/8-8 1.97µs ± 2% 1.95µs ± 2% ~ (p=0.310 n=5+5) NatSqr/10-8 3.02µs ± 2% 2.99µs ± 2% ~ (p=0.421 n=5+5) NatSqr/20-8 6.51µs ± 2% 6.49µs ± 1% ~ (p=0.841 n=5+5) NatSqr/30-8 14.1µs ± 2% 14.0µs ± 2% ~ (p=0.841 n=5+5) NatSqr/50-8 38.1µs ± 2% 38.3µs ± 3% ~ (p=0.690 n=5+5) NatSqr/80-8 95.5µs ± 2% 96.0µs ± 1% ~ (p=0.421 n=5+5) NatSqr/100-8 150µs ± 1% 148µs ± 2% ~ (p=0.095 n=5+5) NatSqr/200-8 588µs ± 1% 590µs ± 1% ~ (p=0.421 n=5+5) NatSqr/300-8 1.32ms ± 1% 1.31ms ± 1% ~ (p=0.841 n=5+5) NatSqr/500-8 2.50ms ± 0% 2.47ms ± 0% -1.03% (p=0.008 n=5+5) NatSqr/800-8 4.70ms ± 0% 4.64ms ± 0% -1.31% (p=0.008 n=5+5) NatSqr/1000-8 7.60ms ± 0% 7.52ms ± 0% -1.01% (p=0.008 n=5+5) ScanPi-8 326µs ± 0% 326µs ± 0% ~ (p=0.841 n=5+5) StringPiParallel-8 70.3µs ± 5% 63.8µs ±10% ~ (p=0.056 n=5+5) Scan/10/Base2-8 1.09µs ± 0% 1.09µs ± 0% ~ (p=0.317 n=5+5) Scan/100/Base2-8 7.79µs ± 0% 7.78µs ± 0% ~ (p=0.063 n=5+5) Scan/1000/Base2-8 79.0µs ± 0% 78.9µs ± 0% -0.18% (p=0.008 n=5+5) Scan/10000/Base2-8 1.22ms ± 0% 1.22ms ± 0% -0.15% (p=0.008 n=5+5) Scan/100000/Base2-8 55.1ms ± 0% 55.2ms ± 0% +0.20% (p=0.008 n=5+5) Scan/10/Base8-8 512ns ± 0% 512ns ± 1% ~ (p=0.810 n=5+5) Scan/100/Base8-8 2.89µs ± 0% 2.89µs ± 0% ~ (p=0.810 n=5+5) Scan/1000/Base8-8 31.0µs ± 0% 31.0µs ± 0% ~ (p=0.151 n=5+5) Scan/10000/Base8-8 740µs ± 0% 741µs ± 0% +0.10% (p=0.008 n=5+5) Scan/100000/Base8-8 50.6ms ± 0% 50.6ms ± 0% +0.08% (p=0.008 n=5+5) Scan/10/Base10-8 487ns ± 0% 487ns ± 0% ~ (p=0.571 n=5+5) Scan/100/Base10-8 2.67µs ± 0% 2.67µs ± 0% ~ (p=0.810 n=5+5) Scan/1000/Base10-8 28.7µs ± 0% 28.7µs ± 0% +0.06% (p=0.008 n=5+5) Scan/10000/Base10-8 716µs ± 0% 717µs ± 0% ~ (p=0.222 n=5+5) Scan/100000/Base10-8 50.3ms ± 0% 50.3ms ± 0% +0.10% (p=0.008 n=5+5) Scan/10/Base16-8 438ns ± 0% 437ns ± 1% ~ (p=0.786 n=5+5) Scan/100/Base16-8 2.47µs ± 0% 2.47µs ± 0% -0.19% (p=0.048 n=5+5) Scan/1000/Base16-8 27.2µs ± 0% 27.3µs ± 0% ~ (p=0.087 n=5+5) Scan/10000/Base16-8 722µs ± 0% 722µs ± 0% +0.11% (p=0.008 n=5+5) Scan/100000/Base16-8 52.6ms ± 0% 52.7ms ± 0% +0.15% (p=0.008 n=5+5) String/10/Base2-8 247ns ± 2% 248ns ± 1% ~ (p=0.437 n=5+5) String/100/Base2-8 1.51µs ± 0% 1.51µs ± 0% -0.37% (p=0.024 n=5+5) String/1000/Base2-8 13.6µs ± 1% 13.5µs ± 0% ~ (p=0.095 n=5+5) String/10000/Base2-8 135µs ± 0% 135µs ± 1% ~ (p=0.841 n=5+5) String/100000/Base2-8 1.32ms ± 1% 1.32ms ± 1% ~ (p=0.690 n=5+5) String/10/Base8-8 169ns ± 1% 169ns ± 1% ~ (p=1.000 n=5+5) String/100/Base8-8 636ns ± 0% 634ns ± 1% ~ (p=0.413 n=5+5) String/1000/Base8-8 5.33µs ± 1% 5.32µs ± 0% ~ (p=0.222 n=5+5) String/10000/Base8-8 50.9µs ± 1% 50.7µs ± 0% ~ (p=0.151 n=5+5) String/100000/Base8-8 500µs ± 1% 497µs ± 0% ~ (p=0.421 n=5+5) String/10/Base10-8 516ns ± 1% 513ns ± 0% -0.62% (p=0.016 n=5+4) String/100/Base10-8 1.97µs ± 0% 1.96µs ± 0% ~ (p=0.667 n=4+5) String/1000/Base10-8 12.5µs ± 0% 11.5µs ± 0% -7.92% (p=0.008 n=5+5) String/10000/Base10-8 57.7µs ± 0% 52.5µs ± 0% -8.93% (p=0.008 n=5+5) String/100000/Base10-8 25.6ms ± 0% 21.6ms ± 0% -15.94% (p=0.008 n=5+5) String/10/Base16-8 150ns ± 1% 149ns ± 0% ~ (p=0.413 n=5+4) String/100/Base16-8 514ns ± 1% 514ns ± 1% ~ (p=0.849 n=5+5) String/1000/Base16-8 4.01µs ± 0% 4.01µs ± 0% ~ (p=0.421 n=5+5) String/10000/Base16-8 37.8µs ± 1% 37.8µs ± 1% ~ (p=0.841 n=5+5) String/100000/Base16-8 373µs ± 2% 373µs ± 0% ~ (p=0.421 n=5+5) LeafSize/0-8 6.63ms ± 0% 6.63ms ± 0% ~ (p=0.730 n=4+5) LeafSize/1-8 74.0µs ± 0% 67.7µs ± 1% -8.53% (p=0.008 n=5+5) LeafSize/2-8 74.2µs ± 0% 68.3µs ± 1% -7.99% (p=0.008 n=5+5) LeafSize/3-8 379µs ± 0% 309µs ± 0% -18.52% (p=0.008 n=5+5) LeafSize/4-8 72.7µs ± 1% 66.7µs ± 0% -8.37% (p=0.008 n=5+5) LeafSize/5-8 471µs ± 0% 384µs ± 0% -18.55% (p=0.008 n=5+5) LeafSize/6-8 378µs ± 0% 308µs ± 0% -18.59% (p=0.008 n=5+5) LeafSize/7-8 245µs ± 0% 204µs ± 1% -16.75% (p=0.008 n=5+5) LeafSize/8-8 73.4µs ± 0% 66.9µs ± 1% -8.79% (p=0.008 n=5+5) LeafSize/9-8 538µs ± 0% 437µs ± 0% -18.75% (p=0.008 n=5+5) LeafSize/10-8 472µs ± 0% 396µs ± 1% -16.01% (p=0.008 n=5+5) LeafSize/11-8 460µs ± 0% 374µs ± 0% -18.58% (p=0.008 n=5+5) LeafSize/12-8 378µs ± 0% 308µs ± 0% -18.38% (p=0.008 n=5+5) LeafSize/13-8 343µs ± 0% 284µs ± 0% -17.30% (p=0.008 n=5+5) LeafSize/14-8 248µs ± 0% 206µs ± 0% -16.94% (p=0.008 n=5+5) LeafSize/15-8 169µs ± 0% 144µs ± 0% -14.69% (p=0.008 n=5+5) LeafSize/16-8 72.9µs ± 0% 66.8µs ± 1% -8.27% (p=0.008 n=5+5) LeafSize/32-8 82.5µs ± 0% 76.7µs ± 0% -7.04% (p=0.008 n=5+5) LeafSize/64-8 134µs ± 0% 129µs ± 0% -3.80% (p=0.008 n=5+5) ProbablyPrime/n=0-8 44.2ms ± 0% 43.4ms ± 0% -1.95% (p=0.008 n=5+5) ProbablyPrime/n=1-8 64.9ms ± 0% 64.0ms ± 0% -1.27% (p=0.008 n=5+5) ProbablyPrime/n=5-8 147ms ± 0% 146ms ± 0% -0.58% (p=0.008 n=5+5) ProbablyPrime/n=10-8 250ms ± 0% 249ms ± 0% -0.35% (p=0.008 n=5+5) ProbablyPrime/n=20-8 456ms ± 0% 455ms ± 0% -0.18% (p=0.008 n=5+5) ProbablyPrime/Lucas-8 23.6ms ± 0% 22.7ms ± 0% -3.74% (p=0.008 n=5+5) ProbablyPrime/MillerRabinBase2-8 20.7ms ± 0% 20.6ms ± 0% ~ (p=0.421 n=5+5) FloatSqrt/64-8 2.25µs ± 1% 2.29µs ± 0% +1.48% (p=0.008 n=5+5) FloatSqrt/128-8 4.86µs ± 1% 4.92µs ± 1% +1.21% (p=0.032 n=5+5) FloatSqrt/256-8 13.6µs ± 0% 13.7µs ± 1% +1.31% (p=0.032 n=5+5) FloatSqrt/1000-8 70.0µs ± 1% 70.1µs ± 0% ~ (p=0.690 n=5+5) FloatSqrt/10000-8 1.92ms ± 0% 1.90ms ± 0% -0.59% (p=0.008 n=5+5) FloatSqrt/100000-8 55.3ms ± 0% 54.8ms ± 0% -1.01% (p=0.008 n=5+5) FloatSqrt/1000000-8 4.56s ± 0% 4.50s ± 0% -1.28% (p=0.008 n=5+5) name old speed new speed delta AddVV/1-8 2.97GB/s ± 0% 5.56GB/s ± 0% +86.85% (p=0.008 n=5+5) AddVV/2-8 9.47GB/s ± 0% 10.66GB/s ± 0% +12.50% (p=0.008 n=5+5) AddVV/3-8 12.4GB/s ± 0% 14.7GB/s ± 0% +19.10% (p=0.008 n=5+5) AddVV/4-8 14.6GB/s ± 0% 18.9GB/s ± 0% +29.63% (p=0.016 n=4+5) AddVV/5-8 16.4GB/s ± 0% 22.0GB/s ± 0% +34.47% (p=0.016 n=5+4) AddVV/10-8 21.7GB/s ± 0% 35.5GB/s ± 0% +63.89% (p=0.008 n=5+5) AddVV/100-8 29.4GB/s ± 0% 68.0GB/s ± 0% +131.38% (p=0.008 n=5+5) AddVV/1000-8 31.7GB/s ± 0% 61.9GB/s ± 0% +95.43% (p=0.008 n=5+5) AddVV/10000-8 31.2GB/s ± 0% 56.4GB/s ± 0% +80.83% (p=0.008 n=5+5) AddVV/100000-8 25.9GB/s ± 3% 41.4GB/s ± 0% +59.98% (p=0.008 n=5+5) SubVV/1-8 2.97GB/s ± 0% 5.56GB/s ± 0% +86.97% (p=0.016 n=4+5) SubVV/2-8 9.47GB/s ± 0% 10.66GB/s ± 0% +12.51% (p=0.008 n=5+5) SubVV/3-8 12.4GB/s ± 0% 14.8GB/s ± 0% +19.23% (p=0.016 n=4+5) SubVV/4-8 14.6GB/s ± 0% 18.9GB/s ± 0% +29.56% (p=0.008 n=5+5) SubVV/5-8 16.4GB/s ± 0% 22.0GB/s ± 0% +34.47% (p=0.016 n=4+5) SubVV/10-8 21.7GB/s ± 0% 35.5GB/s ± 0% +63.89% (p=0.008 n=5+5) SubVV/100-8 29.4GB/s ± 0% 68.0GB/s ± 0% +131.38% (p=0.008 n=5+5) SubVV/1000-8 31.6GB/s ± 0% 80.1GB/s ± 0% +153.08% (p=0.008 n=5+5) SubVV/10000-8 31.2GB/s ± 0% 56.7GB/s ± 0% +81.79% (p=0.008 n=5+5) SubVV/100000-8 29.1GB/s ±10% 29.0GB/s ±18% ~ (p=0.690 n=5+5) AddVW/1-8 859MB/s ± 0% 859MB/s ± 0% -0.01% (p=0.008 n=5+5) AddVW/2-8 811MB/s ± 1% 814MB/s ± 0% ~ (p=0.413 n=5+4) AddVW/3-8 2.08GB/s ± 0% 2.08GB/s ± 0% ~ (p=0.206 n=5+5) AddVW/4-8 2.46GB/s ± 0% 2.46GB/s ± 0% ~ (p=0.056 n=5+5) AddVW/5-8 2.75GB/s ± 0% 2.75GB/s ± 0% ~ (p=0.508 n=5+5) AddVW/10-8 3.63GB/s ± 0% 3.63GB/s ± 0% ~ (p=0.214 n=5+5) AddVW/100-8 4.79GB/s ± 0% 4.79GB/s ± 0% ~ (p=0.500 n=5+5) AddVW/1000-8 5.27GB/s ± 0% 5.25GB/s ± 0% -0.43% (p=0.008 n=5+5) AddVW/10000-8 5.30GB/s ± 0% 5.30GB/s ± 0% ~ (p=0.397 n=5+5) AddVW/100000-8 5.27GB/s ± 1% 5.25GB/s ± 1% ~ (p=0.690 n=5+5) AddMulVVW/1-8 1.92GB/s ± 0% 1.96GB/s ± 1% +1.95% (p=0.008 n=5+5) AddMulVVW/2-8 2.16GB/s ± 1% 2.25GB/s ± 1% +4.32% (p=0.008 n=5+5) AddMulVVW/3-8 2.39GB/s ± 1% 2.25GB/s ± 3% -5.79% (p=0.008 n=5+5) AddMulVVW/4-8 2.00GB/s ± 0% 2.31GB/s ± 1% +15.31% (p=0.008 n=5+5) AddMulVVW/5-8 2.22GB/s ± 0% 2.14GB/s ± 0% -3.86% (p=0.008 n=5+5) AddMulVVW/10-8 2.15GB/s ± 1% 2.25GB/s ± 0% +5.03% (p=0.008 n=5+5) AddMulVVW/100-8 2.09GB/s ± 0% 2.14GB/s ± 0% +2.25% (p=0.008 n=5+5) AddMulVVW/1000-8 2.04GB/s ± 0% 2.38GB/s ± 0% +16.52% (p=0.008 n=5+5) AddMulVVW/10000-8 2.03GB/s ± 0% 2.10GB/s ± 0% +3.64% (p=0.008 n=5+5) AddMulVVW/100000-8 2.02GB/s ± 0% 2.02GB/s ± 1% ~ (p=0.690 n=5+5) Change-Id: Ie482d67a7dbb5af6f5d81af2b3d9d14bd66336db Reviewed-on: https://go-review.googlesource.com/77831Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 05 Mar, 2018 10 commits
-
-
Yury Smolsky authored
It is not clear from documentation what the Process.Kill does. And it leads to reccuring confusion about Cmd.Start/Wait methods. Fixes #24220 Change-Id: I66609d21d2954e195d13648014681530eed8ea6c Reviewed-on: https://go-review.googlesource.com/98715Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Mostyn Bramley-Moore authored
We should avoid writing temp files to GOROOT, since it might be readonly. Fixes #23881 Change-Id: Iaa38ec404b303f0cf27fdfb7daf1ddd60fd5d1c9 GitHub-Last-Rev: de0211df8474cc3bbef40f792e2f85b3b6ee259c GitHub-Pull-Request: golang/go#24238 Reviewed-on: https://go-review.googlesource.com/98517 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Matthew Dempsky authored
No functional changes, just changing all the orderfoo functions into (*Order).foo methods. Passes toolstash-check. Change-Id: Ib9833daa98aff3c645ce56794a414f8472689152 Reviewed-on: https://go-review.googlesource.com/98617 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Andrey Mirtchovski authored
This change modifies Go to disable loading of users' shell history for TestTerminalSignal tests. TestTerminalSignal, as part of its workload, will execute a new interactive bash shell. Bash will attempt to load the user's history from the file pointed to by the HISTFILE environment variable. For users with large histories that may take up to several seconds, pushing the whole test past the 5 second timeout and causing it to fail. Change-Id: I11b2f83ee91f51fa1e9774a39181ab365f9a6b3a GitHub-Last-Rev: 7efdf616a2fcecdf479420fc0004057cee2ea6b2 GitHub-Pull-Request: golang/go#24255 Reviewed-on: https://go-review.googlesource.com/98616Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Hana Kim authored
This is an updated version of golang.org/cl/96395, with the fix to TestUserSpan. This reverts commit 7b6f6267e90a8e4eab37a3f2164ba882e6222adb. Change-Id: I31eec8ba0997f9178dffef8dac608e731ab70872 Reviewed-on: https://go-review.googlesource.com/98236 Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Alberto Donizetti authored
Change-Id: Ic21d25db5d56ce77516c53082dfbc010e5875b81 Reviewed-on: https://go-review.googlesource.com/98655 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Ian Lance Taylor authored
This was originally C code using names with underscores, which were retained when the code was rewritten into Go. Change the code to use Go-like camel case names. The names that come from the ELF ABI are left unchanged. Change-Id: I181bc5dd81284c07bc67b7df4635f4734b41d646 Reviewed-on: https://go-review.googlesource.com/98520 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Tobias Klauser authored
Also fix the indentation of the SYS_* definitions in sys_linux_mipsx.s and order them numerically. Change-Id: I0c454301c329a163e7db09dcb25d4e825149858c Reviewed-on: https://go-review.googlesource.com/98448 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Alberto Donizetti authored
This change move bits.Len* intrinsification tests to the new codegen test harness, removing them from the old ssa_test file. Five different test functions (one for each bit.Len function tested) was used, to avoid possible unwanted interactions between multiple calls inside one function. Change-Id: Iffd5be55b58e88597fa30a562a28dacb01236d8b Reviewed-on: https://go-review.googlesource.com/98156 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>
-
Keith Randall authored
Missed removing the argument loading from the indexbody function. Change-Id: Ia1391231fc99771d00410a09fe80a09f08ceed02 Reviewed-on: https://go-review.googlesource.com/98575 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Elias Naur <elias.naur@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 04 Mar, 2018 7 commits
-
-
Giovanni Bajo authored
Fold offsets for: {ADD,SUB,MUL}[SD]mem ADD[LQ]constmem {ADD,SUB,AND,OR,XOR}[LQ]mem Cumulatively, the rules trigger ~900 times in all.bash. Fixes #23325 Change-Id: If6c701f68fa0b57907a353a07a516b914127d0d8 Reviewed-on: https://go-review.googlesource.com/98035 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Keith Randall authored
Also move the arm64 CountByte implementation while we're here. Fixes #19792 Change-Id: I1e0fdf1e03e3135af84150a2703b58dad1b0d57e Reviewed-on: https://go-review.googlesource.com/98518 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Keith Randall authored
Move bytes.Compare and runtime·cmpstring to bytealg. Update #19792 Change-Id: I139e6d7c59686bef7a3017e3dec99eba5fd10447 Reviewed-on: https://go-review.googlesource.com/98515 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Keith Randall authored
Move bytes.Count and strings.Count to bytealg. Update #19792 Change-Id: I3e4e14b504a0b71758885bb131e5656e342cf8cb Reviewed-on: https://go-review.googlesource.com/98495 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Giovanni Bajo authored
Change-Id: If542f0b5c5754e6eb2f9b302fe5a148ba9a57338 Reviewed-on: https://go-review.googlesource.com/98443 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Giovanni Bajo authored
This CL moves the load/store combining tests into asmcheck. In addition at being more compact, it's also now easier to spot what it is missing in each architecture. While doing so, I think I uncovered a bug in ppc64le and arm64 rules, because they fail to load/store combine in non-trivial functions. Not sure why, I'll open an issue. Change-Id: Ia1572d53c0553d9104f3e52b95e4d1768a8440a3 Reviewed-on: https://go-review.googlesource.com/98441 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Giovanni Bajo authored
Before this change, in case of any failure, asmcheck was dumping to stderr the whole output of compile -S, which can be very long if it contains multiple functions. Make it so it filters the output to only display the assembly output of functions for which at least one opcode check failed. This greatly simplifies debugging. Change-Id: I1bbf54473b8252a3384e2c1dade82d926afc119d Reviewed-on: https://go-review.googlesource.com/98444 Run-TryBot: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>
-
- 03 Mar, 2018 7 commits
-
-
Giovanni Bajo authored
Change-Id: I69c1688506d1aeca655047acf35d1bff966fc01e Reviewed-on: https://go-review.googlesource.com/98442 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Giovanni Bajo authored
Currently, the top-level testsuite always uses whatever version of Go is found in the PATH to execute all the tests. This forces the developers to tweak the PATH to run the testsuite. Change it to use the same version of Go used to run run.go. This allows developers to run the testsuite using the tip compiler by simply saying "../bin/go run run.go". I think this is a better solution compared to always forcing "../bin/go", because it allows developers to run the testsuite using different Go versions, for instance to check if a new test is fixed in tip compared to the installed compiler. Fixes #24217 Change-Id: I41b299c753b6e77c41e28be9091b2b630efea9d2 Reviewed-on: https://go-review.googlesource.com/98439 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Pascal S. de Kloe authored
name old time/op new time/op delta CodeEncoder-12 1.89ms ± 1% 1.91ms ± 0% +1.16% (p=0.000 n=20+19) CodeMarshal-12 2.09ms ± 1% 2.12ms ± 0% +1.63% (p=0.000 n=17+18) CodeDecoder-12 8.43ms ± 1% 8.32ms ± 1% -1.35% (p=0.000 n=18+20) UnicodeDecoder-12 399ns ± 0% 339ns ± 0% -15.00% (p=0.000 n=20+19) DecoderStream-12 281ns ± 1% 231ns ± 0% -17.91% (p=0.000 n=20+16) CodeUnmarshal-12 9.35ms ± 2% 9.15ms ± 2% -2.11% (p=0.000 n=20+20) CodeUnmarshalReuse-12 8.41ms ± 2% 8.29ms ± 2% -1.34% (p=0.000 n=20+20) UnmarshalString-12 81.2ns ± 2% 74.0ns ± 4% -8.89% (p=0.000 n=20+20) UnmarshalFloat64-12 71.1ns ± 2% 64.3ns ± 1% -9.60% (p=0.000 n=20+19) UnmarshalInt64-12 60.6ns ± 2% 53.2ns ± 0% -12.28% (p=0.000 n=18+18) Issue10335-12 96.9ns ± 0% 87.7ns ± 1% -9.52% (p=0.000 n=17+20) Unmapped-12 247ns ± 4% 231ns ± 3% -6.34% (p=0.000 n=20+20) TypeFieldsCache/MissTypes1-12 11.1µs ± 0% 11.1µs ± 0% ~ (p=0.376 n=19+20) TypeFieldsCache/MissTypes10-12 33.9µs ± 0% 33.8µs ± 0% -0.32% (p=0.000 n=18+9) name old speed new speed delta CodeEncoder-12 1.03GB/s ± 1% 1.01GB/s ± 0% -1.15% (p=0.000 n=20+19) CodeMarshal-12 930MB/s ± 1% 915MB/s ± 0% -1.60% (p=0.000 n=17+18) CodeDecoder-12 230MB/s ± 1% 233MB/s ± 1% +1.37% (p=0.000 n=18+20) UnicodeDecoder-12 35.0MB/s ± 0% 41.2MB/s ± 0% +17.60% (p=0.000 n=20+19) CodeUnmarshal-12 208MB/s ± 2% 212MB/s ± 2% +2.16% (p=0.000 n=20+20) name old alloc/op new alloc/op delta Issue10335-12 184B ± 0% 184B ± 0% ~ (all equal) Unmapped-12 216B ± 0% 216B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Issue10335-12 3.00 ± 0% 3.00 ± 0% ~ (all equal) Unmapped-12 4.00 ± 0% 4.00 ± 0% ~ (all equal) Change-Id: I4b1a87a205da2ef9a572f86f85bc833653c61570 Reviewed-on: https://go-review.googlesource.com/98440Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Tobias Klauser authored
Use the __vdso_clock_gettime fast path via the vDSO on linux/arm to speed up nanotime and walltime. This results in the following performance improvement for time.Now on a RaspberryPi 3 (running 32bit Raspbian, i.e. GOOS=linux/GOARCH=arm): name old time/op new time/op delta TimeNow 0.99µs ± 0% 0.39µs ± 1% -60.74% (p=0.000 n=12+20) Change-Id: I3598278a6c88d7f6a6ce66c56b9d25f9dd2f4c9a Reviewed-on: https://go-review.googlesource.com/98095Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Tobias Klauser authored
It's unused since https://golang.org/cl/99320043 Change-Id: I74d69ff894aa2fb556f1c2083406c118c559d91b Reviewed-on: https://go-review.googlesource.com/98195 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
Keith Randall authored
Move bytes.Equal, runtime.memequal, and runtime.memequal_varlen to the bytealg package. Update #19792 Change-Id: Ic4175e952936016ea0bda6c7c3dbb33afdc8e4ac Reviewed-on: https://go-review.googlesource.com/98355 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Joe Tsai authored
The previous type cache is quadratic in time in the situation where new types are continually encountered. Now that it is possible to dynamically create new types with the reflect package, this can cause json to perform very poorly. Switch to sync.Map which does well when the cache has hit steady state, but also handles occasional updates in better than quadratic time. benchmark old ns/op new ns/op delta BenchmarkTypeFieldsCache/MissTypes1-8 14817 16202 +9.35% BenchmarkTypeFieldsCache/MissTypes10-8 70926 69144 -2.51% BenchmarkTypeFieldsCache/MissTypes100-8 976467 208973 -78.60% BenchmarkTypeFieldsCache/MissTypes1000-8 79520162 1750371 -97.80% BenchmarkTypeFieldsCache/MissTypes10000-8 6873625837 16847806 -99.75% BenchmarkTypeFieldsCache/HitTypes1000-8 7.51 8.80 +17.18% BenchmarkTypeFieldsCache/HitTypes10000-8 7.58 8.68 +14.51% The old implementation takes 12 minutes just to build a cache of size 1e5 due to the quadratic behavior. I did not bother benchmark sizes above that. Change-Id: I5e6facc1eb8e1b80e5ca285e4dd2cc8815618dad Reviewed-on: https://go-review.googlesource.com/76850 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 02 Mar, 2018 14 commits
-
-
Shamil Garatuev authored
Make ReadSubKeyNames work even if key is opened with only ENUMERATE_SUB_KEYs access rights mask. Fixes #23869 Change-Id: I138bd51715fdbc3bda05607c64bde1150f4fe6b2 Reviewed-on: https://go-review.googlesource.com/97435Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
-
Keith Randall authored
Move the IndexByte function from the runtime to a new bytealg package. The new package will eventually hold all the optimized assembly for groveling through byte slices and strings. It seems a better home for this code than randomly keeping it in runtime. Once this is in, the next step is to move the other functions (Compare, Equal, ...). Update #19792 This change seems complicated enough that we might just declare "not worth it" and abandon. Opinions welcome. The core assembly is all unchanged, except minor modifications where the code reads cpu feature bits. The wrapper functions have been cleaned up as they are now actually checked by vet. Change-Id: I9fa75bee5d85db3a65b3fd3b7997e60367523796 Reviewed-on: https://go-review.googlesource.com/98016 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Brad Fitzpatrick authored
Flaky tests failing trybots help nobody. Updates #22857 Change-Id: I87bc018651ab4fe02560a6d24c08a1d7ccd8ba37 Reviewed-on: https://go-review.googlesource.com/97416Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
Damien Mathieu authored
Since that method uses 'mux.m', we need to lock the mutex to avoid data races. Change-Id: I998448a6e482b5d6a1b24f3354bb824906e23172 GitHub-Last-Rev: 163a7d4942e793b328e05a7eb91f6d3fdc4ba12b GitHub-Pull-Request: golang/go#23994 Reviewed-on: https://go-review.googlesource.com/96575Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
-
David du Colombier authored
TestEmptyDwarfRanges has been added in CL 94816. This test is failing on Plan 9 because executables don't have a DWARF symbol table. Fixes #24226. Change-Id: Iff7e34b8c2703a2f19ee8087a4d64d0bb98496cd Reviewed-on: https://go-review.googlesource.com/98275Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Hana Kim authored
This reverts commit 16398894. This broke TestUserTaskSpan test. Change-Id: If5ff8bdfe84e8cb30787b03ead87205ece3d5601 Reviewed-on: https://go-review.googlesource.com/98235Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Hana Kim authored
Even though undocumented, the assumption is the Event's link field points to the following event in the future. The new span/task event processing breaks the assumption. Change-Id: I4ce2f30c67c4f525ec0a121a7e43d8bdd2ec3f77 Reviewed-on: https://go-review.googlesource.com/96395Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Alberto Donizetti authored
Change-Id: I9fe6572d1043ef9ee09c0925059ded554ad24c6b Reviewed-on: https://go-review.googlesource.com/98215Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Michael Fraenkel authored
When recursively calling walkexpr, r.Type is still the untyped value. It then sometimes recursively calls finishcompare, which complains that you can't compare the resulting expression to that untyped value. Updates #23834. Change-Id: I6b7acd3970ceaff8da9216bfa0ae24aca5dee828 Reviewed-on: https://go-review.googlesource.com/97856Reviewed-by: Matthew Dempsky <mdempsky@google.com>
-
Than McIntosh authored
Add DWARF register mappings for ARM64, so that that arch will become usable with "-dwarflocationlists". [NB: I've plugged in a set of numbers from the doc, but this will require additional manual testing.] Change-Id: Id9aa63857bc8b4f5c825f49274101cf372e9e856 Reviewed-on: https://go-review.googlesource.com/82515Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Alessandro Arzilli authored
Dsymutil, an utility used on macOS when externally linking executables, does not support base address selector entries in debug_ranges. CL 73271 worked around this problem by removing base address selectors and emitting CU-relative relocations for each list entry. This commit, as an optimization, reintroduces the base address selectors and changes the linker to remove them again, but only when it knows that it will have to invoke the external linker on macOS. Compilecmp comparing master with a branch that has scope tracking always enabled: completed 15 of 15, estimated time remaining 0s (eta 2:43PM) name old time/op new time/op delta Template 272ms ± 8% 257ms ± 5% -5.33% (p=0.000 n=15+14) Unicode 124ms ± 7% 122ms ± 5% ~ (p=0.210 n=14+14) GoTypes 873ms ± 3% 870ms ± 5% ~ (p=0.856 n=15+13) Compiler 4.49s ± 2% 4.49s ± 5% ~ (p=0.982 n=14+14) SSA 11.8s ± 4% 11.8s ± 3% ~ (p=0.653 n=15+15) Flate 163ms ± 6% 164ms ± 9% ~ (p=0.914 n=14+15) GoParser 203ms ± 6% 202ms ±10% ~ (p=0.571 n=14+14) Reflect 547ms ± 7% 542ms ± 4% ~ (p=0.914 n=15+14) Tar 244ms ± 7% 237ms ± 3% -2.80% (p=0.002 n=14+13) XML 289ms ± 6% 289ms ± 5% ~ (p=0.839 n=14+14) [Geo mean] 537ms 531ms -1.10% name old user-time/op new user-time/op delta Template 360ms ± 4% 341ms ± 7% -5.16% (p=0.000 n=14+14) Unicode 189ms ±11% 190ms ± 8% ~ (p=0.844 n=15+15) GoTypes 1.13s ± 4% 1.14s ± 7% ~ (p=0.582 n=15+14) Compiler 5.34s ± 2% 5.40s ± 4% +1.19% (p=0.036 n=11+13) SSA 14.7s ± 2% 14.7s ± 3% ~ (p=0.602 n=15+15) Flate 211ms ± 7% 214ms ± 8% ~ (p=0.252 n=14+14) GoParser 267ms ±12% 266ms ± 2% ~ (p=0.837 n=15+11) Reflect 706ms ± 4% 701ms ± 3% ~ (p=0.213 n=14+12) Tar 331ms ± 9% 320ms ± 5% -3.30% (p=0.025 n=15+14) XML 378ms ± 4% 373ms ± 6% ~ (p=0.253 n=14+15) [Geo mean] 704ms 700ms -0.58% name old alloc/op new alloc/op delta Template 38.0MB ± 0% 38.4MB ± 0% +1.12% (p=0.000 n=15+15) Unicode 28.8MB ± 0% 28.8MB ± 0% +0.17% (p=0.000 n=15+15) GoTypes 112MB ± 0% 114MB ± 0% +1.47% (p=0.000 n=15+15) Compiler 465MB ± 0% 473MB ± 0% +1.71% (p=0.000 n=15+15) SSA 1.48GB ± 0% 1.53GB ± 0% +3.07% (p=0.000 n=15+15) Flate 24.3MB ± 0% 24.7MB ± 0% +1.67% (p=0.000 n=15+15) GoParser 30.7MB ± 0% 31.0MB ± 0% +1.15% (p=0.000 n=12+15) Reflect 76.3MB ± 0% 77.1MB ± 0% +0.97% (p=0.000 n=15+15) Tar 39.2MB ± 0% 39.6MB ± 0% +0.91% (p=0.000 n=15+15) XML 41.5MB ± 0% 42.0MB ± 0% +1.29% (p=0.000 n=15+15) [Geo mean] 77.5MB 78.6MB +1.35% name old allocs/op new allocs/op delta Template 385k ± 0% 387k ± 0% +0.51% (p=0.000 n=15+15) Unicode 342k ± 0% 343k ± 0% +0.10% (p=0.000 n=14+15) GoTypes 1.19M ± 0% 1.19M ± 0% +0.62% (p=0.000 n=15+15) Compiler 4.51M ± 0% 4.54M ± 0% +0.50% (p=0.000 n=14+15) SSA 12.2M ± 0% 12.4M ± 0% +1.12% (p=0.000 n=14+15) Flate 234k ± 0% 236k ± 0% +0.60% (p=0.000 n=15+15) GoParser 318k ± 0% 320k ± 0% +0.60% (p=0.000 n=15+15) Reflect 974k ± 0% 977k ± 0% +0.27% (p=0.000 n=15+15) Tar 395k ± 0% 397k ± 0% +0.37% (p=0.000 n=14+15) XML 404k ± 0% 407k ± 0% +0.53% (p=0.000 n=15+15) [Geo mean] 794k 798k +0.52% name old text-bytes new text-bytes delta HelloSize 680kB ± 0% 680kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 9.62kB ± 0% 9.62kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.11MB ± 0% 1.13MB ± 0% +1.85% (p=0.000 n=15+15) Change-Id: I61c98ba0340cb798034b2bb55e3ab3a58ac1cf23 Reviewed-on: https://go-review.googlesource.com/98075Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Heschi Kreinick authored
When generating location lists, batch up changes for all zero-width instructions, not just phis. This prevents the creation of location list entries that don't actually cover any instructions. This isn't perfect because of the caveats in the prior CL (Copy is zero-width sometimes) but in practice this seems to fix all of the empty lists in std. Change-Id: Ice4a9ade36b6b24ca111d1494c414eec96e5af25 Reviewed-on: https://go-review.googlesource.com/97958 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
-
Heschi Kreinick authored
Add a bool to opInfo to indicate if an Op never results in any instructions. This is a conservative approximation: some operations, like Copy, may or may not generate code depending on their arguments. I built the list by reading each arch's ssaGenValue function. Hopefully I got them all. Change-Id: I130b251b65f18208294e129bb7ddc3f91d57d31d Reviewed-on: https://go-review.googlesource.com/97957Reviewed-by: Keith Randall <khr@golang.org>
-
Zhou Peng authored
Change-Id: I289af4884583537639800e37928c22814d38cba9 Reviewed-on: https://go-review.googlesource.com/98115Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
-