Commits · c8791538315e633a41461173964a99bd90e103e3 · go / golang

24 Feb, 2018 2 commits

cmd/compile/internal/syntax: use stringer for operators and tokens · c8791538

Daniel Martí authored Feb 20, 2018

With its new -linecomment flag, it is now possible to use stringer on
values whose strings aren't valid identifiers. This is the case with
tokens and operators in Go.

Operator alredy had inline comments with each operator's string
representation; only minor modifications were needed. The inline
comments were added to each of the token names, using the same strategy.

Comments that were previously inline or part of the string arrays were
moved to the line immediately before the name they correspond to.

Finally, declare tokStrFast as a function that uses the generated arrays
directly. Avoiding the branch and strconv call means that we avoid a
performance regression in the scanner, perhaps due to the lack of
mid-stack inlining.

Performance is not affected. Measured with 'go test -run StdLib -fast'
on an X1 Carbon Gen2 (i5-4300U @ 1.90GHz, 8GB RAM, SSD), the best of 5
runs before and after the changes are:

	parsed 1709399 lines (3763 files) in 1.707402159s (1001169 lines/s)
	allocated 449.282Mb (263.137Mb/s)

	parsed 1709329 lines (3765 files) in 1.706663154s (1001562 lines/s)
	allocated 449.290Mb (263.256Mb/s)

Change-Id: Idcc4f83393fcadd6579700e3602c09496ea2625b
Reviewed-on: https://go-review.googlesource.com/95357Reviewed-by: Robert Griesemer <gri@golang.org>

c8791538

math/big: speed-up addMulVVW on amd64 · c3935c08

Ilya Tocar authored Oct 31, 2017

Use MULX/ADOX/ADCX instructions to speed-up addMulVVW,
when they are available. addMulVVW is a hotspot in rsa.
This is faster than ADD/ADC/IMUL version, because ADOX/ADCX only
modify carry/overflow flag, so they can be interleaved with each other
and with MULX, which doesn't modify flags at all.
Increasing unroll factor to e. g. 16 makes rsa 1% faster, but 3PrimeRSA2048Decrypt
performance falls back to baseline.

Updates #20058

AddMulVVW/1-8 3.28ns ± 2% 3.26ns ± 3% ~ (p=0.107 n=10+10)
AddMulVVW/2-8 4.26ns ± 2% 4.24ns ± 3% ~ (p=0.327 n=9+9)
AddMulVVW/3-8 5.07ns ± 2% 5.26ns ± 2% +3.73% (p=0.000 n=10+10)
AddMulVVW/4-8 6.40ns ± 2% 6.50ns ± 2% +1.61% (p=0.000 n=10+10)
AddMulVVW/5-8 6.77ns ± 2% 6.86ns ± 1% +1.38% (p=0.001 n=9+9)
AddMulVVW/10-8 12.2ns ± 2% 10.6ns ± 3% -13.65% (p=0.000 n=10+10)
AddMulVVW/100-8 79.7ns ± 2% 52.4ns ± 1% -34.17% (p=0.000 n=10+10)
AddMulVVW/1000-8 695ns ± 1% 491ns ± 2% -29.39% (p=0.000 n=9+10)
AddMulVVW/10000-8 7.26µs ± 2% 5.92µs ± 6% -18.42% (p=0.000 n=10+10)
AddMulVVW/100000-8 72.6µs ± 2% 62.2µs ± 2% -14.31% (p=0.000 n=10+10)

crypto/rsa speed-up is smaller, but stil noticeable:

RSA2048Decrypt-8 1.61ms ± 1% 1.38ms ± 1% -14.13% (p=0.000 n=10+10)
RSA2048Sign-8 1.93ms ± 1% 1.70ms ± 1% -11.86% (p=0.000 n=10+10)
3PrimeRSA2048Decrypt-8 932µs ± 0% 828µs ± 0% -11.15% (p=0.000 n=10+10)

Results on crypto/tls:

HandshakeServer/RSA-8 901µs ± 1% 777µs ± 0% -13.70% (p=0.000 n=10+8)
HandshakeServer/ECDHE-P256-RSA-8 1.01ms ± 1% 0.90ms ± 0% -11.53% (p=0.000 n=10+9)

Full math/big benchmarks:

name old time/op new time/op delta
AddVV/1-8 3.74ns ± 6% 3.55ns ± 2% ~ (p=0.082 n=10+8)
AddVV/2-8 3.96ns ± 2% 3.98ns ± 5% ~ (p=0.794 n=10+9)
AddVV/3-8 4.97ns ± 2% 4.94ns ± 1% ~ (p=0.081 n=10+9)
AddVV/4-8 5.59ns ± 2% 5.59ns ± 2% ~ (p=0.809 n=10+10)
AddVV/5-8 6.63ns ± 1% 6.62ns ± 1% ~ (p=0.560 n=9+10)
AddVV/10-8 8.11ns ± 1% 8.11ns ± 2% ~ (p=0.402 n=10+10)
AddVV/100-8 46.9ns ± 2% 46.8ns ± 1% ~ (p=0.809 n=10+10)
AddVV/1000-8 389ns ± 1% 391ns ± 4% ~ (p=0.809 n=10+10)
AddVV/10000-8 5.05µs ± 5% 4.98µs ± 2% ~ (p=0.113 n=9+10)
AddVV/100000-8 55.3µs ± 3% 55.2µs ± 3% ~ (p=0.796 n=10+10)
AddVW/1-8 3.04ns ± 3% 3.02ns ± 3% ~ (p=0.538 n=10+10)
AddVW/2-8 3.57ns ± 2% 3.61ns ± 2% +1.12% (p=0.032 n=9+9)
AddVW/3-8 3.77ns ± 1% 3.79ns ± 2% ~ (p=0.719 n=10+10)
AddVW/4-8 4.69ns ± 1% 4.69ns ± 2% ~ (p=0.920 n=10+9)
AddVW/5-8 4.58ns ± 1% 4.58ns ± 1% ~ (p=0.812 n=10+10)
AddVW/10-8 7.62ns ± 2% 7.63ns ± 1% ~ (p=0.926 n=10+10)
AddVW/100-8 41.1ns ± 2% 42.4ns ± 3% +3.34% (p=0.000 n=10+10)
AddVW/1000-8 386ns ± 2% 389ns ± 4% ~ (p=0.514 n=10+10)
AddVW/10000-8 3.88µs ± 3% 3.87µs ± 3% ~ (p=0.448 n=10+10)
AddVW/100000-8 41.2µs ± 3% 41.7µs ± 3% ~ (p=0.148 n=10+10)
AddMulVVW/1-8 3.28ns ± 2% 3.26ns ± 3% ~ (p=0.107 n=10+10)
AddMulVVW/2-8 4.26ns ± 2% 4.24ns ± 3% ~ (p=0.327 n=9+9)
AddMulVVW/3-8 5.07ns ± 2% 5.26ns ± 2% +3.73% (p=0.000 n=10+10)
AddMulVVW/4-8 6.40ns ± 2% 6.50ns ± 2% +1.61% (p=0.000 n=10+10)
AddMulVVW/5-8 6.77ns ± 2% 6.86ns ± 1% +1.38% (p=0.001 n=9+9)
AddMulVVW/10-8 12.2ns ± 2% 10.6ns ± 3% -13.65% (p=0.000 n=10+10)
AddMulVVW/100-8 79.7ns ± 2% 52.4ns ± 1% -34.17% (p=0.000 n=10+10)
AddMulVVW/1000-8 695ns ± 1% 491ns ± 2% -29.39% (p=0.000 n=9+10)
AddMulVVW/10000-8 7.26µs ± 2% 5.92µs ± 6% -18.42% (p=0.000 n=10+10)
AddMulVVW/100000-8 72.6µs ± 2% 62.2µs ± 2% -14.31% (p=0.000 n=10+10)
DecimalConversion-8 108µs ±19% 104µs ± 4% ~ (p=0.460 n=10+8)
FloatString/100-8 926ns ±14% 908ns ± 5% ~ (p=0.398 n=9+9)
FloatString/1000-8 25.7µs ± 1% 25.7µs ± 1% ~ (p=0.739 n=10+10)
FloatString/10000-8 2.13ms ± 1% 2.12ms ± 1% ~ (p=0.353 n=10+10)
FloatString/100000-8 207ms ± 1% 206ms ± 2% ~ (p=0.912 n=10+10)
FloatAdd/10-8 61.3ns ± 3% 61.9ns ± 3% ~ (p=0.183 n=10+10)
FloatAdd/100-8 62.0ns ± 2% 62.9ns ± 4% ~ (p=0.118 n=10+10)
FloatAdd/1000-8 84.7ns ± 2% 84.4ns ± 1% ~ (p=0.591 n=10+10)
FloatAdd/10000-8 305ns ± 2% 306ns ± 1% ~ (p=0.443 n=10+10)
FloatAdd/100000-8 2.45µs ± 1% 2.46µs ± 1% ~ (p=0.782 n=10+10)
FloatSub/10-8 56.8ns ± 4% 56.5ns ± 5% ~ (p=0.423 n=10+10)
FloatSub/100-8 57.3ns ± 4% 57.1ns ± 5% ~ (p=0.540 n=10+10)
FloatSub/1000-8 66.8ns ± 4% 66.6ns ± 1% ~ (p=0.868 n=10+10)
FloatSub/10000-8 199ns ± 1% 198ns ± 1% ~ (p=0.287 n=10+9)
FloatSub/100000-8 1.47µs ± 2% 1.47µs ± 2% ~ (p=0.920 n=10+9)
ParseFloatSmallExp-8 8.74µs ±10% 9.48µs ±10% +8.51% (p=0.010 n=9+10)
ParseFloatLargeExp-8 39.2µs ±25% 39.6µs ±12% ~ (p=0.529 n=10+10)
GCD10x10/WithoutXY-8 173ns ±23% 177ns ±20% ~ (p=0.698 n=10+10)
GCD10x10/WithXY-8 736ns ±12% 728ns ±16% ~ (p=0.838 n=10+10)
GCD10x100/WithoutXY-8 325ns ±16% 326ns ±14% ~ (p=0.912 n=10+10)
GCD10x100/WithXY-8 1.14µs ±13% 1.16µs ± 6% ~ (p=0.287 n=10+9)
GCD10x1000/WithoutXY-8 851ns ±25% 820ns ±12% ~ (p=0.592 n=10+10)
GCD10x1000/WithXY-8 2.89µs ±17% 2.85µs ± 5% ~ (p=1.000 n=10+9)
GCD10x10000/WithoutXY-8 6.66µs ±12% 6.82µs ±19% ~ (p=0.529 n=10+10)
GCD10x10000/WithXY-8 18.0µs ± 5% 17.2µs ±19% ~ (p=0.315 n=7+10)
GCD10x100000/WithoutXY-8 77.8µs ±18% 73.3µs ±11% ~ (p=0.315 n=10+9)
GCD10x100000/WithXY-8 186µs ±14% 204µs ±29% ~ (p=0.218 n=10+10)
GCD100x100/WithoutXY-8 1.09µs ± 1% 1.09µs ± 2% ~ (p=0.117 n=9+10)
GCD100x100/WithXY-8 7.93µs ± 1% 7.97µs ± 1% +0.52% (p=0.006 n=10+10)
GCD100x1000/WithoutXY-8 2.00µs ± 3% 2.04µs ± 6% ~ (p=0.053 n=9+10)
GCD100x1000/WithXY-8 9.23µs ± 1% 9.29µs ± 1% +0.63% (p=0.009 n=10+10)
GCD100x10000/WithoutXY-8 10.2µs ±11% 9.7µs ± 6% ~ (p=0.278 n=10+9)
GCD100x10000/WithXY-8 33.3µs ± 4% 33.6µs ± 4% ~ (p=0.481 n=10+10)
GCD100x100000/WithoutXY-8 106µs ±17% 105µs ±13% ~ (p=0.853 n=10+10)
GCD100x100000/WithXY-8 289µs ±17% 276µs ± 8% ~ (p=0.353 n=10+10)
GCD1000x1000/WithoutXY-8 12.2µs ± 1% 12.1µs ± 1% -0.45% (p=0.007 n=10+10)
GCD1000x1000/WithXY-8 131µs ± 1% 132µs ± 0% +0.93% (p=0.000 n=9+7)
GCD1000x10000/WithoutXY-8 20.6µs ± 2% 20.6µs ± 1% ~ (p=0.326 n=10+9)
GCD1000x10000/WithXY-8 238µs ± 1% 237µs ± 1% ~ (p=0.356 n=9+10)
GCD1000x100000/WithoutXY-8 117µs ± 8% 114µs ±11% ~ (p=0.190 n=10+10)
GCD1000x100000/WithXY-8 1.51ms ± 1% 1.50ms ± 1% ~ (p=0.053 n=9+10)
GCD10000x10000/WithoutXY-8 220µs ± 1% 218µs ± 1% -0.86% (p=0.000 n=10+10)
GCD10000x10000/WithXY-8 3.04ms ± 0% 3.05ms ± 0% +0.33% (p=0.001 n=9+10)
GCD10000x100000/WithoutXY-8 513µs ± 0% 511µs ± 0% -0.38% (p=0.000 n=10+10)
GCD10000x100000/WithXY-8 15.1ms ± 0% 15.0ms ± 0% ~ (p=0.053 n=10+9)
GCD100000x100000/WithoutXY-8 10.4ms ± 1% 10.4ms ± 2% ~ (p=0.258 n=9+9)
GCD100000x100000/WithXY-8 205ms ± 1% 205ms ± 1% ~ (p=0.481 n=10+10)
Hilbert-8 1.25ms ±15% 1.24ms ±17% ~ (p=0.853 n=10+10)
Binomial-8 3.03µs ±24% 2.90µs ±16% ~ (p=0.481 n=10+10)
QuoRem-8 1.95µs ± 1% 1.95µs ± 2% ~ (p=0.117 n=9+10)
Exp-8 5.12ms ± 2% 3.99ms ± 1% -22.02% (p=0.000 n=10+9)
Exp2-8 5.14ms ± 2% 3.98ms ± 0% -22.55% (p=0.000 n=10+9)
Bitset-8 16.4ns ± 2% 16.5ns ± 2% ~ (p=0.311 n=9+10)
BitsetNeg-8 46.3ns ± 4% 45.8ns ± 4% ~ (p=0.272 n=10+10)
BitsetOrig-8 250ns ±19% 247ns ±14% ~ (p=0.671 n=10+10)
BitsetNegOrig-8 416ns ±14% 429ns ±14% ~ (p=0.353 n=10+10)
ModSqrt225_Tonelli-8 400µs ± 0% 320µs ± 0% -19.88% (p=0.000 n=9+7)
ModSqrt224_3Mod4-8 123µs ± 1% 97µs ± 0% -21.21% (p=0.000 n=9+10)
ModSqrt5430_Tonelli-8 1.87s ± 0% 1.39s ± 1% -25.70% (p=0.000 n=9+10)
ModSqrt5430_3Mod4-8 630ms ± 2% 465ms ± 1% -26.12% (p=0.000 n=10+10)
Sqrt-8 25.8µs ± 1% 25.9µs ± 0% +0.66% (p=0.002 n=10+8)
IntSqr/1-8 11.3ns ± 1% 11.3ns ± 2% ~ (p=0.360 n=9+10)
IntSqr/2-8 26.6ns ± 1% 27.4ns ± 2% +2.87% (p=0.000 n=8+9)
IntSqr/3-8 36.5ns ± 6% 36.6ns ± 5% ~ (p=0.589 n=10+10)
IntSqr/5-8 57.2ns ± 2% 57.8ns ± 1% +0.92% (p=0.045 n=10+9)
IntSqr/8-8 112ns ± 1% 93ns ± 1% -16.60% (p=0.000 n=10+10)
IntSqr/10-8 148ns ± 1% 129ns ± 5% -12.85% (p=0.000 n=10+10)
IntSqr/20-8 642ns ±28% 692ns ±21% ~ (p=0.105 n=10+10)
IntSqr/30-8 1.03µs ±18% 1.06µs ±15% ~ (p=0.422 n=10+8)
IntSqr/50-8 2.33µs ±14% 2.14µs ±20% ~ (p=0.063 n=10+10)
IntSqr/80-8 4.06µs ±13% 3.72µs ±14% -8.31% (p=0.029 n=10+10)
IntSqr/100-8 5.79µs ±10% 5.20µs ±18% -10.15% (p=0.004 n=10+10)
IntSqr/200-8 17.1µs ± 1% 12.9µs ± 3% -24.44% (p=0.000 n=10+10)
IntSqr/300-8 35.9µs ± 0% 26.6µs ± 1% -25.75% (p=0.000 n=10+10)
IntSqr/500-8 84.9µs ± 0% 71.7µs ± 1% -15.49% (p=0.000 n=10+10)
IntSqr/800-8 170µs ± 1% 142µs ± 2% -16.73% (p=0.000 n=10+10)
IntSqr/1000-8 258µs ± 1% 218µs ± 1% -15.65% (p=0.000 n=10+10)
Mul-8 10.4ms ± 1% 8.3ms ± 0% -20.05% (p=0.000 n=10+9)
Exp3Power/0x10-8 311ns ±15% 321ns ±24% ~ (p=0.447 n=10+10)
Exp3Power/0x40-8 358ns ±21% 346ns ±37% ~ (p=0.591 n=10+10)
Exp3Power/0x100-8 611ns ±19% 570ns ±27% ~ (p=0.393 n=10+10)
Exp3Power/0x400-8 1.31µs ±26% 1.34µs ±19% ~ (p=0.853 n=10+10)
Exp3Power/0x1000-8 6.76µs ±23% 6.22µs ±16% ~ (p=0.095 n=10+9)
Exp3Power/0x4000-8 37.6µs ±14% 36.4µs ±21% ~ (p=0.247 n=10+10)
Exp3Power/0x10000-8 345µs ±14% 310µs ±11% -9.99% (p=0.005 n=10+10)
Exp3Power/0x40000-8 2.77ms ± 1% 2.34ms ± 1% -15.47% (p=0.000 n=10+10)
Exp3Power/0x100000-8 25.1ms ± 1% 21.3ms ± 1% -15.26% (p=0.000 n=10+10)
Exp3Power/0x400000-8 225ms ± 1% 190ms ± 1% -15.61% (p=0.000 n=10+10)
Fibo-8 23.4ms ± 1% 23.3ms ± 0% ~ (p=0.052 n=10+10)
NatSqr/1-8 58.4ns ±24% 59.8ns ±38% ~ (p=0.739 n=10+10)
NatSqr/2-8 122ns ±21% 122ns ±16% ~ (p=0.896 n=10+10)
NatSqr/3-8 140ns ±28% 148ns ±30% ~ (p=0.288 n=10+10)
NatSqr/5-8 193ns ±29% 210ns ±34% ~ (p=0.469 n=10+10)
NatSqr/8-8 317ns ±21% 296ns ±25% ~ (p=0.393 n=10+10)
NatSqr/10-8 362ns ± 8% 373ns ±30% ~ (p=0.617 n=9+10)
NatSqr/20-8 1.24µs ±16% 1.06µs ±29% -14.57% (p=0.019 n=10+10)
NatSqr/30-8 1.90µs ±32% 1.71µs ±10% ~ (p=0.176 n=10+9)
NatSqr/50-8 4.22µs ±19% 3.67µs ± 7% -13.03% (p=0.017 n=10+9)
NatSqr/80-8 7.33µs ±20% 6.50µs ±15% -11.26% (p=0.009 n=10+10)
NatSqr/100-8 9.84µs ±18% 9.33µs ± 8% ~ (p=0.280 n=10+10)
NatSqr/200-8 21.4µs ± 7% 20.0µs ±14% ~ (p=0.075 n=10+10)
NatSqr/300-8 38.0µs ± 2% 31.3µs ±10% -17.63% (p=0.000 n=10+10)
NatSqr/500-8 102µs ± 5% 101µs ± 4% ~ (p=0.780 n=9+10)
NatSqr/800-8 190µs ± 3% 166µs ± 6% -12.29% (p=0.000 n=10+10)
NatSqr/1000-8 277µs ± 2% 245µs ± 6% -11.64% (p=0.000 n=10+10)
ScanPi-8 144µs ±23% 149µs ±24% ~ (p=0.579 n=10+10)
StringPiParallel-8 25.6µs ± 0% 25.8µs ± 0% +0.69% (p=0.000 n=9+10)
Scan/10/Base2-8 305ns ± 1% 309ns ± 1% +1.32% (p=0.000 n=10+9)
Scan/100/Base2-8 1.95µs ± 1% 1.98µs ± 1% +1.10% (p=0.000 n=10+10)
Scan/1000/Base2-8 19.5µs ± 1% 19.7µs ± 1% +1.39% (p=0.000 n=10+10)
Scan/10000/Base2-8 270µs ± 1% 272µs ± 1% +0.58% (p=0.024 n=9+9)
Scan/100000/Base2-8 10.3ms ± 0% 10.3ms ± 0% +0.16% (p=0.022 n=9+10)
Scan/10/Base8-8 146ns ± 4% 154ns ± 4% +5.57% (p=0.000 n=9+9)
Scan/100/Base8-8 748ns ± 1% 759ns ± 1% +1.51% (p=0.000 n=9+10)
Scan/1000/Base8-8 7.88µs ± 1% 8.00µs ± 1% +1.64% (p=0.000 n=10+10)
Scan/10000/Base8-8 155µs ± 1% 155µs ± 1% ~ (p=0.968 n=10+9)
Scan/100000/Base8-8 9.11ms ± 0% 9.11ms ± 0% ~ (p=0.604 n=9+10)
Scan/10/Base10-8 140ns ± 5% 149ns ± 5% +6.39% (p=0.000 n=9+10)
Scan/100/Base10-8 680ns ± 0% 688ns ± 1% +1.08% (p=0.000 n=9+10)
Scan/1000/Base10-8 7.09µs ± 1% 7.16µs ± 1% +0.98% (p=0.019 n=10+10)
Scan/10000/Base10-8 149µs ± 3% 150µs ± 3% ~ (p=0.143 n=10+10)
Scan/100000/Base10-8 9.16ms ± 0% 9.16ms ± 0% ~ (p=0.661 n=10+9)
Scan/10/Base16-8 134ns ± 5% 135ns ± 3% ~ (p=0.505 n=9+9)
Scan/100/Base16-8 560ns ± 1% 563ns ± 0% +0.67% (p=0.000 n=10+8)
Scan/1000/Base16-8 6.28µs ± 1% 6.26µs ± 1% ~ (p=0.448 n=10+10)
Scan/10000/Base16-8 161µs ± 1% 162µs ± 1% +0.74% (p=0.008 n=9+9)
Scan/100000/Base16-8 9.64ms ± 0% 9.64ms ± 0% ~ (p=0.436 n=10+10)
String/10/Base2-8 116ns ±12% 118ns ±13% ~ (p=0.645 n=10+10)
String/100/Base2-8 871ns ±23% 860ns ±22% ~ (p=0.699 n=10+10)
String/1000/Base2-8 10.0µs ±20% 10.0µs ±23% ~ (p=0.853 n=10+10)
String/10000/Base2-8 110µs ±21% 120µs ±25% ~ (p=0.436 n=10+10)
String/100000/Base2-8 768µs ±11% 733µs ±16% ~ (p=0.393 n=10+10)
String/10/Base8-8 51.3ns ± 1% 51.0ns ± 3% ~ (p=0.286 n=9+9)
String/100/Base8-8 284ns ± 9% 272ns ±12% ~ (p=0.267 n=9+10)
String/1000/Base8-8 3.06µs ± 9% 3.04µs ±10% ~ (p=0.739 n=10+10)
String/10000/Base8-8 36.1µs ±14% 35.1µs ± 9% ~ (p=0.447 n=10+9)
String/100000/Base8-8 371µs ±12% 373µs ±16% ~ (p=0.739 n=10+10)
String/10/Base10-8 167ns ±11% 165ns ± 9% ~ (p=0.781 n=10+10)
String/100/Base10-8 727ns ± 1% 740ns ± 2% +1.70% (p=0.001 n=10+10)
String/1000/Base10-8 5.30µs ±18% 5.37µs ±14% ~ (p=0.631 n=10+10)
String/10000/Base10-8 45.0µs ±14% 44.6µs ±10% ~ (p=0.720 n=9+10)
String/100000/Base10-8 5.10ms ± 1% 5.05ms ± 3% ~ (p=0.211 n=9+10)
String/10/Base16-8 47.7ns ± 6% 47.7ns ± 6% ~ (p=0.985 n=10+10)
String/100/Base16-8 221ns ±10% 234ns ±27% ~ (p=0.541 n=10+10)
String/1000/Base16-8 2.23µs ±11% 2.12µs ± 8% -4.81% (p=0.029 n=9+8)
String/10000/Base16-8 28.3µs ±21% 28.5µs ±14% ~ (p=0.796 n=10+10)
String/100000/Base16-8 291µs ±16% 293µs ±15% ~ (p=0.931 n=9+9)
LeafSize/0-8 2.43ms ± 1% 2.49ms ± 1% +2.56% (p=0.000 n=10+10)
LeafSize/1-8 49.7µs ± 9% 46.3µs ±16% -6.78% (p=0.017 n=10+9)
LeafSize/2-8 48.4µs ±18% 46.3µs ±19% ~ (p=0.436 n=10+10)
LeafSize/3-8 81.7µs ± 3% 80.9µs ± 3% ~ (p=0.278 n=10+9)
LeafSize/4-8 47.0µs ± 7% 47.9µs ±13% ~ (p=0.905 n=9+10)
LeafSize/5-8 96.8µs ± 1% 97.3µs ± 2% ~ (p=0.515 n=8+10)
LeafSize/6-8 82.5µs ± 4% 80.9µs ± 2% -1.92% (p=0.019 n=10+10)
LeafSize/7-8 67.2µs ±13% 66.6µs ± 9% ~ (p=0.842 n=10+9)
LeafSize/8-8 46.0µs ±28% 45.1µs ±12% ~ (p=0.739 n=10+10)
LeafSize/9-8 111µs ± 1% 111µs ± 1% ~ (p=0.739 n=10+10)
LeafSize/10-8 98.8µs ± 4% 97.9µs ± 3% ~ (p=0.278 n=10+9)
LeafSize/11-8 96.8µs ± 1% 96.4µs ± 1% ~ (p=0.211 n=9+10)
LeafSize/12-8 81.0µs ± 4% 81.3µs ± 3% ~ (p=0.579 n=10+10)
LeafSize/13-8 79.7µs ± 5% 79.2µs ± 3% ~ (p=0.661 n=10+9)
LeafSize/14-8 67.6µs ±12% 65.8µs ± 7% ~ (p=0.447 n=10+9)
LeafSize/15-8 63.9µs ±17% 66.3µs ±14% ~ (p=0.481 n=10+10)
LeafSize/16-8 44.0µs ±28% 46.0µs ±27% ~ (p=0.481 n=10+10)
LeafSize/32-8 46.2µs ±13% 43.5µs ±18% ~ (p=0.156 n=9+10)
LeafSize/64-8 53.3µs ±10% 53.0µs ±19% ~ (p=0.730 n=9+9)
ProbablyPrime/n=0-8 3.60ms ± 1% 3.39ms ± 1% -5.87% (p=0.000 n=10+9)
ProbablyPrime/n=1-8 4.42ms ± 1% 4.08ms ± 1% -7.69% (p=0.000 n=10+10)
ProbablyPrime/n=5-8 7.57ms ± 2% 6.79ms ± 1% -10.24% (p=0.000 n=10+10)
ProbablyPrime/n=10-8 11.6ms ± 2% 10.2ms ± 1% -11.69% (p=0.000 n=10+10)
ProbablyPrime/n=20-8 19.4ms ± 2% 16.9ms ± 2% -12.89% (p=0.000 n=10+10)
ProbablyPrime/Lucas-8 2.81ms ± 2% 2.72ms ± 1% -3.22% (p=0.000 n=10+9)
ProbablyPrime/MillerRabinBase2-8 797µs ± 1% 680µs ± 1% -14.64% (p=0.000 n=10+10)

name old speed new speed delta
AddVV/1-8 17.1GB/s ± 6% 18.0GB/s ± 2% ~ (p=0.122 n=10+8)
AddVV/2-8 32.4GB/s ± 2% 32.2GB/s ± 4% ~ (p=0.661 n=10+9)
AddVV/3-8 38.6GB/s ± 2% 38.9GB/s ± 1% ~ (p=0.113 n=10+9)
AddVV/4-8 45.8GB/s ± 2% 45.8GB/s ± 2% ~ (p=0.796 n=10+10)
AddVV/5-8 48.1GB/s ± 2% 48.3GB/s ± 1% ~ (p=0.315 n=10+10)
AddVV/10-8 78.9GB/s ± 1% 78.9GB/s ± 2% ~ (p=0.353 n=10+10)
AddVV/100-8 136GB/s ± 2% 137GB/s ± 1% ~ (p=0.971 n=10+10)
AddVV/1000-8 164GB/s ± 1% 164GB/s ± 4% ~ (p=0.853 n=10+10)
AddVV/10000-8 126GB/s ± 6% 129GB/s ± 2% ~ (p=0.063 n=10+10)
AddVV/100000-8 116GB/s ± 3% 116GB/s ± 3% ~ (p=0.796 n=10+10)
AddVW/1-8 2.64GB/s ± 3% 2.64GB/s ± 3% ~ (p=0.579 n=10+10)
AddVW/2-8 4.49GB/s ± 2% 4.44GB/s ± 2% -1.09% (p=0.040 n=9+9)
AddVW/3-8 6.36GB/s ± 1% 6.34GB/s ± 2% ~ (p=0.684 n=10+10)
AddVW/4-8 6.83GB/s ± 1% 6.82GB/s ± 2% ~ (p=0.905 n=10+9)
AddVW/5-8 8.75GB/s ± 1% 8.73GB/s ± 1% ~ (p=0.796 n=10+10)
AddVW/10-8 10.5GB/s ± 2% 10.5GB/s ± 1% ~ (p=0.971 n=10+10)
AddVW/100-8 19.5GB/s ± 2% 18.9GB/s ± 2% -3.22% (p=0.000 n=10+10)
AddVW/1000-8 20.7GB/s ± 2% 20.6GB/s ± 4% ~ (p=0.631 n=10+10)
AddVW/10000-8 20.6GB/s ± 3% 20.7GB/s ± 3% ~ (p=0.481 n=10+10)
AddVW/100000-8 19.4GB/s ± 2% 19.2GB/s ± 3% ~ (p=0.165 n=10+10)
AddMulVVW/1-8 19.5GB/s ± 2% 19.7GB/s ± 3% ~ (p=0.123 n=10+10)
AddMulVVW/2-8 30.1GB/s ± 2% 30.2GB/s ± 3% ~ (p=0.297 n=9+9)
AddMulVVW/3-8 37.9GB/s ± 2% 36.5GB/s ± 2% -3.63% (p=0.000 n=10+10)
AddMulVVW/4-8 40.0GB/s ± 2% 39.4GB/s ± 2% -1.58% (p=0.001 n=10+10)
AddMulVVW/5-8 47.3GB/s ± 2% 46.6GB/s ± 1% -1.35% (p=0.001 n=9+9)
AddMulVVW/10-8 52.3GB/s ± 2% 60.6GB/s ± 3% +15.76% (p=0.000 n=10+10)
AddMulVVW/100-8 80.3GB/s ± 2% 122.1GB/s ± 1% +51.92% (p=0.000 n=10+10)
AddMulVVW/1000-8 92.0GB/s ± 1% 130.3GB/s ± 2% +41.61% (p=0.000 n=9+10)
AddMulVVW/10000-8 88.2GB/s ± 2% 108.2GB/s ± 5% +22.66% (p=0.000 n=10+10)
AddMulVVW/100000-8 88.2GB/s ± 2% 102.9GB/s ± 2% +16.69% (p=0.000 n=10+10)

Change-Id: Ic98e30c91d437d845fed03e07e976c3fdbf02b36
Reviewed-on: https://go-review.googlesource.com/74851
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Adam Langley <agl@golang.org>

c3935c08

23 Feb, 2018 17 commits

archive/zip: fix handling of Info-ZIP Unix extended timestamps · 9697a119

Joe Tsai authored Feb 23, 2018

The Info-ZIP Unix1 extra field is specified as such:
>>>
Value    Size   Description
-----    ----   -----------
0x5855   Short  tag for this extra block type ("UX")
TSize    Short  total data size for this block
AcTime   Long   time of last access (GMT/UTC)
ModTime  Long   time of last modification (GMT/UTC)
<<<

The previous handling was incorrect in that it read the AcTime field
instead of the ModTime field.

The test-osx.zip test unfortunately locked in the wrong behavior.
Manually parsing that ZIP file shows that the encoded MS-DOS
date and time are 0x4b5f and 0xa97d, which corresponds with a
date of 2017-10-31 21:11:58, which matches the correct mod time
(off by 1 second due to MS-DOS timestamp resolution).

Fixes #23901

Change-Id: I567824c66e8316b9acd103dbecde366874a4b7ef
Reviewed-on: https://go-review.googlesource.com/96895
Run-TryBot: Joe Tsai <joetsai@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

9697a119

runtime: don't check for String/Error methods in printany · 804e3e56

Ian Lance Taylor authored Feb 23, 2018

They have either already been called by preprintpanics, or they can
not be called safely because of the various conditions checked at the
start of gopanic.

Fixes #24059

Change-Id: I4a6233d12c9f7aaaee72f343257ea108bae79241
Reviewed-on: https://go-review.googlesource.com/96755Reviewed-by: Austin Clements <austin@google.com>

804e3e56

os: respect umask in Mkdir and OpenFile on BSD systems when perm has ModeSticky set · a5e8e2d9

Yuval Pavel Zholkover authored Dec 16, 2017

Instead of calling Chmod directly on perm, stat the created file/dir to extract the
actual permission bits which can be different from perm due to umask.

Fixes #23120.

Change-Id: I3e70032451fc254bf48ce9627e98988f84af8d91
Reviewed-on: https://go-review.googlesource.com/84477
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

a5e8e2d9

runtime: reduce arena size to 4MB on 64-bit Windows · 78846472

Austin Clements authored Feb 23, 2018

Currently, we use 64MB heap arenas on 64-bit platforms. This works
well on UNIX-like OSes because they treat untouched pages as
essentially free. However, on Windows, committed memory is charged
against a process whether or not it has demand-faulted physical pages
in. Hence, on Windows, even a process with a tiny heap will commit
64MB for one heap arena, plus another 32MB for the arena map. Things
are much worse under the race detector, which increases the heap
commitment by a factor of 5.5X, leading to 384MB of committed memory
at runtime init.

Fix this by reducing the heap arena size to 4MB on Windows.

To counterbalance the effect of increasing the arena map size by a
factor of 16, and to further reduce the impact of the commitment for
the arena map, we switch from a single entry L1 arena map to a 64
entry L1 arena map.

Compared to the original arena design, this slows down the
x/benchmarks garbage benchmark by 0.49% (the slow down of this commit
alone is 1.59%, but the previous commit bought us a 1% speed-up):

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.29ms ± 1%  +0.49%  (p=0.000 n=17+18)

(https://perf.golang.org/search?q=upload:20180223.1)

(This was measured on linux/amd64 by modifying its arena configuration
as above.)

Fixes #23900.

Change-Id: I6b7fa5ecebee2947bf20cfeb78c248809469c6b1
Reviewed-on: https://go-review.googlesource.com/96780
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

78846472

runtime: support a two-level arena map · ec252105

Austin Clements authored Feb 23, 2018

Currently, the heap arena map is a single, large array that covers
every possible arena frame in the entire address space. This is
practical up to about 48 bits of address space with 64 MB arenas.

However, there are two problems with this:

1. mips64, ppc64, and s390x support full 64-bit address spaces (though
   on Linux only s390x has kernel support for 64-bit address spaces).
   On these platforms, it would be good to support these larger
   address spaces.

2. On Windows, processes are charged for untouched memory, so for
   processes with small heaps, the mostly-untouched 32 MB arena map
   plus a 64 MB arena are significant overhead. Hence, it would be
   good to reduce both the arena map size and the arena size, but with
   a single-level arena, these are inversely proportional.

This CL adds support for a two-level arena map. Arena frame numbers
are now divided into arenaL1Bits of L1 index and arenaL2Bits of L2
index.

At the moment, arenaL1Bits is always 0, so we effectively have a
single level map. We do a few things so that this has no cost beyond
the current single-level map:

1. We embed the L2 array directly in mheap, so if there's a single
   entry in the L2 array, the representation is identical to the
   current representation and there's no extra level of indirection.

2. Hot code that accesses the arena map is structured so that it
   optimizes to nearly the same machine code as it does currently.

3. We make some small tweaks to hot code paths and to the inliner
   itself to keep some important functions inlined despite their
   now-larger ASTs. In particular, this is necessary for
   heapBitsForAddr and heapBits.next.

Possibly as a result of some of the tweaks, this actually slightly
improves the performance of the x/benchmarks garbage benchmark:

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.26ms ± 1%  -1.07%  (p=0.000 n=17+19)

(https://perf.golang.org/search?q=upload:20180223.2)

For #23900.

Change-Id: If5164e0961754f97eb9eca58f837f36d759505ff
Reviewed-on: https://go-review.googlesource.com/96779
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

ec252105

cmd/compile: teach front-end deadcode about && and || · 2dbf15e8

Austin Clements authored Feb 23, 2018

The front-end dead code elimination is very simple. Currently, it just
looks for if statements with constant boolean conditions. Its main
purpose is to reduce load on the compiler and shrink code before
inlining computes hairiness.

This CL teaches front-end dead code elimination about short-circuiting
boolean expressions && and ||, since they're essentially the same as
if statements.

This also teaches the inliner that the constant 'if' form left behind
by deadcode is free.

These changes will help with runtime modifications in the next CL that
would otherwise inhibit inlining in some hot code paths. Currently,
however, they have no significant impact on benchmarks.

Change-Id: I886203b3c4acdbfef08148fddd7f3a7af5afc7c1
Reviewed-on: https://go-review.googlesource.com/96778
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>

2dbf15e8

runtime: rename "arena index" to "arena map" · 33b76920

Austin Clements authored Feb 22, 2018

There are too many places where I want to talk about "indexing into
the arena index". Make this less awkward and ambiguous by calling it
the "arena map" instead.

Change-Id: I726b0667bb2139dbc006175a0ec09a871cdf73f9
Reviewed-on: https://go-review.googlesource.com/96777
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>

33b76920

runtime: don't assume arena is in address order · 9680980e

Austin Clements authored Feb 22, 2018

On amd64, the arena is no longer in address space order, but currently
the heap dumper assumes that it is. Fix this assumption.

Change-Id: Iab1953cd36b359d0fb78ed49e5eb813116a18855
Reviewed-on: https://go-review.googlesource.com/96776
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

9680980e

path: use OS-specific function in MkdirAll, don't always keep trailing slash · b86e7668

Ian Lance Taylor authored Feb 19, 2018

CL 86295 changed MkdirAll to always pass a trailing path separator to
support extended-length paths on Windows.

However, when Stat is called on an existing file followed by trailing
slash, it will return a "not a directory" error, skipping the fast
path at the beginning of MkdirAll.

This change fixes MkdirAll to only pass the trailing path separator
where required on Windows, by using an OS-specific function fixRootDirectory.

Updates #23918

Change-Id: I23f84a20e65ccce556efa743d026d352b4812c34
Reviewed-on: https://go-review.googlesource.com/95255
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David du Colombier <0intro@gmail.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>

b86e7668

cmd/vet: use type info to detect the atomic funcs · bae3fd66

Daniel Martí authored Feb 03, 2018

Simply checking if a name is "atomic" isn't enough, as that might be a
var or another imported package. Now that vet requires type information,
we can do better. And add a simple regression test.

Change-Id: Ibd2004428374e3628cd3cd0ffb5f37cedaf448ea
Reviewed-on: https://go-review.googlesource.com/91795
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Robert Griesemer <gri@golang.org>

bae3fd66

crypto/x509: tighten EKU checking for requested EKUs. · 0681c7c3

Adam Langley authored Feb 22, 2018

There are, sadly, many exceptions to EKU checking to reflect mistakes
that CAs have made in practice. However, the requirements for checking
requested EKUs against the leaf should be tighter than for checking leaf
EKUs against a CA.

Fixes #23884

Change-Id: I05ea874c4ada0696d8bb18cac4377c0b398fcb5e
Reviewed-on: https://go-review.googlesource.com/96379Reviewed-by: Jonathan Rudenberg <jonathan@titanous.com>
Reviewed-by: Filippo Valsorda <hi@filippo.io>
Run-TryBot: Filippo Valsorda <hi@filippo.io>
TryBot-Result: Gobot Gobot <gobot@golang.org>

0681c7c3

regexp: Regexp shouldn't keep references to inputs · 72635401

Oleg Bulatov authored Feb 23, 2018

If you try to find something in a slice of bytes using a Regexp object,
the byte array will not be released by GC until you use the Regexp object
on another slice of bytes. It happens because the Regexp object keep
references to the input data in its cache.

Change-Id: I873107f15c1900aa53ccae5d29dbc885b9562808
Reviewed-on: https://go-review.googlesource.com/96715Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

72635401

cmd/compile: add code generation tests for sqrt intrinsics · 37a038a3

Alberto Donizetti authored Feb 23, 2018

Add "sqrt-intrisified" code generation tests for mips64 and 386, where
we weren't intrisifying math.Sqrt (see CL 96615 and CL 95916), and for
mips and amd64, which lacked sqrt intrinsics tests.

Change-Id: I0cfc08aec6eefd47f3cd7a5995a89393e8b7ed9e
Reviewed-on: https://go-review.googlesource.com/96716
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

37a038a3

runtime: rename the TestGcHashmapIndirection to TestGcMapIndirection · fceaa2e2

mingrammer authored Feb 23, 2018

There was still the word 'Hashmap' in gc_test.go, so I renamed it to just 'Map'

Previous renaming commit: https://golang.org/cl/90336

Change-Id: I5b0e5c2229d1c30937c7216247f4533effb81ce7
Reviewed-on: https://go-review.googlesource.com/96675Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

fceaa2e2

cmd/compile: intrinsify math.Sqrt on 386 · 9ee78af8

Alberto Donizetti authored Feb 23, 2018

It seems like all the pieces were already there, it only needed the
final plumbing.

Before:

	0x001b 00027 (test.go:9)	MOVSD	X0, (SP)
	0x0020 00032 (test.go:9)	CALL	math.Sqrt(SB)
	0x0025 00037 (test.go:9)	MOVSD	8(SP), X0

After:

	0x0018 00024 (test.go:9)	SQRTSD	X0, X0

name    old time/op  new time/op  delta
Sqrt-4  4.60ns ± 2%  0.45ns ± 1%  -90.33%  (p=0.000 n=10+10)

Change-Id: I0f623958e19e726840140bf9b495d3f3a9184b9d
Reviewed-on: https://go-review.googlesource.com/96615
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

9ee78af8

cmd/compile: use | in the last repetitive generic rules · f6c67813

Alberto Donizetti authored Feb 22, 2018

This change or-ifies the last low-hanging rules in generic. Again,
this is limited at short and repetitive rules, where the use or ors
does not impact readability.

Ran rulegen, no change in the actual compiler code.

Change-Id: I972b523bc08532f173a3645b47d6936b6e1218c8
Reviewed-on: https://go-review.googlesource.com/96335Reviewed-by: Giovanni Bajo <rasky@develer.com>

f6c67813

runtime: fix a few typos in comments · 5b3cd560

Jerrin Shaji George authored Feb 22, 2018

Change-Id: I07a1eb02ffc621c5696b49491181300bf411f822
Reviewed-on: https://go-review.googlesource.com/96475Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

5b3cd560

22 Feb, 2018 10 commits

go/types: add -panic flag to gotype command for debugging · 70b09c72

Robert Griesemer authored Feb 22, 2018

Setting -panic will cause gotype to panic with the first reported
error, producing a stack trace for debugging.

For #23914.

Change-Id: I40c41cf10aa13d1dd9a099f727ef4201802de13a
Reviewed-on: https://go-review.googlesource.com/96375Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

70b09c72

syscall: remove list of unimplemented syscalls · 6450c591

Tobias Klauser authored Feb 22, 2018

The syscall package is frozen and we don't want to encourage anyone to
implement these syscalls.

Change-Id: I6b6e33e32a4b097da6012226aa15300735e50e9f
Reviewed-on: https://go-review.googlesource.com/96315Reviewed-by: Ian Lance Taylor <iant@golang.org>

6450c591

go/types: fix regression with short variable declarations · 2465ae64

Robert Griesemer authored Feb 22, 2018

The variables on the lhs of a short variable declaration are
only in scope after the variable declaration. Specifically,
function literals on the rhs of a short variable declaration
must not see newly declared variables on the lhs.

This used to work and this bug was likely introduced with
https://go-review.googlesource.com/c/go/+/83397 for go1.11.
Luckily this is just an oversight and the fix is trivial:
Simply use the mechanism for delayed type-checkin of function
literals introduced in the before-mentioned change here as well.

Fixes #24026.

Change-Id: I74ce3a0d05c5a2a42ce4b27601645964f906e82d
Reviewed-on: https://go-review.googlesource.com/96177Reviewed-by: Alan Donovan <adonovan@google.com>

2465ae64

cmd/compile: fix FP accuracy issue introduced by FMA optimization on ARM64 · 7113d3a5

Ben Shi authored Feb 22, 2018

Two ARM64 rules are added to avoid FP accuracy issue, which causes
build failure.
https://build.golang.org/log/1360f5c9ef3f37968216350283c1013e9681725d

fixes #24033

Change-Id: I9b74b584ab5cc53fa49476de275dc549adf97610
Reviewed-on: https://go-review.googlesource.com/96355Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

7113d3a5

database/sql: add String method to IsolationLevel · ef3ab3f5

Alexey Palazhchenko authored Feb 06, 2018

Fixes #23632

Change-Id: I7197e13df6cf28400a6dd86c110f41129550abb6
Reviewed-on: https://go-review.googlesource.com/92235Reviewed-by: Daniel Theophanes <kardianos@gmail.com>

ef3ab3f5

cmd/compile: use | in the most repetitive s390x rules · 1e05924c

Alberto Donizetti authored Feb 21, 2018

For now, limited to the most repetitive rules that are also short and
simple, so that we can have a substantial conciseness win without
compromising rules readability.

Ran rulegen, no changes in the rewrite files.

Change-Id: I8447784895a218c5c1b4dfa1cdb355bd73dabfd1
Reviewed-on: https://go-review.googlesource.com/95955Reviewed-by: Giovanni Bajo <rasky@develer.com>

1e05924c

reflect: avoid calling common if type is known to be *rtype · 1dbe4c50

Martin Möhrmann authored Feb 21, 2018

If the type of Type is known to be *rtype than the common
function is a no-op and does not need to be called.

name  old time/op  new time/op  delta
New   31.0ns ± 5%  30.2ns ± 4%  -2.74%  (p=0.008 n=20+20)

Change-Id: I5d00346dbc782e34c530166d1ee0499b24068b51
Reviewed-on: https://go-review.googlesource.com/96115Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

1dbe4c50

cmd/compile: improve FP performance on ARM64 · f4c3072c

Ben Shi authored Feb 17, 2018

FMADD/FMSUB/FNMADD/FNMSUB are efficient FP instructions, which can
be used by the comiler to improve FP performance. This CL implements
this optimization.

1. The compilecmp benchmark shows little change.
name old time/op new time/op delta
Template 2.35s ± 4% 2.38s ± 4% ~ (p=0.161 n=15+15)
Unicode 1.36s ± 5% 1.36s ± 4% ~ (p=0.685 n=14+13)
GoTypes 8.11s ± 3% 8.13s ± 2% ~ (p=0.624 n=15+15)
Compiler 40.5s ± 2% 40.7s ± 2% ~ (p=0.137 n=15+15)
SSA 115s ± 3% 116s ± 1% ~ (p=0.270 n=15+14)
Flate 1.46s ± 4% 1.45s ± 5% ~ (p=0.870 n=15+15)
GoParser 1.85s ± 2% 1.87s ± 3% ~ (p=0.477 n=14+15)
Reflect 5.11s ± 4% 5.10s ± 2% ~ (p=0.624 n=15+15)
Tar 2.23s ± 3% 2.23s ± 5% ~ (p=0.624 n=15+15)
XML 2.72s ± 5% 2.74s ± 3% ~ (p=0.290 n=15+14)
[Geo mean] 5.02s 5.03s +0.29%

name old user-time/op new user-time/op delta
Template 2.90s ± 2% 2.90s ± 3% ~ (p=0.780 n=14+15)
Unicode 1.71s ± 5% 1.70s ± 3% ~ (p=0.458 n=14+13)
GoTypes 9.77s ± 2% 9.76s ± 2% ~ (p=0.838 n=15+15)
Compiler 49.1s ± 2% 49.1s ± 2% ~ (p=0.902 n=15+15)
SSA 144s ± 1% 144s ± 2% ~ (p=0.567 n=15+15)
Flate 1.75s ± 5% 1.74s ± 3% ~ (p=0.461 n=15+15)
GoParser 2.22s ± 2% 2.21s ± 3% ~ (p=0.233 n=15+15)
Reflect 5.99s ± 2% 5.95s ± 1% ~ (p=0.093 n=14+15)
Tar 2.68s ± 2% 2.67s ± 3% ~ (p=0.310 n=14+15)
XML 3.22s ± 2% 3.24s ± 3% ~ (p=0.512 n=15+15)
[Geo mean] 6.08s 6.07s -0.19%

name old text-bytes new text-bytes delta
HelloSize 641kB ± 0% 641kB ± 0% ~ (all equal)

name old data-bytes new data-bytes delta
HelloSize 9.46kB ± 0% 9.46kB ± 0% ~ (all equal)

name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)

name old exe-bytes new exe-bytes delta
HelloSize 1.24MB ± 0% 1.24MB ± 0% ~ (all equal)

2. The go1 benchmark shows little improvement in total (excluding noise),
but some improvement in test case Mandelbrot200 and FmtFprintfFloat.
name old time/op new time/op delta
BinaryTree17-4 42.1s ± 2% 42.0s ± 2% ~ (p=0.453 n=30+28)
Fannkuch11-4 33.5s ± 3% 33.3s ± 3% -0.38% (p=0.045 n=30+30)
FmtFprintfEmpty-4 534ns ± 0% 534ns ± 0% ~ (all equal)
FmtFprintfString-4 1.09µs ± 0% 1.09µs ± 0% -0.27% (p=0.000 n=23+17)
FmtFprintfInt-4 1.16µs ± 3% 1.16µs ± 3% ~ (p=0.714 n=30+30)
FmtFprintfIntInt-4 1.76µs ± 1% 1.77µs ± 0% +0.15% (p=0.002 n=23+23)
FmtFprintfPrefixedInt-4 2.21µs ± 3% 2.20µs ± 3% ~ (p=0.390 n=30+30)
FmtFprintfFloat-4 3.28µs ± 0% 3.11µs ± 0% -5.01% (p=0.000 n=25+26)
FmtManyArgs-4 7.18µs ± 0% 7.19µs ± 0% +0.13% (p=0.000 n=24+25)
GobDecode-4 94.9ms ± 0% 95.6ms ± 5% +0.83% (p=0.002 n=23+29)
GobEncode-4 80.7ms ± 4% 79.8ms ± 0% -1.11% (p=0.003 n=30+24)
Gzip-4 4.58s ± 4% 4.59s ± 3% +0.26% (p=0.002 n=30+26)
Gunzip-4 449ms ± 4% 443ms ± 0% ~ (p=0.096 n=30+26)
HTTPClientServer-4 553µs ± 1% 548µs ± 1% -0.96% (p=0.000 n=30+30)
JSONEncode-4 215ms ± 4% 214ms ± 4% -0.29% (p=0.000 n=30+30)
JSONDecode-4 868ms ± 4% 875ms ± 5% +0.79% (p=0.008 n=30+30)
Mandelbrot200-4 51.4ms ± 0% 46.7ms ± 3% -9.09% (p=0.000 n=25+26)
GoParse-4 42.1ms ± 0% 41.8ms ± 0% -0.61% (p=0.000 n=25+24)
RegexpMatchEasy0_32-4 1.02µs ± 4% 1.02µs ± 4% -0.17% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 3.90µs ± 0% 3.95µs ± 4% ~ (p=0.516 n=23+30)
RegexpMatchEasy1_32-4 970ns ± 3% 973ns ± 3% ~ (p=0.951 n=30+30)
RegexpMatchEasy1_1K-4 6.43µs ± 3% 6.33µs ± 0% -1.62% (p=0.000 n=30+25)
RegexpMatchMedium_32-4 1.75µs ± 0% 1.75µs ± 0% ~ (p=0.422 n=25+24)
RegexpMatchMedium_1K-4 568µs ± 3% 562µs ± 0% ~ (p=0.079 n=30+24)
RegexpMatchHard_32-4 30.8µs ± 0% 31.2µs ± 4% +1.46% (p=0.018 n=23+30)
RegexpMatchHard_1K-4 932µs ± 0% 946µs ± 3% +1.49% (p=0.000 n=24+30)
Revcomp-4 7.69s ± 3% 7.69s ± 2% +0.04% (p=0.032 n=24+25)
Template-4 893ms ± 5% 880ms ± 6% -1.53% (p=0.000 n=30+30)
TimeParse-4 4.90µs ± 3% 4.84µs ± 0% ~ (p=0.080 n=30+25)
TimeFormat-4 4.70µs ± 1% 4.76µs ± 0% +1.21% (p=0.000 n=23+26)
[Geo mean] 710µs 706µs -0.63%

name old speed new speed delta
GobDecode-4 8.09MB/s ± 0% 8.03MB/s ± 5% -0.77% (p=0.002 n=23+29)
GobEncode-4 9.52MB/s ± 4% 9.62MB/s ± 0% +1.07% (p=0.003 n=30+24)
Gzip-4 4.24MB/s ± 4% 4.23MB/s ± 3% -0.35% (p=0.002 n=30+26)
Gunzip-4 43.2MB/s ± 4% 43.8MB/s ± 0% ~ (p=0.123 n=30+26)
JSONEncode-4 9.03MB/s ± 4% 9.06MB/s ± 4% +0.28% (p=0.000 n=30+30)
JSONDecode-4 2.24MB/s ± 4% 2.22MB/s ± 5% -0.79% (p=0.008 n=30+30)
GoParse-4 1.38MB/s ± 1% 1.38MB/s ± 0% ~ (p=0.401 n=25+17)
RegexpMatchEasy0_32-4 31.4MB/s ± 4% 31.5MB/s ± 3% +0.16% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 262MB/s ± 0% 259MB/s ± 4% ~ (p=0.693 n=23+30)
RegexpMatchEasy1_32-4 33.0MB/s ± 3% 32.9MB/s ± 3% ~ (p=0.139 n=30+30)
RegexpMatchEasy1_1K-4 159MB/s ± 3% 162MB/s ± 0% +1.60% (p=0.000 n=30+25)
RegexpMatchMedium_32-4 570kB/s ± 0% 570kB/s ± 0% ~ (all equal)
RegexpMatchMedium_1K-4 1.80MB/s ± 3% 1.82MB/s ± 0% +1.09% (p=0.007 n=30+24)
RegexpMatchHard_32-4 1.04MB/s ± 0% 1.03MB/s ± 3% -1.38% (p=0.003 n=23+30)
RegexpMatchHard_1K-4 1.10MB/s ± 0% 1.08MB/s ± 3% -1.52% (p=0.000 n=24+30)
Revcomp-4 33.0MB/s ± 3% 33.0MB/s ± 2% ~ (p=0.128 n=24+25)
Template-4 2.17MB/s ± 5% 2.21MB/s ± 6% +1.61% (p=0.000 n=30+30)
[Geo mean] 7.79MB/s 7.79MB/s +0.05%

Change-Id: Ied3dbdb5ba8e386168629cba06fcd4263bbb83e1
Reviewed-on: https://go-review.googlesource.com/94901
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

f4c3072c

cmd/asm: add arm64 instructions for math optimization · f5de4200

erifan01 authored Jan 26, 2018

Add arm64 HW instructions FMADDD, FMADDS, FMSUBD, FMSUBS, FNMADDD, FNMADDS,
FNMSUBD, FNMSUBS, VFMLA, VFMLS, VMOV (element) for math optimization.

Add check on register element index and test cases.

Change-Id: Ice07c50b1a02d488ad2cde2a4e8aea93f3e3afff
Reviewed-on: https://go-review.googlesource.com/90876Reviewed-by: Cherry Zhang <cherryyz@google.com>

f5de4200

cmd/compile: decouple emitted block order from regalloc block order · c18ff184

David Chase authored Jun 30, 2017

While tinkering with different block orders for the preemptible
loop experiment, crashed the register allocator with a "bad"
one (these exist).  Realized that one knob was controlling
two things (register allocation and branch patterns) and
decided that life would be simpler if the two orders were
independent.

Ran some experiments and determined that we have probably,
mostly, been optimizing for register allocation effects, not
branch effects.  Bad block orders for register allocation are
somewhat costly.

This will also allow separate experimentation with perhaps-
better block orders for register allocation.

Change-Id: I6ecf2f24cca178b6f8acc0d3c4caaef043c11ed9
Reviewed-on: https://go-review.googlesource.com/47314
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

c18ff184

21 Feb, 2018 11 commits

cmd/trace: add memory usage reporting · a66af728

Hana Kim authored Feb 06, 2018

Enabled when the tool runs with DEBUG_MEMORY_USAGE=1 env var.
After reporting the usage, it waits until user enters input
(helpful when checking top or other memory monitor)

Also adds net/http/pprof to export debug endpoints.

From the trace included in #21870

$ DEBUG_MEMORY_USAGE=1 go tool trace trace.out
2018/02/21 16:04:49 Parsing trace...
after parsing trace
 Alloc:	3385747848 Bytes
 Sys:	3661654648 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	3488907264 Bytes
 HeapInUse:	3426377728 Bytes
 HeapAlloc:	3385747848 Bytes
Enter to continue...
2018/02/21 16:05:09 Serializing trace...
after generating trace
 Alloc:	4908929616 Bytes
 Sys:	5319063640 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	5032411136 Bytes
 HeapInUse:	4982865920 Bytes
 HeapAlloc:	4908929616 Bytes
Enter to continue...
2018/02/21 16:05:18 Splitting trace...
after spliting trace
 Alloc:	4909026200 Bytes
 Sys:	5319063640 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	5032411136 Bytes
 HeapInUse:	4983046144 Bytes
 HeapAlloc:	4909026200 Bytes
Enter to continue...
2018/02/21 16:05:39 Opening browser. Trace viewer is listening on http://127.0.0.1:33661
after httpJsonTrace
 Alloc:	5288336048 Bytes
 Sys:	7790245896 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	7381123072 Bytes
 HeapInUse:	5324120064 Bytes
 HeapAlloc:	5288336048 Bytes
Enter to continue...

Change-Id: I88bb3cb1af3cb62e4643a8cbafd5823672b2e464
Reviewed-on: https://go-review.googlesource.com/92355Reviewed-by: Peter Weinberger <pjw@google.com>

a66af728

cmd/compile/internal/syntax: simpler position base update for line directives (cleanup) · e2a86b6b

Robert Griesemer authored Feb 21, 2018

The existing code was somewhat convoluted and made several assumptions
about the encoding of position bases:

1) The position's base for a file contained a position whose base
   pointed to itself (which is true but an implementation detail
   of src.Pos).

2) Updating the position base for a line directive required finding
   the base of the most recent's base position.

This change simply stores the file's position base and keeps using it
directly for each line directive (instead of getting it from the most
recently updated base).

Change-Id: I4d80da513bededb636eab0ce53257fda73f0dbc0
Reviewed-on: https://go-review.googlesource.com/95736Reviewed-by: Matthew Dempsky <mdempsky@google.com>

e2a86b6b

net/http: support multipart/mixed in Request.MultipartReader · e02d6bb6

OneOfOne authored Feb 21, 2018

Fixes #23959

GitHub-Last-Rev: 08ce026f52f9fd65b49d99745dffed46a3951585
GitHub-Pull-Request: golang/go#24012
Change-Id: I7e71c41330346dbc4dad6ba813cabfa8a54e2f66
Reviewed-on: https://go-review.googlesource.com/95975
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

e02d6bb6

text/template: fix the documentation of the block action · 669676b7

Yury Smolsky authored Feb 18, 2018

Fixes #23520

Change-Id: Ia834819f3260691a1a0181034ef4b4b945965688
Reviewed-on: https://go-review.googlesource.com/94761Reviewed-by: Andrew Gerrand <adg@golang.org>

669676b7

runtime: clarify address space limit constants and comments · ea8d7a37

Austin Clements authored Feb 20, 2018

Now that we support the full non-contiguous virtual address space of
amd64 hardware, some of the comments and constants related to this are
out of date.

This renames memLimitBits to heapAddrBits because 1<<memLimitBits is
no longer the limit of the address space and rewrites the comment to
focus first on hardware limits (which span OSes) and then discuss
kernel limits.

Second, this eliminates the memLimit constant because there's no
longer a meaningful "highest possible heap pointer value" on amd64.

Updates #23862.

Change-Id: I44b32033d2deb6b69248fb8dda14fc0e65c47f11
Reviewed-on: https://go-review.googlesource.com/95498
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

ea8d7a37

runtime: offset the heap arena index by 2^47 on amd64 · ed1959c6

Austin Clements authored Feb 19, 2018

On amd64, the virtual address space, when interpreted as signed
values, is [-2^47, 2^47). Currently, we only support heap addresses in
the "positive" half of this, [0, 2^47). This suffices for linux/amd64
and windows/amd64, but solaris/amd64 can map user addresses in the
negative part of this range. Specifically, addresses
0xFFFF8000'00000000 to 0xFFFFFD80'00000000 are part of user space.
This leads to "memory allocated by OS not in usable address space"
panic, since we don't map heap arena index space for these addresses.

Fix this by offsetting addresses when computing arena indexes so that
arena entry 0 corresponds to address -2^47 on amd64. We already map
enough arena space for 2^48 heap addresses on 64-bit (because arm64's
virtual address space is [0, 2^48)), so we don't need to grow any
structures to support this.

A different approach would be to simply mask out the top 16 bits.
However, there are two advantages to the offset approach: 1) invalid
heap addresses continue to naturally map to invalid arena indexes so
we don't need extra checks and 2) it perturbs the mapping of addresses
to arena indexes more, which helps check that we don't accidentally
compute incorrect arena indexes somewhere that happen to be right most
of the time.

Several comments and constant names are now somewhat misleading. We'll
fix that in the next CL. This CL is the core change the arena
indexing.

Fixes #23862.

Change-Id: Idb8e299fded04593a286b01a9582da6ddbac2f9a
Reviewed-on: https://go-review.googlesource.com/95497
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

ed1959c6

runtime: abstract indexing of arena index · e9db7b9d

Austin Clements authored Feb 16, 2018

Accessing the arena index is about to get slightly more complicated.
Abstract this away into a set of functions for going back and forth
between addresses and arena slice indexes.

For #23862.

Change-Id: I0b20e74ef47a07b78ed0cf0a6128afe6f6e40f4b
Reviewed-on: https://go-review.googlesource.com/95496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

e9db7b9d

runtime: simplify bulkBarrierPreWrite · 3e214e56

Austin Clements authored Feb 16, 2018

Currently, bulkBarrierPreWrite uses inheap to decide whether the
destination is in the heap or whether to check for stack or global
data. However, this isn't the best question to ask.

Instead, get the span directly and query its state. This lets us
directly determine whether this might be a global, or is stack memory,
or is heap memory.

At this point, inheap is no longer used in the hot path, so drop it
from the must-be-inlined list and substitute spanOf.

This will help in a circuitous way with #23862, since fixing that is
going to push inheap very slightly over the inline-able threshold on a
few platforms.

Change-Id: I5360fc1181183598502409f12979899e1e4d45f7
Reviewed-on: https://go-review.googlesource.com/95495
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

3e214e56

cmd/trace: include P info in goroutine slices · 3e1ac1b0

Hana Kim authored Jan 26, 2018

The task-oriented trace view presents the execution trace organized
based on goroutines. Often, which P a goroutine was running on is
useful, so this CL includes the P ids in the goroutine execution slices.

R=go1.11

Change-Id: I96539bf8215e5c1cd8cc997a90204f57347c48c8
Reviewed-on: https://go-review.googlesource.com/90221Reviewed-by: Heschi Kreinick <heschi@google.com>

3e1ac1b0

cmd/trace: add user log event in the task-oriented trace view · f42418b2

Hana Kim authored Jan 26, 2018

Also append stack traces to task create/end slices.

R=go1.11

Change-Id: I2adb342e92b36d30bee2860393618eb4064450cf
Reviewed-on: https://go-review.googlesource.com/90220Reviewed-by: Heschi Kreinick <heschi@google.com>

f42418b2

cmd/trace: present the GC time in the usertask view · cacf8127

Hana Kim authored Jan 25, 2018

The GC time for a task is defined by the sum of GC duration
overlapping with the task's duration.

Also, grey out non-overlapping slices in the task-oriented
trace view.

R=go1.11

Change-Id: I42def0eb520f5d9bd07edd265e558706f6fab552
Reviewed-on: https://go-review.googlesource.com/90219Reviewed-by: Heschi Kreinick <heschi@google.com>

cacf8127