cmd/6g: change sbop swap logic

I added the nl->op == OLITERAL case during the recent performance round, and while it helps for small integer constants, it hurts for floating point constants. In the Mandelbrot benchmark it causes 2*Zr*Zi to compile like Zr*2*Zi: 0x000000000042663d <+249>: movsd %xmm6,%xmm0 0x0000000000426641 <+253>: movsd $2,%xmm1 0x000000000042664a <+262>: mulsd %xmm1,%xmm0 0x000000000042664e <+266>: mulsd %xmm5,%xmm0 instead of: 0x0000000000426835 <+276>: movsd $2,%xmm0 0x000000000042683e <+285>: mulsd %xmm6,%xmm0 0x0000000000426842 <+289>: mulsd %xmm5,%xmm0 It is unclear why that has such a dramatic performance effect in a tight loop, but it's obviously slightly better code, so go with it. benchmark old ns/op new ns/op delta BenchmarkBinaryTree17 5957470000 5973924000 +0.28% BenchmarkFannkuch11 3811295000 3869128000 +1.52% BenchmarkGobDecode 26001900 25670500 -1.27% BenchmarkGobEncode 12051430 11948590 -0.85% BenchmarkGzip 177432 174821 -1.47% BenchmarkGunzip 10967 10756 -1.92% BenchmarkJSONEncode 78924750 79746900 +1.04% BenchmarkJSONDecode 313606400 307081600 -2.08% BenchmarkMandelbrot200 13670860 8200725 -40.01% !!! BenchmarkRevcomp25M 1179194000 1206539000 +2.32% BenchmarkTemplate 447931200 443948200 -0.89% BenchmarkMD5Hash1K 2856 2873 +0.60% BenchmarkMD5Hash8K 22083 22029 -0.24% benchmark old MB/s new MB/s speedup BenchmarkGobDecode 29.52 29.90 1.01x BenchmarkGobEncode 63.69 64.24 1.01x BenchmarkJSONEncode 24.59 24.33 0.99x BenchmarkJSONDecode 6.19 6.32 1.02x BenchmarkRevcomp25M 215.54 210.66 0.98x BenchmarkTemplate 4.33 4.37 1.01x BenchmarkMD5Hash1K 358.54 356.31 0.99x BenchmarkMD5Hash8K 370.95 371.86 1.00x R=ken2 CC=golang-dev https://golang.org/cl/6261051

cmd/6g: change sbop swap logic
I added the nl->op == OLITERAL case during the recent performance round, and while it helps for small integer constants, it hurts for floating point constants. In the Mandelbrot benchmark it causes 2*Zr*Zi to compile like Zr*2*Zi: 0x000000000042663d <+249>: movsd %xmm6,%xmm0 0x0000000000426641 <+253>: movsd $2,%xmm1 0x000000000042664a <+262>: mulsd %xmm1,%xmm0 0x000000000042664e <+266>: mulsd %xmm5,%xmm0 instead of: 0x0000000000426835 <+276>: movsd $2,%xmm0 0x000000000042683e <+285>: mulsd %xmm6,%xmm0 0x0000000000426842 <+289>: mulsd %xmm5,%xmm0 It is unclear why that has such a dramatic performance effect in a tight loop, but it's obviously slightly better code, so go with it. benchmark old ns/op new ns/op delta BenchmarkBinaryTree17 5957470000 5973924000 +0.28% BenchmarkFannkuch11 3811295000 3869128000 +1.52% BenchmarkGobDecode 26001900 25670500 -1.27% BenchmarkGobEncode 12051430 11948590 -0.85% BenchmarkGzip 177432 174821 -1.47% BenchmarkGunzip 10967 10756 -1.92% BenchmarkJSONEncode 78924750 79746900 +1.04% BenchmarkJSONDecode 313606400 307081600 -2.08% BenchmarkMandelbrot200 13670860 8200725 -40.01% !!! BenchmarkRevcomp25M 1179194000 1206539000 +2.32% BenchmarkTemplate 447931200 443948200 -0.89% BenchmarkMD5Hash1K 2856 2873 +0.60% BenchmarkMD5Hash8K 22083 22029 -0.24% benchmark old MB/s new MB/s speedup BenchmarkGobDecode 29.52 29.90 1.01x BenchmarkGobEncode 63.69 64.24 1.01x BenchmarkJSONEncode 24.59 24.33 0.99x BenchmarkJSONDecode 6.19 6.32 1.02x BenchmarkRevcomp25M 215.54 210.66 0.98x BenchmarkTemplate 4.33 4.37 1.01x BenchmarkMD5Hash1K 358.54 356.31 0.99x BenchmarkMD5Hash8K 370.95 371.86 1.00x R=ken2 CC=golang-dev https://golang.org/cl/6261051
de96df1b · Russ Cox · dbcdce58 · de96df1b
Commit de96df1b authored May 30, 2012 by Russ Cox
Hide whitespace changes
Inline Side-by-side

Showing with 19 additions and 1 deletion

cgen.c src/cmd/6g/cgen.c +19 -1

No files found.
--- a/src/cmd/6g/cgen.c
+++ b/src/cmd/6g/cgen.c
@@ -396,7 +396,25 @@ cgen(Node *n, Node *res)
 	goto ret;

 sbop:	// symmetric binary
-	if(nl->ullman < nr->ullman || nl->op == OLITERAL) {
+	/*
+	 * put simplest on right - we'll generate into left
+	 * and then adjust it using the computation of right.
+	 * constants and variables have the same ullman
+	 * count, so look for constants specially.
+	 *
+	 * an integer constant we can use as an immediate
+	 * is simpler than a variable - we can use the immediate
+	 * in the adjustment instruction directly - so it goes
+	 * on the right.
+	 *
+	 * other constants, like big integers or floating point
+	 * constants, require a mov into a register, so those
+	 * might as well go on the left, so we can reuse that
+	 * register for the computation.
+	 */
+	if(nl->ullman < nr->ullman ||
+	   (nl->ullman == nr->ullman &&
+	    (smallintconst(nl) || (nr->op == OLITERAL && !smallintconst(nr))))) {
 		r = nl;
 		nl = nr;
 		nr = r;