1. 30 May, 2012 28 commits
    • cmd/go: add -ccflags · 5b2cd445
      Dave Cheney authored
      Add -ccflags to pass arguments to {5,6,8}c
      similar to -gcflags for {5,6,8}g.
      
      R=golang-dev, rsc
      CC=golang-dev
      https://golang.org/cl/6260047
    • cmd/gc: contiguous loop layout · 001b75c9
      Russ Cox authored
      Drop expecttaken function in favor of extra argument
      to gbranch and bgen. Mark loop condition as likely to
      be true, so that loops are generated inline.
      
      The main benefit here is contiguous code when trying
      to read the generated assembly. It has only minor effects
      on the timing, and they mostly cancel the minor effects
      that aligning function entry points had.  One exception:
      both changes made Fannkuch faster.
      
      Compared to before CL 6244066 (before aligned functions)
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkBinaryTree17    4222117400   4201958800   -0.48%
      BenchmarkFannkuch11      3462631800   3215908600   -7.13%
      BenchmarkGobDecode         20887622     20899164   +0.06%
      BenchmarkGobEncode          9548772      9439083   -1.15%
      BenchmarkGzip                151687       152060   +0.25%
      BenchmarkGunzip                8742         8711   -0.35%
      BenchmarkJSONEncode        62730560     62686700   -0.07%
      BenchmarkJSONDecode       252569180    252368960   -0.08%
      BenchmarkMandelbrot200      5267599      5252531   -0.29%
      BenchmarkRevcomp25M       980813500    985248400   +0.45%
      BenchmarkTemplate         361259100    357414680   -1.06%
      
      Compared to tip (aligned functions):
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkBinaryTree17    4140739800   4201958800   +1.48%
      BenchmarkFannkuch11      3259914400   3215908600   -1.35%
      BenchmarkGobDecode         20620222     20899164   +1.35%
      BenchmarkGobEncode          9384886      9439083   +0.58%
      BenchmarkGzip                150333       152060   +1.15%
      BenchmarkGunzip                8741         8711   -0.34%
      BenchmarkJSONEncode        65210990     62686700   -3.87%
      BenchmarkJSONDecode       249394860    252368960   +1.19%
      BenchmarkMandelbrot200      5273394      5252531   -0.40%
      BenchmarkRevcomp25M       996013800    985248400   -1.08%
      BenchmarkTemplate         360620840    357414680   -0.89%
      
      R=ken2
      CC=golang-dev
      https://golang.org/cl/6245069
    • net: fix test to avoid unintentional nil pointer dereference · aad8e954
      Mikio Hara authored
      R=golang-dev, dave, rsc
      CC=golang-dev
      https://golang.org/cl/6248065
    • cmd/5l: fix PLD · 6a5660f1
      Russ Cox authored
      Was missing break.
      
      R=ken2
      CC=golang-dev
      https://golang.org/cl/6250078
    • cmd/6l, cmd/8l, cmd/5l: add AUNDEF instruction · f2bd3a97
      Russ Cox authored
      On 6l and 8l, this is a real instruction, guaranteed to
      cause an 'undefined instruction' exception.
      
      On 5l, we simulate it as BL to address 0.
      
      The plan is to use it as a signal to the linker that this
      point in the instruction stream cannot be reached
      (hence the changes to nofollow).  This will help the
      compiler explain that panicindex and friends do not
      return without having to put a list of these functions
      in the linker.
      
      R=ken2
      CC=golang-dev
      https://golang.org/cl/6255064
    • cmd/ld: align function entry on arch-specific boundary · 8820ab5d
      Russ Cox authored
      16 seems pretty standard on x86 for function entry.
      I don't know if ARM would benefit, so I used just 4
      (single instruction alignment).
      
      This has a minor absolute effect on the current timings.
      The main hope is that it will make them more consistent from
      run to run.
      
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkBinaryTree17    4222117400   4140739800   -1.93%
      BenchmarkFannkuch11      3462631800   3259914400   -5.85%
      BenchmarkGobDecode         20887622     20620222   -1.28%
      BenchmarkGobEncode          9548772      9384886   -1.72%
      BenchmarkGzip                151687       150333   -0.89%
      BenchmarkGunzip                8742         8741   -0.01%
      BenchmarkJSONEncode        62730560     65210990   +3.95%
      BenchmarkJSONDecode       252569180    249394860   -1.26%
      BenchmarkMandelbrot200      5267599      5273394   +0.11%
      BenchmarkRevcomp25M       980813500    996013800   +1.55%
      BenchmarkTemplate         361259100    360620840   -0.18%
      
      R=ken2
      CC=golang-dev
      https://golang.org/cl/6244066
    • cmd/6l, cmd/8l: fix chaining bug in jump rewrite · b91cf505
      Russ Cox authored
      The code was inconsistent about when it used
      brchain(x) and when it used x directly, with the result
      that you could end up emitting code for brchain(x) but
      leave the jump pointing at an unemitted x.
      
      R=ken2
      CC=golang-dev
      https://golang.org/cl/6250077
    • compress/flate: fix overflow on 2GB input. Reset hashOffset every 16 MB. · 37f046ba
      Ivan Krasin authored
      This bug was introduced in the following revision:
      
      changeset:   11404:26dceba5c610
      user:        Ivan Krasin <krasin@golang.org>
      date:        Mon Jan 23 09:19:39 2012 -0500
      summary:     compress/flate: reduce memory pressure at cost of additional arithmetic operation.
      
      This is the review page for that CL: https://golang.org/cl/5555070/
      
      R=rsc, imkrasin
      CC=golang-dev
      https://golang.org/cl/6249067
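
The idea of resetting an offset periodically so that stored positions never overflow can be sketched as follows. This is an illustration under stated assumptions, not the actual compress/flate code: the `table` type, its fields, the `rebase` method, and the `maxOffset` constant are hypothetical names, and only the 16 MB threshold comes from the commit title.

```go
package main

import "fmt"

// maxOffset is a hypothetical name for the 16 MB threshold.
const maxOffset = 1 << 24

// table is a hypothetical stand-in for a match finder's position table.
type table struct {
	offset int32   // added to buffer indices to form stored positions
	pos    []int32 // stored positions; 0 means "empty"
}

// rebase shifts all stored positions down once offset passes the
// threshold, so the values stay far away from the int32 limit.
func (t *table) rebase() {
	if t.offset <= maxOffset {
		return
	}
	d := t.offset - 1
	t.offset -= d // offset is now 1
	for i, v := range t.pos {
		if v > d {
			t.pos[i] = v - d
		} else {
			t.pos[i] = 0 // too old to reference; mark as empty
		}
	}
}

func main() {
	t := &table{offset: maxOffset + 5, pos: []int32{maxOffset + 10, 3, 0}}
	t.rebase()
	fmt.Println(t.offset, t.pos) // prints 1 [6 0 0]
}
```

Because every stored position is shifted by the same amount, differences between positions are preserved while the absolute values stay bounded.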
    • go-mode: Works for both GNU-Emacs and XEmacs-21.5 · b8a02560
      Mats Lidell authored
      Fixes some portability issues between the Emacsen.
      
      R=golang-dev, sameer, bradfitz, ryanb
      CC=golang-dev
      https://golang.org/cl/6206043
    • test/bench/shootout: more speedups · 6f3ffd4d
      Rob Pike authored
      Most significant in mandelbrot, from avoiding MOVSD between registers,
      but there are others.
      
      R=golang-dev, rsc
      CC=golang-dev
      https://golang.org/cl/6258063
    • cmd/6g: avoid MOVSD between registers · a768de83
      Russ Cox authored
      MOVSD only copies the low half of the packed register pair,
      while MOVAPD copies both halves.  I assume the internal
      register renaming works better with the latter, since it makes
      our code run 25% faster.
      
      Before:
      mandelbrot 16000
              gcc -O2 mandelbrot.c	28.44u 0.00s 28.45r
              gc mandelbrot	44.12u 0.00s 44.13r
              gc_B mandelbrot	44.17u 0.01s 44.19r
      
      After:
      mandelbrot 16000
              gcc -O2 mandelbrot.c	28.22u 0.00s 28.23r
              gc mandelbrot	32.81u 0.00s 32.82r
              gc_B mandelbrot	32.82u 0.00s 32.83r
      
      R=ken2
      CC=golang-dev
      https://golang.org/cl/6248068
    • shootout: make mandelbrot.go more like mandelbrot.c · eb056dbe
      Russ Cox authored
      Surprise! The C code is using floating point values for its counters.
      It's off the critical path, but the Go code and C code are supposed to
      be as similar as possible to make comparisons meaningful.
      
      It doesn't have a significant effect.
      
      R=golang-dev, r
      CC=golang-dev
      https://golang.org/cl/6260058
    • A+C: add Mats Lidell. He signed the agreement with the Sweden email · 3806cc7b
      Sameer Ajmani authored
      address, but his changelist is under the Gmail address.
      
      R=golang-dev, rsc
      CC=golang-dev
      https://golang.org/cl/6248069
    • misc/emacs: Use patch output of gofmt instead of replacing the buffer. · 7b6111a9
      Jean-Marc Eurin authored
      This uses the patch output of gofmt (-d option) and applies each
      chunk to the buffer, instead of replacing the whole buffer.  The
      main advantage is that the undo history is kept across gofmt'ings,
      so it can really be used as a before-save-hook.
      
      R=sameer, sameer
      CC=golang-dev
      https://golang.org/cl/6198047
    • test/bench/shootout/timing.log: mandelbrot is restored · ec4d2135
      Rob Pike authored
      R=golang-dev, bradfitz, rsc
      CC=golang-dev
      https://golang.org/cl/6259054
    • runtime: always initialise procid on netbsd · deb93b0f
      Joel Sing authored
      The correct procid is needed for unparking LWPs on NetBSD - always
      initialise procid in minit() so that cgo works correctly. The non-cgo
      case already works correctly since procid is initialised via
      lwp_create().
      
      R=golang-dev, rsc
      CC=golang-dev
      https://golang.org/cl/6257071
    • runtime: update field types in preparation for GC changes · 334bf95f
      Jan Ziak authored
      R=rsc, remyoudompheng, minux.ma, ality
      CC=golang-dev
      https://golang.org/cl/6242061
    • cmd/ld: increase number of ELF sections · 586b6dfa
      Joel Sing authored
      On NetBSD a cgo enabled binary has more than 32 sections - bump NSECTS
      so that we can actually link them successfully.
      
      R=golang-dev, rsc
      CC=golang-dev
      https://golang.org/cl/6261052
    • runtime: hide symbol table from garbage collector · 46d7d5fc
      Jan Ziak authored
      R=rsc
      CC=golang-dev
      https://golang.org/cl/6243059
    • Marcel van Lohuizen's avatar
    • test/bench/go1: add mandelbrot for floating point · cb9759d0
      Russ Cox authored
      R=golang-dev, bradfitz
      CC=golang-dev
      https://golang.org/cl/6244063
    • cmd/6g: change sbop swap logic · de96df1b
      Russ Cox authored
      I added the nl->op == OLITERAL case during the recent
      performance round, and while it helps for small integer constants,
      it hurts for floating point constants.  In the Mandelbrot benchmark
      it causes 2*Zr*Zi to compile like Zr*2*Zi:
      
              0x000000000042663d <+249>:	movsd  %xmm6,%xmm0
              0x0000000000426641 <+253>:	movsd  $2,%xmm1
              0x000000000042664a <+262>:	mulsd  %xmm1,%xmm0
              0x000000000042664e <+266>:	mulsd  %xmm5,%xmm0
      
      instead of:
      
              0x0000000000426835 <+276>:	movsd  $2,%xmm0
              0x000000000042683e <+285>:	mulsd  %xmm6,%xmm0
              0x0000000000426842 <+289>:	mulsd  %xmm5,%xmm0
      
      It is unclear why that has such a dramatic performance effect
      in a tight loop, but it's obviously slightly better code, so go with it.
      
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkBinaryTree17    5957470000   5973924000   +0.28%
      BenchmarkFannkuch11      3811295000   3869128000   +1.52%
      BenchmarkGobDecode         26001900     25670500   -1.27%
      BenchmarkGobEncode         12051430     11948590   -0.85%
      BenchmarkGzip                177432       174821   -1.47%
      BenchmarkGunzip               10967        10756   -1.92%
      BenchmarkJSONEncode        78924750     79746900   +1.04%
      BenchmarkJSONDecode       313606400    307081600   -2.08%
      BenchmarkMandelbrot200     13670860      8200725  -40.01%  !!!
      BenchmarkRevcomp25M      1179194000   1206539000   +2.32%
      BenchmarkTemplate         447931200    443948200   -0.89%
      BenchmarkMD5Hash1K             2856         2873   +0.60%
      BenchmarkMD5Hash8K            22083        22029   -0.24%
      
      benchmark                  old MB/s     new MB/s  speedup
      BenchmarkGobDecode            29.52        29.90    1.01x
      BenchmarkGobEncode            63.69        64.24    1.01x
      BenchmarkJSONEncode           24.59        24.33    0.99x
      BenchmarkJSONDecode            6.19         6.32    1.02x
      BenchmarkRevcomp25M          215.54       210.66    0.98x
      BenchmarkTemplate              4.33         4.37    1.01x
      BenchmarkMD5Hash1K           358.54       356.31    0.99x
      BenchmarkMD5Hash8K           370.95       371.86    1.00x
      
      R=ken2
      CC=golang-dev
      https://golang.org/cl/6261051
    • image/png: optimize paeth some more. · dbcdce58
      Nigel Tao authored
      filterPaeth takes []byte arguments instead of byte arguments,
      which avoids some redundant computation of the previous pixel
      in the inner loop.
      
      Also eliminate a bounds check in decoding the up filter.
      
      benchmark                       old ns/op    new ns/op    delta
      BenchmarkDecodeGray               3139636      2812531  -10.42%
      BenchmarkDecodeNRGBAGradient     12341520     10971680  -11.10%
      BenchmarkDecodeNRGBAOpaque       10740780      9612455  -10.51%
      BenchmarkDecodePaletted           1819535      1818913   -0.03%
      BenchmarkDecodeRGB                8974695      8178070   -8.88%
      
      R=rsc
      CC=golang-dev
      https://golang.org/cl/6243061
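
For readers unfamiliar with the Paeth filter, here is a minimal sketch of the predictor as defined in the PNG specification, together with a slice-based reconstruction loop in the spirit of the change described above. It is not the image/png implementation: the names `paeth` and `unfilterPaeth` and the `bpp` (bytes per pixel) parameter are assumptions made for illustration.

```go
package main

import "fmt"

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}

// paeth picks whichever of a (left), b (above), c (upper left) is
// closest to a+b-c, breaking ties in the order a, b, c.
func paeth(a, b, c uint8) uint8 {
	p := int(a) + int(b) - int(c)
	pa, pb, pc := abs(p-int(a)), abs(p-int(b)), abs(p-int(c))
	if pa <= pb && pa <= pc {
		return a
	}
	if pb <= pc {
		return b
	}
	return c
}

// unfilterPaeth reconstructs the current row in place from the
// previous row. Working on whole slices lets the loop read the left
// and upper-left bytes by index instead of recomputing them.
func unfilterPaeth(cur, prev []byte, bpp int) {
	for i := range cur {
		var a, c uint8 // zero for the first pixel of the row
		if i >= bpp {
			a, c = cur[i-bpp], prev[i-bpp]
		}
		cur[i] += paeth(a, prev[i], c)
	}
}

func main() {
	cur := []byte{1, 2, 3, 4}
	prev := []byte{10, 20, 30, 40}
	unfilterPaeth(cur, prev, 2)
	fmt.Println(cur)
}
```

The byte addition wraps modulo 256, which matches how PNG filter arithmetic is defined.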
    • Alex Brainman's avatar
      994cdcea
    • runtime: do not unset the special bit after finalization. · 34808787
      Rémy Oudompheng authored
      A block with finalizer might also be profiled. The special bit
      is needed to unregister the block from the profile. It will be
      unset only when the block is freed.
      
      Fixes #3668.
      
      R=golang-dev, rsc
      CC=golang-dev, remy
      https://golang.org/cl/6249066
    • exp/html: Convert \r and \r\n to \n when tokenizing · 4e0749a4
      Andrew Balholm authored
      Also escape "\r" as "&#13;" when rendering HTML.
      
      Pass 2 additional tests.
      
      R=nigeltao
      CC=golang-dev
      https://golang.org/cl/6260046
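
The two behaviours named in this commit can be illustrated at the string level. This is a sketch only: the real tokenizer works on its input buffer rather than on whole strings, and the helper names `normalizeNewlines` and `escapeCR` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeNewlines maps "\r\n" and lone "\r" to "\n".
// The "\r\n" pairs are replaced first so they do not become "\n\n".
func normalizeNewlines(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return strings.ReplaceAll(s, "\r", "\n")
}

// escapeCR writes a carriage return as a character reference so that
// it survives being parsed again after rendering.
func escapeCR(s string) string {
	return strings.ReplaceAll(s, "\r", "&#13;")
}

func main() {
	fmt.Printf("%q\n", normalizeNewlines("a\r\nb\rc\n")) // prints "a\nb\nc\n"
	fmt.Printf("%q\n", escapeCR("x\ry"))                 // prints "x&#13;y"
}
```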
    • runtime: handle windows exceptions, even in cgo programs · afe0e97a
      Alex Brainman authored
      Fixes #3543.
      
      R=golang-dev, kardianos, rsc
      CC=golang-dev, hectorchu, vcc.163
      https://golang.org/cl/6245063
    • exp/html: add some tokenizer and parser benchmarks. · 034fa90d
      Nigel Tao authored
      $GOROOT/src/pkg/exp/html/testdata/go1.html is an execution of the
      $GOROOT/doc/go1.html template by godoc.
      
      Sample numbers on my linux,amd64 desktop:
      BenchmarkParser	     500	   4699198 ns/op	  16.63 MB/s
      --- BENCH: BenchmarkParser
              parse_test.go:409: 1 iterations, 14653 mallocs per iteration
              parse_test.go:409: 100 iterations, 14651 mallocs per iteration
              parse_test.go:409: 500 iterations, 14651 mallocs per iteration
      BenchmarkRawLevelTokenizer	    2000	    904957 ns/op	  86.37 MB/s
      --- BENCH: BenchmarkRawLevelTokenizer
              token_test.go:657: 1 iterations, 28 mallocs per iteration
              token_test.go:657: 100 iterations, 28 mallocs per iteration
              token_test.go:657: 2000 iterations, 28 mallocs per iteration
      BenchmarkLowLevelTokenizer	    2000	   1134300 ns/op	  68.91 MB/s
      --- BENCH: BenchmarkLowLevelTokenizer
              token_test.go:657: 1 iterations, 41 mallocs per iteration
              token_test.go:657: 100 iterations, 41 mallocs per iteration
              token_test.go:657: 2000 iterations, 41 mallocs per iteration
      BenchmarkHighLevelTokenizer	    1000	   2096179 ns/op	  37.29 MB/s
      --- BENCH: BenchmarkHighLevelTokenizer
              token_test.go:657: 1 iterations, 6616 mallocs per iteration
              token_test.go:657: 100 iterations, 6616 mallocs per iteration
              token_test.go:657: 1000 iterations, 6616 mallocs per iteration
      
      R=rsc
      CC=andybalholm, golang-dev, r
      https://golang.org/cl/6257067
  2. 29 May, 2012 12 commits