• David Chase's avatar
    cmd/compile: use sparse algorithm for phis in large program · 6b99fb5b
    David Chase authored
    This adds a sparse method for locating nearest ancestors
    in a dominator tree, and checks blocks with more than one
    predecessor for differences and inserts phi functions where
    there are.
    
    Uses reversed post order to cut number of passes, running
    it from first def to last use ("last use" for paramout and
    mem is end-of-program; last use for a phi input from a
    backedge is the source of the back edge)
    
    Includes a cutover from old algorithm to new to avoid paying
    large constant factor for small programs.  This keeps normal
    builds running at about the same time, while not running
    over-long on large machine-generated inputs.
    
    Add "phase" flags for ssa/build -- ssa/build/stats prints
    number of blocks, values (before and after linking references
    and inserting phis, so expansion can be measured), and their
    product; the product governs the cutover, where a good value
    seems to be somewhere between 1 and 5 million.
    
    Among the files compiled by make.bash, this is the shape of
    the tail of the distribution for #blocks, #vars, and their
    product:
    
    	 #blocks	#vars	    product
     max	6171	28180	173,898,780
    99.9%	1641	 6548	 10,401,878
      99%	 463	 1909	    873,721
      95%	 152	  639	     95,235
      90%	  84	  359	     30,021
    
    The old algorithm is indeed usually fastest, for 99%ile
    values of usually.
    
    The fix to LookupVarOutgoing
    ( https://go-review.googlesource.com/#/c/22790/ )
    deals with some of the same problems addressed by this CL,
    but on at least one bug ( #15537 ) this change is still
    a significant help.
    
    With this CL:
    /tmp/gopath$ rm -rf pkg bin
    /tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
       github.com/gogo/protobuf/test/theproto3/combos/...
    ...
    real	4m35.200s
    user	13m16.644s
    sys	0m36.712s
    and pprof reports 3.4GB allocated in one of the larger profiles
    
    With tip:
    /tmp/gopath$ rm -rf pkg bin
    /tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
       github.com/gogo/protobuf/test/theproto3/combos/...
    ...
    real	10m36.569s
    user	25m52.286s
    sys	4m3.696s
    and pprof reports 8.3GB allocated in the same larger profile
    
    With this CL, most of the compilation time on the benchmarked
    input is spent in register/stack allocation (cumulative 53%)
    and in the sparse lookup algorithm itself (cumulative 20%).
    
    Fixes #15537.
    
    Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
    Reviewed-on: https://go-review.googlesource.com/22342Reviewed-by: 's avatarKeith Randall <khr@golang.org>
    Run-TryBot: David Chase <drchase@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    6b99fb5b
prove.go 15 KB