• Ben Shi's avatar
    cmd/compile: optimize 386 code with FLDPI · e03220a5
    Ben Shi authored
    FLDPI pushes the constant pi to 387's register stack, which is
    more efficient than MOVSSconst/MOVSDconst.
    
    1. This optimization reduces 0.3KB of the total size of pkg/linux_386
    (exlcuding cmd/compile).
    
    2. There is little regression in the go1 benchmark.
    name                     old time/op    new time/op    delta
    BinaryTree17-4              3.30s ± 3%     3.30s ± 2%    ~     (p=0.759 n=40+39)
    Fannkuch11-4                3.53s ± 1%     3.54s ± 1%    ~     (p=0.168 n=40+40)
    FmtFprintfEmpty-4          45.5ns ± 3%    45.6ns ± 3%    ~     (p=0.553 n=40+40)
    FmtFprintfString-4         78.4ns ± 3%    78.3ns ± 3%    ~     (p=0.593 n=40+40)
    FmtFprintfInt-4            88.8ns ± 2%    89.9ns ± 2%    ~     (p=0.083 n=40+33)
    FmtFprintfIntInt-4          140ns ± 4%     140ns ± 4%    ~     (p=0.656 n=40+40)
    FmtFprintfPrefixedInt-4     180ns ± 2%     181ns ± 3%  +0.53%  (p=0.050 n=40+40)
    FmtFprintfFloat-4           408ns ± 4%     411ns ± 3%    ~     (p=0.112 n=40+40)
    FmtManyArgs-4               599ns ± 3%     602ns ± 3%    ~     (p=0.784 n=40+40)
    GobDecode-4                7.24ms ± 6%    7.30ms ± 5%    ~     (p=0.171 n=40+40)
    GobEncode-4                6.98ms ± 5%    6.89ms ± 8%    ~     (p=0.107 n=40+40)
    Gzip-4                      396ms ± 4%     396ms ± 3%    ~     (p=0.852 n=40+40)
    Gunzip-4                   41.3ms ± 3%    41.5ms ± 4%    ~     (p=0.221 n=40+40)
    HTTPClientServer-4         63.4µs ± 3%    63.4µs ± 2%    ~     (p=0.895 n=39+40)
    JSONEncode-4               17.5ms ± 2%    17.5ms ± 3%    ~     (p=0.090 n=40+40)
    JSONDecode-4               60.6ms ± 3%    60.1ms ± 4%    ~     (p=0.184 n=40+40)
    Mandelbrot200-4            7.80ms ± 3%    7.78ms ± 2%    ~     (p=0.512 n=40+40)
    GoParse-4                  3.30ms ± 3%    3.28ms ± 2%  -0.61%  (p=0.034 n=40+40)
    RegexpMatchEasy0_32-4       104ns ± 4%     103ns ± 4%    ~     (p=0.118 n=40+40)
    RegexpMatchEasy0_1K-4       850ns ± 2%     848ns ± 2%    ~     (p=0.370 n=40+40)
    RegexpMatchEasy1_32-4       112ns ± 4%     112ns ± 4%    ~     (p=0.848 n=40+40)
    RegexpMatchEasy1_1K-4      1.04µs ± 4%    1.03µs ± 4%    ~     (p=0.333 n=40+40)
    RegexpMatchMedium_32-4      132ns ± 4%     131ns ± 3%    ~     (p=0.527 n=40+40)
    RegexpMatchMedium_1K-4     43.4µs ± 3%    43.5µs ± 3%    ~     (p=0.111 n=40+40)
    RegexpMatchHard_32-4       2.24µs ± 4%    2.24µs ± 4%    ~     (p=0.441 n=40+40)
    RegexpMatchHard_1K-4       67.9µs ± 3%    68.0µs ± 3%    ~     (p=0.095 n=40+40)
    Revcomp-4                   1.84s ± 2%     1.84s ± 2%    ~     (p=0.677 n=40+40)
    Template-4                 68.4ms ± 3%    68.6ms ± 3%    ~     (p=0.345 n=40+40)
    TimeParse-4                 433ns ± 3%     433ns ± 3%    ~     (p=0.403 n=40+40)
    TimeFormat-4                407ns ± 3%     406ns ± 3%    ~     (p=0.900 n=40+40)
    [Geo mean]                 67.1µs         67.2µs       +0.04%
    
    name                     old speed      new speed      delta
    GobDecode-4               106MB/s ± 5%   105MB/s ± 5%    ~     (p=0.173 n=40+40)
    GobEncode-4               110MB/s ± 5%   112MB/s ± 9%    ~     (p=0.104 n=40+40)
    Gzip-4                   49.0MB/s ± 4%  49.1MB/s ± 4%    ~     (p=0.836 n=40+40)
    Gunzip-4                  471MB/s ± 3%   468MB/s ± 4%    ~     (p=0.218 n=40+40)
    JSONEncode-4              111MB/s ± 2%   111MB/s ± 3%    ~     (p=0.090 n=40+40)
    JSONDecode-4             32.0MB/s ± 3%  32.3MB/s ± 4%    ~     (p=0.194 n=40+40)
    GoParse-4                17.6MB/s ± 3%  17.7MB/s ± 2%  +0.62%  (p=0.035 n=40+40)
    RegexpMatchEasy0_32-4     307MB/s ± 4%   309MB/s ± 4%  +0.70%  (p=0.041 n=40+40)
    RegexpMatchEasy0_1K-4    1.20GB/s ± 3%  1.21GB/s ± 2%    ~     (p=0.353 n=40+40)
    RegexpMatchEasy1_32-4     285MB/s ± 3%   284MB/s ± 4%    ~     (p=0.384 n=40+40)
    RegexpMatchEasy1_1K-4     988MB/s ± 4%   992MB/s ± 3%    ~     (p=0.335 n=40+40)
    RegexpMatchMedium_32-4   7.56MB/s ± 4%  7.57MB/s ± 4%    ~     (p=0.314 n=40+40)
    RegexpMatchMedium_1K-4   23.6MB/s ± 3%  23.6MB/s ± 3%    ~     (p=0.107 n=40+40)
    RegexpMatchHard_32-4     14.3MB/s ± 4%  14.3MB/s ± 4%    ~     (p=0.429 n=40+40)
    RegexpMatchHard_1K-4     15.1MB/s ± 3%  15.1MB/s ± 3%    ~     (p=0.099 n=40+40)
    Revcomp-4                 138MB/s ± 2%   138MB/s ± 2%    ~     (p=0.658 n=40+40)
    Template-4               28.4MB/s ± 3%  28.3MB/s ± 3%    ~     (p=0.331 n=40+40)
    [Geo mean]               80.8MB/s       80.8MB/s       +0.09%
    
    Change-Id: I0cb715eead68ade097a302e7fb80ccbd1d1b511e
    Reviewed-on: https://go-review.googlesource.com/130975
    Run-TryBot: Ben Shi <powerman1st@163.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: 's avatarKeith Randall <khr@golang.org>
    e03220a5
387.go 9.24 KB