• Ilya Tocar's avatar
    compress/flate: optimize huffSym · 4984d843
    Ilya Tocar authored
    By using local variables and assigning them back to decompressor
    at the end of huffSym, we allow compiler to keep them in registers
    and avoid reloading/storing them repeatedly. To archive this,
    moreBits was inlined and specialized to work with local variables.
    Also move EOF error conversion to helper function, to make inlined
    part of moreBits more readable. Together this results in nice speed-up:
    
    name                             old time/op    new time/op    delta
    Decode/Digits/Huffman/1e4-6         278µs ± 1%     240µs ± 2%  -13.72%  (p=0.000 n=10+10)
    Decode/Digits/Huffman/1e5-6        2.38ms ± 1%    2.05ms ± 1%  -14.12%  (p=0.000 n=10+10)
    Decode/Digits/Huffman/1e6-6        23.4ms ± 1%    19.9ms ± 0%  -14.69%  (p=0.000 n=9+9)
    Decode/Digits/Speed/1e4-6           280µs ± 2%     254µs ± 1%   -9.28%  (p=0.000 n=10+9)
    Decode/Digits/Speed/1e5-6          2.53ms ± 1%    2.35ms ± 1%   -7.17%  (p=0.000 n=10+10)
    Decode/Digits/Speed/1e6-6          24.8ms ± 1%    23.0ms ± 1%   -7.22%  (p=0.000 n=10+10)
    Decode/Digits/Default/1e4-6         281µs ± 2%     259µs ± 3%   -8.03%  (p=0.000 n=10+10)
    Decode/Digits/Default/1e5-6        2.45ms ± 1%    2.30ms ± 1%   -6.15%  (p=0.000 n=10+10)
    Decode/Digits/Default/1e6-6        24.1ms ± 1%    22.6ms ± 0%   -6.31%  (p=0.000 n=9+9)
    Decode/Digits/Compression/1e4-6     279µs ± 2%     261µs ± 2%   -6.53%  (p=0.000 n=8+9)
    Decode/Digits/Compression/1e5-6    2.44ms ± 1%    2.30ms ± 1%   -5.72%  (p=0.000 n=10+9)
    Decode/Digits/Compression/1e6-6    24.0ms ± 1%    22.6ms ± 0%   -6.10%  (p=0.000 n=10+9)
    Decode/Twain/Huffman/1e4-6          316µs ± 2%     267µs ± 3%  -15.30%  (p=0.000 n=9+10)
    Decode/Twain/Huffman/1e5-6         2.62ms ± 0%    2.22ms ± 0%  -15.24%  (p=0.000 n=10+10)
    Decode/Twain/Huffman/1e6-6         25.7ms ± 1%    21.8ms ± 0%  -15.19%  (p=0.000 n=10+10)
    Decode/Twain/Speed/1e4-6            290µs ± 1%     264µs ± 2%   -9.17%  (p=0.000 n=9+10)
    Decode/Twain/Speed/1e5-6           2.35ms ± 1%    2.13ms ± 1%   -9.74%  (p=0.000 n=9+10)
    Decode/Twain/Speed/1e6-6           22.9ms ± 0%    20.7ms ± 0%   -9.68%  (p=0.000 n=10+9)
    Decode/Twain/Default/1e4-6          270µs ± 2%     252µs ± 2%   -6.67%  (p=0.000 n=9+10)
    Decode/Twain/Default/1e5-6         2.02ms ± 1%    1.84ms ± 1%   -8.85%  (p=0.000 n=10+10)
    Decode/Twain/Default/1e6-6         19.1ms ± 0%    17.5ms ± 1%   -8.73%  (p=0.000 n=9+10)
    Decode/Twain/Compression/1e4-6      272µs ± 1%     250µs ± 4%   -8.20%  (p=0.000 n=9+10)
    Decode/Twain/Compression/1e5-6     2.01ms ± 0%    1.84ms ± 1%   -8.57%  (p=0.000 n=9+10)
    Decode/Twain/Compression/1e6-6     19.1ms ± 0%    17.4ms ± 1%   -8.75%  (p=0.000 n=9+10)
    
    name                             old speed      new speed      delta
    Decode/Digits/Huffman/1e4-6      35.9MB/s ± 1%  41.7MB/s ± 2%  +15.91%  (p=0.000 n=10+10)
    Decode/Digits/Huffman/1e5-6      41.9MB/s ± 1%  48.8MB/s ± 1%  +16.44%  (p=0.000 n=10+10)
    Decode/Digits/Huffman/1e6-6      42.8MB/s ± 1%  50.2MB/s ± 0%  +17.22%  (p=0.000 n=9+9)
    Decode/Digits/Speed/1e4-6        35.7MB/s ± 2%  39.4MB/s ± 1%  +10.22%  (p=0.000 n=10+9)
    Decode/Digits/Speed/1e5-6        39.6MB/s ± 1%  42.6MB/s ± 1%   +7.73%  (p=0.000 n=10+10)
    Decode/Digits/Speed/1e6-6        40.3MB/s ± 1%  43.4MB/s ± 1%   +7.78%  (p=0.000 n=10+10)
    Decode/Digits/Default/1e4-6      35.6MB/s ± 2%  38.7MB/s ± 2%   +8.74%  (p=0.000 n=10+10)
    Decode/Digits/Default/1e5-6      40.9MB/s ± 1%  43.6MB/s ± 1%   +6.55%  (p=0.000 n=10+10)
    Decode/Digits/Default/1e6-6      41.5MB/s ± 1%  44.3MB/s ± 0%   +6.73%  (p=0.000 n=9+9)
    Decode/Digits/Compression/1e4-6  35.8MB/s ± 2%  38.3MB/s ± 2%   +6.99%  (p=0.000 n=8+9)
    Decode/Digits/Compression/1e5-6  40.9MB/s ± 1%  43.4MB/s ± 1%   +6.07%  (p=0.000 n=10+9)
    Decode/Digits/Compression/1e6-6  41.6MB/s ± 1%  44.3MB/s ± 0%   +6.49%  (p=0.000 n=10+9)
    Decode/Twain/Huffman/1e4-6       31.7MB/s ± 2%  37.4MB/s ± 3%  +18.08%  (p=0.000 n=9+10)
    Decode/Twain/Huffman/1e5-6       38.2MB/s ± 0%  45.0MB/s ± 0%  +17.97%  (p=0.000 n=10+10)
    Decode/Twain/Huffman/1e6-6       38.9MB/s ± 1%  45.9MB/s ± 0%  +17.90%  (p=0.000 n=10+10)
    Decode/Twain/Speed/1e4-6         34.5MB/s ± 1%  38.0MB/s ± 2%  +10.11%  (p=0.000 n=9+10)
    Decode/Twain/Speed/1e5-6         42.5MB/s ± 1%  47.0MB/s ± 1%  +10.79%  (p=0.000 n=9+10)
    Decode/Twain/Speed/1e6-6         43.7MB/s ± 0%  48.3MB/s ± 0%  +10.72%  (p=0.000 n=10+9)
    Decode/Twain/Default/1e4-6       37.1MB/s ± 2%  39.8MB/s ± 2%   +7.15%  (p=0.000 n=9+10)
    Decode/Twain/Default/1e5-6       49.5MB/s ± 1%  54.3MB/s ± 1%   +9.71%  (p=0.000 n=10+10)
    Decode/Twain/Default/1e6-6       52.3MB/s ± 0%  57.3MB/s ± 1%   +9.57%  (p=0.000 n=9+10)
    Decode/Twain/Compression/1e4-6   36.7MB/s ± 1%  40.0MB/s ± 4%   +8.96%  (p=0.000 n=9+10)
    Decode/Twain/Compression/1e5-6   49.8MB/s ± 0%  54.5MB/s ± 1%   +9.38%  (p=0.000 n=9+10)
    Decode/Twain/Compression/1e6-6   52.3MB/s ± 0%  57.3MB/s ± 1%   +9.58%  (p=0.000 n=9+10)
    
    Change-Id: Iabfd285535ddb210f7f48f33317c6463b5532400
    Reviewed-on: https://go-review.googlesource.com/102235
    Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: 's avatarBrad Fitzpatrick <bradfitz@golang.org>
    4984d843
Name
Last commit
Last update
..
bzip2 Loading commit data...
flate Loading commit data...
gzip Loading commit data...
lzw Loading commit data...
testdata Loading commit data...
zlib Loading commit data...