• Klaus Post's avatar
    compress/flate: use uncompressed if dynamic encoding is larger · 6ec481b0
    Klaus Post authored
    This adds size calculation to "dynamic" writes.
    This ensures that if dynamic Huffman encoding is bigger,
    or only slightly smaller than raw data, the block is written
    uncompressed.
    
    To minimize the code duplication of this function, the
    size calculation has been moved to separate functions.
    
    Since I was modifying these calculations, I changed "int64"
    size calculations to "int". Blocks are of very limited size,
    so there is not any risk of overflows.
    This should mainly improve 32 bit performance, but amd64 also
    gets a slight boost:
    
    name                       old time/op    new time/op    delta
    EncodeDigitsHuffman1e4-8     49.9µs ± 1%    49.3µs ± 1%  -1.21%  (p=0.000 n=10+10)
    EncodeDigitsHuffman1e5-8      476µs ± 1%     471µs ± 3%    ~     (p=0.218 n=10+10)
    EncodeDigitsHuffman1e6-8     4.80ms ± 2%    4.75ms ± 2%    ~      (p=0.243 n=10+9)
    EncodeDigitsSpeed1e4-8        305µs ± 3%     300µs ± 1%  -1.86%  (p=0.005 n=10+10)
    EncodeDigitsSpeed1e5-8       3.67ms ± 2%    3.58ms ± 1%  -2.29%    (p=0.000 n=9+8)
    EncodeDigitsSpeed1e6-8       38.3ms ± 2%    37.0ms ± 1%  -3.45%    (p=0.000 n=9+9)
    EncodeDigitsDefault1e4-8      361µs ± 2%     353µs ± 1%  -2.21%  (p=0.000 n=10+10)
    EncodeDigitsDefault1e5-8     5.24ms ± 2%    5.19ms ± 2%    ~     (p=0.105 n=10+10)
    EncodeDigitsDefault1e6-8     56.5ms ± 3%    55.1ms ± 1%  -2.42%  (p=0.001 n=10+10)
    EncodeDigitsCompress1e4-8     362µs ± 2%     358µs ± 2%    ~     (p=0.123 n=10+10)
    EncodeDigitsCompress1e5-8    5.26ms ± 3%    5.20ms ± 1%    ~     (p=0.089 n=10+10)
    EncodeDigitsCompress1e6-8    56.0ms ± 4%    55.0ms ± 1%    ~      (p=0.065 n=10+9)
    EncodeTwainHuffman1e4-8      70.9µs ± 3%    67.6µs ± 2%  -4.59%  (p=0.000 n=10+10)
    EncodeTwainHuffman1e5-8       556µs ± 2%     533µs ± 1%  -4.20%  (p=0.000 n=10+10)
    EncodeTwainHuffman1e6-8      5.54ms ± 3%    5.29ms ± 1%  -4.37%   (p=0.000 n=10+9)
    EncodeTwainSpeed1e4-8         294µs ± 3%     293µs ± 1%    ~      (p=0.965 n=10+8)
    EncodeTwainSpeed1e5-8        2.59ms ± 2%    2.56ms ± 1%    ~     (p=0.353 n=10+10)
    EncodeTwainSpeed1e6-8        25.6ms ± 1%    24.9ms ± 1%  -2.62%   (p=0.000 n=9+10)
    EncodeTwainDefault1e4-8       419µs ± 2%     417µs ± 1%    ~      (p=0.780 n=10+9)
    EncodeTwainDefault1e5-8      6.23ms ± 4%    6.16ms ± 1%    ~     (p=0.218 n=10+10)
    EncodeTwainDefault1e6-8      66.2ms ± 2%    65.7ms ± 1%    ~     (p=0.529 n=10+10)
    EncodeTwainCompress1e4-8      426µs ± 1%     428µs ± 2%    ~      (p=0.549 n=9+10)
    EncodeTwainCompress1e5-8     6.80ms ± 1%    6.85ms ± 3%    ~      (p=0.156 n=9+10)
    EncodeTwainCompress1e6-8     74.6ms ± 3%    73.8ms ± 2%    ~     (p=0.280 n=10+10)
    
    name                       old speed      new speed      delta
    EncodeDigitsHuffman1e4-8    200MB/s ± 1%   203MB/s ± 1%  +1.23%  (p=0.000 n=10+10)
    EncodeDigitsHuffman1e5-8    210MB/s ± 1%   212MB/s ± 3%    ~      (p=0.356 n=10+9)
    EncodeDigitsHuffman1e6-8    208MB/s ± 2%   210MB/s ± 2%    ~      (p=0.243 n=10+9)
    EncodeDigitsSpeed1e4-8     32.8MB/s ± 3%  33.4MB/s ± 1%  +1.88%  (p=0.005 n=10+10)
    EncodeDigitsSpeed1e5-8     27.2MB/s ± 2%  27.9MB/s ± 1%  +2.60%   (p=0.000 n=10+8)
    EncodeDigitsSpeed1e6-8     26.1MB/s ± 2%  27.0MB/s ± 1%  +3.56%    (p=0.000 n=9+9)
    EncodeDigitsDefault1e4-8   27.7MB/s ± 2%  28.4MB/s ± 1%  +2.24%  (p=0.000 n=10+10)
    EncodeDigitsDefault1e5-8   19.1MB/s ± 2%  19.3MB/s ± 2%    ~     (p=0.101 n=10+10)
    EncodeDigitsDefault1e6-8   17.7MB/s ± 3%  18.1MB/s ± 1%  +2.46%  (p=0.001 n=10+10)
    EncodeDigitsCompress1e4-8  27.6MB/s ± 2%  27.9MB/s ± 2%    ~     (p=0.119 n=10+10)
    EncodeDigitsCompress1e5-8  19.0MB/s ± 3%  19.2MB/s ± 1%    ~     (p=0.085 n=10+10)
    EncodeDigitsCompress1e6-8  17.9MB/s ± 4%  18.1MB/s ± 3%    ~     (p=0.110 n=10+10)
    EncodeTwainHuffman1e4-8     141MB/s ± 3%   148MB/s ± 2%  +4.79%  (p=0.000 n=10+10)
    EncodeTwainHuffman1e5-8     180MB/s ± 2%   188MB/s ± 1%  +4.38%  (p=0.000 n=10+10)
    EncodeTwainHuffman1e6-8     181MB/s ± 3%   189MB/s ± 1%  +4.54%   (p=0.000 n=10+9)
    EncodeTwainSpeed1e4-8      34.0MB/s ± 3%  34.1MB/s ± 1%    ~      (p=0.948 n=10+8)
    EncodeTwainSpeed1e5-8      38.7MB/s ± 2%  39.0MB/s ± 1%    ~     (p=0.353 n=10+10)
    EncodeTwainSpeed1e6-8      39.1MB/s ± 1%  40.1MB/s ± 1%  +2.68%   (p=0.000 n=9+10)
    EncodeTwainDefault1e4-8    23.9MB/s ± 2%  24.0MB/s ± 1%    ~      (p=0.734 n=10+9)
    EncodeTwainDefault1e5-8    16.0MB/s ± 4%  16.2MB/s ± 1%    ~     (p=0.210 n=10+10)
    EncodeTwainDefault1e6-8    15.1MB/s ± 2%  15.2MB/s ± 1%    ~     (p=0.515 n=10+10)
    EncodeTwainCompress1e4-8   23.5MB/s ± 1%  23.4MB/s ± 2%    ~      (p=0.536 n=9+10)
    EncodeTwainCompress1e5-8   14.7MB/s ± 1%  14.6MB/s ± 3%    ~      (p=0.138 n=9+10)
    EncodeTwainCompress1e6-8   13.4MB/s ± 3%  13.5MB/s ± 2%    ~     (p=0.239 n=10+10)
    
    This improves "random input" to the dynamic writer, which is why the test data is
    updated. The output size goes from 1051 to 1005 bytes.
    
    Change-Id: I3ee11d2d2511b277d2dd16734aeea07c98bca450
    Reviewed-on: https://go-review.googlesource.com/21757Reviewed-by: 's avatarJoe Tsai <joetsai@digital-static.net>
    Run-TryBot: Joe Tsai <joetsai@digital-static.net>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: 's avatarNigel Tao <nigeltao@golang.org>
    6ec481b0
huffman_bit_writer.go 18.4 KB