    strings: correctly handle invalid utf8 sequences in Map · f74de24f
    Martin Möhrmann authored
    When a range loop over a string decodes an invalid UTF-8 byte sequence, it
    yields a utf8.RuneError rune. Without further checks inside the range loop,
    this is indistinguishable from decoding the valid '\uFFFD' sequence that
    encodes utf8.RuneError.
    
    The previous Map code did not do any extra checks and therefore did not map
    invalid UTF-8 byte sequences correctly when they were mapped to utf8.RuneError.
    
    Fix this by adding the extra checks necessary to distinguish the decoding
    of invalid utf8 byte sequences from decoding the sequence for utf8.RuneError
    when the mapping of a rune is utf8.RuneError.
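
One way to make this distinction is to re-decode at the current index and look at the width: `utf8.DecodeRuneInString` returns a width of 1 for an invalid encoding, while a valid U+FFFD occupies 3 bytes. The sketch below shows only that check; `countRuneErrors` is a hypothetical helper, and this is not the code added to Map by this change.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countRuneErrors (hypothetical helper) reports how many utf8.RuneError runes
// in s come from invalid bytes and how many from a validly encoded U+FFFD.
func countRuneErrors(s string) (invalid, encoded int) {
	for i, r := range s {
		if r != utf8.RuneError {
			continue
		}
		// Re-decode at i: width 1 means the bytes were invalid UTF-8,
		// a larger width means a real U+FFFD was present in the string.
		_, width := utf8.DecodeRuneInString(s[i:])
		if width == 1 {
			invalid++
		} else {
			encoded++
		}
	}
	return invalid, encoded
}

func main() {
	fmt.Println(countRuneErrors("a\xffb\uFFFD")) // 1 1
}
```

The extra decode runs only when the loop has already produced utf8.RuneError, so input without such runes never pays for it.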
    
    This fix does not result in a measurable performance regression:
    name                old time/op  new time/op  delta
    ByteByteMap         1.05µs ± 3%  1.03µs ± 3%   ~     (p=0.118 n=10+10)
    Map/identity/ASCII   169ns ± 2%   170ns ± 1%   ~     (p=0.501 n=9+10)
    Map/identity/Greek   298ns ± 1%   303ns ± 4%   ~     (p=0.338 n=10+10)
    Map/change/ASCII     323ns ± 3%   325ns ± 4%   ~     (p=0.679 n=8+10)
    Map/change/Greek     628ns ± 5%   635ns ± 1%   ~     (p=0.460 n=10+9)
    MapNoChanges         120ns ± 4%   119ns ± 1%   ~     (p=0.496 n=10+9)
    
    Fixes #26305
    
    Change-Id: I70e99fa244983c5040756fa4549ac1e8cb6022c3
    Reviewed-on: https://go-review.googlesource.com/c/131495
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
strings_test.go 44.5 KB