    exp/norm: added Reader and Writer and bug fixes to support these. · 25171439
    Marcel van Lohuizen authored
    Needed to ensure that finding the last boundary does not result in O(n^2)-like behavior.
    Now prevents lookbacks beyond 31 characters across the board (starter + 30 non-starters).
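The bounded lookback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `lastBoundary`, `isNonStarter`, and `maxNonStarters` are hypothetical names, and the sketch assumes that nonspacing marks (category Mn) stand in for non-starters, whereas the real package consults canonical combining classes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// maxNonStarters mirrors the limit named in the commit message:
// one starter followed by at most 30 non-starters.
const maxNonStarters = 30

// isNonStarter is a stand-in (assumption): it treats nonspacing marks (Mn)
// as non-starters instead of looking up the canonical combining class.
func isNonStarter(r rune) bool {
	return unicode.Is(unicode.Mn, r)
}

// lastBoundary scans b backwards and returns the byte offset of the last
// starter. It inspects at most maxNonStarters+1 runes, so the work per call
// is bounded regardless of len(b). If the inspected runes are all
// non-starters and the limit is hit, it forces a boundary at the offset
// reached. It returns -1 if b ends before any boundary is found.
func lastBoundary(b []byte) int {
	i := len(b)
	for seen := 0; i > 0; seen++ {
		r, size := utf8.DecodeLastRune(b[:i])
		if !isNonStarter(r) {
			return i - size // start of the starter
		}
		if seen == maxNonStarters {
			return i // limit reached: stop looking back
		}
		i -= size
	}
	return -1
}

func main() {
	// 'a', then 'e' followed by two combining marks.
	fmt.Println(lastBoundary([]byte("ae\u0301\u0302"))) // 1

	// A starter followed by 40 combining marks: the scan stops early.
	long := "a" + strings.Repeat("\u0301", 40)
	fmt.Println(lastBoundary([]byte(long)))
}
```

Because each call looks at a fixed maximum number of runes, calling it repeatedly on a growing buffer stays linear overall rather than drifting toward quadratic cost.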
    composition.go:
    - maxCombiningCharacters now means exactly that.
    - Bug fix.
    - Small performance improvement; made the code consistent with other code.
    forminfo.go:
    - Bug fix: ccc needs to be 0 for inert runes.
    normalize.go:
    - A few bug fixes.
    - Limit the number of combining characters considered in FirstBoundary.
    - Ditto for LastBoundary.
    - Changed semantics of LastBoundary to not consider trailing illegal runes a boundary
      as long as adding bytes might still make them legal.
    trie.go:
    - As utf8.UTFMax is 4, we should treat UTF-8 encodings of size 5 or greater as illegal.
      This has no impact on the normalization process, but it prevents buffer overflows
      where we expect at most UTFMax bytes.
    
    R=r
    CC=golang-dev
    https://golang.org/cl/4963041
readwriter_test.go 1.8 KB