    exp/locale/collate: include composed characters into the table. · 9aa70984
    Marcel van Lohuizen authored
    This eliminates the need to decompose characters for the majority of cases.
    This considerably speeds up collation while increasing the table size minimally.
    
    To detect non-normalized strings, rather than relying on exp/norm, the table
    now includes CCC information. The inclusion of this information does not
    increase table size.
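
    As an illustration of how CCC values can flag a non-normalized string, the
    sketch below checks only the canonical ordering condition: a nonzero CCC
    must never be lower than the CCC of the preceding rune. It is not the code
    from this change. The cccOf map and inCanonicalOrder are hypothetical names,
    and the map stands in for the CCC information stored in the table.

```go
package main

import "fmt"

// cccOf is a hypothetical stand-in for the CCC data kept in the collation
// table. Runes that are absent have CCC 0 (starters).
var cccOf = map[rune]uint8{
	0x0301: 230, // COMBINING ACUTE ACCENT
	0x0323: 220, // COMBINING DOT BELOW
}

// inCanonicalOrder reports whether the combining marks in s are in canonical
// order, that is, whether no nonzero CCC is lower than the CCC before it.
// This is a necessary condition for a normalized string, not a full check.
func inCanonicalOrder(s string) bool {
	var prev uint8
	for _, r := range s {
		ccc := cccOf[r]
		if ccc != 0 && ccc < prev {
			return false
		}
		prev = ccc
	}
	return true
}

func main() {
	fmt.Println(inCanonicalOrder("a\u0323\u0301")) // 220 then 230: true
	fmt.Println(inCanonicalOrder("a\u0301\u0323")) // 230 then 220: false
}
```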
    
    DETAILS
     - Raw collation elements are now a struct that includes the CCC, rather
       than a slice of ints.
     - Builder now ensures that NFD and NFC counterparts are included in the table.
       This also fixes a bug for Korean which is responsible for most of the growth
       of the table size.
     - As there is no more normalization step, code should now handle both strings
       and byte slices as input. Introduced source type to facilitate this.
    
    NOTES
     - This change does not yet handle normalization entirely correctly for
       contractions. This causes a few failures with the regtest. table_test.go
       contains a few commented-out tests that can be enabled once this is fixed.
       The easiest approach is to fix this once we have the new norm.Iter.
     - Removed test cases in table_test that cover cases that are now guaranteed
       not to exist.
    
    R=rsc, mpvl
    CC=golang-dev
    https://golang.org/cl/6971044
trie.go 3.25 KB