    exp/locale/collate: include composed characters in the table.
    Marcel van Lohuizen authored · 9aa70984

    This eliminates the need to decompose characters in the majority of cases,
    which considerably speeds up collation while increasing the table size
    only minimally.
    
    To detect non-normalized strings, rather than relying on exp/norm, the table
    now includes CCC information. The inclusion of this information does not
    increase table size.
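
    As a rough illustration of how CCC values can flag input that is not in
    canonical order, here is a minimal sketch. It is not the code from this
    change: the ccc map is a tiny hand-written stand-in for the table data,
    and inCanonicalOrder is a hypothetical helper name. It only checks mark
    ordering, which is a necessary condition for normalized input, not a
    complete normalization test.

```go
package main

import "fmt"

// ccc is a tiny, hand-written stand-in for the CCC data carried by the
// collation table (an assumption for this sketch). Runes that are not
// listed are treated as starters (CCC 0).
var ccc = map[rune]uint8{
	'\u0301': 230, // COMBINING ACUTE ACCENT
	'\u0323': 220, // COMBINING DOT BELOW
	'\u0327': 202, // COMBINING CEDILLA
}

// inCanonicalOrder reports whether every run of combining marks in s has
// non-decreasing CCC values. A mark whose CCC is lower than that of the
// preceding mark means the input is not in canonical order.
func inCanonicalOrder(s string) bool {
	var prev uint8
	for _, r := range s {
		c := ccc[r]
		if c != 0 && c < prev {
			return false
		}
		prev = c // a starter resets prev to 0
	}
	return true
}

func main() {
	// Dot below (220) followed by acute (230): canonical order.
	fmt.Println(inCanonicalOrder("a\u0323\u0301"))
	// Acute (230) followed by dot below (220): marks are out of order.
	fmt.Println(inCanonicalOrder("a\u0301\u0323"))
}
```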
    
    DETAILS
     - Raw collation elements are now a struct that includes the CCC, rather
       than a slice of ints.
     - Builder now ensures that NFD and NFC counterparts are included in the table.
       This also fixes a bug for Korean; the fix accounts for most of the
       growth in table size.
     - As there is no longer a normalization step, the code must now handle both
       strings and byte slices as input. Introduced a source type to facilitate this.
    
    NOTES
     - This change does not yet handle normalization entirely correctly for
       contractions. This causes a few failures with the regtest. table_test.go
       contains a few commented-out tests that can be enabled once this is fixed.
       It will be easiest to fix this once we have the new norm.Iter.
     - Removed test cases in table_test that cover cases that are now guaranteed
       not to exist.
    
    R=rsc, mpvl
    CC=golang-dev
    https://golang.org/cl/6971044
Name
api
doc
include
lib
misc
src
test
.hgignore
.hgtags
AUTHORS
CONTRIBUTORS
LICENSE
PATENTS
README
favicon.ico
robots.txt