-
Marcel van Lohuizen authored
the need to decompose characters for the majority of cases. This considerably speeds up collation while increasing the table size minimally. To detect non-normalized strings, rather than relying on exp/norm, the table now includes CCC information. The inclusion of this information does not increase table size. DETAILS - Raw collation elements are now a struct that includes the CCC, rather than a slice of ints. - Builder now ensures that NFD and NFC counterparts are included in the table. This also fixes a bug for Korean which is responsible for most of the growth of the table size. - As there is no more normalization step, code should now handle both strings and byte slices as input. Introduced source type to facilitate this. NOTES - This change does not handle normalization correctly entirely for contractions. This causes a few failures with the regtest. table_test.go contains a few uncommented tests that can be enabled once this is fixed. The easiest is to fix this once we have the new norm.Iter. - Removed a test cases in table_test that covers cases that are now guaranteed to not exist. R=rsc, mpvl CC=golang-dev https://golang.org/cl/6971044
9aa70984