    exp/norm: changed API of Iter. · cfcc3ebf
    Marcel van Lohuizen authored
    Motivations:
     - Simpler UI. Previous API proved a bit awkward for practical purposes.
     - Iter is often used in cases where one wants to be able to bail out early.
       The old implementation had too much look-ahead to be efficient.
    Disadvantages:
     - ASCII performance is bad. This is unavoidable for tiny iterations.
       An example is included to show how to work around this.
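
The included example is not reproduced in this message. As a rough sketch of the general idea, assuming the caller is comparing two strings, the shared ASCII prefix can be skipped with a plain byte loop so that the iterator only sees the remainder. The helper name sharedASCIIPrefix is hypothetical and not part of exp/norm.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sharedASCIIPrefix is a hypothetical helper (not part of exp/norm). It
// returns the number of leading bytes that are ASCII and identical in a and b,
// and that can therefore be skipped before normalizing the rest.
func sharedASCIIPrefix(a, b string) int {
	i := 0
	for i < len(a) && i < len(b) && a[i] < utf8.RuneSelf && a[i] == b[i] {
		i++
	}
	// A non-ASCII rune right after the prefix may be a combining mark that
	// attaches to the last ASCII character, so conservatively leave that
	// character for the iterator.
	if i > 0 && ((i < len(a) && a[i] >= utf8.RuneSelf) || (i < len(b) && b[i] >= utf8.RuneSelf)) {
		i--
	}
	return i
}

func main() {
	a, b := "hello wörld", "hello world"
	n := sharedASCIIPrefix(a, b)
	// Only a[n:] and b[n:] would need to go through a normalizing iterator.
	fmt.Printf("skip %d bytes, normalize %q and %q\n", n, a[n:], b[n:])
}
```

The back-off by one byte is a conservative choice made for this sketch; it trades a little speed for not splitting a base character from a following combining mark.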
    
    Description:
    Iter now iterates per boundary/segment. It returns a slice of bytes that
    either points to the input bytes, the internal decomposition strings,
    or the small internal buffer that each iterator has. In many cases, copying
    bytes is avoided.
    The method Seek was added to support jumping around the input without
    having to reinitialize.
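
To illustrate the shape of this API, here is a minimal sketch using only the standard library. The type segIter is a hypothetical stand-in, not the exp/norm implementation: it treats a segment as one rune plus any following nonspacing marks, performs no normalization, and its Seek signature is assumed to follow io.Seeker. What it does show is Next returning a sub-slice of the input without copying, and Seek repositioning without reinitializing.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"unicode"
	"unicode/utf8"
)

// segIter is a hypothetical, simplified segment iterator.
type segIter struct {
	src []byte
	pos int
}

// Done reports whether the whole input has been consumed.
func (it *segIter) Done() bool { return it.pos >= len(it.src) }

// Next returns the next segment as a sub-slice of src; no bytes are copied.
func (it *segIter) Next() []byte {
	if it.Done() {
		return nil
	}
	start := it.pos
	_, n := utf8.DecodeRune(it.src[it.pos:])
	it.pos += n
	// Absorb nonspacing marks (category Mn) that follow the first rune.
	for it.pos < len(it.src) {
		r, n := utf8.DecodeRune(it.src[it.pos:])
		if !unicode.Is(unicode.Mn, r) {
			break
		}
		it.pos += n
	}
	return it.src[start:it.pos]
}

// Seek moves the read position, using the whence values of io.Seeker.
func (it *segIter) Seek(offset int64, whence int) (int64, error) {
	var abs int64
	switch whence {
	case io.SeekStart:
		abs = offset
	case io.SeekCurrent:
		abs = int64(it.pos) + offset
	case io.SeekEnd:
		abs = int64(len(it.src)) + offset
	default:
		return 0, errors.New("segIter: invalid whence")
	}
	if abs < 0 || abs > int64(len(it.src)) {
		return 0, errors.New("segIter: position out of range")
	}
	it.pos = int(abs)
	return abs, nil
}

// segments collects all segments of s as strings.
func segments(s string) []string {
	it := &segIter{src: []byte(s)}
	var out []string
	for !it.Done() {
		out = append(out, string(it.Next()))
	}
	return out
}

func main() {
	it := &segIter{src: []byte("e\u0301ab")}
	for !it.Done() {
		fmt.Printf("%q\n", it.Next())
	}
	// Jump to the last byte and read again without building a new iterator.
	if _, err := it.Seek(-1, io.SeekEnd); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q\n", it.Next())
}
```

Because Next hands back a view into the input, a caller that needs to retain a segment beyond the next call should copy it.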
    
    Details:
     - Table adjustments: some decompositions consist of multiple segments.
       Decompositions of this type are now marked so that Iter can
       handle them separately.
     - The old iterator had a different next function for each normal form,
       assigned to a function pointer that was called by Next.
       The new iterator uses this mechanism to switch between different modes
       for handling different types of input as well. This greatly improves
       performance for Hangul and ASCII. It is also used for multi-segment
       decompositions.
     - input is now a struct of string and []byte, instead of an interface.
       This simplifies optimizing the ASCII case.
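
The last two points can be sketched together as follows. This is an illustration under stated assumptions, not the exp/norm code: the names input, modeIter, nextASCII and nextMulti are hypothetical, and the "modes" here only distinguish ASCII bytes from multi-byte UTF-8 sequences. The point is the mechanism: Next calls through a function pointer, and each mode function swaps that pointer when it meets input it does not handle.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// input holds either a string or a byte slice (hypothetical layout).
// Using a struct rather than an interface keeps byte access a direct call.
type input struct {
	str   string
	bytes []byte
}

func (in *input) len() int {
	if in.bytes == nil {
		return len(in.str)
	}
	return len(in.bytes)
}

func (in *input) byteAt(i int) byte {
	if in.bytes == nil {
		return in.str[i]
	}
	return in.bytes[i]
}

func (in *input) slice(a, b int) string {
	if in.bytes == nil {
		return in.str[a:b]
	}
	return string(in.bytes[a:b])
}

// modeIter dispatches Next through a function pointer that the mode
// functions themselves reassign.
type modeIter struct {
	in   input
	pos  int
	next func(*modeIter) string
}

func newStringIter(s string) *modeIter { return &modeIter{in: input{str: s}, next: nextASCII} }
func newBytesIter(b []byte) *modeIter  { return &modeIter{in: input{bytes: b}, next: nextASCII} }

func (it *modeIter) Done() bool { return it.pos >= it.in.len() }

func (it *modeIter) Next() string {
	if it.Done() {
		return ""
	}
	return it.next(it)
}

// nextASCII is the fast mode: one byte per call, no decoding.
func nextASCII(it *modeIter) string {
	if it.in.byteAt(it.pos) >= utf8.RuneSelf {
		it.next = nextMulti // switch mode and retry
		return it.next(it)
	}
	it.pos++
	return it.in.slice(it.pos-1, it.pos)
}

// nextMulti handles one multi-byte sequence per call.
func nextMulti(it *modeIter) string {
	if it.in.byteAt(it.pos) < utf8.RuneSelf {
		it.next = nextASCII // switch back to the fast mode
		return it.next(it)
	}
	start := it.pos
	it.pos++
	for it.pos < it.in.len() && !utf8.RuneStart(it.in.byteAt(it.pos)) {
		it.pos++
	}
	return it.in.slice(start, it.pos)
}

// collect drains the iterator into a slice.
func collect(it *modeIter) []string {
	var out []string
	for !it.Done() {
		out = append(out, it.Next())
	}
	return out
}

func main() {
	fmt.Printf("%q\n", collect(newStringIter("aé日b")))
	fmt.Printf("%q\n", collect(newBytesIter([]byte("aé日b"))))
}
```

Once the pointer is set to nextASCII, a run of ASCII input pays only one comparison and one increment per call, which is the kind of saving a dedicated mode is meant to provide.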
    
    R=rsc
    CC=golang-dev
    https://golang.org/cl/6873072
iter_test.go 5.46 KB