    go.net/publicsuffix: tighten the encoding from 8 bytes per node to 4. · 0f34b776
    Nigel Tao authored
    On the full list (running gen.go with -subset=false):
    
    Before, there were 6086 nodes (at 8 bytes per node). After, there
    are 6086 nodes (at 4 bytes per node) plus 354 children entries
    (at 4 bytes per entry). The saving is 22928 bytes.
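
The size figures above can be checked with a few lines of Go. This is only a sketch of the arithmetic: the counts are taken from this message, and the names (numNodes, numChildren, tableSizes) are hypothetical, not identifiers from gen.go.

```go
package main

import "fmt"

// Counts quoted in the commit message for the full list.
const (
	numNodes    = 6086
	numChildren = 354
)

// tableSizes returns the table size in bytes under the old encoding
// (8 bytes per node) and the new one (4 bytes per node plus 4 bytes
// per children entry).
func tableSizes() (before, after int) {
	before = numNodes * 8
	after = numNodes*4 + numChildren*4
	return before, after
}

func main() {
	before, after := tableSizes()
	fmt.Println("before:", before, "after:", after, "saved:", before-after)
}
```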
    
    In comparison, the (crushed) text is 21082 bytes, and for the curious,
    the longest label is 36 bytes: "xn--correios-e-telecomunicaes-ghc29a".
    
    All 32 bits in the nodes table are used, but there's wiggle room to
    accommodate future changes to effective_tld_names.dat:
    
    The largest children index is 353 (in 9 bits, so max is 511).
    The largest node type is 2 (in 2 bits, so max is 3).
    The largest text offset is 21080 (in 15 bits, so max is 32767).
    The largest text length is 36 (in 6 bits, so max is 63).
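
To illustrate how four fields of 9, 2, 15 and 6 bits fill a 32-bit word, here is a minimal packing sketch. The field widths come from the figures above; the order of the fields within the word, and the names node, pack and unpack, are assumptions made for illustration and may not match what gen.go actually emits.

```go
package main

import "fmt"

// Field widths as listed in the commit message (9 + 2 + 15 + 6 = 32).
// The order in which they are laid out below is an assumption.
const (
	childrenBits   = 9
	nodeTypeBits   = 2
	textOffsetBits = 15
	textLengthBits = 6
)

// node is a hypothetical unpacked view of one table entry.
type node struct {
	children   uint32 // index into the children table
	nodeType   uint32
	textOffset uint32 // offset into the crushed text
	textLength uint32 // length of the label in bytes
}

// pack places the fields from the most significant bits down:
// children, node type, text offset, text length.
// It assumes every field already fits in its width.
func pack(n node) uint32 {
	x := n.children
	x = x<<nodeTypeBits | n.nodeType
	x = x<<textOffsetBits | n.textOffset
	x = x<<textLengthBits | n.textLength
	return x
}

// unpack reverses pack, peeling fields off the low bits first.
func unpack(x uint32) node {
	var n node
	n.textLength = x & (1<<textLengthBits - 1)
	x >>= textLengthBits
	n.textOffset = x & (1<<textOffsetBits - 1)
	x >>= textOffsetBits
	n.nodeType = x & (1<<nodeTypeBits - 1)
	x >>= nodeTypeBits
	n.children = x & (1<<childrenBits - 1)
	return n
}

func main() {
	// The largest values currently in use, per the commit message.
	n := node{children: 353, nodeType: 2, textOffset: 21080, textLength: 36}
	x := pack(n)
	fmt.Printf("%#08x %+v\n", x, unpack(x))
}
```

Packing the per-field maxima (511, 3, 32767, 63) sets every bit of the word, which is the sense in which all 32 bits are used.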
    
    benchmark                old ns/op    new ns/op    delta
    BenchmarkPublicSuffix        19948        19744   -1.02%
    
    R=dr.volker.dobler
    CC=golang-dev
    https://golang.org/cl/6999045