    exp/template/html: Grammar rules for HTML comments and special tags. · 1f13423d
    Mike Samuel authored
    Augments type context and adds grammatical rules to handle special HTML constructs:
        <!-- comments -->
        <script>raw text</script>
        <textarea>no tags here</textarea>
    
    This CL does not elide comment content.  I recommend that we do so,
    but it is not done in this CL.
    
    I used a codesearch tool over a codebase in another template language.
    
    Based on the results below, I think we should definitely recognize
      <script>, <style>, <textarea>, and <title>
    as each of these appears frequently enough that few
    template-using apps do not use most of them.
    
    Of the other special tags,
      <xmp>, <noscript>
    are used but infrequently, and
      <noframes> and friends, <listing>
    do not appear at all.
    
    We could support <xmp> even though it is obsolete in HTML5
    because we already have the machinery, but I suggest we do not
    support <noscript> since it is parsed as a normal tag in some
    browser configurations (those with scripting disabled).
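
    As a rough illustration of the proposal above, the recognized tags
    could be kept in a lookup table. This is a minimal sketch, not the
    code in this CL: elementKind, specialElements and classify are
    hypothetical names, and the raw-text versus RCDATA split follows
    HTML5 terminology.

```go
package main

import (
	"fmt"
	"strings"
)

// elementKind is a hypothetical classification used only for this sketch.
type elementKind int

const (
	kindNormal  elementKind = iota // ordinary element: content may contain tags
	kindRawText                    // <script>, <style>: content is taken literally
	kindRCDATA                     // <textarea>, <title>: entities decoded, no nested tags
)

// specialElements lists the tags the message proposes to recognize.
// <xmp> and <noscript> are deliberately left out, per the discussion above.
var specialElements = map[string]elementKind{
	"script":   kindRawText,
	"style":    kindRawText,
	"textarea": kindRCDATA,
	"title":    kindRCDATA,
}

// classify returns the kind for a tag name, ignoring case.
// Unknown names map to kindNormal, the zero value.
func classify(tagName string) elementKind {
	return specialElements[strings.ToLower(tagName)]
}

func main() {
	for _, name := range []string{"SCRIPT", "textarea", "div", "noscript"} {
		fmt.Println(name, classify(name))
	}
}
```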
    
    I suggest recognizing and eliding <!-- comments -->
    (but not escaping text spans) as they are widely used to
    embed comments in template source.  Not eliding them increases
    the size of content sent over the network, and risks leaking
    code and project internal details.
    The template language I tested elides them, so there are
    no instances of IE conditional compilation directives in the
    codebase, but eliding those directives could be a source of confusion.
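
    To make the suggestion concrete, comment elision could look roughly
    like the sketch below. It is not part of this CL; elideComments is a
    hypothetical helper that works on plain text and ignores context, so
    it would also strip a "<!--" that appears inside a <script> body or
    an attribute value. It also strips IE conditional comments, which is
    the confusion risk mentioned above.

```go
package main

import (
	"fmt"
	"strings"
)

// elideComments removes every <!-- ... --> span from s.
// An unterminated comment swallows the rest of the input.
func elideComments(s string) string {
	var b strings.Builder
	for {
		start := strings.Index(s, "<!--")
		if start < 0 {
			// No more comments: keep the remainder as is.
			b.WriteString(s)
			return b.String()
		}
		b.WriteString(s[:start])
		rest := s[start+len("<!--"):]
		end := strings.Index(rest, "-->")
		if end < 0 {
			// Unterminated comment: drop everything after "<!--".
			return b.String()
		}
		s = rest[end+len("-->"):]
	}
}

func main() {
	fmt.Printf("%q\n", elideComments("<p>Hello<!-- TODO: internal note --> World</p>"))
	fmt.Printf("%q\n", elideComments("a<!--[if IE]>x<![endif]-->b"))
}
```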
    
    The codesearch does the equivalent of
    $ find . -name \*.file-extension -print0 \
      | xargs -0 cat \
      | perl -ne 'print "\L$1\n" while s@<([a-z][a-z0-9]*)@@i' \
      | sort | uniq -c | sort -rn
    
    The 5 uses of <plaintext> seem to be in tricky code and can be ignored.
    The 2 uses of <xmp> appear in the same tricky code and can be ignored.
    I also ignored end tags to avoid biasing against unary
    elements and threw out some nonsense names, since the
    long tail is dominated by uses of < as a comparison operator
    in the template language's expression language.
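
    The counting method described above can be sketched in Go as
    follows. This is an illustration under assumptions, not the actual
    codesearch tool: countTags, sortedCounts and the sample input are
    hypothetical. The sample includes "<y" from a comparison to show
    how nonsense names enter the long tail.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// startTag matches "<" followed by a tag-like name.
// End tags ("</x") do not match because "/" is not a letter.
var startTag = regexp.MustCompile(`<([A-Za-z][A-Za-z0-9]*)`)

// countTags tallies lowercased start-tag names found in src.
func countTags(src string) map[string]int {
	counts := make(map[string]int)
	for _, m := range startTag.FindAllStringSubmatch(src, -1) {
		counts[strings.ToLower(m[1])]++
	}
	return counts
}

// tagCount pairs a tag name with its number of occurrences.
type tagCount struct {
	name string
	n    int
}

// sortedCounts orders tags by descending count, breaking ties by name.
func sortedCounts(counts map[string]int) []tagCount {
	out := make([]tagCount, 0, len(counts))
	for name, n := range counts {
		out = append(out, tagCount{name, n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].n != out[j].n {
			return out[i].n > out[j].n
		}
		return out[i].name < out[j].name
	})
	return out
}

func main() {
	// Hypothetical template source; "<y" comes from a comparison, not a tag.
	src := `<div><span>a</span><DIV>{if $x <y}</DIV></div>`
	for _, tc := range sortedCounts(countTags(src)) {
		fmt.Printf("%7d %s\n", tc.n, tc.name)
	}
}
```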
    
    I have added asterisks next to abnormal elements.
    
      26765 div
       7432 span
       7414 td
       4233 a
       3730 tr
       3238 input
       2102 br
       1756 li
       1755 img
       1674 table
       1388 p
       1311 th
       1064 option
        992 b
        891 label
        714 script *
        519 ul
        446 tbody
        412 button
        381 form
        377 h2
        358 select
        353 strong
        318 h3
        314 body
        303 html
        266 link
        262 textarea *
        261 head
        258 meta
        225 title *
        189 h1
        176 col
        156 style *
        151 hr
        119 iframe
        103 h4
        101 pre
        100 dt
         98 thead
         90 dd
         83 map
         80 i
         69 object
         66 ol
         65 em
         60 param
         60 font
         57 fieldset
         51 string
         51 field
         51 center
         44 bidi
         37 kbd
         35 legend
         30 nobr
         29 dl
         28 var
         26 small
         21 cite
         21 base
         20 embed
         19 colgroup
         12 u
         12 canvas
         10 sup
         10 rect
         10 optgroup
         10 noscript *
          9 wbr
          9 blockquote
          8 tfoot
          8 code
          8 caption
          8 abbr
          7 msg
          6 tt
          6 text
          6 h5
          5 svg
          5 plaintext *
          5 article
          4 shortquote
          4 number
          4 menu
          4 ins
          3 progress
          3 header
          3 content
          3 bool
          3 audio
          3 attribute
          3 acronym
          2 xmp *
          2 overwrite
          2 objects
          2 nobreak
          2 metadata
          2 description
          2 datasource
          2 category
          2 action
    
    R=nigeltao
    CC=golang-dev
    https://golang.org/cl/4964045