ó ŒOc@sFdZddlZddlmZddlmZddlmZddl m Z m Z y e Z Wnek reefZ nXdefd„ƒYZydd lmZWnek r¿n Xd efd „ƒYZeƒZd „Zedd „Zeedd„Zeedd„Zedd„Zedd„ZeƒZdS(s? An interface to html5lib that mimics the lxml.html interface. iÿÿÿÿN(t HTMLParser(t TreeBuilder(tetree(t_contains_block_level_tagtXHTML_NAMESPACERcBseZdZed„ZRS(s*An html5lib HTML parser with lxml as tree.cKs tj|d|dt|dS(Ntstrictttree(t _HTMLParsert__init__R(tselfRtkwargs((s;/usr/lib/python2.7/vendor-packages/lxml/html/html5parser.pyRs(t__name__t __module__t__doc__tFalseR(((s;/usr/lib/python2.7/vendor-packages/lxml/html/html5parser.pyRs(t XHTMLParserRcBseZdZed„ZRS(s+An html5lib XHTML Parser with lxml as tree.cKs tj|d|dt|dS(NRR(t _XHTMLParserRR(R RR ((s;/usr/lib/python2.7/vendor-packages/lxml/html/html5parser.pyR#s(R R R RR(((s;/usr/lib/python2.7/vendor-packages/lxml/html/html5parser.pyR scCs6|j|ƒ}|dk r|S|jdt|fƒS(Ns{%s}%s(tfindtNoneR(Rttagtelem((s;/usr/lib/python2.7/vendor-packages/lxml/html/html5parser.pyt _find_tag)s cCsLt|tƒstdƒ‚n|dkr3t}n|j|d|ƒjƒS(s%Parse a whole document into a string.sstring requiredt useChardetN(t isinstancet_stringst TypeErrorRt html_parsertparsetgetroot(thtmlt guess_charsettparser((s;/usr/lib/python2.7/vendor-packages/lxml/html/html5parser.pytdocument_fromstring0s   cCs¥t|tƒstdƒ‚n|dkr3t}n|j|dd|ƒ}|r¡t|dtƒr¡|r¡|djƒr”tjd|dƒ‚n|d=q¡n|S(s”Parses several HTML elements, returning a list of elements. The first item in the list may be a string. If no_leading_text is true, then it will be an error if there is leading text, and it will always be a list of only elements. If `guess_charset` is `True` and the text was not unicode but a bytestring, the `chardet` library will perform charset guessing on the string. sstring requiredtdivRisThere is leading text: %rN( RRRRRt parseFragmenttstripRt ParserError(Rtno_leading_textRRtchildren((s;/usr/lib/python2.7/vendor-packages/lxml/html/html5parser.pytfragments_fromstring;s     c Cs>t|tƒstdƒ‚nt|ƒ}t|d|d|d| t}|rºt|tƒsjd}nt|ƒ}|r¶t|dtƒr¦|d|_|d=n|j |ƒn|S|sÒt j dƒ‚nt |ƒdkröt j d ƒ‚n|d}|j r1|j jƒr1t j d |j ƒ‚nd |_ |S( sXParses a single HTML element; it is an error if there is more than one element, or if anything but whitespace precedes or follows the element. If create_parent is true (or is a tag name) then a parent node will be created to encapsulate the HTML in a single element. In this case, leading or trailing text is allowed. sstring requiredRRR%R!isNo elements foundisMultiple elements foundsElement followed by text: %rN(RRRtboolR'tkwt basestringtElementttexttextendRR$tlenttailR#R(Rt create_parentRRtaccept_leading_texttelementstnew_roottresult((s;/usr/lib/python2.7/vendor-packages/lxml/html/html5parser.pytfragment_fromstringWs2         cCst|tƒstdƒ‚nt|d|d|ƒ}|d jƒjƒ}|jdƒsj|jdƒrn|St|dƒ}t|ƒr|St|dƒ}t|ƒd krò|j sÈ|j j ƒ rò|d j sê|d j j ƒ rò|d St |ƒr d |_ n d |_ |S(süParse the html, returning a single element/document. This tries to minimally parse the chunk of text, without knowing if it is a fragment or a document. base_url will set the document's base_url attribute (and the tree's docinfo.URL) sstring requiredRRi2sRR(((s;/usr/lib/python2.7/vendor-packages/lxml/html/html5parser.pyts2       (*