� K�g�7����dZddlmZddlZddlZddlZddlmZm Z erddl m Z ej � d��Zej �e��Zej�e��eejd<ejd��e_ejd ��e_eje_ejd ej��e_ejd ��ZGd �d ej��ZdS)a  This module imports a copy of [`html.parser.HTMLParser`][] and modifies it heavily through monkey-patches. A copy is imported rather than the module being directly imported as this ensures that the user can import and use the unmodified library for their own needs. �)� annotationsN)� TYPE_CHECKING�Sequence)�Markdownz html.parser� htmlparserz\?>z&([a-zA-Z][-.a-zA-Z0-9]*);a� <[a-zA-Z][^`\t\n\r\f />\x00]* # tag name <= added backtick here (?:[\s/]* # optional whitespace before attribute name (?:(?<=['"\s/])[^`\s/>][^\s/=>]* # attribute name <= added backtick here (?:\s*=+\s* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^`>\s]* # bare value <= added backtick here ) (?:\s*,)* # possibly followed by a comma )?(?:\s|/(?!>))* )* )? \s* # trailing whitespace z ^([ ]*\n){2}c����eZdZUdZd*�fd� Z�fd�Z�fd�Zed+d ���Zd,d �Z d-d�Z d.d�Z d/d�Z d0d�Z d1d�Zd/d�Zd2d�Zd2d�Zd0d�Zd0d�Zd0d�Zd0d�Zd3�fd � Zd3�fd!� Zd4d5�fd$� Zd%Zd&ed'<d6d(�Zd3d)�Z�xZS)7� HTMLExtractorz� Extract raw HTML from text. The raw HTML is stored in the [`htmlStash`][markdown.util.HtmlStash] of the [`Markdown`][markdown.Markdown] instance passed to `md` and the remaining text is stored in `cleandoc` as a list of strings. �mdrc���d|vrd|d<tdg��|_dg|_t��j|i|��||_dS)N�convert_charrefsF�hrr)�set� empty_tags�lineno_start_cache�super�__init__r )�selfr �args�kwargs� __class__s ��c/home/asafur/pinokio/api/open-webui.git/app/env/lib/python3.11/site-packages/markdown/htmlparser.pyrzHTMLExtractor.__init__Ss]��� �V� +� +�).�F�%� &��t�f�+�+���#$�#��� �����$�)�&�)�)�)������c���d|_d|_g|_g|_g|_dg|_t �����dS)z1Reset this instance. 
Loses all unprocessed data.FrN)�inraw�intail�stack�_cache�cleandocrr�reset�rrs �rrzHTMLExtractor.reset`sE����� ��� � "�� �!#�� �#%�� �#$�#��� ��� � �����rc����t�����t|j��r[|jr:|js3|�t�|j����n|�|j��t|j ��rX|j � |j j �d�|j ������g|_ dSdS)zHandle any buffered data.�N)r�close�len�rawdatar � cdata_elem� handle_datar�unescaperr�appendr � htmlStash�store�joinr s �rr#zHTMLExtractor.closeks���� ��� � ���� �t�|� � � /��$� /�T�_� /�� � ��!4�!4�T�\�!B�!B�C�C�C�C�� � ���.�.�.� �t�{� � � � �M� � ���!2�!8�!8������9M�9M�!N�!N� O� O� O��D�K�K�K� � r�return�intc�J�tt|j��dz |jdz ��D]a}|j|}|j�d|��}|dkrt|j��}|j�|dz���b|j|jdz S)zHReturns char index in `self.rawdata` for the start of the current line. �� �����)�ranger$r�linenor%�findr))r�ii�last_line_start_pos�lf_poss r� line_offsetzHTMLExtractor.line_offsetzs�����D�3�4�4�Q�6�� �A� �F�F� 5� 5�B�"&�"9�"�"=� ��\�&�&�t�-@�A�A�F���|�|��T�\�*�*�� � #� *� *�6�!�8� 4� 4� 4� 4��&�t�{�1�}�5�5r�boolc��|jdkrdS|jdkrdS|j|j|j|jz����dkS)z� Returns True if current position is at start of line. Allows for up to three blank spaces at start of line. rT�Fr")�offsetr%r9�strip�rs r� at_line_startzHTMLExtractor.at_line_start�sW�� �;�!� � ��4� �;��?�?��5��|�D�,�T�-=�� �-K�K�L�R�R�T�T�XZ�Z�Zr�tag�strc���|j|jz}tj�|j|��}|r!|j||����Sd�|��S)z� Returns the text of the end tag. If it fails to extract the actual text from the raw data, it builds a closing tag with `tag`. 
z</{}>)r9r=r� endendtag�searchr%�end�format)rrA�start�ms r�get_endtag_textzHTMLExtractor.get_endtag_text�sb��� �4�;�.�� � � '� '�� �e� <� <�� � '��<��a�e�e�g�g� �.� .��>�>�#�&�&� &r�attrs�Sequence[tuple[str, str]]c��||jvr|�||��dS|j�|��rC|js|���r(|js!d|_|j�d��|� ��}|jr6|j �|��|j �|��dS|j�|��||j vr|� ��dSdS)NTr1)r�handle_startendtagr �is_block_levelrr@rrr)�get_starttag_textrr�CDATA_CONTENT_ELEMENTS�clear_cdata_mode)rrArK�texts r�handle_starttagzHTMLExtractor.handle_starttag�s�� �$�/� !� !� � #� #�C�� /� /� /� �F� �7� !� !�#� &� &� '�D�K� '�D�<N�<N�<P�<P� '�Y]�Yc� '��D�J� �M� � �� &� &� &��%�%�'�'�� �:� (� �J� � �c� "� "� "� �K� � �t� $� $� $� $� $� �M� � �� &� &� &��d�1�1�1��%�%�'�'�'�'�'�2�1rc��|�|��}|j�rH|j�|��||jvr,|jr%|j���|krn|j�%t |j��dkr�t�|j |j |j zt |��zd���r|j�d��nd|_ d|_|j �|jj�d�|j������|j �d��g|_dSdS|j �|��dS)Nrr1TFr"� )rJrrr)r�popr$� blank_line_re�matchr%r9r=rrr r*r+r,)rrArSs r� handle_endtagzHTMLExtractor.handle_endtag�sj���#�#�C�(�(�� �:� '� �K� � �t� $� $� $��d�j� � ��j���z�~�~�'�'�3�.�.���j���4�:���!�#�#� �&�&�t�|�D�4D�t�{�4R�UX�Y]�U^�U^�4^�4_�4_�'`�a�a�'��K�&�&�t�,�,�,�,�#'�D�K�"�� �� �$�$�T�W�%6�%<�%<�R�W�W�T�[�=Q�=Q�%R�%R�S�S�S�� �$�$�V�,�,�,� �� � � �$�#� �M� � �� &� &� &� &� &r�datac��|jr d|vrd|_|jr|j�|��dS|j�|��dS)Nr1F)rrrr)r�rr[s rr'zHTMLExtractor.handle_data�s]�� �;� �4�4�<�<��D�K� �:� '� �K� � �t� $� $� $� $� $� �M� � �� &� &� &� &� &r�is_blockc��|js|jr|j�|��dS|����r|r�t �|j|j|j zt|��zd���r|dz }nd|_|j r |j dnd}|� d��s/|� d��r|j �d��|j �|j j�|����|j �d��dS|j �|��dS)z Handle empty tags (`<data>`). 
Nr1Tr2r"rV)rrrr)r@rXrYr%r9r=r$r�endswithr r*r+)rr[r^�items r�handle_empty_tagzHTMLExtractor.handle_empty_tag�sR�� �:� '��� '� �K� � �t� $� $� $� $� $� � � � !� !� '�h� '��"�"�4�<��0@�4�;�0N�QT�UY�QZ�QZ�0Z�0[�0[�#\�]�]� #��� ���#�� �(,� �=�4�=��$�$�2�D��=�=��(�(� +�T�]�]�4�-@�-@� +�� �$�$�T�*�*�*� �M� � ���!2�!8�!8��!>�!>� ?� ?� ?� �M� � �� (� (� (� (� (� �M� � �� &� &� &� &� &rc��|�|���|j�|�����dS)N�r^)rbrPr rO)rrArKs rrNz HTMLExtractor.handle_startendtag�s>�� ���d�4�4�6�6���AW�AW�X[�A\�A\��]�]�]�]�]r�namec�Z�|�d�|��d���dS)Nz&#{};Frd�rbrG�rres r�handle_charrefzHTMLExtractor.handle_charref�s-�� ���g�n�n�T�2�2�U��C�C�C�C�Crc�Z�|�d�|��d���dS)Nz&{};Frdrgrhs r�handle_entityrefzHTMLExtractor.handle_entityref�s-�� ���f�m�m�D�1�1�E��B�B�B�B�Brc�Z�|�d�|��d���dS)Nz <!--{}-->Trdrgr]s r�handle_commentzHTMLExtractor.handle_comment�s/�� ���k�0�0��6�6���F�F�F�F�Frc�Z�|�d�|��d���dS)Nz<!{}>Trdrgr]s r� handle_declzHTMLExtractor.handle_decl�s-�� ���g�n�n�T�2�2�T��B�B�B�B�Brc�Z�|�d�|��d���dS)Nz<?{}?>Trdrgr]s r� handle_pizHTMLExtractor.handle_pis-�� ���h�o�o�d�3�3�d��C�C�C�C�Crc��|�d��rdnd}|�d�||��d���dS)NzCDATA[z]]>z]>z<![{}{}Trd)� startswithrbrG)rr[rFs r� unknown_declzHTMLExtractor.unknown_declsK�����x�0�0�:�e�e�d�� ���i�.�.�t�S�9�9�D��I�I�I�I�Ir�ic���|���s|jr!t���|��S|�d��|dzS)Nz<?�)r@rr�parse_pir'�rrurs �rrxzHTMLExtractor.parse_pisW��� � � � � � '�4�;� '��7�7�#�#�A�&�&� &� ��������1�u� rc���|���s|jr!t���|��S|�d��|dzS)Nz<!rw)r@rr�parse_html_declarationr'rys �rr{z$HTMLExtractor.parse_html_declarationsW��� � � � � � 5�4�;� 5��7�7�1�1�!�4�4� 4� ��������1�u� rr�reportc���t���||��}|dkrdS|�|j||�d���|S)Nr2Frd)r�parse_bogus_commentrbr%)rrur|�posrs �rr~z!HTMLExtractor.parse_bogus_commentsU����g�g�)�)�!�V�4�4�� �"�9�9��2� ���d�l�1�S�5�1�E��B�B�B�� rNz str | None�_HTMLExtractor__starttag_textc��|jS)z)Return full source of start tag: `<...>`.)r�r?s rrPzHTMLExtractor.get_starttag_text's ���#�#rc��d|_|�|��}|dkr|S|j}|||�|_g}tj�||dz��}|s Jd���|���}|�d�����x|_ }||kr�tj �||��}|sn�|�ddd��\} } } | sd} 
nI| dd�dcxkr| dd�ks"n| dd�dcxkr| dd�kr nn | dd�} | rt� | ��} |� | ���| f��|���}||k��|||�� ��} | d vr�|���\} }d |jvrM| |j�d ��z} t!|j��|j�d ��z }n|t!|j��z}|�|||���|S| �d ��r|�||��n4||jvr|�|��|�||��|S) Nrr0z#unexpected call to parse_starttag()rwr<�'r2�")�>�/>r1r�)r��check_for_whole_start_tagr%r�tagfind_tolerantrYrF�group�lower�lasttag�attrfind_tolerantr(r)r>�getpos�countr$�rfindr'r`rNrQ�set_cdata_moderT)rru�endposr%rKrY�krArI�attrname�rest� attrvaluerFr4r=s r�parse_starttagzHTMLExtractor.parse_starttag+s���#����/�/��2�2�� �A�:�:��M��,��&�q��x�0������+�1�1�'�1�Q�3�?�?���;�;�;�;�;�u� �I�I�K�K��"�[�[��^�^�1�1�3�3�3�� �s��&�j�j��,�2�2�7�A�>�>�A�� ��()����1�a�(8�(8� %�H�d�I�� ,� � � ��2�A�2��$�8�8�8�8�)�B�C�C�.�8�8�8�8��2�A�2��#�7�7�7�7��2�3�3��7�7�7�7�7�%�a��d�O� �� ;�&�/�/� �:�:� � �L�L�(�.�.�*�*�I�6� 7� 7� 7������A��&�j�j��a��h��%�%�'�'�� �k� !� !�!�[�[�]�]�N�F�F��t�+�+�+��$�"6�"<�"<�T�"B�"B�B���T�1�2�2��/�5�5�d�;�;�<��� �#�d�&:�";�";�;�� � � �W�Q�v�X�.� /� /� /��M� �<�<�� � � -� � #� #�C�� /� /� /� /��d�1�1�1��#�#�C�(�(�(� � � ��e� ,� ,� ,�� r)r r)r-r.)r-r:)rArBr-rB)rArBrKrL)rArB)r[rB)r[rBr^r:)rerB)rur.r-r.)r)rur.r|r.r-r.)r-rB)�__name__� __module__� __qualname__�__doc__rrr#�propertyr9r@rJrTrZr'rbrNrirkrmrorqrtrxr{r~r��__annotations__rPr�� __classcell__)rs@rr r JsY���������� � � � � � � � � � � � � � � � �� 6� 6� 6��X� 6� [� [� [� [� '� '� '� '�(�(�(�(�*'�'�'�'�6'�'�'�'�'�'�'�'�.^�^�^�^�D�D�D�D�C�C�C�C�G�G�G�G�C�C�C�C�D�D�D�D�J�J�J�J��������������������#'�O�&�&�&�&�$�$�$�$�0�0�0�0�0�0�0�0rr )r�� __future__r�re�importlib.util� importlib�sys�typingrr�markdownr�util� find_spec�spec�module_from_specr�loader� exec_module�modules�compile�piclose� entityref� incomplete�VERBOSE�locatestarttagend_tolerantrX� HTMLParserr �rr�<module>r�so��(�� #�"�"�"�"�"� � � � ����� � � � �*�*�*�*�*�*�*�*��"�!�!�!�!�!�!� �~��� �.�.�� �^� ,� ,�T� 2� 2� �� ��� �#�#�#�&�� �L�� �R�Z��'�'� ��!�r�z�"?�@�@� ��#�,� ��(2�� �4��Z�)�)� �%�$�� �?�+�+� �Q�Q�Q�Q�Q�J�)�Q�Q�Q�Q�Qr