� J�g:����dZddlZddlZddlZddlmZmZmZddlm Z ddl m Z ddl m Z mZmZddlmZdd lmZdd lmZdd lmZdd lmZdd lmZddlmZGd�d��ZdS)a Module containing the UniversalDetector detector class, which is the primary class a user of ``chardet`` should use. :author: Mark Pilgrim (initial port to Python) :author: Shy Shalom (original C code) :author: Dan Blanchard (major refactoring for 3.0) :author: Ian Cordasco �N)�List�Optional�Union�)�CharSetGroupProber)� CharSetProber)� InputState�LanguageFilter� ProbingState)�EscCharSetProber)� Latin1Prober)�MacRomanProber)�MBCSGroupProber)� ResultDict)�SBCSGroupProber)� UTF1632Proberc �X�eZdZdZdZejd��Zejd��Zejd��Z dddd d d d d d�Z dddd dddd�Z e j dfde deddfd�Zedefd���Zedefd���Zedeefd���Zd!d�Zdeeefddfd�Zdefd �ZdS)"�UniversalDetectoraq The ``UniversalDetector`` class underlies the ``chardet.detect`` function and coordinates all of the different charset probers. To get a ``dict`` containing an encoding and its confidence, you can simply run: .. code:: u = UniversalDetector() u.feed(some_bytes) u.close() detected = u.result g�������?s[�-�]s(|~{)s[�-�]z Windows-1252z Windows-1250z Windows-1251z Windows-1256z Windows-1253z Windows-1255z Windows-1254z Windows-1257)� iso-8859-1z iso-8859-2z iso-8859-5z iso-8859-6z iso-8859-7z iso-8859-8� iso-8859-9z iso-8859-13z ISO-8859-11�GB18030�CP949�UTF-16)�asciirztis-620r�gb2312zeuc-krzutf-16leF� lang_filter�should_rename_legacy�returnNc� �d|_d|_g|_dddd�|_d|_d|_t j|_d|_ ||_ tj t��|_d|_||_|���dS)N���encoding� confidence�languageF�)�_esc_charset_prober�_utf1632_prober�_charset_probers�result�done� _got_datar � PURE_ASCII� _input_state� _last_charr�logging� getLogger�__name__�logger�_has_win_bytesr�reset)�selfrrs �i/home/asafur/pinokio/api/open-webui.git/app/env/lib/python3.11/site-packages/chardet/universaldetector.py�__init__zUniversalDetector.__init__ds��� @D�� �8<���57������# �# �� � �� ����&�1������&����'��1�1�� �#���$8��!� � � � � � � � r%c��|jS�N)r-�r5s r6� input_statezUniversalDetector.input_state{s ��� � 
r%c��|jSr9)r3r:s r6� has_win_byteszUniversalDetector.has_win_bytess ���"�"r%c��|jSr9)r(r:s r6�charset_probersz!UniversalDetector.charset_probers�s ���$�$r%c�2�dddd�|_d|_d|_d|_tj|_d|_|jr|j� ��|j r|j � ��|j D]}|� ���dS)z� Reset the UniversalDetector and all of its probers back to their initial states. This is called by ``__init__``, so you only need to call this directly in between analyses of different documents. Nr r!Fr%) r)r*r+r3r r,r-r.r&r4r'r()r5�probers r6r4zUniversalDetector.reset�s��� $(�s��M�M�� ��� ����#���&�1������ � #� -� � $� *� *� ,� ,� ,� � � )� � � &� &� (� (� (��+� � �F� �L�L�N�N�N�N� � r%�byte_strc�p�|jrdS|sdSt|t��st|��}|js�|�t j��r dddd�|_n�|�t jt j f��r dddd�|_nx|�d��r dddd�|_nW|�d ��r d ddd�|_n6|�t j t j f��r d ddd�|_d |_|jd � d |_dS|j tjkrt|j�|��rtj|_ nH|j tjkr3|j�|j|z��rtj|_ |dd�|_|jst-��|_|jjt0jkr]|j�|��t0jkr5|jj|j���dd�|_d |_dS|j tjkr�|jst?|j ��|_|j�|��t0jkr?|jj|j���|jj!d�|_d |_dSdS|j tjk�r'|j"s�tG|j ��g|_"|j tHj%zr&|j"�&tO����|j"�&tQ����|j"�&tS����|j"D]U}|�|��t0jkr0|j|���|j!d�|_d |_n�V|j*�|��r d |_+dSdSdS)a� Takes a chunk of a document and feeds it through all of the relevant charset probers. After calling ``feed``, you can check the value of the ``done`` attribute to see if you need to continue feeding the ``UniversalDetector`` more data, or if it has made a prediction (in the ``result`` attribute). .. note:: You should always call ``close`` when you're done feeding in your document if ``done`` is not already ``True``. 
Nz UTF-8-SIG��?�r!zUTF-32s��zX-ISO-10646-UCS-4-3412s��zX-ISO-10646-UCS-4-2143rTr"�����),r*� isinstance� bytearrayr+� startswith�codecs�BOM_UTF8r)� BOM_UTF32_LE� BOM_UTF32_BE�BOM_LE�BOM_BEr-r r,�HIGH_BYTE_DETECTOR�search� HIGH_BYTE� ESC_DETECTORr.� ESC_ASCIIr'r�stater � DETECTING�feed�FOUND_IT� charset_name�get_confidencer&r rr$r(rr �NON_CJK�appendrr r�WIN_BYTE_DETECTORr3)r5rBrAs r6rWzUniversalDetector.feed�s]�� �9� � �F�� � �F��(�I�.�.� +� ��*�*�H��~�% ��"�"�6�?�3�3� X�!,�"%� "���� � � �$�$�f�&9�6�;N�%O�P�P� X�,4�3�TV�W�W�� � ��$�$�%8�9�9� X�!9�"%� "� ��� � � �$�$�%8�9�9� X�!9�"%� "� ��� � � �$�$�f�m�V�]�%C�D�D� X�,4�3�TV�W�W�� �!�D�N��{�:�&�2� �� ��� � � � 5� 5� 5��&�-�-�h�7�7� 9�$.�$8��!�!��!�Z�%:�:�:��%�,�,�T�_�x�-G�H�H�;�%/�$8��!�"�2�3�3�-����#� 3�#0�?�?�D� � � � %��)?� ?� ?��#�(�(��2�2�l�6K�K�K� $� 4� A�"&�"6�"E�"E�"G�"G� "���� � !�� ��� � � � 4� 4� 4��+� N�+;�D�<L�+M�+M��(��'�,�,�X�6�6�,�:O�O�O� $� 8� E�"&�":�"I�"I�"K�"K� $� 8� A���� � !�� � � � P�O�� �*�"6� 6� 6��(� ?�)8��9I�)J�)J�(K��%��#�n�&<�<�D��)�0�0��1B�1B�C�C�C��%�,�,�\�^�^�<�<�<��%�,�,�^�-=�-=�>�>�>��/� � ���;�;�x�(�(�L�,A�A�A�$*�$7�&,�&;�&;�&=�&=�$*�O�#�#�D�K� !%�D�I��E�B��%�,�,�X�6�6� +�&*��#�#�#�%7� 6�" +� +r%c ��|jr|jSd|_|js|j�d���n%|jt jkr dddd�|_�n|jt jkr�d}d}d}|j D]#}|s�|� ��}||kr|}|}�$|r�||j kr�|j }|�J�|� ��}|� ��}|�d ��r"|jr|j�||��}|jr/|j�|pd� ��|��}|||jd�|_|j���t,jkr�|jd ��|j�d ��|j D]�}|s�t1|t2��rD|jD];}|j�d |j |j|� �����<�^|j�d |j |j|� ������|jS) z� Stop analyzing the current document and come up with a final prediction. :returns: The ``result`` attribute, a ``dict`` with the keys `encoding`, `confidence`, and `language`. 
Tzno data received!rrDrEr!Nr ziso-8859r"z no probers hit minimum thresholdz%s %s confidence = %s)r*r)r+r2�debugr-r r,rRr(rZ�MINIMUM_THRESHOLDrY�lowerrIr3� ISO_WIN_MAP�getr� LEGACY_MAPr$�getEffectiveLevelr/�DEBUGrGr�probers) r5�prober_confidence�max_prober_confidence� max_proberrArY�lower_charset_namer#� group_probers r6�closezUniversalDetector.closes��� �9� ��;� ��� ��~�( � �K� � �1� 2� 2� 2� 2�� �*�"7� 7� 7�'.�c�r�R�R�D�K�K�� �*�"6� 6� 6� $� �$'� !��J��/� (� (�����$*�$9�$9�$;�$;�!�$�'<�<�<�,=�)�!'�J��� �4�t�7M�M�M�)�6� �#�/�/�/�%1�%7�%7�%9�%9�"�'�6�6�8�8� �&�0�0��<�<���*��'+�'7�';�';�.� �(�(� ��,��#'�?�#6�#6�%�+��2�2�4�4�l�$�$�L�!-�",� *� 3���� � �;� (� (� *� *�g�m� ;� ;��{�:�&�.�� �!�!�"D�E�E�E�$(�$9���L�'�!� �!�,�0B�C�C��&2�&:���F� �K�-�-� 7� &� 3� &�� &� 5� 5� 7� 7� ������ �)�)�3�(�5�(�1�(�7�7�9�9� ���� �{�r%)rN)r1� __module__� __qualname__�__doc__r`�re�compilerPrSr]rbrdr �ALL�boolr7�property�intr;r=rrr?r4r�bytesrHrWrrm�r%r6rr8s��������� ��#���N�3�3���2�:�l�+�+�L�"�� �>�2�2��$�$�$�$�$�$�$�%� � �K� �$� �$������J�'5�&8�%*���#��#�� � ����.�!�S�!�!�!��X�!��#�t�#�#�#��X�#��%��m�!4�%�%�%��X�%�����&A+�U�5�)�#3�4�A+��A+�A+�A+�A+�FM�z�M�M�M�M�M�Mr%r)rprJr/rq�typingrrr�charsetgroupproberr� charsetproberr�enumsr r r � escproberr � latin1proberr �macromanproberr�mbcsgroupproberr� resultdictr�sbcsgroupproberr� utf1632proberrrrxr%r6�<module>r�sF��8��� � � ����� � � � �(�(�(�(�(�(�(�(�(�(�2�2�2�2�2�2�(�(�(�(�(�(�;�;�;�;�;�;�;�;�;�;�'�'�'�'�'�'�&�&�&�&�&�&�*�*�*�*�*�*�,�,�,�,�,�,�"�"�"�"�"�"�,�,�,�,�,�,�(�(�(�(�(�(�r�r�r�r�r�r�r�r�r�rr%