import logging
import re
from typing import Optional, Union

from .enums import LanguageFilter, ProbingState

# Matches a word that contains at least one high (international) byte,
# optionally followed by a single trailing marker byte.
INTERNATIONAL_WORDS_PATTERN = re.compile(
    b"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?"
)


class CharSetProber:

    SHORTCUT_THRESHOLD = 0.95

    def __init__(self, lang_filter: LanguageFilter = LanguageFilter.NONE) -> None:
        self._state = ProbingState.DETECTING
        self.active = True
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)

    def reset(self) -> None:
        self._state = ProbingState.DETECTING

    @property
    def charset_name(self) -> Optional[str]:
        return None

    @property
    def language(self) -> Optional[str]:
        raise NotImplementedError

    def feed(self, byte_str: Union[bytes, bytearray]) -> ProbingState:
        raise NotImplementedError

    @property
    def state(self) -> ProbingState:
        return self._state

    def get_confidence(self) -> float:
        return 0.0

    @staticmethod
    def filter_high_byte_only(buf: Union[bytes, bytearray]) -> bytes:
        # Collapse every run of ASCII bytes into a single space.
        buf = re.sub(b"([\x00-\x7F])+", b" ", buf)
        return buf

    @staticmethod
    def filter_international_words(buf: Union[bytes, bytearray]) -> bytearray:
        """
        We define three types of bytes:
        alphabet: english alphabets [a-zA-Z]
        international: international characters [\\x80-\\xFF]
        marker: everything else [^a-zA-Z\\x80-\\xFF]
        The input buffer can be thought to contain a series of words delimited
        by markers. This function works to filter all words that contain at
        least one international character. All contiguous sequences of markers
        are replaced by a single space ascii character.
        This filter applies to all scripts which do not use English characters.
        """
        filtered = bytearray()

        # Keep only words that have at least one international character.
        # A word may include one marker character at the end.
        words = INTERNATIONAL_WORDS_PATTERN.findall(buf)

        for word in words:
            filtered.extend(word[:-1])

            # If the last character in the word is a marker, replace it with
            # a space, since markers should not affect the analysis.
            last_char = word[-1:]
            if not last_char.isalpha() and last_char < b"\x80":
                last_char = b" "
            filtered.extend(last_char)

        return filtered

    @staticmethod
    def remove_xml_tags(buf: Union[bytes, bytearray]) -> bytes:
        """
        Returns a copy of ``buf`` that retains only the sequences of English
        alphabet and high byte characters that are not between <> characters.
        This filter can be applied to all scripts which contain both English
        characters and extended ASCII characters, but is currently only used by
        ``Latin1Prober``.
        """
        filtered = bytearray()
        in_tag = False
        prev = 0
        buf = memoryview(buf).cast("c")

        for curr, buf_char in enumerate(buf):
            # Check whether we are leaving or entering an XML tag.
            if buf_char == b">":
                prev = curr + 1
                in_tag = False
            elif buf_char == b"<":
                if curr > prev and not in_tag:
                    # Keep the text seen since the last closing bracket.
                    filtered.extend(buf[prev:curr])
                    # Output a space to delimit the stretch we kept.
                    filtered.extend(b" ")
                in_tag = True

        # If we are not inside a tag, keep the remaining tail of the buffer.
        if not in_tag:
            filtered.extend(buf[prev:])

        return filtered