� J�g9!��J�ddlmZmZddlmZddlmZGd�de��ZdS)�)�List�Union�)� CharSetProber)� ProbingStatec�8��eZdZdZdZdZd�fd� Zd�fd� Zede fd���Z ede fd ���Z de fd �Z de fd �Zdefd �Zdefd �Zdefd�Zdefd�Zdeeddfd�Zdeeddfd�Zdeeefdefd�Zedefd���Zde fd�Z�xZS)� UTF1632Proberad This class simply looks for occurrences of zero bytes, and infers whether the file is UTF16 or UTF32 (low-endian or big-endian) For instance, files looking like ( [nonzero] )+ have a good probability to be UTF32BE. Files looking like ( [nonzero] )+ may be guessed to be UTF16BE, and inversely for little-endian varieties. �g�G�z�?�returnNc�2��t�����d|_dgdz|_dgdz|_t j|_gd�|_d|_ d|_ d|_ d|_ d|_ d|_|���dS)Nr��rrrrF)�super�__init__�position� zeros_at_mod�nonzeros_at_modr� DETECTING�_state�quad�invalid_utf16be�invalid_utf16le�invalid_utf32be�invalid_utf32le�'first_half_surrogate_pair_detected_16be�'first_half_surrogate_pair_detected_16le�reset��self� __class__s ��e/home/asafur/pinokio/api/open-webui.git/app/env/lib/python3.11/site-packages/chardet/utf1632prober.pyrzUTF1632Prober.__init__)s���� ���������� ��C�!�G��� !�s�Q�w���"�,�� � �L�L�� �$���$���$���$���7<��4�7<��4� � � � � � � � �c� ��t�����d|_dgdz|_dgdz|_t j|_d|_d|_ d|_ d|_ d|_ d|_ gd�|_dS)Nrr Fr)rrrrrrrrrrrrrrrrs �r!rzUTF1632Prober.reset8s���� ��� � ������ ��C�!�G��� !�s�Q�w���"�,�� �$���$���$���$���7<��4�7<��4� �L�L�� � � r"c��|���rdS|���rdS|���rdS|���rdSdS)Nzutf-32bezutf-32lezutf-16bezutf-16lezutf-16)�is_likely_utf32be�is_likely_utf32le�is_likely_utf16be�is_likely_utf16le�rs r!� charset_namezUTF1632Prober.charset_nameFsk�� � !� !� #� #� ��:� � !� !� #� #� ��:� � !� !� #� #� ��:� � !� !� #� #� ��:��xr"c��dS)N��r)s r!�languagezUTF1632Prober.languageSs���rr"c�2�td|jdz ��S)N��?g@��maxrr)s r!�approx_32bit_charsz UTF1632Prober.approx_32bit_charsW����3�� ��+�,�,�,r"c�2�td|jdz ��S)Nr0g@r1r)s r!�approx_16bit_charsz UTF1632Prober.approx_16bit_charsZr4r"c��|���}||jkok|jd|z |jkoR|jd|z |jko9|jd|z |jko |jd|z |jko|j S�Nrr��)r3�MIN_CHARS_FOR_DETECTIONr�EXPECTED_RATIOrr�r� approx_charss r!r%zUTF1632Prober.is_likely_utf32be]s����.�.�0�0� ��t�;�;� � � �a� �<� /�$�2E� E� )��!�!�$�|�3�d�6I�I� )��!�!�$�|�3�d�6I�I� )��$�Q�'�,�6��9L�L� )��(�(�  r"c��|���}||jkok|jd|z |jkoR|jd|z |jko9|jd|z |jko |jd|z |jko|j Sr8)r3r;rr<rrr=s r!r&zUTF1632Prober.is_likely_utf32legs����.�.�0�0� ��t�;�;� � � �� #�l� 2�T�5H� H� )��!�!�$�|�3�d�6I�I� )��!�!�$�|�3�d�6I�I� )��!�!�$�|�3�d�6I�I� )��(�(�  r"c���|���}||jkoU|jd|jdz|z |jko.|jd|jdz|z |jko|j S)Nrr:rr9)r6r;rr<rrr=s r!r'zUTF1632Prober.is_likely_utf16beq����.�.�0�0� ��t�;�;� � � !�!� $�t�';�A�'>� >�,� N��!� "� )��"�1�%��(9�!�(<�<� �L��!�"� )��(�(�  r"c���|���}||jkoU|jd|jdz|z |jko.|jd|jdz|z |jko|j S)Nrr9rr:)r6r;rr<rrr=s r!r(zUTF1632Prober.is_likely_utf16le{rAr"rc�H�|ddks:|ddks.|ddkr)|ddkrd|dcxkrdkr nnd|_|ddks;|ddks/|ddkr,|ddkr"d|dcxkrdkrnd Sd|_d Sd Sd Sd S) z� Validate if the quad of bytes is valid UTF-32. UTF-32 is valid in the range 0x00000000 - 0x0010FFFF excluding 0x0000D800 - 0x0000DFFF https://en.wikipedia.org/wiki/UTF-32 rr���r9��Tr:N)rr)rrs r!�validate_utf32_charactersz'UTF1632Prober.validate_utf32_characters�s��� ��G�q�L�L��A�w��~�~��Q��1� � ��a��A���$�$�q�'�2I�2I�2I�2I�T�2I�2I�2I�2I�2I�#'�D� � ��G�q�L�L��A�w��~�~��Q��1� � ��a��A���$�$�q�'�2I�2I�2I�2I�T�2I�2I�2I�2I�2I�2I�#'�D� � � �� ���2I�2Ir"�pairc��|js<d|dcxkrdkr nnd|_nCd|dcxkrdkr nn-d|_n%d|dcxkrdkr nnd|_nd|_|jsAd|dcxkrdkr nn d|_d Sd|dcxkrdkr nd Sd|_d Sd Sd|dcxkrdkr nn d|_d Sd|_d S) a9 Validate if the pair of bytes is valid UTF-16. UTF-16 is valid in the range 0x0000 - 0xFFFF excluding 0xD800 - 0xFFFF with an exception for surrogate pairs, which must be in the range 0xD800-0xDBFF followed by 0xDC00-0xDFFF https://en.wikipedia.org/wiki/UTF-16 rEr��T��rFFrN)rrrr)rrHs r!�validate_utf16_charactersz'UTF1632Prober.validate_utf16_characters�sj���;� ,��t�A�w�&�&�&�&�$�&�&�&�&�&�?C��<�<���a��(�(�(�(�D�(�(�(�(�(�'+��$���t�A�w�&�&�&�&�$�&�&�&�&�&�?D��<�<�'+��$��;� ,��t�A�w�&�&�&�&�$�&�&�&�&�&�?C��<�<�<���a��(�(�(�(�D�(�(�(�(�(�(�'+��$�$�$�)�(��t�A�w�&�&�&�&�$�&�&�&�&�&�?D��<�<�<�'+��$�$�$r"�byte_strc��|D]�}|jdz}||j|<|dkr^|�|j��|�|jdd���|�|jdd���|dkr|j|xxdz cc<n|j|xxdz cc<|xjdz c_��|jS)Nr r:rr9r)rrrGrLrr�state)rrM�c�mod4s r!�feedzUTF1632Prober.feed�s���� � �A��=�1�$�D��D�I�d�O��q�y�y��.�.�t�y�9�9�9��.�.�t�y��1��~�>�>�>��.�.�t�y��1��~�>�>�>��A�v�v��!�$�'�'�'�1�,�'�'�'�'��$�T�*�*�*�a�/�*�*�*� �M�M�Q� �M�M�M��z�r"c���|jtjtjhvr|jS|���dkrtj|_n|jdkrtj|_|jS)Ng�������?i)rr�NOT_ME�FOUND_IT�get_confidencerr)s r!rOzUTF1632Prober.state�sf�� �;�<�.� �0E�F� F� F��;� � � � � � �4� '� '�&�/�D�K�K� �]�X� %� %�'�-�D�K��{�r"c��|���s<|���s(|���s|���rdndS)Ng333333�?g)r(r'r&r%r)s r!rVzUTF1632Prober.get_confidence�sh���&�&�(�(� ��)�)�+�+� ��)�)�+�+�  � �)�)�+�+�  �D�D�� r")r N) �__name__� __module__� __qualname__�__doc__r;r<rr�property�strr*r.�floatr3r6�boolr%r&r'r(r�intrGrLr�bytes� bytearrayrrRrOrV� __classcell__)r s@r!r r s���������!���N� � � � � � � !� !� !� !� !� !�� �c� � � ��X� ���#�����X��-�E�-�-�-�-�-�E�-�-�-�-� �4� � � � � �4� � � � � �4� � � � � �4� � � � �(�d�3�i�(�D�(�(�(�(�,,�d�3�i�,�D�,�,�,�,�@ �U�5�)�#3�4� �� � � � �� �|� � � ��X� �  ��  �  �  �  �  �  �  �  r"r N)�typingrr� charsetproberr�enumsrr r-r"r!�<module>rgs���*��������(�(�(�(�(�(�������F �F �F �F �F �M�F �F �F �F �F r"
Memory