� ���g�2���ddlZddlZddlZddlZddlZddlZddlZddlZddlZddl m Z m Z ddl m Z ddlmZmZddlmZddlmZdd lmZee��ZGd �d ��ZGd �d e ��ZGd�dee ��ZGd�de��ZGd�de��ZGd�de��ZGd�de��ZGd�de��Z Gd�de��Z!Gd�de��Z"Gd�de��Z#Gd �d!e��Z$Gd"�d#��Z%dS)$�N)�ABC�abstractmethod)�Path)�Optional�Union�)�config�)�FileLock)� get_loggerc�b�eZdZd deefd�Zdedefd�Zdededefd �Zdd ededefd �Z dS)�ExtractManagerN� cache_dirc��|r*tj�|tj��n tj|_t|_dS�N) �os�path�joinr �EXTRACTED_DATASETS_DIR�EXTRACTED_DATASETS_PATH� extract_dir� Extractor� extractor)�selfrs �f/home/asafur/pinokio/api/open-webui.git/app/env/lib/python3.11/site-packages/datasets/utils/extract.py�__init__zExtractManager.__init__s;��FO� s�B�G�L�L��F�$A� B� B� B�U[�Us� ��#�����r�returnc��ddlm}tj�|��}tj�|j||����S)Nr )�hash_url_to_filename)� file_utilsr rr�abspathrr)rrr �abs_paths r�_get_output_pathzExtractManager._get_output_pathsN��4�4�4�4�4�4��7�?�?�4�(�(���w�|�|�D�,�.B�.B�8�.L�.L�M�M�Mr� output_path� force_extractc��|pStj�|�� o3tj�|��otj|�� Sr)rr�isfile�isdir�listdir)rr%r&s r� _do_extractzExtractManager._do_extract%sK��� �����{�+�+� +� l�R�W�]�]�;�5O�5O�5k�TV�T^�_j�Tk�Tk�0l� rF� input_pathc���|j�|��}|s|S|�|��}|�||��r|j�|||��|Sr)r�infer_extractor_formatr$r+�extract)rr,r&�extractor_formatr%s rr/zExtractManager.extract*ss���>�@�@��L�L��� �� ��+�+�J�7�7� � � � �K�� 7� 7� N� �N� "� "�:�{�<L� M� M� M��rr�F) �__name__� __module__� __qualname__r�strrr$�boolr+r/�rrrrs�������#�#�(�3�-�#�#�#�#� N�S�N�S�N�N�N�N� �s� �4� �D� � � � � ��#��d��s������rrc��eZdZeedeeefdefd�����Z e edeeefdeeefddfd�����Z dS)� BaseExtractorrrc ��dSrr7��clsr�kwargss r�is_extractablezBaseExtractor.is_extractable5s��GJ�srr,r%Nc��dSrr7)r,r%s rr/zBaseExtractor.extract9s��VY�VYr) r2r3r4� classmethodrrrr5r6r>� staticmethodr/r7rrr9r94s���������J�%��c� �"2�J��J�J�J��^��[�J���Y�E�$��)�,�Y�5��s��;K�Y�PT�Y�Y�Y��^��\�Y�Y�Yrr9c��eZdZUgZeeed<edee e fde fd���Z e d dee e fdedefd���Zd S) �MagicNumberBaseExtractor� magic_numbersr�magic_number_lengthc��t|d��5}|�|��cddd��S#1swxYwYdS)N�rb)�open�read)rrE�fs 
r�read_magic_numberz*MagicNumberBaseExtractor.read_magic_numberAs��� �$�� � � /���6�6�-�.�.� /� /� /� /� /� /� /� /� /� /� /� /���� /� /� /� /� /� /s �3�7�7r� magic_numberrc�����sGtd�|jD����} |�||���n#t$rYdSwxYwt �fd�|jD����S)Nc3�4K�|]}t|��V��dSr)�len)�.0�cls_magic_numbers r� <genexpr>z:MagicNumberBaseExtractor.is_extractable.<locals>.<genexpr>Is,����%f�%f�@P�c�*:�&;�&;�%f�%f�%f�%f�%f�%frFc3�B�K�|]}��|��V��dSr)� startswith)rPrQrLs �rrRz:MagicNumberBaseExtractor.is_extractable.<locals>.<genexpr>Ns3�����g�g�AQ�<�*�*�+;�<�<�g�g�g�g�g�gr)�maxrDrK�OSError�any)r<rrLrEs ` rr>z'MagicNumberBaseExtractor.is_extractableFs����� �"%�%f�%f�TW�Te�%f�%f�%f�"f�"f� � �"�4�4�T�;N�O�O� � ��� � � ��u�u� �����g�g�g�g�UX�Uf�g�g�g�g�g�gs�:� A�AN�r)r2r3r4rD�list�bytes�__annotations__rArrr5�intrKr@r6r>r7rrrCrC>s��������!#�M�4��;�#�#�#��/��d�C�i� 0�/�s�/�/�/��\�/��h�h�%��c� �"2�h�%�h�RV�h�h�h��[�h�h�hrrCc��eZdZedeeefdefd���Ze d���Z e deeefdeeefddfd���Z dS) � TarExtractorrrc �*�tj|��Sr)�tarfile� is_tarfiler;s rr>zTarExtractor.is_extractableRs���!�$�'�'�'rc#�P��K�dtdtfd��dtdtdtf�fd� �dtdtf��fd� }�|��}|D]�}�|j|��r$t�d|j�d����7|���r7|||��r+t�d|j�d |j������|���r7|||��r+t�d|j�d |j������|V���d S) a� Fix for CVE-2007-4559 Desc: Directory traversal vulnerability in the (1) extract and (2) extractall functions in the tarfile module in Python allows user-assisted remote attackers to overwrite arbitrary files via a .. (dot dot) sequence in filenames in a TAR archive, a related issue to CVE-2001-1267. 
See: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2007-4559 From: https://stackoverflow.com/a/10077309 rrc�z�tj�tj�|����Sr)rr�realpathr")rs r�resolvedz*TarExtractor.safemembers.<locals>.resolvedbs&���7�#�#�B�G�O�O�D�$9�$9�:�:� :r�basec�~���tj�||�����|�� Sr)rrrrT)rrfres �r�badpathz)TarExtractor.safemembers.<locals>.badpathes4����x��� � �T�4� 8� 8�9�9�D�D�T�J�J�J� Jrc����tj�|tj�|j������}�|j|���S)N)rf)rrr�dirname�name�linkname)�inforf�tiprhres ��r�badlinkz)TarExtractor.safemembers.<locals>.badlinkisI����(�2�7�<�<��b�g�o�o�d�i�.H�.H�I�I�J�J�C��7�4�=�s�3�3�3� 3rzExtraction of z is blocked (illegal path)z is blocked: Symlink to z is blocked: Hard link to N)r5r6rk�logger�error�issymrl�islnk)�membersr%rorf�finforhres @@r� safememberszTarExtractor.safemembersVs������� ;�3� ;�3� ;� ;� ;� ;� K�#� K�S� K�T� K� K� K� K� K� K� 4�� 4�� 4� 4� 4� 4� 4� 4� 4� �x� �$�$��� � �E��w�u�z�4�(�(� �� � �T�e�j�T�T�T�U�U�U�U������ �7�7�5�$�#7�#7� �� � �b�e�j�b�b�RW�R`�b�b�c�c�c�c������ �7�7�5�$�#7�#7� �� � �d�e�j�d�d�TY�Tb�d�d�e�e�e�e�� � � � � � rr,r%Nc���tj|d���tj|��}|�|t �||�����|���dS)NT��exist_ok)rt)r�makedirsr`rH� extractallr^rv�close)r,r%�tar_files rr/zTarExtractor.extractzsf�� � �K�$�/�/�/�/��<� �+�+�����K��1I�1I�(�T_�1`�1`��a�a�a��������r) r2r3r4r@rrr5r6r>rArvr/r7rrr^r^Qs��������(�%��c� �"2�(��(�(�(��[�(��!�!��\�!�F��E�$��)�,��5��s��;K��PT�����\���rr^c�X�eZdZdgZedeeefdeeefddfd���ZdS)� GzipExtractors�r,r%rNc���tj|d��5}t|d��5}tj||��ddd��n #1swxYwYddd��dS#1swxYwYdS�NrG�wb)�gziprH�shutil� copyfileobj)r,r%� gzip_file�extracted_files rr/zGzipExtractor.extract�s��� �Y�z�4� (� (� >�I��k�4�(�(� >�N��"�9�n�=�=�=� >� >� >� >� >� >� >� >� >� >� >���� >� >� >� >� >� >� >� >� >� >� >� >� >� >� >� >���� >� >� >� >� >� >�3�A!�A � A!� A � A!�A �A!�!A%�(A%� r2r3r4rDrArrr5r/r7rrrr�sa������ �M�M��>�E�$��)�,�>�5��s��;K�>�PT�>�>�>��\�>�>�>rrc���eZdZgd�Zed deeefdede f�fd� ��Z e deeefdeeefdd fd ���Z �xZ S) � ZipExtractor)sPKsPKsPKrrrLrc����t���||���rdS ddlm}m}m}m}m}m}m } m } m } m } t|d��5} | | 
��}|r�||dkr&||dkr||dkr ddd��dS||||kr�| �||��| ���||krc||| krW| �| ��}t#|��| kr/t%j| |��}||| kr ddd��dSddd��n #1swxYwYdS#t($rYdSwxYw)N�rLTr) � _CD_SIGNATURE�_ECD_DISK_NUMBER�_ECD_DISK_START�_ECD_ENTRIES_TOTAL� _ECD_OFFSET� _ECD_SIZE� _EndRecData�sizeCentralDir�stringCentralDir�structCentralDirrGF)�superr>�zipfiler�r�r�r�r�r�r�r�r�r�rH�seek�tellrIrO�struct�unpack� Exception)r<rrLr�r�r�r�r�r�r�r�r�r��fp�endrec�data�centdir� __class__s �rr>zZipExtractor.is_extractable�sk��� �7�7� !� !�$�\� !� B� B� ��4� � � � � � � � � � � � � � � � � � � � � � � � � ��d�D�!�!� 0�R�$��R����� 0��0�1�Q�6�6�6�)�;L�PQ�;Q�;Q�V\�]h�Vi�mn�Vn�Vn�#� 0� 0� 0� 0� 0� 0� 0� 0�  � 0�1�V�O�5L�L�L�����{� 3�4�4�4��7�7�9�9��{�(;�;�;��y�@Q�Uc�@c�@c�#%�7�7�>�#:�#:�D�"�4�y�y�N�:�:�*0�-�8H�$�*O�*O��#*�=�#9�=M�#M�#M�+/� 0� 0� 0� 0� 0� 0� 0� 0� 0� 0� 0� 0� 0� 0� 0� 0� 0� 0� 0���� 0� 0� 0� 0��5��� � � ��5�5� ���sH�(E�3E � E�B!E �1 E�> E� E�E�E�E� E%�$E%r,r%Nc���tj|d���tj|d��5}|�|��|���ddd��dS#1swxYwYdS)NTrx�r)rrzr��ZipFiler{r|)r,r%�zip_files rr/zZipExtractor.extract�s��� � �K�$�/�/�/�/� �_�Z�� -� -� �� � � � � ,� ,� ,� �N�N� � � � � � � � � � � � � � � ���� � � � � � s�*A#�#A'�*A'rX)r2r3r4rDr@rrr5rZr6r>rAr/� __classcell__)r�s@rr�r��s�����������M� �"�"�%��c� �"2�"�%�"�RV�"�"�"�"�"��[�"�H��E�$��)�,��5��s��;K��PT�����\�����rr�c�X�eZdZdgZedeeefdeeefddfd���ZdS)� XzExtractors�7zXZr,r%rNc���tj|��5}t|d��5}tj||��ddd��n #1swxYwYddd��dS#1swxYwYdS)Nr�)�lzmarHr�r��r,r%�compressed_filer�s rr/zXzExtractor.extract�s �� �Y�z� "� "� D�o��k�4�(�(� D�N��"�?�N�C�C�C� D� D� D� D� D� D� D� D� D� D� D���� D� D� D� D� D� D� D� D� D� D� D� D� D� D� D� D���� D� D� D� D� D� Ds3�A �A� A �A � A �A �A � A$�'A$r�r7rrr�r��sk������0�1�M��D�E�$��)�,�D�5��s��;K�D�PT�D�D�D��\�D�D�Drr�c�Z�eZdZddgZedeeefdeeefddfd���ZdS)� RarExtractorsRar!sRar!r,r%rNc���tjstd���ddl}t j|d���|�|��}|�|��|���dS)NzPlease pip install rarfilerTrx) r �RARFILE_AVAILABLE� ImportError�rarfilerrz�RarFiler{r|)r,r%r��rfs rr/zRarExtractor.extract�sn���'� <��:�;�;� ;����� � 
�K�$�/�/�/�/� �_�_�Z� (� (�� � � �k�"�"�"� ��� � � � � rr�r7rrr�r��se������(�*A�B�M���E�$��)�,��5��s��;K��PT�����\���rr�c�X�eZdZdgZedeeefdeeefddfd���ZdS)� ZstdExtractors(�/�r,r%rNc�:�tjstd���ddl}|���}t |d��5}t |d��5}|�||��ddd��n #1swxYwYddd��dS#1swxYwYdS)NzPlease pip install zstandardrrGr�)r �ZSTANDARD_AVAILABLEr�� zstandard�ZstdDecompressorrH� copy_stream)r,r%�zstd�dctx�ifh�ofhs rr/zZstdExtractor.extract�s!���)� >��<�=�=� =� � � � ��$�$�&�&�� �*�d� #� #� '�s�D��d�,C�,C� '�s� � � �S�#� &� &� &� '� '� '� '� '� '� '� '� '� '� '���� '� '� '� '� '� '� '� '� '� '� '� '� '� '� '� '���� '� '� '� '� '� 's6�B�A8�, B�8A< �<B�?A< �B�B�Br�r7rrr�r��sb������(�)�M��'�E�$��)�,�'�5��s��;K�'�PT�'�'�'��\�'�'�'rr�c�X�eZdZdgZedeeefdeeefddfd���ZdS)�Bzip2ExtractorsBZhr,r%rNc���tj|d��5}t|d��5}tj||��ddd��n #1swxYwYddd��dS#1swxYwYdSr�)�bz2rHr�r�r�s rr/zBzip2Extractor.extract�s �� �X�j�$� '� '� D�?��k�4�(�(� D�N��"�?�N�C�C�C� D� D� D� D� D� D� D� D� D� D� D���� D� D� D� D� D� D� D� D� D� D� D� D� D� D� D� D���� D� D� D� D� D� Dr�r�r7rrr�r��sk������$�%�M��D�E�$��)�,�D�5��s��;K�D�PT�D�D�D��\�D�D�Drr�c�X�eZdZdgZedeeefdeeefddfd���ZdS)�SevenZipExtractors7z��'r,r%rNc���tjstd���ddl}t j|d���|�|d��5}|�|��ddd��dS#1swxYwYdS)NzPlease pip install py7zrrTrxr�)r �PY7ZR_AVAILABLEr��py7zrrrz� SevenZipFiler{)r,r%r��archives rr/zSevenZipExtractor.extract�s����%� :��8�9�9� 9�� � � � � �K�$�/�/�/�/� � � � �C� 0� 0� ,�G� � � �{� +� +� +� ,� ,� ,� ,� ,� ,� ,� ,� ,� ,� ,� ,���� ,� ,� ,� ,� ,� ,s� A/�/A3�6A3r�r7rrr�r��sb������0�1�M��,�E�$��)�,�,�5��s��;K�,�PT�,�,�,��\�,�,�,rr�c�X�eZdZdgZedeeefdeeefddfd���ZdS)� Lz4Extractors"Mr,r%rNc�&�tjstd���ddl}|j�|d��5}t |d��5}t j||��ddd��n #1swxYwYddd��dS#1swxYwYdS)NzPlease pip install lz4rrGr�)r � LZ4_AVAILABLEr�� lz4.frame�framerHr�r�)r,r%�lz4r�r�s rr/zLz4Extractor.extracts7���#� 8��6�7�7� 7����� �Y�^�^�J�� -� -� D���k�4�(�(� D�N��"�?�N�C�C�C� D� D� D� D� D� D� D� D� D� D� D���� D� D� D� D� D� D� D� D� D� D� D� D� D� D� D� D���� D� D� D� D� D� Ds5�B� A.�" B�.A2 �2B�5A2 �6B�B � 
B r�r7rrr�r��sk������(�)�M��D�E�$��)�,�D�5��s��;K�D�PT�D�D�D��\�D�D�Drr�c �N�eZdZUeeeeeee e e d� Z e eeefed<ed���Zedeeefdefd���Zeddeeefded efd ���Zedeeefd eefd ���Zed eeefd eeefded dfd���ZdS)r) �tarr��zip�xz�rarr�r��7zr�� extractorsc�b�td�|j���D����S)Nc3�rK�|]2}t|t���|jD]}t|��V���3dSr)� issubclassrCrDrO)rPr�extractor_magic_numbers rrRz9Extractor._get_magic_number_max_length.<locals>.<genexpr>sn���� � ���)�%=�>�>� �+4�*A�  � �'� �&� '� '� � � � � � � r)rUr��values)r<s r�_get_magic_number_max_lengthz&Extractor._get_magic_number_max_lengths>��� � � �^�2�2�4�4� � � � � � rrrEc�^� t�||���S#t$rYdSwxYw)N)rEr)rCrKrV)rrEs r�_read_magic_numberzExtractor._read_magic_number$sC�� �+�=�=�d�Xk�=�l�l� l��� � � ��3�3� ���s �� ,�,F�return_extractorrc��tjdt���|�|��}|r|sdnd|j|fS|sdndS)Nz{Method 'is_extractable' was deprecated in version 2.4.0 and will be removed in 3.0.0. Use 'infer_extractor_format' instead.)�categoryTF)FN)�warnings�warn� FutureWarningr.r�)r<rr�r0s rr>zExtractor.is_extractable+sm��� � 4�"� � � � � �5�5�d�;�;�� � ^�/�]�4�4�d�C�N�K[�<\�5]� ]�,�?�u�u�-�?rc���|���}|�||��}|j���D] \}}|�||���r|cS�!dS)Nr�)r�r�r��itemsr>)r<r�magic_number_max_lengthrLr0rs rr.z Extractor.infer_extractor_format7s���"%�"B�"B�"D�"D���-�-�d�4K�L�L� �+.�>�+?�+?�+A�+A� (� (� '� �i��'�'��<�'�H�H� (�'�'�'�'� (� (� (rr,r%r0Nc��tjtj�|��d���t t |���d����}t|��5tj |d���|j |}|� ||��cddd��S#1swxYwYdS)NTrxz.lock)� ignore_errors) rrzrrjr5r� with_suffixr r��rmtreer�r/)r<r,r%r0� lock_pathrs rr/zExtractor.extract?s��� � �B�G�O�O�K�0�0�4�@�@�@�@���[�)�)�5�5�g�>�>�?�?� � �i� � � >� >� �M�+�T� :� :� :� :���'7�8�I��$�$�Z��=�=� >� >� >� >� >� >� >� >� >� >� >� >���� >� >� >� >� >� >s�29B8�8B<�?B<r1)r2r3r4r^rr�r�r�r�r�r�r�r��dictr5�typer9r[r@r�rArrr\r�r6r>rr.r/r7rrrr s����������������� 2� 2�J��S�$�}�-�-�.� � � �� � ��[� ����t�S�y�!1�������\�� � @� @�%��c� �"2� @�d� @�W[� @� @� @��[� @��(�%��c� �*:�(�x��}�(�(�(��[�(�� >��$��)�$� >��4��9�%� >�� >� � >� >� >��[� >� >� >rr)&r�r�r�rr�r�r`r�r��abcrr�pathlibr�typingrr�r � 
_filelockr �loggingr r2rprr9rCr^rr�r�r�r�r�r�r�rr7rr�<module>r�s;�� � � � � � � � � � � � � � � � � � � � � � � � �������������#�#�#�#�#�#�#�#�������"�"�"�"�"�"�"�"������������������� ��H� � ����������<Z�Z�Z�Z�Z�C�Z�Z�Z�h�h�h�h�h�}�c�h�h�h�&.�.�.�.�.�=�.�.�.�b>�>�>�>�>�,�>�>�>�1�1�1�1�1�+�1�1�1�hD�D�D�D�D�*�D�D�D� � � � � �+� � � � '� '� '� '� '�,� '� '� '�D�D�D�D�D�-�D�D�D� ,� ,� ,� ,� ,�0� ,� ,� ,� D� D� D� D� D�+� D� D� D�?>�?>�?>�?>�?>�?>�?>�?>�?>�?>r