"""To write records into Parquet files."""

import json
import sys
from collections.abc import Iterable
from typing import Any, Optional, Union

import fsspec
import numpy as np
import pyarrow as pa
from fsspec.core import url_to_fs

from . import config
from .features import Audio, Features, Image, Value, Video
from .features.features import (
    FeatureType,
    _ArrayXDExtensionType,
    _visit,
    cast_to_python_objects,
    generate_from_arrow_type,
    get_nested_type,
    list_of_np_array_to_pyarrow_listarray,
    numpy_to_pyarrow_listarray,
    to_pyarrow_listarray,
)
from .filesystems import is_remote_filesystem
from .info import DatasetInfo
from .keyhash import DuplicatedKeysError, KeyHasher
from .table import array_cast, cast_array_to_feature, embed_table_storage, table_cast
from .utils import logging
from .utils.py_utils import asdict, first_non_null_value


logger = logging.get_logger(__name__)

type_ = type  # keep python's type function


def get_writer_batch_size(features: Optional[Features]) -> Optional[int]:
    """
    Get the writer_batch_size that defines the maximum row group size in the parquet files.
    The default in `datasets` is 1,000 but we lower it to 100 for image/audio datasets and 10 for videos.
    This allows optimizing random access to a parquet file, since accessing one row requires
    reading its entire row group.

    This can be improved to pick an optimized size for querying/iterating,
    but at least it matches the dataset viewer expectations on HF.

    Args:
        features (`datasets.Features` or `None`):
            Dataset Features from `datasets`.
    Returns:
        writer_batch_size (`Optional[int]`):
            Writer batch size to pass to a dataset builder.
            If `None`, then it will use the `datasets` default.
    """
    if not features:
        return None

    batch_size = np.inf

    def set_batch_size(feature: FeatureType) -> None:
        nonlocal batch_size
        if isinstance(feature, Image):
            batch_size = min(batch_size, config.PARQUET_ROW_GROUP_SIZE_FOR_IMAGE_DATASETS)
        elif isinstance(feature, Audio):
            batch_size = min(batch_size, config.PARQUET_ROW_GROUP_SIZE_FOR_AUDIO_DATASETS)
        elif isinstance(feature, Video):
            batch_size = min(batch_size, config.PARQUET_ROW_GROUP_SIZE_FOR_VIDEO_DATASETS)
        elif isinstance(feature, Value) and feature.dtype == "binary":
            batch_size = min(batch_size, config.PARQUET_ROW_GROUP_SIZE_FOR_BINARY_DATASETS)

    _visit(features, set_batch_size)

    return None if batch_size is np.inf else batch_size
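

# Example usage (illustrative sketch, not part of the upstream module; it assumes the
# public `datasets` API exposing `Features`, `Image`, and `Value`). As soon as one column
# holds large blobs (image/audio/video/binary), the function returns the smaller row group
# size configured for that media type; for plain tabular features it returns `None`, which
# means "use the `datasets` default of 1,000 rows per row group":
#
#     from datasets import Features, Image, Value
#
#     image_features = Features({"image": Image(), "label": Value("int64")})
#     assert get_writer_batch_size(image_features) == config.PARQUET_ROW_GROUP_SIZE_FOR_IMAGE_DATASETS
#
#     text_features = Features({"text": Value("string")})
#     assert get_writer_batch_size(text_features) is None  # fall back to the `datasets` default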