"""DatasetInfo records information we know about a dataset.

This includes things that we know about the dataset statically, i.e.:
 - description
 - canonical location
 - does it have validation and tests splits
 - size
 - etc.

This also includes the things that can and should be computed once we've
processed the dataset as well:
 - number of examples (in each split)
 - etc.
"""

import copy
import dataclasses
import json
import os
import posixpath
from dataclasses import dataclass
from pathlib import Path
from typing import ClassVar, Optional, Union

from fsspec.core import url_to_fs
from huggingface_hub import DatasetCard, DatasetCardData

from . import config
from .features import Features
from .splits import SplitDict
from .utils import Version
from .utils.logging import get_logger
from .utils.py_utils import asdict, unique_values


logger = get_logger(__name__)


@dataclass
class SupervisedKeysData:
    input: str = ""
    output: str = ""


@dataclass
class DownloadChecksumsEntryData:
    key: str = ""
    value: str = ""


class MissingCachedSizesConfigError(Exception):
    """The expected cached sizes of the download file are missing."""


class NonMatchingCachedSizesError(Exception):
    """The prepared split doesn't have expected sizes."""


@dataclass
class PostProcessedInfo:
    features: Optional[Features] = None
    resources_checksums: Optional[dict] = None

    def __post_init__(self):
        # Convert back to the correct classes when we reload from dict
        if self.features is not None and not isinstance(self.features, Features):
            self.features = Features.from_dict(self.features)

    @classmethod
    def from_dict(cls, post_processed_info_dict: dict) -> "PostProcessedInfo":
        field_names = {f.name for f in dataclasses.fields(cls)}
        return cls(**{k: v for k, v in post_processed_info_dict.items() if k in field_names})


@dataclass
class DatasetInfo:
    """Information about a dataset.

    `DatasetInfo` documents datasets, including its name, version, and features.
    See the constructor arguments and properties for a full list.

    Not all fields are known on construction and may be updated later.

    Attributes:
        description (`str`): A description of the dataset.
        citation (`str`): A BibTeX citation of the dataset.
        homepage (`str`): A URL to the official homepage for the dataset.
        license (`str`): The dataset's license. It can be the name of the license
            or a paragraph containing the terms of the license.
        features ([`Features`], *optional*): The features used to specify the dataset's column types.
        post_processed (`PostProcessedInfo`, *optional*): Information regarding the resources of a possible
            post-processing of a dataset. For example, it can contain the information of an index.
        supervised_keys (`SupervisedKeysData`, *optional*): Specifies the input feature and the label for
            supervised learning if applicable for the dataset (legacy from TFDS).
        builder_name (`str`, *optional*): The name of the `GeneratorBasedBuilder` subclass used to create the dataset.
            Usually matched to the corresponding script name. It is also the snake_case version of the dataset builder
            class name.
        config_name (`str`, *optional*): The name of the configuration derived from [`BuilderConfig`].
        version (`str` or [`Version`], *optional*): The version of the dataset.
        splits (`dict`, *optional*): The mapping between split name and metadata.
        download_checksums (`dict`, *optional*): The mapping between the URL to download the dataset's checksums and
            corresponding metadata.
        download_size (`int`, *optional*): The size of the files to download to generate the dataset, in bytes.
        post_processing_size (`int`, *optional*): Size of the dataset in bytes after post-processing, if any.
        dataset_size (`int`, *optional*): The combined size in bytes of the Arrow tables for all splits.
        size_in_bytes (`int`, *optional*): The combined size in bytes of all files associated with the dataset
            (downloaded files + Arrow files).
        **config_kwargs (additional keyword arguments): Keyword arguments to be passed to the [`BuilderConfig`] and
            used in the [`DatasetBuilder`].
    """

    # Set in the dataset scripts
    description: str = dataclasses.field(default_factory=str)
    citation: str = dataclasses.field(default_factory=str)
    homepage: str = dataclasses.field(default_factory=str)
    license: str = dataclasses.field(default_factory=str)
    features: Optional[Features] = None
    post_processed: Optional[PostProcessedInfo] = None
    supervised_keys: Optional[SupervisedKeysData] = None

    # Set later by the builder
    builder_name: Optional[str] = None
    dataset_name: Optional[str] = None
    config_name: Optional[str] = None
    version: Optional[Union[str, Version]] = None
    # Set later by `download_and_prepare`
    splits: Optional[dict] = None
    download_checksums: Optional[dict] = None
    download_size: Optional[int] = None
    post_processing_size: Optional[int] = None
    dataset_size: Optional[int] = None
    size_in_bytes: Optional[int] = None

    _INCLUDED_INFO_IN_YAML: ClassVar[list[str]] = [
        "config_name",
        "download_size",
        "dataset_size",
        "features",
        "splits",
    ]

    def __post_init__(self):
        # Convert back to the correct classes when we reload from dict
        if self.features is not None and not isinstance(self.features, Features):
            self.features = Features.from_dict(self.features)
        if self.post_processed is not None and not isinstance(self.post_processed, PostProcessedInfo):
            self.post_processed = PostProcessedInfo.from_dict(self.post_processed)
        if self.version is not None and not isinstance(self.version, Version):
            if isinstance(self.version, str):
                self.version = Version(self.version)
            else:
                self.version = Version.from_dict(self.version)
        if self.splits is not None and not isinstance(self.splits, SplitDict):
            self.splits = SplitDict.from_split_dict(self.splits)
        if self.supervised_keys is not None and not isinstance(self.supervised_keys, SupervisedKeysData):
            if isinstance(self.supervised_keys, (tuple, list)):
                self.supervised_keys = SupervisedKeysData(*self.supervised_keys)
            else:
                self.supervised_keys = SupervisedKeysData(**self.supervised_keys)

    def write_to_directory(self, dataset_info_dir, pretty_print=False, storage_options: Optional[dict] = None):
        """Write `DatasetInfo` and license (if present) as JSON files to `dataset_info_dir`.

        Args:
            dataset_info_dir (`str`):
                Destination directory.
            pretty_print (`bool`, defaults to `False`):
                If `True`, the JSON will be pretty-printed with the indent level of 4.
            storage_options (`dict`, *optional*):
                Key/value pairs to be passed on to the file-system backend, if any.

                <Added version="2.9.0"/>

        Example:

        ```py
        >>> from datasets import load_dataset
        >>> ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
        >>> ds.info.write_to_directory("/path/to/directory/")
        ```
        """
        fs, *_ = url_to_fs(dataset_info_dir, **(storage_options or {}))
        with fs.open(posixpath.join(dataset_info_dir, config.DATASET_INFO_FILENAME), "wb") as f:
            self._dump_info(f, pretty_print=pretty_print)
        if self.license:
            with fs.open(posixpath.join(dataset_info_dir, config.LICENSE_FILENAME), "wb") as f:
                self._dump_license(f)

    def _dump_info(self, file, pretty_print=False):
        """Dump info in `file` file-like object open in bytes mode (to support remote files)"""
        file.write(json.dumps(asdict(self), indent=4 if pretty_print else None).encode("utf-8"))

    def _dump_license(self, file):
        """Dump license in `file` file-like object open in bytes mode (to support remote files)"""
        file.write(self.license.encode("utf-8"))

    @classmethod
    def from_merge(cls, dataset_infos: list["DatasetInfo"]):
        dataset_infos = [dset_info.copy() for dset_info in dataset_infos if dset_info is not None]

        if len(dataset_infos) > 0 and all(dataset_infos[0] == dset_info for dset_info in dataset_infos):
            # if all dataset_infos are equal we don't need to merge. Just return the first.
            return dataset_infos[0]

        description = "\n\n".join(unique_values(info.description for info in dataset_infos)).strip()
        citation = "\n\n".join(unique_values(info.citation for info in dataset_infos)).strip()
        homepage = "\n\n".join(unique_values(info.homepage for info in dataset_infos)).strip()
        license = "\n\n".join(unique_values(info.license for info in dataset_infos)).strip()
        features = None
        supervised_keys = None

        return cls(
            description=description,
            citation=citation,
            homepage=homepage,
            license=license,
            features=features,
            supervised_keys=supervised_keys,
        )

    @classmethod
    def from_directory(cls, dataset_info_dir: str, storage_options: Optional[dict] = None) -> "DatasetInfo":
        """Create [`DatasetInfo`] from the JSON file in `dataset_info_dir`.

        This function updates all the dynamically generated fields (num_examples,
        hash, time of creation,...) of the [`DatasetInfo`].

        This will overwrite all previous metadata.

        Args:
            dataset_info_dir (`str`):
                The directory containing the metadata file. This
                should be the root directory of a specific dataset version.
            storage_options (`dict`, *optional*):
                Key/value pairs to be passed on to the file-system backend, if any.

                <Added version="2.9.0"/>

        Example:

        ```py
        >>> from datasets import DatasetInfo
        >>> ds_info = DatasetInfo.from_directory("/path/to/directory/")
        ```
        """
        fs, *_ = url_to_fs(dataset_info_dir, **(storage_options or {}))
        logger.info(f"Loading Dataset info from {dataset_info_dir}")
        if not dataset_info_dir:
            raise ValueError("Calling DatasetInfo.from_directory() with undefined dataset_info_dir.")
        with fs.open(posixpath.join(dataset_info_dir, config.DATASET_INFO_FILENAME), "r", encoding="utf-8") as f:
            dataset_info_dict = json.load(f)
        return cls.from_dict(dataset_info_dict)

    @classmethod
    def from_dict(cls, dataset_info_dict: dict) -> "DatasetInfo":
        field_names = {f.name for f in dataclasses.fields(cls)}
        return cls(**{k: v for k, v in dataset_info_dict.items() if k in field_names})

    def update(self, other_dataset_info: "DatasetInfo", ignore_none=True):
        self_dict = self.__dict__
        self_dict.update(
            **{
                k: copy.deepcopy(v)
                for k, v in other_dataset_info.__dict__.items()
                if (v is not None or not ignore_none)
            }
        )

    def copy(self) -> "DatasetInfo":
        return self.__class__(**{k: copy.deepcopy(v) for k, v in self.__dict__.items()})

    def _to_yaml_dict(self) -> dict:
        yaml_dict = {}
        dataset_info_dict = asdict(self)
        for key in dataset_info_dict:
            if key in self._INCLUDED_INFO_IN_YAML:
                value = getattr(self, key)
                if hasattr(value, "_to_yaml_list"):  # Features, SplitDict
                    yaml_dict[key] = value._to_yaml_list()
                elif hasattr(value, "_to_yaml_string"):  # Version
                    yaml_dict[key] = value._to_yaml_string()
                else:
                    yaml_dict[key] = value
        return yaml_dict

    @classmethod
    def _from_yaml_dict(cls, yaml_data: dict) -> "DatasetInfo":
        yaml_data = copy.deepcopy(yaml_data)
        if yaml_data.get("features") is not None:
            yaml_data["features"] = Features._from_yaml_list(yaml_data["features"])
        if yaml_data.get("splits") is not None:
            yaml_data["splits"] = SplitDict._from_yaml_list(yaml_data["splits"])
        field_names = {f.name for f in dataclasses.fields(cls)}
        return cls(**{k: v for k, v in yaml_data.items() if k in field_names})


class DatasetInfosDict(dict[str, DatasetInfo]):
    def write_to_directory(self, dataset_infos_dir, overwrite=False, pretty_print=False) -> None:
        total_dataset_infos = {}
        dataset_infos_path = os.path.join(dataset_infos_dir, config.DATASETDICT_INFOS_FILENAME)
        dataset_readme_path = os.path.join(dataset_infos_dir, config.REPOCARD_FILENAME)
        if not overwrite:
            total_dataset_infos = self.from_directory(dataset_infos_dir)
        total_dataset_infos.update(self)
        if os.path.exists(dataset_infos_path):
            # for backward compatibility, let's update the JSON file if it exists
            with open(dataset_infos_path, "w", encoding="utf-8") as f:
                dataset_infos_dict = {
                    config_name: asdict(dset_info) for config_name, dset_info in total_dataset_infos.items()
                }
                json.dump(dataset_infos_dict, f, indent=4 if pretty_print else None)
        # Dump the infos in the YAML part of the README.md file
        if os.path.exists(dataset_readme_path):
            dataset_card = DatasetCard.load(dataset_readme_path)
            dataset_card_data = dataset_card.data
        else:
            dataset_card = None
            dataset_card_data = DatasetCardData()
        if total_dataset_infos:
            total_dataset_infos.to_dataset_card_data(dataset_card_data)
            dataset_card = (
                DatasetCard("---\n" + str(dataset_card_data) + "\n---\n") if dataset_card is None else dataset_card
            )
            dataset_card.save(Path(dataset_readme_path))

    @classmethod
    def from_directory(cls, dataset_infos_dir) -> "DatasetInfosDict":
        logger.info(f"Loading Dataset Infos from {dataset_infos_dir}")
        # Load the info from the YAML part of README.md if it exists
        if os.path.exists(os.path.join(dataset_infos_dir, config.REPOCARD_FILENAME)):
            dataset_card_data = DatasetCard.load(Path(dataset_infos_dir) / config.REPOCARD_FILENAME).data
            if "dataset_info" in dataset_card_data:
                return cls.from_dataset_card_data(dataset_card_data)
        if os.path.exists(os.path.join(dataset_infos_dir, config.DATASETDICT_INFOS_FILENAME)):
            # this is just to have backward compatibility with dataset_infos.json files
            with open(os.path.join(dataset_infos_dir, config.DATASETDICT_INFOS_FILENAME), encoding="utf-8") as f:
                return cls(
                    {
                        config_name: DatasetInfo.from_dict(dataset_info_dict)
                        for config_name, dataset_info_dict in json.load(f).items()
                    }
                )
        else:
            return cls()

    @classmethod
    def from_dataset_card_data(cls, dataset_card_data: DatasetCardData) -> "DatasetInfosDict":
        if isinstance(dataset_card_data.get("dataset_info"), (list, dict)):
            if isinstance(dataset_card_data["dataset_info"], list):
                return cls(
                    {
                        dataset_info_yaml_dict.get("config_name", "default"): DatasetInfo._from_yaml_dict(
                            dataset_info_yaml_dict
                        )
                        for dataset_info_yaml_dict in dataset_card_data["dataset_info"]
                    }
                )
            else:
                dataset_info = DatasetInfo._from_yaml_dict(dataset_card_data["dataset_info"])
                dataset_info.config_name = dataset_card_data["dataset_info"].get("config_name", "default")
                return cls({dataset_info.config_name: dataset_info})
        else:
            return cls()

    def to_dataset_card_data(self, dataset_card_data: DatasetCardData) -> None:
        if self:
            # first get existing metadata info
            if "dataset_info" in dataset_card_data and isinstance(dataset_card_data["dataset_info"], dict):
                dataset_metadata_infos = {
                    dataset_card_data["dataset_info"].get("config_name", "default"): dataset_card_data["dataset_info"]
                }
            elif "dataset_info" in dataset_card_data and isinstance(dataset_card_data["dataset_info"], list):
                dataset_metadata_infos = {
                    config_metadata["config_name"]: config_metadata
                    for config_metadata in dataset_card_data["dataset_info"]
                }
            else:
                dataset_metadata_infos = {}
            # update/rewrite existing metadata info with the one to dump
            total_dataset_infos = {
                **dataset_metadata_infos,
                **{config_name: dset_info._to_yaml_dict() for config_name, dset_info in self.items()},
            }
            # the config_name from the dataset_infos_dict takes over the config_name of the DatasetInfo
            for config_name, dset_info_yaml_dict in total_dataset_infos.items():
                dset_info_yaml_dict["config_name"] = config_name
            if len(total_dataset_infos) == 1:
                # use a struct instead of a list of configurations, since there's only one
                dataset_card_data["dataset_info"] = next(iter(total_dataset_infos.values()))
                config_name = dataset_card_data["dataset_info"].pop("config_name", None)
                if config_name != "default":
                    # if config_name is not "default" preserve it and put it at the first position
                    dataset_card_data["dataset_info"] = {
                        "config_name": config_name,
                        **dataset_card_data["dataset_info"],
                    }
            else:
                dataset_card_data["dataset_info"] = []
                for config_name, dataset_info_yaml_dict in sorted(total_dataset_infos.items()):
                    # add the config_name field in first position
                    dataset_info_yaml_dict.pop("config_name", None)
                    dataset_info_yaml_dict = {"config_name": config_name, **dataset_info_yaml_dict}
                    dataset_card_data["dataset_info"].append(dataset_info_yaml_dict)
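The `from_dict` and `_from_yaml_dict` classmethods in this module all rely on the same pattern: filter the incoming mapping down to the dataclass's declared fields before constructing the instance, so extra or stale keys in a cached JSON/YAML file are ignored instead of raising `TypeError`. A minimal standalone sketch of that pattern, using a hypothetical `Info` dataclass rather than the real `DatasetInfo`:

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass
class Info:
    # A toy stand-in for DatasetInfo with just two of its fields.
    description: str = ""
    download_size: Optional[int] = None


def info_from_dict(info_dict: dict) -> Info:
    # Keep only keys that match declared dataclass fields; drop everything else.
    field_names = {f.name for f in dataclasses.fields(Info)}
    return Info(**{k: v for k, v in info_dict.items() if k in field_names})


# "unknown_key" (e.g. a field written by a newer library version) is silently dropped.
info = info_from_dict({"description": "demo", "unknown_key": 123})
```

This is why old `dataset_info.json` files stay loadable across library versions: removed fields simply fall through the filter instead of breaking construction.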