� ���g�B���dZddlZddlmZmZddlmZmZddlm Z ddl m Z ddl m Z dd lmZdd lmZmZmZdd lmZdd lmZee��ZGd �de��Z ddedeeeeefdee deee efdeeeefdeeeeff d�Z ddedeeeefdee deee efdeedeeeeeff d�Z! ddedeeeefdee deee efdeedeeeeefdeefd�Z" ddedeedeeeeeeeeeeefffdee deee efdeeeefdeeeefdefd�Z# ddedeedeeeeeeeeeeefffdee deee efdeeeefdeeeeffd�Z$dS)zList and inspect datasets.�N)�Mapping�Sequence)�Optional�Union�)�DownloadConfig)� DownloadMode)�StreamingDownloadManager)� DatasetInfo)�dataset_module_factory�get_dataset_builder_class�load_dataset_builder)� get_logger)�Versionc��eZdZdS)�SplitsNotFoundErrorN)�__name__� __module__� __qualname__���`/home/asafur/pinokio/api/open-webui.git/app/env/lib/python3.11/site-packages/datasets/inspect.pyrr&s�������Drr�path� data_files�download_config� download_mode�revision�tokenc �b��������t���������}�������fd�|D��S)a�Get the meta information about a dataset, returned as a dict mapping config name to DatasetInfoDict. Args: path (`str`): path to the dataset processing script with the dataset builder. Can be either: - a local path to processing script or the directory containing the script (if the script has the same name as the directory), e.g. `'./dataset/squad'` or `'./dataset/squad/squad.py'` - a dataset identifier on the Hugging Face Hub (list all available datasets and ids with [`huggingface_hub.list_datasets`]), e.g. `'rajpurkar/squad'`, `'nyu-mll/glue'` or``'openai/webtext'` revision (`Union[str, datasets.Version]`, *optional*): If specified, the dataset module will be loaded from the datasets repository at this version. By default: - it is set to the local version of the lib. - it will also try to load it from the main branch if it's not available at the local version of the lib. Specifying a version that is different from your local version of the lib might cause compatibility issues. download_config ([`DownloadConfig`], *optional*): Specific download configuration parameters. 
download_mode ([`DownloadMode`] or `str`, defaults to `REUSE_DATASET_IF_EXISTS`): Download/generate mode. data_files (`Union[Dict, List, str]`, *optional*): Defining the data_files of the dataset configuration. token (`str` or `bool`, *optional*): Optional string or boolean to use as Bearer token for remote files on the Datasets Hub. If `True`, or not specified, will get token from `"~/.huggingface"`. **config_kwargs (additional keyword arguments): Optional attributes for builder class which will override the attributes if supplied. Example: ```py >>> from datasets import get_dataset_infos >>> get_dataset_infos('cornell-movie-review-data/rotten_tomatoes') {'default': DatasetInfo(description="Movie Review Dataset. This is a dataset of containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews...), ...} ``` )rrrrrrc�:��i|]}|td�|�����d������S))r� config_namerrrrrr)�get_dataset_config_info) �.0r!� config_kwargsrrrrrrs �������r� <dictcomp>z%get_dataset_infos.<locals>.<dictcomp>^sa��� � � � � �,�  ��#�!�+�'���  �  ��  �  � � � r)�get_dataset_config_names)rrrrrrr$� config_namess``````` r�get_dataset_infosr(*s}���������X,� ��'�#��� ���L� � � � � � � � � � �(� � � � r�dynamic_modules_pathc ��t|f|||||d�|��}t|tj�|�����}t |j�����p"|j� d|j pd��gS)a Get the list of available config names for a particular dataset. Args: path (`str`): path to the dataset processing script with the dataset builder. Can be either: - a local path to processing script or the directory containing the script (if the script has the same name as the directory), e.g. `'./dataset/squad'` or `'./dataset/squad/squad.py'` - a dataset identifier on the Hugging Face Hub (list all available datasets and ids with [`huggingface_hub.list_datasets`]), e.g. `'rajpurkar/squad'`, `'nyu-mll/glue'` or `'openai/webtext'` revision (`Union[str, datasets.Version]`, *optional*): If specified, the dataset module will be loaded from the datasets repository at this version. 
By default: - it is set to the local version of the lib. - it will also try to load it from the main branch if it's not available at the local version of the lib. Specifying a version that is different from your local version of the lib might cause compatibility issues. download_config ([`DownloadConfig`], *optional*): Specific download configuration parameters. download_mode ([`DownloadMode`] or `str`, defaults to `REUSE_DATASET_IF_EXISTS`): Download/generate mode. dynamic_modules_path (`str`, defaults to `~/.cache/huggingface/modules/datasets_modules`): Optional path to the directory in which the dynamic modules are saved. It must have been initialized with `init_dynamic_modules`. By default the datasets are stored inside the `datasets_modules` module. data_files (`Union[Dict, List, str]`, *optional*): Defining the data_files of the dataset configuration. **download_kwargs (additional keyword arguments): Optional attributes for [`DownloadConfig`] which will override the attributes in `download_config` if supplied, for example `token`. Example: ```py >>> from datasets import get_dataset_config_names >>> get_dataset_config_names("nyu-mll/glue") ['cola', 'sst2', 'mrpc', 'qqp', 'stsb', 'mnli', 'mnli_mismatched', 'mnli_matched', 'qnli', 'rte', 'wnli', 'ax'] ``` �rrrr)r�� dataset_namer!�default) r r �osr�basename�list�builder_configs�keys�builder_kwargs�get�DEFAULT_CONFIG_NAME) rrrrr)r�download_kwargs�dataset_module� builder_clss rr&r&ms���p,� ���'�#�1�� �� ���N�,�N���IY�IY�Z^�I_�I_�`�`�`�K� � �+�0�0�2�2� 3� 3� ��%�)�)�-��9X�9e�\e�f�f�8�r�returnc �&�t|f|||||d�|��}t|tj�|�����}t |j�����} | rt| ��dkr| dnd} nd} |j p| S)a Get the default config name for a particular dataset. Can return None only if the dataset has multiple configurations and no default configuration. Args: path (`str`): path to the dataset processing script with the dataset builder. 
Can be either: - a local path to processing script or the directory containing the script (if the script has the same name as the directory), e.g. `'./dataset/squad'` or `'./dataset/squad/squad.py'` - a dataset identifier on the Hugging Face Hub (list all available datasets and ids with [`huggingface_hub.list_datasets`]), e.g. `'rajpurkar/squad'`, `'nyu-mll/glue'` or `'openai/webtext'` revision (`Union[str, datasets.Version]`, *optional*): If specified, the dataset module will be loaded from the datasets repository at this version. By default: - it is set to the local version of the lib. - it will also try to load it from the main branch if it's not available at the local version of the lib. Specifying a version that is different from your local version of the lib might cause compatibility issues. download_config ([`DownloadConfig`], *optional*): Specific download configuration parameters. download_mode ([`DownloadMode`] or `str`, defaults to `REUSE_DATASET_IF_EXISTS`): Download/generate mode. dynamic_modules_path (`str`, defaults to `~/.cache/huggingface/modules/datasets_modules`): Optional path to the directory in which the dynamic modules are saved. It must have been initialized with `init_dynamic_modules`. By default the datasets are stored inside the `datasets_modules` module. data_files (`Union[Dict, List, str]`, *optional*): Defining the data_files of the dataset configuration. **download_kwargs (additional keyword arguments): Optional attributes for [`DownloadConfig`] which will override the attributes in `download_config` if supplied, for example `token`. Returns: Optional[str]: the default config name if there is one Example: ```py >>> from datasets import get_dataset_default_config_name >>> get_dataset_default_config_name("openbookqa") 'main' ``` r+r,rrNr.) 
r r r/rr0r1r2r3�lenr6) rrrrr)rr7r8r9r2�default_config_names r�get_dataset_default_config_namer>�s���b,� ���'�#�1�� �� ���N�,�N���IY�IY�Z^�I_�I_�`�`�`�K��;�6�;�;�=�=�>�>�O��(�47��4H�4H�A�4M�4M�o�a�0�0�SW���'�� � *� A�.A�Arr!c ���t�f||||||d�|��}|j} | j��|r|���n t ��}|�||_|�t|j|����� �fd�|� t|j|�����D��| _n"#t$r} td��| �d} ~ wwxYw| S)aGet the meta information (DatasetInfo) about a dataset for a particular config Args: path (``str``): path to the dataset processing script with the dataset builder. Can be either: - a local path to processing script or the directory containing the script (if the script has the same name as the directory), e.g. ``'./dataset/squad'`` or ``'./dataset/squad/squad.py'`` - a dataset identifier on the Hugging Face Hub (list all available datasets and ids with [`huggingface_hub.list_datasets`]), e.g. ``'rajpurkar/squad'``, ``'nyu-mll/glue'`` or ``'openai/webtext'`` config_name (:obj:`str`, optional): Defining the name of the dataset configuration. data_files (:obj:`str` or :obj:`Sequence` or :obj:`Mapping`, optional): Path(s) to source data file(s). download_config (:class:`~download.DownloadConfig`, optional): Specific download configuration parameters. download_mode (:class:`DownloadMode` or :obj:`str`, default ``REUSE_DATASET_IF_EXISTS``): Download/generate mode. revision (:class:`~utils.Version` or :obj:`str`, optional): Version of the dataset script to load. As datasets have their own git repository on the Datasets Hub, the default version "main" corresponds to their "main" branch. You can specify a different version than the default "main" by using a commit SHA or a git tag of the dataset repository. token (``str`` or :obj:`bool`, optional): Optional string or boolean to use as Bearer token for remote files on the Datasets Hub. If True, or not specified, will get token from `"~/.huggingface"`. **config_kwargs (additional keyword arguments): optional attributes for builder class which will override the attributes if supplied. 
)�namerrrrrN)� base_pathrc�0��i|]}|j|j�d���S))r@r-)r@)r#�split_generatorrs �rr%z+get_dataset_config_info.<locals>.<dictcomp>)s:������#� �$��/C�UY�&Z�&Z���rz<The split names could not be parsed from the dataset config.) r�info�splits�copyrr�_check_manual_downloadr rA�_split_generators� Exceptionr) rr!rrrrrr$�builderrD�errs ` rr"r"�s;���>#� � � ��'�#��� � � � � �G� �<�D� �{��4C�Y�/�.�.�0�0�0��IY�IY�� � �$)�O� !��&�&� $�w�/@�Ra� b� b� b� � � � o�����'.�'@�'@�,�w�7H�Zi�j�j�j�(�(����D�K�K�� � o� o� o�%�&d�e�e�kn� n����� o���� �Ks�;:B6�6 C�C�Cc �v�t|f||||||d�|��}t|j�����S)a�Get the list of available splits for a particular config and dataset. Args: path (`str`): path to the dataset processing script with the dataset builder. Can be either: - a local path to processing script or the directory containing the script (if the script has the same name as the directory), e.g. `'./dataset/squad'` or `'./dataset/squad/squad.py'` - a dataset identifier on the Hugging Face Hub (list all available datasets and ids with [`huggingface_hub.list_datasets`]), e.g. `'rajpurkar/squad'`, `'nyu-mll/glue'` or `'openai/webtext'` config_name (`str`, *optional*): Defining the name of the dataset configuration. data_files (`str` or `Sequence` or `Mapping`, *optional*): Path(s) to source data file(s). download_config ([`DownloadConfig`], *optional*): Specific download configuration parameters. download_mode ([`DownloadMode`] or `str`, defaults to `REUSE_DATASET_IF_EXISTS`): Download/generate mode. revision ([`Version`] or `str`, *optional*): Version of the dataset script to load. As datasets have their own git repository on the Datasets Hub, the default version "main" corresponds to their "main" branch. You can specify a different version than the default "main" by using a commit SHA or a git tag of the dataset repository. token (`str` or `bool`, *optional*): Optional string or boolean to use as Bearer token for remote files on the Datasets Hub. 
If `True`, or not specified, will get token from `"~/.huggingface"`. **config_kwargs (additional keyword arguments): Optional attributes for builder class which will override the attributes if supplied. Example: ```py >>> from datasets import get_dataset_split_names >>> get_dataset_split_names('cornell-movie-review-data/rotten_tomatoes') ['train', 'validation', 'test'] ``` )r!rrrrr)r"r1rEr3) rr!rrrrrr$rDs r�get_dataset_split_namesrM4s[��Z #� � ���'�#��� � � � � �D� �� � � �"�"� #� #�#r)NNNNN)NNNNNN)%�__doc__r/�collections.abcrr�typingrr�download.download_configr�download.download_managerr �#download.streaming_download_managerr rDr �loadr r r� utils.loggingr� utils.versionrr�logger� ValueErrorr�str�dictr1�boolr(r&r>r"rMrrr�<module>r\s��� !� � � � � �-�-�-�-�-�-�-�-�"�"�"�"�"�"�"�"�4�4�4�4�4�4�3�3�3�3�3�3�I�I�I�I�I�I����������������� &�%�%�%�%�%�"�"�"�"�"�"� ��H� � �� � � � � �*� � � � 48�04�8<�.2�(,� @�@� �@���t�T�3��/�0�@��n�-�@��E�,��"3�4�5� @� �u�S�'�\�*�+� @� �E�$��)�$� %� @�@�@�@�J/3�04�8<�*.�37� D�D� �D��u�S�'�\�*�+�D��n�-�D��E�,��"3�4�5� D� #�3�-� D� ��t�T�3��/�0� D�D�D�D�R/3�04�8<�*.�37� @B�@B� �@B��u�S�'�\�*�+�@B��n�-�@B��E�,��"3�4�5� @B� #�3�-� @B� ��t�T�3��/�0� @B��c�]�@B�@B�@B�@B�J"&�_c�04�8<�.2�(,�:�:� �:��#��:���s�H�S�M�7�3��c�8�TW�=�FX�@Y�;Y�3Z�Z�[�\�:��n�-� :� �E�,��"3�4�5� :� �u�S�'�\�*�+� :� �E�$��)�$� %�:��:�:�:�:�~"&�_c�04�8<�.2�(,�7$�7$� �7$��#��7$���s�H�S�M�7�3��c�8�TW�=�FX�@Y�;Y�3Z�Z�[�\�7$��n�-� 7$� �E�,��"3�4�5� 7$� �u�S�'�\�*�+� 7$� �E�$��)�$� %�7$�7$�7$�7$�7$�7$r