�
���gU� � �R � d Z ddlZddlZddlZddlZddlZddlZddlZddlZddl Z ddl
Z
ddlZddlZddl
Z
ddlZddlZddlZddlZddlmZ ddlmZmZmZ ddlmZ ddlmZ ddlmZmZ ddlmZ dd lm Z m!Z! dd
l"m#Z# ddl$m%Z% ddl&m'Z'm(Z(m)Z)m*Z*m+Z+m,Z,m-Z- ddl.Z.ddl/Z0ddl1Z2ddl3Z4ddl5m6Z7 dd
l8m9Z9 ddl:m;Z;m<Z<m=Z=m>Z>m?Z?m@Z@ ddlAmBZB ddlCmDZD ddlEmFZF ddlGmHZH ddlImJZJ ddlKmLZLmMZM ddlNmOZO ddlPmQZQ ddlRmSZSmTZTmUZUmVZVmZmWZWmXZX ddlYmZZZm[Z[m\Z\m]Z]m^Z^m_Z_ ddl`maZa ddlbmcZcmdZdmeZemfZfmgZgmhZhmiZimjZjmkZkmlZl ddlmmnZnmoZompZpmqZq ddlrmsZsmtZt ddlumvZvmwZw ddlxmyZy dd lzm{Z{ dd!l|m}Z}m~Z~mZm�Z� dd"l�m�Z�m�Z�m�Z�m�Z�m�Z�m�Z�m�Z�m�Z�m�Z�m�Z�m�Z� dd#l�m�Z� dd$l�m�Z� dd%l�m�Z� dd&l�m�Z� dd'l�m�Z� dd(l�m�Z�m�Z�m�Z�m�Z�m�Z�m�Z� dd)l�m�Z� dd*l�m�Z�m�Z�m�Z� dd+l�m�Z�m�Z� e'rddl�Z�ddl�Z�ddl�Z�ddl�Z�dd,l�m�Z� dd-l�m�Z� e�j� e�� � Z�d.Z� G d/� d0� � Z� G d1� d2� � Z� G d3� d4e�� � Z�d5� Z�d6e�d7eUfd8�Z�d9e�fd:�Z�d;e�e� fd<�Z�d=� Z� G d>� d?e�� � Z� G d@� dAe�e{e�� � Z� dZdBe�e� dCe+ev dDe+e} dEe�fdF�Z� d[dHe�dA dIe+e�e� dJe+e� dCe+ev dDe+e} dKe�dL d9dAfdM�Z�dNe�dOe�dPe�d9e�fdQ�Z� d\dRe*dSe�dTe�dUe�dVe+e,e�e�e� f dWe+e� fdX�Z� d\dRe*dSe�dTe�dUe�dVe+e,e�e�e� f dWe+e� fdY�Z�dS )]z'Simple Dataset wrapping an Arrow Table.� N)�Counter)�Iterable�Iterator�Mapping)�Sequence)�deepcopy)�partial�wraps)�BytesIO)�ceil�floor��Path)�sample)�
TYPE_CHECKING�Any�BinaryIO�Callable�Optional�Union�overload)� url_to_fs)�
CommitInfo�CommitOperationAdd�CommitOperationDelete�DatasetCard�DatasetCardData�HfApi)�RepoFile)�Pool)�
thread_map� )�config)�ArrowReader)�ArrowWriter�OptimizedTypedSequence)�sanitize_patterns)�xgetsize)�Audio�
ClassLabel�Features�Imager �Value�Video)�FeatureType�_align_features�!_check_if_features_can_be_aligned�generate_from_arrow_type�pandas_types_mapper�require_decoding)�is_remote_filesystem)
�fingerprint_transform�format_kwargs_for_fingerprint� format_transform_for_fingerprint�generate_fingerprint�generate_random_fingerprint�#get_temporary_cache_files_directory�is_caching_enabled�,maybe_register_dataset_for_temp_dir_deletion�update_fingerprint�validate_fingerprint)�format_table�get_format_type_from_alias�
get_formatter�query_table)�LazyDict�_is_range_contiguous)�DatasetInfo�DatasetInfosDict)� _split_re)�IndexableMixin)�
NamedSplit�Split� SplitDict� SplitInfo)�
InMemoryTable�MemoryMappedTable�Table�,_memory_mapped_record_batch_reader_from_file�cast_array_to_feature�
concat_tables�embed_table_storage�list_table_cache_files�
table_cast�
table_iter�
table_visitor)�logging)�tqdm)�estimate_dataset_size)�is_small_dataset)�MetadataConfigs)�Literal�asdict�convert_file_size_to_int�glob_pattern_to_regex�iflatmap_unordered�string_to_dict)�)stratified_shuffle_split_generate_indices)�
dataset_to_tf�minimal_tf_collate_fn�multiprocess_dataset_to_tf)�ListLike�PathLike��DatasetDict)�IterableDatasetzLdata/{split}-[0-9][0-9][0-9][0-9][0-9]-of-[0-9][0-9][0-9][0-9][0-9]*.parquetc � � e Zd ZdZdedee fd�Zed� � � Z ed� � � Z
edefd�� � Zedefd �� � Z
edefd
�� � Zedee fd�� � Zedefd�� � Zedee fd
�� � Zedee fd�� � Zedee fd�� � Zedee fd�� � Zedee fd�� � Zedee fd�� � Zed� � � Zed� � � ZdS )�DatasetInfoMixinzqThis base class exposes some attributes of DatasetInfo
at the base level of the Dataset for easy access.
�info�splitc �"