� %�g'��ddlZddlZddlmZddlmZmZmZddlZddl m Z ddlmZm Z mZmZddlmZddlmZmZmZmZdd lmZmZmZmZmZmZmZmZm Z m!Z!m"Z"m#Z#m$Z$ee%��Z&dd ddddd d dddddd dd �Z'dddiiZ(e(�)��D]&\Z*Z+e de*��re'�,e+��'Gd�de��Z-Gd�de��Z.Gd�de��Z/Gd�d��Z0Gd�d��Z1Gd�de1e0��Z2e��rddl3m4cm5Z6Gd�de6j7��Z8Gd �d!e1e0��Z9d"�Z: d8d#e d$eej;d%ee<d&ee<d'e=d(e=d)ee>ee?efd*ee=d+e=d,eed-e=d.ee<d/e=d0e=d1e fd2�Z@Gd3�d4e��ZAGd5�d6e1e0��ZBd9d7�ZCdS):�N)�suppress)�Callable�Optional�Union)�version)�BatchSampler� DataLoader�IterableDataset� RandomSampler�)� get_logger)�DistributedType� GradientState�PartialState�is_torch_xla_available) �RNGType� broadcast�broadcast_object_list�compare_versions�concatenate�find_batch_size�get_data_structure�initialize_tensors�is_torch_version�*is_torchdata_stateful_dataloader_available�send_to_device� slice_tensors�synchronize_rng_statesF��)� batch_size�shuffle�sampler� batch_sampler�num_workers� collate_fn� pin_memory� drop_last�timeout�worker_init_fn�multiprocessing_context� generator�prefetch_factor�persistent_workers�pin_memory_device�2.6.0�in_orderT�>=c�8��eZdZdZ�fd�Z�fd�Zdefd�Z�xZS)�SeedableRandomSamplera� Same as a random sampler, except that in `__iter__` a seed can be used. Needed specifically in distributed cases, when the random generator for each GPU needs to start from the same seed and be fully reproducable on multiple iterations. If a custom `generator` is passed, it will rely on its initial seed as well as the current iteration it is on (stored in `self.epoch`). c��|�dd��}t��j|i|��|�|ntj��|_d|_dS)N� data_seedr)�pop�super�__init__�torch�random�initial_seed�epoch)�self�args�kwargsr6� __class__s ��f/home/asafur/pinokio/api/open-webui.git/app/env/lib/python3.11/site-packages/accelerate/data_loader.pyr9zSeedableRandomSampler.__init__Ss\��J�J�{�D�1�1� ��$�)�&�)�)�)�)2�)>�I�I�E�L�D]�D]�D_�D_�� c#��K�|j�atjttd��rtj��nd��|_|j�|j��|j|jz}|j�|��t�� Ed{V��|� |jdz��dS)N�get_default_device�cpu��devicer)r,r:� Generator�hasattrrE�manual_seedr<r=r8�__iter__� set_epoch)r>�seedrAs �rBrLzSeedableRandomSampler.__iter__Zs��>�!�"�_�5<�U�DX�5Y�5Y�d�u�/�1�1�1�_d��D�N� �N�&�&�t�'8�9�9�9��z�D�-�-��"�"�4�(�(�(��7�7�#�#�%�%�%�%�%�%�%�%�%��t�z�A�~�&�&�&�&�&rCr=c��||_dS)z*Sets the current iteration of the sampler.N)r=�r>r=s rBrMzSeedableRandomSampler.set_epochhs �� rC) �__name__� __module__�__qualname__�__doc__r9rL�intrM� __classcell__�rAs@rBr4r4Hsv��'�'�'�'�'��s��rCr4c�f�eZdZdZ ddededed ed ef d�Zed��Z d �Z d�Zd�Zd�Z dS)�BatchSamplerSharda� Wraps a PyTorch `BatchSampler` to generate batches for one of the processes only. Instances of this class will always yield a number of batches that is a round multiple of `num_processes` and that all have the same size. Depending on the value of the `drop_last` attribute of the batch sampler passed, it will either stop the iteration at the first batch that would be too small / not present on all processes or loop with indices from the beginning. Args: batch_sampler (`torch.utils.data.sampler.BatchSampler`): The batch sampler to split in several shards. num_processes (`int`, *optional*, defaults to 1): The number of processes running concurrently. process_index (`int`, *optional*, defaults to 0): The index of the current process. split_batches (`bool`, *optional*, defaults to `False`): Whether the shards should be created by splitting a batch to give a piece of it on each process, or by yielding different full batches on each process. On two processes with a sampler of `[[0, 1, 2, 3], [4, 5, 6, 7]]`, this will result in: - the sampler on process 0 to yield `[0, 1, 2, 3]` and the sampler on process 1 to yield `[4, 5, 6, 7]` if this argument is set to `False`. - the sampler on process 0 to yield `[0, 1]` then `[4, 5]` and the sampler on process 1 to yield `[2, 3]` then `[6, 7]` if this argument is set to `True`. even_batches (`bool`, *optional*, defaults to `True`): Whether or not to loop back at the beginning of the sampler when the number of samples is not a round multiple of (original batch size / number of processes). <Tip warning={true}> `BatchSampler`s with varying batch sizes are not enabled by default. To enable this behaviour, set `even_batches` equal to `False` </Tip>rrFTr$� num_processes� process_index� split_batches�even_batchesc�8�|r)|j|zdkrtd|j�d|�d��||_||_||_||_||_t|dd��|_t|dd��|_|j�|jrtd��dSdS) NrzDTo use `BatchSamplerShard` in `split_batches` mode, the batch size (�;) needs to be a round multiple of the number of processes (�).r!r(Fz�You need to use `even_batches=False` when the batch sampler has no batch size. If you are not calling this method directly, set `accelerator.even_batches=False` instead.) r!� ValueErrorr$rZr[r\r]�getattrr()r>r$rZr[r\r]s rBr9zBatchSamplerShard.__init__�s�� ]�5� �E��J�J��^�Wd�Wo�^�^�LY�^�^�^�� +��*��*��*��(��!�-��t�D�D�� U�C�C��?�"�t�'8�"��f�� #�"�"�"rCc�*�t|j��S�N��lenr$�r>s rB�total_lengthzBatchSamplerShard.total_length��4�%�&�&�&rCc�j�|jrt|j��St|j��|jzdkrt|j��|jzSt|j��|jz}|jr|S|jr|dzS|jt|j��|jzkr|dzn|S�Nrr)r\rfr$rZr(r]r[�r>�lengths rB�__len__zBatchSamplerShard.__len__�s�� +��t�)�*�*�*��t�!�"�"�T�%7�7�1�<�<��t�)�*�*�d�.@�@�@��T�'�(�(�D�,>�>��>� o��M� � � o��A�:��"&�!3�c�$�:L�6M�6M�PT�Pb�6b�!b�!b�6�A�:�:�hn�nrCc�`�|jr|��n|��Srd)r\�_iter_with_split�_iter_with_no_splitrgs rBrLzBatchSamplerShard.__iter__�s-��*.�*<�\�t�$�$�&�&�&�$�BZ�BZ�B\�B\�\rCc#�K�g}|jj|jz}t|j��D]D\}}|dkr|}t |��|jkr|||jz||jdzz�V��E|js�t |��dkr�t |��|jkr�|js>t |��||jzkr!|||jz||jdzz�V�dSdSt |��|jkr||z }t |��|jk�||z}|||jz||jdzz�V�dSdSdSdSrk)r$r!rZ� enumeraterfr[r(r])r>�initial_data�batch_length�idx�batchs rBrpz"BatchSamplerShard._iter_with_split�s��)�4��8J�J��#�D�$6�7�7� i� i�J�C��a�x�x�$��5�z�z�T�_�,�,��L�4�+=�=��PT�Pb�ef�Pf�@g�g�h�h�h�h��~� i�#�l�"3�"3�a�"7�"7�C��J�J��<X�<X��$� i��u�:�:��t�/A� A�A�A��t�/A� A�L�TX�Tf�ij�Tj�Dk� k�l�l�l�l�l�l�B�A��,�'�'�$�/�9�9� �L�0�L��,�'�'�$�/�9�9��,��L�4�+=�=��PT�Pb�ef�Pf�@g�g�h�h�h�h�h�h� i� i�"7�"7�<X�<XrCc#�K�g}g}t|j��D]l\}}|js||jkr||z }||jz|jkr|}||jz|jdz kr%|j�t |��|jkr|V�g}�m|j�sCt |��dk�r1|jst |��dkr|V�dSdSt |��|jkr|V�t |��|j|jzkr%||z }t |��|j|jzk�%t |��|jkrg}|dz }d}||jzdkst |��dkrn||jzt |��z }||||�z }||jz|jkr|V�|}g}|dz }||jzdk�Ut |��dk�hdSdSdSdS)Nrr)rsr$r(rZr[r!rfr])r>rt�batch_to_yieldrvrw�cycle_index� end_indexs rBrqz%BatchSamplerShard._iter_with_no_split�s[��#�D�$6�7�7� $� $�J�C��>� &�c�D�,>�&>�&>��%��T�'�'�4�+=�=�=�!&��T�'�'�4�+=��+A�A�A��'�3�u�:�:��+H�+H�$�$�$�$�!#��~� �#�l�"3�"3�a�"7�"7��$� ��~�&�&��*�*�(�(�(�(�(�(�+�*��~�&�&�$�/�9�9�(�(�(�(��,�'�'�$�*<�t��*N�N�N� �L�0�L��,�'�'�$�*<�t��*N�N�N��u�:�:��0�0��E��1�H�C� ��D�.�.�!�3�3�s�5�z�z�A�~�~� +�d�o� =��E� � � J�I��\�+�i�*?�@�@�E��T�/�/�4�3E�E�E�#��"+�K��E��1�H�C��D�.�.�!�3�3�s�5�z�z�A�~�~�~�~�) � �"7�"7�(8F�~rCN)rrFT)rQrRrSrTrrU�boolr9�propertyrhrnrLrprq�rCrBrYrYms�� J��#�!� ��#�� 4�'�'��X�'�o�o�o�$]�]�]�i�i�i�,-�-�-�-�-rCrYc �P�eZdZdZ ddedededed ed efd�Zd�Zd �Z d�Z dS)�IterableDatasetSharda� Wraps a PyTorch `IterableDataset` to generate samples for one of the processes only. Instances of this class will always yield a number of samples that is a round multiple of the actual batch size (depending of the value of `split_batches`, this is either `batch_size` or `batch_size x num_processes`). Depending on the value of the `drop_last` attribute of the batch sampler passed, it will either stop the iteration at the first batch that would be too small or loop with indices from the beginning. Args: dataset (`torch.utils.data.dataset.IterableDataset`): The batch sampler to split in several shards. batch_size (`int`, *optional*, defaults to 1): The size of the batches per shard (if `split_batches=False`) or the size of the batches (if `split_batches=True`). drop_last (`bool`, *optional*, defaults to `False`): Whether or not to drop the last incomplete batch or complete the last batches by using the samples from the beginning. num_processes (`int`, *optional*, defaults to 1): The number of processes running concurrently. process_index (`int`, *optional*, defaults to 0): The index of the current process. split_batches (`bool`, *optional*, defaults to `False`): Whether the shards should be created by splitting a batch to give a piece of it on each process, or by yielding different full batches on each process. On two processes with an iterable dataset yielding of `[0, 1, 2, 3, 4, 5, 6, 7]`, this will result in: - the shard on process 0 to yield `[0, 1, 2, 3]` and the shard on process 1 to yield `[4, 5, 6, 7]` if this argument is set to `False`. - the shard on process 0 to yield `[0, 1, 4, 5]` and the sampler on process 1 to yield `[2, 3, 6, 7]` if this argument is set to `True`. rFr�datasetr!r(rZr[r\c��|r%|dkr||zdkrtd|�d|�d��||_||_||_||_||_||_dS)NrrzGTo use `IterableDatasetShard` in `split_batches` mode, the batch size (r_r`)rar�r!r(rZr[r\)r>r�r!r(rZr[r\s rBr9zIterableDatasetShard.__init__*s�� Z�!�^�^� �]�0J�a�0O�0O��^�Zd�^�^�LY�^�^�^�� $��"��*��*��*��rCc�v�||_t|jd��r|j�|��dSdS�NrM)r=rJr�rMrPs rBrMzIterableDatasetShard.set_epoch?sC�� 4�<��-�-� *��L�"�"�5�)�)�)�)�)� *� *rCc��|jr,t|j��|j|jzz|jzStjt|j��|j|jzz��|jzSrd)r(rfr�r!rZ�math�ceilrgs rBrnzIterableDatasetShard.__len__Dsg��>� k��%�%�$�/�D�<N�*N�O�SW�Sb�b�b��9�S��.�.�$�/�D�DV�2V�W�X�X�[_�[j�j�jrCc#�>K�t|jd��s]t|jd��rHt|jjtj��r$|jj�|j��|jr|j n|j |j z}|jr|j |j zn|j }t|j|z|jdz|z��}d}g}|jD]Q}|� |��t|��|kr'|D]}||V�� |�|��}g}�R|jsct|��dkrR|�|��}t|��|kr||z }t|��|k�|D]}||V�� dSdSdS)NrMr,rr)rJr�� isinstancer,r:rIrKr=r\r!rZ�ranger[�appendrf�copyr()r>�real_batch_size�process_batch_size� process_slice�first_batch� current_batch�element�is rBrLzIterableDatasetShard.__iter__Ks��k�2�2� ;��k�2�2� ;��4�<�1�5�?�C�C� ;� �L�"�.�.�t�z�:�:�:�-1�-?�k�$�/�/�d�o�X\�Xj�Fj��HL�HZ�o�d�o��1C�C�C�`d�`o��d�0�3E�E��HZ�]^�H^�bt�Gt�u�u� �� |� #� #�G�� )�)�)��=�!�!�_�4�4�&�+�+�A�'��*�*�*�*�*��&�"/�"4�"4�"6�"6�K� "� ��~� '�#�m�"4�"4�q�"8�"8��"�+�0�0�2�2��m�$�$��6�6��,� ��m�$�$��6�6�"� '� '��#�A�&�&�&�&�&� '� '�"8�"8� '� 'rCN)rFrrF)rQrRrSrTr rUr|r9rMrnrLr~rCrBr�r� s��F��#�+�+� �+��+�� +� �+�� +��+�+�+�+�**�*�*� k�k�k�'�'�'�'�'rCr�c�*�eZdZdZd�Zd�Zd�Zd�ZdS)�DataLoaderStateMixina� Mixin class that adds a state to a `DataLoader` to keep track of the status inside the dataloader such as at the end of the iteration, the number of items in the dataset in the last batch relative to the batch size, and other useful information that might be needed. **Available attributes:** - **end_of_dataloader** (`bool`) -- Whether at the last iteration or batch - **remainder** (`int`) -- The number of items that are remaining in the last batch, relative to the total batch size <Tip warning={true}> Inheriters of this class should ensure that the class creates a `GradientState()` instance, stored in `self.gradient_state`. </Tip> c�"�d|_d|_dS�NF��end_of_dataloader� remainder)�clsr@s rB�__init_subclass__z&DataLoaderStateMixin.__init_subclass__�s�� %�� rCc�"�d|_d|_dSr�r�rgs rB�resetzDataLoaderStateMixin.reset�s��!&��rCc�6�|��tt��5|js7t |jdt |j��}||jz|_ddd��n#1swxYwY|j � |��dS)z6Prepares the gradient state for the current dataloader�total_dataset_lengthN)r�r� Exception� _drop_lastrbr�rf�total_batch_sizer��gradient_state�_add_dataloaderrls rB�beginzDataLoaderStateMixin.begin�s�� i� � � @� @��?� @� ��/E�s�4�<�GX�GX�Y�Y��!'�$�*?�!?�� @� @� @� @� @� @� @� @� @� @� @�� @� @� @� @� ��+�+�D�1�1�1�1�1s�?A4�4A8�;A8c�:�|j�|��dS)z9Cleans up the gradient state after exiting the dataloaderN)r��_remove_dataloaderrgs rB�endzDataLoaderStateMixin.end�s��.�.�t�4�4�4�4�4rCN)rQrRrSrTr�r�r�r�r~rCrBr�r�lsZ��(��2�2�2�5�5�5�5�5rCr�c�T�eZdZdZdd�Zd�Zd�Zd�Zed��Z d �Z d �Zd�ZdS) �DataLoaderAdapterz� A class which wraps around a PyTorch `DataLoader` (or variants of it) to be used with the `Accelerator`. For compatability reasons, this class inherits from the class it wraps around, so it can be used as a drop-in. FNc��||_t��rddlm}|rt��st d��|r|tjtj�d��}d|vr6t|dd��r%tdd ��r|�d��||fd |i|��|_nt|fd |i|��|_t|jd��r |j��|_dSdS)Nr)�StatefulDataLoaderz`StatefulDataLoader is not available. Please install torchdata version 0.8.0 or higher to use it.� torchdatar1�<z0.11r2r0r$� state_dict)�use_stateful_dataloaderr�torchdata.stateful_dataloaderr��ImportErrorr�parse� importlib�metadatarrr7�base_dataloaderr rJr�� dl_state_dict)r>r�r�r$r@r��torchdata_versions rBr9zDataLoaderAdapter.__init__�sI��'>��$�5�7�7� I�H�H�H�H�H�H�"� �+U�+W�+W� ��r�� #� ^� '� �i�.@�.H�.H��.U�.U� V� V��f�$�$�$�%6��V�D�D�%�$�T�7�3�3�%�� :�&�&�&�#5�#5�g�#e�#e�]�#e�^d�#e�#e�D� � �#-�g�#]�#]�]�#]�V\�#]�#]�D� ��4�'��6�6� C�!%�!5�!@�!@�!B�!B�D�� C� CrCc�T�|dkrt��t|j|��S)Nr�)�AttributeErrorrbr�)r>�names rB�__getattr__zDataLoaderAdapter.__getattr__�s-��$�$�$� �"�"�"��t�+�T�2�2�2rCc��|jSrd)r�rgs rBr�zDataLoaderAdapter.state_dict�s��!�!rCc�:�|j�|��dSrd)r��load_state_dict)r>r�s rBr�z!DataLoaderAdapter.load_state_dict�s��,�,�Z�8�8�8�8�8rCc��|jjS)a In order to maintain backwards compatability with other code, we need to ensure `isinstance(obj, DataLoader)` returs true. This is because some downstream code assumes that the `DataLoader` is the base class of the object. )r�rArgs rBrAzDataLoaderAdapter.__class__�s��#�-�-rCc�*�t|j��Srd)rfr�rgs rBrnzDataLoaderAdapter.__len__�s��4�'�(�(�(rCc��t��jtjkr�t��jdz }|jddkr|jdxx|zcc<|jddkr|jdxx|zcc<|jd�Md|jdvr@|jdddkr+|jddxx|j|zzcc<dSdSdSdSdS)a: Adjusts the state dict for prefetching. Natively, this will adjust all of the iters yielded keys in `self.dl_state_dict` by a factor of `num_processes - 1`, however if a custom correction is needed, this can be overridden. This should modify `self.dl_state_dict` directly r�_sampler_iter_yieldedr�_num_yielded�_index_sampler_stateN�samples_yielded)r�distributed_typer�NOrZr�r!)r>�factors rB�adjust_state_dict_for_prefetchz0DataLoaderAdapter.adjust_state_dict_for_prefetch�s"��>�>�*�o�.@�@�@�!�^�^�1�A�5�F��!�"9�:�Q�>�>��"�#:�;�;�;�v�E�;�;�;��!�.�1�A�5�5��"�>�2�2�2�f�<�2�2�2��!�"8�9�E�%��);�<R�)S�S�S��*�+A�B�CT�U�XY�Y�Y��&�'=�>�?P�Q�Q�Q�UY�Ud�gm�Um�m�Q�Q�Q�Q�Q�A�@�F�E�S�S�Y�YrCc��t|jd��rC|j��|_|��|j|jd<dSdS)Nr��_iterator_finished)rJr�r�r�r�r�rgs rB�_update_state_dictz$DataLoaderAdapter._update_state_dict�sf��4�'��6�6� N�!%�!5�!@�!@�!B�!B�D��/�/�1�1�1�7;�7M�D��3�4�4�4� N� NrC)FN) rQrRrSrTr9r�r�r�r}rArnr�r�r~rCrBr�r��s�� C�C�C�C�03�3�3�"�"�"�9�9�9��.�.��X�.�)�)�)�n�n�n�.N�N�N�N�NrCr�c��eZdZdZ ddedef�fd� Zd�Z�fd �Zd efd�Z e d��Ze d ��Zd�Z d�Z�xZS)�DataLoaderSharda� Subclass of `DataLoaderAdapter` that will deal with device placement and current distributed setup. Args: dataset (`torch.utils.data.dataset.Dataset`): The dataset to use to build this dataloader. device (`torch.device`, *optional*): If passed, the device to put all batches on. rng_types (list of `str` or [`~utils.RNGType`]): The list of random number generators to synchronize at the beginning of each iteration. Should be one or several of: - `"torch"`: the base torch random number generator - `"cuda"`: the CUDA random number generator (GPU only) - `"xla"`: the XLA random number generator (TPU only) - `"generator"`: an optional `torch.Generator` synchronized_generator (`torch.Generator`, *optional*): A random number generator to keep synchronized across processes. skip_batches (`int`, *optional*, defaults to 0): The number of batches to skip at the beginning. use_stateful_dataloader (`bool`, *optional*, defaults to `False`): Whether to have this class adapt `StatefulDataLoader` from `torchdata` instead of the regular `DataLoader`. **kwargs (additional keyword arguments, *optional*): All other keyword arguments to pass to the regular `DataLoader` initialization. **Available attributes:** - **total_batch_size** (`int`) -- Total batch size of the dataloader across all processes. Equal to the original batch size when `split_batches=True`; otherwise the original batch size * the total number of processes - **total_dataset_length** (`int`) -- Total length of the inner dataset across all processes. NrFr�� _non_blockingc ��t��j|fd|i| ��||_||_||_||_t ��|_||_||_ d|_ dS)Nr�r)r8r9rH� rng_types�synchronized_generator�skip_batchesrr�r�r�� iteration)r>r�rHr�r�r�r�r�r��torch_device_meshr@rAs �rBr9zDataLoaderShard.__init__so�� \�\�:Q�\�U[�\�\�\��"��&<��#�(��+�o�o��$��*��rCc#�K�|j�t|j|j��|��|�|j��|j��} t|��}n#t$rdV�YnwxYwd} |j �t||j |j��}|� ��t|��}||jkr|V�|dz }|}n:#t$r-d|_|� ��||jkr|V�YnwxYw��|xjdz c_|��dS)NrT��non_blockingr)r�rr�r�rMr�r�rL�next� StopIterationrHrr�r�r�r�r�)r>�dataloader_iterr��batch_index� next_batchs rBrLzDataLoaderShard.__iter__-s��>�%�"�4�>�4�3N�O�O�O�� t�~�&�&�&��.�7�7�9�9�� 1�1�M�M�� E�E�E�E�E� �� ;�*�$2�=�$�+�\`�\n�$o�$o�$o�M��'�'�)�)�)�!�/�2�2� ��$�"3�3�3�'�'�'�'��q� �� *� � �� )-��&��'�'�)�)�)��$�"3�3�3�'�'�'�'�� $ ��!�� s%�,A<�<B �B �AC1�14D(�'D(c�j��t��}tg|dd��RS)a Define the `__reduce__` method to ensure a `DataLoaderShard` can be pickled and unpickled. This needs to be explicitly defined since default pickling behavior is broken by `DataLoaderAdapter` messing with its `__class__` member. rN)r8� __reduce__r��r>r?rAs �rBr�zDataLoaderShard.__reduce__Ps3��w�w�!�!�#�#��+�$�q�r�r�(�+�+�+rCr=c��|j|kr||_t|jd��r|j�|��t|jd��r;t|jjd��r!|jj�|��dSt|jd��r|j�|��dSdS)NrMr#)r�rJr$rMr#r�rPs rBrMzDataLoaderShard.set_epochYs��>�U�"�"�"�D�N��4�%�{�3�3� 0��(�(��/�/�/��4�%�y�1�1� *�g�d�>P�>X�Ze�6f�6f� *��&�0�0��7�7�7�7�7��T�\�;� /� /� *��L�"�"�5�)�)�)�)�)� *� *rCc��t|jt��r|jn|j}t |dd��r|jn|jt |dd��zS)Nr\FrZr)r�r#rr$rbr!)r>r$s rBr�z DataLoaderShard.total_batch_sizefs^��(2�4�<��(N�(N�f��TX�Tf� ��}�o�u�=�=� Y�M�$�$��*�W�]�O�UV�-W�-W�W� rCc�l�t|jd��r|jjSt|j��S)Nrh)rJr�rhrfrgs rBr�z$DataLoaderShard.total_dataset_lengthos1��4�<��0�0� %��<�,�,��t�|�$�$�$rCc� �t|��Srd��get_samplerrgs rBr�zDataLoaderShard.get_samplerv��4� � � rCc��t|jt��}|r||j_dS||j_t |jd��r||jj_dSdS�Nr$�r�r#rr$rJ�r>r#�sampler_is_batch_samplers rB�set_samplerzDataLoaderShard.set_samplery�n��#-�d�l�L�#I�#I� �#� C�#*�D�L� � � �)0�D��&��t�)�?�;�;� C�;B��"�0�8�8�8� C� CrC)NNNrFFFN)rQrRrSrTr|r9rLr�rUrMr}r�r�r�r�rVrWs@rBr�r��s�� J��#�� %� �#��.!�!�!�F,�,�,�,�,�*�s�*�*�*�*�� X� ��%�%��X�%�!�!�!�C�C�C�C�C�C�CrCr�c��eZdZdZdedejf�fd�Z�fd�Zde fd�Z ed��Zed ��Z ed ��Zed��Z�xZS)�MpDeviceLoaderWrappera� Wrapper for the xpl.MpDeviceLoader class that knows the total batch size. XLA preloading threads will all call DataLoaderShard's __iter__(). Remove rng_types from DataLoaderShard to prevent it from using the XLA device in the preloading threads, and synchronize the RNG once from the main thread only. **Available attributes:** - **total_batch_size** (`int`) -- Total batch size of the dataloader across all processes. Equal to the original batch size when `split_batches=True`; otherwise the original batch size * the total number of processes - **total_dataset_length** (`int`) -- Total length of the inner dataset across all processes. � dataloaderrHc��t��||��|jj|_d|j_||_dSrd)r8r9�_loaderr�� _rng_typesrH)r>r�rHrAs �rBr9zMpDeviceLoaderWrapper.__init__�s>��G�G��Z��0�0�0�"�l�4�D�O�%)�D�L�"� �D�K�K�KrCc��|j�t|j|jj��t ��Srd)r�rr�r�r8rL)r>rAs �rBrLzMpDeviceLoaderWrapper.__iter__�s8��*�&�t��8[�\�\�\��7�7�#�#�%�%�%rCr=c�h�t|jd��r|j�|��dSdSr�)rJr�rMrPs rBrMzMpDeviceLoaderWrapper.set_epoch�s<��t��4�4� 1��)�)�%�0�0�0�0�0� 1� 1rCc��|jjSrd)r�r�rgs rBr�z&MpDeviceLoaderWrapper.total_batch_size�s ��<�0�0rCc��|jjSrd)r�r�rgs rBr�z*MpDeviceLoaderWrapper.total_dataset_length�s ��<�4�4rCc��|jjSrd)r�r$rgs rBr$z#MpDeviceLoaderWrapper.batch_sampler�s ��<�-�-rCc��|jSrd)r�rgs rBr�z MpDeviceLoaderWrapper.dataloader�s ��<�rC)rQrRrSrTr�r:rHr9rLrUrMr}r�r�r$r�rVrWs@rBr�r��s�� !�� !�� !� !� !� !� !� !� &� &� &� &� &� 1�3� 1� 1� 1� 1� � 1� 1� �� 1� � 5� 5� �� 5� � .� .� �� .� � � � �� rCr�c��eZdZdZ ddededef�fd� Zd �Zd �Zdefd�Z d �Z �fd�Zed��Z ed��Zd�Zd�Z�xZS)�DataLoaderDispatcheraD Subclass of `DataLoaderAdapter` that will iterate and preprocess on process 0 only, then dispatch on each process their part of the batch. Args: split_batches (`bool`, *optional*, defaults to `False`): Whether the resulting `DataLoader` should split the batches of the original data loader across devices or yield full batches (in which case it will yield batches starting at the `process_index`-th and advancing of `num_processes` batches at each iteration). Another way to see this is that the observed batch size will be the same as the initial `dataloader` if this option is set to `True`, the batch size of the initial `dataloader` multiplied by `num_processes` otherwise. Setting this option to `True` requires that the batch size of the `dataloader` is a round multiple of `batch_size`. skip_batches (`int`, *optional*, defaults to 0): The number of batches to skip at the beginning of an iteration. use_stateful_dataloader (`bool`, *optional*, defaults to `False`): Whether to have this class adapt `StatefulDataLoader` from `torchdata` instead of the regular `DataLoader`. **Available attributes:** - **total_batch_size** (`int`) -- Total batch size of the dataloader across all processes. Equal to the original batch size when `split_batches=True`; otherwise the original batch size * the total number of processes - **total_dataset_length** (`int`) -- Total length of the inner dataset across all processes. FrNr\r�r�c ��d} ddlm}t||��r|j} t ��j|fd|i| ��||_| r+tjj j �|| ��t��|_ t��|_||_||_||_||_|�t(n||_d|_d|_d|_d|_|jr`d|jjvrR|jd|_d|jjvr|jd|_d|jjvr|jd|_|jr|js|jrt7d ��dSdS) NFr)�ShufflerIterDataPiper�)r"�tp�dp�fsdpz4TP + (DP/FSDP) is not yet supported in dispatch mode)�-torch.utils.data.datapipes.iter.combinatoricsrr��_shuffle_enabledr8r9r\r:�utils�data�graph_settings�apply_shuffle_settingsrr�r�stater�r�r�r�r�slice_fnr�� submesh_tp� submesh_dp�submesh_fsdp�mesh_dim_namesra) r>r�r\r�r�r�r�rr�r@r"rrAs �rBr9zDataLoaderDispatcher.__init__�s��V�V�V�V�V�V��g�3�4�4� /��.�G��\�\�:Q�\�U[�\�\�\�*�� ]��K��+�B�B�7�T[�B�\�\�\�+�o�o��!�^�^�� $��*��(��!2��)1�)9� � �x�� !� C�d�d�.D�.S�&S�&S�"�4�T�:�D�O��t�-�<�<�<�"&�"8��">��/�>�>�>�$(�$:�6�$B��!��?� U�� U�4�3D� U��S�T�T�T� U� U� U� UrCc��d\}}|jjdk�r7 |jrE|jrt�d��|��t|��}n�g}|jr4|��t|��}|g|jjz}nRt|jj��D]8}|��|� t|��9 t|d��}n"#t$r}td��|�d}~wwxYwt|��dg}n#t$rddg}Yn wxYwd|jg}t!|��|d|_|jrg|js`|jsY|jjdkr6t%|��dkr#t|d��}t|��dg}nddg}t!|��||fS) N)NNrz�Use of split_batches for TP would need the dataloader to produce duplicate batches,otherwise, use dispatch_batches=True instead.��dimaGYou can't use batches of different size with `dispatch_batches=True` or when using an `IterableDataset`.either pass `dispatch_batches=False` and have each process fetch its own batch or pass `split_batches=True`. By doing so, the main process will fetch a full batch and slice it into `num_processes` batches for each process.FTr)rr[r\r �logger�warningr�r�rZr�r�r�RuntimeErrorrr��_stop_iterationrr�rf)r>�iterator�batchesrw�_�e� batch_infos rB�_fetch_batchesz#DataLoaderDispatcher._fetch_batches sA��#��:�#�q�(�(�( *��%�!��L��+�+�-�-�-� ��N�N�E�E�!�G��;��/�/�1�1�1� $�X��#(�'�D�J�,D�"D��!&�t�z�'?�!@�!@�;�;�A� �3�3�5�5�5�#�N�N�4��>�>�:�:�:�:�!� +�G�� ;� ;� ;��'�!�!�!�*�V�� !�!��!��1��7�7��?� � �� *� *� *�"�D�\� � � � *�� 4�5�J��j�)�)�)�)�!�}�� 2��%� 2�d�o� 2��:�+�q�0�0�S��\�\�A�5E�5E�'��Q�7�7�7�E�"4�U�";�";�U�!C�J�J�"&��J�%�j�1�1�1��j� � s6�CD:�4D�D:� D%�D � D%�%D:�:E� Ec#�VK�|��|�|j��d}tdd��r|j��}n)|jjdkr|j��}d}d|_d}|� |��\}}d}|�s�||}}|jjdkrt|d��}t||jj|j ��}t|d��}|jsG|�E|�|t#d|jj��|jj|jj��}|�t'd|�d ��t)|��} | |jjz} |j}|s)|� |��\}}|jr |d�d }|js-|r+| |jjzdkrt+||gd��}| dz } t#|jj| z|jjdz| z��}|�|||jj|jj��}|r"d |_|��| |_||jkr|V�|dz }|��|xjdz c_|��dS) Nr2z2.0.1rFr�)�from_process)r[rZz"Batch does not contain any data (`zM`). At the end of all iterable data available before expected stop iteration.Trr)r�rMr�rr�rLrr[rrrrrHr�rr�r�slicerZrarrr�r�r�r�r�)r>� main_iterator�stop_iterationr�r��next_batch_infor�rwr�observed_batch_sizer!� data_slices rBrLzDataLoaderDispatcher.__iter__Ls�� t�~�&�&�&�� D�'�*�*� <�!�0�9�9�;�;�M�M� �Z� %�� *� *� �0�9�9�;�;�M��$��&*�&9�&9�-�&H�&H�#� �O�� 8 � *�O�:�E��z�'�1�,�,�*�:�a�=�9�9��"�5�$�*�*;�$�J\�]�]�]�E��e�!�4�4�4�E��?� �{�':�"�m�m��!�T�Z�5�6�6�"&�*�":�"&�*�":� ,��}� �N��N�N�N��#2�%�"8�"8��,�� 0H�H�J�!�1�N�!� *�/3�.A�.A�-�.P�.P�+� �O��'�*�O�A�,>�,F�%)�N��?� �~� �:M�PT�PZ�Ph�:h�lm�:m�:m�#�U�K�$8�a�@�@�@��a�� t�z�7�*�D�t�z�G_�bc�Gc�gq�Fq�r�r�J��M�M��"�j�6�"�j�6� "��E�� 5�)-��&��'�'�)�)�)�!4��d�/�/�/��1��K�q!�8 �r ��!�� rCr=c�,�|j|kr||_t|jd��r;t|jjd��r!|jj�|��dSt|jd��r|j�|��dSdS)Nr#rM)r�rJr$r#rMr�rPs rBrMzDataLoaderDispatcher.set_epoch�s��>�U�"�"�"�D�N��4�%�y�1�1� *�g�d�>P�>X�Ze�6f�6f� *��&�0�0��7�7�7�7�7� �T�\�;� /� /� *��L�"�"�5�)�)�)�)�)� *� *rCc��t|j��}|jr|S|jr||jjzSt j||jjz��Srd)rfr�r\r�rrZr�r�)r>�whole_lengths rBrnzDataLoaderDispatcher.__len__�sY��4�/�0�0�� F�� _� F��4�:�#;�;�;��9�\�D�J�,D�D�E�E�ErCc�j��t��}tg|dd��RS)a Define the `__reduce__` method to ensure a `DataLoaderDispatcher` can be pickled and unpickled. This needs to be explicitly defined since default pickling behavior is broken by `DataLoaderAdapter` messing with its `__class__` member. rN)r8r�r�r�s �rBr�zDataLoaderDispatcher.__reduce__�s3��w�w�!�!�#�#��$�0�t�A�B�B�x�0�0�0rCc�Z�|jr|jjn|jj|jjzSrd)r\r�r!rZrgs rBr�z%DataLoaderDispatcher.total_batch_size�s/��(,�'9�u�D�L�#�#��@W�Z^�Zf�Zt�@t� rCc�*�t|j��Srd)rfr�rgs rBr�z)DataLoaderDispatcher.total_dataset_length�s��4�<� � � rCc� �t|��Srdr�rgs rBr�z DataLoaderDispatcher.get_sampler�r�rCc��t|jt��}|r||j_dS||j_t |jd��r||jj_dSdSr�r�r�s rBr�z DataLoaderDispatcher.set_sampler�r�rC)FrFFFNN)rQrRrSrTr|r9rrLrUrMrnr�r}r�r�r�r�rVrWs@rBr�r��sS��:$�� %� �#��5U�5U��5U�� 5U��5U�5U�5U�5U�5U�5U�n@!�@!�@!�DJ�J�J�X*�s�*�*�*�*�F�F�F�1�1�1�1�1�� X� � �!�!��X�!�!�!�!�C�C�C�C�C�C�CrCr�c��t|jt��}|rt|jdd��}nt|jdd��}|S)a Get the sampler associated to the dataloader Args: dataloader (`torch.utils.data.dataloader.DataLoader`): The data loader to split across several devices. Returns: `torch.utils.data.Sampler`: The sampler associated to the dataloader r#N)r�r#rrbr$)r�r�r#s rBr�r��sO�� *�*�*<�l�K�K��E��*�,�i��>�>��*�2�I�t�D�D��NrCr�rHrZr[r\� put_on_devicer��dispatch_batchesr]�slice_fn_for_dispatch�use_seedable_samplerr6r�r��returnc��|�|sd}nt�jt��}|r|std��t ��}|�|j}|�|j}|r�|jtj kr.d|j vr|d��}||z}||z}nyd}d}d}d|j vr|d��}d|j vr|d��}d|j vr|d��}||z}||z}|r��j��j}nGt�jd��r �jj}n%td t�j��d ��|dkr$||zdkrtd�j�d |�d��j}t|t��s�jnd}t�jt"��}d}t%��}t|t&��ro| rmt)|j|j|jt1|dt3jtt2d��rt3j��nd��|��}t�jt&��r�|jtjkr�t3jtt2d��rt3j��nd��}t;t3jdt2j�� !��}|�"|��|�_#|�j_#|dks|jtj$k�rK|�sHt|t��rBt1�jdd��jj#}tK|�j�j&|||��}n�| s�t|d��r�|j#��t3jtt2d��rt3j��nd��|_#t;t3jdt2j�� !��}|j#�"|��|j#}|r�jn�j}tO|||||��}gd��|�|�d|vr|�(d��fd�tRD��}|�"�j&|d<|r|s �j|zn�j|d<|r0|�*d��tW|f||�j&|| | |d�|��no|r:tY|f|r|jtjkr|nd|�j|�j&||| d�|��n3tY|f|r|jtjkr|nd|||�j&|| d�|��t|t(��r| r��-|��|jtjkrt]�|��S�S)aZ Wraps a PyTorch `DataLoader` to generate batches for one of the processes only. Depending on the value of the `drop_last` attribute of the `dataloader` passed, it will either stop the iteration at the first batch that would be too small / not present on all processes or loop with indices from the beginning. Args: dataloader (`torch.utils.data.dataloader.DataLoader`): The data loader to split across several devices. device (`torch.device`): The target device for the returned `DataLoader`. num_processes (`int`, *optional*): The number of processes running concurrently. Will default to the value given by [`~state.PartialState`]. process_index (`int`, *optional*): The index of the current process. Will default to the value given by [`~state.PartialState`]. split_batches (`bool`, *optional*, defaults to `False`): Whether the resulting `DataLoader` should split the batches of the original data loader across devices or yield full batches (in which case it will yield batches starting at the `process_index`-th and advancing of `num_processes` batches at each iteration). Another way to see this is that the observed batch size will be the same as the initial `dataloader` if this option is set to `True`, the batch size of the initial `dataloader` multiplied by `num_processes` otherwise. Setting this option to `True` requires that the batch size of the `dataloader` is a round multiple of `batch_size`. put_on_device (`bool`, *optional*, defaults to `False`): Whether or not to put the batches on `device` (only works if the batches are nested list, tuples or dictionaries of tensors). rng_types (list of `str` or [`~utils.RNGType`]): The list of random number generators to synchronize at the beginning of each iteration. Should be one or several of: - `"torch"`: the base torch random number generator - `"cuda"`: the CUDA random number generator (GPU only) - `"xla"`: the XLA random number generator (TPU only) - `"generator"`: the `torch.Generator` of the sampler (or batch sampler if there is no sampler in your dataloader) or of the iterable dataset (if it exists) if the underlying dataset is of that type. dispatch_batches (`bool`, *optional*): If set to `True`, the dataloader prepared is only iterated through on the main process and then the batches are split and broadcast to each process. Will default to `True` when the underlying dataset is an `IterableDataset`, `False` otherwise. even_batches (`bool`, *optional*, defaults to `True`): If set to `True`, in cases where the total batch size across all processes does not exactly divide the dataset, samples at the start of the dataset will be duplicated so the batch can be divided equally among all workers. slice_fn_for_dispatch (`Callable`, *optional*`): If passed, this function will be used to slice tensors across `num_processes`. Will default to [`~utils.slice_tensors`]. This argument is used only when `dispatch_batches` is set to `True` and will be ignored otherwise. use_seedable_sampler (`bool`, *optional*, defaults to `False`): Whether to use the [`~data_loader.SeedableRandomSampler`] instead of a `RandomSampler` for better reproducability. Comes at a cost of potentially different performances due to different shuffling algorithms but ensures results will be the *exact* same. Should be paired with `set_seed()` at every `self.set_epoch` data_seed (`int`, *optional*, defaults to `None`): The seed to use for the underlying generator when using `use_seedable_sampler`. If `None`, the generator will use the current default seed from torch. non_blocking (`bool`, *optional*, defaults to `False`): If set to `True`, dataloader will utilize non-blocking host-to-device transfers. If the dataloader has `pin_memory` set to `True`, this will help to increase overlap between data transfer and computations. use_stateful_dataloader (`bool`, *optional*, defaults to `False`): "If set to true, the dataloader prepared by the Accelerator will be backed by " "[torchdata.StatefulDataLoader](https://github.com/pytorch/data/tree/main/torchdata/stateful_dataloader). This requires `torchdata` version 0.8.0 or higher that supports StatefulDataLoader to be installed." torch_device_mesh (`torch.distributed.DeviceMesh`, *optional*, defaults to `None`): PyTorch device mesh. Returns: `torch.utils.data.dataloader.DataLoader`: A new data loader that will yield the portion of the batches <Tip warning={true}> `BatchSampler`s with varying batch sizes are not enabled by default. To enable this behaviour, set `even_batches` equal to `False` </Tip> NFz<Using `dispatch_batches=True` requires `put_on_device=True`.rrrrr!aIn order to use `split_batches==True` you must have a `batch_size` attribute either in the passed `dataloader` or `dataloader.batch_sampler` objects, and it has to return a natural number. Your `dataloader.batch_size` is None and `dataloader.batch_sampler` (`z0`) does not have the `batch_size` attribute set.rz?To use a `DataLoader` in `split_batches` mode, the batch size (r_r`r,rErFrG)�data_source�replacement�num_samplesr,r6r~)�dtype)r!r(rZr[r\)rZr[r\r]�r!r"r#r$r(c �R��i|]#}|�v�|t�|t|��$Sr~�rb�_PYTORCH_DATALOADER_KWARGS��.0�kr�� ignore_kwargss ��rB� <dictcomp>z'prepare_data_loader.<locals>.<dictcomp>��B�� M�!�!� �7�:�q�"<�Q�"?�@�@�!�!�!rCr()r\r$r�r�rr�r�)rHr#r!r�r�r�r�r�)rHr$r�r�r�r�r�)/r�r�r rarrZr[r�r� DEEPSPEEDr�sizer!rJr$�typer#rr�rr4r5r6�_num_samplesrbr:rIrE�XLArU�empty�int64�random_�itemrKr,�MEGATRON_LMr�r(rY�remover<r7r�r�r�r�)r�rHrZr[r\r/r�r0r]r1r2r6r�r�r�r�submesh_tp_size�submesh_fsdp_size�submesh_dp_size�batch_size_for_check�new_dataset�new_batch_samplerr�r�r#r,rNr$r@r@s` @rB�prepare_data_loaderrT�sk��B�� O�$��)�*�*<�o�N�N��Y� �Y��W�X�X�X��N�N�E��+� ��+� ��@��!�_�%>�>�>��(�7�7�7�"3�D�"9�">�">�"@�"@��)�_�<�M�)�_�<�M�M�!"��O��O��(�7�7�7�"3�D�"9�">�">�"@�"@��(�7�7�7�"3�D�"9�">�">�"@�"@��*�9�9�9�$5�f�$=�$B�$B�$D�$D�!�)�_�<�M�-��?�M�� ,�#-�#8� � ��z�/��>�>� �'1�'?�'J�$�$� �j��j�6�7�7�j�j�j�� !�#�#�(<�}�(L�PQ�(Q�(Q��^�R\�Rg�^�^�LY�^�^�^�� $�K�8B�;�P_�8`�8`�j� �0�0�fj��)�*�*<�l�K�K��!��*�%�%�G��'�=�)�)� �.B� � (��+��+��,��W�U�Th�Ei�Ei�'t�u�'?�'A�'A�'A�ot�u�u�u�� *�$�m�4�4�1��9O�Sb�Sf�9f�9f��O�18��@T�1U�1U�`�5�+�-�-�-�[`� � � � ��5�;�r��5�5�5�=�=�?�?�D�D�F�F�G�G��d�#�#�#�(� ��'0� ��$��e�4��8S�S�S�]m�S��k�?�3�3� ��z�)�;��=�=�I�)3�);�)E�&�.��%�0�$�.�+�+�+� ��K�K�(� ;�G�G�[�,I�,I� ;��$�,�(-��=D�U�L`�=a�=a�l�u�7�9�9�9�gl�)�)�)�G�%��u�{�2�U�[�A�A�A�I�I�K�K�P�P�R�R�S�S�D��%�1�1�$�7�7�7�)0�):�&�2J�h�J�.�.�PZ�Ph�M� 1��+�+�+�)�!�!�!��M��!7�!?�K�S\�D\�D\��%�%�%��+��F�� (�2��{��6C�w�L\�w�J�!�]�2�2�bl�bw� �|��% �� ;��)�� '�+�!�+�&�*�$;�/� � �� "� �$�� *�f�u�/E��I\�/\�/\�6�6�bf�%�!�,��!�+�&�#9�$;� � �� %�� *�f�u�/E��I\�/\�/\�6�6�bf�+��#9�!�+�&�$;� � �� '�0�1�1�(�6J�(��w�'�'�'��!4�4�4�$�Z��8�8�8��rCc�<�eZdZdZdd�Zd�Zed��Zd�ZdS) �SkipBatchSamplerz� A `torch.utils.data.BatchSampler` that skips the first `n` batches of another `torch.utils.data.BatchSampler`. Should not be used if the original dataloader is a `StatefulDataLoader`. rc�"�||_||_dSrd)r$r�)r>r$r�s rBr9zSkipBatchSampler.__init__s��*��(��rCc#�\K�t|j��D]\}}||jkr|V��dSrd)rsr$r�)r>�index�sampless rBrLzSkipBatchSampler.__iter__sF��'��(:�;�;� � �N�E�7��)�)�)�� rCc�*�t|j��Srdrergs rBrhzSkipBatchSampler.total_lengthrirCc�:�t|j��|jz Srd)rfr$r�rgs rBrnzSkipBatchSampler.__len__s��4�%�&�&��):�:�:rCN�r) rQrRrSrTr9rLr}rhrnr~rCrBrVrV sk�� )�)�)�)�� '�'��X�'�;�;�;�;�;rCrVc�:��eZdZdZd�fd� Zd�Zd�Z�fd�Z�xZS) �SkipDataLoadera Subclass of a PyTorch `DataLoader` that will skip the first batches. Generally it's preferable to use `skip_first_batches`/`torchdata.StatefulDataLoader` instead of this class. Args: dataset (`torch.utils.data.dataset.Dataset`): The dataset to use to build this dataloader. skip_batches (`int`, *optional*, defaults to 0): The number of batches to skip at the beginning. kwargs: All other keyword arguments to pass to the regular `DataLoader` initialization. rFc�t��t��j|fd|i|��||_t��|_dS)Nr�)r8r9r�rr�)r>r�r�r�r@rAs �rBr9zSkipDataLoader.__init__/sC��\�\�:Q�\�U[�\�\�\�(��+�o�o��rCc#��K�|��t|j��D](\}}||jkr|��|V��)|��dSrd)r�rsr�rLr�r�r�)r>rYrws rBrLzSkipDataLoader.__iter__4sx�� %�d�&:�&C�&C�&E�&E�F�F� � �L�E�5��)�)�)��'�'�)�)�)�� rCc�:�t|j��|jz Srd)rfr�r�rgs rBrnzSkipDataLoader.__len__<s��4�'�(�(�4�+<�<�<rCc�j��t��}tg|dd��RS)a Define the `__reduce__` method to ensure a `SkipDataLoader` can be pickled and unpickled. This needs to be explicitly defined since default pickling behavior is broken by `DataLoaderAdapter` messing with its `__class__` member. rN)r8r�r_r�s �rBr�zSkipDataLoader.__reduce__?s3��w�w�!�!�#�#��*��a�b�b��*�*�*rC)rF) rQrRrSrTr9rLrnr�rVrWs@rBr_r_!s~��.�.�.�.�.�.� ��=�=�=�+�+�+�+�+�+�+�+�+rCr_c�&�� t��}|jtjkr�j}�j��j}d}t|t��rd}n;t�j t��}|r�j n�j}t||��}gd�� fd�tD��}|��j|d<�j|d<t�t ��r#|�||d<t!|f�j|�jd �|��nvt�t&��r@|�||d<n|r||d <�j|d<n||d<t'|f�j�j�jd�|��n!|�t-|fd|i|��nt/|fd|i|��|jtjkrt1�|��S) z� Creates a `torch.utils.data.DataLoader` that will efficiently skip the first `num_batches`. Should not be used if the original dataloader is a `StatefulDataLoader`. FN)r�r9c �R��i|]#}|�v�|t�|t|��$Sr~r;r=s ��rBrAz&skip_first_batches.<locals>.<dictcomp>erBrCr(r!r�)r\r$r�r#r$)rHr�r�)rr�rrGrHr�r�r�r r#rr$rVr<r(r!r�r\r�r�r�r�r_r r�) r��num_batchesrrHr�r�rSr$r@r@s ` @rB�skip_first_batchesrgIsD�� N�N�E��!4�4�4��"��*� �� G�$��'�?�+�+�V� ��#-�j�.@�,�#O�#O� �.F�d� �*�*�J�Ld� �,�]��U�U�U��M��+��F�� (�2��{��)�4��|��*�2�3�3� X��$�%0�F�>�"�)�� $�2�+�!�,� � � � � � � � �J�� 0� 0�X��$�%0�F�>�"�"� %� 8� 1�F�9��#-�#8�F�<� � �&7�F�?�#�$�� $� �*�#-�#D� � � � � � � ��$�'��T�T�k�T�V�T�T�J�J�#�G�W�W�;L�W�PV�W�W�J��!4�4�4�*�:�v�>�>� ��rC)NNNFFNNTNFNFFNr])Dr�r�� contextlibr�typingrrrr:� packagingr�torch.utils.datarr r r�loggingr rrrrrrrrrrrrrrrrrrrrQrr<�%_PYTORCH_DATALOADER_ADDITIONAL_KWARGS�items�v�additional_kwargs�updater4rYr�r�r�r��%torch_xla.distributed.parallel_loader�distributed�parallel_loader�xpl�MpDeviceLoaderr�r�r�rHrUr|�list�strrTrVr_rgr~rCrB�<module>rysv��,�,�,�,�,�,�,�,�,�,��U�U�U�U�U�U�U�U�U�U�U�U��W�W�W�W�W�W�W�W�W�W�W�W��" ��H� � ��#��&*1�:�t�2D�(E�%�A�G�G�I�I�=�=��A��a� � �=�"�)�)�*;�<�<�<��"�"�"�"�"�M�"�"�"�JY�Y�Y�Y�Y��Y�Y�Y�x`'�`'�`'�`'�`'�?�`'�`'�`'�F(5�(5�(5�(5�(5�(5�(5�(5�VYN�YN�YN�YN�YN�YN�YN�YN�xMC�MC�MC�MC�MC�'�)=�MC�MC�MC�`��2 �7�7�7�7�7�7�7�7�7�/ �/ �/ �/ �/ �� 2�/ �/ �/ �dOC�OC�OC�OC�OC�,�.B�OC�OC�OC�d��(&*�#'�#'��59�'+��04�!&�#��$)��k�k��k��U�\�"�k��C�=�k��C�=� k� �k�� k��U�3��<�0�1�2�k��t�n�k��k�$�H�-�k��k��}�k��k�"�k� �!k�k�k�k�\ ;�;�;�;�;�|�;�;�;�.%+�%+�%+�%+�%+�&�(<�%+�%+�%+�PL�L�L�L�L�LrC