"""
A set of basic tensor ops compatible with tpu, gpu, and multigpu
"""

import pickle
import warnings
from collections.abc import Mapping
from contextlib import contextmanager, nullcontext
from functools import update_wrapper, wraps
from typing import Any

import torch

from ..state import AcceleratorState, PartialState
from .constants import TORCH_DISTRIBUTED_OPERATION_TYPES
from .dataclasses import DistributedType, TensorInformation
from .imports import is_npu_available, is_torch_distributed_available, is_torch_xla_available


if is_torch_xla_available():
    import torch_xla.core.xla_model as xm

if is_torch_distributed_available():
    from torch.distributed import ReduceOp


def is_torch_tensor(tensor):
    return isinstance(tensor, torch.Tensor)


def is_torch_xpu_tensor(tensor):
    return isinstance(
        tensor,
        (
            torch.xpu.FloatTensor,
            torch.xpu.ByteTensor,
            torch.xpu.IntTensor,
            torch.xpu.LongTensor,
            torch.xpu.HalfTensor,
            torch.xpu.DoubleTensor,
            torch.xpu.BFloat16Tensor,
        ),
    )


def is_tensor_information(tensor_info):
    return isinstance(tensor_info, TensorInformation)


def is_namedtuple(data):
    """
    Checks if `data` is a `namedtuple` or not. Can have false positives, but only if a user is trying to mimic a
    `namedtuple` perfectly.
    """
    return isinstance(data, tuple) and hasattr(data, "_asdict") and hasattr(data, "_fields")


def honor_type(obj, generator):
    """
    Cast a generator to the same type as obj (list, tuple, or namedtuple)
    """
    if is_namedtuple(obj):
        return type(obj)(*list(generator))
    else:
        return type(obj)(generator)


def recursively_apply(func, data, *args, test_type=is_torch_tensor, error_on_other_type=False, **kwargs):
    """
    Recursively apply a function on a data structure that is a nested list/tuple/dictionary of a given base type.

    Args:
        func (`callable`):
            The function to recursively apply.
        data (nested list/tuple/dictionary of `main_type`):
            The data on which to apply `func`
        *args:
            Positional arguments that will be passed to `func` when applied on the unpacked data.
        main_type (`type`, *optional*, defaults to `torch.Tensor`):
            The base type of the objects to which apply `func`.
        error_on_other_type (`bool`, *optional*, defaults to `False`):
            Whether to return an error or not if after unpacking `data`, we encounter an object that is not of type
            `main_type`. If `False`, the function will leave objects of types different than `main_type` unchanged.
        **kwargs (additional keyword arguments, *optional*):
            Keyword arguments that will be passed to `func` when applied on the unpacked data.

    Returns:
        The same data structure as `data` with `func` applied to every object of type `main_type`.
    """
    if isinstance(data, (tuple, list)):
        return honor_type(
            data,
            (
                recursively_apply(
                    func, o, *args, test_type=test_type, error_on_other_type=error_on_other_type, **kwargs
                )
                for o in data
            ),
        )
    elif isinstance(data, Mapping):
        return type(data)(
            {
                k: recursively_apply(
                    func, v, *args, test_type=test_type, error_on_other_type=error_on_other_type, **kwargs
                )
                for k, v in data.items()
            }
        )
    elif test_type(data):
        return func(data, *args, **kwargs)
    elif error_on_other_type:
        raise TypeError(
            f"Unsupported types ({type(data)}) passed to `{func.__name__}`. Only nested list/tuple/dicts of "
            f"objects that are valid for `{test_type.__name__}` should be passed."
        )
    return data
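# Illustrative sketch (not part of the library): `recursively_apply` preserves the container
# structure and only transforms the leaves that pass `test_type`, so for example:
#
#     batch = {"x": torch.ones(2, 3), "y": [torch.zeros(2), torch.zeros(2)]}
#     doubled = recursively_apply(lambda t: t * 2, batch)
#     # `doubled` is again a dict holding one tensor and a list of two tensors, all doubled;
#     # non-tensor leaves (strings, ints, ...) would be returned unchanged.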
def send_to_device(tensor, device, non_blocking=False, skip_keys=None):
    """
    Recursively sends the elements in a nested list/tuple/dictionary of tensors to a given device.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to send to a given device.
        device (`torch.device`):
            The device to send the data to.

    Returns:
        The same data structure as `tensor` with all tensors sent to the proper device.
    """
    if is_torch_tensor(tensor) or hasattr(tensor, "to"):
        # `torch.Tensor.to("npu")` cannot find its context when called for the first time,
        # so default to the first NPU device explicitly.
        if device == "npu":
            device = "npu:0"
        try:
            return tensor.to(device, non_blocking=non_blocking)
        except TypeError:  # .to() doesn't accept non_blocking as kwarg
            return tensor.to(device)
        except AssertionError as error:
            # `torch.Tensor.to(<int num>)` is not supported by `torch_npu`, so convert an
            # integer device index into an explicit `npu:{index}` string and retry.
            if is_npu_available():
                if isinstance(device, int):
                    device = f"npu:{device}"
            else:
                raise error
        try:
            return tensor.to(device, non_blocking=non_blocking)
        except TypeError:  # .to() doesn't accept non_blocking as kwarg
            return tensor.to(device)
    elif isinstance(tensor, (tuple, list)):
        return honor_type(
            tensor, (send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys) for t in tensor)
        )
    elif isinstance(tensor, Mapping):
        if isinstance(skip_keys, str):
            skip_keys = [skip_keys]
        elif skip_keys is None:
            skip_keys = []
        return type(tensor)(
            {
                k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
                for k, t in tensor.items()
            }
        )
    else:
        return tensor


def get_data_structure(data):
    """
    Recursively gathers the information needed to rebuild a nested list/tuple/dictionary of tensors.

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`):
            The data to analyze.

    Returns:
        The same data structure as `data` with [`~utils.TensorInformation`] instead of tensors.
    """

    def _get_data_structure(tensor):
        return TensorInformation(shape=tensor.shape, dtype=tensor.dtype)

    return recursively_apply(_get_data_structure, data)


def get_shape(data):
    """
    Recursively gathers the shape of a nested list/tuple/dictionary of tensors as a list.

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`):
            The data to analyze.

    Returns:
        The same data structure as `data` with lists of tensor shapes instead of tensors.
    """

    def _get_shape(tensor):
        return list(tensor.shape)

    return recursively_apply(_get_shape, data)


def initialize_tensors(data_structure):
    """
    Recursively initializes tensors from a nested list/tuple/dictionary of [`~utils.TensorInformation`].

    Returns:
        The same data structure as `data` with tensors instead of [`~utils.TensorInformation`].
    """

    def _initialize_tensor(tensor_info):
        return torch.empty(*tensor_info.shape, dtype=tensor_info.dtype)

    return recursively_apply(_initialize_tensor, data_structure, test_type=is_tensor_information)


def find_batch_size(data):
    """
    Recursively finds the batch size in a nested list/tuple/dictionary of lists of tensors.

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`): The data from which to find the batch size.

    Returns:
        `int`: The batch size.
    """
    if isinstance(data, (tuple, list, Mapping)) and (len(data) == 0):
        raise ValueError(f"Cannot find the batch size from empty `{type(data)}`.")

    if isinstance(data, (tuple, list)):
        return find_batch_size(data[0])
    elif isinstance(data, Mapping):
        for k in data.keys():
            return find_batch_size(data[k])
    elif not isinstance(data, torch.Tensor):
        raise TypeError(f"Can only find the batch size of tensors but got {type(data)}.")
    return data.shape[0]
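# Illustrative sketch (not part of the library; the key names below are made up):
# `send_to_device` moves every tensor leaf, optionally leaving some dict keys alone, while
# `get_data_structure`/`initialize_tensors` round-trip the metadata of a batch without its data:
#
#     batch = {"input_ids": torch.ones(8, 128, dtype=torch.long), "cache": torch.zeros(2)}
#     batch = send_to_device(batch, "cuda:0", skip_keys="cache")   # "cache" stays on CPU
#     skeleton = initialize_tensors(get_data_structure(batch))     # same shapes/dtypes, empty data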
def ignorant_find_batch_size(data):
    """
    Same as [`utils.operations.find_batch_size`] except will ignore if `ValueError` and `TypeErrors` are raised

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`): The data from which to find the batch size.

    Returns:
        `int`: The batch size.
    """
    try:
        return find_batch_size(data)
    except (ValueError, TypeError):
        pass
    return None


def listify(data):
    """
    Recursively finds tensors in a nested list/tuple/dictionary and converts them to a list of numbers.

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`): The data from which to convert to regular numbers.

    Returns:
        The same data structure as `data` with lists of numbers instead of `torch.Tensor`.
    """

    def _convert_to_list(tensor):
        tensor = tensor.detach().cpu()
        if tensor.dtype == torch.bfloat16:
            # NumPy does not support bfloat16, so convert to float32 first.
            tensor = tensor.to(torch.float32)
        return tensor.tolist()

    return recursively_apply(_convert_to_list, data)


def _tpu_gather(tensor):
    def _tpu_gather_one(tensor):
        if tensor.ndim == 0:
            tensor = tensor.clone()[None]

        # Can only gather contiguous tensors
        if not tensor.is_contiguous():
            tensor = tensor.contiguous()
        return xm.all_gather(tensor)

    res = recursively_apply(_tpu_gather_one, tensor, error_on_other_type=True)
    xm.mark_step()
    return res


def _gpu_gather(tensor):
    state = PartialState()
    gather_op = torch.distributed.all_gather_into_tensor

    def _gpu_gather_one(tensor):
        if tensor.ndim == 0:
            tensor = tensor.clone()[None]

        # Can only gather contiguous tensors
        if not tensor.is_contiguous():
            tensor = tensor.contiguous()

        if state.backend is not None and state.backend != "gloo":
            # `all_gather_into_tensor` fills a flat pre-allocated buffer, so we rely on the
            # number of items in the tensor rather than its direct shape.
            output_tensors = torch.empty(
                state.num_processes * tensor.numel(),
                dtype=tensor.dtype,
                device=state.device,
            )
            gather_op(output_tensors, tensor)
            return output_tensors.view(-1, *tensor.size()[1:])
        else:
            # A backend of `None` is always CPU, and gloo does not support
            # `all_gather_into_tensor`; fall back to `all_gather` at the cost of a larger
            # memory overhead for the op.
            output_tensors = [torch.empty_like(tensor) for _ in range(state.num_processes)]
            torch.distributed.all_gather(output_tensors, tensor)
            return torch.cat(output_tensors, dim=0)

    return recursively_apply(_gpu_gather_one, tensor, error_on_other_type=True)


class DistributedOperationException(Exception):
    """
    An exception class for distributed operations. Raised if the operation cannot be performed due to the shape of the
    tensors.
    """


def verify_operation(function):
    """
    Verifies that `tensor` is the same shape across all processes. Only ran if `PartialState().debug` is `True`.
    """

    @wraps(function)
    def wrapper(*args, **kwargs):
        if PartialState().distributed_type == DistributedType.NO or not PartialState().debug:
            return function(*args, **kwargs)
        operation = f"{function.__module__}.{function.__name__}"
        if "tensor" in kwargs:
            tensor = kwargs["tensor"]
        else:
            tensor = args[0]
        if PartialState().device.type != find_device(tensor).type:
            raise DistributedOperationException(
                f"One or more of the tensors passed to {operation} were not on the {find_device(tensor).type} "
                f"while the `Accelerator` is configured for {PartialState().device.type}. "
                f"Please move it to the {PartialState().device.type} before calling {operation}."
            )
        shapes = get_shape(tensor)
        output = gather_object([shapes])
        if output[0] is not None:
            are_same = output.count(output[0]) == len(output)
            if not are_same:
                process_shape_str = "\n  - ".join([f"Process {i}: {shape}" for i, shape in enumerate(output)])
                raise DistributedOperationException(
                    "Cannot apply desired operation due to shape mismatches. "
                    "All shapes across devices must be valid.\n\n"
                    f"Operation: `{operation}`\n"
                    f"Input shapes:\n  - {process_shape_str}"
                )
        return function(*args, **kwargs)

    return wrapper
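# Illustrative sketch (not part of the library): on `n` processes, gathering a tensor of
# shape `(batch, *rest)` produces shape `(n * batch, *rest)`, with rank 0's slice first:
#
#     # on each rank: t = torch.full((2, 4), fill_value=float(rank), device=state.device)
#     # _gpu_gather(t).shape == (2 * state.num_processes, 4)
#     # rows 0-1 come from rank 0, rows 2-3 from rank 1, and so on.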
def chained_operation(function):
    """
    Checks that `verify_operation` failed and if so reports a more helpful error chaining the existing
    `DistributedOperationException`.
    """

    @wraps(function)
    def wrapper(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except DistributedOperationException as e:
            operation = f"{function.__module__}.{function.__name__}"
            raise DistributedOperationException(
                f"Error found while calling `{operation}`. Please see the earlier error for more details."
            ) from e

    return wrapper


@verify_operation
def gather(tensor):
    """
    Recursively gather tensor in a nested list/tuple/dictionary of tensors from all devices.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to gather.

    Returns:
        The same data structure as `tensor` with all tensors sent to the proper device.
    """
    if PartialState().distributed_type == DistributedType.XLA:
        return _tpu_gather(tensor)
    elif PartialState().distributed_type in TORCH_DISTRIBUTED_OPERATION_TYPES:
        return _gpu_gather(tensor)
    else:
        return tensor


def _gpu_gather_object(object: Any):
    output_objects = [None for _ in range(PartialState().num_processes)]
    torch.distributed.all_gather_object(output_objects, object)
    # `all_gather_object` returns a list of lists, so flatten it into a single list.
    return [x for y in output_objects for x in y]


def gather_object(object: Any):
    """
    Recursively gather object in a nested list/tuple/dictionary of objects from all devices.

    Args:
        object (nested list/tuple/dictionary of picklable object):
            The data to gather.

    Returns:
        The same data structure as `object` with all the objects sent to every device.
    """
    if PartialState().distributed_type == DistributedType.XLA:
        raise NotImplementedError("gather objects in TPU is not supported")
    elif PartialState().distributed_type in TORCH_DISTRIBUTED_OPERATION_TYPES:
        return _gpu_gather_object(object)
    else:
        return object
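# Illustrative sketch (not part of the library), assuming a 2-process launch:
#
#     state = PartialState()
#     t = torch.tensor([state.process_index], device=state.device)
#     gather(t)                          # tensor([0, 1]) on every process
#     gather_object([state.process_index])  # [0, 1] on every process (raises on XLA)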
def _gpu_broadcast(data, src=0):
    def _gpu_broadcast_one(tensor, src=0):
        torch.distributed.broadcast(tensor, src=src)
        return tensor

    return recursively_apply(_gpu_broadcast_one, data, error_on_other_type=True, src=src)


def _tpu_broadcast(tensor, src=0, name="broadcast tensor"):
    if isinstance(tensor, (list, tuple)):
        return honor_type(tensor, (_tpu_broadcast(t, name=f"{name}_{i}") for i, t in enumerate(tensor)))
    elif isinstance(tensor, Mapping):
        return type(tensor)({k: _tpu_broadcast(v, name=f"{name}_{k}") for k, v in tensor.items()})
    return xm.mesh_reduce(name, tensor, lambda x: x[src])


TENSOR_TYPE_TO_INT = {
    torch.float: 1,
    torch.double: 2,
    torch.half: 3,
    torch.bfloat16: 4,
    torch.uint8: 5,
    torch.int8: 6,
    torch.int16: 7,
    torch.int32: 8,
    torch.int64: 9,
    torch.bool: 10,
}

TENSOR_INT_TO_DTYPE = {v: k for k, v in TENSOR_TYPE_TO_INT.items()}


def gather_tensor_shape(tensor):
    """
    Grabs the shape of `tensor` only available on one process and returns a tensor of its shape
    """
    # Allocate plenty of space for any potential number of dimensions
    max_tensor_dimension = 2**20
    state = PartialState()
    base_tensor = torch.empty(max_tensor_dimension, dtype=torch.int, device=state.device)

    # Since PyTorch can't just send a tensor to another GPU without
    # knowing its size, we store the size of the tensor with data
    # in an allocation
    if tensor is not None:
        shape = tensor.shape
        tensor_dtype = TENSOR_TYPE_TO_INT[tensor.dtype]
        base_tensor[: len(shape) + 1] = torch.tensor(list(shape) + [tensor_dtype], dtype=int)
    # Perform a reduction to copy the size data onto all GPUs
    base_tensor = reduce(base_tensor, reduction="sum")
    base_tensor = base_tensor[base_tensor.nonzero()]
    # The last non-zero data contains the coded dtype of the source tensor
    dtype = int(base_tensor[-1:][0])
    base_tensor = base_tensor[:-1]
    return base_tensor, dtype


def copy_tensor_to_devices(tensor=None) -> torch.Tensor:
    """
    Copies a tensor that only exists on a single device and broadcasts it to other devices. Differs from `broadcast`
    as each worker doesn't need to know its shape when used (and tensor can be `None`)

    Args:
        tensor (`torch.tensor`):
            The tensor that should be sent to all devices. Must only have it be defined on a single device, the rest
            should be `None`.
    """
    state = PartialState()
    shape, dtype = gather_tensor_shape(tensor)
    if tensor is None:
        tensor = torch.zeros(shape, dtype=TENSOR_INT_TO_DTYPE[dtype]).to(state.device)
    return reduce(tensor, reduction="sum")


@verify_operation
def broadcast(tensor, from_process: int = 0):
    """
    Recursively broadcast tensor in a nested list/tuple/dictionary of tensors to all devices.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to gather.
        from_process (`int`, *optional*, defaults to 0):
            The process from which to send the data

    Returns:
        The same data structure as `tensor` with all tensors broadcasted to the proper device.
    """
    if PartialState().distributed_type == DistributedType.XLA:
        return _tpu_broadcast(tensor, src=from_process, name="accelerate.utils.broadcast")
    elif PartialState().distributed_type in TORCH_DISTRIBUTED_OPERATION_TYPES:
        return _gpu_broadcast(tensor, src=from_process)
    else:
        return tensor


def broadcast_object_list(object_list, from_process: int = 0):
    """
    Broadcast a list of picklable objects from one process to the others.

    Args:
        object_list (list of picklable objects):
            The list of objects to broadcast. This list will be modified inplace.
        from_process (`int`, *optional*, defaults to 0):
            The process from which to send the data.

    Returns:
        The same list containing the objects from process 0.
    """
    if PartialState().distributed_type == DistributedType.XLA:
        for i, obj in enumerate(object_list):
            object_list[i] = xm.mesh_reduce(
                "accelerate.utils.broadcast_object_list", obj, lambda x: x[from_process]
            )
    elif PartialState().distributed_type in TORCH_DISTRIBUTED_OPERATION_TYPES:
        torch.distributed.broadcast_object_list(object_list, src=from_process)
    return object_list


def slice_tensors(data, tensor_slice, process_index=None, num_processes=None):
    """
    Recursively takes a slice in a nested list/tuple/dictionary of tensors.

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`):
            The data to slice.
        tensor_slice (`slice`):
            The slice to take.

    Returns:
        The same data structure as `data` with all the tensors slices.
    """

    def _slice_tensor(tensor, tensor_slice):
        return tensor[tensor_slice]

    return recursively_apply(_slice_tensor, data, tensor_slice)
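# Illustrative sketch (not part of the library): `broadcast` overwrites every process's copy
# with the values held by `from_process`, and `broadcast_object_list` does the same in place
# for arbitrary picklables:
#
#     state = PartialState()
#     t = torch.arange(4, device=state.device) * (state.process_index + 1)
#     t = broadcast(t)                        # every rank now holds rank 0's tensor
#     payload = ["ready"] if state.is_main_process else [None]
#     broadcast_object_list(payload)          # payload == ["ready"] everywhere afterwards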
def concatenate(data, dim=0):
    """
    Recursively concatenate the tensors in a nested list/tuple/dictionary of lists of tensors with the same shape.

    Args:
        data (nested list/tuple/dictionary of lists of tensors `torch.Tensor`):
            The data to concatenate.
        dim (`int`, *optional*, defaults to 0):
            The dimension on which to concatenate.

    Returns:
        The same data structure as `data` with all the tensors concatenated.
    """
    if isinstance(data[0], (tuple, list)):
        return honor_type(data[0], (concatenate([d[i] for d in data], dim=dim) for i in range(len(data[0]))))
    elif isinstance(data[0], Mapping):
        return type(data[0])({k: concatenate([d[k] for d in data], dim=dim) for k in data[0].keys()})
    elif not isinstance(data[0], torch.Tensor):
        raise TypeError(f"Can only concatenate tensors but got {type(data[0])}")
    return torch.cat(data, dim=dim)


class CannotPadNestedTensorWarning(UserWarning):
    pass


@chained_operation
def pad_across_processes(tensor, dim=0, pad_index=0, pad_first=False):
    """
    Recursively pad the tensors in a nested list/tuple/dictionary of tensors from all devices to the same size so
    they can safely be gathered.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to gather.
        dim (`int`, *optional*, defaults to 0):
            The dimension on which to pad.
        pad_index (`int`, *optional*, defaults to 0):
            The value with which to pad.
        pad_first (`bool`, *optional*, defaults to `False`):
            Whether to pad at the beginning or the end.
    """

    def _pad_across_processes(tensor, dim=0, pad_index=0, pad_first=False):
        if getattr(tensor, "is_nested", False):
            warnings.warn(
                "Cannot pad nested tensors without more information. Leaving unprocessed.",
                CannotPadNestedTensorWarning,
            )
            return tensor
        if dim >= len(tensor.shape) or dim < -len(tensor.shape):
            return tensor

        # Convert negative dimensions to non-negative
        if dim < 0:
            dim += len(tensor.shape)

        # Gather all sizes
        size = torch.tensor(tensor.shape, device=tensor.device)[None]
        sizes = gather(size).cpu()
        # Then pad to the maximum size
        max_size = max(s[dim] for s in sizes)
        if max_size == tensor.shape[dim]:
            return tensor

        old_size = tensor.shape
        new_size = list(old_size)
        new_size[dim] = max_size
        new_tensor = tensor.new_zeros(tuple(new_size)) + pad_index
        if pad_first:
            indices = tuple(
                slice(max_size - old_size[dim], max_size) if i == dim else slice(None)
                for i in range(len(new_size))
            )
        else:
            indices = tuple(slice(0, old_size[dim]) if i == dim else slice(None) for i in range(len(new_size)))
        new_tensor[indices] = tensor
        return new_tensor

    return recursively_apply(
        _pad_across_processes, tensor, error_on_other_type=True, dim=dim, pad_index=pad_index, pad_first=pad_first
    )
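# Illustrative sketch (not part of the library): with 2 processes holding tensors of shape
# (3, 5) and (7, 5), `pad_across_processes(t, dim=0)` returns shape (7, 5) on both ranks, so
# a following `gather` is safe; `pad_first=True` places the padding before the real rows
# instead of after them.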
def pad_input_tensors(tensor, batch_size, num_processes, dim=0):
    """
    Takes a `tensor` of arbitrary size and pads it so that it can work given `num_processes` needed dimensions.

    New tensors are just the last input repeated.

    E.g.:
        Tensor: ([3,4,4])
        Num processes: 4
        Expected result shape: ([4,4,4])
    """

    def _pad_input_tensors(tensor, batch_size, num_processes, dim=0):
        remainder = batch_size // num_processes
        last_inputs = batch_size - (remainder * num_processes)
        if batch_size // num_processes == 0:
            to_pad = num_processes - batch_size
        else:
            to_pad = num_processes - (batch_size // num_processes)
        # In the rare case that `to_pad` is negative,
        # we need to pad the last inputs - the found `to_pad`
        if last_inputs > to_pad and to_pad < 1:
            to_pad = last_inputs - to_pad
        old_size = tensor.shape
        new_size = list(old_size)
        new_size[0] = batch_size + to_pad
        new_tensor = tensor.new_zeros(tuple(new_size))
        indices = tuple(slice(0, old_size[dim]) if i == dim else slice(None) for i in range(len(new_size)))
        new_tensor[indices] = tensor
        return new_tensor

    return recursively_apply(
        _pad_input_tensors,
        tensor,
        error_on_other_type=True,
        batch_size=batch_size,
        num_processes=num_processes,
        dim=dim,
    )


@verify_operation
def reduce(tensor, reduction="mean", scale=1.0):
    """
    Recursively reduce the tensors in a nested list/tuple/dictionary of lists of tensors across all processes by the
    mean of a given operation.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to reduce.
        reduction (`str`, *optional*, defaults to `"mean"`):
            A reduction method. Can be of "mean", "sum", or "none"
        scale (`float`, *optional*):
            A default scaling value to be applied after the reduce, only valid on XLA.

    Returns:
        The same data structure as `data` with all the tensors reduced.
    """

    def _reduce_across_processes(tensor, reduction="mean", scale=1.0):
        state = PartialState()
        cloned_tensor = tensor.clone()
        if state.distributed_type == DistributedType.NO:
            return cloned_tensor
        if state.distributed_type == DistributedType.XLA:
            # Some processes may have different HLO graphs than other processes, for example
            # in the breakpoint API `accelerator.set_trigger()`. Use `mark_step` to make the
            # HLOs the same on all processes.
            xm.mark_step()
            xm.all_reduce(xm.REDUCE_SUM, [cloned_tensor], scale)
            xm.mark_step()
        elif state.distributed_type.value in TORCH_DISTRIBUTED_OPERATION_TYPES:
            torch.distributed.all_reduce(cloned_tensor, ReduceOp.SUM)
        if reduction == "mean":
            cloned_tensor /= state.num_processes
        return cloned_tensor

    return recursively_apply(
        _reduce_across_processes, tensor, error_on_other_type=True, reduction=reduction, scale=scale
    )
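# Illustrative sketch (not part of the library), assuming a 2-process launch where each rank
# holds the same tensor:
#
#     t = torch.tensor([1.0, 2.0], device=PartialState().device)
#     reduce(t, reduction="sum")    # tensor([2., 4.]) on every process
#     reduce(t, reduction="mean")   # tensor([1., 2.]) — the sum divided by num_processes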
def convert_to_fp32(tensor):
    """
    Recursively converts the elements nested list/tuple/dictionary of tensors in FP16/BF16 precision to FP32.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to convert from FP16/BF16 to FP32.

    Returns:
        The same data structure as `tensor` with all tensors that were in FP16/BF16 precision converted to FP32.
    """

    def _convert_to_fp32(tensor):
        return tensor.float()

    def _is_fp16_bf16_tensor(tensor):
        return (is_torch_tensor(tensor) or hasattr(tensor, "dtype")) and tensor.dtype in (
            torch.float16,
            torch.bfloat16,
        )

    return recursively_apply(_convert_to_fp32, tensor, test_type=_is_fp16_bf16_tensor)


class ConvertOutputsToFp32:
    """
    Decorator to apply to a function outputting tensors (like a model forward pass) that ensures the outputs in FP16
    precision will be converted back to FP32.

    Args:
        model_forward (`Callable`):
            The function which outputs we want to treat.

    Returns:
        The same function as `model_forward` but with converted outputs.
    """

    def __init__(self, model_forward):
        self.model_forward = model_forward
        update_wrapper(self, model_forward)

    def __call__(self, *args, **kwargs):
        return convert_to_fp32(self.model_forward(*args, **kwargs))

    def __getstate__(self):
        raise pickle.PicklingError(
            "Cannot pickle a prepared model with automatic mixed precision, please unwrap the model with "
            "`Accelerator.unwrap_model(model)` before pickling it."
        )


def convert_outputs_to_fp32(model_forward):
    model_forward = ConvertOutputsToFp32(model_forward)

    def forward(*args, **kwargs):
        return model_forward(*args, **kwargs)

    # To act like a decorator so that it can be popped when doing `extract_model_from_parallel`
    forward.__wrapped__ = model_forward

    return forward


def find_device(data):
    """
    Finds the device on which a nested dict/list/tuple of tensors lies (assuming they are all on the same device).

    Args:
        (nested list/tuple/dictionary of `torch.Tensor`): The data we want to know the device of.
    """
    if isinstance(data, Mapping):
        for obj in data.values():
            device = find_device(obj)
            if device is not None:
                return device
    elif isinstance(data, (tuple, list)):
        for obj in data:
            device = find_device(obj)
            if device is not None:
                return device
    elif isinstance(data, torch.Tensor):
        return data.device


@contextmanager
def GatheredParameters(params, modifier_rank=None, fwd_module=None, enabled=True):
    """
    Wrapper around `deepspeed.runtime.zero.GatheredParameters`, but if Zero-3 is not enabled, will be a no-op context
    manager.
    """
    # We need to use the `AcceleratorState` here since it has access to the deepspeed plugin
    if AcceleratorState().distributed_type != DistributedType.DEEPSPEED or (
        AcceleratorState().deepspeed_plugin is not None
        and not AcceleratorState().deepspeed_plugin.is_zero3_init_enabled()
    ):
        gather_param_context = nullcontext()
    else:
        import deepspeed

        gather_param_context = deepspeed.zero.GatheredParameters(
            params, modifier_rank=modifier_rank, fwd_module=fwd_module, enabled=enabled
        )

    with gather_param_context:
        yield
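# Illustrative sketch (not part of the library; `model` is a placeholder): wrapping a forward
# pass so fp16/bf16 outputs come back as fp32, and gathering partitioned weights safely
# whether or not DeepSpeed ZeRO-3 is active:
#
#     model.forward = convert_outputs_to_fp32(model.forward)
#     with GatheredParameters(model.parameters(), modifier_rank=0):
#         ...  # full (un-partitioned) weights are visible here under ZeRO-3; no-op otherwise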