"""
A set of basic tensor ops compatible with tpu, gpu, and multigpu
"""

import pickle
import warnings
from collections.abc import Mapping
from contextlib import contextmanager, nullcontext
from functools import update_wrapper, wraps
from typing import Any

import torch

from ..state import AcceleratorState, PartialState
from .constants import TORCH_DISTRIBUTED_OPERATION_TYPES
from .dataclasses import DistributedType, TensorInformation
from .imports import is_npu_available, is_torch_distributed_available, is_torch_xla_available


if is_torch_xla_available():
    import torch_xla.core.xla_model as xm

if is_torch_distributed_available():
    from torch.distributed import ReduceOp


def is_torch_tensor(tensor):
    return isinstance(tensor, torch.Tensor)


def is_torch_xpu_tensor(tensor):
    return isinstance(
        tensor,
        (
            torch.xpu.FloatTensor,
            torch.xpu.ByteTensor,
            torch.xpu.IntTensor,
            torch.xpu.LongTensor,
            torch.xpu.HalfTensor,
            torch.xpu.DoubleTensor,
            torch.xpu.BFloat16Tensor,
        ),
    )


def is_tensor_information(tensor_info):
    return isinstance(tensor_info, TensorInformation)


def is_namedtuple(data):
    """
    Checks if `data` is a `namedtuple` or not. Can have false positives, but only if a user is trying to mimic a
    `namedtuple` perfectly.
    """
    return isinstance(data, tuple) and hasattr(data, "_asdict") and hasattr(data, "_fields")


def honor_type(obj, generator):
    """
    Cast a generator to the same type as obj (list, tuple, or namedtuple)
    """
    if is_namedtuple(obj):
        return type(obj)(*list(generator))
    else:
        return type(obj)(generator)


def recursively_apply(func, data, *args, test_type=is_torch_tensor, error_on_other_type=False, **kwargs):
    """
    Recursively apply a function on a data structure that is a nested list/tuple/dictionary of a given base type.

    Args:
        func (`callable`):
            The function to recursively apply.
        data (nested list/tuple/dictionary of `main_type`):
            The data on which to apply `func`
        *args:
            Positional arguments that will be passed to `func` when applied on the unpacked data.
        main_type (`type`, *optional*, defaults to `torch.Tensor`):
            The base type of the objects to which apply `func`.
        error_on_other_type (`bool`, *optional*, defaults to `False`):
            Whether to return an error or not if after unpacking `data`, we encounter an object that is not of type
            `main_type`. If `False`, the function will leave objects of types different than `main_type` unchanged.
        **kwargs (additional keyword arguments, *optional*):
            Keyword arguments that will be passed to `func` when applied on the unpacked data.

    Returns:
        The same data structure as `data` with `func` applied to every object of type `main_type`.
    """
    if isinstance(data, (tuple, list)):
        return honor_type(
            data,
            (
                recursively_apply(
                    func, o, *args, test_type=test_type, error_on_other_type=error_on_other_type, **kwargs
                )
                for o in data
            ),
        )
    elif isinstance(data, Mapping):
        return type(data)(
            {
                k: recursively_apply(
                    func, v, *args, test_type=test_type, error_on_other_type=error_on_other_type, **kwargs
                )
                for k, v in data.items()
            }
        )
    elif test_type(data):
        return func(data, *args, **kwargs)
    elif error_on_other_type:
        raise TypeError(
            f"Unsupported types ({type(data)}) passed to `{func.__name__}`. Only nested list/tuple/dicts of "
            f"objects that are valid for `{test_type.__name__}` should be passed."
        )
    return data
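# Illustrative sketch (not part of the library): `recursively_apply` preserves the container
# structure and only transforms the leaves that pass `test_type`, so for example:
#
#     batch = {"x": torch.ones(2, 3), "y": [torch.zeros(2), torch.zeros(2)]}
#     doubled = recursively_apply(lambda t: t * 2, batch)
#     # `doubled` is again a dict holding one tensor and a list of two tensors, all doubled;
#     # non-tensor leaves (strings, ints, ...) would be returned unchanged.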
def send_to_device(tensor, device, non_blocking=False, skip_keys=None):
    """
    Recursively sends the elements in a nested list/tuple/dictionary of tensors to a given device.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to send to a given device.
        device (`torch.device`):
            The device to send the data to.

    Returns:
        The same data structure as `tensor` with all tensors sent to the proper device.
    """
    if is_torch_tensor(tensor) or hasattr(tensor, "to"):
        # `torch.Tensor.to("npu")` cannot find its context when called for the first time,
        # so default to the first NPU device explicitly.
        if device == "npu":
            device = "npu:0"
        try:
            return tensor.to(device, non_blocking=non_blocking)
        except TypeError:  # .to() doesn't accept non_blocking as kwarg
            return tensor.to(device)
        except AssertionError as error:
            # `torch.Tensor.to(<int num>)` is not supported by `torch_npu`, so convert an
            # integer device index into an explicit `npu:{index}` string and retry.
            if is_npu_available():
                if isinstance(device, int):
                    device = f"npu:{device}"
            else:
                raise error
        try:
            return tensor.to(device, non_blocking=non_blocking)
        except TypeError:  # .to() doesn't accept non_blocking as kwarg
            return tensor.to(device)
    elif isinstance(tensor, (tuple, list)):
        return honor_type(
            tensor, (send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys) for t in tensor)
        )
    elif isinstance(tensor, Mapping):
        if isinstance(skip_keys, str):
            skip_keys = [skip_keys]
        elif skip_keys is None:
            skip_keys = []
        return type(tensor)(
            {
                k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
                for k, t in tensor.items()
            }
        )
    else:
        return tensor


def get_data_structure(data):
    """
    Recursively gathers the information needed to rebuild a nested list/tuple/dictionary of tensors.

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`):
            The data to analyze.

    Returns:
        The same data structure as `data` with [`~utils.TensorInformation`] instead of tensors.
    """

    def _get_data_structure(tensor):
        return TensorInformation(shape=tensor.shape, dtype=tensor.dtype)

    return recursively_apply(_get_data_structure, data)


def get_shape(data):
    """
    Recursively gathers the shape of a nested list/tuple/dictionary of tensors as a list.

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`):
            The data to analyze.

    Returns:
        The same data structure as `data` with lists of tensor shapes instead of tensors.
    """

    def _get_shape(tensor):
        return list(tensor.shape)

    return recursively_apply(_get_shape, data)


def initialize_tensors(data_structure):
    """
    Recursively initializes tensors from a nested list/tuple/dictionary of [`~utils.TensorInformation`].

    Returns:
        The same data structure as `data` with tensors instead of [`~utils.TensorInformation`].
    """

    def _initialize_tensor(tensor_info):
        return torch.empty(*tensor_info.shape, dtype=tensor_info.dtype)

    return recursively_apply(_initialize_tensor, data_structure, test_type=is_tensor_information)


def find_batch_size(data):
    """
    Recursively finds the batch size in a nested list/tuple/dictionary of lists of tensors.

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`): The data from which to find the batch size.

    Returns:
        `int`: The batch size.
    """
    if isinstance(data, (tuple, list, Mapping)) and (len(data) == 0):
        raise ValueError(f"Cannot find the batch size from empty `{type(data)}`.")

    if isinstance(data, (tuple, list)):
        return find_batch_size(data[0])
    elif isinstance(data, Mapping):
        for k in data.keys():
            return find_batch_size(data[k])
    elif not isinstance(data, torch.Tensor):
        raise TypeError(f"Can only find the batch size of tensors but got {type(data)}.")
    return data.shape[0]
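# Illustrative sketch (not part of the library; the key names below are made up):
# `send_to_device` moves every tensor leaf, optionally leaving some dict keys alone, while
# `get_data_structure`/`initialize_tensors` round-trip the metadata of a batch without its data:
#
#     batch = {"input_ids": torch.ones(8, 128, dtype=torch.long), "cache": torch.zeros(2)}
#     batch = send_to_device(batch, "cuda:0", skip_keys="cache")   # "cache" stays on CPU
#     skeleton = initialize_tensors(get_data_structure(batch))     # same shapes/dtypes, empty data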
def ignorant_find_batch_size(data):
    """
    Same as [`utils.operations.find_batch_size`] except will ignore if `ValueError` and `TypeErrors` are raised

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`): The data from which to find the batch size.

    Returns:
        `int`: The batch size.
    """
    try:
        return find_batch_size(data)
    except (ValueError, TypeError):
        pass
    return None


def listify(data):
    """
    Recursively finds tensors in a nested list/tuple/dictionary and converts them to a list of numbers.

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`): The data from which to convert to regular numbers.

    Returns:
        The same data structure as `data` with lists of numbers instead of `torch.Tensor`.
    """

    def _convert_to_list(tensor):
        tensor = tensor.detach().cpu()
        if tensor.dtype == torch.bfloat16:
            # NumPy does not support bfloat16, so convert to float32 first.
            tensor = tensor.to(torch.float32)
        return tensor.tolist()

    return recursively_apply(_convert_to_list, data)


def _tpu_gather(tensor):
    def _tpu_gather_one(tensor):
        if tensor.ndim == 0:
            tensor = tensor.clone()[None]

        # Can only gather contiguous tensors
        if not tensor.is_contiguous():
            tensor = tensor.contiguous()
        return xm.all_gather(tensor)

    res = recursively_apply(_tpu_gather_one, tensor, error_on_other_type=True)
    xm.mark_step()
    return res


def _gpu_gather(tensor):
    state = PartialState()
    gather_op = torch.distributed.all_gather_into_tensor

    def _gpu_gather_one(tensor):
        if tensor.ndim == 0:
            tensor = tensor.clone()[None]

        # Can only gather contiguous tensors
        if not tensor.is_contiguous():
            tensor = tensor.contiguous()

        if state.backend is not None and state.backend != "gloo":
            # `all_gather_into_tensor` fills a flat pre-allocated buffer, so we rely on the
            # number of items in the tensor rather than its direct shape.
            output_tensors = torch.empty(
                state.num_processes * tensor.numel(),
                dtype=tensor.dtype,
                device=state.device,
            )
            gather_op(output_tensors, tensor)
            return output_tensors.view(-1, *tensor.size()[1:])
        else:
            # A backend of `None` is always CPU, and gloo does not support
            # `all_gather_into_tensor`; fall back to `all_gather` at the cost of a larger
            # memory overhead for the op.
            output_tensors = [torch.empty_like(tensor) for _ in range(state.num_processes)]
            torch.distributed.all_gather(output_tensors, tensor)
            return torch.cat(output_tensors, dim=0)

    return recursively_apply(_gpu_gather_one, tensor, error_on_other_type=True)


class DistributedOperationException(Exception):
    """
    An exception class for distributed operations. Raised if the operation cannot be performed due to the shape of the
    tensors.
    """


def verify_operation(function):
    """
    Verifies that `tensor` is the same shape across all processes. Only ran if `PartialState().debug` is `True`.
    """

    @wraps(function)
    def wrapper(*args, **kwargs):
        if PartialState().distributed_type == DistributedType.NO or not PartialState().debug:
            return function(*args, **kwargs)
        operation = f"{function.__module__}.{function.__name__}"
        if "tensor" in kwargs:
            tensor = kwargs["tensor"]
        else:
            tensor = args[0]
        if PartialState().device.type != find_device(tensor).type:
            raise DistributedOperationException(
                f"One or more of the tensors passed to {operation} were not on the {find_device(tensor).type} "
                f"while the `Accelerator` is configured for {PartialState().device.type}. "
                f"Please move it to the {PartialState().device.type} before calling {operation}."
            )
        shapes = get_shape(tensor)
        output = gather_object([shapes])
        if output[0] is not None:
            are_same = output.count(output[0]) == len(output)
            if not are_same:
                process_shape_str = "\n  - ".join([f"Process {i}: {shape}" for i, shape in enumerate(output)])
                raise DistributedOperationException(
                    "Cannot apply desired operation due to shape mismatches. "
                    "All shapes across devices must be valid.\n\n"
                    f"Operation: `{operation}`\n"
                    f"Input shapes:\n  - {process_shape_str}"
                )
        return function(*args, **kwargs)

    return wrapper
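# Illustrative sketch (not part of the library): on `n` processes, gathering a tensor of
# shape `(batch, *rest)` produces shape `(n * batch, *rest)`, with rank 0's slice first:
#
#     # on each rank: t = torch.full((2, 4), fill_value=float(rank), device=state.device)
#     # _gpu_gather(t).shape == (2 * state.num_processes, 4)
#     # rows 0-1 come from rank 0, rows 2-3 from rank 1, and so on.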
def chained_operation(function):
    """
    Checks that `verify_operation` failed and if so reports a more helpful error chaining the existing
    `DistributedOperationException`.
    """

    @wraps(function)
    def wrapper(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except DistributedOperationException as e:
            operation = f"{function.__module__}.{function.__name__}"
            raise DistributedOperationException(
                f"Error found while calling `{operation}`. Please see the earlier error for more details."
            ) from e

    return wrapper


@verify_operation
def gather(tensor):
    """
    Recursively gather tensor in a nested list/tuple/dictionary of tensors from all devices.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to gather.

    Returns:
        The same data structure as `tensor` with all tensors sent to the proper device.
    """
    if PartialState().distributed_type == DistributedType.XLA:
        return _tpu_gather(tensor)
    elif PartialState().distributed_type in TORCH_DISTRIBUTED_OPERATION_TYPES:
        return _gpu_gather(tensor)
    else:
        return tensor


def _gpu_gather_object(object: Any):
    output_objects = [None for _ in range(PartialState().num_processes)]
    torch.distributed.all_gather_object(output_objects, object)
    # `all_gather_object` returns a list of lists, so flatten it into a single list.
    return [x for y in output_objects for x in y]


def gather_object(object: Any):
    """
    Recursively gather object in a nested list/tuple/dictionary of objects from all devices.

    Args:
        object (nested list/tuple/dictionary of picklable object):
            The data to gather.

    Returns:
        The same data structure as `object` with all the objects sent to every device.
    """
    if PartialState().distributed_type == DistributedType.XLA:
        raise NotImplementedError("gather objects in TPU is not supported")
    elif PartialState().distributed_type in TORCH_DISTRIBUTED_OPERATION_TYPES:
        return _gpu_gather_object(object)
    else:
        return object
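# Illustrative sketch (not part of the library), assuming a 2-process launch:
#
#     state = PartialState()
#     t = torch.tensor([state.process_index], device=state.device)
#     gather(t)                          # tensor([0, 1]) on every process
#     gather_object([state.process_index])  # [0, 1] on every process (raises on XLA)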
def _gpu_broadcast(data, src=0):
    def _gpu_broadcast_one(tensor, src=0):
        torch.distributed.broadcast(tensor, src=src)
        return tensor

    return recursively_apply(_gpu_broadcast_one, data, error_on_other_type=True, src=src)


def _tpu_broadcast(tensor, src=0, name="broadcast tensor"):
    if isinstance(tensor, (list, tuple)):
        return honor_type(tensor, (_tpu_broadcast(t, name=f"{name}_{i}") for i, t in enumerate(tensor)))
    elif isinstance(tensor, Mapping):
        return type(tensor)({k: _tpu_broadcast(v, name=f"{name}_{k}") for k, v in tensor.items()})
    return xm.mesh_reduce(name, tensor, lambda x: x[src])


TENSOR_TYPE_TO_INT = {
    torch.float: 1,
    torch.double: 2,
    torch.half: 3,
    torch.bfloat16: 4,
    torch.uint8: 5,
    torch.int8: 6,
    torch.int16: 7,
    torch.int32: 8,
    torch.int64: 9,
    torch.bool: 10,
}

TENSOR_INT_TO_DTYPE = {v: k for k, v in TENSOR_TYPE_TO_INT.items()}


def gather_tensor_shape(tensor):
    """
    Grabs the shape of `tensor` only available on one process and returns a tensor of its shape
    """
    # Allocate plenty of space for any potential number of dimensions
    max_tensor_dimension = 2**20
    state = PartialState()
    base_tensor = torch.empty(max_tensor_dimension, dtype=torch.int, device=state.device)

    # Since PyTorch can't just send a tensor to another GPU without
    # knowing its size, we store the size of the tensor with data
    # in an allocation
    if tensor is not None:
        shape = tensor.shape
        tensor_dtype = TENSOR_TYPE_TO_INT[tensor.dtype]
        base_tensor[: len(shape) + 1] = torch.tensor(list(shape) + [tensor_dtype], dtype=int)
    # Perform a reduction to copy the size data onto all GPUs
    base_tensor = reduce(base_tensor, reduction="sum")
    base_tensor = base_tensor[base_tensor.nonzero()]
    # The last non-zero data contains the coded dtype of the source tensor
    dtype = int(base_tensor[-1:][0])
    base_tensor = base_tensor[:-1]
    return base_tensor, dtype


def copy_tensor_to_devices(tensor=None) -> torch.Tensor:
    """
    Copies a tensor that only exists on a single device and broadcasts it to other devices. Differs from `broadcast`
    as each worker doesn't need to know its shape when used (and tensor can be `None`)

    Args:
        tensor (`torch.tensor`):
            The tensor that should be sent to all devices. Must only have it be defined on a single device, the rest
            should be `None`.
    """
    state = PartialState()
    shape, dtype = gather_tensor_shape(tensor)
    if tensor is None:
        tensor = torch.zeros(shape, dtype=TENSOR_INT_TO_DTYPE[dtype]).to(state.device)
    return reduce(tensor, reduction="sum")


@verify_operation
def broadcast(tensor, from_process: int = 0):
    """
    Recursively broadcast tensor in a nested list/tuple/dictionary of tensors to all devices.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to gather.
        from_process (`int`, *optional*, defaults to 0):
            The process from which to send the data

    Returns:
        The same data structure as `tensor` with all tensors broadcasted to the proper device.
    """
    if PartialState().distributed_type == DistributedType.XLA:
        return _tpu_broadcast(tensor, src=from_process, name="accelerate.utils.broadcast")
    elif PartialState().distributed_type in TORCH_DISTRIBUTED_OPERATION_TYPES:
        return _gpu_broadcast(tensor, src=from_process)
    else:
        return tensor


def broadcast_object_list(object_list, from_process: int = 0):
    """
    Broadcast a list of picklable objects from one process to the others.

    Args:
        object_list (list of picklable objects):
            The list of objects to broadcast. This list will be modified inplace.
        from_process (`int`, *optional*, defaults to 0):
            The process from which to send the data.

    Returns:
        The same list containing the objects from process 0.
    """
    if PartialState().distributed_type == DistributedType.XLA:
        for i, obj in enumerate(object_list):
            object_list[i] = xm.mesh_reduce(
                "accelerate.utils.broadcast_object_list", obj, lambda x: x[from_process]
            )
    elif PartialState().distributed_type in TORCH_DISTRIBUTED_OPERATION_TYPES:
        torch.distributed.broadcast_object_list(object_list, src=from_process)
    return object_list


def slice_tensors(data, tensor_slice, process_index=None, num_processes=None):
    """
    Recursively takes a slice in a nested list/tuple/dictionary of tensors.

    Args:
        data (nested list/tuple/dictionary of `torch.Tensor`):
            The data to slice.
        tensor_slice (`slice`):
            The slice to take.

    Returns:
        The same data structure as `data` with all the tensors slices.
    """

    def _slice_tensor(tensor, tensor_slice):
        return tensor[tensor_slice]

    return recursively_apply(_slice_tensor, data, tensor_slice)
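# Illustrative sketch (not part of the library): `broadcast` overwrites every process's copy
# with the values held by `from_process`, and `broadcast_object_list` does the same in place
# for arbitrary picklables:
#
#     state = PartialState()
#     t = torch.arange(4, device=state.device) * (state.process_index + 1)
#     t = broadcast(t)                        # every rank now holds rank 0's tensor
#     payload = ["ready"] if state.is_main_process else [None]
#     broadcast_object_list(payload)          # payload == ["ready"] everywhere afterwards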
def concatenate(data, dim=0):
    """
    Recursively concatenate the tensors in a nested list/tuple/dictionary of lists of tensors with the same shape.

    Args:
        data (nested list/tuple/dictionary of lists of tensors `torch.Tensor`):
            The data to concatenate.
        dim (`int`, *optional*, defaults to 0):
            The dimension on which to concatenate.

    Returns:
        The same data structure as `data` with all the tensors concatenated.
    """
    if isinstance(data[0], (tuple, list)):
        return honor_type(data[0], (concatenate([d[i] for d in data], dim=dim) for i in range(len(data[0]))))
    elif isinstance(data[0], Mapping):
        return type(data[0])({k: concatenate([d[k] for d in data], dim=dim) for k in data[0].keys()})
    elif not isinstance(data[0], torch.Tensor):
        raise TypeError(f"Can only concatenate tensors but got {type(data[0])}")
    return torch.cat(data, dim=dim)


class CannotPadNestedTensorWarning(UserWarning):
    pass


@chained_operation
def pad_across_processes(tensor, dim=0, pad_index=0, pad_first=False):
    """
    Recursively pad the tensors in a nested list/tuple/dictionary of tensors from all devices to the same size so
    they can safely be gathered.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to gather.
        dim (`int`, *optional*, defaults to 0):
            The dimension on which to pad.
        pad_index (`int`, *optional*, defaults to 0):
            The value with which to pad.
        pad_first (`bool`, *optional*, defaults to `False`):
            Whether to pad at the beginning or the end.
    """

    def _pad_across_processes(tensor, dim=0, pad_index=0, pad_first=False):
        if getattr(tensor, "is_nested", False):
            warnings.warn(
                "Cannot pad nested tensors without more information. Leaving unprocessed.",
                CannotPadNestedTensorWarning,
            )
            return tensor
        if dim >= len(tensor.shape) or dim < -len(tensor.shape):
            return tensor

        # Convert negative dimensions to non-negative
        if dim < 0:
            dim += len(tensor.shape)

        # Gather all sizes
        size = torch.tensor(tensor.shape, device=tensor.device)[None]
        sizes = gather(size).cpu()
        # Then pad to the maximum size
        max_size = max(s[dim] for s in sizes)
        if max_size == tensor.shape[dim]:
            return tensor

        old_size = tensor.shape
        new_size = list(old_size)
        new_size[dim] = max_size
        new_tensor = tensor.new_zeros(tuple(new_size)) + pad_index
        if pad_first:
            indices = tuple(
                slice(max_size - old_size[dim], max_size) if i == dim else slice(None)
                for i in range(len(new_size))
            )
        else:
            indices = tuple(slice(0, old_size[dim]) if i == dim else slice(None) for i in range(len(new_size)))
        new_tensor[indices] = tensor
        return new_tensor

    return recursively_apply(
        _pad_across_processes, tensor, error_on_other_type=True, dim=dim, pad_index=pad_index, pad_first=pad_first
    )
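# Illustrative sketch (not part of the library): with 2 processes holding tensors of shape
# (3, 5) and (7, 5), `pad_across_processes(t, dim=0)` returns shape (7, 5) on both ranks, so
# a following `gather` is safe; `pad_first=True` places the padding before the real rows
# instead of after them.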
def pad_input_tensors(tensor, batch_size, num_processes, dim=0):
    """
    Takes a `tensor` of arbitrary size and pads it so that it can work given `num_processes` needed dimensions.

    New tensors are just the last input repeated.

    E.g.:
        Tensor: ([3,4,4])
        Num processes: 4
        Expected result shape: ([4,4,4])
    """

    def _pad_input_tensors(tensor, batch_size, num_processes, dim=0):
        remainder = batch_size // num_processes
        last_inputs = batch_size - (remainder * num_processes)
        if batch_size // num_processes == 0:
            to_pad = num_processes - batch_size
        else:
            to_pad = num_processes - (batch_size // num_processes)
        # In the rare case that `to_pad` is negative,
        # we need to pad the last inputs - the found `to_pad`
        if last_inputs > to_pad and to_pad < 1:
            to_pad = last_inputs - to_pad
        old_size = tensor.shape
        new_size = list(old_size)
        new_size[0] = batch_size + to_pad
        new_tensor = tensor.new_zeros(tuple(new_size))
        indices = tuple(slice(0, old_size[dim]) if i == dim else slice(None) for i in range(len(new_size)))
        new_tensor[indices] = tensor
        return new_tensor

    return recursively_apply(
        _pad_input_tensors,
        tensor,
        error_on_other_type=True,
        batch_size=batch_size,
        num_processes=num_processes,
        dim=dim,
    )


@verify_operation
def reduce(tensor, reduction="mean", scale=1.0):
    """
    Recursively reduce the tensors in a nested list/tuple/dictionary of lists of tensors across all processes by the
    mean of a given operation.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to reduce.
        reduction (`str`, *optional*, defaults to `"mean"`):
            A reduction method. Can be of "mean", "sum", or "none"
        scale (`float`, *optional*):
            A default scaling value to be applied after the reduce, only valid on XLA.

    Returns:
        The same data structure as `data` with all the tensors reduced.
    """

    def _reduce_across_processes(tensor, reduction="mean", scale=1.0):
        state = PartialState()
        cloned_tensor = tensor.clone()
        if state.distributed_type == DistributedType.NO:
            return cloned_tensor
        if state.distributed_type == DistributedType.XLA:
            # Some processes may have different HLO graphs than other processes, for example
            # in the breakpoint API `accelerator.set_trigger()`. Use `mark_step` to make the
            # HLOs the same on all processes.
            xm.mark_step()
            xm.all_reduce(xm.REDUCE_SUM, [cloned_tensor], scale)
            xm.mark_step()
        elif state.distributed_type.value in TORCH_DISTRIBUTED_OPERATION_TYPES:
            torch.distributed.all_reduce(cloned_tensor, ReduceOp.SUM)
        if reduction == "mean":
            cloned_tensor /= state.num_processes
        return cloned_tensor

    return recursively_apply(
        _reduce_across_processes, tensor, error_on_other_type=True, reduction=reduction, scale=scale
    )
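# Illustrative sketch (not part of the library), assuming a 2-process launch where each rank
# holds the same tensor:
#
#     t = torch.tensor([1.0, 2.0], device=PartialState().device)
#     reduce(t, reduction="sum")    # tensor([2., 4.]) on every process
#     reduce(t, reduction="mean")   # tensor([1., 2.]) — the sum divided by num_processes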
def convert_to_fp32(tensor):
    """
    Recursively converts the elements nested list/tuple/dictionary of tensors in FP16/BF16 precision to FP32.

    Args:
        tensor (nested list/tuple/dictionary of `torch.Tensor`):
            The data to convert from FP16/BF16 to FP32.

    Returns:
        The same data structure as `tensor` with all tensors that were in FP16/BF16 precision converted to FP32.
    """

    def _convert_to_fp32(tensor):
        return tensor.float()

    def _is_fp16_bf16_tensor(tensor):
        return (is_torch_tensor(tensor) or hasattr(tensor, "dtype")) and tensor.dtype in (
            torch.float16,
            torch.bfloat16,
        )

    return recursively_apply(_convert_to_fp32, tensor, test_type=_is_fp16_bf16_tensor)


class ConvertOutputsToFp32:
    """
    Decorator to apply to a function outputting tensors (like a model forward pass) that ensures the outputs in FP16
    precision will be converted back to FP32.

    Args:
        model_forward (`Callable`):
            The function which outputs we want to treat.

    Returns:
        The same function as `model_forward` but with converted outputs.
    """

    def __init__(self, model_forward):
        self.model_forward = model_forward
        update_wrapper(self, model_forward)

    def __call__(self, *args, **kwargs):
        return convert_to_fp32(self.model_forward(*args, **kwargs))

    def __getstate__(self):
        raise pickle.PicklingError(
            "Cannot pickle a prepared model with automatic mixed precision, please unwrap the model with "
            "`Accelerator.unwrap_model(model)` before pickling it."
        )


def convert_outputs_to_fp32(model_forward):
    model_forward = ConvertOutputsToFp32(model_forward)

    def forward(*args, **kwargs):
        return model_forward(*args, **kwargs)

    # To act like a decorator so that it can be popped when doing `extract_model_from_parallel`
    forward.__wrapped__ = model_forward

    return forward


def find_device(data):
    """
    Finds the device on which a nested dict/list/tuple of tensors lies (assuming they are all on the same device).

    Args:
        (nested list/tuple/dictionary of `torch.Tensor`): The data we want to know the device of.
    """
    if isinstance(data, Mapping):
        for obj in data.values():
            device = find_device(obj)
            if device is not None:
                return device
    elif isinstance(data, (tuple, list)):
        for obj in data:
            device = find_device(obj)
            if device is not None:
                return device
    elif isinstance(data, torch.Tensor):
        return data.device


@contextmanager
def GatheredParameters(params, modifier_rank=None, fwd_module=None, enabled=True):
    """
    Wrapper around `deepspeed.runtime.zero.GatheredParameters`, but if Zero-3 is not enabled, will be a no-op context
    manager.
    """
    # We need to use the `AcceleratorState` here since it has access to the deepspeed plugin
    if AcceleratorState().distributed_type != DistributedType.DEEPSPEED or (
        AcceleratorState().deepspeed_plugin is not None
        and not AcceleratorState().deepspeed_plugin.is_zero3_init_enabled()
    ):
        gather_param_context = nullcontext()
    else:
        import deepspeed

        gather_param_context = deepspeed.zero.GatheredParameters(
            params, modifier_rank=modifier_rank, fwd_module=fwd_module, enabled=enabled
        )

    with gather_param_context:
        yield
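# Illustrative sketch (not part of the library; `model` is a placeholder): wrapping a forward
# pass so fp16/bf16 outputs come back as fp32, and gathering partitioned weights safely
# whether or not DeepSpeed ZeRO-3 is active:
#
#     model.forward = convert_outputs_to_fp32(model.forward)
#     with GatheredParameters(model.parameters(), modifier_rank=0):
#         ...  # full (un-partitioned) weights are visible here under ZeRO-3; no-op otherwise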