import inspect

import torch

from .state import AcceleratorState, GradientState
from .utils import DistributedType, honor_type, is_lomo_available, is_torch_xla_available


if is_torch_xla_available():
    import torch_xla.core.xla_model as xm


def move_to_device(state, device):
    if isinstance(state, (list, tuple)):
        return honor_type(state, (move_to_device(t, device) for t in state))
    elif isinstance(state, dict):
        return type(state)({k: move_to_device(v, device) for k, v in state.items()})
    elif isinstance(state, torch.Tensor):
        return state.to(device)
    return state


class AcceleratedOptimizer(torch.optim.Optimizer):
    """
    Internal wrapper around a torch optimizer.

    Conditionally will perform `step` and `zero_grad` if gradients should be synchronized when performing gradient
    accumulation.

    Args:
        optimizer (`torch.optim.optimizer.Optimizer`):
            The optimizer to wrap.
        device_placement (`bool`, *optional*, defaults to `True`):
            Whether or not the optimizer should handle device placement. If so, it will place the state dictionary of
            `optimizer` on the right device.
        scaler (`torch.cuda.amp.grad_scaler.GradScaler`, *optional*):
            The scaler to use in the step function if training with mixed precision.
    """

    def __init__(self, optimizer, device_placement=True, scaler=None):
        self.optimizer = optimizer
        self.scaler = scaler
        self.accelerator_state = AcceleratorState()
        self.gradient_state = GradientState()
        self.device_placement = device_placement
        self._is_overflow = False

        if self.scaler is not None:
            self._accelerate_step_called = False
            self._optimizer_original_step_method = self.optimizer.step
            self._optimizer_patched_step_method = patch_optimizer_step(self, self.optimizer.step)

        # Handle device placement of the optimizer state
        if device_placement:
            state_dict = self.optimizer.state_dict()
            if self.accelerator_state.distributed_type == DistributedType.XLA:
                xm.send_cpu_data_to_device(state_dict, self.accelerator_state.device)
            else:
                state_dict = move_to_device(state_dict, self.accelerator_state.device)
            self.optimizer.load_state_dict(state_dict)

    @property
    def state(self):
        return self.optimizer.state

    @state.setter
    def state(self, state):
        self.optimizer.state = state

    @property
    def param_groups(self):
        return self.optimizer.param_groups

    @param_groups.setter
    def param_groups(self, param_groups):
        self.optimizer.param_groups = param_groups

    @property
    def defaults(self):
        return self.optimizer.defaults

    @defaults.setter
    def defaults(self, defaults):
        self.optimizer.defaults = defaults

    def add_param_group(self, param_group):
        self.optimizer.add_param_group(param_group)

    def load_state_dict(self, state_dict):
        if self.accelerator_state.distributed_type == DistributedType.XLA and self.device_placement:
            xm.send_cpu_data_to_device(state_dict, self.accelerator_state.device)
        self.optimizer.load_state_dict(state_dict)

    def state_dict(self):
        return self.optimizer.state_dict()

    def zero_grad(self, set_to_none=None):
        if self.gradient_state.sync_gradients:
            accept_arg = "set_to_none" in inspect.signature(self.optimizer.zero_grad).parameters
            if accept_arg:
                if set_to_none is None:
                    set_to_none = True
                self.optimizer.zero_grad(set_to_none=set_to_none)
            else:
                if set_to_none is not None:
                    raise ValueError("`set_to_none` for Optimizer.zero_grad` is not supported by this optimizer.")
                self.optimizer.zero_grad()

    def train(self):
        """
        Sets the optimizer to "train" mode. Useful for optimizers like `schedule_free`
        """
        if hasattr(self.optimizer, "train") and callable(self.optimizer.train):
            self.optimizer.train()
        elif (
            hasattr(self.optimizer, "optimizer")
            and hasattr(self.optimizer.optimizer, "train")
            and callable(self.optimizer.optimizer.train)
        ):
            # the wrapped optimizer may itself wrap another optimizer that exposes `train`
            self.optimizer.optimizer.train()

    def eval(self):
        """
        Sets the optimizer to "eval" mode. Useful for optimizers like `schedule_free`
        """
        if hasattr(self.optimizer, "eval") and callable(self.optimizer.eval):
            self.optimizer.eval()

    def step(self, closure=None):
        if is_lomo_available():
            from lomo_optim import AdaLomo, Lomo

        if (
            not self.gradient_state.is_xla_gradients_synced
            and self.accelerator_state.distributed_type == DistributedType.XLA
        ):
            gradients = xm._fetch_gradients(self.optimizer)
            xm.all_reduce("sum", gradients, scale=1.0 / xm.xrt_world_size())
            self.gradient_state.is_xla_gradients_synced = True

        if is_lomo_available():
            # LOMO optimizers update parameters during the backward pass, so `step` is a no-op
            if isinstance(self.optimizer, (Lomo, AdaLomo)):
                return

        if self.gradient_state.sync_gradients:
            if self.scaler is not None:
                self.optimizer.step = self._optimizer_patched_step_method

                self.scaler.step(self.optimizer, closure)
                self.scaler.update()

                if not self._accelerate_step_called:
                    # If the patched step was never reached, the scaler skipped it: gradient overflow was detected.
                    self._is_overflow = True
                else:
                    self._is_overflow = False
                # Reset the step method to the original one
                self.optimizer.step = self._optimizer_original_step_method
                # Reset the indicator
                self._accelerate_step_called = False
            else:
                self.optimizer.step(closure)
        if self.accelerator_state.distributed_type == DistributedType.XLA:
            self.gradient_state.is_xla_gradients_synced = False

    def _switch_parameters(self, parameters_map):
        for param_group in self.optimizer.param_groups:
            param_group["params"] = [parameters_map.get(p, p) for p in param_group["params"]]

    @property
    def step_was_skipped(self) -> bool:
        """Whether or not the optimizer step was skipped."""
        return self._is_overflow

    def __getstate__(self):
        _ignored_keys = [
            "_accelerate_step_called",
            "_optimizer_original_step_method",
            "_optimizer_patched_step_method",
        ]
        return {k: v for k, v in self.__dict__.items() if k not in _ignored_keys}

    def __setstate__(self, state):
        self.__dict__.update(state)
        if self.scaler is not None:
            self._accelerate_step_called = False
            self._optimizer_original_step_method = self.optimizer.step
            self._optimizer_patched_step_method = patch_optimizer_step(self, self.optimizer.step)


def patch_optimizer_step(accelerated_optimizer: AcceleratedOptimizer, method):
    def patched_step(*args, **kwargs):
        accelerated_optimizer._accelerate_step_called = True
        return method(*args, **kwargs)

    return patched_step
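# ---------------------------------------------------------------------------
# Illustrative usage sketch (not part of this module): `AcceleratedOptimizer`
# is normally created for the user by `Accelerator.prepare` rather than
# instantiated directly. `model`, `optimizer` and `dataloader` below are
# placeholders for the user's own objects.
#
#     from accelerate import Accelerator
#
#     accelerator = Accelerator(gradient_accumulation_steps=2)
#     model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
#     for batch in dataloader:
#         with accelerator.accumulate(model):
#             loss = model(**batch).loss
#             accelerator.backward(loss)
#             optimizer.step()       # gated on `sync_gradients`, see `step` above
#             optimizer.zero_grad()  # likewise only runs on sync steps
# ---------------------------------------------------------------------------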