import math
from collections import OrderedDict

import torch
from packaging import version
from torch import Tensor, nn

from .utils import logging


logger = logging.get_logger(__name__)


class PytorchGELUTanh(nn.Module):
    """
    A fast C implementation of the tanh approximation of the GeLU activation function. See
    https://arxiv.org/abs/1606.08415.

    This implementation is equivalent to NewGELU and FastGELU but much faster. However, it is not an exact numerical
    match due to rounding errors.
    """

    def __init__(self):
        super().__init__()
        if version.parse(torch.__version__) < version.parse("1.12.0"):
            raise ImportError(
                f"You are using torch=={torch.__version__}, but torch>=1.12.0 is required to use "
                "PytorchGELUTanh. Please upgrade torch."
            )

    def forward(self, input: Tensor) -> Tensor:
        return nn.functional.gelu(input, approximate="tanh")


class NewGELUActivation(nn.Module):
    """
    Implementation of the GELU activation function currently in Google BERT repo (identical to OpenAI GPT). Also see
    the Gaussian Error Linear Units paper: https://arxiv.org/abs/1606.08415
    """

    def forward(self, input: Tensor) -> Tensor:
        return 0.5 * input * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (input + 0.044715 * torch.pow(input, 3.0))))


class GELUActivation(nn.Module):
    """
    Original Implementation of the GELU activation function in Google BERT repo when initially created. For
    information: OpenAI GPT's GELU is slightly different (and gives slightly different results): 0.5 * x * (1 +
    torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))) This is now written in C in nn.functional
    Also see the Gaussian Error Linear Units paper: https://arxiv.org/abs/1606.08415
    """

    def __init__(self, use_gelu_python: bool = False):
        super().__init__()
        if use_gelu_python:
            self.act = self._gelu_python
        else:
            self.act = nn.functional.gelu

    def _gelu_python(self, input: Tensor) -> Tensor:
        return input * 0.5 * (1.0 + torch.erf(input / math.sqrt(2.0)))

    def forward(self, input: Tensor) -> Tensor:
        return self.act(input)


class FastGELUActivation(nn.Module):
    """
    Applies GELU approximation that is slower than QuickGELU but more accurate. See: https://github.com/hendrycks/GELUs
    """

    def forward(self, input: Tensor) -> Tensor:
        return 0.5 * input * (1.0 + torch.tanh(input * 0.7978845608 * (1.0 + 0.044715 * input * input)))


class QuickGELUActivation(nn.Module):
    """
    Applies GELU approximation that is fast but somewhat inaccurate. See: https://github.com/hendrycks/GELUs
    """

    def forward(self, input: Tensor) -> Tensor:
        return input * torch.sigmoid(1.702 * input)


class ClippedGELUActivation(nn.Module):
    """
    Clip the range of possible GeLU outputs between [min, max]. This is especially useful for quantization purpose, as
    it allows mapping negative values in the GeLU spectrum. For more information on this trick, please refer to
    https://arxiv.org/abs/2004.09602.

    Gaussian Error Linear Unit. Original Implementation of the gelu activation function in Google Bert repo when
    initially created.

    For information: OpenAI GPT's gelu is slightly different (and gives slightly different results): 0.5 * x * (1 +
    torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))). See https://arxiv.org/abs/1606.08415
    """

    def __init__(self, min: float, max: float):
        if min > max:
            raise ValueError(f"min should be < max (got min: {min}, max: {max})")

        super().__init__()
        self.min = min
        self.max = max

    def forward(self, x: Tensor) -> Tensor:
        return torch.clip(gelu(x), self.min, self.max)


class AccurateGELUActivation(nn.Module):
    """
    Applies GELU approximation that is faster than default and more accurate than QuickGELU. See:
    https://github.com/hendrycks/GELUs

    Implemented along with MEGA (Moving Average Equipped Gated Attention)
    """

    def __init__(self):
        super().__init__()
        self.precomputed_constant = math.sqrt(2 / math.pi)

    def forward(self, input: Tensor) -> Tensor:
        return 0.5 * input * (1 + torch.tanh(self.precomputed_constant * (input + 0.044715 * torch.pow(input, 3))))


class MishActivation(nn.Module):
    """
    See Mish: A Self-Regularized Non-Monotonic Activation Function (Misra., https://arxiv.org/abs/1908.08681). Also
    visit the official repository for the paper: https://github.com/digantamisra98/Mish
    """

    def __init__(self):
        super().__init__()
        if version.parse(torch.__version__) < version.parse("1.9.0"):
            self.act = self._mish_python
        else:
            self.act = nn.functional.mish

    def _mish_python(self, input: Tensor) -> Tensor:
        return input * torch.tanh(nn.functional.softplus(input))

    def forward(self, input: Tensor) -> Tensor:
        return self.act(input)


class LinearActivation(nn.Module):
    """
    Applies the linear activation function, i.e. forwarding input directly to output.
    """

    def forward(self, input: Tensor) -> Tensor:
        return input


class LaplaceActivation(nn.Module):
    """
    Applies elementwise activation based on Laplace function, introduced in MEGA as an attention activation. See
    https://arxiv.org/abs/2209.10655

    Inspired by squared relu, but with bounded range and gradient for better stability
    """

    def forward(self, input, mu=0.707107, sigma=0.282095):
        input = (input - mu).div(sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + torch.erf(input))


class ReLUSquaredActivation(nn.Module):
    """
    Applies the relu^2 activation introduced in https://arxiv.org/abs/2109.08668v2
    """

    def forward(self, input):
        relu_applied = nn.functional.relu(input)
        squared = torch.square(relu_applied)
        return squared


class ClassInstantier(OrderedDict):
    # Dict whose __getitem__ instantiates the stored class on lookup, passing along
    # optional kwargs when the stored value is a (class, kwargs) tuple.
    def __getitem__(self, key):
        content = super().__getitem__(key)
        cls, kwargs = content if isinstance(content, tuple) else (content, {})
        return cls(**kwargs)


ACT2CLS = {
    "gelu": GELUActivation,
    "gelu_10": (ClippedGELUActivation, {"min": -10, "max": 10}),
    "gelu_fast": FastGELUActivation,
    "gelu_new": NewGELUActivation,
    "gelu_python": (GELUActivation, {"use_gelu_python": True}),
    "gelu_pytorch_tanh": PytorchGELUTanh,
    "gelu_accurate": AccurateGELUActivation,
    "laplace": LaplaceActivation,
    "leaky_relu": nn.LeakyReLU,
    "linear": LinearActivation,
    "mish": MishActivation,
    "quick_gelu": QuickGELUActivation,
    "relu": nn.ReLU,
    "relu2": ReLUSquaredActivation,
    "relu6": nn.ReLU6,
    "sigmoid": nn.Sigmoid,
    "silu": nn.SiLU,
    "swish": nn.SiLU,
    "tanh": nn.Tanh,
    "prelu": nn.PReLU,
}
ACT2FN = ClassInstantier(ACT2CLS)


def get_activation(activation_string):
    if activation_string in ACT2FN:
        return ACT2FN[activation_string]
    else:
        raise KeyError(f"function {activation_string} not found in ACT2FN mapping {list(ACT2FN.keys())}")


# Backwards-compatibility aliases so `from transformers.activations import gelu` keeps working.
gelu_python = get_activation("gelu_python")
gelu_new = get_activation("gelu_new")
gelu = get_activation("gelu")
gelu_fast = get_activation("gelu_fast")
quick_gelu = get_activation("quick_gelu")
silu = get_activation("silu")
mish = get_activation("mish")
linear_act = get_activation("linear")
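

# Minimal usage sketch (not part of the upstream module) showing how an activation
# name is resolved to a callable via get_activation / ACT2FN. The names "gelu_new"
# and "gelu_10" and the tensor shape below are illustrative choices, not requirements.
if __name__ == "__main__":
    # get_activation returns an instantiated nn.Module for a registered name.
    act = get_activation("gelu_new")
    x = torch.randn(2, 8)
    y = act(x)
    assert y.shape == x.shape  # activations are elementwise, so the shape is preserved

    # ACT2FN behaves like a dict but instantiates the stored class (with kwargs) on lookup.
    clipped = ACT2FN["gelu_10"]  # ClippedGELUActivation(min=-10, max=10)
    print(type(act).__name__, type(clipped).__name__)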