import math
from collections import OrderedDict

import torch
from packaging import version
from torch import Tensor, nn

from .utils import logging


logger = logging.get_logger(__name__)


class PytorchGELUTanh(nn.Module):
    """
    A fast C implementation of the tanh approximation of the GeLU activation function. See
    https://arxiv.org/abs/1606.08415.

    This implementation is equivalent to NewGELU and FastGELU but much faster. However, it is not an exact numerical
    match due to rounding errors.
    """

    def __init__(self):
        super().__init__()
        if version.parse(torch.__version__) < version.parse("1.12.0"):
            raise ImportError(
                f"You are using torch=={torch.__version__}, but torch>=1.12.0 is required to use "
                "PytorchGELUTanh. Please upgrade torch."
            )

    def forward(self, input: Tensor) -> Tensor:
        return nn.functional.gelu(input, approximate="tanh")


class NewGELUActivation(nn.Module):
    """
    Implementation of the GELU activation function currently in Google BERT repo (identical to OpenAI GPT). Also see
    the Gaussian Error Linear Units paper: https://arxiv.org/abs/1606.08415
    """

    def forward(self, input: Tensor) -> Tensor:
        return 0.5 * input * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (input + 0.044715 * torch.pow(input, 3.0))))


class GELUActivation(nn.Module):
    """
    Original Implementation of the GELU activation function in Google BERT repo when initially created. For
    information: OpenAI GPT's GELU is slightly different (and gives slightly different results): 0.5 * x * (1 +
    torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))) This is now written in C in nn.functional
    Also see the Gaussian Error Linear Units paper: https://arxiv.org/abs/1606.08415
    """

    def __init__(self, use_gelu_python: bool = False):
        super().__init__()
        if use_gelu_python:
            self.act = self._gelu_python
        else:
            self.act = nn.functional.gelu

    def _gelu_python(self, input: Tensor) -> Tensor:
        return input * 0.5 * (1.0 + torch.erf(input / math.sqrt(2.0)))

    def forward(self, input: Tensor) -> Tensor:
        return self.act(input)


class FastGELUActivation(nn.Module):
    """
    Applies GELU approximation that is slower than QuickGELU but more accurate. See: https://github.com/hendrycks/GELUs
    """

    def forward(self, input: Tensor) -> Tensor:
        return 0.5 * input * (1.0 + torch.tanh(input * 0.7978845608 * (1.0 + 0.044715 * input * input)))


class QuickGELUActivation(nn.Module):
    """
    Applies GELU approximation that is fast but somewhat inaccurate. See: https://github.com/hendrycks/GELUs
    """

    def forward(self, input: Tensor) -> Tensor:
        return input * torch.sigmoid(1.702 * input)


class ClippedGELUActivation(nn.Module):
    """
    Clip the range of possible GeLU outputs between [min, max]. This is especially useful for quantization purpose, as
    it allows mapping negative values in the GeLU spectrum. For more information on this trick, please refer to
    https://arxiv.org/abs/2004.09602.

    Gaussian Error Linear Unit. Original Implementation of the gelu activation function in Google Bert repo when
    initially created.

    For information: OpenAI GPT's gelu is slightly different (and gives slightly different results): 0.5 * x * (1 +
    torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))). See https://arxiv.org/abs/1606.08415
    """

    def __init__(self, min: float, max: float):
        if min > max:
            raise ValueError(f"min should be < max (got min: {min}, max: {max})")

        super().__init__()
        self.min = min
        self.max = max

    def forward(self, x: Tensor) -> Tensor:
        return torch.clip(gelu(x), self.min, self.max)


class AccurateGELUActivation(nn.Module):
    """
    Applies GELU approximation that is faster than default and more accurate than QuickGELU. See:
    https://github.com/hendrycks/GELUs

    Implemented along with MEGA (Moving Average Equipped Gated Attention)
    """

    def __init__(self):
        super().__init__()
        self.precomputed_constant = math.sqrt(2 / math.pi)

    def forward(self, input: Tensor) -> Tensor:
        return 0.5 * input * (1 + torch.tanh(self.precomputed_constant * (input + 0.044715 * torch.pow(input, 3))))


class MishActivation(nn.Module):
    """
    See Mish: A Self-Regularized Non-Monotonic Activation Function (Misra., https://arxiv.org/abs/1908.08681). Also
    visit the official repository for the paper: https://github.com/digantamisra98/Mish
    """

    def __init__(self):
        super().__init__()
        if version.parse(torch.__version__) < version.parse("1.9.0"):
            self.act = self._mish_python
        else:
            self.act = nn.functional.mish

    def _mish_python(self, input: Tensor) -> Tensor:
        return input * torch.tanh(nn.functional.softplus(input))

    def forward(self, input: Tensor) -> Tensor:
        return self.act(input)


class LinearActivation(nn.Module):
    """
    Applies the linear activation function, i.e. forwarding input directly to output.
    """

    def forward(self, input: Tensor) -> Tensor:
        return input


class LaplaceActivation(nn.Module):
    """
    Applies elementwise activation based on Laplace function, introduced in MEGA as an attention activation. See
    https://arxiv.org/abs/2209.10655

    Inspired by squared relu, but with bounded range and gradient for better stability
    """

    def forward(self, input, mu=0.707107, sigma=0.282095):
        input = (input - mu).div(sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + torch.erf(input))


class ReLUSquaredActivation(nn.Module):
    """
    Applies the relu^2 activation introduced in https://arxiv.org/abs/2109.08668v2
    """

    def forward(self, input):
        relu_applied = nn.functional.relu(input)
        squared = torch.square(relu_applied)
        return squared


class ClassInstantier(OrderedDict):
    # Dict whose __getitem__ instantiates the stored class on lookup, passing along
    # optional kwargs when the stored value is a (class, kwargs) tuple.
    def __getitem__(self, key):
        content = super().__getitem__(key)
        cls, kwargs = content if isinstance(content, tuple) else (content, {})
        return cls(**kwargs)


ACT2CLS = {
    "gelu": GELUActivation,
    "gelu_10": (ClippedGELUActivation, {"min": -10, "max": 10}),
    "gelu_fast": FastGELUActivation,
    "gelu_new": NewGELUActivation,
    "gelu_python": (GELUActivation, {"use_gelu_python": True}),
    "gelu_pytorch_tanh": PytorchGELUTanh,
    "gelu_accurate": AccurateGELUActivation,
    "laplace": LaplaceActivation,
    "leaky_relu": nn.LeakyReLU,
    "linear": LinearActivation,
    "mish": MishActivation,
    "quick_gelu": QuickGELUActivation,
    "relu": nn.ReLU,
    "relu2": ReLUSquaredActivation,
    "relu6": nn.ReLU6,
    "sigmoid": nn.Sigmoid,
    "silu": nn.SiLU,
    "swish": nn.SiLU,
    "tanh": nn.Tanh,
    "prelu": nn.PReLU,
}
ACT2FN = ClassInstantier(ACT2CLS)


def get_activation(activation_string):
    if activation_string in ACT2FN:
        return ACT2FN[activation_string]
    else:
        raise KeyError(f"function {activation_string} not found in ACT2FN mapping {list(ACT2FN.keys())}")


# Backwards-compatibility aliases so `from transformers.activations import gelu` keeps working.
gelu_python = get_activation("gelu_python")
gelu_new = get_activation("gelu_new")
gelu = get_activation("gelu")
gelu_fast = get_activation("gelu_fast")
quick_gelu = get_activation("quick_gelu")
silu = get_activation("silu")
mish = get_activation("mish")
linear_act = get_activation("linear")
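

# Minimal usage sketch (not part of the upstream module) showing how an activation
# name is resolved to a callable via get_activation / ACT2FN. The names "gelu_new"
# and "gelu_10" and the tensor shape below are illustrative choices, not requirements.
if __name__ == "__main__":
    # get_activation returns an instantiated nn.Module for a registered name.
    act = get_activation("gelu_new")
    x = torch.randn(2, 8)
    y = act(x)
    assert y.shape == x.shape  # activations are elementwise, so the shape is preserved

    # ACT2FN behaves like a dict but instantiates the stored class (with kwargs) on lookup.
    clipped = ACT2FN["gelu_10"]  # ClippedGELUActivation(min=-10, max=10)
    print(type(act).__name__, type(clipped).__name__)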