� 3��g������ddlmZmZddlZddlZddlZddlZddlZddlZddl Z ddl m Z ddl m Z ddlmZmZmZmZmZmZmZmZmZmZddlmZddlmZeded �� �Zed ��Z Gd �d ej!��Z"dUd�Z#Gd�d��Z$d�Z%iZ&dVd�Z'Gd�dee ��Z(d�Z)d�Z*idd�dd�dd�dd �d!d�d"d#�d$d#�d%d�d&d'�d(d'�d)d*�d+d,�d-d.�d/d0�d1d2�d3d4�d5d6�d7d8d9d:d;d<��Z+e,e+�-����D]Z.e.e+e.<�Gd=�d>e(e ��Z/edWdA���Z0edddddddB�dXdL���Z0 dYdddddddB�dZdO�Z0GdP�dQ��Z1GdR�dS��Z2dT�Z3dS)[�)� annotations�divisionN)� defaultdict)�cached_property) �Callable�Generic�Iterable�Optional�TypeVar�Union�overload�Dict�Any�Tuple�)�driver)� ModuleTypez .runtime.jit�Tc�v��eZdZdZd�fd� Zed���Zd�Zd�Zd�Z d �Z d �Z d �Z d �Z d �Zd�Zd�Z�xZS)�DependenciesFindera� This AST visitor is used to find dependencies of a JITFunction. This can be used to invalidate a JITFunction's hash when its source code -- or that of its dependencies -- changes. This visitor also keeps track of the global variables touched by the JITFunction. When we launch the kernel, we check that these have the same values as they did when we ran this visitor. If not, we raise an error (or otherwise we could recompile). 
�return�Nonec����t�����||_tj|�d����|_||_hd�|_i|_ d|_ dS)N�utf-8> �int�len�max�min�list�float�print�range�getattr� isinstanceF) �super�__init__�name�hashlib�sha256�encode�hasher�globals�supported_python_builtins�used_global_vals�visiting_arg_default_value)�selfr'r,�src� __class__s ��b/home/asafur/pinokio/api/open-webui.git/app/env/lib/python3.11/site-packages/triton/runtime/jit.pyr&zDependenciesFinder.__init__$st��� ���������� ��n�S�Z�Z��%8�%8�9�9�� ��� � * � * � * ��&�.TV���*/��'�'�'�c�4�|j���S�N)r+� hexdigest�r0s r3�retzDependenciesFinder.retHs���{�$�$�&�&�&r4c��t|j��tjkr|jS|j|jvrdS|j�|jd��}|�}|jsvt|��tkr^t|t��sIt|dd��s8|j|j vr*||jf|j|jt |j��f<|S)N�__triton_builtin__F)�type�ctx�ast�Store�id� local_namesr,�getr/rr$� JITFunctionr#r-r.)r0�node�vals r3� visit_NamezDependenciesFinder.visit_NameLs��� ���>�>�S�Y� &� &��7�N� �7�d�&� &� &��4��l���t�w��-�-�� �O��7� ���I�I��+�+�#�3� �4�4�,�>E�S�J^�`e�=f�=f�,��G�4�#A�A�A�BE�t�|�AT�D� !�4�7�B�t�|�,<�,<�"=� >�� r4c�*���fd�|jD��S)Nc�:��g|]}��|����S�)�visit)�.0�eltr0s �r3� <listcomp>z2DependenciesFinder.visit_Tuple.<locals>.<listcomp>ms#���5�5�5�C�� � �3���5�5�5r4)�elts�r0rDs` r3� visit_TuplezDependenciesFinder.visit_Tuplejs!���6�5�5�5�4�9�5�5�5�5r4c�8�|�|j��}t|tj��r4|�|j��}t|tj���4|�t |dd��t krdSt ||j��S)N�__name__�)rJ�valuer$r>� Attributer#� TRITON_MODULE�attr)r0rD�lhss r3�visit_Attributez"DependenciesFinder.visit_Attributeos����j�j���$�$����c�m�,�,� (��*�*�S�Y�'�'�C���c�m�,�,� (� �;�7�3� �B�7�7�=�H�H��4��s�D�I�&�&�&r4c�����fd�}���j��}|�3||��s(t|t��sJd|j�d����t j|ft�j�j���fd��j D����D�]%}t|t��s�||��r�%|j }�j � ��|j � ��zD]V}|\}}�j |\} }|j |\} }| | kr)td|�d| �d�j�d|j�d | �d � ����W�j �|j ��t!t#|d d ����} || z} �j�| �d ������'dS)Nc���tj�j��rdSt|dd��}|�t ��S)NT� __module__rS)�inspect� isbuiltin�funcr#� startswithrV)r_�modulerDs �r3�is_triton_builtinz8DependenciesFinder.visit_Call.<locals>.is_triton_builtinysB���� ���+�+� ��t��T�<��4�4�F��$�$�]�3�3� 3r4z Function "zv" is being called from a Triton function but is not a Triton 
function itself. Decorate it with @triton.jit to fix thisc3�L�K�|]}��|j��V��dSr6)rJrT)rK�kwr0s �r3� <genexpr>z0DependenciesFinder.visit_Call.<locals>.<genexpr>�s1����� :� :�b�T�Z�Z��� !� !� :� :� :� :� :� :r4�Global variable z has value z when compiling z, but inner kernel z has conflicting value z7 from when it was first compiled. This is not allowed.�noinlineFr)rJr_r$rCrR� itertools�chain�map�args�keywords� cache_keyr.�keys� RuntimeErrorr'�update�strr#r+r*) r0rDrbr_�obj�func_cache_key�k�var_name�_�v1�v2rg�keys `` r3� visit_CallzDependenciesFinder.visit_CallwsP���� 4� 4� 4� 4� 4� �z�z�$�)�$�$���|�0�0��6�6�|�*� �+�; �; �|�|� ^�� � ^� ^� ^��|� � �?� �H��D�J�� �*�*� :� :� :� :�D�M� :� :� :� � � 4� 4�C� �c�;�/�/� �� � ��%�%� �� �]�N��*�/�/�1�1�C�4H�4M�4M�4O�4O�O� � ��� ��!��-�a�0���A��,�Q�/���A���8�8�&�T�8�T�T��T�T�TX�T]�T�T�rv�r�T�T�Y[�T�T�T����� � !� (� (��)=� >� >� >��7�3� �E�:�:�;�;�H� �8�+�C� �K� � �s�z�z�'�2�2� 3� 3� 3� 3�9 4� 4r4c�f�d�|jjD��|_|�|��dS)Nc��h|] }|j�� SrI��arg)rKr~s r3� <setcomp>z7DependenciesFinder.visit_FunctionDef.<locals>.<setcomp>�s��>�>�>��C�G�>�>�>r4)rkrA� generic_visitrOs r3�visit_FunctionDefz$DependenciesFinder.visit_FunctionDef�s6��>�>�t�y�~�>�>�>��� ���4� � � � � r4c�.���fd�}tj|j|j|jr|jgng|j��D]}��|���||j��|j���|j��||j ��dS)Nc��� �jrJ�d�_|D]}|���|��� d�_dS#d�_wxYw)NTF)r/rJ)�defaults�exprr0s �r3�visit_defaultsz:DependenciesFinder.visit_arguments.<locals>.visit_defaults�sp��� 8��:�:�:�:�26��/�$�)�)�D��'�� � �4�(�(�(��)�38��/�/�/��%��/�7�7�7�7s �,9� A) rhri� posonlyargsrk�vararg� kwonlyargsrJ� kw_defaults�kwargr�)r0rDr�r~s` r3�visit_argumentsz"DependenciesFinder.visit_arguments�s���� 8� 8� 8� 8� 8��?�4�#3�T�Y�QU�Q\�@d��� � �bd�fj�fu�v�v� � �C� �J�J�s�O�O�O�O���t�'�(�(�(� �:� !� �J�J�t�z� "� "� "���t�}�%�%�%�%�%r4c���|�|��}t|t��r|xjt |��zc_dS|j�|��dSr6)rJr$rrA�set�add)r0rD�targets r3�visitAssnTargetz"DependenciesFinder.visitAssnTarget�sd�����D�!�!�� �f�d� #� #� )� � � ��F� � � +� � � � � � � � �� (� (� (� (� 
(r4c��t|j��dkrtd���|�|jd��|�|��dS)N�z2Simultaneous multiple assignment is not supported.r)r�targets� TypeErrorr�r�rOs r3� visit_AssignzDependenciesFinder.visit_Assign�s^�� �t�|� � �� !� !� �P�Q�Q� Q� ���T�\�!�_�-�-�-� ���4� � � � � r4c�d�|�|j��|�|��dSr6�r�r�r�rOs r3�visit_AnnAssignz"DependenciesFinder.visit_AnnAssign��4�� ���T�[�)�)�)� ���4� � � � � r4c�d�|�|j��|�|��dSr6r�rOs r3� visit_ForzDependenciesFinder.visit_For�r�r4)rr)rRr\� __qualname__�__doc__r&�propertyr9rFrPrYrzr�r�r�r�r�r�� __classcell__�r2s@r3rrs�������� � �"0�"0�"0�"0�"0�"0�H�'�'��X�'����<6�6�6� '�'�'�+4�+4�+4�Z!�!�!� &�&�&�@)�)�)� !� !� !�!�!�!� !�!�!�!�!�!�!r4rrrqc��t|t��r|jSt|t��r|St |��Sr6)r$r<rRrq�repr)�tys r3� _normalize_tyr��s>���"�d�����{�� �B�� � ��� � ��8�8�Or4c��eZdZdZdd�Zed ���Zed ���Zed ���Zed ���Z ed ���Z e d���Z e d���Z dS)� KernelParamzBRepresents a parameter (name plus metadata) to a @jit'ed function.�numr�param�inspect.Parameter�do_not_specialize�boolc�0�||_||_||_dSr6)r��_paramr�)r0r�r�r�s r3r&zKernelParam.__init__�s������� �!2����r4c��|jjSr6)r�r'r8s r3r'zKernelParam.names ���{��r4c��|jjr|jjtjjkrdSt |jj��S)NrS)r�� annotationr]� Parameter�emptyr�r8s r3r�zKernelParam.annotations<���{�%� ���)?�7�CT�CZ�)Z�)Z��2��T�[�3�4�4�4r4c��|j}dD]@\}}||�|��t|��zd�}|r ||vr|�|��cS�A|dkrdSdS)N))�uint�u)r�ir��u1rS)r��findr)r0r��ty1�ty2�widths r3�annotation_typezKernelParam.annotation_type s���_� �5� '� '�H�C���z���s�3�3�c�#�h�h�>�?�?�@�E�� '�� �*�*���u���&�&�&�� �� � ��4��rr4c��d|jvS)N� constexpr)r�r8s r3� is_constexprzKernelParam.is_constexprs���d�o�-�-r4c�$�d|jvo|j S)N�const)r�r�r8s r3�is_constzKernelParam.is_consts���$�/�)�C�$�2C�.C�Cr4c��|jjSr6)r��defaultr8s r3r�zKernelParam.defaults ���{�"�"r4c�@�|jjtjjkSr6)r�r�r]r�r�r8s r3� has_defaultzKernelParam.has_default"s���{�"�g�&7�&=�=�=r4N)r�rr�r�r�r�)rRr\r�r�r&rr'r�r�r�r�r�r�r�rIr4r3r�r��s�������L�L�3�3�3�3� � � ��_� ��5�5��_�5� 
����_���.�.��_�.��D�D��_�D��#�#��X�#��>�>��X�>�>�>r4r�c��t|d��r|���dzdkrdSt|t��r|dzdkrdS|dkrdSdS)N�data_ptr�r�Dr��1�N)�hasattrr�r$r)�vs r3�compute_spec_keyr�'sg���q�*����1�:�:�<�<�"�#4��#9�#9��s� �A�s� � �� ��F�a�K�K��3� �!�V�V��3� �3r4Fc��|�dSt|t��rdSt|t��rd|kr|dkrdSd|kr|dkrdSd St|t��rd S|j|f}t �|d��}|�P|d rd nd tt|d��� d��dz}|t |<|S)N�none�i1�������i32�������u64�i64�fp32r��*k�*r�.�����) r$r�rr �dtype� dtype2strrB�type_canonicalisation_dictrq�split)r~r��dsk�ress r3� mangle_typer�7s��� �{��v� �C�� � ���t� �C�� � �� �s�?�?�s�i�/�/��5� �c�\�\�c�Y�.�.��5��5� �C�� � � ��v��y�(�#���m�m�C��&�&�� �;��q�6�*�4�4�s�.H��S�QR�V���IZ�IZ�[^�I_�I_�`b�Ic�.d�d�C� �I�c�N�� r4c�"�eZdZUded<dd�ZdS)�KernelInterfacer�runrc������fd�S)z� A JIT function is launched with: fn[grid](*args, **kwargs). Hence JITFunction.__getitem__ returns a callable proxy that memorizes the grid. c�$���j|�dd�|��S)NF��grid�warmup)r�)rk�kwargsr�r0s ��r3�<lambda>z-KernelInterface.__getitem__.<locals>.<lambda>Ys ���x�t�x�$�T�%�'Y�'Y�RX�'Y�'Y�r4rI)r0r�s``r3� __getitem__zKernelInterface.__getitem__Ss���� Z�Y�Y�Y�Y�Yr4N)rr)rRr\r��__annotations__r�rIr4r3r�r�Ps9������� �F�F�F�Z�Z�Z�Z�Z�Zr4r�c��d�|���D��}ddl}||||���|j|d�}|�|��}|S)Nc�X�i|]'\}}||jjdkrt|��n|��(S�r�)r2rRrq�rKryrTs r3� <dictcomp>z1serialize_specialization_data.<locals>.<dictcomp>^s:��w�w�w�Wa�WZ�\a��E�O�$<��$G�$G�c�%�j�j�j�U�w�w�wr4r)r'� signature� constants�attrs�optionsry)�items�json�to_dict�__dict__�dumps) r'r�r�r�r�ryr�rr�serialized_objs r3�serialize_specialization_datar]sg��w�w�en�et�et�ev�ev�w�w�w�I��K�K�K��9�9�u�}�}������ � �C��Z�Z��_�_�N� �r4c ���t|j��t|��ksJ�g}g}g}g}g}g}t|j���|��D�]"\\}} } | jt jjur1|�|��|�d|�d|����n5|�|�d|����|�d|�d|����| j r|�|����|�|��| j s|�d|z��| j r|�d| j z����|�d|�d| j rdnd �d �����$d � d �||zD����} d � d �|D����}d � d�|D����}|�d��d� |��} d� |��} d| �d| �d| �d|�d|�d� }d�|j���D��}t|d<t|d<t!||��|dS)a2 Equivalent to sig.bind followed by apply_defaults. 
This generates a native Python function (using exec) which can be memoized on a per-kernel basis to avoid having to run these expensive functions -- which constitute much of the kernel launch overhead -- every time we run the kernel. �'z': z =default_zcompute_spec_key(%s)z"%s"z mangle_type(�, �True�False�)rSc��g|]}|dz��S�rrI�rK�xs r3rMz2create_function_from_signature.<locals>.<listcomp>�s��M�M�M�a��T��M�M�Mr4c��g|]}|dz��Sr rIr s r3rMz2create_function_from_signature.<locals>.<listcomp>�s��?�?�?�1�a�$�h�?�?�?r4c��g|]}|dz��Sr rIr s r3rMz2create_function_from_signature.<locals>.<listcomp>�s��!G�!G�!G�q�!�d�(�!G�!G�!Gr4z**excess_kwargszdef dynamic_func(z): return {z}, (z), (z), excess_kwargsc�Z�i|](\}}|jtjju�d|��|j��)S)�default_)r�r]r�r�)rKr'r�s r3r�z2create_function_from_signature.<locals>.<dictcomp>�sE����� �D�%� �=�� 1� 7� 7� 7� �4���5�=� 7� 7� 7r4r�r�� dynamic_func)r� parameters�zipr�r�r]r�r��appendr�r�r�r��joinr�r��exec)�sig�kparams� func_args� dict_entries�constexpr_vals�non_constexpr_vals�signature_types�specialisationsr'�sp�kprm�args_str�dict_str� func_body�func_namespaces r3�create_function_from_signaturer%hs��� �s�~� � �#�g�,�,� .� .� .� .��I��L��N����O��O���� 4� 4� 6� 6��@�@�k�k���$��R� �:��*�0� 0� 0� � � �T� "� "� "� � � � 3�D� 3� 3�T� 3� 3� 4� 4� 4� 4� � � ��5�5�t�5�5� 6� 6� 6� � � � 3�D� 3� 3�T� 3� 3� 4� 4� 4� �?� k� � !� !�$� '� '� '� '� � %� %�d� +� +� +��'� F��&�&�'=��'D�E�E�E��!� k��&�&�v��0B�'B�C�C�C�C��&�&�&����PR�P[�Fh�f�f�ah�Fh�Fh�'i�j�j�j�j����M�M�?�_�+L�M�M�M�N�N�I��W�W�?�?��?�?�?�@�@�N����!G�!G�4F�!G�!G�!G�H�H�� ���&�'�'�'��y�y��#�#�H��y�y��&�&�H�H����(�(�(�I�I�I�~�~�~�7I�7I�7I�K�I����>�/�/�1�1����N� %0�N�=�!�)9�N�%�&� ��N�#�#�#� �.� )�)r4r�r�� float8e4nv�fp8e4nv�float8e5�fp8e5� float8e4b15�fp8e4b15� float8_e4m3fn� float8e4b8�fp8e4b8�float8_e4m3fnuz� float8_e5m2� 
float8e5b16�fp8e5b16�float8_e5m2fnuz�float16�fp16�bfloat16�bf16�float32r��float64�fp64�int8�i8�int16�i16�int32r�r��u8�u16�u32r�)�int64�uint8�uint16�uint32�uint64c����eZdZdZdZed���Zed���Zd�Zedd���Z d�Z d �Z d �Z d �Z d �Z dd �Zed���Zd�Zd�Zd�Zd�Z�fd�Zd�Z�xZS)rCNr�c�>�t|d��r|jSt|t��rdSt|t��rd|kr|dkrdSd|kr|dkrdSd St|t ��rd S|�dSt d t|���d |�����) Nr�r�r�r�r�r�r�r�r�r�zUnsupported type z for )r�r�r$r�rr r�r<r}s r3�_key_ofzJITFunction._key_of�s��� �3�� � � G��9� � ��T� "� "� G��4� ��S� !� !� G��3���3�)�#3�#3��u��#���#��"2�"2��u��u� ��U� #� #� G��6� �[��4��E��S� � �E�E��E�E�F�F� Fr4c��t|d��r%|���tjzdkSt |t ��r|dzdk|dkfS|dufS)Nr�rr�r��r�r�rC� divisibilityr$rr}s r3�_spec_ofzJITFunction._spec_of�sd�� �3� � #� #� -��<�<�>�>�K�$<�<��A� A� ��S� !� !� -��"�H��M�3�!�8�,� ,��t� ��r4c����ddlm}d���fd�t|j|��D��}d�t|j|��D��}|t |��t |����S)Nr)�AttrsDescriptorc���t|d��r%|���tjzdkSt |t ��r|tjzdkS|�dSdS)Nr�rTFrL)r s r3�is_divisible_by_16z3JITFunction._get_config.<locals>.is_divisible_by_16�sc���q�*�%�%� 9��z�z�|�|�k�&>�>�!�C�C��A�s�#�#� 9��;�3�3�q�8�8��y��t��5r4c�H��h|]\}}�|���|j�|j��SrI)r�r�)rKr�r~rRs �r3rz*JITFunction._get_config.<locals>.<setcomp>�sM��� � � ���s�!�!�#�&�&� �05�/F� � �I� � � r4c��h|]C\}}t|t��r)t|t��s|dk�5|j�<|j��DS)r�)r$rr�r�r�)rKr�r~s r3rz*JITFunction._get_config.<locals>.<setcomp>�sa�� � � ���s��#�s�#�#� �-7�s�D�,A�,A� �GJ�Q�h�h�W\�Wn�h� �I�FN�h�hr4)�compilerrPr�params�tuple)r0rkrP�divisible_by_16� equal_to_1rRs @r3� _get_configzJITFunction._get_config�s����.�.�.�.�.�.� � � � � � � �!�$�+�t�4�4� � � ��  � �!�$�+�t�4�4� � � � ���u�_�5�5�u�Z�7H�7H�I�I�Ir4Fc��|�dSt|t��r|St|���d��d}t|}|rdnd}||zS)N�*i8r�r�r�r�)r$rqr�r�)ryr�� dtype_str� const_strs r3�_type_ofzJITFunction._type_ofsd�� �;��5� ��S� !� !� ��J���H�H�N�N�3�'�'��+� �.�y�9� �$�-�D�D�#� ��9�$�$r4c�J�tt|j|����}|Sr6)�dictr� constexprs)r0� constexpr_keyr�s r3�_make_constantszJITFunction._make_constantss!����T�_�m�<�<�=�=� ��r4c ��tj�dS|jj}|jj}d�d�t |j|d��D����} |�d|j�d|j �d|j �d|j �d | �d � } Gd �d ��} t||||d ||��} 
||||j|j |j |j |j || d� } t�|| | |||��d|i| �dd���S)NFrc�,�g|]\}}|j�d|����S)z: �r')rKr�r�s r3rMz*JITFunction._call_hook.<locals>.<listcomp>"s,��_�_�_����%�*�4�4��4�4�_�_�_r4r�z [num_warps=z , num_ctas=z , num_stages=z, enable_fp_fusion=�](rc��eZdZd�ZdS)�/JITFunction._call_hook.<locals>.JitFunctionInfoc�0�||_||_||_dSr6)rar'� jit_function)r0rar'rls r3r&z8JITFunction._call_hook.<locals>.JitFunctionInfo.__init__'s��$�� � �� �$0��!��r4N)rRr\r�r&rIr4r3�JitFunctionInforj%s#������ � � � � r4rmr) r��devicer�� num_warps�num_ctas� num_stages�enable_fp_fusion� extern_libs�configs�specialization_datary)ryr��fn�compile�is_manual_warmup�already_compiled)rC� cache_hookrvrRr\rrrVrorprqrrrrs)r0ryr�rnr�r�rtr'ra� arg_reprsr�rmrur�s r3� _call_hookzJITFunction._call_hooks��� � !� )��5��w�����#���I�I�_�_�c�$�+�WZ�[\�W]�F^�F^�_�_�_�`�`� ��p�p�7�#4�p�p��AQ�p�p�`g�`r�p�p�HO�H`�p�p�dm�p�p�p�� � � � � � � � �<�D�)�Y�X_�`a�Xb�dk�mp�q�q��#��"� �*��(�!�,� '� 8�"�.��#6�  �  ���%�%�����v�t�T�2�2��C�*�6�*�"�"� &� � � r4c�\�t|��sJ�|j�|��dS)z� Add a hook that will be executed prior to the execution of run function with args and kwargs passed into the kernel N)�callable� pre_run_hooksr)r0�hooks r3�add_pre_run_hookzJITFunction.add_pre_run_hookEs2�� ��~�~���~� ��!�!�$�'�'�'�'�'r4c�f�ddlm}m}m}m}||_||_||_||_t |j|j��|_d�t|j��D��|_ d�t|j��D��|_ d�t|j��D��|_ dS)z1 Precompute as much as possible. 
r)�CompiledKernelrw� ASTSource� make_backendc�&�g|]\}}|j� |��SrI�r��rKr��ps r3rMz-JITFunction.create_binder.<locals>.<listcomp>Ws#��![�![�![���A�A�N�![�!�![�![�![r4c�&�g|]\}}|j� |��SrIr�r�s r3rMz-JITFunction.create_binder.<locals>.<listcomp>Xs%��%c�%c�%c�F�Q��TU�Tb�%c�a�%c�%c�%cr4c�4�g|]\}}|j� |j�|��SrI)r�r�r�s r3rMz-JITFunction.create_binder.<locals>.<listcomp>Ys?��$ �$ �$ ��1�a�1�;N�$ �YZ�Yg�$ � �$ �$ �$ r4N) rUr�rwr�r�r%r�rV�binder� enumerate�constexpr_indices�non_constexpr_indices�specialised_indices)r0r�rwr�r�s r3� create_binderzJITFunction.create_binderMs��� P�O�O�O�O�O�O�O�O�O�O�O�,����� �"���(���4�T�^�T�[�Q�Q�� �![�![�)�D�K�2H�2H�![�![�![���%c�%c�i�� �6L�6L�%c�%c�%c��"�$ �$ �%�d�k�2�2�$ �$ �$ �� � � r4c �<��&�tj���}tj�|��}�j|d<�jD] }||i|��� �j������j|i|��\}} } } } d�| ��t| | f��z} �j |� | d��}|���tj� ��}�� |��}|�|��}d|vs Jd���d|vs Jd���d|vs Jd���| D]}||jvrt!d |z����t#|�����}�fd ��jD��}| dt)|���}d �t+||��D��}�j|�f�&�&fd �t+|�j��D��}|���D]'\}}t3|��rt5d |�d�����(��| ||||�&��rdS���||�&d��}��|||j���}|�j || <t=��}�j���D]?\\}}\}}|� ||��x} |krtAd|�d|�d| ������@|s�|�J�t3|��r ||��}t)|��}!|d}"|!dkr|dnd}#|!dkr|dnd}$|j!||g| �R�}%|j"|"|#|$||j#|j$|%�j%j&�j%j'g | �R�|S)N�debugrS� device_typez=device_type option is deprecated; current target will be usedrnz8device option is deprecated; current device will be used�streamz8stream option is deprecated; current stream will be usedz2Keyword argument %s was specified but unrecognisedc�4��g|]}�j|j��SrI)rVr')rKr�r0s �r3rMz#JITFunction.run.<locals>.<listcomp>�s"���O�O�O�q�t�{�1�~�*�O�O�Or4c�*�i|]\}}||dkrdn|��S)r�r\rI)rKrtr�s r3r�z#JITFunction.run.<locals>.<dictcomp>�s*��`�`�`���A��q�F�{�{�U�U��`�`�`r4c�`��i|]*\}}|js|j�djvs|�"|j|��+S)r)r�r�rYr')rKr�r�rts �r3r�z#JITFunction.run.<locals>.<dictcomp>�sM�������Q���>��&'�U�g�a�j�.C�%C�%C�q�y����GP�y�yr4zCallable constexpr at index z is not supportedr)r�r�rfz1 has changed since we compiled this kernel, from z to 
r�r)(r�active�get_current_device�get_current_streamr�rr�r�rrq�cacherB�get_current_targetr�� parse_optionsr��KeyErrorrW�valuesr�rrrZrVr�r~r�r|r�rw�objectr.ro�launch_metadatar��function�packed_metadatar��launch_enter_hook�launch_exit_hook)'r0r�r�rkr�rnr�r�� bound_args� sig_and_specrr� excess_kwargsry�kernelr��backendr�rt� bound_vals�sigkeys�sigvalsr�r�r�r~r1� not_presentr'�globals_dict_idrE� globals_dict�newVal� grid_size�grid_0�grid_1�grid_2r�rts'` @r3r�zJITFunction.run]s�������1�1�3�3����1�1�&�9�9���*��w���&� "� "�D� �D�$� !�&� !� !� !� !� �;� � � � � � � �Va�VZ�Va�cg�Vr�kq�Vr�Vr�S� �L�.�2D�m��g�g�l�#�#�c�>�=�*I�&J�&J�J����F�#�'�'��T�2�2�� �>��]�5�5�7�7�F��'�'��/�/�G��+�+�F�3�3�G�!��.�.�.�0o�.�.�.��6�)�)�)�+e�)�)�)��6�)�)�)�+e�)�)�)�"� ]� ]���G�,�,�,�"�#W�Z[�#[�\�\�\�-��z�0�0�2�2�3�3�J� P�O�O�O�D�4N�O�O�O�G�"�=�C��L�L�=�1�G�`�`�#�g�W^�J_�J_�`�`�`�I�'�t�'��4�7�G�����!�*�d�k�:�:����I� $�/�/�+�+� Y� Y���3��C�=�=�Y�#�$W�1�$W�$W�$W�X�X�X�Y����s�I�v�y�'�7�S�S� ��t��.�.��y�)�W�Q�Z�H�H�C��\�\����(�"���F� '-�D�J�v� �s� #��h�h� �<@�<Q�<W�<W�<Y�<Y� q� q� 8� #�T�?�%8�c�<�&�*�*�4��=�=�=��#�E�E�"�o�t�o�o�^a�o�o�gm�o�o�q�q�q�F�� y��#�#�#���~�~� (��t�J�'�'���D� � �I��!�W�F� )�A� � �T�!�W�W�1�F� )�A� � �T�!�W�W�1�F�5�f�4�T�6�W�DV�W�W�W�O� �F�J�v�v�v�v�v���H^�`o��*�<�d�>Q�>b� y�ew� y� y� y� y�� r4c�L���|r|ng}�|_�j|_||_t j���|_||_t j���d|_��fd�|_ ||_ d|_ g|_ t|jj�����D]=\}} |o ||vp| j|v} |j �t%|| | �����>t'jt j�����|_|jt/jd|jt.j�����d�|_t7t8��|_d|_i|_d|_ tBj"�#dd��dkrdn||_$||_%d�|j D��|_&d �|j D��|_'g|_(�j)|_)�j*|_*�j+|_+�j|_dS) Nr�c�,�����jn �|��Sr6)rR)rvrvr�s ��r3r�z&JITFunction.__init__.<locals>.<lambda>�s���T�\�b�k�k�t�t�A�w�w�r4z^def\s+\w+\s*\(� TRITON_DEBUG�0r�Tc��g|] }|j�� SrIrg�rKr�s r3rMz(JITFunction.__init__.<locals>.<listcomp>�s��6�6�6�Q�!�&�6�6�6r4c�*�g|]}|j� |j��SrI)r�r�r�s r3rMz(JITFunction.__init__.<locals>.<listcomp>�s!��H�H�H�Q���H�1�5�H�H�Hr4),rvr\ra�versionr]r�r��getsourcelines�starting_line_numberr�r�r�rVr�rr�r'rr��textwrap�dedent� getsourcer1�re�search� 
MULTILINE�startrrar��hashr.r��os�environrBr�rg� arg_namesrbrr�rR� __globals__) r0rvr�r�r�rgr�r�r�r��dnss ` ` r3r&zJITFunction.__init__�s����1B�J�-�-�������m�� ��� � �*�2�.�.���!2���$+�$:�2�$>�$>�q�$A��!�F�F�F�F�F�� �.����� ��� �!�$�.�";�"B�"B�"D�"D�E�E� ;� ;�H�A�u�#�c��.?�)?�)b�5�:�Qb�Cb�C� �K� � �{�1�e�S�9�9� :� :� :� :��?�7�#4�R�#8�#8�9�9����8�B�I�&8�$�(�B�L�Q�Q�W�W�Y�Y�Z�Z�[��� ��&�&�� ��� �TV����� ��Z�^�^�N�C�@�@�C�G�G�T�T�U�� � �� �7�6�$�+�6�6�6���H�H�$�+�H�H�H��� ����z�� �� �� ��>����-����r4c�`�|j��t|j|j|j���}|�|�����|jt|j ��z|_tt|j � ������|_ |jS)N)r'r,r1)r�rrRr�r1rJ�parser9rqr�ra�sortedr.r�)r0�dependencies_finders r3rmzJITFunction.cache_key�s��� �9� �"4�$�-�QU�Qa�gk�go�"p�"p�"p� � � %� %�d�j�j�l�l� 3� 3� 3�+�/�#�d�6O�2P�2P�P�D�I�$(��0C�0T�0Z�0Z�0\�0\�)]�)]�$^�$^�D� !��y�r4c�R�|jttj|��|dd�|��S)NTr�)r�rj� MockTensor� wrap_dtype)r0r�rkr�s r3r�zJITFunction.warmup�s.���t�x��Z�5J�D�1Q�1Q�T�$�\�\�U[�\�\�\r4c ���ddlm}m}m}ddl}ddlm�tj� ��}|� |��}|d|j j kr%td|d�d|j j ������fd�|d���D��}t|d �����} ||| ||�|d ����} d �|d ���D��} |d } || d| ��} | |j|| <| S)Nr)rPrwr�rr'zSpecialization data is for z but trying to preload for c�z��i|]7\}}|�j�|��r��|��n|��8SrI)r��is_dtype)rKryrT�tls �r3r�z'JITFunction.preload.<locals>.<dictcomp>sR��� � � ���U� �B�H�$5�$5�e�$<�$<�G����%����%� � � r4r�r�r�c�b�i|],\}}|t|t��rt|��n|��-SrI)r$rrWr�s r3r�z'JITFunction.preload.<locals>.<dictcomp> sG�� � � ���U� ��E�4�!8�!8�C��u����e� � � r4r�ry)rUrPrwr�r��triton.language�languagerr�r��loadsrvrRror�ra� from_dictr�)r0rurPrwr�r�rn�deserialized_objr�r�r1r�ryr�r�s @r3�preloadzJITFunction.preload�s����B�B�B�B�B�B�B�B�B�B�� � � �$�$�$�$�$�$���1�1�3�3���:�:�&9�:�:�� �F� #�t�w�'7� 7� 7��u�.>�v�.F�u�u�cg�cj�cs�u�u�w�w� w� � � � �.�{�;�A�A�C�C� � � � ��)�+�6�<�<�>�>�?�?� ��i��i��O�4M�4M�N^�_f�Ng�4h�4h�i�i�� � �.�y�9�?�?�A�A� � � ���u�%�����d�G�,�,��"(�� �6��3��� r4c���tj|j��}t|tj��sJ�t |j��dksJ�t|jdtj��sJ�|S)Nr�r)r>r�r1r$�Moduler�body� FunctionDef)r0�trees r3r�zJITFunction.parsesg���y���"�"���$�� 
�+�+�+�+�+��4�9�~�~��"�"�"�"��$�)�A�,���8�8�8�8�8�� r4c� �td���)Nz:Cannot call @triton.jit'd outside of the scope of a kernel)ro)r0rkr�s r3�__call__zJITFunction.__call__ s���W�X�X�Xr4c�x��tt|���||��|dkr d|_dSdS)Nr1)r%rC� __setattr__r�)r0r'rTr2s �r3r�zJITFunction.__setattr__#s@��� �k�4� � �,�,�T�5�9�9�9� �5�=�=��D�I�I�I� �=r4c�2�d|j�d|jj�d�S)Nz JITFunction(�:r)rarvrRr8s r3�__repr__zJITFunction.__repr__*s"��?�d�k�?�?�D�G�,<�?�?�?�?r4�F)NNNNNN)rRr\r�rzrM� staticmethodrJrNrZr_rdr|r�r�r�r&r�rmr�r�r�r�r�r�r�r�s@r3rCrC�s���������J��L��G�G��\�G�&����\��J�J�J�8� %� %� %��\� %����/ �/ �/ �b(�(�(� � � � X�X�X�tbf�!%�8(�8(�8(�8(�t����X��]�]�]����6���Y�Y�Y������@�@�@�@�@�@�@r4rCrv�JITFunction[T]c��dSr6rI)rvs r3�jitr�3s���Cr4�r�r�r�r�r�rgr��Optional[Callable]r�r��Optional[Iterable[int]]r��Optional[bool]rg�Callable[[T], JITFunction[T]]c��dSr6rIr�s r3r�r�8s ���Cr4� Optional[T]�4Union[JITFunction[T], Callable[[T], JITFunction[T]]]c�B�������d������fd� }|� ||��S|S)a< Decorator for JIT-compiling a function using the Triton compiler. :note: When a jit'd function is called, arguments are implicitly converted to pointers if they have a :code:`.data_ptr()` method and a `.dtype` attribute. :note: This function will be compiled and run on the GPU. It will only have access to: * python primitives, * builtins within the triton package, * arguments to this function, * other jit'd functions :param fn: the function to be jit-compiled :type fn: Callable rvrrr�c ���t|��sJ�tjdd��dkrddlm}||��St |���������S)N�TRITON_INTERPRETr�r�r�)�InterpretedFunction)r�r�r�rgr�r�)r~r��getenv� interpreterr�rC)rvr�r�r�r�rgr�r�s ������r3� decoratorzjit.<locals>.decoratoras{�����|�|���|� �9�'�� -� -�� 4� 4� 8� 8� 8� 8� 8� 8�&�&�r�*�*� *����"3��!�� /���� r4N�rvrrr�rI)rvr�r�r�r�r�rgr�s `````` r3r�r�Es\��������8�����������  �~��y��}�}���r4c�D�eZdZdZed���Zd�Zed���ZdS)r�zr Can be used in place of real tensors when calling: kernel.warmup(MockTensor(torch.float32), ...) 
c�Z�|jjdkr|jdkrt|��S|S)Nr��torch)r2rRr\r�r}s r3r�zMockTensor.wrap_dtype�s/�� �=� !�W� ,� ,���7�1J�1J��c�?�?� "�� r4c��||_dSr6r�)r0r�s r3r&zMockTensor.__init__�s ���� � � r4c��dS)NrrIrIr4r3r�zMockTensor.data_ptr�s���qr4N)rRr\r�r�r�r�r&r�rIr4r3r�r�}sc�������� ����\�� �������\���r4r�c�@�eZdZd�Zd�Zd�Zd d�Zd�Zd�Zd �Z d �Z d S) � TensorWrapperc�t�||_||_|j|_|j|_|jj|_dSr6)r��base�datarn�shape)r0r r�s r3r&zTensorWrapper.__init__�s1���� ��� ��I�� ��k�� ��Y�_�� � � r4c�4�|j���Sr6)r r�r8s r3r�zTensorWrapper.data_ptr�s���y�!�!�#�#�#r4c�6�|j�|��Sr6)r �stride)r0r�s r3rzTensorWrapper.stride�s���y����"�"�"r4rrqc�(�d|j�d|j�d�S)NzTensorWrapper[rhr)r�r r8s r3�__str__zTensorWrapper.__str__�s��:�� �:�:�d�i�:�:�:�:r4c�4�|j���Sr6)r � element_sizer8s r3rzTensorWrapper.element_size�s���y�%�%�'�'�'r4c�Z�t|j���|j��Sr6)rr �cpur�r8s r3rzTensorWrapper.cpu�s���T�Y�]�]�_�_�d�j�9�9�9r4c�D�|j�|j��dSr6)r �copy_)r0�others r3rzTensorWrapper.copy_�s�� � ���� �#�#�#�#�#r4c�\�t|j�|��|j��Sr6)rr �tor�)r0rns r3rzTensorWrapper.to�s"���T�Y�\�\�&�1�1�4�:�>�>�>r4N�rrq) rRr\r�r&r�rrrrrrrIr4r3rr�s�������%�%�%�$�$�$�#�#�#�;�;�;�;�(�(�(�:�:�:�$�$�$�?�?�?�?�?r4rc��t|t��r,||jjkr|jSt|j|��St |d��rt||��St dt |���d����)Nr�zCannot reinterpret a r�)r$rr r�r�r�r<)�tensorr�s r3� reinterpretr�s����&�-�(�(� A� �F�K�%� %� %��;� �!���e�4�4� 4� ��� $� $�A��V�U�+�+�+��?��V� � �?�?�?�@�@�@r4rr�r) r�r�r�r�r�r�r�r�rgr�rr�r6)rvr�r�r�r�r�r�r�r�r�rgr�rr�)4� __future__rrr>r(r]rhr�r�r�� collectionsr� functoolsr�typingrrr r r r r rrr�runtime.driverr�typesrrRrrVr� NodeVisitorrr�r�r�r�r�r�rr%r�rr�r�rCr�r�rrrIr4r3�<module>r%sy��,�,�,�,�,�,�,�,� � � � ������������� � � � � � � � �����#�#�#�#�#�#�%�%�%�%�%�%�d�d�d�d�d�d�d�d�d�d�d�d�d�d�d�d�d�d�d�d�d�d�d�d�#�#�#�#�#�#��������.�3�3�~�.�.�.�.�/� � �G�C�L�L��Q!�Q!�Q!�Q!�Q!���Q!�Q!�Q!�r����+>�+>�+>�+>�+>�+>�+>�+>�\ � � � � �����2 Z� Z� Z� Z� Z�g�a�j� Z� Z� Z����>*�>*�>*�B� �D���)������:� � �Y� � �)� ��y���7���:���z���v������v���v�� �D��  �U�!�" �U�#�$� ����-����2 ��(�/�/�1�1� 2� 
2�&�&�A�$%��q�!�!�e@�e@�e@�e@�e@�/�!�$�e@�e@�e@�Z  ���� ��� � �#�*.�15� �#� � � � � � �� ��0� �#�*.�15� �#�0�0�0�0�0�0�p��������(?�?�?�?�?�?�?�?�> A� A� A� A� Ar4