� 2��g���L�ddlmZddlmZmZmZmZmZddlm Z ddl m Z ddl m Z ed��ZGd �d e��Zd�d�Zd�d�Zd�d�Zd�d�Zd�d�Z d�d�d$�Zd�d'�Zd�d(�Zd�d)�Zd�d*�Zd�d+�Zd�d-�Zd�d.�Zd�d3�Zd�d4�Zd�d7�Z d�d8�Z!d�d9�Z"d�d:�Z#d�d;�Z$d�d<�Z%d�d=�Z&d�d>�Z'd�d?�Z(d�d@�Z)d�dA�Z*d�dB�Z+d�dC�Z,d�dD�Z-d�dG�Z.d�dH�Z/d�dI�Z0d�dJ�Z1d�dK�Z2d�dL�Z3d�dM�Z4d�dP�Z5d�dT�Z6d�dV�Z7d�dY�Z8d�dZ�Z9d�d[�Z:d�d^�Z;d�d_�Z<d�db�Z=d�dc�Z>d�dd�Z?d�dg�Z@d�di�ZA d�d�dl�ZBdm�ZCdn�ZDdo�ZEdp�ZFdq�ZGdr�ZHds�ZIdt�ZJdu�ZKd�d��ZLd�d��ZMd�d��ZNd��ZOd��ZPd�d��ZQd�d��ZRd�d��ZSd�d��ZTd�d��ZUd�d��ZVd�d��ZWd�d��ZXd�d��ZYd�d��ZZd��Z[d�d��Z\d�d��Z]d��Z^d�d��Z_d�d��Z`d�d��Zad�d��Zbd�d��Zcd�d��Zdd�d��Zed�d��Zfd�d��Zgd��Zhd�d��Zid�d��Zjd�d��ZkdjS)��)� annotations)�List�Optional�Sequence�Tuple�TypeVar�)�ir�)�core)�math�Tc���eZdZ�fd�Z�xZS)�IncompatibleTypeErrorImplc����||_||_d|j���zdz|j���z|_t t |���|j��dS)Nzinvalid operands of type � and )�type_a�type_b�__repr__�message�superr�__init__)�selfrr� __class__s ��h/home/asafur/pinokio/api/open-webui.git/app/env/lib/python3.11/site-packages/triton/language/semantic.pyrz"IncompatibleTypeErrorImpl.__init__sl����� ��� �2�T�[�5I�5I�5K�5K�K�g�U�X\�Xc�Xl�Xl�Xn�Xn�n�� � �'��.�.�7�7�� �E�E�E�E�E�)�__name__� __module__� __qualname__r� __classcell__)rs@rrr sA�������F�F�F�F�F�F�F�F�Frr�axis�int�builder� ir.builder�return� tl.tensorc��|dvrtd|�����tj|�|��tj��S)N�rr r z+program_id axis must be 0, 1, or 2 but got )� ValueError�tl�tensor�create_get_program_id�int32�r!r#s r� program_idr/sF�� �9����M�t�M�M�N�N�N� �9�W�2�2�4�8�8�"�(� C� C�Crc��|dvrtd|�����tj|�|��tj��S)Nr(z-num_programs axis must be 0, 1, or 2 but got )r)r*r+�create_get_num_programsr-r.s r� num_programsr2 sF�� �9����O��O�O�P�P�P� �9�W�4�4�T�:�:�B�H� E� E�Er�a_ty�tl.dtype�b_tyc��|j}|j}|j}|j}||kr ||kr|n|S|tjjjkr ||kr|n|S|tjjjkr ||kr|n|St d|�d|�����)Nzunexpected signedness r)� int_bitwidth�int_signednessr*�dtype� SIGNEDNESS�UNSIGNED� TypeError)r3r5�a_rank�b_rank�a_sn�b_sns r�integer_promote_implrA+s��� � �F� � �F� � �D� � �D� �t�|�|�����t�t�D�0� ���$�-� -� -���'�'�t�t�T�1� ���$�-� -� -���'�'�t�t�T�1� �>�T�>�>��>�>� ?� ?�?r� div_or_mod�boolc�~�|���s|���r tjS|���s|���r tjS|���s|���r|r tjStjS|���s|���rN|r tjS|���r |���r tjStjS|� ��r|� ��std|�d|�����|rO|j |j kr?td|� ��zdz|� ��zdz���t||��S)N�unexpected type rzCannot use /, #, or % with �x because they have different signedness;this is unlikely to result in a useful answer. Cast them to the same signedness.)�is_fp64r*�float64�is_fp32�float32�is_fp16�float16�is_bf16�bfloat16�is_intr<r8rrA)r3r5rBs r�computation_type_implrP;s��� �|�|�~�~���������z�� �|�|�~�~���������z�� �|�|�~�~�������� � ��:� ��:� � �|�|�~�~�������� � ��:� � �<�<�>�>� �d�l�l�n�n� ��;� ��z�� �;�;�=�=�>�� � � � �>��<�4�<�<�d�<�<�=�=�=��l�d�)�T�-@�@�@��5�� � ���G�'�Q�TX�Ta�Ta�Tc�Tc�c�k�k�l�l� l� ��d� +� +�+rrr� allow_ptr_a�Nonec���|���r`|st||���|���r||krt||���|���rt||���dSdS�N)�is_ptrr� is_floating)rrrQs r�check_ptr_type_implrWcs��� �}�}���<�� <�+�F�F�;�;� ;� �=�=�?�?� <��&� 0� 0�+�F�F�;�;� ;� � � � � � <�+�F�F�;�;� ;�<�<� <� <rFT�lhs�rhs�Tuple[tl.tensor, tl.tensor]c�`�t|||��\}}|jj}|jj}t|||��t|||��|r[|���sG|���s3t |||��} t || |��}t || |��}||fSrT)�broadcast_impl_value�type�scalarrWrUrP�cast) rXrYr#� allow_lhs_ptr� allow_rhs_ptr�arithmetic_checkrB� lhs_sca_ty� rhs_sca_ty� ret_sca_tys r�binary_op_type_checking_implrfos���$�C��g�6�6�H�C�����J����J�� �J� �>�>�>�� �J� �>�>�>��-� � 1� 1� 3� 3�-�J�<M�<M�<O�<O�-�*�:�z�:�N�N� ��3� �G�,�,���3� �G�,�,�� ��8�Or�input�otherc�@�t|||dd��\}}|jj}|jj}|���r#|���rt d���|���r0|���s||}}|jj}|jj}|���r8t j|�|j|j��|j��S|� ��r8t j|� |j|j��|j��S|� ��r8t j|� |j|j��|j��St d|�����)NTzcannot add pointers togetherrE) rfr]r^rUr<r*r+� create_addptr�handlerV� create_faddrO� create_add�rgrhr#�input_scalar_ty�other_scalar_tys r�addrq�sy��/��u�g�t�T�R�R�L�E�5��j�'�O��j�'�O������8�O�$:�$:�$<�$<�8��6�7�7�7������,��(>�(>�(@�(@�,��e�u���*�+���*�+�������U��y��.�.�u�|�U�\�J�J�E�J�W�W�W� � $� $� &� &�U��y��,�,�U�\�5�<�H�H�%�*�U�U�U� � � � !� !�U��y��+�+�E�L�%�,�G�G���T�T�T� �8��8�8� 9� 9�9rc �N�t|||dd��\}}|jj}|���rFt j|�|jt||��j��|j��S|� ��r8t j|� |j|j��|j��S|� ��r8t j|� |j|j��|j��Std|�����)NTFrE)rfr]r^rUr*r+rjrk�minusrV� create_fsubrO� create_subr<�rgrhr#� scalar_tys r�subrx�s��/��u�g�t�U�S�S�L�E�5�� �!�I������h��y��.�.�u�|�U�5�'�=R�=R�=Y�Z�Z�\a�\f�g�g�g������U��y��,�,�U�\�5�<�H�H�%�*�U�U�U� � � � � �U��y��+�+�E�L�%�,�G�G���T�T�T� �2�y�2�2� 3� 3�3rc��t|||��\}}|jj}|���r8t j|�|j|j��|j��S|���r8t j|� |j|j��|j��Std|������NrE) rfr]r^rVr*r+� create_fmulrkrO� create_mulr<rvs r�mulr}�s���/��u�g�F�F�L�E�5�� �!�I������U��y��,�,�U�\�5�<�H�H�%�*�U�U�U� � � � � �U��y��+�+�E�L�%�,�G�G���T�T�T� �2�y�2�2� 3� 3�3rc �V�t|||dddd��\}}|jj}|jj}|���r'|���rt |||��}�n|���r&|���rt |||��}n�|���rK|���r7t |t j|��}t |t j|��}nn|���rH|���r4|j|jkrt |||��}n$t |||��}ntd|�����t j |� |j |j ��|j��S�NFTrE) rfr]r^rVrOr_r*rJ�fp_mantissa_widthr<r+� create_fdivrkrns r�truedivr��s���/��u�g�u�e�UY�[_�`�`�L�E�5��j�'�O��j�'�O��"�"�$�$�>��)?�)?�)A�)A�>��U�O�W�5�5��� � � � !� !�>�o�&A�&A�&C�&C�>��U�O�W�5�5��� � � � !� !� >�o�&<�&<�&>�&>� >��U�B�J��0�0���U�B�J��0�0��� � $� $� &� &�>�?�+F�+F�+H�+H�>� � ,��/P� P� P�����9�9�E�E�����9�9�E�E��<�?�<�<�=�=�=� �9�W�(�(���u�|�D�D�e�j� Q� Q�Qrc �B�t|||dddd��\}}|jj}|jj}|���r�|���r�t ||��}t |||��}t |||��}|���r8tj|� |j |j ��|j��Stj|� |j |j ��|j��Std|�����r) rfr]r^rOrAr_� is_int_signedr*r+� create_sdivrk� create_udivr<)rgrhr#rorp�ret_tys r�floordivr��s ��/��u�g�u�e�UY�[_�`�`�L�E�5��j�'�O��j�'�O������Z�O�$:�$:�$<�$<�Z�%�o��G�G���U�F�G�,�,���U�F�G�,�,�� � � � !� !� Z��9�W�0�0���u�|�L�L�e�j�Y�Y� Y��9�W�0�0���u�|�L�L�e�j�Y�Y� Y� �8��8�8� 9� 9�9r� ieee_roundingc �D�|jj}|jj}|���r|���std���t |||dddd��\}}|�|j|j��}tj||j��S)Nz4both operands of fdiv must have floating scalar typeFT) r]r^rVr<rfr�rkr*r+)rgrhr�r#rorp�rets r�fdivr��s����j�'�O��j�'�O� � &� &� (� (�P��0K�0K�0M�0M�P��N�O�O�O�/��u�g�u�e�UZ�\`�a�a�L�E�5� � � �e�l�E�L� 9� 9�C� �9�S�%�*� %� %�%rc ��t|||dddd��\}}|jj}|jj}|���rFt |t t jt||d|��|���||��|��}|S|� ��r�|j |j kr?td|� ��zdz|� ��zdz���|� ��r8tj|�|j|j��|j��Stj|�|j|j��|j��Std|�����)NFT��_builderz Cannot mod z by rFrE)rfr]r^rVrxr}r �floorr�rOr8r<rr�r*r+� create_sremrk� create_urem)rgrhr#rwrpr�s r�modr��s|��/��u�g�u�e�UY�[_�`�`�L�E�5�� �!�I��j�'�O������ Z��%��T�Z��U�E�5�'�(J�(J�U\�]�]�]�_d�fm�n�n�pw�x�x��� � � � � � �Z� � #��'E� E� E��M�I�,>�,>�,@�,@�@�6�I�O�Ld�Ld�Lf�Lf�f�jo�o�p�p� p� � "� "� $� $� Z��9�W�0�0���u�|�L�L�e�j�Y�Y� Y��9�W�0�0���u�|�L�L�e�j�Y�Y� Y� �2�y�2�2� 3� 3�3r�x�y� propagate_nan�tl.PropagateNanc� �t|||��\}}|j}|���r�|tjjkr8tj|�|j|j��|j ��S|tjj kr8tj|� |j|j��|j ��Std|�����|� ��r8tj|�|j|j��|j ��S|���r8tj|�|j|j��|j ��St#d|������NzUnexpected propagate_nan �Unexpected dtype )rfr9rVr*� PropagateNan�ALLr+�create_minimumfrkr]�NONE�create_minnumfr)r�� create_minsi�is_int_unsigned� create_minuir<�r�r�r�r#r9s r�minimumr��H�� '��1�g� 6� 6�D�A�q� �G�E� ����� 5� �B�O�/� /� /��9�W�4�4�Q�X�q�x�H�H�!�&�Q�Q� Q� �b�o�2� 2� 2��9�W�3�3�A�H�a�h�G�G���P�P� P��H��H�H�I�I� I� � � � � �5��y��-�-�a�h���A�A�1�6�J�J�J� � � � � �5��y��-�-�a�h���A�A�1�6�J�J�J��3�E�3�3�4�4�4rc� �t|||��\}}|j}|���r�|tjjkr8tj|�|j|j��|j ��S|tjj kr8tj|� |j|j��|j ��Std|�����|� ��r8tj|�|j|j��|j ��S|���r8tj|�|j|j��|j ��St#d|�����r�)rfr9rVr*r�r�r+�create_maximumfrkr]r��create_maxnumfr)r�� create_maxsir�� create_maxuir<r�s r�maximumr�r�r�min�maxc�T�t|||��\}}t|||��\}}t|||��\}}|j}|���r?tj|�|j|j|j|��|j��Std|�d����)Nr�z(. Only floating point clamp is supported) rfr9rVr*r+� create_clampfrkr]r<)r�r�r�r�r#r9s r�clampr�%s���+�C��g�>�>�H�C�� )�!�S�'� :� :�F�A�s� )�!�S�'� :� :�F�A�s� �G�E� �����]��y��.�.�q�x���S�Z�Q^�_�_�ab�ag�h�h�h��[�E�[�[�[�\�\�\rc�T�t|||ddd��\}}|jj}|jj}|���r|���st ||���t ||��}||krt |||��}||krt |||��}||fS)NF)rfr]r^rOrrAr_)rgrhr#� input_sca_ty� other_sca_tyres r�bitwise_op_type_checking_implr�6s���/��u�g�u�e�UZ�[�[�L�E�5��:�$�L��:�$�L� � � � � �D� �(;�(;�(=�(=�D�'� �l�C�C�C�%�l�L�A�A�J��\�!�!��U�J��0�0���\�!�!��U�J��0�0�� �%�<�rc��t|||��\}}tj|�|j|j��|j��SrT)r�r*r+� create_andrkr]�rgrhr#s r�and_r�E�?��0���w�G�G�L�E�5� �9�W�'�'�� �e�l�C�C�U�Z� P� P�Prc��t|||��\}}tj|�|j|j��|j��SrT)r�r*r+� create_orrkr]r�s r�or_r�Js?��0���w�G�G�L�E�5� �9�W�&�&�u�|�U�\�B�B�E�J� O� O�Orc��t|||��\}}tj|�|j|j��|j��SrT)r�r*r+� create_xorrkr]r�s r�xor_r�Or�rc��|j���s#t|tjd��|��}|j���s#t|tjd��|��}t |||��S�N�int1)r]�is_int1�bitcastr*r9r�r�s r� logical_andr�Tsv�� �:� � � � �:���r�x��/�/��9�9�� �:� � � � �:���r�x��/�/��9�9�� ��u�g� &� &�&rc��|j���s#t|tjd��|��}|j���s#t|tjd��|��}t |||��Sr�)r]r�r�r*r9r�r�s r� logical_orr�\sv�� �:� � � � �:���r�x��/�/��9�9�� �:� � � � �:���r�x��/�/��9�9�� �u�e�W� %� %�%rc��|j���s#t|tjd��|��}t ||��Sr�)r]r�r�r*r9�invert)rgr#s r�not_r�dsC�� �:� � � � �:���r�x��/�/��9�9�� �%�� !� !�!rc��t|||��\}}tj|�|j|j��|j��SrT)r�r*r+� create_lshrrkr]r�s r�lshrr�j�?��0���w�G�G�L�E�5� �9�W�(�(���u�|�D�D�e�j� Q� Q�Qrc��t|||��\}}tj|�|j|j��|j��SrT)r�r*r+� create_ashrrkr]r�s r�ashrr�or�rc��t|||��\}}tj|�|j|j��|j��SrT)r�r*r+� create_shlrkr]r�s r�shlr�tr�rc��|SrT�)rgs r�plusr�~s�� �Lrc�(�|jj}|���r'td|���zdz���t j|�|�|����|��}t|||��S)Nz$wrong type argument to unary minus (�)) r]r^rUr)rr*r+�get_null_value�to_irrx)rgr#r��_0s rrsrs�s����:�$�L������a��?�,�BW�BW�BY�BY�Y�\_�_�`�`�`� ��7�)�)�,�*<�*<�W�*E�*E�F�F� � U� U�B� �r�5�'� "� "�"rc�P�|jj}|���s|���r't d|���zdz���t j|�|� |����|��}t|||��S)Nz%wrong type argument to unary invert (r�) r]r^rUrVr)rr*r+�get_all_ones_valuer�r�)rgr#r��_1s rr�r��s����:�$�L������b� � 8� 8� :� :�b��@�<�CX�CX�CZ�CZ�Z�]`�`�a�a�a� ��7�-�-�l�.@�.@��.I�.I�J�J�L� Y� Y�B� ��r�7� #� #�#r�v� tl.block_typec��|j���s tjS|jj}tjtj|��SrT)r]�is_blockr*r��shape� block_type)r�r�s r� _bool_liker��s:�� �6�?�?� � ���w�� �F�L�E� �=���%� (� (�(rc�^�t|||��\}}|jj}|���r@t j|�|j|j��t|����S|� ��r�|� ��r@t j|� |j|j��t|����St j|� |j|j��t|����Std|�����rz)rfr]r^rVr*r+�create_fcmpOGTrkr�rOr��create_icmpSGT�create_icmpUGTr<rvs r� greater_thanr�����/��u�g�F�F�L�E�5�� �!�I������d��y��/�/�� �e�l�K�K�Z�X]�M^�M^�_�_�_� � � � � �d� � "� "� $� $� d��9�W�3�3�E�L�%�,�O�O�Q[�\a�Qb�Qb�c�c� c��9�W�3�3�E�L�%�,�O�O�Q[�\a�Qb�Qb�c�c� c� �2�y�2�2� 3� 3�3rc�^�t|||��\}}|jj}|���r@t j|�|j|j��t|����S|� ��r�|� ��r@t j|� |j|j��t|����St j|� |j|j��t|����Std|�����rz)rfr]r^rVr*r+�create_fcmpOGErkr�rOr��create_icmpSGE�create_icmpUGEr<rvs r� greater_equalr��r�rc�^�t|||��\}}|jj}|���r@t j|�|j|j��t|����S|� ��r�|� ��r@t j|� |j|j��t|����St j|� |j|j��t|����Std|�����rz)rfr]r^rVr*r+�create_fcmpOLTrkr�rOr��create_icmpSLT�create_icmpULTr<rvs r� less_thanr��r�rc�^�t|||��\}}|jj}|���r@t j|�|j|j��t|����S|� ��r�|� ��r@t j|� |j|j��t|����St j|� |j|j��t|����Std|�����rz)rfr]r^rVr*r+�create_fcmpOLErkr�rOr��create_icmpSLE�create_icmpULEr<rvs r� less_equalr��r�rc��t|||��\}}|jj}|���r@t j|�|j|j��t|����S|� ��r@t j|� |j|j��t|����Std|�����rz) rfr]r^rVr*r+�create_fcmpOEQrkr�rO� create_icmpEQr<rvs r�equalr������/��u�g�F�F�L�E�5�� �!�I������_��y��/�/�� �e�l�K�K�Z�X]�M^�M^�_�_�_� � � � � �_��y��.�.�u�|�U�\�J�J�J�W\�L]�L]�^�^�^� �2�y�2�2� 3� 3�3rc��t|||��\}}|jj}|���r@t j|�|j|j��t|����S|� ��r@t j|� |j|j��t|����Std|�����rz) rfr]r^rVr*r+�create_fcmpUNErkr�rO� create_icmpNEr<rvs r� not_equalr�rr�start�endc���t|t��rt|t��std���t|dz ��}t|dz ��}|s|rtd���||krtd���||z }||dz zdkrtd���|g}t jtj|��}t j|�||��|��S)Nz/arange's arguments must be of type tl.constexpr� zarange must fit in int32z=arange's end argument must be greater than the start argumentr rz#arange's range must be a power of 2) � isinstancer"r)rCr*r�r-r+�create_make_range)rrr#�is_start_int64� is_end_int64�ranger�r�s r�aranger�s��� �e�S� !� !�L��C��)=�)=�L��J�K�K�K��%�2�+�&�&�N���r� �?�?�L��5��5��3�4�4�4� �e�|�|��X�Y�Y�Y� �%�K�E� ������!�!��>�?�?�?� �G�E� �]�2�8�U� +� +�F� �9�W�.�.�u�c�:�:�F� C� C�Crr�� List[int]r9c��t|tj��r,|jjdks Jd���t |||��}nx|�t d���|dkr)|�|�|����}n#t|d|j ����}||��}tj||��}t|||��S)Nr zonly accepts size-1 tensorz2dtype must be specified when value is not a tensorr�get_) r r*r+�numel�valuer_r)r�r��getattr�name�splat)r�rr9r#� get_value_fns r�fullrs����%���#�#� (��{� �A�%�%�%�'C�%�%�%��U�E�7�+�+��� �=��Q�R�R� R� �A�:�:��*�*�5�;�;�w�+?�+?�@�@�E�E�"�7�,?�5�:�,?�,?�@�@�L� �L��'�'�E�� �%��'�'�� ���w� '� '�'rrc��|j���r Jd���t|��dkr|Stj|j|��}tj|�|j|��|��S)NzCannot splat a block tensorr) r]r��lenr*r�r9r+� create_splatrk)rr�r#r�s rrrsp���z�"�"�$�$�C�C�&C�C�C� $� �5�z�z�Q���� � �]�5�;�� .� .�F� �9�W�)�)�%�,��>�>�� G� G�Gr� dst_shape� can_reorderc���d}|D]}||z}�|jj|krtd���tj|jj|��}tj|�|j||��|��S)Nr z:reshape() cannot change total number of elements in tensor) r]rr)r*r�r^r+�create_reshaperk)rgrrr#r�sr�s r�reshaper!%s}�� �E� ���� �� ��� �z��5� � ��U�V�V�V� �]�5�:�,�i� 8� 8�F� �9�W�+�+�E�L�)�[�Q�Q�SY� Z� Z�Zrc�@�d�|jD��}|�|d��|j���st |||���St j|jj|��}t j|� |j |��|��S)Nc�6�g|]}tj|����Sr��r*�_constexpr_to_value)�.0r�s r� <listcomp>zexpand_dims.<locals>.<listcomp>0s#��@�@�@�q��'��*�*�@�@�@rr )r�r#) r��insertr]r�rr*r�r^r+�create_expand_dimsrk)rgr!r#rr�s r� expand_dimsr*/s���@�@�E�K�@�@�@�I� ���T�1���� �:� � � � �>��U�)�W�=�=�=�=� �]�5�:�,�i� 8� 8�F� �9�W�/�/�� �d�C�C�V� L� L�Lrc�&�|s Jd���t|j��dksJ�tj|jj|jd|jdzg��}tj|�|j|j��|��S)Nz;current implementation of `cat` always may reorder elementsr r) rr�r*r�r]r^r+� create_catrk)rXrYrr#�ret_types r�catr.:s~�� �U�U�U�U�U�;� �s�y�>�>�Q� � � � ��}�S�X�_�s�y��|�c�i��l�/J�.K�L�L�H� �9�W�'�'�� �C�J�?�?�� J� J�Jr�a�bc���t|||��\}}|jgk}|r"t|d|��}t|d|��}t|jdtj��rt jd��}nd}|j|gz}t j|jj|��}t j |� |j |j ��|��}|rt|dgd|���}|S)Nr�����r F�rr#) r\r�r*r r*� constexprr�r]r^r+� create_joinrkr!)r/r0r#� was_rank_1�two� new_shaper-r�s r�joinr9As��� ��1�g� .� .�D�A�q���B��J��'� ��1�g� &� &�� ��1�g� &� &���!�'�"�+�r�|�,�,���l�1�o�o�������3�%��I��}�Q�V�]�I�6�6�H� �)�G�'�'���!�(�;�;�X� F� F�C��D��c�A�3�E�7�C�C�C�� �Jrc�j�t|j��dksJ�tj|jd��dksJ�|jdd�}tj|jj|��}|�|j��\}}tj ||��tj ||��fS)Nrr2r ) rr�r*r%r�r]r^� create_splitrkr+)r/r#r8r-�outLHS�outRHSs r�splitr>Zs��� ���L�L�1� � � � � � "�1�7�2�;� /� /�1� 4� 4� 4� 4������ �I��}�Q�V�]�I�6�6�H��)�)�!�(�3�3�N�F�F� � �&�(�#�#� � �&�(�#�#� �r�dims� Tuple[int]c����t�j��t|��krtd���td�|D����t t t|������krtd|�����t j�jj �fd�|D����}t j |� �j |��|��S)Nz5permute dims must have the same length as input shapec3�>K�|]}tj|��V��dSrTr$)r&�ds r� <genexpr>zpermute.<locals>.<genexpr>js-���� 6� 6�A�b�$�Q�'�'� 6� 6� 6� 6� 6� 6rz?permute dims must be a permutation of 0, 1, ..., n-1, but were c�*��g|]}�j|��Sr�)r�)r&rCrgs �rr'zpermute.<locals>.<listcomp>ms���0N�0N�0N�A���Q��0N�0N�0Nr) rr�r)�sorted�listr r*r�r]r^r+� create_transrk)rgr?r#r-s` r�permuterIgs���� �5�;���3�t�9�9�$�$��P�Q�Q�Q� � 6� 6�� 6� 6� 6�6�6�$�u�S��Y�Y�?O�?O�:P�:P�P�P��a�[_�a�a�b�b�b��}�U�Z�.�0N�0N�0N�0N��0N�0N�0N�O�O�H� �9�W�)�)�%�,��=�=�x� H� H�Hrc ��|j���sHtj|j|��}tj|�|j|��|��S|j���}t|��t|��krtd|�d|�����||kr|St|��D];\}}|||kr*|dkr$td||�d|�d|�d|�d|�� ����<tj|jj |��}tj|� |j|��|��S)Nz!Cannot broadcast, rank mismatch: �, r z3Cannot broadcast, the expanded size of the tensor (z ) must match the existing size (z) at non-singleton dimension �: ) r]r�r*r�r+rrk�get_block_shapesrr)� enumerater^�create_broadcast)rgr�r#r�� src_shape�i�items r�broadcast_impl_shaperSqsv�� �:� � � � �L���u�z�5�1�1���y��-�-�e�l�E�B�B�F�K�K�K�� �+�+�-�-�I� �9�~�~��U���#�#��Q�Y�Q�Q�%�Q�Q�R�R�R� � ���� ��Y�'�'�<�<���4� ��8�t� � ��� � ��;�SX�YZ�S[�;�;�?C�;�;�!"�;�;�&/�;�;�38�;�;�<�<� <���]�5�:�,�e� 4� 4�F� �9�W�-�-�e�l�E�B�B�F� K� K�Krc �v�|j}|j}|���ru|���satj|j|j��}tj|�|j|� ����|��}�n�|���su|���ratj|j|j��}tj|�|j|� ����|��}�n|����r|����r�|� ��}|� ��}t|��t|��kr�tt|��t|����D]g}tj|� |jd��tj|jdg|z����}|j}|� ��}�hn�t|��t|��kr�tt|��t|����D]g}tj|� |jd��tj|jdg|z����}|j}|� ��}�ht|��t|��ksJ�g}t|��D]�\} } || } | dkr|�| ���)| dks| | kr|�| ���Ktdt!| ��zdzt!| ��zdzt!| ��z���||krHtj|j|��} tj|�|j|��| ��}||krHtj|j|��} tj|�|j|��| ��}||fS)Nrr z?Cannot make_shape_compatible: incompatible dimensions at index rLr)r]r�r*r�r^r�r+rrkrMrr r)rN�appendr)�strrO) rXrYr#�lhs_ty�rhs_ty� lhs_shape� rhs_shape�_� ret_shaperQ�left�rightr�s rr\r\�s��� �X�F� �X�F������+U����!2�!2�+U���v�}�f�l�;�;���i��,�,�S�Z��9P�9P�9R�9R�S�S�U[�\�\��� �_�_� � �'U�6�?�?�#4�#4�'U���v�}�f�l�;�;���i��,�,�S�Z��9P�9P�9R�9R�S�S�U[�\�\��� ��� � �#U�v���0�0�#U��+�+�-�-� ��+�+�-�-� � �y�>�>�C� �N�N� *� *��3�y�>�>�3�y�>�>�:�:� 6� 6���i�� :� :�3�:�q� I� I� "� �f�m�a�S�9�_� M� M�O�O�����"�3�3�5�5� � �  6� ��^�^�c�)�n�n� ,� ,��3�y�>�>�3�y�>�>�:�:� 6� 6���i�� :� :�3�:�q� I� I� "� �f�m�a�S�9�_� M� M�O�O�����"�3�3�5�5� � ��9�~�~��Y���/�/�/�/�� � ��+�+� a� a�G�A�t��a�L�E��q�y�y�� � ��'�'�'�'��1�*�*�%�4�-�-�� � ��&�&�&�&� �"-�/2�1�v�v�"6�8<�"=�?B�4�y�y�"I�KR�"S�UX�Y^�U_�U_�"`�a�a�a� � � !� !��]�6�=�)�<�<�F��)�G�4�4�S�Z��K�K�V�T�T�C� � � !� !��]�6�=�)�<�<�F��)�G�4�4�S�Z��K�K�V�T�T�C� ��8�Or� rounding_mode� Optional[str]c��|�dS|dkrtjjS|dkrtjjSt d|�d����)N�rtne�rtzzInvalid rounding mode: z0. Supported rounding modes are 'rtne' and 'rtz'.)r � ROUNDING_MODE�RTNE�RTZr))r_s r�_str_to_rounding_moderg�sU�����t�������$�$�������#�#� �n�}�n�n�n� o� o�or�dst_tyc�F�|j}|���r1tj|j|j�����}||kr|S|j}|j}|���s|���rt|||��S|j}|j}||kr2tdt|��zdzt|��z���tj |� |j |�|����|��S)Nz!Cannot bitcast data-type of size z to data-type of size )r]r�r*r�r^rMrUr_�primitive_bitwidthr)rVr+�create_bitcastrkr�)rgrhr#�src_ty� src_sca_ty� dst_sca_ty�src_bits�dst_bitss rr�r��s�� �Z�F� �����M���v�}�e�j�.I�.I�.K�.K�L�L�� ����� ���J���J������,�j�/�/�1�1�,��E�6�7�+�+�+��,�H��,�H��8����<�s�8�}�}�L�P.�.�03�H� � �>�?�?� ?� �9�W�+�+�E�L�&�,�,�w�:O�:O�P�P�RX� Y� Y�YrN�fp_downcast_roundingc��|j}t|tj��r|j}t|tj��r|j}|���r1tj|j|j�����}||kr|S|j}|j}t|��}d}|� ��rP|� ��r<|j |j kr,|�tj j}nL|tj jkrd}n4|�2tdt!|��zdzt!|��z���|���s|���r|jjs Jd���|���s|���r?|j�d�� � Jd���|jd||||���S|���r|� ��s*|� ��r|���s|rBtj|�|j|�|��|��|��S|���r|���r(|���r>|���s*t?t?|tj |��||��S|� ��o#|� ��o|j |j k}|rAtj|�!|j|�|����|��S|� ��o#|� ��o|j |j k} | rAtj|�"|j|�|����|��S|�#���r |�#��r�|j$|j$ks|j%|j%kr�|�&��o|�'�� } |�'��rX|j(�|��} tj|�)| ��|j(��} tU|| |��Stj|�+|j|�|��| ��|��S|�,���r|�#���r|�'��rX|j(�|��} tj|�)| ��|j(��} tU|| |��S|�&��rAtj|�-|j|�|����|��Stj|�.|j|�|����|��S|�#��r�|�,��r�|�'��s|�&��sAtj|�/|j|�|����|��Stj|�0|j|�|����|��S|�1��r�|�#��r�|j$} | d krAtj|�2|j|�|����|��S| d krZtUt?|tj3|��tj|�4d ��tj3��|��S|�#��rU|�1��rAtj|�5|j|�|����|��S|�1��rU|�1��rAtj|�6|j|�|����|��SJd |�d |�����)NFTz]fp_downcast_rounding should be set only for truncating fp conversions. Source scalar type is z and destination type is z4fp8e4nv data type is not supported on CUDA arch < 89�convert_custom_typesz0target doesn't provide conversion for this type.r��@r rz cannot cast z to )7r]r r*r4rr�r�r^rMrgrVrjr rdrer)rV� is_fp8e4nv�options� allow_fp8e4nv� is_fp8e4b15� codegen_fns�get�is_fp8r+�create_fp_to_fprkr�rKrIrMr_rJ�create_fp_trunc� create_fp_extrOr7r8r��is_boolr9r�r�create_int_cast�is_standard_floating�create_fp_to_si�create_fp_to_ui�create_ui_to_fp�create_si_to_fprU�create_ptr_to_int�int64� get_int64�create_int_to_ptrrk)rgrhr#rqrlrmrn�use_custom_rounding� truncate_fp�ext_fp� sign_extend�tyr��bitwidths rr_r_�sr�� �Z�F��&�"�,�'�'������&�� �5�5�:�3�9�� �����M���v�}�e�j�.I�.I�.K�.K�L�L�� ����� ���J���J�1�1E�F�F���������y�J�$:�$:�%�%�y� � '�*�*G� G� G� � '��@P�@U�)=�)=� !�R�%5�%:� :� :�RV�<O�� � +��6�8;�J���H�Je�f�hk�lv�hw�hw�x�y�y� y� �����e�:�#8�#8�#:�#:�e���,�d�d�.d�d�d�,���� � �r�J�$:�$:�$<�$<�r��"�&�&� "�$�$�+/�0�0�1c�0�0�0�:�w�"�#9�:�5�&�J^�ip�q�q�q�q� �����u� � 6� 6� 8� 8�u���� � �u�%/�%6�%6�%8�%8�u��u��y��0�0���v�|�|�G�?T�?T�Vj�k�k�ms�t�t�t� �����K�Z�%7�%7�%9�%9�K������K�%/�%7�%7�%9�%9�K��D��� �G�4�4�j�'�J�J�J� �(�(�*�*�F���� � �F��%� �(E�E���_��y��0�0���v�|�|�G�?T�?T�U�U�W]�^�^�^� � #� #� %� %�F���� � �F��%� �(E�E� ��]��y��.�.�u�|�V�\�\�'�=R�=R�S�S�U[�\�\�\������p�z�0�0�2�2�p���:�#:�:�:�j�>W�[e�[t�>t�>t� �.�.�0�0�M��9K�9K�9M�9M�5M� � � � � � � p���"�"�7�+�+�B���7�1�1�"�5�5�u�{�C�C�B��U�B��0�0� 0��9�W�4�4�U�\�6�<�<�PW�CX�CX�Ze�f�f�hn�o�o� o��&�&�(�(�c�Z�->�->�-@�-@�c� � � � � � c���"�"�7�+�+�B���7�1�1�"�5�5�u�{�C�C�B��U�B��0�0� 0� � %� %� '� '� c��9�W�4�4�U�\�6�<�<�PW�CX�CX�Y�Y�[a�b�b� b��9�W�4�4�U�\�6�<�<�PW�CX�CX�Y�Y�[a�b�b� b������c�z�>�>�@�@�c� � � � � � c�z�'?�'?�'A�'A� c��9�W�4�4�U�\�6�<�<�PW�CX�CX�Y�Y�[a�b�b� b��9�W�4�4�U�\�6�<�<�PW�CX�CX�Y�Y�[a�b�b� b������q�z�0�0�2�2�q��*�� �r�>�>��9�W�6�6�u�|�V�\�\�RY�EZ�EZ�[�[�]c�d�d� d� �q�=�=��T�%���7�;�;�R�Y�w�GX�GX�YZ�G[�G[�]_�]e�=f�=f�ho�p�p� p������a�z�0�0�2�2�a��y��2�2�5�<����g�AV�AV�W�W�Y_�`�`�`������^�z�0�0�2�2�^��y��/�/�� �f�l�l�7�>S�>S�T�T�V\�]�]�]�4�4��4�4�F�4�4�4�4�5rc��tjj}|rC|dkrtjj}n+|dkrtjj}nt d|�d����|S)Nz.ca�.cg�Cache modifier � not supported)r �CACHE_MODIFIERr��CA�CGr)��cache_modifier�caches r�_str_to_load_cache_modifierr�Qse�� � � "�E��O� �U� "� "��%�(�E�E� �u� $� $��%�(�E�E��M�~�M�M�M�N�N� N� �Lrc��tjj}|rs|dkrtjj}n[|dkrtjj}nC|dkrtjj}n+|dkrtjj}ntd|�d����|S)Nz.wbr�z.csz.wtr�r�)r r�r��WBr��CS�WTr)r�s r�_str_to_store_cache_modifierr�]s��� � � "�E�� O� �U� "� "��%�(�E�E� �u� $� $��%�(�E�E� �u� $� $��%�(�E�E� �u� $� $��%�(�E�E��M�~�M�M�M�N�N� N� �Lrc��tjj}|rC|dkrtjj}n+|dkrtjj}nt d|�d����|S)N� evict_last� evict_firstzEviction policy r�)r �EVICTION_POLICY�NORMAL� EVICT_LAST� EVICT_FIRSTr))�eviction_policy�evictions r�_str_to_eviction_policyr�mse���!�(�H��Q� �l� *� *��)�4�H�H� � � -� -��)�5�H�H��O��O�O�O�P�P� P� �Orc��d}|rC|dkrtjj}n+|dkrtjj}nt d|�d����|S)N�zero�nanzPadding option r�)r �PADDING_OPTION�PAD_ZERO�PAD_NANr))�padding_option�paddings r�_str_to_padding_optionr�ys_���G��O� �V� #� #��'�0�G�G� �u� $� $��'�/�G�G��M�~�M�M�M�N�N� N� �Nrc��tjj}|rs|dkrtjj}n[|dkrtjj}nC|dkrtjj}n+|dkrtjj}nt d|�d����|S)N�acquire�release�acq_rel�relaxed�Memory semantic r�)r � MEM_SEMANTIC�ACQUIRE_RELEASE�ACQUIRE�RELEASE�RELAXEDr))� sem_option�sems r� _str_to_semr��s��� �/� )�C�� L� �� "� "��/�)�C�C� �9� $� $��/�)�C�C� �9� $� $��/�1�C�C� �9� $� $��/�)�C�C��J� �J�J�J�K�K� K� �Jrc���tjj}|r[|dkrtjj}nC|dkrtjj}n+|dkrtjj}nt d|�d����|S)N�gpu�cta�sysr�r�)r �MEM_SYNC_SCOPE�GPU�CTA�SYSTEMr))� scope_option�scopes r� _str_to_scoper��s}�� � � !�E��N� �5� � ��%�)�E�E� �U� "� "��%�)�E�E� �U� "� "��%�,�E�E��L� �L�L�L�M�M� M� �Lrc�n�|r�t|d��s|g}d�|D��}|D]5}t|t��rd|cxkrt|��ksnJ��6t|��dksJ�t|��tt |����ks Jd���t |��SdS)N�__iter__c�T�g|]%}t|tj��r|jn|��&Sr��r r*r4r�r&�elems rr'z0_canonicalize_boundary_check.<locals>.<listcomp>�s0��l�l�l�UY� �4���(F�(F�P�$�*�*�D�l�l�lrrz'Duplicate dimension in `boundary_check`r�)�hasattrr r"r�setrF)�boundary_check� block_shape�dims r�_canonicalize_boundary_checkr��s����&��~�z�2�2� .�,�-�N�l�l�]k�l�l�l��!� H� H�C��c�3�'�'� G�A��,G�,G�,G�,G�s�;�7G�7G�,G�,G�,G�,G�,G�G�,G��>�"�"�Q�&�&�&�&��>�"�"�c�#�n�*=�*=�&>�&>�>�>�>�@i�>�>�>��n�%�%�%� �2rc ��|�|�td���|jjj} | tjks Jd���| ���r$|t jjkrtd���|jj} t|| � ����}tj |� |j |||||��| ��S)N�K`mask` and `other` arguments cannot be specified for loading block pointers�3`tl.int1` should be rewrited in `tl.make_block_ptr`z@Padding option `nan` is not supported for integer block pointers)r)r]� element_tyr*r�rOr r�r�r�rMr+�create_tensor_pointer_loadrk) �ptr�maskrhr�r�r�r�� is_volatiler#�elt_tyrhs r�_load_block_pointerr��s��� ��5�,��f�g�g�g� �X� � +�F� �R�W� � � �S� � � � �}�}���]�7�b�&7�&?�?�?��[�\�\�\��X� �F�2�.�&�BY�BY�B[�B[�\�\�N� �9��*�*�3�:�~�w�PU�W_�al�m�m�ou� w� w�wrc ���|jj���s*td|j����d����|�|�td���|s|rtd���|j���sT|r(|j���rtd���|r(|j���rtd���|j���rT|�(t ||j���|��}|�(t ||j���|��}|jj} | j} | tj kr7tj } tj | | j ��} t|| |��}|�t|| |��}|j���r/|j���} tj| | ��} n| } |�0tj|�|j|||��| ��Stj|�|j|j|r|jnd|||��| ��S)N�Unsupported ptr type z in `tl.load`z)`other` cannot be provided without `mask`z�`padding_option` or `boundary_check` argument is not supported for loading a tensor ofpointers or loading a scalar. Because the compiler does not know the boundary; please use block pointers (defined by `make_block_ptr`) instead�EMask argument cannot be block type if pointer argument is not a blockzFOther argument cannot be block type if pointer argument is not a block)r]r^rUr)rr�rSrMr�r*r��int8� pointer_type� address_spacer_r�r+� create_loadrk�create_masked_load) r�r�rhr�r�r�r�r�r#�ptr_tyr�r�rhs r� _load_legacyr��s��� �8�?� !� !� #� #�U��S���1B�1B�1D�1D�S�S�S�T�T�T� �|��)��D�E�E�E��U�.�U��T�U�U� U� �8� � � � �g� � f�D�I�&�&�(�(� f��d�e�e� e� � g�U�Z�(�(�*�*� g��e�f�f� f� �x�����V� � �'��c�h�.G�.G�.I�.I�7�S�S�D� � �(����0I�0I�0K�0K�W�U�U�E��X�_�F� � �F�������������)=�>�>���3���(�(�� ���U�F�G�,�,�� �x��������)�)�+�+����v�u�-�-����� �|��y��,�,�S�Z���+�V�V�X^�_�_�_��y� � &� &�s�z�4�;�PU�@_�� � �[_�af�hp�'2� 4� 4�5;�=�=� =rr�r��Optional[tl.tensor]r�rr�rVr�r�r�c �&�t|��} t|��} t|��} |j���r5|jj���rt||||| | | ||� � St||||| | | ||� � SrT) r�r�r�r]rUr�r�r�r�) r�r�rhr�r�r�r�r�r#r�r�r�s r�loadr�s��� (�� 7� 7�E�&��7�7�H�$�^�4�4�G� �x�����n�S�X�0�9�9�;�;�n�"�3��e�^�W�e�U]�_j�ls�t�t�t��C��u�n�g�u�h�Xc�el�m�m�mr�desc_ptrc ���t||d���}|�|j||�|��t |��t |����}t j||��S�NF�� require_i64)�_convert_to_ir_values�create_descriptor_loadrkr�r�r�r*r+)r��offsetsr�r�r]r#r�s r�descriptor_loadr�sk��#�G�W�%�H�H�H�G��&�&�x������G�AT�AT�'B�>�'R�'R�'>��'O�'O� Q� Q�A� �9�Q�� � �rc��t||d���}tj|�|j|j|��tj��Sr�)r�r*r+�create_descriptor_storerk�void)r�rr�r#s r�descriptor_storer�sC��#�G�W�%�H�H�H�G� �9�W�4�4�X�_�e�l�T[�\�\�^`�^e� f� f�frc �.�|�td���|jj���}|j���st |||��}|j���s Jd���||j���ks(Jd|�d|j����d����|jjj|jjks*Jd|jjj�d|jj�d����|jjj}|t jks Jd���t||��}t|||��}t j |� |j |j |||��t j ��S) Nr�z-Value argument must be block type or a scalarz Block shape(z) and value shape(z ) mismatchzBlock element type(z) and value element type(r�)r)r]r�rMr�rSr*r�r�r_r+�create_tensor_pointer_storerkr�) r��valr�r�r�r�r#r�r�s r�_store_block_pointerr�s��� ���f�g�g�g��(�%�6�6�8�8�K� �8� � � � �>�"�3� �W�=�=�� �8� � � � �O�O� O�O�O� � �#�(�3�3��� � � �\�k�\�\�S�X�5N�5N�5P�5P�\�\�\� � � � �8� � )�S�X�-@� @� @� @�Cq�X[�X`�Xk�Xv�Cq�Cq�RU�RZ�Re�Cq�Cq�Cq� @� @� @� �X� � +�F� �R�W� � � �S� � � �2�.�+�N�N�N� �s�F�G� $� $�C� �9�W�8�8���S�Z�Q_�af�hp�q�q��W� � �rc ��|jj���s*td|j����d����|rtd���|j���sR|j���rtd���|r(|j���rtd���|j���rRt ||j���|��}|�(t ||j���|��}|jj}|j}|tj kr7tj }tj ||j ��}t|||��}t|||��}|s?tj|�|j|j||��tj��S|jj���std���tj|�|j|j|j||��tj��S)Nr�z in `tl.store`z�`boundary_check` argument is not supported for storing a tensor of pointers or storing a scalar. Because the compiler does not know the boundary; please use block pointers (defined by `make_block_ptr`) insteadzFValue argument cannot be block type if pointer argument is not a blockr�z"Mask must have boolean scalar type)r]r^rUr)rr�rSrMr�r*r�r�r�r�r_r+� create_storerkr�r�create_masked_store) r�r�r�r�r�r�r#r�r�s r� _store_legacyr;s'�� �8�?� !� !� #� #�V��T���1B�1B�1D�1D�T�T�T�U�U�U��B��A�B�B� B� �8� � � � �f� �8� � � � � g��e�f�f� f� � f�D�I�&�&�(�(� f��d�e�e� e� �x�����T�"�3���(A�(A�(C�(C�W�M�M�� � �'��c�h�.G�.G�.I�.I�7�S�S�D� �X�_�F� � �F�������������)=�>�>���3���(�(�� �s�F�G� $� $�C� �a��y��-�-�c�j�#�*�e�X�V�V�XZ�X_�`�`�`� �9� � #� #� %� %�?��=�>�>�>� �9�W�0�0���S�Z���V[�]e�f�f�hj�ho� p� p�prr�c ��t|��}t|��}|j���s|jj���rt d���|j���r3|jj���rt|||||||��St|||||||��S)N�"Cannot store to a constant pointer) r�r�r]�is_constr^r)rUr�r�r�r) r�r�r�r�r�r�r#r�r�s r�storergs��� )�� 8� 8�E�&��7�7�H� �x�����?�c�h�o�6�6�8�8�?��=�>�>�>� �x�����W�S�X�0�9�9�;�;�W�#�C��d�N�E�8�U\�]�]�]��S�#�t�^�U�H�g�V�V�Vr�cmpr�r�c ��t|��}t|��}|jjj}|jdvrt d���tj|� |j |j |j ||��|j��S)N)�rrtz9atomic_cas only supports elements with width {16, 32, 64}) r�r�r]r^r�rjr)r*r+�create_atomic_casrk)r�rr�r�r�r#r�s r� atomic_casr }sy�� �c� � �C� �%� � �E����+�J��$�L�8�8��T�U�U�U� �9�W�.�.�s�z�3�:�s�z�SV�X]�^�^�`c�`h� i� i�ir�op�&Tuple[tl.tensor, tl.tensor, tl.tensor]c���|jj���s)td|j���z���|j���s|jj���rtd���|jjj}|tjur|dkrtd|zdz���|tj tj tj tj fvr%td|zdzt|��z���|j���rT|�(t||j���|��}|�(t||j���|��}t#||jjj|��}|s�|�d��}tj }|j���rc|�||j�����}tjtj |j�����}tj||��}|||fS)Nz)Pointer argument of store instruction is rrq�atomic_z does not support fp16z does not support T)r]r^rUr)rrr�r*rLr�r��int16rNrVr�rSrMr_�get_int1rr�r+)r�r�r�r r#r��mask_ir�mask_tys r�atom_red_typechecking_implr�s�� �8�?� !� !� #� #�\��D�s�x�GX�GX�GZ�GZ�Z�[�[�[� �x�����?�c�h�1�:�:�<�<�?��=�>�>�>����+�J��R�Z���B�%�K�K���R��*B�B�C�C�C��b�g�r�w���"�+�>�>�>���R��*>�>��Z���P�Q�Q�Q� �x�����R� � �'��c�h�.G�.G�.I�.I�7�S�S�D� �?�&�s�C�H�,E�,E�,G�,G��Q�Q�C� �s�C�H�O�.�� 8� 8�C� �+��"�"�4�(�(���'�� �8� � � � � J��*�*�7�C�H�4M�4M�4O�4O�P�P�G��m�B�G�S�X�-F�-F�-H�-H�I�I�G��y��'�*�*�� ��T�>�rc � �t|||d|��\}}}t|��}t|��}|jj}|���r�|���rPtj|� tj j |j |j |j ||��|j��Stj|� tj j|j |j |j ||��|j��S|tjtjhvrt#d|�����t%gd||��}|tjkr tjn tj}t+|||��} t+|tj|d��|��} |tjkr tjn tj} t+|| |��} t+|tj| d��|��} t3|||��}t5|||��}tj|� tj j | j | j t7|||��j ||��| j��}tj|� tj j| j | j t7|||��j ||��| j��}t;||||��}t+|||��S)Nr�z#atomic_max not supported for dtype �r )rr�r�r]r^rOr�r*r+�create_atomic_rmwr � ATOMIC_OP�MAXrk�UMAXrJrHr<rr-r�r�r��uint32�uint64r�r�r��UMIN�where�r�r�r�r�r�r#�sca_tyr��i_type�i_val�i_ptr�ui_type�ui_val�ui_ptr�pos�neg�pos_ret�neg_retr�s r� atomic_maxr,����/��S�$��w�O�O�N�C��d� �c� � �C� �%� � �E� �X�_�F� �}�}���y� � � � !� !� y��9��)�)�"�,�*:�C�J�� �TX�T_�ad�fk�l�l�nq�nv�x�x� x��9��)�)�"�,�*;�S�Z���UY�U`�be�gl�m�m�or�ow�y�y� y� �b�j�"�*�-�-�-��F�f�F�F�G�G�G� ��C��� )� )�D��2�:�-�-�R�X�X�2�8�F� �C��� )� )�E� �C�����3�3�W� =� =�E�!�R�Z�/�/�b�i�i�R�Y�G� �S�'�7� +� +�F� �S�"�/�'�1�5�5�w� ?� ?�F� ��T�7� +� +�C� �C��w� '� '�C��i��!�!�"�,�"2�E�L�%�,�"&�t�S�'�":�":�"A�3�� O� O�PU�PZ�\�\�G��i��!�!�"�,�"3�V�]�F�M�"&�t�S�'�":�":�"A�3�� O� O�PV�P[�]�]�G� ��W�g�w� /� /�C� �3��� (� (�(rc � �t|||d|��\}}}t|��}t|��}|jj}|���r�|���rPtj|� tj j |j |j |j ||��|j��Stj|� tj j|j |j |j ||��|j��S|tjtjhvrt#d|�����t%gd||��}|tjkr tjn tj}t+|||��} t+|tj|d��|��} |tjkr tjn tj} t+|| |��} t+|tj| d��|��} t3|||��}t5|||��}tj|� tj j | j | j t7|||��j ||��| j��}tj|� tj j| j | j t7|||��j ||��| j��}t;||||��}t+|||��S)Nr�z#atomic_min not supported for dtype rr )rr�r�r]r^rOr�r*r+rr r�MINrkrrJrHr<rr-r�r�r�rrr�r�r�rrr s r� atomic_minr0�r-rc �r�t|||d|��\}}}t|��}t|��}|jj}|���rt jjnt jj }tj |� ||j |j |j ||��|j��S)Nrq)rr�r�r]r^rVr r�FADD�ADDr*r+rrk)r�r�r�r�r�r#r!r s r� atomic_addr4�s���/��S�$��w�O�O�N�C��d� �c� � �C� �%� � �E� �X�_�F�$�0�0�2�2� H��� � �� �8H�B� �9�W�.�.�r�3�:�s�z�4�;�X[�]b�c�c�eh�em� n� n�nrc � �t|||d|��\}}}t|��}t|��}tj|�t jj|j |j |j ||��|j ��S)N�and) rr�r�r*r+rr r�ANDrkr]�r�r�r�r�r�r#s r� atomic_andr9��y��/��S�$��w�O�O�N�C��d� �c� � �C� �%� � �E� �9�W�.�.�r�|�/?���S�Z�Y]�Yd�fi�kp�q�q��X� � �rc � �t|||d|��\}}}t|��}t|��}tj|�t jj|j |j |j ||��|j ��S)N�or) rr�r�r*r+rr r�ORrkr]r8s r� atomic_orr>sx��/��S�$��g�N�N�N�C��d� �c� � �C� �%� � �E� �9�W�.�.�r�|��� �C�J�X\�Xc�eh�jo�p�p��X� � �rc � �t|||d|��\}}}t|��}t|��}tj|�t jj|j |j |j ||��|j ��S)N�xor) rr�r�r*r+rr r�XORrkr]r8s r� atomic_xorrBr:rc � �t|||d|��\}}}t|��}t|��}tj|�t jj|j |j |j ||��|j ��S)N�xchg) rr�r�r*r+rr r�XCHGrkr]r8s r� atomic_xchgrFsy��/��S�$���P�P�N�C��d� �c� � �C� �%� � �E� �9�W�.�.�r�|�/@�#�*�c�j�Z^�Ze�gj�lq�r�r��X� � �rc���|���|jjvsJd|jj�d|�����|���}|dkrd}t t j|��S)Nzinput_precision must be one of z. Got �TF32X3�TF32x3)�lowerrv�allowed_dot_input_precisions�upperrr �INPUT_PRECISION)�input_precisionr#s r�_str_to_dot_input_precisionrOsz�� � � � "� "�g�o�&R� R� R� R�o�'�/�*V�o�o�^m�o�o� S� R� R�%�+�+�-�-�O��(�"�"�"�� �2�%�� 7� 7�7r�accrN�max_num_imprecise_acc� out_dtypec ��d�}|j���r|j���sJ�||j|j|j��|j���s|j���r6t |t j|��}t |t j|��}|� |jj}t||��}t|j ��}t|j ��} || cxkrdks,n|| cxkrdksnJd|j �d|j �d����|j dj |j dj ksAJd |j �d |j �d |j dj �d |j dj �d� ���|j dj d kr,|j dj d kr|j dj d ksJd|j �d|j �d����|jj ���rf|jj t jks Jd���|j dj dks Jd���|�d��} t j} n�|���rt'd���|jj ���s|jj ���r"|�d��} t j} n@|���r|�d��n|�d��} |} |jj d} |jj d} |dkr|jj dnd}t j| |r|| | gn| | g��}|� |�| |r|| | gn| | g��}n|j}|j|ksJ�|�A|j���r&|j���r |jj}nd}t j|�|j|j|||��|��S)Nc�@�|jsu|���s|���r Jd���|���r|���rdS||ksJd|�d|�d����dS|���s|���rQ||ksJd|�d|�d����|���s"|���sJd|�d����dSdS|���s|���r/|jrgd �}nd d g}d �}|||d ��|||d��dS|���sI|���s5|� ��s!|� ��s Jd|�����|���sI|���s5|� ��s!|� ��s Jd|�����||ksJd|�d|�d����dS)Nz1Dot op does not support fp8e4nv on CUDA arch < 90z First input (z) and second input (z) must have the same dtype!z0Both operands must be same type. First operand (z) and second operand (r�z:Both operands must be either int8 or uint8. Operand type ()�fp8e4nv�fp8e5�fp8e4b15rUrVc ���t�fd�|D����s.d�|��}td|�d|�d��d����dS)Nc3�N�K�|]}t�d|������V�� dS)�is_N)r)r&� dtype_namer9s �rrDzLdot.<locals>.assert_dtypes_valid.<locals>._validate_dtype.<locals>.<genexpr>=s?�����d�d� �A�w�u�.@�J�.@�.@�A�A�C�C�d�d�d�d�d�drrKzOnly supports z. z (r�)�anyr9�AssertionError)r9� allowed_types� operand_name�supported_typess` r�_validate_dtypez9dot.<locals>.assert_dtypes_valid.<locals>._validate_dtype<ss����d�d�d�d�Vc�d�d�d�d�d�k�*.�)�)�M�*B�*B��,�-i�o�-i�-i�Q]�-i�-i�af�-i�-i�-i�j�j�j�k�krz First operandzSecond operandzUnsupported dtype ) rwrur{rO�is_int8�is_uint8�allow_fp8e4b15rKrMrIr�)� lhs_dtype� rhs_dtypervr^ras r�assert_dtypes_validz dot.<locals>.assert_dtypes_valid*sZ���$� E� �+�+�-�-� C�i�6J�6J�7�7� C� C�B� C� C�����!�!� �i�&6�&6�&8�&8� ���� �)�)�)�,A�9�,A�,A�Zc�,A�,A�,A�)�)�)�)�)����!�!� E�Y�%5�%5�%7�%7� E� �I�-�-�-�0P�bk�0P�0P�DM�0P�0P�0P�-�-�-� �(�(�*�*�]�i�.@�.@�/�/�]�]�\�PY�\�\�\�]�]��]�]�]�]��!�!�#�#� E�y�'7�'7�'9�'9� E��)�9�$D�$D�$D�M�M�%.��$8�M�k�k�k�  �� �=�/�J�J�J��� �=�:J�K�K�K�K�K� �(�(�*�*�4�i�.?�.?�.A�.A�4�Y�EV�EV�EX�EX�4�\e�\m�\m�]�]�4�4�3� �3�3�4�4�� �(�(�*�*�4�i�.?�.?�.A�.A�4�Y�EV�EV�EX�EX�4�\e�\m�\m�]�]�4�4�3� �3�3�4�4�� �I�-�-�-�0E�y�0E�0E�^g�0E�0E�0E�-�-�-�-�-rr �z+Both inputs must be either 2D or 3D; (lhs: z vs rhs: r�r2�����zFirst input shape (z) and second input shape z= are not compatible for matmul (second index of first shape (z0) must be equal to first index of second shape (r z0All non-batch values in both first input shape (z) and second input shape (z) must be >= 16!zonly int8 supported!r rzsmall blocks not supported!rzhout_dtype=bfloat16 is unsupported. Please use out_dtype=float32/float16 and cast with `.to(tl.bfloat16)`) r]r�r9rvrxr_r*rL�default_dot_input_precisionrOrr�rr^rOr�� get_int32r-rMr)rI�get_fp32rJrK�get_fp16r�rrkr{�max_num_imprecise_acc_defaultr+� create_dot)rXrYrPrNrQrRr#rg�lhs_rank�rhs_rankr�� ret_scalar_ty�M�N�Br�� acc_handles r�dotrw's���E�E�E�@ �8� � � � �6�3�8�#4�#4�#6�#6�6�6� 6���� �3�9�g�o�>�>�>� �y�����-�#�)�"7�"7�"9�"9�-��3�� �G�,�,���3�� �G�,�,����!�/�E��1�/�7�K�K�O��3�9�~�~�H��3�9�~�~�H� �x� $� $� $� $�1� $� $� $� $��H�(A�(A�(A�(A��(A�(A�(A�(A�(A�DR�qt�qz�DR�DR�FI�FO�DR�DR�DR�(A�(A� A� �9�R�=� �#�)� �# � � � � �q���q�q�S�Y�q�q�VY�V_�`b�Vc�Vi�q�q�[^�[d�eg�[h�[n�q�q�q� � � � �9�R�=� �"� $� $���2��)<��)B�)B� �I�b�M� �2� %� %� %�{�3�9�{�{�`c�`i�{�{�{� &� %� &� �x������"��x��"�'�)�)�)�+A�)�)�)��y��|�!�R�'�'�'�)F�'�'�'� � � �q� !� !���� � � � � � � �"�� v�x�x� x� ��� � � "� "�"�c�h�o�&=�&=�&?�&?�"� � � �a� � ��� � � �$-�$5�$5�$7�$7� P�W� � �a� � � �W�=M�=M�a�=P�=P��!� � ���r��A� ���r��A�%��]�]����q����A� �]�=�q�*D�1�a��)�)�q�!�f� E� E�F� �{��)�)�"�1�.H�q�!�Q�i�i�1�a�&�I�I� � ��Z� ��x�6�!�!�!�!��$� �9� � � � � &�#�)�"2�"2�"4�"4� &�$+�O�$Q� !� !�$%� !� �9�W�'�'�� �C�J� �O�]r�s�s�� � �r� conditionc���t|tj|��}|j���r<t |||��\}}t |||��\}}t |||��\}}t |||dd��\}}|j���st |||��\}}|j}tj|�|j |j |j ��|��S)NT) r_r*r�r]r�r\rfr+� create_selectrk)rxr�r�r#r[r�s rrr�s����Y����1�1�I��~��� � �C�+�I�q�'�B�B� � �1�#�A�q�'�2�2���1�+�I�q�'�B�B� � �1� '��1�g�t�T� B� B�D�A�q� �>� "� "� $� $�C�+�I�q�'�B�B� � �1� �V�F� �9�W�*�*�9�+;�Q�X�q�x�P�P�RX� Y� Y�Yrc�`�|rtj||��}n|}tj||��SrT)r*r�r+)r�rwr\�res_tys r� wrap_tensorr}�s6������y�)�4�4����� �9�Q�� � �r�inputs�Sequence[tl.tensor]�Tuple[tl.tensor, ...]c�"���������t�fd��D�����d��djj�t���}�|ksJd|�d�����fd�t ���D���t �fd��D����s Jd�����d��D������|�������t���fd �tt�����D����S) Nc3�T�K�|]"}t||jjgd����V��#dS)Tr3N)r!rr)r&�tr#s �rrDzreduction.<locals>.<genexpr>�s;�����f�f�Z[�w�q�1�7�=�/�t�W�U�U�U�f�f�f�f�f�frrz&reduction axis must be < inputs rank (r�c�&��g|] \}}|�k� |��Sr�r�)r&rQr r!s �rr'zreduction.<locals>.<listcomp>�s"���=�=�=�t�q�!�1��9�9��9�9�9rc3�8�K�|]}|jj�kV��dSrT)r]r�)r&r�r�s �rrDzreduction.<locals>.<genexpr>�s,�����5�5��q�v�|�u�$�5�5�5�5�5�5rz-all reduction inputs must have the same shapec��g|] }|j�� Sr��rk�r&r�s rr'zreduction.<locals>.<listcomp>�s��&@�&@�&@�A�q�x�&@�&@�&@rc3��K�|]8}t��|���|jj���V��9dSrT�r}� get_resultr]r^)r&rQr~� reduce_opr\s ���rrDzreduction.<locals>.<genexpr>�sG�����t�t�\]��Y�1�1�!�4�4�f�Q�i�n�6K�Y�W�W�t�t�t�t�t�tr) �tupler]r�rrN�all� create_reduce�verifyr )r~r!�region_builder_fnr#�rankr�r\r�s`` ` @@@r� reductionr��sC�������� �|��f�f�f�f�_e�f�f�f�f�f���� �1�I�N� �E� �u�:�:�D� �$�;�;�;�H��H�H�H�;�;�;�=�=�=�=�y��/�/�=�=�=�I� �5�5�5�5�f�5�5�5� 5� 5�f�f�7f�f�f� 5��%�%�&@�&@��&@�&@�&@�$�G�G�I���i� � � � ������ �t�t�t�t�t�t�af�gj�kq�gr�gr�as�as�t�t�t� t� t�tr�reversec�������djj�t���}| |cxkr|ksnJd|�d|�d����|dkr||z }�D]}|jj�ks Jd����|�d��D��||���|�������t ���fd�t t�����D����S)Nrz scan axis z must be < inputs rank (r�z(all scan inputs must have the same shapec��g|] }|j�� Sr�r�r�s rr'z$associative_scan.<locals>.<listcomp>�s��"<�"<�"<��1�8�"<�"<�"<rc3��K�|]8}t��|���|jj���V��9dSrTr�)r&rQr~�scan_opr�s ���rrDz#associative_scan.<locals>.<genexpr>�sG�����n�n�VW��W�/�/��2�2�F�1�I�N�4I�5�Q�Q�n�n�n�n�n�nr)r]r�r� create_scanr�r�r ) r~r!r�r�r#r�r�r�r�s ` @@r�associative_scanr��s(����� �1�I�N� �E� �u�:�:�D� �5�D� � � � �4� � � � � �!S�d�!S�!S�D�!S�!S�!S� � � � �a�x�x� �� �� �Q�Q���v�|�u�$�$�$�&P�$�$�$�$��!�!�"<�"<�V�"<�"<�"<�d�G�L�L�G���g���� �N�N���� �n�n�n�n�n�n�[`�ad�ek�al�al�[m�[m�n�n�n� n� n�nr�num_binsc�$�t|j��dks Jd���|j���s Jd���t j|�|j|��t jtj |f����S)Nr z histogram only supports 1D inputz%histogram only supports integer input) rr�r9rOr*r+�create_histogramrkr�r-)rgr�r#s r� histogramr��s��� �u�{� � �q� � � �"D� � � � �;� � � � �H�H�!H�H�H� � �9�W�-�-�e�l�H�E�E�r�}�UW�U]�`h�_k�Gl�Gl� m� m�mr�valuesc��tdt|j����t|��krtd���|j�dt j||j�������|S)Nr zAShape of input to multiple_of does not match the length of valuesztt.divisibility) r�rr�r)rk�set_attrr � make_attr� get_context�r�r�s r� multiple_ofr��sl�� �1�c�!�'�l�l���s�6�{�{�*�*��\�]�]�]��H���'���f�a�h�>R�>R�>T�>T�)U�)U�V�V�V� �Hrc���t|j��t|��krtd���|j�dt j||j�������|S)NzDShape of input to max_contiguous does not match the length of valuesz tt.contiguity�rr�r)rkr�r r�r�r�s r�max_contiguousr��sa�� �1�7�|�|�s�6�{�{�"�"��_�`�`�`��H���o�r�|�F�A�H�<P�<P�<R�<R�'S�'S�T�T�T� �Hrc���t|j��t|��krtd���|j�dt j||j�������|S)NzCShape of input to max_constancy does not match the length of valuesz tt.constancyr�r�s r� max_constancyr��sa�� �1�7�|�|�s�6�{�{�"�"��^�_�_�_��H���n�b�l�6�1�8�;O�;O�;Q�;Q�&R�&R�S�S�S� �Hrc�d�tj|���tj��SrT)r*r+�create_barrierr�)r#s r� debug_barrierr��s"�� �9�W�+�+�-�-�r�w� 7� 7�7r�prefix�args�List[tl.tensor]�hexc�\�|�d��s|r|dz }|�d��s|r |dd�dz}t|��dkr|�d��sd|z}d�|D��}tj|�|||��tj��S)N� rLr2r c��g|] }|j�� Sr�r�)r&�args rr'z device_print.<locals>.<listcomp>�s��+�+�+�s�� �+�+�+r)�endswithr� startswithr*r+� create_printr�)r�r�r�r#�new_argss r� device_printr��s��� �?�?�3� � ��D���#� �� �?�?�4� � �$�T�$������t�#�� �6�{�{�Q���v�0�0��5�5���v���+�+�d�+�+�+�H� �9�W�)�)�&�#�x�@�@�"�'� J� J�Jr�cond�msg� file_name�linenoc �>�|j}|���sHtj|jd��}tj|�|jd��|��}tj|�|j||||��tj ��S)N)r ) r]r�r*r�r^r+rrk� create_assertr�)r�r�r�� func_namer�r#�cond_tys r� device_assertr�s����i�G� � � � � �L��-����6�6���y��-�-�d�k�5�A�A�7�K�K�� �9�W�*�*�4�;��Y� �SY�Z�Z�\^�\c� d� d�drc�z�t|t��rtj|��}t|tj��r�|rAd|jcxkrdksnJd|j�d����|�|j��Sd|jcxkrdksnJd|j�d����|�|j��St|tj��r�|jjdks Jd ���|j � ��s Jd ���|j tj krG|rE|� |j |���|j �����S|j tjkr |s Jd ���|j SJd t#|�������)Nl����lz@Block pointers only support 64 bit `shape/strides`, got a value z which is out of the range���zFBlock pointers only support 32 bit `offsets/block_shape`, got a value r z*Expected a scalar in shape/strides/offsetsz8Expected an integer scalar type in shape/strides/offsetsFzzBlock pointers only support 32 bit `offsets/block_shape`, add a `.to(tl.int32)` or use regular indexing for 64 bit supportz3Unsupported element type in shape/strides/offsets: )r r"r*r4rr�rkr+rr9rOr�r�rk� get_int64_tyr�r-r])r#r�r�s r�_convert_elem_to_ir_valuer� s����$����"��|�D�!�!���$�� �%�%�� � 1��T�Z�/�/�/�/�%�/�/�/�/�/�2F�#�z�2F�2F�2F�/�/�/��$�$�T�Z�0�0� 0��T�Z�/�/�/�/�%�/�/�/�/�/�2F�#�z�2F�2F�2F�/�/�/��$�$�T�Z�0�0� 0� �D�"�)� $� $���z��1�$�$�$�&R�$�$�$��z� � �"�"�^�^�$^�^�^�"� �:��� !� !�k� !��*�*�4�;��8L�8L�8N�8N�PT�PZ�Ph�Ph�Pj�Pj�k�k� k� �Z�2�8� #� #�K� #� S�S� S� S�5��{��T�T��T� � �T�T�T�T�5rc�h���t|d��r��fd�|D��St�|���gS)Nr�c�2��g|]}t�|�����Sr�)r�)r&r�r#r�s ��rr'z)_convert_to_ir_values.<locals>.<listcomp>%s&���\�\�\�$�)�'�4��E�E�\�\�\r)r�r�)r#� list_liker�s` `rr�r�#sJ�����y�*�%�%�]�\�\�\�\�\�R[�\�\�\�\� %�g�y�+� F� F� G�Gr�basec ����t||��}t||��}t||d���}|j���r|jj���rt d���|jjt jkr8t|t j t j |jj ��|��}t�d��s�g�d��D���td��D����s Jd���t|d��s|g}d�|D��}t|��tt!t#|������ks Jd ���t�fd �||||fD����s Jd ���|�|j|||�|��}t j|t j t j|jj�������S) NFr�zMExpected `base` to be a pointer type (but not a block pointer type or others)r�c�T�g|]%}t|tj��r|jn|��&Sr�r�r�s rr'z"make_block_ptr.<locals>.<listcomp>;s/��b�b�b�d��D�"�,�!?�!?�I�4�:�:�T�b�b�brc3�`K�|])}t|t��od|cxkodkncV��*dS)r�r�N)r r"r�s rrDz!make_block_ptr.<locals>.<genexpr><sM����X�X�D�z�$��$�$�?��4�)?�)?�)?�)?�%�)?�)?�)?�)?�X�X�X�X�X�XrzGExpected a list of constant integers (`int32_t` range) in `block_shape`c�T�g|]%}t|tj��r|jn|��&Sr�r�r�s rr'z"make_block_ptr.<locals>.<listcomp>Bs/�� V� V� V��:�d�B�L�9�9� C�T�Z�Z�t� V� V� Vrz<Expected a permutation of (0, 1, ..., len(order)-1) in orderc3�X�K�|]$}t���t|��kV��%dSrT)r)r&r�r�s �rrDz!make_block_ptr.<locals>.<genexpr>Fs6�����d�d�i�s�;���3�y�>�>�1�d�d�d�d�d�drzBExpected shape/strides/offsets/block_shape to have the same length)r�r]rUr�r�r)r*r�r_r�r�r�r�r�rFrGr r�create_make_block_ptrrkr+r�)r�r��stridesr�r��orderr#rks ` r�make_block_ptrr�)s��� "�'�5� 1� 1�E�#�G�W�5�5�G�#�G�W�%�H�H�H�G� �9� � � � �j���!5�!>�!>�!@�!@�j��h�i�i�i� �y��r�w�&�&��D�"�/�"�'�4�9�3J�K�K�W�U�U�� �;� � +� +�$�"�m� �b�b�Va�b�b�b�K� �X�X�K�X�X�X� X� X�R�R�Q�R�R� X� �5�*� %� %����� V� V�PU� V� V� V�E� �%�=�=�D��s�5�z�z�!2�!2�3�3� 3� 3� 3�5s� 3� 3� 3� �d�d�d�d�E�7�T[�]b�Cc�d�d�d� d� d�M�M�L�M�M� d� � *� *�4�;��w��Q\�^c� d� d�F� �9�V�R�_�R�]�4�9�;O�Q\�-]�-]�^�^� _� _�_rc��t||d���}tj|�|j|��|j��Sr�)r�r*r+�create_advancerkr])r�r�r#s r�advancer�Ps>��#�G�W�%�H�H�H�G� �9�W�+�+�D�K��A�A�4�9� M� M�Mr)r!r"r#r$r%r&)r3r4r5r4r%r4)r3r4r5r4rBrCr%r4)rr4rr4rQrCr%rR)FFTF)rXr&rYr&r#r$r%rZ)rgr&rhr&r#r$r%r&) rgr&rhr&r�rCr#r$r%r&)r�r&r�r&r�r�r#r$) r�r&r�r&r�r&r�r�r#r$)rgr&rhr&r#r$r%rZ)rgr&r#r$)rgr&r%r&)rgr&r#r$r%r&)rgr&r#r&r%r&)r�r&r%r�)rr"rr"r#r$r%r&)r�rr9r4r#r$r%r&)rr&r�rr#r$r%r&) rgr&rrrrCr#r$r%r&)rgr&r!r"r#r$r%r&) rXr&rYr&rrCr#r$r%r&)r/r&r0r&r#r$r%r&)r/r&r#r$r%rZ)rgr&r?r@r#r$r%r&)rgr&r�rr#r$r%r&)rXr&rYr&r#r$r%r&)r_r`)rgr&rhr4r#r$r%r&rT) rgr&rhr4r#r$rqr`r%r&)r�r&r�r�rhr�r�rr�rVr�rVr�rVr�rCr#r$r%r&) r�r&r�rVr�rVr#r$r%r&)r�r&rr&r#r$r%r&)r�r&r�r&r�r�r�rVr�rVr#r$r%r&)r�r&rr&r�r&r�rVr�rVr#r$r%r&) r�r&r�r&r�r&r rVr#r$r%r)r�r&r�r&r�r&r�rVr�rVr#r$r%r&)rXr&rYr&rPr&rNr`rQr"rRr4r#r$r%r&) rxr&r�r&r�r&r#r$r%r&)r~rr!r"r#r$r%r�) r~rr!r"r�rCr#r$r%r�)rgr&r�r"r#r$r%r&)r�r&r�rr%r&)r#r$r%r&) r�rVr�r�r�rCr#r$r%r&) r�r&r�rVr�rVr�r"r#r$r%r&)T)r�r&r#r$r%r&)l� __future__r�typingrrrrr� _C.libtritonr �r r*r r� Exceptionrr/r2rArPrWrfrqrxr}r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�rsr�r�r�r�r�r�r�rrrrr!r*r.r9r>rIrSr\rgr�r_r�r�r�r�r�r�r�r�r�r�r�r�r�rrr rr,r0r4r9r>rBrFrOrwrr}r�r�r�r�r�r�r�r�r�r�r�r�r�r�rr�<module>r�sW��"�"�"�"�"�"�;�;�;�;�;�;�;�;�;�;�;�;�;�;������������������� �G�C�L�L��F�F�F�F�F� �F�F�F�D�D�D�D� F�F�F�F� @� @� @� @� ,� ,� ,� ,�P <� <� <� <�ej�GK�,1������":�:�:�:�0 4� 4� 4� 4� 4� 4� 4� 4�R�R�R�R�4 :� :� :� :�&�&�&�&�4�4�4�4�65�5�5�5�$5�5�5�5�$ ]� ]� ]� ]�" � � � �Q�Q�Q�Q� P�P�P�P� Q�Q�Q�Q� '�'�'�'�&�&�&�&�"�"�"�"� R�R�R�R� R�R�R�R� Q�Q�Q�Q�����#�#�#�#�$�$�$�$�)�)�)�)� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4� 4�"D�D�D�D�"(�(�(�(�.H�H�H�H�[�[�[�[�M�M�M�M�K�K�K�K�����2 � � � �I�I�I�I�L�L�L�L�$2�2�2�2�tp�p�p�p�Z�Z�Z�Z�(04�o5�o5�o5�o5�o5�n � � � � � � � � � � � � � � � � � � � � �w�w�w�,7=�7=�7=�t n� n� n� n� ����g�g�g�g� ���:)q�)q�)q�XW�W�W�W�,j�j�j�j�����6$)�$)�$)�$)�N$)�$)�$)�$)�No�o�o�o�����������������8�8�8�X�X�X�X�@ Z� Z� Z� Z�& � � �u�u�u�u�.o�o�o�o�2n�n�n�n� � � � � � � � � � � � �8�8�8�8� K� K� K� K�e�e�e�e�U�U�U�0H�H�H�H� $`�$`�$`�$`�NN�N�N�N�N�Nr
Memory