� ���gw���ddlZdedefd�Zdededeefd�Zdededeefd�Zd eedefd �Z d ej j dedefd �Z dS) �N� gen_kwargs�returnc��d�|���D��}tt|�������dkrGt dd�d�|���D����zdzdz���t |���d� ��}t d|��S) zFReturn the number of possible shards according to the input gen_kwargsc�^�i|]*\}}t|t���|t|����+S��� isinstance�list�len)�.0�key�values �g/home/asafur/pinokio/api/open-webui.git/app/env/lib/python3.11/site-packages/datasets/utils/sharding.py� <dictcomp>z3_number_of_shards_in_gen_kwargs.<locals>.<dictcomp>s5��e�e�e���e�Z�X]�_c�Md�Md�e�S�#�e�*�*�e�e�e��z�Sharding is ambiguous for this dataset: we found several data sources lists of different lengths, and we don't know over which list we should parallelize: � c3�,K�|]\}}d|�d|��V��dS)z - key z has length Nr)r r �lengths r� <genexpr>z2_number_of_shards_in_gen_kwargs.<locals>.<genexpr> s7����f�f���f�<�3�<�<�F�<�<�f�f�f�f�f�frzW To fix this, check the 'gen_kwargs' and make sure to use lists only for data sources, zqand use tuples otherwise. In the end there should only be one single list, or several lists with the same length.r)�default)�itemsr �set�values� RuntimeError�join�max)r� lists_lengths� max_lengths r�_number_of_shards_in_gen_kwargsr s���f�e�z�7G�7G�7I�7I�e�e�e�M� �3�}�#�#�%�%� &� &�'�'�!�+�+�� E��i�i�f�f�P]�Pc�Pc�Pe�Pe�f�f�f�f�f� g�i� i�B�  B� � � ��]�)�)�+�+�Q�7�7�7�J� �q�*� � �r� num_shards� max_num_jobsc���g}t|��D]R}||z|||zkz}|dkrn:|r |djnd}t|||z��}|�|���S|S)a� Get the range of shard indices per job. If num_shards<max_num_jobs, then num_shards jobs are given a range of one shard. The shards indices order is preserved: e.g. all the first shards are given the first job. Moreover all the jobs are given approximately the same number of shards. Example: ```python >>> _distribute_shards(2, max_num_jobs=4) [range(0, 1), range(1, 2)] >>> _distribute_shards(10, max_num_jobs=3) [range(0, 4), range(4, 7), range(7, 10)] ``` r�����)�range�stop�append)r!r"�shards_indices_per_group� group_idx�num_shards_to_add�start� shard_indicess r�_distribute_shardsr-s��� "���<�(�(�7�7� �&�,�6�)�z�T`�G`�:a�b�� �� !� !� �E�5M�T�(��,�1�1�ST���e�U�->�%>�?�?� � �'�'� �6�6�6�6� #�#rc�����t���}|dkrt���gSt||������fd�tt �����D��S)z2Split the gen_kwargs into `max_num_job` gen_kwargsr)r!r"c�T���g|]#���fd�����D����$S)c�p���i|]1\}�|t�t��r�fd���D��n���2S)c� ��g|] }�|�� Srr)r � shard_idxrs �r� <listcomp>z;_split_gen_kwargs.<locals>.<listcomp>.<dictcomp>.<listcomp>:s���[�[�[�9�e�I�&�[�[�[r�r r )r r rr)�shard_indices_per_groups @��rrz0_split_gen_kwargs.<locals>.<listcomp>.<dictcomp>9sf���� � � ��C����e�T�*�*��[�[�[�[�8O�PY�8Z�[�[�[�[�� � � r)r)r r)rr5s @��rr3z%_split_gen_kwargs.<locals>.<listcomp>8s`���� � � ��  � � � � �#-�"2�"2�"4�"4�  � � � � � r)r �dictr-r%r )rr"r!r5s` @r�_split_gen_kwargsr70s�����1��<�<�J��Q����Z� � �!�!�"4� �Ye�"f�"f�"f�� � � � � �#�3�'>�#?�#?�@�@� � � � r�gen_kwargs_listc�,���fd��dD��S)Nc����i|]@��t�d�t��r�fd��D��n �d���AS)rc�*��g|]}|�D]}|���Srr)r rrr s �rr3z0_merge_gen_kwargs.<locals>.<dictcomp>.<listcomp>Es*��� S� S� S� �:�c�?� S� S�%�e� S� S� S� Srr4)r r r8s @�rrz%_merge_gen_kwargs.<locals>.<dictcomp>Dso���� � � � � � �o�a�(��-�t� 4� 4�%� S� S� S� S�o� S� S� S� S� �Q� �� $� � � rrr)r8s`r�_merge_gen_kwargsr<Cs5��� � � � �#�1�%�  � � �r�rngc���d�|���D��}i}|D]<}tt|����||<|�||���=t |��}|���D]>\}�t �t��r$�fd�|t���D��||<�?|S)z.Return a shuffled copy of the input gen_kwargsc�V�h|]&}t|t���t|����'Srr)r rs r� <setcomp>z&_shuffle_gen_kwargs.<locals>.<setcomp>Rs.��Y�Y�Y���E�SW�AX�AX�Y�#�e�*�*�Y�Y�Yrc� ��g|] }�|�� Srr)r �irs �rr3z'_shuffle_gen_kwargs.<locals>.<listcomp>[s���#S�#S�#S��E�!�H�#S�#S�#Sr)rr r%�shuffler6rr r )r=r� list_sizes�indices_per_size�size�shuffled_kwargsr rs @r�_shuffle_gen_kwargsrHLs���� Z�Y�*�*;�*;�*=�*=�Y�Y�Y�J����,�,��!%�e�D�k�k�!2�!2���� � � �$�T�*�+�+�+�+��:�&�&�O�%�+�+�-�-�T�T� ��U� �e�T� "� "� T�#S�#S�#S�#S�6F�s�5�z�z�6R�#S�#S�#S�O�C� �� �r) �numpy�npr6�intr r r%r-r7r<�random� GeneratorrHrrr�<module>rNs���������������"$�3�$�c�$�d�5�k�$�$�$�$�6 �$� �c� �d�4�j� � � � �&�t�D�z��d������R�Y�0��d��t������r
Memory