import numpy as np


def _number_of_shards_in_gen_kwargs(gen_kwargs: dict) -> int:
    """Return the number of possible shards according to the input gen_kwargs"""
    # Lists of different lengths make sharding ambiguous: raise an error in this case
    lists_lengths = {key: len(value) for key, value in gen_kwargs.items() if isinstance(value, list)}
    if len(set(lists_lengths.values())) > 1:
        raise RuntimeError(
            (
                "Sharding is ambiguous for this dataset: "
                + "we found several data sources lists of different lengths, and we don't know over which list we should parallelize:\n"
                + "\n".join(f"\t- key {key} has length {length}" for key, length in lists_lengths.items())
                + "\nTo fix this, check the 'gen_kwargs' and make sure to use lists only for data sources, "
                + "and use tuples otherwise. In the end there should only be one single list, or several lists with the same length."
            )
        )
    max_length = max(lists_lengths.values(), default=0)
    return max(1, max_length)

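
# Example (illustrative, with made-up keys): only list values in gen_kwargs count as
# shardable data sources, so
#   _number_of_shards_in_gen_kwargs({"files": ["0.txt", "1.txt", "2.txt"], "encoding": "utf-8"})
# returns 3, while a gen_kwargs without any list yields a single shard (1).
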
def _distribute_shards(num_shards: int, max_num_jobs: int) -> list[range]:
    """
    Get the range of shard indices per job.
    If num_shards < max_num_jobs, then num_shards jobs are given a range of one shard.
    The order of the shard indices is preserved: e.g. all the first shards are given to the first job.
    Moreover, all the jobs are given approximately the same number of shards.

    Example:

    ```python
    >>> _distribute_shards(2, max_num_jobs=4)
    [range(0, 1), range(1, 2)]
    >>> _distribute_shards(10, max_num_jobs=3)
    [range(0, 4), range(4, 7), range(7, 10)]
    ```
    """
    shards_indices_per_group = []
    for group_idx in range(max_num_jobs):
        # Spread shards as evenly as possible: the first (num_shards % max_num_jobs) jobs get one extra shard
        num_shards_to_add = num_shards // max_num_jobs + (group_idx < num_shards % max_num_jobs)
        if num_shards_to_add == 0:
            break
        start = shards_indices_per_group[-1].stop if shards_indices_per_group else 0
        shard_indices = range(start, start + num_shards_to_add)
        shards_indices_per_group.append(shard_indices)
    return shards_indices_per_group
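
# Usage sketch (illustrative, with hypothetical file names): the shard count derived
# from gen_kwargs is what typically gets spread over the available jobs, e.g.
#   num_shards = _number_of_shards_in_gen_kwargs({"files": [f"{i}.txt" for i in range(10)]})  # -> 10
#   _distribute_shards(num_shards, max_num_jobs=3)  # -> [range(0, 4), range(4, 7), range(7, 10)]
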