import numpy as np


def approximate_mode(class_counts, n_draws, rng):
    """Computes approximate mode of multivariate hypergeometric.

    This is an approximation to the mode of the multivariate
    hypergeometric given by class_counts and n_draws.
    It shouldn't be off by more than one.
    It is the most likely outcome of drawing n_draws many
    samples from the population given by class_counts.

    Args
    ----------
    class_counts : ndarray of int
        Population per class.
    n_draws : int
        Number of draws (samples to draw) from the overall population.
    rng : random state
        Used to break ties.

    Returns
    -------
    sampled_classes : ndarray of int
        Number of samples drawn from each class.
        np.sum(sampled_classes) == n_draws
    """
    # Ideal (fractional) number of draws per class.
    continuous = n_draws * class_counts / class_counts.sum()
    # Flooring never overshoots n_draws, but usually undershoots it.
    floored = np.floor(continuous)
    # Hand out the missing draws by largest fractional remainder.
    need_to_add = int(n_draws - floored.sum())
    if need_to_add > 0:
        remainder = continuous - floored
        values = np.sort(np.unique(remainder))[::-1]
        # Go from the largest remainder down; ties are broken at random.
        for value in values:
            (inds,) = np.where(remainder == value)
            # Take every tied class if enough draws are left,
            # otherwise pick a random subset of them.
            add_now = min(len(inds), need_to_add)
            inds = rng.choice(inds, size=add_now, replace=False)
            floored[inds] += 1
            need_to_add -= add_now
            if need_to_add == 0:
                break
    return floored.astype(np.int64)


def stratified_shuffle_split_generate_indices(y, n_train, n_test, rng, n_splits=10):
    """Provides train/test indices to split data in train/test sets.

    Adapted from the StratifiedShuffleSplit implementation
    of the scikit-learn library.

    Args
    ----------
    y : array-like
        Class labels to stratify on.
    n_train : int
        Represents the absolute number of train samples.
    n_test : int
        Represents the absolute number of test samples.
    rng : random state
        Controls the randomness of the training and testing indices produced.
        Must provide `choice` and `permutation`; seed it for reproducible
        output across multiple function calls.
    n_splits : int, default=10
        Number of re-shuffling & splitting iterations.
    """
    classes, y_indices = np.unique(y, return_inverse=True)
    n_classes = classes.shape[0]
    class_counts = np.bincount(y_indices)
    if np.min(class_counts) < 2:
        raise ValueError("Minimum class count error")
    if n_train < n_classes:
        raise ValueError(
            "The train_size = %d should be greater or equal to the number of classes = %d" % (n_train, n_classes)
        )
    if n_test < n_classes:
        raise ValueError(
            "The test_size = %d should be greater or equal to the number of classes = %d" % (n_test, n_classes)
        )
    # Group the sample indices by class; mergesort keeps the original order within each class.
    class_indices = np.split(np.argsort(y_indices, kind="mergesort"), np.cumsum(class_counts)[:-1])
    for _ in range(n_splits):
        # Per-class sample counts for train, then for test from what remains.
        n_i = approximate_mode(class_counts, n_train, rng)
        class_counts_remaining = class_counts - n_i
        t_i = approximate_mode(class_counts_remaining, n_test, rng)

        train = []
        test = []

        for i in range(n_classes):
            permutation = rng.permutation(class_counts[i])
            perm_indices_class_i = class_indices[i].take(permutation, mode="clip")
            train.extend(perm_indices_class_i[: n_i[i]])
            test.extend(perm_indices_class_i[n_i[i] : n_i[i] + t_i[i]])
        train = rng.permutation(train)
        test = rng.permutation(test)

        yield train, test