import math
from types import MethodType
from typing import Any, Optional, Union

from .state import PartialState
from .utils import (
    calculate_maximum_sizes,
    convert_bytes,
    copy_tensor_to_devices,
    ignorant_find_batch_size,
    infer_auto_device_map,
    is_pippy_available,
    pad_input_tensors,
    send_to_device,
)


def generate_device_map(model, num_processes: int = 1, no_split_module_classes=None, max_memory: dict = None):
    """
    Calculates the device map for `model` with an offset for PiPPy
    """
    if num_processes == 1:
        return infer_auto_device_map(model, no_split_module_classes=no_split_module_classes, clean_result=False)
    if max_memory is None:
        model_size, shared = calculate_maximum_sizes(model)

        # Split into half for the memory to use
        memory = (model_size + shared[0]) / num_processes
        memory = convert_bytes(memory)
        value, ending = memory.split(" ")

        # Add a chunk to deal with potential extra shared memory instances
        memory = math.ceil(float(value)) * 1.1
        memory = f"{memory} {ending}"
        max_memory = {i: memory for i in range(num_processes)}

    device_map = infer_auto_device_map(
        model,
        max_memory=max_memory,
        no_split_module_classes=no_split_module_classes,
        clean_result=False,
    )
    return device_map


def find_pippy_batch_size(args, kwargs):
    found_batch_size = None
    if args is not None:
        for arg in args:
            found_batch_size = ignorant_find_batch_size(arg)
            if found_batch_size is not None:
                break
    if kwargs is not None and found_batch_size is None:
        for kwarg in kwargs.values():
            found_batch_size = ignorant_find_batch_size(kwarg)
            if found_batch_size is not None:
                break
    return found_batch_size


def build_pipeline(model, split_points, args, kwargs, num_chunks):
    """
    Attaches the split points to the model based on `self.device_map` and generates a `PipelineStage`. Requires passing
    in needed `args` and `kwargs` as the model needs on the CPU.

    Users can pass in custom `num_chunks` as an optional hyper-parameter. By default will use
    `AcceleratorState.num_processes`
    """
    from torch.distributed.pipelining import ScheduleGPipe, SplitPoint, pipeline

    # Annotate the split points in the model for PiPPy
    state = PartialState()
    split_spec = {split_point: SplitPoint.BEGINNING for split_point in split_points}
    pipe = pipeline(
        model,
        mb_args=args,
        mb_kwargs=kwargs,
        split_spec=split_spec,
    )
    stage = pipe.build_stage(state.local_process_index, device=state.device)
    schedule = ScheduleGPipe(stage, num_chunks)

    return schedule