Metadata-Version: 2.1
Name: nvidia-cusparselt-cu12
Version: 0.6.2
Summary: NVIDIA cuSPARSELt
Home-page: https://developer.nvidia.com/cusparselt
Author: NVIDIA Corporation
Author-email: cuda_installer@nvidia.com
License: NVIDIA Proprietary Software
Keywords: cuda,nvidia,machine learning,high-performance computing
Classifier: Topic :: Scientific/Engineering
Classifier: Environment :: GPU :: NVIDIA CUDA
Classifier: Environment :: GPU :: NVIDIA CUDA :: 12
Description-Content-Type: text/x-rst

###################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
###################################################################################

**NVIDIA cuSPARSELt** is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

    D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \cdot scale

where :math:`op(A)/op(B)` refers to in-place operations such as transpose/non-transpose, and :math:`\alpha, \beta, scale` are scalars.

The *cuSPARSELt APIs* allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
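To make the operation above concrete, the following is a minimal plain-Python reference sketch of the formula. It is not the cuSPARSELt API and does not run on the GPU. The function name `matmul_reference`, its parameters, and the helper functions are hypothetical names chosen for illustration. Two simplifying assumptions are made: `op(C)` is taken as the identity, and `bias` holds one value per output row.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def relu(x):
    """Example activation function."""
    return max(0.0, x)


def matmul_reference(A, B, C, alpha=1.0, beta=0.0, bias=None, scale=1.0,
                     trans_a=False, trans_b=False, activation=None):
    """Compute D = Activation(alpha*op(A)*op(B) + beta*C + bias) * scale."""
    op_a = transpose(A) if trans_a else A
    op_b = transpose(B) if trans_b else B
    m, k, n = len(op_a), len(op_b), len(op_b[0])
    if len(op_a[0]) != k:
        raise ValueError("inner dimensions of op(A) and op(B) do not match")
    D = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(op_a[i][p] * op_b[p][j] for p in range(k))
            val = alpha * acc + beta * C[i][j]
            if bias is not None:
                val += bias[i]  # assumption: one bias value per output row
            if activation is not None:
                val = activation(val)
            row.append(val * scale)
        D.append(row)
    return D


if __name__ == "__main__":
    A = [[1, 0, 2, 0], [0, 3, 0, 4]]      # operand containing zeros
    B = [[1, 2], [3, 4], [5, 6], [7, 8]]
    C = [[1, 1], [1, 1]]
    print(matmul_reference(A, B, C, alpha=1.0, beta=1.0, bias=[-20, 0],
                           scale=0.5, activation=relu))
```

A reference like this can be useful for checking small results on the CPU; the library itself performs the same computation with sparse tensor cores.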
**Download:** `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

**Provide Feedback:** `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com?subject=cuSPARSELt-Feedback>`_

**Examples**:
`cuSPARSELt Example 1 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul>`_,
`cuSPARSELt Example 2 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul_advanced>`_

**Blog post**:

- `Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt <https://developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt/>`_
- `Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines <https://developer.nvidia.com/blog/structured-sparsity-in-the-nvidia-ampere-architecture-and-applications-in-search-engines/>`__
- `Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture <https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31552/>`__

================================================================================
Key Features
================================================================================

* *NVIDIA Sparse MMA tensor core* support
* Mixed-precision computation support:

  +--------------+----------------+-----------------+-------------+
  | Input A/B    | Input C        | Output D        | Compute     |
  +==============+================+=================+=============+
  | `FP32`       | `FP32`         | `FP32`          | `FP32`      |
  +--------------+----------------+-----------------+-------------+
  | `FP16`       | `FP16`         | `FP16`          | `FP32`      |
  +              +                +                 +-------------+
  |              |                |                 | `FP16`      |
  +--------------+----------------+-----------------+-------------+
  | `BF16`       | `BF16`         | `BF16`          | `FP32`      |
  +--------------+----------------+-----------------+-------------+
  | `INT8`       | `INT8`         | `INT8`          | `INT32`     |
  +              +----------------+-----------------+             +
  |              | `INT32`        | `INT32`         |             |
  +              +----------------+-----------------+             +
  |              | `FP16`         | `FP16`          |             |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `BF16`          |             |
  +--------------+----------------+-----------------+-------------+
  | `E4M3`       | `FP16`         | `E4M3`          | `FP32`      |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `E4M3`          |             |
  +              +----------------+-----------------+             +
  |              | `FP16`         | `FP16`          |             |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `BF16`          |             |
  +              +----------------+-----------------+             +
  |              | `FP32`         | `FP32`          |             |
  +--------------+----------------+-----------------+-------------+
  | `E5M2`       | `FP16`         | `E5M2`          | `FP32`      |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `E5M2`          |             |
  +              +----------------+-----------------+             +
  |              | `FP16`         | `FP16`          |             |
  +              +----------------+-----------------+             +
  |              | `BF16`         | `BF16`          |             |
  +              +----------------+-----------------+             +
  |              | `FP32`         | `FP32`          |             |
  +--------------+----------------+-----------------+-------------+

* Matrix pruning and compression functionalities
* Activation functions, bias vector, and output scaling
* Batched computation (multiple matrices in a single run)
* GEMM Split-K mode
* Auto-tuning functionality (see `cusparseLtMatmulSearch()`)
* NVTX ranging and Logging functionalities

================================================================================
Support
================================================================================

* *Supported SM Architectures*: `SM 8.0`, `SM 8.6`, `SM 8.9`, `SM 9.0`
* *Supported CPU architectures and operating systems*:

  +------------+--------------------+
  | OS         | CPU archs          |
  +============+====================+
  | `Windows`  | `x86_64`           |
  +------------+--------------------+
  | `Linux`    | `x86_64`, `Arm64`  |
  +------------+--------------------+

================================================================================
Documentation
================================================================================

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.
================================================================================
Installation
================================================================================

The cuSPARSELt wheel can be installed as follows:

.. code-block:: bash

    pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version (currently, only CUDA 12 is supported).
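After installation, one way to confirm that the wheel is present is to query the installed distribution from Python. The sketch below uses only the standard `importlib.metadata` module and the distribution name `nvidia-cusparselt-cu12` from the `Name` field of this package's metadata. The helper name `find_shared_libs` is hypothetical, and the file-name filter is an assumption, since the layout of the wheel is not described here.

```python
from importlib import metadata


def find_shared_libs(dist_name="nvidia-cusparselt-cu12"):
    """Return paths of shared-library files shipped by an installed wheel.

    Returns an empty list if the distribution is not installed.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return []
    files = dist.files or []  # dist.files is None when no file list is recorded
    libs = []
    for f in files:
        name = f.name.lower()
        # Assumption: the library file name contains "cusparselt" and is a
        # .so (Linux) or .dll (Windows) file.
        if "cusparselt" in name and (".so" in name or name.endswith(".dll")):
            libs.append(str(f.locate()))
    return libs


if __name__ == "__main__":
    found = find_shared_libs()
    if found:
        print("cuSPARSELt libraries found:")
        for path in found:
            print("  " + path)
    else:
        print("nvidia-cusparselt-cu12 not installed or no library files listed")
```

An empty result only means that the distribution metadata lists no matching files; it does not by itself show that the library cannot be loaded from another location, such as a system-wide installation.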