scikit-cuda

scikit-cuda provides Python interfaces to many of the functions in the CUDA device/runtime, CUBLAS, CUFFT, and CUSOLVER libraries distributed as part of NVIDIA’s CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. It provides both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and SciPy.

Python wrappers for cuDNN by Hannes Bretschneider are available here.

Installation

Quick Installation

If you have pip installed, you should be able to install the latest stable release of scikit-cuda by running the following:

pip install scikit-cuda

All dependencies should be automatically downloaded and installed if they are not already on your system.
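To confirm that the installation succeeded, one option is to ask Python's packaging metadata for the installed version. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name `installed_version` is hypothetical and is not part of scikit-cuda.

```python
from importlib import metadata


def installed_version(dist_name="scikit-cuda"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("scikit-cuda is not installed in this environment")
    else:
        print("scikit-cuda", version)
```

This only checks that pip registered the package; it does not verify that the CUDA libraries can be loaded.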

Obtaining the Latest Software

The latest stable and development versions of scikit-cuda can be downloaded from GitHub.

Online documentation is available at https://scikit-cuda.readthedocs.org

Installation Dependencies

scikit-cuda requires that the following software packages be installed:

Note that both Python and the CUDA Toolkit must be built for the same architecture, i.e., Python compiled for a 32-bit architecture will not find the libraries provided by a 64-bit CUDA installation. CUDA versions from 7.0 onwards are 64-bit.
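If you are unsure which architecture your Python interpreter was built for, the pointer size reported by the standard library gives a quick answer. This is a small sketch; the function name `interpreter_bits` is an illustrative choice, not something scikit-cuda provides.

```python
import platform
import struct


def interpreter_bits():
    """Return the pointer width of the running Python interpreter in bits."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


print("Python is a %d-bit build running on %s"
      % (interpreter_bits(), platform.machine()))
```

A result of 32 with a CUDA 7.0 or later installation would indicate the mismatch described above.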

To run the unit tests, the following packages are also required:

Some of the linear algebra functionality relies on the CULA toolkit; as of 2017, it is available to premium tier users of E.M. Photonics’ HPC site Celerity Tools:

  • CULA R16a or later.

To build the documentation, the following packages are also required:

Platform Support

The software has been developed and tested on Linux; it should also work on other Unix-like platforms supported by the above packages. Parts of the package may work on Windows as well, but remain untested.

Building and Installation

scikit-cuda searches for the CUDA libraries in the system library search path when it is imported. If the libraries are not found, you may have to modify this path: on Linux, either add the directory containing the CUDA libraries to /etc/ld.so.conf and run ldconfig as root, or add it to the LD_LIBRARY_PATH environment variable; on Mac OS X, add it to DYLD_LIBRARY_PATH.
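As a rough diagnostic, you can ask the standard library which CUDA shared libraries it can locate on the current search path. The sketch below uses `ctypes.util.find_library`; this is not necessarily the lookup that scikit-cuda itself performs, and the helper name `find_cuda_libraries` and the default list of library names are assumptions made for illustration.

```python
import ctypes.util


def find_cuda_libraries(names=("cudart", "cublas", "cufft", "cusolver")):
    """Map each library name to the file found for it, or None if not found."""
    return {name: ctypes.util.find_library(name) for name in names}


for name, path in find_cuda_libraries().items():
    status = path if path is not None else "NOT FOUND"
    print("%-10s %s" % (name, status))
```

A library reported as not found here is a hint that the search path may need adjusting before scikit-cuda is imported.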

To build and install the toolbox, download and unpack the source release and run:

python setup.py install

from within the main directory in the release. To rebuild the documentation, run:

python setup.py build_sphinx

Running the Unit Tests

To run all of the package unit tests, download and unpack the package source tarball and run:

python setup.py test

from within the main directory in the archive. Tests for individual modules (found in the tests/ subdirectory) can also be run directly.

Getting Started

The functions provided by scikit-cuda are grouped into several submodules in the skcuda namespace package. Sample code demonstrating how to use different parts of the toolbox is located in the demos/ subdirectory of the source release. Many of the high-level functions also contain doctests that describe their usage.
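As an illustration of the high-level interface, the following sketch multiplies two matrices on the GPU with `skcuda.linalg.dot` and compares the result with NumPy. It assumes that NumPy, PyCUDA, scikit-cuda, and a CUDA-capable device are available; the wrapper name `gpu_matrix_product` is hypothetical, and the imports are deferred into the function so that the file can still be loaded on a machine without a GPU.

```python
def gpu_matrix_product(a, b):
    """Multiply two NumPy matrices on the GPU and return a NumPy array."""
    # Deferred imports: these need PyCUDA, scikit-cuda, and a CUDA device.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.gpuarray as gpuarray
    import skcuda.linalg as linalg

    linalg.init()                      # initialize the libraries used by skcuda.linalg
    a_gpu = gpuarray.to_gpu(a)         # copy the operands to device memory
    b_gpu = gpuarray.to_gpu(b)
    c_gpu = linalg.dot(a_gpu, b_gpu)   # matrix product computed on the GPU
    return c_gpu.get()                 # copy the result back to the host


if __name__ == "__main__":
    try:
        import numpy as np

        a = np.asarray(np.random.rand(4, 2), np.float32)
        b = np.asarray(np.random.rand(2, 3), np.float32)
        print("Matches NumPy:",
              np.allclose(np.dot(a, b), gpu_matrix_product(a, b), atol=1e-5))
    except ImportError as exc:
        print("GPU stack not available:", exc)
```

The scripts in the demos/ subdirectory remain the authoritative usage examples; this sketch only shows the general pattern of moving data to the device, calling a high-level routine, and fetching the result.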

Reference

Library Wrapper Routines

CUBLAS Routines
Helper Routines
cublasCheckStatus Raise CUBLAS exception
cublasCreate Initialize CUBLAS.
cublasDestroy Release CUBLAS resources.
cublasGetCurrentCtx Get current CUBLAS context.
cublasGetStream Get current CUBLAS library stream.
cublasGetVersion Get CUBLAS version.
cublasSetStream Set current CUBLAS library stream.
Wrapper Routines
Single Precision BLAS1 Routines
cublasIsamax Index of maximum magnitude element.
cublasIsamin Index of minimum magnitude element (single precision real).
cublasSasum Sum of absolute values of single precision real vector.
cublasSaxpy Vector addition (single precision real).
cublasScopy Vector copy (single precision real)
cublasSdot Vector dot product (single precision real)
cublasSnrm2 Euclidean norm (2-norm) of real vector.
cublasSrot Apply a real rotation to real vectors (single precision)
cublasSrotg Construct a single precision real Givens rotation matrix.
cublasSrotm Apply a single precision real modified Givens rotation.
cublasSrotmg Construct a single precision real modified Givens rotation matrix.
cublasSscal Scale a single precision real vector by a single precision real scalar.
cublasSswap Swap single precision real vectors.
cublasCaxpy Vector addition (single precision complex).
cublasCcopy Vector copy (single precision complex)
cublasCdotc Vector dot product (single precision complex)
cublasCdotu Vector dot product (single precision complex)
cublasCrot Apply a complex rotation to complex vectors (single precision)
cublasCrotg Construct a single precision complex Givens rotation matrix.
cublasCscal Scale a single precision complex vector by a single precision complex scalar.
cublasCsrot Apply a real rotation to complex vectors (single precision)
cublasCsscal Scale a single precision complex vector by a single precision real scalar.
cublasCswap Swap single precision complex vectors.
cublasIcamax Index of maximum magnitude element.
cublasIcamin Index of minimum magnitude element (single precision complex).
cublasScasum Sum of absolute values of single precision complex vector.
cublasScnrm2 Euclidean norm (2-norm) of single precision complex vector.
Double Precision BLAS1 Routines
cublasIdamax Index of maximum magnitude element.
cublasIdamin Index of minimum magnitude element (double precision real).
cublasDasum Sum of absolute values of double precision real vector.
cublasDaxpy Vector addition (double precision real).
cublasDcopy Vector copy (double precision real)
cublasDdot Vector dot product (double precision real)
cublasDnrm2 Euclidean norm (2-norm) of real vector.
cublasDrot Apply a real rotation to real vectors (double precision)
cublasDrotg Construct a double precision real Givens rotation matrix.
cublasDrotm Apply a double precision real modified Givens rotation.
cublasDrotmg Construct a double precision real modified Givens rotation matrix.
cublasDscal Scale a double precision real vector by a double precision real scalar.
cublasDswap Swap double precision real vectors.
cublasDzasum Sum of absolute values of double precision complex vector.
cublasDznrm2 Euclidean norm (2-norm) of double precision complex vector.
cublasIzamax Index of maximum magnitude element.
cublasIzamin Index of minimum magnitude element (double precision complex).
cublasZaxpy Vector addition (double precision complex).
cublasZcopy Vector copy (double precision complex)
cublasZdotc Vector dot product (double precision complex)
cublasZdotu Vector dot product (double precision complex)
cublasZdrot Apply a real rotation to complex vectors (double precision)
cublasZdscal Scale a double precision complex vector by a double precision real scalar.
cublasZrot Apply a complex rotation to complex vectors (double precision)
cublasZrotg Construct a double precision complex Givens rotation matrix.
cublasZscal Scale a double precision complex vector by a double precision complex scalar.
cublasZswap Swap double precision complex vectors.
Single Precision BLAS2 Routines
cublasSgbmv Matrix-vector product for real single precision general banded matrix.
cublasSgemv Matrix-vector product for real single precision general matrix.
cublasSger Rank-1 operation on real single precision general matrix.
cublasSsbmv Matrix-vector product for real single precision symmetric-banded matrix.
cublasSspmv Matrix-vector product for real single precision symmetric packed matrix.
cublasSspr Rank-1 operation on real single precision symmetric packed matrix.
cublasSspr2 Rank-2 operation on real single precision symmetric packed matrix.
cublasSsymv Matrix-vector product for real symmetric matrix.
cublasSsyr Rank-1 operation on real single precision symmetric matrix.
cublasSsyr2 Rank-2 operation on real single precision symmetric matrix.
cublasStbmv Matrix-vector product for real single precision triangular banded matrix.
cublasStbsv Solve real single precision triangular banded system with one right-hand side.
cublasStpmv Matrix-vector product for real single precision triangular packed matrix.
cublasStpsv Solve real triangular packed system with one right-hand side.
cublasStrmv Matrix-vector product for real single precision triangular matrix.
cublasStrsv Solve real triangular system with one right-hand side.
cublasCgbmv Matrix-vector product for complex single precision general banded matrix.
cublasCgemv Matrix-vector product for complex single precision general matrix.
cublasCgerc Rank-1 operation on complex single precision general matrix.
cublasCgeru Rank-1 operation on complex single precision general matrix.
cublasChbmv Matrix-vector product for single precision Hermitian banded matrix.
cublasChemv Matrix vector product for single precision Hermitian matrix.
cublasCher Rank-1 operation on single precision Hermitian matrix.
cublasCher2 Rank-2 operation on single precision Hermitian matrix.
cublasChpmv Matrix-vector product for single precision Hermitian packed matrix.
cublasChpr Rank-1 operation on single precision Hermitian packed matrix.
cublasChpr2 Rank-2 operation on single precision Hermitian packed matrix.
cublasCtbmv Matrix-vector product for complex single precision triangular banded matrix.
cublasCtbsv Solve complex single precision triangular banded system with one right-hand side.
cublasCtpmv Matrix-vector product for complex single precision triangular packed matrix.
cublasCtpsv Solve complex single precision triangular packed system with one right-hand side.
cublasCtrmv Matrix-vector product for complex single precision triangular matrix.
cublasCtrsv Solve complex single precision triangular system with one right-hand side.
Double Precision BLAS2 Routines
cublasDgbmv Matrix-vector product for real double precision general banded matrix.
cublasDgemv Matrix-vector product for real double precision general matrix.
cublasDger Rank-1 operation on real double precision general matrix.
cublasDsbmv Matrix-vector product for real double precision symmetric-banded matrix.
cublasDspmv Matrix-vector product for real double precision symmetric packed matrix.
cublasDspr Rank-1 operation on real double precision symmetric packed matrix.
cublasDspr2 Rank-2 operation on real double precision symmetric packed matrix.
cublasDsymv Matrix-vector product for real double precision symmetric matrix.
cublasDsyr Rank-1 operation on real double precision symmetric matrix.
cublasDsyr2 Rank-2 operation on real double precision symmetric matrix.
cublasDtbmv Matrix-vector product for real double precision triangular banded matrix.
cublasDtbsv Solve real double precision triangular banded system with one right-hand side.
cublasDtpmv Matrix-vector product for real double precision triangular packed matrix.
cublasDtpsv Solve real double precision triangular packed system with one right-hand side.
cublasDtrmv Matrix-vector product for real double precision triangular matrix.
cublasDtrsv Solve real double precision triangular system with one right-hand side.
cublasZgbmv Matrix-vector product for complex double precision general banded matrix.
cublasZgemv Matrix-vector product for complex double precision general matrix.
cublasZgerc Rank-1 operation on complex double precision general matrix.
cublasZgeru Rank-1 operation on complex double precision general matrix.
cublasZhbmv Matrix-vector product for double precision Hermitian banded matrix.
cublasZhemv Matrix-vector product for double precision Hermitian matrix.
cublasZher Rank-1 operation on double precision Hermitian matrix.
cublasZher2 Rank-2 operation on double precision Hermitian matrix.
cublasZhpmv Matrix-vector product for double precision Hermitian packed matrix.
cublasZhpr Rank-1 operation on double precision Hermitian packed matrix.
cublasZhpr2 Rank-2 operation on double precision Hermitian packed matrix.
cublasZtbmv Matrix-vector product for complex double precision triangular banded matrix.
cublasZtbsv Solve complex double precision triangular banded system with one right-hand side.
cublasZtpmv Matrix-vector product for complex double precision triangular packed matrix.
cublasZtpsv Solve complex double precision triangular packed system with one right-hand side.
cublasZtrmv Matrix-vector product for complex double precision triangular matrix.
cublasZtrsv Solve complex double precision triangular system with one right-hand side.
Single Precision BLAS3 Routines
cublasSgemm Matrix-matrix product for real single precision general matrix.
cublasSsymm Matrix-matrix product for real single precision symmetric matrix.
cublasSsyrk Rank-k operation on real single precision symmetric matrix.
cublasSsyr2k Rank-2k operation on real single precision symmetric matrix.
cublasStrmm Matrix-matrix product for real single precision triangular matrix.
cublasStrsm Solve a real single precision triangular system with multiple right-hand sides.
cublasCgemm Matrix-matrix product for complex single precision general matrix.
cublasChemm Matrix-matrix product for single precision Hermitian matrix.
cublasCherk Rank-k operation on single precision Hermitian matrix.
cublasCher2k Rank-2k operation on single precision Hermitian matrix.
cublasCsymm Matrix-matrix product for complex single precision symmetric matrix.
cublasCsyrk Rank-k operation on complex single precision symmetric matrix.
cublasCsyr2k Rank-2k operation on complex single precision symmetric matrix.
cublasCtrmm Matrix-matrix product for complex single precision triangular matrix.
cublasCtrsm Solve a complex single precision triangular system with multiple right-hand sides.
Double Precision BLAS3 Routines
cublasDgemm Matrix-matrix product for real double precision general matrix.
cublasDsymm Matrix-matrix product for real double precision symmetric matrix.
cublasDsyrk Rank-k operation on real double precision symmetric matrix.
cublasDsyr2k Rank-2k operation on real double precision symmetric matrix.
cublasDtrmm Matrix-matrix product for real double precision triangular matrix.
cublasDtrsm Solve a real double precision triangular system with multiple right-hand sides.
cublasZgemm Matrix-matrix product for complex double precision general matrix.
cublasZhemm Matrix-matrix product for double precision Hermitian matrix.
cublasZherk Rank-k operation on double precision Hermitian matrix.
cublasZher2k Rank-2k operation on double precision Hermitian matrix.
cublasZsymm Matrix-matrix product for complex double precision symmetric matrix.
cublasZsyrk Rank-k operation on complex double precision symmetric matrix.
cublasZsyr2k Rank-2k operation on complex double precision symmetric matrix.
cublasZtrmm Matrix-matrix product for complex double precision triangular matrix.
cublasZtrsm Solve complex double precision triangular system with multiple right-hand sides.
Single-Precision BLAS-like Extension Routines
cublasSdgmm Multiplies a matrix with a diagonal matrix.
cublasSgeam Matrix-matrix addition/transposition (single precision real).
cublasSgemmBatched Matrix-matrix product for arrays of real single precision general matrices.
cublasCgemmBatched Matrix-matrix product for arrays of complex single precision general matrices.
cublasStrsmBatched Solve an array of triangular linear systems with multiple right-hand sides.
cublasSgetrfBatched Compute the LU factorization of an array of n x n matrices.
cublasCdgmm Multiplies a matrix with a diagonal matrix.
cublasCgeam Matrix-matrix addition/transposition (single precision complex).
Double-Precision BLAS-like Extension Routines
cublasDdgmm Multiplies a matrix with a diagonal matrix.
cublasDgeam Matrix-matrix addition/transposition (double precision real).
cublasDgemmBatched Matrix-matrix product for arrays of real double precision general matrices.
cublasZgemmBatched Matrix-matrix product for arrays of complex double precision general matrices.
cublasDtrsmBatched Solve an array of triangular linear systems with multiple right-hand sides.
cublasDgetrfBatched Compute the LU factorization of an array of n x n matrices.
cublasZdgmm Multiplies a matrix with a diagonal matrix.
cublasZgeam Matrix-matrix addition/transposition (double precision complex).
CUFFT Routines
Helper Routines
cufftCheckStatus
cufftCreate
cufftDestroy
cufftSetAutoAllocation
cufftSetCompatibilityMode
cufftSetStream
cufftSetWorkArea
Wrapper Routines
cufftPlan1d
cufftPlan2d
cufftPlan3d
cufftPlanMany
cufftDestroy
cufftExecC2C
cufftExecR2C
cufftExecC2R
cufftExecZ2Z
cufftExecD2Z
cufftExecZ2D
cufftEstimate1d
cufftEstimate2d
cufftEstimate3d
cufftEstimateMany
cufftGetSize1d
cufftGetSize2d
cufftGetSize3d
cufftGetSizeMany
cufftGetSize
cufftMakePlan1d
cufftMakePlan2d
cufftMakePlan3d
cufftMakePlanMany
CUSOLVER Routines

These routines are only available in CUDA 7.0 and later.

Wrapper Routines
Single Precision Routines
cusolverDnSgeqrf_bufferSize Calculate size of work buffer used by cusolverDnSgeqrf.
cusolverDnSgeqrf Compute QR factorization of a real single precision m x n matrix.
cusolverDnSgesvd_bufferSize Calculate size of work buffer used by cusolverDnSgesvd.
cusolverDnSgesvd Compute real single precision singular value decomposition.
cusolverDnSgetrf_bufferSize Calculate size of work buffer used by cusolverDnSgetrf.
cusolverDnSgetrf Compute LU factorization of a real single precision m x n matrix.
cusolverDnSgetrs Solve real single precision linear system.
cusolverDnSorgqr_bufferSize Calculate size of work buffer used by cusolverDnSorgqr.
cusolverDnSorgqr Create unitary m x n matrix from single precision real reflection vectors.
cusolverDnSpotrf_bufferSize Calculate size of work buffer used by cusolverDnSpotrf.
cusolverDnSpotrf Compute Cholesky factorization of a real single precision Hermitian positive-definite matrix.
cusolverDnSsyevd_bufferSize Calculate size of work buffer used by cusolverDnSsyevd.
cusolverDnSsyevd
cusolverDnSsyevj_bufferSize
cusolverDnSsyevj
cusolverDnSsyevjBatched_bufferSize
cusolverDnSsyevjBatched
cusolverDnCgeqrf_bufferSize Calculate size of work buffer used by cusolverDnCgeqrf.
cusolverDnCgeqrf Compute QR factorization of a complex single precision m x n matrix.
cusolverDnCgesvd_bufferSize Calculate size of work buffer used by cusolverDnCgesvd.
cusolverDnCgesvd Compute complex single precision singular value decomposition.
cusolverDnCgetrf_bufferSize Calculate size of work buffer used by cusolverDnCgetrf.
cusolverDnCgetrf Compute LU factorization of a complex single precision m x n matrix.
cusolverDnCgetrs Solve complex single precision linear system.
cusolverDnCheevd_bufferSize Calculate size of work buffer used by cusolverDnCheevd.
cusolverDnCheevd
cusolverDnCheevj_bufferSize
cusolverDnCheevj
cusolverDnCheevjBatched_bufferSize
cusolverDnCheevjBatched
cusolverDnCpotrf_bufferSize Calculate size of work buffer used by cusolverDnCpotrf.
cusolverDnCpotrf Compute Cholesky factorization of a complex single precision Hermitian positive-definite matrix.
cusolverDnCungqr_bufferSize Calculate size of work buffer used by cusolverDnCungqr.
cusolverDnCungqr Create unitary m x n matrix from single precision complex reflection vectors.
Double Precision Routines
cusolverDnDgeqrf_bufferSize Calculate size of work buffer used by cusolverDnDgeqrf.
cusolverDnDgeqrf Compute QR factorization of a real double precision m x n matrix.
cusolverDnDgesvd_bufferSize Calculate size of work buffer used by cusolverDnDgesvd.
cusolverDnDgesvd Compute real double precision singular value decomposition.
cusolverDnDgetrf_bufferSize Calculate size of work buffer used by cusolverDnDgetrf.
cusolverDnDgetrf Compute LU factorization of a real double precision m x n matrix.
cusolverDnDgetrs Solve real double precision linear system.
cusolverDnDorgqr_bufferSize Calculate size of work buffer used by cusolverDnDorgqr.
cusolverDnDorgqr Create unitary m x n matrix from double precision real reflection vectors.
cusolverDnDpotrf_bufferSize Calculate size of work buffer used by cusolverDnDpotrf.
cusolverDnDpotrf Compute Cholesky factorization of a real double precision Hermitian positive-definite matrix.
cusolverDnDsyevd_bufferSize Calculate size of work buffer used by cusolverDnDsyevd.
cusolverDnDsyevd
cusolverDnDsyevj_bufferSize
cusolverDnDsyevj
cusolverDnDsyevjBatched_bufferSize
cusolverDnDsyevjBatched
cusolverDnZgeqrf_bufferSize Calculate size of work buffer used by cusolverDnZgeqrf.
cusolverDnZgeqrf Compute QR factorization of a complex double precision m x n matrix.
cusolverDnZgesvd_bufferSize Calculate size of work buffer used by cusolverDnZgesvd.
cusolverDnZgesvd Compute complex double precision singular value decomposition.
cusolverDnZgetrf_bufferSize Calculate size of work buffer used by cusolverDnZgetrf.
cusolverDnZgetrf Compute LU factorization of a complex double precision m x n matrix.
cusolverDnZgetrs Solve complex double precision linear system.
cusolverDnZheevd_bufferSize Calculate size of work buffer used by cusolverDnZheevd.
cusolverDnZheevd
cusolverDnZheevj_bufferSize
cusolverDnZheevj
cusolverDnZheevjBatched_bufferSize
cusolverDnZheevjBatched
cusolverDnZpotrf_bufferSize Calculate size of work buffer used by cusolverDnZpotrf.
cusolverDnZpotrf Compute Cholesky factorization of a complex double precision Hermitian positive-definite matrix.
cusolverDnZungqr_bufferSize Calculate size of work buffer used by cusolverDnZungqr.
cusolverDnZungqr Create unitary m x n matrix from double precision complex reflection vectors.
CULA Routines
Framework Routines
culaCheckStatus Raise an exception corresponding to the specified CULA status code.
culaFreeBuffers Releases any memory buffers stored internally by CULA.
culaGetCublasMinimumVersion Report the version of CUBLAS required by CULA.
culaGetCublasRuntimeVersion Report the version of CUBLAS linked to by CULA.
culaGetCudaDriverVersion Report the version of the CUDA driver installed on the system.
culaGetCudaMinimumVersion Report the minimum version of CUDA required by CULA.
culaGetCudaRuntimeVersion Report the version of the CUDA runtime linked to by the CULA library.
culaGetDeviceCount Report the number of available GPU devices.
culaGetErrorInfo Returns extended information code for the last CULA error.
culaGetErrorInfoString Returns a readable CULA error string.
culaGetExecutingDevice Reports the id of the GPU device used by CULA.
culaGetLastStatus Returns the last status code returned from a CULA function.
culaGetStatusString Get string associated with the specified CULA status code.
culaGetVersion Report the version number of CULA.
culaInitialize Initialize CULA.
culaSelectDevice Selects a device with which CULA will operate.
culaShutdown Shuts down CULA.
Auxiliary Routines
Single Precision Real
culaDeviceSgeNancheck Check a real general matrix for invalid entries
culaDeviceSgeTranspose Transpose of real general matrix.
culaDeviceSgeTransposeInplace Inplace transpose of real square matrix.
Single Precision Complex
culaDeviceCgeConjugate Conjugate of complex general matrix.
culaDeviceCgeNancheck Check a complex general matrix for invalid entries
culaDeviceCgeTranspose Transpose of complex general matrix.
culaDeviceCgeTransposeConjugate Conjugate transpose of complex general matrix.
culaDeviceCgeTransposeInplace Inplace transpose of complex square matrix.
culaDeviceCgeTransposeConjugateInplace Inplace conjugate transpose of complex square matrix.
Double Precision Real
culaDeviceDgeNancheck Check a real general matrix for invalid entries
culaDeviceDgeTranspose Transpose of real general matrix.
culaDeviceDgeTransposeInplace Inplace transpose of real square matrix.
Double Precision Complex
culaDeviceZgeConjugate Conjugate of complex general matrix.
culaDeviceZgeNancheck Check a complex general matrix for invalid entries
culaDeviceZgeTranspose Transpose of complex general matrix.
culaDeviceZgeTransposeConjugate Conjugate transpose of complex general matrix.
culaDeviceZgeTransposeInplace Inplace transpose of complex square matrix.
culaDeviceZgeTransposeConjugateInplace Inplace conjugate transpose of complex square matrix.
BLAS Routines
Single Precision Real
culaDeviceSgemm Matrix-matrix product for general matrix.
culaDeviceSgemv Matrix-vector product for real general matrix.
Single Precision Complex
culaDeviceCgemm Matrix-matrix product for complex general matrix.
culaDeviceCgemv Matrix-vector product for complex general matrix.
Double Precision Real
culaDeviceDgemm Matrix-matrix product for general matrix.
culaDeviceDgemv Matrix-vector product for real general matrix.
Double Precision Complex
culaDeviceZgemm Matrix-matrix product for complex general matrix.
culaDeviceZgemv Matrix-vector product for complex general matrix.
LAPACK Routines
Single Precision Real
culaDeviceSgels Solve linear system with QR or LQ factorization.
culaDeviceSgeqrf QR factorization.
culaDeviceSgesv Solve linear system with LU factorization.
culaDeviceSgesvd Singular value decomposition.
culaDeviceSgetrf LU factorization.
culaDeviceSgglse Solve linear equality-constrained least squares problem.
culaDeviceSposv Solve positive definite linear system with Cholesky factorization.
culaDeviceSpotrf Cholesky factorization.
Single Precision Complex
culaDeviceCgels Solve linear system with QR or LQ factorization.
culaDeviceCgeqrf QR factorization.
culaDeviceCgesv Solve linear system with LU factorization.
culaDeviceCgesvd Singular value decomposition.
culaDeviceCgetrf LU factorization.
culaDeviceCgglse Solve linear equality-constrained least squares problem.
culaDeviceCposv Solve positive definite linear system with Cholesky factorization.
culaDeviceCpotrf Cholesky factorization.
Double Precision Real
culaDeviceDgels Solve linear system with QR or LQ factorization.
culaDeviceDgeqrf QR factorization.
culaDeviceDgesv Solve linear system with LU factorization.
culaDeviceDgesvd Singular value decomposition.
culaDeviceDgetrf LU factorization.
culaDeviceDgglse Solve linear equality-constrained least squares problem.
culaDeviceDposv Solve positive definite linear system with Cholesky factorization.
culaDeviceDpotrf Cholesky factorization.
Double Precision Complex
culaDeviceZgels Solve linear system with QR or LQ factorization.
culaDeviceZgeqrf QR factorization.
culaDeviceZgesv Solve linear system with LU factorization.
culaDeviceZgesvd Singular value decomposition.
culaDeviceZgetrf LU factorization.
culaDeviceZgglse Solve linear equality-constrained least squares problem.
culaDeviceZposv Solve positive definite linear system with Cholesky factorization.
culaDeviceZpotrf Cholesky factorization.
Multi-GPU CULA Routines
Framework Routines
pculaConfigInit Initialize pCULA configuration structure to sensible defaults.
BLAS Routines
Single Precision Real
pculaSgemm Matrix-matrix product for general matrix.
pculaStrsm Triangular system solve.
Single Precision Complex
pculaCgemm Matrix-matrix product for general matrix.
pculaCtrsm Triangular system solve.
Double Precision Real
pculaDgemm Matrix-matrix product for general matrix.
pculaDtrsm Triangular system solve.
Double Precision Complex
pculaZgemm Matrix-matrix product for general matrix.
pculaZtrsm Triangular system solve.
LAPACK Routines
Single Precision Real
pculaSgesv General system solve using LU decomposition.
pculaSgetrf LU decomposition.
pculaSgetrs LU solve.
pculaSposv Positive definite system solve using Cholesky decomposition.
pculaSpotrf Cholesky decomposition.
pculaSpotrs Cholesky solve.
Single Precision Complex
pculaCgesv General system solve using LU decomposition.
pculaCgetrf LU decomposition.
pculaCgetrs LU solve.
pculaCposv Positive definite system solve using Cholesky decomposition.
pculaCpotrf Cholesky decomposition.
pculaCpotrs Cholesky solve.
Double Precision Real
pculaDgesv General system solve using LU decomposition.
pculaDgetrf LU decomposition.
pculaDgetrs LU solve.
pculaDposv Positive definite system solve using Cholesky decomposition.
pculaDpotrf Cholesky decomposition.
pculaDpotrs Cholesky solve.
Double Precision Complex
pculaZgesv General system solve using LU decomposition.
pculaZgetrf LU decomposition.
pculaZgetrs LU solve.
pculaZposv Positive definite system solve using Cholesky decomposition.
pculaZpotrf Cholesky decomposition.
pculaZpotrs Cholesky solve.

High-Level Routines

Fast Fourier Transform
fft
ifft
Plan
Integration Routines
simps Implementation of composite Simpson’s rule similar to scipy.integrate.simps.
trapz 1D trapezoidal integration.
trapz2d 2D trapezoidal integration.
Linear Algebra Routines and Classes
add_diag Adds a vector to the diagonal of an array.
add_dot Calculates the dot product of two arrays and adds it to a third matrix.
cho_factor Cholesky factorization.
cho_solve Cholesky solver.
cholesky Cholesky factorization.
conj Complex conjugate.
det Compute the determinant of a square matrix.
diag Construct a diagonal matrix if input array is one-dimensional, or extracts diagonal entries of a two-dimensional array.
dmd Dynamic Mode Decomposition.
dot_diag Dot product of diagonal and non-diagonal arrays.
dot Dot product of two arrays.
eig Eigendecomposition of a matrix.
eye Construct a 2D matrix with ones on the diagonal and zeros elsewhere.
hermitian Hermitian (conjugate) matrix transpose.
inv Compute the inverse of a matrix.
mdot Product of several matrices.
multiply Element-wise array multiplication (Hadamard product).
norm Euclidean norm (2-norm) of real vector.
pinv Moore-Penrose pseudoinverse.
qr QR Decomposition.
scale Scale a vector by a factor alpha.
svd Singular Value Decomposition.
PCA Principal Component Analysis with similar API to sklearn.decomposition.PCA
trace Return the sum along the main diagonal of the array.
transpose Matrix transpose.
tril Lower triangle of a matrix.
triu Upper triangle of a matrix.
vander Generate a Vandermonde matrix.
Randomized Linear Algebra Routines
rdmd Randomized Dynamic Mode Decomposition.
cdmd Compressed Dynamic Mode Decomposition.
rsvd Randomized Singular Value Decomposition.
Special Math Functions
exp1 Exponential integral with n = 1 of complex arguments.
expi Exponential integral of complex arguments.
sici Sine/Cosine integral.

Other Routines

Miscellaneous Routines
add Adds two scalars, vectors, or matrices.
add_matvec Adds a vector to each column/row of the matrix.
argmax Indices of the maximum values along an axis.
argmin Indices of the minimum values along an axis.
cumsum Cumulative sum.
diff Calculate the discrete difference.
div_matvec Divides each column/row of a matrix by a vector.
divide Divides two scalars, vectors, or matrices with broadcasting.
done_context Detach from a context cleanly.
get_by_index Get values in a GPUArray by index.
get_compute_capability Get the compute capability of the specified device.
get_current_device Get the device in use by the current context.
get_dev_attrs
inf Return an array of the given shape and dtype filled with infs.
init Initialize libraries used by scikit-cuda.
init_context Create a context that will be cleaned up properly.
init_device Initialize a GPU device.
iscomplextype Check whether a type is complex.
isdoubletype Check whether a type has double precision.
max Return the maximum of an array or maximum along an axis.
maxabs Get maximum absolute value.
mean Compute the arithmetic means along the specified axis.
min Return the minimum of an array or minimum along an axis.
mult_matvec Multiplies a vector elementwise with each column/row of the matrix.
multiply Multiplies two scalars, vectors, or matrices with broadcasting.
ones Return an array of the given shape and dtype filled with ones.
ones_like Return an array of ones with the same shape and type as a given array.
select_block_grid_sizes
set_by_index Set values in a GPUArray by index.
set_realloc Transfer data into a GPUArray instance.
shutdown Shutdown libraries used by scikit-cuda.
std Compute the standard deviation along the specified axis.
subtract Subtracts two scalars, vectors, or matrices with broadcasting.
sum Compute the sum along the specified axis.
var Compute the variance along the specified axis.
zeros Return an array of the given shape and dtype filled with zeros.
zeros_like Return an array of zeros with the same shape and type as a given array.

Authors & Acknowledgments

This software was written and packaged by Lev Givon. Although it depends upon the excellent PyCUDA package by Andreas Klöckner, scikit-cuda is developed independently of PyCUDA.

Special thanks are due to the following parties for their contributions:

License

Copyright (c) 2009-2018, Lev E. Givon. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of Lev E. Givon nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Change Log

Release 0.5.3 (Under Development)

  • Add support for CUDA 10.

Release 0.5.2 (November 6, 2018)

  • Prevent exceptions when CULA Dense free is present (#146).
  • Fix Python 3 issues with CUSOLVER wrapper functions (#145).
  • Add support for using either CUSOLVER or CULA for computing SVD.
  • Add support for using either CUSOLVER or CULA for computing determinant.
  • Compressed Dynamic Mode Decomposition (enh. by N. Benjamin Erichson).
  • Support for CUFFT extensible plan API (enh. by Bruce Merry).
  • Wrappers for CUFFT size estimation (enh. by Luke Pfister).
  • Wrappers for CUBLAS-XT functions.
  • More wrappers for MAGMA functions (enh. by Nikul H. Ukani).
  • Python 3 compatibility improvements (enh. by Joseph Martinot-Lagarde).
  • Allow specification of order in misc.zeros and misc.ones.
  • Preserve strides in misc.zeros_like and misc.ones_like.
  • Add support for Cholesky factorization/solving using CUSOLVER (#198).
  • Add cholesky() function that zeros out non-factor entries in result (#199).
  • Add support for CUDA 8.0 libraries (#171).
  • Workaround for libgomp + CUDA 8.0 weirdness (fix by Kevin Flansburg).
  • Fix broken matrix-vector dot product (#156).
  • Initialize MAGMA before CUSOLVER to prevent internal errors in certain CUSOLVER functions.
  • Skip CULA-dependent unit tests when CULA isn’t present.
  • CUSOLVER support for symmetric eigenvalue decomposition (enh. by Bryant Menn).
  • CUSOLVER support for matrix inversion, QR decomposition (#198).
  • Prevent objdump output from changing due to environment language (fix by Arnaud Bergeron).
  • Fix diag() support for column-major 2D array inputs (#219).
  • Use absolute path for skcuda header includes (enh. by S. Clarkson).
  • Fix QR issues by reverting fix for #131 and raising PyCUDA version requirement (fix by S. Clarkson).
  • More batch CUBLAS wrappers (enh. by Li Yong Liu).
  • Numerical integration with Simpson’s Rule (enh. by Alexander Weyman).
  • Make CUSOLVER default backend for functions that can use either CULA or CUSOLVER.
  • Fix CUDA errors that only occur when unit tests are run en masse with nose or setuptools (#257).
  • Fix MAGMA eigenvalue decomposition wrappers (#265, fix by Wing-Kit Lee).
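
For readers unfamiliar with the Simpson's Rule entry above, the following is a minimal plain-Python sketch of the composite rule for equally spaced samples. It only illustrates the formula; it is not the GPU implementation shipped with scikit-cuda, and the function name `simpson` and its `dx` parameter are chosen here for illustration.

```python
import math


def simpson(y, dx=1.0):
    """Composite Simpson's rule for equally spaced samples y[0], ..., y[n].

    Illustrative CPU sketch; requires an even number of intervals,
    i.e. an odd number of samples.
    """
    n = len(y) - 1  # number of intervals
    if n < 2 or n % 2:
        raise ValueError("need an odd number of samples (at least 3)")
    odd_terms = sum(y[1:n:2])    # y[1], y[3], ..., y[n-1]
    even_terms = sum(y[2:n:2])   # y[2], y[4], ..., y[n-2]
    return dx / 3.0 * (y[0] + 4.0 * odd_terms + 2.0 * even_terms + y[n])


# Integrate sin(x) over [0, pi]; the exact value is 2.
num_samples = 101
dx = math.pi / (num_samples - 1)
samples = [math.sin(i * dx) for i in range(num_samples)]
approx = simpson(samples, dx)
```

The rule is exact for polynomials up to degree three, which makes a cubic a convenient sanity check for any implementation.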

Release 0.5.1 - (October 30, 2015)

  • More CUSOLVER wrappers.
  • Eigenvalue/eigenvector computation (enh. by N. Benjamin Erichson).
  • QR decomposition (enh. by N. Benjamin Erichson).
  • Improved Windows 10 compatibility (enh. by N. Benjamin Erichson).
  • Function for constructing Vandermonde matrix in GPU memory (enh. by N. Benjamin Erichson).
  • Standard and randomized Dynamic Mode Decomposition (enh. by N. Benjamin Erichson).
  • Randomized linear algebra routines (enh. by N. Benjamin Erichson).
  • Add triu function (enh. by N. Benjamin Erichson).
  • Support Bessel correction in computation of variance and standard deviation (#143).
  • Fix pip installation issues.
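
The Bessel correction mentioned above divides the sum of squared deviations by N - 1 rather than N, which gives an unbiased estimate of the variance from a sample. The plain-Python sketch below shows the difference; the `ddof` ("delta degrees of freedom") parameter name follows the NumPy convention and is an assumption here, not a statement about scikit-cuda's signatures.

```python
import statistics


def variance(values, ddof=0):
    """Variance with divisor N - ddof.

    ddof=0 gives the population variance; ddof=1 applies Bessel's correction.
    Illustrative CPU sketch only.
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof values")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_var = variance(data)        # divisor N
sample_var = variance(data, ddof=1)    # divisor N - 1

# Cross-check against the standard library.
assert abs(population_var - statistics.pvariance(data)) < 1e-12
assert abs(sample_var - statistics.variance(data)) < 1e-12
```

The standard deviation follows by taking the square root of either value.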

Release 0.5.0 - (July 14, 2015)

  • Rename package to scikit-cuda.
  • Reductions sum, mean, var, std, max, min, argmax, argmin accept keepdims option.
  • The same reductions now return a GPUArray instead of ndarray if axis=None.
  • Switch to PEP 440 version numbering.
  • Replace distribute_setup.py with ez_setup.py.
  • Improve support for latest NVIDIA GPUs.
  • Direct links to online NVIDIA documentation in CUBLAS, CUFFT wrapper docstrings.
  • Add wrappers for CUSOLVER in CUDA 7.0.
  • Add skcuda namespace package that contains all modules in scikits.cuda namespace.
  • Add more wrappers for CUBLAS 5 functions (enh. by Teodor Moldovan, Sander Dieleman).
  • Add support for CULA Dense Free R17 (enh. by Alex Rubinsteyn).
  • Memoize elementwise kernel used by ifft scaling (#37).
  • Speed up misc.maxabs using reduction and kernel memoization.
  • Speed up misc.cumsum using scan and kernel memoization.
  • Speed up linalg.conj and misc.diff using elementwise kernel and memoization.
  • Speed up special.{sici,exp1,expi} using elementwise kernel and memoization.
  • Add wrappers for experimental multi-GPU CULA routines in CULA Dense R14+.
  • Use ldconfig to find library paths rather than libdl (#39).
  • Fix win32 platform detection.
  • Add Cholesky factorization/solve routines (enh. by Steve Taylor).
  • Fix Cholesky factorization/solve routines (fix by Thomas Unterthiner).
  • Enable dot() function to operate inplace (enh. by Thomas Unterthiner).
  • Python 3 compatibility improvements (enh. by Thomas Unterthiner).
  • Support for Fortran-order arrays in dot() and cho_solve() (enh. by Thomas Unterthiner).
  • CULA-based matrix inversion (enh. by Thomas Unterthiner).
  • Add add_diag() function (enh. by Thomas Unterthiner).
  • Use cublas*copy in diag() function (enh. by Thomas Unterthiner).
  • Improved MacOSX compatibility (enh. by Michael M. Forbes).
  • Find CUBLAS version even when it is only accessible via LD_LIBRARY_PATH (enh. by Frédéric Bastien).
  • Get both major and minor version numbers from CUBLAS library when determining version.
  • Handle unset LD_LIBRARY_PATH variable (fix by Jan Schlüter).
  • Fix library search on MacOS X (fix by capdevc).
  • Fix library search on Windows.
  • Add Windows support to CULA wrappers.
  • Enable specification of memory pool allocator to linalg functions (enh. by Thomas Unterthiner).
  • Improve misc.select_block_grid_sizes() logic to handle different GPU hardware.
  • Compute transpose using CUDA 5.0 CUBLAS functions rather than with inefficient naive kernel.
  • Use ReadTheDocs theme when building HTML docs locally.
  • Support additional cufftPlanMany() parameters when creating FFT plans (enh. by Gregory R. Lee).
  • Improved Python 3.4 compatibility (enh. by Eric Larson).
  • Avoid unnecessary import of cublas when importing fft module (enh. by Eric Larson).
  • Matrix trace function (enh. by Thomas Unterthiner).
  • Functions for computing simple axis-wise stats over matrices (enh. by Thomas Unterthiner).
  • Matrix add_dot, add_matvec, div_matvec, mult_matvec functions (enh. by Thomas Unterthiner).
  • Faster dot_diag implementation using CUBLAS matrix-matrix multiplication (enh. by Thomas Unterthiner).
  • Memoize SourceModule calls to speed up various high-level functions (enh. by Thomas Unterthiner).
  • Function for computing matrix determinant (enh. by Thomas Unterthiner).
  • Function for computing min/max and argmin/argmax along a matrix axis (enh. by Thomas Unterthiner).
  • Set default value of the parameter ‘overwrite’ to False in all linalg functions.
  • Elementwise arithmetic operations with broadcasting up to 2 dimensions (enh. by David Wei Chiang).
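
To make the keepdims option from the list above concrete, here is a small plain-Python sketch of a sum over a 2D nested list. It assumes the NumPy convention that a reduced axis is kept as a dimension of size one when keepdims is true; the helper `sum_axis` is hypothetical and runs on the CPU, whereas the library reductions take and return GPUArray instances.

```python
def sum_axis(matrix, axis, keepdims=False):
    """Sum a 2D nested list along `axis` (0: down the columns, 1: across the rows).

    Hypothetical CPU sketch of NumPy-style keepdims semantics.
    """
    if axis == 0:
        reduced = [sum(col) for col in zip(*matrix)]
        # Keeping the reduced axis turns shape (n_cols,) into (1, n_cols).
        return [reduced] if keepdims else reduced
    if axis == 1:
        reduced = [sum(row) for row in matrix]
        # Keeping the reduced axis turns shape (n_rows,) into (n_rows, 1).
        return [[r] for r in reduced] if keepdims else reduced
    raise ValueError("axis must be 0 or 1 for a 2D input")


m = [[1, 2, 3],
     [4, 5, 6]]

col_sums = sum_axis(m, axis=0)                       # shape (3,)
col_sums_kept = sum_axis(m, axis=0, keepdims=True)   # shape (1, 3)
row_sums_kept = sum_axis(m, axis=1, keepdims=True)   # shape (2, 1)
```

Keeping the reduced axis means the result still lines up with the original matrix, which is convenient when it is subsequently used in a broadcast operation such as subtracting per-row sums.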

Release 0.042 - (March 10, 2013)

  • Add complex exponential integral.
  • Fix typo in cublasCgbmv.
  • Use CUBLAS v2 API, add preliminary support for CUBLAS 5 functions.
  • Detect CUBLAS version without initializing the GPU.
  • Work around numpy bug #1898.
  • Fix issues with pycuda installations done via easy_install/pip.
  • Add support for specifying streams when creating FFT plans.
  • Successfully find CULA R13a libraries.
  • Raise exceptions when functions in the full release of CULA Dense are invoked without the library installed.
  • Perform post-fft scaling in-place.
  • Fix broken Python 2.6 compatibility (#19).
  • Download distribute for package installation if it isn’t available.
  • Prevent absence of CULA from causing import errors (enh. by Jacob Frelinger).
  • FFT batch tests and FFTW mode configuration (enh. by Lars Pastewka).

Release 0.041 - (May 22, 2011)

  • Fix bug preventing installation with pip.

Release 0.04 - (May 11, 2011)

  • Fix bug in cutoff_invert kernel.
  • Add get_compute_capability function and other goodies to misc module.
  • Use pycuda-complex.hpp to improve kernel readability.
  • Add integrate module.
  • Add unit tests for high-level functions.
  • Automatically determine device used by current context.
  • Support batched and multidimensional FFT operations.
  • Extended dot() function to support implicit transpose/Hermitian.
  • Support for in-place computation of singular vectors in svd() function.
  • Simplify kernel launch setup.
  • More CULA routine wrappers.
  • Wrappers for CULA R11 auxiliary routines.

Release 0.03 - (November 22, 2010)

  • Add support for some functions in the premium version of CULA toolkit.
  • Add wrappers for all LAPACK functions in basic CULA toolkit.
  • Fix pinv() to properly invert complex matrices.
  • Add Hermitian transpose.
  • Add tril function.
  • Fix missing library detection.
  • Include missing CUDA headers in package.

Release 0.02 - (September 21, 2010)

  • Add documentation.
  • Update copyright information.

Release 0.01 - (September 17, 2010)

  • First public release.
