scikit-cuda¶
scikit-cuda provides Python interfaces to many of the functions in the CUDA device/runtime, CUBLAS, CUFFT, and CUSOLVER libraries distributed as part of NVIDIA’s CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. It provides both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and SciPy.
Python wrappers for cuDNN by Hannes Bretschneider are available here.
Installation¶
Quick Installation¶
If you have pip installed, you should be able to install the latest stable release of scikit-cuda by running the following:
pip install scikit-cuda
All dependencies should be automatically downloaded and installed if they are not already on your system.
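After installation, one quick way to confirm that the packages are visible to the interpreter is to look them up without importing them. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of scikit-cuda.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# skcuda is the import name of scikit-cuda; numpy and pycuda are dependencies.
for name in ("numpy", "pycuda", "skcuda"):
    print(name, "found" if is_installed(name) else "missing")
```

Locating a module this way does not load any CUDA libraries, so it only shows that the Python packages are present, not that a GPU is usable.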
Obtaining the Latest Software¶
The latest stable and development versions of scikit-cuda can be downloaded from GitHub.
Online documentation is available at https://scikit-cuda.readthedocs.org
Installation Dependencies¶
scikit-cuda requires that the following software packages be installed:
- Python 2.7 or 3.4.
- Setuptools 0.6c10 or later.
- Mako 1.0.1 or later.
- NumPy 1.2.0 or later.
- PyCUDA 2016.1 or later (some parts of scikit-cuda might not work properly with earlier versions).
- NVIDIA CUDA Toolkit 5.0 or later.
Note that both Python and the CUDA Toolkit must be built for the same architecture, i.e., Python compiled for a 32-bit architecture will not find the libraries provided by a 64-bit CUDA installation. CUDA versions from 7.0 onwards are 64-bit.
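The architecture of the running interpreter can be checked from Python itself. The following is a minimal sketch using only the standard library; the function name `interpreter_bits` is hypothetical.

```python
import platform
import struct


def interpreter_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


print("Python is %d-bit on %s" % (interpreter_bits(), platform.machine()))
```

If this reports 32 bits, a 64-bit CUDA installation will not be loadable from that interpreter.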
To run the unit tests, the following packages are also required:
Some of the linear algebra functionality relies on the CULA toolkit; as of 2017, it is available to premium tier users of E.M. Photonics’ HPC site Celerity Tools:
- CULA R16a or later.
To build the documentation, the following packages are also required:
- Docutils 0.5 or later.
- Jinja2 2.2 or later.
- Pygments 0.8 or later.
- Sphinx 1.0.1 or later.
- Sphinx ReadTheDocs Theme 0.1.6 or later.
Platform Support¶
The software has been developed and tested on Linux; it should also work on other Unix-like platforms supported by the above packages. Parts of the package may work on Windows as well, but remain untested.
Building and Installation¶
scikit-cuda searches for CUDA libraries in the system library search path when imported. If the libraries are not being found, you may have to modify this path, e.g., by adding the path to the CUDA libraries to /etc/ld.so.conf and running ldconfig as root or to the LD_LIBRARY_PATH environment variable on Linux, or by adding the CUDA library path to DYLD_LIBRARY_PATH on Mac OS X.
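To see which CUDA libraries the system loader can currently resolve, something like the following sketch may help. It relies on `ctypes.util.find_library` from the standard library, whose search is not guaranteed to match the lookup scikit-cuda performs internally; the function name and the list of library names are assumptions made for illustration.

```python
import ctypes.util


def locate_cuda_libraries(names=("cudart", "cublas", "cufft", "cusolver")):
    """Map each library name to what the system loader resolves, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


for name, found in locate_cuda_libraries().items():
    print("%-10s %s" % (name, found if found else "NOT FOUND"))
```

A `NOT FOUND` entry suggests that the library search path needs to be adjusted as described above.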
To build and install the toolbox, download and unpack the source release and run:
python setup.py install
from within the main directory in the release. To rebuild the documentation, run:
python setup.py build_sphinx
Running the Unit Tests¶
To run all of the package unit tests, download and unpack the package source tarball and run:
python setup.py test
from within the main directory in the archive. Tests for individual modules (found in the tests/ subdirectory) can also be run directly.
Getting Started¶
The functions provided by scikit-cuda are grouped into several submodules in the skcuda namespace package. Sample code demonstrating how to use different parts of the toolbox is located in the demos/ subdirectory of the source release. Many of the high-level functions also contain doctests that describe their usage.
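As an illustration of the high-level interface, the sketch below multiplies two matrices with `skcuda.linalg.dot` and compares the result with NumPy. The wrapper function `gpu_dot` and its fallback to NumPy when PyCUDA, scikit-cuda, or a GPU is unavailable are assumptions added so that the snippet runs anywhere; they are not part of the package.

```python
import numpy as np


def gpu_dot(a, b):
    """Return (a.dot(b), backend), using skcuda.linalg.dot when possible."""
    try:
        import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
        import pycuda.gpuarray as gpuarray
        import skcuda.linalg as linalg

        linalg.init()
        a_gpu = gpuarray.to_gpu(a)          # copy host arrays to the device
        b_gpu = gpuarray.to_gpu(b)
        c_gpu = linalg.dot(a_gpu, b_gpu)    # matrix product on the GPU
        return c_gpu.get(), "skcuda"        # copy the result back to the host
    except Exception:
        # Assumed fallback for machines without a working GPU stack.
        return np.dot(a, b), "numpy"


a = np.asarray(np.random.rand(4, 2), np.float32)
b = np.asarray(np.random.rand(2, 3), np.float32)
c, backend = gpu_dot(a, b)
print(backend, np.allclose(c, np.dot(a, b), atol=1e-5))
```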
Reference¶
Library Wrapper Routines¶
CUBLAS Routines¶
Helper Routines¶
cublasCheckStatus |
Raise CUBLAS exception |
cublasCreate |
Initialize CUBLAS. |
cublasDestroy |
Release CUBLAS resources. |
cublasGetCurrentCtx |
Get current CUBLAS context. |
cublasGetStream |
Get current CUBLAS library stream. |
cublasGetVersion |
Get CUBLAS version. |
cublasSetStream |
Set current CUBLAS library stream. |
Wrapper Routines¶
cublasIsamax |
Index of maximum magnitude element. |
cublasIsamin |
Index of minimum magnitude element (single precision real). |
cublasSasum |
Sum of absolute values of single precision real vector. |
cublasSaxpy |
Vector addition (single precision real). |
cublasScopy |
Vector copy (single precision real) |
cublasSdot |
Vector dot product (single precision real) |
cublasSnrm2 |
Euclidean norm (2-norm) of real vector. |
cublasSrot |
Apply a real rotation to real vectors (single precision) |
cublasSrotg |
Construct a single precision real Givens rotation matrix. |
cublasSrotm |
Apply a single precision real modified Givens rotation. |
cublasSrotmg |
Construct a single precision real modified Givens rotation matrix. |
cublasSscal |
Scale a single precision real vector by a single precision real scalar. |
cublasSswap |
Swap single precision real vectors. |
cublasCaxpy |
Vector addition (single precision complex). |
cublasCcopy |
Vector copy (single precision complex) |
cublasCdotc |
Vector dot product (single precision complex) |
cublasCdotu |
Vector dot product (single precision complex) |
cublasCrot |
Apply a complex rotation to complex vectors (single precision) |
cublasCrotg |
Construct a single precision complex Givens rotation matrix. |
cublasCscal |
Scale a single precision complex vector by a single precision complex scalar. |
cublasCsrot |
Apply a real rotation to complex vectors (single precision) |
cublasCsscal |
Scale a single precision complex vector by a single precision real scalar. |
cublasCswap |
Swap single precision complex vectors. |
cublasIcamax |
Index of maximum magnitude element. |
cublasIcamin |
Index of minimum magnitude element (single precision complex). |
cublasScasum |
Sum of absolute values of single precision complex vector. |
cublasScnrm2 |
Euclidean norm (2-norm) of complex vector. |
cublasIdamax |
Index of maximum magnitude element. |
cublasIdamin |
Index of minimum magnitude element (double precision real). |
cublasDasum |
Sum of absolute values of double precision real vector. |
cublasDaxpy |
Vector addition (double precision real). |
cublasDcopy |
Vector copy (double precision real) |
cublasDdot |
Vector dot product (double precision real) |
cublasDnrm2 |
Euclidean norm (2-norm) of real vector. |
cublasDrot |
Apply a real rotation to real vectors (double precision) |
cublasDrotg |
Construct a double precision real Givens rotation matrix. |
cublasDrotm |
Apply a double precision real modified Givens rotation. |
cublasDrotmg |
Construct a double precision real modified Givens rotation matrix. |
cublasDscal |
Scale a double precision real vector by a double precision real scalar. |
cublasDswap |
Swap double precision real vectors. |
cublasDzasum |
Sum of absolute values of double precision complex vector. |
cublasDznrm2 |
Euclidean norm (2-norm) of complex vector. |
cublasIzamax |
Index of maximum magnitude element. |
cublasIzamin |
Index of minimum magnitude element (double precision complex). |
cublasZaxpy |
Vector addition (double precision complex). |
cublasZcopy |
Vector copy (double precision complex) |
cublasZdotc |
Vector dot product (double precision complex) |
cublasZdotu |
Vector dot product (double precision complex) |
cublasZdrot |
Apply a real rotation to complex vectors (double precision) |
cublasZdscal |
Scale a double precision complex vector by a double precision real scalar. |
cublasZrot |
Apply a complex rotation to complex vectors (double precision) |
cublasZrotg |
Construct a double precision complex Givens rotation matrix. |
cublasZscal |
Scale a double precision complex vector by a double precision complex scalar. |
cublasZswap |
Swap double precision complex vectors. |
cublasSgbmv |
Matrix-vector product for real single precision general banded matrix. |
cublasSgemv |
Matrix-vector product for real single precision general matrix. |
cublasSger |
Rank-1 operation on real single precision general matrix. |
cublasSsbmv |
Matrix-vector product for real single precision symmetric-banded matrix. |
cublasSspmv |
Matrix-vector product for real single precision symmetric packed matrix. |
cublasSspr |
Rank-1 operation on real single precision symmetric packed matrix. |
cublasSspr2 |
Rank-2 operation on real single precision symmetric packed matrix. |
cublasSsymv |
Matrix-vector product for real symmetric matrix. |
cublasSsyr |
Rank-1 operation on real single precision symmetric matrix. |
cublasSsyr2 |
Rank-2 operation on real single precision symmetric matrix. |
cublasStbmv |
Matrix-vector product for real single precision triangular banded matrix. |
cublasStbsv |
Solve real single precision triangular banded system with one right-hand side. |
cublasStpmv |
Matrix-vector product for real single precision triangular packed matrix. |
cublasStpsv |
Solve real triangular packed system with one right-hand side. |
cublasStrmv |
Matrix-vector product for real single precision triangular matrix. |
cublasStrsv |
Solve real triangular system with one right-hand side. |
cublasCgbmv |
Matrix-vector product for complex single precision general banded matrix. |
cublasCgemv |
Matrix-vector product for complex single precision general matrix. |
cublasCgerc |
Rank-1 operation on complex single precision general matrix. |
cublasCgeru |
Rank-1 operation on complex single precision general matrix. |
cublasChbmv |
Matrix-vector product for single precision Hermitian banded matrix. |
cublasChemv |
Matrix-vector product for single precision Hermitian matrix. |
cublasCher |
Rank-1 operation on single precision Hermitian matrix. |
cublasCher2 |
Rank-2 operation on single precision Hermitian matrix. |
cublasChpmv |
Matrix-vector product for single precision Hermitian packed matrix. |
cublasChpr |
Rank-1 operation on single precision Hermitian packed matrix. |
cublasChpr2 |
Rank-2 operation on single precision Hermitian packed matrix. |
cublasCtbmv |
Matrix-vector product for complex single precision triangular banded matrix. |
cublasCtbsv |
Solve complex single precision triangular banded system with one right-hand side. |
cublasCtpmv |
Matrix-vector product for complex single precision triangular packed matrix. |
cublasCtpsv |
Solve complex single precision triangular packed system with one right-hand side. |
cublasCtrmv |
Matrix-vector product for complex single precision triangular matrix. |
cublasCtrsv |
Solve complex single precision triangular system with one right-hand side. |
cublasDgbmv |
Matrix-vector product for real double precision general banded matrix. |
cublasDgemv |
Matrix-vector product for real double precision general matrix. |
cublasDger |
Rank-1 operation on real double precision general matrix. |
cublasDsbmv |
Matrix-vector product for real double precision symmetric-banded matrix. |
cublasDspmv |
Matrix-vector product for real double precision symmetric packed matrix. |
cublasDspr |
Rank-1 operation on real double precision symmetric packed matrix. |
cublasDspr2 |
Rank-2 operation on real double precision symmetric packed matrix. |
cublasDsymv |
Matrix-vector product for real double precision symmetric matrix. |
cublasDsyr |
Rank-1 operation on real double precision symmetric matrix. |
cublasDsyr2 |
Rank-2 operation on real double precision symmetric matrix. |
cublasDtbmv |
Matrix-vector product for real double precision triangular banded matrix. |
cublasDtbsv |
Solve real double precision triangular banded system with one right-hand side. |
cublasDtpmv |
Matrix-vector product for real double precision triangular packed matrix. |
cublasDtpsv |
Solve real double precision triangular packed system with one right-hand side. |
cublasDtrmv |
Matrix-vector product for real double precision triangular matrix. |
cublasDtrsv |
Solve real double precision triangular system with one right-hand side. |
cublasZgbmv |
Matrix-vector product for complex double precision general banded matrix. |
cublasZgemv |
Matrix-vector product for complex double precision general matrix. |
cublasZgerc |
Rank-1 operation on complex double precision general matrix. |
cublasZgeru |
Rank-1 operation on complex double precision general matrix. |
cublasZhbmv |
Matrix-vector product for double precision Hermitian banded matrix. |
cublasZhemv |
Matrix-vector product for double precision Hermitian matrix. |
cublasZher |
Rank-1 operation on double precision Hermitian matrix. |
cublasZher2 |
Rank-2 operation on double precision Hermitian matrix. |
cublasZhpmv |
Matrix-vector product for double precision Hermitian packed matrix. |
cublasZhpr |
Rank-1 operation on double precision Hermitian packed matrix. |
cublasZhpr2 |
Rank-2 operation on double precision Hermitian packed matrix. |
cublasZtbmv |
Matrix-vector product for complex double precision triangular banded matrix. |
cublasZtbsv |
Solve complex double precision triangular banded system with one right-hand side. |
cublasZtpmv |
Matrix-vector product for complex double precision triangular packed matrix. |
cublasZtpsv |
Solve complex double precision triangular packed system with one right-hand side. |
cublasZtrmv |
Matrix-vector product for complex double precision triangular matrix. |
cublasZtrsv |
Solve complex double precision triangular system with one right-hand side. |
cublasSgemm |
Matrix-matrix product for real single precision general matrix. |
cublasSsymm |
Matrix-matrix product for real single precision symmetric matrix. |
cublasSsyrk |
Rank-k operation on real single precision symmetric matrix. |
cublasSsyr2k |
Rank-2k operation on real single precision symmetric matrix. |
cublasStrmm |
Matrix-matrix product for real single precision triangular matrix. |
cublasStrsm |
Solve a real single precision triangular system with multiple right-hand sides. |
cublasCgemm |
Matrix-matrix product for complex single precision general matrix. |
cublasChemm |
Matrix-matrix product for single precision Hermitian matrix. |
cublasCherk |
Rank-k operation on single precision Hermitian matrix. |
cublasCher2k |
Rank-2k operation on single precision Hermitian matrix. |
cublasCsymm |
Matrix-matrix product for complex single precision symmetric matrix. |
cublasCsyrk |
Rank-k operation on complex single precision symmetric matrix. |
cublasCsyr2k |
Rank-2k operation on complex single precision symmetric matrix. |
cublasCtrmm |
Matrix-matrix product for complex single precision triangular matrix. |
cublasCtrsm |
Solve a complex single precision triangular system with multiple right-hand sides. |
cublasDgemm |
Matrix-matrix product for real double precision general matrix. |
cublasDsymm |
Matrix-matrix product for real double precision symmetric matrix. |
cublasDsyrk |
Rank-k operation on real double precision symmetric matrix. |
cublasDsyr2k |
Rank-2k operation on real double precision symmetric matrix. |
cublasDtrmm |
Matrix-matrix product for real double precision triangular matrix. |
cublasDtrsm |
Solve a real double precision triangular system with multiple right-hand sides. |
cublasZgemm |
Matrix-matrix product for complex double precision general matrix. |
cublasZhemm |
Matrix-matrix product for double precision Hermitian matrix. |
cublasZherk |
Rank-k operation on double precision Hermitian matrix. |
cublasZher2k |
Rank-2k operation on double precision Hermitian matrix. |
cublasZsymm |
Matrix-matrix product for complex double precision symmetric matrix. |
cublasZsyrk |
Rank-k operation on complex double precision symmetric matrix. |
cublasZsyr2k |
Rank-2k operation on complex double precision symmetric matrix. |
cublasZtrmm |
Matrix-matrix product for complex double precision triangular matrix. |
cublasZtrsm |
Solve complex double precision triangular system with multiple right-hand sides. |
cublasSdgmm |
Multiplies a matrix with a diagonal matrix. |
cublasSgeam |
Matrix-matrix addition/transposition (single precision real). |
cublasSgemmBatched |
Matrix-matrix product for arrays of real single precision general matrices. |
cublasCgemmBatched |
Matrix-matrix product for arrays of complex single precision general matrices. |
cublasStrsmBatched |
This function solves an array of triangular linear systems with multiple right-hand-sides. |
cublasSgetrfBatched |
This function performs the LU factorization of an array of n x n matrices. |
cublasCdgmm |
Multiplies a matrix with a diagonal matrix. |
cublasCgeam |
Matrix-matrix addition/transposition (single precision complex). |
cublasDdgmm |
Multiplies a matrix with a diagonal matrix. |
cublasDgeam |
Matrix-matrix addition/transposition (double precision real). |
cublasDgemmBatched |
Matrix-matrix product for arrays of real double precision general matrices. |
cublasZgemmBatched |
Matrix-matrix product for arrays of complex double precision general matrices. |
cublasDtrsmBatched |
This function solves an array of triangular linear systems with multiple right-hand-sides. |
cublasDgetrfBatched |
This function performs the LU factorization of an array of n x n matrices. |
cublasZdgmm |
Multiplies a matrix with a diagonal matrix. |
cublasZgeam |
Matrix-matrix addition/transposition (double precision complex). |
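The low-level wrappers follow the calling conventions of their C counterparts: a handle is created, device pointers and strides are passed explicitly, and the handle is released afterwards. The sketch below computes `alpha*x + y` with `cublasSaxpy`. The wrapper function `saxpy` and its NumPy fallback for machines without a working GPU stack are assumptions made for illustration.

```python
import numpy as np


def saxpy(alpha, x, y):
    """Return alpha*x + y as float32, via cublasSaxpy when a GPU is usable."""
    x = np.asarray(x, np.float32)
    y = np.asarray(y, np.float32)
    try:
        import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
        import pycuda.gpuarray as gpuarray
        from skcuda import cublas
    except Exception:
        # Assumed fallback when PyCUDA, scikit-cuda, or a GPU is missing.
        return np.float32(alpha) * x + y

    x_gpu = gpuarray.to_gpu(x)
    y_gpu = gpuarray.to_gpu(y)
    handle = cublas.cublasCreate()
    try:
        # y_gpu is overwritten in place with alpha*x_gpu + y_gpu.
        cublas.cublasSaxpy(handle, x_gpu.size, alpha,
                           x_gpu.gpudata, 1, y_gpu.gpudata, 1)
    finally:
        cublas.cublasDestroy(handle)
    return y_gpu.get()


print(saxpy(2.0, [1, 2, 3], [10, 20, 30]))
```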
CUFFT Routines¶
Helper Routines¶
cufftCheckStatus |
|
cufftCreate |
|
cufftDestroy |
|
cufftSetAutoAllocation |
|
cufftSetCompatibilityMode |
|
cufftSetStream |
|
cufftSetWorkArea |
Wrapper Routines¶
cufftPlan1d |
|
cufftPlan2d |
|
cufftPlan3d |
|
cufftPlanMany |
|
cufftDestroy |
|
cufftExecC2C |
|
cufftExecR2C |
|
cufftExecC2R |
|
cufftExecZ2Z |
|
cufftExecD2Z |
|
cufftExecZ2D |
|
cufftEstimate1d |
|
cufftEstimate2d |
|
cufftEstimate3d |
|
cufftEstimateMany |
|
cufftGetSize1d |
|
cufftGetSize2d |
|
cufftGetSize3d |
|
cufftGetSizeMany |
|
cufftGetSize |
|
cufftMakePlan1d |
|
cufftMakePlan2d |
|
cufftMakePlan3d |
|
cufftMakePlanMany |
CUSOLVER Routines¶
These routines are only available in CUDA 7.0 and later.
Helper Routines¶
cusolverDnCreate |
Create cuSolverDn context. |
cusolverDnCreateSyevjInfo |
|
cusolverDnGetStream |
Get stream used by cuSolverDN library. |
cusolverDnDestroy |
Destroy cuSolverDn context. |
cusolverDnDestroySyevjInfo |
|
cusolverDnSetStream |
Set stream used by cuSolverDN library. |
cusolverDnXsyevjGetResidual |
|
cusolverDnXsyevjGetSweeps |
|
cusolverDnXsyevjSetMaxSweeps |
|
cusolverDnXsyevjSetSortEig |
|
cusolverDnXsyevjSetTolerance |
Wrapper Routines¶
cusolverDnSgeqrf_bufferSize |
Calculate size of work buffer used by cusolverDnSgeqrf. |
cusolverDnSgeqrf |
Compute QR factorization of a real single precision m x n matrix. |
cusolverDnSgesvd_bufferSize |
Calculate size of work buffer used by cusolverDnSgesvd. |
cusolverDnSgesvd |
Compute real single precision singular value decomposition. |
cusolverDnSgetrf_bufferSize |
Calculate size of work buffer used by cusolverDnSgetrf. |
cusolverDnSgetrf |
Compute LU factorization of a real single precision m x n matrix. |
cusolverDnSgetrs |
Solve real single precision linear system. |
cusolverDnSorgqr_bufferSize |
Calculate size of work buffer used by cusolverDnSorgqr. |
cusolverDnSorgqr |
Create unitary m x n matrix from single precision real reflection vectors. |
cusolverDnSpotrf_bufferSize |
Calculate size of work buffer used by cusolverDnSpotrf. |
cusolverDnSpotrf |
Compute Cholesky factorization of a real single precision Hermitian positive-definite matrix. |
cusolverDnSsyevd_bufferSize |
Calculate size of work buffer used by cusolverDnSsyevd. |
cusolverDnSsyevd |
|
cusolverDnSsyevj_bufferSize |
|
cusolverDnSsyevj |
|
cusolverDnSsyevjBatched_bufferSize |
|
cusolverDnSsyevjBatched |
|
cusolverDnCgeqrf_bufferSize |
Calculate size of work buffer used by cusolverDnCgeqrf. |
cusolverDnCgeqrf |
Compute QR factorization of a complex single precision m x n matrix. |
cusolverDnCgesvd_bufferSize |
Calculate size of work buffer used by cusolverDnCgesvd. |
cusolverDnCgesvd |
Compute complex single precision singular value decomposition. |
cusolverDnCgetrf_bufferSize |
Calculate size of work buffer used by cusolverDnCgetrf. |
cusolverDnCgetrf |
Compute LU factorization of a complex single precision m x n matrix. |
cusolverDnCgetrs |
Solve complex single precision linear system. |
cusolverDnCheevd_bufferSize |
Calculate size of work buffer used by cusolverDnCheevd. |
cusolverDnCheevd |
|
cusolverDnCheevj_bufferSize |
|
cusolverDnCheevj |
|
cusolverDnCheevjBatched_bufferSize |
|
cusolverDnCheevjBatched |
|
cusolverDnCpotrf_bufferSize |
Calculate size of work buffer used by cusolverDnCpotrf. |
cusolverDnCpotrf |
Compute Cholesky factorization of a complex single precision Hermitian positive-definite matrix. |
cusolverDnCungqr_bufferSize |
Calculate size of work buffer used by cusolverDnCungqr. |
cusolverDnCungqr |
Create unitary m x n matrix from single precision complex reflection vectors. |
cusolverDnDgeqrf_bufferSize |
Calculate size of work buffer used by cusolverDnDgeqrf. |
cusolverDnDgeqrf |
Compute QR factorization of a real double precision m x n matrix. |
cusolverDnDgesvd_bufferSize |
Calculate size of work buffer used by cusolverDnDgesvd. |
cusolverDnDgesvd |
Compute real double precision singular value decomposition. |
cusolverDnDgetrf_bufferSize |
Calculate size of work buffer used by cusolverDnDgetrf. |
cusolverDnDgetrf |
Compute LU factorization of a real double precision m x n matrix. |
cusolverDnDgetrs |
Solve real double precision linear system. |
cusolverDnDorgqr_bufferSize |
Calculate size of work buffer used by cusolverDnDorgqr. |
cusolverDnDorgqr |
Create unitary m x n matrix from double precision real reflection vectors. |
cusolverDnDpotrf_bufferSize |
Calculate size of work buffer used by cusolverDnDpotrf. |
cusolverDnDpotrf |
Compute Cholesky factorization of a real double precision Hermitian positive-definite matrix. |
cusolverDnDsyevd_bufferSize |
Calculate size of work buffer used by cusolverDnDsyevd. |
cusolverDnDsyevd |
|
cusolverDnDsyevj_bufferSize |
|
cusolverDnDsyevj |
|
cusolverDnDsyevjBatched_bufferSize |
|
cusolverDnDsyevjBatched |
|
cusolverDnZgeqrf_bufferSize |
Calculate size of work buffer used by cusolverDnZgeqrf. |
cusolverDnZgeqrf |
Compute QR factorization of a complex double precision m x n matrix. |
cusolverDnZgesvd_bufferSize |
Calculate size of work buffer used by cusolverDnZgesvd. |
cusolverDnZgesvd |
Compute complex double precision singular value decomposition. |
cusolverDnZgetrf_bufferSize |
Calculate size of work buffer used by cusolverDnZgetrf. |
cusolverDnZgetrf |
Compute LU factorization of a complex double precision m x n matrix. |
cusolverDnZgetrs |
Solve complex double precision linear system. |
cusolverDnZheevd_bufferSize |
Calculate size of work buffer used by cusolverDnZheevd. |
cusolverDnZheevd |
|
cusolverDnZheevj_bufferSize |
|
cusolverDnZheevj |
|
cusolverDnZheevjBatched_bufferSize |
|
cusolverDnZheevjBatched |
|
cusolverDnZpotrf_bufferSize |
Calculate size of work buffer used by cusolverDnZpotrf. |
cusolverDnZpotrf |
Compute Cholesky factorization of a complex double precision Hermitian positive-definite matrix. |
cusolverDnZungqr_bufferSize |
Calculate size of work buffer used by cusolverDnZungqr. |
cusolverDnZungqr |
Create unitary m x n matrix from double precision complex reflection vectors. |
CULA Routines¶
Framework Routines¶
culaCheckStatus |
Raise an exception corresponding to the specified CULA status code. |
culaFreeBuffers |
Releases any memory buffers stored internally by CULA. |
culaGetCublasMinimumVersion |
Report the version of CUBLAS required by CULA. |
culaGetCublasRuntimeVersion |
Report the version of CUBLAS linked to by CULA. |
culaGetCudaDriverVersion |
Report the version of the CUDA driver installed on the system. |
culaGetCudaMinimumVersion |
Report the minimum version of CUDA required by CULA. |
culaGetCudaRuntimeVersion |
Report the version of the CUDA runtime linked to by the CULA library. |
culaGetDeviceCount |
Report the number of available GPU devices. |
culaGetErrorInfo |
Returns extended information code for the last CULA error. |
culaGetErrorInfoString |
Returns a readable CULA error string. |
culaGetExecutingDevice |
Reports the id of the GPU device used by CULA. |
culaGetLastStatus |
Returns the last status code returned from a CULA function. |
culaGetStatusString |
Get string associated with the specified CULA status code. |
culaGetVersion |
Report the version number of CULA. |
culaInitialize |
Initialize CULA. |
culaSelectDevice |
Selects a device with which CULA will operate. |
culaShutdown |
Shuts down CULA. |
Auxiliary Routines¶
culaDeviceSgeNancheck |
Check a real general matrix for invalid entries |
culaDeviceSgeTranspose |
Transpose of real general matrix. |
culaDeviceSgeTransposeInplace |
Inplace transpose of real square matrix. |
culaDeviceCgeConjugate |
Conjugate of complex general matrix. |
culaDeviceCgeNancheck |
Check a complex general matrix for invalid entries |
culaDeviceCgeTranspose |
Transpose of complex general matrix. |
culaDeviceCgeTransposeConjugate |
Conjugate transpose of complex general matrix. |
culaDeviceCgeTransposeInplace |
Inplace transpose of complex square matrix. |
culaDeviceCgeTransposeConjugateInplace |
Inplace conjugate transpose of complex square matrix. |
culaDeviceDgeNancheck |
Check a real general matrix for invalid entries |
culaDeviceDgeTranspose |
Transpose of real general matrix. |
culaDeviceDgeTransposeInplace |
Inplace transpose of real square matrix. |
culaDeviceZgeConjugate |
Conjugate of complex general matrix. |
culaDeviceZgeNancheck |
Check a complex general matrix for invalid entries |
culaDeviceZgeTranspose |
Transpose of complex general matrix. |
culaDeviceZgeTransposeConjugate |
Conjugate transpose of complex general matrix. |
culaDeviceZgeTransposeInplace |
Inplace transpose of complex square matrix. |
culaDeviceZgeTransposeConjugateInplace |
Inplace conjugate transpose of complex square matrix. |
BLAS Routines¶
culaDeviceSgemm |
Matrix-matrix product for general matrix. |
culaDeviceSgemv |
Matrix-vector product for real general matrix. |
culaDeviceCgemm |
Matrix-matrix product for complex general matrix. |
culaDeviceCgemv |
Matrix-vector product for complex general matrix. |
culaDeviceDgemm |
Matrix-matrix product for general matrix. |
culaDeviceDgemv |
Matrix-vector product for real general matrix. |
culaDeviceZgemm |
Matrix-matrix product for complex general matrix. |
culaDeviceZgemv |
Matrix-vector product for complex general matrix. |
LAPACK Routines¶
culaDeviceSgels |
Solve linear system with QR or LQ factorization. |
culaDeviceSgeqrf |
QR factorization. |
culaDeviceSgesv |
Solve linear system with LU factorization. |
culaDeviceSgesvd |
SVD decomposition. |
culaDeviceSgetrf |
LU factorization. |
culaDeviceSgglse |
Solve linear equality-constrained least squares problem. |
culaDeviceSposv |
Solve positive definite linear system with Cholesky factorization. |
culaDeviceSpotrf |
Cholesky factorization. |
culaDeviceCgels |
Solve linear system with QR or LQ factorization. |
culaDeviceCgeqrf |
QR factorization. |
culaDeviceCgesv |
Solve linear system with LU factorization. |
culaDeviceCgesvd |
SVD decomposition. |
culaDeviceCgetrf |
LU factorization. |
culaDeviceCgglse |
Solve linear equality-constrained least squares problem. |
culaDeviceCposv |
Solve positive definite linear system with Cholesky factorization. |
culaDeviceCpotrf |
Cholesky factorization. |
culaDeviceDgels |
Solve linear system with QR or LQ factorization. |
culaDeviceDgeqrf |
QR factorization. |
culaDeviceDgesv |
Solve linear system with LU factorization. |
culaDeviceDgesvd |
SVD decomposition. |
culaDeviceDgetrf |
LU factorization. |
culaDeviceDgglse |
Solve linear equality-constrained least squares problem. |
culaDeviceDposv |
Solve positive definite linear system with Cholesky factorization. |
culaDeviceDpotrf |
Cholesky factorization. |
culaDeviceZgels |
Solve linear system with QR or LQ factorization. |
culaDeviceZgeqrf |
QR factorization. |
culaDeviceZgesv |
Solve linear system with LU factorization. |
culaDeviceZgesvd |
SVD decomposition. |
culaDeviceZgetrf |
LU factorization. |
culaDeviceZgglse |
Solve linear equality-constrained least squares problem. |
culaDeviceZposv |
Solve positive definite linear system with Cholesky factorization. |
culaDeviceZpotrf |
Cholesky factorization. |
Multi-GPU CULA Routines¶
Framework Routines¶
pculaConfigInit |
Initialize pCULA configuration structure to sensible defaults. |
BLAS Routines¶
pculaSgemm |
Matrix-matrix product for general matrix. |
pculaStrsm |
Triangular system solve. |
pculaCgemm |
Matrix-matrix product for general matrix. |
pculaCtrsm |
Triangular system solve. |
pculaDgemm |
Matrix-matrix product for general matrix. |
pculaDtrsm |
Triangular system solve. |
pculaZgemm |
Matrix-matrix product for general matrix. |
pculaZtrsm |
Triangular system solve. |
LAPACK Routines¶
pculaSgesv |
General system solve using LU decomposition. |
pculaSgetrf |
LU decomposition. |
pculaSgetrs |
LU solve. |
pculaSposv |
Solve positive definite linear system with Cholesky factorization. |
pculaSpotrf |
Cholesky decomposition. |
pculaSpotrs |
Cholesky solve. |
pculaCgesv |
General system solve using LU decomposition. |
pculaCgetrf |
LU decomposition. |
pculaCgetrs |
LU solve. |
pculaCposv |
Solve positive definite linear system with Cholesky factorization. |
pculaCpotrf |
Cholesky decomposition. |
pculaCpotrs |
Cholesky solve. |
pculaDgesv |
General system solve using LU decomposition. |
pculaDgetrf |
LU decomposition. |
pculaDgetrs |
LU solve. |
pculaDposv |
Solve positive definite linear system with Cholesky factorization. |
pculaDpotrf |
Cholesky decomposition. |
pculaDpotrs |
Cholesky solve. |
pculaZgesv |
General system solve using LU decomposition. |
pculaZgetrf |
LU decomposition. |
pculaZgetrs |
LU solve. |
pculaZposv |
Solve positive definite linear system with Cholesky factorization. |
pculaZpotrf |
Cholesky decomposition. |
pculaZpotrs |
Cholesky solve. |
High-Level Routines¶
Fast Fourier Transform¶
fft |
|
ifft |
|
Plan |
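A typical use of these routines is to create a `Plan` for a given shape and pair of data types, then pass it to `fft` together with preallocated input and output arrays. The sketch below performs a 1D real-to-complex transform. The wrapper function `rfft` and its fallback to `numpy.fft.rfft` on machines without a working GPU stack are assumptions made for illustration.

```python
import numpy as np


def rfft(x):
    """1D real-to-complex FFT of x as complex64, on the GPU when possible."""
    x = np.asarray(x, np.float32)
    try:
        import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
        import pycuda.gpuarray as gpuarray
        import skcuda.fft as cu_fft
    except Exception:
        # Assumed fallback when PyCUDA, scikit-cuda, or a GPU is missing.
        return np.fft.rfft(x).astype(np.complex64)

    x_gpu = gpuarray.to_gpu(x)
    # A real-to-complex transform of n samples yields n//2 + 1 coefficients.
    xf_gpu = gpuarray.empty(x.size // 2 + 1, np.complex64)
    plan = cu_fft.Plan(x_gpu.shape, np.float32, np.complex64)
    cu_fft.fft(x_gpu, xf_gpu, plan)
    return xf_gpu.get()


x = np.arange(8, dtype=np.float32)
print(np.allclose(rfft(x), np.fft.rfft(x), atol=1e-3))
```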
Integration Routines¶
simps |
Implementation of composite Simpson’s rule similar to scipy.integrate.simps. |
trapz |
1D trapezoidal integration. |
trapz2d |
2D trapezoidal integration. |
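The sketch below integrates a sampled 1D signal with `skcuda.integrate.trapz` and checks it against a direct NumPy implementation of the trapezoidal rule. The helper names `trapz_reference` and `trapz`, and the fallback to the reference implementation when no GPU stack is available, are assumptions made for illustration.

```python
import numpy as np


def trapz_reference(x, dx=1.0):
    """Composite trapezoidal rule: dx * sum((x[i] + x[i+1]) / 2)."""
    x = np.asarray(x, np.float64)
    return float(dx * (x[:-1] + x[1:]).sum() / 2.0)


def trapz(x, dx=1.0):
    """Trapezoidal integral of x, computed on the GPU when possible."""
    x = np.asarray(x, np.float32)
    try:
        import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
        import pycuda.gpuarray as gpuarray
        import skcuda.integrate as integrate
    except Exception:
        # Assumed fallback when PyCUDA, scikit-cuda, or a GPU is missing.
        return trapz_reference(x, dx)

    integrate.init()
    return float(integrate.trapz(gpuarray.to_gpu(x), dx))


samples = [0.0, 1.0, 2.0, 3.0]
print(trapz(samples), trapz_reference(samples))
```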
Linear Algebra Routines and Classes¶
add_diag |
Adds a vector to the diagonal of an array. |
add_dot |
Calculates the dot product of two arrays and adds it to a third matrix. |
cho_factor |
Cholesky factorization. |
cho_solve |
Cholesky solver. |
cholesky |
Cholesky factorization. |
conj |
Complex conjugate. |
det |
Compute the determinant of a square matrix. |
diag |
Construct a diagonal matrix if input array is one-dimensional, or extracts diagonal entries of a two-dimensional array. |
dmd |
Dynamic Mode Decomposition. |
dot_diag |
Dot product of diagonal and non-diagonal arrays. |
dot |
Dot product of two arrays. |
eig |
Eigendecomposition of a matrix. |
eye |
Construct a 2D matrix with ones on the diagonal and zeros elsewhere. |
hermitian |
Hermitian (conjugate) matrix transpose. |
inv |
Compute the inverse of a matrix. |
mdot |
Product of several matrices. |
multiply |
Element-wise array multiplication (Hadamard product). |
norm |
Euclidean norm (2-norm) of real vector. |
pinv |
Moore-Penrose pseudoinverse. |
qr |
QR Decomposition. |
scale |
Scale a vector by a factor alpha. |
svd |
Singular Value Decomposition. |
PCA |
Principal Component Analysis with similar API to sklearn.decomposition.PCA |
trace |
Return the sum along the main diagonal of the array. |
transpose |
Matrix transpose. |
tril |
Lower triangle of a matrix. |
triu |
Upper triangle of a matrix. |
vander |
Generate a Vandermonde matrix. |
Other Routines¶
Miscellaneous Routines¶
add |
Adds two scalars, vectors, or matrices. |
add_matvec |
Adds a vector to each column/row of the matrix. |
argmax |
Indices of the maximum values along an axis. |
argmin |
Indices of the minimum values along an axis. |
cumsum |
Cumulative sum. |
diff |
Calculate the discrete difference. |
div_matvec |
Divides each column/row of a matrix by a vector. |
divide |
Divides two scalars, vectors, or matrices with broadcasting. |
done_context |
Detach from a context cleanly. |
get_by_index |
Get values in a GPUArray by index. |
get_compute_capability |
Get the compute capability of the specified device. |
get_current_device |
Get the device in use by the current context. |
get_dev_attrs |
|
inf |
Return an array of the given shape and dtype filled with infs. |
init |
Initialize libraries used by scikit-cuda. |
init_context |
Create a context that will be cleaned up properly. |
init_device |
Initialize a GPU device. |
iscomplextype |
Check whether a type is complex. |
isdoubletype |
Check whether a type has double precision. |
max |
Return the maximum of an array or maximum along an axis. |
maxabs |
Get maximum absolute value. |
mean |
Compute the arithmetic means along the specified axis. |
min |
Return the minimum of an array or minimum along an axis. |
mult_matvec |
Multiplies a vector elementwise with each column/row of the matrix. |
multiply |
Multiplies two scalars, vectors, or matrices with broadcasting. |
ones |
Return an array of the given shape and dtype filled with ones. |
ones_like |
Return an array of ones with the same shape and type as a given array. |
select_block_grid_sizes |
|
set_by_index |
Set values in a GPUArray by index. |
set_realloc |
Transfer data into a GPUArray instance. |
shutdown |
Shut down libraries used by scikit-cuda. |
std |
Compute the standard deviation along the specified axis. |
subtract |
Subtracts two scalars, vectors, or matrices with broadcasting. |
sum |
Compute the sum along the specified axis. |
var |
Compute the variance along the specified axis. |
zeros |
Return an array of the given shape and dtype filled with zeros. |
zeros_like |
Return an array of zeros with the same shape and type as a given array. |
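The matrix/vector routines listed above (add_matvec, div_matvec, mult_matvec) combine a vector with each row or each column of a matrix. The plain-Python sketch below illustrates the two cases for addition on the CPU. The function names are hypothetical and do not reflect scikit-cuda's actual signatures or the way it selects the axis.

```python
# CPU-only illustration; function names are hypothetical and are NOT
# part of scikit-cuda's API.

def add_to_each_row(matrix, vec):
    """Add vec (one entry per column) to every row of matrix."""
    if any(len(row) != len(vec) for row in matrix):
        raise ValueError("vec length must equal the number of columns")
    return [[m + v for m, v in zip(row, vec)] for row in matrix]


def add_to_each_column(matrix, vec):
    """Add vec (one entry per row) to every column of matrix."""
    if len(matrix) != len(vec):
        raise ValueError("vec length must equal the number of rows")
    return [[m + v for m in row] for row, v in zip(matrix, vec)]


m = [[1, 2, 3],
     [4, 5, 6]]
by_row = add_to_each_row(m, [10, 20, 30])
by_col = add_to_each_column(m, [100, 200])
```

Replacing the addition with division or multiplication gives the corresponding picture for div_matvec and mult_matvec.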
Authors & Acknowledgments¶
This software was written and packaged by Lev Givon. Although it depends upon the excellent PyCUDA package by Andreas Klöckner, scikit-cuda is developed independently of PyCUDA.
Special thanks are due to the following parties for their contributions:
- Frédéric Bastien - CUBLAS version detection enhancements.
- Arnaud Bergeron - Fix to prevent LANG from affecting objdump output.
- David Wei Chiang - Improvements to vectorized functions, bug fixes.
- Sander Dieleman - CUBLAS 5 bindings.
- Chris Capdevila - MacOS X library search fix.
- Ben Erichson - QR decomposition, eigenvalue/eigenvector computation, Dynamic Mode Decomposition, randomized linear algebra routines.
- Ying Wei (Daniel) Fan - Kindly permitted reuse of CUBLAS wrapper code in his PARRET Python package.
- Michael M. Forbes - Improved MacOSX compatibility, bug fixes.
- Jacob Frelinger - Various enhancements.
- Tim Klein - Additional MAGMA wrappers.
- Joseph Martinot-Lagarde - Python 3 compatibility improvements.
- Eric Larson - Various enhancements.
- Gregory R. Lee - Enhanced FFT plan creation.
- Bryant Menn - CUSOLVER support for symmetric eigenvalue decomposition.
- Bruce Merry - Support for CUFFT extensible plan API.
- Teodor Mihai Moldovan - CUBLAS 5 bindings.
- Lars Pastewka - FFT tests and FFTW compatibility mode configuration.
- Li Yong Liu - CUBLAS batch wrappers.
- Luke Pfister - Bug fixes.
- Michael Rader - Bug fixes.
- Nate Merrill - PCA module.
- Alex Rubinsteyn - Support for CULA Dense Free R17.
- Xing Shi - Bug fixes.
- Steve Taylor - Cholesky factorization/solve functions.
- Rob Turetsky - Useful feedback.
- Thomas Unterthiner - Additional high-level and wrapper functions.
- Nikul H. Ukani - Additional MAGMA wrappers.
- S. Clarkson - Bug fixes.
- Stefan van der Walt - Bug fixes.
- Feng Wang - Bug reports.
- Alexander Weyman - Simpson’s Rule.
- Evgeniy Zheltonozhskiy - Complex Hermitian support for eigenvalue decomposition.
- Wing-Kit Lee - Fixes for MAGMA eigenvalue decomp wrappers.
- Yiyin Zhou - Patches, bug reports, and function wrappers.
License¶
Copyright (c) 2009-2018, Lev E. Givon. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of Lev E. Givon nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Change Log¶
Release 0.5.3 (Under Development)¶
- Add support for CUDA 10.
Release 0.5.2 (November 6, 2018)¶
- Prevent exceptions when CULA Dense Free is present (#146).
- Fix Python 3 issues with CUSOLVER wrapper functions (#145).
- Add support for using either CUSOLVER or CULA for computing SVD.
- Add support for using either CUSOLVER or CULA for computing determinant.
- Compressed Dynamic Mode Decomposition (enh. by N. Benjamin Erichson).
- Support for CUFFT extensible plan API (enh. by Bruce Merry).
- Wrappers for CUFFT size estimation (enh. by Luke Pfister).
- Wrappers for CUBLAS-XT functions.
- More wrappers for MAGMA functions (enh. by Nikul H. Ukani).
- Python 3 compatibility improvements (enh. by Joseph Martinot-Lagarde).
- Allow specification of order in misc.zeros and misc.ones.
- Preserve strides in misc.zeros_like and misc.ones_like.
- Add support for Cholesky factorization/solving using CUSOLVER (#198).
- Add cholesky() function that zeros out non-factor entries in result (#199).
- Add support for CUDA 8.0 libraries (#171).
- Workaround for libgomp + CUDA 8.0 weirdness (fix by Kevin Flansburg).
- Fix broken matrix-vector dot product (#156).
- Initialize MAGMA before CUSOLVER to prevent internal errors in certain CUSOLVER functions.
- Skip CULA-dependent unit tests when CULA isn’t present.
- CUSOLVER support for symmetric eigenvalue decomposition (enh. by Bryant Menn).
- CUSOLVER support for matrix inversion, QR decomposition (#198).
- Prevent objdump output from changing due to environment language (fix by Arnaud Bergeron).
- Fix diag() support for column-major 2D array inputs (#219).
- Use absolute path for skcuda header includes (enh. by S. Clarkson).
- Fix QR issues by reverting fix for #131 and raising PyCUDA version requirement (fix by S. Clarkson).
- More batch CUBLAS wrappers (enh. by Li Yong Liu).
- Numerical integration with Simpson’s Rule (enh. by Alexander Weyman).
- Make CUSOLVER default backend for functions that can use either CULA or CUSOLVER.
- Fix CUDA errors that only occur when unit tests are run en masse with nose or setuptools (#257).
- Fix MAGMA eigenvalue decomposition wrappers (#265, fix by Wing-Kit Lee).
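For readers unfamiliar with the Simpson’s Rule integration mentioned in the list above, here is a minimal CPU-only sketch of the composite rule for evenly spaced samples. It is a textbook formulation, not scikit-cuda's GPU implementation; the function name simpson and its parameters are assumptions made for illustration, and this version only handles an odd number of samples.

```python
# Textbook composite Simpson's rule on the CPU; the name and signature
# are hypothetical and do not mirror scikit-cuda's integrate module.

def simpson(samples, dx=1.0):
    """Integrate evenly spaced samples (odd count >= 3) with spacing dx."""
    n = len(samples)
    if n < 3 or n % 2 == 0:
        raise ValueError("need an odd number of samples, at least 3")
    odd_sum = sum(samples[1:-1:2])    # interior points with weight 4
    even_sum = sum(samples[2:-1:2])   # interior points with weight 2
    return dx / 3.0 * (samples[0] + 4.0 * odd_sum + 2.0 * even_sum + samples[-1])


# Integrate f(x) = x**2 over [0, 2] using 5 samples spaced 0.5 apart.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
area = simpson([x * x for x in xs], dx=0.5)
```

Simpson's rule is exact for polynomials up to degree three, so the result here matches the analytic value 8/3 up to rounding.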
Release 0.5.1 - (October 30, 2015)¶
- More CUSOLVER wrappers.
- Eigenvalue/eigenvector computation (enh. by N. Benjamin Erichson).
- QR decomposition (enh. by N. Benjamin Erichson).
- Improved Windows 10 compatibility (enh. by N. Benjamin Erichson).
- Function for constructing Vandermonde matrix in GPU memory (enh. by N. Benjamin Erichson).
- Standard and randomized Dynamic Mode Decomposition (enh. by N. Benjamin Erichson).
- Randomized linear algebra routines (enh. by N. Benjamin Erichson).
- Add triu function (enh. by N. Benjamin Erichson).
- Support Bessel correction in computation of variance and standard deviation (#143).
- Fix pip installation issues.
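The Bessel correction mentioned in the list above divides the sum of squared deviations by N - 1 rather than N, which gives an unbiased estimate of the variance from a sample. The sketch below shows the arithmetic on the CPU in plain Python. The parameter name ddof follows NumPy's convention and is an assumption here, not a statement about scikit-cuda's signatures.

```python
import math

# CPU-only illustration of Bessel's correction; the ddof parameter name is
# borrowed from NumPy and is an assumption, not scikit-cuda's documented API.

def variance(values, ddof=0):
    """Variance with divisor N - ddof (ddof=1 applies Bessel's correction)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof values")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


def std(values, ddof=0):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(variance(values, ddof))


data = [1.0, 2.0, 3.0, 4.0]
population_var = variance(data)          # divisor N
sample_var = variance(data, ddof=1)      # divisor N - 1
```

For these four values the sum of squared deviations is 5, so the two divisors give 5/4 and 5/3 respectively.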
Release 0.5.0 - (July 14, 2015)¶
- Rename package to scikit-cuda.
- Reductions sum, mean, var, std, max, min, argmax, argmin accept keepdims option.
- The same reductions now return a GPUArray instead of ndarray if axis=None.
- Switch to PEP 440 version numbering.
- Replace distribute_setup.py with ez_setup.py.
- Improve support for latest NVIDIA GPUs.
- Direct links to online NVIDIA documentation in CUBLAS, CUFFT wrapper docstrings.
- Add wrappers for CUSOLVER in CUDA 7.0.
- Add skcuda namespace package that contains all modules in scikits.cuda namespace.
- Add more wrappers for CUBLAS 5 functions (enh. by Teodor Moldovan, Sander Dieleman).
- Add support for CULA Dense Free R17 (enh. by Alex Rubinsteyn).
- Memoize elementwise kernel used by ifft scaling (#37).
- Speed up misc.maxabs using reduction and kernel memoization.
- Speed up misc.cumsum using scan and kernel memoization.
- Speed up linalg.conj and misc.diff using elementwise kernel and memoization.
- Speed up special.{sici,exp1,expi} using elementwise kernel and memoization.
- Add wrappers for experimental multi-GPU CULA routines in CULA Dense R14+.
- Use ldconfig to find library paths rather than libdl (#39).
- Fix win32 platform detection.
- Add Cholesky factorization/solve routines (enh. by Steve Taylor).
- Fix Cholesky factorization/solve routines (fix by Thomas Unterthiner).
- Enable dot() function to operate inplace (enh. by Thomas Unterthiner).
- Python 3 compatibility improvements (enh. by Thomas Unterthiner).
- Support for Fortran-order arrays in dot() and cho_solve() (enh. by Thomas Unterthiner).
- CULA-based matrix inversion (enh. by Thomas Unterthiner).
- Add add_diag() function (enh. by Thomas Unterthiner).
- Use cublas*copy in diag() function (enh. by Thomas Unterthiner).
- Improved MacOSX compatibility (enh. by Michael M. Forbes).
- Find CUBLAS version even when it is only accessible via LD_LIBRARY_PATH (enh. by Frédéric Bastien).
- Get both major and minor version numbers from CUBLAS library when determining version.
- Handle unset LD_LIBRARY_PATH variable (fix by Jan Schlüter).
- Fix library search on MacOS X (fix by capdevc).
- Fix library search on Windows.
- Add Windows support to CULA wrappers.
- Enable specification of memory pool allocator to linalg functions (enh. by Thomas Unterthiner).
- Improve misc.select_block_grid_sizes() logic to handle different GPU hardware.
- Compute transpose using CUDA 5.0 CUBLAS functions rather than with inefficient naive kernel.
- Use ReadTheDocs theme when building HTML docs locally.
- Support additional cufftPlanMany() parameters when creating FFT plans (enh. by Gregory R. Lee).
- Improved Python 3.4 compatibility (enh. by Eric Larson).
- Avoid unnecessary import of cublas when importing fft module (enh. by Eric Larson).
- Matrix trace function (enh. by Thomas Unterthiner).
- Functions for computing simple axis-wise stats over matrices (enh. by Thomas Unterthiner).
- Matrix add_dot, add_matvec, div_matvec, mult_matvec functions (enh. by Thomas Unterthiner).
- Faster dot_diag implementation using CUBLAS matrix-matrix multiplication (enh. by Thomas Unterthiner).
- Memoize SourceModule calls to speed up various high-level functions (enh. by Thomas Unterthiner).
- Function for computing matrix determinant (enh. by Thomas Unterthiner).
- Function for computing min/max and argmin/argmax along a matrix axis (enh. by Thomas Unterthiner).
- Set default value of the parameter ‘overwrite’ to False in all linalg functions.
- Elementwise arithmetic operations with broadcasting up to 2 dimensions (enh. by David Wei Chiang).
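Several entries in the list above speed up functions by memoizing kernels, that is, by caching a compiled kernel so that repeated calls with the same arguments skip compilation. The following sketch shows the general idea with Python's standard functools.lru_cache. The build_kernel function is a hypothetical stand-in for an expensive compilation step and does not represent how scikit-cuda or PyCUDA implement their caches.

```python
import functools

compile_count = 0  # counts how often the expensive step actually runs


@functools.lru_cache(maxsize=None)
def build_kernel(dtype_name, operation):
    """Hypothetical stand-in for compiling a GPU kernel (expensive in practice)."""
    global compile_count
    compile_count += 1
    # A real implementation would generate and compile GPU source code here.
    return "kernel<{}, {}>".format(dtype_name, operation)


k1 = build_kernel("float32", "conj")
k2 = build_kernel("float32", "conj")   # same arguments: served from the cache
k3 = build_kernel("float64", "conj")   # new arguments: compiled once more
```

Only two compilations occur for the three calls, because the cache is keyed on the argument tuple.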
Release 0.042 - (March 10, 2013)¶
- Add complex exponential integral.
- Fix typo in cublasCgbmv.
- Use CUBLAS v2 API, add preliminary support for CUBLAS 5 functions.
- Detect CUBLAS version without initializing the GPU.
- Work around numpy bug #1898.
- Fix issues with pycuda installations done via easy_install/pip.
- Add support for specifying streams when creating FFT plans.
- Successfully find CULA R13a libraries.
- Raise exceptions when functions in the full release of CULA Dense are invoked without the library installed.
- Perform post-fft scaling in-place.
- Fix broken Python 2.6 compatibility (#19).
- Download distribute for package installation if it isn’t available.
- Prevent absence of CULA from causing import errors (enh. by Jacob Frelinger).
- FFT batch tests and FFTW mode configuration (enh. by Lars Pastewka).
Release 0.041 - (May 22, 2011)¶
- Fix bug preventing installation with pip.
Release 0.04 - (May 11, 2011)¶
- Fix bug in cutoff_invert kernel.
- Add get_compute_capability function and other goodies to misc module.
- Use pycuda-complex.hpp to improve kernel readability.
- Add integrate module.
- Add unit tests for high-level functions.
- Automatically determine device used by current context.
- Support batched and multidimensional FFT operations.
- Extended dot() function to support implicit transpose/Hermitian.
- Support for in-place computation of singular vectors in svd() function.
- Simplify kernel launch setup.
- More CULA routine wrappers.
- Wrappers for CULA R11 auxiliary routines.
Release 0.03 - (November 22, 2010)¶
- Add support for some functions in the premium version of CULA toolkit.
- Add wrappers for all lapack functions in basic CULA toolkit.
- Fix pinv() to properly invert complex matrices.
- Add Hermitian transpose.
- Add tril function.
- Fix missing library detection.
- Include missing CUDA headers in package.
Release 0.02 - (September 21, 2010)¶
- Add documentation.
- Update copyright information.
Release 0.01 - (September 17, 2010)¶
- First public release.