CuPy – NumPy-like API accelerated with CUDA¶
This is the CuPy documentation.
Overview¶
CuPy is an implementation of a NumPy-compatible multi-dimensional array on CUDA.
CuPy consists of cupy.ndarray, the core multi-dimensional array class, and many functions on it. It supports a subset of the numpy.ndarray interface.
The following is a brief overview of the supported subset of the NumPy interface:
- Basic indexing (indexing by ints, slices, newaxes, and Ellipsis)
- Most of advanced indexing (except for some indexing patterns with boolean masks)
- Data types (dtypes): bool_, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64, complex64, complex128
- Most of the array creation routines (empty, ones_like, diag, etc.)
- Most of the array manipulation routines (reshape, rollaxis, concatenate, etc.)
- All operators with broadcasting
- All universal functions for elementwise operations (except those for complex numbers)
- Linear algebra functions, including product (dot, matmul, etc.) and decomposition (cholesky, svd, etc.), accelerated by cuBLAS
- Reduction along axes (sum, max, argmax, etc.)
CuPy also includes the following features for performance:
- User-defined elementwise CUDA kernels
- User-defined reduction CUDA kernels
- Fusing CUDA kernels to optimize user-defined calculation
- Customizable memory allocator and memory pool
- cuDNN utilities
CuPy uses on-the-fly kernel synthesis: when a kernel call is required, it
compiles kernel code optimized for the shapes and dtypes of the given arguments,
sends it to the GPU device, and executes the kernel. The compiled code is
cached in the $(HOME)/.cupy/kernel_cache directory (this cache path can be
overridden by setting the CUPY_CACHE_DIR environment variable). Compilation may
make the first kernel call slower, but the slowdown disappears from the second
execution onward. CuPy also caches the kernel code sent to the GPU device within
the process, which reduces the kernel transfer time on further calls.
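As a rough illustration of the lookup rule described above, the following plain-Python sketch resolves the cache directory the way the text states it: CUPY_CACHE_DIR wins when it is set, otherwise the path under the home directory is used. The helper name `resolve_kernel_cache_dir` is hypothetical and not part of CuPy; it only mirrors the documented behavior, and CuPy's internal logic may differ.

```python
import os


def resolve_kernel_cache_dir(environ=None):
    """Return the kernel cache directory implied by the rule in the text.

    Hypothetical helper for illustration only; not a CuPy API.
    """
    if environ is None:
        environ = os.environ
    default_dir = os.path.join(os.path.expanduser('~'), '.cupy', 'kernel_cache')
    # CUPY_CACHE_DIR, when set, overrides the default location.
    return environ.get('CUPY_CACHE_DIR', default_dir)


print(resolve_kernel_cache_dir())
```

Passing a dictionary instead of the real environment makes the rule easy to check without touching the process settings.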
Tutorial¶
Basics of CuPy¶
In this section, you will learn about the following things:
- Basics of cupy.ndarray
- The concept of current device
- Host-device and device-device array transfer
Basics of cupy.ndarray¶
CuPy is a GPU array backend that implements a subset of the NumPy interface. In the following code, cp is an abbreviation of cupy, just as np is the customary abbreviation of numpy:
>>> import numpy as np
>>> import cupy as cp
The cupy.ndarray class is at its core; it is a compatible GPU alternative to numpy.ndarray.
>>> x_gpu = cp.array([1, 2, 3])
x_gpu in the above example is an instance of cupy.ndarray.
You can see that its creation is identical to NumPy’s, except that numpy is replaced with cupy.
The main difference between cupy.ndarray and numpy.ndarray is that the content is allocated in device memory.
Its data is allocated on the current device, which will be explained later.
Most array manipulations are also done in a way similar to NumPy.
Take the Euclidean norm (a.k.a. L2 norm), for example.
NumPy has numpy.linalg.norm() to calculate it on the CPU.
>>> x_cpu = np.array([1, 2, 3])
>>> l2_cpu = np.linalg.norm(x_cpu)
We can calculate it on the GPU with CuPy in a similar way:
>>> x_gpu = cp.array([1, 2, 3])
>>> l2_gpu = cp.linalg.norm(x_gpu)
CuPy implements many functions on cupy.ndarray objects.
See the reference for the supported subset of the NumPy API.
Understanding NumPy helps in using most features of CuPy, so we recommend reading the NumPy documentation.
Current Device¶
CuPy has a concept of the current device, which is the default device on which the allocation, manipulation, calculation, etc. of arrays take place. Suppose the ID of the current device is 0. The following code allocates array contents on GPU 0.
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
The current device can be changed by cupy.cuda.Device.use() as follows:
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
>>> cp.cuda.Device(1).use()
>>> x_on_gpu1 = cp.array([1, 2, 3, 4, 5])
If you switch the current GPU temporarily, the with statement comes in handy.
>>> with cp.cuda.Device(1):
...     x_on_gpu1 = cp.array([1, 2, 3, 4, 5])
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
Most operations of CuPy are done on the current device. Be careful: processing an array on a non-current device will cause an error:
>>> with cp.cuda.Device(0):
...     x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
>>> with cp.cuda.Device(1):
...     x_on_gpu0 * 2  # raises error
Traceback (most recent call last):
...
ValueError: Array device must be same as the current device: array device = 0 while current = 1
The cupy.ndarray.device attribute indicates the device on which the array is allocated.
>>> with cp.cuda.Device(1):
...     x = cp.array([1, 2, 3, 4, 5])
>>> x.device
<CUDA Device 1>
Note
If the environment has only one device, such explicit device switching is not needed.
Data Transfer¶
Move arrays to a device¶
cupy.asarray() can be used to move a numpy.ndarray, a list, or any object
that can be passed to numpy.array() to the current device:
>>> x_cpu = np.array([1, 2, 3])
>>> x_gpu = cp.asarray(x_cpu) # move the data to the current device.
cupy.asarray() can accept cupy.ndarray, which means we can
transfer the array between devices with this function.
>>> with cp.cuda.Device(0):
...     x_gpu_0 = cp.array([1, 2, 3])  # create an array in GPU 0
>>> with cp.cuda.Device(1):
...     x_gpu_1 = cp.asarray(x_gpu_0)  # move the array to GPU 1
Note
cupy.asarray() does not copy the input array if possible.
So, if you pass an array that is already on the current device, it returns the input object itself.
If you do want to copy the array in this situation, you can use cupy.array() with copy=True.
In fact, cupy.asarray() is equivalent to cupy.array(arr, dtype, copy=False).
Move array from a device to the host¶
Moving a device array to the host can be done by cupy.asnumpy() as follows:
>>> x_gpu = cp.array([1, 2, 3]) # create an array in the current device
>>> x_cpu = cp.asnumpy(x_gpu) # move the array to the host.
We can also use cupy.ndarray.get():
>>> x_cpu = x_gpu.get()
How to write CPU/GPU agnostic code¶
The compatibility of CuPy with NumPy enables us to write CPU/GPU generic code.
The cupy.get_array_module() function makes this easy.
This function returns the numpy or cupy module based on its arguments.
A CPU/GPU generic function is defined using it as follows:
>>> # Stable implementation of log(1 + exp(x))
>>> def softplus(x):
...     xp = cp.get_array_module(x)
...     return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))
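To see why the implementation above is called stable, the identity can be checked on scalars with the standard library alone. The sketch below is an illustration rather than CuPy code: `softplus_naive` and `softplus_stable` are hypothetical names, and `math` stands in for the `xp` module that cupy.get_array_module() would return.

```python
import math


def softplus_naive(x):
    # Direct formula; math.exp overflows for large positive x.
    return math.log(1.0 + math.exp(x))


def softplus_stable(x):
    # Same rewrite as the tutorial code: max(0, x) + log1p(exp(-|x|)).
    # exp() only ever sees a non-positive argument, so it cannot overflow.
    return max(0.0, x) + math.log1p(math.exp(-abs(x)))


# Both forms agree wherever the naive one is computable.
for value in (-2.0, 0.0, 3.0):
    assert math.isclose(softplus_naive(value), softplus_stable(value))

# For a large input the naive form fails while the stable one does not.
try:
    softplus_naive(1000.0)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

print(naive_overflowed, softplus_stable(1000.0))
```

The rewritten form only exponentiates `-abs(x)`, which is why it stays finite for inputs of any magnitude.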
User-Defined Kernels¶
CuPy provides easy ways to define two types of CUDA kernels: elementwise kernels and reduction kernels. We first describe how to define and call elementwise kernels, and then describe how to define and call reduction kernels.
Basics of elementwise kernels¶
An elementwise kernel can be defined by the ElementwiseKernel class.
An instance of this class defines a CUDA kernel which can be invoked by the __call__ method of this instance.
A definition of an elementwise kernel consists of four parts: an input argument list, an output argument list, a loop body code, and the kernel name. For example, a kernel that computes a squared difference \(f(x, y) = (x - y)^2\) is defined as follows:
>>> squared_diff = cp.ElementwiseKernel(
... 'float32 x, float32 y',
... 'float32 z',
... 'z = (x - y) * (x - y)',
... 'squared_diff')
The argument lists consist of comma-separated argument definitions. Each argument definition consists of a type specifier and an argument name. Names of NumPy data types can be used as type specifiers.
Note
n, i, and names starting with an underscore _ are reserved for internal use.
The above kernel can be called on either scalars or arrays with broadcasting:
>>> x = cp.arange(10, dtype=np.float32).reshape(2, 5)
>>> y = cp.arange(5, dtype=np.float32)
>>> squared_diff(x, y)
array([[ 0., 0., 0., 0., 0.],
[25., 25., 25., 25., 25.]], dtype=float32)
>>> squared_diff(x, 5)
array([[25., 16., 9., 4., 1.],
[ 0., 1., 4., 9., 16.]], dtype=float32)
Output arguments can be explicitly specified (next to the input arguments):
>>> z = cp.empty((2, 5), dtype=np.float32)
>>> squared_diff(x, y, z)
array([[ 0., 0., 0., 0., 0.],
[25., 25., 25., 25., 25.]], dtype=float32)
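For readers who want to check the outputs above by hand, here is a plain-Python reference of what the squared_diff calls compute for a (2, 5) array combined with a length-5 vector or a scalar. It is a sketch of the broadcasting semantics only, using nested lists instead of GPU arrays; `squared_diff_reference` is a hypothetical name, and the real kernel evaluates the loop body once per output element on the GPU.

```python
def squared_diff_reference(x_rows, y):
    """Apply z = (x - y) * (x - y) with y broadcast over the rows of x."""
    if isinstance(y, (int, float)):
        # A scalar is broadcast to every element.
        return [[(xv - y) * (xv - y) for xv in row] for row in x_rows]
    # A 1-D operand of matching length is reused for every row.
    return [[(xv - yv) * (xv - yv) for xv, yv in zip(row, y)]
            for row in x_rows]


# Same values as arange(10).reshape(2, 5) and arange(5) in the text.
x = [[float(5 * r + c) for c in range(5)] for r in range(2)]
y = [float(c) for c in range(5)]

print(squared_diff_reference(x, y))
print(squared_diff_reference(x, 5))
```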
Type-generic kernels¶
If a type specifier is one character, then it is treated as a type placeholder.
It can be used to define type-generic kernels.
For example, the above squared_diff kernel can be made type-generic as follows:
>>> squared_diff_generic = cp.ElementwiseKernel(
... 'T x, T y',
... 'T z',
... 'z = (x - y) * (x - y)',
... 'squared_diff_generic')
Type placeholders of the same character in the kernel definition indicate the same type. The actual type of these placeholders is determined by the actual argument type. The ElementwiseKernel class first checks the output arguments and then the input arguments to determine the actual type. If no output arguments are given on the kernel invocation, then only the input arguments are used to determine the type.
The type placeholder can be used in the loop body code:
>>> squared_diff_generic = cp.ElementwiseKernel(
... 'T x, T y',
... 'T z',
... '''
... T diff = x - y;
... z = diff * diff;
... ''',
... 'squared_diff_generic')
More than one type placeholder can be used in a kernel definition. For example, the above kernel can be further made generic over multiple arguments:
>>> squared_diff_super_generic = cp.ElementwiseKernel(
... 'X x, Y y',
... 'Z z',
... 'z = (x - y) * (x - y)',
... 'squared_diff_super_generic')
Note that this kernel requires the output argument to be explicitly specified, because the type Z cannot be automatically determined from the input arguments.
Raw argument specifiers¶
The ElementwiseKernel class does the indexing with broadcasting automatically, which is useful to define most elementwise computations.
On the other hand, we sometimes want to write a kernel with manual indexing for some arguments.
We can tell the ElementwiseKernel class to use manual indexing by adding the raw keyword preceding the type specifier.
We can use the special variable i and the method _ind.size() for manual indexing.
i indicates the index within the loop.
_ind.size() indicates the total number of elements to which the elementwise operation is applied.
Note that it represents the size after the broadcast operation.
For example, a kernel that adds two vectors while reversing one of them can be written as follows:
>>> add_reverse = cp.ElementwiseKernel(
... 'T x, raw T y', 'T z',
... 'z = x + y[_ind.size() - i - 1]',
... 'add_reverse')
(Note that this is an artificial example and you can write such an operation just by z = x + y[::-1] without defining a new kernel.)
A raw argument can be used like an array.
The indexing operator y[_ind.size() - i - 1] involves an indexing computation on y, so y can be arbitrarily shaped and strided.
Note that raw arguments are not involved in the broadcasting.
If you want to mark all arguments as raw, you must specify the size argument on invocation, which defines the value of _ind.size().
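The roles of i and _ind.size() in the add_reverse kernel can be sketched with an ordinary Python loop. This is a CPU-side reference for 1-D inputs only, under the assumption that both inputs have the same length; `add_reverse_reference` is a hypothetical name, and the actual kernel runs the loop body in parallel on the GPU.

```python
def add_reverse_reference(x, y):
    """Emulate 'z = x + y[_ind.size() - i - 1]' for 1-D inputs."""
    size = len(x)  # plays the role of _ind.size()
    z = [0] * size
    for i in range(size):  # i is the loop index the kernel sees
        # x is indexed automatically; y is a raw argument indexed by hand.
        z[i] = x[i] + y[size - i - 1]
    return z


print(add_reverse_reference([1, 2, 3], [10, 20, 30]))
```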
Reduction kernels¶
Reduction kernels can be defined by the ReductionKernel class.
We can use it by defining four parts of the kernel code:
- Identity value: This value is used for the initial value of reduction.
- Mapping expression: It is used for the pre-processing of each element to be reduced.
- Reduction expression: It is an operator to reduce the multiple mapped values. The special variables a and b are used for its operands.
- Post mapping expression: It is used to transform the resulting reduced values. The special variable a is used as its input. Output should be written to the output parameter.
The ReductionKernel class automatically inserts other code fragments that are required for an efficient and flexible reduction implementation.
For example, the L2 norm along specified axes can be written as follows:
>>> l2norm_kernel = cp.ReductionKernel(
... 'T x', # input params
... 'T y', # output params
... 'x * x', # map
... 'a + b', # reduce
... 'y = sqrt(a)', # post-reduction map
... '0', # identity value
... 'l2norm' # kernel name
... )
>>> x = cp.arange(10, dtype=np.float32).reshape(2, 5)
>>> l2norm_kernel(x, axis=1)
array([ 5.477226 , 15.9687195], dtype=float32)
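How the four parts fit together can be sketched on the CPU with functools.reduce. The function below is a plain-Python reference for the l2norm kernel with axis=1, not the code CuPy generates; `l2norm_reference` is a hypothetical name, and the real kernel performs the reduction in parallel rather than left to right.

```python
import math
from functools import reduce


def l2norm_reference(rows):
    """Emulate the l2norm kernel with axis=1 on a list of rows."""
    identity = 0.0  # identity value: '0'
    results = []
    for row in rows:
        mapped = [x * x for x in row]  # map: 'x * x'
        a = reduce(lambda a, b: a + b, mapped, identity)  # reduce: 'a + b'
        results.append(math.sqrt(a))  # post-reduction map: 'y = sqrt(a)'
    return results


# Same values as arange(10).reshape(2, 5) in the text.
x = [[float(5 * r + c) for c in range(5)] for r in range(2)]
norms = l2norm_reference(x)
print(norms)
```

The two results agree with the float32 values printed above up to rounding.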
Note
The raw specifier is restricted to usages in which the axes to be reduced are put at the head of the shape.
That is, if you want to use the raw specifier for at least one argument, the axis argument must be 0 or a contiguous increasing sequence of integers starting from 0, like (0, 1), (0, 1, 2), etc.
Reference Manual¶
This is the official reference of CuPy, a multi-dimensional array on CUDA with a subset of the NumPy interface.
Multi-Dimensional Array (ndarray)¶
cupy.ndarray is the CuPy counterpart of NumPy’s numpy.ndarray.
It provides an intuitive interface for a fixed-size multidimensional array which resides
on a CUDA device.
For the basic concept of ndarrays, please refer to the NumPy documentation.
cupy.ndarray |
Multi-dimensional array on a CUDA device. |
Code compatibility features¶
cupy.ndarray is designed to be interchangeable with numpy.ndarray in terms of code compatibility as much as possible.
But occasionally, you will need to know whether the arrays you’re handling are cupy.ndarray or numpy.ndarray.
One example is when invoking module-level functions such as cupy.sum() or numpy.sum().
In such situations, cupy.get_array_module() can be used.
cupy.get_array_module |
Returns the array module for arguments. |
Conversion to/from NumPy arrays¶
cupy.ndarray and numpy.ndarray are not implicitly convertible to each other.
That means NumPy functions cannot take cupy.ndarrays as inputs, and vice versa.
- To convert numpy.ndarray to cupy.ndarray, use cupy.array() or cupy.asarray().
- To convert cupy.ndarray to numpy.ndarray, use cupy.asnumpy() or cupy.ndarray.get().
Note that converting between cupy.ndarray and numpy.ndarray incurs data transfer between
the host (CPU) device and the GPU device, which is costly in terms of performance.
cupy.array |
Creates an array on the current device. |
cupy.asarray |
Converts an object to array. |
cupy.asnumpy |
Returns an array on the host memory from an arbitrary source array. |
Universal Functions (ufunc)¶
CuPy provides universal functions (a.k.a. ufuncs) to support various elementwise operations. CuPy’s ufunc supports the following features of NumPy’s ufunc:
- Broadcasting
- Output type determination
- Casting rules
CuPy’s ufunc currently does not provide methods such as reduce, accumulate, reduceat, outer, and at.
Ufunc class¶
cupy.ufunc |
Universal function. |
Available ufuncs¶
Math operations¶
cupy.add |
Adds two arrays elementwise. |
cupy.subtract |
Subtracts arguments elementwise. |
cupy.multiply |
Multiplies two arrays elementwise. |
cupy.divide |
Elementwise true division (i.e. |
cupy.logaddexp |
Computes log(exp(x1) + exp(x2)) elementwise. |
cupy.logaddexp2 |
Computes log2(exp2(x1) + exp2(x2)) elementwise. |
cupy.true_divide |
Elementwise true division (i.e. |
cupy.floor_divide |
Elementwise floor division (i.e. |
cupy.negative |
Takes numerical negative elementwise. |
cupy.power |
Computes x1 ** x2 elementwise. |
cupy.remainder |
Computes the remainder of Python division elementwise. |
cupy.mod |
Computes the remainder of Python division elementwise. |
cupy.fmod |
Computes the remainder of C division elementwise. |
cupy.absolute |
Elementwise absolute value function. |
cupy.rint |
Rounds each element of an array to the nearest integer. |
cupy.sign |
Elementwise sign function. |
cupy.exp |
Elementwise exponential function. |
cupy.exp2 |
Elementwise exponentiation with base 2. |
cupy.log |
Elementwise natural logarithm function. |
cupy.log2 |
Elementwise binary logarithm function. |
cupy.log10 |
Elementwise common logarithm function. |
cupy.expm1 |
Computes exp(x) - 1 elementwise. |
cupy.log1p |
Computes log(1 + x) elementwise. |
cupy.sqrt |
|
cupy.square |
Elementwise square function. |
cupy.reciprocal |
Computes 1 / x elementwise. |
Trigonometric functions¶
cupy.sin |
Elementwise sine function. |
cupy.cos |
Elementwise cosine function. |
cupy.tan |
Elementwise tangent function. |
cupy.arcsin |
Elementwise inverse-sine function (a.k.a. |
cupy.arccos |
Elementwise inverse-cosine function (a.k.a. |
cupy.arctan |
Elementwise inverse-tangent function (a.k.a. |
cupy.arctan2 |
Elementwise inverse-tangent of the ratio of two arrays. |
cupy.hypot |
Computes the hypotenuse of orthogonal vectors of given length. |
cupy.sinh |
Elementwise hyperbolic sine function. |
cupy.cosh |
Elementwise hyperbolic cosine function. |
cupy.tanh |
Elementwise hyperbolic tangent function. |
cupy.arcsinh |
Elementwise inverse of hyperbolic sine function. |
cupy.arccosh |
Elementwise inverse of hyperbolic cosine function. |
cupy.arctanh |
Elementwise inverse of hyperbolic tangent function. |
cupy.deg2rad |
Converts angles from degrees to radians elementwise. |
cupy.rad2deg |
Converts angles from radians to degrees elementwise. |
Bit-twiddling functions¶
cupy.bitwise_and |
Computes the bitwise AND of two arrays elementwise. |
cupy.bitwise_or |
Computes the bitwise OR of two arrays elementwise. |
cupy.bitwise_xor |
Computes the bitwise XOR of two arrays elementwise. |
cupy.invert |
Computes the bitwise NOT of an array elementwise. |
cupy.left_shift |
Shifts the bits of each integer element to the left. |
cupy.right_shift |
Shifts the bits of each integer element to the right. |
Comparison functions¶
cupy.greater |
Tests elementwise if x1 > x2 . |
cupy.greater_equal |
Tests elementwise if x1 >= x2 . |
cupy.less |
Tests elementwise if x1 < x2 . |
cupy.less_equal |
Tests elementwise if x1 <= x2 . |
cupy.not_equal |
Tests elementwise if x1 != x2 . |
cupy.equal |
Tests elementwise if x1 == x2 . |
cupy.logical_and |
Computes the logical AND of two arrays. |
cupy.logical_or |
Computes the logical OR of two arrays. |
cupy.logical_xor |
Computes the logical XOR of two arrays. |
cupy.logical_not |
Computes the logical NOT of an array. |
cupy.maximum |
Takes the maximum of two arrays elementwise. |
cupy.minimum |
Takes the minimum of two arrays elementwise. |
cupy.fmax |
Takes the maximum of two arrays elementwise. |
cupy.fmin |
Takes the minimum of two arrays elementwise. |
Floating point values¶
cupy.isfinite |
Tests finiteness elementwise. |
cupy.isinf |
Tests if each element is the positive or negative infinity. |
cupy.isnan |
Tests if each element is a NaN. |
cupy.signbit |
Tests elementwise if the sign bit is set (i.e. |
cupy.copysign |
Returns the first argument with the sign bit of the second elementwise. |
cupy.nextafter |
Computes the nearest neighbor float values towards the second argument. |
cupy.modf |
Extracts the fractional and integral parts of an array elementwise. |
cupy.ldexp |
Computes x1 * 2 ** x2 elementwise. |
cupy.frexp |
Decomposes each element to mantissa and two’s exponent. |
cupy.fmod |
Computes the remainder of C division elementwise. |
cupy.floor |
Rounds each element of an array to its floor integer. |
cupy.ceil |
Rounds each element of an array to its ceiling integer. |
cupy.trunc |
Rounds each element of an array towards zero. |
ufunc.at¶
Currently, CuPy does not support at for ufuncs in general.
However, cupyx.scatter_add() can substitute add.at as both behave identically.
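The behavior that add.at provides, and that the text above attributes to cupyx.scatter_add() as well, is unbuffered accumulation: an index that appears several times receives every contribution. The plain-Python sketch below contrasts this with a NumPy-style buffered update of the form a[indices] += values, where a repeated index is written only once. Both function names are hypothetical and operate on lists, not on GPU arrays.

```python
def scatter_add_reference(a, indices, values):
    """Add values[k] to a[indices[k]] in place, accumulating over repeats."""
    for index, value in zip(indices, values):
        a[index] += value
    return a


def buffered_add(a, indices, values):
    """Contrast: read all operands first, then write the results back."""
    # Every sum is computed from the original contents of a.
    updated = {index: a[index] + value for index, value in zip(indices, values)}
    for index, new_value in updated.items():
        a[index] = new_value  # the last write for a repeated index wins
    return a


print(scatter_add_reference([0, 0, 0], [0, 0, 2], [1, 1, 1]))
print(buffered_add([0, 0, 0], [0, 0, 2], [1, 1, 1]))
```

With index 0 listed twice, the first call adds 2 to that element while the buffered variant adds only 1.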
Routines¶
The following pages describe NumPy-compatible routines. These functions cover a subset of NumPy routines.
Array Creation Routines¶
Basic creation routines¶
cupy.empty |
Returns an array without initializing the elements. |
cupy.empty_like |
Returns a new array with same shape and dtype of a given array. |
cupy.eye |
Returns a 2-D array with ones on the diagonals and zeros elsewhere. |
cupy.identity |
Returns a 2-D identity array. |
cupy.ones |
Returns a new array of given shape and dtype, filled with ones. |
cupy.ones_like |
Returns an array of ones with same shape and dtype as a given array. |
cupy.zeros |
Returns a new array of given shape and dtype, filled with zeros. |
cupy.zeros_like |
Returns an array of zeros with same shape and dtype as a given array. |
cupy.full |
Returns a new array of given shape and dtype, filled with a given value. |
cupy.full_like |
Returns a full array with same shape and dtype as a given array. |
Creation from other data¶
cupy.array |
Creates an array on the current device. |
cupy.asarray |
Converts an object to array. |
cupy.asanyarray |
Converts an object to array. |
cupy.ascontiguousarray |
Returns a C-contiguous array. |
cupy.copy |
Creates a copy of a given array on the current device. |
Numerical ranges¶
cupy.arange |
Returns an array with evenly spaced values within a given interval. |
cupy.linspace |
Returns an array with evenly-spaced values within a given interval. |
cupy.logspace |
Returns an array with evenly-spaced values on a log-scale. |
cupy.meshgrid |
Return coordinate matrices from coordinate vectors. |
Matrix creation¶
cupy.diag |
Returns a diagonal or a diagonal array. |
cupy.diagflat |
Creates a diagonal array from the flattened input. |
Array Manipulation Routines¶
Basic manipulations¶
cupy.copyto |
Copies values from one array to another with broadcasting. |
Shape manipulation¶
cupy.reshape |
Returns an array with new shape and same elements. |
cupy.ravel |
Returns a flattened array. |
Transposition¶
cupy.moveaxis |
Moves axes of an array to new positions. |
cupy.rollaxis |
Moves the specified axis backwards to the given place. |
cupy.swapaxes |
Swaps the two axes. |
cupy.transpose |
Permutes the dimensions of an array. |
Edit dimensionalities¶
cupy.atleast_1d |
Converts arrays to arrays with dimensions >= 1. |
cupy.atleast_2d |
Converts arrays to arrays with dimensions >= 2. |
cupy.atleast_3d |
Converts arrays to arrays with dimensions >= 3. |
cupy.broadcast |
Object that performs broadcasting. |
cupy.broadcast_arrays |
Broadcasts given arrays. |
cupy.broadcast_to |
Broadcast an array to a given shape. |
cupy.expand_dims |
Expands given arrays. |
cupy.squeeze |
Removes size-one axes from the shape of an array. |
Changing kind of array¶
cupy.asarray |
Converts an object to array. |
cupy.asanyarray |
Converts an object to array. |
cupy.asfortranarray |
Return an array laid out in Fortran order in memory. |
cupy.ascontiguousarray |
Returns a C-contiguous array. |
Joining arrays along axis¶
cupy.concatenate |
Joins arrays along an axis. |
cupy.stack |
Stacks arrays along a new axis. |
cupy.column_stack |
Stacks 1-D and 2-D arrays as columns into a 2-D array. |
cupy.dstack |
Stacks arrays along the third axis. |
cupy.hstack |
Stacks arrays horizontally. |
cupy.vstack |
Stacks arrays vertically. |
Splitting arrays along axis¶
cupy.split |
Splits an array into multiple sub arrays along a given axis. |
cupy.array_split |
Splits an array into multiple sub arrays along a given axis. |
cupy.dsplit |
Splits an array into multiple sub arrays along the third axis. |
cupy.hsplit |
Splits an array into multiple sub arrays horizontally. |
cupy.vsplit |
Splits an array into multiple sub arrays along the first axis. |
Repeating part of arrays along axis¶
cupy.tile |
Construct an array by repeating A the number of times given by reps. |
cupy.repeat |
Repeat arrays along an axis. |
Rearranging elements¶
cupy.flip |
Reverse the order of elements in an array along the given axis. |
cupy.fliplr |
Flip array in the left/right direction. |
cupy.flipud |
Flip array in the up/down direction. |
cupy.reshape |
Returns an array with new shape and same elements. |
cupy.roll |
Roll array elements along a given axis. |
cupy.rot90 |
Rotate an array by 90 degrees in the plane specified by axes. |
Binary Operations¶
Elementwise bit operations¶
cupy.bitwise_and |
Computes the bitwise AND of two arrays elementwise. |
cupy.bitwise_or |
Computes the bitwise OR of two arrays elementwise. |
cupy.bitwise_xor |
Computes the bitwise XOR of two arrays elementwise. |
cupy.invert |
Computes the bitwise NOT of an array elementwise. |
cupy.left_shift |
Shifts the bits of each integer element to the left. |
cupy.right_shift |
Shifts the bits of each integer element to the right. |
Bit packing¶
cupy.packbits |
Packs the elements of a binary-valued array into bits in a uint8 array. |
cupy.unpackbits |
Unpacks elements of a uint8 array into a binary-valued output array. |
Output formatting¶
cupy.binary_repr |
Return the binary representation of the input number as a string. |
FFT Functions¶
Standard FFTs¶
cupy.fft.fft |
Compute the one-dimensional FFT. |
cupy.fft.ifft |
Compute the one-dimensional inverse FFT. |
cupy.fft.fft2 |
Compute the two-dimensional FFT. |
cupy.fft.ifft2 |
Compute the two-dimensional inverse FFT. |
cupy.fft.fftn |
Compute the N-dimensional FFT. |
cupy.fft.ifftn |
Compute the N-dimensional inverse FFT. |
Real FFTs¶
cupy.fft.rfft |
Compute the one-dimensional FFT for real input. |
cupy.fft.irfft |
Compute the one-dimensional inverse FFT for real input. |
cupy.fft.rfft2 |
Compute the two-dimensional FFT for real input. |
cupy.fft.irfft2 |
Compute the two-dimensional inverse FFT for real input. |
cupy.fft.rfftn |
Compute the N-dimensional FFT for real input. |
cupy.fft.irfftn |
Compute the N-dimensional inverse FFT for real input. |
Hermitian FFTs¶
cupy.fft.hfft |
Compute the FFT of a signal that has Hermitian symmetry. |
cupy.fft.ihfft |
Compute the inverse FFT of a signal that has Hermitian symmetry. |
Helper routines¶
cupy.fft.fftfreq |
Return the FFT sample frequencies. |
cupy.fft.rfftfreq |
Return the FFT sample frequencies for real input. |
cupy.fft.fftshift |
Shift the zero-frequency component to the center of the spectrum. |
cupy.fft.ifftshift |
The inverse of fftshift() . |
Normalization¶
The default normalization leaves the direct transforms unscaled and scales the inverse transforms by \(1/n\).
If the keyword argument norm is "ortho", both transforms will be scaled by \(1/\sqrt{n}\).
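The two scaling conventions can be checked with a naive discrete Fourier transform written against the standard library. This is an O(n^2) sketch for illustration, not how cupy.fft computes its results; the function name `dft` and its `inverse` flag are hypothetical, while the `norm` values follow the rule stated above.

```python
import cmath
import math


def dft(values, inverse=False, norm=None):
    """Naive O(n^2) DFT following the scaling rule described above."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * k * m / n)
            for m, v in enumerate(values))
        for k in range(n)
    ]
    if norm == 'ortho':
        scale = 1 / math.sqrt(n)  # both directions scaled by 1/sqrt(n)
    elif inverse:
        scale = 1 / n  # default: only the inverse is scaled
    else:
        scale = 1.0  # default: forward transform unscaled
    return [scale * c for c in out]


signal = [1.0, 2.0, 0.0, -1.0]
roundtrip_default = dft(dft(signal), inverse=True)
roundtrip_ortho = dft(dft(signal, norm='ortho'), inverse=True, norm='ortho')
dc_default = dft(signal)[0]  # plain sum of the samples
dc_ortho = dft(signal, norm='ortho')[0]  # sum divided by sqrt(n)
print(dc_default, dc_ortho)
```

Under either convention a forward transform followed by its inverse recovers the input; only the scale of the intermediate spectrum differs.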
Code compatibility features¶
FFT functions of NumPy always return a numpy.ndarray whose type is numpy.complex128 or numpy.float64.
CuPy functions do not follow this behavior; they will return numpy.complex64 or numpy.float32 if the type of the input is numpy.float16, numpy.float32, or numpy.complex64.
Indexing Routines¶
cupy.c_ |
Translates slice objects to concatenation along the second axis. |
cupy.r_ |
Translates slice objects to concatenation along the first axis. |
cupy.nonzero |
Return the indices of the elements that are non-zero. |
cupy.where |
Return elements, either from x or y, depending on condition. |
cupy.ix_ |
Construct an open mesh from multiple sequences. |
cupy.take |
Takes elements of an array at specified indices along an axis. |
cupy.choose |
|
cupy.diag |
Returns a diagonal or a diagonal array. |
cupy.diagonal |
Returns specified diagonals. |
cupy.fill_diagonal |
Fills the main diagonal of the given array of any dimensionality. |
Input and Output¶
NPZ files¶
cupy.load |
Loads arrays or pickled objects from .npy , .npz or pickled file. |
cupy.save |
Saves an array to a binary file in .npy format. |
cupy.savez |
Saves one or more arrays into a file in uncompressed .npz format. |
cupy.savez_compressed |
Saves one or more arrays into a file in compressed .npz format. |
String formatting¶
cupy.array_repr |
Returns the string representation of an array. |
cupy.array_str |
Returns the string representation of the content of an array. |
Base-n representations¶
cupy.binary_repr |
Return the binary representation of the input number as a string. |
cupy.base_repr |
Return a string representation of a number in the given base system. |
Linear Algebra¶
Matrix and vector products¶
cupy.dot |
Returns a dot product of two arrays. |
cupy.vdot |
Returns the dot product of two vectors. |
cupy.inner |
Returns the inner product of two arrays. |
cupy.outer |
Returns the outer product of two vectors. |
cupy.matmul |
Returns the matrix product of two arrays and is the implementation of the @ operator introduced in Python 3.5 following PEP465. |
cupy.tensordot |
Returns the tensor dot product of two arrays along specified axes. |
cupy.einsum |
Evaluates the Einstein summation convention on the operands. |
cupy.kron |
Returns the Kronecker product of two arrays. |
Decompositions¶
cupy.linalg.cholesky |
Cholesky decomposition. |
cupy.linalg.qr |
QR decomposition. |
cupy.linalg.svd |
Singular Value Decomposition. |
Matrix eigenvalues¶
cupy.linalg.eigh |
Eigenvalues and eigenvectors of a symmetric matrix. |
cupy.linalg.eigvalsh |
Calculates eigenvalues of a symmetric matrix. |
Norms etc.¶
cupy.linalg.det |
Returns the determinant of an array. |
cupy.linalg.norm |
Returns one of matrix norms specified by ord parameter. |
cupy.linalg.matrix_rank |
Returns the matrix rank of an array using the SVD method. |
cupy.linalg.slogdet |
Returns the sign and logarithm of the determinant of an array. |
cupy.trace |
Returns the sum along the diagonals of an array. |
Solving linear equations¶
cupy.linalg.solve |
Solves a linear matrix equation. |
cupy.linalg.tensorsolve |
Solves tensor equations denoted by ax = b . |
cupy.linalg.inv |
Computes the inverse of a matrix. |
cupy.linalg.pinv |
Compute the Moore-Penrose pseudoinverse of a matrix. |
cupy.linalg.tensorinv |
Computes the inverse of a tensor. |
Logic Functions¶
Truth value testing¶
cupy.all |
Tests whether all array elements along a given axis evaluate to True. |
cupy.any |
Tests whether any array elements along a given axis evaluate to True. |
Infinities and NaNs¶
cupy.isfinite |
Tests finiteness elementwise. |
cupy.isinf |
Tests if each element is the positive or negative infinity. |
cupy.isnan |
Tests if each element is a NaN. |
Array type testing¶
cupy.isscalar |
Returns True if the type of num is a scalar type. |
cupy.iscomplex |
Returns a bool array that is True where the input element is complex. |
cupy.iscomplexobj |
Check for a complex type or an array of complex numbers. |
cupy.isfortran |
Returns True if the array is Fortran contiguous but not C contiguous. |
cupy.isreal |
Returns a bool array that is True where the input element is real. |
cupy.isrealobj |
Return True if x is not a complex type or an array of complex numbers. |
Logic operations¶
cupy.logical_and |
Computes the logical AND of two arrays. |
cupy.logical_or |
Computes the logical OR of two arrays. |
cupy.logical_not |
Computes the logical NOT of an array. |
cupy.logical_xor |
Computes the logical XOR of two arrays. |
Comparison operations¶
cupy.greater |
Tests elementwise if x1 > x2 . |
cupy.greater_equal |
Tests elementwise if x1 >= x2 . |
cupy.less |
Tests elementwise if x1 < x2 . |
cupy.less_equal |
Tests elementwise if x1 <= x2 . |
cupy.equal |
Tests elementwise if x1 == x2 . |
cupy.not_equal |
Tests elementwise if x1 != x2 . |
Mathematical Functions¶
Trigonometric functions¶
cupy.sin |
Elementwise sine function. |
cupy.cos |
Elementwise cosine function. |
cupy.tan |
Elementwise tangent function. |
cupy.arcsin |
Elementwise inverse-sine function (a.k.a. |
cupy.arccos |
Elementwise inverse-cosine function (a.k.a. |
cupy.arctan |
Elementwise inverse-tangent function (a.k.a. |
cupy.hypot |
Computes the hypotenuse of orthogonal vectors of given length. |
cupy.arctan2 |
Elementwise inverse-tangent of the ratio of two arrays. |
cupy.deg2rad |
Converts angles from degrees to radians elementwise. |
cupy.rad2deg |
Converts angles from radians to degrees elementwise. |
cupy.degrees |
Converts angles from radians to degrees elementwise. |
cupy.radians |
Converts angles from degrees to radians elementwise. |
Hyperbolic functions¶
cupy.sinh |
Elementwise hyperbolic sine function. |
cupy.cosh |
Elementwise hyperbolic cosine function. |
cupy.tanh |
Elementwise hyperbolic tangent function. |
cupy.arcsinh |
Elementwise inverse of hyperbolic sine function. |
cupy.arccosh |
Elementwise inverse of hyperbolic cosine function. |
cupy.arctanh |
Elementwise inverse of hyperbolic tangent function. |
Rounding¶
cupy.rint |
Rounds each element of an array to the nearest integer. |
cupy.floor |
Rounds each element of an array to its floor integer. |
cupy.ceil |
Rounds each element of an array to its ceiling integer. |
cupy.trunc |
Rounds each element of an array towards zero. |
cupy.fix |
If given value x is positive, it returns floor(x). |
Sums and products¶
cupy.sum |
Returns the sum of an array along given axes. |
cupy.prod |
Returns the product of an array along given axes. |
cupy.cumsum |
Returns the cumulative sum of an array along a given axis. |
cupy.cumprod |
Returns the cumulative product of an array along a given axis. |
Exponential and logarithm functions¶
cupy.exp |
Elementwise exponential function. |
cupy.expm1 |
Computes exp(x) - 1 elementwise. |
cupy.exp2 |
Elementwise exponentiation with base 2. |
cupy.log |
Elementwise natural logarithm function. |
cupy.log10 |
Elementwise common logarithm function. |
cupy.log2 |
Elementwise binary logarithm function. |
cupy.log1p |
Computes log(1 + x) elementwise. |
cupy.logaddexp |
Computes log(exp(x1) + exp(x2)) elementwise. |
cupy.logaddexp2 |
Computes log2(exp2(x1) + exp2(x2)) elementwise. |
Floating point manipulations¶
cupy.signbit |
Tests elementwise if the sign bit is set (i.e. |
cupy.copysign |
Returns the first argument with the sign bit of the second elementwise. |
cupy.ldexp |
Computes x1 * 2 ** x2 elementwise. |
cupy.frexp |
Decomposes each element to mantissa and two’s exponent. |
cupy.nextafter |
Computes the nearest neighbor float values towards the second argument. |
Arithmetic operations¶
cupy.negative |
Takes numerical negative elementwise. |
cupy.add |
Adds two arrays elementwise. |
cupy.subtract |
Subtracts arguments elementwise. |
cupy.multiply |
Multiplies two arrays elementwise. |
cupy.divide |
Elementwise true division (i.e. |
cupy.true_divide |
Elementwise true division (i.e. |
cupy.floor_divide |
Elementwise floor division (i.e. |
cupy.power |
Computes x1 ** x2 elementwise. |
cupy.fmod |
Computes the remainder of C division elementwise. |
cupy.mod |
Computes the remainder of Python division elementwise. |
cupy.remainder |
Computes the remainder of Python division elementwise. |
cupy.modf |
Extracts the fractional and integral parts of an array elementwise. |
cupy.reciprocal |
Computes 1 / x elementwise. |
Miscellaneous¶
cupy.clip |
Clips the values of an array to a given interval. |
cupy.sqrt |
Elementwise square root function. |
cupy.square |
Elementwise square function. |
cupy.absolute |
Elementwise absolute value function. |
cupy.sign |
Elementwise sign function. |
cupy.maximum |
Takes the maximum of two arrays elementwise. |
cupy.minimum |
Takes the minimum of two arrays elementwise. |
cupy.fmax |
Takes the maximum of two arrays elementwise. |
cupy.fmin |
Takes the minimum of two arrays elementwise. |
cupy.blackman |
Returns the Blackman window. |
cupy.hamming |
Returns the Hamming window. |
cupy.hanning |
Returns the Hanning window. |
Random Sampling (cupy.random)¶
CuPy’s random number generation routines are based on cuRAND.
They cover a small fraction of numpy.random.
The main difference between cupy.random and numpy.random is that cupy.random supports the dtype option for most functions.
This option lets us generate float32 values directly, without any space overhead.
Sample random data¶
cupy.random.choice |
Returns an array of random values from a given 1-D array. |
cupy.random.rand |
Returns an array of uniform random values over the interval [0, 1) . |
cupy.random.randn |
Returns an array of standard normal random values. |
cupy.random.randint |
Returns a scalar or an array of integer values over [low, high) . |
cupy.random.random_integers |
Returns a scalar or an array of integer values over [low, high] |
cupy.random.random_sample |
Returns an array of random values over the interval [0, 1) . |
cupy.random.random |
Returns an array of random values over the interval [0, 1) . |
cupy.random.ranf |
Returns an array of random values over the interval [0, 1) . |
cupy.random.sample |
Returns an array of random values over the interval [0, 1) . |
cupy.random.bytes |
Returns random bytes. |
Distributions¶
cupy.random.gumbel |
Returns an array of samples drawn from a Gumbel distribution. |
cupy.random.lognormal |
Returns an array of samples drawn from a log normal distribution. |
cupy.random.normal |
Returns an array of normally distributed samples. |
cupy.random.standard_normal |
Returns an array of samples drawn from the standard normal distribution. |
cupy.random.uniform |
Returns an array of uniformly-distributed samples over an interval. |
Random number generator¶
cupy.random.seed |
Resets the state of the random number generator with a seed. |
cupy.random.get_random_state |
Gets the state of the random number generator for the current device. |
cupy.random.set_random_state |
Sets the state of the random number generator for the current device. |
cupy.random.RandomState |
Portable container of a pseudo-random number generator. |
Permutations¶
cupy.random.shuffle |
Shuffles an array. |
Sorting, Searching, and Counting¶
cupy.sort |
Returns a sorted copy of an array with a stable sorting algorithm. |
cupy.lexsort |
Perform an indirect sort using an array of keys. |
cupy.argsort |
Returns the indices that would sort an array with a stable sorting. |
cupy.msort |
Returns a copy of an array sorted along the first axis. |
cupy.argmax |
Returns the indices of the maximum along an axis. |
cupy.argmin |
Returns the indices of the minimum along an axis. |
cupy.partition |
Returns a partitioned copy of an array. |
cupy.argpartition |
Returns the indices that would partially sort an array. |
cupy.count_nonzero |
Counts the number of non-zero values in the array. |
cupy.nonzero |
Return the indices of the elements that are non-zero. |
cupy.flatnonzero |
Return indices that are non-zero in the flattened version of a. |
cupy.where |
Return elements, either from x or y, depending on condition. |
Statistics¶
Order statistics¶
cupy.amin |
Returns the minimum of an array or the minimum along an axis. |
cupy.amax |
Returns the maximum of an array or the maximum along an axis. |
cupy.nanmin |
Returns the minimum of an array along an axis ignoring NaN. |
cupy.nanmax |
Returns the maximum of an array along an axis ignoring NaN. |
Means and variances¶
cupy.mean |
Returns the arithmetic mean along an axis. |
cupy.var |
Returns the variance along an axis. |
cupy.std |
Returns the standard deviation along an axis. |
Histograms¶
cupy.bincount |
Count number of occurrences of each value in array of non-negative ints. |
CuPy-specific Functions¶
CuPy-specific functions are placed under the cupyx namespace.
cupyx.rsqrt |
Returns the reciprocal square root. |
cupyx.scatter_add |
Adds given values to specified elements of an array. |
Sparse matrix¶
CuPy supports sparse matrices using cuSPARSE. These matrices have the same interface as SciPy’s sparse matrices.
Sparse matrix classes¶
cupy.sparse.coo_matrix |
COOrdinate format sparse matrix. |
cupy.sparse.csr_matrix |
Compressed Sparse Row matrix. |
cupy.sparse.csc_matrix |
Compressed Sparse Column matrix. |
cupy.sparse.dia_matrix |
Sparse matrix with DIAgonal storage. |
cupy.sparse.spmatrix |
Base class of all sparse matrices. |
Functions¶
Building sparse matrices¶
cupy.sparse.eye |
Creates a sparse matrix with ones on diagonal. |
cupy.sparse.identity |
Creates an identity matrix in sparse format. |
Identifying sparse matrices¶
cupy.sparse.issparse |
Checks if a given matrix is a sparse matrix. |
cupy.sparse.isspmatrix |
Checks if a given matrix is a sparse matrix. |
cupy.sparse.isspmatrix_csc |
Checks if a given matrix is of CSC format. |
cupy.sparse.isspmatrix_csr |
Checks if a given matrix is of CSR format. |
cupy.sparse.isspmatrix_coo |
Checks if a given matrix is of COO format. |
cupy.sparse.isspmatrix_dia |
Checks if a given matrix is of DIA format. |
Linear Algebra¶
cupy.sparse.linalg.lsqr |
Solves linear system with QR decomposition. |
NumPy-CuPy Generic Code Support¶
cupy.get_array_module |
Returns the array module for arguments. |
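As an illustration of generic code, the sketch below writes one function that accepts either a numpy.ndarray or a cupy.ndarray by asking cupy.get_array_module which module to use. The softplus function and the NumPy-only fallback for machines without CuPy are assumptions made for this example.

```python
import numpy as np

try:
    import cupy
    get_array_module = cupy.get_array_module
except ImportError:
    # Assumption for this sketch: without CuPy, every array is a NumPy array.
    def get_array_module(*args):
        return np


def softplus(x):
    """Numerically stable log(1 + exp(x)) for NumPy or CuPy arrays."""
    xp = get_array_module(x)  # numpy for numpy.ndarray, cupy for cupy.ndarray
    return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))


# The same call works unchanged if the argument is a cupy.ndarray.
y = softplus(np.array([-1.0, 0.0, 1.0]))
```

Because the module is looked up from the argument, the caller decides where the computation runs simply by choosing the array type.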
Low-Level CUDA Support¶
Device management¶
cupy.cuda.Device |
Object that represents a CUDA device. |
Memory management¶
cupy.get_default_memory_pool |
Returns CuPy default memory pool for GPU memory. |
cupy.get_default_pinned_memory_pool |
Returns CuPy default memory pool for pinned memory. |
cupy.cuda.Memory |
Memory allocation on a CUDA device. |
cupy.cuda.PinnedMemory |
Pinned memory allocation on host. |
cupy.cuda.MemoryPointer |
Pointer to a point on a device memory. |
cupy.cuda.PinnedMemoryPointer |
Pointer of a pinned memory. |
cupy.cuda.alloc |
Calls the current allocator. |
cupy.cuda.alloc_pinned_memory |
Calls the current allocator. |
cupy.cuda.set_allocator |
Sets the current allocator for GPU memory. |
cupy.cuda.set_pinned_memory_allocator |
Sets the current allocator for the pinned memory. |
cupy.cuda.MemoryPool |
Memory pool for all GPU devices on the host. |
cupy.cuda.PinnedMemoryPool |
Memory pool for pinned memory on the host. |
Memory hook¶
cupy.cuda.MemoryHook |
Base class of hooks for Memory allocations. |
cupy.cuda.memory_hooks.DebugPrintHook |
Memory hook that prints debug information. |
cupy.cuda.memory_hooks.LineProfileHook |
Code line CuPy memory profiler. |
Streams and events¶
cupy.cuda.Stream |
CUDA stream. |
cupy.cuda.get_current_stream |
Gets current CUDA stream. |
cupy.cuda.Event |
CUDA event, a synchronization point of CUDA streams. |
cupy.cuda.get_elapsed_time |
Gets the elapsed time between two events. |
Profiler¶
cupy.cuda.profile |
Enable CUDA profiling during with statement. |
cupy.cuda.profiler.initialize |
Initialize the CUDA profiler. |
cupy.cuda.profiler.start |
Enable profiling. |
cupy.cuda.profiler.stop |
Disable profiling. |
cupy.cuda.nvtx.Mark |
Marks an instantaneous event (marker) in the application. |
cupy.cuda.nvtx.MarkC |
Marks an instantaneous event (marker) in the application. |
cupy.cuda.nvtx.RangePush |
Starts a nested range. |
cupy.cuda.nvtx.RangePushC |
Starts a nested range. |
cupy.cuda.nvtx.RangePop |
Ends a nested range. |
Kernel binary memoization¶
cupy.memoize |
Makes a function memoizing the result for each argument and device. |
cupy.clear_memo |
Clears the memoized results for all functions decorated by memoize. |
Custom kernels¶
cupy.ElementwiseKernel |
User-defined elementwise kernel. |
cupy.ReductionKernel |
User-defined reduction kernel. |
Testing Modules¶
CuPy offers testing utilities to support unit testing.
They are under the cupy.testing namespace.
Standard Assertions¶
The assertions have the same names as their NumPy counterparts.
The difference from NumPy is that they accept both numpy.ndarray and cupy.ndarray.
cupy.testing.assert_allclose |
Raises an AssertionError if objects are not equal up to desired tolerance. |
cupy.testing.assert_array_almost_equal |
Raises an AssertionError if objects are not equal up to desired precision. |
cupy.testing.assert_array_almost_equal_nulp |
Compare two arrays relatively to their spacing. |
cupy.testing.assert_array_max_ulp |
Check that all items of arrays differ in at most N Units in the Last Place. |
cupy.testing.assert_array_equal |
Raises an AssertionError if two array_like objects are not equal. |
cupy.testing.assert_array_list_equal |
Compares lists of arrays pairwise with assert_array_equal . |
cupy.testing.assert_array_less |
Raises an AssertionError if array_like objects are not ordered by less than. |
NumPy-CuPy Consistency Check¶
The following decorators test the consistency between CuPy functions and their NumPy counterparts.
cupy.testing.numpy_cupy_allclose |
Decorator that checks NumPy results and CuPy ones are close. |
cupy.testing.numpy_cupy_array_almost_equal |
Decorator that checks NumPy results and CuPy ones are almost equal. |
cupy.testing.numpy_cupy_array_almost_equal_nulp |
Decorator that checks results of NumPy and CuPy are equal w.r.t. |
cupy.testing.numpy_cupy_array_max_ulp |
Decorator that checks results of NumPy and CuPy ones are equal w.r.t. |
cupy.testing.numpy_cupy_array_equal |
Decorator that checks NumPy results and CuPy ones are equal. |
cupy.testing.numpy_cupy_array_list_equal |
Decorator that checks that the resulting lists of NumPy and CuPy are equal. |
cupy.testing.numpy_cupy_array_less |
Decorator that checks the CuPy result is less than NumPy result. |
cupy.testing.numpy_cupy_raises |
Decorator that checks that NumPy and CuPy throw the same errors. |
Parameterized dtype Test¶
The following decorators offer the standard way to parameterize tests with respect to a single dtype or a combination of dtypes.
cupy.testing.for_dtypes |
Decorator for parameterized dtype test. |
cupy.testing.for_all_dtypes |
Decorator that checks the fixture with all dtypes. |
cupy.testing.for_float_dtypes |
Decorator that checks the fixture with float dtypes. |
cupy.testing.for_signed_dtypes |
Decorator that checks the fixture with signed dtypes. |
cupy.testing.for_unsigned_dtypes |
Decorator that checks the fixture with unsigned dtypes. |
cupy.testing.for_int_dtypes |
Decorator that checks the fixture with integer and optionally bool dtypes. |
cupy.testing.for_complex_dtypes |
Decorator that checks the fixture with complex dtypes. |
cupy.testing.for_dtypes_combination |
Decorator that checks the fixture with a product set of dtypes. |
cupy.testing.for_all_dtypes_combination |
Decorator that checks the fixture with a product set of all dtypes. |
cupy.testing.for_signed_dtypes_combination |
Decorator for parameterized test w.r.t. |
cupy.testing.for_unsigned_dtypes_combination |
Decorator for parameterized test w.r.t. |
cupy.testing.for_int_dtypes_combination |
Decorator for parameterized test w.r.t. |
Parameterized order Test¶
The following decorators offer the standard way to parameterize tests with orders.
cupy.testing.for_orders |
Decorator to parameterize tests with order. |
cupy.testing.for_CF_orders |
Decorator that checks the fixture with orders ‘C’ and ‘F’. |
Profiling¶
time range¶
cupy.prof.TimeRangeDecorator |
Decorator to mark function calls with range in NVIDIA profiler |
cupy.prof.time_range |
A context manager to describe the enclosed block as a nested range |
Environment variables¶
Here are the environment variables CuPy uses.
CUPY_CACHE_DIR |
Path to the directory to store kernel cache.
${HOME}/.cupy/kernel_cache is used by default.
See Overview for details. |
CUPY_CACHE_SAVE_CUDA_SOURCE |
If set to 1, CUDA source file will be saved along with compiled binary in the cache directory for debug purpose. It is disabled by default. Note: source file will not be saved if the compiled binary is already stored in the cache. |
CUPY_DUMP_CUDA_SOURCE_ON_ERROR |
If set to 1, when CUDA kernel compilation fails, CuPy dumps CUDA kernel code to standard error. It is disabled by default. |
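A minimal sketch of configuring these variables from Python follows. The cache location is a hypothetical path chosen for the example, and setting the variables before importing CuPy is a conservative assumption rather than a documented requirement.

```python
import os
import tempfile

# Hypothetical cache location chosen for this example.
cache_dir = os.path.join(tempfile.gettempdir(), "cupy_kernel_cache_example")

# Configure the environment before CuPy compiles its first kernel; doing it
# before "import cupy" is the simplest way to be sure the values are seen.
os.environ["CUPY_CACHE_DIR"] = cache_dir
os.environ["CUPY_CACHE_SAVE_CUDA_SOURCE"] = "1"     # keep CUDA sources next to binaries
os.environ["CUPY_DUMP_CUDA_SOURCE_ON_ERROR"] = "1"  # dump kernel code on compile failure

# import cupy  # import CuPy only after the environment is configured
```

The same variables can of course be exported in the shell instead; the Python form is convenient in test scripts that need an isolated cache.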
For install¶
These environment variables are only used during installation.
CUDA_PATH |
Path to the directory containing CUDA.
The parent of the directory containing nvcc is used as default.
When nvcc is not found, /usr/local/cuda is used.
See Working with Custom CUDA Installation for details. |
NVCC |
Define the compiler to use when compiling CUDA files. |
Difference between CuPy and NumPy¶
The interface of CuPy is designed to obey that of NumPy. However, there are some differences.
Cast behavior from float to integer¶
Some casts from float to integer are not defined by the C++ specification. Casting a negative float to an unsigned integer and casting infinity to an integer are two such examples. The behavior of NumPy depends on your CPU architecture. The NumPy results below were obtained on an Intel CPU.
>>> np.array([-1], dtype=np.float32).astype(np.uint32)
array([4294967295], dtype=uint32)
>>> cupy.array([-1], dtype=np.float32).astype(np.uint32)
array([0], dtype=uint32)
>>> np.array([float('inf')], dtype=np.float32).astype(np.int32)
array([-2147483648], dtype=int32)
>>> cupy.array([float('inf')], dtype=np.float32).astype(np.int32)
array([2147483647], dtype=int32)
Random methods support dtype argument¶
NumPy’s random value generator does not support the dtype option and always returns a float64 value.
CuPy supports the option because cuRAND, which CuPy uses, can generate both float32 and float64 values.
>>> np.random.randn(dtype=np.float32)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: randn() got an unexpected keyword argument 'dtype'
>>> cupy.random.randn(dtype=np.float32)
array(0.10689262300729752, dtype=float32)
Out-of-bounds indices¶
By default, CuPy handles out-of-bounds indices differently from NumPy when using integer array indexing. NumPy raises an error, whereas CuPy wraps the indices around.
>>> x = np.array([0, 1, 2])
>>> x[[1, 3]] = 10
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 1 with size 3
>>> x = cupy.array([0, 1, 2])
>>> x[[1, 3]] = 10
>>> x
array([10, 10, 2])
Duplicate values in indices¶
CuPy’s __setitem__ behaves differently from NumPy when integer arrays reference the same location multiple times.
In that case, the value that is actually stored is undefined.
Here is an example in CuPy.
>>> a = cupy.zeros((2,))
>>> i = cupy.arange(10000) % 2
>>> v = cupy.arange(10000).astype(np.float32)
>>> a[i] = v
>>> a
array([ 9150., 9151.])
NumPy stores the value corresponding to the last element among elements referencing duplicate locations.
>>> a_cpu = np.zeros((2,))
>>> i_cpu = np.arange(10000) % 2
>>> v_cpu = np.arange(10000).astype(np.float32)
>>> a_cpu[i_cpu] = v_cpu
>>> a_cpu
array([9998., 9999.])
Zero-dimensional array¶
Reduction methods¶
NumPy’s reduction functions (e.g. numpy.sum()) return scalar values (e.g. numpy.float32).
However, their CuPy counterparts return zero-dimensional cupy.ndarrays.
That is because CuPy scalar values (e.g. cupy.float32) are aliases of NumPy scalar values and are allocated in CPU memory.
Returning these types would require synchronization between GPU and CPU.
If you want to use scalar values, cast the returned arrays explicitly.
>>> type(np.sum(np.arange(3))) == np.int64
True
>>> type(cupy.sum(cupy.arange(3))) == cupy.core.core.ndarray
True
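The explicit cast mentioned above can be sketched as follows. The NumPy fallback is an assumption added so that the snippet also runs on a machine without CuPy or without a usable GPU; under CuPy, the int() call is the point at which the value is copied to the host.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no CUDA device is usable
except Exception:
    xp = np  # assumption for this sketch: fall back to NumPy without a GPU

total = xp.sum(xp.arange(3))  # zero-dimensional cupy.ndarray under CuPy
total_py = int(total)         # explicit cast to a Python scalar
```

Keeping the result as a zero-dimensional array until a Python scalar is really needed avoids unnecessary GPU-to-CPU synchronization.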
Type promotion¶
CuPy automatically promotes the dtypes of cupy.ndarrays in a function with two or more operands; the result dtype is determined by the dtypes of the inputs.
This differs from NumPy’s type promotion rule when the operands contain zero-dimensional arrays.
Zero-dimensional numpy.ndarrays are treated as if they were scalar values when they appear as operands of a NumPy function.
This may affect the dtype of the output, depending on the values of the “scalar” inputs.
>>> (np.array(3, dtype=np.int32) * np.array([1., 2.], dtype=np.float32)).dtype
dtype('float32')
>>> (np.array(300000, dtype=np.int32) * np.array([1., 2.], dtype=np.float32)).dtype
dtype('float64')
>>> (cupy.array(3, dtype=np.int32) * cupy.array([1., 2.], dtype=np.float32)).dtype
dtype('float64')
Data types¶
CuPy arrays cannot have non-numeric data types such as strings and objects. See Overview for details.
Array creation from Python objects¶
Currently, cupy.array() and cupy.asarray() cannot create an array from a Python object that contains CuPy arrays (e.g., a list of CuPy arrays).
Use cupy.stack() instead.
>>> data_cpu = [np.arange(10), np.arange(10)]
>>> np.asarray(data_cpu)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> data_gpu = [cupy.arange(10), cupy.arange(10)]
>>> cupy.asarray(data_gpu)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Unsupported dtype object
>>> cupy.stack(data_gpu)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
Universal Functions only work with CuPy array or scalar¶
Unlike NumPy, universal functions in CuPy only work with CuPy arrays or scalars.
They do not accept other objects (e.g., lists or numpy.ndarray).
>>> np.power([np.arange(5)], 2)
array([[ 0, 1, 4, 9, 16]])
>>> cupy.power([cupy.arange(5)], 2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Unsupported type <class 'list'>
API Compatibility Policy¶
This document expresses the design policy on the compatibility of CuPy APIs. The development team should obey this policy when deciding to add, extend, or change APIs and their behaviors.
This document is written for both users and developers. Users can decide how far their code should depend on CuPy’s implementation based on this document. Developers should read through this document before creating pull requests that contain changes to the interface. Note that this document may contain ambiguities on the level of supported compatibility.
Versioning and Backward Compatibilities¶
The updates of CuPy are classified into three levels: major, minor, and revision. These types have distinct levels of backward compatibilities.
- Major update contains disruptive changes that break the backward compatibility.
- Minor update contains addition and extension to the APIs keeping the supported backward compatibility.
- Revision update contains improvements on the API implementations without changing any API specifications.
Note that we do not support full backward compatibility, which is almost infeasible for Python-based APIs, since there is no way to completely hide the implementation details.
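The three update levels can be sketched as a small helper that classifies a version change. It assumes plain "x.y.z" version strings with integer components; the function names and the version numbers in the usage lines are hypothetical.

```python
def classify_update(old, new):
    """Return 'major', 'minor', or 'revision' for a change between x.y.z versions."""
    old_parts = tuple(int(p) for p in old.split("."))
    new_parts = tuple(int(p) for p in new.split("."))
    if new_parts <= old_parts:
        raise ValueError("new version must be greater than old version")
    if new_parts[0] != old_parts[0]:
        return "major"      # may break backward compatibility
    if new_parts[1] != old_parts[1]:
        return "minor"      # adds or extends APIs, keeps supported compatibility
    return "revision"       # implementation improvements only


def may_break_documented_api(old, new):
    """Under this policy, only major updates may break documented behavior."""
    return classify_update(old, new) == "major"


kind = classify_update("1.2.3", "1.3.0")
risky = may_break_documented_api("1.2.3", "2.0.0")
```

A user could apply such a check before upgrading, treating anything other than a major update as safe for code that relies only on documented features.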
Processes to Break Backward Compatibilities¶
Deprecation, Dropping, and Its Preparation¶
Any APIs may be deprecated at some minor updates. In such a case, the deprecation note is added to the API documentation, and the API implementation is changed to fire deprecation warning (if possible). There should be another way to reimplement the same things previously written with the deprecated APIs.
Any APIs may be marked as to be dropped in the future. In such a case, the dropping is stated in the documentation with the major version number on which the API is planned to be dropped, and the API implementation is changed to fire the future warning (if possible).
The actual dropping should be done through the following steps:
- Make the API deprecated. At this point, users should not need the deprecated API in their new application codes.
- After that, mark the API as to be dropped in the future. It must be done in the minor update different from that of the deprecation.
- At the major version announced in the above update, drop the API.
Consequently, it takes at least two minor versions to drop any APIs after the first deprecation.
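The first step of this process, deprecating an API while keeping it functional, might look like the sketch below. Both old_api and new_api are hypothetical names; only the standard warnings module is used.

```python
import warnings


def new_api(x):
    """Replacement for old_api (hypothetical)."""
    return x * 2


def old_api(x):
    """Deprecated: use new_api instead (hypothetical function)."""
    warnings.warn(
        "old_api is deprecated; use new_api instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_api(x)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = old_api(21)
```

In the second step, the same pattern applies with FutureWarning and a message that names the major version in which the API is planned to be dropped.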
API Changes and Its Preparation¶
Any API may be marked as to be changed in the future for changes that break backward compatibility. In such a case, the change is stated in the documentation with the version number in which the API is planned to be changed, and the API implementation is changed to fire a future warning for the affected usages.
The actual change should be done in the following steps:
- Announce that the API will be changed in the future. At this point, the actual version of change need not be accurate.
- After the announcement, mark the API as to be changed in the future with version number of planned changes. At this point, users should not use the marked API in their new application codes.
- At the major update announced in the above update, change the API.
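One way to fire a future warning only for the affected usages is a sentinel default, sketched below. The function total, its keepdims parameter, and the planned change are all hypothetical; the warning is raised only when the caller relies on the default that is going to change.

```python
import warnings

_UNSET = object()  # sentinel: detects that the caller did not pass keepdims


def total(values, keepdims=_UNSET):
    """Hypothetical API whose default for keepdims is planned to change."""
    if keepdims is _UNSET:
        warnings.warn(
            "The default of keepdims will change from False to True in the "
            "next major version; pass keepdims explicitly.",
            FutureWarning,
            stacklevel=2,
        )
        keepdims = False  # current behavior stays in effect until the change
    s = sum(values)
    return [s] if keepdims else s
```

Callers who pass keepdims explicitly see no warning and are unaffected when the default eventually changes.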
Supported Backward Compatibility¶
This section defines backward compatibilities that minor updates must maintain.
Documented Interface¶
CuPy has the official API documentation. Many applications can be written based on the documented features. We support backward compatibilities of documented features. In other words, codes only based on the documented features run correctly with minor/revision-updated versions.
Developers are encouraged to use apparent names for objects of implementation details. For example, attributes outside of the documented APIs should have one or more underscores at the prefix of their names.
Undocumented behaviors¶
Behaviors of CuPy implementation not stated in the documentation are undefined. Undocumented behaviors are not guaranteed to be stable between different minor/revision versions.
Minor update may contain changes to undocumented behaviors. For example, suppose an API X is added at the minor update. In the previous version, attempts to use X cause AttributeError. This behavior is not stated in the documentation, so this is undefined. Thus, adding the API X in minor version is permissible.
Revision update may also contain changes to undefined behaviors. Typical example is a bug fix. Another example is an improvement on implementation, which may change the internal object structures not shown in the documentation. As a consequence, even revision updates do not support compatibility of pickling, unless the full layout of pickled objects is clearly documented.
Documentation Error¶
Compatibility is basically determined based on the documentation, though it sometimes contains errors. It may make the APIs confusing to assume the documentation always stronger than the implementations. We therefore may fix the documentation errors in any updates that may break the compatibility in regard to the documentation.
Note
Developers MUST NOT fix the documentation and implementation of the same functionality at the same time in revision updates as “bug fix”. Such a change completely breaks the backward compatibility. If you want to fix the bugs in both sides, first fix the documentation to fit it into the implementation, and start the API changing procedure described above.
Object Attributes and Properties¶
Object attributes and properties are sometimes replaced by each other in minor updates. This does not break user code, unless the code depends on how the attributes and properties are implemented.
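The following sketch shows what such a replacement looks like from the user's side. ArrayV1 and ArrayV2 are hypothetical classes standing for the same API before and after a minor update.

```python
class ArrayV1:
    """Hypothetical earlier version: size is a plain attribute."""

    def __init__(self, shape):
        self.shape = shape
        self.size = shape[0] * shape[1]


class ArrayV2:
    """Hypothetical later version: size became a read-only property."""

    def __init__(self, shape):
        self.shape = shape

    @property
    def size(self):
        return self.shape[0] * self.shape[1]


# Documented usage: identical on both versions.
sizes = [cls((2, 3)).size for cls in (ArrayV1, ArrayV2)]

# Implementation-dependent usage: differs between versions.
in_dict_v1 = "size" in vars(ArrayV1((2, 3)))  # stored on the instance
in_dict_v2 = "size" in vars(ArrayV2((2, 3)))  # computed by the class
```

Code that reads obj.size keeps working, while code that inspects the instance dictionary sees the difference and is therefore outside the compatibility guarantee.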
Functions and Methods¶
Methods may be replaced by callable attributes in minor updates, keeping the compatibility of parameters and return values. This does not break user code, unless the code depends on how the methods and callable attributes are implemented.
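A short sketch of the same idea for methods follows. ApiV1, ApiV2, and _Doubler are hypothetical; the point is that calling the API gives the same result, while introspection reveals the implementation change.

```python
import inspect


class _Doubler:
    """Hypothetical callable object used in place of a method."""

    def __call__(self, x):
        return 2 * x


class ApiV1:
    def double(self, x):  # ordinary method
        return 2 * x


class ApiV2:
    def __init__(self):
        self.double = _Doubler()  # callable attribute, same parameters and result


# Documented usage: same call, same result.
results = [api.double(4) for api in (ApiV1(), ApiV2())]

# Implementation-dependent usage: differs between versions.
is_method_v1 = inspect.ismethod(ApiV1().double)
is_method_v2 = inspect.ismethod(ApiV2().double)
```

User code that only calls api.double(...) is unaffected; code that checks inspect.ismethod or similar details depends on the implementation.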
Exceptions and Warnings¶
The specifications of raising exceptions are considered as a part of standard backward compatibilities. No exception is raised in the future versions with correct usages that the documentation allows, unless the API changing process is completed.
On the other hand, warnings may be added at any minor updates for any APIs. It means minor updates do not keep backward compatibility of warnings.
Installation Compatibility¶
The installation process is another concern of compatibilities. We support environmental compatibilities in the following ways.
- Any changes of dependent libraries that force modifications on the existing environments must be done in major updates.
Such changes include following cases:
- dropping supported versions of dependent libraries (e.g. dropping cuDNN v2)
- adding new mandatory dependencies (e.g. adding h5py to setup_requires)
- Supporting optional packages/libraries may be done in minor updates (e.g. supporting h5py in optional features).
Note
The installation compatibility does not guarantee that all the features of CuPy run correctly on supported environments. It may contain bugs that only occur in certain environments. Such bugs should be fixed in some updates.
Contribution Guide¶
This is a guide for all contributions to CuPy. The development of CuPy is running on the official repository at GitHub. Anyone that wants to register an issue or to send a pull request should read through this document.
Classification of Contributions¶
There are several ways to contribute to CuPy community:
- Registering an issue
- Sending a pull request (PR)
- Sending a question to CuPy User Group
- Writing a post about CuPy
This document mainly focuses on the first two (issues and pull requests), though other contributions are also appreciated.
Release and Milestone¶
We are using GitHub Flow as our basic working process. In particular, we are using the master branch for our development, and releases are made as tags.
Releases are classified into three groups: major, minor, and revision. This classification is based on following criteria:
- Major update contains disruptive changes that break the backward compatibility.
- Minor update contains additions and extensions to the APIs keeping the supported backward compatibility.
- Revision update contains improvements on the API implementations without changing any API specification.
The release classification is reflected in the version number x.y.z, where x, y, and z correspond to major, minor, and revision updates, respectively.
We set a milestone for an upcoming release. The milestone is of name ‘vX.Y.Z’, where the version number represents a revision release at the outset. If at least one feature PR is merged in the period, we rename the milestone to represent a minor release (see the next section for the PR types).
See also API Compatibility Policy.
Issues and PRs¶
Issues and PRs are classified into following categories:
- Bug: bug reports (issues) and bug fixes (PRs)
- Enhancement: implementation improvements without breaking the interface
- Feature: feature requests (issues) and their implementations (PRs)
- NoCompat: disrupts backward compatibility
- Test: test fixes and updates
- Document: document fixes and improvements
- Example: fixes and improvements on the examples
- Install: fixes installation script
- Contribution-Welcome: issues that we request for contribution (only issues are categorized to this)
- Other: other issues and PRs
Issues and PRs are labeled by these categories. This classification is often reflected into its corresponding release category: Feature issues/PRs are contained into minor/major releases and NoCompat issues/PRs are contained into major releases, while other issues/PRs can be contained into any releases including revision ones.
When registering an issue, write a precise explanation of what you want CuPy to be. Bug reports must include necessary and sufficient conditions to reproduce the bugs. Feature requests must include what you want to do (and why you want to do it, if needed). You can include your thoughts on how to realize it in the feature request, though the "what" part is the most important for discussion.
Warning
If you have a question on usages of CuPy, it is highly recommended to send a post to CuPy User Group instead of the issue tracker. The issue tracker is not a place to share knowledge on practices. We may redirect question issues to CuPy User Group.
If you can write code to fix an issue, send a PR to the master branch. Before writing your code for PRs, read through the Coding Guidelines. The description of any PR must contain a precise explanation of what and how you want to do; it is the first documentation of your code for developers, a very important part of your PR.
Once you send a PR, it is automatically tested on Travis CI for Linux and Mac OS X, and on AppVeyor for Windows. Your PR needs to pass at least the test for Linux on Travis CI. After the automatic test passes, some of the core developers will start reviewing your code. Note that this automatic PR test only includes CPU tests.
Note
We are also running continuous integration with GPU tests for the master branch. Since this service is running on our internal server, we do not use it for automatic PR tests to keep the server secure.
Even if your code is not complete, you can send a pull request as a work-in-progress PR by adding the [WIP] prefix to the PR title.
If you write a precise explanation about the PR, core developers and other contributors can join the discussion about how to proceed with the PR.
Coding Guidelines¶
We use PEP8 and a part of the OpenStack Style Guidelines related to general coding style as our basic style guidelines.
To check your code, use the autopep8 command and the flake8 command installed by the hacking package:
$ pip install autopep8 hacking
$ autopep8 --global-config .pep8 path/to/your/code.py
$ flake8 path/to/your/code.py
To check Cython code, use the .flake8.cython configuration file:
$ flake8 --config=.flake8.cython path/to/your/cython/code.pyx
autopep8 can automatically correct Python code to conform to the PEP 8 style guide:
$ autopep8 --in-place --global-config .pep8 path/to/your/code.py
The flake8 command lets you know which parts of your code do not obey our style guidelines.
Before sending a pull request, be sure to check that your code passes the flake8 check.
Note that the flake8 command is not perfect.
It does not check some of the style guidelines.
Here is a (non-exhaustive) list of the rules that flake8 cannot check.
- Relative imports are prohibited. [H304]
- Importing non-module symbols is prohibited.
- Import statements must be organized into three parts: standard libraries, third-party libraries, and internal imports. [H306]
In addition, we restrict the usage of shortcut symbols in our code base.
They are symbols imported by packages and sub-packages of cupy.
For example, cupy.cuda.Device is a shortcut of cupy.cuda.device.Device.
It is not allowed to use such shortcuts in the cupy library implementation.
Note that you can still use them in tests and examples directories.
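As a rough illustration of the import rules above, the following sketch groups imports into the three required parts and imports modules rather than bare symbols. The function count_extensions and its inputs are hypothetical and exist only to make the snippet runnable; the commented-out lines show where third-party and internal imports would go.

```python
# Part 1: standard libraries. Import modules, not symbols
# (so not ``from collections import Counter``).
import collections
import os.path

# Part 2: third-party libraries would follow, for example:
# import numpy

# Part 3: internal imports come last. Inside the cupy package, use the
# full module path rather than a shortcut, for example:
# from cupy.cuda import device   # then refer to device.Device
# (cupy.cuda.Device is a shortcut and is reserved for tests and examples)


def count_extensions(paths):
    """Count file extensions; hypothetical helper used only for illustration."""
    extensions = (os.path.splitext(path)[1] for path in paths)
    return collections.Counter(extensions)


print(count_extensions(['a.py', 'b.py', 'c.pyx']))
```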
Once you send a pull request, your coding style is automatically checked by Travis-CI. The reviewing process starts after the check passes.
CuPy is designed based on NumPy’s API design. CuPy’s source code and documents contain the original NumPy ones. Please note the following when writing documentation.
- In order to identify overlapping parts, it is preferable to add a remark that the document is copied or altered from the original one. It is also preferable to briefly explain the specification of the function in a short paragraph, and refer to the corresponding function in NumPy so that users can read the detailed document. However, it is possible to include a complete copy of the document with such a remark if you cannot summarize it in such a way.
- If a function in CuPy implements only a limited subset of the features of the original one, you should explicitly describe only what is implemented in the document.
Testing Guidelines¶
Testing is one of the most important parts of your code. You must test your code with unit tests following our testing guidelines.
Note that we are using the pytest and mock packages for testing, so install them before writing your code:
$ pip install pytest mock
In order to run unit tests at the repository root, you first have to build Cython files in place by running the following command:
$ pip install -e .
Note
When you modify *.pxd files, before running pip install -e . you must clean *.cpp and *.so files once with the following command, because Cython does not automatically rebuild those files nicely:
$ git clean -fdx
Note
It’s not officially supported, but you can use ccache to reduce compilation time. On Ubuntu 16.04, you can set it up as follows:
$ sudo apt-get install ccache
$ export PATH=/usr/lib/ccache:$PATH
See ccache for details.
If you want to use ccache for nvcc, please install ccache v3.3 or later.
You also need to set the environment variable NVCC='ccache nvcc'.
Once Cython modules are built, you can run unit tests by running the following command at the repository root:
$ python -m pytest
CUDA must be installed to run unit tests.
Some GPU tests require cuDNN to run.
In order to skip unit tests that require cuDNN, specify the -m='not cudnn' option:
$ python -m pytest path/to/your/test.py -m='not cudnn'
Some GPU tests involve multiple GPUs.
If you want to run GPU tests with an insufficient number of GPUs, set CUPY_TEST_GPU_LIMIT to the number of available GPUs.
For example, if you have only one GPU, launch pytest with the following commands to skip multi-GPU tests:
$ export CUPY_TEST_GPU_LIMIT=1
$ python -m pytest path/to/gpu/test.py
Tests are put into the tests/cupy_tests and tests/install_tests directories. These have the same structure as the cupy and install directories, respectively. In order to enable the test runner to find test scripts correctly, we use a special naming convention for the test subdirectories and the test scripts.
- The name of each subdirectory of tests must end with the _tests suffix.
- The name of each test script must start with the test_ prefix.
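The two naming rules can be expressed as a small check. The sketch below is only an illustration: follows_naming_convention is a hypothetical helper (not part of CuPy), the example paths are made up, and paths are assumed to be given relative to the tests directory with / separators.

```python
def follows_naming_convention(relative_path):
    """Check a test script path given relative to the ``tests`` directory.

    Hypothetical helper for illustration only; it is not part of CuPy.
    """
    # Everything before the last component is a subdirectory of ``tests``.
    *subdirs, script = relative_path.split('/')
    dirs_ok = all(name.endswith('_tests') for name in subdirs)
    return dirs_ok and script.startswith('test_')


print(follows_naming_convention('cupy_tests/core_tests/test_ndarray.py'))  # True
print(follows_naming_convention('cupy_tests/core/test_ndarray.py'))        # False
print(follows_naming_convention('cupy_tests/core_tests/ndarray_test.py'))  # False
```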
Following this naming convention, you can run all the tests by running the following command at the repository root:
$ python -m pytest
Or you can also specify a root directory to search test scripts from:
$ python -m pytest tests/cupy_tests # to just run tests of CuPy
$ python -m pytest tests/install_tests # to just run tests of installation modules
If you modify the code related to existing unit tests, you must run appropriate commands.
There are many examples of unit tests under the tests directory.
They simply use the unittest package of the standard library.
Even if your patch includes GPU-related code, your tests should not fail without GPU capability.
Test functions that require CUDA must be tagged by the cupy.testing.attr.gpu decorator:
import unittest

from cupy.testing import attr


class TestMyFunc(unittest.TestCase):
    ...

    @attr.gpu
    def test_my_gpu_func(self):
        ...
The functions tagged by the gpu decorator are skipped if the CUPY_TEST_GPU_LIMIT=0 environment variable is set.
We also have the cupy.testing.attr.cudnn decorator to let pytest know that the test depends on cuDNN.
The test functions decorated by cudnn are skipped if -m='not cudnn' is given.
The test functions decorated by gpu must not depend on multiple GPUs.
In order to write tests for multiple GPUs, use the cupy.testing.attr.multi_gpu() decorator instead:
import unittest

from cupy.testing import attr


class TestMyFunc(unittest.TestCase):
    ...

    @attr.multi_gpu(2)  # specify the number of required GPUs here
    def test_my_two_gpu_func(self):
        ...
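To make the skipping behaviour more concrete, here is a minimal sketch of how a decorator could honour CUPY_TEST_GPU_LIMIT using only the standard library. It is not CuPy's implementation: requires_gpus is a hypothetical stand-in for the attr decorators, and treating an unset variable as "no limit" is an assumption of this sketch.

```python
import os
import unittest


def requires_gpus(count):
    """Skip the decorated test when fewer than ``count`` GPUs are allowed.

    Hypothetical stand-in for the attr decorators; not CuPy's implementation.
    """
    # Assumption of this sketch: an unset variable means "no limit".
    limit = os.environ.get('CUPY_TEST_GPU_LIMIT')
    too_few = limit is not None and int(limit) < count
    return unittest.skipIf(
        too_few, 'requires at least {} GPU(s)'.format(count))


class TestMyFunc(unittest.TestCase):

    @requires_gpus(1)  # plays the role of the gpu decorator
    def test_my_gpu_func(self):
        ...

    @requires_gpus(2)  # plays the role of multi_gpu(2)
    def test_my_two_gpu_func(self):
        ...
```

Note that the limit is read when the decorator is applied, so in this sketch the environment variable has to be set before the test module is imported.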
Once you send a pull request, Travis-CI automatically checks if your code meets our coding guidelines described above. Since Travis-CI does not support CUDA, we cannot run unit tests automatically. The reviewing process starts after the automatic check passes. Note that reviewers will test your code without the option to check CUDA-related code.
We leverage doctest as well. You can run doctest by typing make doctest in the docs directory:
$ cd docs
$ make doctest
Installation Guide¶
Recommended Environments¶
We recommend the following Linux distributions.
Note
We are automatically testing CuPy on all the recommended environments above. We cannot guarantee that CuPy works on other environments including Windows and macOS, even if CuPy may seem to be running correctly.
Requirements¶
You need to have the following components to use CuPy.
- NVIDIA CUDA GPU
- Compute Capability of the GPU must be at least 3.0.
- CUDA Toolkit
- Supported Versions: 7.0, 7.5, 8.0, 9.0, 9.1 and 9.2.
- If you have multiple versions of CUDA Toolkit installed, CuPy will choose one of the CUDA installations automatically. See Working with Custom CUDA Installation for details.
- Python
- Supported Versions: 2.7.6+, 3.4.3+, 3.5.1+ and 3.6.0+.
- NumPy
- Supported Versions: 1.9, 1.10, 1.11, 1.12, 1.13 and 1.14.
- NumPy will be installed automatically during the installation of CuPy.
Before installing CuPy, we recommend upgrading setuptools and pip:
$ pip install -U setuptools pip
Optional Libraries¶
Some features in CuPy will only be enabled if the corresponding libraries are installed.
Install CuPy¶
Wheels (precompiled binary packages) are available for the recommended environments above. Package names are different depending on the CUDA version you have installed on your host.
(For CUDA 8.0)
$ pip install cupy-cuda80
(For CUDA 9.0)
$ pip install cupy-cuda90
(For CUDA 9.1)
$ pip install cupy-cuda91
(For CUDA 9.2)
$ pip install cupy-cuda92
Note
The latest versions of the cuDNN and NCCL libraries are included in these wheels. You don’t have to install them manually.
When using wheels, please be careful not to install multiple CuPy packages at the same time.
Any of these packages and the cupy package (source installation) conflict with each other.
Please make sure that only one CuPy package (cupy or cupy-cudaXX, where XX is a CUDA version) is installed:
$ pip freeze | grep cupy
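The same check can be scripted. The following is a minimal sketch that assumes Python 3.8 or later (for importlib.metadata); the helper names and the name pattern are illustrative assumptions, not part of CuPy or pip.

```python
import re
from importlib import metadata  # Python 3.8 or later

# Matches "cupy" and wheel names such as "cupy-cuda92" (assumed pattern).
_CUPY_NAME = re.compile(r'^cupy([-_]cuda\w+)?$', re.IGNORECASE)


def filter_cupy_packages(names):
    """Return the sorted names that look like CuPy packages."""
    return sorted(name for name in names if _CUPY_NAME.match(name))


def installed_cupy_packages():
    """List the CuPy packages installed in the current environment."""
    names = (dist.metadata.get('Name') for dist in metadata.distributions())
    return filter_cupy_packages(name for name in names if name)


if __name__ == '__main__':
    packages = installed_cupy_packages()
    if len(packages) > 1:
        print('Conflicting CuPy packages:', ', '.join(packages))
    else:
        print('Installed CuPy packages:', packages)
```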
Install CuPy from Source¶
It is recommended to use wheels whenever possible. However, if wheels cannot meet your requirements (e.g., you are running a non-Linux environment or want to use a version of CUDA / cuDNN / NCCL not supported by wheels), you can also build CuPy from source.
When installing from source, a C++ compiler such as g++ is required.
You need to install it before installing CuPy.
This is a typical installation method for each platform:
# Ubuntu 14.04
$ apt-get install g++
# CentOS 7
$ yum install gcc-c++
Note
When installing CuPy from source, features provided by optional libraries (cuDNN and NCCL) will be disabled if these libraries are not available at the time of installation. See Installing cuDNN and NCCL for the instructions.
Note
If you upgrade or downgrade the version of CUDA Toolkit, cuDNN or NCCL, you may need to reinstall CuPy. See Reinstall CuPy for details.
Using Tarball¶
The tarball of the source tree is available via pip download cupy
or from the release notes page.
You can install CuPy from the tarball:
$ pip install cupy-x.x.x.tar.gz
You can also install the development version of CuPy from a cloned Git repository:
$ git clone https://github.com/cupy/cupy.git
$ cd cupy
$ pip install .
If you are using a source tree downloaded from GitHub, you need to install Cython 0.26.1 or later (pip install cython).
Uninstall CuPy¶
Use pip to uninstall CuPy:
$ pip uninstall cupy
Note
When you upgrade CuPy, pip sometimes installs the new version without removing the old one in site-packages.
In this case, pip uninstall only removes the latest one.
To ensure that CuPy is completely removed, run the above command repeatedly until pip returns an error.
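The repeated uninstall can be automated. The sketch below is a hedged variation of the advice above: instead of waiting for pip uninstall to fail, it asks pip show whether the package is still present, on the assumption that pip show exits with a non-zero status for a missing package. The function names and the max_attempts safety limit are hypothetical.

```python
import subprocess
import sys


def _pip(*args):
    """Run ``python -m pip`` with the given arguments and return its exit code."""
    return subprocess.call([sys.executable, '-m', 'pip'] + list(args))


def uninstall_all_copies(package, run_pip=_pip, max_attempts=10):
    """Uninstall ``package`` until ``pip show`` no longer finds it.

    Returns the number of copies that were removed.
    """
    removed = 0
    while removed < max_attempts and run_pip('show', package) == 0:
        if run_pip('uninstall', '-y', package) != 0:
            break  # pip reported an error; stop instead of looping forever
        removed += 1
    return removed
```

Calling uninstall_all_copies('cupy') would then remove every installed copy; the run_pip parameter exists only so that the loop can be exercised without touching a real environment.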
Note
If you are using a wheel, replace cupy with cupy-cudaXX (where XX is a CUDA version number).
Upgrade CuPy¶
Just use pip install with the -U option:
$ pip install -U cupy
Note
If you are using a wheel, replace cupy with cupy-cudaXX (where XX is a CUDA version number).
Reinstall CuPy¶
If you want to reinstall CuPy, please uninstall CuPy and then install it.
When reinstalling CuPy, we recommend using the --no-cache-dir option, as pip caches the previously built binaries:
$ pip uninstall cupy
$ pip install cupy --no-cache-dir
Note
If you are using a wheel, replace cupy with cupy-cudaXX (where XX is a CUDA version number).
Run CuPy with Docker¶
We provide an official Docker image. Use the nvidia-docker command to run the CuPy image with GPU support. You can log in to the environment with bash, and run the Python interpreter:
$ nvidia-docker run -it cupy/cupy /bin/bash
Or run the interpreter directly:
$ nvidia-docker run -it cupy/cupy /usr/bin/python
FAQ¶
Warning message “cuDNN is not enabled” appears when using Chainer¶
This message means that CuPy was built without cuDNN. If you don’t need cuDNN, ignore this message. Otherwise, try reinstalling CuPy with cuDNN.
See Installing cuDNN and NCCL and pip fails to install CuPy for details.
pip fails to install CuPy¶
Please make sure that you are using the latest setuptools and pip:
$ pip install -U setuptools pip
Use the -vvvv option with the pip command.
This will display all logs of the installation:
$ pip install cupy -vvvv
If you are using sudo to install CuPy, note that the sudo command does not propagate environment variables.
If you need to pass environment variables (e.g., CUDA_PATH), you need to specify them inside sudo like this:
$ sudo CUDA_PATH=/opt/nvidia/cuda pip install cupy
If you are using certain versions of conda, it may fail to build CuPy with the error g++: error: unrecognized command line option ‘-R’.
This is due to a bug in conda (see conda/conda#6030 for details).
If you encounter this problem, please downgrade or upgrade conda.
Installing cuDNN and NCCL¶
We recommend installing cuDNN and NCCL using binary packages (i.e., using apt or yum) provided by NVIDIA.
If you want to install the tar-gz version of cuDNN and NCCL, we recommend installing it under the CUDA directory.
For example, if you are using Ubuntu, copy *.h files to the include directory and *.so* files to the lib64 directory:
$ cp /path/to/cudnn.h $CUDA_PATH/include
$ cp /path/to/libcudnn.so* $CUDA_PATH/lib64
The destination directories depend on your environment.
If you want to use cuDNN or NCCL installed in another directory, please set the CFLAGS, LDFLAGS and LD_LIBRARY_PATH environment variables before installing CuPy:
export CFLAGS=-I/path/to/cudnn/include
export LDFLAGS=-L/path/to/cudnn/lib
export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH
Note
Use full paths for the environment variables.
distutils, which is used in the setup script, does not expand the home directory mark ~.
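If your cuDNN location is written with ~, you can expand it yourself before exporting the variables. The sketch below is one possible way to do that; cudnn_exports is a hypothetical helper, ~/cudnn is an example location, and the include and lib subdirectories follow the export lines shown above.

```python
import os


def cudnn_exports(cudnn_root):
    """Build export lines with the home directory mark expanded.

    Hypothetical helper; ``include`` and ``lib`` follow the layout above.
    """
    root = os.path.abspath(os.path.expanduser(cudnn_root))
    include_dir = os.path.join(root, 'include')
    lib_dir = os.path.join(root, 'lib')
    return [
        'export CFLAGS=-I' + include_dir,
        'export LDFLAGS=-L' + lib_dir,
        'export LD_LIBRARY_PATH=' + lib_dir + ':$LD_LIBRARY_PATH',
    ]


for line in cudnn_exports('~/cudnn'):
    print(line)
```

The printed lines can be pasted into a shell; paths containing spaces would need additional quoting, which this sketch does not handle.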
Working with Custom CUDA Installation¶
If you have installed CUDA in a non-default directory or have multiple CUDA versions installed, you may need to manually specify the CUDA installation directory to be used by CuPy.
CuPy uses the first CUDA installation directory found in the following order.
- CUDA_PATH environment variable.
- The parent directory of the nvcc command. CuPy looks for the nvcc command in each directory set in the PATH environment variable.
- /usr/local/cuda
For example, you can tell CuPy to use a non-default CUDA directory by setting the CUDA_PATH environment variable:
$ CUDA_PATH=/opt/nvidia/cuda pip install cupy
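The search order described above can be sketched as follows. This is not CuPy's actual code: find_cuda_path is a hypothetical function, and it assumes that "the parent directory of the nvcc command" means the directory above the one containing nvcc (for example, /opt/cuda for /opt/cuda/bin/nvcc).

```python
import os
import shutil


def find_cuda_path(environ=None, which=shutil.which, isdir=os.path.isdir):
    """Return the first CUDA directory found, or None (illustrative sketch)."""
    environ = os.environ if environ is None else environ

    # 1. The CUDA_PATH environment variable takes precedence.
    cuda_path = environ.get('CUDA_PATH')
    if cuda_path:
        return cuda_path

    # 2. Look for nvcc in the directories listed in PATH.
    #    (shutil.which falls back to the process PATH when path is None.)
    nvcc = which('nvcc', path=environ.get('PATH'))
    if nvcc:
        # Assumption: the CUDA root is one level above nvcc's directory.
        return os.path.dirname(os.path.dirname(os.path.abspath(nvcc)))

    # 3. Fall back to the default location.
    if isdir('/usr/local/cuda'):
        return '/usr/local/cuda'
    return None


print(find_cuda_path())
```

The which and isdir parameters are there only so the lookup can be tried out without a real CUDA installation.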
Note
CUDA installation discovery is also performed at runtime using the rule above.
Depending on your system configuration, you may also need to set the LD_LIBRARY_PATH environment variable to $CUDA_PATH/lib64 at runtime.
Using custom nvcc command during installation¶
If you want to use a custom nvcc compiler (for example, to use ccache) to build CuPy, please set the NVCC environment variable before installing CuPy:
export NVCC='ccache nvcc'
Note
During runtime, you don’t need to set this environment variable since CuPy doesn’t use the nvcc command.
Installation for Developers¶
If you are hacking the CuPy source code, we recommend using pip with the -e option for editable mode:
$ cd /path/to/cupy/source
$ pip install -e .
Please note that even with -e, you will have to rerun pip install -e . to regenerate C++ sources using Cython if you modified Cython source files (e.g., *.pyx files).
CuPy always raises cupy.cuda.compiler.CompileException¶
If CuPy does not work at all and always raises CompileException, it is possible that CuPy cannot correctly detect the CUDA installation on your system.
The following are error messages commonly observed in such cases.
nvrtc: error: failed to load builtins
catastrophic error: cannot open source file "cuda_fp16.h"
error: cannot overload functions distinguished by return type alone
error: identifier "__half_raw" is undefined
Please try setting the LD_LIBRARY_PATH and CUDA_PATH environment variables.
For example, if you have CUDA installed at /usr/local/cuda-9.0:
export CUDA_PATH=/usr/local/cuda-9.0
export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
Also see Working with Custom CUDA Installation.
If you are installing CuPy in an Anaconda environment, also make sure that the following packages are not installed.
Use conda uninstall cudatoolkit cudnn nccl to remove these packages.
Upgrade Guide¶
This is a list of changes introduced in each release that users should be aware of when migrating from older versions. Most changes are carefully designed not to break existing code; however, changes that may break existing code are highlighted with a box.
CuPy v4¶
Note
The version number has been bumped from v2 to v4 to align with the versioning of Chainer. Therefore, CuPy v3 does not exist.
Default Memory Pool¶
Prior to CuPy v4, the memory pool was only enabled by default when CuPy was used with Chainer. In CuPy v4, the memory pool is enabled by default, even when you use CuPy without Chainer. The memory pool significantly improves performance by mitigating the overhead of memory allocation and CPU/GPU synchronization.
Attention
When you monitor GPU memory usage (e.g., using nvidia-smi), you may notice that GPU memory is not freed even after the array instance goes out of scope.
This is expected behavior, as the default memory pool “caches” the allocated memory blocks.
To access the default memory pool instances, use get_default_memory_pool() and get_default_pinned_memory_pool().
You can access the statistics and free all unused memory blocks “cached” in the memory pool.
import cupy
a = cupy.ndarray(100, dtype=cupy.float32)
mempool = cupy.get_default_memory_pool()
# For performance, the size of actual allocation may become larger than the requested array size.
print(mempool.used_bytes()) # 512
print(mempool.total_bytes()) # 512
# Even if the array goes out of scope, its memory block is kept in the pool.
a = None
print(mempool.used_bytes()) # 0
print(mempool.total_bytes()) # 512
# You can clear the memory block by calling `free_all_blocks`.
mempool.free_all_blocks()
print(mempool.used_bytes()) # 0
print(mempool.total_bytes()) # 0
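The 512 bytes reported above for a 400-byte array (100 float32 elements) are consistent with requests being rounded up to a multiple of 512 bytes. The sketch below reproduces that arithmetic; the 512-byte unit is an assumption inferred from the output shown here, not a documented guarantee, and rounded_allocation_size is a hypothetical helper.

```python
def rounded_allocation_size(nbytes, unit=512):
    """Round ``nbytes`` up to the next multiple of ``unit`` (assumed unit)."""
    return ((nbytes + unit - 1) // unit) * unit


requested = 100 * 4  # 100 float32 elements, 4 bytes each
print(rounded_allocation_size(requested))  # 512
```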
You can even disable the default memory pool by the code below. Be sure to do this before any other CuPy operations.
import cupy
cupy.cuda.set_allocator(None)
cupy.cuda.set_pinned_memory_allocator(None)
Compute Capability¶
CuPy v4 now requires NVIDIA GPU with Compute Capability 3.0 or larger. See the List of CUDA GPUs to check if your GPU supports Compute Capability 3.0.
CUDA Stream¶
As CUDA Stream is fully supported in CuPy v4, cupy.random.RandomState.set_stream, the function to change the stream used by the random number generator, has been removed.
Please use cupy.cuda.Stream.use() instead.
See the discussion in #306 for more details.
Update of Docker Images¶
CuPy official Docker images (see Installation Guide for details) are now updated to use CUDA 8.0 and cuDNN 6.0. This change was introduced because CUDA 7.5 does not support NVIDIA Pascal GPUs.
To use these images, you may need to upgrade the NVIDIA driver on your host. See Requirements of nvidia-docker for details.
CuPy v2¶
Changed Behavior of count_nonzero Function¶
For performance reasons, cupy.count_nonzero() has been changed to return a zero-dimensional ndarray instead of an int when axis=None.
See the discussion in #154 for more details.
License¶
Copyright (c) 2015 Preferred Infrastructure, Inc.
Copyright (c) 2015 Preferred Networks, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
NumPy¶
CuPy is designed based on NumPy’s API. CuPy’s source code and documents contain the original NumPy ones.
Copyright (c) 2005-2016, NumPy Developers.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the NumPy Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
SciPy¶
CuPy is designed based on SciPy’s API. CuPy’s source code and documents contain the original SciPy ones.
Copyright (c) 2001, 2002 Enthought, Inc.
All rights reserved.
Copyright (c) 2003-2016 SciPy Developers.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of Enthought nor the names of the SciPy Developers may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.