Welcome to Cython’s Documentation¶
Getting Started¶
Cython - an overview¶
[Cython] is a programming language based on Python, with extra syntax allowing for optional static type declarations. It aims to become a superset of the [Python] language which gives it high-level, object-oriented, functional, and dynamic programming. The source code gets translated into optimized C/C++ code and compiled as Python extension modules. This allows for both very fast program execution and tight integration with external C libraries, while keeping up the high programmer productivity for which the Python language is well known.
The primary Python execution environment is commonly referred to as CPython, as it is written in C. Other major implementations use Java (Jython [Jython]), C# (IronPython [IronPython]) and Python itself (PyPy [PyPy]). Written in C, CPython has been conducive to wrapping many external libraries that interface through the C language. It has, however, remained non trivial to write the necessary glue code in C, especially for programmers who are more fluent in a high-level language like Python than in a close-to-the-metal language like C.
Originally based on the well-known Pyrex [Pyrex], the Cython project has approached this problem by means of a source code compiler that translates Python code to equivalent C code. This code is executed within the CPython runtime environment, but at the speed of compiled C and with the ability to call directly into C libraries. At the same time, it keeps the original interface of the Python source code, which makes it directly usable from Python code. These two-fold characteristics enable Cython’s two major use cases: extending the CPython interpreter with fast binary modules, and interfacing Python code with external C libraries.
While Cython can compile (most) regular Python code, the generated C code usually gains major (and sometimes impressive) speed improvements from optional static type declarations for both Python and C types. These allow Cython to assign C semantics to parts of the code, and to translate them into very efficient C code. Type declarations can therefore be used for two purposes: for moving code sections from dynamic Python semantics into static-and-fast C semantics, but also for directly manipulating types defined in external libraries. Cython thus merges the two worlds into a very broadly applicable programming language.
[Cython] | G. Ewing, R. W. Bradshaw, S. Behnel, D. S. Seljebotn et al., The Cython compiler, http://cython.org. |
[IronPython] | Jim Hugunin et al., http://www.codeplex.com/IronPython. |
[Jython] | J. Hugunin, B. Warsaw, F. Bock, et al., Jython: Python for the Java platform, http://www.jython.org/ |
[PyPy] | The PyPy Group, PyPy: a Python implementation written in Python, http://codespeak.net/pypy. |
[Pyrex] | G. Ewing, Pyrex: C-Extensions for Python, http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/ |
[Python] | G. van Rossum et al., The Python programming language, http://python.org. |
Installing Cython¶
Many scientific Python distributions, such as the Enthought Python Distribution [EPD], Python(x,y) [Pythonxy], and Sage [Sage], bundle Cython and no setup is needed. Note however that if your distribution ships a version of Cython which is too old you can still use the instructions below to update Cython. Everything in this tutorial should work with Cython 0.11.2 and newer, unless a footnote says otherwise.
Unlike most Python software, Cython requires a C compiler to be present on the system. The details of getting a C compiler vary according to the system used:
- Linux: The GNU C Compiler (gcc) is usually present, or easily available through the package system. On Ubuntu or Debian, for instance, the command
  sudo apt-get install build-essential
  will fetch everything you need.
- Mac OS X: To retrieve gcc, one option is to install Apple’s XCode, which can be retrieved from the Mac OS X’s install DVDs or from http://developer.apple.com.
- Windows: A popular option is to use the open source MinGW (a Windows distribution of gcc). See the appendix for instructions for setting up MinGW manually. EPD and Python(x,y) bundle MinGW, but some of the configuration steps in the appendix might still be necessary. Another option is to use Microsoft’s Visual C. One must then use the same version which the installed Python was compiled with.
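Before installing anything, it can help to check whether a C compiler is already reachable from the command line. The following is a minimal sketch using only the Python standard library. The helper name find_c_compiler and the list of candidate executables are illustrative assumptions; neither Cython nor distutils consults this list, and finding an executable does not guarantee that it matches the compiler your Python was built with.

```python
import shutil

# Candidate compiler executables. This list is an illustrative assumption,
# not something that Cython or distutils looks at.
CANDIDATE_COMPILERS = ("gcc", "cc", "clang", "cl")


def find_c_compiler(candidates=CANDIDATE_COMPILERS):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    compiler = find_c_compiler()
    if compiler is None:
        print("No C compiler found on PATH")
    else:
        print("Found C compiler at", compiler)
```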
The newest Cython release can always be downloaded from http://cython.org. Unpack the tarball or zip file, enter the directory, and then run:
python setup.py install
If you have Python setuptools set up on your system, you should be able to fetch Cython from PyPI and install it using:
easy_install cython
For Windows there is also an executable installer available for download.
[EPD] | http://www.enthought.com/products/epd.php |
[Pythonxy] | http://www.pythonxy.com/ |
[Sage] |
|
Building Cython code¶
Cython code must, unlike Python, be compiled. This happens in two stages:
- A .pyx file is compiled by Cython to a .c file, containing the code of a Python extension module.
- The .c file is compiled by a C compiler to a .so file (or .pyd on Windows) which can be import-ed directly into a Python session.
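The exact file name suffix of the compiled module depends on the platform and interpreter. As a small sketch, the standard library can report which suffixes the running interpreter accepts for extension modules. The helper is_extension_file is a hypothetical name used only for illustration.

```python
import importlib.machinery

# Suffixes that this interpreter accepts for compiled extension modules,
# for example names ending in ".so" or ".pyd" depending on the platform.
suffixes = importlib.machinery.EXTENSION_SUFFIXES


def is_extension_file(filename):
    """Return True if the file name looks like an importable extension module."""
    return any(filename.endswith(suffix) for suffix in suffixes)


if __name__ == "__main__":
    print("Extension suffixes:", suffixes)
    print("hello.pyx importable as extension?", is_extension_file("hello.pyx"))
```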
There are several ways to build Cython code:
- Write a distutils setup.py.
- Use pyximport, importing Cython .pyx files as if they were .py files (using distutils to compile and build in the background).
- Run the cython command-line utility manually to produce the .c file from the .pyx file, then manually compile the .c file into a shared object library or .dll suitable for import from Python. (This is mostly for debugging and experimentation.)
- Use the [Sage] notebook which allows Cython code inline.
Currently, distutils is the most common way Cython files are built and distributed. The other methods are described in more detail in the Source Files and Compilation section of the reference manual.
Building a Cython module using distutils¶
Imagine a simple “hello world” script in a file hello.pyx:
def say_hello_to(name):
    print("Hello %s!" % name)
The following could be a corresponding setup.py script:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
ext_modules = [Extension("hello", ["hello.pyx"])]
setup(
    name = 'Hello world app',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)
To build, run python setup.py build_ext --inplace. Then simply start a Python session and do from hello import say_hello_to and use the imported function as you see fit.

The Sage notebook allows transparently editing and compiling Cython code simply by typing %cython at the top of a cell and evaluating it. Variables and functions defined in a Cython cell are imported into the running session.
Faster code via static typing¶
Cython is a Python compiler. This means that it can compile normal Python code without changes (with a few obvious exceptions of some as-yet unsupported language features). However, for performance critical code, it is often helpful to add static type declarations, as they will allow Cython to step out of the dynamic nature of the Python code and generate simpler and faster C code - sometimes faster by orders of magnitude.
It must be noted, however, that type declarations can make the source code more verbose and thus less readable. It is therefore discouraged to use them without good reason, such as where benchmarks prove that they really make the code substantially faster in a performance critical section. Typically a few types in the right spots go a long way.
All C types are available for type declarations: integer and floating point types, complex numbers, structs, unions and pointer types. Cython can automatically and correctly convert between the types on assignment. This also includes Python’s arbitrary size integer types, where value overflows on conversion to a C type will raise a Python OverflowError at runtime. (It does not, however, check for overflow when doing arithmetic.) The generated C code will handle the platform dependent sizes of C types correctly and safely in this case.
Types are declared via the cdef keyword.
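The difference between checked conversion and unchecked arithmetic can be modelled in plain Python. The sketch below is an emulation, not Cython output: the helpers to_c_int and c_add are hypothetical, a 32-bit int is assumed, and ctypes is used only because its integer types wrap silently. Note that signed overflow is formally undefined in C; wrapping is merely the behaviour commonly observed.

```python
import ctypes

# Range of a 32-bit signed C int (an assumption; the real size is platform dependent).
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def to_c_int(value):
    """Model a checked Python-to-C conversion: out-of-range values raise."""
    if not INT_MIN <= value <= INT_MAX:
        raise OverflowError("value too large to convert to int")
    return ctypes.c_int32(value)


def c_add(a, b):
    """Model unchecked C arithmetic: the result wraps around silently."""
    return ctypes.c_int32(a.value + b.value)


if __name__ == "__main__":
    print(c_add(to_c_int(INT_MAX), to_c_int(1)).value)  # wraps instead of raising
```

In this model, converting 2**31 fails loudly, while adding 1 to the largest int produces a negative number without any error.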
Typing Variables¶
Consider the following pure Python code:
def f(x):
    return x**2-x
def integrate_f(a, b, N):
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f(a+i*dx)
    return s * dx
Simply compiling this in Cython merely gives a 35% speedup. This is better than nothing, but adding some static types can make a much larger difference.
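Speedup figures like these depend on the machine and the Cython version, so it is worth measuring a baseline yourself. The following sketch times the pure Python version with the standard timeit module. The helper time_call and its default repeat counts are illustrative choices, not part of the tutorial.

```python
import timeit


def f(x):
    return x**2 - x


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def time_call(func, *args, repeat=3, number=10):
    """Return the best per-call time in seconds over several runs."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    print("integral:", integrate_f(0.0, 1.0, 10000))
    print("seconds per call:", time_call(integrate_f, 0.0, 1.0, 10000))
```

Running the same measurement against the compiled module (imported under a different name) gives a like-for-like comparison.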
With additional type declarations, this might look like:
def f(double x):
    return x**2-x
def integrate_f(double a, double b, int N):
    cdef int i
    cdef double s, dx
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f(a+i*dx)
    return s * dx
Since the iterator variable i is typed with C semantics, the for-loop will be compiled to pure C code. Typing a, s and dx is important as they are involved in arithmetic within the for-loop; typing b and N makes less of a difference, but in this case it is not much extra work to be consistent and type the entire function.
This results in a 4 times speedup over the pure Python version.
Typing Functions¶
Python function calls can be expensive – in Cython doubly so because one might need to convert to and from Python objects to do the call. In our example above, the argument is assumed to be a C double both inside f() and in the call to it, yet a Python float object must be constructed around the argument in order to pass it.
Therefore Cython provides a syntax for declaring a C-style function, the cdef keyword:
cdef double f(double x) except? -2:
    return x**2-x
Some form of except-modifier should usually be added, otherwise Cython will not be able to propagate exceptions raised in the function (or a function it calls). The except? -2 means that an error will be checked for if -2 is returned (though the ? indicates that -2 may also be used as a valid return value). Alternatively, the slower except * is always safe. An except clause can be left out if the function returns a Python object or if it is guaranteed that an exception will not be raised within the function call.
A side-effect of cdef is that the function is no longer available from Python-space, as Python wouldn’t know how to call it. Using the cpdef keyword instead of cdef, a Python wrapper is also created, so that the function is available both from Cython (fast, passing typed values directly) and from Python (wrapping values in Python objects).
Note also that it is no longer possible to change f at runtime.
Speedup: 150 times over pure Python.
Determining where to add types¶
Because static typing is often the key to large speed gains, beginners often have a tendency to type everything in sight. This cuts down on both readability and flexibility. On the other hand, it is easy to kill performance by forgetting to type a critical loop variable. Two essential tools to help with this task are profiling and annotation. Profiling should be the first step of any optimization effort, and can tell you where you are spending your time. Cython’s annotation can then tell you why your code is taking time.
Using the -a switch to the cython command line program (or following a link from the Sage notebook) results in an HTML report of Cython code interleaved with the generated C code. Lines are colored according to the level of “typedness” – white lines translate to pure C without any Python API calls. This report is invaluable when optimizing a function for speed.
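The profiling step mentioned in this section can be sketched with the standard cProfile and pstats modules. This is a minimal example on the pure Python integration code; the helper profile_call is a hypothetical name, and profiling compiled Cython functions needs additional setup that is not shown here.

```python
import cProfile
import io
import pstats


def f(x):
    return x**2 - x


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def profile_call(func, *args):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(integrate_f, 0.0, 1.0, 100000)
    print(report)
```

The report shows how many times f() is called and how much time is spent there, which points at the call that is worth typing first.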

Tutorials¶
Calling C functions¶
This tutorial briefly describes what you need to know in order to call C library functions from Cython code. For a longer and more comprehensive tutorial about using external C libraries, wrapping them and handling errors, see Using C libraries.
For simplicity, let’s start with a function from the standard C library. This does not add any dependencies to your code, and it has the additional advantage that Cython already defines many such functions for you. So you can just cimport and use them.
For example, let’s say you need a low-level way to parse a number from a char* value. You could use the atoi() function, as defined by the stdlib.h header file. This can be done as follows:
from libc.stdlib cimport atoi
cdef parse_charptr_to_py_int(char* s):
    assert s is not NULL, "byte string value is NULL"
    return atoi(s) # note: atoi() has no error detection!
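To see why the missing error detection matters, here is a rough pure Python model of how atoi() treats its input. The function atoi_like is a hypothetical stand-in written for illustration; it ignores overflow, which is undefined behaviour in the real C function.

```python
def atoi_like(data):
    """Rough model of C atoi() for a bytes object (overflow is not modelled)."""
    text = data.lstrip(b" \t\n\v\f\r")   # skip leading whitespace
    sign = 1
    if text[:1] in (b"+", b"-"):
        if text[:1] == b"-":
            sign = -1
        text = text[1:]
    value = 0
    for byte in text:
        if not 48 <= byte <= 57:         # stop at the first non-digit
            break
        value = value * 10 + (byte - 48)
    return sign * value


if __name__ == "__main__":
    print(atoi_like(b"42"), atoi_like(b"abc"), atoi_like(b"0"))
```

In this model, invalid input such as b"abc" and the valid input b"0" both give 0, so the caller cannot tell them apart from the return value alone.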
You can find a complete list of these standard cimport files in Cython’s source package Cython/Includes/. It also has a complete set of declarations for CPython’s C-API. For example, to test at C compilation time which CPython version your code is being compiled with, you can do this:
from cpython.version cimport PY_VERSION_HEX
print(PY_VERSION_HEX >= 0x030200F0) # Python version >= 3.2 final
Cython also provides declarations for the C math library:
from libc.math cimport sin
cdef double f(double x):
    return sin(x*x)
However, this is a library that is not linked by default on some Unix-like systems, such as Linux. In addition to cimporting the declarations, you must configure your build system to link against the shared library m. For distutils, it is enough to add it to the libraries parameter of the Extension() setup:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
ext_modules=[
    Extension("demo",
              ["demo.pyx"],
              libraries=["m"]) # Unix-like specific
]
setup(
    name = "Demos",
    cmdclass = {"build_ext": build_ext},
    ext_modules = ext_modules
)
If you want to access C code for which Cython does not provide a ready to use declaration, you must declare it yourself. For example, the above sin() function is declared as follows:
cdef extern from "math.h":
    double sin(double)
This declares the sin() function in a way that makes it available to Cython code and instructs Cython to generate C code that includes the math.h header file. The C compiler will see the original declaration in math.h at compile time, but Cython does not parse “math.h” and requires a separate definition.
Just like the sin() function from the math library, it is possible to declare and call into any C library as long as the module that Cython generates is properly linked against the shared or static library.
Using C libraries¶
Apart from writing fast code, one of the main use cases of Cython is to call external C libraries from Python code. As Cython code compiles down to C code itself, it is actually trivial to call C functions directly in the code. The following gives a complete example for using (and wrapping) an external C library in Cython code, including appropriate error handling and considerations about designing a suitable API for Python and Cython code.
Imagine you need an efficient way to store integer values in a FIFO queue. Since memory really matters, and the values are actually coming from C code, you cannot afford to create and store Python int objects in a list or deque. So you look for a queue implementation in C.
After some web search, you find the C-algorithms library [CAlg] and decide to use its double ended queue implementation. To make the handling easier, however, you decide to wrap it in a Python extension type that can encapsulate all memory management.
The C API of the queue implementation, which is defined in the header file libcalg/queue.h, essentially looks like this:
/* file: queue.h */
typedef struct _Queue Queue;
typedef void *QueueValue;
Queue *queue_new(void);
void queue_free(Queue *queue);
int queue_push_head(Queue *queue, QueueValue data);
QueueValue queue_pop_head(Queue *queue);
QueueValue queue_peek_head(Queue *queue);
int queue_push_tail(Queue *queue, QueueValue data);
QueueValue queue_pop_tail(Queue *queue);
QueueValue queue_peek_tail(Queue *queue);
int queue_is_empty(Queue *queue);
To get started, the first step is to redefine the C API in a .pxd file, say, cqueue.pxd:
# file: cqueue.pxd
cdef extern from "libcalg/queue.h":
    ctypedef struct Queue:
        pass
    ctypedef void* QueueValue
    Queue* queue_new()
    void queue_free(Queue* queue)
    int queue_push_head(Queue* queue, QueueValue data)
    QueueValue queue_pop_head(Queue* queue)
    QueueValue queue_peek_head(Queue* queue)
    int queue_push_tail(Queue* queue, QueueValue data)
    QueueValue queue_pop_tail(Queue* queue)
    QueueValue queue_peek_tail(Queue* queue)
    bint queue_is_empty(Queue* queue)
Note how these declarations are almost identical to the header file declarations, so you can often just copy them over. However, you do not need to provide all declarations as above, just those that you use in your code or in other declarations, so that Cython gets to see a sufficient and consistent subset of them. Then, consider adapting them somewhat to make them more comfortable to work with in Cython.
One noteworthy difference to the header file that we use above is the declaration of the Queue struct in the first line. Queue is in this case used as an opaque handle; only the library that is called knows what is really inside. Since no Cython code needs to know the contents of the struct, we do not need to declare its contents, so we simply provide an empty definition (as we do not want to declare the _Queue type which is referenced in the C header) [1].
[1] | There’s a subtle difference between cdef struct Queue: pass and ctypedef struct Queue: pass. The former declares a type which is referenced in C code as struct Queue, while the latter is referenced in C as Queue. This is a C language quirk that Cython is not able to hide. Most modern C libraries use the ctypedef kind of struct. |
Another exception is the last line. The integer return value of the queue_is_empty() function is actually a C boolean value, i.e. the only interesting thing about it is whether it is non-zero or zero, indicating if the queue is empty or not. This is best expressed by Cython’s bint type, which is a normal int type when used in C but maps to Python’s boolean values True and False when converted to a Python object. This way of tightening declarations in a .pxd file can often simplify the code that uses them.
It is good practice to define one .pxd file for each library that you use, and sometimes even for each header file (or functional group) if the API is large. That simplifies their reuse in other projects. Sometimes, you may need to use C functions from the standard C library, or want to call C-API functions from CPython directly. For common needs like this, Cython ships with a set of standard .pxd files that provide these declarations in a readily usable way that is adapted to their use in Cython. The main packages are cpython, libc and libcpp. The NumPy library also has a standard .pxd file numpy, as it is often used in Cython code. See Cython’s Cython/Includes/ source package for a complete list of provided .pxd files.
After declaring our C library’s API, we can start to design the Queue class that should wrap the C queue. It will live in a file called queue.pyx. [2]
[2] | Note that the name of the .pyx file must be different from the cqueue.pxd file with declarations from the C library, as both do not describe the same code. A .pxd file next to a .pyx file with the same name defines exported declarations for code in the .pyx file. As the cqueue.pxd file contains declarations of a regular C library, there must not be a .pyx file with the same name that Cython associates with it. |
Here is a first start for the Queue class:
# file: queue.pyx
cimport cqueue
cdef class Queue:
    cdef cqueue.Queue* _c_queue
    def __cinit__(self):
        self._c_queue = cqueue.queue_new()
Note that it says __cinit__ rather than __init__. While __init__ is available as well, it is not guaranteed to be run (for instance, one could create a subclass and forget to call the ancestor’s constructor). Because not initializing C pointers often leads to hard crashes of the Python interpreter, Cython provides __cinit__ which is always called immediately on construction, before CPython even considers calling __init__, and which therefore is the right place to initialise cdef fields of the new instance. However, as __cinit__ is called during object construction, self is not fully constructed yet, and one must avoid doing anything with self but assigning to cdef fields.
Note also that the above method takes no parameters, although subtypes may want to accept some. A no-arguments __cinit__() method is a special case here that simply does not receive any parameters that were passed to a constructor, so it does not prevent subclasses from adding parameters. If parameters are used in the signature of __cinit__(), they must match those of any declared __init__ method of classes in the class hierarchy that are used to instantiate the type.
Before we continue implementing the other methods, it is important to understand that the above implementation is not safe. In case anything goes wrong in the call to queue_new(), this code will simply swallow the error, so we will likely run into a crash later on. According to the documentation of the queue_new() function, the only reason why the above can fail is due to insufficient memory. In that case, it will return NULL, whereas it would normally return a pointer to the new queue.
The Python way to get out of this is to raise a MemoryError [3]. We can thus change the init function as follows:
cimport cqueue
cdef class Queue:
    cdef cqueue.Queue* _c_queue
    def __cinit__(self):
        self._c_queue = cqueue.queue_new()
        if self._c_queue is NULL:
            raise MemoryError()
[3] | In the specific case of a MemoryError, creating a new exception instance in order to raise it may actually fail because we are running out of memory. Luckily, CPython provides a C-API function PyErr_NoMemory() that safely raises the right exception for us. Since version 0.14.1, Cython automatically substitutes this C-API call whenever you write raise MemoryError or raise MemoryError(). If you use an older version, you have to cimport the C-API function from the standard package cpython.exc and call it directly. |
The next thing to do is to clean up when the Queue instance is no longer used (i.e. all references to it have been deleted). To this end, CPython provides a callback that Cython makes available as a special method __dealloc__(). In our case, all we have to do is to free the C Queue, but only if we succeeded in initialising it in the init method:
def __dealloc__(self):
    if self._c_queue is not NULL:
        cqueue.queue_free(self._c_queue)
At this point, we have a working Cython module that we can test. To compile it, we need to configure a setup.py script for distutils. Here is the most basic script for compiling a Cython module:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("queue", ["queue.pyx"])]
)
To build against the external C library, we must extend this script to include the necessary setup. Assuming the library is installed in the usual places (e.g. under /usr/lib and /usr/include on a Unix-like system), we could simply change the extension setup from
ext_modules = [Extension("queue", ["queue.pyx"])]
to
ext_modules = [
    Extension("queue", ["queue.pyx"],
              libraries=["calg"])
]
If it is not installed in a ‘normal’ location, users can provide the required parameters externally by passing appropriate C compiler flags, such as:
CFLAGS="-I/usr/local/otherdir/calg/include" \
LDFLAGS="-L/usr/local/otherdir/calg/lib" \
python setup.py build_ext -i
Once we have compiled the module for the first time, we can now import it and instantiate a new Queue:
$ export PYTHONPATH=.
$ python -c 'from queue import Queue as Q ; Q()'
However, this is all our Queue class can do so far, so let’s make it more usable.
Before implementing the public interface of this class, it is good practice to look at what interfaces Python offers, e.g. in its list or collections.deque classes. Since we only need a FIFO queue, it’s enough to provide the methods append(), peek() and pop(), and additionally an extend() method to add multiple values at once. Also, since we already know that all values will be coming from C, it’s best to provide only cdef methods for now, and to give them a straight C interface.
In C, it is common for data structures to store data as a void* to whatever data item type. Since we only want to store int values, which usually fit into the size of a pointer type, we can avoid additional memory allocations through a trick: we cast our int values to void* and vice versa, and store the value directly as the pointer value.
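The cast trick can be tried out from plain Python with the standard ctypes module. This is only a sketch of the idea, not the Cython code used by the wrapper; the helper names int_to_pointer and pointer_to_int are hypothetical.

```python
import ctypes


def int_to_pointer(value):
    """Store an integer directly in a pointer-sized slot (like <void*>value)."""
    return ctypes.c_void_p(value)


def pointer_to_int(pointer):
    """Read the integer back from the pointer slot (like <int>pointer)."""
    raw = pointer.value
    if raw is None:          # ctypes reports a NULL pointer as None
        raw = 0
    return ctypes.c_int(raw).value   # truncates to the width of a C int


# The trick relies on a C int fitting into a pointer on this platform.
INT_FITS_IN_POINTER = (
    ctypes.sizeof(ctypes.c_int) <= ctypes.sizeof(ctypes.c_void_p)
)

if __name__ == "__main__":
    print(INT_FITS_IN_POINTER, pointer_to_int(int_to_pointer(-1)))
```

Note that the value 0 is stored as a NULL pointer, which is indistinguishable from "no value" at the pointer level.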
Here is a simple implementation for the append() method:
cdef append(self, int value):
    cqueue.queue_push_tail(self._c_queue, <void*>value)
Again, the same error handling considerations as for the __cinit__() method apply, so that we end up with this implementation instead:
cdef append(self, int value):
    if not cqueue.queue_push_tail(self._c_queue,
                                  <void*>value):
        raise MemoryError()
Adding an extend() method should now be straightforward:
cdef extend(self, int* values, size_t count):
    """Append all ints to the queue.
    """
    cdef size_t i
    for i in range(count):
        if not cqueue.queue_push_tail(
                self._c_queue, <void*>values[i]):
            raise MemoryError()
This becomes handy when reading values from a NumPy array, for example.
So far, we can only add data to the queue. The next step is to write the two methods to get the first element: peek() and pop(), which provide read-only and destructive read access respectively:
cdef int peek(self):
    return <int>cqueue.queue_peek_head(self._c_queue)
cdef int pop(self):
    return <int>cqueue.queue_pop_head(self._c_queue)
Simple enough. Now, what happens when the queue is empty? According to the documentation, the functions return a NULL pointer, which is typically not a valid value. Since we are simply casting to and from ints, we cannot distinguish anymore if the return value was NULL because the queue was empty or because the value stored in the queue was 0. However, in Cython code, we would expect the first case to raise an exception, whereas the second case should simply return 0. To deal with this, we need to special case this value, and check if the queue really is empty or not:
cdef int peek(self) except? -1:
    cdef int value = \
        <int>cqueue.queue_peek_head(self._c_queue)
    if value == 0:
        # this may mean that the queue is empty, or
        # that it happens to contain a 0 value
        if cqueue.queue_is_empty(self._c_queue):
            raise IndexError("Queue is empty")
    return value
Note how we have effectively created a fast path through the method in the hopefully common case that the return value is not 0. Only that specific case needs an additional check if the queue is empty.
The except? -1 declaration in the method signature falls into the same category. If the function was a Python function returning a Python object value, CPython would simply return NULL internally instead of a Python object to indicate an exception, which would immediately be propagated by the surrounding code. The problem is that the return type is int and any int value is a valid queue item value, so there is no way to explicitly signal an error to the calling code. In fact, without such a declaration, there is no obvious way for Cython to know what to return on exceptions and for calling code to even know that this method may exit with an exception.
The only way calling code can deal with this situation is to call PyErr_Occurred() when returning from a function to check if an exception was raised, and if so, propagate the exception. This obviously has a performance penalty. Cython therefore allows you to declare which value it should implicitly return in the case of an exception, so that the surrounding code only needs to check for an exception when receiving this exact value.
We chose to use -1 as the exception return value as we expect it to be an unlikely value to be put into the queue. The question mark in the except? -1 declaration indicates that the return value is ambiguous (there may be a -1 value in the queue, after all) and that an additional exception check using PyErr_Occurred() is needed in calling code. Without it, Cython code that calls this method and receives the exception return value would silently (and sometimes incorrectly) assume that an exception has been raised. In any case, all other return values will be passed through almost without a penalty, thus again creating a fast path for ‘normal’ values.
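The protocol behind except? -1 can be modelled in a few lines of plain Python. This is a conceptual sketch only: the module-level variable _pending_error stands in for the interpreter’s error indicator (what PyErr_Occurred() inspects), and the function names are hypothetical.

```python
ERROR_SENTINEL = -1
_pending_error = None   # stands in for the interpreter's error indicator


def peek_or_sentinel(items):
    """Callee: return the first item, or record an error and return the sentinel."""
    global _pending_error
    if not items:
        _pending_error = IndexError("Queue is empty")
        return ERROR_SENTINEL
    return items[0]


def call_peek(items):
    """Caller: consult the error indicator only when the sentinel comes back."""
    global _pending_error
    value = peek_or_sentinel(items)
    if value == ERROR_SENTINEL and _pending_error is not None:
        error, _pending_error = _pending_error, None
        raise error
    return value


if __name__ == "__main__":
    print(call_peek([5]), call_peek([-1]))
```

A stored -1 passes through as a normal value because the indicator is not set, while every other value skips the indicator check entirely.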
Now that the peek() method is implemented, the pop() method also needs adaptation. Since it removes a value from the queue, however, it is not enough to test if the queue is empty after the removal. Instead, we must test it on entry:
cdef int pop(self) except? -1:
    if cqueue.queue_is_empty(self._c_queue):
        raise IndexError("Queue is empty")
    return <int>cqueue.queue_pop_head(self._c_queue)
The return value for exception propagation is declared exactly as for peek().
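The difference between the two checks can be modelled without the C library. In the sketch below, CQueueModel is a hypothetical stand-in that mimics the documented behaviour of returning NULL (here 0) for an empty queue; the functions peek and pop mirror the logic of the Cython methods.

```python
from collections import deque


class CQueueModel:
    """Stand-in for the C queue: head accessors return 0 (NULL) when empty."""

    def __init__(self):
        self._items = deque()

    def push_tail(self, value):
        self._items.append(value)
        return True

    def peek_head(self):
        return self._items[0] if self._items else 0

    def pop_head(self):
        return self._items.popleft() if self._items else 0

    def is_empty(self):
        return not self._items


def peek(cq):
    value = cq.peek_head()
    # Only the ambiguous value 0 needs the extra emptiness check.
    if value == 0 and cq.is_empty():
        raise IndexError("Queue is empty")
    return value


def pop(cq):
    # Popping changes the queue, so emptiness must be tested on entry.
    if cq.is_empty():
        raise IndexError("Queue is empty")
    return cq.pop_head()
```

With a stored 0, both functions return 0; on an empty queue, both raise IndexError.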
Lastly, we can provide the Queue with an emptiness indicator in the normal Python way by implementing the __bool__() special method (note that Python 2 calls this method __nonzero__, whereas Cython code can use either name):
def __bool__(self):
    return not cqueue.queue_is_empty(self._c_queue)
Note that this method returns either True or False as we declared the return type of the queue_is_empty function as bint in cqueue.pxd.
Now that the implementation is complete, you may want to write some tests for it to make sure it works correctly. Especially doctests are very nice for this purpose, as they provide some documentation at the same time. To enable doctests, however, you need a Python API that you can call. C methods are not visible from Python code, and thus not callable from doctests.
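As a sketch of how such doctests are run, the following uses the standard doctest module on a pure Python stand-in. The class PyQueue and the helper run_class_doctest are hypothetical and exist only so the example runs without compiling anything; the same approach applies to a class with a Python-callable API.

```python
import doctest
from collections import deque


class PyQueue:
    """A pure Python stand-in for the Queue extension type.

    >>> q = PyQueue()
    >>> q.append(5)
    >>> q.peek()
    5
    >>> q.pop()
    5
    >>> bool(q)
    False
    """

    def __init__(self):
        self._items = deque()

    def append(self, value):
        self._items.append(value)

    def peek(self):
        if not self._items:
            raise IndexError("Queue is empty")
        return self._items[0]

    def pop(self):
        if not self._items:
            raise IndexError("Queue is empty")
        return self._items.popleft()

    def __bool__(self):
        return bool(self._items)


def run_class_doctest(cls):
    """Run the examples in the class docstring and return the test results."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        cls.__doc__, {cls.__name__: cls}, cls.__name__, None, None)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


if __name__ == "__main__":
    print(run_class_doctest(PyQueue))
```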
A quick way to provide a Python API for the class is to change the methods from cdef to cpdef. This will let Cython generate two entry points, one that is callable from normal Python code using the Python call semantics and Python objects as arguments, and one that is callable from C code with fast C semantics and without requiring intermediate argument conversion from or to Python types.
The following listing shows the complete implementation that uses cpdef methods where possible:
cimport cqueue
cdef class Queue:
    """A queue class for C integer values.
    >>> q = Queue()
    >>> q.append(5)
    >>> q.peek()
    5
    >>> q.pop()
    5
    """
    cdef cqueue.Queue* _c_queue
    def __cinit__(self):
        self._c_queue = cqueue.queue_new()
        if self._c_queue is NULL:
            raise MemoryError()
    def __dealloc__(self):
        if self._c_queue is not NULL:
            cqueue.queue_free(self._c_queue)
    cpdef append(self, int value):
        if not cqueue.queue_push_tail(self._c_queue,
                                      <void*>value):
            raise MemoryError()
    cdef extend(self, int* values, size_t count):
        cdef size_t i
        for i in range(count):
            if not cqueue.queue_push_tail(
                    self._c_queue, <void*>values[i]):
                raise MemoryError()
    cpdef int peek(self) except? -1:
        cdef int value = \
            <int>cqueue.queue_peek_head(self._c_queue)
        if value == 0:
            # this may mean that the queue is empty,
            # or that it happens to contain a 0 value
            if cqueue.queue_is_empty(self._c_queue):
                raise IndexError("Queue is empty")
        return value
    cpdef int pop(self) except? -1:
        if cqueue.queue_is_empty(self._c_queue):
            raise IndexError("Queue is empty")
        return <int>cqueue.queue_pop_head(self._c_queue)
    def __bool__(self):
        return not cqueue.queue_is_empty(self._c_queue)
The cpdef feature is obviously not available for the extend() method, as the method signature is incompatible with Python argument types. However, if wanted, we can rename the C-ish extend() method to e.g. c_extend(), and write a new extend() method instead that accepts an arbitrary Python iterable:
cdef c_extend(self, int* values, size_t count):
    cdef size_t i
    for i in range(count):
        if not cqueue.queue_push_tail(
                self._c_queue, <void*>values[i]):
            raise MemoryError()
cpdef extend(self, values):
    for value in values:
        self.append(value)
As a quick test with 10000 numbers on the author’s machine indicates, using this Queue from Cython code with C int values is about five times as fast as using it from Cython code with Python object values, almost eight times faster than using it from Python code in a Python loop, and still more than twice as fast as using Python’s highly optimised collections.deque type from Cython code with Python integers.
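For readers who want to try the interface without compiling the C library, the following pure-Python sketch mirrors the Python-visible methods of the Queue class. It is only a stand-in: the name PyQueue and the use of collections.deque as storage are assumptions made for illustration, and it has neither the C-level speed nor the int* based c_extend() method.

```python
from collections import deque


class PyQueue:
    """Pure-Python stand-in for the Cython Queue wrapper (illustration only)."""

    def __init__(self):
        self._items = deque()

    def append(self, value):
        # int() loosely mimics the conversion to a C int argument
        # (without the overflow check that Cython performs)
        self._items.append(int(value))

    def extend(self, values):
        # accepts an arbitrary Python iterable
        for value in values:
            self.append(value)

    def peek(self):
        if not self._items:
            raise IndexError("Queue is empty")
        return self._items[0]

    def pop(self):
        if not self._items:
            raise IndexError("Queue is empty")
        return self._items.popleft()

    def __bool__(self):
        return bool(self._items)
```

Unlike the C version, this sketch does not need the "0 or empty" disambiguation in peek(), because an empty deque is distinguishable from one that contains a 0 value.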
[CAlg] | Simon Howard, C Algorithms library, http://c-algorithms.sourceforge.net/ |
Extension types (aka. cdef classes)¶
To support object-oriented programming, Cython supports writing normal Python classes exactly as in Python:
class MathFunction(object):
    def __init__(self, name, operator):
        self.name = name
        self.operator = operator

    def __call__(self, *operands):
        return self.operator(*operands)
Based on what Python calls a “built-in type”, however, Cython supports a second kind of class: extension types, sometimes referred to as “cdef classes” due to the keywords used for their declaration. They are somewhat restricted compared to Python classes, but are generally more memory efficient and faster than generic Python classes. The main difference is that they use a C struct to store their fields and methods instead of a Python dict. This allows them to store arbitrary C types in their fields without requiring a Python wrapper for them, and to access fields and methods directly at the C level without passing through a Python dictionary lookup.
Normal Python classes can inherit from cdef classes, but not the other way around. Cython needs to know the complete inheritance hierarchy of a cdef class in order to lay out its C struct, and restricts it to single inheritance. Normal Python classes, on the other hand, can inherit from any number of Python classes and extension types, both in Cython code and pure Python code.
So far our integration example has not been very useful as it only integrates a single hard-coded function. In order to remedy this, without sacrificing speed, we will use a cdef class to represent a function on floating point numbers:
cdef class Function:
    cpdef double evaluate(self, double x) except *:
        return 0
Like before, cpdef makes two versions of the method available; one fast for use from Cython and one slower for use from Python. Then:
cdef class SinOfSquareFunction(Function):
    cpdef double evaluate(self, double x) except *:
        return sin(x**2)
Using this, we can now change our integration example:
def integrate(Function f, double a, double b, int N):
    cdef int i
    cdef double s, dx
    if f is None:
        raise ValueError("f cannot be None")
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f.evaluate(a+i*dx)
    return s * dx

print(integrate(SinOfSquareFunction(), 0, 1, 10000))
This is almost as fast as the previous code, however it is much more flexible as the function to integrate can be changed. It is even possible to pass in a new function defined in Python-space:
>>> import integrate
>>> class MyPolynomial(integrate.Function):
...     def evaluate(self, x):
...         return 2*x*x + 3*x - 10
...
>>> integrate.integrate(MyPolynomial(), 0, 1, 10000)
-7.8335833300000077
This is about 20 times slower, but still about 10 times faster than the original Python-only integration code. This shows how large the speed-ups can easily be when whole loops are moved from Python code into a Cython module.
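To experiment with the class layout before compiling anything, the same design can be sketched in plain Python. This is not the Cython module itself, only an untyped equivalent: it shows the subclass-overrides-evaluate pattern and the left Riemann sum, and it runs at ordinary Python speed.

```python
from math import sin


class Function:
    def evaluate(self, x):
        return 0.0


class SinOfSquareFunction(Function):
    def evaluate(self, x):
        return sin(x ** 2)


class MyPolynomial(Function):
    def evaluate(self, x):
        return 2 * x * x + 3 * x - 10


def integrate(f, a, b, N):
    """Left Riemann sum of f.evaluate over [a, b] using N steps."""
    if f is None:
        raise ValueError("f cannot be None")
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        s += f.evaluate(a + i * dx)
    return s * dx


# Prints a value close to -7.83358, matching the session shown above
print(integrate(MyPolynomial(), 0, 1, 10000))
```

Timing this sketch against the compiled module is a simple way to reproduce the kind of comparison quoted above, although the exact ratios depend on the machine and the Python version.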
Some notes on our new implementation of evaluate:
- The fast method dispatch here only works because evaluate was declared in Function. Had evaluate been introduced in SinOfSquareFunction, the code would still work, but Cython would have used the slower Python method dispatch mechanism instead.
- In the same way, had the argument f not been typed, but only been passed as a Python object, the slower Python dispatch would be used.
- Since the argument is typed, we need to check whether it is None. In Python, this would have resulted in an AttributeError when the evaluate method was looked up, but Cython would instead try to access the (incompatible) internal structure of None as if it were a Function, leading to a crash or data corruption.
There is a compiler directive nonecheck which turns on checks for this, at the cost of decreased speed. Here’s how compiler directives are used to dynamically switch on or off nonecheck:
#cython: nonecheck=True
#        ^^^ Turns on nonecheck globally

import cython

# Turn off nonecheck locally for the function
@cython.nonecheck(False)
def func():
    cdef MyClass obj = None
    try:
        # Turn nonecheck on again for a block
        with cython.nonecheck(True):
            print obj.myfunc() # Raises exception
    except AttributeError:
        pass
    print obj.myfunc() # Hope for a crash!
Attributes in cdef classes behave differently from attributes in regular classes:
- All attributes must be pre-declared at compile-time
- Attributes are by default only accessible from Cython (typed access)
- Properties can be declared to expose dynamic attributes to Python-space
cdef class WaveFunction(Function):
    # Not available in Python-space:
    cdef double offset
    # Available in Python-space:
    cdef public double freq
    # Available in Python-space:
    property period:
        def __get__(self):
            return 1.0 / self.freq
        def __set__(self, value):
            self.freq = 1.0 / value
    <...>
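The attribute rules above have a rough pure-Python analogue, sketched below. The class name PyWaveFunction is hypothetical, and __slots__ is only loosely comparable to the C struct of an extension type: it fixes the set of instance attributes up front, but it does not give typed C-level access. The period property plays the same role as the property block in the Cython listing.

```python
class PyWaveFunction:
    # __slots__ fixes the set of instance attributes in advance, loosely
    # similar to the pre-declared attributes of a cdef class.
    __slots__ = ("_offset", "freq")

    def __init__(self, freq=1.0, offset=0.0):
        self._offset = offset
        self.freq = freq

    @property
    def period(self):
        return 1.0 / self.freq

    @period.setter
    def period(self, value):
        self.freq = 1.0 / value


w = PyWaveFunction(freq=2.0)
print(w.period)   # 0.5
w.period = 4.0
print(w.freq)     # 0.25
```

Assigning to an attribute that is not listed in __slots__ raises an AttributeError at runtime, whereas Cython rejects access to undeclared attributes of a typed cdef class at compile time.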
pxd files¶
In addition to the .pyx source files, Cython uses .pxd files which work like C header files: they contain Cython declarations (and sometimes code sections) which are only meant for inclusion by Cython modules. A pxd file is imported into a pyx module by using the cimport keyword.
pxd files have many use-cases:
- They can be used for sharing external C declarations.
- They can contain functions which are well suited for inlining by the C compiler. Such functions should be marked inline, example:
  cdef inline int int_min(int a, int b):
      return b if b < a else a
- When accompanying an equally named pyx file, they provide a Cython interface to the Cython module so that other Cython modules can communicate with it using a more efficient protocol than the Python one.
In our integration example, we might break it up into pxd files like this:
- Add a cmath.pxd file which declares the C functions available from the C math.h header file, like sin. Then one would simply do from cmath cimport sin in integrate.pyx.
- Add an integrate.pxd so that other modules written in Cython can define fast custom functions to integrate.
  cdef class Function:
      cpdef evaluate(self, double x)
  cpdef integrate(Function f, double a, double b, int N)
Note that if you have a cdef class with attributes, the attributes must be declared in the class declaration pxd file (if you use one), not the pyx file. The compiler will tell you about this.
Caveats¶
Since Cython mixes C and Python semantics, some things may be a bit surprising or unintuitive. Work always goes on to make Cython more natural for Python users, so this list may change in the future.
- 10**-2 == 0, instead of 0.01 like in Python.
- Given two typed int variables a and b, a % b has the same sign as the second argument (following Python semantics) rather than having the same sign as the first (as in C). The C behavior can be obtained, at some speed gain, by enabling the cdivision directive. (Versions prior to Cython 0.12 always followed C semantics.)
- Care is needed with unsigned types. cdef unsigned n = 10; print(range(-n, n)) will print an empty list, since -n wraps around to a large positive integer prior to being passed to the range function.
- Python’s float type actually wraps C double values, and Python’s int type wraps C long values.
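The differences listed above can be reproduced in plain Python. The sketch below does not run any Cython code; it uses math.fmod and an explicit modulo by 2**32 to imitate the C results, and the 32-bit width of unsigned int is an assumption about the platform.

```python
import math

# Python semantics: the result of % takes the sign of the second operand.
print(-7 % 3)                   # 2
# C semantics (truncated division): the result takes the sign of the first.
print(int(math.fmod(-7, 3)))    # -1

# Python computes negative integer powers as floats ...
print(10 ** -2)                 # 0.01
# ... whereas truncating integer arithmetic, as in C, yields 0.
print(1 // 10 ** 2)             # 0

# Unsigned wrap-around, assuming a 32-bit C unsigned int.
n = 10
wrapped = (-n) % 2 ** 32
print(wrapped)                  # 4294967286
print(list(range(wrapped, n)))  # []
```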
Profiling¶
This part describes the profiling abilities of Cython. If you are familiar with profiling pure Python code, you only need to read the first section (Cython Profiling Basics). If you are not familiar with Python profiling, you should also read the tutorial (Profiling Tutorial), which takes you through a complete example step by step.
Cython Profiling Basics¶
Profiling in Cython is controlled by a compiler directive. It can be set either for an entire file or on a per-function basis via a Cython decorator.
Enable profiling for a complete source file¶
Profiling is enabled for a complete source file via a global directive to the Cython compiler at the top of a file:
# cython: profile=True
Note that profiling adds a slight overhead to each function call, therefore making your program a little slower (or a lot, if you call some small functions very often).
Once enabled, your Cython code will behave just like Python code when called from the cProfile module. This means you can just profile your Cython code together with your Python code using the same tools as for Python code alone.
Disabling profiling function wise¶
If your profile is distorted by the call overhead of some small functions that you would rather not see in your profile, either because you plan to inline them anyway or because you are sure that you can’t make them any faster, you can use a special decorator to disable profiling for one function only:
cimport cython

@cython.profile(False)
def my_often_called_function():
    pass
Profiling Tutorial¶
This will be a complete tutorial, start to finish, of profiling Python code, turning it into Cython code, and continuing to profile until it is fast enough.
As a toy example, we would like to evaluate the summation of the reciprocals of squares up to a certain integer n in order to approximate π. The relation we want to use was proven by Euler in 1735 and is known as the Basel problem:
$$\pi^2 = 6 \sum_{k=1}^{\infty} \frac{1}{k^2} \approx 6 \sum_{k=1}^{n} \frac{1}{k^2}$$
A simple Python script for evaluating the truncated sum looks like this:
#!/usr/bin/env python
# encoding: utf-8
# filename: calc_pi.py

def recip_square(i):
    return 1./i**2

def approx_pi(n=10000000):
    val = 0.
    for k in range(1,n+1):
        val += recip_square(k)
    return (6 * val)**.5
On my box, this needs approximately 4 seconds to run the function with the default n. The higher we choose n, the better the approximation of π will be. An experienced Python programmer will already see plenty of places to optimize this code. But remember the golden rule of optimization: Never optimize without having profiled. Let me repeat this: Never optimize without having profiled your code. Your thoughts about which part of your code takes too much time are wrong. At least, mine are always wrong. So let’s write a short script to profile our code:
#!/usr/bin/env python
# encoding: utf-8
# filename: profile.py
import pstats, cProfile
import calc_pi
cProfile.runctx("calc_pi.approx_pi()", globals(), locals(), "Profile.prof")
s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()
Running this on my box gives the following output:
Sat Nov 7 17:40:54 2009 Profile.prof
10000004 function calls in 6.211 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 3.243 3.243 6.211 6.211 calc_pi.py:7(approx_pi)
10000000 2.526 0.000 2.526 0.000 calc_pi.py:4(recip_square)
1 0.442 0.442 0.442 0.442 {range}
1 0.000 0.000 6.211 6.211 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
This shows that the code runs in 6.2 CPU seconds. Note that the code got slower by 2 seconds because it ran inside the cProfile module. The table contains the really valuable information. You might want to check the Python profiling documentation for the nitty-gritty details. The most important columns here are tottime (total time spent in this function, not counting functions that were called by this function) and cumtime (total time spent in this function, also counting the functions called by this function). Looking at the tottime column, we see that approximately half the time is spent in approx_pi and the other half is spent in recip_square. Also, half a second is spent in range ... of course we should have used xrange for such a big iteration. And in fact, just changing range to xrange makes the code run in 5.8 seconds.
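The per-function time columns can be inspected without writing a profile file to disk. The following sketch is a variation on the profiling script, not a replacement for it: it assumes Python 3, uses a much smaller n so that it finishes quickly, and collects the report in memory through io.StringIO. It defines its own copies of recip_square and approx_pi so that it runs on its own.

```python
import cProfile
import io
import pstats


def recip_square(i):
    return 1.0 / i ** 2


def approx_pi(n=10000):
    val = 0.0
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** 0.5


# Profile a single call and keep its return value
profiler = cProfile.Profile()
result = profiler.runcall(approx_pi, 10000)

# Render the statistics into a string, sorted by internal time (tottime)
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.strip_dirs().sort_stats("tottime").print_stats()
report = stream.getvalue()
print(report)
```

In the printed table, the tottime of approx_pi excludes the time spent inside recip_square, while its cumtime includes it.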
We could optimize a lot in the pure Python version, but since we are interested in Cython, let’s move forward and bring this module to Cython. We would do this anyway at some point to make the loop run faster. Here is our first Cython version:
# encoding: utf-8
# cython: profile=True
# filename: calc_pi.pyx

def recip_square(int i):
    return 1./i**2

def approx_pi(int n=10000000):
    cdef double val = 0.
    cdef int k
    for k in xrange(1,n+1):
        val += recip_square(k)
    return (6 * val)**.5
Note the second line: We have to tell Cython that profiling should be enabled. This makes the Cython code slightly slower, but without this we would not get meaningful output from the cProfile module. The rest of the code is mostly unchanged; I only typed some variables, which will likely speed things up a bit.
We also need to modify our profiling script to import the Cython module directly. Here is the complete version adding the import of the pyximport module:
#!/usr/bin/env python
# encoding: utf-8
# filename: profile.py
import pstats, cProfile
import pyximport
pyximport.install()
import calc_pi
cProfile.runctx("calc_pi.approx_pi()", globals(), locals(), "Profile.prof")
s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()
We only added two lines, the rest stays completely the same. Alternatively, we could also manually compile our code into an extension; we wouldn’t need to change the profile script then at all. The script now outputs the following:
Sat Nov 7 18:02:33 2009 Profile.prof
10000004 function calls in 4.406 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 3.305 3.305 4.406 4.406 calc_pi.pyx:7(approx_pi)
10000000 1.101 0.000 1.101 0.000 calc_pi.pyx:4(recip_square)
1 0.000 0.000 4.406 4.406 {calc_pi.approx_pi}
1 0.000 0.000 4.406 4.406 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
We gained 1.8 seconds. Not too shabby. Comparing the output to the previous one, we see that the recip_square function got faster while the approx_pi function has not changed a lot. Let’s concentrate on the recip_square function a bit more. First, note that this function is not meant to be called from code outside of our module, so it would be wise to turn it into a cdef function to reduce call overhead. We should also get rid of the power operator: it is turned into a pow(i,2) function call by Cython, but we could instead just write i*i, which could be faster. The whole function is also a good candidate for inlining. Let’s look at the necessary changes for these ideas:
# encoding: utf-8
# cython: profile=True
# filename: calc_pi.pyx

cdef inline double recip_square(int i):
    return 1./(i*i)

def approx_pi(int n=10000000):
    cdef double val = 0.
    cdef int k
    for k in xrange(1,n+1):
        val += recip_square(k)
    return (6 * val)**.5
Now running the profile script yields:
Sat Nov 7 18:10:11 2009 Profile.prof
10000004 function calls in 2.622 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.782 1.782 2.622 2.622 calc_pi.pyx:7(approx_pi)
10000000 0.840 0.000 0.840 0.000 calc_pi.pyx:4(recip_square)
1 0.000 0.000 2.622 2.622 {calc_pi.approx_pi}
1 0.000 0.000 2.622 2.622 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
That bought us another 1.8 seconds. Not the dramatic change we could have expected. And why is recip_square still in this table; it is supposed to be inlined, isn’t it? The reason for this is that Cython still generates profiling code even if the function call is eliminated. Let’s tell it to not profile recip_square any more; we couldn’t get the function to be much faster anyway:
# encoding: utf-8
# cython: profile=True
# filename: calc_pi.pyx

cimport cython

@cython.profile(False)
cdef inline double recip_square(int i):
    return 1./(i*i)

def approx_pi(int n=10000000):
    cdef double val = 0.
    cdef int k
    for k in xrange(1,n+1):
        val += recip_square(k)
    return (6 * val)**.5
Running this shows an interesting result:
Sat Nov 7 18:15:02 2009 Profile.prof
4 function calls in 0.089 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.089 0.089 0.089 0.089 calc_pi.pyx:10(approx_pi)
1 0.000 0.000 0.089 0.089 {calc_pi.approx_pi}
1 0.000 0.000 0.089 0.089 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
First note the tremendous speed gain: this version only takes 1/50 of the time of our first Cython version. Also note that recip_square has vanished from the table like we wanted. But the most peculiar and important change is that approx_pi also got much faster. This is a problem with all profiling: calling a function in a profile run adds a certain overhead to the function call. This overhead is not added to the time spent in the called function, but to the time spent in the calling function. In this example, approx_pi didn’t need 2.622 seconds in the last run; but it called recip_square 10000000 times, each time taking a little to set up profiling for it. This adds up to the massive time loss of around 2.6 seconds. Having disabled profiling for the often called function now reveals realistic timings for approx_pi; we could continue optimizing it now if needed.
This concludes this profiling tutorial. There is still some room for improvement in this code. We could try to replace the power operator in approx_pi with a call to sqrt from the C stdlib; but this is not necessarily faster than calling pow(x,0.5).
Even so, the result we achieved here is quite satisfactory: we came up with a solution that is much faster than our original Python version while retaining functionality and readability.
Using Cython with NumPy¶
Cython has support for fast access to NumPy arrays. To optimize code using such arrays one must cimport the NumPy pxd file (which ships with Cython), and declare any arrays as having the ndarray type. The data type and number of dimensions should be fixed at compile-time and passed. For instance:
import numpy as np
cimport numpy as np

def myfunc(np.ndarray[np.float64_t, ndim=2] A):
    <...>
myfunc can now only be passed two-dimensional arrays containing double precision floats, but array indexing operations are much, much faster, making it suitable for numerical loops. Expect speed increases well over 100 times over a pure Python loop; in some cases the speed increase can be as high as 700 times or more. [Seljebotn09] contains detailed examples and benchmarks.
Fast array declarations can currently only be used with function local variables and arguments to def-style functions (not with arguments to cpdef or cdef, and neither with fields in cdef classes or as global variables). These limitations are considered known defects and we hope to remove them eventually. In most circumstances it is possible to work around these limitations rather easily and without a significant speed penalty, as all NumPy arrays can also be passed as untyped objects.
Array indexing is only optimized if exactly as many indices are provided as the number of array dimensions. Furthermore, all indices must have a native integer type. Slices and NumPy “fancy indexing” are not optimized. Examples:
def myfunc(np.ndarray[np.float64_t, ndim=2] A):
    cdef Py_ssize_t i, j
    for i in range(A.shape[0]):
        print A[i, 0] # fast
        j = 2*i
        print A[i, j] # fast
        k = 2*i
        print A[i, k] # slow, k is not typed
        print A[i][j] # slow
        print A[i,:] # slow
Py_ssize_t is a signed integer type provided by Python which covers the same range of values as is supported as NumPy array indices. It is the preferred type to use for loops over arrays.
Any Cython primitive type (float, complex float and integer types) can be passed as the array data type. For each valid dtype in the numpy module (such as np.uint8, np.complex128) there is a corresponding Cython compile-time definition in the cimport-ed NumPy pxd file with a _t suffix [1]. Cython structs are also allowed and correspond to NumPy record arrays. Examples:
cdef packed struct Point:
    np.float64_t x, y

def f():
    cdef np.ndarray[np.complex128_t, ndim=3] a = \
        np.zeros((3,3,3), dtype=np.complex128)
    cdef np.ndarray[Point] b = np.zeros(10,
        dtype=np.dtype([('x', np.float64),
                        ('y', np.float64)]))
    <...>
Note that ndim defaults to 1. Also note that NumPy record arrays are by default unaligned, meaning data is packed as tightly as possible without considering the alignment preferences of the CPU. Such unaligned record arrays correspond to a Cython packed struct. If one uses an aligned dtype, by passing align=True to the dtype constructor, one must drop the packed keyword on the struct definition.
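The difference between a packed and an aligned layout can be illustrated with Python’s standard struct module, used here as a stand-in so that the sketch runs without NumPy. The field combination (one char followed by one double) is an arbitrary choice for illustration; the record layout in the text above consists of two doubles, for which both layouts happen to have the same size.

```python
import struct

# "=" selects standard sizes with no padding, comparable to a packed struct.
packed_size = struct.calcsize("=bd")

# "@" selects native sizes and alignment, comparable to a regular C struct.
aligned_size = struct.calcsize("@bd")

print(packed_size)    # 9: one char byte followed directly by an 8-byte double
print(aligned_size)   # platform dependent, commonly 16
```

The same padding is what distinguishes a NumPy dtype created with align=True from the default packed one, which is why the Cython struct declaration has to match the dtype that was used.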
Some data types are not yet supported, like boolean arrays and string arrays. Also, data types describing data which is not in the native byte order will likely never be supported. It is, however, possible to access such arrays on a lower level by casting the arrays:
cdef np.ndarray[np.uint8_t, cast=True] boolarr = (x < y)
cdef np.ndarray[np.uint32_t, cast=True] values = \
    np.arange(10, dtype='>i4')
Assuming one is on a little-endian system, the values array can still access the raw bit content of the array (which must then be reinterpreted to yield valid results on a little-endian system).
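What “reinterpreting” means here can be shown with the standard struct module. The sketch below does not involve NumPy or Cython; it only demonstrates what happens when big-endian bytes are read with little-endian order, and how a byte swap recovers the intended value.

```python
import struct

# The value 1 stored as a big-endian 32-bit unsigned integer ('>I').
raw = struct.pack(">I", 1)

# Reading the same four bytes with little-endian order gives a wrong value.
misread = struct.unpack("<I", raw)[0]
print(misread)    # 16777216

# Swapping the byte order recovers the intended value.
fixed = int.from_bytes(misread.to_bytes(4, "little"), "big")
print(fixed)      # 1
```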
Finally, note that typed NumPy array variables in some respects behave a little differently from untyped arrays. arr.shape is no longer a tuple. arr.shape[0] is valid but to e.g. print the shape one must do print (<object>arr).shape in order to “untype” the variable first. The same is true for arr.data (which in typed mode is a C data pointer).
There are many more options for optimizations to consider for Cython and NumPy arrays. We again refer the interested reader to [Seljebotn09].
[1] | In Cython 0.11.2, np.complex64_t and np.complex128_t do not work and one must write complex or double complex instead. This is fixed in 0.11.3. Cython 0.11.1 and earlier do not support complex numbers. |
[Seljebotn09] | (1, 2) D. S. Seljebotn, Fast numerical computations with Cython, Proceedings of the 8th Python in Science Conference, 2009. |
Unicode and passing strings¶
Similar to the string semantics in Python 3, Cython also strictly separates byte strings and unicode strings. Above all, this means that there is no automatic conversion between byte strings and unicode strings (except for what Python 2 does in string operations). All encoding and decoding must pass through an explicit encoding/decoding step.
It is, however, very easy to pass byte strings between C code and Python. When receiving a byte string from a C library, you can let Cython convert it into a Python byte string by simply assigning it to a Python variable:
cdef char* c_string = c_call_returning_a_c_string()
cdef bytes py_string = c_string
This creates a Python byte string object that holds a copy of the original C string. It can be safely passed around in Python code, and will be garbage collected when the last reference to it goes out of scope. It is important to remember that null bytes in the string act as terminator character, as generally known from C. The above will therefore only work correctly for C strings that do not contain null bytes.
Note that the creation of the Python bytes string can fail with an exception, e.g. due to insufficient memory. If you need to free() the string after the conversion, you should wrap the assignment in a try-finally construct:
cimport stdlib
cdef bytes py_string
cdef char* c_string = c_call_returning_a_c_string()
try:
    py_string = c_string
finally:
    stdlib.free(c_string)
To convert the byte string back into a C char*, use the opposite assignment:
cdef char* other_c_string = py_string
This is a very fast operation after which other_c_string points to the byte string buffer of the Python string itself. It is tied to the life time of the Python string. When the Python string is garbage collected, the pointer becomes invalid. It is therefore important to keep a reference to the Python string as long as the char* is in use. Often enough, this only spans the call to a C function that receives the pointer as parameter. Special care must be taken, however, when the C function stores the pointer for later use. Apart from keeping a Python reference to the string, no manual memory management is required.
Decoding bytes to text¶
The initially presented way of passing and receiving C strings is sufficient if your code only deals with binary data in the strings. When we deal with encoded text, however, it is best practice to decode the C byte strings to Python Unicode strings on reception, and to encode Python Unicode strings to C byte strings on the way out.
With a Python byte string object, you would normally just call the .decode() method to decode it into a Unicode string:
ustring = byte_string.decode('UTF-8')
Cython allows you to do the same for a C string, as long as it contains no null bytes:
cdef char* some_c_string = c_call_returning_a_c_string()
ustring = some_c_string.decode('UTF-8')
However, this will not work for strings that contain null bytes, and it is very inefficient for long strings, since Cython has to call strlen() on the C string first to find out the length by counting the bytes up to the terminating null byte. In many cases, the user code will know the length already, e.g. because a C function returned it. In this case, it is much more efficient to tell Cython the exact number of bytes by slicing the C string:
cdef char* c_string = NULL
cdef Py_ssize_t length = 0
# get pointer and length from a C function
get_a_c_string(&c_string, &length)
ustring = c_string[:length].decode('UTF-8')
The same can be used when the string contains null bytes, e.g. when it uses an encoding like UCS-4, where each character is encoded in four bytes.
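The effect of the null terminator, and of passing an explicit length, can be observed from plain Python with the standard ctypes module. This is only an illustration of the C string semantics described above, not of the code that Cython generates; the buffer contents are made up for the example.

```python
import ctypes

# A C buffer holding "abc", a null byte, "def" and the terminating null.
buf = ctypes.create_string_buffer(b"abc\x00def")

# Without a length, the data is read up to the first null byte (like strlen).
print(ctypes.string_at(buf))        # b'abc'

# With an explicit length, all seven bytes are copied.
print(ctypes.string_at(buf, 7))     # b'abc\x00def'

# UTF-32 encoded text is full of null bytes, so the length is required.
encoded = "ab".encode("utf-32-le")
ubuf = ctypes.create_string_buffer(encoded)
text = ctypes.string_at(ubuf, len(encoded)).decode("utf-32-le")
print(text)                         # ab
```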
It is common practice to wrap string conversions (and non-trivial type conversions in general) in dedicated functions, as this needs to be done in exactly the same way whenever receiving text from C. This could look as follows:
cimport python_unicode
cimport stdlib

cdef unicode tounicode(char* s):
    return s.decode('UTF-8', 'strict')

cdef unicode tounicode_with_length(
        char* s, size_t length):
    return s[:length].decode('UTF-8', 'strict')

cdef unicode tounicode_with_length_and_free(
        char* s, size_t length):
    try:
        return s[:length].decode('UTF-8', 'strict')
    finally:
        stdlib.free(s)
Most likely, you will prefer shorter function names in your code based on the kind of string being handled. Different types of content often imply different ways of handling them on reception. To make the code more readable and to anticipate future changes, it is good practice to use separate conversion functions for different types of strings.
Encoding text to bytes¶
The reverse way, converting a Python unicode string to a C char*, is pretty efficient by itself, assuming that what you actually want is a memory managed byte string:
py_byte_string = py_unicode_string.encode('UTF-8')
cdef char* c_string = py_byte_string
As noted before, this takes the pointer to the byte buffer of the Python byte string. Trying to do the same without keeping a reference to the Python byte string will fail with a compile error:
# this will not compile !
cdef char* c_string = py_unicode_string.encode('UTF-8')
Here, the Cython compiler notices that the code takes a pointer to a temporary string result that will be garbage collected after the assignment. Later access to the invalidated pointer will read invalid memory and likely result in a segfault. Cython will therefore refuse to compile this code.
Source code encoding¶
When string literals appear in the code, the source code encoding is important. It determines the byte sequence that Cython will store in the C code for bytes literals, and the Unicode code points that Cython builds for unicode literals when parsing the byte encoded source file. Following PEP 263, Cython supports the explicit declaration of source file encodings. For example, putting the following comment at the top of an ISO-8859-15 (Latin-9) encoded source file (into the first or second line) is required to enable ISO-8859-15 decoding in the parser:
# -*- coding: ISO-8859-15 -*-
When no explicit encoding declaration is provided, the source code is parsed as UTF-8 encoded text, as specified by PEP 3120. UTF-8 is a very common encoding that can represent the entire Unicode set of characters and is compatible with plain ASCII encoded text that it encodes efficiently. This makes it a very good choice for source code files which usually consist mostly of ASCII characters.
As an example, putting the following line into a UTF-8 encoded source file will print 5, as UTF-8 encodes the letter 'ö' in the two byte sequence '\xc3\xb6':
print( len(b'abcö') )
whereas the following ISO-8859-15 encoded source file will print 4, as the encoding uses only 1 byte for this letter:
# -*- coding: ISO-8859-15 -*-
print( len(b'abcö') )
Note that the unicode literal u'abcö' is a correctly decoded four character Unicode string in both cases, whereas the unprefixed Python str literal 'abcö' will become a byte string in Python 2 (thus having length 4 or 5 in the examples above), and a 4 character Unicode string in Python 3. If you are not familiar with encodings, this may not appear obvious at first read. See CEP 108 for details.
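The byte counts quoted above can be checked in plain Python 3 by encoding the text explicitly, which takes the source file encoding out of the picture. The letter is written as the escape \u00f6 so that the sketch itself does not depend on how the file is saved.

```python
text = "abc\u00f6"    # the four characters a, b, c and o-umlaut

utf8_bytes = text.encode("utf-8")
latin9_bytes = text.encode("iso-8859-15")

print(len(text))          # 4
print(len(utf8_bytes))    # 5
print(len(latin9_bytes))  # 4
print(utf8_bytes[3:])     # b'\xc3\xb6'
```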
As a rule of thumb, it is best to avoid unprefixed non-ASCII str literals and to use unicode string literals for all text. Cython also supports the __future__ import unicode_literals that instructs the parser to read all unprefixed str literals in a source file as unicode string literals, just like Python 3.
Single bytes and characters¶
The Python C-API uses the normal C char type to represent a byte value, but it has two special integer types for a Unicode code point value, i.e. a single Unicode character: Py_UNICODE and Py_UCS4. Cython has supported the first natively since version 0.13; support for Py_UCS4 is new in Cython 0.15. Py_UNICODE is either defined as an unsigned 2-byte or 4-byte integer, or as wchar_t, depending on the platform. The exact type is a compile time option in the build of the CPython interpreter, and extension modules inherit this definition at C compile time. The advantage of Py_UCS4 is that it is guaranteed to be large enough for any Unicode code point value, regardless of the platform. It is defined as a 32bit unsigned int or long.
In Cython, the char type behaves differently from the Py_UNICODE and Py_UCS4 types when coercing to Python objects. Similar to the behaviour of the bytes type in Python 3, the char type coerces to a Python integer value by default, so that the following prints 65 and not A:
# -*- coding: ASCII -*-
cdef char char_val = 'A'
assert char_val == 65 # ASCII encoded byte value of 'A'
print( char_val )
If you want a Python bytes string instead, you have to request it explicitly, and the following will print A (or b'A' in Python 3):
print( <bytes>char_val )
The explicit coercion works for any C integer type. Values outside of the range of a char or unsigned char will raise an OverflowError at runtime. Coercion will also happen automatically when assigning to a typed variable, e.g.:
cdef bytes py_byte_string
py_byte_string = char_val
On the other hand, the Py_UNICODE and Py_UCS4 types are rarely used outside of the context of a Python unicode string, so their default behaviour is to coerce to a Python unicode object. The following will therefore print the character A, as would the same code with the Py_UNICODE type:
cdef Py_UCS4 uchar_val = u'A'
assert uchar_val == 65 # character point value of u'A'
print( uchar_val )
Again, explicit casting will allow users to override this behaviour. The following will print 65:
cdef Py_UCS4 uchar_val = u'A'
print( <long>uchar_val )
Note that casting to a C long (or unsigned long) will work just fine, as the maximum code point value that a Unicode character can have is 1114111 (0x10FFFF). On platforms with 32bit or more, int is just as good.
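Python 3 shows the same split between byte values and characters, which makes it a convenient way to check the numbers mentioned above. The sketch is plain Python and only mirrors the coercion behaviour by analogy; it does not use the Py_UNICODE or Py_UCS4 types.

```python
import sys

data = b"A"
print(data[0])              # 65: indexing a bytes object yields an integer
print(bytes([65]))          # b'A': explicit conversion back to a byte string
print(ord("A"))             # 65: code point value of the character
print(chr(65))              # A: character for the code point value
print(sys.maxunicode)       # 1114111 on Python 3.3 and later
print(hex(sys.maxunicode))  # 0x10ffff
```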
Narrow Unicode builds¶
In narrow Unicode builds of CPython, i.e. builds where sys.maxunicode is 65535 (such as all Windows builds, as opposed to 1114111 in wide builds), it is still possible to use Unicode character code points that do not fit into the 16 bit wide Py_UNICODE type. For example, such a CPython build will accept the unicode literal u'\U00012345'. However, the underlying system level encoding leaks into Python space in this case, so that the length of this literal becomes 2 instead of 1. This also shows when iterating over it or when indexing into it. The visible substrings are u'\uD808' and u'\uDF45' in this example. They form a so-called surrogate pair that represents the above character.
For more information on this topic, it is worth reading the Wikipedia article about the UTF-16 encoding.
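The surrogate pair quoted above follows from the standard UTF-16 arithmetic. As a minimal sketch in Python 3 (the helper name to_surrogate_pair is hypothetical and not part of Cython or CPython), the two code units can be computed and cross-checked against Python's own UTF-16 encoder:

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x12345)
print(hex(high), hex(low))                 # 0xd808 0xdf45

# Cross-check against the built-in encoder (big-endian, no byte order mark).
encoded = "\U00012345".encode("utf-16-be")
expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
print(encoded == expected)                 # True
```

The high and low surrogates come from disjoint ranges (0xD800-0xDBFF and 0xDC00-0xDFFF), which is what makes a pair recognisable inside a longer sequence of code units.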
The same properties apply to Cython code that gets compiled for a narrow CPython runtime environment. In most cases, e.g. when searching for a substring, this difference can be ignored as both the text and the substring will contain the surrogates. So most Unicode processing code will work correctly also on narrow builds. Encoding, decoding and printing will work as expected, so that the above literal turns into exactly the same byte sequence on both narrow and wide Unicode platforms.
However, programmers should be aware that a single Py_UNICODE value (or single ‘character’ unicode string in CPython) may not be enough to represent a complete Unicode character on narrow platforms. For example, if an independent search for u'\uD808' and u'\uDF45' in a unicode string succeeds, this does not necessarily mean that the character u'\U00012345' is part of that string. It may well be that two different characters are in the string that just happen to share a code unit with the surrogate pair of the character in question. Looking for substrings works correctly because the two code units in the surrogate pair use distinct value ranges, so the pair is always identifiable in a sequence of code units.
As of version 0.15, Cython has extended support for surrogate pairs so that you can safely use an in test to search character values from the full Py_UCS4 range even on narrow platforms:
cdef Py_UCS4 uchar = 0x12345
print( uchar in some_unicode_string )
Similarly, it can coerce a one character string with a high Unicode code point value to a Py_UCS4 value on both narrow and wide Unicode platforms:
cdef Py_UCS4 uchar = u'\U00012345'
assert uchar == 0x12345
Iteration¶
Cython 0.13 supports efficient iteration over char*, bytes and unicode strings, as long as the loop variable is appropriately typed. So the following will generate the expected C code:
cdef char* c_string = ...
cdef char c
for c in c_string[:100]:
    if c == 'A': ...
The same applies to bytes objects:
cdef bytes bytes_string = ...
cdef char c
for c in bytes_string:
    if c == 'A': ...
For unicode objects, Cython will automatically infer the type of the loop variable as Py_UCS4:
cdef unicode ustring = ...
# NOTE: no typing required for 'uchar' !
for uchar in ustring:
    if uchar == u'A': ...
The automatic type inference usually leads to much more efficient code here. However, note that some unicode operations still require the value to be a Python object, so Cython may end up generating redundant conversion code for the loop variable value inside of the loop. If this leads to a performance degradation for a specific piece of code, you can either type the loop variable as a Python object explicitly, or assign its value to a Python typed variable somewhere inside of the loop to enforce one-time coercion before running Python operations on it.
There are also optimisations for in tests, so that the following code will run as plain C code (actually using a switch statement):
cdef Py_UCS4 uchar_val = get_a_unicode_character()
if uchar_val in u'abcABCxY':
    ...
Combined with the looping optimisation above, this can result in very efficient character switching code, e.g. in unicode parsers.
Pure Python Mode¶
Cython provides language constructs to let the same file be either interpreted or compiled. This is accomplished through special functions and decorators provided by the same “magic” module cython that directives use, which must be imported, and through an (optional) augmenting .pxd file. These constructs are available in both .py and .pyx files.
Magic Attributes¶
The currently supported attributes of the cython module are:
declare
declares a typed variable in the current scope, which can be used in place of the cdef type var [= value] construct. This has two forms, the first as an assignment (useful as it creates a declaration in interpreted mode as well):
x = cython.declare(cython.int)              # cdef int x
y = cython.declare(cython.double, 0.57721)  # cdef double y = 0.57721
and the second mode as a simple function call:
cython.declare(x=cython.int, y=cython.double)  # cdef int x; cdef double y
locals
is a decorator that is used to specify the types of local variables in the function body (including any or all of the argument types):
@cython.locals(a=cython.double, b=cython.double, n=cython.p_double)
def foo(a, b, x, y):
    ...
address
is used in place of the & operator:
cython.declare(x=cython.int, x_ptr=cython.p_int)
x_ptr = cython.address(x)
sizeof
emulates the sizeof operator. It can take both types and expressions:
cython.declare(n=cython.longlong)
print cython.sizeof(cython.longlong), cython.sizeof(n)
struct
can be used to create struct types:
MyStruct = cython.struct(x=cython.int, y=cython.int, data=cython.double)
a = cython.declare(MyStruct)
is equivalent to the code:
cdef struct MyStruct:
    int x
    int y
    double data

cdef MyStruct a
union
creates union types with exactly the same syntax as struct
typedef
creates a new type:
T = cython.typedef(cython.p_int)   # ctypedef int* T
compiled
is a special variable which is set to True when the compiler runs, and False in the interpreter. Thus the code:
if cython.compiled:
    print "Yep, I'm compiled."
else:
    print "Just a lowly interpreted script."
will behave differently depending on whether or not the code is loaded as a compiled .so file or a plain .py file.
Augmenting .pxd¶
If a .pxd file is found with the same name as a .py file, it will be searched for cdef classes and cdef/cpdef functions and methods. The compiler will then convert the corresponding classes/functions/methods in the .py file to be of the correct type. Thus if one had a.pxd:
cdef class A:
    cpdef foo(self, int i)
the file a.py:
class A:
    def foo(self, i):
        print "Big" if i > 1000 else "Small"
would be interpreted as:
cdef class A:
    cpdef foo(self, int i):
        print "Big" if i > 1000 else "Small"
The special cython module can also be imported and used within the augmenting .pxd file. This makes it possible to add types to a pure Python file without changing the file itself. For example, the following Python file dostuff.py:
def dostuff(n):
    t = 0
    for i in range(n):
        t += i
    return t
could be augmented with the following .pxd file dostuff.pxd:
import cython

@cython.locals(t = cython.int, i = cython.int)
cpdef int dostuff(int n)
Besides the cython.locals decorator, the cython.declare() function can also be used to add types to global variables in the augmenting .pxd file.
Note that normal Python (def) functions cannot be declared in .pxd files, so it is currently impossible to override the types of Python functions in .pxd files if they use *args or **kwargs in their signature, for instance.
Types¶
There are numerous types built in to the cython module. It provides all the standard C types, namely char, short, int, long, longlong as well as their unsigned versions uchar, ushort, uint, ulong, ulonglong. It also provides bint and Py_ssize_t. For each type, there are pointer types p_int, pp_int, . . ., up to three levels deep in interpreted mode, and arbitrarily deep in compiled mode. The Python types int, long and bool are interpreted as C int, long and bint respectively. Also, the Python types list, dict, tuple, . . . may be used, as well as any user defined types.
Pointer types may be constructed with cython.pointer(cython.int), and arrays as cython.int[10]. A limited attempt is made to emulate these more complex types, but only so much can be done from the Python language.
Decorators (not yet implemented)¶
We have settled on @cython.cclass for the cdef class decorators, and @cython.cfunc and @cython.ccall for cdef and cpdef functions (respectively).
http://codespeak.net/pipermail/cython-dev/2008-November/002925.html
Further reading¶
The main documentation is located at http://docs.cython.org/. Some recent features might not have documentation written yet; in such cases, some notes can usually be found in the form of a Cython Enhancement Proposal (CEP) on http://wiki.cython.org/enhancements.
[Seljebotn09] contains more information about Cython and NumPy arrays. If you intend to use Cython code in a multi-threaded setting, it is essential to read up on Cython’s features for managing the Global Interpreter Lock (the GIL). The same paper contains an explanation of the GIL, and the main documentation explains the Cython features for managing it.
Finally, don’t hesitate to ask questions (or post reports on successes!) on the Cython users mailing list [UserList]. The Cython developer mailing list, [DevList], is also open to everybody. Feel free to use it to report a bug, ask for guidance, if you have time to spare to develop Cython, or if you have suggestions for future development.
[DevList] | Cython developer mailing list: http://codespeak.net/mailman/listinfo/cython-dev. |
[Seljebotn09] | D. S. Seljebotn, Fast numerical computations with Cython, Proceedings of the 8th Python in Science Conference, 2009. |
[UserList] | Cython users mailing list: http://groups.google.com/group/cython-users |
Appendix: Installing MinGW on Windows¶
Download the MinGW installer from http://www.mingw.org/wiki/HOWTO_Install_the_MinGW_GCC_Compiler_Suite. (As of this writing, the download link is a bit difficult to find; it’s under “About” in the menu on the left-hand side). You want the file entitled “Automated MinGW Installer” (currently version 5.1.4).
Run it and install MinGW. Only the basic package is strictly needed for Cython, although you might want to grab at least the C++ compiler as well.
You need to set up Windows’ “PATH” environment variable so that it includes e.g. “c:\mingw\bin” (if you installed MinGW to “c:\mingw”). The following web-page describes the procedure in Windows XP (the Vista procedure is similar): http://support.microsoft.com/kb/310519
Finally, tell Python to use MinGW as the default compiler (otherwise it will try for Visual C). If Python is installed to “c:\Python26”, create a file named “c:\Python26\Lib\distutils\distutils.cfg” containing:
[build]
compiler = mingw32
The [WinInst] wiki page contains updated information about this procedure. Any contributions towards making the Windows install process smoother are welcome; it is an unfortunate fact that none of the regular Cython developers have convenient access to Windows.
[WinInst] | http://wiki.cython.org/InstallingOnWindows |
Cython Users Guide¶
Overview¶
About Cython¶
Cython is a language that makes writing C extensions for the Python language as easy as Python itself. Cython is based on the well-known Pyrex language by Greg Ewing, but supports more cutting edge functionality and optimizations [1]. The Cython language is very close to the Python language, but Cython additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code.
This makes Cython the ideal language for wrapping external C libraries, and for fast C modules that speed up the execution of Python code.
Future Plans¶
Cython is not finished. Substantial tasks remain. See Limitations for a current list.
Footnotes
[1] | For differences with Pyrex see Differences between Cython and Pyrex. |
Tutorial¶
The Basics of Cython¶
The fundamental nature of Cython can be summed up as follows: Cython is Python with C data types.
Cython is Python: Almost any piece of Python code is also valid Cython code. (There are a few Limitations, but this approximation will serve for now.) The Cython compiler will convert it into C code which makes equivalent calls to the Python/C API.
But Cython is much more than that, because parameters and variables can be declared to have C data types. Code which manipulates Python values and C values can be freely intermixed, with conversions occurring automatically wherever possible. Reference count maintenance and error checking of Python operations is also automatic, and the full power of Python’s exception handling facilities, including the try-except and try-finally statements, is available to you – even in the midst of manipulating C data.
Cython Hello World¶
As Cython can accept almost any valid Python source file, one of the hardest things in getting started is just figuring out how to compile your extension.
So let’s start with the canonical Python hello world:
print "Hello World"
So the first thing to do is rename the file to helloworld.pyx. Now we need to make the setup.py, which is like a Python Makefile (for more information see Source Files and Compilation). Your setup.py should look like:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("helloworld", ["helloworld.pyx"])]
)
To build your Cython file with this script, run:
$ python setup.py build_ext --inplace
This will leave a file in your local directory called helloworld.so on Unix or helloworld.pyd on Windows. To use this file, start the Python interpreter and simply import it as if it were a regular Python module:
>>> import helloworld
Hello World
Congratulations! You now know how to build a Cython extension. But so far this example doesn’t really give a feeling for why one would ever want to use Cython, so let’s create a more realistic example.
pyximport: Cython Compilation the Easy Way¶
If your module doesn’t require any extra C libraries or a special build setup, then you can use the pyximport module by Paul Prescod and Stefan Behnel to load .pyx files directly on import, without having to write a setup.py file. It is shipped and installed with Cython and can be used like this:
>>> import pyximport; pyximport.install()
>>> import helloworld
Hello World
Since Cython 0.11, the pyximport module also has experimental compilation support for normal Python modules. This allows you to automatically run Cython on every .pyx and .py module that Python imports, including the standard library and installed packages. Cython will still fail to compile a lot of Python modules, in which case the import mechanism will fall back to loading the Python source modules instead. The .py import mechanism is installed like this:
>>> pyximport.install(pyimport = True)
Fibonacci Fun¶
From the official Python tutorial a simple fibonacci function is defined as:
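A minimal sketch of that tutorial function follows. It is written in Python 3 syntax, which is an assumption made here so that it runs on a current interpreter; documentation of this vintage would have used the Python 2 print statement instead:

```python
def fib(n):
    """Print the Fibonacci series up to n."""
    a, b = 0, 1
    while b < n:
        print(b, end=" ")
        a, b = b, a + b
    print()


fib(2000)
```

The function prints its results rather than returning them, which is why the interactive session further down shows output directly after the call.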
Following the steps for the Hello World example, we first rename the file to have a .pyx extension, say fib.pyx, and then create the setup.py file. Starting from the file created for the Hello World example, all that you need to change is the name of the Cython source file and the resulting module name. Doing this we have:
Build the extension with the same command used for the helloworld.pyx:
$ python setup.py build_ext --inplace
And use the new extension with:
>>> import fib
>>> fib.fib(2000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
Primes¶
Here’s a small example showing some of what can be done. It’s a routine for finding prime numbers. You tell it how many primes you want, and it returns them as a Python list.
primes.pyx
:
You’ll see that it starts out just like a normal Python function definition, except that the parameter kmax is declared to be of type int. This means that the object passed will be converted to a C integer (or a TypeError will be raised if it can’t be).
Lines 2 and 3 use the cdef statement to define some local C variables. Line 4 creates a Python list which will be used to return the result. You’ll notice that this is done exactly the same way it would be in Python. Because the variable result hasn’t been given a type, it is assumed to hold a Python object.
Lines 7-9 set up for a loop which will test candidate numbers for primeness until the required number of primes has been found. Lines 11-12, which try dividing a candidate by all the primes found so far, are of particular interest. Because no Python objects are referred to, the loop is translated entirely into C code, and thus runs very fast.
When a prime is found, lines 14-15 add it to the p array for fast access by the testing loop, and line 16 adds it to the result list. Again, you’ll notice that line 16 looks very much like a Python statement, and in fact it is, with the twist that the C parameter n is automatically converted to a Python object before being passed to the append method. Finally, at line 18, a normal Python return statement returns the result list.
Compiling primes.pyx with the Cython compiler produces an extension module which we can try out in the interactive interpreter as follows:
>>> import primes
>>> primes.primes(10)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
See, it works! And if you’re curious about how much work Cython has saved you, take a look at the C code generated for this module.
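The algorithm described in this section can be sketched in plain Python. This is only an illustration of the logic under the assumption that the routine works by trial division against the primes found so far; it carries no C type declarations, so it shows what is computed rather than how fast, and its line numbers do not correspond to the ones quoted in the text:

```python
def primes(kmax):
    """Return the first kmax prime numbers as a list."""
    p = []         # stands in for the C array of primes found so far
    result = []    # the Python list that is returned
    n = 2          # current candidate
    while len(result) < kmax:
        i = 0
        # Try dividing the candidate by every prime found so far.
        while i < len(p) and n % p[i] != 0:
            i += 1
        if i == len(p):
            # No divisor was found, so the candidate is prime.
            p.append(n)
            result.append(n)
        n += 1
    return result


print(primes(10))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

In the Cython version the inner loop touches only C integers and a C array, which is what lets it be translated entirely into C.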
Language Details¶
For more about the Cython language, see Language Basics. To dive right in to using Cython in a numerical computation context, see Cython for NumPy users.
Language Basics¶
C variable and type definitions¶
The cdef statement is used to declare C variables, either local or module-level:
cdef int i, j, k
cdef float f, g[42], *h
and C struct, union or enum types:
cdef struct Grail:
    int age
    float volume

cdef union Food:
    char *spam
    float *eggs

cdef enum CheeseType:
    cheddar, edam,
    camembert

cdef enum CheeseState:
    hard = 1
    soft = 2
    runny = 3
There is currently no special syntax for defining a constant, but you can use an anonymous enum declaration for this purpose, for example:
cdef enum:
    tons_of_spam = 3
Note
The words struct, union and enum are used only when defining a type, not when referring to it. For example, to declare a variable pointing to a Grail you would write:
cdef Grail *gp
and not:
cdef struct Grail *gp # WRONG
There is also a ctypedef statement for giving names to types, e.g.:
ctypedef unsigned long ULong
ctypedef int *IntPtr
Grouping multiple C declarations¶
If you have a series of declarations that all begin with cdef, you can group them into a cdef block like this:
cdef:
    struct Spam:
        int tons

    int i
    float f
    Spam *p

    void f(Spam *s):
        print s.tons, "Tons of spam"
Python functions vs. C functions¶
There are two kinds of function definition in Cython:
Python functions are defined using the def statement, as in Python. They take Python objects as parameters and return Python objects.
C functions are defined using the new cdef statement. They take either Python objects or C values as parameters, and can return either Python objects or C values.
Within a Cython module, Python functions and C functions can call each other freely, but only Python functions can be called from outside the module by interpreted Python code. So, any functions that you want to “export” from your Cython module must be declared as Python functions using def.
There is also a hybrid function, called cpdef. A cpdef function can be called from anywhere, but uses the faster C calling conventions when being called from other Cython code.
Parameters of either type of function can be declared to have C data types, using normal C declaration syntax. For example:
def spam(int i, char *s):
    ...

cdef int eggs(unsigned long l, float f):
    ...
When a parameter of a Python function is declared to have a C data type, it is passed in as a Python object and automatically converted to a C value, if possible. Automatic conversion is currently only possible for numeric types and string types; attempting to use any other type for the parameter of a Python function will result in a compile-time error.
C functions, on the other hand, can have parameters of any type, since they’re passed in directly using a normal C function call.
A more complete comparison of the pros and cons of these different method types can be found at Early Binding for Speed.
Python objects as parameters and return values¶
If no type is specified for a parameter or return value, it is assumed to be a Python object. (Note that this is different from the C convention, where it would default to int.) For example, the following defines a C function that takes two Python objects as parameters and returns a Python object:
cdef spamobjs(x, y):
    ...
Reference counting for these objects is performed automatically according to the standard Python/C API rules (i.e. borrowed references are taken as parameters and a new reference is returned).
The name object can also be used to explicitly declare something as a Python object. This can be useful if the name being declared would otherwise be taken as the name of a type, for example:
cdef ftang(object int):
    ...
declares a parameter called int which is a Python object. You can also use object as the explicit return type of a function, e.g.:
cdef object ftang(object int):
    ...
In the interests of clarity, it is probably a good idea to always be explicit about object parameters in C functions.
Error return values¶
If you don’t do anything special, a function declared with cdef that does not return a Python object has no way of reporting Python exceptions to its caller. If an exception is detected in such a function, a warning message is printed and the exception is ignored.
If you want a C function that does not return a Python object to be able to propagate exceptions to its caller, you need to declare an exception value for it. Here is an example:
cdef int spam() except -1:
    ...
With this declaration, whenever an exception occurs inside spam, it will immediately return with the value -1. Furthermore, whenever a call to spam returns -1, an exception will be assumed to have occurred and will be propagated.
When you declare an exception value for a function, you should never explicitly return that value. If all possible return values are legal and you can’t reserve one entirely for signalling errors, you can use an alternative form of exception value declaration:
cdef int spam() except? -1:
    ...
The “?” indicates that the value -1 only indicates a possible error. In this case, Cython generates a call to PyErr_Occurred() if the exception value is returned, to make sure it really is an error.
There is also a third form of exception value declaration:
cdef int spam() except *:
    ...
This form causes Cython to generate a call to PyErr_Occurred() after every call to spam, regardless of what value it returns. If you have a function returning void that needs to propagate errors, you will have to use this form, since there isn’t any return value to test. Otherwise there is little use for this form.
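The three forms differ only in when the caller consults the error indicator. The following plain-Python model is merely an illustration of that decision logic as described above; the function caller_sees_error and its parameters are hypothetical and are not part of Cython or the Python/C API:

```python
def caller_sees_error(form, result, sentinel, error_indicator_set):
    """Model whether the caller treats a returned C value as an error.

    form                -- "except value", "except? value" or "except *"
    result              -- the value the function returned
    sentinel            -- the declared exception value (unused for "except *")
    error_indicator_set -- what a PyErr_Occurred() check would report
    """
    if form == "except value":
        # The sentinel alone is taken as proof of an error.
        return result == sentinel
    if form == "except? value":
        # The sentinel only triggers a check of the error indicator.
        return result == sentinel and error_indicator_set
    if form == "except *":
        # The error indicator is checked after every call.
        return error_indicator_set
    raise ValueError("unknown form: %r" % (form,))


# A legitimate -1 with no pending exception:
print(caller_sees_error("except value", -1, -1, False))    # True (spurious error)
print(caller_sees_error("except? value", -1, -1, False))   # False
# A -1 returned because an exception really was raised:
print(caller_sees_error("except? value", -1, -1, True))    # True
# A void-like function: only the error indicator can be tested.
print(caller_sees_error("except *", None, None, True))     # True
```

The first call shows why a function declared with a plain exception value must never return that value for a successful result.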
An external C++ function that may raise an exception can be declared with:
cdef int spam() except +
See Using C++ in Cython for more details.
Some things to note:
- Exception values can only be declared for functions returning an integer, enum, float or pointer type, and the value must be a constant expression. Void functions can only use the except * form.
- The exception value specification is part of the signature of the function. If you’re passing a pointer to a function as a parameter or assigning it to a variable, the declared type of the parameter or variable must have the same exception value specification (or lack thereof). Here is an example of a pointer-to-function declaration with an exception value:
int (*grail)(int, char *) except -1
- You don’t need to (and shouldn’t) declare exception values for functions which return Python objects. Remember that a function with no declared return type implicitly returns a Python object. (Exceptions on such functions are implicitly propagated by returning NULL.)
Checking return values of non-Cython functions¶
It’s important to understand that the except clause does not cause an error to be raised when the specified value is returned. For example, you can’t write something like:
cdef extern FILE *fopen(char *filename, char *mode) except NULL # WRONG!
and expect an exception to be automatically raised if a call to fopen() returns NULL. The except clause doesn’t work that way; its only purpose is for propagating Python exceptions that have already been raised, either by a Cython function or a C function that calls Python/C API routines. To get an exception from a non-Python-aware function such as fopen(), you will have to check the return value and raise it yourself, for example:
cdef FILE *p
p = fopen("spam.txt", "r")
if p == NULL:
    raise SpamError("Couldn't open the spam file")
Automatic type conversions¶
In most situations, automatic conversions will be performed for the basic numeric and string types when a Python object is used in a context requiring a C value, or vice versa. The following table summarises the conversion possibilities.
C types | From Python types | To Python types |
---|---|---|
[unsigned] char, [unsigned] short, int, long | int, long | int |
unsigned int, unsigned long, [unsigned] long long | int, long | long |
float, double, long double | int, long, float | float |
char * | str/bytes | str/bytes [1] |
struct | | dict |
[1] | The conversion is to/from str for Python 2.x, and bytes for Python 3.x. |
Caveats when using a Python string in a C context¶
You need to be careful when using a Python string in a context expecting a char *. In this situation, a pointer to the contents of the Python string is used, which is only valid as long as the Python string exists. So you need to make sure that a reference to the original Python string is held for as long as the C string is needed. If you can’t guarantee that the Python string will live long enough, you will need to copy the C string.
Cython detects and prevents some mistakes of this kind. For instance, if you attempt something like:
cdef char *s
s = pystring1 + pystring2
then Cython will produce the error message Obtaining char * from temporary Python value. The reason is that concatenating the two Python strings produces a new Python string object that is referenced only by a temporary internal variable that Cython generates. As soon as the statement has finished, the temporary variable will be decrefed and the Python string deallocated, leaving s dangling. Since this code could not possibly work, Cython refuses to compile it.
The solution is to assign the result of the concatenation to a Python variable, and then obtain the char * from that, i.e.:
cdef char *s
p = pystring1 + pystring2
s = p
It is then your responsibility to hold the reference p for as long as necessary.
Keep in mind that the rules used to detect such errors are only heuristics. Sometimes Cython will complain unnecessarily, and sometimes it will fail to detect a problem that exists. Ultimately, you need to understand the issue and be careful what you do.
Statements and expressions¶
Control structures and expressions follow Python syntax for the most part. When applied to Python objects, they have the same semantics as in Python (unless otherwise noted). Most of the Python operators can also be applied to C values, with the obvious semantics.
If Python objects and C values are mixed in an expression, conversions are performed automatically between Python objects and C numeric or string types.
Reference counts are maintained automatically for all Python objects, and all Python operations are automatically checked for errors, with appropriate action taken.
Differences between C and Cython expressions¶
There are some differences in syntax and semantics between C expressions and Cython expressions, particularly in the area of C constructs which have no direct equivalent in Python.
- An integer literal is treated as a C constant, and will be truncated to whatever size your C compiler thinks appropriate. To get a Python integer (of arbitrary precision), cast it immediately to an object (e.g. <object>100000000000000000000). The L, LL, and U suffixes have the same meaning as in C.
- There is no -> operator in Cython. Instead of p->x, use p.x
- There is no unary * operator in Cython. Instead of *p, use p[0]
- There is an & operator, with the same semantics as in C.
- The null C pointer is called NULL, not 0 (and NULL is a reserved word).
- Type casts are written <type>value, for example:
cdef char *p
cdef float *q
p = <char*>q
Scope rules¶
Cython determines whether a variable belongs to a local scope, the module scope, or the built-in scope completely statically. As with Python, assigning to a variable which is not otherwise declared implicitly declares it to be a Python variable residing in the scope where it is assigned.
Note
A consequence of these rules is that the module-level scope behaves the same way as a Python local scope if you refer to a variable before assigning to it. In particular, tricks such as the following will not work in Cython:
try:
    x = True
except NameError:
    True = 1
because, due to the assignment, the True will always be looked up in the module-level scope. You would have to do something like this instead:
import __builtin__
try:
    True = __builtin__.True
except AttributeError:
    True = 1
Built-in Functions¶
Cython compiles calls to the following built-in functions into direct calls to the corresponding Python/C API routines, making them particularly fast.
Function and arguments | Return type | Python/C API Equivalent |
---|---|---|
abs(obj) | object | PyNumber_Absolute |
delattr(obj, name) | int | PyObject_DelAttr |
dir(obj) | object | PyObject_Dir |
getattr(obj, name) (Note 1), getattr3(obj, name, default) | object | PyObject_GetAttr |
hasattr(obj, name) | int | PyObject_HasAttr |
hash(obj) | int | PyObject_Hash |
intern(obj) | object | PyObject_InternFromString |
isinstance(obj, type) | int | PyObject_IsInstance |
issubclass(obj, type) | int | PyObject_IsSubclass |
iter(obj) | object | PyObject_GetIter |
len(obj) | Py_ssize_t | PyObject_Length |
pow(x, y, z) (Note 2) | object | PyNumber_Power |
reload(obj) | object | PyImport_ReloadModule |
repr(obj) | object | PyObject_Repr |
setattr(obj, name, value) | void | PyObject_SetAttr |
Note 1: There are two different functions corresponding to the Python getattr() depending on whether a third argument is used. In a Python context, they both evaluate to the Python getattr() function.
Note 2: Only the three-argument form of pow() is supported. Use the ** operator otherwise.
Only direct function calls using these names are optimised. If you do something else with one of these names that assumes it’s a Python object, such as assign it to a Python variable, and later call it, the call will be made as a Python function call.
Operator Precedence¶
Keep in mind that there are some differences in operator precedence between Python and C, and that Cython uses the Python precedences, not the C ones.
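One well-known case is the bitwise AND operator, which binds more tightly than comparisons in Python but less tightly than == in C. The following plain-Python sketch shows the Python parse and emulates the C parse with explicit parentheses; the variable names are illustrative:

```python
flags = 6

# Python (and therefore Cython): `&` binds tighter than `==`,
# so this is parsed as (flags & 1) == 0.
python_parse = flags & 1 == 0
print(python_parse)                # True

# C: `==` binds tighter than `&`, so the same text would mean
# flags & (1 == 0). The parentheses below emulate that reading.
c_parse = flags & int(1 == 0)
print(c_parse)                     # 0
```

Code ported from C that relies on C precedence therefore needs explicit parentheses to keep its meaning.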
Integer for-loops¶
Cython recognises the usual Python for-in-range integer loop pattern:
for i in range(n):
    ...
If i is declared as a cdef integer type, Cython will optimise this into a pure C loop. This restriction is required as otherwise the generated code wouldn’t be correct due to potential integer overflows on the target architecture. If you are worried that the loop is not being converted correctly, use the annotate feature of the cython commandline (-a) to easily see the generated C code. See Automatic range conversion.
For backwards compatibility to Pyrex, Cython also supports another form of for-loop:
for i from 0 <= i < n:
    ...
or:
for i from 0 <= i < n by s:
    ...
where s is some integer step size.
Some things to note about the for-from loop:
- The target expression must be a variable name.
- The name between the lower and upper bounds must be the same as the target name.
- The direction of iteration is determined by the relations. If they are both from the set {<, <=} then it is upwards; if they are both from the set {>, >=} then it is downwards. (Any other combination is disallowed.)
Like other Python looping statements, break and continue may be used in the body, and the loop may have an else clause.
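To make the bounds handling of the upward for-from form concrete, here is a minimal plain-Python sketch of the values such a loop would visit. The helper for_from_upward is hypothetical and not part of Cython; it assumes a positive step and models only the upward relations.

```python
def for_from_upward(lower, lower_rel, upper_rel, upper, step=1):
    """Values visited by `for i from lower REL i REL upper by step` (upward only).

    Hypothetical helper for illustration; it is not part of Cython.
    """
    if lower_rel not in ("<", "<=") or upper_rel not in ("<", "<="):
        raise ValueError("this sketch only models the upward relations < and <=")
    if step <= 0:
        raise ValueError("this sketch assumes a positive step")
    # A strict lower bound excludes the bound itself.
    start = lower if lower_rel == "<=" else lower + 1
    # An inclusive upper bound includes the bound itself.
    stop = upper + 1 if upper_rel == "<=" else upper
    return list(range(start, stop, step))


# for i from 0 <= i < 10 by 3
print(for_from_upward(0, "<=", "<", 10, step=3))
# for i from 0 < i <= 5
print(for_from_upward(0, "<", "<=", 5))
```

In other words, the common case for i from 0 <= i < n by s visits the same values as range(0, n, s).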
The include statement¶
Warning
Historically the include statement was used for sharing declarations. Use Sharing Declarations Between Cython Modules instead.
A Cython source file can include material from other files using the include statement, for example:
include "spamstuff.pxi"
The contents of the named file are textually included at that point. The included file can contain any complete statements or declarations that are valid in the context where the include statement appears, including other include statements. The contents of the included file should begin at an indentation level of zero, and will be treated as though they were indented to the level of the include statement that is including the file.
Note
There are other mechanisms available for splitting Cython code into separate parts that may be more appropriate in many cases. See Sharing Declarations Between Cython Modules.
Conditional Compilation¶
Some features are available for conditional compilation and compile-time constants within a Cython source file.
Compile-Time Definitions¶
A compile-time constant can be defined using the DEF statement:
DEF FavouriteFood = "spam"
DEF ArraySize = 42
DEF OtherArraySize = 2 * ArraySize + 17
The right-hand side of the DEF must be a valid compile-time expression. Such expressions are made up of literal values and names defined using DEF statements, combined using any of the Python expression syntax.
The following compile-time names are predefined, corresponding to the values returned by os.uname().
UNAME_SYSNAME, UNAME_NODENAME, UNAME_RELEASE, UNAME_VERSION, UNAME_MACHINE
The following selection of builtin constants and functions are also available:
None, True, False, abs, bool, chr, cmp, complex, dict, divmod, enumerate, float, hash, hex, int, len, list, long, map, max, min, oct, ord, pow, range, reduce, repr, round, slice, str, sum, tuple, xrange, zip
A name defined using DEF can be used anywhere an identifier can appear, and it is replaced with its compile-time value as though it were written into the source at that point as a literal. For this to work, the compile-time expression must evaluate to a Python value of type int, long, float or str:
cdef int a1[ArraySize]
cdef int a2[OtherArraySize]
print "I like", FavouriteFood
Conditional Statements¶
The IF statement can be used to conditionally include or exclude sections of code at compile time. It works in a similar way to the #if preprocessor directive in C:
IF UNAME_SYSNAME == "Windows":
    include "icky_definitions.pxi"
ELIF UNAME_SYSNAME == "Darwin":
    include "nice_definitions.pxi"
ELIF UNAME_SYSNAME == "Linux":
    include "penguin_definitions.pxi"
ELSE:
    include "other_definitions.pxi"
The ELIF and ELSE clauses are optional. An IF statement can appear anywhere that a normal statement or declaration can appear, and it can contain any statements or declarations that would be valid in that context, including DEF statements and other IF statements.
The expressions in the IF and ELIF clauses must be valid compile-time expressions as for the DEF statement, although they can evaluate to any Python value, and the truth of the result is determined in the usual Python way.
Extension Types¶
Introduction¶
As well as creating normal user-defined classes with the Python class statement, Cython also lets you create new built-in Python types, known as extension types. You define an extension type using the cdef class statement. Here’s an example:
cdef class Shrubbery:
    cdef int width, height
    def __init__(self, w, h):
        self.width = w
        self.height = h
    def describe(self):
        print "This shrubbery is", self.width, \
            "by", self.height, "cubits."
As you can see, a Cython extension type definition looks a lot like a Python class definition. Within it, you use the def statement to define methods that can be called from Python code. You can even define many of the special methods such as __init__() as you would in Python.
The main difference is that you can use the cdef statement to define attributes. The attributes may be Python objects (either generic or of a particular extension type), or they may be of any C data type. So you can use extension types to wrap arbitrary C data structures and provide a Python-like interface to them.
Attributes¶
Attributes of an extension type are stored directly in the object’s C struct. The set of attributes is fixed at compile time; you can’t add attributes to an extension type instance at run time simply by assigning to them, as you could with a Python class instance. (You can subclass the extension type in Python and add attributes to instances of the subclass, however.)
There are two ways that attributes of an extension type can be accessed: by Python attribute lookup, or by direct access to the C struct from Cython code. Python code is only able to access attributes of an extension type by the first method, but Cython code can use either method.
By default, extension type attributes are only accessible by direct access, not Python access, which means that they are not accessible from Python code. To make them accessible from Python code, you need to declare them as public or readonly. For example:
cdef class Shrubbery:
    cdef public int width, height
    cdef readonly float depth
makes the width and height attributes readable and writable from Python code, and the depth attribute readable but not writable.
Note
You can only expose simple C types, such as ints, floats, and strings, for Python access. You can also expose Python-valued attributes.
Note
Also, the public and readonly options apply only to Python access, not direct access. All the attributes of an extension type are always readable and writable by C-level access.
Type declarations¶
Before you can directly access the attributes of an extension type, the Cython compiler must know that you have an instance of that type, and not just a generic Python object. It knows this already in the case of the self parameter of the methods of that type, but in other cases you will have to use a type declaration.
For example, in the following function:
cdef widen_shrubbery(sh, extra_width): # BAD
    sh.width = sh.width + extra_width
because the sh parameter hasn’t been given a type, the width attribute will be accessed by a Python attribute lookup. If the attribute has been declared public or readonly then this will work, but it will be very inefficient. If the attribute is private, it will not work at all: the code will compile, but an attribute error will be raised at run time.
The solution is to declare sh as being of type Shrubbery, as follows:
cdef widen_shrubbery(Shrubbery sh, extra_width):
    sh.width = sh.width + extra_width
Now the Cython compiler knows that sh has a C attribute called width and will generate code to access it directly and efficiently. The same consideration applies to local variables, for example:
cdef Shrubbery another_shrubbery(Shrubbery sh1):
    cdef Shrubbery sh2
    sh2 = Shrubbery()
    sh2.width = sh1.width
    sh2.height = sh1.height
    return sh2
Type Testing and Casting¶
Suppose I have a method quest() which returns an object of type Shrubbery. To access its width I could write:
cdef Shrubbery sh = quest()
print sh.width
which requires the use of a local variable and performs a type test on assignment. If you know the return value of quest() will be of type Shrubbery you can use a cast to write:
print (<Shrubbery>quest()).width
This may be dangerous if the value returned by quest() is not actually a Shrubbery, as it will try to access width as a C struct member which may not exist. At the C level, rather than raising an AttributeError, either a nonsensical result will be returned (interpreting whatever data is at that address as an int) or a segfault may result from trying to access invalid memory. Instead, one can write:
print (<Shrubbery?>quest()).width
which performs a type check (possibly raising a TypeError) before making the cast and allowing the code to proceed.
To explicitly test the type of an object, use the isinstance() function. By default, in Python, the isinstance() function checks the __class__ attribute of the first argument to determine if it is of the required type. However, this is potentially unsafe as the __class__ attribute can be spoofed or changed, but the C structure of an extension type must be correct to access its cdef attributes and call its cdef methods. Cython detects if the second argument is a known extension type and does a type check instead, analogous to Pyrex’s typecheck(). The old behavior is always available by passing a tuple as the second parameter:
print isinstance(sh, Shrubbery) # Check the type of sh
print isinstance(sh, (Shrubbery,)) # Check sh.__class__
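The spoofing concern can be reproduced in plain Python. The sketch below uses two hypothetical classes; Impostor reports a false __class__, which is enough to satisfy Python's isinstance() even though the object's real type (and therefore its memory layout) is unrelated to Shrubbery.

```python
class Shrubbery:
    pass


class Impostor:
    # Hypothetical class that lies about its type through __class__.
    @property
    def __class__(self):
        return Shrubbery


imp = Impostor()
spoof_accepted = isinstance(imp, Shrubbery)   # fooled by the spoofed __class__
real_type_matches = type(imp) is Shrubbery    # the actual type is still Impostor
print(spoof_accepted, real_type_matches)
```

A check against the real type is what is needed before touching C-level attributes, which is why the distinction matters for extension types.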
Extension types and None¶
When you declare a parameter or C variable as being of an extension type, Cython will allow it to take on the value None as well as values of its declared type. This is analogous to the way a C pointer can take on the value NULL, and you need to exercise the same caution because of it. There is no problem as long as you are performing Python operations on it, because full dynamic type checking will be applied. However, when you access C attributes of an extension type (as in the widen_shrubbery function above), it’s up to you to make sure the reference you’re using is not None; in the interests of efficiency, Cython does not check this.
You need to be particularly careful when exposing Python functions which take extension types as arguments. If we wanted to make widen_shrubbery() a Python function, for example, if we simply wrote:
def widen_shrubbery(Shrubbery sh, extra_width): # This is
    sh.width = sh.width + extra_width # dangerous!
then users of our module could crash it by passing None for the sh parameter.
One way to fix this would be:
def widen_shrubbery(Shrubbery sh, extra_width):
    if sh is None:
        raise TypeError
    sh.width = sh.width + extra_width
but since this is anticipated to be such a frequent requirement, Cython provides a more convenient way. Parameters of a Python function declared as an extension type can have a not None clause:
def widen_shrubbery(Shrubbery sh not None, extra_width):
    sh.width = sh.width + extra_width
Now the function will automatically check that sh is not None along with checking that it has the right type.
Note
The not None clause can only be used in Python functions (defined with def) and not C functions (defined with cdef). If you need to check whether a parameter to a C function is None, you will need to do it yourself.
Note
Some more things:
- The self parameter of a method of an extension type is guaranteed never to be None.
- When comparing a value with None, keep in mind that, if x is a Python object, x is None and x is not None are very efficient because they translate directly to C pointer comparisons, whereas x == None and x != None, or simply using x as a boolean value (as in if x: ...) will invoke Python operations and therefore be much slower.
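The difference between identity tests and the other forms can be observed in plain Python. In this sketch the hypothetical class Counted records every Python-level hook that runs; the identity tests trigger none of them.

```python
class Counted:
    """Hypothetical class that records which Python-level hooks are invoked."""

    def __init__(self):
        self.calls = []

    def __eq__(self, other):
        self.calls.append("__eq__")
        return False

    def __ne__(self, other):
        self.calls.append("__ne__")
        return True

    def __bool__(self):
        self.calls.append("__bool__")
        return True


x = Counted()

# Identity tests never call back into the object.
is_none = x is None
is_not_none = x is not None
calls_after_identity = list(x.calls)

# Equality tests and truth testing each run a Python-level method.
equal_none = x == None      # noqa: E711 (deliberate, for illustration)
not_equal_none = x != None  # noqa: E711
truthy = bool(x)
calls_after_operations = list(x.calls)

print(calls_after_identity, calls_after_operations)
```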
Special methods¶
Although the principles are similar, there are substantial differences between many of the __xxx__() special methods of extension types and their Python counterparts. There is a separate page devoted to this subject, and you should read it carefully before attempting to use any special methods in your extension types.
Properties¶
There is a special syntax for defining properties in an extension class:
cdef class Spam:
    property cheese:
        "A doc string can go here."
        def __get__(self):
            # This is called when the property is read.
            ...
        def __set__(self, value):
            # This is called when the property is written.
            ...
        def __del__(self):
            # This is called when the property is deleted.
            ...
The __get__(), __set__() and __del__() methods are all optional; if they are omitted, an exception will be raised when the corresponding operation is attempted.
Here’s a complete example. It defines a property which adds to a list each time it is written to, returns the list when it is read, and empties the list when it is deleted.:
# cheesy.pyx
cdef class CheeseShop:
    cdef object cheeses
    def __cinit__(self):
        self.cheeses = []
    property cheese:
        def __get__(self):
            return "We don't have: %s" % self.cheeses
        def __set__(self, value):
            self.cheeses.append(value)
        def __del__(self):
            del self.cheeses[:]
# Test input
from cheesy import CheeseShop
shop = CheeseShop()
print shop.cheese
shop.cheese = "camembert"
print shop.cheese
shop.cheese = "cheddar"
print shop.cheese
del shop.cheese
print shop.cheese
# Test output
We don't have: []
We don't have: ['camembert']
We don't have: ['camembert', 'cheddar']
We don't have: []
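For comparison, the same behaviour can be written for an ordinary Python class with the built-in property decorator. This is only a plain-Python analogue of the extension type above, not the Cython syntax; the attribute name _cheeses is chosen here for illustration.

```python
class CheeseShop:
    def __init__(self):
        self._cheeses = []

    @property
    def cheese(self):
        # Called when the property is read.
        return "We don't have: %s" % self._cheeses

    @cheese.setter
    def cheese(self, value):
        # Called when the property is written: append rather than replace.
        self._cheeses.append(value)

    @cheese.deleter
    def cheese(self):
        # Called when the property is deleted: empty the list in place.
        del self._cheeses[:]


shop = CheeseShop()
shop.cheese = "camembert"
shop.cheese = "cheddar"
print(shop.cheese)
del shop.cheese
print(shop.cheese)
```

The getter, setter and deleter here play the roles of __get__(), __set__() and __del__() in the property block.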
Subclassing¶
An extension type may inherit from a built-in type or another extension type:
cdef class Parrot:
    ...
cdef class Norwegian(Parrot):
    ...
A complete definition of the base type must be available to Cython, so if the base type is a built-in type, it must have been previously declared as an extern extension type. If the base type is defined in another Cython module, it must either be declared as an extern extension type or imported using the cimport statement.
An extension type can only have one base class (no multiple inheritance).
Cython extension types can also be subclassed in Python. A Python class can inherit from multiple extension types provided that the usual Python rules for multiple inheritance are followed (i.e. the C layouts of all the base classes must be compatible).
Since Cython 0.13.1, there is a way to prevent extension types from being subtyped in Python. This is done via the final directive, usually set on an extension type using a decorator:
cimport cython
@cython.final
cdef class Parrot:
    def done(self): pass
Trying to create a Python subclass from this type will raise a TypeError at runtime. Cython will also prevent subtyping a final type inside of the same module, i.e. creating an extension type that uses a final type as its base type will fail at compile time. Note, however, that this restriction does not currently propagate to other extension modules, so even final extension types can still be subtyped at the C level by foreign code.
C methods¶
Extension types can have C methods as well as Python methods. Like C functions, C methods are declared using cdef or cpdef instead of def. C methods are “virtual”, and may be overridden in derived extension types:
# pets.pyx
cdef class Parrot:
    cdef void describe(self):
        print "This parrot is resting."
cdef class Norwegian(Parrot):
    cdef void describe(self):
        Parrot.describe(self)
        print "Lovely plumage!"
cdef Parrot p1, p2
p1 = Parrot()
p2 = Norwegian()
print "p1:"
p1.describe()
print "p2:"
p2.describe()
# Output
p1:
This parrot is resting.
p2:
This parrot is resting.
Lovely plumage!
The above example also illustrates that a C method can call an inherited C method using the usual Python technique, i.e.:
Parrot.describe(self)
Forward-declaring extension types¶
Extension types can be forward-declared, like struct and union types. This will be necessary if you have two extension types that need to refer to each other, e.g.:
cdef class Shrubbery # forward declaration
cdef class Shrubber:
    cdef Shrubbery work_in_progress
cdef class Shrubbery:
    cdef Shrubber creator
If you are forward-declaring an extension type that has a base class, you must specify the base class in both the forward declaration and its subsequent definition, for example:
cdef class A(B)
...
cdef class A(B):
    # attributes and methods
Making extension types weak-referenceable¶
By default, extension types do not support having weak references made to them. You can enable weak referencing by declaring a C attribute of type object called __weakref__. For example:
cdef class ExplodingAnimal:
    """This animal will self-destruct when it is
    no longer strongly referenced."""
    cdef object __weakref__
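Plain Python has a close analogue: a class that defines __slots__ cannot be weakly referenced unless __weakref__ is listed among the slots. The sketch below uses that analogue to show the opt-in behaviour; the class NoWeakrefs is hypothetical, and the final check assumes CPython's reference counting (gc.collect() is called as a precaution).

```python
import gc
import weakref


class NoWeakrefs:
    __slots__ = ("name",)                 # no __weakref__ slot


class ExplodingAnimal:
    __slots__ = ("name", "__weakref__")   # opts in to weak references


try:
    weakref.ref(NoWeakrefs())
    plain_supported = True
except TypeError:
    plain_supported = False               # weak references are refused

animal = ExplodingAnimal()
ref = weakref.ref(animal)
alive_before = ref() is animal

del animal                                # drop the only strong reference
gc.collect()
alive_after = ref() is not None

print(plain_supported, alive_before, alive_after)
```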
Public and external extension types¶
Extension types can be declared extern or public. An extern extension type declaration makes an extension type defined in external C code available to a Cython module. A public extension type declaration makes an extension type defined in a Cython module available to external C code.
External extension types¶
An extern extension type allows you to gain access to the internals of Python objects defined in the Python core or in a non-Cython extension module.
Note
In previous versions of Pyrex, extern extension types were also used to reference extension types defined in another Pyrex module. While you can still do that, Cython provides a better mechanism for this. See Sharing Declarations Between Cython Modules.
Here is an example which will let you get at the C-level members of the built-in complex object.:
cdef extern from "complexobject.h":
    struct Py_complex:
        double real
        double imag
    ctypedef class __builtin__.complex [object PyComplexObject]:
        cdef Py_complex cval
# A function which uses the above type
def spam(complex c):
    print "Real:", c.cval.real
    print "Imag:", c.cval.imag
Note
Some important things:
- In this example, ctypedef class has been used. This is because, in the Python header files, the PyComplexObject struct is declared with: typedef struct { ... } PyComplexObject;
- As well as the name of the extension type, the module in which its type object can be found is also specified. See the implicit importing section below.
- When declaring an external extension type, you don’t declare any methods. Declaration of methods is not required in order to call them, because the calls are Python method calls. Also, as with structs and unions, if your extension class declaration is inside a cdef extern from block, you only need to declare those C members which you wish to access.
Name specification clause¶
The part of the class declaration in square brackets is a special feature only available for extern or public extension types. The full form of this clause is:
[object object_struct_name, type type_object_name ]
where object_struct_name is the name to assume for the type’s C struct, and type_object_name is the name to assume for the type’s statically declared type object. (The object and type clauses can be written in either order.)
If the extension type declaration is inside a cdef extern from block, the object clause is required, because Cython must be able to generate code that is compatible with the declarations in the header file. Otherwise, for extern extension types, the object clause is optional.
For public extension types, the object and type clauses are both required, because Cython must be able to generate code that is compatible with external C code.
Implicit importing¶
Cython requires you to include a module name in an extern extension class declaration, for example:
cdef extern class MyModule.Spam:
    ...
The type object will be implicitly imported from the specified module and bound to the corresponding name in this module. In other words, in this example an implicit:
from MyModule import Spam
statement will be executed at module load time.
The module name can be a dotted name to refer to a module inside a package hierarchy, for example:
cdef extern class My.Nested.Package.Spam:
    ...
You can also specify an alternative name under which to import the type using an as clause, for example:
cdef extern class My.Nested.Package.Spam as Yummy:
    ...
which corresponds to the implicit import statement:
from My.Nested.Package import Spam as Yummy
Type names vs. constructor names¶
Inside a Cython module, the name of an extension type serves two distinct purposes. When used in an expression, it refers to a module-level global variable holding the type’s constructor (i.e. its type-object). However, it can also be used as a C type name to declare variables, arguments and return values of that type.
When you declare:
cdef extern class MyModule.Spam:
    ...
the name Spam serves both these roles. There may be other names by which you can refer to the constructor, but only Spam can be used as a type name. For example, if you were to explicitly import MyModule, you could use MyModule.Spam() to create a Spam instance, but you wouldn’t be able to use MyModule.Spam as a type name.
When an as clause is used, the name specified in the as clause also takes over both roles. So if you declare:
cdef extern class MyModule.Spam as Yummy:
    ...
then Yummy becomes both the type name and a name for the constructor. Again, there are other ways that you could get hold of the constructor, but only Yummy is usable as a type name.
Public extension types¶
An extension type can be declared public, in which case a .h file is generated containing declarations for its object struct and type object. By including the .h file in external C code that you write, that code can access the attributes of the extension type.
Special Methods of Extension Types¶
This page describes the special methods currently supported by Cython extension types. A complete list of all the special methods appears in the table at the bottom. Some of these methods behave differently from their Python counterparts or have no direct Python counterparts, and require special mention.
Declaration¶
Special methods of extension types must be declared with def, not cdef. This does not impact their performance, because Python uses different calling conventions to invoke these special methods.
Docstrings¶
Currently, docstrings are not fully supported in some special methods of extension types. You can place a docstring in the source to serve as a comment, but it won’t show up in the corresponding __doc__ attribute at run time. (This seems to be a Python limitation: there’s nowhere in the PyTypeObject data structure to put such docstrings.)
Initialisation methods: __cinit__() and __init__()¶
There are two methods concerned with initialising the object.
The __cinit__() method is where you should perform basic C-level initialisation of the object, including allocation of any C data structures that your object will own. You need to be careful what you do in the __cinit__() method, because the object may not yet be a fully valid Python object when it is called. Therefore, you should be careful invoking any Python operations which might touch the object; in particular, its methods.
By the time your __cinit__() method is called, memory has been allocated for the object and any C attributes it has have been initialised to 0 or null. (Any Python attributes have also been initialised to None, but you probably shouldn’t rely on that.) Your __cinit__() method is guaranteed to be called exactly once.
If your extension type has a base type, the __cinit__() method of the base type is automatically called before your __cinit__() method is called; you cannot explicitly call the inherited __cinit__() method. If you need to pass a modified argument list to the base type, you will have to do the relevant part of the initialisation in the __init__() method instead (where the normal rules for calling inherited methods apply).
Any initialisation which cannot safely be done in the __cinit__() method should be done in the __init__() method. By the time __init__() is called, the object is a fully valid Python object and all operations are safe. Under some circumstances it is possible for __init__() to be called more than once or not to be called at all, so your other methods should be designed to be robust in such situations.
Any arguments passed to the constructor will be passed to both the __cinit__() method and the __init__() method. If you anticipate subclassing your extension type in Python, you may find it useful to give the __cinit__() method * and ** arguments so that it can accept and ignore extra arguments. Otherwise, any Python subclass which has an __init__() with a different signature will have to override __new__() [1] as well as __init__(), which the writer of a Python class wouldn’t expect to have to do. Alternatively, as a convenience, if you declare your __cinit__() method to take no arguments (other than self) it will simply ignore any extra arguments passed to the constructor without complaining about the signature mismatch.
[1] http://docs.python.org/reference/datamodel.html#object.__new__
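The signature problem described above can be imitated in plain Python by letting __new__() stand in for __cinit__(), since both receive every constructor argument before __init__() runs. All class names in this sketch are hypothetical, and it is only an analogy, not how Cython implements __cinit__().

```python
class StrictBase:
    # __new__ plays the role of __cinit__ here: it sees every constructor argument.
    def __new__(cls, width, height):
        return super().__new__(cls)


class TolerantBase:
    # Accepts and ignores extra arguments, as suggested for __cinit__.
    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)


class StrictChild(StrictBase):
    def __init__(self, width, height, colour):
        self.colour = colour


class TolerantChild(TolerantBase):
    def __init__(self, width, height, colour):
        self.colour = colour


try:
    StrictChild(1, 2, "green")
    strict_ok = True
except TypeError:
    strict_ok = False   # the extra argument is rejected before __init__ runs

tolerant = TolerantChild(1, 2, "green")
print(strict_ok, tolerant.colour)
```

The subclass author only changed __init__(), yet the strict base makes construction fail, which is the surprise the * and ** arguments are meant to avoid.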
Finalization method: __dealloc__()¶
The counterpart to the __cinit__() method is the __dealloc__() method, which should perform the inverse of the __cinit__() method. Any C data that you explicitly allocated (e.g. via malloc) in your __cinit__() method should be freed in your __dealloc__() method.
You need to be careful what you do in a __dealloc__() method. By the time your __dealloc__() method is called, the object may already have been partially destroyed and may not be in a valid state as far as Python is concerned, so you should avoid invoking any Python operations which might touch the object. In particular, don’t call any other methods of the object or do anything which might cause the object to be resurrected. It’s best if you stick to just deallocating C data.
You don’t need to worry about deallocating Python attributes of your object, because that will be done for you by Cython after your __dealloc__() method returns.
Arithmetic methods¶
Arithmetic operator methods, such as __add__(), behave differently from their Python counterparts. There are no separate “reversed” versions of these methods (__radd__(), etc.) Instead, if the first operand cannot perform the operation, the same method of the second operand is called, with the operands in the same order.
This means that you can’t rely on the first parameter of these methods being “self” or being the right type, and you should test the types of both operands before deciding what to do. If you can’t handle the combination of types you’ve been given, you should return NotImplemented.
This also applies to the in-place arithmetic method __ipow__(). It doesn’t apply to any of the other in-place methods (__iadd__(), etc.) which always take self as the first argument.
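The convention can be imitated in plain Python to show what testing both operands looks like. In this sketch the hypothetical class Metres routes both __sub__() and __rsub__() to one function that always receives the operands in their original left-to-right order, which is the shape an extension type's arithmetic method has to handle.

```python
class Metres:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _sub(x, y):
        # x is always the left operand and y the right one; either may be a
        # Metres instance, so both are inspected before doing anything.
        left = x.value if isinstance(x, Metres) else x
        right = y.value if isinstance(y, Metres) else y
        if not isinstance(left, (int, float)) or not isinstance(right, (int, float)):
            return NotImplemented
        return Metres(left - right)

    # Plain Python needs two hooks; both forward to the same function and
    # keep the operands in their original order.
    def __sub__(self, other):
        return Metres._sub(self, other)

    def __rsub__(self, other):
        return Metres._sub(other, self)


print((Metres(10) - 3).value)
print((10 - Metres(3)).value)
```

Returning NotImplemented for unsupported combinations lets Python raise the usual TypeError.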
Rich comparisons¶
There are no separate methods for the individual rich comparison operations (__eq__(), __le__(), etc.) Instead there is a single method __richcmp__() which takes an integer indicating which operation is to be performed, as follows:
Operator | Code |
---|---|
< | 0 |
<= | 1 |
== | 2 |
!= | 3 |
> | 4 |
>= | 5 |
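As a rough illustration of how a single method can serve all six operators, the following plain-Python sketch dispatches on the integer codes listed above. The names OP_TABLE and richcmp are hypothetical, and the function only mimics the shape of a __richcmp__() body; it is not Cython's implementation.

```python
import operator

# Integer codes listed above, mapped to the matching Python operators.
OP_TABLE = {
    0: operator.lt,
    1: operator.le,
    2: operator.eq,
    3: operator.ne,
    4: operator.gt,
    5: operator.ge,
}


def richcmp(x, y, op):
    """Compare x and y according to the integer code op."""
    try:
        compare = OP_TABLE[op]
    except KeyError:
        raise ValueError("unknown comparison code: %r" % (op,))
    return compare(x, y)


print(richcmp(1, 2, 0))   # evaluates 1 < 2
print(richcmp(1, 2, 4))   # evaluates 1 > 2
```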
The __next__() method¶
Extension types wishing to implement the iterator interface should define a method called __next__(), not next. The Python system will automatically supply a next method which calls your __next__(). Do NOT explicitly give your type a next() method, or bad things could happen.
Special Method Table¶
This table lists all of the special methods together with their parameter and return types. In the table below, a parameter name of self is used to indicate that the parameter has the type that the method belongs to. Other parameters with no type specified in the table are generic Python objects.
You don’t have to declare your method as taking these parameter types. If you declare different types, conversions will be performed as necessary.
General¶
Name | Parameters | Return type | Description |
---|---|---|---|
__cinit__ | self, ... | | Basic initialisation (no direct Python equivalent) |
__init__ | self, ... | | Further initialisation |
__dealloc__ | self | | Basic deallocation (no direct Python equivalent) |
__cmp__ | x, y | int | 3-way comparison |
__richcmp__ | x, y, int op | object | Rich comparison (no direct Python equivalent) |
__str__ | self | object | str(self) |
__repr__ | self | object | repr(self) |
__hash__ | self | int | Hash function |
__call__ | self, ... | object | self(...) |
__iter__ | self | object | Return iterator for sequence |
__getattr__ | self, name | object | Get attribute |
__setattr__ | self, name, val | | Set attribute |
__delattr__ | self, name | | Delete attribute |
Arithmetic operators¶
Name | Parameters | Return type | Description |
---|---|---|---|
__add__ | x, y | object | binary + operator |
__sub__ | x, y | object | binary - operator |
__mul__ | x, y | object | * operator |
__div__ | x, y | object | / operator for old-style division |
__floordiv__ | x, y | object | // operator |
__truediv__ | x, y | object | / operator for new-style division |
__mod__ | x, y | object | % operator |
__divmod__ | x, y | object | combined div and mod |
__pow__ | x, y, z | object | ** operator or pow(x, y, z) |
__neg__ | self | object | unary - operator |
__pos__ | self | object | unary + operator |
__abs__ | self | object | absolute value |
__nonzero__ | self | int | convert to boolean |
__invert__ | self | object | ~ operator |
__lshift__ | x, y | object | << operator |
__rshift__ | x, y | object | >> operator |
__and__ | x, y | object | & operator |
__or__ | x, y | object | | operator |
__xor__ | x, y | object | ^ operator |
Numeric conversions¶
Name | Parameters | Return type | Description |
---|---|---|---|
__int__ | self | object | Convert to integer |
__long__ | self | object | Convert to long integer |
__float__ | self | object | Convert to float |
__oct__ | self | object | Convert to octal |
__hex__ | self | object | Convert to hexadecimal |
__index__ (2.5+ only) | self | object | Convert to sequence index |
In-place arithmetic operators¶
Name | Parameters | Return type | Description |
---|---|---|---|
__iadd__ | self, x | object | += operator |
__isub__ | self, x | object | -= operator |
__imul__ | self, x | object | *= operator |
__idiv__ | self, x | object | /= operator for old-style division |
__ifloordiv__ | self, x | object | //= operator |
__itruediv__ | self, x | object | /= operator for new-style division |
__imod__ | self, x | object | %= operator |
__ipow__ | x, y, z | object | **= operator |
__ilshift__ | self, x | object | <<= operator |
__irshift__ | self, x | object | >>= operator |
__iand__ | self, x | object | &= operator |
__ior__ | self, x | object | |= operator |
__ixor__ | self, x | object | ^= operator |
Sequences and mappings¶
Name | Parameters | Return type | Description |
---|---|---|---|
__len__ | self | int | len(self) |
__getitem__ | self, x | object | self[x] |
__setitem__ | self, x, y | | self[x] = y |
__delitem__ | self, x | | del self[x] |
__getslice__ | self, Py_ssize_t i, Py_ssize_t j | object | self[i:j] |
__setslice__ | self, Py_ssize_t i, Py_ssize_t j, x | | self[i:j] = x |
__delslice__ | self, Py_ssize_t i, Py_ssize_t j | | del self[i:j] |
__contains__ | self, x | int | x in self |
Iterators¶
Name | Parameters | Return type | Description |
---|---|---|---|
__next__ | self | object | Get next item (called next in Python) |
Buffer interface [PEP 3118] (no Python equivalents - see note 1)¶
Name | Parameters | Return type | Description |
---|---|---|---|
__getbuffer__ | self, Py_buffer *view, int flags | | |
__releasebuffer__ | self, Py_buffer *view | | |
Buffer interface [legacy] (no Python equivalents - see note 1)¶
Name | Parameters | Return type | Description |
---|---|---|---|
__getreadbuffer__ | self, Py_ssize_t i, void **p | | |
__getwritebuffer__ | self, Py_ssize_t i, void **p | | |
__getsegcount__ | self, Py_ssize_t *p | | |
__getcharbuffer__ | self, Py_ssize_t i, char **p | | |
Descriptor objects (see note 2)¶
Name | Parameters | Return type | Description |
---|---|---|---|
__get__ | self, instance, class | object | Get value of attribute |
__set__ | self, instance, value | | Set value of attribute |
__delete__ | self, instance | | Delete attribute |
Note
(1) The buffer interface was intended for use by C code and is not directly accessible from Python. It is described in the Python/C API Reference Manual of Python 2.x under sections 6.6 and 10.6. It was superseded by the new PEP 3118 buffer protocol in Python 2.6 and is no longer available in Python 3.
Note
(2) Descriptor objects are part of the support mechanism for new-style Python classes. See the discussion of descriptors in the Python documentation. See also PEP 252, “Making Types Look More Like Classes”, and PEP 253, “Subtyping Built-In Types”.
Sharing Declarations Between Cython Modules¶
This section describes a new set of facilities for making C declarations, functions and extension types in one Cython module available for use in another Cython module. These facilities are closely modelled on the Python import mechanism, and can be thought of as a compile-time version of it.
Definition and Implementation files¶
A Cython module can be split into two parts: a definition file with a .pxd suffix, containing C declarations that are to be available to other Cython modules, and an implementation file with a .pyx suffix, containing everything else. When a module wants to use something declared in another module’s definition file, it imports it using the cimport statement.
A .pxd file that consists solely of extern declarations does not need to correspond to an actual .pyx file or Python module. This can make it a convenient place to put common declarations, for example declarations of functions from an external library that one wants to use in several modules.
What a Definition File contains¶
A definition file can contain:
- Any kind of C type declaration.
- extern C function or variable declarations.
- Declarations of C functions defined in the module.
- The definition part of an extension type (see below).
It cannot contain any non-extern C variable declarations.
It cannot contain the implementations of any C or Python functions, or any Python class definitions, or any executable statements. It is needed when one wants to access cdef attributes and methods, or to inherit from cdef classes defined in this module.
Note
You don’t need to (and shouldn’t) declare anything in a definition file public in order to make it available to other Cython modules; its mere presence in a definition file does that. You only need a public declaration if you want to make something available to external C code.
What an Implementation File contains¶
An implementation file can contain any kind of Cython statement, although there
are some restrictions on the implementation part of an extension type if the
corresponding definition file also defines that type (see below).
If one doesn’t need to cimport anything from this module, then this is the only file one needs.
The cimport statement¶
The cimport statement is used in a definition or implementation file to gain access to names declared in another definition file. Its syntax exactly parallels that of the normal Python import statement:
cimport module [, module...]
from module cimport name [as name] [, name [as name] ...]
Here is an example. The first file is a definition file which exports a C data type. The second file is an implementation file which imports and uses it.
dishes.pxd:
cdef enum otherstuff:
    sausage, eggs, lettuce
cdef struct spamdish:
    int oz_of_spam
    otherstuff filler
restaurant.pyx:
cimport dishes
from dishes cimport spamdish
cdef void prepare(spamdish *d):
    d.oz_of_spam = 42
    d.filler = dishes.sausage
def serve():
    cdef spamdish d
    prepare(&d)
    print "%d oz spam, filler no. %d" % (d.oz_of_spam, d.filler)
It is important to understand that the cimport statement can only be used to import C data types, C functions and variables, and extension types. It cannot be used to import any Python objects, and (with one exception) it doesn’t imply any Python import at run time. If you want to refer to any Python names from a module that you have cimported, you will have to include a regular import statement for it as well.
The exception is that when you use cimport to import an extension type, its type object is imported at run time and made available by the name under which you imported it. Using cimport to import extension types is covered in more detail below.
If a .pxd file changes, any modules that cimport from it may need to be recompiled.
Search paths for definition files¶
When you cimport a module called modulename, the Cython compiler searches for a file called modulename.pxd along the search path for include files, as specified by -I command line options.
Also, whenever you compile a file modulename.pyx, the corresponding definition file modulename.pxd is first searched for along the same path, and if found, it is processed before processing the .pyx file.
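The lookup described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the code Cython itself runs: the helper name find_pxd and the throwaway directories are assumptions made up for the example, and only the idea of checking each -I directory in order for modulename.pxd comes from the text.

```python
import os
import tempfile


def find_pxd(module_name, include_dirs):
    """Return the first '<module_name>.pxd' found along include_dirs, or None.

    Hypothetical helper: it mimics the search order described above,
    it is not Cython's own implementation.
    """
    for directory in include_dirs:
        candidate = os.path.join(directory, module_name + ".pxd")
        if os.path.isfile(candidate):
            return candidate
    return None


# Two temporary directories stand in for two -I command line options.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    open(os.path.join(second, "dishes.pxd"), "w").close()
    found = find_pxd("dishes", [first, second])     # present in the second directory
    missing = find_pxd("c_lunch", [first, second])  # not present anywhere
```

Directories are tried in the order given, so a definition file in an earlier directory would shadow one of the same name later on the path.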
Using cimport to resolve naming conflicts¶
The cimport mechanism provides a clean and simple way to solve the problem of wrapping external C functions with Python functions of the same name. All you need to do is put the extern C declarations into a .pxd file for an imaginary module, and cimport that module. You can then refer to the C functions by qualifying them with the name of the module. Here’s an example:
c_lunch.pxd:
cdef extern from "lunch.h":
    void eject_tomato(float)
lunch.pyx:
cimport c_lunch
def eject_tomato(float speed):
    c_lunch.eject_tomato(speed)
You don’t need any c_lunch.pyx file, because the only things defined in c_lunch.pxd are extern C entities. There won’t be any actual c_lunch module at run time, but that doesn’t matter; the c_lunch.pxd file has done its job of providing an additional namespace at compile time.
Sharing C Functions¶
C functions defined at the top level of a module can be made available via cimport by putting headers for them in the .pxd file, for example:
volume.pxd:
cdef float cube(float)
spammery.pyx:
from volume cimport cube
def menu(description, size):
    print description, ":", cube(size), \
          "cubic metres of spam"
menu("Entree", 1)
menu("Main course", 3)
menu("Dessert", 2)
volume.pyx:
cdef float cube(float x):
    return x * x * x
Note
When a module exports a C function in this way, an object appears in the module dictionary under the function’s name. However, you can’t make use of this object from Python, nor can you use it from Cython using a normal import statement; you have to use cimport.
Sharing Extension Types¶
An extension type can be made available via cimport by splitting its definition into two parts, one in a definition file and the other in the corresponding implementation file.
The definition part of the extension type can only declare C attributes and C methods, not Python methods, and it must declare all of that type’s C attributes and C methods.
The implementation part must implement all of the C methods declared in the definition part, and may not add any further C attributes. It may also define Python methods.
Here is an example of a module which defines and exports an extension type, and another module which uses it:
# Shrubbing.pxd
cdef class Shrubbery:
    cdef int width
    cdef int length
# Shrubbing.pyx
cdef class Shrubbery:
    def __cinit__(self, int w, int l):
        self.width = w
        self.length = l
def standard_shrubbery():
    return Shrubbery(3, 7)
# Landscaping.pyx
cimport Shrubbing
import Shrubbing
cdef Shrubbing.Shrubbery sh
sh = Shrubbing.standard_shrubbery()
print "Shrubbery size is %d x %d" % (sh.width, sh.length)
Some things to note about this example:
- There is a cdef class Shrubbery declaration in both Shrubbing.pxd and Shrubbing.pyx. When the Shrubbing module is compiled, these two declarations are combined into one.
- In Landscaping.pyx, the cimport Shrubbing declaration allows us to refer to the Shrubbery type as Shrubbing.Shrubbery. But it doesn’t bind the name Shrubbing in Landscaping’s module namespace at run time, so to access Shrubbing.standard_shrubbery() we also need to import Shrubbing.
Interfacing with External C Code¶
One of the main uses of Cython is wrapping existing libraries of C code. This is achieved by using external declarations to declare the C functions and variables from the library that you want to use.
You can also use public declarations to make C functions and variables defined in a Cython module available to external C code. The need for this is expected to be less frequent, but you might want to do it, for example, if you are embedding Python in another application as a scripting language. Just as a Cython module can be used as a bridge to allow Python code to call C code, it can also be used to allow C code to call Python code.
External declarations¶
By default, C functions and variables declared at the module level are local to the module (i.e. they have the C static storage class). They can also be declared extern to specify that they are defined elsewhere, for example:
cdef extern int spam_counter
cdef extern void order_spam(int tons)
Referencing C header files¶
When you use an extern definition on its own as in the examples above, Cython includes a declaration for it in the generated C file. This can cause problems if the declaration doesn’t exactly match the declaration that will be seen by other C code. If you’re wrapping an existing C library, for example, it’s important that the generated C code is compiled with exactly the same declarations as the rest of the library.
To achieve this, you can tell Cython that the declarations are to be found in a C header file, like this:
cdef extern from "spam.h":
int spam_counter
void order_spam(int tons)
The cdef extern from clause does three things:
- It directs Cython to place a #include statement for the named header file in the generated C code.
- It prevents Cython from generating any C code for the declarations found in the associated block.
- It treats all declarations within the block as though they started with cdef extern.
It’s important to understand that Cython does not itself read the C header file, so you still need to provide Cython versions of any declarations from it that you use. However, the Cython declarations don’t always have to exactly match the C ones, and in some cases they shouldn’t or can’t. In particular:
- Don’t use const. Cython doesn’t know anything about const, so just leave it out. Most of the time this shouldn’t cause any problem, although on rare occasions you might have to use a cast. You can also explicitly declare something like:
  ctypedef char* const_char_ptr "const char*"
  though in most cases this will not be needed.
  Warning
  A problem with const could arise if you have something like:
  cdef extern from "grail.h":
      char *nun
  where grail.h actually contains:
  extern const char *nun;
  and you do:
  cdef void languissement(char *s):
      # something that doesn't change s
      ...
  languissement(nun)
  which will cause the C compiler to complain. You can work around it by casting away the constness:
  languissement(<char *>nun)
- Leave out any platform-specific extensions to C declarations such as __declspec().
- If the header file declares a big struct and you only want to use a few members, you only need to declare the members you’re interested in. Leaving the rest out doesn’t do any harm, because the C compiler will use the full definition from the header file.
  In some cases, you might not need any of the struct’s members, in which case you can just put pass in the body of the struct declaration, e.g.:
  cdef extern from "foo.h":
      struct spam:
          pass
  Note
  You can only do this inside a cdef extern from block; struct declarations anywhere else must be non-empty.
- If the header file uses typedef names such as word to refer to platform-dependent flavours of numeric types, you will need a corresponding ctypedef statement, but you don’t need to match the type exactly, just use something of the right general kind (int, float, etc). For example:
  ctypedef int word
  will work okay whatever the actual size of a word is (provided the header file defines it correctly). Conversion to and from Python types, if any, will also be used for this new type.
- If the header file uses macros to define constants, translate them into a dummy enum declaration.
- If the header file defines a function using a macro, declare it as though it were an ordinary function, with appropriate argument and result types.
- For archaic reasons C uses the keyword void to declare a function taking no parameters. In Cython as in Python, simply declare such functions as foo().
A few more tricks and tips:
- If you want to include a C header because it’s needed by another header, but don’t want to use any declarations from it, put pass in the extern-from block:
  cdef extern from "spam.h":
      pass
- If you want to include some external declarations, but don’t want to specify a header file (because it’s included by some other header that you’ve already included) you can put * in place of the header file name:
  cdef extern from *:
      ...
Styles of struct, union and enum declaration¶
There are two main ways that structs, unions and enums can be declared in C header files: using a tag name, or using a typedef. There are also some variations based on various combinations of these.
It’s important to make the Cython declarations match the style used in the header file, so that Cython can emit the right sort of references to the type in the code it generates. To make this possible, Cython provides two different syntaxes for declaring a struct, union or enum type. The style introduced above corresponds to the use of a tag name. To get the other style, you prefix the declaration with ctypedef, as illustrated below.
The following table shows the various possible styles that can be found in a header file, and the corresponding Cython declaration that you should put in the cdef extern from block. Struct declarations are used as an example; the same applies equally to union and enum declarations.
C code | Possibilities for corresponding Cython Code | Comments |
---|---|---|
struct Foo { ... }; | cdef struct Foo: ... | Cython will refer to the type as struct Foo in the generated C code. |
typedef struct { ... } Foo; | ctypedef struct Foo: ... | Cython will refer to the type simply as Foo in the generated C code. |
typedef struct foo { ... } Foo; | cdef struct foo: ... followed by the optional ctypedef foo Foo, or: ctypedef struct Foo: ... | If the C header uses both a tag and a typedef with different names, you can use either form of declaration in Cython (although if you need to forward reference the type, you’ll have to use the first form). |
typedef struct Foo { ... } Foo; | cdef struct Foo: ... | If the header uses the same name for the tag and typedef, you won’t be able to include a ctypedef for it, but then it’s not necessary. |
Note that in all the cases above, you refer to the type in Cython code simply as Foo, not struct Foo.
Accessing Python/C API routines¶
One particular use of the cdef extern from statement is for gaining access to routines in the Python/C API. For example:
cdef extern from "Python.h":
    object PyString_FromStringAndSize(char *s, Py_ssize_t len)
will allow you to create Python strings containing null bytes.
Special Types¶
Cython predefines the name Py_ssize_t for use with Python/C API routines. To make your extensions compatible with 64-bit systems, you should always use this type where it is specified in the documentation of Python/C API routines.
Windows Calling Conventions¶
The __stdcall and __cdecl calling convention specifiers can be used in Cython, with the same syntax as used by C compilers on Windows, for example:
cdef extern int __stdcall FrobnicateWindow(long handle)
cdef void (__stdcall *callback)(void *)
If __stdcall is used, the function is only considered compatible with other __stdcall functions of the same signature.
Resolving naming conflicts - C name specifications¶
Each Cython module has a single module-level namespace for both Python and C names. This can be inconvenient if you want to wrap some external C functions and provide the Python user with Python functions of the same names.
Cython provides a couple of different ways of solving this problem. The best way, especially if you have many C functions to wrap, is probably to put the extern C function declarations into a different namespace using the facilities described in the section on sharing declarations between Cython modules.
The other way is to use a C name specification to give different Cython and C names to the C function. Suppose, for example, that you want to wrap an external function called eject_tomato(). If you declare it as:
cdef extern void c_eject_tomato "eject_tomato" (float speed)
then its name inside the Cython module will be c_eject_tomato, whereas its name in C will be eject_tomato. You can then wrap it with:
def eject_tomato(speed):
    c_eject_tomato(speed)
so that users of your module can refer to it as eject_tomato.
Another use for this feature is referring to external names that happen to be Cython keywords. For example, if you want to call an external function called print, you can rename it to something else in your Cython module.
As well as functions, C names can be specified for variables, structs, unions, enums, struct and union members, and enum values. For example:
cdef extern int one "ein", two "zwei"
cdef extern float three "drei"
cdef struct spam "SPAM":
    int i "eye"
cdef enum surprise "inquisition":
    first "alpha"
    second "beta" = 3
Using Cython Declarations from C¶
Cython provides two methods for making C declarations from a Cython module available for use by external C code—public declarations and C API declarations.
Note
You do not need to use either of these to make declarations from one Cython module available to another Cython module; you should use the cimport statement for that. See Sharing Declarations Between Cython Modules.
Public Declarations¶
You can make C types, variables and functions defined in a Cython module accessible to C code that is linked with the module, by declaring them with the public keyword:
cdef public struct Bunny: # public type declaration
    int vorpalness
cdef public int spam # public variable declaration
cdef public void grail(Bunny *): # public function declaration
    ...
If there are any public declarations in a Cython module, a header file called modulename.h is generated containing equivalent C declarations for inclusion in other C code.
Any C code wanting to make use of these declarations will need to be linked, either statically or dynamically, with the extension module.
If the Cython module resides within a package, then the name of the .h file consists of the full dotted name of the module, e.g. a module called foo.spam would have a header file called foo.spam.h.
C API Declarations¶
The other way of making declarations available to C code is to declare them with the api keyword. You can use this keyword with C functions and extension types. A header file called modulename_api.h is produced containing declarations of the functions and extension types, and a function called import_modulename().
C code wanting to use these functions or extension types needs to include the header and call the import_modulename() function. The other functions can then be called and the extension types used as usual.
Any public C type or extension type declarations in the Cython module are also made available when you include modulename_api.h:
# delorean.pyx
cdef public struct Vehicle:
    int speed
    float power
cdef api void activate(Vehicle *v):
    if v.speed >= 88 and v.power >= 1.21:
        print "Time travel achieved"
/* marty.c */
#include <stdlib.h>  /* atoi, atof */
#include "delorean_api.h"
Vehicle car;
int main(int argc, char *argv[]) {
    if (argc < 3)
        return 1;  /* expects two arguments: speed and power */
    Py_Initialize();  /* the interpreter must be running before the module is imported */
    import_delorean();
    car.speed = atoi(argv[1]);
    car.power = atof(argv[2]);
    activate(&car);
    Py_Finalize();
    return 0;
}
Note
Any types defined in the Cython module that are used as argument or return types of the exported functions will need to be declared public, otherwise they won’t be included in the generated header file, and you will get errors when you try to compile a C file that uses the header.
Using the api method does not require the C code using the declarations to be linked with the extension module in any way, as the Python import machinery is used to make the connection dynamically. However, only functions can be accessed this way, not variables.
You can use both public and api on the same function to make it available by both methods, e.g.:
cdef public api void belt_and_braces():
    ...
However, note that you should include either modulename.h or modulename_api.h in a given C file, not both, otherwise you may get conflicting dual definitions.
If the Cython module resides within a package, then:
- The name of the header file contains the full dotted name of the module.
- The name of the importing function contains the full name with dots replaced by double underscores.
E.g. a module called foo.spam would have an API header file called foo.spam_api.h and an importing function called import_foo__spam().
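The naming rules for generated headers can be summarised as a small Python function. This is just a sketch of the rules stated above (the function name generated_names is made up for illustration and is not part of Cython); it derives the public header, the API header and the importing function from a dotted module name.

```python
def generated_names(module_name):
    """Return (public_header, api_header, import_function) for a module.

    Hypothetical helper that restates the naming rules from the text:
    headers keep the dotted name, the importing function replaces
    each dot with a double underscore.
    """
    public_header = module_name + ".h"
    api_header = module_name + "_api.h"
    import_function = "import_" + module_name.replace(".", "__")
    return public_header, api_header, import_function


packaged = generated_names("foo.spam")
top_level = generated_names("delorean")
```

For a top-level module nothing is replaced, which matches the import_delorean() call in the earlier example.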
Multiple public and API declarations¶
You can declare a whole group of items as public and/or api all at once by enclosing them in a cdef block, for example:
cdef public api:
    void order_spam(int tons)
    char *get_lunch(float tomato_size)
This can be a useful thing to do in a .pxd file (see Sharing Declarations Between Cython Modules) to make the module’s public interface available by all three methods.
Acquiring and Releasing the GIL¶
Cython provides facilities for releasing the Global Interpreter Lock (GIL) before calling C code, and for acquiring the GIL in functions that are to be called back from C code that is executed without the GIL.
Releasing the GIL¶
You can release the GIL around a section of code using the with nogil statement:
with nogil:
    <code to be executed with the GIL released>
Code in the body of the statement must not manipulate Python objects in any way, and must not call anything that manipulates Python objects without first re-acquiring the GIL. Cython currently does not check this.
Acquiring the GIL¶
A C function that is to be used as a callback from C code that is executed without the GIL needs to acquire the GIL before it can manipulate Python objects. This can be done by specifying with gil in the function header:
cdef void my_callback(void *data) with gil:
    ...
Declaring a function as callable without the GIL¶
You can specify nogil in a C function header or function type to declare that it is safe to call without the GIL:
cdef void my_gil_free_func(int spam) nogil:
    ...
If you are implementing such a function in Cython, it cannot have any Python arguments, Python local variables, or Python return type, and cannot manipulate Python objects in any way or call any function that does so without acquiring the GIL first. Some of these restrictions are currently checked by Cython, but not all. It is possible that more stringent checking will be performed in the future.
Declaring a function with gil also implicitly makes its signature nogil.
Source Files and Compilation¶
Cython source file names consist of the name of the module followed by a .pyx extension, for example a module called primes would have a source file named primes.pyx.
Once you have written your .pyx file, there are a couple of ways of turning it into an extension module. One way is to compile it manually with the Cython compiler, e.g.:
$ cython primes.pyx
This will produce a file called primes.c, which then needs to be compiled with the C compiler using whatever options are appropriate on your platform for generating an extension module. For these options look at the official Python documentation.
The other, and probably better, way is to use the distutils extension provided with Cython. The benefit of this method is that it supplies the platform-specific compilation options for you, acting like a stripped down autotools.
Basic setup.py¶
The distutils extension provided with Cython allows you to pass .pyx files directly to the Extension constructor in your setup file.
If you have a single Cython file that you want to turn into a compiled extension, say with filename example.pyx, the associated setup.py would be:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("example", ["example.pyx"])]
)
To understand the setup.py more fully look at the official distutils documentation. To compile the extension for use in the current directory use:
$ python setup.py build_ext --inplace
Cython Files Depending on C Files¶
When you have some C files that have been wrapped with Cython and you want to compile them into your extension, the basic setup.py file to do this would be:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
sourcefiles = ['example.pyx', 'helper.c', 'another_helper.c']
setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("example", sourcefiles)]
)
Notice that the list of source files has been given a name. This is not necessary, but it makes the file easier to format if the list gets long.
The Extension class takes many options, and a fuller explanation can be found in the distutils documentation. Some useful options to know about are include_dirs, libraries, and library_dirs which specify where to find the .h and library files when linking to external libraries.
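As a rough illustration of those three options, the keyword arguments for an Extension that links against an external library might be collected like this. The library name spam and the /opt/spam paths are placeholders invented for the sketch, and make_extension is a hypothetical helper; it imports setuptools, whose Extension class accepts the same keywords as the distutils one used above.

```python
# Placeholder values: the "spam" library and the /opt/spam paths are
# assumptions for illustration, not something the example project defines.
ext_options = {
    "name": "example",
    "sources": ["example.pyx", "helper.c"],
    "include_dirs": ["/opt/spam/include"],  # directories searched for .h files
    "libraries": ["spam"],                  # library names to link against
    "library_dirs": ["/opt/spam/lib"],      # directories searched for those libraries
}


def make_extension(options):
    """Build an Extension from the options above.

    The import is done lazily so that this sketch can be read and run
    without a build toolchain installed.
    """
    from setuptools import Extension
    return Extension(**options)
```

In a real setup.py the resulting object would be passed in the ext_modules list, in place of the Extension("example", sourcefiles) call shown above.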
Multiple Cython Files in a Package¶
TODO
Distributing Cython modules¶
It is strongly recommended that you distribute the generated .c files as well as your Cython sources, so that users can install your module without needing to have Cython available.
It is also recommended that Cython compilation not be enabled by default in the version you distribute. Even if users have Cython installed, they probably don’t want to use it just to install your module. Also, the version they have may not be the same one you used, and may not compile your sources correctly.
This simply means that the setup.py file that you ship will be a normal distutils file that builds from the generated .c files. For the basic example we would have instead:
from distutils.core import setup
from distutils.extension import Extension
setup(
    ext_modules = [Extension("example", ["example.c"])]
)
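One possible way to keep a single setup.py for both situations is to choose the source file with a switch that is off by default. The sketch below shows only that selection logic; the function name extension_sources and the use_cython flag are assumptions for illustration, not a Cython or distutils feature.

```python
def extension_sources(module, use_cython=False):
    """Return the source list for a module.

    By default the shipped, pre-generated .c file is used; the .pyx
    source is chosen only when the developer opts in explicitly.
    """
    suffix = ".pyx" if use_cython else ".c"
    return [module + suffix]


default_sources = extension_sources("example")                     # what end users build
developer_sources = extension_sources("example", use_cython=True)  # opt-in for developers
```

When the .pyx file is selected, the setup() call would also need the build_ext command class from Cython.Distutils, as in the earlier examples.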
Pyximport¶
Cython is a compiler. Therefore it is natural that people tend to go through an edit/compile/test cycle with Cython modules. But my personal opinion is that one of the deep insights in Python’s implementation is that a language can be compiled (Python modules are compiled to .pyc files) and hide that compilation process from the end-user so that they do not have to worry about it. Pyximport does this for Cython modules. For instance if you write a Cython module called foo.pyx, with Pyximport you can import it in a regular Python module like this:
import pyximport; pyximport.install()
import foo
Doing so will result in the compilation of foo.pyx (with appropriate exceptions if it has an error in it).
If you would always like to import Cython files without building them specially, you can also add the first line above to your sitecustomize.py. That will install the hook every time you run Python. Then you can use Cython modules just with simple import statements. I like to test my Cython modules like this:
$ python -c "import foo"
Dependency Handling¶
In Pyximport 1.1 it is possible to declare that your module depends on multiple files (likely .h and .pxd files). If your Cython module is named foo and thus has the filename foo.pyx then you should make another file in the same directory called foo.pyxdep. The modname.pyxdep file can be a list of filenames or “globs” (like *.pxd or include/*.h). Each filename or glob must be on a separate line. Pyximport will check the file date for each of those files before deciding whether to rebuild the module. In order to keep track of the fact that the dependency has been handled, Pyximport updates the modification time of your “.pyx” source file. Future versions may do something more sophisticated like informing distutils of the dependencies directly.
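To make the mechanism concrete, here is a small model of such a dependency check written with the standard library only. It is a sketch under assumptions, not Pyximport's actual code: the helper names are invented, and comparing each dependency's modification time against that of the .pyx file is one plausible reading of the description above.

```python
import glob
import os
import tempfile


def read_pyxdep(pyx_path):
    """Return the files matched by the patterns listed in '<module>.pyxdep'."""
    dep_path = os.path.splitext(pyx_path)[0] + ".pyxdep"
    if not os.path.exists(dep_path):
        return []
    base_dir = glob.escape(os.path.dirname(pyx_path))
    matches = []
    with open(dep_path) as dep_file:
        for line in dep_file:
            pattern = line.strip()  # one filename or glob per line
            if pattern:
                matches.extend(sorted(glob.glob(os.path.join(base_dir, pattern))))
    return matches


def newer_dependencies(pyx_path):
    """Return the dependencies modified more recently than the .pyx file."""
    pyx_time = os.path.getmtime(pyx_path)
    return [dep for dep in read_pyxdep(pyx_path) if os.path.getmtime(dep) > pyx_time]


# Demonstration in a scratch directory: foo.pyx depends on every .pxd file.
with tempfile.TemporaryDirectory() as workdir:
    pyx = os.path.join(workdir, "foo.pyx")
    pxd = os.path.join(workdir, "shared.pxd")
    for path in (pyx, pxd):
        open(path, "w").close()
    with open(os.path.join(workdir, "foo.pyxdep"), "w") as dep_file:
        dep_file.write("*.pxd\n")
    os.utime(pyx, (1000, 1000))  # pretend foo.pyx is old
    os.utime(pxd, (2000, 2000))  # and shared.pxd was edited later
    stale = [os.path.basename(p) for p in newer_dependencies(pyx)]
```

A non-empty result would mean the module needs rebuilding; touching the .pyx file afterwards, as the text describes, resets the comparison.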
Limitations¶
Pyximport does not give you any control over how your Cython file is compiled. Usually the defaults are fine. You might run into problems if you wanted to write your program in half-C, half-Cython and build them into a single library. Pyximport 1.2 will probably do this.
Pyximport does not hide the Distutils/GCC warnings and errors generated by the import process. Arguably this will give you better feedback if something went wrong and why. And if nothing went wrong it will give you the warm fuzzy that pyximport really did rebuild your module as it was supposed to.
For further thought and discussion¶
I don’t think that Python’s reload() will do anything for changed .so files on some (all?) platforms. It would require some (easy) experimentation that I haven’t gotten around to. But reload is rarely used in applications outside of the Python interactive interpreter and certainly not used much for C extension modules. Info about Windows: http://mail.python.org/pipermail/python-list/2001-July/053798.html
setup.py install does not modify sitecustomize.py for you. Should it? Modifying Python’s “standard interpreter” behaviour may be more than most people expect of a package they install.
Pyximport puts your .c file beside your .pyx file (analogous to .pyc beside .py). But it puts the platform-specific binary in a build directory as per normal for Distutils. If I could wave a magic wand and get Cython or distutils or whoever to put the build directory I might do it but not necessarily: having it at the top level is VERY HELPFUL for debugging Cython problems.
Using C++ in Cython¶
Overview¶
Cython v0.13 introduces native support for most of the C++ language. This means that the previous tricks that were used to wrap C++ classes (as described in http://wiki.cython.org/WrappingCPlusPlus_ForCython012AndLower) are no longer needed.
Wrapping C++ classes with Cython is now much more straightforward. This document describes in detail the new way of wrapping C++ code.
What’s new in Cython v0.13 about C++¶
For users of previous Cython versions, here is a brief overview of the main new features of Cython v0.13 regarding C++ support:
- C++ objects can now be dynamically allocated with new and del keywords.
- C++ objects can now be stack-allocated.
- C++ classes can be declared with the new keyword cppclass.
- Templated classes are supported.
- Overloaded functions are supported.
- Overloading of C++ operators (such as operator+, operator[], ...) is supported.
Procedure Overview¶
The general procedure for wrapping a C++ file can now be described as follows:
- Specify C++ language in the setup.py script
- Create cdef extern from blocks, adding the optional namespace clause (with the namespace name given as a string) if the C++ code lives in a namespace
- Declare classes as cdef cppclass blocks
- Declare public attributes (variables, methods and constructors)
A simple Tutorial¶
An example C++ API¶
Here is a tiny C++ API which we will use as an example throughout this document. Let’s assume it will be in a header file called Rectangle.h:
namespace shapes {
    class Rectangle {
    public:
        int x0, y0, x1, y1;
        Rectangle(int x0, int y0, int x1, int y1);
        ~Rectangle();
        int getLength();
        int getHeight();
        int getArea();
        void move(int dx, int dy);
    };
}
and the implementation in the file called Rectangle.cpp:
#include "Rectangle.h"
using namespace shapes;  // Rectangle is declared inside namespace shapes
Rectangle::Rectangle(int X0, int Y0, int X1, int Y1)
{
    x0 = X0;
    y0 = Y0;
    x1 = X1;
    y1 = Y1;
}
Rectangle::~Rectangle()
{
}
int Rectangle::getLength()
{
    return (x1 - x0);
}
int Rectangle::getHeight()
{
    return (y1 - y0);
}
int Rectangle::getArea()
{
    return (x1 - x0) * (y1 - y0);
}
void Rectangle::move(int dx, int dy)
{
    x0 += dx;
    y0 += dy;
    x1 += dx;
    y1 += dy;
}
This is pretty dumb, but should suffice to demonstrate the steps involved.
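When the wrapper is eventually tested from Python, it can help to have the expected numbers at hand. The following pure-Python stand-in mirrors the arithmetic of the C++ class; it is not part of the wrapping procedure, and the class name RectangleModel is an assumption chosen to avoid confusion with the real type.

```python
class RectangleModel:
    """Pure-Python stand-in with the same arithmetic as shapes::Rectangle."""

    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1

    def getLength(self):
        return self.x1 - self.x0

    def getHeight(self):
        return self.y1 - self.y0

    def getArea(self):
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def move(self, dx, dy):
        # Shift both corners by the same offset, as the C++ method does.
        self.x0 += dx
        self.y0 += dy
        self.x1 += dx
        self.y1 += dy


rect = RectangleModel(1, 2, 3, 4)
length, height, area = rect.getLength(), rect.getHeight(), rect.getArea()
rect.move(10, 20)
corners = (rect.x0, rect.y0, rect.x1, rect.y1)
```

A wrapped Rectangle constructed with the same four arguments should report the same length, height and area.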
Specify C++ language in setup.py¶
In Cython setup.py scripts, one normally instantiates an Extension object. To make Cython generate and compile a C++ source, you just need to add the keyword language="c++" to your Extension construction statement, as in:
ext = Extension(
    "rectangle",                         # name of extension
    ["rectangle.pyx", "Rectangle.cpp"],  # our Cython source and the C++ implementation
    language="c++",                      # this causes Cython to create C++ source
    include_dirs=[...],                  # usual stuff
    libraries=["stdc++", ...],           # ditto
    extra_link_args=[...],               # if needed
)
# cmdclass = {'build_ext': build_ext} is passed to setup(), not to Extension
Cython will generate and compile the rectangle.cpp file (from rectangle.pyx), then it will compile Rectangle.cpp (implementation of the Rectangle class) and link both object files together into rectangle.so, which you can then import in Python using import rectangle (if you forget to link the Rectangle.o, you will get missing symbols while importing the library in Python).
Alternatively, one can also use the cython command-line utility to generate a C++ .cpp file, and then compile it into a Python extension. C++ mode for the cython command is turned on with the --cplus option.
Declaring a C++ class interface¶
The procedure for wrapping a C++ class is quite similar to that for wrapping normal C structs, with a couple of additions. Let’s start here by creating the basic cdef extern from block:
cdef extern from "Rectangle.h" namespace "shapes":
This will make the C++ class def for Rectangle available. Note the namespace declaration.
Declare class with cdef cppclass¶
Now, let’s add the Rectangle class to this extern from block - just copy the class name from Rectangle.h and adjust for Cython syntax, so now it becomes:
cdef extern from "Rectangle.h" namespace "shapes":
    cdef cppclass Rectangle:
Add public attributes¶
We now need to declare the constructor, attributes and methods for use in Cython:
cdef extern from "Rectangle.h" namespace "shapes":
    cdef cppclass Rectangle:
        Rectangle(int, int, int, int)
        int x0, y0, x1, y1
        int getLength()
        int getHeight()
        int getArea()
        void move(int, int)
Declare a var with the wrapped C++ class¶
Now, we use cdef to declare a var of the class with the C++ new statement:
cdef Rectangle *rec = new Rectangle(1, 2, 3, 4)
cdef int recLength = rec.getLength()
...
del rec #delete heap allocated object
It’s also possible to declare a stack allocated object, but it’s necessary to have a “default” constructor:
cdef extern from "Foo.h":
    cdef cppclass Foo:
        Foo()
cdef Foo foo
Note that, like C++, if the class has only one constructor and it is a default one, it’s not necessary to declare it.
Create Cython wrapper class¶
At this point, we have exposed into our pyx file’s namespace the interface of the C++ Rectangle type. Now, we need to make this accessible from external Python code (which is our whole point).
Common programming practice is to create a Cython extension type which holds a C++ instance pointer as an attribute thisptr, and create a bunch of forwarding methods. So we can implement the Python extension type as:
cdef class PyRectangle:
    cdef Rectangle *thisptr      # hold a C++ instance which we're wrapping
    def __cinit__(self, int x0, int y0, int x1, int y1):
        self.thisptr = new Rectangle(x0, y0, x1, y1)
    def __dealloc__(self):
        del self.thisptr
    def getLength(self):
        return self.thisptr.getLength()
    def getHeight(self):
        return self.thisptr.getHeight()
    def getArea(self):
        return self.thisptr.getArea()
    def move(self, dx, dy):
        self.thisptr.move(dx, dy)
And there we have it. From a Python perspective, this extension type will look and feel just like a natively defined Rectangle class. If you want to give attribute access, you could just implement some properties:
    property x0:
        def __get__(self): return self.thisptr.x0
        def __set__(self, x0): self.thisptr.x0 = x0
    ...
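To make the goal concrete, here is a pure-Python stand-in that mirrors the interface the wrapper is meant to expose. It is only a sketch for comparison: the class name PyRectangleModel is hypothetical, and no C++ code is involved.

```python
class PyRectangleModel:
    """Pure-Python stand-in for the PyRectangle extension type (hypothetical name)."""

    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1

    def getLength(self):
        return self.x1 - self.x0

    def getHeight(self):
        return self.y1 - self.y0

    def getArea(self):
        return self.getLength() * self.getHeight()

    def move(self, dx, dy):
        # Shift both corners by the same offset, as Rectangle::move does.
        self.x0 += dx
        self.y0 += dy
        self.x1 += dx
        self.y1 += dy


rect = PyRectangleModel(1, 2, 3, 4)
area_before = rect.getArea()
rect.move(10, 20)
corners_after = (rect.x0, rect.y0, rect.x1, rect.y1)
```

Code written against this model should behave the same when handed a PyRectangle instance, since the method names and attribute names match those forwarded by the wrapper.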
Advanced C++ features¶
We describe here all the C++ features that were not discussed in the above tutorial.
Overloading¶
Overloading is very simple. Just declare the method with different parameters and use any of them:
cdef extern from "Foo.h":
    cdef cppclass Foo:
        Foo(int)
        Foo(bool)
        Foo(int, bool)
        Foo(int, int)
Overloading operators¶
Cython uses the C++ operator naming for overloading operators:
cdef extern from "foo.h":
    cdef cppclass Foo:
        Foo()
        Foo* operator+(Foo*)
        Foo* operator-(Foo)
        int operator*(Foo*)
        int operator/(int)
cdef Foo* foo = new Foo()
cdef int x
cdef Foo* foo2 = foo[0] + foo
foo2 = foo[0] - foo[0]
x = foo[0] * foo2
x = foo[0] / 1
cdef Foo f
foo = f + &f
foo2 = f - f
del foo, foo2
Nested class declarations¶
C++ allows nested class declaration. Class declarations can also be nested in Cython:
cdef extern from "<vector>" namespace "std":
    cdef cppclass vector[T]:
        cppclass iterator:
            T operator*()
            iterator operator++()
            bint operator==(iterator)
            bint operator!=(iterator)
        vector()
        void push_back(T&)
        T& operator[](int)
        T& at(int)
        iterator begin()
        iterator end()
cdef vector[int].iterator iter #iter is declared as being of type vector<int>::iterator
Note that the nested class is declared with a cppclass but without a cdef.
C++ operators not compatible with Python syntax¶
Cython tries to keep its syntax as close as possible to standard Python. Because of this, certain C++ operators, like the preincrement ++foo or the dereferencing operator *foo, cannot be used with the same syntax as in C++. Cython provides functions replacing these operators in a special module, cython.operator. The functions provided are:
- cython.operator.dereference for dereferencing. dereference(foo) will produce the C++ code *foo
- cython.operator.preincrement for pre-incrementation. preincrement(foo) will produce the C++ code ++foo
- ...
These functions need to be cimported. Of course, one can use a from ... cimport ... as to have shorter and more readable functions. For example: from cython.operator cimport dereference as deref.
Templates¶
Cython uses a bracket syntax for templating. A simple example for wrapping C++ vector:
from cython.operator cimport dereference as deref, preincrement as inc #dereference and increment operators
cdef extern from "<vector>" namespace "std":
    cdef cppclass vector[T]:
        cppclass iterator:
            T operator*()
            iterator operator++()
            bint operator==(iterator)
            bint operator!=(iterator)
        vector()
        void push_back(T&)
        T& operator[](int)
        T& at(int)
        iterator begin()
        iterator end()
cdef vector[int] *v = new vector[int]()
cdef int i
for i in range(10):
    v.push_back(i)
cdef vector[int].iterator it = v.begin()
while it != v.end():
    print deref(it)
    inc(it)
del v
Multiple template parameters can be defined as a list, such as [T, U, V] or [int, bool, char].
Standard library¶
Most of the containers of the C++ Standard Library have been declared in pxd files located in /Cython/Includes/libcpp. These containers are: deque, list, map, pair, queue, set, stack, vector.
For example:
from libcpp.vector cimport vector
cdef vector[int] vect
cdef int i
for i in range(10):
    vect.push_back(i)
for i in range(10):
    print vect[i]
The pxd files in /Cython/Includes/libcpp also work as good examples on how to declare C++ classes.
Exceptions¶
Cython cannot throw C++ exceptions, or catch them with a try-except statement, but it is possible to declare a function as potentially raising a C++ exception and converting it into a Python exception. For example,
cdef extern from "some_file.h":
    cdef int foo() except +
This will try to translate the C++ error into an appropriate Python exception (currently an IndexError on std::out_of_range and a RuntimeError otherwise, preserving the what() message).
cdef int bar() except +MemoryError
This will catch any C++ error and raise a Python MemoryError in its place. (Any Python exception is valid here.)
cdef int raise_py_error()
cdef int something_dangerous() except +raise_py_error
If something_dangerous raises a C++ exception then raise_py_error will be called, which allows one to do custom C++ to Python error “translations.” If raise_py_error does not actually raise an exception a RuntimeError will be raised.
Caveats and Limitations¶
Access to C-only functions¶
Whenever generating C++ code, Cython generates declarations of and calls to functions assuming these functions are C++ (i.e., not declared as extern "C" {...}). This is fine if the C functions have C++ entry points, but if they are C only, you will hit a roadblock. If you have a C++ Cython module needing to make calls to pure-C functions, you will need to write a small C++ shim module which:
- includes the needed C headers in an extern "C" block
- contains minimal forwarding functions in C++, each of which calls the respective pure-C function
Inherited C++ methods¶
If you have a class Foo with a child class Bar, and Foo has a method fred(), then you’ll have to cast to access this method from Bar objects. For example:
cdef class MyClass:
    cdef Bar *b
    ...
    def myfunc(self):
        ...
        self.b.fred()   # wrong, won't work
        (<Foo *>(self.b)).fred() # should work, Cython now thinks it's a 'Foo'
It might take some experimenting by others (you?) to find the most elegant ways of handling this issue.
Declaring/Using References¶
Question: How do you declare and call a function that takes a reference as an argument?
C++ left-values¶
C++ allows functions returning a reference to be left-values. This is currently not supported in Cython. cython.operator.dereference(foo)
is also not considered a left-value.
Cython for NumPy users¶
This tutorial is aimed at NumPy users who have no experience with Cython at all. If you have some knowledge of Cython you may want to skip to the “Efficient indexing” section which explains the new improvements made in summer 2008.
The main scenario considered is NumPy end-use rather than NumPy/SciPy development. The reason is that Cython is not (yet) able to support functions that are generic with respect to datatype and the number of dimensions in a high-level fashion. This restriction is much more severe for SciPy development than more specific, “end-user” functions. See the last section for more information on this.
The style of this tutorial will not fit everybody, so you can also consider:
- Robert Bradshaw’s slides on cython for SciPy2008 (a higher-level and quicker introduction)
- Basic Cython documentation (see Cython front page).
- Spec for the efficient indexing (see enhancements/buffer)
Note
The fast array access documented below is a completely new feature, and there may be bugs waiting to be discovered. It might be a good idea to do a manual sanity check on the C code Cython generates before using this for serious purposes, at least until some months have passed.
Cython at a glance¶
Cython is a compiler which compiles Python-like code files to C code. Still, Cython is not a Python to C translator. That is, it doesn’t take your full program and “turn it into C”; rather, the result makes full use of the Python runtime environment. A way of looking at it may be that your code is still Python in that it runs within the Python runtime environment, but rather than compiling to interpreted Python bytecode one compiles to native machine code (but with the addition of extra syntax for easy embedding of faster C-like code).
This has two important consequences:
- Speed. How much depends very much on the program involved, though. Typical Python numerical programs would tend to gain very little, as most time is spent in lower-level C that is used in a high-level fashion. However, for-loop-style programs can gain many orders of magnitude when typing information is added (which makes such loops a realistic alternative).
- Easy calling into C code. One of Cython’s purposes is to allow easy wrapping of C libraries. When writing code in Cython you can call into C code as easily as into Python code.
Some Python constructs are not yet supported, though making Cython compile all Python code is a stated goal (among the more important omissions are inner functions and generator functions).
Your Cython environment¶
Using Cython consists of these steps:
- Write a .pyx source file
- Run the Cython compiler to generate a C file
- Run a C compiler to generate a compiled library
- Run the Python interpreter and ask it to import the module
However there are several options to automate these steps:
- The SAGE mathematics software system provides excellent support for using Cython and NumPy from an interactive command line (like IPython) or through a notebook interface (like Maple/Mathematica). See this documentation.
- A version of pyximport is shipped with Cython, so that you can import pyx-files dynamically into Python and have them compiled automatically (See Pyximport).
- Cython supports distutils so that you can very easily create build scripts which automate the process. This is the preferred method for full programs.
- Manual compilation (see below)
Note
If using another interactive command line environment than SAGE, like IPython or Python itself, it is important that you restart the process when you recompile the module. It is not enough to issue an “import” statement again.
Installation¶
Unless you are used to some other automatic method: download Cython (0.9.8.1.1 or later), unpack it, and run the usual python setup.py install. This will install a cython executable on your system. It is also possible to use Cython from the source directory without installing (simply launch cython.py in the root directory).
As of this writing SAGE comes with an older release of Cython than required for this tutorial. So if using SAGE you should download the newest Cython and then execute
$ cd path/to/cython-distro
$ path-to-sage/sage -python setup.py install
This will install the newest Cython into SAGE.
Manual compilation¶
As it is always important to know what is going on, I’ll describe the manual method here. First Cython is run:
$ cython yourmod.pyx
This creates yourmod.c which is the C source for a Python extension module. A useful additional switch is -a which will generate a document (yourmod.html) that shows which Cython code translates to which C code line by line.
Then we compile the C file. This may vary according to your system, but the C file should be built like Python was built. Python documentation for writing extensions should have some details. On Linux this often means something like:
$ gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.5 -o yourmod.so yourmod.c
gcc should have access to the NumPy C header files so if they are not installed at /usr/include/numpy or similar you may need to pass another option for those.
This creates yourmod.so in the same directory, which is importable by Python by using a normal import yourmod statement.
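The include path in the gcc command above is specific to one Python installation. As a hedged sketch, the standard sysconfig module can report the matching values for whichever interpreter runs the script; the variable names here are only illustrative.

```python
import sysconfig

# Directory holding Python.h for the interpreter running this script.
include_dir = sysconfig.get_paths()["include"]

# File suffix this interpreter expects for extension modules; fall back to
# ".so" if the build does not define EXT_SUFFIX.
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"

# Flag that could replace the hard-coded -I/usr/include/python2.5 above.
include_flag = "-I" + include_dir
print(include_flag, ext_suffix)
```

The NumPy headers are a separate matter; their location has to be passed with an additional -I option as described above.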
The first Cython program¶
The code below does 2D discrete convolution of an image with a filter (and I’m sure you can do better! Let it serve for demonstration purposes). It is both valid Python and valid Cython code. I’ll refer to it as both convolve_py.py for the Python version and convolve1.pyx for the Cython version; Cython uses “.pyx” as its file suffix.
from __future__ import division
import numpy as np
def naive_convolve(f, g):
    # f is an image and is indexed by (v, w)
    # g is a filter kernel and is indexed by (s, t),
    #   it needs odd dimensions
    # h is the output image and is indexed by (x, y),
    #   it is not cropped
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # smid and tmid are number of pixels between the center pixel
    # and the edge, ie for a 5x5 filter they will be 2.
    #
    # The output size is calculated by adding smid, tmid to each
    # side of the dimensions of the input image.
    vmax = f.shape[0]
    wmax = f.shape[1]
    smax = g.shape[0]
    tmax = g.shape[1]
    smid = smax // 2
    tmid = tmax // 2
    xmax = vmax + 2*smid
    ymax = wmax + 2*tmid
    # Allocate result image.
    h = np.zeros([xmax, ymax], dtype=f.dtype)
    # Do convolution
    for x in range(xmax):
        for y in range(ymax):
            # Calculate pixel value for h at (x,y). Sum one component
            # for each pixel (s, t) of the filter g.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
This should be compiled to produce yourmod.so (for Linux systems). We run a Python session to test both the Python version (imported from .py-file) and the compiled Cython module.
In [1]: import numpy as np
In [2]: import convolve_py
In [3]: convolve_py.naive_convolve(np.array([[1, 1, 1]], dtype=np.int),
... np.array([[1],[2],[1]], dtype=np.int))
Out [3]:
array([[1, 1, 1],
[2, 2, 2],
[1, 1, 1]])
In [4]: import convolve1
In [4]: convolve1.naive_convolve(np.array([[1, 1, 1]], dtype=np.int),
... np.array([[1],[2],[1]], dtype=np.int))
Out [4]:
array([[1, 1, 1],
[2, 2, 2],
[1, 1, 1]])
In [11]: N = 100
In [12]: f = np.arange(N*N, dtype=np.int).reshape((N,N))
In [13]: g = np.arange(81, dtype=np.int).reshape((9, 9))
In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g)
2 loops, best of 3: 1.86 s per loop
In [20]: %timeit -n2 -r3 convolve1.naive_convolve(f, g)
2 loops, best of 3: 1.41 s per loop
There’s not such a huge difference yet, because the C code still does exactly what the Python interpreter does (meaning, for instance, that a new object is allocated for each number used). Look at the generated html file and see what is needed for even the simplest statements; you get the point quickly. We need to give Cython more information; we need to add types.
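To follow the index arithmetic without NumPy or a compiler, the same loops can be run on nested lists. This is a sketch for checking the small 1x3 example from the session above; the function name naive_convolve_lists is hypothetical and it is not meant as a fast implementation.

```python
def naive_convolve_lists(f, g):
    """Same loops as naive_convolve, but on lists of lists instead of arrays."""
    vmax, wmax = len(f), len(f[0])
    smax, tmax = len(g), len(g[0])
    if smax % 2 != 1 or tmax % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    smid, tmid = smax // 2, tmax // 2
    xmax, ymax = vmax + 2 * smid, wmax + 2 * tmid
    # Allocate the (uncropped) result image.
    h = [[0] * ymax for _ in range(xmax)]
    for x in range(xmax):
        for y in range(ymax):
            # Clamp the filter window so that (v, w) stays inside f.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s][tmid - t] * f[v][w]
            h[x][y] = value
    return h


result = naive_convolve_lists([[1, 1, 1]], [[1], [2], [1]])
print(result)  # [[1, 1, 1], [2, 2, 2], [1, 1, 1]]
```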
Adding types¶
To add types we use custom Cython syntax, so we are now breaking Python source compatibility. Here’s convolve2.pyx. Read the comments!
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# The builtin min and max functions work with Python objects, and are
# so very slow. So we create our own.
# - "cdef" declares a function which has much less overhead than a normal
# def function (but it is not Python-callable)
# - "inline" is passed on to the C compiler which may inline the functions
# - The C type "int" is chosen as return type and argument types
# - Cython allows some newer Python constructs like "a if x else b", but
# the resulting C file compiles with Python 2.3 through to Python 3.0 beta.
cdef inline int int_max(int a, int b): return a if a >= b else b
cdef inline int int_min(int a, int b): return a if a <= b else b
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
def naive_convolve(np.ndarray f, np.ndarray g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = int_max(smid - x, -smid)
            s_to = int_min((xmax - x) - smid, smid + 1)
            t_from = int_max(tmid - y, -tmid)
            t_to = int_min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
At this point, have a look at the generated C code for convolve1.pyx and convolve2.pyx. Click on the lines to expand them and see corresponding C. (Note that this code annotation is currently experimental and especially “trailing” cleanup code for a block may stick to the last expression in the block and make it look worse than it is; use some common sense.)
Especially have a look at the for loops: in convolve1.c, these are ~20 lines of C code to set up, while in convolve2.c a normal C for loop is used.
After building this and continuing my (very informal) benchmarks, I get:
In [21]: import convolve2
In [22]: %timeit -n2 -r3 convolve2.naive_convolve(f, g)
2 loops, best of 3: 828 ms per loop
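The comments in convolve2.pyx warn that a C-typed value wraps around on overflow. The effect can be previewed from plain Python with the standard ctypes module. This is only an illustration and assumes a 32-bit signed integer, whereas the actual width of DTYPE_t depends on the platform.

```python
import ctypes

INT32_MAX = 2**31 - 1

# A Python integer simply grows to hold the result.
python_sum = INT32_MAX + 1

# A fixed-width C integer keeps only the low 32 bits, so the value wraps.
c_style_sum = ctypes.c_int32(INT32_MAX + 1).value

print(python_sum, c_style_sum)  # 2147483648 -2147483648
```

If the sums in your own filter could exceed the range of the chosen type, a wider DTYPE is the safer choice.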
Efficient indexing¶
There’s still a bottleneck killing performance, and that is the array lookups and assignments. The []-operator still uses full Python operations; what we would like to do instead is to access the data buffer directly at C speed.
What we need to do then is to type the contents of the ndarray objects. We do this with a special “buffer” syntax which must be told the datatype (first argument) and number of dimensions (“ndim” keyword-only argument, if not provided then one-dimensional is assumed).
More information on this syntax can be found in the buffer specification (enhancements/buffer).
Showing the changes needed to produce convolve3.pyx only:
...
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    ...
    cdef np.ndarray[DTYPE_t, ndim=2] h = ...
Usage:
In [18]: import convolve3
In [19]: %timeit -n3 -r100 convolve3.naive_convolve(f, g)
3 loops, best of 100: 11.6 ms per loop
Note the importance of this change.
Gotcha: This efficient indexing only affects certain index operations, namely those with exactly ndim number of typed integer indices. So if v for instance isn’t typed, then the lookup f[v, w] isn’t optimized. On the other hand this means that you can continue using Python objects for sophisticated dynamic slicing etc. just as when the array is not typed.
Tuning indexing further¶
The array lookups are still slowed down by two factors:
- Bounds checking is performed.
- Negative indices are checked for and handled correctly.
The code above is explicitly coded so that it doesn’t use negative indices, and it (hopefully) always accesses within bounds. We can add a decorator to disable bounds checking:
...
cimport cython
@cython.boundscheck(False) # turn off bounds-checking for entire function
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    ...
Now bounds checking is not performed (and, as a side-effect, if you do happen to access out of bounds you will in the best case crash your program and in the worst case corrupt data). It is possible to switch bounds-checking mode in many ways; see the compiler directives documentation (docs/compilerdirectives) for more information.
Negative indices are dealt with by assuring Cython that the indices will be positive, by casting the variables to unsigned integer types (if you do have negative values, then this casting will create a very large positive value instead and you will attempt to access out-of-bounds values). Casting is done with a special <>-syntax. The code below is changed to use either unsigned ints or casting as appropriate:
...
    cdef int s, t                   # changed
    cdef unsigned int x, y, v, w    # changed
    cdef int s_from, s_to, t_from, t_to
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = <unsigned int>(x - smid + s) # changed
                    w = <unsigned int>(y - tmid + t) # changed
                    value += g[<unsigned int>(smid - s), <unsigned int>(tmid - t)] * f[v, w] # changed
            h[x, y] = value
...
(In the next Cython release we will likely add a compiler directive or argument to the np.ndarray[]-type specifier to disable negative indexing so that casting so much isn’t necessary; feedback on this is welcome.)
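The “very large positive value” produced by casting a negative index can be seen from plain Python as well. The sketch below assumes a 32-bit unsigned int and uses ctypes only to mimic the cast; it does not touch any array.

```python
import ctypes

row = [10, 20, 30]

# In Python, -1 counts from the end of the sequence.
python_last = row[-1]

# Reinterpreted as a 32-bit unsigned int, -1 becomes the largest value,
# which is far outside the bounds of any realistic array.
as_unsigned = ctypes.c_uint32(-1).value

print(python_last, as_unsigned)  # 30 4294967295
```

With bounds checking switched off, an index like this is not caught, which is why the cast is only safe when the indices are known to be non-negative.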
The function call overhead now starts to play a role, so we compare the latter two examples with larger N:
In [11]: %timeit -n3 -r100 convolve4.naive_convolve(f, g)
3 loops, best of 100: 5.97 ms per loop
In [12]: N = 1000
In [13]: f = np.arange(N*N, dtype=np.int).reshape((N,N))
In [14]: g = np.arange(81, dtype=np.int).reshape((9, 9))
In [17]: %timeit -n1 -r10 convolve3.naive_convolve(f, g)
1 loops, best of 10: 1.16 s per loop
In [18]: %timeit -n1 -r10 convolve4.naive_convolve(f, g)
1 loops, best of 10: 597 ms per loop
(Also this is a mixed benchmark as the result array is allocated within the function call.)
Warning
Speed comes with some cost. Especially it can be dangerous to set typed objects (like f, g and h in our sample code) to None. Setting such objects to None is entirely legal, but all you can do with them is check whether they are None. All other use (attribute lookup or indexing) can potentially segfault or corrupt data (rather than raising exceptions as they would in Python).
The actual rules are a bit more complicated but the main message is clear: Do not use typed objects without knowing that they are not set to None.
More generic code¶
It would be possible to do:
def naive_convolve(object[DTYPE_t, ndim=2] f, ...):
i.e. use object rather than np.ndarray. Under Python 3.0 this can allow your algorithm to work with any libraries supporting the buffer interface; and support for e.g. the Python Imaging Library may easily be added if someone is interested also under Python 2.x.
There is some speed penalty to this though (as one makes more assumptions compile-time if the type is set to np.ndarray, specifically it is assumed that the data is stored in pure strided mode and not in indirect mode).
More information: see the buffer specification (enhancements/buffer).
The future¶
These are some points to consider for further development. All points listed here have gone through a lot of thinking and planning already; still they may or may not happen depending on available developer time and resources for Cython.
- Support for efficient access to structs/records stored in arrays; currently only primitive types are allowed.
- Support for efficient access to complex floating point types in arrays. The main obstacle here is getting support for efficient complex datatypes in Cython.
- Calling NumPy/SciPy functions currently has a Python call overhead; it would be possible to take a short-cut from Cython directly to C. (This does however require some isolated and incremental changes to those libraries; mail the Cython mailing list for details).
- Efficient code that is generic with respect to the number of dimensions. This can probably be done today by calling the NumPy C multi-dimensional iterator API directly; however it would be nice to have for-loops over enumerate() and ndenumerate() on NumPy arrays create efficient code.
- A high-level construct for writing type-generic code, so that one can write functions that work simultaneously with many datatypes. Note however that a macro preprocessor language can help with doing this for now.
Limitations¶
Unsupported Python Features¶
One of our goals is to make Cython as compatible as possible with standard Python. This page lists the things that work in Python but not in Cython. As Cython matures, the items in this list should go away.
Generators and generator expressions¶
The yield keyword is not yet supported. This is work in progress.
Since Cython 0.13, some generator expressions are supported when they can be transformed into inlined loops in combination with builtins, e.g. sum(x*2 for x in seq). As of 0.14, the supported builtins are list(), set(), dict(), sum(), any(), all(), sorted().
Other Current Limitations¶
- The globals() builtin returns the last Python caller’s globals, not the current function’s locals. This behavior should not be relied upon, as it will probably change in the future.
- The locals() builtin can only be used if all local variables can be converted to Python objects, and returns a dict.
- Class and function definitions cannot be placed inside control structures.
Semantic differences between Python and Cython¶
Behaviour of class scopes¶
In Python, referring to a method of a class inside the class definition, i.e. while the class is being defined, yields a plain function object, but in Cython it yields an unbound method [1]. A consequence of this is that the usual idiom for using the classmethod() and staticmethod() functions, e.g.:
class Spam:
    def method(cls):
        ...
    method = classmethod(method)
will not work in Cython. This can be worked around by defining the function outside the class, and then assigning the result of classmethod or staticmethod inside the class, i.e.:
def Spam_method(cls):
    ...
class Spam:
    method = classmethod(Spam_method)
This will change in the near future.
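The workaround is itself ordinary Python, so it can be tried in the interpreter first. In this small sketch the function body is filled in with a placeholder return value purely for illustration.

```python
def Spam_method(cls):
    # Placeholder body: report which class the method was called on.
    return cls.__name__


class Spam:
    # Wrap the module-level function instead of one defined in the class body.
    method = classmethod(Spam_method)


name_from_class = Spam.method()
name_from_instance = Spam().method()
print(name_from_class, name_from_instance)  # Spam Spam
```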
Footnotes
[1] The reason for the different behaviour of class scopes is that Cython-defined Python functions are PyCFunction objects, not PyFunction objects, and are not recognised by the machinery that creates a bound or unbound method when a function is extracted from a class. To get around this, Cython wraps each method in an unbound method object itself before storing it in the class’s dictionary.
Differences between Cython and Pyrex¶
Warning
Both Cython and Pyrex are moving targets. It has come to the point that an explicit list of all the differences between the two projects would be laborious to compile and track, but hopefully this high-level list gives an idea of the differences that are present. It should be noted that both projects make an effort at mutual compatibility, but Cython’s goal is to be as close to, and as complete as, Python as is reasonable.
Python 3.0 Support¶
Cython creates .c files that can be built and used with both Python 2.x and Python 3.x. In fact, compiling your module with Cython may very well be the easiest way to port code to Python 3.0. We are also working to make the compiler run in both Python 2.x and 3.0.
Many Python 3 constructs are already supported by Cython.
List/Set/Dict Comprehensions¶
Cython supports the different comprehensions defined by Python 3.0 for lists, sets and dicts:
[expr(x) for x in A] # list
{expr(x) for x in A} # set
{key(x) : value(x) for x in A} # dict
Looping is optimized if A is a list, tuple or dict. You can use the for ... from syntax, too, but it is generally preferred to use the usual for ... in range(...) syntax with a C run variable (e.g. cdef int i).
Note
Note that Cython also supports set literals starting from Python 2.3.
Keyword-only arguments¶
Python functions can have keyword-only arguments listed after the * parameter and before the ** parameter if any, e.g.:
def f(a, b, *args, c, d = 42, e, **kwds):
    ...
Here c, d and e cannot be passed as positional arguments and must be passed as keyword arguments. Furthermore, c and e are required keyword arguments, since they do not have a default value.
If the parameter name after the * is omitted, the function will not accept any extra positional arguments, e.g.:
def g(a, b, *, c, d):
    ...
takes exactly two positional parameters and has two required keyword parameters.
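Both signatures are also valid in Python 3, so their behaviour can be checked directly. In this short sketch the function bodies just return their arguments, which is an illustrative choice.

```python
def f(a, b, *args, c, d=42, e, **kwds):
    return args, c, d, e, kwds


def g(a, b, *, c, d):
    return a, b, c, d


# Extra positionals land in args; c and e must be given by name.
f_result = f(1, 2, 3, c=4, e=5, extra=6)

g_result = g(1, 2, c=3, d=4)

# Passing c and d positionally is rejected.
try:
    g(1, 2, 3, 4)
except TypeError:
    rejected = True
else:
    rejected = False

print(f_result, g_result, rejected)
```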
Conditional expressions “x if b else y” (python 2.5)¶
Conditional expressions as described in http://www.python.org/dev/peps/pep-0308/:
X if C else Y
Only one of X and Y is evaluated (depending on the value of C).
cdef inline¶
Module level functions can now be declared inline, with the inline keyword passed on to the C compiler. These can be as fast as macros:
cdef inline int something_fast(int a, int b):
    return a*a + b
Note that class-level cdef functions are handled via a virtual function table, so the compiler won’t be able to inline them in almost all cases.
Assignment on declaration (e.g. “cdef int spam = 5”)¶
In Pyrex, one must write:
cdef int i, j, k
i = 2
j = 5
k = 7
Now, with cython, one can write:
cdef int i = 2, j = 5, k = 7
The expression on the right hand side can be arbitrarily complicated, e.g.:
cdef int n = python_call(foo(x,y), a + b + c) - 32
‘by’ expression in for loop (e.g. “for i from 0 <= i < 10 by 2”)¶
for i from 0 <= i < 10 by 2:
print i
yields:
0
2
4
6
8
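For comparison, the same sequence can be produced in plain Python with a stepped `range`. This is only an equivalent-values sketch and says nothing about the C code Cython generates for the loop.

```python
# Lower bound inclusive, upper bound exclusive, step 2:
# the same values as "for i from 0 <= i < 10 by 2".
values = list(range(0, 10, 2))
```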
Boolean int type (e.g. it acts like a c int, but coerces to/from python as a boolean)¶
In C, ints are used for truth values. In Python, any object can be used as a
truth value (using the __nonzero__() method), but the canonical choices
are the two boolean objects True and False. The bint (for “boolean int”) type
is compiled to a C int, but is coerced to and from Python as a boolean. The
return type of comparisons and several builtins is a bint as well. This allows
one to avoid having to wrap things in bool(). For example, one can write:
def is_equal(x):
    return x == y
which would return 1
or 0
in Pyrex, but returns True
or False
in Cython. One can declare variables and return values for functions to be of the
bint type. For example:
cdef int i = x
cdef bint b = x
The first conversion would happen via x.__int__()
whereas the second would
happen via x.__nonzero__()
. (Actually, if x
is the python object
True
or False
then no method call is made.)
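The two conversion paths can be mimicked in plain Python. The sketch below assumes Python 3, where the truth-value hook is spelled `__bool__` rather than `__nonzero__`; the `Flag` class is hypothetical and deliberately gives the two hooks different answers.

```python
class Flag:
    def __int__(self):
        return 7          # used by int(x), analogous to "cdef int i = x"

    def __bool__(self):   # spelled __nonzero__ in Python 2
        return False      # used by bool(x), analogous to "cdef bint b = x"

x = Flag()
as_int = int(x)
as_bool = bool(x)
```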
Executable class bodies¶
Including a working classmethod()
:
cdef class Blah:
    def some_method(self):
        print self
    some_method = classmethod(some_method)
    a = 2*3
    print "hi", a
cpdef functions¶
Cython adds a third function type on top of the usual def
and
cdef
. If a function is declared cpdef
it can be called
from and overridden by both extension and normal python subclasses. You can
essentially think of a cpdef
method as a cdef
method +
some extras. (That’s how it’s implemented at least.) First, it creates a
def
method that does nothing but call the underlying
cdef
method (and does argument unpacking/coercion if needed). At
the top of the cdef
method a little bit of code is added to check
to see if it’s overridden. Specifically, in pseudocode:
if type(self) has a __dict__:
    foo = self.getattr('foo')
    if foo is not wrapper_foo:
        return foo(args)
[cdef method body]
To detect whether or not a type has a dictionary, it just checks the
tp_dictoffset slot, which is NULL
(by default) for extension types, but
non-null for instance classes. If the dictionary exists, it does a single
attribute lookup and can tell (by comparing pointers) whether or not the
returned result is actually a new function. If, and only if, it is a new
function, then the arguments are packed into a tuple and the method is called. This
is all very fast. A flag is set so this lookup does not occur if one calls the
method on the class directly, e.g.:
cdef class A:
    cpdef foo(self):
        pass
x = A()
x.foo()  # will check to see if overridden
A.foo(x) # will call A's implementation whether overridden or not
See Early Binding for Speed for explanation and usage tips.
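The override check described in the pseudocode above can be imitated in plain Python. This is only a rough model, not the code Cython generates: `call_foo` and `_foo_impl` are hypothetical names, and `__slots__ = ()` is used here merely to give the base class no instance `__dict__`, roughly like an extension type.

```python
class A:
    __slots__ = ()            # no instance __dict__, roughly like an extension type

    def _foo_impl(self):      # stands in for the cdef method body
        return "A"

    def foo(self):            # stands in for the generated def wrapper
        return call_foo(self)


def call_foo(obj):
    # Mimics the check placed at the top of the cdef method.
    if hasattr(obj, "__dict__"):        # only Python subclasses have a __dict__ here
        candidate = type(obj).foo       # single attribute lookup
        if candidate is not A.foo:      # identity test, like comparing pointers
            return candidate(obj)       # dispatch to the Python override
    return A._foo_impl(obj)             # fast path: run the original body


class B(A):                   # Python subclass that overrides foo
    def foo(self):
        return "B"


class C(A):                   # Python subclass that does not override foo
    pass
```

With these definitions, `call_foo(A())` and `call_foo(C())` both take the fast path, while `call_foo(B())` dispatches to the override.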
Automatic range conversion¶
This will convert statements of the form for i in range(...)
to for i
from ...
when i
is any cdef’d integer type, and the direction (i.e. sign
of step) can be determined.
Warning
This may change the semantics if the range causes
assignment to i
to overflow. Specifically, if this option is set, an error
will be raised before the loop is entered, whereas without this option the loop
will execute until an overflowing value is encountered. If this affects you,
change Cython/Compiler/Options.py
(eventually there will be a better
way to set this).
More friendly type casting¶
In Pyrex, if one types <int>x
where x
is a Python object, one will get
the memory address of x
. Likewise, if one types <object>i
where i
is a C int, one will get an “object” at location i
in memory. This leads
to confusing results and segfaults.
In Cython <type>x
will try and do a coercion (as would happen on assignment of
x
to a variable of type type) if exactly one of the types is a python object.
It does not stop one from casting where there is no conversion (though it will
emit a warning). If one really wants the address, cast to a void *
first.
As in Pyrex, <MyExtensionType>x will cast x to type MyExtensionType without any
type checking. Cython supports the syntax <MyExtensionType?> to do the cast
with type checking (i.e. it will throw an error if x is not a (subclass of)
MyExtensionType).
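The behaviour of the checked cast can be approximated in plain Python. `checked_cast`, `Base` and `Derived` are hypothetical names, and raising a `TypeError` is an assumption of this sketch; the text above only says that an error is thrown.

```python
def checked_cast(expected_type, x):
    # Hypothetical helper: accept instances of expected_type or of its subclasses.
    if not isinstance(x, expected_type):
        raise TypeError("Cannot convert %s to %s"
                        % (type(x).__name__, expected_type.__name__))
    return x


class Base:
    pass


class Derived(Base):
    pass


d = Derived()
same = checked_cast(Base, d)     # a subclass instance passes the check

try:
    checked_cast(Base, 42)       # an unrelated object is rejected
    failed = False
except TypeError:
    failed = True
```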
Optional arguments in cdef/cpdef functions¶
Cython now supports optional arguments for cdef
and
cpdef
functions.
The syntax in the .pyx
file remains as in Python, but one declares such
functions in the .pxd
file by writing cdef foo(x=*)
. The number of
arguments may increase on subclassing, but the argument types and order must
remain the same. There is a slight performance penalty in some cases when a
cdef/cpdef function without any optional arguments is overridden with one that does have
default argument values.
For example, one can have the .pxd
file:
cdef class A:
    cdef foo(self)
cdef class B(A):
    cdef foo(self, x=*)
cdef class C(B):
    cpdef foo(self, x=*, int k=*)
with corresponding .pyx
file:
cdef class A:
    cdef foo(self):
        print "A"
cdef class B(A):
    cdef foo(self, x=None):
        print "B", x
cdef class C(B):
    cpdef foo(self, x=True, int k=3):
        print "C", x, k
Note
This also demonstrates how cpdef
functions can override
cdef
functions.
Function pointers in structs¶
Functions declared in structs
are automatically converted to
function pointers for convenience.
C++ Exception handling¶
cdef
functions can now be declared as:
cdef int foo(...) except +
cdef int foo(...) except +TypeError
cdef int foo(...) except +python_error_raising_function
in which case a Python exception will be raised when a C++ error is caught. See Using C++ in Cython for more details.
Synonyms¶
cdef import from
means the same thing as cdef extern from
Source code encoding¶
Cython supports PEP 3120 and PEP 263, i.e. you can start your Cython source
file with an encoding comment and generally write your source code in UTF-8.
This impacts the encoding of byte strings and the conversion of unicode string
literals like u'abcd'
to unicode objects.
Automatic typecheck¶
Rather than introducing a new keyword typecheck
as explained in the
Pyrex docs,
Cython emits a (non-spoofable and faster) typecheck whenever
isinstance()
is used with an extension type as the second parameter.
From __future__ directives¶
Cython supports several from __future__ directives, namely unicode_literals
and division
.
With statements are always enabled.
Pure Python mode¶
Cython has support for compiling .py
files, and
accepting type annotations using decorators and other
valid Python syntax. This allows the same source to
be interpreted as straight Python, or compiled for
optimized results.
See http://wiki.cython.org/pure
for more details.
Early Binding for Speed¶
As a dynamic language, Python encourages a programming style of considering classes and objects in terms of their methods and attributes, more than where they fit into the class hierarchy.
This can make Python a very relaxed and comfortable language for rapid development, but with a price - the ‘red tape’ of managing data types is dumped onto the interpreter. At run time, the interpreter does a lot of work searching namespaces, fetching attributes and parsing argument and keyword tuples. This run-time ‘late binding’ is a major cause of Python’s relative slowness compared to ‘early binding’ languages such as C++.
However with Cython it is possible to gain significant speed-ups through the use of ‘early binding’ programming techniques.
For example, consider the following (silly) code example:
cdef class Rectangle:
    cdef int x0, y0
    cdef int x1, y1
    def __init__(self, int x0, int y0, int x1, int y1):
        self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
    def area(self):
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        if area < 0:
            area = -area
        return area

def rectArea(x0, y0, x1, y1):
    rect = Rectangle(x0, y0, x1, y1)
    return rect.area()
In the rectArea()
method, the call to rect.area()
and the
area()
method contain a lot of Python overhead.
However, in Cython, it is possible to eliminate a lot of this overhead in cases where calls occur within Cython code. For example:
cdef class Rectangle:
    cdef int x0, y0
    cdef int x1, y1
    def __init__(self, int x0, int y0, int x1, int y1):
        self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
    cdef int _area(self):
        cdef int area
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        if area < 0:
            area = -area
        return area
    def area(self):
        return self._area()

def rectArea(x0, y0, x1, y1):
    cdef Rectangle rect
    rect = Rectangle(x0, y0, x1, y1)
    return rect._area()
Here, in the Rectangle extension class, we have defined two different area
calculation methods, the efficient _area()
C method, and the
Python-callable area()
method which serves as a thin wrapper around
_area()
. Note also in the function rectArea()
how we ‘early bind’
by declaring the local variable rect
which is explicitly given the type
Rectangle. By using this declaration, instead of just dynamically assigning to
rect
, we gain the ability to access the much more efficient C-callable
_area()
method.
But Cython offers us more simplicity again, by allowing us to declare dual-access methods - methods that can be efficiently called at C level, but can also be accessed from pure Python code at the cost of the Python access overheads. Consider this code:
cdef class Rectangle:
    cdef int x0, y0
    cdef int x1, y1
    def __init__(self, int x0, int y0, int x1, int y1):
        self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
    cpdef int area(self):
        cdef int area
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        if area < 0:
            area = -area
        return area

def rectArea(x0, y0, x1, y1):
    cdef Rectangle rect
    rect = Rectangle(x0, y0, x1, y1)
    return rect.area()
Note
In earlier versions of Cython, the cpdef keyword was spelled rdef, but it has the same effect.
Here, we just have a single area method, declared as cpdef
to make it
efficiently callable as a C function, but still accessible from pure Python
(or late-binding Cython) code.
If within Cython code, we have a variable already ‘early-bound’ (i.e., declared explicitly as type Rectangle, or cast to type Rectangle), then invoking its area method will use the efficient C code path and skip the Python overhead. But if in Pyrex or regular Python code we have a regular object variable storing a Rectangle object, then invoking the area method will require:
- an attribute lookup for the area method
- packing a tuple for arguments and a dict for keywords (both empty in this case)
- using the Python API to call the method
and within the area method itself:
- parsing the tuple and keywords
- executing the calculation code
- converting the result to a python object and returning it
So within Cython, it is possible to achieve massive optimisations by using strong typing in declaration and casting of variables. For tight loops which use method calls, and where these methods are pure C, the difference can be huge.
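To make the late-binding steps listed above concrete, here is a plain Python sketch that spells them out by hand. The `Rectangle` class is a pure-Python stand-in for the extension type, and the explicit `getattr` and argument packing only imitate what the interpreter does implicitly for `rect.area()`.

```python
class Rectangle:
    # Pure-Python stand-in for the extension type used in this section.
    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1

    def area(self):
        return abs((self.x1 - self.x0) * (self.y1 - self.y0))


rect = Rectangle(0, 0, 3, 4)

method = getattr(rect, "area")     # 1. attribute lookup for the area method
args, kwargs = (), {}              # 2. argument tuple and keyword dict (both empty)
result = method(*args, **kwargs)   # 3. generic call through the Python call machinery
```

An early-bound call from Cython code avoids all three steps, which is where the speed-up described above comes from.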
Debugging your Cython program¶
Cython comes with an extension for the GNU Debugger that helps users debug Cython code. To use this functionality, you will need to install gdb 7.2 or higher, built with Python support (linked to Python 2.5 or higher). The debugger supports debuggees with versions 2.6 and higher. For Python 3, code should be built with Python 3 and the debugger should be run with Python 2 (or at least it should be able to find the Python 2 Cython installation).
The debugger will need debug information that the Cython compiler can export.
This can be achieved from within the setup
script by passing pyrex_gdb=True
to your Cython Extension class:
from Cython.Distutils import extension
ext = extension.Extension('source', ['source.pyx'], pyrex_gdb=True)
setup(..., ext_modules=[ext])
With this approach debug information can be enabled on a per-module basis.
Another (easier) way is to simply pass the --pyrex-gdb
flag as a command
line argument:
python setup.py build_ext --pyrex-gdb
For development it’s often easy to use the --inplace
flag also, which makes
distutils build your project “in place”, i.e., not in a separate build
directory.
When invoking Cython from the command line directly you can have it write
debug information using the --gdb
flag:
cython --gdb myfile.pyx
Note
The debugger is newly part of Cython 0.14 and as such is still experimental. CC markflorisson88@gmail.com in your TRAC tickets or mailing list complaints.
Running the Debugger¶
To run the Cython debugger and have it import the debug information exported
by Cython, run cygdb
in the build directory:
$ python setup.py build_ext --pyrex-gdb --inplace
$ cygdb
GNU gdb (GDB) 7.2
...
(gdb)
When using the Cython debugger, it’s preferable that you build and run your code
with an interpreter that is compiled with debugging symbols (i.e. configured
with --with-pydebug
or compiled with the -g
CFLAG). If your Python is
installed and managed by your package manager you probably need to install debug
support separately, e.g. for Ubuntu:
$ sudo apt-get install python-dbg
$ python-dbg setup.py build_ext --pyrex-gdb --inplace
Then you need to run your script with python-dbg
also.
You can also pass additional arguments to gdb:
$ cygdb /path/to/build/directory/ GDBARGS
i.e.:
$ cygdb . --args python-dbg mainscript.py
To tell cygdb not to import any debug information, supply --
as the first
argument:
$ cygdb --
Using the Debugger¶
The Cython debugger comes with a set of commands that support breakpoints, stack inspection, source code listing, stepping, stepping over, etc. Most of these commands are analogous to their respective gdb command.
-
cy break breakpoints...
Break in a Python, Cython or C function. First it will look for a Cython function with that name, if cygdb doesn’t know about a function (or method) with that name, it will set a (pending) C breakpoint. The
-p
option can be used to specify a Python breakpoint. Breakpoints can be set for either the function or method name, or they can be fully “qualified”, which means that the entire “path” to a function is given:
(gdb) cy break cython_function_or_method
(gdb) cy break packagename.modulename.cythonfunction
(gdb) cy break packagename.modulename.ClassName.cythonmethod
(gdb) cy break c_function
You can also break on Cython line numbers:
(gdb) cy break packagename.modulename:14
(gdb) cy break :14
Python breakpoints currently support names of the module (not the entire package path) and the function or method:
(gdb) cy break -p pythonmodule.python_function_or_method
(gdb) cy break -p python_function_or_method
Note
Python breakpoints only work in Python builds where the Python frame information can be read from the debugger. To ensure this, use a Python debug build or a non-stripped build compiled with debug support.
-
cy step
Step through Python, Cython or C code. Python, Cython and C functions called directly from Cython code are considered relevant and will be stepped into.
-
cy next
Step over Python, Cython or C code.
-
cy run
Run the program. The default interpreter is the interpreter that was used to build your extensions with, or the interpreter
cygdb
is run with in case the “don’t import debug information” option was in effect. The interpreter can be overridden using gdb’s file
command.
-
cy cont
Continue the program.
-
cy up
-
cy down
Go up and down the stack to what is considered a relevant frame.
-
cy finish
Execute until an upward relevant frame is met or something halts execution.
-
cy bt
-
cy backtrace
Print a traceback of all frames considered relevant. The
-a
option makes it print the full traceback (all C frames).
-
cy select
Select a stack frame by number as listed by
cy backtrace
. This command is introduced because cy backtrace
prints a reversed stack trace, so frame numbers differ from gdb’s bt
.
-
cy print varname
Print a local or global Cython, Python or C variable (depending on the context). Variables may also be dereferenced:
(gdb) cy print x
x = 1
(gdb) cy print *x
*x = (PyObject) {
    _ob_next = 0x93efd8,
    _ob_prev = 0x93ef88,
    ob_refcnt = 65,
    ob_type = 0x83a3e0
}
-
cy list
List the source code surrounding the current line.
-
cy locals
-
cy globals
Print all the local and global variables and their values.
-
cy import FILE...
Import debug information from files given as arguments. The easiest way to import debug information is to use the cygdb command line tool.
-
cy exec code
Execute code in the current Python or Cython frame. This works like Python’s interactive interpreter.
For Python frames it uses the globals and locals from the Python frame, for Cython frames it uses the dict of globals used on the Cython module and a new dict filled with the local Cython variables.
Note
cy exec
modifies state and executes code in the debuggee and is
therefore potentially dangerous.
Example:
(gdb) cy exec x + 1
2
(gdb) cy exec import sys; print sys.version_info
(2, 6, 5, 'final', 0)
(gdb) cy exec
>global foo
>
>foo = 'something'
>end
Convenience functions¶
The following functions are gdb functions, which means they can be used in a gdb expression.
-
cy_cname
(varname)¶ Returns the C variable name of a Cython variable. For global variables this may not be actually valid.
-
cy_cvalue
(varname)¶ Returns the value of a Cython variable.
-
cy_lineno
()¶ Returns the current line number in the selected Cython frame.
Example:
(gdb) print $cy_cname("x")
$1 = "__pyx_v_x"
(gdb) watch $cy_cvalue("x")
Hardware watchpoint 13: $cy_cvalue("x")
(gdb) print $cy_lineno()
$2 = 12
Configuring the Debugger¶
A few aspects of the debugger are configurable with gdb parameters. For instance, colors can be disabled, the terminal background color and breakpoint autocompletion can be configured.
-
cy_complete_unqualified
¶ Tells the Cython debugger whether
cy break
should also complete plain function names, i.e. not prefixed by their module name. E.g. if you have a function named spam in module M, it tells whether to only complete M.spam or also just spam. The default is true.
-
cy_colorize_code
¶ Tells the debugger whether to colorize source code. The default is true.
-
cy_terminal_background_color
¶ Tells the debugger about the terminal background color, which affects source code coloring. The default is “dark”, another valid option is “light”.
This is how these parameters can be used:
(gdb) set cy_complete_unqualified off
(gdb) set cy_terminal_background_color light
(gdb) show cy_colorize_code
Reference Guide¶
Note
Todo
Most of the boldface is to be changed to refs or other markup later.
Contents:
Compilation¶
- Cython code, unlike Python, must be compiled.
- This happens in two stages:
- A .pyx file is compiled by Cython to a .c file.
- The .c file is compiled by a C compiler to a .so file (or a .pyd file on Windows).
- The following sub-sections describe several ways to build your extension modules.
Note
The -a
option
- Using the Cython compiler with the -a option will produce a really nice HTML file of the Cython generated .c code.
- Double clicking on the highlighted sections will expand the code to reveal what Cython has actually generated for you.
- This is very useful for understanding, optimizing or debugging your module.
From the Command Line¶
Run the Cython compiler command with your options and list of .pyx files to generate:
$ cython -a yourmod.pyx
This creates a yourmod.c file (and the -a switch produces a generated HTML file).
Compiling your .c files will vary depending on your operating system.
- Python documentation for writing extension modules should have some details for your system.
Here we give an example on a Linux system:
$ gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.5 -o yourmod.so yourmod.c
gcc will need to have paths to your included header files and paths to libraries you need to link with.
- A yourmod.so file is now in the same directory.
- Your module, yourmod, is available for you to import as you normally would.
Distutils¶
Ensure Distutils is installed in your system.
The following assumes a Cython file to be compiled called hello.pyx.
Create a setup.py script:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("hello", ["hello.pyx"])]

setup(
    name = 'Hello world app',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)
Run the command python setup.py build_ext --inplace in your system’s command shell.
You’re done. Import your new extension module into your Python shell or script as normal.
SCons¶
to be completed...
Pyximport¶
For generating Cython code right in your pure Python module:
>>> import pyximport; pyximport.install()
>>> import helloworld
Hello World
Use for simple Cython builds only.
- No extra C libraries.
- No special build setup needed.
- Also has experimental compilation support for normal Python modules.
- Allows you to automatically run Cython on every .pyx and .py module that Python imports.
- This includes the standard library and installed packages.
- In the case that Cython fails to compile a Python module, pyximport will fall back to loading the source modules instead.
The .py import mechanism is installed like this:
>>> pyximport.install(pyimport = True)
Note
Authors
Paul Prescod, Stefan Behnel
Sage¶
The Sage notebook allows transparently editing and compiling Cython code simply by typing %cython at the top of a cell and evaluating it. Variables and functions defined in a Cython cell are imported into the running session.
Todo
Provide a link to Sage docs
Language Basics¶
Cython File Types¶
There are three file types in Cython:
- Implementation files carry a .pyx suffix
- Definition files carry a .pxd suffix
- Include files carry a .pxi suffix
Implementation File¶
What can it contain?¶
- Basically anything Cythonic, but see below.
What can’t it contain?¶
- There are some restrictions when it comes to extension types, if the extension type is already defined elsewhere... more on this later
Definition File¶
What can it contain?¶
- Any kind of C type declaration.
- extern C function or variable declarations.
- Declarations for module implementations.
- The definition parts of extension types.
- All declarations of functions, etc., for an external library
What can’t it contain?¶
- Any non-extern C variable declaration.
- Implementations of C or Python functions.
- Python class definitions
- Python executable statements.
- Any declaration that is defined as public to make it accessible to other Cython modules.
- This is not necessary, as it is automatic.
- a public declaration is only needed to make it accessible to external C code.
What else?¶
- Use the cimport statement, as you would Python’s import statement, to access these files from other definition or implementation files.
- cimport does not need to be called in a .pyx file for a .pxd file that has the same name, as they are already in the same namespace.
- For cimport to find the stated definition file, the path to the file must be appended to the -I option of the cython compile command.
- When a .pyx file is to be compiled, Cython first checks to see if a corresponding .pxd file exists and processes it first.
Include File¶
What can it contain?¶
- Any Cythonic code really, because the entire file is textually embedded at the location you prescribe.
How do I use it?¶
- Include the .pxi file with an include statement like: include "spamstuff.pxi"
- The include statement can appear anywhere in your Cython file and at any indentation level.
- The code in the .pxi file needs to be rooted at the “zero” indentation level.
- The included code can itself contain other include statements.
Declaring Data Types¶
As a dynamic language, Python encourages a programming style of considering classes and objects in terms of their methods and attributes, more than where they fit into the class hierarchy.
This can make Python a very relaxed and comfortable language for rapid development, but with a price - the ‘red tape’ of managing data types is dumped onto the interpreter. At run time, the interpreter does a lot of work searching namespaces, fetching attributes and parsing argument and keyword tuples. This run-time ‘late binding’ is a major cause of Python’s relative slowness compared to ‘early binding’ languages such as C++.
However with Cython it is possible to gain significant speed-ups through the use of ‘early binding’ programming techniques.
Note
Typing is not a necessity
Providing static typing for parameters and variables is a convenience to speed up your code, but it is not a necessity. Optimize where and when needed.
The cdef Statement¶
The cdef
statement is used to make C level declarations for:
Variables:
cdef int i, j, k
cdef float f, g[42], *h
Structs:
cdef struct Grail:
    int age
    float volume
Unions:
cdef union Food:
    char *spam
    float *eggs
Enums:
cdef enum CheeseType:
    cheddar, edam,
    camembert
cdef enum CheeseState:
    hard = 1
    soft = 2
    runny = 3
Functions:
cdef int eggs(unsigned long l, float f):
    ...
Extension Types:
cdef class Spam:
    ...
Note
Constants
Constants can be defined by using an anonymous enum:
cdef enum:
tons_of_spam = 3
Grouping cdef Declarations¶
A series of declarations can be grouped into a cdef
block:
cdef:
    struct Spam:
        int tons
    int i
    float f
    Spam *p
    void f(Spam *s):
        print s.tons, "Tons of spam"
Note
ctypedef statement
The ctypedef
statement is provided for naming types:
ctypedef unsigned long ULong
ctypedef int *IntPtr
Parameters¶
Both C and Python function types can be declared to have parameters with C data types.
Use normal C declaration syntax:
def spam(int i, char *s):
    ...
cdef int eggs(unsigned long l, float f):
    ...
As these parameters are passed into a Python declared function, they are automatically converted to the specified C type value.
- This holds true for only numeric and string types
- If no type is specified for a parameter or a return value, it is assumed to be a Python object
The following takes two Python objects as parameters and returns a Python object:
cdef spamobjs(x, y):
    ...
Note
This is different from C language behavior, where it is an int by default.
- Python object types have reference counting performed according to the standard Python C-API rules:
- Borrowed references are taken as parameters
- New references are returned
Todo
link or label here the one ref count caveat for numpy.
- The name
object
can be used to explicitly declare something as a Python Object.
For the sake of code clarity, it is recommended to always use object explicitly in your code.
This is also useful for cases where the name being declared would otherwise be taken for a type:
cdef foo(object int):
    ...
As a return type:
cdef object foo(object int):
    ...
Todo
Do a see also here ..??
Optional Arguments¶
- Are supported for cdef and cpdef functions.
- There are differences, though, depending on whether you declare them in a .pyx file or a .pxd file.
When in a .pyx file, the signature is the same as it is in Python itself:
cdef class A:
    cdef foo(self):
        print "A"
cdef class B(A):
    cdef foo(self, x=None):
        print "B", x
cdef class C(B):
    cpdef foo(self, x=True, int k=3):
        print "C", x, k
When in a .pxd file, the signature is different, written like cdef foo(x=*), as in this example:
cdef class A:
    cdef foo(self)
cdef class B(A):
    cdef foo(self, x=*)
cdef class C(B):
    cpdef foo(self, x=*, int k=*)
- The number of arguments may increase when subclassing, but the arg types and order must be the same.
- There may be a slight performance penalty when a function without any optional arguments is overridden with one that does have default values.
Keyword-only Arguments¶
As in Python 3, def functions can have keyword-only arguments listed after a "*" parameter and before a "**" parameter, if any:
def f(a, b, *args, c, d = 42, e, **kwds):
    ...
- Shown above, the c, d and e arguments cannot be passed as positional arguments and must be passed as keyword arguments.
- Furthermore, c and e are required keyword arguments since they do not have a default value.
If the parameter name after the "*" is omitted, the function will not accept any extra positional arguments:
def g(a, b, *, c, d):
    ...
- Shown above, the signature takes exactly two positional parameters and has two required keyword parameters.
Automatic Type Conversion¶
In most situations, automatic conversions are performed for basic numeric and string types when a Python object is used in the context of a C value and vice versa.
The following table summarises the conversion possibilities, assuming sizeof(int) == sizeof(long):
C types | From Python types | To Python types |
---|---|---|
[unsigned] char, [unsigned] short, int, long | int, long | int |
unsigned int, unsigned long, [unsigned] long long | int, long | long |
float, double, long double | int, long, float | float |
char * | str/bytes | str/bytes [1] |
struct | | dict |
Note
Python String in a C Context
A Python string, passed to a C context expecting a char*, is only valid as long as the Python string exists.
A reference to the Python string must be kept around for as long as the C string is needed.
If this can’t be guaranteed, then make a copy of the C string.
Cython may produce an error message: Obtaining char* from a temporary Python value and will not resume compiling in situations like this:
cdef char *s
s = pystring1 + pystring2
The reason is that concatenating two strings in Python produces a temporary variable.
- The variable is decrefed, and the Python string deallocated, as soon as the statement has finished.
- Therefore the lvalue s is left dangling.
The solution is to assign the result of the concatenation to a Python variable, and then obtain the char* from that:
cdef char *s
p = pystring1 + pystring2
s = p
Note
It is up to you to be aware of this, and not to depend on Cython’s error message, as it is not guaranteed to be generated for every situation.
Type Casting¶
- The syntax used in type casting is "<" and ">".
Note
The syntax is different from the C convention:
cdef char *p
cdef float *q
p = <char*>q
- If one of the types is a Python object for <type>x, Cython will try to do a coercion.
Note
Cython will not stop a cast where there is no conversion, but it will emit a warning.
- If the address is what is wanted, cast to a void* first.
Type Checking¶
- A cast like <MyExtensionType>x will cast x to type MyExtensionType without type checking at all.
- To have a cast type checked, use the syntax <MyExtensionType?>x.
- In this case, Cython will throw an error if x is not a (subclass of) MyExtensionType.
- Automatic type checking for extension types is obtained whenever isinstance() is used with an extension type as the second parameter.
Python Objects¶
Statements and Expressions¶
- For the most part, control structures and expressions follow Python syntax.
- When applied to Python objects, the semantics are the same unless otherwise noted.
- Most Python operators can be applied to C values with the obvious semantics.
- An expression with mixed Python and C values will have conversions performed automatically.
- Python operations are automatically checked for errors, with the appropriate action taken.
Differences Between Cython and C¶
- Most notable are C constructs which have no direct equivalent in Python.
- An integer literal is treated as a C constant.
  - It will be truncated to whatever size your C compiler thinks appropriate.
  - To get a Python integer instead, cast to a Python object like this: <object>10000000000000000000
  - The "L", "LL" and "U" suffixes have the same meaning as in C.
- There is no -> operator in Cython. Instead of p->x, use p.x.
- There is no unary * (dereference) operator in Cython. Instead of *p, use p[0].
- & is permissible and has the same semantics as in C.
- NULL is the null C pointer.
  - Do NOT use 0.
  - NULL is a reserved word in Cython.
- The syntax for type casts is <type>value.
Scope Rules¶
- Scoping (local, module, built-in) in Cython is determined statically.
- As with Python, a variable assignment which is not declared explicitly is implicitly declared to be a Python variable residing in the scope where it was assigned.
Note
- Module-level scope behaves the same way as a Python local scope if you refer to the variable before assigning to it.
Tricks like the following will NOT work in Cython:
try:
    x = True
except NameError:
    True = 1
The above example will not work because True will always be looked up in the module-level scope. Do the following instead:
import __builtin__
try:
    True = __builtin__.True
except AttributeError:
    True = 1
Operator Precedence¶
- Cython uses Python precedence order, not C
For-loops¶
- range() is C optimized when the index variable has been declared as a C integer with cdef:
cdef int i
for i in range(n):
    ...
- The other form available in Cython is the for-from style.
- The target expression must be a variable name.
- The name between the lower and upper bounds must be the same as the target name.
for i from 0 <= i < n:
    ...
- Or when using a step size:
for i from 0 <= i < n by s:
    ...
- To reverse the direction, reverse the conditional operation:
for i from 0 >= i > n:
    ...
- The break and continue statements are permissible.
- The loop can contain an else clause.
Functions and Methods¶
- There are three types of function declarations in Cython as the sub-sections show below.
- Only “Python” functions can be called from outside a Cython module by interpreted Python code.
Callable from Python¶
- Are declared with the def statement.
- Are called with Python objects.
- Return Python objects.
- See Parameters for special consideration.
Callable from C¶
- Are declared with the cdef statement.
- Are called with either Python objects or C values.
- Can return either Python objects or C values.
Callable from both Python and C¶
- Are declared with the cpdef statement.
- Can be called from anywhere, because Cython generates both a C-level entry point and a Python wrapper for them.
- Use the faster C calling conventions when being called from other Cython code.
Overriding¶
cpdef functions can override cdef functions:
cdef class A:
    cdef foo(self):
        print "A"
cdef class B(A):
    cdef foo(self, x=None):
        print "B", x
cdef class C(B):
    cpdef foo(self, x=True, int k=3):
        print "C", x, k
Function Pointers¶
- Functions declared in a struct are automatically converted to function pointers.
- See using exceptions with function pointers.
Python Built-ins¶
The following are provided:
Todo
incomplete
Function and arguments | Return type | Python/C API Equivalent |
---|---|---|
abs(obj) | object | PyNumber_Absolute |
bool(obj) | object | Py_True, Py_False |
chr(obj) | object | char |
delattr(obj, name) | int | PyObject_DelAttr |
dir(obj) | object | PyObject_Dir |
getattr(obj, name) (Note 1) | object | PyObject_GetAttr |
getattr3(obj, name, default) | object | |
hasattr(obj, name) | int | PyObject_HasAttr |
hash(obj) | int | PyObject_Hash |
intern(obj) | object | PyString_InternFromString |
isinstance(obj, type) | int | PyObject_IsInstance |
issubclass(obj, type) | int | PyObject_IsSubclass |
iter(obj) | object | PyObject_GetIter |
len(obj) | Py_ssize_t | PyObject_Length |
pow(x, y, z) (Note 2) | object | PyNumber_Power |
reload(obj) | object | PyImport_ReloadModule |
repr(obj) | object | PyObject_Repr |
setattr(obj, name, value) | void | PyObject_SetAttr |
Error and Exception Handling¶
- A plain cdef declared function that does not return a Python object...
- Has no way of reporting a Python exception to its caller.
- Will only print a warning message, and the exception is ignored.
- In order to propagate exceptions like this to its caller, you need to declare an exception value for it.
- There are three forms of declaring an exception for a C compiled program.
First:
cdef int spam() except -1:
    ...
- In the example above, if an error occurs inside spam, it will immediately return with the value of -1, causing an exception to be propagated to its caller.
- Functions declared with an exception value should explicitly prevent a return of that value.
Second:
cdef int spam() except? -1:
    ...
- Used when a -1 may possibly be returned and is not to be considered an error.
- The "?" tells Cython that -1 only indicates a possible error.
- Now, each time -1 is returned, Cython generates a call to PyErr_Occurred to verify it is an actual error.
Third:
cdef int spam() except *
- A call to PyErr_Occurred happens every time the function gets called.
Note
Returning void
A function returning void that needs to propagate errors must use this version.
- Exception values can only be declared for functions returning an...
- integer
- enum
- float
- pointer type
- The exception value must be a constant expression.
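The difference between the three forms can be modelled in plain Python 3. This is a sketch, not the code Cython generates: the module-level _error variable is a hypothetical stand-in for Python's error indicator, and the call_* helpers are invented names that play the role of the check Cython inserts at each call site.

```python
_error = None  # hypothetical stand-in for Python's per-thread error indicator


def set_error(exc):
    """Record a pending exception, as a failing C-level function would."""
    global _error
    _error = exc


def take_error():
    """Return the pending exception (or None) and clear the indicator."""
    global _error
    exc, _error = _error, None
    return exc


def call_except_value(func, sentinel=-1):
    """Model of `except -1`: the sentinel alone is treated as an error."""
    result = func()
    if result == sentinel:
        raise take_error() or RuntimeError("error value returned, nothing pending")
    return result


def call_except_maybe(func, sentinel=-1):
    """Model of `except? -1`: consult the indicator only when the sentinel is seen."""
    result = func()
    if result == sentinel and _error is not None:
        raise take_error()
    return result


def call_except_star(func):
    """Model of `except *`: consult the indicator after every call."""
    result = func()
    if _error is not None:
        raise take_error()
    return result


def legit_minus_one():
    return -1  # a valid result; no error is pending


def failing():
    set_error(ValueError("spam went wrong"))
    return -1


print(call_except_maybe(legit_minus_one))  # -1 is passed through as a normal value
try:
    call_except_maybe(failing)
except ValueError as exc:
    print("propagated:", exc)
```

The first form is the cheapest but reserves the value; the second pays for the extra check only when the reserved value shows up; the third pays for it on every call.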
Note
Function pointers
- Must be declared with the same exception value specification as the function they point to.
- Use cases here are when used as parameters and when assigned to a variable:
int (*grail)(int, char *) except -1
Note
Python Objects
- Declared exception values are not needed.
- Remember that Cython assumes that a function without a declared return type returns a Python object.
- Exceptions on such functions are implicitly propagated by returning NULL.
Note
C++
- For exceptions from C++ compiled programs, see Wrapping C++ Classes
Checking return values for non-Cython functions¶
Do not try to raise exceptions by returning the specified value. Example:
cdef extern FILE *fopen(char *filename, char *mode) except NULL # WRONG!
- The except clause does not work that way.
- Its only purpose is to propagate Python exceptions that have already been raised by either...
- A Cython function
- A C function that calls Python/C API routines.
To propagate an exception in these circumstances you need to raise it yourself:
cdef FILE *p
p = fopen("spam.txt", "r")
if p == NULL:
    raise SpamError("Couldn't open the spam file")
Conditional Compilation¶
- The expressions in the following sub-sections must be valid compile-time expressions.
- They can evaluate to any Python value.
- The truth of the result is determined in the usual Python way.
Compile-Time Definitions¶
Defined using the DEF statement:
DEF FavouriteFood = "spam"
DEF ArraySize = 42
DEF OtherArraySize = 2 * ArraySize + 17
The right hand side must be a valid compile-time expression made up of either:
- Literal values
- Names defined by other DEF statements
- They can be combined using any of the Python expression syntax.
- Cython provides the following pre-defined names
- Corresponding to the values returned by os.uname()
- UNAME_SYSNAME
- UNAME_NODENAME
- UNAME_RELEASE
- UNAME_VERSION
- UNAME_MACHINE
- A name defined by DEF can appear anywhere an identifier can appear.
- Cython replaces the name with the literal value before compilation.
- The compile-time expression, in this case, must evaluate to a Python value of int, long, float, or str:
cdef int a1[ArraySize]
cdef int a2[OtherArraySize]
print "I like", FavouriteFood
Conditional Statements¶
- Similar in semantics to the C pre-processor.
- The following statements can be used to conditionally include or exclude sections of code to compile.
IF
ELIF
ELSE
IF UNAME_SYSNAME == "Windows":
include "icky_definitions.pxi"
ELIF UNAME_SYSNAME == "Darwin":
include "nice_definitions.pxi"
ELIF UNAME_SYSNAME == "Linux":
include "penguin_definitions.pxi"
ELSE:
include "other_definitions.pxi"
- ELIF and ELSE are optional.
- IF can appear anywhere that a normal statement or declaration can appear.
- It can contain any statements or declarations that would be valid in that context.
- This includes other IF and DEF statements.
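To see what the pre-defined UNAME_* names would hold on a given machine, the same information can be inspected from plain Python 3. This sketch uses platform.uname() rather than os.uname() (an assumption made here so that it also runs on Windows), and the function names uname_names and pick_definitions are hypothetical; pick_definitions simply mirrors the IF/ELIF/ELSE chain shown above at run time, whereas Cython evaluates it at compile time.

```python
import platform


def uname_names():
    """Return a dict keyed like Cython's UNAME_* compile-time names."""
    u = platform.uname()
    return {
        "UNAME_SYSNAME": u.system,
        "UNAME_NODENAME": u.node,
        "UNAME_RELEASE": u.release,
        "UNAME_VERSION": u.version,
        "UNAME_MACHINE": u.machine,
    }


def pick_definitions(sysname):
    """Run-time mirror of the IF / ELIF / ELSE chain in the example above."""
    if sysname == "Windows":
        return "icky_definitions.pxi"
    elif sysname == "Darwin":
        return "nice_definitions.pxi"
    elif sysname == "Linux":
        return "penguin_definitions.pxi"
    else:
        return "other_definitions.pxi"


names = uname_names()
print(names["UNAME_SYSNAME"], "->", pick_definitions(names["UNAME_SYSNAME"]))
```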
[1] | The conversion is to/from str for Python 2.x, and bytes for Python 3.x. |
Extension Types¶
- Normal Python as well as extension type classes can be defined.
- Extension types:
- Are considered by Python as “built-in” types.
- Can be used to wrap arbitrary C-data structures, and provide a Python-like interface to them from Python.
- Attributes and methods can be called from Python or Cython code
- Are defined by the
cdef class
statement.
cdef class Shrubbery:
cdef int width, height
def __init__(self, w, h):
self.width = w
self.height = h
def describe(self):
print "This shrubbery is", self.width, \
"by", self.height, "cubits."
Attributes¶
- Are stored directly in the object’s C struct.
- Are fixed at compile time.
- You can’t add attributes to an extension type instance at run time like in normal Python.
- You can sub-class the extension type in Python to add attributes at run-time.
- There are two ways to access extension type attributes:
- By Python look-up.
- Python code’s only method of access.
- By direct access to the C struct from Cython code.
- Cython code can use either method of access, though.
- By default, extension type attributes are:
- Only accessible by direct access.
- Not accessible from Python code.
To make attributes accessible to Python, they must be declared public or readonly:
cdef class Shrubbery:
    cdef public int width, height
    cdef readonly float depth
- The width and height attributes are readable and writable from Python code.
- The depth attribute is readable but not writable.
Note
You can only expose simple C types, such as ints, floats, and strings, for Python access. You can also expose Python-valued attributes.
Note
The public and readonly options apply only to Python access, not direct access. All the attributes of an extension type are always readable and writable by C-level access.
Methods¶
- self is used in extension type methods just like it normally is in Python.
- See Functions and Methods; all of which applies here.
Properties¶
Cython provides a special syntax:
cdef class Spam:
    property cheese:
        "A doc string can go here."
        def __get__(self):
            # This is called when the property is read.
            ...
        def __set__(self, value):
            # This is called when the property is written.
            ...
        def __del__(self):
            # This is called when the property is deleted.
            ...
- The __get__(), __set__(), and __del__() methods are all optional.
- If they are omitted, an exception is raised when an access attempt is made.
- Below is a full example that defines a property which can...
- Add to a list each time it is written to ("__set__").
- Return the list when it is read ("__get__").
- Empty the list when it is deleted ("__del__").
# cheesy.pyx
cdef class CheeseShop:
    cdef object cheeses
    def __cinit__(self):
        self.cheeses = []
    property cheese:
        def __get__(self):
            return "We don't have: %s" % self.cheeses
        def __set__(self, value):
            self.cheeses.append(value)
        def __del__(self):
            del self.cheeses[:]
# Test input
from cheesy import CheeseShop
shop = CheeseShop()
print shop.cheese
shop.cheese = "camembert"
print shop.cheese
shop.cheese = "cheddar"
print shop.cheese
del shop.cheese
print shop.cheese
# Test output
We don't have: []
We don't have: ['camembert']
We don't have: ['camembert', 'cheddar']
We don't have: []
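For comparison, the same behaviour can be written in plain Python 3 with the built-in property decorator. This is a sketch of an ordinary Python class, not an extension type: the _cheeses attribute name is chosen here for illustration, and __init__ stands in for __cinit__.

```python
class CheeseShop:
    def __init__(self):
        self._cheeses = []

    @property
    def cheese(self):
        # Plays the role of __get__ in the Cython property block.
        return "We don't have: %s" % self._cheeses

    @cheese.setter
    def cheese(self, value):
        # Plays the role of __set__: every write appends to the list.
        self._cheeses.append(value)

    @cheese.deleter
    def cheese(self):
        # Plays the role of __del__: deleting empties the list.
        del self._cheeses[:]


shop = CheeseShop()
shop.cheese = "camembert"
shop.cheese = "cheddar"
print(shop.cheese)   # We don't have: ['camembert', 'cheddar']
del shop.cheese
print(shop.cheese)   # We don't have: []
```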
Special Methods¶
Note
- The semantics of Cython’s special methods are similar in principle to that of Python’s.
- There are substantial differences in some behavior.
- Some Cython special methods have no Python counter-part.
- See the Special Methods Table for the many that are available.
Declaration¶
- Must be declared with def and cannot be declared with cdef.
- Performance is not affected by the def declaration because of special calling conventions.
Docstrings¶
- Docstrings are not supported yet for some special method types.
- They can be included in the source, but may not appear in the corresponding
__doc__
attribute at run-time.
- This is a Python library limitation because the PyTypeObject data structure is limited.
Initialization: __cinit__() and __init__()¶
- Any arguments passed to the extension type’s constructor will be passed to both initialization methods.
- __cinit__() is where you should perform C-level initialization of the object.
- This includes any allocation of C data structures.
- Caution is warranted as to what you do in this method.
- The object may not be a fully valid Python object when it is called.
- Calling Python objects, including the extension type’s own methods, may be hazardous.
- By the time __cinit__() is called...
- Memory has been allocated for the object.
- All C-level attributes have been initialized to 0 or null.
- Python attributes have been initialized to None, but you cannot rely on that for each occasion.
- This initialization method is guaranteed to be called exactly once.
- For extension types that inherit a base type:
- The __cinit__() method of the base type is automatically called before this one.
- The inherited __cinit__() method cannot be called explicitly.
- Passing modified argument lists to the base type must be done through __init__().
- It may be wise to give the __cinit__() method both "*" and "**" arguments.
- Allows the method to accept or ignore additional arguments.
- Eliminates the need for a Python-level sub-class that changes the __init__() method’s signature to override both the __new__() and __init__() methods.
- If __cinit__() is declared to take no arguments except self, it will ignore any extra arguments passed to the constructor without complaining about a signature mismatch.
- __init__() is for higher-level initialization and is safer for Python access.
- By the time this method is called, the extension type is a fully valid Python object.
- All operations are safe.
- This method may sometimes be called more than once, or possibly not at all.
- Take this into consideration to make sure the design of your other methods is robust to this fact.
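Why __init__() may run more than once or not at all can be shown with an ordinary Python 3 class. This is only an analogy: __new__ stands in for __cinit__() as the step that happens exactly once per object, and the Tracker class with its counters is hypothetical.

```python
class Tracker:
    def __new__(cls, *args, **kwargs):
        # Runs once when the object is created (the role __cinit__ plays).
        obj = super().__new__(cls)
        obj.init_calls = 0
        return obj

    def __init__(self, label="default"):
        # May run again later, or be skipped entirely.
        self.init_calls += 1
        self.label = label


a = Tracker("first")
a.__init__("again")            # explicit second call to __init__
b = Tracker.__new__(Tracker)   # object created without running __init__

print(a.init_calls, a.label)   # 2 again
print(b.init_calls, hasattr(b, "label"))   # 0 False
```

Other methods of such a class should therefore not assume that state set up in __init__() exists, which is why C-level state belongs in __cinit__().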
Finalization: __dealloc__()¶
- This method is the counter-part to __cinit__().
- Any C-data that was explicitly allocated in the __cinit__() method should be freed here.
- Use caution in this method:
- The Python object to which this method belongs may not be completely intact at this point.
- Avoid invoking any Python operations that may touch the object.
- Don’t call any of this object’s methods.
- It’s best to just deallocate C-data structures here.
- All Python attributes of your extension type object are deallocated by Cython after the __dealloc__() method returns.
Arithmetic Methods¶
Note
Most of these methods behave differently than in Python
- There are no “reversed” versions of these methods; there is no __radd__(), for instance.
- If the first operand cannot perform the operation, the same method of the second operand is called, with the operands in the same order.
- Do not rely on the first parameter of these methods being "self" or the right type.
- The types of both operands should be tested before deciding what to do.
- Return NotImplemented for unhandled, mismatched operand types.
- The previously mentioned points...
- Also apply to the ‘in-place’ method __ipow__().
- Do not apply to other ‘in-place’ methods like __iadd__(), in that these always take self as the first argument.
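The operand checks described above can be sketched as a plain Python 3 function with the same (x, y) signature that the arithmetic methods receive. The Metres class and the metres_add function are hypothetical names; in an extension type the function body would be the body of __add__.

```python
class Metres:
    def __init__(self, value):
        self.value = value


def metres_add(x, y):
    """Model of an __add__(x, y) where either operand may be the Metres."""
    if isinstance(x, Metres) and isinstance(y, Metres):
        return Metres(x.value + y.value)
    if isinstance(x, Metres) and isinstance(y, (int, float)):
        return Metres(x.value + y)
    if isinstance(x, (int, float)) and isinstance(y, Metres):
        # Reached when the left operand could not handle the addition itself.
        return Metres(x + y.value)
    # Mismatched operand types: let Python try something else.
    return NotImplemented


print(metres_add(Metres(2), 3).value)           # 5
print(metres_add(2, Metres(3)).value)           # 5
print(metres_add("two", Metres(3)))             # NotImplemented
```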
Rich Comparisons¶
Note
There are no separate methods for individual rich comparison operations.
- A single special method called __richcmp__() replaces all the individual rich comparison special methods.
- __richcmp__() takes an integer argument indicating which operation is to be performed, as shown in the table below.
Operator | Value |
---|---|
< | 0 |
<= | 1 |
== | 2 |
!= | 3 |
> | 4 |
>= | 5 |
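One common way to handle the integer code is a lookup table that maps each value to the matching function from Python's operator module. The sketch below is plain Python 3; the names RICHCMP_OPS and richcmp are hypothetical, and in an extension type the function body would typically be applied to whatever key the two objects are compared by.

```python
import operator

# Operation codes as listed in the table above.
RICHCMP_OPS = {
    0: operator.lt,  # <
    1: operator.le,  # <=
    2: operator.eq,  # ==
    3: operator.ne,  # !=
    4: operator.gt,  # >
    5: operator.ge,  # >=
}


def richcmp(x, y, op):
    """Apply the comparison selected by the integer code `op` to x and y."""
    try:
        compare = RICHCMP_OPS[op]
    except KeyError:
        raise ValueError("unknown rich comparison code: %r" % (op,))
    return compare(x, y)


print(richcmp(1, 2, 0))   # True, because 1 < 2
print(richcmp(1, 2, 4))   # False, because 1 > 2 does not hold
```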
The __next__() Method¶
- Extension types used to expose an iterator interface should define a __next__() method.
- Do not explicitly supply a next() method, because Python does that for you automatically.
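The iterator protocol itself is the same as in ordinary Python. As a minimal sketch in plain Python 3 (the Countdown class is a hypothetical example, not an extension type), an iterator returns itself from __iter__() and signals exhaustion from __next__() by raising StopIteration:

```python
class Countdown:
    """Iterate from `start` down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))   # [3, 2, 1]
print(next(Countdown(2)))   # 2
```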
Subclassing¶
An extension type may inherit from a built-in type or another extension type:
cdef class Parrot:
    ...
cdef class Norwegian(Parrot):
    ...
- A complete definition of the base type must be available to Cython.
- If the base type is a built-in type, it must have been previously declared as an extern extension type.
- cimport can be used to import the base type, if the extern declared base type is in a .pxd definition file.
- In Cython, multiple inheritance is not permitted; single inheritance only.
- Cython extension types can also be sub-classed in Python.
- Here multiple inheritance is permissible, as is normal for Python.
- Even multiple extension types may be inherited, but the C-layout of all the base classes must be compatible.
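The layout restriction can be observed from plain Python 3 using built-in types, which are also implemented as C structs. In this sketch the helper can_combine is a hypothetical name; it simply tries to create a class from the given bases and reports whether Python accepted them.

```python
def can_combine(*bases):
    """Return True if a Python class can inherit from all of `bases`."""
    try:
        type("Combined", bases, {})
    except TypeError:
        # Raised, for example, when the bases have conflicting C layouts.
        return False
    return True


class Plain:
    pass


class Mixin:
    pass


print(can_combine(Plain, Mixin))   # True: no C-level fields to reconcile
print(can_combine(int, Plain))     # True: only one base fixes the layout
print(can_combine(int, str))       # False: two incompatible C layouts
```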
Forward Declarations¶
Extension types can be “forward-declared”.
This is necessary when two extension types refer to each other:
cdef class Shrubbery # forward declaration
cdef class Shrubber:
    cdef Shrubbery work_in_progress
cdef class Shrubbery:
    cdef Shrubber creator
An extension type that has a base class requires that the base class be specified in both the forward declaration and the subsequent definition:
cdef class A(B)
...
cdef class A(B):
    # attributes and methods
Extension Types and None¶
- Parameters and C-variables declared as an extension type may take the value of None.
- This is analogous to the way a C-pointer can take the value of NULL.
Note
- Exercise caution when using None.
- Read this section carefully.
- There is no problem as long as you are performing Python operations on it.
- This is because full dynamic type checking is applied.
- When accessing an extension type’s C-attributes, make sure it is not None.
- Cython does not check this for reasons of efficiency.
Be very aware of exposing Python functions that take extension types as arguments:
def widen_shrubbery(Shrubbery sh, extra_width): # This is
    sh.width = sh.width + extra_width
- Users could crash the program by passing None for the sh parameter.
- This could be avoided by:
def widen_shrubbery(Shrubbery sh, extra_width):
    if sh is None:
        raise TypeError
    sh.width = sh.width + extra_width
- Cython provides a more convenient way with a not None clause:
def widen_shrubbery(Shrubbery sh not None, extra_width):
    sh.width = sh.width + extra_width
- Now this function automatically checks that sh is not None, as well as that it is the right type.
- not None can only be used in Python functions (declared with def, not cdef).
- For cdef functions, you will have to provide the check yourself.
- The self parameter of an extension type is guaranteed to never be None.
- When comparing a value x with None, and x is a Python object, note the following:
- x is None and x is not None are very efficient.
- They translate directly to C-pointer comparisons.
- x == None and x != None, or if x: ... (a boolean condition), will invoke Python operations and will therefore be much slower.
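That == goes through a Python-level operation while is does not can be checked in plain Python 3 with a class that counts calls to its __eq__ method. The Counting class is a hypothetical example used only to make the difference visible; it does not measure speed.

```python
class Counting:
    def __init__(self):
        self.eq_calls = 0

    def __eq__(self, other):
        # Every == comparison involving this object lands here.
        self.eq_calls += 1
        return other is self


x = Counting()

print(x is None, x.eq_calls)   # False 0  (identity test, no method call)
print(x == None, x.eq_calls)   # False 1  (__eq__ was invoked)
```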
Weak Referencing¶
- By default, weak references are not supported.
- They can be enabled by declaring a C attribute of the object type called __weakref__:
cdef class ExplodingAnimal:
    """This animal will self-destruct when it is no longer strongly referenced."""
    cdef object __weakref__
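Plain Python has a close analogue: a class that defines __slots__ has a fixed instance layout and cannot be weakly referenced unless "__weakref__" is listed as a slot. The sketch below is Python 3, and the class and helper names are hypothetical.

```python
import weakref


class NoWeakref:
    __slots__ = ("name",)                  # fixed layout, no weakref slot


class WithWeakref:
    __slots__ = ("name", "__weakref__")    # opt in to weak references


def supports_weakref(obj):
    """Return True if a weak reference to `obj` can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


print(supports_weakref(NoWeakref()))     # False
print(supports_weakref(WithWeakref()))   # True
```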
External and Public Types¶
Public¶
- When an extension type is declared public, Cython will generate a C-header (".h") file.
- The header file will contain the declarations for its object-struct and its type-object.
- External C-code can now access the attributes of the extension type.
External¶
- An extern extension type allows you to gain access to the internals of:
- Python objects defined in the Python core.
- Non-Cython extension modules
The following example lets you get at the C-level members of Python’s built-in “complex” object:
cdef extern from "complexobject.h":
    struct Py_complex:
        double real
        double imag
    ctypedef class __builtin__.complex [object PyComplexObject]:
        cdef Py_complex cval
# A function which uses the above type
def spam(complex c):
    print "Real:", c.cval.real
    print "Imag:", c.cval.imag
Note
Some important things in the example:
- ctypedef has been used because Python’s header file has the struct declared with:
typedef struct {
    ...
} PyComplexObject;
- The module where this type object can be found is specified alongside the name of the extension type. See Implicit Importing.
- When declaring an external extension type...
- Don’t declare any methods; calls to them are Python method calls, so declarations are not needed.
- Similar to structs and unions, extension classes declared inside a cdef extern from block only need to declare the C members which you will actually need to access in your module.
Name Specification Clause¶
Note
Only available to public and extern extension types.
Example:
[object object_struct_name, type type_object_name ]
- object_struct_name is the name to assume for the type’s C-struct.
- type_object_name is the name to assume for the type’s statically declared type-object.
- The object and type clauses can be written in any order.
- For cdef extern from declarations, this clause is required.
- The object clause is required because Cython must generate code that is compatible with the declarations in the header file.
- Otherwise the object clause is optional.
- For public extension types, both the object and type clauses are required for Cython to generate code that is compatible with external C-code.
Type Names vs. Constructor Names¶
- In a Cython module, the name of an extension type serves two distinct purposes:
- When used in an expression, it refers to a “module-level” global variable holding the type’s constructor (i.e. its type-object)
- It can also be used as a C-type name to declare a “type” for variables, arguments, and return values.
Example:
cdef extern class MyModule.Spam:
    ...
- The name “Spam” serves both of these roles.
- Only “Spam” can be used as the type-name.
- The constructor can be referred to by other names.
- Upon an explicit import of “MyModule”...
- MyModule.Spam() could be used as the constructor call.
- MyModule.Spam could not be used as a type-name.
When an “as” clause is used, the name specified takes over both roles:
cdef extern class MyModule.Spam as Yummy:
    ...
- Yummy becomes both the type-name and a name for the constructor.
- There are other ways, of course, to get hold of the constructor, but Yummy is the only usable type-name.
Special Mention¶
Limitations¶
Compiler Directives¶
TODO. See http://wiki.cython.org/enhancements/compilerdirectives
Indices and tables¶
Special Methods Table¶
This table lists all of the special methods together with their parameter and return types. In the table below, a parameter name of self is used to indicate that the parameter has the type that the method belongs to. Other parameters with no type specified in the table are generic Python objects.
You don’t have to declare your method as taking these parameter types. If you declare different types, conversions will be performed as necessary.
General¶
Name | Parameters | Return type | Description |
---|---|---|---|
__cinit__ | self, ... | | Basic initialisation (no direct Python equivalent) |
__init__ | self, ... | | Further initialisation |
__dealloc__ | self | | Basic deallocation (no direct Python equivalent) |
__cmp__ | x, y | int | 3-way comparison |
__richcmp__ | x, y, int op | object | Rich comparison (no direct Python equivalent) |
__str__ | self | object | str(self) |
__repr__ | self | object | repr(self) |
__hash__ | self | int | Hash function |
__call__ | self, ... | object | self(...) |
__iter__ | self | object | Return iterator for sequence |
__getattr__ | self, name | object | Get attribute |
__setattr__ | self, name, val | | Set attribute |
__delattr__ | self, name | | Delete attribute |
Arithmetic operators¶
Name | Parameters | Return type | Description |
---|---|---|---|
__add__ | x, y | object | binary + operator |
__sub__ | x, y | object | binary - operator |
__mul__ | x, y | object | * operator |
__div__ | x, y | object | / operator for old-style division |
__floordiv__ | x, y | object | // operator |
__truediv__ | x, y | object | / operator for new-style division |
__mod__ | x, y | object | % operator |
__divmod__ | x, y | object | combined div and mod |
__pow__ | x, y, z | object | ** operator or pow(x, y, z) |
__neg__ | self | object | unary - operator |
__pos__ | self | object | unary + operator |
__abs__ | self | object | absolute value |
__nonzero__ | self | int | convert to boolean |
__invert__ | self | object | ~ operator |
__lshift__ | x, y | object | << operator |
__rshift__ | x, y | object | >> operator |
__and__ | x, y | object | & operator |
__or__ | x, y | object | | operator |
__xor__ | x, y | object | ^ operator |
Numeric conversions¶
Name | Parameters | Return type | Description |
---|---|---|---|
__int__ | self | object | Convert to integer |
__long__ | self | object | Convert to long integer |
__float__ | self | object | Convert to float |
__oct__ | self | object | Convert to octal |
__hex__ | self | object | Convert to hexadecimal |
__index__ (2.5+ only) | self | object | Convert to sequence index |
In-place arithmetic operators¶
Name | Parameters | Return type | Description |
---|---|---|---|
__iadd__ | self, x | object | += operator |
__isub__ | self, x | object | -= operator |
__imul__ | self, x | object | *= operator |
__idiv__ | self, x | object | /= operator for old-style division |
__ifloordiv__ | self, x | object | //= operator |
__itruediv__ | self, x | object | /= operator for new-style division |
__imod__ | self, x | object | %= operator |
__ipow__ | x, y, z | object | **= operator |
__ilshift__ | self, x | object | <<= operator |
__irshift__ | self, x | object | >>= operator |
__iand__ | self, x | object | &= operator |
__ior__ | self, x | object | |= operator |
__ixor__ | self, x | object | ^= operator |
Sequences and mappings¶
Name | Parameters | Return type | Description |
---|---|---|---|
__len__ | self | int | len(self) |
__getitem__ | self, x | object | self[x] |
__setitem__ | self, x, y | | self[x] = y |
__delitem__ | self, x | | del self[x] |
__getslice__ | self, Py_ssize_t i, Py_ssize_t j | object | self[i:j] |
__setslice__ | self, Py_ssize_t i, Py_ssize_t j, x | | self[i:j] = x |
__delslice__ | self, Py_ssize_t i, Py_ssize_t j | | del self[i:j] |
__contains__ | self, x | int | x in self |
Iterators¶
Name | Parameters | Return type | Description |
---|---|---|---|
__next__ | self | object | Get next item (called next in Python) |
Buffer interface¶
Note
The buffer interface is intended for use by C code and is not directly accessible from Python. It is described in the Python/C API Reference Manual under sections 6.6 and 10.6.
Name | Parameters | Return type | Description |
---|---|---|---|
__getreadbuffer__ | self, int i, void **p | | |
__getwritebuffer__ | self, int i, void **p | | |
__getsegcount__ | self, int *p | | |
__getcharbuffer__ | self, int i, char **p | | |
Descriptor objects¶
Note
Descriptor objects are part of the support mechanism for new-style Python classes. See the discussion of descriptors in the Python documentation. See also PEP 252, “Making Types Look More Like Classes”, and PEP 253, “Subtyping Built-In Types”.
Name | Parameters | Return type | Description |
---|---|---|---|
__get__ | self, instance, class | object | Get value of attribute |
__set__ | self, instance, value | | Set value of attribute |
__delete__ | self, instance | | Delete attribute |