.. ========================================================== .. Application-level Stackless features .. ========================================================== ======================================== アプリケーションレベルのスタックレス機能 ======================================== はじめに ======== PyPy でも `Stackless Python`_ と同様に **大規模並行処理** 用のコードを書くための機能が提供されています。 (**再起の深さ制限** がありますが、その制限を解除する方法もあります) この機能は continulet_ というカスタムプリミティブの呼び出しが基本となります。 continulet_ は直接アプリケーションコードとして利用したり、ユーザフレンドリなインターフェースを (アプリケーションレベルで) 書くことができます。 現在 PyPy は greenlets_ を continulet 上に実装しています。 同様に `Stackless Python`_ に習って、 tasklets や channels の実装も簡単です。 continulet はとても軽量なため、PyPy はそれを大量に含むプログラムを実行することができます。 しかし、実装の制限により、 ``--gcrootfinder=shadowstack`` オプションでコンパイルすると continulet 毎に物理メモリを 1 ページ (4KB) 以上消費し、 32 ビットでは仮想メモリの半分、 64 ビットではすべての仮想メモリを消費します。 また、(現状) x86 と x86-64 CPU でのみ利用可能です; 他の CPU については、 `pypy/translator/c/src/stacklet/`_ にそのアセンブラを追加する必要があります。 理論 ==== 基本的な考え方は、どんな時点でもスタック内のフレームの 1 つが (スレッドから / マルチスレッドから) 実行されるということです。 スタックを参照するために、一番上のフレームから一番下のフレームまで ``f_back`` でつなげます。 ある時点のフレームには、もう一つのフレームの参照 ``f_back`` を (一番下でない場合) 持っており、また他のフレームが自分への参照 ``f_back`` を (一番上でない場合) 持っています。 continulet の理論は文字通り "OK な状態" を定義することです。 トリックは単一のスタックの場合より、 "OK な状態" が複雑な場合があるということです: 常に単一のスタックを保持しますが、それとは別に一連の ``f_back`` をサイクル内で実行するための複数のフレームのサイクルを保持しています。 しかし、このサイクルは完全に分離されていることに注意する必要があります: 一番上のフレーム (現在実行されているフレーム) は一つであり、それは他のフレームの ``f_back`` から参照されておらず、下位フレームが終わるまでは常にスタックの最上部にあり、外部のサイクルには属しません。 このようなサイクルはどのように作成するのでしょうか? この基本的な動作は 2 つのフレームを選択し、その ``f_back`` の順序を入れ替えることです。 "OK な状態" の原則を犯すことなく、どんな ``f_back`` の順序でも入れ替えることができます。 例えば ``f`` というフレームがスタックの下部にあり、その ``f_back`` と一番上のフレームの ``f_back`` の順序を入れ替えます。 そして、通常のスタック内のフレームを削除し、それを 1 つの独立したサイクルに変換します。 再び同じ順序で並び替えることで、最初の状況に戻します。 PyPy では、任意のフレームの ``f_back`` を変更できるわけではなく、 ``continulets`` 内のフレームのみ変更できます。 continulets は内部的に stacklets_ で実装されています。 stacklets はさらに原始的 (one-shot continuation: 継続が一度だけ起動される) で、 Python ではなく C で動作します。 continulets の基本的な考え方は、どんな時点でもすべての有効なスタックを保持するということです。 例えば、正しく例外 (およびトレースバック) を伝達するためにこれが重要になります。 アプリケーションレベルインターフェース ====================================== .. _continulet: continulet ++++++++++ トランスレートされた PyPy には、デフォルトで ``continulet`` タイプをエクスポートする ``_continuation`` というモジュールが含まれています。 このモジュールからエクスポートされた ``continulet`` オブジェクトは "one-shot continuation" 用のコンテナです。 これはスタックにフレームを挿入したり、 ``f_back`` を変更する役割を持っています。 continulet オブジェクトを作成するためには、 ``continulet()`` に、呼び出し可能オブジェクト (callable) とその引数を指定して呼び出します。 そして、continulet での最初の ``switch()`` の際に、 その continulet から指定された引数で callable が実行されます。 その際、 ``switch()`` の呼び出し元の参照が continulet 内の one-shot continuation に格納されます。 次に ``switch()`` が呼び出されると、格納された one-shot continuation は新しいものと入れ替えられます。 ``switch()`` の呼び出しは、コンテナ内に格納された continuation をサスペンドし、continulet オブジェクトから古い continuation を再開します。 最も原始的な API は '``permute()``' であり、それは 2 つ以上の continulet に格納された one-shot continuation の順序を入れ替えます。 In more details: * ``continulet(callable, *args, **kwds)``: make a new continulet. Like a generator, this only creates it; the ``callable`` is only actually called the first time it is switched to. It will be called as follows:: callable(cont, *args, **kwds) where ``cont`` is the same continulet object. Note that it is actually ``cont.__init__()`` that binds the continulet. It is also possible to create a not-bound-yet continulet by calling explicitly ``continulet.__new__()``, and only bind it later by calling explicitly ``cont.__init__()``. * ``cont.switch(value=None, to=None)``: start the continulet if it was not started yet. Otherwise, store the current continuation in ``cont``, and activate the target continuation, which is the one that was previously stored in ``cont``. Note that the target continuation was itself previously suspended by another call to ``switch()``; this older ``switch()`` will now appear to return. The ``value`` argument is any object that is carried to the target and returned by the target's ``switch()``. If ``to`` is given, it must be another continulet object. In that case, performs a "double switch": it switches as described above to ``cont``, and then immediately switches again to ``to``. This is different from switching directly to ``to``: the current continuation gets stored in ``cont``, the old continuation from ``cont`` gets stored in ``to``, and only then we resume the execution from the old continuation out of ``to``. * ``cont.throw(type, value=None, tb=None, to=None)``: similar to ``switch()``, except that immediately after the switch is done, raise the given exception in the target. * ``cont.is_pending()``: return True if the continulet is pending. This is False when it is not initialized (because we called ``__new__`` and not ``__init__``) or when it is finished (because the ``callable()`` returned). When it is False, the continulet object is empty and cannot be ``switch()``-ed to. * ``permute(*continulets)``: a global function that permutes the continuations stored in the given continulets arguments. Mostly theoretical. In practice, using ``cont.switch()`` is easier and more efficient than using ``permute()``; the latter does not on its own change the currently running frame. Genlets +++++++ The ``_continuation`` module also exposes the ``generator`` decorator:: @generator def f(cont, a, b): cont.switch(a + b) cont.switch(a + b + 1) for i in f(10, 20): print i This example prints 30 and 31. The only advantage over using regular generators is that the generator itself is not limited to ``yield`` statements that must all occur syntactically in the same function. Instead, we can pass around ``cont``, e.g. to nested sub-functions, and call ``cont.switch(x)`` from there. The ``generator`` decorator can also be applied to methods:: class X: @generator def f(self, cont, a, b): ... Greenlets +++++++++ Greenlets are implemented on top of continulets in `lib_pypy/greenlet.py`_. See the official `documentation of the greenlets`_. Note that unlike the CPython greenlets, this version does not suffer from GC issues: if the program "forgets" an unfinished greenlet, it will always be collected at the next garbage collection. Unimplemented features ++++++++++++++++++++++ The following features (present in some past Stackless version of PyPy) are for the time being not supported any more: * Tasklets and channels (currently ``stackless.py`` seems to import, but you have tasklets on top of coroutines on top of greenlets on top of continulets on top of stacklets, and it's probably not too hard to cut two of these levels by adapting ``stackless.py`` to use directly continulets) * Coroutines (could be rewritten at app-level) * Pickling and unpickling continulets (*) * Continuing execution of a continulet in a different thread (*) * Automatic unlimited stack (must be emulated__ so far) * Support for other CPUs than x86 and x86-64 .. __: `recursion depth limit`_ (*) Pickling, as well as changing threads, could be implemented by using a "soft" stack switching mode again. We would get either "hard" or "soft" switches, similarly to Stackless Python 3rd version: you get a "hard" switch (like now) when the C stack contains non-trivial C frames to save, and a "soft" switch (like previously) when it contains only simple calls from Python to Python. Soft-switched continulets would also consume a bit less RAM, and the switch might be a bit faster too (unsure about that; what is the Stackless Python experience?). Recursion depth limit +++++++++++++++++++++ You can use continulets to emulate the infinite recursion depth present in Stackless Python and in stackless-enabled older versions of PyPy. The trick is to start a continulet "early", i.e. when the recursion depth is very low, and switch to it "later", i.e. when the recursion depth is high. Example:: from _continuation import continulet def invoke(_, callable, arg): return callable(arg) def bootstrap(c): # this loop runs forever, at a very low recursion depth callable, arg = c.switch() while True: # start a new continulet from here, and switch to # it using an "exchange", i.e. a switch with to=. to = continulet(invoke, callable, arg) callable, arg = c.switch(to=to) c = continulet(bootstrap) c.switch() def recursive(n): if n == 0: return ("ok", n) if n % 200 == 0: prev = c.switch((recursive, n - 1)) else: prev = recursive(n - 1) return (prev[0], prev[1] + 1) print recursive(999999) # prints ('ok', 999999) Note that if you press Ctrl-C while running this example, the traceback will be built with *all* recursive() calls so far, even if this is more than the number that can possibly fit in the C stack. These frames are "overlapping" each other in the sense of the C stack; more precisely, they are copied out of and into the C stack as needed. (The example above also makes use of the following general "guideline" to help newcomers write continulets: in ``bootstrap(c)``, only call methods on ``c``, not on another continulet object. That's why we wrote ``c.switch(to=to)`` and not ``to.switch()``, which would mess up the state. This is however just a guideline; in general we would recommend to use other interfaces like genlets and greenlets.) Stacklets +++++++++ Continulets are internally implemented using stacklets, which is the generic RPython-level building block for "one-shot continuations". For more information about them please see the documentation in the C source at `pypy/translator/c/src/stacklet/stacklet.h`_. The module ``pypy.rlib.rstacklet`` is a thin wrapper around the above functions. The key point is that new() and switch() always return a fresh stacklet handle (or an empty one), and switch() additionally consumes one. It makes no sense to have code in which the returned handle is ignored, or used more than once. Note that ``stacklet.c`` is written assuming that the user knows that, and so no additional checking occurs; this can easily lead to obscure crashes if you don't use a wrapper like PyPy's '_continuation' module. Theory of composability +++++++++++++++++++++++ Although the concept of coroutines is far from new, they have not been generally integrated into mainstream languages, or only in limited form (like generators in Python and iterators in C#). We can argue that a possible reason for that is that they do not scale well when a program's complexity increases: they look attractive in small examples, but the models that require explicit switching, for example by naming the target coroutine, do not compose naturally. This means that a program that uses coroutines for two unrelated purposes may run into conflicts caused by unexpected interactions. To illustrate the problem, consider the following example (simplified code using a theorical ``coroutine`` class). First, a simple usage of coroutine:: main_coro = coroutine.getcurrent() # the main (outer) coroutine data = [] def data_producer(): for i in range(10): # add some numbers to the list 'data' ... data.append(i) data.append(i * 5) data.append(i * 25) # and then switch back to main to continue processing main_coro.switch() producer_coro = coroutine() producer_coro.bind(data_producer) def grab_next_value(): if not data: # put some more numbers in the 'data' list if needed producer_coro.switch() # then grab the next value from the list return data.pop(0) Every call to grab_next_value() returns a single value, but if necessary it switches into the producer function (and back) to give it a chance to put some more numbers in it. Now consider a simple reimplementation of Python's generators in term of coroutines:: def generator(f): """Wrap a function 'f' so that it behaves like a generator.""" def wrappedfunc(*args, **kwds): g = generator_iterator() g.bind(f, *args, **kwds) return g return wrappedfunc class generator_iterator(coroutine): def __iter__(self): return self def next(self): self.caller = coroutine.getcurrent() self.switch() return self.answer def Yield(value): """Yield the value from the current generator.""" g = coroutine.getcurrent() g.answer = value g.caller.switch() def squares(n): """Demo generator, producing square numbers.""" for i in range(n): Yield(i * i) squares = generator(squares) for x in squares(5): print x # this prints 0, 1, 4, 9, 16 Both these examples are attractively elegant. However, they cannot be composed. If we try to write the following generator:: def grab_values(n): for i in range(n): Yield(grab_next_value()) grab_values = generator(grab_values) then the program does not behave as expected. The reason is the following. The generator coroutine that executes ``grab_values()`` calls ``grab_next_value()``, which may switch to the ``producer_coro`` coroutine. This works so far, but the switching back from ``data_producer()`` to ``main_coro`` lands in the wrong coroutine: it resumes execution in the main coroutine, which is not the one from which it comes. We expect ``data_producer()`` to switch back to the ``grab_next_values()`` call, but the latter lives in the generator coroutine ``g`` created in ``wrappedfunc``, which is totally unknown to the ``data_producer()`` code. Instead, we really switch back to the main coroutine, which confuses the ``generator_iterator.next()`` method (it gets resumed, but not as a result of a call to ``Yield()``). Thus the notion of coroutine is *not composable*. By opposition, the primitive notion of continulets is composable: if you build two different interfaces on top of it, or have a program that uses twice the same interface in two parts, then assuming that both parts independently work, the composition of the two parts still works. A full proof of that claim would require careful definitions, but let us just claim that this fact is true because of the following observation: the API of continulets is such that, when doing a ``switch()``, it requires the program to have some continulet to explicitly operate on. It shuffles the current continuation with the continuation stored in that continulet, but has no effect outside. So if a part of a program has a continulet object, and does not expose it as a global, then the rest of the program cannot accidentally influence the continuation stored in that continulet object. In other words, if we regard the continulet object as being essentially a modifiable ``f_back``, then it is just a link between the frame of ``callable()`` and the parent frame --- and it cannot be arbitrarily changed by unrelated code, as long as they don't explicitly manipulate the continulet object. Typically, both the frame of ``callable()`` (commonly a local function) and its parent frame (which is the frame that switched to it) belong to the same class or module; so from that point of view the continulet is a purely local link between two local frames. It doesn't make sense to have a concept that allows this link to be manipulated from outside. .. _`Stackless Python`: http://www.stackless.com .. _`documentation of the greenlets`: http://packages.python.org/greenlet/ .. include:: _ref.txt