Haydi¶
Overview¶
Haydi (Haystack diver) is a framework for generating discrete structures. It provides a way to define a structure from basic building blocks (e.g. Cartesian product, mappings) and then enumerate all elements, all non-isomorphic elements, or generate random elements.
- Pure Python implementation (Python 2.7+, PyPy supported)
- MIT license
- Sequential or distributed computation (via dask/distributed)
Example of usage¶
Let us define directed graphs on two vertices (represented as a set of edges):
>>> import haydi as hd
>>> nodes = hd.USet(2, "n")             # A two-element set with (unlabeled) elements {n0, n1}
>>> graphs = hd.Subsets(nodes * nodes)  # Subsets of the Cartesian product
Now we can iterate all elements:
>>> list(graphs.iterate())
[{}, {(n0, n0)}, {(n0, n0), (n0, n1)}, {(n0, n0), (n0, n1), (n1, n0)}, {(n0,
# ... 3 lines removed ...
n1)}, {(n1, n0)}, {(n1, n0), (n1, n1)}, {(n1, n1)}]
or iterate all non-isomorphic ones:
>>> list(graphs.cnfs())  # cnfs = canonical forms
[{}, {(n0, n0)}, {(n0, n0), (n1, n1)}, {(n0, n0), (n0, n1)},
 {(n0, n0), (n0, n1), (n1, n1)}, {(n0, n0), (n0, n1), (n1, n0)},
 {(n0, n0), (n0, n1), (n1, n0), (n1, n1)}, {(n0, n0), (n1, n0)},
 {(n0, n1)}, {(n0, n1), (n1, n0)}]
or generate random instances:
>>> list(graphs.generate(3))
[{(n1, n0)}, {(n1, n1), (n0, n0)}, {(n0, n1), (n1, n0)}]
Haydi supports standard operations like map, filter and reduce:
>>> op = graphs.map(lambda g: len(g)).reduce(lambda x, y: x + y)  # Just a demonstration pipeline; nothing useful
>>> op.run()
We can run it transparently as a distributed application:
>>> from haydi import DistributedContext
>>> # We now assume that dask/distributed is running at hostname:1234
>>> context = DistributedContext("hostname", 1234)
>>> op.run(ctx=context)
Example: Černy’s conjecture¶
The following example shows how to use Haydi to verify Černy’s conjecture on bounded instances. The conjecture states that the length of a minimal reset word is bounded by \((n-1)^2\), where \(n\) is the number of states of the automaton. A reset word is a word that sends all states of the automaton to a single state.
The example program finds the maximal length of a minimal reset word for automata of a given size. The full source code is in examples/cerny/cerny.py in the repository.
The following approach is a simple one, just a few lines of code, without any sophisticated optimization. In the sequential mode it takes around two minutes (in PyPy) to verify the conjecture for automata with five states; it would probably need many hours to check automata with six states. The state-of-the-art results verify the conjecture for automata with more than 14 states, but they require clever optimizations that are out of the scope of this example.
First, we describe deterministic automata by their transition functions (mapping from pair of state and symbol to a new state):
>>> import haydi as hd
>>> n_states = 4 # Number of states
>>> n_symbols = 2 # Number of symbols in alphabet
>>> states = hd.USet(n_states, "q") # set of states q0, q1, ..., q_{n_states-1}
>>> alphabet = hd.USet(n_symbols, "a") # set of symbols a0, ..., a_{n_symbols-1}
# All Mappings (states * alphabet) -> states
>>> delta = hd.Mappings(states * alphabet, states)
Now we can create a pipeline that goes through all non-isomorphic automata and finds maximum among lengths of their minimal reset word:
>>> pipeline = delta.cnfs().map(check_automaton).max(size=1)
>>> result = pipeline.run()
>>> print ("The maximal length of a minimal reset word for an "
... "automaton with {} states and {} symbols is {}.".
... format(n_states, n_symbols, result[0]))
The function check_automaton takes an automaton (as a transition function) and returns the length of the minimal reset word, or 0 when there is no such word. It is just a simple breadth-first search on sets of states:
from haydi.algorithms import search

# Precompute some values that will be used repeatedly
init_state = frozenset(states)
max_steps = (n_states**3 - n_states) / 6
# It is known that no more than (n^3 - n) / 6 steps are needed

def check_automaton(delta):
    # Takes an automaton as a transition function and returns the minimal
    # length of a synchronizing word, or 0 if there is no such word
    def step(state, depth):
        # A step of the breadth-first search; takes a set of states
        # and yields the sets reachable by one symbol
        for a in alphabet:
            yield frozenset(delta[(s, a)] for s in state)
    delta = delta.to_dict()
    return search.bfs(
        init_state,           # Initial state
        step,                 # Function that takes a node and returns its successors
        lambda state, depth: depth if len(state) == 1 else None,
                              # Run until we reach a single state
        max_depth=max_steps,  # Limit the depth of the search
        not_found_value=0)    # Return 0 when we exceed the depth limit
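If you want to experiment without Haydi installed, the same search can be sketched in plain Python. The helper names and the use of the classical Černy automaton below are our own illustration, not part of Haydi:

```python
from collections import deque

def min_reset_word_length(n_states, delta):
    # Breadth-first search over sets of states: returns the length of a
    # shortest reset word, or 0 when no reset word exists.
    # delta is a dict mapping (state, symbol) -> state.
    symbols = sorted({a for (_, a) in delta})
    start = frozenset(range(n_states))
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        if len(state) == 1:
            return depth  # all states collapsed into a single one
        for a in symbols:
            succ = frozenset(delta[(s, a)] for s in state)
            if succ not in visited:
                visited.add(succ)
                queue.append((succ, depth + 1))
    return 0

def cerny_automaton(n):
    # The classical Cerny automaton C_n: 'a' cycles the states,
    # 'b' fixes every state except n-1, which it maps to 0.
    delta = {}
    for i in range(n):
        delta[(i, "a")] = (i + 1) % n
        delta[(i, "b")] = 0 if i == n - 1 else i
    return delta

print(min_reset_word_length(4, cerny_automaton(4)))  # (4-1)^2 = 9
```

For C_n the shortest reset word is known to have length exactly \((n-1)^2\), which makes this family a handy sanity check for the search.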
Installation¶
Haydi is a pure Python package for Python 2.7+.
Basic installation¶
Haydi itself can be installed simply with pip:
pip install git+https://github.com/spirali/haydi.git
Domains¶
This section introduces the core structure of Haydi: domains and basic operations with them. Advanced domains and canonical forms are covered in a separate section: Canonical forms.
Elementary Domains¶
One of the basic structures in Haydi is Domain, which represents a generic collection of arbitrary objects. The main operations provided by a domain are iteration over its elements and random generation of elements. Domains are composable, i.e. more complex domains can be created from simpler ones.
There are six elementary domains shipped with Haydi: Range (a range of integers), Values (a domain of explicitly listed Python objects), Boolean (a two-element domain), NoneDomain (a domain containing only one element: None), USet, and CnfValues.
The domains USet and CnfValues are a little special: they are designed for enumerating non-isomorphic structures. This topic is covered in Canonical forms; these domains are not used in this section.
Examples:
>>> import haydi as hd
>>> hd.Range(4) # Domain of four integers
<Range size=4 {0, 1, 2, 3}>
>>> hd.Values(["Haystack", "diver"])
<Values size=2 {'Haystack', 'diver'}>
>>> hd.Boolean()
<Boolean size=2 {False, True}>
>>> hd.NoneDomain()
<NoneDomain size=1 {None}>
Composition¶
New domains can be created by composing existing ones. There are the following compositions: product, sequences, subsets, mappings, and join.
Cartesian product \((A \times B)\)¶
Product
creates a domain of all ordered tuples; for example:
>>> import haydi as hd
>>> a = hd.Range(2)
>>> b = hd.Values(("a", "b", "c"))
>>> hd.Product((a, b))
<Product size=6 {(0, 'a'), (0, 'b'), (0, 'c'), (1, 'a'), ...}>
Alternatively, the same thing can be written using the infix operator *:
>>> a * b
<Product size=6 {(0, 'a'), (0, 'b'), (0, 'c'), (1, 'a'), ...}>
The product can be created on more than two domains:
>>> hd.Product((a, b, hd.Values(["x", "y"])))
<Product size=12 {(0, 'a', 'x'), (0, 'a', 'y'), (0, 'b', 'x'), ...}>
>>> a * b * hd.Values(["x", "y"])
<Product size=12 {(0, 'a', 'x'), (0, 'a', 'y'), (0, 'b', 'x'), ...}>
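As a cross-check without Haydi, the standard library enumerates the same tuples; this is essentially what Product does lazily:

```python
from itertools import product

a = range(2)
b = ("a", "b", "c")
pairs = list(product(a, b))                # the 2 * 3 = 6 ordered pairs
triples = list(product(a, b, ("x", "y")))  # 2 * 3 * 2 = 12 ordered triples
print(pairs[:3])  # [(0, 'a'), (0, 'b'), (0, 'c')]
```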
Note
Generally, a * b equals hd.Product((a, b)). However, there is one exception: when a is itself a product. The expression hd.Product((x, y)) * b is equal to hd.Product((x, y, b)) (not hd.Product((hd.Product((x, y)), b))). The reason is to enable defining n-ary tuples by multiplication. If you want to avoid this behavior and define a “product in a product”, then explicitly use hd.Product instead of *.
Sequences \((A^n)\)¶
Sequences
is a shortcut for a product over the same domain.
Sequences of a given length can be defined as:
>>> import haydi as hd
>>> a = hd.Range(2)
>>> hd.Sequences(a, 3) # Sequences of length 3
<Sequences size=8 {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), ...}>
or sequences with a length in a given range:
>>> hd.Sequences(a, 0, 2) # Sequences of length 0 to 2
<Sequences size=7 {(), (0,), (1,), (0, 0), ...}>
Sequences of a fixed length can also be created by the ** operator on a domain:
>>> hd.Range(2) ** 3
<Sequences size=8 {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), ...}>
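The same counts can be reproduced with the standard library: a fixed length corresponds to itertools.product with repeat, and a length range is a concatenation over the lengths (a sketch, not Haydi's implementation):

```python
from itertools import chain, product

fixed = list(product(range(2), repeat=3))            # A^3: 2^3 = 8 sequences
ranged = list(chain.from_iterable(
    product(range(2), repeat=n) for n in range(3)))  # lengths 0..2: 1+2+4 = 7
print(fixed[0], ranged[0])
```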
Subsets \((\mathcal{P}(A))\)¶
The Subsets domain contains subsets of elements of a given domain; the following example creates the power set:
>>> import haydi as hd
>>> hd.Subsets(hd.Range(2))
<Subsets size=4 {{}, {0}, {0, 1}, {1}}>
When a single size argument is provided, it limits the subsets to exactly that size:
>>> hd.Subsets(hd.Range(3), 2) # Subsets of size 2
<Subsets size=3 {{0, 1}, {0, 2}, {1, 2}}>
Two size arguments limit the subsets to sizes in a given range:
>>> hd.Subsets(hd.Range(3), 0, 1) # Subsets of size between 0 and 1
<Subsets size=4 {{}, {0}, {1}, {2}}>
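A plain-Python sketch of the same three calling conventions (the helper below is ours; it only mirrors the size limits described above):

```python
from itertools import chain, combinations

def subsets(elems, min_size=None, max_size=None):
    # No size -> the power set; one size -> exactly that size;
    # two sizes -> all sizes in the given range.
    elems = list(elems)
    if min_size is None:
        lo, hi = 0, len(elems)
    elif max_size is None:
        lo = hi = min_size
    else:
        lo, hi = min_size, max_size
    return [set(c) for c in chain.from_iterable(
        combinations(elems, r) for r in range(lo, hi + 1))]

print(subsets(range(3), 2))  # [{0, 1}, {0, 2}, {1, 2}]
```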
Note
The type of elements created by Subsets is not the standard Python set, but haydi.Set. For more information, see Set and Map. This behavior can be overridden by the argument set_class:
>>> hd.Subsets(hd.Range(2), set_class=frozenset)
<Subsets size=4 {frozenset([]), frozenset([0]), frozenset([0, 1]), ...}>
Mappings \((A \rightarrow B)\)¶
The domain Mappings contains all mappings from one domain to another:
>>> import haydi as hd
>>> a = hd.Range(2)
>>> b = hd.Values(["a", "b"])
>>> hd.Mappings(a, b)
<Mappings size=4 {{0: 'a'; 1: 'a'}, {0: 'a'; 1: 'b'}, {0: ... a'}, ...}>
Note
The type of elements created by Mappings is not the standard Python dict, but haydi.Map. For more information, see Set and Map. This behavior can be overridden by the argument map_class:
>>> hd.Mappings(a, b, map_class=dict)
<Mappings size=4 {{0: 'a', 1: 'a'}, {0: 'a', 1: 'b'}, {0: ... a'}, ...}>
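Conceptually, a mapping A -> B assigns one value of B to each key of A, so the domain has |B|^|A| elements. A plain-Python sketch (our helper, not Haydi's API):

```python
from itertools import product

def mappings(keys, values):
    # All functions keys -> values, each represented as a plain dict.
    keys, values = list(keys), list(values)
    return [dict(zip(keys, vals))
            for vals in product(values, repeat=len(keys))]

ms = mappings(range(2), ["a", "b"])
print(len(ms))  # 2^2 = 4 mappings
```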
Join \((A \uplus B)\)¶
The Join operation creates a new domain that contains the elements of all given domains (a disjoint union):
>>> import haydi as hd
>>> a = hd.Range(2)
>>> b = hd.Values(["abc", "ikl", "xyz"])
>>> c = hd.Values([123])
>>> hd.Join((a, b, c))
<Join size=6 {0, 1, 'abc', 'ikl', ...}>
The same behavior can also be achieved by the + operator on domains:
>>> a + b + c
<Join size=6 {0, 1, 'abc', 'ikl', ...}>
Note that Join does not collapse identical elements occurring in several of the joined domains:
>>> a = hd.Range(2)
>>> b = hd.Range(3)
>>> a + b
<Join size=5 {0, 1, 0, 1, 2}>
Let us now make a small detour: each domain can generate a random element via generate_one():
>>> a = hd.Range(2)
>>> b = hd.Values(["abc", "ikl", "xyz"])
>>> c = hd.Values([123])
>>> d = a + b + c
>>> d.generate_one()
'ikl'
By default, domains return each element with the same probability, and Join is no exception. Therefore, each element of d has probability 1/6 of being returned by generate_one() (d has six elements). This can be changed by the ratios argument:
>>> d2 = hd.Join((a, b, c), ratios=(1, 1, 1))
First, a subdomain is chosen with the given probabilities (1:1:1 here), and then generate_one() is called on the selected subdomain. Therefore, 123 will occur with probability 1/3, while “ikl” has probability 1/9.
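The two-stage sampling can be sketched in plain Python (our helper; Haydi's actual implementation may differ). With ratios (1, 1, 1) the singleton subdomain is picked a third of the time:

```python
import random

def generate_one(subdomains, ratios):
    # Stage 1: pick a subdomain with probability proportional to its ratio.
    # Stage 2: pick an element of that subdomain uniformly.
    domain = random.choices(subdomains, weights=ratios)[0]
    return random.choice(domain)

random.seed(0)
a, b, c = [0, 1], ["abc", "ikl", "xyz"], [123]
samples = [generate_one([a, b, c], (1, 1, 1)) for _ in range(30000)]
print(samples.count(123) / len(samples))  # close to 1/3
```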
Laziness of domains¶
A domain is generally a lazy object that does not eagerly construct its elements. Therefore, if we use code like this:
>>> import haydi as hd
>>> a = hd.Range(1000000)
>>> a * a * a
<Product size=1000000000000000000 {(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), ...}>
we obtain the result instantly; only the first few elements are instantiated for the repr string. The ways to instantiate elements from a domain are explained in Pipeline.
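Plain Python iterators behave the same way; itertools.product stores only the input pools, while the 10^18 result tuples are produced on demand:

```python
from itertools import islice, product

huge = product(range(10**6), repeat=3)  # 10^18 tuples, generated lazily
first = list(islice(huge, 3))           # construct only the first three
print(first)  # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
```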
Transformations¶
Another way of creating new domains is to apply a transformation to an existing domain. There are two basic transformations: map and filter.
Map¶
The map transformation applies a function to each element of a domain, creating a new domain:
>>> a = hd.Range(4).map(lambda x: x * 10)
>>> a
<MapTransformation size=4 {0, 10, 20, 30}>
The resulting object is again a domain; for example, we can make a product of it:
>>> a * a
<Product size=16 {(0, 0), (0, 10), (0, 20), (0, 30), ...}>
Filter¶
The filter transformation creates a new domain by removing some elements from an existing domain. Which elements are removed is controlled by a provided function that is called for each element and should return True or False; when it returns True, the element is kept in the new domain.
>>> a = hd.Range(10).filter(lambda x: x % 2 == 0 and x > 5)
>>> a
<FilterTransformation size=10 filtered>
As we can see, the repr string differs from what we have seen so far. The flag ‘filtered’ means that the domain contains a filter transformation, and the size is not exact but only an upper bound. The reason is that obtaining the real size would require applying the filter function to each element (i.e. going through a potentially very large domain). If we convert the domain into a list, we force its evaluation and obtain:
>>> list(a)
[6, 8]
The ‘filtered’ flag is propagated through the composition of domains. When a domain is created by composing at least one filtered domain, it is also filtered:
>>> a * a
<Product size=100 filtered>
Names¶
It is possible to provide a name for a domain as an argument of the domain constructor. The name serves only for debugging purposes. For example:
>>> import haydi as hd
>>> a = hd.Range(10, name="MyRange")
>>> a
<MyRange size=10 {0, 1, 2, 3, ...}>
>>> a.name
'MyRange'
>>> a = hd.Range(10)
>>> hd.Product((a, a), name="MyProduct")
<MyProduct size=100 {(0, 0), (0, 1), (0, 2), (0, 3), ...}>
Pipeline¶
This section describes how to work with elements of domains. The main message is that there are three basic methods of creating a stream of elements from a domain:
- iterate() – a stream of all elements in the domain
- cnfs() – a stream of all canonical elements in the domain
- generate() – a stream of random elements from the domain
The rest of this section describes the whole machinery in more detail. If you are interested in sequential computations only, and you want to handle the stream manually, you can just directly use Python iterators as follows:
>>> import haydi as hd
>>> for x in hd.Subsets(hd.Range(10)).generate(5):  # print five random subsets
...     print x
The purpose of the pipeline is to simplify some common operations and enable transparent distributed computations.
Overview¶
The whole pipeline is composed of the following elements:

- Domain – a domain as described in the previous section
- Method – how to take elements from the domain into the stream
- Transformations – transformations of the stream
- Action – final aggregation of results
- Run – the actual invocation of the pipeline
Method¶
There are three methods of walking through a domain: iteration, iteration through canonical forms, and random generation.
Iterate¶
A pipeline that iterates through all elements is created by method
iterate()
:
>>> import haydi as hd
>>> domain = hd.Range(2) * hd.Range(2)
>>> domain.iterate()
<Pipeline for Product: method=iterate action=Collect>
Calling iterate() on a domain creates a pipeline object. Moreover, we can see that the default action is Collect; this action simply takes all elements and puts them into a list. More details about actions can be found in Section Actions.
The pipeline is a lazy object and no elements are actually constructed. To run the pipeline, we need to call the run() method:
>>> domain.iterate().run()
[(0, 0), (0, 1), (1, 0), (1, 1)]
The iterate() method iterates through all elements in the domain. It is guaranteed that each element occurs in the stream the same number of times as in the domain; the actual order of elements in the stream is not guaranteed.
Canonical forms¶
Iterating over canonical elements is a more complex operation, hence there is a dedicated section about this topic: Canonical forms.
Random elements¶
A pipeline that generates random elements from a domain is created by method
generate(count=None)
, where the optional parameter count
specifies the
number of generated elements:
>>> domain = hd.Range(2) * hd.Range(2)
>>> domain.generate(5).run()
[(0, 1), (0, 0), (1, 0), (0, 0), (1, 1)]
By default, all elements are generated with the same probability; however, this can be configured in some places. See the API documentation of Join.
When the argument of generate is None, we obtain an infinite pipeline of random instances. This usually makes sense in combination with a filter and a limit after the filter.
For example, the following code generates five pairs whose sum is 11:
>>> domain = hd.Range(10) * hd.Range(10)
>>> domain.generate().filter(lambda x: x[0] + x[1] == 11).take(5).run()
[(3, 8), (5, 6), (9, 2), (3, 8), (6, 5)]
The transformation take(5) limits the pipeline to the first five elements. As an exercise, we leave to the reader what happens when we put 5 as the argument of generate and remove the take.
Transformations¶
The current version offers three pipeline transformations:
- map(fn) – applies the function fn to each element that goes through the pipeline
- filter(fn) – filters elements in the pipeline according to the provided function
- take(count) – takes only the first count elements from the pipeline
At first sight, there is an overlap between transformations on domains and in the pipeline. In fact, in many cases they have exactly the same effect:
>>> domain = hd.Range(5)
>>> domain.map(lambda x: x * 10).iterate().run() # Create a new domain and then iterate
[0, 10, 20, 30, 40]
>>> domain.iterate().map(lambda x: x * 10).run() # Transformation in pipeline
[0, 10, 20, 30, 40]
So why distinguish between transformations in pipelines and on domains? The reason is that in the pipeline we know that the creation of the domain is complete, which gives more freedom for additional features and optimizations. We already have a stream of elements; therefore, we can introduce the take transformation. Moreover, pipeline transformations do not have the limitations that apply to strict domains, which are important for the usage of cnfs().
Pipeline transformations also provide more opportunities for efficient distributed computation. Therefore, Haydi prefers map and filter as pipeline transformations rather than domain transformations, and automatically moves trailing domain transformations into the pipeline. Hence, both lines of the example above actually create the same pipeline (with one pipeline transformation):
>>> domain.map(lambda x: x * 10).iterate()
<Pipeline for Range: method=iterate ts=[MapTransformation] action=Collect>
>>> domain.iterate().map(lambda x: x * 10)
<Pipeline for Range: method=iterate ts=[MapTransformation] action=Collect>
Of course, ‘inner’ domain transformations cannot be moved. For example, the following code creates a pipeline without any pipeline transformation (the transformation remains hidden inside the domain composition):
>>> domain = hd.Subsets(hd.Range(3).map(lambda x: x * x))
>>> domain.iterate().run()
[{}, {0}, {0, 1}, {0, 1, 4}, {0, 4}, {1}, {1, 4}, {4}]
>>> domain.iterate()
<Pipeline for Subsets: method=iterate action=Collect>
Actions¶
An action is a terminal operation on the stream of elements. The list of actions follows; more details can be found in the API documentation of Pipeline.
Collect¶
The collect action creates a list from the stream:
>>> hd.Range(5).iterate().collect().run()
[0, 1, 2, 3, 4]
Collect is the default action; therefore, the above code is equivalent to:
>>> hd.Range(5).iterate().run()
[0, 1, 2, 3, 4]
First¶
The first action takes the first element from the stream. If the stream is empty, it returns the provided argument (the default is None).
>>> hd.Range(5).iterate().first().run()
0
>>> hd.Range(5).filter(lambda x: x > 10).first().run()
None
>>> hd.Range(5).filter(lambda x: x > 10).first("no value").run()
'no value'
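The semantics of first can be sketched with Python's built-in next(), which also accepts a default for an exhausted iterator:

```python
def first(stream, default=None):
    # Return the first element of an iterable, or `default` when it is empty.
    return next(iter(stream), default)

print(first(range(5)))                                     # 0
print(first((x for x in range(5) if x > 10), "no value"))  # no value
```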
Reduce¶
The reduce action applies a binary operation to the elements of the stream:
>>> hd.Range(10).reduce(lambda x, y: x + y).run()
45
You can optionally specify an initial value:
>>> hd.Range(10).reduce(lambda x, y: x + y, -3).run()
42
By default, the operation is assumed to be associative; if that is not the case, you have to say so explicitly:
>>> hd.Range(10).reduce(lambda x, y: x - y, 100, associative=False).run()
55
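functools.reduce reproduces these numbers and shows why associativity matters: an associative operation can be reduced in any grouping (and hence split across workers), while a non-associative one must be folded strictly from the left:

```python
from functools import reduce

assert reduce(lambda x, y: x + y, range(10)) == 45      # associative
assert reduce(lambda x, y: x + y, range(10), -3) == 42  # with an initial value

# Non-associative: ((((100 - 0) - 1) - ...) - 9) = 100 - 45 = 55.
# A different grouping would give a different result, which is why the
# reduction must not be split arbitrarily in this case.
assert reduce(lambda x, y: x - y, range(10), 100) == 55
```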
Max¶
The max action gathers the maximal elements in the stream. Optionally, it can take a function that extracts from each element the value used for comparison. The second optional argument limits the number of returned maximal elements; no more than this number of elements is returned and the remaining maximal elements are thrown away (which ones are kept is not specified). If the second argument is None (the default), all maximal elements are returned:
>>> domain = hd.Range(5) * hd.Range(5)
>>> domain.max().run()
[(4, 4)]
>>> domain.max(lambda x: x[0]).run() # Maximum in the first element in the pair
[(4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
>>> domain.max(lambda x: x[0], 2).run() # At most two maximal elements
[(4, 0), (4, 1)]
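A plain-Python sketch of the same semantics (our helper; a real streaming implementation would keep only the current maxima instead of materializing the stream):

```python
from itertools import product

def max_all(items, key=lambda x: x, limit=None):
    # Gather every element attaining the maximal key, optionally truncated.
    items = list(items)
    best = max(key(x) for x in items)
    result = [x for x in items if key(x) == best]
    return result if limit is None else result[:limit]

domain = list(product(range(5), repeat=2))
print(max_all(domain))                               # [(4, 4)]
print(max_all(domain, key=lambda x: x[0], limit=2))  # two of the maxima
```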
Groups¶
The groups action divides the elements of the stream into groups according to a key. The method takes a function that is applied to each element to obtain its key.
>>> hd.Range(10).groups(lambda x: x % 3).run()
{0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
Optionally, it takes an integer argument that limits the size of the groups; no more than this number of elements is returned for each group (which elements of a group are kept is not specified).
>>> hd.Range(10).groups(lambda x: x % 3, 2).run()
{0: [0, 3], 1: [1, 4], 2: [2, 5]}
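The grouping with a size limit can be sketched as a single pass over the stream (our helper, not Haydi's implementation):

```python
from collections import defaultdict

def groups(items, key, limit=None):
    # Partition items by key; keep at most `limit` elements per group.
    result = defaultdict(list)
    for x in items:
        bucket = result[key(x)]
        if limit is None or len(bucket) < limit:
            bucket.append(x)
    return dict(result)

print(groups(range(10), lambda x: x % 3, 2))  # {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```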
Groups_counts¶
This is an extension of the groups action that also returns the total number of elements in each group, including the elements that were thrown away. The total count is stored at the first position of each list:
>>> hd.Range(10).groups_counts(lambda x: x % 3, 2).run()
{0: [4, 0, 3], 1: [3, 1, 4], 2: [3, 2, 5]}
Run¶
The run(ctx=None, timeout=None, otf_trace=False) method invokes the pipeline. By default, it creates and executes a sequential computation without any time limit; this can be changed by the arguments.
The ctx parameter defines the context used for the execution of the pipeline. Providing an instance of a distributed context makes the pipeline parallel and distributed; see distributed computation.
The timeout parameter expects a float (a number of seconds) or a timedelta object. It defines the maximum time of the computation; if the allocated time runs out, the computation is stopped and a partial result is returned.
Shortcuts¶
To make the code more concise, the following defaults are defined for the pipeline:
- Method: iterate
- Transformations: none
- Action: collect
- Run: run()
Therefore, we can call .run() directly on a domain and obtain the same result as with .iterate().collect().run(); the default pipeline is created automatically.
In the same manner, we can also call actions directly on a domain; this creates a pipeline with the iterate() method.
Examples:
>>> hd.Range(5).run() # .iterate().collect() is used
[0, 1, 2, 3, 4]
>>> hd.Range(5).max().run() # .iterate() is used
[4]
When we create an iterator from a domain or a pipeline, we obtain an iterator over the result of the pipeline, where missing pipeline parts are filled in by the defaults:
>>> list(hd.Range(5))
[0, 1, 2, 3, 4]
>>> list(hd.Range(5).map(lambda x: x * x))
[0, 1, 4, 9, 16]
>>> list(hd.Range(5).max())
[4]
Immutability of pipelines¶
Pipelines are immutable objects (just like domains); therefore, calling methods on them creates new objects. It is thus safe to reuse them:
>>> pipeline = hd.Range(5).iterate()
>>> pipeline.take(2).run()
[0, 1]
>>> pipeline.max().run()
[4]
Set and Map¶
Domains distributed with Haydi use standard Python types in almost all places; for example, Range creates a domain whose elements have type int, and Product and Sequences use tuple for the elements of the resulting domains. The exceptions are Subsets, which uses haydi.Set, and Mappings, which uses haydi.Map, even though the natural choices would be the standard set and dict. The reason is a performance optimization during generation (mainly when generating Canonical forms).
If needed, both types can easily be converted into the standard ones:
>>> import haydi as hd
>>> a = hd.Range(2)
>>> hd.Subsets(a).map(lambda s: s.to_set()) # Creates standard Python sets
<MapTransformation size=4 {set([]), set([0]), set([0, 1]), set([1])}>
>>> hd.Subsets(a).map(lambda s: s.to_frozenset()) # Creates standard Python frozensets
<MapTransformation size=4 {frozenset([]), frozenset([0]), frozenset([0, 1]), ...}>
>>> hd.Mappings(a, a).map(lambda m: m.to_dict()) # Creates standard Python dicts
<MapTransformation size=4 {{0: 0, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 0}, ...}>
Note
Classes Set and Map are not designed for frequent lookups. If you need that, please convert them to set/dict. For this reason, they intentionally do not implement the methods __contains__ and __getitem__, to avoid accidental usage of these classes instead of set/dict. For occasional lookups, there are the methods contains/get in these classes.
It is still a subject of discussion whether to introduce __contains__ and __getitem__ for these classes.
Canonical forms¶
This section covers a feature for iterating only over the non-isomorphic elements of a domain. It is based on iterating over canonical forms – one element for each equivalence class of the isomorphism relation.
The basic building blocks of the whole machinery are atoms and USets, which allow us to define bijections between elements.
Atoms and USets¶
The domain USet contains a finite number of instances of the class Atom. When a new USet is created, the number of atoms and a name have to be specified; creating a USet creates fresh atoms. Each atom remembers its index and the parent USet that created it:
>>> import haydi as hd
>>> uset = hd.USet(3, "a") # Create a new USet
>>> uset
<USet id=1 size=3 name=a>
>>> list(uset) # Atoms in uset
[a0, a1, a2]
>>> a0, a1, a2 = list(uset)
>>> a0.index
0
>>> a0.parent # Each Atom remembers its parent USet
<USet id=1 size=3 name=a>
From the perspective of pipeline methods iterate()
and generate()
, USet
behaves as a kind of fancy Range
that wraps numbers into special
objects.
The main difference between Range and USet arises when we introduce isomorphism. Let us recall that two (discrete) objects are isomorphic if there exists a bijection between them that preserves the structure.
In our case, we establish bijections between atoms, and two objects are isomorphic when we can obtain one from the other by replacing atoms according to such a bijection.
Let us show some examples using haydi.is_isomorphic(), which checks isomorphism:
>>> a0, a1, a2 = hd.USet(3, "a")
>>> hd.is_isomorphic(a0, a1)
True # Because there is bijection: a0 -> a1; a1 -> a0; a2 -> a2
>>> hd.is_isomorphic((a0, a1), a1)
False # No mapping between atoms can bring us from a tuple to an atom
>>> hd.is_isomorphic((a0, a1), (a1, a2))
True # Because there is mapping: a0 -> a1; a1 -> a2; a2 -> a0
>>> hd.is_isomorphic((a0, a0), (a0, a1))
False # The explanation below
The bijection in the last case cannot exist. The first tuple represents a pair of the same object, and “renaming” a0 to anything else preserves this property. The second tuple represents a pair of two different atoms (albeit from the same USet), and no renaming can achieve the property of the first one (any mapping containing a0 -> a0; a1 -> a0 is not bijective).
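The definition can be made concrete with a small brute-force sketch: model the atoms of a single USet as integers and search over all bijections (this illustrates the semantics; it is not how Haydi implements is_isomorphic()):

```python
from itertools import permutations

def is_isomorphic(t1, t2, n_atoms):
    # Tuples of atoms (integers 0..n_atoms-1) are isomorphic iff some
    # bijection on the atoms maps one tuple onto the other.
    return any(tuple(perm[x] for x in t1) == t2
               for perm in permutations(range(n_atoms)))

a0, a1, a2 = 0, 1, 2
print(is_isomorphic((a0, a1), (a1, a2), 3))  # True:  a0->a1, a1->a2, a2->a0
print(is_isomorphic((a0, a0), (a0, a1), 3))  # False: no bijection exists
```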
Atoms from different USets¶
Two atoms from different USets can never be renamed to each other; i.e. they are never isomorphic. Each USet and its atoms can be thought of as having a different color.
>>> a0, a1 = hd.USet(2, "a")
>>> b0, b1 = hd.USet(2, "b")
>>> hd.is_isomorphic(a0, b0)
False # Map containing a0 -> b0 is not allowed
>>> hd.is_isomorphic((a0, b1), (a1, b0))
True # There is bijection: a0 -> a1; a1 -> a0; b0 -> b1; b1 -> b0
The bijection in the second case is correct, since each atom is mapped to an atom from its own parent USet.
Note
The name of a USet serves only for debugging purposes and has no impact on behavior. Creating two USets with the same name still creates two disjoint sets of atoms, each with its own parent.
Basic objects¶
So far, we have seen atoms and tuples of atoms in the examples. However, the whole machinery around isomorphisms is implemented for objects that we call basic objects; they are inductively defined as follows:
- atoms, integers, strings, True, False, and None are basic objects
- a tuple of basic objects is a basic object
- a haydi.Set of basic objects is a basic object
- a haydi.Map whose keys and values are basic objects is a basic object
Examples:
>>> a0, a1, a2 = hd.USet(3, "a")
>>> hd.is_isomorphic((a0, 1), (a0, 2))
False # Renaming is defined only for atoms, not for other objects
>>> hd.is_isomorphic((a0, 1), (a1, 1))
True # Bijection: a0 -> a1; a1 -> a0; a2 -> a2
>>> hd.is_isomorphic(hd.Set((a0, a1)), hd.Set((a2, a0)))
True # Bijection: a0 -> a0; a1 -> a2; a2 -> a1
Canonical forms¶
Since we are interested only in finite (basic) objects, each object contains only finitely many atoms, and there are only finitely many bijections (recall that USets are finite). Therefore, each equivalence class induced by isomorphism is also finite.
In Haydi, there is a fixed linear ordering of all basic objects, defined by haydi.compare(). Since each isomorphism class is finite, each class has a smallest element according to this ordering. We call this element the canonical form of the class.
The pipeline method cnfs()
iterates only through canonical elements in a
domain; therefore, we obtain only one element for each equivalence class.
Let us show some examples:
>>> uset = hd.USet(3, "a")
>>> bset = hd.USet(3, "b")
>>> list(uset) # All elements
[a0, a1, a2]
>>> list(uset.cnfs()) # Canonical forms
[a0]
>>> list(uset + bset) # All elements
[a0, a1, a2, b0, b1, b2]
>>> list((uset + bset).cnfs()) # Canonical forms
[a0, b0]
>>> p = uset * uset
>>> list(p) # All elements
[(a0, a0), (a0, a1), (a0, a2), (a1, a0), (a1, a1),
 (a1, a2), (a2, a0), (a2, a1), (a2, a2)]
>>> list(p.cnfs()) # Canonical forms
[(a0, a0), (a0, a1)]
>>> s = hd.Subsets(uset + bset, 2)
>>> list(s) # All elements
[{a0, a1}, {a0, a2}, {a0, b0}, {a0, b1}, {a0, b2}, {a1, a2}, {a1, b0},
{a1, b1}, {a1, b2}, {a2, b0}, {a2, b1}, {a2, b2}, {b0, b1}, {b0, b2}, {b1, b2}]
>>> list(s.cnfs()) # Canonical forms
[{a0, a1}, {a0, b0}, {b0, b1}]
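Under the same integer-atom model (an illustrative sketch, not Haydi's implementation), a canonical form is simply the minimum of an isomorphism orbit, and collecting the distinct minima reproduces the cnfs() output for uset * uset:

```python
from itertools import permutations, product

def canonical(t, n_atoms):
    # The smallest tuple in the isomorphism orbit of t, with atoms of a
    # single USet modeled as integers ordered numerically.
    return min(tuple(p[x] for x in t) for p in permutations(range(n_atoms)))

cnfs = sorted({canonical(t, 3) for t in product(range(3), repeat=2)})
print(cnfs)  # [(0, 0), (0, 1)] -- one representative per equivalence class
```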
Strict domains¶
The pipeline method cnfs() is allowed only for strict domains. A strict domain is a domain that contains only basic objects and is closed under isomorphism (if it contains an element, it also contains all isomorphic ones). Although we call it “strict”, it is usually not a problem to fulfill these criteria in practice.
All elementary domains except Values are always strict. (The domain CnfValues is a counterpart of Values for canonical forms; see Domain CnfValues.) The strictness of a domain can be checked by reading its attribute strict:
>>> hd.Range(10).strict
True
Basic domain compositions are strict if and only if all their inner domains are strict, e.g.:
>>> domain = hd.Subsets(hd.Range(5) * hd.USet(2))
>>> domain.strict
True
The only places where we have to be more careful are transformations and creating a strict domain from explicit elements. These topics are covered in the next two subsections.
Transformations on strict domains¶
Transformations may generally break the strict-domain invariant: a filter may remove some elements while leaving isomorphic ones, and a map may even return non-basic objects. Therefore, a domain created by a transformation is non-strict by default.
In most cases, when we want to use cnfs() while applying a transformation, we can simply move the transformation into the pipeline, where there are no such restrictions (in the pipeline we do not create a new domain):
>>> domain = hd.USet(3, "a") * hd.Range(4)
>>> list(domain.cnfs()) # This is Ok
>>> new_domain = domain.map(lambda x: SomeMyClass(x))
>>> new_domain.strict
False
>>> new_domain.cnfs() # Throws an error
>>> domain.cnfs().map(lambda x: SomeMyClass(x)) # This is ok, map is in pipeline
If we really need to create a new strict domain by applying a transformation, it is currently possible only with filter, in the following way:
>>> domain = hd.USet(3, "a") * hd.Range(4)
>>> new_domain = domain.filter(lambda x: x[1] != 2, strict=True)
>>> new_domain.strict
True
>>> list(new_domain.cnfs())
[(a0, 0), (a0, 1), (a0, 3)]
When the filter parameter strict is set to True and the original domain is strict, then the resulting domain is still strict.
Warning
It is the user's responsibility to ensure that a strict filter removes either all elements of an isomorphism class or none of them. Fortunately, in practice this is usually the desired behavior of filters. However, if this rule is broken, the behavior of cnfs() is undefined on such a domain.
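To make the rule concrete, here is a plain-Python sketch (a toy model, not Haydi's implementation) of what "closed under isomorphism" means for a filtered set. In this toy, the only non-trivial relabeling exchanges the unlabeled names a0 and a1:

```python
from itertools import product

# Toy model of USet(2, "a") x Range(2): a0 and a1 are interchangeable names.
elements = list(product(["a0", "a1"], [0, 1]))

def swap(elem):
    """The only non-trivial relabeling: exchange a0 and a1."""
    name, num = elem
    return ("a1" if name == "a0" else "a0", num)

def closed_under_isomorphism(elems):
    """True if the set contains whole isomorphism classes."""
    s = set(elems)
    return all(swap(e) in s for e in s)

# Filtering on the Range component keeps whole isomorphism classes: safe.
safe = [e for e in elements if e[1] != 1]
# Filtering on an unlabeled name breaks closure: unsafe as a strict filter.
unsafe = [e for e in elements if e[0] == "a0"]

print(closed_under_isomorphism(safe))    # True
print(closed_under_isomorphism(unsafe))  # False
```

A strict filter in Haydi must behave like the first filter, never like the second.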
Domain CnfValues
¶
Domain Values always creates a non-strict domain, since Haydi cannot verify that the invariants hold. If you want to create a strict domain from explicit elements, you can use CnfValues instead. The difference is that CnfValues is constructed from canonical elements and automatically adds the necessary objects to make the domain strict (i.e. it adds all elements isomorphic to the given canonical elements):
>>> uset = hd.USet(3, "a")
>>> a0, a1, a2 = uset
>>> domain = hd.CnfValues((a0, (a0, a1), "x"))
>>> list(domain.cnfs())
[a0, (a0, a1), 'x']
>>> list(domain.iterate())
[a0, a1, a2, (a0, a1), (a0, a2), (a1, a0), (a1, a2), (a2, a0), (a2, a1), 'x']
Public functions¶
This is a list of public functions that may be useful when working with canonical forms:
- haydi.is_canonical() – returns True if and only if a given object is in canonical form.
- haydi.expand() – returns a list of all objects isomorphic to a given object.
- haydi.compare() – defines a linear ordering between two objects. The exact ordering is unspecified, but for basic objects it is guaranteed to stay fixed even between separate executions.
- haydi.sort() – sorts objects according to haydi.compare().
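The idea behind these functions can be sketched in plain Python (a conceptual toy, not Haydi's implementation): expand an object into all of its relabelings, and take the smallest one under a fixed ordering as the canonical form, so that isomorphic objects share a single representative:

```python
from itertools import permutations

NAMES = ("n0", "n1", "n2")  # unlabeled names, as in USet(3, "n")

def expand(graph):
    """All relabelings of a graph (a frozenset of edges) -- a toy
    analogue of haydi.expand for this representation."""
    results = set()
    for perm in permutations(NAMES):
        rename = dict(zip(NAMES, perm))
        results.add(frozenset((rename[u], rename[v]) for (u, v) in graph))
    return results

def canonical_form(graph):
    """Pick one fixed representative: the minimum under a fixed ordering."""
    return min(expand(graph), key=lambda g: sorted(g))

g1 = frozenset([("n0", "n1")])
g2 = frozenset([("n2", "n1")])  # isomorphic to g1

print(canonical_form(g1) == canonical_form(g2))  # True
```

cnfs() enumerates exactly one such representative per isomorphism class.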
Distributed computation¶
The pipeline computation in Haydi can be transparently executed through dask/distributed. This enables distributed computation or local parallel computation. Switching a pipeline from the default serial context to a distributed one is done by providing a distributed context to the pipeline's run() method.
Local computation¶
The following example shows how to start four local dask/distributed workers and run Haydi computation on them:
>>> from haydi import DistributedContext
>>> dctx = DistributedContext(spawn_workers=4)
>>> hd.Range(100).map(lambda x: x + 1).collect().run(ctx=dctx)
[1, 2, 3, 4, ...]
The argument spawn_workers forces DistributedContext to spawn dask/distributed workers for you.
Distributed computation¶
If you want to distribute the computation amongst multiple computers, you first have to create a dask/distributed cluster. An example of a cluster setup can be found here. Once the cluster is created, simply pass the IP address and port of the cluster's scheduler to the context:
>>> dctx = DistributedContext(ip='192.168.1.1', port=8787)
# connects to cluster at 192.168.1.1:8787
>>> hd.Range(100).map(lambda x: x + 1).collect().run(ctx=dctx)
[1, 2, 3, 4, ...]
Limitations¶
Nested distributed computations are not allowed, i.e. you cannot run a distributed pipeline inside another distributed pipeline. The following example shows the invalid case:
>>> def worker(x):
... r = hd.Range(x)
... # invalid! sequential run() must be used
... return r.collect().run(DistributedContext(spawn_workers=4))
# THIS IS INVALID - nested distributed computations are not allowed
>>> hd.Range(100).map(worker).run(DistributedContext(spawn_workers=4))
Cookbook¶
Cookbook of Haydi snippets for commonly occurring patterns:
Graphs¶
Directed graphs (subsets of pairs of nodes):
>>> nodes = hd.USet(3, "n")
>>> graphs = hd.Subsets(nodes * nodes)
Undirected graphs (subsets of two-element sets):
>>> nodes = hd.USet(3, "n")
>>> graphs = hd.Subsets(hd.Subsets(nodes, 2))
Directed graphs with labeled arcs by “x” or “y”:
>>> nodes = hd.USet(3, "n")
>>> graphs = hd.Subsets(nodes * nodes * hd.Values(("x", "y")))
Undirected graphs with 2 “red” and 3 “blue” vertices:
>>> nodes = hd.USet(2, "red") + hd.USet(3, "blue")
>>> graphs = hd.Subsets(hd.Subsets(nodes, 2))
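As a quick sanity check on the sizes of the graph domains above (plain Python, independent of Haydi): every domain here is a set of subsets, so its size is a power of two determined by the size of the base set.

```python
# Counting the cookbook graph domains by hand for n nodes.
n = 3

directed = 2 ** (n * n)               # subsets of the n*n ordered pairs
undirected = 2 ** (n * (n - 1) // 2)  # subsets of the C(n, 2) two-element sets
labeled_arcs = 2 ** (2 * n * n)       # each of the n*n arcs may carry "x" or "y"

print(directed, undirected, labeled_arcs)  # 512 8 262144
```

These counts match the number of elements the corresponding iterate() calls would produce.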
Automata¶
Deterministic finite automata with multiple accepting states:
>>> SIMPLE TODO
Deterministic one-counter automata:
>>> SIMPLE TODO
Deterministic push-down automata:
>>> SIMPLE TODO
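Although the snippets above are not filled in yet, the underlying idea can be illustrated in plain Python (a sketch independent of Haydi's API): the transition functions of a deterministic finite automaton are exactly the total mappings from (state, symbol) pairs to states.

```python
from itertools import product

states = ("q0", "q1")
alphabet = ("a", "b")
keys = list(product(states, alphabet))  # domain of the transition function

# Every total mapping keys -> states is one deterministic transition function,
# giving |states| ** (|states| * |alphabet|) functions in total.
transition_functions = [dict(zip(keys, values))
                        for values in product(states, repeat=len(keys))]

print(len(transition_functions))  # 16

# A full DFA domain would pair each transition function with a subset of
# states (the accepting states), multiplying the count by 2 ** |states|.
```

In Haydi, such a mapping domain would be built from the framework's mapping building block mentioned in the overview, combined with Subsets for the accepting states.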
Performance tips¶
This section collects tips for achieving better performance.
Materialization of domains¶
TODO
Step jumps¶
TODO
Contact¶
Authors¶
- Stanislav Böhm <stanislav.bohm at vsb.cz>
- Jakub Beránek <berykubik at gmail.com>
- Martin Šurkovský <martin.surkovsky at gmail.com>
Acknowledgment¶
The development of Haydi was supported by the Grant Agency of the Czech Republic, project GAČR 15-13784S. Core hours for optimization on HPC infrastructure were provided by project OPEN-8-26 at the National Supercomputing Center IT4Innovations.