Welcome to Blocks’ documentation!¶
Blocks is a framework that helps you build and manage neural network models using Theano.
Want to try it out? Start by installing Blocks and having a look at the quickstart further down this page. Once you’re hooked, try your hand at the tutorials and the examples.
Blocks is developed in parallel with Fuel, a dataset processing framework.
Warning
Blocks is a new project which is still under development. As such, certain (all) parts of the framework are subject to change. The last stable (and thus likely outdated) version can be found in the stable branch.
Tip
That said, if you are interested in using Blocks and run into any problems, feel free to ask your question on the mailing list. Also, don’t hesitate to file bug reports and feature requests by making a GitHub issue.
Tutorials¶
Installation¶
The easiest way to install Blocks is using the Python package manager pip. Blocks isn’t listed yet on the Python Package Index (PyPI), so you will have to grab it directly from GitHub.
$ pip install git+git://github.com/mila-udem/blocks.git \
-r https://raw.githubusercontent.com/mila-udem/blocks/master/req.txt
This will give you the cutting-edge development version. The latest stable release is in the stable branch and can be installed as follows.
$ pip install git+git://github.com/mila-udem/blocks.git@stable \
-r https://raw.githubusercontent.com/mila-udem/blocks/stable/req.txt
Note
Blocks relies on several packages, such as Theano and picklable_itertools, to be installed directly from GitHub. The only way of doing so reliably is through a req.txt file, which is why this installation command might look slightly different from what you’re used to.
Installing requirements from GitHub requires pip 1.5 or higher; you can update with pip install --upgrade pip.
If you don’t have administrative rights, add the --user switch to the install commands to install the packages in your home folder. If you want to update Blocks, simply repeat the first command with the --upgrade switch added to pull the latest version from GitHub.
Warning
Pip may try to install or update NumPy and SciPy if they are not present or outdated. However, pip’s versions might not be linked to an optimized BLAS implementation. To prevent this from happening, make sure you update NumPy and SciPy using your system’s package manager (e.g. apt-get or yum), or use a Python distribution like Anaconda, before installing Blocks. You can also pass the --no-deps switch and install all the requirements manually.
If the installation crashes with ImportError: No module named numpy.distutils.core, install NumPy and try again.
Requirements¶
Blocks’ requirements are
- Theano, for pretty much everything
- PyYAML, to parse the configuration file
- six, to support both Python 2 and 3 with a single codebase
- Toolz, to add a bit of functional programming where it is needed
Bokeh is an optional requirement, needed only if you want to use live plotting of your training progress (part of blocks-extras).
We develop using the bleeding-edge version of Theano, so be sure to follow the relevant installation instructions to make sure that your Theano version is up to date if you didn’t install it through Blocks.
Development¶
If you want to work on Blocks’ development, your first step is to fork Blocks on GitHub. You will now want to install your fork of Blocks in editable mode. To install in your home directory, use the following command, replacing USER with your own GitHub user name:
$ pip install -e git+git@github.com:USER/blocks.git#egg=blocks[test,docs] --src=$HOME \
-r https://raw.githubusercontent.com/mila-udem/blocks/master/req.txt
As with the usual installation, you can use --user or --no-deps if you need to. You can now make changes in the blocks directory created by pip, push to your repository and make a pull request.
If you have already cloned the GitHub repository, you can instead run the following command from the folder you cloned Blocks into:
$ pip install -e file:.#egg=blocks[test,docs] -r req.txt
Documentation¶
If you want to build a local copy of the documentation, follow the instructions at the documentation development guidelines.
Introduction tutorial¶
In this tutorial we will perform handwriting recognition by training a multilayer perceptron (MLP) on the MNIST handwritten digit database.
The Task¶
MNIST is a dataset which consists of 70,000 handwritten digits. Each digit is a grayscale image of 28 by 28 pixels. Our task is to classify each of the images into one of the 10 categories representing the numbers from 0 to 9.

Sample MNIST digits
The Model¶
We will train a simple MLP with a single hidden layer that uses the rectifier activation function. Our output layer will consist of a softmax function with 10 units, one for each class. Mathematically speaking, our model is parametrized by \(\mathbf{\theta}\), defined as the weight matrices \(\mathbf{W}^{(1)}\) and \(\mathbf{W}^{(2)}\), and bias vectors \(\mathbf{b}^{(1)}\) and \(\mathbf{b}^{(2)}\). The rectifier activation function is defined as
\[\mathrm{ReLU}(\mathbf{x})_i = \max(0, \mathbf{x}_i)\]
and our softmax output function is defined as
\[\mathrm{softmax}(\mathbf{x})_i = \frac{e^{\mathbf{x}_i}}{\sum_{j=1}^n e^{\mathbf{x}_j}}\]
Hence, our complete model is
\[f(\mathbf{x}; \mathbf{\theta}) = \mathrm{softmax}(\mathbf{W}^{(2)}\mathrm{ReLU}(\mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)}) + \mathbf{b}^{(2)})\]
Since the output of a softmax sums to 1, we can interpret it as a categorical probability distribution: \(f(\mathbf{x})_c = \hat p(y = c \mid \mathbf{x})\), where \(\mathbf{x}\) is the 784-dimensional (28 × 28) input and \(c \in \{0, ..., 9\}\) one of the 10 classes. We can train the parameters of our model by minimizing the negative log-likelihood, i.e. the cross-entropy between our model’s output and the target distribution. This means we will minimize the sum of
\[l(f(\mathbf{x}), y) = -\sum_{c=0}^9 \mathbf{1}_{(y=c)} \log f(\mathbf{x})_c = -\log f(\mathbf{x})_y\]
(where \(\mathbf{1}\) is the indicator function) over all examples. We use stochastic gradient descent (SGD) on mini-batches for this.
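To make the model and its loss concrete before turning to Blocks, here is a minimal sketch in plain Python. It is not Blocks or Theano code: the helper names (rectifier, softmax, affine, model, negative_log_likelihood) and the toy sizes (4 inputs, 3 hidden units, 2 classes) are assumptions chosen only for illustration.

```python
import math

def rectifier(v):
    """Element-wise max(0, v_i)."""
    return [max(0.0, a) for a in v]

def softmax(v):
    """Exponentiate and normalize; subtracting the max avoids overflow."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def affine(W, b, v):
    """Compute W v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]

def model(x, W1, b1, W2, b2):
    """Hidden rectifier layer followed by a softmax output layer."""
    return softmax(affine(W2, b2, rectifier(affine(W1, b1, x))))

def negative_log_likelihood(probabilities, y):
    """Loss for one example whose true class index is y."""
    return -math.log(probabilities[y])

# Toy values, assumed for illustration only.
x = [0.5, -1.0, 2.0, 0.0]
W1 = [[0.1, 0.2, -0.1, 0.0], [0.0, -0.3, 0.2, 0.1], [0.05, 0.0, 0.0, -0.2]]
b1 = [0.0, 0.0, 0.0]
W2 = [[0.3, -0.2, 0.1], [-0.1, 0.4, 0.0]]
b2 = [0.0, 0.0]

p = model(x, W1, b1, W2, b2)          # class probabilities, summing to 1
loss = negative_log_likelihood(p, 1)  # loss if the true class is 1
```

With all weights and biases set to zero the output is uniform, so the loss for any class equals the logarithm of the number of classes. That is a handy sanity check for an untrained classifier.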
Building the model¶
Blocks uses “bricks” to build models. Bricks are parametrized Theano operations. You can read more about them in the building with bricks tutorial.
Constructing the model with Blocks is very simple. We start by defining the input variable using Theano.
Tip
Want to follow along with the Python code? If you are using IPython, enable the doctest mode using the special %doctest_mode command so that you can copy-paste the examples below (including the >>> prompts) straight into the IPython interpreter.
>>> from theano import tensor
>>> x = tensor.matrix('features')
Note that we picked the name 'features' for our input. This is important, because the name needs to match the name of the data source we want to train on. MNIST defines two data sources: 'features' and 'targets'.
For the sake of this tutorial, we will go through building an MLP the long way. For a much quicker way, skip right to the end of the next section. We begin with applying the linear transformations and activations.
We start by initializing bricks with certain parameters e.g. input_dim. After initialization we can apply our bricks on Theano variables to build the model we want. We’ll talk more about bricks in the next tutorial, Building with bricks.
>>> from blocks.bricks import Linear, Rectifier, Softmax
>>> input_to_hidden = Linear(name='input_to_hidden', input_dim=784, output_dim=100)
>>> h = Rectifier().apply(input_to_hidden.apply(x))
>>> hidden_to_output = Linear(name='hidden_to_output', input_dim=100, output_dim=10)
>>> y_hat = Softmax().apply(hidden_to_output.apply(h))
Loss function and regularization¶
Now that we have built our model, let’s define the cost to minimize. For this, we will need the Theano variable representing the target labels.
>>> y = tensor.lmatrix('targets')
>>> from blocks.bricks.cost import CategoricalCrossEntropy
>>> cost = CategoricalCrossEntropy().apply(y.flatten(), y_hat)
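To reduce the risk of overfitting, we can penalize excessive values of the parameters by adding an \(L2\)-regularization term (also known as weight decay) to the objective function:
\[l(f(\mathbf{x}), y) = -\sum_{c=0}^9 \mathbf{1}_{(y=c)} \log f(\mathbf{x})_c + \lambda_1\|\mathbf{W}^{(1)}\|^2 + \lambda_2\|\mathbf{W}^{(2)}\|^2\]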
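The penalty itself is simple arithmetic. As a minimal sketch in plain Python (not Blocks code; the function names and the toy matrices are assumptions made for illustration), each weight matrix contributes its sum of squared entries, scaled by its own coefficient:

```python
def l2_penalty(W):
    """Sum of squared entries of a matrix given as a list of rows."""
    return sum(w * w for row in W for w in row)

def regularized_cost(data_cost, weight_matrices, lambdas):
    """Add lambda_i times the squared norm of W_i to the data cost."""
    return data_cost + sum(lam * l2_penalty(W)
                           for lam, W in zip(lambdas, weight_matrices))

W1 = [[1.0, -2.0], [0.0, 3.0]]   # squared norm: 1 + 4 + 0 + 9 = 14
W2 = [[2.0, 2.0]]                # squared norm: 4 + 4 = 8
total = regularized_cost(0.5, [W1, W2], [0.005, 0.005])
# 0.5 + 0.005 * 14 + 0.005 * 8 = 0.61
```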
To get the weights from our model, we will use Blocks’ annotation features (read more about them in the Managing the computation graph tutorial).
>>> from blocks.roles import WEIGHT
>>> from blocks.graph import ComputationGraph
>>> from blocks.filter import VariableFilter
>>> cg = ComputationGraph(cost)
>>> W1, W2 = VariableFilter(roles=[WEIGHT])(cg.variables)
>>> cost = cost + 0.005 * (W1 ** 2).sum() + 0.005 * (W2 ** 2).sum()
>>> cost.name = 'cost_with_regularization'
Note
Note that we explicitly gave our variable a name. We do this so that when we monitor the performance of our model, the progress monitor will know what name to report in the logs.
Here we set \(\lambda_1 = \lambda_2 = 0.005\). And that’s it! We now have the final objective function we want to optimize.
But creating a simple MLP this way is rather cumbersome. In practice, we would have used the MLP class instead.
>>> from blocks.bricks import MLP
>>> mlp = MLP(activations=[Rectifier(), Softmax()], dims=[784, 100, 10]).apply(x)
Initializing the parameters¶
When we constructed the Linear bricks to build our model, they automatically allocated Theano shared variables to store their parameters in. All of these parameters were initially set to NaN. Before we start training our network, we will want to initialize these parameters by sampling them from a particular probability distribution. Bricks can do this for you.
>>> from blocks.initialization import IsotropicGaussian, Constant
>>> input_to_hidden.weights_init = hidden_to_output.weights_init = IsotropicGaussian(0.01)
>>> input_to_hidden.biases_init = hidden_to_output.biases_init = Constant(0)
>>> input_to_hidden.initialize()
>>> hidden_to_output.initialize()
We have now initialized our weight matrices with entries drawn from a normal distribution with a standard deviation of 0.01.
>>> W1.get_value()
array([[ 0.01624345, -0.00611756, -0.00528172, ..., 0.00043597, ...
Training your model¶
Besides helping you build models, Blocks also provides the other main features needed to train a model. It has a set of training algorithms (like SGD), an interface to datasets, and a training loop that allows you to monitor and control the training process.
We want to train our model on the training set of MNIST. We load the data using the Fuel framework. Have a look at this tutorial to get started.
After having configured Fuel, you can load the dataset.
>>> from fuel.datasets import MNIST
>>> mnist = MNIST(("train",))
Datasets only provide an interface to the data. For actual training, we will need to iterate over the data in minibatches. This is done by initiating a data stream which makes use of a particular iteration scheme. We will use an iteration scheme that iterates over our MNIST examples sequentially in batches of size 256.
>>> from fuel.streams import DataStream
>>> from fuel.schemes import SequentialScheme
>>> from fuel.transformers import Flatten
>>> data_stream = Flatten(DataStream.default_stream(
... mnist,
... iteration_scheme=SequentialScheme(mnist.num_examples, batch_size=256)))
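To see what a sequential scheme amounts to, here is a rough sketch in plain Python. It is not Fuel's implementation: the generator name sequential_batches is hypothetical, and it yields index ranges rather than data.

```python
def sequential_batches(num_examples, batch_size):
    """Yield (start, stop) index pairs covering the examples in order."""
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)

# The MNIST training set has 60,000 examples.
batches = list(sequential_batches(60000, 256))
num_batches = len(batches)   # 235: 234 full batches plus one of 96 examples
last_batch = batches[-1]     # (59904, 60000)
```

This is why one epoch over the training set with a batch size of 256 corresponds to 235 iterations.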
The training algorithm we will use is straightforward SGD with a fixed learning rate.
>>> from blocks.algorithms import GradientDescent, Scale
>>> algorithm = GradientDescent(cost=cost, parameters=cg.parameters,
... step_rule=Scale(learning_rate=0.1))
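Conceptually, a fixed learning rate means that every parameter moves against its gradient by the same scaling factor at each step. The sketch below shows that update in plain Python on a toy quadratic cost. The function names and the cost are assumptions for illustration; in real training the gradient would come from Theano and be computed on a mini-batch.

```python
def sgd_step(parameters, gradients, learning_rate):
    """Return parameters moved against the gradient by a fixed step size."""
    return [p - learning_rate * g for p, g in zip(parameters, gradients)]

def gradient(parameters):
    """Gradient of the toy cost sum((p - 3)^2), which is minimized at p = 3."""
    return [2.0 * (p - 3.0) for p in parameters]

theta = [0.0, 10.0]
for _ in range(100):
    theta = sgd_step(theta, gradient(theta), learning_rate=0.1)
# Each step shrinks the distance to 3 by a factor of 0.8,
# so both entries end up very close to 3.
```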
During training we will want to monitor the performance of our model on a separate set of examples. Let’s create a new data stream for that.
>>> mnist_test = MNIST(("test",))
>>> data_stream_test = Flatten(DataStream.default_stream(
... mnist_test,
... iteration_scheme=SequentialScheme(
... mnist_test.num_examples, batch_size=1024)))
In order to monitor our performance on this data stream during training, we need to use one of Blocks’ extensions, namely the DataStreamMonitoring extension.
>>> from blocks.extensions.monitoring import DataStreamMonitoring
>>> monitor = DataStreamMonitoring(
... variables=[cost], data_stream=data_stream_test, prefix="test")
We can now use the MainLoop to combine all the different bits and pieces. We use two more extensions to make our training stop after a single epoch and to make sure that our progress is printed.
>>> from blocks.main_loop import MainLoop
>>> from blocks.extensions import FinishAfter, Printing
>>> main_loop = MainLoop(data_stream=data_stream, algorithm=algorithm,
... extensions=[monitor, FinishAfter(after_n_epochs=1), Printing()])
>>> main_loop.run()
-------------------------------------------------------------------------------
BEFORE FIRST EPOCH
-------------------------------------------------------------------------------
Training status:
epochs_done: 0
iterations_done: 0
Log records from the iteration 0:
test_cost_with_regularization: 2.34244632721
-------------------------------------------------------------------------------
AFTER ANOTHER EPOCH
-------------------------------------------------------------------------------
Training status:
epochs_done: 1
iterations_done: 235
Log records from the iteration 235:
test_cost_with_regularization: 0.664899230003
training_finish_requested: True
-------------------------------------------------------------------------------
TRAINING HAS BEEN FINISHED:
-------------------------------------------------------------------------------
Training status:
epochs_done: 1
iterations_done: 235
Log records from the iteration 235:
test_cost_with_regularization: 0.664899230003
training_finish_requested: True
training_finished: True
Building with bricks¶
Blocks is a framework that is supposed to make it easier to build complicated neural network models on top of Theano. In order to do so, we introduce the concept of “bricks”, which you might have already come across in the introduction tutorial.
Bricks life-cycle¶
Blocks uses “bricks” to build models. Bricks are parametrized Theano operations. A brick is usually defined by a set of attributes and a set of parameters: the attributes specify the brick’s configuration (e.g., the number of input and output units), while the parameters are the values that vary during learning (e.g., the weights and the biases).
The life-cycle of a brick is as follows:
- Configuration: set (part of) the attributes of the brick. Can take place when the brick object is created, by setting the arguments of the constructor, or later, by setting the attributes of the brick object. No Theano variable is created in this phase.
- Allocation: (optional) allocate the Theano shared variables for the parameters of the Brick. When Brick.allocate() is called, the required Theano variables are allocated and initialized by default to NaN.
- Application: instantiate a part of the Theano computational graph, linking the inputs and the outputs of the brick through its parameters and according to the attributes. Cannot be performed (i.e., results in an error) if the Brick object is not fully configured.
- Initialization: set the numerical values of the Theano variables that store the parameters of the Brick. The user-provided value will replace the default initialization value.
Note
If the Theano variables of the brick object have not been allocated when apply() is called, Blocks will quietly call Brick.allocate().
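The life-cycle can be mimicked with a toy class in plain Python. This is only a sketch under stated assumptions: ToyLinear is hypothetical and is not how Blocks implements bricks, and its apply computes numbers directly, whereas a real brick builds a symbolic Theano graph.

```python
import math

class ToyLinear:
    """Hypothetical stand-in for a brick; not the Blocks implementation."""

    def __init__(self, input_dim=None, output_dim=None, init_value=0.0):
        # Configuration: store attributes, create no parameters yet.
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.init_value = init_value
        self.weights = None

    def allocate(self):
        # Allocation: needs full configuration; fill with NaN placeholders.
        if self.input_dim is None or self.output_dim is None:
            raise ValueError("allocation config not set")
        self.weights = [[math.nan] * self.input_dim
                        for _ in range(self.output_dim)]

    def apply(self, x):
        # Application: allocate quietly if that has not happened yet.
        if self.weights is None:
            self.allocate()
        return [sum(w * a for w, a in zip(row, x)) for row in self.weights]

    def initialize(self):
        # Initialization: overwrite the NaN placeholders with real values.
        if self.weights is None:
            self.allocate()
        self.weights = [[self.init_value] * self.input_dim
                        for _ in range(self.output_dim)]

brick = ToyLinear(output_dim=2, init_value=0.5)  # partially configured
brick.input_dim = 3                              # configuration completed later
before = brick.apply([1.0, 2.0, 3.0])            # allocates; outputs are NaN
brick.initialize()
after = brick.apply([1.0, 2.0, 3.0])             # 0.5 * (1 + 2 + 3) per unit
```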
Example¶
Bricks take Theano variables as inputs, and provide Theano variables as outputs.
>>> import theano
>>> from theano import tensor
>>> from blocks.bricks import Tanh
>>> x = tensor.vector('x')
>>> y = Tanh().apply(x)
>>> print(y)
tanh_apply_output
>>> isinstance(y, theano.Variable)
True
This is clearly an artificial example, as this seems like a complicated way of writing y = tensor.tanh(x). To see why Blocks is useful, consider a very common task when building neural networks: Applying a linear transformation (with optional bias) to a vector, and then initializing the weight matrix and bias vector with values drawn from a particular distribution.
>>> from blocks.bricks import Linear
>>> from blocks.initialization import IsotropicGaussian, Constant
>>> linear = Linear(input_dim=10, output_dim=5,
... weights_init=IsotropicGaussian(),
... biases_init=Constant(0.01))
>>> y = linear.apply(x)
So what happened here? We constructed a brick called Linear with a particular configuration: the input dimension (10) and output dimension (5). When we called Linear.apply, the brick automatically constructed the shared Theano variables needed to store its parameters. In the lifecycle of a brick we refer to this as allocation.
>>> linear.parameters
[W, b]
>>> linear.parameters[1].get_value()
array([ nan, nan, nan, nan, nan])
By default, all our parameters are set to NaN. To initialize them, simply call the Brick.initialize() method. This is the last step in the brick lifecycle: initialization.
>>> linear.initialize()
>>> linear.parameters[1].get_value()
array([ 0.01, 0.01, 0.01, 0.01, 0.01])
Keep in mind that at the end of the day, bricks just help you construct a Theano computational graph, so it is possible to mix in regular Theano statements when building models. (However, you might miss out on some of the niftier features of Blocks, such as variable annotation.)
>>> z = tensor.max(y + 4)
Lazy initialization¶
In the example above we configured the Linear brick when we constructed it. We specified input and output dimensions, and specified the way in which weight matrices should be initialized. But consider the following case, which is quite common: We want to take the output of one model, and feed it as an input to another model, but the output and input dimensions don’t match, so we will need to add a linear transformation in the middle.
To support this use case, bricks allow for lazy initialization, which is turned on by default. This means that you can create a brick without configuring it fully (or at all):
>>> linear2 = Linear(output_dim=10)
>>> print(linear2.input_dim)
NoneAllocation
Of course, as long as the brick is not configured, we cannot actually apply it!
>>> linear2.apply(x)
Traceback (most recent call last):
...
ValueError: allocation config not set: input_dim
We can now easily configure our brick based on other bricks.
>>> linear2.input_dim = linear.output_dim
>>> linear2.apply(x)
linear_apply_output
In the examples so far, the allocation of the parameters has always happened implicitly when calling the apply methods, but it can also be called explicitly. Consider the following example:
>>> linear3 = Linear(input_dim=10, output_dim=5)
>>> linear3.parameters
Traceback (most recent call last):
...
AttributeError: 'Linear' object has no attribute 'parameters'
>>> linear3.allocate()
>>> linear3.parameters
[W, b]
Nested bricks¶
Many neural network models, especially more complex ones, can be considered hierarchical structures. Even a simple multi-layer perceptron consists of layers, which in turn consist of a linear transformation followed by a non-linear transformation.
As such, bricks can have children. Parent bricks are able to configure their children, to e.g. make sure their configurations are compatible, or have sensible defaults for a particular use case.
>>> from blocks.bricks import MLP, Logistic
>>> mlp = MLP(activations=[Logistic(name='sigmoid_0'),
... Logistic(name='sigmoid_1')], dims=[16, 8, 4],
... weights_init=IsotropicGaussian(), biases_init=Constant(0.01))
>>> [child.name for child in mlp.children]
['linear_0', 'sigmoid_0', 'linear_1', 'sigmoid_1']
>>> y = mlp.apply(x)
>>> mlp.children[0].input_dim
16
We can see that the MLP brick automatically constructed two child bricks to perform the linear transformations. When we applied the MLP to x, it automatically configured the input and output dimensions of its children. Likewise, when we call Brick.initialize(), it automatically pushes the weight matrix and biases initialization configuration to its children.
>>> mlp.initialize()
>>> mlp.children[1].parameters[0].get_value()
array([[-0.38312393, -1.7718271 , 0.78074479, -0.74750996],
...
[ 1.32390416, -0.56375355, -0.24268186, -2.06008577]])
There are cases where we want to override the way the parent brick configured its children, for example when we want to initialize the weights of the first layer in an MLP slightly differently from the others. In order to do so, we need to have a closer look at the life cycle of a brick. In the first two sections we already talked about the three stages in the life cycle of a brick:
- Construction of the brick
- Allocation of its parameters
- Initialization of its parameters
When dealing with children, the life cycle actually becomes a bit more complicated. (The full life cycle is documented as part of the Brick class.) Before allocating or initializing parameters, the parent brick calls its Brick.push_allocation_config() and Brick.push_initialization_config() methods, which configure the children. If you want to override the child configuration, you will need to call these methods manually, after which you can override the child bricks’ configuration.
>>> mlp = MLP(activations=[Logistic(name='sigmoid_0'),
... Logistic(name='sigmoid_1')], dims=[16, 8, 4],
... weights_init=IsotropicGaussian(), biases_init=Constant(0.01))
>>> y = mlp.apply(x)
>>> mlp.push_initialization_config()
>>> mlp.children[0].weights_init = Constant(0.01)
>>> mlp.initialize()
>>> mlp.children[0].parameters[0].get_value()
array([[ 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
...
[ 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]])
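The order of operations matters here: push first, then override, then initialize. The toy classes below sketch that logic in plain Python. ToyParent and ToyChild are hypothetical. In particular, the pushed flag that stops initialize from pushing a second time is an assumption of this sketch, not a statement about Blocks internals.

```python
class ToyChild:
    """Hypothetical child brick holding one initialization setting."""

    def __init__(self, name):
        self.name = name
        self.weights_init = None
        self.value = None

    def initialize(self):
        self.value = self.weights_init

class ToyParent:
    """Hypothetical parent that pushes its configuration to its children."""

    def __init__(self, children, weights_init):
        self.children = children
        self.weights_init = weights_init
        self.pushed = False

    def push_initialization_config(self):
        for child in self.children:
            child.weights_init = self.weights_init
        self.pushed = True

    def initialize(self):
        # Push only if it has not been done by hand already; otherwise
        # manual overrides on the children would be overwritten.
        if not self.pushed:
            self.push_initialization_config()
        for child in self.children:
            child.initialize()

parent = ToyParent([ToyChild("linear_0"), ToyChild("linear_1")], "gaussian")
parent.push_initialization_config()
parent.children[0].weights_init = "constant"   # override a single child
parent.initialize()
values = [child.value for child in parent.children]   # ['constant', 'gaussian']
```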
Managing the computation graph¶
Theano constructs computation graphs of mathematical expressions. Bricks help you build these graphs, but they do more than that. When you apply a brick to a Theano variable, it automatically annotates this Theano variable, in two ways:
- It defines the role this variable plays in the computation graph e.g. it will label weight matrices and biases as parameters, keep track of which variables were the in- and outputs of your bricks, and more.
- It constructs auxiliary variables. These are variables which are not outputs of your brick, but might still be of interest. For example, if you are training a neural network, you might be interested to know the norm of your weight matrices, so Blocks attaches these as auxiliary variables to the graph.
Using annotations¶
The ComputationGraph class provides an interface to this annotated graph. For example, let’s say we want to train an autoencoder using weight decay on some of the layers.
>>> from theano import tensor
>>> x = tensor.matrix('features')
>>> from blocks.bricks import MLP, Logistic, Rectifier
>>> from blocks.initialization import IsotropicGaussian, Constant
>>> mlp = MLP(activations=[Rectifier()] * 2 + [Logistic()],
... dims=[784, 256, 128, 784],
... weights_init=IsotropicGaussian(), biases_init=Constant(0.01))
>>> y_hat = mlp.apply(x)
>>> from blocks.bricks.cost import BinaryCrossEntropy
>>> cost = BinaryCrossEntropy().apply(x, y_hat)
Our Theano computation graph is now defined by our loss, cost. We initialize the managed graph.
>>> from blocks.graph import ComputationGraph
>>> cg = ComputationGraph(cost)
We will find that there are many variables in this graph.
>>> print(cg.variables)
[TensorConstant{0}, b, W_norm, b_norm, features, TensorConstant{1.0}, ...]
To apply weight decay, we only need the weight matrices. These have been tagged with the WEIGHT role. So let’s create a filter that finds these for us.
>>> from blocks.filter import VariableFilter
>>> from blocks.roles import WEIGHT
>>> print(VariableFilter(roles=[WEIGHT])(cg.variables))
[W, W, W]
Note that the variables in cg.variables are ordered according to the topological order of their apply nodes. This means that for a feedforward network the parameters will be returned in the order of our layers.
But let’s imagine for a second that we are actually dealing with a far more complicated network, and we want to apply weight decay to the parameters of one layer in particular. To do that, we can filter the variables by the bricks that created them.
>>> second_layer = mlp.linear_transformations[1]
>>> from blocks.roles import PARAMETER
>>> var_filter = VariableFilter(roles=[PARAMETER], bricks=[second_layer])
>>> print(var_filter(cg.variables))
[b, W]
Note
There are a variety of different roles that you can filter by. You might have noted already that there is a hierarchy to many of them: Filtering by PARAMETER will also return variables of the child roles WEIGHT and BIAS.
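One way to picture such a hierarchy is with ordinary Python subclassing. The sketch below is an illustration only: the class names and filter_by_role are hypothetical, and this is not how Blocks represents roles internally.

```python
class Parameter:
    """Hypothetical parent role."""

class Weight(Parameter):
    """Child role: weight matrices."""

class Bias(Parameter):
    """Child role: bias vectors."""

class Auxiliary:
    """Unrelated role, e.g. for monitored norms."""

def filter_by_role(variables, role):
    """Keep the names whose role is `role` or one of its child roles."""
    return [name for name, variable_role in variables
            if issubclass(variable_role, role)]

variables = [("W", Weight), ("b", Bias), ("W_norm", Auxiliary)]
parameters = filter_by_role(variables, Parameter)   # ['W', 'b']
weights = filter_by_role(variables, Weight)         # ['W']
```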
We can also see what auxiliary variables our bricks have created. These might be of interest to monitor during training, for example.
>>> print(cg.auxiliary_variables)
[W_norm, b_norm, W_norm, b_norm, W_norm, b_norm]
Live plotting¶
Note
The live plotting functionality is part of blocks-extras, which must be separately installed.
Plots often give a clearer image of your training progress than textual logs. This is why Blocks has a Plot extension which allows you to plot the entries from the log that you are interested in.
We use Bokeh, an interactive visualization library, to perform the plotting. More specifically, we use the Bokeh Plot Server. This is basically a light web server to which Blocks can send data, which then gets displayed in live plots in your browser. The advantage of this approach is that you can even monitor your models’ training progress over a network.
First, make sure that you installed the necessary requirements (see the installation instructions). To start the server type
$ bokeh-server
This will start a server that is accessible on your computer at http://localhost:5006. If you want to make sure that you can access your plots across a network (or the internet), you can listen on all IP addresses using
$ bokeh-server --ip 0.0.0.0
Now that your plotting server is up and running, start your main loop and pass the Plot extension. Consider this example of fitting the function \(f(x) = x^a\) to \(f(x) = x^2\).
>>> import theano
>>> a = theano.shared(3.)
>>> a.name = 'a'
>>> x = theano.tensor.scalar('data')
>>> cost = abs(x ** 2 - x ** a)
>>> cost.name = 'cost'
We train on 150 random points in \([0, 1]\).
>>> import numpy
>>> from fuel.streams import DataStream
>>> from fuel.datasets import IterableDataset
>>> data_stream = DataStream(IterableDataset(
... numpy.random.rand(150).astype(theano.config.floatX)))
Now let’s train with gradient descent and plot the results.
>>> from blocks.main_loop import MainLoop
>>> from blocks.algorithms import GradientDescent, Scale
>>> from blocks.extensions import FinishAfter
>>> from blocks.extensions.monitoring import TrainingDataMonitoring
>>> from blocks.extras.extensions.plot import Plot
>>> main_loop = MainLoop(
...     model=None, data_stream=data_stream,
...     algorithm=GradientDescent(cost=cost,
...                               parameters=[a],
...                               step_rule=Scale(learning_rate=0.1)),
...     extensions=[FinishAfter(after_n_epochs=1),
...                 TrainingDataMonitoring([cost, a], after_batch=True),
...                 Plot('Plotting example', channels=[['cost'], ['a']],
...                      after_batch=True)])
>>> main_loop.run()
Tip
If you want to plot channels in the same figure, pass them as part of the same list. For example, [['cost', 'a']] would have plotted a single figure with both the cost and the estimate of the exponent.
Open up your browser and go to http://localhost:5006 to see your model cost go down in real-time!
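If you have no plot server at hand, the optimization behind this example can be sketched without Blocks, Theano or Bokeh. The plain-Python loop below is an approximation under stated assumptions: the gradient is derived by hand, the seed is arbitrary, and the points are drawn from \((0, 1]\) so that the logarithm stays finite. It records the values of a that the second plot would display.

```python
import math
import random

def cost_gradient(a, x):
    """Derivative of |x**2 - x**a| with respect to a, for x > 0."""
    difference = x ** 2 - x ** a
    sign = (difference > 0) - (difference < 0)
    # d/da of x**a is x**a * ln(x); the minus sign comes from -x**a.
    return -sign * (x ** a) * math.log(x)

random.seed(0)   # arbitrary seed so the run is repeatable
a = 3.0
history = [a]
for _ in range(150):
    x = 1.0 - random.random()        # uniform in (0, 1], keeps log(x) finite
    a -= 0.1 * cost_gradient(a, x)   # same learning rate as in the example
    history.append(a)
```

Starting from 3, the estimate should drift down towards 2. Once it gets there, it can only move in small steps around that value.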


In-depth¶
Recurrent neural networks¶
Warning
This section is very much work in progress!
This tutorial explains recurrent bricks in Blocks. Readers unfamiliar with bricks should start with the bricks overview first and continue with this tutorial afterwards.
Quickstart example¶
As a starting example, we’ll be building an RNN which accumulates the input it receives (figure above). The equation describing that RNN is
\[\mathbf{h}_t = \mathbf{h}_{t-1} + \mathbf{x}_t\]
>>> import numpy
>>> import theano
>>> from theano import tensor
>>> from blocks import initialization
>>> from blocks.bricks import Identity
>>> from blocks.bricks.recurrent import SimpleRecurrent
>>> x = tensor.tensor3('x')
>>> rnn = SimpleRecurrent(
... dim=3, activation=Identity(), weights_init=initialization.Identity())
>>> rnn.initialize()
>>> h = rnn.apply(x)
>>> f = theano.function([x], h)
>>> print(f(numpy.ones((3, 1, 3), dtype=theano.config.floatX)))
[[[ 1. 1. 1.]]
[[ 2. 2. 2.]]
[[ 3. 3. 3.]]]...
Let’s modify that example so that the RNN accumulates two times the input it receives (figure below).
The equation for the RNN is
\[\mathbf{h}_t = \mathbf{h}_{t-1} + 2 \cdot \mathbf{x}_t\]
>>> from blocks.bricks import Linear
>>> doubler = Linear(
... input_dim=3, output_dim=3, weights_init=initialization.Identity(2),
... biases_init=initialization.Constant(0))
>>> doubler.initialize()
>>> h_doubler = rnn.apply(doubler.apply(x))
>>> f = theano.function([x], h_doubler)
>>> print(f(numpy.ones((3, 1, 3), dtype=theano.config.floatX)))
[[[ 2. 2. 2.]]
[[ 4. 4. 4.]]
[[ 6. 6. 6.]]]...
Note that in order to double the input we had to apply a bricks.Linear brick to x, even though
\[\mathbf{h}_t = f(\mathbf{V}\mathbf{h}_{t-1} + \mathbf{W}\mathbf{x}_t + \mathbf{b})\]
is what is usually thought of as the RNN equation. The reason why recurrent bricks work that way is that it allows greater flexibility and modularity: \(\mathbf{W}\mathbf{x}_t\) can be replaced by a whole neural network if we want.
Initial states¶
Recurrent models all have in common that their initial state has to be specified. However, in constructing our toy examples, we omitted to pass \(\mathbf{h}_0\) when applying the recurrent brick. What happened?
It turns out that recurrent bricks set that initial state to zero if it’s not passed as argument, which is a good sane default in most cases, but we can just as well set it explicitly.
We will modify the starting example so that it accumulates the input it receives, but starting from one instead of zero (figure above):
>>> h0 = tensor.matrix('h0')
>>> h = rnn.apply(inputs=x, states=h0)
>>> f = theano.function([x, h0], h)
>>> print(f(numpy.ones((3, 1, 3), dtype=theano.config.floatX),
... numpy.ones((1, 3), dtype=theano.config.floatX)))
[[[ 2. 2. 2.]]
[[ 3. 3. 3.]]
[[ 4. 4. 4.]]]...
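The three toy examples so far all follow the same recurrence, which can be written as a short loop in plain Python. This is a sketch for scalars only, not Blocks code. The function accumulate and its transform argument, which stands in for whatever is applied to the input before the recurrent brick, are hypothetical names.

```python
def accumulate(inputs, initial_state=0.0, transform=lambda x: x):
    """Run h_t = h_{t-1} + transform(x_t) and return every state."""
    states = []
    h = initial_state
    for x in inputs:
        h = h + transform(x)
        states.append(h)
    return states

ones = [1.0, 1.0, 1.0]
plain = accumulate(ones)                               # [1.0, 2.0, 3.0]
doubled = accumulate(ones, transform=lambda x: 2 * x)  # [2.0, 4.0, 6.0]
from_one = accumulate(ones, initial_state=1.0)         # [2.0, 3.0, 4.0]
```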
Reverse¶
Todo
Say something about the reverse argument
Getting initial states back¶
Todo
Say something about the return_initial_states argument
Iterate (or not)¶
The apply method of a recurrent brick accepts an iterate argument, which defaults to True. Setting it to False causes the apply method to compute only one step in the sequence.
This is very useful when you’re trying to combine multiple recurrent layers in a network.
Imagine you’d like to build a network with two recurrent layers. The second layer accumulates the output of the first layer, while the first layer accumulates the input of the network and the output of the second layer (see figure below).
Here’s how you can create a recurrent brick that encapsulates the two layers:
>>> from blocks.bricks.recurrent import BaseRecurrent, recurrent
>>> class FeedbackRNN(BaseRecurrent):
...     def __init__(self, dim, **kwargs):
...         super(FeedbackRNN, self).__init__(**kwargs)
...         self.dim = dim
...         self.first_recurrent_layer = SimpleRecurrent(
...             dim=self.dim, activation=Identity(), name='first_recurrent_layer',
...             weights_init=initialization.Identity())
...         self.second_recurrent_layer = SimpleRecurrent(
...             dim=self.dim, activation=Identity(), name='second_recurrent_layer',
...             weights_init=initialization.Identity())
...         self.children = [self.first_recurrent_layer,
...                          self.second_recurrent_layer]
...
...     @recurrent(sequences=['inputs'], contexts=[],
...                states=['first_states', 'second_states'],
...                outputs=['first_states', 'second_states'])
...     def apply(self, inputs, first_states=None, second_states=None):
...         first_h = self.first_recurrent_layer.apply(
...             inputs=inputs, states=first_states + second_states, iterate=False)
...         second_h = self.second_recurrent_layer.apply(
...             inputs=first_h, states=second_states, iterate=False)
...         return first_h, second_h
...
...     def get_dim(self, name):
...         return (self.dim if name in ('inputs', 'first_states', 'second_states')
...                 else super(FeedbackRNN, self).get_dim(name))
...
>>> x = tensor.tensor3('x')
>>> feedback = FeedbackRNN(dim=3)
>>> feedback.initialize()
>>> first_h, second_h = feedback.apply(inputs=x)
>>> f = theano.function([x], [first_h, second_h])
>>> for states in f(numpy.ones((3, 1, 3), dtype=theano.config.floatX)):
...     print(states)
[[[ 1. 1. 1.]]
[[ 3. 3. 3.]]
[[ 8. 8. 8.]]]
[[[ 1. 1. 1.]]
[[ 4. 4. 4.]]
[[ 12. 12. 12.]]]...
There are a lot of things going on here!
We defined a recurrent brick class called FeedbackRNN whose constructor initializes two bricks.recurrent.SimpleRecurrent bricks as its children.
The class has a get_dim method whose purpose is to tell the dimensionality of each input to the brick’s apply method.
The core of the class resides in its apply method. The @recurrent decorator is used to specify which of the arguments to the method are sequences to iterate over, what is returned when the method is called and which of those returned values correspond to recurrent states. Its relationship with the inputs and outputs arguments to the @application decorator is as follows:
- outputs, like in @application, defines everything that’s returned by apply, including recurrent outputs
- states is a subset of outputs that corresponds to recurrent outputs, which means that the union of sequences and states forms what would be inputs in @application
Notice how no call to theano.scan() is being made. This is because the implementation of apply is responsible for computing one time step of the recurrent application of the brick. It takes states at time \(t - 1\) and inputs at time \(t\) and produces the output for time \(t\). The rest is all handled by the @recurrent decorator behind the scenes.
This is why the iterate argument of the apply method is so useful: it lets you combine multiple recurrent brick applications within another apply implementation.
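To make the single-time-step view concrete, here is a minimal NumPy sketch of the same recurrence, unrolled by hand. It is not how Blocks or theano.scan() execute the brick; the function name feedback_rnn and the explicit identity matrices W1 and W2 are stand-ins chosen for this illustration, and the initial states are assumed to be zero.

```python
import numpy as np


def feedback_rnn(inputs):
    """Unroll the two-layer feedback recurrence with identity weights."""
    dim = inputs.shape[-1]
    W1 = np.eye(dim)  # stand-in for the first layer's identity-initialized weights
    W2 = np.eye(dim)  # stand-in for the second layer's identity-initialized weights
    first = np.zeros(inputs.shape[1:])   # assumed initial state
    second = np.zeros(inputs.shape[1:])  # assumed initial state
    first_states, second_states = [], []
    for x_t in inputs:  # one loop iteration corresponds to one call of apply
        # The first layer receives the input and the sum of both previous states.
        first = x_t + (first + second).dot(W1)
        # The second layer accumulates the first layer's new output.
        second = first + second.dot(W2)
        first_states.append(first)
        second_states.append(second)
    return np.stack(first_states), np.stack(second_states)


first_h, second_h = feedback_rnn(np.ones((3, 1, 3)))
```

With an input of ones, the first layer's states come out as 1, 3, 8 and the second layer's as 1, 4, 12, matching the values printed above.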
Tip
When looking at a recurrent brick’s documentation, keep in mind that the parameters to its apply method are explained in terms of a single iteration, i.e. with the assumption that iterate = False.
Configuration¶
Blocks allows module-wide configuration values to be set using a YAML configuration file and environment variables. Environment variables override the configuration file, which in turn overrides the defaults.
The configuration is read from ~/.blocksrc if it exists. A custom configuration file can be used by setting the BLOCKS_CONFIG environment variable. A configuration file is of the form:
data_path: /home/user/datasets
If a setting is not configured and does not provide a default, a ConfigurationError is raised when it is accessed.
Configuration values can be accessed as attributes of blocks.config.config.
>>> from blocks.config import config
>>> print(config.default_seed)
1
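The lookup order described above can be sketched in plain Python. This is only an illustration of the precedence rule, not the code in blocks.config: resolve_setting and its arguments are hypothetical, the exception class is a local stand-in, and no type conversion of environment values is attempted.

```python
import os


class ConfigurationError(Exception):
    """Local stand-in for blocks.config.ConfigurationError."""


def resolve_setting(name, file_values, defaults, env_var=None, environ=None):
    """Return a setting: environment first, then config file, then default."""
    environ = os.environ if environ is None else environ
    if env_var is not None and env_var in environ:
        return environ[env_var]      # environment variables win
    if name in file_values:
        return file_values[name]     # then the configuration file
    if name in defaults:
        return defaults[name]        # then the built-in default
    raise ConfigurationError("setting {!r} is not configured".format(name))


defaults = {"default_seed": 1}
file_values = {"data_path": "/home/user/datasets", "sqlite_database": "a.db"}
fake_environ = {"BLOCKS_SQLITEDB": "b.db"}  # passed explicitly for the example

seed = resolve_setting("default_seed", file_values, defaults)
database = resolve_setting("sqlite_database", file_values, defaults,
                           env_var="BLOCKS_SQLITEDB", environ=fake_environ)
```

Here seed falls through to the default, while database is taken from the (simulated) environment even though the file also defines it.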
The following configurations are supported:
- default_seed¶
The seed used when initializing random number generators (RNGs) such as NumPy RandomState objects as well as Theano’s MRG_RandomStreams objects. Must be an integer. By default this is set to 1.
- recursion_limit¶
The maximum recursion depth used in MainLoop as well as in other situations when deep recursion is required. The most notable example of such a situation is pickling or unpickling a complex structure with lots of objects, such as a big Theano computation graph.
- profile, BLOCKS_PROFILE¶
A boolean value which determines whether to print profiling information at the end of a call to MainLoop.run().
- log_backend¶
The backend to use for logging experiments. Defaults to python, which stores the log as a Python object in memory. The other option is sqlite.
- sqlite_database, BLOCKS_SQLITEDB¶
The SQLite database file to use.
- max_blob_size¶
The maximum size of an object to store in an SQLite database in bytes. Objects beyond this size will trigger a warning. Defaults to 4 kilobytes.
- temp_dir¶
The directory in which Blocks will create temporary files. If unspecified, the platform-dependent default chosen by the Python tempfile module is used.
- class blocks.config.ConfigurationError¶
Bases: exceptions.Exception
Error raised when a configuration value is requested but not set.
Serialization¶
The ability to save models and their training progress is important for two reasons:
- Neural nets can take days or even weeks to train. If training is interrupted during this time, it is important that we can continue from where we left off.
- We need the ability to save models in order to share them with others or save them for later use or inspection.
These two goals come with differing requirements, which is why Blocks implements a custom serialization approach that tries to meet both needs in the dump() and load() functions.
Pickling the training loop¶
Warning
Due to the complexity of serializing Python objects as large as the main loop, (un)pickling will sometimes fail because it exceeds the default maximum recursion depth set in Python. Increasing the limit should fix the problem.
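One way to raise the limit only for the duration of a (un)pickling call is a small context manager around the standard sys.setrecursionlimit(). This is a sketch using only the Python standard library; the helper name recursion_limit and the value 10000 are arbitrary choices for the example, not Blocks settings.

```python
import contextlib
import pickle
import sys


@contextlib.contextmanager
def recursion_limit(limit):
    """Temporarily raise Python's recursion limit, restoring it afterwards."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old_limit, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(old_limit)


# A nested structure standing in for a large object graph.
nested = []
for _ in range(50):
    nested = [nested]

limit_before = sys.getrecursionlimit()
with recursion_limit(10000):
    limit_inside = sys.getrecursionlimit()
    restored = pickle.loads(pickle.dumps(nested))
limit_after = sys.getrecursionlimit()
```

Restoring the old limit afterwards keeps the rest of the program protected against runaway recursion.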
When checkpointing, Blocks pickles the entire main loop, effectively serializing the exact state of the model as well as the training state (iteration state, extensions, etc.). Technically there are some difficulties with this approach:
- Some Python objects cannot be pickled, e.g. file handles, generators, dynamically generated classes, nested classes, etc.
- The pickling of Theano objects can be problematic.
- We do not want to serialize the training data kept in memory, since this can be prohibitively large.
Blocks addresses these problems by avoiding certain data structures such as generators and nested classes (see the developer guidelines) and overriding the pickling behaviour of some objects, making the pickling of the main loop possible.
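As a general illustration of what overriding pickling behaviour can look like in Python, the sketch below uses the standard __getstate__ hook to leave out attributes that either cannot be pickled or should not be stored. The Trainer class and its attributes are hypothetical and are not taken from Blocks.

```python
import pickle


class Trainer(object):
    """Hypothetical object holding parameters plus state that should not be saved."""

    def __init__(self, parameters, training_data):
        self.parameters = parameters
        self.training_data = training_data       # potentially very large
        self.progress = (i for i in range(3))    # generators cannot be pickled

    def __getstate__(self):
        # Copy the instance dictionary and blank out what must not be serialized.
        state = self.__dict__.copy()
        state["training_data"] = None
        state["progress"] = None
        return state


trainer = Trainer({"W": [0.1, 0.2]}, training_data=list(range(1000)))
restored = pickle.loads(pickle.dumps(trainer))
```

Without the __getstate__ override, pickle.dumps(trainer) would fail on the generator; with it, the parameters survive the round trip and the dropped attributes come back as None.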
However, pickling can be problematic for long-term storage of models, because
- Unpickling depends on the libraries used being unchanged. This means that if you updated Blocks, Theano, etc. to a new version where the interface has changed, loading your training progress could fail.
- The unpickling of Theano objects can be problematic, especially when transferring from GPU to CPU or vice versa.
- It is not possible on Python 2 to unpickle objects that were pickled in Python 3.
Parameter saving¶
This is why Blocks intercepts the pickling of all Theano shared variables (which includes the parameters), and stores them as separate NPY files. The resulting file is a ZIP archive that contains the pickled main loop as well as a collection of NumPy arrays. The NumPy arrays (and hence parameters) in the ZIP file can be read, across platforms, using the numpy.load() function, making it possible to inspect and load parameter values, even if the unpickling of the main loop fails.
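The NPY-in-ZIP layout can be tried out with NumPy alone. The sketch below does not call Blocks’ dump() or load(); it uses numpy.savez() to build a comparable archive in memory, with made-up parameter names W and b, and then reads it back with zipfile and numpy.load().

```python
import io
import zipfile

import numpy as np

# Made-up parameters standing in for a model's shared variables.
parameters = {"W": np.arange(6.0).reshape(2, 3), "b": np.zeros(3)}

# numpy.savez writes a ZIP archive with one NPY file per array.
archive = io.BytesIO()
np.savez(archive, **parameters)

# The archive is an ordinary ZIP file, so its members can be listed.
archive.seek(0)
with zipfile.ZipFile(archive) as zip_file:
    member_names = sorted(zip_file.namelist())

# numpy.load gives dictionary-style access to the stored arrays.
archive.seek(0)
loaded = np.load(archive)
W = loaded["W"]
```

Because each array is stored as a separate NPY member, the values can be inspected without unpickling anything else that might be stored alongside them.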
API Reference¶
Warning
This API reference is currently nothing but a dump of docstrings, ordered alphabetically.
The API reference contains detailed descriptions of the different end-user classes, functions, methods, etc. you will need to work with Blocks.
Note
This API reference only contains end-user documentation. If you are looking to hack away at Blocks’ internals, you will find more detailed comments in the source code.
Algorithms¶
- class blocks.algorithms.AdaDelta(decay_rate=0.95, epsilon=1e-06)¶
Bases: blocks.algorithms.StepRule
Adapts the step size over time using only first order information.
Parameters: - decay_rate (float, optional) – Decay rate in [0, 1]. Defaults to 0.95.
- epsilon (float, optional) – Stabilizing constant for RMS. Defaults to 1e-6.
Notes
For more information, see [ADADELTA].
[ADADELTA] Matthew D. Zeiler, ADADELTA: An Adaptive Learning Rate Method, arXiv:1212.5701.
- compute_step(parameter, previous_step)¶
- class blocks.algorithms.AdaGrad(learning_rate=0.002, epsilon=1e-06)¶
Bases: blocks.algorithms.StepRule
Implements the AdaGrad learning rule.
Parameters: - learning_rate (float, optional) – Step size. Default value is set to 0.002.
- epsilon (float, optional) – Stabilizing constant for one over root of sum of squares. Defaults to 1e-6.
Notes
For more information, see [ADAGRAD].
[ADAGRAD] Duchi J, Hazan E, Singer Y., Adaptive subgradient methods for online learning and stochastic optimization.
- compute_step(parameter, previous_step)¶
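A minimal NumPy sketch of the AdaGrad rule may help in reading the two parameters. It follows the textbook formulation (divide by the root of the accumulated squared gradients plus epsilon) and is an assumption about how the Theano expression is laid out, not a copy of the Blocks implementation; adagrad_step is a name invented for this example.

```python
import numpy as np


def adagrad_step(gradient, sum_of_squares, learning_rate=0.002, epsilon=1e-6):
    """Return the step to subtract and the updated accumulator."""
    sum_of_squares = sum_of_squares + gradient ** 2
    step = learning_rate * gradient / (np.sqrt(sum_of_squares) + epsilon)
    return step, sum_of_squares


gradient = np.array([0.5, -2.0])
accumulator = np.zeros_like(gradient)
step1, accumulator = adagrad_step(gradient, accumulator)
step2, accumulator = adagrad_step(gradient, accumulator)
```

Feeding the same gradient twice shows the characteristic behaviour: the second step is smaller than the first because the accumulator has grown.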
- class blocks.algorithms.Adam(learning_rate=0.002, beta1=0.1, beta2=0.001, epsilon=1e-08, decay_factor=0.99999999)¶
Bases: blocks.algorithms.StepRule
Adam optimizer as described in [King2014].
[King2014] Diederik Kingma, Jimmy Ba, Adam: A Method for Stochastic Optimization, http://arxiv.org/abs/1412.6980
Parameters: - learning_rate (float, optional) – Step size. Default value is set to 0.002.
- beta1 (float, optional) – Exponential decay rate for the first moment estimates. Default value is set to 0.1.
- beta2 (float, optional) – Exponential decay rate for the second moment estimates. Default value is set to 0.001.
- epsilon (float, optional) – Default value is set to 1e-8.
- decay_factor (float, optional) – Default value is set to 1 - 1e-8.
- compute_step(parameter, previous_step)¶
- class blocks.algorithms.BasicMomentum(momentum=0.0)¶
Bases: blocks.algorithms.StepRule
Accumulates step with exponential discount.
Parameters: momentum (float, optional) – The momentum coefficient. Defaults to 0.
Notes
This step rule is intended to be used in conjunction with another step rule, e.g. Scale. For an all-batteries-included experience, look at Momentum.
- compute_step(parameter, previous_step)¶
- class blocks.algorithms.BasicRMSProp(decay_rate=0.9, max_scaling=100000.0)¶
Bases: blocks.algorithms.StepRule
Scales the step size by a running average of the recent step norms.
Parameters: - decay_rate (float, optional) – How fast the running average decays, value in [0, 1] (lower is faster). Defaults to 0.9.
- max_scaling (float, optional) – Maximum scaling of the step size, in case the running average is really small. Needs to be greater than 0. Defaults to 1e5.
Notes
This step rule is intended to be used in conjunction with another step rule, e.g. Scale. For an all-batteries-included experience, look at RMSProp.
In general, this step rule should be used before other step rules, because it has normalization properties that may undo their work. For instance, it should be applied first when used in conjunction with Scale.
For more information, see [Hint2014].
- compute_step(parameter, previous_step)¶
- class blocks.algorithms.CompositeRule(components)¶
Bases: blocks.algorithms.StepRule
Chains several step rules.
Parameters: components (list of StepRule) – The learning rules to be chained. The rules will be applied in the order as given.
- compute_steps(previous_steps)¶
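The effect of chaining rules in a given order can be sketched with plain Python functions. The example below is a simplification: real step rules build Theano expressions and also return updates, whereas here each rule is just a function from a dictionary of steps to a new dictionary. The names scale, clip and composite are invented for the illustration, and clip is an element-wise rule that does not correspond to a Blocks class.

```python
from collections import OrderedDict


def scale(learning_rate):
    """Rule that multiplies every step by a constant."""
    def rule(previous_steps):
        return OrderedDict((p, learning_rate * s)
                           for p, s in previous_steps.items())
    return rule


def clip(limit):
    """Hypothetical rule that clamps every (scalar) step to [-limit, limit]."""
    def rule(previous_steps):
        return OrderedDict((p, max(-limit, min(limit, s)))
                           for p, s in previous_steps.items())
    return rule


def composite(components):
    """Apply the component rules one after another, in the order given."""
    def rule(previous_steps):
        steps = previous_steps
        for component in components:
            steps = component(steps)
        return steps
    return rule


gradients = OrderedDict([("W", 4.0), ("b", -0.5)])
clip_then_scale = composite([clip(1.0), scale(0.1)])(gradients)
scale_then_clip = composite([scale(0.1), clip(1.0)])(gradients)
```

The two orderings give different steps for W (0.1 versus 0.4), which is why the order of the components matters.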
- class blocks.algorithms.DifferentiableCostMinimizer(cost, parameters)¶
Bases: blocks.algorithms.TrainingAlgorithm
Minimizes a differentiable cost given as a Theano expression.
Very often the goal of training is to minimize the expected value of a Theano expression. Batch processing in this case typically consists of running one (or a few) Theano functions. DifferentiableCostMinimizer is the base class for such algorithms.
Parameters: - cost (TensorVariable) – The objective to be minimized.
- parameters (list of TensorSharedVariable) – The parameters to be tuned.
- updates¶
list of TensorSharedVariable updates
Updates to be done for every batch. It is required that the updates are done using the old values of optimized parameters.
- cost¶
TensorVariable
The objective to be minimized.
- parameters¶
list of TensorSharedVariable
The parameters to be tuned.
Notes
Changing the updates attribute or calling add_updates after the initialize method has been called will have no effect.
Todo
Some shared variables are not parameters (e.g. those created by random streams).
Todo
Due to the rather immature state of the ComputationGraph class, parameters used only inside scans are currently not fetched.
- add_updates(updates)¶
Add updates to the training process.
The updates will be done before the parameters are changed.
Parameters: updates (list of tuples or OrderedDict) – The updates to add.
- inputs¶
Return inputs of the cost computation graph.
Returns: inputs – Inputs to this graph. Return type: list of TensorVariable
- updates
- class blocks.algorithms.GradientDescent(step_rule=None, gradients=None, known_grads=None, consider_constant=None, on_unused_sources='raise', theano_func_kwargs=None, **kwargs)¶
Bases: blocks.algorithms.DifferentiableCostMinimizer
A base class for all gradient descent algorithms.
By “gradient descent” we mean a training algorithm of the following form:
for batch in data:
    steps = step_rule.compute_steps(parameters, gradients_wr_parameters)
    for parameter in parameters:
        parameter -= steps[parameter]
Note that the step is subtracted, not added! This is done in order to make step rule chaining possible.
Parameters: - step_rule (instance of StepRule, optional) – An object encapsulating most of the algorithm’s logic. Its compute_steps method is called to get the Theano expressions for the steps. Note that the step rule might have a state, e.g. to remember a weighted sum of gradients from previous steps, as is done in gradient descent with momentum. If None, an instance of Scale is created.
- gradients (dict, optional) – A dictionary mapping a parameter to an expression for the cost’s gradient with respect to the parameter. If None, the gradients are taken automatically using theano.gradient.grad().
- known_grads (dict, optional) – A passthrough to theano.tensor.grad’s known_grads argument. Useful when you know the [approximate] gradients of some sub-expressions and would like Theano to use that information to compute parameter gradients. Only makes sense when gradients is None.
- consider_constant (list, optional) – A passthrough to theano.tensor.grad’s consider_constant argument. A list of expressions through which gradients will not be backpropagated. Only makes sense when gradients is None.
- on_unused_sources (str, one of ‘raise’ (default), ‘ignore’, ‘warn’) – Controls behavior when not all sources are used.
- theano_func_kwargs (dict, optional) – A passthrough to theano.function for additional arguments. Useful for passing profile or mode arguments to the theano function that will be compiled for the algorithm.
- gradients¶
dict
The gradient dictionary.
- initialize()¶
- process_batch(batch)¶
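The training loop described for GradientDescent can be written out in NumPy for a toy problem. This sketch replaces Theano’s symbolic gradients with a hand-written gradient function and uses a constant-learning-rate rule in the spirit of Scale; gradient_descent, scale_rule and the quadratic cost are assumptions made for the example.

```python
import numpy as np


def scale_rule(learning_rate):
    """Step rule that multiplies each gradient by the learning rate."""
    def compute_steps(gradients):
        return {name: learning_rate * g for name, g in gradients.items()}
    return compute_steps


def gradient_descent(parameters, gradient_fn, compute_steps, num_batches):
    """Repeat: compute gradients, turn them into steps, subtract the steps."""
    for _ in range(num_batches):
        steps = compute_steps(gradient_fn(parameters))
        for name in parameters:
            # The step is subtracted, not added.
            parameters[name] = parameters[name] - steps[name]
    return parameters


target = np.array([3.0, -1.0])


def quadratic_gradient(parameters):
    # Gradient of sum((w - target) ** 2) with respect to w.
    return {"w": 2.0 * (parameters["w"] - target)}


trained = gradient_descent({"w": np.zeros(2)}, quadratic_gradient,
                           scale_rule(0.1), num_batches=100)
```

Because every rule returns a quantity that is subtracted, the output of one rule can be fed to the next without keeping track of signs.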
- class blocks.algorithms.Momentum(learning_rate=1.0, momentum=0.0)¶
Bases: blocks.algorithms.CompositeRule
Accumulates step with exponential discount.
Combines BasicMomentum and Scale to form the usual momentum step rule.
Parameters: - learning_rate (float, optional) – The learning rate by which the previous step is scaled. Defaults to 1.
- momentum (float, optional) – The momentum coefficient. Defaults to 0.
- learning_rate¶
SharedVariable
A variable for learning rate.
- momentum¶
SharedVariable
A variable for momentum.
See also
SharedVariableModifier
- class blocks.algorithms.RMSProp(learning_rate=1.0, decay_rate=0.9, max_scaling=100000.0)¶
Bases: blocks.algorithms.CompositeRule
Scales the step size by a running average of the recent step norms.
Combines BasicRMSProp and Scale to form the step rule described in [Hint2014].
[Hint2014] Geoff Hinton, Neural Networks for Machine Learning, lecture 6a, http://cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
Parameters: - learning_rate (float, optional) – The learning rate by which the previous step is scaled. Defaults to 1.
- decay_rate (float, optional) – How fast the running average decays (lower is faster). Defaults to 0.9.
- max_scaling (float, optional) – Maximum scaling of the step size, in case the running average is really small. Defaults to 1e5.
- learning_rate¶
SharedVariable
A variable for learning rate.
- decay_rate¶
SharedVariable
A variable for decay rate.
See also
SharedVariableModifier
- class blocks.algorithms.RemoveNotFinite(scaler=1)¶
Bases: blocks.algorithms.StepRule
A step rule that skips steps with non-finite elements.
Replaces a step (the parameter update of a single shared variable) which contains non-finite elements (such as inf or NaN) with a step rescaling the parameters.
Parameters: scaler (float, optional) – The scaling applied to the parameter in case the step contains non-finite elements. Defaults to 1, which means that parameters will not be changed.
Notes
This rule should be applied last!
This trick was originally used in the GroundHog framework.
- compute_step(parameter, previous_step)¶
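The behaviour described for RemoveNotFinite can be sketched in NumPy. Since steps are subtracted from the parameter, replacing a bad step by (1 - scaler) * parameter leaves the parameter multiplied by scaler; this formulation is inferred from the description above and is an assumption, as is the function name remove_not_finite.

```python
import numpy as np


def remove_not_finite(parameter, previous_step, scaler=1.0):
    """Keep finite steps; replace non-finite ones by a parameter rescaling."""
    if np.all(np.isfinite(previous_step)):
        return previous_step
    # parameter - (1 - scaler) * parameter == scaler * parameter
    return (1.0 - scaler) * parameter


parameter = np.array([2.0, 4.0])
bad_step = np.array([np.nan, 1.0])
good_step = np.array([0.5, 1.0])

unchanged = parameter - remove_not_finite(parameter, bad_step)             # scaler = 1
halved = parameter - remove_not_finite(parameter, bad_step, scaler=0.5)
normal = parameter - remove_not_finite(parameter, good_step)
```

With the default scaler of 1 the replacement step is zero, so a step containing NaN or inf is effectively skipped.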
- class blocks.algorithms.Restrict(step_rule, variables)¶
Bases: blocks.algorithms.StepRule
Applies a given StepRule only to certain variables.
Example applications include clipping steps on only certain parameters, or scaling a certain kind of parameter’s updates (e.g. adding an additional scalar multiplier to the steps taken on convolutional filters).
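Parameters: - step_rule (StepRule) – The step rule to apply to the given variables.
- variables – The variables to which step_rule is applied.
- compute_steps(previous_steps)¶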
- class blocks.algorithms.Scale(learning_rate=1.0)¶
Bases: blocks.algorithms.StepRule
A step in the direction proportional to the previous step.
If used in GradientDescent alone, this step rule implements steepest descent.
Parameters: learning_rate (float) – The learning rate by which the previous step is multiplied to produce the step.
- learning_rate¶
TensorSharedVariable
The shared variable storing the learning rate used.
- compute_step(parameter, previous_step)¶
- class blocks.algorithms.StepClipping(threshold=None)¶
Bases: blocks.algorithms.StepRule
Rescales an entire step if its L2 norm exceeds a threshold.
When the previous steps are the gradients, this step rule performs gradient clipping.
Parameters: threshold (float, optional) – The maximum permitted L2 norm for the step. The step will be rescaled to be not higher than this quantity. If None, no rescaling will be applied.
- threshold¶
tensor.TensorSharedVariable
The shared variable storing the clipping threshold used.
- compute_steps(previous_steps)¶
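A NumPy sketch of rescaling by the joint L2 norm is shown below. It assumes that the norm is taken over the steps of all parameters together, which is what “an entire step” suggests; clip_steps is a name made up for this example and the code is not the Blocks implementation.

```python
import numpy as np


def clip_steps(previous_steps, threshold=None):
    """Rescale all steps jointly so that their overall L2 norm is <= threshold."""
    if threshold is None:
        return dict(previous_steps)  # no rescaling requested
    norm = np.sqrt(sum(np.sum(step ** 2) for step in previous_steps.values()))
    multiplier = min(1.0, threshold / norm) if norm > 0 else 1.0
    return {name: multiplier * step for name, step in previous_steps.items()}


steps = {"W": np.array([3.0, 0.0]), "b": np.array([4.0])}  # joint norm is 5
clipped = clip_steps(steps, threshold=1.0)
untouched = clip_steps(steps, threshold=10.0)
```

All steps are multiplied by the same factor, so the direction of the overall update is preserved and only its length changes.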
- class blocks.algorithms.StepRule¶
Bases: object
A rule to compute steps for a gradient descent algorithm.
- compute_step(parameter, previous_step)¶
Build a Theano expression for the step for a parameter.
This method is called by the default implementation of compute_steps(), which saves you from writing a loop each time.
Parameters: - parameter (TensorSharedVariable) – The parameter.
- previous_step (TensorVariable) – Some quantity related to the gradient of the cost with respect to the parameter, either the gradient itself or a step in a related direction.
Returns: - step (Variable) – Theano variable for the step to take.
- updates (list) – A list of tuples representing updates to be performed. This is useful for stateful rules such as Momentum which need to update shared variables after iterations.
- compute_steps(previous_steps)¶
Build a Theano expression for steps for all parameters.
Override this method if you want to process the steps with respect to all parameters as a whole, not parameter-wise.
Parameters: previous_steps (OrderedDict) – An OrderedDict of (TensorSharedVariable, TensorVariable) pairs. The keys are the parameters being trained, the values are the expressions for quantities related to gradients of the cost with respect to the parameters, either the gradients themselves or steps in related directions.
Returns: - steps (OrderedDict) – A dictionary of the proposed steps in the same form as previous_steps.
- updates (list) – A list of tuples representing updates to be performed.
- class blocks.algorithms.TrainingAlgorithm¶
Bases: object
Base class for training algorithms.
A training algorithm object has a simple life-cycle. First it is initialized by calling its initialize() method. At this stage, for instance, Theano functions can be compiled. After that the process_batch() method is repeatedly called with a batch of training data as a parameter.
- initialize(**kwargs)¶
Initialize the training algorithm.
- class blocks.algorithms.VariableClipping(threshold, axis=None)¶
Bases: blocks.algorithms.StepRule
Clip the maximum norm of individual variables along certain axes.
This StepRule can be used to implement L2 norm constraints on e.g. the weight vectors of individual hidden units, convolutional filters or entire weight tensors. Combine with Restrict (and possibly CompositeRule), to apply such constraints only to certain variables and/or apply different norm constraints to different variables.
Parameters: - threshold (float) – Maximum norm for a given (portion of a) tensor.
- axis (int or iterable, optional) – An integer single axis, or an iterable collection of integer axes over which to sum in order to calculate the L2 norm. If None (the default), the norm is computed over all elements of the tensor.
Notes
Because of the way the StepRule API works, this particular rule implements norm clipping of the value after update in the following way: it computes parameter - previous_step, scales it to have (possibly axes-wise) norm(s) of at most threshold, then subtracts that value from parameter to yield an ‘equivalent step’ that respects the desired norm constraints. This procedure implicitly assumes one is doing simple (stochastic) gradient descent, and so steps computed by this step rule may not make sense for use in other contexts.
Investigations into max-norm regularization date from [Srebro2005]. The first appearance of this technique as a regularization method for the weight vectors of individual hidden units in feed-forward neural networks may be [Hinton2012].
[Srebro2005] Nathan Srebro and Adi Shraibman. “Rank, Trace-Norm and Max-Norm”. 18th Annual Conference on Learning Theory (COLT), June 2005.
[Hinton2012] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov. “Improving neural networks by preventing co-adaptation of feature detectors”. arXiv:1207.0580.
- compute_step(parameter, previous_step)¶
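The “equivalent step” construction from the notes of VariableClipping can be followed step by step in NumPy. The function name variable_clipping_step and the small constant guarding against division by zero are assumptions of this sketch, not part of the documented API.

```python
import numpy as np


def variable_clipping_step(parameter, previous_step, threshold, axis=None):
    """Return a step such that parameter - step has norm(s) of at most threshold."""
    updated = parameter - previous_step                  # value after a plain update
    norms = np.sqrt(np.sum(updated ** 2, axis=axis, keepdims=True))
    # Only shrink slices whose norm exceeds the threshold (guard is an assumption).
    scale = np.minimum(1.0, threshold / np.maximum(norms, 1e-12))
    clipped = updated * scale
    return parameter - clipped                           # the equivalent step


parameter = np.array([[3.0, 4.0],     # row norm 5.0, will be clipped
                      [0.3, 0.4]])    # row norm 0.5, left alone
step = variable_clipping_step(parameter, np.zeros_like(parameter),
                              threshold=1.0, axis=1)
new_parameter = parameter - step
row_norms = np.sqrt(np.sum(new_parameter ** 2, axis=1))
```

With axis=1 each row is constrained separately, which corresponds to a norm constraint on the weight vector of each individual unit when rows index units.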
Bricks¶
- Convolutional bricks
- Routing bricks
- Recurrent bricks
- Attention bricks
- Sequence generators
- Cost bricks
- class blocks.bricks.Bias(*args, **kwargs)¶
Bases: blocks.bricks.Feedforward, blocks.bricks.Initializable
Add a bias (i.e. sum with a vector).
- apply¶
Apply the linear transformation.
Parameters: input (TensorVariable) – The input on which to apply the transformation Returns: output – The transformed input plus optional bias Return type: TensorVariable
- get_dim(name)¶
- input_dim¶
- output_dim¶
- class blocks.bricks.Feedforward(name=None)¶
Bases: blocks.bricks.base.Brick
Declares an interface for bricks with one input and one output.
Many bricks have just one input and just one output (activations, Linear, MLP). To make such bricks interchangeable in most contexts they should share an interface for configuring their input and output dimensions. This brick declares such an interface.
- input_dim¶
int
The input dimension of the brick.
- output_dim¶
int
The output dimension of the brick.
- class blocks.bricks.FeedforwardSequence(application_methods, **kwargs)¶
Bases: blocks.bricks.Sequence, blocks.bricks.Feedforward
A sequence where the first and last bricks are feedforward.
Parameters: application_methods (list) – List of BoundApplication to apply. The first and last application method should belong to a Feedforward brick.
- input_dim¶
- output_dim¶
- class blocks.bricks.Identity(name=None)¶
Bases: blocks.bricks.Activation
Elementwise application of identity function.
- apply¶
Apply the identity function element-wise.
Parameters: input (TensorVariable) – Theano variable to apply identity to, element-wise. Returns: output – The input with the activation function applied. Return type: TensorVariable
- class blocks.bricks.Initializable(*args, **kwargs)¶
Bases: blocks.bricks.base.Brick
Base class for bricks which push parameter initialization.
Many bricks will initialize children which perform a linear transformation, often with biases. This brick allows the weights and biases initialization to be configured in the parent brick and pushed down the hierarchy.
Parameters: - weights_init (object) – A NdarrayInitialization instance which will be used to initialize the weight matrix. Required by initialize().
- biases_init (object, optional) – A NdarrayInitialization instance that will be used to initialize the biases. Required by initialize() when use_bias is True. Only supported by bricks for which has_biases is True.
- use_bias (bool, optional) – Whether to use a bias. Defaults to True. Required by initialize(). Only supported by bricks for which has_biases is True.
- rng (numpy.random.RandomState) –
- has_biases¶
bool
False if the brick does not support biases, and only has weights_init. For an example of this, see Bidirectional. If this is False, the brick does not support the arguments biases_init or use_bias.
- has_biases = True
- rng¶
- seed¶
- seed_rng = <mtrand.RandomState object at 0x7f2d16d7dd10>¶
- class blocks.bricks.Linear(*args, **kwargs)¶
Bases: blocks.bricks.Initializable, blocks.bricks.Feedforward
A linear transformation with optional bias.
Brick which applies a linear (affine) transformation by multiplying the input with a weight matrix. By default, a bias term is added (see Initializable for information on disabling this).
Parameters: - input_dim (int) – The dimension of the input. Required by allocate().
- output_dim (int) – The dimension of the output. Required by allocate().
Notes
See Initializable for initialization parameters.
A linear transformation with bias is a matrix multiplication followed by a vector summation.
\[f(\mathbf{x}) = \mathbf{W}\mathbf{x} + \mathbf{b}\]
- W¶
- apply¶
Apply the linear transformation.
Parameters: input (TensorVariable) – The input on which to apply the transformation Returns: output – The transformed input plus optional bias Return type: TensorVariable
- b¶
- get_dim(name)¶
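In NumPy terms, the transformation applied to a batch of inputs can be sketched as follows. The sketch assumes that examples are stored as rows and that W has shape (input_dim, output_dim), so the product is written x.dot(W) rather than W times a column vector; the function name linear is chosen for this example.

```python
import numpy as np


def linear(x, W, b=None):
    """Affine map for a batch of row vectors: x.dot(W), plus b if given."""
    output = x.dot(W)
    if b is not None:
        output = output + b   # the bias is broadcast over the batch
    return output


x = np.ones((2, 3))                   # batch of 2 examples, input_dim = 3
W = np.arange(6.0).reshape(3, 2)      # output_dim = 2
b = np.array([1.0, -1.0])
with_bias = linear(x, W, b)
without_bias = linear(x, W)
```

Leaving out b corresponds to a brick configured without a bias.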
- class blocks.bricks.LinearMaxout(*args, **kwargs)¶
Bases: blocks.bricks.Initializable, blocks.bricks.Feedforward
Maxout pooling following a linear transformation.
This code combines the Linear brick with a Maxout brick.
Parameters: - input_dim (int) – The dimension of the input. Required by allocate().
- output_dim (int) – The dimension of the output. Required by allocate().
- num_pieces (int) – The number of linear functions. Required by allocate().
Notes
See Initializable for initialization parameters.
- apply¶
Apply the linear transformation followed by maxout.
Parameters: input (TensorVariable) – The input on which to apply the transformations Returns: output – The transformed input Return type: TensorVariable
- input_dim¶
- class blocks.bricks.Logistic(name=None)¶
Bases: blocks.bricks.Activation
Elementwise application of logistic function.
- apply¶
Apply the logistic function element-wise.
Parameters: input (TensorVariable) – Theano variable to apply logistic to, element-wise. Returns: output – The input with the activation function applied. Return type: TensorVariable
- class blocks.bricks.MLP(*args, **kwargs)¶
Bases: blocks.bricks.Sequence, blocks.bricks.Initializable, blocks.bricks.Feedforward
A simple multi-layer perceptron.
Parameters: - activations (list of Brick, BoundApplication, or None) – A list of activations to apply after each linear transformation. Give None to not apply any activation. It is assumed that the application method to use is apply. Required for __init__().
- dims (list of ints) – A list of input dimensions, as well as the output dimension of the last layer. Required for allocate().
Notes
See Initializable for initialization parameters.
Note that the weights_init, biases_init and use_bias configurations will overwrite those of the layers each time the MLP is re-initialized. For more fine-grained control, push the configuration to the child layers manually before initialization.
>>> from blocks.bricks import MLP, Tanh
>>> from blocks.initialization import IsotropicGaussian, Constant
>>> mlp = MLP(activations=[Tanh(), None], dims=[30, 20, 10],
...           weights_init=IsotropicGaussian(),
...           biases_init=Constant(1))
>>> mlp.push_initialization_config()  # Configure children
>>> mlp.children[0].weights_init = IsotropicGaussian(0.1)
>>> mlp.initialize()
- input_dim¶
- output_dim¶
- class blocks.bricks.Maxout(*args, **kwargs)¶
Bases: blocks.bricks.base.Brick
Maxout pooling transformation.
A brick that does max pooling over groups of input units. If you use this code in a research project, please cite [GWFM13].
[GWFM13] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio, Maxout networks, ICML (2013), pp. 1319-1327.
Parameters: num_pieces (int) – The size of the groups the maximum is taken over.
Notes
Maxout applies a set of linear transformations to a vector and selects for each output dimension the result with the highest value.
- apply¶
Apply the maxout transformation.
Parameters: input (TensorVariable) – The input on which to apply the transformation Returns: output – The transformed input Return type: TensorVariable
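Max pooling over groups of input units can be sketched in NumPy with a reshape followed by a maximum. The sketch assumes that each group consists of num_pieces consecutive units along the last axis; the function name maxout is used only for this example.

```python
import numpy as np


def maxout(x, num_pieces):
    """Take the maximum over consecutive groups of num_pieces units."""
    last_dim = x.shape[-1]
    if last_dim % num_pieces != 0:
        raise ValueError("last dimension must be divisible by num_pieces")
    grouped = x.reshape(x.shape[:-1] + (last_dim // num_pieces, num_pieces))
    return grouped.max(axis=-1)


x = np.array([[1.0, 5.0, 2.0, 0.0, -1.0, -3.0]])
pooled = maxout(x, num_pieces=2)   # groups: (1, 5), (2, 0), (-1, -3)
```

The output has last dimension input_dim / num_pieces, which is why a preceding linear layer has to produce num_pieces values per output unit.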
- class blocks.bricks.NDimensionalSoftmax(name=None)¶
Bases: blocks.bricks.Softmax
A wrapped brick class.
This brick was automatically constructed by wrapping Softmax with WithExtraDims.
- apply¶
Wraps the application method with reshapes.
Parameters: extra_ndim (int, optional) – The number of extra dimensions. Default is zero.
See also
Softmax.apply() – For documentation of the wrapped application method.
- apply_delegate()¶
- categorical_cross_entropy¶
Wraps the application method with reshapes.
Parameters: extra_ndim (int, optional) – The number of extra dimensions. Default is zero.
See also
Softmax.categorical_cross_entropy() – For documentation of the wrapped application method.
- categorical_cross_entropy_delegate()¶
- decorators = [<blocks.bricks.wrappers.WithExtraDims object at 0x7f2d16d11890>]¶
- log_probabilities¶
Wraps the application method with reshapes.
Parameters: extra_ndim (int, optional) – The number of extra dimensions. Default is zero.
See also
Softmax.log_probabilities() – For documentation of the wrapped application method.
- log_probabilities_delegate()¶
- class blocks.bricks.Random(theano_seed=None, **kwargs)¶
Bases: blocks.bricks.base.Brick
A mixin class for Bricks which need Theano RNGs.
Parameters: theano_seed (int or list, optional) – Seed to use for a MRG_RandomStreams object.
- seed_rng = <mtrand.RandomState object at 0x7f2d16d7d690>¶
- theano_rng¶
Returns Brick’s Theano RNG, or a default one.
The default seed can be set through blocks.config.
- theano_seed¶
- class blocks.bricks.Rectifier(name=None)¶
Bases: blocks.bricks.Activation
Elementwise application of rectifier function.
- apply¶
Apply the rectifier function element-wise.
Parameters: input (TensorVariable) – Theano variable to apply rectifier to, element-wise. Returns: output – The input with the activation function applied. Return type: TensorVariable
- class blocks.bricks.Sequence(application_methods, **kwargs)¶
Bases: blocks.bricks.base.Brick
A sequence of bricks.
This brick applies a sequence of bricks, assuming that their in- and outputs are compatible.
Parameters: application_methods (list) – List of BoundApplication to apply
- apply¶
- apply_inputs()¶
- apply_outputs()¶
- class blocks.bricks.Softmax(name=None)¶
Bases: blocks.bricks.base.Brick
A softmax brick.
Works with 2-dimensional inputs only. If you need more, see NDimensionalSoftmax.
- apply¶
Standard softmax.
Parameters: input (Variable) – A matrix, each row contains unnormalized log-probabilities of a distribution. Returns: output_ – A matrix with probabilities in each row for each distribution from input_. Return type: Variable
- categorical_cross_entropy¶
Computationally stable cross-entropy for pre-softmax values.
Parameters: - y (TensorVariable) – In the case of a matrix argument, each row represents a probability distribution. In the vector case, each element represents a distribution by specifying the position of 1 in a 1-hot vector.
- x (TensorVariable) – A matrix, each row contains unnormalized probabilities of a distribution.
Returns: cost – A vector of cross-entropies between respective distributions from y and x.
Return type: TensorVariable
- log_probabilities¶
Normalize log-probabilities.
Converts unnormalized log-probabilities (exponents of which do not sum to one) into actual log-probabilities (exponents of which sum to one).
Parameters: input (Variable) – A matrix, each row contains unnormalized log-probabilities of a distribution. Returns: output – A matrix with normalized log-probabilities in each row for each distribution from input_. Return type: Variable
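The numerical-stability idea behind log_probabilities and categorical_cross_entropy can be sketched in NumPy: subtract the row maximum before exponentiating, and compute the cross-entropy from log-probabilities instead of taking the logarithm of a softmax output. This is the standard log-sum-exp formulation and an assumption about the approach, not a transcription of the Theano code.

```python
import numpy as np


def log_probabilities(x):
    """Normalize each row of unnormalized log-probabilities in a stable way."""
    shifted = x - x.max(axis=1, keepdims=True)   # largest entry becomes 0
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def categorical_cross_entropy(y, x):
    """Cross-entropy between targets y and pre-softmax values x, per row."""
    log_p = log_probabilities(x)
    if y.ndim == 1:                              # vector of class indices
        return -log_p[np.arange(len(y)), y]
    return -(y * log_p).sum(axis=1)              # matrix of distributions


x = np.array([[1000.0, 1000.0],                  # would overflow a naive softmax
              [0.0, np.log(3.0)]])
log_p = log_probabilities(x)
cost_from_indices = categorical_cross_entropy(np.array([0, 1]), x)
cost_from_one_hot = categorical_cross_entropy(np.eye(2), x)
```

The first row contains values for which exp() overflows in floating point, yet the result stays finite because only differences from the row maximum are exponentiated.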
- class blocks.bricks.Softplus(name=None)¶
Bases: blocks.bricks.Activation
Elementwise application of softplus function.
- apply¶
Apply the softplus function element-wise.
Parameters: input (TensorVariable) – Theano variable to apply softplus to, element-wise. Returns: output – The input with the activation function applied. Return type: TensorVariable
- class blocks.bricks.Tanh(name=None)¶
Bases: blocks.bricks.Activation
Elementwise application of tanh function.
- apply¶
Apply the tanh function element-wise.
Parameters: input (TensorVariable) – Theano variable to apply tanh to, element-wise. Returns: output – The input with the activation function applied. Return type: TensorVariable
- class blocks.bricks.lookup.LookupTable(*args, **kwargs)¶
Bases: blocks.bricks.Initializable
Encapsulates representations of a range of integers.
Parameters: Notes
See Initializable for initialization parameters.
- W¶
- apply¶
Perform lookup.
Parameters: indices (TensorVariable) – The indices of interest. The dtype must be integer. Returns: output – Representations for the indices of the query. Has \(k+1\) dimensions, where \(k\) is the number of dimensions of the indices parameter. The last dimension stands for the representation element. Return type: TensorVariable
- has_bias = False¶
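The statement that the output of a lookup has \(k+1\) dimensions can be seen in a small sketch. This is a pure-Python illustration, not the Blocks code: the helper lookup and the nested-list table W are hypothetical stand-ins for the brick and its parameter matrix.

```python
def lookup(table, indices):
    """Replace every integer in a nested list of indices by its row of table."""
    if isinstance(indices, int):
        # A single index becomes a representation vector (one extra dimension).
        return list(table[indices])
    return [lookup(table, item) for item in indices]


# Three representations of dimension 2.
W = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]

# Indices with k = 2 dimensions: (batch, position).
batch = [[0, 2], [1, 1]]
output = lookup(W, batch)
```

Here output has three levels of nesting: the two dimensions of batch plus a last dimension that holds the representation elements.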
Convolutional bricks¶
- class blocks.bricks.conv.Convolutional(*args, **kwargs)¶
Bases: blocks.bricks.Initializable
Performs a 2D convolution.
Parameters: - filter_size (tuple) – The height and width of the filters (also called kernels).
- num_filters (int) – Number of filters per channel.
- num_channels (int) – Number of input channels in the image. For the first layer this is normally 1 for grayscale images and 3 for color (RGB) images. For subsequent layers this is equal to the number of filters output by the previous convolutional layer. The filters are pooled over the channels.
- batch_size (int, optional) – Number of examples per batch. If given, this will be passed to Theano convolution operator, possibly resulting in faster execution.
- image_size (tuple, optional) – The height and width of the input (image or feature map). If given, this will be passed to the Theano convolution operator, resulting in possibly faster execution times.
- step (tuple, optional) – The step (or stride) with which to slide the filters over the image. Defaults to (1, 1).
- border_mode ({‘valid’, ‘full’}, optional) – The border mode to use, see scipy.signal.convolve2d() for details. Defaults to ‘valid’.
- tied_biases (bool) – If True, it indicates that the biases of every filter in this layer should be shared amongst all applications of that filter. Setting this to False will untie the biases, yielding a separate bias for every location at which the filter is applied. Defaults to False.
- apply¶
Perform the convolution.
Parameters: input (TensorVariable) – A 4D tensor with the axes representing batch size, number of channels, image height, and image width. Returns: output – A 4D tensor of filtered images (feature maps) with dimensions representing batch size, number of filters, feature map height, and feature map width. The height and width of the feature map depend on the border mode. For ‘valid’ it is image_size - filter_size + 1 while for ‘full’ it is image_size + filter_size - 1.
Return type: TensorVariable
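The feature map sizes quoted above can be computed with a few lines of Python. This is a small sketch of the two formulas only; the helper name conv_output_size is hypothetical, and it assumes the default step of (1, 1).

```python
def conv_output_size(image_size, filter_size, border_mode='valid'):
    """Height and width of the feature map for a step of (1, 1)."""
    if border_mode == 'valid':
        # The filter must fit entirely inside the image.
        return tuple(i - f + 1 for i, f in zip(image_size, filter_size))
    if border_mode == 'full':
        # Every position with at least one overlapping pixel is used.
        return tuple(i + f - 1 for i, f in zip(image_size, filter_size))
    raise ValueError("border_mode must be 'valid' or 'full'")


valid_size = conv_output_size((28, 28), (5, 5), 'valid')
full_size = conv_output_size((28, 28), (5, 5), 'full')
```

For a 28 by 28 image and 5 by 5 filters this gives 24 by 24 feature maps in ‘valid’ mode and 32 by 32 in ‘full’ mode.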
- get_dim(name)¶
- class blocks.bricks.conv.ConvolutionalActivation(*args, **kwargs)¶
Bases: blocks.bricks.Sequence, blocks.bricks.Initializable
A convolution followed by an activation function.
Parameters: activation (BoundApplication) – The application method to apply after convolution (i.e. the nonlinear activation function).
See also
- Convolutional
- For the documentation of other parameters.
- get_dim(name)¶
- class blocks.bricks.conv.ConvolutionalLayer(*args, **kwargs)¶
Bases: blocks.bricks.Sequence, blocks.bricks.Initializable
A complete convolutional layer: Convolution, nonlinearity, pooling.
Todo
Mean pooling.
Parameters: activation (BoundApplication) – The application method to apply in the detector stage (i.e. the nonlinearity before pooling). Needed for __init__.
See also
- Convolutional
- Documentation of convolution arguments.
- MaxPooling
- Documentation of pooling arguments.
Notes
Uses max pooling.
- get_dim(name)¶
- class blocks.bricks.conv.ConvolutionalSequence(*args, **kwargs)¶
Bases: blocks.bricks.Sequence, blocks.bricks.Initializable, blocks.bricks.Feedforward
A sequence of convolutional operations.
Parameters: - layers (list) – List of convolutional bricks (i.e. ConvolutionalActivation or ConvolutionalLayer)
- num_channels (int) – Number of input channels in the image. For the first layer this is normally 1 for grayscale images and 3 for color (RGB) images. For subsequent layers this is equal to the number of filters output by the previous convolutional layer.
- batch_size (int, optional) – Number of images in batch. If given, will be passed to theano’s convolution operator resulting in possibly faster execution.
- image_size (tuple, optional) – Width and height of the input (image/featuremap). If given, will be passed to theano’s convolution operator resulting in possibly faster execution.
Notes
The convolutional bricks passed in should be constructed lazily, that is, without specifying batch_size, num_channels and image_size. The main feature of ConvolutionalSequence is that it sets the input dimensions of each layer to the output dimensions of the previous layer through the push_allocation_config() method.
- get_dim(name)¶
- class blocks.bricks.conv.Flattener(name=None)¶
Bases: blocks.bricks.base.Brick
Flattens the input.
It may be used to pass multidimensional objects like images or feature maps of convolutional bricks into bricks which allow only two dimensional input (batch, features) like MLP.
- apply¶
- class blocks.bricks.conv.MaxPooling(*args, **kwargs)¶
Bases: blocks.bricks.Initializable, blocks.bricks.Feedforward
Max pooling layer.
Parameters: - pooling_size (tuple) – The height and width of the pooling region i.e. this is the factor by which your input’s last two dimensions will be downscaled.
- step (tuple, optional) – The vertical and horizontal shift (stride) between pooling regions. By default this is equal to pooling_size. Setting this to a lower number results in overlapping pooling regions.
- input_dim (tuple, optional) – A tuple of integers representing the shape of the input. The last two dimensions will be used to calculate the output dimension.
- apply¶
Apply the pooling (subsampling) transformation.
Parameters: input_ (TensorVariable) – A tensor with dimension greater than or equal to 2. The last two dimensions will be downsampled. For example, with images this means that the last two dimensions should represent the height and width of your image. Returns: output – A tensor with the same number of dimensions as input_, but with the last two dimensions downsampled. Return type: TensorVariable
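To make the roles of pooling_size and step concrete, here is a minimal pure-Python sketch of max pooling over a single 2D image. It is not the Theano operator that the brick wraps: the helper max_pool_2d is hypothetical, and it assumes that incomplete regions at the border are simply dropped.

```python
def max_pool_2d(image, pooling_size, step=None):
    """Max pooling over a 2D list; step defaults to pooling_size."""
    pool_h, pool_w = pooling_size
    step_h, step_w = step if step is not None else pooling_size
    height, width = len(image), len(image[0])
    return [
        [max(image[r + dr][c + dc]
             for dr in range(pool_h) for dc in range(pool_w))
         for c in range(0, width - pool_w + 1, step_w)]
        for r in range(0, height - pool_h + 1, step_h)
    ]


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]

# Non-overlapping regions: both dimensions shrink by the pooling factor.
pooled = max_pool_2d(image, (2, 2))
# A smaller step gives overlapping regions and a larger output.
overlapping = max_pool_2d(image, (2, 2), step=(1, 1))
```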
- get_dim(name)¶
Routing bricks¶
- class blocks.bricks.parallel.Distribute(*args, **kwargs)¶
Bases: blocks.bricks.parallel.Fork
Transform an input and add it to other inputs.
This brick is designed for the following scenario: one has a group of variables and another separate variable, and one needs to somehow distribute information from the latter across the former. We call that “to distribute a variable across other variables”, and refer to the separate variable as “the source” and to the variables from the group as “the targets”.
Given a prototype brick, a Distribute brick makes several copies of it (each with its own parameters). At application time the copies are applied to the source and the transformation results are added to the targets (in the literal sense).
>>> from theano import tensor
>>> from blocks.bricks.parallel import Distribute
>>> from blocks.initialization import Constant
>>> x = tensor.matrix('x')
>>> y = tensor.matrix('y')
>>> z = tensor.matrix('z')
>>> distribute = Distribute(target_names=['x', 'y'], source_name='z',
...                         target_dims=[2, 3], source_dim=3,
...                         weights_init=Constant(2))
>>> distribute.initialize()
>>> new_x, new_y = distribute.apply(x=x, y=y, z=z)
>>> new_x.eval({x: [[2, 2]], z: [[1, 1, 1]]})
array([[ 8., 8.]]...
>>> new_y.eval({y: [[1, 1, 1]], z: [[1, 1, 1]]})
array([[ 7., 7., 7.]]...
Parameters: - target_names (list) – The names of the targets.
- source_name (str) – The name of the source.
- target_dims (list) – A list of target dimensions, corresponding to target_names.
- source_dim (int) – The dimension of the source input.
- prototype (Feedforward, optional) – The transformation prototype. A copy will be created for every input. By default a linear transformation is used.
- target_dims¶
list
- source_dim¶
int
Notes
See Initializable for initialization parameters.
- apply¶
Distribute the source across the targets.
Parameters: **kwargs (dict) – The source and the target variables. Returns: output – The new target variables. Return type: list
- apply_inputs()¶
- apply_outputs()¶
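The numbers in the Distribute example above follow from simple arithmetic, which the sketch below reproduces in plain Python. It is an illustration under assumptions, not the brick itself: the helpers linear, constant_weights and distribute are hypothetical, and the default linear transformation is modelled as a bias-free matrix product.

```python
def linear(row, weights):
    """Multiply a row vector by a weight matrix given as a list of rows."""
    return [sum(v * w for v, w in zip(row, column))
            for column in zip(*weights)]


def constant_weights(input_dim, output_dim, value):
    """An input_dim x output_dim matrix filled with one value."""
    return [[value] * output_dim for _ in range(input_dim)]


def distribute(targets, source, weights):
    """Add a separately transformed copy of the source to every target."""
    return {name: [t + s for t, s in zip(row, linear(source, weights[name]))]
            for name, row in targets.items()}


z = [1, 1, 1]                                # the source, dimension 3
targets = {'x': [2, 2], 'y': [1, 1, 1]}      # the targets
weights = {'x': constant_weights(3, 2, 2),   # one transformation per target
           'y': constant_weights(3, 3, 2)}
new = distribute(targets, z, weights)
```

With all weights equal to 2, the transformed source is 6 in every position, so new['x'] is 2 + 6 = 8 and new['y'] is 1 + 6 = 7 in every position, matching the example.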
- class blocks.bricks.parallel.Fork(*args, **kwargs)¶
Bases: blocks.bricks.parallel.Parallel
Several outputs from one input by applying similar transformations.
Given a prototype brick, a Fork brick makes several copies of it (each with its own parameters). At the application time the copies are applied to the input to produce different outputs.
A typical use case for this brick is to produce inputs for the gates of gated recurrent bricks, such as GatedRecurrent.
>>> from theano import tensor
>>> from blocks.bricks.parallel import Fork
>>> from blocks.initialization import Constant
>>> x = tensor.matrix('x')
>>> fork = Fork(output_names=['y', 'z'],
...             input_dim=2, output_dims=[3, 4],
...             weights_init=Constant(2), biases_init=Constant(1))
>>> fork.initialize()
>>> y, z = fork.apply(x)
>>> y.eval({x: [[1, 1]]})
array([[ 5., 5., 5.]]...
>>> z.eval({x: [[1, 1]]})
array([[ 5., 5., 5., 5.]]...
Parameters: - output_names (list of str) – Names of the outputs to produce.
- input_dim (int) – The input dimension.
- prototype (Feedforward, optional) – The transformation prototype. A copy will be created for every input. By default an affine transformation is used.
- input_dim¶
int
The input dimension.
- output_dims¶
list
The output dimensions as a list of integers, corresponding to output_names.
See also
- apply¶
- apply_outputs()¶
- class blocks.bricks.parallel.Merge(*args, **kwargs)¶
Bases: blocks.bricks.parallel.Parallel
Merges several variables by applying a transformation and summing.
Parameters: - input_names (list) – The input names.
- input_dims (list) – List of input dimensions, given in the same order as input_names.
- output_dim (int) – The output dimension of the merged variables.
- prototype (Feedforward, optional) – A transformation prototype. A copy will be created for every input. If None, a linear transformation is used.
- child_prefix (str, optional) – A prefix for children names. By default “transform” is used.
Warning
Note that if you want to have a bias you can pass a Linear brick as a prototype, but this will result in several redundant biases. It is a better idea to use merge.children[0].use_bias = True.
- input_names¶
list
The input names.
- input_dims¶
list
List of input dimensions corresponding to input_names.
- output_dim¶
int
The output dimension.
Examples
>>> from theano import tensor
>>> from blocks.bricks.parallel import Merge
>>> from blocks.initialization import Constant
>>> a = tensor.matrix('a')
>>> b = tensor.matrix('b')
>>> merge = Merge(input_names=['a', 'b'], input_dims=[3, 4],
...               output_dim=2, weights_init=Constant(1.))
>>> merge.initialize()
>>> c = merge.apply(a=a, b=b)
>>> c.eval({a: [[1, 1, 1]], b: [[2, 2, 2, 2]]})
array([[ 11., 11.]]...
- apply¶
- apply_inputs()¶
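The result of the Merge example can be checked by hand. The sketch below does so in plain Python; it is an illustration rather than the brick's implementation, the helpers linear and merge are hypothetical, and each transformation is assumed to be a bias-free matrix product.

```python
def linear(row, weights):
    """Multiply a row vector by a weight matrix given as a list of rows."""
    return [sum(v * w for v, w in zip(row, column))
            for column in zip(*weights)]


def merge(inputs, weights):
    """Transform every input to the output dimension, then sum the results."""
    transformed = [linear(inputs[name], weights[name]) for name in inputs]
    return [sum(values) for values in zip(*transformed)]


inputs = {'a': [1, 1, 1], 'b': [2, 2, 2, 2]}
# One weight matrix per input, all entries 1, output dimension 2.
weights = {'a': [[1.0] * 2 for _ in range(3)],
           'b': [[1.0] * 2 for _ in range(4)]}
c = merge(inputs, weights)
```

The transformed a contributes 3 and the transformed b contributes 8 to each output element, which gives the 11 shown in the example.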
- class blocks.bricks.parallel.Parallel(*args, **kwargs)¶
Bases: blocks.bricks.Initializable
Apply similar transformations to several inputs.
Given a prototype brick, a Parallel brick makes several copies of it (each with its own parameters). At the application time every copy is applied to the respective input.
>>> from theano import tensor
>>> from blocks.bricks import Linear
>>> from blocks.bricks.parallel import Parallel
>>> from blocks.initialization import Constant
>>> x, y = tensor.matrix('x'), tensor.matrix('y')
>>> parallel = Parallel(
...     prototype=Linear(use_bias=False),
...     input_names=['x', 'y'], input_dims=[2, 3], output_dims=[4, 5],
...     weights_init=Constant(2))
>>> parallel.initialize()
>>> new_x, new_y = parallel.apply(x=x, y=y)
>>> new_x.eval({x: [[1, 1]]})
array([[ 4., 4., 4., 4.]]...
>>> new_y.eval({y: [[1, 1, 1]]})
array([[ 6., 6., 6., 6., 6.]]...
Parameters: - input_names (list) – The input names.
- input_dims (list) – List of input dimensions, given in the same order as input_names.
- output_dims (list) – List of output dimensions.
- prototype (Feedforward) – The transformation prototype. A copy will be created for every input.
- child_prefix (str, optional) – The prefix for children names. By default “transform” is used.
- input_names¶
list
The input names.
- input_dims¶
list
Input dimensions.
- output_dims¶
list
Output dimensions.
Notes
See Initializable for initialization parameters.
- apply¶
- apply_inputs()¶
- apply_outputs()¶
Recurrent bricks¶
- class blocks.bricks.recurrent.BaseRecurrent(name=None)¶
Bases: blocks.bricks.base.Brick
Base class for bricks with a recurrent application method.
- has_bias = False¶
- initial_states¶
Return initial states for an application call.
The default implementation assumes that the recurrent application method is called apply. It fetches the state names from apply.states and returns a zero matrix for each of them.
SimpleRecurrent, LSTM and GatedRecurrent override this method with trainable initial states initialized with zeros.
Parameters: - batch_size (int) – The batch size.
- *args – The positional arguments of the application call.
- **kwargs – The keyword arguments of the application call.
- initial_states_outputs()¶
- class blocks.bricks.recurrent.Bidirectional(*args, **kwargs)¶
Bases: blocks.bricks.Initializable
Bidirectional network.
A bidirectional network is a combination of forward and backward recurrent networks which process inputs in different order.
Parameters: prototype (instance of BaseRecurrent) – A prototype brick from which the forward and backward bricks are cloned.
Notes
See Initializable for initialization parameters.
- apply¶
Applies forward and backward networks and concatenates outputs.
- apply_delegate()¶
- has_bias = False¶
- class blocks.bricks.recurrent.GatedRecurrent(*args, **kwargs)¶
Bases: blocks.bricks.recurrent.BaseRecurrent, blocks.bricks.Initializable
Gated recurrent neural network.
Gated recurrent neural network (GRNN) as introduced in [CvMG14]. Every unit of a GRNN is equipped with update and reset gates that facilitate better gradient propagation.
Parameters: Notes
See Initializable for initialization parameters.
[CvMG14] Kyunghyun Cho, Bart van Merriënboer, Çağlar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP (2014), pp. 1724-1734.
- apply¶
Apply the gated recurrent transition.
Parameters: - states (TensorVariable) – The 2 dimensional matrix of current states in the shape (batch_size, dim). Required for one_step usage.
- inputs (TensorVariable) – The 2 dimensional matrix of inputs in the shape (batch_size, dim)
- gate_inputs (TensorVariable) – The 2 dimensional matrix of inputs to the gates in the shape (batch_size, 2 * dim).
- mask (TensorVariable) – A 1D binary array in the shape (batch,) which is 1 if there is data available, 0 if not. Assumed to be 1-s only if not given.
Returns: output – Next states of the network.
Return type: TensorVariable
- get_dim(name)¶
- initial_states¶
- state_to_gates¶
- state_to_state¶
- class blocks.bricks.recurrent.LSTM(*args, **kwargs)¶
Bases: blocks.bricks.recurrent.BaseRecurrent, blocks.bricks.Initializable
Long Short Term Memory.
Every unit of an LSTM is equipped with input, forget and output gates. This implementation is based on code by Mohammad Pezeshki that implements the architecture used in [GSS03] and [Grav13]. It aims to do as many computations in parallel as possible and expects the last dimension of the input to be four times the output dimension.
Unlike a vanilla LSTM as described in [HS97], this model has peephole connections from the cells to the gates. The output gates receive information about the cells at the current time step, while the other gates only receive information about the cells at the previous time step. All ‘peephole’ weight matrices are diagonal.
[GSS03] Gers, Felix A., Nicol N. Schraudolph, and Jürgen Schmidhuber, Learning precise timing with LSTM recurrent networks, Journal of Machine Learning Research 3 (2003), pp. 115-143.
[Grav13] Graves, Alex, Generating sequences with recurrent neural networks, arXiv preprint arXiv:1308.0850 (2013).
[HS97] Sepp Hochreiter, and Jürgen Schmidhuber, Long Short-Term Memory, Neural Computation 9(8) (1997), pp. 1735-1780.
Parameters: Notes
See Initializable for initialization parameters.
- apply¶
Apply the Long Short Term Memory transition.
Parameters: - states (TensorVariable) – The 2 dimensional matrix of current states in the shape (batch_size, features). Required for one_step usage.
- cells (TensorVariable) – The 2 dimensional matrix of current cells in the shape (batch_size, features). Required for one_step usage.
- inputs (TensorVariable) – The 2 dimensional matrix of inputs in the shape (batch_size, features * 4). The inputs need to be four times the dimension of the LSTM brick to ensure that each of the four gates receives a different transformation of the input. See [Grav13] equations 7 to 10 for more details.
- mask (TensorVariable) – A 1D binary array in the shape (batch,) which is 1 if there is data available, 0 if not. Assumed to be 1-s only if not given.
Returns: - states (TensorVariable) – Next states of the network.
- cells (TensorVariable) – Next cell activations of the network.
- get_dim(name)¶
- initial_states¶
- class blocks.bricks.recurrent.RecurrentStack(transitions, fork_prototype=None, states_name='states', skip_connections=False, **kwargs)¶
Bases: blocks.bricks.recurrent.BaseRecurrent, blocks.bricks.Initializable
Stack of recurrent networks.
Builds a stack of recurrent layers from a supplied list of BaseRecurrent objects. Each object must have sequences, contexts, states and outputs parameters to its apply method, such as the ones required by the recurrent decorator from blocks.bricks.recurrent.
In Blocks in general each brick can have an apply method, and this method has attributes that list the names of the arguments that can be passed to the method and the names of the outputs returned by the method. The attributes of the apply method of this class are made by concatenating the attributes of the apply methods of each of the transitions from which the stack is made. In order to avoid conflicts, the names of the arguments appearing in the states and outputs attributes of the apply method of each layer are renamed. The names of the bottom layer are used as-is and a suffix of the form ‘#<n>’ is added to the names from other layers, where ‘<n>’ is the layer number, starting from 1 for the first layer above the bottom.
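The renaming scheme can be sketched in a few lines of Python. The function names below echo the static methods suffix, suffixes and split_suffix listed for this class, but the bodies are an assumption written from the description, not the library code; the special case for “mask” follows the exception that the class description states for sequences.

```python
def suffix(name, level):
    """Name of an argument of the layer at the given level (0 is the bottom)."""
    # "mask" is shared by all layers and bottom-layer names are used as-is.
    if name == 'mask' or level == 0:
        return name
    return '{}#{}'.format(name, level)


def suffixes(names, level):
    """Apply suffix to a list of names."""
    return [suffix(name, level) for name in names]


def split_suffix(name):
    """Recover (base name, level) from a possibly suffixed name."""
    base, separator, level = name.rpartition('#')
    if separator and level.isdigit():
        return base, int(level)
    return name, 0


# The state names of a hypothetical three-layer stack of LSTM-like layers.
all_states = [n for level in range(3)
              for n in suffixes(['states', 'cells'], level)]
```

For this stack all_states is ['states', 'cells', 'states#1', 'cells#1', 'states#2', 'cells#2'].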
The contexts of all layers are merged into a single list of unique names, and no suffix is added. Different layers with the same context name will receive the same value.
The names that appear in sequences are treated in the same way as the names of states and outputs if skip_connections is “True”. The only exception is the “mask” element that may appear in the sequences attribute of all layers: no suffix is added to it and all layers will receive the same mask value. If you set skip_connections to False then only the arguments of the sequences from the bottom layer will appear in the sequences attribute of the apply method of this class. When using this class with skip_connections set to “True”, you can supply all inputs to all layers using a single fork which is created with output_names set to the apply.sequences attribute of this class. For example, SequenceGenerator will create such a fork.
Whether or not skip_connections is set, each layer above the bottom also receives an input (values to its sequences arguments) from a fork of the state of the layer below it. Not to be confused with the external fork discussed in the previous paragraph. It is assumed that all states attributes have a “states” argument name (this can be configured with states_name parameter.) The output argument with this name is forked and then added to all the elements appearing in the sequences of the next layer (except for “mask”.) If skip_connections is False then this fork has a bias by default. This allows direct usage of this class with input supplied only to the first layer. But if you do supply inputs to all layers (by setting skip_connections to “True”) then by default there is no bias and the external fork you use to supply the inputs should have its own separate bias.
Parameters: - transitions (list) – List of recurrent units to use in each layer. Each must be derived from BaseRecurrent. Note: a suffix with the layer number is added to the transitions’ names.
- fork_prototype (Feedforward, optional) – A prototype for the transformation applied to states_name from the states of each layer. The transformation is used when the states_name argument from the outputs of one layer is used as input to the sequences of the next layer. By default a Linear transformation is used, with bias if skip_connections is “False”. If you supply your own prototype you have to enable/disable bias depending on the value of skip_connections.
- states_name (string) – In a stack of RNNs the state of each layer is used as input to the next. The states_name identifies the argument of the states and outputs attributes of each layer that should be used for this task. By default the argument is called “states”. To be more precise, this is the name of the argument in the outputs attribute of the apply method of each transition (layer). It is used, via a fork, as the sequences (input) of the next layer. The same element should also appear in the states attribute of the apply method.
- skip_connections (bool) – By default False. When true, the sequences of all layers are added to the sequences of the apply of this class. When false, only the sequences of the bottom layer appear in the sequences of the apply of this class. In this case the default fork used internally between layers has a bias (see fork_prototype). External code can inspect the sequences attribute of the apply method of this class to decide which arguments it needs (and in what order). With skip_connections you can control what is exposed to the external code. If it is false, the external code is expected to supply inputs only to the bottom layer; if it is true, the external code is expected to supply inputs to all layers. There is one caveat: the external inputs to the layers above the bottom layer are added to a fork of the state of the layer below. As a result the outputs of two forks are added together, which is problematic if both have a bias. It is assumed that the external fork has a bias, and therefore by default the internal fork will not have a bias if skip_connections is true.
Notes
See BaseRecurrent for more initialization parameters.
- apply¶
Apply the stack of transitions.
Parameters: - low_memory (bool) – Use the slow, but also memory efficient, implementation of this code.
- *args –
Positional arguments in the order in which they appear in self.apply.sequences followed by self.apply.contexts.
- **kwargs –
Named arguments defined in self.apply.sequences, self.apply.states or self.apply.contexts
Returns: outputs – The outputs of all transitions as defined in self.apply.outputs
Return type: (list of) TensorVariable
See also
See the docstring of this class for the arguments appearing in the lists self.apply.sequences, self.apply.states and self.apply.contexts. See recurrent() for all other parameters, such as iterate and return_initial_states; note that reverse is currently not implemented.
- do_apply(*args, **kwargs)¶
Apply the stack of transitions.
This is the undecorated implementation of the apply method. A method with an @apply decoration should call this method with iterate=True to indicate that the iteration over all steps should be done internally by this method. A method with a @recurrent decoration should have iterate=False (or unset) to indicate that the iteration over all steps is done externally.
- get_dim(name)¶
- initial_states¶
- low_memory_apply¶
- normal_inputs(level)¶
- static split_suffix(name)¶
- static suffix(name, level)¶
- static suffixes(names, level)¶
- class blocks.bricks.recurrent.SimpleRecurrent(*args, **kwargs)¶
Bases: blocks.bricks.recurrent.BaseRecurrent, blocks.bricks.Initializable
The traditional recurrent transition.
The most well-known recurrent transition: a matrix multiplication, optionally followed by a non-linearity.
Parameters: Notes
See Initializable for initialization parameters.
- W¶
- apply¶
Apply the simple transition.
Parameters: - inputs (TensorVariable) – The 2D inputs, in the shape (batch, features).
- states (TensorVariable) – The 2D states, in the shape (batch, features).
- mask (TensorVariable) – A 1D binary array in the shape (batch,) which is 1 if there is data available, 0 if not. Assumed to be 1-s only if not given.
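A single step of such a transition, including the effect of the mask, can be sketched in pure Python. This is an illustration under explicit assumptions rather than the Blocks code: the function simple_recurrent_step is hypothetical, the inputs are assumed to be added to the states multiplied by W before the non-linearity, and masked-out batch entries are assumed to keep their previous state.

```python
import math


def simple_recurrent_step(inputs, states, W, mask=None, activation=math.tanh):
    """One step for a batch: inputs and states are lists of rows."""
    if mask is None:
        mask = [1] * len(states)  # all data available
    next_states = []
    for x, h, m in zip(inputs, states, mask):
        # Assumed transition: activation(inputs + states . W).
        candidate = [
            activation(x_j + sum(h_i * W[i][j] for i, h_i in enumerate(h)))
            for j, x_j in enumerate(x)
        ]
        # Where mask is 0 the previous state is carried over unchanged.
        next_states.append([m * c + (1 - m) * h_j
                            for c, h_j in zip(candidate, h)])
    return next_states


W = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[0.0, 0.0], [1.0, 1.0]]
states = [[0.5, -0.5], [0.5, -0.5]]
# The second batch entry has no data at this step.
new_states = simple_recurrent_step(inputs, states, W, mask=[1, 0])
```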
- get_dim(name)¶
- initial_states¶
- blocks.bricks.recurrent.recurrent(*args, **kwargs)¶
Wraps an apply method to allow its iterative application.
This decorator allows you to implement only one step of a recurrent network and enjoy applying it to sequences for free. The idea behind it is that, in its most general form, the information flow of an RNN can be described as follows: depending on the context and driven by input sequences, the RNN updates its states and produces output sequences.
Given a method describing one step of an RNN and a specification of which of its inputs are the elements of the input sequence, which are the states and which are the contexts, this decorator returns an application method which implements the whole RNN loop. The returned application method also has additional parameters, see documentation of the recurrent_apply inner function below.
Parameters: - sequences (list of strs) – Specifies which of the arguments are elements of input sequences.
- states (list of strs) – Specifies which of the arguments are the states.
- contexts (list of strs) – Specifies which of the arguments are the contexts.
- outputs (list of strs) – Names of the outputs. The outputs whose names match those in the states parameter are interpreted as next step states.
Returns: recurrent_apply – The new application method that applies the RNN to sequences.
Return type: See also
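The idea of turning a one-step method into a loop over sequences can be shown with a toy decorator. This is a minimal sketch and not the Blocks decorator: it iterates with a plain Python loop over lists, takes keyword arguments only, and returns a dictionary of output histories, all of which are assumptions made to keep the example self-contained.

```python
import functools


def recurrent(sequences, states, contexts, outputs):
    """Toy decorator: wrap a one-step function into a loop over time steps."""
    def decorator(step):
        @functools.wraps(step)
        def recurrent_apply(**kwargs):
            seqs = {name: kwargs[name] for name in sequences}
            current = {name: kwargs[name] for name in states}
            fixed = {name: kwargs[name] for name in contexts}
            n_steps = len(next(iter(seqs.values())))
            history = {name: [] for name in outputs}
            for t in range(n_steps):
                step_inputs = {name: seq[t] for name, seq in seqs.items()}
                results = step(**step_inputs, **current, **fixed)
                if not isinstance(results, tuple):
                    results = (results,)
                for name, value in zip(outputs, results):
                    history[name].append(value)
                    # Outputs named like a state become the next step state.
                    if name in current:
                        current[name] = value
            return history
        return recurrent_apply
    return decorator


@recurrent(sequences=['inputs'], states=['states'],
           contexts=['scale'], outputs=['states'])
def running_sum(inputs, states, scale):
    """One step: add the scaled input to the current state."""
    return states + scale * inputs


result = running_sum(inputs=[1, 2, 3], states=0, scale=10)
```

The step function only describes how one input updates the state; the wrapper supplies the element of each sequence for the current step, carries the states forward and passes the contexts unchanged at every step.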
Attention bricks¶
This module defines the interface of attention mechanisms and a few concrete implementations. For a gentle introduction and usage examples see the tutorial TODO.
An attention mechanism decides to what part of the input to pay attention. It is typically used as a component of a recurrent network, though one can imagine it used in other conditions as well. When the input is big and has certain structure, for instance when it is a sequence or an image, an attention mechanism can be applied to extract only the information which is relevant for the network in its current state.
For the purpose of documentation clarity, we fix the following terminology in this file:
- The network is the network, typically a recurrent one, which uses the attention mechanism.
- The network has states. Using this word in plural might seem weird, but some recurrent networks like LSTM do have several states.
- The big structured input, to which the attention mechanism is applied, is called the attended. When it has variable structure, e.g. a sequence of variable length, there might be a mask associated with it.
- The information extracted by the attention from the attended is called a glimpse, or more specifically glimpses, because there might be a few pieces of this information.
Using this terminology, the attention mechanism computes glimpses given the states of the network and the attended.
An example: in the machine translation network from [BCB] the attended is a sequence of so-called annotations, that is states of a bidirectional network that was driven by word embeddings of the source sentence. The attention mechanism assigns weights to the annotations. The weighted sum of the annotations is further used by the translation network to predict the next word of the generated translation. The weights and the weighted sum are the glimpses. A generalized attention mechanism for this paper is represented here as SequenceContentAttention.
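The two glimpses of this example, the weights and the weighted sum, can be computed with a short pure-Python sketch. It is an illustration only: the function content_attention is hypothetical, and scoring each annotation by a dot product with the state is an assumption chosen for brevity, not the scoring used in the paper or by SequenceContentAttention.

```python
import math


def content_attention(state, attended, mask=None):
    """Return (weighted_sum, weights) for one state and a list of annotations."""
    if mask is None:
        mask = [1] * len(attended)
    # Assumed scoring: dot product between the state and each annotation.
    energies = [sum(s * a for s, a in zip(state, annotation))
                for annotation in attended]
    # Softmax over the unmasked positions only.
    top = max(e for e, keep in zip(energies, mask) if keep)
    unnormalized = [math.exp(e - top) if keep else 0.0
                    for e, keep in zip(energies, mask)]
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]
    dim = len(attended[0])
    weighted_sum = [sum(w * annotation[j]
                        for w, annotation in zip(weights, attended))
                    for j in range(dim)]
    return weighted_sum, weights


state = [1.0, 0.0]
annotations = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
glimpse, weights = content_attention(state, annotations)
# The last annotation is padding and must receive zero weight.
masked_glimpse, masked_weights = content_attention(state, annotations,
                                                   mask=[1, 1, 0])
```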
- class blocks.bricks.attention.AbstractAttention(*args, **kwargs)¶
Bases: blocks.bricks.base.Brick
The common interface for attention bricks.
First, see the module-level docstring for terminology.
A generic attention mechanism functions as follows. Its inputs are the states of the network and the attended. Given these two it produces so-called glimpses, that is, it extracts information from the attended which is necessary for the network in its current states.
For computational reasons we separate the process described above into two stages:
1. The preprocessing stage, preprocess(), includes computations that do not involve the states. These can often be performed in advance. The outcome of this stage is called preprocessed_attended.
2. The main stage, take_glimpses(), includes all the rest.
When an attention mechanism is applied sequentially, some glimpses from the previous step might be necessary to compute the new ones. A typical example for that is when the focus position from the previous step is required. In such cases take_glimpses() should specify such need in its interface (its docstring explains how to do that). In addition initial_glimpses() should specify some sensible initialization for the glimpses to be carried over.
Todo
Only single attended is currently allowed.
preprocess() and initial_glimpses() might end up needing masks, which are currently not provided for them.
Parameters: - state_names¶
list
- state_dims¶
list
- attended_dim¶
int
- get_dim(name)¶
- initial_glimpses(batch_size, attended)¶
Return sensible initial values for carried over glimpses.
Parameters: - batch_size (int or Variable) – The batch size.
- attended (Variable) – The attended.
Returns: initial_glimpses – The initial values for the requested glimpses. These might simply consist of zeros or be somehow extracted from the attended.
Return type: list of Variable
- preprocess¶
Perform the preprocessing of the attended.
Stage 1 of the attention mechanism, see AbstractAttention docstring for an explanation of stages. The default implementation simply returns attended.
Parameters: attended (Variable) – The attended. Returns: preprocessed_attended – The preprocessed attended. Return type: Variable
- take_glimpses(attended, preprocessed_attended=None, attended_mask=None, **kwargs)¶
Extract glimpses from the attended given the current states.
Stage 2 of the attention mechanism, see AbstractAttention for an explanation of stages. If preprocessed_attended is not given, should trigger the stage 1.
This application method must declare its inputs and outputs. The glimpses to be carried over are identified by their presence in both inputs and outputs list. The attended must be the first input, the preprocessed attended must be the second one.
Parameters: - attended (Variable) – The attended.
- preprocessed_attended (Variable, optional) – The preprocessed attended computed by preprocess(). When not given, preprocess() should be called.
- attended_mask (Variable, optional) – The mask for the attended. This is required in the case of padded structured output, e.g. when a number of sequences are forced to be the same length. The mask identifies the positions of the attended that actually contain information.
- **kwargs (dict) – Includes the states and the glimpses to be carried over from the previous step in the case when the attention mechanism is applied sequentially.
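The contract between the two stages can be illustrated with a toy class. This is a hypothetical sketch, not a Blocks brick: the class ToyAttention, its scale parameter, the call counter and the choice of returning the single best-matching unmasked item as the glimpse are all assumptions made for illustration.

```python
class ToyAttention:
    """Hypothetical attention object following the two-stage interface."""

    def __init__(self, scale):
        self.scale = scale
        self.preprocess_calls = 0  # only here to make stage 1 observable

    def preprocess(self, attended):
        # Stage 1: work that does not depend on the states.
        self.preprocess_calls += 1
        return [[self.scale * value for value in item] for item in attended]

    def take_glimpses(self, attended, preprocessed_attended=None,
                      attended_mask=None, **kwargs):
        # Stage 2: trigger stage 1 only if its result was not supplied.
        if preprocessed_attended is None:
            preprocessed_attended = self.preprocess(attended)
        if attended_mask is None:
            attended_mask = [1] * len(attended)
        state = kwargs['states']
        scores = [sum(s * p for s, p in zip(state, item))
                  for item in preprocessed_attended]
        # Only positions that actually contain information are candidates.
        valid = [i for i, keep in enumerate(attended_mask) if keep]
        best = max(valid, key=lambda i: scores[i])
        return attended[best]


attention = ToyAttention(scale=2.0)
attended = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
mask = [1, 1, 0]  # the last position is padding

# Preprocess once, then reuse the result for several states.
cached = attention.preprocess(attended)
first = attention.take_glimpses(attended, cached, mask, states=[0.0, 1.0])
second = attention.take_glimpses(attended, cached, mask, states=[1.0, 0.0])
```

Because the preprocessed attended is passed in, stage 1 runs once even though glimpses are taken for two different states.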
- class blocks.bricks.attention.AbstractAttentionRecurrent(name=None)¶
Bases: blocks.bricks.recurrent.BaseRecurrent
The interface for attention-equipped recurrent transitions.
When a recurrent network is equipped with an attention mechanism, its transition typically consists of two steps: (1) the glimpses are taken by the attention mechanism and (2) the next states are computed using the current states and the glimpses. Certain use cases (such as the sequence generator) require that, apart from a do-it-all recurrent application method, separate interfaces for the first and second steps of the transition are provided.
- apply(**kwargs)¶
Compute next states taking glimpses on the way.
- compute_states(**kwargs)¶
Compute next states given current states and glimpses.
- take_glimpses(**kwargs)¶
Compute glimpses given the current states.
- class blocks.bricks.attention.AttentionRecurrent(transition, attention, distribute=None, add_contexts=True, attended_name=None, attended_mask_name=None, **kwargs)¶
Bases: blocks.bricks.attention.AbstractAttentionRecurrent, blocks.bricks.Initializable
Combines an attention mechanism and a recurrent transition.
This brick equips a recurrent transition with an attention mechanism. In order to do this two more contexts are added: one to be attended and a mask for it. It is also possible to use the contexts of the given recurrent transition for these purposes and not add any new ones, see the add_contexts parameter.
At the beginning of each step the attention mechanism produces glimpses; these glimpses together with the current states are used to compute the next state and finish the transition. In some cases glimpses from the previous steps are also necessary for the attention mechanism, e.g. in order to focus on an area close to the one from the previous step. This is also supported: such glimpses become states of the new transition.
To let the user control the way glimpses are used, this brick also takes a “distribute” brick as parameter that distributes the information from glimpses across the sequential inputs of the wrapped recurrent transition.
Parameters: - transition (BaseRecurrent) – The recurrent transition.
- attention (Brick) – The attention mechanism.
- distribute (Brick, optional) – Distributes the information from glimpses across the input sequences of the transition. By default a Distribute is used, and those inputs containing the “mask” substring in their name are not affected.
- add_contexts (bool, optional) – If True, new contexts for the attended and the attended mask are added to this transition, otherwise existing contexts of the wrapped transition are used. True by default.
- attended_name (str) – The name of the attended context. If None, “attended” or the first context of the recurrent transition is used depending on the value of the add_contexts flag.
- attended_mask_name (str) – The name of the mask for the attended context. If None, “attended_mask” or the second context of the recurrent transition is used depending on the value of the add_contexts flag.
Notes
See Initializable for initialization parameters.
Wrapping your recurrent brick with this class makes all the states mandatory. If you feel this is a limitation for you, try to make it better! This restriction does not apply to sequences and contexts: those keep being as optional as they were for your brick.
Those coming to Blocks from Groundhog might recognize that this is a RecurrentLayerWithSearch, but on steroids :)
- apply¶
Preprocess a sequence attending the attended context at every step.
Preprocesses the attended context and runs do_apply(). See do_apply() documentation for further information.
- apply_contexts()¶
- apply_delegate()¶
- compute_states¶
Compute current states when glimpses have already been computed.
Combines an application of the distribute brick, which alters the sequential inputs of the wrapped transition, and an application of the wrapped transition. All unknown keyword arguments go to the wrapped transition.
Parameters: **kwargs – Should contain everything that self.transition needs and, in addition, the current glimpses.
Returns: current_states – Current states computed by self.transition.
Return type: list of TensorVariable
- compute_states_outputs()¶
- do_apply¶
Process a sequence attending the attended context every step.
In addition to the original sequence this method also requires its preprocessed version, the one computed by the preprocess method of the attention mechanism. Unknown keyword arguments are passed to the wrapped transition.
Parameters: **kwargs – Should contain current inputs, previous step states, contexts, the preprocessed attended context, previous step glimpses.
Returns: outputs – The current step states and glimpses.
Return type: list of TensorVariable
- do_apply_contexts()¶
- do_apply_outputs()¶
- do_apply_sequences()¶
- do_apply_states()¶
- get_dim(name)¶
- initial_states¶
- initial_states_outputs()¶
- take_glimpses¶
Compute glimpses with the attention mechanism.
A thin wrapper over self.attention.take_glimpses: takes care of choosing and renaming the necessary arguments.
Parameters: **kwargs – Must contain the attended, previous step states and glimpses. Can optionally contain the attended mask and the preprocessed attended.
Returns: glimpses – Current step glimpses.
Return type: list of TensorVariable
- take_glimpses_outputs()¶
- class blocks.bricks.attention.GenericSequenceAttention(*args, **kwargs)¶
Bases: blocks.bricks.attention.AbstractAttention
Logic common for sequence attention mechanisms.
- compute_weighted_averages¶
Compute weighted averages of the attended sequence vectors.
Parameters: - weights (Variable) – The weights. The shape must be equal to the attended shape without the last dimension.
- attended (Variable) – The attended. The index in the sequence must be the first dimension.
Returns: weighted_averages – The weighted averages of the attended elements. The shape is equal to the attended shape with the first dimension dropped.
Return type: Variable
- compute_weights¶
Compute weights from energies in softmax-like fashion.
Todo
Use Softmax.
Parameters: - energies (Variable) – The energies. Must be of the same shape as the mask.
- attended_mask (Variable) – The mask for the attended. The index in the sequence must be the first dimension.
Returns: weights – Non-negative weights that sum to 1, of the same shape as energies.
Return type: Variable
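The normalization described for compute_weights can be illustrated with a minimal pure-Python sketch for a single sequence. This is not the Blocks implementation (which operates on Theano variables); the function name masked_softmax and the convention that a mask value of 0 marks a position to ignore are assumptions of this example.

```python
import math


def masked_softmax(energies, mask):
    """Normalize energies over time, giving zero weight to masked positions.

    Hypothetical helper: `energies` and `mask` are plain lists indexed by
    position in the sequence; a mask value of 0 marks a padded position.
    """
    # Subtract the largest unmasked energy for numerical stability.
    top = max(e for e, m in zip(energies, mask) if m)
    unnormalized = [math.exp(e - top) if m else 0.0
                    for e, m in zip(energies, mask)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# The last position is padding, so its large energy is ignored.
weights = masked_softmax([1.0, 2.0, 3.0, 100.0], [1, 1, 1, 0])
```

The resulting weights are non-negative, sum to one, and are zero wherever the mask is zero.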
- class blocks.bricks.attention.SequenceContentAttention(*args, **kwargs)¶
Bases: blocks.bricks.attention.GenericSequenceAttention, blocks.bricks.Initializable
Attention mechanism that looks for relevant content in a sequence.
This is the attention mechanism used in [BCB]. The idea in a nutshell:
- The states and the sequence are transformed independently,
- The transformed states are summed with every transformed sequence element to obtain match vectors,
- A match vector is transformed into a single number interpreted as energy,
- Energies are normalized in softmax-like fashion. The resulting weights, which sum to one, are called attention weights,
- Weighted average of the sequence elements with attention weights is computed.
In terms of the AbstractAttention documentation, the sequence is the attended. The weighted averages from 5 and the attention weights from 4 form the set of glimpses produced by this attention mechanism.
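The five steps listed above can be sketched in plain Python for a single state vector and one unmasked sequence. This is only an illustration under simplifying assumptions: the helper names (matvec, content_attention), the use of bias-free linear maps for both transformations, and the tanh-then-weighted-sum energy are choices made for this example, not the library's code.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def content_attention(state, sequence, w_state, w_seq, v):
    """Hypothetical single-example content attention; returns glimpses."""
    # 1. Transform the state and every sequence element independently.
    t_state = matvec(w_state, state)
    t_seq = [matvec(w_seq, element) for element in sequence]
    # 2. Sum the transformed state with each transformed element.
    matches = [[a + b for a, b in zip(t_state, t)] for t in t_seq]
    # 3. Turn each match vector into one number (tanh, then weighted sum).
    energies = [sum(vi * math.tanh(m) for vi, m in zip(v, match))
                for match in matches]
    # 4. Normalize the energies in softmax-like fashion.
    top = max(energies)
    exps = [math.exp(e - top) for e in energies]
    total = sum(exps)
    weights = [x / total for x in exps]
    # 5. Weighted average of the original sequence elements.
    dim = len(sequence[0])
    averaged = [sum(w * element[d] for w, element in zip(weights, sequence))
                for d in range(dim)]
    return averaged, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
averaged, weights = content_attention(
    [1.0, 0.0], sequence, identity, identity, [1.0, 1.0])
```

With these toy values the third sequence element receives the largest attention weight, because its match vector has the largest entries.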
Parameters: - state_names (list of str) – The names of the network states.
- attended_dim (int) – The dimension of the sequence elements.
- match_dim (int) – The dimension of the match vector.
- state_transformer (Brick) – A prototype for state transformations. If None, a linear transformation is used.
- attended_transformer (Feedforward) – The transformation to be applied to the sequence. If None an affine transformation is used.
- energy_computer (Feedforward) – Computes energy from the match vector. If None, an affine transformation preceded by \(tanh\) is used.
Notes
See Initializable for initialization parameters.
[BCB] Dzmitry Bahdanau, Kyunghyun Cho and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate.
- compute_energies¶
- get_dim(name)¶
- initial_glimpses¶
- preprocess¶
Preprocess the sequence for computing attention weights.
Parameters: attended (TensorVariable) – The attended sequence, time is the first dimension.
- take_glimpses¶
Compute attention weights and produce glimpses.
Parameters: - attended (TensorVariable) – The sequence, time is the first dimension.
- preprocessed_attended (TensorVariable) – The preprocessed sequence. If None, is computed by calling preprocess().
- attended_mask (TensorVariable) – A 0/1 mask specifying available data. 0 means that the corresponding sequence element is fake.
- **states – The states of the network.
Returns: - weighted_averages (Variable) – Linear combinations of sequence elements with the attention weights.
- weights (Variable) – The attention weights. The first dimension is batch, the second is time.
- take_glimpses_inputs()¶
- class blocks.bricks.attention.ShallowEnergyComputer(*args, **kwargs)¶
Bases: blocks.bricks.Sequence, blocks.bricks.Initializable, blocks.bricks.Feedforward
A simple energy computer: first tanh, then weighted sum.
- input_dim¶
- output_dim¶
Sequence generators¶
Recurrent networks are often used to generate/model sequences. Examples include language modelling, machine translation, handwriting synthesis, etc. A typical pattern in this context is that sequence elements are generated one after another, and every generated element is fed back into the recurrent network state. Sometimes an attention mechanism is also used to condition sequence generation on some structured input like another sequence or an image.
This module provides SequenceGenerator that builds a sequence generating network from three main components:
- a core recurrent transition, e.g. LSTM or GatedRecurrent
- a readout component that can produce sequence elements using the network state and the information from the attention mechanism
- an attention mechanism (see attention for more information)
Implementation-wise SequenceGenerator fully relies on BaseSequenceGenerator. At the level of the latter an attention is mandatory, moreover it must be a part of the recurrent transition (see AttentionRecurrent). To simulate optional attention, SequenceGenerator wraps the pure recurrent network in FakeAttentionRecurrent.
- class blocks.bricks.sequence_generators.AbstractEmitter(name=None)¶
Bases: blocks.bricks.base.Brick
The interface for the emitter component of a readout.
- class blocks.bricks.sequence_generators.AbstractFeedback(name=None)¶
Bases: blocks.bricks.base.Brick
The interface for the feedback component of a readout.
- class blocks.bricks.sequence_generators.AbstractReadout(*args, **kwargs)¶
Bases: blocks.bricks.Initializable
The interface for the readout component of a sequence generator.
The readout component of a sequence generator is a bridge between the core recurrent network and the output sequence.
Parameters: - source_names¶
list
- readout_dim¶
int
See also
- BaseSequenceGenerator – see how exactly a readout is used
- Readout – the typically used readout brick
- cost(readouts, outputs)¶
Compute generation cost of outputs given readouts.
Parameters: - readouts (Variable) – Readouts produced by the readout() method of a (..., readout dim) shape.
- outputs (Variable) – Outputs whose cost should be computed. Should have as many or one less dimensions compared to readout. If readout has n dimensions, first n - 1 dimensions of outputs should match with those of readouts.
- emit(readouts)¶
Produce outputs from readouts.
Parameters: readouts (Variable) – Readouts produced by the readout() method of a (batch_size, readout_dim) shape.
- feedback(outputs)¶
Feeds outputs back to be used as inputs of the transition.
- initial_outputs(batch_size)¶
Compute initial outputs for the generator’s first step.
In the notation from the BaseSequenceGenerator documentation this method should compute \(y_0\).
- class blocks.bricks.sequence_generators.BaseSequenceGenerator(*args, **kwargs)¶
Bases: blocks.bricks.Initializable
A generic sequence generator.
This class combines two components, a readout network and an attention-equipped recurrent transition, into a context-dependent sequence generator. A third component must also be given, which forks feedback from the readout network to obtain inputs for the transition.
The class provides two methods: generate() and cost(). The former is to actually generate sequences and the latter is to compute the cost of generating given sequences.
The generation algorithm description follows.
Definitions and notation:
- States \(s_i\) of the generator are the states of the transition as specified in transition.state_names.
- Contexts of the generator are the contexts of the transition as specified in transition.context_names.
- Glimpses \(g_i\) are intermediate entities computed at every generation step from states, contexts and the previous step glimpses. They are computed in the transition’s apply method when not given or by explicitly calling the transition’s take_glimpses method. The set of glimpses considered is specified in transition.glimpse_names.
- Outputs \(y_i\) are produced at every step and form the output sequence. A generation cost \(c_i\) is assigned to each output.
Algorithm:
1. Initialization.
\[\begin{split}y_0 = readout.initial\_outputs(contexts)\\ s_0, g_0 = transition.initial\_states(contexts)\\ i = 1\\\end{split}\]
By default all recurrent bricks from recurrent have trainable initial states initialized with zeros. Subclass them or BaseRecurrent directly to get custom initial states.
2. New glimpses are computed:
\[g_i = transition.take\_glimpses( s_{i-1}, g_{i-1}, contexts)\]
3. A new output is generated by the readout and its cost is computed:
\[\begin{split}f_{i-1} = readout.feedback(y_{i-1}) \\ r_i = readout.readout(f_{i-1}, s_{i-1}, g_i, contexts) \\ y_i = readout.emit(r_i) \\ c_i = readout.cost(r_i, y_i)\end{split}\]
Note that the new glimpses and the old states are used at this step. The reason for not merging all readout methods into one is to make an efficient implementation of cost() possible.
4. New states are computed and iteration is done:
\[\begin{split}f_i = readout.feedback(y_i) \\ s_i = transition.compute\_states(s_{i-1}, g_i, fork.apply(f_i), contexts) \\ i = i + 1\end{split}\]
Back to step 2 if the desired sequence length has not yet been reached.
[Figure: a scheme of the algorithm described above.]
Parameters: - readout (instance of AbstractReadout) – The readout component of the sequence generator.
- transition (instance of AbstractAttentionRecurrent) – The transition component of the sequence generator.
- fork (Brick) – The brick to compute the transition’s inputs from the feedback.
See also
- Initializable – for initialization parameters
- SequenceGenerator – a more user-friendly interface to this brick
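The generation algorithm of BaseSequenceGenerator can be traced with a small pure-Python loop. The classes ToyReadout and ToyTransition and the fork function below are hypothetical scalar stand-ins invented for this sketch; only the order of the calls follows the algorithm description, and none of this is the Theano-based implementation.

```python
class ToyReadout:
    """Hypothetical readout working on plain numbers."""

    def initial_outputs(self, contexts):
        return 0

    def feedback(self, output):
        return float(output)

    def readout(self, feedback, states, glimpses, contexts):
        return feedback + states + glimpses

    def emit(self, readout):
        return int(readout) % 3

    def cost(self, readout, output):
        return abs(readout - output)


class ToyTransition:
    """Hypothetical attention-equipped transition with scalar states."""

    def initial_states(self, contexts):
        return 0.0, 0.0  # s_0, g_0

    def take_glimpses(self, states, glimpses, contexts):
        return sum(contexts) / len(contexts)

    def compute_states(self, states, glimpses, inputs, contexts):
        return states + glimpses + inputs


def fork(feedback):
    """Hypothetical fork: maps feedback to the transition's input."""
    return 2 * feedback


def generate(readout, transition, contexts, n_steps):
    y = readout.initial_outputs(contexts)               # y_0
    s, g = transition.initial_states(contexts)          # s_0, g_0
    outputs, costs = [], []
    for _ in range(n_steps):
        g = transition.take_glimpses(s, g, contexts)    # g_i
        f_prev = readout.feedback(y)                    # f_{i-1}
        r = readout.readout(f_prev, s, g, contexts)     # r_i
        y = readout.emit(r)                             # y_i
        costs.append(readout.cost(r, y))                # c_i
        f = readout.feedback(y)                         # f_i
        s = transition.compute_states(s, g, fork(f), contexts)  # s_i
        outputs.append(y)
    return outputs, costs


outputs, costs = generate(ToyReadout(), ToyTransition(), [1.0, 3.0], 3)
```

Note how the readout at each step sees the new glimpses but the old states, as stated in the algorithm description.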
- cost¶
Returns the average cost over the minibatch.
The cost is computed by averaging the sum of per token costs for each sequence over the minibatch.
Warning
Note that the computed cost can be problematic when batches consist of vastly different sequence lengths.
Parameters: - outputs (TensorVariable) – The 3-dimensional (or 2-dimensional) tensor containing output sequences. Axis 0 must stand for time and axis 1 for the position in the batch.
- mask (TensorVariable) – The binary matrix identifying fake outputs.
Returns: cost – Theano variable for cost, computed by summing over timesteps and then averaging over the minibatch.
Return type: Variable
Notes
The contexts are expected as keyword arguments.
Adds average cost per sequence element AUXILIARY variable to the computational graph with name per_sequence_element.
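A minimal sketch of the averaging described for cost(), using nested lists instead of Theano tensors. The function name sequence_costs and the mask convention (1 for a real output, 0 for a fake one) are assumptions of this example.

```python
def sequence_costs(token_costs, mask):
    """Return (cost averaged over the batch, cost per sequence element).

    Hypothetical helper: token_costs[t][b] and mask[t][b] are indexed by
    time step t (axis 0) and position in the batch b (axis 1).
    """
    n_steps = len(token_costs)
    batch_size = len(token_costs[0])
    # Sum the per-token costs of each sequence, skipping masked positions.
    per_sequence = [
        sum(token_costs[t][b] * mask[t][b] for t in range(n_steps))
        for b in range(batch_size)
    ]
    batch_cost = sum(per_sequence) / batch_size
    # Average cost per real (unmasked) sequence element.
    n_elements = sum(sum(row) for row in mask)
    per_element = sum(per_sequence) / n_elements
    return batch_cost, per_element


token_costs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = [[1, 1], [1, 1], [1, 0]]  # the second sequence has only two steps
batch_cost, per_element = sequence_costs(token_costs, mask)
```

Because the per-sequence sums grow with sequence length, long sequences dominate the batch average; this is the effect the warning above refers to, and the per-element average is less sensitive to it.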
- generate¶
A sequence generation step.
Parameters: outputs (TensorVariable) – The outputs from the previous step.
Notes
The contexts, previous states and glimpses are expected as keyword arguments.
- generate_delegate()¶
- generate_outputs()¶
- generate_states()¶
- get_dim(name)¶
- initial_states¶
- initial_states_outputs()¶
- class blocks.bricks.sequence_generators.FakeAttentionRecurrent(transition, **kwargs)¶
Bases: blocks.bricks.attention.AbstractAttentionRecurrent, blocks.bricks.Initializable
Adds fake attention interface to a transition.
BaseSequenceGenerator requires its transition brick to support AbstractAttentionRecurrent interface, that is to have an embedded attention mechanism. For the cases when no attention is required (e.g. language modeling or encoder-decoder models), FakeAttentionRecurrent is used to wrap a usual recurrent brick. The resulting brick has no glimpses and simply passes all states and contexts to the wrapped one.
Todo
Get rid of this brick and support attention-less transitions in BaseSequenceGenerator.
- apply¶
- apply_delegate()¶
- compute_states¶
- compute_states_delegate()¶
- get_dim(name)¶
- initial_states¶
- initial_states_outputs()¶
- take_glimpses¶
- class blocks.bricks.sequence_generators.LookupFeedback(num_outputs=None, feedback_dim=None, **kwargs)¶
Bases: blocks.bricks.sequence_generators.AbstractFeedback, blocks.bricks.Initializable
A feedback brick for the case when the readouts are integers.
Stores and retrieves distributed representations of integers.
- feedback¶
- get_dim(name)¶
- class blocks.bricks.sequence_generators.Readout(emitter=None, feedback_brick=None, merge=None, merge_prototype=None, post_merge=None, merged_dim=None, **kwargs)¶
Bases: blocks.bricks.sequence_generators.AbstractReadout
Readout brick with separated emitter and feedback parts.
Readout combines a few bits and pieces into an object that can be used as the readout component in BaseSequenceGenerator. This includes an emitter brick, to which emit(), cost() and initial_outputs() calls are delegated, a feedback brick to which feedback() functionality is delegated, and a pipeline to actually compute readouts from all the sources (see the source_names attribute of AbstractReadout).
The readout computation pipeline is constructed from the merge and post_merge bricks, whose responsibilities are described in the respective docstrings.
Parameters: - emitter (an instance of AbstractEmitter) – The emitter component.
- feedback_brick (an instance of AbstractFeedback) – The feedback component.
- merge (Brick, optional) – A brick that takes the sources given in source_names as an input and combines them into a single output. If given, merge_prototype cannot be given.
- merge_prototype (Feedforward, optional) – If merge isn’t given, the transformation given by merge_prototype is applied to each input before being summed. By default a Linear transformation without biases is used. If given, merge cannot be given.
- post_merge (Feedforward, optional) – This transformation is applied to the merged inputs. By default Bias is used.
- merged_dim (int, optional) – The input dimension of post_merge i.e. the output dimension of merge (or merge_prototype). If not given, it is assumed to be the same as readout_dim (i.e. post_merge is assumed to not change dimensions).
- **kwargs (dict) – Passed to the parent’s constructor.
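The default readout pipeline (a bias-free linear transformation per source, a sum, then a bias) can be sketched as follows. The function names and the dictionary-based interface are assumptions made for this illustration; the real brick works on Theano variables.

```python
def linear(matrix, vector):
    """Bias-free linear transformation (matrix given as a list of rows)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def compute_readout(sources, weights, bias):
    """Hypothetical merge + post_merge pipeline for a single example."""
    merged = [0.0] * len(bias)
    # Merge: transform every source separately, then sum the results.
    for name, value in sources.items():
        transformed = linear(weights[name], value)
        merged = [m + t for m, t in zip(merged, transformed)]
    # Post-merge: add a bias; the dimension is left unchanged.
    return [m + b for m, b in zip(merged, bias)]


sources = {"states": [1.0, 2.0], "feedback": [3.0]}
weights = {
    "states": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "feedback": [[1.0], [2.0], [3.0]],
}
readouts = compute_readout(sources, weights, [0.5, 0.5, 0.5])
```

Each source may have its own dimension; only the output dimension of the per-source transformations has to agree, which corresponds to merged_dim.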
- cost¶
- emit¶
- feedback¶
- get_dim(name)¶
- initial_outputs¶
- readout¶
- class blocks.bricks.sequence_generators.SequenceGenerator(readout, transition, attention=None, add_contexts=True, **kwargs)¶
Bases: blocks.bricks.sequence_generators.BaseSequenceGenerator
A more user-friendly interface for BaseSequenceGenerator.
Parameters: - readout (instance of AbstractReadout) – The readout component for the sequence generator.
- transition (instance of BaseRecurrent) – The recurrent transition to be used in the sequence generator. Will be combined with attention, if that one is given.
- attention (object, optional) – The attention mechanism to be added to transition, an instance of AbstractAttention.
- add_contexts (bool) – If True, the AttentionRecurrent wrapping the transition will add additional contexts for the attended and its mask.
- **kwargs (dict) – All keyword arguments are passed to the base class. If the fork keyword argument is not provided, a Fork is created that forks all transition sequential inputs without a “mask” substring in them.
- class blocks.bricks.sequence_generators.SoftmaxEmitter(initial_output=0, **kwargs)¶
Bases: blocks.bricks.sequence_generators.AbstractEmitter, blocks.bricks.Initializable, blocks.bricks.Random
A softmax emitter for the case of integer outputs.
Interprets readout elements as energies corresponding to their indices.
Parameters: initial_output (int or a scalar Variable) – The initial output.
- cost¶
- emit¶
- get_dim(name)¶
- initial_outputs¶
- probs¶
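To make the role of a softmax emitter concrete, here is a hedged pure-Python sketch for a single readout vector. It assumes that a higher energy means a higher probability and that the cost is the negative log-probability of the emitted index; the function names mirror the method names above but are otherwise hypothetical.

```python
import math
import random


def probs(readout):
    """Softmax over the readout elements, treated as energies."""
    top = max(readout)
    exps = [math.exp(e - top) for e in readout]
    total = sum(exps)
    return [x / total for x in exps]


def cost(readout, output):
    """Negative log-probability of the integer `output`."""
    return -math.log(probs(readout)[output])


def emit(readout, rng):
    """Sample an integer output according to the softmax probabilities."""
    indices = range(len(readout))
    return rng.choices(indices, weights=probs(readout))[0]


rng = random.Random(0)
sample = emit([0.0, 1.0, 2.0], rng)
uniform_cost = cost([0.0, 0.0], 0)
```

For two equal energies the cost of either output is log 2, as expected for a uniform distribution over two classes.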
- class blocks.bricks.sequence_generators.TrivialEmitter(*args, **kwargs)¶
Bases: blocks.bricks.sequence_generators.AbstractEmitter
An emitter for the trivial case when readouts are outputs.
Parameters: readout_dim (int) – The dimension of the readout.
Notes
By default cost() always returns a zero tensor.
- cost¶
- emit¶
- get_dim(name)¶
- initial_outputs¶
- class blocks.bricks.sequence_generators.TrivialFeedback(*args, **kwargs)¶
Bases: blocks.bricks.sequence_generators.AbstractFeedback
A feedback brick for the case when the readouts are outputs.
- feedback¶
- get_dim(name)¶
Cost bricks¶
- class blocks.bricks.cost.AbsoluteError(name=None)¶
Bases: blocks.bricks.cost.CostMatrix
- cost_matrix¶
- class blocks.bricks.cost.BinaryCrossEntropy(name=None)¶
Bases: blocks.bricks.cost.CostMatrix
- cost_matrix¶
- class blocks.bricks.cost.CategoricalCrossEntropy(name=None)¶
Bases: blocks.bricks.cost.Cost
- apply¶
- class blocks.bricks.cost.Cost(name=None)¶
Bases: blocks.bricks.base.Brick
- apply¶
- class blocks.bricks.cost.CostMatrix(name=None)¶
Bases: blocks.bricks.cost.Cost
Base class for costs which can be calculated element-wise.
Assumes that the data has format (batch, features).
- apply¶
- cost_matrix¶
- class blocks.bricks.cost.MisclassificationRate(top_k=1)¶
Bases: blocks.bricks.cost.Cost
Calculates the misclassification rate for a mini-batch.
Parameters: top_k (int, optional) – If the ground truth class is within the top_k highest responses for a given example, the model is considered to have predicted correctly. Default: 1.
Notes
Ties for top_k-th place are broken pessimistically, i.e. in the (in practice, rare) case that there is a tie for top_k-th highest output for a given example, it is considered an incorrect prediction.
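The pessimistic tie-breaking rule can be illustrated with a short sketch in plain Python. The function below is a hypothetical reimplementation for explanation only: a prediction counts as correct when fewer than top_k other classes score at least as high as the true class, so any tie counts against the model.

```python
def misclassification_rate(targets, scores, top_k=1):
    """Fraction of examples whose true class is not within the top_k.

    Hypothetical helper: `targets` holds class indices and `scores`
    holds one list of class responses per example.
    """
    errors = 0
    for target, row in zip(targets, scores):
        # Count the other classes that score at least as high as the
        # true class; ties are counted, which makes the rule pessimistic.
        competitors = sum(1 for j, s in enumerate(row)
                          if j != target and s >= row[target])
        if competitors >= top_k:
            errors += 1
    return errors / len(targets)


targets = [0, 1, 2]
scores = [
    [0.9, 0.05, 0.05],  # clearly correct
    [0.4, 0.4, 0.2],    # true class tied for first place
    [0.5, 0.3, 0.2],    # true class ranked last
]
top1 = misclassification_rate(targets, scores, top_k=1)
top2 = misclassification_rate(targets, scores, top_k=2)
```

With top_k=1 the tied second example counts as an error; with top_k=2 it counts as correct, since both tied classes fit within the top two.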
- apply¶
- class blocks.bricks.cost.SquaredError(name=None)¶
Bases: blocks.bricks.cost.CostMatrix
- cost_matrix¶
Wrapper bricks¶
- class blocks.bricks.wrappers.BrickWrapper¶
Bases: object
Base class for wrapper metaclasses.
Sometimes one wants to extend a brick with the capability to handle inputs different from what it was designed to handle. A typical example is inputs with more dimensions than were foreseen at the development stage. One way to proceed in such a situation is to write a decorator that wraps all application methods of the brick class with some additional logic before and after the application call. BrickWrapper serves as a convenient base class for such decorators.
Note that since directly applying a decorator to a Brick subclass will only take place after __new__() is called, subclasses of BrickWrapper should be applied by setting the decorators attribute of the new brick class, like in the example below:
>>> from blocks.bricks.base import Brick
>>> from blocks.bricks.wrappers import WithExtraDims
>>> class WrappedBrick(Brick):
...     decorators = [WithExtraDims()]
- wrap(wrapped, namespace)¶
Wrap an application of the base brick.
This method should be overridden to write into its namespace argument all required changes.
Parameters: - mcs (type) – The metaclass.
- wrapped (Application) – The application to be wrapped.
- namespace (dict) – The namespace of the class being created.
- class blocks.bricks.wrappers.WithExtraDims¶
Bases: blocks.bricks.wrappers.BrickWrapper
Wraps a brick’s applications to handle inputs with extra dimensions.
A brick can often be reused even when data has more dimensions than in the default setting. An example is a situation when one wants to apply categorical_cross_entropy() to temporal data, that is when an additional ‘time’ axis is prepended to both its x and y inputs.
This wrapper adds reshapes required to use application methods of a brick with such data by merging the extra dimensions with the first non-extra one. Two key assumptions are made: that all inputs and outputs have the same number of extra dimensions and that these extra dimensions are equal throughout all inputs and outputs.
While this might be inconvenient, the wrapped brick does not try to guess the number of extra dimensions, but demands it as an argument. Considerations of simplicity and reliability motivated this design choice. Once Blocks provides a mechanism to request the expected number of dimensions for an input of a brick, this can be reconsidered.
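The reshaping idea behind WithExtraDims can be shown with nested lists. This is a sketch under stated assumptions, not the wrapper itself: apply_with_extra_dims and row_sums are hypothetical names, and the number of extra dimensions is passed explicitly, mirroring the design choice described above.

```python
def apply_with_extra_dims(fn, data, extra_ndim):
    """Apply `fn`, which expects (batch, ...) data, to data with
    `extra_ndim` additional leading dimensions."""
    lengths = []
    flat = data
    for _ in range(extra_ndim):
        # Remember the current leading length, then merge the leading
        # dimension into the next one.
        lengths.append(len(flat))
        flat = [row for block in flat for row in block]
    result = fn(flat)
    # Undo the merges in reverse order to restore the extra dimensions.
    for size in reversed(lengths):
        step = len(result) // size
        result = [result[i * step:(i + 1) * step] for i in range(size)]
    return result


def row_sums(batch):
    """A toy 'application' designed for (batch, features) input."""
    return [sum(row) for row in batch]


# Data of shape (time=3, batch=2, features=2).
data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
sums = apply_with_extra_dims(row_sums, data, extra_ndim=1)
```

The output keeps the extra 'time' axis, so `sums` has shape (time=3, batch=2) even though row_sums knows nothing about time.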
- wrap(wrapped, namespace)¶
Extensions¶
- class blocks.extensions.CallbackName¶
Bases: str
A name of a TrainingExtension callback.
Raises: TypeError – On comparison with a string which is not the name of a TrainingExtension callback.
- class blocks.extensions.FinishAfter(**kwargs)¶
Bases: blocks.extensions.SimpleExtension
Finishes the training process when triggered.
- do(which_callback, *args)¶
- class blocks.extensions.Printing(**kwargs)¶
Bases: blocks.extensions.SimpleExtension
Prints log messages to the screen.
- do(which_callback, *args)¶
- class blocks.extensions.ProgressBar(**kwargs)¶
Bases: blocks.extensions.TrainingExtension
Display a progress bar during training.
This extension tries to infer the number of iterations per epoch by querying the num_batches, num_examples and batch_size attributes from the IterationScheme. When this information is not available it will display a simplified progress bar that does not include the estimated time until the end of this epoch.
Notes
This extension should be run before other extensions that print to the screen at the end or at the beginning of the epoch (e.g. the Printing extension). Placing ProgressBar before these extensions will ensure you won’t get intermingled output on your terminal.
- after_epoch()¶
- before_batch(batch)¶
- before_epoch()¶
- create_bar()¶
Create a new progress bar.
Calls self.get_iter_per_epoch(), selects an appropriate set of widgets and creates a ProgressBar.
- get_iter_per_epoch()¶
Try to infer the number of iterations per epoch.
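One plausible way to infer the number of iterations per epoch from the attributes mentioned above is sketched here. The function name, the fallback order and the choice to round up when the last batch is incomplete are all assumptions of this example; the actual extension may behave differently.

```python
import math
from types import SimpleNamespace


def infer_iter_per_epoch(scheme):
    """Return the number of iterations per epoch, or None if unknown."""
    num_batches = getattr(scheme, "num_batches", None)
    if num_batches is not None:
        return num_batches
    num_examples = getattr(scheme, "num_examples", None)
    batch_size = getattr(scheme, "batch_size", None)
    if num_examples is not None and batch_size:
        # Assumption: an incomplete final batch still counts as a batch.
        return math.ceil(num_examples / batch_size)
    return None  # fall back to a simplified progress bar


# Toy iteration schemes standing in for real IterationScheme objects.
known = infer_iter_per_epoch(SimpleNamespace(num_batches=10))
derived = infer_iter_per_epoch(SimpleNamespace(num_examples=105, batch_size=10))
unknown = infer_iter_per_epoch(SimpleNamespace())
```

When the function returns None, no estimate of the remaining time can be shown, which matches the simplified progress bar described above.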
- class blocks.extensions.SimpleExtension(**kwargs)¶
Bases: blocks.extensions.TrainingExtension
A base class for simple extensions.
All logic of simple extensions is concentrated in the method do(). This method is called when certain conditions are fulfilled. The user can manage the conditions by calling the add_condition method and by passing arguments to the constructor. In addition to specifying when do() is called, it is possible to specify additional arguments passed to do() under different conditions.
Parameters: - before_training (bool) – If True, do() is invoked before training.
- before_first_epoch (bool) – If True, do() is invoked before the first epoch.
- before_epoch (bool) – If True, do() is invoked before every epoch.
- on_resumption (bool, optional) – If True, do() is invoked when training is resumed.
- on_interrupt (bool, optional) – If True, do() is invoked when training is interrupted.
- after_epoch (bool) – If True, do() is invoked after every epoch.
- after_batch (bool) – If True, do() is invoked after every batch.
- after_training (bool) – If True, do() is invoked after training.
- after_n_epochs (int, optional) – If not None, do() is invoked when after_n_epochs epochs are done.
- every_n_epochs (int, optional) – If not None, do() is invoked after every n-th epoch.
- after_n_batches (int, optional) – If not None, do() is invoked when after_n_batches batches are processed.
- every_n_batches (int, optional) – If not None, do() is invoked after every n-th batch.
- BOOLEAN_TRIGGERS = frozenset(['after_batch', 'after_training', 'before_epoch', 'before_training', 'before_first_epoch', 'after_epoch', 'on_interrupt', 'on_resumption'])¶
- INTEGER_TRIGGERS = frozenset(['every_n_batches', 'after_n_epochs', 'every_n_epochs', 'after_n_batches'])¶
- add_condition(callbacks_names, predicate=None, arguments=None)¶
Adds a condition under which a do() is called.
Parameters: - callbacks_names (list of str) – The names of the callbacks in which the method is called.
- predicate (function) – A predicate function taking the main loop’s log as its single parameter and returning True when the method should be called and False when it should not. If None, an always True predicate is used.
- arguments (iterable) – Additional arguments to be passed to do(). They will be concatenated with the ones passed from the main loop (e.g. the batch in case of the after_batch callback).
Returns: The extension object (allows chaining calls).
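The interplay of add_condition(), dispatch() and do() can be sketched with a toy class. ToyExtension is a hypothetical stand-in written for this illustration: it receives the log as an explicit dispatch() argument (the real extension reads it from the main loop), and it assumes that main-loop arguments come before user arguments. The "iterations_done" log key is also just an example.

```python
class ToyExtension:
    """Hypothetical, simplified model of a condition-driven extension."""

    def __init__(self):
        self._conditions = []
        self.calls = []

    def add_condition(self, callbacks_names, predicate=None, arguments=None):
        if predicate is None:
            predicate = lambda log: True  # always-true predicate
        for name in callbacks_names:
            self._conditions.append((name, predicate, tuple(arguments or ())))
        return self  # returning the extension allows chaining calls

    def dispatch(self, callback_invoked, log, *from_main_loop):
        for name, predicate, arguments in self._conditions:
            if name == callback_invoked and predicate(log):
                self.do(callback_invoked, *(from_main_loop + arguments))

    def do(self, which_callback, *args):
        self.calls.append((which_callback, args))


extension = (
    ToyExtension()
    .add_condition(["after_batch"],
                   predicate=lambda log: log["iterations_done"] % 2 == 0,
                   arguments=("even",))
    .add_condition(["after_epoch"])
)
for i in range(1, 5):
    log = {"iterations_done": i}
    extension.dispatch("after_batch", log, "batch%d" % i)
extension.dispatch("after_epoch", log)
```

Only every second batch triggers do(), and the user-supplied argument "even" arrives after the batch passed from the main loop.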
- dispatch(callback_invoked, *from_main_loop)¶
Check conditions and call the do() method.
Also adds additional arguments if specified for a condition.
Todo
Add a check for a situation when several conditions are met at the same time and do something.
- do(which_callback, *args)¶
Does the job of the training extension.
Notes
Subclasses must accept additional positional arguments in their call signature for this method, even if they are unused.
- static parse_args(which_callback, args)¶
Separates do() arguments coming from different sources.
When a do() method receives arguments from both the main loop (e.g. a batch) and the user, it often has to separate them. This method is the right tool to use.
Parameters: - which_callback (str) – The name of the callback.
- args (iterable) – The arguments.
Returns: - from_main_loop (tuple)
- from_user (tuple)
- set_conditions(**kwargs)¶
Set the conditions for which this extension should be run.
Parameters: See the SimpleExtension class documentation for the possible parameters.
- class blocks.extensions.Timing(**kwargs)¶
Bases: blocks.extensions.SimpleExtension
Add timing information to the log.
This adds data about the time spent in the algorithm’s process_batch() method as well as the time spent reading data per batch or epoch. It also reports the time spent initializing the algorithm.
Notes
Add this extension before the Printing extension.
This extension does not enable full profiling information. To see a full profile of the main loop at the end of training, use the profile configuration (e.g. by setting BLOCKS_PROFILE=true).
- do(which_callback, *args)¶
- class blocks.extensions.TrainingExtension(name=None)¶
Bases: object
The base class for training extensions.
An extension is a set of callbacks sharing a joint context that are invoked at certain stages of the training procedure. These callbacks typically add a certain functionality to the training procedure, e.g. running validation on auxiliary datasets or early stopping.
Parameters: name (str, optional) – The name of the extension. The names are useful in order to distinguish between several extensions of the same type that belong to the same main loop. By default the name is set to the name of the class.
- name¶
str
The name of the extension.
- after_batch(batch)¶
The callback invoked after a batch is processed.
Parameters: batch (object) – The data batch just processed.
- after_epoch()¶
The callback invoked after an epoch is finished.
- after_training()¶
The callback invoked after training is finished.
- before_batch(batch)¶
The callback invoked before a batch is processed.
Parameters: batch (object) – The data batch to be processed.
- before_epoch()¶
The callback invoked before starting an epoch.
- before_training()¶
The callback invoked before training is started.
- dispatch(callback_name, *args)¶
Runs callback with the given name.
The reason for having this method is to allow the descendants of the TrainingExtension to intercept callback invocations and do something with them, e.g. block them when a certain condition does not hold. The default implementation simply invokes the callback by its name.
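The interception pattern described for dispatch() might look as follows in a toy setting. Both classes are hypothetical and exist only for this sketch: the base class invokes the callback by name, and the subclass blocks every odd-numbered after_batch invocation.

```python
class ToyTrainingExtension:
    """Hypothetical stand-in for a training extension."""

    def __init__(self):
        self.seen = []

    def after_batch(self, batch):
        self.seen.append(batch)

    def dispatch(self, callback_name, *args):
        # Default behaviour: look the callback up by name and invoke it.
        getattr(self, callback_name)(*args)


class EveryOtherBatch(ToyTrainingExtension):
    """Intercepts dispatch() to let only every second batch through."""

    def __init__(self):
        super().__init__()
        self._count = 0

    def dispatch(self, callback_name, *args):
        if callback_name == "after_batch":
            self._count += 1
            if self._count % 2:  # block odd-numbered invocations
                return
        super().dispatch(callback_name, *args)


extension = EveryOtherBatch()
for batch in ["b1", "b2", "b3", "b4"]:
    extension.dispatch("after_batch", batch)
```

The callback itself stays unaware of the filtering; all of the gating logic lives in the overridden dispatch().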
- main_loop
- on_error()¶
The callback invoked when an error occurs.
- on_interrupt()¶
The callback invoked when training is interrupted.
- on_resumption()¶
The callback invoked after training is resumed.
- blocks.extensions.always_true(log)¶
- blocks.extensions.callback(func)¶
- blocks.extensions.has_done_epochs(log)¶
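The dispatch() hook of TrainingExtension described above can be illustrated without Theano. The following is a minimal sketch, not the Blocks implementation: MiniExtension and EveryNBatches are hypothetical stand-ins that only mirror the documented callback names, and the "every n-th batch" condition is an assumption chosen for illustration.

```python
class MiniExtension:
    """Hypothetical stand-in mirroring the callback names documented above."""

    def __init__(self, name=None):
        # By default the name is the class name, as the docs describe.
        self.name = name or type(self).__name__
        self.calls = []

    def after_batch(self, batch):
        self.calls.append(("after_batch", batch))

    def after_epoch(self):
        self.calls.append(("after_epoch",))

    def dispatch(self, callback_name, *args):
        # Default behaviour: look the callback up by name and invoke it.
        getattr(self, callback_name)(*args)


class EveryNBatches(MiniExtension):
    """Lets after_batch through only on every n-th invocation."""

    def __init__(self, n, **kwargs):
        super().__init__(**kwargs)
        self.n = n
        self.seen = 0

    def dispatch(self, callback_name, *args):
        if callback_name == "after_batch":
            self.seen += 1
            if self.seen % self.n != 0:
                return  # condition does not hold: block the callback
        super().dispatch(callback_name, *args)


ext = EveryNBatches(n=2)
for batch in ["b0", "b1", "b2", "b3"]:
    ext.dispatch("after_batch", batch)
ext.dispatch("after_epoch")
print(ext.calls)
```

Only the second and fourth batches reach after_batch, while after_epoch passes through untouched, because the subclass filters one callback name and delegates everything else to the default dispatch.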
Monitoring extensions¶
- class blocks.extensions.monitoring.DataStreamMonitoring(variables, data_stream, updates=None, **kwargs)¶
Bases: blocks.extensions.SimpleExtension, blocks.extensions.monitoring.MonitoringExtension
Monitors Theano variables and monitored-quantities on a data stream.
By default monitoring is done before the first and after every epoch.
Parameters: - variables (list of TensorVariable and MonitoredQuantity) – The variables to monitor. The variable names are used as record names in the logs.
- updates (list of tuples or OrderedDict or None) – TensorSharedVariable updates to be performed during evaluation. This parameter is only for Theano variables. Be careful not to update any model parameters as this is not intended to alter your model in any meaningful way. A typical use case of this option arises when the theano function used for evaluation contains a call to scan() which might have returned shared variable updates.
- data_stream (instance of DataStream) – The data stream to monitor on. A data epoch is requested each time monitoring is done.
- PREFIX_SEPARATOR = '_'¶
- do(callback_name, *args)¶
Write the values of monitored variables to the log.
- class blocks.extensions.monitoring.MonitoringExtension(prefix=None, **kwargs)¶
Bases: blocks.extensions.TrainingExtension
A mixin with logic shared by monitoring extensions.
Parameters: prefix (str, optional) – The prefix for the log records made by the extension. It is prepended to the variable names with an underscore as a separator. If not given, the names of the observed variables are used as is.
- add_records(log, record_tuples)¶
Helper function to add monitoring records to the log.
- record_name(variable)¶
The record name for a variable.
- class blocks.extensions.monitoring.TrainingDataMonitoring(variables, **kwargs)¶
Bases: blocks.extensions.SimpleExtension, blocks.extensions.monitoring.MonitoringExtension
Monitors values of Theano variables on training batches.
Use this extension to monitor a quantity on every training batch cheaply. It integrates with the training algorithm to avoid recomputing the same things several times. For instance, if you are training a network and want to log the norm of the gradient on every batch, backpropagation will only be done once. By controlling how often the do() method is called, you can aggregate the monitored variables, e.g. log only the gradient norm averaged over an epoch.
Parameters: variables (list of TensorVariable) – The variables to monitor. The variable names are used as record names in the logs.
Notes
All the monitored variables are evaluated _before_ the parameter update.
Requires the training algorithm to be an instance of DifferentiableCostMinimizer.
- do(callback_name, *args)¶
Initializes the buffer or commits the values to the log.
What this method does depends on the callback from which it is called. When called within before_training, it initializes the aggregation buffer and tells the training algorithm which additional computations should be carried out at each step by adding the corresponding updates to it. In all other cases it writes the aggregated values of the monitored variables to the log.
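The initialize-then-commit behaviour described for do() can be pictured with a plain Python buffer. This is a sketch under stated assumptions: MeanBuffer is a hypothetical class, the aggregation is a simple mean, and an ordinary dict stands in for a log row. The real extension accumulates through Theano updates instead.

```python
class MeanBuffer:
    """Hypothetical aggregation buffer: accumulates values, commits means."""

    def __init__(self, names):
        self.names = list(names)
        self.reset()

    def reset(self):
        # Corresponds to initializing the buffer in before_training.
        self.sums = {name: 0.0 for name in self.names}
        self.count = 0

    def accumulate(self, values):
        # Called once per training batch with {record name: value}.
        for name in self.names:
            self.sums[name] += values[name]
        self.count += 1

    def commit(self, log_row):
        # Write the averages to the log row, then start a fresh window.
        for name in self.names:
            log_row[name] = self.sums[name] / self.count
        self.reset()


buffer = MeanBuffer(["gradient_norm"])
for value in [4.0, 2.0, 3.0]:
    buffer.accumulate({"gradient_norm": value})
epoch_row = {}
buffer.commit(epoch_row)
print(epoch_row)
```

Calling commit() after every batch would log raw values, while calling it once per epoch logs the epoch average. This is the frequency trade-off the class description mentions.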
Training¶
Bases: blocks.extensions.SimpleExtension
Adjusts shared variable parameter using some function.
Applies a function to compute the new value of a shared parameter each iteration.
This class can be used to adapt over the training process parameters like learning rate, momentum, etc.
Parameters: - parameter (TensorSharedVariable) – Shared variable to be adjusted
- function (callable) –
A function which outputs a numeric value to which the given shared variable will be set and may take one or two arguments.
In the first case, it is a function that takes the total number of iterations done (int) as input.
In the second case, it is a function that takes the number of iterations done (int) and the old value of the shared variable (with the same dtype as parameter).
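The two accepted shapes of function can be written as ordinary Python callables. The schedules below are illustrative assumptions rather than recommended settings, and the loop uses a plain float where the extension would update a Theano shared variable.

```python
def decay_by_iteration(iterations_done):
    """One-argument form: the new value depends only on the iteration count."""
    return 0.1 / (1.0 + 0.01 * iterations_done)


def decay_from_old_value(iterations_done, old_value):
    """Two-argument form: the new value is derived from the current one."""
    # Halve the value every 100 iterations, otherwise leave it unchanged.
    return old_value * 0.5 if iterations_done % 100 == 0 else old_value


# Stand-in for what happens after each batch; a float replaces the
# shared variable purely for illustration.
learning_rate = 0.1
for iterations_done in range(1, 201):
    learning_rate = decay_from_old_value(iterations_done, learning_rate)
print(learning_rate)
```

The one-argument form suits closed-form schedules, whereas the two-argument form is convenient when the next value is easiest to express relative to the current one.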
- class blocks.extensions.training.TrackTheBest(record_name, notification_name=None, choose_best=<built-in function min>, **kwargs)¶
Bases: blocks.extensions.SimpleExtension
Check if a log quantity has the minimum/maximum value so far.
Parameters: - record_name (str) – The name of the record to track.
- notification_name (str, optional) – The name for the record to be made in the log when the current value of the tracked quantity is the best so far. If not given, record_name plus a “best_so_far” suffix is used.
- choose_best (callable, optional) – A function that takes the current value and the best so far and returns the better of the two. By default min(), which corresponds to tracking the minimum value.
- best_name¶
str
The name of the status record to keep the best value so far.
- notification_name¶
str
The name of the record written to the log when the current value of the tracked quantity is the best so far.
Notes
In the likely case that you are relying on another extension to add the tracked quantity to the log, make sure to place this extension after the extension that writes the quantity to the log in the extensions argument to blocks.main_loop.MainLoop.
- do(which_callback, *args)¶
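The role of choose_best in TrackTheBest can be sketched in a few lines. track_the_best is a hypothetical helper, and its handling of ties (a repeated best value is not reported again) is an assumption of this sketch, not a documented guarantee.

```python
def track_the_best(values, choose_best=min):
    """Return the best value and, per value, whether it was a new best."""
    best = None
    flags = []
    for value in values:
        is_new_best = best is None or (
            value != best and choose_best(value, best) == value)
        if is_new_best:
            best = value
        flags.append(is_new_best)
    return best, flags


# Tracking a loss: smaller is better, so the default min() applies.
best_loss, loss_flags = track_the_best([0.9, 0.7, 0.8, 0.5])
# Tracking an accuracy: larger is better, so pass max instead.
best_accuracy, accuracy_flags = track_the_best([0.6, 0.8, 0.7], choose_best=max)
print(best_loss, loss_flags, best_accuracy, accuracy_flags)
```

Each True flag corresponds to a point where the extension would write its notification record to the log.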
Serialization¶
- class blocks.extensions.saveload.Checkpoint(path, save_separately=None, use_cpickle=False, **kwargs)¶
Bases: blocks.extensions.SimpleExtension
Saves a pickled version of the main loop to the disk.
The pickled main loop can be later reloaded and training can be resumed.
Makes a SAVED_TO record in the log with the serialization destination in the case of success and None in the case of failure. The value of the record is a tuple of paths to which saving was done (there can be more than one if the user added a condition with an argument, see do() docs).
Parameters: - path (str) – The destination path for pickling.
- save_separately (list of str, optional) – The list of the main loop’s attributes to be pickled separately to their own files. The paths will be formed by adding the attribute name preceded by an underscore before the path extension. The whole main loop will still be pickled as usual.
- use_cpickle (bool) – See documentation of dump().
Notes
Using pickling for saving the whole main loop object comes with certain limitations:
- Theano computation graphs built in GPU mode (theano.config.device == “gpu”) cannot be used in the usual mode (and vice versa). Using this extension therefore binds you to one kind of device.
- do(callback_name, *args)¶
Pickle the main loop object to the disk.
If *args contains an argument from the user, it is treated as the saving path to be used instead of the one given at the construction stage.
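The naming rule given for save_separately (attribute name, preceded by an underscore, inserted before the path extension) can be sketched with the standard library. separate_path is a hypothetical helper and the file names are made up; the sketch only illustrates the rule as worded above.

```python
import os.path


def separate_path(path, attribute):
    """Insert '_<attribute>' before the extension of `path`."""
    root, extension = os.path.splitext(path)
    return root + "_" + attribute + extension


print(separate_path("checkpoints/main_loop.pkl", "log"))
print(separate_path("main_loop", "model"))
```

A path without an extension simply gets the suffix appended, since os.path.splitext returns an empty extension in that case.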
- class blocks.extensions.saveload.Load(path, load_iteration_state=False, load_log=False, **kwargs)¶
Bases: blocks.extensions.TrainingExtension
Loads a saved checkpoint into the main loop.
Makes a LOADED_FROM record in the log with the dump path.
Parameters: - path (str) – The path to the folder containing the dump.
- load_iteration_state (bool) – If True, load the iteration state. This can be useful when your model has very long epochs, and you want to resume when you were in the middle of one. Defaults to False.
- load_log (bool) – If True, load the old log and continue logging from there. Convenient because you end up with a single log of the entire training history. Defaults to False.
Notes
Requires the model to be created entirely using bricks, with a unique path/name for each brick, so that the parameters can be matched to their values.
In order to load the iteration state and the log, the saved model needs to be unpickled. Note that resuming training this way is still not entirely seamless because e.g. extensions will not be reloaded.
- before_training()¶
- load_to(main_loop)¶
Filter¶
- class blocks.filter.VariableFilter(roles=None, bricks=None, each_role=False, name=None, name_regex=None, theano_name=None, theano_name_regex=None, applications=None)¶
Bases: object
Filters Theano variables based on a range of criteria.
Parameters: - roles (list of VariableRole instances, optional) – Matches any variable which has one of the roles given.
- bricks (list of Brick classes or list of instances of Brick, optional) – Matches any variable that is an instance of any of the given classes or that is owned by any of the given brick instances.
- each_role (bool, optional) – If True, the variable needs to have all given roles. If False, a variable matching any of the roles given will be returned. False by default.
- name (str, optional) – The variable name. The Blocks name (i.e. x.tag.name) is used.
- name_regex (str, optional) – A regular expression for the variable name. The Blocks name (i.e. x.tag.name) is used.
- theano_name (str, optional) – The variable name. The Theano name (i.e. x.name) is used.
- theano_name_regex (str, optional) – A regular expression for the variable name. The Theano name (i.e. x.name) is used.
- applications (list of Application, optional) – Matches a variable that was produced by any of the applications given.
Notes
Note that only auxiliary variables, parameters, inputs and outputs are tagged with the brick that created them. Other Theano variables that were created in the process of applying a brick will be filtered out.
Note that technically speaking, bricks are able to have non-shared variables as parameters. For example, we can use the transpose of another weight matrix as the parameter of a particular brick. This means that in some unusual cases, filtering by the PARAMETER role alone will not be enough to retrieve all trainable parameters in your model; you will need to filter out the shared variables from these (using e.g. is_shared_variable()).
Examples
>>> from blocks.bricks import MLP, Linear, Logistic, Identity, BIAS
>>> mlp = MLP(activations=[Identity(), Logistic()], dims=[20, 10, 20])
>>> from theano import tensor
>>> x = tensor.matrix()
>>> y_hat = mlp.apply(x)
>>> from blocks.graph import ComputationGraph
>>> cg = ComputationGraph(y_hat)
>>> from blocks.filter import VariableFilter
>>> var_filter = VariableFilter(roles=[BIAS],
...                             bricks=[mlp.linear_transformations[0]])
>>> var_filter(cg.variables)
[b]
- __call__(variables)¶
Filter the given variables.
Parameters: variables (list of TensorVariable) –
- blocks.filter.get_annotation(var, cls)¶
A helper function to retrieve an annotation of a particular type.
Notes
This function returns the first annotation of a particular type. If there are multiple–there shouldn’t be–it will ignore them.
- blocks.filter.get_application_call(var)¶
Retrieves the application call that created this variable.
See get_annotation().
- blocks.filter.get_brick(var)¶
Retrieves the brick that created this variable.
See get_annotation().
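The difference between the default role matching of VariableFilter and each_role=True comes down to "any" versus "all". The sketch below uses plain strings as stand-in roles and a hypothetical matches_roles helper. It ignores the role hierarchy (e.g. WEIGHT being a subrole of PARAMETER), which the real filter takes into account.

```python
def matches_roles(variable_roles, wanted_roles, each_role=False):
    """Hypothetical helper: does a variable with these roles pass the filter?"""
    hits = [role in variable_roles for role in wanted_roles]
    # each_role=True requires every wanted role; the default needs just one.
    return all(hits) if each_role else any(hits)


roles_of_variable = {"AUXILIARY", "COST"}
print(matches_roles(roles_of_variable, ["COST", "WEIGHT"]))
print(matches_roles(roles_of_variable, ["COST", "WEIGHT"], each_role=True))
print(matches_roles(roles_of_variable, ["COST", "AUXILIARY"], each_role=True))
```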
Computational graph¶
- class blocks.graph.Annotation¶
Bases: object
Annotations on Theano variables in a graph.
In Blocks, annotations are automatically attached to variables created using bricks. One form of annotation is that many variables are assigned a role (see VariableRole). A second form of annotation consists of attaching an Annotation instance to the variable’s tag attribute, with auxiliary variables and/or updates.
For example, we might be interested in the mean activation of a certain application of a Linear brick. The variable representing the mean activation is attached as an auxiliary variable to the annotations of the input and output variables of this brick. Using the ComputationGraph class (the variables, auxiliary_variables, etc. attributes in particular) we can retrieve these Theano variables to pass on to the monitor, use as a regularizer, etc.
In most cases, annotations are added on a brick level (e.g. each brick will assign the weight norm of its weights as an auxiliary value) or on an application level (e.g. each time a brick is applied, its mean activation will become an auxiliary variable). However, you can also add annotations manually, by setting the annotation value of a variable’s tag field.
Examples
>>> from theano import tensor
>>> from blocks.graph import Annotation, add_annotation
>>> x = tensor.vector()
>>> annotation = Annotation()
>>> annotation.add_auxiliary_variable(x + 1, name='x_plus_1')
>>> add_annotation(x, annotation)
>>> y = x ** 2
>>> from blocks.graph import ComputationGraph
>>> cg = ComputationGraph([y])
>>> cg.auxiliary_variables
[x_plus_1]
- add_auxiliary_variable(variable, roles=None, name=None)¶
Attach an auxiliary variable to the graph.
Auxiliary variables are Theano variables that are not part of a brick’s output but can be useful nonetheless, e.g. as a regularizer or as a quantity to monitor during training.
Parameters: - variable (TensorVariable) – The variable you want to add.
- roles (list of VariableRole instances, optional) – The roles of this variable. The AUXILIARY role will automatically be added. Other options are COST, WEIGHT, etc.
- name (str, optional) – Name to give to the variable. If the variable already has a name it will be overwritten.
Examples
>>> from blocks.bricks.base import application, Brick
>>> from blocks.graph import ComputationGraph
>>> from blocks.roles import AUXILIARY, COST
>>> from blocks.utils import shared_floatx_nans
>>> class Foo(Brick):
...     def _allocate(self):
...         W = shared_floatx_nans((10, 10))
...         self.add_auxiliary_variable(W.mean(), name='mean_W')
...     @application
...     def apply(self, x, application_call):
...         application_call.add_auxiliary_variable(
...             x - 1, name='x_minus_1')
...         application_call.add_auxiliary_variable(
...             x.mean(), roles=[COST], name='mean_x')
...         return x + 1
>>> from theano import tensor
>>> x = tensor.vector()
>>> y = Foo().apply(x)
>>> from blocks.filter import VariableFilter
>>> cg = ComputationGraph([y])
>>> var_filter = VariableFilter(roles=[AUXILIARY])
>>> var_filter(cg.variables)
{x_minus_1, mean_W, mean_x}
>>> var_filter = VariableFilter(roles=[COST])
>>> var_filter(cg.variables)
{mean_x}
- class blocks.graph.ComputationGraph(outputs)¶
Bases: object
Encapsulates a managed Theano computation graph.
This implies that it not only contains the variables required to compute the given outputs, but also all the auxiliary variables and updates that were attached to these variables through the annotation system.
All variables are presented in topologically sorted order according to the apply nodes that they are an input to.
Parameters: outputs ((list of) TensorVariable) – The output(s) of the computation graph.
- inputs¶
list of TensorVariable
The inputs of the computation graph. This does not include shared variables and constants.
- shared_variables¶
list of TensorSharedVariable
All the shared variables in the graph.
- parameters¶
list of TensorSharedVariable
All the shared variables which have the PARAMETER role.
- outputs¶
list of TensorVariable
The outputs of the computation graph (as passed to the constructor).
- auxiliary_variables¶
list of TensorVariable
All variables which have the AUXILIARY role.
- variables¶
list of TensorVariable
All variables (including auxiliary) in the managed graph.
- scans¶
list of Scan
All Scan ops used in this computation graph.
- scan_variables¶
list of TensorVariable
All variables of the inner graphs of Scan ops.
- updates¶
TensorSharedVariable updates
All the updates found attached to the annotations.
- auxiliary_variables
- dict_of_inputs()¶
Return a mapping from an input name to the input.
- get_snapshot(data)¶
Evaluate all role-carrying Theano variables on given data.
Parameters: data (dict of (data source, data) pairs) – Data for input variables. The sources should match the names of the input variables.
Return type: Dictionary of (variable, variable value on given data) pairs.
- get_theano_function(additional_updates=None, **kwargs)¶
Create Theano function from the graph contained.
Parameters: **kwargs (dict) – Keyword arguments to theano.function. Useful for specifying compilation modes or profiling.
- has_inputs(variable)¶
Check if a variable depends on input variables.
Returns: True if the given variable depends on input variables, False otherwise.
Return type: bool
- inputs
Inputs to the graph, excluding constants and shared variables.
- intermediary_variables
- parameters
- replace(replacements)¶
Replace certain variables in the computation graph.
Parameters: replacements (dict) – The mapping from variables to be replaced to the corresponding substitutes.
Examples
>>> import theano
>>> from theano import tensor, function
>>> x = tensor.scalar('x')
>>> y = x + 2
>>> z = y + 3
>>> a = z + 5
Let’s suppose we have dependent replacements like
>>> replacements = {y: x * 2, z: y * 3}
>>> cg = ComputationGraph([a])
>>> theano.pprint(a)
'(((x + TensorConstant{2}) + TensorConstant{3}) + TensorConstant{5})'
>>> cg_new = cg.replace(replacements)
>>> theano.pprint(
...     cg_new.outputs[0])
'(((x * TensorConstant{2}) * TensorConstant{3}) + TensorConstant{5})'
First two sums turned into multiplications
>>> float(function(cg_new.inputs, cg_new.outputs)(3.)[0])
23.0
- scan_variables
Variables of Scan ops.
- shared_variables
- blocks.graph.add_annotation(var, annotation)¶
- blocks.graph.apply_dropout(computation_graph, variables, drop_prob, rng=None, seed=None)¶
Returns a computation graph in which dropout is applied to the given variables.
Parameters: - computation_graph (instance of ComputationGraph) – The computation graph.
- variables (list of TensorVariable) – Variables to be dropped out.
- drop_prob (float) – Probability of dropping out. If you want to apply the dropout with different probabilities for different layers, call it several times.
- rng (MRG_RandomStreams) – Random number generator.
- seed (int) – Random seed to be used if rng was not specified.
Notes
For more information, see [DROPOUT].
[DROPOUT] Hinton et al. Improving neural networks by preventing co-adaptation of feature detectors, arXiv:1207.0580.
Examples
>>> import numpy
>>> import theano
>>> from theano import tensor, function
>>> from blocks.bricks import MLP, Identity
>>> from blocks.filter import VariableFilter
>>> from blocks.initialization import Constant
>>> from blocks.roles import INPUT
>>> linear = MLP([Identity(), Identity()], [2, 10, 2],
...              weights_init=Constant(1), biases_init=Constant(2))
>>> x = tensor.matrix('x')
>>> y = linear.apply(x)
>>> cg = ComputationGraph(y)
We are going to drop out all the input variables
>>> inputs = VariableFilter(roles=[INPUT])(cg.variables)
Here we apply dropout with default setting to our computation graph
>>> cg_dropout = apply_dropout(cg, inputs, 0.5)
Dropped out variables have role DROPOUT and are tagged with replacement_of tag. Let’s filter these variables and check if they have the links to original ones.
>>> from blocks.roles import DROPOUT
>>> dropped_out = VariableFilter(roles=[DROPOUT])(cg_dropout.variables)
>>> inputs_referenced = [var.tag.replacement_of for var in dropped_out]
>>> set(inputs) == set(inputs_referenced)
True
Compiling theano functions to forward propagate in original and dropped out graphs
>>> fprop = function(cg.inputs, cg.outputs[0])
>>> fprop_dropout = function(cg_dropout.inputs, cg_dropout.outputs[0])
Initialize an MLP and apply these functions
>>> linear.initialize()
>>> fprop(numpy.ones((3, 2),
...       dtype=theano.config.floatX))
array([[ 42.,  42.],
       [ 42.,  42.],
       [ 42.,  42.]]...
>>> fprop_dropout(numpy.ones((3, 2),
...               dtype=theano.config.floatX))
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]]...
After the second run, the answer is different
>>> fprop_dropout(numpy.ones((3, 2),
...               dtype=theano.config.floatX))
array([[   0.,   52.],
       [ 100.,    0.],
       [   0.,    0.]]...
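What happens to a single dropped-out variable can be sketched in pure Python. This sketch assumes that surviving entries are rescaled by 1 / (1 - drop_prob), which is consistent with outputs above exceeding the undropped value of 42 but is not stated in this reference. The dropout helper is hypothetical, and Python's random module replaces MRG_RandomStreams.

```python
import random


def dropout(values, drop_prob, rng):
    """Zero each entry with probability drop_prob and rescale the survivors."""
    keep_prob = 1.0 - drop_prob
    # Assumption: survivors are divided by keep_prob so the expected value
    # of every entry stays unchanged.
    return [value / keep_prob if rng.random() < keep_prob else 0.0
            for value in values]


rng = random.Random(1)
dropped = dropout([1.0] * 8, 0.5, rng)
untouched = dropout([3.0, 4.0], 0.0, rng)
print(dropped, untouched)
```

With drop_prob=0.5 every entry ends up as either 0.0 or 2.0, and a new mask is drawn on every call, which mirrors why two consecutive runs of fprop_dropout give different answers.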
- blocks.graph.apply_noise(computation_graph, variables, level, seed=None)¶
Add Gaussian noise to certain variables of a computation graph.
Parameters: - computation_graph (instance of ComputationGraph) – The computation graph.
- variables (TensorVariable) – Variables to add noise to.
- level (float) – Noise level.
- seed (int, optional) – The seed with which MRG_RandomStreams is initialized. Defaults to 1.
- blocks.graph.collect_parameters(computation_graph, parameters)¶
Replace parameters with a single shared variable.
This can be useful if you need to calculate the full Hessian of a computational graph. It replaces parameters with slices of a single large vector, like
>>> import numpy
>>> from blocks.utils import shared_floatx
>>> W1 = shared_floatx(numpy.random.rand(10, 10))
>>> W2 = shared_floatx(numpy.random.rand(10, 10))
>>> all_parameters = shared_floatx(numpy.concatenate(
...     [W1.get_value().flatten(), W2.get_value().flatten()]))
>>> W1 = all_parameters[:W1.size]
>>> W2 = all_parameters[W1.size:]
Parameters: - computation_graph (ComputationGraph instance) – The managed Theano graph in which to collect parameters.
- parameters (list of Theano shared variables) – The parameters whose values should be collected.
Returns: A new Theano graph which has all the given parameters collected into a single large shared variable.
Return type: ComputationGraph instance
Notes
Note that this replacement makes training the model significantly slower because of the large number of Theano set_subtensor calls needed to train the model.
Examples
>>> from blocks.bricks import MLP, Logistic
>>> from blocks.bricks.cost import SquaredError
>>> from theano import tensor
>>> x = tensor.matrix()
>>> mlp = MLP(activations=[Logistic(), Logistic()],
...           dims=[784, 100, 784])
>>> cost = SquaredError().apply(x, mlp.apply(x))
>>> cg = ComputationGraph(cost)
>>> new_cg = collect_parameters(cg, cg.shared_variables)
The new graph only has a single shared variable. This variable receives the COLLECTOR role.
>>> new_cg.shared_variables
[collected_parameters]
The bricks’ variables have been replaced with reshaped segments of this single shared variable. These replacements are given the COLLECTED role.
>>> from blocks.filter import VariableFilter
>>> from blocks.roles import COLLECTED
>>> var_filter = VariableFilter(roles=[COLLECTED])
>>> var_filter(new_cg.variables)
[Reshape{1}.0, Reshape{1}.0, Reshape{2}.0, Reshape{2}.0]
Parameter initialization¶
- class blocks.initialization.Constant(constant)¶
Bases: blocks.initialization.NdarrayInitialization
Initialize parameters to a constant.
The constant may be a scalar or a ndarray of any shape that is broadcastable with the requested parameter arrays.
Parameters: constant (ndarray) – The initialization value to use. Must be a scalar or an ndarray (or compatible object, such as a nested list) that has a shape that is broadcastable with any shape requested by initialize.
- generate(rng, shape)¶
- class blocks.initialization.Identity(mult=1)¶
Bases: blocks.initialization.NdarrayInitialization
Initialize to the identity matrix.
Only works for 2D arrays. If the number of columns is not equal to the number of rows, the array will be truncated or padded with zeros.
Parameters: mult (float, optional) – Multiply the identity matrix with a scalar. Defaults to 1.
- generate(rng, shape)¶
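The truncation and padding behaviour described for Identity on non-square shapes can be made concrete with a small sketch. scaled_identity is a hypothetical pure Python helper that returns nested lists, whereas the real initializer produces an ndarray with dtype floatX.

```python
def scaled_identity(shape, mult=1.0):
    """Return a rows x cols nested list with `mult` on the main diagonal."""
    rows, cols = shape
    return [[mult if i == j else 0.0 for j in range(cols)]
            for i in range(rows)]


# More columns than rows: the extra columns are padded with zeros.
wide = scaled_identity((2, 3), mult=2.0)
# More rows than columns: the extra rows are all zero.
tall = scaled_identity((3, 2))
print(wide)
print(tall)
```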
- class blocks.initialization.IsotropicGaussian(std=1, mean=0)¶
Bases: blocks.initialization.NdarrayInitialization
Initialize parameters from an isotropic Gaussian distribution.
Parameters: - std (float, optional) – The standard deviation of the Gaussian distribution. Defaults to 1.
- mean (float, optional) – The mean of the Gaussian distribution. Defaults to 0.
Notes
Be careful: the standard deviation goes first and the mean goes second!
- generate(rng, shape)¶
- class blocks.initialization.NdarrayInitialization¶
Bases: object
Base class specifying the interface for ndarray initialization.
- generate(rng, shape)¶
Generate an initial set of parameters from a given distribution.
Parameters: - rng (numpy.random.RandomState) –
- shape (tuple) – A shape tuple for the requested parameter array shape.
Returns: output – An ndarray with values drawn from the distribution specified by this object, of shape shape, with dtype config.floatX.
Return type: ndarray
- initialize(var, rng, shape=None)¶
Initialize a shared variable with generated parameters.
Parameters: - var (object) – A Theano shared variable whose value will be set with values drawn from this NdarrayInitialization instance.
- rng (numpy.random.RandomState) –
- shape (tuple) – A shape tuple for the requested parameter array shape.
- class blocks.initialization.Orthogonal(scale=1)¶
Bases: blocks.initialization.NdarrayInitialization
Initialize a random orthogonal matrix.
Only works for 2D arrays.
Parameters: - scale (float, optional) – Multiply the resulting matrix with a scalar. Defaults to 1. For a discussion of the importance of scale for training time and generalization refer to [Saxe2013].
[Saxe2013] Saxe, A.M., McClelland, J.L., Ganguli, S., 2013. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120 [cond-mat, q-bio, stat].
- generate(rng, shape)¶
- class blocks.initialization.Sparse(num_init, weights_init, sparse_init=None)¶
Bases: blocks.initialization.NdarrayInitialization
Initialize only a fraction of the weights, row-wise.
Parameters: - num_init (int or float) – If int, this is the number of weights to initialize per row. If float, it’s the fraction of the weights per row to initialize.
- weights_init (NdarrayInitialization instance) – The initialization scheme to initialize the weights with.
- sparse_init (NdarrayInitialization instance, optional) – What to set the non-initialized weights to (0. by default)
- generate(rng, shape)¶
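The row-wise behaviour of Sparse can be sketched for a single row. sparse_row is a hypothetical helper, and converting a float num_init to a count by truncation is an assumption of this sketch. A callable taking a random.Random instance stands in for the weights_init scheme.

```python
import random


def sparse_row(num_init, num_columns, draw, rng, fill=0.0):
    """Build one row in which only `num_init` random positions are drawn."""
    if isinstance(num_init, float):
        # Assumption: a fraction becomes a count by truncation.
        num_init = int(num_init * num_columns)
    chosen = set(rng.sample(range(num_columns), num_init))
    # Positions that were not chosen get the fill value (0. by default).
    return [draw(rng) if column in chosen else fill
            for column in range(num_columns)]


rng = random.Random(0)
row_by_count = sparse_row(3, 10, lambda r: r.gauss(0.0, 1.0), rng)
row_by_fraction = sparse_row(0.25, 8, lambda r: 1.0, rng)
print(row_by_count)
print(row_by_fraction)
```

Passing an int fixes the number of initialized weights per row regardless of width, while a float keeps the density constant as the row gets wider.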
- class blocks.initialization.Uniform(mean=0.0, width=None, std=None)¶
Bases: blocks.initialization.NdarrayInitialization
Initialize parameters from a uniform distribution.
Parameters: - mean (float, optional) – The mean of the uniform distribution (i.e. the center of mass of the density function). Defaults to 0.
- width (float, optional) – One way of specifying the range of the uniform distribution. The support will be [mean - width/2, mean + width/2]. Exactly one of width or std must be specified.
- std (float, optional) – An alternative method of specifying the range of the uniform distribution. Chooses the width of the uniform such that random variates will have a desired standard deviation. Exactly one of width or std must be specified.
- generate(rng, shape)¶
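The relation between the width and std arguments of Uniform follows from the variance of a uniform distribution: a width of w gives a variance of w²/12, so w = std · √12. The sketch below uses a hypothetical helper, uniform_support, to compute the support from either argument. The error message is an illustration, not the library's wording.

```python
import math


def uniform_support(mean=0.0, width=None, std=None):
    """Return (low, high) of the uniform distribution described above."""
    if (width is None) == (std is None):
        raise ValueError("exactly one of width or std must be specified")
    if std is not None:
        # A uniform distribution of width w has variance w**2 / 12.
        width = math.sqrt(12.0) * std
    return mean - width / 2.0, mean + width / 2.0


print(uniform_support(width=2.0))
low, high = uniform_support(mean=1.0, std=1.0)
print(low, high)
```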
Logging¶
Main loop¶
- class blocks.main_loop.MainLoop(algorithm, data_stream, model=None, log=None, log_backend=None, extensions=None)¶
Bases: object
The standard main loop of Blocks.
In the MainLoop a model is trained by a training algorithm using data extracted from a data stream. This process is scrupulously documented in a log object.
The MainLoop itself does very little: it only fetches the data from the data stream and feeds it to the algorithm. It expects the extensions to do most of the job. The respective callback of every extension is called at every stage of training. The extensions should communicate among themselves and with the main loop object by making records in the log. For instance, in order to stop the training procedure an extension can make a record training_finish_requested=True in the log. The main loop checks for such a record after every batch and every epoch and terminates when it finds one.
The MainLoop also handles the interruption signal SIGINT for you (e.g. the one the program receives when you press Ctrl + C). It notes this event in the log, and at the end of the next iteration or epoch the main loop finishes gracefully, calling all necessary extension callbacks and waiting until they finish.
Parameters: - algorithm (object) – The training algorithm.
- data_stream (instance of DataStream) – The data stream.
- model (AbstractModel instance, optional) – The model object. It is entirely transparent for the main loop but may be used by extensions.
- log (instance of TrainingLog, optional) – The log. When not given, a TrainingLog is created.
- log_backend (str) – The backend to use for the log. Currently python and sqlite are available. If not given, config.log_backend will be used. Ignored if log is passed.
- extensions (list of TrainingExtension instances) – The training extensions. Will be called in the same order as given here.
- find_extension(name)¶
Find an extension with a given name.
Parameters: name (str) – The name of the extension looked for.
Notes
Will crash if no extension or several extensions are found.
- iteration_state¶
Quick access to the (data stream, epoch iterator) pair.
- model¶
- run()¶
Starts the main loop.
The main loop ends when a training extension makes a training_finish_requested record in the log.
- status¶
A shortcut for self.log.status.
- exception blocks.main_loop.TrainingFinish¶
Bases: exceptions.Exception
An exception raised when a finish request is found in the log.
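The division of labour described for MainLoop (the loop only feeds batches to the algorithm, extensions communicate through the log) can be sketched in miniature. Everything here is hypothetical: run and StopAfter are illustrative names, a dict stands in for the training log, and callbacks receive the log as an argument for brevity, whereas real extensions reach it through their main_loop attribute.

```python
class StopAfter:
    """Hypothetical extension requesting a stop after `limit` batches."""

    def __init__(self, limit):
        self.limit = limit

    def after_batch(self, log, batch):
        log["batches_done"] = log.get("batches_done", 0) + 1
        if log["batches_done"] >= self.limit:
            log["training_finish_requested"] = True


def run(epochs, process_batch, extensions, log):
    """Feed batches to the algorithm and invoke callbacks at every stage."""

    def call(name, *args):
        for extension in extensions:  # same order as given
            callback = getattr(extension, name, None)
            if callback is not None:
                callback(log, *args)

    def finish_requested():
        return bool(log.get("training_finish_requested"))

    call("before_training")
    for batches in epochs:
        call("before_epoch")
        for batch in batches:
            call("before_batch", batch)
            process_batch(batch)
            call("after_batch", batch)
            if finish_requested():  # checked after every batch
                break
        if finish_requested():
            break
        call("after_epoch")
        if finish_requested():  # checked after every epoch
            break
    call("after_training")


processed = []
log = {}
run([[1, 2], [3, 4], [5, 6]], processed.append, [StopAfter(3)], log)
print(processed, log)
```

The loop never decides by itself when to stop; it only reacts to the training_finish_requested record that an extension wrote to the log.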
Model¶
A model is a thin layer of abstraction between the user-defined computation graph, bricks, parameters and main loop extensions. This module provides the basic AbstractModel interface as well as its implementations (currently only Model).
- class blocks.model.AbstractModel¶
Bases: object
A parameterized entity trained in the main loop.
A model is a parameterized entity that the user trains in a main loop. The following are traits of every model:
- It has parameters and supports a way to access them. In addition to returning handles to parameter objects it can return their values as numpy arrays and set their values to given numpy arrays.
- It has an optimality objective.
- It can be serialized and deserialized by means of pickling.
- It might have bricks as its components.
This class provides an interface for models. For experiments use a subclass, e.g. the Model.
- get_objective()¶
Return the optimization objective.
- get_parameter_dict()¶
Return the model parameters.
Returns: parameters – Dictionary of (name, parameter) pairs.
Return type: OrderedDict
- get_parameter_values()¶
Return the values of model parameters.
The default implementation assumes that parameters are Theano shared variables.
Returns: parameter_values – Dictionary of (parameter name, ndarray) pairs.
Return type: OrderedDict
- get_top_bricks()¶
Return the top-level bricks that are used in the model.
Returns: bricks – List of bricks.
Return type: list
- class blocks.model.Model(outputs)¶
Bases: blocks.model.AbstractModel, blocks.graph.ComputationGraph
Wraps a computation graph to support model interface.
This model covers the most common case, in which all information about the model is contained in an annotated computation graph: parameters are identified by their roles and bricks are found through annotations. Because this case is so frequent, the class is called simply ‘Model’ rather than ‘ComputationGraphModel’.
Todo
Overriding the automatically found parameters and bricks might be needed.
If there are top bricks in scan inner graphs, those will not be found.
Parameters: outputs ((list of) Variable) – The output variables of the computation graph.
- get_objective()¶
Return the output variable, if there is a single one.
If there is only one output variable, it is a reasonable default setting to assume that it is the optimization objective.
- get_parameter_dict()¶
Get model parameters.
The parameter names are formed from positions of their owner bricks in the bricks hierarchy. The variable names are used for the parameters that do not belong to any brick.
- get_top_bricks()¶
Variable roles¶
- blocks.roles.add_role(var, role)¶
Add a role to a given Theano variable.
Parameters: - var (TensorVariable) – The variable to assign the new role to.
- role (VariableRole instance) –
Notes
Some roles are subroles of others (e.g. WEIGHT is a subrole of PARAMETER). This function will not add a role if a more specific role has already been added. If you need to replace a role with a parent role (e.g. replace WEIGHT with PARAMETER) you must do so manually.
Examples
>>> from theano import tensor
>>> W = tensor.matrix()
>>> from blocks.roles import add_role, PARAMETER, WEIGHT
>>> add_role(W, PARAMETER)
>>> print(*W.tag.roles)
PARAMETER
>>> add_role(W, WEIGHT)
>>> print(*W.tag.roles)
WEIGHT
>>> add_role(W, PARAMETER)
>>> print(*W.tag.roles)
WEIGHT
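The subrole rule in the notes above, and the sequence shown in the example, can be reproduced with ordinary class inheritance. The classes and add_role_sketch below are hypothetical stand-ins operating on a plain list. They illustrate one way to obtain the documented behaviour and are not the library's code.

```python
class Role:
    """Hypothetical stand-in for VariableRole."""

    def __repr__(self):
        return type(self).__name__.upper()


class Parameter(Role):
    pass


class Weight(Parameter):  # WEIGHT is a subrole of PARAMETER
    pass


PARAMETER, WEIGHT = Parameter(), Weight()


def add_role_sketch(roles, role):
    """Return `roles` with `role` added unless a more specific one exists."""
    # An existing role that is the same as, or a subrole of, the new role
    # already covers it, so nothing changes.
    if any(isinstance(existing, type(role)) for existing in roles):
        return roles
    # Otherwise the new role supersedes any parent roles already present.
    kept = [existing for existing in roles
            if not isinstance(role, type(existing))]
    return kept + [role]


roles = add_role_sketch([], PARAMETER)
after_parameter = list(roles)
roles = add_role_sketch(roles, WEIGHT)
after_weight = list(roles)
roles = add_role_sketch(roles, PARAMETER)
print(after_parameter, after_weight, roles)
```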
Roles¶
All roles are implemented as subclasses of VariableRole.
- class blocks.roles.VariableRole¶
Base class for all variable roles.
The actual roles are instances of the different subclasses of VariableRole. They are:
- blocks.roles.AUXILIARY = AUXILIARY¶
Variables added to the graph as annotations
- blocks.roles.COST = COST¶
A scalar cost that can be used to train or regularize
- blocks.roles.PARAMETER = PARAMETER¶
A parameter of the model
- blocks.roles.WEIGHT = WEIGHT¶
The weight matrices of linear transformations
- blocks.roles.BIAS = BIAS¶
Biases of linear transformations
- blocks.roles.FILTER = FILTER¶
The filters (kernels) of a convolution operation
Brick selectors¶
- class blocks.select.Path(nodes)¶
Bases: object
Encapsulates a path in a hierarchy of bricks.
Currently the only allowed elements of paths are names of bricks and names of parameters. The latter can only appear at the end of the path. Support for regular expressions is planned.
Parameters: nodes (list or tuple of path nodes) – The nodes of the path.
- nodes¶
tuple
The tuple containing path nodes.
- Path.parameter_separator = '.'¶
- static Path.parse(string)¶
Constructs a path from its string representation.
Todo
More error checking.
Parameters: string (str) – String representation of the path.
- Path.separator = '/'¶
- Path.separator_re = <_sre.SRE_Pattern object at 0x7f2d1599a880>¶
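The two separators above suggest how a string such as '/mlp/linear_0.W' breaks down into brick names and a trailing parameter name. The following is a minimal sketch of that idea only. The helper name parse_path_sketch and its return format are hypothetical, it assumes that brick names contain no dots, and it is not the Blocks implementation of Path.parse().

```python
def parse_path_sketch(string, separator="/", parameter_separator="."):
    """Split a path string into brick names and an optional parameter name.

    Hypothetical helper for illustration; not part of Blocks.
    """
    if not string.startswith(separator):
        raise ValueError("path must start with {!r}".format(separator))
    body = string[len(separator):]
    # A parameter name, if present, follows the last parameter separator.
    brick_part, found, parameter = body.rpartition(parameter_separator)
    if not found:
        brick_part, parameter = body, None
    bricks = tuple(name for name in brick_part.split(separator) if name)
    return bricks, parameter


print(parse_path_sketch("/mlp/linear_0.W"))  # (('mlp', 'linear_0'), 'W')
print(parse_path_sketch("/mlp/linear_0"))    # (('mlp', 'linear_0'), None)
```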
- class blocks.select.Selector(bricks)¶
Bases: object
Selection of elements of a hierarchy of bricks.
Parameters: bricks (list of Brick) – The bricks of the selection.
- get_parameters(parameter_name=None)¶
Returns parameters from selected bricks and their descendants.
Parameters: parameter_name (Path.ParameterName, optional) – If given, only parameters with a name attribute equal to parameter_name are returned.
Returns: parameters – A dictionary of (path, parameter) pairs, where path is a string representation of the path in the brick hierarchy to the parameter (i.e. the slash-delimited path to the brick that owns the parameter, followed by a dot, followed by the parameter’s name), and parameter is the Theano variable representing the parameter.
Return type: OrderedDict
Examples
>>> from blocks.bricks import MLP, Tanh
>>> mlp = MLP([Tanh(), Tanh(), Tanh()], [5, 7, 11, 2])
>>> mlp.allocate()
>>> selector = Selector([mlp])
>>> selector.get_parameters()
OrderedDict([('/mlp/linear_0.W', W), ('/mlp/linear_0.b', b), ('/mlp/linear_1.W', W), ('/mlp/linear_1.b', b), ('/mlp/linear_2.W', W), ('/mlp/linear_2.b', b)])
Or, select just the weights of the MLP by passing the parameter name W:
>>> w_select = Selector([mlp])
>>> w_select.get_parameters('W')
OrderedDict([('/mlp/linear_0.W', W), ('/mlp/linear_1.W', W), ('/mlp/linear_2.W', W)])
- select(path)¶
Select a subset of current selection matching the path given.
Warning
Current implementation is very inefficient (theoretical complexity is \(O(n^3)\), where \(n\) is the number of bricks in the hierarchy). It can be sped up easily.
Parameters: path (Path or str) – The path for the desired selection. If a string is given it is parsed into a path.
Returns: Depending on the path given, one of the following:
- A Selector with the desired bricks.
- A list of SharedTensorVariable.
Theano expressions¶
- blocks.theano_expressions.hessian_times_vector(gradient, parameter, vector, r_op=False)¶
Return an expression for the Hessian times a vector.
Parameters: - gradient (TensorVariable) – The gradient of a cost with respect to parameter
- parameter (TensorVariable) – The parameter with respect to which to take the gradient
- vector (TensorVariable) – The vector with which to multiply the Hessian
- r_op (bool, optional) – Whether to use Rop() or not. Defaults to False. Which solution is fastest normally needs to be determined by profiling.
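One standard way to obtain a Hessian-vector product without building the Hessian is to differentiate the scalar product of the gradient and the vector with respect to the parameter, treating the vector as a constant. The sketch below checks that identity on a toy quadratic cost in pure Python. The names A, gradient and hessian_times_vector_sketch are hypothetical, central finite differences stand in for Theano's symbolic differentiation, and whether Blocks computes the product exactly this way is an assumption.

```python
# Toy quadratic cost 0.5 * theta^T A theta with a symmetric matrix A,
# chosen so that the exact Hessian (A itself) is known.
A = [[2.0, 1.0],
     [1.0, 3.0]]


def gradient(theta):
    """Analytic gradient A * theta of the toy cost."""
    return [sum(a_ij * t_j for a_ij, t_j in zip(row, theta)) for row in A]


def hessian_times_vector_sketch(grad_fn, theta, vector, eps=1e-6):
    """Differentiate the scalar grad_fn(theta) . vector with respect to theta.

    Central finite differences replace symbolic differentiation here.
    """
    def directional(point):
        return sum(g * v for g, v in zip(grad_fn(point), vector))

    product = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        product.append((directional(plus) - directional(minus)) / (2 * eps))
    return product


hv = hessian_times_vector_sketch(gradient, [0.5, -1.0], [1.0, 2.0])
print(hv)  # approximately A * v = [4.0, 7.0]
```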
- blocks.theano_expressions.l2_norm(tensors)¶
Computes the total L2 norm of a set of tensors.
Converts all operands to TensorVariable (see as_tensor_variable()).
Parameters: tensors (iterable of TensorVariable (or compatible)) – The tensors.
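Read literally, the "total L2 norm of a set of tensors" is the square root of the sum of all squared entries across every tensor. The following pure-Python sketch illustrates that reading on nested lists. The helper l2_norm_sketch is hypothetical, and it operates on plain numbers rather than on Theano variables.

```python
import math


def l2_norm_sketch(tensors):
    """Square root of the sum of squares over all entries of all tensors."""
    def flatten(value):
        # Walk nested lists/tuples and yield the scalar leaves.
        if isinstance(value, (list, tuple)):
            for item in value:
                for leaf in flatten(item):
                    yield leaf
        else:
            yield value

    return math.sqrt(sum(x * x for tensor in tensors for x in flatten(tensor)))


# A "vector" [3.0] and a "matrix" [[0.0, 4.0]]: sqrt(9 + 16) = 5.
total = l2_norm_sketch([[3.0], [[0.0, 4.0]]])
print(total)  # 5.0
```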
Utilities¶
- blocks.utils.change_recursion_limit(*args, **kwds)¶
Temporarily changes the recursion limit.
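A temporary change of this kind is typically written as a context manager that restores the previous value on exit. The sketch below shows the pattern using only the standard library. The name change_recursion_limit_sketch and its single limit argument are assumptions, since the signature above only shows *args and **kwds.

```python
import contextlib
import sys


@contextlib.contextmanager
def change_recursion_limit_sketch(limit):
    """Set the recursion limit inside a with block, then restore it."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        # Restore the previous limit even if the body raised an exception.
        sys.setrecursionlimit(old_limit)


before = sys.getrecursionlimit()
with change_recursion_limit_sketch(before + 1000):
    inside = sys.getrecursionlimit()
after = sys.getrecursionlimit()
print(inside - before, after == before)  # 1000 True
```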
- blocks.utils.check_theano_variable(variable, n_dim, dtype_prefix)¶
Check number of dimensions and dtype of a Theano variable.
If the input is not a Theano variable, it is converted to one. None input is handled as a special case: no checks are done.
Parameters:
- blocks.utils.dict_subset(dict_, keys, pop=False, must_have=True)¶
Return a subset of a dictionary corresponding to a set of keys.
Parameters: Returns: result – An ordered dictionary of retrieved pairs. The order is the same as in the keys argument.
Return type: OrderedDict
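The behaviour described for dict_subset() can be sketched in a few lines. The parameter descriptions are missing above, so the meaning given here to pop (remove the retrieved pairs from the source dictionary) and must_have (raise KeyError for a missing key instead of skipping it) is an assumption based on the parameter names. dict_subset_sketch is a hypothetical stand-in, not the library function.

```python
from collections import OrderedDict


def dict_subset_sketch(dict_, keys, pop=False, must_have=True):
    """Illustrative re-implementation; see the assumptions stated above."""
    result = OrderedDict()
    for key in keys:
        if key not in dict_:
            if must_have:
                raise KeyError(key)
            continue  # assumed: silently skip missing keys
        # assumed: pop=True removes the retrieved pair from the source dict
        result[key] = dict_.pop(key) if pop else dict_[key]
    return result


config = {'dim': 3, 'name': 'mlp', 'seed': 1}
subset = dict_subset_sketch(config, ['seed', 'dim'])
print(list(subset.items()))  # [('seed', 1), ('dim', 3)], in the order of keys
```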
- blocks.utils.dict_union(*dicts, **kwargs)¶
Return union of a sequence of disjoint dictionaries.
Parameters: - dicts (dicts) – A set of dictionaries with no keys in common. If the first dictionary in the sequence is an instance of OrderedDict, the result will be OrderedDict.
- **kwargs – Keywords and values to add to the resulting dictionary.
Raises: ValueError – If a key appears twice in the dictionaries or keyword arguments.
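The contract above (disjoint inputs, an ordered result when the first dictionary is ordered, ValueError on a repeated key) can be illustrated as follows. dict_union_sketch is a hypothetical re-implementation written from that description, not the library code.

```python
from collections import OrderedDict


def dict_union_sketch(*dicts, **kwargs):
    """Merge dictionaries, refusing any key that appears more than once."""
    dicts = list(dicts)
    ordered = bool(dicts) and isinstance(dicts[0], OrderedDict)
    result = OrderedDict() if ordered else {}
    for dictionary in dicts + [kwargs]:
        duplicates = set(result) & set(dictionary)
        if duplicates:
            raise ValueError("duplicate keys: {}".format(sorted(duplicates)))
        result.update(dictionary)
    return result


union = dict_union_sketch(OrderedDict([('a', 1)]), {'b': 2}, c=3)
print(list(union.items()))  # [('a', 1), ('b', 2), ('c', 3)]

try:
    dict_union_sketch({'a': 1}, a=2)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```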
- blocks.utils.extract_args(expected, *args, **kwargs)¶
Route keyword and positional arguments to a list of names.
A frequent situation is that a method of the class gets to know its positional arguments only when an instance of the class has been created. In such cases the signature of such method has to be *args, **kwargs. The downside of such signatures is that the validity of a call is not checked.
Use extract_args() if your method knows at runtime, but not at evaluation/compile time, what arguments it actually expects, in order to check that they are correctly received.
Parameters: - expected (list of str) – A list of strings denoting names for the expected arguments, in order.
- args (iterable) – Positional arguments that have been passed.
- kwargs (Mapping) – Keyword arguments that have been passed.
Returns: routed_args – An OrderedDict mapping the names in expected to values drawn from either args or kwargs in the usual Python fashion.
Return type: OrderedDict
Raises: - KeyError – If a keyword argument is passed, the key for which is not contained within expected.
- TypeError – If an expected argument is accounted for in both the positional and keyword arguments.
- ValueError – If certain arguments in expected are not assigned a value by either a positional or keyword argument.
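The routing rules and the three error cases listed above can be sketched as below. extract_args_sketch and the Recorder class are hypothetical and written only from this description. The description does not say what happens to surplus positional arguments, so the sketch ignores them.

```python
from collections import OrderedDict


def extract_args_sketch(expected, *args, **kwargs):
    """Route positional and keyword arguments to the expected names."""
    # Positional arguments are matched to the expected names in order.
    # Surplus positional arguments are not covered by the description
    # above; this sketch simply ignores them.
    routed = dict(zip(expected, args))
    for name, value in kwargs.items():
        if name not in expected:
            raise KeyError("unexpected keyword argument {!r}".format(name))
        if name in routed:
            raise TypeError(
                "{!r} given positionally and by keyword".format(name))
        routed[name] = value
    missing = [name for name in expected if name not in routed]
    if missing:
        raise ValueError("no value for {}".format(missing))
    return OrderedDict((name, routed[name]) for name in expected)


class Recorder(object):
    """Hypothetical class whose argument names are fixed per instance."""
    def __init__(self, names):
        self.names = names

    def record(self, *args, **kwargs):
        return extract_args_sketch(self.names, *args, **kwargs)


routed = Recorder(['x', 'y']).record(1, y=2)
print(list(routed.items()))  # [('x', 1), ('y', 2)]
```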
- blocks.utils.ipdb_breakpoint(x)¶
A simple hook function for put_hook() that runs ipdb.
Parameters: x (ndarray) – The value of the hooked variable.
- blocks.utils.is_graph_input(variable)¶
Check if variable is a user-provided graph input.
To be considered an input the variable must have no owner, and not be a constant or shared variable.
Parameters: variable (TensorVariable) –
Returns: True if the variable is a user-provided input to the graph.
Return type: bool
Check if a variable is a Theano shared variable.
Notes
This function excludes shared variables that store the state of Theano random number generators.
- blocks.utils.pack(arg)¶
Pack variables into a list.
Parameters: arg (object) – Either a list or tuple, or any other Python object. Lists will be returned as is, and tuples will be cast to lists. Any other variable will be returned in a singleton list. Returns: List containing the arguments Return type: list
- blocks.utils.print_shape(x, header=None)¶
- blocks.utils.print_sum(x, header=None)¶
- blocks.utils.put_hook(variable, hook_fn, *args)¶
Put a hook on a Theano variable.
Ensures that the hook function is executed every time when the value of the Theano variable is available.
Parameters:
- blocks.utils.repr_attrs(instance, *attrs)¶
Prints a representation of an object with certain attributes.
Parameters: - instance (object) – The object of which to print the string representation
- *attrs – Names of attributes that should be printed.
Examples
>>> class A(object):
...     def __init__(self, value):
...         self.value = value
>>> a = A('a_value')
>>> repr(a)
<blocks.utils.A object at 0x7fb2b4741a10>
>>> repr_attrs(a, 'value')
<blocks.utils.A object at 0x7fb2b4741a10: value=a_value>
- blocks.utils.reraise_as(new_exc)¶
Reraise an exception as a different type or with a message.
This function ensures that the original traceback is kept, making for easier debugging.
Parameters: new_exc (Exception or str) – The new error to be raised, e.g. ValueError("New message"), or a string that will be prepended to the original exception message.
Notes
Note that when reraising exceptions, the arguments of the original exception are cast to strings and appended to the error message. If you want to retain the original exception arguments, please use:
>>> try:
...     1 / 0
... except Exception as e:
...     reraise_as(Exception("Extra information", *e.args))
Traceback (most recent call last):
  ...
Exception: 'Extra information, ...
Examples
>>> class NewException(Exception):
...     def __init__(self, message):
...         super(NewException, self).__init__(message)
>>> try:
...     do_something_crazy()
... except Exception:
...     reraise_as(NewException("Informative message"))
Traceback (most recent call last):
  ...
NewException: Informative message ...
Transform a value into a shared variable of type floatX.
Parameters: - value (ndarray) – The value to associate with the Theano shared.
- name (str, optional) – The name for the shared variable. Defaults to None.
- borrow (bool, optional) – If set to True, the given value will not be copied if possible. This can save memory and speed. Defaults to False.
- dtype (str, optional) – The dtype of the shared variable. Default value is config.floatX.
Returns: A Theano shared variable with the requested value and dtype.
Return type: tensor.TensorSharedVariable
Creates a shared variable array filled with nans.
Parameters: - shape (tuple) – A tuple of integers representing the shape of the array.
- **kwargs – Keyword arguments to pass to the shared_floatx() function.
Returns: A Theano shared variable filled with nans.
Return type: tensor.TensorSharedVariable
Creates a shared variable array filled with zeros.
Parameters: - shape (tuple) – A tuple of integers representing the shape of the array.
- **kwargs – Keyword arguments to pass to the shared_floatx() function.
Returns: A Theano shared variable filled with zeros.
Return type: tensor.TensorSharedVariable
Construct a shared variable to hold the value of a tensor variable.
Parameters:
- blocks.utils.unpack(arg, singleton=False)¶
Unpack variables from a list or tuple.
Parameters: - arg (object) – Either a list or tuple, or any other Python object. If passed a list or tuple of length one, the only element of that list will be returned. If passed a tuple of length greater than one, it will be cast to a list before returning. Any other variable will be returned as is.
- singleton (bool) – If True, arg is expected to be a singleton (a list or tuple with exactly one element) and an exception is raised if this is not the case. False by default.
Returns: A list of length greater than one, or any other Python object except tuple.
Return type: object
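The pack() and unpack() helpers documented in this section are near inverses of each other. The sketch below re-implements them from their descriptions to make the round trip concrete. pack_sketch and unpack_sketch are hypothetical, and the use of ValueError for a failed singleton check is an assumption, since the exception type is not stated above.

```python
def pack_sketch(arg):
    """Return lists as is, cast tuples to lists, wrap anything else."""
    if isinstance(arg, list):
        return arg
    if isinstance(arg, tuple):
        return list(arg)
    return [arg]


def unpack_sketch(arg, singleton=False):
    """Return the only element of a one-element list or tuple."""
    if isinstance(arg, (list, tuple)):
        if len(arg) == 1:
            return arg[0]
        if singleton:
            # Assumed exception type; the documentation does not name it.
            raise ValueError("expected a singleton, got {!r}".format(arg))
        return list(arg)
    return arg


print(pack_sketch('x'))           # ['x']
print(pack_sketch(('x', 'y')))    # ['x', 'y']
print(unpack_sketch(['x']))       # x
print(unpack_sketch(('x', 'y')))  # ['x', 'y']
```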
Development¶
We want to encourage everyone to contribute to the development of Blocks. To ensure the codebase is of high quality, we ask all new developers to have a quick read through these rules to make sure that any code you contribute will be easy to merge!

Formatting guidelines¶
Blocks follows the PEP8 style guide closely, so please make sure you are familiar with it. Our Travis CI buildbot runs flake8 as part of every build, which checks for PEP8 compliance (using the pep8 tool) and for some common coding errors using pyflakes. You might want to install and run flake8 on your code before submitting a PR to make sure that your build doesn’t fail because of e.g. a bit of extra whitespace.
Note that passing flake8 does not necessarily mean that your code is PEP8 compliant! Some guidelines which aren’t checked by flake8:
- Imports should be grouped into standard library, third party, and local imports with a blank line in between groups.
- Variable names should be explanatory and unambiguous.
There are also some style guideline decisions that were made specifically for Blocks:
- Do not rename imports i.e. do not use import theano.tensor as T or import numpy as np.
- Direct imports, import ..., precede from ... import ... statements.
- Imports are otherwise listed alphabetically.
- Don’t recycle variable names (i.e. don’t use the same variable name to refer to different things in a particular part of code), especially when they are arguments to functions.
- Group trivial attribute assignments from arguments and keyword arguments together, and separate them from remaining code with a blank line. Avoid the use of implicit methods such as self.__dict__.update(locals()).
class Foo(object):
    def __init__(self, foo, bar, baz=None, **kwargs):
        super(Foo, self).__init__(**kwargs)
        if baz is None:
            baz = []

        self.foo = foo
        self.bar = bar
        self.baz = baz
Code guidelines¶
Some guidelines to keep in mind when coding for Blocks. Some of these are simply preferences, others stem from particular requirements we have e.g. in order to serialize training progress, support Python 2 and 3 simultaneously, etc.
Validating function arguments¶
In general, be Pythonic and rely on duck typing.
When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.
—James Whitcomb Riley
That is, avoid trivial checks such as
isinstance(var, numbers.Integral)
isinstance(var, (tuple, list))
in cases where any number (like a float without a fractional part or a NumPy scalar) or iterable (like a dictionary view, custom iterator) would work too.
If you need to perform some sort of input validation, don’t use assert statements. Raise a ValueError instead. assert statements should only be used for sanity tests i.e. they should never be triggered, unless there is a bug in the code.
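As an illustration of these two rules, the hypothetical function below accepts any iterable and any number that supports the operations it needs, and reports a bad argument with ValueError rather than with an assert statement. The function take_every is invented for this example and is not part of Blocks.

```python
def take_every(items, step):
    """Return every step-th item of any iterable, starting with the first."""
    # Validate the value, not the type: a float such as 2.0 or a NumPy
    # integer works just as well as a built-in int.
    if step < 1:
        raise ValueError("step must be at least 1, got {}".format(step))
    # No isinstance check on items either: lists, tuples, dictionary views
    # and generators are all acceptable.
    return [item for index, item in enumerate(items) if index % step == 0]


print(take_every(range(5), 2.0))                     # [0, 2, 4]
print(take_every({'a': 1, 'b': 2, 'c': 3}.keys(), 2))
print(take_every((n * n for n in range(4)), 3))      # [0, 9]
```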
Abstract classes¶
If a class is an abstract base class, use Python’s abc to mark it as such.
from abc import ABCMeta

from six import add_metaclass

@add_metaclass(ABCMeta)
class Abstract(object):
    pass
Our documentation generator (Sphinx with the autodoc extension, running on Read the Docs) doesn’t recognize classes which inherit the ABCMeta metaclass as abstract and will try to instantiate them, causing errors when building documentation. To prevent this, make sure to always use the add_metaclass decorator, regardless of the parent.
Python 2 and 3¶
Blocks aims to be both Python 2 and Python 3 compliant using a single code-base, without using 2to3. There are many online resources which discuss the writing of compatible code. For a quick overview see the cheatsheet from Python Charmers. For non-trivial cases, we use the six compatibility library.
Documentation should be written to be Python 3 compliant.
Reraising exceptions¶
When catching exceptions, use the reraise_as() function to reraise the exception (optionally with a new message or as a different type). Not doing so clobbers the original traceback, making it impossible to use pdb to debug the problems.
Serialization¶
To ensure the reproducibility of scientific experiments Blocks tries to make sure that stopping and resuming training doesn’t affect the final results. In order to do so it takes a radical approach, serializing the entire training state using pickle. Some things cannot be pickled, so their use should be avoided when the object will be pickled as part of the main loop:
- Lambda functions
- Iterators and generators (use picklable_itertools)
- References to methods as attributes
- Any variable that lies outside of the global namespace e.g. nested functions
- Dynamically generated classes (possible but complicated)
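The first two items in the list above are easy to check with the standard library alone. The sketch below uses a hypothetical helper, is_picklable, to show a lambda and a generator failing to pickle, while a functools.partial of a built-in function and a plain list succeed. It does not use picklable_itertools or any part of Blocks.

```python
import functools
import operator
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps() accepts the object."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A lambda has no importable name, so it cannot be pickled by reference.
print(is_picklable(lambda x: 2 * x))                      # False
# A partial application of a built-in function can be pickled.
print(is_picklable(functools.partial(operator.mul, 2)))   # True
# Generators cannot be pickled; a materialized list can.
print(is_picklable(i * i for i in range(3)))              # False
print(is_picklable([i * i for i in range(3)]))            # True
```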
Mutable types as keyword argument defaults¶
A common source of mysterious bugs is the use of mutable types as defaults for keyword arguments.
class Foo(object):
    def __init__(self, bar=[]):
        bar.append('baz')
        self.bar = bar
Initializing two instances of this class results in two objects sharing the same attribute bar with the value ['baz', 'baz'], which is often not what was intended. Instead, use:
class Foo(object):
    def __init__(self, bar=None):
        if bar is None:
            bar = []
        bar.append('baz')
        self.bar = bar
Writing error messages¶
Comprehensive error messages can be a great way to inform users of what could have gone wrong. However, lengthy error messages can clutter code, and implicitly concatenated strings over multiple lines are frustrating to edit. To prevent this, use a separate triple-quoted string with escaped newlines to store the detailed explanation of your error. Keep a terse error message directly in the code though, so that someone reading the code still knows what the error is being raised for.
informative_error = """
You probably passed the wrong keyword argument, which caused this error. \
Please pass `b` instead of `{value}`, and have a look at the documentation \
of the `is_b` method for details."""

def is_b(value):
    """Raises an error if the value is not 'b'."""
    if value != 'b':
        # The template uses the named field {value}, so pass it by keyword.
        raise ValueError("wrong value" + informative_error.format(value=value))
    return value
Unit testing¶
Blocks uses unit testing to ensure that individual parts of the library behave as intended. It’s also essential in ensuring that parts of the library are not broken by proposed changes.
All new code should be accompanied by extensive unit tests. Whenever a pull request is made, the full test suite is run on Travis CI, and pull requests are not merged until all tests pass. Coverage analysis is performed using coveralls. Please make sure that at the very least your unit tests cover the core parts of your committed code. In the ideal case, all of your code should be unit tested.
If you are fixing a bug, please be sure to add a unit test to make sure that the bug does not get re-introduced later on.
The test suite can be executed locally using nose2 [1].
[1] For all tests but the doctests, nose can also be used.
Writing and building documentation¶
The documentation guidelines outline how to write documentation for Blocks, and how to build a local copy of the documentation for testing purposes.
Internal API¶
The development API reference contains documentation on the internal classes that Blocks uses. If you are not planning on contributing to Blocks, have a look at the user API reference instead.
Installation¶
See the instructions at the bottom of the installation page.
Sending a pull request¶
See our pull request workflow for a refresher on the general recipe for sending a pull request to Blocks.
Internal API¶
Bricks¶
- class blocks.bricks.base.Application(application_function)¶
Bases: object
An application method belonging to a particular type of brick.
The application methods of each Brick class are automatically replaced by an instance of Application. This allows us to store metadata about particular application methods (such as their in- and outputs) easily.
- delegate_function¶
callable
A function that takes a Brick instance as an argument and returns a BoundApplication object to which attribute requests should be routed.
- properties¶
A dictionary of property getters that should be called when an attribute with the given name is requested.
- instances¶
dict (Brick, BoundApplication)
A record of bound application instances created by the descriptor protocol.
- call_stack¶
The call stack of brick application methods. Used to check whether the current call was made by a parent brick.
- brick¶
type
The brick class to which this instance belongs.
Raises: - ValueError – If a brick’s application method is applied by another brick which does not list the former as a child.
- ValueError – If the application method’s inputs and/or outputs don’t match with the function signature or the values returned (respectively).
Notes
When a Brick is instantiated and its application method (i.e. an instance of this class) requested, the descriptor protocol (through the __get__() method) automatically instantiates a BoundApplication class and returns this. This bound application class can be used to store application information particular to a brick instance. Any attributes unknown to the bound application are automatically routed to the application that instantiated it.
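The mechanism in these notes can be sketched in plain Python: a descriptor whose __get__() creates and caches one bound object per instance, and a bound object whose __getattr__() falls back to the descriptor for unknown attributes. All class names below (ApplicationSketch, BoundApplicationSketch, ToyBrick) are hypothetical, and the sketch leaves out everything else the real classes do.

```python
class BoundApplicationSketch(object):
    """An application tied to one particular instance."""

    def __init__(self, application, brick):
        self.application = application
        self.brick = brick

    def __getattr__(self, name):
        # Reached only when normal lookup fails: route to the application.
        if name == 'application':
            raise AttributeError(name)
        return getattr(self.application, name)

    def __call__(self, *args, **kwargs):
        return self.application.function(self.brick, *args, **kwargs)


class ApplicationSketch(object):
    """Descriptor that replaces a method and hands out bound applications."""

    def __init__(self, function):
        self.function = function
        self.name = function.__name__
        self.instances = {}  # instance -> BoundApplicationSketch

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class: return the descriptor
        if instance not in self.instances:
            self.instances[instance] = BoundApplicationSketch(self, instance)
        return self.instances[instance]


class ToyBrick(object):
    @ApplicationSketch
    def apply(self, x):
        return x + 1


brick = ToyBrick()
print(brick.apply(1))             # 2
print(brick.apply is brick.apply)  # True, the bound object is cached
print(brick.apply.name)           # apply, routed to the descriptor
```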
- __get__(instance, owner)¶
Instantiate BoundApplication for each Brick.
- application_function¶
- apply(bound_application, *args, **kwargs)¶
- call_stack = []
- delegate(f)¶
Decorator to assign a delegate application.
An application method can assign a delegate application. Whenever an attribute is not available, it will be requested from the delegate instead.
Examples
>>> class Foo(Brick):
...     @application(outputs=['baz'])
...     def apply(self, x):
...         return x + 1
...
...     @apply.property('inputs')
...     def apply_inputs(self):
...         return ['foo', 'bar']
>>> class Bar(Brick):
...     def __init__(self, foo):
...         self.foo = foo
...
...     @application(outputs=['foo'])
...     def apply(self, x):
...         return x + 1
...
...     @apply.delegate
...     def apply_delegate(self):
...         return self.foo.apply
>>> foo = Foo()
>>> bar = Bar(foo)
>>> bar.apply.outputs
['foo']
>>> bar.apply.inputs
['foo', 'bar']
- inputs¶
- name¶
- property(name)¶
Decorator to make application properties.
Parameters: name (str) – The name the property should take.
Examples
>>> class Foo(Brick):
...     @application
...     def apply(self, x):
...         return x + 1
...
...     @apply.property('inputs')
...     def apply_inputs(self):
...         return ['foo', 'bar']
>>> foo = Foo()
>>> foo.apply.inputs
['foo', 'bar']
- class blocks.bricks.base.ApplicationCall(application)¶
Bases: blocks.graph.Annotation
A link between the variable tags and bricks.
The application call can be used to attach to an apply call auxiliary variables (e.g. monitors or regularizers) that do not form part of the main computation graph.
The application call object is created before the call to the application method and can be accessed by specifying an application_call argument.
Also see Annotation.
Parameters: application (BoundApplication instance) – The bound application object (i.e. one belonging to a brick instance) being called.
Examples
>>> class Foo(Brick):
...     @application
...     def apply(self, x, application_call):
...         application_call.add_auxiliary_variable(x.mean())
...         return x + 1
>>> x = tensor.vector()
>>> y = Foo().apply(x)
>>> from blocks.filter import get_application_call
>>> get_application_call(y)
<blocks.bricks.base.ApplicationCall object at ...>
- add_auxiliary_variable(variable, roles=None, name=None)¶
- class blocks.bricks.base.BoundApplication(application, brick)¶
Bases: object
An application method bound to a Brick instance.
- name¶
- class blocks.bricks.base.Brick(name=None)¶
Bases: blocks.graph.Annotation
A brick encapsulates Theano operations with parameters.
A brick goes through the following stages:
1. Construction: The call to __init__() constructs a Brick instance with a name and creates any child bricks as well.
2. Allocation of parameters:
   a. Allocation configuration of children: The push_allocation_config() method configures any children of this brick.
   b. Allocation: The allocate() method allocates the shared Theano variables required for the parameters. Also allocates parameters for all children.
3. The following can be done in either order:
   a. Application: By applying the brick to a set of Theano variables a part of the computational graph of the final model is constructed.
   b. The initialization of parameters:
      1. Initialization configuration of children: The push_initialization_config() method configures any children of this brick.
      2. Initialization: This sets the initial values of the parameters by a call to initialize(), which is needed to call the final compiled Theano function. Also initializes all children.
Not all stages need to be called explicitly. Step 3(a) will automatically allocate the parameters if needed. Similarly, step 3(b.2) and 2(b) will automatically perform steps 3(b.1) and 2(a) if needed. They only need to be called separately if greater control is required. The only two methods which always need to be called are an application method to construct the computational graph, and the initialize() method in order to initialize the parameters.
At each different stage, a brick might need a certain set of configuration settings. All of these settings can be passed to the __init__() constructor. However, by default many bricks support lazy initialization. This means that the configuration settings can be set later.
Note
Some arguments to __init__() are always required, even when lazy initialization is enabled. Other arguments must be given before calling allocate(), while others yet only need to be given in order to call initialize(). Always read the documentation of each brick carefully.
Lazy initialization can be turned off by setting Brick.lazy = False. In this case, there is no need to call initialize() manually anymore, but all the configuration must be passed to the __init__() method.
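The interplay between construction, lazy configuration, allocation and initialization can be mimicked with a toy class. The sketch below is not a brick: LifecycleSketch is a hypothetical class that stores plain lists instead of Theano shared variables, and it only illustrates that initialization triggers allocation when that has not happened yet.

```python
class LifecycleSketch(object):
    """Toy stand-in for a brick; it holds plain lists, not Theano variables."""

    def __init__(self, dim=None):
        self.dim = dim  # configuration that may be supplied after construction
        self.parameters = []
        self.allocated = False
        self.initialized = False

    def allocate(self):
        if self.dim is None:
            raise ValueError("dim must be set before allocation")
        self.parameters = [[None] * self.dim]  # storage only, no values yet
        self.allocated = True

    def initialize(self):
        if not self.allocated:
            self.allocate()  # initialization triggers allocation when needed
        self.parameters[0][:] = [0.0] * self.dim
        self.initialized = True


brick = LifecycleSketch()  # construction without full configuration
brick.dim = 3              # configuration supplied later
brick.initialize()         # allocates first, then sets the values
print(brick.parameters)    # [[0.0, 0.0, 0.0]]
```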
Parameters: name (str, optional) – The name of this brick. This can be used to filter the application of certain modifications by brick names. By default, the brick receives the name of its class (lowercased).
- name¶
str
The name of this brick.
- print_shapes¶
bool
False by default. If True it logs the shapes of all the input and output variables, which can be useful for debugging.
- parameters¶
list of TensorSharedVariable and None
After calling the allocate() method this attribute will be populated with the shared variables storing this brick’s parameters. Allows for None so that parameters can always be accessed at the same index, even if some parameters are only defined given a particular configuration.
- children¶
list of bricks
The children of this brick.
- allocated¶
bool
False if allocate() has not been called yet. True otherwise.
- initialized¶
bool
False if initialize() has not been called yet. True otherwise.
- allocation_config_pushed¶
bool
False if allocate() or push_allocation_config() hasn’t been called yet. True otherwise.
- initialization_config_pushed¶
bool
False if initialize() or push_initialization_config() hasn’t been called yet. True otherwise.
Notes
To provide support for lazy initialization, apply the lazy() decorator to the __init__() method.
Brick implementations must call the __init__() constructor of their parent using super(BlockImplementation, self).__init__(**kwargs) at the beginning of the overriding __init__.
The methods _allocate() and _initialize() need to be overridden if the brick needs to allocate shared variables and initialize their values in order to function.
A brick can have any number of methods which apply the brick on Theano variables. These methods should be decorated with the application() decorator.
If a brick has children, they must be listed in the children attribute. Moreover, if the brick wants to control the configuration of its children, the _push_allocation_config() and _push_initialization_config() methods need to be overridden.
Examples
Most bricks have lazy initialization enabled.
>>> import theano
>>> from blocks.initialization import IsotropicGaussian, Constant
>>> from blocks.bricks import Linear
>>> linear = Linear(input_dim=5, output_dim=3,
...                 weights_init=IsotropicGaussian(),
...                 biases_init=Constant(0))
>>> x = theano.tensor.vector()
>>> linear.apply(x)  # Calls linear.allocate() automatically
linear_apply_output
>>> linear.initialize()  # Initializes the weight matrix
- _allocate()¶
Brick implementation of parameter allocation.
Implement this if your brick needs to allocate its parameters.
Warning
This method should never be called directly. Call allocate() instead.
- _initialize()¶
Brick implementation of parameter initialization.
Implement this if your brick needs to initialize its parameters.
Warning
This method should never be called directly. Call initialize() instead.
- _push_allocation_config()¶
Brick implementation of configuring child before allocation.
Implement this if your brick needs to set the configuration of its children before allocation.
Warning
This method should never be called directly. Call push_allocation_config() instead.
- _push_initialization_config()¶
Brick implementation of configuring child before initialization.
Implement this if your brick needs to set the configuration of its children before initialization.
Warning
This method should never be called directly. Call push_initialization_config() instead.
- allocate()¶
Allocate shared variables for parameters.
Based on the current configuration of this Brick create Theano shared variables to store the parameters. After allocation, parameters are accessible through the parameters attribute.
This method calls the allocate() method of all children first, allowing the _allocate() method to override the parameters of the children if needed.
Raises: ValueError – If the configuration of this brick is insufficient to determine the number of parameters to be allocated or their dimensionality.
Notes
This method sets the parameters attribute to an empty list. This is in order to ensure that calls to this method completely reset the parameters.
- children
- get_dim(name)¶
Get dimension of an input/output variable of a brick.
Parameters: name (str) – The name of the variable.
- get_dims(names)¶
Get list of dimensions for a set of input/output variables.
Parameters: names (list) – The variable names. Returns: dims – The dimensions of the sources. Return type: list
- get_unique_path()¶
Returns unique path to this brick in the application graph.
- initialize()¶
Initialize parameters.
Initialize parameters, such as weight matrices and biases.
Notes
If the brick has not allocated its parameters yet, this method will call the allocate() method in order to do so.
- parameters
- print_shapes = False
- push_allocation_config()¶
Push the configuration for allocation to child bricks.
Bricks can configure their children, based on their own current configuration. This will be automatically done by a call to allocate(), but if you want to override the configuration of child bricks manually, then you can call this function manually.
- push_initialization_config()¶
Push the configuration for initialization to child bricks.
Bricks can configure their children, based on their own current configuration. This will be automatically done by a call to initialize(), but if you want to override the configuration of child bricks manually, then you can call this function manually.
- class blocks.bricks.base.Children(brick, *args, **kwargs)¶
Bases: blocks.utils.containers.AnnotatingList
Adds the brick to the list of parents of its children.
- _delitem(key)¶
- _setitem(key, value)¶
- class blocks.bricks.base.Parameters(brick, *args, **kwargs)¶
Bases: blocks.utils.containers.AnnotatingList
Adds the PARAMETER role to parameters automatically.
- _setitem(key, value)¶
- class blocks.bricks.base._Brick¶
Bases: abc.ABCMeta
Metaclass which attaches brick instances to the applications.
In addition, picklability of Application objects is ensured. This means that Application objects cannot be added to a brick class after it is created. To allow adding application methods programmatically, the following hook is supported: the class namespace is searched for a decorators attribute, which can contain a list of functions to be applied to the namespace of the class being created. These functions can arbitrarily modify this namespace.
- blocks.bricks.base._variable_name(brick_name, application_name, name)¶
- blocks.bricks.base.application(*args, **kwargs)¶
Decorator for methods that apply a brick to inputs.
Parameters: - *args (optional) – The application method to wrap.
- **kwargs (optional) – Attributes to attach to this application.
Notes
This decorator replaces application methods with Application instances. It also sets the attributes given as keyword arguments to the decorator.
Note that this decorator purposely does not wrap the original method using e.g. wraps() or update_wrapper(), since that would make the class impossible to pickle (see notes at Application).
Examples
>>> class Foo(Brick):
...     @application(inputs=['x'], outputs=['y'])
...     def apply(self, x):
...         return x + 1
...     @application
...     def other_apply(self, x):
...         return x - 1
>>> foo = Foo()
>>> Foo.apply.inputs
['x']
>>> foo.apply.outputs
['y']
>>> Foo.other_apply
<blocks.bricks.base.Application object at ...>
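The two usage forms in the example above (bare @application and @application(inputs=..., outputs=...)) follow a common decorator pattern. The sketch below is a simplified stand-in, not Blocks' actual implementation: SimpleApplication and simple_application are hypothetical names, and the real Application class does considerably more (binding to brick instances, pickling support).

```python
class SimpleApplication(object):
    """Hypothetical stand-in that stores a function plus attributes."""

    def __init__(self, function, **attributes):
        self.function = function
        for key, value in attributes.items():
            setattr(self, key, value)

    def __call__(self, *args, **kwargs):
        return self.function(*args, **kwargs)


def simple_application(*args, **kwargs):
    """Usable as ``@simple_application`` or ``@simple_application(k=v)``."""
    if args and kwargs:
        raise ValueError("pass either a function or keyword attributes")
    if len(args) == 1 and callable(args[0]):
        # Bare form: the decorated function is passed in directly.
        return SimpleApplication(args[0])
    if args:
        raise ValueError("unexpected positional arguments")

    # Keyword form: return a decorator that attaches the attributes.
    def wrap(function):
        return SimpleApplication(function, **kwargs)
    return wrap


@simple_application(inputs=['x'], outputs=['y'])
def add_one(x):
    return x + 1


@simple_application
def subtract_one(x):
    return x - 1
```

Both decorated names are now SimpleApplication instances; only add_one carries the inputs and outputs attributes.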
- blocks.bricks.base.args_to_kwargs(args, f)¶
- blocks.bricks.base.create_unbound_method(func, cls)¶
Create an unbound method from a function and a class.
Notes
- blocks.bricks.base.lazy(allocation=None, initialization=None)¶
Makes the initialization lazy.
This decorator allows the user to define positional arguments which will not be needed until the allocation or initialization stage of the brick. If these arguments are not passed, it will automatically replace them with a custom None object. It is assumed that the missing arguments can be set after initialization by setting attributes with the same name.
Parameters: Examples
>>> class SomeBrick(Brick):
...     @lazy(allocation=['a'], initialization=['b'])
...     def __init__(self, a, b, c='c', d=None):
...         print(a, b, c, d)
>>> brick = SomeBrick('a')
a NoneInitialization c None
>>> brick = SomeBrick(d='d', b='b')
NoneAllocation b c d
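To make the behaviour in the example above more concrete, here is a minimal sketch of how such a decorator could fill in missing arguments with placeholder objects. It is an illustration under assumptions, not the code Blocks uses: simple_lazy and NoneSentinel are hypothetical names, and the sketch relies on inspect.signature, which requires Python 3.

```python
import functools
import inspect


class NoneSentinel(object):
    """Hypothetical placeholder for an argument that is not yet set."""

    def __init__(self, stage):
        self.stage = stage

    def __repr__(self):
        return 'None' + self.stage


NONE_ALLOCATION = NoneSentinel('Allocation')
NONE_INITIALIZATION = NoneSentinel('Initialization')


def simple_lazy(allocation=(), initialization=()):
    """Fill missing lazy arguments with placeholder objects."""
    placeholders = {name: NONE_ALLOCATION for name in allocation}
    placeholders.update(
        {name: NONE_INITIALIZATION for name in initialization})

    def decorator(init):
        # Parameter names of __init__, excluding ``self``.
        names = list(inspect.signature(init).parameters)[1:]

        @functools.wraps(init)
        def wrapper(self, *args, **kwargs):
            # Names bound positionally or by keyword in this call.
            given = set(names[:len(args)]) | set(kwargs)
            for name, placeholder in placeholders.items():
                if name not in given:
                    kwargs[name] = placeholder
            return init(self, *args, **kwargs)
        return wrapper
    return decorator


class SomeThing(object):
    @simple_lazy(allocation=['a'], initialization=['b'])
    def __init__(self, a, b, c='c', d=None):
        self.a, self.b, self.c, self.d = a, b, c, d
```

The placeholders can later be replaced by assigning to the attribute of the same name, for example thing.b = 'b'.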
- blocks.bricks.base.rename_function(function, new_name)¶
- class blocks.bricks.Activation(name=None)¶
Bases: blocks.bricks.base.Brick
Elementwise application of activation function.
- _abc_cache = <_weakrefset.WeakSet object at 0x7f2d16d07a90>¶
- _abc_negative_cache = <_weakrefset.WeakSet object at 0x7f2d16d07b10>¶
- _abc_negative_cache_version = 28¶
- _abc_registry = <_weakrefset.WeakSet object at 0x7f2d16d07a10>¶
- class blocks.bricks.ActivationDocumentation¶
Bases: blocks.bricks.base._Brick
Dynamically adds documentation to activations.
Notes
Extensions¶
- class blocks.extensions.predicates.OnLogRecord(record_name)¶
Bases: object
Trigger a callback when a certain log record is found.
Parameters: record_name (str) – The record name to check.
- class blocks.monitoring.evaluators.AggregationBuffer(variables, use_take_last=False)¶
Bases: object
Intermediate results of aggregating values of Theano variables.
Encapsulates aggregators for a list of Theano variables. Collects the respective updates and provides initialization and readout routines.
Parameters: - variables (list of TensorVariable) – The variable names are used as record names in the logs. Hence, all the variable names must be different.
- use_take_last (bool) – When True, the TakeLast aggregation scheme is used instead of _DataIndependent for those variables that do not require data to be computed.
- initialization_updates¶
list of tuples
Initialization updates of the aggregators.
- accumulation_updates¶
list of tuples
Accumulation updates of the aggregators.
- readout_variables¶
dict
A dictionary of record names to TensorVariable representing the aggregated values.
- inputs¶
list of TensorVariable
The list of inputs needed for accumulation.
- _compile()¶
Compiles Theano functions.
Todo
The current compilation method does not account for updates attached to ComputationGraph elements. Compiling should be out-sourced to ComputationGraph to deal with it.
- _create_aggregators()¶
Create aggregators and collect updates.
- get_aggregated_values()¶
Readout the aggregated values.
- initialize_aggregators()¶
Initialize the aggregators.
- class blocks.monitoring.evaluators.DatasetEvaluator(variables, updates=None)¶
Bases: object
A DatasetEvaluator evaluates many Theano variables or other quantities.
The DatasetEvaluator provides a do-it-all method, evaluate(), which computes values of variables on a dataset.
Alternatively, methods initialize_aggregators(), process_batch(), get_aggregated_values() can be used with a custom loop over data.
The values computed on subsets of the given dataset are aggregated using the AggregationSchemes provided in the aggregation_scheme tags. If no tag is given, the value is averaged over minibatches. However, care is taken to ensure that variables which do not depend on data are not unnecessarily recomputed.
Parameters: - variables (list of TensorVariable and MonitoredQuantity) – The variable names are used as record names in the logs. Hence, all the names must be different.
Each variable can be tagged with an AggregationScheme that specifies how the value can be computed for a data set by aggregating minibatches.
- updates (list of tuples or OrderedDict or None) – TensorSharedVariable updates to be performed during evaluation. This parameter is only for Theano variables. Be careful not to update any model parameters as this is not intended to alter your model in any meaningful way. A typical use case of this option arises when the Theano function used for evaluation contains a call to theano.scan(), which might have returned shared variable updates.
- _compile()¶
Compiles Theano functions.
Todo
The current compilation method does not account for updates attached to ComputationGraph elements. Compiling should be out-sourced to ComputationGraph to deal with it.
- evaluate(data_stream)¶
Compute the variables over a data stream.
Parameters: data_stream (instance of DataStream) – The data stream. Only the first epoch of data is used.
Returns: A mapping from record names to the values computed on the provided dataset.
- get_aggregated_values()¶
- initialize_aggregators()¶
- process_batch(batch)¶
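The custom-loop protocol described for DatasetEvaluator (initialize_aggregators(), then process_batch() for every batch, then get_aggregated_values()) can be illustrated without Theano. The sketch below is a hypothetical pure-Python stand-in, MeanOverBatches, that only mimics the default behaviour of averaging a value over minibatches; it does not use or reproduce the Blocks classes.

```python
class MeanOverBatches(object):
    """Hypothetical stand-in that averages named values over minibatches."""

    def __init__(self, names):
        self.names = list(names)
        self.initialize_aggregators()

    def initialize_aggregators(self):
        # Reset the running sums and the batch counter.
        self.sums = {name: 0.0 for name in self.names}
        self.num_batches = 0

    def process_batch(self, batch_values):
        # ``batch_values`` maps each record name to its value on one batch.
        for name in self.names:
            self.sums[name] += batch_values[name]
        self.num_batches += 1

    def get_aggregated_values(self):
        # Average over minibatches; every batch gets equal weight.
        return {name: total / self.num_batches
                for name, total in self.sums.items()}


evaluator = MeanOverBatches(['cost'])
evaluator.initialize_aggregators()
for batch_values in [{'cost': 1.0}, {'cost': 2.0}, {'cost': 3.0}]:
    evaluator.process_batch(batch_values)
result = evaluator.get_aggregated_values()
```

Note that an unweighted mean over minibatches differs from a mean over examples when batches have different sizes.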
- class blocks.monitoring.evaluators.MonitoredQuantityBuffer(quantities)¶
Bases: object
Intermediate results of aggregating values of monitored quantities.
Accumulates results for a list of monitored quantities for every single batch. Provides initialization and readout routines to initialize each quantity and capture its accumulated results.
Parameters: quantities (list of MonitoredQuantity) – The quantity names are used as record names in the logs. Hence, all the quantity names must be different.
- requires¶
list of TensorVariable
Needed to calculate monitored quantities.
- quantity_names¶
list of str
Names of quantities.
- inputs¶
list of TensorVariable
The list of inputs needed for variables in requires.
- accumulate_quantities(numerical_values)¶
Accumulate the results for every batch.
- get_aggregated_values()¶
Readout the accumulated values.
- initialize()¶
Initialize the quantities.
Utils¶
- class blocks.utils.containers.AnnotatingList(items=None)¶
Bases: _abcoll.MutableSequence
Mutable sequence performing operations on inserted/removed items.
Parameters: items (iterable, optional) – An iterable of items to initialize the sequence with.
- _abc_cache = <_weakrefset.WeakSet object at 0x7f2d16d7d110>¶
- _abc_negative_cache = <_weakrefset.WeakSet object at 0x7f2d16d7d190>¶
- _abc_negative_cache_version = 28¶
- _abc_registry = <_weakrefset.WeakSet object at 0x7f2d16d7d090>¶
- _delitem(key)¶
The operation to perform when an item is deleted.
- _setitem(key, value)¶
The operation to perform when an item is inserted/appended.
- insert(key, value)¶
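The hook mechanism of AnnotatingList (subclasses override _setitem and _delitem to act on items as they are inserted or removed) can be sketched as follows. This is an illustrative reimplementation under assumptions, not the Blocks source: HookedList and ParentTrackingList are hypothetical names, and plain dictionaries stand in for bricks.

```python
from collections.abc import MutableSequence


class HookedList(MutableSequence):
    """Hypothetical list that calls hooks when items are added or removed."""

    def __init__(self, items=None):
        self._items = []
        for item in items or []:
            self.append(item)

    def _setitem(self, key, value):
        """Hook called before an item is inserted or replaced."""

    def _delitem(self, key):
        """Hook called before an item is deleted."""

    def __getitem__(self, key):
        return self._items[key]

    def __setitem__(self, key, value):
        self._setitem(key, value)
        self._items[key] = value

    def __delitem__(self, key):
        self._delitem(key)
        del self._items[key]

    def __len__(self):
        return len(self._items)

    def insert(self, key, value):
        self._setitem(key, value)
        self._items.insert(key, value)


class ParentTrackingList(HookedList):
    """Records an owner on every inserted child, loosely like Children."""

    def __init__(self, owner, items=None):
        self.owner = owner
        super().__init__(items)

    def _setitem(self, key, value):
        value.setdefault('parents', []).append(self.owner)


children = ParentTrackingList('parent_brick')
children.append({'name': 'child'})
```

Because MutableSequence implements append() and extend() in terms of insert(), overriding the two hooks is enough to cover every way of adding items.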
- class blocks.utils.profile.Profile¶
Bases: object
A profile of hierarchical timers.
Keeps track of timings performed with Timer. It also keeps track of the way these timings were nested and makes use of this information when reporting.
- enter(name)¶
- exit(t)¶
- report(f=sys.stderr)¶
Print a report of timing information, to standard error by default.
Parameters: f (object, optional) – An object with a write method that accepts string inputs. Can be a file object, sys.stdout, etc. Defaults to sys.stderr.
- class blocks.utils.profile.Timer(name, profile)¶
Bases: object
A context manager that times the execution of the code within it.
This timer is attached to a Profile object that it reports timings to. The Profile object accumulates the timings. Timers can be nested, which the Profile will automatically keep track of and use in its reporting.
Parameters: Notes
Timings are reported using timeit.default_timer().
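The interplay between Timer and Profile (nested context managers reporting to a shared object through enter() and exit()) can be sketched with the standard library alone. SimpleProfile and SimpleTimer below are hypothetical simplifications written for illustration; the way Blocks stores and reports timings may differ.

```python
from collections import defaultdict
from timeit import default_timer


class SimpleProfile(object):
    """Hypothetical profile that accumulates timings by nested section."""

    def __init__(self):
        self.total = defaultdict(float)
        self.current = []  # Stack with the names of the active timers.

    def enter(self, name):
        self.current.append(name)

    def exit(self, elapsed):
        # Key the timing by the full nesting path, then leave the section.
        self.total[tuple(self.current)] += elapsed
        self.current.pop()


class SimpleTimer(object):
    """Context manager that reports its elapsed time to a profile."""

    def __init__(self, name, profile):
        self.name = name
        self.profile = profile

    def __enter__(self):
        self.profile.enter(self.name)
        self.start = default_timer()

    def __exit__(self, *exc_info):
        self.profile.exit(default_timer() - self.start)


profile = SimpleProfile()
with SimpleTimer('training', profile):
    with SimpleTimer('epoch', profile):
        sum(range(1000))
```

After this runs, profile.total holds one entry for ('training',) and one for ('training', 'epoch'), which is the nesting information a report can use.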
- class blocks.serialization.PersistentParameterID(zip_file, allow_unnamed=True, allow_duplicates=True)¶
Bases: theano.misc.pkl_utils.PersistentSharedVariableID
Persist the names of parameter arrays in the zip file.
Only Theano shared variables are persisted to the zip file using this method. Names are determined using the brick hierarchy, or the shared variable name.
Parameters: - allow_unnamed (bool, optional) – Allow shared variables without a name to be persisted. Defaults to True.
- allow_duplicates (bool, optional) – Allow multiple shared variables to have the same name, in which case they will be numbered e.g. x, x_2, x_3, etc. Defaults to True.
Raises: ValueError – If an unnamed shared variable is encountered and allow_unnamed is False, or if two shared variables have the same name, and allow_duplicates is False.
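The numbering scheme described for allow_duplicates (x, x_2, x_3, and so on) can be sketched with a small helper. unique_names is a hypothetical function written for illustration only; it is not part of Blocks and ignores the brick hierarchy that the real class uses to build names.

```python
def unique_names(names, allow_duplicates=True):
    """Number repeated names as x, x_2, x_3 (hypothetical helper)."""
    counts = {}
    result = []
    for name in names:
        counts[name] = counts.get(name, 0) + 1
        if counts[name] == 1:
            # First occurrence keeps its name unchanged.
            result.append(name)
        elif allow_duplicates:
            result.append('{}_{}'.format(name, counts[name]))
        else:
            raise ValueError('duplicate name: {}'.format(name))
    return result
```

This sketch does not guard against a generated name such as x_2 colliding with a variable that is already called x_2.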
- class blocks.serialization.PicklerWithWarning(file, protocol=None)¶
Bases: pickle.Pickler
- dispatch = {<type 'long'>: <function save_long at 0x7f2d26db1b18>, <type 'instance'>: <function save_inst at 0x7f2d26dc4050>, <type 'float'>: <function save_float at 0x7f2d26db1b90>, <type 'classobj'>: <function save_global at 0x7f2d15e94cf8>, <type 'tuple'>: <function save_tuple at 0x7f2d26db1cf8>, <type 'NoneType'>: <function save_none at 0x7f2d26db19b0>, <type 'function'>: <function save_global at 0x7f2d15e94cf8>, <type 'str'>: <function save_string at 0x7f2d26db1c08>, <type 'type'>: <function save_global at 0x7f2d15e94cf8>, <type 'bool'>: <function save_bool at 0x7f2d26db1a28>, <type 'builtin_function_or_method'>: <function save_global at 0x7f2d15e94cf8>, <type 'list'>: <function save_list at 0x7f2d26db1de8>, <type 'int'>: <function save_int at 0x7f2d26db1aa0>, <type 'unicode'>: <function save_unicode at 0x7f2d26db1c80>, <type 'dict'>: <function save_dict at 0x7f2d26db1ed8>}¶
- save_global(obj, name=None, **kwargs)¶
- blocks.serialization.continue_training(path)¶
Continues training from a checkpoint.
Parameters: path (str) – Path to checkpoint.
Notes
Python picklers can unpickle objects from the global namespace only if they are present in the namespace where unpickling happens. Global functions are often needed for mapping, filtering and other data stream operations. If the main loop uses global objects and this function fails with a message like AttributeError: 'module' object has no attribute '...', it means that you need to import these objects.
Examples
This function can be used in two ways: in the script where the main loop is defined, or in a different script. For the latter option, see the Notes section.
- blocks.serialization.dump(obj, file_handler, protocol=2, persistent_id=<class 'blocks.serialization.PersistentParameterID'>, use_cpickle=False)¶
Pickles an object to a zip file using external persistence.
Parameters: - obj (object) – The object to pickle.
- file_handler (file) – The file handle to save the object to.
- protocol (int, optional) – The pickling protocol to use. Unlike Python’s built-in pickle, the default is set to 2 instead of 0 for Python 2. The Python 3 default (level 3) is maintained.
- persistent_id (callable) – The callable that persists certain objects in the object hierarchy to separate files inside of the zip file. For example, PersistentNdarrayID saves any numpy.ndarray to a separate NPY file inside of the zip file.
- use_cpickle (bool) – This enables the use of C-version of pickle (known as cPickle in Python 2). Note that this disables warnings about trying to pickle objects in the __main__ namespace.
Notes
The final file is simply a zipped file containing at least one file, pkl, which contains the pickled object. It can contain any other number of external objects. Note that the zip files are compatible with NumPy’s numpy.load() function.
>>> import numpy
>>> from blocks.bricks import MLP, Identity
>>> from blocks.initialization import Constant
>>> mlp = MLP([Identity()], [10, 10], weights_init=Constant(0.),
...           biases_init=Constant(0.))
>>> mlp.initialize()
>>> with open('model.zip', 'wb') as f:
...     dump(mlp, f)
>>> 'mlp-linear_0.W' in numpy.load('model.zip').keys()
True
>>> 'mlp-linear_0.b' in numpy.load('model.zip').keys()
True
>>> numpy.load('model.zip')['mlp-linear_0.W'].shape
(10, 10)
>>> with open('model.zip', 'rb') as f:
...     mlp2 = load(f)
>>> mlp2
<blocks.bricks.MLP object at ...: name=mlp>
- blocks.serialization.load_parameter_values(path)¶
Load parameter values saved by dump().
This is a thin wrapper over numpy.load(). It changes the names of the arrays to ones compatible with Model.set_param_values().
Parameters: path (str or file) – The source for loading from.
Returns: A dictionary of (parameter name, numpy array) pairs.
- blocks.serialization.secure_dump(object_, path, dump_function=<function dump at 0x7f2d15e94aa0>, **kwargs)¶
Robust serialization: does not corrupt your files if it fails.
Parameters: - object_ (object) – The object to be saved to the disk.
- path (str) – The destination path.
- dump_function (function) – The function that is used to perform the serialization. Must take an object and file object as arguments. By default, dump() is used. An alternative would be pickle.dump().
- **kwargs – Keyword arguments to be passed to dump_function.
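One common way to obtain the guarantee described above is to serialize to a temporary file and move it over the destination only after serialization has succeeded. The sketch below illustrates that technique with the standard library; safe_dump is a hypothetical name, it defaults to pickle.dump rather than Blocks' dump(), and it is an assumption about the approach rather than a copy of secure_dump.

```python
import os
import pickle
import shutil
import tempfile


def safe_dump(object_, path, dump_function=pickle.dump, **kwargs):
    """Serialize to a temporary file, then move it into place.

    Hypothetical sketch: if ``dump_function`` raises, the file at
    ``path`` (if any) is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the destination so that the
    # final move stays on the same file system.
    temp = tempfile.NamedTemporaryFile(dir=directory, delete=False)
    try:
        with temp:
            dump_function(object_, temp, **kwargs)
    except Exception:
        # Serialization failed: discard the partial file and re-raise.
        os.remove(temp.name)
        raise
    shutil.move(temp.name, path)
```

A failed call leaves the previous checkpoint readable, which matters when a long training run is interrupted in the middle of a save.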
Building documentation¶
If you’ve made significant changes to the documentation, you can build a local copy to see how your changes are rendered. You will need to install Sphinx, the Napoleon extension (to enable NumPy docstring support), and the Read the Docs theme. You can do this by installing the optional docs requirements:
$ pip install --upgrade git+git://github.com/user/blocks.git#egg=blocks[docs]
After the requirements have been installed, you can build a copy of the documentation by running the following command from the root blocks directory.
$ sphinx-build -b html docs docs/_build/html
Docstrings¶
Blocks follows the NumPy docstring standards. For a quick introduction, have a look at the NumPy or Napoleon examples of compliant docstrings. A few conventions that are easy to get wrong:
- There is no line break after the opening quotes (""").
- There is an empty line before the closing quotes (""").
- The summary should not be more than one line.
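As an illustration of the three points above, here is a small docstring written in the NumPy style. The function scale and its parameters are hypothetical and exist only for this example.

```python
def scale(values, factor=2):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        The values to scale.
    factor : float, optional
        The multiplier. Defaults to 2.

    Returns
    -------
    scaled : list of float
        The scaled values.

    Examples
    --------
    >>> scale([1, 2])
    [2, 4]

    """
    return [value * factor for value in values]
```

The summary starts directly after the opening quotes and fits on one line, and an empty line precedes the closing quotes.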
The docstrings are formatted using reStructuredText, and can make use of all the formatting capabilities this provides. They are rendered into HTML documentation using the Read the Docs service. After code has been merged, please ensure that documentation was built successfully and that your docstrings rendered as you intended by looking at the online documentation, which is automatically updated.
Writing doctests is encouraged, and they are run as part of the test suite. They should use Python 3 syntax.
References and Intersphinx¶
Sphinx allows you to reference other objects in the framework. This automatically creates links to the API documentation of that object (if it exists).
This is a link to :class:`SomeClass` in the same file. If you want to
reference an object in another file, you can use a leading dot to tell
Sphinx to look in all files e.g. :meth:`.SomeClass.a_method`.
Intersphinx is an enabled extension that allows you to reference the documentation of other projects such as Theano, NumPy and SciPy.
The input to a method can be of the type :class:`~numpy.ndarray`. Note that
in this case we need to give the full path. The tilde (~) tells Sphinx not
to render the full path (numpy.ndarray), but only the object itself
(ndarray).
Warning
Because of a bug in Napoleon you can’t use the reference to a type in the “Returns” section of your docstring without giving it a name. This doesn’t render correctly:
Returns
-------
:class:`Brick`
The returned Brick.
But this does:
Returns
-------
returned_brick : :class:`Brick`
The returned Brick.
Pull request workflow¶
Blocks development takes place on GitHub; developers (including project leads!) add new features by sending pull requests from their personal fork (we operate on the so-called fork & pull model).
This page serves as a “quick reference” for the recommended pull request workflow. It assumes you are working on a UNIX-like environment with Git already installed. It is not intended to be an exhaustive tutorial on Git; there are many of those available.
Before you begin¶
If you don’t already have one, you should create yourself a GitHub account.
Once you’ve set up your account and logged in, you should fork the Blocks repository to your account by clicking the “Fork” button on the official repository’s web page. More information on forking is available in the GitHub documentation.
In the side bar of your newly created fork of the Blocks repository, you should see a field that says HTTPS clone URL above it. Copy that to your clipboard and run, at the terminal,
$ git clone CLONE_URL
where CLONE_URL is the URL you copied from your GitHub fork.
If you’re doing a lot of development with GitHub you should look into setting up SSH key authentication.
In order to keep up with changes to the official Blocks repository, notify Git of its existence and location by running
$ git remote add upstream https://github.com/mila-udem/blocks.git
You only need to do this once.
Beginning a pull request¶
Running the command
$ git remote -v | grep origin
should display two lines. The URLs therein should contain your GitHub username.
Your cloned repository stores a local history of the activity in remote repositories, and only interacts with the Internet when certain commands are invoked. In order to synchronize the activity in the official Blocks repository (which Git now knows as upstream) with the local mirror of the history related to upstream, run
$ git fetch upstream
You should do this before starting every pull request, for reasons that will become clear below.
In order to create a new branch starting from the latest commit in the master branch of the official Blocks repository, make sure you’ve fetched from upstream (see above) and run
$ git checkout -b my_branch_name_for_my_cool_feature upstream/master
Obviously, you’ll probably want to choose a better branch name.
Note that doing this (rather than simply creating a new branch from some arbitrary point) may save you from a (possibly painful) rebase later on.
Working on your pull request¶
Repeat until satisfied:
- Make some modifications to the code
- Stage them using git add (git add -p is particularly useful)
- Commit them with git commit, or use git reset to undo staging done by git add.
When you are ready, push the branch to your fork:
$ git push -u origin my_branch_name_for_my_cool_feature
Submitting for review¶
This can be done from the GitHub web interface for your fork. See this documentation from GitHub for more information.
Give your pull request an appropriate title which makes it obvious what the content is. If it is intended to resolve a specific ticket, put “Fixes #NNN.” in the pull request description field, where NNN is the issue number. By doing this, GitHub will know to automatically close the issue when your pull request is merged.
Blocks development occurs in two separate branches: The master branch is the development branch. If you want to contribute a new feature or change the behavior of Blocks in any way, please make your pull request to this branch.
The stable branch contains the latest release of Blocks. If you are fixing a bug (that is present in the latest release), make a pull request to this branch. If the bug is present in both the master and stable branch, two separate pull requests are in order. The git cherry-pick command could be useful here.
Incorporating feedback¶
In order to add additional commits responding to reviewer feedback, simply follow the instructions above for using git add and git commit, and finally git push (after running the initial command with -u, you should simply be able to use git push without any further arguments).
Occasionally you will be asked to rebase your branch against the latest master. To do this, run (while you have your branch checked out)
$ git fetch upstream && git rebase upstream/master
You may encounter an error message about one or more conflicts. See GitHub’s help page on the subject. Note that after a rebase you will usually have to overwrite previous commits on your fork’s copy of the branch with git push --force.
Quickstart¶
Construct your model.
>>> from blocks.bricks import MLP, Tanh, Softmax
>>> from blocks.initialization import IsotropicGaussian, Constant
>>> mlp = MLP(activations=[Tanh(), Softmax()], dims=[784, 100, 10],
... weights_init=IsotropicGaussian(0.01), biases_init=Constant(0))
>>> mlp.initialize()
Calculate your loss function.
>>> from theano import tensor
>>> from blocks.bricks.cost import CategoricalCrossEntropy, MisclassificationRate
>>> x = tensor.matrix('features')
>>> y = tensor.lmatrix('targets')
>>> y_hat = mlp.apply(x)
>>> cost = CategoricalCrossEntropy().apply(y.flatten(), y_hat)
>>> error_rate = MisclassificationRate().apply(y.flatten(), y_hat)
Load your training data using Fuel.
>>> from fuel.datasets import MNIST
>>> from fuel.streams import DataStream
>>> from fuel.transformers import Flatten
>>> from fuel.schemes import SequentialScheme
>>> mnist_train = MNIST(("train",))
>>> train_stream = Flatten(
... DataStream.default_stream(
... dataset=mnist_train,
... iteration_scheme=SequentialScheme(mnist_train.num_examples, 128)),
... which_sources=('features',))
>>> mnist_test = MNIST(("test",))
>>> test_stream = Flatten(
... DataStream.default_stream(
... dataset=mnist_test,
... iteration_scheme=SequentialScheme(mnist_test.num_examples, 1024)),
... which_sources=('features',))
And train!
>>> from blocks.model import Model
>>> from blocks.main_loop import MainLoop
>>> from blocks.algorithms import GradientDescent, Scale
>>> from blocks.graph import ComputationGraph
>>> from blocks.extensions import FinishAfter, Printing
>>> from blocks.extensions.monitoring import DataStreamMonitoring
>>> main_loop = MainLoop(
... model=Model(cost), data_stream=train_stream,
... algorithm=GradientDescent(
... cost=cost, parameters=ComputationGraph(cost).parameters,
... step_rule=Scale(learning_rate=0.1)),
... extensions=[FinishAfter(after_n_epochs=5),
... DataStreamMonitoring(
... variables=[cost, error_rate],
... data_stream=test_stream,
... prefix="test"),
... Printing()])
>>> main_loop.run()
...
For a runnable version of this code, please see the MNIST demo in our repository with examples.
Features¶
Currently Blocks supports and provides:
- Constructing parametrized Theano operations, called “bricks”
- Pattern matching to select variables and bricks in large models
- Algorithms to optimize your model
- Saving and resuming of training
- Monitoring and analyzing values during training progress (on the training set as well as on test sets)
- Application of graph transformations, such as dropout (limited support)
In the future we also hope to support:
- Dimension, type and axes-checking