Aboleth¶
A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation [1] with stochastic gradient variational Bayes inference [2].
Features¶
Some of the features of Aboleth:
- Bayesian fully-connected, embedding and convolutional layers using SGVB [2] for inference.
- Random Fourier and arc-cosine features for approximate Gaussian processes. Optional variational optimisation of these feature weights as per [1].
- Imputation layers with parameters that are learned as part of a model.
- Very flexible construction of networks, e.g. multiple inputs, ResNets etc.
- Optional maximum-likelihood type II inference for model parameters such as weight priors/regularizers and regression observation noise.
Why?¶
The purpose of Aboleth is to provide a set of high performance and light weight components for building Bayesian neural nets and approximate (deep) Gaussian process computational graphs. We aim for minimal abstraction over pure TensorFlow, so you can still assign parts of the computational graph to different hardware, use your own data feeds/queues, and manage your own sessions etc.
Here is an example of building a simple BNN classifier with one hidden layer and Normal prior/posterior distributions on the network weights:
import tensorflow as tf
import aboleth as ab
# Define the network, ">>" implements function composition,
# the InputLayer gives a kwarg for this network, and
# allows us to specify the number of samples for stochastic
# gradient variational Bayes.
layers = (
ab.InputLayer(name="X", n_samples=5) >>
ab.DenseVariational(output_dim=100) >>
ab.Activation(tf.nn.relu) >>
ab.DenseVariational(output_dim=1) >>
ab.Activation(tf.nn.sigmoid)
)
# Define placeholders for the data; D is the number of input features
X_ = tf.placeholder(tf.float32, shape=(None, D))
Y_ = tf.placeholder(tf.float32, shape=(None, 1))
# Define the likelihood model
likelihood = ab.likelihoods.Bernoulli()
# Build the network, net, and the parameter regularization, kl
net, kl = layers(X=X_)
# Build the final loss function to use with TensorFlow train,
# here N is the number of observations in the training dataset
loss = ab.elbo(net, Y_, N, kl, likelihood)
# Now your TensorFlow training code here!
...
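For completeness, here is a minimal sketch of what that elided training code might look like (it mirrors the quick start guide below; X and Y are assumed to be NumPy arrays of training data):

optimizer = tf.train.AdamOptimizer()
train = optimizer.minimize(loss)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for _ in range(1000):
        sess.run(train, feed_dict={X_: X, Y_: Y})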
At the moment the focus of Aboleth is on supervised tasks, however this is subject to change in subsequent releases if there is interest in this capability.
Installation¶
To get up and running quickly you can use pip and get the Aboleth package from PyPI:
$ pip install aboleth
For the best performance on your architecture, we recommend installing TensorFlow from sources.
Or, to install additional dependencies required by the demos:
$ pip install aboleth[demos]
To install in develop mode with packages required for development we recommend you clone the repository from GitHub:
$ git clone git@github.com:data61/aboleth.git
Then in the directory that you cloned into, issue the following:
$ pip install -e .[dev]
Getting Started¶
See the quick start guide to get started. Also see the demos folder for more examples of creating and training algorithms with Aboleth.
The full project documentation can be found on readthedocs.
References¶
[1] Cutajar, K., Bonilla, E., Michiardi, P. and Filippone, M. Random Feature Expansions for Deep Gaussian Processes. In ICML, 2017.
[2] Kingma, D. P. and Welling, M. Auto-encoding Variational Bayes. In ICLR, 2014.
License¶
Copyright 2017 CSIRO (Data61)
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Documentation Contents¶
Installation¶
Firstly, make sure you have TensorFlow installed, preferably compiled specifically for your architecture, see installing TensorFlow from sources.
To get up and running quickly you can use pip and get the Aboleth package from PyPI:
$ pip install aboleth
Or, to install additional dependencies required by the demos:
$ pip install aboleth[demos]
To install in develop mode with packages required for development we recommend you clone the repository from GitHub:
$ git clone git@github.com:data61/aboleth.git
Then in the directory that you cloned into, issue the following:
$ pip install -e .[dev]
Or, if you also want to develop with the demos:
$ pip install -e .[dev,demos]
Quick Start Guide¶
In Aboleth we use function composition to compose machine learning models. These models are callable Python classes that, when called, return a TensorFlow computational graph (really a tf.Tensor). We can best demonstrate this with a few examples.
Logistic Classification¶
For our first example, let’s make a simple logistic classifier with \(L_2\) regularisation on the model weights:
import tensorflow as tf
import aboleth as ab
layers = (
ab.InputLayer(name="X") >>
ab.DenseMap(output_dim=1, l1_reg=0, l2_reg=.05) >>
ab.Activation(tf.nn.sigmoid)
)
Here the right shift operator, >>, implements function composition (or specifically, a writer monad) from the innermost function to the outermost. The above code block implements the following function,

\[ \hat{p}(y_n = 1) = \sigma(\mathbf{x}_n^\top \mathbf{w}), \]

where \(\mathbf{w} \in \mathbb{R}^D\) are the model weights, \(\mathbf{y} \in \mathbb{Z}^N_2\) are the binary labels, \(\mathbf{X} \in \mathbb{R}^{N \times D}\) are the predictive inputs and \(\sigma(\cdot)\) is a logistic sigmoid function. At this stage layers is a callable class (ab.baselayers.MultiLayerComposite), and no computational graph has been built. ab.InputLayer allows us to name our inputs so we can refer to them later when we call our class layers. This is useful when we have multiple inputs into our model, for example, if we want to deal with continuous and categorical features separately (see Multiple Input Data).
So now we have defined the structure of the predictive model; if we wish we can create its computational graph,
net, reg = layers(X=X_)
where the keyword argument X was defined in the InputLayer, and X_ is a placeholder (tf.placeholder) or the actual predictive data we want to build into our model. net is the resulting computational graph of our predictive model/network, and reg are the regularisation terms associated with the model parameters (layer weights in this case).
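For instance, a minimal sketch of X_ as a placeholder might be (assuming D is the number of input features; Y_ is used later for the targets):

X_ = tf.placeholder(tf.float32, shape=(None, D))
Y_ = tf.placeholder(tf.float32, shape=(None, 1))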
If we wanted, we could evaluate net right now in a TensorFlow session, however none of the weights have been fit to the data. In order to fit the weights, we need to define a loss function. For this we need to define a likelihood model for our classifier; here we choose a Bernoulli distribution for our binary classifier (which corresponds to a log-loss):
likelihood = ab.likelihoods.Bernoulli()
When we call likelihood, it will return a tensor that implements the log of a Bernoulli probability mass function,

\[ \log \text{Bern}(y_n | p_n) = y_n \log p_n + (1 - y_n) \log(1 - p_n), \]

where we have used \(p_n\) as shorthand for \(p(y_n = 1)\). Now we have enough to build the loss function we will use to optimize the model weights:
loss = ab.max_posterior(net, Y_, reg, likelihood)
This is a maximum a-posteriori loss function, which can be thought of as a maximum likelihood objective with a penalty on the magnitude of the weights from a Gaussian prior (controlled by l2_reg or \(\lambda\)),

\[ \hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \sum_n \log \text{Bern}(y_n | \sigma(\mathbf{x}_n^\top \mathbf{w})) - \frac{\lambda}{2} \|\mathbf{w}\|^2_2. \]

Now we have enough to use the tf.train module to learn the weights of our model:
optimizer = tf.train.AdamOptimizer()
train = optimizer.minimize(loss)
with tf.Session() as sess:
tf.global_variables_initializer().run()
for _ in range(1000):
sess.run(train, feed_dict={X_: X, Y_: Y})
This will run 1000 iterations of stochastic gradient optimization (using the Adam learning rate algorithm) where the model sees all of the data every iteration. We can also run this on mini-batches; see ab.batch for a simple batch generator, or TensorFlow’s train module for a more comprehensive set of utilities (we recommend looking at tf.train.MonitoredTrainingSession, tf.train.limit_epochs and tf.train.shuffle_batch).
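As an illustration, a mini-batch version of the loop above could be sketched with ab.batch (the batch size and iteration count here are arbitrary; see ab.util for the full signature):

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    # ab.batch yields feed dicts containing random mini-batches of the data
    for feed in ab.batch({X_: X, Y_: Y}, batch_size=100, n_iter=10000):
        sess.run(train, feed_dict=feed)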
Now that we have learned our classifier’s weights, \(\hat{\mathbf{w}}\), we will probably want to use them for predicting class label probabilities on unseen data \(\mathbf{x}^*\),

\[ p(y^* = 1 | \mathbf{x}^*) = \sigma(\mathbf{x}^{* \top} \hat{\mathbf{w}}). \]
This can be very easily achieved by just evaluating our model on the unseen predictive data (still in the TensorFlow session from above):
probabilities = net.eval(feed_dict={X_: X_query})
And that is it!
Bayesian Logistic Classification¶
Aboleth is all about Bayesian inference, so now we’ll demonstrate how to make a variational inference version of the logistic classifier. Now we explicitly place a prior distribution on the weights,

\[ \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \psi \mathbf{I}_D). \]

Here \(\psi\) is the prior weight variance (note that this corresponds to \(\lambda^{-1}\) in the MAP logistic classifier). We use the same likelihood model as before,

\[ p(y_n | \mathbf{w}, \mathbf{x}_n) = \text{Bern}(y_n | \sigma(\mathbf{x}_n^\top \mathbf{w})), \]

and ideally we would like to infer the posterior distribution over these weights using Bayes’ rule (as opposed to just the MAP value, \(\hat{\mathbf{w}}\)),

\[ p(\mathbf{w} | \mathbf{X}, \mathbf{y}) = \frac{p(\mathbf{y} | \mathbf{X}, \mathbf{w}) \, p(\mathbf{w})}{\int p(\mathbf{y} | \mathbf{X}, \mathbf{w}) \, p(\mathbf{w}) \, d\mathbf{w}}. \]

Unfortunately the integral in the denominator is intractable for this model. This is where variational inference comes to the rescue by approximating the posterior with a known form – in this case a Gaussian,

\[ q(\mathbf{w}) = \mathcal{N}(\mathbf{w} | \boldsymbol{\mu}, \boldsymbol{\Sigma}), \]

where \(\boldsymbol{\mu} \in \mathbb{R}^D\) and \(\boldsymbol{\Sigma} \in \mathbb{R}^{D \times D}\). To make this approximation as close as possible, variational inference minimises the Kullback-Leibler divergence between this and the true posterior using the evidence lower bound, ELBO, and the reparameterization trick in [1]:

\[ \min_{\boldsymbol{\mu}, \boldsymbol{\Sigma}} \text{KL} \left[ q(\mathbf{w}) \, \| \, p(\mathbf{w} | \mathbf{X}, \mathbf{y}) \right]. \]

One question you may ask is why we would want to go to all this bother over the MAP approach. Specifically, why learn an extra \(\mathcal{O}(D^2)\) parameters over the MAP approach? There are a few reasons, the first being that the weights are well regularised in this formulation; for instance, we can actually learn \(\psi\) rather than having to set it (this optimization of the prior is called empirical Bayes). Secondly, we have a principled way of incorporating modelling uncertainty over the weights into our predictions,

\[ p(y^* = 1 | \mathbf{X}, \mathbf{y}, \mathbf{x}^*) = \int \sigma(\mathbf{x}^{* \top} \mathbf{w}) \, q(\mathbf{w}) \, d\mathbf{w} \approx \frac{1}{S} \sum^S_{s=1} \sigma(\mathbf{x}^{* \top} \mathbf{w}^{(s)}), \qquad \mathbf{w}^{(s)} \sim q(\mathbf{w}). \]
This will have the effect of making our predictive probabilities closer to 0.5 when the model is uncertain. The MAP approach has no mechanism to achieve this since it only learns the mode of the posterior, \(\hat{\mathbf{w}}\), with no notion of variance.
So how do we implement this with Aboleth? Easy; we change layers to the following,
import numpy as np
import tensorflow as tf
import aboleth as ab
layers = (
ab.InputLayer(name="X", n_samples=5) >>
ab.DenseVariational(output_dim=1, var=1., full=True) >>
ab.Activation(tf.nn.sigmoid)
)
Note we are using DenseVariational instead of DenseMAP. In the DenseVariational layer the full parameter tells the layer to use a full covariance Gaussian, and var is the initial value of the weight prior variance, \(\psi\), which is optimized. Also we’ve set n_samples=5 in the InputLayer; this lets the subsequent layers know that we are making a stochastic model, that is, whenever we call layers we are actually expecting back 5 samples of the model output. This makes the DenseVariational layer multiply its input with 5 samples of the weights from the approximate posterior, \(\mathbf{X}\mathbf{w}^{(s)}\), where \(\mathbf{w}^{(s)} \sim q(\mathbf{w}),~\text{for}~s = \{1 \ldots 5\}\). These 5 samples are then passed to the Activation layer.
Then like before to complete the model specification:
likelihood = ab.likelihoods.Bernoulli()
net, kl = layers(X=X_)
loss = ab.elbo(net, Y_, N=10000, KL=kl, likelihood=likelihood)
The main differences here are that reg is now kl, and we use the elbo loss function. For all intents and purposes kl is still a regularizer on the weights (it is the Kullback-Leibler divergence between the posterior and the prior distributions on the weights), and elbo is the evidence lower bound objective. Here N is the (expected) size of the dataset; we need to know this term in order to properly calculate the evidence lower bound when using mini-batches of data.
We train this model in exactly the same way as the logistic classifier, however prediction is slightly different - that is, probabilities,
probabilities = net.eval(feed_dict={X_: X_query})
now has a shape of \((5, N^*, 1)\) where we have 5 samples of \(N^*\) predictions; before we had \((N^*, 1)\). You can simply take the mean of these samples for the predicted class probability,
expected_p = np.mean(probabilities, axis=0)
or, you can generate more samples to get a more accurate estimate of the expected probabilities (again within the TensorFlow session, sess),
probabilities = ab.predict_samples(net, feed_dict={X_: X_query},
n_groups=10, session=sess)
This effectively calls net 10 times (n_groups) and concatenates the results into 50 samples (n_groups * n_samples), then we can take the mean of these samples exactly as before.
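For example, a rough sketch of turning those samples into an expected probability and a simple uncertainty interval with NumPy (purely illustrative):

import numpy as np

expected_p = probabilities.mean(axis=0)  # predictive mean over the samples
lower, upper = np.percentile(probabilities, [2.5, 97.5], axis=0)  # spread of the samples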
Approximate Gaussian Processes¶
Aboleth also provides the building blocks to easily create scalable (approximate) Gaussian processes. We’ll implement a simple Gaussian process regressor here, but for brevity, we’ll skip the introduction to Gaussian processes, and refer the interested reader to [2].
The approximation we have implemented in Aboleth is the random feature expansions (see [3] and [4]), where we can approximate a kernel function from a set of random basis functions,

\[ k(\mathbf{x}_i, \mathbf{x}_j) \approx \frac{1}{S} \sum^S_{s=1} \phi^{(s)}(\mathbf{x}_i)^\top \phi^{(s)}(\mathbf{x}_j), \]

with equality in the infinite limit. The trick is to find the right family of basis functions, \(\phi\), that corresponds to a particular family of kernel functions, e.g. radial basis, Matern, etc. This insight allows us to approximate a Gaussian process regressor with a Bayesian linear regressor using these random basis functions, \(\phi^{(s)}(\mathbf{X})\)!
We can easily do this using Aboleth, for example, with a radial basis kernel,
import tensorflow as tf
import aboleth as ab
lenscale = ab.pos(tf.Variable(1.)) # learn isotropic length scale
kern = ab.RBF(lenscale=lenscale)
layers = (
ab.InputLayer(name="X", n_samples=5) >>
ab.RandomFourier(n_features=100, kernel=kern) >>
ab.DenseVariational(output_dim=1, full=True)
)
Here we have made lenscale a TensorFlow variable so it will be optimized, and we have also used the ab.pos function to make sure it stays positive. The ab.RandomFourier class implements random Fourier features [3], that can model shift invariant kernel functions like radial basis, Matern, etc. See ab.kernels for implemented kernels. We have also implemented random arc-cosine kernels [4]; see ab.RandomArcCosine in ab.layers.
Then to complete the formulation of the Gaussian process (likelihood and loss),
var = ab.pos(tf.Variable(1.)) # learn likelihood variance
likelihood = ab.likelihoods.Normal(var=var)
net, kl = layers(X=X_)
loss = ab.elbo(net, Y_, kl, N=10000, likelihood=likelihood)
Here we just have a Normal likelihood since we are creating a model for regression, and we can also get TensorFlow to optimise the likelihood variance, var.
Training and prediction work in exactly the same way as the Bayesian logistic classifier. Here is an example of the approximate GP in action (see Regression for a more detailed demonstration);

Example of an approximate Gaussian process with a radial basis kernel. We have shown 50 samples of the predicted latent functions, the mean of these draws, and the heatmap is the probability of observing a target under the predictive distribution, \(p(y^*|\mathbf{X}, \mathbf{y}, \mathbf{x}^*)\).
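For instance, the latent function draws in such a figure could be obtained with something like the following sketch (assuming an active session sess and query inputs X_query):

samples = ab.predict_samples(net, feed_dict={X_: X_query},
                             n_groups=10, session=sess)
expected_y = samples.mean(axis=0)   # predictive mean of the latent function
std_y = samples.std(axis=0)         # spread of the latent function draws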
See Also¶
For more detailed demonstrations of the functionality within Aboleth, we recommend you check out the demos,
- Regression and SARCOS - for more regression applications.
- Multiple Input Data - models with multiple input data types.
- Bayesian Classification with Dropout - Bayesian nets using dropout.
- Imputation Layers - let Aboleth deal with missing data for you.
References¶
[1] Kingma, D. P. and Welling, M. Auto-encoding Variational Bayes. In ICLR, 2014.
[2] Rasmussen, C. E. and Williams, C. K. I. Gaussian Processes for Machine Learning. Cambridge: MIT Press, 2006.
[3] Rahimi, A. and Recht, B. Random Features for Large-Scale Kernel Machines. In NIPS, 2007.
[4] Cutajar, K., Bonilla, E., Michiardi, P. and Filippone, M. Random Feature Expansions for Deep Gaussian Processes. In ICML, 2017.
Demos¶
We have included some demonstration scripts with Aboleth to help you get familiar with some of the possible model architectures that can be built with Aboleth. We also demonstrate in these scripts a few methods for actually training models using TensorFlow, and how to get up and running with TensorBoard, etc.
Regression¶
This is a simple demo that draws a random, non linear function from a Gaussian process with a specified kernel and length scale. We then use Aboleth (in Gaussian process approximation mode) to try to learn this function given only a few noisy observations of it. This script also demonstrates how we can divide the data into mini-batches using utilities in the tf.train module, and how we can use tf.train.MonitoredTrainingSession to log the learning progress.
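As a rough orientation (not the demo script itself), logging progress with tf.train.MonitoredTrainingSession might be sketched like this, assuming loss, train, X_, Y_ and the data X, Y have been built as in the quick start guide:

logger = tf.train.LoggingTensorHook({"loss": loss}, every_n_iter=100)
with tf.train.MonitoredTrainingSession(hooks=[logger]) as sess:
    # mini-batches from ab.batch, logging the loss every 100 iterations
    for feed in ab.batch({X_: X, Y_: Y}, batch_size=50, n_iter=10000):
        sess.run(train, feed_dict=feed)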
This demo can be used to generate figures like the following:

You can find the full script here: regression.py.
SARCOS¶
Here we use Aboleth, again in Gaussian process regression mode, to fit the venerable SARCOS robot arm inverse kinematics dataset. The aim is to learn the inverse kinematics from 44484 observations of joint positions, velocities and accelerations to joint torques.
This problem is too large for a regular Gaussian process, and so is a good demonstration of why the approximation in Aboleth is useful (see Approximate Gaussian Processes). It also demonstrates how we learn automatic relevance determination (ARD, or anisotropic) kernels.
We have also demonstrated how you can use TensorBoard with the models you construct in Aboleth, so you can visually monitor the progress of learning. This has the nice side-effect of also enabling model checkpoint saving, so you can actually resume learning this model if you run the script again!

Using TensorBoard to visualise the learning progress of the Aboleth model fitting the SARCOS dataset.
This demo will make a sarcos folder in the directory you run the demo from. This contains all of the model checkpoints; to visualise these with TensorBoard, run the following:
$ tensorboard --logdir=./sarcos
The full script is here: sarcos.py.
Multiple Input Data¶
This demo takes inspiration from TensorFlow’s Wide & Deep tutorial in that it treats continuous data separately from categorical data, though we combine both input types into a “deep” network. It also uses the census dataset from the TensorFlow tutorial.
We demonstrate a few things in this script:
- How to use Aboleth to learn embeddings of categorical data using the ab.EmbedVariational layer (see ab.layers).
- How to easily apply these embeddings over multiple columns using the ab.PerFeature higher-order layer (see ab.hlayers).
- Concatenating these input layers (using ab.Concat) before feeding them into subsequent layers to learn joint representations.
- How to loop over mini-batches directly using a feed_dict and an appropriate mini-batch generator, ab.batch (see ab.util).
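For orientation, a condensed sketch of this kind of multi-input architecture might look like the following (layer sizes, the number of categories and the placeholder names are illustrative, and the real demo applies the embeddings per column with ab.PerFeature):

import tensorflow as tf
import aboleth as ab

# Continuous and categorical inputs enter through separate named InputLayers
con_layer = (
    ab.InputLayer(name="X_con", n_samples=5) >>
    ab.DenseVariational(output_dim=20)
)
cat_layer = (
    ab.InputLayer(name="X_cat", n_samples=5) >>
    ab.EmbedVariational(output_dim=5, n_categories=10)
)

# Concatenate the branches and learn a joint representation
layers = (
    ab.Concat(con_layer, cat_layer) >>
    ab.Activation(tf.nn.relu) >>
    ab.DenseVariational(output_dim=1)
)

net, kl = layers(X_con=X_con_, X_cat=X_cat_)  # one placeholder per named input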
Using this set up we get an accuracy of about 85.9%, compared to the wide and deep model that achieves 84.4%.
The full script is here: multi_input.py.
Bayesian Classification with Dropout¶
Here we demonstrate a slightly different take on Bayesian deep learning. Yarin Gal in his thesis and associated publications demonstrates that we can view regular neural networks with dropout as a form of variational inference with specific prior and posterior distributions on the weights.
In this demo we implement this elegant idea with maximum a-posteriori weight and dropout layers in a classifier (see ab.layers). We leave these layers as stochastic in the prediction step, and draw samples from the network’s predictive distribution, as we would in variational networks.
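A condensed sketch of such a network (layer names are from the Aboleth API; the sizes and keep probability are illustrative) could be:

import tensorflow as tf
import aboleth as ab

layers = (
    ab.InputLayer(name="X", n_samples=5) >>
    ab.DenseMAP(output_dim=64, l2_reg=0.01) >>
    ab.Activation(tf.nn.relu) >>
    ab.DropOut(keep_prob=0.9) >>       # left stochastic at prediction time
    ab.DenseMAP(output_dim=1, l2_reg=0.01) >>
    ab.Activation(tf.nn.sigmoid)
)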
We test the classifier against a random forest classifier on the breast cancer dataset with 5-fold cross validation, and get quite good and robust performance.
The script can be found here: classification.py
Imputation Layers¶
Aboleth has a few layers that we can use to impute data and also to learn imputation statistics, see ab.impute. This drastically simplifies the pipeline for dealing with messy data, and means our imputation methods can benefit from information contained in the labels (as opposed to imputing as a separate stage from supervised learning).
This script demonstrates an imputation layer that learns a “mean” and a “variance” of a Normal distribution (per column) to randomly impute the data from! We compare it to just imputing the missing values with the column means.
The task is a multi-task classification problem in which we have to predict forest coverage types from 54 features of various types, described here. We have randomly removed elements from the features, which we impute using the two aforementioned techniques.
Naive mean imputation gives 68.9% accuracy (0.727 log loss), and the per-column Normal imputation gives 69.2% accuracy (0.710 log loss).
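A rough sketch of wiring up the learned Normal imputation (the mask placeholder, layer sizes and the use of 7 coverage classes are illustrative; the exact demo wiring may differ):

import tensorflow as tf
import aboleth as ab

data_layer = ab.InputLayer(name="X", n_samples=5)
mask_layer = ab.InputLayer(name="M")   # boolean mask, True where data are missing

layers = (
    ab.LearnedNormalImpute(data_layer, mask_layer) >>
    ab.DenseVariational(output_dim=7) >>
    ab.Activation(tf.nn.softmax)
)

net, kl = layers(X=X_, M=M_)  # X_ holds the features, M_ the missing-data mask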
You can find the script here: imputation.py
Authors¶
Development Leads¶
- Daniel Steinberg
- Lachlan McCalman
- Louis Tiao
Contributors¶
- Simon O’Callaghan
- Alistair Reid
- Joyce Wang
Contributing Guidelines¶
Please contribute if you think a feature is missing in Aboleth, if you think an implementation could be better or if you can solve an existing issue!
We just request you read the following before making any changes to the codebase.
Pull Requests¶
This is the best way to contribute. We usually follow a git-flow based development cycle. The way this works is quite simple:
- Make an issue on our github with your proposed feature or fix.
- Fork, or make a branch name with the issue number like feature/#113.
- When finished, submit a pull request to merge into develop, and refer to which issue is being closed in the pull request comment (i.e. closes #113).
- One of us will review the pull request.
- If accepted, your feature will be merged into develop.
- Your change will eventually be merged into master and tagged with a release number.
Code and Documentation Style¶
In Aboleth we are only targeting python 3 - our code is much more elegant as a result, and we don’t have the resources to also support python 2, sorry.
We adhere to the PEP 8 convention for code, and the PEP 257 convention for docstrings, using the NumPy/SciPy documentation style. Our continuous integration automatically runs linting checks, so any pull-request will automatically fail if these conventions are not followed.
The builtin Sphinx extension Napoleon is used to parse NumPy style docstrings.
To build the documentation you can run make from the docs directory with the html option:
$ make html
Testing¶
We use py.test for all of our unit testing, most of which lives in the tests directory, with judicious use of doctests – i.e. only when they are illustrative of a function’s usage.
All of the dependencies for testing can be installed by issuing:
$ pip install -e .[dev]
You can run the tests by issuing from the top level repository directory:
$ pytest .
Our continuous integration (CI) will fail if coverage drops below 90%, and we generally want coverage to remain significantly above this. Furthermore, our CI will fail if the code doesn’t pass PEP 8 and PEP 257 conventions. You can run the exact CI tests by issuing:
$ make coverage
$ make lint
from the top level repository directory.
API¶
This is the application programming interface guide for Aboleth.
Aboleth is implemented in pure Python 3 only (we don’t test Python 2, and we may use Python 3 language specific features). If you would like to contribute (please do), see the Contributing Guidelines.
ab.losses¶
Network loss functions.
-
aboleth.losses.elbo(Net, Y, N, KL, likelihood, like_weights=None)¶
Build the evidence lower bound loss for a neural net.
Parameters:
- Net (ndarray, Tensor) – the neural net features of shape (n_samples, N, tasks).
- Y (ndarray, Tensor) – the targets of shape (N, tasks).
- N (int, Tensor) – the total size of the dataset (i.e. number of observations).
- KL (float, Tensor) – the Kullback-Leibler regularisation penalty on the model parameters (returned when building the network).
- likelihood (Tensor) – the likelihood model to use on the output of the last layer of the neural net, see the ab.likelihoods module.
- like_weights (callable, ndarray, Tensor) – weights to apply to each observation in the expected log likelihood. This should be an array of shape (N, 1) or can be called as like_weights(Y) and should return a (N, 1) array.
Returns: nelbo – the loss function of the Bayesian neural net (negative ELBO).
Return type: Tensor
-
aboleth.losses.max_posterior(Net, Y, regulariser, likelihood, like_weights=None, first_axis_is_obs=True)¶
Build a maximum a-posteriori (MAP) loss for a neural net.
Parameters:
- Net (ndarray, Tensor) – the neural net features of shape (N, tasks) or (n_samples, N, tasks).
- Y (ndarray, Tensor) – the targets of shape (N, tasks).
- regulariser (float, Tensor) – the regularisation penalty on the model parameters (returned when building the network).
- likelihood (Tensor) – the likelihood model to use on the output of the last layer of the neural net, see the ab.likelihoods module.
- like_weights (callable, ndarray, Tensor) – weights to apply to each observation in the expected log likelihood. This should be an array of shape (N, 1) or can be called as like_weights(Y) and should return a (N, 1) array.
- first_axis_is_obs (bool) – indicates if the first axis indexes the observations/data or not. This will be True if Net is of shape (N, tasks) or False if Net is of shape (n_samples, N, tasks).
Returns: map – the loss function of the MAP neural net.
Return type: Tensor
ab.baselayers¶
Base Classes for Layers.
-
class aboleth.baselayers.Layer¶
Bases: object
Layer base class.
This is an identity layer, and is primarily meant to be subclassed to construct more interesting layers.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.baselayers.
LayerComposite
(*layers)¶ Bases:
aboleth.baselayers.Layer
Composition of Layers.
Parameters: *layers – the layers to compose. All must be of type Layer. -
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.baselayers.
MultiLayer
¶ Bases:
object
Base class for layers that take multiple inputs as kwargs.
This is an Abstract class as there is no canonical identity for this layer (because it must do some kind of reduction).
-
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.baselayers.
MultiLayerComposite
(*layers)¶ Bases:
aboleth.baselayers.MultiLayer
Composition of MultiLayers.
Parameters: *layers – the layers to compose. First layer must be of type Multilayer, subsequent layers must be of type Layer. -
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
aboleth.baselayers.stack(l, *layers)¶
Stack multiple Layers.
This is a convenience function that acts as an alternative to the rshift operator implemented for Layers and MultiLayers. It is syntactically more compact for stacking large numbers of layers or lists of layers.
The type of stacking (Layer or MultiLayer) is dispatched on the first argument.
Parameters:
- l (Layer or MultiLayer) – The first layer to stack. The type of this layer determines the type of the output; MultiLayerComposite or LayerComposite.
- *layers – list of additional layers to stack. Must all be of type Layer, because function composition only works with the first function having multiple arguments.
Returns: result – A single layer that is the composition of the input layers.
Return type: LayerComposite or MultiLayerComposite
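As a short usage sketch (equivalent to composing the same layers with >>; this assumes stack is exported at the package level, otherwise use aboleth.baselayers.stack):

layers = ab.stack(
    ab.InputLayer(name="X", n_samples=5),
    ab.DenseVariational(output_dim=100),
    ab.Activation(tf.nn.relu),
    ab.DenseVariational(output_dim=1),
)
net, kl = layers(X=X_)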
ab.layers¶
Network layers and utilities.
-
class
aboleth.layers.
Activation
(h=<function Activation.<lambda>>)¶ Bases:
aboleth.baselayers.Layer
Activation function layer.
Parameters: h (callable) – the element-wise activation function. -
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.layers.
DenseMAP
(output_dim, l1_reg=1.0, l2_reg=1.0, use_bias=True)¶ Bases:
aboleth.layers.SampleLayer
Dense (fully connected) linear layer, with MAP inference.
Parameters: - output_dim (int) – the dimension of the output of this layer
- l1_reg (float) – the value of the l1 weight regularizer, \(\text{l1_reg} \times \|\mathbf{W}\|_1\)
- l2_reg (float) – the value of the l2 weight regularizer, \(\frac{1}{2} \text{l2_reg} \times \|\mathbf{W}\|^2_2\)
- use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
DenseVariational
(output_dim, var=1.0, full=False, use_bias=True, prior_W=None, prior_b=None, post_W=None, post_b=None)¶ Bases:
aboleth.layers.SampleLayer3
Dense (fully connected) linear layer, with variational inference.
Parameters: - output_dim (int) – the dimension of the output of this layer
- var (float) – the initial value of the weight prior variance, which defaults to \(\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \text{var} \mathbf{I})\), this is optimized (a la maximum likelihood type II).
- full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
- use_bias (bool) – If true, also learn a bias weight, e.g. a constant offset weight.
- prior_W (distributions.Normal, distributions.Gaussian, optional) – This is the prior distribution object to use on the layer weights. It
must have parameters compatible with (input_dim, output_dim) shaped
weights. This ignores the
var
parameter. - prior_b (distributions.Normal, distributions.Gaussian, optional) – This is the prior distribution object to use on the layer intercept. It
must have parameters compatible with (output_dim,) shaped weights.
This ignores the
var
anduse_bias
parameters. - post_W (distributions.Normal, distributions.Gaussian, optional) – It must have parameters compatible with (input_dim, output_dim) shaped
weights. This ignores the
full
parameter. See alsodistributions.gaus_posterior
. - post_b (distributions.Normal, distributions.Gaussian, optional) – This is the posterior distribution object to use on the layer
intercept. It must have parameters compatible with (output_dim,) shaped
weights. This ignores the
use_bias
parameters. See alsodistributions.norm_posterior
.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
DropOut
(keep_prob)¶ Bases:
aboleth.baselayers.Layer
Dropout layer, Bernoulli probability of not setting an input to zero.
This is just a thin wrapper around tf.dropout
Parameters: keep_prob (float, Tensor) – the probability of keeping an input. See tf.dropout.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.layers.
EmbedVariational
(output_dim, n_categories, var=1.0, full=False, prior_W=None, post_W=None)¶ Bases:
aboleth.layers.DenseVariational
Dense (fully connected) embedding layer, with variational inference.
This layer works directly on shape (N, 1) inputs of category indices rather than one-hot representations, for efficiency.
Parameters: - output_dim (int) – the dimension of the output (embedding) of this layer
- n_categories (int) – the number of categories in the input variable
- var (float) – the initial value of the weight prior variance, which defaults to \(\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \text{var} \mathbf{I})\), this is optimized (a la maximum likelihood type II).
- full (bool) – If true, use a full covariance Gaussian posterior for each of the output weight columns, otherwise use an independent (diagonal) Normal posterior.
- prior_W (distributions.Normal, distributions.Gaussian, optional) – This is the prior distribution object to use on the layer weights. It
must have parameters compatible with (input_dim, output_dim) shaped
weights. This ignores the
var
parameter. - post_W (distributions.Normal, distributions.Gaussian, optional) – This is the posterior distribution object to use on the layer weights.
It must have parameters compatible with (input_dim, output_dim) shaped
weights. This ignores the
full
parameter. See alsodistributions.gaus_posterior
.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
InputLayer
(name, n_samples=None)¶ Bases:
aboleth.baselayers.MultiLayer
Create an input layer.
This layer defines input kwargs so that a user may easily provide the right inputs to a complex set of layers. It takes a 2D tensor of shape (N, D). If n_samples is specified, the input is tiled along a new first axis creating a (n_samples, N, D) tensor for propagating samples through a variational deep net.
Parameters: - name (string) – The name of the input. Used as the argument for input into the net.
- n_samples (int > 0) – The number of samples.
-
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
MaxPool2D
(pool_size, strides, padding='SAME')¶ Bases:
aboleth.baselayers.Layer
Max pooling layer for 2D inputs (e.g. images).
This is just a thin wrapper around tf.nn.max_pool
Parameters: - pool_size (tuple or list of 2 ints) – width and height of the pooling window.
- strides (tuple or list of 2 ints) – the strides of the pooling operation along the height and width.
- padding (str) – the type of padding to use; one of ‘SAME’ or ‘VALID’. Defaults to ‘SAME’.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
RandomArcCosine
(n_features, lenscale=1.0, p=1, variational=False, lenscale_posterior=None)¶ Bases:
aboleth.layers.RandomFourier
Random arc-cosine kernel layer.
NOTE: This should be followed by a dense layer to properly implement a kernel approximation.
Parameters:
- n_features (int) – the number of unique random features, the actual output dimension of this layer will be 2 * n_features.
- lenscale (float, ndarray, Tensor) – the length scales of the arc-cosine kernel, this can be a scalar for an isotropic kernel, or a vector for an automatic relevance detection (ARD) kernel.
- p (int) – The order of the arc-cosine kernel, this must be an integer greater than, or equal to, zero. 0 will lead to sigmoid-like kernels, 1 will lead to relu-like kernels, 2 quadratic-relu kernels etc.
- variational (bool) – use variational features instead of random features, (i.e. VAR-FIXED in [2]).
- lenscale_posterior (float, ndarray, optional) – the initial value for the posterior length scale. This is only used if variational==True. This can be a scalar or vector (different initial value per input dimension). If this is left as None, it will be set to sqrt(1 / input_dim) (this is similar to the ‘auto’ setting for a scikit learn SVM with a RBF kernel).
See also
[1] Cho, Youngmin and Saul, Lawrence K. Analysis and Extension of Arc-Cosine Kernels for Large Margin Classification. arXiv preprint arXiv:1112.3712, 2011.
[2] Cutajar, K., Bonilla, E., Michiardi, P. and Filippone, M. Random Feature Expansions for Deep Gaussian Processes. In ICML, 2017.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
RandomFourier
(n_features, kernel)¶ Bases:
aboleth.layers.SampleLayer3
Random Fourier feature (RFF) kernel approximation layer.
NOTE: This should be followed by a dense layer to properly implement a kernel approximation.
Parameters:
- n_features (int) – the number of unique random features, the actual output dimension of this layer will be 2 * n_features.
- kernel (kernels.ShiftInvariant) – the kernel object that yields the random samples from the Fourier spectrum of a particular kernel to approximate. See the ab.kernels module.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.layers.
Reshape
(target_shape)¶ Bases:
aboleth.baselayers.Layer
Reshape layer.
Reshape and output a tensor to a specified shape.
Parameters: target_shape (tuple of ints) – Does not include the samples or batch axes.
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.layers.
SampleLayer
¶ Bases:
aboleth.baselayers.Layer
Sample Layer base class.
This is the base class for layers that build upon stochastic (variational) nets. These expect rank >= 3 input Tensors, where the first dimension indexes the random samples of the stochastic net.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.layers.
SampleLayer3
¶ Bases:
aboleth.layers.SampleLayer
Special case of SampleLayer restricted to rank == 3 input Tensors.
-
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
ab.hlayers¶
Higher-order neural network layers (made from other layers).
-
class
aboleth.hlayers.
Concat
(*layers)¶ Bases:
aboleth.baselayers.MultiLayer
Concatenates the output of multiple layers.
Parameters: layers ([MultiLayer]) – The layers to concatenate. -
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.hlayers.
PerFeature
(*layers)¶ Bases:
aboleth.baselayers.Layer
Concatenate multiple layers with sliced inputs.
Each layer will receive a slice along the last axis of the input to the new function. In other words, PerFeature(l1, l2)(X) will call l1(X[..., 0]) and l2(X[..., 1]), then concatenate their outputs into a single tensor. This is mostly useful for simplifying embedding multiple categorical inputs that are stored columnwise in the same 2D tensor. This function assumes the tensor being provided is 3D.
Parameters: layers ([Layer]) – The layers to concatenate. -
__call__
(X)¶ Construct the subgraph for this layer.
Parameters: X (Tensor) – the input to this layer Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
-
class
aboleth.hlayers.
Sum
(*layers)¶ Bases:
aboleth.baselayers.MultiLayer
Sums multiple layers by adding their outputs.
Parameters: layers ([MultiLayer]) – The layers to add. -
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
ab.kernels¶
Random kernel classes for use with the RandomKernel layers.
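For example, an ARD radial basis kernel with a learnable length scale per input dimension might be constructed like this (D, the number of input features, is illustrative):

import numpy as np
import tensorflow as tf
import aboleth as ab

D = 10  # number of input features (illustrative)
lenscale = ab.pos(tf.Variable(np.ones((D, 1), dtype=np.float32)))  # ARD length scales
kern = ab.RBF(lenscale=lenscale)
layer = ab.RandomFourier(n_features=100, kernel=kern)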
-
class
aboleth.kernels.
Matern
(lenscale=1.0, p=1)¶ Bases:
aboleth.kernels.ShiftInvariant
Matern kernel approximation.
Parameters:
- lenscale (float, ndarray, Tensor, Variable) – the length scales of the shift invariant kernel, this can be a scalar for an isotropic kernel, or a vector of shape (input_dim, 1) for an automatic relevance detection (ARD) kernel. If you wish to learn this parameter, make it a Variable (or ab.pos(tf.Variable(...)) to keep it positively constrained).
- p (int) – a zero or positive integer specifying the number of the Matern kernel, e.g. p == 0 results in a Matern 1/2 kernel, p == 1 results in the Matern 3/2 kernel etc.
-
weights
(input_dim, n_features)¶ Generate the random fourier weights for this kernel.
Parameters: - input_dim (int) – the input dimension to this layer.
- n_features (int) – the number of unique random features, the actual output dimension
of this layer will be
2 * n_features
.
Returns: - P (ndarray) – the random weights of the fourier features of shape
(input_dim, n_features)
. - KL (Tensor, float) – the KL penalty associated with the parameters in this kernel (0.0).
-
class
aboleth.kernels.
RBF
(lenscale=1.0)¶ Bases:
aboleth.kernels.ShiftInvariant
Radial basis kernel approximation.
Parameters: lenscale (float, ndarray, Tensor, Variable) – the length scales of the shift invariant kernel, this can be a scalar for an isotropic kernel, or a vector of shape (input_dim, 1) for an automatic relevance detection (ARD) kernel. If you wish to learn this parameter, make it a Variable (or ab.pos(tf.Variable(...))
to keep it positively constrained).-
weights
(input_dim, n_features)¶ Generate the random fourier weights for this kernel.
Parameters: - input_dim (int) – the input dimension to this layer.
- n_features (int) – the number of unique random features, the actual output dimension
of this layer will be
2 * n_features
.
Returns: - P (ndarray) – the random weights of the fourier features of shape
(input_dim, n_features)
. - KL (Tensor, float) – the KL penalty associated with the parameters in this kernel (0.0).
-
-
class
aboleth.kernels.
RBFVariational
(lenscale=1.0, lenscale_posterior=None)¶ Bases:
aboleth.kernels.ShiftInvariant
Variational Radial basis kernel approximation.
This kernel is similar to the RBF kernel, however we learn an independent Gaussian posterior distribution over the kernel weights to sample from.
Parameters: - lenscale (float, ndarray, Tensor, Variable) – the length scales of the shift invariant kernel, this can be a scalar
for an isotropic kernel, or a vector of shape (input_dim, 1) for an
automatic relevance detection (ARD) kernel. If you wish to learn this
parameter, make it a Variable (or
ab.pos(tf.Variable(...))
to keep it positively constrained). - lenscale_posterior (float, ndarray, optional) – the initial value for the posterior length scale, this can be a
scalar or vector (different initial value per input dimension). If this
is left as None, it will be set to
sqrt(1 / input_dim)
(this is similar to the ‘auto’ setting for a scikit learn SVM with a RBF kernel).
-
weights
(input_dim, n_features)¶ Generate the random fourier weights for this kernel.
Parameters: - input_dim (int) – the input dimension to this layer.
- n_features (int) – the number of unique random features, the actual output dimension
of this layer will be
2 * n_features
.
Returns: - P (ndarray) – the random weights of the fourier features of shape
(input_dim, n_features)
. - KL (Tensor, float) – the KL penalty associated with the parameters in this kernel.
-
class
aboleth.kernels.
ShiftInvariant
(lenscale=1.0)¶ Bases:
object
Abstract base class for shift invariant kernel approximations.
Parameters: lenscale (float, ndarray, Tensor, Variable) – the length scales of the shift invariant kernel, this can be a scalar for an isotropic kernel, or a vector of shape (input_dim, 1) for an automatic relevance detection (ARD) kernel. If you wish to learn this parameter, make it a Variable (or ab.pos(tf.Variable(...))
to keep it positively constrained).-
weights
(input_dim, n_features)¶ Generate the random fourier weights for this kernel.
Parameters: - input_dim (int) – the input dimension to this layer.
- n_features (int) – the number of unique random features, the actual output dimension
of this layer will be
2 * n_features
.
Returns: - P (ndarray) – the random weights of the fourier features of shape
(input_dim, n_features)
. - KL (Tensor, float) – the KL penalty associated with the parameters in this kernel.
-
ab.likelihoods¶
Target/Output likelihoods.
-
class
aboleth.likelihoods.
Bernoulli
¶ Bases:
aboleth.likelihoods.Likelihood
Bernoulli log-likelihood.
-
class
aboleth.likelihoods.
Binomial
(n)¶ Bases:
aboleth.likelihoods.Likelihood
Binomial log-likelihood.
Parameters: n (float, ndarray, Tensor) – the number of trials of this binomial distribution
-
class
aboleth.likelihoods.
Categorical
¶ Bases:
aboleth.likelihoods.Likelihood
Categorical, or Generalized Bernoulli log-likelihood.
-
class
aboleth.likelihoods.
Likelihood
¶ Bases:
object
Abstract base class for likelihood objects.
-
class
aboleth.likelihoods.
Normal
(variance)¶ Bases:
aboleth.likelihoods.Likelihood
Normal log-likelihood.
Parameters: variance (float, Tensor) – the variance of the Normal likelihood, this can be made a tf.Variable if you want to learn this.
ab.distributions¶
Model parameter distributions.
-
class
aboleth.distributions.
Gaussian
(mu, L)¶ Bases:
aboleth.distributions.ParameterDistribution
Gaussian prior/posterior.
Parameters: - mu (Tensor) – mean, shape (d_in, d_out)
- L (Tensor) – Cholesky of the covariance matrix, shape (d_out, d_in, d_in)
- eps (ndarray, Tensor, optional) – random draw from a unit normal if you want to “fix” the sampling, this
should be of shape (d_in, d_out). If this is
None
then a new random draw is used for every call to sample().
-
static
itransform_w
(wt)¶ Un-transform a weight matrix, (d_out, d_in, 1) -> (d_in, d_out).
-
sample
(e=None)¶ Draw a random sample from this object.
Parameters: e (ndarray, Tensor, optional) – the random standard-Normal samples to transform to yield samples from this distribution. These must be of shape (d_in, ...). If this is None, these are generated in this method.
Returns: x – a sample of shape (d_in, d_out), or e.shape if provided.
Return type: Tensor
-
static
transform_w
(w)¶ Transform a weight matrix, (d_in, d_out) -> (d_out, d_in, 1).
-
class
aboleth.distributions.
Normal
(mu=0.0, var=1.0)¶ Bases:
aboleth.distributions.ParameterDistribution
Normal (IID) prior/posterior.
Parameters: - mu (Tensor) – mean, shape (d_in, d_out)
- var (Tensor) – variance, shape (d_in, d_out)
-
sample
(e=None)¶ Draw a random sample from this object.
Parameters: e (ndarray, Tensor, optional) – the random standard-Normal samples to transform to yield samples from this distribution. These must be of shape (d_in, ...). If this is None, these are generated in this method.
Returns: x – a sample of shape (d_in, d_out), or e.shape if provided.
Return type: Tensor
-
class
aboleth.distributions.
ParameterDistribution
¶ Bases:
object
Abstract base class for parameter distribution objects.
-
sample
()¶ Draw a random sample from the distribution.
-
-
aboleth.distributions.
gaus_posterior
(dim, var0)¶ Initialise a posterior Gaussian distribution with a diagonal covariance.
Even though this is initialised with a diagonal covariance, a full covariance will be learned, using a lower triangular Cholesky parameterisation.
Parameters: - dim (tuple or list) – the dimension of this distribution.
- var0 (float) – the initial (unoptimized) diagonal variance of this distribution.
Returns: Q – the initialised posterior Gaussian object.
Return type: Gaussian
Note
This will make tf.Variables on the randomly initialised mean and covariance of the posterior. The initialisation of the mean is from a Normal with zero mean and var0 variance, and the initialisation of the variance is from a gamma distribution with an alpha of var0 and a beta of 1.
-
aboleth.distributions.
norm_posterior
(dim, var0)¶ Initialise a posterior (diagonal) Normal distribution.
Parameters: - dim (tuple or list) – the dimension of this distribution.
- var0 (float) – the initial (unoptimized) variance of this distribution.
Returns: Q – the initialised posterior Normal object.
Return type: Normal
Note
This will make tf.Variables on the randomly initialised mean and variance of the posterior. The initialisation of the mean is from a Normal with zero mean and var0 variance, and the initialisation of the variance is from a gamma distribution with an alpha of var0 and a beta of 1.
-
aboleth.distributions.
norm_prior
(dim, var)¶ Initialise a prior (zero mean, diagonal) Normal distribution.
Parameters: - dim (tuple or list) – the dimension of this distribution.
- var (float) – the prior variance of this distribution.
Returns: P – the initialised prior Normal object.
Return type: Normal
Note
This will make a tf.Variable on the variance of the prior that is initialised with var.
ab.impute¶
Layers that impute missing data.
-
class
aboleth.impute.
FixedNormalImpute
(datalayer, masklayer, mu_array, var_array)¶ Bases:
aboleth.impute.ImputeOp
Impute the missing values using marginal Gaussians over each column.
Takes two layers, one that returns a data tensor and the other that returns a mask tensor. Creates a layer that returns a tensor in which the masked values have been imputed as random draws from the marginal Gaussians.
Parameters: - datalayer (callable) – A layer that returns a data tensor. Must be of form
f(**kwargs)
. - masklayer (callable) – A layer that returns a boolean mask tensor where True values are
masked. Must be of form
f(**kwargs)
. - mu_array (array-like) – A list of the global mean values of each data column.
- var_array (array-like) – A list of the global variances of each data column.
-
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.impute.
ImputeOp
(datalayer, masklayer)¶ Bases:
aboleth.baselayers.MultiLayer
Abstract Base Impute operation. These specialise MultiLayers.
They expect a data InputLayer and a mask InputLayer. They return layers in which the masked values have been imputed.
Parameters: - datalayer (callable) – A layer that returns a data tensor. Must be of form
f(**kwargs)
. - masklayer (callable) – A layer that returns a boolean mask tensor where True values are
masked. Must be of form
f(**kwargs)
.
-
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.impute.
LearnedNormalImpute
(datalayer, masklayer)¶ Bases:
aboleth.impute.ImputeOp
Impute the missing values with draws from learned normal distributions.
Takes two layers, one that returns a data tensor and the other that returns a mask tensor. This creates a layer that will learn marginal Gaussian parameters per column, and infill missing values using draws from these Gaussians.
Parameters: - datalayer (callable) – A layer that returns a data tensor. Must be an InputLayer.
- masklayer (callable) – A layer that returns a boolean mask tensor where True values are masked. Must be an InputLayer.
-
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.impute.
LearnedScalarImpute
(datalayer, masklayer)¶ Bases:
aboleth.impute.ImputeOp
Impute the missing values using learnt scalar for each column.
Takes two layers, one that returns a data tensor and the other that returns a mask tensor. Creates a layer that returns a tensor in which the masked values have been imputed with a learned scalar value per column.
Parameters: - datalayer (callable) – A layer that returns a data tensor. Must be an InputLayer.
- masklayer (callable) – A layer that returns a boolean mask tensor where True values are masked. Must be an InputLayer.
-
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
-
class
aboleth.impute.
MeanImpute
(datalayer, masklayer)¶ Bases:
aboleth.impute.ImputeOp
Impute the missing values using the stochastic mean of their column.
Takes two layers, one that returns a data tensor and the other that returns a mask tensor. Returns a layer that returns a tensor in which the masked values have been imputed as the column means calculated from the batch.
Parameters: - datalayer (callable) – A layer that returns a data tensor. Must be of form
f(**kwargs)
. - masklayer (callable) – A layer that returns a boolean mask tensor where True values are
masked. Must be of form
f(**kwargs)
.
-
__call__
(**kwargs)¶ Construct the subgraph for this layer.
Parameters: **kwargs – the inputs to this layer (Tensors) Returns: - Net (Tensor) – the output of this layer
- KL (float, Tensor) – the regularizer/Kullback Leibler ‘cost’ of the parameters in this layer.
ab.random¶
Random generators and state.
-
class
aboleth.random.
SeedGenerator
¶ Bases:
object
Make new random seeds deterministically from a base random seed.
-
next
()¶ Generate a random int using this object’s base state.
Returns: result – an integer that can be used to seed other random states deterministically. Return type: int
-
set_hyperseed
(hs)¶ Set the random seed state in this object.
Parameters: hs (None, int, array_like) – seed the random state of this object, see numpy.random.RandomState for valid inputs.
-
-
aboleth.random.
endless_permutations
(N)¶ Generate an endless sequence of permutations of the set [0, ..., N).
If we call this N times, we will sweep through the entire set without replacement, on the (N+1)th call a new permutation will be created, etc.
Parameters: N (int) – the length of the set.
Yields: int – yields a random int from the set [0, ..., N).
Examples
>>> perm = endless_permutations(5)
>>> type(perm)
<class 'generator'>
>>> p = next(perm)
>>> p < 5
True
>>> p2 = next(perm)
>>> p2 != p
True
-
aboleth.random.
set_hyperseed
(hs)¶ Set the global hyperseed from which to generate all other seeds.
Parameters: hs (None, int, array_like) – seed the random state of the global hyperseed, see numpy.random.RandomState for valid inputs.
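For example, to make a model’s random features and weight samples reproducible you might set the hyperseed once before building the graph (a sketch; this assumes set_hyperseed is exported at the package level, otherwise use aboleth.random.set_hyperseed):

import aboleth as ab

ab.set_hyperseed(42)  # all subsequent Aboleth random states derive from this seed
# ... now build the layers and the rest of the graph as usual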
ab.util¶
Package helper utilities.
-
aboleth.util.
batch
(feed_dict, batch_size, n_iter=10000, N_=None)¶ Create random batches for stochastic gradients.
Feed dict data generator for SGD that will yield random batches for a defined number of iterations, which can be infinite. This generator makes consecutive passes through the data, drawing without replacement on each pass.
Parameters:
- feed_dict (dict of ndarrays) – The data with {tf.placeholder: data} entries. This assumes all items have the same length!
- batch_size (int) – number of data points in each batch.
- n_iter (int, optional) – The number of iterations.
- N_ (tf.placeholder (int), optional) – Placeholder for the size of the dataset. This will be fed to an algorithm.
Yields: dict – with each element an array of length batch_size, i.e. a subset of data, and an element for N_. Use this as your feed-dict when evaluating a loss, training, etc.
-
aboleth.util.
batch_prediction
(feed_dict, batch_size)¶ Split the data in a feed_dict into contiguous batches for prediction.
Parameters: - feed_dict (dict of ndarrays) – The data with
{tf.placeholder: data}
entries. This assumes all items have the same length! - batch_size (int) – number of data points in each batch.
Yields: - ndarray – an array of shape approximately (
batch_size
,) of indices into the original data for the current batch - dict – with each element an array length
batch_size
, i.e. a subset of data. Use this as your feed-dict when evaluating a model, prediction, etc.
Note
The exact size of the batch may not be batch_size, but the nearest size that splits the size of the data most evenly.
-
aboleth.util.
pos
(X, minval=1e-15)¶ Constrain a
tf.Variable
to be positive only.At the moment this is implemented as:
\(\max(|\mathbf{X}|, \text{minval})\)This is fast and does not result in vanishing gradients, but will lead to non-smooth gradients and more local minima. In practice we haven’t noticed this being a problem.
Parameters: - X (Tensor) – any Tensor in which all elements will be made positive.
- minval (float) – the minimum “positive” value the resulting tensor will have.
Returns: X – a tensor the same shape as the input
X
but positively constrained.Return type: Tensor
Examples
>>> X = tf.constant(np.array([1.0, -1.0, 0.0]))
>>> Xp = pos(X)
>>> with tf.Session():
...     xp = Xp.eval()
>>> xp
array([  1.00000000e+00,   1.00000000e+00,   1.00000000e-15])
-
aboleth.util.
predict_expected
(predictor, feed_dict, n_groups=1, session=None)¶ Help to get the expected value from a predictor.
Parameters: - predictor (Tensor) – a tensor that outputs a shape (n_samples, N, tasks) where
n_samples
are the random samples from the predictor (e.g. the output ofNet
),N
is the size of the query dataset, andtasks
the number of prediction tasks. - feed_dict (dict) – The data with
{tf.placeholder: data}
entries. - n_groups (int) – The number of times to evaluate the
predictor
and concatenate the samples. - session (Session) – the session to be used to evaluate the predictor.
Returns: pred – expected value of the prediction with shape (N, tasks).
n_samples * n_groups
samples go into evaluating this expectation.Return type: ndarray
Note
This has to be called in an active tensorflow session!
-
aboleth.util.
predict_samples
(predictor, feed_dict, n_groups=1, session=None)¶ Help to get samples from a predictor.
Parameters: - predictor (Tensor) – a tensor that outputs a shape (n_samples, N, tasks) where
n_samples
are the random samples from the predictor (e.g. the output ofNet
),N
is the size of the query dataset, andtasks
the number of prediction tasks. - feed_dict (dict) – The data with
{tf.placeholder: data}
entries. - n_groups (int) – The number of times to evaluate the
predictor
and concatenate the samples. - session (Session) – the session to be used to evaluate the predictor.
Returns: pred – prediction samples of shape (n_samples * n_groups, N, tasks).
Return type: ndarray
Note
This has to be called in an active tensorflow session!
ab.datasets¶
Feedback¶
If you have any suggestions or questions about Aboleth feel free to email us at lachlan.mccalman@data61.csiro.au or daniel.steinberg@data61.csiro.au.
If you encounter any errors or problems with Aboleth, please let us know! Open an issue at the main GitHub repository: http://github.com/determinant-io/aboleth.