Welcome to apsis’s documentation!¶
A toolkit for hyperparameter optimization for machine learning algorithms.
Our goal is to provide a flexible, simple and scalable approach - parallel, on clusters and/or on your own machine. Check out our usage tutorials to get started or the design pages to understand how apsis works.
Contents¶
Using apsis for Hyperparameter Optimization¶
First steps¶
Once we have installed apsis, we can continue with our first project.
Our most important interface to apsis is the PrettyLabAssistant or - if we don't need plots, storage and so on - the BasicLabAssistant.
Firstly, let's talk about Experiments. Experiments are one of the building blocks of apsis.
Each Experiment represents a series of trials, each with a different parameter configuration. These trials are called Candidates. Each Candidate stores the parameter configuration used in the corresponding trial and - once evaluated - the evaluation's result.
An Experiment, aside from the Candidates, also stores some details on the experimental setup. It stores the experiment name, whether the experiment’s goal is to minimize or maximize the result and the parameter definitions. The latter is probably the most important part of an experiment. It defines whether a parameter is nominal, ordinal or numeric.
A complete list of parameter definitions can be found here, but the two most useful are the MinMaxNumericParamDef and the NominalParamDef. The first one represents a numeric parameter whose prior is uniformly distributed between a minimum and a maximum value, while the second is just an unordered list of nominal values.
Each Experiment has a dictionary of parameter definitions, which must be specified to define that Experiment. For example, let us try to optimize a one-dimensional function, \(f(x) = cos(x) + x/4\), for x between 0 and 10:
import math
def f(x):
    return math.cos(x) + x/4
As said above, we now define our parameter space. Our only parameter is a numeric parameter between 0 and 10, called x:
from apsis.models.parameter_definition import MinMaxNumericParamDef
param_defs = {
    'x': MinMaxNumericParamDef(0, 10)
}
Now, let’s initialize the LabAssistant and the first experiment:
from apsis.assistants.lab_assistant import PrettyLabAssistant
assistant = PrettyLabAssistant()
assistant.init_experiment("tutorial_experiment", "BayOpt", param_defs, minimization=True)
As you can see, we have first initialized the LabAssistant, then the first experiment. The experiment is called tutorial_experiment (each name must be unique). It uses the BayOpt optimizer, is defined by the param_defs we set above, and its goal is minimization. We could also pass further parameters to the experiment's optimizer, but don't do so in this case.
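If we did want to pass such parameters, it would look roughly like the sketch below. It reuses the optimizer_arguments option that appears in the Bayesian Optimization tutorial later in this documentation; the experiment name and the value 5 are purely illustrative:
# Hypothetical variant: pass extra settings to the optimizer via optimizer_arguments.
assistant.init_experiment("tutorial_experiment_2", "BayOpt", param_defs,
                          minimization=True,
                          optimizer_arguments={"initial_random_runs": 5})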
Now, there are two main functionalities of the LabAssistant we usually use: getting the next candidate to try, and reporting its result. Let's get our first proposal:
candidate = assistant.get_next_candidate("tutorial_experiment")
As usual, the first argument specifies which experiment we want to get the next candidate from. There are two important fields in such a Candidate: params and result. We use the first one to set up our next evaluation, and the second one to report the evaluation's result:
x = candidate.params['x']
candidate.result = f(x)
assistant.update("tutorial_experiment", candidate)
We can continue doing so until we have reached a break criterion, for example a certain number of steps or a sufficiently good result:
for i in range(30):
    candidate = assistant.get_next_candidate("tutorial_experiment")
    x = candidate.params['x']
    candidate.result = f(x)
    assistant.update("tutorial_experiment", candidate)
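If we would rather stop as soon as the result is good enough instead of after a fixed number of steps, a small sketch of such a loop - using only the calls shown in this tutorial, with an arbitrary threshold of -0.2 and an arbitrary step limit - could look like this:
max_steps = 50      # hard upper limit on evaluations (arbitrary for this sketch)
good_enough = -0.2  # stop once the best result beats this (arbitrary threshold)
for i in range(max_steps):
    candidate = assistant.get_next_candidate("tutorial_experiment")
    candidate.result = f(candidate.params['x'])
    assistant.update("tutorial_experiment", candidate)
    if assistant.get_best_candidate("tutorial_experiment").result <= good_enough:
        break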
Afterwards, we probably want to get the best result and the parameters we used to get there. This is quite simple:
best_cand = assistant.get_best_candidate("tutorial_experiment")
This gives us the best candidate we have found. We can then check its result and params:
print("Best result: " + str(best_cand.result))
print("with parameters: " + str(best_cand.params))
In my case, the result was -0.245, with an x of 2.87. Yours will vary depending on the randomization.
But of course, we want to see how we performed over time! For this, the PrettyLabAssistant has the ability to plot experiment results over time, and to compare them. Currently, we just need one, though:
assistant.plot_result_per_step(['tutorial_experiment'])
My plot looks like this:

On the x-axis is the step, on the y-axis the result. The line represents the best result found up to each step, while the dots are the hypotheses tested in each step. Since the default settings for Bayesian optimization prescribe a ten-step random search, we can see the following: first, ten points are tested at random. In step 11, the first step where Bayesian optimization begins, we find a very good solution; the following steps improve on it only slightly.
That’s it! We have optimized our first problem. Further tutorials will follow.
Bayesian Optimization with apsis - Advanced Tutorial¶
apsis implements the technique called Bayesian Optimization for optimizing your hyperparameters. There are several problems involved in optimizing hyperparameters, which is why automated methods have only recently become available. First, you usually do not have any assumptions on the form and behaviour of the loss function you want to optimize over. Second, you are actually running a surrounding optimization loop around some other sort of optimization loop. This means that every iteration of your hyperparameter optimization usually includes another optimization, namely running your actual algorithm - e.g. a machine learning algorithm - for which you optimize the parameters. This makes it necessary to carefully select the next hyperparameter vector to try, since each trial might involve training your machine learning algorithm for a couple of hours or more.
Hence, in Bayesian Optimization a Gaussian Process is trained as a surrogate model to approximate the loss function, or whatever measure of your algorithm shall be optimized. A central role is played by the so-called acquisition function, which is responsible for interpreting the surrogate model and suggesting new hyperparameter vectors to sample from your original algorithm. This function is maximized, and the hyperparameter vector maximizing it will be the next one to be tested.
A comprehensive introduction to the field of hyperparameter optimization for machine learning algorithms can be found in
[1] James Bergstra, Rémy Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for Hyper-Parameter Optimization. In NIPS, 2011. See here for downloading the paper.
There you can also find a very short introduction to Bayesian Optimization. For a clearer and more in-depth understanding of how Bayesian Optimization itself works, the following great tutorial is a must-read.
[2] Eric Brochu, Vlad M. Cora, and Nando de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. IEEE Transactions on Reliability, 2010. See here for downloading the paper.
Finally, the following paper provides a comprehensive introduction to the usage of Bayesian Optimization for hyperparameter optimization.
[3] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian Optimization of Machine Learning Algorithms. In NIPS, pages 2960-2968, 2012. See here for downloading the paper.
This tutorial gives an overview of how to switch between the possibilities implemented in apsis, including how to change acquisition functions, kernels, or the way acquisition functions are optimized.
Choosing Acquisition Functions¶
So far apsis contains only two acquisition functions: ProbabilityOfImprovement and ExpectedImprovement. You can easily provide your own acquisition function by extending the AcquisitionFunction class. While Probability of Improvement is mainly included for the sake of completeness, Expected Improvement is the acquisition function most commonly used in Bayesian Optimization. It is said to have a good balance between exploration of unknown regions and exploitation of well-known but promising regions.
To choose which acquisition function to use, use the LabAssistant or ExperimentAssistant interface and pass the acquisition function in the optimizer_arguments:
from apsis.optimizers.bayesian.acquisition_functions import ExpectedImprovement, ProbabilityOfImprovement
from apsis.assistants.lab_assistant import PrettyLabAssistant
LAss = PrettyLabAssistant()
LAss.init_experiment("bay_EI", "BayOpt", param_defs, minimization=True, optimizer_arguments={"acquisition": ExpectedImprovement, "initial_random_runs": 5} )
LAss.init_experiment("bay_POI", "BayOpt", param_defs, minimization=True, optimizer_arguments={"acquisition": ProbabilityOfImprovement, "initial_random_runs": 5} )
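To actually compare the two acquisition functions, we can run both experiments side by side and plot them together. The sketch below assumes the objective function f and the param_defs (with the single parameter x) from the first tutorial, and uses only calls shown elsewhere in this documentation:
for i in range(20):
    for exp_name in ["bay_EI", "bay_POI"]:
        candidate = LAss.get_next_candidate(exp_name)
        candidate.result = f(candidate.params['x'])
        LAss.update(exp_name, candidate)

# Plot both experiments' results per step in one figure.
LAss.plot_result_per_step(["bay_EI", "bay_POI"])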
Furthermore, an acquisition function can receive hyperparameters, e.g. for telling apsis how to optimize this function. These hyperparameters are specific to the acquisition function. ExpectedImprovement, for example, can be told to use another optimization method. More on this in the section on Acquisition Optimization:
LAss.init_experiment("bay_EI_BFGS", "BayOpt", param_defs, minimization=True, optimizer_arguments={"acquisition": ExpectedImprovement, "initial_random_runs": 5, "acquisition_hyperparams": {"optimization": "BFGS"}})
Choosing Kernels¶
Another central point for tweaking your Bayesian optimization is the kernel used. apsis supports the Matern 5-2 and the RBF kernel, with the former selected as the default. For both kernels, the implementation of the GPy package is used. Choosing your kernel works similarly to choosing your acquisition function.
You can either specify the kernel as one of the two strings ["matern52", "rbf"] or supply a class inheriting from the GPy.kern.Kern class:
from apsis.assistants.lab_assistant import PrettyLabAssistant
import GPy
LAss = PrettyLabAssistant()
LAss.init_experiment("bay_RBF", "BayOpt", param_defs, minimization=True, optimizer_arguments={"kernel": "rbf", "initial_random_runs": 5} )
LAss.init_experiment("bay_Matern52", "BayOpt", param_defs, minimization=True, optimizer_arguments={"kernel": GPy.kern.Matern52, "initial_random_runs": 5} )
A kernel can also be given parameters if necessary. For example, a frequent parameter of the GPy kernels is whether automatic relevance determination (ARD) shall be used:
LAss.init_experiment("bay_Matern52", "BayOpt", param_defs, minimization=True, optimizer_arguments={"kernel": GPy.kern.Matern52, "kernel_params": {"ARD": True}, "initial_random_runs": 5} )
By default the Matern 5-2 kernel with ARD will be used.
Minimizing or Maximizing your Objective Function¶
By default, apsis assumes you want to minimize your objective function, e.g. that it represents the error of your machine learning algorithm. However, apsis can easily be switched to maximization by specifying the minimization property of the LabAssistant or ExperimentAssistant:
from apsis.assistants.lab_assistant import PrettyLabAssistant
LAss = PrettyLabAssistant()
LAss.init_experiment("bay_Matern52", "BayOpt", param_defs, minimization=False, optimizer_arguments={"kernel": GPy.kern.Matern52, "initial_random_runs": 5} )
Dealing with the GP's Hyperparameters¶
In addition to the hyperparameters that are the subject of optimization, the Gaussian Process used to approximate the underlying model also has hyperparameters. In particular, the kernel usually has a relevance parameter influencing the shape of the distribution. This can be one parameter or several, depending on whether an ARD kernel is used. By default, apsis uses maximum likelihood, as implemented by the GPy package, to optimize these parameters.
Additionally, you can switch to Hybrid Monte Carlo sampling, provided by the GPy package, to integrate these parameters out. This applies only to the GP and kernel hyperparameters, not to those of the acquisition function. To do so, simply switch the mcmc parameter to True:
from apsis.assistants.lab_assistant import PrettyLabAssistant
LAss = PrettyLabAssistant()
LAss.init_experiment("bay_rand", "BayOpt", param_defs, minimization=True, optimizer_arguments={"initial_random_runs": 5, "mcmc": True})
Note that using the Monte Carlo sampling takes considerably more time. You should consider this option only if a single run of the machine learning algorithm you are optimizing takes several hours or more.
Expected Improvement¶
This section describes how Expected Improvement is implemented in apsis. You might also want to see the source code.
The Expected Improvement function implemented in apsis has a couple of places that can be tuned:
- maximization method of ExpectedImprovement
- exploration/exploitation tradeoff
- minimization or maximization
Closed Form Computation and Gradient¶
Expected Improvement (EI) is generally defined as the expected value of the improvement, i.e. the integral of the improvement times its probability, evaluated for each possible hyperparameter vector, called \(\lambda\) here.
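The display equations of the original page are not preserved in this text version. For reference, the standard definition - following Brochu et al. [2], for a minimization problem - is
\[
\mathrm{EI}(\lambda) = \mathbb{E}\left[\max\left(0, y^{*} - y\right)\right] = \int_{-\infty}^{\infty} \max\left(0, y^{*} - y\right) \, p(y \mid \lambda) \, \mathrm{d}y .
\]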
\(y\) represents the GP model's prediction of the objective function's value when the hyperparameter vector is set to \(\lambda\), and \(y^{*}\) marks the best value measured on the true objective function so far. Fortunately, there is a closed form of this expectation available.
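Again as a reference sketch of the missing equations: under the GP posterior \(y \mid \lambda \sim \mathcal{N}(\mu(\lambda), \sigma^{2}(\lambda))\), the textbook closed form reads
\[
\mathrm{EI}(\lambda) = \sigma(\lambda) \left( u \, \Phi(u) + \varphi(u) \right)
\quad \text{with} \quad
u = \frac{y^{*} - \mu(\lambda)}{\sigma(\lambda)} ,
\]
where \(\Phi\) and \(\varphi\) denote the CDF and PDF of the standard normal distribution.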
In apsis, an adapted version is in use that allows switching between maximization and minimization of the objective function and adds an additional parameter \(\zeta\) used to balance the exploitation/exploration tradeoff in EI. \(MAX\) is assumed to be a binary value: \(0\) if the objective function is being minimized, \(1\) if it is being maximized.
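The exact formula apsis uses is not reproduced in this text version; a common way of writing such a switch - given here purely as an illustration, not necessarily apsis's implementation - is
\[
u = \frac{(-1)^{MAX} \left( y^{*} - \mu(\lambda) \right) - \zeta}{\sigma(\lambda)} ,
\qquad
\mathrm{EI}(\lambda) = \sigma(\lambda) \left( u \, \Phi(u) + \varphi(u) \right) .
\]
For \(MAX = 0\) this reduces to the minimization case above with an extra margin \(\zeta\); for \(MAX = 1\) the sign of the improvement is flipped.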
The gradient of EI has also been derived, in order to be able to apply gradient-based optimization methods.
EI Optimization¶
No matter whether the underlying objective function is to be maximized or minimized, EI always has to be maximized, since we want the maximum possible improvement in every step.
apsis provides the following possibilities for maximizing EI. The value in ["XX"] denotes the key for activating the respective method.
- random search ["random"]
- Quasi-Newton optimization using the BFGS method ["BFGS"]
- Nelder-Mead method ["Nelder-Mead"]
- Powell method ["Powell"]
- Conjugate Gradient method ["CG"]
- inexact/truncated Newton method using Conjugate Gradient to solve the Newton equation ["Newton-CG"]
For the latter five, we refer to the documentation of the SciPy project, since their implementations are used in apsis. The first one is implemented directly in apsis.
To switch the optimization method, simply specify the optimization acquisition hyperparameter when initializing your experiments:
from apsis.assistants.lab_assistant import PrettyLabAssistant
LAss = PrettyLabAssistant()
LAss.init_experiment("bay_RBF", "BayOpt", param_defs, minimization=True, optimizer_arguments={"acquisition": ExpectedImprovement, "initial_random_runs": 5, "acquisition_hyperparams":{"optimization": "BFGS"}} )
Since the gradient of EI can also be computed in closed form, it is desirable to make use of this first-order information during optimization. Hence, BFGS is set as the default optimization method, since it generally performs better than the others when gradients are available. For all of the optimization methods above, a random search is performed first, and the best samples from that random search are used as initializers for the more sophisticated optimization methods.
The number of function evaluations used by this initial random search can be specified as follows. This affects all optimization methods you select, since a random search is always done first. By default, random search uses 1000 iterations:
from apsis.assistants.lab_assistant import PrettyLabAssistant
LAss = PrettyLabAssistant()
LAss.init_experiment("bay_RBF", "BayOpt", param_defs, minimization=True, optimizer_arguments={"acquisition": ExpectedImprovement, "initial_random_runs": 5, "acquisition_hyperparams":{"optimization_random_steps": 100000}} )
To prevent getting stuck in local extrema, the subsequent optimization can also use multiple random restarts. By default, 10 random restarts are done:
from apsis.assistants.lab_assistant import PrettyLabAssistant
LAss = PrettyLabAssistant()
LAss.init_experiment("bay_RBF", "BayOpt", param_defs, minimization=True, optimizer_arguments={"acquisition": ExpectedImprovement, "initial_random_runs": 5, "acquisition_hyperparams":{"optimization_random_restarts": 10}} )
Installing Apsis¶
This guide provides instructions for getting apsis running on your system. It is mainly targeted at Ubuntu/Debian and Mac OS users; however, as a user of another Linux-based OS, you should easily be able to follow this guide using the methods of your distro.
Prerequisites¶
Since GPy requires Python 2, apsis does too.
Apsis requires the following python frameworks and their dependencies to be installed.
- numpy
- scipy
- sklearn
- GPy, version >= 0.6.0
- matplotlib
Note
For apsis versions newer than December 2014, older GPy versions no longer work. apsis has been developed and tested to work with GPy version 0.6.0.
Operating Systems
- developed on Ubuntu 14.04. Tested on Mac OS X Yosemite.
- most unix based operating systems for which the dependencies listed above are available should work.
- no support for non-unix systems right now.
Installation using PIP¶
The easiest way to install apsis is using pip by executing
$ pip install apsis --pre
If the installation fails, you most likely do not have the appropriate non-python requirements for one of the packages above. These are a Fortran compiler and a BLAS library (for scipy), and libpng and libfreetype (for matplotlib).
On a newly installed Ubuntu system (tested with 15.04), execute
$ sudo apt-get install python-pip python-dev gfortran libpng12-dev libfreetype6-dev libopenblas-dev
followed by the following pip commands:
$ pip install numpy
$ pip install scipy
$ pip install --pre apsis
Manual Installation¶
Installing Non-Python Requirements by Operating System¶
Installing Non-Python Prerequisites on Debian/Ubuntu¶
The compilation of matplotlib and scipy has several non-python dependencies, such as C and Fortran compilers or linear algebra libraries. You should also install pip to get the newest versions of the python dependencies.
Tested on Ubuntu 14.04, the following command should give you what you need. If you run another OS, please check the documentation of the prerequisites listed above for how to install them.
$ sudo apt-get install git build-essential python-pip gfortran libopenblas-dev liblapack-dev libfreetype6-dev libpng12-dev python-dev
Optional: In order to use Markov Chain Monte Carlo sampling for integrating over the GP hyperparameters, you need to install an HDF5 library on your system. For Ubuntu 14.04, the following will do the trick.
$ sudo apt-get install libhdf5-serial-dev
Installing Non-Python Prerequisites on Mac OS X¶
You need to update your Python version to a later version than the one distributed with your OS.
Installation is easiest when using the Homebrew package manager; please see the Homebrew page for how to install it.
Once Homebrew is installed, follow these instructions.
Install another, up-to-date Python distribution:
$ brew install python
$ brew linkapps python
Install pip:
$ brew install pip
$ brew linkapps pip
Installing Python Prerequisites with PIP¶
Make sure you have pip and the non-python prerequisites for the libraries listed above installed on your system.
Install numpy.
$ pip install --upgrade numpy
Install scikit-learn.
$ pip install --upgrade scikit-learn
Install matplotlib.
$ pip install --upgrade matplotlib
Install GPy. It will also install the required scipy version for you.
$ pip install --upgrade gpy==0.6.0
Optional: If you want to use MCMC sampling for the hyperparameters of the acquisition functions in Bayesian optimization, you need to install pymc. The installation is easy: you only need to clone the git repository and run the setup script. See the following link for details.
https://github.com/ebilionis/py-mcmc
Installing and Running Apsis¶
Apsis doesn't have an installation routine yet. To get it ready to use, you need to
Pull the code repository
$ git clone https://github.com/FrederikDiehl/apsis.git
Set the PYTHONPATH environment variable to include the apsis folder
$ export PYTHONPATH=[WHEREVER]/apsis/code/apsis
Finally run the test suite to see if everything works alright:
$ cd apsis/code/apsis
$ nosetests
Which should print something like
$ nosetests
.
----------------------------------------------------------------------
Ran XX tests in YYs
OK
Design¶
Overview¶

In general, the building blocks of apsis are Experiments. An experiment corresponds to a series of evaluations, the result of which should be optimized. To store the results, Candidates exist. Each candidate corresponds to one evaluation of the experiment. It stores the parameters used for the evaluation and, eventually, the result.
For administration of an experiment, you can use ExperimentAssistants. They store their corresponding experiment and an optimizer, and let you communicate with the outside. This communication happens mostly via two functions: update and next_candidate. next_candidate gives you the next parameter combination to try; update allows you to update the ExperimentAssistant with new data.
Optimizers are the base for optimization in this module. Each optimizer stores which parameter definitions it supports and has a single important function: get_next_candidates. Given an experiment, it returns several candidates to try next. This also means that, by changing the optimizer parameter in the ExperimentAssistant, one can hot-swap optimizers.
Since it is quite likely that we have several experiments running at once, we better use the LabAssistant. The LabAssistant administrates several experiments, identified by their unique names. Experiments can be compared and plotted in different combinations. Additionally, the ValidationLabAssistant can run multiple experiments and cross-validate them. It offers additional plots showing the error bars obtained by CV.
Parameter storage¶
Similar to scikit-learn, we have decided to store parameters in a dictionary with string keys. The advantage is that the translation from inside the program to the outside is easy, and each parameter is clearly identified. It also makes debugging easier.
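For illustration, such a dictionary simply maps parameter names to values; the names below are borrowed from the MNIST example in the Evaluation section, and the values are made up:
# Parameters are passed around as a plain dictionary with string keys.
params = {"step_rate": 0.01, "momentum": 0.9, "decay": 0.95, "c_wd": 0.0001}
step_rate = params["step_rate"]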
Parameter Definitions¶
For parameter definitions to work, we have defined a parameter definition tree: each parameter definition inherits from ParamDef, and may in turn inherit from other parameter definitions. The tree is shown below:

The reason for this is simple: we do not know in advance which optimizers may be implemented, and one optimizer may not support every parameter definition - for example, Bayesian Optimization does not support nominal parameters right now. Since each parameter definition inherits from an already existing one, an optimizer just has to support a base class to work with all of its subclasses, and no further work is necessary. If, on the other hand, special cases for a parameter definition exist, they can be used without a problem.
In general, all parameter definitions inherit from ParamDef. It defines only two functions: one gives the distance between two points of the parameter definition, the other tests whether a value is part of this ParamDef.
Below ParamDef, there is a distinction into two different classes. NumericParamDef defines continuous, numeric values. These are defined by their warping functions. Warping maps some undefined parameter space to [0, 1] and back. It is used to make all following parameters conform to the same internal format. While NumericParamDef allows you to specify the warping functions yourself, MinMaxNumericParamDef predefines a uniform distribution from min to max. In addition, AsymptoticNumericParamDef meets the frequent use case of a parameter that is close to a certain boundary but may never reach it. For example, learning rates are often asymptotic at 0 but must never become 0.
On the other side, there is NominalParamDef, which defines a nominal parameter definition. It is defined by a list of possible values. It is extended by OrdinalParamDef, which defines an order on that list, and PositionParamDef, which defines a position for each of its values. That is, the distance between values A and B is the difference between the position of A and the position of B. FixedValueParamDef builds on PositionParamDef by deriving each position from the value itself; it can be used for integer values or similar, and can represent any set of fixed points.
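As a sketch of how these definitions are instantiated - the parameter names are made up, while the constructor signatures follow the API documentation below - a dictionary of parameter definitions could look like this:
from apsis.models.parameter_definition import (
    NominalParamDef, OrdinalParamDef, FixedValueParamDef,
    MinMaxNumericParamDef, AsymptoticNumericParamDef)

param_defs = {
    # unordered choice between values
    "activation": NominalParamDef(["sigmoid", "tanh", "relu"]),
    # ordered choice; the order is the order of the list
    "quality": OrdinalParamDef(["low", "medium", "high"]),
    # fixed points whose position equals their value, e.g. integers
    "hidden_layers": FixedValueParamDef([1, 2, 3, 4]),
    # uniform prior between a lower and an upper bound
    "momentum": MinMaxNumericParamDef(0.5, 0.99),
    # close to 0 (the asymptotic border), but never exactly 0
    "step_rate": AsymptoticNumericParamDef(0, 1),
}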
API Documentation¶
apsis package¶
Subpackages¶
apsis.assistants package¶
apsis.models package¶
class apsis.models.candidate.Candidate(params, worker_information=None)
Bases: object
A Candidate is a dictionary of parameter values, which should - or have been - evaluated.
A Candidate object can be seen as a single iteration of the experiment. It is first generated as a suggestion of which parameter set to evaluate next, then updated with the result and cost of the evaluation.
Attributes: cost = None, params = None, result = None, worker_information = None
Methods:
to_csv_entry(delimiter=', ', key_order=None)
Returns a csv entry representing this candidate. It is delimited by delimiter, and first consists of all parameters in the order defined by key_order, followed by the cost and result.
Parameters:
delimiter : string, optional
    The string delimiting csv entries.
key_order : list of param names, optional
    A list defining the order of keys written to csv. If None, the order will be set by sorting the keys.
Returns:
string : string
    The (one-line) string representing this Candidate as a csv line.
to_dict()
EXPERIMENTAL

apsis.models.candidate.from_dict(dict)
EXPERIMENTAL
class apsis.models.experiment.Experiment(name, parameter_definitions, minimization_problem=True)
Bases: object
An Experiment is a set of parameter definitions and multiple candidate evaluations thereon.
Attributes: best_candidate = None, candidates_finished = None, candidates_pending = None, candidates_working = None, minimization_problem = None, name = None, parameter_definitions = None
Methods:
add_finished(candidate)
Announces a Candidate instance to be finished evaluating. This moves the Candidate instance to the candidates_finished list and updates the best_candidate.
Parameters:
candidate : Candidate
    The Candidate to be added to the finished list.
Raises:
ValueError
    Iff candidate is not a Candidate object.
add_pausing(candidate)
Updates the experiment that work on candidate has been paused. This updates the candidates_pending list and the candidates_working list if it contains the candidate.
Parameters:
candidate : Candidate
    The Candidate instance that is currently paused.
Raises:
ValueError
    Iff candidate is no Candidate object.
add_pending(candidate)
Adds a new pending Candidate object to be evaluated. This function should be used when a new pending candidate is supposed to be evaluated. If an old Candidate should be updated as just pausing, use the add_pausing function.
Parameters:
candidate : Candidate
    The Candidate instance that is supposed to be evaluated soon.
Raises:
ValueError
    Iff candidate is no Candidate object.
add_working(candidate)
Updates the experiment to now start working on candidate. This updates the candidates_working list and the candidates_pending list if candidate is in the candidates_pending list.
Parameters:
candidate : Candidate
    The Candidate instance that is currently being worked on.
Raises:
ValueError
    Iff candidate is no Candidate object.
better_cand(candidateA, candidateB)
Determines whether candidateA is better than candidateB in the context of this experiment. This is done as follows: if candidateA's result is None, it is not better. If candidateB's result is None, it is better. If it is a minimization problem and candidateA's result is smaller than candidateB's, it is better; correspondingly for a maximization problem.
Parameters:
candidateA : Candidate
    The candidate which should be better.
candidateB : Candidate
    The baseline candidate.
Returns:
result : bool
    True iff A is better than B.
Raises:
ValueError
    If candidateA or candidateB are no Candidates.
clone()
Create a deep copy of this experiment and return it.
Returns:
copied_experiment : Experiment
    A deep copy of this experiment.
to_csv_results(delimiter=', ', line_delimiter='\n', key_order=None, wHeader=True, fromIndex=0)
Generates a csv result string from this experiment.
Parameters:
delimiter : string, optional
    The column delimiter.
line_delimiter : string, optional
    The line delimiter.
key_order : list of strings, optional
    The order in which the parameters should be written. If None, the order is defined by sorting the parameter names.
wHeader : bool, optional
    Whether a header should be written. Defaults to True.
from_index : int, optional
    Beginning from which result the csv should be generated.
Returns:
csv_string : string
    The corresponding csv string.
steps_included : int
    The number of steps included in the csv.
warp_pt_in(params)
Warps in a point.
Parameters:
params : dict of string keys
    The point to warp in.
Returns:
warped_in : dict of string keys
    The warped-in parameters.
warp_pt_out(params)
Warps out a point.
Parameters:
params : dict of string keys
    The point to warp out.
Returns:
warped_out : dict of string keys
    The warped-out parameters.
class apsis.models.parameter_definition.AsymptoticNumericParamDef(asymptotic_border, border)
Bases: apsis.models.parameter_definition.NumericParamDef
This represents an asymptotic parameter definition.
It consists of a fixed border - represented at 0 - and an asymptotic border - represented at 1.
In general, multiplying the input parameter by 1/10th means a multiplication of the warped-in value by 1/2. This means that each interval between 10^-i and 10^-(i-1) is represented by an interval of length 1/2^i on the hypercube.
For example, assume that you want to optimize over a learning rate. Generally, they are close to 0, with parameter values (and therefore possible optimization values) like 10^-1, 10^-4 or 10^-6. This could be done by initializing this class with asymptotic_border = 0 and border = 1.
Trying to optimize a learning rate decay - which normally is close to 1 - one could initialize this class with asymptotic_border = 1 and border = 0.
Attributes: asymptotic_border = None, border = None
Methods:
warp_in(value_in)
Warps value_in in.
Parameters:
value_in : float
    Should be between (including) border and asymptotic_border. If outside the corresponding interval, it is automatically translated to 0 and 1 respectively.
Returns:
value_in : float
    The [0, 1]-translated value.
warp_out(value_out)
Warps value_out out.
Parameters:
value_out : float
    Should be between (including) 0 and 1. If bigger than 1, it is translated to border. If smaller than 0, it is translated to asymptotic_border.
Returns:
value_out : float
    The translated value.
class apsis.models.parameter_definition.ComparableParamDef
Bases: object
This class defines an ordinal parameter definition subclass, that is, a parameter definition in which the values are comparable. It additionally implements the compare_values function.
Methods:
compare_values(one, two)
Compare values one and two of this datatype.
It has to follow the same return semantics as the Python standard __cmp__ methods, meaning it returns a negative integer if one < two, zero if one == two, and a positive integer if one > two.
Parameters:
one : object in parameter definition
    The first value used in comparison.
two : object in parameter definition
    The second value used in comparison.
Returns:
comp : integer
    comp < 0 iff one < two. comp == 0 iff one == two. comp > 0 iff one > two.
class apsis.models.parameter_definition.FixedValueParamDef(values)
Bases: apsis.models.parameter_definition.PositionParamDef
Extension of PositionParamDef, in which the position is equal to the value of each entry from values.
class apsis.models.parameter_definition.MinMaxNumericParamDef(lower_bound, upper_bound)
Bases: apsis.models.parameter_definition.NumericParamDef
Defines a numeric parameter definition defined by a lower and upper bound.
Attributes: x_max = None, x_min = None
Methods: is_in_parameter_domain(value), warp_in(value_in), warp_out(value_out)
class apsis.models.parameter_definition.NominalParamDef(values)
Bases: apsis.models.parameter_definition.ParamDef
This defines a nominal parameter definition.
A nominal parameter definition is defined by the values as given in the init function. These are a list of possible values it can take.
Attributes: values = None
Methods:
is_in_parameter_domain(value)
Tests whether value is in self.values as defined during the init function.
class apsis.models.parameter_definition.NumericParamDef(warping_in, warping_out)
Bases: apsis.models.parameter_definition.ParamDef, apsis.models.parameter_definition.ComparableParamDef
This class defines a numeric parameter definition.
It is characterized through the existence of a warp_in and a warp_out function. The warp_in function squishes the whole parameter space to the unit space [0, 1], while the warp_out function reverses this. Note that it is necessary that x = warp_in(warp_out(x)) for x in [0, 1] and x = warp_out(warp_in(x)) for x in the allowed parameter space.
Attributes: warping_in = None, warping_out = None
Methods:
compare_values(one, two)
distance(valueA, valueB)
is_in_parameter_domain(value)
Uses the warp_out function for tests.
warp_in(value_in)
Warps value_in into the [0, 1] space.
Parameters:
value_in : float
    The input value.
Returns:
value_in_scaled : float in [0, 1]
    The scaled output value.
warp_out(value_out)
Warps value_out out of the [0, 1] space.
Parameters:
value_out : float in [0, 1]
    The output value.
Returns:
value_out_unscaled : float
    The unscaled value in the parameter space.
class apsis.models.parameter_definition.OrdinalParamDef(values)
Bases: apsis.models.parameter_definition.NominalParamDef, apsis.models.parameter_definition.ComparableParamDef
Defines an ordinal parameter definition.
This class inherits from NominalParamDef and ComparableParamDef, and consists of basically a list of possible values with a defined order. This defined order is simply the order in which the elements are in the list.
Methods:
compare_values(one, two)
Compare values of this ordinal data type. The return value has the same semantics as __cmp__.
Comparison takes place based on the index the given values one and two have in the values list of this object. Meaning, if this ordinal parameter definition has a values list of [3, 5, 1, 4], then '5' will be considered smaller than '1' and '1' bigger than '5', because the index of '1' in this list is higher than the index of '5'.
distance(valueA, valueB)
This distance is defined as the absolute difference between the values' positions in the list, normed to the [0, 1] hypercube.
class apsis.models.parameter_definition.ParamDef
Bases: object
This represents the base class for a parameter definition.
Every member of this class has to implement at least the is_in_parameter_domain method to check whether objects are in the parameter domain.
Methods:
distance(valueA, valueB)
Returns the distance between valueA and valueB. In this case, it is 0 iff valueA == valueB, 1 otherwise.
is_in_parameter_domain(value)
Should test whether a certain value is in the parameter domain as defined by this class.
Parameters:
value : object
    The object to test for membership in the parameter domain.
Returns:
is_in_parameter_domain : bool
    True iff value is in the parameter domain as defined by this instance.
class apsis.models.parameter_definition.PositionParamDef(values, positions)
Bases: apsis.models.parameter_definition.OrdinalParamDef
Defines positions for each of its values.
Attributes: positions = None
Methods:
distance(valueA, valueB)
warp_in(value_in)
Warps in the value to a [0, 1] hypercube value.
warp_out(value_out)
Warps out a value from the [0, 1] hypercube to one of the values.
apsis.optimizers package¶
class apsis.optimizers.optimizer.Optimizer(optimizer_params)
Bases: object
This defines a basic optimizer interface.
Attributes: SUPPORTED_PARAM_TYPES = []
Methods:
get_next_candidates(experiment)
Returns several Candidate objects given an experiment. It is the free choice of the optimizer how many Candidates to provide, but it will provide at least one.
Parameters:
experiment : Experiment
    The experiment to form the base of the next candidates.
Returns:
next_candidate : list of Candidate
    The Candidates to evaluate next.

class apsis.optimizers.random_search.RandomSearch(optimizer_arguments=None)
Bases: apsis.optimizers.optimizer.Optimizer
Implements a random searcher for parameter optimization.
Attributes: SUPPORTED_PARAM_TYPES = [apsis.models.parameter_definition.NominalParamDef, apsis.models.parameter_definition.NumericParamDef], random_state = None
Methods:
get_next_candidates(experiment, num_candidates=1)
apsis.tests package¶
apsis.utilities package¶
apsis.utilities.file_utils.ensure_directory_exists(directory)
Creates the given directory if it does not exist.
Parameters:
directory : string
    The name of the directory to create if it does not exist.

apsis.utilities.randomization.check_random_state(seed)
Adapted from sklearn. Turns seed into an np.random.RandomState instance.
If seed is None, return the RandomState singleton used by np.random. If seed is an int, return a new RandomState instance seeded with seed. If seed is already a RandomState instance, return it. Otherwise raise ValueError.
Module contents¶
Evaluation¶
Overview¶
This section evaluates apsis on several benchmarks and one real-world example. All experiments are evaluated using cross validation and 10 initial random samples that are shared between all optimizers in an experiment to ensure comparability.
Branin-Hoo Benchmark¶
Some recent papers on Bayesian Optimization for machine learning publish evaluations on the Branin-Hoo optimization function. The Branin-Hoo function, using the values proposed here, is defined as follows.
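The rendered equation is not preserved in this text version; for reference, the commonly used form of the Branin-Hoo function is
\[
f(x_1, x_2) = a \left( x_2 - b x_1^{2} + c x_1 - r \right)^{2} + s \left( 1 - t \right) \cos(x_1) + s
\]
with the usual constants \(a = 1\), \(b = \frac{5.1}{4\pi^{2}}\), \(c = \frac{5}{\pi}\), \(r = 6\), \(s = 10\), \(t = \frac{1}{8\pi}\), typically evaluated on \(x_1 \in [-5, 10]\), \(x_2 \in [0, 15]\).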
Contrary to our expectations, Bayesian Optimization was not able to outperform random search on Branin-Hoo. Still, its results are much more stable, and the Bayesian optimizer samples only close to the optimum.

Comparison of Bayesian Optimization vs. random search on the Branin-Hoo function. The first plot shows the best result at every step; here, random search clearly outperforms Bayesian Optimization. The second plot additionally shows each function evaluation as a dot. Here, it is apparent that Bayesian Optimization works much more stably and does not evaluate as many non-promising candidates as random search.

Evaluation on Multidimensional Artificial Noise Function¶

Plot of the artificial noise function used as an optimization benchmark in apsis. It is generated using a grid of random values, smoothed by a Gaussian of varying variance.
Compared to random search, an intelligent optimizer should in theory do better on less noisy functions than on very noisy ones. A very noisy function has a tremendous number of local extrema, making it hard or impossible for Bayesian Optimization methods to outperform random search. To investigate this proposition, an artificial multidimensional noise function has been implemented in apsis, as shown above.
Using this noise function, one can generate multi-dimensional noise with varying smoothness. The construction first builds an \(n\)-dimensional grid of random points, which remains constant under varying smoothness. Evaluating a point is done by averaging the randomly generated grid points, weighted by a Gaussian with zero mean and varying variance. This variance determines the final smoothness. A one-dimensional example of generated functions for differing variances can be seen above.
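The exact generator belongs to apsis's benchmark code and is not reproduced here; the following is a minimal numpy sketch of the construction just described, with made-up function and parameter names, only to make the idea concrete:
import numpy as np

def make_noise_function(n_dims=1, grid_points=100, variance=0.01, seed=42):
    # Sketch: a fixed random grid, evaluated via a Gaussian-weighted average
    # of the grid values around the query point.
    rng = np.random.RandomState(seed)
    grid = rng.rand(*([grid_points] * n_dims))   # random values on an n-dim grid over [0, 1]^n
    axes = [np.linspace(0, 1, grid_points)] * n_dims
    coords = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)

    def f(x):
        x = np.asarray(x, dtype=float)
        sq_dist = np.sum((coords - x) ** 2, axis=-1)    # squared distance to every grid point
        weights = np.exp(-sq_dist / (2.0 * variance))   # zero-mean Gaussian weighting
        return float(np.sum(weights * grid) / np.sum(weights))

    return f

# A smaller variance yields a rougher function, a larger variance a smoother one.
rough = make_noise_function(variance=0.0005)
smooth = make_noise_function(variance=0.05)
y_rough, y_smooth = rough([0.3]), smooth([0.3])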

Plot of the end result after 20 optimization steps on a 3D artificial noise problem depending on the smoothing used. Values to the right are for smoother functions. A lower result is better.
The results can be seen in the figure above. As expected, Bayesian Optimization outperforms random search for smoother functions, while achieving rough parity on rougher functions.
Evaluation on Neural Network on MNIST¶
To evaluate the hyperparameter optimization on a real-world problem, we used it to optimize a neural network on the MNIST dataset. We used Breze as a neural network library in Python. The network is a simple feed-forward neural network with 784 input neurons, 800 hidden neurons and 10 output neurons. It uses sigmoid units in the hidden layers, and softmax as output. We learn over 100 epochs. These parameters stay fixed throughout the optimization. For assigning the neural network weights, we use a backpropagation algorithm.
Its parameters - step_rate, momentum and decay - are optimized over, as is \(c_{wd}\), a weight penalty term, resulting in a four-dimensional hyperparameter optimization. We ran all neural network experiments with five-fold cross validation. Even so, the total evaluation time ran close to 24 hours on an Nvidia Quadro K2100M.

Comparison of random search and Bayesian Optimization in the context of a neural network. Each point represents one parameter evaluation of the respective algorithm. The line represents the mean result of the algorithm at the corresponding step including the boundaries of the 75% confidence interval.
The figure above shows the performance of the optimizers at each step. As can be seen, Bayesian Optimization - after the first ten, shared steps - rapidly improves the performance of the neural network by a huge amount. It is also significantly more stable than the random search optimizer it is compared with.
However, the optimization above uses no previous knowledge of the problem. In an attempt to investigate the influence of such previous knowledge, we then set the parameter definition for the step_rate to assume it is close to 0, and the decay to be close to 1. This is knowledge easily obtainable from any neural network tutorial.
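A sketch of what such informed parameter definitions could look like follows; the exact definitions used in the experiment are not given here, so names and ranges are illustrative, while AsymptoticNumericParamDef itself is described in the Design section:
from apsis.models.parameter_definition import MinMaxNumericParamDef, AsymptoticNumericParamDef

# Hypothetical informed parameter definitions for the neural network experiment.
param_defs = {
    "step_rate": AsymptoticNumericParamDef(0, 1),   # close to 0, but never exactly 0
    "decay": AsymptoticNumericParamDef(1, 0),       # close to 1, but never exactly 1
    "momentum": MinMaxNumericParamDef(0, 1),        # no prior knowledge assumed
    "c_wd": MinMaxNumericParamDef(0, 0.01),         # illustrative range for the weight penalty
}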

Comparison of random search and Bayesian Optimization in the context of a neural network. This optimization uses additional knowledge in that step_rate is assumed to be close to 0 and decay to be close to 1.
The effects of this can be seen above and are dramatic. First of all, even random search performs significantly better than before, reaching a similar value to the uninformed Bayesian Optimization. Bayesian Optimization profits, too, and decreases the mean error by about half.
Project State¶
We have now reached the beta state. The documentation is ready, and test coverage is at 90%.
Scientific Project Description¶
If you want to learn more about the project and are interested in the theoretical background of the hyperparameter optimization used in apsis, you may want to check out the scientific project documentation.
Furthermore, a presentation slide deck is available at slideshare.
License¶
The project is licensed under the MIT license, see the License file on github.