Welcome to the EPIC user’s guide!¶
Easy Parameter Inference in Cosmology (EPIC) is my implementation in Python of a MCMC code for Bayesian inference of parameters of cosmological models and model comparison via the computation of Bayesian evidences.
Details¶
Author: Rafael J. F. Marcondes
Contact: rafaelmarcondes@usp.br
Repository: https://bitbucket.org/rmarcondes/epic
License: BSD License 2.0, with an added clause that if you use it in your work you must cite this user’s guide published in the arXiv repository as: Marcondes R. J. F., “EPIC - Easy Parameter Inference in Cosmology: The user’s guide to the MCMC sampler”. arXiv:1712.00263 [astro-ph.IM]. See the LICENSE.txt file in the root directory of the source code for more details.
Why EPIC?¶
I started to develop EPIC as a means of learning how inference can be made with Markov Chain Monte Carlo, rather than trying to decipher other codes or using them as black boxes. The program has fulfilled this purpose and went on to incorporate a few cosmological observables that I have actually employed in some of my publications. Now I release this code in the hope that it can be useful for students learning some of the methods used in Observational Cosmology, and even for their own work. It still lacks some important features; a Boltzmann solver is not available, although it is possible that I will integrate it with CLASS [1] to make it more useful for more advanced research. At this moment it can be used very well for comparisons using background data alone, or maybe some perturbative aspects that can be calculated in a simple way (for example, the approximation of the growth rate in some models). Stay tuned for more. Meanwhile, enjoy these nice features:
- The code is compatible with both Python 2 and Python 3 (although possibly not with versions of Python 2 older than 2.6), in any operating system.
- It uses Python’s multiprocessing library for the evolution of chains in parallel. The separate processes can communicate with each other through some multiprocessing utilities, which made possible the implementation of the Parallel Tempering algorithm. This method is capable of detecting and accurately sampling posterior distributions that present two or more separated peaks.
- Convergence between independent chains is tested with the multivariate version of the Gelman and Rubin test, a very robust method.
- Also, the plots are beautiful and can be customized to a certain extent directly from the command line, without having to change the code. You can view triangle plots with marginalized distributions of parameters, derived parameters, two-dimensional joint posterior distributions, autocorrelation plots, cross-correlation plots, sequence plots, convergence diagnostics and more.
Besides, of course, it can be altered to suit your needs. A few changes can be made to the code so you can constrain other models and/or use different datasets. In this case, consider cloning the repository with git so you can leverage all the version control tools and even contribute to this project.
The main changes will be in the observables module. This is where the physical observables are calculated. There is a general function for computing the Hubble rate as a function of the redshift (or of the scale factor converted to redshift), with conditionals for the different models. For some cases, a numerical integration is done with Runge-Kutta. The simplified JLA binned data only depend on distances, so for this case just properly including your model calculation in the function Eh (that is, \(h E(z) = H(z)/100\)) or in H will be sufficient. BAO calculations also depend mainly on the Hubble rate, but you should pay attention to the necessary density parameters too. The CMB shift parameters are similar. In any case, this is the place where you should introduce your calculations. You will choose a label for your model, which should be used in your .ini file. This label becomes the model attribute of your Cosmology class object. Insert the free parameters (and their priors) accordingly, and their \(\LaTeX\) representation at the end of the file, for the plots.
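As an illustration, here is a minimal sketch of the kind of branching such a function can do. The Cosmology stand-in and the params dictionary are hypothetical simplifications for this example; only the names Eh, model and the parameter labels come from this guide, and the real function in the observables module is more complete.

```python
import numpy as np

class Cosmology:
    # minimal stand-in for EPIC's Cosmology class, for illustration only
    def __init__(self, model, params):
        self.model = model
        self.params = params

def Eh(cosmology, z):
    # sketch of the branching done in the observables module:
    # returns h*E(z) = H(z)/100 for the model selected by its label
    p = cosmology.params
    if cosmology.model == 'lcdm':
        Om_h2 = p['Och2'] + p['Obh2']            # physical matter density
        Od_h2 = p['h']**2 - Om_h2 - p['Orh2']    # dark energy from flatness
        return np.sqrt(Om_h2 * (1 + z)**3 + p['Orh2'] * (1 + z)**4 + Od_h2)
    raise NotImplementedError(cosmology.model)

cosmo = Cosmology('lcdm', {'h': 0.7, 'Och2': 0.12, 'Obh2': 0.022, 'Orh2': 8e-5})
print(round(float(Eh(cosmo, 0.0)), 3))  # at z = 0, h*E(0) reduces to h
```

A new model would be another elif branch keyed on its label, returning its own \(h\,E(z)\).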
If you want to use other data, you can look at the ones provided to see the format used. There are a few different ways. Standard Gaussian likelihoods generally use the function simple, if the data contain only the measurements and their uncertainties, or the function matrixform, if there is a covariance matrix. Of course, you can create a new one if these are not adequate for your specific needs. Then you just need to tell which function the dataset uses in the dictionary allprobes at the end of the file. The load_data module contains functions to read the data files; you will have to create one there too.
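For orientation, here are sketches of what these two kinds of Gaussian likelihood typically compute. These are simplified hypothetical versions, not EPIC's actual simple and matrixform functions; with a diagonal covariance the two forms agree.

```python
import numpy as np

def simple_loglike(values, sigmas, model_pred):
    # Gaussian log-likelihood with independent uncertainties
    return -0.5 * np.sum(((values - model_pred) / sigmas)**2)

def matrixform_loglike(values, cov, model_pred):
    # Gaussian log-likelihood with a full covariance matrix
    r = values - model_pred
    return -0.5 * r @ np.linalg.solve(cov, r)

values = np.array([4.47, 8.47, 13.77])
sigmas = np.array([0.17, 0.17, 0.13])
pred = np.array([4.5, 8.4, 13.8])
print(simple_loglike(values, sigmas, pred))
print(matrixform_loglike(values, np.diag(sigmas**2), pred))  # same result
```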
Detailed explanations are given at the end of the sections about the data and the models.
Try it now!
[1] Lesgourgues J., “The Cosmic Linear Anisotropy Solving System (CLASS) I: Overview”. arXiv:1104.2932 [astro-ph.IM]; Blas D., Lesgourgues J. & Tram T., “The Cosmic Linear Anisotropy Solving System (CLASS). Part II: Approximation schemes”. JCAP 07 (2011) 034.
How to install¶
There are two ways to install this program: you can download and install it from PyPI, or you can clone it from BitBucket. Either way, it is recommended that you do it inside a virtual environment.
Setting up a virtual environment¶
If you are on Python 3.3 or later, you can run:
$ python3 -m venv vEPIC
to create a virtual Python environment inside a folder named vEPIC. Activate it with:
$ source vEPIC/bin/activate
When you finish using the environment and want to leave it, you can just use deactivate. To activate it again, which you need to do in each new session, just run the command above; you do not need to create it again. More details about Python 3’s venv can be found in the official Python documentation.
With older versions of Python, you can install pyenv and pyenv-virtualenv, which let you create a virtual environment and even choose another version of Python to install. This is done with:
$ pyenv virtualenv 3.6.1 vEPIC # or other version you like.
Then activate it running:
$ pyenv activate vEPIC # use pyenv deactivate vEPIC to deactivate it.
Your virtual environment name will appear in front of your username at the beginning of the line to indicate that the environment is activated.
Installing with pip¶
The easiest way to install this program is to get it from PyPI. It is also the most recommended, since it can be easily updated when new versions come out. Inside your virtual environment, run:
$ pip install epic-code
Then, if you are using Linux, create a bind mount from your Python path to an external directory for your convenience. For example, in your home directory, create the directories cosmology_codes/EPIC and cosmology_codes/EPIC-simulations. Run:
$ sudo mount --bind $PYTHONPATH/lib/python3.5/site-packages/EPIC/ cosmology_codes/EPIC
This will make your newly created folder reflect the contents of the installation directory and you will be able to run the scripts and access the associated files from there, rather than only importing the modules from inside the python interactive interpreter. If you do not have the privileges to mount the directory then follow the instructions below for downloading and extracting manually to your home directory.
Note
If you are on Mac OS X or macOS, you can achieve the same effect but will need to install osxfuse and bindfs to create the link:
$ brew install osxfuse # or brew cask install osxfuse
$ brew install bindfs
$ bindfs $PYTHONPATH/lib/python3.5/site-packages/EPIC/ cosmology_codes/EPIC
Next, cd into cosmology_codes/EPIC and run:
$ python define_altdir.py
and pass the full path to cosmology_codes/EPIC-simulations to define this folder as the location for saving simulation results.
You are now good to go.
To check and install updates if available, just run:
$ pip install --upgrade epic-code
On a Windows computer, if you have Windows 10 Pro, I recommend enabling the Windows Subsystem for Linux and installing Bash on Ubuntu on Windows 10, then proceeding with the instructions above for installation on Linux. If that is not possible, use the zip file as indicated below.
Downloading and extracting the tarball¶
If you would rather not install it to your system, you can just extract the tar.gz file from the Python Package Index at https://pypi.python.org/pypi/epic-code. Extract the file with:
$ tar -xzf epic-code-[version].tar.gz
Then cd into the root folder and install it to your (virtual) environment:
$ python setup.py install
This is necessary to guarantee that all modules will be loaded correctly. It also applies to the source code extracted from the zip file or another compression format. You can then run the program from the EPIC folder.
To update the program you will have to download the new tarball/zip file
and execute this process again.
Cloning the repository¶
If you plan to contribute to this program you can clone the git repository at https://bitbucket.org/rmarcondes/epic.
Introduction to MCMC¶
Users familiar with the Markov Chain Monte Carlo (MCMC) method may want to skip to the next section. The typical problem the user will want to tackle with this program is the problem of parameter estimation of a given theoretical model confronted with one or more sets of observational data. This is a very common task in Cosmology these days, especially in the light of numerous data from several surveys, with increasing quality. Important discoveries are expected to be made with the data from new-generation telescopes in the next decade.
In the following I give a very brief introduction to the MCMC technique and describe how to use the program.
The Bayes Theorem¶
Bayesian inference is based on the inversion of the data-parameters probability relation, which is the Bayes theorem [1]. This theorem states that the posterior probability \(p(\theta \mid D, \mathcal{M})\) of the parameter set \(\theta\) given the data \(D\) and other information from the model \(\mathcal{M}\) can be given by

\[p(\theta \mid D, \mathcal{M}) = \frac{\mathcal{L}(D \mid \theta, \mathcal{M}) \, \pi(\theta \mid \mathcal{M})}{p(D \mid \mathcal{M})},\]
where \(\mathcal{L}(D \mid \theta, \mathcal{M})\) is the likelihood of the data given the model parameters, \(\pi(\theta \mid \mathcal{M})\) is the prior probability, containing any information known a priori about the distribution of the parameters, and \(p(D \mid \mathcal{M})\) is the marginal likelihood, also popularly known as the evidence, giving the normalization of the posterior probability. The evidence is not required for parameter inference but is essential in problems of model selection, when comparing two or more different models to see which of them is favored by the data.
Direct evaluation of \(p(\theta \mid D, \mathcal{M})\) is generally a difficult integration in a multiparameter space that we do not know how to perform. Usually we do know how to compute the likelihood \(\mathcal{L}(D \mid \theta, \mathcal{M})\) that is assigned to the experiment (most commonly a distribution that is Gaussian on the data or the parameters), thus the use of the Bayes theorem to give the posterior probability. Flat priors are commonly assumed, which makes the computation of the right-hand side of the equation above trivial. Remember that the evidence is a normalization constant not necessary for us to learn about the most likely values of the parameters.
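These statements can be illustrated numerically with a toy one-parameter problem on a grid: with a flat prior, the posterior is just the likelihood normalized by the evidence. All numbers below are arbitrary.

```python
import numpy as np

theta = np.linspace(-5.0, 5.0, 1001)
dtheta = theta[1] - theta[0]
likelihood = np.exp(-0.5 * (theta - 1.0)**2)    # Gaussian in the parameter
prior = np.full_like(theta, 1.0 / 10.0)         # flat prior on [-5, 5]
evidence = np.sum(likelihood * prior) * dtheta  # marginal likelihood p(D|M)
posterior = likelihood * prior / evidence       # Bayes theorem on the grid
print(np.sum(posterior) * dtheta)               # integrates to 1
```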
The Metropolis-Hastings sampler¶
The MCMC method shifts the problem of calculating the unknown posterior probability distribution in the entire space, which can be extremely expensive for models with a large number of parameters, to the problem of sampling from the posterior distribution. This is possible, for example, by growing a Markov chain with new states generated by the Metropolis sampler [2].
The Markov chain has the property that every new state depends on its current state, and only on this current state. Dependence on earlier states or on some statistics involving all states is not allowed. That can be done and may even be useful for purposes like ours, but then the chain cannot be called Markovian.
The standard MCMC consists of generating a random state \(y\) according to a proposal probability \(Q({} \cdot \mid x_t)\) given the current state \(x_t\) at time \(t\). Then a random number \(u\) is drawn from a uniform distribution between 0 and 1. The new state is accepted if \(r \ge u\), where

\[r = \frac{p(y \mid D, \mathcal{M})}{p(x_t \mid D, \mathcal{M})} \frac{Q(x_t \mid y)}{Q(y \mid x_t)}.\]
The fraction is the Metropolis-Hastings ratio. When the proposal function is symmetrical, \(\frac{Q(x_t \mid y)}{Q(y \mid x_t)}\) reduces to 1 and the ratio is just the original Metropolis ratio of the posteriors. If the new state is accepted, we set \(x_{t+1} := y\), otherwise we repeat the state in the chain by setting \(x_{t+1} := x_t\).
The acceptance rate \(\alpha = \frac{\text{number of accepted states}}{\text{total number of states}}\) of a chain should be around 0.234 for optimal efficiency [3]. This can be obtained by tuning the parameters of the function \(Q\). In this implementation, I use a multivariate Gaussian distribution with a diagonal covariance matrix \(S\).
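The steps above can be sketched in a few lines for a one-dimensional toy posterior. This is a minimal illustration of the Metropolis algorithm with a symmetric Gaussian proposal, not EPIC's implementation, which uses multivariate proposals, parallel chains and adaptation.

```python
import numpy as np

def log_posterior(x):
    return -0.5 * x**2  # toy target: standard normal, up to a constant

def metropolis(n_steps, sigma_proposal=1.0, seed=0):
    rng = np.random.default_rng(seed)
    chain = [0.0]
    accepted = 0
    for _ in range(n_steps):
        x = chain[-1]
        y = rng.normal(x, sigma_proposal)                # symmetric proposal Q
        r = np.exp(log_posterior(y) - log_posterior(x))  # Metropolis ratio
        if r >= rng.uniform():                           # accept if r >= u
            chain.append(y)
            accepted += 1
        else:
            chain.append(x)                              # repeat current state
    return np.array(chain), accepted / n_steps

chain, alpha = metropolis(20000, sigma_proposal=2.4)
print(round(float(np.mean(chain)), 2), round(alpha, 2))
```

Tuning sigma_proposal changes the acceptance rate alpha; too-small steps accept almost everything but explore slowly, too-large steps are rarely accepted.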
The Parallel Tempering algorithm¶
Standard MCMC is powerful and works in most cases, but there are some problems where the user may be better off using some other method. Due to the characteristic behavior of a Markov chain, it is possible (and even likely) that a chain becomes stuck in a single mode of a multimodal distribution. If two or more peaks are far away from each other, a proposal function tuned for good performance in one peak may have difficulty escaping it to explore the others, because the jump may be too short. To overcome this inefficiency, a neat variation of MCMC, called Parallel Tempering [4], favors a better exploration of the entire parameter space in such cases thanks to an arrangement of multiple chains that are run in parallel, each one with a different “temperature” \(T\). The posterior is calculated as \(\mathcal{L}^{\beta} \pi\), with \(\beta = 1/T\). The first chain is the one that corresponds to the real-life posterior we are interested in; the other chains, at higher temperatures, have wider distributions, which makes it easier to jump between peaks, thus exploring the parameter space more properly. Periodically, a swap of states between neighboring chains is proposed and accepted or rejected according to a Hastings-like ratio.
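The swap step can be sketched as follows, under the tempering convention above (likelihood raised to \(\beta\)). The helper names are hypothetical; EPIC's multiprocessing-based implementation differs.

```python
import numpy as np

def swap_log_ratio(loglike_i, loglike_j, beta_i, beta_j):
    # log of the Hastings-like ratio for exchanging the states of chains
    # i and j, whose tempered posteriors are L^beta * pi
    return (beta_i - beta_j) * (loglike_j - loglike_i)

def try_swap(states, loglikes, betas, i, j, rng):
    # accept the swap with probability min(1, ratio)
    if np.log(rng.uniform()) < swap_log_ratio(loglikes[i], loglikes[j],
                                              betas[i], betas[j]):
        states[i], states[j] = states[j], states[i]
        loglikes[i], loglikes[j] = loglikes[j], loglikes[i]
        return True
    return False

states, loglikes, betas = [0.0, 1.0], [-5.0, -2.0], [1.0, 0.5]
rng = np.random.default_rng(0)
print(try_swap(states, loglikes, betas, 0, 1, rng))  # True: ratio favors swap
```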
[1] Hobson M. P., Jaffe A. H., Liddle A. R., Mukherjee P. & Parkinson D., “Bayesian methods in cosmology”. (Cambridge University Press, 2010).
[2] Geyer C., “Introduction to Markov Chain Monte Carlo”, in “Handbook of Markov Chain Monte Carlo”. http://www.mcmchandbook.net/
[3] Roberts G. O. & Rosenthal J. S., “Optimal scaling for various Metropolis-Hastings algorithms”. Statistical Science 16 (2001) 351-367.
[4] Gregory P. C., “Bayesian logical data analysis for the physical sciences: a comparative approach with Mathematica support”. (Cambridge University Press, 2005).
Using EPIC¶
You can use standard MCMC and Parallel-Tempering MCMC with this program. Besides, there is an experimental adaptive feature for each method that adjusts the covariance of the proposal multivariate Gaussian probability density for optimal efficiency, aiming at an acceptance rate around 0.234. In PT-MCMC, the adaptation also adjusts the temperature levels of the chains and reduces the number of necessary chains if possible.
In the following sections I guide the user through the two approaches, with typical examples for demonstration.
Before starting¶
The problem is set up through a .ini file containing the parameters, priors, model type, data files to be used and tex strings for plotting. The program creates a folder in the working directory with the same name as this .ini file, if not already existing. Another folder is created with the date and time for the output of each run of the code, but you can always continue a previous run from where it stopped by giving the folder name instead of the .ini file. The folder epic is the root of the project. The Python codes are in the EPIC source folder, where the .ini files should also be placed. The default working directory is EPIC’s parent directory, i.e., the epic repository folder.
Changing the default working directory¶
By default, the folders with the names of the .ini files are created at the repository root level. But the chains can get very long and you might want to have them stored in a different drive. In order to set a new default location for all the new files, run:
$ python define_altdir.py
This will ask for the path of the folder where you want to save all the output of the program and keep this information in a file altdir.txt. If you want to revert this change, you can delete the altdir.txt file or run the command above again and leave the answer empty when prompted.
The parameters names¶
Each cosmological parameter already implemented has a name to be used in the
code that must be respected in the .ini
file.
The table below lists some of the parameters, their definitions and their names in the
program.
Parameter | Description | Name
---|---|---
\(h\) | Reduced Hubble constant | h
\(\Omega_{b0} h^2\) | Physical baryonic density parameter | Obh2
\(\Omega_{c0} h^2\) | Physical cold dark matter density parameter | Och2
\(\Omega_{r0} h^2\) | Physical radiation density parameter | Orh2
\(\Omega_{b0}\) | Baryonic density parameter | Ob0
\(\Omega_{c0}\) | Cold dark matter density parameter | Oc0
\(\Omega_{m0}\) | Matter (baryons plus DM) density parameter | Om0
\(\Omega_{d0}\) | Dark energy density parameter | Od0
\(w_d\) | Dark energy equation-of-state parameter | w or wd
The structure of the .ini file¶
Let us work with an example: a simple flat \(\Lambda\text{CDM}\) model. Suppose we want to constrain its parameters with \(H(z)\), supernovae data, CMB shift parameters and BAO data. The model parameters are the reduced Hubble constant \(h\) and the present-day values of the physical density parameters of dark matter \(\Omega_{c0} h^2\), baryons \(\Omega_{b0} h^2\) and radiation \(\Omega_{r0} h^2\). We will not consider perturbations; we are only constraining the parameters at the background level. Since we are using supernovae data, we must include a nuisance parameter \(M\), which represents a shift in the absolute magnitudes of the supernovae. It is also possible to use the full JLA catalogue with this program. In this case, we would also have to include the nuisance parameters \(\alpha\), \(\beta\) and \(\Delta M\) from the light-curve fit.
The first section of the LCDM.ini file defines the priors and, at the same time, the names of the parameters. They must be inserted in the following form:
## PRIORS
h 0.6 0.8 4e-4
Och2 0.08 0.20 2e-4
Obh2 0.01 0.08 5e-5
Orh2 4e-5 8e-5 2e-7
M -0.3 0.3 2e-3 nuisance
## end of priors
Delimiting lines are mandatory: the line defining the beginning of the priors section must contain the word PRIORS, and the priors are read until the word end is found. The first column gives the names of the parameters in the code. The second and third columns set the flat prior intervals. The fourth column is optional: it gives the standard deviation for each parameter in the covariance of the proposal function. The values will vary according to the problem, but a first guess can be 1/10 or some other small fraction of the prior interval. Gaussian priors are also supported; in this case, one should add gaussian at the end of the line, and the second and third values will then be the Gaussian parameters \(\mu\) and \(\sigma\) instead. Notice that the nuisance parameter is signaled as such by the word nuisance at the end of the line. To remove a parameter, you can just comment out its line with the character #. If the parameter removed is necessary for the model predictions, then a default value will be assumed.
After the priors come the type of the model, needed to select the proper functions in the calculation of model predictions, and a display name, both with the identifiers MODEL and NAME. They are signaled in the .ini file in the same way as the priors, but since they are single-line entries the final delimiter is not needed:
## MODEL
lcdm
## NAME
$\Lambda$CDM
The model label could also be wcdm, for a \(w\text{CDM}\) model, of which \(\Lambda\text{CDM}\) is a special case. Omitting the dark energy equation-of-state parameter would automatically set it to -1. Notice that the use of \(\LaTeX\) input is allowed in the model name.
Next comes the list of observables to be used, with the filename or directory of the data. The list is read until an occurrence of end is found:
## DATA files
JLAsimple jla_likelihood_v4
Planck2015_distances Planck2015_LCDM
BAO BAO-6dF+SDSS+BOSS+Lyalpha+WiggleZ.txt
CosmicChrono thirtypointsHz.txt
## end of data
Finally, tex strings must be given to be used in the plots:
## TEX REPRESENTATION
h h
Och2 \Omega_{c0}h^2
Obh2 \Omega_{b0}h^2
Orh2 \Omega_{r0}h^2
Oc0 \Omega_{c0}
Ob0 \Omega_{b0}
Or0 \Omega_{r0}
Od0 \Omega_{d0}
Om0 \Omega_{m0}
M \Delta{M}
# end of tex
Here, strings for the derived parameters are also needed. The tex strings should not contain any spaces; if you need to separate a tex command from a following variable, use {} instead, as is done for the last parameter in this example. Also note that the first column needs to be consistent with the names given in the priors and in the parameters_names.txt dictionary.
To summarize, we end up with the LCDM.ini file in the EPIC folder looking like this:
## PRIORS
h 0.6 0.8 4e-4
Och2 0.08 0.20 2e-4
Obh2 0.01 0.08 5e-5
Orh2 4e-5 8e-5 2e-7
M -0.3 0.3 2e-3 nuisance
## end of priors
## MODEL
lcdm
## NAME
$\Lambda$CDM
## DATA files
JLAsimple jla_likelihood_v4
Planck2015_distances Planck2015_LCDM
BAO BAO-6dF+SDSS+BOSS+Lyalpha+WiggleZ.txt
CosmicChrono thirtypointsHz.txt
## end of data
## TEX REPRESENTATION
h h
Och2 \Omega_{c0}h^2
Obh2 \Omega_{b0}h^2
Orh2 \Omega_{r0}h^2
Oc0 \Omega_{c0}
Ob0 \Omega_{b0}
Or0 \Omega_{r0}
Od0 \Omega_{d0}
Om0 \Omega_{m0}
M \Delta{M}
# end of tex
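To make the format concrete, here is a minimal sketch of how the PRIORS block above could be parsed. This is illustrative only, not EPIC's parser: it handles flat priors and the nuisance flag, and omits the gaussian flag described above for brevity.

```python
SAMPLE = """\
## PRIORS
h 0.6 0.8 4e-4
Och2 0.08 0.20 2e-4
M -0.3 0.3 2e-3 nuisance
## end of priors
"""

def parse_priors(text):
    priors, reading = {}, False
    for line in text.splitlines():
        if line.startswith('#'):          # delimiters and commented-out lines
            if 'PRIORS' in line:
                reading = True
            elif reading and 'end' in line:
                break
            continue
        if reading and line.strip():
            name, lo, hi, sigma, *flags = line.split()
            priors[name] = {'interval': (float(lo), float(hi)),
                            'proposal_sigma': float(sigma),
                            'nuisance': 'nuisance' in flags}
    return priors

priors = parse_priors(SAMPLE)
print(priors['M']['nuisance'])  # True
```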
The datasets¶
We list and detail below the datasets already implemented and give instructions on how to include other data.
Type Ia supernovae¶
Two types of analyses can be made with the JLA catalogue.
Here we are using the binned data consisting of distance modulus estimates at 31 points (defining 30 bins of redshift). This dataset is selected via the JLAsimple entry in the list of data sources.
If you want to use the full dataset (which makes the analysis much slower, since it involves three more nuisance parameters and requires the program to invert a 740 by 740 matrix at every iteration for the calculation of the JLA likelihood), insert SNeJLA instead. You also need to download the full data from http://supernovae.in2p3.fr/sdss_snls_jla/ReadMe.html or from my copy on Google Drive. Since the data files are too big, EPIC includes only the jla_likelihood_v4/data folder; in this case you will also need the jla_likelihood_v4/covmat folder. Make sure to properly include it in your installation directory. Either way, the argument is the same data folder jla_likelihood_v4.
Note that the binned dataset introduces one nuisance parameter, \(M\), representing an overall shift in the absolute magnitudes, while the full dataset introduces three more nuisance parameters related to the light-curve parametrization. See Betoule et al. (2014) [1] for more details.
Note
This version of EPIC supports version V4 (June 2014) of the dataset release. The current version is V6 (March 2015). An update implementing V6 is being considered for the near future.
CMB distance priors¶
Constraining models with temperature or polarization anisotropy amplitudes is not currently implemented. However, you can include the CMB distance priors from Planck 2015 with Planck2015_distances. These data consist of the acoustic scale \(l_A\), the shift parameter \(R\) and the physical baryon density \(\Omega_{b0}h^2\). You can choose between the data for the \(\Lambda\text{CDM}\), \(w\text{CDM}\) and \(\Lambda\text{CDM} + \Omega_k\) models [2] by specifying the folders Planck2015_LCDM, Planck2015_wCDM or Planck2015_LCDM+Omega_k, respectively.
BAO data¶
Measurements of the baryon acoustic scales from the Six Degree Field Galaxy Survey (6dF), the Main Galaxy Sample of Data Release 7 of the Sloan Digital Sky Survey (SDSS-MGS), the LOWZ and CMASS galaxy samples of the Baryon Oscillation Spectroscopic Survey (BOSS-LOWZ and BOSS-CMASS), the WiggleZ Dark Energy Survey and the distribution of the Lyman \(\alpha\) forest in BOSS (BOSS-Ly\(\alpha\)) are compiled in the file BAO-6dF+SDSS+BOSS+Lyalpha+WiggleZ.txt. The subsamples BAO-6dF+SDSS+BOSS+Lyalpha.txt and BAO-6dF.txt are also available, the first excluding the WiggleZ data and the second containing only the 6dF data.
Since these files simply contain, for each measurement, the redshift, the value of the characteristic ratio \(r_{\text{BAO}}(z) \equiv r_s(z_d)/d_V(z)\) between the sound horizon \(r_s\) at decoupling time (\(z_d\)) and the effective BAO distance \(d_V\), and the measurement error, it is really simple to remove or add new measurements of this observable.
\(H(z)\) data¶
These are the cosmic chronometer data: 30 measurements of the Hubble expansion rate \(H(z)\) at redshifts between 0 and 2, plus a local measurement of \(H_0\) with \(2.4\%\) precision. The values of redshift, \(H\) and the uncertainties are given in the file thirtypointsHz.txt.
\(f\sigma_8\) data¶
Large-scale structure data from redshift-space distortion and peculiar velocity measurements giving the growth rate times the RMS amplitude of matter perturbations \(f \sigma_8(z)\) can also be used. Notice that in this case you need to provide an analytic evaluation of the growth rate (for example, as \(f = \Omega_m^{\gamma}\)) in your model since this program does not evolve perturbations numerically at the present moment. See for example the implementation in Marcondes et al. (2016) [3].
Including other data¶
You can include other data if you want. In the simplest case, you will have measurement values and error bars for a certain quantity, say \(f(z)\), at given points. Put them in a text file in three columns separated by tabs or spaces, as in, for example, the SDSS+BOSS.txt file:
0.15 4.47 0.17
0.32 8.47 0.17
0.57 13.77 0.13
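A whitespace-separated three-column file like this can be read directly with numpy; this is just an illustration of the format (EPIC's own reader lives in the load_data module):

```python
import io
import numpy as np

# stand-in for the contents of a data file like the one above
text = "0.15 4.47 0.17\n0.32 8.47 0.17\n0.57 13.77 0.13\n"
z, values, sigmas = np.loadtxt(io.StringIO(text), unpack=True)
print(z.tolist())  # [0.15, 0.32, 0.57]
```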
Choose a label for this dataset and put it in the .ini file, in the data files section:
my_data my_data_file.txt
Next, you need to tell the code how to read this new type of dataset. In this simple case you add a function like:

def my_data(obsble):
    return simplest_data(obsble, 'my_data', r'$f(z)$')

to the load_data module. The name of this function must be the same as the label that goes in the .ini file. This function, or its more complicated equivalent, is the only change that needs to be made in the load_data.py file.
In the simplest_data function call, the obsble variable carries the dataset file information (the filename or directory), and the second argument is a string displayed to indicate that your data is loaded. The third argument is a raw string, possibly including \(\LaTeX\) notation, to be displayed in the triangle plots, compounding the label that describes the datasets used in the analysis when you plot together results from more than one analysis.
You then need to define the likelihood calculation in the likelihood module. At the end of the file, add to the dictionary allprobes an entry with the key 'my_data' (again the same label) and the likelihood function, say my_data_likelihood, as its value. For a Gaussian likelihood, this function can be:

def my_data_likelihood(datapoints, cosmology):
    return simple(datapoints, observables.my_f, cosmology)
Now, as you may have already figured out, the next step is to add the theoretical calculation my_f of your observable \(f\) to the observables.py file. Of course, this will vary with each case. Do not forget to use the model attribute of the Cosmology class object named cosmology to set different calculations according to each cosmological model you might want to use.
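The pieces of this walkthrough fit together roughly as sketched below. Everything here is a hypothetical simplification: the Cosmology stand-in, the placeholder prediction and the simple signature are for illustration; only the names my_f, simple, my_data_likelihood, allprobes and the model attribute come from the text above.

```python
import numpy as np

class Cosmology:
    def __init__(self, model):
        self.model = model

def my_f(z, cosmology):
    # theoretical prediction of the observable f(z), branching on the label
    if cosmology.model == 'lcdm':
        return 1.0 + z                    # placeholder prediction
    raise NotImplementedError(cosmology.model)

def simple(datapoints, prediction, cosmology):
    # Gaussian log-likelihood against rows of (z, value, sigma)
    z, values, sigmas = np.asarray(datapoints).T
    return -0.5 * np.sum(((values - prediction(z, cosmology)) / sigmas)**2)

def my_data_likelihood(datapoints, cosmology):
    return simple(datapoints, my_f, cosmology)

allprobes = {'my_data': my_data_likelihood}   # key matches the .ini label

data = [(0.1, 1.1, 0.05), (0.5, 1.5, 0.05)]
print(allprobes['my_data'](data, Cosmology('lcdm')))  # 0 for a perfect fit
```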
[1] Betoule M. et al., “Improved cosmological constraints from a joint analysis of the SDSS-II and SNLS supernova samples”. Astronomy & Astrophysics 568, A22 (2014).
[2] Huang Q.-G., Wang K. & Wang S., “Distance priors from Planck 2015 data”. Journal of Cosmology and Astroparticle Physics 1512, 022 (2015).
[3] Marcondes R. J. F., Landim R. C. G., Costa A. A., Wang B. & Abdalla E., “Analytic study of the effect of dark energy-dark matter interaction on the growth of structures”. Journal of Cosmology and Astroparticle Physics 1612, 009 (2016).
The cosmological models¶
A few models are already implemented. I give a brief description below, with references to works that discuss some of them in detail and to works that analyzed them with this code.
There are some models for which we have analytical solutions for the evolution of the energy densities and can write the Hubble rate in closed form: this is done via the function \(h\,E(z)\) (Eh in the module observables), from which \(H(z) = 100\,h\,E(z)\) is obtained in the function H of the same module. Generally, this is implemented using the physical densities \(\Omega_{i0} h^2\) and \(h\) as parameters. For other models, a numerical integration is performed with fourth-order Runge-Kutta. Since this integration is done backwards in time from \(a = 1\), it is easier to deal with the density parameters \(\Omega_{i0}\) and \(H_0\). The result is an \(H(z)\) function that can be interpolated for any redshift within the interval of integration. This is registered in a Cosmology class object as the attribute H_solution, from which \(h\,E(z)\) can also be obtained when convenient.
The code is written so that the integration is done only once given a set of
parameters and model.
In all cases the dark energy density parameter is obtained, as a derived
parameter, from the flatness condition that all the density parameters must add up to 1.
At the end of this section, I instruct the user on how to include a new model
for use with this program.
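To make the numerical route concrete, here is a toy sketch, not EPIC's code, that integrates the dark energy density of a \(w\text{CDM}\) model backwards from \(a = 1\) with a hand-rolled fourth-order Runge-Kutta step and builds an interpolable \(H(z)\). The \(w\text{CDM}\) case actually has a closed form, which makes the result easy to check.

```python
import numpy as np

H0, Om0, w = 70.0, 0.3, -0.9
Od0 = 1.0 - Om0                          # flatness condition

def drho_dlna(rho):                      # d(rho_d/rho_crit0)/d(ln a)
    return -3.0 * (1.0 + w) * rho

n, lna_end = 400, -np.log(1 + 3.0)       # from a = 1 down to z = 3
h_step = lna_end / n                     # negative: backwards in time
rho = [Od0]
for _ in range(n):                       # classic RK4 steps
    y = rho[-1]
    k1 = drho_dlna(y)
    k2 = drho_dlna(y + 0.5 * h_step * k1)
    k3 = drho_dlna(y + 0.5 * h_step * k2)
    k4 = drho_dlna(y + h_step * k3)
    rho.append(y + h_step * (k1 + 2*k2 + 2*k3 + k4) / 6.0)

z = np.exp(-h_step * np.arange(n + 1)) - 1.0
H = H0 * np.sqrt(Om0 * (1 + z)**3 + np.array(rho))
# H can now be interpolated in z (the role of the H_solution attribute)
print(round(float(np.interp(1.0, z, H)), 2))
```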
The \(\Lambda\text{CDM}\) model¶
The standard model accepts the parameters \(h\), \(\Omega_{c0} h^2\), \(\Omega_{b0} h^2\), \(\Omega_{r0} h^2\), and has \(\Omega_{c0}\), \(\Omega_{b0}\), \(\Omega_{r0}\), \(\Omega_{d0}\) as derived parameters. Extending this model to allow curvature is not completely supported yet. The Friedmann equation is

\[H(z)^2 = H_0^2 \left[ \left( \Omega_{b0} + \Omega_{c0} \right) (1+z)^3 + \Omega_{r0} (1+z)^4 + \Omega_{d0} \right],\]

or, in terms of the physical densities,

\[h^2 E(z)^2 = \left( \Omega_{b0} h^2 + \Omega_{c0} h^2 \right) (1+z)^3 + \Omega_{r0} h^2 (1+z)^4 + \Omega_{d0} h^2,\]

with \(\Omega_{d0} = 1 - \Omega_{b0} - \Omega_{c0} - \Omega_{r0}\) and \(H_0 = 100 \, h \,\, \text{km s$^{-1}$ Mpc$^{-1}$}\).
This model is identified in the code by the label lcdm.
The \(w\text{CDM}\) model¶
Identified by wcdm, this is like the standard model except that the dark energy equation of state can be any constant \(w_d\), thus having the \(\Lambda\text{CDM}\) model as a special case with \(w_d = -1\). The Friedmann equation is like the one above, but with the dark energy contribution multiplied by \((1+z)^{3(1+w_d)}\).
Interacting Dark Energy models¶
A comprehensive review of models that consider a possible interaction between dark energy and dark matter is given by Wang et al. [2]. In interacting models, the individual conservation equations of the two dark fluids are violated, although still preserving the total energy conservation:

\[\dot{\rho}_c + 3 H \rho_c = Q, \qquad \dot{\rho}_d + 3 H (1 + w_d) \rho_d = -Q.\]
The shape of \(Q\) is what characterizes each model. Common forms are proportional to \(\rho_c\), to \(\rho_d\) or to some combination of both.
Proportional to \(\rho_d\)¶
This model uses \(Q = 3 H \xi \rho_d\). We were interested in studying the growth rate of matter perturbations in such a model in an analytic way in a previous work [3]. Some assumptions were made and the growth rate was approximated as \(f \approx \Omega_m^{\gamma}\), with \(\gamma\) determined in terms of the coupling constant \(\xi\), a free parameter of the model. Other parameters involved are \(\sigma_{8,0}\) and \(\Omega_{d0}\), as well as \(w_0\) and \(w_1\) from the parametrization of the dark energy equation of state.
A recurrence relation is used to approximate the evolution of densities.
This implementation, dubbed model2 in the code, is meant to be used with the myfsigma8 or growth_rate_sigma_eight functions only, to have its predictions compared with the \(f\sigma_8(z)\) data.
Interacting dark energy in the dark \(SU(2)_R\) model¶
As per the recent paper by Landim et al. [4], this model proposes an interaction between possibly more than two components of a dark sector, arising from the decay of the heaviest particle (\(\varphi^{+}\)) of the dark energy doublet (\(\varphi^0, \varphi^{+}\)). The other particles in the sector are a dark matter candidate \(\psi\) and a dark neutrino \(\nu_d\).
The right-hand sides of the conservation equations of these particles are obtained in terms of a characteristic decay time scale \(\Gamma\) and mass ratios with respect to the heaviest particle.
The model is implemented in two flavours: the more general darkSU2 and the specific case with the neutrino mass set to zero, darkSU2wnu0.
The solution to the full set of background density evolutions is obtained
numerically.
Fast-varying dark energy equation-of-state models¶
Models of dark energy with a fast-varying equation-of-state parameter have been studied in several works [1]. Three such models were implemented as described in Marcondes and Pan (2017) [5], where we used this code. They have all the density parameters present in the \(\Lambda\text{CDM}\) model besides the dark energy parameters that we describe in the following.
Model 1¶
This model, fastvarying1, has the free parameters \(w_p\), \(w_f\), \(a_t\) and \(\tau\) characterizing the equation of state.
\(w_p\) and \(w_f\) are the asymptotic values of \(w_d\) in the past (\(a \to 0\)) and in the future (\(a \to \infty\)), respectively; \(a_t\) is the scale factor at the transition epoch and \(\tau\) is the transition width. The Friedmann equation is
where
Model 2¶
This model, fastvarying2, alters the previous model to allow the dark energy to feature an extremum value of the equation of state:
where \(w_0\) is the current value of the equation of state and the other parameters have the interpretation as in the previous model. The Friedmann equation is
with
Model 3¶
Finally, we have a third model, fastvarying3, with the same parameters as in Model 2 but with equation of state
It has a Friedmann equation identical to Model 2’s except that \(f_2(a)\) is replaced by
Including a new model¶
To define a new model in the program, you will need to implement your model's calculation of the observables you want to use. Choose a label for the model and use conditionals on the cosmology.model attribute.
In your .ini file, use the same label to describe this new model in the corresponding section, and do not forget to give \(\LaTeX\) representations for your parameters.
If there are derived parameters that you want to get, include their recipes in derived.py and derived_bf.py. The first one handles the Markov chains, while the second calculates the best-fit estimate point.
No other changes are required throughout the code.
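Schematically, the conditionals on the cosmology.model attribute could look like the following (a hypothetical sketch: the Cosmology class shape, parameter names and formulas here are illustrative, not EPIC's actual code):

```python
class Cosmology:
    """Minimal stand-in for a cosmology object carrying a model label."""
    def __init__(self, model, params):
        self.model = model    # the same label used in the .ini file
        self.params = params  # dict of parameter values

def normalized_hubble(z, cosmology):
    """Dispatch the calculation of E(z) = H(z)/H0 on the model label."""
    p = cosmology.params
    x = 1.0 + z
    if cosmology.model == 'lcdm':
        return (p['Om0'] * x**3 + (1.0 - p['Om0'])) ** 0.5
    elif cosmology.model == 'wcdm':
        Od0 = 1.0 - p['Om0']
        return (p['Om0'] * x**3 + Od0 * x**(3 * (1 + p['wd']))) ** 0.5
    # elif cosmology.model == 'mymodel':  # your new model's branch goes here
    else:
        raise ValueError('Unknown model label: %s' % cosmology.model)
```

As a sanity check, the wcdm branch reduces to the lcdm one when \(w_d = -1\).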
[1] | Corasaniti P. S. & Copeland E. J., “Constraining the quintessence equation of state with SnIa data and CMB peaks”. Physical Review D 65 (2002) 043004; Bassett B. A., Kunz M., Silk J., “A late-time transition in the cosmic dark energy?”. Monthly Notices of the Royal Astronomical Society 336 (2002) 1217-1222; De Felice A., Nesseris S., Tsujikawa S., “Observational constraints on dark energy with a fast varying equation of state”. Journal of Cosmology and Astroparticle Physics 1205, 029 (2012). |
[2] | Wang B., Abdalla E., Atrio-Barandela F., Pavón D., “Dark matter and dark energy interactions: theoretical challenges, cosmological implications and observational signatures”. Reports on Progress in Physics 79 (2016) 096901. |
[3] | Marcondes R. J. F., Landim R. C. G., Costa A. A., Wang B. and Abdalla E., “Analytic study of the effect of dark energy-dark matter interaction on the growth of structures”. Journal of Cosmology and Astroparticle Physics 1612, 009 (2016). |
[4] | Landim R. C. G., Marcondes R. J. F., Bernardi F. F. and Abdalla E., “Interacting dark energy in the dark SU(2)R model”. arXiv:1711.07282 [astro-ph.CO] |
[5] | Marcondes R. J. F. & Pan S., “Cosmic chronometer constraints on some fast-varying dark energy equations of state”. arXiv:1711.06157 [astro-ph.CO]. |
Running MCMC¶
This is the vanilla Markov Chain Monte Carlo with the Metropolis algorithm, as introduced in the previous section, The Metropolis-Hastings sampler.
We will now proceed to run MCMC for the \(\Lambda\text{CDM}\) model.
All the following commands must be issued from your terminal at the EPIC
directory.
Initializing the chains¶
Standard MCMC is the default option for sampling. The chains are initialized with:
python initialize_chains.py LCDM.ini
We already know the .ini file, a configuration file containing all the information for that specific simulation. It defines the parameter names and priors, their TeX strings and the type of cosmological model, which will determine how to calculate distances and the observables, and it sets a label for the model.
The code will ask for the number of chains to be used. It must be at least 2. You can also run the command above with chains=8 as an argument to suppress this prompt.
Each chain will create an independent process with Python's multiprocessing, so you should not choose a number higher than the number of CPUs of your machine, given by multiprocessing.cpu_count().
This command will create a folder with the files that will store the states of the chains, named reducedchain1.txt, reducedchain2.txt, etc., besides other relevant files.
The sampling mode (MCMC or PT) is prepended to the name of the folder, which is always the date-time of creation (in UTC), unless the option cfullname=MyFirstRun (c standing for custom) is used, in which case MCMC-MyFirstRun will be the folder name.
A custom label can also be prepended with cname=my_label, creating, for example, the folder my_label-MCMC-171110-173000. It will be stored within another folder with the same name as the .ini file, thus in this case LCDM/my_label-MCMC-170704-201125.
The folder name will be displayed in the first line of the output.
Alternatively, this first command can be skipped by running the main file epic.py directly, as I explain next. The main file can distinguish whether the first argument is a .ini file or a directory. In the first case, it will call initialize_chains.py with the same arguments.
In the second case, if the path given corresponds to an MCMC or PT directory (signaled by the mode.txt file), either just created with initialize_chains.py or from any previously initiated simulation, it will continue to evolve its chains from where they stopped.
Sampling the posterior distribution¶
The magic starts with:
python epic.py <FULL-PATH-TO>/LCDM/MCMC-MyFirstRun/
or simply:
python epic.py LCDM.ini
if you have not initialized the chains yet with the previous command or want to start a new run. Two parameters of the code will be prompted:
Please specify the number of steps in each passage in the MCMC loop: 10000
Please specify the tolerance for convergence: 1e-2
but these can also be informed directly when executing epic.py, in the form of the arguments steps=3000 and tol=1e-2.
This number of steps means that the chains will be written to the disk (the new states are appended to the chain files) after every steps states in all chains. A large number prevents frequent writing operations, which could otherwise affect overall performance unnecessarily.
The relevant information will be displayed, in our example case looking similar to the following:
Loading INI file...
Reading simplified JLA dataset...
jla_likelihood_v4
Reading distance priors from Planck 2015...
Planck2015_LCDM
Reading BAO (6dF+SDSS+BOSS+Lyalpha+WiggleZ) data...
BAO-6dF+SDSS+BOSS+Lyalpha+WiggleZ.txt
Reading cosmic chronometer H(z) data...
thirtypointsHz.txt
File <FULL-PATH-TO>/LCDM/MCMC-MyFirstRun/
and the MCMC will start.
In the MCMC mode, the code will periodically check for convergence according to the Gelman-Rubin method (by default this is done every two hours, but a different interval can be specified in the arguments as 12h or 45min, for example). Ideally this does not need to be a small time interval, but the option of specifying this time in minutes or even in seconds (30sec) is implemented and available for testing purposes.
The MCMC run stops if convergence is achieved with a tolerance smaller than tol.
The following is the output of our example after the MCMC has started. The 10000 steps take a bit less than twelve minutes on my workstation running 8 chains in parallel. The number of chains will not have much impact on this unless we use too many steps per iteration and are close to the limit of memory. After approximately two hours, convergence is checked. Since it is still larger than our required tol, the code continues with new iterations for two more hours before checking convergence again, and so on. When convergence smaller than tol is achieved, the code makes the relevant plots and quits.
Initiating MCMC...
i 1, 10000 steps, 8 ch; 11m48s, Mon Oct 16 19:09:09 2017. Next: ~1h48m.
i 2, 20000 steps, 8 ch; 11m50s, Mon Oct 16 19:21:00 2017. Next: ~1h36m.
i 3, 30000 steps, 8 ch; 11m51s, Mon Oct 16 19:32:51 2017. Next: ~1h24m.
i 4, 40000 steps, 8 ch; 11m50s, Mon Oct 16 19:44:42 2017. Next: ~1h12m.
i 5, 50000 steps, 8 ch; 11m49s, Mon Oct 16 19:56:31 2017. Next: ~1h0m.
i 6, 60000 steps, 8 ch; 11m51s, Mon Oct 16 20:08:23 2017. Next: ~48m58s.
i 7, 70000 steps, 8 ch; 11m50s, Mon Oct 16 20:20:14 2017. Next: ~37m7s.
i 8, 80000 steps, 8 ch; 11m50s, Mon Oct 16 20:32:04 2017. Next: ~25m16s.
i 9, 90000 steps, 8 ch; 11m52s, Mon Oct 16 20:43:57 2017. Next: ~13m24s.
i 10, 100000 steps, 8 ch; 11m53s, Mon Oct 16 20:55:50 2017. Checking now...
Loading chains... [##########] 8/8
n = 100000
Saving information for histograms... [##########] 100%
Monitoring convergence... [##########] 100%
Total time for data analysis: 14.5 seconds.
After 1h58m and 100000 steps, with 8 chains,
Convergence 3.73e-01 achieved. Current time: Mon Oct 16 20:56:04 2017.
i 11, 110000 steps, 8 ch; 11m49s, Mon Oct 16 21:07:54 2017. Next: ~1h48m.
...
At any time after the first iteration, the user can inspect the acceptance ratios of the chains. This information is updated at every iteration and can be found in the file LCDM/MCMC-MyFirstRun/llc.txt.
Adaptation (beta)¶
In development. Come back later to check!
Analyzing the chains¶
The code reads the chains and compiles the distributions for a nice plot with the command:
python analyze.py <FULL-OR-RELATIVE-PATH-TO>/LCDM/MCMC-MyFirstRun/
Histograms are generated for the marginalized distributions using 20 bins, or any other number given with combobins=40, for example, if you think you have sufficient data.
At any time one can run this command with calculate_R to check the state of convergence. By default, it will calculate \(\hat R^p - 1\) at twenty different lengths along the chains to provide an idea of the evolution of the convergence.
Convergence is assessed with the Gelman-Rubin criterion. I will not enter into the details of the method here, but I refer the reader to the original papers [1] [2] for more information. The variation of the original method for multivariate distributions is implemented.
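For reference, a bare-bones single-parameter version of the statistic (not EPIC's multivariate \(\hat R^p\)) can be written as:

```python
import numpy as np

def gelman_rubin(chains):
    """chains: (m, n) array, m chains of n states each, for one parameter.
    Returns the potential scale reduction factor R-hat."""
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * means.var(ddof=1)               # between-chain variance
    V = (n - 1) / n * W + B / n             # pooled posterior variance estimate
    return np.sqrt(V / W)
```

Chains sampling the same distribution give \(\hat R\) close to 1, which is why the run stops once \(\hat R^p - 1\) drops below tol.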
When using MCMC, all the chains are checked for convergence, and the final distribution that is analyzed and plotted is the concatenation of all the chains, since they are essentially all the same once they have converged.
Additional options¶
If for some reason you want to view the results at an intermediate point of the simulation, you can tell the script to stop_at=18000; everything will be analyzed up to that point.
Depending on the model being analyzed, one may be interested in seeing the derived parameters. Once these are defined appropriately in the files derived.py and derived_bf.py, include them in the analysis with the option use_derived.
If you want to check the random walk of the chains, you can plot the sequences with plot_sequences. This will make a grid plot containing all the chains and all parameters. Keep in mind that this can take some time and generate a big output file if the chains are very long. You can get around this problem by thinning the distributions by some factor, with thin=10.
This also applies to the calculation of the correlations of the parameters in each chain, enabled with the ACF option.
We generally represent distributions by their histograms, but sometimes we may prefer to exhibit smooth curves. Although it is possible to choose a higher number of histogram bins, this may not be sufficient and may require much more data.
Much better (although possibly slow when there are too many states) are the kernel density estimates (KDE) tuned for Gaussian-like distributions [3]. To obtain smoothed shapes, use smooth_hist=kde. This will compute KDE curves for the marginalized parameter distributions and also for the two-parameter joint posterior probabilities. If it is taking too long, try it for the 1D distributions only, using kde1 instead. The option kde2, for the 2D distributions only, is also available for completeness.
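To fix ideas, a toy one-dimensional Gaussian KDE with a fixed rule-of-thumb bandwidth looks like this (the method of [3] used here is adaptive and more sophisticated; this sketch is only illustrative):

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Smooth a 1D set of MCMC samples with Gaussian kernels,
    using Silverman's rule-of-thumb bandwidth."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    bw = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)   # rule-of-thumb bandwidth
    u = (grid[:, None] - samples[None, :]) / bw       # (grid, samples) distances
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))
```

The returned curve is a proper density estimate: it integrates to approximately 1 over a grid wide enough to cover the samples.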
Making the triangle plots¶
The analyze.py routine will also produce the plots automatically (unless suppressed with dontdraw), but you can always redraw everything later, perhaps to tweak some colors. Loading the chains again is not necessary, since the analysis already saves the information for the plots anyway.
So, let's say that we already saved this information for the chains after 20000 steps. Now we must pass the path to the folder containing the results for that specific chain size and number of bins:
python drawhistograms.py <FULL-OR-RELATIVE-PATH-TO>/LCDM/MCMC-MyFirstRun/n20000/results20/
You can choose to view the best-fit point with bf=+ or any other marker. To change the color, use singlecolor=m (for magenta) or any other Python color name. The default is C0 (a shade of blue, from the default Matplotlib palette).
The \(1\sigma\) and \(2\sigma\) confidence levels are shown and are written to the file hist_table.tex inside the results20 folder. This is a \(\LaTeX\)-ready file that can be compiled to make a nice table in a PDF file or included in your paper as you want.
To view and save information for more levels, use the argument levels=1,2,3, for example.
If you have used the option smooth_hist with either the kde, kde1 or kde2 option, you need to specify it again in order to make the corresponding plots; otherwise the histograms will be drawn.
Additional options¶
You can further tweak your plots and tables. usetex will make the program render the plot using \(\LaTeX\), fitting nicely with the rest of your paper. You can even choose the Times New Roman font if you prefer it over the default Computer Modern, using font=Times.
If you have nuisance parameters, you can opt not to show them with nonuisance. fmt=3 can be used to set the number of significant figures used to report the results (default is 5).
You can plot Gaussian fits together with the histograms (the Gaussian curve and the first two sigma levels), using show_gaussian_fit, but probably only for the sake of comparison, since this is not usually done in publications.
All these options can be given to the analyze.py command; they will be passed to drawhistograms.py accordingly.
Below we see the triangle plot of the histograms, with the default settings, in comparison with a polished version using the smoothed distributions, the Python color C9, the \(\LaTeX\) renderer, the best-fit point and the \(3\sigma\) confidence level. By default, the prior intervals are used as axes limits, with ticks at predefined positions, but both images below override this setting, reading custom intervals and ticks given in the files LCDM.range and LCDM.ticks.
With use_derived, we can also check the marginalized distributions of the derived parameters in the der_pars directory.
Combining two or more simulations in one plot¶
You just need to run drawhistograms.py with two or more paths in the arguments. To illustrate this, I ran a second MCMC simulation for the same model, but this time without including the distance priors data. It is then interesting to plot both realizations together so we can see the effect that including that dataset has on the results:
python drawhistograms.py \
<FULL-OR-RELATIVE-PATH-TO>/LCDM-no-Planck/MCMC-MySecondRun/n32500/results20/ \
<FULL-OR-RELATIVE-PATH-TO>/LCDM/MCMC-MyFirstRun/n1300000/results20/ \
colors=C3,k smooth_hist=kde units=Orh2:5,Obh2:2 \
range=h:0.66_0.75,Obh2:0.01_0.056,Och2:0.08_0.16,M:-0.12_0.18 \
ticks=Och2:0.09_0.12_0.15,Obh2:1.8e-2_3.3e-2_4.8e-2,Orh2:4.7e-5_6e-5_7.3e-5 \
usetex font=Times gname=noplanck
Notice how crucial the CMB data is to resolve the degeneracy between the baryon and radiation densities when both are free parameters. The constraints are improved considerably.
You can combine as many results as you like. When this is done, any custom ranges or tick positions defined in .range and .ticks files will be ignored (as well as the factor in the .units file), since different simulations might give considerably different results that could fall outside the ranges of one of them. In this case, after checking the results and determining the ranges that you want to plot, pass this information in the command line as in the example above.
The gname option specifies the prefix for the name of the PDF file that will be generated. If you omit this option, you will be prompted to type it. All other settings are optional.
A legend will be included in the top right corner using the labels defined in the labels.txt files: the lines of these files are combined with a plus sign, but you can use whatever label you prefer by replacing the contents of this file, for example, giving a name in a single line.
The legend title uses the model name of the first simulation in the arguments. This is intended for showing, at the same time, results from different datasets with the same model. If this is not your case and you would like to use a different title, change the name entry in the .ini file of the corresponding simulation accordingly.
Visualizing chain sequences and convergence¶
If you include the argument plot_sequences, you will get a plot like this:
When monitoring convergence, the values of \(\hat{R}^p - 1\) at twenty different lengths for the multivariate analysis and for separate one-dimensional analyses for each parameter are plotted in the files monitor_convergence_<N>.pdf and monitor_each_parameter_<N>.pdf, where N is the total utilized length of the chains. The absolute values of the between-chains and within-chains variances, \(\hat{V}\) and \(W\), are also shown.
For our \(\Lambda\text{CDM}\) example, we got
Finally, the correlation of the chains can also be inspected if we use the option ACF. This includes all the cross- and auto-correlations.
[1] | Gelman A & Rubin D. B. “Inference from Iterative Simulation Using Multiple Sequences”. Statistical Science 7 (1992) 457-472. |
[2] | Brooks S. P. & Gelman A. “General Methods for Monitoring Convergence of Iterative Simulations”. Journal of Computational and Graphical Statistics 7 (1998) 434. |
[3] | Kristan M., Leonardis A. and Skocaj D. “Multivariate online kernel density estimation with Gaussian kernels”. Pattern Recognit 44 (2011) 2630-2642. |
Running PT-MCMC¶
Alternatively, we can use the Parallel Tempering algorithm. This is especially useful when the posterior distribution is likely to present more than one peak and these peaks are well separated. In such cases, the standard MCMC sampler is not efficient and may even fail to detect peaks away from where the chain began. With Parallel Tempering, the ground-temperature chain that samples the true posterior distribution is more likely to jump between separate peaks, visiting the entire parameter space more efficiently. This is possible thanks to a mechanism providing state exchange between chains following a temperature ladder, in which the posterior is progressively flattened so as to reduce the differences in probability density amplitude that would otherwise keep the chains locally stuck.
In order to illustrate the potential of this method, we will run a PT-MCMC on
an artificial distribution. A test model is created with a likelihood given as
a combination of various univariate Gaussian distributions. We will see in the
following how we are able to recover the exact marginalized posterior
distributions.
The function reserved for this is disttest in the likelihood.py file.
In our example, this model has three free parameters, \(a, b, c\), all with
flat priors between 0 and 1. The likelihood consists of the Gaussian functions
\(N_a(0.5, 0.01)\), \(N_b(0.2, 0.03)\) and two Gaussian functions for
the third parameter, \(N_c(0.3, 0.02)\) and \(N_c(0.75, 0.02)\),
giving two distant peaks several sigmas apart, with respective weights
\(0.7, 0.3\).
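The test likelihood described above can be reproduced in a few lines (an illustrative stand-in for the disttest function, not its actual source):

```python
import math

# (weight, mu, sigma) mixture components for each parameter, from the text above
COMPONENTS = {
    'a': [(1.0, 0.5, 0.01)],
    'b': [(1.0, 0.2, 0.03)],
    'c': [(0.7, 0.3, 0.02), (0.3, 0.75, 0.02)],  # two well-separated peaks
}

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood(a, b, c):
    """Product over parameters of their (mixture of) Gaussian densities."""
    values = {'a': a, 'b': b, 'c': c}
    like = 1.0
    for par, comps in COMPONENTS.items():
        like *= sum(w * gaussian(values[par], mu, s) for w, mu, s in comps)
    return like
```

Because the first peak in \(c\) carries weight 0.7 against 0.3 for the second, the density at \(c = 0.3\) is higher than at \(c = 0.75\), and both dwarf the valley between them.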
Initializing the chains¶
This works pretty much in the same way as the previous method. Just run:
python initialize_chains.py disttest.ini chains=4 mode=PT
mode=PT is mandatory, to specify that you want to use PT and not the default sampler, standard MCMC.
You will be prompted for the number of steps after which the algorithm should propose a state swap:
Please specify after how many steps should propose a swap: 30
or you can bypass the prompt with nswap=30.
At every step a random number \(u\) between 0 and 1 is generated and a
swap will be proposed if \(u < 1/n_s\).
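The swap step between two chains can be sketched like this (a generic Parallel Tempering swap rule with the standard Metropolis acceptance for tempered posteriors; the function and variable names are illustrative, not EPIC's exact code):

```python
import math
import random

def maybe_swap(state_i, state_j, log_post_i, log_post_j, beta_i, beta_j,
               n_swap=30, rng=random):
    """Propose a swap with probability 1/n_swap; accept with the Metropolis
    ratio for tempered posteriors p(theta)^beta, i.e.
    alpha = exp[(beta_i - beta_j) * (ln p_j - ln p_i)]."""
    if rng.random() >= 1.0 / n_swap:
        return state_i, state_j, False       # no swap proposed this step
    log_alpha = (beta_i - beta_j) * (log_post_j - log_post_i)
    if math.log(rng.random()) < log_alpha:
        return state_j, state_i, True        # states exchanged
    return state_i, state_j, False
```

With n_swap=1 a swap is proposed at every step, which is handy for testing: a move that strictly increases the tempered posterior of the colder chain is always accepted.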
The default temperature ladder is between \(\beta=1\) (ground chain) and \(2^{10}\), evenly distributed over the log scale across the number of chains given. This can be changed with betascale=linear, for example (it will actually be linear for any given value other than log), and betamax=6 (instead of 10), for example.
Sampling the posterior distribution¶
Next, we run as we would a standard MCMC set of chains:
python epic.py <FULL-PATH-TO>/disttest/PT-171022-185848/ steps=3000 limit=90000
Unlike the standard method, convergence will not be checked automatically (remember that only the ground temperature chain corresponds to the true posterior distribution). We thus need to specify a limit so it stops after some large number of states.
Because the higher-temperature distributions are wider, in this code
I try to guarantee an efficient sampling for all chains by dividing
the variances in the proposal function in each chain by their
respective value of \(\beta\).
Analyzing the chains¶
In the PT method, convergence is not assessed automatically, since each chain has a different temperature and thus does not sample from the same distribution.
We could think of comparing the different chains after taking the frequencies
of each bin to the power \(1/\beta\), but I believe this could lead to
inaccuracies due to statistical noise amplification in the higher-temperature
chains, so I have not tested nor implemented this yet.
In order to determine if you can already stop evolving your chains, I recommend
running a second similar simulation so you can compare the ground temperature
chains of the two (or more, but two is enough). To do this, create and run a
second simulation with:
python epic.py disttest.ini chains=4 mode=PT steps=9000 limit=90000 nswap=30
and then:
python analyze.py <FULL-OR-RELATIVE-PATH-TO-FIRST-SIMULATION> calculate_R=<FULL-OR-RELATIVE-PATH-TO-SECOND-SIMULATION>
You can do this at any time while the simulations are still running, even if they have different lengths. The only requirement is that the one in the calculate_R argument is the longer one. The minimum of the lengths of the two ground temperature chains will be used; the rest is disregarded.
The directory of the main argument (the shorter chain) will contain the convergence results.
During the simulation, the code will print the acceptance rates of the chains
every 10 iterations.
The acceptance rate of the ground temperature chain and the swap rate are
printed every iteration.
After limit steps, the code outputs the Akaike and Bayesian information criteria, AIC and BIC, and generates the usual plots.
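For reference, the two criteria follow from the maximum likelihood \(\mathcal{L}_{\max}\), the number of free parameters \(k\) and the number of data points \(n\) (these are the standard definitions; EPIC's exact bookkeeping may differ in detail):

```python
import math

def information_criteria(max_log_like, k, n):
    """AIC = 2k - 2 ln L_max;  BIC = k ln(n) - 2 ln L_max."""
    aic = 2 * k - 2 * max_log_like
    bic = k * math.log(n) - 2 * max_log_like
    return aic, bic
```

Lower values favor a model; BIC penalizes extra parameters more heavily than AIC once \(n > e^2 \approx 7\).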
Analyzing the resulting posterior and even checking convergence is fast since
only the ground temperature chains need to be loaded.
However, since this is Parallel Tempering, we can use the information of the higher-temperature chains to compute the Bayesian evidence; we just need to specify that we want this with the option evidence. If we ask for plot_sequences or the correlations (with ACF), the evidence will also be computed anyway, even without the argument evidence, since all chains will have to be loaded.
It is interesting to inspect the sequences of the different temperatures. We
can clearly see the jumps between the peaks occurring in the lower-temperature
chains and the good mixing of the high-temperature chains, where the peaks are
smoothed out.
The evidence is calculated from the area under the following curve:
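This "area under the curve" is the thermodynamic integration estimate \(\ln Z = \int \langle \ln \mathcal{L} \rangle_\beta \, \mathrm{d}\beta\), with \(\beta\) the tempering exponent. A minimal sketch, assuming the mean log-likelihood of each tempered chain is already available (the function name is hypothetical):

```python
def log_evidence(betas, mean_log_like):
    """Trapezoidal estimate of ln Z = integral of <ln L>_beta d(beta),
    given the mean log-likelihood measured in each tempered chain."""
    pairs = sorted(zip(betas, mean_log_like))  # integrate from smallest beta up
    lnZ = 0.0
    for (b0, f0), (b1, f1) in zip(pairs, pairs[1:]):
        lnZ += 0.5 * (f0 + f1) * (b1 - b0)
    return lnZ
```

The trapezoidal rule is exact for a curve that is linear between the sampled \(\beta\) values, so a denser ladder improves the estimate.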
Of course, the more chains we use, the more precise this calculation will be. So let us now see how accurate our simulation is. In the following triangle plot, we show the exact curves of the Gaussian distributions superposed on the histograms.
Notice how the resulting histograms approximate the true functions very well, including the separated peaks given by the sum of two Gaussians. In standard MCMC, we would very probably obtain only one of the peaks; which one would depend on where the chain starts.
Let us improve this a little. These histograms with 20 bins are fine for the first two parameters, but because of the wider range of data along the \(c\) axis, 20 bins are too few there. Using a higher number of bins helps, but most bins would be wasted, since the division spans uniformly over the entire range of data and there are practically no points inside a large interval between the two peaks.
The standard kernel density estimates with the bandwidth tuned for Gaussian-like distributions do not work well with multi-peaked distributions, because the standard deviation will be rather large. We circumvented this limitation in this program by allowing the calculation of KDE on separate subsets of data. For this case, instead of running make_kde([array,]) we can do make_kde([array[array < 0.5], array[array > 0.5]]), since the peak separation is very clear and there will probably be no points above \(c=0.5\) belonging to the first peak and no points below this value belonging to the second peak (when the peaks are closer together this may be much more complicated to accomplish). This will make KDE for the two subsets separately and then join them together again, resulting in an accurate representation of the data.
The make_kde function already receives a list of arrays to make this possible when we have separate arrays of data. In this case, we just need to tell the program where to split the single array in two. This is done by saving a file disttest.true_fit containing the weight, \(\mu\) and \(\sigma\) parameters of the separated true distributions (only Gaussian is supported):
1 0.5 0.01
1 0.2 0.03
0.7 0.3 0.02 0.3 0.75 0.02
Notice how well the KDE curves match the true functions, shown in blue lines:
The confidence levels are extracted from these marginalized distributions. The precision in the determination of the intervals is then limited by the resolution of the histograms (the number of bins). I recommend using KDE whenever possible. Using more bins is also an option, but it requires very long chains so that this number can be raised considerably. KDE provides very good precision without needing too many data points (as long as there are enough for the chains to have converged). It is clear from these images how much better the KDE curves match the \(1\sigma\) and \(2\sigma\) confidence intervals. The shaded regions are not expected to match the dashed lines in the marginalized distribution of the third parameter (unless the two Gaussians had equal weights and widths), however, as they are calculated globally rather than locally.
At this moment, more than two separated peaks are not supported for this splitting by the analyze.py routine, although it can be done directly with any number of separated arrays in a list in the make_kde function.
Finally, let us see the results from this simulation. The image below is a screenshot of the kde_table.pdf file, obtained by compiling the kde_table.tex file saved in the kde_estimates directory:
Since the parameter \(c\) has two peaks of high probability density, the resulting confidence regions are composed of two intervals. The union symbol \(\cup\) is used to represent this configuration. The result for the parameter \(b\) may look different from the true Gaussian function because of an artifact in the peak, showing doubled tips. The reported result considers the highest of the tips as the central value, the error bars being calculated from that value. For cases like this, it is interesting to simply report the Gaussian fits instead. They can be inserted automatically in the tables if we save a file named normal_kde.txt in the simulation directory informing the names of the parameters (one per line) that look Gaussian. Then the mean and standard deviation will be printed in the table when drawhistograms.py is run again. This has been done for the result shown above.
Adaptation (beta)¶
An adaptive routine is implemented following the ideas in Łącki & Miasojedow 2016 [1], but this needs more tests and is still in beta. Come back later to check.
[1] | Łącki M. K. & Miasojedow B. “State-dependent swap strategies and automatic reduction of number of temperatures in adaptive parallel tempering algorithm”. Statistics and Computing 26, 951–964 (2016). |
Acknowledgments¶
I want to thank Gabriel Marcondes and Luiz Irber for their help. This work has made use of the computing facilities of the Laboratory of Astroinformatics (IAG/USP, NAT/Unicsul), whose purchase was made possible by the Brazilian agency FAPESP (grant 2009/54006-4) and the INCT-A.
About the author¶
I’m a Brazilian engineer and physicist graduated from the Federal University of Sao Carlos (Engineering Physics, 2009), the University of Campinas (M.Sc. in Physics, 2012) and the University of Sao Paulo (Ph.D. in Physics, 2016). Besides developing EPIC, I have worked mainly with tests of interacting dark energy models using growth of structure and galaxy clusters data.
See the inSPIRE-HEP website to access my publications.
Changelog¶
Version 1.0.4¶
- Uses astropy.io.fits instead of pyfits for loading JLA (v4) files.
Version 1.0.2¶
New in this version:
- Included instructions in documentation on how to show two or more results in the same triangle plot;
- Added section “About the author” to documentation;
- This changelog;
- New background in html documentation favicon, uses readthedocs’ color;
- Included arXiv eprint number of the PDF version of this documentation in license information;
- Slightly reduced mathjax fontsize in html documentation;
- Other minor changes to documentation.
Version 1.0.1¶
- First release on PyPI