MSMTools¶
MSMTools is a Python library for the estimation, validation and analysis of Markov state models.
It supports the following main features:
- Markov state model (MSM) estimation and validation.
- Computing metastable states and structures with Perron-cluster cluster analysis (PCCA).
- Systematic coarse-graining of MSMs to transition models with few states.
- Extensive analysis options for MSMs, e.g. calculation of committors, mean first passage times, transition rates, experimental expectation values and time-correlation functions, etc.
- Transition Path Theory (TPT).
For a high-level interface to these functionalities, we encourage you to use PyEMMA.
Technical features:
- Code is implemented in Python (supports 2.7, 3.3/3.4) and C.
- Runs on Linux (64 bit), Windows (32 or 64 bit) or MacOS (64 bit).
Documentation¶
Installation¶
To install the msmtools Python package, you need a few Python package dependencies. If these dependencies are not available in their required versions, the installation will fail. We recommend one particular way of installing that is relatively safe, but you are welcome to try other approaches if you know what you are doing.
Anaconda install (Recommended)¶
We strongly recommend using the Anaconda scientific Python distribution to install Python-based software, including msmtools. Python-based software is not trivial to distribute, and this approach saves you many of the headaches and problems that frequently arise with other installation methods. You are free to use a different approach (see below) if you know how to sort out problems, but proceed at your own risk.
If you already have a conda installation, directly go to step 3:
Download and install miniconda for Python 2 or 3, 32 or 64 bit depending on your system: http://conda.pydata.org/miniconda.html
Windows users who do not know whether to choose 32 or 64 bit are strongly recommended to first read the second question of this FAQ: http://windows.microsoft.com/en-us/windows/32-bit-and-64-bit-windows
Run the installer and select yes to add conda to the PATH variable.
If you have installed from a Linux shell, either open a new shell to have an updated PATH, or update your PATH variable by
source ~/.bashrc
(or .tcsh, .csh, whichever shell you are using).
Add the omnia-md software channel, and install (or update) msmtools:
conda config --add channels omnia
conda install msmtools
If the command conda is unknown, the PATH variable is probably not set correctly (see steps 1 and 2).
Check installation:
conda list
shows you the installed Python packages. You should find msmtools 1.1 (or later) as well as ipython and ipython-notebook 3.1 (or later). If ipython is not up to date, you can still use msmtools, but you won’t be able to load our example notebooks. In that case, update it by
conda install ipython-notebook
Python Package Index (PyPI)¶
If you do not want to use Anaconda, you can install with the Python package manager pip. This is not recommended, because in the past various problems have arisen with pip when compiling the packages that msmtools depends upon.
If you do not have pip, please read the pip installation guide first.
Make sure pip is enabled to install so-called wheel packages:
pip install wheel
Now you are able to install binaries if you use macOS or Windows. At the time of writing, PyPI does not support Linux binaries at all, so Linux users have to compile from source themselves.
Install msmtools using
pip install msmtools
Check your installation
python
>>> import msmtools
>>> msmtools.__version__
should print 1.1 or later.
>>> import IPython
>>> IPython.__version__
should print 3.1 or later. If IPython is not up to date, update it by
pip install ipython
Building from Source¶
If you prefer not to use Anaconda, you can build msmtools from source. In this approach, all msmtools dependencies are built from source as well. Building these from source is often tricky, takes a long time, and is error-prone; it is neither recommended nor supported by us. If unsure, use the Anaconda installation.
Ensure that you fulfill the following prerequisites. You can install them either with pip or conda, as long as you meet the version requirements.
- C/C++ compiler
- setuptools > 18
- cython >= 0.22
- numpy >= 1.6
- scipy >= 0.11
If you do not fulfill these requirements, try to upgrade all packages:
pip install --upgrade setuptools
pip install --upgrade cython
pip install --upgrade numpy
pip install --upgrade scipy
Note that if pip finds a newer version, it will trigger an update which will most likely involve compilation. Especially NumPy and SciPy are hard to build. You might want to take a look at this guide here: http://www.scipy.org/scipylib/building/
The build and install process happens in one step, as all dependencies are pulled in via the provided setup.py script. So you only need to get the msmtools source and run it to build msmtools itself and all of its dependencies (if not already supplied) from source.
pip install msmtools
For Developers¶
If you are a developer, clone the code repository from GitHub and install it as follows
Ensure the prerequisites (point 1) described for “Building from Source” above.
Make a suitable directory, and inside clone the repository via
git clone https://github.com/markovmodel/msmtools.git
install msmtools via
python setup.py develop [--user]
The develop install has the advantage that if only Python scripts are changed, e.g. via a pull or a local edit, you do not have to re-install anything, because the setup command simply creates a link to your working copy. Repeating the install step is only necessary if any of the msmtools C files change and need to be rebuilt.
Documentation¶
msmtools provides these packages to perform estimation and further analysis of Markov models.
estimation - MSM estimation from data (msmtools.estimation)¶
Countmatrix¶
count_matrix (dtraj, lag[, sliding, …]) | Generate a count matrix from a given microstate trajectory.
cmatrix (dtraj, lag[, sliding, …]) | Generate a count matrix from a given microstate trajectory.
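As a concrete illustration of what the (equivalent) functions above compute, here is a minimal pure-NumPy sketch of a sliding-window count matrix. This is not the msmtools implementation, which returns a sparse matrix and also handles lists of trajectories:

```python
import numpy as np

def count_matrix_sliding(dtraj, lag):
    """Sliding-window count matrix: one transition count per pair (t, t+lag).

    Sketch of the sliding=True counting mode; states are assumed to be
    labeled 0..n-1."""
    dtraj = np.asarray(dtraj)
    n = dtraj.max() + 1
    C = np.zeros((n, n), dtype=int)
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1
    return C

dtraj = [0, 0, 1, 0, 1, 1, 0]
C = count_matrix_sliding(dtraj, lag=1)
print(C)  # counts all 6 overlapping transition pairs
```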
Connectivity¶
connected_sets (C[, directed]) | Compute connected sets of microstates.
largest_connected_set (C[, directed]) | Largest connected component for a directed graph with edge weights given by the count matrix.
largest_connected_submatrix (C[, directed, lcc]) | Compute the count matrix on the largest connected set.
connected_cmatrix (C[, directed, lcc]) | Compute the count matrix on the largest connected set.
is_connected (C[, directed]) | Check connectivity of the given matrix.
Estimation¶
transition_matrix (C[, reversible, mu, method]) | Estimate the transition matrix from the given count matrix.
tmatrix (C[, reversible, mu, method]) | Estimate the transition matrix from the given count matrix.
rate_matrix (C[, dt, method, sparsity, …]) | Estimate a reversible rate matrix from a count matrix.
log_likelihood (C, T) | Log-likelihood of the count matrix given a transition matrix.
tmatrix_cov (C[, k]) | Covariance tensor for the non-reversible transition matrix posterior.
error_perturbation (C, S) | Error perturbation for a given sensitivity matrix.
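For intuition, the non-reversible maximum-likelihood estimate is simply the row-normalized count matrix, and the log-likelihood of a count matrix C given a transition matrix T is (up to a constant) sum_ij C_ij log T_ij. A minimal NumPy sketch, not the msmtools implementation (which also supports reversible and fixed-stationary-vector estimation):

```python
import numpy as np

def transition_matrix_nonrev(C):
    """Non-reversible maximum-likelihood transition matrix:
    row-normalized counts (sketch of reversible=False estimation)."""
    C = np.asarray(C, dtype=float)
    return C / C.sum(axis=1, keepdims=True)

def log_likelihood(C, T):
    """log P(C | T) up to a constant: sum_ij C_ij * log T_ij (sketch)."""
    C = np.asarray(C, dtype=float)
    T = np.asarray(T, dtype=float)
    mask = C > 0  # skip zero counts to avoid log(0)
    return float(np.sum(C[mask] * np.log(T[mask])))

C = np.array([[8, 2], [2, 8]])
T = transition_matrix_nonrev(C)
print(T)                    # rows sum to 1
print(log_likelihood(C, T)) # always <= 0 for a valid T
```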
Sampling¶
tmatrix_sampler (C[, reversible, mu, T0, …]) | Generate a transition matrix sampler object.
Bootstrap¶
bootstrap_counts (dtrajs, lagtime[, corrlength]) | Generate a randomly resampled count matrix given the input coordinates.
bootstrap_trajectories (trajs, correlation_length) | Generate randomly resampled trajectory segments.
Priors¶
prior_neighbor (C[, alpha]) | Neighbor prior for the given count matrix.
prior_const (C[, alpha]) | Constant prior for the given count matrix.
prior_rev (C[, alpha]) | Prior counts for sampling of reversible transition matrices.
analysis - MSM analysis functions (msmtools.analysis)¶
This module contains functions to analyze a created Markov model, which is specified with a transition matrix T.
Validation¶
is_transition_matrix (T[, tol]) | Check if the given matrix is a transition matrix.
is_tmatrix (T[, tol]) | Check if the given matrix is a transition matrix.
is_rate_matrix (K[, tol]) | Check if the given matrix is a rate matrix.
is_connected (T[, directed]) | Check connectivity of the given matrix.
is_reversible (T[, mu, tol]) | Check reversibility of the given transition matrix.
Decomposition¶
Decomposition routines use the scipy LAPACK bindings for dense numpy-arrays and the ARPACK bindings for scipy sparse matrices.
stationary_distribution (T) | Compute the stationary distribution of the stochastic matrix T.
statdist (T) | Compute the stationary distribution of the stochastic matrix T.
eigenvalues (T[, k, ncv, reversible, mu]) | Find eigenvalues of the transition matrix.
eigenvectors (T[, k, right, ncv, reversible, mu]) | Compute eigenvectors of the given transition matrix.
rdl_decomposition (T[, k, norm, ncv, …]) | Compute the decomposition into eigenvalues, left and right eigenvectors.
timescales (T[, tau, k, ncv, reversible, mu]) | Compute implied timescales of the given transition matrix.
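For dense matrices, the two central quantities above can be sketched in a few lines of NumPy: the stationary distribution is the left eigenvector of T for eigenvalue 1, and the implied timescales follow from the remaining eigenvalues via t_i = -tau / ln|lambda_i|. A simplified sketch, not the msmtools implementation (which also handles sparse matrices via ARPACK):

```python
import numpy as np

def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalized to a
    probability vector (dense sketch)."""
    evals, evecs = np.linalg.eig(np.asarray(T).T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return pi / pi.sum()

def implied_timescales(T, tau=1):
    """t_i = -tau / ln|lambda_i| for the nontrivial eigenvalues (sketch)."""
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]  # descending
    return -tau / np.log(evals[1:])  # drop the trivial eigenvalue 1

T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(stationary_distribution(T))  # [0.5, 0.5] by symmetry
print(implied_timescales(T))       # -1/ln(0.8), about 4.48 steps
```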
Expected counts¶
expected_counts (T, p0, N) | Compute expected transition counts for a Markov chain with N steps.
expected_counts_stationary (T, N[, mu]) | Expected transition counts for a Markov chain in equilibrium.
Passage times¶
mfpt (T, target[, origin, tau, mu]) | Mean first passage times (optionally from a set of starting states) to a set of target states.
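The mean first passage time solves a linear system: m_i = 0 on the target set, and m_i = tau + sum_j T_ij m_j on all other states. A minimal dense-matrix sketch of this computation, not the msmtools implementation:

```python
import numpy as np

def mfpt(T, target, tau=1):
    """Mean first passage times to a set of target states, solving
    (I - T_CC) m_C = tau * 1 on the complement C of the target set."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    m = np.zeros(n)  # m = 0 on the target set
    C = np.setdiff1d(np.arange(n), target)
    A = np.eye(len(C)) - T[np.ix_(C, C)]
    m[C] = np.linalg.solve(A, tau * np.ones(len(C)))
    return m

T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(mfpt(T, [1]))  # from state 0: geometric waiting time 1/0.1 = 10 steps
```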
Committors and PCCA¶
committor (T, A, B[, forward, mu]) | Compute the committor between sets of microstates.
pcca (T, m) | Compute metastable sets using PCCA++ [1] and return the membership of all states to these sets.
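The forward committor also reduces to a linear system: q = 0 on A, q = 1 on B, and on the intermediate states C one solves (I - T_CC) q_C = T_CB 1. A minimal sketch of this computation (not the msmtools implementation):

```python
import numpy as np

def committor_forward(T, A, B):
    """Forward committor q+: probability to hit B before A.
    Boundary conditions q=0 on A and q=1 on B; on the remaining
    states solve (I - T_CC) q_C = T_CB @ 1."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    q = np.zeros(n)
    q[B] = 1.0
    C = np.setdiff1d(np.arange(n), np.concatenate([A, B]))
    M = np.eye(len(C)) - T[np.ix_(C, C)]
    rhs = T[np.ix_(C, B)].sum(axis=1)
    q[C] = np.linalg.solve(M, rhs)
    return q

# 3-state chain 0 <-> 1 <-> 2 with A={0}, B={2}
T = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
print(committor_forward(T, A=[0], B=[2]))  # middle state: 0.5 by symmetry
```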
Fingerprints¶
fingerprint_correlation (T, obs1[, obs2, …]) | Dynamical fingerprint for an equilibrium correlation experiment.
fingerprint_relaxation (T, p0, obs[, tau, k, ncv]) | Dynamical fingerprint for a relaxation experiment.
expectation (T, a[, mu]) | Equilibrium expectation value of a given observable.
correlation (T, obs1[, obs2, times, maxtime, …]) | Time-correlation for an equilibrium experiment.
relaxation (T, p0, obs[, times, k, ncv]) | Relaxation experiment.
Sensitivity analysis¶
stationary_distribution_sensitivity (T, j) | Sensitivity matrix of a stationary distribution element.
eigenvalue_sensitivity (T, k) | Sensitivity matrix of a specified eigenvalue.
timescale_sensitivity (T, k) | Sensitivity matrix of a specified timescale.
eigenvector_sensitivity (T, k, j[, right]) | Sensitivity matrix of a selected eigenvector element.
mfpt_sensitivity (T, target, i) | Sensitivity matrix of the mean first passage time from a specified state.
committor_sensitivity (T, A, B, i[, forward]) | Sensitivity matrix of a specified committor entry.
expectation_sensitivity (T, a) | Sensitivity of the expectation value of observable A=(a_i).
flux - Reactive flux and transition pathways (msmtools.flux)¶
This module contains functions to compute reactive flux networks and find dominant reaction pathways in such networks.
TPT-object¶
tpt (T, A, B[, mu, qminus, qplus, rate_matrix]) | Compute the A->B reactive flux using transition path theory (TPT).
ReactiveFlux (A, B, flux[, mu, qminus, …]) | A->B reactive flux from transition path theory (TPT).
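The central object of TPT is the flux network f_ij = pi_i q-_i T_ij q+_j, built from the stationary distribution and the backward/forward committors. A minimal NumPy sketch of the gross and net flux, assuming pi, q- and q+ are already known (not the msmtools implementation):

```python
import numpy as np

def flux_matrix(T, pi, qminus, qplus):
    """Gross A->B flux network f_ij = pi_i * q-_i * T_ij * q+_j,
    with zero diagonal (sketch; netflux handling omitted)."""
    pi, qm, qp = map(np.asarray, (pi, qminus, qplus))
    F = pi[:, None] * qm[:, None] * np.asarray(T) * qp[None, :]
    np.fill_diagonal(F, 0.0)
    return F

def to_netflux(F):
    """Net flux f+_ij = max(f_ij - f_ji, 0) (sketch)."""
    return np.maximum(F - F.T, 0.0)

# 3-state chain with A={0}, B={2}
T = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
pi = np.array([1/3, 1/3, 1/3])       # stationary by symmetry
qplus = np.array([0.0, 0.5, 1.0])    # forward committor
qminus = 1.0 - qplus                 # backward committor (reversible chain)
F = flux_matrix(T, pi, qminus, qplus)
print(to_netflux(F))  # flux 1/60 along the pathway 0 -> 1 -> 2
```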
Reactive flux¶
flux_matrix (T, pi, qminus, qplus[, netflux]) | Compute the TPT flux network for the reaction A->B.
to_netflux (flux) | Compute the net flux from the gross flux.
flux_production (F) | Return the net flux production for all states.
flux_producers (F[, rtol, atol]) | Return indexes of states that are net flux producers.
flux_consumers (F[, rtol, atol]) | Return indexes of states that are net flux consumers.
coarsegrain (F, sets) | Coarse-grain the flux to the given sets.
Reaction rates and fluxes¶
total_flux (F[, A]) | Compute the total flux, or turnover flux, of the A->B reaction.
rate (totflux, pi, qminus) | Transition rate for the reaction A to B.
mfpt (totflux, pi, qminus) | Mean first passage time for the reaction A to B.
dtraj - Discrete trajectory functions (msmtools.dtraj)¶
Discrete trajectory io¶
read_discrete_trajectory (filename) | Read discrete trajectory from ascii file.
read_dtraj (filename) | Read discrete trajectory from ascii file.
write_discrete_trajectory (filename, dtraj) | Write discrete trajectory to ascii file.
write_dtraj (filename, dtraj) | Write discrete trajectory to ascii file.
load_discrete_trajectory (filename) | Read discrete trajectory from binary file.
load_dtraj (filename) | Read discrete trajectory from binary file.
save_discrete_trajectory (filename, dtraj) | Write discrete trajectory to binary file.
save_dtraj (filename, dtraj) | Write discrete trajectory to binary file.
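Conceptually, the ascii read/write pair stores a discrete trajectory (a 1D integer array) with one state index per line. A sketch of the equivalent round trip with plain NumPy; the msmtools functions behave similarly but additionally validate their input:

```python
import numpy as np
import os
import tempfile

dtraj = np.array([0, 0, 1, 2, 1, 0])  # a discrete trajectory

path = os.path.join(tempfile.mkdtemp(), "dtraj.dat")
np.savetxt(path, dtraj, fmt="%d")     # write_discrete_trajectory analogue
loaded = np.loadtxt(path, dtype=int)  # read_discrete_trajectory analogue
print(loaded)
```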
Simple statistics¶
count_states (dtrajs[, ignore_negative]) | Return a histogram count.
visited_set (dtrajs) | Return the set of states that have at least one count.
number_of_states (dtrajs[, only_used]) | Return the number of states in the given trajectories.
index_states (dtrajs[, subset]) | Generate trajectory/time indexes for the given list of states.
Sampling trajectory indexes¶
sample_indexes_by_distribution (indexes, …) | Sample trajectory/time indexes according to the given probability distributions.
sample_indexes_by_state (indexes, nsample[, …]) | Sample trajectory/time indexes according to the given sequence of states.
sample_indexes_by_sequence (indexes, sequence) | Sample trajectory/time indexes according to the given sequence of states.
generation - MSM generation tools (msmtools.generation)¶
This module contains functions to generate simple MSMs.
Metropolis-Hastings chain¶
transition_matrix_metropolis_1d (E[, d]) | Transition matrix describing the Metropolis chain jumping between neighbors in a discrete 1D energy landscape.
generate_traj (P, N[, start, stop, dt]) | Generate a realization of the Markov chain with transition matrix P.
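The idea behind the 1D Metropolis generator can be sketched as follows. Note that the exact proposal convention used here (each neighbor proposed with probability d/2, accepted with the Metropolis criterion) is an assumption and may differ in detail from the msmtools implementation; the short sampling loop at the end plays the role of generate_traj:

```python
import numpy as np

def metropolis_1d(E, d=1.0):
    """Metropolis chain on a discrete 1D energy landscape E (in units of kT).

    ASSUMED convention: each neighbor jump i -> i+-1 is proposed with
    probability d/2 and accepted with min(1, exp(-(E_j - E_i)));
    the remaining probability stays on the diagonal."""
    n = len(E)
    P = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                P[i, j] = 0.5 * d * min(1.0, np.exp(-(E[j] - E[i])))
        P[i, i] = 1.0 - P[i].sum()
    return P

E = np.array([0.0, 4.0, 0.0])  # two wells separated by a barrier
P = metropolis_1d(E)
print(P)  # rows sum to 1; uphill moves are strongly suppressed

# generate_traj analogue: sample a short realization of the chain
rng = np.random.default_rng(0)
traj = [0]
for _ in range(10):
    traj.append(int(rng.choice(len(E), p=P[traj[-1]])))
print(traj)
```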
Development¶
Changelog¶
1.2.4 (11-12-18)¶
Fixes:
- Effective count matrix parallel evaluation fixes. #116, #117
1.2.2 (6-25-18)¶
New features
- Added new transition matrix estimator which uses a primal-dual interior-point iteration scheme. #82
Fixes:
- Fixed a corner case, in which the pathway decomposition could fail (no more paths left). #107
1.2.1 (5-16-17)¶
New features
- Added fast reversible transition matrix estimation. #94
Fixes:
- Fixed some minor issues in rate matrix estimation. #97 #98
1.1.4 (9-23-16)¶
Fixes:
- Fixed sparsity pattern check in transition matrix sampler. #91, thanks @fabian-paul
Developer’s Guide¶
Contributing¶
Basic Idea¶
We use the “devel” branch to develop pyEMMA. When “devel” has reached a mature state in terms of current functionality and stability, we merge “devel” into the master branch. This happens at the time when a release is made.
In order to develop certain features you should not work on “devel” directly either, but rather branch it to a new personal feature branch (there you can do whatever you want). Then, after testing your feature, you can offer the changes for merging into the devel branch by opening a pull request (see below). If accepted, that branch will be merged into devel, and unless overridden by other changes your feature will eventually make it into master and the next release.
Why¶
- Always have a tested and stable master branch.
- Avoid interfering with other developers until changes are merged.
How¶
One of the package maintainers periodically merges the development branch(es). All you need to do is make your changes in the feature branch (see below for details) and then open a pull request. When you do, a suite of automatic code tests is run to catch direct or indirect bugs introduced by the change. This is done by continuous integration (CI) software such as Jenkins (http://jenkins-ci.org), which is open source, or Travis-CI (http://travis-ci.org), which is free for open-source projects. Again, you do not have to do anything here; this happens automatically after a pull request. You will see the output of these tests on the pull request page on GitHub.
Commit messages¶
Use commit messages in the style “[$package]: change” whenever the changes belong to one package or module. You can omit “pyemma” (which is trivial) and “api” (which does not show up in the import).
E.g.:
[msm.analysis]: implemented sparse pcca
That way other developers and the package managers immediately know which modules have been changed and can watch out for possible cross-effects. Also this makes commits look uniform.
If you have a complex commit affecting several modules or packages, break it down into little pieces with easily understandable commit messages. This allows us to go back to intermediate stages of your work if something fails at the end.
Testing¶
We use Python’s unittest module to write test cases for all algorithms.
To run all tests invoke:
python setup.py test
or directly invoke nosetests in pyemma working copy:
nosetests $PYEMMA_DIR
Running all tests is encouraged (especially if you are changing core features); you can also run individual tests by invoking them directly with the Python interpreter.
Documentation¶
Every function, class, and module that you write must be documented. Please check out how to do this. In short, after every function (class, module) header there should be a docstring enclosed by """ … """, containing a short and a long description of what the function does, a clear description of the input parameters and the return value, and any relevant cross-references or citations. You can include LaTeX-style math in the docstring.
For a deeper understanding and reference, please have a look at the Sphinx documentation.
In particular, the API functions (those publicly visible to the external user) should be well documented.
To build the documentation you need the dependencies from the file requirements-build-docs.txt which you can install via pip:
pip install -r requirements-build-docs.txt
afterwards you are ready to build the documentation:
cd doc
make html
The HTML docs can then be found in the doc/build/html directory.
Workflow¶
A developer creates a feature branch “feature” and commits their work to this branch. When done (with at least a working test case written for it), they push the feature branch to their fork and create a pull request. The pull request can then be reviewed and merged upstream.
- Get up to date - pull the latest changes from devel
# first get the latest changes
git pull
- Compile extension modules (also works with conda distributions)
python setup.py develop
In contrast to install, which copies the development version into your package directory, the develop flag simply puts a link from your package directory into your development directory. That way, local changes to Python files are immediately active when you import the package. You only need to re-execute the above command when a C extension has changed.
- Create a new feature branch by copying from the devel branch and switch to it:
# switch to development branch
git checkout devel
# create new branch and switch to it
git checkout -b feature
- Work on your feature branch. Here you can roam freely.
- Write unit test and TEST IT (see above)! :-)
touch fancy_feat_test.py
# test the unit
python fancy_feat_test.py
# run the whole test-suite
# (to ensure that your newfeature has no side-effects)
cd $PYEMMA_DIR
python setup.py test
- Commit your changes
git commit fancy_feat.py fancy_feat_test.py \
-m "Implementation and unit test for fancy feature"
Repeat steps 3 to 5 as often as necessary to accomplish your task. Remember to split your changes into small commits.
- Make changes available by pushing your commits to the server and creating a pull request
# push your branch to your fork on github
git push myfork feature
On github create a pull request from myfork/feature to origin/devel, see https://help.github.com/articles/using-pull-requests
Conclusions¶
- Feature branches allow you to work without interfering with others.
- The devel branch contains all tested implemented features.
- The devel branch is used to test for cross-effects between features.
- Work with pull request to ensure your changes are being tested automatically and can be reviewed.
- The master branch contains all tested features and represents the set of features that are suitable for public usage.
Publish a new release¶
- Merge current devel branch into master
git checkout master; git merge devel
- Make a new tag ‘vmajor.minor.patch’. Major means a major release (major new functionality), minor means minor changes and new functionality, and patch means no new functionality, just bugfixes or improvements to the docs.
git tag -m "release description" v1.1
- IMPORTANT: first push, then push --tags
git push; git push --tags
- Update conda recipes and perform binstar pushing (partially automated)