MSMTools

MSMTools is a Python library for the estimation, validation and analysis of Markov state models.

It supports the following main features:

  • Markov state model (MSM) estimation and validation.
  • Computing metastable states and structures with Perron-cluster cluster analysis (PCCA).
  • Systematic coarse-graining of MSMs to transition models with few states.
  • Extensive analysis options for MSMs, e.g. calculation of committors, mean first passage times, transition rates, experimental expectation values and time-correlation functions, etc.
  • Transition Path Theory (TPT).

For a high-level interface to these functionalities, we encourage you to use PyEMMA.

Technical features:

  • Code is implemented in Python (supports 2.7, 3.3/3.4) and C.
  • Runs on Linux (64 bit), Windows (32 or 64 bit) or MacOS (64 bit).

Installation

To install the msmtools Python package, you need a few Python package dependencies. If these dependencies are not available in their required versions, the installation will fail. We recommend one particular installation route that is relatively safe, but you are welcome to try other approaches if you know what you are doing.

Python Package Index (PyPI)

If you prefer not to use Anaconda, you can install with the Python package manager pip. This is not recommended, because in the past various problems have arisen with pip when compiling the packages that msmtools depends upon.

  1. If you do not have pip, please read its install guide.

  2. Make sure pip is enabled to install so called wheel packages:

    pip install wheel
    

    Now you are able to install binary (wheel) packages if you use MacOS or Windows. At the time of writing, PyPI does not support Linux binaries at all, so Linux users have to compile the package themselves.

  3. Install msmtools using

    pip install msmtools
    
  4. Check your installation

    python
    >>> import msmtools
    >>> msmtools.__version__
    

    should print 1.1 or later

    >>> import IPython
    >>> IPython.__version__
    

    should print 3.1 or later. If IPython is not up to date, update it with pip install ipython

Building from Source

If you prefer not to use Anaconda, you can build msmtools from source. In this approach, all msmtools dependencies will be built from source too. Building these from source is sometimes (if not usually) tricky, takes a long time, and is error prone; it is neither recommended nor supported by us. If unsure, use the Anaconda installation.

  1. Ensure that you fulfill the following prerequisites. You can install them either with pip or conda, as long as you meet the version requirements.

    • C/C++ compiler
    • setuptools > 18
    • cython >= 0.22
    • numpy >= 1.6
    • scipy >= 0.11

    If you do not fulfill these requirements, try to upgrade all packages:

    pip install --upgrade setuptools
    pip install --upgrade cython
    pip install --upgrade numpy
    pip install --upgrade scipy
    

    Note that if pip finds a newer version, it will trigger an update, which will most likely involve compilation. NumPy and SciPy in particular are hard to build. You might want to take a look at this guide: http://www.scipy.org/scipylib/building/

  2. The build and install process is a single step, as all dependencies are pulled in via the provided setup.py script. So you only need to get the source of msmtools and run it to build msmtools itself and all of its dependencies (if not already supplied) from source:

    pip install msmtools

For Developers

If you are a developer, clone the code repository from GitHub and install it as follows

  1. Ensure the prerequisites (point 1) described for “Building from Source” above.

  2. Make a suitable directory, and inside clone the repository via

    git clone https://github.com/markovmodel/msmtools.git
    
  3. Install msmtools via

    python setup.py develop [--user]
    

    The develop install has the advantage that if only Python scripts are changed, e.g. via a pull or a local edit, you do not have to re-install anything, because the setup command simply creates a link to your working copy. Repeating step 3 is only necessary if any of the msmtools C files change and need to be rebuilt.

Documentation

msmtools provides these packages to perform estimation and further analysis of Markov models.

estimation - MSM estimation from data (msmtools.estimation)

Count matrix
count_matrix(dtraj, lag[, sliding, …]) Generate a count matrix from given microstate trajectory.
cmatrix(dtraj, lag[, sliding, …]) Generate a count matrix from given microstate trajectory.
Connectivity
connected_sets(C[, directed]) Compute connected sets of microstates.
largest_connected_set(C[, directed]) Largest connected component for a directed graph with edge-weights given by the count matrix.
largest_connected_submatrix(C[, directed, lcc]) Compute the count matrix on the largest connected set.
connected_cmatrix(C[, directed, lcc]) Compute the count matrix on the largest connected set.
is_connected(C[, directed]) Check connectivity of the given matrix.
Estimation
transition_matrix(C[, reversible, mu, method]) Estimate the transition matrix from the given count matrix.
tmatrix(C[, reversible, mu, method]) Estimate the transition matrix from the given count matrix.
rate_matrix(C[, dt, method, sparsity, …]) Estimate a reversible rate matrix from a count matrix.
log_likelihood(C, T) Log-likelihood of the count matrix given a transition matrix.
tmatrix_cov(C[, k]) Covariance tensor for non-reversible transition matrix posterior.
error_perturbation(C, S) Error perturbation for given sensitivity matrix.
Sampling
tmatrix_sampler(C[, reversible, mu, T0, …]) Generate transition matrix sampler object.
Bootstrap
bootstrap_counts(dtrajs, lagtime[, corrlength]) Generates a randomly resampled count matrix given the input coordinates.
bootstrap_trajectories(trajs, correlation_length) Generates randomly resampled trajectory segments.
Priors
prior_neighbor(C[, alpha]) Neighbor prior for the given count matrix.
prior_const(C[, alpha]) Constant prior for given count matrix.
prior_rev(C[, alpha]) Prior counts for sampling of reversible transition matrices.
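
The functions above compose naturally into an estimation pipeline. A minimal sketch, assuming only the calls listed in the tables (the trajectory data is made up for illustration):

    import numpy as np
    from msmtools import estimation as msmest

    # Toy discrete trajectory over three microstates.
    dtraj = np.array([0, 0, 1, 0, 1, 2, 2, 1, 0, 0, 1, 2])

    # Count transitions at lag time 1 using the sliding window.
    C = msmest.count_matrix(dtraj, lag=1, sliding=True)

    # Restrict the counts to the largest connected set of states.
    C = msmest.largest_connected_submatrix(C)

    # Maximum-likelihood estimate of a reversible transition matrix.
    T = msmest.transition_matrix(C, reversible=True)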

analysis - MSM analysis functions (msmtools.analysis)

This module contains functions to analyze a created Markov model, which is specified with a transition matrix T.

Validation
is_transition_matrix(T[, tol]) Check if the given matrix is a transition matrix.
is_tmatrix(T[, tol]) Check if the given matrix is a transition matrix.
is_rate_matrix(K[, tol]) Check if the given matrix is a rate matrix.
is_connected(T[, directed]) Check connectivity of the given matrix.
is_reversible(T[, mu, tol]) Check reversibility of the given transition matrix.
Decomposition

Decomposition routines use the scipy LAPACK bindings for dense numpy-arrays and the ARPACK bindings for scipy sparse matrices.

stationary_distribution(T) Compute stationary distribution of stochastic matrix T.
statdist(T) Compute stationary distribution of stochastic matrix T.
eigenvalues(T[, k, ncv, reversible, mu]) Find eigenvalues of the transition matrix.
eigenvectors(T[, k, right, ncv, reversible, mu]) Compute eigenvectors of given transition matrix.
rdl_decomposition(T[, k, norm, ncv, …]) Compute the decomposition into eigenvalues, left and right eigenvectors.
timescales(T[, tau, k, ncv, reversible, mu]) Compute implied time scales of given transition matrix.
Expected counts
expected_counts(T, p0, N) Compute expected transition counts for Markov chain with n steps.
expected_counts_stationary(T, N[, mu]) Expected transition counts for Markov chain in equilibrium.
Passage times
mfpt(T, target[, origin, tau, mu]) Mean first passage times (from a set of starting states - optional) to a set of target states.
Committors and PCCA
committor(T, A, B[, forward, mu]) Compute the committor between sets of microstates.
pcca(T, m) Compute metastable sets using PCCA++ and return the membership of all states to these sets.
Fingerprints
fingerprint_correlation(T, obs1[, obs2, …]) Dynamical fingerprint for equilibrium correlation experiment.
fingerprint_relaxation(T, p0, obs[, tau, k, ncv]) Dynamical fingerprint for relaxation experiment.
expectation(T, a[, mu]) Equilibrium expectation value of a given observable.
correlation(T, obs1[, obs2, times, maxtime, …]) Time-correlation for equilibrium experiment.
relaxation(T, p0, obs[, times, k, ncv]) Relaxation experiment.
Sensitivity analysis
stationary_distribution_sensitivity(T, j) Sensitivity matrix of a stationary distribution element.
eigenvalue_sensitivity(T, k) Sensitivity matrix of a specified eigenvalue.
timescale_sensitivity(T, k) Sensitivity matrix of a specified time-scale.
eigenvector_sensitivity(T, k, j[, right]) Sensitivity matrix of a selected eigenvector element.
mfpt_sensitivity(T, target, i) Sensitivity matrix of the mean first-passage time from specified state.
committor_sensitivity(T, A, B, i[, forward]) Sensitivity matrix of a specified committor entry.
expectation_sensitivity(T, a) Sensitivity of expectation value of observable A=(a_i).
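
These analysis functions operate directly on a transition matrix. A minimal sketch, assuming a small, made-up transition matrix and only the functions listed above:

    import numpy as np
    from msmtools import analysis as msmana

    # Toy 3-state transition matrix (rows sum to one).
    T = np.array([[0.90, 0.10, 0.00],
                  [0.10, 0.80, 0.10],
                  [0.00, 0.10, 0.90]])

    assert msmana.is_transition_matrix(T)
    assert msmana.is_reversible(T)

    pi = msmana.stationary_distribution(T)  # stationary probabilities
    ts = msmana.timescales(T)               # implied timescales (first is infinite)
    m = msmana.mfpt(T, target=2)            # mean first passage times to state 2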

flux - Reactive flux and transition pathways (msmtools.flux)

This module contains functions to compute reactive flux networks and find dominant reaction pathways in such networks.

TPT-object
tpt(T, A, B[, mu, qminus, qplus, rate_matrix]) Computes the A->B reactive flux using transition path theory (TPT)
ReactiveFlux(A, B, flux[, mu, qminus, …]) A->B reactive flux from transition path theory (TPT)
Reactive flux
flux_matrix(T, pi, qminus, qplus[, netflux]) Compute the TPT flux network for the reaction A->B.
to_netflux(flux) Compute the netflux from the gross flux.
flux_production(F) Returns the net flux production for all states.
flux_producers(F[, rtol, atol]) Return indexes of states that are net flux producers.
flux_consumers(F[, rtol, atol]) Return indexes of states that are net flux consumers.
coarsegrain(F, sets) Coarse-grains the flux to the given sets.
Reaction rates and fluxes
total_flux(F[, A]) Compute the total flux, or turnover flux, that is produced by the flux sources and consumed by the flux sinks.
rate(totflux, pi, qminus) Transition rate for reaction A to B.
mfpt(totflux, pi, qminus) Mean first passage time for reaction A to B.
Pathway decomposition
pathways(F, A, B[, fraction, maxiter]) Decompose flux network into dominant reaction paths.
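
As an illustration, here is a minimal TPT sketch built from the functions above; the ReactiveFlux attribute names used here (rate, pathways) follow the tables, and the transition matrix is a toy example:

    import numpy as np
    from msmtools import flux as msmflux

    # Toy 4-state transition matrix (rows sum to one).
    T = np.array([[0.80, 0.15, 0.05, 0.00],
                  [0.10, 0.75, 0.05, 0.10],
                  [0.05, 0.05, 0.80, 0.10],
                  [0.00, 0.10, 0.10, 0.80]])

    A, B = [0], [3]              # source and sink sets
    F = msmflux.tpt(T, A, B)     # ReactiveFlux object for the A->B reaction

    print(F.rate)                # A->B transition rate
    paths, capacities = F.pathways(fraction=0.9)  # dominant reaction paths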

dtraj - Discrete trajectories functions (msmtools.dtraj)

Discrete trajectory io
read_discrete_trajectory(filename) Read discrete trajectory from ascii file.
read_dtraj(filename) Read discrete trajectory from ascii file.
write_discrete_trajectory(filename, dtraj) Write discrete trajectory to ascii file.
write_dtraj(filename, dtraj) Write discrete trajectory to ascii file.
load_discrete_trajectory(filename) Read discrete trajectory from binary file.
load_dtraj(filename) Read discrete trajectory from binary file.
save_discrete_trajectory(filename, dtraj) Write discrete trajectory to binary file.
save_dtraj(filename, dtraj) Write discrete trajectory to binary file.
Simple statistics
count_states(dtrajs[, ignore_negative]) Returns a histogram of state counts.
visited_set(dtrajs) Returns the set of states that have at least one count.
number_of_states(dtrajs[, only_used]) Returns the number of states in the given trajectories.
index_states(dtrajs[, subset]) Generates trajectory/time indexes for the given list of states.
Sampling trajectory indexes
sample_indexes_by_distribution(indexes, …) Samples trajectory/time indexes according to the given probability distributions
sample_indexes_by_state(indexes, nsample[, …]) Samples nsample trajectory/time indexes for each state.
sample_indexes_by_sequence(indexes, sequence) Samples trajectory/time indexes according to the given sequence of states
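
A small usage sketch for the statistics functions above, with made-up trajectories:

    import numpy as np
    from msmtools import dtraj as msmdtraj

    # Two toy discrete trajectories.
    dtrajs = [np.array([0, 0, 1, 2, 2, 1]),
              np.array([2, 2, 0, 0, 1])]

    hist = msmdtraj.count_states(dtrajs)    # histogram of state counts
    visited = msmdtraj.visited_set(dtrajs)  # states with at least one count
    n = msmdtraj.number_of_states(dtrajs)   # number of states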

generation - MSM generation tools (msmtools.generation)

This module contains functions to generate simple MSMs.

Metropolis-Hastings chain
transition_matrix_metropolis_1d(E[, d]) Transition matrix describing the Metropolis chain jumping between neighbors in a discrete 1D energy landscape.
generate_traj(P, N[, start, stop, dt]) Generates a realization of the Markov chain with transition matrix P.
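
As an illustrative sketch (the energy values are made up), the two functions combine to simulate a discrete double-well system:

    import numpy as np
    from msmtools import generation as msmgen

    # Toy 1D energy landscape: two wells separated by barriers.
    E = np.array([2.0, 0.0, 2.0, 0.5, 2.0])

    # Metropolis transition matrix for nearest-neighbor jumps.
    P = msmgen.transition_matrix_metropolis_1d(E)

    # Generate a realization of length 1000 starting in state 0.
    traj = msmgen.generate_traj(P, 1000, start=0)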

Development

Changelog

1.2.4 (11-12-18)

Fixes:

  • Effective count matrix parallel evaluation fixes. #116, #117

1.2.3 (7-27-18)

New features

  • Effective count matrix can be evaluated in parallel. #112

1.2.2 (6-25-18)

New features

  • Added new transition matrix estimator which uses a primal-dual interior-point iteration scheme. #82

Fixes:

  • Fixed a corner case, in which the pathway decomposition could fail (no more paths left). #107

1.2.1 (5-16-17)

New features

  • Added fast reversible transition matrix estimation. #94

Fixes:

  • Fixed some minor issues in rate matrix estimation. #97 #98

1.2 (10-24-16)

New features:

  • Continuous MSM (rate matrix) estimation

1.1.4 (9-23-16)

Fixes:

  • Fixed sparsity pattern check in transition matrix sampler. #91, thanks @fabian-paul

1.1.3 (8-10-16)

New features:

  • added documentation

Developer’s Guide

Contributing

Basic Idea

We use the “devel” branch to develop msmtools. When “devel” has reached a mature state in terms of current functionality and stability, we merge “devel” into the master branch. This happens at the time when a release is made.

In order to develop certain features, you should not work on “devel” directly either, but rather branch off to a new personal feature branch (here you can do whatever you want). Then, after testing your feature, you can offer the changes to be merged into the devel branch by making a pull request (see below). If accepted, that branch will be merged into devel, and unless overridden by other changes your feature will eventually make it to master and the next release.

Why
  • Always have a tested and stable master branch.
  • Avoid interfering with other developers until changes are merged.
How

One of the package maintainers merges the development branch(es) periodically. All you need to do is make your changes in your feature branch (see below for details) and then offer a pull request. When you do so, a suite of automatic code tests is run to check for direct or indirect bugs introduced by the change. This is done by continuous integration (CI) software such as Jenkins (http://jenkins-ci.org) or Travis-CI (http://travis-ci.org); the first is open source, the second is free for open-source projects. Again, you do not have to do anything here, as this happens automatically after a pull request. You will see the output of these tests on the pull request page on GitHub.

Commit messages

Use commit messages in the style “[$package]: change” whenever the changes belong to one package or module. You can omit “pyemma” (that is trivial) and “api” (which does not show up in the import).

E.g.:

[msm.analysis]: implemented sparse pcca

That way other developers and the package maintainers immediately know which modules have been changed and can watch out for possible cross-effects. It also makes commits look uniform.

If you have a complex commit affecting several modules or packages, break it down into little pieces with easily understandable commit messages. This allows us to go back to intermediate stages of your work if something fails at the end.

Testing

We use Python’s unittest module to write test cases for all algorithms.

To run all tests invoke:

python setup.py test

or invoke nosetests directly in the working copy:

nosetests $PYEMMA_DIR

You are encouraged to run all tests if you are changing core features; you can also run individual tests by invoking them directly with the Python interpreter.

Documentation

Every function, class, and module that you write must be documented. Please check out the docstring guide on how to do this. In short, after every function (class, module) header there should be a docstring enclosed by """ ... """, containing a short and a long description of what the function does, a clear description of the input parameters and the return value, and any relevant cross-references or citations. You can include LaTeX-style math in the docstring.
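
For example, a docstring in this style might look as follows (the function is hypothetical and for illustration only):

    def transition_count(dtraj, lag):
        r"""Count transitions in a discrete trajectory.

        Counts all transitions between pairs of states that are lag
        steps apart in the given discrete trajectory.

        Parameters
        ----------
        dtraj : array_like of int
            Discrete (micro)state trajectory.
        lag : int
            Lag time in trajectory steps.

        Returns
        -------
        counts : dict
            Mapping from state pairs (i, j) to the number of observed
            transitions, :math:`i \rightarrow j`.
        """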

For a deeper understanding and reference, please have a look at the Sphinx documentation.

In particular, the API functions (those publicly visible to the external user) should be well documented.

To build the documentation you need the dependencies from the file requirements-build-docs.txt which you can install via pip:

pip install -r requirements-build-docs.txt

afterwards you are ready to build the documentation:

cd doc
make html

The HTML docs can then be found in the doc/build/html directory.

Workflow

A developer creates a feature branch “feature” and commits their work to this branch. When they are done with their work (and have written at least a working test case for it), they push the feature branch to their fork and create a pull request. The pull request can then be reviewed and merged upstream.

  1. Get up to date - pull the latest changes from devel:

    # first get the latest changes
    git pull

  2. Compile the extension modules (also works with conda distributions):

    python setup.py develop

    In contrast to install, which copies the development version into your package directory, the develop flag simply puts a link from your package directory to your development directory. That way, local changes to Python files are immediately active when you import the package. You only need to re-execute the above command when a C extension has changed.

  3. Create a new feature branch by copying from the devel branch and switch to it:

    # switch to the development branch
    git checkout devel
    # create a new branch and switch to it
    git checkout -b feature

  4. Work on your feature branch. Here you can roam freely.

  5. Write a unit test and TEST IT (see above)! :-)

    touch fancy_feat_test.py
    # test the unit
    python fancy_feat_test.py
    # run the whole test-suite
    # (to ensure that your new feature has no side-effects)
    cd $PYEMMA_DIR
    python setup.py test

  6. Commit your changes:

    git commit fancy_feat.py fancy_feat_test.py \
        -m "Implementation and unit test for fancy feature"

    Repeat steps 4-6 as often as necessary to accomplish your task. Remember to split your changes into small commits.

  7. Make your changes available by pushing your commits to the server and creating a pull request:

    # push your branch to your fork on github
    git push myfork feature

    On GitHub, create a pull request from myfork/feature to origin/devel; see https://help.github.com/articles/using-pull-requests

Conclusions
  • Feature branches allow you to work without interfering with others.
  • The devel branch contains all tested implemented features.
  • The devel branch is used to test for cross-effects between features.
  • Work with pull request to ensure your changes are being tested automatically and can be reviewed.
  • The master branch contains all tested features and represents the set of features that are suitable for public usage.

Publish a new release

  1. Merge the current devel branch into master:

    git checkout master; git merge devel

  2. Make a new tag 'vmajor.minor.patch'. Major means a major release (major new functionality), minor means minor changes and new functionality, and patch means no new functionality, just bugfixes or improvements to the docs.

    git tag -m "release description" v1.1

  3. IMPORTANT: first push, then push --tags:

    git push; git push --tags

  4. Update the conda recipes and perform the binstar pushing (partially automated).
