Welcome to the documentation for hsmmlearn!

hsmmlearn is a library for unsupervised learning of hidden semi-Markov models with explicit durations. It is a port of the hsmm package for R written by Jan and Ingo Bulla. In fact, the C++ code that powers hsmmlearn comes verbatim from hsmm.

hsmmlearn borrows its name and the design of its Python API from hmmlearn.

Tutorial

In this tutorial we’ll explore the two main pieces of functionality that hsmmlearn provides:

  • Viterbi decoding: given a sequence of observations, find the sequence of hidden states that maximizes the joint probability of the observations and the states.
  • Baum-Welch model inference: given a sequence of observations, find the model parameters (the transition and emission matrices, and the duration distributions) that maximize the probability of the observations given the model.

If you are reading this as part of the Sphinx documentation, there is an IPython 4 notebook with the contents of this tutorial in the notebooks/ folder at the root of the repository.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Decoding

For this part of the tutorial we’ll use an HSMM with 3 internal states and 4 possible durations, 1, ..., 4. We’ll stick to Gaussian emissions throughout (i.e. the probability density of the observations given the state is a 1D Gaussian with a fixed mean and standard deviation), but discrete observations (or observations modeled by another class of PDF) can be slotted in just as easily.

The transition matrix is chosen so that there are no self-transitions (all the diagonal elements are 0). This forces the system to go to a different state at the end of each duration. This is not strictly necessary, but allows us to visualize the effect of the different durations a little better.

The duration matrix is chosen so that for state 1, there is a 90% chance of duration 4 and a 10% chance of duration 1. For states 2 and 3, there is a 90% probability of duration 3 and duration 2, respectively (with the remaining 10% on duration 1). As the duration lengths don’t differ much, the effect won’t be very visible here, but for sparse duration distributions with many durations it does make a difference.

For the emission PDFs, we choose clearly separated Gaussians, with means 0, 5, and 10, and standard deviation equal to 1 in each case.

from hsmmlearn.hsmm import GaussianHSMM

durations = np.array([
    [0.1, 0.0, 0.0, 0.9],
    [0.1, 0.0, 0.9, 0.0],
    [0.1, 0.9, 0.0, 0.0]
])
tmat = np.array([
    [0.0, 0.5, 0.5],
    [0.3, 0.0, 0.7],
    [0.6, 0.4, 0.0]
])

means = np.array([0.0, 5.0, 10.0])
scales = np.ones_like(means)

hsmm = GaussianHSMM(
    means, scales, durations, tmat,
)

Having initialized the Gaussian HSMM, let’s sample from it to obtain a sequence of internal states and observations:

observations, states = hsmm.sample(300)
print(states[:20])
[2 2 1 1 1 2 2 0 0 0 0 2 2 1 1 1 2 2 1 1]
print(observations[:20])
[ 10.52314041   9.86235128   4.07406132   5.44132853   5.76402099
  11.43504019  10.13734856  -0.93207824   0.75532059  -1.29944693
  -0.47906227   9.9658948   10.36830046   3.4782257    4.58492773
   5.10710389   8.49856484  11.92186325   5.41462254   3.9717193 ]
fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(means[states], 'r', linewidth=2, alpha=.8)
ax.plot(observations)
[<matplotlib.lines.Line2D at 0x1131efdd0>]
[Figure: observations (blue) and per-state means (red) for the sampled sequence]

In the figure, the red line is the mean of the PDF that was selected for that internal state, and the blue line shows the observations. As is expected, the observations are clustered around the mean for each state.

Assume now that we only have the observations, and we want to reconstruct, or decode, the most likely internal states for those observations. This can be done by means of the classical Viterbi algorithm, which has an extension for HSMMs.

decoded_states = hsmm.decode(observations)

Given that our Gaussian peaks are so clearly separated, the Viterbi decoder manages to reconstruct the entire state sequence correctly. No surprise, and not very exciting.

np.sum(states != decoded_states)  # Number of differences between the original and the decoded states
0

Things become a little more interesting when the peaks are not so well separated. In the example below, we move the mean of the second Gaussian from 5.0 up to 8.0, and then we sample and decode again. We also set the duration distribution to something more uniform, just to make things harder on the decoder (it turns out that otherwise the decoder is able to infer the internal state sequence almost completely on the basis of the inferred durations alone).

new_means = np.array([0.0, 8.0, 10.0])

hsmm.durations = np.full((3, 4), 0.25)
hsmm.means = new_means

observations, states = hsmm.sample(200)
decoded_states = hsmm.decode(observations)
np.sum(states != decoded_states)
10
fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(new_means[states], 'r', linewidth=2, alpha=.8)
ax.plot(new_means[decoded_states] - 0.5, 'k', linewidth=2, alpha=.5)
ax.plot(observations)
[<matplotlib.lines.Line2D at 0x113d12d90>]
[Figure: original state means (red), decoded state means offset by 0.5 (gray), and observations (blue)]

In the figure, the red line again shows the means of the original sequence of internal states, while the gray line (offset downward by 0.5 to avoid overlapping with the rest of the plot) shows the reconstructed sequence. They track each other pretty faithfully, except in areas where the observations don’t give much information about the internal state.
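If you want to pinpoint where the decoder goes wrong, a quick check such as the following (a minimal sketch, using only numpy) lists the time steps where the decoded state differs from the true one:

# Time steps where the decoded state differs from the sampled state.
mismatches = np.where(states != decoded_states)[0]
print(mismatches)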

Aside: different emission PDFs

In the previous example, we worked throughout with an instance of the class GaussianHSMM. This is a convenience wrapper around a more general class hsmmlearn.hsmm.HSMMModel, which allows for more general emission PDFs via Python descriptors. To see how this works, let’s first re-instantiate our Gaussian HSMM via HSMMModel:

from hsmmlearn.hsmm import HSMMModel
from hsmmlearn.emissions import GaussianEmissions

gaussian_hsmm = HSMMModel(
    GaussianEmissions(means, scales), durations, tmat
)

This class can be used in much the same way as the more practical GaussianHSMM. However, it lacks some convenience attributes (.means, .scales, ...) that GaussianHSMM does have.

To create an HSMM with a different class of emission PDFs, it suffices to derive from hsmmlearn.emissions.AbstractEmissions and supply the required functionality there. This is an abstract base class with a couple of abstract methods that need to be overridden in the concrete class. If you require only some of the functionality, you can implement the methods that you don’t need as empty functions. To see this in practice, let’s create an HSMM with Laplace emission PDFs. Below, we override sample_for_state, needed to sample from the HSMM, and likelihood, needed to run the Viterbi decoder (not demonstrated here).

from scipy.stats import laplace

from hsmmlearn.emissions import AbstractEmissions

# Note: this is almost identical to the GaussianEmissions class,
# the only difference being that we replaced the Gaussian RV (norm)
# with a Laplacian RV (laplace).
class LaplaceEmissions(AbstractEmissions):

    # Note: this property is a hack, and will become unnecessary soon!
    dtype = np.float64

    def __init__(self, means, scales):
        self.means = means
        self.scales = scales

    def likelihood(self, obs):
        obs = np.squeeze(obs)
        # TODO: build in some check for the shape of the likelihoods, otherwise
        # this will silently fail and give the wrong results.
        return laplace.pdf(obs,
                           loc=self.means[:, np.newaxis],
                           scale=self.scales[:, np.newaxis])

    def sample_for_state(self, state, size=None):
        return laplace.rvs(self.means[state], self.scales[state], size)
laplace_hsmm = HSMMModel(
    LaplaceEmissions(means, scales), durations, tmat
)
observations, states = laplace_hsmm.sample(10000)

Let’s check that this indeed defines an HSMM with Laplacian output PDFs:

state0_mask = states == 0
observations_state0 = observations[state0_mask]

fig, ax = plt.subplots()
_ = ax.hist(observations_state0, bins=40, density=True)
[Figure: normalized histogram of the observations emitted from state 0]

Looks indeed like a Laplacian distribution!
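For a more quantitative check, we can fit a Laplace distribution to the state-0 observations with scipy.stats.laplace.fit and compare the estimated location and scale against the true parameters (0.0 and 1.0):

loc_hat, scale_hat = laplace.fit(observations_state0)
print(loc_hat, scale_hat)  # should be close to 0.0 and 1.0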

Model inference

In the next section, we’ll tackle the “third question” outlined by Rabiner: given a sequence of observed data, what is the “most likely” model that could have given rise to these observations? To answer it, we’ll employ an iterative procedure known as the expectation-maximization (EM) algorithm, which adjusts the transition, emission, and duration parameters to find a locally optimal model.

We start by sampling from a Gaussian HSMM that has three states, with very clearly separated durations.

from hsmmlearn.hsmm import GaussianHSMM

durations = np.full((3, 10), 0.05)
durations[0, 1] = durations[1, 5] = durations[2, 9] = 0.55

tmat = np.array([
    [0.0, 0.5, 0.5],
    [0.3, 0.0, 0.7],
    [0.6, 0.4, 0.0]
])

means = np.array([0.0, 5.0, 10.0])
scales = np.ones_like(means)

hsmm = GaussianHSMM(
    means, scales, durations, tmat,
)
hsmm.durations
array([[ 0.05,  0.55,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,
         0.05],
       [ 0.05,  0.05,  0.05,  0.05,  0.05,  0.55,  0.05,  0.05,  0.05,
         0.05],
       [ 0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,
         0.55]])
observations, states = hsmm.sample(200)
fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(means[states], 'r', linewidth=2, alpha=.8)
ax.plot(observations)
[<matplotlib.lines.Line2D at 0x113f3bf10>]
[Figure: observations (blue) and per-state means (red) for the sampled sequence]

The plot shows clearly that the durations are separated: state 0, with mean 0.0, has very short durations, while states 1 and 2 have much longer durations.

Having sampled from the HSMM, we’ll now forget our duration distribution and set up a new HSMM with a flat duration distribution.

equal_prob_durations = np.full((3, 10), 0.1)
new_hsmm = GaussianHSMM(
    means, scales, equal_prob_durations, tmat,
)
equal_prob_durations
array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1],
       [ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1],
       [ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]])

Using the .fit method, we’ll run the expectation-maximization algorithm on our new duration-agnostic HSMM. This will adjust the parameters of the HSMM in place, to best match the given observations.

# Fit returns a bool to indicate whether the EM algorithm converged,
# and the log-likelihood after the adjustments are made.
new_hsmm.fit(observations)
(True, -352.16412850351435)

If we examine the adjusted duration distribution, we see that it reproduces the original distribution to some extent, with very pronounced probabilities for duration 2 in state 0, duration 6 in state 1, and duration 10 in state 2.

np.set_printoptions(precision=1)
print(new_hsmm.durations)
np.set_printoptions()
[[  7.1e-35   5.4e-01   1.2e-31   1.5e-01   4.4e-39   2.1e-01   7.2e-08
    1.0e-01   5.9e-08   5.9e-08]
 [  1.0e-01   5.5e-49   3.1e-49   3.2e-51   1.0e-01   5.0e-01   1.2e-18
    1.0e-01   1.0e-01   1.0e-01]
 [  3.7e-44   9.0e-56   5.8e-41   2.5e-01   8.4e-02   8.3e-02   8.3e-02
    3.8e-57   8.3e-02   4.2e-01]]
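As a quick sanity check, we can read off the most probable duration for each state with argmax, remembering that column j corresponds to duration j + 1:

print(new_hsmm.durations.argmax(axis=1) + 1)  # expected: [ 2  6 10]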

In tandem with the duration distributions, the other parameters have also changed: the transition probabilities have shifted somewhat, and the locations and scales of the emission PDFs have moved slightly.

np.set_printoptions(precision=1)
print('New transition matrix:')
print(new_hsmm.tmat)
print('New emission parameters:')
print('Means:', new_hsmm.means)
print('Scales:', new_hsmm.scales)
np.set_printoptions()
New transition matrix:
[[ 0.   0.4  0.6]
 [ 0.5  0.   0.5]
 [ 0.6  0.4  0. ]]
New emission parameters:
Means: [ -0.2   5.1  10.1]
Scales: [ 1.2  0.9  0.9]

API reference

hsmm module

class hsmmlearn.hsmm.HSMMModel(emissions, durations, tmat, startprob=None, support_cutoff=100)

HSMM model base class.

This class provides the basic functionality to work with hidden semi-Markov models:

  1. Sampling from an HSMM via the sample() method;
  2. Fitting the parameters of an HSMM to a sequence of observations via the fit() method.

These methods are agnostic as to what HSMM is being used: the specifics of the model are all encapsulated in the properties tmat, emissions, and durations, which control the transition matrix, the emission distribution, and the duration distribution. In particular, the emission distribution can be discrete or continuous.

__init__(emissions, durations, tmat, startprob=None, support_cutoff=100)

Create a new HSMM instance.

Parameters:
  • emissions (hsmmlearn.emissions.AbstractEmissions) – Emissions distribution to use.
  • durations (numpy.ndarray or list of random variables.) – Durations matrix with shape (n_states, n_durations). If a list of random variables is passed in, all RVs must be subclasses of scipy.stats.rv_discrete. In this case, the duration probabilities are obtained from the support of the RVs, from 1 to support_cutoff.
  • tmat (numpy.ndarray, shape=(n_states, n_states)) – Transition matrix.
  • startprob (numpy.ndarray, shape=(n_states, )) – Initial probabilities for the Markov chain. If this is None, the uniform distribution is assumed (all states are equally likely).
  • support_cutoff (int) – Maximal duration to take into account. This is used when passing in a list of random variables for the durations, which will then be sampled from 1 to support_cutoff.
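As an illustration, here is a minimal sketch that constructs a two-state model with an explicit initial distribution (all parameter values are made up for the example):

import numpy as np

from hsmmlearn.emissions import GaussianEmissions
from hsmmlearn.hsmm import HSMMModel

durations = np.array([[0.5, 0.5],
                      [0.1, 0.9]])
tmat = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
startprob = np.array([1.0, 0.0])  # always start in state 0

model = HSMMModel(
    GaussianEmissions(np.array([0.0, 5.0]), np.array([1.0, 1.0])),
    durations, tmat, startprob=startprob,
)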
decode(obs)

Find most likely internal states for a sequence of observations.

Given a sequence of observations, this method runs the Viterbi algorithm to find the most likely sequence of corresponding internal states. “Most likely” should be interpreted here (as in the classical Viterbi algorithm) as the sequence of states that maximizes the joint probability P(observations, states).

Parameters:obs (numpy.ndarray, shape=(n_obs, )) – Observations.
Returns:states – Reconstructed internal states.
Return type:numpy.ndarray, shape=(n_obs, )
fit(obs, max_iter=20, atol=1e-05, censoring=True)

Fit the parameters of an HSMM to a given sequence of observations.

This method runs the expectation-maximization algorithm to adjust the parameters of the HSMM to fit a given sequence of observations as well as possible. The HSMM will be updated in place, unless an error occurs, in which case the original HSMM is restored.

Parameters:
  • obs (numpy.ndarray, shape=(n_samples, )) – Sequence of observations.
  • max_iter (int) – Maximum number of EM steps to do before terminating.
  • atol (float) – Absolute tolerance to decide whether the EM algorithm has converged.
  • censoring (bool) – Whether to apply right-censoring.
Raises:NoConvergenceError – If the algorithm terminated abnormally before converging.
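A hedged sketch of how you might guard against this; the import location of NoConvergenceError is an assumption here, so adjust it to wherever the exception lives in your version of hsmmlearn:

from hsmmlearn.hsmm import NoConvergenceError  # assumed import path

try:
    converged, log_likelihood = model.fit(observations)
except NoConvergenceError:
    # The original model is restored on error, so it is safe to
    # retry with a looser tolerance or more iterations.
    converged, log_likelihood = model.fit(
        observations, max_iter=100, atol=1e-4)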

n_durations

The number of durations for this HSMM.

n_states

The number of hidden states for this HSMM.

sample(n_samples=1)

Generate a random sample from the HSMM.

Parameters:n_samples (int) – Number of samples to generate.
Returns:observations, states – Random sample of observations and internal states.
Return type:numpy.ndarray, shape=(n_samples, )

emissions module

class hsmmlearn.emissions.AbstractEmissions

Base class for emissions distributions.

To create an HSMM with a custom emission distribution, write a derived class that implements some (or all) of the abstract methods. If you don’t need all of the HSMM functionality, you can get by with implementing only some of the methods.
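Based on which HSMMModel methods call which emissions methods (documented below), a minimal skeleton looks as follows; the dtype class attribute mirrors the hack used in the tutorial’s LaplaceEmissions, and you only need to fill in the bodies that your use case requires:

import numpy as np

from hsmmlearn.emissions import AbstractEmissions


class MyEmissions(AbstractEmissions):

    dtype = np.float64

    def likelihood(self, obs):
        # Needed for decode() and fit().
        raise NotImplementedError

    def sample_for_state(self, state, size=None):
        # Needed for sample().
        raise NotImplementedError

    def reestimate(self, gamma, obs):
        # Needed for fit().
        pass

    def copy(self):
        # Needed for fit().
        pass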

copy()

Make a copy of this object.

This method is called by hsmmlearn.hsmm.HSMMModel.fit() to make a copy of the emissions object before modifying it.

likelihood(obs)

Compute the likelihood of a sequence of observations.

This method is called by hsmmlearn.hsmm.HSMMModel.fit() and hsmmlearn.hsmm.HSMMModel.decode().

Parameters:obs (numpy.ndarray, shape=(n_obs, )) – Sequence of observations.
Returns:likelihoods – Likelihood of each observation under each hidden state.
Return type:numpy.ndarray, shape=(n_states, n_obs)
reestimate(gamma, obs)

Estimate the distribution parameters given sequences of smoothed probabilities and observations.

The parameter gamma is an array of smoothed probabilities, with the entry gamma[s, i] giving the probability of finding the system in state s given all of the observations up to index i:

\[\gamma_{s, i} = P(s | o_1, \ldots, o_i ).\]

This method is called by hsmmlearn.hsmm.HSMMModel.fit().

Parameters:
  • gamma (numpy.ndarray, shape=(n_states, n_obs)) – Smoothed probabilities.
  • obs (numpy.ndarray, shape=(n_obs, )) – Observations.
sample_for_state(state, size=None)

Return a random emission given a state.

This method is called by hsmmlearn.hsmm.HSMMModel.sample().

Parameters:
  • state (int) – The internal state.
  • size (int) – The number of random samples to generate.
Returns:observations – Random emissions.
Return type:numpy.ndarray, shape=(size, )

class hsmmlearn.emissions.GaussianEmissions(means, scales)

An emissions class for Gaussian emissions.

This emissions class models the case where emissions are real-valued and continuous, and the probability of observing an emission given the state is modeled by a Gaussian. The means and standard deviations for each Gaussian (one for each state) are stored as state on the class.

class hsmmlearn.emissions.MultinomialEmissions(probabilities)

An emissions class for multinomial emissions.

This emissions class models the case where the emissions are categorical variables, assuming values from 0 to some value k, and the probability of observing an emission given a state is modeled by a multinomial distribution.
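As a sketch of how this might be used, assuming (this layout is not spelled out above) that probabilities has one row per state and one column per symbol, with rows summing to 1:

import numpy as np

from hsmmlearn.emissions import MultinomialEmissions
from hsmmlearn.hsmm import HSMMModel

# Assumed layout: probabilities[state, symbol].
probabilities = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
])
durations = np.array([[0.5, 0.5],
                      [0.5, 0.5]])
tmat = np.array([[0.0, 1.0],
                 [1.0, 0.0]])

model = HSMMModel(MultinomialEmissions(probabilities), durations, tmat)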

properties module

class hsmmlearn.properties.Durations

Data descriptor for a durations distribution.

__get__(obj, type=None)

Return the durations distribution.

Returns:durations – Durations matrix.
Return type:numpy.ndarray, shape=(n_states, n_durations)
__set__(obj, durations)

Update the durations distribution with new durations.

Parameters:
  • obj (hsmmlearn.hsmm.HSMMModel) – The underlying HSMMModel.
  • durations (numpy.ndarray or list of random variables.) – This can be either a numpy array, in which case the number of rows must be equal to the number of hidden states, or a list of scipy.stats discrete random variables. In the latter case, the duration probabilities are obtained directly from the random variables.
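Both forms of assignment, as a sketch for a 3-state model (the array form appears in the tutorial above; the random-variable form follows the description here and should be treated as an assumption):

import numpy as np
from scipy.stats import poisson

# Array form: one row of duration probabilities per hidden state.
hsmm.durations = np.full((3, 4), 0.25)

# Random-variable form: one scipy.stats discrete RV per hidden state;
# the probabilities are read off from the RVs' support, from 1 up to
# support_cutoff.
hsmm.durations = [poisson(2.0), poisson(4.0), poisson(6.0)]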
class hsmmlearn.properties.Emissions

Data descriptor for an emissions distribution.

__get__(obj, type=None)

Return the emissions distribution.

Returns:emissions – The emissions distribution.
Return type:hsmmlearn.emissions.AbstractEmissions
__set__(obj, emissions)

Set the emissions distribution.

Parameters:
  • obj (hsmmlearn.hsmm.HSMMModel) – The underlying HSMMModel.
  • emissions (hsmmlearn.emissions.AbstractEmissions) – The new emissions distribution.
class hsmmlearn.properties.TransitionMatrix

Data descriptor for a transitions matrix.

__get__(obj, type=None)

Return the transition matrix.

Returns:tmat – A transition matrix.
Return type:numpy.ndarray, shape=(n_states, n_states)
__set__(obj, value)

Set a new transition matrix.

Parameters:
  • obj (hsmmlearn.hsmm.HSMMModel) – The underlying HSMMModel.
  • value (numpy.ndarray, shape=(n_states, n_states)) – The new transition matrix. This must be a square matrix. If a transition matrix was previously assigned to the HSMM, the new transition matrix must have the same number of rows.
