Welcome to the documentation for hsmmlearn!¶
hsmmlearn is a library for unsupervised learning of hidden semi-Markov models with explicit durations. It is a port of the hsmm package for R written by Jan and Ingo Bulla. In fact, the C++ code that powers hsmmlearn comes verbatim from hsmm. hsmmlearn borrows its name and the design of its Python API from hmmlearn.
Tutorial¶
In this tutorial we’ll explore the two main pieces of functionality that hsmmlearn provides:
- Viterbi decoding: given a sequence of observations, find the sequence of hidden states that maximizes the joint probability of the observations and the states.
- Baum-Welch model inference: given a sequence of observations, find the model parameters (the transition and emission matrices, and the duration distributions) that maximize the probability of the observations given the model.
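In symbols, with observations \(o_{1:T}\), hidden states \(s_{1:T}\), and model parameters \(\theta\) (a sketch of the notation, not code from the library): Viterbi decoding computes

\[\hat{s}_{1:T} = \mathop{\mathrm{argmax}}_{s_{1:T}} P(o_{1:T}, s_{1:T} \mid \theta),\]

while Baum-Welch inference searches for the parameters \(\hat{\theta}\) that maximize \(P(o_{1:T} \mid \theta)\).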
In case you are reading this as part of the Sphinx documentation, there is an IPython 4 notebook with the contents of this tutorial in the notebooks/ folder at the root of the repository.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Decoding¶
For this part of the tutorial we’ll use an HSMM with 3 internal states and 4 possible durations, 1 through 4. We’ll stick to Gaussian emissions throughout (i.e. the probability density of the observations given the state is a 1D Gaussian with a fixed mean and standard deviation), but discrete observations (or observations modeled by another class of PDF) can be slotted in just as easily.
The transition matrix is chosen so that there are no self-transitions (all the diagonal elements are 0). This forces the system to go to a different state at the end of each duration. This is not strictly necessary, but allows us to visualize the effect of the different durations a little better.
The duration matrix is chosen so that state 1 has a 90% chance of duration 4 and a 10% chance of duration 1. States 2 and 3 have a 90% probability of duration 3 and duration 2, respectively, again with a 10% chance of duration 1. As the duration lengths don’t differ much here, the effect won’t be very visible, but for sparse duration distributions with many durations it does make a difference.
For the emission PDFs, we choose clearly separated Gaussians, with means 0, 5, and 10, and standard deviation equal to 1 in each case.
from hsmmlearn.hsmm import GaussianHSMM

durations = np.array([
    [0.1, 0.0, 0.0, 0.9],
    [0.1, 0.0, 0.9, 0.0],
    [0.1, 0.9, 0.0, 0.0]
])
tmat = np.array([
    [0.0, 0.5, 0.5],
    [0.3, 0.0, 0.7],
    [0.6, 0.4, 0.0]
])

means = np.array([0.0, 5.0, 10.0])
scales = np.ones_like(means)

hsmm = GaussianHSMM(
    means, scales, durations, tmat,
)
Having initialized the Gaussian HSMM, let’s sample from it to obtain a sequence of internal states and observations:
observations, states = hsmm.sample(300)
print(states[:20])
[2 2 1 1 1 2 2 0 0 0 0 2 2 1 1 1 2 2 1 1]
print(observations[:20])
[ 10.52314041 9.86235128 4.07406132 5.44132853 5.76402099
11.43504019 10.13734856 -0.93207824 0.75532059 -1.29944693
-0.47906227 9.9658948 10.36830046 3.4782257 4.58492773
5.10710389 8.49856484 11.92186325 5.41462254 3.9717193 ]
fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(means[states], 'r', linewidth=2, alpha=.8)
ax.plot(observations)
[<matplotlib.lines.Line2D at 0x1131efdd0>]

In the figure, the red line is the mean of the PDF that was selected for that internal state, and the blue line shows the observations. As is expected, the observations are clustered around the mean for each state.
Assume now that we have only the observations, and we want to reconstruct, or decode, the most likely internal states for those observations. This can be done by means of the classical Viterbi algorithm, which has an extension for HSMMs.
decoded_states = hsmm.decode(observations)
Given that our Gaussian peaks are so clearly separated, the Viterbi decoder manages to reconstruct the entire state sequence correctly. No surprise, and not very exciting.
np.sum(states != decoded_states) # Number of differences between the original and the decoded states
0
Things become a little more interesting when the peaks are not so well separated. In the example below, we move the mean of the second Gaussian from 5.0 up to 8.0, and then sample and decode again. We also set the duration distribution to something a little more uniform, just to make things harder on the decoder (otherwise, it turns out, the decoder can infer the internal state sequence almost completely from the inferred durations alone).
new_means = np.array([0.0, 8.0, 10.0])
hsmm.durations = np.full((3, 4), 0.25)
hsmm.means = new_means
observations, states = hsmm.sample(200)
decoded_states = hsmm.decode(observations)
np.sum(states != decoded_states)
10
fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(new_means[states], 'r', linewidth=2, alpha=.8)
ax.plot(new_means[decoded_states] - 0.5, 'k', linewidth=2, alpha=.5)
ax.plot(observations)
[<matplotlib.lines.Line2D at 0x113d12d90>]

In the figure, the red line again shows the mean of the original sequence of internal states, while the gray line (offset by 0.5 to avoid overlapping with the rest of the plot) shows the reconstructed sequence. They track each other pretty faithfully, except in areas where the observations don’t give much information about the internal state.
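To pinpoint where the decoder goes wrong, we can compare the two state sequences directly:

# Indices at which the decoded state differs from the true one; as the plot
# suggests, these cluster where the observations are ambiguous.
error_positions = np.flatnonzero(states != decoded_states)
print(error_positions)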
Aside: different emission PDFs¶
In the previous example, we worked throughout with an instance of the class GaussianHSMM. This is a convenience wrapper around a more general class, hsmmlearn.hsmm.HSMMModel, which allows for more general emission PDFs via Python descriptors. To see how this works, let’s first re-instantiate our Gaussian HSMM via HSMMModel:
from hsmmlearn.hsmm import HSMMModel
from hsmmlearn.emissions import GaussianEmissions
gaussian_hsmm = HSMMModel(
    GaussianEmissions(means, scales), durations, tmat
)
This class can be used in much the same way as the more practical GaussianHSMM. However, it lacks some convenience attributes (.means, .scales, ...) that GaussianHSMM does have.
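For instance, sampling and decoding work exactly as before (a quick check, using fresh variable names so the tutorial’s data stays intact):

# HSMMModel exposes the same sample/decode interface as GaussianHSMM.
obs_check, states_check = gaussian_hsmm.sample(100)
decoded_check = gaussian_hsmm.decode(obs_check)
print(np.sum(states_check != decoded_check))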
To create an HSMM with a different class of emission PDFs, it suffices to derive from hsmmlearn.emissions.AbstractEmissions and supply the required functionality there. This is an abstract base class with a couple of abstract methods that need to be overridden in the concrete class. If you require only some of the functionality, you can override the methods that you don’t need with an empty function. To see this in practice, let’s create an HSMM with Laplace emission PDFs. Below, we override sample_for_state, because we want to sample from the HSMM, and likelihood, which is needed to run Viterbi (not demonstrated).
from scipy.stats import laplace
from hsmmlearn.emissions import AbstractEmissions

# Note: this is almost identical to the GaussianEmissions class,
# the only difference being that we replaced the Gaussian RV (norm)
# with a Laplacian RV (laplace).
class LaplaceEmissions(AbstractEmissions):

    # Note: this property is a hack, and will become unnecessary soon!
    dtype = np.float64

    def __init__(self, means, scales):
        self.means = means
        self.scales = scales

    def likelihood(self, obs):
        obs = np.squeeze(obs)
        # TODO: build in some check for the shape of the likelihoods,
        # otherwise this will silently fail and give the wrong results.
        return laplace.pdf(obs,
                           loc=self.means[:, np.newaxis],
                           scale=self.scales[:, np.newaxis])

    def sample_for_state(self, state, size=None):
        return laplace.rvs(self.means[state], self.scales[state], size)

laplace_hsmm = HSMMModel(
    LaplaceEmissions(means, scales), durations, tmat
)

observations, states = laplace_hsmm.sample(10000)
observations, states = laplace_hsmm.sample(10000)
Let’s check that this indeed defines an HSMM with Laplacian output PDFs:
state0_mask = states == 0
observations_state0 = observations[state0_mask]
fig, ax = plt.subplots()
_ = ax.hist(observations_state0, bins=40, density=True)

Looks indeed like a Laplacian distribution!
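As a further check, we can overlay the theoretical Laplace density for state 0 on the histogram (a small sketch, reusing the laplace import and the means/scales arrays from above):

# Compare the empirical histogram against the Laplace density with
# location means[0] and scale scales[0].
x = np.linspace(observations_state0.min(), observations_state0.max(), 200)
fig, ax = plt.subplots()
ax.hist(observations_state0, bins=40, density=True)
ax.plot(x, laplace.pdf(x, loc=means[0], scale=scales[0]), 'r', linewidth=2)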
Model inference¶
In this section, we’ll tackle the “third question” outlined by Rabiner: given a sequence of observed data, what is the “most likely” model that could have given rise to these observations? To achieve this, we’ll employ an iterative procedure known as the expectation-maximization (EM) algorithm, which adjusts the transition, emission, and duration parameters to find a locally optimal model.
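Written out (a standard statement of the EM update, not specific to hsmmlearn), each iteration improves the current estimate \(\theta^{(k)}\) via

\[\theta^{(k+1)} = \mathop{\mathrm{argmax}}_{\theta} \, \mathbb{E}_{s \sim P(s \mid o, \theta^{(k)})}\left[\log P(o, s \mid \theta)\right],\]

which converges to a local maximum of \(P(o \mid \theta)\).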
We start by sampling from a Gaussian HSMM that has three states, with very clearly separated durations.
from hsmmlearn.hsmm import GaussianHSMM
durations = np.zeros((3, 10))
durations[:, :] = 0.05
durations[0, 1] = durations[1, 5] = durations[2, 9] = 0.55
tmat = np.array([
    [0.0, 0.5, 0.5],
    [0.3, 0.0, 0.7],
    [0.6, 0.4, 0.0]
])

means = np.array([0.0, 5.0, 10.0])
scales = np.ones_like(means)

hsmm = GaussianHSMM(
    means, scales, durations, tmat,
)
hsmm.durations
array([[ 0.05,  0.55,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05],
       [ 0.05,  0.05,  0.05,  0.05,  0.05,  0.55,  0.05,  0.05,  0.05,  0.05],
       [ 0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.05,  0.55]])
observations, states = hsmm.sample(200)
fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(means[states], 'r', linewidth=2, alpha=.8)
ax.plot(observations)
[<matplotlib.lines.Line2D at 0x113f3bf10>]

The plot shows clearly that the durations are separated: state 0, with mean 0.0, has very short durations, while states 1 and 2 have much longer durations.
Having sampled from the HSMM, we’ll now forget our duration distribution and set up a new HSMM with a flat duration distribution.
equal_prob_durations = np.full((3, 10), 0.1)

new_hsmm = GaussianHSMM(
    means, scales, equal_prob_durations, tmat,
)
equal_prob_durations
array([[ 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
[ 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
[ 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]])
Using the .fit method, we’ll run the expectation-maximization algorithm on our new duration-agnostic HSMM. This will adjust the parameters of the HSMM in place, to best match the given observations.
# Fit returns a bool to indicate whether the EM algorithm converged,
# and the log-likelihood after the adjustments are made.
new_hsmm.fit(observations)
(True, -352.16412850351435)
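For finer control, the same call could have been made with explicit keyword arguments and the return value unpacked (a sketch; running it at this point would continue EM from the already-fitted parameters, see the API reference below):

# max_iter and atol tune the EM stopping criterion; fit returns the
# convergence flag and the final log-likelihood, as above.
converged, log_likelihood = new_hsmm.fit(observations, max_iter=50, atol=1e-06)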
If we examine the adjusted duration distribution, we see that this reproduces to some extent the original distribution, with very pronounced probabilities for duration 2 in state 0, duration 6 in state 1, and duration 10 in state 2.
np.set_printoptions(precision=1)
print(new_hsmm.durations)
np.set_printoptions(precision=8)  # restore the default precision
[[ 7.1e-35  5.4e-01  1.2e-31  1.5e-01  4.4e-39  2.1e-01  7.2e-08  1.0e-01  5.9e-08  5.9e-08]
 [ 1.0e-01  5.5e-49  3.1e-49  3.2e-51  1.0e-01  5.0e-01  1.2e-18  1.0e-01  1.0e-01  1.0e-01]
 [ 3.7e-44  9.0e-56  5.8e-41  2.5e-01  8.4e-02  8.3e-02  8.3e-02  3.8e-57  8.3e-02  4.2e-01]]
In tandem with the duration distributions, the other parameters have also changed. The transition matrix has become a little more pronounced to emphasize the transitions between different states, and the locations and scales of the emission PDFs have shifted a bit.
np.set_printoptions(precision=1)
print('New transition matrix:')
print(new_hsmm.tmat)
print('New emission parameters:')
print('Means:', new_hsmm.means)
print('Scales:', new_hsmm.scales)
np.set_printoptions(precision=8)  # restore the default precision
New transition matrix:
[[ 0.   0.4  0.6]
 [ 0.5  0.   0.5]
 [ 0.6  0.4  0. ]]
New emission parameters:
Means: [ -0.2   5.1  10.1]
Scales: [ 1.2  0.9  0.9]
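To make the comparison with the true model concrete, we can plot the original and re-estimated duration distributions side by side (a small sketch; durations still holds the true matrix from above):

# Bar plots of true vs. estimated duration probabilities, one row per state.
fig, axes = plt.subplots(3, 1, figsize=(8, 6), sharex=True)
d = np.arange(1, 11)
for state, ax in enumerate(axes):
    ax.bar(d - 0.2, durations[state], width=0.4, label='true')
    ax.bar(d + 0.2, new_hsmm.durations[state], width=0.4, label='estimated')
    ax.set_ylabel('state {}'.format(state))
axes[0].legend()
axes[-1].set_xlabel('duration')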
API reference¶

hsmm module¶

class hsmmlearn.hsmm.HSMMModel(emissions, durations, tmat, startprob=None, support_cutoff=100)¶

HSMM model base class.
This class provides the basic functionality to work with hidden semi-Markov models:
- Sampling from a HSMM via the sample() method;
- Fitting the parameters of a HSMM to a sequence of observations via the fit() method.
These methods are agnostic as to what HSMM is being used: the specifics of the model are all encapsulated in the properties tmat, emissions, and durations, which control the transition matrix, the emission distribution, and the duration distribution. In particular, the emission distribution can be discrete or continuous.
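Concretely, these properties can be read and reassigned on a live model (a brief sketch, reusing the gaussian_hsmm and durations objects from the tutorial):

# Reading and replacing model parameters through the descriptors.
print(gaussian_hsmm.tmat)            # the current transition matrix
gaussian_hsmm.durations = durations  # assign a new duration matrix in place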
__init__(emissions, durations, tmat, startprob=None, support_cutoff=100)¶

Create a new HSMM instance.
Parameters:
- emissions (hsmmlearn.emissions.AbstractEmissions) – Emissions distribution to use.
- durations (numpy.ndarray or list of random variables) – Durations matrix with shape (n_states, n_durations). If a list of random variables is passed in, all RVs must be subclasses of scipy.stats.rv_discrete. In this case, the duration probabilities are obtained from the support of the RVs, from 1 to support_cutoff.
- tmat (numpy.ndarray, shape=(n_states, n_states)) – Transition matrix.
- startprob (numpy.ndarray, shape=(n_states,)) – Initial probabilities for the Markov chain. If this is None, the uniform distribution is assumed (all states are equally likely).
- support_cutoff (int) – Maximal duration to take into account. This is used when passing in a list of random variables for the durations, which will then be sampled from 1 to support_cutoff.
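For example, durations can be passed as a list of discrete random variables instead of a matrix (a hedged sketch: the docstring asks for scipy.stats.rv_discrete subclasses, and whether frozen RVs such as poisson(2.0) are accepted may depend on the version, so treat this as illustrative; means, scales, and tmat are the tutorial’s arrays):

# Duration distributions given as scipy.stats discrete RVs; their pmf is
# evaluated on 1..support_cutoff to build the duration matrix internally.
from scipy.stats import poisson
from hsmmlearn.hsmm import HSMMModel
from hsmmlearn.emissions import GaussianEmissions

duration_rvs = [poisson(2.0), poisson(5.0), poisson(9.0)]
model = HSMMModel(
    GaussianEmissions(means, scales), duration_rvs, tmat,
    support_cutoff=20,
)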
decode(obs)¶

Find the most likely internal states for a sequence of observations.

Given a sequence of observations, this method runs the Viterbi algorithm to find the most likely sequence of corresponding internal states. “Most likely” should be interpreted here (as in the classical Viterbi algorithm) as the sequence of states that maximizes the joint probability P(observations, states).

Parameters: obs (numpy.ndarray, shape=(n_obs,)) – Observations.
Returns: states – Reconstructed internal states.
Return type: numpy.ndarray, shape=(n_obs,)
fit(obs, max_iter=20, atol=1e-05, censoring=True)¶

Fit the parameters of a HSMM to a given sequence of observations.

This method runs the expectation-maximization algorithm to adjust the parameters of the HSMM to fit a given sequence of observations as well as possible. The HSMM will be updated in place, unless an error occurs, in which case the original HSMM is restored.

Parameters:
- obs (numpy.ndarray, shape=(n_samples,)) – Sequence of observations.
- max_iter (int) – Maximum number of EM steps to do before terminating.
- atol (float) – Absolute tolerance to decide whether the EM algorithm has converged.
- censoring (bool) – Whether to apply right-censoring.
Raises: NoConvergenceError – If the algorithm terminated abnormally before converging.
n_durations¶

The number of durations for this HSMM.

n_states¶

The number of hidden states for this HSMM.
sample(n_samples=1)¶

Generate a random sample from the HSMM.

Parameters: n_samples (int) – Number of samples to generate.
Returns: observations, states – Random sample of observations and internal states.
Return type: numpy.ndarray, shape=(n_samples,)
emissions module¶
class hsmmlearn.emissions.AbstractEmissions¶

Base class for emissions distributions.

To create a HSMM with a custom emission distribution, write a derived class that implements some (or all) of the abstract methods. If you don’t need all of the HSMM functionality, you can get by with implementing only some of the methods.
copy()¶

Make a copy of this object.

This method is called by hsmmlearn.hsmm.HSMMModel.fit() to make a copy of the emissions object before modifying it.
likelihood(obs)¶

Compute the likelihood of a sequence of observations.

This method is called by hsmmlearn.hsmm.HSMMModel.fit() and hsmmlearn.hsmm.HSMMModel.decode().

Parameters: obs (numpy.ndarray, shape=(n_obs,)) – Sequence of observations.
Returns: likelihood – Likelihood of each observation under each state.
Return type: numpy.ndarray, shape=(n_states, n_obs)
reestimate(gamma, obs)¶

Estimate the distribution parameters given sequences of smoothed probabilities and observations.

The parameter gamma is an array of smoothed probabilities, with the entry gamma[s, i] giving the probability of finding the system in state s given all of the observations up to index i:

\[\gamma_{s, i} = P(s \mid o_1, \ldots, o_i).\]

This method is called by hsmmlearn.hsmm.HSMMModel.fit().

Parameters:
- gamma (numpy.ndarray, shape=(n_states, n_obs)) – Smoothed probabilities.
- obs (numpy.ndarray, shape=(n_obs,)) – Observations.
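As an illustration, a Gaussian M-step could be written along these lines (a sketch, not hsmmlearn’s actual implementation; it assumes gamma has shape (n_states, n_obs)):

def reestimate(self, gamma, obs):
    # Normalize per state so that each row of weights sums to one.
    weights = gamma / gamma.sum(axis=1, keepdims=True)
    # Gamma-weighted first and second moments give the updated parameters.
    new_means = (weights * obs[np.newaxis, :]).sum(axis=1)
    new_vars = (weights * (obs[np.newaxis, :] - new_means[:, np.newaxis]) ** 2).sum(axis=1)
    self.means = new_means
    self.scales = np.sqrt(new_vars)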
sample_for_state(state, size=None)¶

Return a random emission given a state.

This method is called by hsmmlearn.hsmm.HSMMModel.sample().

Parameters:
- state (int) – The internal state.
- size (int) – The number of random samples to generate.
Returns: observations – Random emissions.
Return type: numpy.ndarray, shape=(size,)
class hsmmlearn.emissions.GaussianEmissions(means, scales)¶

An emissions class for Gaussian emissions.

This emissions class models the case where emissions are real-valued and continuous, and the probability of observing an emission given the state is modeled by a Gaussian. The means and standard deviations for each Gaussian (one for each state) are stored as attributes on the instance.
class hsmmlearn.emissions.MultinomialEmissions(probabilities)¶

An emissions class for multinomial emissions.

This emissions class models the case where the emissions are categorical variables, assuming values from 0 to some value k, and the probability of observing an emission given a state is modeled by a multinomial distribution.
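A brief sketch of a discrete-output HSMM built on this class (the layout of probabilities, one row per state and one column per emission value, is an assumption based on the description above; durations and tmat are the tutorial’s arrays):

# Hypothetical 3-state model whose emissions are the symbols 0, 1, 2.
from hsmmlearn.hsmm import HSMMModel
from hsmmlearn.emissions import MultinomialEmissions

probabilities = np.array([
    [0.8, 0.1, 0.1],   # state 0 mostly emits 0
    [0.1, 0.8, 0.1],   # state 1 mostly emits 1
    [0.1, 0.1, 0.8],   # state 2 mostly emits 2
])
multinomial_hsmm = HSMMModel(
    MultinomialEmissions(probabilities), durations, tmat
)
observations, states = multinomial_hsmm.sample(100)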
properties module¶
class hsmmlearn.properties.Durations¶

Data descriptor for a durations distribution.

__get__(obj, type=None)¶

Return the durations distribution.

Returns: durations – Durations matrix.
Return type: numpy.ndarray, shape=(n_states, n_durations)
__set__(obj, durations)¶

Update the durations distribution with new durations.

Parameters:
- obj (hsmmlearn.hsmm.HSMMModel) – The underlying HSMMModel.
- durations (numpy.ndarray or list of random variables) – This can be either a numpy array, in which case the number of rows must be equal to the number of hidden states, or a list of scipy.stats discrete random variables. In the latter case, the duration probabilities are obtained directly from the random variables.
class hsmmlearn.properties.Emissions¶

Data descriptor for an emissions distribution.

__get__(obj, type=None)¶

Return the emissions distribution.

Returns: emissions – The emissions distribution.
Return type: hsmmlearn.emissions.AbstractEmissions
__set__(obj, emissions)¶

Set the emissions distribution.

Parameters:
- obj (hsmmlearn.hsmm.HSMMModel) – The underlying HSMMModel.
- emissions (hsmmlearn.emissions.AbstractEmissions) – The emissions distribution.
class hsmmlearn.properties.TransitionMatrix¶

Data descriptor for a transitions matrix.

__get__(obj, type=None)¶

Return the transition matrix.

Returns: tmat – A transition matrix.
Return type: numpy.ndarray, shape=(n_states, n_states)
__set__(obj, value)¶

Set a new transition matrix.

Parameters:
- obj (hsmmlearn.hsmm.HSMMModel) – The underlying HSMMModel.
- value (numpy.ndarray, shape=(n_states, n_states)) – The new transition matrix. This must be a square matrix. If a transition matrix was previously assigned to the HSMM, the new transition matrix must have the same number of rows.