Bayesian Nonparametrics Package
The BNP package implements several Bayesian nonparametric models, including:
- Dirichlet Process Mixture Models
- Hierarchical Dirichlet Process Mixture Models
- Factor Analysis Models (e.g. the Variable Clustering Model)
Getting Started
Installation
The BNP package is currently not available through the Julia package system, but it can easily be installed by running:
julia> Pkg.clone("https://github.com/trappmartin/BNP.jl")
Clustering data using a Dirichlet Process Mixture Model
In this example we start by drawing 100 observations from two bivariate Normal distributions.
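julia> X = cat(2, randn(2, 50), randn(2, 50) .+ 10) # 2 x 100 data matrix, one observation per column
julia> Y = vcat(zeros(50), ones(50)) # ground-truth cluster label of each observation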
Now we can load the package and construct a Gaussian data distribution with a Normal Inverse Wishart prior.
julia> using BNP
julia> μ0 = vec( mean(X, 2) )
julia> κ0 = 1.0
julia> ν0 = 4.0
julia> Ψ = eye(2) * 10
julia> G0 = GaussianWishart(μ0, κ0, ν0, Ψ)
After constructing G0, we can fit a Dirichlet Process Mixture Model using collapsed Gibbs sampling, here initialized with k-means.
julia> models = train(DPM(G0), Gibbs(), KMeansInitialisation(), X)
Please note that this example can also be found in the demos folder, allowing interactive exploration of the model.
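To give some intuition for the collapsed Gibbs sampler used above, the following sketch computes the prior assignment probabilities of a single observation under a Chinese restaurant process: existing clusters are weighted by their size and a new cluster by the concentration parameter α. The function name `crp_assignment_probs` is hypothetical and not part of BNP; this illustrates the general technique under those assumptions, not the package's implementation.

```julia
# Hypothetical helper, not part of BNP: prior assignment probabilities under a
# Chinese restaurant process with concentration α, given current cluster sizes.
# A collapsed Gibbs sampler would multiply each weight by the predictive
# likelihood of the datum under that cluster before normalising.
function crp_assignment_probs(counts::Vector{Int}, α::Float64)
    weights = vcat(Float64.(counts), α)   # existing clusters, then a new one
    return weights ./ sum(weights)
end

probs = crp_assignment_probs([3, 1], 1.0)
```

The last entry of `probs` is the probability of opening a new cluster, which is what lets the number of clusters grow with the data.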
Initialization Methods
To initialize the Bayesian nonparametric models, we provide a set of initialization approaches. Not every initialization approach is currently available for all models.
Random Initialization
The Random Initialization randomly assigns the data to a predefined number of groups.
julia> init = RandomInitialisation() # Random Initialization with k = 2
julia> init = RandomInitialisation(k = 5) # Random Initialization with k = 5
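Conceptually, a random initialization amounts to drawing one group label per observation. A minimal sketch in plain Julia (1.x standard library), assuming labels are drawn uniformly; the function `random_assignments` is hypothetical and may differ from what BNP does internally:

```julia
using Random

# Hypothetical sketch, not the BNP implementation: assign each of N
# observations to one of k groups uniformly at random.
random_assignments(rng::AbstractRNG, N::Int, k::Int) = rand(rng, 1:k, N)

z = random_assignments(MersenneTwister(1), 100, 5)
```

Passing the random number generator explicitly keeps the initialization reproducible across runs.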
Incremental Initialization
The Incremental Initialization sequentially assigns the data to groups.
julia> init = IncrementalInitialisation() # Incremental Initialization k = 5
K-Means Initialization
The K-Means Initialization assigns the data using k-Means clustering to a predefined number of groups.
julia> init = KMeansInitialisation() # K-Means Initialization with k = 2
julia> init = KMeansInitialisation(k = 5) # K-Means Initialization with k = 5
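For readers unfamiliar with k-means, the sketch below runs a few iterations of Lloyd's algorithm on a D x N data matrix and returns one group label per column. It is written against the Julia 1.x standard library; the function `kmeans_assignments` and its way of picking initial centers are assumptions for illustration and not the routine BNP uses.

```julia
using Statistics

# Hypothetical sketch, not the BNP implementation: a few iterations of Lloyd's
# k-means on a D x N matrix, returning one group label per column.
function kmeans_assignments(X::Matrix{Float64}, k::Int; iters::Int = 10)
    N = size(X, 2)
    # Initial centers: k columns spread evenly over the data (an assumption).
    centers = X[:, round.(Int, range(1, N, length = k))]
    z = zeros(Int, N)
    for _ in 1:iters
        for n in 1:N
            # Squared Euclidean distance from column n to every center.
            d = vec(sum(abs2, centers .- X[:, n], dims = 1))
            z[n] = argmin(d)
        end
        for j in 1:k
            members = findall(==(j), z)
            # Move each non-empty center to the mean of its members.
            isempty(members) || (centers[:, j] = vec(mean(X[:, members], dims = 2)))
        end
    end
    return z
end

X = hcat(zeros(2, 5), fill(10.0, 2, 5))
z = kmeans_assignments(X, 2)
```

On this toy input the first five columns end up in one group and the last five in the other.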
Distributions
The following distributions are currently supported. We will add additional support for the Distributions package in the near future.
Common Interface
A common interface to access the sufficient statistics and the log likelihood is provided for all distributions.
julia> add_data!(dist, X) # add datum to dist
julia> dist2 = add_data(dist, X) # add datum to copy of dist
julia> remove_data!(dist, X) # remove datum from dist
julia> dist2 = remove_data(dist, X) # remove datum from copy of dist
julia> logpred(dist, X) # log predictive likelihood of datum under dist
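The idea behind this interface is that a distribution only needs to track sufficient statistics, so adding or removing a datum is a cheap update. The toy type below illustrates the mutating and copying variants for scalar data. `ToyStats`, `add!`, `remove!` and `add` are hypothetical names chosen for this sketch and are not BNP types or functions.

```julia
# Hypothetical toy type, not a BNP distribution: it keeps the sufficient
# statistics (count and sum) needed for the mean of scalar data.
mutable struct ToyStats
    n::Int
    s::Float64
end

# In-place updates, mirroring the mutating `!` convention above.
add!(d::ToyStats, x::Real) = (d.n += 1; d.s += x; d)
remove!(d::ToyStats, x::Real) = (d.n -= 1; d.s -= x; d)

# Non-mutating variant: update a copy and leave the original untouched.
add(d::ToyStats, x::Real) = add!(ToyStats(d.n, d.s), x)

d = ToyStats(0, 0.0)
add!(d, 2.0)          # d now holds one datum
d2 = add(d, 4.0)      # d2 holds two data, d is unchanged
remove!(d, 2.0)       # d is empty again
```

A Gibbs sweep typically removes a datum from its current cluster, evaluates the predictive likelihood under every cluster, and then adds the datum to the sampled cluster.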
Beta-Binomial
The Binomial distribution with a Beta prior of dimensionality D can be created using:
julia> dist = BinomialBeta(D) # with default α = 1.0 and β = 1.0
julia> dist = BinomialBeta(D, α = 3, β = 4) # specify α and β parameter of Beta distribution
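As an illustration of what a log predictive likelihood looks like for this conjugate pair, the sketch below assumes each of the D dimensions is a single binary trial with its own Beta(α, β) prior. Both that assumption and the function `beta_bernoulli_logpred` are illustrative only; the package's `logpred` may be defined differently.

```julia
# Hypothetical helper, not part of BNP: log posterior predictive of a binary
# vector x under independent Beta(α, β) priors per dimension, given
# `ones_count[d]` successes among `n` previously added observations.
function beta_bernoulli_logpred(x::Vector{Int}, ones_count::Vector{Int}, n::Int;
                                α::Float64 = 1.0, β::Float64 = 1.0)
    p1 = (α .+ ones_count) ./ (α + β + n)     # predictive P(x_d = 1)
    return sum(log.(ifelse.(x .== 1, p1, 1 .- p1)))
end

lp = beta_bernoulli_logpred([1, 0], [3, 1], 4)
```

Larger α and β pull the predictive probabilities toward the prior mean, which matters most for clusters with few observations.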
Dirichlet-Multinomial
The Multinomial distribution with a Dirichlet prior of dimensionality D can be created using:
julia> dist = MultinomialDirichlet(D, 1.0) # with α = 1.0
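For a symmetric Dirichlet prior, the posterior predictive probability of a category is its smoothed relative frequency. The following sketch shows this for a single categorical draw; the function `dirichlet_predictive` is hypothetical and not part of BNP.

```julia
# Hypothetical helper, not part of BNP: posterior predictive probability of
# each of D categories under a symmetric Dirichlet(α) prior, given counts.
dirichlet_predictive(counts::Vector{Int}, α::Float64) =
    (counts .+ α) ./ (sum(counts) + α * length(counts))

p = dirichlet_predictive([2, 0, 1], 1.0)
```

Note that the unseen second category still receives non-zero probability, which is the smoothing effect of the prior.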
Wishart-Gaussian
The Gaussian distribution with a Wishart prior of dimensionality D can be created using:
julia> dist = GaussianWishart(μ, κ, ν, Ψ) # with specified μ of dimensionality D, κ, ν and Ψ of dimensionality D x D
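To make the role of the four parameters concrete, the standard conjugate update is given below, assuming the Normal Inverse Wishart parameterization named in the Getting Started example. Here x̄ denotes the mean of the n observations x_1, ..., x_n added so far. The package's internal parameterization may differ, so treat this as a reference sketch rather than a description of the implementation.

```latex
\begin{aligned}
\kappa_n &= \kappa_0 + n, \qquad \nu_n = \nu_0 + n, \\
\mu_n &= \frac{\kappa_0 \mu_0 + n \bar{x}}{\kappa_0 + n}, \\
\Psi_n &= \Psi + \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top
        + \frac{\kappa_0 n}{\kappa_0 + n} (\bar{x} - \mu_0)(\bar{x} - \mu_0)^\top .
\end{aligned}
```

Because the update depends on the data only through n, x̄ and the scatter matrix, adding or removing a single datum can be done incrementally.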