public class GaussianMixture
extends java.lang.Object
implements scala.Serializable
This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions, each with an associated "mixing" weight that specifies its contribution to the composite.
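The composite distribution can be illustrated with a minimal one-dimensional sketch. It is not the Spark implementation: the class and method names (`MixtureDensitySketch`, `gaussianPdf`, `mixturePdf`) and the example parameters are hypothetical, and the mixing weights are assumed to be non-negative and to sum to 1.

```java
// Illustrative 1-D Gaussian mixture density; names and values are hypothetical.
class MixtureDensitySketch {

    // Density of a univariate Gaussian N(mean, variance) at x.
    static double gaussianPdf(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-diff * diff / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    // Composite density: the sum of the component densities, each scaled by its mixing weight.
    // Assumes the weights are non-negative and sum to 1.
    static double mixturePdf(double x, double[] weights, double[] means, double[] variances) {
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * gaussianPdf(x, means[i], variances[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        double[] weights = {0.3, 0.7};
        double[] means = {0.0, 5.0};
        double[] variances = {1.0, 1.0};
        System.out.println("p(0.0) = " + mixturePdf(0.0, weights, means, variances));
        System.out.println("p(5.0) = " + mixturePdf(5.0, weights, means, variances));
    }
}
```

With these example values the density near 5.0 is higher than near 0.0, because the second component carries the larger mixing weight.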
Given a set of sample points, this class maximizes the log-likelihood for a mixture of k Gaussians, iterating until the log-likelihood changes by less than convergenceTol or until the maximum number of iterations is reached. While this process is generally guaranteed to converge, it is not guaranteed to find a global optimum.
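The stopping rule described above can be sketched with a small one-dimensional EM loop. This is an illustration under stated assumptions, not the Spark implementation: `EmSketch`, `fit`, `sampleData`, the synthetic data, and the variance floor are all hypothetical, and the sketch works on plain arrays rather than an RDD of vectors.

```java
import java.util.Random;

// Illustrative 1-D EM loop for a Gaussian mixture; not the Spark implementation.
class EmSketch {

    // Density of a univariate Gaussian N(mean, variance) at x.
    static double pdf(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-diff * diff / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    // Hypothetical data: 200 draws centered at 0 and 200 draws centered at 10, unit variance.
    static double[] sampleData(long seed) {
        Random rng = new Random(seed);
        double[] xs = new double[400];
        for (int i = 0; i < xs.length; i++) {
            xs[i] = rng.nextGaussian() + (i < 200 ? 0.0 : 10.0);
        }
        return xs;
    }

    // Updates weights, means and variances in place; returns the number of iterations performed.
    static int fit(double[] xs, double[] weights, double[] means, double[] variances,
                   double convergenceTol, int maxIterations) {
        int n = xs.length;
        int k = weights.length;
        double[][] resp = new double[n][k];
        double previousLogLik = Double.NEGATIVE_INFINITY;
        int iterations = 0;
        while (iterations < maxIterations) {
            // E-step: responsibilities and log-likelihood under the current parameters.
            double logLik = 0.0;
            for (int i = 0; i < n; i++) {
                double total = 0.0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = weights[j] * pdf(xs[i], means[j], variances[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++) {
                    resp[i][j] /= total;
                }
                logLik += Math.log(total);
            }
            // M-step: re-estimate each component from its responsibility-weighted points.
            for (int j = 0; j < k; j++) {
                double mass = 0.0;
                double weightedSum = 0.0;
                for (int i = 0; i < n; i++) {
                    mass += resp[i][j];
                    weightedSum += resp[i][j] * xs[i];
                }
                means[j] = weightedSum / mass;
                double weightedSq = 0.0;
                for (int i = 0; i < n; i++) {
                    double diff = xs[i] - means[j];
                    weightedSq += resp[i][j] * diff * diff;
                }
                variances[j] = Math.max(weightedSq / mass, 1e-6); // floor avoids a zero variance
                weights[j] = mass / n;
            }
            iterations++;
            // Stop once the log-likelihood changes by less than convergenceTol.
            if (Math.abs(logLik - previousLogLik) < convergenceTol) {
                break;
            }
            previousLogLik = logLik;
        }
        return iterations;
    }

    public static void main(String[] args) {
        double[] xs = sampleData(42L);
        double[] weights = {0.5, 0.5};
        double[] means = {1.0, 8.0};
        double[] variances = {1.0, 1.0};
        int iterations = fit(xs, weights, means, variances, 0.01, 100);
        System.out.println("iterations = " + iterations);
        for (int j = 0; j < weights.length; j++) {
            System.out.println("weight=" + weights[j] + " mean=" + means[j] + " variance=" + variances[j]);
        }
    }
}
```

Because EM only finds a local optimum, a different starting point can lead to a different fit; the initial means here are deliberately placed near the two clusters.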
Note: This algorithm may perform poorly on high-dimensional data (data with many features), because (a) high-dimensional data is difficult to cluster at all (based on statistical/theoretical arguments) and (b) Gaussian distributions are prone to numerical issues in high dimensions.
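One numerical issue of this kind is floating-point underflow of the density itself. The sketch below is an assumption-laden illustration, not a description of what Spark does internally: it uses a standard Gaussian (zero mean, identity covariance) and the hypothetical names `HighDimUnderflowSketch`, `logDensity`, and `density`.

```java
// Illustrates density underflow as the number of features d grows; names are hypothetical.
class HighDimUnderflowSketch {

    // Log-density of a d-dimensional standard Gaussian (zero mean, identity covariance)
    // at a point whose coordinates all equal `value`.
    static double logDensity(int d, double value) {
        return -0.5 * d * Math.log(2.0 * Math.PI) - 0.5 * d * value * value;
    }

    // The density itself; for large d this underflows to 0.0 in double precision.
    static double density(int d, double value) {
        return Math.exp(logDensity(d, value));
    }

    public static void main(String[] args) {
        int[] dims = {2, 20, 200};
        for (int d : dims) {
            System.out.println("d=" + d
                + " logDensity=" + logDensity(d, 3.0)
                + " density=" + density(d, 3.0));
        }
    }
}
```

At d = 200 the log-density is still a finite number, but the density evaluates to exactly 0.0, so any computation that divides by or takes the logarithm of raw densities breaks down.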
param: k The number of independent Gaussians in the mixture model
param: convergenceTol The maximum change in log-likelihood at which convergence is considered to have occurred.
param: maxIterations The maximum number of iterations to perform
Constructor and Description |
---|
GaussianMixture() Constructs a default instance. |
Modifier and Type | Method and Description |
---|---|
double | getConvergenceTol() Return the largest change in log-likelihood at which convergence is considered to have occurred. |
scala.Option<GaussianMixtureModel> | getInitialModel() Return the user-supplied initial GMM, if supplied. |
int | getK() Return the number of Gaussians in the mixture model. |
int | getMaxIterations() Return the maximum number of iterations to run. |
long | getSeed() Return the random seed. |
GaussianMixtureModel | run(JavaRDD<Vector> data) Java-friendly version of run(). |
GaussianMixtureModel | run(RDD<Vector> data) Perform expectation maximization. |
GaussianMixture | setConvergenceTol(double convergenceTol) Set the largest change in log-likelihood at which convergence is considered to have occurred. |
GaussianMixture | setInitialModel(GaussianMixtureModel model) Set the initial GMM starting point, bypassing the random initialization. |
GaussianMixture | setK(int k) Set the number of Gaussians in the mixture model. |
GaussianMixture | setMaxIterations(int maxIterations) Set the maximum number of iterations to run. |
GaussianMixture | setSeed(long seed) Set the random seed. |
static boolean | shouldDistributeGaussians(int k, int d) Heuristic to distribute the computation of the MultivariateGaussians, approximately when d > 25 except for when k is very small. |
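The description of shouldDistributeGaussians does not state the exact rule. One formula consistent with "approximately when d > 25 except for when k is very small" is `((k - 1) / k) * d > 25`; this is an assumption made for illustration, and the class and method names below are hypothetical stand-ins rather than the library method.

```java
// Hypothetical restatement of the heuristic; the exact formula is an assumption.
class DistributeHeuristicSketch {

    // For large k the factor (k - 1) / k approaches 1, so the rule reduces to d > 25.
    // For very small k the factor shrinks, so a larger d is needed before distributing.
    static boolean shouldDistribute(int k, int d) {
        return ((k - 1.0) / k) * d > 25;
    }

    public static void main(String[] args) {
        System.out.println("k=10, d=30: " + shouldDistribute(10, 30));
        System.out.println("k=2,  d=30: " + shouldDistribute(2, 30));
        System.out.println("k=10, d=20: " + shouldDistribute(10, 20));
    }
}
```

Under this assumed rule, k = 2 with d = 30 stays local even though d exceeds 25, which matches the "except for when k is very small" wording.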
public GaussianMixture()
public static boolean shouldDistributeGaussians(int k, int d)
Heuristic to distribute the computation of the MultivariateGaussians, approximately when d > 25 except for when k is very small.
k - Number of Gaussians in the mixture model
d - Number of features
public GaussianMixture setInitialModel(GaussianMixtureModel model)
model - (undocumented)
public scala.Option<GaussianMixtureModel> getInitialModel()
public GaussianMixture setK(int k)
k - (undocumented)
public int getK()
public GaussianMixture setMaxIterations(int maxIterations)
maxIterations - (undocumented)
public int getMaxIterations()
public GaussianMixture setConvergenceTol(double convergenceTol)
convergenceTol - (undocumented)
public double getConvergenceTol()
public GaussianMixture setSeed(long seed)
seed - (undocumented)
public long getSeed()
public GaussianMixtureModel run(RDD<Vector> data)
data - (undocumented)
public GaussianMixtureModel run(JavaRDD<Vector> data)
Java-friendly version of run()
data - (undocumented)