public class KMeans
extends java.lang.Object
implements scala.Serializable
This is an iterative algorithm that will make multiple passes over the data, so any RDDs given to it should be cached by the user.
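To make the "multiple passes" point concrete, here is a minimal single-machine sketch of a Lloyd-style k-means loop. It is not Spark's implementation: the class `LloydSketch` and its methods are hypothetical names, and the convergence rule (every center moves by at most `epsilon`) is an assumption based on the description of the epsilon parameter. Every iteration scans all points once, so with an uncached RDD each of those scans could recompute the input.

```java
import java.util.Arrays;

/** Single-machine sketch of Lloyd-style k-means; NOT Spark's implementation. */
class LloydSketch {
    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center closest to the given point. */
    static int closest(double[] point, double[][] centers) {
        int best = 0;
        for (int j = 1; j < centers.length; j++) {
            if (squaredDistance(point, centers[j]) < squaredDistance(point, centers[best])) {
                best = j;
            }
        }
        return best;
    }

    /** Refines the centers in place and returns the number of passes made over the data. */
    static int fit(double[][] points, double[][] centers, int maxIterations, double epsilon) {
        int dim = points[0].length;
        int passes = 0;
        boolean converged = false;
        while (passes < maxIterations && !converged) {
            double[][] sums = new double[centers.length][dim];
            int[] counts = new int[centers.length];
            // One full pass over the data: this is the scan that gets repeated.
            for (double[] p : points) {
                int j = closest(p, centers);
                counts[j]++;
                for (int d = 0; d < dim; d++) {
                    sums[j][d] += p[d];
                }
            }
            passes++;
            converged = true;
            for (int j = 0; j < centers.length; j++) {
                if (counts[j] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) {
                    updated[d] = sums[j][d] / counts[j];
                }
                // Assumed rule: converged only if every center moved by at most epsilon.
                if (squaredDistance(centers[j], updated) > epsilon * epsilon) {
                    converged = false;
                }
                centers[j] = updated;
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {0.0, 1.0}, {10.0, 10.0}, {10.0, 11.0}};
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        int passes = fit(points, centers, 20, 1e-4);
        System.out.println(passes + " passes, centers = " + Arrays.deepToString(centers));
    }
}
```

In this toy input the loop stops well before `maxIterations` because the centers stop moving; on real data the number of scans can approach the limit, which is why caching the input matters.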
Constructor and Description |
---|
KMeans(): Constructs a KMeans instance with default parameters: {k: 2, maxIterations: 20, runs: 1, initializationMode: "k-means\|\|", initializationSteps: 5, epsilon: 1e-4, seed: random}. |
Modifier and Type | Method and Description |
---|---|
double | getEpsilon(): The distance threshold within which centers are considered to have converged. |
java.lang.String | getInitializationMode(): The initialization algorithm. |
int | getInitializationSteps(): Number of steps for the k-means\|\| initialization mode. |
int | getK(): Number of clusters to create (k). |
int | getMaxIterations(): Maximum number of iterations allowed. |
int | getRuns(): This function has no effect since Spark 2.0.0. |
long | getSeed(): The random seed for cluster initialization. |
protected static void | initializeLogIfNecessary(boolean isInterpreter) |
protected static boolean | isTraceEnabled() |
static java.lang.String | K_MEANS_PARALLEL() |
protected static org.slf4j.Logger | log() |
protected static void | logDebug(scala.Function0<java.lang.String> msg) |
protected static void | logDebug(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable) |
protected static void | logError(scala.Function0<java.lang.String> msg) |
protected static void | logError(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable) |
protected static void | logInfo(scala.Function0<java.lang.String> msg) |
protected static void | logInfo(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable) |
protected static java.lang.String | logName() |
protected static void | logTrace(scala.Function0<java.lang.String> msg) |
protected static void | logTrace(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable) |
protected static void | logWarning(scala.Function0<java.lang.String> msg) |
protected static void | logWarning(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable) |
static java.lang.String | RANDOM() |
KMeansModel | run(RDD<Vector> data): Train a K-means model on the given set of points; data should be cached for high performance, because this is an iterative algorithm. |
KMeans | setEpsilon(double epsilon): Set the distance threshold within which centers are considered to have converged. |
KMeans | setInitializationMode(java.lang.String initializationMode): Set the initialization algorithm. |
KMeans | setInitializationSteps(int initializationSteps): Set the number of steps for the k-means\|\| initialization mode. |
KMeans | setInitialModel(KMeansModel model): Set the initial starting point, bypassing the random initialization or k-means\|\|. The condition model.k == this.k must be met; failure results in an IllegalArgumentException. |
KMeans | setK(int k): Set the number of clusters to create (k). |
KMeans | setMaxIterations(int maxIterations): Set the maximum number of iterations allowed. |
KMeans | setRuns(int runs): This function has no effect since Spark 2.0.0. |
KMeans | setSeed(long seed): Set the random seed for cluster initialization. |
static KMeansModel | train(RDD<Vector> data, int k, int maxIterations): Trains a k-means model using the specified parameters and default values for the unspecified ones. |
static KMeansModel | train(RDD<Vector> data, int k, int maxIterations, int runs): Trains a k-means model using the specified parameters and default values for the unspecified ones. |
static KMeansModel | train(RDD<Vector> data, int k, int maxIterations, int runs, java.lang.String initializationMode): Trains a k-means model using the given set of parameters. |
static KMeansModel | train(RDD<Vector> data, int k, int maxIterations, int runs, java.lang.String initializationMode, long seed): Trains a k-means model using the given set of parameters. |
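The setters in the summary above all return `KMeans`, so calls can be chained, and `setInitialModel` is documented to reject a model whose cluster count differs from the configured k. The following stand-in illustrates that pattern only. `FluentConfigSketch` and `ModelStub` are hypothetical classes, not Spark types, and the sketch assumes the k check runs at the moment the setter is called.

```java
/** Hypothetical stand-in for a fluent configuration object; NOT the Spark class. */
class FluentConfigSketch {
    /** Hypothetical stand-in for a trained model; only its cluster count matters here. */
    static final class ModelStub {
        final int k;

        ModelStub(int k) {
            this.k = k;
        }
    }

    private int k = 2;              // default taken from the constructor description
    private int maxIterations = 20; // default taken from the constructor description
    private ModelStub initialModel; // null means "use an initialization mode instead"

    FluentConfigSketch setK(int k) {
        this.k = k;
        return this; // returning this is what allows chaining
    }

    FluentConfigSketch setMaxIterations(int maxIterations) {
        this.maxIterations = maxIterations;
        return this;
    }

    FluentConfigSketch setInitialModel(ModelStub model) {
        // Assumed to be checked when the setter is called, so k must be set first.
        if (model.k != this.k) {
            throw new IllegalArgumentException(
                "mismatched cluster count: " + model.k + " != " + this.k);
        }
        this.initialModel = model;
        return this;
    }

    int getK() {
        return k;
    }

    int getMaxIterations() {
        return maxIterations;
    }

    public static void main(String[] args) {
        FluentConfigSketch config = new FluentConfigSketch()
            .setK(3)
            .setMaxIterations(50)
            .setInitialModel(new ModelStub(3));
        System.out.println("k = " + config.getK());

        try {
            // The default k is 2, so a 5-cluster model is rejected.
            new FluentConfigSketch().setInitialModel(new ModelStub(5));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under that assumption, the order of calls matters: setting k after supplying an initial model would bypass the check in this sketch, so the k setter is called first.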
public KMeans()
public static java.lang.String RANDOM()
public static java.lang.String K_MEANS_PARALLEL()
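RANDOM() and K_MEANS_PARALLEL() name the two initialization modes, "random" and "k-means||". As a rough illustration of what a seeded random initialization means, the sketch below picks k distinct starting centers from an in-memory list. It is a simplified stand-in, not Spark's sampling code: `RandomInitSketch` and `initRandom` are hypothetical names, and the k-means|| mode is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** In-memory sketch of seeded random center selection; NOT Spark's implementation. */
class RandomInitSketch {
    /** Picks k distinct points as starting centers, reproducibly for a given seed. */
    static List<double[]> initRandom(List<double[]> points, int k, long seed) {
        if (k <= 0 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        // Shuffle a copy so the caller's list is left untouched.
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, new Random(seed));
        return new ArrayList<>(shuffled.subList(0, k));
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            points.add(new double[] {i, 2.0 * i});
        }
        List<double[]> first = initRandom(points, 3, 42L);
        List<double[]> second = initRandom(points, 3, 42L);
        // The same seed selects the same points in the same order.
        boolean same = true;
        for (int i = 0; i < 3; i++) {
            same &= first.get(i) == second.get(i);
        }
        System.out.println("reproducible with a fixed seed: " + same);
    }
}
```

Fixing the seed makes the starting centers, and therefore the clustering result in this sketch, repeatable across runs; leaving it to a time-based default does not.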
public static KMeansModel train(RDD<Vector> data, int k, int maxIterations, int runs, java.lang.String initializationMode, long seed)
data - Training points as an RDD of Vector types.
k - Number of clusters to create.
maxIterations - Maximum number of iterations allowed.
runs - This param has no effect since Spark 2.0.0.
initializationMode - The initialization algorithm. This can either be "random" or "k-means||". (default: "k-means||")
seed - Random seed for cluster initialization. Default is to generate seed based on system time.
public static KMeansModel train(RDD<Vector> data, int k, int maxIterations, int runs, java.lang.String initializationMode)
data - Training points as an RDD of Vector types.
k - Number of clusters to create.
maxIterations - Maximum number of iterations allowed.
runs - This param has no effect since Spark 2.0.0.
initializationMode - The initialization algorithm. This can either be "random" or "k-means||". (default: "k-means||")
public static KMeansModel train(RDD<Vector> data, int k, int maxIterations)
data - (undocumented)
k - (undocumented)
maxIterations - (undocumented)
public static KMeansModel train(RDD<Vector> data, int k, int maxIterations, int runs)
data - (undocumented)
k - (undocumented)
maxIterations - (undocumented)
runs - (undocumented)
protected static java.lang.String logName()
protected static org.slf4j.Logger log()
protected static void logInfo(scala.Function0<java.lang.String> msg)
protected static void logDebug(scala.Function0<java.lang.String> msg)
protected static void logTrace(scala.Function0<java.lang.String> msg)
protected static void logWarning(scala.Function0<java.lang.String> msg)
protected static void logError(scala.Function0<java.lang.String> msg)
protected static void logInfo(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable)
protected static void logDebug(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable)
protected static void logTrace(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable)
protected static void logWarning(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable)
protected static void logError(scala.Function0<java.lang.String> msg, java.lang.Throwable throwable)
protected static boolean isTraceEnabled()
protected static void initializeLogIfNecessary(boolean isInterpreter)
public int getK()
public KMeans setK(int k)
k - (undocumented)
public int getMaxIterations()
public KMeans setMaxIterations(int maxIterations)
maxIterations - (undocumented)
public java.lang.String getInitializationMode()
public KMeans setInitializationMode(java.lang.String initializationMode)
initializationMode - (undocumented)
public int getRuns()
public KMeans setRuns(int runs)
runs - (undocumented)
public int getInitializationSteps()
public KMeans setInitializationSteps(int initializationSteps)
initializationSteps - (undocumented)
public double getEpsilon()
public KMeans setEpsilon(double epsilon)
epsilon - (undocumented)
public long getSeed()
public KMeans setSeed(long seed)
seed - (undocumented)
public KMeans setInitialModel(KMeansModel model)
model - (undocumented)
public KMeansModel run(RDD<Vector> data)
Train a K-means model on the given set of points; data should be cached for high performance, because this is an iterative algorithm.
data - (undocumented)