org.apache.spark.mllib.clustering
Return the latest model.
Use the clustering model to make predictions on batches of data from a DStream.
DStream containing vector data
DStream containing predictions
Use the model to make predictions on the values of a DStream and carry over its keys.
key type
DStream containing (key, feature vector) pairs
DStream containing the input keys and the predictions as values
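As a rough illustration of what carrying over keys means, the following plain-Scala sketch (no Spark dependency) assigns each keyed feature vector to its nearest center by squared Euclidean distance and leaves the key untouched. The object `KeyedPredictionSketch`, its methods, and the use of a `Seq` as a stand-in for one batch of a DStream are hypothetical, not the library's implementation.

```scala
object KeyedPredictionSketch {
  // Squared Euclidean distance between two equal-length vectors.
  def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  // Index of the center closest to the given point.
  def nearestCenter(centers: Array[Array[Double]], point: Array[Double]): Int =
    centers.indices.minBy(i => squaredDistance(centers(i), point))

  // Hypothetical analogue of predicting on values: each key passes through
  // unchanged and is paired with the predicted cluster index.
  def predictOnValues[K](centers: Array[Array[Double]],
                         batch: Seq[(K, Array[Double])]): Seq[(K, Int)] =
    batch.map { case (key, features) => (key, nearestCenter(centers, features)) }
}
```

Keeping the key alongside the prediction lets a caller join cluster assignments back to the records they came from.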
Set the decay factor directly (for forgetful algorithms).
Set the half life and time unit ("batches" or "points") for forgetful algorithms.
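The half life can be related to the decay factor directly. The sketch below assumes the usual definition, in which a contribution retains half its weight after `halfLife` time units, so the per-unit decay factor is `exp(ln(0.5) / halfLife)`. The object and method names are hypothetical.

```scala
object HalfLifeSketch {
  // Decay factor a such that a^halfLife == 0.5 (assumed definition).
  def decayFactorFor(halfLife: Double): Double = {
    require(halfLife > 0.0, "half life must be positive")
    math.exp(math.log(0.5) / halfLife)
  }

  // Fraction of the original weight left on old data after `elapsed` time units.
  def remainingWeight(decayFactor: Double, elapsed: Double): Double =
    math.pow(decayFactor, elapsed)
}
```

Under this definition a half life of 1 corresponds to a decay factor of 0.5, and larger half lives push the factor toward 1, meaning older data is forgotten more slowly.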
Specify initial centers directly.
Set the number of clusters.
Initialize random centers, requiring only the number of dimensions.
Number of dimensions
Weight for each center
Random seed
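A minimal plain-Scala sketch of this kind of initialization follows. It assumes each coordinate is drawn from a standard normal distribution and that every center starts with the same weight; the sampling scheme, the explicit `k` argument, and all names here are assumptions for illustration rather than the library's code.

```scala
import scala.util.Random

object RandomCentersSketch {
  // Draw k centers of dimension dim from a standard normal distribution
  // (an assumed sampling scheme), each paired with the same initial weight.
  def randomCenters(k: Int, dim: Int, weight: Double, seed: Long)
      : (Array[Array[Double]], Array[Double]) = {
    require(k > 0 && dim > 0, "k and dim must be positive")
    val random = new Random(seed)
    val centers = Array.fill(k)(Array.fill(dim)(random.nextGaussian()))
    val weights = Array.fill(k)(weight)
    (centers, weights)
  }
}
```

Fixing the seed makes the initial centers reproducible across runs, which helps when comparing configurations.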
Update the clustering model by training on batches of data from a DStream. This operation registers a DStream for training the model, checks whether the cluster centers have been initialized, and updates the model using each batch of data from the stream.
DStream containing vector data
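To make the per-batch update concrete, here is a plain-Scala sketch of a forgetful update for a single cluster. It assumes a rule in which the old center, weighted by its accumulated weight times the decay factor, is averaged with the new points assigned to it. It measures time in batches only and does not model any special handling of clusters that receive no data. All names are hypothetical.

```scala
object ForgetfulUpdateSketch {
  final case class Cluster(center: Array[Double], weight: Double)

  // Assumed rule, with n the old weight, a the decay factor, m the number of
  // new points: c' = (c * n * a + sum of new points) / (n * a + m), n' = n * a + m
  def update(cluster: Cluster,
             points: Seq[Array[Double]],
             decayFactor: Double): Cluster = {
    val discounted = cluster.weight * decayFactor
    val newWeight = discounted + points.size
    if (newWeight == 0.0) cluster // nothing to average; keep the old state
    else {
      // Start from the discounted old center and add each new point.
      val sums = points.foldLeft(cluster.center.map(_ * discounted)) { (acc, p) =>
        acc.zip(p).map { case (s, x) => s + x }
      }
      Cluster(sums.map(_ / newWeight), newWeight)
    }
  }
}
```

With a decay factor of 1 all past data counts equally; with a decay factor of 0 only the most recent batch determines the center.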
:: DeveloperApi :: StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming data, and using the model to make predictions on streaming data. See KMeansModel for details on algorithm and update rules.
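Use a builder pattern to construct a streaming k-means analysis in an application, for example:

    val model = new StreamingKMeans()
      .setDecayFactor(0.5)
      .setK(3)
      .setRandomCenters(5, 100.0)
    // trainOn registers the stream for training; it does not return the model,
    // so it is called separately. trainingStream is a placeholder for a
    // DStream containing vector data.
    model.trainOn(trainingStream)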