public class Word2Vec extends Object implements scala.Serializable, Logging
This implementation uses the skip-gram model and trains it with the hierarchical softmax method. Variable names in the implementation match those of the original C implementation.
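The skip-gram model learns a word's vector by predicting the words that appear around it. Below is a minimal plain-Java sketch (no Spark) showing how one tokenized sentence could be turned into (center, context) training pairs. It assumes a fixed window size. The class name `SkipGramPairs` and the `window` parameter are illustrative only. The original C code also shrinks the window by a random amount at each position, and this sketch omits that step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: enumerates skip-gram (center, context) pairs.
public class SkipGramPairs {

    // Returns every (center, context) pair where the context word lies within
    // `window` positions of the center word. The window is fixed here; the
    // original C implementation samples a smaller effective window per position.
    static List<String[]> pairs(List<String> sentence, int window) {
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < sentence.size(); i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(sentence.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) { // a word is never its own context
                    out.add(new String[] {sentence.get(i), sentence.get(j)});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sentence = List.of("the", "quick", "brown", "fox");
        for (String[] p : pairs(sentence, 1)) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

With a window of 1, the four-word sentence yields six pairs, starting with `the -> quick` and `quick -> the`. The model is trained so that each center word's vector predicts its context words.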
For the original C implementation, see https://code.google.com/p/word2vec/ . For the research papers, see "Efficient Estimation of Word Representations in Vector Space" and "Distributed Representations of Words and Phrases and their Compositionality".
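Hierarchical softmax replaces a softmax over the whole vocabulary with a sequence of binary decisions along a path in a binary tree. The original C implementation builds this tree as a Huffman tree from word counts, so frequent words get short paths. Below is a plain-Java sketch of that tree construction. The class `HuffmanCodes` is hypothetical. The sketch only computes each word's 0/1 path code and does not train any vectors. Its bit convention may also differ from the one in the C code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical helper, not part of Spark: Huffman codes over word counts.
public class HuffmanCodes {

    // Leaves carry a word; inner nodes carry two children and no word.
    static final class Node {
        final long count;
        final String word;
        final Node left, right;

        Node(long count, String word, Node left, Node right) {
            this.count = count;
            this.word = word;
            this.left = left;
            this.right = right;
        }
    }

    // Repeatedly merges the two least frequent nodes, then reads each word's
    // root-to-leaf path as a bit string (0 = left, 1 = right). Hierarchical
    // softmax scores one binary decision per bit of this path.
    static Map<String, String> codes(Map<String, Long> counts) {
        Map<String, String> out = new HashMap<>();
        if (counts.isEmpty()) {
            return out;
        }
        PriorityQueue<Node> queue =
                new PriorityQueue<Node>((x, y) -> Long.compare(x.count, y.count));
        counts.forEach((w, c) -> queue.add(new Node(c, w, null, null)));
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node(a.count + b.count, null, a, b));
        }
        assign(queue.poll(), "", out);
        return out;
    }

    private static void assign(Node n, String prefix, Map<String, String> out) {
        if (n.word != null) { // leaf: record the path taken to reach it
            out.put(n.word, prefix);
            return;
        }
        assign(n.left, prefix + "0", out);
        assign(n.right, prefix + "1", out);
    }

    public static void main(String[] args) {
        Map<String, String> result = codes(Map.of("a", 10L, "b", 5L, "c", 2L, "d", 1L));
        // Print in alphabetical order for readability.
        new TreeMap<>(result).forEach((w, code) -> System.out.println(w + " " + code));
    }
}
```

For the counts a=10, b=5, c=2, d=1, the frequent word `a` gets a 1-bit code and `b` gets a 2-bit code. The rare words `c` and `d` get 3-bit codes. Scoring `a` therefore takes a single binary decision instead of a normalization over the whole vocabulary.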
| Constructor and Description | 
|---|
| Word2Vec() | 
| Modifier and Type | Method and Description | 
|---|---|
| <S extends Iterable<String>> Word2VecModel | fit(JavaRDD<S> dataset) Computes the vector representation of each word in the vocabulary (Java version). |
| <S extends scala.collection.Iterable<String>> Word2VecModel | fit(RDD<S> dataset) Computes the vector representation of each word in the vocabulary. |
| Word2Vec | setLearningRate(double learningRate) Sets the initial learning rate (default: 0.025). |
| Word2Vec | setNumIterations(int numIterations) Sets the number of iterations (default: 1), which should be smaller than or equal to the number of partitions. |
| Word2Vec | setNumPartitions(int numPartitions) Sets the number of partitions (default: 1). |
| Word2Vec | setSeed(long seed) Sets the random seed (default: a random long integer). |
| Word2Vec | setVectorSize(int vectorSize) Sets the vector size (default: 100). |
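Each setter in the table returns the `Word2Vec` instance, so configuration calls can be chained before `fit`. The plain-Java sketch below illustrates that fluent style with the documented defaults. It uses a hypothetical stand-in class, `Word2VecParams`, which is not Spark's class and has no training logic. Its `iterationsWithinPartitions` check is an illustrative encoding of the "numIterations should be at most numPartitions" guidance, and it is not a check Spark is claimed to perform.

```java
import java.util.Random;

// Hypothetical stand-in, NOT Spark's Word2Vec: mirrors the documented setters
// and defaults only to show the chainable configuration style.
public class Word2VecParams {
    private int vectorSize = 100;                // documented default
    private double learningRate = 0.025;         // documented default
    private int numPartitions = 1;               // documented default
    private int numIterations = 1;               // documented default
    private long seed = new Random().nextLong(); // documented default: a random long

    // Every setter returns this object so calls can be chained.
    public Word2VecParams setVectorSize(int vectorSize) { this.vectorSize = vectorSize; return this; }
    public Word2VecParams setLearningRate(double learningRate) { this.learningRate = learningRate; return this; }
    public Word2VecParams setNumPartitions(int numPartitions) { this.numPartitions = numPartitions; return this; }
    public Word2VecParams setNumIterations(int numIterations) { this.numIterations = numIterations; return this; }
    public Word2VecParams setSeed(long seed) { this.seed = seed; return this; }

    // Encodes the documented guidance: numIterations <= numPartitions.
    public boolean iterationsWithinPartitions() { return numIterations <= numPartitions; }

    public int getVectorSize() { return vectorSize; }
    public double getLearningRate() { return learningRate; }
    public int getNumPartitions() { return numPartitions; }
    public int getNumIterations() { return numIterations; }
    public long getSeed() { return seed; }

    public static void main(String[] args) {
        Word2VecParams p = new Word2VecParams()
                .setVectorSize(50)
                .setNumPartitions(4)
                .setNumIterations(2)
                .setSeed(42L); // fixing the seed makes runs repeatable
        System.out.println(p.getVectorSize() + " " + p.iterationsWithinPartitions()); // prints "50 true"
    }
}
```

With Spark on the classpath, the documented class can be chained the same way. For example, `new Word2Vec().setVectorSize(50).setSeed(42L).fit(sentences)` returns a `Word2VecModel`, assuming `sentences` is a `JavaRDD<List<String>>`.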
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public Word2Vec setVectorSize(int vectorSize)
public Word2Vec setLearningRate(double learningRate)
public Word2Vec setNumPartitions(int numPartitions)
public Word2Vec setNumIterations(int numIterations)
public Word2Vec setSeed(long seed)
public <S extends scala.collection.Iterable<String>> Word2VecModel fit(RDD<S> dataset)
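dataset - an RDD of word sequences (each element is an iterable of words, such as one tokenized sentence)
public <S extends Iterable<String>> Word2VecModel fit(JavaRDD<S> dataset)
dataset - a JavaRDD of word sequences (each element is an iterable of words, such as one tokenized sentence)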
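Each element of the dataset passed to `fit` is one sequence of words. Below is a minimal plain-Java sketch of preparing such input from raw lines of text. It assumes whitespace tokenization and drops blank lines, and both choices are illustrative rather than requirements of Word2Vec. The class `SentenceInput` is hypothetical. `List<String>` implements `Iterable<String>`, so each produced element fits the `S` type parameter of `fit(JavaRDD<S>)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: turns text lines into word sequences.
public class SentenceInput {

    // Splits each non-blank line on runs of whitespace, producing one
    // List<String> (an Iterable<String>) per line.
    static List<List<String>> toSentences(List<String> lines) {
        List<List<String>> sentences = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines carry no training signal
            }
            sentences.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "   ", "jumps  over the dog");
        for (List<String> s : toSentences(lines)) {
            System.out.println(s);
        }
    }
}
```

In a Spark job, the same per-line split could be applied with `JavaRDD.map` to a `JavaRDD<String>` of lines. The resulting `JavaRDD<List<String>>` would then be passed to `fit`.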