public class CountVectorizerModel extends Model<CountVectorizerModel> implements MLWritable
| Constructor and Description |
|---|
| CountVectorizerModel(String[] vocabulary) |
| CountVectorizerModel(String uid, String[] vocabulary) |
| Modifier and Type | Method | Description |
|---|---|---|
| CountVectorizerModel | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params. |
| double | getMinDF() | |
| double | getMinTF() | |
| int | getVocabSize() | |
| static CountVectorizerModel | load(String path) | |
| DoubleParam | minDF() | Specifies the minimum number of different documents a term must appear in to be included in the vocabulary. |
| DoubleParam | minTF() | Filter to ignore rare words in a document. |
| static MLReader<CountVectorizerModel> | read() | |
| CountVectorizerModel | setInputCol(String value) | |
| CountVectorizerModel | setMinTF(double value) | |
| CountVectorizerModel | setOutputCol(String value) | |
| DataFrame | transform(DataFrame dataset) | Transforms the input dataset. |
| StructType | transformSchema(StructType schema) | :: DeveloperApi :: Derives the output schema from the input schema. |
| String | uid() | An immutable unique ID for the object and its derivatives. |
| StructType | validateAndTransformSchema(StructType schema) | Validates and transforms the input schema. |
| IntParam | vocabSize() | Max size of the vocabulary. |
| String[] | vocabulary() | |
| MLWriter | write() | Returns an MLWriter instance for this ML instance. |
Methods inherited from class Transformer: transform, transform, transform
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn, validateParams
Methods inherited from interface Identifiable: toString
Methods inherited from interface MLWritable: save
Methods inherited from interface Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public CountVectorizerModel(String uid, String[] vocabulary)
public CountVectorizerModel(String[] vocabulary)
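The single-argument constructor has to generate a uid on its own. The plain-Java sketch below assumes it follows Spark's `Identifiable.randomUID` convention (a prefix, an underscore, and the last 12 characters of a random UUID); the `cntVecModel` prefix and the `RandomUidSketch` class are assumptions for illustration, not part of this API.

```java
import java.util.UUID;

public class RandomUidSketch {
    // Hypothetical re-implementation of the assumed uid scheme:
    // prefix + "_" + last 12 hex characters of a random UUID.
    static String randomUid(String prefix) {
        String uuid = UUID.randomUUID().toString(); // 36 characters, lowercase hex and dashes
        return prefix + "_" + uuid.substring(uuid.length() - 12);
    }

    public static void main(String[] args) {
        // "cntVecModel" is an assumed prefix for CountVectorizerModel uids.
        System.out.println(randomUid("cntVecModel"));
    }
}
```

Passing an explicit uid through the two-argument constructor instead keeps the identifier stable, which is useful when a model is recreated from a known vocabulary.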
public static MLReader<CountVectorizerModel> read()
public static CountVectorizerModel load(String path)
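public String uid()
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable
public String[] vocabulary()
The vocabulary terms; the term at index i of this array corresponds to index i of the count vectors produced by transform.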
public CountVectorizerModel setInputCol(String value)
public CountVectorizerModel setOutputCol(String value)
public CountVectorizerModel setMinTF(double value)
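public DataFrame transform(DataFrame dataset)
Transforms the input dataset.
Specified by: transform in class Transformer
Parameters: dataset - the input DataFrame; its input column must hold arrays of strings (token sequences)
public StructType transformSchema(StructType schema)
:: DeveloperApi :: Derives the output schema from the input schema.
Specified by: transformSchema in class PipelineStage
public CountVectorizerModel copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params.
Specified by: copy in interface Params
Specified by: copy in class Model<CountVectorizerModel>
See Also: defaultCopy()
public MLWriter write()
Returns an MLWriter instance for this ML instance.
Specified by: write in interface MLWritable
public IntParam vocabSize()
Max size of the vocabulary. CountVectorizer builds a vocabulary that only considers the top vocabSize terms, ordered by term frequency across the corpus.
Default: 2^18 (262144)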
public int getVocabSize()
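public DoubleParam minDF()
Specifies the minimum number of different documents a term must appear in to be included in the vocabulary. If this is an integer >= 1, it specifies the number of documents the term must appear in; if it is a double in [0,1), it specifies the fraction of documents.
Default: 1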
public double getMinDF()
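minDF and vocabSize only shape the vocabulary when CountVectorizer is fitted; the model itself just carries them. The plain-Java sketch below (not Spark's code) shows that selection under those semantics: keep terms whose document frequency reaches the effective minDF, then keep at most vocabSize of them by total corpus frequency. `VocabularySketch` and `buildVocabulary` are hypothetical names, and breaking frequency ties alphabetically is a choice made here for determinism, not documented Spark behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class VocabularySketch {
    // Hypothetical helper: builds a vocabulary from tokenized documents.
    static String[] buildVocabulary(List<List<String>> docs, double minDF, int vocabSize) {
        // minDF >= 1 is a document count; minDF in [0, 1) is a fraction of the corpus size.
        double effectiveMinDF = minDF >= 1.0 ? minDF : minDF * docs.size();
        Map<String, Long> termCounts = new HashMap<>(); // total occurrences in the corpus
        Map<String, Long> docFreqs = new HashMap<>();   // number of documents containing the term
        for (List<String> doc : docs) {
            for (String term : doc) {
                termCounts.merge(term, 1L, Long::sum);
            }
            for (String term : new HashSet<>(doc)) {    // count each term once per document
                docFreqs.merge(term, 1L, Long::sum);
            }
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Long> e : docFreqs.entrySet()) {
            if (e.getValue() >= effectiveMinDF) {
                kept.add(e.getKey());
            }
        }
        // Highest corpus frequency first; ties broken alphabetically (sketch's own choice).
        kept.sort(Comparator.comparing((String t) -> termCounts.get(t)).reversed()
                .thenComparing(Comparator.naturalOrder()));
        return kept.subList(0, Math.min(vocabSize, kept.size())).toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("a", "c"),
                Arrays.asList("b", "b", "b", "d"));
        System.out.println(Arrays.toString(buildVocabulary(docs, 2.0, 10))); // [b, a]
        System.out.println(Arrays.toString(buildVocabulary(docs, 1.0, 3)));  // [b, a, c]
    }
}
```

With minDF = 2.0, only "a" and "b" appear in two documents; "b" comes first because it occurs four times in the corpus versus three for "a".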
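public StructType validateAndTransformSchema(StructType schema)
Validates and transforms the input schema: the input column must be of type array<string>, and an output column of vector type is appended.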
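public DoubleParam minTF()
Filter to ignore rare words in a document. For each document, terms whose count is less than the given threshold are ignored. If this is an integer >= 1, it specifies a count (the number of times the term must appear in the document); if it is a double in [0,1), it specifies a fraction of the document's token count.
Note that this parameter is only used in transform of CountVectorizerModel and does not affect fitting.
Default: 1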
public double getMinTF()
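To make the interaction between vocabulary and minTF concrete, here is a plain-Java sketch of what transform does to a single document, under the minTF semantics described above. It is an illustration, not Spark's implementation: `CountVectorSketch` and `countVector` are hypothetical names, and a `TreeMap` of index to count stands in for the sparse vector of size vocabulary.length that the real output column holds. As we understand Spark's behavior, out-of-vocabulary tokens are not counted as terms but still add to the document's token count used for fractional minTF; treat that detail as an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CountVectorSketch {
    // Hypothetical helper: counts vocabulary terms in one tokenized document,
    // then drops entries whose count is below the effective minTF threshold.
    static TreeMap<Integer, Double> countVector(String[] vocabulary, List<String> document, double minTF) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < vocabulary.length; i++) {
            index.put(vocabulary[i], i);          // term at position i maps to vector index i
        }
        TreeMap<Integer, Double> counts = new TreeMap<>();
        long tokenCount = 0;
        for (String term : document) {
            Integer i = index.get(term);
            if (i != null) {                      // terms outside the vocabulary are ignored
                counts.merge(i, 1.0, Double::sum);
            }
            tokenCount++;                         // every token counts toward the document length
        }
        // minTF >= 1 is an absolute count; minTF in [0, 1) is a fraction of the token count.
        double threshold = minTF >= 1.0 ? minTF : minTF * tokenCount;
        counts.values().removeIf(c -> c < threshold);
        return counts;
    }

    public static void main(String[] args) {
        String[] vocab = {"a", "b", "c"};
        List<String> doc = Arrays.asList("a", "b", "a", "d", "a");
        System.out.println(countVector(vocab, doc, 1.0)); // {0=3.0, 1=1.0}
        System.out.println(countVector(vocab, doc, 2.0)); // {0=3.0}
        System.out.println(countVector(vocab, doc, 0.5)); // {0=3.0}
    }
}
```

With minTF = 0.5 the document has 5 tokens, so the threshold is 2.5: "a" (3 occurrences) survives while "b" (1 occurrence) is dropped, and "d" never appears because it is not in the vocabulary.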