public class VectorIndexerModel extends Model<VectorIndexerModel>
The transformation performed by this model maintains vector sparsity.
param: numFeatures Number of features, i.e., the length of the Vectors which this model transforms.
param: categoryMaps Feature value index. Keys are categorical feature indices (column indices). Values are maps from original feature values to 0-based category indices. If a feature is not in this map, it is treated as continuous.
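The role of categoryMaps described above can be illustrated with a minimal, Spark-free sketch. The class `CategoryMapSketch` and the method `applyCategoryMaps` are hypothetical names introduced here for illustration only; they are not part of the Spark API. The sketch works on a plain `double[]` rather than a Spark Vector, and it assumes that a categorical value missing from the map is an error.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Spark): applies a category map of the shape
// returned by javaCategoryMaps() to a plain array of feature values.
class CategoryMapSketch {

    // Returns a copy of `features` in which every categorical feature value is
    // replaced by its 0-based category index. Features whose index is not a key
    // of `categoryMaps` are treated as continuous and left unchanged.
    static double[] applyCategoryMaps(double[] features,
                                      Map<Integer, Map<Double, Integer>> categoryMaps) {
        double[] out = Arrays.copyOf(features, features.length);
        for (Map.Entry<Integer, Map<Double, Integer>> entry : categoryMaps.entrySet()) {
            int featureIndex = entry.getKey();
            Integer categoryIndex = entry.getValue().get(features[featureIndex]);
            if (categoryIndex == null) {
                // Assumption of this sketch: values not seen in the map are rejected.
                throw new IllegalArgumentException(
                    "Feature " + featureIndex + " has unknown value " + features[featureIndex]);
            }
            out[featureIndex] = categoryIndex;
        }
        return out;
    }

    public static void main(String[] args) {
        // Feature 0 is categorical with three known values; feature 1 is continuous.
        Map<Double, Integer> feature0 = new HashMap<>();
        feature0.put(0.0, 0);
        feature0.put(-1.0, 1);
        feature0.put(5.0, 2);
        Map<Integer, Map<Double, Integer>> categoryMaps = new HashMap<>();
        categoryMaps.put(0, feature0);

        double[] indexed = applyCategoryMaps(new double[] {5.0, 3.7}, categoryMaps);
        System.out.println(Arrays.toString(indexed)); // prints [2.0, 3.7]
    }
}
```

In this sketch the value 5.0 in column 0 becomes category index 2, while column 1 passes through untouched because it has no entry in the map.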
Modifier and Type | Method and Description |
---|---|
scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> | categoryMaps() |
int | getMaxCategories() |
java.util.Map<Integer,java.util.Map<Double,Integer>> | javaCategoryMaps(): Java-friendly version of categoryMaps. |
IntParam | maxCategories(): Threshold for the number of values a categorical feature can take. |
int | numFeatures() |
VectorIndexerModel | setInputCol(String value) |
VectorIndexerModel | setOutputCol(String value) |
DataFrame | transform(DataFrame dataset): Transforms the input dataset. |
StructType | transformSchema(StructType schema): :: DeveloperApi :: |
String | uid() |
Methods inherited from class Transformer: transform, transform, transform
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Params: clear, copy, copyValues, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn, validateParams
Methods inherited from interface Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public String uid()

public int numFeatures()

public scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> categoryMaps()

public java.util.Map<Integer,java.util.Map<Double,Integer>> javaCategoryMaps()
Java-friendly version of categoryMaps.

public VectorIndexerModel setInputCol(String value)

public VectorIndexerModel setOutputCol(String value)
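
public DataFrame transform(DataFrame dataset)
Description copied from class: Transformer
Transforms the input dataset.
Specified by: transform in class Transformer
Parameters: dataset - (undocumented)

public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
:: DeveloperApi :: Derives the output schema from the input schema.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)

public IntParam maxCategories()
Threshold for the number of values a categorical feature can take. (default = 20)

public int getMaxCategories()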
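
The maxCategories threshold can be pictured with a small, Spark-free sketch. It assumes that a feature counts as categorical when its number of distinct values does not exceed maxCategories and is treated as continuous otherwise; that rule is an assumption of this sketch and is not spelled out in the text above. The names `MaxCategoriesSketch` and `isCategorical` are hypothetical and are not Spark APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of Spark): decides whether one feature column
// would be treated as categorical under a maxCategories threshold.
class MaxCategoriesSketch {

    // Assumption: a column is categorical when it has at most `maxCategories`
    // distinct values; a column with more distinct values is continuous.
    static boolean isCategorical(double[] columnValues, int maxCategories) {
        Set<Double> distinct = new HashSet<>();
        for (double value : columnValues) {
            distinct.add(value);
            if (distinct.size() > maxCategories) {
                return false; // too many distinct values: stop counting early
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] fewValues = {0.0, 1.0, 1.0, 2.0, 0.0};   // 3 distinct values
        double[] manyValues = {0.1, 0.2, 0.3, 0.4, 0.5};  // 5 distinct values

        System.out.println(isCategorical(fewValues, 3));  // prints true
        System.out.println(isCategorical(manyValues, 3)); // prints false
    }
}
```

With the default of 20, a column would need more than 20 distinct values to be treated as continuous under this assumed rule.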