public class OneHotEncoderEstimator extends Estimator<OneHotEncoderModel> implements DefaultParamsWritable
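A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index. For example, with 5 categories, an input value of 2.0 maps to the output vector [0.0, 0.0, 1.0, 0.0]. The last category is not included by default (configurable via dropLast), because including it makes the vector entries sum up to one, and hence linearly dependent. So an input value of 4.0 maps to [0.0, 0.0, 0.0, 0.0].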
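The encoding rule described above can be sketched in plain Java. This is a minimal illustration, not Spark's implementation: the class name OneHotSketch and the method encode are hypothetical, the example assumes 5 categories, and it returns a dense array to keep the output easy to read.

```java
import java.util.Arrays;

// Illustrative sketch only: a plain-Java model of the encoding rule, not Spark code.
final class OneHotSketch {

    // Encodes a 0-based categoryIndex out of numCategories as a dense vector.
    // With dropLast, the vector has numCategories - 1 slots, so the last
    // category is represented by an all-zeros vector.
    static double[] encode(int categoryIndex, int numCategories, boolean dropLast) {
        if (categoryIndex < 0 || categoryIndex >= numCategories) {
            throw new IllegalArgumentException("category index out of range: " + categoryIndex);
        }
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] vector = new double[size];
        if (categoryIndex < size) {
            vector[categoryIndex] = 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(2, 5, true)));  // [0.0, 0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, true)));  // [0.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, false))); // [0.0, 0.0, 0.0, 0.0, 1.0]
    }
}
```

With dropLast set to false, every category gets its own slot, so the entries of each vector always sum to one.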
See also: StringIndexer, for converting categorical values into category indices.
When handleInvalid is configured to 'keep', an extra "category" indicating invalid values is added as the last category. So when dropLast is true, invalid values are encoded as an all-zeros vector.
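The interaction between handleInvalid = 'keep' and dropLast can be sketched in plain Java as follows. This is an assumption-laden illustration rather than Spark's code: the class KeepInvalidSketch and its encode method are hypothetical, and an "invalid" value is modeled here simply as an index outside the range of known categories.

```java
// Illustrative sketch only: models how 'keep' adds one extra last category.
final class KeepInvalidSketch {

    static double[] encode(int categoryIndex, int numCategories,
                           boolean dropLast, boolean keepInvalid) {
        boolean valid = categoryIndex >= 0 && categoryIndex < numCategories;
        if (!valid && !keepInvalid) {
            throw new IllegalArgumentException("invalid category index: " + categoryIndex);
        }
        // With 'keep', one extra category is appended after the real ones.
        int totalCategories = keepInvalid ? numCategories + 1 : numCategories;
        // Invalid values are assigned to that extra last category.
        int index = valid ? categoryIndex : numCategories;
        // dropLast removes the slot of whichever category is last.
        int size = dropLast ? totalCategories - 1 : totalCategories;
        double[] vector = new double[size];
        if (index < size) {
            vector[index] = 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        // 5 known categories, 'keep' enabled, dropLast = true:
        // the invalid category is last, so it is the one that gets dropped.
        System.out.println(java.util.Arrays.toString(encode(7, 5, true, true)));
        // Same input with dropLast = false: the invalid value gets its own slot.
        System.out.println(java.util.Arrays.toString(encode(7, 5, false, true)));
    }
}
```

Note that in this sketch, with 'keep' and dropLast both enabled, the last real category keeps its own slot, because the extra invalid category is the one that is dropped.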
Note: When encoding multiple columns using the inputCols and outputCols params, input and output columns come in pairs, specified by the order in the arrays, and each pair is treated independently.
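The positional pairing of inputCols and outputCols can be illustrated with a small plain-Java sketch. The class ColumnPairingSketch, the method pair, and the column names are hypothetical, and rejecting arrays of different lengths is an assumption of this sketch rather than a statement from this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: shows that pairing is by array position.
final class ColumnPairingSketch {

    // Pairs inputCols[i] with outputCols[i], preserving array order.
    static Map<String, String> pair(String[] inputCols, String[] outputCols) {
        // Assumption of this sketch: both arrays must have the same length.
        if (inputCols.length != outputCols.length) {
            throw new IllegalArgumentException(
                "expected the same number of input and output columns");
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < inputCols.length; i++) {
            pairs.put(inputCols[i], outputCols[i]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical column names.
        String[] in = {"colorIndex", "sizeIndex"};
        String[] out = {"colorVec", "sizeVec"};
        // Each pair would be encoded on its own, with its own category count.
        pair(in, out).forEach((i, o) -> System.out.println(i + " -> " + o));
    }
}
```

Because each pair is treated independently, reordering one array without reordering the other changes which output column receives which encoding.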
Constructor and Description |
---|
OneHotEncoderEstimator() |
OneHotEncoderEstimator(String uid) |
Modifier and Type | Method and Description |
---|---|
static Params | clear(Param<?> param) |
OneHotEncoderEstimator | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params. |
static BooleanParam | dropLast() |
BooleanParam | dropLast(): Whether to drop the last category in the encoded vector (default: true) |
static String | explainParam(Param<?> param) |
static String | explainParams() |
static ParamMap | extractParamMap() |
static ParamMap | extractParamMap(ParamMap extra) |
OneHotEncoderModel | fit(Dataset<?> dataset): Fits a model to the input data. |
static <T> scala.Option<T> | get(Param<T> param) |
static <T> scala.Option<T> | getDefault(Param<T> param) |
static boolean | getDropLast() |
boolean | getDropLast() |
static String | getHandleInvalid() |
static String[] | getInputCols() |
static <T> T | getOrDefault(Param<T> param) |
static String[] | getOutputCols() |
static Param<Object> | getParam(String paramName) |
static Param<String> | handleInvalid() |
Param<String> | handleInvalid(): Param for how to handle invalid data during transform(). |
static <T> boolean | hasDefault(Param<T> param) |
static boolean | hasParam(String paramName) |
static StringArrayParam | inputCols() |
static boolean | isDefined(Param<?> param) |
static boolean | isSet(Param<?> param) |
static OneHotEncoderEstimator | load(String path) |
static StringArrayParam | outputCols() |
static Param<?>[] | params() |
static void | save(String path) |
static <T> Params | set(Param<T> param, T value) |
OneHotEncoderEstimator | setDropLast(boolean value) |
OneHotEncoderEstimator | setHandleInvalid(String value) |
OneHotEncoderEstimator | setInputCols(String[] values) |
OneHotEncoderEstimator | setOutputCols(String[] values) |
static String | toString() |
StructType | transformSchema(StructType schema) :: DeveloperApi :: |
String | uid(): An immutable unique ID for the object and its derivatives. |
StructType | validateAndTransformSchema(StructType schema, boolean dropLast, boolean keepInvalid) |
static MLWriter | write() |
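Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface HasHandleInvalid: getHandleInvalid
Methods inherited from interface HasInputCols: getInputCols, inputCols
Methods inherited from interface HasOutputCols: getOutputCols, outputCols
Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface Identifiable: toString
Methods inherited from interface DefaultParamsWritable: write
Methods inherited from interface MLWritable: save
Methods inherited from interface Logging: initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning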
public OneHotEncoderEstimator(String uid)
public OneHotEncoderEstimator()
public static OneHotEncoderEstimator load(String path)
public static String toString()
public static Param<?>[] params()
public static String explainParam(Param<?> param)
public static String explainParams()
public static final boolean isSet(Param<?> param)
public static final boolean isDefined(Param<?> param)
public static boolean hasParam(String paramName)
public static Param<Object> getParam(String paramName)
public static final <T> scala.Option<T> get(Param<T> param)
public static final <T> T getOrDefault(Param<T> param)
public static final <T> scala.Option<T> getDefault(Param<T> param)
public static final <T> boolean hasDefault(Param<T> param)
public static final ParamMap extractParamMap()
public static final String getHandleInvalid()
public static final StringArrayParam inputCols()
public static final String[] getInputCols()
public static final StringArrayParam outputCols()
public static final String[] getOutputCols()
public static Param<String> handleInvalid()
public static final BooleanParam dropLast()
public static boolean getDropLast()
public static void save(String path) throws java.io.IOException
Throws: java.io.IOException
public static MLWriter write()
public String uid()
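Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable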
public OneHotEncoderEstimator setInputCols(String[] values)
public OneHotEncoderEstimator setOutputCols(String[] values)
public OneHotEncoderEstimator setDropLast(boolean value)
public OneHotEncoderEstimator setHandleInvalid(String value)
public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
Check transform validity and derive the output schema from the input schema.
We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by Param.validate().
A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)
public OneHotEncoderModel fit(Dataset<?> dataset)
Description copied from class: Estimator
Fits a model to the input data.
Specified by: fit in class Estimator<OneHotEncoderModel>
Parameters: dataset - (undocumented)
public OneHotEncoderEstimator copy(ParamMap extra)
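Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Estimator<OneHotEncoderModel>
Parameters: extra - (undocumented)
public Param<String> handleInvalid()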
Param for how to handle invalid data during transform().
Specified by: handleInvalid in interface HasHandleInvalid
public BooleanParam dropLast()
public boolean getDropLast()
public StructType validateAndTransformSchema(StructType schema, boolean dropLast, boolean keepInvalid)