public class RegexTokenizer extends UnaryTransformer<java.lang.String,scala.collection.Seq<java.lang.String>,RegexTokenizer>
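A regex-based tokenizer that extracts tokens either by using the provided regex pattern to split the text (the default) or by repeatedly matching the regex (if gaps is false).
Optional parameters also allow filtering out tokens shorter than a minimum length.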
It returns an array of strings that can be empty.

| Constructor and Description |
|---|
| RegexTokenizer() |
| RegexTokenizer(java.lang.String uid) |
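
To make the two modes in the class description concrete, here is a minimal plain-Java sketch of the behavior as described: split on the pattern when gaps is true, collect successive matches when gaps is false, then drop tokens shorter than the minimum length. It uses only java.util.regex and is not Spark's implementation; the class name RegexTokenizerSketch and the method tokenize are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; it does not depend on Spark.
class RegexTokenizerSketch {

    // Mirrors the documented parameters: pattern, gaps, minTokenLength.
    static List<String> tokenize(String text, String pattern, boolean gaps, int minTokenLength) {
        Pattern re = Pattern.compile(pattern);
        List<String> candidates = new ArrayList<>();
        if (gaps) {
            // gaps == true: the pattern describes the delimiters between tokens.
            for (String piece : re.split(text)) {
                candidates.add(piece);
            }
        } else {
            // gaps == false: the pattern describes the tokens themselves.
            Matcher m = re.matcher(text);
            while (m.find()) {
                candidates.add(m.group());
            }
        }
        // Keep only tokens that are at least minTokenLength characters long.
        List<String> tokens = new ArrayList<>();
        for (String candidate : candidates) {
            if (candidate.length() >= minTokenLength) {
                tokens.add(candidate);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Split on runs of whitespace.
        System.out.println(tokenize("Hello  Spark ML", "\\s+", true, 1)); // [Hello, Spark, ML]
        // Match runs of digits, keeping only those with two or more characters.
        System.out.println(tokenize("a1 b22 c333", "\\d+", false, 2));    // [22, 333]
    }
}
```

The same input can yield very different tokens depending on gaps, so the pattern should be written with the chosen mode in mind: a delimiter expression for gaps = true, a token expression for gaps = false.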

| Modifier and Type | Method and Description |
|---|---|
| RegexTokenizer | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params. |
| protected scala.Function1<java.lang.String,scala.collection.Seq<java.lang.String>> | createTransformFunc(): Creates the transform function using the given param map. |
| BooleanParam | gaps(): Indicates whether regex splits on gaps (true) or matches tokens (false). |
| boolean | getGaps() |
| int | getMinTokenLength() |
| java.lang.String | getPattern() |
| IntParam | minTokenLength(): Minimum token length, >= 0. |
| protected DataType | outputDataType(): Returns the data type of the output column. |
| Param<java.lang.String> | pattern(): Regex pattern used to match delimiters if gaps is true or tokens if gaps is false. |
| RegexTokenizer | setGaps(boolean value) |
| RegexTokenizer | setMinTokenLength(int value) |
| RegexTokenizer | setPattern(java.lang.String value) |
| java.lang.String | uid(): An immutable unique ID for the object and its derivatives. |
| protected void | validateInputType(DataType inputType): Validates the input type. |

Methods inherited from class UnaryTransformer: setInputCol, setOutputCol, transform, transformSchema
Methods inherited from class Transformer: transform, transform, transform
Methods inherited from class PipelineStage: transformSchema
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn, validateParams
Methods inherited from interface Identifiable: toString

public RegexTokenizer(java.lang.String uid)
public RegexTokenizer()
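
public java.lang.String uid()
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.

public IntParam minTokenLength()
Minimum token length, >= 0.

public RegexTokenizer setMinTokenLength(int value)

public int getMinTokenLength()

public BooleanParam gaps()
Indicates whether regex splits on gaps (true) or matches tokens (false).

public RegexTokenizer setGaps(boolean value)

public boolean getGaps()

public Param<java.lang.String> pattern()
Regex pattern used to match delimiters if gaps is true or tokens if gaps is false.
Default: "\\s+"

public RegexTokenizer setPattern(java.lang.String value)

public java.lang.String getPattern()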
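
The interaction between the default pattern and the minimum token length is easy to overlook, so here is a small plain-Java illustration. It assumes that splitting on gaps behaves like java.util.regex.Pattern.split, which is an assumption about the semantics rather than a statement about Spark's code; DefaultPatternSketch and splitOnGaps are hypothetical names. Note that "\\s+" is the Java string literal for the regex \s+ (one or more whitespace characters).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class; it does not depend on Spark.
class DefaultPatternSketch {

    // Split text on delimiterRegex, then keep pieces with length >= minTokenLength.
    static List<String> splitOnGaps(String text, String delimiterRegex, int minTokenLength) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(delimiterRegex).split(text)) {
            if (piece.length() >= minTokenLength) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String defaultPattern = "\\s+"; // regex \s+

        // Leading whitespace produces an empty first piece when nothing is filtered.
        System.out.println(splitOnGaps("  to be  or not", defaultPattern, 0)); // [, to, be, or, not]

        // A minimum length of 1 removes that empty piece.
        System.out.println(splitOnGaps("  to be  or not", defaultPattern, 1)); // [to, be, or, not]

        // Whitespace-only input yields no tokens at all, so the result can be empty.
        System.out.println(splitOnGaps("   ", defaultPattern, 1));             // []
    }
}
```

Under these assumptions, a minimum token length of 0 lets empty strings through, while any value of 1 or more filters them out.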
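
protected scala.Function1<java.lang.String,scala.collection.Seq<java.lang.String>> createTransformFunc()
Description copied from class: UnaryTransformer
Creates the transform function using the given param map.
Specified by: createTransformFunc in class UnaryTransformer<java.lang.String,scala.collection.Seq<java.lang.String>,RegexTokenizer>

protected void validateInputType(DataType inputType)
Description copied from class: UnaryTransformer
Validates the input type.
Overrides: validateInputType in class UnaryTransformer<java.lang.String,scala.collection.Seq<java.lang.String>,RegexTokenizer>
Parameters: inputType - (undocumented)

protected DataType outputDataType()
Description copied from class: UnaryTransformer
Returns the data type of the output column.
Specified by: outputDataType in class UnaryTransformer<java.lang.String,scala.collection.Seq<java.lang.String>,RegexTokenizer>

public RegexTokenizer copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params.
Specified by: copy in interface Params
Overrides: copy in class UnaryTransformer<java.lang.String,scala.collection.Seq<java.lang.String>,RegexTokenizer>
Parameters: extra - (undocumented)
See Also: defaultCopy()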