public class ChiSqSelector
extends Object
implements scala.Serializable
The selector supports three selection methods: numTopFeatures, percentile, fpr.
- numTopFeatures chooses a fixed number of top features according to a chi-squared test.
- percentile is similar but chooses a fraction of all features instead of a fixed number.
- fpr chooses all features whose p-value is below a threshold, thus controlling the false positive rate of selection.
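The three rules above can be sketched in plain Java, operating on p-values that are assumed to be already computed by a chi-squared test. This is an illustration only, not the library's implementation: the class `SelectionRuleSketch`, its method names, the tie-breaking order, and the rounding of the fraction (truncation toward zero) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: mimics the three selection rules on precomputed p-values. */
final class SelectionRuleSketch {

    /** Indices of the k features with the smallest p-values, returned in ascending index order. */
    static List<Integer> topK(double[] pValues, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < pValues.length; i++) {
            order.add(i);
        }
        // Rank features from most to least significant (smallest p-value first).
        order.sort(Comparator.comparingDouble(i -> pValues[i]));
        List<Integer> kept = new ArrayList<>(order.subList(0, Math.min(k, order.size())));
        kept.sort(Comparator.naturalOrder());
        return kept;
    }

    /** Keeps a fraction of all features; the count is truncated (an assumption of this sketch). */
    static List<Integer> byPercentile(double[] pValues, double fraction) {
        return topK(pValues, (int) (pValues.length * fraction));
    }

    /** Keeps every feature whose p-value is strictly below the threshold. */
    static List<Integer> byFpr(double[] pValues, double threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < pValues.length; i++) {
            if (pValues[i] < threshold) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] pValues = {0.30, 0.01, 0.04, 0.60, 0.002};
        System.out.println(topK(pValues, 2));           // prints [1, 4]
        System.out.println(byPercentile(pValues, 0.5)); // prints [1, 4]
        System.out.println(byFpr(pValues, 0.05));       // prints [1, 2, 4]
    }
}
```

Note that the first two rules always return a predictable number of features, while the fpr rule returns however many features pass the threshold, which may be none.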
By default, the selection method is numTopFeatures, with the default number of top features set to 50.
Constructor and Description |
---|
ChiSqSelector() |
ChiSqSelector(int numTopFeatures) This is the same as calling this() and setNumTopFeatures(numTopFeatures). |
Modifier and Type | Method and Description |
---|---|
ChiSqSelectorModel | fit(RDD<LabeledPoint> data) Returns a ChiSquared feature selector. |
double | fpr() |
static String | FPR() String name for fpr selector type. |
int | numTopFeatures() |
static String | NumTopFeatures() String name for numTopFeatures selector type. |
double | percentile() |
static String | Percentile() String name for percentile selector type. |
String | selectorType() |
ChiSqSelector | setFpr(double value) |
ChiSqSelector | setNumTopFeatures(int value) |
ChiSqSelector | setPercentile(double value) |
ChiSqSelector | setSelectorType(String value) |
static String[] | supportedSelectorTypes() Set of selector types that ChiSqSelector supports. |
public ChiSqSelector()
public ChiSqSelector(int numTopFeatures)
numTopFeatures - (undocumented)
public static String NumTopFeatures()
String name for numTopFeatures selector type.
public static String Percentile()
String name for percentile selector type.
public static String FPR()
String name for fpr selector type.
public static String[] supportedSelectorTypes()
public int numTopFeatures()
public double percentile()
public double fpr()
public String selectorType()
public ChiSqSelector setNumTopFeatures(int value)
public ChiSqSelector setPercentile(double value)
public ChiSqSelector setFpr(double value)
public ChiSqSelector setSelectorType(String value)
public ChiSqSelectorModel fit(RDD<LabeledPoint> data)
data - an RDD[LabeledPoint] containing the labeled dataset with categorical features. Real-valued features will be treated as categorical for each distinct value. Apply a feature discretizer before using this function.
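To see why discretizing first matters, the sketch below bins a real-valued feature into equal-width buckets and compares the number of distinct values (and therefore categories the test would see) before and after. It is plain Java with no Spark dependency; the class `DiscretizeSketch`, its methods, the sample values, and the bin width of 16 are hypothetical choices for illustration, and equal-width binning is only one possible discretizer.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: equal-width binning of a single real-valued feature. */
final class DiscretizeSketch {

    /** Maps each value to the index of its bin, computed as floor(value / binWidth). */
    static double[] toBins(double[] values, double binWidth) {
        double[] binned = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            binned[i] = Math.floor(values[i] / binWidth);
        }
        return binned;
    }

    /** Number of distinct values, i.e. the number of categories a per-value treatment would create. */
    static int distinctCount(double[] values) {
        Set<Double> seen = new HashSet<>();
        for (double v : values) {
            seen.add(v);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        double[] raw = {3.0, 17.5, 18.2, 200.0, 201.9, 255.0};
        double[] binned = toBins(raw, 16.0);
        System.out.println(distinctCount(raw));    // prints 6
        System.out.println(distinctCount(binned)); // prints 4
    }
}
```

With raw values, nearly every observation becomes its own category, which leaves few counts per category for a chi-squared test. Binning collapses nearby values into a shared category.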