public class GBTRegressor extends Regressor<Vector,GBTRegressor,GBTRegressionModel> implements GBTRegressorParams, DefaultParamsWritable, org.apache.spark.internal.Logging
The implementation is based upon: J.H. Friedman. "Stochastic Gradient Boosting." 1999.
Notes on Gradient Boosting vs. TreeBoost:
 - This implementation is for Stochastic Gradient Boosting, not for TreeBoost.
 - Both algorithms learn tree ensembles by minimizing loss functions.
 - TreeBoost (Friedman, 1999) additionally modifies the outputs at tree leaf nodes based on the loss function, whereas the original gradient boosting method does not.
 - When the loss is SquaredError, these methods give the same result, but they could differ for other loss functions.
 - We expect to implement TreeBoost in the future: [https://issues.apache.org/jira/browse/SPARK-4240]
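The difference between the two leaf rules in the notes above can be illustrated on a single leaf. The following is a minimal sketch, not Spark code: the class and method names are hypothetical, the squared loss is assumed to be written as 0.5 * (label - prediction)^2, and the input array is assumed to be non-empty. It compares the mean negative gradient (plain gradient boosting) with the loss-minimizing constant (TreeBoost) for squared and absolute loss.

```java
import java.util.Arrays;

public class LeafValueSketch {

    /** Plain gradient boosting: the leaf predicts the mean negative gradient. */
    public static double gradientLeaf(double[] residuals, boolean squaredLoss) {
        double sum = 0.0;
        for (double r : residuals) {
            // Negative gradient w.r.t. the prediction: r for 0.5 * r^2, sign(r) for |r|.
            sum += squaredLoss ? r : Math.signum(r);
        }
        return sum / residuals.length;
    }

    /** TreeBoost: the leaf predicts the constant that minimizes the loss inside the leaf. */
    public static double treeBoostLeaf(double[] residuals, boolean squaredLoss) {
        if (squaredLoss) {
            // The mean minimizes the summed squared error.
            return Arrays.stream(residuals).average().orElse(0.0);
        }
        // The median minimizes the summed absolute error.
        double[] sorted = residuals.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    }

    public static void main(String[] args) {
        // Residuals (label minus current prediction) of the instances in one leaf.
        double[] residuals = {-1.0, 0.5, 4.0};
        System.out.println("squared:  " + gradientLeaf(residuals, true)
                + " vs " + treeBoostLeaf(residuals, true));
        System.out.println("absolute: " + gradientLeaf(residuals, false)
                + " vs " + treeBoostLeaf(residuals, false));
    }
}
```

Under these assumptions the two rules agree for squared loss, because the negative gradient is the residual itself and its mean is also the loss minimizer. For absolute loss they differ, because the mean of the residual signs is generally not the median of the residuals.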
| Constructor and Description |
|---|
| GBTRegressor() |
| GBTRegressor(String uid) |
| Modifier and Type | Method and Description |
|---|---|
| BooleanParam | cacheNodeIds(): If false, the algorithm will pass trees to executors to match instances with nodes. |
| IntParam | checkpointInterval(): Param for setting the checkpoint interval (>= 1) or disabling checkpointing (-1). |
| GBTRegressor | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params. |
| Param<String> | featureSubsetStrategy(): The number of features to consider for splits at each tree node. |
| Param<String> | impurity(): Criterion used for information gain calculation (case-insensitive). |
| Param<String> | leafCol(): Leaf indices column name. |
| static GBTRegressor | load(String path) |
| Param<String> | lossType(): Loss function which GBT tries to minimize. |
| IntParam | maxBins(): Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. |
| IntParam | maxDepth(): Maximum depth of the tree (nonnegative). |
| IntParam | maxIter(): Param for maximum number of iterations (>= 0). |
| IntParam | maxMemoryInMB(): Maximum memory in MB allocated to histogram aggregation. |
| DoubleParam | minInfoGain(): Minimum information gain for a split to be considered at a tree node. |
| IntParam | minInstancesPerNode(): Minimum number of instances each child must have after split. |
| DoubleParam | minWeightFractionPerNode(): Minimum fraction of the weighted sample count that each child must have after split. |
| static MLReader<T> | read() |
| LongParam | seed(): Param for random seed. |
| GBTRegressor | setCacheNodeIds(boolean value) |
| GBTRegressor | setCheckpointInterval(int value): Specifies how often to checkpoint the cached node IDs. |
| GBTRegressor | setFeatureSubsetStrategy(String value) |
| GBTRegressor | setImpurity(String value): The impurity setting is ignored for GBT models. |
| GBTRegressor | setLossType(String value) |
| GBTRegressor | setMaxBins(int value) |
| GBTRegressor | setMaxDepth(int value) |
| GBTRegressor | setMaxIter(int value) |
| GBTRegressor | setMaxMemoryInMB(int value) |
| GBTRegressor | setMinInfoGain(double value) |
| GBTRegressor | setMinInstancesPerNode(int value) |
| GBTRegressor | setMinWeightFractionPerNode(double value) |
| GBTRegressor | setSeed(long value) |
| GBTRegressor | setStepSize(double value) |
| GBTRegressor | setSubsamplingRate(double value) |
| GBTRegressor | setValidationIndicatorCol(String value) |
| GBTRegressor | setWeightCol(String value): Sets the value of param weightCol. |
| DoubleParam | stepSize(): Param for step size (a.k.a. learning rate) in the interval (0, 1], used to shrink the contribution of each estimator. |
| DoubleParam | subsamplingRate(): Fraction of the training data used for learning each decision tree, in range (0, 1]. |
| static String[] | supportedLossTypes(): Accessor for supported loss settings: squared (L2), absolute (L1). |
| String | uid(): An immutable unique ID for the object and its derivatives. |
| Param<String> | validationIndicatorCol(): Param for name of the column that indicates whether each row is for training or for validation. |
| DoubleParam | validationTol(): Threshold for stopping early when fit with validation is used. |
| Param<String> | weightCol(): Param for weight column name. |
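As an illustration of how featureSubsetStrategy determines "the number of features to consider for splits at each tree node", the sketch below maps the named strategies log2, sqrt and onethird to a feature count. This is not the library's implementation: the class and method names are hypothetical, and the rounding rules (ceiling, with a floor of 1 for log2) are assumptions that should be checked against the Spark version in use.

```java
import java.util.Locale;

public class FeatureSubsetSketch {

    /**
     * Hypothetical helper: number of features examined at each node for a named strategy.
     * The rounding behaviour is an assumption, not something this page specifies.
     */
    public static int featuresPerNode(String strategy, int numFeatures) {
        switch (strategy.toLowerCase(Locale.ROOT)) {
            case "sqrt":
                return (int) Math.ceil(Math.sqrt(numFeatures));
            case "log2":
                // Clamp to 1 so that a single-feature dataset still has a feature to split on.
                return Math.max(1, (int) Math.ceil(Math.log(numFeatures) / Math.log(2)));
            case "onethird":
                return (int) Math.ceil(numFeatures / 3.0);
            default:
                throw new IllegalArgumentException("Unsupported strategy in this sketch: " + strategy);
        }
    }

    public static void main(String[] args) {
        int numFeatures = 100;
        for (String s : new String[] {"sqrt", "log2", "onethird"}) {
            System.out.println(s + " -> " + featuresPerNode(s, numFeatures));
        }
    }
}
```

With 100 features, the sketch yields 10 for sqrt, 7 for log2 and 34 for onethird. Smaller subsets make each tree cheaper to train and less correlated with the others, at the cost of weaker individual splits.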
Inherited methods:
 - featuresCol, fit, labelCol, predictionCol, setFeaturesCol, setLabelCol, setPredictionCol, transformSchema
 - params
 - equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 - convertToOldLossType, getLossType, getOldLossType
 - getOldBoostingStrategy, getValidationTol
 - getMaxIter
 - getStepSize
 - getValidationIndicatorCol
 - validateAndTransformSchema
 - getFeatureSubsetStrategy, getOldStrategy, getSubsamplingRate
 - getCacheNodeIds, getLeafCol, getMaxBins, getMaxDepth, getMaxMemoryInMB, getMinInfoGain, getMinInstancesPerNode, getMinWeightFractionPerNode, getOldStrategy, setLeafCol
 - extractInstances, extractInstances
 - getLabelCol, labelCol
 - featuresCol, getFeaturesCol
 - getPredictionCol, predictionCol
 - clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
 - toString
 - getCheckpointInterval
 - getWeightCol
 - getImpurity, getOldImpurity
 - write
 - save
 - $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize

public static final String[] supportedLossTypes()
public static GBTRegressor load(String path)
public static MLReader<T> read()
public Param<String> lossType()
Description copied from interface: GBTRegressorParams
Specified by: lossType in interface GBTRegressorParams

public final Param<String> impurity()
Description copied from interface: HasVarianceImpurity
Specified by: impurity in interface HasVarianceImpurity

public final DoubleParam validationTol()
Description copied from interface: GBTParams
Specified by: validationTol in interface GBTParams
See Also: validationIndicatorCol

public final DoubleParam stepSize()
Description copied from interface: GBTParams
Specified by: stepSize in interface HasStepSize
Specified by: stepSize in interface GBTParams

public final Param<String> validationIndicatorCol()
Description copied from interface: HasValidationIndicatorCol
Specified by: validationIndicatorCol in interface HasValidationIndicatorCol

public final IntParam maxIter()
Description copied from interface: HasMaxIter
Specified by: maxIter in interface HasMaxIter

public final DoubleParam subsamplingRate()
Description copied from interface: TreeEnsembleParams
Specified by: subsamplingRate in interface TreeEnsembleParams

public final Param<String> featureSubsetStrategy()
Description copied from interface: TreeEnsembleParams
These various settings are based on the following references:
 - log2: tested in Breiman (2001)
 - sqrt: recommended by the Breiman manual for random forests
 - The defaults of sqrt (classification) and onethird (regression) match the R randomForest package.
Specified by: featureSubsetStrategy in interface TreeEnsembleParams

public final Param<String> leafCol()
Description copied from interface: DecisionTreeParams
Specified by: leafCol in interface DecisionTreeParams

public final IntParam maxDepth()
Description copied from interface: DecisionTreeParams
Specified by: maxDepth in interface DecisionTreeParams

public final IntParam maxBins()
Description copied from interface: DecisionTreeParams
Specified by: maxBins in interface DecisionTreeParams

public final IntParam minInstancesPerNode()
Description copied from interface: DecisionTreeParams
Specified by: minInstancesPerNode in interface DecisionTreeParams

public final DoubleParam minWeightFractionPerNode()
Description copied from interface: DecisionTreeParams
Specified by: minWeightFractionPerNode in interface DecisionTreeParams

public final DoubleParam minInfoGain()
Description copied from interface: DecisionTreeParams
Specified by: minInfoGain in interface DecisionTreeParams

public final IntParam maxMemoryInMB()
Description copied from interface: DecisionTreeParams
Specified by: maxMemoryInMB in interface DecisionTreeParams

public final BooleanParam cacheNodeIds()
Description copied from interface: DecisionTreeParams
Specified by: cacheNodeIds in interface DecisionTreeParams

public final Param<String> weightCol()
Description copied from interface: HasWeightCol
Specified by: weightCol in interface HasWeightCol

public final LongParam seed()
Description copied from interface: HasSeed

public final IntParam checkpointInterval()
Description copied from interface: HasCheckpointInterval
Specified by: checkpointInterval in interface HasCheckpointInterval

public String uid()
Description copied from interface: Identifiable
Specified by: uid in interface Identifiable

public GBTRegressor setMaxDepth(int value)
public GBTRegressor setMaxBins(int value)
public GBTRegressor setMinInstancesPerNode(int value)
public GBTRegressor setMinWeightFractionPerNode(double value)
public GBTRegressor setMinInfoGain(double value)
public GBTRegressor setMaxMemoryInMB(int value)
public GBTRegressor setCacheNodeIds(boolean value)
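public GBTRegressor setCheckpointInterval(int value)
Specifies how often to checkpoint the cached node IDs. This is only used if cacheNodeIds is true and the checkpoint directory is set in the SparkContext. Must be at least 1. (default = 10)
Parameters: value - (undocumented)

public GBTRegressor setImpurity(String value)
The impurity setting is ignored for GBT models.
Parameters: value - (undocumented)

public GBTRegressor setSubsamplingRate(double value)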
public GBTRegressor setSeed(long value)
public GBTRegressor setMaxIter(int value)
public GBTRegressor setStepSize(double value)
public GBTRegressor setLossType(String value)
public GBTRegressor setFeatureSubsetStrategy(String value)
public GBTRegressor setValidationIndicatorCol(String value)
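public GBTRegressor setWeightCol(String value)
Sets the value of param weightCol. If this is not set or empty, all instance weights are treated as 1.0. By default the weightCol is not set, so all instances have weight 1.0.
Parameters: value - (undocumented)

public GBTRegressor copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Predictor<Vector,GBTRegressor,GBTRegressionModel>
Parameters: extra - (undocumented)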
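The weightCol default described for setWeightCol (an unset or empty column name means every instance has weight 1.0) can be sketched in plain Java. The example below is illustrative only and does not use Spark: the class, the method names and the map of named columns are hypothetical stand-ins for a dataset, and the weighted mean is used here simply as a quantity that visibly depends on the weights.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class WeightColSketch {

    /** Resolves per-instance weights; an unset or empty column name yields all 1.0. */
    public static double[] resolveWeights(Map<String, double[]> columns, String weightCol, int n) {
        if (weightCol == null || weightCol.isEmpty()) {
            double[] ones = new double[n];
            Arrays.fill(ones, 1.0);
            return ones;
        }
        double[] weights = columns.get(weightCol);
        if (weights == null || weights.length != n) {
            throw new IllegalArgumentException("Missing or mismatched weight column: " + weightCol);
        }
        return weights;
    }

    /** Weighted mean of the labels, using the resolved weights. */
    public static double weightedMean(double[] labels, Map<String, double[]> columns, String weightCol) {
        double[] w = resolveWeights(columns, weightCol, labels.length);
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < labels.length; i++) {
            num += w[i] * labels[i];
            den += w[i];
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] labels = {1.0, 2.0, 3.0};
        Map<String, double[]> columns = new HashMap<>();
        columns.put("w", new double[] {1.0, 1.0, 2.0});
        System.out.println("weightCol unset: " + weightedMean(labels, columns, ""));
        System.out.println("weightCol = w:   " + weightedMean(labels, columns, "w"));
    }
}
```

With the weight column unset, the result is the ordinary mean of the labels (2.0). With the hypothetical column "w", the third instance counts twice and the mean shifts to 2.25.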