public class GradientBoostedTrees
extends Object
Constructor and Description |
---|
GradientBoostedTrees() |
| Modifier and Type | Method and Description |
|---|---|
| static scala.Tuple2<DecisionTreeRegressionModel[],double[]> | boost(RDD<org.apache.spark.ml.feature.Instance> input, RDD<org.apache.spark.ml.feature.Instance> validationInput, BoostingStrategy boostingStrategy, boolean validate, long seed, String featureSubsetStrategy)<br>Internal method for performing regression using trees as base learners. |
| static RDD<scala.Tuple2<Object,Object>> | computeInitialPredictionAndError(RDD<org.apache.spark.ml.feature.Instance> data, double initTreeWeight, DecisionTreeRegressionModel initTree, Loss loss)<br>Compute the initial predictions and errors for a dataset for the first iteration of gradient boosting. |
| static double | computeWeightedError(RDD<org.apache.spark.ml.feature.Instance> data, DecisionTreeRegressionModel[] trees, double[] treeWeights, Loss loss)<br>Method to calculate error of the base learner for the gradient boosting calculation. |
| static double | computeWeightedError(RDD<org.apache.spark.ml.feature.Instance> data, RDD<scala.Tuple2<Object,Object>> predError)<br>Method to calculate error of the base learner for the gradient boosting calculation. |
| static double[] | evaluateEachIteration(RDD<org.apache.spark.ml.feature.Instance> data, DecisionTreeRegressionModel[] trees, double[] treeWeights, Loss loss, scala.Enumeration.Value algo)<br>Method to compute error or loss for every iteration of gradient boosting. |
| static void | org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) |
| static org.slf4j.Logger | org$apache$spark$internal$Logging$$log_() |
| static scala.Tuple2<DecisionTreeRegressionModel[],double[]> | run(RDD<org.apache.spark.ml.feature.Instance> input, BoostingStrategy boostingStrategy, long seed, String featureSubsetStrategy)<br>Method to train a gradient boosting model |
| static scala.Tuple2<DecisionTreeRegressionModel[],double[]> | runWithValidation(RDD<org.apache.spark.ml.feature.Instance> input, RDD<org.apache.spark.ml.feature.Instance> validationInput, BoostingStrategy boostingStrategy, long seed, String featureSubsetStrategy)<br>Method to validate a gradient boosting model |
| static double | updatePrediction(Vector features, double prediction, DecisionTreeRegressionModel tree, double weight)<br>Add prediction from a new boosting iteration to an existing prediction. |
| static RDD<scala.Tuple2<Object,Object>> | updatePredictionError(RDD<org.apache.spark.ml.feature.Instance> data, RDD<scala.Tuple2<Object,Object>> predictionAndError, double treeWeight, DecisionTreeRegressionModel tree, Loss loss)<br>Update a zipped predictionError RDD (as obtained with computeInitialPredictionAndError) |

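
To make the description of updatePrediction more concrete, here is a minimal sketch of the additive update it describes: the new tree's output is scaled by its weight and added to the existing prediction. This is not Spark's implementation. `BoostingUpdateSketch` is a hypothetical class, and a `ToDoubleFunction<double[]>` stands in for `DecisionTreeRegressionModel`, with a plain `double[]` in place of `Vector`.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical, Spark-free illustration of the additive boosting update.
final class BoostingUpdateSketch {

    // Assumption: "tree" stands in for DecisionTreeRegressionModel and
    // returns that tree's raw prediction for one feature vector.
    static double updatePrediction(double[] features,
                                   double prediction,
                                   ToDoubleFunction<double[]> tree,
                                   double weight) {
        // Existing prediction plus the weighted contribution of the new tree.
        return prediction + tree.applyAsDouble(features) * weight;
    }

    public static void main(String[] args) {
        // A one-split "stump" used only for demonstration.
        ToDoubleFunction<double[]> stump = f -> f[0] > 1.0 ? 4.0 : -2.0;
        double updated = updatePrediction(new double[] {3.0}, 0.5, stump, 0.1);
        System.out.println(updated);
    }
}
```

In this sketch the weight plays the role of the tree weight (learning rate) listed in the parameter descriptions, so a smaller weight means each new tree moves the prediction less.
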
public static scala.Tuple2<DecisionTreeRegressionModel[],double[]> run(RDD<org.apache.spark.ml.feature.Instance> input, BoostingStrategy boostingStrategy, long seed, String featureSubsetStrategy)
input - Training dataset: RDD of Instance.
seed - Random seed.
boostingStrategy - (undocumented)
featureSubsetStrategy - (undocumented)

public static scala.Tuple2<DecisionTreeRegressionModel[],double[]> runWithValidation(RDD<org.apache.spark.ml.feature.Instance> input, RDD<org.apache.spark.ml.feature.Instance> validationInput, BoostingStrategy boostingStrategy, long seed, String featureSubsetStrategy)
input - Training dataset: RDD of Instance.
validationInput - Validation dataset. This dataset should be different from the training dataset, but it should follow the same distribution. E.g., these two datasets could be created from an original dataset by using org.apache.spark.rdd.RDD.randomSplit()
seed - Random seed.
boostingStrategy - (undocumented)
featureSubsetStrategy - (undocumented)

public static RDD<scala.Tuple2<Object,Object>> computeInitialPredictionAndError(RDD<org.apache.spark.ml.feature.Instance> data, double initTreeWeight, DecisionTreeRegressionModel initTree, Loss loss)
data - training data.
initTreeWeight - learning rate assigned to the first tree.
initTree - first DecisionTreeModel.
loss - evaluation metric.

public static RDD<scala.Tuple2<Object,Object>> updatePredictionError(RDD<org.apache.spark.ml.feature.Instance> data, RDD<scala.Tuple2<Object,Object>> predictionAndError, double treeWeight, DecisionTreeRegressionModel tree, Loss loss)
data - training data.
predictionAndError - predictionError RDD.
treeWeight - Learning rate.
tree - Tree used to update the prediction and error.
loss - evaluation metric.

public static double updatePrediction(Vector features, double prediction, DecisionTreeRegressionModel tree, double weight)
features - Vector of features representing a single data point.
prediction - The existing prediction.
tree - New Decision Tree model.
weight - Tree weight.

public static double computeWeightedError(RDD<org.apache.spark.ml.feature.Instance> data, DecisionTreeRegressionModel[] trees, double[] treeWeights, Loss loss)
data - Training dataset: RDD of Instance.
trees - Boosted Decision Tree models.
treeWeights - Learning rates at each boosting iteration.
loss - evaluation metric.

public static double computeWeightedError(RDD<org.apache.spark.ml.feature.Instance> data, RDD<scala.Tuple2<Object,Object>> predError)
data - Training dataset: RDD of Instance.
predError - Prediction and error.

public static double[] evaluateEachIteration(RDD<org.apache.spark.ml.feature.Instance> data, DecisionTreeRegressionModel[] trees, double[] treeWeights, Loss loss, scala.Enumeration.Value algo)
data - RDD of Instance.
trees - Boosted Decision Tree models.
treeWeights - Learning rates at each boosting iteration.
loss - evaluation metric.
algo - algorithm for the ensemble, either Classification or Regression.

public static scala.Tuple2<DecisionTreeRegressionModel[],double[]> boost(RDD<org.apache.spark.ml.feature.Instance> input, RDD<org.apache.spark.ml.feature.Instance> validationInput, BoostingStrategy boostingStrategy, boolean validate, long seed, String featureSubsetStrategy)
input - training dataset.
validationInput - validation dataset, ignored if validate is set to false.
boostingStrategy - boosting parameters.
validate - whether or not to use the validation dataset.
seed - Random seed.
featureSubsetStrategy - (undocumented)

public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
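
As an illustration of what evaluateEachIteration reports ("error or loss for every iteration of gradient boosting"), the sketch below accumulates the weighted tree outputs one iteration at a time and records the mean squared error after each tree is added. It is a simplified stand-in under stated assumptions, not Spark's code: `EvaluateEachIterationSketch` is a hypothetical class, plain arrays replace the RDD of Instance, instances are unweighted, the loss is fixed to squared error, and `ToDoubleFunction<double[]>` stands in for `DecisionTreeRegressionModel`.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical, Spark-free illustration of per-iteration evaluation.
final class EvaluateEachIterationSketch {

    // Returns one mean squared error per boosting iteration.
    static double[] evaluateEachIteration(double[][] features,
                                          double[] labels,
                                          List<ToDoubleFunction<double[]>> trees,
                                          double[] treeWeights) {
        int n = labels.length;
        double[] running = new double[n];          // ensemble prediction so far
        double[] errors = new double[trees.size()];
        for (int t = 0; t < trees.size(); t++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                // Add the weighted contribution of tree t to the running prediction.
                running[i] += treeWeights[t] * trees.get(t).applyAsDouble(features[i]);
                double diff = labels[i] - running[i];
                sum += diff * diff;                // squared error (assumed loss)
            }
            errors[t] = sum / n;                   // mean loss after t + 1 trees
        }
        return errors;
    }

    public static void main(String[] args) {
        double[][] x = {{0.0}, {1.0}};
        double[] y = {1.0, 3.0};
        ToDoubleFunction<double[]> first = f -> 2.0;
        ToDoubleFunction<double[]> second = f -> f[0] > 0.5 ? 1.0 : -1.0;
        double[] errs = evaluateEachIteration(x, y, List.of(first, second), new double[] {1.0, 1.0});
        for (double e : errs) {
            System.out.println(e);
        }
    }
}
```

Reading the returned array from left to right shows how the training or validation error evolves as trees are added, which is the kind of signal a validation-based run could use to decide when to stop.
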