public class RelationalGroupedDataset
extends Object
A set of methods for aggregations on a DataFrame, created by groupBy,
 cube or rollup (and also pivot).
 
 The main method is the agg function, which has multiple variants. This class also contains
 some first-order statistics such as mean and sum for convenience.
 
This class was named GroupedData in Spark 1.x.

| Modifier and Type | Class | Description |
|---|---|---|
| static class | RelationalGroupedDataset.CubeType$ | To indicate it's the CUBE |
| static class | RelationalGroupedDataset.GroupByType$ | To indicate it's the GroupBy |
| static interface | RelationalGroupedDataset.GroupType | The Grouping Type |
| static class | RelationalGroupedDataset.PivotType$ | |
| static class | RelationalGroupedDataset.RollupType$ | To indicate it's the ROLLUP |

| Modifier and Type | Method | Description |
|---|---|---|
| Dataset<Row> | agg(Column expr, Column... exprs) | Compute aggregates by specifying a series of aggregate columns. |
| Dataset<Row> | agg(Column expr, scala.collection.Seq<Column> exprs) | Compute aggregates by specifying a series of aggregate columns. |
| Dataset<Row> | agg(scala.collection.immutable.Map<String,String> exprs) | (Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods. |
| Dataset<Row> | agg(java.util.Map<String,String> exprs) | (Java-specific) Compute aggregates by specifying a map from column name to aggregate methods. |
| Dataset<Row> | agg(scala.Tuple2<String,String> aggExpr, scala.collection.Seq<scala.Tuple2<String,String>> aggExprs) | (Scala-specific) Compute aggregates by specifying the column names and aggregate methods. |
| static RelationalGroupedDataset | apply(Dataset<Row> df, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs, RelationalGroupedDataset.GroupType groupType) | |
| <K,T> KeyValueGroupedDataset<K,T> | as(Encoder<K> evidence$1, Encoder<T> evidence$2) | Returns a KeyValueGroupedDataset where the data is grouped by the grouping expressions of the current RelationalGroupedDataset. |
| Dataset<Row> | avg(scala.collection.Seq<String> colNames) | Compute the mean value for each numeric column for each group. |
| Dataset<Row> | avg(String... colNames) | Compute the mean value for each numeric column for each group. |
| Dataset<Row> | count() | Count the number of rows for each group. |
| Dataset<Row> | max(scala.collection.Seq<String> colNames) | Compute the max value for each numeric column for each group. |
| Dataset<Row> | max(String... colNames) | Compute the max value for each numeric column for each group. |
| Dataset<Row> | mean(scala.collection.Seq<String> colNames) | Compute the average value for each numeric column for each group. |
| Dataset<Row> | mean(String... colNames) | Compute the average value for each numeric column for each group. |
| Dataset<Row> | min(scala.collection.Seq<String> colNames) | Compute the min value for each numeric column for each group. |
| Dataset<Row> | min(String... colNames) | Compute the min value for each numeric column for each group. |
| RelationalGroupedDataset | pivot(Column pivotColumn) | Pivots a column of the current DataFrame and performs the specified aggregation. |
| RelationalGroupedDataset | pivot(Column pivotColumn, java.util.List<Object> values) | (Java-specific) Pivots a column of the current DataFrame and performs the specified aggregation. |
| RelationalGroupedDataset | pivot(Column pivotColumn, scala.collection.Seq<Object> values) | Pivots a column of the current DataFrame and performs the specified aggregation. |
| RelationalGroupedDataset | pivot(String pivotColumn) | Pivots a column of the current DataFrame and performs the specified aggregation. |
| RelationalGroupedDataset | pivot(String pivotColumn, java.util.List<Object> values) | (Java-specific) Pivots a column of the current DataFrame and performs the specified aggregation. |
| RelationalGroupedDataset | pivot(String pivotColumn, scala.collection.Seq<Object> values) | Pivots a column of the current DataFrame and performs the specified aggregation. |
| Dataset<Row> | sum(scala.collection.Seq<String> colNames) | Compute the sum for each numeric column for each group. |
| Dataset<Row> | sum(String... colNames) | Compute the sum for each numeric column for each group. |
| String | toString() | |

public static RelationalGroupedDataset apply(Dataset<Row> df, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs, RelationalGroupedDataset.GroupType groupType)
public Dataset<Row> agg(Column expr, Column... exprs)
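Compute aggregates by specifying a series of aggregate columns. Note that this function by
 default retains the grouping columns in its output. To not retain grouping columns, set
 spark.sql.retainGroupColumns to false.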
 
 The available aggregate methods are defined in functions.
 
   // Selects the age of the oldest employee and the aggregate expense for each department
   // Scala:
   import org.apache.spark.sql.functions._
   df.groupBy("department").agg(max("age"), sum("expense"))
   // Java:
   import static org.apache.spark.sql.functions.*;
   df.groupBy("department").agg(max("age"), sum("expense"));
 
 Note that before Spark 1.4, the default behavior was to NOT retain grouping columns. To restore
 that behavior, set the config variable spark.sql.retainGroupColumns to false.
 
   // Scala, 1.3.x:
   df.groupBy("department").agg($"department", max("age"), sum("expense"))
   // Java, 1.3.x:
   df.groupBy("department").agg(col("department"), max("age"), sum("expense"));
 expr - (undocumented)
 exprs - (undocumented)

public Dataset<Row> mean(String... colNames)
Compute the average value for each numeric column for each group. This is an alias for avg.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the average values for them.
 colNames - (undocumented)

public Dataset<Row> max(String... colNames)
Compute the max value for each numeric column for each group.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the max values for them.
 colNames - (undocumented)

public Dataset<Row> avg(String... colNames)
Compute the mean value for each numeric column for each group.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the mean values for them.
 colNames - (undocumented)

public Dataset<Row> min(String... colNames)
Compute the min value for each numeric column for each group.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the min values for them.
 colNames - (undocumented)

public Dataset<Row> sum(String... colNames)
Compute the sum for each numeric column for each group.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the sum for them.
 colNames - (undocumented)

public <K,T> KeyValueGroupedDataset<K,T> as(Encoder<K> evidence$1, Encoder<T> evidence$2)
Returns a KeyValueGroupedDataset where the data is grouped by the grouping expressions
 of the current RelationalGroupedDataset.
 evidence$1 - (undocumented)
 evidence$2 - (undocumented)

public Dataset<Row> agg(scala.Tuple2<String,String> aggExpr, scala.collection.Seq<scala.Tuple2<String,String>> aggExprs)
(Scala-specific) Compute aggregates by specifying the column names and aggregate methods.
 The resulting DataFrame will also contain the grouping columns.
 
 The available aggregate methods are avg, max, min, sum, count.
 
   // Selects the age of the oldest employee and the aggregate expense for each department
   df.groupBy("department").agg(
     "age" -> "max",
     "expense" -> "sum"
   )
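 aggExpr - (undocumented)
 aggExprs - (undocumented)

public Dataset<Row> agg(scala.collection.immutable.Map<String,String> exprs)
(Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods.
 The resulting DataFrame will also contain the grouping columns.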
 
 The available aggregate methods are avg, max, min, sum, count.
 
   // Selects the age of the oldest employee and the aggregate expense for each department
   df.groupBy("department").agg(Map(
     "age" -> "max",
     "expense" -> "sum"
   ))
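 exprs - (undocumented)

public Dataset<Row> agg(java.util.Map<String,String> exprs)
(Java-specific) Compute aggregates by specifying a map from column name to aggregate methods.
 The resulting DataFrame will also contain the grouping columns.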
 
 The available aggregate methods are avg, max, min, sum, count.
 
   // Selects the age of the oldest employee and the aggregate expense for each department
   import com.google.common.collect.ImmutableMap;
   df.groupBy("department").agg(ImmutableMap.of("age", "max", "expense", "sum"));
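
 As a rough mental model of what a column-name-to-method map asks for, the plain-Java sketch below (no Spark involved) groups rows by one key and applies the named aggregate to each listed column. The class `AggByNameSketch`, its `row`, `applyAgg` and `aggregate` helpers, the row representation as `Map<String, Object>`, and the `method(column)` output keys are all hypothetical and exist only for illustration; they are not part of the Spark API, and Spark itself resolves these names to its built-in aggregate functions.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggByNameSketch {
    // Hypothetical row builder: one employee record as column name -> value.
    public static Map<String, Object> row(String department, int age, double expense) {
        Map<String, Object> r = new HashMap<>();
        r.put("department", department);
        r.put("age", age);
        r.put("expense", expense);
        return r;
    }

    // Maps one of the method names listed above (avg, max, min, sum, count) to a statistic.
    public static double applyAgg(String method, DoubleSummaryStatistics stats) {
        switch (method) {
            case "avg": return stats.getAverage();
            case "max": return stats.getMax();
            case "min": return stats.getMin();
            case "sum": return stats.getSum();
            case "count": return stats.getCount();
            default: throw new IllegalArgumentException("Unsupported aggregate: " + method);
        }
    }

    // Groups rows by groupCol, then evaluates each (column name -> method) entry per group.
    public static Map<Object, Map<String, Double>> aggregate(
            List<Map<String, Object>> rows, String groupCol, Map<String, String> exprs) {
        Map<Object, List<Map<String, Object>>> groups = new LinkedHashMap<>();
        for (Map<String, Object> r : rows) {
            groups.computeIfAbsent(r.get(groupCol), k -> new ArrayList<>()).add(r);
        }
        Map<Object, Map<String, Double>> result = new LinkedHashMap<>();
        for (Map.Entry<Object, List<Map<String, Object>>> g : groups.entrySet()) {
            Map<String, Double> out = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : exprs.entrySet()) {
                DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
                for (Map<String, Object> r : g.getValue()) {
                    stats.accept(((Number) r.get(e.getKey())).doubleValue());
                }
                // Output key such as "max(age)"; a naming convention chosen for this sketch.
                out.put(e.getValue() + "(" + e.getKey() + ")", applyAgg(e.getValue(), stats));
            }
            result.put(g.getKey(), out);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("eng", 30, 100.0));
        rows.add(row("eng", 45, 50.0));
        rows.add(row("ops", 28, 20.0));
        Map<String, String> exprs = new LinkedHashMap<>();
        exprs.put("age", "max");
        exprs.put("expense", "sum");
        System.out.println(aggregate(rows, "department", exprs));
    }
}
```

 Because the map is keyed by column name, each column can carry only one aggregate method in this form; the Column-based agg variants do not have that restriction.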
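 exprs - (undocumented)

public Dataset<Row> agg(Column expr, scala.collection.Seq<Column> exprs)
Compute aggregates by specifying a series of aggregate columns. Note that this function by
 default retains the grouping columns in its output. To not retain grouping columns, set
 spark.sql.retainGroupColumns to false.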
 
 The available aggregate methods are defined in functions.
 
   // Selects the age of the oldest employee and the aggregate expense for each department
   // Scala:
   import org.apache.spark.sql.functions._
   df.groupBy("department").agg(max("age"), sum("expense"))
   // Java:
   import static org.apache.spark.sql.functions.*;
   df.groupBy("department").agg(max("age"), sum("expense"));
 
 Note that before Spark 1.4, the default behavior was to NOT retain grouping columns. To restore
 that behavior, set the config variable spark.sql.retainGroupColumns to false.
 
   // Scala, 1.3.x:
   df.groupBy("department").agg($"department", max("age"), sum("expense"))
   // Java, 1.3.x:
   df.groupBy("department").agg(col("department"), max("age"), sum("expense"));
 expr - (undocumented)
 exprs - (undocumented)

public Dataset<Row> count()
Count the number of rows for each group.
 The resulting DataFrame will also contain the grouping columns.

public Dataset<Row> mean(scala.collection.Seq<String> colNames)
Compute the average value for each numeric column for each group. This is an alias for avg.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the average values for them.
 colNames - (undocumented)

public Dataset<Row> max(scala.collection.Seq<String> colNames)
Compute the max value for each numeric column for each group.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the max values for them.
 colNames - (undocumented)

public Dataset<Row> avg(scala.collection.Seq<String> colNames)
Compute the mean value for each numeric column for each group.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the mean values for them.
 colNames - (undocumented)

public Dataset<Row> min(scala.collection.Seq<String> colNames)
Compute the min value for each numeric column for each group.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the min values for them.
 colNames - (undocumented)

public Dataset<Row> sum(scala.collection.Seq<String> colNames)
Compute the sum for each numeric column for each group.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the sum for them.
 colNames - (undocumented)

public RelationalGroupedDataset pivot(String pivotColumn)
Pivots a column of the current DataFrame and performs the specified aggregation.
 
 There are two versions of pivot function: one that requires the caller to specify the list
 of distinct values to pivot on, and one that does not. The latter is more concise but less
 efficient, because Spark needs to first compute the list of distinct values internally.
 
   // Compute the sum of earnings for each year by course with each course as a separate column
   df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings")
   // Or without specifying column values (less efficient)
   df.groupBy("year").pivot("course").sum("earnings")
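
 To make the efficiency difference between the two versions concrete, here is a plain-Java sketch (no Spark involved) of a sum-pivot over in-memory rows. The class `PivotSketch`, its `Row` type and both `pivotSum` overloads are hypothetical names introduced for illustration only. The overload without a value list has to scan the data once just to collect the distinct pivot values before it can aggregate, which models the extra work described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PivotSketch {
    // Hypothetical input record: (year, course, earnings).
    public static final class Row {
        final int year;
        final String course;
        final double earnings;

        public Row(int year, String course, double earnings) {
            this.year = year;
            this.course = course;
            this.earnings = earnings;
        }
    }

    // Pivot with caller-supplied values: one pass over the rows.
    // Every year gets one entry per requested value; a value with no matching rows stays null.
    public static Map<Integer, Map<String, Double>> pivotSum(List<Row> rows, List<String> values) {
        Map<Integer, Map<String, Double>> result = new LinkedHashMap<>();
        for (Row r : rows) {
            Map<String, Double> cols = result.computeIfAbsent(r.year, y -> {
                Map<String, Double> init = new LinkedHashMap<>();
                for (String v : values) {
                    init.put(v, null);
                }
                return init;
            });
            // Rows whose course was not requested do not contribute to any output column.
            if (cols.containsKey(r.course)) {
                cols.merge(r.course, r.earnings, Double::sum);
            }
        }
        return result;
    }

    // Pivot without values: an additional pass first collects the distinct courses
    // (sorted here only to get a deterministic column order), then delegates.
    public static Map<Integer, Map<String, Double>> pivotSum(List<Row> rows) {
        Set<String> distinct = new TreeSet<>();
        for (Row r : rows) {
            distinct.add(r.course);
        }
        return pivotSum(rows, new ArrayList<>(distinct));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(2012, "dotNET", 10000.0),
            new Row(2012, "Java", 20000.0),
            new Row(2012, "dotNET", 5000.0),
            new Row(2013, "Java", 30000.0));
        System.out.println(pivotSum(rows, Arrays.asList("dotNET", "Java")));
        System.out.println(pivotSum(rows));
    }
}
```

 In this sketch both overloads produce the same cells when the supplied list covers every distinct course; supplying the list simply avoids the first scan and fixes the column order.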
 pivotColumn - Name of the column to pivot.
 See also: org.apache.spark.sql.Dataset.unpivot for the reverse operation, except for the aggregation.

public RelationalGroupedDataset pivot(String pivotColumn, scala.collection.Seq<Object> values)
Pivots a column of the current DataFrame and performs the specified aggregation.
 There are two versions of pivot function: one that requires the caller to specify the list
 of distinct values to pivot on, and one that does not. The latter is more concise but less
 efficient, because Spark needs to first compute the list of distinct values internally.
 
   // Compute the sum of earnings for each year by course with each course as a separate column
   df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings")
   // Or without specifying column values (less efficient)
   df.groupBy("year").pivot("course").sum("earnings")
 
 From Spark 3.0.0, values can be literal columns, for instance, struct. For pivoting by
 multiple columns, use the struct function to combine the columns and values:
 
   df.groupBy("year")
     .pivot("trainingCourse", Seq(struct(lit("java"), lit("Experts"))))
     .agg(sum($"earnings"))
 pivotColumn - Name of the column to pivot.
 values - List of values that will be translated to columns in the output DataFrame.
 See also: org.apache.spark.sql.Dataset.unpivot for the reverse operation, except for the aggregation.

public RelationalGroupedDataset pivot(String pivotColumn, java.util.List<Object> values)
(Java-specific) Pivots a column of the current DataFrame and performs the specified aggregation.

 There are two versions of pivot function: one that requires the caller to specify the list
 of distinct values to pivot on, and one that does not. The latter is more concise but less
 efficient, because Spark needs to first compute the list of distinct values internally.
 
   // Compute the sum of earnings for each year by course with each course as a separate column
   df.groupBy("year").pivot("course", Arrays.<Object>asList("dotNET", "Java")).sum("earnings");
   // Or without specifying column values (less efficient)
   df.groupBy("year").pivot("course").sum("earnings");
 pivotColumn - Name of the column to pivot.
 values - List of values that will be translated to columns in the output DataFrame.
 See also: org.apache.spark.sql.Dataset.unpivot for the reverse operation, except for the aggregation.

public RelationalGroupedDataset pivot(Column pivotColumn)
Pivots a column of the current DataFrame and performs the specified aggregation.
 This is an overloaded version of the pivot method with pivotColumn of the String type.
 
   // Pivot without specifying column values (less efficient)
   df.groupBy($"year").pivot($"course").sum("earnings")
 pivotColumn - the column to pivot.
 See also: org.apache.spark.sql.Dataset.unpivot for the reverse operation, except for the aggregation.

public RelationalGroupedDataset pivot(Column pivotColumn, scala.collection.Seq<Object> values)
Pivots a column of the current DataFrame and performs the specified aggregation.
 This is an overloaded version of the pivot method with pivotColumn of the String type.
 
   // Compute the sum of earnings for each year by course with each course as a separate column
   df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum("earnings")
 pivotColumn - the column to pivot.
 values - List of values that will be translated to columns in the output DataFrame.
 See also: org.apache.spark.sql.Dataset.unpivot for the reverse operation, except for the aggregation.

public RelationalGroupedDataset pivot(Column pivotColumn, java.util.List<Object> values)
(Java-specific) Pivots a column of the current DataFrame and performs the specified
 aggregation. This is an overloaded version of the pivot method with pivotColumn of
 the String type.
 pivotColumn - the column to pivot.
 values - List of values that will be translated to columns in the output DataFrame.
 See also: org.apache.spark.sql.Dataset.unpivot for the reverse operation, except for the aggregation.

public String toString()
Overrides: toString in class Object