public class GroupedData
extends java.lang.Object

A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy.
The main method is the agg function, which has multiple variants. This class also provides some first-order statistics such as mean and sum, for convenience.
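For orientation, a minimal sketch of the typical flow, reusing the department/age/expense schema from the examples further below:

// groupBy returns a GroupedData; agg collapses it back to one row per group
import org.apache.spark.sql.functions._
val grouped = df.groupBy("department")                // GroupedData
val result = grouped.agg(avg("expense"), max("age"))  // DataFrame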
Modifier | Constructor and Description |
---|---|
protected | GroupedData(DataFrame df, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs, org.apache.spark.sql.GroupedData.GroupType groupType) |
Modifier and Type | Method and Description |
---|---|
DataFrame | agg(Column expr, Column... exprs): Compute aggregates by specifying a series of aggregate columns. |
DataFrame | agg(Column expr, scala.collection.Seq<Column> exprs): Compute aggregates by specifying a series of aggregate columns. |
DataFrame | agg(scala.collection.immutable.Map<java.lang.String,java.lang.String> exprs): (Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods. |
DataFrame | agg(java.util.Map<java.lang.String,java.lang.String> exprs): (Java-specific) Compute aggregates by specifying a map from column name to aggregate methods. |
DataFrame | agg(scala.Tuple2<java.lang.String,java.lang.String> aggExpr, scala.collection.Seq<scala.Tuple2<java.lang.String,java.lang.String>> aggExprs): (Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods. |
static GroupedData | apply(DataFrame df, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs, org.apache.spark.sql.GroupedData.GroupType groupType) |
DataFrame | avg(scala.collection.Seq<java.lang.String> colNames): Compute the mean value for each numeric column in each group. |
DataFrame | avg(java.lang.String... colNames): Compute the mean value for each numeric column in each group. |
DataFrame | count(): Count the number of rows for each group. |
DataFrame | max(scala.collection.Seq<java.lang.String> colNames): Compute the max value for each numeric column in each group. |
DataFrame | max(java.lang.String... colNames): Compute the max value for each numeric column in each group. |
DataFrame | mean(scala.collection.Seq<java.lang.String> colNames): Compute the average value for each numeric column in each group. |
DataFrame | mean(java.lang.String... colNames): Compute the average value for each numeric column in each group. |
DataFrame | min(scala.collection.Seq<java.lang.String> colNames): Compute the min value for each numeric column in each group. |
DataFrame | min(java.lang.String... colNames): Compute the min value for each numeric column in each group. |
GroupedData | pivot(java.lang.String pivotColumn): Pivots a column of the current DataFrame and performs the specified aggregation. |
GroupedData | pivot(java.lang.String pivotColumn, java.util.List<java.lang.Object> values): Pivots a column of the current DataFrame and performs the specified aggregation. |
GroupedData | pivot(java.lang.String pivotColumn, scala.collection.Seq<java.lang.Object> values): Pivots a column of the current DataFrame and performs the specified aggregation. |
DataFrame | sum(scala.collection.Seq<java.lang.String> colNames): Compute the sum for each numeric column in each group. |
DataFrame | sum(java.lang.String... colNames): Compute the sum for each numeric column in each group. |
protected GroupedData(DataFrame df, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs, org.apache.spark.sql.GroupedData.GroupType groupType)
public static GroupedData apply(DataFrame df, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs, org.apache.spark.sql.GroupedData.GroupType groupType)
public DataFrame agg(Column expr, Column... exprs)
Compute aggregates by specifying a series of aggregate columns. Note that this function by default retains the grouping columns in its output. To not retain grouping columns, set the config variable spark.sql.retainGroupColumns to false.
The available aggregate methods are defined in functions.
// Selects the age of the oldest employee and the aggregate expense for each department
// Scala:
import org.apache.spark.sql.functions._
df.groupBy("department").agg(max("age"), sum("expense"))
// Java:
import static org.apache.spark.sql.functions.*;
df.groupBy("department").agg(max("age"), sum("expense"));
Note that before Spark 1.4, the default behavior was to NOT retain grouping columns. To revert to that behavior, set the config variable spark.sql.retainGroupColumns to false.
// Scala, 1.3.x:
df.groupBy("department").agg($"department", max("age"), sum("expense"))
// Java, 1.3.x:
df.groupBy("department").agg(col("department"), max("age"), sum("expense"));
Parameters:
expr - (undocumented)
exprs - (undocumented)

public DataFrame mean(java.lang.String... colNames)
Compute the average value for each numeric column in each group. This is an alias for avg.
The resulting DataFrame will also contain the grouping columns.
When columns are specified, the average is computed only for them.
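A quick sketch, reusing the department/age schema from the examples above:

// These two calls are equivalent, since mean is an alias for avg
df.groupBy("department").mean("age")
df.groupBy("department").avg("age")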
Parameters:
colNames - (undocumented)

public DataFrame max(java.lang.String... colNames)
Compute the max value for each numeric column in each group.
The resulting DataFrame will also contain the grouping columns.
When columns are specified, the max value is computed only for them.
Parameters:
colNames - (undocumented)

public DataFrame avg(java.lang.String... colNames)
Compute the mean value for each numeric column in each group.
The resulting DataFrame will also contain the grouping columns.
When columns are specified, the mean value is computed only for them.
Parameters:
colNames - (undocumented)

public DataFrame min(java.lang.String... colNames)
Compute the min value for each numeric column in each group.
The resulting DataFrame will also contain the grouping columns.
When columns are specified, the min value is computed only for them.
Parameters:
colNames - (undocumented)

public DataFrame sum(java.lang.String... colNames)
Compute the sum for each numeric column in each group.
The resulting DataFrame will also contain the grouping columns.
When columns are specified, the sum is computed only for them.
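For example, a sketch limiting the aggregation to selected columns (any other numeric columns in df are left out of the result):

// Sums only the listed numeric columns per department
df.groupBy("department").sum("age", "expense")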
Parameters:
colNames - (undocumented)

public DataFrame agg(scala.Tuple2<java.lang.String,java.lang.String> aggExpr, scala.collection.Seq<scala.Tuple2<java.lang.String,java.lang.String>> aggExprs)
(Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns.
The available aggregate methods are avg, max, min, sum, and count.
// Selects the age of the oldest employee and the aggregate expense for each department
df.groupBy("department").agg(
"age" -> "max",
"expense" -> "sum"
)
Parameters:
aggExpr - (undocumented)
aggExprs - (undocumented)

public DataFrame agg(scala.collection.immutable.Map<java.lang.String,java.lang.String> exprs)
(Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns.
The available aggregate methods are avg, max, min, sum, and count.
// Selects the age of the oldest employee and the aggregate expense for each department
df.groupBy("department").agg(Map(
"age" -> "max",
"expense" -> "sum"
))
Parameters:
exprs - (undocumented)

public DataFrame agg(java.util.Map<java.lang.String,java.lang.String> exprs)
(Java-specific) Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns.
The available aggregate methods are avg, max, min, sum, and count.
// Selects the age of the oldest employee and the aggregate expense for each department
import com.google.common.collect.ImmutableMap;
df.groupBy("department").agg(ImmutableMap.of("age", "max", "expense", "sum"));
Parameters:
exprs - (undocumented)

public DataFrame agg(Column expr, scala.collection.Seq<Column> exprs)
(Scala-specific) Compute aggregates by specifying a series of aggregate columns. Note that this function by default retains the grouping columns in its output. To not retain grouping columns, set the config variable spark.sql.retainGroupColumns to false.
The available aggregate methods are defined in functions.
// Selects the age of the oldest employee and the aggregate expense for each department
// Scala:
import org.apache.spark.sql.functions._
df.groupBy("department").agg(max("age"), sum("expense"))
// Java:
import static org.apache.spark.sql.functions.*;
df.groupBy("department").agg(max("age"), sum("expense"));
Note that before Spark 1.4, the default behavior was to NOT retain grouping columns. To revert to that behavior, set the config variable spark.sql.retainGroupColumns to false.
// Scala, 1.3.x:
df.groupBy("department").agg($"department", max("age"), sum("expense"))
// Java, 1.3.x:
df.groupBy("department").agg(col("department"), max("age"), sum("expense"));
Parameters:
expr - (undocumented)
exprs - (undocumented)

public DataFrame count()
Count the number of rows for each group. The resulting DataFrame will also contain the grouping columns.
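For example, a sketch reusing the department grouping from the examples above:

// One row per department, with a count column holding the group size
df.groupBy("department").count()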
public DataFrame mean(scala.collection.Seq<java.lang.String> colNames)
Compute the average value for each numeric column in each group. This is an alias for avg.
The resulting DataFrame will also contain the grouping columns.
When columns are specified, the average is computed only for them.
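In the Scala API this Seq variant is the String* varargs method as seen from Java, so an existing sequence of column names can be expanded into it; a sketch, assuming the age and expense columns from the examples above:

// Expand a pre-built Seq of column names into the varargs parameter
val cols = Seq("age", "expense")
df.groupBy("department").mean(cols: _*)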
Parameters:
colNames - (undocumented)

public DataFrame max(scala.collection.Seq<java.lang.String> colNames)
Compute the max value for each numeric column in each group.
The resulting DataFrame will also contain the grouping columns.
When columns are specified, the max value is computed only for them.
Parameters:
colNames - (undocumented)

public DataFrame avg(scala.collection.Seq<java.lang.String> colNames)
Compute the mean value for each numeric column in each group.
The resulting DataFrame will also contain the grouping columns.
When columns are specified, the mean value is computed only for them.
Parameters:
colNames - (undocumented)

public DataFrame min(scala.collection.Seq<java.lang.String> colNames)
Compute the min value for each numeric column in each group.
The resulting DataFrame will also contain the grouping columns.
When columns are specified, the min value is computed only for them.
Parameters:
colNames - (undocumented)

public DataFrame sum(scala.collection.Seq<java.lang.String> colNames)
Compute the sum for each numeric column in each group.
The resulting DataFrame will also contain the grouping columns.
When columns are specified, the sum is computed only for them.
Parameters:
colNames - (undocumented)

public GroupedData pivot(java.lang.String pivotColumn)
Pivots a column of the current DataFrame and performs the specified aggregation.
There are two versions of the pivot function: one that requires the caller to specify the list of distinct values to pivot on, and one that does not. The latter is more concise but less efficient, because Spark needs to first compute the list of distinct values internally.
// Compute the sum of earnings for each year by course with each course as a separate column
df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings")
// Or without specifying column values (less efficient)
df.groupBy("year").pivot("course").sum("earnings")
Parameters:
pivotColumn - Name of the column to pivot.

public GroupedData pivot(java.lang.String pivotColumn, scala.collection.Seq<java.lang.Object> values)
Pivots a column of the current DataFrame and performs the specified aggregation.
There are two versions of the pivot function: one that requires the caller to specify the list of distinct values to pivot on, and one that does not. The latter is more concise but less efficient, because Spark needs to first compute the list of distinct values internally.
// Compute the sum of earnings for each year by course with each course as a separate column
df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings")
// Or without specifying column values (less efficient)
df.groupBy("year").pivot("course").sum("earnings")
Parameters:
pivotColumn - Name of the column to pivot.
values - List of values that will be translated to columns in the output DataFrame.

public GroupedData pivot(java.lang.String pivotColumn, java.util.List<java.lang.Object> values)
Pivots a column of the current DataFrame and performs the specified aggregation.
There are two versions of the pivot function: one that requires the caller to specify the list of distinct values to pivot on, and one that does not. The latter is more concise but less efficient, because Spark needs to first compute the list of distinct values internally.
// Compute the sum of earnings for each year by course with each course as a separate column
df.groupBy("year").pivot("course", Arrays.<Object>asList("dotNET", "Java")).sum("earnings");
// Or without specifying column values (less efficient)
df.groupBy("year").pivot("course").sum("earnings");
Parameters:
pivotColumn - Name of the column to pivot.
values - List of values that will be translated to columns in the output DataFrame.