DataFrameNaFunctions (Spark 2.1.1 JavaDoc)

Object
- org.apache.spark.sql.DataFrameNaFunctions

```
public final class DataFrameNaFunctions
extends Object
```
Functionality for working with missing data in DataFrames.

Since:

1.3.1

Method Summary

Methods
Modifier and Type	Method and Description
`Dataset<Row>`	`drop()` Returns a new `DataFrame` that drops rows containing any null or NaN values.
`Dataset<Row>`	`drop(int minNonNulls)` Returns a new `DataFrame` that drops rows containing fewer than `minNonNulls` non-null and non-NaN values.
`Dataset<Row>`	`drop(int minNonNulls, scala.collection.Seq<String> cols)` (Scala-specific) Returns a new `DataFrame` that drops rows containing fewer than `minNonNulls` non-null and non-NaN values in the specified columns.
`Dataset<Row>`	`drop(int minNonNulls, String[] cols)` Returns a new `DataFrame` that drops rows containing fewer than `minNonNulls` non-null and non-NaN values in the specified columns.
`Dataset<Row>`	`drop(scala.collection.Seq<String> cols)` (Scala-specific) Returns a new `DataFrame` that drops rows containing any null or NaN values in the specified columns.
`Dataset<Row>`	`drop(String how)` Returns a new `DataFrame` that drops rows containing null or NaN values.
`Dataset<Row>`	`drop(String[] cols)` Returns a new `DataFrame` that drops rows containing any null or NaN values in the specified columns.
`Dataset<Row>`	`drop(String how, scala.collection.Seq<String> cols)` (Scala-specific) Returns a new `DataFrame` that drops rows containing null or NaN values in the specified columns.
`Dataset<Row>`	`drop(String how, String[] cols)` Returns a new `DataFrame` that drops rows containing null or NaN values in the specified columns.
`Dataset<Row>`	`fill(double value)` Returns a new `DataFrame` that replaces null or NaN values in numeric columns with `value`.
`Dataset<Row>`	`fill(double value, scala.collection.Seq<String> cols)` (Scala-specific) Returns a new `DataFrame` that replaces null or NaN values in specified numeric columns.
`Dataset<Row>`	`fill(double value, String[] cols)` Returns a new `DataFrame` that replaces null or NaN values in specified numeric columns.
`Dataset<Row>`	`fill(long value)` Returns a new `DataFrame` that replaces null or NaN values in numeric columns with `value`.
`Dataset<Row>`	`fill(long value, scala.collection.Seq<String> cols)` (Scala-specific) Returns a new `DataFrame` that replaces null or NaN values in specified numeric columns.
`Dataset<Row>`	`fill(long value, String[] cols)` Returns a new `DataFrame` that replaces null or NaN values in specified numeric columns.
`Dataset<Row>`	`fill(java.util.Map<String,Object> valueMap)` Returns a new `DataFrame` that replaces null values.
`Dataset<Row>`	`fill(scala.collection.immutable.Map<String,Object> valueMap)` (Scala-specific) Returns a new `DataFrame` that replaces null values.
`Dataset<Row>`	`fill(String value)` Returns a new `DataFrame` that replaces null values in string columns with `value`.
`Dataset<Row>`	`fill(String value, scala.collection.Seq<String> cols)` (Scala-specific) Returns a new `DataFrame` that replaces null values in specified string columns.
`Dataset<Row>`	`fill(String value, String[] cols)` Returns a new `DataFrame` that replaces null values in specified string columns.
`<T> Dataset<Row>`	`replace(scala.collection.Seq<String> cols, scala.collection.immutable.Map<T,T> replacement)` (Scala-specific) Replaces values matching keys in `replacement` map.
`<T> Dataset<Row>`	`replace(String[] cols, java.util.Map<T,T> replacement)` Replaces values matching keys in `replacement` map with the corresponding values.
`<T> Dataset<Row>`	`replace(String col, java.util.Map<T,T> replacement)` Replaces values matching keys in `replacement` map with the corresponding values.
`<T> Dataset<Row>`	`replace(String col, scala.collection.immutable.Map<T,T> replacement)` (Scala-specific) Replaces values matching keys in `replacement` map.

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - drop
```
public Dataset<Row> drop()
```
    Returns a new DataFrame that drops rows containing any null or NaN values.
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - drop
```
public Dataset<Row> drop(String how)
```
    Returns a new DataFrame that drops rows containing null or NaN values.
    If how is "any", then drop rows containing any null or NaN values. If how is "all", then drop rows only if every column is null or NaN for that row.
    
    Parameters:
    how - "any" to drop a row if any column is null or NaN, or "all" to drop a row only if every column is null or NaN
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
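    The difference between the two modes can be illustrated with a small plain-Java model of the rule stated above. This is only a sketch: it does not call Spark, and the names `NaDropHowSketch`, `isMissing` and `dropsRow` are hypothetical helpers introduced for illustration. A row is modeled as a list of values.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical plain-Java model of the "any" / "all" rule; this is not Spark code. */
final class NaDropHowSketch {

    /** A value counts as missing if it is null or a floating-point NaN. */
    public static boolean isMissing(Object v) {
        if (v == null) {
            return true;
        }
        if (v instanceof Double) {
            return ((Double) v).isNaN();
        }
        if (v instanceof Float) {
            return ((Float) v).isNaN();
        }
        return false;
    }

    /** Returns true if the modeled row would be dropped under the given mode. */
    public static boolean dropsRow(String how, List<Object> row) {
        long missing = row.stream().filter(NaDropHowSketch::isMissing).count();
        switch (how) {
            case "any":
                return missing > 0;           // at least one missing value
            case "all":
                return missing == row.size(); // every value is missing
            default:
                throw new IllegalArgumentException("how must be \"any\" or \"all\": " + how);
        }
    }

    public static void main(String[] args) {
        List<Object> partial = Arrays.asList("Alice", null, 1.7);
        List<Object> blank = Arrays.asList(null, Double.NaN, null);
        System.out.println("any/partial: " + dropsRow("any", partial));
        System.out.println("all/partial: " + dropsRow("all", partial));
        System.out.println("all/blank:   " + dropsRow("all", blank));
    }
}
```

    In this model a partially filled row is dropped under "any" but kept under "all", while a row with no usable values is dropped under both.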
  - drop
```
public Dataset<Row> drop(String[] cols)
```
    Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
    
    Parameters:
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - drop
```
public Dataset<Row> drop(scala.collection.Seq<String> cols)
```
    (Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
    
    Parameters:
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - drop
```
public Dataset<Row> drop(String how,
                String[] cols)
```
    Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
    If how is "any", then drop rows containing any null or NaN values in the specified columns. If how is "all", then drop rows only if every specified column is null or NaN for that row.
    
    Parameters:
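    how - "any" to drop a row if any specified column is null or NaN, or "all" to drop a row only if every specified column is null or NaN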
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - drop
```
public Dataset<Row> drop(String how,
                scala.collection.Seq<String> cols)
```
    (Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
    If how is "any", then drop rows containing any null or NaN values in the specified columns. If how is "all", then drop rows only if every specified column is null or NaN for that row.
    
    Parameters:
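    how - "any" to drop a row if any specified column is null or NaN, or "all" to drop a row only if every specified column is null or NaN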
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - drop
```
public Dataset<Row> drop(int minNonNulls)
```
    Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values.
    
    Parameters:
    minNonNulls - minimum number of non-null, non-NaN values a row must contain to be kept
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
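    The threshold rule can be sketched in plain Java as follows. This is a model of the description above and not Spark's implementation; `MinNonNullsSketch` and `keepsRow` are hypothetical names, and a row is again modeled as a list of values.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical plain-Java model of the minNonNulls threshold; this is not Spark code. */
final class MinNonNullsSketch {

    /** A value counts as missing if it is null or a floating-point NaN. */
    public static boolean isMissing(Object v) {
        return v == null
                || (v instanceof Double && ((Double) v).isNaN())
                || (v instanceof Float && ((Float) v).isNaN());
    }

    /** Returns true if the row has at least minNonNulls usable values and is therefore kept. */
    public static boolean keepsRow(int minNonNulls, List<Object> row) {
        long present = row.stream().filter(v -> !isMissing(v)).count();
        return present >= minNonNulls;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList("Alice", null, Double.NaN, 42);
        // The row holds two usable values: "Alice" and 42.
        System.out.println("minNonNulls=2: " + keepsRow(2, row));
        System.out.println("minNonNulls=3: " + keepsRow(3, row));
    }
}
```

    Under this model, a threshold equal to the number of columns is logically the same as the "any" mode, and a threshold of 1 is the same as the "all" mode.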
  - drop
```
public Dataset<Row> drop(int minNonNulls,
                String[] cols)
```
    Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
    
    Parameters:
    minNonNulls - minimum number of non-null, non-NaN values a row must contain in the specified columns to be kept
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - drop
```
public Dataset<Row> drop(int minNonNulls,
                scala.collection.Seq<String> cols)
```
    (Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
    
    Parameters:
    minNonNulls - minimum number of non-null, non-NaN values a row must contain in the specified columns to be kept
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - fill
```
public Dataset<Row> fill(long value)
```
    Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
    
    Parameters:
    value - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    2.1.1
  - fill
```
public Dataset<Row> fill(double value)
```
    Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
    
    Parameters:
    value - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - fill
```
public Dataset<Row> fill(String value)
```
    Returns a new DataFrame that replaces null values in string columns with value.
    
    Parameters:
    value - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - fill
```
public Dataset<Row> fill(long value,
                String[] cols)
```
    Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.
    
    Parameters:
    value - (undocumented)
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    2.1.1
  - fill
```
public Dataset<Row> fill(double value,
                String[] cols)
```
    Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.
    
    Parameters:
    value - (undocumented)
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - fill
```
public Dataset<Row> fill(long value,
                scala.collection.Seq<String> cols)
```
    (Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.
    
    Parameters:
    value - (undocumented)
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    2.1.1
  - fill
```
public Dataset<Row> fill(double value,
                scala.collection.Seq<String> cols)
```
    (Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.
    
    Parameters:
    value - (undocumented)
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - fill
```
public Dataset<Row> fill(String value,
                String[] cols)
```
    Returns a new DataFrame that replaces null values in specified string columns. If a specified column is not a string column, it is ignored.
    
    Parameters:
    value - (undocumented)
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - fill
```
public Dataset<Row> fill(String value,
                scala.collection.Seq<String> cols)
```
    (Scala-specific) Returns a new DataFrame that replaces null values in specified string columns. If a specified column is not a string column, it is ignored.
    
    Parameters:
    value - (undocumented)
    cols - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - fill
```
public Dataset<Row> fill(java.util.Map<String,Object> valueMap)
```
    Returns a new DataFrame that replaces null values.
    The key of the map is the column name, and the value of the map is the replacement value. The value must be one of the following types: Integer, Long, Float, Double, String, Boolean. Replacement values are cast to the column data type.
    For example, the following replaces null values in column "A" with string "unknown", and null values in column "B" with numeric value 1.0.
```
   import com.google.common.collect.ImmutableMap;
   df.na().fill(ImmutableMap.of("A", "unknown", "B", 1.0));
```
    Parameters:
    valueMap - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
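    If Guava is not on the classpath, the same map can be built with `java.util.HashMap`. The sketch below only constructs the map; the Spark call is shown as a comment because it assumes a `Dataset<Row>` named `df` that is not defined here. `FillMapSketch` and `buildValueMap` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Builds the value map from the example above without Guava. */
final class FillMapSketch {

    /** Returns a column-name to replacement-value map for columns "A" and "B". */
    public static Map<String, Object> buildValueMap() {
        Map<String, Object> valueMap = new HashMap<>();
        valueMap.put("A", "unknown"); // string replacement for nulls in column "A"
        valueMap.put("B", 1.0);       // numeric replacement for nulls in column "B"
        return valueMap;
    }

    public static void main(String[] args) {
        Map<String, Object> valueMap = buildValueMap();
        // Assumption: df is an existing Dataset<Row> with columns "A" and "B".
        // Dataset<Row> filled = df.na().fill(valueMap);
        System.out.println(valueMap.keySet());
    }
}
```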
  - fill
```
public Dataset<Row> fill(scala.collection.immutable.Map<String,Object> valueMap)
```
    (Scala-specific) Returns a new DataFrame that replaces null values.
    The key of the map is the column name, and the value of the map is the replacement value. The value must be one of the following types: Int, Long, Float, Double, String, Boolean. Replacement values are cast to the column data type.
    For example, the following replaces null values in column "A" with string "unknown", and null values in column "B" with numeric value 1.0.
```
   df.na.fill(Map(
     "A" -> "unknown",
     "B" -> 1.0
   ))
 
```
    Parameters:
    valueMap - (undocumented)
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - replace
```
public <T> Dataset<Row> replace(String col,
                       java.util.Map<T,T> replacement)
```
    Replaces values matching keys in replacement map with the corresponding values. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. If col is "*", then the replacement is applied on all string columns, numeric columns or boolean columns.
```
   import com.google.common.collect.ImmutableMap;

   // Replaces all occurrences of 1.0 with 2.0 in column "height".
   df.na().replace("height", ImmutableMap.of(1.0, 2.0));

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
   df.na().replace("name", ImmutableMap.of("UNKNOWN", "unnamed"));

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
   df.na().replace("*", ImmutableMap.of("UNKNOWN", "unnamed"));
```
    Parameters:
    col - name of the column to apply the value replacement
    replacement - value replacement map, as explained above
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
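    How a replacement map is applied to the values of one column can be modeled in plain Java. This is a sketch of the behavior described above, not Spark's implementation: Spark operates on Dataset columns, whereas the hypothetical `replaceValues` helper below works on an ordinary list. Values that match a key are swapped for the mapped value, and all other values pass through unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plain-Java model of map-based value replacement; this is not Spark code. */
final class ReplaceSketch {

    /** Returns a new list in which every value that is a key of replacement is swapped for its mapped value. */
    public static <T> List<T> replaceValues(List<T> column, Map<T, T> replacement) {
        List<T> out = new ArrayList<>(column.size());
        for (T value : column) {
            out.add(replacement.containsKey(value) ? replacement.get(value) : value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> replacement = new HashMap<>();
        replacement.put("UNKNOWN", "unnamed");

        List<String> names = Arrays.asList("Alice", "UNKNOWN", "Bob");
        System.out.println(replaceValues(names, replacement));
    }
}
```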
  - replace
```
public <T> Dataset<Row> replace(String[] cols,
                       java.util.Map<T,T> replacement)
```
    Replaces values matching keys in replacement map with the corresponding values. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans.
```
   import com.google.common.collect.ImmutableMap;

   // Replaces all occurrences of 1.0 with 2.0 in columns "height" and "weight".
   df.na().replace(new String[] {"height", "weight"}, ImmutableMap.of(1.0, 2.0));

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in columns "firstname" and "lastname".
   df.na().replace(new String[] {"firstname", "lastname"}, ImmutableMap.of("UNKNOWN", "unnamed"));
```
    Parameters:
    cols - list of columns to apply the value replacement
    replacement - value replacement map, as explained above
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - replace
```
public <T> Dataset<Row> replace(String col,
                       scala.collection.immutable.Map<T,T> replacement)
```
    (Scala-specific) Replaces values matching keys in replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. If col is "*", then the replacement is applied on all string columns, numeric columns or boolean columns.
```
   // Replaces all occurrences of 1.0 with 2.0 in column "height".
   df.na.replace("height", Map(1.0 -> 2.0))

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
   df.na.replace("name", Map("UNKNOWN" -> "unnamed"))

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
   df.na.replace("*", Map("UNKNOWN" -> "unnamed"))
```
    Parameters:
    col - name of the column to apply the value replacement
    replacement - value replacement map, as explained above
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
  - replace
```
public <T> Dataset<Row> replace(scala.collection.Seq<String> cols,
                       scala.collection.immutable.Map<T,T> replacement)
```
    (Scala-specific) Replaces values matching keys in replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans.
```
   // Replaces all occurrences of 1.0 with 2.0 in columns "height" and "weight".
   df.na.replace("height" :: "weight" :: Nil, Map(1.0 -> 2.0))

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in columns "firstname" and "lastname".
   df.na.replace("firstname" :: "lastname" :: Nil, Map("UNKNOWN" -> "unnamed"))
```
    Parameters:
    cols - list of columns to apply the value replacement
    replacement - value replacement map, as explained above
    
    Returns:
    (undocumented)
    Since:
    
    1.3.1
