public class ParquetRelation
extends org.apache.spark.sql.catalyst.plans.logical.LeafNode
implements org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation, scala.Product, scala.Serializable
Users should interact with Parquet files through a DataFrame created by a SQLContext,
rather than using this class directly:
val parquetFile = sqlContext.parquetFile("path/to/parquet.file")
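For context, a minimal sketch of the intended public usage in the Spark 1.3-era API; the paths, table name, and data below are illustrative assumptions, not taken from this page. Reading yields a DataFrame backed by a relation like this one, and saving a DataFrame is the public counterpart of the create/createEmpty sinks described further down:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hedged sketch, assuming the Spark 1.3-era API.
val sc = new SparkContext(new SparkConf().setAppName("parquet-example"))
val sqlContext = new SQLContext(sc)

// Reading: parquetFile returns a DataFrame backed by a Parquet relation.
val people = sqlContext.parquetFile("path/to/people.parquet")
people.printSchema()

// Registering the DataFrame as a temporary table makes it queryable with SQL.
people.registerTempTable("people")
val adults = sqlContext.sql("SELECT name, age FROM people WHERE age >= 18")

// Writing: saveAsParquetFile materializes a DataFrame as a new Parquet file.
adults.saveAsParquetFile("path/to/adults.parquet")
```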
| Constructor and Description |
|---|
| ParquetRelation(String path, scala.Option&lt;org.apache.hadoop.conf.Configuration&gt; conf, SQLContext sqlContext, scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; partitioningAttributes) |
| Modifier and Type | Method and Description |
|---|---|
| org.apache.spark.sql.catalyst.expressions.AttributeMap&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; | attributeMap() |
| scala.Option&lt;org.apache.hadoop.conf.Configuration&gt; | conf() |
| static ParquetRelation | create(String pathString, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan child, org.apache.hadoop.conf.Configuration conf, SQLContext sqlContext) Creates a new ParquetRelation and the underlying Parquet file for the given LogicalPlan. |
| static ParquetRelation | createEmpty(String pathString, scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; attributes, boolean allowExisting, org.apache.hadoop.conf.Configuration conf, SQLContext sqlContext) Creates an empty ParquetRelation and underlying Parquet file that consists only of the metadata for the given schema. |
| static void | enableLogForwarding() |
| boolean | equals(Object other) |
| int | hashCode() |
| ParquetRelation | newInstance() |
| scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; | output() Output attributes of the relation. |
| parquet.schema.MessageType | parquetSchema() Schema derived from the Parquet file. |
| scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; | partitioningAttributes() |
| String | path() |
| static scala.collection.immutable.Map&lt;String,parquet.hadoop.metadata.CompressionCodecName&gt; | shortParquetCompressionCodecNames() |
| SQLContext | sqlContext() |
| org.apache.spark.sql.catalyst.plans.logical.Statistics | statistics() |
Methods inherited from class org.apache.spark.sql.catalyst.plans.logical.LogicalPlan:
childrenResolved, cleanArgs, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$Logging$$log__$eq, org$apache$spark$Logging$$log_, org$apache$spark$sql$catalyst$plans$logical$LogicalPlan$$resolveAsColumn, org$apache$spark$sql$catalyst$plans$logical$LogicalPlan$$resolveAsTableColumn, resolve, resolve, resolve$default$3, resolveChildren, resolveChildren$default$3, resolved, resolveGetField, sameResult, statePrefix

Methods inherited from class org.apache.spark.sql.catalyst.plans.QueryPlan:
expressions, inputSet, missingInput, org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionDown$1, org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1, outputSet, printSchema, references, schema, schemaString, simpleString, transformAllExpressions, transformExpressions, transformExpressionsDown, transformExpressionsUp

Methods inherited from class org.apache.spark.sql.catalyst.trees.TreeNode:
apply, argString, asCode, collect, fastEquals, flatMap, foreach, foreachUp, generateTreeString, getNodeNumbered, makeCopy, map, mapChildren, nodeName, numberedTreeString, origin, otherCopyArgs, stringArgs, toString, transform, transformChildrenDown, transformChildrenUp, transformDown, transformUp, treeString, withNewChildren

Methods inherited from interface scala.Product:
productArity, productElement, productIterator, productPrefix

Methods inherited from interface org.apache.spark.Logging:
initializeIfNecessary, initializeLogging, log_

public ParquetRelation(String path,
                       scala.Option&lt;org.apache.hadoop.conf.Configuration&gt; conf,
                       SQLContext sqlContext,
                       scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; partitioningAttributes)
public static void enableLogForwarding()
public static scala.collection.immutable.Map<String,parquet.hadoop.metadata.CompressionCodecName> shortParquetCompressionCodecNames()
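shortParquetCompressionCodecNames() maps short codec names to Parquet's CompressionCodecName constants. A minimal sketch of how such short names are typically used, assuming the standard Spark 1.x configuration key spark.sql.parquet.compression.codec (the key is general Spark SQL configuration, not documented on this page), and reusing the sqlContext and adults DataFrame from the earlier sketch:

```scala
// Hedged sketch: select a Parquet compression codec by its short name.
// In this generation of Spark the accepted short names include
// "uncompressed", "snappy", "gzip", and "lzo".
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

// Subsequent Parquet writes pick up the configured codec.
adults.saveAsParquetFile("path/to/adults-snappy.parquet")
```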
public static ParquetRelation create(String pathString,
                                     org.apache.spark.sql.catalyst.plans.logical.LogicalPlan child,
                                     org.apache.hadoop.conf.Configuration conf,
                                     SQLContext sqlContext)
Creates a new ParquetRelation and the underlying Parquet file for the given LogicalPlan. Note that this is used inside SparkStrategies to create a resolved relation as a data sink for writing to a Parquet file. The relation is empty but is initialized with ParquetMetadata and can be inserted into.
Parameters:
pathString - The directory the Parquet file will be stored in.
child - The child node that will be used for extracting the schema.
conf - A configuration to be used.

public static ParquetRelation createEmpty(String pathString,
                                          scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; attributes,
                                          boolean allowExisting,
                                          org.apache.hadoop.conf.Configuration conf,
                                          SQLContext sqlContext)
Creates an empty ParquetRelation and underlying Parquet file that consists only of the metadata for the given schema.
Parameters:
pathString - The directory the Parquet file will be stored in.
attributes - The schema of the relation.
conf - A configuration to be used.

public String path()
public scala.Option<org.apache.hadoop.conf.Configuration> conf()
public SQLContext sqlContext()
public scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> partitioningAttributes()
public parquet.schema.MessageType parquetSchema()
public scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> output()
output in class org.apache.spark.sql.catalyst.plans.QueryPlan&lt;org.apache.spark.sql.catalyst.plans.logical.LogicalPlan&gt;

public org.apache.spark.sql.catalyst.expressions.AttributeMap&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; attributeMap()
public ParquetRelation newInstance()
newInstance in interface org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation

public boolean equals(Object other)
equals in interface scala.Equals
equals in class Object

public int hashCode()
hashCode in class Object

public org.apache.spark.sql.catalyst.plans.logical.Statistics statistics()
statistics in class org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
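statistics() overrides the default LogicalPlan estimate. In this generation of Spark the planner's main use of the estimate is deciding whether one side of a join is small enough to broadcast; a hedged sketch, assuming the standard spark.sql.autoBroadcastJoinThreshold setting (general Spark SQL configuration, not documented on this page) and the DataFrames from the earlier sketches:

```scala
// Hedged sketch: the planner compares a relation's estimated size in bytes
// against this threshold when considering a broadcast join. The value below
// (10 MB) is the usual default in this generation of Spark.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString)

// A join where one Parquet-backed side is estimated below the threshold
// may be planned as a broadcast join.
val joined = adults.join(people, adults("name") === people("name"))
```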