public class ParquetRelation
extends org.apache.spark.sql.catalyst.plans.logical.LeafNode
implements org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation, scala.Product, scala.Serializable
Users should interact with Parquet files through a DataFrame created by a SQLContext,
rather than using this class directly:
val parquetFile = sqlContext.parquetFile("path/to/parquet.file")
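For context, a minimal sketch of the intended public usage in the Spark 1.3-era API; the paths, table name, and data below are illustrative assumptions, not taken from this page. Reading yields a DataFrame backed by a relation like this one, and saving a DataFrame is the public counterpart of the create/createEmpty sinks described further down:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hedged sketch, assuming the Spark 1.3-era API.
val sc = new SparkContext(new SparkConf().setAppName("parquet-example"))
val sqlContext = new SQLContext(sc)

// Reading: parquetFile returns a DataFrame backed by a Parquet relation.
val people = sqlContext.parquetFile("path/to/people.parquet")
people.printSchema()

// Registering the DataFrame as a temporary table makes it queryable with SQL.
people.registerTempTable("people")
val adults = sqlContext.sql("SELECT name, age FROM people WHERE age >= 18")

// Writing: saveAsParquetFile materializes a DataFrame as a new Parquet file.
adults.saveAsParquetFile("path/to/adults.parquet")
```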
| Constructor and Description |
|---|
| ParquetRelation(String path, scala.Option&lt;org.apache.hadoop.conf.Configuration&gt; conf, SQLContext sqlContext, scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; partitioningAttributes) |
| Modifier and Type | Method and Description |
|---|---|
| org.apache.spark.sql.catalyst.expressions.AttributeMap&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; | attributeMap() |
| scala.Option&lt;org.apache.hadoop.conf.Configuration&gt; | conf() |
| static ParquetRelation | create(String pathString, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan child, org.apache.hadoop.conf.Configuration conf, SQLContext sqlContext) Creates a new ParquetRelation and the underlying Parquet file for the given LogicalPlan. |
| static ParquetRelation | createEmpty(String pathString, scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; attributes, boolean allowExisting, org.apache.hadoop.conf.Configuration conf, SQLContext sqlContext) Creates an empty ParquetRelation and underlying Parquet file that consists only of the metadata for the given schema. |
| static void | enableLogForwarding() |
| boolean | equals(Object other) |
| int | hashCode() |
| ParquetRelation | newInstance() |
| scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; | output() Output attributes of the relation. |
| parquet.schema.MessageType | parquetSchema() Schema derived from the Parquet file. |
| scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; | partitioningAttributes() |
| String | path() |
| static scala.collection.immutable.Map&lt;String,parquet.hadoop.metadata.CompressionCodecName&gt; | shortParquetCompressionCodecNames() |
| SQLContext | sqlContext() |
| org.apache.spark.sql.catalyst.plans.logical.Statistics | statistics() |
Methods inherited from class org.apache.spark.sql.catalyst.plans.logical.LogicalPlan:
childrenResolved, cleanArgs, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$Logging$$log__$eq, org$apache$spark$Logging$$log_, org$apache$spark$sql$catalyst$plans$logical$LogicalPlan$$resolveAsColumn, org$apache$spark$sql$catalyst$plans$logical$LogicalPlan$$resolveAsTableColumn, resolve, resolve, resolve$default$3, resolveChildren, resolveChildren$default$3, resolved, resolveGetField, sameResult, statePrefix

Methods inherited from class org.apache.spark.sql.catalyst.plans.QueryPlan:
expressions, inputSet, missingInput, org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionDown$1, org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1, outputSet, printSchema, references, schema, schemaString, simpleString, transformAllExpressions, transformExpressions, transformExpressionsDown, transformExpressionsUp

Methods inherited from class org.apache.spark.sql.catalyst.trees.TreeNode:
apply, argString, asCode, collect, fastEquals, flatMap, foreach, foreachUp, generateTreeString, getNodeNumbered, makeCopy, map, mapChildren, nodeName, numberedTreeString, origin, otherCopyArgs, stringArgs, toString, transform, transformChildrenDown, transformChildrenUp, transformDown, transformUp, treeString, withNewChildren

Methods inherited from interface scala.Product:
productArity, productElement, productIterator, productPrefix

Methods inherited from interface org.apache.spark.Logging:
initializeIfNecessary, initializeLogging, log_

public ParquetRelation(String path,
                       scala.Option&lt;org.apache.hadoop.conf.Configuration&gt; conf,
                       SQLContext sqlContext,
                       scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; partitioningAttributes)
public static void enableLogForwarding()
public static scala.collection.immutable.Map<String,parquet.hadoop.metadata.CompressionCodecName> shortParquetCompressionCodecNames()
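shortParquetCompressionCodecNames() maps short codec names to Parquet's CompressionCodecName constants. A minimal sketch of how such short names are typically used, assuming the standard Spark 1.x configuration key spark.sql.parquet.compression.codec (the key is general Spark SQL configuration, not documented on this page), and reusing the sqlContext and adults DataFrame from the earlier sketch:

```scala
// Hedged sketch: select a Parquet compression codec by its short name.
// In this generation of Spark the accepted short names include
// "uncompressed", "snappy", "gzip", and "lzo".
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

// Subsequent Parquet writes pick up the configured codec.
adults.saveAsParquetFile("path/to/adults-snappy.parquet")
```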
public static ParquetRelation create(String pathString,
                                     org.apache.spark.sql.catalyst.plans.logical.LogicalPlan child,
                                     org.apache.hadoop.conf.Configuration conf,
                                     SQLContext sqlContext)
Creates a new ParquetRelation and the underlying Parquet file for the given LogicalPlan. Note that this is used inside SparkStrategies to create a resolved relation as a data sink for writing to a Parquet file. The relation is empty but is initialized with ParquetMetadata and can be inserted into.
Parameters:
pathString - The directory the Parquet file will be stored in.
child - The child node that will be used for extracting the schema.
conf - A configuration to be used.

public static ParquetRelation createEmpty(String pathString,
                                          scala.collection.Seq&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; attributes,
                                          boolean allowExisting,
                                          org.apache.hadoop.conf.Configuration conf,
                                          SQLContext sqlContext)
Creates an empty ParquetRelation and underlying Parquet file that consists only of the metadata for the given schema.
Parameters:
pathString - The directory the Parquet file will be stored in.
attributes - The schema of the relation.
conf - A configuration to be used.

public String path()
public scala.Option<org.apache.hadoop.conf.Configuration> conf()
public SQLContext sqlContext()
public scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> partitioningAttributes()
public parquet.schema.MessageType parquetSchema()
public scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> output()
output in class org.apache.spark.sql.catalyst.plans.QueryPlan&lt;org.apache.spark.sql.catalyst.plans.logical.LogicalPlan&gt;

public org.apache.spark.sql.catalyst.expressions.AttributeMap&lt;org.apache.spark.sql.catalyst.expressions.Attribute&gt; attributeMap()
public ParquetRelation newInstance()
newInstance in interface org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation

public boolean equals(Object other)
equals in interface scala.Equals
equals in class Object

public int hashCode()
hashCode in class Object

public org.apache.spark.sql.catalyst.plans.logical.Statistics statistics()
statistics in class org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
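statistics() overrides the default LogicalPlan estimate. In this generation of Spark the planner's main use of the estimate is deciding whether one side of a join is small enough to broadcast; a hedged sketch, assuming the standard spark.sql.autoBroadcastJoinThreshold setting (general Spark SQL configuration, not documented on this page) and the DataFrames from the earlier sketches:

```scala
// Hedged sketch: the planner compares a relation's estimated size in bytes
// against this threshold when considering a broadcast join. The value below
// (10 MB) is the usual default in this generation of Spark.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString)

// A join where one Parquet-backed side is estimated below the threshold
// may be planned as a broadcast join.
val joined = adults.join(people, adults("name") === people("name"))
```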