public class ParquetRelation
extends org.apache.spark.sql.catalyst.plans.logical.LeafNode
implements org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation, scala.Product, scala.Serializable
Users should interact with Parquet files through a SchemaRDD, created by a SQLContext, instead of using this class directly:

```scala
val parquetRDD = sqlContext.parquetFile("path/to/parquet.file")
```
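As a minimal sketch of that recommended path (assuming a Spark 1.x `SQLContext`; the application, table, and file names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-example"))
val sqlContext = new SQLContext(sc)

// Load an existing Parquet file as a SchemaRDD.
val parquetRDD = sqlContext.parquetFile("path/to/parquet.file")

// Register it as a temporary table so it can be queried with SQL
// (registerTempTable is the Spark 1.1+ name; earlier releases used registerAsTable).
parquetRDD.registerTempTable("records")
sqlContext.sql("SELECT COUNT(*) FROM records").collect().foreach(println)
```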
| Constructor and Description |
|---|
| `ParquetRelation(String path, scala.Option<org.apache.hadoop.conf.Configuration> conf, SQLContext sqlContext, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> partitioningAttributes)` |
| Modifier and Type | Method and Description |
|---|---|
| `org.apache.spark.sql.catalyst.expressions.AttributeMap<org.apache.spark.sql.catalyst.expressions.Attribute>` | `attributeMap()` |
| `scala.Option<org.apache.hadoop.conf.Configuration>` | `conf()` |
| `static ParquetRelation` | `create(String pathString, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan child, org.apache.hadoop.conf.Configuration conf, SQLContext sqlContext)`. Creates a new ParquetRelation and underlying Parquet file for the given LogicalPlan. |
| `static ParquetRelation` | `createEmpty(String pathString, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> attributes, boolean allowExisting, org.apache.hadoop.conf.Configuration conf, SQLContext sqlContext)`. Creates an empty ParquetRelation and underlying Parquet file that consists only of the metadata for the given schema. |
| `static void` | `enableLogForwarding()` |
| `boolean` | `equals(Object other)` |
| `ParquetRelation` | `newInstance()` |
| `scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute>` | `output()`. Attributes. |
| `parquet.schema.MessageType` | `parquetSchema()`. Schema derived from the Parquet file. |
| `scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute>` | `partitioningAttributes()` |
| `String` | `path()` |
| `static scala.collection.immutable.Map<String,parquet.hadoop.metadata.CompressionCodecName>` | `shortParquetCompressionCodecNames()` |
| `SQLContext` | `sqlContext()` |
| `org.apache.spark.sql.catalyst.plans.logical.Statistics` | `statistics()` |
**Methods inherited from class org.apache.spark.sql.catalyst.plans.logical.LogicalPlan:**
`childrenResolved, cleanArgs, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$Logging$$log__$eq, org$apache$spark$Logging$$log_, resolve, resolve, resolveChildren, resolved, sameResult, statePrefix`

**Methods inherited from class org.apache.spark.sql.catalyst.plans.QueryPlan:**
`expressions, inputSet, missingInput, org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionDown$1, org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1, outputSet, printSchema, references, schema, schemaString, simpleString, transformAllExpressions, transformExpressions, transformExpressionsDown, transformExpressionsUp`

**Methods inherited from class org.apache.spark.sql.catalyst.trees.TreeNode:**
`apply, argString, asCode, collect, fastEquals, flatMap, foreach, generateTreeString, getNodeNumbered, makeCopy, map, mapChildren, nodeName, numberedTreeString, otherCopyArgs, stringArgs, toString, transform, transformChildrenDown, transformChildrenUp, transformDown, transformUp, treeString, withNewChildren`

**Methods inherited from interface scala.Product:**
`productArity, productElement, productIterator, productPrefix`

**Methods inherited from interface org.apache.spark.Logging:**
`initializeIfNecessary, initializeLogging, log_`
public ParquetRelation(String path, scala.Option<org.apache.hadoop.conf.Configuration> conf, SQLContext sqlContext, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> partitioningAttributes)
public static void enableLogForwarding()
public static scala.collection.immutable.Map<String,parquet.hadoop.metadata.CompressionCodecName> shortParquetCompressionCodecNames()
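This map backs the short codec names (typically `uncompressed`, `snappy`, `gzip`, and `lzo`) accepted when configuring Parquet output compression. A hedged sketch of selecting a codec through the public configuration, assuming the Spark 1.x property key `spark.sql.parquet.compression.codec` and an existing `sqlContext` and SchemaRDD:

```scala
// Pick the Parquet output codec by its short name; the writer resolves it
// via shortParquetCompressionCodecNames() when producing the file.
sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip")

// Subsequent Parquet writes use the selected codec.
// (someSchemaRDD is an illustrative, pre-existing SchemaRDD.)
someSchemaRDD.saveAsParquetFile("path/to/output.parquet")
```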
public static ParquetRelation create(String pathString, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan child, org.apache.hadoop.conf.Configuration conf, SQLContext sqlContext)
Creates a new ParquetRelation and underlying Parquet file for the given LogicalPlan. Note that this is used inside SparkStrategies to create a resolved relation as a data sink for writing to a Parquet file. The relation is empty but is initialized with ParquetMetadata and can be inserted into.
Parameters:
- `pathString` - The directory the Parquet file will be stored in.
- `child` - The child node that will be used for extracting the schema.
- `conf` - A configuration to be used.
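`create` is not normally called from user code; it is exercised when a SchemaRDD is written out as Parquet. A hedged sketch of that public write path (paths and names are illustrative, and `people` is assumed to be an existing SchemaRDD):

```scala
// Writing a SchemaRDD materializes a Parquet file whose schema is
// derived from the SchemaRDD's logical plan.
people.saveAsParquetFile("path/to/people.parquet")

// Reading it back yields a Parquet-backed relation with the same schema.
val restored = sqlContext.parquetFile("path/to/people.parquet")
restored.registerTempTable("people_restored")
```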
public static ParquetRelation createEmpty(String pathString, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> attributes, boolean allowExisting, org.apache.hadoop.conf.Configuration conf, SQLContext sqlContext)

Creates an empty ParquetRelation and underlying Parquet file that consists only of the metadata for the given schema.

Parameters:
- `pathString` - The directory the Parquet file will be stored in.
- `attributes` - The schema of the relation.
- `conf` - A configuration to be used.
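For completeness, a hedged sketch of calling `createEmpty` directly with a hand-built catalyst schema. This leans on internal catalyst classes as they existed before Spark 1.3 (`AttributeReference` and the `catalyst.types` package), so treat the imports and constructor usage as assumptions:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.types.{IntegerType, StringType}

// Hand-built schema: id: Int (non-nullable) and name: String (nullable).
val attributes = Seq(
  AttributeReference("id", IntegerType, nullable = false)(),
  AttributeReference("name", StringType, nullable = true)())

// Writes only the Parquet metadata for this schema; rows can be
// inserted into the relation afterwards.
val relation = ParquetRelation.createEmpty(
  "path/to/empty.parquet", attributes, false, new Configuration(), sqlContext)
```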
public String path()

public scala.Option<org.apache.hadoop.conf.Configuration> conf()
public SQLContext sqlContext()
public scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> partitioningAttributes()
public parquet.schema.MessageType parquetSchema()
public scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> output()
Specified by: `output` in class `org.apache.spark.sql.catalyst.plans.QueryPlan<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>`
public org.apache.spark.sql.catalyst.expressions.AttributeMap<org.apache.spark.sql.catalyst.expressions.Attribute> attributeMap()
public ParquetRelation newInstance()
Specified by: `newInstance` in interface `org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation`
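`newInstance` lets the analyzer produce a copy of the relation with fresh expression IDs when the same table appears more than once in a query, as in a self-join. A hedged illustration through the public API (column names are illustrative):

```scala
// Joining a Parquet-backed table with itself forces the analyzer to call
// newInstance() on one side, so the two sides get distinct attribute IDs.
val people = sqlContext.parquetFile("path/to/people.parquet")
people.registerTempTable("people")

val pairs = sqlContext.sql(
  "SELECT a.name, b.name FROM people a JOIN people b ON a.managerId = b.id")
```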
public boolean equals(Object other)
Specified by: `equals` in interface `scala.Equals`

Overrides: `equals` in class `Object`
public org.apache.spark.sql.catalyst.plans.logical.Statistics statistics()
Overrides: `statistics` in class `org.apache.spark.sql.catalyst.plans.logical.LogicalPlan`
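For a Parquet relation the size estimate is derived from the underlying files, and the planner can consult it, for example when deciding whether one side of a join is small enough to broadcast. A hedged sketch of inspecting the estimate through a query's analyzed plan (these are developer-level accessors, so treat the exact calls as assumptions):

```scala
// queryExecution exposes the catalyst plans behind a SchemaRDD.
val rdd = sqlContext.parquetFile("path/to/parquet.file")

// Sum the size estimates over all leaf relations in the analyzed plan.
val sizeInBytes = rdd.queryExecution.analyzed.collect {
  case leaf: org.apache.spark.sql.catalyst.plans.logical.LeafNode =>
    leaf.statistics.sizeInBytes
}.sum
println(s"estimated size: $sizeInBytes bytes")
```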