HadoopFsRelation

Instance Constructors

new HadoopFsRelation()

Abstract Value Members

abstract def dataSchema: StructType

Specifies schema of actual data files. For partitioned relations, if one or more partitioned columns are contained in the data files, they should also appear in dataSchema.

Since
1.4.0
abstract def paths: Array[String]

Base paths of this relation. For partitioned relations, these should be the root directories of all partition directories.

Since
1.4.0
abstract def prepareJobForWrite(job: Job): OutputWriterFactory

Prepares a write job and returns an OutputWriterFactory. Client-side job preparation can be done here. For example, a user-defined output committer can be configured by setting the output committer class in the job's configuration under the key spark.sql.sources.outputCommitterClass.
Note that the only side effect expected here is mutating job via its setters. In particular, because Spark SQL caches BaseRelation instances for performance, mutating the relation's internal state may cause unexpected behavior.

Since
1.4.0
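The job mutation described above might look like the following minimal sketch. The helper object CommitterSetup and its method useCommitter are hypothetical names introduced for illustration; only the conf key comes from the description, and FileOutputCommitter is used merely as a stand-in for a user-defined committer.

```scala
import org.apache.hadoop.mapreduce.{Job, OutputCommitter}
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

// Hypothetical helper, not part of the Spark API.
object CommitterSetup {
  // Conf key named in the description of prepareJobForWrite.
  val CommitterKey = "spark.sql.sources.outputCommitterClass"

  // The only side effect is a setter call on the job's configuration;
  // no state of the relation itself is touched.
  def useCommitter(job: Job, committer: Class[_ <: OutputCommitter]): Unit =
    job.getConfiguration.setClass(CommitterKey, committer, classOf[OutputCommitter])

  // Example call, with FileOutputCommitter standing in for a custom committer.
  def example(job: Job): Unit = useCommitter(job, classOf[FileOutputCommitter])
}
```

An implementation of prepareJobForWrite could call such a helper before returning its OutputWriterFactory.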
abstract def sqlContext: SQLContext

Definition Classes
BaseRelation

Concrete Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def buildScan(requiredColumns: Array[String], filters: Array[Filter], inputFiles: Array[FileStatus]): RDD[Row]

For a non-partitioned relation, this method builds an RDD[Row] containing all rows within this relation. For partitioned relations, this method is called for each selected partition, and builds an RDD[Row] containing all rows within that single partition.
requiredColumns
Required columns.
filters
Candidate filters to be pushed down. The actual filter should be the conjunction of all filters. The pushed down filters are currently purely an optimization as they will all be evaluated again. This means it is safe to use them with methods that produce false positives such as filtering partitions based on a bloom filter.
inputFiles
For a non-partitioned relation, it contains paths of all data files in the relation. For a partitioned relation, it contains paths of all data files in a single selected partition.

Since
1.4.0
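Because the pushed-down filters are re-evaluated later, a scan may apply them conservatively. The sketch below illustrates this with per-file min/max pruning: a file is skipped only when some filter proves it cannot match, and unrecognized filters keep the file. IntRange, ScanPruning, and the statistics map are hypothetical names for illustration, and the source does not say how any particular relation stores such statistics.

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter, GreaterThan, LessThan}

// Hypothetical per-file statistics: inclusive min and max of an integer column.
case class IntRange(min: Int, max: Int)

// Hypothetical helper, not part of the Spark API.
object ScanPruning {
  // Returns false only when the filter proves that no row in the file can match.
  // Columns without statistics and unsupported filters yield true (a safe false positive).
  def mightMatch(filter: Filter, stats: Map[String, IntRange]): Boolean = filter match {
    case EqualTo(attr, v: Int)     => stats.get(attr).forall(r => r.min <= v && v <= r.max)
    case GreaterThan(attr, v: Int) => stats.get(attr).forall(r => r.max > v)
    case LessThan(attr, v: Int)    => stats.get(attr).forall(r => r.min < v)
    case _                         => true
  }

  // The effective predicate is the conjunction of all filters,
  // so a single filter that rules the file out is enough to skip it.
  def keepFile(filters: Array[Filter], stats: Map[String, IntRange]): Boolean =
    filters.forall(mightMatch(_, stats))
}
```

Returning true too often only costs extra I/O, since the filters are evaluated again on the rows that the scan produces.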
def buildScan(requiredColumns: Array[String], inputFiles: Array[FileStatus]): RDD[Row]

For a non-partitioned relation, this method builds an RDD[Row] containing all rows within this relation. For partitioned relations, this method is called for each selected partition, and builds an RDD[Row] containing all rows within that single partition.
requiredColumns
Required columns.
inputFiles
For a non-partitioned relation, it contains paths of all data files in the relation. For a partitioned relation, it contains paths of all data files in a single selected partition.

Since
1.4.0
def buildScan(inputFiles: Array[FileStatus]): RDD[Row]

For a non-partitioned relation, this method builds an RDD[Row] containing all rows within this relation. For partitioned relations, this method is called for each selected partition, and builds an RDD[Row] containing all rows within that single partition.
inputFiles
For a non-partitioned relation, it contains paths of all data files in the relation. For a partitioned relation, it contains paths of all data files in a single selected partition.

Since
1.4.0
def cachedLeafStatuses(): Set[FileStatus]

Attributes
protected
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def needConversion: Boolean

Whether the objects in Row need to be converted to the internal representation, for example: java.lang.String -> UTF8String, java.math.BigDecimal -> Decimal.
Note: The internal representation is not stable across releases and thus data sources outside of Spark SQL should leave this as true.

Definition Classes
BaseRelation
Since
1.4.0
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
final def partitionColumns: StructType

Partition columns. Can be either defined by userDefinedPartitionColumns or automatically discovered. Note that they should always be nullable.

Since
1.4.0
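One way to satisfy the nullability requirement when declaring partition columns by hand is to rewrite every field as nullable. This is a minimal sketch; NullablePartitions and its methods are hypothetical names, and the column names are illustrative only.

```scala
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical helper, not part of the Spark API.
object NullablePartitions {
  // Returns a copy of the given schema with every field marked nullable.
  def asNullable(columns: StructType): StructType =
    StructType(columns.fields.map(_.copy(nullable = true)))

  // Example: a partition schema declared as non-nullable, then corrected.
  def example(): StructType =
    asNullable(StructType(Seq(
      StructField("year", IntegerType, nullable = false),
      StructField("month", IntegerType, nullable = false))))
}
```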
lazy val schema: StructType

Schema of this relation. It consists of columns appearing in dataSchema and all partition columns not appearing in dataSchema.

Definition Classes
HadoopFsRelation → BaseRelation
Since
1.4.0
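The composition rule above can be sketched as follows. This is an illustration rather than the library's actual implementation: RelationSchema is a hypothetical helper, and comparing column names exactly (case-sensitively) is an assumption.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical helper, not part of the Spark API.
object RelationSchema {
  // Data columns first, followed by the partition columns that the data files lack.
  // Assumption: names are compared exactly, including case.
  def combine(dataSchema: StructType, partitionColumns: StructType): StructType = {
    val dataNames = dataSchema.fieldNames.toSet
    val extra = partitionColumns.fields.filterNot(f => dataNames.contains(f.name))
    StructType(dataSchema.fields ++ extra)
  }

  // Example: "year" is stored in the data files, "month" only in directory names.
  def example(): StructType = {
    val data = StructType(Seq(StructField("id", IntegerType), StructField("year", IntegerType)))
    val parts = StructType(Seq(StructField("year", IntegerType), StructField("month", StringType)))
    combine(data, parts) // fields: id, year, month
  }
}
```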
def sizeInBytes: Long

Returns an estimated size of this relation in bytes. This information is used by the planner to decide when it is safe to broadcast a relation and can be overridden by sources that know the size ahead of time. By default, the system will assume that tables are too large to broadcast. This method will be called multiple times during query planning and thus should not perform expensive operations on each invocation.
Note that it is always better to overestimate size than to underestimate it, because underestimation could lead to suboptimal execution plans (e.g. broadcasting a very large table).

Definition Classes
BaseRelation
Since
1.3.0
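Since the planner may call this method repeatedly, a source that can derive its size from file metadata might compute the value once and reuse it. The sketch below sums file lengths behind a lazy val; FileSizeEstimate is a hypothetical class, and treating on-disk length as the estimate is an assumption that can underestimate compressed data.

```scala
import org.apache.hadoop.fs.FileStatus

// Hypothetical holder for the data files of a relation, not part of the Spark API.
class FileSizeEstimate(files: Seq[FileStatus]) {
  // Computed on first access only; later calls return the stored value,
  // so repeated calls during query planning stay cheap.
  lazy val sizeInBytes: Long = files.map(_.getLen).sum
}
```

A relation could delegate its sizeInBytes override to such a cached value, possibly scaled up to stay on the overestimating side.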
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
def userDefinedPartitionColumns: Option[StructType]

Optional user defined partition columns.

Since
1.4.0
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

abstract class HadoopFsRelation extends BaseRelation
