public class JdbcRDD<T> extends RDD<T> implements Logging
An RDD that executes an SQL query on a JDBC connection and reads results.

param: getConnection a function that returns an open Connection. The RDD takes care of closing the connection.
param: sql the text of the query. The query must contain two ? placeholders for parameters used to partition the results. E.g. "select title, author from books where ? <= id and id <= ?"
param: lowerBound the minimum value of the first placeholder
param: upperBound the maximum value of the second placeholder. The lower and upper bounds are inclusive.
param: numPartitions the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20).
param: mapRow a function from a ResultSet to a single row of the desired result type(s). This should only call getInt, getString, etc; the RDD takes care of calling next. The default maps a ResultSet to an array of Object.
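To make the lowerBound, upperBound, and numPartitions semantics concrete, the Java sketch below splits an inclusive id range into per-partition bounds the way the example above describes. It is illustrative only, not the RDD's actual internal partitioning code, and the method name splitRange is made up for this sketch.

```java
// Illustrative sketch: split an inclusive [lowerBound, upperBound] range into
// numPartitions inclusive sub-ranges. Not the actual JdbcRDD implementation.
static long[][] splitRange(long lowerBound, long upperBound, int numPartitions) {
    long length = upperBound - lowerBound + 1;
    long[][] bounds = new long[numPartitions][2];
    for (int i = 0; i < numPartitions; i++) {
        bounds[i][0] = lowerBound + (i * length) / numPartitions;           // start, inclusive
        bounds[i][1] = lowerBound + ((i + 1) * length) / numPartitions - 1; // end, inclusive
    }
    return bounds;
}
// splitRange(1, 20, 2) yields {1, 10} and {11, 20}, matching the example above.
```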
Modifier and Type | Class and Description |
---|---|
static interface | JdbcRDD.ConnectionFactory |
Constructor and Description |
---|
JdbcRDD(SparkContext sc, scala.Function0<java.sql.Connection> getConnection, java.lang.String sql, long lowerBound, long upperBound, int numPartitions, scala.Function1<java.sql.ResultSet,T> mapRow, scala.reflect.ClassTag<T> evidence$1) |
Modifier and Type | Method and Description |
---|---|
scala.collection.Iterator<T> | compute(Partition thePart, TaskContext context) :: DeveloperApi :: Implemented by subclasses to compute a given partition. |
static JavaRDD<java.lang.Object[]> | create(JavaSparkContext sc, JdbcRDD.ConnectionFactory connectionFactory, java.lang.String sql, long lowerBound, long upperBound, int numPartitions) Create an RDD that executes an SQL query on a JDBC connection and reads results. |
static <T> JavaRDD<T> | create(JavaSparkContext sc, JdbcRDD.ConnectionFactory connectionFactory, java.lang.String sql, long lowerBound, long upperBound, int numPartitions, Function<java.sql.ResultSet,T> mapRow) Create an RDD that executes an SQL query on a JDBC connection and reads results. |
Partition[] | getPartitions() Implemented by subclasses to return the set of partitions in this RDD. |
static java.lang.Object[] | resultSetToObjectArray(java.sql.ResultSet rs) |

Methods inherited from class org.apache.spark.rdd.RDD:
aggregate, cache, cartesian, checkpoint, checkpointData, clearDependencies, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, filter, filterWith, first, firstParent, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getDependencies, getNumPartitions, getPreferredLocations, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, localCheckpoint, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, max, min, name, numericRDDToDoubleRDDFunctions, parent, partitioner, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, scope, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeReduce, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.spark.Logging:
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public JdbcRDD(SparkContext sc, scala.Function0<java.sql.Connection> getConnection, java.lang.String sql, long lowerBound, long upperBound, int numPartitions, scala.Function1<java.sql.ResultSet,T> mapRow, scala.reflect.ClassTag<T> evidence$1)
public static java.lang.Object[] resultSetToObjectArray(java.sql.ResultSet rs)
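The page does not describe resultSetToObjectArray beyond its signature, but given that the default mapRow "maps a ResultSet to an array of Object", its behavior is presumably equivalent to copying each column of the current row into an Object[]. A minimal Java sketch of that presumed logic (an assumption, not the actual Spark source):

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch of logic presumed equivalent to resultSetToObjectArray: the ResultSet
// is assumed to already be positioned on a row (the RDD calls next() itself).
static Object[] rowAsObjectArray(ResultSet rs) throws SQLException {
    int columns = rs.getMetaData().getColumnCount();
    Object[] row = new Object[columns];
    for (int i = 0; i < columns; i++) {
        row[i] = rs.getObject(i + 1);  // JDBC column indexes are 1-based
    }
    return row;
}
```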
public static <T> JavaRDD<T> create(JavaSparkContext sc, JdbcRDD.ConnectionFactory connectionFactory, java.lang.String sql, long lowerBound, long upperBound, int numPartitions, Function<java.sql.ResultSet,T> mapRow)
Create an RDD that executes an SQL query on a JDBC connection and reads results.
Parameters:
connectionFactory - a factory that returns an open Connection. The RDD takes care of closing the connection.
sql - the text of the query. The query must contain two ? placeholders for parameters used to partition the results. E.g. "select title, author from books where ? <= id and id <= ?"
lowerBound - the minimum value of the first placeholder
upperBound - the maximum value of the second placeholder. The lower and upper bounds are inclusive.
numPartitions - the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20)
mapRow - a function from a ResultSet to a single row of the desired result type(s). This should only call getInt, getString, etc; the RDD takes care of calling next. The default maps a ResultSet to an array of Object.
sc - (undocumented)
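A hedged usage sketch of this overload in Java. The JDBC URL, the books table, and its columns are hypothetical stand-ins, and the sketch assumes Java 8+ lambdas, which apply here because ConnectionFactory and Function each declare a single abstract method.

```java
import java.sql.DriverManager;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.JdbcRDD;

public class JdbcRddExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[2]", "JdbcRDD example");

        // The two ? placeholders receive the per-partition lower and upper ids.
        JavaRDD<String> titles = JdbcRDD.create(
            sc,
            () -> DriverManager.getConnection("jdbc:h2:mem:testdb"),      // hypothetical URL
            "SELECT title, author FROM books WHERE ? <= id AND id <= ?",  // hypothetical table
            1L,   // lowerBound, inclusive
            20L,  // upperBound, inclusive
            2,    // numPartitions: executed with (1, 10) and (11, 20)
            rs -> rs.getString(1) + " by " + rs.getString(2));            // mapRow

        System.out.println(titles.collect());
        sc.stop();
    }
}
```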
public static JavaRDD<java.lang.Object[]> create(JavaSparkContext sc, JdbcRDD.ConnectionFactory connectionFactory, java.lang.String sql, long lowerBound, long upperBound, int numPartitions)
Create an RDD that executes an SQL query on a JDBC connection and reads results. Each row is converted into an Object array. For usage example, see test case JavaAPISuite.testJavaJdbcRDD.
Parameters:
connectionFactory - a factory that returns an open Connection. The RDD takes care of closing the connection.
sql - the text of the query. The query must contain two ? placeholders for parameters used to partition the results. E.g. "select title, author from books where ? <= id and id <= ?"
lowerBound - the minimum value of the first placeholder
upperBound - the maximum value of the second placeholder. The lower and upper bounds are inclusive.
numPartitions - the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20)
sc - (undocumented)
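A corresponding sketch for this overload, which yields each row as an Object[] produced by the default mapping. It reuses the hypothetical database and the JavaSparkContext sc from the sketch above.

```java
// Continues the sketch above: sc is a JavaSparkContext, the database is hypothetical.
JavaRDD<Object[]> rows = JdbcRDD.create(
    sc,
    () -> DriverManager.getConnection("jdbc:h2:mem:testdb"),
    "SELECT title, author FROM books WHERE ? <= id AND id <= ?",
    1L, 20L, 2);

// Each element is one row; indexes 0 and 1 follow the SELECT column order.
for (Object[] row : rows.collect()) {
    System.out.println(row[0] + " by " + row[1]);
}
```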
public Partition[] getPartitions()
Description copied from class: RDD
Implemented by subclasses to return the set of partitions in this RDD.
Overrides:
getPartitions in class RDD<T>
public scala.collection.Iterator<T> compute(Partition thePart, TaskContext context)
Description copied from class: RDD