An input stream that always returns the same RDD on each time step. Useful for testing.
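As a rough illustration of the testing use case, the sketch below feeds one fixed RDD into a small pipeline on every batch. The local master, the 1-second batch interval, the sample data, and the object name are assumptions made for the example, not part of the documented API.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ConstantInputDStream

// Hypothetical driver program; names and settings are illustrative only.
object ConstantStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("ConstantStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // The same RDD is returned for every batch interval.
    val fixedRdd = ssc.sparkContext.parallelize(Seq(1, 2, 3))
    val stream = new ConstantInputDStream[Int](ssc, fixedRdd)

    // Any downstream logic under test can be attached here.
    stream.map(_ * 10).print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // run for roughly five seconds
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Because the input never changes, each batch should produce the same result, which makes it easier to check transformation logic in isolation.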
A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous
sequence of RDDs (of the same type) representing a continuous stream of data (see
org.apache.spark.rdd.RDD in the Spark core documentation for more details on RDDs).
DStreams can either be created from live data (such as data from TCP sockets, Kafka, or Flume)
using an org.apache.spark.streaming.StreamingContext, or be generated by transforming existing
DStreams using operations such as map, window, and reduceByKeyAndWindow. While a Spark Streaming
program is running, each DStream periodically generates an RDD, either from live data or by
transforming the RDD generated by a parent DStream.
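The two ways of obtaining a DStream described above might look like the following sketch. The host, port, batch interval, and window sizes are assumptions chosen for illustration; a text source (for example `nc -lk 9999`) would need to be listening for any data to arrive.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical driver program; names and settings are illustrative only.
object DStreamCreationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DStreamCreationSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // 1. A DStream created from live data through the StreamingContext.
    val lines: DStream[String] = ssc.socketTextStream("localhost", 9999)

    // 2. DStreams generated by transforming an existing DStream.
    val pairs: DStream[(String, Int)] =
      lines.flatMap(_.split(" ")).map(word => (word, 1))

    // Window and slide durations must be multiples of the batch interval.
    val lastThirtySeconds: DStream[String] = lines.window(Seconds(30), Seconds(10))
    val windowedCounts: DStream[(String, Int)] =
      pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    lastThirtySeconds.count().print()
    windowedCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

On each 10-second batch, every DStream in this chain generates one RDD: `lines` from the socket, the others from the RDDs of their parents.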
This class contains the basic operations available on all DStreams, such as map, filter, and
window. In addition, org.apache.spark.streaming.dstream.PairDStreamFunctions contains
operations available only on DStreams of key-value pairs, such as groupByKeyAndWindow and join.
These operations are automatically available on any DStream of pairs
(e.g., DStream[(Int, Int)]) through implicit conversions.
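A minimal sketch of the pair operations, assuming a Spark version in which the implicit conversion is picked up without an extra import (1.3 or later). The constant input streams, sample data, and durations are illustrative assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.{ConstantInputDStream, DStream}

// Hypothetical driver program; names and settings are illustrative only.
object PairOperationsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("PairOperationsSketch")
    val ssc = new StreamingContext(conf, Seconds(2))
    val sc = ssc.sparkContext

    // Two DStreams of key-value pairs, built from fixed data for the example.
    val scores: DStream[(String, Int)] =
      new ConstantInputDStream(ssc, sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3))))
    val labels: DStream[(String, String)] =
      new ConstantInputDStream(ssc, sc.parallelize(Seq(("a", "alpha"), ("b", "beta"))))

    // groupByKeyAndWindow and join come from PairDStreamFunctions and are
    // available here because the element type is a Tuple2.
    val grouped: DStream[(String, Iterable[Int])] =
      scores.groupByKeyAndWindow(Seconds(4), Seconds(2))
    val joined: DStream[(String, (Int, String))] = scores.join(labels)

    grouped.print()
    joined.print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(6000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Calling `groupByKeyAndWindow` or `join` on a DStream whose elements are not pairs would not compile, since the conversion only applies to `DStream[(K, V)]`.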
A DStream internally is characterized by a few basic properties:
This is the abstract base class for all input streams. This class provides the methods start() and stop(), which are called by the Spark Streaming system to start and stop receiving data, respectively. Input streams that can generate RDDs from new data by running a service or thread only on the driver node (that is, without running a receiver on worker nodes) can be implemented by directly inheriting from InputDStream. For example, FileInputDStream, a subclass of InputDStream, monitors an HDFS directory from the driver for new files and generates RDDs from the new files. To implement input streams that require running a receiver on the worker nodes, use org.apache.spark.streaming.dstream.ReceiverInputDStream as the parent class.
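A driver-only input stream could be sketched as below. `CounterInputDStream` and its counting behavior are hypothetical and exist only to show which members a direct subclass of InputDStream has to provide.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

// Hypothetical input stream that emits the running batch number on each
// batch. All work happens on the driver; no receiver is started on workers.
class CounterInputDStream(streamingContext: StreamingContext)
  extends InputDStream[Long](streamingContext) {

  private var batchIndex = 0L

  // Called by Spark Streaming when the streaming computation starts.
  override def start(): Unit = {
    batchIndex = 0L
  }

  // Called by Spark Streaming when the streaming computation stops.
  override def stop(): Unit = {}

  // Called once per batch interval to produce that batch's RDD.
  override def compute(validTime: Time): Option[RDD[Long]] = {
    batchIndex += 1
    Some(context.sparkContext.parallelize(Seq(batchIndex)))
  }
}
```

An instance would be created with `new CounterInputDStream(ssc)` before `ssc.start()` is called, and can then be transformed like any other DStream.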
:: Experimental ::
DStream representing the stream of data generated by the mapWithState operation on a pair DStream.
It also gives access to the stream of state snapshots, that is, the state data of all keys after
a batch has updated them.
Class of the key
Class of the value
Class of the state data
Class of the mapped data
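The four type parameters can be seen together in a running-count sketch such as the one below, where the key is a String, the value and state are Int, and the mapped data is a (String, Int) pair. The input data, batch interval, and temporary checkpoint directory are assumptions made for the example.

```scala
import java.nio.file.Files

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}
import org.apache.spark.streaming.dstream.{ConstantInputDStream, DStream, MapWithStateDStream}

// Hypothetical driver program; names and settings are illustrative only.
object MapWithStateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("MapWithStateSketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    // mapWithState needs checkpointing; a temporary directory is assumed here.
    ssc.checkpoint(Files.createTempDirectory("map-with-state-sketch").toString)

    val pairs: DStream[(String, Int)] =
      new ConstantInputDStream(ssc, ssc.sparkContext.parallelize(Seq(("a", 1), ("b", 1))))

    // (key, new value, state) => mapped data; the state holds a running sum.
    val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
      val sum = one.getOrElse(0) + state.getOption().getOrElse(0)
      state.update(sum)
      (word, sum)
    }

    val mapped: MapWithStateDStream[String, Int, Int, (String, Int)] =
      pairs.mapWithState(StateSpec.function(mappingFunc))

    mapped.print()                  // the mapped data for each batch
    mapped.stateSnapshots().print() // state of all keys after each batch

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

The mapped stream only contains entries for keys that received data in the batch, whereas `stateSnapshots()` covers every key that currently has state.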
Extra functions available on DStream of (key, value) pairs through an implicit conversion.
Abstract class for defining any org.apache.spark.streaming.dstream.InputDStream that has to start a receiver on worker nodes to receive external data. Specific implementations of ReceiverInputDStream must define the getReceiver function, which returns the receiver object of type org.apache.spark.streaming.receiver.Receiver that will be sent to the workers to receive data.
Class type of the object of this stream
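A receiver-based input stream might be sketched as follows. `TickReceiver` and `TickInputDStream` are hypothetical names, and the receiver simply stores a fixed string twice a second in place of real external data.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver; it runs on a worker node once the stream starts.
class TickReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {

  // onStart must not block, so the receiving loop runs in its own thread.
  override def onStart(): Unit = {
    new Thread("tick-receiver") {
      setDaemon(true)
      override def run(): Unit = {
        while (!isStopped()) {
          store("tick") // hand one record to Spark Streaming
          Thread.sleep(500)
        }
      }
    }.start()
  }

  // The loop above exits by itself once isStopped() returns true.
  override def onStop(): Unit = {}
}

// Hypothetical input stream whose only job is to supply the receiver.
class TickInputDStream(streamingContext: StreamingContext)
  extends ReceiverInputDStream[String](streamingContext) {

  override def getReceiver(): Receiver[String] = new TickReceiver
}
```

For a simple case like this, `ssc.receiverStream(new TickReceiver)` would give an equivalent stream without a custom subclass. Note that a receiver occupies one core, so a local run needs at least `local[2]`.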
Various implementations of DStream.
org.apache.spark.streaming.dstream.DStream