Recreate a StreamingContext from a checkpoint file using an existing SparkContext.
Recreate a StreamingContext from a checkpoint file using an existing SparkContext.
Path to the directory that was specified as the checkpoint directory
Existing SparkContext
Recreate a StreamingContext from a checkpoint file.
Recreate a StreamingContext from a checkpoint file.
Path to the directory that was specified as the checkpoint directory
Recreate a StreamingContext from a checkpoint file.
Recreate a StreamingContext from a checkpoint file.
Path to the directory that was specified as the checkpoint directory
Optional, configuration object if necessary for reading from HDFS compatible filesystems
Create a StreamingContext by providing the details necessary for creating a new SparkContext.
Create a StreamingContext by providing the details necessary for creating a new SparkContext.
cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]).
a name for your job, to display on the cluster web UI
the time interval at which streaming data will be divided into batches
Create a StreamingContext by providing the configuration necessary for a new SparkContext.
Create a StreamingContext by providing the configuration necessary for a new SparkContext.
a org.apache.spark.SparkConf object specifying Spark parameters
the time interval at which streaming data will be divided into batches
Create a StreamingContext using an existing SparkContext.
Create a StreamingContext using an existing SparkContext.
existing SparkContext
the time interval at which streaming data will be divided into batches
Create an input stream with any arbitrary user implemented actor receiver.
Create an input stream with any arbitrary user implemented actor receiver. Find more details at: http://spark.apache.org/docs/latest/streaming-custom-receivers.html
Props object defining creation of the actor
Name of the actor
RDD storage level (default: StorageLevel.MEMORY_AND_DISK_SER_2)
An important point to note: Since Actor may exist outside the spark framework, It is thus user's responsibility to ensure the type safety, i.e parametrized type of data received and actorStream should be same.
Add a org.apache.spark.streaming.scheduler.StreamingListener object for receiving system events related to streaming.
Wait for the execution to stop.
Wait for the execution to stop. Any exceptions that occurs during the execution will be thrown in this thread.
Wait for the execution to stop.
Wait for the execution to stop. Any exceptions that occurs during the execution will be thrown in this thread.
time to wait in milliseconds
true
if it's stopped; or throw the reported error during the execution; or false
if the waiting time elapsed before returning from the method.
:: Experimental ::
:: Experimental ::
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as flat binary files, assuming a fixed length per record, generating one byte array per record. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
Note: We ensure that the byte array for each record in the resulting RDDs of the DStream has the provided record length.
HDFS directory to monitor for new file
length of each record in bytes
Set the context to periodically checkpoint the DStream operations for driver fault-tolerance.
Set the context to periodically checkpoint the DStream operations for driver fault-tolerance.
HDFS-compatible directory where the checkpoint data will be reliably stored. Note that this must be a fault-tolerant file system like HDFS for
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
Key type for reading HDFS file
Value type for reading HDFS file
Input format for reading HDFS file
HDFS directory to monitor for new file
Function to filter paths to process
Should process only new files and ignore existing files in the directory
Hadoop configuration
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system.
Key type for reading HDFS file
Value type for reading HDFS file
Input format for reading HDFS file
HDFS directory to monitor for new file
Function to filter paths to process
Should process only new files and ignore existing files in the directory
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
Key type for reading HDFS file
Value type for reading HDFS file
Input format for reading HDFS file
HDFS directory to monitor for new file
:: DeveloperApi ::
:: DeveloperApi ::
Return the current state of the context. The context can be in three possible states - - StreamingContextState.INTIALIZED - The context has been created, but not been started yet. Input DStreams, transformations and output operations can be created on the context. - StreamingContextState.ACTIVE - The context has been started, and been not stopped. Input DStreams, transformations and output operations cannot be created on the context. - StreamingContextState.STOPPED - The context has been stopped and cannot be used any more.
Create an input stream from a queue of RDDs.
Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
NOTE: Arbitrary RDDs can be added to queueStream
, there is no way to recover data of
those RDDs, so queueStream
doesn't support checkpointing.
Type of objects in the RDD
Queue of RDDs
Whether only one RDD should be consumed from the queue in every interval
Default RDD is returned by the DStream when the queue is empty. Set as null if no RDD should be returned when empty
Create an input stream from a queue of RDDs.
Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
NOTE: Arbitrary RDDs can be added to queueStream
, there is no way to recover data of
those RDDs, so queueStream
doesn't support checkpointing.
Type of objects in the RDD
Queue of RDDs
Whether only one RDD should be consumed from the queue in every interval
Create a input stream from network source hostname:port, where data is received as serialized blocks (serialized using the Spark's serializer) that can be directly pushed into the block manager without deserializing them.
Create a input stream from network source hostname:port, where data is received as serialized blocks (serialized using the Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.
Type of the objects in the received blocks
Hostname to connect to for receiving data
Port to connect to for receiving data
Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)
Create an input stream with any arbitrary user implemented receiver.
Create an input stream with any arbitrary user implemented receiver. Find more details at: http://spark.apache.org/docs/latest/streaming-custom-receivers.html
Custom implementation of Receiver
Set each DStreams in this context to remember RDDs it generated in the last given duration.
Set each DStreams in this context to remember RDDs it generated in the last given duration. DStreams remember RDDs only for a limited duration of time and releases them for garbage collection. This method allows the developer to specify how long to remember the RDDs ( if the developer wishes to query old data outside the DStream computation).
Minimum duration that each DStream should remember its RDDs
Create a input stream from TCP source hostname:port.
Create a input stream from TCP source hostname:port. Data is received using a TCP socket and the receive bytes it interepreted as object using the given converter.
Type of the objects received (after converting bytes to objects)
Hostname to connect to for receiving data
Port to connect to for receiving data
Function to convert the byte stream to objects
Storage level to use for storing the received objects
Create a input stream from TCP source hostname:port.
Create a input stream from TCP source hostname:port. Data is received using
a TCP socket and the receive bytes is interpreted as UTF8 encoded \n
delimited
lines.
Hostname to connect to for receiving data
Port to connect to for receiving data
Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)
Return the associated Spark context
Start the execution of the streams.
Start the execution of the streams.
if the StreamingContext is already stopped.
Stop the execution of the streams, with option of ensuring all received data has been processed.
Stop the execution of the streams, with option of ensuring all received data has been processed.
if true, stops the associated SparkContext. The underlying SparkContext will be stopped regardless of whether this StreamingContext has been started.
if true, stops gracefully by waiting for the processing of all received data to be completed
Stop the execution of the streams immediately (does not wait for all received data to be processed).
Stop the execution of the streams immediately (does not wait for all received data
to be processed). By default, if stopSparkContext
is not specified, the underlying
SparkContext will also be stopped. This implicit behavior can be configured using the
SparkConf configuration spark.streaming.stopSparkContextByDefault.
If true, stops the associated SparkContext. The underlying SparkContext will be stopped regardless of whether this StreamingContext has been started.
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat).
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
HDFS directory to monitor for new file
Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
Create a unified DStream from multiple DStreams of the same type and same slide duration.
Wait for the execution to stop.
Wait for the execution to stop. Any exceptions that occurs during the execution will be thrown in this thread.
time to wait in milliseconds
(Since version 1.3.0) Use awaitTerminationOrTimeout(Long) instead
Create an input stream with any arbitrary user implemented receiver.
Create an input stream with any arbitrary user implemented receiver. Find more details at: http://spark.apache.org/docs/latest/streaming-custom-receivers.html
Custom implementation of Receiver
(Since version 1.0.0) Use receiverStream
Main entry point for Spark Streaming functionality. It provides methods used to create org.apache.spark.streaming.dstream.DStreams from various input sources. It can be either created by providing a Spark master URL and an appName, or from a org.apache.spark.SparkConf configuration (see core Spark documentation), or from an existing org.apache.spark.SparkContext. The associated SparkContext can be accessed using
context.sparkContext
. After creating and transforming DStreams, the streaming computation can be started and stopped usingcontext.start()
andcontext.stop()
, respectively.context.awaitTermination()
allows the current thread to wait for the termination of the context bystop()
or by an exception.