Stream processing frameworks include Apache Spark Streaming, Apache Storm, and Apache Flink.
With Spark Streaming, we deal with discretized streams (DStreams): a continuous stream of data is divided into micro-batches, each represented as an RDD.

A StreamingContext (conventionally ssc) is the entry point for a streaming job.
A DStream supports the RDD transformations plus its own windowed transformations and output operations.

Key concepts with Spark Streaming:

- DStream - the stream itself, a sequence of RDDs (one per batch interval)
- Transformations
  - per-batch: map, countByValue, reduce, join, etc.
  - windowed: window, countByValueAndWindow, etc.
- Output operations
  - saveAsHadoopFiles - save to HDFS
  - foreach - do anything with each batch of results

E.g. usage with Tweet streams - basic example: construct a DStream of tweets and extract the hashtags from each batch.
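The per-batch view can be sketched in plain Python (this is not the Spark API; the tweet data and the extract_hashtags helper are invented for illustration). A DStream is modeled as a list of micro-batches, and the hashtag extraction is applied to every batch, as flatMap would be:

```python
# Plain-Python sketch: a DStream modeled as a sequence of micro-batches
# (lists of tweets); hashtags are extracted from each batch.

def extract_hashtags(tweet):
    """Return the hashtags appearing in a tweet's text."""
    return [word for word in tweet.split() if word.startswith("#")]

# Each inner list stands in for one batch interval of the stream.
tweet_batches = [
    ["#spark is great", "learning #spark #streaming"],
    ["#storm vs #spark", "no hashtags here"],
]

# Equivalent of a flatMap over each batch of the stream.
hashtag_batches = [
    [tag for tweet in batch for tag in extract_hashtags(tweet)]
    for batch in tweet_batches
]

print(hashtag_batches)
# → [['#spark', '#spark', '#streaming'], ['#storm', '#spark']]
```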

Count hashtags over the last second - the problem is that one second is a very small window, so the counts are noisy.
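What a per-batch countByValue produces can be sketched with a plain-Python Counter (the hashtag batch is invented; this is not the Spark API):

```python
from collections import Counter

# Counting values in a single one-second batch of hashtags.
# With batches this small, the counts jump around from batch to batch.
batch = ["#spark", "#spark", "#streaming"]
counts = Counter(batch)
print(counts["#spark"])  # → 2
```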

Instead, we can use .window to count over a 10-minute interval that slides every second (window length 10 minutes, slide interval 1 second).
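The naive windowed approach can be sketched in plain Python (windowed_counts and the batch data are illustrative, not the Spark API): every time a new batch arrives, recount everything in the last N batches:

```python
from collections import Counter, deque

def windowed_counts(batches, window_size):
    """Recompute counts over the last `window_size` batches each time a
    new batch arrives - the naive (recompute-everything) approach."""
    window = deque(maxlen=window_size)  # old batches fall off automatically
    results = []
    for batch in batches:
        window.append(batch)
        # Full recount over every batch still inside the window.
        counts = Counter(tag for b in window for tag in b)
        results.append(counts)
    return results

batches = [["#a", "#b"], ["#a"], ["#c", "#a"], ["#b"]]
results = windowed_counts(batches, window_size=3)
print(results[-1]["#a"])  # counts over the last 3 batches → 2
```

The recount touches every element in the window on every slide, which is wasteful when the window (10 minutes) is much larger than the slide interval (1 second).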

These windowed results are fault tolerant - they are RDDs, so Spark handles recovery and recomputation.
In general, incremental counting generalizes to many reduce operations. We need a function to “inverse reduce” (e.g. “subtract” for counting).
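The incremental idea can be sketched in plain Python (incremental_windowed_counts and the data are illustrative): rather than recounting the whole window, apply the reduce to the arriving batch and the "inverse reduce" (subtract, for counting) to the batch leaving the window:

```python
from collections import Counter, deque

def incremental_windowed_counts(batches, window_size):
    """Maintain window counts incrementally: reduce in the arriving batch,
    inverse-reduce (subtract) the batch that leaves the window."""
    window = deque()
    counts = Counter()
    results = []
    for batch in batches:
        counts.update(batch)           # reduce: add the new batch
        window.append(batch)
        if len(window) > window_size:
            old = window.popleft()
            counts.subtract(old)       # inverse reduce: remove the old batch
        # Snapshot of the current window's counts (drop zeroed entries).
        results.append(Counter({k: v for k, v in counts.items() if v > 0}))
    return results

batches = [["#a", "#b"], ["#a"], ["#c", "#a"], ["#b"]]
results = incremental_windowed_counts(batches, window_size=3)
print(results[-1]["#a"])  # → 2, same as a full recount of the window
```

Each slide now costs work proportional to the batches entering and leaving, not to the whole window.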

Spark Streaming is fast.
