Spark is a fast and expressive cluster computing engine compatible with Apache Hadoop.
Programs are written in terms of operations on distributed datasets (RDDs).
Spark does lazy evaluation - nothing is done until an action is reached.
For fault recovery, RDDs track lineage information that can be used to efficiently recompute lost data.