RDD

An RDD is a resilient distributed dataset.

RDDs are divided into “partitions”, which workers operate on independently.
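
As a rough illustration of partitioning, here is a toy model in plain Python. It is not Spark's API: the helper names `split_into_partitions` and `process_partition` are made up for this sketch, and the "workers" are simulated by an ordinary loop. It only shows that each partition can be processed without looking at the others.

```python
# Toy model of partitioning (plain Python, NOT Spark's API).
# All names here are hypothetical and exist only for illustration.

def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), num_partitions)
    partitions = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def process_partition(partition):
    """Work done by one simulated worker: it sees only its own partition."""
    return sum(partition)


partitions = split_into_partitions(list(range(10)), 3)
# Each partition is handled independently of the others.
partial_sums = [process_partition(p) for p in partitions]

print(partitions)
print(partial_sums)
```

In PySpark, the partition count can be requested when creating an RDD from a local collection with `sc.parallelize(data, numSlices)`, and `rdd.glom().collect()` returns the elements grouped by partition.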

RDDs can be created in a variety of ways.

RDDs support many operations. These are similar to the operations in MapReduce, but there are more of them to choose from.

Map-Like Operations

Map-like operations are those that return another RDD.
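
The idea that a map-like operation returns another RDD can be sketched with a small stand-in class. `ToyRDD` is hypothetical and written in plain Python. It is not Spark's implementation, and unlike Spark it evaluates eagerly. The method names mirror `map`, `filter` and `collect` on PySpark's `RDD`, but only the "returns a new dataset, leaves the original untouched" behaviour is being illustrated.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: an immutable sequence of partitions."""

    def __init__(self, partitions):
        # Store partitions as tuples so an existing ToyRDD cannot be modified.
        self._partitions = tuple(tuple(p) for p in partitions)

    def map(self, fn):
        # Apply fn to every element, partition by partition,
        # and return a NEW ToyRDD.
        return ToyRDD([fn(x) for x in p] for p in self._partitions)

    def filter(self, predicate):
        # Keep only matching elements, again returning a NEW ToyRDD.
        return ToyRDD([x for x in p if predicate(x)] for p in self._partitions)

    def collect(self):
        # Not map-like: this returns a plain list rather than a ToyRDD.
        return [x for p in self._partitions for x in p]


numbers = ToyRDD([[1, 2, 3], [4, 5, 6]])
squares = numbers.map(lambda x: x * x)               # another ToyRDD
even_squares = squares.filter(lambda x: x % 2 == 0)  # yet another ToyRDD

print(even_squares.collect())
print(numbers.collect())  # the original dataset is unchanged
```

Because each call returns a new dataset, map-like operations can be chained, as in `numbers.map(...).filter(...)`.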