YARN
YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource manager. It provides an API for developing generic distributed applications.
Spark Architecture
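Instead of a “worker”, Spark has an “executor”, which is the process that executes tasks.
Usually, multiple tasks are sent to each executor. The Spark driver must send the relevant code, along with any values that code references, for every task. This can be expensive: a large value is shipped again with each task, even when many tasks run on the same executor.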
Broadcast
If a value is broadcast, Spark sends only one copy of the value per executor, rather than one per task.
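# Ship the threshold to each executor once (sc is an existing SparkContext).
thresh = sc.broadcast(5)
# Tasks read the broadcast value through .value (myRdd is an existing RDD).
filtered = myRdd.filter(lambda x: x > thresh.value)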
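
To get a feel for the difference between one copy per task and one copy per executor, here is a rough back-of-the-envelope sketch in plain Python. It does not use Spark at all: the function bytes_shipped, the lookup list, and the executor and task counts are hypothetical, and pickle only stands in for whatever serialization Spark really uses, so the numbers are illustrative rather than measurements.

```python
import pickle


def bytes_shipped(value, num_executors, tasks_per_executor, broadcast):
    """Estimate the bytes the driver sends for one shared value.

    Simplified model (an assumption, not Spark's real transfer logic):
    without broadcast, every task gets its own copy of the value;
    with broadcast, each executor gets exactly one copy.
    """
    payload = len(pickle.dumps(value))
    if broadcast:
        copies = num_executors
    else:
        copies = num_executors * tasks_per_executor
    return payload * copies


# Hypothetical shared value and cluster shape.
lookup = list(range(1000))
per_task = bytes_shipped(lookup, num_executors=4, tasks_per_executor=25, broadcast=False)
per_executor = bytes_shipped(lookup, num_executors=4, tasks_per_executor=25, broadcast=True)

# With 25 tasks per executor, the per-task approach ships 25 times as much.
print(per_task // per_executor)  # 25
```

Under this model the saving equals the number of tasks per executor, so broadcasting matters most when the value is large and each executor runs many tasks. For a small value like the threshold 5 above, the saving is negligible.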