The only feasible approach to big-data problems is divide & conquer: partition the big problem into smaller sub-problems that can be tackled in parallel, then combine their partial results.
Some general considerations for divide & conquer (a minimal code sketch follows the list):
- how to break the problem into smaller tasks (i.e., parallelize)?
- how to assign tasks to workers across a potentially large number of machines?
- how to ensure workers get the data they need?
- how to synchronize the workers?
- how to pass partial results between workers?
- how to handle software and hardware faults?
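A minimal sketch of the pattern, assuming a word-count task over a small in-memory collection; the function names (map_phase, reduce_phase, run) are illustrative only and not part of any framework:

```python
from collections import defaultdict

# Divide & conquer sketch: word count over a collection of documents.
# map_phase breaks the big problem into independent per-document tasks;
# the grouping step stands in for passing partial results between workers;
# reduce_phase combines the partial results per key.

def map_phase(doc_id, text):
    # Emit (word, 1) pairs for one document -- an independent sub-problem.
    for word in text.split():
        yield (word, 1)

def reduce_phase(word, counts):
    # Combine partial results for one word.
    return (word, sum(counts))

def run(documents):
    # Group intermediate pairs by key (the synchronization / "shuffle" step).
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_phase(doc_id, text):
            groups[word].append(count)
    return dict(reduce_phase(w, cs) for w, cs in groups.items())

if __name__ == "__main__":
    docs = {"d1": "the quick brown fox", "d2": "the lazy dog"}
    print(run(docs))  # {'the': 2, 'quick': 1, 'brown': 1, ...}
```

This single-process sketch deliberately sidesteps the hard questions above (task assignment, data movement, fault handling); handling those at scale is exactly what a framework like MapReduce provides.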
MapReduce Basics
Algorithm Design