The only feasible approach to big-data problems is divide & conquer: partition the big problem into smaller sub-problems that can be tackled in parallel, then combine their partial results.
Some general considerations for divide & conquer (a minimal code sketch follows the list):
- how to break the problem into smaller tasks (i.e., parallelize)?
- how to assign tasks to workers across a potentially large number of machines?
- how to ensure workers get the data they need?
- how to synchronize the workers?
- how to pass partial results between workers?
- how to handle software and hardware faults?
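A minimal sketch of the pattern, assuming a word-count task over a small in-memory collection; the function names (map_phase, reduce_phase, run) are illustrative only and not part of any framework:

```python
from collections import defaultdict

# Divide & conquer sketch: word count over a collection of documents.
# map_phase breaks the big problem into independent per-document tasks;
# the grouping step stands in for passing partial results between workers;
# reduce_phase combines the partial results per key.

def map_phase(doc_id, text):
    # Emit (word, 1) pairs for one document -- an independent sub-problem.
    for word in text.split():
        yield (word, 1)

def reduce_phase(word, counts):
    # Combine partial results for one word.
    return (word, sum(counts))

def run(documents):
    # Group intermediate pairs by key (the synchronization / "shuffle" step).
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_phase(doc_id, text):
            groups[word].append(count)
    return dict(reduce_phase(w, cs) for w, cs in groups.items())

if __name__ == "__main__":
    docs = {"d1": "the quick brown fox", "d2": "the lazy dog"}
    print(run(docs))  # {'the': 2, 'quick': 1, 'brown': 1, ...}
```

This single-process sketch deliberately sidesteps the hard questions above (task assignment, data movement, fault handling); handling those at scale is exactly what a framework like MapReduce provides.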
MapReduce Basics
Algorithm Design