A common approach to synchronization is to construct keys and values so that the data needed for a computation is naturally brought together by the execution framework.

A common problem is counting co-occurrences: how often each pair of values appears together (for example, in the same record).

Pairs - each key is a pair of ids, and the value is the count for that pair.
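
A minimal sketch of the pairs approach in plain Python, with no real MapReduce framework involved. The function names (`pairs_map`, `pairs_reduce`, `run_pairs`) and the sample records are hypothetical, and the shuffle phase is simulated with a dictionary:

```python
from collections import defaultdict
from itertools import permutations


def pairs_map(record):
    """Emit ((a, b), 1) for every ordered pair of distinct ids in a record."""
    for a, b in permutations(sorted(set(record)), 2):
        yield (a, b), 1


def pairs_reduce(key, counts):
    """Sum all partial counts that arrived for one pair."""
    return key, sum(counts)


def run_pairs(records):
    # The dictionary stands in for the framework's shuffle: it groups
    # every emitted value under its key before the reducer sees it.
    groups = defaultdict(list)
    for record in records:
        for key, value in pairs_map(record):
            groups[key].append(value)
    return dict(pairs_reduce(k, v) for k, v in groups.items())


records = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
pair_counts = run_pairs(records)
```

Each mapper output is tiny, but the number of emitted key-value records grows with the number of pairs in every input record.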


Stripes - each key is a single id, and the value is a map from every id that co-occurs with it to the corresponding count.
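
The stripes approach can be sketched the same way. Again this is only an illustration in plain Python: `stripes_map`, `stripes_reduce`, `run_stripes`, and the sample records are hypothetical names and data, and a dictionary simulates the shuffle:

```python
from collections import Counter, defaultdict


def stripes_map(record):
    """Emit (a, {b: 1, ...}) once per id, covering all its neighbours."""
    ids = sorted(set(record))
    for a in ids:
        yield a, Counter(b for b in ids if b != a)


def stripes_reduce(key, stripes):
    """Merge stripes for one id by summing them element-wise."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)  # Counter.update adds counts, it does not overwrite
    return key, dict(total)


def run_stripes(records):
    groups = defaultdict(list)  # simulated shuffle: group stripes by key
    for record in records:
        for key, stripe in stripes_map(record):
            groups[key].append(stripe)
    return dict(stripes_reduce(k, v) for k, v in groups.items())


stripe_counts = run_stripes([["a", "b", "c"], ["a", "b"], ["b", "c"]])
```

Compared with pairs, far fewer records are emitted, but each value is a whole map that must be held in memory.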


Both algorithms benefit from combiners, because the reducer operations (summing counts for pairs, element-wise summing of maps for stripes) are commutative and associative.
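
To illustrate why this matters, here is a small sketch for the pairs case. The mapper outputs are made-up sample data, and `combine_pairs` and `reduce_all` are hypothetical helpers. Because addition is commutative and associative, pre-summing on each mapper does not change the final counts but reduces the number of records sent to the reducers:

```python
from collections import Counter


def combine_pairs(emitted):
    """Locally sum counts per key before anything leaves the mapper."""
    local = Counter()
    for key, count in emitted:
        local[key] += count
    return list(local.items())


def reduce_all(streams):
    """Sum counts per key across the output of all mappers."""
    total = Counter()
    for stream in streams:
        for key, count in stream:
            total[key] += count
    return dict(total)


# Raw output of two hypothetical mappers.
mapper_1 = [(("a", "b"), 1), (("a", "b"), 1), (("a", "c"), 1)]
mapper_2 = [(("a", "b"), 1), (("a", "c"), 1)]

without_combiner = reduce_all([mapper_1, mapper_2])
with_combiner = reduce_all([combine_pairs(mapper_1), combine_pairs(mapper_2)])

records_sent_without = len(mapper_1) + len(mapper_2)
records_sent_with = len(combine_pairs(mapper_1)) + len(combine_pairs(mapper_2))
```

In this sample the two results are identical, while the combined version ships fewer records.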

In terms of scalability: