Avoid using groupByKey if possible: it shuffles every value across the network with no map-side combining, much like the old MapReduce process.
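A rough way to see the cost difference is to count how many records would cross the shuffle boundary. The sketch below is a pure-Python model, not Spark code; the partition contents and both helper functions are made up for illustration.

```python
# Hypothetical input: two partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("a", 2), ("b", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

def shuffled_without_combine(parts):
    """groupByKey-style model: every record crosses the shuffle boundary."""
    return sum(len(p) for p in parts)

def shuffled_with_combine(parts, f):
    """reduceByKey-style model: each partition pre-reduces per key, so at
    most one record per (partition, key) pair is shuffled."""
    total = 0
    for p in parts:
        local = {}
        for k, v in p:
            local[k] = f(local[k], v) if k in local else v
        total += len(local)
    return total

print(shuffled_without_combine(partitions))                   # 6
print(shuffled_with_combine(partitions, lambda x, y: x + y))  # 4
```

The gap widens as the number of values per key within a partition grows.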
reduceByKey


combineByKey

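reduceByKey(reduce) is simply combineByKey(identity, reduce, reduce): the first value seen for a key becomes the combiner, and the same function both merges values and merges combiners.

aggregateByKey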
Midpoint between reduceByKey and combineByKey

A zero value is provided instead of an initialize (createCombiner) function; the result type can still differ from the value type.
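The relationship between the three operations can be sketched with a small pure-Python model. This is not Spark's implementation: combine_by_key, reduce_by_key and aggregate_by_key are hypothetical helpers that only mimic the semantics (combine within each partition, then merge across partitions), and the sample data is invented.

```python
def combine_by_key(parts, create_combiner, merge_value, merge_combiners):
    """Model: build combiners inside each partition, then merge them."""
    merged = {}
    for p in parts:
        local = {}
        for k, v in p:
            if k in local:
                local[k] = merge_value(local[k], v)
            else:
                local[k] = create_combiner(v)
        for k, c in local.items():
            merged[k] = merge_combiners(merged[k], c) if k in merged else c
    return merged

def reduce_by_key(parts, f):
    # identity as the initializer; f for both merge steps
    return combine_by_key(parts, lambda v: v, f, f)

def aggregate_by_key(parts, zero, seq_op, comb_op):
    # The zero value stands in for the initializer: the first value of a
    # key is folded into zero. Assumes zero is immutable (a tuple here).
    return combine_by_key(parts, lambda v: seq_op(zero, v), seq_op, comb_op)

parts = [[("a", 1), ("a", 2), ("b", 3)], [("a", 4), ("b", 5)]]

sums = reduce_by_key(parts, lambda x, y: x + y)

# Per-key (sum, count): the result type differs from the value type.
stats = aggregate_by_key(
    parts,
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)

print(sums)   # {'a': 7, 'b': 8}
print(stats)  # {'a': (7, 3), 'b': (8, 2)}
```

In this model reduce_by_key cannot change the value type, while aggregate_by_key can, which is why it sits between reduceByKey and the fully general combineByKey.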

If the per-key logic needs all of a key's values together, groupByKey followed by map or mapPartitions may still be needed.
Note that we can also repartition - this triggers shuffling, but we can get more balanced partitions.
coalesce - use only to reduce the number of partitions; by default it merges existing partitions without a full shuffle, so the result can stay unbalanced.
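The trade-off between the two can be illustrated with a deliberately simplified pure-Python model. Spark's real coalesce also takes data locality into account and its repartition goes through an actual shuffle; the two functions below are hypothetical stand-ins that only show the balancing effect on an invented, skewed input.

```python
def coalesce(parts, n):
    """Model: merge whole partitions into n groups; individual records
    are never redistributed, so existing skew is carried over."""
    out = [[] for _ in range(n)]
    for i, p in enumerate(parts):
        out[i % n].extend(p)
    return out

def repartition(parts, n):
    """Model: redistribute every record round-robin (a full shuffle)."""
    out = [[] for _ in range(n)]
    records = [r for p in parts for r in p]
    for i, r in enumerate(records):
        out[i % n].append(r)
    return out

# Hypothetical skewed input: one large partition and three tiny ones.
skewed = [list(range(9)), [9], [10], [11]]

print([len(p) for p in coalesce(skewed, 2)])     # [10, 2]
print([len(p) for p in repartition(skewed, 2)])  # [6, 6]
```

Under these assumptions coalesce is cheaper because it moves whole partitions, while repartition pays for a shuffle to get even sizes.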