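Avoid using groupByKey if possible, as it behaves much like the old MapReduce shuffle: every key-value pair is sent across the network before any aggregation happens. When the goal is an aggregation (a sum, a count, a max), prefer reduceByKey or aggregateByKey, which combine values within each partition first and shuffle only the partial results.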
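
To make the difference concrete, here is a minimal plain-Python sketch. It does not use Spark itself: it only imitates the two strategies on a small, made-up dataset and counts how many records would cross the shuffle. The names `partitions`, `shuffle_all`, and `combine_then_shuffle` are hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Hypothetical input: two partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

def shuffle_all(parts):
    """groupByKey-style: every pair crosses the shuffle, then values are summed."""
    shuffled = [pair for part in parts for pair in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

def combine_then_shuffle(parts):
    """reduceByKey-style: sum inside each partition first, shuffle partial sums."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

grouped_totals, grouped_records = shuffle_all(partitions)
reduced_totals, reduced_records = combine_then_shuffle(partitions)
```

Both strategies produce the same totals, but the second one shuffles one record per key per partition instead of one record per input pair. On real data with many repeated keys, that gap is what makes reduceByKey cheaper.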

Note that we can also call repartition. This triggers a full shuffle, but in exchange we can get more evenly balanced partitions, which helps when a few partitions hold most of the data.
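
The effect of rebalancing can be sketched in plain Python as well. This is an illustration only, assuming a simple round-robin redistribution; how Spark actually assigns records to the new partitions is an implementation detail. The `repartition` function and the `skewed` data below are hypothetical stand-ins, not Spark calls.

```python
def repartition(parts, num_partitions):
    """Redistribute all records round-robin across num_partitions new partitions."""
    new_parts = [[] for _ in range(num_partitions)]
    index = 0
    for part in parts:
        for record in part:
            # Every record may move to a different partition: this is the shuffle cost.
            new_parts[index % num_partitions].append(record)
            index += 1
    return new_parts

# Hypothetical skewed layout: one large partition and two nearly empty ones.
skewed = [list(range(10)), [10], [11]]
balanced = repartition(skewed, 3)

sizes_before = [len(p) for p in skewed]
sizes_after = [len(p) for p in balanced]
```

Before the call one partition holds 10 of the 12 records. Afterwards each partition holds 4, so downstream tasks do roughly equal work.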