Data Engineering
Spark is well known in Big Data for its incredible performance and expressive API. However, it just takes one small misstep to transform a massively parallel powerhouse into a pathetically poor performer. This post presents an example and the steps that can be taken to indentify the problem.