Mathew de Beneducci

Senior Developer in the Bristol Office, interested in Java backend development and Big Data tools
Data Engineering
Apache Spark is a major talking point in Big Data pipelines, boasting performance 10-100x faster than comparable tools. But how achievable are these speeds, and what can you do to avoid memory errors? In this blog post, I will use a real example to introduce two mechanisms of data movement within Spark and demonstrate how they form the cornerstone of performance.