~/devreads

#apache-spark

3 posts

9 Dec 2025

Guna Chandra Durgapu 24 min read

Context Ambitious goal and the scope of the problem We were chasing a critical, futuristic business requirement for our big data pipelines achieving “1-day planning,” defined as a total execution window of 24 hours or less . The entire process is extensive, processing a 7TB data volume each cycle across 19 distinct plans (16 Spark pipelines and 3 Data Science…

big-dataapache-sparkperformance-optimization

6 Oct 2025

Poorva Patil 9 min read

As a data engineer, I used to see metrics as just numbers on a dashboard — until I realized they’re the lens through which customers view and run their operations. In customer support, for example, agent productivity metrics aren’t just figures, they’re actionable insights that drive efficiency, shape staffing decisions, and directly impact customer satisfaction. These aren’t just charts —…

apache-sparkanalyticsbig-datadata-analysis

21 Feb 2024

Ilay Chen 12 min read

By Ilay Chen and Tomer Akirav At PayPal, hundreds of thousands of Apache Spark jobs run on an hourly basis, processing petabytes of data and requiring a high volume of resources. To handle the growth of machine learning solutions, PayPal requires scalable environments, cost awareness and constant innovation. This blog explains how Apache Spark 3 and GPUs can help enterprises…

cloud-computinggpubig-datamachine-learningapache-spark