Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Maybe I am missing something but would there ever be a scenario where taking a single albeit large sql statement and rewriting it as several pyspark scripts would result in faster runtime for your data pipeline? In most cases, this will be much much slower.



Greatly depends on your environment. I am thankfully in an area where there are very modest timeliness requirements. Improving the speed of a job means little to me. However, improving debugability or checkpointing when things go wrong is always valuable.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: