Hacker News

Pure Python scripts, maybe using the #%% cell convention (https://code.visualstudio.com/docs/python/jupyter-support-py...) so you get the best of both notebooks and scripts, running in a right-sized instance/container/machine. And if you need to run jobs in parallel, orchestrate with make, like so: https://www.sumsar.net/blog/makefile-recipe-python-data-pipe...
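For anyone who hasn't seen the convention: a plain .py file with `# %%` markers is a normal script everywhere, but editors like VS Code render each marker as a runnable notebook cell. A minimal sketch (the data and filenames here are illustrative, not from the linked post):

```python
# analysis.py -- an ordinary Python script; "# %%" lines are picked up
# by VS Code / Spyder as interactive notebook-style cells.

# %% Load data (hypothetical path, shown commented out for illustration)
import csv
# rows = list(csv.DictReader(open("data.csv")))

# %% Transform
values = [1, 4, 9, 16]
total = sum(values)

# %% Report
print(f"sum of values: {total}")
```

The whole file still runs top-to-bottom with `python analysis.py`, so the same artifact works for exploration and for cron/make.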


Yeah, I love this: pure Python with cron or periodic tasks (e.g., Django) works great. Add a Celery task for parallelization, and if you pipe logs/alerts into a Slack channel, you can get surprisingly far without a "proper" orchestration layer.
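The "pipe alerts into Slack" part can be tiny. A sketch of a wrapper that reports failures of a scheduled job to a Slack incoming webhook; the webhook URL and job name here are placeholders, not anything from the thread:

```python
import json
import urllib.request


def post_to_slack(webhook_url: str, text: str) -> None:
    # Slack incoming webhooks accept a JSON body with a "text" field.
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


def run_job(job, name: str, webhook_url: str):
    # Wrap any callable so failures land in the Slack channel,
    # then re-raise so cron's exit status still reflects the error.
    try:
        return job()
    except Exception as exc:
        post_to_slack(webhook_url, f"job {name} failed: {exc!r}")
        raise
```

Point a cron entry at a script that calls `run_job(...)` and you have alerting without any orchestration framework.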

I recently took over an Airflow system from a former colleague, and in our case, it’s just overly complex for what’s really a pretty simple data flow.


I don’t know much about Airflow, but isn’t it just also Python with cron?


Airflow is Python with cron, plus the option of very sophisticated and useful orchestration features, like retries, dependencies between tasks, etc. All the stuff you'll end up rolling yourself as your simple scheduled tasks grow.
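To make "rolling it yourself" concrete, here is a sketch of the kind of retry logic you end up writing by hand once cron jobs start flaking (names and defaults are illustrative, not Airflow's API; Airflow gives you this per-task via its `retries` setting):

```python
import functools
import time


def retry(times: int = 3, delay: float = 0.0):
    """Retry a function up to `times` attempts, sleeping `delay`
    seconds between attempts; re-raises the last exception."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
                    if attempt < times:
                        time.sleep(delay)
            raise last_exc
        return wrapper
    return decorator
```

Multiply this by backoff, task dependencies, and failure alerting, and the case for adopting an orchestrator (or deciding you don't need one) gets clearer.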




