Hacker News new | past | comments | ask | show | jobs | submit login

Are there any pipeline tools for command line stream processing? Because when you have several terabytes of data you can't exactly afford to restart due to a stray comma in your CSV file.



If you have a stray comma in your multi-TB CSV file, you probably don't _want_ it to keep going. You risk misinterpreting the mistake and having a grossly malformed output... There's no way to reliably and elegantly recover from something like that. Validation should preferably happen before processing




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: