
CSV is a pretty bad format; any engine will choke on it. It carries no index or statistics, so getting at any data basically requires a full table scan.

You need to convert it to Parquet or another columnar format that lets engines do predicate pushdown and fast scans. Each Parquet file stores statistics about the data it contains (min/max values per column, for example), so engines can quickly decide whether it’s worth reading the file or skipping it altogether.
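
To make the skipping idea concrete, here's a rough sketch in plain Python. It is a toy model, not how Parquet is actually laid out on disk: `Chunk`, `make_chunks`, and `scan_greater_than` are hypothetical names I made up, and each chunk just stands in for one file with its min/max stats.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for one file: its values plus min/max stats."""
    values: list
    min_value: int
    max_value: int


def make_chunks(values, chunk_size):
    """Split values into fixed-size chunks and record min/max for each."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks


def scan_greater_than(chunks, threshold):
    """Return (matching values, number of chunks actually read)."""
    matches = []
    chunks_read = 0
    for chunk in chunks:
        # The stats prove no value in this chunk can match, so skip it
        # without touching its values.
        if chunk.max_value <= threshold:
            continue
        chunks_read += 1
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, chunks_read


# Assumed example data: 1000 sorted integers in 10 chunks of 100.
values = list(range(1000))
chunks = make_chunks(values, 100)
matches, chunks_read = scan_greater_than(chunks, 950)
print(f"read {chunks_read} of {len(chunks)} chunks, {len(matches)} matches")
```

With a CSV the equivalent query has to read all 1000 rows. Here only the last chunk gets read. Note the skipping only pays off when the data is sorted or clustered on the filter column; with shuffled values every chunk's min/max range would overlap the predicate.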





